Ohio State University Extension Bulletin

Agronomic Crops Team On-Farm Research Projects 1998

Special Circular 166-99


A Word About Statistics

Why Statistics?

Statistics let us assess the variability that is always present, and then make reasonable, mathematics-based guesses as to whether observed effects are due to chance or to treatments.

When we conclude that there is a reasonable chance that differences were in fact due to treatments, then we say treatments had a significant effect. This conclusion does not mean that we proved that the treatments caused differences, only that we are satisfied that our guess is probably correct.

When we are unable to draw the conclusion that treatments differed, we say that the treatments are not significantly different. This does not mean that treatments had no effect -- it simply says that our research trial was not able to detect such an effect. There are two possibilities here -- either the treatments really did not have an effect, or they did have an effect, but the experiment was not adequate to detect it.

Small effects are very difficult to prove because unexplained variation, or "background noise," will usually "drown out" small effects. To evaluate how well a particular trial controlled unexplained variation, we use the Coefficient of Variation, or CV. It is simply the standard deviation of all samples in a trial divided by the overall mean of all samples, usually expressed as a percentage. A goal for most field trials is to achieve a CV of 12% or less. The smaller the CV, or "background noise," the easier it is to detect variation due to treatments. A trial having a CV of 5% and five to six replications of each treatment will have a reasonable chance of detecting a true 10% difference between treatment means.
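The CV calculation can be sketched in a few lines of Python. This is only an illustration of the simplified definition given above (standard deviation of all samples divided by the overall mean). The plot yields and the function name coefficient_of_variation are invented for the example. Analysis-of-variance software usually bases the CV on the square root of the error mean square instead of the raw standard deviation, so its value may differ somewhat.

```python
import statistics


def coefficient_of_variation(samples):
    """Return the CV as a percentage of the overall mean."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (n - 1 denominator)
    return 100.0 * sd / mean


# Hypothetical plot yields (bu/acre), invented for illustration only.
yields = [150.0, 155.0, 145.0, 160.0, 140.0, 150.0]

cv = coefficient_of_variation(yields)
print(f"CV = {cv:.1f}%")  # prints "CV = 4.7%"
print("Meets the 12% goal" if cv <= 12.0 else "Exceeds the 12% goal")
```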

What Does Probability Level Mean?

If we declare two averages "significantly different" at the 5% probability level, or P = 0.05, we are saying that we are willing to make a mistake one out of 20 times when the averages are in fact truly equal. The 5% probability level is the standard used for most field trials. However, 5% may be too conservative, or overly cautious, for some farmer-researchers. In some on-farm research trials, a wrong decision may not be very costly. This could be the case where treatment costs are essentially the same, e.g., seed costs in variety comparisons. A researcher may then choose a probability level of 10% if willing to make a mistake one out of 10 times, or 20% for a risk of one out of five.
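The "one out of 20" idea can be checked with a small simulation. The sketch below is not part of the analyses in this report. It assumes two treatments with exactly the same true yield, compared side by side in five replicates with a paired t-test, and counts how often the test wrongly declares a difference. The yield level, the standard deviation, and the function name are made up. The critical value 2.776 is the tabulated two-sided 5% t value for 4 degrees of freedom.

```python
import random
import statistics

T_CRIT_5PCT_DF4 = 2.776  # two-sided 5% t value for 4 degrees of freedom (t-table)


def false_positive_rate(n_trials=2000, reps=5, true_mean=150.0, sd=8.0, seed=1):
    """Fraction of simulated trials that declare a difference when none exists."""
    rng = random.Random(seed)
    wrong_calls = 0
    for _ in range(n_trials):
        # Both treatments are drawn from the same distribution: no real effect.
        diffs = [rng.gauss(true_mean, sd) - rng.gauss(true_mean, sd)
                 for _ in range(reps)]
        std_error = statistics.stdev(diffs) / reps ** 0.5
        t_value = statistics.mean(diffs) / std_error
        if abs(t_value) > T_CRIT_5PCT_DF4:
            wrong_calls += 1
    return wrong_calls / n_trials


rate = false_positive_rate()
print(f"Declared a difference in {rate:.1%} of trials with no true effect")
```

The reported share should land near 5%, or about one trial in 20. Using a larger critical value (a stricter probability level) lowers it, and a smaller one raises it.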

Picking the probability level is a "decision rule." Increasing the sample size or replicates reduces the chances of making an incorrect decision when the same decision rule is applied.
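One way to see why more replicates help is to look at the standard error of the difference between two treatment means, which shrinks with the square root of the number of replicates. The sketch below assumes equal replication of every treatment and a CV based on the unexplained (error) variation. The function name sed_percent and the 10% CV are chosen only for illustration.

```python
import math


def sed_percent(cv_percent, replicates):
    """Standard error of the difference between two treatment means,
    expressed as a percentage of the overall mean."""
    return cv_percent * math.sqrt(2.0 / replicates)


# Same trial quality (CV = 10%), different numbers of replicates.
for reps in (2, 4, 6, 8):
    print(f"{reps} replicates: SED = {sed_percent(10.0, reps):.2f}% of the mean")
```

Going from two to eight replicates cuts the standard error of the difference in half, so the same decision rule is applied to a much less noisy comparison.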

In on-farm research trials, experience has shown that five to six replicates are usually needed to detect meaningful and real differences between treatments if they exist. Each treatment is represented at least once within each replicate. Replications may be located adjacent to each other within a single field or located in separate fields or farms.

Randomization of treatments within a replicate is important to avoid biased location of treatments. Having treatments in the same order in replicates across a field may cause bias due to soil fertility trends or soil moisture trends stretching across the field.
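A randomized layout is easy to generate with a short script. The following is a minimal sketch in which every treatment appears once in each replicate and the order is shuffled independently for each replicate. The treatment labels, the seed, and the function name randomize_layout are placeholders, not part of any trial in this report.

```python
import random


def randomize_layout(treatments, n_replicates, seed=None):
    """Return one independently shuffled treatment order per replicate."""
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    layout = []
    for _ in range(n_replicates):
        order = list(treatments)  # copy so the original list is untouched
        rng.shuffle(order)
        layout.append(order)
    return layout


layout = randomize_layout(["A", "B", "C", "D"], n_replicates=5, seed=1998)
for number, order in enumerate(layout, start=1):
    print(f"Replicate {number}: {' '.join(order)}")
```

Recording the seed along with the field plan lets the same layout be regenerated later.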

The F-Test and Least Significant Difference

One test for significant differences between or among treatment means is the F-test. The F value is the ratio of the variation due to treatments to the variation of individual samples. Values close to one indicate there is little or no variation due to treatments. Values much larger than one indicate that variation due to treatments is larger than expected by chance alone.
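The F ratio can be worked out by hand or with a short script. The sketch below assumes a randomized complete block layout, with each replicate treated as a block, so that replicate-to-replicate variation is removed before the unexplained variation is estimated. The yields are invented, and the function name rcbd_f_ratio is only illustrative. The critical value 4.46 is the tabulated 5% F value for 2 and 8 degrees of freedom.

```python
def rcbd_f_ratio(data):
    """data[i][j] is the yield of treatment i in replicate j.
    Returns (F value for treatments, error mean square)."""
    t = len(data)      # number of treatments
    r = len(data[0])   # number of replicates
    grand_mean = sum(sum(row) for row in data) / (t * r)
    trt_means = [sum(row) / r for row in data]
    rep_means = [sum(data[i][j] for i in range(t)) / t for j in range(r)]

    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_trt = r * sum((m - grand_mean) ** 2 for m in trt_means)
    ss_rep = t * sum((m - grand_mean) ** 2 for m in rep_means)
    ss_error = ss_total - ss_trt - ss_rep  # unexplained variation

    ms_trt = ss_trt / (t - 1)
    ms_error = ss_error / ((t - 1) * (r - 1))
    return ms_trt / ms_error, ms_error


# Hypothetical yields: 3 treatments (rows) x 5 replicates (columns).
yields = [
    [51.0, 49.0, 48.0, 52.0, 50.0],  # treatment A
    [51.0, 55.0, 54.0, 56.0, 54.0],  # treatment B
    [51.0, 55.0, 48.0, 54.0, 52.0],  # treatment C
]
F_CRIT_5PCT_2_8 = 4.46  # tabulated 5% F value for 2 and 8 degrees of freedom

f_value, ms_error = rcbd_f_ratio(yields)
print(f"F = {f_value:.2f}, error mean square = {ms_error:.2f}")
print("Significant at 5%" if f_value > F_CRIT_5PCT_2_8 else "Not significant at 5%")
```

For these made-up yields the F value is about 6.67, well above one and above the tabulated 4.46, so the treatments would be declared significantly different at the 5% level.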

If the F value for a trial is found to be significant and more than two treatments are being analyzed, the next step is to calculate another test for significance called the Least Significant Difference (LSD). The LSD helps to detect which pairs of treatment means are significantly different from each other. When a trial contains more than two treatments, it is sound statistical protocol to conduct an F-test before pairwise comparisons are made with an LSD. This procedure is referred to as Fisher's (protected) LSD. If a trial contains only two treatments, then using an F-test to find significance is equivalent to using the LSD alone.
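Once the F-test is significant, the LSD is commonly computed as the tabulated t value times the standard error of the difference between two means. The sketch below assumes equal replication and uses invented numbers: three treatment means, an error mean square of 3.0, and five replicates. The value 2.306 is the tabulated two-sided 5% t value for 8 error degrees of freedom, and the function name fisher_lsd is illustrative.

```python
import math
from itertools import combinations


def fisher_lsd(ms_error, replicates, t_critical):
    """Least Significant Difference for comparing two treatment means."""
    return t_critical * math.sqrt(2.0 * ms_error / replicates)


# Hypothetical treatment means (bu/acre) from a trial with a significant F-test.
means = {"A": 50.0, "B": 54.0, "C": 52.0}
T_CRIT_5PCT_DF8 = 2.306  # two-sided 5% t value for 8 error degrees of freedom

lsd = fisher_lsd(ms_error=3.0, replicates=5, t_critical=T_CRIT_5PCT_DF8)
print(f"LSD (0.05) = {lsd:.2f}")  # prints "LSD (0.05) = 2.53"

# Any pair of means that differs by more than the LSD is declared different.
significant_pairs = [
    (a, b) for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > lsd
]
for a, b in significant_pairs:
    print(f"{a} and {b} differ by {abs(means[a] - means[b]):.1f}")
```

Here only treatments A and B differ by more than the LSD. The same formula, under the same assumptions, gives an LSD of roughly 7% of the mean for a trial with a CV of 5% and five replicates, which fits the earlier statement that such a trial has a reasonable chance of detecting a true 10% difference.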

For most trials in this report, an F-statistic was calculated first. If treatments were found to be significantly different, then the LSD is usually reported in place of the F value.

