Dr. Phil E. Rzewnicki, Ohio State University On-Farm Research Coordinator
Statistics are used to assess the variability that is always present in field data, and then to make reasonable, mathematics-based guesses as to whether observed effects are due to chance or to treatments.
When we conclude that there is a reasonable chance that differences were, in fact, due to treatments, then we say treatments had a significant effect. This conclusion does not mean that we proved that the treatments caused differences, only that we are satisfied that our guess is probably correct.
When we are unable to draw the conclusion that treatments differed, we say that the treatments are not significantly different. This does not mean that treatments had no effect — it simply says that our research trial was not able to detect such an effect. There are two possibilities here — either the treatments really did not have an effect, or they did have an effect, but the experiment was not adequately designed to detect it.
If we declare two averages “significantly different” at the 5% probability level (P = 0.05), we are saying that we are willing to make a mistake one out of 20 times if, in fact, they are truly equal. The 5% probability level is the standard used for most field trials. However, 5% may be too conservative or overly cautious for some farmer-researchers. In some on-farm research trials, it may be decided that a wrong decision would not be very costly. This could be the case where treatment costs are essentially the same, e.g., seed costs in variety comparisons. In that case a probability level of 10% may be used if one is willing to make a mistake one out of 10 times, or 20% for a risk of one out of five.
Selecting the probability level is a “decision rule.” Increasing the sample size or the number of replicates reduces the chances of making an incorrect decision when the same decision rule is applied.
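The effect of replication under a fixed decision rule can be illustrated with a small simulation. This is only a sketch under stated assumptions: a two-treatment trial analyzed as paired differences within replicates, a hypothetical true difference of 5 units, a hypothetical standard deviation of 4 units, and the 5% rule applied with critical t values taken from a standard t table. None of these numbers come from the trials in this report.

```python
import math
import random
import statistics

# Two-sided 5% critical values of Student's t from a standard t table,
# keyed by degrees of freedom (replicates - 1) for a paired comparison.
T_CRIT_05 = {2: 4.303, 5: 2.571}


def detection_rate(replicates, true_diff, sd, trials=2000, seed=1):
    """Share of simulated trials in which |t| exceeds the 5% critical value."""
    rng = random.Random(seed)
    t_crit = T_CRIT_05[replicates - 1]
    detected = 0
    for _ in range(trials):
        # One treatment difference per replicate (hypothetical, normal errors).
        diffs = [rng.gauss(true_diff, sd) for _ in range(replicates)]
        std_err = statistics.stdev(diffs) / math.sqrt(replicates)
        t_value = statistics.mean(diffs) / std_err
        if abs(t_value) > t_crit:
            detected += 1
    return detected / trials


rate_3 = detection_rate(replicates=3, true_diff=5.0, sd=4.0)
rate_6 = detection_rate(replicates=6, true_diff=5.0, sd=4.0)
print(f"3 replicates: real difference detected in {rate_3:.0%} of trials")
print(f"6 replicates: real difference detected in {rate_6:.0%} of trials")
```

With the same 5% rule in both cases, the real difference should be detected noticeably more often with six replicates than with three. The exact rates depend entirely on the assumed difference and variability.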
In on-farm research trials, experience has shown that five to six replicates are usually needed to detect meaningful and real differences between treatments if they exist. Each treatment is represented at least once within each replicate. Replications may be located adjacent to each other within a single field or located in separate fields or farms.
Randomization of treatments within a replicate is important to avoid biased location of treatments. Having treatments in the same order in replicates across a field may cause bias due to soil fertility trends or soil moisture trends stretching across the field.
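One simple way to randomize is to shuffle the treatment order separately for each replicate. The sketch below assumes three hypothetical treatments labeled A, B, and C and five replicates; the function name `randomize_layout` and the seed are illustrative only.

```python
import random


def randomize_layout(treatments, n_replicates, seed=None):
    """Return one independently shuffled treatment order per replicate."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_replicates):
        order = list(treatments)  # every treatment appears once per replicate
        rng.shuffle(order)        # fresh random order for this replicate
        layout.append(order)
    return layout


layout = randomize_layout(["A", "B", "C"], n_replicates=5, seed=2024)
for number, order in enumerate(layout, start=1):
    print(f"Replicate {number}: {' | '.join(order)}")
```

Fixing the seed only makes the plan reproducible for record keeping; leaving it as `None` gives a different layout on each run.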
A test for significance for differences between or among treatment means is the F-test. The F value is the ratio of the variation due to treatments to the variation among individual samples that treatments do not explain. Values close to one indicate there is little or no variation due to treatments. Values much larger than one indicate that variation due to treatments is larger than expected by chance alone.
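As a rough illustration, the F value for a trial laid out in replicates (a randomized complete block design, which is an assumption here) can be computed from sums of squares. The yields below are invented for the example, and the function name `rcbd_f_value` is hypothetical.

```python
def rcbd_f_value(yields):
    """F value for treatments; yields[i][j] is treatment j in replicate i."""
    n_reps = len(yields)
    n_trts = len(yields[0])
    grand_mean = sum(sum(row) for row in yields) / (n_reps * n_trts)
    trt_means = [sum(row[j] for row in yields) / n_reps for j in range(n_trts)]
    rep_means = [sum(row) / n_trts for row in yields]

    # Split the total variation into treatment, replicate, and error parts.
    ss_total = sum((y - grand_mean) ** 2 for row in yields for y in row)
    ss_trt = n_reps * sum((m - grand_mean) ** 2 for m in trt_means)
    ss_rep = n_trts * sum((m - grand_mean) ** 2 for m in rep_means)
    ss_error = ss_total - ss_trt - ss_rep

    # Mean squares: each sum of squares divided by its degrees of freedom.
    ms_trt = ss_trt / (n_trts - 1)
    ms_error = ss_error / ((n_trts - 1) * (n_reps - 1))
    return ms_trt / ms_error


# Hypothetical yields: 5 replicates (rows) by 3 treatments (columns).
yields = [
    [150, 158, 162],
    [148, 155, 160],
    [152, 160, 161],
    [145, 154, 158],
    [150, 157, 163],
]
f_value = rcbd_f_value(yields)
print(f"F = {f_value:.1f}")
```

The computed F value is then compared with a tabulated critical value at the chosen probability level. For this layout (2 and 8 degrees of freedom) a standard F table gives about 4.46 at the 5% level.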
If an F value for a trial is found to be significant and there are more than two treatments being analyzed, then the next step is to calculate a second test for significance, the Least Significant Difference (LSD). The LSD helps to detect which pairs of treatment means are significantly different from each other. When a trial contains more than two treatments, it is sound statistical protocol to conduct an F-test before pairwise comparisons are made with LSD. This procedure is referred to as Fisher’s (protected) LSD.
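A minimal sketch of the pairwise step follows, using the common textbook form LSD = t × sqrt(2 × error mean square / number of replicates). The treatment means, the error mean square of 12.0, and the function names are hypothetical; the critical t value of 2.306 is the two-sided 5% value for 8 error degrees of freedom from a standard t table.

```python
import math
from itertools import combinations


def least_significant_difference(ms_error, n_reps, t_crit):
    """LSD = t_crit * sqrt(2 * error mean square / replicates)."""
    return t_crit * math.sqrt(2 * ms_error / n_reps)


def significant_pairs(means, lsd):
    """Pairs of treatments whose means differ by more than the LSD."""
    return [
        (a, b)
        for a, b in combinations(sorted(means), 2)
        if abs(means[a] - means[b]) > lsd
    ]


# Hypothetical treatment means and error mean square from a 5-replicate trial.
means = {"A": 149.0, "B": 156.8, "C": 160.8}
lsd = least_significant_difference(ms_error=12.0, n_reps=5, t_crit=2.306)
print(f"LSD (5%) = {lsd:.2f}")
for a, b in significant_pairs(means, lsd):
    print(f"{a} and {b} differ by {abs(means[a] - means[b]):.1f}")
```

In the protected procedure, this comparison would only be run after the F-test for the trial has been found significant.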
Using LSD alone can lead to increasing error in making comparisons, because the likelihood of wrongly declaring a significant difference between some pair of treatments increases as the number of comparisons grows. If a trial contains only two treatments, then using an F-test to find significance is equivalent to using LSD alone.
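The growth in error can be approximated with a short calculation. The sketch below treats the pairwise comparisons as independent, which is a simplifying assumption (comparisons that share a treatment mean are not independent), so the figures are a rough guide rather than exact error rates. The function name `familywise_error` is hypothetical.

```python
from math import comb


def familywise_error(n_treatments, alpha=0.05):
    """Number of pairs and approximate chance of at least one false difference."""
    n_pairs = comb(n_treatments, 2)
    # Assumes independent comparisons: P(no false call) = (1 - alpha) ** n_pairs.
    return n_pairs, 1 - (1 - alpha) ** n_pairs


for n_treatments in (2, 3, 4, 6):
    n_pairs, risk = familywise_error(n_treatments)
    print(f"{n_treatments} treatments: {n_pairs} pairs, risk about {risk:.0%}")
```

With two treatments there is a single comparison and the risk stays at the chosen 5%. As treatments are added, the number of pairs and the approximate risk of at least one false difference both rise.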
For most trials in this report, an F-statistic was calculated first. Where treatments were found to be significantly different, the LSD is sometimes reported in lieu of the F value.