When a state Extension specialist or Extension educator makes a presentation, he or she will occasionally refer to “statistical significance” or some variant that alludes to statistical analysis and its use in determining treatment differences. So what is meant by statistical significance? Why should a producer, consultant or retailer care about statistics? Can the average of treatment effects alone be used to evaluate differences? These questions come up often and deserve clarification. In this fact sheet we explain why researchers use statistics as a tool, why statistics are useful and necessary, and why it is difficult to draw meaningful conclusions from split-field, unreplicated data.
Useful Statistical Terms
Observation—a measurement that is made for some response(s) of interest (yield, plant stand, nutrient status, disease incidence, insect infestation, etc.).
Treatment—the controlled application of a process or product (seeding rate, fertilization rate, insecticide application, fungicide application, etc.) to an experimental plot that will hypothetically have an impact on a response(s) of interest (growth, yield, etc.). The number of treatments can vary with an experiment, but two treatments may be adequate to address more general questions of interest (“does this foliar application increase my yield?”, “does this new variety yield better on my farm than my current variety?”, etc.).
Experimental error—the uncontrolled variability encountered in the field. An alternative definition is differences in observations from treatments due to environmental conditions that cannot be controlled by the experimenter (differences in soil texture, topography, soil compaction, rainfall, nutrient status, disease infestation, etc.).
Statistics 101
Any observation made within an experiment has a certain amount of variability (error) associated with it. This variability encountered in the field is what necessitates statistics. To determine whether or not differences in observations are due to the imposed treatments, we need to know how much error was encountered within the experiment. Statistics allow us to quantify this error and assess it in relation to the differences produced by the imposed treatments. If only a single observation is made, experimental error cannot be estimated; multiple observations associated with each treatment, or replications, are needed. If a measured response in an experiment has a large amount of variability, it will be more difficult to establish statistical significance. The more variability (experimental error) encountered, the lower the sensitivity of the experiment.
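As a rough sketch of how replication lets us quantify that error, consider the calculation below. The yield numbers are hypothetical, and the sample standard deviation stands in here as a simple measure of experimental error:

```python
import statistics

# Hypothetical yields (bu/acre) from four replicated strips of one treatment
yields = [52.0, 55.0, 49.0, 54.0]

mean_yield = statistics.mean(yields)   # the average response
spread = statistics.stdev(yields)      # sample standard deviation: a simple
                                       # measure of variability (experimental error)

print(f"Mean: {mean_yield:.1f} bu/acre, spread: {spread:.1f} bu/acre")
# With a single observation there is nothing to estimate variability from;
# statistics.stdev() requires at least two data points.
```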
In a field experiment, the observations can be confounded with a multitude of uncontrolled soil and environmental factors; therefore, we must replicate the treatments across the field. To ensure the estimates of experimental error for each treatment are unbiased (not systematically influenced by underlying environmental conditions such as soil type, topography, etc.), the replications should be randomly placed within the field. We have just covered the two most important concepts of modern statistics: (1) estimating the experimental error of treatments requires replication, and (2) obtaining an unbiased estimate of experimental error requires randomization of the treatments.
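A minimal sketch of randomized placement, assuming a six-strip trial of the kind discussed later (the strip labels are hypothetical):

```python
import random

# Two treatments, each replicated three times across six field strips
treatments = ["fungicide", "untreated"] * 3
strips = [f"strip {i}" for i in range(1, 7)]

random.shuffle(treatments)  # random placement keeps underlying gradients
                            # (soil texture, drainage, etc.) from biasing
                            # one treatment's error estimate
for strip, treatment in zip(strips, treatments):
    print(f"{strip}: {treatment}")
```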
Statistical Significance
Statistical significance is often mentioned but seldom explained. When an experiment is conducted (properly replicated and randomized), the experimental error is computed and used to assess whether or not treatments differ “significantly” from one another. Statistics are based on probability, and researchers select what level of probability constitutes significance. The significance level (often referred to as alpha or “α” in scientific studies) is chosen solely at the discretion of the researcher. The scientific community in general works at a 90% (α = 0.10) or 95% (α = 0.05) confidence level, meaning that a researcher can state with 90% or 95% confidence that the difference between treatments did not occur by sheer chance. The probability value (often referred to as “p” or “p-value” in scientific studies) is calculated from the dataset and gives the probability that differences as large as those observed in the trial could have occurred by sheer chance. If the p-value from the experiment is less than the significance level, then the treatments are considered “significantly” different.
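The decision rule itself is simple; a minimal sketch (both numbers below are hypothetical):

```python
alpha = 0.10    # significance level chosen by the researcher before the trial
p_value = 0.06  # probability computed from the trial data (hypothetical here)

# Treatments are declared significantly different when p is below alpha.
if p_value < alpha:
    print("Difference is statistically significant at the chosen level.")
else:
    print("Difference could plausibly be due to chance alone.")
```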
This is where some gray area enters into research; what is the appropriate probability level? Each researcher has his or her own set of criteria. The next time you attend an Extension event and the speaker is discussing some research data, think about what level of probability is being used to evaluate treatment differences. Statistics allow researchers to assess the error associated with conducting an experiment and to separate real treatment differences from differences caused by uncontrollable environmental factors. Researchers can separate the grain from the chaff as it were. Like any tool, it must be used properly to be effective.
Importance of Replication
Assume you want to evaluate a fungicide treatment on your farm, so you split a field in two, apply the treatment to one half and leave the other half untreated. At the end of the year you harvest each of the two halves and observe a 4 bushel per acre increase in yield on the treated side. This 4 bushel per acre difference seems like a good deal, so you decide that next year all of your acres will be treated with this new fungicide. But are you sure that the additional 4 bushels per acre was due to the application of the fungicide? Closer inspection of the field reveals that the half that showed the yield response was dominated by a lighter textured soil that drained better than the other half. In a year with excessive moisture, the half of the field with better drainage would be expected to perform better anyway. With the field split in two, it is impossible to determine which factor contributed to the yield increase, and there are a multitude of other possible explanations (historical management differences, fertility level differences, insect pressure, disease pressure, natural variation in soil productivity, etc.). With no replication, it is very difficult to reach a definite conclusion as to the cause of the yield increase. This is not to say that the 4 bushel per acre increase was not real, but you simply do not know whether the yield difference was due to the treatment you applied or to some other factor.
Replication allows us to estimate the error associated with carrying out the experiment itself. Let’s revisit the fungicide experiment. Assume you split the field into six strips and establish a replicated, randomized trial in which three strips are treated with the fungicide and three are not. We will look at two different scenarios based on the harvest information.
Scenario 1: At harvest the yield levels of the three treated strips are 59, 52 and 51 bushels per acre. The three untreated strips yielded 44, 57 and 49 bushels per acre. The average yields for the treated and untreated strips are 54 and 50 bushels per acre, respectively. Statistical analysis reveals that the probability of the fungicide treatment resulting in greater yield by sheer chance is 57% (p = 0.57). Thus, as experimenters, we are concerned that the difference in yield between the two treatments may have occurred by chance alone.
Scenario 2: At harvest the yield levels of the three treated strips are 54, 56 and 52 bushels per acre. The three untreated strips yielded 49, 51 and 50 bushels per acre. The average yields for the treated and untreated strips are again 54 and 50 bushels per acre, respectively (the same as scenario 1). Statistical analysis reveals that the probability of the fungicide treatment resulting in a 4 bushel per acre yield increase by sheer chance is 6% (p = 0.06). Thus, we are comfortable stating that the treated strips yielded significantly more than the untreated strips.
The averages for each treatment are the same in both scenarios, but the spread in the data is greater in scenario 1 than in scenario 2. The greater the variability among observations, the greater the experimental error, and the more difficult it is to identify treatment differences. In other words, some underlying source of error exists that we cannot control or possibly even measure.
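The fact sheet does not state which analysis produced these probabilities, but a paired t-test that matches treated and untreated strips by replication reproduces both p-values; the sketch below uses scipy under that assumption:

```python
from scipy import stats

# Strip yields (bu/acre), listed in replication order for each scenario
scenarios = {
    "Scenario 1": ([59, 52, 51], [44, 57, 49]),
    "Scenario 2": ([54, 56, 52], [49, 51, 50]),
}

for name, (treated, untreated) in scenarios.items():
    result = stats.ttest_rel(treated, untreated)  # paired (by-replication) t-test
    print(f"{name}: p = {result.pvalue:.2f}")
# Scenario 1: p = 0.57 -- the difference could easily be chance
# Scenario 2: p = 0.06 -- significant at the 0.10 level
```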
Importance of Randomization
Like replication, randomization is an essential feature of a well-designed experiment. Think about our initial experiment, in which the field was split in two. There was an underlying difference in soil productivity, due to soil texture and drainage, that could affect the experimental outcome by biasing (confounding) the data. To properly conduct the experiment, this variation should be accounted for in the experimental design. Even if you replicated both treatments (with and without fungicide) three times as in the replication section, the conclusions you reach may not be correct if the fungicide treatment was always applied to the same half of the field. The data would be biased (confounded) by their location in the field.
Least Significant Difference
The next item in our discussion of agricultural statistics is the term “least significant difference,” or LSD. This number is often mentioned at Extension meetings and in university publications that provide information and summaries of research. The question is, “what does this number mean?”
Least significant difference is used to compare means of different treatments that have an equal number of replications. What does that mean? Let’s take our example above. We had two different scenarios, which can be seen in the table below.
Recall from the previous discussion that for scenario 1 the probability of the treated plots differing from the untreated plots by sheer chance was 57%, while for scenario 2 it was 6%. This was driven primarily by the amount of error associated with each scenario’s experiment, and we are much more comfortable attributing the yield difference in scenario 2 to the imposed treatment because the probability that the difference exists due to sheer chance is low.
Now let’s look at this another way using LSD.
For scenario 1, at a significance level of 0.10 (90% confidence) the LSD value is 17.1 bushels per acre. For the treated plots to be declared different from the untreated plots, their average yields must differ by at least 17.1 (which they do not).
For scenario 2, at the same significance level the LSD value is 2.9 bushels per acre. Since the difference between the treatment averages (4 bushels per acre) is greater than 2.9, we feel comfortable stating that the treatments were significantly different.
| | Scenario 1: Yield, bu/acre | Scenario 2: Yield, bu/acre |
|---|---|---|
| Treated plots | | |
| Rep 1 | 59 | 54 |
| Rep 2 | 52 | 56 |
| Rep 3 | 51 | 52 |
| Average | 54 | 54 |
| Untreated plots | | |
| Rep 1 | 44 | 49 |
| Rep 2 | 57 | 51 |
| Rep 3 | 49 | 50 |
| Average | 50 | 50 |
| LSD (0.10) | 17.1 | 2.9 |
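The publication does not show how the LSD values are computed. Under the same paired, by-replication assumption used for the t-test above, the LSD is the critical t value multiplied by the standard error of the difference between treatment means, and this sketch reproduces both table values:

```python
import statistics
from scipy import stats

def lsd_paired(treated, untreated, alpha=0.10):
    """LSD for two treatments paired by replication (an assumed analysis)."""
    diffs = [t - u for t, u in zip(treated, untreated)]
    n = len(diffs)
    se_diff = statistics.stdev(diffs) / n ** 0.5   # SE of the mean difference
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # two-sided critical t value
    return t_crit * se_diff

print(round(lsd_paired([59, 52, 51], [44, 57, 49]), 1))  # 17.1 (scenario 1)
print(round(lsd_paired([54, 56, 52], [49, 51, 50]), 1))  # 2.9  (scenario 2)
```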
Hopefully this discussion will help you judge whether or not two treatments are truly different the next time you are sitting in an Extension meeting or reading a research summary. Remember, as has been mentioned before, research studies should be conducted over multiple locations and under different environmental conditions to demonstrate that their results are robust.
Summary
Statistics allow us to evaluate treatment differences and determine whether or not the imposed treatments made a difference. They allow us to make meaningful comparisons that help us decide which production practices are beneficial and which are not. A general understanding of statistics will help you as an end user understand how university Extension personnel arrive at their recommendations, and it should prompt you to question information that is being sold to you. Remember: split-field information can be useful, but it has severe limitations and should be viewed with caution. For an experiment to be properly carried out, it must include replication and randomization. Be sure you consider the randomization and replication of any on-farm field trial before making large-scale management changes based on the presented results.
This fact sheet is a revision of the 2008 ANR fact sheet “Statistics and Agricultural Research,” originally authored by Robert Mullen, Edwin Lentz, Greg LaBarge, and Keith Diedrick.