Ohio State University Extension Bulletin

Research and Reviews: Dairy

Special Circular 169-99


Forecasting Herd Structure, Milk Production and Their Variances for Production Risk Management

N. R. St-Pierre1*, C. S. Thraen**, and L. R. Jones22
*The Ohio State University Department of Animal Sciences
**The Ohio State University Department of Agricultural, Environmental, and Development Economics
22FARME Institute, Inc., Homer, N.Y.

Abstract

Substantial increases in milk-price volatility have resulted from changes in federal dairy policies. For a dairy farm, however, monthly gross milk receipts are a function of unit price and quantity produced. Both can vary substantially over time. Therefore, to be effective, risk-management strategies must address milk and input-price volatility (price-risk management) and fluctuations in milk production per cow and cow numbers (production-risk management). Herd milk production through time can be modeled as a discrete stochastic process using finite Markov chains. Cows at time t = 0 are assigned to homogeneous production cells in four-dimensional arrays with coordinates determined by parity (PAR = 1,2,3), week in milk (WIM = 1,…52), pregnancy status (PREG = 0,1), and week pregnant (WP = 1,…42). The processes of aging, pregnancy, involuntary cull, voluntary cull, abortion, dry-off, and freshening from week i-1 to week i are accounted for using nonstationary transition probabilities. Bayesian estimates of transition probabilities are derived from historical herd data, assuming that individual outcomes are from Bernoulli distributions. The values of parameters uh for the Benoulli distributions are unknown but have prior distributions that follow beta distributions with parameters ah and bh estimated from historical data. Herd observations are then used to generate posterior distributions of uh, also from beta distributions. Projecting from one week to the next is accomplished by moving animals from one production cell to the next based on the transitional probability assigned to that path. Summing production estimates and variances of all independent cells provides for an expected herd production with an associated variance. As expected, the forecast variance increases with time, reflecting increased uncertainty of distant projections. Model validation presents an interesting problem because future observations used for validation are under human control and not independent of the forecast.

Introduction

Radical changes in federal dairy policies have resulted in significant increases in milk price volatility in the United States (Figure 1). Dairy operations are now facing important income risk-management questions. Issues of liquidity and cash flows are often as important as profitability.

Figure 1. Evolution of milk price in the U.S> from 1981 to 1999

New futures and options markets are now available as instruments to manage the variance associated with output (milk) price risk (variance). However, gross income is the product of price per unit multiplied by the number of units produced (yield). Therefore, it is equally important to have the ability to forecast the number of production units (cows) and their productivity (pounds of milk per cow) as it is to forecast the price of milk. Additionally, a forecast must include an estimate of its dispersion (variance) if it is to be useful in risk management. Currently, there is no model available to forecast short- and medium-term herd structure and productivity (expectations and variances). The objectives of this research were: (1) to develop a forecasting model using multiple Markov chains, (2) to determine an algorithm for the estimation of the model’s parameters, and (3) to derive the equations for calculating the variance of the forecast.

A Multiple Markov Chains Model

The model segments the herd in a multiple of disjoint sets, each representing a group of animals of uniform physiological status and with identical production ability (identical in a stochastic sense). Animals of milking age are assigned to a four-dimensional matrix with the following four indices: parity (PAR = 1,2,3), pregnancy status (PREG = 0,1), week in milk (WIM = 1,2,…52), and week pregnant (WP = 1, …,42). Growing animals are assigned to a two-dimensional matrix with week of growth (WG = 1, …, 156) and pregnancy status (PREG = 0,1) as indices. The conceptual structure of the model is presented in Figure 2. At time t, a given cell contains the expected number of animals (of PAR = i1, PREG = i2, WIM = i3, and WP = i4), the variance of this expected number, the expected milk production per animal in the cell, and its variance. Forecasting is done by advancing time (weekly steps) and transferring virtual animals using Markov chains.


Figure 2. Schematic representation of the forecasting model.
Figure 2. Schematic representation of the forecasting model.

Based on biology, a number of cells are never populated (e.g., cows at WIM = 3 and WP = 5) and are ignored in the computer implementation of the model. Likewise, of the thousands of possible chains linking cells (i1, i2, i3, i4) at time t to all cells at time t + 1, only a few can physically take place. Those are represented symbolically in Table 1. A maximum of four chains connect a given cell at time t to four other cells at time t + 1. The proportion of animals moving from a cell jt at time t to a cell jt+1 at time t + 1 is determined from a transition probability ki with 0 ≤ ki = 1. The sum of all ki from a cell jt must be equal to 1 (i.e., all animals must be accounted for from one period to the next). Figure 3 is a diagram of the processes of aging and culling from time t to time t + 1 for a specific cell, in this example, for first parity animals, open in their first week in milk at time t. For this example group, there is a probability that some animals will be culled. This transition is set to 0.75% in this example. This rate reflects a 1.5% death loss in the first two weeks of lactation. The other probability affecting the cell is that some cows will increase DIM by one week without becoming pregnant. In the example, this probability accounts for the remaining animals (100% - 0.75%) or 99.25%.

Table 1. Markov Chains Required to Respresent Aging, Pregnancy, Culling, Abortion, Dry-Off, and Parturition.

Segment Pregnancy Actions1 Symbolic2
Milking herd 0 a WIM + 1
    b PREG = 1, WIM + 1, WP + 1
    c Cull cell
  1 a WIM + 1, WP + 1
    b PREG = 0, WIM + 1, WP = 0
    d WIM = 0, WP + 1
    c Cull cell
  1 a WP + 1
    f PREG = 0, WIM = 1, WP = 0, PAR + 1
    c Cull cell
Replacement heifer 0 a WG + 1
    p PREG = 1, WP = 1, WG + 1
    c Cull cell
  1 a WP + 1, WG + 1
    b PREG = 0, WP = 0, WG + 1
    f PREG = 0, WP = 0, PAR = 1, WIN = 1
    c Cull cell
1 a=aging, b=abort, c=culling, d=dry-off, f=freshing, and p=initiates pregnancy.
2 PAR=Parity (1, 2, 3), PREG=pregnancy status (0, 1), WG=week of growth (1, ..., 156), WIM=week in milk (1, ..., 52), WP=week pregnant (1, ..., 42), and cull cell=stack counting the number of animals culled.

Figure 3. Diagram representing the process of aging or culling from a cell 
	at time t to corresponding cells at time t+1.

Although there is a maximum of four chains leaving a given cell at time t, the number of chains arriving at a given cell at time t + 1 is at most two (Table 2). This has an important repercussion on the size of the problem. The model has a potential of 3 x 52 x 2 x 42 = 13,104 cells for the milking herd alone. Each of these could, from an algebraic basis, be connected to 13,104 cells from time = t to time = t + 1, which would result in 171,714,816 transition probabilities. Fortunately, biology reduces considerably the number of actual transition probabilities.

Table 2. Potential End-Points of Markov Chain Processes.

Segment Pregnancy Event1
Milking 0 Aging: Wini-1 -> WIMi
    Abort: Wini-1 -> WIMi, PREG = 1 -> Preg = 0
    Freshening: WIM = 0 -> WIM = 1
  1 Aging: Wini-1 -> WIMi; WPi-1 -> WPi
    Pregnancy: Wini-1 -> WIMi; WP = 1
Replacement heifer 0 Aging: WGi-1 -> WGi
    Abort: WGi-1 -> WGi, PREG = 1 -> PREG = 0
  1 Aging: WGi-1 -> WGi
    Pregnancy: WGi-1 -> WGi; WP = 1
1 PREG=pregnancy status (0, 1), WG=week of growth (1, ..., 156), WIM=week in milk (1, ..., 52), and WP=week pregnant (1, ..., 42).

Expectation and Variance of Milk Produced

The expected herd milk production (M) at time t (Mt) is simply the sum of the expected milk production of all cells:

E(Mt) = 3j E(Mtj )

and because cells are disjoint, the variance of Mt is simply

VAR(Mt) = 3jVAR(Mtj )

This is the easy part. It states that if we can calculate the estimated milk production and variance of each cell, then we can easily calculate the expected milk production and variance of the whole herd. We are left with the not so-simple problem of estimating expectation and variance at the cell level.

At time t, the expected milk production of a cell j with indices (PAR = i1, PREG = i2, WIM = i3, and WP = i4) is

E(Mtj ) =

(nt-1i1,0,i3-1,0 x kai1,0,i3,-1,0 x MPCi1,0,i3,0) +

  (nt-1i1,1,i3,-1,1 x kbi1,1,i3-1,1 x MPCi1,0,i3,0) + ... +
  (nt-1i1,1,i3,-1,36 x kbi1,1,i3-1,36 x MPCi1,0,i3,0)

if PREG = 0 at time t

E(Mtj ) =(nt-1i1,1,i3,-1,i4-1 x kai1,1,i3-1,i4-1 x MPCi1,1,i3,i4)

if PREG = 1 and WP > 1 at time t

t t-1 P

E(Mtj ) =(nt-1i1,0,i3,-1,0 x kpi1,0,i3-1,0 x MPCi1,1,i3,1)

if PREG = 1 and WP = 1 at time t

where

nt-1i1, i2, i3, i4 = number of cows in cell (i1, i2, i3, i4) at time t-1

kai1, i2, i3, i4= transition probability of simply aging by a week of cows in cell (i1, i2, i3, i4) at time t-1

kbi1,1,i3,i4 = transition probability of abortingof cows in cell (i1, 1, i3, i4) at time

kpi1,0,i3,0 = transition probability of becoming pregnant of cows in cell (i1, 0, i3, 0) at time t-1

MPCi1, i2, i3, i4 = milk per cow for cows in cell (i1, i2, i3, i4) at time t

These equations are really not as complex as they look. For example, let us figure the expected production from first-parity cows, pregnant, 20 weeks in milk, and 10 weeks pregnant at time t. Using the second equation, we have:

E(Mt1,1,20,10 ) =(nt-11,1,19,9 x ka1,1,19,9 x MPC1,1,20,10)

That is, the expected total milk production for this set of cows is equal to the product of the number of cows of first parity, pregnant, 19 weeks in milk and 9 weeks pregnant in the prior period (t-1), times the probability that these cows will age one week and still be in the herd, pregnant and lactating, times the expected milk per cow for cows of first parity, pregnant, 20 weeks in milk and 10 weeks pregnant. In this equation, n, k, and MPC are three random variables, but because they are independent, the expectation of their product is equal to the product of their respective expectations (DeGroot, 1975, p. 153). The derivation of the variance of Mtj is not as straightforward because it involves the product of three random variables (n, k, and MPC). To simplify the notation, we will temporarily relabel the three random variables n, K, and MPC as X1, X2 and X3. Without loss of generality, we can suppose that Xi has mean ui and variance of order n-1 and VAR(Xi) = Σ21. Consider again the function g(X1, X2, X3) = X1 x X2 x X3 = g(X) for brevity. If g1i(u) is δg(X)/δXi evaluated at u1, u2, and u3, then we have a Taylor expansion:

g(X) = g(u) + 33i=1 g1i(u) (Xi - ui) + 0(n-1)

where 0(n-1) is some unknown residual function, assumed to be negligible in the vicinity of ui . By definition

VAR{g(X)} = E{ 33i=1 g1i(u) (Xi - ui)]2} + 0(n-1)
  = 33i=1 {g1i (u)}2 VAR(Xi), thus
VAR (X1 x X2 x X3) = u22 u23 Σ21 + u21 u23 Σ22 + u21 u22 Σ23

It should be apparent that the distribution of Mt will be far from being normal because it is the product of a binomial variate (n), a beta variate (k), and a normal variate (MPC). However, because

Mt = 3j Mtj

for which j is a relatively large number (> 1,000), then Mt will tend to normality under a Central Limit effect. Variables ni1, i2, i3, i4 are calculated recursively. In the initial period, they are set as constants based on the current status of cows in the herd. The E(MPCi1, i2, i3, i4) and VAR (MPCi1, i2, i3, i4) are estimated from herd historical production data using a mixed model that accounts for the effect of parity, age class, days in milk class, season, and month of freshening (Kachman and Everett, 1989; Keown et al., 1986; Stanton et al., 1992). The estimation of transition probabilities, however, proved to be difficult from historical data. Because of the large number of cells, the matrix is generally sparse and in many instances, frequencies are too low to generate useful and accurate estimates. A different approach must be derived to generate transition probability estimates with their distribution.

In general, precision of parameter estimates increases with the number of observations. The amount of data required to obtain a satisfactory parameter estimate is dependent on the parameter, the assumed distribution, and, of course, the degree of precision required for a satisfactory status. Our model requires the estimation of numerous transitional probabilities for the implementation of Markov chains. In many instances, transition probabilities must be estimated from just a few observations; there are instances where such probability parameters must be estimated in the absence of prior data. Lastly, estimations of transition probabilities are not independent from one another. For example, the probability that open cows become pregnant between WIM i-1 and i is not independent of the probability that open cows become pregnant between WIM i and i + 1. That is, the conception rate should not vary widely from week to week but should vary through WIM in a smooth fashion. The next section describes a procedure for estimating transition probabilities using a Bayesian approach with prior distributions to correct for sparcity of data.

Bayesian Parameter Estimates of Transition Probabilities From Sparse Data

The Prior Distribution

Specifying a Prior Distribution
Consider the problem of statistical inference in which observations are to be taken from a distribution for which the probability distribution function (PDF) is f(x | u) , where u is a parameter having an unknown value. Without loss of generality, it is assumed that the unknown value of u must lie in a specified parameter space Ω. In many instances, before any observations from f(x | u) are available, we can summarize our previous information and knowledge about where in Ω the value of u is likely to lie by constructing a probability distribution for u on the set Ω. In other words, outside sources of knowledge indicate that u is more likely to lie in certain regions of Ω than others. The relative likelihood of the different regions can be expressed in terms of probability distribution of Ω, because it represents the relative likelihood that the true value of u lies in each of various regions of Ω prior to observing any values from f(x | u).

Controversial Nature of Prior Distributions
The concept of prior distribution is very controversial in statistics. This controversy is closely related to the controversy regarding the meaning of probability. Some people believe that a prior distribution can be chosen for the parameter u in every statistics problem. They believe that this distribution is a subjective probability distribution in the sense that it represents an individual experimenter’s information and subjective beliefs about where the true value of u is likely to lie. They also believe that a prior distribution is no different from any other probability distribution used in the field of statistics. These people adhere to the Bayesian philosophy of statistics.

Others believe that in many problems, it is not appropriate to speak of a probability distribution of u because the true value of u is not a random variable at all but a fixed number whose value happens to be unknown to the experimenter. These people believe that a prior distribution can be assigned to a parameter u only when there is extensive previous information about the relative frequencies of u estimates.

The method that we propose for the estimation of transition probabilities is Bayesian because it assumes that u follows a prior distribution. The method, however, uses objective distribution estimates as opposed to subjective estimates. Therefore, although the procedure is derived using Bayesian methodologies, it retains some characteristics of classical statistics due to its objective, data-driven approach.

The Posterior Distribution

Suppose that n random variables, x1, … ,xn, form a random sample from a distribution for which the PDF is f(x | u). The value of the parameter u is unknown and the prior PDF of u is ξ(u). Because the random variables x1 , …, xn form a random sample from the distribution for which the PDF is f(x | u), it follows that their joint PDF is:

(1)fn (x1, … ,xn | u) = f(x1 | u) … f(xn | u).

If we use the vector notation x = (x1, …, xn), then the joint PDF can be written as fn(x | u).

In a Bayesian context, u itself has a distribution for which the PDF is ξ(u). The n-dimensional marginal joint PDF gn(x) of x1, …, xn can be written in the form:

(2)gn(x) = I(fn (x | u) ξ(u) d u

Also, the conditional PDF of u given that X1 = x1, …, Xn = xn, which is denoted by x(u | x), must be equal to the joint PDF of X1, …, Xn and u divided by the marginal joint PDF of X1, …, Xn. Therefore, we have:

(3)ξ(u | x) = fn (x | u) ξ(u)
  gn (x)

This is called the posterior distribution function of u because it is the distribution of u after the values of X1, …, Xn have been observed.

The Likelihood Function
The denominator on the right side of equation (3) is simply the integral of the numerator over all possible values of u. Although the value of this integral depends on the observed values X1, …, Xn, it does not depend on u and it may be treated as a constant when the right side of the equation is regarded as a PDF of u. Therefore, we may write the following relation:

(4)ξ(u | x) a fn (x |u) x(u)

When the joint PDF of the observations in a random sample is regarded as a function of u for given values of X1, …, Xn, it is called the likelihood function. In this terminology, the relation in (4) states that the posterior PDF of u is proportional to the product of the likelihood function and the prior PDF of u.

Conjugate Prior Distributions
One last aspect needs to be explained before we can describe the Bayesian estimation of transitional probabilities. In working with the likelihood function (4), certain prior distributions are particularly convenient for use with samples from certain other distributions. For example, if a random sample is taken from a Bernoulli distribution for which the value of the parameter u is unknown, and if the prior distribution of u is a beta distribution, the posterior distribution of u will again be a beta distribution.

Expectations and Variances of Transition Probabilities

Prior Distribution

In a Markov chain,

(5) ni + 1 = ni x ki

where i indicates the time period and ni is the number of objects connecting time i to time i + 1. With this formulation, the ni + 1 are assumed to follow a Bernoulli distribution with parameter ui. In a Bayesian sense, ui is a random variable which will be assumed to follow a beta distribution. The first step in our procedure is to calculate the values of parameter ai and bi of a beta distribution based on the knowledge from all periods 1, 2, …, i, scaled to the average knowledge at any given period.

In our model, pregnancy is modeled as a series of Markov chains. For animals of parity i, the following diagram shows the series of Markov chains:

Week in Milk

Let n1 be the number of animals at WIM = 1 and m1 be the number of animals at WIM = 2 that became pregnant at WIM = 1.

Let the initial values of α and β be:

(6)α0 = Σi-1m1

(7) β0 = Σi-1n1 - Σi-1 m1 = Σi-1n1 - α0

An initial estimate for ki could be calculated as follows:

(8)k0i =      α0     
  α0 + β0

and k01 = k02 = … = k0i-1

These prior estimates, however, carry too much weight because many of the prior observations at different time periods came from the same animals. Prior estimates are therefore weighted by the average number of observations at each time period. The prior estimates of α> (αPr) and β (βPr) are then:

(9)αPr =   ( α0 )    *  (α0 + β0
  α0 + β0   i-1
       
  = α0    
      i-1    

βPr = β0
  i-1

and kPri =   αPr  
  Pr + βPr)

that is, the prior estimates of all ki are all the same and each follows a beta distribution with parameters αPr and βPr.

Posterior Distribution
The posterior distributions of each ki will follow a beta distribution with the following parameters:

αPti = αPr + mi

βPti = βPr + (ni - mi)

that is, the posterior is based on the prior (same for all ki), plus the observations specific to each period. For those periods without observation, the posterior is equal to the prior (the average transition probability). For those periods with substantially more observations than the average period, the posterior will be largely dependent on the observations with little weight put on the prior. Under this procedure, transition probabilities across time periods are relatively smooth unless solid evidence from observations points to the contrary.

Therefore, the posterior distribution of ki, noted as kPti, is a beta distribution with an expectation (mean) and variance:

E (kPti) =   aiPt  
 

αPti+ βPti


VAR (kiPt)=

  aiPt biPt  

  ( αPti+ βPti)2 ( αPti+ βPti + 1)

An example:

Periods

1 2 3 4 5 Total
ni 100 92 80 50 50 372
mi 8 5 12 0 4 29
Uncorrected ki
0.080 0.054 0.150 0.000 0.080
α0=29 αPr=29÷5=5.8 kPr=0.078
β0=343 βPr=343÷5=68.6
αPti 13.8 10.8 17.8 5.8 9.8
βPti 160.6 155.6 136.6 118.6 114.6
kPti 0.079 0.065 0.115 0.047 0.079

Programming and Validation

At the time of this writing, the model has been programmed crudely without any consideration for a user-friendly interface. The model does behave as expected with the variance of expected milk production increasing with the length of forecasting period. The authors are currently seeking outside sources of funding to support the programming work needed for the development of a user-friendly software.

The validation of the model raises an interesting question: how do you validate a model where the target changes are based on information related to the forecast? For example, assume that the model is forecasting a reduction in production in six months from now due to a smaller number of cows in production. Sometime during the next six months (and likely before six months lapse), the manager will realize he/she is about to face a reduction in production and will take action to prevent this drop. A combination of culling and purchasing of animals will take place with the result that the forecasted drop in production will not occur, although the forecast may have been very accurate. This problem is similar in nature to that of forecasting airplane landing sites. Most airplanes eventually land where they were supposed to even though a model could forecast that they were about to miss the runway from 500 miles out. The problem is not with the forecast but with the pilot who, fortunately for the passengers, takes corrective action so that the plane lands on the runway. The value of an advanced forecast is that the plane will follow a more direct and economical route to the landing site. Likewise, an advanced forecast would minimize costs associated with corrective actions in the physical production of milk.

References

DeGroot, M. H. 1975. Probability and Statistics. Addison-Wesley, MA.

Kachman, S. D. and R. W. Everett. 1989. Test day model with individual herd correction factors. J. Dairy Sci. 72(Suppl. 1):60.(Abstr.).

Keown, J. F., R. W. Everett, N. B. Empet, and L. H. Wadell. 1986. Lactation curves. J. Dairy Sci. 69:769.

Stanton, T. L., L. R. Jones, R. W. Everett, and S. D. Kachman. 1992. Estimating milk, fat, and protein lactation curves with a test day model. J. Dairy Sci. 75:1691-1700.


1 For more information, contact at: The Ohio State University, 221A Animal Science Building, 2029 Fyffe Road, Columbus, OH 43210; (614) 292-6507, Fax (614) 292-1515; email:st-pierre.8@osu.edu


Back | Forward | Table of Contents