An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition - PDF Free Download (2025)

the summation extending over the positive integers < a12. This form is more natural than (5.7) because now the coefficients cos 7Tvla form a decreasing sequence and so for large n it is essentially only the first term that counts.

*6. CONNECTION WITH DIFFUSION PROCESSES This section is devoted to an informal discussion of random walks in which the length c5 of the individual steps is small but the steps are spaced so close in time that the resultant change appears practically as a continuous motion. A passage to the limit leads to the Wiener process (Brownian motion) and other diffusion processes. The intimate connection between such processes and random walks greatly contributes to the understanding of both. Ii The problem may be formulated in mathematical as well as in physical terms. It is best to begin with an unrestricted random walk starting at the origin. The nth step takes the particle to the position Sn where Sn = Xl + ... + Xn is the sum of n independent random variables each assuming the values + 1 and 1 . with probabilities p and fj, respectively. Thus (6.1)

Var (Sn) = 4pqn.

Figure 4 of 111,6 presents the first 10,000 steps of such a random walk with p = q = !; to fit the graph to a printed page it was necessary to choose 6 This approach was also fruitful historically. It was fully exploited (though in a heuristic manner) by L. Bachelier, whose work has inspired A. Kolmogorov to develop the formal foundations of Markov processes. See, in particular, L. Bachelier, Calcul des probabilites, Paris (Gauthier-Villars), 1912.

appropriate scales for the two axes. Let us now go a step further and 'contemplate a motion picture of the random walk. Suppose that it is to take 1000 seconds (between 16 and 17 minutes). To present one million steps it is necessary that the random walk proceeds at the rate of one step per millisecond, and this fixes the time scale. What units are we to choose to be reasonably sure that the record will fit a screen of a given height? For this question we use a fixed unit of measurement, say inches or feet, both for the screen and the length of the individual steps. We are then no longer concerned with the variables 8 m but with 68 m where 6 stands for the length of the individual steps. Now (6.2) and it is clear from the central limit theorem that the contemplated film is possible only if for n = 1,000,000 both quantities in (6.2) are smaller than the width of the screen. But if p ~ q and em is comparable to the width of the screen, ~2n will be indistinguishable from 0 and the film will show linear motion without visible chance fluctuations. The character of the random walk can be discerned only when ~2n is of a moderate positive magnitude, and this is possible only when p - q is of a magnitude comparable to ~. If the question were purely mathematical we should conclude that the desired graphical presentation is impossible unless p = q, but the situation is entirely different when viewed from a physical point of view. In Brownian motion we see particles suspended in a liquid moving in random fashion, and the question arises naturally whether the motion can be interpreted as the result of a tremendous number of collisions with smaller particles in the liquid. It is, of course, an over-simplification to assume that the collisions are spaced uniformly in tIme and that each collision causes a displacement precisely equal to ±~. Anyhow, for a first orientation we treat the impacts as governed by Bernoulli trials and ask whether the observed motion of the particles is compatible with this picture. From actual observations we find the ave~age displacement c and the variance D for a unit time interval. Denote by r the (unknown) number of collISIOns per tIme umt. Then we must have, approximately, (6.3)

In a simulated experiment no chance fluctuations would be observable unless the two condItIOns (6.3) are satIsfied WIth D > O. An experiment with p = 0.6 and ~r = 1 is imaginable, but in it the vari~nce would be so small that the motion would appear deterministic: A clUIhp of particles initially close together would remain together as if it were a rigid body.

[XIV. 7

Essentially the same consideration applies to many other phenomena in physics, economics, learning theory, evolution theory, etc., when slow fluctuations of the state of a system are interpreted as the result of a huge number of successive sma]] changes due to random impacts. The simple random-walk model does not appear realistic in any particular case, bu t fortunately the situation is similar to that in the central limit theorem. Under surprisingly mild conditions the nature of the individual changes is not important, because the observable effect depends only on their expectatIOn and varIance. In such CIrcumstances It is natural to take the simple random-walk model as universal prototype. To summarize, as a preparation for a more profound study of various stochastic processes it is natural to consider random walks in which the length c5 of the individual steps is small, the number r of steps per time unit is large, and p - q is small, the balance being such that (6.3) holds (where c and D > are given constants). The words large and small are vague and must remain flexible for practical applications. 7 The analytical formulation of the problem is as follows. To every choice of c5, r, and p there corresponds a random walk. We ask what happens in the limit when c5 ~ 0, r ~ 00, and p ~ i in such a manner that

Two procedures are avaIlable. Whenever we are In possessIOn of an explicit expression for relevant probabilities we can pass to the limit directly. We shall illustrate this method because it sheds new light on the normal approximation and the limit theorems derived in chapter III. This method is of limited scope, however, because it does not lend itself to generalizations. More fruitful is the start from the difference equations governing the random walks and the derivation of the limiting differential equations. It turns out that these differential equations govern well defined stochastic processes depending on a continuous time parameter. The same IS true of varIOUS ObVIOUS generalIzatIOns of these dIfferentIal equations, and so the second method leads to the important general class of diffusion processes. 7 The numbe~ of IJlolecular shocks per time unit is beyond imagination. At the other extreme, in evolutiO~r1 theory one considers small changes from one generation to the next, and the time separating two generations is not small by everyday standards. The number of generations considered is not fantastic either, but may go into many thousands. The point is that the. process proceeds on a scale where the changes appear in practice continuous and a diffusion model with continuous time is preferable to the random-walk model.

To describe the direct method in the simplest case we continue to denote by {Sn} the standard random walk with unit steps and put

= k}.

In our accelerated random walk the nth step takes place at epoch n/r, and the position is Snc5 = kc5. We are interested in the probability of finding the particle at a given epoch t in the neighborhood of a given point x, and so we must investigate the asymptotic behavior of vk • n when k ~ 00 and n ~ 00 In such a manner that n/r ~ t and kc5 ~ x. The event {Sn = k} requires that nand k be of the same parity and takes place when exactly (n +k)/2 among the first n steps lead to the right. From the de Moivre-Laplace approximation we conclude therefore that in our passage to the limit _(6.6)

where the sign .-.. indicates that the ratio of the two sides tends to unity. Now vk • n is the probability of finding Snc5 between kc5 and (k+2)c5, and since this interval has length 2c5 we can say that the ratio vk • n/(2c5) measures locally the probability per unit length, that is the probability density. The last relation in (6.6) implies that the ratio vk.nl(2~) tends to (6.7) It follows that sums of the probabilities vk •n can be approxImated by integrals over vet, x), and our result may be restated to the effect that with our passage to the limit

P{ a.

dx.

I he Integral on the nght can be expressed In terms of the normal distribution function 91 and (6.8) is in fact only a notational variant of the de Moivre-Laplace limit theorem for the binomial distribution. The approach based on the appropriate difference equations is more interesting Considering the position of the particle at the nth and the (n+ l)st trial it is obvious that the probabilities Vk.n satisfy the difference equations (6.9)

On multiplying by 2c5 it follows from our preceding result that the limit vet, x) should be an approximate solution of the difference equation (6.10)

= pv(t, x-c5) + qv(t, x+<5).

Since v has continuous derivatives we can expand the terms according to Taylor's theorem. Using the first-order approximation on the left and second-order approximation on the right we get (after canceling the leading terms) (6.11)

= (q_p) c5r. ov(t,

+ ....

In our passage to the limit the omitted terms tend to zero and (6.11) becomes in the limit (6.12)

D 02V(t, x) . 2 ox2

This is a special diffusion equation also known as the Fokker-Planck equatIOn for dIflusIOn. Our calculatIOns were purely formal and heurIstIc, but it will not come as a surprise that the function v of (6.7) indeed satisfies the differential equation (6.12). Furthermore, it can be shown that (6.7) represents the only solution of the diffusion equation having the obvious properties required by the probabilistic interpretation. The diffusion equation (6.12) can be generalized by permitting the coefficIents c and D to depend on x and t. Furthermore, It possesses obvious analogues in higher dimensions, and all these generalizations can be derived directly from general probabilistic postulates. This topic will be taken up in chapter X of volume 2; here we must be satisfied by these brief and heuristic indications of the connections between random walks and general diffusion theory. As a second example we take the ruin probabilities U z . n discussed in the preceding two sections. The underlying difference equations (4.1) differ from (6.9) in that the coefficients p and q are interchanged. 8 The formal calculations indicated in (6.11) now lead to a diffusion equation obtained from (6.12) on replacing c by c. Our limiting procedure leads from the probabilities uz . n to a function u(t, ~) which satisfies this modified diffusion equation and which has probabilistic significance 8 The reason is that in U z • n the variable z stands for the initial position whereas the . probability Vk.n refers to the position at the running time. In the terminology to be introduced in volume 2, probabilities depending on the initial position satisfy backward (retrospective) equations, the others forward (or Fokker-Planck) equations. In physics the latter are sometimes called continuity equations. The same situation will be encountered in chapter XVII.

similar to u z •n : In a diffusion process starting at the point ~ > 0 the probability that the particle reaches the origin before reaching the point rx > ~ and that this event occurs in the time interval t1 < t < t2 is given by the integral of u(t,~) over this interval. The formal calculations are as follows. For u z •n we have the explicit expression (5.8). Since z and n must be of the same parity, U z •n corresponds to the interval between n/r and (n + 2)/r, and we have to calculate the limit of the ratio u z •n r/2 when r ~ 00 and c5 ~ 0 in accordance with (6.4). The length a of the inter val and the initial position z must be adjusted so as to obtain the limits rx and r Thus z ~/c5 and a rx/c5. It is now easy to find the limits for the individual factors in (5.8). From (6.4) we get 2p 1 + cc5/ D, and ('..J

~ D.

1 2 2 2 e-7};v 1T Dt/a. .

Finally sin v7Tc5/rx "" v7Tc5/rx. Substitution into (5.8) leads formally to (6.15)

2.

Dt/a. sm - . rx

(Since the series converges uniformly it is not difficult to justify the formal calculations.) In physical diffusion theory (6.15) is kno wn as Furth' s formula for first passages. [For the limiting case rx = 00 see problem 14. For an alternative form of (6.15) see problem 22.] *7 RANDOM WALKS IN THE PI,ANE

AND SPACE In a two-dimensional random walk the particle moves in unit steps in one of the four directions parallel to the x- and y-axes. For a particle starting at the origin the possible positions are all points of the plane with integral-valued coordinates. Each position has four neighbors. Similarly, in three dimensions each position has six neighbors. The random walk IS defined by specifying the corresponding four or six probabilities. For

topic and may be omitted at first reading.

simplicity we shall consider only the symmetric case where all directions have the same probability. The complexity of problems is considerably greater than in one dimension, for now the domains to which the particle is restricted may have arbitrary shapes and complicated boundaries take the place of the single-point barriers in the one-dimensional case. We begin with an interesting theorem due to Polya. 9

Before provmg the theorem let us gIve two alternatIVe formulatIOns, both due to Polya. First, it is almost obvious that the theorem implies that in one and two dimensions there is probability 1 that the particle will pass infinitely often through every pOSSible point; in three dimensions this is not true, however Thlis the statement "all roads lead to Rome" is, in a way, justified in two dimensions. Alternatively, consider two particles performing independent symmetric random walks, the steps occurring simultaneously. Will they ever meet? To simplify language let us define the distance of two possible positions as the smallest number of steps leading from one position to the other. (This distance equals the sum of absolute differences of the coordinates) If the two particles move one step each, their mutual distance either remains the same or changes, by two units, and so their distance either is even at all times or else is always odd. In the second case the 'two particles can never occupy the same position. In the first case it is readily seen that the probability of their meeting at the nth step equals the probability that the first particle reaches in 2n steps the initial position of the second particle. Hence our theorem states that in two, but not in three, dimensions the two particles are sure infinitely often to occupy the same position. If the initial distance of the two particles is odd, a similar argument shows that they will infinitely often occupy neighboring positions. If this is called meeting, then our theorem asserts that in one and two dimensions the two particles are certain to meet infinitely often, but in three dimensions there is a positive probability that they never meet. 9 G. Polya, Ober eme AuJgabe der Wahrschemltchkettsrechnung betreffend die Irrfahrt im Strassennetz, Mathematische Annalen, vol. 84 (1921), pp. 149-160. The numerical value 0.35 was calculated by W. H. McCrea and F. J. W. Whipple, Random paths in two and three dimensions, Proceedings of the Royal Society of Edinburgh, vol. 60 (1940), pp. 281-298.

Proof. For One dimension the theorem has been proved in example XIII,(4.h) by the method of recurrent events. The proof for two and three dimensions proceeds along the same lines. Let Un be the probability that the nth trial takes the particle to the initial position. According to theorem 2 of XIII,3, we have to prove that in the case of two dimensions I Un diverges, whereas in the case of three dimensions I Un ;::::::j 0.53. In two dimensions a return to the initial position is possible only if the numbers of steps in the positive x- and y-directions equal those in the negative xand y-directions, respectively. Hence· Un = 0 if n is odd and [using the multinomial distribution VI,(9.2)]

4-2n(~r.

shows that U2n is of the order of magnitude lin, so that as asserted. In three dimensions we find similarly (7.2)

(1..

< I'l.

}2.

Within the braces we have the terms of a trinomial distrubution, and we know that they add to unity. Hence the sum of the squares is smaller than the maximum term within braces, and the latter is attained when both j and k are about n/3. Stirling's formula shows that this maximum is of the order of magnitude n 1, and therefore U2n is of the magnitude 1/141 . . /n 3 so that I "2 It converges as asserted ~ We conclude this section with another problem which generalizes the concept of absorbing barriers. Consider the case of two dimensions where instead of the interval 0 < x < a we have a plane domain D, that is, a collection of points with integral-valued coordinates Each point has four neighbors, but for some points of D One or more of the neighbors lie outside D. Such points form the boundary of D, and all other points are called interior points. In the one-dimensional case the two barriers form the boundary, and our problem consisted in finding the probability

that, starting from z, the particle will reach the boundary point 0 before reaching a. By analogy, we now ask for the probability that the particle will reach a certain section of the boundary before reaching any boundary point that is not in this section This means that we divide all boundary points into two sets B' and B". If (x, y) is an interior point, we seek the probability u(x, y) that, starting from (x, y), the particle will reach

a point of B' before reaching a point of B". In particular, if B' consists of a single point, then u(x, y) is the probability that the particle will, sooner or later, be absorbed at the partIcular pomt. Let (x, y) be an interior point. The first step takes the particle from (x, y) to one of the four neighbors (x± 1, y), (x, y± 1), and if all four of them are interior points, we have obviously (7.4) u(x, y)

+ u(x-l, y) + u(x, y+ 1) + u(x, y-l)].

This is a partial difference equation which takes the place of (2.1) (with P = q = i)· If (x+ 1, y) is a boundary point, then its contribution u(x+ 1, y) must be replaced by 1 or 0, according to whether (x+ 1, y) belongs to B' or B". Hence (7.4) will be valid for all interior points if we agree that for a boundary point (~, 'Y)) in B' we put u(~, 'Y)) = 1 whereas u(~, 'Y)) = 0 if (~, 'Y)) is in B". This convention takes the place of the boundary conditions (2.2). In (7.4) we have a system of linear equations for the unknowns u(x, y); to each interior point there correspond one unknown and one equation The system is non-homogeneous, since in it there appears at least one boundary point (~, 'Y)) of B'and it gives rise to a contribution! on the right side. If the domain D is finite, there are as many equations as unknowns, and it is well known that the system has a unique solution if, and only if, the corresponding homogeneous system (with u(~, 'Y)) = 0 for all boundary points) has no non-vanishing solution, Now u(x, y) is the mean of the four neighboring values u(x± 1, y), u(x, y± 1) and cannot exceed all four. In other words, in the interior u(x, y) has neither a maximum nor a minimum in the strict sense, and the greatest and the smallest value occur at boundary points. Hence, if all boundary values vanish, so does u(x, y) at all interior points, which proves the existence and uniqueness of the solution of (7.4). Since the boundary values are 0 and 1, all values u(x, y) lie between 0 and 1, as is required for probabilities. These statements are true also for the case of infinite domains, as can be seen from a general theorem on infinite Markov chains. ]0 10 Explicit solutions are known in only a few cases and are always very complicated. Solutions for the case of rectangular domains, infinite strips, etc., will be found in the paper by McCrea and Whipple cited in the preceding footnote.

*8. THE GENERALIZED ONE-DIMENSIONAL RANDOM WALK (SEQUENTIAL SAMPLING) We now return to one dimension but abandon the restriction that the particle moves in unit steps. Instead, at each step the particle shall have probability Pk to move from any point x to x + k, where the integer k may be zero, positive, or negative. We shall investigate the following ruin problem: The particle starts from a position z such that 0 < z < a; we

seek the probability U z that the particle will arrive at some position <0 before reaching any position > a. In other words, the position of the particle following the nth trial is the point z + Xl + X 2 + ... + Xn of the x-axis, where the {X k } are mutually independent random variables with the common distribution {p J , the process stops when for the first time either Xl + ... Xn < -z or Xl + ... Xn > a - z. This problem has attracted widespread interest in connection with sequential sampling. There the X k represent certain characteristics of samples or observations. Measurements are taken until a sum Xl + + ... + X k falls outside two preassigned limits (our -z and a - z). In the first case the procedure leads to what is technically known as rejection, in the second case to acceptance. l l

Example. (a) As an illustration, take Bartky's double-sampling inspection scheme. To test a consignment of items, samples of size N are taken and subjected to complete inspection. It is assumed that the samples are stochastically independent and that the number of defectives in each has the same binomial distribution. Allowance is made for one defective item per sample, and so we let X k + 1 equal the number of defectives in the kth sample. Then for k > 0

and P 1 = qS, P,c = 0 for x < -1. The procedural rule is as follows: A preliminary sample is drawn and, if it contains no defective, the whole consignment is accepted; if the number of defectives exceeds u, the whole lot is rejected. In either of these cases the process stops. If, however, the number z of defectives lies in the range I < z < a, the sampling

* This section is not used later on. The general theory of sequential statistical procedures was developed by Abraham Wald during the Second World War in connection with important practical problems. Modern treatm~~ts can be found in many textbooks on mathematical statistics. Bartky's scheme described in the example dates from 1943 and seems to have been the very first sequential sampling procedure proposed in the literature. 11

continues in the described way as long as the sum is contained between 1 and a. Sooner or later it will become either 0, in which case the consignment is accepted, or >a, in which case the consignment is rejected. ~ Without loss of generality we shall suppose that steps are possible in both the positive and negative directions. Otherwise we would have either U z = 0 or U z = I for all z. The probability of ruin at the first step is obviously (8.1)

+ P-z-l + P-z-2 + ...

(a quantity which may be zero). The random walk continues only if the particle moved to a position x with 0 < x < a; the probability of a jump from z to x is PX-li' and the probability of subsequent ruin is then U X ' Therefore a-I

Once more we have here a - I linear equations for a - I unknowns U z • The system is non-homogeneous, since at least for z = I the probability r 1 is different from zero (because steps in the negative direction are possible). To show that the linear system (8.2) possesses a unique solution we must show that the associated homogeneous system a-I

has no solution except zero. To reduce the number of subscripts appearing in the proof we assume that P-l ~ 0 (but the argument applies equally to other positive terms with negative index). Suppose, then, that U z satisfies (8.3) and denote by M the maximum of the values U z • Let U r = M. Since the coefficients Px-z in (8.3) add to < I this equation is possible for z - r only if those ux that actually appear on the right side (with positive coefficients) equal M and if their coefficients add to unity. Hence Ur - 1 = M and, arguing in the same way, U r - 2 = U r - 3 = ... = = U 1 = M. However, for z = I the coefficients Pz-r in (8.3) add to less than unity, so that M must be zero. It follows that (8.2) has a unique solution, and thus our problem is determined. Again we simplify the writing by introducing the boundat y conditions Ux = I if x

a.

XIV.8]

THE GENERALIZED ONE-DIMENSIONAL RANDOM WALK

365

Then (8.2) can be written in the form (8.5) the summation now extending over all x [for x > a we have no contribution owing to the second condition (8.4); the contributions for x < 0 add to r z owing to the first condition]. For large a it is,tumbersome to solve a-I linear equations directly, and it is preferable to use the method of particular solutions analogous to the procedure of section 2. It works whenever the probability distribution {Pk} has relatively few positive terms. Suppose that only the Pk with -v < k < fl are different from zero, so that the largest possible jumps in the positive and negative directions are p; and JI, respectively. The characteristic equation (8.6) is equivalent to an algebraic equation of degree v + fl. If a is a root of (8.6), then uz = a Z is a formal solution of (8.5) for all z, but this solution does not satisfy the boundary conditions (8.4). If (8.6) has fl + v distinct roots aI, a 2, ... , then the linear combination (8.7) is again a formal solution of (8.5) for all z, and we must adjust the constants Ak to satisfy the boundary conditions. Now for 0 < Z < a only values x with-v + 1 < x < a + fl - 1 appear in (8.5). It suffices therefore to satisfy the boundary conditions (8.4) for x 0, 1, 2, ... , -v + 1, and x = a, a + 1, ... , a + fl - 1, so that we have fl + v conditions in all. If a k is a double root of (8.6), we lose one constant, but in this case it is easily seen that U z = za~ is another formal solution. In every case the fl + v boundary conditions determine the f.-l + v arbitrary constants. Example. (b) Suppose that each individual step takes the particle to one of the four nearest positions, and we let P-2 = P-l = PI = P2 = 1. The characteristic equation (8.6) is a- 2 + a-I + a + a 2 = 4. To solve it we put t = a + { j l : with tl)is substitution our equation becomes (2 + ( 6, which has the ,roots t 2, 3. Solving t 0 + 0 1 for (f we find the four roots (8.8)

a1

=

a2

=

1,

366 Since

RANDOM WALK AND RUIN PROBLEMS 0'1

[XIV.8

is a double root, the general solution of (S.5) in our case is

(S.9) The boundary conditions U o = U-l = 1 and Ua = ua+1 = 0 lead to four linear equations for the coefficients A; and to the final solution (S.10)

z

= 1- -

II

z

a

+

(2z-a)(O'~-O'~)

- a(O';Z-a_O'!z-a)

.-:....------:~~----.:~--..:.....~-~'!-....:.

a{(a+2)(o~

o~)

a(o~+2

ot~}

Numerical Approximations. Usually it is cumbersome to find all the roots, but rather satisfactory approximations can be obtained in a surprisingly simple way. Consider first the case where the probability distribution {Pk} has mean zero. Then the characteristic equation (86) has a double root at a - 1, and A + Bz is a formal solution of (8.5). Of course, the two constants A and B do not suffice to satisfy the p, + v boundary conditions (8.4). However, if we determine A and B so that A + Bz vanishes for z = a + p, - 1 and equals 1 for z = 0, then A + Bx ~ 1 for x ~ 0 and A + Bx ~ 0 for a ~ x < a + p, so that A + Bz satisfies the boundary conditions (8.4) with the equality sign replaced by~. Hence the difference A + Bz - U z is a formal solutIOn of (8.5) With non-negative boundary values, and therefore A + Bz - U z ~ O. In like manner we can get a lower bound for U z by determining A and B so that A + Bz vanishes for z = a and equals 1 for ·z = = -v + 1. Hence a-z a+p,-z-1 - - - < U z -< - a+p,-1 ----(8.11) a+v-lI hiS estimate IS excellent when a IS large as compared to p, + v. [Of course, (I-zla) is a better approximation but does not give precise bounds.] Next, consider the general case where the mean of the distribution {pd is not zero. The characteristic equation (8.6) has then a simple root at a = 1. The left side of (8.6) approaches 00 as a -+ 0 and as a -+ 00. For positive a the curve y = 'i:.p"a k is continuous and convex, and since it intersects the line y 1 at (] 1, there exists exactly one more intersection. Therefore, the characteristic equation (8.6) has exactly two positive roots, 1 and al. As before, we see that A + Bat is a formal solution of (8.5), and we can apply our previous argument to this solution instead of A + Bz. We find in this case U z R:I

(8.12) and have the Theorem. The solution of our ruin problem satisfies the inequalities (8.11) if {pd has zero mean, and (8.12) otherwise. Here a l is the unique positive root different from 1 of (8.6), and p, and -v are defined, respectively, as the largest and smallest subscript for which PIc ~ o. . Let m = 'i:.kp" be the expected gain in a single trial (or expected length of a single step). It is easily seen from (8.6) that a l > 1 or a l < 1 according to whether m < 0 or m > O. Letting a -+ 00, we conclude from our theorem that in a game against an infinitely rich adversary the probability of an ultimate ruin is one if and only if m ~ O. The duration of game can be discussed by similar methods (cf. problem 9).

XIV.9]

PROBLEMS FOR SOLUTION

367

9. PROBLEMS FOR SOLUTION Note: Problems 1-4 refer only to section 2 and require no calculations. 1. In a random walk starting at the origin find the probability that the point a > 0 will be reached before the pomt -b < O. 2. Prove that with the notations of section 2: (a) In a random walk starting at the origin the probability to reach the point a > 0 before returning to the origin equals pO -ql)' (b) In a random walk starting at a > 0 the probability to reach the origin bCrOIC lctuming to the stalting point equals qqa-l' 3. If q ~ p, conclude from the preceding problem: In a random walk starting at the origin the number of visits to the point a > 0 that take place before the first return to the origin has a geometric distribution with ratio 1 -qqa-l' (Why is the condition q ~p necessary?) 4. Using the preceding two problems prove the theorem12 • The number of visits to the point a > 0 that take place prior to the first return to the origin has expectation (p/q)a when p < q and 1 when p = q. 5. Consider the ruin problem of sections 2 and 3 for the case of a modified random walk in which the particle moves a unit step to the right or left, or stays at its present position ~ith probabilities ~, (3, )!, respectively (~ + fJ I I' 1). (In gambling terminology, the bet may result in a tie.) 6. Consider the ruin problem of sections 2 and 3 for the case where the origin is an elastic barrier (as defined in section 1). The difference equations for the probability of ruin (absorption at the origin) and for the expected duration are the same, but with new boundary conditions. 7. A particle moves at each step two units to the right or one unit to the left, with corresponding probabilities p and q (p + q - 1). If the starting position is z > 0, find the probability qz that the particle will ever reach the origin. (This is a ruin problem against an infinitely rich adversary.) Hint: The analogue to (2.1) leads to a cubic equation with the particular solution qz = 1 and two particular solutions of the form }.z, where A satisfies a quadratic eq'lation 8. ContinuationP Show that ql equals the probability that in a sequence of Bernoulli trials the accumulated number of failures will ever exceed twice the accumulated number of successes. [When p = q this probability equals (V5 -1)/2.] 12 The truly amazing implications of this result appear best in the language of fair games. A perfect coiIl is tossed uIltil the first equalizatioIl of the accumulated Ilumbers of heads and tails. The gambler receives one penny for every time that the accumulated number of heads exceeds the accumulated number of tails by m. The "fair entrance fee" equals 1 independently of m. For a different (elementary) proof see problems 1-2 of XII, 10 in volume 2. 13 This problem was formulated by D. J. Newman. That its solution is a simple corollary to the preceding problem (in the second edition) was observed by W. A. O'N. Waugh. The reader may try the same approach for the more general problem when the factor 2 is replaced by some other rational. A solution along different lines was devised by J. S. Frame. See Solution to problem 4864, Amer. Math. Monthly, vol. 67 (1960), pp. 700-702.

368

[XIV.9

RANDOM WALK AND RUIN PROBLEMS

9. In the generalized random-walk problem of section 8 put [in analogy with (8.1)] pz = Pa-z + Pa+l-Z + pa+2-z + ... , and let d z.n be the probability that the game lasts for exactly n steps. Show that for n ~ 1 a-I

L dx,nPx-z

dz,n+l =

x=l

with d Z.1 = r z + pz. Hence prove that the generating function dzCa) is the solution of the system of linear equations a-I

a-1dzCa) -

L dxCa)px_z =

x=l

rz

+

=

L.dz.na n

pz·

By differentiation it follows that the expected duration ez is the solution of a-I

ez -

L expx_z =

1.

x=l

10. In the random walk with absorbing barriers at the points 0 and a and with initial position z, let wz.n(x) be the probability that the nth step takes the particle to the position x. Find the difference equations and boundary conditions which determine wz.n(x). 11. Continuation. Modify the boundary conditions for the case of two reflecting barriers (i.e., elastic barriers with 0 = 1). 12. A symmetric random walk (p = '1) has possible positions 1,2, ... , a-I. There is an absorbing barrier at the origin and a reflecting barrier at the other end. Find the generating function for the waiting time for absorption. 13. An alternative form for the first-passage probabilities. In the explicit formula (5.7) for the ruin probabilities let a -+ 00. Show that the result is U •n

z

2jJ(n-z)/2'l(n+z)/2

il l cos n -

1TX •

sin

1TX •

sin

1TXZ •

dx.

Consequently, this formula must be equivalent to (4.14). Verify this by showing that the appropriate difference equations and boundary conditions are satisfied. 14. Continuation: First passages in diffusion. Show that the passage to the lImIt descrIbed In sectIOn 6 leads from the last formula to the expressIOn

for the probability density for the waiting time for absorption at the origin in a diffusion starting at the point z > O. When p = '1 this result is equivalent to the lImIt theorem 3 of 111,7. Note: In the following problems vx.n is the probability (6.1) that in an unrestricted random walk starting at the origin the nth step takes the particle to the position x. The reflection principle of III, 1 leads to an alternative treatment.

XIV.9]

369

PROBLEMS FOR SOLUTION

15. Method of images. 14 Let p = q =!. In a random walk in (0, (0) with an absorbing barrier at the origin and initial position at z > 0, let uz,n(x) be the probability that the nth step takes the particle to the position x > 0. Show that uz.n(x) = vx-z,n - vX+z,n' [Hint: Show that a difference equation corresponding to (4.1) and the approprIate boundary condItIons are satIsfied.] 16. Continuation. If the origin is a reflecting barrier, then uz,n(x) ]7

Continuation

=

vx-z,n

+ vX+z-I,n'

IUhe random walk is restricted to (0, a) and both barriers

are absorbing, then (9.1)

uz,n(x) =

L {Vx- z- 2ka,n -

VX+z-2ka,n},

k

the summation extending over all k. positive or negative (only finitely many terms are different from zero). If both barriers are reflecting, equation (9.1) holds with minus replaced by plus and x + z replaced by x + z - 1. 18. Distribution of maxima. In a symmetric unrestricted random walk starting at the origin let Mn be the maximum abscissa of the particle in n steps. Using problem 15, show that (9.2)

L

19. Let VXCs) = vx,nsn (cf. the note preceding problem 15). Prove that ViI;(s) = Vo(s)A.;-X(s) when x ~ 0 and VxCs) = VoCs)AIX(s) when x ~ 0, where AI(S) and A2(S) are defined in (4.8). Moreover, Vo(s) = 0-4pqs2)-!. Note: These relations follow directly from the fact that AI(S) and }'2(S) are generating functions of first-passage times as explained at the conclllsion of section 4. 20. In a random walk in (0, (0) with an absorbing barrier at the origin and initial position at z, let uz.n(x) be the probability that the nth step takes the particle to the position x, and let (j)

(9.3)

UZCs; x) =

L uz,n(x)sn. n=O

Using problem 19, show that UzCs; x)

=

Vx_z(s) -

A~(S) VXCs).

Conclude

(9.4)

Compare with the result .of problem ] 5 and derive (94) from the latter by combinatorial methods. 14 Problems 15-17 are examples of the method of images. The term Vx-z,n corresponds to a particle in an unrestricted random walk, and VXH,n to an "image point."

In (9.1) we find image points starting from various positions, obtained by repeated

reflections at both boundaries. In problems 20-21 we get the general result for the unsymmetric random walk using generating functions. In the theory of differential equations the method of images is always ascribed to Lord Kelvin. In the probabilistic literature the equivalent reflection principle is usually attributed to D. Andre. See footnote 5 of IlI,l.

370

RANDOM WALK AND RUIN PROBLEMS

[XIV.9

:2l. Alternatice formula for the probability of ruin (5.7).

Expanding (4.11)

into a geometric series, prove that UZ,n

t )' .

Z~ CJ:

ka

W z + 2ka ,n

k=O

where Wz,n denotes the first-passage probability of (4.14). 22. If the passage to the limit of s~ction 6 is applied to the expression for Uz,n given in the preceding problem, show that the probability density of the absorptIon tIme equal s15 (fj

I a + 2kr:t.)e-(~+2krt.)2/(2Dt) k=-CIJ

(Hint· Apply the normal approximation to the binomial distribution) 23. Renewal method for the ruin problem. I6 In the random walk with two absorbing barriers let U z . n and u:,n be, respectively, the probabilities of absorption at the left and the right barriers. By a proper interpretation prove the truth of the following two equations: VzCs) J7 o(s)

Va_z(s) = Vz(s) vacs)

Vt (s) J7_ a(s),

+ Vz*(S) Vo(s).

Derive (4.11) by solving this system for VZCs). 24. Let uz.n(x) be the probability that the particle, starting from z, will at the nth step be at x without having previously touched the absorbing barriers. Using the notations of problem 23, show that for the corresponding generating function Vis;.1') = ~uz.n(x)sn we have Vis; x) = Vx_is) - Vis) Vx(s) - Vz*(s) Vx_a(s).

(No calculations are required.) 25 Continuation The generating f"nction [(s; x) of the preceding problem can be obtained by putting VzCs; x) = Vx_is) - A}'i(s) - B},~(s) and determining the constants so that the boundary conditions Vz(s; x) = 0 for Z = 0 and Z = a are satisfied. With reflecting barriers the boundary conditIOns are Vo(s; x) = VI(s; x) and Vacs; x) = Va-I(s; x). 26. Prove the formula

" cos n t . cos tx . dt

z;

by showing that the appropriate difference equation is satisfied. Conclude that VXCs) = (27T)-I

p)X/2 [" (_q- .

1T

cos tx . /_ dt. 1 - 2 v Pq . s . cos t

15 The agreement of the new formula with the limiting form (6.15) is a well-known fact of the theory of theta functions. See XIX, (5.8) of volume 2. 16 Problems 23-25 contain a new and independent derivation of the main results concerning random walks in one dimension.

XIV.9]

PROBLEMS FOR SOLUTION

371

27. In a three-dimensional symmetric random walk the particle has probability one to pass infinitely often through any particular line x = m, Y = n. (Hint: Cf. problem 5.) 28. In a two-dimensional symmetric random walk starting at the origin the probability that the nth step takes the particle to ($, y) is (27T)-22-

n

I"" I"" (cos

a.

+ cos

fJ)n . cos xa. . cos yfJ . da. dfJ.

Verify this formula and find the analogue for three dimensions. (Hint: Check that the expression satisfies the proper difference equation.) 29. In a two-dimensional symmetric random walk let D~ = x 2 + y2 be the square of the distance of the particle from the origin at time n. Prove E(D~) = n. [Hint: Calculate E(D;_l -D;).] 30. In a symmetric random walk in d dimensions the particle has probability 1 to return infinitely often to a position already previously occupied. Hint: At each step the probability of moving to a new position is at most (2d -l)/(2d). 31. Show that the method described in section 8 works also for the generating function VzCs) of the waiting time for ruin.

CHAPTER XV

Markov Chains

·1. DEFINITION Up to now we have been concerned mostly with independent trials which can be described as follows. A set of possible outcomes Eb E 2 , • • • , (finite or infinite in number) is given, and with each there is associated a probability Pk; the probabilities of sample sequences are defined by the multiplicative property P{(E;o' E;1' ... , E;J} = P;OP;1 ... Pin' In the theory of Markov chains we consider the simplest generalization which consists in permitting the outcome of any trial to depend. on the outcome of the directly preceding trial (and only on it). The outcome Ek is no longer associated with a fixed probability Pk' but to every pair (E;, Ek) there corresponds a conditional probability P;k; given that E; has occurred at some trial, the probability of Ek at the next trial is pike In addition to the P;k we must be given the probability a k of the outcome Ek at the initial trial. For Pilr to have the meaning attributed to them, the probabilities of sample sequences corresponding to hvo, three, or four trials must be defined by P{(Ej, Ek)} = a;p;k' P{(E;, Ek, Er)} = a;pikPkr' P{(Ej, E k, Er, Es)} = a;p;kPkrPrS' and generally P{(Ejo' E;1' ... , E;J} ajopio;1P;1j2 P;n-2;n- 1P jn-1;n' Here the initial trial is ~umbered zero, so that trial number one is the second trial. (This convention is convenient and has been introduced tacitly in the preceding chapter.) Several processes treated in the preceding chapters are Markov chains, but in special cases it is often preferable to use different notations and modes of description. The principal results of the present chapter concern the existence of certain limits and equilibrium distributions; they are, of course, independent of notations and apply to all Markov chains.

(1.1)

372

XV.1]

373

DEFINITION

Examples. (a) Random walks. A random walk on the line is a Markov chain, but it is natural to order the possible positions in a doubly infinite sequence ... , - 2, -1, 0, 1, '0;-: . .. With this order transitions are possibly only between neighboring positions, that is, Pik = unless k = j ± 1. With our present notations we would be compelled to order the integers in a simple sequence, say 0, 1, -1, 2, -2, ... and this would lead to clumsy formulas for the probabilities Pik' The same remark applies to random walks in higher dimensions: For actual calculations it is preferable to specify the points by their coordinates, but the symbolism of the present chapter can be used for theoretical purposes. (b) Branching processes. Instead of saying that the nth trial results in Ek we said in XII,3 that the nth generation is of size k. Otherwise, we were concerned with a standard Markov chain whose transition probability Pik is the coefficient of a Sk in the jth power pies) of the given generating function. (c) Urn models. It is obvious that several urn models of V,2 represent Markov chains. Conversely, every Markov chain is equivalent to an urn model as follows. Each occtlIring sUbscript is represented by an urn, and each urn contains balls marked E1> E 2 , . . •. The composition of the urns remains fixed, but varies from urn to urn; in the )th urn the probability to draw a ball marked Ek is Pjk. At the initial, or zero-th, trial an urn is chosen in accordance with the probability distribution {aJ. From that urn a ball is drawn at random, and if it ismarked E i , the next drawing is made from thejth urn, etc. Obviously with this procedtlIe the probability of a sequence (Ejo " •• ,Ej ) is given by (I.1). We see that the notion of a Markov chain is not more general than urn models, but the new symbolism will prove more practical and more intuitive. ~

°

If a k is the probability of Ek at the initial (or zero th) trial, vie must have a k > and L a k = 1. Moreover, whenever E j occurs it must be followed by some E k , and it is therefore necessary that for all j and k

°

(1.2)

Pi!

+ Pj2 + Pj3 + ... =

1,

Pjk

> 0.

We now show that for any numbers a k and Pik satIsfymg these condItions, the assignment (1 J) is a permissible definition ofprohahilities in the sample space corresponding to n + 1 trials. The numbers defined in (1.1) being non-negative, we need only prove that they add to unity. Fix first )0,)1' ... ,jn-l and add the numbers (I.1) for all possible jn" Using (1.2) with j = jn-l' we see immediately that the sum equals aiopioi 1 • • • Pi n- 2 i n_1 • I hus the sum over all numbers (1.1) does not depend on n, and since La io = 1, the sum equals unity for all n. The definition (1.1) depends, formally on the number of trials, but our argument proves the mutual consistency of the definitions (I.!) for all n.

374

[XV.l

MARKOV CHAINS

For example, to obtain the probability of the event "the first two trials result in (E i , E k )," we have to fix io = i and il = k, and add the probabilities (1.1) for all possible i2,ja, ... ,in" We have just shown that the sum is a i P,k and is thus independent of n. This means that it is usually not necessary explicitly to refer to the number of trials; the event (Ej, •• • 0 ... ,Ei) has the same probabIlIty III all sample spaces of more than r trials. In connection with independent trials it has been pointed out repeatedly that. from a mathematical point of view, it is most satisfactory to introduce only the unique sample space of unending sequences of trials and to consider the result of finitely many trials as the beginning of an infinite sequence. This statement holds true also for Markov chains. Unfortunately, sample spaces of infinitely many trials lead beyond the theory of discrete probabilities to which we are restricted in the present volume. To summarize, our starting point is the following Definition. A sequence of trials with possible outcomes E l , E 2 , • •• is called a Markov chain l if the probabilities of sample sequences are defined by (1.1) in terms of a probability distribution {aT,} for Ek at the initial (or zero-th) trial and fixed conditional probabilities Pjk of Ek given that E j has occurred at the preceding trial.

A slightly modified termin010gy is better adapted for applications of Markov chains. The possible outcomes Ek are usually referred to as possible states of the system; instead of saying that the nth trial results in Ek one says that the nth step kads to E k, or that Ek is entered at the nth step. Finally, Pjk is called the probability of a transition from E j to E k • As usual we imagine the trials performed at a uniform rate so that the number of the step serves as time parameter. The transition probabilities Pjk will be arranged in a matrix of transition probabilities .Pn P12 PIa

P2l P22 P2a

(1.3)

P =

Pal Pa2

Paa

1 This is not the standard terminology. We are here considering only a special class of Markov chains, and, strictly speaking, here and in the following sections the term Markov chain should always be qualified by adding the clause "with stationary transition probabilities." Actually, the general type of Markov chain is rarely studied. It will be defined in section 13, where the Markov property will be discussed in relation to general stochastic processes. There the reader will also find examples of dependent trials that do not form Markov chains.

XV.2]

ILLUSTRATIVE EXAMPLES

375

where the first subscript stands for row, the second for column. Clearly P is a square matrix with non-negative elements and unit row sums. Such a matrix (finite or infinite) is called a stochastic matrix. Any stochastic matrix can serve as a matrix of transition probabilities; together with our initial distribution {a k } it completely defines a Markov chain with states £1' £2'···· In some special cases it is convenient to number the states starting with o rather than with 1. A zero row and zero column are then to be added toP. Historical Note. Various problems treated in the classical literature by urn models now appear as special Markov chains, but the original methods were entirely different. Furthermore, many urn models are of a different character because they involve aftereffects, and this essential difference was not properly understood. In fact, the confusion persisted long after Markov's pioneer work. A. A. Markov (1856-1922) laid the foundations of the theory of finite Markov chains, but concrete applications remained confined largely to card-shuffling and linguistic problems. The theoretical treatment was usually by algebraic methods related to those described in the next chapter. This approach is outlined in M. Frechet's monograph. 2 The theory of ehains with infinitely many states was introduced by A. KohllOgOroV. 3 The new approach in the first edition of this book made the theory accessible to a wider public and drew attention to the variety of possible applications. Since then Markov chains have become a standard topic in probability and a familiar tool in , many applications. For more recent theoretical developments see the notes to sections 11 and 12.

2. ILLUSTRATIVE EXAMPLES (For applications to the classical problem of card-shuffling See section 10.) (a) When there are only two possible states £1 and £2 the matrix of transition probabilities is necessarily of the form p =

[1 lI.

P

P 1-

lI.

J.

Such a chain could be realized by the following conceptual experiment. A partIcle moves along the x-aXIS III such a way that ItS absolute speed remains constant but the direction of the motion can be reversed The system is said to be in state £1 if the particle moves in the positive direction, and in state £2 if the motion is to the left. Then p is the probability 2 Recherches theoriques modernes sur Ie calcul des probabilites, vol. 2 (Theorie des cvenements en ehaine dans le eas d'un nombre fini d'etats possibles), Paris, 1938. 3 Anfangsgriinde der Theorie der Markoffschen Ketten mit unendlich vielen moglichen Zustiinden, Matematiceskii Sbornik, N.s., vol. 1 (1936), pp. 607-610. This paper contains no proofs. A complete exposition was given only in Russian, in Bulletin de l'Universite d'Etat it Moscou, Sect. A., vol. 1 (1937), pp. 1-15.

376

[XV.2

MARKOV CHAINS

of a reversal when the particle moves to the right, and r.t. the probability of a reversal when it moves to the left. [For a complete analysis of this chain see example XVJ,(2.a).] (b) Random walk with absorbing barriers. Let the possible states be Eo, E1 , . • • , Ep and consider the matrix of transition probabilities

r-I

0 0 0

o

0 0-

q 0 p 0

000

o

q 0 p

000

{\

{\

p= v

o

v

{\

v

{\

v

0 0 0

'1

o

{\

v

p

From each of the "interior" states E b • •. , E p _ 1 transitions are possible to the right and the left neighbors (with Pi,HI = P and Pi.i I = q). However, no transition is pos~ible from either Eo or Ep to any other state; the system may move from one state to another, but once Eo or Ep is reached, the system stays there fixed forever. Clearly this Markov chain differs only terminologically from the model of a random walk with absorbing barriers at 0 and p discussed in the last chapter. There the random walk started from a fixed point z of the interval. In Markov chain terminology this amounts to choosing the initial distribution so that a z = 1 (and hence ax = 0 for x ¥:- z). To a randomly chosen initial state there corresponds the initial distribution a k = I/(p+ I). (c) Reflecting barriers. An interesting variant of the preceding example is represented by the chain with possible states E 1 , • • . ,Ep and transition probabilities o 0 0q p 0 0

q ,0 p 0

000

o

q 0 p

000

o o

0 0 0

q 0 p

p=

L....-

0 0 0

This chain may be interpreted in gambling language by considering two players playing for unit stakes with the agreement that every time a player

XV.2]

377

ILLUSTRATIVE EXAMPLES

loses his last dollar his adversay returns it so that the game can continue forever. We suppose that the players own between them p + 1 dollars and say that the system is in state Ek if the two capitals are k and p - k + 1, respectively. The transition probabilities are then given by our matrix P. In the terminology introduced in XIV, lour chain represents a random walk with reflecting barriers at the points t and p + t. Random walks with elastic barriers can be treated in the same way. A complete analysis of the reflecting barrier chain will be found in XVI,3. [See also example (7.c).] (d) Cyclical random walks. Again let the possible states be E 1 , E 2, ... ,Ep but order them cyclically s,o that Ep has the neighbors E p_ 1 and E 1 • If, as before, the system always passes either to the right or to the left neighbor, the rows of the matrix P are as in example (b), except that the first row is (0, P, 0, 0, ... , 0, q) and the last (p, 0, 0, 0, ... , 0, q, 0). More generally, we may permit transitions between any two states. Let qo, qb ... ,qp-l be, respectively, the probability of staying fixed or moving 1,2, ... , p - 1 units to the right (where k units to the right is the same as p - k umts to the left). Then P IS the cyclIcal matnx

P =

qo

ql

q2

qp-2

qp-l

qp-l

qo

ql

qp-3

qp-2

qp-2

qp-l

qo

qp-4

qp-3

For an analysis of this chain see example XVI,(2.d). (e) The Ehrenfest model of diffUSion. Once more we consider a chain with the p + 1 states Eo, E 1 , . • . ,Ep and transitions possible only to the right and to the left neighbor; this time we put Pi .Hl . 1 - j/ p. and Pi i-I = i/ p, so that r-{\

v

p-l p=

°

1 J.

°

2p-l

° ° ° °

{\ V

1 - p-l

° ° °

{\ V

()

()-

V

v

°

°° °°

° °

p-l

1 - 2p-l

°

°

-

378

[XV.2

MARKOV CHAINS

This chain has two interesting physical interpretations. For a discussion of various recurrence problems in statistical mechanics, P. ~nd T. Ehrenfest 4 described a conceptual urn experiment where p mol~cules are distributed in hvo containerli 4. and B. A.t each trial a molecule is chosen at random and moved from its container to the other. The state of the system is determined by the number of molecules in A. Suppose that at a certain moment there are exactly k molecules in the container A. At the next trial the system passes into E k - 1 or Ek+l according to whether a molecule in A or B is chosen; the corresponding probabilities are k/ p and (p-k)/ p, and therefore our chain describes Ehrenfest's experiment. However, our chain can also be interpreted as diffusion with a central force, that is, a random walk in which the probability of a step to the right varies 'Nith the position. From x j the particle is mon~ likely to move to the right or to the left according as j < p/20r j > p/2; this means that the particle has a tendency to move toward x = p/2, which corresponds to an attractive elastic force increasing in direct proportion to the distance. [The Ehrenfest model has been described in example V(2.c); see also example (7.d) and problem 12.] (f) The Bernoulli.. Laplace model of diffusion. 5 A model similar to the Ehrenfest model was proposed by D. Bernoulli as a probabilistic analogue for the flow of two incompressible liquids between two containers. This time we have a total of 2p particles among which p are black and p white. Since these particles are supposed to represent incompressible liquids the densities must not change, and so the number p of particles in each urn remains constant. We say that the system is in state Ek (k = 0, '1, ... , p) if the first urn contains k white particles. (This implies that it contains p - k black particles while the second urn contains p - k v/hite and k black particles). At each trial one particle is chosen from each urn, and these two particles are interchanged. The transition probabilities are then given by (2.1)

P;.H

=

(1-r p

·· = 2 j(p - j) P33 2 P

4 P. and T. Ehrnnfest, Oger zwei· gekarmte Einwiinde gegen das &ttzmannsG.ke H-Theorem, Physikalische Zeitschrift, vol. 8 (1907), pp. 311-314. Ming Chen Wang and G. E. Uhlenbeck, On the theory of the Brownian motion II, Reviews of Modern Physics, vol. 17 (1945), pp. 323-342. For a more complete discussion see M. Kac, Random walk and the theory of Brownian motion, Amer. Math. Monthly, vol. 54 (1947), pp. 369-391. These authors do not mention Markov chains, but Kac uses methods closely related to those described in the next chapter. See also B. Friedman, A simple urn model, Communications on Pure and Applied Mathematics, vol. 2 (1949), pp. 59-70. 5 In the form of an urn model this problem was treated by Daniel Bernoulli in 1769, criticized by Malfatti in 1782, and analyzed by Laplace in 1812. See I. Todhunter, A history of the mathematical theory of probability, Cambridge, 1865.

XV.2]

ILLUSTRA TIVE EXAMPLES

379

and Pjk = 0 whenever Ij - kl > 1 (here j = 0, ... , p). [For the steady state distribution see example (7.e); for a generalization of the model see problem 10.] (g) Random placements of halls Consider a sequence of independent trials each consisting in placing a ball at random in one of p given cells (or urns). We say that the system is in state Ek if exactly k cells are occupied. This determines a Markov chain with states Eo, ... , Ep and transition probabilities such that (2.2)

j

Pjj

= -,

,P

pj,HI

P. - j

=--

P

and, of course, Pjk = 0 . for all other combinations of j and k. If initially all cells are empty, the distribution {a k } is determined by a o = 1 and a k = 0 for k > O. [This chain· is further analyzed in example XVI,(2.e). Random placements of balls were treated from different points of view in 11,5 and IV,2.] (h) An example from cell genetics. 6 A Markov chain with states Eo, ... ,EN and transition probabilities

(2.3)

occurs in a biological problem which may be described roughly as follows. Each cell of a certain organism contains N particles, some of which are of type A, the others of type B . . The cell is said to be in state E j if it contains exactly j particles of type A. Daughter cells are formed by cell division, but prior to the division each particle replicates itself; the daughter cell inherits N particles chosen at random from the 2j particles of type A and 2N - 2j particles of type B present in the parental cell. The probability that a daughter cell is in state Ek is then given by the hypergeometric distribution (2.3). It v/ill be shown in example (S.b) that a.ftel sufficiently many generations the entire population will be (and remain) in one of the pure states Eo or EN; the probabilities of these two contingencies are 1 - lIN and jlN, respectively, where E j stands for the initial state. 6 I. V. Schensted, Model of subnuclear segregation in the macronucleus of Ciliates, The Amer. Naturalist, vol. 92 (1958), pp. 161-170. This author uses essentially the methods of chapter XVI, but does not mention Markov chains. Our formulation of the problem is mathematically equivalent, but oversimplified biologically.

3S0

[XV.2

MARKOV CHAINS

(i) Examples from population genetics. 7 Consider the successive genera-

tions of a population (such as the plants in a corn field) which is kept constant in size by the selection of N individuals in each generation. A particular gene assuming the forms A and a has 2 N representatives; if in the nth generation A occurs j times, then a occurs 2N - j times. In this case we say that the popUlation is in state E j (0 < j < 2N). Assuming random mating, the composition of the following generation is determined by 2N Bernoulli trials in which the A-gene has probability J/2N. We have therefore a Markov cham WIth (2.4) In the states Eo and E2N all genes are of the same type, and no exit from these states is possible. (They are called homozygous.) It will be shown in example (S.b) that ultimately the population will be fixed at one of the homozygous states Eo or E 2N . If the popUlation starts from the initial· state E j the correspondmg probabIlItIes are I - j/(2N) and j/(2N). This model can be modified so as to take into account possible mutations and selective advantages of the genes. (j) A breeding problem. In the so-called brother-sister mating two individuals are mated, and among their direct descendants two individuals of opposite sex are selected at random. These are again mated, and the process continues mdefimtely. WIth three genotypes AA, Aa, aa for each parent, we have to distinguish six combinations of parents which we label as follows: E1 = AA X AA, E2 = AA X Aa, E3 = Aa X Aa, E4 = Aa X aa, E5 = aa X aa, E6 = AA X aa. Using the rules of V,5 it is easily seen that the matrix of transition probabilities is in this case

1

0-

4

2"

1

1. 4

ls

t

t

t

l's

k

t

i

t

1

- 0

0-

7 This problem was discussed by different methods by R. A. Fisher and S. Wright. The formulation in terms of Markov chains is due to G. Malecot, Sur un prob/erne de probabilites en chaine que pose /a genetique, Comptes rendus de I' Academie des Sciences, vol. 219 (1944), pp. 379-381. .

XV.2]

381

ILLUSTRATIVE EXAMPLES

[The discussion is continued in problem 4; a complete treatment is given in example XVI,(4.b).] . (k) Recurrent events and residual waiting times. The chain with states Eo, E 1 • • •. and transition probabilities

r-

11 h 13 h ...1

0 p,=

-

0 0

-

be used repeatedly for purposes of illustration; the probabilities Ik are arbitrary except that they must add to unity. To visualize the process suppose that it starts from the initial state Eo. If the first step leads to E k - 1 the system is bound to pass successively through E k - 2 , E k - 3 , ••• , and at the kth step the system returns to Eo, whence the process starts from scratch. The successive returns to Eo thus represent a persistent recurrent event E with the distribution {Ik} for the recurrence times. The state of the system at any time is determined by the waiting time to the next passage through Eo. In,most concrete realizations of recurrent events the waiting time for the next occurrence depends on future developments and our ~4arkov chain is then without operational meaning. But the chain is meaningful when it is possible to imagine that simultaneously with each occurrence of S there occurs a random experiment whose outcome decides on the length of the next waiting time. Such situations occur in practice although they are the exception rather than the rule. For example, in the theory of self-renewing aggregates [example XIII,(10.d)] it is sometimes assumed that the lifetime of a nevlly installed piece of equipment depends on the choice of this piece but is completely determined once the choice is made. Again, in the theory of queues at servers or telephone trunk lines the successive departures of customers usually correspond to recurrent events. Suppose now that there are many types of customers but that each type re'quires service of a known duration. The waiting time between two successive departures is then uniquely determined from the moment when the corresponding customer joins the waiting line. [See example (7.g).] WIll

382

[XV.3

MARKOV CHAINS

(I) Another chain connected with recurrent events. Consider again a chain with possible states Eo, Eb ... and transition probabilities ~ql

p=

PI

q2 0

P2

q3

P3

q4

P4

-

where Pk + qk = I. For a picturesque description we may interpret the state E,. as representing the "age" of the system When the system reaches age k the aging process continues with probability Pk+1' but with probability qk+l it rejuvenates and starts afresh with age zero. The successive passages through the state Eo again represent a recurrent event and the probability that a recurrence time equals k is given by the product PIP2 ... Pk-lqk' It IS pOSSIble to choose the Pk III such a way as to obtam a prescribed distribution {fk} for the recurrence times; it suffices to put ql = fl' then q2 = hlpl' and so on. Generally '-

(2.5)

1 - 11 - ... - Ik Pk = 1 - f - ... -.( . 1 Jk-l

Here k = 0, 1, . .. and the zeroth trial counts as failure. In other words, the index k eq uals the length of the uninten upted block of successes ending at the nth trial. The transition probabilities are those of the preceding example with Pk = P and qk = q for all k. 3. HIGHER TRANSITION PROBABILITIES

We shall denote by p}~) the probability of a transition from E j to Ek in exactly n steps. In other words, p~~) is the conditional probability of entering Ek at the nth step given the initial state Ej; this is the sum of the

XV.3]

HIGHER TRANSITION PROBABILITIES

383

probabilities of all possible paths EjEj1 ... Ejn_1Ek of length n starting at E j and ending at E k • In particular p;~) = Pjk and (3.1)

(2)

P jk

'" = £..

P jvPvk·

By induction we get the general recursion formula ~n+1) PJk

(3.2)

= '" P' p(n). £.. JV vk ' v

a further mductIOn on m leads to the basic identity ~m+n) = '" p(,m)p(n) PJk £.. JV Vk

(3.3)

v

(which is a special case of the Chapman-Kolmogorov identity). It reflects the simple fact that the first m steps lead from E j to some interrr:ediate state E v , and that the probability of a subsequent passage from Ev to Ek does not depend on the manner in which Ev was reached. 8 In the same way as the Pjk form the matrix P, we arrange the p}~) in a matrix to he denoted by pn. Then (3.2) states that to obtain the element p}~+1) of pn+1 we have to multiply the elements of the jth row of P by the corresponding elements of the kth column of pn and add all products. This operation is called row-into-column multiplication of the matrices P and pn and is expressed symbolically by the equation pn+1 = ppn. This suggests calling pn the nth power of P; equation (3.3) expresses the pmpn. familiar la w pm+n In order to have (3.3) true for all n > we define pj2) by p;~) = 1 and pj2) = for j ~k as is natural.

°

°

Examples. (a) Independent trials. Explicit expressions for the higherordertraIlsitioIl probabilities are usually hard to come by, but fortunately they are only of minor interest. As an important, if trivial, exception we note the special case of independent trials. This case arises when all rows of P are identical with a given probability distribution, and it is clear without calculations that this implies pn - P for all n. (b) Success runs. In example (2.m) it is easy to see [either from the recUt sion formula (3.2) OI directly from the definition of the process] that qpk

p~~) = pk

o 8

for

k = 0, 1, ... , n - 1

for

k = n

otherwise

The latter property is characteristic of Markov processes to be defined in section 13.

It has been assumed for a long time that (3.3) could be used for a definition of Markov

chains but, surprisingly, this is not so [see example (13./)].

384

[XV.4

MARKOV CHAINS

In this case it is clear that 'pn converges to a matrix such that all elements ~ in the column number k equal qpk. Absolute Probabilities

Let again a j stand for the probability of the state E j at the initial (or zeroth) trial. The (unconditional) probability of entering Ek at the nth step is then (3.4) Usually we let the process start from a fixed state E i , that is, we put 1. In this case aicn ) = p~~). We feel intuitively that the influence of the initial state should gradually wear off so that for large n the distribution (3.4) should be nearly independent of the initial 4jstribution {a j }. This is the case if (as in the last example) pj~) converges to a limit independent of j, that is, if pn converges to a matrix with identical rows. We shall see that this is usually so, but once more we shall ha veto take into account the annoy ing exception caused by periodicities.

ai =

4. CLOSURES AND CLOSED SETS We shall say that Ek can be reached from E j if there exists some It > 0. such that p~Z) > 0 (i.e., if there is a positive probability of reaching Ek from E j including the case Eic = Ej). For example, in an unrestricted random walk each state can be reached from every other state, but from an absorbing barrier no other state can be reached. Definition. A set C ofstates is el-osed ifno state outside C call he reached from any state E j in C. For an arbitrary set C of states the smallest closed set containing C is called the closure of C. A single state Ek forming a closed set will be called absorbing. A Markov chain is irreducible if there exists no closed set other than the set of all states. .

~

Clearly C is closed if, and only if, Pjk = 0 whenever) is in C and'1( outside C, for in this case we see from (3.2) that pj~) = 0 for every n. We have thus the obvious Theorem. If in the matrices pn all rows and all columns corresponding to states outslde the closed set Care deleted, there remam stochasllc matrices for which the fundamental relations (3.2) and (3.3) again hold. •

This means that we have a Markov ch~in defined on C, and this subchain can be studied independently of all other states.

XV.4]

CLOSURES AND CLOSED SETS

385

The state Ek is absorbing if, and only if, Pkk = 1; in this case the matrix of the last theorem reduces to a single element. In general it is clear that the totality of all states Ek that can be reached from a given state E j forms a closed set. (Since the closure of E j cannot be smaller it coincides with this set.) An irreducible chain contains no proper closed subsets, and so we have the simple but useful Criterion. A chain is irreducible if, and only if, every state can be reached from every other state.

Examples. (a) In order to find all closed sets it suffices to know which Pjk vanish and which are positive. Accordingly, we use a * to denote positive elements and consider a typical matrix, say

o * *

* 000 * 000 0 0 0 0 * 0 0

* p=

0 0 0 0

o * o *

* 0 0 0 0

000 0 0 0 0 000

000

* * 0 0

* 000 0 *

We number the states from 1 to 9. In the fifth row a * appears only at the fifth place, and therefore Paa = 1: the state Ea is absorbing. The third and the eighth row contain only one positive element each, and it is clear that E3 and Eg form a closed set. From El passages are possible into E4 and E 9, and from there only to Eb E4, E 9. Accordingly the three states E 1 , E4, E9 form another closed set. From E2 direct transitions are possible to itself and to E 3 , ' E 5 , and Eg. The pair (E3' Ea) forms a closed set while Ea is asorbing; accordingly, the closure of E2 consists of the set E 2, E 3, E 5, Eg. The closures of the remaining states E6 and E7 are easily seen to consist of all nine states. The appearance of our matrix and the determination of the closed sets can be simplified by renumberi~g the states in the order

E5E3EgEIE4E9E2E6E7 . The closed sets then contain. only adjacent states and a glance at the new matrix reveals the grouping of the states.

386

MARKOV CHAINS

[XV.4

(b) In the matrix of example (2.j) the states E1 and E5 are qbsorbing and there exist no other closed sets. (c) In the genetics example (2.i) the states Eo and E2}V are absorbing. When 0 < j < 2 N the closure of ~ contains all states. In example (2.11) the states Eo and Es are absorbing. ~

Consider a chain with states E 1 , ••• , Ep such that E 1 , ••• , Er form a closed set (r < p). The r by r submatrix of P appearing in the left upper corner is then stochastic, and we can exhibit P in the form of a partitioned matrix (4.1)

p =

[~ ~l

The matrix in the upper right corner has r rows and p - r columns and only zero entries. Similarly, U stands for a matrix with p - r rows and r columns while V is a square matrix. We shall use the symbolic partitioning (4.1) also when the closed set C and its complement C' contain infinitely many states; the partitioning indicates merely the grouping of the states and the fact that Pjk = 0 whenever E j is in C and Ek in the complement C'. From the recursion formula (3.2) it is obvious that the higher-order transition probabilities admit of a similar partitioning: (4.2) We are not at present interested in the form of the elements of the mat~ix Un appearing in the left lower corner. The point of interest is that (4.2) reveals three obvious, but important, facts. First, p}~) = 0 whenever E j E C but Ek E C'. Second, the appearance of the power Qn indicates that when both E j and E" are in C the transition probabilities p}~) are obtained from the recursion formula (3.2) with the summation restricted to the states of the closed set C. Finally, the appearance of vn indicates that the last statement remains true when C is replaced by its complement C'. As a consequence it will be possible to simplify the further study of Markov chains by considering separately the states of the closed set C and those of the complement C'. Note that we have not assumed Q to be irreducible. If C decomposes into several closed subsets then Q admits of a further partitioning. There exist chains with infinitely many closed subsets. Example. (d) As was mentioned before, a random walk in the plane represents a special Markov chain even though an ordering of the states in a simple sequence would be inconvenient for practical purposes. Suppose now that we modify the random walk py the rule that on reaching the

XV.5]

CLASSIFICATION OF STATES

387

x-axis the particle continues a random walk along this axis without ever leaving it. The points of the x-axis then form an infinite closed set. On the other hand, if we stipulate that on reaching the x-axis the particle remains forever fixed at the hitting point, then every point of the x-axis ~ becomes an absorbing state.

5. CLASSIFICATION OF STATES Tn a process starting from the initial state E) the successive returns to

E j constitute a recurrent event, while the successive passages through any

other state constitute a delayed recurrent event (as defined in XIII,5). The theory of Markov chains therefore boils down to a simultaneous study of many recurrent events. The general theory of recurrent events is applicable wIthout modIficatIOns, but to avoid excessive references to chapter XIII we shall now restate the basic definitions. The present chapter thus becomes essentially self-contained and independent of chapter XIII except that the difficult proof of (5.8) will not be repeated in full. The states of a ~4arkov chain '),Iill be classified independently from two viewpoints. The classification into persistent and transient states is fundamental, whereas the classification into periodic and aperiodic states concerns a technical detail. It represents a nuisance in that it requires· constant references to trivialities; the beginner should concentrate his attention on chains without periodic states. All definitions in this section involve only the matrix of transition probabilities and are independent of the initial distribution {a j }. Definition 1. The state E j has period t > 1 if p};) = 0 unless n = 'Vt is a multiple of t, and t is the largest integer with this property. The state E j is aperiodic

if no such

t

>]

e xjsts 9

To deal with a periodic E j it suffices to consider the chain at the trials number t, 2t, 3t, .... In this way we obtain a new Markov chain with transition probabilities p:~), and in this new chain E j is aperiodic. In this way results concerning aperiodic states can be transferred to periodic states. The details will be discussed in section 9 and (excepting the following example) we shall now concentrate our attention on apetiodic chains. Example. (a) In an unrestricted random walk all states have period 2. In the random walk with absorbing barriers at 0 and p [example (2.b)] the interior states have period 2, but the absorbing states Eo and Ep are, of course, aperiodic. If at least one of the barriers is made reflecting [example (2.c)], all states become aperiodic. ~ 9 A state E j to which no return is possible (for which p~:) = 0 for all n will be considered aperiodic.

> 0)

388

MARKOV CHAINS

[XV.5

Notation. Throughout this chapter f~~) stands for the probability that in a process starting from E j the first entry to Ek occurs at the nth step. We put f}~) = 0 and (X)

(5.1)

!jk

=.2 !j~) n=l (X)

(5.2)

f.lj

=.2 n!~j). n=l

Obviously hk is the probability that, starting from E j , the system will ever pass through Ek. Thus hk < 1. When hk = 1 the {fAn)} is a proper probability distribution and we shall refer to it as the first-passage distribution for Ek • In particular, {iJ~n)} represents the distribution of the recurrence times for E j • The definition (5.2) is meaningful only when hj = 1, that is, when a return to E j is certain. In this case f.lj < 00 is the mean recurrence time for E j • No actual calculation of the probabilities fAn) is required for our present purposes, but for conceptual clarity we indicate how the fAn) can be determined (by the standard renewal argument). If the first passage through Ek occurs at the vth trial (1 < v < n - 1) the (conditional) probability of Ek at the nth trial equals Pk~-v). Remembering the convention that Pk~) = 1 we conclude that n

"'" PJk = £..

(5.3)

(,n)

!(,v)p(n-v)

Jk kk

.

v=l

. . Iy n -- 1, 2,... we get recursIve . Iy J1'(1) L ettmg successIve jk ' J1'(2) jk ' .... Conversely, If the Jj~t} are known for the pair ], k then (5.3) determmes all the transition probabilities pj~). The first question concerning any state E j is whether a return to it is certain. If it is certain, the question arises whether the mean recurrence time Lt, is finite or infinite The following definition agrees with the

Definition 2. The state E j is persistent hj < 1. A persistent state E j is called null state /-lj-

if hj = if its

1 and transient

if

mean recurrence time

00.

This definition applies also to periodic states. It classifies all persistent states into null states and non-null states. The latter are of special interest, and since we usually focus our attention on aperiodic states it is convenient

XV.5]

389

CLASSIFICATION OF STATES

to use the term ergodic for aperiodic, persistent non-null states.lO This leads us to Definition 3.

An aperiodic persistent state E j with

f.lj

<

00

is called

ergodlc. The next theorem expresses the conditions for the different types in terms of the transition probabilities pi;). It is of great importance even though the criterion contained in it is usually too difficult to be useful. Better cntena wIll be found III sectIOns 7 and 8, but unfortunately there exists no simple universal criterion. Theorem.

(i) E j is transient if, and only if, <Xl

.2 p~~) < 00.

(5.4)

n=O

In this case (X)

.2 p;:) < 00

(5.5)

n=1

for all i. (ii) E j is a (persistent) null state if, and only if, (X)

.2 p~j) =

(5.6)

It

as n -+

00.

(5.7)

bu t

CO,

p(.~) 11

-+

()

In this case

P ~~) J l

-+

for all i. (iii) An aperiodic (persistent) state E j is ergodic if, and only if, In this case as n -+ co (5.8) Corollary.

(n)

Pij

If E j is aperiodic,

f.lj

<

co.

.(-1

-+ Jijf.lj •

p~;)

tends either to 0 or to the limit given

by (5.8). 10 Unfortunately this terminology is not generally accepted. In Kolmogorov's termmology transient states are called "unessentzal," but this chapter was meant to show that the theoretical and practical interest often centers on transient states. (Modern potential theory supports this view.) Ergodic states are sometimes called "positive," and sometimes the term "ergodic" is used in the sense of our persistent. (In the first edition of this book persistent Ei were regretably called recurrent.)

390

[XV.6

MARKOV CHAINS

Proof. The assertion (5.4) is contained in theorem 2 of XIII,3. The assertion (5.5) is an immediate consequence of this and (5.3), but it is also contained in theorem 1 of XIII,S. For an aperiodic persistent state E) theorem 3 of XIII, 3 asserts that pj;) -+ flj!, where the right side is to be interpreted as zero if flj = 00. The assertions (5.7) and (5.8) follow again immediately from this and (5.3), or else from theorem 1 of XIII,S. Let E j be persistent and flj = 00. By theorem 4 of XIII,3 in this case pj;) -+ 0, and this again implies (5.7). ~ Examples. (b) Consider the state Eo of the chain of example (2./). The peculiar nature of the matrix of transition probabilities shows that a first return at the nth trial can occur only through the sequence

and so for n

>1

(5.9)

and fo~) = Ql' In the special case that the Pk are defined by (2.5) this reduces to Irion) = In- Thus Eo is transient if Lin < 1. For a persistent Eo the mean recurrence time flo of Eo coincides with the expectation of the distribution {fn}. Finally, if Eo has period t then fn -:except when n is a multiple of t. In short, as could be expected, Eo is under any circumstances of the same type as the recurrent event 6 associated with our Markov chain. (c) In example (4.a) no return to E2 is possible once the system leaves this state, and so £2 is transient. A slight refinellIent of this argument shows that E6 and E7 are transient. From theorem 6.4 it follows easily that all other states are ergodic; ~

°

6. IRREDUCIBLE CHAINS. DECOMPOSITIONS For brevity we say that Iwu slales are vi the same type if they agree in all characteristics defined in the preceding section. In other words, two states of the same type have the same period or they are both aperiodic; both are transient or else both are persistent; in the latter case either both mean recurrence times are infinite, or else both are finite. The usefulness of our classification depends largely on the fact that for all practical purposes it is always possible to restrict the attention to states of one particular type. The next theorem shows that this is strictly true for irreducible chains.

XV.6] Theorem 1.

IRREDUCIBLE CHAINS.

DECOMPOSITIONS

391

All states of an irreducible chain are of the same type

Proof. Let E j and Ek be two arbitrary states of an irreducible chain. ! n view of the criterion of section 4 every state can be reached from every other state, and so there exist integers rand s such that p~~) - IX > 0 and Pk~l = fJ > O. Obviously

(6.1)

11 P (.~+r+sl

> p(rlp(nlp(sl (nl jk kk kj -_ fJ Pkk . IX

_

Here j, k, r, and S' are fixed while n ili arbitrary For a tranliient E j the left side is the term of a convergent series, and therefore the same is true of Pk~l. Furthermore, if pj;l -+ 0 then also Pk~l -+ O. The same statements remain true when the roles of j and k are interchanged, and so either both E j and Ek are transient, or neither is; if one is a null state, so IS the other. Finally, suppose that E j has period t. For n = 0 the right side in (6.1) is positive, and hence r + s is a multiple of t. But then the left side vanishes unless n is a multiple of t, and so Ek has a period which is a multiple of t. Interchanging the roles of j and k we see that these . states have the same period. ..

The importance of theorem 1 becomes apparent in conjunction with Theorem 2. For a persistent E j there exists a unique irreducible closed set C containing Ej and such that for every pair E i, Ek of states in C

(6.2)

and

In other words: Starting from an arbitrary state Ei in C the system is certain to pass through every other state of C; by the definition of closure no exit from C is possible. Proof. Let Ek be a state that can be reached from E j • It is then obviously possible to reach Ek without previously returning to Ej, and we denote the probability of this event by IX. Once Ek is reached, the probability of never returning to E j is 1 - hj' The probability that, starting from E j , the system never returns to E j is therefore at least oc(1 Jkj). But for a persistent E j the probability of no return is zero, and so hj = 1 for every Ek that can be reached from Ej. Denote by C the aggregate of all states that can be reached from E j • If Ei and Ek are in C we saw that E j can be reached from E k, and hence also Ei can be reached from Ek. Thus every state in C can be reached from every other state in C, and so C is irreducible by the criterion of section 4. It follows that all states in C are persistent, and so every Ei can be assigned the role of E j in the first part of the argument. This means that hi = 1 for all Ek in C, and so (6.2) is true. ..

392

MARKOV CHAINS

[XV.7

The preceding theorem implies that the closure of a persistent state is irreducible. This is not necessarily true of transient states. Example. Suppose that Pik = 0 whenever k < j, but P11+1 > o. Transitions take place only to higher states, and so no return to any state is possible. Every E j is transient, and the closure of E j consists of the states E j , E1+1' E1+2, ... ' but contains the closed subset obtained by ~ deleting E j • It follows that there exist no irreducible sets. The last theorem implies in particular that no transient state can ever be reached from a persistent state. If the chain contains both types of states, this means that the matrix P can be partitioned symbolically in the form (4.1) where the matrix Q corresponds to the persistent states. Needless to say, Q may be further decomposable. But every persistent state belongs to a unique irreducible subset, and no transition between these subsets is possible. We recapitulate this in Theorem 3. The states of a Markov chain can be divided, in a unique . , . sets r 'T'G II manner, mtD nDn-DVerttlppmg , b 2G, . . . SUCtl taat (i) T consists of all transient states.

(ii) If E j is in C v then jjk = 1 for all Ek in C v while jjk = 0 for all Ek outside C v. This implies that C v is irreducible and contains only persistent states of the same type. The example above shows that all states can be transient, while example (4.d) proves the possibility of infinitely many C v. We derive the following theorem as a simple corollary to theorem 2, but it can be proved in other simple ways (see problems 18-20). Theorem 4. In a finite chain there exist no null states, and it is impossible that all states are transient. Proof. The rows of the matrix pn add to unity, and as they contain a fixed number of elements it is impossible that p}~) -+ 0 for all pairs j, k Thus 001 all states are transient. But a persistent state belongs to an irreducible set C. All states of C are of the same type. The fact that C contams a perSIstent state and at least one non-null state therefore implies that it contains no null state. ~ 7. INVARIANT DISTRIBUTIONS Since every persistent state belongs to an irreducible set whose asymptotic behavior can be studied independently of the remaining states, we shall now concentrate on irreducible chains. All states of such a chain are of the same type and we begin with the simplest case, namely chains with

XV.7]

393

INVARIANT DISTRIBUTIONS

finite mean recurrence times flj. To avoid trivialities we postpone the discussion of periodic chains to section 9. In other words, we consider now chains whose states are ergodic (that is, they are aperiodic and persistent with finite mean recurrence times See definition 5.3) Theorem. (7.1)

In all irreducible chain with only ergodic elements the limits Uk

= lim

p;~)

n-+<XJ

exist and are independent of the initial state j. Furthermore

Uk

> 0,

(7.2)

(7.3)

Uj

= .2 UiPij· i

Conversely, suppose that the chain is irreducible and aperiodic, and that there exist numbers Uk > 0 satisfying (7.2) (7.3). Then all states are ergodic, the Uk are given by (7.1), and (7.4)

where

flk

is the mean recurrence time of Ek ·

Proof. (0 Suppose the chain irreducible and ergodic, and define Uk by (7.4). Theorem 6.2 guarantees that fij = 1 for every pair of states, and so the assertion (7.1) reduces to (5.8). Now

(7.5)

Plk~n+ll

= £.. " p~~)p. lJ Jk· j

As n -+ 00 the left side approaches Uk' while the general term of the sum on the right tends to UjPik. Taking only finitely many terms we infer that (7.6)

For fixed (7.7)

Uk

> .2 U jPjk·

and n the left sides in (7.5) add to unity, and hence

s=

.2 < l. Uk

Summing over k in (7.6) we get the relation s > s in which the inequality sign is impossible We conclude that in (7.6) the equality sign holds for all k, and so the first part of the theorem is true. 11

U

If we conceive of {Uj} as a row vector, (7.3) can be written in the matrix form

= uP.

394 (ii) Assume

[XV.7

MARKOV CHAINS

Uk

>0

and (7.2)-(7.3). By induction

(7.8) for every n > 1. Since the chain is assumed irreducible all states are of the same type. If they were transient or null states, the right side in (7.8) would tend to 0 as n -+ 00, and this cannot be true for all k because the Uk add to unity. Periodic chains being excluded, this means that the states are ergodic and so the first part of the theorem applies. Thus, letting n -+ 00, (7.9) Accordingly, the probability distribution {Uk} is proportional to the probability distribution {,u~l}, and so Uk = ,u~l as asserted. ~ To appreciate the meaning of the theorem consider the development of the process from an initial distribution {aj}. The probability of the state Ek at the nth step is given by (7.10)

a(n) k

= '" £..

a .p(.n) J

Jk

j

[see (3.4)]. In view of (7.1) therefore as n -+

00

(7.11) In other words, whatever the initial distribution, the probability of Ek tends to Uk. On the other hand, when {Uk} is the initial distribution (that is, when ak = Uk), then (7.3) implies a11 ) = Uk' and by induction a1n ) = Uk for all n. Thus an initial distribution satisfying (7.3) perpetuates itself fOI all times. For this reason it is called invariant.

,

A probability distribution {Uk} satisfying (7.3) is called invariant or stationary (for the given Markov chain). Definition.

The main part of the preceding theorem may now be reformulated as follows.

An irreducible aperiodic chain possesses an invariant probability distribution {Uk} if, and only if, it is ergodic. In this case Uk > 0 for all k, and the absolute probabilities a1n ) tend to Uk irrespective of the initial distribution. The physical significance of stationarity becomes apparent if we imagine a large number of processes going on simultaneously. To be specific, consider N particles performing independently the same type of random

XV.7]

INY ARIANT DISTRIBUTIONS

395

walk. At the nth step the expected number of particles in state Ek equals Nak n ) which tends to Nuk • After a sufficiently long time the distribution will be approximately invariant, and the physicist would say that he observes the particles in equilibrium. The distribution {u lc } is therefore also called equilibrium distribution. Unfortunately this term distracts attention from the important circumstance that it refers to a so-called macroscopic equilibrium, that is, an equilibrium maintained by a large number of transitions in opposite directions. The individual particle exhibits no tendency to equilibrium, and our limit theorem has no implications for the individual process. Typical in this respect is the symmetric random walk discussed in chapter III. If a large number of particles perform independently such random walks starting at the origin, then at any time roughly half of them will be to the right, the other half to the left of the origin. But this does not mean that the majority of the particles spends half their time on the positive side. On the contrary, the arc sine laws show that the majority of the particles spend a disproportionately large part of their time on the same side of the origin, and in this sense the majority is not representative of the ensemble. This example is radical in that it involves infinite mean recurrence times. With ergodic chains the chance fluctuations are milder, but for practical purposes they will exhibit the same character whenever the recurrence times have very large (or infinite) variances. Many protracted discussions and erroneous conclusions could be avoided by a proper understanding of the statistical nature of the "tendency toward eq uihbrium. " In the preceding theorem we assumed the chain irreducible and aperiodic, and it is pertinent to ask to what extent these assumptions are essential. A perusal of the proof will show that we have really proved more than is stated in the theorem. In particular we have, in passing, obtained the following criterion applicable to arbitrary chains (including periodic and reducible chains). Criterion. If a chain possesses an invariant probability distribution {Uk}' then Uk 0 /01 each Ek that is eithel tlansient 01 a pelsistent null state.

In other words, Uk > 0 implies that Ek is persistent and has a finite mean recurrence time, but Ek may be periodic. We saw that the stationarity of {Uk} implies (7.8). If Ek is either transient or a null state, then pj~) -+ 0 for all j, and so Uk = 0 ~ as asserted. Proof.

As for periodic chains, we anticipate the result proved in section 9 that a unique invariant probability distribution {Uk} exists for every irreducible chain whose states have finite mean recurrence times. Periodic chains were

396

MARKOV CHAINS

[XV.7

excluded from the theorems only because the simple limit relations (7.1) and (7.11) take on a less attractive form which detracts from the essential point without really affecting it. Examples. Ca) Chams wIth several Irreducible components may admit of several stationary solutions. A trite, but typical, example is presented by the random walk with two absorbing states Eo and Ep [example (2.b)]. Every probability distribution of the form (ex, 0, 0, ... ,0, 1 - ex), attributing positive weights only to Eo and E p , is stationary. (b) Given a matrix of transition probabilities Pik it is not always easy to decide whether an invariant distribution {Uk} exists. A notable exception occurs when (7.12)

Pik

JOI

Ik

jl>

1,

that is, when all non-zero elements of the matrix are on the main diagonal or on a line directly adjacent to it. With the states numbered starting with the defining relations (7.3) take on the form

°

(7.13)

°

°

and so on. To avoid trivialities we assume that Pi,HI > and Pi,i-l > for all j, but nothing is assumed about the diagonal elements Pii. The equatIOns (7.13) can be solved successIvely for U 1 , U 2 , •••• Remembering that the row sums of the matrix P add to unity we get (7.14)

U2 -

POlPl2 U

0'

PIOP21

and so on. The resulting (finite or infinite) sequence U o, U 1 , ••• represents the unique solution of (7.13). To make it a probability distribution the norming factor U o must be chosen so that .2 Uk = 1. Such a choice is possible if, and only if, (7.15)

"'" POIPl2P23 • • . Pk-l,k

L

<

00.

PlOP2lP32 • . • Pk,k-l

This, then, is the necessary and sufficient condition for the existence of an invariant probability distribution; if it exists, it is necessarily unique. [If (7.15) is false, (7.12) is a so-called invariant measure. See section 11.] In example (S.d) we shall derive a similar criterion to test whether the states are persistent. The following three examples illustrate the applicability of our criterion.

XV.7]

INY ARIANT DISTRIBUTIONS

397

(c) Reflecting barriers. The example (2.c) (with p < (0) represents the special case of the preceding example with Pi .HI = P for all j < p and Pi .i-l = q for all j > 1. When the number of states is finite there exists an invariant distribution with Uk proportional to (p/q)k. With infinitely many states the convergence of (7.15) requires that P < q, and in this case Uk = (1 - p/q)(p/q)k. From the general theory of random walks it is clear that the states are transient when P > q, and persistent null states when P = q. This will follow also from the criterion in example (S.d). (d) The Ell! en/est model of diffusion. For the matrix of example (2.e) the solution (7.14) reduces to

(7.16)

k = 0, ... , p.

The binomial coefficients are the terms in the binomial expansion for (1 + l)P, and to obtain a probability distribution we must therefore put p U o = 2- • The chain has period 2, the states have finite mean recurrence times, and the binomial distribution with p - t is invariant This result can be interpreted as follows: Whatever the initial number of molecules in the first container, after a long time the probability of finding k molecules in it is nearly the same as if the a molecules had been distributed at random, each molecule having probability t to be in the first container. This is a typical example of how our result gains physical significance. For large a the normal approximation to the binomial distribution shows that, once the limiting distribution is approximately established, we are practically certain to find about one-half the molecules in each container. To the physicist a = 106 is a small number, indeed. But even with a - 106 molecules the probabIhty of findmg more than 505,000 molecules in one container (density fluctuation of about 1 per cent) is of the order of magnitude 10-23 • With a = 108 a density fluctuation of one in a thousand has the same negligible probability. It is true that the system will occasionally pass into very improbable states, but their reclIrrence times are fantastically large as compared to the recurrence times of states near the equihbnum. PhYSIcal IrreverSIbIlIty mamfests Itself m the fact that, whenever the system is in a state far removed from equilibrium, it is much more likely to move toward equilibrium than in the opposite direction. (e) The Bernoulli-Laplace model of dijJu'iion For the matrix with elements (2.1) we get from (7.14) (7.17)

Uk

= (:)' uo,

k = 0, ... , p.

398

[XV.7

MARKOV CHAINS

The binomial coefficients add to (2;) [see 11,(12.11)], and hence (7.18)

represents an invariant distribution. It is a hypergeometric distribution (see 11,6). This means that in the state of equilibrium the distribution of colors in each container is the same as if the p particles in it had been chosen at random from a collection of p black and p white particles. (I) In example (2./) the defining relations for an invariant probability distribution are k - 1,2,

(7.l9a) (7.19b)

From (7.19a) we get (7.20) and it is now easily seen that the first k terms on the right in (7 .19b) add to U o - Uk. Thus (7.l9b) is automatically satisfied whenever Uk -+ 0, and an invariant probability distribution exists if, and only if,

.2 P1P2 ... Pk < 00.

(7.21)

k

[See also examples (8.e) and (1l.c).] (g) Recurrent events. In example (2.k) the conditions for an invariant probability distribution reduce to (7.22)

Adding over k (7.23)

k - 0, 1, ....

=

0, 1, . .. we get

l'.'here

r n = In+!

+ In+2 + ....

Now ro + r1 . • . = fl is the expectation of the distributions. An invar-iant probability distribution is gilJen by Un r nip; if p; < 00; no such prob ability distribution exists when fl = 00. 1t will be recalled that our Markov chain is connected with a recurrent event e with recurrence time distribution {/k}. In the special case Pk = rk/r k- 1 the chain of the preceding example is connected with the same recurrent event e and in this case (7.20) and (7.23) are equivalent. Hence the invariant distributions are the same. In the language of queuing theory one should say that the spent waiting time and the residual waiting time tend to the same distribution, namely {r n/fl}.

XV.S]

399

TRANSIENT STATES

We derived the basic limit theorems for Markov chains from the theory of recurrent events. We now see that, conversely, recurrent events could be treated as special Markov chains. [See also example (Il.d).] (h) Doubly stochastic matrices A stochastic matrix P is called doubly stochastic if not only the row sums but also the column sums are unity. If such a chain contains only a finite number, a, of states then Uk = a- 1 represents an invariant distribution. This means that in macroscopic equilibrium all states are equally probable. ~

8. TRANSIENT STATES We saw in section 6 that the persistent states of any Markov chain may be divided into non-overlapping clOSed in educible sets C 1 , C 2 , • • •• In general there exists also a non-empty class T of transient states. When the system starts from a transient state two contingencies arise: Either the system ultimately passes into one of the closed sets C v and stays there forever, or else the system remains forever in the transient set T. Our main problem consists in determining the corresponding probabilities. Its solution will supply a criterion for deciding whether a state is persistent or transient. Examples. (a) Martingales. A chain is called a martingale if for every j the expectation of the probability distribution {Pik} equals j, that is, if

(S.l)

°

Consider a finite chain with states Eo, ... , Ea. Letting j = and i - a in (S I) we see that 10 66 - Laa 0 I ) and so Eo and Ea are absorbJ ing. To avoid trivialities we assume that the chain contains no further closed sets. It follows that the interior states E 1 , . •• , E a - 1 are transient, and so the process will ultimately terminate either at Eo or at Ea. From (S.1) we infer by induction that for all n (8.2)

But p~~)

-+

°

for every transient E k , and so (S.2) implies that for all

i>O (S.3) In other words, if the process starts with Ei the probabilities of ultimate absorption at Eo and E'J, are 1 - i/a and i/a, respectively.

400

[xv.s

MARKOV CHAINS

(b) Special cases. The chains of the examples from genetics (2.h) and (2.i) are of the form discussed in the preceding example with a = Nand a = 2N, respectively. Given the initial state E i , the probability of ultimate fixation at Eo is therefore 1 jla (c) Consider a chain with states Eo, E 1, . .. such that Eo is absorbing while from other states E j transitions are possible to the right neighbor Ej+1 and to Eo, but to no other state. For j > 1 we put (S.4)

where at Eo

1

P;o Ej

III

> 0.

With the initial state E j the probability of no absorption n trials equals

(8.5)

This product decreases with increasing n and hence it approaches a limit . Aj . We infer that the probability of ultimate absorption equals 1 - Aj while with probability Ai the system remains forever at transient states. In order that A.i > it is necessary and sufficient that .2 Ek < 00. ~

°

The study of the transient states depends on the submatrix of P obtained by deleting all rows and columns corresponding to persistent states and retaining only the elements Pjk for which both E; and Ek are transient. The row sums of this submatrix are no longer unity, and it is convenient to introduce the Definition. A square matrix Q with elements qik is substochastic qik > and all row sums are < l.

°

if

In the sense of this definition every stochastic matrix is sub stochastic and, conversely, every sub stochastic matrix can be enlarged to a stochastic matrix by adding an absorbing state Eo. (In other words, we add a top row 1, 0, 0, . .. and a column whose elements are the defects of the rows of Q.) It is therefore obvious that what was said about stochastic matrices apphes WIthout essential change also to substochastic matrices. In particular, the recnrsion relation (3.2) defines the nth power Qn as the matrix with elements (S.6)

q lk~n+1)

= £.. '"

q. q(n) lV

Vk·

v

Denote by for n > 1 (S.7)

the sum of the elements

III

a~n+1) = '" q. a(n) l

£"lVV'

v

the rth row of Qn. Then

XV.8]

TRANSIENT STATES

401

and this relation remains valid also for n = 0 provided we put a~O) = I for all v. The fact that Q is substochastic means that a?) < a~O), and from (8.7) we see now by induction that a~n+1) < a: n ). For fixed i therefore the sequence {a: n )} decreases monotonically to a limit a i > 0, and clearly (8.8) The whole theory of the transient states depends on the solutions of this system of equations. In some cases there exists no non-zero solution (that is, we have a i = 0 for all 0. In others, there may exist infinitely many linearly independent solutions, that is, different sequences of numbers satisfying (8.9) Our first problem is to characterize the particular solution {a i }. We are interested only in solutions {x.} such that < x.t < 1 for all j This t can be rewritten in the form 0 < Xi < a~O); comparing (8.9) with (8.7) we see inductively that x~ < a~n) for all n, and so

°

(8.10)

implies

The solution {a i } will be called maximal, but it must be borne in mind that in many cases U i 0 for all i. \Ve summarize this result in the

We now identify Q with the submatrix of P obtained by retaining only the elements Pik for which E j and Ek are transient. The linear system (8.9) may then be written in the form (8.11 ) '1.'

the summation extending only over those v for which E v belongs to the class T of transient states. With this identification a~n) is the probability that, with the initial state E i , no transition to a persistent state occurs during the first n trials. Hence the limit (Jl equals the probability that no such transition ever occurs. We have thus

Theorem 1. The probabilities Xi that, starting from Ei that the system stays forever among the transient states are given by the maximal solution of(8.11).

402

[XV.8

MARKOV CHAINS

The same argument leads to the Criterion. In an irreducible 12 Markov chain with states Eo, E 1, the state Eo is persistent if, and only if, the linear system

..•

00

(8.12)

Xi

=

.2 Pivxv,

i

>1

v=l

cdmits of no solution with 0

< Xi

::;;

1 except

Xi

= 0

for all i.

Proof. We identify the matrix Q of the lemma with the sub matrix of P obtained by deleting the row and column corresponding to Eo. The argument used for theorem 1 shows that (Ji is the probability tha t (with Ei as initial state) the system remains forever among the states E 1 , E 2 , •••• But jf Eo is persistent the probability Jio of reaching Eo equals 1, and hence (Ji = 0 for all i. ~

Examples. (d) As in example (7.b) we consider a chain with states Eo, Eb . .. such that (8.13)

Pik

when

= 0

Ik - jl >

1.

To avoid trivialities we assume that Pi,H1 ¥: 0 and Pi,i-1 :;:e O. The chain is irreducible because every state can be reached from every other state. Thus all states are of the same type, and it suffices to test the character of Eo. The equations (8.12) reduce to the recursive system (8.14)

j

> 2.

Thus (8.15)

Xj

_

-

X H1 -

P21P32 . . . Pi,i-1

(x - x) 1

2 .

P23P34 . . . Pi,H1

Since PlO > 0 we have X 2 - Xl > 0, and so a bounded non-negative solution {Xj} exists if, and only if, (8.16)

!

P21 . . • Pj,j 1

<

00.

P23 . . . Pj,j+!

The chain is persistent if, and only if, the series diverges. In the special case of random walks we have Pi.i+! = P and Pi,j-1 = q for all j > 1, and we see again that the states are persistent if, and only if, P < fj. 12 Irreducibility is assumed only to avoid notational complications, }t represents no restriction because it suffices to consider the closure of Eo, Incidentally, the criterion applies also to periodic chains.

XV.S]

TRANSIENT STATES

403

(This chain may be interpreted as a random walk on the line with probabilities varying from place to place.) (e) For the matrix of example (2./) the equations (S.12) reduce to (S.17) and a bounded positive solution exists if, and only if, the infinite product

PIP2 . .. converges. If the chain is associated with a recurrent event 0, the Pk are given by (2.5) and the product converges if, and only if,

2.ii < 00.

Thus (as could be anticipated) the chain and 0 are either both ~ transient, or both persistent.

To answer the last qlJestion propo~ed at the beginning of this ~ection, denote again by T the class of transient states and let C be any closed set of persistent states. (It is not required that C be irreducible.) Denote by Yi the probability of ultimate absorption in C, given the initial state E i • We propose to show that the Yi satisfy the system of inhomogeneous equatIOns

(S.IS)

Yi

= 2. PivYv + 2. Piv, T

C

the summations extending over those 'V for which Ev E T and Ev E C, respectively The system (8 18) may admit of several independent sollltions, but the fOllowing proof will show that among them there exists a minimal solution defined in the obvious manner by analogy with (S.10). Theorem 2.

The probabilities Yi of ultimate absorption in the closed . :y b he f mllllmUt .. 1 non-neg a twe . sum 1 t· r E8 18} . UI e gwen IOn OJ.

. set e pelslstent

Proof. Denote by y~n) the probability that an absorption in C takes place at or before the nth step. Then for n > 1 clearly

(S.19)

Y~ n+1) =

2. PivY~ n) + 2. Piv T

C

and this is true also for n = 0 provided we put Y~O) = 0 for all 'V. For fixed i the sequence {y~n)} is non-decreasing, but it remains bounded by 1. The limits obviously satisfy (S.IS). Conversely, if {Yi} is any nonnegative solution of (8.18) w@ have Yz > ll becau~e the ~econd SlJm in (S.IS) equals y~l). By induction Yi > y~n) for all n, and so the limits of y~n) represent a minimal solution. ~

yi

For an illustration see example (c).

404

MARKOV CHAINS

* 9.

[XV.9

PERIODIC CHAINS

Periodic chains present no difficulties and no unexpected new features. They were excluded in the formulation of the main theOleIIl in section 7 only because they are of secondary interest and their description requires disproportionately many words. The discussion of this section is given for the sake of completeness rather than for its intrinsic interest. The results of this section will not be used in the sequel. The simplest example of a chain with period 3 is a chain with three states in which only the transitions El ~ E2 ~ E3 ~ El are possible. Then

n

.L

( 0 0)

n2

010

.L

001 We shall now show that this example is in many respects typical. Consider an irreducible chain with finitely or infinitely many states E 1 , E 2 , • • •• By theorem 6.1 all states have the same period t (we assume t > 1). Since in an irreducible chain every state can be reached from every other state there exist for every state Ek two integers a and b such that (al > 0 and p(bl > 0 But p(a+bl > p(aln(bl and so a + b must be Plk kl· 11 lkrkl divisible by the period t. Keeping b fixed we conclude that each integer a for which pi~l > 0 is of the form tX + vt where tX is a fixed integer with 0 < tX < t. The integer tX is characteristic of the state Ek and so all states can be divided into t mutually exclusive classes Go, ... , G t - 1 such that (9.1)

unless

n -

tX

+

vt

We imagine the classes Go, ... , G t- 1 ordered cyclically so that G t- 1 is the left neighbor of Go. It is now obvious that one-step transitions are possible only to a state in the neighboring class to the right, and hence a path of t steps leads always to a state of the same class. This implies that in the Markov chain WIth tranSItIOn matrIx pt each class Ga, forms a closed set. 13 'I filS 13 When t = 3 there are three classes and with the symbolic partitioning introduced in section 4 the matrix P takes on the form

where A represents the matrix of transition probabilities from Go to Gh and so on.

XV.9]

405

PERIODIC CHAINS

set is irreducible because in the original chain every state can be reached from any other state and within the same class the required number of steps is necessarily divisible by t. We have thus proved the Theorem. ln an zrreduczble cham wzth perzod t the states can be dwzded into t mutually exclusive classes Go, ... , Gt-l such that (9.1) holds and a one-step transition always leads to a state in the right neighboring class (in particular,jrom G t- 1 to Go). In the chain with matrix pt each class Ga. correvponds to an irreducible closed set

Using this theorem it is now easy to describe the asymptotic behavior of the transition probabilities p}~). We know that p}~) -- 0 if Ek is either transient or a persistent null state, and also that all states are of the same type (section 6) We need therefore consider only the case where each state Ek has a finite mean recurrence time flk' Relative to the chain with matrix pt the state Ek has the mean recurrence time flklt, and relative to this chain each class Ga. is ergodic. Thus, if E j belongs to Ga. tI

(9.2) otherwise

and the weights tl flk define a probability distribution on the states of the class G a. (see the theorem of section 7). Since there are t such classes the numbers Uk = I/flk define a probability distribution on the integers as was the case for aperiodic chains. Vie show that this distribution is in variant. For this purpose we need relations corresponding to (9.2) when the exponent is not divisible by the period t. We start from the fundamental relation (9.3)

p~Plp (nt)

= " p (,nHPl 1k k

1v

Vk'

v

The factor p}~) vanishes except when Ev is in G a.+P' (When tX + fJ > t read G a,+p-t for G a.+P') In this case p~~t) vanishes unless Ek is in G a.+P' and hence for fixed P and E j in G a. t

I

(9.4) otherwise.

We now rewrite (9.3) in the form (9.5)

lnt+l) p ik

= £.. "

p~nt)p lV

vk'

v

Consider an arbitrary state Ek and let G p be the class to which it belongs. Then pvk = 0 unless Ev E G p - 1 , and so both sides in (9.5) vanish unless

406 Ei

E

MARKOV CHAINS

Gp-

I'

In this case

pi~t+1)

[XV.lO

-- tUk whence

(9.6) v

Since Ek is an arbitrary state we have proved that the probability distribution {Uk} is invariant.

10. APPLICATION TO CARD SHUFFLING A deck of N cards numbered 1, 2, ... , N can be arranged in N! different orders, and each represents a possible state of the system. Every particular shuffling operation effects a transition from the existing state into some other state. For example, "cutting" will change the order (1,2, ... ,N) into one of the N cyclically equivalent orders (r, r+ 1, ... , N, 1, 2, ... , r - 1). The same operation applied to the inverse order (N, N - 1, ... ,1) will produce (N - r + 1, N - r, ... ,1, N, N - 1, ... , N - r + 2). In other words, we conceive of each particular shuffling operation as a transformation E, ~ Ek • If exactly the same operation is repeated, the system will pass (starting from the given state E j ) through a well-defined succession of states, and after a finite number of steps the original order will be re-established. From then on the same succession of states will recur periodically. For most operations the period will be rather small, and in no case can all states be reached by this procedure. I4 For example, a perfect "lacing" 'Nould change a deck of 2m; cards from (1, ... ,2m) into (1, m+ 1,2, m+2, ... , m, 2m). With six cards four applications of this operation will re-establish the original order. With ten cards the initial order will reappear after six operations, so that repeated perfect lacing of a deck of ten cards can produce only six out of the 10! = 3,628,800 possible orders. In practice the player may wish to vary the operation, and at any rate, accidental variations will be introduced by chance. We shall assume that we can account for the player's habits and the influence of chance variations by assuming that every particular operation has a certain probability (possibly zero). We need assume nothing about the numerical values of these probabilities but shall suppose that the player operates without regard to the past and does not know the order of the cards. I5 This implies that the successive operations correspond to indeperldent trials with fixed probabilities; for the actual deck of cards we then have a Markov chain. 14 In the language of group theor~T this amounts to saying that the permutation I group is not cyclic and can therefore not be generated by a single operation. 15 This assumption corresponds to the usual situation at bridge. It is easy to devise more complicated shuffling techniques in which the operations depend on previous operations and the final outcome is oot a Markoy chaio [cf example (13 e)J

XV.II]

INVARIANT MEASURES.

RATIO LIMIT THEOREMS

407

We now show that the matrix P of transition probabilities is doubly stochastic [example (7.h)]. In fact, if an operation changes a state (order of cards) E j to ETc' then there exists another state Er which it will change into E; This means that tbe elements oUbe Jhh column of P are identical with the elements of the jth row, except that they appear in a different order. All column sums are therefore unity. It follows that no state can be transient. If the chain is irreducible and aperiodic, then in the limit all states become equally probable. In other words, any kInd of shufflIng Will do, prOVided only that It produces an irreducible and aperiodic chain. It is safe to assume that this is usually the case. Suppose, however, that the deck contains an even number of cards and the procedure consists in dividing them equally into two parts and ~huffiing them ~eparately by any method. If the two part~ are put together in their original order, then the Markov chain is reducible (since not every state can be reached from every other state). If the order of the two parts is inverted, the chain will have period 2. Thus both contingencies can arise in theory, but hardly in practice, since chance precludes perfect regularity. It is seen that continued shuffling may reasonably be expected to produce perfect "randomness" and to eliminate all traces of t~e original order . .. It should be noted, however, that the number of operations required for this purpose is extremely large. I6 *11. INVARIANT MEASIJRES. THEOREMS

RATIO LIMIT

In this section we consider an irreducible chain with persistent null states. Our main objective is to derive analogues to the results obtained in section 7 for chains whose states have finite mean recurrence times. i~·... n outstand ing property of such chains is the existence of an invariant (or stationary) probability distribution defined by (11.1) v

We know that no such invariant probability distribution exists when the mean recurrence times are Infimte, but we shall show that the Imear

* The next two sections treat topics playing an important role in contemporary research, but the results will not be used in this book. 16 For an analysis of unbelievably poor results of shuffling in records of extrasensory perception experIments, see W. Feller, Staflstlcal aspects oj ESP, Journal of Parapsychology, vol. 4 (1940), pp. 271-298. In their amazing A review of Dr. Feller's critique, ibid., pp. 299-319, J. A. Greenwood and C. E. Stuart try to show that these results are due to chance. Both their arithmetic and their experiments have a distinct tinge of the supernatural.

408

[XV. 11

MARKOV CHAINS

system (ILl) still admits of a positive solution {Uk} such that I Uk = 00. Such {Uk} is called an invariant (or stationary) measure. If the chain is irreducible and persistent, the invariant measure is unique up to an arbitrary norming constant. Examples. (a) Suppose that the matrix P of transition probabilities is doubly stochastic, that is, the column sums as well as the row sums are unity. Then (11.1) holds with Uk = 1 for all k. This fact is expressed by saying that the uniform measure is invariant. (b) Random walks. An interesting special case is provided by the unrestricted random walk on the line. We number the states in their natural order from - 00 to 00. This precludes exhibiting the transition probabilities in the standard form of a matrix, but the necessary ehanges of notation are obvious. If the transitions to the right and left neighbors have probabilities P and q, respectively, the system (ILl) takes on the form -00


00.

The states are persistent only if p = q =!, and in this case Uk = 1 represents the only positive solution. This solution remains valid if P ¥= q, except that it is no longer unique; a second non-negative solution is represented by Uk = (P/q)k. This example proves that an invariant measure may exist also for transient chains, but it need not be unique. We shall return to this interesting point in the next section. The invariant {Uj} measure can be interpreted intuitively if one considers simultaneously infinitely many processes subject to the same matrix P of transition probabilities. For each j define a random variable N j with a Poisson distribution with mean u j , and consider N; independent processes starting from E j • We do this simultaneously for all states, assuming that all these processes are mutually independent. It is not difficult to show that at any given time with probability one only finitely many processes will be found in any given state E k • The number of processes found at the nth step in state Ek is therefore a random variable Xkn ) and the invariance vf {Uk} implies that E{Xkn )} Uk /01 all n. (Cf. problem 29.) (c) In example (7.f) we found that an invariant probability distribution exists only if the series (7.21) converges. In case of divergence (7.20) still represents an invariant measure provided only that Uk -- 0, which is the same as P1P2 ... Pk -- O. No invariant measure exists when the product PI ... Pk remains bounded away from 0, for example, when Pk ~ 1 - (k+ 1)-2. In this case the chain is transient. (d) In example (7.g) the relations (7.23) define an invariant measure even when p = 00. ~

XV.! 1]

INVARIANT MEASURES.

409

RA TIO LIMIT THEOREMS

In ergodic chains the probabilities p}~) tend to the term Uk of the invariant probability distribution. For persistent null chains we shall prove a weaker version of this result, namely that as N -- 00 for all Ea, and Ep N

:2 p~7) n=O

(11.2)

N

:2 p1j) n-O

The sums on the left represent the expected numbers of passages, in the first N trials, through Ei and E j • Roughly speaking (11.2) states that these expectations are asymptotically independent of the initial states Ea, and E p, and stand in the same proportion as the corresponding terms of the invariant measures. Thus the salient facts are the same as in the case of ergodic chains, although the situation is more complicated. On' the other hand, periodic chains now require no special consideration. [In fact (11.2) covers all persistent chains. For an ergodic chain the numerator on the left is .... ·.Nu i .] Relations of the form (11.2) are called ratio limit theorems. We shall derive (11.2) from a stronger result which was until recently considered a more complicated refinement. Our proofs will be based on considering only paths avoiding a particular state E r . Following Chung we call the forbidden state Er taboo, and the transition probabilities to it are taboo probabilities. Definition. Let Er be an arbitrary, but fixed, state. For Ek ¥- Er and n > 1 we define rP}~) as the probability that, starting from E j , the state Ek is entered at the nth step without a previous passage through E r •

Here E j is allowed to coincide with E r • We extend this definition to Ek = Er and n = 0 in the natural way by (n)

(11.3)

rPjr

=0

n

>1

and (11.4)

(0) rPik -

1

o

In analytical terms we have for n

otherwise.

>0

and Ek ¥- Er

(11.5) v

In fact, for n = 0 the sum on the right reduces to a single term, namely When n > 1 the term corresponding to 'V = r vanishes by virtue of (11.3), and so (11.5) is equivalent to the original definition.

Pjk'

410

[XV. 1I

MARKOV CHAINS

Introducing Er as taboo state amounts to considering the original Markov process only until Er is entered for the first time. In an irreducible persistent chain the state Er is entered with probability one from any initial state E j • It follows that in the chain "'lith taboo Er the successive passages through the initial state E j form a transient recurrent event; and the passages through any other state Ek ¥:- Er form a delayed transient recurrent event. Thus for Ek ¥:- Er <Xl

"

(11.6)

(n)

£.. rPjk = rTTjk n=O

<

00

by the basic theorem 2 of XIII,3. For Ek = Er the summands with > 1 vanish and the sum reduces to 1 or according as j = r or j ¥:- r. We are now in a position to prove the existence of an invariant measure, that is, of numbers Uk satisfying (I 1. 1). This will not be used in the proof of theorem 2.

°

n

Theorem 1.

If the chain is irreducible and persistent, the numbers

(I 1. 7)

°

represent an invariant measure; furthermore Uk > for all k and Ur = I. Conversely, if Uk > for all k and (11.1) holds, then there exists a constant A such that Uk = A' rTTrk'

°

Here Er is arbitrary, but the asserted uniqueness implies that the sequences {Uk} obtained by varying r differ only by proportionality factors. Note that the theorem and its proof cover also chains with finite mean recurrence times.

Proof. If k ¥:- r we use (I1.5) with j = r. Summing over n = 0, 1, ... we get (11.8)

rTTrk

=!

rTTrvPvk'

v

and so the numbers (I 1. 7) satisfy the defining equations (I 1. 1) at least when k ¥:- r. For j = k = r it is clear that (11.9)

"p(n)p £.. r rv vr

= j(n+l) rr

v

equals the probability that (in the original chain) the first return to Er occurs at the (n + I)st step. Since the chain is irreducible and persistent these probabilities add to unity. Summing (11.9) over n = 0, 1, ... we

xv. 11]

INY ARIANT MEASURES.

RA TIO LIMIT THEOREMS

411

get therefore (11.10)

But by definition r7T'rr - I, and so (I 1.8) IS true also for k - r. Accordingly (11.7) represents an invariant measure. Next consider an arbitrary non-negative invariant measure {Uk}. It is clear from the definition (11. 1) that if Uk = 0 for some k, then U v = 0 for aI1 v sllch that n k > 0 By jnduction it [onows that u - 0 for

gIven lllvanant measure IS normed by the condItIOn ur = I for some prescribed r. Then (11.11)

=

Uk

Pkr

+ .2 UjPjk· jif=r

Suppose k ¥: r. We express the uj inside the sum by means of the deftning relation (11.1) and separate again the term involving U r in the double sum. The result is (11.12)

Proceeding in like manner \\>e get for every

(11.13)

_

Uk -

Il

+ rPrk + ... + rPrk + ~ " U . rPvk· (2)

Prk

(n)

v

(n)

v*r

Letting n ~ 00 we see that Uk > r7T'rk. It follows that {Ulc - r7T'rk} defines an invariant measure vanishing for the particular value k = r. But such ... a measure vanishes identically, and so (11.7) is true. It will be seen presently that the following theorem represents a sharpening of the ratio limit theorem.

Theorem 2.

In an irreducible persistent chain N

" (n) O< _ ~ Pkk

(11.14)

N -

n=O

,,(n)

~ Pa.k

< _ a.TT'kk

n=O

and (11.15)

-1

< _1_ ~ p~:Z) _

~ll

jTT'ii n=O

for all N.

__1_

~ p(.~) < 1

£"11-

iTT' jj n=O

412

[XV.! 1

MARKOV CHAINS

Proof of (I 1.14). Consider the first entry to E k ; it is clear that for tX¥:-k (11 16) v=1

[This is the same as (5.3).] Summing over n we get N

(11.17)

.2

N

p(n)

a.k

< .2 -

n=O

N

<Xl

p(n) •

kk

.2

(v) J

a.k

-.2

v=1

n=O

p(n)

kk

n=O

which proves the first inequality in (I 1.14). Next we note that, starting from E k , a return to Ek may occur without intermediate passage through Ea.' or else, a first entry to E(1. occurs at the vth step with 1 < v < n. This means that (11.18)

(n)

Pkk

n

=

p(n)

a. kk

+ £.. "j(v)p(n-v) ka. a.k . v=1

Summation over n leads to the second inequality in (11.14). Proof of (I 1.15). On account of the obvious symmetry in i and j it suffices to prove the second inequality. We start from the identity n-l

(11.19)

(n)

=

Pii

(n)

jPii

+" £.. Pu

(n-v)

(v)

• jP ji

v-I

which expresses the fact that a return from Ei to Ei occurs either without intermediate passage through E j , or else the last entry to E j occurs at the (n-v)th step and the next v steps lead from E j to Ei without further return to E j • Summing over n We get N

I (11.20)

p;F)

<

j7Ti i

+

N

j7Tji

n=O

I

p!j)

n=o

N

by virtue of (I 1.14). To put this inequality into the symmetric form of (I 1.15) it suffices to note that (11.21) In fact, by analogy with (I 1.16) we have (I 1.22)

XV.Il]

INY ARIANT MEASURES.

RA TIO LIMIT THEOREMS

413

where d'ji is the probability of reaching Ei from E j without a previous return to E j • The alternative to this event is that a return to E j occurs before an entry to E i , and hence (11.23)

Jji =

1-

i!j = -

1

j

.

(TI' j j

(The last equation is the basic identity for the transient recurrent event which consists in a return to E; without an intermediate passage through Ei') Substituting from (11.23) into (11.22) we get the assertion (11.21), and this accomplishes the proof. ~ The relation (I 1.21) leads to the interesting Corollary 1.

If {Uk} is an invariant measure, then

(11.24) Proof. The invariant measure is determined up to a multiplicative constant, and so the right side in (I 1.24) is uniquely determined. We may therefore suppose that {Uk} is the invariant measure ddined by (11.7) when the taboo state Er is identified with E,. But then Uj = 1 and fTTii = U i , and so (11.21) reduces to (1l.24). ~ Corollary 2. (Ratio limit theorem.) In an irreducible persistent chain the ratio limit theorem (11.2) holds. Proof. The sums of theorem 2 tend to 00 as N --00. The ratio of the two Slims in (11 14) therefore tends to unity, and so it Sllffices to prove (I 1.2) for the special choice tX = i and fJ = j. But with this choice (11.2) is an immediate consequence of (11.15) and (11.24). ~ The existence of an invariant measure for persistent chains was first proved by C Derman (1954) The existence of a limit in (11 2) was demonstrated by A Dablin (1938). Taboo probabilities as a powerful tool in the theory of Markov chains were introduced by Chung (1953). Further details are given in the first part of his basic N

treatiseY The boundedness of the partial sums

L (p~~) -

p:~))

was proved by S.

o Orey, who considered also the problem of convergence. I8 17 ldal kov chal·1IS with statiollary II allSidoll probabilit,·es, Berlin (Springer), 1960. A revised edition covering boundary theory is in preparation. (Our notations are not identical with his.) 18 Sums arising in the theory of Markov chains, Proc. Amer. Math. Soc., vol. 12 (1961), pp. 847-856.

414

[XV.l2

MARKOV CHAINS

*12. REVERSED CHAINS.

BOUNDARIES

When studying the development of a system we are usually interested in the probabilities of possible future events, but occasionally it is necessary to study the past. In the special case of a Markov chain we may ask for the (conditional) probability that at some time in the past the system was in state Ei given that the present state is E j • Consider first a chain with a strictly positive invariant probability distribution {Uk}; that is, we assume that Uk > 0 and Uk = 1 where

L

(12.1 )

[Recall from the theorem in section 7 that the invariant probability distribution of an irreducible chain is automatically strictly positive.] If the process starts from {Uk} as initial distribution, the probability of finding the system at any time in state Ei equals U i • Given this event, the conditional probability that n time units earlier the system was in state E j equals (12.2)

For

11

=

1 we get·

(12.3)

In view of (12.1) it is clear that the qij are the elements of a stochastic matrix Q. Furthermore, the probabilities q~;) are simply the elements of the nth power Qn (in other words, the q~;) can be calculated from the qii in the same manner as the pj~) are calculated from the Pij)' It is now apparent that the study of the past development of our Markov chain reduces to the study of a lvlarkov chain with transition probabilities q ij' The absolute probabilities of the new chain coincide, of course, with the invariant probability distribution {Uk}' The probabilities qii are called inverse probabilities (relative to the original chain) and the procedure leading from one chain to the other is called reversal of the time. In the special case where qij = Pij one says that the chain is reversible; the probability relations for such a chain are symmetric in time. We know that an irreducible chain possesses an invariant probability distribution only if the states have finite mean recurrence times. If the

XV.I2]

REVERSED CHAINS.

BOUNDARIES

415

states are persistent null states there exists an invariant measure which is unique except for an arbitrary multiplicative constant. For a transient chain all contingencies are possible: some chains have no invariant measure, others infinitely many [Examples (1 J h) and (1 J c)] Under these circumstances it is remarkable that the transformation (12.3) defines a stochastic matrix Q whenever {Uk} is a strictly positive invariant measure. The powers of Q are given by (12.2). In this sense every strictly positive invariant measure defines a reversed Markov chain. Unfortunately the new transItIOn probabilItIes qij cannot be mterpreted dIrectly as condItional probabilities in the old process. I9 A glance at (12.3) shows that {Uj} is an invariant measure also for the reversed chain. Furthermore it is clear from (12.2) that either both series q(n) and I n(n) converge or both diverge It follows that the stater

'L

Examples. (a) The invariant probability distribution corresponding to the Ehren/est model [example (2.e)] was found in (7.16). A simple calculation shows that the Ehren/est model is reversible in the sense that qij = PH' (b) In example (1 l.b) we found the invariant measures corresponding to a random walk on the line in which the transitions to the right and left neighbor have probabilities p and q, respectively. If we choose Uk = 1 for k = 0, + 1, ±2, ... , we get qil = Pii' and we are led to a new random walk in which the roles of p and q are interchanged. On the other hand, the invariant measure with Uk = (p/q)k yields a reversed random walk identical with the original one. (c) In examples (2.k) and (2.1) we introduced two Markov chains related to a recurrent event t;. For a persistent t; with finite mean recurrence time p, we saw in example (7.g) that the two chains have the same invariant probability distribution defined by (7.23). When p, = 00 these relations define an invariant measure common to the two chains [see examples (11.c) and (ll.d)]. A simple calculation now shows that the two chains are obtained/rom each other by reversing the time. This is not surprising seeing that the chain of (2.k) concerns the waiting time to the next occurrence of e while (2./) refers to the time elapsed from the last occurrence. ~

Consider now an arbitrary irreducible transient chain with an invariant measure {Uk}' The equations (12.1) defining an invariant measure may admit of other solutions, and the question of uniqueness is closely related 19 For an operational interpretation of the qii it is necessary to consider infinitely many simultaneous processes, as indicated in example (ll.b).

416

[XV.12

MARKOV CHAINS

with the question of uniqueness of the adjoint system of linear equations,20 (12.4) which played an important role in section 8. This system admits of the trivial solution ~i = c for all i. Any non-negative solution is automatically strictly positive. (Indeed, ~i = 0 would imply ~ v = 0 for all y such that Piv > O. This in turn would imply ~ v = 0 whenever p~;) > 0, and generally ~ v = 0 for every state Ev that can be reached from E i . Thus ~v = 0 for all y because the chain is irreducible.) If gJ is a non-constant solution then a glance at (12.3) shows that (12 5) defines an invariant measure for the reverse matrix Q. Conversely, if {Vi} stands for such a measure then (12.5) defines a positive solution of (12.4). In other words, the positive solutions of (12.4) stand in one-to-one correspondence with the invariant measures of the reversed cham 21 with matrix Q. In the modern theory of Markov chains and potentials the positive solutions {~i} and {Uk} are used to define boundaries. It is beyond the scope of this book to describe how this is done, but the following examples may give some idea of what is meant by an exit boundary.

Examples. (a) Consider a random walk on the infinite line such that from the position j #: 0 the particle moves with probability p a unit step away from the origin, and with probability q a unit step toward the origin. From the origin the particle moves with equal probabilities to + 1 or -1. We assume p > q. 20 If ~ stands for the column vector with components ~i the system (12.4) reduces to the matrix equation ~ = P~. The system (12.1) corresponds to u = uP where u is a row vector. 2t For an IrreducIble persistent chain the invariant measure is unIque up to a multiplicative constant. Since the chains with matrices P and Q are of the same type we have proved the Theorem. For an irreducible persistent chain the only non-negative solution of (12.4) is given by ~i = const. This can be proved also by repeating almost verbatim the last part of the proof of theorem 11.1. Indeed, by induction we find that for arbitrary i, r, and n

~. = [h~ll+·· ·+h~n)]~r

+ IrP~~)~v. v

For a persistent chain the expression within brackets tends to 1 while the series tends to Hence ~i = ~r as asserted.

o.

XV.I2l

REVERSED CHAINS.

417

BOUNDARIES

In the Markov chain the states are numbered from equations (12.4) take on the form

00

to

00,

and the

i> 0, ~o

(12.6)

~i

= !~1 + !~-1 = q~i+l + P~i-l

i

< 0.

Put (12.7)

'Ii

1

for

j

> 0,

for

j

< 0.

It is easily seen that ~i = 'fJz and ; i = 1 - 'fJi defines tw0 22 non-trivial solutions of the system (12.6). It follows that our chain is transient, and

so the position of the particle necessarily tends either to + 00 or to - 00. This conclusion can be reached directly from the theory of random walks. In fact, we know from XIV,2 that when the particle starts from a position i > the probability of ever reaching the origin equals (q/p)i. For reasons of symmetry a particle starting from the origin has equal probabilities to drift toward + 00 or 00, and so the probability oran ultimate drift to - 00 equals !(q/p)i. We conclude that 'fJi is the probability that, starting from an arbitrary position i, the particle ultimately drifts to + 00. The drift to - 00 has probability 1 - 'fJi. In the modern theory the situation would be described by introducing the "exit boundary points" + 00 and

°

-00.

(b) The pr eceding example is somewhat misleading by its simplicity,

and it may therefore be useful to have an example of a boundary consisting of infinitely many points. For this purpose we consider a random walk in the x,y-plane as follows. The x-coordinate performs an ordinary random walk in which the steps + 1 and -1 have probabilities p and q < p. The y-coordinate remains fixed except when the x-coordinate is zero, in which case the y-coordinate decreases by 1. More explicitly, when j ~ only the transitions (j, k) -- (j + 1, k) and (j-I, k) are possible, and they have probabilities p and q < p, respectively. From (0, k) the particle IIIOVeS with probability p to (1, k 1) and with probability q to (-1, k-I). From the theory of random walks we know that the x-coordinate is bound to tend to + 00, and that (with probability one) it will pass only finitely often through 0. It follows that (excepting an event of zero probability) the y-coordinate will change only finitely often. This means that

°

22 The most general solution is given by ~i = A + BYJi where A and Bare arbitrary constants. Indeed, these constants can be chosen so as to yield prescribed values for ~l and ~-l> and it is obvious from (12.6) that the values for ~l and ~-1 uniquely determine all ~i.

418

MARKOV CHAINS

[XV.12

after finitely many changes of the y-coordinate the particle will remain on a line y = r. In this sense there are infinitely many "escape routes to infinity," and for each initial position (j, k) we may calculate 23 the probability ~tk that the particle ultimately settles on the line y I. It is easily seen that for fixed r the probabilities ~}~k represent a solution of the system corresponding to (12.4), and that the most general solution is a linear combination of these particular solutions. Furthermore, the particular solution ~}~k is characterized by the intuitively obvious "boundary condition" that ~}~k ~ 0 as j ~ 00 except when k = r, in which case ~~r) ~ J ,r

1.

~

.

These examples are typical in the following sense. Given an irreducible transient Markov chain it is always possible to define a "boundary" such that with probability one the state of the system tends to some point of the boundary. Given a set r on the boundary we can ask for the probability r;i that, starting from the initial state E i , the system converges to a point of r. We refer to {r;J as the absorption probabilities for r. It turns out that such absorption probabilities are ahvays solutions of the linear system (12.4) and, conversely, that all bounded solutions of (12.4) are linear combinations of absorption probabilities. Furthermore, the absorption probabilities {r;J for r are given by the unique solution of (12.4) which assumes the boundary values 1 on r and the boundary values 0 on the complement of r on the boundary. We may now form a new stochastic ~atrix P with elements

"

(12.8)

Pik

=

r;k r;i

Pik -

This is the conditional probability of a transition from Ei to E,s given that the state ultimately tends to a point of r. The Markov process with matrix P may be described as obtained from the original process by conditioning on the hypothesis of an ultimate absorption in r. Since the 23 An explicit expression [or ~!~~ can be obtained [IOIII the results in XI V,2 concerning one-dimensional random walks. From an initial position i ~ 0 the probablhty that the ongm wIll be touched exactly p > 0 tImes equals (2q)P lCp q); when i 2 0 this probability equals (q/p)i(2q)P-ICp_q). The probability that the origin is never touched equals 0 for i ~ 0 and 1 - (q/p)i for i 2 o. It follows easily that for

i~O whih~

for i

~~~~ = (2q)k-r-lCp_q)

k>r

~~~~

k>r

>0 =

(q/p)i(2q)k-r-lCp_q)

~~~: = 1 - (q/p)i

and, of course, ~~~: = 0 when k

< r.

XV.! 3]

THE GENERAL MARKOV PROCESS

419

future development can never be known in advance such a conditioning appears at first sight meaningless. It is nevertheless a powerful analytic tool and has even an operational meaning for processes that have been going on for a very long time A boundary can be defined also for the matrix Q obtained by a reversal of the time. In general therefore there are two distinct boundaries corresponding to a given chain. They are called exit and entrance boundaries, respectively. Roughly speaking, the former refers to the remote future, the latter to the remote past. Time-reversed Markov chains were first considered by A. Kolmogorov. 24 The role of the solutions of (12.4) was stressed in the earlier editions of this book. Exit and entrance boundaries were introduced by W. Feller.25 His construction is satisfactory ..vhen there are only finitely many boundary points, but in general it is simpler to adapt the construction introduced by R. S. Martin in the theory of harmonic functions. This was pointed out by J. L. Doob. 26 The relativization (12.8) was introduced by Feller ;26 an analogous transformation in the theory of classical harmonic functions was defined at the same time by M. Brelot. 27

13. THE GENERAL MARKOV PROCESS

In applications it is usually convenient to describe Markov chains in terms of random variables. This can be done by the simple device of replacing in the preceding sections the symbol Ek by the integer k. The state of the system at time

n

then is a random variable X(n), which

assumes the value k with probability akn ); the joint distribution of X(n) and X(n+l) is given by p{x(n) = j, X(nH ) = k} = a!.n)Pik' and the joint distribution of (X(O), ... ,x(n)) is given by (l.!). It is also possible, and sometimes preferable, to assign to Ek a numerical value ek different from k. Wah thIS notatIOn a Markov cham becomes a speCIal stochastIc process,28 or in other words, a sequence of (dependent) random variables 29 24 Zur Theorie der Markoffschen Ketten, Mathematische Annalen, vol. 112(1935), pp. 155-160. 25 Boundaries induced by positive matrices, Trans. Amer. Math. Soc., vol. 83(1956), pp. 19-54. 26 Discrete potential theory and boundaries, J. Math. Mechanics, vol. 8(1959), pp.433-458. 27 Le probteme de Dirichlet. Axiomatique et frontiere de Martin, J. Math. Pures Appl., vol. 35(1956), pp. 297-335. 28 The terms "stochastic process" and "random process" are synonyms a"nd cover practIcally all the theory of probabIlIty from com tossmg to harmonic analYSIS. In practice, the term "stochastic process" is used mostly when a time parameter is introduced. 29 This formulation refers to an infinite product space, but in reality we are concerned only with joint distributions of finite collections of the variables.

420 .

[XV.l3

MARKOV CHAINS

(X(O), X (1) , • • • ). The superscript n plays the role of time. In chapter XVII we shall get a glimpse of more general stochastic processes in which the time parameter is permitted to vary continuously. The term "Markov process" is applied to a very large and important class of stochastic processes (with both discrete and continuous time parameters). Even in the discrete case there exist more general Markov processes than the simple chains we have studied so far. It will, therefore, be useful to give a definition of the Markov property, to point out the special condition charactenzmg our Markov chams, and, finally, to gIve a few examples of non-Markovian processes. Conceptually, a Markov process is the probabilistic analogue of the processes of classical mechanics, where the future development is completely determined by the present state and is independent of the way in which the present state has developed. These processes differ essentially from processes with aftereffect (or hereditary processes), such as occur in the theory of plasticity, where the whole past history of the system influences its future. In stochastic processes the future is not uniquely determined, but we have at least probabIlIty relatIOns enablmg us to make predIctIOns. For the Markov chains studied in this chapter it is clear that probability relations relating to the future depend on the present state, but not on the manner in which the present state has emerged from the past. In other words, if two independent systems subject to the same transition probabilities happen to be in the same state, then all probabilities relating to their future developments are IdentIcal. ThIS IS a rather vague descnptIOn which is formalized in the following Definition. A sequence of discrete-valued random variables is a Markov process if, corresponding to every finite collection of integers nl < n2 < < ... < nr < n, the jomt distribution of (X(al ), X(n 2 ), • • • , X(Jt r ), x(a)) IS defined in such a way that the conditional probability of the relation X( n) = X on the hypothesiS X(nl) = Xl' . . . , X(nr) = xr is identical with the conditional probability of X(n) = X on the single hypothesis X(nr) = x r. Here Xb , x, are arbitrarv numbers for which the hVDothesis has a 1.Doutive , .r /.1 probability. J

Reduced to simpler terms, this definition states that, given the present state X r , no additional data concerning states of the system in the past can alter the (conditional) probability of the state X at a future time. The Markov chains studied so far in this chapter are obviously Markov processes, but they have the additional proper ty that their lransition probabilities Pjk = p{x(m+l) = k X(m) = j} are independent of m. The more general transition probabilities

I

(13.1)

p~:-m) = p{x{n)

=

k

I

x{m)

= j}

(m

< n)

XV.13]

THE GENERAL MARKOV PROCESS

421

then depend only on the difference n - m. Such transition probabilities are called stationary (or time-homogeneous). For a general integral-valued Markov chain the right side in (13.1) depends on m and n. We shall denote it by Pik(m, n) so that Pik(n, n + 1) define the one-step transition probabilities. 'Instead of (1.1) we get now for the probability of the path (jo,h, ... ,jn) the expression (13.2) The proper generalization of (3.3) is obviously the identity (13.3)

pjim, n)

=

Ipjv(m, r)pvir, n) v

which is valid 'for all r with m < r < n. This identity follows directly from the definition of a Markov process and also from (13.2); it is called the Chapman-Kolmogorov equation. [Transition probabilities pjk(m, n) are defined also for non-Markovian discrete processes, but for them the factor pvk(r, n) in (13.3) must be replaced by an expression depending not only on y and k, but also on j.] The Markov chains studied in this chapter represent the general timehomogeneous discrete Markov process. We shall not dwell on the timeinhomogeneous' Markov process. The following examples may be helpful for an understanding of the ~4:arkov property and will illustrate situations when the Chapman-Kolmogorov equation (13.3) does not hold. Examples of Non-Markovian Processes (a) The Polya urn scheme [example V,(2.c)]. Let X(n) equal 1 or 0 according to whether the nth drawing results in a black or red ball. The sequence {x(n)} is not a Markov process. For example,

I

P{X(3) = 1 X(2) = I} = (b+c)/(b+r+c), but P{X(3)

11 X(2)

1, X(1)

I}

(h+2c)/(h+l +2c).

(Cf. problems V, 19-20.) On the other hand, if y(n) is the number of black balls in the urn a time n, then {y(n)} is an ordinary Markov chain with constant transition probabilities. (b) Higher sums. Let Yo, Y1 , . . . be mutually independent random variables, and put Sn = Yo + ... + Yn' The difference Sn - Sm (with m < n) depends only on Y m+l, . . • , Y n> and it is therefore easily seen that the sequence {Sn} is a Markov process. Now let us go one step

422

MARKOV CHAINS

[XV.I3

further and define a new sequence of random variables Un by

The sequence {Un} forms a stochastic process whose probability relations can, in principle, be expressed in terms of the distributions of the Y k • The {Un} process is in general not of the Markov type, since there is no reason why, for example, P{Un = 0 U n - l = a} should be the same as 0 Un- l U, U n - 2 b}, the knowledge of U n - l and U n _ 2 P{Un permits better predictions than the sole knowledge of Un-I. In the case of a continuous time parameter the preceding summations are replaced by integrations. In diffusion theory the Yn play the role of accelerations; the Sn are then velocities, and the Un positions. If only positions can be measured, we are compelled to study a non-Markovian process, even though it is indirectly defined in terVls of a Markov process. (c) Moving averages. Again let {Y n } be a sequence of mutually independent random variables. Moving averages of order r are defined by X(n) (Y n+Yn+1+·· ·+yn+r-l)!r. It is easily seen that the x(n) are not a Markov process. Processes of this type are common in many applications (cf. problem 25). (d) A traffic problem. For an empirical example of a non-Markovian process R. Fiirth 30 made extensive observations on the number of pedestrians on a certain segment of a street. An idealized mathematical model of this process can be obtained in the following way. For simplicity We assume that all pedestrians have the same speed v and consider only pedestrians moving in one direction. We partition the x-axis into segments 11 ,12 , ••• of a fixed length d and observe the configuration of pedestrians regularly at moments dlv time units apart. Define the random variable Y k as the number of pedestrians initially in Ik • At the nth observation these same pedestrians will be found in I k- m whereas the interval Ik will contain YHn pedestrians. The total number of pedestrians within the interval 0 < x < Nd is therefore given by X(n) = Yn+1 + ... + Yn+N' and so our process is essentially a moving average process. The simplest model for the random variables Y's is represented by Bernoulli trials. In the limit as d -- 0 they lead to a continuous model, in which a Poisson distribution takes over the role of the binomial distribution. (e) Superposition of Markov processes (composite shuffling). There exist many technical devices (such as groups of selectors in telephone exchanges, counters, filters) whose action can be described as a superposition of two Markov processes with an output which is non-Markovian. A fair idea

I

I

30 R. Furth, Schwankungserscheinungen in der Physik, Sammlung Vieweg, Braunschweig, 1920, pp. 17ff. The original observations appeared in Physikalische Zeitschrift, vols. 19 (1918) and 20 (1919).

XV.I3]

THE GENERAL MARKOV PROCESS

423

of such mechanisms may be obtained from the study of the following method of card shuffling. In addition to the target deck of N cards we have an equivalent auxiliary deck. and the usual shuffling technique is applied to this auxiliary deck If its cards appear in the order (aI' a 2 , ••• ,aN)' we permute the cards of the target deck so that the first, second, ... , Nth cards are transferred to the places number aI, a 2 , ••• ,aN. Thus the shuffling of the auxiliary deck indirectly determines the successive orderings of the target deck. The latter form a stoclzastlc process which IS not oj the Markov type. 10 prove thIS, It suffices to show that the knowledge of two successive orderings of the target deck conveys in general more clues to the future than the sole knowledge of the last ordering. We show this in a simple special case. I et N - 4, and suppose that the auxiliary deck is initially in the order (2431). Suppose, furthermore, that the shuffling operation always consists of a true "cutting," that is, the ordering (aI' a 2 , a 3 , a 4 ) is changed into one of the three orderings (a 2 , a 3 , a4 , a l ), (a 3 , a4 , aI' a 2 ), (a 4 , a b a 2 , a3 ); we attribute to each of these three possibilities probability 1. With these conventlOns the aUXIlIary deck wIll at any tIme be ill one of the four ordenngs (2431), (4312), (3124), (1243). On the other hand, a little experimentation will show that the target deck will gradually pass through all 24 possible orderings and that each of them will appear in combination with each of the four possible orderings of the auxiliary deck. This means that the ordering (1234) of the target deck will recur infinitely often, and it will always be succeeded by one of the four ordenngs (4132), (3421), (2314), (1243). Now the auxiliary deck can never remain in the same ordering, and hence the target deck cannot twice in succession undergo the same permutation. Hence, if at trials number n - 1 and n the orderings are (1234) and (1243), respectively, then at the next trial the state (1 2 34 ) is impossible. Thus two consecutive observations convey more information than does one single observation. (f) A non-Markovian process satisfying the Chapman-Kolmogorov equation. The identity (3.3) was derived from the assumption that a transition from Ev to Ek does not depend on the manner in which the state Ev was reached. Originally it seemed therefore intuitively clear that no non-Markovian process should satisfy this identity: this conjecture seemed supported by the fact that the n-step transition probabilities of such a process must satisfy a host of curious identities. It turned out nevertheless that exceptions exist (at least in theory). In fact, in IX,1 we encountered an infinite sequence of pairwise independent identically distributed random variables assuming the values 1,2, and 3 each with probability t. We have thus a process with possible states 1,2, 3 and such that Pjk - t for all combmauons of j and k. I he mdenUty (3.3) IS therefore trivially satisfied with p~;) = t. The process is nonetheless non-Markovian. To see this suppose that the first step takes the system to the state 2. A transition to 3 at the next step is then possible if, and only if, the initial state was 1. Thus the transitions following the first step depend not only on the present state but also on the initial state. (For tarious modifications see the note and footnote 3 in IX,I.)

424

[xv. 14

MARKOV CHAINS

14. PROBLEMS FOR SOLUTION 1. In a sequence of Bernoulli trials we say that at time n the state E1 is observed if the trials number n - 1 and n resulted in SS. Similarly E 2 , E 3 , E4 stand for SF, FS, FF. Find the matrix P and all its powers. Generalize the scheme. 2. Classify the states for the four chains whose matrices P have the rows given below. Find in each case p2 and the asymptotic behavior of p(~). (a) (0, t, t), (t, 0, t), (t, t, 0); , (b) (0, 0, 0, 1), (0, 0, 0, 1), t, 0, 0), (0, 0, 1,0); (c) (t, 0, t, 0, 0), (i, t, i, 0, 0), (t, 0, t, 0, 0), (0, 0, 0, t, t), (0, 0, 0, t, t); (d) (0, t, t, 0, 0, 0), (0, 0, 0, 1, 1, 1), (0,0,0,1,1,1), (1,0,0,0,0,0), (1,0,0,0,0, 0), (1, 0, 0, 0, 0,0). 3. We consider throws of a true die and agree to say that at epoch n the system IS m state Ei If j IS the hIghest number appearmg m the first n throws. Fmd the matrix pn and verify that (3.3) holds. 4. In example (2.j) find the (absorption) probabilities X k and Yk that, starting from E k, the system will end in E1 or E 5 , respectively (k = 2, 3, 4, 6). (Do this problem from the basic definitions without referring to section 8.) S. Treat example I, (S.b) as a Markov chain. Calculate the probability of winning for each player. 6. Let Eo be absorbing (that is, put Poo = 1). For j > let pjj = p and Pj,i-1 = q, where p + q = 1. Find the probability fjnJ that absorption at Eo takes place exactly at the nth step. Find also the expectation of this distribution. 7. The first row of the matrix P is given by lIo, lib • • •• For j > 0 Vie haye (as in the preceding problem) PH = p and PU-1 = q. Find the distribution of the recurrence time for Eo. 8. For j = 0, 1, ... let Pi,H2 = Vj and PjO = 1 - Vj. Discuss the character of the states. 9. Two reflecting barners. A chain with states 1, 2, ... ,p has a matrix whose first and last rows are (q,p, 0, ... ,0) and (0, ... ,0, q,p). In all other rows Pk,k+1 = P, Pk,k-1 = q. Find the stationary distribution. Can the chain be periodic? 10. Generalize the Bernoulli-Laplace model of diffusion [example (2.f)] by assuming that there are b z p black particles and w 2p b white ones. The number of particles in each :ontainer remains = p. 11. A chain with states Eo, E 1, . .. has transition probabilities

ct,

°

where the terms in the sum should be replaced by zero if v (n)

_A/a

Pjk -e

()../q)k

kr·

> k.

Show that

XV. 14]

425

PROBLEMS FOR SOLUTION

Note: This chain occurs in statistical mechanics31 and can be interpreted as follows. The state of the system is defined by the number of particles in a certain region of space. During each time interval of unit length each particle has probability q to leave the volume, and the particles are stochastically independent. Moreoyer, nevI particles may enter the volume, and the prob ability of r entrants is given by the Poisson expression e-).)t/rL The stationary distribution is then a Poisson distribution with parameter )../q. 12. Ehrenfest model. In example (2.e) let there initially be j molecules in the first container, and let x(n) = 2k - a if at the nth step the system is in state k (so that x(n) is the difference of the number of molecules in the two containers) Let en = E(x(n». Provethat en+! = (a-2)e n/a, whence en = (l-2/a)n(2j-a). (Note that en ~ 0 as n ~ 00.) 13. Treat the counter problem, example XIII, (l.g), as a Markov chain. 14. Plane random walk with reflecting barriers. Consider a symmetric random walk in a bounded region of the plane The boundary is reflecting in the sense that, whenever in a unrestricted random walk the particle would leave the region, it is forced to return to the last position. Show that, if every point of the region can be reached from every other point, there exists a stationary distribution and that Uk = l/a, where a is the number of positions in the region. (If the region is unbounded the states are persistent null states and Uk = 1 represents an invariant measure.) 15. Repeated averaging. Let {Xl' X 2 , ••• } be a bounded sequence of numbers and P the matrix of an ergodic chain. Prove that 2,p~j)Xj ~ L.UjXj. Show i

that the repeated averaging procedure of example XIII, (lO.c) is a special case. 16. In the theory of waiting lines we ecounter the chain matrix

Po PI P2 P3 Po PI P2 P3

o ....

Po PI P2

o

n.

~u

n.

_

£~

where {Pk} is a probability distribution. Using generating functions, discuss the character of the states. Find the generating function of the stationary distribution, if any. 17. Waiting time to absorption. For transient E j let Y j be the time when the system for the first time passes into a persistent state. Assuming that the probability of staying forever in transient states is zero, prove that d j - E(Y;) is uniquely determined as the solution of the system of linear equations d j = 2.PjVdV

+ 1,

T

the summation extending over all need not be finite.

l'

such that Ev is transient

However, d.

31 S. Chandrasekhar, Stochastic problems in physics and astronomy, Reviews of Modern Physics, vol. 15 (1943), pp. 1-89, in particular p. 45.

426

[XV. 14

MARKOV CHAINS

18. If the number of states is a < 00 and if Ek can be reached from E" then it can be reached in a - I steps or less (i =;l: k). 19. Let the chain contain a states and let E j be persistent. There exists a number q < 1 such that for n z a the probability of the recurrence time of E j exceedmg n IS smaller than qn. (Hint: Use problem 18.) 20. In a finite chain E j is transient if and only if there exists an Ek such that Ek can be reached from E j but not E j from E k. (For infinite chains this is false, as shown by random walks.) 21. An irreducible chain for which one diagonal element Pii is positive cannot be periodic. 22. A finite irreducible chain is non-periodic if and only if there exists an n such that p}~) > 0 for all j and k. 23. In a chain with a states let (xl, ... , xa) be a solution of the system of linear equations Xj - 'Lp;.x. Prove' (a) If Xj < 1 for all j then the states for which Xr = 1 form a closed set. (b) If E j and Ek belong to the same irreducible set then Xj = Xk' (c) In a finite irreducible chain the solution {Xj} reduces to a constant. Hint: Consider the restriction of the equations to a closed set. 24 Continuation If (Xl' • xJ is a (complex valued) solution of Xj = s2.PjVXV with lsi = 1 but s =;l: 1, then there exists an integer t > 1 such that st = 1. If the chain is irreducible, then the smallest integer of this kind is the period of the chain. Hint: Without loss of generality assume Xl = 1 IXvl. Consider successively the states reached in 1, 2, ... steps. 25. Moving averages. Let {Yk} be a sequence of mutually independent random variables, each assuming the values ± 1 "'lith probability ~. Put x(n) = (Yn + YnH )/2. Find the transition probabilities

z

pjim, n)

= p{x(n) =

k

I

x(m)

=

j},

where m < If and j, k 1, 0, 1. Conclude that (x(nl) is not a Markov process and that (13.3) does not hold. . 26. In a sequence of Bernoulli trials say that the state EI is observed at time n if the trials number n - 1 and n resulted in success; otherwise the system is in E 2 • Find the n-step transition probabilities and discuss the nonMarkovian character. Note: This process is obtained from the chain of problem 1 by lumping together three states. Such a grouping can be applied to any Markoy chain and destroys the Markovian character. Processes of this type were studied by Harris.32 27. Mixing of Markov chains. Given two Markov chains with the same number of states, and matrices PI and P 2 • A new process is defined by an initial distribution and n-step transition probabi1ities ~:Pt + ~:p2n. Discuss the non-Markovian character and the relation to the urn models of Y, 2. 32 T. E. Harris, On chains of infinite order, Pacific Journal of Mathematics, vol. 5 (1955), Supplement 1, pp. 707-724.

XV.14]

PROBLEMS FOR SOLUTION

427

28. Let N be a Poisson variable with expectation A. Consider N independent Markov processes starting at Eo and having the same matrix P. Denote by Zkn) the number among them after n steps are found in state E k • Show that z~n) has a Poisson distribution with expectation A' p~~). Hmt: Use the result of example XII,(I.b). 29. Using the preceding problem show that the variable Xkn ) of example (11.b) has a Poisson distribution with expectation LUiP}~) = Uk'

CHAPTER XVI*

Algebraic Treatment of Finite Markov Chains

In this chapter we consider a Markov chain with finitely many states E 1 , ••• , Ep and a given matrix of transition probabilities Pik. Our main aim is to derive explicit formulas for the n step transition probabilities pj~). We shall not require the results of the preceding chapter, except the general concepts and notations of section 3. We shall make use of the method of generating functions and shall obtain the desired results from the partial fraction expansions ofXI,4. Our results can also be obtained directly from the theory of canonical decompositions of matrices (which in turn can be derived from our results). ~{oreover, for finite chains the ergodic properties proved in chapter XV follow from the results of the present chapter. However, for simplicity, we shall slightly restrict the generality and disregard exceptional cases which complicate the general theory and hardly occur in practical examples. The general method is outlined in section 1 and illustrated in sections 2 and 3. In section 4 special attention is paid to transient states and absorption probabilities. In section 5 the theory is applied to finding the variances of the recurrence times of the states E j • 1. GENERAL THEORY For fixed j and k we introduce the generating function 1 00

(1.1)

Pikes) -

L p~.~)sn n=Q

* This chapter treats a special topic and may be omitted.

. Recall that p~~) equals 0 or 1 according as j =;l: k or j = k. (The p~~) are known as Kronecker symbols.) 1

428

XV!. 1]

429

GENERAL THEORY

Multiplying by SPi1 and adding over j = 1, ... , p we get P

(1.2)

s 2,PiiP1k(S) = Piis) - p~~). I

j

This means that for fixed k and s the quantities Zi system of a linear equations of the form

= PikeS)

satisfy a

p

(1.3)

Zi -

S

2, PiiZi = bi·

1=1

The solutions Zj of (1.3) are obviously rational functions of s with a common denominator D(s), the determinant of the system. To conform wIth the standard notatIOns of linear algebra we put s = t- 1 • Then t P D(t-1) is a polynomial of degree p (called the characteristic polynomial of the matrix P of transition probabilities Pjk). Its roots tJ, ... , tp are called the characteristic roots (or eigenvalues) of the matrix P. We now introduce the simplifying assumptions that the characteristic roots tJ, ... , tp are simple (distinct) and 2 ¢O. This is a slight restriction of generality, but the theory will cover most cases of practical interest. As already stated, for fixed k the p quantities PikeS) are rational functions of s with the common denominator D(s). The roots of D(s) are given by the reciprocals of the non-vanishing characteristic roots tv. It follows therefore from the results of XI,4 that there exist constants b(p)

(1)

(1.4)

PikeS)

b ik

=

1 - st1

+ ... + - -1k 1 - st p

Expanding the fractions into geometric series we get the equivalent relations (1.5)

en) _

P ik

-

b{1)t n

ik

1

+ ... + b(p)t ik p

n

valid for all integers n > o. We proceed to show that the coefficients b}~) are uniquely determined as solutions of certain systems of linear equations. The quantity p~~+1) can be obtained from (1.5) by changing n into n + 1, but also by multiplying (1.5) by Pii and summing over 2 I he condition tT =;!= 0 wIll be discarded presently. A cham wIth multiple roots is treated numerically in example (4.b). 3 In theory we should omit those roots tr that cancel against a root of the numerator. For such roots we put b~~) = 0 and so (1.4) and (1.5) remain valid under any circumstances.

430

j

=

ALGEBRAIC TREATMENT OF FINITE MARKOV CHAINS

1, ...

,p.

[XV!. 1

Equating the two expressions we get an identity of the form CI t n +I ...p + C t nP = 0

(1.6)

valid fOl all n. This is manifestly impossible unless all coefficients vanish, and we concI ude that p

" b(v) L.Pi; ;k ;=1

(1.7)

t b(v) v ik

for all combinations i, k, and v. On multiplying (I.5) by Pkr and summing over k we get in like manner p

"b(v) L. ;k Pkr -

(1.8)

t b(v) v ;r·

k 1

Consider the p by p matrix b(v) with elements b~~). The relations 4 (1.7) assert that its kth column represents a solution of the p linear equations p

'Ip t,.. x., -

(1 9)

tx·t - 0

;=1

with

t

=

tv;

similarly (1.8) states that thejth row satisfies p

(1.10)

~YkPkr - tYr

=0

k=1

wIth t = tv. The system (1.10) IS o'btamed from (I.9) by mterchangmg rows and columns, and so the determinants are the same. The determinant of (1.9) vanishes only if t coincides with one of the distinct characteristic values t 1 , • • • , tp. In other words, the two systems (I.9) and (1.10) admit of a non-trivial solution if, and only if, t - tv for some v We denote . 0 f correspon d·mg so Iu t·IOns by (Xl v) a paIr , ••• , X p(v» an d (Y1(v) ' . . . , Yp(v» • They are determined up to multiplicative constants, and so (1.11)

b (v) jk

-

c(v)x(v)y(v)

-

j

k'

where c(v) is a constant (independent of j and k). To find this unknown constant we note that (1.9) implies by induction that p

(1.12)

~ P~;)Xi ;=1

=

tnX i

for all R. We use this relation for t - t;,., where 4 is an arbitrary integer between 1 and p. When p~;) is expressed in accordance with (1.5) we 4

The two systems (1.7) and (1.8) may be written in the compact matrix form = tvb(Vl and b(VlP = tvb(Vl.

Pb(vl

XV!. 1]

431

GENERAL THEORY

find (1. 13) This represents an identity of the form (1.6) which can hold only if all coefficients vanish. Equating the coefficients of tJ: on both sides we get finall y5 (1.14)

1. k=I

This relation determines the coefficient b}~) in (1.11). It is true that the x},u and y~).) are determined only up to a multiplicative constant, but replacing x?') by Axf) and y~)..) by By~).) changes c()..) into c()..)/AB, and the coefficient b};) remains unchanged. We summarize this result as follows. The two systems of linear equations (1.9) and (I. 10) admit of non-trivial solutions only for at most p distinct values of t (the same for both systems). We suppose that there ale exactly p such values i I , . . . , lp, all different from O. To each i).. choose a non-zero solution (x~)..), ... , x1)..» of(1.9) and a non-zero solution (y~)..), ..• ,y1)..» of (1.10). With c()..) given by (1.14) we have then for n = 0,1, ... p

(1.15)

W i k )... P(ikn) _- "L. c x()")yWtn )..=1

We have thus found an explicit expression for all the transition probabilities. 6 The assumption that the characteristic roots are distinct is satisfied in most practical cases, except for decomposable chains, and these require only minor changes in the setup (see section 4). Not infrequently, however, 0 is among the characteristic roots. In this case we put tp = O. The novel feature derives from the fact that the determinant D(s) of the system (1.3) now has only the p - 1 roots t1I, ... , t;;':1' and so the generating 1. The function PikEs) is the ratio of two polynomials of degree p fJ

5

The vanishing of the other coefficients implies that ~ y~)..)x~V) = 0 whenever

A =;l: v. k=I 6 The final formula (1.15) becomes more elegant in matrix form. Let Xc)..) be the column vector (or p by 1 matrix) with elements xV), and yW the row vector (or 1 by p matrix) with elements y~)..). Then (1.15) takes on the form pn =

p

~ cC)") Xc)..)

YC)..)tl

).=1

and

cC)")

is defined by the scalar equation

cC)")

yW Xc)..)

= 1.

432

ALGEBRAIC TREATMENT OF FINITE MARKOV CHAINS

[XVI. 2

partial fraction' expansions require that the degree of the numerator be smaller than the degree of the denominator, and to achieve this we must first subtract an appropriate constant from PikeS). In this way wOe obtain for P,is) a partial fraction expansion differing from (1.4) in that theolast term is replaced by a constant. A glance at (1.15) shows that' this affects the right side only when n = O. In other words, the explicit representation (1.15) of p}~) remains valid for n > 1 even if tp = 0 (provided the roots t 1 , ••• ,tp _ 1 are distinct and different from zero). The left side in (1.15) can remain bounded for all n only if It;.1 < 1 for all A. For t = 1 the equations (1.9) have the solution xi = 1 and so one characteristic root equals 1. Without loss of generality we may put tl = 1. If the chain is aperiodic we have It;.1 < 1 for all other roots and one sees from (1.15) that as n ) 00 en) -+ C(I)y(1)

(1.16)

Pik

k

.

In other words, the invariant probability distribution is characterized as a solution 01(1.10) with t - 1. 2. EXAMPLES (a) Consider first a chain with only two states. The matrix of transition

probabilities assumes the simple form

P= (l- P <X.

P)

1-<X.

where 0 < p < 1 and 0 < <X. < 1. The calculations are trivial since they 1 involve only systems of two equations. The chalactelistic loots ale tl and t2 = (I -<X.-p). The explicit representation (l.15) for pj~) may be exhibited in matrix form 1

pn <X.

+P

(" <X.

P)

I (l_"_p)n ( p <X. + P -<X.

P

~P)

(where factors common to all four elements have been taken out as factors to the matrices). This formula is valid for n > o. (b) Let

(2.l)

p=

1

1

t

2"

1

1

XVI.2]

433

EXAMPLES

[this is the matrix of problem (2.b) in XV,I4]. The system (I.9) reduces to (2.2) To 1 0 thele corresponds the solution (1, 1, 0, 0), but we saw that the characteristic root is not required for the explicit representation of p~~) for n > 1. The standard procedure of eliminating variables shows that the other characteristic roots satisfy the cubic equation t 3 = 1. If we put for abbreviation

°

(2.3)

') ehi = cos "377"

uCJ =

+.l SIll . 377" "

(where i 2 = -1) the three characteristic roots are t I = 1, t 2 = (), and t3 = ()2 (which is the same as t3 = ()-l). We have now to solve the systems (1.9) and (1.10) with these values for t. Since a multiplicative constant remains arbitrary we may put x~V) = y~V) = -I. The solutions then coincide, respectively, with the first columns and first rows of the three matrices in the final explicit representation (2.4)

1 122 pn

=!

1 1 2 2 6 1 1 2 2

+ ()n

1

1

2()

6

()2

()2

2

°

Since we have discarded the characteristic root t = this formula is valid only for n > 1. It is obvious from (2.4) that the chain has period 3. To see the asymptotic behavior of pn we note that 1 + () + ()2 O. Using this it is easily verified that when n -+ 00 through numbers of the form n = 3k the rows of pn tend to (t, i, 0, 0). For n = 3k + 1 and n = 3k + 2 the corresponding limits are (0, 0, 0, 1) and (0, 0, 1, 0). It follows that the invariant probability distribution is given by (l, l, i, 1)· (c) Let p + q = 1, and ~

q

""0 p

(2.5)

p=

° °q ° ° ° q

p

p

....

Lt'

{\ v

n

'1

{\

~-

This chain represents a special case of the next example but is treated separately because of its simplicity. It is easily seen that the system (1.9) reduces to two linear equations for the two unknowns Xl + X3 and

434 x2

+x

(2.6)

[XVI. 2

ALGEBRAIC TREATMENT OF FINITE MARKOV CHAINS

and hence that the four characteristic roots are given by

4,

t)

=

1,

t2

=

-1,

t3

=

i(q-p),

t4

=

-i(q-p).

The conesponding solutions ale (1, 1, 1, 1), ( 1, 1, 1, 1), ( i, 1, i, 1), and (i, -1, -i, 1). [It will be noted that they are of the form ((), ()2, ()3, ()4) where () is a fourth root of unity.] The system (1.10) differs from (1.9) only in that the roles of p and q are interchanged, and we get therefore without further calculations (2.7)

p~~) =

i{l

+ (q-ptii-k-n}{l + (_l)k+i-n}.

(d) In the general cyclical random walk of example XV, (2.d) the first row of the matrix P is given by qo, . • . , qp-l and the other rows are 4 it was shown obtained by cyclical permutations. In the special case p in the preceding example that x}v) and y~v) are expressible as powers of the fourth roots of unity. It is therefore natural to try a similar procedure in terms of the pth root of unity, namely

(2.8)

All pth roots of unity are given by 1, (), we put

()2, •.• ,()P-l.

For r = 1, ... ,

p-l

(2.9)

tr =

L

qv()vr

v=O

It is easily verified that for

I -

lr

the systems (1.9) and (1.10) have the

solutions (2.10) and for the corresponding coefficients Thus finally' p~~)

(2.11)

=

c(r)

we have in all cases

c(r)

=

IIp.

p-l

p-l

L ()r(j-k)t~.

r=1 ? For n - 0 the right side in (2.11) is defined only when no lr vanishes. Actually we have proved the validity of (2.11) for n ~ 1 assuming that the roots Ir are distinct, and this is not necessarily true in the present situation. For example, if qk = p-l for all k then 10 = 1, but 11 = ... = t p-l = O. Even in this extreme case (2.11) remains valid since the right side yields for all j, k, and n ~ 1. Fortunately it is not difficult to verify (2.11) directly by induction on n. In particular, when n = 1 the factor of qv in (2.9) reduces to

0-1

L Or


This sum is zero exce·pt when j - k. + v = 0 or p, in which case each term equals one. Hence reduces to qk-J if k ~j and to qP+k-i if k <j, and this is the given matrix (Pik).

pW

XVI.2]

435

EXAMPLES

(e) The occupancy problem. Example XV, (2.g) shows that the classical occupancy problem can be treated by the method of Markov chains. The system is in state j if there are j occupied and p - j empty cells. If this is the initial situation and n additional balls are placed at random, then pj~) is the probability that there will be k occupied and p - k empty cells (so that pj~) = 0 if k < j). For j = 0 this probability follows from II, (I 1.7). We now derive a formula for pj~), thus generalizing the result of chapter II. Since Pi; - }/ p and P;,HI - (p-})/ p the system (1.9) reduces to (pt-j)Xi = (P-j)Xi+I'

(2.12)

For t = 1 this implies Xi = 1 for all j. When t ~ 1 it is necessary that = 0, and hence there exists some index r such that Xr±l = 0 but Xr ~ 0; from (2.12) it follows then that p t = r. The characteristic roots are therefore given by Xo

~r=rlp,

(2.13)

r

=

1, ... , p.

The corresponding solutions of (2.12) are given by (2.14)

so that

x\" = xjr)

=

0 when j

> r.

G) /~)

For

t

=

tr

the system (1.10) reduces to

(r_j)y~r) = (p-j+1)Y;~1

(2.15)

and has the solution (2.16)

where, of course, gjrl 0 if j yjr) = 0 for j < r we get

< 1.

Since

xjrl

0 for j

>

rand

and hence

(2.17) On expressing the binomial coefficients in terms of factorials, this formula simplifies to (2.18) with pj~)

=

0 if k

< j.

[For a numerical illustration see example (4.b).]

436

ALGEBRAIC TREATMENT OF FINITE MARKOV CHAINS

[XVI. 3

3. RANDOM WALK WITH REFLECTING BARRIERS The application of Markov chains will now be illustrated by a complete discussion of a random walk with states 1, 2, ... ,p and two refiecting barriers.s The matrix P is displayed in example XV, (2.c). For 2 < k < < p - 1 we have Pk.k+1 = P and Pk.k-l = q; the first and the last rows are· defined by (q, P, 0, ... ,0) (0, ... , 0, q, p). F or convenience of comparisons with the developments in chapter XIV we now discard the variable t = s I and write the characteristic roots in the form S;l (rather than tr ); it will be convenient to number them from to p - 1. In terms of the variable s the linear system (1.9) becomes

°

= S(qXI +px2) Xj = s(qxj_ l +pxj+1) Xp = S(qXP_I +pxp). Xl

(3.1)

(j=2, 3, ... , p-1)

This system admits the solution x, 1 corresponding to the root s 1. To find all other solutions we apply the method of particular solutions (which we have used for simIlar equations in XIV, 4). The middle equation in (3.1) is satisfied by Xj = Ai provided that A is a root of the quadratic equation A = qs + A2pS. The two roots of this equation are (3.2)

1 - Jl-4 pqs 2 2ps

2ps

and the most general solution of the middle equation in (3.1) is therefore (3.3)

Xj = A(S)Ai(s)

+ R(S)A~(S),

where A(s) and R(s) are arbitrary. The first and the last equation in (3.1) wiil be satisfied by (3.3) if, and only if, Xo = Xl and xp = x p+ l , This requires that A(s) and R(s) satisfy the conditions (3.4)

A(s)A}(s){l

AI(S)}

+ B(s)A~(s){l

A2(S)}

O.

Conversely, if these two equations hold for some value of s, then (3.3) represents a solution of the linear system (3.1) and this solution is identically zero only when AI(S) = A2(S). Our problem is therefore to find the 8 Part of what follows is a repetition of the theory of chapter XIV. Our quadratic equation occurs there as (4.7); the quantities Al(S} and A2(S} of the text were given in (4.8), and the general solution (3.3) appears in chapter XIV as (4.9). The two methods are related, but in many cases the computational details will differ radically.

XV!'3]

437

RANDOM WALK WITH REFLECTING BARRIERS

values of

S

for which ;/~(s)

(3.5)

=

A~(S)

but

,-

Since A]CS)A2CS) - q/p the first relatIOn Imphes that Al(S)"p/q must be a (2p )th root of unity, that is, we must have (3.6)

where r is an integer such that 0 < r < 2p. From the definition (3.2) it follows easily that (3.6) holds only when S = Sr where (3.7)

S;l

= 2~ . cos 7Tr/ p.

The value S = sp violates the second condition in (3.5); furthermore Sr = S2p-r, and so p distinct characteristic values are given by (3.7) with r=O,I, ... ,p-1. Solving (3.4) with S = Sr . and substituting into (3.3) we get (rl

(3.8)

xi

(Q)j/2.sm -7Trj - (Q)( Hl)/2. 7Tr(j -1) sm

= -

p

=

for r

p

p

1, ... , p - 1 whereas for r

p

=0

(39)

The adjoint system (I .10) reduces to

(3.10)

(k=2, . .. , p-l)

Yk = S(PYk-l +qYk+l)' Yp

=

SP(Yp_l +Yp)'

The middle equation is the same as (3.1) with P and q interchanged, and its general solution is therefore obtained from (3.3) by interchangmg P and q. The first and the last equations can be satisfied if s - Sr' and a simple calculation shows that for r = 1,2, ... , p-l the solution of (3.10) is (r)

Yk

(3.11)

For

So

(3.12)

=

= (p)7C/2. - sm -7Trk q

1 we get similarly

p

-

(p)(k-l)/2. 7Tr(k-l) sm . q

p

438

[XVI. 4

ALGEBRAIC TREATMENT OF FINITE MARKOV CHAINS

It remains to find the coefficients

defined by

c(r)

p-l

(3.13)

c(r)

L

k

\Vhen r

=

x~~r)y~r) = l.

0 the kth term of the sum equals (plq)k and so c(O)

(3.14)

= 1 . (plq)

- 1 P (plq)P - 1 '

except when p = q, in which case if tedious, calculation 9 leads to (3.15)

c(r)

=

Co

=

lip. When r

> 1 an elementary,

2P! TTr)-l -;tl-2Jpq cos -; .

Accordingly, the general representation (1.15) for the higher transition probabilities leads to the final resu1t10

p(.~) = (prq)

1

1

(plq)P - 1 \q

,

(d (d

p-l

IE) + 2P.L 11'

I

(3.16)

P

Xj

r=l

1-

I

Yk [2\1 pq cos TTr/ p] 1 - 2.) pq cos TTrl p

with x~r) and y1r ) defined by (3.8) and (3.l1). When p term on the right is to be interpreted as lip.

=q

n

the first

4. TRANSIENT STATES; ABSORPTION PROBABILITIES The theorem of section 1 was derived under the assumption that the roots t l • t'}" • •• are distinct. The presence of multiple roots does not require essential modifications, but we shall discuss only a particular 9 The calculations simplify considerably in complex notation using the fact that sin v = [e iv -e- iv ]/(2i). The sum in (3.13) reduces to a linear combination (with

complex coefficients) of Slims Of the form p-l

!

e 2;1TimJP

j=O

°

where m = or m = ± 1. In the first case the sum equals p, in the second 0, and (3.15) follows trivially. 10 For analogous formulas in the case of one reflecting and one absorbing barrier see M. Kac, Random walk and the theory of Brownian motion, Amer. Math. Monthly, vol. 54 (1947), pp. 369-391. The definition of the reflecting barrier is there modified so that the particle may reach 0; whenever this occurs, the next step takes it to 1. The explicit formulas are then more complicated. Kac's paper contains also formulas for p~,/ in the Ehrenfest model [example XV, (2.e)].

XVI.4]

TRANSIENT STATES;

ABSORPTION PROBABILITIES

439

case of special importance. The root t1 = 1 is multiple whenever the chain contains two or more closed subchains, and this is a frequent situation in problems connected with absorption probabilities. It is easy to adapt the method of section 1 to this case For conciseness and clarity, we shall explain the procedure by means of examples which will reveal the main features of the general case. Examples. (a) Consider the matrix of transition probabilities

1 i

0 0 0 0

2

0 0 0

3"

1

3"

0 0

1.

~

4

0 0

0 0

5"

1

.i 5

0 0

1.

1.

1.

1.

1

1.

4

1. 6

0 1

"6

4

4 1

"6

1

6"

4

"6

4

6

It is clear that E1 and E2 form a closed set (that is, no transition is possible to any of the remaining four states; compare XV, 4). Similarly E3 and E4 form another closed set. Finally, E5 and E6 are transient states. After finitely many steps the system passes into one of the two closed sets and remains there. The matrix P has the form of a partitioned matrix

(4.2)

where each letter stands for a 2 by 2 matrix and each zero for a matrix with four zeros. For example, A has the rows (1,!) and (i, t); this is the matrix of transition probabilities corresponding to the chain formed by the two states E1 and E 2. This matrix can be studied by itself, and the powers An can be obtained from example (2.a) with p = (X = i· When the powers p2, P3, ... are calculated, it will be found that the first two rows are in no way affected by the remaining four rows. More precisely, pn has the form

(4.3)

440

ALGEBRAIC TREATMENT OF FINITE MARKOV CHAINS

[XVI. 4

where An, Bn, Tn are the nth powers of A, B, and T, respectively, and can be calculatedl l by the method of section 1 [cf. example (2.a) where all calculations are performed]. Instead of six equations with six unknowns we are confronted only with systems of two equations with two unknowns each. It should be noted that the matrices Un and Vn in (4.3) are not powers of U and V and cannot be obtained in the same simple way as An, Bn, and Tn. However, in the calculation of p2, P3, . .. the third and fourth columns never affect the remammg four columns. In other words, if in pn the rows and columns corresponding to E3 and E4 are deleted, we get the matrix (4.4) which is the nth power of the corresponding submatrix in P, that is, of 1

i i !

0 0

t

t

1.

1.

1.

.!

1.

3"

(4.5)

(~ ;)

-

6

6

0 0

6

4

6

Therefore matrix (4.4) can be calculated by the method of section 1, which in the plesent case simplifies considelably. The matrix Vn can be obtained in a similar way. Usually the explicit forms of Un and Vn are of interest only inasmuch as they are connected with absorption probabilities. If the system starts from, say, E 5, what is the probability A that it will eventually pau jnto the closed set formed by E1 and E2 (and not into the other closed set)? What is the probability An that this will occur exactly at the nth step? Clearly p~~) + p~~) is the probability that the considered event occurs at the nth step or before, that is,

Letting n -+ 00, we get A. A preferable way to calculate An is as follows. The (n-l)st step must take the system to a state other than E1 and E 2, that is, to either E5 or E6 (since from E3 or E4 no transition to E1 and E2 is possible). The nth step then takes the system to £1 or £2. 11 In T the rows do not add to unity so that T is not a stochastic matrix. The matrix is substochastic in the sense of the definition in XV, 8. The method of section 1 applies without change, except that t = 1 is no longer a root (so that Tn -+ 0).

XVI. 4]

TRANSIENT STATES;

ABSORPTION PROBABILITIES

441

Hence

An -- P55 (n-l)( + ) + P56 (n-l)( + ) P51 P52 P6I P62

1. (n-l) + 1 (n-I) 4P55 "3P 56 •

=

It will be noted that J' n is completely determined by the elements of Tn-I, and this matrix is easily calculated. In the present case en) _ pen) _ P55 56 -

1.(~_)n-1

'

an d hence

4 12

-

An· -

7 5 )n-2 48(12 .

(b) Brother-sister mating We conclude by a numerical treatment of the chain of example XV, (2.j). The main point of the following discussion is to show that the canonical r~presentation 6

p~~)

(4.6)

=L

t~c(r)X~r)Ykr)

r-l

remains valid even though t = 1 is a double root of the characteristic equation. The system (1.9) of linear equations takes on the form (4.7)

Xl AXI

= tXb

iX I

+ iX2 + i X3 =

+ tX2 + i X3 + iX4 +

i X3 + iX4

+ i X5 =

i!6 X 5

+

tx4, X5 = tx5,

tx 2,

i- X6 =

tX 3

X3 = tx6,

and these equations exhibit the form of the given matrix. From the first and fifth equations it is clear that Xl - X5 - 0 unless t - 1. For t ~ 1, therefore, the equations reduce effectively to four equations for four unknowns and the standard elimination of variables leads to a fourthdegree equation for t as a condition for the compatibility of the four equations. Since there are six characteristic roots in all it follows that t = 1 is a double root. It is not difficult to verify that the six characteristic roots are l2 (4.8)

The corresponding solutions (xi r ), follows: (4.9)

••• , x~r))

of (4. 7) can be chosen as

(1,!, t, i, 0, t), (0, i, t, !, 1, t), (0, 1,0, -1,0,0) (0,1, -1, 1,0, -4),

(0,1, -1+):5, 1,0,6-2)5),

(0,1,-1-)5,1,0,

6+ 2 )5)

12 The root t a = t can be found by inspection since it corresponds to the simple solution Xa = -X4 = 1 and Xl = Xa = Xs = Xs = O. The cubic equation for the other roots is of a simple character.

442

[XVI. 4

ALGEBRAIC TREATMENT OF FINITE MARKOV CHAINS

The next problem is to find the corresponding solutions (y~r), . .. , y~r») of the system obtained from (4.7) by interchanging rows and columns. For r > 3 this solution is determined up to a multiplicative constant, but corresponding to the double root t1 - t2 - 1 we have to choose among infinitely many solutions of the form (a, 0, 0, 0, b, 0) .. The appropriate choice becomes obvious from the form of the desired representation (4.6). Indeed, a glance at (4.9) shows that x~r) = except for r = 1, and hence (4.6) yields p~~) = c(l) yi1l for all k and n. But E1 is an absorbing state and it is obvious that pt~) - 0 for all k #; 1. It follows that for r ~ I we must choose a solution of the form (a, 0, 0, 0, 0, 0), and for the same reason a solution corresponding to r = 2 is (0, 0, 0, 0, b, 0). The solutions corresponding to the remaining characteristic values are easily found. (Those chosen in our calculations are exhibited by the second rows of the matrices below.) The norming constants c(r) are then determined by (1.14), and in this way we get all the qualities entering the representation formula (4.6). In the display of the final result the matrices corresponding to r = 1 and r - 2 have been combined into one. Furthermore, the elements c(r)x~r)Ykr) corresponding to r = 5 and r = 6 are of the form a ± b.JS. For typographical convenience and clarity it was 1)ecessary to regroup their contributions in the form a[t;+t:l· and b.J5[t;-t:].

°

0 0

t

1 0 v

v

V

4

1

0 0

0 0 0 0

1

0 2" ! 0 1 0 ! 0

0 -4

~

pn=

2"

t 0

!

0 0

0 -1

4- n

+20

0 0 0

0 J

V

2- n

0 1 0 0

+4

0 4 -4

~

V

.~

0 -2

0 0 0 0

0 2 0 0

-

,

0 0

0 0 -1 -2

-

J

V

0 -1

0 0 0 0

0 0 0

-9

6 4

4 16

0 2 -2 2

4

4 -4

-9

6

4

-9 6 4 -11 -9 6

{\

()

()

{\

{\

{\

()

()

()

{\

{\

(\

4

-16

16

-16

4

8

-14

16 -16

16

-14

12

0 -4 -4

0 2 0

-6

-4

1 -1

4 -4

4

2 1 -1 -2

n t __ + t 6n +_5

40

-11

0 -4 t~ -

0 4 0

5 -4

2 4 2

4

0 2 4 2

-6

16

tr yI_ -5

+ --:w-

-5

It is easily verified that this formula is valid for n = 0. On the other hand,

from the structure of the right side in (4.6) it is clear that if (4.6) holds for

XVI. 5]

APPLICATION TO RECURRENCE TIMES

443

some n then it is valid also for n + 1. In this way the validity of (4.6) can be established without recourse to the general theory of section 1.

5. APPLICATION TO RECURRENCE TIl\fES In problem 19 of XIII,12 it is shown how the mean fl and the variance a 2 of the recurrence time of a recurrent event G can be calculated in terms of the probabilities Un that G occurs at the nth trial. If G is not periodic, then (5.1)

and

provided that a 2 is finite If we identify G with a persistent state E j , then Un = pj;) (and U o = 1). In a finite Markov chain all recurrence times have finite variance (cf. problem 19 of XV, 14), so that (5.1) applies. Suppose that E j is not periodic and that formula (1.5) applies. Then tl = 1 and Itrl < 1 for r - 2, 3, ... , so that p};) ~ P}:) - flj 1. 10 the term Un - fl 1 of (5.1) there corresponds ( 5.2)

p(.~) JJ

-

1. fl j

This formula is valid for n lr' we find

>

= £. f p(.r.l(n JJ r' r=2

1; summing the geometric series with ratio

(5.3)

Introducing this into (5.1), we find that if E j is a non-periodic persistent state, then its mean recurrence time is given by flj = lip};), and the variance of its recurrence time is (5.4)

2 2 2 Il.2 a·=Il.-Il.+ ) ,} ,} ,}

p

(r)t

2: 1PHr t

r 2

r

provided, of course, that formula (1.3) is applicable and tl 1. The case of periodic states and the occurrence of double roots require only obvious modifications.

CHAPTER XVII

The Simplest Ti111e-Dependent Stochastic Processes l

1. GENERAL ORIENTATION. MARKOV PROCESSES The Markov chains of the preceding chapters may be described very roughly as stochastic processes in which the future development depends only on the present state, but not on the past history of the process or the manner in which the present state was reached. These processes involve only countably many states £1, £2' .. , and depend on a discrete time parameter, that is, changes occur only at fixed epochs2 t - 0, 1,

Tn

the present chapter we shall consider phenomena such as telephone calls, radioactive disintegrations, and chromosome breakages, where changes may occur at any time. Mathematically speaking, we shall be concerned with stochastic processes involving only countably many states but dependmg on a contmuous tIme parameter. A complete descnptIOn of such processes is not possible within the framework of discrete probabilities and, in fact, we are not in a position to delineate formally the class of Markov processes in which we are interested. Indeed, to describe the past hi~tory of the process we must specify the epochs at which changes have occurred, and this involves probabilities in a continuum. Saying that the future development IS independent of the past hIstory has an ObVIOUS mtuitive meaning (at least by analogy with discrete Markov chains), but a formal definition involves conditional probabilities which are beyond the scope of this book. However, many problems connected with such 1 This chapter is almost independent of chapters X-XVI. For the use of the term stochastic process see footnote 28 in XV, 13. 2 As in the preceding chapters, when dealing with stochastic processes we use the term epoch to denote points on the time axis. In formal discussions the word time will refer to durations.

444

XVII. 1]

GENERAL ORIENTATION.

MARKOV PROCESSES

445

processes can be treated separately by quite elementary methods provided it is taken for granted that the processes actually exist. We shall now proceed in this manner. To the transition probability p;~) of discrete Markov chains there corresponds now the transition probability Pik(t), namely the conditional probability of the state Ek at epoch t+s given that at epoch s < t+s the system was in state E j • As the notation indicates, it is supposed that this probability depends only on the duration t of the time interval, but not on its position on the time axis. Such transition probabIlities are called stationary or time-homogeneous. (However, inhomogeneous processes will be treated in section 9.) The analogue to the basic relations XV,(3.3) is the Chapman-Kolmogorov identity (1.1)

Pik(T+t) =

L PilT)Pjk(t), j

which is based on tbe following reasoning. Suppose that at epoch 0 the system is in state E i . The jth term on the right then represents the probability of the compound event of finding the system at epoch T in state E j, and at the later epoch T+t in state Ek. But a transition from Ei at epoch 0 to Ek at epoch T+t necessarily occurs through some intermediary state E j at epoch T and summing over all . possible E j we see that (1.1) must hold for arbitrary (fixed) T > 0 and t > O. In this chapter we shall study solutions of the basic identity (1.1). It will be shown that simple postulates adapted to concrete situations lead to systems of differential equations for the Pjk(t), and interesting results can be obtained from these differential equations even without solving them. These results are meaningful because our solutions are actually the transition probabilities of a Markov process which is uniquely determined by them and the initial state at epoch O. This intuitively obvious fact3 will be taken for granted without proof. For fixed j and t the transition probabilities Pjk(t) define an ordinary discrete probability distribution. It depends on the continuous parameter t, but we have encountered many families of distributions involvmg continuous parameters. Technically the considerations of the following sections remain within the framework of discrete probabilities, but this artificial limitation is too rigid for many purposes. The Poisson distribution {e-At(At)n/n!} may illustrate this point. Its zero term e- At may be 3 It is Ilotewor thy, however, that there rnay exist (rather pathological) non-Markovian processes with the same transition probabilities. This point was discussed at length in XII, 2.a, in connection with processes with independent increments (which are a special class of Markov processes). See also the discussion in section 9, in particular footnote 18.

446

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.2

interpreted as probability that no telephone call arrives within a time interval of fixed length t. But then e- At is also the probability that the waiting time for the first call exceeds t, and so we are indirectly concerned with a continuous probability distribution on the time axis. We shall return to this point in section 6.

2. THE POISSON PROCESS The basic Poisson process may be viewed from various angles, and here we shall consider it as the prototype for the processes of this chapter. The following derivation of the Poisson distribution lends itself best for our generalizations, but it is by no means the best in other contexts. It should be compared with the elementary derivation in VI, 6 and the treatment of the Poisson process in XII, (2.a) as the simplest process with independent increments. F or an empirical background take random events such as disintegrations of particles, incoming telephone calls, and chromosome breakages under harmful lrradIauon. All occurrences are assumed to be of the same kmd, and we are concerned with the total number Z(t) of occurrences in an arbitrary time interval of length t. Each occurrence is represented by a point on the time axis, and hence we are really concerned with certain random placements of points on a line. The underlying physical assumption is that the forces and influences governing the process remain constant so that the probabihty of any parucular event is the same for all ume mtervals of duration t, and is independent of the past development of the process. In mathematical terms this means that the process is a timehomogeneous Markov process in the sense described in the preceding section As stated before, we do not aim at a full theory ofsllch processes, but shall be content with deriving the basic probabilities (2.1)

Pn(t)

=

P{Z(t)

=

n}.

These can be derived rigorously from simple postulates without appeal to deeper theories. To introduce notations applopIiate fOI the othel plocesses in this chapter we choose an origin of time measurement and say that at epoch t > 0 the system is in state En if exactly n jumps occurred between 0 and t. Then P net) equals the probability of the state En at epoch t, but P net) may be described also as the transition probability from an arbitrary state E j at an arbitrary epoch s to the state Ei+n at epoch s + t. We now translate our informal description of the process into properties of the probabilities P n(t). Let us partition a time interval of unit length into N subintervals of

XVII.2]

447

THE POISSON PROCESS

length h = N-I. The probability of a jump within anyone among these subintervals equals I - P o(h), and so the expected number of subintervals containing a jump equals h-I[l- Po(h)]. One feels intuitively that as h ~ 0 this number will converge to the expected number of jumps within any time interval of unit length, and it is therefore natural to assume 4 that there exists a number A > 0 such that (2.2) The physieal pieture of the process I eq UiI es also that a jump al way s leads from a state E j to the neighboring state EHb and this implies that the expected number of subintervals (of length h) containing more than one jump should tend to O. Accordingly, we shall assume that as h ~ 0 (2.3)

h-I[l

Po(k) PICk)]

~

O.

For the final formulation of the postulates we write (2.2) in the form Po(h) = l-Ah+o(h) where (as usual) o(h) denotes a quantity of smaller order of magnitude than h. (More precisely, o(h) stands for a quantity such that h-Io(h) ~ 0 as h ~ 0) With this notation (2 3) is equivalent to PI(h) = Ah + o(h). We now formulate the Postulates for the Poisson process. The process starts at epoch 0 from the state Eo. (i) Direct transitions from a state E j are possible only to E HI . (ii) Whatever the state E j at epoch t, the probability of a jump within an ensuing short time interval between t and t+h equals Ah + o(h), while the probabllzty oj more than one Jump IS o(h). As explained in the preceding section, these conditions are weaker than our starting notion that the past history of the process in no way influences the future development. On the other hand, our postulates are of a purely analytlc character, and they suffice to show that we must have (2.4)

P n (t )

=

(At)n

n!

-At

e.

To prove this assume first n > 1 and consider the event that at epoch t+h the system is in state En. The probability of this event equals Pn(t+h), and the event can occur in three mutually exclusive ways. First, at epoch t the system may be in state En and no jump occurs between t and t+h. The probability of this contingency is P n(t)Po(h)

= P n(t)[l-Ah] + o(h).

<1 The assumption (2.2) is introduced primarily because of its easy generalization to other processes. In the present case it would be more natural to observe that poet) must satisfy the functional equation PO(t+T) = Po(t)Po( T), which implies (2.2). (See section 6.)

448

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.3

The second possibility is that at epoch t the system is in state E n - 1 and exactly one jump occurs between t and t+h. The probability for this is P n-l(t) . Ah + o(h). Any other state at epoch t requires more than one jump between I and t+h, and the plobability of such an event is o(h). Accordingly we must have (2.5)

and this relation may be rewritten in the form (2.6)

P n(t+h) - P net) h

= _ 1pn ( t) + II.

1p

II.

n-l

()

t

+ o(h) . h

As h =* 0, the last term tends to zero; hence the limit 5 of the Jeft side exists and (n

(2.7)

>

1).

For n = 0 the second and third contingencies mentioned above do not arise, and therefore (2.7) is to be replaced by (2.8)

Po(t+h)

=

Po(t)(I-Ah)

+

o(h),

which leads to (2.9)

P~(t)

=

-APo(t).

From this and PoCO) = 1 we get Plt) = e- At • Substituting this poet) into (2.7) with n = 1, we get an ordinary differential equation for PI(t). Since PI(O) = 0, we find easily that PI(t) = Ate-At, in agreement with (24). Proceeding in the same way, we find successively all terms of (2.4).

3. THE PURE BIRTH PROCESS The simplest generalization of the Poisson process is obtained by perrruttmg the probabihues of jumps to depend on the actual state of the system This Jeads us to the following Postulates. (i) Direct transitions from a state E j are possible only to Ei+l' (ii) If at epoch t the system is in state En the probability of a jump 5 Since we restricted h to positive values, p~(t) in (2.7) should be interpreted as a right-hand derivative. It is really an ordinary two-sided derivative. In fact, the term o(h) in (2.5) does not depend on t and therefore remains unchanged when t is replaced by t - h. Thus (2.5) implies continuity, and (2.6) implies differentiability in the ordinary sense. This remark applies throughout the chapter and will not be repeated.

XVII.3]

449

THE PURE BIRTH PROCESS

within an ensuing short time interval between t and t+h equals Anh + o(h), while the probability of more than one jump within this interval is o(h). The salient feature of thi~ a~~umption i~ that the time which the ~y~tem spends in any particular state plays no role; there are sudden changes of state but no aging as long as the system remains within a single state. Again let P net) be the probability that at epoch t the system is in state En· The functions Pn(t) satisfy a system of differential equations which can be denved by the argument of the preceding section, wIth the only change that (2.5) is replaced by

In this way we get the basic system of differential equations

(3.2)

P~(t)

=

P~(t)

-

-AnP net)

+ An-IPn-l(t)

(n

>

1),

AoPo(t).

In the Poisson process it was natural to assume that the system starts from the initial state Eo at epoch O. We may now assume more generally that the system starts from an arbitrary initial state E i . This implies that6 (3.3)

for

n ¥: i.

These initial conditions uniquely determine the solution {P n(t)} of (3.2). [In particular, poet) = PI(t) = ... = Pi-let) = 0.] Explicit formulas for P net) have been derived independently by many authors but are of no interest to us It is easily verified that for arbitrarily prescribed Ait the system {Pn(t)} has all required properties, except that under certain conditions L P net) < 1. This phenomenon will be discussed in section 4. Examples. (a) Radioactive transmutations. A radioactive atom, say UIanium, may by emission of pal tides Ol Y-layS change to an atom of a different kind. Each kind represents a possible state of the system, and as the process continues, we get a succession of transitions Eo ~ EI ~ E2 ~ ~ ... ~ Em. According to accepted physical theories, the probability of a transition En ~ En+! remains unchanged as long as the atom is in state En' and this hypothesis is expressed by our starting supposition. The diffelential equations (3.2) thelefole describe the process (a faet well known to physicists). If Em is the terminal state from which no further 6 It will be noticed that PnCt) is the same as the transition probability PinCl) of section 1.

450

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.3

transitions are possible, then )'m = 0 and the system (3.2) terminates with n = m. [For 11 > m we get automatically P net) = 0.] (b) The Yule process. Consider a population of members which can

(by splitting or othenvise) give birth to nevI members but cannot die. Assume that during any short time interval of length h each member has probability Ah + o(h) to create a new one; the constant A determines the rate of increase of the population. If there is no interaction among the members and at epoch t the population size is n, then the probability that an increase takes place at some time between t and t+h equals 11}.h + o(h). The probability P net) that the population numbers exactly n elements therefore satisfies (3.2) with An = 11A, that is, (3.4)

(11

>

1).

P~(t) = O.

Denote the initial population size by i. The initial conditions (3.3) apply and it is easily verified that for 11 > i > 0 (3.5) and, of course, P net) = 0 for 11 < i and all t. Using the notation VI,(8.1) for the negative binomial distribution we may rewrite (3.5) as Pn(t) = 1(11 - i; i, e- At ). It follows [cf. example IX,(3.c)] that the population size at epoch t is the sum of i independent random variables each having the distribution obtained from (3.5) on replacing i by 1. These i variables represent the progenies of the i original members of our population. ThIS type of process was first studied by Yule' in connection with the mathematical theory of evolution. The population consists of the species within a genus, and the creation of a new element is due to mutations. 7 G. Udny Yule, A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S., Philosophical Transactions of the Royal Society, London, Series B, "01. 213 (l924), pp 21 87 y1! le does not introd1!ce the differential eq1!a t ions (3.4) but derives Pn(t) by a limiting process similar to the one used in VI,S, for the Poisson process. Much more general, and more flexible, models of the same type were devised and applied to epidemics and population growth in an unpretentious and highly interesting paper by Lieutenant Colonel A. G. M'Kendrick, Applications of mathematics to medical problems, Proceedings Edinburgh Mathematical Society, vol. 44 (1925), pp. 1-34. It is unfortunate that this remarkable paper passed practically unnoticed. In particular, it was unknown to the present author when he introduced various stochastic models for population growth in Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in wahrscheinlichkeitstheoretischer Behandlung, Acta Biotheoretica, vol. 5 (1939), pp. 11-40.

'.-.XVII.4] •. 1

, .f

.•

DIVERGENT BIRTH PROCESSES

451

'

,!~e assum~tion that each species has the same probability of throwing out

. " ~,~~~ew specIes neglects the difference in species sizes. Since we have also :-:> ,~~ne lected the possibility that a species may die out, (3.5) can be expected -- .-~.ajgive only a crude approximation F urry8 used the same model to describe a process connected with cosmic rays, but again the approximation is rather crude. The differential equations (3.4) apply strictly to a population of particles which can split into exact replicas of themselves, provided, of course, that there is no inter.. actIOn among partIcles. *4. DIVERGENT BIRTH PROCESSES The ~olution {Pn(t)} of the infinite sy~tem of differential eqlJation~ (3.2) subject to initial conditions (3.3) can be calculated inductively, starting from Plt) = e- Ait • The distribution {Pn(t)} is therefore uniquely determined. From the familiar formulas for solving linear differential equations it follows also that P net) > O. The only question left open is whether {P net)} is a proper probability distribution, that is, whether or not (4.1) for all t. We shall see that this is not always so: With rapidly increasing coefficients An it may happen that '.-(4.2) When this possibility was discovered it appeared disturbing, but it finds a ready explanation. The left side in (4.2) may be interpreted as the pro bability that during a time interval of duration t only a finite number of jumps takes place. Accordingly, the difference between the two sides in (4.2) accounts for the possibility of infinitely many jumps, or a sort of explosion. For a better understanding of this phenomenon let us compare our probabilistic model of growth with the familiar deterministic approach. The quantity An in (3.2) could be called the average rate of growth of a population of size n. For example, in the special case (3.4) we have An = nA, so that the average rate of growth is proportional to the actual population size. If growth is not subject to chance fluctuations and has a rate of increase proportional to the instantaneous population size x(t),

* This section treats a special topic and may be omitted. On fluctuation phenomena in the passage of high-energy electrons through lead, Physical Reviews, vol. 52 (1937), p. 569. B

452

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII A

the latter varies in accordance with the deterministic differential equation

dx(t) = Ax(t). dt

(4.3)

It Implies that (404) where i = x(O) is the initial population size~ It is readily seen that the expectation L nPnCt) of the distribution (3.5) coincides with x(t), and thus x(t) describes not only a deterministic growth process, but also the expected population size in example (3.b). Let us now consider a deterministic growth process where the rate of growth increases faster than the population size. To a rate of growth proportional to x 2(t) theIe cOIIesponds the diffeIential equation (4.5) whose solution is

i x(t) = - 1 - Ait

(4.6)

Note that x(t) increases beyond all bounds as t ~ l/Ai. In other words, the assumption that the rate of growth increases as the square of the population size implies an infinite growth within a finite time interval. Similarly, if in (3.4) the }'n increase too fast, there is a finite probability that infinitely many changes take place in a finite time interval. A precise answer about the conditions when such a divergent growth occurs is given by the

Theorem.

In

that the series

Proof.

01 del

L A~l

that

zP

1 for all t it is necessary andsufJicient

nEt) diverges. 9

Put

(4.7) Because of the obvious monotonicity the limit (4.8)

ft(t)

=

lim [l-Sk(t)] k-+

00

exists. Summing the differential equations (3.2) over n get

=

0, ... ,k we

(4.9) 8 It is not difficult to see that the inequality or else for no t > O. See problem 22.

L P net) < 1

holds either for all

t

> 0,

XVII.4]

453

DIVERGENT BIRTH PROCESSES

In view of the initial conditions (3.3) this implies for k

>i

1 - S,!I) = A. J,'P.(T) dT.

(4.10)

Because of (4.8) the left side lies between fl and I, and hence A. 1",(I)

(4.11) Summing. for k

=

(4.12)

+ . ,. +

",(I)[A,l

< l'p,!S) ds <

i, ... ,n we get for n

A;;l]

<

l'Si

S)

A;;l.

>i ds

< A,l + ... +

A;;l.

When L A~l < 00 the rightmost member remains bounded as n -+ 00, and hence It IS ImpoSSIble that the integrand tends to 1 for all t. Conversely, if L A~l = 00 we conclude from the first inequality that fl(t) = 0 for all t, and in view of (4.8) this implies that Sn(t) -+ 1, as asserted. ~ The criterion becomes plausible when interpreted probabilistically. The system spends some time at the initial state Eo, moves from there to EI> stays for a while there, moves on to E 2 , etc. The probability poet) that the sojourn time in Eo exceeds t is obtained from (3.2) as poet) = e- Aot • This sojourn time, To, is a random variable, but its range is the positive t-axis and therefore formally out of bounds for this book. However, the step from a geometric distribution to an exponential being trivial, we may with impunity trespass a trifle. An approximation to To by a discrete random variable with a geometric distribution shows that it is natural to define the expected sojourn time at Eo by (4.13)

At the epoch when the system enters E j , the state E j takes over the role of the initial state and the same conclusion applies to the sojourn time T, at E j • The expected sojourn time at E j is E(T j ) = Ail. It follows that A"il 1 + All + ... + A;l is the expected duration of the time it takes the system to pass through Eo, E 1 , • • • ,En' and we can restate the criterion of section 4 as follows: In order that Pn(t) = 1 for all t it is necessary and sufficient that

L

(4.14)

that is, the total expected duration of the time spent at Eo, E1> E 2 , • •• must be infinite. Of course, Lo(t) = 1 Pit) is the probability that the system has gone through all states before epoch t. With this interpretation the possibility of the inequality (4.2) becomes understandable. If the expected sojourn time at E j is 2-', the probability that the system has passed through all states wlthm tIme 1 + 2 I + 2 2 + . . . 2 must be positive. SimIlarly, a particle moving along the x-axis at an exponentially increasing velocity traverses the entire axis in a finite time.

L

[We shall return to divergent birth process in example (9.b).]

454

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII. 5

5. THE BIRTH-AND-DEATH PROCESS The pure birth process of section 3 provides a satisfactory description of radioactive transmutations, but it cannot serve as a realistic model for changes in the size of populations whose members can die (or drop out). This suggests generalizing the model by permitting transitions from the state En not only to the next higher state En+l but also to the next lower state En-I' (More general processes will be defined in section 9.) Accordingly we start from the following

Postulates. The system changes only through transitions from states to their nearest neighbors (from En to En+l or E n- 1 if n > 1, but from Eo to El only). If at epoch t the system is in state Em the probability that between t and t+h the transztzon En ~ En+l occurs equals Anh + o(h), and the probability of En -~ E n- 1 (if 11 > 1) equals flnh + o(h). The probability that during (t, t+h) more than one change occurs is o(h). It is easy to adapt the method of section 2 to derive differential equations for the probabilItIes P nCt) of findmg the system m state En. 'Io calculate P n(t+h), note that the state En at epoch t+h is possible only under one of the following conditions: (1) At epoch t the system is in En and between t and t +h no change occurs; (2) at epoch t the system is in E n- 1 and a transition to En occurs; (3) at epoch t the system is in En+I and a transition to En occurs; (4) between t and t+h there occur two or more transItIons. By assumptIOn, the probabihty of the last event is o(h). The first three contingencies are mutually exclusive and their probabilities add. Therefore

(5.1)

PnCt+h)

= Pn(t){l-Anh-flnh}

+

+ An- 1hPn-l(t) + fln+lhP n+l(t) + o(h). Transposing the term P n(t) and dividing the equation by h we get on the left the difference ratio of P net), and in the limit as h ~ 0

P;t(t) (J'n+fln)Pn(t) + An lP It let) + ,flit 11Pn+l(t) This eguation holds for n > 1. For n = 0 in the same way

(5 2)

(5.3) If the initial state is E i , the initial conditions are

(5.4)

for

n =;t. i.

The birth-and-death process is thus seen to depend on the infinite system of differential equations (5.2)-(5.3) together with the initial condition (5.4). The question of existence and of uniqueness of solutions is in this case by no means trivial. In a pure birth process the system (3.2) of differential

XVII.S]

THE BIRTH-AND-DEATH PROCESS

4SS

equations was also infinite, but it had the form of recurrence relations; poet) was determined by the first equation and P net) could be calculated from P n-1(t). The new system (S.2) is not of this form, and all P net) must be found simultaneously. We shall here (and elsewhere in this

umque all

sa z::'j es ne regutarzty conullZon

n I

.UWeVel, I

IS

possible to choose the coefficients in such a way that I P net) < 1 and that there exist infinitely many solutions. In the latter case we encounter a phenomenon analogous to that studied in the preceding section for the pure birth process. This situation is of considerable theoretical interest, but the reader may safely assume that in all cases of practical significance the conditions of uniqueness are satisfied; in this case automatically I P net) = 1 (see section 9). When Ao = the transition Eo ~ E1 is impossible. In the terminology of Markov chains Eo is an absorbmg state from which no exit is possible, once the system is in Eo it stays there. From (S.3) it follows that in this case P~(t) > 0, so that poet) increases monotonically. The limit Po(oo) is the probability of ultimate absorption. It can be shown (either from the explicit form of the solutions or from the general ergodic theorems for Markov processes) that under any circumstance the lzmzts

°

(S.S)

lim P net) = Pn

exist and are independent of the initial conditions (S.4); they satisfy the system of linear equations obtained from (5.2) (5.3) on replacing the derivatives on the left by zero. The relations (S.S) resemble the limit theorems derived in XV,7 for ordinary Markov chains, and the resemblance is more than formal. Intuitively (S.S) becomes almost obvious by a comparison of our process 10 The simplest existence proof and uniqueness criterion are obtained by specialization from the general theory developed by the author (see section 9). Solutions of the birth-and-death process such that P nCt) < 1 have recently attracted wide attention. For explicit treatments see W. Lederman and G. E. Reuter, Spectral theory for the differential equations of simple birth and death processes. Philosophical Transactions of the Royal Society, London, Series A, vol. 246 (1954), pp. 387-391; S. Karlin and J. L. McGregor, the dijjerential equations of blrth-and-death processes and the Stlelrjes moment problem, Trans. Amer. Math. Soc., vol. 85 (1957), pp. 489-546, and The classification of birth and death processes, ibid. vol. 86 (1957), pp. 366--400. See also W. Feller, The birth and death processes as diffusion processes, Journal de Mathematiques Pures at Appliquees, vol. 38 (1959), pp. 301-345.

I

456

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII. 5

with a simple Markov chain with transition probabilities (5.6)

Pn,n+l

=

1

An

+ fln

'

Pn,n-l

=

A n

+ fln

In this chain the only direct transitions are En ~ En+l and En ~ E n- b and they have the same conditional probabilities as in our process; the difference between the chain and our process lies in the fact that, with the latter, changes can occur at arbitrary times, so that the number of transitions during a time interval of length t IS a random vanable. However, for large t this number is certain to be large, and hence it is plausible that for t ~ 00 the probabilities P net) behave as the corresponding probabilities of the simple chain. If the simple chain with transition probabilities (5 6) is transient we have Pn = 0 for all n; if the chain is ergodic the Pn define a stationary probability distribution. In this case (5.5) is usually interpreted as a "tendency toward the steady state condition" and this suggestive name has caused much confusion. It must be understood that, except when Eo is an absorbing state, the chance fluctuatlons contmue forever unabated and (5.5) shows only that in the long run the influence of the initial condition disappears. The remarks made in XV, 7 concerning the statistical equilibria apply here without change. The principal field of applications of the birth-and-death process is to pro blems of waiting times, trunking, etc.; see sections 6 and 7. Examples. (a) Linear growth. Suppose that a population consists of elements which can split or die. During any short time interval of length h the probability for any living element to split into two is Ah + o(h), whereas the corresponding probability of dying is flh + o(h). Here A and fl are two constants characteristic of the populatIOn. If there IS no interaction among the elements, we are led to a birth and death process with An = nA, fln = nfl. The basic differential equations take on the form

(5.l)

P~(t) = flPl(t),

Explicit solutions can be found l l (cf. problems 11-14), but we shall not 11 A systematic way corisists in deriving a partial differential equation for the generatirig function ~ P n(t)sn. A more general process where the coefficients .It and f.l m (5. 7) are permItted to depend on time is discussed in detail in David G. Kendall, The generalized "birth and death" process, Ann. Math. Statist., vol. 19 (1948), pp. 1-15. See also the same author's Stochastic processes and population growth, Journal of the Royal Statistical Society, B, vol. 11 (1949), pp. 230-265 where the theory is generalized to take account of the age distribution in biological populations.

XVII.5]

THE BIRTH-AND-DEATH PROCESS

457

discuss this aspect. The limits (5.5) exist and satisfy (5.7) with P~(t) = o. From the first equation we find PI = 0, and we see by induction from the second equation that Pn = 0 for all n > 1. If Po = 1, we may say that the probability of ultimate extinction is 1. If po < 1, the relations PI = P2 ... = 0 imply that with probability 1 - Po the population increases over all bounds; ultimately the population must either die out or increase indefinitely. To find the probability Po of extinction we compare the process to the related Markov chain. In our case the transition pro babilities (5.6) are independent of Il, and We have theIefOIe an OIdinaIY random walk in which the steps to the right and left have probabilities P = Aj(A+fl) and q = fl/(A+fl), respectively. The state Eo is absorbing. We know from the classical ruin problem (see XIV, 2) that the probability of extinction is 1 if P < q and (q!p)i if q < p and i is the initial state. We conclude that in our process the probability Po = lim poet) of ultimate extinction is 1 if A ~ fl, and (fl/A)i if A > fl. (This is easily verified from the explicit solution; see problems 11-14.) As in many similar cases, the explicit solution of (5.7) is rather complicated, and it is desir able to calculate the mean and the variance of the distribution {P n(t)} directly from the differential equations. We have for the mean 00

(5.8)

M(t)

= :L nP net)· n=1

'}Ie shall omit a formal proof that lvf(t) is finite and that the follo wing formal operations are justified (again both points follow readily from the solution given in problem 12). Multiplying the second equation in (5.7) by n and adding over n = 1,2, ... , we find that the terms containing n2 cancel, and we get (5.9)

M' (t) =

A:L (n- I)Pn-I(t) -

fl

:L (n+ l)Pn+1(t) =

. (A- fl)M(t). This is a differential equation fOI lvf(z). The initial population size is i, and hence M(O) = i. Therefore (5.10) We see that the mean tends to 0 or infinity, according as A < fl or A > fl. The variance of {P n(t)} can be calculated in a similar way (cf. problem 14). (b) Waiting lines for a single channel. In the simplest case of constant coefficients An = A, fln = fl the birth-and-death process reduces to a special case of the waiting line example (7.b) when a = 1.

458

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.6

6. EXPONENTIAL HOLDING TIMES

The principal field of applications of the pure birth-and-death process is connected with trnnking in telephone engineering and varion~ type~ of waiting lines for telephones, counters, or machines. This type of problem can be treated with various degrees of mathematical sophistication. The method of the birth-and-death process offers the easiest approach, but this model is based on a mathematical simplification known as the assumption oj exponential holding times. We begm wIth a dISCUSSIOn of thIS basIc assumption. For concreteness of language let us consider a telephone conversation, and let us assume that its length is necessarily an integral number of seconds. We treat the length of the conversation as a random variable X and assume its probability distribution Pn = P{X = n} known. The telephone line then represents a physical system with two possible states, "busy" (Eo) and "free" (E1). When the line is busy, the probability of a change in state during the next second depends on how long the conversation has been going on. In other words, the past has an mfluence on the future, and our process is therefore not a Markov process (see XV,13). This circumstance is the source of difficulties, but fortunately there exists a simple exceptional case discussed at length in XIII,9. Imagine that the decision whether or not the conversation is to be continued is made each second at random by means of a skew coin. In other words, a sequence of Bernoulli trials with probability p of success is performed at a rate of one per second and continued until the first success. The conversation ends when this first success occurs. In this case the total length of the conversation, the "holding time," has the geometric distribution Pn qn-1p . \Vhenever the line is busy, the probability that it will remain busy for more than one second is q, and the probability of the transition Eo ~ E1 at the next step is p. These probabilities are now independent of how long the line was busy. When it is undesirable to use a discrete time parameter it becomes necessary to work with continuous random variables. The role of the geometric distribution for waiting times is then taken over by the exponential distributi~n. It is the only distribution having a Markovian character, that is, endowed with complete lack of memory. In other words, the pro bability that a conversation which goes on at epoch x continues beyond x + h is independent of the past duration of the conversation if, and only if, the probability that the conversation lasts for longer than t time units is given by an exponential e- At • We have encountered this "exponential holding time distribution" as the zero term in the Poisson distribution (2.4), that is, as the waiting time up to the occurrence of the first change.

XVII.6]

EXPONENTIAL HOLDING TIMES

459

The method of the birth-and-death process is applicable only if the transition probabilities in question do not depend on the past; for trunking and waiting line problems this means that all holding times must be exponential. Flom a plactical point of view this assumption may at first sight appear rather artificial, but experience shows that it reasonably describes actual phenomena. In particular, many measurements have shown that telephone conversations within a city12 follow the exponential law to a surprising degree of accuracy. The same situation prevails for other holding times (e.g., the duration of machine repairs). It remains to characterize the so-called incoming traffic (arriving calls, machine breakdowns, etc.). We shall assume that during any time interval of length h the probability of an incoming call is Ah plus negligible terms, and that the probability of more than one call is in the limit negligible. According to the results of section 2, this means that the number of incoming calls has a Poisson distribution with mean At . We shall describe this situation by saying that the incoming traffic is of the Poisson type with intensitv A. ~

It is easy to verify the described property of exponential holding times. Denote by u(t) the probability that a conversation lasts for at least t time units. The probability u(t+s) that a conversation starting at 0 lasts beyond t + s equals the probability that it lasts longer than t units multiplied by the conditional probability that a conversation lasts additional s units, given that its length exceeds t. If the past duration has no influence, the last conditional probability must equal u(s); that is, we must have

(6.1)

u(t+s)

= u(t) u(s).

To prove the asserted characterization of exponential holding times it would suffice to show that monotone solutions of this functional equation are necessarily of the form e-)..t. We prove a slightly stronger result which is of interest in itselfP Theorem. Let u be a solution of (6.1) defined for t > 0 and bounded in some interval. Then either u(t) = 0 for all t, or else u(t) = e-)..t for some constant A. Proof.

Clearly

(6.2)

Suppose first that u(a) 0 for some value a. From (6.2) we conclude by induction that u(2- n a) = 0 for all integers n, and from (6.1) it is clear that u(s) = 0 implies 12 Rates for long distance conversations usually increase after three minutes and the holding times are therefore frequently close to three minutes. Under such circumstances tbe exponential distribution does not apply 13 (6.1) is only a logarithmic variant of the famous Hamel equation f(t + s) = =/(t) +{(s). We prove that its solutions are either of the form at or else unbounded in every interval. (It is known that no such solution is a Baire function, that is, no such solution can be obtained by series expansions or other limiting processes starting from continuous functions.)

460

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.7

> s. Thus u(a) = 0 implies that u vanishes identically. Since (6.2) obviously excludes negative values of u it remains only to consider strictly positive solutions of (6.1). Put e-).. = u(1) and v(t) = eAtu(t). Then

u(t) = 0 for all t

(6.3)

v(t1-s)

= v(t)v(s)

We have to prove that this implies vet) integers m and n

and

= 1 for all

v(1) = 1. t. Obviously for arbitrary positive

(6.4)

and hence v(s) = 1 for all rational s. Furthermore, if v(a) = e then vena) = en for any positive or negative integer n. It follows that if u assumes some value e :rf 1 then it assumes also arbitrarily large values. But using (6.1) with t 1- s = T it is seen that V(T-S) = VeT) for all rational s. Accordingly, if a value A is assumed at some point T, the same value is assumed in every interval, however small. The boundedness of u in any given interval therefore precludes the possibility of any values :rf 1. ~

7. WAITING LINE AND SERVICING PROBLEMS (a) The simplest trunking problem. 14 Suppose that infinitely many trunks

or channels are available, and that the probability of a conversation ending between t and t+h is flh + o(h) (exponential holding time). The incoming calls constitute a traffic of the Poisson type with parameter A. The system is in state En if n lines are busy. It is, of course, assumed that the durations of the conversations are mutually independent. If n lines are busy, the probability that one of them will be freed within time h is then npth I o(h). The probability that within this time two or more conversations terminate is obviously of the order of magnitude h 2 and therefore negligible. The probability of a new call arriving is Ah + o(h). The probability of a combination of several calls, or of a call arriving and a conversating ending, is again o(h). Thus, in the 14 C. Palm, IntensitiitssGlnvankungen I'm FernsprecfwerkefJ r, Ericsson Technics (Stockholm), no. 44 (1943), pp. 1-189, in particular p. 57. Waiting line and trunking problems for telephone exchanges were studied long before the theory of stochastic processes was available and had a stimulating influence on the development of the theory. In particular, Palm's impressive work over many years has proved useful. The earliest worker in the field was A. K. Erlang (1878-1929). See E. Brockmeyer, H. L. Halstrom, and Arne Jensen, The life and works of A. K. Ertang, Transactions of the Danish Academy Technical Sciences, No.2, Copenhagen, 1948. Independently valuable pioneer work has been done by T. C. Fry whose book, Probability and its engineering uses, New York (Van Nostrand), 1928, did much for the development of engineering applications of probability.

XVII.7]

WAITING LINE AND SERVICING PROBLEMS

461

notation of section 5 (7.1)

fln = nfl·

The basic differential equations (5.2)-(5.3) take the form

+ flPI(t) P~(t) = -(A+nfl)P net) + AP n-I(t) + (n+l)flP n+l(t) n > 1. Explicit solutions can be obtained by deriving P~(t)

(7.2)

=

-APo(t)

where a partial differential equation for the generating function (cf. problem 15). We shall only determine the quantities Pn = lim P net) of (5.5). They satisfy the equations ApO = flPI (7.3) (A+nfl)Pn = APn-1 + (n+l)flPn+I' We find by induction that Pn

= po(Affl)nfn!,

and hence

(7.4) Thus, the limiting distribution is a Poisson distribution with parameter AI fl· It is independent of the initial state.

L

It is easy to find the mean Al(l) nPn(t). Multiplying the nth equation of (7.2) by n and adding, we get, taking into account that the P,nCt) add to unity,

(7.5)

M'(t) =

). -

pM(t).

When the initial state is E i , then M(O) = i, and (7.6)

The reader may verify that in the special case i = 0 the Pn(t) are given exactly by the Poisson distribution with mean M(t).

(b) Waiting lines [or a finite number o[ channels. 15 We now modify the last example to obtain a more realistic model. The assumptions are the same, except that the number a of trunklines or channels is finite. If all a channels are busy, each new call joins a waiting line and waits until a channel is freed. This means that all trunklines have a common waiting line. The word "tnmk" may be replaced by countet at a postoffice and "conversation" by service. We are actually treating the general waiting 15 A. Kolmogoroff, Sur Ie prob/eme d'attente, Recueil Mathematique [Sbomik], vol. 38 (1931), pp. 101-106. For related processes see problems 6-8 and 20.

462

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII. 7

line problem for the case where a person has to wait only if all a channels are busy. We say that the system is in state En if there are exactly n persons either being served or in the waiting line. Such a line exists only when Il > u, and then there are n - a persons in it. As long as at least one channel is free, the situation is the same as in the preceding example. However, if the system is in a state En with n > a, only a conversations are going on, and hence ftn = aft, for n > a. The basic system of differential equations is therefore given by (7.2) for n < a, but for n > a by P;lt)

(7.7)

=

-(A+aft)P net)

+ AP n-I(t) + aftPn+1(t).

In the special case of a single channel (a - 1) these equatIOns reduce to those of a birth-and-death process with coefficients independent of n. The limits Pn = limpn(t) satisfy (7.3) for n < a, and (7.8)

for n

> a.

By recursion we find that

(7.9)

Pn

(7.10)

Pn

=

(A/ft)n Po--,-' n.

n


n>a

The series L (Pnlpo) converges only if (7.11)

Alft

< a.

Hence a limiting distribution {Pk} cannot exist when A > aft· In this case Pn = 0 for all n, which means that gradually the waiting line grows over all bounds. On the other hand, if (7.11) holds, then we can determine Po so that LPn = 1. From the explicit expressions for P net) it can be shown that the Pn thus obtained really represent the lzmllzng dzstrlbutzon of the PIt (t) Table I gives a numerical illustration for a - 3, Alu - 2 " (c) Servicing ofmachines. 16 For orientation we begin with the simplest case and generalize it in the next example. The problem is as follows. We consider automatic machines which normally require no human care except that they may break down and call for service. The time required 16 Examples (c) and (d), including the numerical illustrations, are taken from an article by C. Palm, The distribution of repairmen in servicing automatic machines (in Swedish), Industritidningen Norden, vol. 75 (1947), pp. 75-80, 90-94, 119-123. Palm gives tables and graphs for the most economical number of repairmen.

XVII.7]

WAITING LINE AND SERVICING PROBLEMS

463

for servicing the machine is again taken as a random variable with an exponential distribution. In other words, the machine is characterized by two constants A and fl with the following properties. If at epoch t the machine is in working state, the probability that it will call for set vice before epoch t+h equals Ah plus terms which are negligible in the limit h -- O. Conversely, when the machine is being serviced, the probability that the servicing time terminates before t+h and the machine reverts to the working state equals flh + o(h). For an efficient machine A should be relatively small and fl relatively large. The ratio ,1/ fl is called the servicing factor.

We suppose that m machines with the same parameters A and fl and working independently are serviced by a single repairman. A machine which TABLE

1

LIMITING PROBABILITIES IN THE CASE OF a CHANNELS AND

n

Lines busy People waiting P1l

0 0 0 0.1 III

1 0 0.2222

2 2 0 0.2222

AI fl =

3 3 0 0.1481

=3

2 4 3

0.09888

5 6 7 3 3 3 2 3 4 0.0658 0.0439 0.0293

breaks down is serviced immediately unless the repairman is servicing another machine, in which case a waiting line is formed. We say that the system is in state En if n machines are not working. For 1 < n < m this means that one machine is being serviced and n - 1 are in the waiting line; in the state Eo all machines work and the repairman is idle. A transition En -- En+l is caused by a breakdown of one among the m - n working machines, whereas a transition En -- E n- 1 occurs if the machine being serviced reverts to the working state. Hence we have a birth-and-death process with coefficients (7.12) For 1 (7.13)

An

=

(m-n)A,


=

flo

fll = fl2 = ... = flm = fl·

0,

1 the basic differential equations (5.2) become

-{(m-n)A+fl}P net)

while for the limiting states n (7.13a)

=

+ (m-n+ 1)APn-l(t) + flP n+l(t),

0 and n

m

+ flPl(t), P'm(t) = -flP met) + APm-l(t). P~(t) = -mAPo(t)

464

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.7

This is finite system of differential equations and can be solved by standard methods. The limits Pn = lim P net) are determined by

(7.14)

{(m-n)A

+ fl}Pn = (m-n+ I)APn-l + flPn+l,

TABLE

2

ERLANG'S LOSS FORMULA

pn

PROBABILITIES

FOR THE CASE

m

AI fl =

0.1,

6

n

Machines in Waiting Line

pn

0 1 2 3 4 5 6

0 0 1 2 3 4 5

0.4845 0.2907 0.1454 0.0582 0.0175 0.0035 0.0003

From these equations we get the recursion formula (7.15) Substituting successively n

=

m - 1, m - 2, ... , 1, 0, we find

The remaining unknown constant Pm can be obtained from the condition that the Pi add to unity. The result is known as Erlang's loss formula: (7.16)

Typical numerical values are exhibited in table 2. The probability Po may be interpreted as the probability of the repairman's being idle (in the example of table 2 he should be idle about half the

XVII.7]

465

WAITING LINE AND SERVICING PROBLEMS

time). The expected number of machines in the waiting line is m

. (7.17)

W

m

= 2(k-l)pk = 2 k pk - (I-po)· k-l

k-l

This quantity can be calculated by adding the relations (7.15) for n = = 0,1, ... ,m. Using the fact that the Pn add to unity, we get

mA - AW - A(I -Po)

= flO-po)

or (7.18)

In the example of table 2 we have W - 6 . (0.0549). Thus 0.0549 is the average contribution of a machine to the waiting line. (d) Continuation: several repairmen. We shall not change the basic assumptions of the preceding problem, except that the m machines are now serviced by r repairmen (r < m) Thus for n < r the state En means that r - n repairmen are idle, n machines are being serviced, and no machine is in the waiting line for repairs. For n > r the state En signifies that r machines are being serviced and n - r machines are in the waiting line. We can use the setup of the preceding example except that (7.12) is obviously to be replaced by flo

An

(7.19)

=

= 0,

(1

(m-n)A,

An = (m-n)A,

fln = rfl

(r

< n < r), < n < m).

We shall not write down the basic system of differential equations but only the equations for the limiting probabilities pn' For 1 < n < r (7.20a)

while for r


(7.20b)

{(m-n)A

+ rfl}Pn =

(m-n+ 1)APn-1

+ rflPn+I'

For n = 0 obviously mApo = flPI' This relation determines the ratio PI/PO' and from (7.20a) we see by induction that for n < r (7.21)

finally, for n (7.22)

>r

we get from (7.20b)

rflPn+1

=

(m-n)APn·

466

[XVII.7

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

These equations permit calculating successively the ratios Pnlpo. Finally, Po follows from the condition 2Pk = I. The values in table 3 are obtained in this way. A comparison of tables 2 and 3 reveals surprising facts. They refer to the same machines (AlP, = 0.1), but in the second case we have m = 20 machines and r = 3 repairmen. The number of machines per repairman TABLE PROBABILITIES

pn

FOR THE CASE

3

AJp. =

0.1, m

=

20, r

=

3

n

Machines Serviced

Machines Waiting

Repairmen Idle

pn

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 3 3 3 3 3 3 3 3 3

0 0 0 0 1 2 3 4 5 6 7 8 9

3 2 1 0 0 0 0 0 0 0 0 0 0

0.13625 0.27250 0.25888 0.15533 0.08802 0.04694 0.02347 0.01095 0.00475 0.00190 0.00070 0.00023 0.00007

has increased from 6 to 6~, and yet the machines are serviced more efficiently. Let us define a coefficient of loss for machines by (7.23)

w m

average number of machines in waiting line number of machines

and a coefficient of loss for repairmen by (7.24)

p r

average number of repairmen idle number of repairmen

For practical purposes we may identify the probabilities P nCt) with their limits Pn. In table 3 we have then w = P4 + 'JP5 + 3p6 + ... + 17P2o and p = 3po + 2Pl + P2' Table 4 proves conclusively that for our particular machines (with Alp = 0.1) three repairmen per twenty machines are much more economical than one repairman per six machines.

XVII.7]

WAITING LINE AND SERVICING PROBLEMS

467

(e) A power-supply problem.l 7 One electric circuit supplies a welders who use the current only intermittently. If at epoch t a welder uses current, the probability that he ceases using before epoch t+h is flh + o(h); if at epoch t he requires no current, the probability that he calls for current before t+h is Ah o(h). The welders work independently of each other. We say that the system is in state En if n welders are using current. Thus we have only finitely many states Eo, ... , Ea.

+

+

TABLE

4

COMPARISON OF EFFICIENCIES OF Two SYSTEMS DISCUSSED IN EXAMPLES ( c) AND

Number of machines Number of repairmen Machines per repairman Coefficient of loss for repairmen Coefficient of loss for machines

(d) (c)

(d)

6 1 6 0.4845 0.0549

20 3

61

0.4042 0.01694

If the system is in state ET'l' then a - n welders are not using current and the probability for a new call for current within a time interval of duration h is (a-n)Ah + o(h); on the other hand, the probability that one of the n welders ceases using current is nph + o(h). Hence we have a birth-and-death process with

(7.25)

An

=

(a-n)A,

fln = nfl,

o


The basic differential equations become

(7.26)

P~(t) = -{nfl+(a-n)A}Pn(t)

+ (n+l)flP n+l(t) +

+ (a -n + l)AP n-l(t), P~(t)

=

-aflPa(t)

+ }.Pa-1(t).

17 This example was suggested by the problem treated (inadequately) by H. A. Adler and K. W. Miller, A new approach to probability problems in electrical engineering, Transactions of the American Institute of Electrical Engineers, vol. 65 (1946), pp. 630-632.

468

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII. 8

(Here 1 < n < a - 1.) It is easily verified that the limiting probabilities are given by the binomial distribution (727)

=

Pn

fa)f_A ff-'" y-n ~n7 ~A + p7 ~A + ",7 '

a result which could have been anticipated on intuitive grounds. (Explicit representations for the Pn(t) are given in problem 17.)

8. THE BACKWARD (RETROSPECTIVE) EQUATIONS In the preceding sections we were studying the probabilities P net) of finding the system at epoch t in state En. This notation is convenient but misleading, inasmuch as it omits mentioning the initial state Ei of the system at time zero. For the further development of the theory it is prefer.able to revert to the notations of section 1 and to the use of transition probabilities. Accordingly we denote by Pin(t) the (conditional) probability of the state En at epoch t + s given that at epoch s the system was in state E i . We continue to denote by Pn(t) the (absolute) probability of En at epoch t. When the initial state Ei is given, the absolute probability P n(t) coincides with Pin(t), but when the initial state is chosen in accordance with a probability distribution {a i } we have (8.1)

Pn(t) - !aiPin(t). i

For the special processes considered so far we have shown that for fixed i the transition probabilities Pin(t) satisfy the basic differential equations (3.2) and (5 2) The subscript i appears only in the initial COllditions, namely 1 for n = i p.tn (0) = 0 (8.2) otherwise. As a preparation for the theory of more general processes we now proceed to show that the same transition probabilities satisfy also a second system of differential equations. To fix ideas, let us start with the pure birth process of section 3. The differential equations (3.2) were derived by prolonging the time interval (0, t) to (0, t+h) and considering the possible changes during the short time (t, t+h). We could as well have prolonged the interval (0, t) in the direction of the past and considered the changes during (-h, 0). In this way we get a new system of differential equations in which n (instead of i) remains fixed. Indeed, a transition from Ei at epoch -h to En at epoch t can occur in three mutually

XVII.8]

THE BACKWARD (RETROSPECTIVE) EQUATIONS

469

exclusive ways: (1) No jump occurs between -h and 0, and the system passes from the state Ei at epoch 0 to En. (2) Exactly one jump occurs between -h and 0, and the system passes from the state EHI at epoch 0 to En at epoch t; (3) more than one jump occurs between -h and O. The probability of the first contingency is 1 - Aih + o(h), that of the second Aih + o(h), while the third contingency has probability o(h). As in sections 2 and 3 we conclude that

Hence for i

>0

the new basic system now takes the form

(8.4) These equations are called the backward equations, and, for distinction, equations (3.2) are called the forward equations. The initial conditions are (8.2). [Intuitively one should expect that

if

(8.5)

n

< i,

but pathological exceptions exist; see example (9.b).] In the case of the birth-and-death process the basic forward equations (for fixed i) are represented by (5.2)-(5.3). The argument that lead to (8.4) now leads to the corresponding backward equations

It should be clear that the forward and backward equations are not independent of each other; the solution of the backward equations with the initial conditions (8.2) automatically satisfies the forward equations, except in the rare situations where the solution is not unique.

Example. The Poisson process. In section 2 we have interpreted the Poisson expression (2.4) as the probability that exactly n calls arrive during any time interval of length t. Let us say that at epoch t the system is in state En if exactly Il calls arrive within the time interval from 0 to t. A transition from Ei at tl to En at t2 means that n - i calls arrived between tl and t 2. This is possible only if n > i, and hence we have for the transition probabilities of the Poisson process

(8.7)

P () _ -At (At) n-i . t - e tn (n-i)!

if n

>

PinCt) = 0

if n

< i.

i,

470

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.9

They satisfy the forward equations

(8.8) as well as the backward equatjons

(8.9) 9. GENERAL PROCESSES So far the theory has been restricted to processes in which direct transitions from a state En are possible only to the neighboring states En+! and En-I. Moreover, the processes have been time-homogeneous, that is to say, the transition probabilities Pin(t) have. been the same for all time intervals of length t. We now consider more general processes in which both assumptions are dropped. As in the theory of ordinary Markov chains, we shall permit direct transitions from any state Ei to any state En. The transition probabilities are permitted to vary in time. This necessitates specifying the two end points of any time interval instead of specifying just its length. Accordingly, we shall write PineT, t) for the conditional probability offinding the system at epoch t in state En' given that at a previous epoch T the state was Eio The symbol PineT, t) is meaningless unless T < t. If the process is homogeneous in time, then PineT, t) depends only on the difference t - T, and We can write PinEl) instead of Pin('F, 'F I t) (vlhich is then independent of T). We saw in section 1 that the transition probabilities of time-homogeneous Markov processes satisfy the Chapman-Kolmogorov equation (9.1a) v

The analogous identity for non-homogeneous processes reads (9.1b)

and is valjd for T < SO < t This relation expresses the fact that a transition from the state Ei at epoch T to En at epoch t occurs via some state Ev at the intermediate epoch s, and for Markov processes the probability P vn(s, t) of the transition from Ev to En is independent of the previous state E i • The transition probabilities of Markov processes WIth counta5Iy many states are therefore solutIOns of the ChapmanKolmogorov identity (9.1b) satisfying the side conditions (9.2)

XVII.9]

GENERAL PROCESSES

471

We shall take it for granted without proof that, conversely, such solution represents the transition probabilities of a Markov process. IS It follows that a basic problem of the theory of Markov processes consists in finding all solutions of the Chapman-Kolmogorov identity satisfying the side conditions (9.2). The main purpose of the present section is to show that the postulates of the birth-and-death processes admit of a natural generalization permitting arbitrary direct transitions Ei -- E j • From these postulates we shall derive two systems of ordinary differential equations, to be called forward and backward equations, respectively. Under ordinary circumstances each of the two systems uniquely determines the transition probabilities. The forward equations are probabilistically more natural but, curiously enough, their derivation requires stronger and less intuitive assumptions. In the time-homogeneous birth-and-death process of section 5 the starting postulates referred to the behavior of the transition probabilities Pjih) for small h; in essence it was required that the derivatives P;k exist at the origin. For inhomogeneous processes we shall impose the same condition on Pjk(t, t+x) considered as functlOns of x. I he derIVatIves will have an analogous probabilistic interpretation, but they will be functions of t.

Assumption 1. function cn(t)

(9.3)

>0

To every state En such that as h -- 0

1 - Pnn(t, t+h) h

-----=..::.:..::....:..-~-...:..

there corresponds a continuous

--

c() t. n

Assumption 2. To every pair of states E j , Ek with j ¥- k there correspond transition probabilities pjll) (depending on time) such that as

(9.4)

(j ¥- k).

18 The notion of a Markov process requires that, given the state Ev at epoch s, the development of the process prior to epoch s has no influence on the future development. As was pointed out in section 1, the Chapman-Kolomogorov identity expresses this requirement only partially because it involves only one epoch T < s and one epoch t > s. The long-outstanding problem whether there exist non-Markovian processes whose transition probabilities satisfy (9.1) has now been solved in the affirmative; the simplest known such process is time-homogeneous and involves only three states E} [See W. Feller, Ann. Math. Statist., vol. 30 (1959), pp. 1252-1253.] Such processes are rather pathological, however, and their existence does not contradict the assertion that every solution of the Chapman-Kolomogorov equation satisfying (9.2) corresponds (in a unique manner) to a Markov process.

472

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.9

The Pik(t) are continuous in t, and for every fixed t,j

(9.5)

pjj(t)

=

O.

The probabilistic interpretation of (9.3) is obvious; if at epoch t the system is in state En' the probability that between t and t+h a change occurs is cn(t)h + o(h). The coefficient Pik(t) can be interpreted as the conditional probability that, if a change from Ei occurs between t and t+h, this change takes the system from Ei to E k • In the birth-and-death

(9.6)

Pi,iH(t)

= ,

Ai

+'A· , #i

Pi,i-l(t)

=

A i

#i

+ #i

'

and pilt) = 0 for all other combinations of j and k. For every fixed t the Pik(t) can be interpreted as transition probabilities of a Markov chain. The two assumptions suffice to derive a 'system of backward equations for the Pike T, t), but for the forward equations we require in addition

Assumption 3. For fixed k the passage to the limit in (9.4) is uniform with respect to j.

The necessity of this assumption is of considerable theoretical interest and will be discussed presently. We proceed to derive differential equations for the Pik(T, t) as functions of t and k (forward equatIOns). From (9.1) we have (9.7)

Pik(T, t+h)

= 2, Pij(T, t)Pik(t, t+h). i

Expressing the term Pk1e(t, t+h) on the right in accordance with (9 3), we get (9.8)

Pik(T, t+h) - PilT, t) h -

where the neglected terms tend to 0 with h, and the sum extends over all j except j = k. We now apply (9.4) to the terms of the sum. Since (by assumption 3) the passage to the limit is uniform in j, the right side has a limit. Hence also the left side has a limit, which means that Pike T, t) has a partial derivative with respect to l, and (9.9)

XVII.9]

GENERAL PROCESSES

473

This is the basic system offorward differential equations. Here i and Tare fixed so that we have (despite the formal appearance of a partial derivative) a system of ordinary differential equations 19 for the functions PiiT, t). The parameters i and T appear only in the initial condition

for

(9.10)

k = i

otherwise.

We now turn to the backward equations. In them k and t are kept constant so that the transition probabilities Pii T, t) are considered as functions of the initial data Ei and T. In the formulation of our starting assumptions the initial variable was kept fixed, but for the derivation of the backward equations it is preferable to formulate the same conditions with reference to a time interval from t-h to t. In other words, it is more natural to start from the following alternative form for the conditions (9.3) and (9.4): . (9.3a)

(j ~ k).

(9.4a)

It is not difficult to pro've the equivalence of the two sets of condItIOns (or to express them in a unified form), but we shall be content to start from the alternative form. The remarkable feature of the following derivation is that no analogue to assumption 3 is necessary. By the Chapman Kolmogorov identity (9.1h) (9.11) v

and using (9.3a) with n

= 19

i, we get

-ClT)Pik(T, t)

+ h-1I PivCT-h, T)PviT, t) + .... v~i

The standard form would be x~(t)

=

-Ck(t)Xk(t)

+ 2 xit)Cj(t)pillt ). j

474

[XVII.9

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

Here h- 1P i vCT-h, T) -- ClT)PiJT) and the passage to the limit in the sum to the right in (9.12) is always uniform. In fact, if N> i we have (9 13) 0

<

b-

1

ro

2:

ro

PivCT

1],

< h- 1 2:

T) P"iT , t)

v=.\'+1

h, 1")

PzvC'T

v=.\'+1

= "-'( 1 - ,tPi,(.-h,

T») ~ C;(.)(

1-

i

Pi

'('»)'

In view of condition (9.5) the right side can be made arbitrarily small by choosing N sufficiently large. It follows that a tennwise passage to the limit in (9.12) is permitted and we obtain (9.14) These are the basic backward differential equations. Here k and t appear as fixed parameters and so (9.14) represents a system of ordinary differential equations. The parameters k and t appear only in the initial conditions for i = k (9.15) otherwise. Example. (a) Generalized Poisson process. Consider the case where all ci(t) equal the same constant, clt). = A, and the Pik are independent of t. In this case the Pik are the transition probabilities of an ordmary Markov chain and (as in chapter XV) we denote its higher transition probabilities by p;~). From clt) = A, it follows that the probability of a transition occurring betwen t and t + h is independent of the state of the system and equals ,1h + o(h). This implies that the number of transitions between T and t has a Poisson distribution with parameter ,1(t-T). Given that exactly n transitions occurred, the (conditional) probability of a passage from j to k is p;~). Hence

(9.16)

Pz"k(T , t)

-

-A,(t-T) IOCJ

e

An(t-T)n (n) ,

n-O

n.

P"k z

(where, as usual, i"~) = 1 and p;~) = 0 for j ¥- k). It is easily verified 33 , " that (9.16) is in fact a solution of the two systems (9.9) and (9.14) of differential equations satisfying the boundary conditions. In particular, if (9.17)

Pik

=

for

k <j,

Pik

= fk-i

for

k

(9.16) reduces to the compound Poisson distribution <;>f XII,2.

>j

XVII.9]

GENERAL PROCESSES

475

Our two systems of differential equations were first derived by A. Kolmogorov in an important paper developing the foundations of the theory of Markov processes. 20 Assuming that the sequence of coefficients enCt) remains bounded for each t it was then shown by W. Feller that there exists a unique solution {Pike T, t)} common to both systems, and that this solution satisfies the Chapman-Kolmogorov identity (9.lb) as well as the side conditions (9.2). Furthermore, in this case neither system of differential equations possesses any other solutions, and hence the two systems are essentially equivalent. However, concrete problems soon lead to equations with unbounded sequences {en} and, as shown in section 4, in such cases we sometimes encounter unexpected solutions for which (9.18)

2: PjiT, t) < 1 k

holds with the strict inequality. It has been shown 21 [without any restrictions on the coefficients en(t)] that there always exists a minimal solution {Pike T, t)} satisfying both systems of differential equations as well as the Chapman-Kolmogorov identity (9.lb) and (9.18). This solution is called minimal because (9.19) whenever the left sides satisfy either the backward or the forward differential equations (together with the trite initial conditions (9.10)]. 'Nhen the minimal solution satisfies (9.18) with the equality sign for all t, this implies that neither the backward nor the forward equations can have any probabilistically meaningful solutions besides Pike T, t). In other words, when the minimal solution is not defective, the process is uniquely determined by either system of equations. As stated before, this is so when the coefficients en(t) remain bounded for each fixed t. The situation is entirely different when the minimal solution is defective, that is, when in (9.18) the inequality sign holds for some (and hence for all) t. In this case there exist infinitely many honest transition probabilities 20 A. Kolmogoroff, Uberdie analyt isch en Methoden inder Wahrscheinlichkeitsrechnung, Mathematische Annalen, vol. 104 (1931), pp. 415-458. 21 W. Feller, On the integro-differential equations of purely discontinuous Markoff processes, Trans. Amer. Math. Soc., vol. 48 (1940), pp. 488-515. This paper treats more general state spaces, but countable state spaces are mentioned as special case of greatest interest. This was overlooked by subsequent authors who gave more eOffiplicated and less complete derivations. The minimal solution in the time-homogeneous case is derived in XIV, 7 of volume 2 by the use of Laplace transforms. For a more complete treatment see W. Feller, On boundaries and lateral conditions for the Kolmogorov differential equations, Ann. Math., vol. 65 (1957), pp. 527-570.

476

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.9

satisfying the backward equations and the Chapman-Kolmogorov identity, and hence there exist infinitely many Markovian processes satisfying the basic assumptions 1 and 2 underlying the backward equa"tions. Some of these may satisfy also the fonvard equations, but in other cases the solution of the forward equations is unique. 22 Example. (b) Birth processes. The differential equations (3.2) for the time-homogeneous birth process were of the form (9.20) These are the forward equations. Since they form a recursive system the solution is uniquely determined by its initial values for t = O. For the transition probabilities we get therefore sllccessively Pik(t) - 0 for all k < i, (9.21) and finally for k (9.22)

>i P o«t)

. Ak--l f.'e-·"p,.k--l(t-S) ds.

To see that these transition probabilities satisfy the Chapman-Kolmogorov identity (9.1a) it suffices to notice that for fixed i and s both sides of the identity represent solutions of the differential equations (9.20) assuming the same initial values. The backward equations were derived in (8.4) and are of the form (9.23) We have to show that this equation is satisfied by Pik(t) when k IS kept fixed. This is trivially true when k < i because in this case all three terms in (9.23) vanish. Using (9.21) it is seen that the assertion is true also when k - i = 0 and k - i = 1. We can now proceed by induction using the fact that for k > i + 1 (9.24) 22 It will be recalled that only assumptions 1 and 2 are probabilistically meaningful whereas assumption 3 is of a purely analytic character and was introduced only for convenience. It is unnatural in the sense that not even all solutions of the forward equations satisfy the imposed uniformity condition. Thus the backward equations express probabilistically meaningful conditions and lead to interesting processes, but the same cannot be said of the forward equations. This explains why the whole theory of Markov processes must be based on the backward equations (or abstractly, on semi-groups of transformations of functions rather than probability measures).

XVII.9]

GENERAL PROCESSES

477

Assume that the Pik(t) satisfy (9.23) if k - i =:;: n. For k = i + 1 + n we can then express the integrand in (9.24) using the right side in (9.23) with the result that (9.23) holds also for k - i = n + 1. We have thus proved that a system of transition probabilities Pilil) is uniquely determined by the forward equations, and that these probabilities satisfy the backward equations as well as the Chapman-Kolmogorov identity. The backward equations (9.23) may have other solutions. The asserted mmunahty property (9.19) of our transitIOn probabilities may be restated as follows. For arbitrary non-negative solutions of (9.23) (9.25) for all l > O. Here k is arbitrary, but fixed. This assertion is trivial for k < i since in this case the right sides vanish. Given Yi+l the solution Yi of (9.23) can be represented explicitly by an integral analogous to (9.22), and the truth of (9.25) now follows recursively for i = k, k - 1, .... Suppose now that 2.1;1 < 00. It was shown in section 4 that in this case the quantities co

(9.26)

Li(t)

= 1 - '2,Piit) k=O

do not vanish identically. Clearly Llt) may be interpreted as the probability that, starting from E i , "infinity" is reached before epoch t. It is also obvious that the Li are solutions of the differential equations (9.23) with the initial values LlO) = o. Consider then arbitrary non-negative functiops Ak and define

(9.27)

Pik(t)

=

Pik(t)

+ CLlt-s)Ak(s) ds . .(0

It is easily verified that for fixed k the Pik(t) satisfy the backward differential equations and Pik(O) = Pik(O). The question arises whether the Ak(t) can be defined in such a way that the Pik(t) become transition probabilities satisfying the Chapman-Kolmogorov equation The answer is in the affirmative. We refrain from proving this assertion but shall give a probabilIstic mterpretatIOn. The Piit) define the so-called absorbing boundary process: When the system reaches infinity, the process terminates. Doob 23 was the first to study a return process in which, on reaching infinity, the system instantaneously returns to Eo (or some other prescribed state) and the process starts from scratch. In such a process the system may pass from Eo to E5 either in 23 J. L. Doob, Markoff chains-denumerable case, Trans. Amer. Math. Soc., vol. 58 (1945), pp. 455-473.

478

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII. 10

five steps or in infinitely many, having completed one or several complete runs from Eo to "infinity." The transition probabilities of this process are of the form (9.27). They satisfy the backward equations (8.4) or (9.23) ~ but not the forward equations (9.24) or (8.5). This explains why in the derivation of the forward equations we were forced to introduce the strange-looking assumption 3 which was unnecessary for the backward equations: The probabilistically and intuitively simple assumptions 1 2 are compatible with return processes, for which the forward equations (9.24) do not hold. In other words, if we start from the assumptions r-2 then Kolmogorov's backward equations are satisfied, but to the forward equations another term must be added. 24 The pure birth process is admittedly too trite to be really interesting, but the conditions as described are typical for the most general case of the Kolmogorovequations. Two essentially new phenomena occur, however. First, the birth process involves only one escape route out to "infinity" or, in abstract terminology, a single boundary point. By contrast, the general process may involve boundaries of a complicated topological structure. Second, in the birth process the motion is directed toward the boundary because only transitions En -- En+! are possible. Processes of a different type can be constructed; for example, the direction may be reversed to obtain a process in which only transitions En+! -- En are possible. Such a process can originate at the boundary instead of ending there. In the birth and death process, transitions are possible in both directions just as in one-dimensional diffusion. It turns out that in this case there exist processes analogous to the elastic and reflecting barrier processes of diffusion theory, but their description would lead beyond the scope of this book.

10. PROBLEMS FOR SOLUTION 1. In the pure birth process defined by (3.2) let An > 0 for all n. Prove that for every fixed n ;;::: 1 the function P n(t) first increases, then decreases to O. If tn is the place of the maximum, then tl < t2 < t3 < .. " Hint: Use induction; differentiate (3.2). 2. Continuation. If 2 Jon 1 00 show that tn • 00. :Hint. If tn - I, then for fixed t > T the sequence AnPn(t) increases. Use (4.10). 3. The Yule process. Derive the mean and the variance of the distribution defined by (3.4). [Use only the differential equations, not the explicit form (3.5).]

4. Pure death process. Find the differential equations of a process of the Yule type with transitions only from En to En_I' Find the distribution Pn(t), its mean, and its variance, assuming that the initial state is E i . 24

For further details see XlV,8 of volume 2.

XVII. 10]

479

PROBLEMS FOR SOLUTION

5. Parking lots. In a parking lot with N spaces the incoming traffic is of the Poisson type with intensity A, but only as long as empty spaces are available. The occupancy times have an exponential distribution (just as the holding times in section 7). Find the appropriate differential equations for the probabilities P net) of finding exactly n spaces occupied. 6. Various queue disciplines. We consider the waiting line at a single channel subject to the rules given in example (7.b). This time we consider the process entirely from the point of view of Mr. Smith whose call arrives at epoch o. His waiting time depends on the queue discipline, namely the order in which waiting calls are cleared. The following disciplines are of greatest interest. (a) Last come last served, that is, calls are cleared in the order of arrival. (b) Random order, that is, the members of the waiting line have equal probabilities to be served next. (c) Last come first served, that is, calls are cleared in the inverse order of arrival 25 It is convenient to number the states starting with -1. During Mr. Smith's actual service time the system is said to be in state Eo, and at the expiration of this servicetime it passes into E_l where it stays forever. For n ~ 1 the system is in state En if Mr. Smith's call is still in the waiting line together with n - 1 other calls that will, or may, be served before Mr. Smith. (The call being served IS not mcluded m the waiting line.) Denote by Pn(t) the probabihty of En at epoch t. Prove that

in all three cases. Furthermore (a) Under last come last served discipline P~(t) -

-pPn(t)

+ ~I~z

(b) under random order discipline when n

1'1>0

it), ~

2

+ t,uP2(t) + t,uP2(t) + i,uP3(t) + ....

P~(t) = -(A+,u)P1(t) P~ (t) =

-,uPo(t)

+ ,uP1(t)

(c) Under last come first served discipline for n

P~(t)

~

2

= -(A+,u)P1(t) + ,uP2(t)

P~(t) =

-,uPo (t)

+ ,uP1(t).

(See also problem 20.) 25 This discipline is meaningful in information-processing machines when the latest information (or observation) carries greatest weight. The treatment was suggested by E. Vaulot, Delais d'attente des appels telephoniques dans l'ordre inverse de leur arrivee, Comptes Rendues, Academie des Sciences, Paris, vol. 238 (1954), pp. 1188-1189.

480

[XVII. 10

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

7. Continuation. Suppose that the queue discipline is last come last served (case a) and that prcO) = 1. Show that (fltY-1.: Pk(t) = (r k)! e-Jlt ,

o~ k

~

r.

8. Continuation. Generalize problem 6 to the case of a channels. 9. The Polya process. 26 This is a non-stationary pure birth process with An

depending on time: (10.1)

1

+ an

1

+ at

Show that the solution with initial condition PoCO) = 1 is poet) = (1 +at)-lja (10.2) Pn(t) = (1 +a)(l +2a) . ',' {I +(n -l)a} t n(1 +at)-n-Ija.

n. Show from the differential equations that the mean and variance are t and t(l +at), respectively. 10. Continuation. The Polya process can be obtained by a passage to the limit from the Polya urn scheme, example V, (2.c). If the state of the system is defined as the number of red balls drawn, then the transition probability Ek ---+- Ek+l at the (n + l)st drawing is (10.3)

Pk,n = r

r + kc + b + nc

=

p 1

+ ky + ny

where P = r/(r+b), y = c/(r+b). As in the passage from Bernoulli trials to the Poisson distribution, let drawings be made at the rate of one in h tIme umts and let h ---+- 0, n ---+- Cf) so that np ---+- t, ny ---+- at. Show that in the limit (10.3) leads to (10.1). Show also that the Polya distribution V, (2.3) passes into (10.2). 11. Linear growth. If in the process defined by (5.7) A = fl, and PI(O) = 1, then (10.4)

Poct)

1

+ At '

The probability of ultimate extinction is 1. 12. Continuation. Assuming a trial solution to (5.7) of the form Pn(t) = = A(t)Bn(t), prove that the solution with PI(O) = 1 is (10.5)

poet) = flB(t),

Pn(t) = {l-AB(t)}{l-flB(t)}{AB(t)}n-1

26 O. Lundberg, On random processes and their applications to sickness and accident statistics, Uppsala, 1940..

XVII. 10]

481

PROBLEMS FOR SOLUTION

with 1 - e(l-Jtlt B(t) = fl - Ae(l -I' ) t

(10.6)

13. Continuanoll. The generating function pes, t) partial differential equation

2 Pn(t)sn

satisfies the

(10.7) 14. Continuation. Let M 2(t) = "'1n2Pn(t) and M(t) = '2,nPn(t) (as in section

5). Show that M~(t) = 2(A-fl)M2(t)

(10.8)

+ (A+fl)M(t).

Deduce that when A > fl the varzance of {Pnet)} IS gIven by (10.9)

15. For the process (7.2) the generating function P(s, t) the partial differential equation

-ap = (1-s) ( -AP + Il -ap}

(10.10)

at

as

=

'2,Pn(t) sn satisfies

.

Its solution is For i t -+A/fl.

00,

=

0 thIS IS a Poisson dIstributIon with parameter ;{(1 -e Jtt)! fl. As the distribution {Pn(t)} tends to a Poisson distribution with parameter

16. For the process defined by (7.26) the generating function for the steady state P(s) = '2, Pnsn satisfies the partial differential equation (10.11)

(fl

ap + AS) -as

=

a),P,

with the solution P = {(fl+AS)/(A+fl)}al 17. For the diffeIential equations (7.26) assume a trial solution of the form

Prove that this is a solution if, and only if, 2

A - - - (l-e

(;1:+Jllt)

A+fl

18. In the "simplest trunking problem," example (7.a), let Qn(t) be the probability that starting from En the system will reach Eo before epoch t.

482

THE SIMPLEST TIME-DEPENDENT STOCHASTIC PROCESSES

[XVII.I0

Prove the validity of the differential equations Q~(t) = -(A +n,u)Qn(t)

+ AQn+l(t) + n,uQn_l(t),

(n

~

2)

(10.12) with the initial conditions Qn(O) = O. 19. Continuation. Consider the same problem for a process defined by an arbitrary system of forward equations. Show that the Qn(t) satisfy the corresponding backward equations (for fixed k) with Pollt) replaced by 1. 20. Show that the differential equations of problem 6 are essentially the same as the forward equations for the transition probabilities. Derive the corresponding backward equations. 21. Assume that the solution of at least one of the two systems of (forward and backward) equations is unique. Prove that the transition probabilities satisfying this system satisfy the Chapman-Kolmogorov equation (1.1). Hint: Show that both sides satisfy the same system of differential equations with the same initial conditions. 22. Let Pik(t) satisfy the Chapman-Komogorov equation (1.1). Supposing that Pik(t) > 0 and that Si(t) = 2: Pik(t) ::;; 1, prove that either Slt) = 1 for an t or Si(t) < 1 for all t. k 23. Ergodic properties. Consider a stationary process with finitely many states; that is, suppose that the system of differential equations (9.9) is finite and that the coefficients Cj and Pjk are constants. Prove that the solutioris are linear combinations of exponential terms eA,(t-r ) where the real part of ). is negative unless A = O. Conclude that the asymptotic behavior of the transition probabilities is the same as in the case of finite Markov chains except that the penodic case IS Impossible.

Answers to Problems

CHAPTER I 1. (a)!; (b)!; (c) 130' 2. The events S1' S2, S1 U S2' and S1S2 contain, respectively, 12, 12, 18, and 6 points. 4. The space contains the two points HH and TT with probability!; the two points HTT and l'HH WIth probabIlity t; and generally tw.o pomts with probability 2-n when n ~ 2. These probabilities add to 1, so that there is no necessity to consider the possibility of an unending sequence of tosses. The required probabilities are it and i, respectively.

t.

9. P{AB} = P{A U B} = ~:, P{AB'} 12. x = 0 in the events (a), (b), and (g). x = 1 in the events (e) and (n. x = 2 in the event (d). x = 4 in the event (c). 15. (a) A; (b) AB; (c) B U (AC).

=

t.

16. Correct are (c), (d), (e), (j), (h), (i), (k), (I). The statement (a) is meaningless unless C c B. It is in general false even in this case, but is correct in the special case C c B, AC = O. The statement (b) is correct if C:::> AB. The statement (g) should read (A u B) - A = A'B. Finally (k) is the correct version of (j). 17. (a) AB'C'; (b) ABC'; (c) ABC; (d) A u B u C; (e) AB U AC U BC; (I) AB'C' U A'BC' U A'B'C; (g) ABC' U AB'e u A'BC = (AB u AC U BC) - ABC; (h) A'B'C'; (i) (ABC),.

18. A

B

U

U

C

= A U (B-AB) U {C-C(A UB)} = A U BA' U CA'B'.

CHAPTER II

18,252; (c) 26 2 + 263 + 264 • In a city with 20,000 inhabitants either some people have the same set of initials or at least 1748 people have more than three initials. 2. 2(210 -1) = 2046. 3

1. (a) 26

;

(b) 26

2

+ 26

3

=

1 n

4. (a) -; 483

(b)

(

nn

1 1) .

484

ANSWERS TO PROBLEMS

5. qA = (%)6, qB = (%)12 + 12(%)11 ·l. 6. (a) PI = 0.01, P2 = 0.27, P3 = 0.72. (b) PI = 0.001, P2 = 0.063, P3 = 0.432, P4 = 0.504. 7. p.r = (10)r lO-r. For example, P3 = 0.72, P10 = 0.00036288. Stirling's formula gIves P10 - 0.0003598 .... 8. (a) (loY; (b) (l1!.o)k; '(180)k; (d) 2(190)k - (18o )k; (a) AB and A u B.

11. The probability of exactly r trials is (n -1)r-1/(n)r = n- 1. 12. (a) [1 . 3 . 5 ... (2n _1)]-1 = 2 n n !/(2n)!; (b) n! [1

3

(2n

1)]-1

2

7

n

.

13. On the assumption of randomness the probability that all of twelve tickets come either on Tuesdays or Thursdays is (t)12 = 0.0000003 .... There are only

t:j

21 pairs of days, so that the probability remains extremely

small even for any two days. Hence it is reasonable to assume that the police have a system. 14. Assuming randomness, the probability of the event is (~)12 = l appro No safe conclusion is possible. 15. (90)10/(100)10 = 0.330476 .... 16. 25! (5 !)-55-25 = 0.00209 .... 17.

2(n -2)in -r -I)! 2(n -r -1) n! = n(n-1) .

19. The probabilities =

are

1 - (i)4 = 0.517747 . ..

and

1 - (ii)24 =

0.491404 ....

20. (a) (n - N)r/(n)r' (b) (1 - N/nY. For r = N = 3 the probabilities are (a) 0.911812 ... ; (b) 0.912673 .... For r = N = 10 they are (a) 0.330476; (b) 0.348678 ....

21. (a) (l-Nlny-1. (b) (n)Nrl«n)NY' 22. (1 _2/n)2 r -2; for the median 2 r+1 = 0.7n, approximately. 23. On the assumption of randomness, the probabilities that three or four breakages are caused (a) by one girl, (b) by the youngest girl are, respectively, l! ~ 0.2 and l536 ~ 0.05. 24. (a) 12 !/1212 = 0.000054.

(b)

(12) 25. 230! 6 12-30 ~ 0.00035 .... 6 66

(26 -2)12-6 = 0.00137 ....

485

ANSWERS TO PROBLEMS

(;,)z" / G~);

26. (a)

n(n-2)

(e)

(b) n

Gr~~)

2'H/ G;);

2"-4/ (2n).

27. (N-3)/(N-1). r-1

r-1

N / 'eN) 2N ~ V2/(N7r). Cj'

28. P = 29. P

30. Cf. problem 29. The probability is

(13)( 39 )(13-m)C6+m) 'C2)C9). m 13-m n 13-n ( 13

31. (:)

13

(264~k) / (~~).

(13\( 39 \(13-a\(26+a\(13-a-b\(13+a+b\ \ a 13 a! ~ b I ~ 13 c 1\ 13 c ,

!\

33. (a) 24p(5, 4, 3, 1);

bl \

(b) 4p(4, 4, 4, 1);

(c) 12p(4, 4,3,2).

(Cf. problem 33 for the probability that the

hand contains a cards of some suit, b of another, etc.) 35. po(r) = (52 -r)4/(52)4; PIer) = 4r(52 -r)3/(52)4; P2(r) = 6r(r-1)(52-r)2/(52)4; paCr ) = 4r(r-1)(r-2)(52-r)/(52)4; plr) = r4/(52)4' 36. The probabilities that the waiting times for the first, ... , fourth ace exceed rare WIer) - po(r); W2Cr) - poCr) + PIer); waCr)

=

po(r)

+ PIer) + plr);

wlr) = 1 - plr).

486

ANSWERS TO PROBLEMS

Next fi(r) = wi(r-1) - wier). The medians are 9" 20, 33, 44. with k

40. (

r1

+5) 5 (r2 + 1).

41. (r1+r2 +r3)! . r1! r2! r3!

43. P{(7)}

=

10.10-7

=

0.000001.

P((6, 1))

10! 7! 7 8!1!1! 1!6!'1O

P{(5,2)}

10! . 7! . -7 8!1!1! 2!5! 10

P{(S, 1, 1)}

10! 7 ! . 10-7 7!2!1! 1!1!5!

P{(4, 3)}

10! 7! 0- 7 8!1!1!'3!4!'1

=

0.000315.

P{(4, 2, l)}

10! 7 ! . 10-7 7!1!1! 1!2!4!

=

0.007560.

=

0.017640.

=

0.005040.

P{(4, 1, 1, I)}

10'

0.000063. =

0001512

7'

P{(3, 3, I)} P{(3, 2, 2)} P{(3, 2, 1, I)}

0.000 189.

0.007560.

7!2!1! 2!2!3!'1O

7 ! . -7_ 10! 6!2!1!1! 1!1!2!3! 10 -0.105840.

P{(2 , 2 , 2, I)}

10! 7! 18 5!4!1! 1!1!1!1!3! 1O! 7, . . 10-7 6!3!1! 1!2!2!2!

P{(2, 2,1,1, I)}

10! 7' 7 5! 3! 2! 1! 1! I'! 2! 2! .10- = 0.317 520.

P {(2, 1, 1, 1, 1, 1)}

1O! 7 ! . 10-7 _ 0 31 520 4! 5! I! I! I! I! 1 ! 1 ! 2! -. 7 .

P{(3, 1, 1, 1, 1)}

- 10 ! . 7.,. 10-7 )} -3!7! { P(1,l,l,l,l,l,l,

7

8.105848. =

0.052 920.

0 060 = .480 .

~

2;

487

ANSWERS TO PROBLEMS

44. Letting S, D, T, Q stand for simple, double, triple, and quadruple, respectively, we have P{22S}

=

~!~ ~

.

365-22

= 0.524 30.

_ 365! . 22!. -22 - 1! 344! 20! 2! 365

8

P{20S

+ 1D}

P{18S

+ 2D}

365! 22! -22 ---~-' 365 2! 345! 18! 2! 2!

P{16S

+ 3D}

365! 22! -22 3! 346! 16! 2! 2! 2!· 365

P{19S

+

P{17S

+

IT} 1D

=

+

= 0.3520 .

365! . ~. 365-22 345! 19! 3!

IT} = 365!.

22! .365-22 346 ! 17! 2! 3!

P{14S

+ 4D}

P{15S

+ 2D + 1T f

=

=

0.01429.

=

0.00680.

=

0.003 36.

000066

=.

= 365! . ~. 365-22 346! 18! 4!

+ 1Q}

0.09695.

365! 22! . 6 -22 4 347! 14!2!2!2!2! 35 -0.0012.

.36 -22 ) - 365! . 22! - 347! 15! 2! 2! 3! 5

P{(18S

=

.

= 0.00009.

2 45. Let q = (55 ) = 2,598,960. The probabilities are: (a) 4/q;

(d) 9 .45 .q-l = 217665880;

(j)

(c) 13 . 12·4 . 6 . q-l = 41665;

(b) 13 . 12 ·4 . q-l = 4/65;

cn .

11 . 6 6 4 lJ

(e) 13· 1

(~2)

No'5;

4.4 2 . q-l =

(g) 13

c:)

4~~5;

6 43 q

I

1760

4165'

CHAPTER IV 1. 99/323. 2. 0.21 . . . . 3. 1/4. 4. 7/26 • 6 5. 1/81 and 31/6 • 6. If Ak is the event that (k. k) does not appear. then from lC.5)

1- h

=6

7. Put r l = S3 = 40

-G) G:r G) G~r -(!) G~) G~) -G~r G!)· (:8k c;) (~)

G~r

c:)

+

Then Sl = 13

p. Numerically,

approximately.

+6

PrO]

=,0.09658;

S, =

Pel]

= 0.0341;

p;

P[2]

= 0.0001,

488 8.

ANSWERS TO PROBLEMS

Ur

=

i

(_1)k

k=O

(1 - ~\\. nJ

(N) k

II, (12.18) for a

roof that the two

k=O

results agree. 10. The general term is alkla2k2'" aNkN' where (k l , k2' ... ,kN) is a permutation of (1,2, ... ,N). For a diagonal element kv = v. 12.

Ur

14. Note that, by definition, Ur

15. Ur

Ur

i: (

I

The limit equals

l)k-1

I (_1)k (n -1) k

Pro] = 0.780217, P[3]

=

0.000330,

0 for r


and Un

= n! sn/(ns)n'

{n-1) (ns-kS)r_l.

n-l

k=O

=

(1 - -k +-1)r-1. n

Pel] =0.204606, P[4]

=

P[2]

=

0.014845,

0.000002, approximately.

N-m 19. m! N! Um = (-1)k(N-m-k)!/kL

I

k=O

20. Cf. the following formula with r 21. (rN)! x = -

(~) r'(rN-2)!

-

(0

r"{rN-3)!

=

2.

+ _0 ° 0+ (-l)Nr N(rN-N)!.

25. Use II, (12.16) and (12.4). 26. Put UN = Al u·· . U AN and note that UNAN+! = (AlAN+! U· .. U (A NA N+1)'

UN+! = UN U AN+! and

489

ANSWERS TO PROBLEMS

CHAPTER V 2. P 3. (a)

13

4

Cal

(~~)

2

10· 59 1 - 610 _ 510

0.182. . ..

=

= 0.411 . . ..

4.

=

=

0.61 ....

The probability of exactly one ace is

(b) 1 - 0.182 - 0.411

= 0.407, approximately.

11

G~) ~ 50 ; 80 345'

7. ~~.

P

12. -2- .

-p

t, bn = Yn - t, Cn = Zn -~. Then lanl + Ibnl + len I = t{lan+11 + Ibn+1 I + Ien+1I}. + Ibnl + len I increases geometrically.

14. (d) Put an = xn -

Hence lanl 15. P = (l-Pl)(l-P2) ... (1-Pn)' 16. Use 1 x < e re for 0 < x < 1 or Taylor's series for log (1 II, (12.26).

x); cf.

b +c 18. - - - b+c+r

19. Suppose the assertion to be true fo~ the nth drawing regardless of b, r, and c. Considering the two possibilities at the first drawing we find then that the probability of black at the (n + l)st trial equals

b . b+c +r . b b+r b+r+c b+r b+r+c

b b+r

20. The preceding problem states that the assertIon IS true for m I and all n. For induction, consider the two possibilities at the first trial. 23. Use II, (12.9). 24. The binomial coefficient on the right is the limit of the first factor in the numerator in (8.2). Note that

(-I/ Y) (1 + p)n ( -1/Y) n n I""-.J

2

26. 2v = 2p(l - p) ~

t

in consequence of (5.2).

1•

490

ANSWERS TO PROBLEMS

28. (a) u 2; (b) u 2

= P32 =

33. Pn

+ uv + v2/4;

2P2l

= P,

P12

(c) u 2

= P33 =

+ (25uv+9v 2+vw+2uw)/16.

2P23 =q,

P13

= P3l =

0, P22

~

x

= t.

CHAPTER VI 1.

5

2. The probability is 0.02804 ....

1 6'

3. (9.9)X

0.1,

~

22.

and x 2: 66, respecuvely. 5. 1 - (0.8)10 - 2(0.8)9

=

0.6242 ....

6. {I -(0.8)10 -2(0.8)9}/{1 _(0.8)10} = 0.6993 .... 7.

~26} ~26} / ~52}

8.

(~2)

2

11

r

13

=

0.003954 ...• and

~13} 2 2113

-

0.00952 ....

{6-6 - 2 . 12-6}.

9 True values' 0 6651 , 040187 , and 0 2009 ,; poisson approximations: 1 - e-l = 0.6321 ... , 0.3679 ... , and 0.1839 .... 00

00

10. e- 2

2. 2klk!

11. e-l

= 0.143 ....

2. 11k!

= 0.080 ....

3

4

12. e- x / IOO ~ 0.05 or ~ ~ 300. 13. e-l = 0.3679 ... , 1 - 2 . e-l = 0.264 .... 14. e- X ~ 0.01, x ~ 5. 15. lip = 649,740. 16. 1 - pn where P = p(O; A) + ... + p(k; A). 18. q3 for k = 0; pq3 for k = 1,2,3; and pq3 - pq6 for k = 4. 19.

k~ (~)' 2 ,. (~)2

b 1 20. a+ - (a+b-1) k=a k

2.

pa

2.

b 1

a+k-1

qk,

,- '"'" l/vr;;;, for large

n.

. can be wntten . -.In the alternatIve . form ThIS

pk~-b-l-k.

..

where the kth term equals the probabIhty that the ath

success occurs directly after k ~ b - 1 failures. 21.

xr

=

2N-1-r) . 2- 2N+r +1. ( N _ 1 N

22. (a) x = r~l xr2-

r l -

N

= 2- 2N ~l

(2N-1-r) N - 1 ;

whence

(b) Use II, (12.6).

491

ANSWERS TO PROBLEMS

25. P Plq2(Plq2 + P2ql) 1. 31. By the Taylor expansion for the logarithm b(O; n,p)

= qn = (1-A/n)n < e- A = p(O; A).

The terms of each distribution add to unity, and therefore it is impossible that all terms of one distribution should be greater than the corresponding terms of the other. 32. There are only finitely many terms of the Poisson distribution which are greater than E, and the remaining ones dominate the corresponding terms of the binomial distribution. CHAPTER VII 3. 91( - ~5) = 0.143 .... 2. Use (1.7). 1. Proceed as in section 1. 4. 0.99. 5. 511. 6. 66,400. 7 Most certainly The inequalities of Chapter VI suffice to show that an - excess of more than eight deviations is exceedingly improbable. 8. (27rn)-I{pIP2(1-Pl -pJ}-t. CHAPTER VIII 1. f3 = 21. 2 x - pu + q" U

+ rw,

= plX-1

where u, v, w are solutions of

w

3. u

= plX-1 + (qv+rw) V :::;:

1 -

+ (qv+rw)

IX-I

p

1 - P

= (pu+rw)

V

1 - P-l 1 q

-q

= pu + qv + rw = x.

1

IX-I

1- p

-p

,

1 - qP-l (pu+rw) 1 '

W

-q

< (2p)n,

but

If P = i, the last quantity is ,..." to zero.

dn;

4. Note that P{A.J

,

if p

1 - ,1-1 = (pu+qv) 1 _ r

> -~,

then P{An} does not even tend

CHAPIER IX 1. The possible combinations are (0,0), (0, 1), (0, 2), (1,0), (1, 1), (2,0), (2, 1), (3,0). Their probabilities are 0.047539, 0.108883, 0.017850, 0.156364, 0.214197, 0.321295, 0.026775, 0.107098.

492

ANSWERS TO PROBLEMS

2. (a) The joint distribution takes on the form of a 6-by-6 matrix. The main diagonal contains the elements q, 2q, ... ,6q where q = is. On one side of the main diagonal all elements are 0, on the other q. (b) E(X) = i-, Var (X) = -f~, E(Y) = 136l, Var (Y) = i~g~, Cov (X, Y) = 17°25. 3. In the joint distribution of X, Y the rows are 32-1 times (1, 0, 0, 0, 0, 0), (0, 5, 4, 3, 2, 1,) (0, 0, 6, 6, 3, 0), (0, 0, 0, 1, 0, 0); of X, Z: (1, 0, 0, 0, 0, 0), (0, 5, 6, 1, 0, 0), (0, 0, 4, 6, 1, 0), (0, 0, 0, 3, 2, 0), (0, 0, 0, 0, 2, 0), (0, 0, 0, 0, 0, 1); Y, Z: (1, 0, 0, 0), (0, 5, 6, 1), (0, 4, 7,0), (0, 3, 2, 0), (0, 2, 0, 0), (0, 1, 0, 0). Distribution of X + 1: (1, 0, 5, 4, 9, 8, 5) all divided by 32, and the values of X + Y ranging from to 6; of XV: (1, 5, 4, 3, 8, 1, 6, 0, 3, 1) all divided by 32, the values ranging from to 9. E(X) =~, E(Y) = j-, E(Z) = -ft, Var (X) =!, Var (Y) = i, Var (Z) = i~:.

°

°

4. (a) p/(l + q); (b) 1/(1 + q + q2); (e) 1/(1 + q)2 . 8. The distribution of V n is given by (3.5), th 2 of Un follows by symmetry.

9. (a) P{X ~ r, Y ~ s} = N-n(r-s+1)n

for r

~

P{X = r, Y = s} = N-n{(r-s+1)n - 2(r-s)n

if r

> s,

rn - 2 - (r - 1)n-2 rn _ (r-1)n

j

if

rn - 2 x = --,....----,rn - (r - 1)n

if j

~

x=o

if j

>r

2

+ (r-s--':'1)n},

and = N-n if r = s.

(b) x =

(e) a

s;

< rand

rand k or k

k =

< r. r,

or j

=

rand

k

~

r.

> r.

nN2

~ (n + 1)2(n +2) .

10. Probability for n double throws 2pq(p2 +q2)n-1. Expectation 1/(2pq). 12. P{N = n, K = k} = (:) pn-k(qq')k. qp' for k P{N

~ n.

= n} = (1-Qp')nqp'.

13. E(N K

+

1) = k,nL kpk,n/(n + 1) = q2p 'q'n=l ~ (1 - n ~ l)(P + qq')n-1 qq' 1

E(K)

'II"

q2p'q' 1 log (1 qp')2 qp' .

--=-=~-.-::

,

= ~. E(N) =

(1

-qp p' ' q p '

p(K, N) = Vq'/(1-qp').

')

; Cov (K, N) =

q' qp

-;-2 .

=

493

ANSWERS TO PROBLEMS

14. (a) Pk

= pkq + qkp; E(X) = pq-1 + qp-1; Var (X) = pq-2

+ qp-2

- 2.

(b) qk = p2q k-1 + q2p k-1; P{X = m. Y = n} n ~ 1; E(Y) = 2; a 2 = 2(pq-1+ qp-1_1).

17. (:)364 365 n k -

= pmHqn + qm+1qn with m,

1 n - .

18. (a) 365{I 364 1t · 365 It -n3641t 1 . 365 It}; (b) n ~ 28. 19. (a) fJ- = n, a 2 = (n-l)n; (b) fJ- = (n+l)/2, a 2 = (n 2 -1)/12. 20. E(X) = np1; Var (X) = np1(1-P1); COY (X, Y) = -np1P2. 21. -n/36. This is a special case of problem 20. 25. E(Y r ) =

I : +1; k=l r -

r

Var (Y r )

=

I k=l

(b) E(X) = N {1- q k+k- 1};

26. (a) 1 - qk;

N(N ,+k 1) (-k 1)2 . r

(c)

+

d~~)

=

o.

27. ~(1 _pj)n. Put Xj = 1 or 0 according as the jth class is not or is presented. 28. E(X) = r 1(r 2 + 1); r1 + r2 b

33. E

(x) r

=

nb ; +r r

I

00

r=k =

I

k=l

Var (X) = r1r2(r1 -1)(r2 + 1) 2. (r1 +r2 -1)(r1 +r2) nbr{b +r +nc} (b +r)2(b +r +c) .

Var (8 n )

k- 1(k-l) prqk-r = r-l

=+

(_I)k-1 _ _ r - k q

-=

r logp.

q

To derive the last formula from the first, put [(q) =rI k- 1(k-l)qk. Using II, (12.4), we find that ['(q) by repeated integrations by part.

=

r-l rqr-1(1_q)-r. The assertion now follows

CHAPTER XI 1. spes) and P(S2). 2. (a) (1-S)-lP(S); (b) (1-S)-l SP(S); (c) {1-sP(s)}/(1-s); (d) PoS-1 + {1-s- 1P(s)}/(1-s); (e) HP(v~) + P( -V:;)}. 3. U(s) = pqs2/(1_pS)(1-qs). Expectation = l/pq, Var = (1-3pq)[p2q2. 6. The generating function satisfies the quadratic equation A(s) = A2(S) + s.

Hence A(s) =

i - iv-1 -4s

and an

=

n-1 (2n -2) n - 1

.

494

ANSWERS TO PROBLEMS

10. (a) (b) 11. (a) (b)

c:pr(s)p1c(s) Ip - ql c:pr(s)[l +P(s) + ... (q!pyc:p2r(s). (q!pyc:p2r(s)U(s).

+ pk(s)] Ip -

ql.

12. Osmg the generating function for the geometric distribution of Xv we have without computation N-1)(N-2) P (s) = sr ... (N-r+1) (N - s r N - 2s N - (r -l)s .

13. Pis){N - (r-1)s} = P r_ l (s)(N-r-1)s. s 2s 14. Pis) = N _ (N -l)s N - (N'-2)s

rs N-(N-r)s'

15. Sr is the sum of r independent variables with a common geometric distribution. Hence

pr,k v-I

16. (a) P{R = r} =

I

=

qrp k(r+k-1) k •

P{Sr-1 = k}P{Xr ~ v - k}

=

k=O

1

E(R)

(b) 17

1

(PIP2)N

p

:v ~2)2(qIqJV-I.

vq

p

2 •

Note that

+ q3.

=

(1

+ s + ... + Sf-I)(1 + Sf + s2a + ... + s(b-I)a).

Using the fact that this recurrence relation 1 U(s) = -1- qs

22.

Val (R)

~I (N

+ s + ... + Sfb-I

- p3 type,

+ qv,

Un

= pWn-1 + qUn-l>

U(s) - 1 = psW(s) + qsW(s).

Vn

+ qsU(s);

IS

of the convolutIon

(pS)3

+ (1 -qs)3 U(s).

= PUn-1 + qVn-l> Wn = pVn-1 + qwn - I • Hence V(s) = psU(s) + qs . V(s); W(s) = psV(s) + CHAPTER XIII

1. It suffices to show that for all roots s ¥= 1 of pes) = 1 we have lsi and that lsi = 1 is possible only in the periodic case.. .

~

1,

495

ANSWERS TO PROBLEMS

2. u,n

~ ((~)2-'nr~ ltV("nl'.

Hence S is persistent only for r

~ 2.

For r = 3 the tangent rule for numerical integration gives 00

I

U2n

n=l

~

1"00 1 ~ A3jA3r dx = 3/-

'V

J rr 1

3. u6n ,...." vi 6/(2rrn)5. Thus u - 1 and! ~ 0.045.

+

4. U(A-t-1)n =

n except when p = AI(A

n

pAnqn.

'V

~

'\j rr

x

J

~

1 - .

2

6 roo x-i dx. (2rr)5 i

J

Hence u

~

1.047

The ratio of two successive terms is

<1

+ 1). (The assertion is also a consequence of the law of large numbers.) 6. From + P{X1 > O} < 1 conclude that f < 1 unless P{X1 > O} = O. In this case all Xi < 0 and & occurs at the first trial or never. 7. Zn = smallest integer ~(n - Nn)/r. Furthermore E(Zn) ,...." np/(q + pr), Var (Zn) ,...." npq(q + pr)-3.

Iii

(1 - qs)B(qs) . ( )' and pes) as In problem 8. - s + psB qs 11. N~ ~ (N n - 714.3)/22.75; 91(1) - 91( -1) ~ i. 12. rn = rn-1 - rn-2 + i rn-3 with ro = r1 = r2 = 1; R(s) = (8 +2S2)(8 -8s +2s2 -s3)-1; rn"-' 1.444248(1.139680)-n-1. 14. II an is the probablhty that an A-run of length r occurs at the nth trial, then A(s) is given by (7.5) with p replaced by ct and q by 1 - ct. Let B(s) and C(s) be the corresponding functions for B- and C-runs. The required generating functions are [(1 - s)U(s)]-l, where in case (a) U(s) = A(s); in (b) U(s) = A(s) + B(s) - 1; in (c) U(s) = A(s) + B(s) + C(s) - 2. 15. Use a stIaigiItfOi waId combination of the method in example (8.b) and 9. G(s) = 1

t

problem 14. 16. Expected number for age k equals Npqk. 18. wk(n) = Vn_krk when n > k and wk(n) = f3k-nrk/rk-n when n ~ k. 19. Note that 1 - pes) = (1-s)Q(s) and Il- Q(s) = (1-s)R(s), whence Q(1) = ft, 2R(1) = a 2 - ft + ft2. The power series for Q-1(S) = I (Un -un-Jsn converges for s - 1

CHAPTER XIV (q/p)b - 1 1. (q/p)a+ b _ 1

. If P ¥= q,

and

b

a

+b

if P =q.

3. When q < p. the number of visits is a defective variable. 4. The expected number of visits equals p(l - q1)/qqa-1 = (p/q)a. 5. The probability of ruin is still given by (2.4) with P = ct(1_y)-l, q = = f3(1 _y)-l. The expected duration of the game is Di1 _y)-l with D z given by (3.4) or (3.5).

496

ANSWERS TO PROBLEMS

6. The boundary conditions (2.2) are replaced by qo - oq1 = 1 - 0, qa = O. To (2.4) there corresponds the solution

qz

(q/p)a(1-0) + oq/p - 1

The boundary conditions (3.2) become Do = oD1 , Da = O. 7. To (2.1) there corresponds qz = pqz+2 + qqz-1, and qz = Az is a particular solution if A = pA3 + q, that is, if A = 1 or A2 + A = qp-1. The probabili ty of I uin is 1

if q

~

2p

if q

~

2p.

10. wz.n+1(x) = pWz+1,n(x) + qWz_1.n(x) with the boundary conditions (1) wo,n(x) = wa,n(x) =:= 0; (2) wz,o(x) = 0 for z ¥= x and wx,o(x) = 1. 11. Replace (1) by wo,n(x) = w1.nCx ) and wa,n(x) = wa_1.n(x),

12. Boundary condition: Ai(s)A~-!(S)

Af-t(s)

Ua n

=

U a ] no

+ A~(S)A1-!(s) + A~-!(S)

Generating function:

+ A~-Z-!(s) A~-!(S) + A~-l(s)

Af-z-!(s) -

00

18. P{Mn

< z}

=

L (vx-z,n

- vx+ z.n)

x=l

P{Mn z} P{Mn < z + i} P{l\fn < z}. 19. The first passage through x must have occurred at k ~ n, and the particle returned from x in the following n - k steps. 31. The relation (8.2) is replaced by a 1

UzCs) = s

L UxCs)Px_z

+ srz·

x=l

The characteristic equation is s LPk ak = 1. CHAPTER XV 1. P has rows Cp, q, 0, 0), (0, 0, p, q), (p, q, 0, 0), and (0, 0, p, q) For n > 1 the rows are (p 2,pq,pq, q2). 2. (a) The chain is irreducible and ergodic; p~~) -- i for all j, k. (Note that P is doubly stochastic.) (b) The chain has period 3, with G01 containing E1 and E 2 ; the state E4 forms GlrJ and E3 forms G2 • We have U] = U2 = t. U3 = U4 = 1. (c) The states E1 and E3 form a closed set Sl' and E 4, E5 another closed set S2' whereas E2 is transient. The matrices corresponding to the closed sets are 2-by-2 matrices with elements i. Hence p}~) --i if E} and Ek belong to the same S· __ O· r, pen) j2 ' finally pen) 2k __ 1.2 if k = 1, 3, and pen) 2k -- 0 if k = 2, 4,5.

497

ANSWERS TO PROBLEMS

(d) The chain has period 3. Putting a = (0,0,0, t, t, t), b = (1,0,0,0,0,0), c = (0, i, t 0, 0, 0), we find that the rows of p2 = p5 = ... are a, b, b, c, c, c, those of p3 = p6 = . .. are b, c, c, a, a, a, those of P = p4 = . .. are c, a, a, b, b, b.

3. p~j) = (j/6)n, p~~) =' (k/6)n - ((k-1)/6)n if k k <j. 4. x k = (1, i, i, i), Yk = (t, i, 1, i)· 6. For n ~j f}on)

=

1)

(n.

> j,

and

p~~) = 0

if

= ( j .) (-p)iqj.

pn-jqj

}-1

n-)

Generating function (qs)j(1 - ps)-j. Expectation j/q.

8. The even-numbered states form an irreducible closed set. The probability 'of a return to Eo at or before the nth step equals 1

Vo

+ v o(1

v2)

+ vov 2(1

v4 )

+ ... + VOV2 ... v2n-2(1

D2n)

1 - VOV2V4 ••• V2n Thus the even states are persistent if, and only if, the last product tends to O. Tl?-e probability that starting from E 2r+1 the system remains forever among the odd (transient) states equals v2r+1v2r+3" .. =

10. Possible states Eo, ... , Ew' For j PU-l

=

j(p - w

+ j) p-2,

13.

p=

Pj,j+1

. .0

1

u

U

U

o

o

>0

. ..

=

(p - j)(w - j)p-2,

1

"'-U-U

0 0 q P 0 0

0 1 ... 0 0_

14. Note that the matrix IS doubly stochastic; use example (7.h). 15. Put Pk,k+1 = 1 for k = 1, ... ,N - 1, and PNk = Pk' 16. L.Ujpjk = Uk> then U(s) = 11 0(1 - s)P(s){P(s) -S}-l. For ergodicity it is necessary and sufficient that p, = pl(l) < 1. By L'Hospital's rule U(l) = uo(1 - p,) whence Uo = (1 - p,)-l.

498

ANSWERS TO PROBLEMS

25. If N ~ m - 2, the variables x(m) and x(n) are independent, and hence the three rows ofthe matrix p }~.n) are identical with the distribution of x(n), namely (1, t, t)· For n = m + 1 the three rows are (t, t, 0), (t, i, 1), (0, t, t)· CHAPTER XVII 3. E(X) = ieAt;

Var (X) = ieAt(eAt -1).

4. P~ = -AnPn

+

A(n+l)Pn +1'

Pn(t) =

t~Je

E(X) = ie-At;

tAt(e At _l)t n

(n

~i).

Var (X) = ie-At(l _e- lt ).

5. P~(t) = -(A +np,)Pn(t) + APn _ 1(t) + (n + l)p,Pn +1(t) for n ~ N - land P'tv(t) - - NpPN(t) + APN l(t). 19. The standard method of solving linear differential equations leads to a system of linear equations.

Index Absolute probabllztles 116; - In Markov chains 384. Absorbing barrier in random walks 342, 368, 369, 376; - in higher dimensions 361. Absorbing boundaries 477. Absorbing states (in Markov chains) 384. Absorption probabilities: in birth and death processes 455, 457; in diffusion 358, 367; in Markov chains 399ff., 418, 424, 425, 438ff.; in random walk 342ff., 362, 367. [cf. Duration of games; Extinction; First passages; Ruin problem.] Acceptance cf. Inspection sampling. Accidents: as Bernoulli trials with variable probabilities 282; bomb hits 160; distribution of damages 288; occupancy model 10; Poisson distribution 158, 292; urn models 119, 121. ADLER, H, A. and K. W. MILLER 467. Aftereffect: lack of - 329, 458; urn models 119,122. [cf. Markov property.] Age dist1ibution in lenewal theOIY 335, 340; (example involving ages of a couple 13, 17.) Aggregates, self-renewing 311, 334, 340. Alleles 133. Alohabets 129. 1 ANDERSEN cf. SPARRE ANDERSEN, E. A.NDRE, D. 72, 369. Animal populations: recaptures 45; trapping 170, 239, 288, 301. Aperiodic cf. Periodic. Arc sine distributions 79. Arc sine law for: first visits 93; last visits 79; maxima 93; sojourn times , 82. (Counterpart 94.) Arrangements cf. Ballot problem; OCCU-" pancy. Average of distribution = Expectation.

Averages, moving 422,426. Averaging, repeated 333,425. b(k; n, p) 148. BACHELlER, L. 354. Backwmd equations 358, 468, 474, 482. Bacteria counts 163. BAILEY, N. T. J. 45. Ballot problem 69, 73. Balls in cells cf. Occupancy problems. Banach's match box problem 166, 170, 238. Barriers, classification of 343, 376. BARTKY, W. 363. BARTON, D. E. and C. L. MALLOWS 69. BATES, G. E. and J. NEYMAN 285. Bayes'rule 124. BERNOULLI, D. 251, 378. BERNOULLI, J. 146, 251. Bernoulli trials: definition 146; infinite sequences of - 196ff.; interpretation in number theory 209; recurrent events connected with - 313ff., 339. [cf. Arc sine law, Belling, Firsl passage limes, Random walk; Returns to origin; Success runs etc.] Bernoulli trials, multiple 168, 171,238. Bernoulli trials with variable~probabilities 218, 230, 282

Bernoulli-Laplace model of diffusion 378, 397; generalized 424. BERNSTEIN, S. 126. BERTRAND, J. 69. Beta function 173. Betting: 256, 344ff., 367 - in games with infinite expectation 246, 251ff., 322; - on runs 196, 210, 327; - systems 198,346; three players taking turns 18, 24, 118, 141, 424. [cf. Fair games; Ruin problem.] Bias in dice 149.

499

500

INDEX

Billiards 284. Binomial coefficients 34, 50ff.; identities for - 63ff., 96, 97, 120. Binomial distribution 147ff.; central term 150. 180. 184: combined with Poisson 171, 287, 301; - as conditional distr. in Poisson process 237; convolution of -173,268; expectation 223 (absolute 241); generating function 268; integrals for - 118, 368, 370; - as limit 10 Elirenfest model 397, for hypergeometric distr. 59, 172; normal approximation to - 179ff.; in occupancy problems 35, 109; Poisson approximation to - 153ff., 171 172, 190 (numerical examples 109, 154); tail estimates 151-152, 173, 193ff.; variance 228, 230. Binomial distribution, the negative cf. Negative binomial. Binomialformula 51. Birth-and-death process 354ff.; backward equations for - 469; inhomogeneous - 472; - in servicing problems 460, 478ff. Birth process 448ff., 478ff.; backward equations for - 468; divergent451ff.,476; general 476. Birthdays: duplications 33, 105 (table 487); expected numbers 224; as occupancy problem 10, 47, 102; Poisson distribution for - 106, 155; (combinatorial problems involving 56, 58, 60, 169, 239). Bivariate: generating functions 279, 340; - negative binomial 285; - Poisson 172, 279. [cf. Multinomial distribution.] BT ACKWET T, D , P DEWEL, and D FREEDMAN 78. Blood: counts 163; tests 239. Boltzmann-Maxwell statistics: 5,21, 39ff., 59; as limit for Fermi-Dirac statistics 58. [cf. Occupancy problems.] Bomb hits (on London) 160. Bonferroni's inequalities 110, 142. Books produced at random 202. Boole's inequality 23. BOREL, E. 204, 210. Borel-Cantelli lemmas 200ff.

Bose-Einstein statistics 5, 20, 40, 61, 113; negative binomial limit 62. BOTTEMA, O. and S. C. VAN VEEN 284. Boundaries for Markov processes 414ff., 477. Branching processes 293ff., 373; - with two types 301. Breakage of dishes 56. Breeding 144, 380,424, 441. BRELOT, M. 419. Bridge: ace dlstnbutIOn 11, 57; defimtIOn 8; waiting times 57; (problems and examples 27, 35, 37, 47, 56, 100, 112, 140, 169.) [cf. Matching of cards:' Poker; Shuffling.] BROCKMEYER, E., H. L. HALSTROM, and A. JENSEN 460. Brother-sister mating 143, 380, 441. Brownian motion cf. Diffusion. Busy hour 293. Busy period in queuing 299, 300, 315. CANTELLI, F. P. 204. (Borel-Cantelli lemmas 200.) CANTOR, G. 18, 336. Car accidents 158, 292. CARDANO, G. 158. Cards cf. Bridge; Matching of 'cards; Poker; Shuffling. Cartesian product 129. Cascade process cf. Branching process. CATCHESIDE, D. J. 55, 287; - , D. E. LEA, and J. M. THODAY 112, 161. Causes, pIobability of 124. Cell genetics, a problem in 379, 400. Centenarians 156. Central force, diffusion under - 378. Central limit theorem 244, 254, 261; applications to combinatorial analysis 256, to random walks 357, to recurrent events 320. [cf. DeAfoilJre Laplace limit theorem; Normal approximation.] Chain letters 56. Chain reaction cf. Branching process. Chains, length of random - 240. CHANDRASEKHAR, S. 425. Changes of sign in random walks 84ff., 97. Chan,ging stakes 346. Channels cf. Servers; Trunking problems.

INDEX

G. 45. Chapman-Kolmogorov equation: for Markov chains 383, 421; for nonMarkovian processes 423; for stochastic processes 445, 470ff., 482. Characteristic equation 365. Characteristic roots = eigenvalues 429. CHEBYSHEV, P. L. 233; - inequality 233, 242. Chess 111. Chromosomes 133 ; breaks and lOterchanges of 55, 112; Poisson distribution for - 161, 171, 287. CHUNG, K. L. 82, 242, 312, 409, 413. CLARKE, R. D. 160. ClassijiGation multiple 27. Closed sets in Markov chains 384ff. COCHRAN, W. G. 43. Coin tossing: as random walk 71, 343; - experiments 21, 82, 86; simulation of - 238; ties in multiple - 316, 338. [cf. Arc sine laws; Bernoulli trials; Changes of sign; First passage times; Leads; Random walk; Returns to origin; Success runs, etc.] Coincidences = matches 100, 107; multiple - 112. Collector's problem 11, 61, 111; waiting times 48, 225, 239, 284. Colorblindness: as sex-linked character 139; Poisson distribution for 169. Combinatorial product 129. Combinatorial runs cf. Runs, combinatOl iat. Competition problem 188. Complementary event 15. Composite Markov process (shuffling) 422. Composition cf. Convolution. Compound Poisson distribution 288ff., 474. Conditional: distribution 217ff., 237; expectation 223 ; probability 114 if, [cf. Transition probabilities.] Confidence level 189. Connection to a wrong number 161. Contagion 43, 120, 480; spurious - 121. Continuity equation 358. Continuity theorem 280. Convolutions 266ff. (special cases 173). Coordinates and coordinate spaces 130. Cornell professor 55. CHAPMAN, D.

501

Correlation coefficient 236. Cosmic rays 11, 289, 451. Counters cf. Geiger counter; Queuing; Trunking problems. Coupon collecting cf. Collector' s prohlem. Covariance 229ff., 236. Cox, D. R. 226. CRAMER, H. 160. Crossing of the axis (in random walks) 84ff., 96. Cumulative distribution Junction 179. Cycles (in permutations) 257, 270. Cyclical random walk 377, 434. Cylindrical sets 130.

G. 140. Damage cf. Accidents,' Irradiation. DARWIN, C. 70. Death process 478. Decimals, distribution of: of e and 7r 32, 61; law of the iterated logarithm 208. [cf. Random digits.] Decomposition of Markov chains 390. Defective items: Poisson distribution for -155; (elementary problems 55, 141). [cf. Inspection sampling.] Defective random variables 273, 309, Delayed recurrent events 316ff.; In renewal theory 332, 334. DEMOIVRE, A. 179, 264, 285. DeMoivre-Laplace limit theorem 182ff.; application to diffusion 357. [cf. Central limit theorem; Normal approximaDAHLBERG,

tion.]

Density fluctuations 425. [cf. BernoulliLaplace model; Ehrenfest model.] Density function 179. Dependent cf. Independent. Derivatives partial, number of 39, DERMAN, C. 413. Determinants (number of terms COIltaining diagonal elements) 111. DEWEL, P. 78. Diagonal method 336. Dice: ace runs 210,324; - as occupancy problem 11 ; equalization of ones, twos, ... 339; de Mere's paradox 56; Newton-Pepys problem 55; Weldon's data 148. Difference equations 344ff.; method of

502

INDEX

images 369; method of particular solutions 344, 350, 365; passage to limit 354ff., 370; several dimensions 362 (- in occupancy problems 59, 2S4; for Polya distribution 142, 480). [cf. Renewal theory.] Difference of events 16. Diffusion 354ff., 370; - with central force 378. [cf. Bernoulli-Laplace model; Ehrenfest model.] Dirac-Fermi statistics 5, 41: - for misprints 42, 57. Discrete sample space 17ff. Dishes, test involving breakage of 56. Dispersion = variance 228. Dlstmgulshable cr. IndlstmgUlshable. Distribution: conditional 217ff., 237; joint 213; marginal 215. Distribution function 179, 213; empirical -71. DOBLIb', W 413 DOMB, C. 301. Dominant gene 133. Domino 54. DOOB, J. L. 199,419,477. DORFMAN, R. 239. Doubly stochastic matrices 399. Drift 342: - to boundary 417. Duality 91. DUBBINS, L. E. and L. J. SAVAGE 346. Duration of games: in the classical ruin problem 348ff.; in sequential sampling 368. [cf. Absorption probabilities; ExtinctIOn; hrst passage times; Waltmg times.] & for recurrent events 303, 308. e, distribution of decimals 32, 61.

Eeo/-og) 289.

Efficiency, tests of 70, 148, 149.

F. 119. P. and T. 121. Ehrenfest model: 121; 377; density 425; reversibility 415; steady state 397. Eigenvalue = characteristic value 429.

EGGENBERGER, EHRENFEST,

Ein«lein-BMe «MIMic« 5, 20, 40, 61, 113;

negative binomial limit 62. C. and F. S. SWED 42. Elastic barrier 343, 368, 377. Elastic force, diffusion under - 378.

EISENHART,

Elevator problem 11, 32, 58 (complete table 486). ELLIS, R. E. 354. Empirical distribution 71. Entrance houndary 419. Epoch 73, 300, 306, 444.

Equalization cf. Changes of sign; Returns to origin. Equidistribution theorems 94, 97. [cf. Steady state.] Equilibrium, macroscopic 395ff., 456. Equilibrium, return to cf. Returns to origin. ERDOS, P. 82, 211, 312. Ergodic properties: in Markov chains 393ff., 443; - III stochastIc processes 455, 482. Ergodic states 389: ERLANG, A. K. 460; -'s loss formula 464.

Errorftll'lction 179 ESP 55, 407. Essential states 389. Estimation: from recaptures and trapping 45, 170; from samples 189, 226, 238. [cf. Tests.] Estimator, unbiased 242. Events: 8, 13ff.; compatible 98; independent - 125ff.; - in product spaces 128ff,; simultaneous realization of - 16, 99, 106, 109. Evolution process (Yule) 450. [cf. Genes.] Exit boundary 416. ExpectatIOn 220ft.; condItIOnal - 223; from generating functions 265 ; infinite - 265; - of normal distribution 179; - of products 227; - of reciprocals 238, 242; - of sums 222. Experiments: compound and repeated 131; conceptual 9ff. Exponential distribution 446; characterization by a functional equ. 459. Exponential holding times 458ff. Exponential sojourn times 453. Extinction: in birth and death processes 45 7 ; in branching processes 295ff (in bivariate branching processes 302); of family names 294; of genes 136, 295, 400. [cf. Absorption probabilities.] Extra Sensory Perception 55, 407.

INDEX

Factorials 29; gamma function 66; Stirling's formula 52, 66. Fair games 248ff., 346; - with infinite expectation 252; unfavorable - 249, 262. Faltung = convolution. Families: dish washing 56; sex distribution in - 117,118,126, 141,288. Family names, survival of 294. Family relations 144. Famzly size, geometrIc distribution for 141, 294, 295. "Favorable" cases 23, 26. FERGUSON, T. S. 237. Fermi-Dirac statistics 5, 40; - for misprints 42, 58. FINUCAN, H. M. 28, 239. Fire cf. Accidents. Firing at targets 10, 169. First passage times in Bernoulli trials and random walks 88. 271. 274, 343ff (Explicit formulas 89, 274, 275, 351, 353, 368; limit theorems 90, 360.) [cf. Duration of games; Returns to origin; Waiting times.] First passage times: in diffusion 359, 368, 370; in Markov chains 388; in stochastic processes 481. [cf. Absorption probabilities. ] Fish catches 45. FISHER, R. A., 6, 46, 149, 380. Fission 294. Flags, display of 28, 36. Flaws ill material 159, 170. Flying bomb hits 160. Fokker-Planck equation 358. Forward equations 358, 469, 473, 482. FRAME, J. S. 367. FRECHET, ~ 98,111,375 'FREEDMAN, D. 78. FrC£juency function 179. FRIEDMAN, B. (urn model) 119, 121, 378. FRY, T. C. 460. FURRY, W. H. 451. FURTH, R. 422; -'s formula 359. G.-M. Counters cf. Geiger counters. GALTON, F. 70, 256, 294; -'s rank order test 69, 94.

503

Gambling systems 198ff., 345. [cf. Betting.] Gamma function 66. Gauss (= normal) distribution 179. Geiger counters 11, 59; - type I 306, 315; general types 339; as Markov chain 425. GEIRINGER, H. 6. Generalized Poisson process 474. Generating functions 264; bivariate 279; moment - 285, 301. Genes 132ff. ; evolution of frequencies 135ff., 380, 400; inheritance 256; mutations 295; Yule process 450. Genetics 132ff.; branching process 295; Markov chains in - 379, 380, 400; Yule process 450. Geometric distribution 216; characterization 237, 328; convolutions, 269; exponential limit 458; generating function 268; - as limit for Bose-Einstein statistics 61; as negative binomial 166,224. GNEDENKO, B. V. 71. GONCAROV, V. 258. GOOD, I. J. 298, 300. GREENWOOD, J. A. and E. E. STUART 56, 407. GREENWOOD, R. E. 61. GROLL, P. A. and M. SOBEL 239. Grouping of states 426. Grouping, tests of 42. Guessing 107. GUMBEL, E. J. 156. HALSTROM, H. L. 460. Hamel equation 459. HARDY, G. H. and J. E. LITTLEWOOD 209. llnrdy'f law 135; nonapplicability to pairs 144. HARRIS, T. E. 297, 426. HAUSDORFF, F. 204, 209. Heat flow cf. Diffusion,· Ehrenfest model. Heterozygotes 133. Higher sums 421. Hitting probabilities 332, 339. HODGES, J. L. 69. HOEFFDING, W. 231. Holding times 458ff.; - as branching process 286.

504

INDEX

Homogeneity, test for - 43. Homozygotes 133. Hybrids 133. Hypergeometric distribution 43ff. (moments 232) ; approximation: by bi nomial and by Poisson 59, 172, by normal distr. 194; multiple - 47;as limit in Bernoulli-Laplace model 397. Hypothesis: for conditional probability 115; statistical cf. Estimation; Tests. Images, method of 72, 369. Implication 16. Improper ( defective) random variable 273, 309. Independence, stochastic 125ff.; - pairwise but not mutual 127, 143. Independent experiments 131. Independent jncrement~ 292 Independent random variables 217, 241; pairwise but not mutually - 220. Independent trials 128ff. Indistinguishable elements in problems of occupancy and arrangements 38ff., 58; (elementary examples 11, 20, 36.) In(inite moments, 246, 265; limit theorems involving - 90, 252, 262, 313, 322. Infinitely divisible distributions 289; factorization 291. Inheritance 256. [cf. Genetics.] Initials 54. Insect litters and survivors 171, 288. Inspection sampling 44, 169, 238; sequential - 363, 368. Intersection of events 16. Invariant distributions and measures (in Markov chains) 392tf., 407tf. (periodic chains 406). [cf. Stationary distribudOllS .]

Inverse probabilities (in Markov chains) 414. Inversions (in combinations) 256. Irradiation, harmful 10, 55, 112; Poisson . distribution] 6], 287 Irreducible chains 384, 390ff. Ising's model 43. Iterated logarithm, !aw of the 186, 204ff.; stronger form 211. (Number theoretical interpretation 208.)

KAC, M. 55, 82, 121, 378,438. KARLIN, S. and J. L. MCGREGOR 455. Kelvin's method of images 72, 369. KENDALL, D. G. 288, 295, 456. KENDALL, M. c. and D. SMITH 154. Key problems 48, 55, 141, 239. KHINTCHINE, A. 195, 205, 209, 244. KOLMOGOROV, A. 6, 208, 312, 354, 375, 389, 419, 461; -'s criterion 259 (converse 263); -'s differential equations 475; -'s inequality 234. [cf. Chapman-Kolmogorov equation.] Kolmogorov-Smirnov type tests 70. KOOPMAN, B. O. 4. Kronecker symbols 428. Ladder variables 305, 315. LAGRANGE, J. L. 185, 353. LAPLACE, P. S. 100, 179, 264. -'s law of succession 124. [cf. Bernoulli-Laplace model; De Moivre Laplace limit thee rem.] Large numbers, strong law of 258, 262; for Bernoulli trials 203. Large numbers, weak law of 243ff., 254; for Bernoulli trials 152, 195, 261; for dependent variables 261; generalized form (with infinite expectations) 246, 252; for permutations 256. Largest observation, estimation from 226, 238. Last visits (arc sine law) 79. Leads, distribution of 78ff., 94; experimental IllustratIOn 86ff.; (Galton's rank order test 69.) LEDERMANN, W. and G. E. REUTER 455. Lefthanders 169. LEVY, PAUL 82, 290. LI, C. C. and L. SACKS 144. Ltghlning, damagefrom 289, 292. LINDEBERG, J. W. 244, 254, 261. Linear growth process 456, 480. LITTLEWOOD, J. E. 209. LJAPUNOV, A. 244, 261. Logarithm, expansion for 51 Logarithmic distribution 291. Long chain molecules 11, 240. Long leads in random walks 78ff. Loss, coefficient of 466. Loss formula, Erlang's 464.

INDEX

LOTKA, A. J. 141, 294. Lunch counter example 42. LUNDBERG, 0.480. MCCREA, W. H. and F. J. W WHIpPIE 360, 362. MCGREGOR, J. and S. KARLIN 455. M'KENDRICK, A. G. 450. Machine servicing 462ff. [cf. Power supply.] MacroscopIc equllzbrzum 395ft., 456. [cf. Steady state.] MALECOT, G. 380. MALLOWS, C. L. and D. E. BARTON, 69. MARBE, K. 147. MARGENAU,

H. and G. M.

MUltl'Ht

41.

Marginal distribution 215. MARKOV, A. 244, 375. Markov chains 372ff.; - of infinite order 426; mixing of 426; superposition of 422. Markov process 419ff.; - with continuous time 444ff., 470ff. (Markov property 329.) MARTIN, R. S. (boundary) 419. Martingales 399. Match box problem 166, 170,238. Matches = coincidences 100, 107. Matching of cards 107ff., 231; multiple112. Mating (assortative and random) 134; brother-sister mating 143, 380, 441. Maxima in random walks: position 91ff., 96 (are sine laws 93), distributioIl 369. Maximal solution (in Markov chains) 401. Maximum likelihood 46. MAXWELL, C. 72. [cf. BoltzmannMaxwell statistics.] Mean cf. Expectation. Median 49, 220 Memory in waiting times 328, 458. MENDEL, G. 132. de Mere's paradox 56. MILLER, K. W. and H. A. ADLER 467. Minimal solution: for Kolmogorov differential equations 475; in Markov chains 403. MISES, R. VON: relating to foundations 6, 147, 199,204; relating to occupancy problems 32, 105, 106, 341.

505

Misprints 11; estimation 170; FermiDirac distribution for 42 58· Poisson distribution 156, 169. ' , Mixtures: of distributions 301; of Markov chains 426; of populations 117, 121. MOLINA, E. C. 155, 191. Moment generating function 285, 301. Moments 227; infinite - 246, 265. MONTMORT, P. R. 100. MOOD, A. M. 194. Morse code 54. MORAN, P. A. P. 170. Moving averages 422,426. Multinomial coefficients 37. Multinomial distribution 167, 215, 239; geIlerating function 279, maximal term. 171,194; randomized 216,301. Multiple Bernoulli trials 168, 171, 238. Multiple classification 27. Multiple coin games 316, 338. Multiple Poisson divtribution ] 72 Multiplets 27. MURPHY, G. M. and MARGENAU, H. 41. Mutations 295.

n andm 174. (n)r 29. Negation 15. Negative binomial distribution 164ff., 238; bivariate - 285; - in birth and death processes 450; expectation 224; generating function, 268; infinite divisibility 289; - as limit of Bose-Einstein statistics 61, and of Potya distr. 143, Poisson limit of - 166, 281. NELSON, E. 96. NEWMAN, D. J. 210, 367. NEWTON, I. 55; -'s binomialformula 51. NEYMAN, J ] 63, 285 Non-Markovian processes 293, 421, 426;

satisfying Chapman Kolmogorov equation 423, 471. Normal approximation for: binomial distribution 76, 179ff. (large deviations 192, 195); changes of sign 86; combinatorial runs 194; first passages 90; hypergeometric distribution 194; permutations 256; Poisson distribution 190, 194, 245; recurrent events 321; returns to origin 90; success runs 324. [cf. Central limit theorem.]

506

INDEX

Normal density and distribution 174; tail estimates 179, 193. Normalized random variables 229. Nuclear chain reaction 294. Null state 388. Number theoretical interpretations 208. Occupancy numbers 38. Occupancy problems 38ff., 58ff., 101ff., 241; empirical interpretations 9; multiply occupied cells 112; negative binomial limit 61; Poisson limit 59, 105; treatment by Markov chains 379, 435, and by randomization 301; waiting times 47, 225; (elementary problems 27, 32, 35, 55, 141, 237.) [cf. Boltzmann-Maxwell statistics; BoseEinstein statistics; Collector's problems.] Optional stopping 186, 241. Orderings 29, 36. [cf. Ballot problem; Runs, combinatorial.] ORE, O. 56. OREY, S. 413. p(k; A) 157. Pairs 26. PALM, C. 460, 462. PANSE, V G and P V SUKHATME 150. Parapsychology 56, 407. (Guessing 107.) Parking: lots 55, 479; tickets 55. Partial derivatives 39. Partial fraction expansions 275ff., 285, explicit calculations for reflecting barrier 436ff., - for fimte Markov chams 428ff.; for ruin problem 349ff., and for success runs 322ff.; numerical calculations 278, 325, 334. "Particle" in random walks 73. 342. l'a,ticulal solutions, method &f 344, 347, 365. PartltlOmng: of stochastic matrices 386; of polygons 283. Partitions, combinatorial 34ff. PASCAL, B. 56; -'s distribution 166. PATHRIA, R. K. 32. Paths in random walks 68. PEARSON, K. 173, 256. Pedestrians: as non-Markovian process 422; - crossing the street 170. PEPYS, S. 55.

Periodic Markov chains (states) 387, 404ff. Periodic recurrent events 310. Permutations 29, 406; - represented by independent trials 132, 256ff. ..Persistent leCUllellf evelli 310, limit theorem 335. Persistent state 388. Petersburg paradox 251. Petri plate 163. Phase space 13. Photographic emulsions 11, 59. n, distribution of decimals 31, 61. POISSON, S. D. 153. Poisson approximation or limit for: Bernoulli trials with variable probabIlmes 282; bmomlal dlstr. 153ft, 172, 190; density fluctuations 425; hypergeometric distr. 172; matching 108; negative binomial 172, 281; normal distr. 190, 245; occupancy problems 105, 153; stochastic proc@ss@s 461, 462, 480, 481; long success runs 341. Poisson distribution (the ordinary) 156ff.; convolutions 173, 266; empirical observations 159ff.; generating function 268; integral representation 173; moments 224, 228; normal approximation 190. 194, 245 Poisson distributions: bivariate 172, 279; compound 288ff., 474; generalized 474; multiple 172; spatial 159. (- combined with binomial distr. 171, 287, 301.) Poisson process 292, 446ff.; backward and forward equs. 469-470, generalized 474. Poisson traffic 459. Poisson trials (= Bernoulli trials with variable probabilities) 218, 230, 282. Poker: definition 8; tabulation 487. (Elementary problems 35, 58, 112, 169). POLLARD, H. 312. Polygons, partitions of283. POLYA, G. 225, 283, 360; -'s distribution 142,143,166,172; - process 480; urn model 120, 142,240,262,480 ( as non-Markovian process 421). Polymers 11, 240. Population 34ff.; - in renewal ~heory 334-335, 340; stratified - 117.

INDEX

Population growth 334-335, 450, 456. [cf. Branching processes.] Positive state 389. Power supply problems 149, 467. Product measure 131. Product spaces 128ff. Progeny (in branching processes) 298ff. Prospective equations cf. Forward equations. Qualzty control 42. [cl. Inspection sampling.] Queue discipline 479. Queuing and queues 306, 315, 460ff., 479; as branching process 295, 299-301 ~ general limit theorem 320; (a Markov chain in queuing theory 425.) Radiation cf. Cosmic rays; Irradiation. Radioactive disintegrations 157, 159, 328; differential equations for - 449. RAFF, M. S. 240. Raisins, distribution of 156, 169. Random chains 240. Random choice 30. Random digits (= random sampling numbers) 10, 31; normal approximation 189; Poisson approximation 155; references to - 21, 61. (Elementary problems 55, 169.) Random mating 134. Random placement of balls into cells cf. Occupancy problems. Random sampling ef. Sampling. Random sums 286ff. Random variables 212ff.; defective - 273, 309; integral valued - 264ff. normalized - 229. [cf. Independent -.] Random walks 67ff.• 342ff.; cyclical 377 , 434; dual- 91; generalized - 363ff., 368; invariant measure 408; Markov chain treatment 373, 376-377, 425, 436ff. ; renewal method 370; reversibility 415; - with variable probabilities 402. [cf. Absorbing barrier; Arc sine law; Changes of sign,' Diffusion; Duration ofgames; First passage times; Leads; Maxima,' Reflecting barrier; Returns to origin; Ruin problem.] Randomization method: in occupancy

507

problems 301; in sampling 216. [cf. Random sums.] Randomness in sequences 204; tests for42, 61. [cf. Tests.] Range 213. Rank order test 69, 94. Ratio limit theorem 407, 413. Realization of events, simultaneous 16, 99, 106, 109, 142. Recapture in trapping experiments 45. Recessive genes 133; sex-lInked - 139. Recurrence times 388; - in Markov chains 388. [cf. Renewal theorem.] Recurrent events 31Off.; delayed - 316ff.; Markov chain treatment of - 381-382, 398, 403; number of occurrences of a - 320ff.; reversibility 415. [cf. Renewal theorem.] Reduced number of successes 186. Reflecting barriers 343, 367ff.; invariant distribution 397, 424; Markov chain for 376, 436ff.; two dimensions 425. Reflection principle 72, 369. (Repeated reflections 96, 369ff.) Rencontre (= matches) 100, 107. Renewal of aggregates and populations 311, 334-335, 340, 381. Renewal argument 331. Renewal method for random walks 370. Renewal theorem 329; estimates to 340 (for Markov chains 443.) Repairs of machines 462ff. Repeated averaging 333, 425. Replacement ef. Renewal, Sampling. Residual waiting time 332, 381. Retrospective equations cf. Backward equations. Return process 477. Returns to origin: first return 76-78, 273, 313; - in higher dimensions 360; nth return 90, 274; through negative values 314, 339; number of 96; visits prior to first - 376. [cf. Changes of sign; First passage times.] REUTER, G. E. and W. LEDERMANN 455. Reversed Markov chains 414ff. RIORDAN, J. 73, 299, 306. ROBBINS, H. E. 53. ROMIG, H. C. 148. Ruin problem 342ff.; in generalized

508

INDEX

random walk 363ff.; renewal method 370; - with ties permitted 367. . Rumors 56. Runs, combinatorial42, 62; moments 240; normal approximation 194. [cf. Success runs.] RUTHERFORD, E. 170; RUTHERFORDCHADWICK-ELLIS 160. SACKS, L. and C. C. LI 144. Safety campaign 12l. Sample point 9. Sample space 4, 9, 13ff.; discrete 17ff. for repeated trials and experiments 128ff.; - in terms of random variables 217. Sampling 28ff., 59, 132,232; randomized - 216; required sample size 189, 245; sequential - 344, 363; stratified240; waiting times 224, 239. (Elementary probl~ms 10, 12,56,117,194) [cf Collector's problem; Inspection sampling SAVAGE, L. J. 4, 346. SCHELL, E. D. 55. SCHENSTED, I. V. 379. SCHROEDINGER, E. 294. Schwarz' inequality 242. Seeds: Poisson distribution 159; survival 295. Segregation, subnuclear 379. Selection (genetic) 139, 143, 295. Selection principle 336. Self-renewing aggregates 311, 334, 340. Senator problem 35, 44. Sequential sampling 344, 363. Sequential tests 171. Sera, testing of 150. Se, vel S cr. Queuing.. Trttnking ...Orohkms. Service times 457ff.; - as branching process 288. Servicing factor 463. Servicing problems 460, 479. [cf. Power supply.] Seven-way lamps 27. Sex distribution witkin families 11, 117, 118, 126, 169, 288. Sex-linked characters 136. SHEWHART, W. A. 42. Shoe problems 57, 111.

Shuffling 406; composite - 422. Simulation of a perfect coin 238. Small numbers, law of 159. SMIRNOV, N. 70, 7l. SMITH, B. and M. C. KENDALL, 154. SOBEL, M. and P. A. GROLL, 239. Sojourn times 82, 453. SPARRE-ANDERSEN, E. 82. Spent waiting time 382. Spores 226, 379. Spurious contagion 12l. Stable distribution of order one half90. Stakes (effect of changing -) 346. Standard deviation 228. Stars (Poisson distribution) 159, 170. States in a Markov cham 374, 446; absorbing 384; classification 387. Stationary distributions: of age 335, ·340; of genotypes 135. [cf. Invariant distributions and measures; Steady state.] Stationary transitionprobabilities 420, 445 Steady state cf. Equilibrium, macroscopic; Invariant distributions and measures; Stationary distributions. STEINHAUS, H. 166. Sterilization laws 140. STIRLING, J. 52; -'s formula 52, 66,180. Stochastic independence cf. Independence. Stochastic matrix 375; doubly - 399; substochastic matrix 400. Stochastic process (term) 419, 444ff.; general limit theorem 318; - with independent increments 292. [cf. Markov process.] STONEHAM, R. G. 32. Strategies in games 198, 346. Stratification, urn models for 121. Stratified populations 117. Stratified sampling 240. Street crossing 170. Slruggle for existence 450. STUART, E. E. and J. A. GREENWOOD, 56, 407. Substochastic matrix 400. Successes 146; reduced number of -186. Sura"" rum' as reCllfrent events 305, 322ff., 339; Markov chain for - 383; Poisson distribution for long - 341; of several kinds 326, 339; r successes before s failures 197, 210.

INDEX

Succession, Laplace's law of 124. SUKHATME, P. V. and V. G. PANSE 150. Sums of a random number of variables 286ff. Superposition of lJarkov processes 422. Survival cf. Extinction. SWED, F. S. and C. EISENHART 42. Systems of gambling 198, 346. Table tennis 167 Taboo states 409. TAKACS, L. 69. Target shooting 10, 169. Telephone: holding times 458; traffic 161,282,293; trunking 191,460,481. [d. Busy period; Queuing.] Tests, statistical: - of effectiveness 70, 149-150; Galton's rank order -69, 94; Kolmogorov-Smirnov tests 70; of homogeneity 43, 70; - of randomness 42, 61; sequential 171. (Special - of: blood 239; clumsiness 56; dice 148; guessing abilities 107; randomness of parking tickets 55; sera and vaccines 150.) [cf. Estimation.] Theta functions 370. THORNDIKE, F. 161. Ties· in billiards 284; in games with several coins or dice 316, 338. [cf. Returns to orzgin.] Time-homogeneous cf. Stationary. TODHUNTER, I. 378. Traffic of Poisson type 459. Traffic problems 170, 422. [cf. Telephone.] Transient recurrent event 310. Transient state, 388-390, 399ff., 438. Transition probabiliiies: in Markov chains 375, 420, (higher 382), in processes 445,470ff. Trapping, animal 170, 239, 288, 301. Trials (independent and repeated) 128ff.; random variable representation 217ff. Trinomial cf. Multinomial. Truncation method 247. Trunking problems 191, 460, 481.

509

Turns: in billiards 284; three players taking - 18, 24, 118, 141. UHLENBECK, G. E. and M. C. WANG 378. Unbiased estimator 242. Unessential states 389. Unfavorable ''fair'' games 249, 262. Uniform distribution 237, 285. Uniform measure 408. Union of events 16; probability of - 101. Urn models 188ff.; and Markov chains 373. [cf. Bernoulli-Laplace; Ehrenfest; Friedman; Laplace; Polya.]

Vaccines, testing of 150. calculated fr om Variance 227ff. ; generating functions 266; - of normal distribution 179. VAULOT, E. 479. Volterra's theory of struggle for existence 450. Waiting lines cf. Queuing. Waiting times: memoryless - 328, 458; residual - 332, 381; spent - 382. (- in combinatorial problems 47; for recurrent events 309, 317.) [cf. Duration ofgames; First passage times.] WALD, A. 171, 248, 344, 363; and J. WOLFOWITZ 43, 194. WATSON, G. S. 239. WAUGH, W. A. O'N. 367. Welders problems 149, 467. Weldon's dice data 148 149 WHIPPLE, F. J. W. and W. H. MCCREA 360, 362. WHITWORTH, W. A. 26, 69. Wiener process 354. WISNIEWSKI, T. K. M. 238. WOLFOWITZ, J. and A. WALD 43, 194. Words 129. WRIGHT, S. 380. Wronbcr number, connections to 161. X-rays cf. Irradiation. YULE,

G. U. (process) 450,

4n.

An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition - PDF Free Download (2025)
Top Articles
Latest Posts
Recommended Articles
Article information

Author: Tuan Roob DDS

Last Updated:

Views: 6509

Rating: 4.1 / 5 (42 voted)

Reviews: 81% of readers found this page helpful

Author information

Name: Tuan Roob DDS

Birthday: 1999-11-20

Address: Suite 592 642 Pfannerstill Island, South Keila, LA 74970-3076

Phone: +9617721773649

Job: Marketing Producer

Hobby: Skydiving, Flag Football, Knitting, Running, Lego building, Hunting, Juggling

Introduction: My name is Tuan Roob DDS, I am a friendly, good, energetic, faithful, fantastic, gentle, enchanting person who loves writing and wants to share my knowledge and understanding with you.