This theory was the substance of a talk given to the Annual General Meeting of the Statistical Association of Manitoba, held in conjunction with the Statistical Association of America, Red River Chapter, on the 16th day of April 1988 in the Senate Chambers at the University of Manitoba.
For the most part, I deal with rare events, which are treated as independent events because of the length of time between occurrences. For example, record high temperatures for a given date are found by examining all records for that date, and those records are a year apart. If p is the probability that an event will occur and q is the probability that it will not occur, then the two are governed by the relationship:
p + q = 1 …(1)
If you require the probability that an event will occur over a two-year period, it could occur the first year, or the second year, or for that matter in both years. In this case, it is easier to calculate the probability that the event will not occur in either year, which is q^2. The probability that the event will occur in a two-year period is then given by:
P_2 = 1 – q^2 …(2)
And in n successive occasions:
P_n = 1 – q^n …(3)
Which may be written:
P_n = 1 – (1 – p)^n …(4)
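Equations (1) to (4) can be checked numerically. The following is a minimal Python sketch and not part of the original talk; the function name prob_at_least_once and the sample value p = .05 are assumptions chosen for illustration.

```python
def prob_at_least_once(p: float, n: int) -> float:
    """Probability that an event with per-period probability p occurs
    at least once in n independent periods (Equation 4)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    q = 1.0 - p          # Equation (1): p + q = 1
    return 1.0 - q ** n  # Equation (3): P_n = 1 - q^n


if __name__ == "__main__":
    p = 0.05  # illustrative value only
    for n in (1, 2, 10, 20):
        print(f"n = {n:2d}: P_n = {prob_at_least_once(p, n):.4f}")
```

For n = 2 and p = .05 this gives 1 – .95^2 = .0975, slightly less than 2p because the case where the event occurs in both years is counted only once.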
G. R. Kendall, a climatologist and statistician in the Canadian Weather Service for many years, was the first to show this, and it was published in Helsinki, Finland.
You will notice that the longer the Return Period, the more accurate Equation (6) is. Both the Table and Equation (6) reveal that there is only a 63% chance that an event will occur within its Return Period. There is a healthy 37% chance that it will not.
While statisticians fuss about a curve fit and require a 95% confidence level in most cases, they willingly accept a Return Period that carries only a 63% chance. Return Period is a misnomer, since it implies that the event is bound to return within that time. For years I have suggested an alternative, the Probability Mean Time, which is described later.
It can also be shown that, where p is the probability of an event per unit of time and t is the length of the time period, starting now:
P_t = 1 – 1/e^(pt) …(6)
This is an extremely useful formula.
The value of e is approximately 2.718281828459045.
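The 63% figure and the remark that Equation (6) becomes more accurate for longer Return Periods can both be checked by comparing Equation (6) with Equation (4). This is a hedged sketch; the function names and the three Return Periods tried are assumptions, not values taken from the original Table.

```python
import math


def exact_prob(p: float, n: int) -> float:
    """Equation (4): 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


def exponential_prob(p: float, t: float) -> float:
    """Equation (6): 1 - 1/e^(p*t)."""
    return 1.0 - 1.0 / math.exp(p * t)


if __name__ == "__main__":
    # Evaluate both formulas at t equal to the Return Period R = 1/p.
    for return_period in (10, 100, 1000):
        p = 1.0 / return_period
        print(
            f"R = {return_period:4d}: "
            f"Eq.(4) = {exact_prob(p, return_period):.4f}, "
            f"Eq.(6) = {exponential_prob(p, return_period):.4f}"
        )
```

At t equal to the Return Period, Equation (6) always gives 1 – 1/e, about .632, while Equation (4) approaches that value from above as the Return Period grows.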
The observed frequency is an actual happening and is a constant at the time you sit down to study it. By contrast, a probability is a variable, and often taken as the area under some curve and is theoretical, or abstract in nature. Some people use the observed frequency, or relative frequency (similar to a mean) as a probability in the case of extremely rare events because of the lack of data. The usual procedure is to examine the data, try to fit a curve to it and obtain the probability from the curve.
Probabilities must be calculated in advance, whereas observed frequencies are tallied afterwards.
CHANGING TIME PERIODS
Equation (6) is a very useful tool for changing time periods.
There is a greater likelihood of an event in a long time period than in a short period. However, this is compensated by the fact that there is a greater likelihood of an event happening earlier than later. Statisticians have noticed that accident-prone people help make it so.
Suppose you have seen a tornado in the area where you live twice in 40 years. What is the probability that you will see another one?
Here, the observed frequency is 2 and the relative frequency is 2/40 = 1/20, or .05. The Return Period is 20 years. You have to assume that no other data are available and that the average human being lives to 70 years of age. In the absence of a math model, I shall use the relative frequency as p, which is “good at any time,” and 30 years as my life expectation. One writes:
P_30 = 1 – 1/e^(.05 × 30) …(8)
= 1 – 1/e^1.5 = .77687
Thus, there is a 78% chance that you will see the event again.
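The tornado calculation can be reproduced in a few lines. This is a minimal sketch under the same assumptions as the text (2 sightings in 40 years, 30 years remaining); the function name prob_within is hypothetical.

```python
import math


def prob_within(p: float, t: float) -> float:
    """Equation (6): probability of at least one event within time t."""
    return 1.0 - 1.0 / math.exp(p * t)


sightings = 2          # observed frequency
years_observed = 40
p = sightings / years_observed   # relative frequency, .05 per year
years_remaining = 30             # assumed remaining lifetime

print(round(prob_within(p, years_remaining), 5))  # 0.77687
```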
PROBABILITY MEAN TIME
I have suggested a concept called “Probability Mean Time,” or PMT, to replace Return Period. The PMT is the length of time which will give you the 50:50 split, i.e. the same as the toss of a coin. It corresponds closely to a similar concept called the “half-life,” already used in some other fields of science. It is obtained by setting the probability in Equation (4) equal to .5 and solving for the time:
PMT = ln(.5)/ln(1-p) …(9)
For the tornado above, if p = .05 and the Return Period is 20 years, then the PMT is:
PMT = ln(.5)/ln(.95) = 13.51 years, or 14 years.
Notice that I have rounded off to the best estimate for whole years.
If the Return Period is 1000 years, so that p = .001, what is the PMT?
PMT = ln(.5)/ln(.999) = 692.800 years, or 693 years.
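Equation (9) is easy to evaluate for any p. The sketch below is illustrative only; the function name probability_mean_time is an assumption, and rounding to the nearest whole year is inferred from the worked figures rather than stated as a rule in the text.

```python
import math


def probability_mean_time(p: float) -> float:
    """Equation (9): time at which the chance of at least one event is 50:50."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(1.0 - p)


for return_period in (20, 1000):
    p = 1.0 / return_period
    pmt = probability_mean_time(p)
    # Round to the nearest whole year (assumed convention).
    print(f"R = {return_period}: PMT = {pmt:.2f} years, or {round(pmt)} years")
```

Substituting the PMT back into Equation (4) returns exactly .5, which is a quick way to confirm the result.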
THE PROBABILITY OF AN EARTHQUAKE AT VICTORIA
On the 2nd day of May 1996, Seattle, Vancouver, Victoria and vicinity were rocked by an earthquake which registered 5.3 on the Richter scale. A newspaper article revealed that there were only four earthquakes measuring 7 or more in 124 years of records. This would be 130 years now. What is the likelihood that Victoria will experience an earthquake of magnitude 7 or more?
Although semi-logarithmic graph paper could be used to rank and plot the earth tremors, records are scant, and so I use the relative frequency rf as an a posteriori probability p. Many people are not satisfied with
p = rf = 4/130 = .0307692308
as the answer, although that probability is considered by some as being good at any time. This probability is good for the next year, because the data are measured in years. Most people want to know the probability in a lifetime, a generation, or some other more meaningful period of time. Equation (6) can be used to provide an answer.
Table 2. The probability shown for different time periods
The time period for which the odds are 50:50 that an earthquake of magnitude 7 or more will occur is:
ln(.5)/ln(1 – .0307692308) = 22 years
You can see that 22 years agrees with Table 2. However, there may be some doubt that the past occurrences realistically represent Victoria, because there is so little data. Seismologists have much better data to work on for their estimates.
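Probabilities for other time periods follow from Equation (6) with p = 4/130. The sketch below is not a reconstruction of Table 2; the time periods listed are assumptions chosen only to show how such a table could be generated.

```python
import math


def prob_within(p: float, t: float) -> float:
    """Equation (6): probability of at least one event within t years."""
    return 1.0 - 1.0 / math.exp(p * t)


p = 4 / 130  # relative frequency of magnitude 7+ earthquakes per year

# Illustrative periods in years (hypothetical, not taken from Table 2).
periods = (1, 10, 22, 30, 50, 70, 100)

for t in periods:
    print(f"{t:4d} years: {prob_within(p, t):.3f}")
```

The value at 22 years comes out close to .5, in line with the 50:50 period computed above.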
PROBABILITY DENSITY FUNCTION
A learned Bayesian statistician indicated that I had not previously defined the relationship between the event and time precisely enough. A non-Bayesian statistician did not like tossing the coin two and a half times.
Therefore, defining the Probability Density Function and allowing only integers in the result should be sufficient to satisfy both objections.
Even though the functions are continuous, they are not meant for continuous usage. Eulerian equations usually have integer coefficients. n-Dimensional equations usually require n to be an integer.
There are thousands of probability density functions which will do the job. Perhaps the best ones are the simplest.
F(p,t) = y = p/e^(pt) …(10)
This is probably the simplest of them all which will do the job. For any given probability p, t is allowed to vary. This does not preclude the situation where, for any given time interval t, the probability is allowed to vary. In the equation, if p > 0 and t ≥ 0, then y is positive. As t tends to infinity, y tends to 0. Equation (10) expresses a monotonically decreasing function. It is non-negative and single-valued for all t with given p. It can be integrated so that the area under the curve represents the probability. It can be demonstrated that Prob(0,1) for the unitary time period approximates p.
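The statement that the area under Equation (10) represents the probability can be written out as a short integration. This is a sketch in LaTeX notation; the dummy variable s is introduced here for the integration and does not appear in the original.

```latex
\begin{align*}
\int_{0}^{t} p\,e^{-ps}\,ds &= \Bigl[-e^{-ps}\Bigr]_{0}^{t} = 1 - e^{-pt}, \\
\int_{0}^{1} p\,e^{-ps}\,ds &= 1 - e^{-p}
  = p - \frac{p^{2}}{2} + \frac{p^{3}}{6} - \cdots \approx p
  \quad (p \ll 1).
\end{align*}
```

The first line recovers Equation (6). The second shows why Prob(0,1) approximates p when p is small: for p = .05 the area is about .0488.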
Now, these equations are not just my own concoction. They have been around for quite some time in one form or another. Indeed, the equation
P = 1 – 1/e …(11)
which is Equation (6) in its simplest form, was well handled in Martin Gardner’s Mathematical Games, Scientific American, October 1961, pp. 160 to 166.
Last updated Thursday March 22, 2007
Webmaster: Harvey Heinz firstname.lastname@example.org