Stats 2 notes

Discrete random variables

Discrete random variable: A random variable for which a list of all possible values could be made

Probability distribution: A list or table showing the probability of each value occurring

The sum of the probabilities in the probability distribution equals 1

Probability function: A function which provides $P(X=x)$ for all $x$

Cumulative distribution function: A function which provides $P(X \le x)$ for all $x$

Mean: $\mu = E(X) = \sum x_i p_i$ (the expectation of $X$)

Variance: $\sigma^2 = E(X^2) - E(X)^2 = \sum x_i^2 p_i - \left(\sum x_i p_i\right)^2$

Expectation of a function of a random variable

$E(g(X)) = \sum g(x_i)\, p_i$

Mean and variance of functions of a random variable

$E(aX) = aE(X)$, $E(X+b) = E(X) + b$, $E(aX+b) = aE(X) + b$

$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$, $\mathrm{Var}(X+b) = \mathrm{Var}(X)$, $\mathrm{Var}(aX+b) = a^2\,\mathrm{Var}(X)$
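The formulas above can be checked on a small table of values. The following is a minimal sketch; the distribution and the constants a and b are made up for illustration and do not come from these notes.

```python
def expectation(values, probs, g=lambda x: x):
    """E(g(X)): sum of g(x_i) * p_i over the probability distribution."""
    return sum(g(x) * p for x, p in zip(values, probs))


def variance(values, probs):
    """Var(X) = E(X^2) - E(X)^2."""
    mean = expectation(values, probs)
    return expectation(values, probs, lambda x: x * x) - mean ** 2


# Hypothetical distribution (illustration only): X takes the values 0, 1, 2, 3.
values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]

mean_x = expectation(values, probs)
var_x = variance(values, probs)

# Linear transformation Y = aX + b with hypothetical constants a and b.
a, b = 3, 2
y_values = [a * x + b for x in values]
mean_y = expectation(y_values, probs)  # should agree with a * E(X) + b
var_y = variance(y_values, probs)      # should agree with a^2 * Var(X)
```

Computing the mean and variance of Y directly from its own table, rather than from the rules, is what makes this a check of E(aX+b) and Var(aX+b).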

The Poisson distribution

Conditions

$P(X=x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!} & x \in \mathbb{N}_0 \\ 0 & \text{otherwise} \end{cases}$

The recurrence formula

The recurrence formula is used to calculate successive probabilities $P(X=1), P(X=2), \ldots$ starting from $P(X=0) = e^{-\lambda}$:

$P(X=x) = \dfrac{\lambda}{x} \times P(X=x-1)$
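The recurrence can be sketched in a few lines. This is only an illustration; the function name `poisson_probabilities` is hypothetical, and the value 5.2 for λ is borrowed from the checkpoint example in these notes.

```python
import math


def poisson_probabilities(lam, max_x):
    """Return [P(X=0), ..., P(X=max_x)] using P(X=x) = (lam / x) * P(X=x-1)."""
    probs = [math.exp(-lam)]  # P(X=0) = e^(-lam)
    for x in range(1, max_x + 1):
        probs.append(probs[-1] * lam / x)
    return probs


probs = poisson_probabilities(5.2, 10)

# Direct evaluation of e^(-lam) * lam^x / x! for comparison at x = 6.
direct = math.exp(-5.2) * 5.2 ** 6 / math.factorial(6)
```

Each step needs only one multiplication and one division, which is why the recurrence is convenient when a run of consecutive probabilities is needed.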

Sum of independent random variables

Two or more independent Poisson distributions can be combined as follows

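If $X_1 \sim P(\lambda_1), X_2 \sim P(\lambda_2), \ldots, X_n \sim P(\lambda_n)$ are independent, then

$\sum_{k=1}^{n} X_k \sim P\left(\sum_{k=1}^{n} \lambda_k\right)$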

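This also shows that if $X_1, X_2, \ldots, X_n$ are independent and each has the distribution $P(\lambda)$ (for example, the counts in $n$ separate intervals of equal length), then

$X_1 + X_2 + \cdots + X_n \sim P(n\lambda)$

Note that this is the sum of $n$ independent variables, not the multiple $nX$, which is not Poisson distributed.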

Example

At a checkpoint an average of 300 cars pass per hour and the mean time between lorries is 5 minutes.
Find the probability that exactly 6 vehicles pass the checkpoint in a 1 minute period.
300 cars per hour = 5 cars per minute
1 lorry per 5 minutes = 0.2 lorries per minute

Total rate: $\lambda = 5 + 0.2 = 5.2$ vehicles per minute
$P(X=6) = \dfrac{e^{-5.2} \times 5.2^{6}}{6!} \approx 0.1515$
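The arithmetic in this example can be reproduced as follows. This is a sketch that assumes, as the example does implicitly, that cars and lorries arrive independently so that their rates add; the helper name `poisson_pmf` is made up.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


cars_per_minute = 300 / 60   # 300 cars per hour
lorries_per_minute = 1 / 5   # one lorry every 5 minutes on average
lam = cars_per_minute + lorries_per_minute  # combined rate per minute

p_six_vehicles = poisson_pmf(6, lam)
```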

Binomial

Questions on the Poisson distribution can include the use of the binomial distribution.

Example (Following from the above example)

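What is the probability that exactly 6 vehicles pass the checkpoint in at least 3 of the next 4 minutes?
Probability of success = 0.1515, n = 4. Let $Y$ be the number of minutes (out of 4) in which exactly 6 vehicles pass, so $Y \sim B(4, 0.1515)$.

$P(Y \ge 3) = P(Y=3) + P(Y=4) = \binom{4}{3}(0.1515)^3(0.8485) + \binom{4}{4}(0.1515)^4 \approx 0.0123$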
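The binomial step can be checked numerically. The sketch below takes the success probability 0.1515 from the previous example as given; the function name `binomial_pmf` is illustrative, and `math.comb` needs Python 3.8 or later.

```python
import math


def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


p_success = 0.1515  # P(exactly 6 vehicles in one minute), from the example
n_minutes = 4

# P(Y >= 3) = P(Y = 3) + P(Y = 4)
p_at_least_3 = sum(binomial_pmf(k, n_minutes, p_success)
                   for k in range(3, n_minutes + 1))
```

The factor (1 - p) for the single failing minute in P(Y = 3) is what brings the result to about 0.0123.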

Mean and variance of a Poisson distribution

Mean = Variance = λ

Continuous random variables

Continuous random variable: A random variable which can take any value in an interval, so its possible values cannot be listed

For any single value $t$, $P(X=t) = 0$, so $P(X<t) = P(X \le t)$

Probability density functions f(x)

Cumulative distribution function F(x)

Example

Find F(x) for the following probability density function

$f(x) = \begin{cases} \dfrac{x^2}{18} & 0 \le x \le 3 \\ \dfrac{1}{4}(5-x) & 3 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$

The function f(x) must be integrated in sections
The first section is a quadratic. If 0<c<3 then
$P(X<c) = \displaystyle\int_0^c \frac{x^2}{18}\,dx = \frac{c^3}{54}$

Using the above formula, P(X<3)=0.5
The second section is linear. If 3<c<5 then
$P(X<c) = \dfrac{1}{2} + \displaystyle\int_3^c \frac{1}{4}(5-x)\,dx = \frac{1}{8}\left(10c - c^2 - 17\right)$

F(x) is then given by the piecewise function
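$F(x) = \begin{cases} 0 & x < 0 \\ \dfrac{x^3}{54} & 0 \le x \le 3 \\ \dfrac{1}{8}\left(10x - x^2 - 17\right) & 3 \le x \le 5 \\ 1 & x \ge 5 \end{cases}$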
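One way to gain confidence in a derived F(x) is to compare it with a numerical integral of f(x). The sketch below does this with a simple midpoint rule; the function names and the number of steps are arbitrary choices for illustration.

```python
def pdf(x):
    """The probability density function f(x) from the example."""
    if 0 <= x <= 3:
        return x * x / 18
    if 3 < x <= 5:
        return (5 - x) / 4
    return 0.0


def cdf(x):
    """The cumulative distribution function F(x) derived in the example."""
    if x < 0:
        return 0.0
    if x <= 3:
        return x ** 3 / 54
    if x <= 5:
        return (10 * x - x * x - 17) / 8
    return 1.0


def integrate_pdf(upper, steps=100_000):
    """Midpoint-rule approximation of the integral of f from 0 to upper."""
    width = upper / steps
    return sum(pdf((i + 0.5) * width) for i in range(steps)) * width


approx_at_4 = integrate_pdf(4)  # should be close to cdf(4) = 0.875
```

Checking that the two pieces of F(x) agree at x = 3 and that F(5) = 1 catches most slips in the constant of integration.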

Rectangular / Continuous uniform distribution

A rectangular distribution is given by

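$f(x) = \begin{cases} \dfrac{1}{b-a} & a < x < b \\ 0 & \text{otherwise} \end{cases}$

and

$F(x) = \begin{cases} 0 & x \le a \\ \dfrac{x-a}{b-a} & a \le x \le b \\ 1 & x \ge b \end{cases}$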

Mean and variance of a rectangular distribution

$E(X) = \mu = \dfrac{1}{2}(a+b)$

$\mathrm{Var}(X) = \sigma^2 = \dfrac{1}{12}(b-a)^2$

Given the mean and the variance of a rectangular distribution, a and b can be found by solving simultaneously.
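The simultaneous equations have a closed-form solution: from the variance, b - a = sqrt(12 × variance), and from the mean, a + b = 2 × mean. A minimal sketch follows, using a hypothetical mean of 5 and variance of 3 that do not come from these notes.

```python
import math


def rectangular_limits(mean, variance):
    """Solve (a + b) / 2 = mean and (b - a)^2 / 12 = variance for a < b."""
    half_width = math.sqrt(3 * variance)  # (b - a) / 2 = sqrt(12 * variance) / 2
    return mean - half_width, mean + half_width


# Hypothetical values for illustration only.
a, b = rectangular_limits(mean=5, variance=3)
```

Substituting the results back into the formulas for the mean and variance confirms the solution.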

Estimation

Mean

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{\sum fx}{\sum f}$

Sample variance

$(\sigma_n)^2 = \dfrac{\sum (x-\bar{x})^2}{n} = \dfrac{\sum x^2}{n} - \bar{x}^2 = \dfrac{\sum fx^2}{\sum f} - \bar{x}^2$

Unbiased estimator of the population variance

$(\sigma_{n-1})^2 = \dfrac{n}{n-1} \times (\sigma_n)^2$
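The relationship between the two variance formulas can be illustrated with a short sketch. The data set is made up, and the standard library functions `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1) are used only as a cross-check.

```python
import statistics


def sample_variance(data):
    """(sigma_n)^2: mean squared deviation from the sample mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def unbiased_variance(data):
    """(sigma_{n-1})^2 = n / (n - 1) * (sigma_n)^2."""
    n = len(data)
    return n / (n - 1) * sample_variance(data)


# Hypothetical sample for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

var_n = sample_variance(data)
var_n_minus_1 = unbiased_variance(data)

# Cross-check against the standard library.
library_var_n = statistics.pvariance(data)
library_var_n_minus_1 = statistics.variance(data)
```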

Confidence intervals

Interpretation of confidence intervals

If a k percent confidence interval for the mean is constructed from each of a large number of different samples, about k percent of those intervals will contain the true population mean.

Writing confidence intervals

Confidence intervals should be written to a high degree of accuracy in the format (lower limit, upper limit)

Calculating confidence intervals

If the population variance is given for a normally distributed population the Z tables are used and the confidence interval is given by:

$\bar{x} \pm z_{\alpha} \times \sqrt{\dfrac{\sigma^2}{n}}$

If the population variance is unknown but the sample size is greater than 30, the Z tables are used by the central limit theorem. The above formula is still used, with the following assumptions:

If the population variance is unknown and the sample size is less than 30, the t tables are used with $\nu = n-1$ degrees of freedom. The confidence interval is then given by:

$\bar{x} \pm t_{\alpha} \times \sqrt{\dfrac{S^2}{n}}$

Example

20 bottles are selected from a production line. The volume of liquid in each is recorded as x ml

$\sum x = 1518.9$, $\sum (x-\bar{x})^2 = 7.2895$

Stating any assumptions made, construct a 95% confidence interval for the mean.
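$n = 20$
Mean $= \dfrac{\sum x}{n} = \dfrac{1518.9}{20} = 75.945$
Sample variance $= \dfrac{\sum (x-\bar{x})^2}{n} = \dfrac{7.2895}{20} = 0.36448$
Unbiased estimator of population variance $= \dfrac{\sum (x-\bar{x})^2}{n-1} = \dfrac{7.2895}{19} = 0.38366$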

Using $\nu = 20 - 1 = 19$ degrees of freedom, the 95% t value is 2.093
The confidence interval is then given by
$75.945 \pm 2.093 \times \sqrt{\dfrac{0.38366}{20}} = (75.66, 76.23)$
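The interval can be reproduced from the summary statistics. In this sketch the t value 2.093 is taken from the tables as quoted in the example rather than computed, and the variable names are illustrative.

```python
import math

n = 20
sum_x = 1518.9               # sum of the recorded volumes
sum_sq_deviations = 7.2895   # sum of (x - mean)^2
t_value = 2.093              # 95% two-sided t value for 19 degrees of freedom (from tables)

mean = sum_x / n
unbiased_var = sum_sq_deviations / (n - 1)      # S^2
half_width = t_value * math.sqrt(unbiased_var / n)

lower, upper = mean - half_width, mean + half_width
interval = (round(lower, 2), round(upper, 2))
```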

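The assumptions made were that the volumes are normally distributed (required for the t distribution to apply) and that the 20 bottles form a random sample.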

Hypothesis testing

Procedure

Step 1

State the null hypothesis and the alternate hypothesis.

$H_0: \mu = a$

$H_1: \mu \ne a$

Two tailed test

Step 2

Choose the test statistic

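For a known variance, or $n > 30$

$Z = \dfrac{\bar{x} - \mu}{\sqrt{\sigma^2 / n}}$

For an unknown variance, and $n < 30$

$t = \dfrac{\bar{x} - \mu}{\sqrt{S^2 / n}}$

where $S^2$ is the unbiased estimator of the population variance.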

Step 3

Use tables to find the critical value

If a t distribution is used, there are $n-1$ degrees of freedom

State the critical value, or draw a graph and mark it with the critical value and the test statistic

Step 4

Conclude the hypothesis test

As [condition] there is / isn’t significant evidence at the % level that the mean differs from a.
We therefore reject/accept H0 and conclude that… [context].
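Steps 2 to 4 can be sketched for the known-variance case. All of the numbers below are hypothetical and chosen only for illustration; 1.96 is the standard two-tailed 5% critical value from the Z tables.

```python
import math


def z_statistic(sample_mean, mu0, variance, n):
    """Z = (sample mean - mu0) / sqrt(variance / n), with mu0 the value in H0."""
    return (sample_mean - mu0) / math.sqrt(variance / n)


def two_tailed_decision(statistic, critical_value):
    """Reject H0 when the statistic lies beyond the critical value in either tail."""
    return "reject H0" if abs(statistic) > critical_value else "do not reject H0"


# Hypothetical data: H0: mu = 50, known variance 25, sample of 36 with mean 52.
z = z_statistic(sample_mean=52, mu0=50, variance=25, n=36)
decision = two_tailed_decision(z, critical_value=1.96)
```

The same decision function works for a t test if the statistic is computed with S^2 and the critical value is read from the t tables with n - 1 degrees of freedom.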

Errors

Type 1 error

This type of error occurs when H0 is rejected and H1 is accepted, when H0 is in fact true.

The probability of making a type 1 error is the level of significance of the hypothesis test.

Type 2 error

This type of error occurs when H0 is accepted, when it is in fact false.

The probability of a type 2 error is not fixed, since it depends upon the extent to which the true value of μ deviates from the value given in H0. If the true value of μ is close to the value given in H0, the probability of a type 2 error is large.

Chi-squared goodness of fit test

Calculating

Calculating expected frequencies

For a table of values, the expected frequency of each value is

$\dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$

If the expected frequency of a particular value is less than 5, its row or column must be merged.

The test statistic

For the general case

$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i}$

For the case where the table is 2×2 we have only one degree of freedom, and the test statistic is

$\chi^2 = \sum \dfrac{\left(|O_i - E_i| - 0.5\right)^2}{E_i}$

The critical value

For a goodness of fit test, the degrees of freedom, $\nu$, is given by $n-1$ where $n$ is the number of groups. For a contingency table with $r$ rows and $c$ columns, $\nu = (r-1)(c-1)$.

The critical value is then found from the table or from a calculator.

Example of the comparison of two variables

Natives of England, Africa, and China were classified by blood group

Observed frequencies

          O      A      B      AB
English   235    212    79     83
African   147    106    30     51
Chinese   162    135    52     43

Is there any evidence at the 5% level that there is a connection between blood group and nationality?
H0: There is no connection between blood group and nationality
H1: There is a connection between blood group and nationality

For each cell we find the expected value by multiplying the row total and column total before dividing by the table total

Expected frequencies

          O        A        B       AB
English   248.16   206.65   73.44   80.74
African   136.10   113.33   40.28   44.28
Chinese   159.74   133.02   47.27   51.97

We have that $\nu = (4-1)(3-1) = 6$ degrees of freedom and then that

$\chi^2_6(5\%) = 12.592$

From the table of expected frequencies
$\sum \dfrac{(O_i - E_i)^2}{E_i} = 8.39$

As 8.39 < 12.592 we do not reject H0 at the 5% level and therefore conclude that there is no significant evidence of a connection between nationality and blood group.
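The expected frequencies and the test statistic for this example can be recomputed from the observed table. This is a sketch only; the critical value 12.592 is taken from the tables as quoted above, and the variable names are illustrative.

```python
# Observed frequencies: rows are English, African, Chinese; columns are O, A, B, AB.
observed = [
    [235, 212, 79, 83],
    [147, 106, 30, 51],
    [162, 135, 52, 43],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

# Test statistic: sum of (O - E)^2 / E over every cell.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 12.592  # chi-squared, 6 degrees of freedom, 5% level (from tables)
reject_h0 = chi_squared > critical_value
```

All expected frequencies here exceed 5, so no rows or columns need to be merged before computing the statistic.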