nm0940: Today's lecture follows on very directly from what I was saying yesterday, and I'm afraid only one overhead projector is working, so I'll have to be over here all the time. Yesterday I was introducing the idea of significance, and I talked about five key ideas, which I'm going to run over very quickly to start with today; then I'm going to add two more new ideas on the end.

The five key ideas I talked about yesterday were the null hypothesis, the test statistic, the P-value, the critical region, and the significance level. Those are five key ideas that we've got to get on board before we can go any further, so let me quickly review them.

First of all, the null hypothesis, H-nought (H-sub-zero); I'll just describe verbally what these are. At a rather general level, the null hypothesis is just some statement about the distribution of the data we have observed.

The test statistic, T, in the notation I used yesterday, is a function of the data which we look at in order to inform us about whether we think H-nought is reasonable or not in the light of the data. Usually the test statistic is some kind of difference between what we have observed and what we would expect under the null hypothesis, and the idea is that the bigger the value of the test statistic, the more doubt we have that H-nought is actually reasonable in the light of the data. So if we're going to be accepting and rejecting H-nought, which we're going to be doing with ever increasing frequency as we go along, the logic is that we will want to reject H-nought, that is, think H-nought is unreasonable, when T is large; and the larger it is, the more unreasonable we think H-nought is. That is encapsulated by the third idea, the P-value.

The P-value is the probability that, by chance, if the null hypothesis is true, I would get a value of the test statistic greater than or equal to the one I actually got. In the notation I talked about yesterday: we pretend for a moment that the null hypothesis actually is true, and we ask ourselves what the chance is of coming up with a value of the test statistic which is as extreme as or more extreme than the one we've actually got. Notice the convention of capital letters and small letters that I keep going on about; you really see the importance of that convention here.

The critical region was the fourth idea I talked about yesterday. If we have a significance test, that is, some procedure for deciding on the basis of X whether or not H-nought is reasonable, the critical region is just the set of all possible data sets that would lead us to think that H-nought was unreasonable, or, in the jargon, the data sets which would lead us to reject H-nought, or, if you like, to decide that H-nought is false.
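Written out in symbols, here is a compact restatement of those ideas; this is my own write-up of the board notation as I have reconstructed it, not something additional from the lecture. For data X with observed value x and test statistic T = t(X):

    \[ H_0:\ \text{some statement about the distribution of } X, \qquad
       p(x) = P\bigl(t(X) \ge t(x) \mid H_0\bigr), \]
    \[ C = \{x : \text{we reject } H_0\}, \qquad
       \alpha = P(X \in C \mid H_0). \]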
So the first thing I'm really going to do today, formally, is to point out that the sensible critical region should be those values of the data X such that the P-value is sufficiently small. The critical region itself is a general concept that applies to any decision procedure we might propose. And the significance level, alpha (everybody uses that notation), is the probability that, if H-nought is true, we would by chance get a data set that lies in the critical region.

So those were the five things I talked about yesterday, and they are five key ideas: understanding what they are is absolutely mandatory if we're going to get anywhere at all in talking about significance.

The first thing I want to do today is to explain formally what the connection is between the P-value and the significance level. The idea, which I've already mentioned very briefly, is that a sensible critical region should be the set of data sets whose P-values are less than some threshold, and that threshold is none other than alpha. I'm going to set that down as a theorem, although you can hardly justify the word in a mathematical sense. The theorem says: the significance test given by this particular critical region, namely the set of data x such that the P-value corresponding to x is less than or equal to a threshold alpha, is precisely a significance test with significance level alpha. In other words, if we decide H-nought is false whenever the P-value is less than or equal to alpha, that procedure has significance level alpha.

The proof, once you see the idea, is completely trivial, and rather than trying to write it out formally it's easiest just to look at a picture. Think of the values of the test statistic: here is the distribution of the test statistic we would expect to see if the null hypothesis is true. In my sketch it looks like a normal distribution, but of course it doesn't have to be; we've just got some distribution for T under H-nought. Now, what do we do with the P-value? We locate on the scale of the test statistic the particular value we have observed, so notice the little x here, where big X is the general random quantity. Then, from the definition, which you can still see at the top of the screen, the P-value is this shaded area: the probability of getting by chance a value of t(X) greater than or equal to t(x). So what we have to show is that if we take as the critical region all possible data sets for which this area is less than or equal to alpha, then the procedure has significance level alpha.
What we have to show (this black pen is getting a bit worn out, so I'll do it in blue now) is that the probability that X belongs to the critical region, calculated under the assumption of H-nought, equals alpha. From the definition of C in the statement of the theorem, this is the probability that p(X) is less than or equal to alpha. And this is where we use the picture: when is this P-value, the area I shaded, less than or equal to alpha? Obviously it is exactly when the left-hand end of the shaded area is at or beyond the one-minus-alpha quantile (remember about quantiles). So this is the same as saying that the value of T we get is greater than or equal to the point under this distribution which cuts off area alpha to the right, and in the notation we used before that is the one-minus-alpha quantile of the distribution of T, with everything calculated under the assumption of the null hypothesis. So Q here is the one-minus-alpha quantile of T. Now, what is the chance that a random variable will by chance give you a value greater than or equal to its one-minus-alpha quantile? The suffix one-minus-alpha means you have chance one-minus-alpha to the left, so the chance to the right is one minus one-minus-alpha, which is alpha. Okay, that's the proof; it's kind of trivial really, it's just a matter of understanding which probabilities we're talking about. So this shows that this significance test, rejecting H-nought when the P-value is less than or equal to alpha, is a significance test with significance level alpha.
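To put the blackboard argument in one chain of equalities (my own compact write-up of the step just given, with q denoting the one-minus-alpha quantile of the null distribution of T):

    \[ P(X \in C \mid H_0)
       = P\bigl(p(X) \le \alpha \mid H_0\bigr)
       = P\bigl(t(X) \ge q_{1-\alpha} \mid H_0\bigr)
       = 1 - (1-\alpha) = \alpha . \]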
Now in a sense this is extremely subtle, because we now have two interpretations of alpha. I'm sorry I don't have another overhead projector to switch to, so I hope you're not losing too much of this on the screen. Let me emphasize this by noting up here that this theorem gives us two interpretations of this quantity alpha.

The first interpretation, which you can see at the top there, is that alpha is a threshold for the P-value. The P-value is a measure of how surprised we are to get a particular value of our test statistic, so this interpretation says something like: alpha is a measure of how surprised we have got to be before we reject H-nought, where by reject I mean declare it to be false, or think it to be false. So, first interpretation: a threshold for the P-value, how surprised we have to be, with the data set in front of us, in order to think that H-nought is false. It's a kind of measure of surprise; it's about how we think about the data.

The second interpretation is given by the result of the theorem: alpha is a significance level, and that is an error rate. It is the probability that the data fall in the critical region given that H-nought is true, and to say the data are in the critical region means that we reject, or decide H-nought is false. So alpha is the probability that you will decide H-nought is false, but that probability is calculated on the assumption that H-nought is true. In other words it's an error rate: it tells you how often you make a mistake if you use the significance test given by comparing P-values with this threshold. It's a natural long-run frequency probability. To pin this error-rate idea down a little more carefully, some people use the jargon "type one error rate". Why it's called a type one error rate will be clear, hopefully, in twenty minutes' time, when I've talked about a type two error rate; it's a type one error rate because it's about making an error in one direction only: if H-nought is true, what's the chance we think it's false? Of course you might make an error the other way round as well, and that's what I'm going to talk about later, but for the moment it's just the one direction. Okay, so: two subtly different interpretations of this quantity alpha.
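Here is a small simulation sketch of that error-rate interpretation, written in Python with numpy and scipy; these tools, and the choice of a normal model with the parameters shown, are my own illustrative assumptions rather than anything used in the lecture. It generates data sets with H-nought true, rejects whenever the two-sided P-value is at most alpha, and checks that the long-run rejection frequency is close to alpha.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    alpha = 0.05          # significance level, i.e. the P-value threshold
    n, reps = 25, 20000   # sample size per data set, number of simulated data sets

    rejections = 0
    for _ in range(reps):
        x = rng.normal(loc=0.0, scale=7.4, size=n)        # data generated with H0 true (mu = 0)
        res = stats.ttest_1samp(x, popmean=0.0)           # two-sided one-sample t-test
        if res.pvalue <= alpha:                           # "reject H0" rule from the theorem
            rejections += 1

    print("proportion of false rejections:", rejections / reps)   # should come out close to alpha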
Let me talk about an example now, just to reinforce this idea of what H-nought is, what a test statistic is, what P-values are, and so on; and again I'm sorry the second projector isn't working. This is an example from genetics, and it's of historical interest. I think I've mentioned once or twice already that early work in biology, particularly in quantitative genetics, had a very informative role in the early development of statistics. In example five, which I gave out to you the other day, there is a rather famous experiment by Charles Darwin which I've put in one of the questions for you to look at, and here is another famous experiment, done by a geneticist in the early nineteen-twenties, somebody called Frets, who published his paper in nineteen-twenty-one.

Frets was interested in the inheritance of human characteristics. We're all very familiar with the idea that human characteristics, facial features for example, do tend to be inherited: how often have we seen mothers and daughters looking very similar, and sisters looking similar, and brothers looking similar? We see that all the time. So there is strong inheritance in facial characteristics, and Frets was one of the first biologists who really tried to get to grips with it quantitatively. He asked: how can we measure how much of facial features is inherited, and how much is just random occurrence that nobody can explain? Frets did a rather famous experiment. He was interested in people's faces, and he took a number of families and tried to compare the faces of different brothers in the same family.

He was particularly interested in head length. Length is a funny word to use here: the head length is the distance from here, the bottom of the chin with the mouth shut, to the top of the head, measured in millimetres. I suppose it is a length if the person is lying down flat with a tape measure, so it's a natural enough length, though really it's more like a height; anyway, that's the measurement. What he did was to find a sample of families in which there were two or more brothers, and he measured the head heights, or head lengths as he called them, for the first son and for the second son, and he tried to see how similar they were. Basically the idea is to show that brothers from the same family have faces which are much more similar to each other than faces sampled generally from the population. We know now that that is true, but a hundred years ago it wasn't obviously true, and that's what he tried to find out.

This is just a little extract from his work. What he measured, which I'll now call X, was the value of L, this head dimension, for the first son in a family, minus the value for the second son. So the particular measure I'm going to talk about is the difference: L for the first son minus L for the second son. He did this for twenty-five families, so he got twenty-five values of X, and these are my data.
I'll tell you in the next few minutes what question he was interested in. He is interested in how similar these measurements are, and one thing he might want to know about is an order effect: this is the first son and this is the second son, so the mother is by definition older when she has her second son, so maybe the sons change over time. One thing he looked at was time trends. So the question I want to talk about now, a surface question of his work really, is this: does L tend to get bigger or smaller? Is there any evidence that the first son has a bigger head than the second son, or vice versa? If the value of L gets bigger from first to second son then X is negative, and if L gets smaller then X is positive, so really he's interested in the sign of X, essentially.

So here are his data. I'm somewhat frustrated not to have the second projector, because I really should have put the data on it, so it's all got to go here. There were twenty-five families, and I'm going to arrange the data in the way statisticians usually arrange data, in what's called a stem and leaf plot; you'll see in a moment why that's a sensible way of writing out data. Here are the first few values: minus nine, minus nine, minus seven, minus six, and I write them in order from the most negative to the most positive. Then there were some more families: two minus fives, two minus fours, a minus three, and two minus ones. Then there was a nought (these are millimetre measurements, by the way, so this is a family where, to the nearest millimetre, the two head lengths might be exactly the same), and there was a one, a two, and a four. Then there were two families with plus five, and a seven, an eight and a nine; a family with ten, two families with twelve, and a family with thirteen; and there was a family with a difference of sixteen.

That's a stem and leaf plot. It's just writing down the data, but ordered in a nice way, ranking them from smallest to largest, and notice I've grouped them in class intervals of width five: minus ten to minus five, minus five to nought, et cetera. The advantage of doing that is that you can immediately see what the histogram looks like. If I draw a tiny histogram down here, you can immediately spot that there are four observations in the first group, seven in the second, four in the third, five in the fourth, then another four, and another one. So there's the histogram of the data.
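As a quick cross-check, here is a short Python sketch that groups the differences into the same width-five intervals and prints the counts. The individual values below are my reconstruction of the spoken list, so one or two of them may not match Frets's published figures exactly; for instance this reconstruction gives a sample mean of 2.0 rather than the 1.96 quoted in a moment.

    import numpy as np

    # Differences X = L(first son) - L(second son), in mm, as read out in the lecture
    x = np.array([-9, -9, -7, -6,
                  -5, -5, -4, -4, -3, -1, -1,
                   0,  1,  2,  4,
                   5,  5,  7,  8,  9,
                  10, 12, 12, 13,
                  16])

    # Class intervals of width five: [-10,-5), [-5,0), [0,5), [5,10), [10,15), [15,20]
    bins = np.arange(-10, 21, 5)
    counts, _ = np.histogram(x, bins=bins)
    for lo, hi, c in zip(bins[:-1], bins[1:], counts):
        print(f"[{lo},{hi}): {c}")     # expect 4, 7, 4, 5, 4, 1 as in the stem and leaf plot

    print("n =", x.size, " mean =", x.mean(), " sd =", x.std(ddof=1).round(2))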
It's a typical sort of histogram from a biological experiment. The question now is how to analyse these data so as to shed light on whether L tends to get longer or shorter, and I want to talk about two approaches.

The first approach is the T-test. It's called the T-test because it's based on the T distribution, and it's really the same as what I was talking about last week when I discussed confidence intervals with the T distribution. The first approach is to fit a normal distribution to all of this and to discuss it in terms of inference for a normal sample. So what would be a sensible way of thinking about these data from a normal perspective? Well, there's my histogram: it's a sort of normal shape. Not very good, really, but we've only got twenty-five observations, so that's about as close to a normal distribution as you could ever expect to get. So normality is probably a reasonable assumption, and all biologists assume normality without worrying about it, so we will as well. So X has a normal distribution, and in the usual notation it has mean mu and variance sigma-squared: there's the model.

That relates to the ingredients I was reminding you of earlier, and the next question is: what is the null hypothesis? The question, still at the top there, is whether there is any evidence that L gets longer or shorter, so the natural null hypothesis is that on average L stays the same, which means that the mean of X, which is mu, is zero. So the null hypothesis of the T-test is that mu equals zero.
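In symbols, the model and null hypothesis being assumed here (my own write-up of what is on the board; the interest in departures in either direction is part of the "longer or shorter" question):

    \[ X_1, \dots, X_{25} \ \text{i.i.d.}\ N(\mu, \sigma^2), \qquad
       H_0 : \mu = 0, \ \text{with departures in either direction of interest.} \]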
The next ingredient is the test statistic: what test statistic are we going to take? We need to analyse the data now, so we need the sample mean in order to use the T distribution, and according to my calculations the sample mean is 1.96. That's zero over there, by the way, so clearly the distribution is pushed over a bit to the right: the mean is a positive 1.96. We also need the sample standard deviation for the T statistic, and according to my calculator that is 7.40; and, as I've already said, n is twenty-five. So we're all set to form the T statistic, the T random variable which tells us about the mean of a normal distribution. Remember what that is; we temporarily go to capital letters, because I'm talking about the random distribution of these things. The T statistic has a standard normal random variable on the top: it is the sample mean X-bar minus its mean, but we're constructing a test statistic, so we work under the assumption of the null hypothesis, under which the mean is zero, so there is nothing to subtract. Then we scale by dividing by the standard deviation of X-bar, which is S over the square root of n, and that is the same as putting a square root of n on the top. Let's get the numbers in: 1.96 times five (the square root of twenty-five), divided by 7.40, and according to my calculator that gives 1.32. So we've got a value of the T random variable, which is 1.32.
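As a worked equation, using the rounded summaries just quoted:

    \[ t = \frac{\bar{x}\sqrt{n}}{s} = \frac{1.96 \times \sqrt{25}}{7.40}
         = \frac{1.96 \times 5}{7.40} \approx 1.32 . \]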
This is an intermediate step; now we come to the test statistic itself. What are we going to take as the test statistic? Well, this quantity intuitively measures the difference between X-bar and what we would expect under the null hypothesis, namely zero; and since the question we're asking, still at the top there, is whether there is any evidence that L gets either bigger or smaller, we put modulus bars around it, because deviations in either direction are equally interesting. So the test statistic is the absolute value of this T statistic.

Now we go to the third ingredient in the whole procedure, calculating the P-value. Let me do a little diagram over here. Here is the distribution of T, and on how many degrees of freedom? Twenty-five observations, so twenty-four degrees of freedom; remember it's n minus one, because of the chi-squared argument behind all this. So there is the distribution of T on twenty-four degrees of freedom, centred on zero; that's the zero, and then we look at the value we've got, which is up here somewhere, 1.32. The P-value is the probability of getting a test statistic greater than or equal to the one we've got, so, in red now, that's going to be this area here, the probability of getting a T-value bigger than 1.32. But we put modulus bars around the test statistic, because deviations in the negative direction are equally interesting, so I shade this area down here as well, below minus 1.32. Writers of elementary textbooks like talking about two-tailed and one-tailed tests; this is a two-tailed test, because we're interested in deviations in both directions. So we work out the P-value, which is the sum of these two red areas, and this is where you have to go to your statistical tables.
Or go to S-Plus, if you have S-Plus running at your desk, which I do, and I just get S-Plus to work out the tail area for me, the cumulative distribution function for T, and I make it 0.186, so the P-value is about nineteen per cent. What does that tell us? General discussion over: what we're trying to spot is whether the P-value is small, because it's a measure of how much of a fluke it would be to get this value of T. Well, it isn't small, is it? Something with a chance of nineteen per cent is not really very unlikely at all. So by any of the usual canons of significance testing, whatever significance level you use, five per cent or one per cent or whatever, you wouldn't declare this significant. The conclusion is that although the data give a hint of a positive difference, which would mean that head lengths tend to get smaller as you go from first to second son, it is not significant: the difference is not big enough to be clearly evident from such a small sample size. So that's the first approach to these data, the T-test.
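For completeness, here is a sketch of the same two-tailed calculation in Python with scipy, as my own stand-in for the S-Plus call mentioned. Note that with the rounded inputs 1.96 and 7.40 and the t distribution on twenty-four degrees of freedom this comes out nearer 0.20 than the 0.186 quoted; the small gap is presumably down to rounding of the summary statistics or to the reference distribution the software used, and either way the conclusion of "not significant" is unchanged.

    from scipy import stats

    xbar, s, n = 1.96, 7.40, 25              # summary statistics quoted in the lecture
    t_obs = xbar * n**0.5 / s                # observed t statistic, about 1.32

    df = n - 1                               # 24 degrees of freedom
    p_two_sided = 2 * stats.t.sf(abs(t_obs), df)   # both tails: P(|T| >= |t_obs|)
    print(t_obs, p_two_sided)                # roughly 1.32 and 0.20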
Now let me mention another approach, because I want to get across the idea that the choice of test statistic is somewhat arbitrary. Here is another way of looking at the data, called the sign test, and it is much simpler than the T-test. It goes back to the original question, still up there: does L tend to get longer or shorter? The sign test simply looks at the sign of the difference: if X is positive then L is getting shorter, and if X is negative then L is getting bigger. So a natural thing to look at now is the probability that X is positive, meaning L gets smaller, and the probability that X is negative, meaning L gets bigger. If nothing is going on, if on average there is no trend over time, you would expect L to get bigger as often as it gets smaller; so another way of formulating the null hypothesis is that the probability that X is positive equals the probability that X is negative.

If we have a normal distribution with mean zero then certainly that requirement is satisfied, but notice that this is a much weaker null hypothesis than the T-test's; the T-test carries all that baggage of assuming normality and working out standard deviations. This is a much cruder, more primitive way of looking at the data: just counting up how many positives and negatives there are, and testing this hypothesis.

So let's see how many positives and negatives we've got. Let me write F for frequency: let F-plus be the number of positive Xs and F-minus be the number of negative Xs. Under the null hypothesis, positive and negative are equally likely, so on average F-plus should equal F-minus; they will never be exactly equal, of course, but on average the difference between them would be zero. That is the null hypothesis we'll look at now; we don't deal with the Xs any more, we just replace them by the frequencies. What is a suitable test statistic? A test statistic is a function of the data, but now we are only looking at how many positives and negatives there are, so it has to be a function of F-plus and F-minus. What is it about these quantities that would lead us to doubt the null hypothesis? Intuitively it is the difference, so if we take F-plus minus F-minus, with absolute value bars around it, that is a good test statistic: the more different the two frequencies are, the more evidence we have that X is more likely to be positive than negative, or vice versa.

Okay, so this is the sign test, a completely different way of looking at the data, and this is now the test statistic. How do we work out the P-value? Let's get the data; I'm really very fed up that we haven't got the second projector, but anyway, there are the data again. How many positives and how many negatives have we got? We've got four and seven in the first two cells, so eleven negatives. How many positives? All the rest are positive except for the zero, and we'll leave the zero out, because it doesn't tell us anything about which way we're going. So out of the twenty-five, with the zero left out, we have eleven negatives, and the positives are the rest: three there, five there, four there and one there, which makes thirteen. So the P-value is the probability that, by chance, the absolute difference between F-plus and F-minus is greater than or equal to the one we have observed, and what we have observed is a difference of just two.
This probability is going to be pretty big, if you just think about it. How are we going to get it? An easy way to think about it is that we've got thirteen positives and eleven negatives, so think of tossing a coin twenty-four times and getting thirteen heads and eleven tails; that's essentially the same problem, recast in terms of coin tossing. So this is the probability that the absolute difference between the number of heads and the number of tails in twenty-four tosses is greater than or equal to two. Two is a pretty small number, so the easiest thing is to work out the complement: one minus the chance that the absolute difference between F-plus and F-minus is strictly less than two. Now F-plus minus F-minus is always an even number, if you think about it, because F-plus plus F-minus is fixed at twenty-four: if the number of heads goes up by one, the number of tails goes down by one. So the only sample result that fails the inequality is the one where the two are exactly equal, and the P-value is one minus the probability that F-plus equals F-minus, both being twelve. We know what that is: it is the chance that in twenty-four tosses of a fair coin you get exactly twelve heads and twelve tails in exact balance, which is twenty-four-C-twelve times one over two to the power twenty-four. It all comes down to a binomial probability, as it must, because we are basically talking about binomial distributions. If you work that probability out on your calculator it is about 0.16, so one minus that is 0.839. So that's the P-value calculated by this other way of looking at the data.
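A quick numerical check of this binomial calculation in Python; again this is my own sketch using scipy's binomial distribution, and it reproduces the tie probability and the P-value just quoted.

    from scipy import stats

    f_plus, f_minus = 13, 11            # positive and negative differences (the single zero dropped)
    n = f_plus + f_minus                # 24 informative observations

    # Under H0 the number of positives is Binomial(n, 1/2).
    # |F+ - F-| >= 2 fails only when F+ = F- = 12, so the P-value is 1 - P(F+ = 12).
    p_tie = stats.binom.pmf(12, n, 0.5)
    p_value = 1 - p_tie
    print(round(p_tie, 3), round(p_value, 3))   # about 0.161 and 0.839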
The main point to note is that we have exactly the same data, and both tests are doing something very sensible: the first looks at the mean of the distribution in the classical way, the second looks at how many positives and negatives there are. Both are perfectly plausible, and yet they give different answers, and they give different answers because they use different test statistics: one looks at the T quantity, with the baggage of the standard deviation, while the other just looks at the difference between two integers.

So the moral of this example is that significance levels are not everything. The significance level isn't the full story: we want somehow to capture the idea that the second test is actually worse than the first, to get to grips with the fact that the first test looks at the data in much more detail than the second. We are making lots of assumptions in the first test, like normality, so we would expect to earn a premium for using the first method rather than the second; there must be a sense in which the P-value from the first method is better than the P-value from the second. The only way to get to grips with that is to look at the error rate the other way round. We are not just interested in how often we reject the null hypothesis when it is true; we also have to worry about the sensitivity of the test, the probability that, if H-nought actually isn't true, we would detect that. We would like a test with good sensitivity, one which is likely to detect the falsity of H-nought rather than merely confirm its truth.
That is how the rest of the theory of significance tests goes, and it is what I'm going to talk about, beginning now; it's the main topic for next week, really. What I'll be talking about from now on is much later work historically: P-values go back about a hundred years, whereas the idea of looking at these two types of error goes back to the late nineteen-thirties, so it is a more recent innovation.

So, finally today, let me define the two extra concepts I promised. The first is the alternative hypothesis, and you now see why I write H-nought, because I'm going to have H-one. H-one stands for the alternative hypothesis, and it is simply the complement of H-nought, in the set-theory sense: if H-nought is true then H-one, its opposite, is false, and if H-nought is false then H-one is true. So that's the first definition, the alternative hypothesis: we now have two hypotheses, H-nought and H-one, and we are trying to decide which is which.

The second quantity is the error rate the other way round, and everybody uses the symbol beta for it, alongside alpha. Beta is the probability that we accept the null hypothesis, in other words believe the null hypothesis is true, but calculated on the assumption that H-one is the truth. So this is the error the other way round: H-one is actually true, in other words H-nought is really false, but we conclude that H-nought is true, so we have made a mistake. I defined alpha to be the type one error rate a moment ago; this is the type two error rate, the probability of getting things wrong in the other direction. The theory of significance tests now goes down the route of looking at values of alpha and beta together: not only do we want to control alpha, which is what I've talked about so far, we also want to choose significance tests, that is, choose test statistics, for which beta is also small. And that is what I'm going to talk about next Monday and Thursday.
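Collecting the two error rates in one display (my own summary of the definitions just given):

    \[ \alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) \quad \text{(type one error rate)}, \]
    \[ \beta  = P(\text{accept } H_0 \mid H_1 \text{ true}) \quad \text{(type two error rate)}. \]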