nf0274: er so in the last lecture we looked at hypothesis tests [0.9] where our belief about the value of the underlying tendency whatever it was [0.7] was used to calculate the probability of the data that we observed [1.3] er we were essentially considering [0.2] how different what we observed was [0.3] to what we expected to happen [0.8] what we believe is happening [1.0] and today we're going to be looking at confidence intervals [0.7] which allow us to get an appreciation of the size of that difference [0.5] we calculate a range which includes the true value with the specified probability [5.9] before we go on to that [0.3] we'll consider a quick illustration of the problem that might face us [0.7] the slide shows er hypothetical data on the number of neural tube defects in Western Australia [0.4] from nineteen-seventy-five to two-thousand [1.8] er [0.3] and obviously what we're most interested in [0.5] is how many cases we can expect [0.3] on average in a year [1.2] and from what we observe the line gives us an idea the line in the er middle of the plot [0.7] gives us an idea of what we might expect to observe [1.3] the first thing to note about that is that there's a fair amount of variation of the points around that line [1.0] which makes it a little bit difficult to predict the number of cases [0.4] in any particular year [2.2] the second thing to note about the plot is that there appears to be a drop [0.5] in the number of cases at around about nineteen-ninety [1.0] and that actually coincides with the introduction of folic acid [0.7] er given to pregnant women [1.4] so an obvious question is has the introduction of folic acid made any difference to the number of cases that we've observed of neural tube defects [1.4] is that drop [0.2] a result of the introduction of folic acid or is it just random variation [1.5] in other words what we want to do is to remove the year on year variation [0.2] that we've observed [1.0] and [0.4] make an inference about what the underlying trend is [1.2] and perhaps whether [0.2] the number of cases in the years prior to nineteen-ninety [0.5] are different to the number of cases in the years after nineteen-ninety [0.8] we want to get rid of the random variation and make some kind of inference about the underlying tendency [0.5] in the data [5.3] so just to give you a quick reminder of what a hypothesis test is [1.1] we set up our hypothesis which is to quantify our belief about say an incidence rate ratio or something like that [1.7] and we calculate the probability of what we've observed in our data [0.4] given what we believe which is the hypothesis that we've set up [1.4] and the inference goes that if that probability [0.2] is very small [0.4] then either something very unlikely has happened and you should keep in mind that [0.3] the fact that something is unlikely doesn't mean that it's impossible it could happen [1.6] or that the hypothesis is wrong [2.0] and so if we observe a very small P-value [0.7] then we conclude that our data are incompatible with our hypothesis [0.4] so we can reject our null hypothesis [1.6] er a quick reminder that the probability of what we observe given what we believe is called the P-value [4.7] so this is a slide i lost it a little bit on last week [0.3] er [0.8] just do you remember the cut-off value of the P-value [0.2] that we use [0.6] is completely arbitrary [0.7] most of the time we use point-nought-five but we could use a different value if we chose to [1.6] and if we observe er [0.2] a P-value of
point-zero-five-one [0.3] then that's still fairly unlikely [0.7] and if we're investigating something contentious like AIDS then [0.4] we'd still be fairly interested in what was going on in that instance although the result that we've observed is not statistically significant at the five per cent level if we have a P-value of point-nought-five-one [0.6] it's still a pretty unlikely event [0.4] and so we'd be more interested [1.0] conversely if we were investigating the common cold [0.2] then we probably wouldn't be too bothered [2.0] er a good thing about P-values is they're they're easy to use and interpret they [0.2] give us a simple comparison of two numbers our significance level [0.6] and er the observed P-value that we have [1.5] it also has er an interpretation that it's the probability of rejecting the null hypothesis [0.7] er [0.5] when it's actually true in other words that the data could be consistent with the hypothesis [0.3] and be very unlikely [1.6] er the P-value gives us a probability of that [2.0] note also that the sid-, statistical significance depends on sample size [0.9] er we'd never reject the null hypothesis [0.2] in er a toss of a coin three times [0.2] because the lowest possible P [0.7] is still greater than point-nought-five the the lowest possible P in that case is er one-over-eight [0.3] it's always going to be greater than point-nought-five [1.0] and so we'd never reject it [0.8] so significance depends on the sample size [0.4] sorry i'm getting some hands waved in namex
sm0275: how sorry how do you calculate P [0.4]
nf0274: well i i have to talk about that [0.4] later [0.4] sorry [1.0] er [1.8] so er a statistically significant result [0.3] er [0.2] may not be clinically important remember [0.9] that depends on the [0.3] context of the problem [0.4] for example if we're looking at er [0.8] readmission data then we might be very very interested in a very small difference in readmission [0.8] but conversely a very small difference in mortality rate may not be [0.5] er clinically important [0.5] so [0.7] depends on er [0.5] the context of the problem as to whether or not the difference we observe is clinically important [6.7] [3.8] so our problem is to use [0.6] what we observe to draw conclusions about the underlying tendencies [1.0] and confidence intervals give us a range [0.2] er which may include the true value [0.8] and we can also test a hypothesis about the true value [0.3] that we're interested in [4.7] so [0.6] sorry i've er
[0.2] pressed the wrong button on the [0.7] slide i just want to make sure i'm showing you the right one [1.7] estimation right [2.0] so consider an example suppose in our study hypothetically [0.8] we have a hypothesis that the risk of er T-B in Warwickshire or Warwick [0.7] is the same as the risk of T-B in the rest of the U-K [1.2] and we've collected some data and calculated an incidence rate ratio of one-point-three [1.1] and on the under the assumption [0.3] that th-, t-, the er [0.4] two risks are the same [1.4] the probability of observing that incidence rate ratio of one-point-three [0.7] er occurs [0.2] less than one per cent of the time [1.2] so the data is inconsistent with the hypothesis that we're testing [1.2] and we can er [0.5] conclude that that we can reject that hypothesis but [0.3] what it doesn't tell you is what the magnitude of the difference [0.7] of the er [0.8] risk of T-B for Warwick and the risk of the U-K is [0.8] and [0.2] quite often we want to say something about [0.3] what size of difference [0.3] is [1.0] what the size of difference is [1.3] we want a best guess at the true risk [3.7] so this slide shows the P-values associated with the range of hypotheses where we observe an incidence rate ratio of one-point-three [1.9] so if we look at the [0.2] er line [0.4] corresponding to plus-thirty per cent risk i'll just get the pointer up [1.5] there it is [0.9] so if we're looking at this line plus-thirty per cent risk [1.3] observing an incidence rate ratio of one-point-three [0.9] would correspond to a P-value of point-nought-fi er sorry point-five [0.4] and so we wouldn't reject our null hypothesis that there's a difference between the two areas there [2.2] correspondingly if our null hypothesis was that there was er [1.3] a plus-forty per cent risk [0.7] then observing er an incidence rate ratio of one-point-three [0.5] would have a P-value of point-two so we still wouldn't reject our null hypothesis [1.1] and so on for all the other values in the table [4.0] this gives us an idea [0.5] of which hypotheses are inconsistent with our observed data [1.0] so i-, [0.2] informally the values outside the range of ten per cent excess risk to fifty per cent excess risk are inconsistent with the data we've observed [1.9] and that range probably includes the true value [6.1] so the ninety-five per cent confidence interval [0.8] is a range which includes the true value with ninety-five per cent certainty [1.3] er in this example the ninety-five per cent confidence interval for the incidence rate ratio was one-point-one to one-point-five [1.2] and it's centred on the observed value which is our best guess at the true value [1.1] and obviously because it's centred on the observed value that always falls inside the er confidence interval [5.6] so er [0.9] there are slightly different ways of calculating confidence intervals and namex students in particular might have seen [0.2] different methods er than the ones that are given in your lecture notes [0.9] but for the purposes of this course we're just using the error factor [1.1] formulae given in er the lecture notes for various [0.5] calculations for confidence intervals [0.5] er pages one-twenty-five and one-twenty-six [2.1] basically the confidence interval is centred on the observed value [1.1] and then we calculate the error factor [0.3] and correspondingly the upper and lower confidence lint-, [0.2] limits as appropriate [0.4] using the [0.3] er given formula [1.8] and the range between the lower and the upper
confidence limit [0.5] is called the ninety-five per cent confidence [0.2] interval [3.6] [sniff] [1.1] so an example [1.3] say that we have interest in the incidence of diabetes [1.0] and we've observed fifty cases in ten-thousand person years [1.2] so we have er five cases per thousand person years [1.4] we can calculate the observed exposure [1.3] and the error factor which is based on the number of events observed fifty [0.7] [sniff] [0.2] given by the formula which a-, a-, [0.4] appears on page one-twenty-five of your notes [0.3] the error factor being the exponential of twice the square root of one over the number of cases one over fifty in this example [0.9] being one-point-three-three [4.8] so we've observed [0.6] five cases of diabetes per thousand person years [0.8] and we've calculated our error factor one-point-three-three [0.9] we can then use the formula to calculate the lower and the upper ninety-five per cent confidence limits [1.1] and give our best estimate of the true infiden-, [0.2] incidence being the observed inth-, incidence [0.4] [sniff] [0.3] and the ninety perfef-, [0.2] ninety-five per cent confidence interval [0.5] being three-point-eight to six-point-seven cases per thousand person years [0.8] we can be ninety-five per cent certain [0.2] that that range three-point-eight to six-point-seven [0.4] includes the true value of the incidence rate [1.5]
sm0276: excuse me [0.3] [0.2] at the risk of feeling dim where do you get the two from in that formula [0.8]
nf0274: sorry i've just been asked where the two comes from in the formula [0.2] in the error factor [0.2] [sniff] [0.2] er [0.5] i i don't want to discuss that [0.2] i-, [0.4] in the lecture i want to carry on 'cause i've got quite a lot to get through [0.3] er we can perhaps talk about that in a [0.4] in a session later
sm0277: if we don't understand the [0.3] point [0.6] then [0.2] is there any point [0.6] giving the lecture [0.9] i'm not being rude but like [0.7] that's two questions and you've not answered either of them [1.1]
nf0274: the reason i i'm i'm again being asked another question the reason i'm not answering these is because er i'm also lecturing to le-, to namex students [0.8] and so [0.2] they can't hear what you say [0.2] they can only hear what i say [0.9] and i'm not confident enough with this system [0.2] to repeat your question [0.3] then think of the answer [0.3] try and keep the lecture to time [0.3] and keep going [0.7] so i'm sorry we will have to talk about those later [0.4] okay [2.5] right so diabetes example sorry i'll just have to look at where i am now [1.4] [3.2] so [1.7] consider what happens as we get more data [1.6] basically the error factor which is based on one over the number of cases [0.4] gets smaller because if we get more data we observe more cases [0.4] and so one divided by a number which is increasing [0.4] gets smaller [0.8] the error factor gets smaller [0.5] we multiply the observed value by the error factor which is getting smaller [1.0] and the confidence interval gets narrower [0.7] [sniff] [0.8] so for example [0.2] again if we've observed two-hundred new cases of diabetes in a population of forty-thousand people [0.2] over a year [0.9] then the estimated rate is the same as before [1.1] but the error factor is smaller because we've observed two-hundred cases not fifty cases [1.2] so the error factor using that formula is one-point-one-five [0.8] and the upper and lower ninety-five per cent confidence limits [0.3] are as given [0.9] and we have er [1.3] a confidence interval [0.5] of now four-point-three to five-point-eight rather than [0.6] three-point-six three-point-eight to six-point-seven [0.7] so we've got more data [0.5] our error factor has got smaller [0.2] and our confidence interval has got er [0.2] more narrow [5.6]
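A minimal sketch of the error-factor calculation just described, assuming only the formula quoted from page one-twenty-five (error factor = exp of twice the square root of one over the number of cases); the function name and the rounding are illustrative, not from the lecture notes.

```python
import math

def rate_ci(cases, person_years):
    """Observed incidence rate with a 95% confidence interval,
    using the error-factor method: EF = exp(2 * sqrt(1 / cases))."""
    rate = cases / person_years               # observed rate
    ef = math.exp(2 * math.sqrt(1 / cases))   # error factor
    return rate, rate / ef, rate * ef         # estimate, lower limit, upper limit

# 50 cases in 10,000 person-years: rate 5 per 1,000, EF about 1.33,
# CI about 3.8 to 6.6 per 1,000 (the lecture's 6.7 comes from rounding EF to 1.33 first)
print([round(x * 1000, 1) for x in rate_ci(50, 10_000)])   # [5.0, 3.8, 6.6]

# 200 cases in 40,000 person-years: same rate, EF about 1.15, narrower CI 4.3 to 5.8
print([round(x * 1000, 1) for x in rate_ci(200, 40_000)])  # [5.0, 4.3, 5.8]
```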
so the confidence interval reflects our uncertainty about the true value of something so it's it could be an infi-, incidence a population prevalence or an average height or whatever [1.6] but you should remember that it's not a value [0.2] er not a range in which ninety-five per cent of the observations lie [2.2] and you can illustrate that quite easily if we split up the data from a few slides back [0.8] so if we have fifty cases in two-thousand people over five years [0.7] if we consider the number of cases in each of those five years that's what's given in the table so in the first year we observe th-, er thirteen cases [0.5] second year ten cases and so on [2.1] er the confidence interval for that data was three-point-eight to six-point-seven [0.7] but you can see that the incidence rate for er years three four and five [0.8] where the rate is point-zero-zero-three point-zero-zero-seven and point-zero-zero-three-five [0.7] is outside that range so sixty per cent of our observations are falling outside that range [1.3] just an illustration [4.0] another example looking at the heights of fifty students [1.0] we can calculate the observed mean height [0.4] and the confidence interval [0.8] er using the appropriate formula [1.6] but ninety-five per cent of our students er fall between [0.4] one-point-five-five metres and one-point-eight-five metres [0.2] in height [0.8] [sniff] [0.4] er [1.1] so we find the range in which ninety-five per cent of our students [0.2] lie by inspection of our data in that case [1.6] and you can see that the two ranges aren't the same [0.2] one is called the reference range which is different from the confidence interval [0.7] and it's important that you remember that [5.5] another quick example if we're interested in a rate ratio [1.1] so in the first population we observe D-one cases er in in P-one person years [0.4] and in the second we observe D-two cases in P-two person years [0.6] [sniff] [0.4] er calculate the observed rate ratio easily [0.8] and the error factor using the appropriate formula page one-twenty-five [5.7] estimation versus hypothesis testing you should note that estimation is [0.2] more informative than hypothesis testing [1.2] er it can incorporate a hypothesis test [1.6] quick drink [6.5] so it's actually more useful to know [0.5] something about the plausible size of a difference than knowing only that there is a difference [0.7] our hypothesis test can tell us whether or not our data is consistent with there being a difference [0.4] but it can't tell us how big that difference is [2.3] so carry on with the rate ratio example [0.4] if in population A we have twelve cases in two-thousand person years [0.6] and in population B we have sixteen cases in four-thousand person years [1.0] then we can calculate the rates per thousand person years in the usual way [0.3] and the ratio [0.3] of A to B being one-point-five [3.4] obviously in our hypothesis [0.7] er [1.3] if the rates are the same then the ratio will be one [0.4] [sniff] [0.2] and the observed ratio of the rates in our [0.2] data example [0.5] is one-point-five [0.7] excuse me [2.1] er we used the formula [0.3] on page one-point-five [0.2] er is it one-point-five page one-twenty-five to calculate the error factor [0.7] note that that includes the er [0.6] observed events in both populations not just in the one [1.5] and the error factor for that example [0.2] er is is easily shown to be two-point-one-five [0.5] so we can use the usual way to calculate the ninety-five per cent [0.3] confidence interval for the rate ratio [1.0] er and that gives us the range of point-seven-nought to three-point-two-three [3.1]
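A similar sketch for the rate ratio, assuming the error factor from page one-twenty-five is exp of twice the square root of one over D-one plus one over D-two; the small differences from the quoted point-seven-nought to three-point-two-three come from rounding the error factor to 2.15 before multiplying.

```python
import math

def rate_ratio_ci(d1, p1, d2, p2):
    """Observed rate ratio with a 95% confidence interval; the error
    factor uses the observed events in both populations."""
    ratio = (d1 / p1) / (d2 / p2)                  # observed rate ratio
    ef = math.exp(2 * math.sqrt(1 / d1 + 1 / d2))  # error factor
    return ratio, ratio / ef, ratio * ef

# Population A: 12 cases in 2,000 person-years; population B: 16 cases in 4,000
ratio, lower, upper = rate_ratio_ci(12, 2_000, 16, 4_000)
print(round(ratio, 2), round(lower, 2), round(upper, 2))  # 1.5 0.7 3.22
```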
now that includes the er [1.6] value of one [0.6] which is that the rates are the same [1.6] so from that we can conclude that the observed data we've [0.2] based that confidence interval on [0.5] are consistent with the null hypothesis that the rates are the same [0.6] because the confidence interval [0.4] includes the er null value of one [0.6] we conclude [0.2] that the data is er consistent [0.3] with that hypothesis [0.9] so we can't reject that hypothesis at the five per cent level [2.0] note that it doesn't prove [0.2] that the null hypothesis is true [1.2] er [0.3] and if you think about that for a while you'll note that the range that we've got there point-seven to three-point-two [0.7] also includes the value of three [0.9] so if we tested the hypothesis that the er rate ratio was three [0.6] then we wouldn't reject that hypothesis either [2.8] so that just shows you that it doesn't prove that the null hypothesis [0.4] of er the ratios being the same [0.3] is true [0.7] it merely says that the data is not inconsistent with it [5.8] another example using the rate ratio this one is where the data are inconsistent with the null hypothesis [0.9] the confidence interval calculated i'll not [0.3] trawl through the er [0.4] algebra again [0.5] the confidence interval there is one-point-four to two-point-eight-six and that does include the null value of one [1.5] so in this case we can reject [0.6] the er [0.3] sorry that doesn't include the null value of one [1.0] namex students are looking a bit confused there [sniff] [0.3] er it doesn't include the null value of one [0.3] and so we can't er [0.2] reject [0.3] the er we can reject the null hypothesis [5.1] er [0.5] today you'll have done a bit of inference on er [0.6] standardized mortality ratios [0.4] [sniff] [0.3] again that just runs through the formula this is on page one-twenty-six of your lecture notes [1.1] we observe O deaths we expect E deaths [0.4] based on age-specific rates in a standard population [0.4] and age-specific population sizes in the study population [0.9] we can calculate our observed S-M-R [0.7] and the error factor [0.4] twice the square root of one over the observed [0.3] exponentiated [2.1] and er set up our confidence interval [3.5] so suppose we put some data in we er we have er [0.2] we expect fifty deaths in our study population [0.7] and we observe sixty deaths [0.7] then our observed S-M-R is one-twenty [0.7] error factor one-point-two-nine and our ninety-five per cent confidence interval [0.3] is ninety-three to one-fifty-five [1.7]
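The standardized mortality ratio calculation follows the same pattern; a minimal sketch, assuming the formula from page one-twenty-six (error factor = exp of twice the square root of one over the observed deaths, with the S-M-R expressed per hundred). The helper name is illustrative.

```python
import math

def smr_ci(observed, expected):
    """Observed SMR (per 100) with a 95% confidence interval,
    error factor EF = exp(2 * sqrt(1 / observed deaths))."""
    smr = 100 * observed / expected
    ef = math.exp(2 * math.sqrt(1 / observed))
    return smr, smr / ef, smr * ef

# 60 observed deaths against 50 expected: SMR 120, EF about 1.29, CI about 93 to 155
print([round(x) for x in smr_ci(60, 50)])  # [120, 93, 155]
```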
this includes a hundred [0.4] er remember if the observed equals the expected [0.3] then the S-M-R would be a hundred [0.8] and so we wouldn't reject the null hypothesis [1.8] note that it also includes values as high as fifty per cent excess deaths so it doesn't in-, it doesn't prove equality either [0.8] again the same same argument [1.1] as before [4.5] so a summary quite an extensive summary for this lecture [1.0] [sniff] [1.0] er all obser-, observations are subject to random variation we've seen several examples of data [0.2] which have fluctuated usually around some underlying tendency [1.1] and we're always interested in the underlying tendency [1.4] we can use the data that we observe to test hypothesis [0.2] hypotheses [0.5] about underlying values which gives us an idea of whether data is consistent or not [0.5] with er what we believe [1.2] and we can also use our er [0.2] observed data to estimate our underlying tendency [1.5] that we're interested in [1.9] in this course the best estimate of the true value [0.3] er of underlying tendency is the observed value [1.8] but we also want er an idea of how that varies [0.3] er to take into account the the the nature of random variation [0.4] just to give a single number for something would be rather naive [1.2] and we express the uncertainty again in this course by calculating error factors and deriving confidence intervals [0.8] and remember that the definition of a ninety-five per cent confidence interval is the range which includes the true value of the statistic of intre-, [0.3] interest [0.4] with probability of ninety-five per cent or point-nine-five if you like it that way [1.8] you can also look at it as the range of true values [0.2] which is consistent with the observed data [1.9] so if er different values are consistent with the observed data [0.6] that would lead us to different [0.2] different conclusions but you can only be uncertain [0.2] what to conclude [3.4] another summary slide [1.2] we have two populations with incidence rates point-zero-zero-eight and point-zero-zero-two [0.8] our rate ratio is four error factor is two [0.7] so our ninety-five per cent confidence interval is two to eight [1.2] all the values in the ninety-five per cent confidence interval suggest that the rate in A is higher than the rate in B because it doesn't include the null value of one [2.0] we can er [1.0] safely conclude that A is higher than B [3.2] so the rate ratio is significantly different from one [0.3] at er [0.7] the five per cent level [5.4] a further example [0.2] rate ratios again the rate ratio g-, er is er [0.4] two the error factor is four [0.7] and the ninety-five per cent confidence interval is [0.2] point-five to eight sorry [2.2] the values in the ninety-five per cent confidence interval in this case are consistent with A being much higher than B [0.5] A being lower than B or both the same [0.7] in other words er [1.5] there are values in that confidence interval which are greater than or less than one [0.3] and one is also included [1.9] we can't really be [0.2] er too firm about our conclusions [0.8] in that case the ninety-five per cent confisa-, confidence interval does include our null [0.4] value [0.3] er the value of one [0.6] so we can say that the rate ratio is not significantly different from one [0.3] in that case [0.9] but it doesn't prove equality [0.2] again again we have values of up to eight in that confidence interval [0.7] so the data is also consistent with [0.2] with er hypotheses that extreme [2.8]
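The two summary examples reduce to one line of arithmetic each: divide and multiply the observed ratio by the quoted error factor and check whether the resulting interval contains one. A minimal sketch, assuming nothing beyond the error factors quoted on the slides (the verdict wording is illustrative):

```python
# CI = (ratio / EF, ratio * EF); an interval that excludes 1 means the ratio
# is significantly different from 1 at the 5% level, one that includes 1 does not.
for ratio, ef in [(4, 2), (2, 4)]:
    lower, upper = ratio / ef, ratio * ef
    verdict = "excludes 1" if lower > 1 or upper < 1 else "includes 1"
    print(f"ratio {ratio}, error factor {ef}: CI {lower} to {upper} ({verdict})")
# ratio 4, error factor 2: CI 2.0 to 8 (excludes 1)
# ratio 2, error factor 4: CI 0.5 to 8 (includes 1)
```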
so things to remember [0.5] er variation [0.2] always exists people are different we should all know that without too much uncertainty [1.7] and because of that variation our underlying dat-, er our observed data [0.5] is different from our underlying tendency [1.4] you need to have an appreciation of what the sources of variation might be [0.9] why why things differ [0.5] and er be able to test hypotheses about true values and set up confidence intervals [0.3] using the formulae given in your lecture notes [2.9] so that's it for today [0.3] er [0.2] i believe someone has an announcement to make i don't know where are you [0.7] she's there do you need to make it to namex students as well [0.3]
sf0278: no [0.2]
nf0274: no okay this is just for namex students so [0.9] there we are thanks very much