nf0274: er so in the last lecture we looked at hypothesis tests where our 
belief about the value of the underlying tendency whatever it was was used to 
calculate the probability of the data that we observed er we were essentially 
considering how different what we observed was to what we expected to happen 
what we believe is happening and today we're going to be looking at confidence 
intervals which allow us to get an appreciation of the size of that difference 
we calculate a range which includes the true value with a specified probability 
before we go on to that we'll consider a quick illustration of the problem that 
might face us the slide shows hypothetical data on the number of neural tube 
defects in Western Australia from nineteen-seventy-five to two-thousand er and 
obviously what we're most interested in is how many cases we can expect on 
average in a year and from what we observe the line gives us an idea the line 
in the er middle of the plot gives us an idea of 
what we might expect to observe the first thing to note about that is that 
there's a fair amount of variation of the points around that line which makes 
it a little bit difficult to predict the number of cases in any particular year 
the second thing to note about the plot is that there appears to be a drop in 
the number of cases at around about nineteen-ninety and that actually coincides 
with the introduction of folic acid er given to pregnant women so an obvious 
question is has the introduction of folic acid made any difference on the 
number of cases that we've observed of neural tube defects is that drop as a 
result of the introduction of folic acid or is it just random variation in 
other words what we want to do is to remove the year on year variation that 
we've observed and make an inference about what the underlying trend is and 
perhaps whether the number of cases in the years prior to nineteen-ninety are 
different to the number of cases in the 
years after nineteen-ninety we want to get rid of the random variation and make 
some kind of inference about the underlying tendency in the data so just to 
give you a quick reminder of what a hypothesis test is we set up our hypothesis 
which quantifies our belief about say an incidence rate ratio or something 
like that and we calculate the probability of what we've observed in our data 
given what we believe that is given the hypothesis that we've set up and the 
inference goes that if that probability is very small then either something 
very unlikely has happened and you should keep in mind that the fact that 
something is unlikely doesn't mean that it's impossible it could happen or 
the hypothesis is wrong and so if we observe a very small P-value then we 
conclude that our data are incompatible with our hypothesis so we can reject 
our null hypothesis er a quick reminder that the probability of what we observe 
given what we believe is called the P-value so 
this is a slide i lost it a little bit on last week er just so you remember the 
cut-off value of the P-value that we use is completely arbitrary most of the 
time we use point-nought-five but we could use a different value if we chose to 
and if we observe er a P-value of point-zero-five-one then that's still fairly 
unlikely and if we're investigating something contentious like AIDS then we'd 
still be fairly interested in what was going on in that instance although the 
result that we've observed is not statistically significant at the five per 
cent level if we have a P-value of point-zero-five-one it's still a pretty 
unlikely event and so we'd be more interested conversely if we were 
investigating the common cold then we probably wouldn't be too bothered er a good 
thing about P-values is they're easy to use and interpret they give 
us a simple comparison of two numbers our significance level and er the 
observed P-value that we have the significance level also has an 
interpretation it's the probability of rejecting the null hypothesis 
when it's actually true in other words the data could be consistent with 
the hypothesis and still be very unlikely er the P-value gives us a probability of 
that note also that statistical significance depends on sample size 
er we'd never reject the null hypothesis if we toss a coin three times 
because the lowest possible P is still greater than point-nought-five the 
lowest possible P in that case is er one-over-eight it's always going to be 
greater than point-nought-five and so we'd never reject it so significance 
depends on the sample size sorry i'm getting some hands waved in namex 
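The three-toss claim can be checked by listing every possible P-value. This is a minimal sketch, assuming a fair coin under the null hypothesis and a one-sided test that counts heads; the function name `upper_tail_p` is made up for illustration and does not come from the lecture notes.

```python
from fractions import Fraction
from math import comb


def upper_tail_p(n_tosses: int, heads: int) -> Fraction:
    """P(at least `heads` heads in `n_tosses` tosses) for a fair coin."""
    favourable = sum(comb(n_tosses, k) for k in range(heads, n_tosses + 1))
    return Fraction(favourable, 2 ** n_tosses)


# One-sided P-value for every possible result of three tosses (0 to 3 heads).
p_values = {heads: upper_tail_p(3, heads) for heads in range(4)}
smallest = min(p_values.values())

print(smallest)                     # smallest P-value achievable with 3 tosses
print(smallest > Fraction(5, 100))  # True means rejection at 0.05 is impossible
```

Under these assumptions the smallest achievable P-value is one-over-eight, which is the figure quoted in the lecture.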
sm0275: how sorry how do you calculate P 
nf0274: well i i have to talk about that later sorry er so er a statistically 
significant result er may not be clinically important remember that depends on 
the context of the problem for 
example if we're looking at er readmission data then we might be very very 
interested in a very small difference of readmission but conversely a very 
small difference in mortality rate may not be er clinically important so 
depends on er the context of the problem as to whether or not the difference we 
observe is clinically important so our problem is to use what we observe to 
draw conclusions about the underlying tendencies and confidence intervals give 
us a range er which may include the true value and we can also test a 
hypothesis about the true value that we're interested in so sorry i've er 
pressed the wrong button on the slide i just want to make sure i'm showing you 
the right one estimation right so consider an example suppose in our study 
hypothetically we have a hypothesis that the risk of er T-B in Warwickshire or 
Warwick is the same as the risk of T-B in the rest of the U-K and we've 
collected some data and calculated an incidence rate ratio of one-point-three 
and under the assumption that the two risks are the same an incidence rate 
ratio as large as one-point-three would be observed 
less than one per cent of the time so the data is inconsistent with the 
hypothesis that we're testing and we can reject 
that hypothesis but what it doesn't tell you is the magnitude of the 
difference between the risk of T-B for Warwick and the risk for the U-K and 
quite often we want to say something about what the 
size of the difference is we want a best guess at the true risk so this slide shows 
the P-values associated with a range of hypotheses where we observe an 
incidence rate ratio of one-point-three so if we look at the line 
corresponding to plus-thirty per cent risk i'll just get the pointer up there 
it is so if we're looking at this line plus-thirty per cent 
risk observing an incidence rate ratio of one-point-three would correspond to a 
P-value of point-five and so we wouldn't reject our 
null hypothesis that there's a thirty per cent difference between the two areas 
correspondingly if our null hypothesis was that there was a plus-forty per 
cent risk then observing an incidence rate ratio of one-point-three would 
have a P-value of point-two so we still wouldn't reject our null hypothesis and 
so on for all the other values in the table this gives us an idea of which 
hypotheses are inconsistent with our observed data so informally the values 
outside the range of ten per cent excess risk to fifty per cent excess risk are 
inconsistent with the data we've observed and that range probably includes the 
true value so the ninety-five per cent confidence interval is a range which 
includes the true value with ninety-five per cent certainty er in this example 
the ninety-five per cent confidence 
interval for the incidence rate ratio was one-point-one to one-point-five and 
it's centred on the observed value which is our best guess at the true value and 
obviously because it's centred on the observed value the observed value always 
falls inside the confidence interval so er there are slightly different ways of 
calculating confidence intervals and namex students in particular might have 
seen different methods er than the ones that are given in your lecture notes 
but for the purposes of this course we're just using the error factor formulae 
that are given in the lecture notes for various calculations for confidence 
intervals er pages one-twenty-five and one-twenty-six basically the confidence 
interval is centred on the observed value and then we calculate the error 
factor and correspondingly the upper and lower confidence limits as 
appropriate using the given formula and the range between the lower and the 
upper confidence limit is called the ninety-five per cent confidence interval 
[sniff] so an 
example say that we have an interest in the incidence of diabetes and we've 
observed fifty cases in ten-thousand person years so we have five cases per 
thousand person years we can calculate the observed rate and the error 
factor which is based on the number of events observed fifty [sniff] given by 
the formula which appears on page one-twenty-five of your notes the error 
factor is the exponential of twice the square root of one over the number of 
cases one over fifty in this example which gives one-point-three-three so we've 
observed five cases of diabetes per thousand person years and we've calculated 
our error factor one-point-three-three we can then use the formula to calculate 
the lower and the upper ninety-five per cent confidence limits dividing and 
multiplying the observed rate by the error factor and give our best estimate of 
the true incidence being the observed incidence [sniff] and the ninety-five per 
cent confidence interval being three-point-eight to six-point-seven cases per 
thousand person years we can be ninety-five per cent certain 
that that range three-point-eight to six-point-seven includes the true value of 
the incidence rate 
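The diabetes calculation can be reproduced in a few lines. This is a sketch under the assumption that the limits are the observed rate divided and multiplied by the error factor, which is what the quoted figures imply; the function names `error_factor` and `rate_interval` are illustrative and not taken from the lecture notes.

```python
import math


def error_factor(cases: int) -> float:
    """Error factor as described in the lecture: exp(2 * sqrt(1 / cases))."""
    return math.exp(2 * math.sqrt(1 / cases))


def rate_interval(cases: int, person_years: float, per: int = 1000):
    """Return (rate, error factor, lower limit, upper limit), rate per `per` person years."""
    rate = cases / person_years * per
    ef = error_factor(cases)
    # Assumed construction: divide for the lower limit, multiply for the upper.
    return rate, ef, rate / ef, rate * ef


rate, ef, lower, upper = rate_interval(50, 10_000)
print(f"rate {rate:.1f} per 1000, error factor {ef:.2f}")
print(f"95% confidence interval {lower:.2f} to {upper:.2f}")
```

With the unrounded error factor the upper limit comes out slightly below the six-point-seven quoted in the lecture; the quoted figure is what you get if the error factor is rounded to one-point-three-three before multiplying.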
sm0276: excuse me at the risk of feeling dim where do you get the two from in 
that formula 
nf0274: sorry i've just been asked where the two comes from in the formula in 
the error factor [sniff] er i i don't want to discuss that i-, in the lecture i 
want to carry on 'cause i've got quite a lot to get through er we can perhaps 
talk about that in a in a session later 
sm0277: if we don't understand the point then is there any point giving the 
lecture i'm not being 
rude but like that's two questions and you've not answered either of them 
nf0274: i'm again being asked another question the reason i'm 
not answering these is because er i'm also lecturing to namex students 
and so they can't hear what you say they can only hear what i say and i'm not 
confident enough with this system to repeat your question then think of the 
answer try and keep the lecture to time and keep going so i'm sorry we will 
have to talk about those later okay right so diabetes example sorry i'll just 
have to look at where i am now so consider what happens as we get more data 
basically the error factor which is based on one over the number of cases gets 
smaller because if we get more data we observe more cases and so one divided by 
a number which is increasing gets smaller the error factor gets smaller we 
multiply and divide the observed value by the error factor which is 
getting smaller and the confidence interval gets narrower [sniff] so for 
example again if we've observed two-hundred new cases of diabetes in a 
population of forty-thousand people over a year then the estimated rate is the 
same as before but the error factor is smaller because we've observed 
two-hundred cases not fifty cases so the error factor using that formula is 
one-point-one-five and the upper and lower ninety-five per cent confidence 
limits are as given and we now have a confidence interval of four-point-three to 
five-point-eight rather than three-point-eight to 
six-point-seven so we've got more data our error factor has got smaller and our 
confidence interval has got narrower so the confidence interval reflects 
our uncertainty about the true value of something so it could be an 
incidence a population prevalence or an average height or whatever but you 
should remember that it's not a range in 
which ninety-five per cent of the observations lie and you can illustrate that 
quite easily if we split up the data from a few slides back so if we have fifty 
cases in two-thousand people over five years if we consider the number of cases 
in each of those five years that's what's given in the table so in the first 
year we observe thirteen cases second year ten cases and so on er the 
confidence interval for that data was three-point-eight to six-point-seven per 
thousand person years but you can see that the incidence rates for years three 
four and five where the rates are point-zero-zero-three point-zero-zero-seven 
and point-zero-zero-three-five are outside that range so sixty per cent of our 
observations are 
falling outside that range just an illustration another example looking at the 
heights of fifty students we can calculate the observed mean height and the 
confidence interval er using the appropriate formula but ninety-five per cent 
of our students er fall between 
one-point-five-five metres and one-point-eight-five metres in height [sniff] er 
so we find the range in which ninety-five per cent of our students lie by 
inspection of our data in that case and you can see that the two ranges aren't 
the same one is called the reference range which is different from the 
confidence interval and it's important that you remember that another quick 
example if we're interested in a rate ratio so in the first population we 
observe D-one cases in P-one person years and in the second we observe 
D-two cases in P-two person years [sniff] er we calculate the observed rate ratio 
easily and the error factor using the appropriate formula page one-twenty-five 
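The formula on page one-twenty-five is not read out in the lecture. The block below is an assumed form, written to match the single-rate error factor described earlier; it does reproduce the error factor of two-point-one-five quoted in the worked example with twelve and sixteen cases. The symbols \(D_1, D_2, P_1, P_2\) are the ones used in the lecture; \(\widehat{RR}\) and \(EF\) are labels introduced here.

```latex
\[
\widehat{RR} = \frac{D_1 / P_1}{D_2 / P_2},
\qquad
EF = \exp\!\left( 2 \sqrt{\frac{1}{D_1} + \frac{1}{D_2}} \right),
\qquad
95\%\ \text{CI} = \left( \frac{\widehat{RR}}{EF},\ \widehat{RR} \times EF \right)
\]
```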
estimation versus hypothesis testing you should note that estimation is more 
informative than hypothesis testing er it can incorporate a hypothesis test 
quick drink so it's 
actually more useful to know something about the plausible size of a difference 
than knowing only that there is a difference our hypothesis test can tell us 
whether or not our data is consistent with there being a difference but it 
can't tell us how big that difference is so carry on with the rate ratio 
example if in population A we have twelve cases in two-thousand person years 
and population B we have sixteen cases in four-thousand person years then we 
can calculate the rates per thousand person years in the usual way and the 
ratio of A to B being one-point-five obviously under our hypothesis if the 
rates are the same then the ratio will be one [sniff] and the observed ratio of 
the rates in our data example is one-point-five excuse me er we use the 
formula on page one-twenty-five to 
calculate the error factor note that it includes the observed events in both 
populations not just in the one 
and the error factor for that example is easily shown to be 
two-point-one-five so we can use the usual way to calculate the ninety-five per 
cent confidence interval for the rate ratio and that gives us the range of 
point-seven-nought to three-point-two-three now that includes the value of one 
which is that the rates are the same so from that we can conclude that the 
observed data we've based that confidence interval on are consistent with the 
null hypothesis that the rates are the same because the confidence interval 
includes the er null value of one we conclude that the data is er consistent 
with that hypothesis so we can't reject that hypothesis at the five per cent 
level note that it doesn't prove that the null hypothesis is true er and if you 
think about that for a while you'll note that the range that we've got there 
point-seven to three-point-two also includes the value of three so if we tested 
the hypothesis that the er rate ratio was three 
then we wouldn't reject that hypothesis either so that just shows you that it 
doesn't prove that the null hypothesis of the rates being the same is true 
it merely says that the data is not inconsistent with it another example using 
the rate ratio this one is where the data are inconsistent with the null 
hypothesis the confidence interval calculated i'll not trawl through the 
algebra again the confidence interval there is one-point-four to 
two-point-eight-six and that doesn't include the null value of one sorry namex 
students are looking a bit confused there [sniff] it doesn't include the null 
value of one and so in this case we can reject the null hypothesis 
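The first rate ratio example, twelve cases in two-thousand person years against sixteen cases in four-thousand, can be worked through as below. This is a sketch only: the error factor is assumed to be exp(2 * sqrt(1/D1 + 1/D2)), which matches the two-point-one-five quoted above, and the function names are illustrative rather than taken from the lecture notes.

```python
import math


def rate_ratio_interval(d1: int, p1: float, d2: int, p2: float):
    """Return (rate ratio, error factor, lower limit, upper limit)."""
    ratio = (d1 / p1) / (d2 / p2)
    # Assumed formula: uses the observed events in both populations.
    ef = math.exp(2 * math.sqrt(1 / d1 + 1 / d2))
    return ratio, ef, ratio / ef, ratio * ef


def consistent_with(value: float, lower: float, upper: float) -> bool:
    """True if a hypothesised value lies inside the confidence interval."""
    return lower <= value <= upper


ratio, ef, lower, upper = rate_ratio_interval(12, 2000, 16, 4000)
print(f"rate ratio {ratio:.2f}, error factor {ef:.2f}")
print(f"95% confidence interval {lower:.2f} to {upper:.2f}")
print("consistent with a ratio of 1:", consistent_with(1.0, lower, upper))
print("consistent with a ratio of 3:", consistent_with(3.0, lower, upper))
```

Both checks come out true, which is the point made in the lecture: an interval that contains one does not prove the rates are equal, because it also contains values such as three.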
today you'll have done a bit of inference on er standardized mortality ratios 
[sniff] again that just runs through the formula this is on page 
one-twenty-six of your lecture notes we observe O deaths we expect E deaths 
based on age-specific rates in the standard population and age-specific 
population sizes in the study population we can calculate our observed S-M-R and 
the error factor the exponential of twice the square root of one over the 
observed deaths and set 
up our confidence interval so suppose we put some data in we 
expect fifty deaths in our study population and we observe sixty deaths then 
our observed S-M-R is one-twenty error factor one-point-two-nine and our 
ninety-five per cent confidence interval is ninety-three to one-fifty-five this 
includes a 
hundred er remember if the observed equals the expected then the S-M-R would be 
a hundred and so we wouldn't reject the null hypothesis note that it also 
includes values as high as fifty per cent excess deaths so it 
doesn't prove equality either again the same argument as before so a 
summary quite an extensive summary for 
this lecture [sniff] er all observations are subject to random 
variation we've seen several examples of data which have fluctuated usually 
around some underlying tendency and we're always interested in the underlying 
tendency we can use the data that we observe to test hypotheses 
about underlying values which gives us an idea of whether data is consistent or 
not with er what we believe and we can also use our er observed data to 
estimate our underlying tendency that we're interested in in this course the 
best estimate of the true value er of underlying tendency is the observed value 
but we also want er an idea of how that varies er to take into account the the 
the nature of random variation just to give a single number for something would 
be rather naive and we express the uncertainty again in this course by 
calculating error factors and deriving confidence intervals and remember that 
the definition of 
a ninety-five per cent confidence interval is the range which includes the true 
value of the statistic of interest with probability of ninety-five per 
cent or point-nine-five if you like it that way you can also look at it as the 
range of true values which is consistent with the observed data so if 
different values that are consistent with the observed data would lead us to 
different conclusions then you can only be uncertain what to conclude 
another summary slide we have two populations with incidence rates 
point-zero-zero-eight and point-zero-zero-two our rate ratio is four error factor is two so 
our ninety-five per cent confidence interval is two to eight all the values in 
the ninety-five per cent confidence interval suggest that the rate in A is 
higher than the rate in B because it doesn't include the null value of one we 
can er safely conclude that A is higher than B so the rate ratio is 
significantly different from one at er the 
five per cent level a further example rate ratios again the rate ratio 
is two the error factor is four and the ninety-five per cent confidence 
interval is point-five to eight the values in the ninety-five 
per cent confidence interval in this case are consistent with A being much 
higher than B A being lower than B or both the same in other words er there are 
values in that confidence interval which are greater than or less than one and 
one is also included we can't really be er too firm about our conclusions in 
that case the ninety-five per cent confidence interval does include 
our null value the value of one so we can say that the rate ratio is not 
significantly different from one in that case but it doesn't prove equality 
again we have values of up to eight in that confidence interval so the 
data is also consistent 
with hypotheses that extreme so things to remember er variation always 
exists people are different we should all know that without too much 
uncertainty and because of that variation our observed 
data is different from our underlying tendency you need to have an appreciation 
of what sources of variation might be why why things differ and er be able to 
test hypotheses about true values and set up confidence intervals using the 
formula given in your lecture notes so that's it for today er i believe someone 
has an announcement to make i don't know where are you she's there do you need 
to make it to namex students as well 
sf0278: no 
nf0274: no okay this is just for namex students so there we are thanks very much
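The standardized mortality ratio example and the two summary slides from this lecture can be checked with one small helper. This is a hedged sketch: it assumes the interval is the estimate divided and multiplied by the error factor, and that the S-M-R error factor is exp(2 * sqrt(1 / observed deaths)) as described in the lecture; the function names are made up for illustration.

```python
import math


def interval(estimate: float, error_factor: float):
    """Assumed construction: (estimate / EF, estimate * EF)."""
    return estimate / error_factor, estimate * error_factor


def smr_error_factor(observed_deaths: int) -> float:
    """Error factor for an S-M-R, based on the observed deaths only."""
    return math.exp(2 * math.sqrt(1 / observed_deaths))


def includes(null_value: float, lower: float, upper: float) -> bool:
    """True if the null value lies inside the interval (cannot reject at 5%)."""
    return lower <= null_value <= upper


# S-M-R example: 60 deaths observed, 50 expected, null value 100.
smr = 100 * 60 / 50
smr_lower, smr_upper = interval(smr, smr_error_factor(60))
print(f"S-M-R {smr:.0f}, 95% CI {smr_lower:.0f} to {smr_upper:.0f}")
print("includes 100:", includes(100, smr_lower, smr_upper))

# Summary slides: the rate ratio and error factor are taken as quoted.
first = interval(4, 2)   # rate ratio 4, error factor 2
second = interval(2, 4)  # rate ratio 2, error factor 4
print("first slide includes 1:", includes(1, *first))
print("second slide includes 1:", includes(1, *second))
```

The first summary interval excludes one and the second includes it, which matches the conclusions drawn on the two slides.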