nf0274: er so in the last lecture we looked at hypothesis tests [0.9] where our belief about the value of the underlying tendency whatever it was [0.7] was used to calculate the probability of the data that we observed [1.3] er we were essentially considering [0.2] how different what we observed was [0.3] to what we expected to happen [0.8] what we believe is happening [1.0] and today we're going to be looking at confidence intervals [0.7] which allow us to get an appreciation of the size of that difference [0.5] we calculate a range which includes the true value with the specified probability [5.9] before we go on to that [0.3] we'll consider a quick illustration of the problem that might face us [0.7] the slide shows er hypothetical data on the number of neural tube defects in Western Australia [0.4] from nineteen-seventy-five to two-thousand [1.8] er [0.3] and obviously what we're most interested in [0.5] is how many cases we can expect [0.3] on average in a year [1.2] and from what we observe the line gives us an idea the line in the er middle of the plot [0.7] gives us an idea of what we might expect to observe [1.3] the first thing to note about that is that there's a fair amount of variation of the points around that line [1.0] which makes it a little bit difficult to predict the number of cases [0.4] in any particular year [2.2] the second thing to note about the plot is that there appears to be a drop [0.5] in the number of cases at around about nineteen-ninety [1.0] and that actually coincides with the introduction of folic acid [0.7] er given to pregnant women [1.4] so an obvious question is has the introduction of folic acid made any difference to the number of cases that we've observed of neural tube defects [1.4] is that drop [0.2] a result of the introduction of folic acid or is it just random variation [1.5] in other words what we want to do is to remove the year on year variation [0.2] that we've observed [1.0] and [0.4] make an inference about what the underlying trend is [1.2] and perhaps whether [0.2] the number of cases in the years prior to nineteen-ninety [0.5] are different to the number of cases in the years after nineteen-ninety [0.8] we want to get rid of the random variation and make some kind of inference about the underlying tendency [0.5] in the data [5.3] so just to give you a quick reminder of what a hypothesis test is [1.1] we set up our hypothesis which is to quantify our belief about say an incidence rate ratio or something like that [1.7] and we calculate the probability of what we've observed in our data [0.4] given what we believe which is the hypothesis that we've set up [1.4] and the inference goes that if that probability [0.2] is very small [0.4] then either something very unlikely has happened and you should keep in mind that [0.3] the fact that something is unlikely doesn't mean that it's impossible it could happen [1.6] or that the hypothesis is wrong [2.0] and so if we observe a very small P-value [0.7] then we conclude that our data are incompatible with our hypothesis [0.4] so we can reject our null hypothesis [1.6] er a quick reminder that the probability of what we observe given what we believe is called the P-value [4.7] so this is a slide i lost it a little bit on last week [0.3] er [0.8] just do you remember the cut-off value of the P-value [0.2] that we use [0.6] is completely arbitrary [0.7] most of the time we use point-nought-five but we could use a different value if we chose to [1.6] and if we observe er [0.2] a P-value of
point-zero-five-one [0.3] then that's still fairly unlikely [0.7] and if we're investigating something contentious like AIDS then [0.4] we'd still be fairly interested in what was going on in that instance although the result that we've observed is not statistically significant at the five per cent level if we have a P-value of point-nought-five-one [0.6] it's still a pretty unlikely event [0.4] and so we'd be more interested [1.0] conversely if we were investigating the common cold [0.2] then we probably wouldn't be too bothered [2.0] er a good thing about P-values is they're they're easy to use and interpret they [0.2] give us a simple comparison of two numbers our significance level [0.6] and er the observed P-value that we have [1.5] it also has er an interpretation that it's the probability of rejecting the null hypothesis [0.7] er [0.5] when it's actually true in other words that the data could be consistent with the hypothesis [0.3] and be very unlikely [1.6] er the P-value gives us a probability of that [2.0] note also that the sid-, statistical significance depends on sample size [0.9] er we'd never reject the null hypothesis [0.2] in er a toss of a coin three times [0.2] because the lowest possible P [0.7] is still greater than point-nought-five the the lowest possible P in that case is er one-over-eight [0.3] it's always going to be greater than point-nought-five [1.0] and so we'd never reject it [0.8] so significance depends on the sample size [0.4] sorry i'm getting some hands waved in namex
sm0275: how sorry how do you calculate P [0.4]
nf0274: well i i have to talk about that [0.4] later [0.4] sorry [1.0] er [1.8] so er a statistically significant result [0.3] er [0.2] may not be clinically important remember [0.9] that depends on the [0.3] context of the problem [0.4] for example if we're looking at er [0.8] readmission data then we might be very very interested in a very small difference in readmission [0.8] but conversely a very small difference in mortality rate may not be [0.5] er clinically important [0.5] so [0.7] depends on er [0.5] the context of the problem as to whether or not the difference we observe is clinically important [6.7] [3.8] so our problem is to use [0.6] what we observe to draw conclusions about the underlying tendencies [1.0] and confidence intervals give us a range [0.2] er which may include the true value [0.8] and we can also test a hypothesis about the true value [0.3] that we're interested in [4.7] so [0.6] sorry i've er
[0.2] pressed the wrong button on the [0.7] slide i just want to make sure i'm showing you the right one [1.7] estimation right [2.0] so consider an example suppose in our study hypothetically [0.8] we have a hypothesis that the risk of er T-B in Warwickshire or Warwick [0.7] is the same as the risk of T-B in the rest of the U-K [1.2] and we've collected some data and calculated an incidence rate ratio of one-point-three [1.1] and on the under the assumption [0.3] that th-, t-, the er [0.4] two risks are the same [1.4] the probability of observing that incidence rate ratio of one-point-three [0.7] er occurs [0.2] less than one per cent of the time [1.2] so the data is inconsistent with the hypothesis that we're testing [1.2] and we can er [0.5] conclude that that we can reject that hypothesis but [0.3] what it doesn't tell you is what the magnitude of the difference [0.7] of the er [0.8] risk of T-B for Warwick and the risk of the U-K is [0.8] and [0.2] quite often we want to say something about [0.3] what size of difference [0.3] is [1.0] what the size of difference is [1.3] we want a best guess at the true risk [3.7] so this slide shows the P-values associated with the range of hypotheses where we observe an incidence rate ratio of one-point-three [1.9] so if we look at the [0.2] er line [0.4] corresponding to plus-thirty per cent risk i'll just get the pointer up [1.5] there it is [0.9] so if we're looking at this line plus-thirty per cent risk [1.3] observing an incidence rate ratio of one-point-three [0.9] would correspond to a P-value of point-nought-fi er sorry point-five [0.4] and so we wouldn't reject our null hypothesis that there's a difference between the two areas there [2.2] correspondingly if our null hypothesis was that there was er [1.3] a plus-forty per cent risk [0.7] then observing er an incidence rate ratio of one-point-three [0.5] would have a P-value of point-two so we still wouldn't reject our null hypothesis [1.1] and so on for all the other values in the table [4.0] this gives us an idea [0.5] of which hypotheses are inconsistent with our observed data [1.0] so i-, [0.2] informally the values outside the range of ten per cent excess risk to fifty per cent excess risk are inconsistent with the data we've observed [1.9] and that range probably includes the true value [6.1] so the ninety-five per cent confidence interval [0.8] is a range which includes the true value with ninety-five per cent certainty [1.3] er in this example the ninety-five per cent confidence interval for the incidence rate ratio was one-point-one to one-point-five [1.2] and it's centred on the observed value which is our best guess at the true value [1.1] and obviously because it's centred on the observed value that always falls inside the er confidence interval [5.6] so er [0.9] there are slightly different ways of calculating confidence intervals and namex students in particular might have seen [0.2] different methods er than the ones that are given in your lecture notes [0.9] but for the purposes of this course we're just using the error factor [1.1] formulae given in er the lecture notes for various [0.5] calculations for confidence intervals [0.5] er pages one-twenty-five and one-twenty-six [2.1] basically the confidence interval is centred on the observed value [1.1] and then we calculate the error factor [0.3] and correspondingly the upper and lower confidence lint-, [0.2] limits as appropriate [0.4] using the [0.3] er given formula [1.8] and the range between the lower and the upper
confidence limit [0.5] is called the ninety-five per cent confidence [0.2] interval [3.6] [sniff] [1.1] so an example [1.3] say that we have interest in the incidence of diabetes [1.0] and we've observed fifty cases in ten-thousand person years [1.2] so we have er five cases per thousand person years [1.4] we can calculate the observed exposure [1.3] and the error factor which is based on the number of events observed fifty [0.7] [sniff] [0.2] given by the formula which a-, a-, [0.4] appears on page one-twenty-five of your notes [0.3] the error factor being the exponential of twice the square root of one over the number of cases one over fifty in this example [0.9] being one-point-three-three [4.8] so we've observed [0.6] five cases of diabetes per thousand person years [0.8] and we've calculated our error factor one-point-three-three [0.9] we can then use the formula to calculate the lower and the upper ninety-five per cent confidence limits [1.1] and give our best estimate of the true infiden-, [0.2] incidence being the observed inth-, incidence [0.4] [sniff] [0.3] and the ninety perfef-, [0.2] ninety-five per cent confidence interval [0.5] being three-point-eight to six-point-seven cases per thousand person years [0.8] we can be ninety-five per cent certain [0.2] that that range three-point-eight to six-point-seven [0.4] includes the true value of the incidence rate [1.5]
sm0276: excuse me [0.3] [0.2] at the risk of feeling dim where do you get the two from in that formula [0.8]
nf0274: sorry i've just been asked where the two comes from in the formula [0.2] in the error factor [0.2] [sniff] [0.2] er [0.5] i i don't want to discuss that [0.2] i-, [0.4] in the lecture i want to carry on 'cause i've got quite a lot to get through [0.3] er we can perhaps talk about that in a [0.4] in a session later
sm0277: if we don't understand the [0.3] point [0.6] then [0.2] is there any point [0.6] giving the lecture [0.9] i'm not being rude but like [0.7] that's two questions and you've not answered either of them [1.1]
nf0274: the reason i i'm i'm again being asked another question the reason i'm not answering these is because er i'm also lecturing to le-, to namex students [0.8] and so [0.2] they can't hear what you say [0.2] they can only hear what i say [0.9] and i'm not confident enough with this system [0.2] to repeat your question [0.3] then think of the answer [0.3] try and keep the lecture to time [0.3] and keep going [0.7] so i'm sorry we will have to talk about those later [0.4] okay [2.5] right so diabetes example sorry i'll just have to look at where i am now [1.4] [3.2] so [1.7] consider what happens as we get more data [1.6] basically the error factor which is based on one over the number of cases [0.4] gets smaller because if we get more data we observe more cases [0.4] and so one divided by a number which is increasing [0.4] gets smaller [0.8] the error factor gets smaller [0.5] we multiply the observed value by the error factor which is getting smaller [1.0] and the confidence interval gets narrower [0.7] [sniff] [0.8] so for example [0.2] again if we've observed two-hundred new cases of diabetes in a population of forty-thousand people [0.2] over a year [0.9] then the estimated rate is the same as before [1.1] but the error factor is smaller because we've observed two-hundred cases not fifty cases [1.2] so the error factor using that formula is one-point-one-five [0.8] and the upper and lower ninety-five per cent confidence limits [0.3] are as given [0.9] and we have er [1.3] a confidence interval [0.5] of now four-point-three to five-point-eight rather than [0.6] three-point-six three-point-eight to six-point-seven [0.7] so we've got more data [0.5] our error factor has got smaller [0.2] and our confidence interval has got er [0.2] more narrow [5.6]
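A minimal sketch of the error-factor calculation just described, assuming only the formula quoted from page one-twenty-five (error factor = exp of twice the square root of one over the number of cases); the function name and the rounding are illustrative, not from the lecture notes.

```python
import math

def rate_ci(cases, person_years):
    """Observed incidence rate with a 95% confidence interval,
    using the error-factor method: EF = exp(2 * sqrt(1 / cases))."""
    rate = cases / person_years               # observed rate
    ef = math.exp(2 * math.sqrt(1 / cases))   # error factor
    return rate, rate / ef, rate * ef         # estimate, lower limit, upper limit

# 50 cases in 10,000 person-years: rate 5 per 1,000, EF about 1.33,
# CI about 3.8 to 6.6 per 1,000 (the lecture's 6.7 comes from rounding EF to 1.33 first)
print([round(x * 1000, 1) for x in rate_ci(50, 10_000)])   # [5.0, 3.8, 6.6]

# 200 cases in 40,000 person-years: same rate, EF about 1.15, narrower CI 4.3 to 5.8
print([round(x * 1000, 1) for x in rate_ci(200, 40_000)])  # [5.0, 4.3, 5.8]
```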
so the confidence interval reflects our uncertainty about the true value of something so it's it could be an infi-, incidence a population prevalence or an average height or whatever [1.6] but you should remember that it's not a value [0.2] er not a range in which ninety-five per cent of the observations lie [2.2] and you can illustrate that quite easily if we split up the data from a few slides back [0.8] so if we have fifty cases in two-thousand people over five years [0.7] if we consider the number of cases in each of those five years that's what's given in the table so in the first year we observe th-, er thirteen cases [0.5] second year ten cases and so on [2.1] er the confidence interval for that data was three-point-eight to six-point-seven [0.7] but you can see that the incidence rate for er years three four and five [0.8] where the rate is point-zero-zero-three point-zero-zero-seven and point-zero-zero-three-five [0.7] is outside that range so sixty per cent of our observations are falling outside that range [1.3] just an illustration [4.0] another example looking at the heights of fifty students [1.0] we can calculate the observed mean height [0.4] and the confidence interval [0.8] er using the appropriate formula [1.6] but ninety-five per cent of our students er fall between [0.4] one-point-five-five metres and one-point-eight-five metres [0.2] in height [0.8] [sniff] [0.4] er [1.1] so we find the range in which ninety-five per cent of our students [0.2] lie by inspection of our data in that case [1.6] and you can see that the two ranges aren't the same [0.2] one is called the reference range which is different from the confidence interval [0.7] and it's important that you remember that [5.5] another quick example if we're interested in a rate ratio [1.1] so in the first population we observe D-one cases er in in P-one person years [0.4] and in the second we observe D-two cases in P-two person years [0.6] [sniff] [0.4] er calculate the observed rate ratio easily [0.8] and the error factor using the appropriate formula page one-twenty-five [5.7] estimation versus hypothesis testing you should note that estimation is [0.2] more informative than hypothesis testing [1.2] er it can incorporate a hypothesis test [1.6] quick drink [6.5] so it's actually more useful to know [0.5] something about the plausible size of a difference than knowing only that there is a difference [0.7] our hypothesis test can tell us whether or not our data is consistent with there being a difference [0.4] but it can't tell us how big that difference is [2.3] so carry on with the rate ratio example [0.4] if in population A we have twelve cases in two-thousand person years [0.6] and in population B we have sixteen cases in four-thousand person years [1.0] then we can calculate the rates per thousand person years in the usual way [0.3] and the ratio [0.3] of A to B being one-point-five [3.4] obviously in our hypothesis [0.7] er [1.3] if the rates are the same then the ratio will be one [0.4] [sniff] [0.2] and the observed ratio of the rates in our [0.2] data example [0.5] is one-point-five [0.7] excuse me [2.1] er we used the formula [0.3] on page one-point-five [0.2] er is it one-point-five page one-twenty-five to calculate the error factor [0.7] note that that includes the er [0.6] observed events in both populations not just in the one [1.5] and the error factor for that example [0.2] er is is easily shown to be two-point-one-five [0.5] so we can use the usual way to calculate the ninety-five per cent [0.3] confidence interval for the rate ratio [1.0] er and that gives us the range of point-seven-nought to three-point-two-three [3.1]
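A similar sketch for the rate ratio, assuming the error factor from page one-twenty-five is exp of twice the square root of one over D-one plus one over D-two; the small differences from the quoted point-seven-nought to three-point-two-three come from rounding the error factor to 2.15 before multiplying.

```python
import math

def rate_ratio_ci(d1, p1, d2, p2):
    """Observed rate ratio with a 95% confidence interval; the error
    factor uses the observed events in both populations."""
    ratio = (d1 / p1) / (d2 / p2)                  # observed rate ratio
    ef = math.exp(2 * math.sqrt(1 / d1 + 1 / d2))  # error factor
    return ratio, ratio / ef, ratio * ef

# Population A: 12 cases in 2,000 person-years; population B: 16 cases in 4,000
ratio, lower, upper = rate_ratio_ci(12, 2_000, 16, 4_000)
print(round(ratio, 2), round(lower, 2), round(upper, 2))  # 1.5 0.7 3.22
```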
now that includes the er [1.6] value of one [0.6] which is that the rates are the same [1.6] so from that we can conclude that the observed data we've [0.2] based that confidence interval on [0.5] are consistent with the null hypothesis that the rates are the same [0.6] because the confidence interval [0.4] includes the er null value of one [0.6] we conclude [0.2] that the data is er consistent [0.3] with that hypothesis [0.9] so we can't reject that hypothesis at the five per cent level [2.0] note that it doesn't prove [0.2] that the null hypothesis is true [1.2] er [0.3] and if you think about that for a while you'll note that the range that we've got there point-seven to three-point-two [0.7] also includes the value of three [0.9] so if we tested the hypothesis that the er rate ratio was three [0.6] then we wouldn't reject that hypothesis either [2.8] so that just shows you that it doesn't prove that the null hypothesis [0.4] of er the ratios being the same [0.3] is true [0.7] it merely says that the data is not inconsistent with it [5.8] another example using the rate ratio this one is where the data are inconsistent with the null hypothesis [0.9] the confidence interval calculated i'll not [0.3] trawl through the er [0.4] algebra again [0.5] the confidence interval there is one-point-four to two-point-eight-six and that does include the null value of one [1.5] so in this case we can reject [0.6] the er [0.3] sorry that doesn't include the null value of one [1.0] namex students are looking a bit confused there [sniff] [0.3] er it doesn't include the null value of one [0.3] and so we can't er [0.2] reject [0.3] the er we can reject the null hypothesis [5.1] er [0.5] today you'll have done a bit of inference on er [0.6] standardized mortality ratios [0.4] [sniff] [0.3] again that just runs through the formula this is on page one-twenty-six of your lecture notes [1.1] we observe O deaths we expect E deaths [0.4] based on age-specific rates in a standard population [0.4] and age-specific population sizes in the study population [0.9] we can calculate our observed S-M-R [0.7] and the error factor [0.4] twice the square root of one over the observed [0.3] exponentiated [2.1] and er set up our confidence interval [3.5] so suppose we put some data in we er we have er [0.2] we expect fifty deaths in our study population [0.7] and we observe sixty deaths [0.7] then our observed S-M-R is one-twenty [0.7] error factor one-point-two-nine and our ninety-five per cent confidence interval [0.3] is ninety-three to one-fifty-five [1.7]
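The standardized mortality ratio calculation follows the same pattern; a minimal sketch, assuming the formula from page one-twenty-six (error factor = exp of twice the square root of one over the observed deaths, with the S-M-R expressed per hundred). The helper name is illustrative.

```python
import math

def smr_ci(observed, expected):
    """Observed SMR (per 100) with a 95% confidence interval,
    error factor EF = exp(2 * sqrt(1 / observed deaths))."""
    smr = 100 * observed / expected
    ef = math.exp(2 * math.sqrt(1 / observed))
    return smr, smr / ef, smr * ef

# 60 observed deaths against 50 expected: SMR 120, EF about 1.29, CI about 93 to 155
print([round(x) for x in smr_ci(60, 50)])  # [120, 93, 155]
```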
this includes a hundred [0.4] er remember if the observed equals the expected [0.3] then the S-M-R would be a hundred [0.8] and so we wouldn't reject the null hypothesis [1.8] note that it also includes values as high as fifty per cent excess deaths so it doesn't in-, it doesn't prove equality either [0.8] again the same same argument [1.1] as before [4.5] so a summary quite an extensive summary for this lecture [1.0] [sniff] [1.0] er all obser-, observations are subject to random variation we've seen several examples of data [0.2] which have fluctuated usually around some underlying tendency [1.1] and we're always interested in the underlying tendency [1.4] we can use the data that we observe to test hypothesis [0.2] hypotheses [0.5] about underlying values which gives us an idea of whether data is consistent or not [0.5] with er what we believe [1.2] and we can also use our er [0.2] observed data to estimate our underlying tendency [1.5] that we're interested in [1.9] in this course the best estimate of the true value [0.3] er of underlying tendency is the observed value [1.8] but we also want er an idea of how that varies [0.3] er to take into account the the the nature of random variation [0.4] just to give a single number for something would be rather naive [1.2] and we express the uncertainty again in this course by calculating error factors and deriving confidence intervals [0.8] and remember that the definition of a ninety-five per cent confidence interval is the range which includes the true value of the statistic of intre-, [0.3] interest [0.4] with probability of ninety-five per cent or point-nine-five if you like it that way [1.8] you can also look at it as the range of true values [0.2] which is consistent with the observed data [1.9] so if er different values are consistent with the observed data [0.6] that would lead us to different [0.2] different conclusions but you can only be uncertain [0.2] what to conclude [3.4] another summary slide [1.2] we have two populations with incidence rates point-zero-zero-eight and point-zero-zero-two [0.8] our rate ratio is four error factor is two [0.7] so our ninety-five per cent confidence interval is two to eight [1.2] all the values in the ninety-five per cent confidence interval suggest that the rate in A is higher than the rate in B because it doesn't include the null value of one [2.0] we can er [1.0] safely conclude that A is higher than B [3.2] so the rate ratio is significantly different from one [0.3] at er [0.7] the five per cent level [5.4] a further example [0.2] rate ratios again the rate ratio g-, er is er [0.4] two the error factor is four [0.7] and the ninety-five per cent confidence interval is [0.2] point-five to eight sorry [2.2] the values in the ninety-five per cent confidence interval in this case are consistent with A being much higher than B [0.5] A being lower than B or both the same [0.7] in other words er [1.5] there are values in that confidence interval which are greater than or less than one [0.3] and one is also included [1.9] we can't really be [0.2] er too firm about our conclusions [0.8] in that case the ninety-five per cent confisa-, confidence interval does include our null [0.4] value [0.3] er the value of one [0.6] so we can say that the rate ratio is not significantly different from one [0.3] in that case [0.9] but it doesn't prove equality [0.2] again again we have values of up to eight in that confidence interval [0.7] so the data is also consistent with [0.2] with er hypotheses that extreme [2.8]
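The two summary examples reduce to one line of arithmetic each: divide and multiply the observed ratio by the quoted error factor and check whether the resulting interval contains one. A minimal sketch, assuming nothing beyond the error factors quoted on the slides (the verdict wording is illustrative):

```python
# CI = (ratio / EF, ratio * EF); an interval that excludes 1 means the ratio
# is significantly different from 1 at the 5% level, one that includes 1 does not.
for ratio, ef in [(4, 2), (2, 4)]:
    lower, upper = ratio / ef, ratio * ef
    verdict = "excludes 1" if lower > 1 or upper < 1 else "includes 1"
    print(f"ratio {ratio}, error factor {ef}: CI {lower} to {upper} ({verdict})")
# ratio 4, error factor 2: CI 2.0 to 8 (excludes 1)
# ratio 2, error factor 4: CI 0.5 to 8 (includes 1)
```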
so things to remember [0.5] er variation [0.2] always exists people are different we should all know that without too much uncertainty [1.7] and because of that variation our underlying dat-, er our observed data [0.5] is different from our underlying tendency [1.4] you need to have an appreciation of what the sources of variation might be [0.9] why why things differ [0.5] and er be able to test hypotheses about true values and set up confidence intervals [0.3] using the formulae given in your lecture notes [2.9] so that's it for today [0.3] er [0.2] i believe someone has an announcement to make i don't know where are you [0.7] she's there do you need to make it to namex students as well [0.3]
sf0278: no [0.2]
nf0274: no okay this is just for namex students so [0.9] there we are thanks very much