nf0274: er so in the last lecture we looked at hypothesis tests where our belief about the value of the underlying tendency whatever it was was used to calculate the probability of the data that we observed er we were essentially considering how different what we observed was to what we expected to happen what we believe is happening and today we're going to be looking at confidence intervals which allow us to get an appreciation of the size of that difference we calculate a range which includes the true value with a specified probability before we go on to that we'll consider a quick illustration of the problem that might face us the slide shows data er hypothetical of a number of neural tube defects in Western Australia from nineteen-seventy-five to two-thousand er and obviously what we're most interested in is how many cases we can expect on average in a year and from what we observe the line gives us an idea the line in the er middle of the plot gives us an idea of what we might expect to observe the first thing to note about that is that there's a fair amount of variation of the points around that line which makes it a little bit difficult to predict the number of cases in any particular year the second thing to note about the plot is that there appears to be a drop in the number of cases at around about nineteen-ninety and that actually coincides with the introduction of folic acid er given to pregnant women so an obvious question is has the introduction of folic acid made any difference to the number of cases that we've observed of neural tube defects is that drop a result of the introduction of folic acid or is it just random variation in other words what we want to do is to remove the year on year variation that we've observed and make an inference about what the underlying trend is and perhaps whether the number of cases in the years prior to nineteen-ninety is different to the number of cases in the years after nineteen-ninety we want to get rid of 
the random variation and make some kind of inference about the underlying tendency in the data so just to give you a quick reminder of what a hypothesis test is we set up our hypothesis which is to quantify our belief about say an incidence rate ratio or something like that and we calculate the probability of what we've observed in our data given what we believe that is the hypothesis that we've set up and the inference goes that if that probability is very small then either something very unlikely has happened and you should keep in mind that the fact that something is unlikely doesn't mean that it's impossible it could happen or the hypothesis is wrong and so if we observe a very small P-value then we conclude that our data are incompatible with our hypothesis so we can reject our null hypothesis er a quick reminder that the probability of what we observe given what we believe is called the P-value so this is a slide i lost it a little bit on last week er just do remember the cut-off value of the P-value that we use is completely arbitrary most of the time we use point-nought-five but we could use a different value if we chose to and if we observe er a P-value of point-zero-five-one then that's still fairly unlikely and if we're investigating something contentious like AIDS then we'd still be fairly interested in what was going on in that instance although the result that we've observed is not statistically significant at the five per cent level if we have a P-value of point-zero-five-one it's still a pretty unlikely event and so we'd be more interested conversely if we were investigating the common cold then we probably wouldn't be too bothered er good thing about P-values is they're easy to use and interpret they give us a simple comparison of two numbers our significance level and er the observed P-value that we have the significance level also has er an interpretation that it's the probability of rejecting the null hypothesis er when it's actually true in 
other words that the data could be consistent with the hypothesis and be very unlikely er the P-value gives us a probability of that note also that the sid-, statistical significance depends on sample size er we'd never reject the null hypothesis in er a toss of a coin three times because the lowest possible P is still greater than point-nought-five the lowest possible P in that case is er one-over-eight it's always going to be greater than point-nought-five and so we'd never reject it so significance depends on the sample size sorry i'm getting some hands waved in namex sm0275: how sorry how do you calculate P nf0274: well i i have to talk about that later sorry er so er a statistically significant result er may not be clinically important remember that depends on the context of the problem for example if we're looking at er readmission data then we might be very very interested in a very small difference of readmission but conversely a very small difference in mortality rate may not be er clinically important so depends on er the context of the problem as to whether or not the difference we observe is clinically important so our problem is to use what we observe to draw conclusions about the underlying tendencies and confidence intervals give us a range er which may include the true value and we can also test a hypothesis about the true value that we're interested in so sorry i've er pressed the wrong button on the slide i just want to make sure i'm showing you the right one estimation right so consider an example suppose in our study hypothetically we have a hypothesis that the risk of er T-B in Warwickshire or Warwick is the same as the risk of T-B in the rest of the U-K and we've collected some data and calculated an incidence rate ratio of one-point-three and under the assumption that the er two risks are the same an incidence rate ratio of one-point-three or more extreme er occurs less than one per cent of the time so the 
data is inconsistent with the hypothesis that we're testing and we can er conclude that we can reject that hypothesis but what it doesn't tell you is what the magnitude of the difference of the er risk of T-B for Warwick and the risk for the U-K is and quite often we want to say something about what the size of difference is we want a best guess at the true risk so this slide shows the P-values associated with the range of hypotheses where we observe an incidence rate ratio of one-point-three so if we look at the er line corresponding to plus-thirty per cent risk i'll just get the pointer up there it is so if we're looking at this line plus-thirty per cent risk observing an incidence rate ratio of one-point-three would correspond to a P-value of point-nought-fi er sorry point-five and so we wouldn't reject our null hypothesis that there's a difference between the two areas there correspondingly if our null hypothesis was that there was er a plus-forty per cent risk then observing er an incidence rate ratio of one-point-three would have a P-value of point-two so we still wouldn't reject our null hypothesis and so on for all the other values in the table this gives us an idea of which hypotheses are inconsistent with our observed data so i-, informally the values outside the range of ten per cent excess risk to fifty per cent excess risk are inconsistent with the data we've observed and that range probably includes the true value so the ninety-five per cent confidence interval is a range which includes the true value with ninety-five per cent certainty er in this example the ninety-five per cent confidence interval for the incidence rate ratio was one-point-one to one-point-five and it's centred on the observed value which is our best guess at the true value and obviously because it's centred on the observed value that always falls inside the er confidence interval so er there are slightly different ways of calculating confidence 
intervals and namex students in particular might have seen different methods er than the ones that are given in your lecture notes but for the purposes of this course we're just using the error factor formulae given in er the lecture notes for various calculations for confidence intervals er pages one-twenty-five and one-twenty-six basically the confidence interval is centred on the observed value and then we calculate the error factor and correspondingly the upper and lower confidence lint-, limits as appropriate using the er given formula and the range between the lower and the upper confidence limit is called the ninety-five per cent confidence interval [sniff] so an example say that we have interest in the incidence of diabetes and we've observed fifty cases in ten-thousand person years so we have er five cases per thousand person years we can calculate the observed rate and the error factor which is based on the number of events observed fifty [sniff] given by the formula which a-, a-, appears on page one-twenty-five of your notes the error factor the exponential of two times the square root of one over the number of cases one over fifty in this example being one-point-three-three so we've observed five cases of diabetes per thousand person years and we've calculated our error factor one-point-three-three we can then use the formula to calculate the lower and the upper ninety-five per cent confidence limits and give our best estimate of the true infiden-, incidence being the observed inth-, incidence [sniff] and the ninety perfef-, ninety-five per cent confidence interval being three-point-eight to six-point-seven cases per thousand person years we can be ninety-five per cent certain that that range three-point-eight to six-point-seven includes the true value of the incidence rate sm0276: excuse me at the risk of feeling dim where do you get the two from in that formula nf0274: sorry i've just been asked where the two comes 
from in the formula in the error factor [sniff] er i i don't want to discuss that i-, in the lecture i want to carry on 'cause i've got quite a lot to get through er we can perhaps talk about that in a in a session later sm0277: if we don't understand the point then is there any point giving the lecture i'm not being rude but like that's two questions and you've not answered either of them nf0274: the reason i i'm i'm again being asked another question the reason i'm not answering these is because er i'm also lecturing to le-, to namex students and so they can't hear what you say they can only hear what i say and i'm not confident enough with this system to repeat your question then think of the answer try and keep the lecture to time and keep going so i'm sorry we will have to talk about those later okay right so diabetes example sorry i'll just have to look at where i am now so consider what happens as we get more data basically the error factor which is based on one over the number of cases gets smaller because if we get more data we observe more cases and so one divided by a number which is increasing gets smaller the error factor gets smaller we multiply and divide the observed value by the error factor which is getting smaller and the confidence interval gets narrower [sniff] so for example again if we've observed two-hundred new cases of diabetes in a population of forty-thousand people over a year then the estimated rate is the same as before but the error factor is smaller because we've observed two-hundred cases not fifty cases so the error factor using that formula is one-point-one-five and the upper and lower ninety-five per cent confidence limits are as given and we have er a confidence interval of now four-point-three to five-point-eight rather than three-point-eight to six-point-seven so we've got more data our error factor has got smaller and our confidence interval has got er more narrow so the confidence interval reflects our uncertainty about 
the true value of something so it's it could be an infi-, incidence a population prevalence or an average height or whatever but you should remember that it's not a value er not a range in which ninety-five per cent of the observations lie and you can illustrate that quite easily if we split up the data from a few slides back so if we have fifty cases in two-thousand people over five years if we consider the number of cases in each of those five years that's what's given in the table so in the first year we observe th-, er thirteen cases second year ten cases and so on er the confidence interval for that data was three-point-eight to six-point-seven but you can see that the incidence rate for er years three four and five where the rate is point-zero-zero-three point-zero-zero-seven and point-zero-zero-three-five is outside that range so sixty per cent of our observations are falling outside that range just an illustration another example looking at the heights of fifty students can calculate the observed mean height and the confidence interval er using the appropriate formula but ninety-five per cent of our students er fall between one-point-five-five metres and one-point-eight-five metres in height [sniff] er so we find the range in which ninety-five per cent of our students lie by inspection of our data in that case and you can see that the two ranges aren't the same one is called the reference range which is different from the confidence interval and it's important that you remember that another quick example if we're interested in a rate ratio so in the first population we observe D-one cases er in in P-one person years and in the second we observe D-two cases in P-two person years [sniff] er calculate the observed rate ratio easily and the error factor using the appropriate formula page one-twenty-five estimation versus hypothesis testing you should note that estimation is more informative than hypothesis testing er it can incorporate a hypothesis 
test quick drink so it's actually more useful to know something about the plausible size of a difference than knowing only that there is a difference our hypothesis test can tell us whether or not our data is consistent with there being a difference but it can't tell us how big that difference is so carry on with the rate ratio example if in population A we have twelve cases in two-thousand person years and population B we have sixteen cases in four-thousand person years then we can calculate the rates per thousand person years in the usual way and the ratio of A to B being one-point-five obviously in our hypothesis er if the rates are the same then the ratio will be one [sniff] and the observed ratio of the rates in our data example is one-point-five excuse me er we used the formula on page one-twenty-five to calculate the error factor note that includes the er observed events in both populations not just in the one and the error factor for that example er is is easily shown to be two-point-one-five so we can use the usual way to calculate the ninety-five per cent confidence interval for the rate ratio er and that gives us the range of point-seven-nought to three-point-two-three now that includes the er value of one which is that the rates are the same so from that we can conclude that the observed data we've based that confidence interval on are consistent with the null hypothesis that the rates are the same because the confidence interval includes the er null value of one we conclude that the data is er consistent with that hypothesis so we can't reject that hypothesis at the five per cent level note that it doesn't prove that the null hypothesis is true er and if you think about that for a while you'll note that the range that we've got there point-seven to three-point-two also includes the value of three so if we tested the hypothesis that the er rate ratio was three then we wouldn't reject that hypothesis either 
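[editorial sketch, not part of the spoken lecture] The rate ratio example just given can be checked numerically. The following is a minimal sketch in Python, assuming the page one-twenty-five formula for a rate ratio is exp(2 * sqrt(1/D1 + 1/D2)); the formula itself is not read out in the lecture, but this form reproduces the quoted error factor of two-point-one-five. The function names error_factor and rate_ratio_ci are illustrative only and do not come from the lecture notes.

```python
import math


def error_factor(*event_counts):
    """Assumed 95% error factor: exp(2 * sqrt(sum of 1/d over the event counts))."""
    return math.exp(2 * math.sqrt(sum(1 / d for d in event_counts)))


def rate_ratio_ci(d_a, py_a, d_b, py_b):
    """Return (rate ratio, error factor, lower limit, upper limit).

    d_a, d_b are the observed cases; py_a, py_b are the person years.
    The limits are the observed ratio divided and multiplied by the error factor.
    """
    ratio = (d_a / py_a) / (d_b / py_b)
    ef = error_factor(d_a, d_b)  # uses the events in both populations
    return ratio, ef, ratio / ef, ratio * ef


# Population A: 12 cases in 2000 person years; population B: 16 cases in 4000.
ratio, ef, lower, upper = rate_ratio_ci(12, 2000, 16, 4000)
print(f"rate ratio {ratio:.2f}, error factor {ef:.2f}, "
      f"95% CI {lower:.2f} to {upper:.2f}")

# The null value of one lies inside the interval, so the hypothesis of
# equal rates is not rejected at the five per cent level.
print("includes null value of one:", lower < 1 < upper)
```

With the unrounded error factor the upper limit comes out at about three-point-two-two; the three-point-two-three quoted in the lecture corresponds to rounding the error factor to two-point-one-five before multiplying.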
so that just shows you that it doesn't prove that the null hypothesis of er the rates being the same is true it merely says that the data is not inconsistent with it another example using the rate ratio this one is where the data are inconsistent with the null hypothesis the confidence interval calculated i'll not trawl through the er algebra again the confidence interval there is one-point-four to two-point-eight-six and that does include the null value of one so in this case we can reject the er sorry that doesn't include the null value of one namex students are looking a bit confused there [sniff] er it doesn't include the null value of one and so we can er reject the null hypothesis er today you'll have done a bit of inference on er standardized mortality ratios [sniff] again that just runs through the formula this is on page one-twenty-six of your lecture notes we observe O deaths we expect E deaths based on age-specific rates in the standard population and age-specific population sizes in the study population we can calculate our observed S-M-R and the error factor twice the square root of one over the observed exponentiated and er set up our confidence interval so suppose we put some data in we er we have er we expect fifty deaths in our study population and we observe sixty deaths then our observed S-M-R is one-twenty error factor one-point-two-nine and our ninety-five per cent confidence interval is ninety-three to one-fifty-five this includes a hundred er remember if the observed equals the expected then the S-M-R would be a hundred and so we wouldn't reject the null hypothesis note that it also includes values as high as fifty per cent excess deaths so it doesn't in-, it doesn't prove equality either again the same same argument as before so a summary quite an extensive summary for this lecture [sniff] er all obser-, observations are subject to random variation we've seen several examples of data which have fluctuated usually around some 
underlying tendency and we're always interested in the underlying tendency we can use the data that we observe to test hypotheses about underlying values which gives us an idea of whether data is consistent or not with er what we believe and we can also use our er observed data to estimate our underlying tendency that we're interested in in this course the best estimate of the true value er of underlying tendency is the observed value but we also want er an idea of how that varies er to take into account the the the nature of random variation just to give a single number for something would be rather naive and we express the uncertainty again in this course by calculating error factors and deriving confidence intervals and remember that the definition of a ninety-five per cent confidence interval is the range which includes the true value of the statistic of intre-, interest with probability of ninety-five per cent or point-nine-five if you like it that way you can also look at it as the range of true values which is consistent with the observed data so if er different values that are consistent with the observed data would lead us to different conclusions then you can only be uncertain what to conclude another summary slide we have two populations with incidence rates point-zero-zero-eight point-zero-zero-two our rate ratio is four error factor is two so our ninety-five per cent confidence interval is two to eight all the values in the ninety-five per cent confidence interval suggest that the rate in A is higher than the rate in B because it doesn't include the null value of one we can er safely conclude that A is higher than B so the rate ratio is significantly different from one at er the five per cent level a further example rate ratios again the rate ratio g-, er is er two the error factor is four and the ninety-five per cent confidence interval is point-five to eight the values in the ninety-five per cent confidence 
interval in this case are consistent with A being much higher than B A being lower than B or both the same in other words er there are values in that confidence interval which are greater than or less than one and one is also included we can't really be er too firm about our conclusions in that case the ninety-five per cent confisa-, confidence interval does include our null value er the value of one so we can say that the rate ratio is not significantly different from one in that case but it doesn't prove equality again we have values of up to eight in that confidence interval so the data is also consistent with with er hypotheses that extreme so things to remember er variation always exists people are different we should all know that without too much uncertainty and because of that variation our underlying dat-, er our observed data is different from our underlying tendency you need to have an appreciation of what sources of variation might be why why things differ and er be able to test hypotheses about true values and set up confidence intervals using the formula given in your lecture notes so that's it for today er i believe someone has an announcement to make i don't know where are you she's there do you need to make it to namex students as well sf0278: no nf0274: no okay this is just for namex students so there we are thanks very much