nf0273: Okay, you might notice a slight difference between this week and last week. That's because I am not namex; I'm namex, a lecturer in medical statistics, and I'm giving today's lecture and the next Sources of Variation lecture. First, a couple of things. I realize you've got a lecture at quarter past one, so I will be trying to keep to time. The other thing to say is that the guest lecture from namex is being swapped with Sources of Variation Three: namex was on the twenty-seventh and Sources of Variation Three was on the twentieth, so namex will now be on the twentieth and Sources of Variation Three on the twenty-seventh.

Okay, so: sources of variation. The informal objectives of this lecture are to enable you to distinguish between observed data and the underlying tendencies which give rise to observed data, and to understand the concepts of variation and randomness. You have some examples in your lecture notes on page one hundred. For example, we might observe the proportion of people with diabetes in a sample, and that would give us an idea of the underlying prevalence of diabetes in a particular population. Another example would be breast cancer survival: we might observe the proportion surviving among those treated with tamoxifen, whereas what we're actually interested in is the effect on survival of treating everybody who has breast cancer with tamoxifen. So that gives you a quick idea of the difference between observed data and the underlying tendencies which give rise to those data.

As for the second objective, understanding the concepts of variation and randomness: I would hope that most of us have a fairly good appreciation, without really thinking about it, that we're all different. The reason this is important to take into account is basically planning, predicting for the future, say for providing flu jabs. We can observe the number of cases of flu per year over the last five years, and we wouldn't be surprised to see that those numbers differed year on year; that wouldn't surprise us at all. We should all be fairly comfortable with the idea that the number of cases of flu depends on various factors in a very complex manner, and, simply because we're all different anyway, there will be a natural variation component in it.

So the formal objectives of this lecture are, first, that you should be able to distinguish between observed epidemiological quantities, such as incidence, prevalence and the incidence rate ratio, and their true or underlying values, and, second, that you should be able to discuss how observed epidemiological quantities depart from true values because of random variation. Unless we have large resources and can measure absolutely everybody in the population we're interested in, we'll only ever see an observed proportion of people with diabetes, say, and that may or may not be equal to the true prevalence of diabetes in our population. If we select our sample properly, though, it ought to give us a fairly good idea of the prevalence of diabetes in the population. That estimate will vary because of natural variation, so we want to be able to say something about how it will vary in reality; an idea of the scale of the variation will help us with that, and statistical theory will help us to get it. Objective three: we want to be
able to describe how observed values help us towards a knowledge of the true values, and there are two basic statistical ways of doing that, certainly in this module at least. The first is to test a hypothesis about a true value, which is what we'll be dealing with in this lecture, and the second is to calculate a range in which that true value probably lies. So today we'll just be talking about hypothesis tests.

Say we're interested, for some reason, in the probability of getting a head when we flip a coin. The obvious thing to do is a quick experiment: flip the coin ten times and see what happens. Suppose we observe seven heads and three tails. Informally, we could draw several conclusions from that observation, given our prior belief about the probability of the coin falling on heads. First of all, we might suspect that our data are wrong. It happens; censuses get miscounted for various reasons. Another possibility is artefact, which isn't very easy to illustrate with coin tossing, so I'll give you another example. If we look at how deaths from diabetes change with time, which I believe you were discussing a couple of lectures ago, one of the things that altered the number of deaths from diabetes over time was a change in the definition of diabetes; that had an effect on the conclusions drawn about the change in deaths, and that is generally known as artefact. Another conclusion we might draw is that it's just chance: the coin's fair, we were expecting five heads, we've seen seven, it's not all that surprising, so we put it down to chance and conclude that our coin is fair. On the other hand, if we're feeling particularly cynical, we might conclude that the coin is biased. It's difficult to tell: is seven that different from five, or not? We don't know.

So that provides a simple example of what we observe not being exactly what we expected. If we toss a coin ten times we expect five heads, given that it's fair, but we observe seven: the coin will tend to produce equal numbers of heads and tails, but we're not surprised when random variation means we observe something slightly different. Similarly, it's not surprising that people's health varies. On average there are four cases of meningitis per month in Leicester; some months we observe ten, other months we observe none, and nobody's terribly surprised about that. Again, smokers tend to be less healthy than non-smokers, but if we pick a small sample then, just down to chance, we might have ended up picking healthy smokers.

So, tendency versus observation. What we practically want to know is what is going to happen in the future: what are the underlying tendencies of health in our population? In providing flu jabs, for example, we want to plan to buy enough to vaccinate at least most people at risk in our population, so we need the underlying tendency of that population to be at risk of flu. We might take the number of cases of flu in previous years, and logically we might also use any other information we know to have a bearing on the number of cases of flu we observe: temperature would be an obvious one, and the underlying health of the general population another, though that's slightly more difficult to quantify. So we take what we've observed in the past, together with what we know to have a bearing on the probability of someone having flu, and try to use it to predict the future.
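To make that month-to-month variation concrete, the meningitis numbers above behave just as a simple chance model predicts. Here is a minimal sketch in Python, assuming, purely for illustration (the lecture doesn't specify a model), that monthly counts follow a Poisson distribution with a true underlying mean of four cases per month:

import numpy as np

# Illustrative assumption: monthly meningitis counts are Poisson draws
# with a true underlying mean of 4 cases per month.
rng = np.random.default_rng(seed=1)
monthly_cases = rng.poisson(lam=4, size=60)  # simulate five years of months

print(monthly_cases[:12])    # one simulated year; counts can swing from 0 to around 10
print(monthly_cases.mean())  # the observed average sits close to the true mean of 4

Months with no cases and months with ten or so cases both turn up from time to time even though the underlying tendency never changes, which is exactly the observed-versus-underlying distinction being drawn here.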
So, some further examples of how the underlying tendency relates to the observed data. If we're interested, for some bizarre artificial reason, in the proportion of red marbles in a bag of a thousand red and black ones, then we could count all thousand marbles and we would know exactly what the underlying proportion, our underlying tendency, was. But obviously we don't have all day, and we're not particularly interested in counting marbles, so we could instead take a sample at random and measure the proportion of reds in it. If we pick the sample sufficiently well and make it sufficiently large, we'll have a fairly good idea of the proportion of reds in the bag. Similarly, we can't ask everybody how they voted in the general election, but we ought to be intuitively confident that if we ask a thousand people how they voted, assuming they don't lie to us, we'll have a fairly good idea of the result of the election. And again, if we're interested in the total number of Leicester diabetic patients who have foot problems, then instead of asking all Leicester diabetic patients how their feet are, we would just take a random sample; we don't have infinite time and we don't have infinite money. So if we have an idea of the underlying tendency in a population, we can predict what we may reasonably observe, using probability theory.

A further example: working out the provision of neonatal intensive care cots. We know from past data that the true requirement in nineteen-ninety-two was about one cot per thousand live births per year, and we also know from the past that we see about twelve thousand live births per year, so on average we'll need about twelve neonatal intensive care cots. That's the true tendency: we've taken a lot of data, measured what we're interested in, and that's what we've ended up with. However, just knowing the average isn't enough; we need an idea of how it varies. The slide shows the requirement for neonatal intensive care cots in the past, and you can see that it varies quite a lot. This is what we've observed in the past, not what we're expecting in the future yet, and it has quite a large range: we've required between two and twenty-four cots, and most of the time we needed between about eight and sixteen. So if we'd provided twelve, if we'd just gone with the average and ignored the variation, then quite a lot of the time we'd have been up to about four cots short. We need an appreciation of the variation.

Slide eleven, a slight repeat on the neonatal intensive care cots. We often observed eight to sixteen cots being used; having done some mildly complex calculations using the data in that histogram, on one day per month we needed nineteen or more cots, on one per cent of days we needed twenty-one or more, and hardly ever did we need more than twenty-four. So logically we looked at that data and thought, right, let's provide nineteen cots; on average about twelve were occupied, so usually we had sixty-three per cent of those nineteen cots occupied. Now, that was taking the true distribution, which we'd observed over a certain period in the past, and using it to work out what we would expect to see.
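Those slide figures can be reproduced, roughly, from a simple model. The lecture doesn't say what distribution the histogram follows, but as a sketch, if we assume daily cot demand is approximately Poisson with a mean of twelve, the tail probabilities and the occupancy figure come out close to the ones quoted:

from scipy.stats import poisson

mean_demand = 12  # average daily cot demand, from the lecture

# P(19 or more cots needed): about 0.037, roughly one day a month
print(poisson.sf(18, mean_demand))
# P(21 or more cots needed): about 0.012, roughly one day in a hundred
print(poisson.sf(20, mean_demand))
# average occupancy if nineteen cots are provided: 12/19, about 63 per cent
print(mean_demand / 19)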
But in practice what we want to do is entirely the other way round: we want to observe something and make an inference about what we expect to see in the future. We want to reverse the direction of inference, from the observed distribution to the true tendency: given what we observe, what might we expect to happen in the future? Hypothesis tests allow us to do this in a formal way. We can take the observed data and make an objective statement about the true situation; we can describe how the observed values help us towards a knowledge of the true values by testing a hypothesis.

Formally, a hypothesis is a statement that an underlying tendency of scientific interest takes a particular quantitative value. We have to state our beliefs in a quantitative way in order to use quantitative methods, and on the slide are some examples of hypotheses we might test. First, we might say that the coin is fair, but to put that in a quantitative way we have to put a value on the probability of a head: if the coin is fair we'd expect to see heads about half the time, which is equivalent to saying that the probability of a head is a half. If we want to say that a new drug is no better than a standard treatment, we would compare the survival rates by calculating their ratio: if the new drug is no better, we'd expect the survival rates to be equal, and consequently the ratio to be one. And again, if we want to make a statement about the true prevalence of tuberculosis in a given population, we have to put a value on that; we may know from past observation that it is about two in ten thousand, and use that as the hypothesis to test. So we're stating our beliefs, which may well be informal, in a formal quantitative way in order to use quantitative methods.

So now we have our hypothesis; what can we do with it? Say we have the hypothesis that our success rate for aneurysm repair is eighty per cent, and we observe what happens to, say, six patients who have an aneurysm repaired. We need to use what we observe about those six patients to test the hypothesis that the success rate is eighty per cent. Informally, if only one of those six patients had survived, we could be reasonably confident of a difference from eighty per cent, because that's quite an extreme observation; if we observed four or five successes out of six, we would be unsure what to conclude, because four or five out of six is quite close to eighty per cent, and we're not totally sure. That gives us an informal idea, but we want a way of objectively distinguishing the instances where our observed data are only slightly different from what the null hypothesis predicts from the situations where our data are quite extreme under it, and hypothesis testing allows us to do that objectively. It allows us to compare consistently what we observe with what we think is happening.

So, formally, in a hypothesis test we calculate the probability of getting an observation as extreme as, or more extreme than, the one observed, if the stated hypothesis were true. We have our stated hypothesis in a quantitative form, so we can make probability statements under it, and use them to calculate the probability of our observed data.
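To put rough numbers on that informal intuition about the six patients, here is a small sketch using scipy's exact binomial test; the specific P-values are illustrative calculations rather than figures from the slides:

from scipy.stats import binomtest

# Null hypothesis: the true success rate of aneurysm repair is 0.80.

# One survivor out of six is very extreme under that hypothesis:
print(binomtest(k=1, n=6, p=0.8).pvalue)  # about 0.002, strong evidence against

# Four survivors out of six is quite compatible with it:
print(binomtest(k=4, n=6, p=0.8).pvalue)  # about 0.34, weak evidence against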
The idea is that if what we observe is very unlikely, that probability will be very small. So if the probability is very small, then either something very unlikely has occurred under the null hypothesis, or the hypothesis is wrong, and we conclude that the data are incompatible with our null hypothesis. That probability is called a P-value. Another way to remember the P-value, a slightly more medical interpretation, is to consider how likely it is that a patient has a blood pressure of one-forty over ninety and is healthy. Healthy patients don't generally have blood pressures that extreme, so either the patient is healthy and we've seen a highly unlikely extreme reading, or the patient is not healthy. That probability is a P-value.

So take an extreme example. We have the hypothesis that a coin is fair, we've tossed it ten times, and we've observed ten heads and zero tails. Under the hypothesis that the coin is fair, the probability of a head is point-five, a half, one in two. Assuming that, we can calculate the probability of getting ten heads, each with probability a half, and that comes to about point-zero-zero-two, roughly one in five hundred; exactly, it's two times one over one thousand and twenty-four, or one in five hundred and twelve. That's our P-value: the probability of observing ten heads given that the probability of a head is a half, the probability of observing the data given that the null hypothesis is true. Now that's really unlikely, about one in five hundred, so either we've got an outstanding chance result, or the hypothesis can be rejected. The data we've observed are inconsistent with the hypothesis we're testing, that the coin is fair, and therefore we've got strong evidence against that hypothesis: we've observed something very unlikely, so we conclude that the hypothesis we were testing is false, and we reject that null hypothesis. Prior beliefs are relevant here; they help us to set up the null hypothesis. In this example our prior belief was that the coin was fair, so we assumed that the probability of a head was a half and calculated the probability of our observed data in those circumstances. The last example on the slide, where ten patients are treated with new treatment X and all ten survive, is exactly the same as tossing a coin ten times: instead of tossing a coin, we wait and see whether each patient lives or dies, the same as head or tail, and if historically fifty per cent die, that's the same as expecting a head with probability point-five. That might help put it in context for you.

So: we've set up our null hypothesis, we've made probability statements about the observed data given that the null hypothesis is true, and we've got our P-value. If that P-value is less than or equal to point-nought-five, we reject our hypothesis, and we can say one of several things. We could say that the data are inconsistent with the hypothesis: we've assumed something is true, we've observed something which would be very unlikely if it were true, therefore what we're seeing is inconsistent with what we think. We could also say that we have substantive evidence against the hypothesis, that it is reasonable to reject the hypothesis, and that the result is statistically significant at the five per cent level.
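The ten-heads arithmetic is short enough to check directly; as a sketch in Python:

from scipy.stats import binomtest

# Probability of ten heads in ten flips of a fair coin: (1/2)**10 = 1/1024.
# Doubling for a two-sided test gives 2/1024 = 1/512, about 0.002.
p_two_sided = 2 * 0.5 ** 10
print(p_two_sided)  # 0.001953125, roughly one in five hundred

# scipy's exact binomial test agrees:
print(binomtest(k=10, n=10, p=0.5).pvalue)  # also 2/1024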
If the P-value is greater than point-nought-five, then we can't say any of the above. In particular, what we can't say is that the null hypothesis is false: absence of evidence against the null hypothesis isn't evidence of absence, so a large P-value doesn't give us grounds to conclude that the hypothesis is true either. For example, suppose our null hypothesis is that the mean surface temperature of the earth has increased by only one degree centigrade over the last fifty years, and suppose our observed data have a probability of point-one under it. That's greater than point-nought-five, so we cannot reject that null hypothesis; but that doesn't prove there is no global warming beyond one degree, it simply means that what we've observed is not inconsistent with the belief that the temperature of the earth has increased by one degree C in the last fifty years. Another example, which might be particularly illuminating on this absence-of-evidence-is-not-evidence-of-absence point, would be the U-S's stance on Iraqi weapons at the moment: they're trying to say that the absence of evidence of weapons is not evidence that there are no weapons. That may help to illuminate it for you.

Some further examples of hypothesis tests and P-values. First example: the incidence of disease X in Warwickshire is significantly lower than in the rest of the U-K, P equals nought-point-nought-one. This means we've tested the hypothesis that the incidence of disease in Warwickshire is equal to the incidence in the rest of the U-K, and what we've observed about the incidence in Warwickshire is very unlikely under that null hypothesis: if the two incidences were the same, what we've observed would have a probability of point-nought-one. That's very unlikely, less than point-nought-five, so we reject that null hypothesis, and we can say that the incidence of disease X in Warwickshire is significantly lower than in the rest of the U-K. Second example: the death rate from disease Y is significantly higher in Barnsley than in Leicester, with P equals point-nought-five. We've tested the null hypothesis that the two death rates are equal, observed something about both of them, and concluded that what we've observed is very unlikely if they really are the same; in this particular example, what we've observed has about a five per cent chance of occurring under the null hypothesis. I'll talk a little more later about how we choose a cut-off point for P-values. Third example: patients on the new drug did not live significantly longer than those on the standard drug. We've taken patients on the new drug and patients on the standard drug, tested the hypothesis that on average they live the same length of time, and calculated the probability under that hypothesis of the data we've observed to be about point-four; in other words, about forty per cent of the time we would observe data that extreme. That's not at all unlikely, so we've not rejected the null hypothesis in that case.

So, the hypothesis to be tested is often called the null hypothesis, and I'm glad we've got H-nought on the slides, because I occasionally call it that without really thinking. It's the quantitative statement about our prior beliefs. For example, if we suppose that the death rates from a disease on treatment A and treatment B are the same, then our null hypothesis would be that the ratio of the death rates is one; we would then observe data and calculate the probability of what we observed occurring. Another example of a null hypothesis would be
that the prevalence of a particular disease in Warwickshire is the same as in Leicestershire. Remember: P less than or equal to point-nought-five is substantial evidence against the hypothesis being tested, not proof that it is definitely false. It means what we've observed is unlikely given what we think, not that the hypothesis is untrue. By the same token, P greater than point-nought-five means the data are not inconsistent with the hypothesis: there isn't much evidence against the hypothesis being tested, but that doesn't make it definitely true; it means what we've observed is reasonably likely given what we believe.

As a further experiment, take flipping a coin ten times again, with our observed result being seven heads and three tails. Suppose our null hypothesis is that the coin is fair, so, as before, we make the probability statement that the probability of a head is point-five; what we're interested in is whether or not the coin is biased. What you're seeing on the slide is the probability of each possible outcome. The first column is the number of heads we may observe, from zero to ten, and the second column is the probability of that number of heads occurring under our null hypothesis. So if our coin is fair and our probability of a head is point-five, the probability of observing no heads is point-zero-zero-one, the probability of observing one head is point-zero-one-zero, and so on, right the way up to ten. Now, what we want to know is how likely we are to observe something as extreme as seven heads and three tails, and we get that by adding up the relevant tail probabilities and multiplying by two, because this is a two-sided test: we don't know whether the coin might be biased in favour of heads or in favour of tails. That gives us a P-value of point-three-four-four. So if the coin were fair, about thirty-four per cent of the time we would expect to see a result as extreme as seven heads and three tails. That's not particularly unlikely, certainly not five per cent unlikely, so we don't reject our null hypothesis that the coin is fair. And there we are: we flipped a coin ten times, observed seven heads and three tails, calculated the probability of what we observed, and found it reasonably consistent with our belief that the coin is unbiased. That's fairly weak evidence against the null hypothesis, because what we've seen is consistent with it. We don't have evidence that the coin is biased, but that doesn't prove that the coin is unbiased.
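The table on the slide, and the point-three-four-four figure, can be reconstructed from the binomial distribution; here is a sketch:

from scipy.stats import binom

n, p = 10, 0.5  # ten flips of a fair coin, under the null hypothesis

# the slide's table: the probability of each possible number of heads
for heads in range(n + 1):
    print(heads, round(binom.pmf(heads, n, p), 3))

# two-sided P-value for seven heads: P(7 or more), doubled because the
# coin could be biased in either direction
p_value = 2 * binom.sf(6, n, p)  # sf(6) is P(more than 6 heads)
print(p_value)  # 0.34375, the point-three-four-four on the slide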
Now, rejecting H-nought is not always much use. This is the point I said I'd come back to, about the point-nought-five business: we simply choose that as an arbitrary cut-off, and nothing amazing happens between point-zero-four-nine and point-zero-five-one. We're not all that interested in exact differences between point-nought-four-nine and point-nought-five-one; it's an arbitrary cut-off rule which we'll use, but how we read it depends on our situation. If we're testing a hypothesis that a treatment for the common cold is effective and we observe a P-value of point-nought-five-one, then, because this isn't a particularly groundbreaking thing to be testing, the fact that we've observed something only fairly unlikely probably means our cure for the common cold isn't all that effective, and we're not all that excited. However, if we're looking at a cure for AIDS and we observe a P-value of point-nought-five-one, then, because this is an important and expensive problem and we've observed a fairly unlikely result, we're really very interested: even though it's not a significant result, it's still an interesting one, and we would want to investigate further.

False positive results; it's a slightly strange slide, and I can't quite see the connection between rejecting H-nought and all the other points on it, but anyway. The P-value gives us a probability interpretation, of how unlikely what we observe is given what we believe, which is a simple interpretation we can talk about. It also has the nice property of being the probability of getting a false positive result: in other words, the P-value is the probability of rejecting the null hypothesis when it is in fact true, which is quite a handy interpretation. You should also note that significance depends on the sample size. If we flipped a coin only three times, the minimum P-value we could possibly observe would be a quarter, point-two-five, which means we're never going to observe a significant result in that test of whether or not the coin is biased; what we'd obviously need there is a larger sample size.

The last point to note is that a statistically significant result is not necessarily a clinically important one. Again, this depends on the context of the problem we're dealing with. One I consulted on recently was about A and E admissions: although the reduction in admissions was really quite small, it was actually very interesting, because even a tiny one per cent reduction in the A and E admission rate translated into quite a large money saving, so we were very interested in a very small difference. In other situations we might only be interested, for practical purposes, in a fairly large change in, say, diabetes prevalence; it's rather down to context. Another example would be abdominal aortic aneurysm repair. If the repair has quite a low death rate to start with and we want to reduce it, for economic or other reasons, then we can only reduce it by a small absolute amount, simply because of where we're starting from: a five per cent death rate can only fall by a maximum of five percentage points, which in other contexts might be quite a small reduction. So statistically significant does not necessarily mean clinically important; it largely depends on the context of the problem at the time. Nevertheless, P-values are used a lot. Most of the people who have me consulting at the Walsgrave Hospital, sorry, the hospital formerly known as Walsgrave, get very excited when they see P-values in papers; most people are very interested in seeing significant results.
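That three-flips point generalizes neatly. As a sketch, the smallest two-sided P-value a fair-coin test of n flips can ever produce is two times a half to the power n, achieved when every flip lands the same way:

# smallest achievable two-sided P-value for n flips of a fair coin
for n in (3, 5, 10, 20):
    print(n, 2 * 0.5 ** n)

# n=3  gives 0.25,    so three flips can never reach the 0.05 threshold
# n=5  gives 0.0625,  still never significant
# n=10 gives 0.00195, so significance is at least possible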
But a significant result in a hypothesis test does not necessarily translate into something clinically useful or interesting. So, to sum up: hypothesis tests allow us to describe how our observed values help us towards a knowledge of the true values, by testing the probability of what we observe given what we believe, and in the next lecture we'll look at how to calculate a range in which the true value probably lies. The key points from this lecture are these. Variation exists; people differ; we should all have a fairly good appreciation that such is life. Because of that natural variation, our observed data are often different from the underlying tendency: the observed proportion of people with diabetes in a general practice is often different from the prevalence of diabetes in the area that the practice covers, just because of natural variation. There are various sources of variation: natural variation is the most obvious one to think about, but our estimate of the proportion of people with diabetes in our general practice will also depend on how we choose our sample, which is another source of variation. And we may test hypotheses about a true value, such as the prevalence of diabetes in our practice population, by calculating the probability of what we observe given what we believe. After next week's lecture you'll be able to see how confidence intervals are calculated; those give us an idea of where the true value may lie, with a specified probability. And that's it for today, so you'll be pleased to have a slightly longer break than usual.