nm0211: right as i said to namex i'm here just for an hour to start this morning to give you some background on experiences that we've had over the last five or six years in the discovery of a particular set of protein families a a new protein superfamily so-called in plants you'll hear and you've heard previously from other people some of the fundamental studies that underlie protein structure and function i thought it would be useful to give you an example a worked example of how a particular practical project with which i was associated er in my former life 'cause i came from a commercial company however that practical project led on over the last few years into a discovery of a completely unexpected set of related proteins and it shows you how the functional activity of a protein can be preserved throughout evolution and something that's can now be uncovered by this combination of bioinformatic and protein structure study relationships some of you might know a bit of the background but i think you'll find it hopefully an interesting story if you want to stop at any time this is an interactive [laugh] discussion with a group as small as this so please if you don't understand any of the background just er make yourself heard so this is where it started and this is the the one real plant er that i'll show you and this is the the commercial and agricultural basis of this whole academic project you might not recognize the plant but it's an oilseed rape plant and the practical project was represented by this area here which is a completely diseased section of an oilseed rape plant and it's a section of a stem that's been attacked by a particular virulent fungus that's destroyed the stem between these two points and it's gone from green to white and the leaf that's attached to the stem has been destroyed by that process and the fungus itself is called Sclerotinia er there's no kind of real English name for it but that's what it's known as and if you're a farmer and 
you have this i think we've got [laugh] somebody else okay if you're a farmer and you you have you have this type of fungus in your crop then it's bad news it attacks a whole range of agriculturally important crops particularly oilseed so er oilseed rape sunflower soya beans it's very difficult to control with fungicides the breeding materials that are available are very limited so although plant breeders have tried to introduce resistance they've er really rather failed over the last ten to twenty years the biochemistry of the disease is what er is where my interest lies and the biochemistry is represented here this is the same disease fungus growing on a petri dish in the lab and you can see that the colour of this is very different the outside is purple the inside orange and the fungus is growing across the surface and the reason that there's a colour change is that the P-H is changing the acidity of that petri dish is changing as the fungus grows and it's going from neutral P-H represented by the purple to bright orange which means that the P-H has fallen so acid is being secreted by the fungus and it's the acid that's the major key feature of this fungus and the reason that these sorts of fungi are so successful is that they secrete a lot of acid [laugh] and do we have somebody else er and the reason they secrete acid and the acid that they secrete is the basis of this particular story for the next forty-five minutes and this er this is the acid and this is how it works the acid is oxalic acid and do any of you know of any other plants maybe that have oxalic acid in it in them is it an acid that you know anything about in a biological context no okay well the m-, the most famous source of oxalic acid are green vegetables and things like rhubarb so this is the method of action of why oxalic acid is such a powerful toxin why the fungi that secrete this acid are so successful that they primarily act by chelating the calcium so the calcium binds to the acid and it 
removes the calcium from the cell walls in the plant and cell walls of a plant are maintained in structure and the reason that they're me-, structure is maintained is through these calcium containing compounds called pectins and pectic acid it's what you put in jam to make it set [laugh] it's a thickening agent that comes from plant wall cell walls once you get the calcium out of the plant cell walls particularly out of these specialized cell types then air can come into the plant 'cause of the rigidity of a plant is maintained by the water inside it once you let air into the vessels in the plant then the plant will start to wilt which is what happens next once the acid re-, goes into the plant it reduces the P-H all the other enzymes then that are present in the f-, that are secreted by the fungi are activated at low P-H and so the plant then starts to rot and once it's started to rot [laughter] that's the end of the plant okay so that's how the oxalic acid works and the question was what could you do about overcoming the the problem so about eight years ago biochemists and the genetics people in what was then Zeneca Plant Sciences at Jealott's Hill were looking for biochemical approaches to reduce the level of acid secreted by the fungus and therefore perhaps to protect the fungus pla-, plant from the fung-, the fungus and the obvious thing to do was to say could we break down the acid secreted by the fungus and there are two enzymes that are known to to break down oxalic acid and these are oxalate oxidase which converts oxalate to carbon dioxide and hydrogen peroxide and oxalate decarboxylase that converts oxalate to formate and carbon dioxide so those are the two main pathways for breaking down oxalic acid so the immediate question was as a strategic question could we identify isolate and then introduce into a plant one or more of these two enzymes and therefore allow the plant to protect itself from the fungus by degrading the acid it's a kind of simple idea 
simple question and the immediate answer was well we had to make a choice and the first choice was first enzyme which is a plant enzyme the second enzyme here is a fungal one so a lot more was known about this enzyme and over a period of a year or two that enzyme was er the gene that encodes that enzyme was isolated was put in back into a plant through genetic manipulation methods and we produced many transgenic genetically modified plants that now had this enzyme and expressed this enzyme its origin originally was from cereals so we were taking a an enzyme that was found in wheat and barley and introducing it into oilseed rape and what you've then got and this is the end of the biology then and go on to the chemistry what you've then got was this these were two leaves of oilseed rape plants the one on the left is the traditional variety infected by er a small sample of the fungus and you can see it's extended and produced this so-called lesion this rotten area in the middle of the leaf and the acid is spreading out to the edge where it's come becoming crinkled this leaf is a leaf from a transgenic or genetically modified variety where the fungus is no longer growing and this whole idea has been taken and used in over the last five years by companies in North America in particular and this isn't a G-M discussion today but there's a good chance that er a sunflower variety containing this gene will be commercialized over the next few years and will provide very good protection for the first time ever against this very er devastating pathogen of plants so that's the background and now we get to the chemistry and the chemistry says okay what is this enzyme er that was where the practical commercial work er was going from an academic perspective what is it about this enzyme that made it worth studying in its own right and these are i-, the characteristics that made it so interesting and unusual it was an enzyme in fact that had been isolated previously and had been 
given the name germin about twenty years ago and it was isolated from barley embryos and wheat embryos at that time the protein was isolated but it wasn't known to be this enzyme and we were therefore left with a strange situation of a research group in Canada who'd worked on a protein that they'd given the name germin to and they'd found that it had these characteristics but they had no idea about its function they'd worked on it for a long period of time because it was pepsin resistant most enzymes being proteins are broken down by protein degrading enzymes themselves so this particular germin protein was almost completely resistant to t-, being broken down by normal er proteases it was also resistant to hydrogen peroxide which again is unusual for a protein usually these two treatments would completely denature a protein and make it unfold it was a glycosylated protein in other words it had sugars attached to it which is quite usual in plants it was a multimeric one in other words it had lots of subunits and it was considered to be important because there was a lot of it [laugh] and biologists think well if there's a lot of it it must be important it's a kind of crude analysis of its significance and so the group in Toronto had worked for ten years or more on this germin they had the sequence of it but they'd no way of finding the function we were working on the barley equivalent of this on the idea that it was an enzyme and then we went back to the gene they had the gene and the protein but no function we put the two things together and we were able to tell the group in Canada what they'd been doing for the last ten years was working on an oxalate oxidase and that was quite surprising to them because up to that point nobody had ever isolated an oxalate oxidase from plants er a-, and isolated the gene from a this enzyme if you look then and using the the er bioinformatics techniques at the structure of this protein which is what was done very simply it then 
became quite clear that this protein wasn't a unique protein but in fact had lots of quite close relatives and this is the analysis as it was about five years ago and at that time in all the databases in the world wherever you looked you could find a total of ten sequences this is the Prodom er protein the main database that showed this pattern of ten proteins but the eight at the top were from plants the two at the bottom were from slime moulds the f-, different colours here represent areas of conservation the different colours at the ends are where they're different so we have this family of ten proteins where there was quite a lot of conservation in the middle of them the the blue area some of them had diff-, had the same N-terminus which is labelled here but the they were different at the C-terminus at this end of the protein so the cereal proteins were different from the the brassica and dicot proteins the slime moulds two on the bottom and the slime mould is one of these er eukaryotes that not as m-, much is known about biochemically but it was known that when this slime mould became desiccated became starved it produces very small spores and these spores contained a lot of this protein again there was no particular function assigned to it but it was known to be somehow related to desiccation so there are series of clues here being built up during this story that say we've got a very resistant heat stable pepsin resistant protease resistant hydrogen peroxide resistant protein found in cereals it seems to have some similarity to a desiccation tolerant starvation induced er protein from a very primitive eukaryote a pri-, very primitive er mould in this case a slime mould so that was interesting because it said there's a a protein connection between a slime mould a barley plant a-, and a wheat embryo so that's a an interesting position and about that time a group in Germany as well as ourselves started to think more about the evolutionary significance of this 
and they also found by repeated database searches that this family of proteins which started to become enlarged as time went on from ten to twenty and er i'll tell you what the latest version is at the end of this story they were given the name originally germin from the germination er association of the of the wheat gene these er related proteins were then known as germin-like proteins more and more detailed sequence similarity searches showed that there were particular amino acids within these germins and germin-like proteins that were also found in yet another much more distantly related group of proteins that er are particularly interesting to to plant scientists and those are the proteins found in s-, in seeds in storage proteins proteins that are part of all our diets because we indirectly or directly we rely upon eating seeds as a form of er pro-, er vegetable protein at least and if you look at vegetable proteins in seeds particularly the so-called legumin vicilin proteins then you'll see you can identify particular amino acids that are also found in this group of proteins that i've just described and this was a simple attempt to look at the evolutionary significance of these families during time and this was put together by a group in Germany again around this period of five to six years ago and they started with a hypothesis that said if you believe in evolution then at the beginning of time there sh-, should be some so-called ancestral protein from which all these other proteins er were produced during evolution we're i-, in a position that said some of them evolved into these slime mould proteins that are known as spherulins some of them involved evolved rather into what we now know as the germin types of protein some of them evolved into they're described here as C-globulins but another more important group and these are the ones that found in seeds now went through a duplication event at some stage to form these legumin vicilin er precursor proteins 
because what i haven't said is that the storage proteins are twice as big in length as the germins so at some time they seem to have doubled in size so they have two halves and within each half you can see er the the letters here represent the conserved amino acid residues during this whole period of evolution from the primitive slime mould up to the the higher plant just a few amino acids particularly glycines prolines glycines we've seen phenylalanines particularly those amino acids were conserved in the same place all the way through this process so although you look at these proteins and you don't see very much perhaps superficial significance in similarity terms they do have these conserved residues strictly conserved at particular points so that was an outline a sort of sketch of what the evolutionary pattern might have been and why was it possible to do that was because this three-dimensional structure of the storage proteins was already known and i'll say why that's important in a second the diagrammatic version of that is summarized on here so again we start with something where there's an ancestor you go through duplications you then evolve into a whole series of different families and this covers a whole a large period of time and it covers a lot of different potential functions so that's the the outline kind of cartoon of history that er i want to build from okay i said that the three-dimensional structure of of the storage proteins was known and this is what you get if you look at the alignment of different storage proteins and i've marked on here just a a couple of the globally conserved amino acids these are two storage proteins the first from soya bean the one on the top is this protein called glycinin and the second protein is a storage protein from beans and the w-, the pro-, the amino acids i've marked in red and blue are two of these globally conserved ones so wherever you find a storage protein from whatever species you'll always find a proline 
at this position and a glycine here and lots of variations in between but there'll always be conserved residues another glycine here so these boxes represent these few key residues but the question is w-, what function do those few key residues have in determining the structure of a protein because linear sequences of proteins are usually they're useful but they're rather meaningless when you're trying to relate protein sequence to function function depends upon the shape of the protein and the shape of the protein is determined by how it folds up how this linear sequence becomes a three-dimensional structure and if you look at the three-dimensional structure of a storage protein and this is a simple version of it this is f-, a sort of diagram of it being folded er it's obviously two-D to f-, to flatten out here but it's meant it's composed of a series of strands you probably know about er the components of proteins that are made up out of either bet-, so-called beta strands or or helices so there's curved helix er components within a protein or there are there are strands which are just short stretches and these two marked residues that were on the previous er overhead the blue and red ones are in key areas that determine the three-dimensional structure of the protein so the glycine is here where the protein bends and the proline is at a key area here that forms an interaction between this part of the protein and this part so the moral to remember all the time is that just occasional amino acids at key points in a protein can determine the three-dimensional structure it doesn't matter too much in many cases what's between this corner and this corner as long as in three-dimensional space you can join up one corner to another corner over a certain distance so when you're looking at proteins and the evolution of proteins you th-, have to think all the time about okay linear sequences are fine but it's three-dimensional shapes which are the real key to biology er and the 
it's the chemistry and the structure behind that that i'll go on to to mention so that's a just a a representation of these two amino acids and the the key function that they play in determining structure so the question o-, or the next question was if we looked at a similar alignment of the germins and the germin-like proteins what could we learn by first of all looking at a in-, initial alignment of the different germin family and also could we relate the germin structural sorry could we relate the storage protein structural information to possibly predicting what shape of protein the germin-like proteins might have so the next stage was to do a simple alignment of the germin proteins er you don't need to read it just kind of look at the colours if you can there and the colours as usual are colour coded according to the type of amino acid so the the yellows are the er the sulphur containing amino acids prolines are the green and the dark blues are the basic amino acids and you'll see if you go along this sequence there are areas where there are quite good stretches of similarity between all these families there are certain regions there where there's more conservation than others and one area where there are a series of successive conserved amino acids is is here and particularly of interest to us because again if you're looking at amino acids you have to think okay some amino acids have an importance in structure some amino acids have an importance in say enzyme activity in things like binding of metals inside a protein and all our particular attention was er was drawn to these particularly to these two areas of here where there are dark blue lines that you see are conserved histidine residues so there were two there and there were also if you went on through the protein and this is towards the the end towards the C-terminus of the protein you could see that there was another histidine residue here and the reason why we should always pay attention to things like 
histidines is they're well known for being involved in the active site of enzymes so enzymes have a structure but the structure is really only there to form a scaffold in effect and the activity the chemistry in the protein in the enzyme takes place usually in the middle of it where the active site of the enzyme is so you have this rigid structure which provides a shape but the chemistry that goes on in an enzyme takes place in the active site in the usually in the centre where the chemical reactions er occur and many chemical reactions particularly if you remember this is an oxidase enzyme an oxalate oxidase many oxidases require metals for their activity so there's a they have a metal cofactor so how do metals stick on to on to the insides of proteins they have to be held in position and they have to be held in position through particular amino acids that have a sor-, a very rigid geometry so the distance between specific amino acids will hold particular metals and each metal requires different sorts of binding amino acids can anybody tell me what kinds of metals there might be inside oxidase enzymes any guesses
sm0212: iron
nm0211: yep
su0213:
nm0211: yep er two good ones
sm0214: zinc
sm0215: magnesium
nm0211: sorry
sm0214: zinc
nm0211: yep that's three of the major ones pretty good
sm0215: magnesium
nm0211: er not quite magnesium something a bit like [laughter] you start with the s-, first three letters of manganese is magnesium is not usually found in in oxidases certainly iron zinc particularly iron and copper i suppose are the two most common zinc not quite so much manganese is often found as an alternative to er to iron so just think about those those four components things like er amine oxidases are copper oxidases in each case though whether it's iron manganese or copper they're always held in position by histidine amino acids so you get this thing called a histidine cluster and you should always look very carefully if you start to see conservation of 
histidines in a protein alignment because that's circumstantial evidence for there being a metal binding site and that was our prediction on the basis of this er initial survey so were we dealing with a copper containing iron containing or manganese containing enzyme right so how could we answer that without doing any real er difficult biochemistry and the power now of of computing facilities in structural biology is so great that you can do a lot of work without actually going into the lab any longer er many labs are being depopulated because [laugh] computers are taking over and it's a lot easier to run the computer and use the modelling programmes that exist to answer some of these questions rather than saying i want to know whether these three histidines possibly might form a binding site i've got to purify the protein i've got to add metals i've got to do very complex er analytical work and then i might be wrong or i can't get the pure protein pure enough but what you can do in a few days is to say well let's try and produce a model of the protein i mean the best way to produce a model of a protein is if you have an existing structure to work from and the great benefit for us is that we had the structure of the storage protein we had the idea that the storage protein structure was probably related to the germin structure so we take the crystal and the three-dimensional coordinates of the pr-, of the storage protein and we would try and fit the germin sequence on to that backbone and see what we got and this is what we did er couple of years ago and this is a summary of the conclusion we used as a model an average between two existing structures and these were the two storage proteins in this case canavalin and phaseolin so we took from the databases three-dimensional structure of the and coordinates of those those proteins if you remember i said that they had two halves to them because they were twice as big as germin so we treated them as independent halves 
an N-terminal half an N-terminal domain and a C-terminal domain and the red and orange one represents one protein and the other so that's just to show if you look sideways at these proteins that they have this beta barrel shape and er and we used the average of those shapes to fit the germin sequence on to and you can this is automatic you do the alignment and the computer will come out and it will try and fold the protein according to the coordinates that are it knows exist here and if you do that you get depending on the quality of the alignment you get a prediction then of the three-dimensional structure of your favourite protein so you don't have to crystallize it you can use the model and the model immediately told us something very interesting this is the crude model here that shows it is very similar to these two but it also told us that if we look in the inside of the barrel of this model and we ask where are those three histidines that we th-, saw in the alignment they were quite a f-, there were two close together and there was one that was quite a long way apart but once you fold up the protein you find that those three and they're represented here by green those three histidines fold together so they're very close to each other the they're adjacent amino acids then so in folding the protein we've brought the third histidine close to the first two which confirms now that you have three histidines together and it's further strong evidence that that is a histidine cluster so-called and that that could act as the site for binding the metal inside the protein and all of that can be done now with these very powerful modelling programmes that i'm sure if you haven't heard about them namex will be showing you later so that was the the computer based research and er just spend a couple of minutes now with a sort of interlude before i go on to has that predicted research really been proven by what's happened recently and i'll go back a bit to the question that i 
was interested in personally which was if we could imagine that there is an evolutionary sequence of these proteins that started somewhere with a ancestral a hypothetical ancestral protein that then evolved through a slime mould protein to things in lower plants and then eventually to seed proteins logic would say well all of those must have had some very ancient ancestry somewhere can can we identify what the oldest surviving member of this protein family is and most people would say well would you go back to the early plants or you go back to the early fungi or f-, from which plants are very distantly related but i wanted to push the boundary in time back a bit further and so i started to search for bacterial and primitive archaeal which is a er er a related form of primitive bacteria could you find these sorts of proteins er even further back in evolutionary time because after all the proteins that are found in plants and animals now didn't kind of arrive from outer space [laugh] unless you are er are a particularly strange religion they came from some a-, existing protein structures so plants and animals didn't evolve a whole set of new protein structures they took existing ones from more primitive life forms and they amalgamated them they cut and they pasted them and they used them for different things but they didn't in many cases they didn't really invent new three-dimensional structures and if nothing else that's a sort of take home message that every protein in you or i or a vegetable [laugh] really is made up out of quite a small number of three-dimensional structures there's probably you know you have about a hundred-thousand genes er plants have maybe forty or fifty-thousand genes but er each of which encodes a different protein but there aren't a hundred-thousand different proteins in terms of their structures in a in you and there aren't forty-thousand different proteins in terms of structure in a in a plant there are probably five-hundred to a 
thousand and all the other variation is just minor sort of tinkering with the edges or duplications or taking a bit out of a s-, existing structure there's a really a v-, quite a small number of those underlying structures and now as more sort of organisms are being sequenced it becomes clear that in a f-, probably five years we'll know what all those structures are you'd be able to go and say there's you know there's five-hundred structures and they make make life whether it's bacterial or a human and everything else is just rearrangements of those existing it might be less than five-hundred eventually and so the conclusion must be that you will find in bacteria the underlying three-dimensional components of all other proteins that have been produced during evolution and that's in fact what we did do we went back and we looked in all the databases now there are fifty or sixty bacterial species where the complete gene sequence is known therefore you know all the genes therefore if you predict every protein sequence so you know that in E-coli or or B-subtilis the two best known bacteria you can now predict the at least the primary sequence of every protein and people are now trying to model and predict the three-dimensional structure of every protein in an organism and in the future the idea will be that you will take a different cell from an organism and be able to say a skin cell of a human has this set of proteins and we know the structure of all of them so that that's not far-fetched people are doing that now so what we did was go back and say can we find these ancestors of these storage proteins of these germins in bacteria and as you might expect the similarities become more and more limited as the further y-, back in time you go so you have to look for key conserved amino acids and we knew from this analysis that clearly the conserved er histidines in the centre of the protein were some of the most functionally interesting of the amino acids because they're 
the ones that determine potentially the binding site to the metals and potentially the the enzyme activity of the protein and this is a initially just a brief er outline of that there's lots of letters on here which are just sequences but we c-, we categorized this and attempted to categorize it to make the analysis easier and we divided the conserved areas up into two groups we said that there was a conserved er motif here with these two conserved histidines which are the grey boxes and there was a conserved motif here where the histidine was conserved all the way down there all the proteins at the top of this list are from bacteria and what we also have in here is a thing that i haven't mentioned which is a space between this motif and this one so in other words the two motifs were at different distances apart in the plant proteins the germins there are about twenty or twenty-five amino acids from the end of this motif to the beginning of this one in the storage proteins that can vary as well which are the book bit on the bottom but in these primitive bacteria that distance was was less so again we have another kind of quantitative way of looking at a protein evolution that we have these two conserved motifs that both had histidines in that we knew when they folded up came together that in plants were a certain distance apart in the linear sequence but if you look at the bacteria they were closer together so during evolution certain things had happened the two motifs had in effect moved apart in sequence the protein size itself had also changed because the bacterial proteins were only about a hundred amino acids in length in total whereas the plant ones were twice that length and and the protein some storage proteins were double that length so we had a a model now that said in ancient bacteria we had a s-, fairly small protein with these conserved amino acids in it as it moved from a bacteria to a fungus to a plant an animal and this is a representation of that 
one important thing happened and again this is just a look at the shape rather than the detail the two motifs are the blocks where the yellow residues are these two motifs had moved apart and this is represented by the kind of tower in the middle the bacteria at the front at the top here plants and animals towards the bottom these residues are ones which had been inserted into the middle of a protein during the billions of years of evolution there were also residues stuck on at each end that i haven't shown at this end and that end but because we knew the significance of the two motifs and the histidine residues we could trace them but during evolutionary time proteins had become more and more complex they'd had extra residues inserted into them and they'd had extra residues stuck on either end and then eventually the whole protein had doubled and become a storage protein so we're getting kind of to the end of the the story now but er i just want to show you er this which was the next attempt at at our model of what the the germin the oxalate oxidase might look like in real life i've told you what the computer said it would look like [laugh] i've told you the prediction of how it might have changed during evolutionary time but what did it really look like and i go back to the comment that said this protein was a multimeric protein it had different subunits in it for many years for about ten years the biochemists in Canada had said we think that the germin protein is made out of five subunits because when we separate them which you can do we get kind of five and we lo-, if we measure the molecular weight we get something that says the molecular weight of the total protein's five times the weight of the subunit and that was what the computer said would be the model of five subunits stuck together we became a bit doubtful about whether that was valid because we already knew from the storage protein structures that s-, storage proteins in seeds are composed of three 
subunits and we've if you remember that we said that each subunit is about twice as big as the germin subunit so common sense [laugh] if you believe in biology having common sense would s-, argue that if we know that there's a structure of three subunits and each one is twice the size of the one we're interested in it's kind of obvious that say well wouldn't it make more sense to have six subunits of similar size that would then give an equivalent shape if evolution had conserved shapes and i've argued that evolution does conserve protein shapes and the computer model then said if we had in fact six subunits in our shape we would have then have something that we'd described as a trimer of dimers because you have this triangular shape so there's two here two here and two here so it's not there isn't a sixfold axis of symmetry there's a threefold axis of symmetry so and that shape would look very very similar to what we have in a seed in a storage protein but here we'd have six bits rather than than three double-sized bits and that's the kind of simple maths of it so those were our two working hypotheses and the Canadian group said oh sniff we've spent ten years and we've said it's a pentamer 'cause if you measure the weight then that tells you it's a pentamer and er what i'll now do is show you how we resolved that and we did it through conventional crystallography we had a student who's just finished s-, his PhD successfully who crystallized the germin-like protein the the oxalate oxidase from barley he purified it and purified it and eventually got a a a source of protein that was sufficiently pure to to produce crystals from it and that was a lot harder in this case than than in most cases and i won't go into the biochemistry but eventually he found us a crystal that was good enough to be able to resolve in the in the X-ray beams that you use for this sort of thing and this is it's not published yet so not many people in the world have ever seen it before
but this is his resolution of that now the definitive three-dimensional structure of oxalate oxidase from barley and if we get it the right way round well if there is a right way round you see there are six colours so we've confirmed absolutely that it is a hexamer it's made of six subunits there are some other very unusual or sort of key features about this that help to explain its er its biological and its chemical properties one is that if you just take for example there's three corners here we've got one two and three these are corners where this subunit the light blue one interacts with the dark blue one and they are held together by very tight linking of the of the helical er ends of the each subunit so they're called sort of a-, alpha helical clasps they join together very tightly so first of all it's held very tightly at three corners it's also stuck together in effect by the centre of the protein this is the beginning of it this is the N-terminus of it the C-terminus is the bit down here the N-terminus is held together in this case the dark blue subunit is attached to this er magenta coloured one and it's held together by very strong bonds between the the amino acids in the centre here so you have these subunits at each corner which are holding each other together tightly you also have the other alternative groupings of this subunit this one and this one and this one or this one are holding each other together so you have a a intensively strong relationship that holds these different units together and that's characteristic of the fact that this is very thermally stable it withstands eighty degrees and it still survives so it takes a lot of energy to break those there's the links between it also it has the highest amount of hidden surface if you can imagine this is just on a flat surface but you can look at this in three dimensions but a lot of the the surface of each monomer as you join it to the next monomer is not therefore exposed to the outside
world to the solvents around the protein any longer so as you stick things together if you can hide the surface by sticking them together you reduce the exposure of the whole protein to the solvents around it and this has more than half of the area of each subunit hidden by the association and that's er i don't know whether it's the world record but it's close to the world record of proteins of of hiding surface by by assembling into a er into a larger er order protein and because it doesn't have very many surface loops on the surface these are the the strands that joi-, sorry the loops that join the strands together because these are not very many or large and where they are large they're hidden in the middle means that if you want to dissolve this protein with a protease you don't have many sites for the proteases to attack so in other words it's a it becomes resistant to degradation by protein attack so it can withstand all sorts of chemical thermal and and other physical breakdown because it's such a tightly conserved and that helps you understand why it's evolved so successfully during throughout er time that in a seed what you want in the proteins in seeds seeds are the dried up part of the plant they have to withstand dehydration desiccation they have to withstand high temperatures and so the functional characteristics of this whole protein superfamily have these different characteristics that it started off in a primitive bacteria as quite a small protein but it had this probably the same thermally stable structure and during evolution where you'd need a desiccation tolerant thermally stable protein structure it's a lot easier to use one that exists in that organism rather than to invent one and it's a bit teleological but er plants in seeds have taken this desiccation tolerant protein and multiplied it up enormously and they've used it for a different purpose what i'm ju-, going to say now explains two other bits of the biology and that is to take in in
effect a third of that structure that you saw before and i'm going to compare it exactly with a one unit of a storage protein okay so if you can imagine the top here is oxalate oxidase but it's it's a third of the hexamer it's two subunits and we're comparing it directly with one subunit from the storage protein and you can see and you can superimpose this on this and they're almost indistinguishable so although in primary sequence if you match this to this you'd have less than twenty per cent similarity we know that the conservation is er important areas we we've got the helices here in the same place so we've got absolute now structural confirmation that our hypothesis that storage proteins were related to this is confirmed by real measurement in space and the two other bits that i haven't mentioned are if you can see the green blobs in the middle here that is our metal there's one metal in each subunit this is the the metal that's held together by the histidine residues so manganese that's why i was k-, [laugh] keen on mentioning manganese at the beginning so manganese containing oxidase it's a unique manganese containing oxidase 'cause there'd never been any described like it before storage proteins have one histidine i didn't stress that but you if you'd counted the number you might have seen that storage proteins have one histidine they have no metal so they've lost the two other histidines during evolution they've preserved the structure they've lost the metal they're no longer an enzyme so they don't have a f-, a chemical function they act as a store of amino acids in a seed so what you eat your diet is made up out of in effect deactivated enzymes that have gone through evolution by maintaining a structure that can withstand heat and temperature but it's lost its enzyme activity by losing the histidines that bind the metal the other point i said was if you imagine the two motifs i said they moved apart in evolution they did move apart and that's represented 
by this loop here the loop here is the distance between the conserved areas and this loop can really be quite large without disturbing at all the structure of the protein so this loop and some of the other loops have changed in size but they haven't altered the structure and as an aside if you're interested in in food studies at all then you know that some storage proteins in seeds are powerful allergens and the best known of those is the peanut allergen if you if any of you are allergic to nuts very dangerous for some people part of the reason for that is that the peanut allergen has a very large loop in this position and that the allergenic amino acids are in these loopy areas so during evolution some subset of storage proteins have become allergenic by putting in unfortunately for humans [laugh] rather unpleasant amino acids here that can be toxic to people but now we understand the structure there are er G-M people who are modifying peanut proteins to remove those loops and therefore remove the allergic potential of peanuts so the summary now says that er which of these shall i show you is that something like this happened in time er this is a f-, brief phylogeny then of the whole story but you had archaeal species you had bacterial species green bacteria fungi plants ferns the it doesn't have animals on here animals also have proteins that are related to seed storage proteins nobody knows what they do yet [laugh] but if you look in a in a human or in a nematode worm they have a s-, protein sequence that's quite like the storage proteins we haven't got a clue what it does in an animal [laugh] 'cause er we we suspect it's er something to do with with desiccation tolerance but we don't know yet the other thing is that at certain times in evolution we had a duplication event this duplication event led to seed storage proteins there was another one i haven't had time to talk about at the beginning here that led to a different group of proteins and amongst the ones that
this duplication led to were the other oxalate oxidase sorry the other oxalate degrading enzyme i showed you right back at the beginning so although we started ten years ago nearly with the choice between should we use oxalate oxidase or oxalate decarboxylase what we didn't have was any clue that in fact the two enzymes are probably very closely related but we go back to here we now know through this evolutionary analysis that oxalate decarboxylase is a duplicated version of oxalate oxidase it's very limited in its co-, conservation but we now know that this is twice of of the size of that and it's a member of the same superfamily so there's a certain kind of symmetry in the story that says throughout all of this we followed a kind of academic analysis but it's led to an understanding of of conservation of function of conservation of er in some cases conservation of s-, sorry conservation of structure but of a rather broad diversification of function and that's the a message in all of these evolutionary studies that you can start from very apparently very different and distantly related proteins and if you know the structure that's the key thing then you can find that the diversification isn't very that great and that lots of proteins are really members of this small subset of families so er i should finish there i'm sorry about the confusion for the [laugh] at the beginning er [laugh] namex is obviously the expert in the modelling and i'm sure he's going to tell you and and show you how some of the er these techniques can be used but this is a i think an an interesting framework to build from and i should finish there anyway thank you namex i hope you have every word of that [laughter]