nf0704: yep okay today we're going to talk about searching over the last few weeks i've talked to you about sorting and er having sorted your list or having not bothered to sort it you might want to look for a particular item so i'm going to talk through the subject today what is searching and i'll give you some examples of searching algorithms next Wednesday i'll give you some examples of even cleverer techniques to find what you want quickly so we've done sorting we'll get on to searching namex is sort of keeping up with this in the formal methods stream in the fact that he's talking about some things and then he's going to be talking about the formal specifications of sorting so hopefully it's coming together at least in our minds and hopefully in your minds so searching computer's still not working okay i guess you know what searching is really we're looking at locating a particular item in a list we're looking if you've got a list of items you've got a pack of playing cards and you're looking for a particular one then you can perhaps look at every one until you find the one you want oh look there's the ace of spades if the list is already sorted so you've already got your cards in suit order and rank order er you can look through the entire list until you find either the one you want or something that's bigger oh we've found the two of spades and it's the ace mustn't be here so that's always the problem with searching you hope what you want is there but it might not be and you can get into really clever ideas because if you know it's sorted you can just flick through saying oh there's the hearts there's the clubs oh here's the spades let's look for the one i want so using the knowledge of the order to find what you want quicker so that's where we're going when we did sorting we looked at T-string list sort and we sorted on strings mainly didn't we so that we were looking to get the strings in order today i'm going to look at numbers in particular integers so we're going to be searching for integers er i wondered whether i should have gone with the same idea as whether we should have been looking for strings and i got caught out er i had a random generator which was generating numbers between one and twenty er eighteen twenty five twelve and when i used string list sort on it it gave me an ordering like this one eleven twelve eighteen er two twenty because when you're handling strings it's looking at the strings and i-, it sees one as being followed by one followed by something else so don't get caught out by that er if you're going to use a sort make sure you're using the right sort of sort er i've put my searching algorithm to work working on my sort puzzles so i i hacked together a new sort routine that sorts integers into the proper range er and you can use that in the examples that we have today sm0705: sorry nf0704: there's some notes there okay two of the techniques that we look at you can generalize to string you can change whatever string comparison you're doing to it and see compare text and you can do the same now yes little message in there about case sensitivity er you got to remember if you're sorting strings how have you handled case sensitivity have you put capital-A way up front and small-A after capital-Z have you got small-A and capital-A next to each other if you have then obviously your sorting technique has to work in the same way as your searching technique 'cause if they're different your algorithms will be up the chute so searching for numbers exhaustive and binary will generalize to string interpolation which is tricky but fast works with numbers and it doesn't work well with strings so if you're looking to search for strings generalize exhaustive or binary don't try with interpolation unless you're very clever in which case you can probably will do it i tried i couldn't do it okay in the same way as we've done over the last few weeks what i've done in the hope we'd na-, have a nice computer to show them working is i've taken T-string list and i've inherited a new class in this example it's called T-S-string list i'm using a different class to the one we were using for sorting and i'm doing this in a vaguely object oriented manner okay er the function name's a bit long and it's falling off the end of the slide er but we we can go through what it is it's a function exhaustive so it's a method of this class it's taking a single parameter target which is what we're searching for and it's returning an integer okay little bit of comment telling you what it does returns the position of the first occurrence of target in the list if it's not present a negative value is returned my error handling okay local variable here integer begin end code on the next slide okay degree of inconsistency with my coding remember with functions in Pascal you either say the function name and the return value or you say result on the return value good programmers wouldn't mix it in one's bit of program so we probably should change that one to say result although the effect is exactly the same so result takes the value minus-count so if we get to the end of a this function and we haven't changed the result of a function it's going to return a negative value i've said minus-count remember count is the number of items in your string list okay a for loop we're going to go through the loop going to go zero to count-minus-one list start at zero they go to count-minus one so they've got count elements okay and we say if we're working with integers so i'm converting this string to an integer is equal to a target what we're looking for if this item's equal to the target put the result equal to I and exit someone said what's exit mean in Pascal exit takes you to the end of the function the end of the procedure so it saves you er having a more complicated if good programming practice possibly not the best of programming practice er exiting out the trouble is that otherwise you're into a complicated while loop here which is more difficult to understand so yeah it's a good programming practice 'cause you can read that and say oh when i've found the value i've finished i'm off home or i've finished the function anyway okay so this is exhaustive search i look at every item if i find the item that i want i said the result equal to that's position and i exit and that's the end of it if the item i want isn't in the list i get right down to the end of the list i've never done this bit of code here and i'm out of the bottom here result still equal to count-minus-one and the function will return having executed count times found nothing and say oh minus-count okay you happy with that sm0706: list is if there's nothing in your list if you have an empty list you will still going to return zero if your count's going to be zero nf0704: woo-hoo yes okay the point is that if count is zero and my list is empty count will be zero and so i'm going to return zero when you do testing you're supposed to test with no elements aren't you yes well spotted okay so er we probably want to add one add one subtract probably subtract yes i i had in mind when i was writing these bits of code that i'd return different negative values depending whereabouts it didn't find it and which method 'cause the other methods have different ways of doing it er the same mistake is in all the programs i i tell you guys to test er i i i test by generating little lists and seeing if it looks right er okay er okay the rest of it's mostly true anyway okay exhaustive search because it examines each item in turn items that are at the front of the list er are found more quickly than those at the rear so if you had the whole of the telephone directory sorted on your telephone numbers and you were looking for a particular number er if it was near the beginning of the list you'd find it in seconds milliseconds if it was near the back you'd find it in hours or days so it's not altogether the best and of course if it's missing you've got to look at everything before you find it so that's your worst case and this is in our big-O notation it's O- N so however many items you've got you have to look at each of them but for small lists it's adequate especially if you write it correctly okay easy to write and debug and test okay same function but this time we're assuming that we've already sorted the list er T-string list sort wouldn't be enough 'cause what i explained it sorts strings not integers but we can write a little sort routine or we can find one elsewhere er that would do the sort correctly so assume our list of integers is sorted into numeric order okay much the same as before the comment cut and pasted from before has the word sorted inserted here in a sorted list returns a negative value which we'll need to change because it's not quite right okay two local variables I and J they're not important what they're called er code on next slide okay exhaustive searching on sorted list we start off putting the result equal to a negative number more negative than anything that we will have seen before and we've got here er we're using a local variable which i'm putting to the difference between the item i'm looking at and target so i'm looking to see whether i've passed the target or not because if we're earlier than target this value will be positive if we're passed the target it'll be negative and if we're actually on the target if this value here equals our target we've found the item is that clear yeah yeah okay so we're looking to find the target er it just saves having a complicated bit of code where i've got J here and J here i'd have to say is this greater than that or equal to that so just trying to save time er an optimizing compiler would do it for us anyway so if J is equal to zero then this is equal to this and we've found it so if we come in here we've found it we can put result equal to I because that's the position in the list it is and exit out the end of the function without doing anything else okay if J is not equal to zero but J is still positive J is still positive J is positive if J is positive then this value is bigger than this value so this is bigger than that means we've passed the point in the list where our item should be so if J's positive because string I is bigger than target we've passed the point and we can also exit so we go round our loop round and round and round comparing the current item against target if target is less than what we're looking at we'll go round again target is less than is that the right way round are you certain it's the right way round do we want to look at an example okay we've got a list here suppose in this list here that we're looking at we've got thirteen as the target what we do is we've got a target of thirteen we compare it to one we say one minus thirteen minus-twelve that difference is negative so we're going to not be saying J is nought we're not going to be saying J is positive we're going to go round again so we're going to look at the next item so we're going to say one minus eleven which is minus-ten on a good day and we'll say hey that's still negative it's not equal to it it's not greater than so we'll go round again come to this item one minus twelve [sneeze] are you are you watching what i'm doing or are trying to er confuse me here we start by saying what's in strings I minus thirteen then we say what's in strings I minus thirteen yeah and that's minus-two and when we get to this point so Is nought one two three eighteen minus thirteen is positive okay and at that point we're saying J's not equal to zero J is greater than zero so we passed it so we'll go to exit yeah so we work out this quantity each time and if it's equal to it we've found it if it's gone positive we've gone past it so it's not there now are you happy with that mostly okay exhaustive search still big-O- N worst sort of case is we've got to go round looking at every item could be O-N divided by two but that still is O-N it is faster usually when the item's not there because exhaustive searching without it being sorted you're going to go past everything and then say oh it isn't here if it's sorted normally it'd be somewhere between the beginning and the end on average in the middle you'd only look at half the list and say oh it's not there i'm going home but there are two things which will make it slower and you got to be aware of one there is a little bit more code we've got a few more statements to look at each time we're doing this comparison this and that is it equal is it greater than and you've got the overhead of sorting how long does your sort algorithm take so you've got both those things to contend with it's faster normally but make sure when you're saying it's faster that you've remembered sorting takes some time okay that's exhaustive search which effectively looks at everything binary search works on sorted lists your list has to be sorted for it to work and what it does is it starts at the middle of the list back to drawing up the list on the board it starts at the middle can say that one's the middle and it says is our target if it was thirteen before or after this thirteen is bigger than twelve so we know we only need to look in this part of the list if our target was three we'd know we only needed to look before it oh and if our target was twelve it's a hey we've found it we can go home now so if it's equal we've found it smaller up bigger look down and if there aren't any items left in the bit of list that we're looking at it's not there so we're finished so it's doing the chop we've seen sort routines that do this we did with the er quick sort we were effectively chopping the list in half and sorting that okay so here's a bit of code that's going to do much the same er this is going to be a method on this object here called binary bit of comment which is exactly the same as the comment that we had before a few local variables here that we're going to use min mid max and J er min's going to tell us where's the top of the bit of list we're looking at max is going to tell us the bottom and mid's where we're going to compute the middle point 'cause each time we'll go for the chunks of list and J is just the local variable same as we had before could have used more imaginative names one of the things we say about programs use meaningful names er the trouble with trying to fit everything onto a slide is that if your names get very meaningful when you get if statements they fall off the end er okay it's definitely a bit more complicated than the code for the others it it takes more space and i've shunted some things onto two lines so that they fit so it's more complicated code we've got here but let's talk it through okay we're starting here min equal to zero max equal to count-minus-one the top and bottom of our list and we got a while loop we're going to go through while min is less than equal to max we're changing those values in here and if they're equal then our sublist has got nothing in and we said sublist nothing in it's empty okay J is this difference between the item we're looking at and target so once that becomes positive we're passed it when it's negative we're earlier and when that's zero we've found it and that's what the code says if J equals zero then we've found it and we can return mid which is our index for that middle point did we miss a line of code there yeah we missed that line of code there this one calculates mid and we just add max and min and divide by two we could have copied the example from string list and done a shift right two er but i hesitate to try that sort of thing in my code okay so we've found it if J is greater than zero we're later in the list than the item we're looking for so we need to search earlier so if we're down here in our list and we're looking for thirteen we know we need to look earlier okay and we look earlier by changing our value of max we make it equal to not the middle point 'cause we know that's not the item the one before if it isn't earlier and it isn't this one it must be later if it's there and we handle that in the final else clause here which puts min equal to mid-plus-one again we don't need to include mid 'cause we know it's not that so search earlier search later er if it's not there we make it equal to count-er - minus-one so we'll return a negative number okay are you happy with that as an algorithm yeah sm0707: could we write it a lot easier if we use recursion instead nf0704: couldn't we use it write it a lot easier if we used recursion instead sm0707: then you've got until you get down to the nf0704: yeah i i wondered about writing a recursive version of this er i am sure that you could write this as a recursive routine er sm0707: well no this it would make sense to use them but you can use recursion to take a lot out and make it more simple nf0704: i'll tell you what we'll have three minutes till half past and if you're feeling adventurous [laughter] sketch out what we think is a recursive one i believe that you could make it recursive er my my book didn't make it recursive we'll tr-, try and sketch one out as i say three minutes er and we'll compare and contrast the two okay ss: nf0704: anyway i'm hoping we're going to have a recursive algorithm soon ss: nf0704: okay have we have we got a a version that's recursive sm0708: er sort of nf0704: sort of do you want to write it up or do you want to dictate it to me sm0708: neither sm0709: nf0704: [laughter] neither okay sm0710: it's not very sm0711: we decided it's it because Delphi isn't functional language it wasn't actually very easy to do [laughter] if it was list if it was a list based like namex it would be a lot easier to do than Delphi nf0704: okay the the the statement we've got here is it's Delphi's fault for [laughter] not being a functional language and and if it was a functional language it'd be a lot easier to write er i've got a version that i'll i'll try on you er what would the execution time be like if we compared MIRANDA with Delphi answers on the back of a postcard to whoever you want to send it to er i think that the basic algorithm but what would bother me was that the sort of function level of this binary like the quick sort i'd need quite a few parameters into it and i think what i'd need is to be calling something well like our min and max as parameters in there and our target and make those all integers and things like that but basically the code would be if this thing i've called J is greater than zero so i'd need to look earlier in the list i'd call binary again with min er min- minus- one i need to calculate min and J up here calculate them past target sorry my writing's falling to bits here er else if it was bigger then i'd call binary with mid- plus-one max and target else i've found it so result will be equal to mid i don't think that's the whole code but i think the body is that what the sort of thing you were aiming at sm0712: yeah sm0713: yeah think we were trying nf0704: i think you st-, you still need the decorative bits sm0713: we were trying to pass we were trying to pass the list as a parameter which didn't really work nf0704: ah yeah now you see that's where we had problems with the quick sort algorithm and er and we made a a scratch list as well didn't we er the conventional algorithms that just use arrays they just sort of index into bis-, different bits of the array and i think that helps i think you'd do it easier with link lists as well actually er so i think i think you're right i think you could write a double Basic algorithm i think the algorithm itself that i've given you is flawed from what namex was saying here er if we only have one item and it's the item fall for the same problem that you were telling me about earlier if we have only one item and it's the item then min's zero max is zero so this while falls out and we pop down here and we say result equals minus-count now this algorithm i i hi-, hijacked from Delphi three algorithms and only changed into objects so i think he's wrong as well sm0714: yeah it does nf0704: so let's choo-, sm0714: i bet it does work sm0715: nought is equal to nought 'cause it would still run the middle section nf0704: so it gives a right answer sm0715: yeah nf0704: oh so Mr Delphi algorithms is right oh i'm relieved okay er but the bit of error checking we would put in to make everything easier is we'd check if our list was empty and if it was empty we wouldn't bother trying to search or sort okay let's push on that's about half the slides we've done okay er the next few slides are just comparing the approaches i decided that i wasn't er doing very well at drawing examples on the board and i'd do some at home in advance okay binary search at each s-, step whether re-, recursive or not we're cutting the list in half and because we're cutting the list in half at each step we're big-O log-N if you had a million items sorry i i spelled out a million because when people say million i never quite see how many zeros there are one followed by six zeros the most steps will be twenty with exhaustive search it would have been a million steps yeah so a lot faster the most you're going to do is twenty 'cause you're cutting it in half cutting it in a half and very quickly you're down to having either found it oh or it's not there okay as i said i i i i felt last time when i was talking about sorting and i started playing this game let's write a few lists on the board that i was always getting to a point where the example became more confusing so i thought i'll knock one up and i'm try it with this er as i implied earlier i've got a random number generator that just generates numbers in this instance er they've got to be between one and nine don't they think they're between one and ten and we just didn't get ten okay so this is just a random list i've generated for this example this is the original list we can't use binary sort on that original list because it's not sorted binary search sorry binary search needs it to be sorted if we were to use exhaustive search we would look at each item in turn suppose we were looking for the value five we'd look first of item zero which is two then we'd see four then we'd see nine then we'd see five and say hey we've got it okay of course there's a few more fives in there but er this stops as soon as it's found one so it hasn't found these down here okay so you're happy with that if we were to use exhaustive search on this but to take the sorted list so we'd run our sorting algorithm we'd look at this one we'd look at element zero and say oh there's a one then we'd look at element one and there's a two another two another two and then we'd four's got a four in it element five's got a five oh we've found it in this example er the exhaustive search on the non-sorted list found it faster because of the repeated instances binary search would cut the list into half and it would see a four there and it would then cut the remaining list in half which would be at this point here seven so it cuts it here and then here and it says oh i've found s-, five at position seven so it finds a different instance but since i'd asked for item five it doesn't matter okay are you happy with that as an example that's just how it works if we took the same thing and we're looking for six if we did the exhaustive non-sorted we'd look at all the items saying is this six is it six where's that six gone and we wouldn't find it so we'd look at everything and not find it this sorted list we've got here exhaustive search again we'd start at item zero look at one two three and would eventually find it here at position oh sorry i'm looking for six aren't i look at five and it's not that one you'd look at six and it's not that one you'd look at seven and it's not that one and eventually here when it got to index eight there it'd say ooh this is too big it can't be here so looking for six it'd look all the way down to nearly the bottom of the list and then say oh it's not here anyway the binary search would again split it here at the middle say that's item four is that it no split it here is that it no split it probably here think the middle li-, the it's rounding down 'cause it's a div operation that's too big the sub list between here and here it's got no elements in it it'll fall out in that min max less than or equal to empty result equals minus-count minus one so it's just different approaches the binary is looking at fewer items 'cause it finds the middle one and then it says shall i look at that half or this half finds the middle one shall i look at this half or that half and so there's many less operations for it to do okay the next one we're doing is definitely tricky i don't think we're going to cover all the code today what i'll do i'll tell you about the method and vaguely how it works then i'll go on to talk about hunting which is another approach and then next week when we talk about hashing i'll talk about interpolation as well so if you can remember next Wednesday to bring these notes with you as well then i won't need to copy more of them for you so i'll talk about the approach but i won't go into detail of the algorithm just yet okay the approach of interpolation is similar to binary we might be able to write it recursively maybe i'll give you a week to try that okay it's using known values to guess where an unknown value lies so i've got a list of numbers between one and a million if they're actually evenly distributed and i'm looking for five-hundred-thousand i'd guess it was in the middle if i was looking for two-hundred-and-fifty-thousand i'd guess it was quarter of the way down if it was evenly distributed and that is the approach that we use what we know about the top and the bottom value to guess what where the one we want is okay er if we then look in the middle and find that it's actually seven-hundred-and- fifty-thousand in there we'll change our process so in searching it uses the indexes of known values to guess what the index of the one we want might be it works well if your list is evenly distributed if you have all numbers from one to a million then your guess is going to be brilliantly right in that sort of list if your list isn't evenly distributed but it sort of is er it'll work quite well if your list is skewed and you've got a lot of big numbers and hardly any small numbers it's telephone numbers sort of territory er it isn't so good and it only works with numbers well i guess again if we're clever we could probably predict strings we could probably if we had a list of surnames we could probably say if they're evenly distributed or beginning with A and B and C we could use this technique but the algorithm i'm going to show you relies on there being integer values in there okay i'll show you the outline of the code but i know we're not going to finish it that's just the outside of the function er few variables bits of comment the next slide gives a flavour but as i say we'll cover this next week so we won't go into the details but it's a much like the other codes to start with in the fact that we're looking to see if we're on target if we're on target then we're done and if it's absolutely not there we're going to do a minus-count minus one thing there so that's if it's there but before we can reach this point where this is true we've got three subcalculations we'll need to do we will be going round the loop lots of times one we're going to calculate where the dividing point is so how many items are there in the list what are we actually looking for are we looking for a value that we believe is halfway through or quarter of the way through so we're going to work out where we think it'll be in the list and there's a calculation for doing that then we're going to check that it's not out of bounds 'cause there's no point in looking if we've already fallen off the end of our list and when we've done that we'll create a new middle value we'll look and say oh is that it if it isn't it we'll go round again oh it sounds like recursion to me but we'll cover that in detail next week in theory it's the same order of magnitude as binary O-log-N but in actual tests with real data it runs on integer lists of a hundred-thousand items three times faster than binary it finds it quicker and that's particularly useful er the way that it works if you're having to fetch your data from a slow device 'cause if you've got it on a hard disk or i don't know something else that's relatively slow to fetch it from this is fetching what it thinks is the right area whereas binary you're finding the middle when you're finding the middle of the rest of it you'll probably be rushing off somewhere to fetch it okay but more details of that as i say next time two more slides hunting if you need to locate many items within a list a hunting technique will make it better for you er in the fact that if you assume that the items you're looking for are either close together or even better that you've sil-, sorted the list of items that you're looking for so they're in order you can start your hunt from the item that you've found so you find one item and if the next item on the list that you're looking for is smaller than the one you've found you know to look in that bit of the list if it's bigger you know to look in that bit of the list does that make sense so you just get your list you got all your items in your list and you've got a a list up here with er lots and lots of items in and then going down to a hundred got an i-, a large list there and you've got a smaller list of things that you want to find from there i think perhaps some of these are actually in the list so if you look for three first having found it's not at that point you can look in the remainder for the next one so it's just bringing together what are you using your list for why are you doing the searching is this the reason that directory enquiries lets you look for two phone numbers is it quicker for them or is it 'cause they want to charge you money okay here we go er searching is something that you need to do a lot of in your databases you've created large amounts of data and then you need to access it er why do you need to access it what sort of algorithms well if all you're doing is looking at short lists and doing it occasionally an exhaustive sort search whether on a sorted or unsorted list is good you can write it easily you can test it you can debug it and why do more a binary search is good on a large list the other positive thing about it is it's not dependent on data distribution interpolation works well if we've got a list evenly distributed if it's not binary search will be faster these two work well for numbers and for strings you can work with er names as well interpolation is very fast it's not good on unevenly distributed and it's not good on strings but remember three times faster on a hundred- thousand items it is good the trouble is you've seen two bits of slide that represent it there are three more slides on that the code is longer there's la-, many more bugs that can come in there while you're writing it do you need it okay right so that's the end of today on Monday namex will be talking to you about formal methods and on Wednesday i want to talk to you about hashing and we'll include interpolation so i think that covers it thank you