nm0858: okay this er [0.8] should be the last [0.6] lecture of this series according to the handbook [0.4] er there should be four this term this is number four [0.6] er at the end er i'll just check with you because [0.3] if there's anything that you'd like me to go over again or if you've got any [0.4] questions or points to make [0.3] er we could have another get-together [0.2] at the same time next week [0.2] that slot is free [0.5] but as far as i'm concerned the content of the [0.3] lectures is complete [0.4] assuming i get through what i want to get through today [0.7] okay [2.0] er [1.0] namex here from CALS is recording what i'm saying [0.3] er it's not that he's going to learn anything interesting om0859: nm0858: it's just er [0.5] er he's going to be [0.5] finding what a syntactic disaster zone [0.4] er unscripted speech is in a lecture i think [1.0] er [0.9] let me just [0.2] oh first of all er [0.3] ap-, [0.2] apologies various well i've i've had it pointed out to me that this er [0.6] the tables in this room of are in a pretty [0.2] filthy state [0.5] and er they're not being cleaned properly so i'll pass that on to [0.5] the relevant er [0.9] department in the faculty [1. 
1] okay [0.5] where we got to [0.3] last week [0.6] er was saying that [0.3] er [0.4] although the evidence is very conflicting very strongly conflicting [0.5] there are apparently characteristic [0.2] rhythmical differences [0.4] between different languages [2.9] and i said that although [0.3] we're still very much in the dark about this [0.5] there clearly would not be this rhythmical [0.3] regularity [0.3] unless it was fulfilling some function [0.8] okay [0.3] if it's true that every [0.4] speaker of every language [0.2] speaks with some kind of rhythm [0.5] there must be some kind of function [0.2] to that rhythm [0.4] er because [0.2] er on the whole we don't do things just for the fun of it [0.4] er when we're speaking [0.8] and er [0.3] one of the major [0.7] functions of rhythm [0.3] that we suspect [0.6] is that it helps us to divide speech up into units [4.4] hello it's phonetics [0.2] sm0861: oh sorry [0.2] wrong room nm0858: okay [1.9] [2.3] okay so [2.0] we're looking at [0.3] the possible functions of rhythm and timing [0.4] in dividing speech up into units that we need for perceiving speech [4.3] now of course one of the questions is what sort of units are we dividing up into but first of all just consider [0.5] if we had no help [0.5] in dividing speech up [0.5] into units that is if we had no help [0.4] from any kind of phonetic information [1.2] what we would have to imagine is [0.4] bearing in mind that we don't usually pause between words [0.4] we'd have to imagine that speech is coming at us as a continuous [0.2] stream [0.4] of phonemes and syllables [0.9] and [0.4] we would have the very difficult decision [0.6] er as to where one [0.2] word ends and another one begins or where a phrase ends another one begins [0.4] er or where we reach a a sentence boundary and so on [1.9] and i was going to give you an example of this because somebody once wrote an article [0.7] in phonetic transcription [0.4] leaving no spaces at all [1.2] er this was a a 
slightly nutty paper [0.4] er that appeared in a rather [0.2] bizarre journal in which all the papers had to be written in in phonemic transcription [0.6] and this guy was saying [0.2] that i mean he was actually a very eminent phonetician John Trim [0.4] he was saying [0.6] er [0.3] if we're going to be realistic with phonemic transcription [0.2] we shouldn't put spaces in our transcription unless there is phonetically a space there [0.4] and if you can't hear one you shouldn't write one [0.8] and so he then wrote out this paper as though he was saying it out loud [0.4] er only putting a space where he would pause for breath [0.4] and of course [0.2] er [0.2] even if you can read phonemic transcription [0.3] the actual paper is almost completely illegible [0.4] because it's just a continuous string of phonemes [0.7] er [0.5] and er i will [0.4] er [0.3] let you have a copy of that paper if you're interested [0.4] er unfortunately the volume on my shelves which has got it in is [0.3] out on loan to er one of my research students er [0.5] er [0.2] who's not [0.2] who's not brought it back but i will get this to you it's an [0.2] interesting experience to read it [0.3] in fact it's a very interesting experience to read the journal itself the [0.4] it used to be called Le Mètre Phonétique [0.5] and er er it was the main publication of the International Phonetic Association [0.4] and until nineteen-seventy-four [0.5] you would not be h-, you would not have a paper accepted for that journal [0.3] unless it was written in phonemic transcription [1.3] anyway one of the big debates in [0.5] er [0.8] the early er well around the middle period of this journal Le Mètre Phonétique [0.4] was a big dispute between two of the great giants of [0.3] er phonetics in the early part of the twentieth century [0.5] er [0.3] that was Paul Passy [0.2] who was the great French phonetician [0.3] and Daniel Jones the [0.2] his British equivalent [1.3] er [0.2] here's a quote from Passy [0.5] 
il est bien entendu n'est ce pas que l'espace blanc laissé entre des mots n'a pas de valeur phonétique [0.6] basically saying [0.2] we leave spaces between the words when we write transcription [0.2] but it doesn't mean anything phonetically [0.6] okay it's just there [0.3] to help us [0.3] understand [0.4] er [0.2] more easily what is written [2.1] and [0.4] er [0.9] this [0.2] er stirred Jones to write a reply [0.3] by the way er i'll be giving you a handout which gives you these quotes so you don't need to write these down verbatim [0.5] er [0.4] just take in the general gist [0.7] er [1.1] Jones's reply was i would say that a word is a phonetic entity [0.3] that the blank spaces between written words do have phonetic significance [0.5] Passy himself has given instance of th-, in-, instances of this [0.5] er dans un parler tant soit peu langue on distinguera [0.4] trois petites roues [0.2] that's er three little wheels [0.4] and trois petits trous [0.3] er three little holes [0.5] okay so er [0.7] er this has been disputed by French phoneticians [0.3] er quite a lot [0.3] because some French people say you can't hear the difference between those two [0.6] er three little wheels and three little holes [0.4] er [0.2] but the word boundary is in a different place in these two cases [0.3] and er [0.2] Passy on another occasion [0.3] had said that you can actually hear the difference between them [0.3] i'm not enough of a French speaker to know if it's true that you can hear the difference [0.3] but if any of you have French speaking friends you might like to try it out on them [0.3] and see if there is a perceptible difference [1.8] okay [0.2] so there's er there's a a basic dispute er certainly as far as English and French are concerned [0.3] about whether we can actually hear [0.3] where one word ends and one [0.2] begins [0.6] and it is when you think about it one of the most [0.3] vital functions [0.3] in [0.3] speech perception [0.8] if speech comes at us as 
one continuous stream [0.9] and yet [0.5] mentally [0.5] we're able to divide that up [0.4] into in a very sophisticated way [0.4] into a whole [0.4] series of units [0.5] going down to very small units like words [0.3] and going right up to big units like sentences [0.5] if we're able to do that [0.8] the [0.4] question is are we getting any help in that from the phonetic information [0.6] is th-, is there stuff there in the speech signal [0.4] that tells us where these boundaries are [0.5] or are we just figuring it out on some kind of statistical basis [11.5] i've been doing some research [0.2] on [0.4] one particular problem that arises out of this [0.4] and i'd like to use that as a kind of peg to hang this [0.2] issue on [0.4] to tell you a little bit about what we've been doing [0.4] er and where we've been getting with this [0.8] it's the problem [0.2] it's sometimes called the problem of embedded words [4.3] when we hear a word of several syllables like responsibility [1.1] okay [1.0] that word will almost inevitably contain some other English words which are smaller [1.0] so in the case of responsibility [0.2] we find within that the word response [1.4] the word sponsor [0.8] that bit there [0.7] the word [0.3] ability [0.9] that last bit there [0.6] the word bill [0.4] which is that one there [0.2] and you'd find a few others as well if you looked hard enough [0.5] almost any [0.2] word [0.2] in English [0.3] of more than two syllables [0.2] will actually contain [0.3] within it [0.3] packaged up inside it [0.3] a smaller English word [3.2] now consider what the brain is faced with [0.3] if somebody produces a sentence containing the word responsibility [2.1] if the brain wrongly [0.2] segments responsibility into response [1.4] and [0.2] ability [1.3] the [0.3] parsing [0.3] er the decoding of that sentence [0.2] is going to go catastrophically wrong [5.5] see the [0.3] point i'm making [1.4] somehow we know [0.4] that when we hear responsibility [0.6] er [0.5] it is that word [0.2] not a combination of response and [0.4] ability [1.0] nor is it [0.3] in some sense [0.2] re [0.2] and sponsor [0.3] and [0.2] bility [0.4] whi-, er [0.2] that of course contains two English [0.3] non-words [0.3] so that's a com-, comparatively easy task [2.0] now the fact is [0.4] that we don't go [0.2] cata-, catastrophically wrong [0.3] we don't make mistakes all the time [0.4] about [0.2] er [0.2] polysyllabic words [3.4] and the research that i've been [0.3] involved in [0.4] has been looking at factors responsible for [0.4] our being able to cope successfully [0.5] with [0.2] this problem of embedded words [0.3] the fact that we're not constantly going off in the wrong direction [0.2] being fooled by [0.3] the sounds into hearing something [0.3] that isn't there [3.1] let's just [0.3] try to think about [0.7] what [0.3] factors might be [0.8] helping us [0.3] not to go wrong [1.4] i've got [0.3] three possible [0.3] hypotheses here [2.1] and again these are in the handout that i'll be giving you [0.4] and [0.4] er [0.5] you don't need to write these down [0.3] you you'll get this text [0.4] later on [5.0] okay the question is then how are we [0.3] successful how comes how come that we are successful [0.4] mo-, most of the time [0.3] in deciding where word boundaries come [1.9] one possibility [0.6] is that there is some phonetic information [0.
4] just at the point where the boundary comes [0.4] which helps us to say ah [0.3] that's a boundary [0.3] so i know [0.2] that one word ends and another one begins at that point [0.3] now we'll look at that in a bit more detail in a moment that's [0.2] certainly a possible and a plausible [0.6] er explanation [2.8] okay [0.2] another possibility [1.1] is that [0.3] although we don't [0.2] find [0.6] all that much information [0.6] in the actual segments [0.2] at the boundary the actual phonemes [0.5] at the beginning and end of a word [0.4] there is something about the overall shape of a word [0.4] that enables us to say [0.5] er [0.4] i can hear [0.4] a word starting [0.3] and now i can hear the word ending there's something about the overall shape of it [0.7] for example [0.2] er this is a completely wrong statement but imagine [0.3] that there was something like [0.3] that a word always started very quietly [0.2] and built up to a a crescendo [0.2] and then faded away into silence [0.4] the overall shape of the loudness pattern there [0.3] would help you to know [0.3] the where the beginning and end of the word [0.2] came [0.2] that actually doesn't happen of course [0.3] er that that's just an imaginary [0.2] example [1.6] and the third possibility [0.2] which i suspect a lot of linguists would rather prefer [0.2] is that there is absolutely nothing worth listening to [0.2] in the speech signal [0.5] when it comes to deciding on boundaries [0.3] we simply do it [0.2] on the basis of linguistic knowledge [0.8] that's the sort of top-down [0.5] theory [0.6] and we'll look at that one a bit more as well so [0.2] what i've actually said th-, is er [0.2] instead of using phonetic or phonological information [0.4] we match segment strings against our lexicon [0.3] and choose the match that gives the most plausible sequence of words [4.7] let's just go back to the [0.3] er example i had on the previous slide the word responsibility [0.6] er [0.9] if you get that str-, that s-, [0.2] sequence of phonemes that make up [0.3] responsibility [0.3] and i put to you the problem [0.2] why [0.2] in a sentence like it's your responsibility to get there on time [0.4] why don't we [0.4] interpret that as it's your response [0.5] ability [0.3] to get there in time [0.3] why don't we interpret it that way [0.5] it's because we know [0.3] the structure of that sentence [0.2] we know its lexical content [0.2] we know the sort of situation in which that's uttered [0.5] and we simply wouldn't make a daft [0.3] interpretation like [0.3] response [0.3] and [0.2] ability [0.2] as two separate words [0.2] because it wouldn't fit [0.2] with the syntax [0.2] and it wouldn't fit with the semantics of what we were saying [0.8] end of problem [0.3] you don't need [0.2] phonetic [0.2] or phonological information [0.7] okay that would [0.2] er almost certainly be [0.5] er so the well anyway the computational linguists' answer to the problem [0.5] there is [0.2] enough contextual linguistic information to solve the problem [0.3] without relying on what's there in the sounds [0.8] now of course [0.3] er [0.4] that's not [0.5] my [0.2] approach to the subject so i'm not going to buy that [0.3] explanation [0.4] to me [0.4] er [0.7] er there must be something in point one [0.3] that word boundaries are marked by allophonic information in the segments adjacent to the boundary [0.7] and or [0.6] prosodic factors can ter-, can characterize the overall form of a word [0.7] now remember that last week i was talking about differences between different languages [0.4] what i suspect is [0.6] that in some languages [0.4] we find a preponderance of the [0.
4] er er [0.9] er the function [0.4] in word boundary divisions [0.4] b-, [0.6] here based on this first [0.4] possibility [0.4] that word boundaries [0.2] are marked [0.2] segmentally [0.2] at the edges [0.3] and in other languages you find that the main [0.2] contributing [0.3] factor to our being able to divide into words [0.3] is the second one the prosodic information [0.7] let's let's look at this second one [0.2] to begin with [0.7] prosodic factors character-, [cough] characterizing the overall fir-, form of the word [0.6] [cough] [0.2] if it's true that in French [1.0] every word [0.4] ends with a [1.8] stressed [0.2] syllable [2.7] which is usually [0.2] the claim made in introductory phonetics books [0.6] then dividing French up [0.4] dividing continuous French up [0.2] into words [0.3] is simply not a problem [0.6] you just listen for a stressed syllable [0.5] and you say [0.2] ha [0.2] stressed syllable [0.3] end of word [0.7] word boundary [0.5] listen for the next one and then you might have [0.6] a few syllables [0.3] and a strong one like that strongly stressed one [0.4] so you automatically [0.3] then [0.2] place a word boundary [0.2] it's about as simple [0.2] a procedure [0.3] as simple an algorithm [0.3] as you could find [0.3] in [0.3] decoding speech [0.2] just listen for a stressed syllable [0.4] and place a word boundary immediately after it [2.5] er [0.2] and there are other languages as i've said before which have other stressed patterns [0.2] so for example in Polish [0.4] er [0.2] most words [0.2] have [0.2] a strong [0.3] syllable [0.3] and then a weak one and then a word boundary [0.4] in other words the stress in Polish [0.2] normally comes on the penultimate syllable [0.7] so if you're a Polish listener listening to Polish [0.4] you [0.2] l-, [0.3] er let the stream of speech come in through your ears [0.3] and you simply [0.3] er have [0.2] er some bit of your [0.4] processing [0.5] capability in your brain [0.2] listening out for stressed syllables [0.3] you let one more syllable go by [0.3] and then [0.2] you place the word boundary [0.9] it's [0.5] just as with French w-, one has to be a bit sceptical about this [0.2] there are [0.3] actually if you listen to spoken French there are plenty of cases where French speakers put the stress [0.4] er earlier than the final syllable [0.4] there are exceptions to the rule in Polish [0.3] if you take a a word [0.5] the wor-, Polish word for university for example is uniwersytet [0.4] er which is [0.3] pro-penultimate it's the it's it's not on [0.3] er it's not they don't say [unIvEr"sItEt] [0.2] they say [unI"vErsItEt] which is [0.3] er which leaves two unstressed syllables at the end [0.2] but most Polish words are are str-, are are structured like that [0.
7] so in those cases there are prosodic factors characterizing the overall form of the word [0.3] under those circumstances [0.2] you've got lots of help [0.2] for dividing up speech into words [5.5] now as we know English is a much more difficult customer [0.2] in that respect [0.3] because [1.2] we know that in polysyllabic English words [0.3] we find some [0.2] where the stress is on the first syllable [0.5] some where the stress is on the last syllable [0.3] and some [0.2] er in other places in the middle [1.0] and therefore [0.2] we can't rely at least not in such an easy way [0.3] on that overall prosodic shape of the word [1.0] [cough] [1.7] on the other hand what we do have in English in a fairly powerful way [0.5] is the [0.4] er ability to distinguish words or pairs of words [0.3] er on the basis of phonetic information [0.8] er [0.3] the example that everybody's heard of and that always comes up in [0.4] er [0.4] early lectures on phonetics is distinctions like [0.7] grey [0.3] tape [2.3] and [0.2] great [1.0] ape [2.3] i'm sure you've all come across examples like that [0.3] people have written [0.4] er [0.2] huge articles all based on this particular problem [0.3] er in fact a lot of it was inspired by that row or dispute that d-, argument between Passy [0.3] and Jones [0.3] er all those years ago [0.5] because once the [0.4] er [0.3] issue had become a theoretical problem [0.4] it impelled people to start doing experiments [1.7] if you don't know what it is that distinguishes grey tape and great ape [0.4] er you certainly ought to be able to explain it [0.3] it's not [0.2] all that [0.2] difficult to understand [0.4] er but you may have forgotten the sort of basic phonetics that enables you [0.4] to figure this out [0.6] er let's just [0.2] go through this particular example [0.5] er [0.8] the first point to make is of course that [0.3] both [0.7] of these [0.6] phrases [0.4] contain [1.3] exactly the same segments [1.6] if you actually go through phoneme by phoneme there is no difference [1.9] and yet [0.3] if i say [0.2] either grey tape [0.2] or great ape [1.8] th-, the v-, a-, ninety-nine per cent of people will [0.2] successfully recognize [0.3] which of the [0.2] two [0.2] i intended you to hear [1.7] now i can v-, i can make the difference [1.0] even clearer [0.3] if i sort of fake it [0.3] if i put a glottal stop in here before the [0.3] before the [0.3] vowel begins in the second word and say great ape [0.5] great ape like that [0.3] then there is no ambiguity at all you simply couldn't interpret [0.6] great ape [0.3] as [0.
3] er a combination of grey [0.2] and tape [1.6] but [0.2] if we take it a little more bit more naturally and say [0.4] er grey tape and great ape [0.4] without a glottal stop [0.2] you can still hear the difference [0.2] what are the phonetic factors [0.8] well [0.3] one of them [0.4] er [1.8] anybody want to tell me before i tell you [2.9] this is delving back into phonetics from long ago [1.6] go on [0.2] you're nearly [0.7] [laugh] [0.4] [laugh] [1.7] sf0862: is it stress [0.8] nm0858: mm [0.4] sf0862: is it stress [0.6] nm0858: no the stress is identical [0.8] grey tape great ape it's so it's er the second syllable is stressed in both cases [1.0] no it's a-, this is allophonic information this is er [0.2] we've got the same phonemes [0.3] but they have different allophones [0.3] this one here is initial [0.2] in the syllable [0.2] in tape [0.6] and so it's aspirated [0.8] if you listen [0.3] grey tape [0.2] grey tape [0.4] but if i [0.2] take this one at the end of the word great it's unaspirated [0.6] er and so it's pronounced great ape [0.3] great ape [0.4] great ape [0.3] and there's no [t_h] [0.4] [t_h] [0.4] [t_h] [0.3] sound [0.2] at the end [0.2] of this one here [0.3] so here we the the T is aspirated [0.3] here [0.3] the T is unaspirated [1.7] there's another difference as well [0.5] this [0.3] word here great [0.7] has a final fortis consonant [0.6] what do final fortis consonants do to preceding vowels [5.8] there's a lot of rust on that old phonetics isn't there [1.6] a f-, a final fortis consonant shortens [0.2] the preceding vowel [0.7] if you measure [0.2] the [0.4] [eI] sound in great [0.6] it is very much shorter [0.3] than the [eI] sound in grey [0.4] listen to this [0.4] grey tape [0.5] grey tape [0.6] and now this other one [0.4] great ape [0.4] great ape [0.4] the [eI] [0.3] is [0.3] shortened [0.3] by [0.2] possibly fifty or sixty per cent [0.3] it's a ver-, very striking [0.2] shortening effect [0.4] that is almost unique to English [1.1] most languages in the world have a slight shortening effect from fortis consonants [0.4] English has taken this [0.2] very slight almost imperceptible difference [0.3] and for some reason that we can only guess at [0.2] has magnified this enormously [1.4] so what we're seeing here [0.3] is a case [0.2] which i've labelled as one among these hypotheses [0.3] that word boundaries are marked by allophonic information [0.5] we are able to pick up [0.5] from [0.3] the allophonic detail [0.6] in the phonemes [0.3] where [0.2] the word boundary must be [0.5] given this information [0.2] that in this case of grey tape [0.3] you've got an aspirated initial [t_h] [0.6] you've got to put the word boundary [0.2] before the [t_h] [0.7] given the information that you got a short [eI] [0.2] sound [0.3] and an unaspirated T [0.3] you are forced to put [0.2] the word boundary [0.4] after the [0.2] T there [0.7] er [0.2] i mean when i say forced there is no law or [0.5] er [0.3] th-, or penalty involved here [0.3] but that's the way we work [2.4] once this effect had been observed [0.3] er it arose well er various [0.2] various follow up studies were done [1.6] the best known of these i'll give you the reference er to this was work done in the nineteen-sixties by O'Connor and Tooley [0.6] where they got unsuspecting readers [0.4] to read [0.4] er [0.5] rather weird sentences containing pairs like this [0.3] so [0.3] er [0.2] things like er [0.5] er i saw the grey tape out of the window and i saw the great ape [0.2] out of the window [0.5] people would read these they then went [0.3] through the recordings with a pair of scissors [0.3] and cut out just the pairs of words [0.
3] and played them to listeners [0.3] and said can you say [0.2] whether you're hearing this one grey tape [0.4] or this one great ape [0.7] and [0.2] er what they found was er [0.5] er fairly surprising [0.5] when there were plosives involved particularly voiceless plosives [0.3] people were very very successful [0.4] in [0.2] successfully plac-, very successful [0.3] in placing [0.2] the word boundary [0.3] in the in the right place [1.7] er [0.3] there were [0.5] many other examples that they constructed with different types of consonants [0.4] that were much less successful [0.7] er [0.4] and er they were er eventually forced to conclude [0.2] that this [0.6] business of allophonic marking of word boundaries [0.3] only works in a limited number of cases [0.7] but in the meantime they had a lot of fun inventing [0.4] these pairs of words [0.2] they're sometimes called juncture pairs [0.6] because this [0.8] word juncture is used to refer to the joining between two words so juncture pairs [0.5] became a kind of [0.3] er phoneticians' hobby [0.4] er [0.2] when i first started going to phonetics conferences in this country [0.4] er you would often get people sitting around [0.3] er over a beer in the bar after the [0.2] papers were over for the day [0.3] inventing things like this and seeing if they [0.3] could get people to hear the difference [0.3] and you get things like [0.3] er [1.6] to choose [0.5] ink [1.2] as opposed to [0.3] to chew [1.5] zinc [1.1] and er what were some of the other crazy ones [0.5] er [0.4] yes more [1.5] ice [0.8] and more [1.7] rice [0.7] okay [0.4] er [0.2] lots of things like this [0.3] constantly [0.2] thinking up [0.4] pairs like that [0.3] and then trying them out on listeners to see if they can hear the difference [0.9] the answer is if people are deliberately trying [0.2] to make it clear which of these they're intending [0.4] then it can be made unambiguous [0.3] but in normal speech [0.2] you just can't hear the difference [0.3] unless it's something which has got whacking great [0.4] allophonic variations [0.3] that help you [0.3] like the aspiration here [0.3] and the prefortis [0.3] shortening [0.5] in that particular case [0.6] so [0.2] looking back at these possibilities [0.3] what i'd say is that [0.3] okay [0.3] there are certain circumstances in which we get allophonic information at word boundaries [0.2] which helps us to discriminate [0.9] but not all the time [0.3] and not in all languages [2.1] secondly there are prosodic factors [0.2] er [0.2] in th-, terms of overall shapes of words [0.3] which help [0.4] in some languages [0.3] but that help is not very great in English [0.8] and in fact until [0.3] fairly recently it was believed [0.3] that there was no help at all [0.6] in English [0.6] for [0.3] er word identification [0.3] based on overall prosodic shape [2.3] but i mentioned briefly [0.3] er a couple of weeks ago [0.3] that research by Anne Cutler [0.6] er [1.1] o-, o-, by look-, who has looked at a very very large number of English words [0.3] research by Anne Cutler and her colleagues has shown [0.2] that statistically [1.5] it is more likely than not [0.5] that an English word of [0.2] several syllables [0.3] will begin [0.2] with a stressed syllable [0.8] statistically [0.4] initial stressed syllables [0.2] are the most likely in English [3.4] and if you think about it you can come up with [0.
2] hundreds of words just off the top of your head [0.3] which don't have [0.2] the initial syllable stressed [5.5] but even so [0.3] the [0.2] er [0.6] figure is something like sixty-five per cent to seventy per cent in a given [0.3] text [0.3] the [0.9] the er [0.9] the number of [0.2] initial stressed words of mor-, that is of polysyllabic words [0.4] the number of initial stressed words in [0.3] English [0.3] on average is around sixty-five per cent [0.4] to even sometimes as much as seventy per cent [0.6] er [0.3] initial stressed [1.4] and so Cutler's theory is that although it's only a weak tendency compared with [0.3] languages like French and Polish and so on [1.0] English speakers do [0.3] to some extent rely on this as a guideline [0.4] if you hear a stressed syllable [0.5] your brain says [0.2] this is probably the beginning of a word [1.5] and of course [0.3] er following on from that [0.2] the sy-, the syllable before that therefore was the last syllable of the preceding word [1.8] and [0.2] we are at times proved wrong on that [1.1] but if we stick to that simple rule [0.3] we are correct [0.2] more often than not [1.0] er i have my doubts [0.6] still about this [0.4] er but er this is something which she and her [0.3] coworkers [0.3] er have held to for a very long time [10.3] let me just er [0.6] for a moment look at that last possibility the one that we don't make use of er [0.2] that we don't need to make use of [0.2] phonetic information at all [1.6] and i i said that this appeals to computational linguists because if you're designing a computer to recognize words probably [0.3] er you would find it a tedious [0.3] er [0.5] er [0.3] superimposition [0.3] to have prosodic information to worry about [2.3] you have to assume that all the words that you know [0.3] are coded in some kind of dictionary in your head [0.8] that is we all have a mental lexicon [0.5] opinions vary about the size of it [0.5] er partly depends how [0.4] er how highly 
educated you are [0.3] and how good you are at remembering words [0.3] but it can easily be somewhere around say eighty-thousand words [0.3] that's quite a big dictionary [1.0] we have to assume [0.4] that that dictionary is coded [0.2] in some kind of phonological [0.2] form [1.1] er that is it's [0.3] a-, although we know as literate people we know the spelling of the words that we've got stored in our heads [0.5] what's more important is that we know [0.3] the sounds [0.2] that make up [0.3] the words that we have in our head [0.4] you've got the word [0.2] cat [0.3] e-, everybody [0.3] in this room has got the word cat [0.3] in their mental vocabulary [0.3] and that is stored [0.3] in a number of ways including the fact that it contains a [k] [0.4] and an [ae] [0.3] and a [t] [4.7] a fairly typical [0.2] computational operation in computational linguistics [0.4] is to have [2.7] two strings of [1.3] well let's say we had to have one a string of phonemes [0.3] which are the input [0.3] it might be the sounds which are coming in through your ears [0.9] and then in your mental lexicon [0.6] you've got [0.5] lots of items which are made up [0.2] of [0.8] phonemes where each of these little dots represents a phoneme [0.8] and your job [0.3] is to map [0.6] the one [0.3] onto the other what you've got to say is [0.3] well [0.2] for example [0.3] er [0.4] i [0.2] can identify a particular phoneme here [0.5] let me look through the words that i know [0.2] and see if i can find any [0.3] which [0.3] er begin with that phoneme and you find one [0.3] er supposing that's [k] [0.6] er [0.3] then you might look for a word beginning with a [k] [0.2] like that [0.3] and then you look for another match [0.4] following that [0.2] and another match following that [0.3] and you see [0.2] if [0.3] any of these patterns of sounds [0.2] match up with something in your mental [0.3] lexicon [0.8] and if it does [0.4] you mark that down and say i think that's a whole word [0.3] let's now move on [0.3] and try the next one so you think well if that's a word [0.3] then this should be the beginning [0.2] this next dot along [0.2] should be the beginning of the next word [0.3] let's see if that matches any words in my mental vocabulary [3.0] if you were using a a b-, a big computer and you had unlimited computer time [0.2] er [0.2] doing that k-, that kind of manipulation is fairly straightforward [0.4] and all you need is some fairly clever mechanism which will keep cycling back every time you fail [0.9] now if we go back to the example of responsibility [0.5] er [0.4] if [1.1] you had wrongly identified [1.0] er [1.
1] responsibility [0.6] er as re [0.5] and [0.2] sponsor [0.4] and bility [0.6] sponsor would match up [0.3] to one of the words in your mental lexicon [1.0] but re [0.2] wouldn't [0.5] and [0.3] bility wouldn't [0.2] because re is not an English word [0.2] and bility is not an English word at least not as far as i know [0.6] and therefore [0.2] that hypothesis [0.2] would have to be trashed [0.4] you would have to s-, [0.3] you would simply have to say [0.3] that was a non-starter i will go back to the beginning [0.4] er [0.3] to the last place where i was fairly sure [0.3] and [0.2] start over [0.2] and see if i can make a different interpretation and you might this time say [0.3] maybe it's response and [0.3] ability [0.2] let's see if that works [0.4] but then [0.2] later on the syntactic information that you had [0.3] would rule that out as a reasonable hypothesis [0.2] so again you would trash that and say [0.2] okay perhaps it's the whole word [0.3] responsibility [0.4] and you would match that up [0.2] yep that matches up with the word in the mental lexicon [0.3] and it fits the syntax [0.2] and it fits the meaning [0.3] that's it [0.2] i'll [0.3] er i'll go for that hypothesis [1.4] so [0.6] er [0. 
5] a a computational linguist would like this idea of [0.2] shuffling [0.3] the possibilities [0.2] matching patterns [0.2] from the input that is the sounds that you hear [0.4] to [0.2] stored patterns in your brain [0.3] which are the words [0.3] er that you actually have stored [0.7] now our brains are stupendously fast [0.3] at finding words [0.3] but even so [0.3] the idea [0.2] that we would leave it [0.5] to our brains just to work on a a a a phoneme [0.2] pattern matching [0.7] ignoring all this wonderfully rich information about prosody [0.5] and the allophonic information [0.3] is just crazy [0.4] the brain [0.2] would not simply ignore [0.2] such a valuable source of information [0.8] so it seems to me that i i would want to reject the idea [0.3] that we don't use phonetic and phonological information [0.3] in deciding on word boundaries [4.8] okay well [0.9] what i want to do to finish up with is just describe an experiment [0.6] er that i was working on last year [0.4] er on this particular question of embedded words [2.5] now embedded words are difficult things to work with [0.5] because [0.6] the only way we can really [0.2] test peoples' ability to hear them [0.5] is to cut them out of their context [0.4] and when you cut a word out of context [0.4] it suddenly stops [0.4] sounding [0.5] er [0.3] recognizable and familiar [0.9] er [0.3] i've done lots of this and i've got a couple of examples on tape [0.2] there's one that i use a lot [0.3] this is an example where you know [0.7] er what i've got here is is [0.2] a large number of [0.2] extracted versions of one particular word [0.3] which is hundred [0.3] just pulled out of one of our [0.3] big er computer corpora [0.2] automatically [0.4] by one of our research computers [0.4] and since you know what the word is [0.2] you can recognize the word every single time [19.6] nm0858: it goes on for hours [0.2] but this is just the you know we just set the computer loose on going through [0.4] hours of 
speech looking for the word hundred [0.3] cutting the word out and playing it out onto the tape [0.6] but what we did for the [0.4] perception experiment on embedded words [0.4] was to take out words [0.3] which were [0.2] er quite identifiable to us as experimenters [0.3] but when [0.3] cut out without any context [0.3] and without any information [0.2] and presented to naive listeners [0.2] were al-, [0.2] very often unrecognizable [0.5] now i haven't got the text for this here [0.3] but what you'll hear is that you can recognize some words [0.5] er these these are actually extracted from the same [0.3] corpus of recordings [0.3] as those wor-, those [0.2] words hundred that you just heard [0.3] it's from a corpus called MARSEC corpus [0.4] that er we've been working with for many years [0.4] in in my group [2.0] nm0858: this is the kind of thing that people had to listen to for our experiment to [0.3] try and identify the words [4.5] each one's said twice [14. 3] that's fairly easy [1.3] that's city [6.0] that's just [15.7] okay [1.5] let me [0.2] just explain what this er [0.9] experiment was trying to do [1.1] er [0.7] we started off by using this MARSEC [0.2] database [1.1] and we went through three stages first of all we had to select [0.3] data [0.8] and what were doing was looking for pairs of words [0.8] in [0.5] er the [0.2] data that we had recorded [0.5] where we could match [0.2] from the same speaker [0.7] a full word [1.0] like it might be response [1.3] and [0.3] something which seemed to be the same [0.4] which existed as an embedded word [2.4] so that we had pairs of words [0.3] er [0.2] although we separated them out in the tapes [0.5] so that sometimes people were listening [0.2] er i mean we heard the word just on that tape [0.4] in some cases people heard the word just [0.5] out of a sentence that sa-, said things like [0.2] i was just going down the road [0.3] or [0.3] he was a just man [0.7] but in some other cases [0. 
2] from the same speaker [0.4] we had the word just [0.4] from [0.4] er [0.3] a a word like adjustment [0.5] okay [0.3] that's an embedded word [0.4] the word just [0.3] sits inside the word adjustment [0.3] and we cut it out [0.5] and the idea was to find [0.3] by testing listeners' perception [0.4] whether [0.3] our [0.2] listeners were more successful at hearing [0.3] the embedded words [0. 3] or [0.3] the [0.4] er what we call the real words the words which genuinely had [0.3] a word boundary at either [0.2] side [3.3] er and so we went through and we [1.0] er this is work er done jointly with the [0.6] er [0.2] with Anne Cutler's group in the Max Planck Institute for Psycholinguistics in [0.3] in Nijmegen in Holland [0.7] er and we spent [0.2] very very large amount of time [0.4] going through extracting these pairs of examples [0.3] and then recording them in random order [0.5] for for listening tests [4.1] now the first thing when we'd done all this [0.3] was we the experimenters listened to tapes to see if we could hear the difference [0.5] and we could of course we'd been working on this for years [0.2] so it's not surprising [0.3] that we could tell the difference between real words and embedded words [0.7] er [0.2] we actually did t-, a [0. 
2] test on this [0.4] er as er experts [0.4] er these are the [0.4] statistical results and the main thing is that the [0.4] er [0.3] er probability value is point-zero-zero-seven-six [0.3] which means that the [0.3] difference between real embedded wor-, and embedded words [0.3] in terms of us recognizing which was which [0.3] was highly significant [0.5] so [0.3] we were able as the experts running the experiment [0.3] we were able to distinguish between real and embedded words [1.8] there's nothing very surprising about that [0.6] then we played [0.4] these words to naive listeners [0.4] who had had no previous experience of working [0.3] with this kind of problem [0.6] and [cough] we worked out scores [0.5] for [0.2] how many words [0.4] they got [0.5] correct [6.1] er [0.6] the i won't i won't it would take too long to explain what these er success [0.2] scores er [0.4] were actually calculated on [0.3] but we get [0.2] a much higher success rate [0.2] here six-point-one-five [0.3] on real words [0.2] compared with four-point-three-three on the embedded words [0.3] and that difference there is very highly significant with a probability value [0.3] of point-zero-zero-zero-four [0.9] so [0.3] there was no doubt at all that our [0. 
3] listeners [0.5] did better [0.5] on [0.2] real words [0.2] rather than embedded words [5.7] remember that these words were presented completely out of context [0.3] and therefore our listeners had nothing to go on [0.3] except what they could hear from the tape [0.8] and the only conclusion you can make from that [0.3] is [0.2] that there is something there phonetically [0.4] that enables you [0.3] to [0.2] tell [0.6] what is a word and what is part of a word [0.6] to enable you to distinguish between [0.4] bits of words [0.3] and [0.2] whole words [8.9] so we went back to the tapes and we spent a lot of time listening to them and i spent [0.2] er [0.4] quite a lot of time [0.3] er over in Nijmegen [0.2] working through every single word [0.4] doing a very detailed phonetic examination of each word [0.5] and the thing that was coming out more and more clearly was [0. 2] that the embedded words [0.2] were shorter [0.3] than the corresponding real words [2.4] if we look at that in graphical form [0.7] er [0.3] what we find [0. 3] here the-, these these are box plots [0.3] that's the scale of duration on the left hand side going from [0.3] a hundred to five-hundred milliseconds [0. 
4] er [0.2] er this box covers most of the data [0.4] and in the case of embedded words [0.5] the [0.3] er duration was [0.4] f-, rather shorter [0.4] than [0.3] the duration of the real words it's it's not a big difference but it's enough to be [0.3] statistically significant [0.4] embedded words tend to be a bit shorter [0.6] than [0.2] the real word [0.6] probably er [0.2] the difference is [0.3] er enough to be over the threshold of our [0.4] er [0.2] ability to perceive differences [0.4] in durations of [0.2] words and syllables [1.4] there was just one final question to answer [0.3] is it that just the entire body of embedded words is shorter [0.4] than [0.2] the whole [0.3] collection of real words [0.2] or is this a genuine relationship that each individual pair of words [0.4] will exhibit [0.3] a greater duration for the real word [0.3] and a shorter duration [0.4] for [0.3] the embedded word [0.7] er so er this is er [0.8] if this this is a rather messy graph but [0.3] it just shows the relationship [0.3] between the durations of embedded words [0.3] and the durations of [0.2] real words [0. 2] and you can see that centre line there [0.4] er represents a trend [0.5] er [0.3] which is that the [0.6] er [0.3] the longer [0.2] a real word is [0.3] the longer [0.2] an embedded word is that is they are [0.4] closely related [0. 5] however [0.3] for any given value of a real word like three-hundred here [0. 
4] the corresponding duration of an embedded word is shorter [0.9] so in er the case of all virtually all the words in our data [0.3] and [0.5] i mean i had to admit if you look at some of these dots they're way off that [0.2] centre line [0.3] there's a lot of variation [0.4] but the overall trend is that [0.3] for any given pair of words the embedded word [0.2] will be shorter [0.3] than the [0.2] real word [0.4] and that must be giving us the information that we need [0.3] to identify whether we're hearing a part of a word [0.4] or [0.3] the word as a whole [1.9] that work is still going on i'm still writing it up [0.4] er [0.2] but recently i had to give a talk on this at a conference [0.4] and er [0.4] as conference organizers do they asked me to write it up [0.4] to go er in a collection of papers [0.4] er and since it's a very sh-, er a short paper reporting on work in progress [0.4] er what i'd like to do is give you each a copy [0.3] so that you can go over this at [0.2] er at at more [0.2] leisure [0. 6] so there i was er [0.2] quarter of an hour before the lecture began [0.3] ready to go on the photocopier [0.3] when i looked at it and realized that it was an early draft which didn't have the diagrams and the statistics in [0.5] er when i went back i realized it's on my computer at home not on my computer at work [0.5] so i'm afraid you don't get it this morning [0.4] but i will put copies in namex's office [0.4] and those will be available tomorrow onwards [0. 
4] so if you'd like a copy of the [0.4] most recent paper i've written based on this research [0.3] er [0.4] er er and the bibliography that goes with it [0.3] er there will be enough for one each [0.6] er on the other hand er if you're not interested just leave it there and i'll give it to somebody else [1.2] that gets us to the end of that [0.2] and also to the end of [0.4] the study of [0.4] the relationship between temporal factors and speech perception [0.5] and [0.3] i hope that the [0.4] the general impression that you've got on this [0.4] is that we are not simple [0.4] phoneme crunchers when it comes to [0.3] perceiving speech [0.3] we are not simply taking in a stream of phonemes [0.3] looking them up in a mental dictionary [0.3] and [0.5] churning out a kind of transcript [0.5] what we're doing is [0.2] at the same time monitoring [0.3] a very rich [0.6] er [0.5] stream of prosodic information [0.4] and in some cases also of allophonic variation [0.5] but it's the prosodic side i really want to emphasize [0.2] there is so much going on in the prosody of spoken language [0. 6] it's giving us so much information about [0.2] how to divide the speech up into units [0.3] and how to interpret it [0.4] and it just has to be [0.4] er something of great importance [0.4] er it's something which we only understand in a very dim [0.4] and partial way at the moment but [0.3] a lot more research will be [0.3] er going on [0.3] in future years [0.2] and we should discover more and more about it [0.3] and ultimately we can teach computers that recognize speech [0.3] how to make intelligent use of that information [1.3] is that okay are there any [0.3] questions [4.0] okay [1.8] right then
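
[Editor's note: the lexicon-matching procedure the lecturer walks through (match a prefix of the incoming stream against the mental lexicon, trash any hypothesis whose remainder cannot be parsed, and cycle back to try again) can be sketched in a few lines of code. This is a minimal illustrative sketch, not the lecturer's system: the tiny lexicon is invented, and letters stand in for phonemes.]

```python
# Sketch of the backtracking word-boundary search described in the lecture:
# scan a continuous stream of symbols (letters stand in for phonemes here),
# test each prefix against a mental lexicon, and abandon ("trash") any
# hypothesis whose remainder cannot be carved into whole words.
# The toy lexicon is an illustrative assumption, not experimental data.

LEXICON = {"just", "ice", "justice"}

def segment(stream, lexicon=LEXICON):
    """Return every way of carving `stream` into whole lexicon words."""
    if not stream:
        return [[]]                      # empty remainder: one valid (empty) parse
    parses = []
    for end in range(1, len(stream) + 1):
        prefix = stream[:end]
        if prefix in lexicon:            # hypothesis: this prefix is a whole word
            for rest in segment(stream[end:], lexicon):
                parses.append([prefix] + rest)
        # otherwise fall through: the hypothesis fails, try a longer prefix
    return parses

print(segment("justice"))      # [['just', 'ice'], ['justice']] -- ambiguous
print(segment("adjustment"))   # [] -- "just" is embedded in it, but "ad" and
                               # "ment" match nothing, so every parse is trashed
```

Note that the purely lexical check leaves both parses of "justice" standing; as the lecture argues, it is prosodic and allophonic information (and, further up, syntax and meaning) that lets listeners decide between such hypotheses.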
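[Editor's note: the duration result is a paired design, with each embedded word matched against a real word from the same speaker. A paired t statistic is one standard way to quantify such a within-pair difference; the lecture does not say which test produced its p-values (0.0076 and 0.0004), and the durations below are invented purely for illustration.]

```python
# Toy illustration of a paired duration comparison: for each hypothetical
# (real word, embedded word) pair, take the duration difference and compute
# the paired t statistic, t = mean(d) / (stdev(d) / sqrt(n)).
# All numbers are made up; they are not the MARSEC data from the lecture.
import math
from statistics import mean, stdev

# hypothetical (real_ms, embedded_ms) duration pairs, in milliseconds
pairs = [(310, 270), (255, 240), (410, 360), (190, 185),
         (350, 300), (280, 260), (330, 290), (225, 205)]

diffs = [real - emb for real, emb in pairs]          # positive = real is longer
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
print(f"mean difference = {mean(diffs):.1f} ms, paired t = {t_stat:.2f}")
# prints: mean difference = 30.0 ms, paired t = 4.96
```

The pairing matters: it tests the lecturer's second question (is each embedded word shorter than its own matched real word?) rather than just comparing the two pooled distributions.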