§12.0    Introduction
§12.1    Definite Clause Grammars
§12.2    Number Matching
§12.2.1  Classification of Grammars
§12.2.2  Number Matching (continued)
§12.3    Adding Further Arguments
§12.4    Concluding N.L.P Example
Why? Why N.L.P?
To take it further, consult ...
... Or the bible (though volumes 2+ never published)
Aside - written or oral/aural?
For the latter we need:
This is a highly non-linear process, with much backtracking between the stages
- whence the development (inter alia) of blackboard-architecture expert systems
The simpler subset, covering only the last pair of problem stages, is that of the written word ... but it is still very difficult!
For the former pair, consider (you need to listen to the sounds involved) -
    "Three years ago"
    "Three years say go"
    "Three year sago"
    "Three here. A go"
All are (just!) plausible
Which is right? The answer to that depends on context
Applications of N.L.P?
Perhaps we should also explicitly add
And how do we judge correctness?
Natural language processing really is difficult!
What we have met is the problem of ambiguity
Ambiguity can be global, or local - which gives us a problem:
What we have to do is stifle our ambition - take one step at a time!
Nonetheless, the usual approach is by immediate constituent analysis, leading to a syntactic structure (a parse tree rooted at S)
Let's agree to represent sentences by lists of atoms
Clocksin & Mellish in their I/O chapter construct a program to produce such a list from input like
What we want to do, then, is gradually strip one word at a time from our alleged sentence, arriving at an empty list
The simplest sentences are of the form
For the first two, we have a noun phrase followed by a verb phrase (the "phrase" for future generality). The third starts the same way, but then gets trickier!
So perhaps the following as a starting point?
Yes, if we add:
And, for sentence three,
Which works well enough - although it does permit semantic oddities like "hockey sing" and (perhaps not quite so odd) "boys play girls" - but it is ... tedious. Why do we need to repeat the two list arguments, [_|L] and L, in every rule?
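A minimal sketch of such an explicit parser, assuming a tiny vocabulary (boys, girls, hockey; play, sing) - every rule threads the pair of list arguments by hand:

```prolog
% sentence(S0, S): S0 is a list of words; S is what remains
% after consuming a sentence from the front of S0.
sentence(S0, S) :-
    nounphrase(S0, S1),
    verbphrase(S1, S).

% here a noun phrase is just a noun; each rule again
% strips words from the front of the list.
nounphrase(S0, S) :- noun(S0, S).

verbphrase(S0, S) :- verb(S0, S).
verbphrase(S0, S) :- verb(S0, S1), noun(S1, S).

% lexical rules: consume one word, leave the tail.
noun([boys|L], L).
noun([girls|L], L).
noun([hockey|L], L).
verb([play|L], L).
verb([sing|L], L).
```

A query such as ?- sentence([boys,play,hockey], []). succeeds exactly when the whole list is consumed.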
So ... more syntactic sugar, in the form of -->/2
And we write (into our file, for consulting)
The Prolog interpreter, when it converts our file of Definite Clause Grammar material into "standard" Prolog, automatically adds in the two needed extra arguments which caused the tedium.
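In D.C.G notation the same small grammar might read as follows (a sketch; the comment shows roughly what the interpreter generates from the first rule):

```prolog
sentence --> nounphrase, verbphrase.

nounphrase --> noun.

verbphrase --> verb.
verbphrase --> verb, noun.

noun --> [boys].
noun --> [girls].
noun --> [hockey].
verb --> [play].
verb --> [sing].

% On consulting, the translator adds the two list arguments
% itself; the first rule becomes (approximately):
%   sentence(S0, S) :- nounphrase(S0, S1), verbphrase(S1, S).
```

We can then parse - ?- sentence([boys,play,hockey], []). - or generate, by leaving the word list unbound: ?- sentence(S, []).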
What we have done is arrive at the Backus-Naur Form (BNF) expression of a grammar
And we have exactly the problems and delights we are accustomed to with BNF. In particular, we can generate the sentences of our language, or parse an alleged sentence for correctness.
Rules like these are perfectly satisfactory - but very limited!
We know that English has rules about number matching ...
One route is:
But this is very messy:
The D.C.G. notation will accept this perfectly happily
sentence/1 as we have it maps into the true Prolog
But notice that in doing this we have changed the style of our grammar (not surprisingly: after all, we are encoding extra information) ... it has changed from being context free to become context sensitive.
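A sketch of one such route, carrying the number as an extra argument (vocabulary assumed, as before):

```prolog
% the shared Number variable enforces agreement between
% subject and verb.
sentence --> nounphrase(Number), verbphrase(Number).

nounphrase(Number) --> noun(Number).
verbphrase(Number) --> verb(Number).
verbphrase(Number) --> verb(Number), noun(_).

noun(singular) --> [boy].
noun(plural)   --> [boys].
verb(singular) --> [sings].
verb(plural)   --> [sing].

% The translation now threads three arguments, e.g.
%   sentence(S0, S) :-
%       nounphrase(Number, S0, S1),
%       verbphrase(Number, S1, S).
```

"boy sings" and "boys sing" now parse, while "boy sing" fails on the shared Number variable.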
As an aside, we recall (material amplified in the Compilers Class 52 348) Chomsky's classification of grammars, from the most general (and least tractable) to the simplest: type 0 (unrestricted), type 1 (context sensitive), type 2 (context free), and type 3 (regular)
Remember that types 1 - 3 (but not 0) are decidable ... that is, we can build an algorithm which will check for any string x whether x is a member of the language generated by the grammar.
But in N.L.P, decidability is of little interest ... what we are usually concerned with is parsing a given string, rather than with recognising one as valid.
Our excursion into number matching (and any related cross-check, for example building a parse tree, which involves extra parameters in our D.C.G) always potentially takes us from a context-free world to a context-sensitive one, from type 2 grammars to type 1 grammars
Earlier we listed, explicitly, both 'boy' and 'boys' as nouns, one singular, the other plural.
Surely we can get round this, can use the fact that many nouns have a plural of the form,
Of course we can!
But it needs some extra D.C.G symbolism - consider
We need two things: a way of calling ordinary Prolog goals from within a grammar rule, and a way of stopping the translator from adding the two list arguments to those goals
We can guard against the latter by enclosing the "proper" Prolog within curly brackets
Thus we might build up our pluralisation code for "Noun" as
Note both the curly brackets and the convention (required by some systems) of internal brackets within them.
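A sketch of such a rule, using a hypothetical helper noun_root/1 and the built-in atom_concat/3 to add the 's' (an older system would use name/2 for the same job):

```prolog
noun_root(boy).
noun_root(girl).

noun(singular) --> [Word], { noun_root(Word) }.

% the curly brackets protect the enclosed goals from the
% translator; note the internal brackets round the conjunction.
noun(plural) --> [Word],
    { ( atom_concat(Root, s, Word),
        noun_root(Root) ) }.
```

Parsing "girls", atom_concat/3 splits the word into the root "girl" plus 's', and noun_root/1 confirms the root.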
Of course, this pluralisation rule only holds most of the time, not all of it
- for example, "fly" becomes "flies", and "mouse" becomes "mice"
To cope with that, we have no option but to code the exceptions explicitly:
Notice both the use of the cut - to eliminate "flys", "mouses" - and the fact that normally it does not require wrapping in curly brackets.
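In sketch form, again assuming the hypothetical noun_root/1 plus an irregular/2 table of exceptions; the cut sits directly in the rule body, outside any curly brackets:

```prolog
noun_root(boy).
noun_root(fly).
noun_root(mouse).

irregular(fly, flies).
irregular(mouse, mice).

% irregular plurals first; the cut commits to them, and the
% general add-an-s rule below refuses irregular roots, so
% "flys" and "mouses" are rejected when parsing.
noun(plural) --> [Word], { irregular(Root, Word), noun_root(Root) }, !.
noun(plural) --> [Word],
    { ( atom_concat(Root, s, Word),
        noun_root(Root),
        \+ irregular(Root, _) ) }.
```

So "flies" and "mice" parse via the first clause, "boys" via the second, and "flys" and "mouses" fail outright.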
In exactly similar ways we can cope with the plurality of verbs
so that we can extend the arrangement to cope with things like
                 singular           plural
  1st person:    I eat              We eat
  2nd person:    You eat            You eat
  3rd person:    He/She/It eats     They eat
or with tenses and moods.
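In sketch form, person simply joins number as a further argument (all names here are assumptions, not the course's own code):

```prolog
pronoun(first,  singular) --> [i].
pronoun(first,  plural)   --> [we].
pronoun(second, _)        --> [you].
pronoun(third,  singular) --> [he] ; [she] ; [it].
pronoun(third,  plural)   --> [they].

% "eats" only for third person singular; "eat" for the rest.
verb(third, singular) --> [eats].
verb(Person, Number)  --> [eat],
    { \+ (Person = third, Number = singular) }.

sentence --> pronoun(Person, Number), verb(Person, Number).
```

Then "he eats" and "we eat" parse, while "he eat" and "we eats" fail.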
Most importantly, we can extend it to cover parse trees
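In barest outline, each rule just builds a term out of its constituents' trees via one more argument (a sketch only):

```prolog
sentence(s(NP, VP))  --> nounphrase(NP), verbphrase(VP).
nounphrase(np(N))    --> noun(N).
verbphrase(vp(V))    --> verb(V).
verbphrase(vp(V, N)) --> verb(V), noun(N).

noun(noun(boys)) --> [boys].
verb(verb(sing)) --> [sing].
```

A query ?- sentence(Tree, [boys,sing], []). binds Tree = s(np(noun(boys)), vp(verb(sing))).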
Important - but too much for now!
Besides anything else, it takes us dangerously close to the idea of the "deep meaning" of a sentence.
The only sentences we have looked at have been very simple. Let's end by being a little more realistic - but ignoring, for simplicity, number, parse trees, and the like.
But now we find that "boys sing" is unacceptable! We must add three things:
Life quickly gets very complicated, even for what are officially "simple sentences"
© Paul Goldfinch 1997