52 349 Logic Programming

Chapter 12 - Natural Language Processing


§12.0 Introduction §12.2.2 Number Matching (continued)
§12.1 Definite Clause Grammars §12.3 Adding Further Arguments
§12.2 Number Matching §12.4 Concluding N.L.P Example
§12.2.1 Classification of Grammars


Previous Reading: Expert Systems & Databases

§12.0 Introduction

Why? Why N.L.P?

To take it further, consult ...

... Or the bible (though volumes 2+ never published)


Aside - written or oral/aural?

For the latter we need:

This is a highly non-linear process, with much back tracking between the stages
- whence development (inter alia) of blackboard architecture expert systems

The simpler subset of the last pair of problem stages is one of the written word ... but it is still very difficult!

For the former pair, consider (you need to listen to the sounds involved) -

Three years ago
Three years say go
Three year sago
Three here. A go

All are (just!) plausible

Which is right? The answer to that


Applications of N.L.P?

Perhaps we should also explicitly add

And how do we judge correctness?

Natural language processing really is difficult!


What we have met is the problem of ambiguity

Ambiguity can be global, or local - which gives us a problem:

What we have to do, is to stifle our ambition - take one step at a time!

Nonetheless, the usual approach is by immediate constituent analysis, leading to a syntactic structure, whence (with limits of integration 0 to S)

Meaning(S) = integral meaning(si) dsi


§12.1 N.L.P In Prolog - Definite Clause Grammars

Let's agree to represent sentences by lists of atoms

[it, is, prolog, time, again]

Clocksin & Mellish in their I/O chapter construct a program to produce such a list from input like

it is prolog time again.

Then what we must want to do, is gradually to strip a word at a time from our alleged sentence, to arrive at an empty list


The simplest sentences are of the form

or
or

For the first two, we have a noun phrase followed by a verb phrase (the "phrase" for future generality). The third starts the same way, but then gets trickier!

So perhaps the following as a starting point?

sentence(S,[]):-
noun_phrase(S,S1),
verb_phrase(S1,[]).

Yes, if we add:

noun_phrase([boys|L], L).
noun_phrase([girls|L], L).
noun_phrase([people|L], L).
verb_phrase([sing], []).
verb_phrase([dance], []).

And, for sentence three,

verb_phrase([play|L], []):-
noun_phrase(L, []),
noun_phrase([hockey], []).

Which works well enough - although it does permit semantic oddities like "hockey sing" and (perhaps not quite so odd) "boys play girls" - but is ... tedious. Why do we need to repeat the two arguments like [_|L] and L?

So ... more syntactic sugar, in the form of -->/2

And we write (into our file, for consulting)

sentence --> noun_phrase,
verb_phrase
noun_phrase --> [boys].
noun_phrase --> [girls].
% et cetera
verb_phrase --> [sing].
% et cetera
verb_phrase --> [play], noun ...
% No: we'll revisit this last one later!
% Done like this,
% we would need a further convention.

The Prolog interpreter, when it converts our file of Definite Clause Grammar material into "standard" Prolog, automatically adds in the two needed extra arguments which caused the tedium.

Sample File Prolog Listing
Run 1 Run 2

What we have done, is to arrive at the Backus-Naur Form (BNF) expression of a grammar

And we have exactly the problems and delights we are accustomed to with BNF. In particular, we can generate the sentences of our language, or parse an alleged sentence for correctness.


§12.2 Number Matching

Rules like these are perfectly satisfactory - but very limited!

We know that English has rules about number matching ...

Thus

boys sing Correct!
boy sings Correct!
boys sings Wrong!

One route is:

sentence --> singular_sentence.
sentence --> plural_sentence.
singular_sentence --> singular_noun_phrase,
singular_verb_phrase.
plural_sentence --> plural_noun_phrase,
plural_verb_phrase.
% etcetera.

But this is very messy:

sentence --> sentence(Number).
sentence(Number) --> noun_phrase(Number),
verb_phrase(Number).
noun_phrase(Number) --> noun(Number).
verb_phrase(Number) --> verb_phrase(Number).
noun(singular) --> [boy].
noun(plural) --> [boys].
verb(singular) --> [sings].
verb(plural) --> [sing].

The D.C.G. notation will accept this perfectly happy

But notice that in doing this we have changed the style of our grammar (not surprisingly: after all, we are encoding extra information) ... it has changed from being context free to become context sensitive.


§12.2.1 Classification of Grammars

As an aside, we recall (material amplified in the Compilers Class 52 348) Chomsky's classification of grammars, from the most general (and least tractable) to the simplest

Remember that types 1 - 3 (but not 0) are decidable ... that is, we can build an algorithm which will check for any string x whether x is a member of the language generated by the grammar.

But in N.L.P, decidability is of little interest ... what we are usually concerned with is parsing a given string, rather than with recognising one as valid.

Our excursion into number matching (and any related cross-check, for example building a parse tree, which involves extra parameters in our D.C.G) always potentially takes us from a context-free world to a context-sensitive one, from type 2 grammars to type 1 grammars


§12.2.2 Number Matching (continued)

Earlier we listed, explicitly, both 'boy' and 'boys' as nouns, one singular, the other plural.

Surely we can get round this, can use the fact that many nouns have a plural of the form, singular + s?

Of course we can!

But it needs some extra D.C.G symbolism - consider

noun(plural) --> [Noun],
% and what?

We need two things:

  1. to decompose "Noun" into characters - easy:
    name(Noun, Noun_list)

  2. to prevent the interpreter adding the two extra D.C.G parameters

We can guard against the latter by enclosing the "proper" Prolog within curly brackets

Thus we might build up our pluralisation code for "Noun" as

noun(plural) -->
[Noun],
{ ( name(Noun, Noun_list),
append(Single_list, "s", Noun_list),
name(Single, Single_list),
noun(singular, [Single|L], L)
) }.

Note both the curly brackets and the convention (required by some systems) of internal brackets within them.


Of course, this pluralisation rule only holds most of the time, not all of it
- for example, "fly" becomes "flies", and "mouse" becomes "mice"

To cope with that, we have no option but to code the exceptions explicitly:

noun(singular) --> [fly].
noun(plural) --> [flies], !.
noun(singular) --> [mouse].
noun(plural) --> [mouse], !.

Notice both the use of the cut - to eliminate "flys", "mouses" - and the fact that normally it does not require wrapping in curly brackets.


§12.3 Adding Further Arguments

In exactly similar ways we can cope with the plurality of verbs

so that we can extend the arrangement to cope with things like

I eat We eat
You eat You eat
He/She/It eats They eat

or with tenses and moods.

Most importantly, we can extend it to cover parse trees

sentence (
noun_phrase (
noun (boys)
)
verb_phrase (
verb (sing)
)
)

Important - but too much for now!

Besides anything else, it takes us dangerously close to the idea of the "deep meaning" of a sentence.


§12.4 Concluding N.L.P Example

The only sentences we have looked at, have been very simple. Let's end by being a little more realistic - but ignoring, for simplicity, number and parse trees and the like.

sentence --> noun_phrase,
verb_phrase.
noun_phrase --> determiner,
adjective,
noun.
verb_phrase --> verbal_phrase.
verb_phrase --> transitive_verbal_phrase,
noun_phrase.
verbal_phrase --> verb,
adverb.
transitive_verbal_phrase --> transitive_verb,
adverb.

But now we find that "boys sing" is unacceptable! We must add three things:

noun_phrase --> determiner,
noun.
determiner --> []. % For plurals.
verbal_phrase --> verb.

Life quickly gets very complicated, even for what are officially "simple sentences"


© Paul Goldfinch 1997 Return to 52 349 Menu