What was I thinking when I said or typed "Find the best restaurant"? To a human, this statement is obvious: a clear, declarative desire to get something to eat, and to find the most popular choice in the area. But to a machine, the intent is not at all clear, even for such a simple search query. Now consider the phrase "pain treatment". Here the intent is not so clear even to a human listener. Is the pain physical, emotional, existential? Perhaps the best treatment is a psychiatrist instead of ibuprofen.
A person looking for intelligent responses would need to add some context in
the form of a prepositional phrase or two to have any chance of narrowing the
results of a search. Even a person would not understand the true intent of
"pain treatment" until the user added this: "pain treatment for
migraines". Now we and the machine know to narrow the search to the field
of medicine, specifically neurological medications. However, what if the person
really meant "holistic treatment for migraines"? With the addition of
an adjective, the focus shifts from drug therapy to organic protocols such as
meditation, light reduction techniques, and perhaps herbal or traditional
Chinese teas and tinctures.
This simple example of communication highlights the nature of language as
ambiguous and changing, depending on the context. The semantic weight of
various components is essential to understanding the desire of the speaker.
Therefore the machine must also learn the semantics and grammatical
structure of the language at hand. What is a noun? An adjective? How do you
recognize and break down a prepositional phrase? More importantly, are there
verbs involved? Verbs indicate actions, and actions are vectors of intent.
"Intent" in NLP is the desired outcome of a behavior, and behaviors are
expressed through verbs: how you behave is what you do. So it should be
straightforward to see the value of verb analysis, rather than just counting
nouns for statistical purposes.
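To make the idea concrete, here is a minimal sketch of verb-first analysis using a tiny hand-written lexicon. The words and tags below are illustrative assumptions standing in for a real trained part-of-speech tagger, not the output of any actual NLP library.

```python
# Toy part-of-speech lexicon; a real system would use a trained tagger.
LEXICON = {
    "find": "VERB", "treat": "VERB", "book": "VERB",
    "pain": "NOUN", "treatment": "NOUN", "migraines": "NOUN",
    "restaurant": "NOUN", "best": "ADJ", "holistic": "ADJ",
    "for": "PREP", "the": "DET",
}

def tag(query):
    """Attach a coarse part-of-speech tag to each token."""
    return [(w, LEXICON.get(w, "UNK")) for w in query.lower().split()]

def action_verbs(query):
    """Verbs are the 'vectors of intent': pull them out first."""
    return [w for w, pos in tag(query) if pos == "VERb".upper()]

print(tag("holistic treatment for migraines"))
print(action_verbs("find the best restaurant"))  # the action word, not the nouns
```

Even this toy version shows the shift in emphasis: "find" carries the intent of the query, while "restaurant" merely scopes it.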
Stringing together words is a skill that children pick up from their social
environment. Consider a child saying "The dog bited me," and a mother or
father correcting it to "the dog bit me," thus reinforcing the irregular
past-tense verb form in the child's understanding. The importance of
semantics in search, and in the broader NLP endeavor, cannot be overstated.
Many computational linguists rely on statistics: count the nouns in a text and how many times each appears, then group related terms (a technique called clustering) to see what a document contains. But statistics are not enough. To gain more than a topical understanding of text, or of conversations transcribed to text such as a phone call, the ideas or subject matter must be discerned from the context of the larger subject matter. A paragraph has a theme or major idea. If that paragraph contains a lot of medical jargon, to use the original example above, it could be about migraines without ever using that term.
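The purely statistical view can be sketched in a few lines: count occurrences of domain terms to guess a document's topic. The jargon list here is an illustrative assumption, not a real medical vocabulary.

```python
from collections import Counter

# Hypothetical domain vocabulary for illustration only.
MEDICAL_JARGON = {"aura", "photophobia", "triptan", "neurologist", "nausea"}

def topic_score(text):
    """Count how many domain terms appear in the text."""
    counts = Counter(w.strip(".,").lower() for w in text.split())
    return sum(counts[term] for term in MEDICAL_JARGON)

doc = ("The patient reported photophobia and nausea. "
       "The neurologist prescribed a triptan.")
print(topic_score(doc))  # a high score suggests a medical document,
                         # without the word 'migraine' ever appearing
```

Notice what the count cannot tell you: whether the speaker wants relief, information, or a second opinion. That gap is exactly the intent problem.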
NLU, or Natural Language Understanding, attempts to solve this issue. To
read a document sentence by sentence, paragraph by paragraph, is more complex
than just looking at heading titles and counting nouns. NLP and its
sub-disciplines, NLU and NLG (Natural Language Generation), rely on grammar
parsers, part-of-speech tagging, and other tools to break the sentence down
into component parts, examine each part's relationship to the rest of the
sentence, and apply
rules to grasp its core meaning. Then the paragraph can be examined, again
looking at structure and content, to label that grouping of sentences with tags
indicating the topic, the major ideas, and other nuances of semantic
significance.
But all of this still does not answer the question of intent. What does the
person really want, or want to convey, when they make an utterance? Sometimes I don't
even know what I meant by what I just said. I think of different ways to phrase
it, to get just the right emphasis, the right idea across to my listener. This
is why people practice speeches, instead of speaking extemporaneously. Some
people just say what they're thinking, then have to explain what they really
mean by the words they chose. The ensuing verbal vomit results in comedy.
Detection of sarcasm, irony,[2]
and the like is another problem. Were they serious, or being emotionally
manipulative? All of these issues have given rise to an NLP practice called
"sentiment analysis", yet another sub-discipline: determining the
emotional attitude of the speaker.
As you may be beginning to realize, the many factors behind
"intent" in speech analysis create a fascinating landscape for
research. And they open the door to bias and pollution by world views that are
in conflict.
Teaching a Machine to Understand Intent
When we interact with another human, we rarely recognize the surface
behaviors. What the brain cares about is the underlying message. Facial
expression, hand motions, body language are all subconsciously cataloged. But
the computer often does not have these types of kinematic inputs available. The
machine only has the plain text in front of it. "A generative knowledge
system underlies our skill at discerning intentions, enabling us to comprehend
intentions even when action is novel and unfolds in complex ways over time.
Recent work spanning many disciplines illuminates some of the processes
involved in intention detection."[3]
Even the medical community struggles to define and quantify the elements of
communication leading to a discernment of intent.
As we have detailed before, a computer is a logic machine, working
step-by-step to break a problem down into a number of processes, sub-processes,
and algorithms that interact to create an outcome or goal. Here are the common
steps for NLU, the first part of intent analysis.
- Analyze the Topic: extract key concepts and the overall subject matter.
- Extract Relevant Context: what is the general background of the discussion?
- Syntactic Analysis: sentence structure and the defined meaning of nouns and other parts of speech, using a part-of-speech tagger.
- Identify Actors: who are the people, organizations, and agents involved, and how important are they compared to each other?
- Semantic Analysis: resolve the contextual meaning of a word where it may have multiple meanings.
- Sentiment Analysis: what are the 'moods' of the user? Emotion, attitude, mental state.
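The steps above can be sketched as a single pipeline. Every analysis here is a deliberately naive placeholder (keyword lookups), standing in for the real NLP components the list names; none of the vocabularies or heuristics are from an actual system.

```python
def analyze_utterance(text):
    """Toy NLU pipeline: one placeholder per step listed above."""
    words = set(text.lower().split())
    return {
        "topic":     "medical" if {"migraine", "treatment"} & words else "general",
        "context":   "health discussion" if "treatment" in words else "unknown",
        "syntax":    text.split(),                       # stand-in for a real parse
        "actors":    sorted(w for w in text.split() if w.istitle()),
        "semantics": "therapy" if "treatment" in words else None,
        "sentiment": "negative" if "pain" in words else "neutral",
    }

result = analyze_utterance("Alice wants holistic treatment for her migraine pain")
print(result["topic"], result["actors"], result["sentiment"])
```

The structure is the point: each key would be produced by its own trained component in a production system, and intent is inferred only after all six are combined.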
Discerning the actor's intent is one of NLU's key strengths. To identify
intent successfully, NLU leverages translation, text summarization,
clustering, speech recognition, topic analysis, named entity recognition,
semantic analysis, and the other subcomponents of the NLP toolkit that have
been developed through intensive research. Most current use cases relate to
consumer analysis applied to commerce and support systems.
An example of a good, multilingual sentiment analysis and intent discernment
tool is Rosette,
available in over 30 languages. It provides a combination of tools (as listed
above) that automate the effort. There are other services, such as MarsView.ai, that ease the
development cycle.
Data Implications
As with any ML (machine learning) problem, the quality of the training data for a model
determines its viability and value. Many use cases, including the one around
intent, require labeled data to give the model a jumpstart in building its view
of the content. Business goals determine in large part what labels a client
chooses. If you are in the medical industry, you choose medical terms as the
labels. If you are in technology, you choose programming concepts, terms, and
structures as the labels.
Why is this important? Essentially you are trying to create vectors pointing
to a common meaning. If you parse the 'concept' of Mercury, you need to
indicate the domain in which the term has meaning. Is it a god from Roman
mythology? A chemical element? A planet? Or the car from Ford in 1938? See the
problem? Context matters. The model is only as good as the tags you put on the
data when it's cleansed and prepared for processing.
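A toy illustration of why those labels matter: disambiguate "Mercury" by overlap with hand-labeled domain vocabularies. The vocabularies below are illustrative assumptions, not real training labels.

```python
# Hypothetical label sets, one per candidate domain.
DOMAINS = {
    "mythology":  {"god", "roman", "messenger", "winged"},
    "chemistry":  {"element", "metal", "liquid", "toxic"},
    "astronomy":  {"planet", "orbit", "sun", "crater"},
    "automotive": {"ford", "car", "model", "1938"},
}

def disambiguate(term, sentence):
    """Pick the domain whose vocabulary overlaps the context most."""
    context = set(sentence.lower().replace(".", "").split())
    return max(DOMAINS, key=lambda d: len(DOMAINS[d] & context))

print(disambiguate("Mercury", "Mercury is the planet closest to the sun."))
```

Swap the surrounding sentence and the same word lands in a different domain; the labels, not the term itself, carry the meaning.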
And what's with the 'vector' thing? Word embeddings, called vectors, are
representations of text data where words with similar contextual meaning have a
similar representation. In other words, synonyms: what are all the various ways
to say the same thing? A Roget's Thesaurus comes to mind. Words from the text
are represented as calculated vectors in a predefined vector space, or domain
of knowledge. One of the most common tools for this is Word2Vec,
which superseded LSA
(Latent Semantic Analysis). Each looks to create an understanding of
a word based on the context in which it is used. The creation of a vector space
is more graph-ical (as in graph-based) than the tree-structured approach of an
ontology. In a coordinate-based system, related words sit in close
proximity to one another, based on a corpus of relationships that can be
calculated and turned into mathematical formulas.
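The "words as coordinates" idea reduces to simple geometry: similar words have a small angle between their vectors, measured by cosine similarity. Real embeddings (e.g. from Word2Vec) have hundreds of dimensions; the 3-dimensional numbers below are made up for illustration.

```python
import math

# Toy 3-dimensional embeddings; values are invented for illustration.
VECTORS = {
    "migraine": [0.9, 0.8, 0.1],
    "headache": [0.85, 0.75, 0.15],
    "mortgage": [0.1, 0.05, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(VECTORS["migraine"], VECTORS["headache"]))  # close to 1
print(cosine(VECTORS["migraine"], VECTORS["mortgage"]))  # much lower
```

This is also where bias enters: if the training corpus places a word near loaded neighbors, every downstream similarity calculation inherits that placement.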
Again, why should we care? Here's why: These models are increasingly used to
drive services over the internet. And they may contain biases and suppositions
based in world views that we agree or disagree with. The outcome of the model
and how it is applied to a problem can be discriminatory at its very root if
the tagging and resulting word vectors are not carefully constructed. Training
a model with data that brings in multiple perspectives is essential to creating
a well-rounded knowledge domain.
An Impossible Dream?
Back to the Holy Grail. "What I meant to say was..." How often do
you hear people explaining themselves after saying something that is difficult
to understand? Even humans have a hard time with intent. Letting a machine do
the work for us means that multiple techniques must be deployed to get within
acceptable tolerances of what the "Natural" in NLP implies.
This is why it's a 'holy grail' type of problem in machine learning and AI.
The desire to parse out what a person really means when they say something is a
hard problem in computational linguistics. And we are far from a solution that
works well.
S. Bolding, Copyright © 2022 ·Boldingbroke.com
[1] Hsiu-Min Chuang, Ding-Wei Cheng, "Conversational AI over Military Scenarios Using Intent Detection and Response Generation," Current Approaches and Applications in Natural Language Processing, Appl. Sci. 2022, 12(5), 2702. doi: 10.3390/app12052494.
[2] D.A. Baird, J.A. Baird, "Sociolinguistically Informed Natural Language Processing: Automating Irony Detection," Trends in Cognitive Sciences, 2001 Apr 1;5(4):171-178. doi: 10.1016/s1364-6613(00)01615-6.
[3] D.A. Baird, J.A. Baird, "Discerning intentions in dynamic human actions," Trends in Cognitive Sciences, 2001 Apr 1;5(4):171-178. doi: 10.1016/s1364-6613(00)01615-6.