Thursday, May 2, 2024

Discerning Intent, the Holy Grail of Any NLP Solution

What was I thinking when I said or typed "Find the best restaurant"? To a human, this statement is obviously a declarative desire to get something to eat, and to find the most popular choice in the area. But to a machine, the intent is not at all clear, even for such a simple search query. Consider the phrase "pain treatment." Here the intent is not so clear even to a human listener. Is the pain physical, emotional, existential? Perhaps the best treatment is a psychiatrist instead of ibuprofen.

A person looking for intelligent responses would need to add some context, in the form of a prepositional phrase or two, to have any chance of narrowing the results of a search. Even a person would not understand the true intent of "pain treatment" until the user expanded it to "pain treatment for migraines." Now both we and the machine know to narrow the search to the field of medicine, specifically neurological medications. But what if the person really meant "holistic treatment for migraines"? With the addition of one adjective, the focus shifts from drug therapy to organic protocols such as meditation, light-reduction techniques, and perhaps herbal or traditional Chinese teas and tinctures.

This simple example highlights the ambiguous, shifting nature of language, where meaning depends on context. The semantic weight of the various components is essential to understanding the desire of the speaker. The machine must therefore also learn the semantics and grammatical structure of the language at hand. What is a noun? An adjective? How does one recognize and break down a prepositional phrase? More importantly, are there verbs involved? Verbs indicate actions, and actions are vectors of intent. 'Intent' in NLP is the desired outcome of a behavior, and behaviors are the outcome of verbs: how you behave is what you do. So it should be straightforward to see the value of verb analysis, rather than just counting nouns for statistical purposes.[1]
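To make the idea of verb-centric analysis concrete, here is a minimal sketch. A real system would use a trained part-of-speech tagger (spaCy or NLTK, for example); the hand-made verb lexicon and intent mapping below are invented purely for illustration.

```python
# Toy illustration of verb analysis as an intent signal. The lexicon and
# intent labels are hypothetical stand-ins for a trained POS tagger and
# intent classifier.
VERB_LEXICON = {"find", "treat", "book", "buy", "compare"}

INTENT_MAP = {
    "find": "search",
    "treat": "remedy",
    "book": "reserve",
    "buy": "purchase",
    "compare": "evaluate",
}

def detect_intent(utterance: str) -> str:
    """Return an intent label keyed off the first action verb found."""
    for word in utterance.lower().split():
        word = word.strip(".,!?")
        if word in VERB_LEXICON:
            return INTENT_MAP[word]
    return "unknown"

print(detect_intent("Find the best restaurant"))   # -> search
```

Even this toy version shows why verbs matter: "Find the best restaurant" and "Book the best restaurant" contain the same nouns, yet express entirely different intents.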

Stringing words together is a skill that children pick up from their social environment. Consider "The dog bited me," and then imagine the mother or father correcting the child to "the dog bit me," thereby reinforcing an irregular past-tense verb form in the child's understanding. Semantics cannot be underestimated in search, or in the broader NLP endeavor.

Many computational linguists rely on statistics: count the nouns and how many times they appear in a text, then cluster the results to see what a document contains. But statistics are not enough. To have more than a topical understanding of text, or of conversations transcribed to text such as a phone call, the ideas must be discerned from the context of the larger subject matter. A paragraph has a theme or major idea. If that paragraph contains a lot of medical jargon, to reuse the example above, it could be about migraines without ever using that term.
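The counting approach, and its limits, can be sketched in a few lines. The topic vocabularies below are invented for the example; in practice they would come from a learned model or curated lexicon. Note how the sample document is classified as migraine-related without ever containing the word "migraine."

```python
from collections import Counter

# Sketch of the "count the nouns" approach: a bag-of-words frequency
# profile matched against hand-made topic vocabularies (illustrative only).
TOPIC_TERMS = {
    "migraine": {"aura", "photophobia", "triptan", "neurologist", "throbbing"},
    "cooking":  {"saute", "simmer", "broth", "season", "knife"},
}

def likely_topic(text: str) -> str:
    counts = Counter(w.strip(".,").lower() for w in text.split())
    scores = {topic: sum(counts[t] for t in terms)
              for topic, terms in TOPIC_TERMS.items()}
    return max(scores, key=scores.get)

doc = "The neurologist prescribed a triptan after the aura and photophobia."
print(likely_topic(doc))   # -> migraine
```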

NLU, or Natural Language Understanding, attempts to solve this problem. Reading a document sentence by sentence and paragraph by paragraph is more complex than just looking at headings and counting nouns. NLP and its sub-disciplines, NLU and NLG (Natural Language Generation), rely on grammar parsers, part-of-speech tagging, and other tools to break a sentence into component parts, examine their relationships to the other parts, and apply rules to grasp the sentence's core meaning. Then the paragraph can be examined, again looking at structure and content, to label that grouping of sentences with tags indicating the topic, the major ideas, and other nuances of semantic significance.

But all of this still does not answer the question of intent. What does the person really want, or want to convey, when they make an utterance? Sometimes I don't even know what I meant by what I just said. I think of different ways to phrase it, to get just the right emphasis, the right idea, across to my listener. This is why people practice speeches instead of speaking extemporaneously. Some people just say what they're thinking, then have to explain what they really meant by the words they chose; the ensuing verbal vomit results in comedy. Detection of sarcasm, irony,[2] and the like is another problem: was the speaker serious, or being emotionally manipulative? These issues are addressed by yet another sub-discipline of NLP, "sentiment analysis," which seeks to determine the emotional attitude of the speaker.

As you may be beginning to realize, the many factors behind "intent" in speech analysis create a fascinating landscape for research. And they open the door to bias and pollution by world views that are in conflict.

Teaching a Machine to Understand Intent

When we interact with another human, we rarely register the surface behaviors consciously. What the brain cares about is the underlying message. Facial expressions, hand motions, and body language are all subconsciously cataloged. But a computer often does not have these kinematic inputs available; the machine only has the plain text in front of it. "A generative knowledge system underlies our skill at discerning intentions, enabling us to comprehend intentions even when action is novel and unfolds in complex ways over time. Recent work spanning many disciplines illuminates some of the processes involved in intention detection."[3] Even the medical community struggles to define and quantify the elements of communication that lead to a discernment of intent.

As we have detailed before, a computer is a logic machine, working step-by-step to break a problem down into a number of processes, sub-processes, and algorithms that interact to create an outcome or goal. Here are the common steps for NLU, the first part of intent analysis.

  • Analyze the Topic: Extract key concepts and the overall subject matter.
  • Extract Relevant Context: Establish the general background of the discussion.
  • Syntactic Analysis: Determine sentence structure and the roles of nouns and other parts of speech using a part-of-speech tagger.
  • Identify Actors: Determine which people, organizations, and agents are involved, and how important they are relative to one another.
  • Semantic Analysis: Resolve the contextual meaning of a word where it may have multiple meanings.
  • Sentiment Analysis: Gauge the 'moods' of the user: emotion, attitude, mental state.
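The steps above can be sketched as a pipeline of stages, each stamping one annotation onto a shared analysis record. Every stage below is a hard-coded stub standing in for a trained model or tagger; the function names and their outputs are invented for illustration.

```python
# Minimal sketch of the NLU steps as a staged pipeline. Each stage adds
# one annotation to the analysis dict; real stages would wrap models.
def analyze_topic(a):    a["topic"] = "medicine";           return a
def extract_context(a):  a["context"] = "patient inquiry";  return a
def syntactic(a):        a["pos"] = [("pain", "NOUN"), ("treatment", "NOUN")]; return a
def identify_actors(a):  a["actors"] = ["user"];            return a
def semantic(a):         a["sense"] = "physical pain";      return a
def sentiment(a):        a["mood"] = "concerned";           return a

PIPELINE = [analyze_topic, extract_context, syntactic,
            identify_actors, semantic, sentiment]

def run_nlu(text):
    analysis = {"text": text}
    for stage in PIPELINE:
        analysis = stage(analysis)
    return analysis

result = run_nlu("pain treatment for migraines")
print(result["topic"], result["mood"])   # -> medicine concerned
```

The design point is the ordering: each later stage (semantics, sentiment) can consult the annotations the earlier stages produced.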

Discerning the actor's intent is one of NLU's key strengths. To identify intent successfully, NLU leverages translation, text summarization, clustering, speech recognition, topic analysis, named entity recognition, semantic analysis, and the other subcomponents of the NLP toolkit that have been developed through intensive research. Most current use cases relate to consumer analysis applied to commerce and support systems.

An example of a good, multilingual sentiment analysis and intent discernment tool is Rosette, available in over 30 languages. It provides a combination of tools (as listed above) that automate the effort. There are other services out there such as MarsView.ai to ease the development cycle.

Data Implications

As with any ML (machine learning) problem, the quality of the training data determines a model's viability and value. Many use cases, including intent detection, require labeled data to give the model a jumpstart in building its view of the content. Business goals determine in large part what labels a client chooses: if you are in the medical industry, you choose medical terms as labels; if you are in technology, you choose programming concepts, terms, and structures.

Why is this important? Essentially you are trying to create vectors pointing to a common meaning. If you parse the 'concept' of Mercury, you need to indicate the domain in which the term has meaning. Is it a god from Roman mythology? A chemical element? A planet? Or the car from Ford in 1938? See the problem? Context matters. The model is only as good as the tags you put on the data when it's cleansed and prepared for processing.
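The Mercury problem can be shown with a toy disambiguator that scores each candidate sense by its overlap with the surrounding words. The sense labels and context-clue lists are invented for this example; real systems learn such associations from tagged corpora.

```python
# Toy word-sense disambiguation: pick the sense whose hand-made context
# clues overlap most with the sentence. Clue lists are illustrative only.
SENSES = {
    "element": {"chemical", "toxic", "thermometer", "liquid"},
    "planet":  {"orbit", "solar", "nasa", "closest"},
    "deity":   {"roman", "god", "messenger", "winged"},
    "car":     {"ford", "sedan", "1938", "model"},
}

def disambiguate(token: str, sentence: str) -> str:
    words = set(sentence.lower().split())
    best = max(SENSES, key=lambda s: len(SENSES[s] & words))
    return best if SENSES[best] & words else "unknown"

print(disambiguate("Mercury", "Mercury is the closest planet to the solar center"))
# -> planet
```

With no overlapping clues at all, the function refuses to guess and returns "unknown," which mirrors the point above: without domain context, the tag cannot be assigned.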

And what's with the 'vector' thing? Word embeddings, called vectors, are representations of text data in which words with similar contextual meaning have similar representations. In other words, synonyms: what are all the various ways to say the same thing? A Roget's Thesaurus comes to mind. Words from the text are represented as calculated vectors in a predefined vector space, or domain of knowledge. One of the most common tools for this is Word2Vec, which superseded LSA (Latent Semantic Analysis). Each looks to create an understanding of a word based on the context in which it is used. The creation of a vector space is more graph-based than the tree-structured approach of an ontology. In a coordinate-based system, related words sit in close proximity to one another, based on a corpus of relationships that can be calculated and turned into mathematical formulas.
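That "close proximity" is usually measured with cosine similarity. The three-dimensional vectors below are made up by hand to keep the arithmetic visible; real Word2Vec embeddings have hundreds of dimensions learned from a corpus.

```python
import math

# Hand-made 3-d "embeddings" illustrating that words with similar
# contextual meaning sit close together in vector space.
VECTORS = {
    "migraine": [0.9, 0.1, 0.0],
    "headache": [0.8, 0.2, 0.1],
    "teapot":   [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(VECTORS["migraine"], VECTORS["headache"]) >
      cosine(VECTORS["migraine"], VECTORS["teapot"]))   # -> True
```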

Again, why should we care? Here's why: these models are increasingly used to drive services over the internet, and they may contain biases and suppositions rooted in world views we agree or disagree with. The outcome of the model, and how it is applied to a problem, can be discriminatory at its very root if the tagging and resulting word vectors are not carefully constructed. Training a model with data that brings in multiple perspectives is essential to creating a well-rounded knowledge domain.

An Impossible Dream?

Back to the Holy Grail. "What I meant to say was..." How often do you hear people explaining themselves after saying something that is difficult to understand? Even humans have a hard time with intent. To let a machine do the work for us means deploying multiple techniques to get within acceptable tolerances of what "Natural" implies in NLP.

This is why it's a 'holy grail' type of problem in machine learning and AI. The desire to parse out what a person really means when they say something is a hard problem in computational linguistics. And we are far from a solution that works well.

S. Bolding, Copyright © 2022 ·Boldingbroke.com


[1]      Conversational AI over Military Scenarios Using Intent Detection and Response Generation, Hsiu-Min Chuang, Ding-Wei Cheng, Current Approaches and Applications in Natural Language Processing, Appl. Sci. 2022, 12(5), 2702.  doi: 10.3390/app12052494.

[2]      Sociolinguistically Informed Natural Language Processing: Automating Irony Detection, D.A. Baird, J.A. Baird, Trends in Cognitive Science, 2001 Apr 1;5(4):171-178. doi: 10.1016/s1364-6613(00)01615-6.

[3]      Discerning intentions in dynamic human actions, D.A. Baird, J.A. Baird, Trends in Cognitive Science, 2001 Apr 1;5(4):171-178. doi: 10.1016/s1364-6613(00)01615-6.

