Tuesday, April 30, 2024

Predictive Analytics

Predictive Analytics is a discipline of data science that uses Machine Learning, statistical methods, and a variety of other techniques from data mining to take historical and current data and make predictions about related, unknown future events. It's not mind reading; rather, it's a statistical probability that something will occur based on past activity: a vector projected into the future that says if people or things keep behaving the way they have in the past, they will continue on the same path. Of course, some unforeseen catastrophic event could change the course of history, and therefore the behavior of people. But in general, people and things (like Markets, Industries, or even Diseases) tend to behave in predictable ways.

To determine this path of behavior, this vector if you will, a score or probability is calculated for each entity in the pool of data. Remember, as we have seen before, an entity is just a noun, a thing: a customer, an employee, a patient, a machine, an SKU being sold.

Because you can measure, or score, the probability of behavior for entities, this type of machine learning technique is very popular with financial companies, such as insurance and banking. It can be used for credit scoring: by looking at a customer's credit history, a lender can determine the likelihood of payments being made on time. Predictive analytics can also be used by medical researchers to track the development of diseases in populations over time. Other applications are in the areas of social networking, gaming, commerce, marketing, insurance, travel, and many more. One emerging popular application is to predict global warming trends, as seen in this piece on Predicting the Impact of Arctic Sea Ice Loss on Climate.

Unlike other types of data sciences disciplines, the emphasis is on prediction rather than description, classification, or clustering. In other words, it is forward-looking rather than steady state. The goal is a rapid analysis with an emphasis on business relevance and ease of use for the end user. Predictive analytics provides a means to become proactive instead of reactive to situations in the business landscape, providing insights and information in advance of when they’re needed.

How does it work

As with any process, there are a standard number of steps to follow:

  1. Define your project
  2. Collect the necessary data
  3. Analyze that data
  4. Statistically model the data
  5. Use predictive modeling techniques
  6. Deploy your model
  7. Monitor its performance and adjust as necessary

After determining the business goals, in order to ground your research you will need to choose a methodology. The approach a researcher takes is entirely dependent upon the question being asked, the business use case that is being worked on. Start with a clearly defined objective. For example, if you are working in medicine, the approach will take a different route than if you are working in gaming or entertainment. There are a number of methodologies that can be applied when talking about Predictive Analytics. The two most common are Regression techniques and Machine Learning techniques. Machine Learning was discussed in an earlier post.

Regression types include linear regression, logistic regression, time series models, and classification and regression trees. These are among the most common. Less well known, yet just as powerful, are discrete choice models, probit regression, and multivariate adaptive regression splines, which can be good choices depending on the use case being worked on. These techniques are fairly rigorous and detailed to explain. You can learn more about them in a deeper dive if you want to become a full-on data scientist and learn to code. Suffice it to say, you should hire an expert to build these models if you need someone to perform this type of analysis for you. Here is a simple example explaining linear regression:
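
A minimal sketch in Python gives the flavor: it fits a straight line to a handful of made-up data points with NumPy and projects the trend forward. The numbers are purely illustrative, not a real dataset.

  import numpy as np

  # Hypothetical data: advertising spend (x) vs. units sold (y)
  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

  # Fit y = slope * x + intercept by ordinary least squares
  slope, intercept = np.polyfit(x, y, deg=1)

  # Project the trend forward to an unseen value of x
  next_x = 6.0
  predicted_y = slope * next_x + intercept
  print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted_y:.2f}")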

Familiarizing yourself with the tools available will help when you have conversations with the experts who are performing the coding and research. You may not know how to perform a time series model, but you can at least know that it’s different from a linear regression model and recognize the different charts created from the two data outputs.

Creating a model

To create one of these beasts, you need data, lots and lots of data to work with. The data can be raw, or unlabeled (the basis of what's called 'unsupervised' learning), meaning it has not been looked at by an expert who tags it with labels. Labeling is the process of attaching attributes or metadata to the information. Once that happens, some of the data becomes 'supervised' training material because it has been cleaned up and tagged by an expert in the domain of knowledge. That person knows a lot about the topic and can determine enough about a document or data item to say what's going on. The tags help teach the machine to learn about the information, circling back to Machine Learning techniques.

Usually, researchers will try to have a sample of up to 10% of the data tagged for purposes of training the model. This is a very human-intensive task, expensive and time consuming. Once the initial model is trained on that 10% of the data, it can scan through the remaining 90% and look for the same patterns. At that point you have a fully functional model that can be applied to new sets of data. Because information is always changing as people and societies evolve, the model will need to be retrained or updated with fresh training data on a periodic basis.

Because this is a new science, some people are very skeptical about the outcomes and dangers of using it in sociological applications such as politics and policing. Scenarios such as Minority Report are always used as cautionary tales. It is one thing to predict that someone will buy the newest widget, it is another thing to predict they will commit a crime and put them in jail for something they have yet to do. But as I stated at the beginning, this is not an attempt by the machine to read one person's mind. Rather it is an attempt in aggregate to get a read on where groups of people who have common characteristics may be headed for purposes of civic planning and other social applications. To read more about this topic, see this article in Scientific American: Will Democracy Survive Big Data and Artificial Intelligence?


S. Bolding—Copyright © 2021 · Boldingbroke.com


Machine Learning Step by Step

Machine Learning is, at its simplest, a technique whereby the computer learns from data rather than being programmed to perform an explicit task. Using labeled datasets, known as training data, algorithms look for patterns and are taught to recognize the data that matches those patterns. A model is then produced and applied to a larger, unstructured set of data. Because the model has been taught, it can comb through vast amounts of unstructured data quickly and find the patterns we humans are seeking to uncover.

There, that wasn't so hard, was it? But wait. It's far more complicated than what I just described. How do you label the data? What are those pesky algorithms anyway, and how do they know how to read through the training data and pick out what's important from the language? A model sounds easy, but there are many types of models that can be created depending on your business needs and goals. The approaches to machine learning are varied and evolving as research progresses in orthogonal directions. Supervised, unsupervised, reinforcement, and deep learning are just a few of the more popular approaches.

There are three steps to follow:

  1. Get some data
  2. Create and train the model
  3. Test and refine the model

Training Data

In any big data processing challenge, the first hurdle to overcome is finding the right set of data to work with. Sometimes too much data is just as much a problem as too little data. The need to cleanse and normalize content so that it is able to be handled by software has long been a challenge in Computer Sciences. This process is known as ETL—Extract, Transform, Load. However, if you were able to work with unstructured data and avoid the ETL process altogether, there would be a huge time and cost savings. Enter the concept of Machine Learning and Training Data.

With Training Data, you teach the computer to recognize patterns by labeling a small set of data with the information you are seeking. You place tags or metadata into the information and let the computer learn from a subset of the ideas. Then the computer sifts through all that unstructured data looking for the same patterns, discarding anything that doesn't match and returning the items that match the pattern, or are near matches within a threshold that can be set. Another application is to apply tags or metadata to the items that do match. This is one type of Auto-Labeling.
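
As a rough sketch of that workflow (using the scikit-learn library, with invented example texts and an arbitrary cutoff), a small labeled set trains a classifier that then scores a larger unlabeled pile, keeping only items above the threshold:

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression

  # Tiny hand-labeled training set (1 = pattern of interest, 0 = not)
  train_texts = ["urgent wire transfer request", "please reset my password",
                 "lunch menu for friday", "team offsite photos"]
  train_labels = [1, 1, 0, 0]

  # Learn patterns from the labeled subset
  vectorizer = TfidfVectorizer()
  X_train = vectorizer.fit_transform(train_texts)
  model = LogisticRegression().fit(X_train, train_labels)

  # Score the (much larger) unlabeled pile and keep near matches
  unlabeled = ["wire funds urgently to this account", "friday lunch options"]
  threshold = 0.5  # adjustable cutoff for what counts as a near match
  scores = model.predict_proba(vectorizer.transform(unlabeled))[:, 1]
  matches = [text for text, s in zip(unlabeled, scores) if s >= threshold]
  print(matches)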

Right now, the process of creating training data is very manual. For most purposes, humans need to review the training data and add the tags according to the business goals. Depending on the outcomes, one set of data can have a variety of purposes. For example, reviewing project documentation can show the need for more job training or analyzing electronic and voice communications can reveal espionage.

There is a growing trend of using NLP for Auto-Labeling to create training data. This process looks at the context and content to extract entities and determine the most important elements of a document or data item (such as a banking transaction, wire transfer, or communiqué), then adds the metadata automatically. These services are still in their first stages of development and the quality is not yet established. Therefore, the output is always validated by humans and is frequently subject to iterative learning processes (see below) and rigorous testing.
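
To give a feel for NLP-based auto-labeling, here is a minimal sketch using the open-source spaCy library; it assumes the small English model has been installed separately (python -m spacy download en_core_web_sm), and the extracted entity types become candidate tags for a human to review:

  import spacy

  # Load a small pre-trained English pipeline (assumed to be installed)
  nlp = spacy.load("en_core_web_sm")

  text = "Acme Corp wired $2 million to a Zurich account on March 3rd."
  doc = nlp(text)

  # Each recognized entity becomes a candidate label for human review
  candidate_tags = [(ent.text, ent.label_) for ent in doc.ents]
  print(candidate_tags)  # e.g. organizations, money amounts, dates, places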

Training Data carries a hidden risk. Whoever tags the data sets the focus or intent of the model that learns from it. In other words, those tags are the "teacher" of the model. If that person is biased, or leans even unintentionally in a particular direction, the model becomes biased as well. A lack of data also creates bias. For example, a well-known bias is that much medical data is based on white males; there is comparatively little medical research data on, say, women from developing countries or elderly people living in poverty.

Therefore, models built on training data will naturally be biased towards the demographics that were used to train them. The way to fix this problem is to have a more diverse set of training data and a more diverse group of people tagging it. Another solution is Auto-Labeling: let the machines tag the data, and then have humans review the output for bias during the quality control stages.

Algorithms for Model Creation

When a model is created, a specific set of code is used to process the data. These steps to process and sift through the information are called algorithms. There are many types that have been created by researchers who specialize in understanding how information is catalogued and broken down. Some of the most popular are regression algorithms, decision trees, clustering and associative algorithms, and neural networks. Each one has its benefits and drawbacks. If you want to go deeper, IBM has a nice description of the various approaches here.

The process of creating a model based on the approach you choose is a topic in its own right for a future discussion. In brief, you train the model using your training data set, compare the output with what you were expecting it to produce, and then adjust the variables until you get the desired results. This creates a targeted, precise model that has been trained to find what you are looking for. In other words, you have taught it to look for a specified set of data based on the problem to be solved. A model looking to solve medical problems will be very different in nature from a model looking to drive a robotic arm painting pictures.
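
The train-compare-adjust loop can be sketched in a few lines. The snippet below is a simplified illustration with scikit-learn and synthetic data: it tries several settings of one model variable and keeps whichever setting best matches the expected outputs on held-out data.

  from sklearn.datasets import make_classification
  from sklearn.model_selection import train_test_split
  from sklearn.metrics import accuracy_score
  from sklearn.tree import DecisionTreeClassifier

  # Synthetic stand-in for a real labeled training set
  X, y = make_classification(n_samples=400, n_features=12, random_state=0)
  X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

  best_depth, best_score = None, 0.0
  for depth in (2, 4, 8, 16):                  # the "variable" being adjusted
      model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
      score = accuracy_score(y_val, model.predict(X_val))
      if score > best_score:                   # keep the setting that best matches expectations
          best_depth, best_score = depth, score

  print(f"best max_depth={best_depth}, validation accuracy={best_score:.2f}")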

Iterative Learning

Iterative Learning is the process of repeated training on renewed data sets over time. When first training the model, as noted above, it's very important to have the right set of data to get the expected outcomes. Over time, however, that data becomes stale or ineffective; situations and context change, and the information in the model is no longer relevant. The model must be retrained periodically to remain precise and effective. This is often required every six months, or even more often, and can be very expensive.

Techniques are being invented to keep models up to date without the need for intermittent retraining. Iterative learning is one of those techniques where essentially the machine gets a lot of “practice” at targeting the search for information.

The process is basically one of continually retraining the base model on pre-processed training data sets and then randomly testing the results. Once the test cases pass muster, the feedback returned to the system shows that the "tuning" or adjustments to the model are acceptable. Once it passes that threshold, or bar of acceptability, that portion of the model is updated and put into operation. This iterative process can take place offline, with the results promoted into the operational model once human users approve them.
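
In code, that promote-only-if-it-passes loop might look something like the sketch below; the function names (retrain, evaluate, deploy) are placeholders for whatever training and deployment machinery a real system uses, and the threshold is an assumed example value.

  ACCEPTANCE_THRESHOLD = 0.90  # assumed bar of acceptability

  def iterative_update(current_model, fresh_data, retrain, evaluate, deploy):
      """Retrain offline on fresh data; promote only if quality passes the bar."""
      candidate = retrain(current_model, fresh_data)   # offline retraining step
      score = evaluate(candidate)                      # random testing of results
      if score >= ACCEPTANCE_THRESHOLD:
          deploy(candidate)                            # promote into operation
          return candidate, score
      return current_model, score                      # otherwise keep the old model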

In a true machine-learned system, the whole process can be automated, and the model updated in real-time for immediate use. It depends on the degree to which the humans using the system accept and trust the software to be accurate and of high quality. And of course, it also depends on the domain in which the software operates. If safety concerns are an issue, human-in-the-loop decision making is critical for oversight to ensure lives are not lost, for example with medical applications or airplane safety. If you are talking about low-risk applications, then there is less of a need for concern.

Supervised Learning

Supervised Machine Learning works on labeled data sets with information classified in advance to ensure a predetermined outcome. This is what we most commonly think of when talking about machine learning. The model can be compared to the actual labeled results for testing, but that leads to the danger of "overfitting." A model that is too closely tied to its training data doesn't handle variation in new data very well. It is too strict in its definitions and will miss edge cases and emerging trends. This is where bias in models comes into play, leading us to the unsupervised approach.
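
A simple way to spot overfitting is to compare accuracy on the training data with accuracy on data the model has never seen; a large gap is the warning sign. The sketch below uses scikit-learn with synthetic data purely for illustration.

  from sklearn.datasets import make_classification
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = make_classification(n_samples=300, n_features=20, random_state=1)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

  # An unconstrained tree can effectively memorize its training data
  model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

  train_acc = model.score(X_train, y_train)   # often near 1.0
  test_acc = model.score(X_test, y_test)      # typically noticeably lower
  print(f"train={train_acc:.2f}  test={test_acc:.2f}  gap={train_acc - test_acc:.2f}")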

Unsupervised Learning

Unsupervised Learning starts by ingesting unlabeled data, avoiding the cost and time required to tag the training data. It requires algorithms to extract “features” or interesting elements such as nouns, phrases, and topics of interest. The algorithms then sort and classify the chunks of text into patterns that occur with varying degrees of frequency.
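
A bare-bones illustration of that pattern-finding, using scikit-learn to turn a few unlabeled snippets of text into features and group them; the snippets and the choice of two clusters are arbitrary assumptions.

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.cluster import KMeans

  docs = ["invoice overdue please remit payment",
          "payment received thank you",
          "win a free prize click this link now",
          "limited time offer click to claim your prize"]

  # Extract word-level "features" from the unlabeled text
  X = TfidfVectorizer().fit_transform(docs)

  # Group the documents into two clusters by similarity
  labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
  for doc, label in zip(docs, labels):
      print(label, doc)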

This approach is less about decisions and predictions and more about identifying patterns that humans would miss because the volume of data being handled is just so vast. Computers can process huge amounts of data far more efficiently than a team of humans could ever hope to tag and label the same data set. Spam filtering comes to mind in this regard. This approach is also popular in cybersecurity.

Reinforcement Learning

Reinforcement Learning can be summed up as the “trial and error” approach. While it is somewhat similar to Supervised Learning, there is no sample set of training data involved. Rather, it attempts to map the best decision by trying a series of answers and recording the success or failure rate over time. It must know the state of the environment it is operating in to succeed. This approach is great for game theory applications.
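
As a toy illustration of trial and error, the epsilon-greedy sketch below uses made-up win probabilities to stand in for the environment: the agent tries actions, records successes and failures, and gradually favors whatever has worked.

  import random

  win_prob = [0.2, 0.5, 0.8]          # hidden quality of three possible actions
  successes = [0, 0, 0]
  attempts = [0, 0, 0]
  epsilon = 0.1                        # how often to explore at random

  for _ in range(1000):
      if random.random() < epsilon or sum(attempts) == 0:
          action = random.randrange(3)                     # explore
      else:
          rates = [s / a if a else 0 for s, a in zip(successes, attempts)]
          action = rates.index(max(rates))                 # exploit the best so far
      attempts[action] += 1
      if random.random() < win_prob[action]:               # environment feedback
          successes[action] += 1

  print("attempts per action:", attempts)   # the highest-probability action usually ends up tried most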

Deep Learning

Deep Learning frequently involves artificial neural networks. The process requires vast amounts of data to work with due to its design mimicking the learning processes of the human brain. Data passes through multiple layers of calculations, some exposed as inputs and outputs and others hidden, with weights and biases built in. The programmers who create each unique neural network are looking to achieve a particular outcome and use those weights and biases to craft and direct the results toward a targeted goal. The learning process for these models can be unsupervised or semi-supervised, meaning the data can have some tagging or metadata applied to it before being ingested.
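
To make those layers of calculation concrete, here is a tiny forward pass through one hidden layer in NumPy; the weights and biases are random placeholders rather than trained values, so it shows only the shape of the computation, not a working model.

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(size=4)                      # one input example with 4 features

  # Layer parameters (random stand-ins for values learned during training)
  W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)    # hidden layer
  W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)    # output layer

  hidden = np.maximum(0, W1 @ x + b1)              # weighted sum + bias, ReLU activation
  output = 1 / (1 + np.exp(-(W2 @ hidden + b2)))   # sigmoid squashes to a 0-1 score
  print(output)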


Applications for Deep Learning include computer vision, speech recognition, self-driving cars, and Natural Language Generation (NLG). Many of these innovations are critical in the field of applied robotics and space exploration. Other use cases are digital assistants, chatbots, medical image analysis, fraud detection, and cybersecurity.



A Cautionary Note

Machine Learning is already impacting our world today and how we live as seen by the examples noted above. It’s not just Siri and Alexa making our lives easier. Protecting our increasingly digital life from cyber terrorism, identity theft, and malicious actors is one way data scientists work in the background to keep consumers from losing their digital right to privacy. We need an Internet Bill of Individual Data Rights.

We are well into a new era. It's no longer the Age of the Internet. It's the Age of Digital Data, where a person's virtual identity is just as important to safeguard as their physical paper trail, if not more so. Increasingly, everything is about who owns the data and where it is housed. Who has access, and who is monetizing your digital footprint? Companies like Facebook, Amazon, and Apple have long made it clear that you are the product because they are using your data, data that they own by virtue of their terms of service, to create revenue. With machine learning techniques this is becoming more true than ever before. Be wise, be safe. Every click you take, every Like you make, they're watching you.

It's called Predictive Analytics, another way to make even more revenue off of your data, and a topic for a future post.

S Bolding—Copyright © 2021 · Boldingbroke.com


What is an Entity?

As we begin to explore the world of Natural Language Processing (NLP) and other forms of Machine Learning (ML) or Artificial Intelligence (AI) tools, there is a foundational concept that will appear in various forms, often under more than one name.

That is the foundational concept of "entities." Entities are observations in the data of real-life people, companies, places, or other things like cell phones or vehicles. They represent a real-life "who" or "what." Links in that data indicate shared attributes, which can create relationships. These relationships create a context for exploring what the entities are doing and why, as captured in their unique transactional history. Just as nouns have adjectives that describe them, entities have properties or attributes that help distinguish them as unique or similar to other objects in the system. Data extraction techniques help to distinguish one entity from another by its attributes. Another tool can then cluster similar entities into like groupings by commonalities.

In machine learning with an AI focused on understanding real-world actors, their relationships, and the meaning of their communications, you will encounter two similarly described activities: Entity (or Identity) Resolution and Entity Extraction. Entity Resolution is the practice of distilling the individual identity of a person, place, or thing from pieces of structured data coming from many sources. It may appear under many other names as well, such as identity resolution, record linking, relationship linkage, and record matching. If the job is to match up records from structured data to arrive at the ultimate identity of a real-world person, place, or thing, and the relationships between them, then the job is Entity Resolution. Entity Extraction is the practice of identifying the names of real-life people, places, and things mentioned in semi-structured and unstructured text. In other words, Entity Extraction identifies a unique person in the first place, and Entity Resolution makes sure they really are who they say they are.
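
As a highly simplified sketch of Entity Resolution (real systems use probabilistic matching across many more attributes), the snippet below normalizes names and matches records from two hypothetical source systems on name plus date of birth.

  # Records from two hypothetical source systems describing the same people
  crm = [{"name": "Jon A. Smith", "dob": "1980-04-02", "id": "crm-17"}]
  billing = [{"name": "SMITH, JON", "dob": "1980-04-02", "id": "bill-904"}]

  def normalize(name):
      """Crude normalization: lowercase, drop punctuation, sort name parts."""
      parts = name.replace(",", " ").replace(".", " ").lower().split()
      return " ".join(sorted(p for p in parts if len(p) > 1))

  # Resolve: records with the same normalized name and birth date are one entity
  resolved = []
  for a in crm:
      for b in billing:
          if normalize(a["name"]) == normalize(b["name"]) and a["dob"] == b["dob"]:
              resolved.append((a["id"], b["id"]))

  print(resolved)   # pairs of source records believed to be the same person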


Entities are the beating heart of systems dedicated to taking action based on understanding who is who, how they are related, what they are doing, why they are doing it, and whether that is good or bad news for an organization. Despite how straightforward this may sound, it is utterly non-trivial. Entity Management means enabling systems to match up all the data from different origin systems required to create a unified identity, and then to monitor transactions between entities and activities originating from many other systems. These transactions can contain semi-structured, unstructured, and structured data, and each piece may provide what's needed to generate the necessary context to gain insights from that data or to trigger system actions with it.

To build systems able to respond to actors' behaviors, you must marry Entity Resolution with Entity Extraction, Semantic NLP tools, and a well-developed base of business and compliance rules that, when combined, permit context-driven flagging of activity. Under what circumstances would you want to invest the time in building systems that compose various ML/AI-driven components, given the expense and time involved? We can look to recent news and industry reports for some examples:

Fraud

The State of California lost roughly $8 billion in taxpayer dollars to unemployment insurance fraud. Reporting shows that bad actors used the chaos of COVID and the shutdown of the economy to file false claims using stolen identities. The state was unable to verify the identity of claimants, some as young as one year old, due to an inability to cross-check databases such as DMV, prison, and death records.

Insider Threat

The Ponemon Institute shared in its 2020 Cost of Insider Threats: Global Report that the three industries most affected, financial services, services, and technology and software, incurred average annual costs of $14.05 million, $12.31 million, and $12.30 million, respectively. Those are the hard costs of identification and containment; they don't account for losses from events that materially damage customers or create public distrust.

Both examples involve two principal entity types, people and companies, and either or both may be bad actors.

  • The State of California might have avoided multi-billion-dollar losses and processed more citizens faster during the COVID-19 crisis if the unemployment agency could efficiently marry up all the data the state holds on citizen and business entities, search that data, apply eligibility rules programmatically, and flag suspicious claims for verification. Systems can also compare answers between relatives to see if the content is similar, use location data, and verify age and other demographics.
  • Insider Threat detection is an order of magnitude more nuanced. It requires a cross-system understanding of people, systems, permission levels, other access levels, relationships, and communication activities. Monitoring behavior, access, and communication against rule sets creates the opportunity to neutralize a threat condition before it becomes a costly breach. (Read "What is NLP?" for a primer on teaching systems to recognize the intent in written communication.)

There are other valuable business drivers pushing the desire to know the identity of people, their relationships, and the content of communications between them.

  • Customer 360, to have a clear understanding of both individual and aggregate customer journeys in their relationship with a product and its supporting services, permitting process automation, customization, retention, recommendations, and other marketing intelligence.
  • Process Automation for business processes, back office, supply chain to reduce operational costs. This includes log file processing for anomalies, maintaining inventory and fulfilling orders, and alerting humans when the assembly line breaks, to name a few examples. 
  • Financial Planning and insights for a view of customer plans and investing strategies including risk appetite. Estate planning strategies and trust management are also potential beneficiaries of relational content and identity management data.
  • Governance, Risk, and Compliance management with workflow and audit and regulatory reporting. This includes Know-Your-Customer and other watchlist checking.

Entity Resolution also involves graph theory and can extend into Actor-Based Network analysis, where the interactions between entities are mapped and resolved to determine patterns of behavior within communities of practice. This can occur between groups of people, companies (think of Enron and the many companies impacted by its malfeasance), or even within germ cultures in medical research. Looking at this level of entity interaction is an advanced discussion and will be the topic of a future post on Actor-Based Network Analysis.

As you continue to explore NLP and other ML/AI toolsets and how they can be composed into systems that deliver risk reduction, cost savings, and greater customer value in product experience delivery, understanding the concept of entities will make engaging with many other topics in Data Science, Machine Learning, and AI engineering easier. In most practical business applications of AI, we use entities to understand who is doing what, when, and why. We often want to respond to actions and situations in the moment, understand details after the fact, or predict the next actions.

S. Bolding —Copyright © 2021 · Boldingbroke.com


What is Artificial Intelligence?

 The process of teaching a machine to simulate human thoughts, reasoning, and emotions is known as Artificial Intelligence or AI, wherein the machine demonstrates intelligence through agents or bots that mimic human communications, learning, and problem-solving skills. Just as with classical disciplines, AI is broken into many endeavors, such as reasoning, analysis, machine learning, and knowledge representation. Programmers have to teach a machine (machine learning) to think (reason) and then demonstrate that it understands (analyze) the concept (knowledge representation) and can work to a solution (problem solving) on its own. Independent Problem Solving is one of the key goals of AI. A second, and increasingly important goal of AI is Knowledge Generation, using speech recognition, perception, and Natural Language Generation (NLG) to create better outcomes faster and more efficiently. This can be seen in applications for Technical Support, Agriculture, and finding common medical solutions that have worked in the past for other patients with similar disease profiles. Obviously, Natural Language Processing is another key sub-discipline of AI. The AI Revolution in computing will make your business smarter in many ways, as data gathering advances alongside the increased scale in processing power driven by cloud and distributed computing initiatives.

But you cannot teach a machine to think and act like a human without first understanding what human intelligence is. And this means that Computer Scientists, whether they like it or not, are going to have to collaborate with the Humanities. The Social and Anthropological disciplines offer the best insights into what makes the human mind function, along with Linguistics and Philosophy. The whole debate also engenders questions of ethics: should we create artificial entities endowed with human properties like intelligence, emotions, and choice? Clearly, automating a DevOps function with AI is not going to give birth to SkyNet, but the groundwork of ethical choices is still a relevant topic that will be addressed in future posts.

Intelligence, Artificial and Otherwise

There is no end to the debates, philosophical and otherwise, as to what constitutes intelligence. For the purposes of our discussion, we will rely on a simple definition from Philosophynow.org: "It is a capacity to acquire, adapt, modify, extend and use information in order to solve problems." Therefore, intelligence is the ability to cope with the unpredictable; in other words, to take the unpredictable and make it known and predictable. This concept encapsulates one of the disciplines of AI, Predictive Analytics, where the machine takes data and analyzes it in order to surface trends, make them more apparent, and therefore enable predictions. At the base of every debate is the assumption that machines are communicating with other machines (M2M) or with humans (M2H). In this discussion, we shall first look at how humans acquire language and communicate to exchange knowledge, and then at how computer languages are modeled on human languages and therefore work on essentially the same structural principles.

When examining the human disciplines, a natural separation exists between the hard and soft sciences: on one hand you have Neurology, Biophysics, and Linguistics, which study how the brain (human "hardware") processes language; on the other hand, Communications, Sociology, Psychology, and Anthropology, which study how humans use language within a social context to convey knowledge.

At the end of the day, a parallelism can be drawn between the common view of the machine and our view of classic human knowledge: each has a classic 7-layer stack when it comes to communications. The following diagram illustrates how logic machines mimic human systems and make it possible to teach a machine to understand human languages. Obviously, there is a lot more to explore in this topic, such as how anthropologists study toolmaking and how software applications can be viewed as tools. This parallelism is what makes Computational Linguistics possible at a conceptual level.

The Language Instinct

It is in the heart of man to know and be known by others, at a minimum by at least one other person. This pursuit of community is at the heart of Pascal's dilemma. (Go look it up.) Humans have the need to communicate, to share their innermost thoughts through words, signs, and signals. This instinct, sometimes referred to as the "Language Instinct," implies that communicating is not an invention like writing or tools, but rather an innate capacity for sharing thoughts and experiences. Many see the brain as a "neural network" where language resides in particular regions, notably Broca's and Wernicke's areas, supported by other functions such as the motor skills necessary to move the mouth and tongue, the reasoning skills to process high-order concepts, and so forth. Hence the desire to simulate language in computers with circuitry, neural networks, and programming.

The structure, or grammar, of languages, however, is quite different and reflects the culture in which each language evolved. For example, Germanic grammar is quite different in nature from that of Swahili. The ordering of words into sentences, formal rules for writing, and the like can be grouped into language families and can be taught and codified, whereas the urge to speak and communicate is a natural part of a baby's reasoning and development. Some linguists posit a "universal grammar" as part of this innate ability, a debate we will not digress into. Suffice it to say that there is no need for a universal grammar to understand the difference between language as an ability of humans and grammar as a structural arrangement of words.

Programmers fight over languages all the time, some preferring Java to C++ or Python to Perl. This is a debate over grammar first and nothing more. Operating systems also have grammars, witnessed by the wars between Linux aficionados, Apple fanboys, and Windows diehards. These communities of practice have agreed to use a particular way of speaking to the machine in order to give the hardware instructions. Those instructions have to be translated into a language the machine can understand. This is the job of the compiler, which takes human-readable code, such as Basic or Java, and turns it into machine code. The machine code is then interpreted by the CPU as the 1s and 0s that the circuitry can use.

Of course, it’s far more complicated than that. If you want to talk to the graphics card and tell it to render a particular picture on the screen you use different commands (verbs) than if you are talking to the math coprocessor asking it to calculate a function. You can talk to the operating system and tell it to open a port, you can use an API to send commands and data to other systems and programs through that port, and so forth. The possibilities are as creative as any human communications. You just have to know how to talk to the iron.

Based on the ability to talk to machines via code, and knowing how to parse human speech with NLP, the goal of AI is to create agents that take care of everyday tasks and independently operate in the background with little to no human intervention. Scheduling appointments, answering simple questions, alerting a robot to refill the supply line, and so forth. These may sound like the stuff of science fiction, but each example has already been realized in the current business climate by Amazon, Google, Tesla, and others.

S Bolding—Copyright © 2020 · Boldingbroke.com

Tuesday, April 23, 2024

Natural Language Understanding and Generation

It is one thing to slice and dice Text, to chop it up into its component parts, count them into piles, and determine what's in those piles. It is quite a different problem to comprehend the concepts and topics that a document contains. Natural Language Understanding (NLU) and its counterpart Natural Language Generation (NLG) are considered hard problems within AI, whether dealing with Voice or Text data, because they both have Machine Reasoning as their end goal. In other words, how does a machine not just parse the speech but actually stitch all those parts together to truly understand the concept or idea being conveyed by the speaker (Voice) or writer (Text)?

Sometimes people describe NLP as the process of taking text and breaking it down into data, and NLU/NLG as the process of taking data and transforming it into readable, grammatically correct text that can be used in articles, reports, and many other places. Using data analytics, you can target your content to a particular audience; you can transform information into a more readable form; and you can scale up content creation, saving time and maintenance.

Natural Language Understanding

As noted in our earlier examination of How Language Works, the first step is to break down the utterance, whether that be a phrase or a sentence, into its parts of speech and tag them with a Part-of-Speech tagger. Step two is to understand the grammatical structure of the phrase, or how to parse the word order. A third step is to try to put that phrase into a known context or domain of knowledge to guide the comprehension or reasoning. This results in a type of classification, putting the communication into context. From there, it is possible to dive into a more diverse syntactical analysis. Getting to the ultimate meaning of a phrase is a hard problem because each domain of knowledge requires a deep ontology to represent its area of lexical complexity. For example, the Medical and Legal domains are highly different from the Roman Classical Literature domain, and yet they all share the Latin language as a vocabulary source.
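
Steps one and two can be seen in a few lines with the open-source spaCy toolkit; this is a sketch only, and it assumes the small English model has been installed. It tags each word's part of speech and its grammatical role in the sentence.

  import spacy

  nlp = spacy.load("en_core_web_sm")   # assumed to be installed separately
  doc = nlp("The analyst flagged the suspicious transfer yesterday.")

  # Step 1: part-of-speech tags; Step 2: grammatical (dependency) structure
  for token in doc:
      print(f"{token.text:12} {token.pos_:6} {token.dep_:10} head={token.head.text}")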

This is all well and good when working with the written word, where the text is immutable. But when dealing with the spoken word, accents, intonation, and background noises need to be accounted for (speaker identification). The machine must be able to filter sound files and understand what the human is saying, what is merely a pause or a breath, and how pronunciation varies, with emotion for example, when a particular word is spoken. Various acoustic modeling techniques are employed to take audio wave files and break them into individual words or parts of words. The goal is to create a model for a given language, train it on examples of people pronouncing words or phrases in that language, and then use the trained model for automatic speech recognition.

Speech Recognition has its own set of challenges and is a sub-discipline which incorporates recognizing languages, machine translation, and converting speech to text. This latter technique is very important: there are many more tools for working with text than with speech, so if you can get the spoken word into written form, it's easier to manipulate and process.

These models are often so large that they are deployed in the cloud and accessed over a network connection. This is why virtual assistants have a lag of a few seconds when responding. When you say "Hey Siri," it takes a few moments for Siri to get your answer. She's off in the cloud looking up the answer in a vast database of responses based on a model trained in your local language.

Virtual Assistants, Siri, Alexa, and the like are the intermediaries that put a human face on computer technology. An early computer assistant, Clippy, took the form of a paper clip that attempted to automate actions in Microsoft Office applications, but it did not contain a speech component. The attempt to put a human face on computers will one day extend to computers trying to read lips, once facial recognition technology is good enough to parse subtle muscle changes and map them to sound files. But this is no Max Headroom.

Another application of NLU is medical records transcription. You go for a doctor's appointment and notes are taken. In earlier times, those notes were handwritten and sent to a typist for transcription before being added to your permanent file. Over time, the technology got a bit more sophisticated: the doctor would dictate the notes onto a tape cassette, and the transcription service listened to the recording and typed out the data into a medical record to be added to a paper file. Now with NLU, a computer accomplishes this task far more efficiently using speech-to-text, and it only needs a final quality check by a reviewer. The resulting Electronic Medical Record is added to your history instantly instead of having to be shipped to your doctor's office by post.

A third example is customer support: when you call and are asked to choose a numeric option or say Yes/No, the software will recognize when you say your choice. If you state your birthday, or account number, this is NLU in action. When the automated voice repeats that number back to you, another function is in play, Natural Language Generation (NLG).



Natural Language Generation

As the subject implies, once a machine knows how to parse language, reversing the process to create speech is the next step in the journey. But we are far from a Babel Fish or Jarvis level of operability. For a machine to genuinely create thoughts and interact with a human being at the level of AI portrayed in movies, it would need a true simulation of the human brain and its vast network of neurons, and the computing power required for that simply does not yet exist.

There are some companies working on the Babel Fish translator problem, including Waverly Labs and Timekettle, who provide smart earbuds that hook up to real-time translation services in the cloud. These are paired with apps on your smartphone, containing phrase databases in which your most common speech patterns can be stored. The translation software learns your patterns as you use the app, in the same way that Alexa or Siri recognizes your unique voice print. The system then responds with the corresponding translation as needed, say to conduct a conversation in French or Chinese: it will translate into French if you are a French speaker traveling abroad, or into Chinese if you're a Chinese speaker on vacation in Paris. This is just one simple example of a speech generation application using NLG.

NLG is far more complex than translating from one language to another. In fact, it involves understanding the context of a conversation and creating appropriate responses to the person the machine is interacting with. One of the most common examples where NLG is employed today is when you call a support line for help or to make an appointment. The infamous “Press 1 to speak with Customer Support, Press 2 to speak with Accounting to pay your Bill, Press 3 to make an appointment,” and so forth. And then you get stuck in phone-tree hell where you never get your question answered, just a computer-generated voice sending you to menu after menu of options until you are disconnected.

This early example where computers tried to replace humans in the phone system was just a lot of prerecorded messages, much like today’s database lookups of pre-recorded responses from Alexa or Siri. Customer Support solutions are now much more sophisticated and responsive, with algorithms behind them that are designed to react to your voice when you say your problem using a phrase or a few words. The computer in the background will then search through a knowledge base, and a voice synthesizer will read the response that it finds.


NLG also uses grammar and speech patterns to take common phrases and chunks of text and organize them into a document or voice response. There are three basic steps that need to occur for this process to be successful: content determination, information structuring, and aggregation (a small sketch of the three steps follows the list below).

  • Content determination is basically looking at the context and topic being addressed and deciding what information needs to be included in the document or response. If the goal is to compose a text-based output such as auto-generating webpages, news articles, or a business report, then it's crucial to have the right data as input. If the response is to create a verbal interchange between a person and a machine over a phone call, then generating the right few sentences is even more important, as a person asking questions or seeking information in a phone call does not devote a lot of time to the task.
  • Information structuring takes all of the content, ranks what is most important, and makes decisions about word order, lexical choices, and the like. There is also a portion of the processing called "realization," whereby the code determines syntax, morphology, and word order, in other words how to write the actual sentences, or compose the morphemes of speech in the case of verbal expression with speech synthesizers.
  • Aggregation serves to finalize the content by merging or consolidating similar concepts for readability and conciseness. If an article is too long, then it will seem clunky and unnatural. This step is unnecessary for speech-based NLG.
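
Here is a deliberately simplified sketch of those three steps for a text-based output; the facts and wording are invented, and a production NLG system would perform far more sophisticated realization.

  # Invented input data about a hypothetical sales quarter
  facts = {"region": "EMEA", "quarter": "Q3", "revenue": 1.2, "growth": 0.08, "returns": 0.01}

  # 1. Content determination: decide which facts matter for this audience
  selected = {k: v for k, v in facts.items() if k in ("region", "quarter", "revenue", "growth")}

  # 2. Information structuring / realization: turn the facts into sentences
  sentences = [
      f"In {selected['quarter']}, the {selected['region']} region earned ${selected['revenue']}M.",
      f"Revenue in {selected['region']} grew {selected['growth']:.0%} over the prior quarter.",
  ]

  # 3. Aggregation: merge overlapping sentences into one concise statement
  report = (f"In {selected['quarter']}, the {selected['region']} region earned "
            f"${selected['revenue']}M, up {selected['growth']:.0%} over the prior quarter.")
  print(report)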

Another approach to NLG is to use large sets of labeled, tagged data to train models with machine learning algorithms. The use of trained models is most apparent in chatbots, conversational systems that conduct online dialogues, often using an avatar as an agent to guide users through a process. The earliest chatbot was ELIZA, a 1966 effort to create an interaction with a seemingly human program.
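
ELIZA worked from hand-written patterns rather than a learned model; a toy version of that pattern-and-response idea looks like this, with patterns invented for illustration.

  import re

  # Hand-written pattern/response pairs in the spirit of ELIZA
  rules = [
      (re.compile(r"\bi feel (.+)", re.I), "Why do you feel {0}?"),
      (re.compile(r"\bi need (.+)", re.I), "What would it mean to you to get {0}?"),
      (re.compile(r".*"), "Tell me more."),
  ]

  def respond(utterance):
      for pattern, template in rules:
          match = pattern.search(utterance)
          if match:
              return template.format(*match.groups())

  print(respond("I feel stuck on this project"))   # "Why do you feel stuck on this project?"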

An often-overlooked use of NLG is the autocomplete function in many applications. Essentially, when you are typing the computer is predicting what words you will choose, what proper grammar is best to use, and the like. This, whether you recognize it or not, is a form of generating language on your behalf. The technology was originally developed by Nuance and is now so ubiquitous on smart phones that most of us take it for granted.

Teaching a computer, a logic machine, to put together words into phrases and complete sentences is a far cry from teaching it to reason, to think for itself as Jarvis does. He is a long way from the bumbling C3PO, showing pathos and sentiment as well as subtle humor in the face of crisis. C3PO and his sidekick R2D2 are much more the traditional stereotype of AI inside metal skin, robots who click and whirl while spitting out speech. They are simulacra of humans, with speech built in. Jarvis on the other hand appears fully human while Other. With Jarvis we wonder “Do we need Asimov’s rules after all?” Thankfully, we are a far way away from needing to answer that question.



S. Bolding—Copyright © 2021 ·Boldingbroke.com

 


NLP Demystified

When building actor-based systems for Artificial Intelligence (AI) applications, the aim is to be able to characterize and explain the multitude of conversations and real-world observations happening around us. In other words, the goal is to understand the linguistics of cognitive thought and how humans use language to convey meaning in a given context. The linguistic structures by which any language conveys meaning lie not just in the words people choose, but in the way those words infer knowledge, both implicit and explicit, about the world around us—elements such as emotion, intent, and desired outcomes. When teaching computers to understand language, the trick is not just to statistically count words and match strings, but to convey context and intent in a way that can be computed upon for a data-driven approach to Business and Operations.

In order to understand the sequence of processes (algorithms) and the tools (applications) necessary to translate between human and machine communications, this blog aims to describe the theoretical background as well as the real-world practice of Natural Language Processing (NLP). There are three themes to this conversation:

I: Understanding how language works for both humans and machines

II: Understanding Machine Learning and NLP as a confluence of Linguistics and Computing

III: Understanding how applications can leverage the power of Computational Linguistics

Too often, books and information in the domain of Artificial Intelligence assume a baseline understanding of Computer Sciences and programming. This site is designed for the novice and business audience alike. The goal is to make clear the underlying technologies and algorithms in NLP without having to grasp the code.

Join us in the journey as the women of Boldingbroke sort out all the technobabble for those people in the business community who are not full-time programmers and data scientists.

S Bolding—Copyright © 2020 · Boldingbroke.com

What is NLP?

When users of a system–customers or employees–communicate with each other in the regular course of business, they convey their plans and intentions in written and verbal speech patterns. These are in turn recorded to ensure continuity of business, financial record keeping, compliance with regulatory laws and code-of-conduct standards, and other use cases. The work environment is dynamic, and language patterns and terminology change constantly. Correspondingly, the systems and tools have to keep pace with that rate of change and must be able to continuously learn and reveal unforeseen, actionable connections that uncover opportunities as well as risk. To do this, machines need to be able to understand human language.

Natural Language Processing (NLP) is the science of breaking down human language into discrete patterns that a machine can understand and interpret. While understanding and responding to basic commands is straightforward, a machine cannot understand the nuances of why a person says what they say, or their intent and sentiment. This is where data science and computational linguistics have created tools to help machines understand what humans are really trying to accomplish when they type or say something.

NLP has several disciplines, including Natural Language Understanding (NLU) or speech recognition based on disambiguation (understanding precisely what a word means) and Natural Language Generation (NLG) or speech creation, where a computer independently composes sentences such as with a “chatbot.” We will look at these side disciplines in future blog posts. For now, you can follow the links to Wikipedia for a quick reference definition of these areas.

Why Should You Care

In the 2021 Algorithmia Enterprise Trends in Machine Learning survey, there was a reported urgency around AI/ML projects: “When we asked respondents why, 43% said their AI/ML initiatives “matter way more than we thought.” Nearly one in four said that their AI/ML initiatives should have been their top priority sooner.” (p6) While many areas of IT budgets are downsizing, the AI/ML line item is ramping up significantly. If 2020 has taught businesses anything, it is that automation of DevOps and management of data assets are key strategic investments that recession-proof operations and keep a business viable in uncertain times.

Organizations are looking at an increasing number of use cases for ML and with it, NLP. In future articles, we will dive into these scenarios and their ML/NLP applications in depth. In the meantime, here are just a few of the possible best practices and outcomes where this technology can be applied:

  • Improving Customer acquisition, retention, interactions, and experience and therefore customer loyalty
  • Process, supply chain, and back office automation, reducing operational costs and increasing ROI
  • Fraud and Insider Threat detection
  • Sales pipeline, recommendation systems, loyalty, brand awareness, and marketing program intelligence
  • Financial planning and insights
  • Governance, Risk, and Compliance management and workflow for audit and regulatory reporting

Governance is by far the most problematic of these use cases, with over half of all organizations ranking it as the top challenge. The Ethics, Explainability, and Data Privacy concerns inherent in the AI/ML discipline are fodder for many conversations and debate in this emerging space. As with any new technology, standards are not yet established. But governance mandates that the handling and processing of data, especially PII, be treated with kid gloves… and an audit trail in order to minimize risk. Data is exploding at a rate that makes it hard to trace: even with data lakes and cloud solutions, data is messy by nature.

Why Does “Big Data” Matter

Big Data is a term that describes the growth in volume, velocity, and variety of data in the world. It is exemplified by an explosion in the quantity of data, primarily in the unstructured, "messy" data of chats and the semi-structured data of emails. Notes, images, and attachments increase the complexity of what must be captured and supervised. And while individual communications appear small when viewed independently, they provide more insightful patterns and context when viewed in aggregate. Until recently, a human eye was required to spot these patterns and understand the context, but now we have access to advanced tools such as statistical analysis, machine learning, data mining, NLP, information retrieval, and predictive analytics.

Fundamentals of Pattern Analysis in Language

Detecting behavioral patterns in unstructured, text-based data is often compared to a “Needle-in-a-Haystack” scenario. The basic assumption is that the vast majority of people are following common patterns, and only a small percentage are outliers, such as First Movers with a new technology (positive use case), the “bad actors” who try to mask their intent for Insider Threats (negative use case), or a person who is acting under duress due to life circumstances (neutral use case). Still, NLP systems must look at everything in order to find evidence in the communications of those few individuals. Most communications detected as potential trends are generally innocuous, and we call these “false positives” when we have to review them.

NLP software therefore seeks to sift through and filter out the vast majority of documents that are valid, business-related communications, and reduce the haystack down to an interesting set of data where the “needles” or outlier behaviors are hiding. It is these “interesting” communications that are considered high-value, or “true positives” and for which we want to generate data sets for further examination.

In any search application, there is a classic tradeoff between Recall and Precision [Fig.1]. Recall is the scope of coverage: "Did I miss anything?" Precision asks how close I can come to getting exactly what I intend to find. We seek to optimize the balance between reducing False Positives (increasing Precision) and making sure as few True Positives as possible are missed (increasing Recall). Complete coverage with high precision is the goal of all NLP solutions.
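
Concretely, Precision is the fraction of flagged items that turn out to be of genuine interest, and Recall is the fraction of genuinely interesting items that get flagged. A tiny worked example with invented counts:

  # Invented review counts for one batch of flagged communications
  true_positives = 40    # flagged and genuinely interesting
  false_positives = 60   # flagged but innocuous
  false_negatives = 10   # interesting but missed by the filter

  precision = true_positives / (true_positives + false_positives)   # 40/100 = 0.40
  recall = true_positives / (true_positives + false_negatives)      # 40/50  = 0.80
  print(f"precision={precision:.2f}, recall={recall:.2f}")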

The balance between Precision and Recall can be viewed as a balance between the two competing needs of a business: to have a strong assurance of coverage, suggesting that few positive cases are missed; and yet concurrently to have a low volume of highly targeted, precise documents for analysis.

One method for managing this competition is to map the business needs and risks to a behavior taxonomy and then link individual rules and algorithms to them. The taxonomic technique allows us to measure the tool’s performance with respect to each of the managed use cases, and to demonstrate that each of the risks has a corresponding coverage. This management method renders the typical tradeoff between sacrificing Recall in favor of Precision insignificant, as there is now an assurance of business coverage, while providing targeted risk reporting for those behaviors that senior management prioritize.

NLP—An In-Depth Explanation

NLP tries to break down the complexity of human speech into two parts and solve each independently, using what are called Semiotic and Semantic analysis.

The first challenge is to understand word origins and their evolution over time, in order to find relationships between them. This is called “Semiotic Analysis” and is the subject of a lot of research in the industry. It has been broadly addressed by breaking words down into character strings and sub-strings (their stem and “lemma” or main representative form). For example, “run, runs, ran, running” are all forms of the lemma “run.” The system then performs clustering and counting of words within documents, finding which are most commonly present in the proximity of others. The tools for this work are effective and openly available as open-source toolkits.
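
A quick sketch of stemming and lemmatization with the open-source NLTK toolkit (it assumes the WordNet data has been downloaded via nltk.download); exact outputs can vary, but the lemmatizer typically maps the verb forms back to the lemma "run".

  from nltk.stem import PorterStemmer, WordNetLemmatizer

  words = ["run", "runs", "ran", "running"]

  stemmer = PorterStemmer()
  lemmatizer = WordNetLemmatizer()   # requires the WordNet corpus to be downloaded

  for w in words:
      # Stemming chops suffixes; lemmatization looks up the dictionary form
      print(w, "->", stemmer.stem(w), "/", lemmatizer.lemmatize(w, pos="v"))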

Common toolkits used in the industry are Stanford CoreNLP and Apache OpenNLP; they support the following operations (a small sketch of fuzzy matching follows the list):

  • Normalization: Correcting spelling errors and standardizing words.
  • Stemming: Looking for the stem, i.e. the most basic, common form of a word.
  • Entity Extraction: Identifying nouns and tagging them with properties that are useful for analysis (for example: Barclays can be tagged with “Bank” or “Counterparty”, and “cell phone” can be tagged with “communication channel”.)
  • Fuzzy Matching: A type of string-based matching of phrases to a dictionary of interesting words, or topics, that accounts for variations in position or spelling (so “call my cell” would be the same as “call @ cell”.)
  • Synonyms: Simple word substitutes to capture the context provided by other words (so “call my cell” would be the same as “call my mobile” in search terms.)
  • Feature Construction: The use or combination of the above techniques to generate and store complex context along with the data (so “call my cell” would be stored as “use of external communication channel” –and would match the text “reach me on my mobile”.)
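
As a minimal illustration of the fuzzy matching idea, using only Python's standard library, similar strings score close to 1.0 even when spelling or punctuation varies; the phrases and the 0.8 cutoff are arbitrary assumptions.

  from difflib import SequenceMatcher

  watch_phrase = "call my cell"
  candidates = ["call @ cell", "call my mobile", "send the report"]

  for text in candidates:
      similarity = SequenceMatcher(None, watch_phrase, text).ratio()
      match = "MATCH" if similarity >= 0.8 else "no match"
      print(f"{text!r:20} similarity={similarity:.2f}  {match}")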

The second challenge is to understand each word’s meaning, and how that meaning changes within a single sentence and within the context. This becomes more complex in larger passages because the context and sentiment will shift between sentences or chat lines in a conversation. This type of analysis is called “Semantic Analysis” and is a harder problem to address because it reflects the deeper linguistic intricacies of human communication.

In language, context is everything. We, humans, are exceptionally good at understanding emotions, nuance, and innuendos, all things that machines cannot grasp. A diagram of the language’s structure helps explain why [Fig.2]. There are two levels of semantics in human speech, shown here as breadth and depth.

The basic structure of a sentence is represented in the top line: Subject-->Verb-->Direct Object. The conduct risk behaviors that we are trying to detect are constructed as Verb-->Direct Object phrases, representing the discrete activities that the Subjects perform. The deeper structures of prepositional phrases provide context. This is where the intent can be best discerned: the “why” that explains a person’s motivation and actions.
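
Those Subject, Verb, and Direct Object roles can be pulled out programmatically with a dependency parser. Here is a small sketch with spaCy; the sentence is invented and the small English model is assumed to be installed.

  import spacy

  nlp = spacy.load("en_core_web_sm")
  doc = nlp("The trader moved the funds before the audit.")

  # Collect Subject --> Verb --> Direct Object triples from the parse
  for token in doc:
      if token.dep_ == "nsubj" and token.head.pos_ == "VERB":
          verb = token.head
          objects = [child.text for child in verb.children if child.dep_ == "dobj"]
          print(token.text, "-->", verb.text, "-->", objects)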

NLP enables systems to learn the context and meaning of words from sentences, paragraphs, and entire documents by reading each line. Unfortunately, though, computers cannot yet read “between” the lines. It is instructive to point out the ambiguity inherent in human language by examining a few expressions:

  • “He was on fire lst nite.” (last night)

How does a machine know that a person is not really burning up when someone says that a sports player is “on fire” - meaning that they are performing well? In an NLP platform, the fuzzy matching capability is able to accommodate the spelling error or abbreviation and recognize “lst nite” as a “timeframe”. Its NLP semiotic techniques can then understand that “on fire” is a synonym for “high performance” when combined with the dual contexts of “person” and “timeframe”.

  • “She jumped for joy.”

In the same way, "jumping for joy" is usually not a literal action, but it could be, especially when the context involves small children.

  • "They really love to do bad things."

In this example, the software will need to be trained to understand that while "love" is positive (think of it as +1) and "bad things" is negative (-1), to love a negative makes the statement doubly negative (-2) rather than neutral, even though the naive arithmetic of +1 + (-1) = 0 would suggest otherwise. For the machine to understand the above examples, the instructions we give have to take away the ambiguity that humans are very adept at handling naturally. This is, in essence, the semantic power of NLP.
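
A toy rule-based scorer makes the point; the word scores and the positive-toward-a-negative rule below are invented purely to mirror the example above, not a production sentiment algorithm.

  word_scores = {"love": 1, "hate": -1, "bad": -1, "good": 1}
  intensifiers = {"really": 2}

  def score(sentence):
      words = sentence.lower().replace(".", "").split()
      sentiment_words = [w for w in words if w in word_scores]
      multiplier = max((intensifiers[w] for w in words if w in intensifiers), default=1)

      # Rule: a positive feeling directed at a negative thing is doubly negative
      if any(word_scores[w] > 0 for w in sentiment_words) and \
         any(word_scores[w] < 0 for w in sentiment_words):
          return -2 * multiplier
      return multiplier * sum(word_scores[w] for w in sentiment_words)

  print(score("They really love to do bad things."))   # negative, not neutral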

Our NLP tools can look at adverbs and prepositional phrases to discern sentiment or emotions ("love", "hate") and intensifiers ("really", "maybe", "hesitantly"). However, the ability to discern free-will choices and the intent of a person's speech patterns based on deeper semantic structures is still an area of research that has yet to enter the commercial space. In order to approximate the understanding of such human psychology, it is possible to create semi-static structures against which rules can be mapped. We call these structures "taxonomies of risk," or "behavioral taxonomies." In future topics, we will look at how language works and the way taxonomies aid computers in organizing human knowledge.

In this post, we have looked at the basics of NLP at an extremely high level and posited several use cases for businesses to consider when applying advanced AI/ML techniques to their operations. Making an investment in ML for the long run is a strategic decision that should be mapped out with deliberation and an understanding of the investment in DevOps, Infrastructure, and Data Sciences necessary to fully support the initiative. To accomplish this type of transformation takes buy-in at the most senior levels of the company. Therefore, it is critical that the C-Suite understand the concepts involved and the investments necessary to drive innovation in the new world of Predictive and Behavioral Analytics powered by NLP.

S Bolding—Copyright © 2020 · Boldingbroke.com

Generative AI: Risks and Rewards

 Benefits of GenAI The landscape of general computing has changed significantly since the initial introduction of ChatGPT in November, 202...