As we begin to explore the world of Natural Language Processing (NLP) and other forms of Machine Learning (ML) or Artificial Intelligence (AI) tools, we will meet a foundational concept in various forms, often under more than one name: “entities.” Entities are observations in
the data of real-life people, companies, places, or other things like cell
phones or vehicles. They represent a real-life “who” or “what.” Links in that
data indicate shared attributes, which can create relationships. These
relationships provide context for exploring what the entities are doing
and why those actions appear in their unique transactional history. Just
as nouns have adjectives that describe them, entities have properties or
attributes that help distinguish them as unique or similar to other objects in
the system. Data extraction techniques help distinguish one entity from
another by its attributes. Other tools can then cluster similar entities into
groups by their commonalities.
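To make attributes and groupings concrete, here is a minimal Python sketch. It assumes each observation is a dictionary of attributes; the field names (`name`, `email`, `phone`), the sample records, and the rule "records sharing a normalized email or phone belong together" are all hypothetical. It uses a simple union-find to merge records into groups. Real systems weigh many attributes and use fuzzy matching, so treat this as a deliberately naive illustration.

```python
from collections import defaultdict

# Hypothetical observations: each record is one sighting of a real-world "who".
records = [
    {"id": 1, "name": "Ann Lee",  "email": "ann@example.com", "phone": "555-0100"},
    {"id": 2, "name": "A. Lee",   "email": "ANN@example.com", "phone": ""},
    {"id": 3, "name": "Bob Park", "email": "bob@example.com", "phone": "555-0199"},
    {"id": 4, "name": "Ann L.",   "email": "",                "phone": "5550100"},
]

def normalize(field, value):
    """Normalize an attribute so trivial formatting differences don't block a match."""
    if field == "email":
        return value.strip().lower()
    if field == "phone":
        return "".join(ch for ch in value if ch.isdigit())
    return value

def cluster_by_shared_attributes(records, fields=("email", "phone")):
    """Group records that share at least one normalized attribute value (union-find)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (field, normalized value) -> first record id with that value
    for r in records:
        for f in fields:
            value = normalize(f, r.get(f, ""))
            if not value:
                continue  # an empty attribute is not evidence of a link
            key = (f, value)
            if key in first_seen:
                union(r["id"], first_seen[key])
            else:
                first_seen[key] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(groups.values())

print(cluster_by_shared_attributes(records))  # [[1, 2, 4], [3]]
```

Records 2 and 4 never share an attribute with each other directly, yet both land with record 1; that transitive linking is what lets a group emerge from partial, scattered observations.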
In machine learning and AI focused on understanding real-world actors, their relationships, and the meaning of their communications, you will encounter two similarly described activities: Entity (or Identity) Resolution and Entity Extraction. Entity Resolution is the practice of distilling the individual identity of a person, place, or thing from parts of structured data coming from many sources. It may appear under many other names as well, such as identity resolution, record linking, relationship linkage, record matching, and several other terms. If the job is to match up records from structured data to arrive at the ultimate identity of a real-world person, place, or thing, and the relationships between them, then the job is Entity Resolution. Entity Extraction (often called Named Entity Recognition, or NER) is the practice of identifying the names of real-life people, places, and things mentioned in semi-structured and unstructured text. In other words, Entity Extraction spots a person in the first place, and Entity Resolution makes sure they really are who they say they are.
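A toy illustration of the difference follows. It assumes a small hand-made gazetteer (a lookup list of known names) in place of a trained NER model; the names, types, IDs, and the `GAZETTEER` table are hypothetical. Extraction finds the mentions in free text, and a second, greatly simplified resolution step maps each mention to a single identity.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, resolved entity id).
# A production system would use a trained NER model instead of a fixed list,
# and the alias -> id mapping would come from a real resolution process.
GAZETTEER = {
    "Ann Lee": ("PERSON", "P-001"),
    "A. Lee": ("PERSON", "P-001"),      # alias already resolved to the same person
    "Acme Corp": ("ORG", "O-042"),
    "Sacramento": ("PLACE", "L-007"),
}

def extract_entities(text):
    """Entity Extraction: find known names mentioned in unstructured text."""
    # Try longer names first so a longer name wins over a shorter overlapping one.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]

def resolve(mentions):
    """Entity Resolution (greatly simplified): map each mention to one identity."""
    return [(name, *GAZETTEER[name]) for name, _ in mentions]

text = "Ann Lee met Acme Corp in Sacramento; later A. Lee filed a claim."
mentions = extract_entities(text)
print(resolve(mentions))
```

Both "Ann Lee" and "A. Lee" come back as `P-001`: extraction found two mentions, and resolution decided they are one person.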
Entities are the beating heart of systems dedicated to taking action based
on understanding who is who, how they are related, what they are doing, why
they are doing it, and if that is good or bad news for an organization. Despite
how straightforward this may sound, it is utterly non-trivial. Entity
Management means enabling systems to match up all the data from different
origin systems needed to create a unified identity, and then to monitor the
transactions between entities, and their activities, as those transactions
arrive from many other systems. These transactions can contain semi-structured,
unstructured, and structured data. Each type may provide what’s needed to
generate the necessary context to gain insights from, or trigger system
actions with, that data.
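One way to picture Entity Management is a crosswalk that maps each source system's local record ID to a single resolved identity, so transactions arriving from different systems can be monitored per real-world entity. This is a minimal sketch under that assumption; the system names (`crm`, `billing`, `badge`), IDs, and actions are hypothetical.

```python
from collections import defaultdict

# Hypothetical crosswalk produced by Entity Resolution:
# (source system, local id) -> unified entity id
CROSSWALK = {
    ("crm", "C-17"): "E-1",
    ("billing", "9001"): "E-1",
    ("badge", "b-55"): "E-1",
    ("crm", "C-18"): "E-2",
}

# Hypothetical transactions as they arrive from many origin systems.
transactions = [
    {"system": "billing", "local_id": "9001", "action": "refund"},
    {"system": "badge", "local_id": "b-55", "action": "door_open"},
    {"system": "crm", "local_id": "C-18", "action": "address_change"},
    {"system": "billing", "local_id": "7777", "action": "refund"},
]

def timeline_by_entity(transactions, crosswalk):
    """Attach each transaction to its unified identity; keep unmatched ones separate."""
    timelines = defaultdict(list)
    unresolved = []
    for t in transactions:
        entity = crosswalk.get((t["system"], t["local_id"]))
        if entity is None:
            unresolved.append(t)  # no known identity yet: a candidate for review
        else:
            timelines[entity].append((t["system"], t["action"]))
    return dict(timelines), unresolved

timelines, unresolved = timeline_by_entity(transactions, CROSSWALK)
print(timelines)
print(len(unresolved), "unresolved transaction(s)")
```

The point of the crosswalk is that a refund in billing and a door badge event now sit on the same timeline for `E-1`, which is the unified context that rules and models can act on.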
To build systems able to respond to actors’ behaviors, you must marry Entity
Resolution with Entity Extraction, Semantic NLP tools, and a well-developed
base of business and compliance rules that, when combined, permit context-driven
flagging of activity. Under what circumstances would you want to invest the
time in building systems that compose various ML/AI-driven components, given
the expense and time involved? We can look to the recent news and industry
reports for some examples:
Fraud
The State of California lost roughly $8 billion in taxpayer dollars to
unemployment insurance fraud. Reporting shows that bad actors used the chaos of
COVID-19 and the economic shutdown to file false claims through identity fraud.
The state was unable to verify the identity of claimants, including
identity-fraud victims as young as one year old, because it could not
cross-check databases such as DMV, prison, and death records.
Insider Threat
The Ponemon Institute reported in its 2020 Cost of Insider Threats: Global
study that companies in the three most affected industries (financial
services, services, and technology and software) incurred average annual costs
of $14.05 million, $12.31 million, and $12.30 million, respectively. Those
figures cover the hard costs of identification and containment; they don’t
estimate losses from events that materially damage customers or create public
distrust and dislike.
Both examples involve two principal entity types, people and companies, and
one or both may be bad actors.
- The State of California might have avoided multi-billion-dollar losses, and processed more citizens more quickly during the COVID-19 crisis, if its Employment Development Department (EDD) could efficiently marry up all the data the state holds on citizen and business entities, search that data, apply eligibility rules programmatically, and flag suspicious claims for verification. Systems could also compare answers between relatives to see if the content is similar, use location data, and verify age and other demographics.
- Insider Threat detection is an order of magnitude more nuanced. It requires a cross-system understanding of people, systems, permission levels, other access levels, relationships, and communication activities. Monitoring behavior, access, and communication against rule sets creates the opportunity to neutralize a threat before it becomes a costly breach. (Read “What is NLP?” for a primer on teaching systems to recognize the intent in written communication.)
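The cross-checks described for the unemployment case can be sketched as simple rules run against already-resolved identities. Everything here is hypothetical: the record layout, the `MIN_WORKING_AGE` threshold, and the in-memory sets standing in for death, prison, and DMV data. Real eligibility rules are set by law and are far more detailed; the sketch only shows the shape of "cross-check, then flag for verification."

```python
from dataclasses import dataclass
from datetime import date

MIN_WORKING_AGE = 16  # hypothetical threshold, for illustration only

@dataclass
class Claim:
    claim_id: str
    entity_id: str      # identity assumed already resolved across state systems
    birth_date: date
    filed_on: date

# Hypothetical stand-ins for cross-checked state databases, keyed by entity id.
DEATH_RECORDS = {"E-9"}
INCARCERATED = {"E-5"}
DMV_RECORDS = {"E-1", "E-5", "E-9"}

def age_on(birth, day):
    """Age in whole years on a given day."""
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))

def flag_claim(claim):
    """Return the reasons a claim should be held for identity verification."""
    reasons = []
    if claim.entity_id in DEATH_RECORDS:
        reasons.append("claimant is deceased")
    if claim.entity_id in INCARCERATED:
        reasons.append("claimant is incarcerated")
    if claim.entity_id not in DMV_RECORDS:
        reasons.append("no DMV record to corroborate identity")
    if age_on(claim.birth_date, claim.filed_on) < MIN_WORKING_AGE:
        reasons.append("claimant is below working age")
    return reasons

claims = [
    Claim("C1", "E-1", date(1985, 4, 2), date(2020, 6, 1)),
    Claim("C2", "E-7", date(2019, 3, 15), date(2020, 6, 1)),
    Claim("C3", "E-9", date(1950, 1, 1), date(2020, 6, 1)),
]
for c in claims:
    print(c.claim_id, flag_claim(c) or "ok")
```

Returning reasons rather than a yes/no keeps the output useful to a human reviewer, which fits the goal of flagging claims for verification rather than denying them automatically.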
There are other valuable business drivers pushing the desire to know the
identity of people, their relationships, and the content of communications
between them.
- Customer 360 for a clear understanding of both individual and aggregate customer journeys with a product and its supporting services, enabling process automation, customization, retention, recommendations, and other marketing intelligence.
- Process Automation for business processes, back office, and supply chain, to reduce operational costs. This includes processing log files for anomalies, maintaining inventory and fulfilling orders, and alerting humans when the assembly line breaks, to name a few examples.
- Financial Planning and insights for a view of customer plans and investing strategies, including risk appetite. Estate planning strategies and trust management can also benefit from relational content and identity management data.
- Governance, Risk, and Compliance management with workflow, audit, and regulatory reporting. This includes Know-Your-Customer and other watchlist checks.
Entity Resolution also involves graph theory and can extend into Actor-Based
Network analysis, where the interactions between entities are mapped and resolved
to determine patterns of behavior within communities of practice. This can
occur between people groups, companies (think of Enron and the many companies
who were impacted by its malfeasance), or even within germ cultures in medical
research. Looking at this level of entity interactions is an advanced
discussion and will be the topic of a future post on Actor-Based Network
Analysis.
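As a small taste of that topic, resolved entities and their interactions can be held as a graph. This sketch uses hypothetical entity names and interactions, an adjacency list, and breadth-first search to find connected groups of actors, then picks the most connected actor in each. Connected components are only a crude stand-in for communities; real actor-network analysis uses richer measures and usually a graph library.

```python
from collections import defaultdict, deque

# Hypothetical interactions between already-resolved entities (undirected).
interactions = [
    ("Alice", "AcmeCo"), ("Bob", "AcmeCo"), ("Alice", "Bob"),
    ("AcmeCo", "ShellCo"), ("Carol", "Dan"),
]

def build_graph(edges):
    """Adjacency list: entity -> set of entities it interacted with."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def communities(graph):
    """Connected components found with breadth-first search."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(sorted(group))
    return groups

graph = build_graph(interactions)
for group in communities(graph):
    hub = max(group, key=lambda n: len(graph[n]))  # highest-degree actor in the group
    print(group, "hub:", hub)
```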
As you continue to explore NLP and other ML/AI toolsets and how they can be
composed into systems that deliver risk reduction, cost savings, and greater
customer value in product experience delivery, understanding the concept of
entities will make many other topics in Data Science, Machine Learning, and AI engineering easier to engage with.
When using AI toolsets in many practical business applications, we use entities
to understand who is doing what, when, and why. We often want to respond to
actions and situations as they happen, understand details after the fact, or
predict the next actions.
S. Bolding —Copyright © 2021 · Boldingbroke.com