Benefits of GenAI
The landscape of general computing has changed significantly
since the initial introduction of ChatGPT in November 2022 and of similar LLMs
for use by developers and end users alike. And yet the technology is not
new, just expanded. Built on previous generations of research and model
building, ChatGPT and its fellow models are examples of stitching together
several targeted models in various domains to reflect a more global view of
reality. ChatGPT itself comprises at least eight distinct models under the hood.
It’s not the apocalypse some critics predict: the end of the middle class,
the destruction of white-collar jobs, the advent of the age of machines.
Cynicism and fear generate clicks, using the specter of
‘radical tech’ for marketing purposes and to show you how important they, the
titans of tech, are. Human-computer interactions via AI went from prescriptive
to collaborative overnight. The shift centers on a new interface paradigm.
No longer tied to the physical elements of screen, keyboard, and mouse, humans can
now use natural language to order their machines around. This currently
includes voice, gestures, and facial expressions, which augment the traditional
written text input. We talk, using words to explain what we mean, and AI talks
back to us.
These types of interactions, defined as more ‘human,’ imply
collaboration, in that humans do part of the task. People scope, bound, and
provide context to the questions. This is now known as ‘prompt engineering,’ a
whole new job category that emerged with the advent of interactive models.
Collaboration means that humans do part of the tasks, AI does others, and the
back and forth of it all feels more ‘natural,’ with a better result at the end
of the day. The trend is to view these tools and bots as augmentation to
existing efforts. And while Iron Man’s JARVIS is a long way from the old paradigm of
Jeeves (as in AskJeeves of
the Internet 1.0), generative AI is far from independent; it cannot have agency
and purpose without human input.
There is a potential downside, in that AI
amplifies what’s already there. So if you have great processes you’re
automating, they will become more efficient. If your processes are faulty,
inefficient, or misdirected, then those flaws will multiply, causing more loss
of time, income, and ultimately clients. It’s a double-edged sword.
Operationalizing AI
AI as a topic is about more than just LLMs and content
creation. At its core, the LLMs that underpin today’s advances in AI are built
on data. When you operationalize AI, you're essentially operationalizing data. For
the longest time (in internet terms), the focus has been on building out
networks, infrastructure, and the applications that run on top of them. Now,
however, to protect your valuable information, the paradigm has shifted to
protecting the data that runs through and powers that infrastructure. Think of
data as the gas that fuels the engine. Without fuel, the systems and processes
are just so much piping and potential.
To get to a point where companies are protecting both the
infrastructure and the data, a core discipline of “always-on encryption” will be
needed to protect sensitive information and prevent breaches in both internal
systems and cloud environments.
Data Security in Today’s Environment
Data, your personal information, is a form of currency in
today’s online world. No longer is your privacy guaranteed as geographical
boundaries and jurisprudence are increasingly ignored by hackers, thieves, and
well-intentioned analytics gurus. Data mining is a profitable business, and you
are the product, being milked for all it’s worth. There are daily reports of
medical information being shared without permission with insurance groups,
biostatisticians, and others in the name of the ‘public good.’ Standards
organizations can’t even agree on basic definitions. Rules and regulations
like GDPR give many the right to be
forgotten. But you must deliberately opt out. And when you don’t realize you’re
even ‘in’ the game in the first place, it's a never-ending battle to protect
your digital footprint.
How does the smart consumer of digital life go about
shielding their most intimate details, while still enjoying the many
entertainments and services at hand on the web of today? Some experts say it’s
already too late. And they may be right. Your cellphone alone provides a wealth
of data via GPS, location tracking, and connectivity services, the baseline
necessary to stay connected while out and about, before you even begin to
indulge in a little TikTok.
Needless to say, data security is a core discipline for companies to protect sensitive information and prevent breaches in both internal systems and cloud environments. In the case of expert systems, where the software mines information internally from repositories and active channels such as MS Teams or Slack, employees are subject to exposure on a regular basis. These smart knowledge management systems advertise that they distribute the right information to the right people at the right time. However, if an employee is chatting about personal plans or sensitive health information, that could be shared with coworkers unintentionally. Another example: executives are talking with each other about the financial implications of an upcoming merger. It is essential that the information remain confidential to comply with SEC regulations and not violate insider trading restrictions. Ensuring that the conversations remain limited to a small cohort is where data privacy rules and policies come into play.
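As a rough sketch of what such a policy might look like in code, the snippet below decides whether a knowledge management bot may surface a message from a given channel to a given requester; the channel names, cohort labels, and policy structure are invented for illustration, not taken from any real product.

```python
# Toy cohort-based policy check for a knowledge management bot:
# only surface a message if the requester belongs to an allowed cohort.
# Channel names and cohorts below are hypothetical examples.
ACCESS_POLICY = {
    "mna-deal-room": {"cfo", "general-counsel", "ceo"},  # merger discussions
    "general": {"all-employees"},
}

def can_surface(channel: str, requester_cohorts: set[str]) -> bool:
    allowed = ACCESS_POLICY.get(channel, set())
    return "all-employees" in allowed or bool(allowed & requester_cohorts)

print(can_surface("mna-deal-room", {"engineering"}))  # False: outside the cohort
print(can_surface("general", {"engineering"}))        # True: open channel
```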
Encryption at rest and in transit
Maintaining data security is critical for most organizations. Unauthorized access to even moderately sensitive data can damage the trust, relationships, and reputation that you have with your customers. Any truly responsible company encrypts data stored at rest by default. Any object uploaded to the cloud can be encrypted using, for example, an Amazon-managed encryption key or a Google key store. Having a key management solution is a critical step for any company to take. Data in transit should also be encrypted to follow industry best practices and prevent leakage or exposure to bad actors. Again, each company will employ its own keys and methods to ensure this protection is in place. It’s all about good hygiene along the entire route of getting the bits of information to and from where they are most needed.
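As one hedged example of what this looks like in practice, the sketch below uploads an object to Amazon S3 under a customer-managed KMS key using boto3; the bucket name, object key, and key alias are placeholders, and the HTTPS connection boto3 uses by default covers the in-transit side.

```python
# Sketch: encrypt an object at rest with a KMS key during an S3 upload.
# Bucket, object key, and KMS alias are placeholders for this example;
# boto3 talks to AWS over HTTPS, covering encryption in transit.
import boto3

s3 = boto3.client("s3")
with open("q3-financials.pdf", "rb") as body:
    s3.put_object(
        Bucket="example-secure-bucket",        # placeholder bucket name
        Key="reports/q3-financials.pdf",
        Body=body,
        ServerSideEncryption="aws:kms",        # encrypt at rest with KMS
        SSEKMSKeyId="alias/example-data-key",  # customer-managed key alias
    )
```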
Risks of Unbounded AI
The government, as usual, is behind the curve in regulating
the industry. There are no clear ways to apply old paradigms enshrined in law
to new ways of viewing the world.
Regulatory controls and agencies such as the FEC and SEC are designed to
monitor and enforce compliance by human beings with laws regarding money and markets.
If an AI agent acts contrary to the rules, who do you punish? The programmers
who created the agent or bot? The company that deployed the AI? The law of
unintended consequences always raises its head.
A case
pending before the Supreme Court addresses YouTube’s use of content to
generate video recommendations for users. Does the liability rest on the
organization’s ability to shape content via recommendation engines and the
algorithms that power them? This case is relevant to the debate, as ChatGPT and
its brethren operate according to the same principles as those recommendation
engines. In other words, do Section 230 protections, which generally apply to
third-party content from users, extend to content or information a company
creates out of that third-party data? Do they
protect companies from the consequences of their own products? More
importantly, who ‘owns’ the output of the engines? In Thaler v. Vidal, the
court maintained that US patent law requires a human inventor.
Many institutions and groups are organizing around ‘Ethical
AI’ to lay the groundwork for a public policy debate that surely must come.
Groups such as Oxford
University, the Institute for Ethical
AI and ML (UK), Stanford
University, AI for Good (UN), and
Global IA Ethics Institute (Paris) are
all attempting to lead the way. There are many more out there offering classes,
certificates, and services such as audits and systemic quality reviews. It’s
early days.
Ethical and Privacy Concerns Inherent in the AI/ML Space
When it comes to the use of data to create LLMs and the
output of those models in the GenAI world, there are a few points of interest
to consider. When you stop the flow of data, the gas that fuels the engine, you
interrupt the flow of business. Effective cybersecurity means safeguarding the
flow as well as the contents of data. In terms of AI, and how that data is used
to build models, securing your digital assets starts at the moment of creation;
after that, curation is the name of the game. The points below explain how AI
sucks in your data, chews it up, and ultimately spits out a version of your
reality, or view of the world.
Provenance: Who chooses the data to train the model?
There are many sources of data that a company generates:
documents, communications, imagery, marketing, financial, and so forth.
Granting access to the databases, file repositories, and financial systems
inherently opens those systems up to intrusion, because once you provide an API
or other access path, it can be compromised. Guaranteeing security at the source is
critical, and it always comes down to a people problem. Trust the people, secure
the API endpoints, and implement best practices such as 2FA or MFA to ensure that
the person or system accessing your data really is authorized and authenticated
to use it.
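A minimal sketch of that last step, assuming tokens issued by an identity provider that records multi-factor authentication in the standard OIDC ‘amr’ claim; the signing secret and claim handling here are illustrative, and a real deployment would follow its provider’s documentation.

```python
# Sketch: gate an API endpoint on an authenticated, MFA-backed token.
# The signing secret and the reliance on the OIDC "amr" claim are
# assumptions for this example, not a specific provider's API.
import jwt  # PyJWT

SIGNING_SECRET = "replace-with-a-real-signing-key"

def is_authorized(token: str) -> bool:
    try:
        claims = jwt.decode(token, SIGNING_SECRET, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        return False
    # Accept the request only if multi-factor auth backed the token
    return "mfa" in claims.get("amr", [])
```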
But people have biases, and data must be cleansed,
normalized, labeled, and in many other ways massaged to be useful to machines.
Unsupervised learning techniques avoid a lot of these issues; however, they are
less efficient and more prone to statistical error than having humans curate
the data set by preprocessing it. The quality of data matters, and just as ‘you
are what you eat’, your ‘model is what it ingests.’ If you only feed it emails,
then a skewed view of your company results.
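To make the ‘massaging’ concrete, here is a small sketch of the cleanse-normalize-label steps using pandas; the column names and the crude labeling rule are hypothetical stand-ins for whatever a real pipeline and its human reviewers would use.

```python
# Sketch: cleanse, normalize, and label a raw data set with pandas.
# Column names ("raw_text", "department") and the labeling heuristic
# are invented for illustration; a human would still review the labels.
import pandas as pd

def prepare_training_frame(df: pd.DataFrame) -> pd.DataFrame:
    # Cleanse: drop rows with missing text, strip stray whitespace
    df = df.dropna(subset=["raw_text"]).copy()
    df["raw_text"] = df["raw_text"].str.strip()

    # Normalize: consistent casing for text and department codes
    df["raw_text"] = df["raw_text"].str.lower()
    df["department"] = df["department"].str.upper()

    # Label: a crude heuristic a human curator would later verify
    df["label"] = df["raw_text"].str.contains("invoice").map(
        {True: "finance", False: "general"}
    )
    return df
```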
Data has a sort of provenance to it, and this evidence can
be found in the metadata, like ‘was it created by a human or a computer,’ ‘when
was it created,’ ‘where is it housed,’ and so forth. Each element can be
faked, but together they add up to a sort of fingerprint or watermark, an
information trail as it were.
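One way to capture such a trail, sketched below with invented field names rather than any standard metadata schema, is to record a content hash alongside the who, when, and where of each asset.

```python
# Sketch: a simple provenance record for a data asset. Field names are
# illustrative; they are not drawn from a standard metadata schema.
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path

@dataclass
class ProvenanceRecord:
    sha256: str      # content fingerprint
    created_by: str  # "human" or "machine", per the source system
    created_at: str  # ISO 8601 timestamp
    stored_at: str   # where the asset is housed

def record_provenance(path: Path, created_by: str) -> dict:
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    record = ProvenanceRecord(
        sha256=digest,
        created_by=created_by,
        created_at=datetime.now(timezone.utc).isoformat(),
        stored_at=str(path.resolve()),
    )
    return asdict(record)
```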
Methodology: What technique is used to produce and maintain the model
There are many approaches to model generation: Supervised
Learning, Unsupervised Learning, Reinforcement Learning, and Deep Learning, to
name the most popular. You may also have heard of techniques like ‘zero-shot’
and ‘one-shot’ (or ‘few-shot’) learning.
Each method is an approach to statistically evaluating the
data discussed under provenance above. How and why you pick one approach over
another is based purely on the results you are looking for. Business
applications and solutions to process bottlenecks require differing
perspectives to achieve improved outcomes.
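To make the contrast concrete, the toy sketch below runs a supervised and an unsupervised method over the same fabricated data with scikit-learn; the numbers and labels are invented purely for illustration.

```python
# Sketch: supervised vs. unsupervised learning on the same toy data.
# Features and labels are fabricated for illustration only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [7.8, 9.2]])
y = np.array([0, 0, 1, 1])  # labels exist only in the supervised case

# Supervised: learn a mapping from features to known labels
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.1, 1.9]]))  # -> [0]

# Unsupervised: discover structure without any labels at all
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)  # two groups, but the machine doesn't name them
```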
These techniques are fundamentally descriptive and focus on
what humans actually have done. They do not handle the ambiguity of what we
‘should do,’ or its negation, what we ‘shouldn’t do.’ Consider the method used to
train an AI to identify which recruits to interview for a new hire. If the
data set is skewed and the method is weighted toward a desired outcome, the
machine’s decision making automatically throws out resumes it deems
statistically insignificant. There is no way to argue against it. Valid candidates
don’t even get a rejection email. Classifications can be reviewed, but who’s to
say it’s the right set of classifiers in the first place? The explainability of
your method is just as critical to the process as is the data. Values such as
beneficence, non-maleficence, autonomy, accountability, responsibility, and
privacy are also key dimensions.
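One hedged example of such a review is a simple disparate-impact check on the model’s decisions, sketched below with fabricated groups and outcomes; a real review would use real applicant data and thresholds agreed in advance.

```python
# Sketch: compare selection rates between two applicant groups using the
# "four-fifths" rule of thumb. Groups and decisions are fabricated.
def selection_rate(decisions: list[int]) -> float:
    return sum(decisions) / len(decisions)

group_a = [1, 1, 0, 1, 0, 1]  # 1 = invited to interview
group_b = [0, 0, 1, 0, 0, 0]

rate_a, rate_b = selection_rate(group_a), selection_rate(group_b)
impact_ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
print(f"impact ratio: {impact_ratio:.2f}")  # below 0.8 flags possible adverse impact
```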
Don’t exaggerate what your algorithm or model can do. Don’t
speculate that it delivers fair or unbiased results. Your statements must be
backed up by evidence, and this is where we get to the next issue, one of
quality and veracity.
Quality: What technique is used to test the model and validate its output
Transparency is the cornerstone of trust when it comes to
what is, for all intents and purposes, a black-box process. The means by which
you start out must be the way you finish. And the ends cannot justify the
means. If a company is not careful, decisions and activities can be delegated
to bots and justified by the thought that bots can get away with everything a
human can’t. This is a logical fallacy: this evasion of responsibility ends up
becoming no more than a legal dodge, a sort of ‘human behavior laundering’ in
the guise of technical advances. Who’s to blame? You can’t point to a single
person; the machine’s doing the work.
While the alphabet agencies are developing frameworks and talking about compliance and certification, there are still basic steps toward testing, validation, and certification you can take. Obvious proactive measures are:
- Independent audits
- Legal reviews
- Opening your data or source code to inspection by third parties
Quality control of the data input lays the foundation for
verification of the output. Do the results match the expected area of
information and the standard responses a reasonable person would accept? Are
statistical measures within tolerances? All of this implies that you have
established acceptance criteria and KPIs in advance, expectations that support
your business goals.
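As a small sketch of what ‘acceptance criteria established in advance’ can look like, the check below compares measured metrics against thresholds a team would set to match its own KPIs; the metric names and numbers are placeholders, not recommendations.

```python
# Sketch: validate model output against pre-agreed acceptance criteria.
# Metric names and thresholds are placeholders a team would set itself.
ACCEPTANCE_CRITERIA = {
    "accuracy": 0.90,      # minimum share of validated-correct answers
    "refusal_rate": 0.05,  # maximum share of prompts wrongly refused
}

def passes_acceptance(metrics: dict[str, float]) -> bool:
    return (
        metrics["accuracy"] >= ACCEPTANCE_CRITERIA["accuracy"]
        and metrics["refusal_rate"] <= ACCEPTANCE_CRITERIA["refusal_rate"]
    )

print(passes_acceptance({"accuracy": 0.93, "refusal_rate": 0.02}))  # True
```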
In testing, skepticism about the data and the output is your
best tool. Deep fakes are an increasing problem. Confirmation bias creates a
reinforcement feedback loop that increases the willingness of some to believe
whatever a machine creates. ‘The data says so’ and ‘that’s what the data shows’ are
no longer strong arguments. Corroboration of information means human curation
at some point. Quality control is that step.
Product: Who owns the output
The legal responsibility for the impact of a model is the
question at hand. As ridiculous as it may seem, lawyers are already being hired
to represent AI chatbots. This maneuver is an attempt to displace
responsibility away from the humans behind the curtain. In effect, the
developers and the companies that employ them are trying to say, ‘hand to heart,
we can’t be held responsible for discrimination, it’s an algorithm.’
Logic dictates that we question this attitude. Common sense
says ‘the creator is greater than the creation.’ When humans encode
instructions and give direction to a model, they are essentially the puppet
masters controlling the strings. A computer sifts through data, looking for
patterns. Who told it which patterns were relevant and which patterns to
ignore? The humans who wrote the algorithms and code, obviously. The teams that
create, curate, process, and quality-control the data should be held responsible
for the outcome of their creation.
The next question that immediately comes to mind is ‘who
then owns the output?’ Some companies are arguing that the end user is the legal
owner. OpenAI’s terms assign
the right, title, and interest in output to the user. Legal experts caution
that while it may be a nice position to take, large corporations are
wont to change their minds retroactively. Additional criteria may include
‘independent intellectual effort’ (perhaps demonstrated by prompt creation),
‘originality’ of authorship, and so forth. None of these are inherent in
AI-generated output.
In Conclusion
Learning about all these issues and more will protect you
and your company in the coming months and years as the current wild west of
GenAI turns into the established way of doing business. Taking care now to balance
and protect your interests will ensure a safe and prosperous future for all.