Pattern Computer, out of Friday Harbor, WA, has published a new whitepaper about XAI (Explainable AI). You can download the PDF here. The press release is here. The researchers at Pattern Computer present an interesting use case of how AI in its current state is able to advance scientific research and yet still faces fundamental challenges. I have summarized the article in plain English, as some of the concepts are highly technical and rooted in data science theory.
Hypothesis: One can use AI and data science to change the basic nature of scientific discovery. Researchers no longer need to form a hypothesis; instead, they should let the data tell them what emerging issue or idea to pursue in the broader world. To achieve this, adequate tools must be developed.
Today, science leans on human intuition to form a hypothesis about the issue or idea being investigated, which depends on that human having deep domain knowledge. With Machine Learning (ML), the machines are domain experts by virtue of the data they hold and analyze. The methods for creating the models and neural networks still depend on the humans who select and input the data and code the algorithms. However, machines are able to go beyond human capacity: they can be domain-agnostic, they are faster and more efficient, and they can learn. AI models make profound connections that a human would miss.
Will this impact the sciences as much as it has other industries? Biology is one example where AI is already providing successes. Other sciences that are early adopters include Climate Science, Drug Discovery, Agriculture, Cosmology, and Neuroscience.
Why is AI more productive in Industry than in the Sciences? It is the nature of the questions being asked: Industry looks to solve specific, discrete problems and achieve a return on investment; money is on the line. The Sciences, however, seek to answer broader questions about the wider world, such as natural phenomena, at an order of magnitude and scale that goes well beyond the smaller, more targeted goals of Industry. As a result, AI in the Sciences requires more investment and time, more data, and a more complex set of algorithms. Indeed, it all goes back to the questions being asked.
Additionally, scientific discovery seeks to answer 'how' and 'why' an
outcome occurred, not just to arrive at the 'what' of the data output. The goal
of the Pattern Computer whitepaper is to examine the
promise of 'AI for Science' where, as they state, 'we must develop methods to
extract novel, testable hypotheses directly from data-driven AI models.' In
essence, use AI to automate one of the steps of the scientific process, that of
forming a correct hypothesis. This will save time and money and increase opportunities
for furthering science. One of the most costly aspects of research is chasing
incorrect hypotheses or investigating an idea only to find out that a
competitor has already investigated and abandoned that idea. If you eliminate
these false trails early on in the process, you increase the likelihood that
your research will result in tangible outcomes.
Taking a 'scientific first-principle' approach to modeling means that the model is a secondary tool to the primary question of understanding the world. Therefore the model, regardless of the methodology used to create it, must adjust to the data and needs of the domain being investigated. The simple approach of using a training set to create a candidate model, and then test, adjust, and test again, is not sufficient, because it only looks for the positive and does not account for counterfactual or contradictory data. You get curve fitting rather than an account of the whole world. This can produce a correct outcome but does not answer the critical questions of 'why' and 'how' that were mentioned earlier.
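As a rough illustration of that curve-fitting trap, the toy sketch below (my own construction, not taken from the whitepaper) fits a flexible polynomial to synthetic data generated from a known exponential law: it matches the observed range closely, yet it encodes no mechanism and breaks down as soon as it is asked about the world outside its data.

```python
# A minimal sketch of the curve-fitting trap: a flexible model matches the
# training range well, yet says nothing about the underlying mechanism and
# fails outside the data it has seen.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "world": an exponential decay process observed with noise.
x_train = np.linspace(0, 2, 40)
y_train = np.exp(-x_train) + rng.normal(0, 0.02, x_train.size)

# Fit a degree-9 polynomial -- a pure curve fit with no physical meaning.
coeffs = np.polyfit(x_train, y_train, deg=9)

# Inside the training range the fit looks excellent...
in_range_err = np.abs(np.polyval(coeffs, 1.5) - np.exp(-1.5))
# ...but extrapolating beyond it, the polynomial diverges from the true law.
out_range_err = np.abs(np.polyval(coeffs, 4.0) - np.exp(-4.0))

print(f"error at x=1.5 (interpolation): {in_range_err:.4f}")
print(f"error at x=4.0 (extrapolation): {out_range_err:.4f}")
```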
Statistical approaches investigate not only the data but the actual model, how it was built, and the domain being modeled. They provide confidence scores, quantify uncertainty, and look at the impact of parameters, both chosen and rejected. Statistical methods are just one way of testing a model; there are others that do not depend on parameters, and the authors list several. The crucial factor is that the model be tested by outside methods to validate its ability to reason quantitatively and qualitatively.
The authors provide a clean and well-presented history of improvements in
tools and technology in Figure 1, which summarizes
the advances in math and computing leading to today's state of the art in AI.
They question whether data-driven AI can mix with Statistics the same way that
Calculus and Computing are linked. Figure 2
provides a view of the applications of current AI in Science versus Industry.
Starting with a statistical approach of creating a minimal model based on a
limited set of parameters, the authors posit that it is possible to infer from
that proto-model a hypothetical collection of 'parameters to outcomes' that
will indicate which parameters to include in a more complex alpha-model, and
then iterate from there. However, in my opinion, this is little different from the create-a-training-set-and-test cycle they rejected in their introduction to the problem. They are simply starting from a statistical approach instead of an ML approach. It is a matter of degree rather than process at this juncture: we are splitting hairs.
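To make that comparison concrete, here is a minimal sketch of how I read their iteration: a greedy forward-selection loop (my own illustrative code; `grow_model` and the synthetic data are hypothetical) that grows a proto-model parameter by parameter whenever a candidate parameter improves cross-validated performance.

```python
# A minimal sketch of growing from a proto-model to a richer "alpha-model":
# greedily add the parameter that most improves cross-validated performance,
# and stop when nothing adds enough predictive value.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def grow_model(X, y, min_gain=1e-3):
    remaining = list(range(X.shape[1]))
    chosen, best_score = [], -np.inf
    while remaining:
        scored = []
        for j in remaining:
            cols = chosen + [j]
            score = cross_val_score(LinearRegression(), X[:, cols], y, cv=5).mean()
            scored.append((score, j))
        score, j = max(scored)
        if score - best_score < min_gain:
            break  # no remaining parameter improves the model enough
        chosen.append(j)
        remaining.remove(j)
        best_score = score
    return chosen, best_score

# Hypothetical data: only the first two of six candidate parameters matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 6))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 300)
print(grow_model(X, y))
```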
In regard to their discussion of Generalization versus Extrapolation (Section 3), the authors look to expound upon their theory regarding the construction of models for scientific understanding of the world. And it all comes down to data: access to quality data is always a barrier to entry in any effort. They break the discovery of patterns into two classic parts: the capacity to learn from the data and the capacity to learn from the outliers that do not 'fit the curve' of the statistically interesting nodes. Why is the second important? Because it is in the outliers that the potential for new scientific discoveries may lie. But both are critical for the scientific method to succeed. In Industry, by contrast, those outliers are anomalies to be cleansed and removed.
Extrapolation is the principle of using one concept to transfer learning to
another domain. Figure 3 of the paper
illustrates this principle of science. For AI research, the concept of
extrapolation applies when a first-principles model is used to create
phenomenological or semi-empirical models so that data can be extended beyond
its narrow domain to neighboring areas of research.
But where does the principle of Generalization fit into this for AI? It is
in the testing of accuracy that we see generalization methods being applied,
and again it goes back to training sets, where a data set is split in two, one
for creating the model and one for testing its output. This allows the creator
to measure things like accuracy, precision, recall, and other important
benchmarks. If you can then apply that model to a very different data set and
get similar benchmarks, the model is said to be generalizable or extensible to
other domains.
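A minimal sketch of that workflow, on synthetic data of my own making: hold out part of one data set to measure accuracy, precision, and recall, then apply the same model to a deliberately shifted 'domain' and see whether the benchmarks survive.

```python
# A minimal sketch of the generalization test: benchmark on a held-out split,
# then check whether the same benchmarks hold on data from a different "domain".
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(3)

def make_domain(shift, n=1000):
    # Same underlying mechanism, shifted data distribution.
    X = rng.normal(loc=shift, size=(n, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > shift * 1.5).astype(int)
    return X, y

X, y = make_domain(shift=0.0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

def report(name, X_eval, y_eval):
    pred = model.predict(X_eval)
    print(name,
          "acc=%.2f" % accuracy_score(y_eval, pred),
          "prec=%.2f" % precision_score(y_eval, pred),
          "rec=%.2f" % recall_score(y_eval, pred))

report("held-out split (same domain):", X_te, y_te)

# If the benchmarks also hold up on a shifted domain, the model is
# (informally) said to generalize beyond its original data set.
X_new, y_new = make_domain(shift=2.0)
report("shifted domain:              ", X_new, y_new)
```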
Extrapolation, in contrast, takes into account counterfactual reasoning about data that could not be collected and what we can posit about it. The essential difference is that the AI model is either discovering patterns within a given data set or discovering patterns outside of that data set in the broader world: two entirely different problems. However, without proper tools
and methods to evaluate the performance of the model being used, the
extrapolation may end up with poor results. In other words, it provides a
vector for further investigation, but does not provide conclusive evidence.
Where is the answer to this conundrum? Adversarial AIs which challenge those
models may provide an automated means of testing. Such testing would increase
the robustness of the models and reduce the noise in the signals produced.
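As a loose sketch of that idea, the snippet below runs a random-perturbation stress test; it is not a true gradient-based adversarial attack, just a rough proxy for how easily small input changes flip a model's predictions. The synthetic data and the `flip_rate` helper are my own hypothetical constructions.

```python
# A minimal sketch of automated stress-testing in the spirit of adversarial
# challenges: measure how often small random input perturbations change the
# model's prediction, a rough proxy for robustness and noise in the signal.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)

# Hypothetical training data and candidate model.
X = rng.normal(size=(500, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def flip_rate(model, X, epsilon, trials=20):
    """Fraction of predictions that change under small random perturbations."""
    base = model.predict(X)
    flips = 0.0
    for _ in range(trials):
        X_perturbed = X + rng.normal(0, epsilon, X.shape)
        flips += np.mean(model.predict(X_perturbed) != base)
    return flips / trials

for eps in (0.01, 0.1, 0.5):
    print(f"epsilon={eps}: flip rate={flip_rate(model, X, eps):.3f}")
```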
Explaining AI is always a challenge, and this whitepaper makes a significant attempt at explaining AI for Science. The authors go into detail in Section 4 on the importance of differentiating between 'explaining' and 'interpreting' the matter at hand. While it may seem like semantics, the details are important: one can understand a model yet have a hard time understanding the implications of its output. The authors denote the difference as follows: 'the distinction between identifying patterns in the data/model (hence, ML-style generalization) versus discovering patterns in the world from data/model (hence, scientific extrapolation, or ML-style transfer learning).' But there is a simpler way to understand the difference, and a plain English definition will suffice: explaining is understanding what is going on; interpreting is translating that into new insights, the 'why' and the 'how'. This, according to the authors, results in a process of vetting the model in question for its fitness for use.
The Completeness of the model as a metric is not really addressed in this
paper. But it is significant to the discussion, and somewhat implied. In
modeling a domain for any purpose, the choice of data for training or input to
the model is based on the question being asked. But as is noted in this paper,
science seeks to answer questions about the wider world. Therefore, in looking
at the quality of a model, and using principles of extrapolation, would it not also be important to examine the gaps and assumptions in the model, thereby avoiding bias? Examples of bias are well documented, such as facial recognition AI that fails to recognize minority faces, or the lack of data about
women's health in third-world countries when modeling heart disease. If healthcare
models are based on first-world countries and the bulk of data over time has
been collected about men, then it's hard to imagine that a model is complete in
that domain. The authors discuss density estimations as embedded preponderances
of data to some degree, and link that to regional effects. These and other
barriers to integrating AI into the scientific process again trace back to the
availability of input data. The authors acknowledge that the 'capacity to
understand data representations' is a 'grand challenge' that has significant
impacts on the scientific discovery process.
Learning additional information always results in a decision being taken.
This is how science advances. You make a hypothesis, test and learn from it,
then make decisions about the next steps. What that information has taught you
about the world determines a new way of interacting with the data and
parameters with which you are working. Starting out with semantically
meaningful parameters describes a known system or area of knowledge. Then you
examine what that model is telling you about the information. How you take
decisions to extrapolate and form new hypotheses is a common sequence of
events. When the data is within the same domain but lacking semantic meaning in
and of itself, then the model may take that unstructured data and provide a new
insight into it. Or it may distort and provide false trails.
The distinction is important. How you take decisions on which path to pursue
is critical to the process. The authors posit that it is possible for AI
algorithms to discover emergent parameters that lead to defining new semantics,
intrinsic meaning, within a domain that lacks definition. This problem is
non-trivial because the process as well as the results need to be subjected to
rigorous testing. Emergent system features for modeling in a domain are a
wide-open field of research at the moment, due to the lack of good, let alone
sufficient, testing methods. One example the authors give is whether a model
has hidden within it knowledge about a structure or pattern that humans lack
the ability to discern. How to extract that knowledge in a scientific manner, and
then make it more than an anecdote of the research? This is the non-trivial
part where the scientists must be rigorous and detailed in their validation steps.
The lack of clear connections between parameters and patterns is where AI excels, but it is also where the most doubt can be found. Where there are 'as-yet-unknown system controls' at play, the definition of an emergent phenomenon lies in the outliers. But many data scientists treat those outliers as data to be rejected or cleansed from the set because the data doesn't match their hypothesis. This is classic confirmation bias at play. The authors rightly acknowledge that outlier data cannot be rejected out of hand; it must be accounted for in order to validate the hypothesis as well as the model. Just because you don't like something doesn't mean you can ignore its existence. That's not scientific.
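In practice, this argues for flagging outliers and setting them aside for inspection rather than silently deleting them. The sketch below (my own, using scikit-learn's IsolationForest as one of many possible detectors) does exactly that on synthetic data.

```python
# A minimal sketch of handling outliers as candidates for discovery rather
# than as noise to be cleansed: flag them, keep them, and examine them.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)

# Hypothetical measurements: a dense bulk plus a handful of extreme points
# that might be noise -- or the emergent phenomenon worth a new hypothesis.
bulk = rng.normal(0, 1, size=(500, 2))
extremes = rng.normal(6, 0.5, size=(5, 2))
X = np.vstack([bulk, extremes])

labels = IsolationForest(random_state=0).fit_predict(X)  # -1 marks outliers

inliers = X[labels == 1]      # used to fit and validate the main model
candidates = X[labels == -1]  # retained and examined, never discarded out of hand

print(f"{len(candidates)} outlier candidates retained for follow-up analysis")
```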
Scale is another factor to consider in the problem space, whether the domain
is life sciences or any other. What is the correct resolution of data? Or
information inputs when they are images? The data points to collect from those
images? The system being queried has different layers of complexity whether it
be a civilization and the domain is sociology, or the domain is neurosciences
and the object being studied is the human brain. The need to learn the correct
level, or scale at which to examine the data is based not only on the question
being asked, but also on the governing principles, the decisions taken to this
point, and the responses gleaned thus far. There is a need to learn the scale
as you perform the research; trial and error come into play.
The solution may lie in creating multiple variants of models and then combining them into a single model, resulting in higher-quality predictions, because the errors of the individual sub-models cancel each other out while their outputs tend to reinforce one another. This may be
faulty logic unless the ensemble methods used address the errors, because the
sum of those errors could also have a compounding effect on the output. It
creates a black-box effect in Industry, where the end users rarely look at how
the model was built: its explainability. Industry only cares about
effectiveness and accuracy for the most part, because there is no need to worry
about the negative or counterfactual implications. The causal relationships do
not matter. AI solves a discrete task, a function of operations, a limited need
to know. Science is not like this. The need to know in science is boundless.
Science needs the causal factors as well as the relationships to gain a deeper
understanding of system dynamics.
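For concreteness, here is a minimal ensembling sketch on synthetic data (my own, not the authors'): averaging several imperfect sub-models often lowers the error because their mistakes partially cancel, yet the averaged prediction explains nothing about why.

```python
# A minimal sketch of ensembling: average the predictions of several different
# sub-models; their errors partially cancel, but the result is a black box.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(600, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.2, 600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = [Ridge(), KNeighborsRegressor(n_neighbors=10),
          GradientBoostingRegressor(random_state=0)]
preds = []
for m in models:
    m.fit(X_tr, y_tr)
    p = m.predict(X_te)
    preds.append(p)
    print(type(m).__name__, "MSE=%.3f" % mean_squared_error(y_te, p))

# The averaged ensemble: individual errors partially cancel.
ensemble_pred = np.mean(preds, axis=0)
print("Ensemble MSE=%.3f" % mean_squared_error(y_te, ensemble_pred))
```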
Extracting structures from the model representation of a domain (i.e.
real-life situations) is a pattern recognition problem. Repeating patterns
allow scientists to identify the building blocks of life, nature, and the
cosmos at the largest scale. When something repeats, it's called a pattern; when patterns occur over and over again, we call them structures. The closer to the core, the more fundamental the structure; the further out from the core, the more detailed and finer the scale becomes, often with less and less data to support it. Scale is handled by filter selection: how fine-grained to go, and whether to work low-dimensional or high-dimensional for higher-order patterns.
Minimal models with a few parameters are said to be low-dimensional and are
useful when the data set is finite.
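One common way to look for such low-dimensional structure is principal component analysis; the sketch below (my own illustration on synthetic data) shows that when a few components explain most of the variance, a minimal model with a few parameters may be an adequate description at that scale.

```python
# A minimal sketch of finding low-dimensional structure: if a few components
# explain most of the variance, a minimal model may suffice at that scale.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

# Hypothetical high-dimensional observations that secretly depend on
# only two underlying factors plus measurement noise.
factors = rng.normal(size=(400, 2))
mixing = rng.normal(size=(2, 20))
X = factors @ mixing + rng.normal(0, 0.05, size=(400, 20))

pca = PCA().fit(X)
explained = np.cumsum(pca.explained_variance_ratio_)
print("variance explained by first 2 components: %.3f" % explained[1])
print("components needed for 99% of variance:",
      int(np.searchsorted(explained, 0.99)) + 1)
```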
The problem with high-dimensional systems is that they exhibit patterns of chaotic behavior. They expand quickly beyond the ability of the model to handle them and have high noise-to-signal ratios. Sometimes the data points merge or diverge when a method such as linear regression is used, losing the unique qualities that are interesting to the researchers. This again goes to the question of scale: where is the right cut-off point? And the computing power required for complex, probabilistic models can be prohibitive. (Quantum computing may solve that problem in the near future.)
The authors take away two conclusions from this analysis: an AI model that
has good predictive accuracy must be learning some sort of approximated
parsimonious model; and existing methods are insufficient to identify or learn
from that parsimonious model in any scientific way. It still requires human
intervention to test and validate the learnings, showing that only the first
step of the scientific method can be automated by AI, that of forming the
hypothesis. Therefore, complex models, while they may contain well-defined,
low-dimensional sub-models, do not lend themselves to aggregation into a
globally applicable structure that can be analyzed with ease by a machine or
human. A taxonomy of model structures with definitions is provided in Figures
4A and B for reference.
Global sparsity does not translate to local sparsity in this configuration.
But the intermediate forms of model structures depending on either type of
sparsity may provide probative areas of interest and lead to high predictive
quality output. In other words, start small, combine what is good, and in
testing along the way, you may learn something interesting to pursue.
Statistical theory rests on models of global sparsity, lacking local bias in some way. Results from locally-biased models have proven to be brittle when tested. The thought is that by extrapolating from the local, then intermediate, then global scale, those global features will yield higher-quality predictions. Seems logical. But if all models are wrong and some are useful (as George Box stated [1]), how does one pick the models to incorporate at the local level to build up to the global? This problem of sub-models is inherent in both Industry and Science when looking for domain knowledge to be extended. Global models treated as a black box end up being a sort of Mechanical Turk, doing the heavy lifting without much insight. The authors look to local models to provide the finer insights of a white-box approach, since there the inner workings of the model have been examined more closely due to finer-grained parameterization.
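As a rough illustration of the global-versus-local distinction (my own reading, not the authors' formalism), the sketch below fits a single globally sparse Lasso model and then separate sparse models on two 'local' regions of a synthetic domain: the global fit selects one compromise set of parameters, while the local fits reveal that different parameters matter in different regions.

```python
# A minimal sketch of global versus local sparsity: one globally sparse model
# for all data versus locally fitted sparse models per region of the domain.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(8)
n = 1000
X = rng.normal(size=(n, 5))
region = X[:, 0] > 0  # a crude split of the domain into two "local" regions

# Hypothetical world: feature 1 drives one region, feature 2 drives the other.
y = np.where(region, 3.0 * X[:, 1], 3.0 * X[:, 2]) + rng.normal(0, 0.3, n)

global_model = Lasso(alpha=0.1).fit(X, y)
print("global coefficients:", np.round(global_model.coef_, 2))

for name, mask in (("region A", region), ("region B", ~region)):
    local = Lasso(alpha=0.1).fit(X[mask], y[mask])
    print(name, "coefficients:", np.round(local.coef_, 2))
```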
Ultimately, the challenge is to find the happy medium, to pursue the
intermediate-scale models. It is believed that this intermediate space is where
high-quality predictions lie. Data plus computation holds the potential to
solve challenges such as computer vision, predictability of drug outcomes, and
other hard problems. As always, the issue is testing the models, validating the
results, challenging them with negative cases.
To face this 'grand challenge', the authors take the approach of employing surrogate models: scientifically validated models, smaller models that have already been tested, core data sets cleansed for training, and other sources of ground truth can be called into service to establish a baseline.
These surrogate models are a type of lens through which one can examine the
model being tested. Even though the mechanisms of the two models may be
different or even unknown, if their responses match, that output provides a
level of validation for the candidate model. The proposition put forth by the
authors is that it is essential to develop a set of 'scientifically-validated
surrogates for state-of-the-art AI models' in order for AI-enabled science to
progress.
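A minimal sketch of one common form of surrogate check (a standard interpretability technique, not necessarily the authors' exact construction): train a small, inspectable model to mimic the responses of an opaque candidate model, and measure how closely the two agree.

```python
# A minimal sketch of a surrogate check: a simple, inspectable model is trained
# to reproduce a black-box model's responses; high agreement (fidelity) lets
# the surrogate act as a lens on the candidate model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(9)
X = rng.uniform(-2, 2, size=(800, 4))
y = X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + rng.normal(0, 0.1, 800)

# The opaque candidate model under study.
black_box = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
bb_pred = black_box.predict(X)

# The surrogate: small, inspectable, trained to mimic the black box's output.
surrogate = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, bb_pred)

fidelity = r2_score(bb_pred, surrogate.predict(X))
print(f"surrogate fidelity to the black box (R^2): {fidelity:.2f}")
```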
Operationally, they want to 'enable domain scientists to identify, test, and
validate' properties of surrogate models. This implies creating a suite of
tools and processes for that aspiration to be realized. The authors then go on
to provide a history lesson in analogous developments from chemistry and
mathematics, where similar approaches were taken. They state that better,
faster computers will not solve the problem when the challenge lies in the area
of validation and verification.
Naturally, the Next Steps, as Section 8 is
titled, revolve around the development of this set of surrogate models,
including counterfactual ones. Acknowledging that this is virgin territory for researchers, with few or no methods in existence, key to the success of their proposal is a three-point plan: create theories and methods for surrogate model development; statistically analyze the output of those surrogate models, especially the outliers; and develop strategies for counterfactual testing and reasoning about the models. All of the above requires valid use cases for each
domain, as well as learning from historic examples in engineering practices,
hard sciences, and life sciences.
Conclusions
Addressing the elephant in the room, ethical AI is especially relevant when
discussing medical applications, life sciences, or anything that affects people
in their daily lives. Socially biased predictions, blindly following the data
without examining who chose the data in the first place, allowing a machine to
make decisions with no human intervention in the interpretations, all lead to
potentially damaging results for individuals and communities. Keeping a human
in the loop is critical. An example given is crowd-sourcing the data input to
ensure a wider, more diverse representation of demographics. Using local
surrogate models against global models tests the global model's applicability
to a new region. The cost of being wrong in these situations is high when
medical treatments are the subject matter.
Teaching a machine to know what it doesn't know, to in essence identify a gap in data, a bias, is another area for research and exploration. These 'known unknowns' go to the question of competency of the model. But again it is a question of trust. Trusting the technology means trusting the people behind its creation, and absent the tools to validate the output as well as the methodologies, it is a matter of reputation and awareness. Once again, the authors argue that surrogate models provide the answer, as they are orthogonal to the methods used for training the candidate models being tested. The measure of uncertainty for models would then have a more theoretical foundation from which to operate.
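One simple way to approximate 'knowing what it doesn't know' is to score how typical a new input is under the training distribution and abstain in the gaps. The sketch below (my own, using kernel density estimation; the threshold and the `prediction_is_trustworthy` helper are hypothetical) illustrates the idea.

```python
# A minimal sketch of flagging what a model does not know: score how typical a
# new input is under the training data's density and abstain in the gaps.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(10)

# Hypothetical training data covering only part of the input space.
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))

kde = KernelDensity(bandwidth=0.5).fit(X_train)
# Use the 1st percentile of training log-density as a crude coverage threshold.
threshold = np.percentile(kde.score_samples(X_train), 1)

def prediction_is_trustworthy(x):
    """True if the input lies in a region the training data actually covers."""
    return kde.score_samples(x.reshape(1, -1))[0] > threshold

print(prediction_is_trustworthy(np.zeros(3)))      # inside the data: True
print(prediction_is_trustworthy(np.full(3, 6.0)))  # far outside the data: False
```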
Data being retrospective in nature is hard to mold into a predictive tool.
The decisions being made based on AI and data-driven models are entering the
daily lives of people at an increasing rate without societal awareness. It is
more incumbent than ever on scientists, both data scientists and hard scientists, to take
their ethical obligations seriously. This whitepaper is a step forward in
advancing and proposing a method and processes for doing so.
S Bolding—Copyright © 2022 · Boldingbroke.com
[1] George E. P. Box. Science and statistics. Journal of the American Statistical Association, 71(356):791–799, 1976. (See footnote 22 of the whitepaper.)