Thursday, May 2, 2024

Use Case: Explainable AI, Learning from Learning Machines

Pattern Computer, out of Friday Harbor, WA, has published a new whitepaper about XAI (Explainable AI). You can download the PDF here; the press release is here. The researchers at Pattern Computer present an interesting use case of how AI in its current state is able to advance scientific research, yet still faces fundamental challenges. I have summarized the article in plain English, as some of the concepts are highly technical and rooted in data science theory.

Hypothesis: One can use AI and data science to change the basic nature of scientific discovery. Researchers no longer need to form a hypothesis; instead, they let the data tell them which emerging issue or idea to pursue in the broader world. In order to achieve this, adequate tools must be developed.

Today, science leans on human intuition to form a hypothesis about the issue or idea being investigated, which depends on that human having deep domain knowledge. With Machine Learning (ML), the machines are domain experts by virtue of the data they hold and analyze. The methods for creating the models and neural networks still depend on the humans who select and input the data and code the algorithms. However, machines are able to go beyond human capacity: they can be domain-agnostic, they are faster and more efficient, and they can learn. AI models can make profound connections that a human would miss.

Will this impact the sciences as much as it has other industries? Biology is one example where AI is already delivering successes. Other early adopters include Climate Science, Drug Discovery, Agriculture, Cosmology, and Neuroscience.

Why is AI more productive in Industry than in the Sciences? It is the nature of the questions being asked: Industry looks to solve specific, discrete problems and achieve returns on investment; money is on the line. The Sciences, however, seek to answer broader questions about the wider world, such as natural phenomena, which operate at an order of magnitude and scale far beyond the smaller, more targeted goals of Industry. As a result, AI in the Sciences requires more investment and time, more data, and a more complex set of algorithms. Indeed, it all goes back to the questions being asked.

Additionally, scientific discovery seeks to answer 'how' and 'why' an outcome occurred, not just to arrive at the 'what' of the data output. The goal of the Pattern Computer whitepaper is to examine the promise of 'AI for Science' where, as they state, 'we must develop methods to extract novel, testable hypotheses directly from data-driven AI models.' In essence, use AI to automate one of the steps of the scientific process, that of forming a correct hypothesis. This will save time and money and increase opportunities for furthering science. One of the most costly aspects of research is chasing incorrect hypotheses, or investigating an idea only to find out that a competitor has already investigated and abandoned it. If you eliminate these false trails early in the process, you increase the likelihood that your research will produce tangible outcomes.

Taking a 'scientific first-principles' approach to modeling means that the model is a secondary tool to the primary question of understanding the world. Therefore the model, regardless of the methodology used to create it, must adjust to the data and the needs of the domain being investigated. The simple approach of using a training set to create a candidate model, then testing, adjusting, and testing again is not sufficient, because it only looks for the positive and does not account for counterfactual or contradictory data. You get curve fitting rather than an account of the whole world. This can produce a correct outcome but does not answer the critical questions of 'why' and 'how' mentioned earlier.
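
To make the 'curve fitting' point concrete, here is a minimal sketch in Python with toy data of my own (not from the whitepaper): a high-degree polynomial fit to a small training set looks nearly perfect on the data it has seen, while a held-out sample tells a different story.

```python
# Illustration only: fitting harder to the training data ("the positives we have")
# drives the training error down while the held-out error grows -- curve fitting
# rather than an account of the whole world.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = np.sin(np.pi * x_train) + rng.normal(0, 0.2, 20)   # hypothetical noisy process
x_test = np.sort(rng.uniform(-1, 1, 200))
y_test = np.sin(np.pi * x_test) + rng.normal(0, 0.2, 200)

for degree in (3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)            # fit only to what we trained on
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, held-out MSE {test_mse:.3f}")
```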

Statistical approaches investigate not only the data but the model itself: how it was built and the domain being modeled. They provide confidence scores, quantify uncertainty, and look at the impact of parameters, both chosen and rejected. Statistical methods are just one way of testing a model; there are others that are not dependent on parameters, and the authors list several. The crucial factor is that the model be tested by outside methods to validate its ability to reason quantitatively and qualitatively.
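
As an illustration of what 'looking at the model, not just its output' can mean, here is a small sketch of my own (plain NumPy on invented data, not a method from the paper): bootstrap confidence intervals for a regression model's parameters show which of them the data actually supports and which remain uncertain.

```python
# Sketch: bootstrap confidence intervals for model parameters -- a statistical look
# at the model itself rather than only its predictions. Data are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))                                  # three hypothetical parameters
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1, n)      # the third has no real effect

def fit_coeffs(X, y):
    # ordinary least squares
    return np.linalg.lstsq(X, y, rcond=None)[0]

boot = np.array([fit_coeffs(X[idx], y[idx])
                 for idx in (rng.integers(0, n, n) for _ in range(1000))])
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
for j, (l, h) in enumerate(zip(lo, hi)):
    verdict = "supported" if (l > 0 or h < 0) else "uncertain"   # interval excludes zero?
    print(f"parameter {j}: 95% CI [{l:+.2f}, {h:+.2f}] -> {verdict}")
```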

The authors provide a clean, well-presented history of improvements in tools and technology in Figure 1, which summarizes the advances in math and computing that led to today's state of the art in AI. They ask whether data-driven AI can be joined with Statistics the same way that Calculus and Computing are linked. Figure 2 provides a view of the applications of current AI in Science versus Industry.

Starting with a statistical approach of creating a minimal model based on a limited set of parameters, the authors posit that it is possible to infer from that proto-model a hypothetical mapping of parameters to outcomes that will indicate which parameters to include in a more complex alpha-model, and then to iterate from there. In my opinion, however, this is little different from creating a training data set and then testing, the very approach they rejected in their introduction to the problem. They are simply starting from a statistical approach instead of an ML approach. At this juncture it is a matter of degree rather than process: we are splitting hairs.
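
For what such an iteration might look like in practice, here is a rough sketch of my own (scikit-learn on invented data, not the authors' procedure): start from a pool of candidate parameters, let a greedy forward search promote the ones that improve predictions, and carry those into the next, richer model.

```python
# My reading of "start minimal, then let parameter-to-outcome signals decide what to
# add", sketched as greedy forward feature selection. Toy data; not the paper's code.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))                               # ten hypothetical candidate parameters
y = 3 * X[:, 0] + 1.5 * X[:, 4] + rng.normal(0, 1, 300)      # only two of them truly matter

selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=5
)
selector.fit(X, y)
print("parameters promoted into the richer model:", np.flatnonzero(selector.get_support()))
```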

In their discussion of Generalization versus Extrapolation (Section 3), the authors expound upon their theory of constructing models for scientific understanding of the world. And it all comes down to data: access to quality data is always a barrier to entry in any effort. They break the discovery of patterns into two classic parts: the capacity to learn from the data and the capacity to learn from the outliers that do not 'fit the curve' of the statistically interesting nodes. Why is the second important? Because it is in the outliers that the potential for new scientific discoveries may lie. Both are critical for the scientific method to succeed. In Industry, by contrast, those outliers are anomalies to be cleansed and removed.
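
By way of illustration (my own toy example, not the paper's), the sketch below flags the points that do not fit the curve and sets them aside for inspection rather than deleting them, which is the posture the authors argue science needs.

```python
# Sketch: isolate the outliers for a closer look instead of cleansing them away.
# In Industry these rows are noise; in Science they may be the discovery.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
bulk = rng.normal(0, 1, size=(500, 2))            # the statistically "interesting" mass
oddballs = rng.normal(6, 0.5, size=(5, 2))        # a handful of points far off the curve
data = np.vstack([bulk, oddballs])

flags = IsolationForest(random_state=0).fit_predict(data)    # -1 marks suspected outliers
outliers = data[flags == -1]
print(f"{len(outliers)} points set aside for scientific inspection, not deletion")
```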

Extrapolation is the principle of using one concept to transfer learning to another domain. Figure 3 of the paper illustrates this principle of science. For AI research, the concept of extrapolation applies when a first-principles model is used to create phenomenological or semi-empirical models so that data can be extended beyond its narrow domain to neighboring areas of research.

But where does the principle of Generalization fit into this for AI? It is in testing for accuracy that we see generalization methods applied, and again it goes back to training sets: a data set is split in two, one part for creating the model and one for testing its output. This allows the creator to measure accuracy, precision, recall, and other important benchmarks. If you can then apply that model to a very different data set and get similar benchmarks, the model is said to be generalizable, or extensible to other domains.
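
That workflow is standard enough to show in a few lines; the sketch below (toy data, scikit-learn, nothing specific to the whitepaper) is the split-fit-score loop the paragraph describes.

```python
# The generalization check described above: split the data, fit on one part,
# measure accuracy, precision, and recall on the held-out part.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall   :", recall_score(y_test, pred))
```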

Extrapolation, in contrast, takes into account counterfactual reasoning about data that could not be collected and what we can posit about it. The essential difference is that the AI model is either discovering patterns within a given data set or discovering patterns outside of that data set in the broader world, two entirely different problems. However, without proper tools and methods to evaluate the performance of the model being used, extrapolation may end up with poor results. In other words, it provides a vector for further investigation but does not provide conclusive evidence.

Where is the answer to this conundrum? Adversarial AIs that challenge those models may provide an automated means of testing. Such testing would increase the robustness of the models and reduce the noise in the signals produced.
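
A true adversarial AI would actively search for the inputs that break a model; as a crude stand-in (my own sketch, not a method from the paper), even random perturbations of held-out inputs give a first read on how brittle a model's answers are.

```python
# Crude robustness probe: perturb held-out inputs slightly and count how often the
# model's prediction flips. An adversarial challenger would search for worst cases;
# this random version is only a baseline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

rng = np.random.default_rng(4)
clean = model.predict(X_te)
for eps in (0.05, 0.1, 0.2):
    noisy = model.predict(X_te + rng.normal(0, eps, X_te.shape))
    print(f"eps={eps}: {np.mean(noisy != clean):.1%} of predictions flipped")
```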

Explaining AI is always a challenge, and this whitepaper makes a significant attempt at explaining AI for Science. The authors go into detail in Section 4 on the importance of differentiating between 'explaining' and 'interpreting' the matter at hand. While it may seem like semantics, the details are important: one can understand a model yet have a hard time understanding the implications of its output. The authors describe the difference as 'the distinction between identifying patterns in the data/model (hence, ML-style generalization) versus discovering patterns in the world from data/model (hence, scientific extrapolation, or ML-style transfer learning).' But there is a simpler way to understand it; a plain-English definition will suffice: explaining is understanding what is going on, while interpreting is translating that understanding into new insights, the 'why' and the 'how'. This, then, according to the authors, results in a process of vetting the model in question for its fitness for use.

The completeness of the model as a metric is not directly addressed in this paper, but it is significant to the discussion and somewhat implied. In modeling a domain for any purpose, the choice of data for training or input to the model is based on the question being asked. But as the paper notes, science seeks to answer questions about the wider world. Therefore, in judging the quality of a model and applying principles of extrapolation, would it not also be important to examine the gaps and assumptions in the model, thereby avoiding bias? Examples of bias are well documented, such as facial recognition AI that performs poorly on minority faces, or the lack of data about women's health in third-world countries when modeling heart disease. If healthcare models are based on first-world countries and the bulk of data over time has been collected about men, it is hard to imagine that a model is complete in that domain. The authors discuss density estimation to some degree as an embedded preponderance of data, and link it to regional effects. These and other barriers to integrating AI into the scientific process again trace back to the availability of input data. The authors acknowledge that the 'capacity to understand data representations' is a 'grand challenge' with significant impacts on the scientific discovery process.
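
Since the authors bring up density estimation, here is one way (my own sketch, invented data and an arbitrary threshold) that it can act as a gap detector: a new case that falls in a low-density region of the training data is a case the model has barely seen, which speaks directly to the completeness concern.

```python
# Sketch: kernel density estimation as a data-gap detector. A new case with very low
# estimated density under the training data should be treated with caution.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(5)
train = rng.normal(size=(1000, 2))                 # hypothetical training population
kde = KernelDensity(bandwidth=0.5).fit(train)

new_cases = np.array([[0.1, -0.2],                 # resembles the training population
                      [4.5, 4.0]])                 # far outside anything seen before
log_density = kde.score_samples(new_cases)
for case, s in zip(new_cases, log_density):
    note = "well covered" if s > -5 else "data gap: treat the prediction with caution"
    print(case, f"log-density {s:.1f}:", note)     # the -5 cut-off is arbitrary
```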

Learning additional information always results in a decision being taken; this is how science advances. You make a hypothesis, test it and learn from it, then decide on the next steps. What that information has taught you about the world determines a new way of interacting with the data and parameters you are working with. Starting out with semantically meaningful parameters describes a known system or area of knowledge; you then examine what the model is telling you about that information and decide how to extrapolate and form new hypotheses. When the data is within the same domain but lacks semantic meaning in and of itself, the model may take that unstructured data and provide a new insight into it. Or it may distort and provide false trails.

The distinction is important, and how you decide which path to pursue is critical to the process. The authors posit that it is possible for AI algorithms to discover emergent parameters that lead to defining new semantics, intrinsic meaning, within a domain that lacks definition. This problem is non-trivial because the process, as well as the results, needs to be subjected to rigorous testing. Identifying emergent system features for modeling in a domain is a wide-open field of research at the moment, owing to the lack of good, let alone sufficient, testing methods. One example the authors give is a model that holds hidden within it knowledge about a structure or pattern that humans lack the ability to discern. How do you extract that knowledge in a scientific manner and make it more than an anecdote of the research? This is the non-trivial part, where the scientists must be rigorous and detailed in their validation steps.

The lack of clear connections between parameters and patterns is where AI excels, but it is also where the most doubt can be found. Where there are 'as-yet-unknown system controls' at play, emergent phenomena are defined by the outliers. Yet many data scientists treat those outliers as data to be rejected or cleansed from the set because the data doesn't match their hypothesis. This is classic confirmation bias at play. The authors rightly acknowledge that outlier data cannot be rejected out of hand; it must be accounted for in order to validate the hypothesis as well as the model. Just because you don't like something doesn't mean you can ignore its existence. That's not scientific.

Scale is another factor to consider in the problem space, whether the domain is life sciences or any other. What is the correct resolution of the data? Of the information inputs when they are images? Of the data points to collect from those images? The system being queried has different layers of complexity, whether the domain is sociology and the object of study is a civilization, or the domain is neuroscience and the object of study is the human brain. The correct level, or scale, at which to examine the data depends not only on the question being asked but also on the governing principles, the decisions taken to this point, and the responses gleaned thus far. You have to learn the scale as you perform the research; trial and error come into play.

The solution may lie in creating multiple variants of models and then combining them into a single model, resulting in higher-quality predictions, because the errors of the individual sub-models tend to cancel each other out while their outputs reinforce one another. This may be faulty logic unless the ensemble methods used address the errors, because the sum of those errors could also have a compounding effect on the output. In Industry it creates a black-box effect, where end users rarely look at how the model was built: its explainability. Industry, for the most part, only cares about effectiveness and accuracy, because there is no need to worry about the negative or counterfactual implications; the causal relationships do not matter. AI solves a discrete task, a function of operations, a limited need to know. Science is not like this. The need to know in science is boundless. Science needs the causal factors as well as the relationships to gain a deeper understanding of system dynamics.
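
The errors-cancel intuition, and the compounding failure mode it glosses over, can be seen with a few lines of arithmetic (my own toy numbers): averaging sub-models helps when their errors are independent and does nothing when they all share the same systematic error.

```python
# Averaging sub-models shrinks error only when their errors are independent;
# a shared bias survives the average untouched -- the compounding case above.
import numpy as np

rng = np.random.default_rng(6)
truth = 10.0
independent = truth + rng.normal(0, 1.0, size=(10000, 5))        # 5 sub-models, uncorrelated noise
shared_bias = truth + 2.0 + rng.normal(0, 1.0, size=(10000, 5))  # 5 sub-models, common systematic error

print("single model RMSE          :", np.sqrt(np.mean((independent[:, 0] - truth) ** 2)))
print("ensemble RMSE (independent):", np.sqrt(np.mean((independent.mean(axis=1) - truth) ** 2)))
print("ensemble RMSE (shared bias):", np.sqrt(np.mean((shared_bias.mean(axis=1) - truth) ** 2)))
```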

Extracting structures from the model representation of a domain (i.e., real-life situations) is a pattern recognition problem. Repeating patterns allow scientists to identify the building blocks of life, nature, and the cosmos at the largest scale. When something repeats, it is called a pattern; when patterns occur over and over again, we call them structures. The closer to the core, the more fundamental the structure; the further out from the core, the more detailed and finer the scale of each dimension becomes, often with less and less data to support it. Scale is handled by filter selection: how fine-grained the filters are, and whether the representation is high-dimensional or low-dimensional for higher-order patterns. Minimal models with a few parameters are said to be low-dimensional and are useful when the data set is finite.


The problem with high-dimensional systems is that they exhibit patterns of chaotic behavior. They expand quickly beyond the ability of the model to handle them and have high noise-to-signal ratios. Sometimes the data points merge or diverge when a method such as linear regression is used, losing the unique qualities that are interesting to the researchers. This again goes back to the question of scale: where is the right cut-off point? The computing power required for complex, probabilistic models can also be prohibitive. (Quantum computing may solve that problem in the near future.)

The authors take away two conclusions from this analysis: an AI model that has good predictive accuracy must be learning some sort of approximate, parsimonious model; and existing methods are insufficient to identify or learn from that parsimonious model in any scientific way. It still requires human intervention to test and validate the learnings, showing that only the first step of the scientific method, forming the hypothesis, can be automated by AI. Therefore, complex models, while they may contain well-defined, low-dimensional sub-models, do not lend themselves to aggregation into a globally applicable structure that can be analyzed with ease by a machine or a human. A taxonomy of model structures with definitions is provided in Figures 4A and 4B for reference.

Global sparsity does not translate to local sparsity in this configuration. But intermediate forms of model structure, drawing on either type of sparsity, may provide probative areas of interest and lead to output of high predictive quality. In other words, start small, combine what is good, and in testing along the way you may learn something interesting to pursue.

Statistical theory rests on models of global sparsity that lack local bias in some way; results for locally-biased models have proven brittle when tested. The thought is that by extrapolating from the local scale to the intermediate and then the global scale, the global features will yield higher-quality predictions. Seems logical. But if all models are wrong and some are useful (as George Box stated [1]), how does one pick which models to incorporate at the local level to build up to the global? This problem of sub-models is inherent in both Industry and Science when looking for domain knowledge to be extended. Global models treated as a black box end up being a sort of Mechanical Turk, doing the heavy lifting without much insight. The authors look to local models to provide the finer insights of a white-box approach, since there the inner workings of the model have been examined more closely through finer-grained parameterization.
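
The local-versus-global contrast is easy to see on toy data (my own sketch, not the authors' formulation): when the underlying relationship changes regime, one global linear model papers over the change, while simple local models per region capture it and remain individually easy to inspect.

```python
# One global linear model versus two simple local models on a piecewise relationship.
# The local fits are the interpretable "white box" pieces; the global fit averages
# the regimes away. Toy data only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
x = rng.uniform(-3, 3, 600).reshape(-1, 1)
y = np.where(x[:, 0] < 0, 2 * x[:, 0], -x[:, 0]) + rng.normal(0, 0.3, 600)   # regime change at zero

global_model = LinearRegression().fit(x, y)
global_mse = np.mean((global_model.predict(x) - y) ** 2)

local_sse = 0.0
for region in (x[:, 0] < 0, x[:, 0] >= 0):        # two hypothetical local regimes
    m = LinearRegression().fit(x[region], y[region])
    local_sse += np.sum((m.predict(x[region]) - y[region]) ** 2)

print(f"global model MSE: {global_mse:.3f}   local models MSE: {local_sse / len(y):.3f}")
```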

Ultimately, the challenge is to find the happy medium, to pursue the intermediate-scale models. It is believed that this intermediate space is where high-quality predictions lie. Data plus computation holds the potential to solve challenges such as computer vision, predictability of drug outcomes, and other hard problems. As always, the issue is testing the models, validating the results, challenging them with negative cases.

Taking the approach of employing surrogate models to face this 'grand challenge', scientifically validated models, smaller models that have been tested, core sets of data cleansed for training, and other sources of ground truth can be called into service to establish a baseline. These surrogate models are a type of lens through which one can examine the model being tested. Even though the mechanisms of the two models may be different or even unknown, if their responses match, that agreement provides a level of validation for the candidate model. The proposition put forth by the authors is that it is essential to develop a set of 'scientifically-validated surrogates for state-of-the-art AI models' in order for AI-enabled science to progress.
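
In code, the surrogate idea reduces to probing both models with the same inputs and checking whether their responses agree; the sketch below is my own illustration with stand-in models and toy data, not the authors' proposed tooling.

```python
# Probe an opaque candidate model and a simpler, trusted surrogate with the same
# inputs; close agreement lends the candidate some indirect validation even though
# their inner mechanisms differ.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=800, n_features=6, noise=5.0, random_state=0)
candidate = RandomForestRegressor(random_state=0).fit(X, y)   # stands in for a complex AI model
surrogate = Ridge().fit(X, y)                                 # stands in for a vetted, simpler model

probes = np.random.default_rng(8).normal(size=(200, 6))       # fresh probe inputs
agreement = np.corrcoef(candidate.predict(probes), surrogate.predict(probes))[0, 1]
print(f"candidate-surrogate response correlation on probes: {agreement:.2f}")
```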

Operationally, they want to 'enable domain scientists to identify, test, and validate' properties of surrogate models. This implies creating a suite of tools and processes for that aspiration to be realized. The authors then go on to provide a history lesson in analogous developments from chemistry and mathematics, where similar approaches were taken. They state that better, faster computers will not solve the problem when the challenge lies in the area of validation and verification.

Naturally, the Next Steps, as Section 8 is titled, revolve around the development of this set of surrogate models, including counterfactual ones. Acknowledging that this is virgin territory for researchers, with few or no methods in existence, the key to the success of their proposal is a three-point plan: create theories and methods for surrogate model development; perform statistical analysis of the output of those surrogate models, especially the outliers; and develop strategies for counterfactual testing and reasoning about the models. All of the above requires valid use cases for each domain, as well as learning from historical examples in engineering practice, the hard sciences, and the life sciences.

Conclusions

Addressing the elephant in the room: ethical AI is especially relevant when discussing medical applications, life sciences, or anything that affects people in their daily lives. Predictions that encode social bias, blindly following the data without examining who chose the data in the first place, and allowing a machine to make decisions with no human intervention in the interpretation all lead to potentially damaging results for individuals and communities. Keeping a human in the loop is critical. One example given is crowd-sourcing the data input to ensure a wider, more diverse representation of demographics. Using local surrogate models against global models tests the global model's applicability to a new region. The cost of being wrong in these situations is high when medical treatments are the subject matter.

Teaching a machine to know what it doesn't know, to identify in essence a gap in the data or a bias, is another area for research and exploration. These 'known unknowns' go to the question of the model's competency. But again, it is a question of trust. Trusting the technology means trusting the people behind its creation, and absent the tools to validate the output as well as the methodologies, it becomes a matter of reputation and awareness. Once again, the authors argue that surrogate models provide the answer, as they are orthogonal to the methods used to train the candidate models being tested. The measure of uncertainty for a model would then have a more theoretical foundation from which to operate.

Data, being retrospective in nature, is hard to mold into a predictive tool. Decisions based on AI and data-driven models are entering people's daily lives at an increasing rate, without societal awareness. It is more incumbent on scientists than ever, both data scientists and hard scientists, to take their ethical obligations seriously. This whitepaper is a step forward in proposing methods and processes for doing so.

S. Bolding · Copyright © 2022 · Boldingbroke.com


[1] George E. P. Box. Science and statistics. Journal of the American Statistical Association, 71(356):791–799, 1976. (See footnote 22 of the whitepaper.)

