Tuesday, April 30, 2024

Predictive Analytics

Using machine learning, statistical methods, and a variety of other techniques from data mining, predictive analytics is a discipline of data science that takes historical and current data and uses that information to make predictions about related, unknown future events. It's not mind reading; rather, it's a statistical probability that something will occur based on past activity: a vector projected into the future, saying that if people or things keep behaving the way they have in the past, they will continue on that same path. Of course, some unforeseen catastrophic event could change the course of history, and therefore the behavior of people. But in general, people and things (like markets, industries, or even diseases) tend to behave in predictable ways.

To determine this path of behavior, this vector if you will, a score or probability is calculated for each entity in the pool of data. Remember, as we have seen before, an entity is just a noun, a thing: a customer, an employee, a patient, a machine, or an SKU being sold.

Because you can measure, or score, the probability of behavior for entities, this type of machine learning technique is very popular with financial companies, such as insurers and banks. It can be used for credit scoring by looking at a customer's credit history and determining the likelihood of making payments on time. Predictive analytics can also be used by medical researchers to track the development of diseases in populations over time. Other applications are in the areas of social networking, gaming, commerce, marketing, insurance, travel, and many more. One emerging popular application is predicting global warming trends, as seen in this piece on Predicting the Impact of Arctic Sea Ice Loss on Climate.
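To make the credit-scoring idea concrete, here is a minimal sketch of how a score might be computed with a logistic function. The feature names and coefficients are entirely made up for illustration; a real lender would estimate them from historical repayment data.

```python
import math

# Hypothetical coefficients a lender's model might learn from historical data.
# Features: on_time_ratio (fraction of past payments made on time) and
# utilization (fraction of available credit currently in use).
INTERCEPT = -1.0
W_ON_TIME = 4.0
W_UTILIZATION = -2.5

def repayment_probability(on_time_ratio, utilization):
    """Score a customer: estimated probability of paying on time."""
    z = INTERCEPT + W_ON_TIME * on_time_ratio + W_UTILIZATION * utilization
    return 1.0 / (1.0 + math.exp(-z))  # logistic function squashes z into (0, 1)

# A customer with a strong payment history and low utilization scores high,
# while a spotty history with maxed-out cards scores low.
good = repayment_probability(on_time_ratio=0.98, utilization=0.10)
risky = repayment_probability(on_time_ratio=0.40, utilization=0.95)
```

The output is not a yes/no answer but a probability, which is exactly the "score" described above: the lender decides where to draw the cutoff.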

Unlike other data science disciplines, the emphasis is on prediction rather than description, classification, or clustering. In other words, it is forward-looking rather than a snapshot of the current state. The goal is rapid analysis with an emphasis on business relevance and ease of use for the end user. Predictive analytics provides a means to become proactive instead of reactive to situations in the business landscape, delivering insights and information in advance of when they're needed.

How does it work?

As with any process, there are a standard number of steps to follow:

  1. Define your project
  2. Collect the necessary data
  3. Analyze that data
  4. Statistically model the data
  5. Use predictive modeling techniques
  6. Deploy your model
  7. Monitor its performance and adjust as necessary
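The seven steps above can be sketched end to end as a toy pipeline. Everything here is a stand-in: the data is invented, and the "model" is a trivial threshold rule, just to show how the stages connect.

```python
# A minimal skeleton of the workflow; names and data are illustrative only.

def collect_data():
    # Step 2: in practice this queries a warehouse; here, toy (feature, outcome) pairs.
    return [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]

def analyze(data):
    # Step 3: basic exploration -- how common is the outcome we want to predict?
    positives = sum(outcome for _, outcome in data)
    return positives / len(data)

def fit_threshold_model(data):
    # Steps 4-5: a trivial "model" that predicts 1 when the feature exceeds
    # the midpoint between the two class averages.
    zeros = [x for x, y in data if y == 0]
    ones = [x for x, y in data if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x > threshold else 0

def monitor(model, holdout):
    # Step 7: track accuracy over time; retrain when it degrades.
    correct = sum(1 for x, y in holdout if model(x) == y)
    return correct / len(holdout)

data = collect_data()               # Step 2
base_rate = analyze(data)           # Step 3
model = fit_threshold_model(data)   # Steps 4-5
accuracy = monitor(model, data)     # Steps 6-7 (scoring the same data as a stand-in)
```

In a real project each function would be far more involved, but the shape of the loop (collect, analyze, model, deploy, monitor) stays the same.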

After determining the business goals, you will need to choose a methodology to ground your research. The approach a researcher takes is entirely dependent upon the question being asked, the business use case being worked on, so start with a clearly defined objective. For example, if you are working in medicine, the approach will take a different route than if you are working in gaming or entertainment. A number of methodologies can be applied when talking about predictive analytics. The two most common are regression techniques and machine learning techniques. Machine learning was discussed in an earlier post.

Regression types include linear regression, logistic regression, time series models, and classification and regression trees; these are among the most common. Less well known, yet just as powerful, discrete choice models, probit regression, and multivariate adaptive regression splines can be good choices depending on the use case being worked on. These techniques are fairly rigorous and take some detail to explain; you can learn more about them in a deeper dive if you want to become a full-on data scientist and learn to code. Suffice it to say, you should hire an expert to build these models if you need someone to perform this type of analysis for you. Here is a simple example explaining linear regression:
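A minimal worked example of simple linear regression, using the closed-form least-squares formulas (slope = covariance of x and y divided by variance of x; intercept = mean of y minus slope times mean of x). The ad-spend data is invented purely to show the mechanics.

```python
# Ordinary least squares with one predictor, from the closed-form formulas.
def linear_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy history: monthly ad spend (in thousands) vs. units sold.
spend = [1, 2, 3, 4, 5]
units = [12, 19, 31, 42, 53]

slope, intercept = linear_fit(spend, units)

# The fitted line is the "vector projected into the future":
# predicting next month's sales at a spend level we have not yet tried.
forecast = slope * 6 + intercept
```

The fitted line summarizes the historical trend, and evaluating it at a new x value is exactly the forward projection described earlier in this post.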

Familiarizing yourself with the tools available will help when you have conversations with the experts who are performing the coding and research. You may not know how to perform a time series model, but you can at least know that it’s different from a linear regression model and recognize the different charts created from the two data outputs.

Creating a model

To create one of these beasts, you need data, lots and lots of data to work with. The data can be raw, or what's called 'unsupervised', where it has not been reviewed by an expert who tags it with labels. Labeling is the process of attaching attributes or metadata to the information. At that point some of the data becomes 'supervised', because it has been cleaned up by an expert in the domain of knowledge. That person knows enough about the topic to look at a document or record and say what's going on in it. The tags then teach the machine about the information, circling back to machine learning techniques.

Usually, researchers will try to have up to 10% of the data tagged for purposes of training the model. This is a very human-intensive task, expensive and time consuming. Once the initial model is trained on that 10%, it can scan through the remaining 90% and look for the same patterns. At this point you have a functional model that can be applied to new sets of data. Because information is always changing as people and societies evolve, the model will need to be retrained or updated with fresh training data on a periodic basis.
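The label-10%-then-score-90% idea can be sketched with a deliberately tiny example. The "documents" below are just single numbers and the classifier is a simple nearest-centroid rule; both are stand-ins chosen for clarity, not how a production system would work.

```python
# Toy pool: 20 "documents", each reduced to a single numeric feature.
pool = [0.2, 4.8, 0.9, 5.3, 1.1, 4.1, 0.5, 5.9, 1.4, 4.4,
        0.7, 5.1, 1.0, 4.6, 0.3, 5.5, 1.2, 4.9, 0.8, 5.0]

# An expert hand-labels the first 10% (2 items): 0 = topic A, 1 = topic B.
labeled = [(pool[0], 0), (pool[1], 1)]

# "Train" a nearest-centroid model: one centroid per label.
c0 = sum(x for x, y in labeled if y == 0) / sum(1 for _, y in labeled if y == 0)
c1 = sum(x for x, y in labeled if y == 1) / sum(1 for _, y in labeled if y == 1)

def classify(x):
    # Assign each new item to whichever labeled centroid it sits closest to.
    return 0 if abs(x - c0) <= abs(x - c1) else 1

# The trained model then scans the remaining 90% for the same patterns.
predictions = [classify(x) for x in pool[2:]]
```

The expensive human effort goes into the small labeled sample; the model then extends those judgments across the rest of the pool, which is why the 10% has to be representative.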

Because this is a young science, some people are very skeptical about the outcomes and dangers of using it in sociological applications such as politics and policing; scenarios like Minority Report are often cited as cautionary tales. It is one thing to predict that someone will buy the newest widget; it is another to predict that they will commit a crime and put them in jail for something they have yet to do. But as I stated at the beginning, this is not an attempt by the machine to read one person's mind. Rather, it is an attempt, in aggregate, to get a read on where groups of people with common characteristics may be headed, for purposes of civic planning and other social applications. To read more about this topic, see this article in Scientific American: Will Democracy Survive Big Data and Artificial Intelligence?


S. Bolding—Copyright © 2021 · Boldingbroke.com

