Using machine learning, statistical methods, and a variety of other data-mining techniques, predictive analytics is a discipline of data science that takes historical and current data and uses that information to make predictions about related, as-yet-unknown future events. It's not mind reading; rather, it's a statistical probability that something will occur based on past activity: a vector projected into the future that says if people or things keep behaving the way they have in the past, they will continue on that same path. Of course, some unforeseen catastrophic event could change the course of history, and therefore the behavior of people. But in general, people and things (like markets, industries, or even diseases) tend to behave in predictable ways.
To determine this path of behavior, this vector if you will, a score or probability is calculated for each entity in the pool of data. Remember, as we have seen before, an entity is just a noun, a thing: a customer, an employee, a patient, a machine, an SKU being sold.
Because you can measure, or score, the probability of behavior for entities, this type of machine learning technique is very popular with financial companies, such as insurance and banking firms. It can be used for credit scoring: looking at a customer's credit history and determining the likelihood of on-time payments. Predictive analytics can also be used by medical researchers to track the development of diseases in populations over time. Other applications are in the areas of social networking, gaming, commerce, marketing, insurance, travel, and many more. One emerging popular application is predicting global warming trends, as seen in this piece on Predicting the Impact of Arctic Sea Ice Loss on Climate.
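To make the credit-scoring idea concrete, here is a minimal sketch in plain Python. The features and weights are entirely invented for illustration; a real scoring model would be fitted from historical repayment data, not hand-written like this:

```python
import math

def credit_score_probability(on_time_ratio, utilization, years_history):
    """Toy logistic scoring model: map credit-history features to a
    probability of paying on time. Weights are hypothetical."""
    # Linear combination of features (invented weights for illustration)
    z = -1.0 + 3.0 * on_time_ratio - 2.0 * utilization + 0.1 * years_history
    # The logistic function squashes the raw score into a 0..1 probability
    return 1.0 / (1.0 + math.exp(-z))

# A customer who usually pays on time and keeps card utilization low
good = credit_score_probability(on_time_ratio=0.95, utilization=0.2, years_history=10)
# A customer with many missed payments and maxed-out cards
risky = credit_score_probability(on_time_ratio=0.40, utilization=0.9, years_history=2)
print(f"good: {good:.2f}, risky: {risky:.2f}")
```

The output is exactly the kind of per-entity score described above: not a verdict, just a probability that can be ranked and acted on.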
Unlike other data science disciplines, the emphasis here is on prediction rather than description, classification, or clustering. In other words, it is forward-looking rather than steady state. The goal is rapid analysis with an emphasis on business relevance and ease of use for the end user. Predictive analytics provides a means to become proactive instead of reactive to situations in the business landscape, delivering insights and information in advance of when they're needed.
How does it work?
As with any process, there is a standard set of steps to follow:
- Define your project
- Collect the necessary data
- Analyze that data
- Statistically model the data
- Use predictive modeling techniques
- Deploy your model
- Monitor its performance and adjust as necessary
After determining the business goals, you will need to choose a methodology to ground your research. The approach a researcher takes depends entirely on the question being asked, the business use case being worked on. Start with a clearly defined objective. For example, if you are working in medicine, the approach will take a different route than if you are working in gaming or entertainment. A number of methodologies can be applied in predictive analytics. The two most common are regression techniques and machine learning techniques. Machine learning was discussed in an earlier post.
Regression types include linear regression, logistic regression, time series models, and classification and regression trees. These are among the most common. Less well known, yet just as powerful, are discrete choice models, probit regression, and multivariate adaptive regression splines; these can be good choices depending on the use case. These techniques are fairly rigorous and detailed to explain. You can learn more about them in a deeper dive if you want to become a full-on data scientist and learn to code. Suffice it to say, you should hire an expert to build these models if you need someone to perform this type of analysis for you. Here is a simple example explaining linear regression:
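A minimal sketch of simple linear regression, written from first principles in plain Python. The ad-spend and sales numbers are invented for illustration; the formulas are the standard ordinary-least-squares estimates for one predictor:

```python
# Fit y = slope * x + intercept by ordinary least squares.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # ad spend (thousands), hypothetical
y = [2.1, 4.3, 6.2, 7.9, 10.1]  # sales (thousands), hypothetical

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares solution:
# slope = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

def predict(spend):
    """Project the fitted line forward: the 'vector into the future'."""
    return slope * spend + intercept

print(f"slope={slope:.2f}, predicted sales at spend 6: {predict(6):.1f}")
```

The fitted line is the simplest possible "vector projected into the future": feed it an x value beyond the observed data and it extrapolates the historical trend.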
Familiarizing yourself with the available tools will help when you have conversations with the experts who are performing the coding and research. You may not know how to build a time series model, but you can at least know that it's different from a linear regression model and recognize the different charts created from the two outputs.
Creating a model
To create one of these beasts, you need data, lots and lots of data, to work with. The data can be raw, or what's called 'unsupervised' (unlabeled), meaning it has not yet been examined by an expert who tags it with labels. Labeling is the process of attaching attributes or metadata to the information. At that point some of the data becomes 'supervised', because it has been cleaned up by an expert in the knowledge domain. That person knows a lot about the topic and can determine enough about the document or information to say what's going on. The tags help teach the machine to learn about the information, circling back to machine learning techniques.
Usually, researchers will try to have a sample of up to 10% of the data tagged for purposes of training the model. This is a very human-intensive task, expensive and time consuming. Once the initial model is trained on that 10% of the data, it can scan through the remaining 90% and look for the same patterns. At this point you have a fully functional model that can be used on new sets of data. Because information is always changing as people and societies evolve, the model will need to be retrained or updated with fresh training data on a periodic basis.
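The 10%/90% flow above can be sketched with a tiny nearest-centroid classifier. The numbers and labels are invented for illustration: a handful of expert-tagged points stand in for the labeled 10%, and the model then scores the untagged remainder:

```python
# Expert-tagged sample (~10%): (feature value, label)
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]
# Untagged remainder (~90%): feature values only
unlabeled = [1.2, 2.0, 7.5, 8.8, 0.9, 9.5]

# "Train": compute the average feature value (centroid) for each label
centroids = {}
for label in {lab for _, lab in labeled}:
    values = [v for v, lab in labeled if lab == label]
    centroids[label] = sum(values) / len(values)

# "Scan" the untagged data: each point gets the nearest centroid's label
def classify(value):
    return min(centroids, key=lambda lab: abs(value - centroids[lab]))

predictions = [classify(v) for v in unlabeled]
print(predictions)
```

Retraining, as described above, amounts to periodically folding newly tagged data back into `labeled` and recomputing the centroids, so the model tracks how behavior drifts over time.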
Because this is a new science, some people are very skeptical about the outcomes and dangers of using it in sociological applications such as politics and policing. Scenarios such as Minority Report are often cited as cautionary tales. It is one thing to predict that someone will buy the newest widget; it is quite another to predict that they will commit a crime and put them in jail for something they have yet to do. But as I stated at the beginning, this is not an attempt by the machine to read one person's mind. Rather, it is an attempt, in aggregate, to get a read on where groups of people with common characteristics may be headed, for purposes of civic planning and other social applications. To read more about this topic, see this article in Scientific American: Will Democracy Survive Big Data and Artificial Intelligence?
S. Bolding—Copyright © 2021 · Boldingbroke.com