Analytics Process
Have a business question.
Pose the question in terms of a single target field:
Maximize profit, minimize duration, … or
Find which of two groups an item belongs in.
Get data into a single table with target and explanatory fields.
Create a model.
Evaluate the model.
Examine the model to understand the relationships.
Apply the model to new data to predict the target.
Have a business question
You begin with a question that you would like to answer.
Good questions take one of these two forms:
What maximizes (or minimizes) the value of this field? E.g., what maximizes profit, minimizes delay, etc.
What determines which of two possible states an item is in? E.g., customers who are profitable, customers who responded to a solicitation, web sessions which involved fraudulent activity, etc. This problem is often called "classification".
You can also identify a group of items by graphically selecting in ADVIZOR Analyst charts and then use Predictive Analytics to describe what determines membership in that group. In effect, this lets you define a new target field interactively: a new field is created in a table based on the current selection state.
Pose the question in terms of a single target field
The data to answer your question must be in a single data table.
One field in the table is the "target" that answers the business question.
The target field can only be of one of two types:
A continuous field: an integer or real number.
A field that takes only two values. This field may be an integer or string field. It does not matter what the two conditions are.
For a classification problem (two state target), it may be necessary to pre-condition the data to have such a target field. This may be done in the data source, or possibly via ADVIZOR Analyst tools such as the Expression Builder.
The target cannot be a nominal field (string or integer) with more than 2 values. If you think this is what represents your business question, restate the question in terms of maximizing or minimizing a measure (continuous data field), or preprocess the data into two categories (e.g., "successful", "unsuccessful").
The target field can be the selection state: select an interesting subset of the data in ADVIZOR Analyst charts, then use the selection as the target field. This lets you treat selection as a classification target that you define interactively. It can be an alternative to conditioning your data to provide a classification target that answers your question.
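As an illustration of pre-conditioning a nominal field with more than two values into a two-state target, here is a minimal Python sketch. It is not ADVIZOR Analyst or Expression Builder syntax; the rows, the field names (`status`, `outcome`), and the set of statuses counted as successful are all hypothetical.

```python
# Hypothetical rows; field names and values are illustrative only.
rows = [
    {"customer": "A", "status": "renewed"},
    {"customer": "B", "status": "cancelled"},
    {"customer": "C", "status": "upgraded"},
    {"customer": "D", "status": "lapsed"},
]

# Assumption: these are the statuses the business counts as a success.
SUCCESS_STATUSES = {"renewed", "upgraded"}


def add_two_state_target(rows, source_field, success_values, target_field="outcome"):
    """Return copies of the rows with a two-valued target field added."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are left untouched
        new_row[target_field] = (
            "successful" if row[source_field] in success_values else "unsuccessful"
        )
        result.append(new_row)
    return result


conditioned = add_two_state_target(rows, "status", SUCCESS_STATUSES)
outcomes = [row["outcome"] for row in conditioned]
print(outcomes)
```

The four-valued `status` field becomes a two-valued `outcome` field, which is the shape a classification target requires.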
Condition explanatory fields
The target and explanatory fields must be in a single table.
You may omit table fields from your model.
Omit fields that are keys, where each row has a unique value.
Omit text fields containing arbitrary text that is mostly unique to each row, such as comments fields or addresses.
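One way to spot candidate fields to omit before building a model is to measure how many distinct values each field has relative to the row count. The sketch below is a generic Python illustration, not an ADVIZOR feature; the function name, the 0.9 threshold, and the sample rows are assumptions. Continuous measures are also nearly unique per row, so treat the result as a list to review by hand rather than as a rule.

```python
def fields_to_review(rows, unique_ratio=0.9):
    """Return field names whose values are unique, or nearly so, across rows."""
    if not rows:
        return []
    flagged = []
    for field in rows[0]:
        values = [row[field] for row in rows]
        # Share of distinct values: 1.0 means every row has its own value.
        ratio = len(set(values)) / len(values)
        if ratio >= unique_ratio:
            flagged.append(field)
    return flagged


# Hypothetical table: a key field, a category field, and a free-text field.
rows = [
    {"order_id": 1001, "region": "East", "comment": "Called twice about delivery"},
    {"order_id": 1002, "region": "West", "comment": "Requested gift wrap"},
    {"order_id": 1003, "region": "East", "comment": "No issues reported"},
    {"order_id": 1004, "region": "East", "comment": "Address changed after order"},
]

flagged = fields_to_review(rows)
print(flagged)
```

Here the key and the comments field are flagged, while `region`, which repeats across rows, is kept as a possible explanatory field.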
Create a Model
Give the model a descriptive name.
Tip: Base the name on the target field and the business question.
Select the target field, or create a field based on the selection state and use that as the target.
Select the explanatory fields. Typically you can use all fields as explanatory fields; you may wish to omit some if you know in advance that they are highly correlated with the target.
If your target is the selection state, and the selection state was defined by a selection in a single chart, exclude the fields represented by the chart, since the result will be most strongly correlated with those fields; after all, you used them to define the set!
Evaluate the Model
Evaluate the Information metric to see if the model is adequate.
For ordinary regression models, the information metric is R² (or "R squared"). This gives the proportion of variation in the target explained by the model, a value between 0% and 100%.
For classification models, the information metric is "Percent Concordance". This compares the predicted values with the actual target values. It is also a value between 0% and 100%. This may be used to compare alternative models for the same target.
Any Information Metric above 0% indicates that the model has predictive capability, that is, that the prediction is better than random. Values above 95% are highly predictive. To increase the Information Metric, add more fields to the data table used to train the model.
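To make the two metrics concrete, the sketch below computes R² and a pairwise percent concordance in plain Python. The source does not state how ADVIZOR calculates Percent Concordance, so the pairwise definition used here (the share of pairs, one item from each state, in which the item in the target state receives the higher score) is an assumption and may differ from the product's calculation. The function names and sample values are illustrative.

```python
def r_squared(actual, predicted):
    """Proportion of variation in the target explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean) ** 2 for a in actual)                  # total
    return 1 - ss_res / ss_tot


def percent_concordance(actual, scores):
    """Percent of (state 1, state 0) pairs where the state-1 item scores higher.

    Assumed definition; `actual` holds 1 or 0, `scores` holds model scores.
    """
    ones = [s for a, s in zip(actual, scores) if a == 1]
    zeros = [s for a, s in zip(actual, scores) if a == 0]
    concordant = sum(1 for p in ones for n in zeros if p > n)
    return 100.0 * concordant / (len(ones) * len(zeros))


# Hypothetical values for illustration.
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
print(percent_concordance([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Computing the same metric for two candidate models on the same target gives a like-for-like comparison, which is how the text suggests using it.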
Understand the Model
Two pages are added to your project with charts that help you understand the relationships in the model.
The first page for ordinary regression models with continuous targets shows the information metric, the equation terms, and the relationship of the original target field to the predicted field.
The "% Contribution to Model" bar chart shows what percent of the model's overall predictive ability (given by the "Quality" R² metric) comes from each explanatory field.
The first page for classification models shows the information metric, the equation terms, the weights of terms, the original target distribution, the model predicted distribution, the probability scores, and bins of the probability scores.
The "% Contribution to Model" bar chart shows how much effect each field in the model has on the target.
A second page contains a Heat Map showing the model table and which fields influence the target.
Apply the Model to New Data
The model equation is optionally registered with the project as an Expression Builder formula. This is available in the Project Workshop with all other calculations. It will be automatically run whenever your project is refreshed with new data.
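Conceptually, applying a registered model equation to refreshed data means evaluating the same formula on each new row. The sketch below shows the idea for a continuous target, assuming a simple linear equation (an intercept plus one weighted term per explanatory field). The source does not give the form of ADVIZOR's equation or its Expression Builder syntax, so the intercept, coefficients, and field names here are hypothetical.

```python
# Hypothetical equation terms; real values would come from the trained model.
INTERCEPT = 10.0
COEFFICIENTS = {"units": 2.5, "discount": -40.0}


def predict(row):
    """Evaluate the assumed linear equation for one row of new data."""
    return INTERCEPT + sum(coef * row[field] for field, coef in COEFFICIENTS.items())


# New rows arriving on a data refresh (illustrative values).
new_rows = [
    {"units": 4, "discount": 0.25},
    {"units": 10, "discount": 0.0},
]

predictions = [predict(row) for row in new_rows]
print(predictions)
```

Because the formula depends only on the fields of each row, it can be re-run unchanged whenever the table is refreshed.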