ADVIZOR Help
v7.2+
Zip Codes

Last updated 5 years ago

Zip codes seem like a good explanatory field because they represent geographical location, but there are problems with using them:

  • There are so many distinct zip codes that including them will slow modeling.

  • Zip codes that are missing from your training data will not be included as explanatory factors, even if they should be.

Here are strategies for modeling zip codes.

Bin Zip Codes by Characteristics

Attach a score to each zip code that groups similar zip codes into bins. If all wealthy zip codes are equally representative of wealth, they should all be ranked together, whether or not a member of the target population happens to live in a particular zip code. For example, in a fundraising dataset, a large donor living in Greenwich, CT causes that zip code to be ranked high, but if no cases live in Winnetka, IL, that zip code is ranked low. People living in Winnetka would then get a low wealth score just because nobody in the target population has lived there so far.

Either a numeric score or a categorical grouping would be attached to each Zip Code and used in the model. If, for example, the "A-Wealthiest" group scored high because it was more highly represented in the Target than "D-Midlevel Wealth", every community in that group would receive the same high score in the model calculation.
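The effect of scoring by group rather than by individual zip code can be sketched outside ADVIZOR. The following is a minimal Python illustration, not ADVIZOR's model calculation: it assumes a hypothetical zip-to-group mapping and hypothetical entity records with `zip` and `is_target` fields, computes the share of base-population members in each group who are in the target, and gives every zip code its group's rate. Winnetka receives the same score as Greenwich even though nobody from Winnetka appears in the data.

```python
from collections import defaultdict

# Hypothetical zip-to-group mapping (e.g. derived from Forbes or IRS data).
# Zip codes are 5-character strings so they behave as categories.
ZIP_TO_GROUP = {
    "06830": "A-Wealthiest",       # Greenwich, CT
    "60093": "A-Wealthiest",       # Winnetka, IL
    "55555": "D-Midlevel Wealth",  # hypothetical placeholder zip
}


def group_target_rates(entities, zip_to_group):
    """Return {group: share of base-population members in that group who are in the target}."""
    targets = defaultdict(int)
    base = defaultdict(int)
    for entity in entities:
        group = zip_to_group.get(entity["zip"])
        if group is None:
            continue  # zip code has no group score; skip it
        base[group] += 1
        if entity["is_target"]:
            targets[group] += 1
    return {group: targets[group] / base[group] for group in base}


def score_zips_by_group(entities, zip_to_group):
    """Give every zip code its group's rate, even zip codes with no members in the data."""
    rates = group_target_rates(entities, zip_to_group)
    return {zip_code: rates.get(group) for zip_code, group in zip_to_group.items()}


# Hypothetical base population: nobody from Winnetka (60093) is in the data yet.
entities = [
    {"zip": "06830", "is_target": True},
    {"zip": "06830", "is_target": True},
    {"zip": "06830", "is_target": False},
    {"zip": "06830", "is_target": False},
    {"zip": "55555", "is_target": True},
    {"zip": "55555", "is_target": False},
    {"zip": "55555", "is_target": False},
    {"zip": "55555", "is_target": False},
]

scores = score_zips_by_group(entities, ZIP_TO_GROUP)
print(scores)  # {'06830': 0.5, '60093': 0.5, '55555': 0.25}
```

Scoring the group rather than the individual zip code is what keeps an unrepresented but similar community from being penalized.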

Two sources we use for this data:

  • Forbes Wealth Zips: flags the 500 wealthiest zips in the US.

  • IRS statistics on wealth, population density, etc. This source is more comprehensive in that every zip code in the US gets scored; it can also be used to flag urban (high-density) vs. rural (low-density) zip codes. This is useful because, for example, urban low-income communities are quite different from rural low-income communities in many ways. You can get IRS zip code data from the IRS Statistics of Income (SOI) ZIP Code Data; the link is at the end of this page.

To use this data in ADVIZOR:

  • Load the additional table(s) into the project. If the project is set to reload data, this new data must be on an accessible network drive.

  • Using the Table Link wizard, copy the relevant scores from the new table to the existing entity table in the project, with Zip Code as the common key.

    • Be sure that the zip codes in both tables are either both integers or both strings so that the fields will match. You can use the Expression Builder to convert types if needed.

    • Make sure the codes are consistently either 5-digit or 9-digit. If they are mixed, convert them all to 5-digit codes using the Expression Builder.
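The two matching rules above (same type, same length) can be illustrated with a short Python sketch. This is not ADVIZOR's Table Link wizard or Expression Builder syntax; `normalize_zip`, `link_scores`, and the field names are hypothetical. It converts integer or string zip codes, including ZIP+4 codes with or without a hyphen, to 5-character strings, restoring leading zeros that integer storage drops (06830 stored as the integer 6830), and then copies a score onto each entity by matching on the normalized key.

```python
def normalize_zip(value):
    """Convert an int or string zip code (5-digit or ZIP+4) to a 5-character string."""
    digits = str(value).strip().replace("-", "")
    if not digits.isdigit() or len(digits) > 9:
        raise ValueError(f"not a zip code: {value!r}")
    if len(digits) > 5:
        # 6-9 digits: a ZIP+4 code, possibly stored as an integer that lost leading zeros
        return digits.zfill(9)[:5]
    # 1-5 digits: a 5-digit code, possibly stored as an integer that lost leading zeros
    return digits.zfill(5)


def link_scores(entities, score_rows, field):
    """Copy `field` from score_rows onto each entity, matching on the normalized zip code.

    Entities whose zip code has no match get None. The inputs are not modified.
    """
    lookup = {normalize_zip(row["zip"]): row[field] for row in score_rows}
    return [
        {**entity, field: lookup.get(normalize_zip(entity["zip"]))}
        for entity in entities
    ]


# Hypothetical inputs: the score table stores zips as integers, the entity table as strings.
score_rows = [
    {"zip": 6830, "wealth_group": "A-Wealthiest"},  # 06830 with its leading zero lost
    {"zip": 60093, "wealth_group": "A-Wealthiest"},
]
entities = [
    {"id": 1, "zip": "06830-1234"},  # ZIP+4 with a hyphen
    {"id": 2, "zip": "600930001"},   # ZIP+4 without a hyphen
    {"id": 3, "zip": "10001"},       # no matching row in score_rows
]

for row in link_scores(entities, score_rows, "wealth_group"):
    print(row)
```

Without normalization, none of these rows would match: 6830 and "06830-1234" differ in both type and length.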

Group Zip Codes

Group the zip codes with fewer than 25 members in the Base Population into an "Insufficient Data" bin. The model can be run against the zip codes themselves as long as the zip codes with low participation are first binned into an "Insufficient Data" bin. Conceptually, each value of a categorical field like a zip code should have at least 25 members of the Base Population. With a population of 0 or 1, any scoring is essentially random. By the time membership reaches 10, the non-systematic risk is reduced; by 25, it is largely eliminated. You will still need to be careful with classification models that have a small target population relative to the base population. If that percentage is, say, 1%, raise this threshold to perhaps 100 per category.

Assuming a standard forecast model, or a classification model with Target/Base in the 5%+ range, a good strategy is to bin every zip code with fewer than 25 members in the Base Population into an "Insufficient Data" bin and then run the model against the remaining zip codes. With this method you will learn about the statistically relevant zip codes, but not about the zip codes in the "Insufficient Data" bin. Some members of this bin may deserve a high score but will not receive one, simply because there is insufficient data for their zip code.

Use this approach if you believe there are unique and possibly qualitative aspects of these communities that cannot be adequately represented by any of the scores described in the previous method.

To do this in ADVIZOR:

  • Determine your Base Population for the model.

  • Using the Expression Builder, add a column named "OneCount" to the entity table with a value of 1 for each member of the base population.

  • Roll up the entity table on Zip Code and sum "OneCount". This tells you how many base population entities are in each zip code.

  • Copy the summed "OneCount" from this rollup table back to the entity table, using Zip Code as the key.

    • Be sure that the zip codes in both tables are either both integers or both strings so that the fields will match. You can use the Expression Builder to convert types if needed.

    • Make sure the zip codes are consistently either 5-digit or 9-digit; if they are mixed, convert them all to 5-digit codes using the Expression Builder.

  • Create a new column in the entity table: "ZipCodeForModel" using the expression "if OneCount < 25 then "Insufficient Data" else string(Zip Code)".

Note that the zip codes should be strings so that they are evaluated as discrete items, not a range of numbers.
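The procedure above can be sketched in plain Python to show what the OneCount rollup and the "ZipCodeForModel" expression compute. This is an illustration of the logic, not ADVIZOR's API; the `zip` and `in_base` field names and the `zip_code_for_model` function are hypothetical, and zip codes are assumed to be already normalized 5-character strings. The `min_members` parameter defaults to 25 and can be raised (for example to 100) when the target is a small share of the base population.

```python
from collections import Counter


def zip_code_for_model(entities, min_members=25):
    """Return copies of entities with a "ZipCodeForModel" field added.

    Zip codes with fewer than `min_members` base-population members become
    "Insufficient Data"; the rest keep their zip code as a string category.
    """
    # OneCount summed per zip code: only base-population members are counted.
    base_counts = Counter(e["zip"] for e in entities if e["in_base"])
    # Copy the count back by zip code and apply the threshold
    # (a zip code with no base members has a count of 0).
    return [
        {
            **e,
            "ZipCodeForModel": (
                "Insufficient Data" if base_counts[e["zip"]] < min_members else e["zip"]
            ),
        }
        for e in entities
    ]


# Hypothetical data: 30 base members in 60093, 10 in 06830, plus one non-base row.
entities = (
    [{"zip": "60093", "in_base": True}] * 30
    + [{"zip": "06830", "in_base": True}] * 10
    + [{"zip": "60093", "in_base": False}]
)

labels = Counter(e["ZipCodeForModel"] for e in zip_code_for_model(entities))
print(labels)  # Counter({'60093': 31, 'Insufficient Data': 10})
```

Keeping the surviving zip codes as strings makes the model treat them as discrete categories rather than as points on a numeric range.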

IRS SOI Tax Stats, Individual Income Tax Statistics, ZIP Code Data: http://www.irs.gov/uac/SOI-Tax-Stats-Individual-Income-Tax-Statistics-ZIP-Code-Data-(SOI)