Segmentation and classification are analytic techniques that help firms compare and group customers who share common characteristics (i.e., segmentation variables) into homogeneous segments and identify ways to target particular segments of customers in a market based on external variables (i.e., discriminant variables).
Segmentation refers to the process of classifying customers into homogeneous groups (segments), such that the customers in each group share enough characteristics to make it viable for the firm to design specific offerings or products for that group. This application identifies customer segments using needs-based variables called basis variables.
To download the Enginius tutorial in PDF format: (1) follow the link below, which opens an example data set; then (2) click on the link in the upper-left corner of the screen.
Segmentation variables measure the factors that are central in determining the similarity between two respondents. These variables serve as the basis for segmentation and are often called basis variables. They might include customers’ needs, wants, expectations, or preferences.
Discriminant variables, also called descriptors, are variables that describe the segments formed from the segmentation variables. They include demographic and related characteristics, such as educational level, gender, income, and media consumption. In a good segmentation, knowledge of the descriptor or discriminant variables can predict the respondent’s segment (as calculated from the basis variables).
It is usually good to start with hierarchical clustering (which builds up or breaks down the data, row by row), at least to determine the appropriate number of clusters for K-Means, which partitions the data but requires both a starting group center (centroid) and a number of clusters to get started.
The dendrogram or elbow chart from the software output generally shows where there is no longer much to be gained in separating the data more finely. But the purpose of the segmentation will often dictate the appropriate number of clusters. So, think about building a business case for the number of clusters: (1) what do the data tell us, and (2) how can we implement the segmentation? (If the purpose of the segmentation is to allocate all customers to, say, 3 sales reps, then there should be 3 segments.)
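To make the elbow reading concrete, here is a minimal sketch in plain Python. It is not the Enginius implementation: the toy data, the function names, and the farthest-point choice of starting centroids are all assumptions made for illustration, and the hierarchical step is omitted. It runs K-Means for several cluster counts and reports the within-cluster sum of squares (WSS), the quantity an elbow chart typically plots.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Assumed starting rule: first row, then repeatedly the row farthest
    from the centroids chosen so far (deterministic, for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]

def kmeans_wss(points, k, max_iters=100):
    """Run K-Means and return the within-cluster sum of squares."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the mean of its members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical basis-variable scores for 12 respondents (two variables each).
customers = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
    (9.0, 1.0), (9.2, 0.8), (8.8, 1.2), (9.1, 1.1),
]

wss = {k: kmeans_wss(customers, k) for k in range(1, 6)}
for k in sorted(wss):
    print(f"k={k}: WSS={wss[k]:.3f}")
```

On data like these, WSS falls sharply up to three clusters and only marginally afterwards, which is the "elbow". The statistical answer is still only half of the business case: if the segments must map onto a fixed number of sales reps, that constraint can override the elbow.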
Values shown in green are statistically significantly above the mean of the remaining population (at the 95% confidence level); values shown in red are statistically significantly below it.
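The source does not say which statistical test produces the green and red flags. As an illustration only, the sketch below assumes a two-sample comparison of a segment's mean against the mean of all remaining respondents, using unpooled variances and a normal critical value of 1.96 for the 95% level. The function name and the sample scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def flag_segment(segment_values, rest_values, z_crit=1.96):
    """Return 'green', 'red', or 'none' for one variable in one segment.

    Assumption: segment mean vs. mean of the remaining population, with a
    normal approximation that is rough for small samples.
    """
    standard_error = sqrt(
        variance(segment_values) / len(segment_values)
        + variance(rest_values) / len(rest_values)
    )
    if standard_error == 0:
        return "none"  # no variation at all, so no test is possible
    z = (mean(segment_values) - mean(rest_values)) / standard_error
    if z > z_crit:
        return "green"  # significantly above the remaining population
    if z < -z_crit:
        return "red"    # significantly below the remaining population
    return "none"

# Hypothetical ratings on one basis variable.
segment_scores = [6, 7, 6, 7, 6, 7, 6, 7]
remaining_scores = [3, 4, 3, 4, 3, 4, 3, 4, 3, 4]

print(flag_segment(segment_scores, remaining_scores))
print(flag_segment(remaining_scores, segment_scores))
print(flag_segment([4, 5, 4, 5], [4, 5, 5, 4, 4, 5]))
```

Note that the comparison group is the remaining population, not the full sample: the segment's own respondents are excluded from the benchmark mean.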
The discriminant analysis tells how well descriptive data (generally available for all customers and prospects) will predict segment membership. The confusion matrix shows actual versus predicted segment membership from the discriminant analysis. If all segments are roughly the same size and there is no information in the descriptor data (that is, if the classification is done at random), then there is one chance in n (n = number of segments) of correct classification. So, for a four-segment solution, a good discriminant analysis should do far better than 1/4, or 25%, correct classification. In addition, high values on the diagonal of the confusion matrix indicate good classification.
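The comparison against chance can be sketched as follows. The segment labels are made up for illustration and the function names are hypothetical. The code builds a confusion matrix with actual segments in rows and predicted segments in columns, then compares the share of correct classifications (the diagonal) with the 1/n baseline.

```python
def confusion_matrix(actual, predicted, n_segments):
    """Rows are actual segments, columns are predicted segments."""
    matrix = [[0] * n_segments for _ in range(n_segments)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def hit_rate(matrix):
    """Share of respondents on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Hypothetical four-segment solution with 3 respondents per segment.
n_segments = 4
actual = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
predicted = [0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 0]

matrix = confusion_matrix(actual, predicted, n_segments)
rate = hit_rate(matrix)
chance = 1 / n_segments  # baseline valid only for roughly equal-sized segments

for row in matrix:
    print(row)
print(f"hit rate = {rate:.2f}, chance = {chance:.2f}")
```

Here 9 of the 12 respondents fall on the diagonal, a hit rate of 75% against a chance level of 25%. When segment sizes are very unequal, the 1/n baseline understates what naive guessing can achieve, so the share of the largest segment is a more demanding benchmark.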