Market segmentation algorithms will accept any number of basis variables for customer segmentation. I could take a sample of 100 respondents, cluster-analyze them on 400 variables, and the segmentation program would run just fine. This is a rookie mistake in market research segmentation, however, because of something called the "curse of dimensionality."
Understanding the Curse of Dimensionality in Segmentation
The curse of dimensionality in segmentation analysis is best explained in a 2019 article by Tony Yiu:
"When we have too many features, observations become harder to cluster – believe it or not, too many dimensions causes every observation in your dataset to appear equidistant from all the others. And because clustering uses a distance measure such as Euclidean distance to quantify the similarity between observations, this is a big problem. If the distances are all approximately equal, then all the observations appear equally alike (as well as equally different), and no meaningful clusters can be formed."
While that theoretical description works for some folks, others prefer to see what the curse of dimensionality looks like in action in real customer segmentation studies. So here's a real-life market segmentation example.
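For readers who would first like to see the equidistance effect from the quote in isolation, here is a minimal sketch. It assumes uniformly random, uncorrelated variables, which real survey data are not, so treat it as an illustration of the tendency rather than a model of any actual study. The function name `relative_contrast` and the choice of 100 simulated respondents are illustrative assumptions, not something taken from Yiu's article.

```python
import math
import random


def relative_contrast(n_points, n_dims, rng):
    """Return (max - min) / min over all pairwise Euclidean distances.

    Values near zero mean every pair of points is almost equally far apart.
    """
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


rng = random.Random(0)  # fixed seed so reruns give the same numbers

# 100 simulated respondents measured on 2, 20, and 400 random variables
contrasts = {d: relative_contrast(100, d, rng) for d in (2, 20, 400)}

for n_dims, contrast in contrasts.items():
    print(f"{n_dims:>3} variables: relative contrast = {contrast:.2f}")
```

The relative contrast shrinks as variables are added: with 400 variables the farthest pair of respondents is only modestly farther apart than the closest pair, which leaves a distance-based clustering routine very little to work with.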
Real-World Example: How Too Many Variables Destroy Segmentation Quality
In a recent customer segmentation study with 1,000 respondents, we had some 80 potential basis variables for our market segmentation analysis. When we segment, we can compute an F-statistic for each variable that tests how much it differs across customer segments (a higher F means larger differences between segments relative to the variation within them). F is a convenient statistic for segmentation analysis because it is a ratio of variances, so it can be compared across variables measured on different scales.
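To make the statistic concrete, here is a sketch of the standard one-way ANOVA F for a single variable across segments. The six respondents, the segment labels, and the names `f_statistic`, `satisfaction`, and `spend_dollars` are hypothetical and are not data from the study.

```python
def f_statistic(values, labels):
    """One-way ANOVA F: between-segment mean square / within-segment mean square."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(vals) / len(vals) for label, vals in groups.items()}

    ss_between = sum(
        len(vals) * (means[label] - grand_mean) ** 2
        for label, vals in groups.items()
    )
    ss_within = sum(
        (value - means[label]) ** 2
        for label, vals in groups.items()
        for value in vals
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data: two segments of three respondents each
segments = ["A", "A", "A", "B", "B", "B"]
satisfaction = [1, 2, 3, 5, 6, 7]                # a 1-7 rating scale
spend_dollars = [v * 100 for v in satisfaction]  # same pattern, 100x the scale

f_satisfaction = f_statistic(satisfaction, segments)
f_spend = f_statistic(spend_dollars, segments)
print(f_satisfaction, f_spend)
```

Both variables produce the same F (24.0) even though one is measured on a scale 100 times larger, which is why F values can be averaged across basis variables with different units.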
The dramatic impact of the curse of dimensionality on market segmentation:
- When I segment using the first 20 variables as bases, the average F-statistic is 67
- When I segment using the first 40 variables, the average F-statistic for the first 20 drops to 50
- It drops further when I include 60 bases (to 36)
- And further still when I use 80 bases (to 32)
Just as we'd expect from the curse of dimensionality in customer segmentation, as I increase the number of basis variables the between-segment differences shrink: the average F-statistic for the first 20 variables falls by about half, from 67 to 32.
How to Avoid the Curse of Dimensionality in Market Segmentation
The remedy for protecting your customer segmentation analysis is straightforward: use fewer basis variables for your market segmentation. Ideally the segmentation objectives would drive variable selection, but if they don't, there are some helpful, objective ways to choose; see "Variable Selection Harmony: A Tale of Two Methods."
References
Yiu, T. "The Curse of Dimensionality: Why High Dimensional Data Can Be So Troublesome," Towards Data Science, July 20, 2019, https://towardsdatascience.com/the-curse-of-dimensionality-50dc6e49aa1e. Accessed May 16, 2025.