Customer segmentation is the practice of dividing a customer base into groups of individuals who are similar in ways relevant to marketing, such as age, gender, interests and spending habits. It relies on identifying key differentiators that divide customers into distinct groups that can be targeted. Customer segmentation procedures include:
- Deciding what data will be collected and how it will be gathered
- Collecting and integrating data from various sources
- Developing data analysis methods for customer segmentation
- Establishing effective communication among the impacted business units, such as marketing and customer service, about the segmentation
- Implementing processes to respond to the insights delivered by the data analysis process
Information such as customers' demographics (age, race, religion, gender, family size, ethnicity, income, education level), geography (where they live and work), psychographics (social class, lifestyle and personality characteristics) and behavioral tendencies (spending, consumption, usage and desired benefits) is considered when determining customer segments. In this article, the segmentation is based on customers' actual purchases, and the analysis is performed on the frequency and recency of those purchases. This approach is a simplified form of RFM (recency, frequency, monetary) analysis that uses only the recency and frequency components.
RFM (recency, frequency, monetary) analysis is a marketing technique used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased (recency), how often they purchase (frequency), and how much the customer spends (monetary).
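To make the three RFM measures concrete, here is a minimal base-R sketch on a small hypothetical transaction log. The `tx` table, its column names, the amounts and the reporting date are assumptions for illustration only, not data from this article. Recency is taken as the number of days between a client's most recent order and the reporting date, frequency as the number of orders, and monetary as total spend.

```r
# Hypothetical transaction log: one row per order (illustrative values only)
tx <- data.frame(
  client = c(1, 1, 1, 2, 2, 3),
  date   = as.Date(c('2012-03-01', '2012-03-20', '2012-04-05',
                     '2012-01-15', '2012-02-10', '2012-04-09')),
  amount = c(20, 35, 15, 100, 40, 60)
)
report_date <- as.Date('2012-04-11')

# Days between each order and the reporting date
tx$days_ago <- as.numeric(report_date - tx$date)

# Recency: days since the most recent order (smallest gap per client)
recency <- tapply(tx$days_ago, tx$client, min)
# Frequency: number of orders per client
frequency <- tapply(tx$amount, tx$client, length)
# Monetary: total spend per client
monetary <- tapply(tx$amount, tx$client, sum)

rfm <- data.frame(client    = as.numeric(names(recency)),
                  recency   = as.vector(recency),
                  frequency = as.vector(frequency),
                  monetary  = as.vector(monetary))
print(rfm)
```

For this toy log, client 1 has recency 6, frequency 3 and monetary 70. The rest of this article works only with the first two columns.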
Recency and frequency are key behavior metrics: frequency affects a client's lifetime value, and recency affects retention. Together, these metrics help us understand where customers are in their lifecycle. Based on this intelligence, we can segment the customer base into distinct groups so that we can:
- paint a snapshot of the current state of our customer base
- target marketing accurately and get more out of the marketing budget
- customise offers for different customer groups
- and, eventually, increase customers' lifetime and value
To achieve these objectives, we will use LifeCycle Grids, visualize the information, and perform some in-depth analysis using the R statistical programming language.
As a first step, let's generate some simulated data for analysis. The following R code generates the required data:
```r
# Load the required libraries
library(dplyr)
library(reshape2)
library(ggplot2)

# Create data sample
set.seed(10)
data <- data.frame(OrderID=sample(c(1:1000), 5000, replace=TRUE),
                   Product=sample(c('NULL', 'a', 'b', 'c'), 5000, replace=TRUE,
                                  prob=c(0.15, 0.65, 0.3, 0.15)),
                   stringsAsFactors=FALSE)  # keep Product as character in any R version
order <- data.frame(OrderID=c(1:1000),
                    ClientID=sample(c(1:300), 1000, replace=TRUE))
gender <- data.frame(ClientID=c(1:300),
                     Gender=sample(c('male', 'female'), 300, replace=TRUE,
                                   prob=c(0.40, 0.60)))
date <- data.frame(OrderID=c(1:1000),
                   OrderDate=sample((1:100), 1000, replace=TRUE))

# Merge the tables, drop empty ('NULL') order lines and convert day offsets to dates
Orders <- merge(data, order, by='OrderID')
Orders <- merge(Orders, gender, by='ClientID')
Orders <- merge(Orders, date, by='OrderID')
Orders <- Orders[Orders$Product != 'NULL', ]
Orders$OrderDate <- as.Date(Orders$OrderDate, origin="2012-01-01")
rm(data, date, order, gender)
```
The code above generates a data frame of 4378 observations of 5 variables: OrderID, ClientID, Product, Gender and OrderDate. The data frame looks as follows:
In this example, we have used only 5 variables, but additional variables, such as channel, campaign or demographic data, could be included, if available, to derive further insights.
A LifeCycle Grid is a matrix with two dimensions:
- Frequency, expressed as the number of purchased items or placed orders
- Recency, expressed as the number of days or months since the last purchase
Because it is impractical to work with an unlimited number of segments, the first step is to define boundaries for frequency and recency that classify customers into homogeneous groups (segments). Analysing the distributions of frequency and recency in our data set, combined with knowledge of the business, can help us find suitable boundaries. Therefore, we need to calculate two values for each client:
- the number of orders placed by the client (in some cases, the number of items can be used instead)
- the time elapsed from the client's last purchase to the reporting date
We then plot the distributions for exploratory analysis with the following R code:
```r
# Reporting date
today <- as.Date('2012-04-11', format='%Y-%m-%d')

# Processing data: one row per order, with one count column per product (a, b, c)
Orders2 <- dcast(Orders, OrderID + ClientID + Gender + OrderDate ~ Product,
                 value.var='Product', fun.aggregate=length)

# Per client: number of orders and days since the last order; keep only the last order
Orders2 <- Orders2 %>%
  group_by(ClientID) %>%
  mutate(frequency=n(),
         recency=as.numeric(today - OrderDate)) %>%
  filter(OrderDate == max(OrderDate)) %>%
  filter(OrderID == max(OrderID)) %>%
  ungroup()

# Exploratory analysis (geom_bar() counts each distinct integer value)
ggplot(Orders2, aes(x=frequency)) +
  theme_bw() +
  scale_x_continuous(breaks=c(1:10)) +
  geom_bar(alpha=0.6) +
  ggtitle("Distribution by frequency")

ggplot(Orders2, aes(x=recency)) +
  theme_bw() +
  geom_bar(alpha=0.6) +
  ggtitle("Distribution by recency")
```
The code above reshapes the data into a data frame of 289 observations (one row per client, holding that client's last order) of 9 variables, as shown below:
The following plots, Distribution by frequency and Distribution by recency, support the exploratory analysis.
Early behaviour matters most, so finer detail is usually useful at low values. There is typically a significant difference between customers who bought once and those who bought 3 times, but is there any meaningful difference between customers who bought 50 times and those who bought 53 times? That is why it makes sense to use narrow intervals at low values and progressively wider gaps at higher values. We will use the following boundaries:
- Frequency: 1, 2, 3, 4, 5, >5
- Recency: 0-6, 7-13, 14-19, 20-45, 46-80, >80
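The nested `ifelse()` calls used in the next step encode these boundaries one by one. As an alternative (not the code this article uses), base R's `cut()` maps each value to a right-closed interval, so the same boundaries become a single list of break points. The sketch below applies them to a few hypothetical values.

```r
# Hypothetical per-client values, for illustration only
frequency <- c(1, 2, 5, 6, 9)
recency   <- c(0, 7, 19, 46, 81)

# Frequency segments 1, 2, 3, 4, 5, >5
# Intervals are right-closed: (0,1] -> '1', ..., (4,5] -> '5', (5,Inf] -> '>5'
freq_seg <- cut(frequency,
                breaks = c(0, 1, 2, 3, 4, 5, Inf),
                labels = c('1', '2', '3', '4', '5', '>5'))

# Recency segments in days: 0-6, 7-13, 14-19, 20-45, 46-80, >80
rec_seg <- cut(recency,
               breaks = c(-Inf, 6, 13, 19, 45, 80, Inf),
               labels = c('0-6 days', '7-13 days', '14-19 days',
                          '20-45 days', '46-80 days', '>80 days'))

print(data.frame(frequency, freq_seg, recency, rec_seg))
```

For whole-day recency both approaches agree. `cut()` also places non-integer values such as 6.5 into exactly one interval, whereas the `between()` chain would let them fall through to its final else branch. The factor levels come out in ascending order; `factor(x, levels = rev(levels(x)))` reverses them to match the grid ordering used here.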
Next, we need to assign each client to segments based on these boundaries. We will also create a new variable, ‘cart’, which lists the products in the client's last order, for in-depth analysis. The following R code achieves this:
```r
Orders2.Segments <- Orders2 %>%
  mutate(Segments.freq=ifelse(between(frequency, 1, 1), '1',
                        ifelse(between(frequency, 2, 2), '2',
                        ifelse(between(frequency, 3, 3), '3',
                        ifelse(between(frequency, 4, 4), '4',
                        ifelse(between(frequency, 5, 5), '5', '>5')))))) %>%
  mutate(Segments.rec=ifelse(between(recency, 0, 6), '0-6 days',
                       ifelse(between(recency, 7, 13), '7-13 days',
                       ifelse(between(recency, 14, 19), '14-19 days',
                       ifelse(between(recency, 20, 45), '20-45 days',
                       ifelse(between(recency, 46, 80), '46-80 days', '>80 days')))))) %>%
  # creating last cart feature
  mutate(cart=paste(ifelse(a != 0, 'a', ''),
                    ifelse(b != 0, 'b', ''),
                    ifelse(c != 0, 'c', ''), sep='')) %>%
  arrange(ClientID)

# defining order of boundaries
Orders2.Segments$Segments.freq <- factor(Orders2.Segments$Segments.freq,
                                         levels=c('>5', '5', '4', '3', '2', '1'))
Orders2.Segments$Segments.rec <- factor(Orders2.Segments$Segments.rec,
                                        levels=c('>80 days', '46-80 days', '20-45 days',
                                                 '14-19 days', '7-13 days', '0-6 days'))
```
The above code reshapes the data into the following structure:
Now we have all the information required to create LifeCycle Grids. All we have to do is count the clients in each segment with the following R code:
```r
lcg <- Orders2.Segments %>%
  group_by(Segments.rec, Segments.freq) %>%
  summarise(quantity=n()) %>%
  mutate(client='client') %>%
  ungroup()
```
This step aggregates the data into the number of clients in each segment, as shown below:
The classic matrix can be created with the following code:
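One way to build that matrix, sketched here with base R's `xtabs()` as an assumption rather than a reproduction of the original code, is to cross-tabulate `quantity` by frequency and recency segment. The data frame below is a hypothetical stand-in with the same columns as `lcg`, so the snippet runs on its own.

```r
# Hypothetical stand-in for `lcg`: one row per non-empty cell and its client count
freq_levels <- c('>5', '5', '4', '3', '2', '1')
rec_levels  <- c('>80 days', '46-80 days', '20-45 days',
                 '14-19 days', '7-13 days', '0-6 days')
lcg <- data.frame(
  Segments.rec  = factor(c('0-6 days', '0-6 days', '>80 days'), levels = rec_levels),
  Segments.freq = factor(c('1', '>5', '1'), levels = freq_levels),
  quantity      = c(12, 3, 20)
)

# Rows: frequency segments; columns: recency segments; cells: summed quantity.
# Because the factors carry all levels, cells with no clients appear as 0.
lcg.matrix <- xtabs(quantity ~ Segments.freq + Segments.rec, data = lcg)
print(lcg.matrix)
```

With the real `lcg`, the same call yields the full 6 by 6 grid of client counts.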
However, I find that a more effective visualization is obtained with the following code:
```r
ggplot(lcg, aes(x=client, y=quantity, fill=quantity)) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  geom_bar(stat='identity', alpha=0.6) +
  geom_text(aes(y=max(quantity)/2, label=quantity), size=4) +
  facet_grid(Segments.freq ~ Segments.rec) +
  ggtitle("LifeCycle Grids")
```
The code produces the following LifeCycle Grids:
This segmentation model is both dynamic and stable. It is dynamic in terms of customer flow: as days pass, with or without purchases, customers flow from one cell to another. Without a purchase, a client's recency grows until it crosses into the next recency band, while a new order increases frequency and resets recency to zero. At the same time, it is stable in terms of working with segments, because it lets you work with customers who are at the same lifecycle phase. That means you can create suitable campaigns, offers or emails for each cell, or for several neighbouring cells, and use them continuously.
R also allows us to create subsegments and visualize them effectively. It can be helpful to break each cell down by additional features. For instance, we can split customers by gender. As another example, when products have different lifecycles, it can be helpful to analyze which product(s) were in the last cart, or we can combine both features. Let's do this with the following code:
```r
lcg.sub <- Orders2.Segments %>%
  group_by(Gender, cart, Segments.rec, Segments.freq) %>%
  summarise(quantity=n()) %>%
  mutate(client='client') %>%
  ungroup()

ggplot(lcg.sub, aes(x=Gender, y=quantity, fill=cart)) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  geom_bar(stat='identity', position='fill', alpha=0.6) +
  facet_grid(Segments.freq ~ Segments.rec) +
  ggtitle("LifeCycle Grids by Gender and last cart (proportion)")
```
This code produces a graphic that helps analyse LifeCycle Grids by gender and last cart, shown as proportions.
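The proportions that `position='fill'` stacks in each bar can also be read as numbers. A minimal base-R sketch for a single grid cell follows; the `clients` data frame is hypothetical, and with real data you would first keep only the rows of `Orders2.Segments` for one `Segments.freq` and `Segments.rec` pair.

```r
# Hypothetical clients of one LifeCycle Grid cell, for illustration only
clients <- data.frame(
  Gender = c('female', 'female', 'female', 'male', 'male'),
  cart   = c('a', 'ab', 'a', 'b', 'b'),
  stringsAsFactors = FALSE
)

# Count clients per gender and last-cart combination
counts <- table(clients$Gender, clients$cart)

# Share of each cart type within each gender (each row sums to 1),
# which is what the stacked 'fill' bars display per gender
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Here two of the three female clients last bought only product a, so that share is about 0.67, while both male clients last bought only product b.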