Market Basket Analysis - Step by step approach

This post explains the steps involved in Market Basket Analysis (MBA), also known as Association Analysis, and the key terminology used.

We will use customer transaction data to develop association rules and insights that can inform product bundling and promotions, assortment planning, inventory management, and in-store product placement.

Data Preparation

The main data source for a Market Basket Analysis is customer purchase transaction data. Each purchase slip or bill lists the products purchased during a customer visit, along with their quantities, unit prices, and total prices.

The transaction table may store this information as follows:

 

 

 

| OrderID | Transaction Date | Product ID | Product Description | Quantity Purchased | Unit Price | Price |
|---|---|---|---|---|---|---|
| 11 | 1-Jan-14 | 23 | Colgate 50gm | 2 | 12 | 24 |
| 11 | 1-Jan-14 | 73 | Modern Bread | 1 | 30 | 30 |
| 12 | 1-Jan-14 | 23 | Colgate 50gm | 1 | 12 | 12 |
| 12 | 1-Jan-14 | 55 | Pepsodent Tooth Brush | 1 | 17 | 17 |
| 12 | 1-Jan-14 | 87 | Cadbury Chocolate | 1 | 21 | 21 |

 

 

 

 

 

 

 

 

 

From the data warehouse table above, we need to group the data by order (visit), so that each order becomes one basket of products. The real-life dataset used in the rest of this post is the retail dataset (see Reference 1).
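The grouping step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library and the five sample rows from the table; the function name `build_baskets` and the tuple layout are assumptions made for the example, not part of any particular tool.

```python
from collections import defaultdict

# Rows copied from the sample transaction table:
# (order_id, product_id, product_description)
rows = [
    (11, 23, "Colgate 50gm"),
    (11, 73, "Modern Bread"),
    (12, 23, "Colgate 50gm"),
    (12, 55, "Pepsodent Tooth Brush"),
    (12, 87, "Cadbury Chocolate"),
]


def build_baskets(rows):
    """Group product IDs by order ID; each basket is a set of distinct products."""
    baskets = defaultdict(set)
    for order_id, product_id, _description in rows:
        baskets[order_id].add(product_id)
    return dict(baskets)


baskets = build_baskets(rows)
for order_id, products in sorted(baskets.items()):
    print(order_id, sorted(products))
```

Quantities and prices are dropped on purpose here: association rules only need to know whether a product was present in a basket, not how many units were bought.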

 

For the example data, the prepared dataset looks like this:

 

OrderID   ProductCodeList
[[1]]     "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
[[2]]     "30" "31" "32"
[[3]]     "33" "34" "35"
[[4]]     "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46"
[[5]]     "38" "39" "47" "48"
[[6]]     "38" "39" "48" "49" "50" "51" "52" "53" "54" "55" "56" "57" "58"

 

Data Analysis

Once the data is in the required format, we carry out a univariate (exploratory) analysis to understand the basic shape of the data before looking for associations.

Typical questions to answer at this stage are:

  • How many distinct visits (orders) are there?
  • What is the typical number of products purchased by a customer in an order or a visit?
  • How many different SKUs (stock-keeping units) are sold in a week or month?
  • Which are the most frequent items or products?

We will try to answer these Market Basket Data Analysis questions using sample datasets. We can use R or Python for doing Market Basket Analysis (MBA). We will show the programming steps in the next blog.
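As a rough illustration of these exploratory questions, the sketch below summarizes the six example baskets listed earlier using only the Python standard library. The function name `summarize` and its output keys are assumptions for this example. The weekly or monthly SKU question would also need transaction dates, which the basket listing does not carry, so it is not covered here.

```python
from collections import Counter
from statistics import median

# The six example baskets listed earlier; item codes are kept as strings.
baskets = [
    [str(i) for i in range(0, 30)],                        # [[1]]
    ["30", "31", "32"],                                    # [[2]]
    ["33", "34", "35"],                                    # [[3]]
    [str(i) for i in range(36, 47)],                       # [[4]]
    ["38", "39", "47", "48"],                              # [[5]]
    ["38", "39", "48"] + [str(i) for i in range(49, 59)],  # [[6]]
]


def summarize(baskets, top_n=3):
    """Answer the basic exploratory questions for a list of baskets."""
    sizes = [len(set(basket)) for basket in baskets]
    # Count each item at most once per basket.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    return {
        "n_orders": len(baskets),
        "median_basket_size": median(sizes),
        "n_distinct_items": len(item_counts),
        "top_items": item_counts.most_common(top_n),
    }


summary = summarize(baskets)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The median is used for the "typical" basket size because a single very large basket, such as the first one here, would pull the mean upward.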

 
 

 

 

 

 

 

 

 
 

 

 

 

 

 

 

 

 

 

 

 

Based on this exploratory analysis, 60% of orders (visits) contain between 2 and 10 products. The next important question is which products are bought most frequently.

Products 39 and 48 are the most frequently purchased. Comparing this with business expectations is a useful sanity check on the data.

Market Basket Analysis and Affinity Analysis

After data preparation and exploratory analysis, we can move on to the main Market Basket Analysis (MBA).

Some of the key questions Market Basket Analysis (MBA) tries to answer are:

  • Should we perform market basket analysis at a product level or category level?
  • Do we have information on the sequence in which products are bought within a basket or customer visit?
  • Which products are bought together by customers?
  • Can we conclude that sales of product ‘A’ drive sales of product ‘B’?
  • Which product categories are bought together?
  • Which product should be recommended, given that a customer has bought a product or a group of products?
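The product-level versus category-level question comes down to how baskets are encoded before rules are mined. The sketch below rolls product-level baskets up to category level. The product-to-category mapping is entirely hypothetical (the category names are assumptions made for illustration and do not come from the source data), and the function name `to_category_baskets` is likewise invented for this example.

```python
# Hypothetical product-to-category mapping; the category names are
# illustrative assumptions, not part of the source data.
PRODUCT_CATEGORY = {
    23: "Oral Care",      # Colgate 50gm
    55: "Oral Care",      # Pepsodent Tooth Brush
    73: "Bakery",         # Modern Bread
    87: "Confectionery",  # Cadbury Chocolate
}


def to_category_baskets(product_baskets, mapping):
    """Replace each product ID with its category, dropping duplicates."""
    return {
        order_id: {mapping[product_id] for product_id in products}
        for order_id, products in product_baskets.items()
    }


# Product-level baskets built from the sample transaction table.
product_baskets = {11: {23, 73}, 12: {23, 55, 87}}
category_baskets = to_category_baskets(product_baskets, PRODUCT_CATEGORY)
for order_id, categories in sorted(category_baskets.items()):
    print(order_id, sorted(categories))
```

Category-level baskets have far fewer distinct items, so each combination tends to have higher support, at the cost of less specific rules.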

 

Steps used in Market Basket Analysis

  • Identify Rules

Association Rules or Affinity between products bought together need to be identified based on transactional data.

Example of rules

| lhs => rhs | support | confidence | lift |
|---|---|---|---|
| {3854} => {38} | 0.001 | 0.913 | 5.159 |
| {1045} => {32} | 0.001 | 0.907 | 5.270 |
| {4030} => {48} | 0.001 | 0.826 | 1.728 |
| {1473} => {39} | 0.001 | 0.800 | 1.392 |
| {1727} => {38} | 0.002 | 0.931 | 5.263 |

Lhs (left-hand side) is the first product or item set considered for the rule.

Rhs (right-hand side) is the second product, bought given that the first product (lhs) is in the basket.

Support, Confidence, and Lift show the relative importance of each rule identified.

  • Evaluate Rules

Support, Confidence, and Lift are key KPIs for evaluating rules and we will discuss the importance of each of these metrics.

Support: Support is the percentage of transactions that contain the product combination, that is, the transactions that support the rule. It is an important check on whether there are enough transactions behind a rule. In the example above, the rule “{3854} => {38}” has a support of 0.001, so about 0.1% of transactions contain both products.

 

Confidence: Confidence measures the quality of an association rule. It is the ratio of the support of the rule (both products bought together) to the support of its left-hand side alone. For the rule “{3854} => {38}”, we count the transactions in which the two products are bought together and divide by the number of transactions in which the first product (“{3854}”) is bought.

 

$$\mathrm{Conf}(R) = \frac{\mathrm{Sup}(A \cup B)}{\mathrm{Sup}(A)}, \qquad R : A \Rightarrow B$$

 

Here the rule states that product B is bought along with product A. The numerator is the support of the itemset containing both A and B. To judge whether buying product A triggers the purchase of product B, we check how often product B is bought when the customer buys product A.

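Lift: Lift measures the importance of a rule. It compares the confidence of the rule against the expected confidence, that is, the support of the right-hand side on its own. A lift well above 1 means the products are bought together more often than chance would suggest, so a rule with a higher lift is better. A lift close to 1 indicates a redundant rule: the two products are bought roughly independently of each other.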
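The three metrics can be computed directly from a list of baskets. The following is a minimal sketch in plain Python on the six example baskets from the data preparation step; the function names are assumptions for this example, and the code does no guarding against a left-hand side that never occurs (which would divide by zero).

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, lhs, rhs):
    """Sup(lhs and rhs together) / Sup(lhs)."""
    return support(baskets, set(lhs) | set(rhs)) / support(baskets, lhs)


def lift(baskets, lhs, rhs):
    """Confidence divided by the expected confidence, Sup(rhs)."""
    return confidence(baskets, lhs, rhs) / support(baskets, rhs)


# The six example baskets from the data preparation step, as sets.
baskets = [
    {str(i) for i in range(0, 30)},
    {"30", "31", "32"},
    {"33", "34", "35"},
    {str(i) for i in range(36, 47)},
    {"38", "39", "47", "48"},
    {"38", "39", "48"} | {str(i) for i in range(49, 59)},
]

lhs, rhs = {"39"}, {"48"}
print("support   ", round(support(baskets, lhs | rhs), 3))
print("confidence", round(confidence(baskets, lhs, rhs), 3))
print("lift      ", round(lift(baskets, lhs, rhs), 3))
```

Because lift is confidence divided by the support of the right-hand side, the example rules table can be read backwards as well: for “{3854} => {38}”, 0.913 / 5.159 ≈ 0.177, which suggests product 38 appears in roughly 17.7% of all transactions (approximate, since the table values are rounded).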

 

Find rules with high support values

The rules “{41, 48} => {39}” and “{170} => {38}” have higher support values, meaning that many transactions in the data contain these product combinations. However, the confidence for these rules is below 1, so not every transaction that contains the left-hand side also contains the right-hand side.

 

Find rules with high confidence values

Find rules with high lift values
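Ranking rules by each metric in turn can be sketched as follows. This is a brute-force enumeration of single-item rules {a} => {b} on the six example baskets, written for illustration only; it is not the Apriori algorithm and would not scale to the full retail dataset. The names `mine_pair_rules` and `top_rules` are assumptions for this example.

```python
from itertools import permutations


def mine_pair_rules(baskets, min_support=0.0):
    """Brute-force every single-item rule {a} => {b} with its metrics."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    count = {item: sum(item in basket for basket in baskets) for item in items}
    rules = []
    for a, b in permutations(items, 2):
        both = sum(a in basket and b in basket for basket in baskets)
        sup = both / n
        if both == 0 or sup < min_support:
            continue  # the pair never co-occurs, or is too rare
        conf = both / count[a]
        rules.append({
            "lhs": a,
            "rhs": b,
            "support": sup,
            "confidence": conf,
            "lift": conf / (count[b] / n),
        })
    return rules


def top_rules(rules, metric, k=3):
    """Return the k rules with the highest value of `metric`."""
    return sorted(rules, key=lambda rule: rule[metric], reverse=True)[:k]


# The six example baskets from the data preparation step, as sets.
baskets = [
    {str(i) for i in range(0, 30)},
    {"30", "31", "32"},
    {"33", "34", "35"},
    {str(i) for i in range(36, 47)},
    {"38", "39", "47", "48"},
    {"38", "39", "48"} | {str(i) for i in range(49, 59)},
]

rules = mine_pair_rules(baskets)
for metric in ("support", "confidence", "lift"):
    print(f"Top rules by {metric}:")
    for rule in top_rules(rules, metric):
        print(f"  {{{rule['lhs']}}} => {{{rule['rhs']}}}  "
              f"sup={rule['support']:.3f} conf={rule['confidence']:.3f} "
              f"lift={rule['lift']:.3f}")
```

On this tiny sample the highest-lift rules involve items that appear in only one basket, so their support is just 1/6. This is why lift is usually read together with a minimum support threshold, which `min_support` provides here.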

 

Actionable Insights

Based on Support, Confidence, and Lift values we can select a list of rules. These rules have to be analyzed for insights and actions.

We can also form new hypotheses. The first type starts from the second product (rhs): which products or product combinations are bought by customers who also buy a specific product? For example, if the business wants to identify customers to target for “Product 38”, it can build the target list from customers who have bought the products associated with it.

The second type of hypothesis starts from the first product (lhs): once we know the first product selected by a customer, which product should we target next?
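Both types of hypothesis can be expressed as simple lookups over the selected rules. The sketch below uses the five example rules from the rules table; the `Rule` tuple and the functions `recommend` and `target_products` are hypothetical names introduced for this illustration, and the sort orders (confidence for recommendations, lift for targeting) are one reasonable choice rather than a fixed convention.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "lhs rhs support confidence lift")

# Example rules copied from the rules table in this post.
RULES = [
    Rule({"3854"}, "38", 0.001, 0.913, 5.159),
    Rule({"1045"}, "32", 0.001, 0.907, 5.270),
    Rule({"4030"}, "48", 0.001, 0.826, 1.728),
    Rule({"1473"}, "39", 0.001, 0.800, 1.392),
    Rule({"1727"}, "38", 0.002, 0.931, 5.263),
]


def recommend(basket, rules, min_lift=1.0):
    """Lhs-driven: products to suggest given what is already in the basket."""
    basket = set(basket)
    hits = [
        rule for rule in rules
        if rule.lhs <= basket and rule.rhs not in basket and rule.lift > min_lift
    ]
    hits.sort(key=lambda rule: rule.confidence, reverse=True)
    return [rule.rhs for rule in hits]


def target_products(product, rules):
    """Rhs-driven: products whose buyers are candidates for `product`."""
    hits = [rule for rule in rules if rule.rhs == product]
    hits.sort(key=lambda rule: rule.lift, reverse=True)
    return [sorted(rule.lhs) for rule in hits]


print(recommend({"1727", "4030"}, RULES))
print(target_products("38", RULES))
```

In the rhs-driven case, customers who bought the returned products but have not yet bought product 38 would form the target list.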

 

Reference

1. Data Source: http://fimi.ua.ac.be/data/retail.dat
