Market Basket Analysis – Key Performance Statistics

Market Basket Analysis (MBA) is an analysis used to understand which products are bought together in a retail transaction or customer visit to a store. A number of blogs cover a brief overview of Market Basket Analysis for retail, a few published case studies of market basket analysis, and a step-by-step approach to Market Basket Analysis using R.

In this blog, the focus is on explaining how the key performance statistics involved in Market Basket Analysis (MBA) are calculated.

Dummy Scenario Example

Transaction   Products
1             A
2             B,C
3             A,C
4             D,A
5             A,C
6             A,B,C
7             C

One of the first steps in Market Basket Analysis (MBA) is to find the frequency of each product sold in a store or online.

In this example, the frequencies of the products are as follows:

Product   Frequency (Count)
A         5
B         2
C         5
D         1
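The frequency count above can be reproduced with a few lines of code. This is a minimal sketch in Python, used here purely for illustration. The variable names `transactions` and `frequency` are assumptions of the sketch, not part of the original analysis.

```python
from collections import Counter

# Dummy transactions from the scenario table: transaction id -> products bought.
transactions = {
    1: {"A"},
    2: {"B", "C"},
    3: {"A", "C"},
    4: {"D", "A"},
    5: {"A", "C"},
    6: {"A", "B", "C"},
    7: {"C"},
}

# Count how many transactions contain each product.
frequency = Counter(item for basket in transactions.values() for item in basket)

for product in sorted(frequency):
    print(product, frequency[product])
```

Each basket is stored as a set, so a product is counted at most once per transaction.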

To find rules, we can list all possible product combinations. With 4 products, there are 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 = 15 different rules (ignoring order, meaning A,B is the same as B,A). These rules or product combinations are:

Rules      Availability in current data (X = not available)
A
B          X
C
D
A,B        X
A,C
A,D
B,C
B,D        X
C,D        X
A,B,C
A,B,D      X
A,C,D      X
B,C,D      X
A,B,C,D    X
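The count of 15 combinations can be checked by enumerating them. The sketch below is in Python for illustration only and assumes nothing beyond the four product names; `itemsets` is a name introduced here.

```python
from itertools import combinations

products = ["A", "B", "C", "D"]

# All non-empty combinations of the products, ignoring order (sizes 1 to 4).
itemsets = [
    combo
    for size in range(1, len(products) + 1)
    for combo in combinations(products, size)
]

print(len(itemsets))  # 4 + 6 + 4 + 1 combinations
for combo in itemsets:
    print(",".join(combo))
```

Because `combinations` ignores order, A,B appears once and B,A is never generated separately.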

Support

In a traditional market basket application scenario, support is the number (or percent) of transactions containing a particular product. Support can also be viewed from a probability perspective: the probability of a transaction containing a product.

Support(A) = % of transactions with product A
           = Count of transactions containing product A / Total number of transactions

P(A) = Probability of a transaction containing product A
     = 5/7

Product   Frequency (Count)   Support   % Support
A         5                   5/7       71%
B         2                   2/7       29%
C         5                   5/7       71%
D         1                   1/7       14%

In Market Basket Analysis or Affinity Analysis, combinations of products are more important than single products. If we establish that a particular combination is prevalent, the insight can be used for cross-selling or assortment planning. That said, support is also calculated for single-product occurrences.

Support for an association rule is the percentage of transactions in which the rule occurs.

In the above example/scenario, the support for each of the rules is:

Rules    Support (Count)   Support (%)
A,C      3                 3/7 = 43%
A,D      1                 1/7 = 14%
B,C      2                 2/7 = 29%
A,B,C    1                 1/7 = 14%

So the {A,C} combination is present in 3 of the 7 transactions, giving a support of 43%. Support can be expressed as a count or as a percentage, but unless specified otherwise it is given as a percentage. A higher support value means the rule covers more transactions and indicates higher importance of the association rule.

In probability terms, support for the rule {B → C} = Probability of a transaction having both products B and C
                                                 = P(B ∩ C)
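The support figures above, for single products and for combinations, all come from the same calculation: the share of transactions that contain every product in the set. A minimal sketch in Python, for illustration only; the helper name `support` and its arguments are assumptions of this example.

```python
from fractions import Fraction

# Dummy transactions from the scenario table (one set of products per basket).
transactions = [
    {"A"},
    {"B", "C"},
    {"A", "C"},
    {"D", "A"},
    {"A", "C"},
    {"A", "B", "C"},
    {"C"},
]


def support(itemset, baskets):
    """Share of baskets that contain every product in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for basket in baskets if wanted <= basket)
    return Fraction(hits, len(baskets))


examples = [{"A"}, {"B"}, {"C"}, {"D"}, {"A", "C"}, {"A", "D"}, {"B", "C"}, {"A", "B", "C"}]
for itemset in examples:
    s = support(itemset, transactions)
    print(",".join(sorted(itemset)), s, f"{float(s):.0%}")
```

Using `Fraction` keeps the exact ratios (such as 3/7) alongside the rounded percentages shown in the tables.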

Confidence

Adequate support is one of the basic requirements to check before looking at the other performance statistics of an association rule. Another key performance statistic is confidence.

Confidence shows how often one product occurs given the occurrence of another product or set of products.

For the rule {B → C}, confidence is the percentage of transactions containing B that also contain C. In conditional probability terms, it is the probability of product C given the occurrence of product B.

Confidence {B → C} = P(C|B)
                   = Support(B and C) / Support(B)
                   = P(B ∩ C) / P(B)
                   = (2/7) / (2/7)
                   = 1.00

Rules                  Support (%)   Confidence (Formula)   Confidence
A,C (A → C)            3/7 = 43%     P(A∩C)/P(A)            43%/71% = 60%
A,D (A → D)            1/7 = 14%     P(A∩D)/P(A)            14%/71% = 20%
B,C (B → C)            2/7 = 29%     P(B∩C)/P(B)            29%/29% = 100%
A,B,C ({A,B} → C)      1/7 = 14%     P(A∩B∩C)/P(A∩B)        14%/14% = 100%
A,B,C ({B,C} → A)      1/7 = 14%     P(A∩B∩C)/P(B∩C)        14%/29% = 50%
A,B,C ({A,C} → B)      1/7 = 14%     P(A∩B∩C)/P(A∩C)        14%/43% = 33%
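The confidence values in the table can be checked by dividing the support of the whole rule by the support of its left-hand side. The following is a sketch in Python, for illustration only; `support` and `confidence` are hypothetical helper names, and the sketch assumes the left-hand side occurs in at least one transaction (otherwise the division fails).

```python
from fractions import Fraction

# Dummy transactions from the scenario table (one set of products per basket).
transactions = [
    {"A"}, {"B", "C"}, {"A", "C"}, {"D", "A"},
    {"A", "C"}, {"A", "B", "C"}, {"C"},
]


def support(itemset, baskets):
    """Share of baskets that contain every product in `itemset`."""
    wanted = set(itemset)
    return Fraction(sum(1 for b in baskets if wanted <= b), len(baskets))


def confidence(lhs, rhs, baskets):
    """P(RHS | LHS) = support(LHS and RHS together) / support(LHS)."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


rules = [
    ({"A"}, {"C"}),
    ({"A"}, {"D"}),
    ({"B"}, {"C"}),
    ({"A", "B"}, {"C"}),
    ({"B", "C"}, {"A"}),
    ({"A", "C"}, {"B"}),
]
for lhs, rhs in rules:
    c = confidence(lhs, rhs, transactions)
    print(f"{sorted(lhs)} -> {sorted(rhs)}: {c} = {float(c):.0%}")
```

Working with exact fractions shows that A → C has a confidence of exactly 3/5, that is 60%.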

 

Each rule has two sides: LHS (Left Hand Side) and RHS (Right Hand Side). In an association rule, we interpret that the RHS happens given the LHS. For example, in {A,B} → C, the LHS is {A,B} and the RHS is {C}.

The rule states that product C is taken up when a customer buys products A and B.

In summary, support gives the percentage of occurrences of a rule, and confidence gives the strength of the dependency/association.

In the example of {A,B} → C, how many times are all 3 products {A,B,C} bought together? This is the support for the rule. Second, we are inferring that when customers buy {A,B}, they are highly likely to also buy C. Confidence is derived to measure this. It captures, out of the times customers bought products {A,B}, the percentage of times they also bought product C.

Lift

Lift is a measure of the improvement in the occurrence of the RHS due to an association rule. In non-technical terms, it compares how often the RHS happens under the rule with how often it happens otherwise.

For example, (B → C) states that a customer who buys B has a higher chance of buying product C. The question is whether this is due to buying product B, or whether product C has a high chance of being bought in general.

So, lift is designed to tell whether a product is bought because of the LHS or because it has a high probability of being bought anyway.

Lift is defined as the ratio of the conditional probability of the RHS given the LHS to the unconditional probability of the RHS. A lift above 1 (100%) means the LHS raises the chance of the RHS; a lift below 1 means it lowers it.

Lift = P(RHS|LHS) / P(RHS)

Example - B → C

Lift (B → C) = P(C|B) / P(C)
             = 1.00 / (5/7)
             = 1.40

Rules                  Support (%)   Confidence (Formula)   Confidence        Lift (Formula)      Lift
A,C (A → C)            3/7 = 43%     P(A∩C)/P(A)            43%/71% = 60%     P(C|A)/P(C)         84%
A,D (A → D)            1/7 = 14%     P(A∩D)/P(A)            14%/71% = 20%     P(D|A)/P(D)         140%
B,C (B → C)            2/7 = 29%     P(B∩C)/P(B)            29%/29% = 100%    P(C|B)/P(C)         140%
A,B,C ({A,B} → C)      1/7 = 14%     P(A∩B∩C)/P(A∩B)        14%/14% = 100%    P(C|{A,B})/P(C)     140%
A,B,C ({B,C} → A)      1/7 = 14%     P(A∩B∩C)/P(B∩C)        14%/29% = 50%     P(A|{B,C})/P(A)     70%
A,B,C ({A,C} → B)      1/7 = 14%     P(A∩B∩C)/P(A∩C)        14%/43% = 33%     P(B|{A,C})/P(B)     117%
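The lift column can be verified in the same way, by dividing each rule's confidence by the support of its right-hand side. This is a sketch in Python for illustration only, not the R code this post refers to; the helper names `support`, `confidence` and `lift` are assumptions of the example.

```python
from fractions import Fraction

# Dummy transactions from the scenario table (one set of products per basket).
transactions = [
    {"A"}, {"B", "C"}, {"A", "C"}, {"D", "A"},
    {"A", "C"}, {"A", "B", "C"}, {"C"},
]


def support(itemset, baskets):
    """Share of baskets that contain every product in `itemset`."""
    wanted = set(itemset)
    return Fraction(sum(1 for b in baskets if wanted <= b), len(baskets))


def confidence(lhs, rhs, baskets):
    """P(RHS | LHS) = support(LHS and RHS together) / support(LHS)."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


def lift(lhs, rhs, baskets):
    """P(RHS | LHS) / P(RHS): values above 1 mean the LHS raises the chance of the RHS."""
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)


rules = [
    ({"A"}, {"C"}),
    ({"A"}, {"D"}),
    ({"B"}, {"C"}),
    ({"A", "B"}, {"C"}),
    ({"B", "C"}, {"A"}),
    ({"A", "C"}, {"B"}),
]
for lhs, rhs in rules:
    value = lift(lhs, rhs, transactions)
    print(f"{sorted(lhs)} -> {sorted(rhs)}: lift = {float(value):.0%}")
```

In this data, A → C has a lift below 100% even though its confidence is 60%, because C is already bought in 71% of all transactions.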

For the above example, we can use R to read the dummy data, convert it to transactions (required for association analysis), and find the support, confidence and lift of the rules.

 

R Code for the Market Analysis
