We use data science to uncover various patterns in data. These patterns are then utilized for purposes such as product recommendations for online shoppers and complementary product placement for offline shoppers in supermarkets. These techniques aim to boost sales and increase revenue.
But how do algorithms determine which items to recommend and which to place together for optimal sales? This is where Apriori algorithm comes into play. We use this algorithm to find product associations and determine the most effective product recommendations and placement strategies.
After going through this blog, you will learn following things:
Supermarkets associate products smartly such that users buy multiple items simultaneously. These types of association of products are determined by association rule mining, and Apriori is a type of this.
Association rule mining is a method for discovering relationships among items. In a supermarket, for instance, there is often a pattern of items being purchased together. For example, bachelors might buy beer and chips together, and mosquito repelling liquid, and its corresponding machine are placed together so that customers will buy both.
Several techniques are present today to find this association rule, and apriori is one of them, which we will discuss in detail.
Supermarkets contain thousands of products, and they record data for every transaction that happens. The goal is to find the product association rule by looking into the transaction data. Most association rule mining algorithms form all possible combinations/associations among products and then loop through each transaction to find the best association rules.
For example, if a Supermarket has four products, namely A, B, C, and D, then possible combinations would be: {A, B}, {A, C}, {A, D}, {B, C}, {B,D}, {C, D}, {A, B, C}, {B, C, D}, {A, C, D}, {A,B,D}, {A,B,C,D}.
General algorithms consider all these combinations and then try to find their occurrence in every transaction. Or, they loop over all the transactions and then again loop over all possible combinations to count the number of occurrences in every transaction. This becomes highly inefficient when the number of products is in the thousands.
Apriori Algorithm provides an efficient approach for this task. It loops over the same data multiple times and extracts the best-associated products. But during its first loop, it simply counts the occurrences of various products to find the large 1-itemsets. 1-itemset for the four products (A, B, C, D) will be: {A}, {B}, {C}, {D}.
If the number of occurrences of these products in transactions is less than some threshold, the apriori algorithm will discard that product from all the combinations. This saves a lot of time and makes the algorithm faster. Let's see the step-wise working of this algorithm in detail.
Let's create hypothetical data where each transaction from the supermarket is maintained separately.
T1: Bread Butter
T2: bread Cheese
T3: Bread Cheese Butter
T5: T4: Milk Butter
T6: Milk Biscuits
T7: Potato chips Beer
T8: egg flour
T9: Beer Soft drinks
T10: Chocolates Toffees
T11: Chocolates Waffles
T12: Bread Butter Potato Chips
T13: Bread Butter Milk
...
...
There can be thousands of possible associations, and hence there is a need for metrics to say which associations are better. The Apriori algorithm calculates three parameters support, confidence, and lift. If these values for an association are greater than pre-defined thresholds, that association can be selected. Let's learn these thresholds in great detail.
It refers to the popularity of any item, which is nothing but counting several transactions containing that item. For example, Support of A means how often item A got purchased in all the transactions. To keep this number easily representable, we divide the count by the total items bought. For example, if bread, butter, and bread + butter are bought in 200, 150, and 100 transactions, respectively, in a day when 1000 transactions happened. Then support for these associations will be:
Support(A) = Transactions containing item A/ Transactions containing total items
Support(Bread) = Total bread bought/ Total items sold in supermarket
= 200 / 1000 = 0.2
Support(Butter) = 150/1000 = 0.15
Support (Bread + Butter ) = 100/1000 = 0.1
Confidence refers to the probability that item B will be bought if it is given that the user has purchased item A. It is the same as the conditional probability P(B/A) = P(A and B)/P(A) and denoted by confidence (A → B). For example, if A and B both are bought in 100 transactions and A is bought in 200 transactions, then confidence (A → B) would be:
Confidence(A --> B) = Transaction containing both A and B / Transaction only containing A
Confidence (Bread --> Butter) = Transaction having both bread and butter / transaction only containing bread
= 100/200 = 0.5
Even applying thresholds for Support and Confidence, we can have thousands of association rules to be filtered, which is still huge. That's why we need Lift(A → B), which refers to any association rule's strength. It is calculated by dividing confidence(A → B) by Support of B, or Support of (A, B) divided by Support(A)*Support(B). Support(A) and Support(B) in the denominator talks about the independent occurrence of A and B in transactions. If the denominator of lift is high, it means the purchase is happening because of randomness, and there is no such association rule.
Lift(A --> B) = Confidence(A --> B) / Support(B)
Lift(Bread --> Butter) = Confidence(Bread --> Butter) / Support(Butter)
= 0.5/0.15 = 33.33
Lift gives us an idea about which association rules we should consider ahead. If the lift value is 1, then products are independent of each other, and that rule can be discarded.
Here, we will calculate the support of every item in our dataset.
ITEMS | Frequency Count
-------------------------------------------------------------------
Butter | 5
Milk | 3
Potato Chips | 2
Beer | 2
Flour | 1
Egg | 1
Chocolates | 2
Waffles | 1
Toffees | 1
Cheese | 2
Bread | 5
The next step is to decide a threshold value below which items will be discarded in forming association rules. Let's take the threshold value as 3. The items that are left will be:
Bread 5
Butter 5
Milk 3
All other items, like cheese and eggs having support values less than the threshold, are ignored.
Only those items will be considered to form combinations not discarded in step 2: butter, bread, and Milk. Possible combinations would be:
Item 1 | Item 2 | Frequency Count
--------------------------------------------------------------
Bread | Butter | 4
Bread | Milk | 1
Butter | Milk | 2
Please note that a combination like bread and cheese is not considered as cheese is an item that does not have support above the decided threshold. Also, here only combination above the threshold is bread and butter, which has support above 3.
Now we will build larger combinations with three items, including bread and butter (the only combination of two items we had above the threshold). There are two possible combinations:
Hence, all the possible combinations are eliminated here.
We also need to find other metrics mentioned before, lift and confidence. These additional matrices will allow us to choose the suitable Association Rules. We must rely on more than one metric in finalizing the rules. For example, suppose a particular rule has a higher lift value but lower confidence than another; we will have to figure out which rule to choose based on additional criteria. Rules passing the most metrics will be finalized.
Too much theory; for now, let's see the complete implementation in Python.
There is a suitable package for using the Apriori algorithm in Python, which can be installed using this command:
pip install efficient-apriori
The data needs to be prepared for applying the apriori algorithm. We pass a list containing each transaction as a tuple of items. Let's take the same dataset:
transactions= [
("Bread","Butter"),
("bread","Cheese"),
("Bread","Cheese","Butter"),
("Milk","Butter"),
("Milk" , "biscuits"),
("Potato chips","Beer"),
("flour","egg"),
("Beer","Soft drinks"),
("Chocolates","Toffees"),
("Chocolates","Waffles"),
("Bread","Butter","Potato Chips"),
("Bread","Butter","Milk")
]
The next step would be to set up parameters in the algorithm. The two main parameters required are:
The Python code for the algorithm is mentioned below
# our min support is 3 which has to be expressed as a percentage while using efficient-apriori
min_support = 3/len(transactions)
# min confidence allows you to delete rules with low confidence. Here we set it 0 for now
min_confidence = 0
itemsets, rules = apriori(transactions, min_support=min_support, min_confidence=min_confidence)
the output of the algorithm can be seen using the set of commands.
print(itemsets)
## Output
{1: {('Bread',): 4, ('Butter',): 5, ('Milk',): 3}, 2: {('Bread', 'Butter'): 4}}
Please note that the above output has two keys, 1 and 2. These 1 and 2 represent the number of items considered at a time to get support. The results match our expectation as key 1 has only three values ([bread, butter, Milk]). These values are the same that we selected after keeping a minimum threshold of 3.
For key 2, there is only one item, as we saw before, which passes our successive criteria (Bread, butter)
Let us inspect the rules and their metrics.
for rule in rules:
print(rule)
## Output
{Butter} -> {Bread} (conf: 0.800, supp: 0.333, lift: 2.400, conv: 3.333)
{Bread} -> {Butter} (conf: 1.000, supp: 0.333, lift: 2.400, conv: 583333333.333)
Let's understand one rule in plain English.
{Butter} -> {Bread} (conf: 0.800, supp: 0.333, lift: 2.400, conv: 3.333)
It means that the rule (Butter → Bread) has a confidence of 0.8 and a lift of 3.3333. These high values tell us that customers are more likely to buy them together, so they should be kept together in the supermarket. One can observe an additional Conv parameter in the output representing the conviction in the rule. The greater the conviction, the higher the interest in the association rule.
We have discovered a foundational association rule mining algorithm and how we can apply it to transactions in Python. This was the most basic algorithm with its advantages and disadvantages. For more advanced results, we can use other algorithms available under association rule mining, including AIS, SETM, and variations of basic Apriori.
Credits: Thanks to Harshit Agarwal for creating the first version of this content