Source: Slideplayer
The task was to recommend a set of books to promote together for a specific customer group, and to give the CEO recommendations that would support better decision making and keep the business profitable. Conveniently, a sample dataset was provided, so all we needed to do was work with the data and produce the recommendations.
WEKA was the software we were asked to use for this problem. I will break down the steps we followed in WEKA, accompanied by screenshots of our outputs.
1) We launched the WEKA application.
2) We imported the given dataset. Note: the dataset was provided in ARFF format.
3) The first thing to note is that this is a recommendation problem, which narrowed down our choice of algorithm.
4) We decided to use association rule mining and applied the Apriori algorithm. Why? Because it is designed to find frequent itemsets and the associations between the items in them.
5) Having settled on the approach, we needed to make sure there was no missing data in our dataset and to check the data types we had been given. This is the pre-processing stage.
A. Pre-processing Stage
1) Our data had 599 instances with 12 attributes.
2) The data was initially of numeric (REAL) type. However, Apriori in WEKA works on nominal attributes, so we converted the data to nominal using the unsupervised 'NumericToNominal' filter.
Raw Data Visualization in WEKA
Data after Conversion in WEKA
3) The 'ID Transaction' attribute was removed because it adds no value to the data mining approach.
4) At the end of the pre-processing stage, the dataset consisted of 11 attributes containing only 0s and 1s, where 0 indicated that the item was not bought and 1 indicated that it was bought.
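For readers who want to see what this pre-processing amounts to outside WEKA, here is a minimal Python sketch. The three rows and the column names are made up for illustration (the real dataset has 599 rows and 12 attributes), and this is not how WEKA implements its filters. It simply drops the ID column and turns each numeric flag into a nominal '0' or '1' label.

```python
# Hypothetical rows standing in for the ARFF data; names and values are invented.
raw_rows = [
    {"ID Transaction": 1.0, "Childbook": 1.0, "Cookbook": 0.0, "Youthbook": 1.0},
    {"ID Transaction": 2.0, "Childbook": 0.0, "Cookbook": 1.0, "Youthbook": 0.0},
    {"ID Transaction": 3.0, "Childbook": 1.0, "Cookbook": 1.0, "Youthbook": 1.0},
]


def preprocess(rows, id_column="ID Transaction"):
    """Drop the ID column and convert numeric 0.0/1.0 flags to nominal '0'/'1'."""
    cleaned = []
    for row in rows:
        cleaned.append(
            {
                name: str(int(value))  # 1.0 -> '1', 0.0 -> '0'
                for name, value in row.items()
                if name != id_column
            }
        )
    return cleaned


nominal_rows = preprocess(raw_rows)
print(nominal_rows[0])
```

After this step every remaining value is a label rather than a number, which is the shape of data the rule mining step works on.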
B. Analysis Stage
1) In this stage, the Apriori algorithm was applied to the dataset. However, we discovered that WEKA built the model based only on unpurchased items (0s) 😕, which was not our intention. Our aim was to give recommendations based on purchased items (1s). How then do we move forward from here? 😖
2) WEKA of course has a feature to solve this, which is what we applied, and voila, we got some juicy outputs to work with 😋. What then is this feature?
3) Well, in the Apriori algorithm settings there is an option called "treatZeroAsMissing", which is set to "False" by default. We set it to "True" and yes 💪 that was it.
4) We reran the algorithm, but no rules were found at the default 'minMetric' of 0.9, meaning that no rule reached 90% confidence.
5) We then lowered 'minMetric' to 0.8, 0.7 and even 0.6, and were able to get some really good combinations, which we used for our recommendation.
Output at 0.8 confidence level (3 best rules found)
Output at 0.7 confidence level (10 best rules found)
Output at 0.6 confidence level (10 best rules found)
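To make the effect of the two settings more concrete, here is a small Python sketch on invented toy baskets. It is a brute-force enumeration of rules with two antecedents, not WEKA's Apriori implementation, and the data and the `rules` helper are assumptions made for illustration only. Keeping only the purchased items plays the role of "treatZeroAsMissing", and `min_confidence` plays the role of 'minMetric'.

```python
from itertools import combinations

# Invented toy data: 1 = bought, 0 = not bought.
rows = [
    {"Childbook": 1, "Cookbook": 1, "Youthbook": 1, "Refbook": 0},
    {"Childbook": 1, "Cookbook": 1, "Youthbook": 1, "Refbook": 0},
    {"Childbook": 1, "Cookbook": 1, "Youthbook": 0, "Refbook": 1},
    {"Childbook": 0, "Cookbook": 1, "Youthbook": 1, "Refbook": 0},
    {"Childbook": 0, "Cookbook": 0, "Youthbook": 0, "Refbook": 1},
]

# Keep only purchased items so that rules are never built from the 0s.
baskets = [{item for item, flag in row.items() if flag == 1} for row in rows]


def count(itemset):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def rules(min_confidence, antecedent_size=2):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold."""
    items = sorted(set().union(*baskets))
    found = []
    for antecedent in combinations(items, antecedent_size):
        antecedent_set = set(antecedent)
        antecedent_count = count(antecedent_set)
        if antecedent_count == 0:
            continue  # the antecedent never occurs, so no rule can be formed
        for consequent in items:
            if consequent in antecedent_set:
                continue
            confidence = count(antecedent_set | {consequent}) / antecedent_count
            if confidence >= min_confidence:
                found.append((antecedent, consequent, round(confidence, 2)))
    return found


for threshold in (0.9, 0.6):
    print(threshold, len(rules(threshold)))
```

On this toy data, lowering the threshold lets more rules through, which is the same behaviour we saw when reducing 'minMetric' in WEKA.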
C. Analysis of Results
1) Each rule we found has the form 'A=>C', which means that if a set of antecedents (A) is purchased, there is a certain probability that the consequent (C) will also be purchased. For example, in the output at the 0.8 confidence level, 78 transactions contained a Youthbook and a Cookbook (A). Of those transactions, 67 also contained a Childbook (C). This second count is the support of the whole rule, i.e. the number of transactions containing both A and C. The confidence score shows how often the rule holds in the dataset. It is calculated as the count for A and C together divided by the count for A: 67 / 78 = 0.86.
2) Another interesting measure is 'Lift', which compares how often the antecedents and the consequent occur together in a transaction with how often the consequent occurs across the entire dataset. A lift above 1 means the antecedents make the consequent more likely, and the larger the lift, the stronger the association. To calculate lift, we first need the 'Expected Confidence', which is the probability that the consequent is purchased regardless of the antecedents. For the first rule (0.8 confidence level), this is the number of transactions containing a Childbook (250) divided by the total number of transactions (599): 250 / 599 = 0.417362270 (approximately 0.42).
3) After calculating the Expected Confidence, the lift can be calculated as the ratio of the confidence to the expected confidence: 0.86 / 0.417362270 = 2.06.
4) With a confidence of 86% and a lift of 2.06, this rule can be considered a strong association. That is the analysis of just one rule; I won't go through every rule in this article.
5) After building the model, these were the three best rules found by WEKA:
- If a Youthbook and a Cookbook are purchased in one transaction, there is 86% confidence that a Childbook will be purchased
- If a Cookbook and a Refbook are purchased in one transaction, there is 83% confidence that a Childbook will be purchased
- If a Cookbook and a Geogbook are purchased in one transaction, there is 82% confidence that a Childbook will be purchased
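The confidence and lift figures for the first rule can be re-derived from the counts quoted above. The short Python sketch below only repeats that arithmetic; the variable names are my own and nothing here comes from WEKA itself.

```python
# Counts quoted in the analysis of the first rule (Youthbook, Cookbook => Childbook).
total_transactions = 599   # instances in the dataset
antecedent_count = 78      # transactions with a Youthbook and a Cookbook
rule_count = 67            # of those, transactions that also contain a Childbook
consequent_count = 250     # transactions containing a Childbook

# Confidence: how often the consequent appears when the antecedents do.
confidence = rule_count / antecedent_count

# Expected confidence: how often the consequent appears overall.
expected_confidence = consequent_count / total_transactions

# Lift: ratio of the two; values above 1 indicate a positive association.
lift = confidence / expected_confidence

print(f"confidence          = {confidence:.2f}")
print(f"expected confidence = {expected_confidence:.2f}")
print(f"lift                = {lift:.2f}")
```

Rounded to two decimals, this reproduces the 0.86, 0.42 and 2.06 used in the analysis.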
D. Our Recommendation
Based on our analysis and the results from WEKA, our recommendation is this: since Childbooks show a relatively strong association with Youthbooks, Cookbooks, Refbooks and Geogbooks, these books can be promoted together.
In conclusion, there are several ways to go about solving a data mining problem, and several ways to interpret your results after your analysis. However, I would recommend that you understand the basics behind WEKA if you are not familiar with it. This will give you a better understanding of whatever project you are given and of the different ways to approach it.
I hope this article helps someone out there trying to get the hang of a similar project. Below is a list of links that were very useful to us while we were solving this problem.
1) Building a market basket model
2) Market Basket Analysis with Association Rule Learning
3) Lift in Association Rule
Hey before you go, if you like this article, consider buying me a coffee by clicking here. Until next time...💋