How to get items for both LHS and RHS for only specific columns in arules?
- January 15, 2015 at 12:59 pm #1097
kim
Hi all,
I have a question about the arules package in R. When I call the apriori function, I want the resulting rules to contain only these two items in the LHS: HouseOwnerFlag=0 and HouseOwnerFlag=1. The RHS should only contain items from the column Product. For instance:
(for the sake of readability, I omitted the columns for support, lift, and confidence)
lhs rhs
1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black}
2 {HouseOwnerFlag=1} => {Product=Adventure Works 26″ 720p}
3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver}
4 {HouseOwnerFlag=1} => {Product=Contoso Coffee Maker 5C E0900}
So now I use the following:
rules <- apriori(sales, parameter=list(support =0.01, confidence =0.8, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))
Then I use this to ensure that only the Product column is on the RHS:
inspect( subset( rules, subset = rhs %pin% "Product=" ) )
The outcome is like this:
lhs rhs
1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works } => {Product=SV 16xDVD M360 Black}
2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video } => {Product=Adventure Works 26″ 720p}
3 {BrandName=Southridge Video, NumberChildrenAtHome=0 } => {Product=Litware Wall Lamp E3015 Silver}
4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170 } => {Product=Contoso Coffee Maker 5C E0900}
So apparently the LHS can contain items from every column, not just HouseOwnerFlag as I specified. I see that I can put default="rhs" in the appearance list of the apriori call to prevent this, like so:
rules <- apriori(sales, parameter=list(support =0.001, confidence =0.5, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), default="rhs"))
Then, upon inspecting (without the subset part, just inspect(rules)), there are far fewer rules (7) than before, but the LHS does indeed only contain HouseOwnerFlag:
lhs rhs
1 {HouseOwnerFlag=0} => {MaritalStatus=S}
2 {HouseOwnerFlag=1} => {Gender=M}
3 {HouseOwnerFlag=0} => {NumberChildrenAtHome=0}
4 {HouseOwnerFlag=1} => {Gender=M}
However, nothing from the column Product appears in the RHS, so there is no point in inspecting it with subset, as that would of course return nothing. I tested it several times with different support values to see whether Product would appear, but I always get the same 7 rules.
So my question is, how can I specify both the LHS (HouseOwnerFlag) and RHS (Product)? What am I doing wrong?
You can reproduce this problem by downloading the sample data via this link:
https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0
Mind you, I only took the first 20 rows from a huge dataset (12 million rows), so the output here won't have the same product names as the example I displayed above. But the problem remains the same. (If you would like to have the entire dataset, I can of course email or upload it.) I want to be able to get only HouseOwnerFlag=0 and/or HouseOwnerFlag=1 on the LHS and only the column Product on the RHS.
I asked this question on another forum before, but unfortunately got no response at all.
Thanks in advance! I look forward to hearing from you.
Kim
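One direction that might be worth trying, as a minimal sketch rather than a confirmed fix: list the allowed items for both sides in appearance and set default="none", so that any item not listed cannot appear on either side. The Product item labels are collected with grep over itemLabels(). The toy data frame sales_toy and the names trans and product_items are hypothetical stand-ins for the real sales data, and the sketch assumes every column is a factor.

```r
library(arules)

# Hypothetical toy data standing in for `sales`; all columns are factors.
sales_toy <- data.frame(
  HouseOwnerFlag = factor(c(0, 0, 1, 1, 0, 1)),
  Gender         = factor(c("M", "F", "M", "M", "F", "M")),
  Product        = factor(c("A", "A", "B", "B", "A", "B"))
)
trans <- as(sales_toy, "transactions")

# Collect every item label that comes from the Product column.
product_items <- grep("^Product=", itemLabels(trans), value = TRUE)

rules <- apriori(
  trans,
  parameter  = list(support = 0.01, confidence = 0.8, minlen = 2),
  appearance = list(
    lhs     = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
    rhs     = product_items,
    default = "none"  # items not listed above may not appear on either side
  )
)
inspect(rules)
```

With this setup no rhs %pin% "Product=" subset should be needed afterwards, since the restriction is applied while the rules are mined. On the real data, the support and confidence thresholds may still need to be lowered before any HouseOwnerFlag => Product rules show up.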