How to get items for both LHS and RHS for only specific columns in arules?


  • #1097
    kim
    Member

    Hi all,
    I have a question about the arules package in R.

    Using the apriori function, I want the resulting rules to contain only HouseOwnerFlag=0 or HouseOwnerFlag=1 on the LHS, and only items from the Product column on the RHS. For instance:
    (for the sake of readability, I omitted the columns for support, confidence, and lift)

    lhs rhs
    1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black}
    2 {HouseOwnerFlag=1} => {Product=Adventure Works 26″ 720p}
    3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver}
    4 {HouseOwnerFlag=1} => {Product=Contoso Coffee Maker 5C E0900}

    So now I use the following:
    rules <- apriori(sales, parameter=list(support =0.01, confidence =0.8, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))

    Then I use this to ensure that only the Product column is on the RHS:
    inspect( subset( rules, subset = rhs %pin% "Product=" ) )

    The outcome is like this:
    lhs rhs
    1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works } => {Product=SV 16xDVD M360 Black}
    2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video } => {Product=Adventure Works 26″ 720p}
    3 {BrandName=Southridge Video, NumberChildrenAtHome=0 } => {Product=Litware Wall Lamp E3015 Silver}
    4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170 } => {Product=Contoso Coffee Maker 5C E0900}

    So apparently the LHS can still contain items from any column, not just HouseOwnerFlag as I specified. I see that I can add default="rhs" to the appearance list in the apriori call to prevent this, like so:
    rules <- apriori(sales, parameter=list(support =0.001, confidence =0.5, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), default="rhs"))

    Then, upon inspecting the result (without the subset part, just inspect(rules)), there are far fewer rules (7) than before, but they do indeed contain only HouseOwnerFlag on the LHS:

    lhs rhs
    1 {HouseOwnerFlag=0} => {MaritalStatus=S}
    2 {HouseOwnerFlag=1} => {Gender=M}
    3 {HouseOwnerFlag=0} => {NumberChildrenAtHome=0}
    4 {HouseOwnerFlag=1} => {Gender=M}

    However, nothing from the Product column appears on the RHS, so filtering with subset is pointless, as it would of course return an empty rule set. I tried several different support values to see whether Product would appear, but I always get the same 7 rules.

    So my question is, how can I specify both the LHS (HouseOwnerFlag) and RHS (Product)? What am I doing wrong?

    You can reproduce this problem with the sample data at this link:
    https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0
    Mind you, this is only the first 20 rows of a huge dataset (12 million rows), so the output here won't have the same product names as in the example above, but the problem is the same. (If you would like the entire dataset, I can of course email or upload it.) I want to get only HouseOwnerFlag=0 and/or HouseOwnerFlag=1 on the LHS and only items from the Product column on the RHS.

    I asked this question on another forum before, but unfortunately got no response at all.

    Thanks in advance! I look forward to hearing from you.

    Kim
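
One possible approach, sketched here on a small made-up data frame rather than the real sales data: the appearance list passed to apriori accepts lhs and rhs at the same time, and default="none" keeps every item not listed out of the rules. The sales_df values and the product_items helper name below are hypothetical and only for illustration; the thresholds are chosen to suit the toy data, not the real dataset.

```r
library(arules)

# Toy stand-in for the real `sales` data (hypothetical values, 8 rows only)
sales_df <- data.frame(
  HouseOwnerFlag = factor(c(0, 0, 0, 1, 1, 1, 0, 1)),
  Gender         = factor(c("M", "F", "M", "M", "F", "M", "F", "M")),
  Product        = factor(c("Lamp", "Lamp", "Lamp", "TV", "TV", "TV", "TV", "Lamp"))
)

# Each factor level becomes an item labelled "Column=level"
sales <- as(sales_df, "transactions")

# Collect every item label that comes from the Product column
product_items <- grep("^Product=", itemLabels(sales), value = TRUE)

rules <- apriori(
  sales,
  parameter  = list(support = 0.1, confidence = 0.5, minlen = 2),
  appearance = list(
    lhs     = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),  # allowed on the LHS only
    rhs     = product_items,                              # allowed on the RHS only
    default = "none"                                      # all other items are excluded
  ),
  control = list(verbose = FALSE)
)

inspect(rules)
```

Note that the appearance constraints only restrict which items may appear; the support and confidence thresholds still apply. With many distinct products, a rule such as {HouseOwnerFlag=0} => {Product=...} probably has low confidence, which may be why no Product items showed up with default="rhs" at confidence 0.5. Lowering the confidence threshold on the full data may be needed before such rules appear.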

    • This topic was modified 5 years, 6 months ago by kim.