I came across an interesting problem at work. The company I work for has a dataset full of items and needed to find the optimal combinations of those items. This sounds like a simple problem, but with large spreadsheets the task can be a little daunting. No fear! There are multiple ways to tackle combination problems in R, and today I will showcase a few of them. Let's first make a sample data set.
data <- data.frame(
category = c("Tshirt","Tshirt","Tshirt","Tshirt","Cup","Cup","Cup","Cup","Bag","Bag","Bag","Bag"),
item = c("RedShirt",
"GreenShirt",
"BlueShirt",
"YellowShirt",
"CeramicCup",
"ClayCup",
"RubberCup",
"CoffeeCup",
"BigBag",
"SmallBag",
"MediumBag",
"JumboBag"),
price = c(10,12,8,15,7,5,8,10,25,15,20,30))
View(data)
#To build every combination of one item per category, we need to split the data by category so that each category's items can be passed to expand.grid() as a separate vector. So let's make some filters (filter() comes from dplyr).
library(dplyr)
Tshirt <- filter(data, category == "Tshirt")
Cup <- filter(data, category == "Cup")
Bag <- filter(data, category == "Bag")
#Now we can finally use the expand.grid() function.
combinations <- expand.grid(Bag$item,Cup$item,Tshirt$item)
View(combinations)
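Listing the combinations is only half of the original problem; the other half is picking the best one. Below is a minimal sketch in base R, assuming that "optimal" means the lowest total price (that criterion is my assumption, not something the data dictates). The names `items`, `by_category`, `bundles`, and `price_of` are made up for this illustration.

```r
# Rebuild the sample data so this block runs on its own.
items <- data.frame(
  category = rep(c("Tshirt", "Cup", "Bag"), each = 4),
  item = c("RedShirt", "GreenShirt", "BlueShirt", "YellowShirt",
           "CeramicCup", "ClayCup", "RubberCup", "CoffeeCup",
           "BigBag", "SmallBag", "MediumBag", "JumboBag"),
  price = c(10, 12, 8, 15, 7, 5, 8, 10, 25, 15, 20, 30),
  stringsAsFactors = FALSE
)

# One character vector of item names per category (Bag, Cup, Tshirt).
by_category <- split(items$item, items$category)

# expand.grid() also accepts a list of vectors: 4 * 4 * 4 = 64 bundles.
bundles <- expand.grid(by_category, stringsAsFactors = FALSE)

# Named lookup vector: item name -> price.
price_of <- setNames(items$price, items$item)

# Total price of each bundle (assumed optimality criterion: cheapest wins).
bundles$total <- unname(
  price_of[bundles$Bag] + price_of[bundles$Cup] + price_of[bundles$Tshirt]
)

# Sort so the cheapest bundles come first.
cheapest <- bundles[order(bundles$total), ]
head(cheapest, 3)
```

Any other criterion (a budget cap, a margin, a weight limit) can be swapped in by changing how `total` is computed or by filtering `bundles` before sorting.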
#Let's make another example, this time a plain character vector.
classes <- c("A+","A-","A","B+","B-","B","C+","C-","C","D+","D-","D","F")
#These are all the possible letter grades for a class... I am not sure if D+ or D- is a thing, but let's see what's possible with these letters.
#Here we are going to use combn().
ncomb <- combn(classes, 5)
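# 5 is the length of each combination, which is the useful part of combn(): you choose the size yourself. Order does not matter and nothing repeats, so this gives choose(13, 5) = 1287 combinations, one per column. expand.grid() instead returns one column per input vector and one row per ordered tuple: with "z" vectors of "n" elements each, that is n^z rows.
View(ncomb)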
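The shape of the combn() result is easier to see on a smaller input. The sketch below uses a hypothetical four-letter vector `grades` (not part of the data above) and also shows the optional FUN argument, which applies a function to each combination.

```r
# Hypothetical short vector so the whole result fits on screen.
grades <- c("A", "B", "C", "D")

# Each COLUMN of the result is one combination of 2 elements.
pairs <- combn(grades, 2)
dim(pairs)                  # 2 rows, 6 columns
choose(length(grades), 2)   # 6, the number of combinations

# FUN is applied to every combination; extra arguments are passed on to it.
labels <- combn(grades, 2, FUN = paste, collapse = "-")
labels                      # "A-B" "A-C" "A-D" "B-C" "B-D" "C-D"

# The same count for the 13 grades taken 5 at a time.
choose(13, 5)               # 1287
```

If you prefer one combination per row, `t(pairs)` transposes the matrix.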
#Here is how it works with expand.grid(). We reuse the 13 grades from above as data2.
data2 <- classes
n2comb <- expand.grid(data2,data2,data2)
#data2 has 13 elements. Since we are using data2 three times, the output will have 13^3 = 2197 rows and 3 columns.
View(n2comb)
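To make the difference between the two functions concrete, here is a small check of the sizes involved. It is only a sketch; `grades13` and `triples` are names invented for this example, and `grades13` simply repeats the 13 grades used above.

```r
grades13 <- c("A+", "A-", "A", "B+", "B-", "B", "C+", "C-", "C",
              "D+", "D-", "D", "F")

# expand.grid(): ordered tuples, repeats allowed, one row per tuple.
triples <- expand.grid(grades13, grades13, grades13,
                       stringsAsFactors = FALSE)
dim(triples)                # 2197 rows, 3 columns
length(grades13)^3          # 2197

# combn(): unordered, no repeats, one column per combination.
ncol(combn(grades13, 3))    # 286
```

So expand.grid() counts ("A", "B", "C") and ("C", "B", "A") as different rows and also includes ("A", "A", "A"), while combn() keeps a single column for each set of three distinct grades.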
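#Finally, outer() applies a function to every pairing of elements from two vectors. With paste, it returns a 5 x 5 matrix of pasted pairs.
data3 <- c("A","B","C","D","E")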
data4 <- c("a","b","c","d","e")
n3comb <- outer(data3, data4, FUN = "paste", sep = "")
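A smaller version shows what outer() returns. This is a sketch with hypothetical three-element vectors `upper` and `lower`: rows follow the first vector, columns follow the second.

```r
upper <- c("A", "B", "C")
lower <- c("a", "b", "c")

# Every element of `upper` pasted onto every element of `lower`.
grid <- outer(upper, lower, FUN = paste, sep = "")
dim(grid)          # 3 rows, 3 columns
grid[2, 3]         # "Bc": row 2 is "B", column 3 is "c"

# Flatten to a plain vector if you do not need the matrix layout.
# Matrices are stored column by column, so the first column comes first.
flat <- as.vector(grid)
flat[1:3]          # "Aa" "Ba" "Ca"
```

Unlike expand.grid(), which returns a data frame with one row per pairing, outer() returns the combined values already arranged as a matrix.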