R - merge/combine columns with same name but incomplete data -
i have 2 data frames have columns same names , others different names. data frames this:
df1 id hello world hockey soccer 1 1 na na 7 4 2 2 na na 2 5 3 3 10 8 8 23 4 4 4 17 5 12 5 5 na na 3 43 df2 id hello world football baseball 1 1 2 3 43 6 2 2 5 1 24 32 3 3 na na 2 23 4 4 na na 5 15 5 5 9 7 12 23
as can see, in 2 of shared columns ("hello" , "world"), of data in 1 of data frames , rest in other.
what trying (1) merge 2 data frames "id", (2) combine data "hello" , "world" columns in both frames 1 "hello" column , 1 "world" column, , (3) have final data frame contain of other columns in 2 original frames ("hockey", "soccer", "football", "baseball"). so, want final result this:
id hello world hockey soccer football baseball 1 1 2 3 7 4 43 6 2 2 5 3 2 5 24 32 3 3 10 8 8 23 2 23 4 4 4 17 5 12 5 15 5 5 9 7 3 43 12 23
i'm pretty new @ r codes i've tried variations on merge
, i've tried answer found here, based on similar question: r: merging copies of same variable. however, data sets bigger i'm showing here (there's 20 matching columns (like "hello" , "world") , 100s of non-matching ones (like "hockey" , "football")) i'm looking won't require me write them out manually.
any idea if can done? i'm sorry can't provide sample of efforts, don't know start besides:
mydata <- merge(df1, df2, by=c("id"), = true)
to reproduce data frames:
df1 <- structure(list(id = c(1l, 2l, 3l, 4l, 5l), hellow = c(2, 5, na, na, 9), world = c(3, 1, na, na, 7), football = c(43, 24, 2, 5, 12), baseball = c(6, 32, 23, 15, 23)), .names = c("id", "hello", "world", "football", "baseball"), class = "data.frame", row.names = c(na, -5l)) df2 <- structure(list(id = c(1l, 2l, 3l, 4l, 5l), hellow = c(na, na, 10, 4, na), world = c(na, na, 8, 17, na), hockey = c(7, 2, 8, 5, 3), soccer = c(4, 5, 23, 12, 43)), .names = c("id", "hello", "world", "hockey", "soccer"), class = "data.frame", row.names = c(na, -5l))
here's approach involves melt
ing data, merging molten data, , using dcast
wide form. i've added comments understand going on.
## required packages library(data.table) library(reshape2) dcast.data.table( merge( ## melt first data.frame , set key id , variable setkey(melt(as.data.table(df1), id.vars = "id"), id, variable), ## melt second data.frame melt(as.data.table(df2), id.vars = "id"), ## you'll have 2 value columns... = true)[, value := ifelse( ## ... combine them 1 ifelse is.na(value.x), value.y, value.x)], ## reshaping formula id ~ variable, value.var = "value") # id hello world football baseball hockey soccer # 1: 1 2 3 43 6 7 4 # 2: 2 5 1 24 32 2 5 # 3: 3 10 8 2 23 8 23 # 4: 4 4 17 5 15 5 12 # 5: 5 9 7 12 23 3 43
Comments
Post a Comment