R - merge/combine columns with same name but incomplete data -


i have 2 data frames have columns same names , others different names. data frames this:

df1       id hello world hockey soccer     1  1    na    na      7      4     2  2    na    na      2      5     3  3    10     8      8     23     4  4     4    17      5     12     5  5    na    na      3     43  df2           id hello world football baseball     1  1     2     3       43        6     2  2     5     1       24       32     3  3    na    na        2       23     4  4    na    na        5       15     5  5     9     7       12       23 

as can see, in 2 of shared columns ("hello" , "world"), of data in 1 of data frames , rest in other.

what trying (1) merge 2 data frames "id", (2) combine data "hello" , "world" columns in both frames 1 "hello" column , 1 "world" column, , (3) have final data frame contain of other columns in 2 original frames ("hockey", "soccer", "football", "baseball"). so, want final result this:

  id hello world hockey soccer football baseball 1  1     2     3      7      4        43       6 2  2     5     3      2      5        24      32 3  3    10     8      8     23         2      23 4  4     4    17      5     12         5      15 5  5     9     7      3     43        12      23 

i'm pretty new @ r codes i've tried variations on merge , i've tried answer found here, based on similar question: r: merging copies of same variable. however, data sets bigger i'm showing here (there's 20 matching columns (like "hello" , "world") , 100s of non-matching ones (like "hockey" , "football")) i'm looking won't require me write them out manually.

any idea if can done? i'm sorry can't provide sample of efforts, don't know start besides:

mydata <- merge(df1, df2, by=c("id"), = true) 

to reproduce data frames:

df1 <- structure(list(id = c(1l, 2l, 3l, 4l, 5l), hellow = c(2, 5, na, na, 9), world = c(3, 1, na, na, 7), football = c(43, 24, 2, 5, 12), baseball = c(6, 32, 23, 15, 23)), .names = c("id", "hello", "world", "football", "baseball"), class = "data.frame", row.names = c(na, -5l))   df2 <- structure(list(id = c(1l, 2l, 3l, 4l, 5l), hellow = c(na, na, 10, 4, na), world = c(na, na, 8, 17, na), hockey = c(7, 2, 8, 5, 3), soccer = c(4, 5, 23, 12, 43)), .names = c("id", "hello", "world", "hockey", "soccer"), class = "data.frame", row.names = c(na, -5l)) 

here's approach involves melting data, merging molten data, , using dcast wide form. i've added comments understand going on.

## required packages library(data.table) library(reshape2)  dcast.data.table(   merge(     ## melt first data.frame , set key id , variable     setkey(melt(as.data.table(df1), id.vars = "id"), id, variable),      ## melt second data.frame     melt(as.data.table(df2), id.vars = "id"),      ## you'll have 2 value columns...     = true)[, value := ifelse(       ## ... combine them 1 ifelse       is.na(value.x), value.y, value.x)],    ## reshaping formula   id ~ variable, value.var = "value") #    id hello world football baseball hockey soccer # 1:  1     2     3       43        6      7      4 # 2:  2     5     1       24       32      2      5 # 3:  3    10     8        2       23      8     23 # 4:  4     4    17        5       15      5     12 # 5:  5     9     7       12       23      3     43 

Comments

Popular posts from this blog

javascript - Any ideas when Firefox is likely to implement lengthAdjust and textLength? -

matlab - "Contour not rendered for non-finite ZData" -

delphi - Indy UDP Read Contents of Adata -