python - Merging Pandas DataFrames on categorical series


I'm trying to understand whether pandas supports merging DataFrames on columns of categorical data (i.e. dtype="category").

I do a lot of data work in R and am trying to do more of it in Python/pandas. In R, merging on factors (analogous to the categorical dtype) induces type coercion, typically to character. This allows one data frame to have the by-variable (join column) specified as a factor (categorical) and the other to have the by-variable as a string. Does pandas perform a similar coercion of categorical data to string prior to merging/joining? Should I expect merging on categoricals to be robust? Where can I find documentation on (automatic) type coercion in pandas?

A simple example:

+++ Error when testing a categorical vector for equality against a non-categorical/non-scalar vector:

In [52]: import pandas as pd
         a = pd.Series(['a','b','c'],dtype="category")
         b = pd.Series(['a','b','c'],dtype="object")
         c = pd.Series(['a','b','cc'],dtype="object")

In [54]: a==b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: Cannot compare a Categorical for op <built-in function eq> with type <class 'numpy.ndarray'>. If you want to compare values, use 'series <op> np.asarray(cat)'.
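The error message itself points at a workaround: compare the underlying values instead of the Categorical. A minimal sketch of that idea, assuming the same three series as above (the names `eq_ab` and `eq_ac` are only for illustration), could look like this:

```python
import numpy as np
import pandas as pd

a = pd.Series(['a', 'b', 'c'], dtype="category")
b = pd.Series(['a', 'b', 'c'], dtype="object")
c = pd.Series(['a', 'b', 'cc'], dtype="object")

# Convert both sides to plain NumPy arrays so the comparison is done on
# the values, not on the Categorical wrapper.
eq_ab = np.asarray(a) == np.asarray(b)
eq_ac = np.asarray(a) == np.asarray(c)

print(eq_ab)
print(eq_ac)
```

The result is a plain boolean NumPy array rather than a Series, so the index is lost; wrap it in `pd.Series(..., index=a.index)` if alignment matters.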

+++ Merging DataFrames on columns of different types (one categorical, one string) does not throw an error (at least in this simple case). Some type of coercion must occur:

In [59]: A = pd.DataFrame({'a':a,'b':[1,2,3]})
         B = pd.DataFrame({'a':b,'c':[4,5,6]})
         print(A.merge(B,on='a'))
   a  b  c
0  a  1  4
1  b  2  5
2  c  3  6

So in short, in 0.15.1 the merging behavior was changed (fixed, really) to allow merging of categoricals that have the same categories. Further, if an object array is merged in, that is allowed, with the resulting character of the returned merge being object (IIRC). I don't recall whether we try to infer a categorical or not.
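If you would rather not depend on what a particular pandas version does when a categorical meets an object column, one option is to cast the join column yourself before merging. The following is a sketch under that assumption, not a statement of what pandas does internally; the frame names `left`, `right`, and `left_obj` are hypothetical:

```python
import pandas as pd

left = pd.DataFrame({
    'a': pd.Series(['a', 'b', 'c'], dtype="category"),
    'b': [1, 2, 3],
})
right = pd.DataFrame({
    'a': pd.Series(['a', 'b', 'c'], dtype="object"),
    'c': [4, 5, 6],
})

# Cast the categorical join column to object explicitly, so both sides
# of the merge have the same dtype and no implicit coercion is needed.
left_obj = left.assign(a=left['a'].astype(object))

merged = left_obj.merge(right, on='a')
print(merged)
print(merged.dtypes)
```

This mirrors what R does implicitly with factors: the join runs on the string values, and the join column can be converted back with `astype("category")` afterwards if needed.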

I created an issue here for discussion on this.

The equality behavior shown above, i.e. not allowing comparisons of categoricals vs. object dtypes, was done first, while the merging behavior was later expanded to allow merging of like-categoricals and object dtypes (assuming the merged categoricals share the same categories).

So I think the equality not working is just the API not having caught up. We will address this in 0.16.0; please provide comments on the issue.

PR here.

This will be in the upcoming 0.15.2 release (slated for the week of December 7, 2014).
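To see the "like-categoricals" case on your own installation, a small sketch such as the following can help. It assumes a pandas version that provides `pd.CategoricalDtype`, and the names `cat_type`, `left`, and `right` are illustrative only. It prints the resulting dtypes rather than asserting them, because the dtype of the merged column has varied between releases:

```python
import pandas as pd

# One shared dtype guarantees both join columns have identical categories.
cat_type = pd.CategoricalDtype(['a', 'b', 'c'])

left = pd.DataFrame({
    'a': pd.Series(['a', 'b', 'c']).astype(cat_type),
    'b': [1, 2, 3],
})
right = pd.DataFrame({
    'a': pd.Series(['c', 'b', 'a']).astype(cat_type),
    'c': [6, 5, 4],
})

merged = left.merge(right, on='a')
print(merged)
# Inspect what dtype the join column ended up with in this version.
print(merged.dtypes)
```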

