python - Merging pandas DataFrames on categorical Series
I'm trying to understand whether pandas supports merging DataFrames on columns of categorical data (i.e. dtype="category").
I do a lot of data work in R and am trying to do more of it in Python/pandas. In R, merging on factors (analogous to the categorical dtype) induces type coercion, typically to character. This allows one data frame to have its by-variable (join column) specified as a factor (categorical) and the other to have its by-variable as a string. Does pandas perform a similar coercion of categorical data to string prior to merging/joining? Should I expect merging on categoricals to be robust? Where can I find documentation on (automatic) type coercion in pandas?
A simple example:
+++ Error when testing a categorical vector for equality against a non-categorical, non-scalar vector:
In [52]: import pandas as pd
    ...: a = pd.Series(['a','b','c'], dtype="category")
    ...: b = pd.Series(['a','b','c'], dtype="object")
    ...: c = pd.Series(['a','b','cc'], dtype="object")

In [54]: a == b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: Cannot compare a Categorical for op <built-in function eq> with type <class 'numpy.ndarray'>. If you want to compare values, use 'series <op> np.asarray(cat)'.
+++ Merging DataFrames on columns of different types, one categorical and one string, does not throw an error (at least in this simple case). Some type of coercion must occur:
In [59]: a = pd.DataFrame({'a': a, 'b': [1,2,3]})
    ...: b = pd.DataFrame({'a': b, 'c': [4,5,6]})
    ...: print(a.merge(b, on='a'))
   a  b  c
0  a  1  4
1  b  2  5
2  c  3  6
So in short, in 0.15.1 the merging behavior was changed (fixed, really) to allow merging of categoricals that have the same categories. Further, merging in an object array is allowed, with the resulting merged column being of object dtype (IIRC). I don't recall whether we try to infer a categorical or not.
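A minimal sketch for checking the cases described above on whatever pandas version is installed. The frame and column names (`left`, `right_cat`, `right_obj`, `key`) are made up for illustration, and the resulting dtypes are printed rather than assumed, since they may differ between versions:

```python
import pandas as pd

# Hypothetical example frames; all names here are illustrative only.
left = pd.DataFrame({"key": pd.Series(["a", "b", "c"], dtype="category"),
                     "x": [1, 2, 3]})
right_cat = pd.DataFrame({"key": pd.Series(["a", "b", "c"], dtype="category"),
                          "y": [4, 5, 6]})
right_obj = pd.DataFrame({"key": pd.Series(["a", "b", "c"], dtype="object"),
                          "y": [4, 5, 6]})

# Case 1: both join columns are categorical with the same categories.
both_cat = left.merge(right_cat, on="key")

# Case 2: categorical on one side, plain object (string) on the other.
mixed = left.merge(right_obj, on="key")

# Case 3: coerce explicitly so the join does not rely on implicit rules.
explicit = left.assign(key=left["key"].astype(object)).merge(right_obj, on="key")

# Inspect the dtype and values of the join column in each result.
for name, frame in [("both_cat", both_cat), ("mixed", mixed), ("explicit", explicit)]:
    print(name, frame["key"].dtype, frame["key"].tolist())
```

Casting the key explicitly, as in case 3, is the closest analogue to R's factor-to-character coercion and avoids depending on version-specific behavior.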
I created an issue here for discussion on this.
The equality behavior shown above, e.g. not allowing comparisons of categoricals against object dtypes, was done first, while the merging behavior was later expanded to allow merging of like categoricals and object dtypes (assuming the merged categoricals share the same categories).
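Until the comparison behavior is relaxed, one way to compare values is to convert the categorical to plain values first, as the error message suggests. This is a sketch using the series from the question; newer pandas versions may accept `a == b` directly, so the conversion is a defensive choice rather than a requirement:

```python
import numpy as np
import pandas as pd

a = pd.Series(["a", "b", "c"], dtype="category")
b = pd.Series(["a", "b", "c"], dtype="object")
c = pd.Series(["a", "b", "cc"], dtype="object")

# Option 1: compare the underlying values as NumPy arrays (elementwise).
eq_ab = np.asarray(a) == np.asarray(b)

# Option 2: cast the categorical to object dtype, then compare as Series.
eq_ac = a.astype(object) == c

print(eq_ab.tolist())
print(eq_ac.tolist())
```

Option 1 returns a NumPy boolean array, while option 2 keeps the result as a Series with the original index.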
So I think allowing the equality to work is a case of the API not catching up. We will address it in 0.16.0; please provide comments on the issue.
The PR is here.
This will be in the upcoming 0.15.2 release (slated for the week of December 7, 2014).