python - Fast method for removing duplicate columns in pandas.DataFrame
So by using
df_ab = pd.concat([df_a, df_b], axis=1, join='inner')
I get a DataFrame that looks like this:
   A  A   B   B
0  5  5  10  10
1  6  6  19  19
and I want to remove the duplicated columns:
   A   B
0  5  10
1  6  19
Because df_a and df_b are subsets of the same DataFrame, I know that all rows have the same values if the column name is the same. I have a working solution:
df_ab = df_ab.T.drop_duplicates().T
but I have a large number of rows, so this one is slow. Does anyone have a faster solution? I would prefer a solution where explicit knowledge of the column names isn't needed.
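For reference, the setup described above can be reproduced with a small sketch. The parent frame and the way df_a and df_b are sliced from it are assumptions made for illustration, so the column order (A, B, A, B) differs from the display in the question.

```python
import pandas as pd

# Hypothetical parent frame; df_a and df_b are column subsets of it.
df = pd.DataFrame({"A": [5, 6], "B": [10, 19]})
df_a = df[["A", "B"]]
df_b = df[["A", "B"]]

# Side-by-side concatenation keeps both copies of every shared column.
df_ab = pd.concat([df_a, df_b], axis=1, join="inner")

# The transpose-based solution from the question: correct, but it has to
# compare every value, which becomes slow with many rows.
slow = df_ab.T.drop_duplicates().T
print(slow)
```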
You may use np.unique to get the indices of the unique columns, and then use .iloc:
>>> df
   A  A   B   B
0  5  5  10  10
1  6  6  19  19
>>> _, i = np.unique(df.columns, return_index=True)
>>> df.iloc[:, i]
   A   B
0  5  10
1  6  19
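A self-contained version of the approach above might look like the sketch below. The sample frame is built by hand to match the example. The second variant, based on Index.duplicated, is not part of the answer; it is shown as an alternative because np.unique returns the labels in sorted order, while a boolean mask keeps the original column order. Both variants look only at column names, which relies on the question's assumption that equal names imply equal values.

```python
import numpy as np
import pandas as pd

# Illustrative frame with duplicated column labels, as in the example.
df = pd.DataFrame([[5, 5, 10, 10], [6, 6, 19, 19]],
                  columns=["A", "A", "B", "B"])

# np.unique returns the sorted unique labels plus the position of each
# label's first occurrence; those positions select one column per name.
_, i = np.unique(df.columns, return_index=True)
deduped = df.iloc[:, i]

# Alternative (not from the answer): mask out repeated labels,
# keeping the first occurrence and the original column order.
keep = ~df.columns.duplicated()
deduped_in_order = df.loc[:, keep]

print(deduped)
print(deduped_in_order)
```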