python - Fast method for removing duplicate columns in pandas.DataFrame

So by using

df_ab = pd.concat([df_a, df_b], axis=1, join='inner')

I get a dataframe that looks like this:

    a   a   b   b
0   5   5  10  10
1   6   6  19  19

and I want to remove the duplicate columns:

    a   b
0   5  10
1   6  19

Because df_a and df_b are subsets of the same dataframe, I know that all rows have the same values if the column name is the same. I have a working solution:

df_ab = df_ab.T.drop_duplicates().T

But I have a large number of rows, so this one is slow. Does anyone have a faster solution? I would prefer a solution where explicit knowledge of the column names isn't needed.

You may use np.unique to get the indices of the unique columns, and then use .iloc:

>>> df
   a  a   b   b
0  5  5  10  10
1  6  6  19  19
>>> _, i = np.unique(df.columns, return_index=True)
>>> df.iloc[:, i]
   a   b
0  5  10
1  6  19
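For anyone who wants to try this outside the interpreter, here is a minimal self-contained sketch of the same idea. The sample frame and its column labels (`a`, `b`) are made up to mirror the question, and the names `deduped` and `deduped_mask` are only illustrative. The second variant, based on `columns.duplicated()`, is a different technique from the np.unique one and is shown only for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with duplicated column labels, mirroring the question.
df = pd.DataFrame([[5, 5, 10, 10], [6, 6, 19, 19]], columns=["a", "a", "b", "b"])

# np.unique returns the sorted unique labels plus the position of the first
# occurrence of each label; the positions are all we need here.
_, i = np.unique(df.columns, return_index=True)

# Select one column per unique label by position.
deduped = df.iloc[:, i]

# Alternative (not the np.unique approach): a boolean mask that keeps the first
# occurrence of each label and leaves the original column order untouched.
deduped_mask = df.loc[:, ~df.columns.duplicated()]

print(deduped)
print(deduped_mask)
```

Note that np.unique sorts the labels, so the resulting column order follows the sorted names rather than the original order; passing `np.sort(i)` to `.iloc` keeps the original order. Both variants compare column names only, which matches the question's assumption that columns with the same name hold the same values.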
