python - numpy/pandas effective multiplication of arrays/dataframes -


i have 2 pandas data frames, this:

import pandas pd  df_one = pd.dataframe( {     'a': [1,1,2,3,4,4,4],     'b1': [0.5,0.0,0.2,0.1,0.3,0.2,0.1],     'b2': [0.2,0.3,0.1,0.5,0.3,0.1,0.2],     'b3': [0.1,0.2,0.0,0.9,0.0,0.3,0.5]} ); df_two = pd.dataframe( {     'a': [1,2,3,4],     'c1': [1.0,9.0,2.1,9.0],     'c2': [2.0,3.0,0.7,1.1],     'c3': [5.0,4.0,2.3,3.4]} );  df_one      b1   b2   b3 0  1  0.5  0.2  0.1 1  1  0.0  0.3  0.2 2  2  0.2  0.1  0.0 3  3  0.1  0.5  0.9 4  4  0.3  0.3  0.0 5  4  0.2  0.1  0.3 6  4  0.1  0.2  0.5  df_two      c1   c2   c3 0  1  1.0  2.0  5.0 1  2  9.0  3.0  4.0 2  3  2.1  0.7  2.3 3  4  9.0  1.1  3.4 

what compute scalar product multiplying rows of first data frame rows of second data frame, i.e., \sum_i b_i * c_i, in such way row in first data frame multiplied row in second data frame if values of a column match in both frames. know how looping , using if's in more efficient numpy-like or pandas-like way. appreciated :)

not sure if want unique values column (if do, use groupby on result below)

pd.merge(df_one, df_two, on='a')      b1   b2   b3   c1   c2   c3 0  1  0.5  0.2  0.1  1.0  2.0  5.0 1  1  0.0  0.3  0.2  1.0  2.0  5.0 2  2  0.2  0.1  0.0  9.0  3.0  4.0 3  3  0.1  0.5  0.9  2.1  0.7  2.3 4  4  0.3  0.3  0.0  9.0  1.1  3.4 5  4  0.2  0.1  0.3  9.0  1.1  3.4 6  4  0.1  0.2  0.5  9.0  1.1  3.4      pd.merge(df_one, df_two, on='a').apply(lambda s: sum([s['b%d'%i] * s['c%d'%i] in range(1, 4)]) , axis=1) 0    1.40 1    1.60 2    2.10 3    2.63 4    3.03 5    2.93 6    2.82 

Comments