i have 2 pandas data frames, this:
import pandas pd df_one = pd.dataframe( { 'a': [1,1,2,3,4,4,4], 'b1': [0.5,0.0,0.2,0.1,0.3,0.2,0.1], 'b2': [0.2,0.3,0.1,0.5,0.3,0.1,0.2], 'b3': [0.1,0.2,0.0,0.9,0.0,0.3,0.5]} ); df_two = pd.dataframe( { 'a': [1,2,3,4], 'c1': [1.0,9.0,2.1,9.0], 'c2': [2.0,3.0,0.7,1.1], 'c3': [5.0,4.0,2.3,3.4]} ); df_one b1 b2 b3 0 1 0.5 0.2 0.1 1 1 0.0 0.3 0.2 2 2 0.2 0.1 0.0 3 3 0.1 0.5 0.9 4 4 0.3 0.3 0.0 5 4 0.2 0.1 0.3 6 4 0.1 0.2 0.5 df_two c1 c2 c3 0 1 1.0 2.0 5.0 1 2 9.0 3.0 4.0 2 3 2.1 0.7 2.3 3 4 9.0 1.1 3.4
what compute scalar product multiplying rows of first data frame rows of second data frame, i.e., \sum_i b_i * c_i
, in such way row in first data frame multiplied row in second data frame if values of a
column match in both frames. know how looping , using if's in more efficient numpy-like or pandas-like way. appreciated :)
not sure if want unique values column (if do, use groupby on result below)
pd.merge(df_one, df_two, on='a') b1 b2 b3 c1 c2 c3 0 1 0.5 0.2 0.1 1.0 2.0 5.0 1 1 0.0 0.3 0.2 1.0 2.0 5.0 2 2 0.2 0.1 0.0 9.0 3.0 4.0 3 3 0.1 0.5 0.9 2.1 0.7 2.3 4 4 0.3 0.3 0.0 9.0 1.1 3.4 5 4 0.2 0.1 0.3 9.0 1.1 3.4 6 4 0.1 0.2 0.5 9.0 1.1 3.4 pd.merge(df_one, df_two, on='a').apply(lambda s: sum([s['b%d'%i] * s['c%d'%i] in range(1, 4)]) , axis=1) 0 1.40 1 1.60 2 2.10 3 2.63 4 3.03 5 2.93 6 2.82
Comments
Post a Comment