MATLAB code optimization - multiply all 3x3 matrices in a large cell array with each other


Summary

I am looking to increase the time efficiency of code that repeatedly performs matrix multiplication between a 3x3 matrix and the inverse of another 3x3 matrix, currently done with mldivide.

Background

I am trying to implement a vector quantization method before using orientation data for hand-eye calibration between sensors attached to a subject's lower limb in gait analysis. The algorithm I'm following is from the paper "Data selection for hand-eye calibration: a vector quantization approach". But that's just background; you don't really need it.

Code to optimize

I am hoping to find a faster method of solving for all possible "relative movements" (a or b), which takes a long time (c and d are around 2000 elements long, therefore the size of a or b = 2000*(2000-1)/2 = 1999000):

    % c, d are cell arrays; each cell is a 3x3 rotation matrix
    nframe = length(c);
    nrel   = nframe*(nframe-1)/2;
    a  = cell(nrel,1);
    b  = cell(nrel,1);
    at = zeros(nrel,4);
    bt = zeros(nrel,4);
    count = 1;
    for i = 1:nframe
        for j = 1:nframe
            if j <= i
                continue;
            else
                % main place to optimize (avoid looping?)!!
                % in DCM representation (i.e. each cell is a 3x3 matrix)
                a{count,1} = c{j}\c{i}; % = inv(c{j})*c{i}
                b{count,1} = d{j}\d{i};

                % using the following adds ~4000% time (according to tic/toc)
                % represent as axis-angle (i.e. each cell -> 1x4 vector) - perhaps compute later
                at(count,:) = SpinConv('DCMtoEV', a{count}); % from the MATLAB File Exchange
                bt(count,:) = SpinConv('DCMtoEV', b{count});
                count = count + 1;
            end
        end
    end

I hope this is the right place to ask, and I was unable to find a previous solution that applies. Also, as I have no real experience here, I'm unsure whether this computational time is simply unavoidable when working with this many matrices.

--------------------------------------------------------------------------------------------

*Edit*

Matrix properties: the matrices are rotational and, as commented below, 'nice': not singular. They belong to the special orthogonal group SO(3) [transpose = inverse]. See http://en.wikipedia.org/wiki/Rotation_matrix#Properties_of_a_rotation_matrix
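This transpose-equals-inverse property is easy to verify numerically. A minimal NumPy sketch (Python rather than MATLAB, purely for illustration):

```python
import numpy as np

# Build a random rotation matrix via QR decomposition of a random matrix
rng = np.random.default_rng(0)
q, r = np.linalg.qr(rng.standard_normal((3, 3)))
q = q @ np.diag(np.sign(np.diag(r)))  # normalize column signs; q stays orthogonal
if np.linalg.det(q) < 0:              # ensure det = +1 (a rotation, not a reflection)
    q[:, 0] = -q[:, 0]

# For a rotation matrix, the transpose equals the inverse
print(np.allclose(q.T @ q, np.eye(3)))     # True
print(np.allclose(q.T, np.linalg.inv(q)))  # True
```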

Method of testing: I create random rotation matrices, r, using the following code:

    [u,s,v] = svd(randn(3,3));
    r = u*v';
    if det(r) < 0
        s(1,1) = 1;
        s(2,2) = 1;
        s(3,3) = -1;
        r = u*s*v';
    end

SpinConv: I am using it to convert a 3x3 directional cosine matrix to an axis-angle representation. It is more involved than necessary, and for stability it converts more than needed (to quaternions first). Here's the link: http://www.mathworks.com/matlabcentral/fileexchange/41562-spinconv/content/spinconv.m Here's what actually needs to be done (not with SpinConv; I implemented this method quickly myself):

    t = (trace(r)-1)/2; % clamp in case of numerical errors
    if t < -1
        t = -1;
    elseif t > 1
        t = 1;
    end
    theta = acosd(t);
    if isequalf(180,theta) || isequalf(0,theta)
        axis = null(r - eye(3));
        axis = axis(:,1)/norm(axis(:,1));
    else
        axis = [r(3,2) - r(2,3); r(1,3) - r(3,1); r(2,1) - r(1,2)];
        axis = axis/(2*sind(theta));
    end
    at(count,:) = [-axis.', theta]; % note the (-ve) sign, as noted in the comments of the accepted answer
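The non-degenerate branch of this extraction can be cross-checked with a round trip through Rodrigues' formula. A NumPy sketch (function names are mine, for illustration only; angles in radians rather than the degrees used by acosd/sind above):

```python
import numpy as np

def axis_angle_from_dcm(r):
    """Extract (axis, theta) from a 3x3 rotation matrix, non-degenerate case."""
    t = np.clip((np.trace(r) - 1.0) / 2.0, -1.0, 1.0)  # guard against numerical errors
    theta = np.arccos(t)
    axis = np.array([r[2, 1] - r[1, 2], r[0, 2] - r[2, 0], r[1, 0] - r[0, 1]])
    return axis / (2.0 * np.sin(theta)), theta

def dcm_from_axis_angle(axis, theta):
    """Rodrigues' formula: rebuild the rotation matrix from (axis, theta)."""
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * k + (1 - np.cos(theta)) * (k @ k)

# Round trip on a rotation of 30 degrees about the z-axis
r0 = dcm_from_axis_angle(np.array([0.0, 0.0, 1.0]), np.pi / 6)
axis, theta = axis_angle_from_dcm(r0)
print(np.allclose(axis, [0, 0, 1]), np.isclose(theta, np.pi / 6))  # True True
```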

*Edit #2* I realized that, alternatively, I can use quaternions to avoid using 3x3 matrices altogether:

So a quaternion is a 1x4 vector. The original code can then be changed to (inside the else statement):

    q = qnorm(qmult(qconj(c(j,:)), c(i,:)));
    a(count,:) = q;
    vec   = [q(1) q(2) q(3)]/norm([q(1) q(2) q(3)]);
    theta = 2*atan2(norm([q(1) q(2) q(3)]), q(4));
    at(count,:) = [vec, theta];

where qconj, qmult, and qnorm are quaternion operations.
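As a sanity check of this quaternion route, here is a NumPy sketch of the composition (my own minimal stand-ins for qconj/qmult, assuming the [x y z w] layout implied by q(4) being the scalar part):

```python
import numpy as np

def qconj(q):
    """Conjugate of a quaternion stored as [x, y, z, w]."""
    return np.array([-q[0], -q[1], -q[2], q[3]])

def qmult(p, q):
    """Hamilton product of quaternions stored as [x, y, z, w]."""
    pv, pw = p[:3], p[3]
    qv, qw = q[:3], q[3]
    v = pw * qv + qw * pv + np.cross(pv, qv)
    return np.array([v[0], v[1], v[2], pw * qw - pv @ qv])

def qz(t):
    """Unit quaternion for a rotation of angle t about the z-axis."""
    return np.array([0.0, 0.0, np.sin(t / 2), np.cos(t / 2)])

# Relative rotation between rotations of 30 and 45 degrees about z: 15 degrees about z
q = qmult(qconj(qz(np.pi / 6)), qz(np.pi / 4))
theta = 2 * np.arctan2(np.linalg.norm(q[:3]), q[3])
print(np.isclose(theta, np.pi / 12))  # True
```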


Alright, sorry, that's all the info and possibilities I have.

As commented above, the fastest method depends on the properties of your matrices. For example, some algorithms benefit greatly from a matrix being symmetric, but are rather slow when it is not.

So without further information, I can only make general statements, and compare methods on random matrices (which will not give a fair comparison in the context of your matrices and their inverses).

Depending on your MATLAB version (the JIT in R2011a or so improved this), pre-allocating a and b gives a large gain in loop performance; growing arrays dynamically inside a loop is generally inefficient.

The same goes for the call to SpinConv: as it is an external function (MEX or M-file, it doesn't matter), the JIT cannot compile the loop, so you're limited by the speed of the interpreter. Which is rather low. If at all possible, you can avoid that by copy-pasting the relevant parts of SpinConv into the loop body. I know, it's hugely annoying (and I sure hope this gets automated in future versions of MATLAB), but it is currently the only way to make the JIT understand the loop structure and compile it (really, factors of 100 or more are not uncommon).

So, having said that, I tested two different methods:

  1. Compute and store the LU decompositions of c and d, and re-use them inside the loop
  2. Solve the systems c{n} \ [c{1} c{2} ... c{n-1}] for n = 2:nframe, and re-order afterwards

In code:

    clc
    clear

    nframe = 500;

    %// Random data
    c = cellfun(@(~)rand(3), cell(nframe,1), 'uniformoutput', false);
    d = cellfun(@(~)rand(3), cell(nframe,1), 'uniformoutput', false);

    %// Original method
    tic
    count = 1;
    for i = 1:nframe
        for j = 1:nframe
            if j <= i
                continue;
            else
                %// Main place to optimize (avoid looping?)!!
                %// In DCM representation (i.e. each cell is a 3x3 matrix)
                a{count,1} = c{j}\c{i}; %// = inv(c{j})*c{i}
                b{count,1} = d{j}\d{i};
                count = count + 1;
            end
        end
    end
    toc
    a1 = a;

    %// First method: compute LU decompositions and re-use them in the loop
    %// ------------------------------------------------------------------------
    tic

    %// Compute the LU decompositions of c and d
    clu = cell(nframe, 2);
    dlu = cell(nframe, 2);
    for ii = 1:nframe
        [clu{ii,1:2}] = lu(c{ii});
        [dlu{ii,1:2}] = lu(d{ii});
    end

    %// Improvement: pre-allocate a and b
    a = cell(nframe*(nframe-1)/2, 1);
    b = cell(nframe*(nframe-1)/2, 1);

    %// Improvement: don't use i and j as variable names
    count = 1;
    for ii = 1:nframe
        %// Improvement: instead of continue when j<=i, use a different range
        for jj = ii+1 : nframe
            %// mldivide on the LU factors amounts to backward substitution,
            %// which is trivial and fast
            a{count} = clu{jj,2}\(clu{jj,1}\c{ii});
            b{count} = dlu{jj,2}\(dlu{jj,1}\d{ii});
            count = count + 1;
        end
    end
    toc
    a2 = a;

    %// Second method: solve the systems simultaneously by concatenation
    %// ------------------------------------------------------------------------
    tic

    %// Pre-allocate temporary matrices
    aa = cell(nframe-1, 1);
    bb = cell(nframe-1, 1);

    for ii = 2:nframe
        %// Solve c{ii} \ [c1 c2 c3 ... c(ii-1)]
        aa{ii-1} = c{ii}\[c{1:ii-1}];
        bb{ii-1} = d{ii}\[d{1:ii-1}];
    end
    toc

    %// Compared to the order in which the data is stored by the other
    %// methods, the order of the data in aa and bb is different. So, we
    %// have to re-order to get the proper order back:
    tic
    a = cell(nframe*(nframe-1)/2, 1);
    b = cell(nframe*(nframe-1)/2, 1);
    for ii = 1:nframe-1
        a( (1:nframe-ii) + (nframe-1-(ii-2)/2)*(ii-1) ) = ...
            cellfun(@(x) x(:, (1:3) + 3*(ii-1)), aa(ii:end), 'uniformoutput', false);
        b( (1:nframe-ii) + (nframe-1-(ii-2)/2)*(ii-1) ) = ...
            cellfun(@(x) x(:, (1:3) + 3*(ii-1)), bb(ii:end), 'uniformoutput', false);
    end
    toc
    a3 = a;

    %// Check validity of the outputs
    allEqual = all( cellfun(@(x,y,z)isequal(x,y)&&isequal(x,z), a1,a2,a3) )

The results:

    Elapsed time is 44.867630 seconds.  %// original method
    Elapsed time is 1.267333 seconds.   %// LU decomposition
    Elapsed time is 0.183950 seconds.   %// solving en-masse by concatenation
    Elapsed time is 1.871149 seconds.   %// re-ordering the output of that method

    allEqual =
         1

Note that I'm on R2010a, so the slowness of the original method is mostly due to a and b not being pre-allocated. Note also that performance on newer MATLAB versions is much better in this regard, but still: performance will be better if you pre-allocate.

Intuitively (and as others may suggest), you could compute the explicit inverses,

cinv = cellfun(@inv, c, 'uniformoutput', false); 

or

    cinv = cellfun(@(x) [...
        x(5)*x(9)-x(8)*x(6)  x(7)*x(6)-x(4)*x(9)  x(4)*x(8)-x(7)*x(5)
        x(8)*x(3)-x(2)*x(9)  x(1)*x(9)-x(7)*x(3)  x(7)*x(2)-x(1)*x(8)
        x(2)*x(6)-x(5)*x(3)  x(4)*x(3)-x(1)*x(6)  x(1)*x(5)-x(4)*x(2)] / ...
            (x(1)*x(5)*x(9) + x(4)*x(8)*x(3) + x(7)*x(2)*x(6) - ...
             x(7)*x(5)*x(3) - x(4)*x(2)*x(9) - x(1)*x(8)*x(6)), ...
        c, 'uniformoutput', false);

(which is faster and more accurate than inv), and simply multiply inside the loop. As you will see, this is slower than both the en-masse solve cn\[c1 c2 ... cn-1] and the LU approach (although how much slower depends on the nature of the matrices). It also fails to produce allEqual == true; the differences are mostly small, but (especially for near-singular matrices and other special cases) the differences can be huge.
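For reference, the cofactor/adjugate formula in the one-liner above computes the following; a NumPy sketch with zero-based row/column indexing instead of MATLAB's column-major linear indexing:

```python
import numpy as np

def inv3(m):
    """Explicit 3x3 inverse via the adjugate (transposed cofactor) matrix."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    adj = np.array([[e*i - f*h, c*h - b*i, b*f - c*e],
                    [f*g - d*i, a*i - c*g, c*d - a*f],
                    [d*h - e*g, b*g - a*h, a*e - b*d]])
    return adj / det

rng = np.random.default_rng(1)
m = rng.standard_normal((3, 3))
print(np.allclose(inv3(m), np.linalg.inv(m)))  # True
```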

As mentioned in a lot of other questions here on SO, and as a refined Google search or any advanced linear algebra book will tell you, using explicit inverses in numerical applications is slow, inaccurate, and sometimes even dangerous. The inverse is a nice theoretical construct, but pretty useless in any practical application of that theory. Therefore, it is better to use one of the other methods mentioned above.

In conclusion:

  • If you can live with the data being out of order (which necessitates more complex indexing later on), solving the systems en-masse by concatenation is the fastest by far. Granted, the way I re-order the data afterwards can be improved upon, but I suspect LU will still be faster if you need to re-order.

  • If that is not the case, but your matrices are well suited to LU decomposition, use that. To find out whether they are, use it on your real data and profile. You can also try the additional outputs of lu (most notably, the permutation matrix p, or, for sparse matrices, the column-reordering matrix q).

  • Of course, if a QR decomposition is more appropriate for your data, use qr. The same goes for chol, pcg, etc. Experiment with different methods a bit.

EDIT:

As you mention, the matrices are SO(3) rotation matrices. Wow, is that ever important information!! In that case, the inverse is simply the transpose, which is one or two orders of magnitude faster than any variant of the inverse. Also, you indicate that you want to convert the rotation matrices to an axis-angle representation. The kernel should then be changed to something along the lines of:

    a = c{ii}.'*c{jj};
    b = d{ii}.'*d{jj};

    [v,lam] = eig(a);
    axis  = v(:, imag(diag(lam))==0);
    theta = acos(min(max(-1, (trace(a)-1)/2), 1));
    at{count,:} = {axis theta*180/pi};

    [v,lam] = eig(b);
    axis  = v(:, imag(diag(lam))==0);
    theta = acos(min(max(-1, (trace(b)-1)/2), 1));
    bt{count,:} = {axis theta*180/pi};

This uses only built-in functions, so it should be pretty efficient. At least it's better than copy-pasting SpinConv into the loop, since SpinConv uses a lot of non-built-in functions (null, isequalf, acosd, sind). Note that the method above uses the eigenvalue approach; you can make it more efficient still if you use the determinant method like the one used in SpinConv, provided you update it to contain no calls to non-built-ins.
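The same transpose-plus-eigenvector kernel translates directly to other environments; a NumPy sketch (illustrative only, picking the eigenvector whose eigenvalue is closest to 1 as the rotation axis):

```python
import numpy as np

def relative_axis_angle(c_i, c_j):
    """Relative rotation a = c_i' * c_j; axis from the eigenvector for
    eigenvalue 1, angle from the trace (clamped against rounding errors)."""
    a = c_i.T @ c_j
    lam, v = np.linalg.eig(a)
    axis = np.real(v[:, np.argmin(np.abs(lam - 1.0))])
    theta = np.arccos(np.clip((np.trace(a) - 1.0) / 2.0, -1.0, 1.0))
    return axis, theta

def rot_z(t):
    """Rotation of angle t about the z-axis."""
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])

# Two rotations about z (30 and 45 degrees): relative rotation is 15 degrees about z
axis, theta = relative_axis_angle(rot_z(np.pi / 6), rot_z(np.pi / 4))
print(np.isclose(theta, np.pi / 12))           # True
print(np.allclose(np.abs(axis), [0, 0, 1]))    # True (the axis sign is not unique)
```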

Beware: it appears that this version of SpinConv has an incorrect sign for the axis; the sign of the axis computed in the elseif branch is opposite to that of the axis computed in the if branch.

