x86-64 inline assembly in C (compiled with GCC): multi-precision multiplication routine causes a segmentation fault
I'm trying to implement multi-precision multiplication of GMP mpz_t objects in inline x86 assembly. Depending on my choice of constraints on the output variable, I get either a segmentation fault, or the values in the output variable are corrupted in an inconsistent way (i.e. different runs of the code corrupt the values differently).
What the code should do is take two GMP mpz_t objects, ain and bin, each guaranteed to have size 13 (i.e. _mp_size is set to 13, so each object is defined by 13 64-bit numbers), and produce an mpz_t object of size 26, res, which is the result of multiplying ain and bin together. The reason I do not use mpz_mul is that this method results in a performance increase in my particular setting.
Note that res->_mp_d, ain->_mp_d and bin->_mp_d refer to the arrays of "limbs" that define their respective mpz_t objects, with (obj->_mp_d)[0] being the least significant limb and (obj->_mp_d)[obj->_mp_size-1] being the most significant limb.
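As a cross-check for the limb layout just described, the same product can be sketched in portable C. This is only a reference sketch under stated assumptions: it needs a compiler with unsigned __int128 (GCC or Clang on a 64-bit target), it works on plain uint64_t arrays rather than mpz_t, and mul_limbs_ref is a hypothetical name, not a GMP function. It accumulates each result column in three words, much like the r8/r9/r10 rotation in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not part of GMP).
   Computes res[0..2n-1] = a[0..n-1] * b[0..n-1], limbs stored least
   significant first. res must not overlap a or b. */
static void mul_limbs_ref(uint64_t *res, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    uint64_t c0 = 0, c1 = 0, c2 = 0;   /* three-word column accumulator */

    assert(n > 0);
    for (size_t k = 0; k + 1 < 2 * n; k++) {
        /* Column k collects every product a[i] * b[k - i]. */
        size_t lo = (k < n) ? 0 : k - n + 1;
        size_t hi = (k < n) ? k : n - 1;
        for (size_t i = lo; i <= hi; i++) {
            unsigned __int128 p = (unsigned __int128)a[i] * b[k - i];
            unsigned __int128 s = (unsigned __int128)c0 + (uint64_t)p;
            c0 = (uint64_t)s;
            s = (unsigned __int128)c1 + (uint64_t)(p >> 64) + (uint64_t)(s >> 64);
            c1 = (uint64_t)s;
            c2 += (uint64_t)(s >> 64);
        }
        res[k] = c0;                   /* lowest word of the column is final */
        c0 = c1;                       /* shift the accumulator down one limb */
        c1 = c2;
        c2 = 0;
    }
    res[2 * n - 1] = c0;               /* most significant limb */
}
```

Calling it with n = 13 on copies of ain->_mp_d and bin->_mp_d gives 26 limbs that can be compared against whatever the assembly routine writes.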
If anyone can explain what I am doing wrong here, I would greatly appreciate it! Below is a code segment. I have excluded most of the assembly because it is repetitive, but I think what remains gives enough of an indication of what is going on:
void mpz_mul_x86_1(mpz_t res, mpz_t ain, mpz_t bin){
    if( res->_mp_alloc<26)      //the next few lines makes sure res large enough
        _mpz_realloc(res,26);   //the result of multiplication
    res->_mp_size = 26;

    asm volatile (
        "movq 0(%1), %%rax;"
        "mulq 0(%2);"
        "movq %%rax, 0(%0);"
        "movq %%rdx, %%r8;"     //a0*b0
        //0

        "xorq %%r10, %%r10;"
        "movq 8(%1), %%rax;"
        "mulq 0(%2);"
        "addq %%rax, %%r8;"
        "movq %%rdx, %%r9;"
        "adcq $0, %%r9;"        //a1*b0
        "movq 0(%1), %%rax;"
        "mulq 8(%2);"
        "addq %%rax, %%r8;"
        "movq %%r8, 8(%0);"
        "adcq %%rdx,%%r9;"
        "adcq $0, %%r10;"       //a0*b1
        //1

        "xorq %%r8, %%r8;"
        "movq 0(%1), %%rax;"
        "mulq 16(%2);"
        "addq %%rax, %%r9;"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"        //a0*b2
        "movq 8(%1), %%rax;"
        "mulq 8(%2);"
        "addq %%rax, %%r9;"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"        //a1*b1
        "movq 16(%1), %%rax;"
        "mulq 0(%2);"
        "addq %%rax, %%r9;"
        "movq %%r9, 16(%0);"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"        //a2*b0
        //2

        "xorq %%r9, %%r9;"
        "movq 24(%1), %%rax;"
        "mulq 0(%2);"
        "addq %%rax, %%r10;"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r9;"        //a3*b0
        "movq 0(%1), %%rax;"
        "mulq 24(%2);"
        "addq %%rax, %%r10;"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r9;"        //a0*b3
        "movq 16(%1), %%rax;"
        "mulq 8(%2);"
        "addq %%rax, %%r10;"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r9;"        //a2*b1
        "movq 8(%1), %%rax;"
        "mulq 16(%2);"
        "addq %%rax, %%r10;"
        "movq %%r10, 24(%0);"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r9;"        //a1*b2
        //3

        /*about 1000 lines of omitted assembly code here*/

        "xor %%r8, %%r8;"
        "movq 96(%1), %%rax;"
        "mulq 88(%2);"
        "addq %%rax, %%r9;"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"        //a12*b11
        "movq 88(%1), %%rax;"
        "mulq 96(%2);"
        "addq %%rax, %%r9;"
        "movq %%r9, 184(%0);"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"        //a11*b12
        //23

        "xor %%r9, %%r9;"
        "movq 96(%1), %%rax;"
        "mulq 96(%2);"
        "addq %%rax, %%r10;"
        "movq %%r10, 192(%0);"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r8;"        //a12*b12
        //24

        "movq %%r8, 200(%0);"
        //25

        : "=&r" (res->_mp_d)
        : "r" ((ain->_mp_d)), "r" ((bin->_mp_d))
        : "%rax", "%rdx", "%r8", "%r9", "%r10", "memory", "cc"
    );
}
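You incorrectly declare res->_mp_d as an output of the asm statement, when it is actually an input: a pointer to the output. With the "=&r" constraint the compiler assumes the asm writes that register, so it does not load the pointer into %0 beforehand. The stores then go through whatever value the register happens to hold, and afterwards that register is written back to res->_mp_d. Declare it as an input operand ("r" (res->_mp_d)) and leave the output list empty; the "memory" clobber you already have tells the compiler that the limbs themselves are modified.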
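To illustrate the corrected operand list, here is a minimal sketch reduced to a single 64x64-bit product. It assumes GCC or Clang; mul_1x1 and mul_1x1_demo are hypothetical names, the pointers are plain uint64_t arrays rather than mpz_t limbs, and the non-x86-64 branch is only a fallback so the snippet builds elsewhere. The point is that all three pointers are inputs and the output list is empty.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: res[0..1] = a[0] * b[0], least significant limb first. */
static void mul_1x1(uint64_t *res, const uint64_t *a, const uint64_t *b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile (
        "movq 0(%1), %%rax\n\t"
        "mulq 0(%2)\n\t"            /* rdx:rax = a[0] * b[0] */
        "movq %%rax, 0(%0)\n\t"
        "movq %%rdx, 8(%0)\n\t"
        : /* no register outputs: the result is written through %0 */
        : "r" (res), "r" (a), "r" (b)
        : "rax", "rdx", "memory", "cc"
    );
#else
    /* Fallback for other targets, assuming unsigned __int128 is available. */
    unsigned __int128 p = (unsigned __int128)a[0] * b[0];
    res[0] = (uint64_t)p;
    res[1] = (uint64_t)(p >> 64);
#endif
}

/* Usage sketch: (2^64 - 1)^2 = 2^128 - 2^65 + 1. */
static void mul_1x1_demo(void)
{
    uint64_t a = UINT64_MAX, b = UINT64_MAX, r[2] = {0, 0};
    mul_1x1(r, &a, &b);
    assert(r[0] == 1 && r[1] == UINT64_MAX - 1);
}
```

Because rax and rdx are listed as clobbers, the compiler will not place any of the three pointers in those registers, which is what lets the asm body use them freely.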