X86-64 Inline Assembly in C (compiled using GCC), multi-precision multiplication routine causing a seg fault

I'm trying to implement multi-precision multiplication of GMP mpz_t objects in inline x86 assembly. Depending on the choice of constraints on the output variable, I get either a segmentation fault or values in the output variable that are corrupted in an inconsistent way (i.e. different runs of the code corrupt the values differently).

What the code does is take two GMP mpz_t objects, ain and bin, each guaranteed to have size 13 (i.e. _mp_size is set to 13, so each object is defined by thirteen 64-bit numbers), and produce an mpz_t object of size 26, res, which is the result of multiplying ain and bin together. The reason I do not use mpz_mul is that this method results in a performance increase in my particular setting.

Note that res->_mp_d, ain->_mp_d and bin->_mp_d refer to the arrays of "limbs" that define the respective mpz_t objects, with (obj->_mp_d)[0] being the least significant limb and (obj->_mp_d)[obj->_mp_size-1] being the most significant limb.

If anyone can explain what I am doing wrong here, I would appreciate it! Below is a code segment. I have excluded most of the assembly because it is repetitive, but I think I have given enough to give an indication of what is going on:

void mpz_mul_x86_1(mpz_t res, mpz_t ain, mpz_t bin){
    if( res->_mp_alloc<26)      //the next few lines make sure res is large enough
        _mpz_realloc(res,26);   //to hold the result of the multiplication
    res->_mp_size = 26;

    asm volatile (
        "movq 0(%1), %%rax;"
        "mulq 0(%2);"
        "movq %%rax, 0(%0);"
        "movq %%rdx, %%r8;"           //a0*b0
                                      //0
        "xorq %%r10, %%r10;"
        "movq 8(%1), %%rax;"
        "mulq 0(%2);"
        "addq %%rax, %%r8;"
        "movq %%rdx, %%r9;"
        "adcq $0, %%r9;"              //a1*b0
        "movq 0(%1), %%rax;"
        "mulq 8(%2);"
        "addq %%rax, %%r8;"
        "movq %%r8, 8(%0);"
        "adcq %%rdx,%%r9;"
        "adcq $0, %%r10;"             //a0*b1
                                      //1
        "xorq %%r8, %%r8;"
        "movq 0(%1), %%rax;"
        "mulq 16(%2);"
        "addq %%rax, %%r9;"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"              //a0*b2
        "movq 8(%1), %%rax;"
        "mulq 8(%2);"
        "addq %%rax, %%r9;"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"              //a1*b1
        "movq 16(%1), %%rax;"
        "mulq 0(%2);"
        "addq %%rax, %%r9;"
        "movq %%r9, 16(%0);"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"              //a2*b0
                                      //2
        "xorq %%r9, %%r9;"
        "movq 24(%1), %%rax;"
        "mulq 0(%2);"
        "addq %%rax, %%r10;"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r9;"              //a3*b0
        "movq 0(%1), %%rax;"
        "mulq 24(%2);"
        "addq %%rax, %%r10;"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r9;"              //a0*b3
        "movq 16(%1), %%rax;"
        "mulq 8(%2);"
        "addq %%rax, %%r10;"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r9;"              //a2*b1
        "movq 8(%1), %%rax;"
        "mulq 16(%2);"
        "addq %%rax, %%r10;"
        "movq %%r10, 24(%0);"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r9;"              //a1*b2
                                      //3

        /*about 1000 lines of omitted assembly code here*/

        "xor %%r8, %%r8;"
        "movq 96(%1), %%rax;"
        "mulq 88(%2);"
        "addq %%rax, %%r9;"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"              //a12*b11
        "movq 88(%1), %%rax;"
        "mulq 96(%2);"
        "addq %%rax, %%r9;"
        "movq %%r9, 184(%0);"
        "adcq %%rdx, %%r10;"
        "adcq $0, %%r8;"              //a11*b12
                                      //23
        "xor %%r9, %%r9;"
        "movq 96(%1), %%rax;"
        "mulq 96(%2);"
        "addq %%rax, %%r10;"
        "movq %%r10, 192(%0);"
        "adcq %%rdx, %%r8;"
        "adcq $0, %%r8;"              //a12*b12
                                      //24
        "movq %%r8, 200(%0);"         //25

        : "=&r" (res->_mp_d)
        : "r" ((ain->_mp_d)), "r" ((bin->_mp_d))
        : "%rax", "%rdx", "%r8", "%r9", "%r10", "memory", "cc"
    );
}
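
When debugging a routine like the one above, it can help to have a slow but portable reference to compare the 26 output limbs against. The following is only a sketch under a few assumptions: the name mul_ref_13x13 is made up for illustration (it is not a GMP function), it works on plain uint64_t arrays rather than mpz_t, and it relies on the unsigned __int128 extension that GCC and Clang provide on 64-bit targets. It accumulates column by column in a three-limb accumulator, which is the same pattern the assembly follows with r8, r9 and r10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NLIMBS 13   /* operand size stated in the question */

/* Hypothetical reference routine (illustrative name, not part of GMP).
   Computes res[0..25] = a[0..12] * b[0..12], least significant limb first.
   res must not alias a or b, because res is written while a and b are read. */
void mul_ref_13x13(uint64_t *res, const uint64_t *a, const uint64_t *b)
{
    assert(res != a && res != b);

    /* Three-limb column accumulator, playing the role of r8/r9/r10. */
    uint64_t lo = 0, mid = 0, hi = 0;

    for (size_t k = 0; k < 2 * NLIMBS - 1; k++) {
        /* Column k collects every product a[i] * b[k - i]. */
        size_t i_min = (k < NLIMBS) ? 0 : k - (NLIMBS - 1);
        size_t i_max = (k < NLIMBS) ? k : NLIMBS - 1;

        for (size_t i = i_min; i <= i_max; i++) {
            unsigned __int128 p = (unsigned __int128)a[i] * b[k - i];

            /* lo += low half of p, keeping the carry in the upper bits of s */
            unsigned __int128 s = (unsigned __int128)lo + (uint64_t)p;
            lo = (uint64_t)s;

            /* mid += high half of p + carry from lo */
            s = (unsigned __int128)mid + (uint64_t)(p >> 64) + (uint64_t)(s >> 64);
            mid = (uint64_t)s;

            /* hi += carry from mid */
            hi += (uint64_t)(s >> 64);
        }

        res[k] = lo;        /* lowest accumulator limb is result limb k */
        lo = mid;           /* shift the accumulator down by one limb */
        mid = hi;
        hi = 0;
    }

    res[2 * NLIMBS - 1] = lo;   /* whatever remains is the top limb */
}
```

Calling this on copies of ain->_mp_d and bin->_mp_d and comparing the result with res->_mp_d limb by limb would show which column first goes wrong, if any.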

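You incorrectly declare res->_mp_d as an output of the asm statement, when it is actually an input: the pointer itself is only read, and it is the memory it points to that receives the output. With "=&r", GCC assumes the asm writes a new value into that register, so it does not load the pointer into it beforehand. The stores then go through whatever the register happened to contain, and afterwards GCC copies the register back into res->_mp_d. Move res->_mp_d to the input list as "r" (res->_mp_d) and leave the output list empty; the "memory" clobber you already have tells GCC that the asm writes through the pointer.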
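
A minimal sketch of the corrected operand list, shrunk to a single 1x1 limb product so it fits here. The function name mul_1x1_limb is hypothetical and takes plain uint64_t pointers instead of mpz_t; the non-x86 branch is only a fallback so the snippet builds elsewhere and assumes the compiler supports unsigned __int128. All three pointers are inputs, the output list is empty, and the operand numbers stay %0, %1, %2 because inputs are numbered right after the (now absent) outputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): stores the 128-bit
   product a[0] * b[0] as two limbs, res[0] (low) and res[1] (high). */
void mul_1x1_limb(uint64_t *res, const uint64_t *a, const uint64_t *b)
{
    assert(res && a && b);

#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile (
        "movq 0(%1), %%rax;"
        "mulq 0(%2);"            /* rdx:rax = a[0] * b[0] */
        "movq %%rax, 0(%0);"     /* low limb  */
        "movq %%rdx, 8(%0);"     /* high limb */
        : /* no outputs: the asm never changes any of the pointer values */
        : "r" (res), "r" (a), "r" (b)
        : "rax", "rdx", "memory", "cc"
    );
#else
    /* Portable fallback, assuming unsigned __int128 is available. */
    unsigned __int128 p = (unsigned __int128)a[0] * b[0];
    res[0] = (uint64_t)p;
    res[1] = (uint64_t)(p >> 64);
#endif
}
```

Applied to the routine in the question, the same change would mean an empty output list followed by "r" (res->_mp_d), "r" (ain->_mp_d), "r" (bin->_mp_d) as inputs, with the assembly body and clobber list left as they are.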
