mpn_perfpow: do we need a special GCD for mp_bitcnt_t?

Niels Möller nisse at lysator.liu.se
Fri Sep 5 07:34:19 UTC 2014


tg at gmplib.org (Torbjörn Granlund) writes:

> It might make sense to have an entry point for log2(a) \approx log2(b)
> allowing even operands, suitable for fresh-out-of multilimb mod
> reduction.  It might perhaps make sense to have an entry point for
> operands which are also both odd.  And we need an entry point allowing
> any two limbs (perhaps requiring non-zero arguments).

For a start, let's look at what's needed by gcd_1. It would look
something like (untested):

  mp_limb_t
  mpn_gcd_1 (mp_srcptr up, mp_size_t size, mp_limb_t vlimb)
  {
    mp_limb_t ulimb;
  
    ASSERT (size >= 1);
    ASSERT (vlimb != 0);
    ASSERT_MPN_NONZERO_P (up, size);
  
    ulimb = up[0];
  
    if (size == 1)
      return mpn_gcd_11 (ulimb, vlimb);
    else
      {
        unsigned long  zero_bits, v_low_zero_bits;
        /* Need vlimb odd for modexact. */
        count_trailing_zeros (zero_bits, vlimb | ulimb);
        count_trailing_zeros (v_low_zero_bits, vlimb);
        
        vlimb >>= v_low_zero_bits;
  
        ulimb = MPN_MOD_OR_MODEXACT_1_ODD (up, size, vlimb);
        mpn_gcd_11 (ulimb, vlimb) << zero_bits;
      }
  }

For the first call to gcd_11, the inputs come direct from the
application. Even numbers must be handled, and one number being a lot
smaller than the other may well be an important use case. Or?

For the second call to gcd_11, v is always odd, and as far as I can tell
it's pretty unlikely that u and v will differ a lot in size; hence, it's
not worth any extra overhead to check for that case.

Maybe we should have an public mpn_gcd_11 tailored for the first case:
Handling all inputs (and maybe also support gcd(0,0) == 0?), and
checking for very different sizes. And a private (and optional) second
entry point mpn_gcd_11_odd which requires v odd, and is optmized for the
case that u and v are of similar size.

Hmm, and MPN_MOD_OR_MODEXACT_1_ODD, is that still useful? For which
machines and ranges is modexact_1 faster than the family of mod_1_*
functions?

> Do you agree?  Did I forget some scenario?

I'm not sure "both odd" is worth an entry point of it's own.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.


More information about the gmp-devel mailing list