Improvements for power64/mode64

Mark Rodenkirch mgrogue at wi.rr.com
Sun Mar 26 18:54:32 CEST 2006


On Mar 26, 2006, at 9:19 AM, Torbjorn Granlund wrote:

> Mark Rodenkirch <mgrogue at wi.rr.com> writes:
>
>   Assuming I understand how to use speed correctly, I am getting
>   between 8.6 and 8.7 cycles per limb for both addmul and submul.
>   sqr_diagonal is between 7.8 and 7.9 cycles per limb.  If you have a
>   single use of speed that lets both of us know that I am comparing
>   apples to apples, that would be great.
>
> A typical use would be
>
>         speed -C -s1-1000 -f1.1 mpn_addmul_1.1 mpn_submul_1.1

Great.

Here is the before:

/Distributed/gmp-4.1.99/tune > ./speed -C -s1-1000 -f1.1 mpn_addmul_1.1 mpn_submul_1.1 mpn_sqr_diagonal
overhead 6.63 cycles, precision 10000 units of 3.00e-08 secs, CPU freq 2500.00 MHz
         mpn_addmul_1.1 mpn_submul_1.1 mpn_sqr_diagonal
1             15.4082       17.2361      #10.2528
2             18.0057       16.7547      #10.3513
3             15.1702       15.4417       #9.7935
4             12.7514       14.2524       #9.3448
5             12.8009       14.4684       #9.0765
6             12.3352       15.0858       #8.8974
7             12.4303       15.2321       #8.7687
8             11.9185       14.3786       #8.6726
9             11.5017       14.4105       #8.5988
10            11.2769       14.3019       #8.5396
11            11.0930       14.2829       #8.4897
12            11.0227       14.2237       #8.4489
13            10.8239       14.1139       #8.4153
14            10.8381       14.1186       #8.4207
15            10.7659       14.1063       #8.4279
16            10.7210       13.9623       #8.3689
17            10.8150       13.9222       #8.3472
18            10.6317       13.9684       #8.3282
19            10.6249       13.9406       #8.3109
20            10.5023       13.9311       #8.2951
22            10.5930       13.8897       #8.2687
24            10.5025       13.8082       #8.2455
26            10.4509       13.8353       #8.2278
28            10.4311       13.7324       #8.2113
30            10.3696       13.7873       #8.1974
33            10.3360       13.7463       #8.1793
36            10.3543       13.6697       #8.1649
39            10.2853       13.6714       #8.1522
42            10.2617       13.6430       #8.1414
46            10.2200       13.6115       #8.1294
50            10.2158       13.5951       #8.1189
55            10.2321       13.5808       #8.1089
60            10.1987       13.5524       #8.0990
66            10.1755       13.5995       #8.0907
72            10.1504       13.5974       #8.0828
79            10.1378       13.5407       #8.0756
86            10.1369       13.5720       #8.0699
94            10.1213       13.5554       #8.0633
103           10.1119       13.5245       #8.0590
113           10.0991       13.5125       #8.0532
124           10.0956       13.5070       #8.0488
136           10.0817       13.5161       #8.0447
149           10.0700       13.5036       #8.0407
163           10.0669       13.5205       #8.0377
179           10.0652       13.5297       #8.0340
196           10.0615       13.4985       #8.0315
215           10.0534       13.5041       #8.0285
236           10.0482       13.4858       #8.0267
259           10.0467       13.5109       #8.0246
284           10.0463       13.4550       #8.0223
312           10.0362       13.4654       #8.0201
343           10.0330       13.4850       #8.0191
377           10.0379       13.4853       #8.0167
414           10.0267       13.4832       #8.0155
455           10.0264       13.4581       #8.0139
500           10.0258       13.4813       #8.0135
550           10.0238       13.4868       #8.0122
605           10.0207       13.4713       #8.0109
665           10.0186       13.4481       #8.0106
731           10.0176       13.4773       #8.0094
804           10.0168       13.4861       #8.0088
884           10.0379       13.4456       #8.0079
972           10.0347       13.4910       #8.0073

Here is the after:

/Distributed/gmp-4.1.99/tune > ./speed -C -s1-1000 -f1.1 mpn_addmul_1.1 mpn_submul_1.1 mpn_sqr_diagonal
overhead 6.75 cycles, precision 10000 units of 3.00e-08 secs, CPU freq 2500.00 MHz
         mpn_addmul_1.1 mpn_submul_1.1 mpn_sqr_diagonal
1             14.9862       16.2515      #12.8770
2             11.3779       12.3810      #10.8758
3             11.3483       12.0028       #9.8353
4             12.1296       11.4489       #9.1886
5             10.5530       12.4029       #8.8770
6              9.3084       11.5968       #8.6266
7              9.4014       11.6220       #8.4836
8              9.4974       12.1069       #8.3450
9              9.3424       11.5795       #8.2654
10             9.2108        9.5605       #8.0760
11             9.1978        9.7187       #8.2292
12             9.2183        9.8350       #8.0017
13             8.8908        9.1650       #8.1164
14             8.9274        9.2755       #7.8940
15             8.9943        9.4167       #8.0191
16             9.0167        9.5257       #7.9305
17             8.8835        9.0001       #7.9872
18             8.8792        9.1152       #7.8007
19             8.8339        9.2433       #7.9361
20             8.8704        9.3378       #7.8021
22             8.7390        9.0205       #7.7743
24             8.8047        9.2152       #7.7515
26             8.6991        8.9594       #7.7166
28             8.8300        9.1265       #7.7160
30             8.8336        9.1563       #7.7011
33             8.9409        9.1630       #7.7518
36             8.8542        9.1050       #7.6680
39             9.3730        8.9518       #7.7065
42             8.6479        8.8859       #7.6301
46             8.5845        8.8609       #7.6318
50             8.6029        8.8306       #7.6096
55             9.0716        8.8473       #7.6468
60             8.8317        8.8711       #7.6017
66             8.5455        8.7807       #7.5923
72             8.7378        8.8211       #7.5847
79             8.8690        8.7635       #7.6055
86             8.4915        8.7330       #7.5709
94             8.4812        8.7197       #7.5648
103            8.7636        8.7262       #7.5793
113            8.5314        8.7342       #7.5739
124            8.6080        8.7318       #7.5500
136            8.5803        8.7152       #7.5452
149            8.4786        8.7044       #7.5567
163            8.6187        8.6754       #7.5523
179            8.6071        8.6683       #7.5469
196            8.5216        8.6765       #7.5321
215            8.5672        8.6552       #7.5389
236            8.4982        8.6649       #7.5275
259            8.5322        8.6481       #7.5325
284            8.4849        8.6537       #7.5262
312            8.4730        8.6446       #7.5204
343            8.5008        8.6309       #7.5254
377            8.4207        8.6405       #7.5229
414            8.4099        8.6193       #7.5156
455            8.4738        8.6232       #7.5188
500            8.4438        8.6252       #7.5138
550            8.4039        8.6123       #7.5124
605            8.4109        8.6202       #7.5151
665            8.4078        8.6178       #7.5138
731            8.4447        8.6100       #7.5130
804            8.4217        8.6118       #7.5091
884            8.4240        8.6183       #7.5083
972            8.4175        8.6162       #7.5073
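
To put the two runs side by side, the short sketch below computes the relative saving in cycles per limb at the largest size measured (972 limbs). The figures are copied from the last row of each table above; the helper name percent_saved is hypothetical and only for illustration, not part of GMP or the speed program.

```python
# Cycles per limb at n = 972, copied from the last row of each speed run.
before = {
    "mpn_addmul_1.1": 10.0347,
    "mpn_submul_1.1": 13.4910,
    "mpn_sqr_diagonal": 8.0073,
}
after = {
    "mpn_addmul_1.1": 8.4175,
    "mpn_submul_1.1": 8.6162,
    "mpn_sqr_diagonal": 7.5073,
}


def percent_saved(old, new):
    """Percentage of cycles per limb saved when going from old to new."""
    return 100.0 * (old - new) / old


for name in before:
    saved = percent_saved(before[name], after[name])
    print(f"{name:18s} {before[name]:8.4f} -> {after[name]:8.4f}  "
          f"({saved:.1f}% fewer cycles per limb)")
```

At this size the saving works out to roughly 16% for addmul, 36% for submul and 6% for sqr_diagonal. Smaller sizes are noisier, so a single row should be read as indicative rather than exact.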

I'm still working on sqr_diagonal.  I've made some changes compared  
to the code I put on the list.  It appears to work fine, but I would  
like to test it more.
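
One way to test a rewritten sqr_diagonal is to compare its output against a slow reference model. The sketch below is such a model, under two assumptions: limbs are 64 bits wide, and the routine stores the low and high limbs of up[i]^2 at rp[2i] and rp[2i+1]. The function name sqr_diagonal_reference is hypothetical, and this is not the code discussed on the list.

```python
LIMB_BITS = 64                      # assumption: 64-bit limbs (mode64)
LIMB_MASK = (1 << LIMB_BITS) - 1


def sqr_diagonal_reference(up):
    """Return the double-limb squares of each limb in up, low limb first.

    Assumed layout: rp[2*i] is the low limb and rp[2*i + 1] the high
    limb of up[i] squared.
    """
    rp = []
    for u in up:
        sq = u * u                  # exact; Python integers are unbounded
        rp.append(sq & LIMB_MASK)   # low limb
        rp.append(sq >> LIMB_BITS)  # high limb
    return rp
```

Feeding the same random limb vectors to this model and to the assembly routine, and comparing the results limb by limb, would catch carry and indexing errors; values near 2^64 - 1 are worth including explicitly.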

--Mark

