AMD Bulldozer and GMP

Rick C. Hodgin foxmuldrster at yahoo.com
Sun Feb 19 14:58:02 CET 2012


The thing AMD needs to add to make it work properly, and be a completely
killer implementation of an extended ARM ISA, is the ability to control
extremely fine-grained threads at the application level (without OS
service calls).

I have an outline for how this could be done.  I wrote a paper about it
back in 2007/2008 and gave a copy to Wolfgang Gruener, who forwarded it
to a third party, but nothing was ever said about it again.

With such an ability, and a simple hardware-level feature to allow
threads to be turned on and off for particular intra-single-thread
workloads, the threshold would be crossed from
difficult-to-implement-parallelism to
extremely-beneficial-parallelism-for-nearly-all-apps.

Best regards,
Rick C. Hodgin

On Sun, 2012-02-19 at 08:48 -0500, Rick C. Hodgin wrote:
> Vincent,
> 
> I believe AMD is working on a CPU design that will eventually allow them
> to switch from an x86-based ISA to an ARM-based one.
> 
> I could be wrong.  Probably am.  But there are many signs.
> 
> Best regards,
> Rick C. Hodgin
> 
> On Sat, 2012-02-18 at 11:08 +0100, Vincent Diepeveen wrote:
> > Rick,
> > 
> > You can show that Bulldozer's throughput is no higher than that of the
> > quad-core Intels.
> > In the end Bulldozer decodes 4 instructions per cycle per module and
> > Intel decodes 4 instructions per cycle per core,
> > and Bulldozer is then a tad slower than the Intels because its caches
> > are a lot slower.
> > 
> > That is why, for parallel applications that scale well, Bulldozer will
> > always lose to the quad-core Intels.
> > 
> > Besides, they really overclocked Bulldozer a lot to get where they
> > are now.
> > 
> > GMP is such a highly optimized software product that integer
> > multiplication dominates, and I bet that in Bulldozer they added huge
> > latencies there in order to overclock it a lot and get at least
> > close to the Intel quad-cores.
> > 
> > On Feb 15, 2012, at 12:54 AM, Rick Hodgin wrote:
> > 
> > >> It is totally incomprehensible what AMD is doing.
> > >> The new processor runs hot, slowly, and hardly
> > >> outperforms a 5W processor for integer number
> > >> crunching.  OK, they do, thanks to a 2x clock and
> > >> more cores.  But clock-for-clock they are equal.
> > >
> > > There was a lot of surprise in the CPU community when AMD released  
> > > its early Bulldozers for internal benchmarking.  The additional  
> > > cores provided far greater throughput overall, but so much was lost  
> > > for lesser-parallelized applications that everyone was left  
> > > scratching their heads and wondering what was going on (as you are  
> > > doing now).
> > >
> > > AMD was also surprised by its performance actually, indicating to  
> > > many that it was an unexpected condition.  Yet, in the end there  
> > > were some fixes made but nothing to bring it up to par with what  
> > > everyone (outside of core development??) was expecting.
> > >
> > > Best regards,
> > > Rick C. Hodgin
> > >
> > > _______________________________________________
> > > gmp-discuss mailing list
> > > gmp-discuss at gmplib.org
> > > https://gmplib.org/mailman/listinfo/gmp-discuss
> > >
> > 
> 



