SSE Optimization on AMD CPU
Hi all
I have written an assembly function using SSE to compute a vector-matrix multiplication. It works well on an Intel processor, taking only 30% of the time of the FPU assembly generated by VC8. But on my AMD CPU (Athlon X2 3600+), it takes about twice as long as the FPU version. I tried 3DNow!, which performed even worse. Is AMD's SIMD simply slow?
Can someone help me? Any suggestions are welcome.
Re: SSE Optimization on AMD CPU
I am no expert, but you probably have to take into account that the SSE unit in AMD's K-8 is much slower than the one in Intel's C2D, since it can process only 64 bits per clock cycle, so a 128-bit SSE operation has to be handled as two 64-bit halves. Also, the memory access pattern can be a very influential factor. I was toying with some asm routines in the Linux kernel and managed to speed them up on K-8/K-10 just by removing a couple of prefetches that were supposed to improve performance on Intel...
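To make the discussion concrete, here is a rough sketch of what an SSE vector-matrix routine of this kind often looks like. It is not the original poster's code. The function names, the 1x4 row vector, and the row-major 4x4 float matrix are all assumptions for illustration. It uses C intrinsics from `<xmmintrin.h>` instead of hand-written assembly, and a plain scalar version is included as a baseline for comparison.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: out = v * m, with v a 1x4 row vector and
   m a 4x4 matrix stored row-major (m[row * 4 + col]). */
static void vec4_mul_mat4_sse(const float v[4], const float m[16], float out[4])
{
    /* Broadcast each vector component into all four lanes. */
    __m128 x = _mm_set1_ps(v[0]);
    __m128 y = _mm_set1_ps(v[1]);
    __m128 z = _mm_set1_ps(v[2]);
    __m128 w = _mm_set1_ps(v[3]);

    /* Scale each matrix row by its component and accumulate.
       Unaligned loads are used so no 16-byte alignment is assumed. */
    __m128 r = _mm_mul_ps(x, _mm_loadu_ps(m + 0));
    r = _mm_add_ps(r, _mm_mul_ps(y, _mm_loadu_ps(m + 4)));
    r = _mm_add_ps(r, _mm_mul_ps(z, _mm_loadu_ps(m + 8)));
    r = _mm_add_ps(r, _mm_mul_ps(w, _mm_loadu_ps(m + 12)));

    _mm_storeu_ps(out, r);
}

/* Scalar baseline computing the same product, for timing and checking. */
static void vec4_mul_mat4_scalar(const float v[4], const float m[16], float out[4])
{
    int row, col;
    for (col = 0; col < 4; ++col) {
        float sum = 0.0f;
        for (row = 0; row < 4; ++row)
            sum += v[row] * m[row * 4 + col];
        out[col] = sum;
    }
}
```

Timing both functions over the same large batch of vectors on each CPU would show whether the slowdown comes from the arithmetic itself or from how the data is loaded and stored.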
Re: SSE Optimization on AMD CPU
It would be easier to help if you posted the source code :icecream: