Hi all
I have written an SSE assembler function to compute a vector-matrix multiply. It works fine on an Intel processor, taking only 30% of the time of the FPU code generated by VC8. But on my AMD CPU (Athlon X2 3600+), it takes about twice as long as the FPU version. I tried 3DNow!, which performed even worse. Is SIMD on AMD just slow?
Can someone help me? Any suggestions are welcome.
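The assembly itself isn't shown, so here is only a minimal sketch of one common SSE formulation for comparison. It broadcasts each vector component, multiplies it by the corresponding matrix row, and accumulates. It is written with the `<xmmintrin.h>` compiler intrinsics rather than inline assembly. It assumes a row vector times a row-major 4x4 `float` matrix, and the function names are hypothetical. A scalar reference is included so you can check the result and time both versions on each CPU. On builds without SSE, the "SSE" function just falls back to the scalar code.

```c
#include <assert.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Scalar reference: out[j] = sum over i of v[i] * m[i][j]
   (hypothetical layout: row vector times row-major 4x4 matrix). */
void vec4_mul_mat4_scalar(const float v[4], const float m[16], float out[4])
{
    for (int j = 0; j < 4; ++j) {
        float acc = 0.0f;
        for (int i = 0; i < 4; ++i)
            acc += v[i] * m[i * 4 + j];
        out[j] = acc;
    }
}

#if HAVE_SSE
/* SSE sketch: out = v[0]*row0 + v[1]*row1 + v[2]*row2 + v[3]*row3.
   Unaligned loads/stores are used so callers need no 16-byte alignment. */
void vec4_mul_mat4_sse(const float v[4], const float m[16], float out[4])
{
    assert(v && m && out);
    __m128 r0 = _mm_loadu_ps(m + 0);   /* row 0 */
    __m128 r1 = _mm_loadu_ps(m + 4);   /* row 1 */
    __m128 r2 = _mm_loadu_ps(m + 8);   /* row 2 */
    __m128 r3 = _mm_loadu_ps(m + 12);  /* row 3 */

    /* Broadcast each vector component into all four lanes and accumulate. */
    __m128 acc = _mm_mul_ps(_mm_set1_ps(v[0]), r0);
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(v[1]), r1));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(v[2]), r2));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(v[3]), r3));

    _mm_storeu_ps(out, acc);
}
#else
/* No SSE available at compile time: fall back to the scalar version. */
void vec4_mul_mat4_sse(const float v[4], const float m[16], float out[4])
{
    assert(v && m && out);
    vec4_mul_mat4_scalar(v, m, out);
}
#endif
```

This broadcast-and-accumulate form needs no horizontal adds, because each output lane already holds one column's dot product. If your data is guaranteed 16-byte aligned, `_mm_load_ps`/`_mm_store_ps` can replace the unaligned variants.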