jage
07-16-2003, 01:13 PM
Don't let the MHz fool you. PPCs are all very, very unbalanced designs. It would be relatively straightforward to design a 100MHz system that beats any PPC in every possible category, that is, if you have gates and energy to burn in the chips. All X-Scale and StrongARM (SA) designs are badly starved of memory bandwidth. They also have other severe weaknesses, like small caches and no floating-point unit or division instruction.
This is not tested or anything, just a feel for things, but this is roughly where I "feel" the PPC stands at the moment. To repeat: it's not verified, but I'd be surprised if it's far from the truth.
X-Scale 400MHz
The following integer/FP section ignores bandwidth problems:
Integer operations except division: Pentium 200-300MHz, depending on register use (the PXA has more registers, which helps it) and 'pairability' (which helps the Pentium).
Division: Pentium 50MHz (if such a thing existed...)
General single-precision floating point: a hypothetical 16MHz Pentium
General double-precision floating point: a 10MHz Pentium
*Floating point note*: the PXA255 has no FPU. You can usually find faster compromises for special cases if you're not scared of low-level coding and, in the worst case, assembler. An FPU uses too much energy and chip real estate. I'd like to see an FPU you can turn on and off... but I doubt we'll see such a thing soon. Just FADD, FSUB, FMUL, FSTORE and FCMP would make me happy, even in single precision only; that's pretty much all you need most of the time for floating-point math.
RAM bandwidth:
Typical of an early Pentium-class system, with somewhat better latency.
CPU cache size: typical of a low-end 486 system. Cache size is one of the biggest performance problems. It can be worked around to some extent by smart coding and by using the prefetch functionality, but this is way beyond the ability of an average programmer, as it requires close knowledge of the system architecture and assembler programming. The PXA255 has a 16kB data cache and a 32kB code cache. An energy-burning unified 256kB cache would help a lot, at the expense of energy consumption and die size... Maybe they could make a cache with two modes? A low-power 32-64kB mode and a performance 256kB mode...
CPU cache speed: comparable to a Pentium 150
"Disk" IO, flash cards and such:
Bandwidth similar to high-end 386 systems, except latency, which is at Pentium-class level.
"Disk" capacity:
Typical 486-era system.
Network IO:
Comparable to Ethernet on 386-class systems.
Graphics subsystem:
Simple operations (like filling rectangles): 386/early 486 with VLB
Complex operations (like drawing filled polygons or clipping): Pentium 100-150
Graphics bandwidth:
Comparable to a high-end 386 system or a VLB 486 system.
Comments?