MAIN PROCESSOR

 

Why are floating point values important?  
Floating point values, as opposed to integer values, lend themselves to much more accurate results. For example, a 32-bit unsigned integer can represent the whole numbers from 0 to 4,294,967,295 and nothing in between. A 32-bit floating point number instead splits its bits between an exponent (8 bits, so 256 possible values) and a mantissa (24 bits of precision), which lets it resolve fractional steps as small as 0.000000059605 (2^-24) relative to the value. Let's say you had a calculation to do like 200 divided by 3. In integer arithmetic the answer would be 66, but the answer in floating point would be 66.666667. By calculating the value on a CPU's integer unit you would lose the fractional 0.666667. Floating point accuracy is very important to graphics operations. The Dreamcast benefits greatly from the SH-4 because of the power and accuracy of its floating point unit.
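The 200 divided by 3 example can be reproduced in a few lines. This is only an illustrative sketch in Python (not SH-4 code); note that Python floats are 64-bit, so the result is rounded to six places to match the figure quoted above.

```python
# Integer division discards the fractional part; floating point keeps it.
numerator, denominator = 200, 3

int_result = numerator // denominator     # integer arithmetic
float_result = numerator / denominator    # floating point arithmetic (64-bit in Python)

print(int_result)                            # 66
print(round(float_result, 6))                # 66.666667
print(round(float_result - int_result, 6))   # 0.666667, the part integer arithmetic drops
```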

Floating Point Unit  
Almost all RISC processors produced to date have only one multiply unit, but the SH-4 comes with four, giving it four times the multiplication power of any comparable RISC processor. The FPU on the SH-4 is fully pipelined, so it can keep all four multiply units busy by executing an instruction every clock cycle. This means that at 200 MHz * 4, the SH-4 can calculate 800 million multiplies per second. Let's compare this with the current Saturn. The Saturn's SH-2 processor does not have a separate multiply unit: all multiplies have to be done with the integer unit, which is not pipelined for them and takes 2 to 3 cycles to process one multiply instruction. The SH-2 in the Saturn runs at 28.6 MHz. Dividing the clock rate by the 3 cycles it takes to execute one multiply instruction gives roughly 10 million multiplies per second. Since the Saturn has two SH-2s, it can do a total of about 20 million multiplies per second. The one SH-4 in the Dreamcast is therefore about 40 times more powerful than the two SH-2s in the Saturn at multiplies, which are so important in polygon calculations and graphic transformations.
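The comparison above is simple arithmetic, sketched here in Python with the figures from the text. The constant names are illustrative, and the sketch assumes the 3-cycle case for the SH-2. With unrounded values the ratio comes out near 42; the figure of 40 follows from rounding each SH-2 up to 10 million multiplies per second.

```python
# Figures taken from the text; names are illustrative only.
SH4_CLOCK_HZ = 200e6
SH4_MULTIPLY_UNITS = 4          # pipelined: one result per unit per cycle

SH2_CLOCK_HZ = 28.6e6
SH2_CYCLES_PER_MULTIPLY = 3     # assumed: the slow end of the quoted 2 to 3 cycles
SATURN_SH2_COUNT = 2

sh4_rate = SH4_CLOCK_HZ * SH4_MULTIPLY_UNITS
sh2_rate = SH2_CLOCK_HZ / SH2_CYCLES_PER_MULTIPLY
saturn_rate = sh2_rate * SATURN_SH2_COUNT

print(f"SH-4:               {sh4_rate / 1e6:.0f} million multiplies/s")
print(f"One SH-2:           {sh2_rate / 1e6:.1f} million multiplies/s")
print(f"Saturn (two SH-2s): {saturn_rate / 1e6:.1f} million multiplies/s")
print(f"SH-4 vs Saturn:     {sh4_rate / saturn_rate:.0f}x")
```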

Floating Point Unit Excels At 3D Calculations  
 
The SH-4 architecture includes impressive 3D floating point hardware. Each of the four floating point multipliers (fmuls) can receive two 32-bit values and produce a product that is passed to a four-input floating point adder. This hardware reads two 128-bit vectors (two sets of four 32-bit values) out of the register files, multiplies the four 32-bit pairs at the same time, adds the four products together, and puts the 32-bit result back into the register file. This provides the equivalent of 288-bit data crunching (2 x 128 + 32 = 288). A typical application for this processing power would be the following transformation instruction, which involves seven operations (four multiplies and three adds):

         f0*f4 + f1*f5 + f2*f6 + f3*f7 -> f7

The SH-4 can execute this seven-operation instruction in three clock cycles. Yet, because the architecture is fully pipelined, it can issue one of these instructions every cycle.  
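As a rough software model of that instruction, the sketch below performs the four multiplies and three adds and writes the result to f7. It is plain Python, not SH-4 code; the function name `inner_product` and the sample register values are hypothetical.

```python
def inner_product(a, b):
    """Four multiplies followed by three adds: seven operations in total."""
    assert len(a) == 4 and len(b) == 4
    products = [x * y for x, y in zip(a, b)]                      # four multiplies
    return products[0] + products[1] + products[2] + products[3]  # three adds


# Hypothetical register contents f0..f7.
f = [1.0, 2.0, 3.0, 4.0,    # f0..f3
     5.0, 6.0, 7.0, 8.0]    # f4..f7
f[7] = inner_product(f[0:4], f[4:8])   # the result replaces f7
print(f[7])   # 70.0
```

The hardware does all four multiplies in parallel, whereas this model simply evaluates them one after another.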

The figure (above, right) shows a better example of what the SH-4's floating point hardware can accomplish. Here the back register file is loaded with 16 values and the hardware performs the following matrix operation in seven clock cycles:  

         f0*b0 + f1*b1 + f2*b2 + f3*b3 -> f0
         f0*b4 + f1*b5 + f2*b6 + f3*b7 -> f1
         f0*b8 + f1*b9 + f2*b10 + f3*b11 -> f2
         f0*b12 + f1*b13 + f2*b14 + f3*b15 -> f3

The SH-4 is fully pipelined, and the RISC architecture can repeat these 16 fmuls and 12 fadds (28 operations) every four clock cycles, for an average of seven floating point operations per cycle. The superscalar CPU and double-precision fmov allow registers to be loaded from and stored to cache during these four cycles, so the operations are sustainable. At its 200 MHz clock speed, the SH-4 achieves 1.4 GFLOPS (7 operations x 200 million cycles per second), sustained.
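The matrix operation and the throughput figure can be modelled with a short sketch. This is illustrative Python, not SH-4 code: the function name `transform` and the sample matrix are hypothetical, and the sketch assumes that all four rows read the original f0..f3 before any result is written back.

```python
def transform(f, b):
    """Multiply the 4-element vector f by the 4x4 matrix stored row by row in b."""
    assert len(f) == 4 and len(b) == 16
    result = []
    for row in range(4):
        b_row = b[4 * row: 4 * row + 4]
        # Four multiplies and three adds per row, all reading the ORIGINAL f.
        result.append(f[0] * b_row[0] + f[1] * b_row[1]
                      + f[2] * b_row[2] + f[3] * b_row[3])
    return result   # stands in for f0..f3 after all four rows are done


# Hypothetical data: a matrix that scales the first three components by 2.
back_bank = [2.0, 0.0, 0.0, 0.0,
             0.0, 2.0, 0.0, 0.0,
             0.0, 0.0, 2.0, 0.0,
             0.0, 0.0, 0.0, 1.0]
vector = transform([1.0, 2.0, 3.0, 1.0], back_bank)
print(vector)   # [2.0, 4.0, 6.0, 1.0]

# Peak-rate arithmetic using the figures quoted in the text.
FMULS, FADDS, CYCLES, CLOCK_HZ = 16, 12, 4, 200e6
ops_per_cycle = (FMULS + FADDS) / CYCLES
gflops = ops_per_cycle * CLOCK_HZ / 1e9
print(ops_per_cycle, round(gflops, 2))   # 7.0 1.4
```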

The high floating point power of the SH-4 lends itself to a console that can handle complex, dynamic polygonal environments, with the characters in those environments built from a high number of polygons so that they look very detailed. Add rendering by a graphics chip, with effects such as texture mapping, texture filtering, lens flare, smoke, fog, transparencies, shading and dynamic lighting, and such a system can display very impressive visuals never before seen on a home console.

 
