MAIN PROCESSOR

SH - 4 
 
The Jewel in SEGA's Crown? 
The SH-4 is arguably the best CPU that Sega could have chosen for the Dreamcast. It has even been said that Sega had a hand in deciding which extra features would go on this chip. It would make sense for Hitachi to ask Sega what it wanted in a next-generation SH series chip, since Sega is one of Hitachi's best customers. Sega has sold around 10 million Saturns so far, and each Saturn contains two SH-2s and one SH-1, which means that Hitachi has sold Sega 20 million SH-2s and 10 million SH-1s. Hitachi certainly does not want to lose a customer like Sega. The main change in the SH-4 compared to the previous SH-3 and SH-2 is the inclusion of a floating point unit designed for vector and matrix math. Graphic transformations require a great many matrix calculations, so the SH-4 should excel at them. According to Hitachi documents, the SH-4 is capable of handling an impressive 5 million polygons per second, which would allow games to use very complex and detailed polygon models. No CPU currently in the PC world comes close to the floating point power of the SH-4.
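To make the matrix workload concrete, here is a minimal sketch of the operation at the heart of a graphic transformation: multiplying a 4x4 matrix by a 4-component vertex. It is plain Python for illustration, not SH-4 code, and the names transform_vertex and translate are made up for this example. It also counts the floating point operations, which come to 16 multiplies and 12 adds per vertex.

```python
def transform_vertex(matrix, vertex):
    """Multiply a 4x4 matrix (list of 4 rows) by a 4-component vertex.

    Returns the transformed vertex plus the number of multiplies and adds
    that were needed, so the cost per vertex is visible.
    """
    result = []
    mults = 0
    adds = 0
    for row in matrix:
        acc = row[0] * vertex[0]
        mults += 1
        for coeff, component in zip(row[1:], vertex[1:]):
            acc += coeff * component
            mults += 1
            adds += 1
        result.append(acc)
    return result, mults, adds


# Hypothetical example: move the point (1, 2, 3) by (10, 20, 30)
# using homogeneous coordinates (the fourth component is 1).
translate = [
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0, 20.0],
    [0.0, 0.0, 1.0, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]

moved, mults, adds = transform_vertex(translate, [1.0, 2.0, 3.0, 1.0])
print(moved, mults, adds)  # [11.0, 22.0, 33.0, 1.0] 16 12
```

At 28 floating point operations per vertex, a scene with millions of vertices per second quickly adds up to hundreds of millions of operations, which is why a floating point unit built around this pattern matters.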

Two SH-4 Versions  

Clock Rate                              200 MHz                         167 MHz
Model No.                               HD6417750BP200                  HD6417750F167
Integer Performance (Avg.)              360 MIPS                        300 MIPS
Floating Point Performance (Max.)       1.4 GFlops                      1.169 GFlops
Bus Size                                64 Bits                         32 Bits
Bus Bandwidth                           800 MBytes/second               334 MBytes/second
Package                                 256-pin ball grid array (BGA)   208-pin quad flat package (QFP)
Pricing (10,000 units)                  4,000 yen (US$31.70)¹           3,000 yen (US$23.80)¹
Availability (samples)                  January, 1998                   January, 1998
Availability (production quantities)    3rd Quarter 1998                3rd Quarter 1998

  

 ¹ Based on the yen to US dollar exchange rate on November 19th, 1997.
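As a quick sanity check on the table, the sketch below divides each version's performance figures by its clock rate. The dictionary layout and the function name per_clock are illustrative choices; the numbers are the ones listed above. Both versions work out to roughly 1.8 MIPS per MHz and 7 floating point operations per clock cycle, which suggests the two parts differ in clock speed and bus width rather than in the core itself.

```python
# Figures copied from the table above; the structure is only for illustration.
versions = {
    "HD6417750BP200": {"mhz": 200, "mips": 360, "gflops": 1.4},
    "HD6417750F167": {"mhz": 167, "mips": 300, "gflops": 1.169},
}


def per_clock(spec):
    """Return (MIPS per MHz, floating point operations per clock cycle)."""
    mips_per_mhz = spec["mips"] / spec["mhz"]
    # 1 GFlops = 1000 MFlops, and MFlops / MHz = operations per cycle.
    flops_per_cycle = spec["gflops"] * 1000 / spec["mhz"]
    return mips_per_mhz, flops_per_cycle


for model, spec in versions.items():
    mips_per_mhz, flops_per_cycle = per_clock(spec)
    print(f"{model}: {mips_per_mhz:.2f} MIPS/MHz, "
          f"{flops_per_cycle:.2f} FLOPs/cycle")
```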

SH-4 Specifications  

  • 200 MHz 
  • 360 integer MIPS (Dhrystone 1.1 benchmark) 
  • 32-bit integer unit
  • 2-way superscalar 
  • 5 stage pipeline 
  • 8 KByte instruction cache 
  • 16 KByte data cache 
  • 64-bit floating point unit 
  • 1.4 GFlops peak (0.9 GFlops sustained), capable of 5 million polygons per second 
  • 64-bit external bus (256 pin package)
  • 800 MBytes/second bus bandwidth with 100 MHz SDRAM
  • Glueless memory bus interface to SGRAM and SDRAM
  • Supply voltage of 1.8 V (internal) / 3.3 V (I/O)
  • 1.5 W (typ.) power dissipation (at 200 MHz)
  • 0.25 µm, four-layer metal CMOS process 
  • 42.25 mm² die size 
  • 208-pin quad flat package (QFP) or 256-pin ball grid array (BGA) package

SH-4 Peripherals  
As the diagram above shows, the SH-4 comes with a wealth of on-chip peripherals: an interrupt controller (INTC), three versatile timers (TMU), a real-time clock (RTC), two serial interface channels (SCI), a user break controller (UBC), and a programmable power management controller. The direct memory access controller (DMAC) has four channels. The DMAC is excellent for moving blocks of memory around with almost no CPU intervention, which makes for efficient transfers of data from main memory to graphics memory, for example.

16-bit Instructions  
SH-4 instructions have a fixed length of 16 bits. This yields code that is up to 40 percent smaller than on RISC processors that use 32-bit fixed-length instructions. The MIPS architecture, for example, uses 32-bit fixed-length instructions, so a program that was 4 Megabytes in size on a MIPS CPU would be only about 2.5 Megabytes on the SH-4. Another advantage of 16-bit instructions is that they lower the bandwidth needed to pull instructions onto the chip, leaving more bandwidth for moving data on and off the chip. In addition, the on-chip instruction cache does not have to be as large as that of a comparable RISC processor with 32-bit fixed-length instructions, which allows a smaller chip that is cheaper to produce and generates less heat. Software written for the previous generations of the SH series, the SH-1, SH-2 and SH-3, will be able to run on the SH-4. This could allow easy ports of Saturn software to the Dreamcast, but it is too early to tell how easy they would be.
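The arithmetic behind the 4 Megabyte versus 2.5 Megabyte example can be sketched as follows. The function names are illustrative, and the calculation assumes, for simplicity, that the whole program consists of instructions. It shows that the example corresponds to a 37.5 percent saving, and that the saving is less than 50 percent because the 16-bit encoding needs about 25 percent more instructions to do the same work.

```python
MB = 1024 * 1024


def size_reduction(size_32bit, size_16bit):
    """Fraction by which the 16-bit program is smaller than the 32-bit one."""
    return 1.0 - size_16bit / size_32bit


def instruction_count(size_bytes, instruction_bits):
    """Number of fixed-length instructions that fit in size_bytes."""
    return size_bytes // (instruction_bits // 8)


mips_size = 4 * MB          # 32-bit instructions
sh4_size = int(2.5 * MB)    # 16-bit instructions

print(size_reduction(mips_size, sh4_size))  # 0.375

mips_instructions = instruction_count(mips_size, 32)
sh4_instructions = instruction_count(sh4_size, 16)
print(sh4_instructions / mips_instructions)  # 1.25
```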

Superscalar  
So what does it mean when a chip is superscalar, and is the SH-4 a superscalar CPU? A superscalar processor is capable of executing two or more instructions at the same time. The instructions on the SH-4 can be classified into four groups: integer, simple integer/load/store, branch and floating point. Any two instructions can be processed in parallel as long as they are from different groups. For example, an integer instruction and a floating point instruction can be processed in parallel, but two branch instructions cannot. So the SH-4 is indeed a superscalar CPU, issuing up to two instructions per cycle, and this flexibility in its design makes it a very powerful processor.
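The pairing rule described above can be sketched as a small check. This models only the simplified rule as stated in this article; the group labels and the function name can_issue_together are made up for illustration, and real hardware also has to account for things this sketch ignores, such as one instruction depending on the result of the other.

```python
# The four instruction groups named in the text (labels are illustrative).
GROUPS = {"integer", "load_store", "branch", "float"}


def can_issue_together(first_group, second_group):
    """Return True if two instructions may be processed in parallel.

    Simplified rule from the article: any two instructions can pair
    as long as they belong to different groups.
    """
    for group in (first_group, second_group):
        if group not in GROUPS:
            raise ValueError(f"unknown instruction group: {group}")
    return first_group != second_group


print(can_issue_together("integer", "float"))   # True
print(can_issue_together("branch", "branch"))   # False
```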

External Data Path  
The external data path on the SH-4 in the 256-pin package can be as wide as 64 bits, as opposed to the SH-4 in the 208-pin package, which is limited to an external data path of 32 bits. The SH-4 is a very flexible architecture in that it allows the width of the external data path to be 8, 16, 32 or 64 bits. This allows the SH-4 to boot from an economical 8-bit ROM, for example, and then use high speed 64-bit SDRAM. The external bus unit on the SH-4 provides a glueless interface to SGRAM, SDRAM, EDO DRAM, fast page DRAM, SRAM and MROM, which helps keep the cost down for systems designed around this chip. A system that uses 100 MHz SDRAM with the SH-4 can see transfer rates of up to 800 MBytes/second with the 64-bit data bus. For comparison, the SH-2 in the Saturn, running at 28.6 MHz with a 32-bit data bus, had a transfer rate of up to 114.4 MBytes/second. That is quite a difference between the two chips. The SH-4's 64-bit bus, with its 800 MBytes/second transfer rate, should also lend itself well to 2D games. The Dreamcast will be a 2D/3D powerhouse thanks to the inclusion of the SH-4.
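Both transfer rates quoted above follow from the same simple formula: bus width in bytes multiplied by the bus clock. The sketch below reproduces them; the function name is illustrative, and the result is a theoretical peak that assumes one transfer on every bus clock.

```python
def peak_bandwidth_mbytes(bus_bits, bus_mhz):
    """Theoretical peak transfer rate in MBytes/second.

    Assumes one full-width transfer per bus clock cycle.
    """
    return bus_bits / 8 * bus_mhz


print(peak_bandwidth_mbytes(64, 100))   # 800.0  (SH-4 with 100 MHz SDRAM)
print(peak_bandwidth_mbytes(32, 28.6))  # 114.4  (SH-2 in the Saturn)
```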
  
Cache  
The SH-4 has 24 KBytes of total cache, 6 times larger than the 4 KByte cache of the SH-2 used in the Saturn. The 24 KBytes consist of an 8 KByte instruction cache and a 16 KByte data cache. This split cache design provides higher performance than a unified cache design, because instruction fetches and data accesses do not compete for the same cache. The 4 KByte cache on the Saturn's SH-2s is a unified cache, in which instructions and data have to share the same 4 KByte space. The data cache on the SH-4 can operate in write back (WB) or write through (WT) mode.
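The difference between the two data cache modes can be illustrated with a toy model that counts how many writes reach main memory. This is a rough sketch, not a description of the SH-4's actual cache logic: the direct-mapped organization, the 32-byte line size, the 512 lines (32 bytes x 512 = 16 KBytes) and the function name are all assumptions made for the example. In write through mode every store goes to memory; in write back mode a modified line is written out only when it is evicted or flushed.

```python
def memory_writes(store_addresses, policy, line_bytes=32, num_lines=512):
    """Count writes that reach main memory for a sequence of stores.

    policy is "WT" (write through) or "WB" (write back). The cache is
    modeled as direct mapped with write allocation; these details and the
    default sizes are assumptions for illustration. Note the units differ:
    WT counts individual stores, WB counts whole cache lines written out.
    """
    if policy not in ("WT", "WB"):
        raise ValueError("policy must be 'WT' or 'WB'")
    if policy == "WT":
        return len(store_addresses)  # every store also goes to memory

    dirty = {}   # cache index -> tag of the modified line held there
    writes = 0
    for address in store_addresses:
        line = address // line_bytes
        index = line % num_lines
        tag = line // num_lines
        if index in dirty and dirty[index] != tag:
            writes += 1              # evict the old modified line
        dirty[index] = tag
    return writes + len(dirty)       # final flush of modified lines


# Hypothetical workload: update the same counter 1000 times.
stores = [0x1000] * 1000
print(memory_writes(stores, "WT"))  # 1000
print(memory_writes(stores, "WB"))  # 1
```

Write back pays off when the same data is modified repeatedly, while write through keeps main memory up to date at all times, which can be simpler when other hardware reads that memory.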

Memory Management Unit  
The MMU on the SH-4 provides full Microsoft Windows CE compatibility. It supports page sizes of 1 KByte, 4 KBytes, 64 KBytes, and 1 MByte, which Windows CE can use to partition memory and to provide memory protection between the different processes executing on the SH-4. Memory protection is important because it keeps processes from interfering with each other's memory spaces, which could otherwise cause an application or the operating system to crash.
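To show what a page size means in practice, the sketch below splits an address into a page number and an offset within that page for each of the four sizes listed above. The function name and the sample address are made up for illustration; an MMU does the equivalent in hardware and then looks up the page number to find the physical page and its access permissions.

```python
# The four page sizes listed in the text, in bytes.
PAGE_SIZES = {
    "1 KByte": 1 << 10,
    "4 KByte": 1 << 12,
    "64 KByte": 1 << 16,
    "1 MByte": 1 << 20,
}


def split_address(address, page_size):
    """Return (page number, offset within page) for a given page size."""
    if page_size not in PAGE_SIZES.values():
        raise ValueError("unsupported page size")
    return address // page_size, address % page_size


sample = 0x00123456  # hypothetical address
for name, size in PAGE_SIZES.items():
    page, offset = split_address(sample, size)
    print(f"{name}: page 0x{page:X}, offset 0x{offset:X}")
# 1 KByte: page 0x48D, offset 0x56
# 4 KByte: page 0x123, offset 0x456
# 64 KByte: page 0x12, offset 0x3456
# 1 MByte: page 0x1, offset 0x23456
```

Larger pages cover more memory per mapping, while smaller pages allow finer-grained protection.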

 
