SH-4
The Jewel in SEGA's Crown?
There is little doubt that the SH-4 is the best CPU Sega could have chosen for the Dreamcast. It has even been said that Sega had a hand in deciding which extra features would go on the chip. It is easy to see why Hitachi would ask Sega what it wanted in a next-generation SH series chip, since Sega is one of its best customers. Sega has sold around 10 million Saturns so far, which means that Hitachi has sold Sega roughly 20 million SH-2s and 10 million SH-1s. Hitachi certainly does not want to lose a customer like Sega.

The main change in the SH-4 compared to the earlier SH-3 and SH-2 is the inclusion of a floating point unit that is configured for matrix math. Graphics transformations require many matrix calculations, so the SH-4 should excel at them. According to Hitachi documents, the SH-4 can handle an impressive 5 million polygons per second, which would allow games to use very complex and detailed polygon models. No CPU currently in the PC world comes close to the floating point power of the SH-4.
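To make the "matrix math" point concrete, here is a minimal sketch of the kind of calculation a graphics transformation involves: multiplying a 4x4 matrix by a 4-element vertex. The function name `transform` and the sample translation matrix are illustrative assumptions, not taken from Hitachi's documentation, and the code says nothing about how the SH-4 itself schedules the work.

```python
# Illustrative only: a 4x4 matrix times a 4-element vertex, the basic
# operation behind a 3D graphics transformation.

def transform(matrix, vertex):
    """Multiply a 4x4 matrix (a list of 4 rows) by a 4-element vertex."""
    return [sum(row[i] * vertex[i] for i in range(4)) for row in matrix]

# Hypothetical example: move a point by (10, 20, 30).
translate = [
    [1.0, 0.0, 0.0, 10.0],
    [0.0, 1.0, 0.0, 20.0],
    [0.0, 0.0, 1.0, 30.0],
    [0.0, 0.0, 0.0, 1.0],
]
point = [1.0, 2.0, 3.0, 1.0]

print(transform(translate, point))  # [11.0, 22.0, 33.0, 1.0]
```

Each vertex costs 16 multiplications and 12 additions in this form, which is why a floating point unit built for this pattern matters so much for polygon throughput.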
Two SH-4 Versions
| Clock Rate | 200 MHz | 167 MHz |
| Model No. | HD6417750BP200 | HD6417750F167 |
| Integer Performance (Avg.) | 360 MIPS | 300 MIPS |
| Floating Point Performance (Max.) | 1.4 GFlops | 1.169 GFlops |
| Bus Size | 64 Bits | 32 Bits |
| Bus Bandwidth | 800 MBytes/second | 334 MBytes/second |
| Package | 256-pin ball grid array (BGA) | 208-pin quad flat package (QFP) |
| Pricing (10,000 units) | 4,000 yen (US$31.70)¹ | 3,000 yen (US$23.80)¹ |
| Availability (samples) | January, 1998 | January, 1998 |
| Availability (production quantities) | 3rd Quarter 1998 | 3rd Quarter 1998 |
(1) Based on the yen to US dollar exchange rate on November 19th, 1997.
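As a quick sanity check on the table, the peak floating point figures can be divided by the clock rates to see how many floating point operations per clock they imply. This is plain arithmetic on the numbers above. The helper name `flops_per_clock` is an assumption made for this sketch.

```python
# Derive floating point operations per clock cycle from the table's
# peak GFlops and clock rate figures.

def flops_per_clock(peak_gflops, clock_mhz):
    """Return peak floating point operations per clock cycle."""
    return peak_gflops * 1e9 / (clock_mhz * 1e6)

print(round(flops_per_clock(1.4, 200), 2))    # 7.0
print(round(flops_per_clock(1.169, 167), 2))  # 7.0
```

Both versions work out to the same 7 operations per clock, so the difference in peak floating point performance between the two parts comes entirely from the clock rate.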
SH-4 Specifications
- 200 MHz
- 360 integer MIPS (Dhrystone 1.1 benchmark)
- 32-bit integer unit
- 2-way superscalar
- 5-stage pipeline
- 8 KByte instruction cache
- 16 KByte data cache
- 64-bit floating point unit
- 1.4 GFlops (0.9 GFlops sustained), 5-million polygon capability
- 64-bit external bus (256-pin package)
- 800 MBytes/second bus bandwidth with 100 MHz SDRAM
- Glueless bus memory interface to SGRAM and SDRAM
- Internal power of 1.8 V / 3.3 V (I/O)
- 1.5 W (typ.) heat dissipation (at 200 MHz)
- 0.25 µm, four-layer metal CMOS process
- 42.25 mm² die size
- 208-pin quad flat package (QFP) or 256-pin ball grid array (BGA) package
SH-4 Peripherals
As the diagram above shows, the SH-4 comes with a wealth of on-chip peripherals: an interrupt controller (INTC), three versatile timers (TMU), a real-time clock (RTC), two serial interface channels (SCI), a user break controller (UBC), and a programmable power management controller. The direct memory access controller (DMAC) has four channels. The DMAC is excellent for moving blocks of memory around with almost no CPU intervention, which makes for efficient transfers of data from main memory to graphics memory, for example.
16-bit Instructions
The instructions on the SH-4 are fixed-length and 16 bits in size. This yields code that is up to 40 percent smaller than on RISC processors that use 32-bit fixed-length instructions. The MIPS architecture, for example, uses 32-bit fixed-length instructions, so a program that was 4 Megabytes in size on a MIPS CPU would be only about 2.5 Megabytes on the SH-4. A second advantage of 16-bit instructions is that they lower the bandwidth needed to fetch instructions onto the chip, leaving more bandwidth for moving data on and off the chip. A third advantage is that the on-chip instruction cache does not have to be as large as that of a comparable RISC processor with 32-bit fixed-length instructions. This allows a smaller chip, which is cheaper to produce and generates less heat.

Software written for the previous generations of the SH series, the SH-1, SH-2 and SH-3, will be able to run on the SH-4. This could allow easy ports of Saturn software to the Dreamcast, but it is too early to tell how easy they would be.
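The code-size and cache arguments above come down to simple arithmetic, sketched here. The helper names are assumptions made for illustration. The 40 percent figure is the article's best case, and the 8 KByte cache size is the SH-4 instruction cache size quoted in the specifications.

```python
# Arithmetic behind the 16-bit instruction claims.

def shrunken_size_mb(size_mb, savings):
    """Program size after a fractional code-size saving (0.40 = 40 percent)."""
    return size_mb * (1.0 - savings)

def savings_fraction(size_32bit_mb, size_16bit_mb):
    """Fractional saving implied by two program sizes."""
    return 1.0 - size_16bit_mb / size_32bit_mb

def instructions_in_cache(cache_bytes, instruction_bits):
    """How many fixed-length instructions fit in a cache of this size."""
    return cache_bytes * 8 // instruction_bits

print(round(shrunken_size_mb(4.0, 0.40), 2))  # 2.4
print(savings_fraction(4.0, 2.5))             # 0.375
print(instructions_in_cache(8 * 1024, 16))    # 4096
print(instructions_in_cache(8 * 1024, 32))    # 2048
```

A full 40 percent saving on 4 Megabytes would give about 2.4 Megabytes, so the 2.5 Megabyte example corresponds to a 37.5 percent saving, inside the "up to 40 percent" range. The last two lines show why the cache can be smaller: the same 8 KBytes hold twice as many 16-bit instructions as 32-bit ones.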
Superscalar
So what does it mean for a chip to be superscalar, and is the SH-4 a superscalar CPU? A superscalar processor can execute two or more instructions at the same time. The instructions on the SH-4 can be classified into four groups: integer, simple integer/load/store, branch and floating point. Any two instructions can be processed in parallel as long as they come from different groups. An integer and a floating point instruction can be processed in parallel, for example, but two branch instructions cannot. The SH-4 is therefore a superscalar CPU, and this flexibility in its design makes it a very powerful processor.
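The pairing rule described above can be written down as a tiny model. This sketch encodes only the simplified rule stated in this article (two instructions pair if they come from different groups). The group names and the function `can_issue_together` are assumptions made for illustration, and real hardware may impose further restrictions, such as data dependencies between the two instructions, that are ignored here.

```python
# Simplified model of the article's dual-issue rule: two instructions can
# be processed in parallel only if they belong to different groups.

GROUPS = {"integer", "simple_integer_load_store", "branch", "floating_point"}

def can_issue_together(group_a, group_b):
    """Return True if the two instruction groups may pair under this rule."""
    if group_a not in GROUPS or group_b not in GROUPS:
        raise ValueError("unknown instruction group")
    return group_a != group_b

print(can_issue_together("integer", "floating_point"))  # True
print(can_issue_together("branch", "branch"))           # False
```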
External Data Path
The external data path on the SH-4 in the 256-pin package can be as wide as 64 bits, whereas the SH-4 in the 208-pin package is limited to an external data path of 32 bits. The SH-4 is a very flexible architecture in that the external data path width can be set to 8, 16, 32 or 64 bits. This allows the SH-4 to boot from an economical 8-bit ROM, for example, and then use high-speed 64-bit SDRAM. The external bus unit on the SH-4 provides a glueless interface to SGRAM, SDRAM, EDO DRAM, fast page DRAM, SRAM and MROM, which helps keep costs down for systems designed around this chip. A system that uses 100 MHz SDRAM with the SH-4 can see transfer rates of up to 800 MBytes/second over the 64-bit data bus. For comparison, the SH-2 in the Saturn, running at 28.6 MHz with a 32-bit data bus, had a transfer rate of up to 114.4 MBytes/second. That is quite a difference between the two chips. The SH-4's 64-bit bus, with its 800 MBytes/second of transfer speed, will lend itself well to 2D games. The Dreamcast will be a 2D/3D powerhouse thanks to the inclusion of the SH-4.
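The transfer rates quoted above follow from bus width times bus clock, assuming one transfer per clock. A minimal sketch of that arithmetic, with a helper name chosen for illustration:

```python
# Peak bus bandwidth = (bus width in bytes) x (bus clock in MHz),
# assuming one transfer per bus clock.

def peak_bandwidth_mbytes(bus_bits, clock_mhz):
    """Return peak transfer rate in MBytes/second."""
    return bus_bits / 8 * clock_mhz

print(peak_bandwidth_mbytes(64, 100))             # 800.0  (SH-4, 100 MHz SDRAM)
print(round(peak_bandwidth_mbytes(32, 28.6), 1))  # 114.4  (Saturn SH-2)
```

These are peak figures. Sustained rates in a real system would be lower because of memory latency and bus contention.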
Cache
The SH-4 has 24 KBytes of total cache, which is 6 times larger than the 4 KByte cache of the SH-2 used in the Saturn. The 24 KBytes consist of an 8 KByte instruction cache and a 16 KByte data cache. This segmented (split) cache design provides higher performance than a unified cache design, because instruction fetches and data accesses do not compete for the same cache space. The 4 KByte cache on the Saturn's SH-2s is a unified cache, where instructions and data have to share the same 4 KByte space. The data cache on the SH-4 can operate in write-back (WB) or write-through (WT) mode.
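The difference between the two data cache modes can be illustrated with a toy model that counts how many writes actually reach main memory. This is a deliberately tiny sketch: the "cache" holds a single line identified by one address, which is an assumption made for brevity and not the SH-4's real cache organization. The function name `memory_writes` is likewise hypothetical.

```python
# Toy comparison of write-through (WT) and write-back (WB) behavior.
# The cache here holds exactly one line, keyed by address.

def memory_writes(stores, mode):
    """Count writes that reach main memory for a sequence of store addresses."""
    if mode not in ("WT", "WB"):
        raise ValueError("mode must be 'WT' or 'WB'")
    writes = 0
    cached = None   # address currently held in the one-line cache
    dirty = False   # True if the cached line was modified but not written out
    for addr in stores:
        if mode == "WT":
            writes += 1          # every store goes straight to memory
            continue
        if cached != addr:
            if dirty:
                writes += 1      # write out the modified line being replaced
            cached = addr
        dirty = True             # the store modifies the cached line
    if mode == "WB" and dirty:
        writes += 1              # final flush of the last modified line
    return writes

stores = [0x100] * 5 + [0x200] * 3
print(memory_writes(stores, "WT"))  # 8
print(memory_writes(stores, "WB"))  # 2
```

Write-back saves bus traffic when the same location is stored to repeatedly, while write-through keeps main memory up to date at all times.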
Memory Management Unit
The MMU on the SH-4 provides full Microsoft Windows CE compatibility. It supports page sizes of 1 KByte, 4 KByte, 64 KByte and 1 MByte, which Windows CE can use to partition memory and to provide memory protection between the different processes executing on the SH-4. Memory protection is important because it keeps different execution threads from interfering with each other's memory spaces, which could otherwise cause the operating system or an application to crash.
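To show what a page size means in practice, this sketch splits an address into a page number and an offset within the page for each of the four page sizes listed above. It illustrates the general idea of paging only. The function name `split_address` and the sample address are assumptions, and the code does not model the SH-4's actual translation hardware.

```python
# Split an address into (page number, offset) for each supported page size.

PAGE_SIZES = {"1K": 1 << 10, "4K": 1 << 12, "64K": 1 << 16, "1M": 1 << 20}

def split_address(virtual_address, page_size):
    """Return (page_number, offset); page_size must be a power of two."""
    offset_bits = page_size.bit_length() - 1
    page_number = virtual_address >> offset_bits
    offset = virtual_address & (page_size - 1)
    return page_number, offset

addr = 0x123456  # hypothetical address
for name, size in PAGE_SIZES.items():
    page, offset = split_address(addr, size)
    print(name, hex(page), hex(offset))
# The 4K line prints: 4K 0x123 0x456
```

Larger pages cover more memory per translation entry, while smaller pages allow finer-grained protection between processes.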