Technical Overview of GPU (Proteus 5)
developed by Parims
since 2007

Distance-Dependent Environment Mapping on a Semi-Specular Surface Object (rendered by K. Ikeda, Canon Inc.)
Table 1 lists the brief specifications of Proteus_5.
| Functions | Performance | Remarks |
| Programmable Engine | Embedded | see Multimedia Engine |
| Programmable Shader | Embedded | see Programmable Shader |
| Shading, Light-source | | Shading models are user programmable. |
| Texture Mapping, Simultaneous Texture Mapping | | Environment Cube |
| Hair Generator | HW | Spline |
| Antialias | HW | Silhouette-line & Full screen |
| Shadow | HW | Silhouette-line soft shadow, Omnidirectional shadow casting |
| Volume Metric Renderer, Fog | | incl. Local view transformation, 700 million density primitives / sec at 1GHz |
| Video Capture | HW | NTSC, HDTV |
| Performance per channel, Pixel (Rasterizer) | | at 1GHz operating frequency |
| Sound | Dolby Sounds, 3D Acoustics | 8 R/L channels |
| Power | < 1.2W max. | |
| Scalability | Yes | No limitation |
PROTEUS 5 has the following features.
PROTEUS 5 implements many functions in pure hardware modules, which are claimed to run hundreds of times faster than embedded software and tens of times faster than commercially available products. Combining the embedded software of the multimedia engine with the reconfigurable hardware of the renderer offers both flexibility and incomparable performance across a wide range of applications.
Figure 1 shows an example of the overall block diagram of a single-chip multimedia processor.

Figure 1 A Single Chip Multimedia Processor
The IF/AP module in Figure 1 includes, e.g., PCI, I2S, a specific network IF, and a camera IF. PROTEUS 5 mainly consists of three IP submodules: the multimedia engine (PE), the hardware renderer, and the rasterizer (included in the Video Mixer in Figure 1). These modules handle coarse-, medium-, and fine-grained processing, respectively.
The following functions could be embedded in the Engine.
1.1 Reconfigurable Arithmetic Operation Macro-Module
The Multimedia Engine executes complex, high-order arithmetic operations with a single instruction. The following arithmetic units are interconnected by writable microcode and configured into application-specific functions.
The four register files load 8 variables into the above units in parallel. The following arithmetic operations run in a single clock cycle. Thus, the multiplication of a 4 x 4 matrix by 3D homogeneous coordinates (x, y, z, w) is carried out in 4 clock cycles.
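The four-cycle figure is consistent with producing one output component per cycle as a four-term multiply-accumulate on 8 operands (four matrix elements and four vector components). The sketch below models that schedule in Python. The row-per-cycle schedule and the names `dot4` and `transform` are assumptions made for illustration; the source does not describe how the hardware actually sequences the operation.

```python
def dot4(row, vec):
    """One modelled 'cycle': a four-term multiply-accumulate on 8 operands."""
    return row[0] * vec[0] + row[1] * vec[1] + row[2] * vec[2] + row[3] * vec[3]


def transform(matrix, vec):
    """Multiply a 4 x 4 matrix by (x, y, z, w), counting one step per row."""
    result = []
    cycles = 0
    for row in matrix:
        result.append(dot4(row, vec))
        cycles += 1
    return tuple(result), cycles


# Translation by (10, 20, 30) applied to the point (1, 2, 3) with w = 1.
TRANSLATE = [
    [1, 0, 0, 10],
    [0, 1, 0, 20],
    [0, 0, 1, 30],
    [0, 0, 0, 1],
]
point, cycles = transform(TRANSLATE, (1, 2, 3, 1))
```

Here `cycles` only counts modelled steps; it is not a measurement of the real device.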

Figure 2 Programmable Function Generator
The FPP unit includes a trigonometric or exponential function that runs in a single clock cycle, and two clock cycles suffice for more complex equations that combine the above equations with the following functions. By recursively storing the results in the register files, still more complex equations can be solved in a few clock cycles. These functions are often used for geometric coordinate transformation, vector manipulation, and filtering. Although the latency varies with the formula, this module automatically adjusts the synchronization of the input variables, which drastically improves the utilization of the units.
The Function Specific unit is optionally provided with IP modules that perform functions for specific purposes. Figure 3 shows an example of a hardware clipper embedded in this unit. Implemented in hardware, the clipper achieves a throughput of one vector per clock cycle.

Figure 3 Hardware Clipper Embedded in Function Specific Unit
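As a software reference for what such a clipper computes, the following Python sketch clips a line segment against the view volume -w <= x, y, z <= w in homogeneous coordinates. It is a generic textbook formulation (outcodes plus parametric intersection), not the PROTEUS 5 circuit; the function names and the choice of clip volume are assumptions.

```python
def boundary_coords(p):
    """Signed distances of (x, y, z, w) to the six planes -w <= x, y, z <= w."""
    x, y, z, w = p
    return (w + x, w - x, w + y, w - y, w + z, w - z)


def outcode(p):
    """6-bit code with one bit set per violated clip plane (0 means inside)."""
    code = 0
    for bit, bc in enumerate(boundary_coords(p)):
        if bc < 0:
            code |= 1 << bit
    return code


def clip_segment(p0, p1):
    """Clip the segment p0-p1 to the view volume; None if nothing is visible."""
    if outcode(p0) & outcode(p1):
        return None  # both endpoints lie beyond the same plane

    def lerp(t):
        return tuple(a + t * (b - a) for a, b in zip(p0, p1))

    t_in, t_out = 0.0, 1.0
    for bc0, bc1 in zip(boundary_coords(p0), boundary_coords(p1)):
        if bc0 < 0:    # segment enters this half-space
            t_in = max(t_in, bc0 / (bc0 - bc1))
        elif bc1 < 0:  # segment leaves this half-space
            t_out = min(t_out, bc0 / (bc0 - bc1))
    if t_in > t_out:
        return None
    return lerp(t_in), lerp(t_out)
```

For example, `clip_segment((0, 0, 0, 1), (2, 0, 0, 1))` keeps the half of the segment that lies inside the plane x = w.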
1.2 Pipeline Interconnection of PE Architecture
Without exception, 3D objects are rendered through a viewing transformation. The viewing transformation comprises rotation, translation, scaling, perspective projection, clipping, and viewport mapping. In a medium-grained 3D system, the workload of this processing dominates rendering performance. PROTEUS5 has a hardware accelerator specialized for the 3D viewing transformation to enable realtime rendering of millions of polygons per second. In practice it produces 0.7 G transformed vertices / sec. for 4 x 4 matrix multiplication without clipping, and 200 million / sec. with 3D polygon clipping (in 4D homogeneous coordinates at 1 GHz) under perspective projection. This performance may be superior to that of commercially available high-end chips. Figure 4 shows the hardware accelerator unit of PROTEUS5. The matrix and vector multiplications are carried out in the Multimedia Engine.

Figure 4 Pipelined PE Architecture
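The stages of the viewing transformation can be sketched in Python as a chain of 4 x 4 matrices followed by a perspective divide and viewport mapping. This is a minimal illustration under assumed conventions (column vectors, an OpenGL-style projection looking down -z, a raster origin at the top left); none of the function names or parameter values come from the PROTEUS 5 documentation, and clipping is omitted here.

```python
import math


def matmul(a, b):
    """Product of two 4 x 4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, v):
    """Transform the homogeneous column vector v by matrix m."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


def rotation_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]


def perspective(fov_y, aspect, near, far):
    """OpenGL-style projection: camera looks down -z, clip space is [-w, w]."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
            [0, 0, -1, 0]]


def to_viewport(clip, width, height):
    """Perspective divide, then map x, y from [-1, 1] to pixel coordinates."""
    x, y, z, w = clip
    ndc_x, ndc_y = x / w, y / w
    # y is flipped so that the raster origin is the top-left corner.
    return ((ndc_x + 1) * 0.5 * width, (1 - ndc_y) * 0.5 * height, z / w)


# Compose once per object; each vertex then costs one matrix-vector product.
mvp = matmul(perspective(math.radians(90), 1.0, 1.0, 100.0),
             matmul(translation(0, 0, -5),
                    matmul(rotation_y(0.0), scaling(2.0))))
sx, sy, depth = to_viewport(apply(mvp, [1, 1, 0, 1]), 640, 480)
```

Composing the matrices once per object is the usual reason the per-vertex cost reduces to a single 4 x 4 matrix-vector multiplication.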
The renderer consists of multiple function-specific hardware modules. All modules are interconnected in parallel or pipelined structures, as shown in Figure 5. The pixels produced by these modules are transferred to the pixel cache and the frame buffers. The Renderer includes the following sub-modules:

Figure 5 Graphics Renderer
PROTEUS 5 not only has a powerful hardware renderer, as shown in Figure 5, but also supports polygon-fill interpolation of physically based variables. By defining the vectors directly on the polygon vertices, the Multimedia Engine can reduce its workload to hundreds of times less than that of Gouraud or texture-based rendering. The vertex variables of a polygon in PROTEUS5 are shown in Table 2.1.
| Variables | Format (Shader) | Format (Renderer) |
| 3D Geometry Coordinate | FP(23 bits) x, y, z, w | Int. 16-bit(x, y, w), 32-bit (z) |
| Texture Coordinate | FP(23 bits) u, v | Int. 16 max. |
| View Direction | FP(23 bits) Vx, Vy, Vz | Int. 16-bit |
| Surface Normal | FP(23 bits) Nx, Ny, Nz | Int. 16-bit |
| Light-Source Incident | FP(23 bits) Lx, Ly, Lz | Int. 16-source max. |
| Primary Color | R, G, B, A | Int. 10-bit each |
| Secondary Color | R, G, B, A | Int. 10-bit each |
| Surface Tangent | FP(23 bits) Sx, Sy, Sz | Int. 16-bit each |
| Light-Source Coordinate | FP(23 bits) Xl, Yl, Zl | Int. 16-bit (x, y), 32-bit (z) 4 max. |
| World Coordinate | FP(23 bits) xw, yw, zw | Int. 16-bit |
| Bump up Vector | Int. 16-bit | |
| Transparency | FP(23 bits) | Int. 16-bit |
| Attenuation | FP(23 bits) | Int. 16-bit |
| Refraction Vector | FP(23 bits) | Int. Tx, Ty, Tz |
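To illustrate what polygon-fill interpolation of a vertex vector means, the Python sketch below blends three per-vertex surface normals across a triangle with barycentric weights and renormalizes the result for each pixel. The interpolation scheme is an assumption chosen for illustration (screen-space, not perspective-corrected); the source does not state which scheme the renderer uses, and all names are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def barycentric(p, a, b, c):
    """Barycentric weights of the 2D point p in the triangle a, b, c."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_vector(weights, va, vb, vc):
    """Blend three per-vertex vectors and renormalize the result."""
    blended = tuple(weights[0] * x + weights[1] * y + weights[2] * z
                    for x, y, z in zip(va, vb, vc))
    return normalize(blended)


# Screen-space triangle with a different surface normal at each vertex.
A, B, C = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
NA, NB, NC = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
pixel_normal = interpolate_vector(barycentric((4 / 3, 4 / 3), A, B, C), NA, NB, NC)
```

The same blend applies to any vector row of Table 2.1, such as the view direction or the surface tangent.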
Table 2.2 shows the definitions of the texture patterns. In Table 2.1, the texture coordinate has two variables. PROTEUS5 generates the three-dimensional texture coordinate (e.g., for environment cube mapping) from the vectors at the vertices.
| Pattern Contents | Remarks |
| Color | R, G, B, A 10-bit each |
| Bump Map Normal | Bx, By, Bz 10-bit each |
| Surface Tangent | Sx, Sy, Sz 10-bit each |
| Environment Map | R, G, B, I 8-bit each |
| Reflectance Coefficient | N |
| Mask Bit | 1/0 |
| Threshold | R, G, B |
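One way a three-dimensional texture coordinate for environment cube mapping can be derived from vertex vectors is to reflect the view direction about the surface normal and then select a cube face from the largest component of the result. The Python sketch below does this using the face layout of OpenGL cube maps; that layout, the reflection step, and the function names are assumptions, since the source does not specify how PROTEUS5 addresses the cube.

```python
def reflect(normal, view):
    """Mirror the unit view vector (pointing away from the surface) about normal."""
    n_dot_v = sum(n * v for n, v in zip(normal, view))
    return tuple(2.0 * n_dot_v * n - v for n, v in zip(normal, view))


def cube_map_lookup(direction):
    """Return (face, s, t) for a 3D direction, with s and t in [0, 1]."""
    rx, ry, rz = direction
    ax, ay, az = abs(rx), abs(ry), abs(rz)
    if max(ax, ay, az) == 0:
        raise ValueError("direction must be non-zero")
    # The component with the largest magnitude selects the face.
    if ax >= ay and ax >= az:
        face, sc, tc, ma = ("+x", -rz, -ry, ax) if rx > 0 else ("-x", rz, -ry, ax)
    elif ay >= az:
        face, sc, tc, ma = ("+y", rx, rz, ay) if ry > 0 else ("-y", rx, -rz, ay)
    else:
        face, sc, tc, ma = ("+z", rx, -ry, az) if rz > 0 else ("-z", -rx, -ry, az)
    # Project onto the face and remap from [-1, 1] to [0, 1].
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)


# A viewer looking straight down the normal sees the +z face centre.
face, s, t = cube_map_lookup(reflect((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))
```

Because only a direction is needed, no two-dimensional (u, v) pair has to be stored per vertex for the environment map.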
For mobile phones, lower precision and resolution for the vertex variables are acceptable. PROTEUS5 provides a structure optimized for each application.
2.2 Programmable Illuminator
PROTEUS 5 provides a programmable illuminator for various light-reflection models, from empirical models such as Phong shading to physical models such as Cook-Torrance, general BRDF, anisotropic BRDF, and BSSRDF. It is implemented in pure hardware and produces shaded pixels in a pipelined engine. The PROTEUS5 illuminator avoids ad hoc image rendering and the heavy workload of texture-mapping-based approaches. Figure 6 shows the programmable illuminator.

Figure 6 Example of Cook-Torrance Configuration in Programmable Illuminator
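For readers unfamiliar with the model named in Figure 6, the following Python sketch evaluates a Cook-Torrance specular term in the commonly used microfacet form D * F * G / (4 (n.l)(n.v)). It uses the Beckmann distribution for D and, as a deliberate simplification, Schlick's approximation in place of the full Fresnel equation. The parameter names `roughness` and `f0` are assumptions; this is a software reference for the formula, not a description of the PROTEUS 5 pipeline.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def cook_torrance_specular(n, v, l, roughness, f0):
    """Specular reflectance for unit vectors n (normal), v (view), l (light)."""
    n_l, n_v = dot(n, l), dot(n, v)
    if n_l <= 0.0 or n_v <= 0.0:
        return 0.0  # light or viewer is below the surface
    h = normalize(tuple(a + b for a, b in zip(v, l)))  # half vector
    n_h, v_h = dot(n, h), dot(v, h)

    # D: Beckmann microfacet distribution, roughness = RMS facet slope.
    m2 = roughness * roughness
    cos2 = n_h * n_h
    d = math.exp((cos2 - 1.0) / (cos2 * m2)) / (math.pi * m2 * cos2 * cos2)

    # F: Schlick's approximation, with f0 the reflectance at normal incidence.
    f = f0 + (1.0 - f0) * (1.0 - v_h) ** 5

    # G: geometric attenuation from microfacet masking and shadowing.
    g = min(1.0, 2.0 * n_h * n_v / v_h, 2.0 * n_h * n_l / v_h)

    return d * f * g / (4.0 * n_l * n_v)


# Light and viewer both along the normal of a fairly rough dielectric.
UP = (0.0, 0.0, 1.0)
head_on = cook_torrance_specular(UP, UP, UP, roughness=0.5, f0=0.04)
```

The inputs map directly onto the Surface Normal, View Direction, and Light-Source Incident rows of Table 2.1.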
The Pixel Manipulator carries out post-synthesis between frame buffers at video timing. It includes the following functions.
PROTEUS5 implements a video controller that provides video timing generation, cursors, gamma compensation, and the frame buffer interface. Post-synthesis of gaseous and surface-defined objects and post-antialiasing based on the boundary edge method are also carried out in the final stage of video scanning.
The video capture module captures video streams at various video rates, such as NTSC and HDTV, in realtime. The captured video stream may be mapped onto the image buffer after being scaled or filtered in the BitBlock Transfer module. Figure 7 shows the structure of the frame buffer system.
Proteus_5 is derived from Processor Technologies of Unlimited Scalability. “5” implies five phantasmagoric figures to handle text, graphics, video, audio and mixed reality.