Technical Overview of the GPU (Proteus 5), developed by Parims since 2007



Distance-Dependent Environment Mapping on a Semi-Specular Surface Object (rendered by K. Ikeda, Canon Inc.)


  Proteus 5 is a visual and audio processor that gives mobile devices powerful acceleration for multimedia computing. Its super-parallel pipeline structures and visual technologies, built on empirical and physically based models, make it possible to augment mixed-reality scenes in real time. Combined with various IPs, Proteus 5 promises a cost/performance ratio superior to that of conventional commercial products in the following applications.

  Table 1 lists the brief specifications of Proteus 5.

Table 1. Proteus 5 Hardware Specification
Functions | Performance | Remarks
Programmable Engine | Embedded | see Multimedia Engine
Programmable Shader | Embedded | see Programmable Shader
Shading | | Shading models are user programmable.
  Light-source | 8 max. |
  Flat/Gouraud and Phong | HW* |
  Cook and Torrance | HW |
  BRDF / BSSRDF | HW | Pure BSSRDF. 4096 sampling points max.
  Anisotropic Reflection | HW |
  Fine-Grained Bump Shading | HW |
  Sub-surface Reflection | HW |
  Spot and Directional Lighting | HW | 8 light-sources
Texture Mapping | | Rotation with surface normal is carried out by HW
  Simultaneous Texture Mapping | 8 max. |
  Filtering | Tri-linear, Anisotropic |
  Environment Mapping | HW | Environment Cube
  Refraction Mapping | HW |
  Global Illumination Mapping | HW |
  Bump Mapping | HW |
  Surface Tangent Mapping | HW |
Hair Generator | HW | Spline
Antialias | HW | Silhouette-line & Full screen
Shadow | HW | Silhouette-line soft shadow; omnidirectional shadow casting
Volumetric Renderer | | incl. local view transformation; 700 million density primitives / sec at 1 GHz
  Fog | HW |
  Cloud | HW |
  Smoke/Steam | HW |
Video Capture | HW | NTSC, HDTV
Performance per channel | | at 1 GHz operating frequency
  Pixel (Rasterizer) | 250 million / sec max. |
  Triangle Polygon (View pipeline) | 150 million / sec max. |
Sound | Dolby Sounds, 3D Acoustics | 8 R/L channels
Power | < 1.2 W max., 250 mW typical |
Scalability | Yes | No limitation
Remark: HW: pure hardware circuit

PROTEUS 5 has the following features.


  PROTEUS 5 implements many functions as pure hardware modules, which are claimed to run hundreds of times faster than equivalent embedded software and tens of times faster than commercially available products. Combining the embedded software of the multimedia engine with the reconfigurable hardware of the renderer offers both flexibility and high performance across a wide range of applications.
 Figure 1 shows an example of the overall block diagram of a single-chip multimedia processor.


Figure 1 A Single Chip Multimedia Processor

 The IF/AP module in Figure 1 includes, for example, PCI, I2S, a specific network IF, and a camera IF. PROTEUS 5 mainly consists of three IP submodules: the multimedia engine (PE), the hardware renderer, and the rasterizer (included in the Video Mixer in Figure 1). These modules handle coarse-, medium- and fine-grained processing respectively, as follows.


1. Multimedia Engine

 The Multimedia Engine has a 32-bit processor architecture based on a writable stored microprogram (128-bit horizontal field). It performs graphics acceleration and audio / acoustic rendering at 4G-IPC. Its brief specifications are as follows.
  1. 32-bit Microprogram Structure with 32K program steps of 128-bit field
  2. 128 x 32-bit and 64 x 88-bit register files
  3. Integer ALU and FPP units (with Multiplier, Divider, Adder/Subtracter, Programmable function generator (e.g., sine, cosine, tangent, exponent), Integer to Floating-point converter, Floating-point to Integer converter, Integer Multiplier, and Barrel shifter)
  4. Programmable Function unit
  5. 256K Data Cache Memory
  6. External Memory IF (DDR, SRAM up to 4Gb)

 The following functions can be embedded in the Engine.

  1. OpenGL based Instruction sets
  2. Multi-Window Transformation
  3. Polygonizer
  4. Polygon Subdivider
  5. Collision Detector
  6. Fuzzy Object Generator
  7. Per Vertex Illuminator
  8. 5.1 Dolby Sounds
  9. 3D Acoustics

1.1 Reconfigurable Arithmetic Operation Macro-Module

 The Multimedia Engine executes complex, high-degree arithmetic operations with a single instruction. The following arithmetic units are interconnected by writable microcode and configured into application-specific functions.

 The four register files load 8 variables into the above units in parallel, and the following arithmetic operations run in a single clock cycle. Thus, the multiplication of a 4 x 4 matrix by 3D homogeneous coordinates (x, y, z, w) is carried out in 4 clock cycles.
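 The 4-cycle figure can be illustrated with a small software model. This is only a sketch under an assumption the text does not state: that the 8 variables loaded per cycle are the 4 elements of one matrix row plus the 4 vector components, so that each cycle yields one output component. The function name `mat4_mul_vec4` and the cycle counter are illustrative, not part of PROTEUS 5.

```python
def mat4_mul_vec4(matrix, vector):
    """Multiply a 4 x 4 matrix by (x, y, z, w), one matrix row per modeled cycle."""
    result = []
    cycles = 0
    for row in matrix:
        # Assumed operand mapping: 4 row elements + 4 vector components = 8 variables.
        operands = list(row) + list(vector)
        assert len(operands) == 8
        # Four multiplies feeding an adder tree give one output component.
        result.append(sum(a * b for a, b in zip(row, vector)))
        cycles += 1
    return result, cycles


# Hypothetical example: translate the point (1, 1, 1) by (2, 3, 4).
translation = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
point, cycles = mat4_mul_vec4(translation, [1.0, 1.0, 1.0, 1.0])
print(point, cycles)
```

 In this model the cycle count equals the number of matrix rows, which matches the 4 clock cycles quoted above.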

 


Figure 2 Programmable Function Generator

 The FPP unit includes trigonometric and exponential functions that each run in a single clock cycle; two clock cycles suffice for more complex equations that combine these with the following functions. By recursively storing results into the register files, still more complex equations can be solved in a few clock cycles. These functions are often used for geometric coordinate transformation, vector manipulation and filtering. Although the latency varies with the formula, the module automatically synchronizes the input variables, which drastically improves the operation rate of the units.

  The Function Specific unit is optionally provided with IP modules that perform functions for specific purposes. Figure 3 shows an example of a hardware clipper embedded in this unit. Implemented in hardware, the clipper processes one vector per clock cycle.


Figure 3 Hardware Clipper Embedded in Function Specific Unit

1.2 Pipeline Interconnection of PE Architecture

  Without exception, 3D objects are rendered through a viewing transformation, which comprises rotation, translation, scaling, perspective projection, clipping and viewport mapping. In a 3D system at medium grain, the workload of this processing dominates rendering performance. PROTEUS 5 has a hardware accelerator specialized for the 3D viewing transformation to enable real-time rendering of millions of polygons per second. In practice it produces 0.7 G transformed vertices / sec. for 4 x 4 matrix multiplication without clipping, and 200 million / sec. with 3D polygon clipping (in 4D homogeneous coordinates at 1 GHz) under perspective projection. This performance may be superior to that of commercially available high-end chips on the market. Figure 4 shows the hardware accelerator unit of PROTEUS 5. The matrix and vector multiplications are carried out in the Multimedia Engine.
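  The stages listed above can be sketched in software as follows. This is a minimal illustration, not the PROTEUS 5 data path: it assumes OpenGL-style clip-space conventions (a vertex is inside when -w <= x, y, z <= w), and the projection matrix, viewport size and helper names (`transform`, `outcode`, `to_viewport`) are hypothetical.

```python
def transform(matrix, vertex):
    """Apply a 4 x 4 matrix to a homogeneous vertex (x, y, z, w)."""
    return [sum(a * b for a, b in zip(row, vertex)) for row in matrix]


def outcode(clip):
    """Bit mask of violated clip planes; 0 means the vertex is inside the view volume."""
    x, y, z, w = clip
    tests = (x < -w, x > w, y < -w, y > w, z < -w, z > w)
    return sum(1 << bit for bit, outside in enumerate(tests) if outside)


def to_viewport(clip, width, height):
    """Perspective divide followed by viewport mapping (y grows downward)."""
    x, y, z, w = clip
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    return ((ndc_x + 1) * 0.5 * width, (1 - ndc_y) * 0.5 * height, (ndc_z + 1) * 0.5)


# Assumed perspective projection: near = 1, far = 3, 90-degree field of view.
projection = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -2.0, -3.0],
    [0.0, 0.0, -1.0, 0.0],
]

inside = transform(projection, [0.0, 0.0, -2.0, 1.0])
outside = transform(projection, [3.0, 0.0, -2.0, 1.0])
print(outcode(inside), to_viewport(inside, 640, 480))
print(outcode(outside))
```

  Testing the outcode before the perspective divide is what makes clipping in 4D homogeneous coordinates attractive: vertices outside the view volume are identified with comparisons only, before any division is performed.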


Figure 4 Pipelined PE Architecture


2. Graphics Renderer

2.1 Renderer

 The renderer consists of multiple function-specific hardware modules. All modules are interconnected in parallel or pipelined structures, as shown in Figure 5. The pixels produced in these modules are transferred to the pixel cache and the frame buffers. The Renderer includes the following sub-modules:


Figure 5 Graphics Renderer

 PROTEUS 5 not only has the powerful hardware renderer shown in Figure 5, but also supports polygon-fill interpolation of physically based variables. By defining the vectors directly on the polygon vertices, the Multimedia Engine can reduce its workload to hundreds of times less than that of Gouraud or texture-based rendering. The vertex variables of a polygon in PROTEUS 5 are shown in Table 2.1.
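 The idea of interpolating vertex vectors rather than vertex colors can be sketched as follows. This is an illustrative model only, assuming barycentric weights inside a triangle and a simple diffuse term; the function names are hypothetical and the source does not specify the interpolation scheme used by the hardware.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def interpolate(vertex_vectors, weights):
    """Blend one per-vertex vector with barycentric weights that sum to 1."""
    size = len(vertex_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vertex_vectors)) for i in range(size)]


def shade_pixel(normals, light_dirs, weights):
    """Diffuse intensity from the interpolated normal and light-incident vectors."""
    n = normalize(interpolate(normals, weights))
    l = normalize(interpolate(light_dirs, weights))
    return max(0.0, sum(a * b for a, b in zip(n, l)))


# Hypothetical triangle whose vertex normals point along x, y and z.
normals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lights = [[0.0, 0.0, 1.0]] * 3
center = (1 / 3, 1 / 3, 1 / 3)
print(shade_pixel(normals, lights, center))
```

 In such a scheme the engine only supplies the vectors at the three vertices; the per-pixel blending and shading are left to the renderer, which is consistent with the workload reduction described above.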

Table 2.1 Variables of Polygon Vertex
Variables | Format (Shader) | Format (Renderer)
3D Geometry Coordinate | FP(23 bits) x, y, z, w | Int. 16-bit (x, y, w), 32-bit (z)
Texture Coordinate | FP(23 bits) u, v | Int. 16 max.
View Direction | FP(23 bits) Vx, Vy, Vz | Int. 16-bit
Surface Normal | FP(23 bits) Nx, Ny, Nz | Int. 16-bit
Light-Source Incident | FP(23 bits) Lx, Ly, Lz | Int. 16-source max.
Primary Color | R, G, B, A | Int. 10-bit each
Secondary Color | R, G, B, A | Int. 10-bit each
Surface Tangent | FP(23 bits) Sx, Sy, Sz | Int. 16-bit each
Light-Source Coordinate | FP(23 bits) Xl, Yl, Zl | Int. 16-bit (x, y), 32-bit (z) 4 max.
World Coordinate | FP(23 bits) xw, yw, zw | Int. 16-bit
Bump up Vector | | Int. 16-bit
Transparency | FP(23 bits) | Int. 16-bit
Attenuation | FP(23 bits) | Int. 16-bit
Refraction Vector | FP(23 bits) | Int. Tx, Ty, Tz

 Table 2.2 shows the definitions of the texture patterns. In Table 2.1, the texture coordinate has two variables (u, v). PROTEUS 5 generates three-dimensional texture coordinates (e.g., for environment cube mapping) using the vectors of the vertices.

Table 2.2. Texture Patterns
Pattern | Contents | Remarks
Color | R, G, B, A | 10-bit each
Bump Map Normal | Bx, By, Bz | 10-bit each
Surface Tangent | Sx, Sy, Sz | 10-bit each
Environment Map | R, G, B, I | 8-bit each
Reflectance Coefficient | N |
Mask Bit | 1/0 |
Threshold | R, G, B |
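
 As an illustration of how a three-dimensional texture coordinate for environment cube mapping can be derived from vertex vectors, the sketch below reflects the view direction about the surface normal and selects a cube face by the major axis of the result. The face layout follows the OpenGL cube-map convention, which is an assumption here; the source does not describe the layout PROTEUS 5 uses, and the function names are hypothetical.

```python
def reflect(view, normal):
    """Reflect the unit surface-to-eye vector about the unit surface normal."""
    d = sum(v * n for v, n in zip(view, normal))
    return [2 * d * n - v for v, n in zip(view, normal)]


def cube_lookup(r):
    """Map a direction to (face, s, t) with s and t in [0, 1]."""
    x, y, z = r
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, major = ("+x", ax) if x > 0 else ("-x", ax)
        sc, tc = (-z if x > 0 else z), -y
    elif ay >= az:
        face, major = ("+y", ay) if y > 0 else ("-y", ay)
        sc, tc = x, (z if y > 0 else -z)
    else:
        face, major = ("+z", az) if z > 0 else ("-z", az)
        sc, tc = (x if z > 0 else -x), -y
    return face, (sc / major + 1) * 0.5, (tc / major + 1) * 0.5


# A viewer looking straight down the normal sees the center of the +z face.
print(cube_lookup(reflect([0.0, 0.0, 1.0], [0.0, 0.0, 1.0])))
```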

 For mobile phones, lower precision and resolution of the vertex variables are acceptable. PROTEUS 5 provides a structure optimized for each application.

2.2 Programmable Illuminator

 PROTEUS 5 provides a programmable illuminator that supports various light-reflection models, both empirical models such as Phong shading and physical models such as Cook / Torrance, general BRDF, anisotropic BRDF and BSSRDF. It is implemented in pure hardware and produces shaded pixels in a pipelined engine. The PROTEUS 5 illuminator avoids ad hoc image rendering and the heavy workload of texture mapping. Figure 6 shows our programmable illuminator.


Figure 6 Example of Cook-Torrance Configuration in Programmable Illuminator
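
 For readers unfamiliar with the Cook-Torrance model, its specular term can be sketched in software as below. This is a textbook-style illustration, not the configuration in Figure 6: it assumes a Beckmann distribution, the Cook-Torrance geometric attenuation term, Schlick's approximation for the Fresnel factor (a later simplification), and the 4 (N.L)(N.V) normalization common in later literature. The parameter names `roughness` and `f0` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]


def cook_torrance_specular(n, v, l, roughness, f0):
    """Specular reflectance for unit normal n, view vector v and light vector l."""
    n_l, n_v = dot(n, l), dot(n, v)
    if n_l <= 0.0 or n_v <= 0.0:
        return 0.0  # light or viewer is below the surface
    h = normalize([a + b for a, b in zip(v, l)])  # half vector
    n_h, v_h = dot(n, h), dot(v, h)
    # Beckmann microfacet distribution D.
    m2, cos2 = roughness * roughness, n_h * n_h
    d = math.exp((cos2 - 1.0) / (m2 * cos2)) / (math.pi * m2 * cos2 * cos2)
    # Geometric attenuation G (masking and shadowing).
    g = min(1.0, 2.0 * n_h * n_v / v_h, 2.0 * n_h * n_l / v_h)
    # Fresnel factor F, Schlick's approximation.
    f = f0 + (1.0 - f0) * (1.0 - v_h) ** 5
    return d * f * g / (4.0 * n_l * n_v)


up = [0.0, 0.0, 1.0]
print(cook_torrance_specular(up, up, up, roughness=0.5, f0=0.04))
```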


3. Rasterizer

 The Pixel Manipulator carries out post-synthesis between frame buffers at video timing. It includes the following functions.

  1. Video and Cursor Generator
  2. Frame Buffer Interface (DDR)
  3. Video Capture
  4. DAC Interface

 PROTEUS 5 implements a video controller that generates video timing and cursors and handles gamma compensation and the frame buffer interface. Post-synthesis between gaseous and surface-defined objects and post-antialiasing based on the boundary edge method are also carried out at the final stage of video scanning.
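
 Gamma compensation at video output is commonly realized as a lookup table applied to each color channel. The sketch below assumes 10-bit channels (matching the color precision in Table 2.1) and a display gamma of 2.2; both the table-based approach and the gamma value are assumptions for illustration, not details given for PROTEUS 5.

```python
def build_gamma_lut(gamma=2.2, bits=10):
    """Return a table mapping linear channel codes to gamma-compensated codes."""
    max_code = (1 << bits) - 1
    return [round(max_code * (code / max_code) ** (1.0 / gamma)) for code in range(max_code + 1)]


lut = build_gamma_lut()
# Dark input codes are lifted the most; the end points stay fixed.
print(lut[0], lut[64], lut[512], lut[1023])
```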

 The video capture module captures video streams at various video rates, such as NTSC and HDTV, in real time. The captured video stream may be mapped onto the image buffer after being scaled or filtered in the BitBlock Transfer module. Figure 7 shows the structure of the frame buffer system.


Figure 7 Frame Buffer Structure



The name Proteus_5 is derived from "Processor Technologies of Unlimited Scalability". The "5" refers to the five phantasmagoric figures it handles: text, graphics, video, audio and mixed reality.