We are expanding the Spiral program generation system to generate fast code for the Cell BE. Our first targets are linear transforms, most importantly, the discrete Fourier transform (DFT).
The Cell Broadband Engine (Cell BE) is a chip multiprocessor designed for high-density floating-point computation. As shown in the figure below, its design includes multiple SIMD vector cores called SPEs (synergistic processing elements) with large register files. Each SPE has its own local memory (local store), and transfers between main memory and the local stores must be handled explicitly by the programmer. These and other characteristics make it difficult to program the Cell BE and to achieve high performance on it.
The Cell BE has a theoretical peak single-precision floating-point performance of 204.8 Gflop/s using just the SPEs. The most affordable way of obtaining a Cell BE is to buy a PlayStation 3 (PS3). However, only 6 SPEs in the PS3 are accessible to the programmer.
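The 204.8 Gflop/s figure can be reproduced with a short calculation. The sketch below is illustrative only and rests on stated assumptions: 8 SPEs, the 3.2 GHz clock reported for the experimental platforms, 4-way single-precision SIMD, and one fused multiply-add per lane per cycle counted as 2 flops. The function name `peak_gflops` and its parameters are hypothetical and are not part of Spiral.

```python
def peak_gflops(num_spes, clock_ghz=3.2, simd_lanes=4, flops_per_lane=2):
    """Theoretical peak in Gflop/s.

    Assumptions (not stated in the text above): 4 single-precision SIMD
    lanes per SPE, and one fused multiply-add per lane per cycle,
    counted as 2 floating-point operations.
    """
    return num_spes * simd_lanes * flops_per_lane * clock_ghz


if __name__ == "__main__":
    # 8 SPEs on a full Cell BE, 6 programmer-accessible SPEs on a PS3.
    for label, spes in (("full Cell BE", 8), ("PS3", 6)):
        print(f"{label}: {spes} SPEs -> {peak_gflops(spes):.1f} Gflop/s")
```

Under the same assumptions, the 6 accessible SPEs of a PS3 give a peak of about 153.6 Gflop/s, which is the relevant upper bound when reading PS3 results.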
Our experiments were conducted on Sony's PlayStation 3 (Cell processor at 3.2 GHz, 6 available SPEs) and the IBM Cell Blade QS20 (we used a single Cell processor with 8 SPEs). The plots show the performance of generated code for the 1D and 2D discrete Fourier transform (DFT) for various sizes and two input formats. The plots indicate where the input and output vectors are assumed to be resident: local stores (LS) or main memory. This is ongoing work.
Copyrights to many of the above papers are held by the publishers. The attached PDF files are preprints. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder. Some links to papers above connect to IEEE Xplore with permission from IEEE, and viewers must follow all of IEEE's copyright policies.
Contact: Srinivas Chellappa (schellap@andrew.cmu, you have to add dot edu)