NeuroMatrix® NMC3 DSP Core
NeuroMatrix® Core 3 (NMC3) is a high-performance DSP core that combines VLIW, SIMD, and decoupled architectures. The core includes a 32-bit RISC processor and a 64-bit vector coprocessor that supports vector operations on elements of variable bit length. The NMC3 core has been silicon-proven in the NM6405 DSP. It offers scalable performance and an original instruction set, and it is program-compatible with its predecessors, NMC and NMC2.
Documentation
General features
- 32/64-bit proprietary RISC-core
- 64-bit Vector coprocessor (Patent US 6,539,368 B1).
- Small equivalent gate count: 160,000
- Clock frequency:
  - 150 MHz (0.25 µm CMOS technology)
  - 320 MHz (0.09 µm CMOS technology)
- Accelerated coefficient loading into the Vector coprocessor: one coefficient vector per clock cycle
- Up to 6 concurrent read/write operations per clock cycle
- Hardware subroutine return accelerator (interrupt processing)
- Address registers are modified in the first clock cycle of multi-cycle input/output instructions, which speeds up the next instruction when it uses the same address register
- A queue mechanism embedded in the NMC3 pipeline improves performance when synchronous internal and external memory banks with different pipeline depths are used together
- A single address generator shared by many input/output instructions reduces processor core complexity
RISC processor features
- Data width - 32 bit
- Instruction width - 32 and 64 bit
- Address space - 4G × 32 bit
- 8-stage pipeline with queues for working simultaneously with several synchronous memory banks, each with a different pipeline depth
- 3 scalar instructions per clock cycle (ALU operation, address modification, input/output operation)
- Performance - 150 MIPS (450 MOPS)
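The two performance figures above agree under a simple reading: at 150 MHz the core issues one instruction word per clock (150 MIPS), and each word can carry the three scalar operations listed (450 MOPS). The sketch below only reproduces that arithmetic. The one-word-per-clock interpretation and all names are assumptions made for illustration, not vendor definitions.

```python
# Assumption: one instruction word is issued per clock cycle, and each word
# can carry up to three scalar operations (ALU, address modification, I/O).
CLOCK_MHZ = 150                  # clock frequency quoted for the 0.25 um process
WORDS_PER_CYCLE = 1              # hypothetical issue rate used for this estimate
OPERATIONS_PER_WORD = 3          # ALU + address modification + input/output


def peak_mips(clock_mhz: int) -> int:
    """Millions of instruction words per second at the given clock."""
    return clock_mhz * WORDS_PER_CYCLE


def peak_mops(clock_mhz: int) -> int:
    """Millions of scalar operations per second at the given clock."""
    return peak_mips(clock_mhz) * OPERATIONS_PER_WORD


if __name__ == "__main__":
    print(peak_mips(CLOCK_MHZ), "MIPS,", peak_mops(CLOCK_MHZ), "MOPS")
```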
Vector coprocessor features
- SIMD (single-instruction-multiple-data) architecture
- programmable data length from 2 to 64 bits (elements packed into 64-bit data words)
- basic operation is multiplication of an integer matrix by an integer matrix
- concurrent execution of 2 saturation operations with input data flow
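To make the packed-data idea concrete, the following sketch packs signed integers of programmable width into a single 64-bit word and unpacks them again. It is a software model only: the field order (element 0 in the low bits), the two's-complement encoding, and the function names `pack` and `unpack` are assumptions for illustration and are not taken from the NMC3 documentation.

```python
WORD_BITS = 64  # width of one packed data word


def pack(values, widths):
    """Pack signed integers into one 64-bit word.

    widths[i] is the bit length of values[i]; the widths must sum to 64.
    Element 0 is placed in the least significant bits (an assumption).
    """
    if len(values) != len(widths):
        raise ValueError("values and widths must have the same length")
    if sum(widths) != WORD_BITS:
        raise ValueError("field widths must sum to 64 bits")
    word = 0
    shift = 0
    for value, width in zip(values, widths):
        lo = -(1 << (width - 1))
        hi = (1 << (width - 1)) - 1
        if not lo <= value <= hi:
            raise ValueError(f"{value} does not fit in {width} signed bits")
        # Mask to the field width to get the two's-complement bit pattern.
        word |= (value & ((1 << width) - 1)) << shift
        shift += width
    return word


def unpack(word, widths):
    """Split a 64-bit word back into signed integers of the given widths."""
    values = []
    shift = 0
    for width in widths:
        field = (word >> shift) & ((1 << width) - 1)
        if field >= 1 << (width - 1):   # sign bit set: value is negative
            field -= 1 << width
        values.append(field)
        shift += width
    return values


if __name__ == "__main__":
    widths = [16, 16, 16, 16]
    word = pack([1, -2, 3, -4], widths)
    print(hex(word), unpack(word, widths))
```

Shorter fields mean more elements per 64-bit word, which is why narrow data allows more operations per clock cycle.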
NMC3 Vector coprocessor performance
The Vector coprocessor, based on a SIMD architecture, works on packed integer data organized in 64-bit blocks made up of variable-length words of 1 to 64 bits. An application can start with a massive input data flow of short words and run at maximum performance. Subsequent operations increase precision, with dynamic compression of the data flow and a corresponding decrease in performance. To avoid arithmetic overflow, the NMC3 uses two types of saturation functions with user-programmable saturation boundaries.
Peak performance (MAC - multiplications and accumulations per clock cycle):
- 2 MAC for 32-bit data;
- 4 MAC for 16-bit data;
- 24 MAC for 8-bit data;
- 80 MAC for 4-bit data;
- 224 MAC for 2-bit data.
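As a rough illustration of saturation with a programmable boundary, the sketch below clamps values to the range of a signed field of a chosen bit length, which is what prevents wrap-around when wide intermediate results are narrowed. It is a generic clamp written for illustration; it does not model the two specific saturation functions of the NMC3, and the names `saturate` and `saturate_vector` are hypothetical.

```python
def saturate(value, bits):
    """Clamp value to the signed range of a field that is `bits` wide.

    `bits` plays the role of a programmable saturation boundary
    (an assumption for this sketch, not the hardware definition).
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def saturate_vector(values, bits):
    """Apply the same saturation boundary to every element of a vector."""
    return [saturate(v, bits) for v in values]


if __name__ == "__main__":
    # Sums of products that no longer fit in 8 bits are clamped, not wrapped.
    accumulated = [200, -200, 5, 127, -129]
    print(saturate_vector(accumulated, 8))
```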
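The per-cycle figures above can be turned into peak throughput by multiplying by the clock frequency. The sketch below does this for the two clock frequencies quoted in the general features list. It assumes that the same MAC-per-cycle counts apply at both frequencies, which the source does not state, and the resulting numbers are theoretical peaks rather than measured results.

```python
# MAC operations per clock cycle by element width in bits, from the list above.
MAC_PER_CYCLE = {32: 2, 16: 4, 8: 24, 4: 80, 2: 224}

# Clock frequencies in MHz quoted for the 0.25 um and 0.09 um processes.
CLOCKS_MHZ = (150, 320)


def peak_mmacs(element_bits, clock_mhz):
    """Theoretical peak in millions of MAC operations per second."""
    return MAC_PER_CYCLE[element_bits] * clock_mhz


if __name__ == "__main__":
    for bits in sorted(MAC_PER_CYCLE, reverse=True):
        row = ", ".join(f"{peak_mmacs(bits, mhz)} MMAC/s at {mhz} MHz"
                        for mhz in CLOCKS_MHZ)
        print(f"{bits:>2}-bit data: {row}")
```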
Applications
- Sonar and radar
- IR and video processing
- Artificial neural network emulation
- Navigation
- CDMA and TDMA base stations
- Vector and matrix computations
The NMC3 processor core is available both as a hard IP block (on customer request) and as a synthesizable Verilog RTL model with a set of functional tests and user guides.
RC "Module" supplies the NM-SDK Version 3.0 software design kit, which includes an optimizing C++ compiler (ISO/IEC 14882:1998 standard), an assembler, a disassembler, a linker, a debugger, and the real-time DSP and NeuroMatrix® Processing Library (NMPL). The compiler adheres to the C++ standard, including templates, and uses enhanced optimization algorithms that increase program execution speed and reduce code size. The assembly language has an intuitive syntax close to that of high-level languages, which simplifies writing and understanding source code for math-intensive real-time algorithms.
The single-DSP NM6405 PCI Evaluation Board can be used for software design.