
Embedded Systems Neural Network Stream Processing Core (NnSP)

Er. Jnana Ranjan Tripathy, Dr. Hrudaya Kumar Tripathy


Dedicated parallel hardware that exploits the native parallelism and interaction locality of neural networks is essential for their effective use in time-critical applications. The architecture proposed in this paper is a parallel stream processor called the Neural Networks Stream Processor (NnSP), which can be programmed to realize different neural-network topologies and architectures. NnSP is a collection of programmable processing engines organized in a custom FIFO-based cache architecture and busing system. Streams of synaptic data flow through the parallel processing elements, and computations are performed according to the instructions embedded in the preambles of the data streams. The command and configuration words embedded in the preamble of a stream program each processing element to perform a desired computation on the upcoming data. The packetized nature of the stream architecture gives NnSP a high degree of flexibility and scalability. The stream processor is synthesized targeting an ASIC standard cell library for SoC implementation and is also realized on Xilinx Virtex-II Pro SoPC beds. A neural network employed for mobile robot navigation control is implemented on the realized SoPC hardware. The speedups achieved by this realization are presented.
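The preamble-driven programming model described in the abstract can be illustrated with a small software sketch. The abstract does not give NnSP's actual command-word encoding, instruction set, or FIFO and bus organization, so everything below is a hypothetical stand-in: the opcode names (`OP_MAC`, `OP_DIST`), the `Packet` layout, the activation table, and the round-robin dispatch in `run_stream` are assumptions chosen only to show a preamble configuring a processing element before the payload arrives.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical opcodes; the real NnSP command words are not given in the abstract.
OP_MAC = "MAC"    # weighted sum of inputs (multiply-accumulate)
OP_DIST = "DIST"  # squared Euclidean distance between weights and inputs

# Hypothetical activation table selected by a configuration word.
ACTIVATIONS = {
    "linear": lambda x: x,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


@dataclass
class Packet:
    """One stream packet: a preamble (first three fields) followed by data."""
    opcode: str            # command word
    weights: List[float]   # configuration words
    activation: str        # configuration word selecting the output function
    payload: List[float]   # synaptic data that follows the preamble


class ProcessingElement:
    """A processing element that is reprogrammed by each packet's preamble."""

    def __init__(self) -> None:
        self.opcode = None
        self.weights: List[float] = []
        self.activation = "linear"

    def configure(self, packet: Packet) -> None:
        # The preamble programs the element for the upcoming data.
        self.opcode = packet.opcode
        self.weights = packet.weights
        self.activation = packet.activation

    def compute(self, payload: List[float]) -> float:
        if self.opcode == OP_MAC:
            acc = sum(w * x for w, x in zip(self.weights, payload))
        elif self.opcode == OP_DIST:
            acc = sum((w - x) ** 2 for w, x in zip(self.weights, payload))
        else:
            raise ValueError(f"unknown opcode: {self.opcode!r}")
        return ACTIVATIONS[self.activation](acc)

    def process(self, packet: Packet) -> float:
        self.configure(packet)
        return self.compute(packet.payload)


def run_stream(packets: List[Packet], n_pes: int = 2) -> List[float]:
    """Dispatch packets to processing elements round-robin (an assumption)."""
    pes = [ProcessingElement() for _ in range(n_pes)]
    return [pes[i % n_pes].process(p) for i, p in enumerate(packets)]


if __name__ == "__main__":
    stream = [
        Packet(OP_MAC, [1.0, 2.0], "linear", [3.0, 4.0]),
        Packet(OP_DIST, [1.0, 2.0], "linear", [3.0, 4.0]),
    ]
    print(run_stream(stream))
```

Because each packet carries its own configuration, the same element can evaluate a weighted-sum neuron for one packet and a distance-based neuron for the next, which is the kind of flexibility the abstract attributes to the packetized stream design. The sketch runs sequentially and does not model the hardware parallelism or timing.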


Neural Networks; Stream Processors; Parallel Processing; SoC Implementation




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.