Low Overhead Monitor Mechanism for Fault-tolerant Analysis of NoC

J Liu, Jim Harkin, Yuhua Li, Liam Maguire, Alejandro Linares-Barranco

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Modern Networks-on-Chip (NoCs) can tolerate and adapt to faults and failures in the hardware. Monitoring and debugging them is a real challenge because of the complexity and large scale of NoC systems. A key requirement is an evaluation and benchmarking mechanism to quantitatively analyse an NoC system’s fault-tolerant capability. A novel monitoring mechanism is proposed to evaluate the fault-tolerant capability of an NoC by: (1) using a compact monitor probe to detect the events of each NoC node; (2) reusing the existing NoC infrastructure to communicate analysis data back to a terminal PC, which removes the need for additional hardware resources and maintains hardware scalability; and (3) calculating throughput and the number of lost/corrupted packets, and generating a heat map of NoC traffic for quantitative analysis. The paper presents results from a case study using an example fault-tolerant routing algorithm and highlights the minimal area overhead of the monitoring mechanism (~6%). Results demonstrate that the proposed online monitoring strategy is highly scalable due to the compact monitor probe and the ability to reuse the existing NoC communication infrastructure. In addition, the traffic heat map generation and throughput display demonstrate benefits in aiding NoC system prototyping and debugging.
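As an illustration of the quantitative analysis named in step (3) of the abstract, the following is a minimal sketch of how a terminal PC might turn probe reports into throughput, lost/corrupted packet counts, and a traffic heat map. The abstract does not specify the event format or the exact metrics, so the `ProbeEvent` record, its fields, the `summarise` function, and the packets-per-cycle definition of throughput are all assumptions made for this example, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProbeEvent:
    """Hypothetical record reported by one monitor probe (not from the paper)."""
    node: Tuple[int, int]  # (row, col) position of the NoC node in a 2D mesh
    kind: str              # assumed event types: "received", "lost", "corrupted"
    cycle: int             # clock cycle at which the probe saw the event


def summarise(events: List[ProbeEvent], rows: int, cols: int,
              window_cycles: int) -> Dict[str, object]:
    """Aggregate probe events into the three quantities listed in the abstract."""
    counts = Counter(e.kind for e in events)

    # Heat map: number of packets received at each node of the mesh.
    heat_map = [[0] * cols for _ in range(rows)]
    for e in events:
        if e.kind == "received":
            r, c = e.node
            heat_map[r][c] += 1

    return {
        # Assumed definition: received packets per clock cycle in the window.
        "throughput": counts["received"] / window_cycles,
        "lost": counts["lost"],
        "corrupted": counts["corrupted"],
        "heat_map": heat_map,
    }


if __name__ == "__main__":
    sample = [
        ProbeEvent((0, 0), "received", 1),
        ProbeEvent((0, 1), "received", 2),
        ProbeEvent((0, 1), "received", 3),
        ProbeEvent((1, 1), "lost", 4),
        ProbeEvent((1, 0), "corrupted", 5),
    ]
    print(summarise(sample, rows=2, cols=2, window_cycles=10))
```

In this sketch the heat map counts only successfully received packets per node; a real monitor could equally well count forwarded or injected packets, which the abstract leaves open.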
Language: English
Title of host publication: Unknown Host Publication
Number of pages: 8
Publication status: Published - 25 Sep 2014
Event: IEEE 8th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-14) - Aizu-Wakamatsu, Japan
Duration: 25 Sep 2014 → …

Conference

Conference: IEEE 8th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-14)
Period: 25/09/14 → …

Fingerprint

Monitoring
Hardware
Throughput
Network-on-chip
Benchmarking
Routing algorithms
Scalability
Display devices
Communication
Chemical analysis
Hot Temperature

Keywords

  • Networks-on-chip
  • fault tolerance

Cite this

Liu, J., Harkin, J., Li, Y., Maguire, L., & Linares-Barranco, A. (2014). Low Overhead Monitor Mechanism for Fault-tolerant Analysis of NoC. In Unknown Host Publication
Liu, J ; Harkin, Jim ; Li, Yuhua ; Maguire, Liam ; Linares-Barranco, Alejandro. / Low Overhead Monitor Mechanism for Fault-tolerant Analysis of NoC. Unknown Host Publication. 2014.
@inproceedings{fd0b4a91a6084ed9a4ae4d02879b2196,
title = "Low Overhead Monitor Mechanism for Fault-tolerant Analysis of {NoC}",
abstract = "Modern Networks-on-Chip (NoCs) can tolerate and adapt to faults and failures in the hardware. Monitoring and debugging them is a real challenge because of the complexity and large scale of NoC systems. A key requirement is an evaluation and benchmarking mechanism to quantitatively analyse an NoC system’s fault-tolerant capability. A novel monitoring mechanism is proposed to evaluate the fault-tolerant capability of an NoC by: (1) using a compact monitor probe to detect the events of each NoC node; (2) reusing the existing NoC infrastructure to communicate analysis data back to a terminal PC, which removes the need for additional hardware resources and maintains hardware scalability; and (3) calculating throughput and the number of lost/corrupted packets, and generating a heat map of NoC traffic for quantitative analysis. The paper presents results from a case study using an example fault-tolerant routing algorithm and highlights the minimal area overhead of the monitoring mechanism ($\sim$6{\%}). Results demonstrate that the proposed online monitoring strategy is highly scalable due to the compact monitor probe and the ability to reuse the existing NoC communication infrastructure. In addition, the traffic heat map generation and throughput display demonstrate benefits in aiding NoC system prototyping and debugging.",
keywords = "Networks-on-chip, fault tolerance",
author = "J Liu and Jim Harkin and Yuhua Li and Liam Maguire and Alejandro Linares-Barranco",
year = "2014",
month = "9",
day = "25",
language = "English",
booktitle = "Unknown Host Publication",

}

Liu, J, Harkin, J, Li, Y, Maguire, L & Linares-Barranco, A 2014, Low Overhead Monitor Mechanism for Fault-tolerant Analysis of NoC. in Unknown Host Publication. IEEE 8th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-14), 25/09/14.

Low Overhead Monitor Mechanism for Fault-tolerant Analysis of NoC. / Liu, J; Harkin, Jim; Li, Yuhua; Maguire, Liam; Linares-Barranco, Alejandro.

Unknown Host Publication. 2014.


TY - GEN

T1 - Low Overhead Monitor Mechanism for Fault-tolerant Analysis of NoC

AU - Liu, J

AU - Harkin, Jim

AU - Li, Yuhua

AU - Maguire, Liam

AU - Linares-Barranco, Alejandro

PY - 2014/9/25

Y1 - 2014/9/25

N2 - Modern Networks-on-Chip (NoCs) can tolerate and adapt to faults and failures in the hardware. Monitoring and debugging them is a real challenge because of the complexity and large scale of NoC systems. A key requirement is an evaluation and benchmarking mechanism to quantitatively analyse an NoC system’s fault-tolerant capability. A novel monitoring mechanism is proposed to evaluate the fault-tolerant capability of an NoC by: (1) using a compact monitor probe to detect the events of each NoC node; (2) reusing the existing NoC infrastructure to communicate analysis data back to a terminal PC, which removes the need for additional hardware resources and maintains hardware scalability; and (3) calculating throughput and the number of lost/corrupted packets, and generating a heat map of NoC traffic for quantitative analysis. The paper presents results from a case study using an example fault-tolerant routing algorithm and highlights the minimal area overhead of the monitoring mechanism (~6%). Results demonstrate that the proposed online monitoring strategy is highly scalable due to the compact monitor probe and the ability to reuse the existing NoC communication infrastructure. In addition, the traffic heat map generation and throughput display demonstrate benefits in aiding NoC system prototyping and debugging.

AB - Modern Networks-on-Chip (NoCs) can tolerate and adapt to faults and failures in the hardware. Monitoring and debugging them is a real challenge because of the complexity and large scale of NoC systems. A key requirement is an evaluation and benchmarking mechanism to quantitatively analyse an NoC system’s fault-tolerant capability. A novel monitoring mechanism is proposed to evaluate the fault-tolerant capability of an NoC by: (1) using a compact monitor probe to detect the events of each NoC node; (2) reusing the existing NoC infrastructure to communicate analysis data back to a terminal PC, which removes the need for additional hardware resources and maintains hardware scalability; and (3) calculating throughput and the number of lost/corrupted packets, and generating a heat map of NoC traffic for quantitative analysis. The paper presents results from a case study using an example fault-tolerant routing algorithm and highlights the minimal area overhead of the monitoring mechanism (~6%). Results demonstrate that the proposed online monitoring strategy is highly scalable due to the compact monitor probe and the ability to reuse the existing NoC communication infrastructure. In addition, the traffic heat map generation and throughput display demonstrate benefits in aiding NoC system prototyping and debugging.

KW - Networks-on-chip

KW - fault tolerance

M3 - Conference contribution

BT - Unknown Host Publication

ER -

Liu J, Harkin J, Li Y, Maguire L, Linares-Barranco A. Low Overhead Monitor Mechanism for Fault-tolerant Analysis of NoC. In Unknown Host Publication. 2014