Online Traffic-Aware Fault Detection for Networks-on-Chip

Research output: Contribution to journal (Article)

31 Citations (Scopus)

Abstract

A key requirement for modern Networks-on-Chip (NoC) is the ability to detect and diagnose faults and failures. This paper addresses the challenge of fault diagnosis using online testing, where the interruption of the runtime operation (performance) under diagnosis is minimised. A novel Monitor Module (MM) is proposed to detect NoC interconnect faults while minimising intrusion on regular NoC traffic throughput by (1) using a channel tester which only examines NoC channels when they are idle; and (2) using a testing interval parameter based on the Binary Exponential Backoff algorithm to dynamically balance the level of testing when recovering from temporary faults. The paper presents results showing the minimal impact on NoC throughput for a range of testing conditions and also highlights the minimal area overhead of the MM (11.56%) compared with an adaptive NoC router implemented on FPGA hardware. Simulation results demonstrate non-intrusion on the NoC runtime traffic throughput when channels are fault-free, and also how throughput loss is minimised when faults are identified.
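
The two mechanisms named in the abstract (testing only idle channels, and a testing interval driven by binary exponential backoff) can be illustrated with a small sketch. This is not the paper's Monitor Module: the class name `ChannelTestScheduler`, the cycle-based timing, the parameter values, and the choice to double the interval after a failed test and reset it after a passed test are all assumptions made for illustration.

```python
class ChannelTestScheduler:
    """Hypothetical idle-only test scheduler with binary exponential backoff.

    All names and default values here are illustrative assumptions,
    not details taken from the paper.
    """

    def __init__(self, base_interval: int = 4, max_interval: int = 64) -> None:
        self.base_interval = base_interval  # assumed minimum gap between tests (cycles)
        self.max_interval = max_interval    # assumed cap on the backed-off gap (cycles)
        self.interval = base_interval       # current gap between tests
        self.wait = 0                       # cycles left before the next test is due

    def should_test(self, channel_idle: bool) -> bool:
        """Advance one cycle; allow a test only when due and the channel is idle."""
        if self.wait > 0:
            self.wait -= 1
        return channel_idle and self.wait == 0

    def record_result(self, fault_found: bool) -> None:
        """Double the interval after a failed test (up to the cap), else reset it."""
        if fault_found:
            self.interval = min(self.interval * 2, self.max_interval)
        else:
            self.interval = self.base_interval
        self.wait = self.interval


if __name__ == "__main__":
    scheduler = ChannelTestScheduler(base_interval=2, max_interval=8)
    for cycle in range(12):
        if scheduler.should_test(channel_idle=True):
            # Pretend the fault persists for the first few tests.
            scheduler.record_result(fault_found=cycle < 6)
            print(cycle, scheduler.interval)
```

Under these assumptions a busy channel is never tested, so regular traffic is not displaced, and a channel with a persistent fault is retested progressively less often until a test passes again.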
Language: English
Pages: 1984-1993
Number of pages: 10
Journal: Journal of Parallel and Distributed Computing
Volume: 74
Issue number: 1
Early online date: 16 Sep 2013
DOI: 10.1016/j.jpdc.2013.09.001
Publication status: Published - 1 Jan 2014

Fingerprint

Fault detection
Throughput
Testing
Failure analysis
Network-on-chip
Routers
Field programmable gate arrays (FPGA)
Hardware

Cite this

@article{0602b0e924844658880a87bc17f23cdf,
title = "Online Traffic-Aware Fault Detection for Networks-on-Chip",
abstract = "A key requirement for modern Networks-on-Chip (NoC) is the ability to detect and diagnose faults and failures. This paper addresses the challenge of fault diagnosis using online testing, where the interruption of the runtime operation (performance) under diagnosis is minimised. A novel Monitor Module (MM) is proposed to detect NoC interconnect faults while minimising intrusion on regular NoC traffic throughput by (1) using a channel tester which only examines NoC channels when they are idle; and (2) using a testing interval parameter based on the Binary Exponential Backoff algorithm to dynamically balance the level of testing when recovering from temporary faults. The paper presents results showing the minimal impact on NoC throughput for a range of testing conditions and also highlights the minimal area overhead of the MM (11.56{\%}) compared with an adaptive NoC router implemented on FPGA hardware. Simulation results demonstrate non-intrusion on the NoC runtime traffic throughput when channels are fault-free, and also how throughput loss is minimised when faults are identified.",
author = "J Liu and Jim Harkin and Yuhua Li and LP Maguire",
year = "2014",
month = "1",
day = "1",
doi = "10.1016/j.jpdc.2013.09.001",
language = "English",
volume = "74",
pages = "1984--1993",
number = "1",
journal = "Journal of Parallel and Distributed Computing",
}

Liu, J; Harkin, Jim; Li, Yuhua; Maguire, LP. Online Traffic-Aware Fault Detection for Networks-on-Chip. In: Journal of Parallel and Distributed Computing, Vol. 74, No. 1, 01.01.2014, p. 1984-1993.

TY - JOUR

T1 - Online Traffic-Aware Fault Detection for Networks-on-Chip

AU - Liu, J

AU - Harkin, Jim

AU - Li, Yuhua

AU - Maguire, LP

PY - 2014/1/1

Y1 - 2014/1/1

AB - A key requirement for modern Networks-on-Chip (NoC) is the ability to detect and diagnose faults and failures. This paper addresses the challenge of fault diagnosis using online testing, where the interruption of the runtime operation (performance) under diagnosis is minimised. A novel Monitor Module (MM) is proposed to detect NoC interconnect faults while minimising intrusion on regular NoC traffic throughput by (1) using a channel tester which only examines NoC channels when they are idle; and (2) using a testing interval parameter based on the Binary Exponential Backoff algorithm to dynamically balance the level of testing when recovering from temporary faults. The paper presents results showing the minimal impact on NoC throughput for a range of testing conditions and also highlights the minimal area overhead of the MM (11.56%) compared with an adaptive NoC router implemented on FPGA hardware. Simulation results demonstrate non-intrusion on the NoC runtime traffic throughput when channels are fault-free, and also how throughput loss is minimised when faults are identified.

U2 - 10.1016/j.jpdc.2013.09.001

DO - 10.1016/j.jpdc.2013.09.001

M3 - Article

VL - 74

SP - 1984

EP - 1993

IS - 1

ER -