Abstract
Cluster computing, in which a large number of simple processors or nodes are combined to function as what appears to be a single powerful computer, has emerged as a research area in its own right. The approach offers a relatively inexpensive means of providing a fault-tolerant environment and achieving significant computational capability for high-performance computing applications. However, manually managing and configuring a cluster quickly becomes daunting as the cluster grows in size. Autonomic computing, with its vision of self-management, can potentially solve many of the problems inherent in cluster management. We describe the development of a prototype autonomic cluster management system (ACMS) that exploits autonomic properties to automate cluster management, and its evolution to include reflex reactions via pulse monitoring.
Original language | English |
---|---|
Title of host publication | Unknown Host Publication |
Publisher | IEEE |
Pages | 478-482 |
Number of pages | 5 |
Publication status | Published (in print/issue) - 22 Jul 2005 |
Event | Workshop on Reliability and Autonomic Management in Parallel and Distributed Systems (RAMPDS-05) at ICPADS-2005 - Fukuoka, Japan. Duration: 22 Jul 2005 → … |
Workshop
Workshop | Workshop on Reliability and Autonomic Management in Parallel and Distributed Systems (RAMPDS-05) at ICPADS-2005 |
---|---|
Period | 22/07/05 → … |
Keywords
- High performance computing
- NASA
- Power system management
- Prototypes
- Space technology
- Concurrent computing
- Scalability
- Availability
- Computer networks
- Energy management