Computing, navigation, and communication systems are crucial for many critical and autonomous systems with long mission times (e.g., space and subsurface systems, energy and meteorology studies). Reconfigurable hardware, such as Field-Programmable Gate Arrays (FPGAs), is widely used in these application domains because of its flexibility, high performance, absence of Non-Recurring Engineering (NRE) cost, and fast time-to-market. These systems often have stringent reliability requirements. Tolerance of temporary and of permanent failures is equally important, since field replacement or service is not an option. Accumulated failures in such systems can cause excessive aborts and data corruption, leading to catastrophic system failure. However, modern nanoscale FPGAs are vulnerable to various runtime failure mechanisms, such as radiation-induced soft errors, transistor aging, latent manufacturing defects, and increased sensitivity to circuit noise, which seriously threatens the reliability of mission-critical FPGA-based systems.
In this proposal, we pursue highly reliable, recoverable, and available systems based on runtime-reconfigurable platforms. We plan to develop a highly reliable, dynamically reconfigurable platform built from multiple FPGAs that tolerates permanent (hard), temporary (soft), and timing (aging-induced) failures. By taking advantage of the programmability, regularity, and reconfigurability of modern FPGAs, we plan to provide efficient error detection, aging sensing, error localization, bypassing of faulty or aged resources through reconfiguration, and recovery.