== Question ==
How do we best accommodate repair?

== Summary ==
Persistent errors will occur during operation. To survive, the system must at least be prepared for them and able to prevent them from corrupting operation. To optimize performance and survivability, the system must further repair itself, perhaps by configuring around the newly defective element.

== Subquestions ==
 1. What is the right granularity for repair (e.g., core, functional unit, gate)?
 1. What is the right division of responsibility among layers?
 1. What interfaces should be exposed for repair?

== Relevant Scenarios ==
 * [[Scenarios/S1|Scenario 1]]

== Workshop Materials ==
 * [[attachment:Meetings/First/Program/repair.pdf|Workshop 1 Slides]]

== Existing Work ==
 * Lakamraju and Tessier show how to substitute out logic and route around persistent defects in an FPGA. Vijay Lakamraju and Russell Tessier. ''[[http://doi.acm.org/10.1145/329166.329205|Tolerating Operational Faults in Cluster-based FPGAs]],'' in ''Proceedings of the International Symposium on Field-Programmable Gate Arrays,'' pp. 187--194, 2000.
 * The Bullet Proof design allows pipeline stages or processing elements affected by persistent defects to be disabled. Todd Austin, Valeria Bertacco, Scott Mahlke, and Yu Cao. ''[[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4584456|Reliable Systems on Unreliable Fabrics]],'' in ''IEEE Design and Test of Computers,'' volume 25, number 4, pp. 322--332, 2008.
 * ''add additional references here''

== Comments ==
 * ''To comment, please add another bullet to this list.''