<div dir="ltr"><div><div><div><div>Hi,<br><br></div>The recent paper on Software Rejuvenation from Lawrence Bernstein,<br></div>who invented the concept in the 1990s.<br><br></div>Cheers,<br></div>- Ira<br><br><div><div><div><div><div><div><div class="m_7518637161535046750gmail_signature"><div dir="ltr"><div><div dir="ltr">Ira McDonald (Musician / Software Architect)<br>Co-Chair - TCG Trusted Mobility Solutions WG<br>Chair - Linux Foundation Open Printing WG<br>Secretary - IEEE-ISTO Printer Working Group<br>Co-Chair - IEEE-ISTO PWG Internet Printing Protocol WG<br>IETF Designated Expert - IPP & Printer MIB<br>Blue Roof Music / High North Inc<br><a style="color:rgb(51,51,255)" href="http://sites.google.com/site/blueroofmusic" target="_blank">http://sites.google.com/site/<wbr>blueroofmusic</a><br><a style="color:rgb(102,0,204)" href="http://sites.google.com/site/highnorthinc" target="_blank">http://sites.google.com/site/<wbr>highnorthinc</a><br>mailto: <a href="mailto:blueroofmusic@gmail.com" target="_blank">blueroofmusic@gmail.com</a><br>Jan-April: 579 Park Place Saline, MI 48176 <a href="tel:(734)%20944-0094" value="+17349440094" target="_blank">734-944-0094</a><br>May-Dec: PO Box 221 Grand Marais, MI 49839 <a href="tel:(906)%20494-2434" value="+19064942434" target="_blank">906-494-2434</a><br><br><div style="display:inline"></div><div style="display:inline"></div><div style="display:inline"></div><div></div><div></div><div></div><div></div></div></div></div></div></div>
<br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Tatourian, Alan</b> <span dir="ltr"><<a href="mailto:alan.tatourian@intel.com" target="_blank">alan.tatourian@intel.com</a>></span><br>Date: Wed, Aug 9, 2017 at 1:15 AM<br>Subject: [info] Software Rejuvenation<br>To: <br><br><br>
<div lang="EN-US">
<div class="m_7518637161535046750gmail-m_-247593145259451465WordSection1">
<h3>Software Rejuvenation<u></u><u></u></h3>
<p class="MsoNormal">Lawrence Bernstein and Dr. Chandra M. R. Kintala <u></u><u></u></p>
<p class="MsoNormal">Stevens Institute of Technology<u></u><u></u></p>
<p class="MsoNormal">Here is a design approach, called software rejuvenation, that makes software more trustworthy. It is a periodic, pre-emptive restart of a running system to a clean internal state, which prevents latent faults from becoming future failures.
It has been used in systems ranging from a Lucent billing unit to NASA's long-duration space mission to Pluto, and it is implemented in IBM's Netfinity resource manager. It is easy to apply, uses very little central processing unit time, increases software reliability
by two orders of magnitude, and is recommended for all software-intensive systems.</p>
<p class="MsoNormal">Software modules comprise a large part of life- and mission-critical systems. System crashes are more likely to result from a fault in the software than from one in the hardware. Despite our best efforts to remove errors and faults (bugs)
before deploying those systems, it is wise to assume that bugs remain in the system and that those bugs often lead to failures (crashes).</p>
<p class="MsoNormal">Software fault tolerance aims to tolerate those residual faults by building mechanisms that watch for failures and recover from them [1, 2]. Fault tolerance is a reactive approach: failures usually happen at unexpected times, and the
built-in recovery mechanisms kick in to restart the system and the service. However, these unscheduled interruptions in service are expensive and can be life-threatening. This article describes a proactive, preventive technique called
software rejuvenation that prevents faults from becoming failures.</p>
<p class="MsoNormal">Lawrence Bernstein observed in 1990 that faults/bugs, when triggered in software, do not always cause failures/crashes immediately but instead take the system into a state where it begins to decay. The symptoms of this decay include memory leakage, broken
pointers, unreleased file locks, and numerical error accumulation; they cause gradual degradation in service availability and data quality and eventually lead to a failure/crash.</p>
<p class="MsoNormal"><span style="background:rgb(255,242,204) none repeat scroll 0% 0%">Based on this observation, a new method to enhance the dependability of a software system, called software rejuvenation, was introduced in 1995 by Kintala and his colleagues at Bell Labs [1, 3]. Software
rejuvenation is a proactive approach that involves stopping an executing process periodically or when a failure is imminent, cleaning up the internal state of the system, and then restarting it in a known healthy state to prevent a predicted future failure.</span></p>
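<p class="MsoNormal">As a rough illustration of the two triggers just described (a periodic timer and a prediction that failure is imminent), the following Python sketch decides when a process should be rejuvenated. Everything in it is an assumption made for illustration and is not taken from [3]: the names seconds_until_exhaustion and RejuvenationPolicy, the use of a least-squares line through resource-usage samples as the failure predictor, and the threshold parameters.</p>
<pre>
```python
from dataclasses import dataclass


def seconds_until_exhaustion(samples, capacity):
    """Fit a least-squares line to (time_s, used) samples and extrapolate.

    Returns the seconds from the last sample until the fitted line reaches
    `capacity`, or None when there is no upward trend to extrapolate.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # usage is flat or shrinking: no decay trend detected
    intercept = mean_u - slope * mean_t
    time_full = (capacity - intercept) / slope
    return max(0.0, time_full - samples[-1][0])


@dataclass
class RejuvenationPolicy:
    """Hypothetical policy combining a periodic and a predictive trigger."""

    max_uptime_s: float      # periodic trigger: restart after this uptime
    warning_window_s: float  # predictive trigger: restart if exhaustion is this close
    capacity: float          # resource limit, in the same unit as the samples

    def should_rejuvenate(self, uptime_s, samples):
        # Periodic trigger: the process has simply been up long enough.
        if uptime_s >= self.max_uptime_s:
            return True
        # Predictive trigger: the usage trend says failure is imminent.
        remaining = seconds_until_exhaustion(samples, self.capacity)
        return remaining is not None and remaining <= self.warning_window_s
```
</pre>
<p class="MsoNormal">With usage samples of 100, 200, and 300 units taken at 0, 10, and 20 seconds and a capacity of 1000 units, the fitted line reaches capacity 70 seconds after the last sample, so a policy with a 100-second warning window would ask for rejuvenation even though the periodic timer has not expired.</p>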
<p class="MsoNormal">Software rejuvenation is as intuitive as occasionally rebooting your PC, except that it was never defined, implemented, modeled, and analyzed for software systems before 1995 [3]. Shari Pfleeger used the term software rejuvenation to mean,
“…looking back at software work products to try to derive additional information …” in her seminal software engineering book [4]. Her use differs from ours as we focus on the execution of the software during its mission, and she focuses on the software development
process.<u></u><u></u></p>
<p class="MsoNormal">Modeling and Analysis<u></u><u></u></p>
<p class="MsoNormal">Software rejuvenation incurs overhead and should be done at a time when the cost of service interruption is minimal. Hence, modeling the system to find optimal rejuvenation times is crucial. A simple and useful model based on continuous-time
Markov chains was first introduced in [3] to analyze software rejuvenation.</p>
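<p class="MsoNormal">A minimal sketch of such a model follows, assuming a four-state continuous-time Markov chain (robust, failure-probable, failed, rejuvenating) with exponential transition rates. The state names, the rate symbols r1 to r4 and lam, the cost parameters, and the function names are assumptions chosen here for illustration; consult [3] for the authors' own formulation. The steady-state probabilities come from solving the balance equations of this assumed chain by hand.</p>
<pre>
```python
def steady_state(r1, r2, r3, r4, lam):
    """Steady-state probabilities of the assumed four-state chain.

    Rates (per hour):
      r2  : robust -> failure-probable (aging sets in)
      lam : failure-probable -> failed (crash)
      r1  : failed -> robust (repair after a crash)
      r4  : failure-probable -> rejuvenating (rejuvenation is triggered)
      r3  : rejuvenating -> robust (rejuvenation completes)
    """
    # Balance equations, expressed relative to the failure-probable state:
    #   P_robust * r2 = P_prob * (lam + r4)
    #   P_failed * r1 = P_prob * lam
    #   P_rejuv  * r3 = P_prob * r4
    p_prob = 1.0 / (1.0 + (lam + r4) / r2 + lam / r1 + r4 / r3)
    return {
        "robust": p_prob * (lam + r4) / r2,
        "failure_probable": p_prob,
        "failed": p_prob * lam / r1,
        "rejuvenating": p_prob * r4 / r3,
    }


def expected_downtime_cost(r1, r2, r3, r4, lam,
                           cost_failure, cost_rejuvenation, interval_hours):
    """Expected downtime cost over an interval of `interval_hours`.

    The system is down while failed or rejuvenating; each kind of downtime
    is charged at its own hourly cost.
    """
    p = steady_state(r1, r2, r3, r4, lam)
    return interval_hours * (p["failed"] * cost_failure
                             + p["rejuvenating"] * cost_rejuvenation)
```
</pre>
<p class="MsoNormal">In this sketch the expected cost is a ratio of two linear functions of r4, so it is monotone in r4: comparing r4 = 0 (never rejuvenate) with a large r4 (rejuvenate as soon as the system becomes failure-probable) is enough to see which direction lowers the cost for a given set of rates and hourly costs.</p>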
<p class="MsoNormal">The Future<u></u><u></u></p>
<p class="MsoNormal">Software rejuvenation is ready for industry-wide deployment. It can make software systems more trustworthy. Good designers will use it and move from the state of the art to the state of the practice. It is a good design practice for individual
systems.<u></u><u></u></p>
<p class="MsoNormal"><span style="background:rgb(255,242,204) none repeat scroll 0% 0%">Software rejuvenation is one aspect of self-healing</span> that has gained research interest recently. There are some interesting new problems for software rejuvenation in large-scale, networked, self-healing
systems. We describe some of those problems here and make some suggestions: <u></u>
<u></u></p>
<p class="MsoNormal">1. For networked applications, we need to monitor and gather the availability and quality of all the required resources for the application across the network, and then synthesize that gathered data and make a prediction about possible
failure of the application or a component in the application. Network application monitoring might be hard to do in such a generalized fashion. You can perhaps do it in a limited domain such as a Voice over Internet Protocol (VoIP) application in an enterprise
network.<u></u><u></u></p>
<p class="MsoNormal">2. Self-healing systems on a network need alternate paths for communication between components to avoid an impending failure. This may be hard to do in a generalized fashion. But in much the same way as in clustered systems providing redundancy
for centralized applications, you can perhaps provide alternate communication paths for some self-healing applications (for example, VoIP) using alternate service provider networks.<u></u><u></u></p>
<p class="MsoNormal">3. Modeling and implementation have several problems due to the large scale of these systems. What is a state in a large-scale system when the state is spread across several products and systems in a network? Perhaps you need to model the system in a hierarchical,
tree-structured fashion, decomposing the state into smaller units as you need them for analysis. Failure symptoms appear at the system/network (macro) level, but rejuvenation actions are taken at the component (micro) level; how do you correlate the two? This topic is perhaps
related to event correlation in network management. How do you do rejuvenation efficiently in very large systems? Perhaps gradual load shedding can be used. What is a safe (clean internal) state to back up to? How do you back up to that state?</p>
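<p class="MsoNormal">One way to picture the hierarchical, tree-structured decomposition suggested in item 3 is the Python sketch below. It is only an illustration under stated assumptions: the Component class, the health score between 0 and 1, the rule that a subtree is as healthy as its worst child, and the function rejuvenation_candidates are all hypothetical and do not come from the cited work. The sketch shows how a symptom seen at the macro level can be traced down to the component-level units where rejuvenation would be applied.</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a hypothetical system tree."""

    name: str
    health: float = 1.0  # leaf health in [0, 1]; ignored for inner nodes
    children: list = field(default_factory=list)

    def aggregate_health(self):
        """A leaf reports its own health; an inner node reports its worst child."""
        if not self.children:
            return self.health
        return min(child.aggregate_health() for child in self.children)


def rejuvenation_candidates(root, threshold):
    """Return names of leaf components whose health is below `threshold`.

    Subtrees whose aggregate health is acceptable are pruned, so the search
    only descends where the macro-level symptom originates.
    """
    if root.aggregate_health() >= threshold:
        return []
    if not root.children:
        return [root.name]
    found = []
    for child in root.children:
        found.extend(rejuvenation_candidates(child, threshold))
    return found
```
</pre>
<p class="MsoNormal">For example, if a network node contains a VoIP subtree whose media gateway has degraded while every other leaf is healthy, the root reports the degraded score and the search returns only the media gateway as a candidate for rejuvenation.</p>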
<p class="MsoNormal">References<u></u><u></u></p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpFirst" style="margin-left:0.25in">
<u></u><span>1.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span><u></u>Bernstein, L. “Software Fault Tolerance Forestalls Crashes: To Err Is Human, to Forgive Is Fault Tolerant” in Advances in Computers 58. Highly Dependable Software. Ed. M. Zelkowitz. Academic Press, 2003: 240-285.<u></u><u></u></p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpMiddle" style="margin-left:0.25in">
<u></u><span>2.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span><u></u>Lyu, M., Ed. Software Fault Tolerance. New York: John Wiley, 1995.<u></u><u></u></p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpMiddle" style="margin-left:0.25in">
<u></u><span>3.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span>Huang, Y., C. Kintala, N. Kolettis, and N.D. Fulton. Software Rejuvenation: Analysis, Module and Applications. Proc. of 25th Symposium on Fault Tolerant Computing FTCS-25, Pasadena, CA, June 1995: 381-390 &lt;<a href="http://www.ece.stevens-tech.edu/~ckintala/Papers/RejuvFTCS25.pdf" target="_blank">www.ece.stevens-tech.edu/~ckintala/Papers/RejuvFTCS25.pdf</a>&gt;.
The Web site &lt;<a href="http://www.software-rejuvenation.com" target="_blank">www.software-rejuvenation.com</a>&gt;, maintained by Professor Trivedi at Duke University, has a collection of follow-up research papers on the topic.</p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpMiddle" style="margin-left:0.25in">
<u></u><span>4.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span><u></u>Pfleeger, S.L. Software Engineering Theory and Practice. 2nd ed. Prentice Hall, 2001: 496-502.<u></u><u></u></p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpMiddle" style="margin-left:0.25in">
<u></u><span>5.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span>Li, L., K. Vaidyanathan, and K.S. Trivedi. “An Approach for Estimation of Software Aging in a Web Server.” International Symposium on Empirical Software Engineering, Nara, Japan, Oct. 2002.</p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpMiddle" style="margin-left:0.25in">
<u></u><span>6.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span>Vaidyanathan, K., R.E. Harper, S.W. Hunter, and K.S. Trivedi. Analysis and Implementation of Software Rejuvenation in Cluster Systems. Proc. of the Joint Intl. Conference on Measurement and Modeling of Computer Systems, ACM SIGMETRICS
2001/Performance 2001, Cambridge, MA, June 2001.</p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpMiddle" style="margin-left:0.25in">
<u></u><span>7.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span>Tai, A.T., L. Alkalai, and S.N. Chau. “Onboard Preventive Maintenance: A Design-Oriented Analytic Study for Long-Life Applications.” Performance Evaluation 35.3-4 (June 1999): 215-232.</p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpMiddle" style="margin-left:0.25in">
<u></u><span>8.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span>Bernstein, L., Y.D. Yao, and K. Yao. “Software Rejuvenation: Avoiding Failures Even When There Are Faults.” The DoD SoftwareTECH News 6.2 (Oct. 2003): 8-11 &lt;<a href="http://softwaretechnews.com" target="_blank">www.softwaretechnews.com</a>&gt;.</p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpMiddle" style="margin-left:0.25in">
<u></u><span>9.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span>General Accounting Office. “B-247094, Report to the House of Representatives.” Washington, D.C.: GAO, Information Management and Technology Division, 4 Feb. 1992 &lt;<a href="http://www.fas.org/spp/starwars/gao/im92026.htm" target="_blank">www.fas.org/spp/starwars/gao/im92026.htm</a>&gt;.</p>
<p class="m_7518637161535046750gmail-m_-247593145259451465MsoListParagraphCxSpLast" style="margin-left:0.25in">
<u></u><span>10.<span style="font:normal normal normal normal 7pt "Times New Roman"">
</span></span><u></u>Bao, Y., X. Sun, and K. Trivedi. Adaptive Software Rejuvenation: Degradation Models and Rejuvenation Schemes. Proc. of The International Conference on Dependable Systems and Networks, San Francisco, CA, June 2003.<u></u><u></u></p>
</div>
</div>
</div><br></div></div></div></div></div></div>