Patterns for Fault Tolerant Software
Copyright 2007 Alcatel-Lucent. All Rights Reserved.
Published by John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England Telephone (+44) 1243 779777
Email (for orders and customer service enquiries):
Visit our Home Page on www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to , or faxed to (+44) 1243 770620.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The Publisher is not associated with any product or vendor mentioned in this book.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada, M9W 1L1
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Anniversary Logo Design: Richard J. Pacifico
Library of Congress Cataloging-in-Publication Data
Hanmer, Robert S.
Patterns for fault tolerant systems / Robert S. Hanmer.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-31979-6 (cloth : alk. paper)
1. Fault-tolerant computing. I. Title.
QA76.9.F38H35 2007
004.2-dc22 2007029096
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN-13: 978-0-470-31979-6
For my best friends, Karen and Bud
Acknowledgements
Many people helped make this book and the patterns, not the least of whom is Karen Hanmer, who supported the effort and assisted greatly with the photographs. Henry Maron drew many of the solution illustrations. The ChiliPLoP 2007 Hot Topic group that reviewed this book consisted of Paul Adamczyk, Richard P. Gabriel, and Ricardo Lopez. Their insightful and thought-provoking comments were instrumental to the final shape of this book.
The University of Illinois at Urbana-Champaign Software Architecture Group, under the leadership of Professor Ralph Johnson, reviewed the manuscript and offered very useful comments and suggestions.
Veena Mendiratta, John Letourneau, Doug Kimber, Eric Bauer, Phil Scarff, Shawa Tam, Amir Raveh, Amr Elssamadisy, and Lee Ayres all contributed useful comments to help this book take shape.
Lucent and Alcatel-Lucent managers have supported this project from the beginning, including John McManus, Doug Wittig, Jan Fertig, Michael Massetti, Shawa Tam, Jon Heard, Joe Carson, and Thierry Paul-Dauphin. A special thank you goes to Alicja Kawecki, who has always been very helpful with publication clearance.
Thank you to the people at John Wiley & Sons, Rosie Kemp, Sally Tickner, Drew Kennerley and Hannah Clement, for patiently answering all the questions that go along with a first book.
Pattern Origins and Earlier Versions
LEAKY BUCKET COUNTERS (27) was originally written by Robert Gamoke; the original version, edited by James O. Coplien, was published in [ACG+96]. It is very similar to Leaky Bucket of Credit by Gerard Meszaros, published in [Mes96], which describes using the same concept as a resource allocation mechanism. The Leaky Bucket Counter strategy was alluded to on pp. 2003–2004 of the Bell System Technical Journal, Volume XLIII 5(10), Sept. 1964.
COMPLETE PARAMETER CHECKING (14) was suggested by Kopetz in [Kop79], pp. 75–76.
REASSESS OVERLOAD DECISION (44) was alluded to in [GHH+77], p. 1177.
DEFERRABLE WORK (43) was alluded to in [GHH+77], p. 1177.
QUEUE FOR RESOURCES (46) is related to [WWF96].
EXISTING METRICS (20) was alluded to in [CCR+77], p. 1116.
Earlier versions of the patterns FINISH WORK IN PROGRESS (54), FRESH WORK BEFORE STALE (55), SHARE THE LOAD (51), SHED LOAD (49) and SHED WORK AT PERIPHERY (52) were written by Gerard Meszaros and published in [Mes96].
Earlier versions of EQUITABLE RESOURCE ALLOCATION (45), DEFERRABLE WORK (43) (If It's Working Hard Don't Fix It), EXISTING METRICS (20) (Overload Elastics), OVERLOAD TOOLBOXES (42), QUEUE FOR RESOURCES (46), and REASSESS OVERLOAD DECISION (44) appear in [Han06a] in [MVN06].
Mike Adams was a co-author on previous versions of EQUITABLE RESOURCE ALLOCATION (45), OVERLOAD TOOLBOXES (42), and SLOW IT DOWN (53).
Titos Saridakis in [Sar02] has versions of ACKNOWLEDGEMENT (17), HEARTBEAT (16), ROLLBACK (32) and ROLL-FORWARD (33).
Michael Wu was co-author on previous versions of EXPANSIVE AUTOMATIC CONTROLS (47), PROTECTIVE AUTOMATIC CONTROLS (48), and FINAL HANDLING (50).
MINIMIZE HUMAN INTERVENTION (5) was originally written by James O. Coplien and Mike Adams.
An earlier version of RIDING OVER TRANSIENTS (26) was written by James O. Coplien.
Thank you to my PLoP shepherds for these patterns:
Our PLoP '95 shepherd was Gerard Meszaros, who worked with James Coplien, lead author of [ACG+96], on the pre-conference drafts.
Michael Pont offered many valuable comments and suggestions as shepherd that significantly improved the organization of WATCHDOG (18), HEARTBEAT (16), SYSTEM MONITOR (15), ACKNOWLEDGEMENT (17), and REALISTIC THRESHOLD (19). Mark Bradac and Lars Grunske also reviewed drafts.
Ward Cunningham was PLoP 2000 shepherd for the patterns that also appeared in [Han06a].
For the patterns CHECKPOINT (37), CONCENTRATED RECOVERY (29), INDIVIDUALS DECIDE TIMING (40), LIMIT RETRIES (35), REMOTE STORAGE (39), and WHAT TO SAVE (38), thanks go to Titos Saridakis, to Tim Parks for insight on Killer Messages, to Mark Bradac, and to shepherd Toni Marinucci.
David DeLano provided many useful comments on EXPANSIVE AUTOMATIC CONTROLS (47), PROTECTIVE AUTOMATIC CONTROLS (48), and FINAL HANDLING (50).