References - Distributed Systems for Practitioners.

[1] G. Coulouris, J. Dollimore, T. Kindberg, and G. Blair, Distributed Systems: Concepts and Design (5th Edition). 2011.

[2] A. B. Bondi, “Characteristics of scalability and their impact on performance,” in Proceedings of the second international workshop on Software and performance – WOSP ’00. p. 195, 2000.

[3] P. Bailis and K. Kingsbury, “The Network is Reliable”, in ACM Queue, Volume 12, Issue 7, July 23, 2014.

[4] A. Alquraan, H. Takruri, M. Alfatafta, and S. Al-Kiswany, “An Analysis of Network-Partitioning Failures in Cloud Systems”, in Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, 2018.

[5] J. C. Corbett et al., “Spanner: Google’s Globally-Distributed Database,” in Proceedings of OSDI 2012, 2012.

[6] T. D. Chandra and S. Toueg, “Unreliable failure detectors for reliable distributed systems”, in Journal of the ACM. Volume 43 Issue 2, ACM. pp. 225–267, 1996.

[7] F. Chang et al., “Bigtable: A Distributed Storage System for Structured Data”, in Proceedings of 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006.

[8] G. DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store”, in Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, 2007.

[9] A. Lakshman and P. Malik, “Cassandra — A Decentralized Structured Storage System”, in Operating Systems Review, 2010.

[10] R. van Renesse and F. B. Schneider, “Chain Replication for Supporting High Throughput and Availability,” in Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, 2004.

[11] D. K. Gifford, “Weighted voting for replicated data”, in Proceedings of the seventh ACM symposium on Operating systems principles, 1979.

[12] S. Gilbert and N. Lynch, “Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services”, in SIGACT News, 2002.

[13] D. Abadi, “Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story”, in Computer, 2012.

[14] M. P. Herlihy and J. M. Wing, “Linearizability: A Correctness Condition for Concurrent Objects”, in ACM Transactions on Programming Languages and Systems, July 1990.

[15] P. Bailis, A. Davidson, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica, “Highly Available Transactions: Virtues and Limitations (Extended Version)”, in Proceedings of the VLDB Endowment, 2013.

[16] ANSI X3.135-1992, “American National Standard for Information Systems - Database Language - SQL”, 1992.

[17] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O’Neil, and P. O’Neil, “A Critique of ANSI SQL Isolation Levels”, in SIGMOD Rec., 1995.

[18] C. H. Papadimitriou, “The Serializability of Concurrent Database Updates”, in Journal of the ACM, October 1979.

[19] P. A. Bernstein and N. Goodman, “Serializability theory for replicated databases”, in Journal of Computer and System Sciences, 1983.

[20] M. J. Franklin, “Concurrency Control and Recovery”, in SIGMOD ’92, 1992.

[21] M. J. Cahill, U. Rohm, and A. D. Fekete, “Serializable Isolation for Snapshot Databases”, in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008.

[22] D. R. K. Ports and K. Grittner, “Serializable Snapshot Isolation in PostgreSQL”, in Proceedings of the VLDB Endowment, Volume 5 Issue 12, August 2012.

[23] T. S. Pillai, V. Chidambaram, R. Alagappan, S. Al-Kiswany, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-consistent Applications”, in Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, 2014.

[24] J. N. Gray, “Notes on data base operating systems”, in Operating Systems. Lecture Notes in Computer Science, vol 60. Springer, 1978.

[25] B. Lampson and H. E. Sturgis, “Crash Recovery in a Distributed Data Storage System”, 1979.

[26] H. Garcia-Molina, “Performance of Update Algorithms for Replicated Data in a Distributed Database”, 1979.

[27] D. Skeen, “Nonblocking Commit Protocols”, in Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, 1981.

[28] D. Skeen, “A Quorum-Based Commit Protocol”, 1982.

[29] C. Binnig, S. Hildenbrand, F. Farber, D. Kossmann, J. Lee, and N. May, “Distributed Snapshot Isolation: Global Transactions Pay Globally, Local Transactions Pay Locally”, in The VLDB Journal, Volume 23 Issue 6, December 2014.

[30] H. Garcia-Molina and K. Salem, “Sagas”, in Proceedings of the 1987 ACM SIGMOD International Conference on Management of Data, 1987.

[31] L. Frank and T. U. Zahle, “Semantic ACID Properties in Multidatabases Using Remote Procedure Calls and Update Propagations”, in Software—Practice & Experience, Volume 28 Issue 1, Jan. 1998.

[32] X. Défago, A. Schiper, and P. Urbán, “Total order broadcast and multicast algorithms: Taxonomy and survey”, in ACM Computing Surveys, Volume 36, 2004.

[33] M. J. Fischer, N. A. Lynch, and M. S. Paterson, “Impossibility of Distributed Consensus with One Faulty Process”, in Journal of the ACM (JACM), 1985.

[34] L. Lamport, “The Part-time Parliament”, in ACM Transactions on Computer Systems (TOCS), 1998.

[35] L. Lamport, “Paxos Made Simple”, in ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001), 2001.

[36] T. D. Chandra, R. Griesemer, and J. Redstone, “Paxos Made Live: An Engineering Perspective”, in Proceedings of the Twenty-sixth Annual ACM Symposium on Principles of Distributed Computing, 2007.

[37] B. W. Lampson, “How to Build a Highly Available System Using Consensus”, in Proceedings of the 10th International Workshop on Distributed Algorithms, 1996.

[38] H. Du and D. J. St. Hilaire, “Multi-Paxos: An Implementation and Evaluation”, 2009.

[39] V. Hadzilacos, “On the Relationship between the Atomic Commitment and Consensus Problems”, in Fault-Tolerant Distributed Computing, pages 201–208, November 1990.

[40] J. Gray and L. Lamport, “Consensus on Transaction Commit”, in ACM Transactions on Database Systems (TODS), Volume 31 Issue 1, March 2006.

[41] D. Ongaro and J. Ousterhout, “In Search of an Understandable Consensus Algorithm”, in Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, 2014.

[42] C. Dyreson, “Physical Clock”, in Encyclopedia of Database Systems, 2009.

[43] M. Raynal and M. Singhal, “Logical time: capturing causality in distributed systems”, in Computer, Volume 29, Issue 2, Feb. 1996.

[44] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System”, in Communications of the ACM 21, 7, July 1978.

[45] R. Schwarz and F. Mattern, “Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail”, in Distributed Computing, Volume 7 Issue 3, March 1994.

[46] C. J. Fidge, “Timestamps in Message-Passing Systems That Preserve the Partial Ordering”, in Proceedings of the 11th Australian Computer Science Conference (ACSC’88), pp. 56–66, 1988.

[47] F. Mattern, “Virtual Time and Global States of Distributed Systems”, in Parallel and Distributed Algorithms, 1988.

[48] R. van Renesse and F. B. Schneider, “Chain Replication for Supporting High Throughput and Availability,” in Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, 1991.

[49] D. S. Parker et al., “Detection of mutual inconsistency in distributed systems”, in IEEE Transactions on Software Engineering, Volume 9, Issue 3, pages 240-247, 1983.

[50] N. M. Preguiça, C. Baquero, P. S. Almeida, V. Fonte, and R. Gonçalves, “Dotted Version Vectors: Logical Clocks for Optimistic Replication”, in arXiv:1011.5808, 2010.

[51] K. M. Chandy and L. Lamport, “Distributed Snapshots: Determining Global States of Distributed Systems”, in ACM Transactions on Computer Systems (TOCS), Volume 3 Issue 1, Feb. 1985.

[52] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google File System”, in Symposium on Operating Systems Principles ’03, 2003.

[53] K. Shvachko, H. Kuang, and R. Chansler, “The Hadoop Distributed File System”, in IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.

[54] M. Burrows, “The Chubby Lock Service for Loosely-coupled Distributed Systems”, in Proceedings of the 7th Symposium on Operating Systems Design and Implementation, 2006.

[55] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, “ZooKeeper: Wait-free Coordination for Internet-scale Systems”, in Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, 2010.

[56] F. P. Junqueira, B. Reed, and M. Serafini, “Zab: High-performance Broadcast for Primary-backup Systems”, in Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks, 2011.

[57] A. Medeiros, “ZooKeeper’s atomic broadcast protocol: Theory and practice”, 2012.

[58] P. O’Neil, E. Cheng, D. Gawlick, and E. O’Neil, “The log-structured merge-tree (LSM-tree)”, in Acta Informatica, Volume 33 Issue 4, 1996.

[59] D. F. Bacon et al., “Spanner: Becoming a SQL System,” in Proceedings of the 2017 ACM International Conference on Management of Data, 2017.

[60] E. Brewer, “Spanner, TrueTime and the CAP Theorem”, 2017.

[61] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis II, “System Level Concurrency Control for Distributed Database Systems”, in ACM Transactions on Database Systems (TODS), Volume 3, Issue 2, June 1978.

[62] A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi, “Calvin: Fast Distributed Transactions for Partitioned Database Systems”, in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, 2012.

[63] D. J. Abadi and J. M. Faleiro, “An Overview of Deterministic Database Systems”, in Communications of the ACM, Volume 61 Issue 9, September 2018.

[64] J. Kreps, N. Narkhede, and J. Rao, “Kafka: a Distributed Messaging System for Log Processing”, in NetDB ’11, June 2011.

[65] G. Wang et al., “Building a Replicated Logging System with Apache Kafka”, in Proceedings of the VLDB Endowment, Volume 8 Issue 12, August 2015.

[66] A. Verma, L. Pedrosa, M. R. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes, “Large-scale cluster management at Google with Borg”, in Proceedings of the European Conference on Computer Systems, Eurosys, 2015.

[67] B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, “Borg, Omega, and Kubernetes”, in ACM Queue, Volume 14, pages 70-93, 2016.

[68] R. G. Brown, “The Corda Platform: An Introduction”, 2018.

[69] M. Hearn and R. G. Brown, “Corda: A distributed ledger”, version 1.0, August 2019.

[70] S. Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System”, 2008.

[71] J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, in Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, Volume 6, 2004.

[72] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster Computing with Working Sets”, in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, 2010.

[73] M. Zaharia et al., “Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing”, in Proceedings of the USENIX Conference on Networked Systems Design and Implementation, 2012.

[74] P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas, “Apache Flink™: Stream and Batch Processing in a Single Engine”, in IEEE Data Engineering Bulletin, 2015.

[75] P. Carbone, S. Ewen, G. Fora, S. Haridi, S. Richter, and K. Tzoumas, “State Management in Apache Flink: Consistent Stateful Distributed Stream Processing”, in Proceedings of the VLDB Endowment, Volume 10 Issue 12, August 2017.

[76] M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica, “Discretized Streams: Fault-tolerant Streaming Computation at Scale”, in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, 2013.

[77] T. Akidau et al., “MillWheel: Fault-tolerant Stream Processing at Internet Scale”, in Proceedings of the VLDB Endowment, Volume 6 Issue 11, August 2013.

[78] T. Akidau et al., “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-scale, Unbounded, Out-of-order Data Processing”, in Proceedings of the 41st International Conference on Very Large Data Bases, Volume 8 Issue 12, August 2015.

[79] P. Carbone, G. Fora, S. Ewen, S. Haridi, and K. Tzoumas, “Lightweight Asynchronous Snapshots for Distributed Dataflows”, 2015.

[80] C. G. Gray and D. R. Cheriton, “Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency”, in Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, 1989.

[81] J. H. Saltzer, D. P. Reed, and D. D. Clark, “End-to-End Arguments in System Design”, in ACM Transactions on Computer Systems 2, 4, pages 277-288, November 1984.

[82] T. Moors, “A critical review of End-to-end arguments in system design”, in Communications, ICC 2002. IEEE International Conference on, Volume 2, 2002.

[83] P. Viotti and M. Vukolić, “Consistency in Non-Transactional Distributed Storage Systems”, in ACM Computing Surveys, Volume 49, No. 1, 2016.

[84] M. K. Aguilera, W. Chen, and S. Toueg, “Heartbeat: A timeout-free failure detector for quiescent reliable communication”, in 11th International Workshop on Distributed Algorithms, ’97, 1997.

[85] K. Birman, “The Promise, and Limitations, of Gossip Protocols”, in ACM SIGOPS Operating Systems Review, October 2007.

[86] R. van Renesse, Y. Minsky, and M. Hayden, “A Gossip-Style Failure Detection Service”, in Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing, ’98, 1998.

[87] N. Hayashibara, X. Défago, R. Yared, and T. Katayama, “The phi accrual failure detector”, in 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.

[88] C. MacCárthaigh, “Workload isolation using shuffle-sharding”, in The Amazon Builders’ Library, 2019.

[89] J. Lamping and E. Veach, “A Fast, Minimal Memory, Consistent Hash Algorithm”, in arXiv:1406.2294, 2014.

[90] B. Appleton and M. O’Reilly, “Multi-probe consistent hashing”, in arXiv:1505.00062, 2015.

[91] D. G. Thaler and C. V. Ravishankar, “A Name-Based Mapping Scheme for Rendezvous”, in University of Michigan Technical Report CSE-TR-316-96, 1996.

[92] H. Howard, D. Malkhi, and A. Spiegelman, “Flexible Paxos: Quorum intersection revisited”, in arXiv:1608.06696, 2016.

[93] I. Moraru, D. G. Andersen, and M. Kaminsky, “There is More Consensus in Egalitarian Parliaments”, in Proceedings of the 24th ACM Symposium on Operating Systems Principles, 2013.

[94] L. Lamport, D. Malkhi, and L. Zhou, “Vertical Paxos and Primary-Backup Replication”, in Proceedings of the 28th Annual ACM Symposium on Principles of Distributed Computing, 2009.

[95] R. van Renesse and D. Altinbuken, “Paxos Made Moderately Complex”, in ACM Computing Surveys, February 2015, Article No. 42, 2015.

[96] M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance”, in Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, 1999.

[97] L. Lamport, “Byzantizing paxos by refinement”, in Proceedings of the 25th International Conference on Distributed Computing, 2011.

[98] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, “Conflict-free Replicated Data Types”, in [Research Report] RR-7687, pp. 18, inria-00609399v1, 2011.

[99] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, “A comprehensive study of Convergent and Commutative Replicated Data Types”, in [Research Report] RR-7506, Inria – Centre Paris-Rocquencourt; INRIA, pp. 50, inria-00555588, 2011.

[100] R. Nishtala et al., “Scaling Memcache at Facebook”, in Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, 2013.

[101] N. Bronson et al., “TAO: Facebook’s Distributed Data Store for the Social Graph”, in Proceedings of the 2013 USENIX Conference on Annual Technical Conference, 2013.

[102] B. H. Sigelman et al., “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure,” 2010.

[103] A. Verbitski et al., “Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes”, in Proceedings of the 2018 International Conference on Management of Data, 2018.

[104] A. Verbitski et al., “Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases”, in Proceedings of the 2017 ACM International Conference on Management of Data, 2017.

[105] J. Hamilton, “On Designing and Deploying Internet-Scale Services”, in Proceedings of the 21st Large Installation System Administration Conference (LISA ’07), 2007.

[106] E. A. Brewer, “Lessons from Giant-Scale Services”, in IEEE Internet Computing, Volume 5, No. 4, 2001.

[107] P. Huang et al., “Gray Failure: The Achilles’ Heel of Cloud-Scale Systems”, in Proceedings of the 16th Workshop on Hot Topics in Operating Systems, 2017.

[108] C. Lou, P. Huang, and S. Smith, “Understanding, Detecting and Localizing Partial Failures in Large System Software”, in 17th USENIX Symposium on Networked Systems Design and Implementation, 2020.

[109] B. Lebiednik, A. Mangal, and N. Tiwari, “A Survey and Evaluation of Data Center Network Topologies”, in arXiv:1605.01701, 2016.

[110] M. A. Kuppe, L. Lamport, and D. Ricketts, “The TLA+ Toolbox”, in arXiv:1912.10633, 2019.

[111] C. Newcombe, T. Rath, F. Zhang, B. Munteanu, M. Brooker, and M. Deardeuff, “How Amazon Web Services Uses Formal Methods”, in Communications of the ACM, Volume 58, No. 4, 2015.