Appendix A. References

Abadi, Daniel J., Adam Marcus, Samuel R. Madden, and Kate Hollenbach. 2009. SW-Store: A Vertically Partitioned DBMS for Semantic Web Data Management. VLDB Journal 18 (2): 385–406.

Abadi, Martín, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. 2016. TensorFlow: Large-scale Machine Learning on Heterogeneous Distributed Systems. arXiv/CoRR abs/1603.04467.

Abadi, Martín, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-scale Machine Learning. OSDI 16: 265–283.

Aerospike, Inc. 2021. Key-Value Operations.

Ahmad, Faraz, Seyong Lee, Mithuna Thottethodi, and T. N. Vijaykumar. 2012. PUMA: Purdue MapReduce Benchmarks Suite. https://engineering.purdue.edu/~puma/puma.pdf.

Aker, Brian. 2011. libMemcached.

Akinaga, Hiroyuki, and Hisashi Shima. 2010. Resistive Random Access Memory (ReRAM) Based on Metal Oxides. Proceedings of the IEEE 98: 2237–2251.

Alarcon, Nefi. 2019. GPU-accelerated Spark XGBoost—A Major Milestone on the Road to Large-Scale AI.

Albutiu, Martina-Cezara, Alfons Kemper, and Thomas Neumann. 2012. Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems. Proceedings of the VLDB Endowment 5 (10): 1064–1075.

Algo-Logic Systems. 2020. Low Latency KVS on Xilinx Alveo U200—Algo-Logic Systems Inc.

Al-Kiswany, Samer, Abdullah Gharaibeh, and Matei Ripeanu. 2013. GPUs as Storage System Accelerators. IEEE Transactions on Parallel and Distributed Systems 24 (8): 1556–1566.

Allen, Grant, and Mike Owens. 2010. The Definitive Guide to SQLite. 2nd ed. Berkeley, CA: Apress.

Gupta, Alok. 2020. Architecture Apocalypse: Dream Architecture for Deep Learning Inference and Compute—VERSAL AI Core.

Alverson, Bob, Edwin Froese, Larry Kaplan, and Duncan Roweth. 2012. Cray® XC™ Series Network—Cray. https://www.alcf.anl.gov/files/CrayXCNetwork.pdf.

Amazon, AWS. 2019. Best Practices Design Patterns: Optimizing Amazon S3 Performance.

Amazon, AWS. 2021a. Amazon Elastic Block Store (EBS). https://aws.amazon.com/ebs/.

Amazon, AWS. 2021b. Amazon S3 REST API Introduction.

Amazon, AWS. 2021c. Boto3 documentation. https://boto3.amazonaws.com/v1/documentation/api/latest/index.html.

Amazon, AWS. 2021d. Elastic Fabric Adapter.

Amazon, AWS. 2021e. Using High-level (S3) Commands with the AWS CLI.

AMD. 2021a. AMD. https://www.amd.com.

AMD. 2021b. AMD Instinct™ MI Series Accelerators.

AMD. 2021c. AMD Instinct™ MI100 Accelerator. https://www.amd.com/en/products/server-accelerators/instinct-mi100.

AMD. 2021d. AMD ROCm.

AMD. 2021e. AMD “Zen 3” Core Architecture. https://www.amd.com/en/technologies/zen-core-3.

AMPLab, UC Berkeley. 2021. Big Data Benchmark.

Apache Flink. 2020. Accelerating Your Workload with GPU and Other External Resources.

Apache Software Foundation. 2016. Class MultithreadedMapper. https://hadoop.apache.org/docs/r2.6.5/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html.

Apache Software Foundation. 2019. Apache Crail.

Apache Software Foundation. 2020a. HDFS Erasure Coding.

Apache Software Foundation. 2020b. Memory Storage Support in HDFS. https://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-hdfs/MemoryStorage.html.

Apache Software Foundation. 2020c. MapReduce Tutorial.

Apache Software Foundation. 2021a. AffinityFunction (Ignite 2.10.0)—Apache Ignite. https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/AffinityFunction.html.

Apache Software Foundation. 2021b. Apache Arrow.

Apache Software Foundation. 2021c. Apache Cassandra.

Apache Software Foundation. 2021d. Apache Flink®—Stateful Computations over Data Streams.

Apache Software Foundation. 2021e. Apache Hadoop.

Apache Software Foundation. 2021f. Welcome to Apache HBase™.

Apache Software Foundation. 2021g. Apache Heron (Incubating).

Apache Software Foundation. 2021h. Apache Hive™. https://hive.apache.org/.

Apache Software Foundation. 2021i. Apache Impala Is the Open Source, Native Analytic Database for Apache Hadoop.

Apache Software Foundation. 2021j. Apache Kafka.

Apache Software Foundation. 2021k. Apache Spark™ Is a Unified Analytics Engine for Large-scale Data Processing.

Apache Software Foundation. 2021l. Apache Storm.

Apache Software Foundation. 2021m. Apache Zeppelin.


Apache Software Foundation. 2021o. Apache ZooKeeper™.

Apache Software Foundation. 2021p. Apache Ignite: Distributed Database for High-performance Computing with In-memory Speed.

Apache Software Foundation. 2021q. Welcome To Apache Giraph!

Apache Software Foundation. 2021r. Spark SQL and DataFrames—Apache Spark.

Apache Software Foundation. 2021s. Spark SQL Is Apache Spark’s Module for Working with Structured Data.

Apache Software Foundation. 2021t. Spark Streaming Makes It Easy to Build Scalable Fault-tolerant Streaming Applications.

Apache Software Foundation. 2021u. Working with SQL.

Arafa, Mohamed, Bahaa Fahim, Sailesh Kottapalli, Akhilesh Kumar, Lily P. Looi, Sreenivas Mandava, Andy Rudoff, Ian M. Steiner, Bob Valentine, Geetha Vedaraman, et al. 2019. Cascade Lake: Next Generation Intel Xeon Scalable Processor. IEEE Micro 39 (2): 29–36.

Argonne National Lab. 2021. Aurora Argonne Leadership Computing Facility.

ARM. 2021a. Arm Neoverse V1 Platform: Unleashing a New Performance Tier for Arm-based Computing.

ARM. 2021b. High Performance Computing. https://www.arm.com/solutions/infrastructure/high-performance-computing.

Armstrong, Timothy G., Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: A Database Benchmark Based on the Facebook Social Graph. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, 1185–1196.

Arulraj, Joy, Andrew Pavlo, and Subramanya R. Dulloor. 2015. Let’s Talk about Storage and Recovery Methods for Non-Volatile Memory Database Systems. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 707–722.

ASCAC Subcommittee on Exascale Computing, The. 2010. The Opportunities and Challenges of Exascale Computing.

Atikoglu, Berk, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2012. Workload Analysis of a Large-scale Key-Value Store. SIGMETRICS Performance Evaluation Review 40 (1): 53–64.

Awan, A. A., C. Chu, H. Subramoni, X. Lu, and D. K. Panda. 2018. OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. 2018 IEEE 25th International Conference on High Performance Computing (HiPC), 143–152.

Bader, David A., John R. Gilbert, Jeremy Kepner, and Kamesh Madduri. 2021. HPC Graph Analysis.

Baidu Research. 2021. DeepBench. https://github.com/baidu-research/DeepBench.

Bakkum, Peter, and Kevin Skadron. 2010. Accelerating SQL Database Operations on a GPU with CUDA. Proceedings of the 3rd Workshop on General-purpose Computation on Graphics Processing Units, 94–103.

Balakrishna, Vijay. 2016. Delivering on NoSQL Database Performance Requirements with NVMe SSDs (Samsung). Proceedings of the 2016 Flash Memory Summit.

Barthels, Claude, Simon Loesing, Gustavo Alonso, and Donald Kossmann. 2015. Rack-scale In-memory Join Processing Using RDMA. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 1463–1475.

Barthels, Claude, Ingo Müller, Timo Schneider, Gustavo Alonso, and Torsten Hoefler. 2017. Distributed Join Algorithms on Thousands of Cores. Proceedings of the VLDB Endowment 10 (5): 517–528.

Beckett, Dave, Matt Singer, Milind Damle, Rakesh Radhakrishnan, and Barrie Wheeler. 2018. Boosting Hadoop Performance and Cost Efficiency with Caching, Fast SSDs, and More Compute.

Behrens, Tobias, Viktor Rosenfeld, Jonas Traub, Sebastian Breß, and Volker Markl. 2018. Efficient SIMD Vectorization for Hashing in OpenCL. In the 21st International Conference on Extending Database Technology (EDBT), 489–492.

Beloglazov, Anton, and Rajkumar Buyya. 2010. Energy Efficient Allocation of Virtual Machines in Cloud Data Centers. 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 577–578.

Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A Neural Probabilistic Language Model. Journal of Machine Learning Research 3 (Feb): 1137–1155.

Bent, John, Garth Gibson, Gary Grider, Ben McClelland, Paul Nowoczynski, James Nunez, Milo Polte, and Meghan Wingate. 2009. PLFS: A Checkpoint Filesystem for Parallel Applications. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC ’09). Association for Computing Machinery, New York, NY, USA, Article 21, 1–12.

Besta, Maciej, Dimitri Stanojevic, Johannes De Fine Licht, Tal Ben-Nun, and Torsten Hoefler. 2019. Graph Processing on FPGAs: Taxonomy, Survey, Challenges. arXiv preprint. arXiv:1903.06697.

Birrittella, Mark S., Mark Debbage, Ram Huggahalli, James Kunz, Tom Lovett, Todd Rimmer, Keith D. Underwood, and Robert C. Zak. 2015. Intel Omni-Path Architecture: Enabling Scalable, High Performance Fabrics. In Proceedings of the 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects (HOTI ’15), 1–9.

Bisong, Ekaba. 2019. Google Colaboratory. Building Machine Learning and Deep Learning Modules on Google Cloud Platform, 59–64. Berkeley, CA: Apress.

Bisson, Tim, Ke Chen, Changho Choi, Vijay Balakrishnan, and Yang-suk Kee. 2018. Crail-KV: A High-performance Distributed Key-Value Store Leveraging Native KV-SSDs over NVMe-oF. 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC), 1–8.

Biswas, R., X. Lu, and D. K. Panda. 2018. Accelerating TensorFlow with Adaptive RDMA-based gRPC. In 2018 IEEE 25th International Conference on High Performance Computing (HiPC), 2–11.

Bitfusion. 2017. Deep Learning Frameworks with Spark and GPUs.

Blazegraph. 2021. Welcome to Blazegraph. https://blazegraph.com/.

BlazingSQL. 2021. blazingSQL—Open Source SQL in Python.

Blott, Michaela, Kimon Karras, Ling Liu, Kees Vissers, Jeremia Bär, and Zsolt István. 2013. Achieving 10Gbps Line-rate Key-Value Stores with FPGAs. 5th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 13). San Jose, CA: USENIX Association.

Boito, Francieli Zanon, Eduardo C. Inacio, Jean Luca Bez, Philippe O. A. Navaux, Mario A. R. Dantas, and Yves Denneulin. 2018. A Checkpoint of Research on Parallel I/O for High-Performance Computing. ACM Computing Surveys (CSUR) 51 (2): 23.

Bostock, Mike. 2020. D3.js.

Braam, Peter J., and Rumi Zahir. 2002. Lustre: A Scalable, High-Performance File System. Cluster File Systems, Inc.

Brytlyt. 2021. BrytlytDB. https://www.brytlyt.com/what-we-do/brytlytdb/.

Buluç, Aydin, Tim Mattson, Scott McMillan, José Moreira, and Carl Yang. 2017a. Design of the GraphBLAS API for C. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 643–652.

Buluç, Aydın, Timothy Mattson, Scott McMillan, José Moreira, and Carl Yang. 2019. The GraphBLAS C API Specification.

Canziani, Alfredo, Adam Paszke, and Eugenio Culurciello. 2016. An Analysis of Deep Neural Network Models for Practical Applications. arXiv/CoRR abs/1605.07678.

Cao, Wei, Zhenjun Liu, Peng Wang, Sen Chen, Caifeng Zhu, Song Zheng, Yuhui Wang, and Guoqing Ma. 2018. PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database. Proceedings of the VLDB Endowment 11 (12): 1849–1862.

Carbone, Paris, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink: Stream and Batch Processing in a Single Engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 38 (4): 28–38.

Caulfield, Adrian M., Todor I. Mollov, Louis Alex Eisner, Arup De, Joel Coburn, and Steven Swanson. 2012. Providing Safe, User Space Access to Fast, Solid State Disks. ACM SIGPLAN Notices 47 (4): 387–400.

Ceph. 2021. Ceph Delivers Object, Block, and File Storage in a Single, Unified System.

Cerebras. 2021a. Cerebras. https://cerebras.net.

Cerebras. 2021b. Cerebras Systems: Achieving Industry Best AI Performance through a Systems Approach.

CGCL-codes. 2017. TensorFlow RDMA. https://github.com/CGCL-codes/Tensorflow-RDMA.

Chabot, C. 2009. Demystifying Visual Analytics. IEEE Computer Graphics and Applications 29 (2): 84–87.

Chameleon. 2021. A Configurable Experimental Environment for Large-scale Edge to Cloud Research.

Chandramouli, Badrish, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, and Mike Barnett. 2018. FASTER: A Concurrent Key-Value Store with In-Place Updates. Proceedings of the 2018 International Conference on Management of Data, 275–290.

Chen, Chen, Xianzhi Du, Le Hou, Jaeyoun Kim, Jing Li, Yeqing Li, Abdullah Rashwan, Fan Yang, and Hongkun Yu. 2020. TensorFlow official model garden.

Chen, Tianqi, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint. arXiv:1512.01274.

Chen, Tianqi, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 578–594.

Chen, Wei, Aidi Pi, Shaoqi Wang, and Xiaobo Zhou. 2019. Pufferfish: Container-driven Elastic Memory Management for Data-intensive Applications. SoCC ’19: Proceedings of the ACM Symposium on Cloud Computing, 259–271.

Chen, Xinyu, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, and Deming Chen. 2021. ThunderGP: HLS-based Graph Processing Framework on FPGAs. The 2021 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, 69–80.

Chen, Xue-wen, and Xiaotong Lin. 2014. Big Data Deep Learning: Challenges and Perspectives. IEEE Access 2: 514–525.

Chen, Youmin, Jiwu Shu, Jiaxin Ou, and Youyou Lu. 2018. HiNFS: A Persistent Memory File System with Both Buffering and Direct-access. ACM Transactions on Storage (TOS) 14 (1): 1–30.

Chen, Yu-Ting, Jason Cong, Zhenman Fang, Jie Lei, and Peng Wei. 2016. When Spark Meets FPGAs: A Case Study for Next-generation DNA Sequencing Acceleration. 8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud). USENIX Association.

Cheng, Wang. 2019. APUS: Fast and Scalable Paxos on RDMA. https://github.com/hku-systems/apus.

Ching, Avery, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One Trillion Edges: Graph Processing at Facebook-Scale. Proceedings of the VLDB Endowment 8 (12): 1804–1815.

Chintapalli, Sanket, Derek Dagit, Bobby Evans, Reza Farivar, Thomas Graves, Mark Holderbaugh, Zhuo Liu, Kyle Nusbaum, Kishorkumar Patil, Boyang Jerry Peng, and Paul Poulosky. 2016. Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 1789–1792.

Chintapalli, Sanket, Derek Dagit, Bobby Evans, Reza Farivar, Thomas Graves, Mark Holderbaugh, Zhuo Liu, Kyle Nusbaum, Kishorkumar Patil, Boyang Jerry Peng, and Paul Poulosky. 2021. Yahoo Streaming Benchmarks.

Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint. arXiv:1406.1078.

Cho, Minsik, Ulrich Finkler, David Kung, and Hillery Hunter. 2019. BlueConnect: Decomposing All-reduce for Deep Learning on Heterogeneous network Hierarchy. Proceedings of Machine Learning and Systems 1: 241–251.

Chu, Chengtao, Sang K. Kim, Yi-An Lin, Yuanyuan Yu, Gary Bradski, Kunle Olukotun, and Andrew Y. Ng. 2007. Map-Reduce for Machine Learning on Multicore. In Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference. eds. B. Schölkopf, J. C. Platt, and T. Hoffman, 281–288. Cambridge, MA: MIT Press.

Chu, Ching-Hsiang, Sreeram Potluri, Anshuman Goswami, Manjunath Venkata, Neena Inam, and Chris J. Newburn. 2018. Designing High-performance In-memory Key-Value Operations with Persistent GPU Kernels and OpenSHMEM. In Workshop on OpenSHMEM and Related Technologies (OpenSHMEM). 148–164. Springer International Publishing.

Chu, Howard. 2011. MDB: A Memory-Mapped Database and Backend for OpenLDAP.

CloudSuite Team. 2021. CloudSuite: A Benchmark Suite for Cloud Services.

Collobert, Ronan, Clément Farabet, Koray Kavukcuoglu, and Soumith Chintala. 2021. Torch—Scientific Computing for LuaJIT.

Collobert, Ronan, Samy Bengio, and Johnny Marithoz. 2002. Torch: A Modular Machine Learning Software Library. IDIAP Research Report.

Condit, Jeremy, Edmund B. Nightingale, Christopher Frost, Engin Ipek, Benjamin Lee, Doug Burger, and Derrick Coetzee. 2009. Better I/O through Byte-addressable, Persistent Memory. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, 133–146.

Convolbo, Moïse W., and Jerry Chou. 2016. Cost-aware DAG Scheduling Algorithms for Minimizing Execution Cost on Cloud Resources. Journal of Supercomputing 72 (3): 985–1012.

Cooper, B. F., A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. 2010. Benchmarking Cloud Serving Systems with YCSB. In Proceedings of the ACM Symposium on Cloud Computing (SoCC ’10). Association for Computing Machinery, New York, NY, USA, 143–154.

Cornelis. 2020. Cornelis Networks. https://www.cornelisnetworks.com/.

Cray/HPE. 2021. XC™ Series DataWarp™ User Guide.

Crego, E., G. Munoz, and F. Islam. 2013. Big Data and Deep Learning: Big Deals or Big Delusions?

CSCS, Swiss National Supercomputing Centre. 2021. Piz Daint. https://www.cscs.ch/computers/piz-daint/.

Dagum, Leonardo, and Ramesh Menon. 1998. OpenMP: An Industry Standard API for Shared-Memory Programming. IEEE Computational Science and Engineering 5 (1): 46–55.

Dai, Jason (Jinquan), Yiheng Wang, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Li (Cherry) Zhang, Yan Wan, Zhichao Li, et al. 2019. BigDL: A Distributed Deep Learning Framework for Big Data. Proceedings of the ACM Symposium on Cloud Computing. SoCC ’19, 50–60.

Dalessandro, Dennis, Ananth Devulapalli, and Pete Wyckoff. 2005. Design and Implementation of the iWARP Protocol in Software. In The IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS), 471–476.

Dandu, Satish Varma, Chinmay Chandak, Jarod Maupin, and Jeremy Dyer. 2020. cuStreamz: More Event Stream Processing for Less with NVIDIA GPUs and RAPIDS Software.

DataBricks. 2018. TensorFrames. https://github.com/databricks/tensorframes.

Databricks. 2021. Collaborative Notebooks: Collaborative Data Science with Familiar Languages and Tools.

Datadog. 2015. Monitor Cassandra with Datadog. https://www.datadoghq.com/blog/monitoring-cassandra-with-datadog/.

Datadog. 2021. DATADOG—Unified Monitoring for the cloud age.

DataMPI Team. 2021. DataMPI: Extending MPI for Big Data with Key-Value based Communication.

Davis, Timothy A. 2019. Algorithm 1000: SuiteSparse:GraphBLAS: Graph Algorithms in the Language of Sparse Linear Algebra. ACM Transactions on Mathematical Software 45 (4): article 44.

DDN. 2021. Infinite Memory Engine: Break Free from the Challenges and Inefficiencies Caused by I/O Bottlenecks.

Dean, Jeffrey, and Sanjay Ghemawat. 2008. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51 (1): 107–113.

Deep500 Team. 2021. Deep500: An HPC Deep Learning Benchmark and Competition.

Deepnote. 2021. Deepnote. https://deepnote.com/.

DeepSpeech Team. 2021. Project DeepSpeech.

Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. 2009. ImageNet: A Large-Scale Hierarchical Image Database. 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 248–255.

Department of Energy, U.S. 2011. Terabits Networks for Extreme Scale Science. DOE Workshop Report.

Difallah, Djellel Eddine, Andrew Pavlo, Carlo Curino, and Philippe Cudre-Mauroux. 2013. OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases. Proceedings of the VLDB Endowment 7 (4): 277–288.

Doddamani, Spoorti, Piush Sinha, Hui Lu, Tsu-Hsiang K. Cheng, Hardik H. Bagdi, and Kartik Gopalan. 2019. Fast and Live Hypervisor Replacement. VEE 2019: Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments 45–58.

Domo. 2021. Data Never Sleeps 8.0. https://www.domo.com/learn/data-never-sleeps-8.

Dong, Yaozu, Jinquan Dai, Zhiteng Huang, Haibing Guan, Kevin Tian, and Yunhong Jiang. 2009. Towards High-quality I/O Virtualization. Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference (SYSTOR ’09). Association for Computing Machinery, New York, NY, USA, Article 12, 1–8.

Dormando. 2021. What is Memcached? http://memcached.org/.

Douglas, Chet. 2015. RDMA with PMEM Software Mechanisms for Enabling Access to Remote Persistent Memory.

Doweck, J., W. Kao, A. K. Lu, J. Mandelblat, A. Rahatekar, L. Rappoport, E. Rotem, A. Yasin, and A. Yoaz. 2017. Inside 6th-Generation Intel Core: New Microarchitecture Code-Named Skylake. IEEE Micro 37 (2): 52–62.

Dragojević, A., D. Narayanan, M. Castro, and O. Hodson. 2014. FaRM: Fast Remote Memory. 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), 401–414.

Duato, José, Antonio J. Pena, Federico Silla, Rafael Mayo, and Enrique S Quintana-Ortí. 2010. rCUDA: Reducing the Number of GPU-based Accelerators in High Performance Clusters. 2010 International Conference on High Performance Computing and Simulation, 224–231.

Dulloor, Subramanya R., Sanjay Kumar, Anil Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran, and Jeff Jackson. 2014. System Software for Persistent Memory. Proceedings of the Ninth European Conference on Computer Systems, 15.

Dynatrace. 2021a. Apache Spark monitoring.

Dynatrace. 2021b. Dynatrace. https://www.dynatrace.com/.

Dynatrace. 2021c. Hadoop Performance Monitoring.

Dysart, Timothy, Peter Kogge, Martin Deneroff, Eric Bovell, Preston Briggs, Jay Brockman, Kenneth Jacobsen, Yujen Juan, Shannon Kuntz, Richard Lethin, et al. 2016. Highly Scalable Near Memory Processing with Migrating Threads on the Emu System Architecture. 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3), 2–9.

E8 Storage. 2021. E8 Storage E8-D24 Rack Scale Flash, Centralized NVMe Solution.

Eisenman, Assaf, Darryl Gardner, Islam AbdelRahman, Jens Axboe, Siying Dong, Kim Hazelwood, Chris Petersen, Asaf Cidon, and Sachin Katti. 2018. Reducing DRAM Footprint with NVM in Facebook. EuroSys ’18, 42.

Elangovan, Aparna. 2020. Optimizing I/O for GPU Performance Tuning of Deep Learning Training in Amazon SageMaker.

Elasticsearch B.V. 2021. Elasticsearch: The Heart of the Free and Open Elastic Stack. https://www.elastic.co/elasticsearch/.

Emani, Murali, Venkatram Vishwanath, Corey Adams, Michael E. Papka, Rick Stevens, Laura Florescu, Sumti Jairath, William Liu, Tejas Nama, and Arvind Sujeeth. 2021. Accelerating Scientific Applications with SambaNova Reconfigurable Dataflow Architecture. Computing in Science Engineering 23 (2): 114–119.

Engel, Jörn, and Robert Mertens. 2005. LogFS—Finally a Scalable Flash File System. Proceedings of the 12th International Linux System Technology Conference.

Facebook. 2018. RocksDB. https://rocksdb.org/.

Facebook AI Team. 2021. Facebook AI Performance Evaluation Platform.

Fan, Bin, David G. Andersen, and Michael Kaminsky. 2013. MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), 371–384.

Fan, Ziqi, Fenggang Wu, Jim Diehl, David H. C. Du, and Doug Voigt. 2018. CDBB: An NVRAM-based Burst Buffer Coordination System for Parallel File Systems. Proceedings of the High Performance Computing Symposium, 1.

FASTDATA.io. 2021. PlasmaENGINE®.

Fent, Philipp, Alexander van Renen, Andreas Kipf, Viktor Leis, Thomas Neumann, and Alfons Kemper. 2020. Low-Latency Communication for Fast DBMS Using RDMA and Shared Memory. 2020 IEEE 36th International Conference on Data Engineering (ICDE), 1477–1488.

Fielding, Roy Thomas. 2000. Chapter 5: Representational State Transfer (REST). Architectural Styles and the Design of Network-based Software Architectures. PhD dissertation, University of California, Irvine.

Fikes, Andrew. 2010. Storage Architecture and Challenges. Google Faculty Summit, 535.

Foley, D., and J. Danskin. 2017. Ultra-Performance Pascal GPU and NVLink Interconnect. IEEE Micro 37 (02): 7–17.

Fujitsu. 2021. FUJITSU Processor A64FX.

Garrigues, Pierre. 2015. How Deep Learning Powers Flickr. RE.WORK Deep Learning Summit 2015.

Ghasemi, E., and P. Chow. 2016. Accelerating Apache Spark Big Data Analysis with FPGAs. 2016 International IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), 737–744. IEEE.

Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. 2003. The Google File System. Proceedings of the 19th ACM Symposium on Operating Systems Principles, 20–43.

Goldenberg, Dror, Michael Kagan, Ran Ravid, and Michael S. Tsirkin. 2005. Transparently Achieving Superior Socket Performance Using Zero Copy Socket Direct Protocol over 20 Gb/s InfiniBand Links. 2005 IEEE International Conference on Cluster Computing (Cluster), 1–10.

Gonzalez, Joseph E., Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. Graphx: Graph Processing in a Distributed Dataflow Framework. OSDI, 599–613.

Google. 2010. Our New Search Index: Caffeine.

Google. 2021a. Cloud TPU. https://cloud.google.com/tpu/.

Google. 2021b. TensorFlow.

Gottschling, Paul. 2019. Monitor Apache Hive with Datadog.

Goudarzi, Hadi, Mohammad Ghasemazar, and Massoud Pedram. 2012. SLA-based Optimization of Power and Migration Cost in Cloud Computing. 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012.

Govindaraju, N. K., B. He, Q. Luo, and W. Fang. 2011. Mars: Accelerating MapReduce with Graphics Processors. IEEE Transactions on Parallel and Distributed Systems 22 (04): 608–620.

Govindaraju, Naga K., Brandon Lloyd, Wei Wang, Ming Lin, and Dinesh Manocha. 2004. Fast Computation of Database Operations Using Graphics Processors. Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. SIGMOD ’04, 215–226.

Graham, Richard L., Devendar Bureddy, Pak Lui, Hal Rosenstock, Gilad Shainer, Gil Bloch, Dror Goldenberg, Mike Dubman, Sasha Kotchubievsky, Vladimir Koushnir, et al. 2016. Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction. Proceedings of the First Workshop on Optimization of Communication in HPC, 1–10.

Graph500.

Graphcore. 2021. https://www.graphcore.ai.

Greeneitch, Nathan G., Jing Xu, and Shailendrsingh Kishore Sobhee. 2019. Getting Started with Intel® Optimization for PyTorch.

Groupon. 2017. Sparklint. https://github.com/groupon/sparklint.

gRPC Authors. 2021. gRPC: A High Performance, Open Source Universal RPC Framework.

Gubner, Tim, and Peter A. Boncz. 2017. Exploring Query Compilation Strategies for JIT, Vectorization and SIMD. Eighth International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS). Vol. 2.

Gugnani, Shashank, Xiaoyi Lu, and Dhabaleswar K. Panda. 2016. Designing Virtualization-aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-Enabled Clouds. 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 152–159.

Gugnani, Shashank, Xiaoyi Lu, and Dhabaleswar K. (DK) Panda. 2017. Swift-X: Accelerating OpenStack Swift with RDMA for Building an Efficient HPC Cloud. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. CCGRID ’17, 238–247.

Gugnani, Shashank, Xiaoyi Lu, and Dhabaleswar K. Panda. 2018. Analyzing, Modeling, and Provisioning QoS for NVMe SSDs. 2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC), 247–256.

Guo, Fan, Yongkun Li, Min Lv, Yinlong Xu, and John C. S. Lui. 2019. Hp-Mapper: A High Performance Storage Driver for Docker Containers. Proceedings of the ACM Symposium on Cloud Computing, 325–336.

Gupta, K., J. A. Stuart, and J. D. Owens. 2012. A Study of Persistent Threads Style GPU Programming for GPGPU Workloads. 2012 Innovative Parallel Computing (InPar), 1–14.

Gupta, Vishakha, Ada Gavrilovska, Karsten Schwan, Harshvardhan Kharche, Niraj Tolia, Vanish Talwar, and Parthasarathy Ranganathan. 2009. GViM: GPU-accelerated Virtual Machines. Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing (HPCVirt ’09). Association for Computing Machinery, New York, NY, USA, 17–24.

Gurajada, Sairam, Stephan Seufert, Iris Miliaraki, and Martin Theobald. 2014. TriAD: A Distributed Shared-nothing RDF Engine Based on Asynchronous Message Passing. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. SIGMOD ’14, 289–300.

Guz, Zvika, Harry Huan Li, Anahita Shayesteh, and Vijay Balakrishnan. 2017. NVMe-over-Fabrics Performance Characterization and the Path to Low-Overhead Flash Disaggregation. Proceedings of the 10th ACM International Systems and Storage Conference, SYSTOR ’17, 16.

Habana Gaudi. 2019. Gaudi™ Training Platform White Paper.

Habana Goya. 2019. Goya™ Inference Platform White Paper. https://habana.ai/wp-content/uploads/pdf/habana_labs_goya_whitepaper.pdf.

Hamilton, Mark, Sudarshan Raghunathan, Akshaya Annavajhala, Danil Kirsanov, Eduardo de Leon, Eli Barzilay, Ilya Matiach, Joe Davison, Maureen Busch, Miruna Oprescu, et al. 2018. Flexible and Scalable Deep Learning with MMLSpark. arXiv preprint. arXiv:1804.04031.

Handy, Jim. 2015. Understanding the Intel/Micron 3D XPoint Memory. In 2015 Storage Developer Conference (SDC) [Presentation]. 68.

Harris, Derrick. 2015. Google, Stanford Say Big Data is Key to Deep Learning for Drug Discovery. https://gigaom.com/2015/03/02/google-stanford-say-big-data-is-key-to-deep-learning-for-drug-discovery.

Harris, Mark. 2017. Unified Memory for CUDA Beginners.

Harzog, Bernd. 2019. Modern Applications Require Modern APM Solutions: A SolarWinds APM Suite Whitepaper.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.

He, Wenting, Huimin Cui, Binbin Lu, Jiacheng Zhao, Shengmei Li, Gong Ruan, Jingling Xue, Xiaobing Feng, Wensen Yang, and Youliang Yan. 2015. Hadoop+: Modeling and Evaluating the Heterogeneity for MapReduce Applications in Heterogeneous Clusters. Proceedings of the 29th ACM International Conference on Supercomputing, 143–153.

Henseler, Dave, Benjamin Landsteiner, Doug Petesch, Cornell Wright, and Nicholas J. Wright. 2016. Architecture and Design of Cray DataWarp. Cray User Group CUG.

Herodotou, Herodotos, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, and Shivnath Babu. 2011. Starfish: A Self-tuning System for Big Data Analytics. Proceedings of the Fifth Biennial Conference on Innovative Data Systems Research (CIDR) 11: 261–272. www.cidrdb.org.

HeteroDB. 2021. PG-Strom.

Hetherington, Tayler H., Mike O’Connor, and Tor M. Aamodt. 2015. MemcachedGPU: Scaling-up Scale-out Key-Value Stores. SoCC ’15: Proceedings of the Sixth ACM Symposium on Cloud Computing, 43–57.

Hetherington, T. H., T. G. Rogers, L. Hsu, M. O’Connor, and T. M. Aamodt. 2012. Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems. 2012 IEEE International Symposium on Performance Analysis of Systems Software, 88–98.

Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. Long Short-term Memory. Neural Computation 9 (8): 1735–1780.

Howard, Andrew G., Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv/CoRR abs/1704.04861.

Huang, J., K. Schwan, and M. K. Qureshi. 2014. NVRAM-aware Logging in Transaction Systems. Proceedings of the VLDB Endowment 8 (4): 389–400.

Huang, Ting-Chang, and Da-Wei Chang. 2016. TridentFS: A Hybrid File System for Non-volatile RAM, Flash Memory and Magnetic Disk. Software: Practice and Experience 46 (3): 291–318.

Huang, Yihe, Matej Pavlovic, Virendra Marathe, Margo Seltzer, Tim Harris, and Steve Byan. 2018. Closing the Performance Gap between Volatile and Persistent Key-Value Stores Using Cross-referencing Logs. 2018 USENIX Annual Technical Conference (USENIX ATC 18), 967–979.

Hyper. 2021. HyPer—A Hybrid OLTP&OLAP High Performance DBMS.

Iandola, Forrest N., Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and 0.5 MB Model Size. arXiv preprint. arXiv:1602.07360.

IBM. 2018. ibmgraphblas.

IBM. 2020. IBM Reveals Next-generation IBM POWER10 Processor.

IBM. 2021. IBM General Parallel File System (GPFS) Product Documentation. https://www.ibm.com/support/knowledgecenter/SSFKCN/gpfs_content.html.

IBMSparkGPU. 2016. SparkGPU.

IBTA, InfiniBand Trade Association. 2021a. InfiniBand Trade Association.

IBTA, InfiniBand Trade Association. 2021b. RoCE Is RDMA over Converged Ethernet.

InfiniBand Trade Association. 2010. Supplement to InfiniBand Architecture Specification Volume 1, Release 1.2.1: Annex A16: RDMA over Converged Ethernet (RoCE), April.

insideHPC. 2021. Microchip Technology Inc.: Introducing First PCI Express 5.0 Switches.

Intel. 2012. Intel® Data Direct I/O Technology (Intel® DDIO): A Primer. Technical brief, Intel.

Intel. 2015a. Linux-pmfs/pmfs: Persistent Memory File System. https://github.com/linux-pmfs/pmfs.

Intel. 2015b. Performance Benchmarking for PCIe and NVMe Enterprise Solid-State Drives. White paper.

Intel. 2017. Intel SPDK. https://www.spdk.io/.

Intel. 2019a. Cascade Lake.

Intel. 2019b. Intel® Distribution of Caffe. https://github.com/intel/caffe.

Intel. 2019c. Intel Unveils New GPU Architecture with High-performance Computing and AI Acceleration, and oneAPI Software Stack with Unified and Scalable Abstraction for Heterogeneous Architectures.

Intel. 2019d. Next-generation Intel Xeon Scalable Processors to Deliver Breakthrough Platform Performance with up to 56 Processor Cores.

Intel. 2019e. PMDK: Persistent Memory Development Kit. https://github.com/pmem/pmdk/.

Intel. 2020a. HiBench Suite: The BigData Micro Benchmark Suite.

Intel. 2020b. Intel Announces Its Next Generation Memory and Storage Products.

Intel. 2020c. Intel Unpacks Architectural Innovations and Reveals New Transistor Technology at Architecture Day 2020.

Intel. 2021a. 24. Hash Library. Data Plane Development Kit 21.05.0 documentation. https://doc.dpdk.org/guides/prog_guide/hash_lib.html.

Intel. 2021b. AHCI Specification for Serial ATA.

Intel. 2021c. Intel® Advanced Vector Extensions 512 (Intel® AVX-512). https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-overview.html.

Intel. 2021d. Intel AI Hardware.

Intel. 2021e. The Intel Intrinsics Guide. https://software.intel.com/sites/landingpage/IntrinsicsGuide/.

Intel. 2021f. Intel® oneAPI Math Kernel Library.

Intel. 2021g. Intel® Xeon® Processors. https://www.intel.com/content/www/us/en/products/details/processors/xeon.html.

Intel. 2021h. oneAPI Deep Neural Network Library (oneDNN).

Intel. 2021i. Restricted Transactional Memory Overview. Intel® C++ Compiler Classic Developer Guide and Reference.

Intel. 2021j. Storage Performance Development Kit. https://github.com/spdk/spdk.

Interface AffinityFunction. 2021a. Apache Ignite (Ignite 2.10.0)—Apache Ignite.

Islam, N. S., M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy, and D. K. Panda. 2012. High Performance RDMA-based Design of HDFS over InfiniBand. SC ’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 1–12. doi: 10.1109/SC.2012.65.

Islam, Nusrat S., Xiaoyi Lu, Md. W. Rahman, and D. K. Panda. 2013. Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? 2013 IEEE 21st Annual Symposium on High-performance Interconnects, 75–78, doi: 10.1109/HOTI.2013.24.

Islam, Nusrat S., Xiaoyi Lu, Md. Wasi-ur Rahman, and Dhabaleswar K. (DK) Panda. 2014. SOR-HDFS: A SEDA-based Approach to Maximize Overlapping in RDMA-enhanced HDFS. Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing. HPDC ’14, 261–264.

Islam, Nusrat Sharmin, Xiaoyi Lu, Md. Wasi-ur Rahman, Jithin Jose, and Dhabaleswar K. (DK) Panda. 2012. A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters. In WBDB 2012: Specifying Big Data Benchmarks, 129–147. Lecture Notes in Computer Science 8163. New York: Springer.

Islam, Nusrat Sharmin, Xiaoyi Lu, Md. Wasi-ur Rahman, Dipti Shankar, and Dhabaleswar K. Panda. 2015. Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture. 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 101–110.

Islam, Nusrat Sharmin, Md. Wasi-ur Rahman, Xiaoyi Lu, and Dhabaleswar K. (DK) Panda. 2016a. Efficient Data Access Strategies for Hadoop and Spark on HPC Cluster with Heterogeneous Storage. 2016 IEEE International Conference on Big Data (Big Data), 223–232.

Islam, Nusrat Sharmin, Md. Wasi-ur Rahman, Xiaoyi Lu, and Dhabaleswar K. Panda. 2016b. High Performance Design for HDFS with Byte-addressability of NVM and RDMA. ICS ’16: Proceedings of the 2016 International Conference on Supercomputing, 8:1–8:14.

ISO/IEC. 2016. ISO/IEC 9075-1:2016: Information Technology—Database Languages—SQL—Part 1: Framework (SQL/Framework).

Izraelevitz, Joseph, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu, Subramanya R. Dulloor, et al. 2019. Basic Performance Measurements of the Intel Optane DC Persistent Memory Module. arXiv/CoRR abs/1903.05714.

Leverich, Jacob. 2021. Mutilate: A High-Performance Memcached Load Generator.

Javed, M. Haseeb, Khaled Z. Ibrahim, and Xiaoyi Lu. 2019. Performance Analysis of Deep Learning Workloads Using Roofline Trajectories. CCF Transactions on High Performance Computing 1 (3): 224–239.

Javed, M. H., X. Lu, and D. K. Panda. 2018. Cutting the Tail: Designing High Performance Message Brokers to Reduce Tail Latencies in Stream Processing. 2018 IEEE International Conference on Cluster Computing (CLUSTER), 223–233. doi: 10.1109/CLUSTER.2018.00040.

Jia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. Proceedings of the 22nd ACM International Conference on Multimedia, 675–678.

Jiang, Yimin, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, and Chuanxiong Guo. 2020. A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters. 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), 463–479.

Jiang, Zihan, Wanling Gao, Lei Wang, Xingwang Xiong, Yuchen Zhang, Xu Wen, Chunjie Luo, Hainan Ye, Yunquan Zhang, Shengzhong Feng, et al. 2019. HPC AI500: A Benchmark Suite for HPC AI Systems. International Symposium on Benchmarking, Measuring and Optimization (pp. 10–22). Springer, Cham.

Jose, J., H. Subramoni, K. Kandalla, M. Wasi ur Rahman, H. Wang, S. Narravula, and D. K. Panda. 2012. Scalable Memcached Design for InfiniBand Clusters Using Hybrid Transports. 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID ’12), 236–243.

Jose, J., H. Subramoni, M. Luo, M. Zhang, J. Huang, Md. Wasi-ur Rahman, N. S. Islam, X. Ouyang, H. Wang, S. Sur, and D. K. Panda. 2011. Memcached Design on High Performance RDMA Capable Interconnects. Proceedings of the 2011 International Conference on Parallel Processing (ICPP), 743–752. doi: 10.1109/ICPP.2011.37.

Jose, Jithin, Mingzhe Li, Xiaoyi Lu, Krishna Chaitanya Kandalla, Mark Daniel Arnold, and Dhabaleswar K. Panda. 2013. SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience. Proceedings of IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 385–392.

Joshi, Kanchan, Kaushal Yadav, and Praval Choudhary. 2017. Enabling NVMe WRR Support in Linux Block Layer. 9th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage ’17).

Kadekodi, Rohan, Se Kwon Lee, Sanidhya Kashyap, Taesoo Kim, Aasheesh Kolli, and Vijay Chidambaram. 2019. SplitFS: Reducing Software Overhead in File Systems for Persistent Memory. Proceedings of the 27th ACM Symposium on Operating Systems Principles, 494–508.

Kalia, A., M. Kaminsky, and D. G. Andersen. 2014. Using RDMA Efficiently for Key-Value Services. Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM ’14), 295–306.

Kalia, Anuj, Michael Kaminsky, and David Andersen. 2019. Datacenter RPCs Can Be General and Fast. 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), 1–16.

Kalia, Anuj, Michael Kaminsky, and David G. Andersen. 2016. Design Guidelines for High Performance RDMA Systems. 2016 USENIX Annual Technical Conference (USENIX ATC 16), 437–450.

Kallman, Robert, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, et al. 2008. H-Store: A High-performance, Distributed Main Memory Transaction Processing System. Proceedings of the VLDB Endowment 1 (2): 1496–1499.

Kang, Jeong-Uk, Jeeseok Hyun, Hyunjoo Maeng, and Sangyeun Cho. 2014. The Multi-streamed Solid-state Drive. 6th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage ’14), 13.

Kang, Yangwook, Rekha Pitchumani, Pratik Mishra, Yang-suk Kee, Francisco Londono, Sangyoon Oh, Jongyeol Lee, and Daniel D. G. Lee. 2019. Towards Building a High-performance, Scale-in Key-Value Storage System. Proceedings of the 12th ACM International Conference on Systems and Storage. SYSTOR ’19, 144–154.

Kannan, Sudarsun, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Yuangang Wang, Jun Xu, and Gopinath Palani. 2018. Designing a True Direct-access File System with DevFS. 16th USENIX Conference on File and Storage Technologies (FAST 18), 241–256.

Kanwar, Pankaj, Peter Brandt, and Zongwei Zhou. 2020. TensorFlow 2 MLPerf Submissions Demonstrate Best-in-Class Performance on Google Cloud.

Kastuar, Vidhi, Will Ochandarena, and Tushar Saxena. 2019. Speed Up Training on Amazon SageMaker Using Amazon FSx for Lustre and Amazon EFS File Systems.

Katevenis, Manolis, Stefanos Sidiropoulos, and Costas Courcoubetis. 1991. Weighted Round-Robin Cell Multiplexing in a General-purpose ATM Switch Chip. IEEE Journal on Selected Areas in Communications 9 (8): 1265–1279.

Kehne, Jens, Jonathan Metter, and Frank Bellosa. 2015. GPUswap: Enabling Oversubscription of GPU Memory through Transparent Swapping. VEE ’15, 65–77.

Kemper, Alfons, and Thomas Neumann. 2011. HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots. 2011 IEEE 27th International Conference on Data Engineering, 195–206.

Kemper, Alfons, and Thomas Neumann. 2021. HyPer: Hybrid OLTP&OLAP High-Performance Database System.

Khorasani, Farzad, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: Vertex-centric Graph Processing on GPUs. Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing. HPDC ’14, 239–252.

Kim, Changkyu, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. Proceedings of the VLDB Endowment 2 (2): 1378–1389.

Kim, Hyeong-Jun, Young-Sik Lee, and Jin-Soo Kim. 2016. NVMeDirect: A User-space I/O Framework for Application-specific Optimization on NVMe SSDs. 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage ’16).

Kim, Wook-Hee, Jinwoong Kim, Woongki Baek, Beomseok Nam, and Youjip Won. 2016. NVWAL: Exploiting NVRAM in Write-ahead Logging. SIGPLAN Notices 51 (4): 385–398.

Kinetica. 2021. The Database for Time and Space: Fuse, Analyze, and Act in Real Time.

Kingma, Diederik P., and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations (ICLR).

Kingsbury, Kyle, Pierre-Yves Ritschard, and James Turnbull. 2021. Riemann Monitors Distributed Systems.

Kissinger, Thomas, Tim Kiefer, Benjamin Schlegel, Dirk Habich, Daniel Molka, and Wolfgang Lehner. 2014. ERIS: A NUMA-aware In-memory Storage Engine for Analytical Workloads. Proceedings of the VLDB Endowment 7 (14): 1–12.

Klimovic, Ana, Heiner Litz, and Christos Kozyrakis. 2017. ReFlex: Remote Flash ≈ Local Flash. ACM SIGARCH Computer Architecture News 45 (1): 345–359.

Klimovic, Ana, Yawen Wang, Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, and Christos Kozyrakis. 2018. Pocket: Elastic Ephemeral Storage for Serverless Analytics. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 427–444.

Konduit. 2021. Deep Learning for Java: Open-source, Distributed, Deep Learning Library for the JVM.

Kourtis, Kornilios, Nikolas Ioannou, and Ioannis Koltsidas. 2019. Reaping the Performance of Fast NVM Storage with uDepot. 17th USENIX Conference on File and Storage Technologies (FAST 19), 1–15.

Krizhevsky, Alex. 2009. Learning Multiple Layers of Features from Tiny Images.

Krizhevsky, Alex. 2021. The CIFAR-10 Dataset. http://www.cs.toronto.edu/~kriz/cifar.html.

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 25th International Conference on Neural Information Processing Systems—Volume 1 (NIPS ’12). Curran Associates Inc. 1097–1105.

Kubernetes Team. 2021. Kubernetes: Production-grade Container Orchestration.

Kulkarni, Sanjeev, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. SIGMOD ’15, 239–250.

Kültürsay, Emre, Mahmut Kandemir, Anand Sivasubramaniam, and Onur Mutlu. 2013. Evaluating STT-RAM as an Energy-efficient Main Memory Alternative. 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 256–267.

Kurth, T., J. Zhang, N. Satish, I. Mitliagkas, E. Racah, M. A. Patwary, T. Malas, N. Sundaram, W. Bhimji, M. Smorkalov, et al. 2017. Deep Learning at 15PF: Supervised and Semi-supervised Classification for Scientific Data. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’17). Association for Computing Machinery, Article 7, 1–11.

Kwak, Haewoon, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What Is Twitter, a Social Network or a News Media? Proceedings of the 19th International Conference on World Wide Web. WWW ’10, 591–600.

Kwon, Dongup, Junehyuk Boo, Dongryeong Kim, and Jangwoo Kim. 2020. FVM: FPGA-assisted Virtual Device Emulation for Fast, Scalable, and Flexible Storage Virtualization. 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), 955–971.

Kwon, Youngjin, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, and Thomas Anderson. 2017. Strata: A Cross Media File System. Proceedings of the 26th Symposium on Operating Systems Principles (SOSP ’17), 460–477.

Lagrange, Veronica, Changho Choi, and Vijay Balakrishnan. 2016. Accelerating OLTP Performance with NVMe SSDs.

Lavasani, Maysam, Hari Angepat, and Derek Chiou. 2014. An FPGA-based In-line Accelerator for Memcached. IEEE Computer Architecture Letters 13 (2): 57–60.

Lawrence Livermore National Laboratory. 2020. LLNL and HPE to Partner with AMD on El Capitan, Projected as World’s Fastest Supercomputer.

Lawrence, Steve, C. Lee Giles, Ah Chung Tsoi, and Andrew D. Back. 1997. Face Recognition: A Convolutional Neural-Network Approach. IEEE Transactions on Neural Networks 8 (1): 98–113.

LDBC. 2021. Linked Data Benchmark Council (LDBC): The Graph and RDF Benchmark Reference.

LeCun, Yann, Corinna Cortes, and Christopher J. C. Burges. 2021. The MNIST Database of Handwritten Digits.

Lee, Benjamin C., Engin Ipek, Onur Mutlu, and Doug Burger. 2009. Architecting Phase Change Memory as a Scalable DRAM Alternative. Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA ’09). Association for Computing Machinery, 2–13.

Lee, Benjamin C., Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, and Doug Burger. 2010. Phase-change Technology and the Future of Main Memory. IEEE Micro 30 (1): 143.

Lee, Changman, Dongho Sim, Jooyoung Hwang, and Sangyeun Cho. 2015. F2FS: A New File System for Flash Storage. 13th USENIX Conference on File and Storage Technologies (FAST 15), 273–286.

Lee, Hyungro, and Geoffrey Fox. 2019. Big Data Benchmarks of High-performance Storage Systems on Commercial Bare Metal Clouds. 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), 1–8.

Leibiusky, Jonathan. 2021. Jedis. https://github.com/redis/jedis.

Leis, Viktor, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 743–754.

Lenharth, Andrew, and Keshav Pingali. 2015. Scaling Runtimes for Irregular Algorithms to Large-scale NUMA Systems. Computer 48 (8): 35–44.

Lepak, Kevin, Gerry Talbot, Sean White, Noah Beck, and Sam Naffziger. 2017. The Next Generation AMD Enterprise Server Product Architecture. IEEE Hot Chips 29 Symposium.

Lepers, Baptiste, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. 2019. KVell: The Design and Implementation of a Fast Persistent Key-Value Store. SOSP ’19: Proceedings of the 27th ACM Symposium on Operating Systems Principles, 447–461.

Li, Feng, Sudipto Das, Manoj Syamala, and Vivek R. Narasayya. 2016. Accelerating Relational Databases by Leveraging Remote Memory and RDMA. Proceedings of the 2016 International Conference on Management of Data, 355–370.

Li, Haoyuan, Ali Ghodsi, Matei Zaharia, Scott Shenker, and Ion Stoica. 2014. Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks. SoCC ’14: Proceedings of the ACM Symposium on Cloud Computing, 1–15.

Li, Min, Jian Tan, Yandong Wang, Li Zhang, and Valentina Salapura. 2015. SparkBench: A Comprehensive Benchmarking Suite for in Memory Data Analytic Platform Spark. Proceedings of the 12th ACM International Conference on Computing Frontiers (CF ’15), 53:1–53:8.

Li, Mingzhe, Xiaoyi Lu, Khaled Hamidouche, Jie Zhang, and D. K. Panda. 2016. Mizan-RMA: Accelerating Mizan Graph Processing Framework with MPI RMA. 2016 IEEE 23rd International Conference on High Performance Computing (HiPC), 42–51.

Li, Mu, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server. Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI ’14), 583–598.

Li, M., X. Lu, S. Potluri, K. Hamidouche, J. Jose, K. Tomko, and D. K. Panda. 2014. Scalable Graph500 Design with MPI-3 RMA. 2014 IEEE International Conference on Cluster Computing (CLUSTER), 230–238.

Li, Peilong, Yan Luo, Ning Zhang, and Yu Cao. 2015. Heterospark: A Heterogeneous CPU/GPU Spark Platform for Machine Learning Algorithms. 2015 IEEE International Conference on Networking, Architecture and Storage (NAS), 347–348.

Li, Tianxi, Dipti Shankar, Shashank Gugnani, and Xiaoyi Lu. 2020. RDMP-KV: Designing Remote Direct Memory Persistence Based Key-Value Stores with PMEM. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’20).

Li, Yixing, Zichuan Liu, Kai Xu, Hao Yu, and Fengbo Ren. 2018. A GPU-outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks. Journal of Emerging Technologies in Computing Systems 14 (2). doi:10.1145/3154839.

LightNVM. 2018. Open-Channel SSD. http://lightnvm.io/.

Lim, H., D. Han, D. G. Andersen, and M. Kaminsky. 2014. MICA: A Holistic Approach to Fast In-memory Key-Value Storage. Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI ’14).

Lin, Jimmy, and Alek Kolcz. 2012. Large-scale Machine Learning at Twitter. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12), 793–804.

Lin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. European Conference on Computer Vision, 740–755.

Linux. 2021a. AIO—POSIX Asynchronous I/O Overview.

Linux. 2021b. lseek—Linux Manual Page. https://man7.org/linux/man-pages/man2/lseek.2.html.

Linux RDMA. 2021. RDMA Core Userspace Libraries and Daemons.

Liu, Feilong, Lingyan Yin, and Spyros Blanas. 2019. Design and Evaluation of an RDMA-Aware Data Shuffling Operator for Parallel Database Systems. ACM Transactions on Database Systems 44 (4): 1–45.

Liu, Jiuxing. 2010. Evaluating Standard-based Self-virtualizing Devices: A Performance Study on 10 GbE NICs with SR-IOV Support. Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS), 1–12.

Liu, Xin, Yu-tong Lu, Jie Yu, Peng-fei Wang, Jie-ting Wu, and Ying Lu. 2017. ONFS: A Hierarchical Hybrid File System Based on Memory, SSD, and HDD for High Performance Computers. Frontiers of Information Technology and Electronic Engineering 18 (12): 1940–1971.

Lockwood, Glenn. 2017. What’s So Bad about POSIX I/O?

Low, Yucheng, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. Proceedings of the VLDB Endowment 5 (8): 716–727.

Low, Yucheng, Joseph E. Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E. Guestrin, and Joseph Hellerstein. 2014. GraphLab: A New Framework For Parallel Machine Learning. arXiv preprint. arXiv:1408.2041.

Lu, J., Y. Wan, Y. Li, C. Zhang, H. Dai, Y. Wang, G. Zhang, and B. Liu. 2019. Ultra-fast Bloom Filters using SIMD Techniques. IEEE Transactions on Parallel and Distributed Systems 30 (4): 953–964.

Lu, Lanyue, Thanumalayan Sankaranarayana Pillai, Hariharan Gopalakrishnan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2017. WiscKey: Separating Keys from Values in SSD-Conscious Storage. ACM Transactions on Storage (TOS) 13 (1): 1–28.

Lu, Ruirui, Gang Wu, Bin Xie, and Jingtong Hu. 2014. Stream Bench: Towards Benchmarking Modern Distributed Stream Computing Frameworks. Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC ’14), 69–78.

Lu, X., H. Shi, H. Javed, R. Biswas, and D. K. Panda. 2017. Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-capable Networks. The 25th Annual Symposium on High-Performance Interconnects (HOTI).

Lu, X., H. Shi, R. Biswas, M. H. Javed, and D. K. Panda. 2018. DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters. IEEE Transactions on Multi-Scale Computing Systems 4 (4): 635–648. doi: 10.1109/TMSCS.2018.2845886.

Lu, Xiaoyi, Bin Wang, Li Zha, and Zhiwei Xu. 2011. Can MPI Benefit Hadoop and MapReduce Applications? 2011 40th International Conference on Parallel Processing Workshops, 371–379.

Lu, Xiaoyi, Dipti Shankar, Shashank Gugnani, and Dhabaleswar K. Panda. 2016. High-performance Design of Apache Spark with RDMA and Its Benefits on Various Workloads. Proceedings of the 2016 IEEE International Conference on Big Data (Big Data). 253–262.

Lu, Xiaoyi, Fan Liang, Bin Wang, Li Zha, and Zhiwei Xu. 2014. DataMPI: Extending MPI to Hadoop-like Big Data Computing. 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 829–838.

Lu, Xiaoyi, Haiyang Shi, Dipti Shankar, and Dhabaleswar K. Panda. 2017. Performance Characterization and Acceleration of Big Data Workloads on OpenPOWER System. 2017 IEEE International Conference on Big Data (Big Data). 213–222. doi: 10.1109/BigData.2017.8257929.

Lu, Xiaoyi, Md. Wasi-ur Rahman, Nusrat Sharmin Islam, and Dhabaleswar K. (DK) Panda. 2014. A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks. In Advancing Big Data Benchmarks, 32–42. Lecture Notes in Computer Science 8585. New York: Springer.

Lu, Xiaoyi, M. W. U. Rahman, N. Islam, D. Shankar, and D. K. Panda. 2014. Accelerating Spark with RDMA for Big Data Processing: Early Experiences. 2014 IEEE 22nd Annual Symposium on High-Performance Interconnects (HOTI), 9–16.

Lu, Xiaoyi, Nusrat S. Islam, Md. Wasi. Rahman, Jithin Jose, Hari Subramoni, Hao Wang, and Dhabaleswar K. Panda. 2013. High-performance Design of Hadoop RPC with RDMA over InfiniBand. Proceedings of IEEE 42nd International Conference on Parallel Processing (ICPP). 641–650.

Lu, X., N. Islam, W. Rahman, and D. Panda. 2017. NRCIO: NVM-aware RDMA-based Communication and I/O Schemes for Big Data Analytics. Eighth Annual Non-Volatile Memories Workshop (NVMW ’17) [Presentation].

Lu, Youyou, Jiwu Shu, and Wei Wang. 2014. ReconFS: A Reconstructable File System on Flash Storage. 12th USENIX Conference on File and Storage Technologies (FAST 14), 75–88.

Lu, Youyou, Jiwu Shu, Youmin Chen, and Tao Li. 2017. Octopus: An RDMA-enabled Distributed Persistent Memory File System. 2017 USENIX Annual Technical Conference (USENIX ATC 17), 773–785.

Ma, Lingxiao, Zhi Yang, Han Chen, Jilong Xue, and Yafei Dai. 2017. Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication. 2017 USENIX Annual Technical Conference (USENIX ATC 17), 195–207.

Malewicz, Grzegorz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A System for Large-Scale Graph Processing. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, 135–146.

Markham, A., and Y. Jia. 2017. Caffe2: Portable High-performance Deep Learning Framework from Facebook. NVIDIA Developer Blog.

Markthub, Pak, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, and Satoshi Matsuoka. 2018. Dragon: Breaking GPU Memory Capacity Limits with Direct NVM Access. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’18), Article 32, 1–13.

Marmol, Leonardo, Swaminathan Sundararaman, Nisha Talagala, Raju Rangaswami, Sushma Devendrappa, Bharath Ramsundar, and Sriram Ganesan. 2015. NVMKV: A Scalable and Lightweight Flash Aware Key-Value Store. Proceedings of the 2015 USENIX Annual Technical Conference (USENIX ATC ’15), 207–219.

Massie, Matt. 2018. Ganglia Monitoring System.

Mattson, Peter, Christine Cheng, Gregory Diamos, Cody Coleman, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, et al. 2020. MLPerf Training Benchmark. Proceedings of Machine Learning and Systems 2: 336–349.

Mellanox. 2010. CORE-Direct: The Most Advanced Technology for MPI/SHMEM Collectives Offloads. Technology brief.

Mellanox. 2016. Understanding Erasure Coding Offload. https://community.mellanox.com/docs/DOC-2414.

Mellanox. 2018. Introducing 200G HDR InfiniBand Solutions. White paper. https://www.mellanox.com/related-docs/whitepapers/WP_Introducing_200G_HDR_InfiniBand_Solutions.pdf.

Mellanox, NVIDIA. 2011. Mellanox Announces Availability of UDA 2.0 for Big Data Analytic Acceleration.

Mellanox, NVIDIA. 2018. SparkRDMA ShuffleManager Plugin. https://github.com/Mellanox/SparkRDMA/.

Mellanox, NVIDIA. 2021. End-to-End High-Speed Ethernet and InfiniBand Interconnect Solutions.

Mellanox, NVIDIA. 2021. NVIDIA BlueField Data Processing Units.

Meng, Xiangrui, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, D. B. Tsai, Manish Amde, Sean Owen, et al. 2016. MLlib: Machine Learning in Apache Spark. Journal of Machine Learning Research 17 (1): 1235–1241.

Mickens, James, Edmund B. Nightingale, Jeremy Elson, Darren Gehring, Bin Fan, Asim Kadav, Vijay Chidambaram, Osama Khan, and Krishna Nareddy. 2014. Blizzard: Fast, Cloud-scale Block Storage for Cloud-oblivious Applications. 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), 257–273.

Micron. 2019. 3D XPoint Technology.

Microsoft. 2021a. Create an Azure VM with Accelerated Networking using Azure CLI—Microsoft Docs.

Microsoft. 2021b. MicrosoftResearch/Dryad: A Research Prototype of the Dryad and DryadLINQ Data-parallel Processing Frameworks Running on Hadoop YARN. GitHub.

Min, Changwoo, Sanidhya Kashyap, Steffen Maass, and Taesoo Kim. 2016. Understanding Manycore Scalability of File Systems. USENIX Annual Technical Conference (USENIX ATC ’16), 71–85.

Mitchell, C., Y. Geng, and J. Li. 2013. Using One-sided RDMA Reads to Build a Fast, CPU-efficient Key-Value Store. Proceedings of USENIX Annual Technical Conference (USENIX ATC ’13).

MLBench Team. 2021. MLBench: Distributed Machine Learning Benchmark.

MLCommons. 2021. MLCommons Aims to Accelerate Machine Learning Innovation to Benefit Everyone.

Monroe, Don. 2020. Fugaku Takes the Lead. Communications of the ACM 64 (1): 16–18.

Moody, Adam, Danielle Sikich, Ned Bass, Michael J. Brim, Cameron Stanavige, Hyogi Sim, Joseph Moore, Tony Hutter, Swen Boehm, Kathryn Mohror, et al.; USDOE National Nuclear Security Administration. 2017. UnifyFS: A Distributed Burst Buffer File System 0.1.0.

Moor Insights and Strategy. 2020. The Graphcore Second Generation IPU. https://www.graphcore.ai/hubfs/MK2-%20The%20Graphcore%202nd%20Generation%20IPU%20Final%20v7.14.2020.pdf?h.

Morgan, Timothy Prickett. 2019. Doing the Math on Future Exascale Supercomputers.

Moritz, Philipp, Robert Nishihara, Ion Stoica, and Michael I. Jordan. 2015. SparkNet: Training Deep Networks in Spark. arXiv/CoRR abs/1511.06051.

Mouzakitis, Evan. 2016. How to monitor Hadoop with Datadog.

Mouzakitis, Evan, and David Lentz. 2018. Monitor Redis using Datadog. https://www.datadoghq.com/blog/monitor-redis-using-datadog/.

MPI Forum. 1993. MPI: A Message Passing Interface. Proceedings of the 1993 ACM/IEEE Conference on Supercomputing (Supercomputing ’93), 878–883.

Murphy, Barbara. 2018. How to Shorten Deep Learning Training Times.

MySQL. 2020. MySQL Database. http://www.mysql.com.

National Energy Research Scientific Computing Center (NERSC). 2021a. Cori.

National Energy Research Scientific Computing Center (NERSC). 2021b. Perlmutter.

Netty Project, The. 2021. Netty Project. http://netty.io.

Network Based Computing Lab (NOWLAB). 2021a. High-Performance Big Data (HiBD).

Network Based Computing Lab (NOWLAB). 2021b. MVAPICH: MPI over InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE.

Network Based Computing Lab (NOWLAB). 2022. High-Performance Big Data (HiBD).

Neumann, Thomas, and Gerhard Weikum. 2008. RDF-3X: A RISC-style Engine for RDF. Proceedings of the VLDB Endowment 1 (1): 647–659.

NLM, National Library of Medicine. 2020. PubChemRDF.

Norton, Alex, Steve Conway, and Earl Joseph. 2020. Bringing HPC Expertise to Cloud Computing. White paper. Hyperion Research.

NoSQL Database. 2021. NoSQL—Your Ultimate Guide to the Non-Relational Universe!

Nowoczynski, P., N. Stone, J. Yanovich, and J. Sommerfield. 2008. Zest Checkpoint Storage System for Large Supercomputers. 2008 3rd Petascale Data Storage Workshop, 1–5.

NumFOCUS. 2021. Pandas.

NumPy. 2021. NumPy. http://www.numpy.org/.

NVIDIA. 2017. NVIDIA Tesla V100 GPU Architecture. White paper.

NVIDIA. 2020. cuStreamz: A Journey to Develop GPU-Accelerated Streaming Using RAPIDS. https://www.nvidia.com/en-us/on-demand/session/gtcfall20-a21437/.

NVIDIA. 2021a. CUDA Zone.

NVIDIA. 2021b. cuGraph. https://github.com/rapidsai/cugraph.

NVIDIA. 2021c. cuML.

NVIDIA. 2021d. Developing a Linux Kernel Module using GPUDirect RDMA.

NVIDIA. 2021e. GDRCopy: A Low-latency GPU Memory Copy Library based on NVIDIA GPUDirect RDMA Technology.

NVIDIA (Mellanox Technologies). 2021f. Apache Spark RDMA plugin.

NVIDIA. 2021g. NVIDIA Ampere Architecture: The Heart of the World’s Highest-performing Elastic Data Centers.

NVIDIA. 2021h. NVIDIA DGX-1: Essential Instrument of AI Research. https://www.nvidia.com/en-us/data-center/dgx-1/.

NVIDIA. 2021i. NVIDIA DGX-2: Break through the Barriers to AI Speed and Scale.

NVIDIA. 2021j. NVIDIA DGX Systems: Purpose-built for the Unique Demands of AI.

NVIDIA. 2021k. NVIDIA Pascal Architecture: Infinite Compute for Infinite Opportunities.

NVIDIA. 2021l. NVIDIA Turing GPU Architecture: Graphics Reinvented. White paper.

NVIDIA. 2021m. About Us. https://www.nvidia.com/en-us/about-nvidia/.

NVIDIA. 2021n. RAPIDS—Open GPU Data Science.

NVIDIA. 2021o. Virtual GPU Software User Guide.

NVM Express. 2016. NVMe over Fabrics. http://www.nvmexpress.org/wp-content/uploads/NVMe_Over_Fabrics.pdf.

NVM Express. 2021. NVM Express.

Oak Ridge National Laboratory. 2018. Summit: America’s Newest and Smartest Supercomputer.

Oak Ridge National Laboratory. 2021a. Frontier: Direction of Discovery: ORNL’s Exascale Supercomputer Designed to Deliver World-Leading Performance in 2021.

Oak Ridge National Laboratory. 2021b. Summit: Oak Ridge National Laboratory’s 200 Petaflop Supercomputer.

OpenFabrics Alliance. 2021. OpenFabrics Alliance—Innovation in High Speed Fabrics.

Open Group, The. 2011. POSIX™ 1003.1 Frequently Asked Questions (FAQ Version 1.18).

OpenMP. 2018. OpenMP API Specification Version 5.0 November 2018: 2.9.3 SIMD Directives.

OpenSFS and EOFS. 2021. Lustre® Filesystem. http://lustre.org/.

OpenStack. 2021a. Cinder.

OpenStack. 2021b. OpenStack Object Storage (Swift).

OpenVINO. 2021. OpenVINO Toolkit. https://github.com/openvinotoolkit/openvino.

Oracle. 2021. MySQL.

OrangeFS. 2021. The OrangeFS Project.

Ott, David. 2011. Optimizing Applications for NUMA.

Ould-Ahmed-Vall, Elmoustapha, Mahmoud Abuzaina, Md. Faijul Amin, Jayaram Bobba, Roman S. Dubtsov, Evarist M. Fomenko, Mukesh Gangadhar, Niranjan Hasabnis, Jing Huang, Deepthi Karkada, et al. 2017. TensorFlow Optimizations on Modern Intel Architecture.

Ousterhout, John, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park, Henry Qin, Mendel Rosenblum, et al. 2015. The RAMCloud Storage System. ACM Transactions on Computer Systems (TOCS) 33 (3): 7.

Ozery, Itay. 2018. Mellanox Accelerates Apache Spark Performance with RDMA and RoCE Technologies.

Padua, David, ed. 2011. Partitioned Global Address Space (PGAS) Languages. In Encyclopedia of Parallel Computing, 1465. Boston, MA: Springer.

Pagh, Rasmus, and Flemming Friche Rodler. 2004. Cuckoo Hashing. Journal of Algorithms 51 (2): 122–144.

Palit, Tapti, Yongming Shen, and Michael Ferdman. 2016. Demystifying Cloud Benchmarking. 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 122–132.

Panda, Biswanath, Joshua S. Herbach, Sugato Basu, and Roberto J. Bayardo. 2011. MapReduce and Its Application to Massively Parallel Learning of Decision Tree Ensembles. In Scaling Up Machine Learning, eds. Ron Bekkerman, Mikhail Bilenko, and John Langford, 23–48. Cambridge: Cambridge University Press.

Panda, Dhabaleswar K., Xiaoyi Lu, and Hari Subramoni. 2018. Networking and Communication Challenges for Post-exascale Systems. Frontiers of Information Technology and Electronic Engineering 19: 1230–1235.

Panigrahy, Rina. 2004. Efficient Hashing with Lookups in Two Memory Accesses. arXiv/CoRR cs.DS/0407023.

Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An Imperative Style, High-performance Deep Learning Library. arXiv preprint. arXiv:1912.01703.

PCI-SIG. 2019. Pioneering the Interconnect Industry: PCI-SIG® Announces Upcoming PCIe® 6.0 Specification.

PCI-SIG. 2021. Single Root I/O Virtualization and Sharing Specification Revision 1.1. https://pcisig.com/single-root-io-virtualization-and-sharing-specification-revision-11.

Pelley, Steven, Thomas F. Wenisch, Brian T. Gold, and Bill Bridge. 2013. Storage Management in the NVRAM Era. Proceedings of the VLDB Endowment 7 (2): 121–132.

PlatformLab. 2021. RAMCloud.

Plotly. 2021. Plotly. https://github.com/plotly/plotly.py.

Pmemkv. 2018. pmemkv.

Poke, Marius, and Torsten Hoefler. 2015. DARE: High-performance State Machine Replication on RDMA Networks. Proceedings of the 24th International Symposium on High-performance Parallel and Distributed Computing, 107–118.

Polychroniou, Orestis, and Kenneth A. Ross. 2014. Vectorized Bloom Filters for Advanced SIMD Processors. Proceedings of the Tenth International Workshop on Data Management on New Hardware, 6.

Polychroniou, Orestis, Arun Raghavan, and Kenneth A. Ross. 2015. Rethinking SIMD Vectorization for In-memory Databases. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 1493–1508.

Porobic, Danica, Erietta Liarou, Pınar Tözün, and Anastasia Ailamaki. 2014. ATraPos: Adaptive Transaction Processing on Hardware Islands. 2014 IEEE 30th International Conference on Data Engineering, 688–699.

Powell, Brett. 2017. Microsoft Power BI Cookbook: Creating Business Intelligence Solutions of Analytical Data Models, Reports, and Dashboards. Birmingham, UK: Packt Publishing Ltd.

Project Jupyter. 2021. Jupyter.

Prometheus. 2021. Prometheus—From Metrics to Insight.

Psaroudakis, Iraklis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2015. Scaling Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-aware Data and Task Placement. Proceedings of the VLDB Endowment 8: 1442–1453.

Psaroudakis, Iraklis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2016. Adaptive NUMA-aware Data Placement and Task Scheduling for Analytical Workloads in Main-Memory Column-Stores. Proceedings of the VLDB Endowment 10 (2): 37–48.

Pumma, Sarunya, Min Si, Wu-Chun Feng, and Pavan Balaji. 2019. Scalable Deep Learning via I/O Analysis and Optimization. ACM Transactions on Parallel Computing 6 (2): article 6.

Python. 2021. Threading—Thread-based Parallelism.

Qureshi, Moinuddin K., Vijayalakshmi Srinivasan, and Jude A. Rivers. 2009. Scalable High Performance Main Memory System Using Phase-change Memory Technology. ACM SIGARCH Computer Architecture News 37 (3): 24–33.

Rahman, Md. Wasi-ur, Nusrat Sharmin Islam, Xiaoyi Lu, Jithin Jose, Hari Subramoni, Hao Wang, and Dhabaleswar K. Panda. 2013. High-performance RDMA-based Design of Hadoop MapReduce over InfiniBand. 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PHD Forum (IPDPSW), 1908–1917.

Rahman, Md. Wasi-ur, Nusrat Sharmin Islam, Xiaoyi Lu, and Dhabaleswar K. Panda. 2017. NVMD: Non-Volatile Memory Assisted Design for Accelerating MapReduce and DAG Execution Frameworks on HPC Systems. Proceedings of IEEE International Conference on Big Data, BigData ’17, 369–374.

Rahman, M. W., Xiaoyi Lu, Nusrat S. Islam, and D. K. Panda. 2014. HOMR: A Hybrid Approach to Exploit Maximum Overlapping in MapReduce over High Performance Interconnects. Proceedings of the 28th ACM international conference on Supercomputing (ICS ’14). Association for Computing Machinery, 33–42.

Rahman, M. W., Xiaoyi Lu, Nusrat S. Islam, Raghunath Rajachandrasekar, and D. K. Panda. 2015. High-performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA. 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 291–300.

Raja, Raghu. 2019. Amazon Elastic Fabric Adapter: Anatomy, Capabilities, and the Road Ahead. 15th Annual OpenFabrics Alliance Workshop.

Rajpurkar, Pranav, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv preprint. arXiv:1606.05250.

Raju, Pandian, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abraham. 2017. PebblesDB: Building Key-Value Stores Using Fragmented Log-Structured Merge Trees. Proceedings of the 26th Symposium on Operating Systems Principles (SOSP ’17). Association for Computing Machinery, New York, NY, USA, 497–514.

RapidLoop. 2021. OpsDash. https://www.opsdash.com/integrations.

RAPIDS. 2021. cuDF—GPU DataFrames Library.

RDMA Consortium. 2016. Architectural Specifications for RDMA over TCP/IP.

Reddi, Vijay Janapa, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, et al. 2020. MLPerf Inference Benchmark. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 446–459.

Red Hat, Inc. 2021. Gluster Is a Free and Open Source Software Scalable Network Filesystem.

Redis Labs. 2021a. RedisGraph: A Graph Database Module for Redis.

Redis Labs. 2021b. Redis Cluster Specification.

Redis Labs. 2021c. Redis Sentinel Documentation.

Redis Labs. 2021d. Redis. https://redis.io.

Ren, Kun, Alexander Thomson, and Daniel J. Abadi. 2014. An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems. Proceedings of the VLDB Endowment 7 (10): 821–832.

Reynolds, Douglas A., and Richard C. Rose. 1995. Robust Text-independent Speaker Identification Using Gaussian Mixture Speaker Models. IEEE Transactions on Speech and Audio Processing 3 (1): 72–83.

Rho, Eunhee, Kanchan Joshi, Seung-Uk Shin, Nitesh Jagadeesh Shetty, Jooyoung Hwang, Sangyeun Cho, Daniel D. G. Lee, and Jaeheon Jeong. 2018. FStream: Managing Flash Streams in the File System. 16th USENIX Conference on File and Storage Technologies (FAST 18), 257–264.

RIKEN Center for Computational Science. 2020. Fugaku (supercomputer).

Rödiger, Wolf, Tobias Mühlbauer, Alfons Kemper, and Thomas Neumann. 2015. High-speed Query Processing over High-speed Networks. Proceedings of the VLDB Endowment 9 (4): 228–239.

Rohloff, Kurt, and Richard E. Schantz. 2010. High-performance, Massively Scalable Distributed Systems Using the MapReduce Software Framework: The SHARD Triple-Store. Programming Support Innovations for Emerging Distributed Applications (PSI ETA ’10), 4–145.

Ronan, Clément, Koray, and Soumith. 2021. Torch: Scientific Computing for LuaJIT.

Ross, Kenneth A. 2007. Efficient Hash Probes on Modern Processors. IEEE 23rd International Conference on Data Engineering (ICDE ’07), 1297–1301.

Rudoff, Andy. 2013. Programming Models for Emerging Non-volatile Memory Technologies. ;login: 38 (3): 40–45.

Rudoff, Andy. 2017. Persistent Memory Programming. ;login: 42: 34–40.

Ruprecht, Adam, Danny Jones, Dmitry Shiraev, Greg Harmon, Maya Spivak, Michael Krebs, Miche Baker-Harvey, and Tyler Sanderson. 2018. VM Live Migration at Scale. VEE ’18: Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 45–56.

Sadasivam, Satish Kumar, Brian W. Thompto, Ron Kalla, and William J. Starke. 2017. IBM Power9 Processor Architecture. IEEE Micro 37 (2): 40–51.

SambaNova Systems. 2021a. (Compute) Power to the People: Democratizing AI: A Conversation with AI Visionaries from SambaNova Systems.

SambaNova Systems. 2021b. Accelerated Computing with a Reconfigurable Dataflow Architecture. https://sambanova.ai/wp-content/uploads/2021/06/SambaNova_RDA_Whitepaper_English.pdf.

Saxena, Mohit, Michael M. Swift, and Yiying Zhang. 2012. FlashTier: A Lightweight, Consistent and Durable Storage Cache. Proceedings of the 7th ACM European Conference on Computer Systems, 267–280.

SchedMD. 2020a. Slurm Workload Manager—Documentation.

SchedMD. 2020b. Slurm Workload Manager—Overview.

Schmuck, Frank B., and Roger L. Haskin. 2002. GPFS: A Shared-Disk File System for Large Computing Clusters. Proceedings of the Conference on File and Storage Technologies (FAST ’02), 231–244.

Scouarnec, Nicolas Le. 2018. Cuckoo++ Hash Tables: High-performance Hash Tables for Networking Applications. Proceedings of the 2018 Symposium on Architectures for Networking and Communications Systems, 41–54.

Scylla. 2021. The Real-time Big Data Database. https://www.scylladb.com.

SDSC, San Diego Supercomputer Center. 2021a. SDSC Comet User Guide.

SDSC, San Diego Supercomputer Center. 2021b. SDSC Gordon User Guide.

Segal, Oren, and Martin Margala. 2016. Exploring the Performance Benefits of Heterogeneity and Reconfigurable Architectures in a Commodity Cloud. 2016 International Conference on High Performance Computing and Simulation (HPCS), 132–139.

Segal, Oren, Martin Margala, Sai Rahul Chalamalasetti, and Mitch Wright. 2014. High Level Programming Framework for FPGAs in the Data Center. 2014 24th International Conference on Field Programmable Logic and Applications (FPL), 1–4.

Seide, Frank, and Amit Agarwal. 2016. CNTK: Microsoft’s Open-Source Deep-Learning Toolkit. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). Association for Computing Machinery, 2135.

Semiodesk. 2021. Trinity RDF: Entity Framework for Graph Databases.

Shankar, Dipti, Xiaoyi Lu, and Dhabaleswar K. Panda. 2019a. SCOR-KV: SIMD-aware Client-centric and Optimistic RDMA-based Key-Value Store for Emerging CPU Architectures. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HIPC), 257–266.

Shankar, Dipti, Xiaoyi Lu, and Dhabaleswar K. Panda. 2019b. SimdHT-Bench: Characterizing SIMD-aware Hash Table Designs on Emerging CPU Architectures. 2019 IEEE International Symposium on Workload Characterization (IISWC), 178–188.

Shankar, Dipti, Xiaoyi Lu, Md. W. Rahman, Nusrat Islam, and D. K. Panda. 2015. Benchmarking Key-Value Stores on High-performance Storage and Interconnects for Web-scale Workloads. BIG DATA ’15: Proceedings of the 2015 IEEE International Conference on Big Data, 539–544.

Shankar, Dipti, Xiaoyi Lu, M. W. Rahman, Nusrat Islam, and Dhabaleswar K. Panda. 2014. A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-performance Networks. Proceedings of the Fifth Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BE-5), 19–33. Lecture Notes in Computer Science 8807. Hangzhou, China: Springer.

Shankar, D., X. Lu, and D. K. Panda. 2016. Boldio: A Hybrid and Resilient Burst-Buffer over Lustre for Accelerating Big Data I/O. 2016 IEEE International Conference on Big Data (BIG DATA), 404–409.

Shankar, D., X. Lu, N. Islam, M. Wasi-Ur Rahman, and D. K. Panda. 2016. High-performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 393–402.

Sharma, Upendra, Prashant Shenoy, Sambit Sahu, and Anees Shaikh. 2011. A Cost-aware Elasticity Provisioning System for the Cloud. 2011 31st International Conference on Distributed Computing Systems, 559–570.

Shen, Zhaoyan, Feng Chen, Yichen Jia, and Zili Shao. 2018. DIDACache: An Integration of Device and Application for Flash-based Key-Value Caching. ACM Transactions on Storage 14 (3): article 26.

Shi, Haiyang, and Xiaoyi Lu. 2019. TriEC: Tripartite Graph Based Erasure Coding NIC Offload. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19).

Shi, Haiyang, and Xiaoyi Lu. 2020. INEC: Fast and Coherent In-network Erasure Coding. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’20). IEEE Press, Article 66, 1–17.

Shi, Haiyang, Xiaoyi Lu, Dipti Shankar, and Dhabaleswar K. Panda. 2019. UMR-EC: A Unified and Multi-rail Erasure Coding Library for High-performance Distributed Storage Systems. Proceedings of the 28th International Symposium on High-performance Parallel and Distributed Computing (HPDC ’19), 219–230.

Shi, Jiaxin, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. 2016. Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 317–332.

Shi, Lin, Hao Chen, and Jianhua Sun. 2009. vCUDA: GPU Accelerated High Performance Computing in Virtual Machines. 2009 IEEE International Symposium on Parallel Distributed Processing, 1–11.

Shreedhar, Madhavapeddi, and George Varghese. 1996. Efficient Fair Queuing Using Deficit Round-Robin. IEEE/ACM Transactions on Networking 4 (3): 375–385.

Shue, David, and Michael J. Freedman. 2014. From Application Requests to Virtual IOPs: Provisioned Key-Value Storage with Libra. Proceedings of the 9th European Conference on Computer Systems (EuroSys ’14). Association for Computing Machinery, New York, NY, USA, Article 17, 1–14.

Shun, Julian, and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 135–146.

Shvachko, Konstantin, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop Distributed File System. Proceedings of the 26th Symposium on Mass Storage Systems and Technologies (MSST), 1–10.

Simonyan, Karen, and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-scale Image Recognition. Third International Conference on Learning Representations (ICLR 2015).

Singh, Teja, Sundar Rangarajan, Deepesh John, Russell Schreiber, Spence Oliver, Rajit Seahra, and Alex Schaefer. 2020. 2.1 Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core. 2020 IEEE International Solid-state Circuits Conference (ISSCC), 42–44.

Singhal, Amit. 2012. Introducing the Knowledge Graph: Things, Not Strings.

SingleStore Inc. 2021. SingleStore: The Single Database for All Data-Intensive Applications.

Sivathanu, Muthian, Tapan Chugh, Sanjay S. Singapuram, and Lidong Zhou. 2019. Astra: Exploiting Predictability to Optimize Deep Learning. Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’19), 909–923.

Smola, Alexander, and Shravan Narayanamurthy. 2010. An Architecture for Parallel Topic Models. Proceedings of the VLDB Endowment 3 (1-2): 703–710.

Song, Xiang, Jian Yang, and Haibo Chen. 2014. Architecting Flash-based Solid-state Drive for High-performance I/O Virtualization. IEEE Computer Architecture Letters 13 (2): 61–64.

spdk.io. 2021. SPDK Hello World.

SQream. 2021. Bringing the Power of the GPU to the Era of Massive Data. https://sqream.com/product/data-acceleration-platform/sql-gpu-database/.

Stanford DAWN Team. 2021. DAWNBench: An End-to-End Deep Learning Benchmark and Competition.

Stanford Vision Lab. 2021. ImageNet. https://image-net.org/.

Sterling, Thomas, Ewing Lusk, and William Gropp. 2003. Beowulf Cluster Computing with Linux. Cambridge, MA: MIT Press.

Streamz. 2021. Real-time Stream Processing for Python.

Strukov, Dmitri B., Gregory S. Snider, Duncan R. Stewart, and R. Stanley Williams. 2008. The Missing Memristor Found. Nature 453 (7191): 80.

Stuart, Jeff A., and John D. Owens. 2011. Multi-GPU MapReduce on GPU Clusters. 2011 IEEE International Parallel and Distributed Processing Symposium, 1068–1079.

Stuedi, Patrick, Animesh Trivedi, Jonas Pfefferle, Radu Stoica, Bernard Metzler, Nikolas Ioannou, and Ioannis Koltsidas. 2017. Crail: A High-performance I/O Architecture for Distributed Data Processing. IEEE Data Engineering Bulletin 40 (1): 38–49.

Su, Maomeng, Mingxing Zhang, Kang Chen, Zhenyu Guo, and Yongwei Wu. 2017. RFP: When RPC Is Faster than Server-bypass with RDMA. Proceedings of the 12th European Conference on Computer Systems. Eurosys ’17, 1–15.

Suzuki, Yusuke, Shinpei Kato, Hiroshi Yamada, and Kenji Kono. 2014. GPUvm: Why Not Virtualizing GPUs at the Hypervisor? Usenix Annual Technical Conference, 109–120.

Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going Deeper with Convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–9.

Szegedy, Christian, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI ’17). 4278–4284.

Tai, Kai Xin. 2020. Monitor Apache Flink with Datadog.

Taleb, Yacine, Ryan Stutsman, Gabriel Antoniu, and Toni Cortes. 2018. Tailwind: Fast and Atomic RDMA-based Replication. 2018 USENIX Annual Technical Conference (USENIX ATC 18), 851–863.

Talpey, Tom. 2015. Remote Access to Ultra-low-latency Storage.

Talpey, Tom. 2016. RDMA Extensions for Remote Persistent Memory Access. 12th Annual OpenFabrics Alliance Workshop.

Talpey, Tom. 2019. RDMA Persistent Memory Extensions. 15th Annual OpenFabrics Alliance Workshop.

Talpey, Tom, and Jim Pinkerton. 2016. RDMA Durable Write Commit. Internet Engineering Task Force (IETF) Internet-Draft.

Tang, Haodong, Jian Zhang, and Fred Zhang. 2018. Accelerating Ceph with RDMA and NVMe-oF.

TensorFlow. 2021. TensorBoard: TensorFlow’s Visualization Toolkit. https://www.tensorflow.org/tensorboard.

Thakur, Rajeev, William Gropp, and Ewing Lusk. 1998. A Case for Using MPI’s Derived Datatypes to Improve I/O Performance. Proceedings of the 1998 ACM/IEEE conference on Supercomputing (SC ’98). IEEE Computer Society, 1–10.

Thaler, David, and Chinya V. Ravishankar. 1996. A Name-based Mapping Scheme for Rendezvous. Technical report CSE-TR-316-96, University of Michigan.

Thomson, Alexander, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. 2012. Calvin: Fast Distributed Transactions for Partitioned Database Systems. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12), 1–12.

Tian, Kun, Yaozu Dong, and David Cowperthwaite. 2014. A Full GPU Virtualization Solution with Mediated Pass-Through. 2014 USENIX Annual Technical Conference (USENIX ATC 14). 121–132.

Toon, Nigel. 2020. Introducing 2nd Generation IPU Systems for AI at Scale.

TOP500.org. TOP500 Supercomputing Sites. http://www.top500.org/.

TOP500.org. 2020. Highlights—November 2020.

Toshniwal, Ankit, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, et al. 2014. Storm@Twitter. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD ’14). Association for Computing Machinery, 147–156.

TPC-H Version 2 and Version 3. 2021. TPC-H Benchmark.

Transaction Processing Performance Council. TPC—Homepage.

Tu, Stephen, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy Transactions in Multicore In-memory Databases. Proceedings of the 24th ACM Symposium on Operating Systems Principles, 18–32.

Tulapurkar, A. A., Y. Suzuki, A. Fukushima, H. Kubota, H. Maehara, K. Tsunekawa, D. D. Djayaprawira, N. Watanabe, and S. Yuasa. 2005. Spin-Torque Diode Effect in Magnetic Tunnel Junctions. Nature 438 (7066): 339.

Twitter. 2017. Fatcache: Memcache on SSD.

Twitter. 2019. Twemcache: Twitter Memcached.

Valiant, Leslie G. 1990. A Bridging Model for Parallel Computation. Communications of the ACM 33 (8): 103–111.

Vavilapalli, Vinod Kumar, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, et al. 2013. Apache Hadoop YARN: Yet Another Resource Negotiator. Proceedings of the 4th Annual Symposium on Cloud Computing (SoCC), 5.

Vazhkudai, Sudharshan S., Bronis R. de Supinski, Arthur S. Bland, Al Geist, James Sexton, Jim Kahle, Christopher J. Zimmer, Scott Atchley, Sarp Oral, Don E. Maxwell, et al. 2018. The Design, Deployment, and Evaluation of the CORAL Pre-exascale Systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. (SC ’18), 52.

Verma, Abhishek, Luis Pedrosa, Madhukar R. Korupolu, David Oppenheimer, Eric Tune, and John Wilkes. 2015. Large-scale Cluster Management at Google with Borg. Proceedings of the European Conference on Computer Systems (EuroSys ’15). Association for Computing Machinery, New York, NY, USA, Article 18, 1–17.

Volos, Haris, Sanketh Nalli, Sankarlingam Panneerselvam, Venkatanathan Varadarajan, Prashant Saxena, and Michael M. Swift. 2014. Aerie: Flexible File-system Interfaces to Storage-class Memory. Proceedings of the 9th European Conference on Computer Systems, 1–14.

Wang, Chao, Lei Gong, Qi Yu, Xi Li, Yuan Xie, and Xuehai Zhou. 2016. DLAU: A Scalable Deep Learning Accelerator Unit on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36 (3): 513–517.

Wang, Cheng, Jianyu Jiang, Xusheng Chen, Ning Yi, and Heming Cui. 2017. APUS: Fast and Scalable Paxos on RDMA. SoCC ’17: Proceedings of the 2017 Symposium on Cloud Computing, 94–107.

Wang, Guanhua, Shivaram Venkataraman, Amar Phanishayee, Nikhil Devanur, Jorgen Thelin, and Ion Stoica. 2020. Blink: Fast and Generic Collectives for Distributed ML. Proceedings of Machine Learning and Systems. 2: 172–186.

Wang, Kuang-Ching, James Griffioen, Ronald Hutchins, and Zongming Fei. 2020. Large Scale Networking (LSN) Workshop on Huge Data: A Computing, Networking and Distributed Systems Perspective.

Wang, Lei, Jianfeng Zhan, Chunjie Luo, Yuqing Zhu, Qiang Yang, Yongqiang He, Wanling Gao, Zhen Jia, Yingjie Shi, Shujie Zhang, et al. 2014. BigDataBench: A Big Data Benchmark Suite from Internet Services. 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 488–499.

Wang, Peng, Guangyu Sun, Song Jiang, Jian Ouyang, Shiding Lin, Chen Zhang, and Jason Cong. 2014. An Efficient Design and Implementation of LSM-Tree Based Key-Value Store on Open-Channel SSD. Proceedings of the 9th European Conference on Computer Systems, 1–14.

Wang, Teng, Kathryn Mohror, Adam Moody, Weikuan Yu, and Kento Sato. 2016. BurstFS: A Distributed Burst Buffer File System for Scientific Applications. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’16), 807–818.

Wang, Teng, S. Oral, Yandong Wang, B. Settlemyer, S. Atchley, and Weikuan Yu. 2014. BurstMem: A High-performance Burst Buffer System for Scientific Applications. 2014 IEEE International Conference on Big Data (Big Data), 71–79.

Wang, Yandong, Li Zhang, Jian Tan, Min Li, Yuqing Gao, Xavier Guerin, Xiaoqiao Meng, and Shicong Meng. 2015. HydraDB: A Resilient RDMA-driven Key-Value Middleware for In-memory Cluster Computing. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’15). Association for Computing Machinery, New York, NY, USA, Article 22, 1–11.

Wang, Yandong, Xiaoqiao Meng, Li Zhang, and Jian Tan. 2014. C-Hint: An Effective and Reliable Cache Management for RDMA-accelerated Key-Value Stores. Proceedings of the ACM Symposium on Cloud Computing, 1–13.

Wang, Yandong, Xinyu Que, Weikuan Yu, Dror Goldenberg, and Dhiraj Sehgal. 2011. Hadoop Acceleration through Network Levitated Merge. SC ’11: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis, 1–10.

Wang, Yiheng, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Yan Wan, Zhichao Li, Jiao Wang, Shengsheng Huang, et al. 2018. BigDL: A Distributed Deep Learning Framework for Big Data. arXiv preprint. arXiv:1804.05839.

Wei, Q., M. Xue, J. Yang, C. Wang, and C. Cheng. 2015. Accelerating Cloud Storage System with Byte-addressable Non-volatile Memory. 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), 354–361.

Wei, Xingda, Jiaxin Shi, Yanzhe Chen, Rong Chen, and Haibo Chen. 2015. Fast In-memory Transaction Processing Using RDMA and HTM. Proceedings of the 25th Symposium on Operating Systems Principles, 87–104.

Welsh, Matt, David Culler, and Eric Brewer. 2001. SEDA: An Architecture for Well-conditioned, Scalable Internet Services. Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP ’01). Association for Computing Machinery, New York, NY, USA, 230–243.

Wong, H. S., S. Raoux, S. Kim, J. Liang, J. P. Reifenberg, B. Rajendran, M. Asheghi, and K. E. Goodson. 2010. Phase Change Memory. Proceedings of the IEEE 98 (12): 2201–2227.

Wu, Xiaojian, and A. L. Narasimha Reddy. 2011. SCMFS: A File System for Storage Class Memory. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11), 39.

Wu, Xiaojian, and A. L. Narasimha Reddy. 2013. SCMFS: A File System for Storage Class Memory and Its Extensions. ACM Transactions on Storage 9 (3): article 7.

Xia, Fei, Dejun Jiang, Jin Xiong, and Ninghui Sun. 2017. HiKV: A Hybrid Index Key-Value Store for DRAM-NVM Memory Systems. 2017 USENIX Annual Technical Conference (USENIX ATC 17), 349–362.

Xilinx. 2021. FPGA Leadership across Multiple Process Nodes.

Xilinx, AMD. 2021. Field Programmable Gate Array: What Is an FPGA?.

XIO Technologies. 2021. Axellio Edge Computing Systems. https://nvmexpress.org/portfolio-items/axellio-super-io-platform-from-xio-technologies/.

XLA. 2021. XLA: Optimizing Compiler for Machine Learning.

Xu, Jian, and Steven Swanson. 2016. NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories. 14th USENIX Conference on File and Storage Technologies (FAST 16), 323–338.

Xu, Qiumin, Huzefa Siyamwala, Mrinmoy Ghosh, Tameesh Suri, Manu Awasthi, Zvika Guz, Anahita Shayesteh, and Vijay Balakrishnan. 2015. Performance Analysis of NVMe SSDs and Their Implication on Real World Databases. Proceedings of the 8th ACM International Systems and Storage Conference, 6.

Xu, Yuehai, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2014. Characterizing Facebook’s Memcached Workload. IEEE Internet Computing 18 (2): 41–49.

Yahoo. 2018. CaffeOnSpark: Distributed Deep Learning on Hadoop and Spark Clusters.

Yahoo. 2021a. TensorFlowOnSpark Brings Scalable Deep Learning to Apache Hadoop and Apache Spark Clusters.

Yahoo. 2021b. Webscope Datasets. https://webscope.sandbox.yahoo.com/catalog.php.

Yang, Carl, Aydin Buluç, and John D. Owens. 2019. GraphBLAST: A High-performance Linear Algebra-based Graph Framework on the GPU. arXiv/CoRR abs/1908.01407.

Yang, Jian, Joseph Izraelevitz, and Steven Swanson. 2019. Orion: A Distributed File System for Non-volatile Main Memory and RDMA-capable Networks. 17th USENIX Conference on File and Storage Technologies (FAST 19), 221–234.

Yang, Ziye, Luse E. Paul, James R. Harris, Benjamin Walker, Daniel Verkamp, Changpeng Liu, Cunyin Chang, Gang Cao, Jonathan Stern, and Vishal Verma. 2017. SPDK: A Development Kit to Build High Performance Storage Applications. 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 154–161.

Yoshimura, Takeshi, Tatsuhiro Chiba, and Hiroshi Horii. 2019. EvFS: User-level, Event-driven File System for Non-volatile Memory. 11th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 19).

Yuan, Yuan, Rubao Lee, and Xiaodong Zhang. 2013. The Yin and Yang of Processing Data Warehousing Queries on GPU Devices. Proceedings of the VLDB Endowment 6 (10): 817–828.

Yuan, Yuan, Meisam Fathi Salmi, Yin Huai, Kaibo Wang, Rubao Lee, and Xiaodong Zhang. 2016. Spark-GPU: An Accelerated In-memory Data Processing Engine on Clusters. 2016 IEEE International Conference on Big Data (Big Data), 273–283.

Zadok, Erez, Dean Hildebrand, Geoff Kuenning, and Keith A. Smith. 2017. POSIX Is Dead! Long Live… errr… What Exactly? Proceedings of the 9th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage ’17), 12–12.

Zaharia, Matei, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets. Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud ’10). USENIX Association, Boston, MA.

Zaharia, Matei, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-memory Cluster Computing. 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), 15–28.

Zaharia, Matei, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, et al. 2016. Apache Spark: A Unified Engine for Big Data Processing. Communications of the ACM 59 (11): 56–65.

Zaharia, Matei, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized Streams: Fault-tolerant Streaming Computation at Scale. Proceedings of the 24th ACM Symposium on Operating Systems Principles, 423–438.

Zamanian, Erfan, Xiangyao Yu, Michael Stonebraker, and Tim Kraska. 2019. Rethinking Database High Availability with RDMA Networks. Proceedings of the VLDB Endowment 12 (11): 1637–1650.

Zhang, Chen, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, 161–170.

Zhang, Jie, David Donofrio, John Shalf, Mahmut T. Kandemir, and Myoungsoo Jung. 2015. NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures. 2015 International Conference on Parallel Architecture and Compilation (PACT ’15), 13–24.

Zhang, Jie, Xiaoyi Lu, and Dhabaleswar K. Panda. 2017. High-performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters. 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 143–152.

Zhang, Jie, Xiaoyi Lu, Ching-Hsiang Chu, and Dhabaleswar K. Panda. 2019. C-GDR: High-performance Container-aware GPUDirect MPI Communication Schemes on RDMA Networks. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 242–251.

Zhang, Jie, Xiaoyi Lu, Jithin Jose, Mingzhe Li, Rong Shi, and Dhabaleswar K. (DK) Panda. 2014. High Performance MPI Library over SR-IOV Enabled InfiniBand Clusters. 2014 21st International Conference on High Performance Computing (HIPC), 1–10.

Zhang, Kai, Kaibo Wang, Yuan Yuan, Lei Guo, Rubao Lee, and Xiaodong Zhang. 2015. Mega-KV: A Case for GPUs to Maximize the Throughput of In-memory Key-Value Stores. Proceedings of the VLDB Endowment 8 (11): 1226–1237.

Zhang, Kaiyuan, Rong Chen, and Haibo Chen. 2015. NUMA-aware Graph-structured Analytics. Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2015), 183–193.

Zhang, Kai, Jiayu Hu, Bingsheng He, and Bei Hua. 2017. DIDO: Dynamic Pipelines for In-memory Key-Value Stores on Coupled CPU-GPU Architectures. 2017 IEEE 33rd International Conference on Data Engineering (ICDE), 671–682.

Zhang, Xiangyu, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2018. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 6848–6856.

Zhang, Yiying, Jian Yang, Amirsaman Memaripour, and Steven Swanson. 2015. Mojim: A Reliable and Highly-available Non-volatile Memory System. ACM SIGARCH Computer Architecture News 43: 3–18.

Zhang, Yunming, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2018. GraphIt: A High-performance DSL for Graph Analytics. arXiv preprint. arXiv:1805.00923.

Zhang, Yunming, Vladimir Kiriansky, Charith Mendis, Saman Amarasinghe, and Matei Zaharia. 2017. Making Caches Work for Graph Analytics. 2017 IEEE International Conference on Big Data (Big Data), 293–302.

Zheng, Chao, Lukas Rupprecht, Vasily Tarasov, Douglas Thain, Mohamed Mohamed, Dimitrios Skourtis, Amit S. Warke, and Dean Hildebrand. 2018. Wharf: Sharing Docker Images in a Distributed File System. Proceedings of the ACM Symposium on Cloud Computing (SOCC ’18), 174–185.

Zheng, Shengan, Linpeng Huang, Hao Liu, Linzhu Wu, and Jin Zha. 2016. HMVFS: A Hybrid Memory Versioning File System. 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST), 1–14.

Zheng, Shengan, Morteza Hoseinzadeh, and Steven Swanson. 2019. Ziggurat: A Tiered File System for Non-Volatile Main Memories and Disks. 17th USENIX Conference on File and Storage Technologies (FAST 19), 207–219.

Zhou, Jingren, and Kenneth A. Ross. 2002. Implementing Database Operations Using SIMD Instructions. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 145–156.

Zhou, Shijie, Rajgopal Kannan, Viktor K. Prasanna, Guna Seetharaman, and Qing Wu. 2019. HitGraph: High-throughput Graph Processing Framework on FPGA. IEEE Transactions on Parallel and Distributed Systems 30 (10): 2249–2264. https://doi.org/10.1109/TPDS.2019.2910068.

Zhu, Xiaowei, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. 2015 USENIX Annual Technical Conference (USENIX ATC 15), 375–386.

Zinkevich, Martin, Markus Weimer, Lihong Li, and Alex Smola. 2010. Parallelized Stochastic Gradient Descent. Advances in Neural Information Processing Systems 23 (NIPS ’10), eds. J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta. Curran Associates Inc., 2595–2603.