Appendix A. References
Abadi, Daniel J., Adam Marcus, Samuel R. Madden, and Kate Hollenbach. 2009. SW-Store: A Vertically Partitioned DBMS for Semantic Web Data Management. VLDB Journal 18 (2): 385–406.
Abadi, Martín, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-scale Machine Learning. OSDI 16: 265–283.
Aerospike, Inc. 2021. Key-Value Operations.
Ahmad, Faraz, Seyong Lee, Mithuna Thottethodi, and T. N. Vijaykumar. 2012. PUMA: Purdue MapReduce Benchmarks Suite. https://engineering.purdue.edu/~puma/puma.pdf.
Aker, Brian. 2011. libMemcached.
Akinaga, Hiroyuki, and Hisashi Shima. 2010. Resistive Random Access Memory (ReRAM) Based on Metal Oxides. Proceedings of the IEEE 98: 2237–2251.
Alarcon, Nefi. 2019. GPU-accelerated Spark XGBoost—A Major Milestone on the Road to Large-Scale AI.
Albutiu, Martina-Cezara, Alfons Kemper, and Thomas Neumann. 2012. Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems. Proceedings of the VLDB Endowment 5 (10): 1064–1075.
Algo-Logic Systems. 2020. Low Latency KVS on Xilinx Alveo U200—Algo-Logic Systems Inc.
Al-Kiswany, Samer, Abdullah Gharaibeh, and Matei Ripeanu. 2013. GPUs as Storage System Accelerators. IEEE Transactions on Parallel and Distributed Systems 24 (8): 1556–1566.
Allen, Grant, and Mike Owens. 2010. The Definitive Guide to SQLite. 2nd ed. Berkeley, CA: Apress.
Alverson, Bob, Edwin Froese, Larry Kaplan, and Duncan Roweth. 2012. Cray® XC™ Series Network—Cray. https://www.alcf.anl.gov/files/CrayXCNetwork.pdf.
Amazon, AWS. 2019. Best Practices Design Patterns: Optimizing Amazon S3 Performance.
Amazon, AWS. 2021a. Amazon Elastic Block Store (EBS). https://aws.amazon.com/ebs/.
Amazon, AWS. 2021b. Amazon S3 REST API Introduction.
Amazon, AWS. 2021c. Boto3 documentation. https://boto3.amazonaws.com/v1/documentation/api/latest/index.html.
Amazon, AWS. 2021d. Elastic Fabric Adapter.
Amazon, AWS. 2021e. Using High-level (S3) Commands with the AWS CLI.
AMD. 2021a. AMD. https://www.amd.com.
AMD. 2021b. AMD Instinct™ MI Series Accelerators.
AMD. 2021c. AMD Instinct™ MI100 Accelerator. https://www.amd.com/en/products/server-accelerators/instinct-mi100.
AMD. 2021e. AMD “Zen 3” Core Architecture. https://www.amd.com/en/technologies/zen-core-3.
AMPLab, UC Berkeley. 2021. Big Data Benchmark.
Apache Flink. 2020. Accelerating Your Workload with GPU and Other External Resources.
Apache Software Foundation. 2016. Class MultithreadedMapper. https://hadoop.apache.org/docs/r2.6.5/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html.
Apache Software Foundation. 2019. Apache Crail.
Apache Software Foundation. 2020a. HDFS Erasure Coding.
Apache Software Foundation. 2020b. Memory Storage Support in HDFS. https://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-hdfs/MemoryStorage.html.
Apache Software Foundation. 2020c. MapReduce Tutorial.
Apache Software Foundation. 2021a. AffinityFunction (Ignite 2.10.0)—Apache Ignite. https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/AffinityFunction.html.
Apache Software Foundation. 2021b. Apache Arrow.
Apache Software Foundation. 2021c. Apache Cassandra.
Apache Software Foundation. 2021d. Apache Flink®—Stateful Computations over Data Streams.
Apache Software Foundation. 2021e. Apache Hadoop.
Apache Software Foundation. 2021f. Welcome to Apache HBase™.
Apache Software Foundation. 2021g. Apache Heron (Incubating).
Apache Software Foundation. 2021h. Apache Hive™. https://hive.apache.org/.
Apache Software Foundation. 2021j. Apache Kafka.
Apache Software Foundation. 2021l. Apache Storm.
Apache Software Foundation. 2021m. Apache Zeppelin.
Apache Software Foundation. 2021o. Apache ZooKeeper™.
Apache Software Foundation. 2021q. Welcome To Apache Giraph!
Apache Software Foundation. 2021r. Spark SQL and DataFrames—Apache Spark.
Apache Software Foundation. 2021u. Working with SQL.
Arafa, Mohamed, Bahaa Fahim, Sailesh Kottapalli, Akhilesh Kumar, Lily P. Looi, Sreenivas Mandava, Andy Rudoff, Ian M. Steiner, Bob Valentine, Geetha Vedaraman, et al. 2019. Cascade Lake: Next Generation Intel Xeon Scalable Processor. IEEE Micro 39 (2): 29–36.
Argonne National Lab. 2021. Aurora Argonne Leadership Computing Facility.
ARM. 2021a. Arm Neoverse V1 Platform: Unleashing a New Performance Tier for Arm-based Computing.
ARM. 2021b. High Performance Computing. https://www.arm.com/solutions/infrastructure/high-performance-computing.
Armstrong, Timothy G., Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: A Database Benchmark Based on the Facebook Social Graph. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, 1185–1196.
Arulraj, Joy, Andrew Pavlo, and Subramanya R. Dulloor. 2015. Let’s Talk about Storage and Recovery Methods for Non-Volatile Memory Database Systems. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 707–722.
Atikoglu, Berk, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2012. Workload Analysis of a Large-scale Key-Value Store. SIGMETRICS Performance Evaluation Review 40 (1): 53–64.
Awan, A. A., C. Chu, H. Subramoni, X. Lu, and D. K. Panda. 2018. OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. 2018 IEEE 25th International Conference on High Performance Computing (HIPC), 143–152.
Bader, David A., John R. Gilbert, Jeremy Kepner, and Kamesh Madduri. 2021. HPC Graph Analysis.
Baidu Research. 2021. DeepBench. https://github.com/baidu-research/DeepBench.
Bakkum, Peter, and Kevin Skadron. 2010. Accelerating SQL Database Operations on a GPU with CUDA. Proceedings of the 3rd Workshop on General-purpose Computation on Graphics Processing Units, 94–103.
Barthels, Claude, Simon Loesing, Gustavo Alonso, and Donald Kossmann. 2015. Rack-scale In-memory Join Processing Using RDMA. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 1463–1475.
Barthels, Claude, Ingo Müller, Timo Schneider, Gustavo Alonso, and Torsten Hoefler. 2017. Distributed Join Algorithms on Thousands of Cores. Proceedings of the VLDB Endowment 10 (5): 517–528.
Beloglazov, Anton, and Rajkumar Buyya. 2010. Energy Efficient Allocation of Virtual Machines in Cloud Data Centers. 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 577–578.
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A Neural Probabilistic Language Model. Journal of Machine Learning Research 3 (Feb): 1137–1155.
Bisong, Ekaba. 2019. Google Colaboratory. Building Machine Learning and Deep Learning Modules on Google Cloud Platform, 59–64. Berkeley, CA: Apress.
Bisson, Tim, Ke Chen, Changho Choi, Vijay Balakrishnan, and Yang-suk Kee. 2018. Crail-KV: A High-performance Distributed Key-Value Store Leveraging Native KV-SSDs over NVMe-oF. 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC), 1–8.
Bitfusion. 2017. Deep Learning Frameworks with Spark and GPUs.
Blazegraph. 2021. Welcome to Blazegraph. https://blazegraph.com/.
BlazingSQL. 2021. blazingSQL—Open Source SQL in Python.
Boito, Francieli Zanon, Eduardo C. Inacio, Jean Luca Bez, Philippe O. A. Navaux, Mario A. R. Dantas, and Yves Denneulin. 2018. A Checkpoint of Research on Parallel I/O for High-Performance Computing. ACM Computing Surveys (CSUR) 51 (2): 23.
Brytlyt. 2021. BrytlytDB. https://www.brytlyt.com/what-we-do/brytlytdb/.
Buluç, Aydin, Tim Mattson, Scott McMillan, José Moreira, and Carl Yang. 2017a. Design of the GraphBLAS API for C. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 643–652.
Cao, Wei, Zhenjun Liu, Peng Wang, Sen Chen, Caifeng Zhu, Song Zheng, Yuhui Wang, and Guoqing Ma. 2018. PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database. Proceedings of the VLDB Endowment 11 (12): 1849–1862.
Caulfield, Adrian M., Todor I. Mollov, Louis Alex Eisner, Arup De, Joel Coburn, and Steven Swanson. 2012. Providing Safe, User Space Access to Fast, Solid State Disks. ACM SIGPLAN Notices 47 (4): 387–400.
Ceph. 2021. Ceph Delivers Object, Block, and File Storage in a Single, Unified System.
Cerebras. 2021a. Cerebras. https://cerebras.net.
Cerebras. 2021b. Cerebras Systems: Achieving Industry Best AI Performance through a Systems Approach.
CGCL-codes. 2017. TensorFlow RDMA. https://github.com/CGCL-codes/Tensorflow-RDMA.
Chameleon. 2021. A Configurable Experimental Environment for Large-scale Edge to Cloud Research.
Chandramouli, Badrish, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, and Mike Barnett. 2018. Faster: A Concurrent Key-Value Store with In-place updates. Proceedings of the 2018 International Conference on Management of Data, 275–290.
Chen, Tianqi, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint. arXiv:1512.01274.
Chen, Xinyu, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, and Deming Chen. 2021. ThunderGP: HLS-based Graph Processing Framework on FPGAs. The 2021 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, 69–80.
Chen, Youmin, Jiwu Shu, Jiaxin Ou, and Youyou Lu. 2018. HiNFS: A Persistent Memory File System with Both Buffering and Direct-access. ACM Transactions on Storage (TOS) 14 (1): 1–30.
Cheng, Wang. 2019. APUS: Fast and Scalable Paxos on RDMA. https://github.com/hku-systems/apus.
Ching, Avery, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One Trillion Edges: Graph Processing at Facebook-Scale. Proceedings of the VLDB Endowment 8 (12): 1804–1815.
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint. arXiv:1406.1078.
Chu, Howard. 2011. MDB: A Memory-Mapped Database and Backend for OpenLDAP.
CloudSuite Team. 2021. CloudSuite: A Benchmark Suite for Cloud Services.
Condit, Jeremy, Edmund B. Nightingale, Christopher Frost, Engin Ipek, Benjamin Lee, Doug Burger, and Derrick Coetzee. 2009. Better I/O through Byte-addressable, Persistent Memory. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, 133–146.
Convolbo, Moïse W., and Jerry Chou. 2016. Cost-aware DAG Scheduling Algorithms for Minimizing Execution Cost on Cloud Resources. Journal of Supercomputing 72 (3): 985–1012.
Cornelis. 2020. Cornelis Networks. https://www.cornelisnetworks.com/.
Cray/HPE. 2021. XC™ Series DataWarp™ User Guide.
Crego, E., G. Munoz, and F. Islam. 2013. Big Data and Deep Learning: Big Deals or Big Delusions?
CSCS, Swiss National Supercomputing Centre. 2021. Piz Daint. https://www.cscs.ch/computers/piz-daint/.
Dagum, Leonardo, and Ramesh Menon. 1998. OpenMP: An Industry Standard API for Shared-Memory Programming. IEEE Computational Science and Engineering 5 (1): 46–55.
Dalessandro, Dennis, Ananth Devulapalli, and Pete Wyckoff. 2005. Design and Implementation of the iWarp Protocol in Software. In The IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS). 471–476.
DataBricks. 2018. TensorFrames. https://github.com/databricks/tensorframes.
Datadog. 2015. Monitor Cassandra with Datadog. https://www.datadoghq.com/blog/monitoring-cassandra-with-datadog/.
Datadog. 2021. DATADOG—Unified Monitoring for the cloud age.
DataMPI Team. 2021. DataMPI: Extending MPI for Big Data with Key-Value based Communication.
Davis, Timothy A. 2019. Algorithm 1000: SuiteSparse:GraphBLAS: Graph Algorithms in the Language of Sparse Linear Algebra. ACM Transactions on Mathematical Software 45 (4): article 44.
Dean, Jeffrey, and Sanjay Ghemawat. 2008. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51 (1): 107–113.
Deep500 Team. 2021. Deep500: An HPC Deep Learning Benchmark and Competition.
Deepnote. 2021. Deepnote. https://deepnote.com/.
DeepSpeech Team. 2021. Project DeepSpeech.
Department of Energy, U.S. 2011. Terabits Networks for Extreme Scale Science. DOE Workshop Report.
Difallah, Djellel Eddine, Andrew Pavlo, Carlo Curino, and Philippe Cudre-Mauroux. 2013. OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases. Proceedings of the VLDB Endowment 7 (4): 277–288.
Domo. 2021. Data Never Sleeps 8.0. https://www.domo.com/learn/data-never-sleeps-8.
Dormando. 2021. What is Memcached? http://memcached.org/.
Doweck, J., W. Kao, A. K. Lu, J. Mandelblat, A. Rahatekar, L. Rappoport, E. Rotem, A. Yasin, and A. Yoaz. 2017. Inside 6th-Generation Intel Core: New Microarchitecture Code-Named Skylake. IEEE Micro 37 (2): 52–62.
Dragojević, A., D. Narayanan, M. Castro, and O. Hodson. 2014. FaRM: Fast Remote Memory. 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), 401–414.
Duato, José, Antonio J. Pena, Federico Silla, Rafael Mayo, and Enrique S Quintana-Ortí. 2010. rCUDA: Reducing the Number of GPU-based Accelerators in High Performance Clusters. 2010 International Conference on High Performance Computing and Simulation, 224–231.
Dulloor, Subramanya R., Sanjay Kumar, Anil Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran, and Jeff Jackson. 2014. System Software for Persistent Memory. Proceedings of the Ninth European Conference on Computer Systems, 15.
Dynatrace. 2021a. Apache Spark monitoring.
Dynatrace. 2021b. Dynatrace. https://www.dynatrace.com/.
Dynatrace. 2021c. Hadoop Performance Monitoring.
E8 Storage. 2021. E8 Storage E8-D24 Rack Scale Flash, Centralized NVMe Solution.
Eisenman, Assaf, Darryl Gardner, Islam AbdelRahman, Jens Axboe, Siying Dong, Kim Hazelwood, Chris Petersen, Asaf Cidon, and Sachin Katti. 2018. Reducing DRAM Footprint with NVM in Facebook. Eurosys’18, 42.
Elasticsearch B.V. 2021. Elasticsearch: The Heart of the Free and Open Elastic Stack. https://www.elastic.co/elasticsearch/.
Emani, Murali, Venkatram Vishwanath, Corey Adams, Michael E. Papka, Rick Stevens, Laura Florescu, Sumti Jairath, William Liu, Tejas Nama, and Arvind Sujeeth. 2021. Accelerating Scientific Applications with SambaNova Reconfigurable Dataflow Architecture. Computing in Science Engineering 23 (2): 114–119.
Facebook. 2018. RocksDB. https://rocksdb.org/.
Facebook AI Team. 2021. Facebook AI Performance Evaluation Platform.
Fan, Bin, David G. Andersen, and Michael Kaminsky. 2013. MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), 371–384.
Fan, Ziqi, Fenggang Wu, Jim Diehl, David H. C. Du, and Doug Voigt. 2018. CDBB: An NVRAM-based Burst Buffer Coordination System for Parallel File Systems. Proceedings of the High Performance Computing Symposium, 1.
FASTDATA.io. 2021. PlasmaENGINE®.
Fielding, Roy Thomas. 2000. Chapter 5: Representational State Transfer (REST). Architectural Styles and the Design of Network-based Software Architectures. PhD dissertation, University of California, Irvine.
Fikes, Andrew. 2010. Storage Architecture and Challenges. Google Faculty Summit, 535.
Foley, D., and J. Danskin. 2017. Ultra-Performance Pascal GPU and NVLink Interconnect. IEEE Micro 37 (02): 7–17.
Fujitsu. 2021. FUJITSU Processor A64FX.
Garrigues, Pierre. 2015. How Deep Learning Powers Flickr. RE.WORK Deep Learning Summit 2015.
Ghasemi, E., and P. Chow. 2016. Accelerating Apache Spark Big Data Analysis with FPGAs. 2016 International IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), 737–744.
Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. 2003. The Google File System. Proceedings of the 19th ACM Symposium on Operating Systems Principles, 20–43.
Goldenberg, Dror, Michael Kagan, Ran Ravid, and Michael S. Tsirkin. 2005. Transparently Achieving Superior Socket Performance Using Zero Copy Socket Direct Protocol over 20 Gb/s InfiniBand Links. 2005 IEEE International Conference on Cluster Computing (Cluster), 1–10.
Gonzalez, Joseph E., Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph Processing in a Distributed Dataflow Framework. OSDI, 599–613.
Google. 2010. Our New Search Index: Caffeine.
Google. 2021a. Cloud TPU. https://cloud.google.com/tpu/.
Gottschling, Paul. 2019. Monitor Apache Hive with Datadog.
Graham, Richard L., Devendar Bureddy, Pak Lui, Hal Rosenstock, Gilad Shainer, Gil Bloch, Dror Goldenberg, Mike Dubman, Sasha Kotchubievsky, Vladimir Koushnir, et al. 2016. Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction. Proceedings of the First Workshop on Optimization of Communication in HPC, 1–10.
Graphcore. 2021. https://www.graphcore.ai.
Groupon. 2017. Sparklint. https://github.com/groupon/sparklint.
gRPC Authors. 2021. gRPC: A High Performance, Open Source Universal RPC Framework.
Gugnani, Shashank, Xiaoyi Lu, and Dhabaleswar K. Panda. 2016. Designing Virtualization-aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-Enabled Clouds. 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 152–159.
Gugnani, Shashank, Xiaoyi Lu, and Dhabaleswar K. Panda. 2018. Analyzing, Modeling, and Provisioning QoS for NVMe SSDs. 2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC), 247–256.
Gupta, K., J. A. Stuart, and J. D. Owens. 2012. A Study of Persistent Threads Style GPU Programming for GPGPU Workloads. 2012 Innovative Parallel Computing (InPar), 1–14.
Gurajada, Sairam, Stephan Seufert, Iris Miliaraki, and Martin Theobald. 2014. TriAD: A Distributed Shared-nothing RDF Engine Based on Asynchronous Message Passing. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. SIGMOD ’14, 289–300.
Guz, Zvika, Harry Huan Li, Anahita Shayesteh, and Vijay Balakrishnan. 2017. NVMe-over-Fabrics Performance Characterization and the Path to Low-Overhead Flash Disaggregation. Proceedings of the 10th ACM International Systems and Storage Conference, SYSTOR ’17, 16.
Habana Gaudi. 2019. Gaudi™ Training Platform White Paper.
Habana Goya. 2019. Goya™ Inference Platform White Paper. https://habana.ai/wp-content/uploads/pdf/habana_labs_goya_whitepaper.pdf.
Hamilton, Mark, Sudarshan Raghunathan, Akshaya Annavajhala, Danil Kirsanov, Eduardo de Leon, Eli Barzilay, Ilya Matiach, Joe Davison, Maureen Busch, Miruna Oprescu, et al. 2018. Flexible and Scalable Deep Learning with MMLSpark. arXiv preprint. arXiv:1804.04031.
Harris, Derrick. 2015. Google, Stanford Say Big Data is Key to Deep Learning for Drug Discovery. https://gigaom.com/2015/03/02/google-stanford-say-big-data-is-key-to-deep-learning-for-drug-discovery.
Harris, Mark. 2017. Unified Memory for CUDA Beginners.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
He, Wenting, Huimin Cui, Binbin Lu, Jiacheng Zhao, Shengmei Li, Gong Ruan, Jingling Xue, Xiaobing Feng, Wensen Yang, and Youliang Yan. 2015. Hadoop+: Modeling and Evaluating the Heterogeneity for MapReduce Applications in Heterogeneous Clusters. Proceedings of the 29th ACM International Conference on Supercomputing, 143–153.
Herodotou, Herodotos, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, and Shivnath Babu. 2011. Starfish: A Self-tuning System for Big Data Analytics. Proceedings of the Fifth Biennial Conference on Innovative Data Systems Research (CIDR) 11: 261–272. www.cidrdb.org.
Hetherington, T. H., T. G. Rogers, L. Hsu, M. O’Connor, and T. M. Aamodt. 2012. Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems. 2012 IEEE International Symposium on Performance Analysis of Systems Software, 88–98.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. Long Short-term Memory. Neural Computation 9 (8): 1735–1780.
Huang, Ting-Chang, and Da-Wei Chang. 2016. TridentFS: A Hybrid File System for Non-volatile RAM, Flash Memory and Magnetic Disk. Software: Practice and Experience 46 (3): 291–318.
Huang, Yihe, Matej Pavlovic, Virendra Marathe, Margo Seltzer, Tim Harris, and Steve Byan. 2018. Closing the Performance Gap between Volatile and Persistent Key-Value Stores Using Cross-referencing Logs. 2018 USENIX Annual Technical Conference (USENIX ATC 18), 967–979.
Hyper. 2021. HyPer—A Hybrid OLTP&OLAP High Performance DBMS.
Iandola, Forrest N., Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and 0.5 MB Model Size. arXiv preprint. arXiv:1602.07360.
IBM. 2020. IBM Reveals Next-generation IBM POWER10 Processor.
IBM. 2021. IBM General Parallel File System (GPFS) Product Documentation. https://www.ibm.com/support/knowledgecenter/SSFKCN/gpfs_content.html.
IBTA, InfiniBand Trade Association. 2021a. InfiniBand Trade Association.
IBTA, InfiniBand Trade Association. 2021b. RoCE Is RDMA over Converged Ethernet.
InfiniBand Trade Association. 2010. Supplement to InfiniBand Architecture Specification Volume 1, Release 1.2.1, Annex A16: RDMA over Converged Ethernet (RoCE). April.
insideHPC. 2021. Microchip Technology Inc.: Introducing First PCI Express 5.0 Switches.
Intel. 2015a. Linux-pmfs/pmfs: Persistent Memory File System. https://github.com/linux-pmfs/pmfs.
Intel. 2015b. Performance Benchmarking for PCIe and NVMe Enterprise Solid-State Drives. White paper.
Intel. 2017. Intel SPDK. https://www.spdk.io/.
Intel. 2019b. Intel® Distribution of Caffe. https://github.com/intel/caffe.
Intel. 2019e. PMDK: Persistent Memory Development Kit. https://github.com/pmem/pmdk/.
Intel. 2020a. HiBench Suite: The BigData Micro Benchmark Suite.
Intel. 2020b. Intel Announces Its Next Generation Memory and Storage Products.
Intel. 2021a. 24. Hash Library. Data Plane Development Kit 21.05.0 documentation. https://doc.dpdk.org/guides/prog_guide/hash_lib.html.
Intel. 2021b. AHCI Specification for Serial ATA.
Intel. 2021c. Intel® Advanced Vector Extensions 512 (Intel® AVX-512). https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-overview.html.
Intel. 2021d. Intel AI Hardware.
Intel. 2021e. The Intel Intrinsics Guide. https://software.intel.com/sites/landingpage/IntrinsicsGuide/.
Intel. 2021f. Intel® oneAPI Math Kernel Library.
Intel. 2021g. Intel® Xeon® Processors. https://www.intel.com/content/www/us/en/products/details/processors/xeon.html.
Intel. 2021h. oneAPI Deep Neural Network Library (oneDNN).
Intel. 2021j. Storage Performance Development Kit. https://github.com/spdk/spdk.
Interface AffinityFunction. 2021a. Apache Ignite (Ignite 2.10.0)—Apache Ignite.
Islam, N. S., M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy, and D. K. Panda. 2012. High Performance RDMA-based Design of HDFS over InfiniBand. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12), 1–12, doi: 10.1109/SC.2012.65.
Islam, Nusrat S., Xiaoyi Lu, Md. W. Rahman, and D. K. Panda. 2013. Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? 2013 IEEE 21st Annual Symposium on High-performance Interconnects, 75–78, doi: 10.1109/HOTI.2013.24.
Islam, Nusrat S., Xiaoyi Lu, Md. Wasi-ur Rahman, and Dhabaleswar K. (DK) Panda. 2014. SOR-HDFS: A SEDA-based Approach to Maximize Overlapping in RDMA-enhanced HDFS. Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing. HPDC '14, 261–264.
Islam, Nusrat Sharmin, Xiaoyi Lu, Md. Wasi-ur Rahman, Jithin Jose, and Dhabaleswar K. (DK) Panda. 2012. A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters. In WBDB 2012: Specifying Big Data Benchmarks, 129–147. Lecture Notes in Computer Science 8163. New York: Springer.
Islam, Nusrat Sharmin, Xiaoyi Lu, Md. Wasi-ur Rahman, Dipti Shankar, and Dhabaleswar K. Panda. 2015. Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture. 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 101–110.
Islam, Nusrat Sharmin, Md. Wasi-ur Rahman, Xiaoyi Lu, and Dhabaleswar K. (DK) Panda. 2016a. Efficient Data Access Strategies for Hadoop and Spark on HPC Cluster with Heterogeneous Storage. 2016 IEEE International Conference on Big Data (Big Data), 223–232.
Izraelevitz, Joseph, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu, Subramanya R. Dulloor, et al. 2019. Basic Performance Measurements of the Intel Optane DC Persistent Memory Module. arXiv/CoRR abs/1903.05714.
Leverich, Jacob. 2021. Mutilate: A High-Performance Memcached Load Generator.
Javed, M. H., X. Lu, and D. K. Panda. 2018. Cutting the Tail: Designing High Performance Message Brokers to Reduce Tail Latencies in Stream Processing. 2018 IEEE International Conference on Cluster Computing (CLUSTER), 223–233, doi: 10.1109/CLUSTER.2018.00040.
Jia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. Proceedings of the 22nd ACM International Conference on Multimedia, 675–678.
Jose, J., H. Subramoni, K. Kandalla, Md. Wasi-ur Rahman, H. Wang, S. Narravula, and D. K. Panda. 2012. Scalable Memcached Design for InfiniBand Clusters Using Hybrid Transports. 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID '12), 236–243.
Jose, J., H. Subramoni, M. Luo, M. Zhang, J. Huang, Md. Wasi-ur Rahman, N. S. Islam, X. Ouyang, H. Wang, S. Sur, and D. K. Panda. 2011. Memcached Design on High Performance RDMA Capable Interconnects. Proceedings of the 2011 International Conference on Parallel Processing (ICPP '11), 743–752, doi: 10.1109/ICPP.2011.37.
Jose, Jithin, Mingzhe Li, Xiaoyi Lu, Krishna Chaitanya Kandalla, Mark Daniel Arnold, and Dhabaleswar K. Panda. 2013. SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience. Proceedings of IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 385–392.
Kadekodi, Rohan, Se Kwon Lee, Sanidhya Kashyap, Taesoo Kim, Aasheesh Kolli, and Vijay Chidambaram. 2019. SplitFS: Reducing Software Overhead in File Systems for Persistent Memory. Proceedings of the 27th ACM Symposium on Operating Systems Principles, 494–508.
Kalia, Anuj, Michael Kaminsky, and David Andersen. 2019. Datacenter RPCs Can Be General and Fast. 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), 1–16.
Kalia, Anuj, Michael Kaminsky, and David G. Andersen. 2016. Design Guidelines for High Performance RDMA Systems. 2016 USENIX Annual Technical Conference (USENIX ATC 16), 437–450.
Kallman, Robert, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, et al. 2008. H-Store: A High-performance, Distributed Main Memory Transaction Processing System. Proceedings of the VLDB Endowment 1 (2): 1496–1499.
Kang, Yangwook, Rekha Pitchumani, Pratik Mishra, Yang-suk Kee, Francisco Londono, Sangyoon Oh, Jongyeol Lee, and Daniel D. G. Lee. 2019. Towards Building a High-performance, Scale-in Key-Value Storage System. Proceedings of the 12th ACM International Conference on Systems and Storage. SYSTOR ’19, 144–154.
Kannan, Sudarsun, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Yuangang Wang, Jun Xu, and Gopinath Palani. 2018. Designing a True Direct-access File System with DevFS. 16th USENIX Conference on File and Storage Technologies (FAST 18), 241–256.
Katevenis, Manolis, Stefanos Sidiropoulos, and Costas Courcoubetis. 1991. Weighted Round-Robin Cell Multiplexing in a General-purpose ATM Switch Chip. IEEE Journal on Selected Areas in Communications 9 (8): 1265–1279.
Kehne, Jens, Jonathan Metter, and Frank Bellosa. 2015. GPUswap: Enabling Oversubscription of GPU Memory through Transparent Swapping. VEE '15, 65–77.
Kemper, Alfons, and Thomas Neumann. 2021. HyPer: Hybrid OLTP&OLAP High-Performance Database System.
Kim, Changkyu, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. Proceedings of the VLDB Endowment 2 (2): 1378–1389.
Kinetica. 2021. The Database for Time and Space: Fuse, Analyze, and Act in Real Time.
Kissinger, Thomas, Tim Kiefer, Benjamin Schlegel, Dirk Habich, Daniel Molka, and Wolfgang Lehner. 2014. ERIS: A NUMA-aware In-memory Storage Engine for Analytical Workloads. Proceedings of the VLDB Endowment 7 (14): 1–12.
Klimovic, Ana, Heiner Litz, and Christos Kozyrakis. 2017. ReFlex: Remote Flash ≈ Local Flash. ACM SIGARCH Computer Architecture News 45 (1): 345–359.
Klimovic, Ana, Yawen Wang, Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, and Christos Kozyrakis. 2018. Pocket: Elastic Ephemeral Storage for Serverless Analytics. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 427–444.
Konduit. 2021. Deep Learning for Java: Open-source, Distributed, Deep Learning Library for the JVM.
Kourtis, Kornilios, Nikolas Ioannou, and Ioannis Koltsidas. 2019. Reaping the Performance of Fast NVM Storage with uDepot. 17th USENIX Conference on File and Storage Technologies (FAST 19), 1–15.
Krizhevsky, Alex. 2009. Learning Multiple Layers of Features from Tiny Images.
Krizhevsky, Alex. 2021. The CIFAR-10 Dataset. http://www.cs.toronto.edu/~kriz/cifar.html.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 25th International Conference on Neural Information Processing Systems—Volume 1 (NIPS ’12). Curran Associates Inc. 1097–1105.
Kubernetes Team. 2021. Kubernetes: Production-grade Container Orchestration.
Kulkarni, Sanjeev, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. SIGMOD ’15, 239–250.
Kültürsay, Emre, Mahmut Kandemir, Anand Sivasubramaniam, and Onur Mutlu. 2013. Evaluating STT-RAM as an Energy-efficient Main Memory Alternative. 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 256–267.
Lavasani, Maysam, Hari Angepat, and Derek Chiou. 2014. An FPGA-based In-line Accelerator for Memcached. IEEE Computer Architecture Letters 13 (2): 57–60.
Lawrence, Steve, C. Lee Giles, Ah Chung Tsoi, and Andrew D. Back. 1997. Face Recognition: A Convolutional Neural-Network Approach. IEEE Transactions on Neural Networks 8 (1): 98–113.
LDBC. 2021. Linked Data Benchmark Council (LDBC): The Graph and RDF Benchmark Reference.
Lee, Changman, Dongho Sim, Jooyoung Hwang, and Sangyeun Cho. 2015. F2FS: A New File System for Flash Storage. 13th USENIX Conference on File and Storage Technologies (FAST 15), 273–286.
Leibiusky, Jonathan. 2021. Jedis. https://github.com/redis/jedis.
Leis, Viktor, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 743–754.
Lenharth, Andrew, and Keshav Pingali. 2015. Scaling Runtimes for Irregular Algorithms to Large-scale NUMA Systems. Computer 48 (8): 35–44.
Li, Feng, Sudipto Das, Manoj Syamala, and Vivek R. Narasayya. 2016. Accelerating Relational Databases by Leveraging Remote Memory and RDMA. Proceedings of the 2016 International Conference on Management of Data, 355–370.
Li, Haoyuan, Ali Ghodsi, Matei Zaharia, Scott Shenker, and Ion Stoica. 2014. Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks. SoCC ’14: Proceedings of the ACM Symposium on Cloud Computing, 1–15.
Li, Min, Jian Tan, Yandong Wang, Li Zhang, and Valentina Salapura. 2015. SparkBench: A Comprehensive Benchmarking Suite for In Memory Data Analytic Platform Spark. Proceedings of the 12th ACM International Conference on Computing Frontiers (CF ’15), 53:1–53:8.
Li, Mu, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server. Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI ’14), 583–598.
Li, M., X. Lu, S. Potluri, K. Hamidouche, J. Jose, K. Tomko, and D. K. Panda. 2014. Scalable Graph500 Design with MPI-3 RMA. 2014 IEEE International Conference on Cluster Computing (CLUSTER), 230–238.
Li, Peilong, Yan Luo, Ning Zhang, and Yu Cao. 2015. HeteroSpark: A Heterogeneous CPU/GPU Spark Platform for Machine Learning Algorithms. 2015 IEEE International Conference on Networking, Architecture and Storage (NAS), 347–348.
Li, Tianxi, Dipti Shankar, Shashank Gugnani, and Xiaoyi Lu. 2020. RDMP-KV: Designing Remote Direct Memory Persistence Based Key-Value Stores with PMEM. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’20).
LightNVM. 2018. Open-Channel SSD. http://lightnvm.io/.
Lim, H., D. Han, D. G. Andersen, and M. Kaminsky. 2014. MICA: A Holistic Approach to Fast In-memory Key-Value Storage. Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI ’14).
Lin, Tsung-Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. European Conference on Computer Vision, 740–755.
Linux. 2021a. AIO—POSIX Asynchronous I/O Overview.
Linux. 2021b. lseek—Linux Manual Page. https://man7.org/linux/man-pages/man2/lseek.2.html.
Linux RDMA. 2021. RDMA Core Userspace Libraries and Daemons.
Liu, Jiuxing. 2010. Evaluating Standard-based Self-virtualizing Devices: A Performance Study on 10 GbE NICs with SR-IOV Support. Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS), 1–12.
Liu, Xin, Yu-tong Lu, Jie Yu, Peng-fei Wang, Jie-ting Wu, and Ying Lu. 2017. ONFS: A Hierarchical Hybrid File System Based on Memory, SSD, and HDD for High Performance Computers. Frontiers of Information Technology and Electronic Engineering 18 (12): 1940–1971.
Lockwood, Glenn. 2017. What’s So Bad about POSIX I/O?
Low, Yucheng, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. Proceedings of the VLDB Endowment 5 (8): 716–727.
Low, Yucheng, Joseph E. Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E. Guestrin, and Joseph Hellerstein. 2014. GraphLab: A New Framework For Parallel Machine Learning. arXiv preprint. arXiv:1408.2041.
Lu, J., Y. Wan, Y. Li, C. Zhang, H. Dai, Y. Wang, G. Zhang, and B. Liu. 2019. Ultra-fast Bloom Filters using SIMD Techniques. IEEE Transactions on Parallel and Distributed Systems 30 (4): 953–964.
Lu, Lanyue, Thanumalayan Sankaranarayana Pillai, Hariharan Gopalakrishnan, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2017. WiscKey: Separating Keys from Values in SSD-Conscious Storage. ACM Transactions on Storage (TOS) 13 (1): 1–28.
Lu, Ruirui, Gang Wu, Bin Xie, and Jingtong Hu. 2014. Stream Bench: Towards Benchmarking Modern Distributed Stream Computing Frameworks. Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC ’14), 69–78.
Lu, X., H. Shi, H. Javed, R. Biswas, and D. K. Panda. 2017. Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-capable Networks. The 25th Annual Symposium on High-Performance Interconnects (HOTI).
Lu, X., H. Shi, R. Biswas, M. H. Javed, and D. K. Panda. 2018. DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters. IEEE Transactions on Multi-Scale Computing Systems 4 (4): 635–648. doi:10.1109/TMSCS.2018.2845886.
Lu, Xiaoyi, Bin Wang, Li Zha, and Zhiwei Xu. 2011. Can MPI Benefit Hadoop and MapReduce Applications? 2011 40th International Conference on Parallel Processing Workshops, 371–379.
Lu, Xiaoyi, Dipti Shankar, Shashank Gugnani, and Dhabaleswar K. Panda. 2016. High-performance Design of Apache Spark with RDMA and Its Benefits on Various Workloads. Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), 253–262.
Lu, Xiaoyi, Fan Liang, Bin Wang, Li Zha, and Zhiwei Xu. 2014. DataMPI: Extending MPI to Hadoop-like Big Data Computing. 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 829–838.
Lu, Xiaoyi, Haiyang Shi, Dipti Shankar, and Dhabaleswar K. Panda. 2017. Performance Characterization and Acceleration of Big Data Workloads on OpenPOWER System. 2017 IEEE International Conference on Big Data (Big Data), 213–222. doi:10.1109/BigData.2017.8257929.
Lu, Xiaoyi, Md. Wasi-ur Rahman, Nusrat Sharmin Islam, and Dhabaleswar K. (DK) Panda. 2014. A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks. In Advancing Big Data Benchmarks, 32–42. Lecture Notes in Computer Science 8585. New York: Springer.
Lu, Xiaoyi, M. W. U. Rahman, N. Islam, D. Shankar, and D. K. Panda. 2014. Accelerating Spark with RDMA for Big Data Processing: Early Experiences. 2014 IEEE 22nd Annual Symposium on High-Performance Interconnects (HOTI), 9–16.
Lu, Youyou, Jiwu Shu, and Wei Wang. 2014. ReconFS: A Reconstructable File System on Flash Storage. 12th USENIX Conference on File and Storage Technologies (FAST 14), 75–88.
Malewicz, Grzegorz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A System for Large-Scale Graph Processing. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, 135–146.
Markthub, Pak, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, and Satoshi Matsuoka. 2018. Dragon: Breaking GPU Memory Capacity Limits with Direct NVM Access. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC ’18), Article 32, 1–13.
Marmol, Leonardo, Swaminathan Sundararaman, Nisha Talagala, Raju Rangaswami, Sushma Devendrappa, Bharath Ramsundar, and Sriram Ganesan. 2015. NVMKV: A Scalable and Lightweight Flash Aware Key-Value Store. Proceedings of the 2015 USENIX Annual Technical Conference (USENIX ATC ’15), 207–219.
Massie, Matt. 2018. Ganglia Monitoring System.
Mellanox. 2016. Understanding Erasure Coding Offload. https://community.mellanox.com/docs/DOC-2414.
Mellanox. 2018. Introducing 200G HDR InfiniBand Solutions. White paper. https://www.mellanox.com/related-docs/whitepapers/WP_Introducing_200G_HDR_InfiniBand_Solutions.pdf.
Mellanox, NVIDIA. 2018. SparkRDMA ShuffleManager Plugin. https://github.com/Mellanox/SparkRDMA/.
Mellanox, NVIDIA. 2021. End-to-End High-Speed Ethernet and InfiniBand Interconnect Solutions.
Mellanox, NVIDIA. 2021. NVIDIA BlueField Data Processing Units.
Meng, Xiangrui, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, D. B. Tsai, Manish Amde, Sean Owen, et al. 2016. MLlib: Machine Learning in Apache Spark. Journal of Machine Learning Research 17 (1): 1235–1241.
Mickens, James, Edmund B. Nightingale, Jeremy Elson, Darren Gehring, Bin Fan, Asim Kadav, Vijay Chidambaram, Osama Khan, and Krishna Nareddy. 2014. Blizzard: Fast, Cloud-scale Block Storage for Cloud-oblivious Applications. 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), 257–273.
Micron. 2019. 3D XPoint Technology.
Microsoft. 2021a. Create an Azure VM with Accelerated Networking using Azure CLI—Microsoft Docs.
Min, Changwoo, Sanidhya Kashyap, Steffen Maass, and Taesoo Kim. 2016. Understanding Manycore Scalability of File Systems. USENIX Annual Technical Conference (USENIX ATC ’16), 71–85.
Mitchell, C., Y. Geng, and J. Li. 2013. Using One-sided RDMA Reads to Build a Fast, CPU-efficient Key-Value Store. Proceedings of USENIX Annual Technical Conference (USENIX ATC ’13).
MLBench Team. 2021. MLBench: Distributed Machine Learning Benchmark.
MLCommons. 2021. MLCommons Aims to Accelerate Machine Learning Innovation to Benefit Everyone.
Monroe, Don. 2020. Fugaku Takes the Lead. Communications of the ACM 64 (1): 16–18.
Moor Insights and Strategy. 2020. The Graphcore Second Generation IPU. https://www.graphcore.ai/hubfs/MK2-%20The%20Graphcore%202nd%20Generation%20IPU%20Final%20v7.14.2020.pdf?h.
Morgan, Timothy Prickett. 2019. Doing the Math on Future Exascale Supercomputers.
Moritz, Philipp, Robert Nishihara, Ion Stoica, and Michael I. Jordan. 2015. SparkNet: Training Deep Networks in Spark. CoRR, abs/1511.06051.
Mouzakitis, Evan. 2016. How to Monitor Hadoop with Datadog.
Mouzakitis, Evan, and David Lentz. 2018. Monitor Redis Using Datadog. https://www.datadoghq.com/blog/monitor-redis-using-datadog/.
Murphy, Barbara. 2018. How to Shorten Deep Learning Training Times.
MySQL. 2020. MySQL Database. http://www.mysql.com.
National Energy Research Scientific Computing Center (NERSC). 2021a. Cori.
National Energy Research Scientific Computing Center (NERSC). 2021b. Perlmutter.
Netty Project, The. 2021. Netty Project. http://netty.io.
Network Based Computing Lab (NOWLAB). 2021a. High-Performance Big Data (HiBD).
Network Based Computing Lab (NOWLAB). 2022. High-Performance Big Data (HiBD).
Neumann, Thomas, and Gerhard Weikum. 2008. RDF-3X: A RISC-style Engine for RDF. Proceedings of the VLDB Endowment 1 (1): 647–659.
NLM, National Library of Medicine. 2020. PubChemRDF.
Norton, Alex, Steve Conway, and Earl Joseph. 2020. Bringing HPC Expertise to Cloud Computing. White paper. Hyperion Research.
NoSQL Database. 2021. NoSQL—Your Ultimate Guide to the Non-Relational Universe!
Nowoczynski, P., N. Stone, J. Yanovich, and J. Sommerfield. 2008. Zest Checkpoint Storage System for Large Supercomputers. 2008 3rd Petascale Data Storage Workshop, 1–5.
NumPy. 2021. NumPy. http://www.numpy.org/.
NVIDIA. 2017. NVIDIA Tesla V100 GPU Architecture. White paper.
NVIDIA. 2020. cuStreamz: A Journey to Develop GPU-Accelerated Streaming Using RAPIDS. https://www.nvidia.com/en-us/on-demand/session/gtcfall20-a21437/.
NVIDIA. 2021b. cuGraph. https://github.com/rapidsai/cugraph.
NVIDIA. 2021d. Developing a Linux Kernel Module using GPUDirect RDMA.
NVIDIA (Mellanox Technologies). 2021f. Apache Spark RDMA plugin.
NVIDIA. 2021h. NVIDIA DGX-1: Essential Instrument of AI Research. https://www.nvidia.com/en-us/data-center/dgx-1/.
NVIDIA. 2021i. NVIDIA DGX-2: Break through the Barriers to AI Speed and Scale.
NVIDIA. 2021j. NVIDIA DGX Systems: Purpose-built for the Unique Demands of AI.
NVIDIA. 2021k. NVIDIA Pascal Architecture: Infinite Compute for Infinite Opportunities.
NVIDIA. 2021l. NVIDIA Turing GPU Architecture: Graphics Reinvented. White paper.
NVIDIA. 2021m. About Us. https://www.nvidia.com/en-us/about-nvidia/.
NVIDIA. 2021n. RAPIDS—Open GPU Data Science.
NVIDIA. 2021o. Virtual GPU Software User Guide.
NVM Express. 2016. NVMe over Fabrics. http://www.nvmexpress.org/wp-content/uploads/NVMe_Over_Fabrics.pdf.
NVM Express. 2021. NVM Express.
Oak Ridge National Laboratory. 2018. Summit: America’s Newest and Smartest Supercomputer.
OpenFabrics Alliance. 2021. OpenFabrics Alliance—Innovation in High Speed Fabrics.
Open Group, The. 2011. POSIX™ 1003.1 Frequently Asked Questions (FAQ Version 1.18).
OpenMP. 2018. OpenMP API Specification Version 5.0 November 2018: 2.9.3 SIMD Directives.
OpenSFS and EOFS. 2021. Lustre® Filesystem. http://lustre.org/.
OpenStack. 2021b. OpenStack Object Storage (Swift).
OpenVINO. 2021. OpenVINO Toolkit. https://github.com/openvinotoolkit/openvino.
OrangeFS. 2021. The OrangeFS Project.
Ott, David. 2011. Optimizing Applications for NUMA.
Ousterhout, John, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park, Henry Qin, Mendel Rosenblum, et al. 2015. The RAMCloud Storage System. ACM Transactions on Computer Systems (TOCS) 33 (3): 7.
Ozery, Itay. 2018. Mellanox Accelerates Apache Spark Performance with RDMA and RoCE Technologies.
Padua, David, ed. 2011. Partitioned Global Address Space (PGAS) Languages. In Encyclopedia of Parallel Computing, 1465. Boston, MA: Springer.
Pagh, Rasmus, and Flemming Friche Rodler. 2004. Cuckoo Hashing. Journal of Algorithms 51 (2): 122–144.
Panda, Dhabaleswar K., Xiaoyi Lu, and Hari Subramoni. 2018. Networking and Communication Challenges for Post-exascale Systems. Frontiers of Information Technology and Electronic Engineering 19: 1230–1235.
Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An Imperative Style, High-performance Deep Learning Library. arXiv preprint. arXiv:1912.01703.
PCI-SIG. 2021. Single Root I/O Virtualization and Sharing Specification Revision 1.1. https://pcisig.com/single-root-io-virtualization-and-sharing-specification-revision-11.
Pelley, Steven, Thomas F. Wenisch, Brian T. Gold, and Bill Bridge. 2013. Storage Management in the NVRAM Era. Proceedings of the VLDB Endowment 7 (2): 121–132.
Plotly. 2021. Plotly. https://github.com/plotly/plotly.py.
Poke, Marius, and Torsten Hoefler. 2015. DARE: High-performance State Machine Replication on RDMA Networks. Proceedings of the 24th International Symposium on High-performance Parallel and Distributed Computing, 107–118.
Polychroniou, Orestis, and Kenneth A. Ross. 2014. Vectorized Bloom Filters for Advanced SIMD Processors. Proceedings of the Tenth International Workshop on Data Management on New Hardware, 6.
Polychroniou, Orestis, Arun Raghavan, and Kenneth A. Ross. 2015. Rethinking SIMD Vectorization for In-memory Databases. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 1493–1508.
Porobic, Danica, Erietta Liarou, Pınar Tözün, and Anastasia Ailamaki. 2014. Atrapos: Adaptive Transaction Processing on Hardware Islands. 2014 IEEE 30th International Conference on Data Engineering, 688–699.
Powell, Brett. 2017. Microsoft Power BI Cookbook: Creating Business Intelligence Solutions of Analytical Data Models, Reports, and Dashboards. Birmingham, UK: Packt Publishing Ltd.
Project Jupyter. 2021. Jupyter.
Prometheus. 2021. Prometheus—From Metrics to Insight.
Psaroudakis, Iraklis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2015. Scaling Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-aware Data and Task Placement. Proceedings of the VLDB Endowment 8 (12): 1442–1453.
Psaroudakis, Iraklis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2016. Adaptive NUMA-aware Data Placement and Task Scheduling for Analytical Workloads in Main-Memory Column-Stores. Proceedings of the VLDB Endowment 10 (2): 37–48.
Python. 2021. Threading—Thread-based Parallelism.
Qureshi, Moinuddin K., Vijayalakshmi Srinivasan, and Jude A. Rivers. 2009. Scalable High Performance Main Memory System Using Phase-change Memory Technology. ACM SIGARCH Computer Architecture News 37 (3): 24–33.
Rahman, Md. Wasi-ur, Nusrat Sharmin Islam, Xiaoyi Lu, Jithin Jose, Hari Subramoni, Hao Wang, and Dhabaleswar K. Panda. 2013. High-performance RDMA-based Design of Hadoop MapReduce over InfiniBand. 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PHD Forum (IPDPSW), 1908–1917.
Rahman, Md. Wasi-ur, Nusrat Sharmin Islam, Xiaoyi Lu, and Dhabaleswar K. Panda. 2017. NVMD: Non-Volatile Memory Assisted Design for Accelerating MapReduce and DAG Execution Frameworks on HPC Systems. Proceedings of IEEE International Conference on Big Data, BigData ’17, 369–374.
Rahman, M. W., Xiaoyi Lu, Nusrat S. Islam, Raghunath Rajachandrasekar, and D. K. Panda. 2015. High-performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA. 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 291–300. doi:10.1109/IPDPS.2015.83.
Rajpurkar, Pranav, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv preprint. arXiv:1606.05250.
RapidLoop. 2021. OpsDash. https://www.opsdash.com/integrations.
RAPIDS. 2021. cuDF—GPU DataFrames Library.
RDMA Consortium. 2016. Architectural Specifications for RDMA over TCP/IP.
Red Hat, Inc. 2021. Gluster Is a Free and Open Source Software Scalable Network Filesystem.
Redis Labs. 2021a. RedisGraph: A Graph Database Module for Redis.
Redis Labs. 2021b. Redis Cluster Specification.
Redis Labs. 2021c. Redis Sentinel Documentation.
Redis Labs. 2021d. Redis. https://redis.io.
Ren, Kun, Alexander Thomson, and Daniel J. Abadi. 2014. An Evaluation of the Advantages and Disadvantages of Deterministic Database Systems. Proceedings of the VLDB Endowment 7 (10): 821–832.
Reynolds, Douglas A., and Richard C. Rose. 1995. Robust Text-independent Speaker Identification Using Gaussian Mixture Speaker Models. IEEE Transactions on Speech and Audio Processing 3 (1): 72–83.
Rho, Eunhee, Kanchan Joshi, Seung-Uk Shin, Nitesh Jagadeesh Shetty, Jooyoung Hwang, Sangyeun Cho, Daniel D. G. Lee, and Jaeheon Jeong. 2018. FStream: Managing Flash Streams in the File System. 16th USENIX Conference on File and Storage Technologies (FAST 18), 257–264.
RIKEN Center for Computational Science. 2020. Fugaku (supercomputer).
Rödiger, Wolf, Tobias Mühlbauer, Alfons Kemper, and Thomas Neumann. 2015. High-speed Query Processing over High-speed Networks. Proceedings of the VLDB Endowment 9 (4): 228–239.
Rohloff, Kurt, and Richard E. Schantz. 2010. High-performance, Massively Scalable Distributed Systems Using the MapReduce Software Framework: The SHARD Triple-Store. Programming Support Innovations for Emerging Distributed Applications (PSI ETA ’10), 4:1–4:5.
Ronan, Clément, Koray, and Soumith. 2021. Torch: Scientific Computing for LuaJIT.
Ross, Kenneth A. 2007. Efficient Hash Probes on Modern Processors. IEEE 23rd International Conference on Data Engineering (ICDE ’07), 1297–1301.
Rudoff, Andy. 2013. Programming Models for Emerging Non-volatile Memory Technologies. ;login: 38 (3): 40–45.
Rudoff, Andy. 2017. Persistent Memory Programming. ;login: 42: 34–40.
Sadasivam, Satish Kumar, Brian W. Thompto, Ron Kalla, and William J. Starke. 2017. IBM Power9 Processor Architecture. IEEE Micro 37 (2): 40–51.
SambaNova Systems. 2021b. Accelerated Computing with a Reconfigurable Dataflow Architecture. https://sambanova.ai/wp-content/uploads/2021/06/SambaNova_RDA_Whitepaper_English.pdf.
Saxena, Mohit, Michael M. Swift, and Yiying Zhang. 2012. FlashTier: A Lightweight, Consistent and Durable Storage Cache. Proceedings of the 7th ACM European Conference on Computer Systems, 267–280.
SchedMD. 2020a. Slurm Workload Manager—Documentation.
SchedMD. 2020b. Slurm Workload Manager—Overview.
Schmuck, Frank B., and Roger L. Haskin. 2002. GPFS: A Shared-Disk File System for Large Computing Clusters. Proceedings of the Conference on File and Storage Technologies (FAST ’02), 231–244.
Scouarnec, Nicolas Le. 2018. Cuckoo++ Hash Tables: High-performance Hash Tables for Networking Applications. Proceedings of the 2018 Symposium on Architectures for Networking and Communications Systems, 41–54.
Scylla. 2021. The Real-time Big Data Database. www.scylladb.com.
SDSC, San Diego Supercomputer Center. 2021a. SDSC Comet User Guide.
SDSC, San Diego Supercomputer Center. 2021b. SDSC Gordon User Guide.
Segal, Oren, and Martin Margala. 2016. Exploring the Performance Benefits of Heterogeneity and Reconfigurable Architectures in a Commodity Cloud. 2016 International Conference on High Performance Computing and Simulation (HPCS), 132–139.
Semiodesk. 2021. Trinity RDF: Entity Framework for Graph Databases.
Shankar, Dipti, Xiaoyi Lu, and Dhabaleswar K. Panda. 2019a. SCOR-KV: SIMD-aware Client-centric and Optimistic RDMA-based Key-Value Store for Emerging CPU Architectures. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HIPC), 257–266.
Shankar, Dipti, Xiaoyi Lu, and Dhabaleswar K. Panda. 2019b. SimdHT-Bench: Characterizing SIMD-aware Hash Table Designs on Emerging CPU Architectures. 2019 IEEE International Symposium on Workload Characterization (IISWC), 178–188.
Shankar, Dipti, Xiaoyi Lu, Md. W. Rahman, Nusrat Islam, and D. K. Panda. 2015. Benchmarking Key-Value Stores on High-performance Storage and Interconnects for Web-scale Workloads. BIG DATA ’15: Proceedings of the 2015 IEEE International Conference on Big Data, 539–544.
Shankar, Dipti, Xiaoyi Lu, M. W. Rahman, Nusrat Islam, and Dhabaleswar K. Panda. 2014. A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-performance Networks. Proceedings of the Fifth Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BE-5), 19–33. Lecture Notes in Computer Science 8807. Hangzhou, China: Springer.
Shankar, D., X. Lu, and D. K. Panda. 2016. Boldio: A Hybrid and Resilient Burst-Buffer over Lustre for Accelerating Big Data I/O. 2016 IEEE International Conference on Big Data (BIG DATA), 404–409.
Shankar, D., X. Lu, N. Islam, M. Wasi-Ur Rahman, and D. K. Panda. 2016. High-performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 393–402.
Sharma, Upendra, Prashant Shenoy, Sambit Sahu, and Anees Shaikh. 2011. A Cost-aware Elasticity Provisioning System for the Cloud. 2011 31st International Conference on Distributed Computing Systems, 559–570.
Shen, Zhaoyan, Feng Chen, Yichen Jia, and Zili Shao. 2018. DIDACache: An Integration of Device and Application for Flash-based Key-Value Caching. ACM Transactions on Storage 14 (3): article 26.
Shi, Haiyang, Xiaoyi Lu, Dipti Shankar, and Dhabaleswar K. Panda. 2019. UMR-EC: A Unified and Multi-rail Erasure Coding Library for High-performance Distributed Storage Systems. Proceedings of the 28th International Symposium on High-performance Parallel and Distributed Computing (HPDC ’19), 219–230.
Shi, Jiaxin, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. 2016. Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 317–332.
Shi, Lin, Hao Chen, and Jianhua Sun. 2009. vCUDA: GPU Accelerated High Performance Computing in Virtual Machines. 2009 IEEE International Symposium on Parallel Distributed Processing, 1–11.
Shreedhar, Madhavapeddi, and George Varghese. 1996. Efficient Fair Queuing Using Deficit Round-Robin. IEEE/ACM Transactions on Networking 4 (3): 375–385.
Shun, Julian, and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 135–146.
Shvachko, Konstantin, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop Distributed File System. Proceedings of the 26th Symposium on Mass Storage Systems and Technologies (MSST), 1–10.
Singh, Teja, Sundar Rangarajan, Deepesh John, Russell Schreiber, Spence Oliver, Rajit Seahra, and Alex Schaefer. 2020. 2.1 Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core. 2020 IEEE International Solid-state Circuits Conference (ISSCC), 42–44.
Singhal, Amit. 2012. Introducing the Knowledge Graph: Things, Not Strings.
SingleStore Inc. 2021. SingleStore: The Single Database for All Data-Intensive Applications.
Song, Xiang, Jian Yang, and Haibo Chen. 2014. Architecting Flash-based Solid-state Drive for High-performance I/O Virtualization. IEEE Computer Architecture Letters 13 (2): 61–64.
spdk.io. 2021. SPDK Hello World.
SQream. 2021. Bringing the Power of the GPU to the Era of Massive Data. https://sqream.com/product/data-acceleration-platform/sql-gpu-database/.
Stanford DAWN Team. 2021. DAWNBench: An End-to-End Deep Learning Benchmark and Competition.
Stanford Vision Lab. 2021. ImageNet. https://image-net.org/.
Sterling, Thomas, Ewing Lusk, and William Gropp. 2003. Beowulf Cluster Computing with Linux. Cambridge, MA: MIT Press.
Streamz. 2021. Real-time Stream Processing for Python.
Strukov, Dmitri B., Gregory S. Snider, Duncan R. Stewart, and R. Stanley Williams. 2008. The Missing Memristor Found. Nature 453 (7191): 80.
Stuart, Jeff A., and John D. Owens. 2011. Multi-GPU MapReduce on GPU Clusters. 2011 IEEE International Parallel and Distributed Processing Symposium, 1068–1079.
Stuedi, Patrick, Animesh Trivedi, Jonas Pfefferle, Radu Stoica, Bernard Metzler, Nikolas Ioannou, and Ioannis Koltsidas. 2017. Crail: A High-performance I/O Architecture for Distributed Data Processing. IEEE Data Engineering Bulletin 40 (1): 38–49.
Su, Maomeng, Mingxing Zhang, Kang Chen, Zhenyu Guo, and Yongwei Wu. 2017. RFP: When RPC Is Faster than Server-bypass with RDMA. Proceedings of the 12th European Conference on Computer Systems. Eurosys ’17, 1–15.
Suzuki, Yusuke, Shinpei Kato, Hiroshi Yamada, and Kenji Kono. 2014. GPUvm: Why Not Virtualizing GPUs at the Hypervisor? Usenix Annual Technical Conference, 109–120.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going Deeper with Convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–9.
Szegedy, Christian, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI ’17), 4278–4284.
Tai, Kai Xin. 2020. Monitor Apache Flink with Datadog.
Taleb, Yacine, Ryan Stutsman, Gabriel Antoniu, and Toni Cortes. 2018. Tailwind: Fast and Atomic RDMA-based Replication. 2018 USENIX Annual Technical Conference (USENIX ATC 18), 851–863.
Talpey, Tom. 2015. Remote Access to Ultra-low-latency Storage.
Talpey, Tom. 2019. RDMA Persistent Memory Extensions. 15th Annual Open Fabrics Alliance Workshop.
Tang, Haodong, Jian Zhang, and Fred Zhang. 2018. Accelerating Ceph with RDMA and NVMe-oF.
TensorFlow. 2021. TensorBoard: TensorFlow’s Visualization Toolkit. https://www.tensorflow.org/tensorboard.
Thaler, David, and Chinya V. Ravishankar. 1996. A Name-based Mapping Scheme for Rendezvous. Technical report CSE-TR-316-96, University of Michigan.
Thomson, Alexander, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. 2012. Calvin: Fast Distributed Transactions for Partitioned Database Systems. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD ’12), 1–12.
Toon, Nigel. 2020. Introducing 2nd Generation IPU Systems for AI at Scale.
TOP500.org. TOP500 Supercomputing Sites. http://www.top500.org/.
TOP500.org. 2020. Highlights—November 2020.
TPC-H Version 2 and Version 3. 2021. TPC-H Benchmark.
Transaction Processing Performance Council. TPC—Homepage.
Tu, Stephen, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy Transactions in Multicore In-memory Databases. Proceedings of the 24th ACM Symposium on Operating Systems Principles, 18–32.
Tulapurkar, A. A., Y. Suzuki, A. Fukushima, H. Kubota, H. Maehara, K. Tsunekawa, D. D. Djayaprawira, N. Watanabe, and S. Yuasa. 2005. Spin-Torque Diode Effect in Magnetic Tunnel Junctions. Nature 438 (7066): 339.
Twitter. 2017. Fatcache: Memcache on SSD.
Twitter. 2019. Twemcache: Twitter Memcached.
Valiant, Leslie G. 1990. A Bridging Model for Parallel Computation. Communications of the ACM 33 (8): 103–111.
Vavilapalli, Vinod Kumar, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, et al. 2013. Apache Hadoop YARN: Yet Another Resource Negotiator. Proceedings of the 4th Annual Symposium on Cloud Computing (SoCC), 5.
Vazhkudai, Sudharshan S., Bronis R. de Supinski, Arthur S. Bland, Al Geist, James Sexton, Jim Kahle, Christopher J. Zimmer, Scott Atchley, Sarp Oral, Don E. Maxwell, et al. 2018. The Design, Deployment, and Evaluation of the CORAL Pre-exascale Systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. (SC ’18), 52.
Volos, Haris, Sanketh Nalli, Sankarlingam Panneerselvam, Venkatanathan Varadarajan, Prashant Saxena, and Michael M. Swift. 2014. Aerie: Flexible File-system Interfaces to Storage-class Memory. Proceedings of the 9th European Conference on Computer Systems, 1–14.
Wang, Chao, Lei Gong, Qi Yu, Xi Li, Yuan Xie, and Xuehai Zhou. 2016. DLAU: A Scalable Deep Learning Accelerator Unit on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36 (3): 513–517.
Wang, Peng, Guangyu Sun, Song Jiang, Jian Ouyang, Shiding Lin, Chen Zhang, and Jason Cong. 2014. An Efficient Design and Implementation of LSM-Tree Based Key-Value Store on Open-Channel SSD. Proceedings of the 9th European Conference on Computer Systems, 1–14.
Wang, Teng, Kathryn Mohror, Adam Moody, Weikuan Yu, and Kento Sato. 2016. BurstFS: A Distributed Burst Buffer File System for Scientific Applications. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’16), 807–818. doi:10.1109/SC.2016.68.
Wang, Teng, S. Oral, Yandong Wang, B. Settlemyer, S. Atchley, and Weikuan Yu. 2014. BurstMem: A High-performance Burst Buffer System for Scientific Applications. 2014 IEEE International Conference on Big Data (Big Data), 71–79.
Wang, Yandong, Xiaoqiao Meng, Li Zhang, and Jian Tan. 2014. C-Hint: An Effective and Reliable Cache Management for RDMA-accelerated Key-Value Stores. Proceedings of the ACM Symposium on Cloud Computing, 1–13.
Wang, Yandong, Xinyu Que, Weikuan Yu, Dror Goldenberg, and Dhiraj Sehgal. 2011. Hadoop Acceleration through Network Levitated Merge. SC ’11: Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis, 1–10.
Wang, Yiheng, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Yan Wan, Zhichao Li, Jiao Wang, Shengsheng Huang, et al. 2018. BigDL: A Distributed Deep Learning Framework for Big Data. arXiv preprint. arXiv:1804.05839.
Wei, Q., M. Xue, J. Yang, C. Wang, and C. Cheng. 2015. Accelerating Cloud Storage System with Byte-addressable Non-volatile Memory. 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), 354–361.
Wu, Xiaojian, and A. L. Reddy. 2011. SCMFS: A File System for Storage Class Memory. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11), Article 39.
Xia, Fei, Dejun Jiang, Jin Xiong, and Ninghui Sun. 2017. HiKV: A Hybrid Index Key-Value Store for DRAM-NVM Memory Systems. 2017 USENIX Annual Technical Conference (USENIX ATC 17), 349–362.
Xilinx. 2021. FPGA Leadership across Multiple Process Nodes.
Xilinx, AMD. 2021. Field Programmable Gate Array: What Is an FPGA?
X IO. 2021. Axellio Edge Computing Systems, from XIO Technologies. https://nvmexpress.org/portfolio-items/axellio-super-io-platform-from-xio-technologies/.
XLA. 2021. XLA: Optimizing Compiler for Machine Learning.
Xu, Jian, and Steven Swanson. 2016. NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories. 14th USENIX Conference on File and Storage Technologies (FAST 16), 323–338.
Xu, Qiumin, Huzefa Siyamwala, Mrinmoy Ghosh, Tameesh Suri, Manu Awasthi, Zvika Guz, Anahita Shayesteh, and Vijay Balakrishnan. 2015. Performance Analysis of NVMe SSDs and Their Implication on Real World Databases. Proceedings of the 8th ACM International Systems and Storage Conference, 6.
Xu, Yuehai, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2014. Characterizing Facebook’s Memcached Workload. IEEE Internet Computing 18 (2): 41–49.
Yahoo. 2018. CaffeOnSpark: Distributed Deep Learning on Hadoop and Spark Clusters.
Yahoo. 2021. Webscope Datasets. https://webscope.sandbox.yahoo.com/catalog.php.
Yang, Ziye, Paul E. Luse, James R. Harris, Benjamin Walker, Daniel Verkamp, Changpeng Liu, Cunyin Chang, Gang Cao, Jonathan Stern, and Vishal Verma. 2017. SPDK: A Development Kit to Build High Performance Storage Applications. 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 154–161.
Yuan, Yuan, Meisam Fathi Salmi, Yin Huai, Kaibo Wang, Rubao Lee, and Xiaodong Zhang. 2016. Spark-GPU: An Accelerated In-memory Data Processing Engine on Clusters. 2016 IEEE International Conference on Big Data (Big Data), 273–283.
Zadok, Erez, Dean Hildebrand, Geoff Kuenning, and Keith A. Smith. 2017. POSIX Is Dead! Long Live… errr… What Exactly? Proceedings of the 9th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage ’17), Boston, MA, 12–12.
Zaharia, Matei, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-memory Cluster Computing. 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), 15–28.
Zaharia, Matei, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, et al. 2016. Apache Spark: A Unified Engine for Big Data Processing. Communications of the ACM 59 (11): 56–65.
Zaharia, Matei, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized Streams: Fault-tolerant Streaming Computation at Scale. Proceedings of the 24th ACM Symposium on Operating Systems Principles, 423–438.
Zhang, Chen, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, 161–170.
Zhang, Jie, David Donofrio, John Shalf, Mahmut T. Kandemir, and Myoungsoo Jung. 2015. NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures. 2015 International Conference on Parallel Architecture and Compilation (PACT ’15), 13–24.
Zhang, Jie, Xiaoyi Lu, and Dhabaleswar K. Panda. 2017. High-performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters. 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 143–152.
Zhang, Jie, Xiaoyi Lu, Ching-Hsiang Chu, and Dhabaleswar K. Panda. 2019. C-GDR: High-performance Container-aware GPUDirect MPI Communication Schemes on RDMA Networks. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 242–251.
Zhang, Jie, Xiaoyi Lu, Jithin Jose, Mingzhe Li, Rong Shi, and Dhabaleswar K. (DK) Panda. 2014. High Performance MPI Library over SR-IOV Enabled InfiniBand Clusters. 2014 21st International Conference on High Performance Computing (HIPC), 1–10.
Zhang, K., J. Hu, B. He, and B. Hua. 2017. DIDO: Dynamic Pipelines for In-memory Key-Value Stores on Coupled CPU-GPU Architectures. 2017 IEEE 33rd International Conference on Data Engineering (ICDE), 671–682.
Zhang, Yiying, Jian Yang, Amirsaman Memaripour, and Steven Swanson. 2015. Mojim: A Reliable and Highly-available Non-volatile Memory System. ACM SIGARCH Computer Architecture News 43: 3–18.
Zhang, Yunming, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2018. GraphIt: A High-performance DSL for Graph Analytics. arXiv preprint. arXiv:1805.00923.
Zhang, Yunming, Vladimir Kiriansky, Charith Mendis, Saman Amarasinghe, and Matei Zaharia. 2017. Making Caches Work for Graph Analytics. 2017 IEEE International Conference on Big Data (Big Data), 293–302.
Zheng, Shengan, Linpeng Huang, Hao Liu, Linzhu Wu, and Jin Zha. 2016. HMVFS: A Hybrid Memory Versioning File System. 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST), 1–14.
Zheng, Shengan, Morteza Hoseinzadeh, and Steven Swanson. 2019. Ziggurat: A Tiered File System for Non-Volatile Main Memories and Disks. 17th USENIX Conference on File and Storage Technologies (FAST 19), 207–219.
Zhou, Jingren, and Kenneth A. Ross. 2002. Implementing Database Operations Using SIMD Instructions. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 145–156.
Zhou, Shijie, Rajgopal Kannan, Viktor K. Prasanna, Guna Seetharaman, and Qing Wu. 2019. HitGraph: High-throughput Graph Processing Framework on FPGA. IEEE Transactions on Parallel and Distributed Systems 30 (10): 2249–2264. doi: 10.1109/TPDS.2019.2910068.