A Berkeley View of Systems Challenges for AI
https://arxiv.org/pdf/1712.05855.pdf

Strategies and Principles of Distributed Machine Learning on Big Data
https://arxiv.org/abs/1512.09295

Deep learning
Nature volume 521, 2015 https://www.nature.com/articles/nature14539

Deep learning reading list
http://deeplearning.net/reading-list

Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications
https://www.microsoft.com/en-us/research/uploads/prod/2018/05/gpu_sched_tr.pdf

TensorFlow: A System for Large-Scale Machine Learning
OSDI 2016 https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf

Ray: A Distributed Framework for Emerging AI Applications
OSDI 2018 https://www.usenix.org/system/files/osdi18-moritz.pdf

HyperDrive: Exploring Hyperparameters with POP Scheduling
Middleware 2017 https://dl.acm.org/citation.cfm?id=3135994

Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads
VLDB 2018 http://www.vldb.org/pvldb/vol11/p607-li.pdf

Automating Model Search for Large Scale Machine Learning
SoCC 2015 http://dl.acm.org/authorize?N91362

Google Vizier: A Service for Black-Box Optimization
KDD 2017 https://dl.acm.org/citation.cfm?id=3098043

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
Journal of Machine Learning Research 18 (2018) https://arxiv.org/pdf/1603.06560.pdf

Hyperopt: a Python library for model selection and hyperparameter optimization
Computational Science & Discovery, 8(1) 2015 http://iopscience.iop.org/article/10.1088/1749-4699/8/1/014008

Auto-Keras: Efficient Neural Architecture Search with Network Morphism
https://arxiv.org/pdf/1806.10282v2.pdf

Cavs: An Efficient Runtime System for Dynamic Neural Networks
ATC 2018 https://www.usenix.org/system/files/conference/atc18/atc18-xu-shizhen.pdf

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
OSDI 2018 https://www.usenix.org/system/files/osdi18-chen.pdf

PipeDream: Fast and Efficient Pipeline Parallel DNN Training
https://arxiv.org/pdf/1806.03377.pdf

STRADS: A Distributed Framework for Scheduled Model Parallel Machine Learning
EuroSys 2016 https://dl.acm.org/citation.cfm?id=2901331

Dynamic Control Flow in Large-Scale Machine Learning
EuroSys 2018 https://dl.acm.org/citation.cfm?id=3190551

Improving the Expressiveness of Deep Learning Frameworks with Recursion
EuroSys 2018 https://dl.acm.org/citation.cfm?id=3190530

Continuum: A Platform for Cost-Aware, Low-Latency Continual Learning
SoCC 2018 https://dl.acm.org/citation.cfm?id=3267817

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics
ICDE 2017 https://amplab.cs.berkeley.edu/wp-content/uploads/2017/01/ICDE_2017_CameraReady_475.pdf

Owl: A General-Purpose Numerical Library in OCaml
https://arxiv.org/pdf/1707.09616.pdf

Large Scale Distributed Deep Networks
NIPS 2012 https://ai.google/research/pubs/pub40565.pdf

Managed Communication and Consistency for Fast Data-Parallel Iterative Analytics
SoCC 2015 http://dl.acm.org/authorize?N91363

Ako: Decentralised Deep Learning with Partial Gradient Exchange
SoCC 2016 https://lsds.doc.ic.ac.uk/sites/default/files/ako-socc16.pdf

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
ATC 2017 https://www.usenix.org/system/files/conference/atc17/atc17-zhang.pdf

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
SoCC 2018 https://dl.acm.org/citation.cfm?id=3267840

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
ML Systems Workshop at NIPS 2016 https://arxiv.org/pdf/1512.01274.pdf

Scaling Distributed Machine Learning with the Parameter Server
OSDI 2014 https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-li_mu.pdf

Project Adam: Building an Efficient and Scalable Deep Learning Training System
OSDI 2014 https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf

Orpheus: Efficient Distributed Machine Learning via System and Algorithm Co-design
SoCC 2018 https://dl.acm.org/citation.cfm?id=3267810

Petuum: A New Platform for Distributed Machine Learning on Big Data
KDD 2015 https://arxiv.org/pdf/1312.7651.pdf

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
https://arxiv.org/pdf/1811.06965.pdf

DeepCPU: Serving RNN-based Deep Learning Models 10x Faster
ATC 2018 https://www.usenix.org/system/files/conference/atc18/atc18-zhang-minjia.pdf

Clipper: A Low-Latency Online Prediction Serving System
NSDI 2017 https://www.usenix.org/system/files/conference/nsdi17/nsdi17-crankshaw.pdf

Research for Practice: Prediction-Serving Systems
ACM Queue 16(1), 2018 https://queue.acm.org/detail.cfm?id=3210557

InferLine: ML Inference Pipeline Composition
https://arxiv.org/pdf/1812.01776.pdf

PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
OSDI 2018 https://www.usenix.org/system/files/osdi18-lee.pdf

Olympian: Scheduling GPU Usage in a Deep Neural Network Model Serving System
Middleware 2018 https://dl.acm.org/citation.cfm?id=3274813

Low Latency RNN Inference with Cellular Batching
EuroSys 2018 https://dl.acm.org/citation.cfm?id=3190541

SERF: Efficient Scheduling for Fast Deep Neural Network Serving via Judicious Parallelism
SC 2016 https://ieeexplore.ieee.org/document/7877104

NoScope: Optimizing Neural Network Queries over Video at Scale
VLDB 2017 https://dl.acm.org/citation.cfm?id=3137664

Scheduling

Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters
EuroSys 2018 https://dl.acm.org/citation.cfm?id=3190517

SLAQ: Quality-Driven Scheduling for Distributed Machine Learning
SoCC 2017 https://dl.acm.org/authorize?N46878

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets
EuroSys 2017 https://dl.acm.org/citation.cfm?id=3064182

Gandiva: Introspective Cluster Scheduling for Deep Learning
OSDI 2018 https://www.usenix.org/system/files/osdi18-xiao.pdf

Topology-Aware GPU Scheduling for Learning Workloads in Cloud Environments
SC 2017 https://dl.acm.org/citation.cfm?id=3126933

Hemingway: Modeling Distributed Optimization Algorithms
ML Systems Workshop at NIPS 2016 https://arxiv.org/pdf/1702.05865.pdf

Asynchronous Methods for Deep Reinforcement Learning
ICML 2016 http://proceedings.mlr.press/v48/mniha16.pdf

Don’t Use Large Mini-Batches, Use Local SGD
https://arxiv.org/pdf/1808.07217.pdf

GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server
EuroSys 2016 https://dl.acm.org/citation.cfm?id=2901323

ImageNet Training in Minutes
ICPP 2018 https://dl.acm.org/citation.cfm?id=3225069

Semantics-Preserving Parallelization of Stochastic Gradient Descent
IPDPS 2018 https://ieeexplore.ieee.org/abstract/document/8425176

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
NIPS 2011 https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent.pdf

QSGD: Communication-Efficient SGD via Randomized Quantization
NIPS 2017 https://papers.nips.cc/paper/6768-qsgd-communication-efficient-sgd-via-gradient-quantization-and-encoding.pdf

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
NIPS 2017 https://papers.nips.cc/paper/7117-can-decentralized-algorithms-outperform-centralized-algorithms-a-case-study-for-decentralized-parallel-stochastic-gradient-descent.pdf

Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
AISTATS 2018 https://arxiv.org/pdf/1803.01113.pdf

Probabilistic Synchronous Parallel
https://arxiv.org/pdf/1709.07772.pdf

DeepXplore: Automated Whitebox Testing of Deep Learning Systems
SOSP 2017 https://dl.acm.org/authorize?N47145

Programmatically Interpretable Reinforcement Learning
ICML 2018 https://arxiv.org/pdf/1804.02477.pdf

AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation
SP 2018 https://ieeexplore.ieee.org/document/8418593

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier
KDD 2016 https://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf

Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
ICML 2018 https://arxiv.org/pdf/1802.07814.pdf

A Unified Approach to Interpreting Model Predictions
NIPS 2017 https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf

The Mythos of Model Interpretability
WHI 2016 https://arxiv.org/pdf/1606.03490.pdf

MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis
SIGMOD 2018 https://dl.acm.org/citation.cfm?id=3196934

MODELDB: A System for Machine Learning Model Management
HILDA 2016 https://mitdbg.github.io/modeldb/papers/hilda_modeldb.pdf

Model Governance: Reducing the Anarchy of Production ML
ATC 2018 https://www.usenix.org/system/files/conference/atc18/atc18-sridhar.pdf

The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox
CIDR 2015 http://www.bailis.org/papers/velox-cidr2015.pdf

Bandana: Using Non-volatile Memory for Storing Deep Learning Models
SysML 2019 https://arxiv.org/abs/1811.05922

Deep learning with limited numerical precision
ICML 2015 http://proceedings.mlr.press/v37/gupta15.pdf

In-Datacenter Performance Analysis of a Tensor Processing Unit
ISCA 2017 https://dl.acm.org/citation.cfm?id=3080246

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave
IEEE MICRO 38(2), Mar./Apr. 2018 https://ieeexplore.ieee.org/document/8344479

Efficient Deep Learning on Multi-Source Private Data
https://arxiv.org/pdf/1807.06689.pdf

Chiron: Privacy-preserving Machine Learning as a Service
https://arxiv.org/pdf/1803.05961.pdf

MLCapsule: Guarded Offline Deployment of Machine Learning as a Service
https://arxiv.org/pdf/1808.00590.pdf

Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware
https://arxiv.org/pdf/1806.03287.pdf

Privado: Practical and Secure DNN Inference
https://arxiv.org/pdf/1810.00602.pdf

Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
HPCA 2018 https://research.fb.com/publications/applied-machine-learning-at-facebook-a-datacenter-infrastructure-perspective/

Machine Learning at Facebook: Understanding Inference at the Edge
HPCA 2019 https://research.fb.com/publications/machine-learning-at-facebook-understanding-inference-at-the-edge/

Meet Michelangelo: Uber’s Machine Learning Platform
https://eng.uber.com/michelangelo/

Introducing FBLearner Flow: Facebook’s AI backbone
https://code.fb.com/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/

TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
http://dl.acm.org/authorize?N33328

Horovod: fast and easy distributed deep learning in TensorFlow
https://arxiv.org/pdf/1802.05799v3.pdf

Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms
SOSP 2017 https://dl.acm.org/authorize?N47144

Adaptive Execution of Continuous and Data-intensive Workflows with Machine Learning
Middleware 2018 https://dl.acm.org/citation.cfm?id=3274827

AuTO: Scaling Deep Reinforcement Learning to Enable Datacenter-Scale Automatic Traffic Optimization
SIGCOMM 2018 https://dl.acm.org/citation.cfm?id=3230551

Neural Adaptive Video Streaming with Pensieve
SIGCOMM 2017 https://dl.acm.org/citation.cfm?id=3098843

Neural Adaptive Content-aware Internet Video Delivery
OSDI 2018 https://www.usenix.org/system/files/osdi18-yeo.pdf

Systems for ML and Open Source Software Workshop at NeurIPS 2018
http://learningsys.org/nips18/acceptedpapers.html

SysML 2018
http://www.sysml.cc/2018/index.html

Engineering Dependable and Secure Machine Learning Systems 2019
https://sites.google.com/view/edsmls2019/program

Engineering Dependable and Secure Machine Learning Systems 2018
https://sites.google.com/edu.haifa.ac.il/edsmls/program

Workshop on Distributed Machine Learning 2017
https://distributedml2017.wordpress.com/schedule/

ML Systems Workshop at NIPS 2016
https://sites.google.com/site/mlsysnips2016/accepted-papers

ColumnML: Column Store Machine Learning with On The Fly Data Transformation
VLDB 2019

Continuous Integration of Machine Learning Models: A Rigorous Yet Practical Treatment
SysML 2019

Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices
ASPLOS 2019

RLgraph: Flexible Computation Graphs for Deep Reinforcement Learning
SysML 2019 https://arxiv.org/pdf/1810.09028.pdf