[Machine Learning] Machine Learning Systems (SysML) Reading List

Published: 2019-05-25

SysML reading list

Review

A Berkeley View of Systems Challenges for AI

https://arxiv.org/pdf/1712.05855.pdf

Strategies and Principles of Distributed Machine Learning on Big Data

https://arxiv.org/abs/1512.09295

Background

Deep learning

Nature volume 521, 2015

https://www.nature.com/articles/nature14539

Deep learning reading list

http://deeplearning.net/reading-list

Measurement

Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications

https://www.microsoft.com/en-us/research/uploads/prod/2018/05/gpu_sched_tr.pdf

Frameworks

TensorFlow: A System for Large-Scale Machine Learning

OSDI 2016

https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf

Ray: A Distributed Framework for Emerging AI Applications

OSDI 2018

https://www.usenix.org/system/files/osdi18-moritz.pdf

Tuning

HyperDrive: Exploring Hyperparameters with POP Scheduling

Middleware 2017

https://dl.acm.org/citation.cfm?id=3135994

Ease.ml: Towards Multi-tenant Resource Sharing for Machine Learning Workloads

VLDB 2018

http://www.vldb.org/pvldb/vol11/p607-li.pdf

Automating Model Search for Large Scale Machine Learning

SoCC 2015

http://dl.acm.org/authorize?N91362

Google Vizier: A Service for Black-Box Optimization

KDD 2017

https://dl.acm.org/citation.cfm?id=3098043

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Journal of Machine Learning Research 18 (2018)

https://arxiv.org/pdf/1603.06560.pdf

Hyperopt: a Python library for model selection and hyperparameter optimization

Computational Science & Discovery, 8(1) 2015

http://iopscience.iop.org/article/10.1088/1749-4699/8/1/014008

Auto-Keras: Efficient Neural Architecture Search with Network Morphism

https://arxiv.org/pdf/1806.10282v2.pdf

Runtime execution

Cavs: An Efficient Runtime System for Dynamic Neural Networks

ATC 2018

https://www.usenix.org/system/files/conference/atc18/atc18-xu-shizhen.pdf

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

OSDI 2018

https://www.usenix.org/system/files/osdi18-chen.pdf

PipeDream: Fast and Efficient Pipeline Parallel DNN Training

https://arxiv.org/pdf/1806.03377.pdf

STRADS: A Distributed Framework for Scheduled Model Parallel Machine Learning

EuroSys 2016

https://dl.acm.org/citation.cfm?id=2901331

Dynamic Control Flow in Large-Scale Machine Learning

EuroSys 2018

https://dl.acm.org/citation.cfm?id=3190551

Improving the Expressiveness of Deep Learning Frameworks with Recursion

EuroSys 2018

https://dl.acm.org/citation.cfm?id=3190530

Continuum: A Platform for Cost-Aware, Low-Latency Continual Learning

SoCC 2018

https://dl.acm.org/citation.cfm?id=3267817

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics

ICDE 2017

https://amplab.cs.berkeley.edu/wp-content/uploads/2017/01/ICDE_2017_CameraReady_475.pdf

Owl: A General-Purpose Numerical Library in OCaml

https://arxiv.org/pdf/1707.09616.pdf

Distributed learning

Large Scale Distributed Deep Networks

NIPS 2012

https://ai.google/research/pubs/pub40565.pdf

Managed Communication and Consistency for Fast Data-Parallel Iterative Analytics

SoCC 2015

http://dl.acm.org/authorize?N91363

Ako: Decentralised Deep Learning with Partial Gradient Exchange

SoCC 2016

https://lsds.doc.ic.ac.uk/sites/default/files/ako-socc16.pdf

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

ATC 2017

https://www.usenix.org/system/files/conference/atc17/atc17-zhang.pdf

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training

SoCC 2018

https://dl.acm.org/citation.cfm?id=3267840

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

Workshop on Machine Learning Systems (LearningSys) at NIPS 2015

https://arxiv.org/pdf/1512.01274.pdf

Scaling Distributed Machine Learning with the Parameter Server

OSDI 2014

https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-li_mu.pdf

Project Adam: Building an Efficient and Scalable Deep Learning Training System

OSDI 2014

https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf

Orpheus: Efficient Distributed Machine Learning via System and Algorithm Co-design

SoCC 2018

https://dl.acm.org/citation.cfm?id=3267810

Petuum: A New Platform for Distributed Machine Learning on Big Data

KDD 2015

https://arxiv.org/pdf/1312.7651.pdf

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

https://arxiv.org/pdf/1811.06965.pdf

Serving systems and inference

DeepCPU: Serving RNN-based Deep Learning Models 10x Faster

ATC 2018

https://www.usenix.org/system/files/conference/atc18/atc18-zhang-minjia.pdf

Clipper: A Low-Latency Online Prediction Serving System

NSDI 2017

https://www.usenix.org/system/files/conference/nsdi17/nsdi17-crankshaw.pdf

Research for Practice: Prediction-Serving Systems

ACM Queue 16(1), 2018

https://queue.acm.org/detail.cfm?id=3210557

InferLine: ML Inference Pipeline Composition

https://arxiv.org/pdf/1812.01776.pdf

PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems

OSDI 2018

https://www.usenix.org/system/files/osdi18-lee.pdf

Olympian: Scheduling GPU Usage in a Deep Neural Network Model Serving System

Middleware 2018

https://dl.acm.org/citation.cfm?id=3274813

Low Latency RNN Inference with Cellular Batching

EuroSys 2018

https://dl.acm.org/citation.cfm?id=3190541

SERF: Efficient Scheduling for Fast Deep Neural Network Serving via Judicious Parallelism

SC 2016

https://ieeexplore.ieee.org/document/7877104

NoScope: Optimizing Neural Network Queries over Video at Scale

VLDB 2017

https://dl.acm.org/citation.cfm?id=3137664

Scheduling

Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters

EuroSys 2018

https://dl.acm.org/citation.cfm?id=3190517

SLAQ: Quality-Driven Scheduling for Distributed Machine Learning

SoCC 2017

https://dl.acm.org/authorize?N46878

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets

EuroSys 2017

https://dl.acm.org/citation.cfm?id=3064182

Gandiva: Introspective Cluster Scheduling for Deep Learning

OSDI 2018

https://www.usenix.org/system/files/osdi18-xiao.pdf

Topology-Aware GPU Scheduling for Learning Workloads in Cloud Environments

SC 2017

https://dl.acm.org/citation.cfm?id=3126933

Algorithmic aspects in scalable ML

Hemingway: Modeling Distributed Optimization Algorithms

ML Systems Workshop at NIPS 2016

https://arxiv.org/pdf/1702.05865.pdf

Asynchronous Methods for Deep Reinforcement Learning

ICML 2016

http://proceedings.mlr.press/v48/mniha16.pdf

Don’t Use Large Mini-Batches, Use Local SGD

https://arxiv.org/pdf/1808.07217.pdf

GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server

EuroSys 2016

https://dl.acm.org/citation.cfm?id=2901323

ImageNet Training in Minutes

ICPP 2018

https://dl.acm.org/citation.cfm?id=3225069

Semantics-Preserving Parallelization of Stochastic Gradient Descent

IPDPS 2018

https://ieeexplore.ieee.org/abstract/document/8425176

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

NIPS 2011

https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent.pdf

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

NIPS 2017

https://papers.nips.cc/paper/6768-qsgd-communication-efficient-sgd-via-gradient-quantization-and-encoding.pdf

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

NIPS 2017

https://papers.nips.cc/paper/7117-can-decentralized-algorithms-outperform-centralized-algorithms-a-case-study-for-decentralized-parallel-stochastic-gradient-descent.pdf

Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD

AISTATS 2018

https://arxiv.org/pdf/1803.01113.pdf

Probabilistic Synchronous Parallel

https://arxiv.org/pdf/1709.07772.pdf

AI Testing and Verification

DeepXplore: Automated Whitebox Testing of Deep Learning Systems

SOSP 2017

https://dl.acm.org/authorize?N47145

Programmatically Interpretable Reinforcement Learning

ICML 2018

https://arxiv.org/pdf/1804.02477.pdf

AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation

SP 2018

https://ieeexplore.ieee.org/document/8418593

Interpretability and Explainability

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

KDD 2016

https://www.kdd.org/kdd2016/papers/files/rfp0573-ribeiroA.pdf

Learning to Explain: An Information-Theoretic Perspective on Model Interpretation

ICML 2018

https://arxiv.org/pdf/1802.07814.pdf

A Unified Approach to Interpreting Model Predictions

NIPS 2017

https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf

The Mythos of Model Interpretability

WHI 2016

https://arxiv.org/pdf/1606.03490.pdf

Model Management

MISTIQUE: A System to Store and Query Model Intermediates for Model Diagnosis

SIGMOD 2018

https://dl.acm.org/citation.cfm?id=3196934

MODELDB: A System for Machine Learning Model Management

HILDA 2016

https://mitdbg.github.io/modeldb/papers/hilda_modeldb.pdf

Model Governance: Reducing the Anarchy of Production ML

ATC 2018

https://www.usenix.org/system/files/conference/atc18/atc18-sridhar.pdf

The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox

CIDR 2015

http://www.bailis.org/papers/velox-cidr2015.pdf

Bandana: Using Non-volatile Memory for Storing Deep Learning Models

SysML 2019

https://arxiv.org/abs/1811.05922

Hardware

Deep learning with limited numerical precision

ICML 2015

http://proceedings.mlr.press/v37/gupta15.pdf

In-Datacenter Performance Analysis of a Tensor Processing Unit

ISCA 2017

https://dl.acm.org/citation.cfm?id=3080246

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

IEEE MICRO 38(2), Mar./Apr. 2018

https://ieeexplore.ieee.org/document/8344479

Security aspects

Efficient Deep Learning on Multi-Source Private Data

https://arxiv.org/pdf/1807.06689.pdf

Chiron: Privacy-preserving Machine Learning as a Service

https://arxiv.org/pdf/1803.05961.pdf

MLCapsule: Guarded Offline Deployment of Machine Learning as a Service

https://arxiv.org/pdf/1808.00590.pdf

Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware

https://arxiv.org/pdf/1806.03287.pdf

Privado: Practical and Secure DNN Inference

https://arxiv.org/pdf/1810.00602.pdf

ML Platforms (Applied)

Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective

HPCA 2018

https://research.fb.com/publications/applied-machine-learning-at-facebook-a-datacenter-infrastructure-perspective/

Machine Learning at Facebook: Understanding Inference at the Edge

HPCA 2019

https://research.fb.com/publications/machine-learning-at-facebook-understanding-inference-at-the-edge/

Meet Michelangelo: Uber’s Machine Learning Platform

https://eng.uber.com/michelangelo/

Introducing FBLearner Flow: Facebook’s AI backbone

https://code.fb.com/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/

TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

http://dl.acm.org/authorize?N33328

Horovod: fast and easy distributed deep learning in TensorFlow

https://arxiv.org/pdf/1802.05799v3.pdf

ML for Systems

Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms

SOSP 2017

https://dl.acm.org/authorize?N47144

Adaptive Execution of Continuous and Data-intensive Workflows with Machine Learning

Middleware 2018

https://dl.acm.org/citation.cfm?id=3274827

AuTO: Scaling Deep Reinforcement Learning to Enable Datacenter-Scale Automatic Traffic Optimization

SIGCOMM 2018

https://dl.acm.org/citation.cfm?id=3230551

Neural Adaptive Video Streaming with Pensieve

SIGCOMM 2017

https://dl.acm.org/citation.cfm?id=3098843

Neural Adaptive Content-aware Internet Video Delivery

OSDI 2018

https://www.usenix.org/system/files/osdi18-yeo.pdf

Workshops

Systems for ML and Open Source Software Workshop at NeurIPS 2018

http://learningsys.org/nips18/acceptedpapers.html

SysML 2018

http://www.sysml.cc/2018/index.html

Engineering Dependable and Secure Machine Learning Systems 2019

https://sites.google.com/view/edsmls2019/program

Engineering Dependable and Secure Machine Learning Systems 2018

https://sites.google.com/edu.haifa.ac.il/edsmls/program

Workshop on Distributed Machine Learning 2017

https://distributedml2017.wordpress.com/schedule/

ML Systems Workshop at NIPS 2016

https://sites.google.com/site/mlsysnips2016/accepted-papers

Upcoming 2019

ColumnML: Column Store Machine Learning with On The Fly Data Transformation

VLDB 2019

Continuous Integration of Machine Learning Models: A Rigorous Yet Practical Treatment

SysML 2019

Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices

ASPLOS 2019

RLgraph: Flexible Computation Graphs for Deep Reinforcement Learning

SysML 2019

https://arxiv.org/pdf/1810.09028.pdf

Reposted from: http://swuii.baihongyu.com/
