Scalable ML: Communication-efficiency, Security, and Architecture
Tuesday 22 February 2022, at 13:15 - 14:15
Zoom
To fully realize the benefits of deep learning, we need to design highly scalable, robust, and privacy-preserving learning algorithms, and to understand the fundamental limits of the underlying architecture, e.g., the neural network over which the learning algorithm is applied. The key algorithm underlying the deep learning revolution is stochastic gradient descent (SGD), which needs to be distributed to handle enormous and possibly sensitive data spread among multiple owners, such as hospitals and cellphones, without sharing local data. When implementing SGD on large-scale distributed systems, the communication time required to share stochastic gradients is the main performance bottleneck. In addition to communication efficiency, robustness is highly desirable in real-world settings. We present efficient gradient compression and robust aggregation schemes that reduce communication costs and enhance security while preserving privacy. Our algorithms currently offer the highest communication compression while still converging under regular (uncompressed) hyperparameter values. Considering the underlying architecture, one fundamental question is "How much should we overparameterize a neural network?" We present the current best scaling of the number of parameters for fully-trained shallow neural networks under standard initialization schemes.
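To give a concrete feel for the ideas mentioned above, the following is a minimal sketch of one round of distributed SGD with gradient compression and robust aggregation. It uses top-k sparsification and the coordinate-wise median, which are standard textbook examples and assumptions for illustration here, not necessarily the specific schemes presented in this talk.

```python
# Illustrative sketch only: top-k sparsification and coordinate-wise median
# are standard examples of gradient compression and robust aggregation; they
# are not claimed to be the speaker's specific algorithms.
import numpy as np

def compress_top_k(grad: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries of a gradient (sparsification)."""
    idx = np.argsort(np.abs(grad))[-k:]          # indices of the k largest entries
    return idx, grad[idx]                        # what a worker would transmit

def decompress(idx: np.ndarray, vals: np.ndarray, dim: int) -> np.ndarray:
    """Rebuild a dense gradient from the transmitted (index, value) pairs."""
    dense = np.zeros(dim)
    dense[idx] = vals
    return dense

def robust_aggregate(worker_grads: list) -> np.ndarray:
    """Coordinate-wise median: tolerant to a minority of corrupted workers."""
    return np.median(np.stack(worker_grads), axis=0)

# Toy round of distributed SGD with compression and robust aggregation.
rng = np.random.default_rng(0)
dim, k = 1000, 50                                # 20x communication compression in this toy setup
workers = [rng.normal(size=dim) for _ in range(8)]
workers[0] = 1e3 * rng.normal(size=dim)          # one adversarial/faulty worker
received = [decompress(*compress_top_k(g, k), dim) for g in workers]
update = robust_aggregate(received)              # aggregated update applied by the server
```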