Seminar #2 - Alexander Renz-Wieland - NuPS (SIGMOD'22)

Abstract

To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers (PSs) ease the implementation of distributed parameter management, but can induce severe communication overhead. In some cases, distributed performance may even fall behind that of single-node baselines. In this talk, I present our work on making PSs more communication-efficient by adapting to the underlying workload. I discuss (i) how PSs can adapt parameter allocation dynamically to exploit access locality, (ii) how PSs can tailor their management techniques to individual parameters, and (iii) how PSs can do this adaptation automatically, without requiring tuning. Experimentally, such adaptation drastically improved PS efficiency, resulting in near-linear speed-ups over single-node baselines.
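To make point (ii) concrete, below is a minimal, single-process Python sketch of per-parameter management: frequently accessed ("hot") parameters are replicated so reads become local, while rarely accessed parameters keep a single copy that is relocated to the worker that needs it. The `TinyPS` class and the `hot_threshold` parameter are illustrative assumptions for this sketch, not the NuPS API or its actual policy.

```python
import collections


class TinyPS:
    """Toy in-process parameter server sketch (illustrative, not the NuPS API).

    Each key is managed by one of two techniques:
      - replication: hot keys are readable locally on every node;
      - relocation:  cold keys have a single copy whose ownership moves
                     to the requesting node on access.
    A simple access counter decides when a key switches to replication.
    """

    def __init__(self, hot_threshold=100):
        self.hot_threshold = hot_threshold
        self.values = {}                          # key -> parameter value
        self.owner = {}                           # key -> node holding the single copy
        self.replicated = set()                   # keys managed by replication
        self.access_count = collections.Counter() # key -> number of accesses

    def pull(self, key, node):
        """Read a parameter from worker `node`."""
        self.access_count[key] += 1
        if key in self.replicated:
            return self.values.get(key, 0.0)      # local read, no relocation
        self.owner[key] = node                    # relocation: ownership moves here
        if self.access_count[key] >= self.hot_threshold:
            self.replicated.add(key)              # adapt: hot key -> replication
        return self.values.get(key, 0.0)

    def push(self, key, delta):
        """Apply an additive update; additivity lets replicas merge updates lazily."""
        self.values[key] = self.values.get(key, 0.0) + delta


ps = TinyPS(hot_threshold=2)
ps.push("w17", 0.5)
print(ps.pull("w17", node=0))   # cold: single copy relocated to node 0
print(ps.pull("w17", node=1))   # second access crosses the threshold -> replicated
```

In this sketch, additive updates are what make replication cheap for hot keys: replicas can accumulate deltas independently and merge them later, so reads stay local without per-access coordination.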

Date
Jul 21, 2022 10:30 AM — 11:30 AM
Location
Huxley 315 - Imperial College London

Speaker Bio

Alexander Renz-Wieland is a PhD student working on large-scale machine learning in the Database Systems and Information Management (DIMA) group at Technische Universität Berlin, which he joined in September 2017. He is supervised by Volker Markl and Rainer Gemulla (Universität Mannheim). Prior to his PhD, he completed an M.Sc. in Business Informatics with a specialization in Data and Web Science at Universität Mannheim, including a semester at VU Amsterdam (with courses from CWI). In his Master's thesis, he worked on scalable sequential pattern mining.