Coded Computation Across Shared Heterogeneous Workers With Communication Delay


SOURCE IEEE Trans. on Signal Processing, 70: 3371-3385, 2022

Published Date: 2022-07


Distributed computing enables large-scale computation tasks to be processed by multiple workers in parallel. However, the randomness of communication and computation delays across the workers causes the straggler effect, which may degrade the delay performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and the task assignment to the workers should be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to workers with different computing capabilities. The goal is to minimize the communication plus computation delay of all the tasks. We propose joint worker assignment, resource allocation and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks from either a single master or multiple masters, respectively. Then, the non-convex delay minimization problem is solved by employing a Markov’s inequality-based approximation, Karush-Kuhn-Tucker conditions, and successive convex approximation methods. Through extensive simulations, we show that the proposed algorithms can reduce the task completion delay compared to the benchmarks.
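
To make the idea of coded computation concrete, the following is a minimal sketch of the textbook (3, 2) coded matrix-vector multiplication, in which the product can be recovered from any two of three workers. It is an illustration only and not the algorithm proposed in the paper: it does not model worker assignment, resource allocation, load allocation, or communication delay. The function names (`matvec`, `encode`, `decode`) and the toy data are assumptions made for this example, and the matrix is assumed to have an even number of rows.

```python
def matvec(rows, x):
    """Multiply a matrix (list of rows) by a vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into two row blocks and add one parity block A1 + A2.

    Assumes A has an even number of rows (illustrative simplification).
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A @ x from the partial products of any two workers.

    `results` maps a worker index (0, 1, or 2) to its partial product.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        # Worker 1 straggled: A2 x = (A1 + A2) x - A1 x
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:
        # Worker 0 straggled: A1 x = (A1 + A2) x - A2 x
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2


# Toy data (hypothetical): a 4x2 matrix and a length-2 vector.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]

blocks = encode(A)
partials = {i: matvec(block, x) for i, block in enumerate(blocks)}

# Pretend worker 1 is the straggler and only workers 0 and 2 respond.
recovered = decode({0: partials[0], 2: partials[2]})
print(recovered == matvec(A, x))
```

Each worker here handles half of the rows, so the total assigned load is 1.5 times the uncoded load. How much redundancy of this kind to add, and how to split it across workers of different speeds, is the trade-off that the abstract says must be optimized.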
