LANGUAGE English
SOURCE IEEE COMMUNICATIONS LETTERS, VOL. 25, NO. 7, JULY 2021, 2338-2342
Published Date: 2021-07
ABSTRACT
When tasks are scheduled and allocated across multiple servers, distributed machine learning suffers from the straggler effect and from system heterogeneity; e.g., the computation time of the slowest worker can be much longer than that of the normal workers. This letter studies the distributed online task assignment problem under heterogeneous conditions, where different workers have different computing capacities, with the goal of minimizing the task completion time. We consider task scheduling with random task arrivals and introduce a cancellation-after-completion scheme that clears the unfinished parts of a task once the task completes, further reducing redundant computation. Because the optimal solution is difficult to find, we propose an approximate online algorithm based on convex optimization and time recursion. Simulation results show that the proposed algorithm reduces the completion delay by over 30% compared with its one-shot counterpart and maintains a relatively stable delay under fluctuating arrival rates.