Distributed parallel training has become the standard paradigm for large-scale deep learning, with broad applications in artificial intelligence, multi-robot systems, and the Internet of Things. Efficient large-scale distributed learning requires coordinating substantial computing resources, yet it faces critical challenges such as heterogeneous node capabilities and limited communication bandwidth. To maximize the utilization of computational and communication resources, this project aims to design and evaluate communication- and computation-efficient distributed learning algorithms grounded in distributed optimization techniques. The proposed research will focus on two main directions: 1) Robust and efficient distributed machine learning under heterogeneous data and GPU resources. We will develop robust communication strategies and accurate gradient estimation methods to enable efficient parallel training of large-scale models. 2) Distributed minimax optimization for adversarial learning and generalization. By exploring perturbations of both model parameters and data samples, together with quantitative metrics of generalization, we aim to improve robustness to noise and enhance the generalization performance of trained models.
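As an illustration of direction 1, the sketch below shows one widely used communication-efficient technique, top-k gradient sparsification with error feedback, in which each worker transmits only its largest-magnitude gradient entries and carries the compression error into the next round. This is a minimal sketch under assumed settings: the toy quadratic objective, the simulated worker loop, and all function names are illustrative, not the project's actual algorithm.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries of the gradient and zero the rest.
    Returns the sparse approximation and the residual left behind."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse, grad - sparse

def distributed_step(params, local_grads, residuals, k, lr=0.1):
    """One synchronous step: each worker compresses its error-corrected
    gradient, the averaged sparse messages stand in for an all-reduce,
    and each worker keeps its compression error as feedback."""
    messages = []
    for w, g in enumerate(local_grads):
        corrected = g + residuals[w]                 # error feedback
        sparse, residuals[w] = topk_compress(corrected, k)
        messages.append(sparse)
    avg = np.mean(messages, axis=0)                  # simulated all-reduce
    return params - lr * avg, residuals

# Toy usage: 4 workers, a 10-dimensional model, top-2 compression.
rng = np.random.default_rng(0)
params = np.zeros(10)
residuals = [np.zeros(10) for _ in range(4)]
for _ in range(5):
    # Noisy toy gradients of 0.5 * ||params - target||^2 with target ~ N(1, 0.1).
    local_grads = [params - rng.normal(1.0, 0.1, 10) for _ in range(4)]
    params, residuals = distributed_step(params, local_grads, residuals, k=2)
```

The error-feedback term is what keeps the compressed updates an accurate gradient estimate over time: entries dropped in one round are accumulated and resent later rather than discarded.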
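For direction 2, one common way to formalize joint perturbations of model parameters and data samples is a doubly perturbed minimax objective; the particular norms, radii \rho and \epsilon, and loss \ell below are illustrative assumptions rather than the project's committed formulation.

\[
\min_{w}\ \max_{\|\xi\|_2 \le \rho}\ \frac{1}{n}\sum_{i=1}^{n}\ \max_{\|\delta_i\|_\infty \le \epsilon}\ \ell\big(w+\xi;\ x_i+\delta_i,\ y_i\big)
\]

Here the inner maximization over \delta_i models worst-case perturbations of the training samples (adversarial robustness), the maximization over \xi perturbs the parameters as a flatness-based proxy for generalization, and the outer minimization trains the shared model; in the distributed setting each node evaluates its local portion of the sum and the resulting updates are aggregated across nodes.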