The rapid development of AI and its growing range of applications have led to an increasing number of service requests being processed on cloud and edge servers. Many AI-related tasks are computationally intensive and require substantial computing, communication, and storage resources, yet they are still expected to return results to users within a limited response time. This creates significant system-level challenges, in particular bottlenecks that arise from the intrinsic complexity of AI workloads, limited edge and cloud resources, and network constraints, all of which must be managed against users' quality-of-service expectations. These bottlenecks can degrade latency, throughput, reliability, and overall service performance. To address these challenges, this project investigates new mechanisms and algorithms for improving the quality of service in cloud–edge systems that support AI applications. The work will explore techniques such as deep learning, deep reinforcement learning, queueing theory, and data-driven analysis to optimize task placement, resource allocation, and service scheduling under dynamic workload and network conditions.