State-of-the-art neural networks offer powerful capabilities but are often too resource-intensive to deploy on Internet-of-Things (IoT) devices. This project proposes a split YOLO network architecture for distributed processing across resource-constrained IoT devices and servers. By running the first portion of the network on the device itself, the approach avoids transmitting raw input data to the server and thereby reduces communication overhead. To further improve communication efficiency, the project quantizes both the network weights and the intermediate feature maps: weight quantization lightens computation on the resource-limited IoT device, while feature-map quantization shrinks the data transmitted between the device and the server.
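
The sketch below illustrates the general split-inference idea in PyTorch: a small head network runs on the device, its output feature map is quantized to 8 bits before transmission, and a tail network on the server dequantizes and finishes the inference. The tiny CNN, the split point, and the affine uint8 scheme are illustrative assumptions, not the project's exact YOLO partition or quantizer.

```python
import torch
import torch.nn as nn

# Minimal sketch of split inference with feature-map quantization.
# The layers below stand in for a YOLO-style backbone; the split point
# and the 8-bit scheme are assumptions for illustration only.

class DeviceHead(nn.Module):
    """First few layers, run on the IoT device."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.layers(x)

class ServerTail(nn.Module):
    """Remaining layers, run on the server."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 80),
        )

    def forward(self, x):
        return self.layers(x)

def quantize_uint8(t: torch.Tensor):
    """Affine-quantize a float feature map to uint8 for transmission."""
    t_min, t_max = t.min(), t.max()
    scale = (t_max - t_min).clamp(min=1e-8) / 255.0
    q = ((t - t_min) / scale).round().clamp(0, 255).to(torch.uint8)
    return q, scale.item(), t_min.item()

def dequantize_uint8(q: torch.Tensor, scale: float, zero: float):
    """Recover an approximate float feature map on the server side."""
    return q.to(torch.float32) * scale + zero

if __name__ == "__main__":
    device_net, server_net = DeviceHead().eval(), ServerTail().eval()
    image = torch.rand(1, 3, 320, 320)           # raw input stays on the device

    with torch.no_grad():
        feat = device_net(image)                 # on-device partial inference
        q, scale, zero = quantize_uint8(feat)    # 4x smaller payload than float32
        # --- q, scale, zero are what would be sent over the network ---
        feat_hat = dequantize_uint8(q, scale, zero)
        out = server_net(feat_hat)               # server finishes the inference

    print(f"float32 payload: {feat.numel() * 4} bytes, "
          f"uint8 payload: {q.numel()} bytes")
```

In this setup only the quantized feature map plus two scalars cross the network, which is the communication saving the abstract describes; weight quantization of the on-device head would be applied separately to reduce its compute and memory footprint.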