The inference phase of machine learning demands maximisation of throughput when deployed as a service. Because neural networks contain varying numbers of layers with different weights, an efficient way of scheduling these layers as parallel pipelines is required. We will profile and benchmark ML inference performance in a Multi-Instance GPU (MIG) environment to extract insights for possible pipeline scheduling on GPU servers.
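
As a starting point, the following is a minimal sketch of the kind of per-slice throughput measurement we have in mind, assuming PyTorch with ResNet-50 as a stand-in workload. The MIG slice under test would be selected by exporting `CUDA_VISIBLE_DEVICES` with a MIG UUID (as listed by `nvidia-smi -L`) before launching the script; the model choice, batch size, and iteration counts are illustrative assumptions, not fixed parts of the plan.

```python
import time
import torch
import torchvision.models as models

# The MIG slice is chosen outside the process, before launch, e.g.:
#   CUDA_VISIBLE_DEVICES=MIG-<uuid> python profile_mig.py
# where the MIG UUID comes from `nvidia-smi -L`. (Hypothetical script name.)

def measure_throughput(model, batch_size=32, iters=100, warmup=10):
    """Measure inference throughput (samples/sec) on the visible GPU slice."""
    device = torch.device("cuda")
    model = model.eval().to(device)
    x = torch.randn(batch_size, 3, 224, 224, device=device)

    with torch.no_grad():
        for _ in range(warmup):       # warm up kernels and caches
            model(x)
        torch.cuda.synchronize()      # exclude warmup from timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()      # wait for all queued kernels to finish
        elapsed = time.perf_counter() - start

    return batch_size * iters / elapsed

if __name__ == "__main__":
    resnet = models.resnet50()        # illustrative workload only
    print(f"throughput: {measure_throughput(resnet):.1f} samples/sec")
```

Repeating this measurement across MIG instance sizes (e.g. 1g.5gb up to 7g.40gb on an A100) would yield the per-slice throughput profile needed to decide how to place pipeline stages.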