MPI Collectives

Building distributed collective algorithms from scratch with MPI

Implemented distributed collective primitives from scratch in C on a high-performance computing (HPC) cluster. These include operations commonly used in large-scale ML model training, such as Broadcast, Reduce, Scatter, Gather, All-Reduce, and more. Benchmarked the implementations on a large supercomputing system and optimized them for low latency.
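
To illustrate the kind of from-scratch collective this describes (the actual project code is not shown here), below is a minimal sketch of a binomial-tree Broadcast built only from MPI point-to-point calls instead of `MPI_Bcast`. The function name `my_bcast` and the integer payload are placeholders, not names taken from the project.

```c
/* my_bcast.c — hypothetical sketch of a from-scratch broadcast collective. */
#include <mpi.h>
#include <stdio.h>

/* Broadcast `count` ints from rank `root` to all ranks in `comm`,
 * using a binomial tree of MPI_Send/MPI_Recv calls. */
static void my_bcast(int *buf, int count, int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Shift ranks so the root appears as rank 0 in the virtual tree. */
    int vrank = (rank - root + size) % size;

    /* Every non-root rank receives exactly once, from its tree parent. */
    int mask = 1;
    while (mask < size) {
        if (vrank & mask) {
            int parent = ((vrank - mask) + root) % size;
            MPI_Recv(buf, count, MPI_INT, parent, 0, comm, MPI_STATUS_IGNORE);
            break;
        }
        mask <<= 1;
    }

    /* Then forward the data to each child below the bit where we stopped. */
    mask >>= 1;
    while (mask > 0) {
        if (vrank + mask < size) {
            int child = ((vrank + mask) + root) % size;
            MPI_Send(buf, count, MPI_INT, child, 0, comm);
        }
        mask >>= 1;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int data = (rank == 0) ? 42 : -1;   /* only the root starts with the value */
    my_bcast(&data, 1, 0, MPI_COMM_WORLD);
    printf("rank %d has %d\n", rank, data);

    MPI_Finalize();
    return 0;
}
```

The tree structure keeps the number of communication rounds at O(log P) for P ranks, which is the usual latency argument for binomial broadcast over a naive root-sends-to-everyone loop. Compile with `mpicc my_bcast.c -o my_bcast` and run with `mpirun -np 8 ./my_bcast`.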