Parallel Minimum Spanning Tree Library (Online)

Summary

We implemented a parallel graph-processing program for minimum spanning tree (MST) computation, targeting workloads in which the graph topology stays fixed while edge weights change from round to round. We evaluated several implementations, written in OpenMP and CUDA, on graph snapshots with varying structural properties to identify where each approach performs well and where it hits bottlenecks. Based on these findings, we designed a hybrid heterogeneous strategy that runs the CPU and GPU implementations concurrently to hide GPU latency and matches the algorithmic choice to the characteristics of each graph. On CMU GHC machines using 8 CPU cores and CUDA, our Hybrid strategy amortizes its preprocessing cost after ≈ 6 MST rounds and achieves about half the per-round time of the one-shot MP and CU strategies on our mixed sparse/dense R-MAT workload.
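
To make the amortization claim concrete, the sketch below models break-even under a simple assumption: the hybrid strategy pays a one-time preprocessing cost and then a fixed per-round cost, while a one-shot strategy pays only a (larger) fixed per-round cost. The `CostModel` struct and `break_even_rounds` function are hypothetical names introduced for illustration and are not part of the project's code; the model also ignores round-to-round variance.

```cpp
#include <cmath>
#include <optional>

// Hypothetical cost model (an assumption, not taken from the project's code):
// a one-time preprocessing cost plus a constant cost per MST round.
struct CostModel {
    double preprocess;      // one-time hybrid setup cost
    double hybrid_round;    // hybrid time per MST round
    double one_shot_round;  // one-shot (no preprocessing) time per MST round
};

// Smallest round count n for which the hybrid total is no worse than one-shot:
//   preprocess + n * hybrid_round <= n * one_shot_round
// Returns std::nullopt when the hybrid is not faster per round,
// because in that case it never recovers its preprocessing cost.
std::optional<long> break_even_rounds(const CostModel& m) {
    const double saving_per_round = m.one_shot_round - m.hybrid_round;
    if (saving_per_round <= 0.0) {
        return std::nullopt;
    }
    return static_cast<long>(std::ceil(m.preprocess / saving_per_round));
}
```

As a usage example with illustrative units rather than measured timings: if the one-shot round costs 1.0, the hybrid round costs 0.5 (about half), and preprocessing costs 3.0, then `break_even_rounds(CostModel{3.0, 0.5, 1.0})` yields 6. Under this model, breaking even at about 6 rounds with half the per-round time would correspond to a preprocessing cost of roughly three one-shot rounds.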

Project Schedule

Date	Task
Mar 29	Project structure + build system
Apr 1	Templated graph primitives with C++20 concepts
Apr 5	Benchmark graph generator with visualization
Apr 8	Baseline Kruskal + OpenMP Borůvka
Apr 8	Batch benchmark, CSR graph variant
Apr 9	Batch benchmark, atomic union-find
Apr 10	VTune benchmark, reduced contention through path halving
Apr 12	Research on reducing contraction contention
Apr 16	Sparse + dense CUDA implementation
Apr 20	Finalize all strategies + start on varying weights test
Apr 22	Graph decomposition based on subgraph density
Apr 24	Subgraph scheduling
Apr 28	Finish all benchmarks + final report progress check
Apr 30	Finish final report

✓ Completed ◯ Upcoming
