Reference: Byrd, G. T.; Saraiya, N. P.; & Delagi, B. Multicast Communication in Multiprocessor Systems. 1989.
Abstract: Recent high-performance multiprocessors exploit cut-through routing for unicast transmission, with packets routed as their first bytes arrive. We extend ideas considered for efficient cut-through routing in multiprocessors to include multicast, in order to benefit the many parallel programs in which producers provide each value to multiple consumers. We describe several alternative cut-through multicast protocols, including a restrictive (yet adaptive) routing scheme for deadlock avoidance. Simulations using synthetic and application-driven loads show that this scheme performs significantly better than either multicast emulation or deadlock detection and resolution. The scheme provides cut-through multicast without requiring dedicated storage in the communication facilities for a full packet.
Full paper available as a BinHex (.hqx) file.