MPiSC

Automatically Generated PDF Documents

This site hosts the PDF files compiled from the Typst source files of the MPiSC project.


📄 Available Documents

| Document | Last Content Update | View | Download |
| --- | --- | --- | --- |
| Slot-queue - An optimized wait-free distributed MPSC | 2025-03-13 | 📕 View | ⬇️ PDF |
| Modified LTQueue without Load-Link/Store-Conditional | 2025-03-10 | 📕 View | ⬇️ PDF |
| Studying and developing nonblocking distributed MPSC queues | 2025-03-11 | 📕 View | ⬇️ PDF |

📝 Project Information

Objective

| Dimension | Desired property |
| --- | --- |
| Queue length | Fixed length |
| Number of producers | Many |
| Number of consumers | One |
| Operations | enqueue, dequeue |
| Concurrency | Concurrent & lock-free |
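
For concreteness, a minimal interface sketch matching this table is given below; the class and method names are hypothetical, not part of the project:

```cpp
#include <cstddef>

// Hypothetical interface sketch: fixed capacity, many producers,
// one consumer, lock-free concurrent operations.
template <typename T>
class BoundedMpscQueue {
public:
    explicit BoundedMpscQueue(std::size_t capacity);

    // May be called concurrently by many producers; returns false when full.
    bool enqueue(const T& value);

    // Called by the single consumer only; returns false when empty.
    bool dequeue(T& out);
};
```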

Motivation

Approach

The porting approach we chose is to use MPI-3 RMA to port lock-free queue algorithms. We further optimize these ports using MPI SHM (the so-called hybrid MPI+MPI approach) and C++11 for shared-memory synchronization.

Why MPI RMA? MPSC queues belong to the class of irregular applications: we cannot statically determine where data is stored. Data can live on any process, and its location is only known at runtime. The traditional message-passing interface using MPI_Send and MPI_Recv is insufficient here. Suppose that at runtime, process A learns it needs a piece of data held by process B; A must issue MPI_Recv(B), but this requires B to anticipate that it should issue the matching MPI_Send(A, data) and to know which data A actually wants. The latter issue can be worked around by having A first issue MPI_Send(B, data_descriptor), with B waiting on MPI_Recv(A). However, because the memory access pattern is not known in advance, B would have to anticipate that any other process may want to access its data at any time. This is possible but cumbersome. MPI RMA is specifically designed to express irregular applications conveniently: the accessing side alone specifies everything it needs.
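
As a minimal sketch of this one-sided style (not taken from the project's code), any process can read a word exposed by process 0 without process 0 issuing a matching send:

```cpp
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each process exposes one int in a window; rank 0 holds the data.
    int buffer = (rank == 0) ? 42 : 0;
    MPI_Win win;
    MPI_Win_create(&buffer, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    int value = 0;
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);  // open access epoch at rank 0
    MPI_Get(&value, 1, MPI_INT,
            0 /* target rank */, 0 /* displacement */, 1, MPI_INT, win);
    MPI_Win_unlock(0, win);                    // completes the transfer

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```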
Why MPI-3 RMA? (paper) MPI-3 improves the RMA API, notably providing the non-collective MPI_Win_lock_all, which lets a process open an access epoch on all processes of a window at once. This enables lock-free synchronization.
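
A sketch of the resulting pattern, with our own variable names: open one shared epoch over the whole window, then synchronize through atomic RMA calls and flushes rather than per-target exclusive locks:

```cpp
#include <mpi.h>

// Lock-free pattern enabled by MPI-3: a single non-collective
// MPI_Win_lock_all epoch plus atomic RMA operations.
void cas_example(MPI_Win win, int target_rank) {
    MPI_Win_lock_all(0, win);  // shared epoch on all processes in the window

    int expected = 0, desired = 1, result = 0;
    // Atomically: if the target word equals expected, write desired;
    // the old value is returned in result.
    MPI_Compare_and_swap(&desired, &expected, &result,
                         MPI_INT, target_rank, 0 /* displacement */, win);
    MPI_Win_flush(target_rank, win);  // complete the operation at the target

    MPI_Win_unlock_all(win);
}
```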
Hybrid MPI+MPI (paper) The pure-MPI approach is oblivious to the fact that some MPI processes reside on the same node, which causes unnecessary overhead. MPI-3 introduces the MPI SHM API, allowing us to obtain a communicator containing only the processes on a single node. From this communicator, we can allocate a shared-memory window using MPI_Win_allocate_shared. "Hybrid MPI+MPI" means that MPI is used for both intra-node and inter-node communication. Such a shared-memory window follows the unified memory model and can be synchronized either with MPI facilities or with any alternative mechanism. Hybrid MPI+MPI can thus take advantage of the many cores of current processors.
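
A minimal sketch of this setup (variable names are ours): split off the processes that share a node, allocate a shared-memory window among them, and query a neighbor's segment for direct load/store access:

```cpp
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Communicator containing only the processes on this node.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    // Each process contributes one int to the shared window.
    int* local_base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            node_comm, &local_base, &win);

    // Query the base address of rank 0's segment; because the window is
    // in shared memory, it can be dereferenced directly.
    MPI_Aint size;
    int disp_unit;
    int* rank0_base = nullptr;
    MPI_Win_shared_query(win, 0, &size, &disp_unit, &rank0_base);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```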
Hybrid MPI+MPI+C++11 (paper) Within the shared-memory window, C++11 synchronization facilities can be used, and they prove much more efficient than MPI's. Incorporating C++11 can therefore be thought of as an optimization step for intra-node communication.
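
A sketch of the idea, under the assumption that the shared window is allocated as in the previous sketch: we place a std::atomic in the window and operate on it with plain C++11 atomics instead of MPI RMA calls.

```cpp
#include <atomic>
#include <mpi.h>
#include <new>

// Hypothetical intra-node counter: rank 0 contributes the storage,
// all node-local processes update it with C++11 atomics.
void intra_node_counter(MPI_Comm node_comm) {
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    MPI_Win win;
    MPI_Aint size = (node_rank == 0) ? sizeof(std::atomic<int>) : 0;
    void* base = nullptr;
    MPI_Win_allocate_shared(size, sizeof(std::atomic<int>), MPI_INFO_NULL,
                            node_comm, &base, &win);

    // Locate rank 0's segment, which holds the counter.
    MPI_Aint qsize;
    int disp_unit;
    MPI_Win_shared_query(win, 0, &qsize, &disp_unit, &base);
    auto* counter = static_cast<std::atomic<int>*>(base);

    if (node_rank == 0) new (counter) std::atomic<int>(0);
    MPI_Barrier(node_comm);  // make the initialized counter visible to all

    counter->fetch_add(1, std::memory_order_relaxed);  // C++11 atomic op

    MPI_Barrier(node_comm);
    MPI_Win_free(&win);
}
```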
How to perform an MPI port in a lock-free manner? MPI-3 RMA provides the necessary capabilities; this is made clear in [MPI3-RMA](/MPiSC/references/MPI3-RMA/).

Literature review

Known problems

Evaluation strategy

We need to evaluate the implementations on at least the following levels:

Correctness

Performance

Lock-freedom

Caution - Lock-freedom of dependencies: A lock-free algorithm typically assumes that the synchronization primitives it builds on are themselves lock-free. Whether they are depends on the target platform and, during implementation, on the library used. Care must be taken to avoid accidentally using non-lock-free operations.
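
For example, C++11 only guarantees atomicity, not lock-freedom; a quick check such as the following sketch (the struct is ours, chosen because wide types often fall back to mutex-based implementations) can catch such accidents:

```cpp
#include <atomic>
#include <cstdio>

// 16-byte struct: lock-free only where double-width CAS is supported
// (may require linking with -latomic under GCC).
struct TaggedPointer {
    void* ptr;
    long tag;
};

int main() {
    std::atomic<int> small{0};
    std::atomic<TaggedPointer> wide{};

    // is_lock_free() reports whether the platform/library implements
    // this particular atomic without a hidden lock.
    std::printf("atomic<int> lock-free: %d\n", small.is_lock_free());
    std::printf("atomic<TaggedPointer> lock-free: %d\n", wide.is_lock_free());
    return 0;
}
```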

Scalability


Last build: Thu Mar 13 13:57:56 UTC 2025
Generated by GitHub Actions