  • Software that brings its own copy of Intel MPI may fail on startup with an error similar to:

    Fatal error in MPI_Init: Other MPI error, error stack:
    MPIR_Init_thread(805): fail failed
    MPID_Init(1743)......: channel initialization failed
    MPID_Init(2137)......: PMI_Init returned -1
    • Reason: Within compute jobs, we automatically set the I_MPI_PMI_LIBRARY environment variable to allow running MPI jobs using srun with Intel MPI. However, codes that bring their own copy of Intel MPI usually use a different startup mechanism (e.g., mpirun) which fails when this variable is set.

    • Solution: Unset the variable in your job script with unset I_MPI_PMI_LIBRARY before launching the application.

    • If you can launch your software with srun, you can resolve the same type of error more easily by adding --mpi=pmi2 to the invocation, e.g., srun -n 3 --mpi=pmi2 ./mw_fpga
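
The workaround for software that bundles its own Intel MPI can be sketched as a fragment of a job script. This is a minimal sketch, not a definitive recipe: the function name use_bundled_intel_mpi and the application name ./my_app are assumptions chosen for illustration, and the launch line is left as a comment because the right launcher arguments depend on the software.

```shell
#!/bin/bash
# Hypothetical helper; the name use_bundled_intel_mpi is an assumption.
use_bundled_intel_mpi() {
    # Drop the PMI library setting that is exported inside compute jobs,
    # so the launcher shipped with the application can start up its own way.
    unset I_MPI_PMI_LIBRARY
}

use_bundled_intel_mpi

# Report the state so the job log shows whether the variable is gone.
if [ -z "${I_MPI_PMI_LIBRARY+x}" ]; then
    echo "I_MPI_PMI_LIBRARY is unset"
fi

# Afterwards, start the application with its own launcher, for example:
#   mpirun -n 3 ./my_app          # ./my_app is a placeholder
# If the application can be started with srun instead, keep the variable
# and use the alternative from the list above:
#   srun -n 3 --mpi=pmi2 ./mw_fpga
```

The unset only affects the shell that runs the job script and the processes it starts, so it does not need to be undone afterwards.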

  • In an interactive Slurm job, an application or binary run under mpirun or mpiexec may hang indefinitely. To fix this, set `I_MPI_HYDRA_BOOTSTRAP=ssh` before launching it.
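
For the interactive case, the setting could be applied in the shell of the interactive job before the launcher is called. The following is a sketch under assumptions: the function name use_ssh_bootstrap and the application name ./my_app are hypothetical, and the launch line is shown only as a comment.

```shell
#!/bin/bash
# Hypothetical helper; the name use_ssh_bootstrap is an assumption.
use_ssh_bootstrap() {
    # Tell Intel MPI's Hydra process manager to start remote processes via ssh.
    export I_MPI_HYDRA_BOOTSTRAP=ssh
}

use_ssh_bootstrap
echo "I_MPI_HYDRA_BOOTSTRAP=${I_MPI_HYDRA_BOOTSTRAP}"   # prints I_MPI_HYDRA_BOOTSTRAP=ssh

# Then launch as usual inside the interactive job, for example:
#   mpirun -n 3 ./my_app    # ./my_app is a placeholder
```

Because the variable is exported, it is inherited by mpirun or mpiexec when they are started from the same shell.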
