Engaged CPU capacity by OOFEM multi-threading

Even when OOFEM is compiled with multi-threading enabled (the USE_OPENMP CMake option), it usually engages only a limited share of the available CPU capacity.

OOFEM uses one CPU core with the default solver factorization method

One reason could be the default Skyline solver and its Crout matrix factorization. The Skyline storage scheme does not eliminate all the zeros in the matrix, and the factorization used in the code is a sequential method that cannot run in parallel. This matters most for methods such as RBSM, which have a larger number of degrees of freedom but also a larger number of zero entries in the stiffness matrix. A solver that cannot compute in parallel, combined with a storage scheme that keeps many of the zeros, manages to miss one bird with two stones.
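To make the storage argument concrete, here is a minimal Python sketch that counts how many values each scheme would hold for a small symmetric matrix. It is not OOFEM code: the function names and the 6x6 test matrix are made up for illustration, and the skyline is assumed to be a column-wise profile of the upper triangle.

```python
def skyline_storage(matrix):
    """Count values kept by a column skyline of the upper triangle.

    Each column stores every entry from its first nonzero row down to
    the diagonal, including any zeros that fall inside that range.
    """
    total = 0
    for j in range(len(matrix)):
        # First row at or above the diagonal holding a nonzero in column j.
        first = next((i for i in range(j + 1) if matrix[i][j] != 0), j)
        total += j - first + 1
    return total


def compressed_storage(matrix):
    """Count values kept by a compressed column scheme (upper triangle only)."""
    n = len(matrix)
    return sum(1 for j in range(n) for i in range(j + 1) if matrix[i][j] != 0)


def example_matrix(n=6):
    """Hypothetical symmetric matrix: tridiagonal plus one far coupling."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 4.0
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = -1.0
    # A single coupling between the first and last unknown stretches
    # the skyline of the last column all the way to the top row.
    a[0][n - 1] = a[n - 1][0] = -1.0
    return a


if __name__ == "__main__":
    k = example_matrix()
    print("skyline values:   ", skyline_storage(k))
    print("compressed values:", compressed_storage(k))
```

The compressed scheme also needs integer index arrays, so the real memory saving is smaller than the raw value counts suggest, but the gap widens quickly as the matrix grows and far couplings stretch the column profiles.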

The easiest solution could be to use the OneMKL solver together with the Intel compiler. OneMKL works with the CSC storage scheme, which reduces memory use, and offers high-performance parallel methods, which improve the engaged CPU capacity in a parallel analysis. OOFEM already has the classic Intel solver implemented in the code, and with some modifications it can take advantage of this ability through the new OneAPI package.

To change the default solver, set the appropriate lstype and smtype keyword values (the default for both is zero) after the solver type keyword in the input file. The targeted solver libraries must be enabled at compile time with the corresponding CMake options (if you are using CLion, see this post, which shows how to enable the DSS solver as an example of modifying CLion options).

OOFEM input manual instructions for changing the solver and matrix storage scheme (for example, lstype 6 for the MKL solver and smtype 2 for the CSC storage scheme, which OOFEM calls CompCol)
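As a rough illustration of the input-file change, the Python sketch below sets the two keywords on an analysis record line. The helper name and the sample record `LinearStatic nsteps 1 nmodules 1` are assumptions made for the example. The values 6 and 2 are the ones quoted from the input manual above, so check them against the manual for your OOFEM version before relying on them.

```python
def set_solver_keywords(record, lstype, smtype):
    """Return the analysis record with lstype and smtype set.

    Existing values are overwritten; missing keywords are appended
    to the end of the record.
    """
    tokens = record.split()
    for key, value in (("lstype", lstype), ("smtype", smtype)):
        lowered = [t.lower() for t in tokens]
        if key in lowered:
            idx = lowered.index(key)
            if idx + 1 < len(tokens):
                tokens[idx + 1] = str(value)  # overwrite the old value
            else:
                tokens.append(str(value))     # keyword was last, value missing
        else:
            tokens += [key, str(value)]
    return " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical analysis record; real input files will differ.
    record = "LinearStatic nsteps 1 nmodules 1"
    print(set_solver_keywords(record, lstype=6, smtype=2))
```

Editing the record by hand works just as well. The helper is only convenient when many input files need to be switched to the same solver.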

For the OORBS repository, this improvement had a huge positive impact on the computation efficiency.

OORBS with OneMKL solver and compiled with Intel compiler is using the full capacity of an 8-core CPU of a Mac Pro machine
