OpenMPI write of large meshes fails

Description

From CGNStalk:
I'm currently getting the following error on a call to cg_section_write:
cgio_write_all_data:H5Dwrite:write to node data failed
when writing 137M tet cells in serial (so 137M * 4 = a ~548M-entry array of size_t's). The same code works perfectly well with smaller meshes.

I'm using CGNS 3.4.0 compiled with --enable-64bit and --enable-lfs, with HDF5 1.10.5 (which I believe compiles with large file support by default, although I shouldn't be hitting that limit anyway).

This code seems pretty benign – are there any related known issues, or a suggested way to go about debugging this?
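
For reference, a minimal sketch of the kind of serial call being described; the function and variable names below are placeholders for illustration, not taken from the original code:

    /* Hypothetical reconstruction of the reported call; fn, B and Z are assumed
     * to refer to an already opened CGNS file, base and zone, and connectivity
     * is assumed to hold 4 * ntets cgsize_t entries (~548M values for 137M tets). */
    #include "cgnslib.h"

    static void write_tet_section(int fn, int B, int Z,
                                  cgsize_t ntets, const cgsize_t *connectivity)
    {
        int S;
        /* Single serial call writing the whole connectivity array at once. */
        if (cg_section_write(fn, B, Z, "Elements", CGNS_ENUMV(TETRA_4),
                             1, ntets, 0, connectivity, &S))
            cg_error_exit();  /* reported failure: cgio_write_all_data:H5Dwrite */
    }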

Environment

None

Activity

Scot Breitenfeld
October 1, 2019, 3:36 PM

I have a reproducer for this issue, which also fails with OpenMPI 4.0. Perhaps it could be added to the test database (assuming it is the same problem Mark encountered). I'm curious whether you can reproduce this behavior on your end, because it has been kicking around in the back of my mind for a while.

Thanks,

Scot Breitenfeld
October 1, 2019, 4:38 PM

Added to my fork’s parallel make test.

Confirmed it passes with MPICH and fails with OpenMPI.

Given a correct CGNS file, the read portion of the test works, so it is the written data that is wrong:

dataset: </Unstructured3D/domain0/Elements 3D/ElementConnectivity/ data> and </Unstructured3D/domain0/Elements 3D/ElementConnectivity/ data>
155077563 differences found

Scot Breitenfeld
October 3, 2019, 6:28 PM

I can reproduce a similar error with MPICH for independent I/O. I don't think that issue is related to the original report, though.

Mickael PHILIT
April 10, 2020, 7:57 PM
Edited

This test case tries to write an array larger than 2 GB with a serial API function while in a parallel context. With 311 as the test parameter, the element connectivity is 311^3 * (8+1) * sizeof(cgsize_t) bytes wide. cg_poly_elements_write is a serial API, so when writing each partition to disk it first reads the existing connectivity, copies it, appends the new data, and then writes everything back to the file. Writing a very large mesh this way can therefore overflow, or at the very least be very slow. In the long run, a parallel-aware cgp_poly_elements_write is probably needed to solve the problem. A temporary solution is to use the existing parallel API for arrays (sketched in the code after the list below):

  • first do a partial section write

  • then write ElementStartOffset in parallel with cgp_array_write and cgp_array_write_data

  • finally write the ElementConnectivity in parallel with cgp_array_write and cgp_array_write_data
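
A minimal sketch of that workaround, assuming a 64-bit cgsize_t build (hence LongInteger arrays); the function name, argument names and per-rank ranges are placeholders, error checking is omitted, and the behavior of cg_section_partial_write for polyhedral sections is taken from the recipe above rather than verified here:

    /* Sketch of the temporary workaround: create the section node with a serial
     * partial write, then fill ElementStartOffset and ElementConnectivity with
     * the parallel array API.  All sizes, ranges and array contents are
     * placeholders supplied by the caller. */
    #include "pcgnslib.h"

    static void write_poly_section_parallel(
        int fn, int B, int Z,
        cgsize_t start, cgsize_t end,          /* global element range of the section      */
        cgsize_t total_connectivity_size,      /* last (maximum) ElementStartOffset value   */
        const cgsize_t *offsets,               /* this rank's slice of ElementStartOffset   */
        cgsize_t off_rmin, cgsize_t off_rmax,  /* 1-based range of that slice               */
        const cgsize_t *connectivity,          /* this rank's slice of ElementConnectivity  */
        cgsize_t con_rmin, cgsize_t con_rmax)
    {
        int S, A;

        /* 1. Partial section write: creates the Elements_t node without data. */
        cg_section_partial_write(fn, B, Z, "Elements 3D", CGNS_ENUMV(NGON_n),
                                 start, end, 0, &S);

        /* 2. Move to the new section node so the arrays land underneath it. */
        cg_goto(fn, B, "Zone_t", Z, "Elements_t", S, "end");

        /* 3. ElementStartOffset in parallel (one more entry than elements). */
        cgsize_t noffsets = end - start + 2;
        cgp_array_write("ElementStartOffset", CGNS_ENUMV(LongInteger), 1, &noffsets, &A);
        cgp_array_write_data(A, &off_rmin, &off_rmax, offsets);

        /* 4. ElementConnectivity in parallel. */
        cgp_array_write("ElementConnectivity", CGNS_ENUMV(LongInteger), 1,
                        &total_connectivity_size, &A);
        cgp_array_write_data(A, &con_rmin, &con_rmax, connectivity);
    }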

Mickael PHILIT
April 11, 2020, 5:02 PM
Edited

To bypass this writing issue, one can test the experimental cgp_poly* branch, which provides cgp_poly_section_write and cgp_poly_elements_write. If people find it useful, maybe this interface can be refined and end up in an official release.

Replacing cgp_section_write by cgp_poly_section_write just requires a new argument: the full size of the connectivity (also equal to the maximum offset written at the end of ElementStartOffset).

Replacing cg_poly_elements_partial_write by cgp_poly_elements_write is direct. The only thing to notice is that the ElementStartOffset values should be global, while the serial API can take either global or local offsets since it relies on the file space to determine the global offsets.
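
A hypothetical sketch of what those two replacements could look like; the cgp_poly_* signatures below are inferred from this description of the experimental branch, not from a released header, so treat them as assumptions:

    /* Hypothetical use of the experimental cgp_poly* interface described in
     * this comment; the exact signatures are assumptions, not a released API. */
    #include "pcgnslib.h"

    static void write_poly_section_experimental(
        int fn, int B, int Z,
        cgsize_t start, cgsize_t end,        /* global element range            */
        cgsize_t total_connectivity_size,    /* last ElementStartOffset value   */
        cgsize_t emin, cgsize_t emax,        /* elements owned by this rank     */
        const cgsize_t *connectivity,        /* local connectivity slice        */
        const cgsize_t *offsets)             /* local offsets, GLOBAL values    */
    {
        int S;

        /* cgp_section_write -> cgp_poly_section_write: one extra argument,
         * the full size of the connectivity. */
        cgp_poly_section_write(fn, B, Z, "Elements 3D", CGNS_ENUMV(NGON_n),
                               start, end, total_connectivity_size, 0, &S);

        /* cg_poly_elements_partial_write -> cgp_poly_elements_write: same call
         * shape, but the ElementStartOffset values must be global. */
        cgp_poly_elements_write(fn, B, Z, S, emin, emax, connectivity, offsets);
    }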

Assignee

Scot Breitenfeld

Reporter

Scot Breitenfeld

Components

Fix versions

Affects versions

Priority

Critical