(1) There are no issues writing large files on either Windows or Linux when serial CGNS is used.
(2) When parallel CGNS is used, there appears to be a limit on the size of a single array that can be written. On Windows the limit appears to be 4Gb; on Linux it appears to be 2Gb. The overall file size can exceed 4Gb, provided no single array reaches 4Gb (Windows) or 2Gb (Linux).
So, in summary, the issue is with writing large single arrays on both platforms when using the parallel library. This is happening even when using cgp_open with a serial communicator.
Since the CFD solution we write is spread across a number of separate arrays (one for velocity, another for the mesh, and so on), we have not hit this restriction yet. It could happen with very large meshes, though, so this needs to be fixed.
How did I come to this conclusion?
I used the attached Large.cpp file to write >4Gb files on Windows and Linux. It uses only the serial cg_open call and writes a single large array. This works fine on both platforms.
Our existing Forte simulator uses parallel CGNS 3.3.0, and I used it to write a 20Gb file on Windows while running a real engine simulation. However, although the file is 20Gb, no single array in it is >=4Gb.
To determine the limit for writing single arrays in parallel, I modified my test code to include an option to write multiple arrays (attached). The first argument is the size of each array in Gb and the second is the number of arrays. Running with 4 MPI processes and args 2 10 (i.e. writing 10 arrays of 2Gb each, for a 20Gb file) works on Windows. The code first writes serially, using cgp_open with a serial communicator, and then repeats the same write in parallel. However, as soon as a single array is >=4Gb, the write fails. On Linux, the limit is 2Gb. The failing runs are shown below, Windows first and then Linux.
mpiexec -localonly -n 2 CGNSLarge_sections.exe 4 2
Large CGNS file test program, size=4Gb, written in 2 slices
Open in serial
Now write the data......1......2......done
cgio_write_all_data:H5Dwrite:write to node data failed
mpirun -n 2 CGNSLarge_sections_new 2 1
Large CGNS file test program, size=2Gb, written in 1 slices
Open in serial
Now write the data......1......done
File written successfully
File closed successfully
Re-open the file.....done
...1...268434944 2.68435e+08 <> 0
With the current CGNS develop branch and HDF5 1.13, this issue appears to be in better shape on Linux. When running the provided C++ code, the described 2Gb write limit is not seen in parallel.
This can be attributed to the HDF5 1.10.2 release, which overcame the MPI-IO limitation (https://www.hdfgroup.org/2018/04/why-should-i-care-about-the-hdf5-1-10-2-release/)
The serial part of the provided test case still does not work: it hits a 2Gb limit when reading. (Maybe this is because only the first rank does I/O while HDF5 expects all ranks to do some I/O? Setting cgp_pio_mode to CGP_INDEPENDENT may help.)