SIONlib: Scalable I/O library for parallel access to task-local files

Templates for parallel I/O with SIONlib
Sample parallel MPI program for I/O with SIONlib
Serial program for SIONlib file access
Other examples

Templates for parallel I/O with SIONlib

All templates can also be found in the distribution package under sionlib/examples

Parallel write:
sid=sion_paropen_mpi( ... ,chunksize, comm, &fileptr, ...)  # collective

loop: { 
        sion_ensure_free_space(sid,nbytes);                 # non-collective
        fwrite(data,1,nbytes,fileptr)
      }

sion_parclose_mpi(sid);                                     # collective

- the call to sion_ensure_free_space can be omitted if it is guaranteed that no more bytes than specified in chunksize are written to the file.
- chunksize can differ from task to task
- if the data does not fit into the current chunk, sion_ensure_free_space assigns a new chunk in the file to the task and advances the file pointer to the new position. This operation is non-collective: all information about the locally used chunks and the number of bytes written to them is buffered in memory and stored in the sion file by the collective sion_parclose_mpi.
- one parameter of the open function is an MPI communicator, which allows a subset of all MPI tasks to open a sion file
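
The chunk bookkeeping described in the notes above can be illustrated with a small toy model. This is only a sketch in plain C: it does not call SIONlib, and the names toy_task_t, toy_ensure_free_space and toy_write are hypothetical. It also assumes that a request larger than chunksize is rejected, which is an assumption of the model, not a documented statement about sion_ensure_free_space.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one task's chunk bookkeeping (hypothetical, not SIONlib's
 * internal data structure). */
typedef struct {
    long chunksize;   /* capacity of each chunk of this task, in bytes */
    long used;        /* bytes already written into the current chunk  */
    int  chunk_index; /* 0-based index of the chunk currently in use   */
} toy_task_t;

/* Models the idea behind sion_ensure_free_space: if nbytes do not fit into
 * the remainder of the current chunk, switch to a fresh, empty chunk.
 * Returns 1 on success and 0 if nbytes could never fit into one chunk
 * (assumption of this model). Purely local, no communication involved. */
int toy_ensure_free_space(toy_task_t *t, long nbytes)
{
    assert(t != NULL && nbytes >= 0);
    if (nbytes > t->chunksize)
        return 0;
    if (t->used + nbytes > t->chunksize) {
        t->chunk_index++;   /* a new chunk is assigned to the task        */
        t->used = 0;        /* the write position moves to its beginning  */
    }
    return 1;
}

/* Stands in for fwrite(data, 1, nbytes, fileptr): only the counter moves. */
void toy_write(toy_task_t *t, long nbytes)
{
    assert(t != NULL && t->used + nbytes <= t->chunksize);
    t->used += nbytes;
}
```

With chunksize 100, two writes of 60 bytes each end up in two different chunks: the second call to toy_ensure_free_space finds only 40 bytes left and moves to chunk 1 before the write happens.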


Parallel read:
sid=sion_paropen_mpi( ... ,&chunksize, comm, &fileptr, ...) # collective

while((!sion_feof(sid))) {                                  # non-collective
      btoread=sion_bytes_avail_in_block(sid);               # non-collective
      bread=fread(localbuffer,1,btoread,fileptr);     
}             

sion_parclose_mpi(sid);                                     # collective

- it must be ensured that the number of bytes read from a chunk is not larger than the number of bytes written to it. The function sion_bytes_avail_in_block provides this number for the current chunk.

- if all bytes of a chunk have already been read and there are more chunks available for this task, sion_feof advances the file pointer to the start position of the next chunk in the sion file

- sion_paropen_mpi is collective. The metadata is read by only one task and broadcast to all other tasks. It is also possible to open/close the sion file on each task with serial sion functions without collective operations, see function sion_open_rank.
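
The interplay of the end-of-file check and the bytes-available query in the read loop can be sketched with a toy model in plain C. It does not use SIONlib. The names toy_reader_t, toy_feof, toy_bytes_avail_in_block and toy_read_all are hypothetical, and the per-chunk byte counts are supplied by the caller instead of coming from file metadata.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the read position of one task (hypothetical, not SIONlib). */
typedef struct {
    const long *bytes_written; /* bytes stored in each chunk of this task */
    int         nchunks;       /* number of chunks of this task           */
    int         cur;           /* index of the current chunk              */
    long        pos;           /* bytes already read from current chunk   */
} toy_reader_t;

/* Models sion_bytes_avail_in_block: unread bytes left in the current chunk. */
long toy_bytes_avail_in_block(const toy_reader_t *r)
{
    assert(r != NULL);
    if (r->cur >= r->nchunks)
        return 0;
    return r->bytes_written[r->cur] - r->pos;
}

/* Models sion_feof: when the current chunk is exhausted, step to the next
 * chunk that still holds data; report end-of-file only when none is left. */
int toy_feof(toy_reader_t *r)
{
    assert(r != NULL);
    while (r->cur < r->nchunks && r->pos >= r->bytes_written[r->cur]) {
        r->cur++;
        r->pos = 0;
    }
    return r->cur >= r->nchunks;
}

/* Same shape as the read template; returns the total number of bytes read. */
long toy_read_all(toy_reader_t *r)
{
    long total = 0;
    while (!toy_feof(r)) {
        long btoread = toy_bytes_avail_in_block(r);
        r->pos += btoread;  /* stands in for fread(buf, 1, btoread, fileptr) */
        total  += btoread;
    }
    return total;
}
```

Because each pass reads at most the number of bytes reported for the current chunk, the loop never reads past the data that was written to a chunk, which is the condition stated in the first note above.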


Parallel read without collective operation:
sid=sion_open_rank( ... ,&chunksize, rank, &fileptr, ...)   # non-collective

while((!sion_feof(sid))) {                                  # non-collective
      btoread=sion_bytes_avail_in_block(sid);               # non-collective
      bread=fread(localbuffer,1,btoread,fileptr);     
}             

sion_close(sid);                                            # non-collective

- the meta-information of the sion file is read by each task. This results in parallel access to the same filesystem blocks; however, these are read-only accesses, which should not lock the filesystem blocks.


Sample parallel MPI program for I/O with SIONlib

This simple parallel program can be found in the distribution package under sionlib/examples/simple
This directory also contains Makefiles to build the program.
...
#include "sion.h"
#define FNAMELEN 255
#define BUFSIZE (1024*1024)

int main(int argc, char **argv)
{

  int            rank, size, globalrank, sid, i, numFiles;
  char          fname[FNAMELEN], *newfname=NULL;
  MPI_Comm      gComm, lComm;
  sion_int64    chunksize,left;
  sion_int32    fsblksize;
  size_t        btoread, bread, bwrote;
  char          *localbuffer;
  FILE          *fileptr;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* allocate and initialize a buffer  */
  localbuffer = (char *) malloc(BUFSIZE);
  srand(time(NULL));
  /* take the modulo before the cast; a cast binds tighter than % */
  for (i = 0; i < BUFSIZE; i++) localbuffer[i] = (char) (rand() % 256);

  /* initial parameters */
  strcpy(fname, "parfile.sion");
  numFiles   = 1;  
  gComm=lComm= MPI_COMM_WORLD;  
  chunksize  = 10*1024*1024;  
  fsblksize  = 1*1024*1024; 
  globalrank = rank;

  /* write */
  sid = sion_paropen_mpi(fname, "bw", &numFiles, gComm, &lComm, 
                         &chunksize, &fsblksize, &globalrank, &fileptr, &newfname);
  left=BUFSIZE;
  while (left > 0) {
    sion_ensure_free_space(sid, left);
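    /* continue behind the bytes already written if fwrite was partial */
    bwrote = fwrite(localbuffer + (BUFSIZE - left), 1, left, fileptr);
    if (bwrote == 0) break;   /* write error: avoid an endless loop */
    left -= bwrote;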
  }
  sion_parclose_mpi(sid);  

  printf("Task %02d: wrote sionfile -> %s\n",rank,newfname);


  /* read */
  sid=sion_paropen_mpi(fname,"br",&numFiles,MPI_COMM_WORLD,&lComm,
                       &chunksize,&fsblksize, &globalrank, &fileptr, &newfname);
  while((!sion_feof(sid))) { 
    btoread=sion_bytes_avail_in_block(sid);         
    bread=fread(localbuffer,1,btoread,fileptr);     
  }             
  sion_parclose_mpi(sid);  

  printf("Task %02d: read sionfile -> %s\n",rank,newfname);

  MPI_Finalize();

  return (0);
}

Serial program for SIONlib file access

Templates for accessing sion files from a serial program are shown below. The templates can also be found in the distribution package under sionlib/examples


Serial write:
sid=sion_open( ...,chunksize, &fileptr)

rank_loop: {
    sion_seek(sid,rank,SION_CURRENT_BLK,SION_CURRENT_POS);

    sion_ensure_free_space(sid,nbytes);
    fwrite(...,fileptr)
}
sion_close(sid);

Serial read:
sid=sion_open( ...,chunksize, &fileptr)                   
        
sion_get_locations(sid,&size,&blocks,&globalskip,&start_of_varheader,
                       &sion_localsizes,&sion_globalranks,
                       &sion_chunkcount,&sion_chunksizes);

loop: { 
   sion_seek(sid,rank,blknr,pos);
         
   fread(...,fileptr) 
}
sion_close(sid);

- access to all ranks and chunks is possible (sion_seek)
- sion_get_locations returns pointers to internal fields, containing the number of chunks written by each task (sion_chunkcount) and their sizes (sion_chunksizes)
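
As an illustration of how the per-task chunk counts and chunk sizes could be used, the following plain C sketch sums the bytes stored for one rank. It does not call SIONlib. The function name toy_bytes_of_rank is hypothetical, and the index formula ntasks*blknr + rank for the size array is an assumption about the array layout that should be checked against the SIONlib version in use.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the sizes of all chunks that belong to one rank.
 *
 * ntasks      number of tasks stored in the file
 * rank        task whose chunks are summed
 * chunkcount  chunkcount[r] = number of chunks written by task r
 * chunksizes  size of chunk blknr of task r, ASSUMED to be stored at
 *             index ntasks*blknr + r (layout assumption of this sketch)
 */
long long toy_bytes_of_rank(int ntasks, int rank,
                            const long long *chunkcount,
                            const long long *chunksizes)
{
    long long total = 0;
    long long blknr;

    assert(chunkcount != NULL && chunksizes != NULL);
    assert(rank >= 0 && rank < ntasks);

    for (blknr = 0; blknr < chunkcount[rank]; blknr++)
        total += chunksizes[(long long) ntasks * blknr + rank];
    return total;
}
```

In a serial reader, the same double loop over rank and blknr would typically enclose the sion_seek and fread calls of the template, with the size taken from the array telling how many bytes to read from each chunk.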


Other examples

The examples directory also contains, in the sub-directory ./examples/pepc, an application-dependent converter program for the parallel simulation application PEPC (Multi-Purpose Parallel Tree-Code, http://www.fz-juelich.de/jsc/pepc/). This serial program converts data files between the native PEPC ASCII format and a binary format based on sion files, and it can serve as a starting point for building converter programs for your own applications.
