
GCOM

GCOM is the communications layer between the UM Fortran code and the underlying transport layer (MPICH, Scali MPI, or whatever else is in use).

Important note re 2.9b5

Paul Burton says (actually said, sometime back in March or April 2003): "Actually, I'd hold off trying 2.9b5 for a bit. I've identified some problems with this fix, as not all MPI implementation (SCALI MPI being one) cope very well with 64bit integers.

I've got a fix for this, which I'm currently implementing to make a new GCOM version. Hopefully I'll have it ready for testing later today or early next week."

Err, I haven't tried the new version yet. I think Mark Webb has, and I think it worked.

I know of four GCOM versions:

"original", that came with the UM vn4.5
2.8. An update that is supposed to cure some hangs by using buffered communication. I don't use it, though others do. See http://home.badc.rl.ac.uk/iwi/um/mpi-buffering.html.
2.9b5. Version updated (by Paul Burton, UKMO, Unified Model System Development Manager) to fix problems with the GC_I* routines at 64 bit.
2.9b6? Improvement on b5. Must try it sometime.

The code for version 2.9b5 is available as gcom2.9b5.tar.gz. My pre-built 64-bit library, compiled with the Fujitsu compiler, is available as lib/libgcom_mpi_buffered_pg_64.a. A tar file of my build directory (including makefiles etc.) is build.tar.gz. None of these come with any guarantees. For purity, you may prefer to obtain yours via "the proper channels".

For bit comparison tests, see yabga and bit-cf.

Problems, fixed by 2.9b5

Summary

If you are running at 64 bit on a Beowulf cluster, you *probably* need either gcom2.9b5 or my qtpos1a.upd fix. At 32 bit you probably need neither.

Details

The standard version of the UM, at least under the Fujitsu (/Lahey?) compiler, does not bit-compare between 1x1 and 1x2 processor configurations. This turns out to be caused by GC_ISUM in QTPOS1A, which simply doesn't work. There is a work-around, using GC_RSUM instead (which is OK because the integers are small): see qtpos1a.upd. Paul Burton has since created an improved version of GCOM, which is probably a rather better fix to this problem. It's better because it's global, and also because it fixes the other GC_I* routines, which Mark Webb found caused trouble in tracer advection and/or the sulphur cycle.
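To see why going through a real-valued sum is safe when the integers are small, here is a minimal Python sketch. It is not UM or GCOM code. It assumes the reals are 64-bit IEEE doubles (which is what Python's float is), and real_sum_of_ints is a made-up name used only for illustration.

```python
# Illustration only: mimics summing integer values via a real-valued sum.
# Assumes 64-bit IEEE reals; real_sum_of_ints is a hypothetical helper.

def real_sum_of_ints(values):
    """Convert integers to reals, sum them, and convert the total back."""
    total = 0.0
    for v in values:
        total += float(v)
    return int(total)

# A 64-bit real has a 53-bit significand, so every integer with magnitude
# up to 2**53 is represented exactly.
EXACT_LIMIT = 2 ** 53

small_counts = [3, 0, 7, 1]  # made-up per-processor values
print(real_sum_of_ints(small_counts) == sum(small_counts))

# Beyond the limit, neighbouring integers collapse onto the same real.
print(float(EXACT_LIMIT) + 1.0 == float(EXACT_LIMIT))
```

Both lines print True: small integers survive the round trip through reals exactly, and exactness is only lost once values approach 2**53.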

As far as I know, this problem only exists at 64 bit. Paul Burton says: "The basic problem was that these GCOM routines were calling the MPI routines, indicating that the type of the data was MPI_INTEGER, which on most machines is taken to be a 32bit entity (Cray's with native 64bit INTEGERs expect this to be a 64bit INTEGER, so we'd never seen this problem here).

I've now put the MPI calls under the "I_64B" cpp switch, with MPI_INTEGER8 type being used when 64bit INTEGERs have been selected. As far as I can see, MPI_INTEGER8 isn't part of the MPI standard. However, it is defined on the T3E, SX6 and the version of MPI that Mark is using on his Beowulf cluster, so I'm fairly confident this should work on most (all?) platforms which physically support a 64bit INTEGER type."
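The type mismatch described in the quote can be imitated with a toy model in Python. This is a sketch under stated assumptions, not GCOM or MPI code: it assumes a little-endian machine, and it assumes that whatever part of the result buffer the transport does not write simply keeps the first processor's local values. The function name reduce_sum_mismatched is hypothetical.

```python
import struct

def reduce_sum_mismatched(rank_values):
    """Sum across ranks, treating each 64-bit buffer as `count` 32-bit words."""
    count = len(rank_values[0])
    # Each rank holds its values as little-endian 64-bit integers.
    buffers = [struct.pack("<%dq" % count, *vals) for vals in rank_values]
    # The transport is told "count elements of a 32-bit type", so it only
    # looks at the first 4*count bytes of every buffer.
    words = [struct.unpack("<%di" % count, buf[:4 * count]) for buf in buffers]
    summed = [sum(column) for column in zip(*words)]
    # The result overwrites the first 4*count bytes of rank 0's buffer; the
    # remainder keeps rank 0's local values (an assumption of this model).
    out = struct.pack("<%di" % count, *summed) + buffers[0][4 * count:]
    return list(struct.unpack("<%dq" % count, out))

print(reduce_sum_mismatched([[1, 2, 3, 4], [10, 20, 30, 40]]))
print(reduce_sum_mismatched([[5], [7]]))
```

In this model the first call prints [11, 22, 3, 4] rather than [11, 22, 33, 44]: only the first half of the array is summed. The second call prints [12], which is correct, because for a single small value the low 32-bit word is the only part that matters. Whether a real MPI library behaves this way depends on the implementation and the byte order.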

Of course, when I did my bit-cf tests, I ran them at 64 bit first and then carried the modsets over to 32 bit as needed, so I had not previously tried 32 bit without my fix. I have now done so, and can confirm that 32 bit *does* bit-compare, with earlier GCOM versions and without my qtpos1a fix.

BTW, note that the GC_I* routines are little used, so the problems only occur in a few places. Well, actually they are a bit more used than I suspected:

master:~/PUM_Output/vn4.5/dataw.yabga/compile.yabga$ grep -i "CALL GC_I" *.f
acumps1.f:        CALL GC_IBCAST(678,1,0,nproc,info,icode)                        
acumps1.f:        CALL GC_IBCAST(679,1,0,nproc,info,icode)                        
atmdyn1.f:      CALL GC_IMAX(1,N_PROCS,info,int_log)                              
atmdyn1.f:      CALL GC_IMAX(1,N_PROCS,info,int_log)                              
atmdyn1.f:         CALL GC_IMAX(1,N_PROCS,info,ICODE)                             
dosums1.f:      CALL GC_IMIN(1,nproc,info,GLOBAL_START)                           
dosums1.f:      CALL GC_IMAX(1,nproc,info,GLOBAL_END)                             
exitchk1.f:      CALL GC_IBCAST(1,1,0,nproc,info,end_run)                          
filter1a.f:      CALL GC_IMAX(1,nproc,info,max_field_length)                       
genintf1.f:            call gc_ibcast(450,1,0,nproc,info,ierr)                     
genintf1.f:            call gc_ibcast(450,1,0,nproc,info,ierr)                     
initdum1.f:      CALL GC_IBCAST(666,1,0,nproc,info,ICODE)                          
meandia2.f:              CALL GC_ISUM(1, nproc, info, icode)                       
meandia2.f:              CALL GC_IBCAST(999,1,0,nproc,info,ICODE)                  
meanps1.f:        CALL GC_IBCAST(679,1,0,nproc,info,icode)                        
popen1a.f:      CALL GC_IBCAST(1,1,0,nproc,info,ERR)                              
pthadj1a.f:      CALL GC_IMAX(1,N_PROCS,info,ERROR_CODE)                           
qtpos1a.f:      CALL GC_IMAX(1,n_procs,info,ERROR_CODE)                           
qtpos1a.f:      CALL GC_ISUM(Q_LEVELS,N_PROCS,info,int_FAILURE)                  
rad_ctl1.f:            CALL GC_IMAX(1,NPROC,INFO,GLOBAL_CLOUD_TOP)                 
rdlsm1a.f:      CALL GC_IBCAST(100,glsize(1)*glsize(2),0,nproc,info,              
rdmult1a.f:      CALL GC_IBCAST(333,2,0,nproc,info,io_ret_codes)                   
rdmult1a.f:        CALL GC_IMAX(1,nproc,info,pstar_const)                          
setfil1a.f:      CALL GC_IMAX(1,nproc,info,NORTHERN_FILTERED_P_ROW)                
setfil1a.f:      CALL GC_IMIN(1,nproc,info,SOUTHERN_FILTERED_P_ROW)                
setfil1a.f:        CALL GC_IBCAST(I,g_blsizep(2,I),I,nproc,info,   
stwork1a.f:          CALL GC_IBCAST(101,1,0,nproc,info,ICODE)                      
stwork1a.f:          CALL GC_IBCAST(101,1,0,nproc,info,ICODE)                      
timer3a.f:        CALL GC_IBCAST(3213,1,0,nproc,info,summ_n_timers)               
u_model1.f:        call gc_ibcast (458,1,0,nproc,info,iostatus)                    
u_model1.f:              call gc_ibcast (458,1,0,nproc,info,iostatus)

Quite how the other calls work out OK I don't know. But most of those are GC_IBCAST calls.
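A listing like the one above can be tallied by routine with a few lines of Python. This is only a sketch: count_gc_calls is a made-up name, and the sample lines are a handful copied from the grep output rather than the whole listing.

```python
import re
from collections import Counter

def count_gc_calls(grep_lines):
    """Tally GC_I* routine names in 'file.f:  CALL GC_IXXX(...)' lines."""
    pattern = re.compile(r"call\s+(gc_i\w+)", re.IGNORECASE)
    counts = Counter()
    for line in grep_lines:
        match = pattern.search(line)
        if match:
            # Fortran is case-insensitive, so fold names to upper case.
            counts[match.group(1).upper()] += 1
    return counts

sample = [
    "exitchk1.f:      CALL GC_IBCAST(1,1,0,nproc,info,end_run)",
    "genintf1.f:            call gc_ibcast(450,1,0,nproc,info,ierr)",
    "qtpos1a.f:      CALL GC_IMAX(1,n_procs,info,ERROR_CODE)",
    "qtpos1a.f:      CALL GC_ISUM(Q_LEVELS,N_PROCS,info,int_FAILURE)",
]
for name, n in sorted(count_gc_calls(sample).items()):
    print(name, n)
```

Counting the 31 lines of the listing above this way gives 17 GC_IBCAST, 10 GC_IMAX, 2 GC_IMIN and 2 GC_ISUM.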

Page last modified: 13/6/2003 / wmc@bas.ac.uk