/proc/cpuinfo (bslpsdl: dual opteron) | /proc/cpuinfo (bslcene: newer quad opteron)
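(For the record, the headline numbers can be pulled from any node like this; standard /proc interface, GNU grep assumed:)

grep -m1 'model name' /proc/cpuinfo    # CPU model
grep -c '^processor' /proc/cpuinfo     # number of CPUs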
They were set up by JPR in May 2004; I couldn't immediately make the model work and left it; now (Aug) I am returning to it.
BTW, g95 appears to work, but gfortran won't compile it: compiler errors; it doesn't know about "loc".
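(A minimal reproducer, for reference; file name is hypothetical, and loc() is a widespread but non-standard extension that returns a variable's address:)

cat > tloc.f90 <<'EOF'
program tloc
  integer :: a
  print *, loc(a)   ! non-standard extension: address of a
end program tloc
EOF
g95 tloc.f90 && ./a.out    # g95 accepts the extension
gfortran tloc.f90          # the gfortran here rejects loc()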
Timings: 2 proc cene g95:
-rwxrwxrwx 1 wmc icd 930192 Jul 30 08:10 /data/beowulf4/wmc/archive/yadta/yadtaa.psk7mam.pp
-rwxrwxrwx 1 wmc icd 930192 Jul 30 11:53 /data/beowulf4/wmc/archive/yadta/yadtaa.psk7jja.pp
-rwxrwxrwx 1 wmc icd 930192 Jul 30 15:32 /data/beowulf4/wmc/archive/yadta/yadtaa.psk7son.pp
-rwxrwxrwx 1 wmc icd 930192 Jul 30 19:11 /data/beowulf4/wmc/archive/yadta/yadtaa.psk8djf.pp
That's about 3:40 per season ≈ 14:40 per year, which is slower than 4 proc. The CPU stats show something interesting: a lot of CPU time spent in sys - maybe 20% per proc on 4 proc.
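(The per-season figure comes straight from the archive files' mtimes; a quick sketch, GNU ls assumed:)

# Sort oldest-first, print epoch-second mtimes, difference to minutes per season.
ls -ltr --time-style=+%s /data/beowulf4/wmc/archive/yadta/yadtaa.psk*.pp |
  awk '{t[NR]=$6} END {for (i=2; i<=NR; i++) print (t[i]-t[i-1])/60, "min"}'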
I'm currently trying 3 proc. It seems (from the first month) to be going at 1 month in a bit less than an hour, which is about the same speed as 4 proc. Update: running at about 11:24 (h:mm) per year. About the same as 4 proc. Good.
Black is the g95 opteron run. Blue is the old athlon/fujitsu run, which I trust. Red is the opteron/pgi run, which (as you see) drifts upwards.
It's too soon to say that the run is absolutely stable, but it's looking good so far.
This pic shows annual mean T, broken down by latitude zone:
Building MPICH for Myrinet (the ch_gm device):

export F90=pgf90
export F90FLAGS="-r8 -i8"
export GM_HOME=/local/gm/2.0.11
export CFLAGS="-I$GM_HOME/binary/include -I$GM_HOME/include"
./configure -fc=pgf90 -fflags="-r8 -i8" --with-device=ch_gm \
    -lib="-L$GM_HOME/binary/lib/ -L$GM_HOME/lib/ -lgm"
And without GM (the default ch_p4 device):

export F90=pgf90
export F90FLAGS="-r8 -i8"
./configure -fc=pgf90 -fflags="-r8 -i8"

which gets us:
...
checking for size of Fortran type integer... 8
checking for size of Fortran type real... 8
checking for size of Fortran type double precision... 8
...
checking for Fortran 77 name mangling... lower underscore
...
This then *works*.
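(To check what the resulting compile wrappers will actually do, the MPICH-1 scripts take -show, which prints the underlying command without running it; install path assumed:)

/local/data/gcm/mpich/bin/mpif90 -show    # should invoke pgf90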
If the model dies in TS 1 with:
0 - MPI_BSEND : Invalid rank 1
[0] Aborting program !
[0] Aborting program!
p0_24036: p4_error: : 8262
Then you have built/used the wrong MPICH. The system default mpich appears to die like the above. You need one built with/for the same compiler as the model and gcom.
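(A minimal pre-flight check, assuming the pgf90-built MPICH lives at /local/data/gcm/mpich as in the run below: put it first on PATH so its mpirun, and the matching libmpich, win over the system copy.)

export PATH=/local/data/gcm/mpich/bin:$PATH
which mpirun    # should report /local/data/gcm/mpich/bin/mpirun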
$ /local/data/gcm/mpich/bin/mpirun -np 4 -machinefile n1 ./test1.exe
=====================================================
GCOM Version 2.9b8 Buffered MPI
Using precision : 64bit INTEGERs and 64bit REALs
Built at Mon Apr 19 17:33:11 BST 2004
=====================================================
10000 4 processors started.
10001 *** GSYNC test
10002 Processor 0 is alive.
10006 *** SYNC test PASSED
10007 *** ISEND test
10008 *** ISEND test 1 PASSED
10009 *** ISEND test 2 PASSED
10010 *** ISEND test 3 PASSED
10011 *** ISEND test 4 PASSED
10012 *** IBCAST test
10013 *** IBCAST test 0 PASSED
10014 *** IBCAST test 1 PASSED
10015 *** IBCAST test 2 PASSED
10016 *** IBCAST test 3 PASSED
10017 *** IMINMAX test
10018 *** IMINMAX test PASSED
10019 *** ISUM test
10020 *** ISUM test PASSED
10021 *** GCGIBCAST test
10022 *** GCG_IBCAST test 1 PASSED
10023 *** GCG_IBCAST test 2 PASSED
10024 *** ALL_TO_ALL test
10025 *** ALL_TO_ALL test 1 PASSED
10029 *** ABORT test
gc_abort (Processor 0 ): Aborted!
[0] MPI Abort by user Aborting program !
[0] Aborting program!
p0_4487: p4_error: : 9
10003 Processor 1 is alive.
10026 *** ALL_TO_ALL test 2 PASSED
p1_4509: p4_error: net_recv read: probable EOF on socket: 1
Broken pipe
10005 Processor 3 is alive.
10028 *** ALL_TO_ALL test 4 PASSED
p3_4553: p4_error: net_recv read: probable EOF on socket: 1

Note: I had trouble linking it... it's hard to persuade it to use *my* mpich rather than its version. Use -# (-\#) to show you what it's doing. And use makefile.link (not make, using Makefile) to link it.
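(If linking by hand, one way to keep the system libmpich out of it is to put your own MPICH's lib directory first on the link line; a sketch with assumed paths and object list:)

# Your MPICH's -L must come before any system library paths.
pgf90 -o test1.exe *.o -L/local/data/gcm/mpich/lib -lmpich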
Page last modified: 1/8/2005 / wmc@bas.ac.uk
© Copyright Natural Environment Research Council - British Antarctic Survey 2002