OpenSees using only one node on cluster
Posted: Fri May 06, 2011 6:51 am
by ozgura
Hello,
I am running parametric analyses with OpenSeesMP on a cluster in which each node has 8 processors. In the qsub file I request 5 nodes with 8 processors each, 40 in total (#PBS -lnodes=5:ppn=8). I just realized, though, that OpenSees is using only one node. OpenSees reads the total process count correctly, since [getNP] returns 40, but it runs all 40 processes on a single node at about 20% CPU each (20% * 40 = 8 CPUs). So one node carries a very high load while the other 4 nodes sit idle.
Any ideas?
Thank you
Ozgur
Re: OpenSees using only one node on cluster
Posted: Fri May 06, 2011 10:21 am
by fmk
Post the parallel part of the script, i.e. how you divide the workload.
Re: OpenSees using only one node on cluster
Posted: Fri May 06, 2011 10:36 pm
by ozgura
Hi Frank,
We are distributing the tasks as follows:
####################################################
set pid [getPID]
set npp [getNP]
…
# Define variables common to all analyses
…
set count 0 ; # Analysis counting variable
foreach <ground motion and associated scaling factor> {
    if {[expr {$count % $npp}] == $pid} {
        puts "Now running analysis $count on Process $pid"
        …
        # Define analysis, model, recorders, etc. and perform unique analysis associated with the count number
        …
    } ; # end if
} ; # end foreach
#####################################################
As I mentioned in the previous message, we checked output.log and the number of distinct pids matches npp. In other words, the output printed to stdout indicates that OpenSeesMP is using all of the processes we request, even though our machines show that only a fraction of the processors are in use.
Please let me know if this information isn't enough. I can post the real script or send the file to your email.
Thank you.
Re: OpenSees using only one node on cluster
Posted: Mon May 16, 2011 9:35 am
by fmk
The problem could be an MPI one. What is the operating system?
Re: OpenSees using only one node on cluster
Posted: Wed May 18, 2011 8:31 am
by ozgura
Frank,
I talked to our supercomputer staff, and they told me that MPI has been tested many times and is currently in use by many people. We are running OpenSUSE 11.1 and OpenSees was compiled with OpenMPI-1.3.3. As an example, the following hello world script prints more than one hostname.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* gethostname */

int main(int argc, char **argv)
{
    int rank;
    char hostname[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Report which host this rank is running on */
    gethostname(hostname, 255);
    printf("Hello world! I am process number: %d on host %s\n", rank,
           hostname);

    MPI_Finalize();
    return EXIT_SUCCESS;
}
Thanks,
Re: OpenSees using only one node on cluster
Posted: Wed May 18, 2011 10:06 am
by fmk
Are you writing to the same files, or are you doing a lot of puts to the screen?
Re: OpenSees using only one node on cluster
Posted: Thu May 19, 2011 10:41 am
by ozgura
Hi Frank,
It looks like the problem was due to OpenMPI version 1.3.3. The VT supercomputer staff say that they resolved the problem using OpenMPI versions 1.4.3 (GNU) and 1.4.2 (Intel). So we tried to build OpenSees again (with the latest source code, revision 4541) using a newer version of OpenMPI, but got an error message. I will post it in a new subject.
Thank you.