OpenSees using only one node on cluster

ozgura
Posts: 36
Joined: Mon Apr 19, 2010 9:46 am
Location: Virginia Tech

Post by ozgura »

Hello,

I am running parametric analyses using OpenSeesMP on a cluster where each node has 8 processors. In the qsub file I request 5 nodes with 8 processors each, for a total of 40 (#PBS -lnodes=5:ppn=8). I just realized that OpenSees is using only one node, though. OpenSees reads the total process count correctly, since [getNP] returns 40. However, it runs all 40 processes on a single node, each at about 20% CPU, so 20% * 40 = 8 CPUs, which is exactly one node. As a result, that one node carries a very high load while the other 4 nodes sit idle.

Any ideas?

Thank you
Ozgur
Ozgur Atlayan
Virginia Tech
fmk
Site Admin
Posts: 5884
Joined: Fri Jun 11, 2004 2:33 pm
Location: UC Berkeley

Post by fmk »

Post the parallel part of the script, i.e. how you divide the workload.

Post by ozgura »

Hi Frank,

We are distributing the tasks as follows:
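####################################################

set pid [getPID] ; # rank of this process, 0 .. npp-1
set npp [getNP]  ; # total number of processes

# Define variables common to all analyses

set count 0 ; # Analysis counting variable

# gmList is a placeholder name for a flat list of ground motion / scaling factor pairs
foreach {gm factor} $gmList {

    # Round-robin: analysis number count runs on process (count mod npp)
    if {$count % $npp == $pid} {

        puts "Now running analysis $count on Process $pid"

        # Define analysis, model, recorders, etc. and perform unique analysis associated with the count number

    } ; # end if

    incr count ; # advance on every process so the numbering stays in step
} ; # end foreach

#####################################################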

As I mentioned in the previous message, we checked output.log and saw as many distinct "pid" values as are specified in "npp". In other words, the output printed to stdout indicates that OpenSeesMP is using the total number of processes we request, even though our machines show that only a fraction of the allocated processors is in use.

Please let me know if this information isn't enough. I can post the real script or email the file to you.
Thank you.
Ozgur Atlayan
Virginia Tech

Post by fmk »

The problem could be an MPI one. What is the operating system?

Post by ozgura »

Frank,

I talked to our supercomputer staff, and they told me that MPI has been tested many times and is currently in use by many people. We are running OpenSUSE 11.1, and OpenSees was compiled with OpenMPI-1.3.3. As an example, the following hello world program prints more than one hostname.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* gethostname */

int main(int argc, char **argv)
{
    int rank;
    char hostname[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    gethostname(hostname, 255);
    hostname[255] = '\0';                   /* ensure termination if the name was truncated */
    printf("Hello world! I am process number: %d on host %s\n", rank, hostname);
    MPI_Finalize();

    return EXIT_SUCCESS;
}


Thanks,
Ozgur Atlayan
Virginia Tech

Post by fmk »

Are you writing to the same files, or doing a lot of puts to the screen?

Post by ozgura »

Hi Frank,

It looks like the problem was due to OpenMPI version 1.3.3. The VT supercomputer staff say they resolved it using OpenMPI versions 1.4.3 (GNU) and 1.4.2 (Intel). So we tried to build OpenSees again (with the latest source code, revision 4541) using the newer version of OpenMPI, but got an error message. I will post it under a new subject.
Thank you.
Ozgur Atlayan
Virginia Tech