Parallelization of the Nanoscale Device Simulator nanoMOS 2.0 Using a 100-Node Linux Cluster
IEEE Nanotechnology Conference Aug 26-28, 2002
Washington DC
Sebastien Goasguen, Ali R. Butt, Kevin D. Colby, Mark Lundstrom
Purdue University
Electrical Engineering Department
Outline
Problem overview
nanoMOS 2.0
The cluster and PUNCH
Parallelization
Parallelization under Matlab
MPITB and PVMTB
PVMTB basics
Parallelization of Ballistic NEGF in nanoMOS 2.0
Parallelization of a detailed scattering model
nanoMOS 2.0
[Figure: device schematic. Labels: Gate (twice), silicon dioxide (twice), SiO2, source, drain, L = 10 nm, energy axis.]
nanoMOS 2.0
• 2D simulation of nanoscale silicon SOI MOSFETs
-NEGF
-BTE (ballistic)
-drift-diffusion
-energy transport
-density gradient
-effective potential
• Written in Matlab
• Parallelized for a Linux cluster
• > 180 downloads via PUNCH
www.nanohub.purdue.edu
[Figure: layered diagram. Labels: Software applications, PUNCH, workstations, middleware, web enabling.]
-network operating system
-logical user accounts
-virtual file system
-resource management system
The cluster
•200 processors
•130 GFLOPS
•2 x 1.2 GHz Athlon CPUs, 1 GB RAM per node
Integrating a cluster with PUNCH
The user does not know that the job is being run on a cluster.
Install a compute Backend on the cluster
Jobs are scheduled to the cluster Backend through the ACTyP.
The backend submits the jobs to the PBS (Portable Batch System) of the cluster.
PUNCH can also interface with Condor (job migration).
Backends could be installed on remote clusters and PUNCH jobs could be submitted through the PBS queue.
PUNCH runs jobs on the cluster!
•PBS monitors jobs on the cluster.
•Punch000 runs a job through PBS!
Parallelization
Code parallelization is done through message passing between processors.
Message passing is implemented with MPI (Message Passing Interface) or PVM (Parallel Virtual Machine).
Break the problem into smaller subproblems. MPI or PVM links the subproblems together to produce the full solution.
PVM home page (links and tutorial)
http://www.csm.ornl.gov/pvm
PVM home page at netlib (source code)
http://www.netlib.org/pvm3/
MPI home page
http://www.mpi-forum.org
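A minimal sketch of the decomposition idea, written in Python rather than MPI or PVM: a large sum is broken into smaller sums, each worker solves one piece, and the partial results are combined into the full solution. Threads and a `queue.Queue` stand in for processors and messages here; the names `split_range`, `worker`, and `parallel_sum_of_squares` are hypothetical and used only for illustration.

```python
import queue
import threading


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def worker(chunk, outbox):
    """Solve one small problem and send the partial result as a message."""
    start, stop = chunk
    outbox.put(sum(k * k for k in range(start, stop)))


def parallel_sum_of_squares(n, parts):
    """Start one worker per chunk, then combine the partial results."""
    outbox = queue.Queue()
    threads = [threading.Thread(target=worker, args=(c, outbox))
               for c in split_range(n, parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(outbox.get() for _ in threads)
```

Calling `parallel_sum_of_squares(1000, 4)` gives the same value as the serial sum; only the way the work is divided changes.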
Parallel programming under Matlab
Matlab is not parallel! But tools are available for parallel computation on shared-memory or distributed-memory machines (see the Matlab Central File Exchange).
Javier Fernandez Baldomero from the University of Granada, Spain, has written PVMTB and MPITB.
They are C/C++ MEX-function bindings of the MPI and PVM routines.
For PC clusters they are the best solution: easy to use, and a very good tool for learning parallelization in the Matlab environment.
Production codes will be written with MPI or PVM!
Sending and receiving data with PVM
Buffer initialization needs to be done before packing anything
pvm_initsend(PLACE)
Packing can be done in various ways, but Matlab matrices are packed with
pvm_pack(A)
Sending can be directed at one particular Matlab child or at all of them.
pvm_send(tid,TAG) or pvm_mcast(tids,TAG)
Sending and receiving data
Parent:
    A=eye(4);
    pvm_initsend(PLACE)      % initialize the send buffer
    pvm_pack(A)              % pack the matrix
    pvm_mcast(tids,TAG)      % send to all children
    for numt=1:(nhost-1)     % collect one reply per child
        pvm_recv(tids(numt),TAG)
        pvm_unpack('A_new')
        A_new
    end
Children:
    parent=pvm_parent;
    tid=pvm_mytid;
    pvm_recv(parent,TAG)
    pvm_unpack('A')
    A=A+tid;                 % dummy operation
    pvm_initsend(PLACE)
    pvm_pack(A)
    pvm_send(parent,TAG)
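The parent/children exchange can be mimicked without PVM to see the message flow. The sketch below is an analogy in Python, not the PVMTB API: threads play the role of the Matlab children, one `queue.Queue` per direction plays the role of the PVM buffers, and `eye`, `child`, and `run_parent` are hypothetical helper names. The task ids passed in are arbitrary integers chosen for illustration.

```python
import queue
import threading


def eye(n):
    """Identity matrix as a list of lists (stand-in for Matlab's eye)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def child(tid, inbox, outbox):
    """Receive A from the parent, add the task id, send the result back."""
    a = inbox.get()                                    # role of pvm_recv + pvm_unpack
    outbox.put([[x + tid for x in row] for row in a])  # role of pvm_pack + pvm_send


def run_parent(tids):
    """Send A to every child, then collect one reply per child, in tid order."""
    a = eye(4)
    inboxes = {tid: queue.Queue() for tid in tids}
    outboxes = {tid: queue.Queue() for tid in tids}
    threads = [threading.Thread(target=child,
                                args=(tid, inboxes[tid], outboxes[tid]))
               for tid in tids]
    for t in threads:
        t.start()
    for tid in tids:                                   # role of pvm_mcast
        inboxes[tid].put(a)
    replies = [outboxes[tid].get() for tid in tids]    # role of the receive loop
    for t in threads:
        t.join()
    return replies
```

As in the slide, each child performs a dummy operation (adding its task id to every element), so the parent can tell the replies apart.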
NEGF Ballistic and with Scattering
DOS is obtained by integrating the spectral function:
\[ \iiint A(E_x, E_y, E_z, \mathbf{r}) \, dE_x \, dE_y \, dE_z \]
Mode space approach takes care of the confinement direction:
\[ \iint A(E_x, E_y, \mathbf{r}) \, dE_x \, dE_y \]
In the ballistic case, the contribution from the $E_y$'s is computed analytically:
\[ \int A_{\mathrm{Ballistic}}(E_x, \mathbf{r}) \, dE_x \]
With scattering: ?
Schematic view of Parallel nanoMOS
Master: pre-processing, Poisson, 1-D Schrödinger in the confinement direction
Slaves: compute their respective part of the DOS, send back the carrier density
Communication: PVM routines
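The master/slave split can be sketched as an energy-grid partition: the master assigns each slave a contiguous slice of energy points, each slave integrates its slice, and the master sums the partial results. The Python sketch below makes several assumptions: a toy Lorentzian stands in for the spectral function (it is not the nanoMOS NEGF kernel), a thread pool stands in for the PVM slaves, and `E0`, `GAMMA`, `spectral`, `partial_dos`, and `parallel_dos` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor

E0, GAMMA = 0.0, 0.05  # toy level position and broadening (eV), assumed values


def spectral(e):
    """Toy Lorentzian spectral function; NOT the nanoMOS NEGF kernel."""
    return (GAMMA / math.pi) / ((e - E0) ** 2 + GAMMA ** 2)


def partial_dos(task):
    """Slave: midpoint-rule integral over its own slice of energy points."""
    e_min, e_step, first, last = task
    total = sum(spectral(e_min + (k + 0.5) * e_step) for k in range(first, last))
    return total * e_step


def parallel_dos(e_min, e_max, n_points, n_slaves):
    """Master: assign contiguous energy slices, then sum the partial results."""
    e_step = (e_max - e_min) / n_points
    bounds = [n_points * s // n_slaves for s in range(n_slaves + 1)]
    tasks = [(e_min, e_step, bounds[s], bounds[s + 1]) for s in range(n_slaves)]
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        return sum(pool.map(partial_dos, tasks))
```

Because the energy points are independent of one another, the slices need no communication until the final sum, which is why this integral is a natural target for parallelization.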
Computational Cost on a single processor
[Figure: CPU time measurements. Axis: Time (s), 0 to 600. Cases: Estep=0.125 meV; Estep=0.25 meV; Estep=0.5 meV; Estep=0.25 meV, all valleys.]
Parallelization results (Estep=0.25 meV)
Bottleneck = Computation/Communication
[Figure: Serial vs. Parallel. Axis: Time (s), 0 to 600. Cases: one valley, all valleys.]
PVM versus MPI
[Figure: PVM versus MPI. Dashed line: PVM; solid line: MPI. Blue: DOS is full matrix; red: DOS is sparse; green: no DOS.]
PVM versus MPI: Efficiency of non-self-consistent loops
[Figure: efficiency of non-self-consistent loops. Dashed line: PVM; solid line: MPI. Blue: DOS is full matrix; red: DOS is sparse; green: no DOS.]
NEGF Ballistic and with Scattering
In the ballistic case, the contribution from the $E_y$'s is computed analytically, so the DOS is obtained from a single integral of the spectral function:
\[ \int A(E_x, \mathbf{r}) \, dE_x \]
In the detailed scattering model, numerical integration in both longitudinal and transverse energies is required:
\[ \iint A(E_x, E_y, \mathbf{r}) \, dE_x \, dE_y \]
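A double energy integral costs the product of the two grid sizes, which is what makes the scattering case so much heavier than the ballistic one. A minimal Python sketch of one way to distribute it follows: each task fixes one longitudinal energy and integrates over the transverse energy. The integrand is a toy, non-separable line shape chosen only for illustration (it is not the scattering model from the talk), and `lorentz`, `row_integral`, and `integrate_2d` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def lorentz(e, gamma=0.05):
    """Toy line shape; a placeholder for the real spectral function."""
    return (gamma / math.pi) / (e * e + gamma * gamma)


def row_integral(task):
    """One task: fixed longitudinal energy, midpoint rule over transverse energy."""
    e_long, e_min, e_step, n = task
    total = sum(lorentz(e_long + e_min + (k + 0.5) * e_step) for k in range(n))
    return total * e_step


def integrate_2d(e_min, e_max, n, n_workers):
    """Distribute the n rows (n * n integrand evaluations) over the workers."""
    e_step = (e_max - e_min) / n
    rows = [(e_min + (i + 0.5) * e_step, e_min, e_step, n) for i in range(n)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(row_integral, rows)) * e_step
```

With `n` points per axis the integrand is evaluated `n * n` times, so doubling the energy resolution quadruples the work; the rows are independent and can be handed out to as many workers as are available.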
Results
A non-self-consistent simulation used to take ~3-4 days (76 hours) on a single-processor machine.
Now it takes 2-3 hours on 40 processors.
Self-consistent simulations, which would have taken ~40 days on a single processor, are therefore now possible in less than 36 hours.
CPUs    Run time       Speed-up
2       38.65 hours    1.96x
38      2.10 hours     36.2x
80      1.07 hours     71x
120     45 minutes     (not stated)
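The reported speed-ups can be checked from the run times, and the parallel efficiency (speed-up divided by CPU count) follows directly. The short Python sketch below assumes that all four timings refer to the same non-self-consistent simulation as the 76-hour single-processor run; `SERIAL_HOURS`, `RUNS`, `speedup`, and `efficiency` are names introduced here for illustration.

```python
SERIAL_HOURS = 76.0  # single-processor run time reported in the results

# (CPUs, run time in hours) as reported; 45 minutes = 0.75 h
RUNS = [(2, 38.65), (38, 2.10), (80, 1.07), (120, 0.75)]


def speedup(parallel_hours, serial_hours=SERIAL_HOURS):
    """Speed-up = serial run time / parallel run time."""
    return serial_hours / parallel_hours


def efficiency(cpus, parallel_hours, serial_hours=SERIAL_HOURS):
    """Parallel efficiency = speed-up / number of CPUs."""
    return speedup(parallel_hours, serial_hours) / cpus


for cpus, hours in RUNS:
    print(f"{cpus:4d} CPUs: speed-up {speedup(hours):7.2f}x, "
          f"efficiency {efficiency(cpus, hours):.1%}")
```

Under that assumption the computed speed-ups match the reported 1.96x, 36.2x, and 71x to within rounding, and the same formula gives roughly 101x for the 120-CPU run, for which no speed-up is stated. Efficiency falls from about 98% on 2 CPUs to about 84% on 120 CPUs.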
Conclusions
Gained expertise in parallelization techniques (PVM and MPI)
Set up the PVMTB and MPITB toolkits for parallelization under Matlab
Tutorial for PVMTB is available
Wrote a parallel version of nanoMOS 2.0
Benchmarked performance and compared MPI with PVM
Wrote a parallel detailed scattering model that gave us good physical insight into scattering phenomena