(1)

Parallelization of the Nanoscale Device Simulator nanoMOS 2.0 Using a 100 Nodes Linux Cluster

IEEE Nanotechnology Conference Aug 26-28, 2002

Washington DC

Sebastien Goasguen, Ali R. Butt, Kevin D. Colby, Mark Lundstrom

Purdue University

Electrical Engineering Department

(2)

Outline

• Problem overview

• nanoMOS 2.0

• The cluster and PUNCH

• Parallelization

• Parallelization under Matlab

• MPITB and PVMTB

• PVMTB basics

• Parallelization of Ballistic NEGF in nanoMOS 2.0

• Parallelization of a detailed scattering model

(3)

nanoMOS 2.0

[Figure: device schematic. Labels: Gate (twice), silicon dioxide (twice), SiO2, source, drain, L = 10 nm; energy axis.]

• 2D simulation of nanoscale silicon SOI MOSFETs
  - NEGF
  - BTE (ballistic)
  - drift-diffusion
  - energy transport
  - density gradient
  - effective potential

• Written in Matlab

• Parallelized for a Linux cluster

• More than 180 downloads via PUNCH

(4)

www.nanohub.purdue.edu

[Figure: PUNCH architecture diagram. Labels: software applications, PUNCH, workstations, middleware, web enabling.]

- network operating system
- logical user accounts
- virtual file system
- resource management system

(5)

The cluster

• 200 processors

• 130 GFLOPS

• 2 x 1.2 GHz Athlon CPUs and 1 GB RAM per node

(6)

Integrating a cluster with PUNCH

• The user does not know that the job is being run on a cluster.

• A compute backend is installed on the cluster.

• Jobs are scheduled to the cluster backend through the ACTyP.

• The backend submits the jobs to the PBS (Portable Batch System) of the cluster.

• PUNCH can also interface with Condor (job migration).

• Backends could be installed on remote clusters, and PUNCH jobs could be submitted through their PBS queues.

(7)

PUNCH runs jobs on the cluster!

• PBS monitors jobs on the cluster.

• Punch000 runs a job through PBS.

(8)

Parallelization

• Code parallelization is done through message passing between processors.

• Message passing is implemented with MPI (Message Passing Interface) or PVM (Parallel Virtual Machine).

• Break your problem into smaller problems. These are linked through MPI or PVM to obtain the full solution.

• PVM home page (links and tutorial)

http://www.csm.ornl.gov/pvm

• PVM home page at netlib (source code)

http://www.netlib.org/pvm3/

• MPI home page

http://www.mpi-forum.org
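The idea of breaking a problem into smaller problems that exchange messages can be sketched without PVM or MPI. The example below is only an illustration in Python: threads and `queue.Queue` objects stand in for processors and message channels, and the names `worker` and `parallel_sum_of_squares` are hypothetical. It is not how nanoMOS or PVM is implemented.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of work, solve it, and send the partial result back."""
    chunk = inbox.get()  # blocking receive from the master
    outbox.put((rank, sum(x * x for x in chunk)))  # send (rank, partial result)


def parallel_sum_of_squares(values, n_workers):
    """Split a sum of squares across workers that communicate only by messages."""
    inboxes = [queue.Queue() for _ in range(n_workers)]  # one channel per worker
    outbox = queue.Queue()  # shared channel back to the master
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], outbox))
        for rank in range(n_workers)
    ]
    for thread in threads:
        thread.start()

    # Break the problem into smaller problems: worker r gets every n-th value.
    for rank in range(n_workers):
        inboxes[rank].put(values[rank::n_workers])

    # Link the sub-problems back together: collect one message per worker.
    partials = dict(outbox.get() for _ in range(n_workers))
    for thread in threads:
        thread.join()
    return sum(partials.values())


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), 3))
```

With a real message-passing library the workers would be separate processes, possibly on different nodes, and the queues would be replaced by send and receive calls.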

(10)

Parallel programming under Matlab

• Matlab is not parallel! But tools are available for parallel computation on shared-memory or distributed-memory machines (see the Matlab Central File Exchange).

• Javier Fernandez Baldomero of the University of Granada, Spain, has written PVMTB and MPITB.

• Both are C/C++ MEX-function bindings of the MPI or PVM routines.

• The best solution for PC clusters: easy to use, and a very good tool for learning parallelization in the Matlab environment.

• Production codes will be written with MPI or PVM!

(11)

Sending and receiving data with PVM

• The buffer must be initialized before anything is packed:

pvm_initsend(PLACE)

• Packing can be done in various ways, but Matlab matrices are packed with:

pvm_pack(A)

• Sending can be directed at one particular Matlab child or at all of them:

pvm_send(tid,TAG) or pvm_mcast(tids,TAG)

(12)

Sending and receiving data

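Parent:

    A=eye(4);
    pvm_initsend(PLACE)          % initialize the send buffer
    pvm_pack(A)                  % pack the matrix
    pvm_mcast(tids,TAG)          % send it to all children
    for numt=1:(nhost-1)
        pvm_recv(tids(numt),TAG) % wait for the reply of child numt
        pvm_unpack('A_new')
        A_new                    % display the matrix returned by this child
    end

Children:

    parent=pvm_parent;
    tid=pvm_mytid;
    pvm_recv(parent,TAG)         % wait for the matrix from the parent
    pvm_unpack('A')
    A=A+tid; %dummy operation
    pvm_initsend(PLACE)
    pvm_pack(A)
    pvm_send(parent,TAG)         % send the modified matrix back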
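The same round trip can be tried without a PVM installation. The sketch below mimics the steps in Python under stated assumptions: `pickle.dumps` and `pickle.loads` stand in for packing and unpacking, one `queue.Queue` per direction stands in for the PVM transport, and the task ids 1, 2, 3, ... are hypothetical. It does not call PVMTB.

```python
import pickle
import queue
import threading


def identity(n):
    """Plain-Python equivalent of eye(n): an n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def child(tid, from_parent, to_parent):
    a = pickle.loads(from_parent.get())        # receive, then unpack
    a = [[x + tid for x in row] for row in a]  # dummy operation: A = A + tid
    to_parent.put(pickle.dumps(a))             # pack, then send back


def run_parent(n_children):
    tids = list(range(1, n_children + 1))      # hypothetical task ids
    down = {tid: queue.Queue() for tid in tids}  # parent -> child channels
    up = {tid: queue.Queue() for tid in tids}    # child -> parent channels
    threads = [
        threading.Thread(target=child, args=(tid, down[tid], up[tid]))
        for tid in tids
    ]
    for thread in threads:
        thread.start()

    message = pickle.dumps(identity(4))        # pack once
    for tid in tids:                           # "multicast": same message to all
        down[tid].put(message)

    # Receive from each child in turn, as the parent loop over tids does.
    results = {tid: pickle.loads(up[tid].get()) for tid in tids}
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for tid, matrix in run_parent(3).items():
        print(tid, matrix[0])
```

Each child returns the identity matrix shifted by its own task id, so the parent can tell the replies apart.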

(14)

NEGF Ballistic and with Scattering

$$\iiint A(E_x, E_y, E_z, r)\, dE_x\, dE_y\, dE_z$$

Mode space approach takes care of the confinement direction:

$$\iint A(E_x, E_y, r)\, dE_x\, dE_y$$

In the ballistic case, the contribution from the $E_y$'s is computed analytically:

$$\int A(E_x, r)\, dE_x \quad \text{(ballistic)}$$

DOS is obtained by integrating the spectral function.

(15)

Schematic view of Parallel nanoMOS

Master: pre-processing, Poisson, 1-D Schrödinger in the confinement direction

Slaves: compute their respective part of the DOS and send back the carrier density

Communication: PVM routines
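The master/slave split can be illustrated with a toy energy integration. In the sketch below, written in Python rather than Matlab, the master builds the energy grid and cuts it into contiguous blocks, each slave integrates its block, and the master sums the partial results. The Lorentzian `toy_spectral_function`, the energy window, and all function names are assumptions made for illustration; the real nanoMOS spectral function comes from the NEGF solution. Python threads do not give a real speed-up here, so the example shows only the decomposition.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def toy_spectral_function(energy, level=0.1, broadening=0.01):
    """Hypothetical Lorentzian stand-in for A(E_x, r) at one position (eV)."""
    return (broadening / math.pi) / ((energy - level) ** 2 + broadening ** 2)


def partial_integral(energies, e_step):
    """Slave task: integrate the spectral function over one block of the grid."""
    return sum(toy_spectral_function(e) for e in energies) * e_step


def split(items, n_parts):
    """Cut a list into contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_parts)
    blocks, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        blocks.append(items[start:stop])
        start = stop
    return blocks


def integrate_in_parallel(e_min, e_max, e_step, n_slaves):
    """Master task: build the grid, hand out blocks, and sum the partial results."""
    n_points = int(round((e_max - e_min) / e_step))
    energies = [e_min + (i + 0.5) * e_step for i in range(n_points)]  # midpoints
    blocks = split(energies, n_slaves)
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        partials = pool.map(partial_integral, blocks, [e_step] * n_slaves)
        return sum(partials)


if __name__ == "__main__":
    # Estep = 0.25 meV, expressed in eV, over an assumed 1 eV window.
    print(integrate_in_parallel(-0.5, 0.5, 0.00025, 4))
```

Because the energy points are independent of each other, the only communication needed is the initial distribution of blocks and the final collection of partial sums.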

(16)

Computational Cost on a single processor

[Figure: CPU time measurements. Time (s), axis from 0 to 600, for four cases: Estep = 0.125 meV; Estep = 0.25 meV; Estep = 0.5 meV; Estep = 0.25 meV, all valleys.]

(17)

Parallelization results (Estep=0.25 meV)

(18)

Bottleneck = Computation/Communication

(19)

Serial vs. Parallel

[Figure: serial versus parallel run time. Time (s), axis from 0 to 600, for one valley and for all valleys.]

(20)

PVM Versus MPI !

[Figure legend: dashed line: PVM; solid line: MPI; blue: DOS is a full matrix; red: DOS is sparse; green: no DOS.]

(21)

PVM versus MPI : Efficiency of non-self consistent loops

[Figure legend: dashed line: PVM; solid line: MPI; blue: DOS is a full matrix; red: DOS is sparse; green: no DOS.]

(23)

NEGF Ballistic and with Scattering

$$\iiint A(E_x, E_y, E_z, r)\, dE_x\, dE_y\, dE_z$$

Mode space approach takes care of the confinement direction:

$$\iint A(E_x, E_y, r)\, dE_x\, dE_y$$

In the ballistic case, the contribution from the $E_y$'s is computed analytically:

$$\int A(E_x, r)\, dE_x$$

DOS is obtained by integrating the spectral function.

In the detailed scattering model, numerical integration over both the longitudinal and transverse energies is required, so the double integral $\iint A(E_x, E_y, r)\, dE_x\, dE_y$ must be evaluated numerically.

(24)

Results

• A non-self-consistent simulation took about 3-4 days (76 hours) on a single-processor machine.

• It now takes 2-3 hours on 40 processors.

• Self-consistent simulations are therefore now possible in less than 36 hours; they would otherwise have taken about 40 days.

CPUs    Time           Speed-up
2       38.65 hours    1.96x
38      2.10 hours     36.2x
80      1.07 hours     71x
120     45 minutes     (not stated)
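The speed-up figures can be cross-checked against the 76-hour single-processor time. The sketch below, in Python, assumes the usual definitions (speed-up = serial time / parallel time, efficiency = speed-up / number of CPUs); the slides do not state which definitions were used, and the function names are hypothetical. Under the same assumption the 45-minute run on 120 CPUs works out to a speed-up of roughly 101x, a derived value that does not appear on the slide.

```python
SERIAL_HOURS = 76.0  # single-processor time quoted on the slide

# (number of CPUs, wall-clock time in hours) as listed in the results
RUNS = [(2, 38.65), (38, 2.10), (80, 1.07), (120, 0.75)]


def speedup(parallel_hours, serial_hours=SERIAL_HOURS):
    """Assumed definition: serial time divided by parallel time."""
    return serial_hours / parallel_hours


def efficiency(n_cpus, parallel_hours, serial_hours=SERIAL_HOURS):
    """Assumed definition: speed-up per CPU (1.0 would be ideal scaling)."""
    return speedup(parallel_hours, serial_hours) / n_cpus


for n_cpus, hours in RUNS:
    print(
        f"{n_cpus:4d} CPUs  {hours:6.2f} h  "
        f"speed-up {speedup(hours):6.1f}x  "
        f"efficiency {efficiency(n_cpus, hours):.2f}"
    )
```

The efficiency drops slowly as CPUs are added, which is what one would expect when communication takes a growing share of each run.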

(25)

Conclusions

• Gained expertise in parallelization techniques

• PVM and MPI

• Set up the PVMTB and MPITB toolkits for parallelization under Matlab

• A tutorial for PVMTB is available

• Wrote a parallel version of nanoMOS 2.0

• Benchmarked performance and compared MPI with PVM

• Wrote a parallel detailed scattering model that gave us good physical insight into scattering phenomena
