3.2 System Requirements
3.2.1 Functionality
• Parallelism is the driving force of computer design, with energy and cost being the primary design constraints.
• There are basically two kinds of parallelism in applications:
1. Data-Level Parallelism (DLP): there are many data items that can be operated on at the same time.
2. Task-Level Parallelism (TLP): arises because tasks of work are created that can operate independently and largely in parallel.
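The difference between the two kinds of parallelism can be sketched in a few lines of Python. This is only an illustration of the program structure, not a performance demonstration: the function names (`scale`, `word_count`, `checksum`) are hypothetical workloads invented for the example, and in CPython threads do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x):
    """One operation applied to every data item (the DLP pattern)."""
    return 2 * x


def word_count(text):
    """An independent task (hypothetical example workload)."""
    return len(text.split())


def checksum(values):
    """Another independent task (hypothetical example workload)."""
    return sum(values) % 255


def data_level(items):
    # DLP: the same function is mapped over many data items.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(scale, items))


def task_level(text, values):
    # TLP: two unrelated tasks are submitted and proceed independently.
    with ThreadPoolExecutor() as pool:
        counted = pool.submit(word_count, text)
        summed = pool.submit(checksum, values)
        return counted.result(), summed.result()
```

In `data_level` every worker runs the same code on different data; in `task_level` each worker runs different code, which is the property that distinguishes TLP from DLP.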
Computer hardware can exploit these two kinds of application parallelism in four major ways:
1. Instruction-Level Parallelism (ILP)
• Exploits DLP with compiler help.
• All processors since about 1985 use pipelining to overlap the execution of instructions and improve performance.
• This potential overlap among instructions is called Instruction-Level Parallelism.
• The instructions can be evaluated in parallel.
2. Vector Architectures and Graphics Processor Units (GPU)
Exploits DLP by applying a single instruction to a collection of data in parallel.
3. Thread-Level Parallelism
Exploits either DLP or TLP in a tightly coupled hardware model that allows for interaction among parallel threads.
4. Request-Level Parallelism
Exploits parallelism among largely decoupled tasks specified by the programmer or the operating system.
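The benefit of overlapping instructions in a pipeline, mentioned under ILP above, can be estimated with a simple cycle count. The sketch below assumes an idealized pipeline (one cycle per stage, no hazards or stalls). These assumptions and the function names are illustrative and do not come from the original notes.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Ideal pipeline: the first instruction takes n_stages cycles,
    then one instruction completes every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def speedup(n_instructions, n_stages):
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )
```

For 100 instructions on a 5-stage pipeline this gives 500 cycles without overlap and 104 with it. As the instruction count grows, the speedup approaches the number of stages.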
• Michael Flynn placed all computers into one of four categories:
1. Single Instruction, Single Data (SISD) stream:
• Uniprocessor category.
• Standard sequential computer, but it can exploit ILP.
• SISD architectures that use ILP techniques such as superscalar.
2. Single Instruction, Multiple Data (SIMD) stream:
• In a SIMD machine, the same instruction is executed by multiple processors using different data streams.
• Each processor has its own data memory, but there is only one instruction memory and control processor, which fetches and dispatches instructions.
• It exploits DLP by applying the same operations to multiple items of data in parallel.
3. Multiple Instructions, Single Data (MISD) stream:
• No commercial multiprocessor of this type has been built to date.
4. Multiple Instructions, Multiple Data (MIMD) stream:
• Each processor fetches its own instructions and operates on its own data.
• These processors either use a centralized shared-memory architecture, or each has its own memory and they communicate with each other through crossbar networks.
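The SIMD and MIMD categories can be contrasted with a small sequential model. This is a sketch under simplifying assumptions, not real parallel hardware: an "instruction" is modeled as a Python function of one value, and `simd_run`, `mimd_run`, `inc` and `dbl` are hypothetical names introduced for the example.

```python
def simd_run(program, lanes):
    """One instruction stream: each instruction is fetched once
    and applied to every data lane."""
    lanes = list(lanes)
    for op in program:                  # single control unit fetches op
        lanes = [op(x) for x in lanes]  # and dispatches it to all lanes
    return lanes


def mimd_run(programs, data):
    """Each processor has its own instruction stream and its own data."""
    results = []
    for program, x in zip(programs, data):
        for op in program:              # this processor's private program
            x = op(x)
        results.append(x)
    return results


def inc(x):
    return x + 1


def dbl(x):
    return x * 2
```

`simd_run([inc, dbl], [1, 2, 3])` applies the same two steps to every element, while `mimd_run([[inc], [dbl], [inc, dbl]], [1, 2, 3])` lets each element follow a different program. The second form is more flexible but needs one control flow per processor.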
SIMD processors can exploit data parallelism, but are not as flexible as MIMD processors. They are suitable for algorithms with high data parallelism and little data-dependent control flow.
MIMD processors are more flexible: they can function either as single-user machines, focusing on high performance for one particular application, or as multiprogrammed machines running many tasks simultaneously.
However, they are much more expensive and complicated due to replication of control hardware, high instruction bandwidth requirements, and synchronization of data paths.
Besides pure SIMD and MIMD approaches, a combination of both is also possible, exploiting the advantages of both architectures.
Tightly coupled MIMD architectures exploit TLP, since multiple cooperating threads operate in parallel.
Loosely coupled MIMD architectures (clusters and warehouse-scale computers, or WSCs) exploit RLP, where many independent tasks can proceed in parallel with little need for communication and synchronization.
MULTITHREADING
• Multithreading: simultaneous execution of two or more threads by multiple processors.
• On a single processor, multithreading generally occurs by time-division multiplexing (TDM): the processor switches between different threads.
• On a multiprocessor, the threads or tasks actually run at the same time, with each processor or core running a particular thread or task.
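Time-division multiplexing on a single processor can be sketched as a round-robin scheduler over generators. This is a toy model: `thread_body`, `run_time_sliced` and the `slice_len` parameter are hypothetical names for the example, and a real scheduler also handles priorities, blocking and preemption.

```python
from collections import deque


def thread_body(name, steps):
    """A toy thread that yields once per unit of work."""
    for i in range(steps):
        yield f"{name}{i}"


def run_time_sliced(threads, slice_len=1):
    """Run each thread for up to slice_len steps, then switch to the next."""
    ready = deque(threads)
    trace = []
    while ready:
        thread = ready.popleft()
        finished = False
        for _ in range(slice_len):
            try:
                trace.append(next(thread))
            except StopIteration:
                finished = True
                break
        if not finished:
            ready.append(thread)  # back of the queue until its next slice
    return trace
```

Calling `run_time_sliced([thread_body("A", 3), thread_body("B", 2)])` produces an interleaved trace: only one thread executes at any instant, but both make progress over time.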
Types
1. Coarse-grained Multithreading.
2. Fine-grained Multithreading.
3. Simultaneous Multithreading.
Advantages of Multithreading
1. If a thread gets a lot of cache misses, the other threads can continue, taking advantage of unused computing resources. This can lead to faster overall execution, as these resources would have been idle if only a single thread was executed.
Disadvantages
1. Multiple threads can interfere with each other when sharing hardware resources such as caches or TLBs.
2. The execution time of a single thread is not improved and may even degrade, due to a lower clock frequency or the additional pipeline stages needed to accommodate thread-switching hardware.
3. Requires more changes to both application programs and the OS than multiprocessing.
Coarse-grained Multithreading
• Also known as block or cooperative multithreading.
• Simplest type of multithreading: one thread runs until it is blocked by an event that would normally create a long-latency stall.
• Such a stall might be a cache miss that has to access off-chip memory, which might take a huge number of CPU cycles for the data to return.
• Instead of waiting for the stall to resolve, a threaded processor switches execution to another thread that is ready to run.
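The effect of switching threads on a long-latency stall can be estimated with a small cycle-count model. It is a sketch under stated assumptions: each thread is a list of operations, `"c"` is a one-cycle compute operation, `"m"` is a cache miss that stalls its thread for `miss_latency` cycles, and a thread switch is free. The encoding and the function names are hypothetical.

```python
def sequential_cycles(threads, miss_latency):
    """Run the threads one after another, waiting out every miss."""
    return sum(miss_latency if op == "m" else 1 for t in threads for op in t)


def coarse_grained_cycles(threads, miss_latency):
    """Run one thread until it misses, then switch to a ready thread."""
    pc = [0] * len(threads)        # index of the next operation per thread
    ready_at = [0] * len(threads)  # cycle at which each thread may run again
    cycle = 0
    current = 0
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        unfinished = [i for i, t in enumerate(threads) if pc[i] < len(t)]
        runnable = [i for i in unfinished if ready_at[i] <= cycle]
        if not runnable:
            # Every unfinished thread is stalled: idle until one is ready.
            cycle = min(ready_at[i] for i in unfinished)
            continue
        if current not in runnable:
            current = runnable[0]  # switch only when the current thread blocks
        op = threads[current][pc[current]]
        pc[current] += 1
        if op == "m":
            ready_at[current] = cycle + miss_latency
        cycle += 1                 # issuing the operation occupies one cycle
    return max([cycle] + ready_at) # wait for any miss still outstanding
```

With `a = ["c", "m", "c"]`, `b = ["c", "c", "c"]` and a 10-cycle miss, running the threads back to back takes 15 cycles, while the switch-on-miss model takes 12 because thread `b` executes during the miss of thread `a`.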
Fine-grained Multithreading
• Its purpose is to remove all dependency stalls from the execution pipeline.
• Since one thread is relatively independent of the other threads, there is less chance of an instruction in one pipeline stage needing an output from an older instruction in the pipeline.
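How interleaving hides dependency stalls can be illustrated with a toy issue model. The assumptions are deliberately simple and are not taken from the notes: at most one instruction issues per cycle, every instruction depends on the previous instruction of the same thread, and a result becomes available `latency` cycles after issue. The function name is hypothetical.

```python
def interleaved_cycles(n_threads, instrs_per_thread, latency):
    """Return (total_cycles, stall_cycles) for round-robin issue."""
    remaining = [instrs_per_thread] * n_threads
    ready_at = [0] * n_threads  # cycle when each thread's last result is ready
    cycle = 0
    stalls = 0
    nxt = 0                     # thread to try first in this cycle
    while any(remaining):
        issued = False
        for k in range(n_threads):
            t = (nxt + k) % n_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + latency
                nxt = (t + 1) % n_threads
                issued = True
                break
        if not issued:
            stalls += 1         # no thread had its operands ready
        cycle += 1
    return cycle, stalls
```

With a latency of 3, a single thread of 4 dependent instructions needs 10 cycles, 6 of them stalls. Three such threads interleaved issue all 12 instructions in 12 cycles with no stalls, because by the time a thread's turn comes around again its previous result is ready.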
Hardware Cost
• Each pipeline stage has the additional cost of tracking the thread ID of the instruction it is processing.
• Since more threads are being executed concurrently in the pipeline, the demand on shared resources increases. Caches need to be larger to avoid thrashing between the different threads.
Simultaneous Multithreading
• Most advanced type of multithreading, applied to superscalar