4. Unified Secure On- and Off-Line Analytics
Crucible offers three key runtime environments for the execution of analytics transpiled from the DSL source. This model, in which an executable compiled to Java source is integrated with existing environments by a runtime shim, is similar to that used in COMPSs [105], the componentised superscalar programming model and runtime system. COMPSs uses the EMOTIVE [44] middleware to enable execution on a variety of runtime environments; in a similar vein, Crucible uses its own novel library of runtime middleware to adapt each framework’s native runtime behaviour to the Crucible message passing model. The key advancement of Crucible’s implementations over the likes of COMPSs is that, instead of simply converting between APIs with essentially similar runtime models (e.g., for the deployment of virtual machines on a cloud and scheduling of jobs on those instances), each of the Crucible runtimes described below must integrate fundamentally different execution models.
Standalone Processing
The first, and simplest, runtime environment is designed for readily testing a Crucible topology locally, without any need for a distributed infrastructure. This Standalone environment executes a given topology in a single JVM, relying heavily on Java’s multithreading capabilities. Simple in-memory locking and global state are provided, as the topology will always be located in a single JVM.
Message passing is performed entirely in-memory, using a shared Dispatcher instance with a blocking concurrent queue providing synchronisation. This queue, an instance of java.util.concurrent.LinkedBlockingQueue, maintains a queue of java.lang.Runnable tasks which are passed to a java.util.concurrent.ExecutorService for execution in a thread-pool. These tasks are used for registering/de-registering subscribers, and submitting tuples to the list of subscribers for a given PE. Multiple Reader Single Writer semantics for the subscriber mapping are ensured using a series of java.util.concurrent.Semaphore mutexes.
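The dispatch scheme described above can be sketched roughly as follows. This is an illustration only, not Crucible's actual Dispatcher: the class name SketchDispatcher, its method names, the MAX_READERS bound, and the use of a single counting semaphore (readers take one permit, a writer takes all of them) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical in-memory dispatcher; names and structure are assumptions. */
final class SketchDispatcher {
    private static final int MAX_READERS = 64; // assumed bound on concurrent readers

    // Runnable tasks wait in this queue until a pool thread is free to run them.
    private final LinkedBlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    private final ExecutorService pool;

    // A reader takes one permit; a writer takes all of them, excluding everyone else.
    private final Semaphore permits = new Semaphore(MAX_READERS, true);
    private final Map<String, List<Consumer<Object>>> subscribers = new HashMap<>();

    SketchDispatcher(int threads) {
        pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS, tasks);
    }

    /** Queue a task that registers a subscriber for the named PE (writer access). */
    void subscribe(String pe, Consumer<Object> subscriber) {
        pool.execute(() -> {
            permits.acquireUninterruptibly(MAX_READERS);
            try {
                subscribers.computeIfAbsent(pe, k -> new ArrayList<>()).add(subscriber);
            } finally {
                permits.release(MAX_READERS);
            }
        });
    }

    /** Queue a task that delivers a tuple to the PE's current subscribers (reader access). */
    void emit(String pe, Object tuple) {
        pool.execute(() -> {
            permits.acquireUninterruptibly();
            try {
                List<Consumer<Object>> targets =
                        subscribers.getOrDefault(pe, Collections.emptyList());
                for (Consumer<Object> target : targets) {
                    target.accept(tuple);
                }
            } finally {
                permits.release();
            }
        });
    }

    /** Stop accepting tasks and wait for queued ones; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long millis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With more than one pool thread, a queued emit may run before an earlier subscribe has finished, so a caller that needs strict ordering would have to wait for registration to complete or use a single-threaded pool.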
On-Line Processing
IBM’s InfoSphere Streams provides the platform for Crucible’s streaming (on-line) runtime engine. An extension to the Crucible DSL compiler generates a complete SPL (IBM’s Streams Processing Language) project from the given topology. This project can be imported directly into InfoSphere Streams Studio; it consists of the required project infrastructure (including toolkit and classpath dependency references), and a single SPL Main Composite describing the topology. Each SPL PE in Streams is an instance of a Streams-specific wrapper class, CruciblePE. This class handles invocation of the receive$... tuple methods, dispatch between Streams and the Crucible PEs, and tuple serialisation.
There is a one-to-one mapping between tuples emitted in Crucible and tuples emitted in Streams. Each key in a Crucible tuple has a fixed field in a Streams tuple type, and values for all keys are transmitted with each emission. These values are interleaved with their associated security labels, such that the label for a given key immediately precedes it. Tuple values must be converted between Streams and Crucible using a serialisation framework, which is injected into the PE at runtime. Kryo is a runtime library for Java which serialises an arbitrary Java Serializable into a buffer of bytes with a similar contract to the built-in Java ObjectOutputStream and ObjectInputStream, only with superior time and space efficiency [2, 99]. On these strengths, the default serialiser for Crucible is Kryo, but it would be feasible to add, for example, a Protocol Buffers [47] based implementation if interoperability with external systems were required. Security Labels are not serialised through Kryo; to facilitate their inspection by debug tooling on the Streams instance, as well as easing their consumption in a non-Crucible analytic workflow, they are encoded as strings. Security Labels are written as rstring values, while all other values are serialised as an immutable list<int8> (representing an array of the serialised bytes).
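As an illustration of the interleaved layout described above, the following sketch flattens a tuple into alternating label and value fields. It is not Crucible's implementation: the class and method names are hypothetical, and the built-in java.io object streams stand in for Kryo so that the example needs no external library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper showing the [label, bytes, label, bytes, ...] field layout. */
final class InterleavedTuple {

    /** Emit every key in schema order; each label immediately precedes its value. */
    static List<Object> encode(List<String> schema,
                               Map<String, ? extends Serializable> values,
                               Map<String, String> labels) {
        List<Object> fields = new ArrayList<>();
        for (String key : schema) {
            fields.add(labels.get(key));           // plain string, readable by other tools
            fields.add(toBytes(values.get(key)));  // opaque serialised bytes
        }
        return fields;
    }

    /** Recover the values; the value for key i sits at field index 2*i + 1. */
    static Map<String, Object> decodeValues(List<String> schema, List<Object> fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            out.put(schema.get(i), fromBytes((byte[]) fields.get(2 * i + 1)));
        }
        return out;
    }

    /** Read a label without deserialising any value; it sits at field index 2*i. */
    static String labelOf(List<String> schema, List<Object> fields, String key) {
        return (String) fields.get(2 * schema.indexOf(key));
    }

    // Stand-in serialiser: java.io object streams instead of Kryo (an assumption).
    static byte[] toBytes(Serializable value) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the labels stay as strings at known positions, a consumer that cannot deserialise the values can still read every label, which is the property the text attributes to the string encoding.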
Each of these CruciblePE instances can be scheduled into separate JVMs running on different hosts, according to the behaviour of the Streams deployment manager. Manual editing of the generated SPL, e.g., to use SPLMM (SPL Mixed Mode, using Perl as a preprocessor), can be used to parallelise a single PE across multiple hosts. The injectable global synchronisation primitives discussed in Section 4.1.4 may be used to ensure correctness in this form of data-parallelism.
Figure 4.4: Crucible Accumulo Runtime Message Dispatch, demonstrating
how Scanners are used to pull data through a collection of custom Iterators to analyse data sharded across Accumulo Tablets.
Off-Line Processing
The mapping from Crucible’s execution model to Accumulo for off-line processing is more involved. In order to exploit the data locality and inherent parallelism available in HDFS, while maintaining the event-driven programming model employed in the Crucible DSL, the Accumulo runtime makes use of Accumulo Iterators [39]. An Iterator may scan multiple tablets in parallel, and will stream ordered results to the Scanner which invoked the iterator. Crucible makes use of this paradigm by spawning a CrucibleIterator for each PE in the topology, along with a multithreaded Scanner to consume results. Each CrucibleIterator may be instantiated and destroyed repeatedly as the scan progresses through the data store.
Each CrucibleIterator is assigned to its own table, named after the UUID of the Job and the PE to which it refers. Values map onto an Accumulo Key by using a timestamp for the Row ID, the Source PE of a tuple as Column Family, and the emitted item’s key as Column Qualifier. Column Visibility is used to encode Security Labels, making efficient use of Accumulo’s native support for cell-level security.
In this way, the CrucibleIterator can invoke the correct receive method on a PE, by collating all ⟨key, value, label⟩ triples of a given Row ID. By mapping Crucible Security Labels onto Accumulo Visibilities, all message passing data (and final results) are persisted to HDFS with their correct labels. External Accumulo clients may read that state, provided they possess the correct set of authorizations, ensuring cell-level security well beyond the Crucible system boundary.
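The key mapping and the collation step described above can be sketched without the Accumulo client library. Everything here is illustrative: CellSketch is a hypothetical stand-in for an Accumulo key/value pair, values are kept as plain strings, and a numeric Row ID is assumed for simplicity (Accumulo itself stores row IDs as byte sequences).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical stand-in for one Accumulo cell; it does not use the Accumulo API. */
final class CellSketch {
    final long rowId;         // Row ID: timestamp of the emission
    final String family;      // Column Family: source PE of the tuple
    final String qualifier;   // Column Qualifier: key of the emitted item
    final String visibility;  // Column Visibility: security label of the item
    final String value;

    CellSketch(long rowId, String family, String qualifier, String visibility, String value) {
        this.rowId = rowId;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.value = value;
    }

    /** One cell per tuple key; all cells of a tuple share its Row ID and Column Family. */
    static List<CellSketch> fromTuple(long timestamp, String sourcePe,
                                      Map<String, String> values,
                                      Map<String, String> labels) {
        List<CellSketch> cells = new ArrayList<>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            cells.add(new CellSketch(timestamp, sourcePe, e.getKey(),
                    labels.get(e.getKey()), e.getValue()));
        }
        return cells;
    }

    /** Group cells by Row ID, in ascending order, so each group is one received tuple. */
    static SortedMap<Long, List<CellSketch>> collateByRow(List<CellSketch> cells) {
        SortedMap<Long, List<CellSketch>> rows = new TreeMap<>();
        for (CellSketch cell : cells) {
            rows.computeIfAbsent(cell.rowId, r -> new ArrayList<>()).add(cell);
        }
        return rows;
    }
}
```

Each collated group carries the source PE in its Column Family, which is the information a receiver would need in order to choose which receive method to call for that tuple.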
Crucible’s AccumuloDispatcher takes tuples emitted by a PE, and writes them to the tables of each subscriber to that stream, for the relevant CrucibleIterator to process in parallel. The final component is the multithreaded Scanner, which continually consumes from the iterator stack, restarting from the last key scanned when the stack exhausts available input, thus ensuring that the job fully processes all tuples in all tables.
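The restart-from-last-key behaviour of the Scanner can be illustrated with an ordered in-memory map standing in for a table. This is a sketch under assumptions, not the Accumulo Scanner API: the class name, the numeric keys, and the single-pass scanFrom method are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical resumable scan over an ordered table held in memory. */
final class ResumingScanSketch {

    /**
     * Consume every entry whose key is strictly greater than 'after', in key order.
     * Returns the last key consumed, or 'after' unchanged if nothing new was found,
     * so the caller can pass the result back in when it restarts the scan.
     */
    static long scanFrom(NavigableMap<Long, String> table, long after, List<String> sink) {
        long last = after;
        for (Map.Entry<Long, String> entry : table.tailMap(after, false).entrySet()) {
            sink.add(entry.getValue());
            last = entry.getKey();
        }
        return last;
    }
}
```

Calling scanFrom repeatedly with the previously returned key picks up only the tuples written since the last pass, so no tuple is processed twice and none is skipped, provided new tuples always arrive with larger keys.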
This flow is presented in Figure 4.4: the Accumulo Master schedules Crucible’s Iterators onto Tablet Servers as a result of requests from the client-side Scanners. There is one Scanner present for each PE in the system, so Figure 4.4 shows three PEs. There could be a many-to-many mapping of PEs to Tablets, as Accumulo distributes data for each PE’s table across the available Tablet Servers. Note here that only the final results are returned to the client-side Scanners: all intermediate data is written across the Accumulo cluster’s internal network.