

As you may guess, the Node.js ecosystem already provides some solutions to handle asynchronous control flows using generators. For example, suspend (https://npmjs.org/package/suspend) is one of the oldest and supports promises, thunks, Node.js-style callbacks, as well as raw callbacks. Also, most of the promise libraries we analyzed earlier in the chapter provide helpers to use promises with generators.

All these solutions are based on the same principles we demonstrated with the asyncFlow() function; so, we may want to reuse one of these instead of writing one ourselves.

For the examples in this section, we chose to use co (https://npmjs.org/package/co), which is currently gaining a lot of momentum. A flexible solution, co supports several types of yieldables, some of which are:

• Thunks

• Promises

• Arrays (parallel execution)

• Objects (parallel execution)

• Generators (delegation)
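One way to picture how a single runner can accept different kinds of yieldables is to imagine that every yielded value is first normalized into a thunk. The following is only a sketch of that idea under our own assumptions, not the actual source of co; the toThunk() helper is a hypothetical name, and it handles just thunks and promises:

```javascript
// Hypothetical helper: turns a supported yieldable into a thunk,
// that is, a function accepting only a Node.js-style callback.
function toThunk(yieldable) {
  if (typeof yieldable === 'function') {
    return yieldable; // already a thunk
  }
  if (yieldable && typeof yieldable.then === 'function') {
    // A promise (or any thenable): adapt then() to a callback.
    return function(callback) {
      yieldable.then(
        function(value) { callback(null, value); },
        function(err) { callback(err); }
      );
    };
  }
  throw new TypeError('Unsupported yieldable: ' + yieldable);
}

// Usage: both kinds of yieldable are consumed in the same way.
var fromThunk = toThunk(function(callback) {
  callback(null, 'thunk value');
});
var fromPromise = toThunk(Promise.resolve('promise value'));

fromThunk(function(err, value) { console.log(value); });
fromPromise(function(err, value) { console.log(value); });
```

Once everything is a thunk, the runner only needs to know how to pass a callback and resume the generator when that callback fires.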

co also has its own ecosystem of packages including the following:

• Web frameworks, the most popular being koa (https://npmjs.org/package/koa)

• Libraries implementing specific control flow patterns

• Libraries wrapping popular APIs to support co

We will use co to reimplement our web spider application using generators. To convert Node.js-style functions to thunks, we are going to use a small library called thunkify (https://npmjs.org/package/thunkify).
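To make the notion of a thunk concrete, here is a minimal sketch of what a thunkify-style helper does. It is not the actual source of the thunkify package; simpleThunkify() and add() are hypothetical names used only for illustration. The idea is that a thunk is a function with all its arguments already bound, except for the final callback:

```javascript
// Minimal sketch of a thunkify-like helper (hypothetical, not the real package).
function simpleThunkify(fn) {
  return function() {
    // Capture the arguments passed now; the callback arrives later.
    var args = Array.prototype.slice.call(arguments);
    var self = this;
    return function(callback) {
      fn.apply(self, args.concat(callback));
    };
  };
}

// A Node.js-style function used only for illustration.
function add(a, b, callback) {
  callback(null, a + b);
}

var addThunk = simpleThunkify(add);
var thunk = addThunk(2, 3);   // nothing runs yet
var sum;
thunk(function(err, result) { // only now add() is invoked
  sum = result;
});
```

Separating the arguments from the callback is what allows a runner to receive the thunk from a yield and supply the callback itself.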

Sequential execution

Let's start our practical exploration of generators and co by modifying version 2 of the web spider application. The very first thing we want to do is to load our dependencies and generate a thunkified version of the functions we are going to use. These will go at the top of the spider.js module:

var thunkify = require('thunkify');
var co = require('co');

var request = thunkify(require('request'));
var fs = require('fs');
var mkdirp = thunkify(require('mkdirp'));
var readFile = thunkify(fs.readFile);
var writeFile = thunkify(fs.writeFile);
var nextTick = thunkify(process.nextTick);

Looking at the preceding code, we can surely notice some similarities with the code we used earlier in the chapter to promisify some APIs. In this regard, it is interesting to point out that if we decided to use the promisified version of our functions instead of their thunkified alternative, the code that follows would remain exactly the same, thanks to the fact that co supports both thunks and promises as yieldable objects. In fact, if we want, we could even use both thunks and promises in the same application, even in the same generator. This is a tremendous advantage in terms of flexibility, as it allows us to use generator-based control flow with whatever solution we already have at our disposal.

Okay, now let's start transforming the download() function into a generator:

function* download(url, filename) {
  console.log('Downloading ' + url);
  var results = yield request(url);
  var body = results[1];
  yield writeFile(filename, body);
  console.log('Downloaded and saved:' + url);
  return body;
}

By using generators and co, our download() function suddenly becomes trivial. All we had to do was convert it into a generator function and use yield wherever we had an asynchronous function (as a thunk) to invoke.

Next, it's the turn of the spider() function:

function* spider(url, nesting) {
  var filename = utilities.urlToFilename(url);
  var body;
  try {
    body = yield readFile(filename, 'utf8');
  } catch(err) {
    if(err.code !== 'ENOENT') {
      throw err;
    }
    body = yield download(url, filename);
  }
  yield spiderLinks(url, body, nesting);
}

The interesting detail to notice in this last fragment of code is how we were able to use a try-catch block to handle exceptions. Also, we can now use throw to propagate errors! Another remarkable line is the one where we yield the result of download(), which is neither a thunk nor a promisified function, but just another generator. This is possible thanks to co, which also supports other generators as yieldables.
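The reason try-catch works lies in how a runner resumes the generator: on success it calls next(value), while on failure it calls throw(err), which raises the error exactly at the paused yield. The following is a minimal sketch of that mechanism and of delegation to another generator. It is not the implementation of co; run(), failingThunk(), fallback(), and task() are hypothetical names, and only thunks and generator objects are accepted as yieldables:

```javascript
// Minimal runner sketch (hypothetical): drives a generator object
// that yields thunks or other generator objects.
function run(gen, done) {
  function step(err, value) {
    var ret;
    try {
      // Errors are thrown *into* the generator at the paused yield,
      // which is what makes try-catch work inside it.
      ret = err ? gen.throw(err) : gen.next(value);
    } catch (e) {
      return done(e); // the generator did not handle the error
    }
    if (ret.done) {
      return done(null, ret.value);
    }
    var yielded = ret.value;
    if (typeof yielded === 'function') {
      yielded(step); // a thunk: resume when it calls back
    } else if (yielded && typeof yielded.next === 'function') {
      run(yielded, step); // another generator: run it recursively
    } else {
      step(new TypeError('Unsupported yieldable'));
    }
  }
  step();
}

// A thunk that always fails, standing in for a missing file.
function failingThunk(callback) {
  callback(new Error('ENOENT (simulated)'));
}

function* fallback() {
  return 'downloaded body';
}

function* task() {
  try {
    return yield failingThunk;
  } catch (err) {
    // Recover by delegating to another generator.
    return yield fallback();
  }
}

var outcome;
run(task(), function(err, result) {
  outcome = result;
});
```

If the generator does not catch the error, the call to throw() rethrows it out of the generator, and the runner forwards it to the final callback.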

At last, we can also convert spiderLinks(), where we implemented an iteration to download the links of a web page in sequence. With generators, this becomes trivial as well:

function* spiderLinks(currentUrl, body, nesting) {
  if(nesting === 0) {
    return yield nextTick();
  }
  var links = utilities.getPageLinks(currentUrl, body);
  for(var i = 0; i < links.length; i++) {
    yield spider(links[i], nesting - 1);
  }
}

There is really little to explain in the previous code. There is no pattern to show for the sequential iteration; generators and co are doing all the dirty work for us, so we were able to write the asynchronous iteration as if we were using blocking, direct style APIs.

Now comes the most important part, the entry point of our program:

co(function* () {
  try {
    yield spider(process.argv[2], 1);
    console.log('Download complete');
  } catch(err) {
    console.log(err);
  }
})();

This is the only place where we have to invoke co(...) to wrap a generator. In fact, once we do that, co will automatically wrap any generator we pass to a yield statement, and this will happen recursively, so the rest of the program is totally agnostic to the fact we are using co, even though it's under the hood.

It is important to notice that the co() function returns a thunk (this holds for the co 3.x API used here; from version 4, co() returns a promise instead), so we have to invoke it to start the spider task.

Now it should be possible to run our generator-based web spider application. Just remember to use the --harmony or --harmony-generators flag in the command line:

node --harmony-generators spider <URL>

Parallel execution

The bad news about generators is that they are great for writing sequential algorithms, but they can't be used to parallelize the execution of a set of tasks, at least not using just yield and generators. In fact, the pattern to use for these circumstances is to simply rely on a callback-based or promise-based function, which in turn can easily be yielded and used with generators.

Fortunately, for the specific case of unlimited parallel execution, co supports it natively: we simply yield an array of promises, thunks, generators, or generator functions.

With this in mind, version 3 of our web spider application can be implemented simply by rewriting the spiderLinks() function as follows:

function* spiderLinks(currentUrl, body, nesting) {
  if(nesting === 0) {
    // yield the thunk so that it is actually executed
    return yield nextTick();
  }
  var links = utilities.getPageLinks(currentUrl, body);
  var tasks = links.map(function(link) {
    return spider(link, nesting - 1);
  });
  yield tasks;
}

What we did was just collect all the download tasks, which are essentially generators, and then yield on the resulting array. All these tasks will be executed by co in parallel and then the execution of our generator (spiderLinks) will be resumed when all the tasks finish running.
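To get an intuition of what yielding an array involves, we can imagine a helper that combines several thunks into a single one, starting them all immediately and calling back once with the results in their original order. This is only a sketch under our own assumptions, not the code co uses internally; parallel() and delay() are hypothetical names, and only thunks are supported:

```javascript
// Hypothetical helper: combines an array of thunks into a single thunk
// that starts them all at once and collects results in input order.
function parallel(thunks) {
  return function(callback) {
    var results = new Array(thunks.length);
    var pending = thunks.length;
    var finished = false;
    if (pending === 0) {
      // Stay asynchronous even when there is nothing to run.
      return process.nextTick(function() { callback(null, results); });
    }
    thunks.forEach(function(thunk, index) {
      thunk(function(err, value) {
        if (finished) { return; }
        if (err) {
          finished = true; // report only the first error
          return callback(err);
        }
        results[index] = value;
        if (--pending === 0) {
          finished = true;
          callback(null, results);
        }
      });
    });
  };
}

// Usage with timers standing in for downloads of different durations.
function delay(ms, value) {
  return function(callback) {
    setTimeout(function() { callback(null, value); }, ms);
  };
}

parallel([delay(30, 'a'), delay(10, 'b')])(function(err, results) {
  console.log(results); // order follows the input array, not completion
});
```

Since the combined value is itself a thunk, a runner can treat it like any other yieldable and resume the generator only when every task has completed.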

If you think we cheated by exploiting the feature of co that allows us to yield an array, we can demonstrate how the same parallel flow can be achieved using a callback-based solution similar to what we have already used earlier in the chapter. Let's use this technique to rewrite spiderLinks() once again:

function spiderLinks(currentUrl, body, nesting) {
  if(nesting === 0) {
    return nextTick();
  }
  //returns a thunk
  return function(callback) {
    var completed = 0, errored = false;
    var links = utilities.getPageLinks(currentUrl, body);
    if(links.length === 0) {
      return process.nextTick(callback);
    }

    function done(err, result) {
      if(err && !errored) {
        errored = true;
        callback(err);
      }
      if(++completed === links.length && !errored) {
        callback();
      }
    }

    for(var i = 0; i < links.length; i++) {
      co(spider(links[i], nesting - 1))(done);
    }
  }
}

To run spider(), which is a generator, in parallel, we had to convert it into a thunk and then execute it. This was possible by wrapping it with the co(...) function, which essentially creates a thunk out of a generator. This way, we were able to invoke it in parallel and set the done() function as the callback. Usually, all the libraries for generator-based control flow have a similar feature, so you can always transform a generator into a callback-based function if needed.

To start multiple download tasks in parallel, we just reused the callback-based pattern for parallel execution, which we defined earlier in the chapter. We should also notice that we transformed the spiderLinks() function into a function that returns a thunk (it's not even a generator anymore). This gave us a callback function to invoke when all the parallel tasks are completed.

Pattern (generator-to-thunk): converts a generator to a thunk in order to be able to run it in parallel or utilize it for taking advantage of other callback- or promises-based control flow algorithms.

Limited parallel execution

Now that we know how to handle nonsequential execution flows, it should be easy to plan the implementation of version 4 of our web spider application, the one imposing a limit on the number of concurrent download tasks. We have several options to do that; some of them are as follows:

• Use the callback-based version of the TaskQueue class we implemented previously in the chapter. We would need to just thunkify its functions and any generator we want to use as a task.

• Use the promises-based version of the TaskQueue class, and just make sure that each generator we want to use as a task is converted into a function returning a promise.

• Use async, and thunkify any helper we plan to use, in addition to converting any generator to a callback-based function that can be used by the library.

• Use a library from the co ecosystem, specifically designed for this type of flow, such as co-limiter (https://npmjs.org/package/co-limiter).

• Implement a custom algorithm based on the producer-consumer pattern, the same that co-limiter uses internally.
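As a rough illustration of the first option in the list, a callback-based queue that accepts thunks as tasks might look like the following. This is a hypothetical minimal version written for this example, not the TaskQueue class from earlier in the chapter; the pushTask() and next() names and the optional per-task callback are our own assumptions:

```javascript
// Hypothetical minimal queue: runs at most `concurrency` thunks at a time.
function TaskQueue(concurrency) {
  this.concurrency = concurrency;
  this.running = 0;
  this.queue = [];
}

// Enqueues a thunk; the optional callback receives its outcome.
TaskQueue.prototype.pushTask = function(thunk, callback) {
  this.queue.push({ thunk: thunk, callback: callback });
  this.next();
};

// Starts queued tasks until the concurrency limit is reached.
TaskQueue.prototype.next = function() {
  var self = this;
  while (self.running < self.concurrency && self.queue.length) {
    var item = self.queue.shift();
    self.running++;
    startTask(item);
  }
  function startTask(item) {
    item.thunk(function(err, result) {
      self.running--;
      if (item.callback) { item.callback(err, result); }
      self.next(); // a slot is free: pick up the next task, if any
    });
  }
};

// Usage: three timer-based tasks, at most two running at any time.
var queue = new TaskQueue(2);
['a', 'b', 'c'].forEach(function(name) {
  queue.pushTask(function(callback) {
    setTimeout(function() { callback(null, name); }, 10);
  }, function(err, result) {
    console.log('finished ' + result);
  });
});
```

A generator could be used as a task here after converting it to a thunk, as described by the generator-to-thunk pattern.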

For educational purposes, we are going to choose the last option, so we can dive into a pattern that is often associated with coroutines (but also threads and processes).