Notes from an intuitive programmer: async


Wednesday, November 14, 2012

Node.js event loop does not poll

Node.js uses a well-known event loop, but does it work by polling? Some have that impression.

An event loop¹ works by requesting its events from a message pump (per Wikipedia).

Here's how the event loop is implemented: 'Internally, node.js relies on libev to provide the event loop, which is supplemented by libeio[,] which uses pooled threads to provide asynchronous I/O.'²

Here's Wikipedia's article on polling³ and another definition⁴.

Now, 'poll' is a system call which asks Unix to check a set of file descriptors:

'poll, ppoll - wait for some event on a file descriptor...If none of the events requested (and no error) has occurred for any of the file descriptors, then poll() blocks until one of the events occurs.'⁵

Possibly the system call's name has misled people into thinking that a userland program is doing the polling. Nevertheless, invoking the Unix system call 'poll' is not, in itself, polling.

Hypothetically, in order to get information from a message pump, an event loop could employ the Unix system call 'poll' to check a file descriptor, to which the message pump would write events.

Ultimately, this may be the source of the conceptual confusion here, or it may be caused by the fact that (actual) polling is the easiest method to think of, when programming.

For our case in particular, if an event loop calls Unix 'poll', this is not an instance of the event loop polling anything. Neither node.js, nor any event loop, but only Unix, polls the file descriptors (if indeed it even really does, anymore).

Anyway, an event loop, such as node.js's, does not poll its message pump. Instead, it merely makes a (blocking) request to it. Calling just any request 'polling' pollutes the meaning of the word (and that may be happening here).
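To make the distinction concrete, here is a minimal sketch in Ruby (not node.js's or libev's actual code). A pipe stands in for the message pump, and the helper names `deliver_later`, `poll_for_event` and `wait_for_event` are made up for illustration. The first reader really does poll: it keeps asking whether anything has arrived. The second makes one blocking request with `IO.select` and sleeps until the event is there.

```ruby
# A pipe stands in for the message pump; a thread writes one event after a delay.
def deliver_later(writer, delay)
  Thread.new do
    sleep delay
    writer.write("event\n")
    writer.close
  end
end

# Actual polling: userland asks "anything yet?" over and over.
def poll_for_event(reader)
  attempts = 0
  loop do
    attempts += 1
    chunk = reader.read_nonblock(64, exception: false)
    return [chunk, attempts] unless chunk == :wait_readable
    sleep 0.01 # nothing yet; ask again shortly
  end
end

# Waiting: a single blocking request; the thread sleeps until there is data.
def wait_for_event(reader)
  ready, = IO.select([reader]) # no timeout given, so this blocks
  [ready.first.read_nonblock(64), 1]
end

reader_a, writer_a = IO.pipe
producer_a = deliver_later(writer_a, 0.2)
polled_message, poll_attempts = poll_for_event(reader_a)
producer_a.join

reader_b, writer_b = IO.pipe
producer_b = deliver_later(writer_b, 0.2)
waited_message, wait_calls = wait_for_event(reader_b)
producer_b.join

puts "polling: #{poll_attempts} attempts for #{polled_message.inspect}"
puts "waiting: #{wait_calls} call for #{waited_message.inspect}"
```

Both readers end up with the same message. The difference is that the polling reader had to ask many times, while the waiting reader asked once.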

tl;dr: Let's try not to say anymore that node.js polls for its events, okay? Instead, let's simply say that node.js waits for its events. (A lost cause, I know, but at least I've said it.)

1 http://en.wikipedia.org/wiki/Event_loop
2 http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
3 http://en.wikipedia.org/wiki/Polling_%28computer_science%29
4 http://whatis.techtarget.com/definition/polling
5 http://linux.die.net/man/2/poll

Copyright (c) 2012 Mark D. Blackwell.

Saturday, July 7, 2012

Manage long-running external webservice requests from Rails apps (on cloud servers), howto

Case: as long as Rails is synchronous, requests to external webservices push the use of server resources to impossible levels, even when the webservices behave normally, let alone when they are long delayed.

Plan: two web apps (one Rails, the other async Sinatra) can fairly easily manage the problem of external web service requests by minimizing use of server resources—without abandoning normal, threaded, synchronous Rails. The async Sinatra web app can be a separate business, even a moneymaking one.

This solution uses RabbitMQ, Memcache and PusherApp.

On the one hand, the async Sinatra web dynos act as brokers for external webservice requests. They also have browser-facing functionality for signing up webmasters.

On the other hand, the Rails web dynos don't wait for external webservices, and browsers don't short-poll them.

This attempts to be efficient and robust. It should speed up heavily loaded servers while remaining within the mainstream of the Rails Way as much as possible.

E.g. it tries hard not to Pusherize browsers more than once for the case that a cached response to an external webservice was missed, but relies on browser short-polling after perhaps a 10-second timeout to cover these and other unusual cases.

But in the normal case browser short-polling will be avoided so Rails server response time should be peppy.

It tries to delete its temporary work from memcache but even if something is missed, memcache times out its data eventually so too much garbage won't pile up there.

Note: this is for web services without terribly large responses (thus appropriate for memcaching). Very large responses and non-idempotent services should be handled another way such as supplying them directly to the browser.

Method: the Rails web app dynos immediately use memcached external webservice responses if the URLs match.

Otherwise they push the URL of each external webservice request and an associated PusherApp channel ID (for eventually informing the browser) to a RabbitMQ Exchange.
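As a rough sketch of that Rails-side decision (use the cached response if the URL matches, otherwise queue the request), here is some plain Ruby. The Hash and Array are in-memory stand-ins I'm assuming in place of memcache and the RabbitMQ exchange, and the names `response_key` and `fetch_or_enqueue` are hypothetical.

```ruby
require 'digest'
require 'securerandom'

# Hypothetical in-memory stand-ins for memcache and the RabbitMQ exchange.
CACHE = {}
EXCHANGE = []

# Hash the URL so the key stays short and free of awkward characters.
def response_key(url)
  "ws:response:#{Digest::SHA256.hexdigest(url)}"
end

# Returns the cached body, or nil after queueing the request for a broker.
def fetch_or_enqueue(url, channel_id, cache: CACHE, exchange: EXCHANGE)
  cached = cache[response_key(url)]
  return cached if cached

  exchange << { url: url, channel_id: channel_id }
  nil
end

quote_url = 'http://example.com/quote'
channel_id = SecureRandom.hex(8) # stands in for a PusherApp channel ID

first = fetch_or_enqueue(quote_url, channel_id)  # miss: request is queued
CACHE[response_key(quote_url)] = '{"price": 42}' # a broker caches the response
second = fetch_or_enqueue(quote_url, channel_id) # hit: nothing is queued
```

Hashing the URL is my own choice here, made because memcached limits key length; the scheme itself only requires that matching URLs map to the same key.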

For security purposes, minimal information is passed through PusherApp to the browser (only suggesting a short-poll now, not where).

The Rails web dynos (if necessary) return an incomplete page to the browser as usual (for completion with AJAX).

To cover cases where something got dropped the browser should short-poll the Rails app after a longish timeout—its length should be set by an environment variable and may be shortened to half a second when the Rails website is not terribly active, or when the async Sinatra web dynos are scaled down to off.
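Reading that timeout might look something like the following sketch. The variable name `SHORT_POLL_TIMEOUT_SECONDS` and the helper `short_poll_timeout` are my own inventions, and the 10-second default is only the "perhaps 10 seconds" mentioned above.

```ruby
# Fallback when the (hypothetically named) variable is unset or malformed.
DEFAULT_SHORT_POLL_SECONDS = 10.0

# Accepts any Hash-like object so it can be exercised without touching ENV.
def short_poll_timeout(env = ENV)
  Float(env.fetch('SHORT_POLL_TIMEOUT_SECONDS', DEFAULT_SHORT_POLL_SECONDS))
rescue ArgumentError, TypeError
  DEFAULT_SHORT_POLL_SECONDS
end

default_timeout = short_poll_timeout({})
quiet_site_timeout = short_poll_timeout('SHORT_POLL_TIMEOUT_SECONDS' => '0.5')
garbled_timeout = short_poll_timeout('SHORT_POLL_TIMEOUT_SECONDS' => 'soon')
```

Falling back to the default on a malformed value is a design choice for this sketch; failing loudly at boot would be just as defensible.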

Each async Sinatra web dyno attaches a queue to the Rails app's RabbitMQ exchange for accepting messages without confirmation.

With each queued message, an async Sinatra web dyno:

1. Checks the memcache for the external webservice request (with response). If present, it:

   a. Drops the message. (Some may slip through and be multiply-processed, but that's okay.)
   b. Frees memcache of the request (without response) if it still exists (see below).

2. Memcaches the external webservice request (without response) with the current time (not in the key).
3. If the request times out, drops it in favor of letting the browser handle the problem, but leaves the memcached external webservice request (without response) for later viewing by async Sinatra web dynos.
4. (Usually) receives a response from the external webservice request.
5. Again checks memcache for the external webservice request (combined with the same response). If it's not there:

   a. Pusherizes the appropriate browser. (Some requests may be multiply-processed, but that's okay.)
   b. Memcaches the external webservice request (with response).
   c. Clears from memcache the external webservice request without response.
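The per-message steps above can be sketched in plain Ruby roughly as follows. This is not the eventual implementation: the `Broker` class, its key prefixes and its return symbols are hypothetical, a Hash stands in for memcache, and lambdas stand in for the HTTP client and PusherApp. A real async Sinatra dyno would use non-blocking I/O rather than `Timeout`.

```ruby
require 'timeout'

# Hypothetical broker; cache, fetcher and notifier are injected stand-ins for
# memcache, an HTTP client and PusherApp.
class Broker
  def initialize(cache:, fetcher:, notifier:, timeout_seconds: 5)
    @cache = cache
    @fetcher = fetcher
    @notifier = notifier
    @timeout_seconds = timeout_seconds
  end

  def process(message)
    url = message[:url]
    done_key = "done:#{url}"       # request with response
    pending_key = "pending:#{url}" # request without response

    # Already answered: drop the message and tidy up any stale marker.
    if @cache.key?(done_key)
      @cache.delete(pending_key)
      return :dropped_already_cached
    end

    # Record the attempt; the time is the value, not part of the key.
    @cache[pending_key] = Time.now.to_i

    begin
      body = Timeout.timeout(@timeout_seconds) { @fetcher.call(url) }
    rescue Timeout::Error
      return :timed_out # the pending marker is deliberately left in place
    end

    # Re-check: another dyno may have cached the same response meanwhile.
    unless @cache[done_key] == body
      @notifier.call(message[:channel_id])
      @cache[done_key] = body
      @cache.delete(pending_key)
    end
    :processed
  end
end

cache = {}
pushed = []
broker = Broker.new(cache: cache,
                    fetcher: ->(url) { "body of #{url}" },
                    notifier: ->(channel_id) { pushed << channel_id })
message = { url: 'http://example.com/a', channel_id: 'ch-1' }
first_result = broker.process(message)
second_result = broker.process(message)

slow_cache = {}
slow_broker = Broker.new(cache: slow_cache,
                         fetcher: ->(_url) { sleep 1 },
                         notifier: ->(_channel_id) {},
                         timeout_seconds: 0.05)
slow_result = slow_broker.process({ url: 'http://example.com/slow', channel_id: 'ch-2' })
```

Processing the same message twice notifies the browser only once, and a timed-out request leaves its marker behind for other dynos to see.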

The browser then asks the Rails web dyno to supply all available AJAX updates.

The Rails web dyno returns to the browser a set of still-needed AJAX responses (for further completion with AJAX). The set is usually incomplete: it holds whatever is memcached, and some responses may have been dropped, but that's okay.

Or (if all were memcached) the Rails web dynos return the complete set of outstanding AJAX responses to the browser.
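One way to picture the partial-versus-complete reply is the sketch below. The helper `ajax_updates` and the shape of its return value are assumptions of mine, and a Hash again stands in for memcache.

```ruby
# Hypothetical helper: split the URLs a page still needs into those whose
# responses are already cached and those still outstanding.
def ajax_updates(needed_urls, cache)
  ready, outstanding = needed_urls.partition { |url| cache.key?(url) }
  {
    responses: ready.to_h { |url| [url, cache[url]] },
    outstanding: outstanding,
    complete: outstanding.empty?
  }
end

cache = { 'http://example.com/a' => 'A body' }

# First request: only one of the two responses has been memcached so far.
partial = ajax_updates(['http://example.com/a', 'http://example.com/b'], cache)

# Later a broker caches the second response, and the browser asks again.
cache['http://example.com/b'] = 'B body'
full = ajax_updates(partial[:outstanding], cache)
```

The browser would keep asking only for whatever is still listed as outstanding.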

I'm starting to implement this here, now.

Monday, July 2, 2012

Scale Rails in the cloud by handling external webservice delays, howto

Background:

With Rails, each simultaneous browser request holds on to its server memory until Rails responds. For unthreaded Rails, this is several tens of MB; for threaded Rails or JRuby, it's still substantial.

Webservers often (or usually) queue requests for a relatively small number of Rails instances, perhaps a dozen or two.

Freezing (in I/O wait) thousands of Rails instances for long-delayed webservices (located elsewhere on the Internet) is impracticable.

A naive Rails website design (without AJAX) which obtained all needed results from external services first (before responding to the browser) would provide little server throughput.

Usually for website scalability, developers offload (to other worker programs) what, in a naive or low-traffic design, Rails instances (themselves) might do.

In more scalable and sophisticated designs, Rails responds rapidly to initial requests (thus freeing its instance). Then, AJAX finishes the webpage (i.e., the browser polls the server).

Case:

It is a generally established software principle that events are better than polling.

Webpage content delivered by AJAX can be short-polled from the browser. However, short-polling gives unnecessarily slow response. From a user experience (UX) perspective it is either too slow, especially at peak times, or slower than it could be. Also short-polling loads up servers with extra, running Rails instances (maybe queued up) yet ultimately, most such requests determine (regrettably) there is nothing new.

Furthermore, during each polling request Rails reacquires all relevant information from its database cache (due to the stateless nature of the web). Therefore, each Rails short-polling response takes a terrible amount of server resources, yet only calculates the time for the next poll.

Plan:

Long-polling (or websockets) should be used by Rails websites which access external services.

This can be accomplished efficiently if the browser long-polls not Rails, but an asynchronous webserver such as (Ruby) EventMachine, configured for something other than Rails: for instance, asynchronous Sinatra::Synchrony.

A RabbitMQ exchange (in a cloud environment such as Heroku) then can provide a queue to (all) front-facing asynchronous website server instances (dynos) containing information desired about (all) worker programs performing tasks which Rails instances offload.

Because the notifications don't need to be stored permanently, it's better to use a message system than a database.

Probably the information would simply be notification that each (single) task was complete. The exchange would connect all worker instances to all server instances in an instance-agnostic design typical of the cloud. A notification would include the user's session ID; then each asynchronous webserver could filter the notifications down to the (presently active) long-polling connections it (itself) owns. Browsers can provide the session IDs (perhaps in a header) while setting up the long-poll.
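The filtering step might be pictured like this. It is only a sketch: `LongPollRegistry` is a hypothetical class, and blocks stand in for the open HTTP connections that a real asynchronous webserver would hold.

```ruby
# Hypothetical registry of the long-polls that this one webserver owns.
class LongPollRegistry
  def initialize
    @waiting = Hash.new { |hash, session_id| hash[session_id] = [] }
  end

  # The block stands in for "complete this browser's long-poll".
  def register(session_id, &on_notify)
    @waiting[session_id] << on_notify
  end

  # Every server sees every notification; most are for sessions it doesn't
  # own and are simply ignored. Returns the number of long-polls released.
  def deliver(notification)
    callbacks = @waiting.delete(notification[:session_id])
    return 0 unless callbacks

    callbacks.each { |callback| callback.call(notification) }
    callbacks.length
  end
end

registry = LongPollRegistry.new
released = []
registry.register('session-abc') { |note| released << note[:session_id] }

ignored = registry.deliver({ session_id: 'session-xyz', task: 'done' }) # not ours
matched = registry.deliver({ session_id: 'session-abc', task: 'done' }) # ours
repeat = registry.deliver({ session_id: 'session-abc', task: 'done' })  # already gone
```

A long-poll is released at most once; the browser sets up a new one if it still needs to wait.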

Normally a webserver will keep long-poll HTTP connections open for quite a long time; however, if for any reason a connection has been broken, it doesn't matter much; the RabbitMQ queue (configurably) doesn't keep a message once the webserver has received it (so they won't pile up anywhere). Also if the webserver is restarted, the old queue automatically will be deleted.

This is because, in RabbitMQ, (named) exchanges actually manage the flow of messages; each receiver creates a queue of its own (from that exchange) which receives all the messages (applying to all server instances on the website).
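A toy in-memory model may help here. It is not RabbitMQ client code: `FanoutExchange` is a hypothetical class that only mimics the behavior described above, which I take to be what RabbitMQ calls a fanout exchange. Every bound queue gets its own copy of each message, and a receiver's queue disappears along with the receiver.

```ruby
# Toy model of the behavior described above, not a RabbitMQ client.
class FanoutExchange
  def initialize
    @queues = {}
  end

  # Each receiver creates its own queue from the exchange.
  def bind(receiver_name)
    @queues[receiver_name] = []
  end

  # When a receiver restarts, its old queue (and contents) is discarded.
  def unbind(receiver_name)
    @queues.delete(receiver_name)
  end

  # Every currently bound queue gets its own copy of the message.
  def publish(message)
    @queues.each_value { |queue| queue << message }
  end
end

exchange = FanoutExchange.new
server_one = exchange.bind('webserver-1')
server_two = exchange.bind('webserver-2')

exchange.publish('task 17 complete')
taken = server_one.shift # once received, the message is gone from that queue

exchange.unbind('webserver-2') # webserver-2 restarts
exchange.publish('task 18 complete')
```

After the second publish, only the queue for webserver-1 grows; the discarded queue for webserver-2 still holds just the one message it received earlier.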

Receipt confirmation also is unnecessary. If some messages might be dropped when the server is busy, so what? Nothing much bad will happen further; in that case the user may refresh the webpage—so the scheme is quite tolerant of individual failures.

After getting the new messages, the asynchronous webservers merely return from those long polls. After return, AJAX in the browser knows it can make a short-poll Rails request and be guaranteed of receiving more information for the webpage. Even if the connection merely is broken accidentally, the overall process is still safe, and will result in valid short-poll AJAX results. In other words (for simplicity), the normal way Rails responds to AJAX short-polling should not change.

This cycle of long-poll, short-poll should automatically repeat till Rails effectively tells the AJAX code (in the browser) all the work is done for the page—i.e., till no worker jobs are queued.
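The cycle can be modeled in a few lines. In a browser this would of course be JavaScript; the Ruby below is only a model of the control flow, with lambdas standing in for the two HTTP requests and with hypothetical names throughout (`complete_page`, `fragments`, `done`).

```ruby
# Model of the browser-side cycle: wait on the async server, then ask Rails.
def complete_page(long_poll:, short_poll:, max_cycles: 100)
  fragments = []
  max_cycles.times do
    long_poll.call # returns on notification (or on a broken connection)
    reply = short_poll.call
    fragments.concat(reply[:fragments])
    return fragments if reply[:done] # Rails says no worker jobs remain
  end
  fragments
end

# Simulated server side: one worker job finishes per long-poll.
pending = %w[weather stocks news]
finished = []
cycles = 0

long_poll = lambda do
  cycles += 1
  finished << pending.shift unless pending.empty?
end

short_poll = lambda do
  ready = finished.dup
  finished.clear
  { fragments: ready, done: pending.empty? }
end

page = complete_page(long_poll: long_poll, short_poll: short_poll)
```

The `max_cycles` cap is an addition of mine so that the model cannot loop forever if Rails never reports completion.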

Perhaps (the default schedule of) AJAX short-polling can most easily be put off by increasing (to some large value) the time delay on the first short-poll. Presumably this is configurable in Rails. Long-polling of the other (asynchronous) webserver should be added to Rails page view layouts.

Thin (and Ruby EventMachine) are asynchronous and non-blocking, just like Node.js. They can accept thousands of simultaneous HTTP connections (to browsers), each consuming only a few KB of memory. Thin being based partly on Ruby EventMachine demonstrates the latter's quality.

The job queue for Rails worker programs also probably should be a RabbitMQ exchange, since we're using it.

Various other asynchronous Ruby servers include cramp, goliath, rainbows! and puma.

Actually, probably the best asynchronous webserver for this purpose is the paid service Pusher (or the open-source Slanger equivalent, to keep it in-house).

References:

blog.headius.com/2008/08/qa-what-thread-safe-rails-means.html
confreaks.com/videos/727-rockymtnruby2011-real-time-rack
github.com/igrigorik/async-rails/
github.com/igrigorik/em-synchrony
github.com/jjb/threaded-rails-example
jordanhollinger.com/2011/04/22/how-to-use-thin-effectivly [sic]
www.igvita.com/2009/05/13/fibers-cooperative-scheduling-in-ruby/
thechangelog.com/post/927103350/episode-0-3-1-websockets
www.igvita.com/2010/03/22/untangling-evented-code-with-ruby-fibers/
www.igvita.com/2010/06/07/rails-performance-needs-an-overhaul/
www.igvita.com/2011/03/08/goliath-non-blocking-ruby-19-web-server/
www.tumblr.com/tagged/pusher?before=1323905509
yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/
