
Wednesday, November 14, 2012

Node.js event loop does not poll

Node.js uses a well-known event loop, but does it work by polling? Some have that impression.

An event loop[1] works by requesting its events from a message pump (per Wikipedia).

Here's how the event loop is implemented: 'Internally, node.js relies on libev to provide the event loop, which is supplemented by libeio[,] which uses pooled threads to provide asynchronous I/O.'[2]

Here's Wikipedia's article on polling[3] and another definition[4].

Now, 'poll' is a system call which asks Unix to check a set of file descriptors:

'poll, ppoll - wait for some event on a file descriptor...If none of the events requested (and no error) has occurred for any of the file descriptors, then poll() blocks until one of the events occurs.'[5]

The system call's name may well have misled people into thinking a userland program is polling. But invoking poll (the Unix system call) is not polling in itself: the caller simply blocks until the kernel reports an event.

Hypothetically, to get information from a message pump, an event loop could employ the Unix system call poll to check a file descriptor to which the message pump writes events.
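To make the blocking-versus-polling distinction concrete, here's a minimal Ruby sketch (Ruby, since the rest of this blog leans that way; this is not node.js internals). It blocks on a file descriptor with IO.select, which wraps the same kernel readiness mechanism as poll(2); the 'pump' thread is a hypothetical stand-in for a message pump:

```ruby
# An "event loop" side blocking on a pipe until the "message pump"
# writes an event. No CPU is spent repeatedly checking; the kernel
# wakes the blocked caller when the descriptor becomes readable.
reader, writer = IO.pipe

# Hypothetical message pump: another thread produces an event later.
pump = Thread.new do
  sleep 0.1
  writer.write("event\n")
end

# This call BLOCKS until the fd is readable -- waiting, not polling.
ready, = IO.select([reader])
event = ready.first.gets
pump.join
puts event  # prints "event"
```

Note there is no loop here repeatedly asking "anything yet?"; the single blocked call is woken exactly once, when data arrives.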

Ultimately, this may be the source of the conceptual confusion here; or perhaps actual polling is simply the easiest method to think of when programming.

In our case in particular, if an event loop calls Unix poll, the event loop is not polling anything. Neither node.js nor any event loop polls the file descriptors; only Unix does (if indeed it even really does, anymore).

Anyway, an event loop such as node.js's does not poll its message pump. It merely makes a (blocking) request to it. Calling just any request 'polling' pollutes the meaning of the word (and that may be happening here).
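The shape of such a loop can be sketched in a few lines of Ruby. Queue#pop blocks until an event is available, so the loop waits on its (stand-in) message pump rather than polling it; the event names are purely illustrative:

```ruby
# An event loop that WAITS on its message pump. Queue#pop blocks
# until something is enqueued -- no busy-wait, no sleep-and-check.
events = Queue.new   # stands in for the message pump

producer = Thread.new do
  events << :timer_fired
  events << :io_ready
  events << :shutdown
end

handled = []
loop do
  event = events.pop          # blocks here until an event exists
  break if event == :shutdown
  handled << event            # dispatch to the event's handler
end
producer.join
```

A polling version would instead sleep some interval and check the queue repeatedly, wasting wakeups whenever nothing had arrived.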

tl;dr – So, let's try not to say anymore that node.js is polling its events, okay? Instead, let's simply say that node.js waits for its events. (A lost cause, I know, but at least I've said it.)

[1] http://en.wikipedia.org/wiki/Event_loop
[2] http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
[3] http://en.wikipedia.org/wiki/Polling_%28computer_science%29
[4] http://whatis.techtarget.com/definition/polling
[5] http://linux.die.net/man/2/poll

Copyright (c) 2012 Mark D. Blackwell.

Monday, July 2, 2012

Scale Rails in the cloud by handling external webservice delays, howto

Background:

With Rails, each simultaneous browser request holds its server memory until Rails responds; for unthreaded Rails, this is several tens of MB. For threaded Rails or JRuby, it's still substantial.

Webservers often (or usually) queue requests for a relatively small number of Rails instances, perhaps a dozen or two.

Freezing (in I/O wait) thousands of Rails instances for long-delayed webservices (located elsewhere on the Internet) is impracticable.

A naive Rails website design (without AJAX) which obtained all needed results from external services first (before responding to the browser) would provide little server throughput.

For website scalability, developers usually offload to other worker programs what Rails instances themselves might do in a naive or low-traffic design.

In more scalable and sophisticated designs, Rails responds rapidly to initial requests (thus freeing its instance). Then, AJAX finishes the webpage (i.e., the browser polls the server).

Case:

It is a generally established software principle that events are better than polling.

Webpage content delivered by AJAX can be short-polled from the browser. However, short-polling gives unnecessarily slow response: from a user experience (UX) perspective it is either too slow (especially at peak times) or slower than it could be. Short-polling also loads servers with extra running (or queued) Rails instances, yet most such requests ultimately discover, regrettably, that there is nothing new.

Furthermore, during each polling request Rails reacquires all relevant information from its database cache (due to the stateless nature of the web). Each Rails short-polling response therefore consumes a terrible amount of server resources, yet calculates only the time for the next poll.
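The waste is easy to quantify with a toy sketch. Suppose an update becomes available only on the tenth poll (a number chosen purely for illustration); every earlier poll pays the full per-request cost for nothing:

```ruby
# Toy model of short-polling waste: each poll pays full request cost
# (re-fetching session state, etc.), but most polls find nothing new.
updates_ready_at = 10   # hypothetical: update appears on the 10th poll
wasted_polls = 0
polls = 0
loop do
  polls += 1
  if polls < updates_ready_at
    wasted_polls += 1   # full request served; nothing to deliver
  else
    break               # finally, something new for the browser
  end
end
```

Here nine of ten requests did real server work only to report "nothing yet"; a long-poll would have made one request that simply waited.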

Plan:

Long-polling (or websockets) should be used by Rails websites which access external services.

This can be accomplished efficiently if the browser long-polls not Rails but an asynchronous webserver built on (Ruby) EventMachine and configured for something other than Rails: for instance, asynchronous Sinatra::Synchrony.

In a cloud environment such as Heroku, a RabbitMQ exchange can then feed a queue to all front-facing asynchronous website server instances (dynos), carrying the desired information about all the worker programs performing tasks that Rails instances offload.

Because the notifications don't need to be stored permanently, a message system is better for this than a database.

Probably the information would simply be a notification that each task was complete. The exchange would connect all worker instances to all server instances, in the instance-agnostic design typical of the cloud. Each notification would include the user's session ID; each asynchronous webserver could then filter the notifications down to the presently active long-polling connections it owns. Browsers can provide the session IDs (perhaps in a header) while setting up the long-poll.
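The fanout-plus-filtering idea can be sketched in plain Ruby. A real deployment would use a RabbitMQ fanout exchange (via an AMQP client such as the Bunny gem), but an in-process stand-in shows the flow; all class, session, and task names here are made up for illustration:

```ruby
# In-process stand-in for a fanout exchange: every bound queue gets
# EVERY message, and each "server instance" filters by session ID
# down to the long-poll connections it owns.
class FakeExchange
  def initialize
    @queues = []
  end

  def bind(queue)
    @queues << queue
  end

  def publish(message)
    @queues.each { |q| q << message }  # fanout: copy to all queues
  end
end

exchange = FakeExchange.new

# Two "webserver instances", each owning some long-poll sessions.
server_a = { queue: Queue.new, sessions: ["s1"] }
server_b = { queue: Queue.new, sessions: ["s2"] }
[server_a, server_b].each { |s| exchange.bind(s[:queue]) }

# "Workers" announce task completion, tagged with the session ID.
exchange.publish({ session_id: "s1", task: "resize_image" })
exchange.publish({ session_id: "s2", task: "fetch_feed" })

# Each server drains its queue, keeping only messages for its sessions.
deliver = lambda do |server|
  received = []
  received << server[:queue].pop until server[:queue].empty?
  received.select { |m| server[:sessions].include?(m[:session_id]) }
end

a_msgs = deliver.call(server_a)
b_msgs = deliver.call(server_b)
```

Note both servers receive both messages; the filtering happens at the receiver, which is what makes the design instance-agnostic.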

Normally a webserver will keep long-poll HTTP connections open for quite a long time. However, if a connection is broken for any reason, it doesn't matter much: a RabbitMQ queue (configurably) discards a message once the webserver has received it, so messages won't pile up anywhere. And if the webserver is restarted, its old queue is deleted automatically.

This is because, in RabbitMQ, (named) exchanges actually manage the flow of messages; each receiver creates a queue of its own from the exchange, and that queue receives all the messages applying to every server instance on the website.

Receipt confirmation is also unnecessary. If some messages are dropped when the server is busy, so what? Nothing much bad will happen; the user can simply refresh the webpage. So the scheme is quite tolerant of individual failures.

After receiving new messages, the asynchronous webservers merely return from those long polls. Upon return, the AJAX code in the browser knows it can make a short-poll Rails request and be guaranteed to receive more information for the webpage. Even if the connection is broken accidentally, the overall process is still safe and will result in valid short-poll AJAX results. In other words (for simplicity), the normal way Rails responds to AJAX short-polling should not change.

This cycle of long-poll, then short-poll should repeat automatically until Rails effectively tells the AJAX code in the browser that all the work is done for the page, i.e., until no worker jobs remain queued.
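The cycle above can be sketched with plain Ruby objects standing in for the async webserver and for Rails; the job count and messages are hypothetical:

```ruby
# Sketch of the long-poll / short-poll cycle. A Queue stands in for
# the exchange-fed notifications; each pop is one long-poll returning.
notifications = Queue.new
pending_jobs  = 2   # hypothetical: two offloaded worker jobs remain

worker = Thread.new do
  2.times { |i| notifications << "job #{i} done" }
end

fetched = []
loop do
  notifications.pop             # long-poll: block until something's new
  pending_jobs -= 1
  fetched << "short-poll result"  # now a Rails request is guaranteed fresh data
  break if pending_jobs.zero?   # Rails reports no jobs remain; stop cycling
end
worker.join
```

Each long-poll return triggers exactly one productive short-poll, and the cycle ends itself once the job count reaches zero.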

Perhaps the default schedule of AJAX short-polling can most easily be put off by increasing the delay before the first short-poll to some large value; presumably this is configurable. Long-polling of the other (asynchronous) webserver should be added to Rails page-view layouts.

Thin (and Ruby EventMachine) are asynchronous and non-blocking, just like Node.js. They can accept thousands of simultaneous HTTP connections to browsers, each consuming only a few KB of memory. That Thin is based partly on Ruby EventMachine demonstrates the latter's quality.

The job queue for Rails worker programs also probably should be a RabbitMQ exchange, since we're using it.

Various other asynchronous Ruby servers are: cramp, goliath, rainbows! and puma.

Actually, the best asynchronous webserver for this purpose is probably the paid service Pusher (or its open-source equivalent Slanger, to keep things in-house).

References:

blog.headius.com/2008/08/qa-what-thread-safe-rails-means.html
confreaks.com/videos/727-rockymtnruby2011-real-time-rack
github.com/igrigorik/async-rails/
github.com/igrigorik/em-synchrony
github.com/jjb/threaded-rails-example
jordanhollinger.com/2011/04/22/how-to-use-thin-effectivly [sic]
www.igvita.com/2009/05/13/fibers-cooperative-scheduling-in-ruby/
thechangelog.com/post/927103350/episode-0-3-1-websockets
www.igvita.com/2010/03/22/untangling-evented-code-with-ruby-fibers/
www.igvita.com/2010/06/07/rails-performance-needs-an-overhaul/
www.igvita.com/2011/03/08/goliath-non-blocking-ruby-19-web-server/
www.tumblr.com/tagged/pusher?before=1323905509
yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/

Copyright (c) 2012 Mark D. Blackwell.