Notes from an intuitive programmer: Scale Rails in the cloud by handling external webservice delays, howto

Background:

With Rails, each simultaneous browser request holds on to server memory until Rails responds. For unthreaded Rails, this is several tens of MB per request; for threaded Rails or JRuby, it is still substantial.

Webservers often (or usually) queue requests for a relatively small number of Rails instances, perhaps a dozen or two.

Freezing (in I/O wait) thousands of Rails instances for long-delayed webservices (located elsewhere on the Internet) is impracticable.

A naive Rails website design (without AJAX) which obtained all needed results from external services first (before responding to the browser) would provide little server throughput.

Usually, for website scalability, developers offload to separate worker programs the work that Rails instances themselves might do in a naive or low-traffic design.

In more scalable and sophisticated designs, Rails responds rapidly to initial requests (thus freeing its instance). Then, AJAX finishes the webpage (i.e., the browser polls the server).

Case:

It is a generally established software principle that events are better than polling.

Webpage content delivered by AJAX can be short-polled from the browser. However, short-polling responds more slowly than necessary: from a user experience (UX) perspective it is either too slow, especially at peak times, or at least slower than it could be. Short-polling also loads servers with extra running (or queued) Rails instances, yet most such requests ultimately determine there is nothing new.

Furthermore, because the web is stateless, Rails reacquires all relevant information from its database cache during each polling request. Each Rails short-polling response therefore takes a terrible amount of server resources, yet only calculates the time for the next poll.

Plan:

Long-polling (or websockets) should be used by Rails websites which access external services.

This can be accomplished efficiently if the browser long-polls not Rails but an asynchronous webserver, such as one built on (Ruby) EventMachine and configured for something other than Rails: for instance, asynchronous Sinatra with Sinatra::Synchrony.

A RabbitMQ exchange (in a cloud environment such as Heroku) can then provide a queue to each front-facing asynchronous webserver instance (dyno), carrying the desired information about the worker programs that perform the tasks Rails instances offload.

Because the notifications don't need to be stored permanently, it's better to use a message system than a database.

Probably the information would simply be a notification that a single task was complete. The exchange would connect all worker instances to all server instances, in the instance-agnostic design typical of the cloud. A notification would include the user's session ID; each asynchronous webserver could then filter the notifications down to the presently active long-polling connections it owns. Browsers can provide the session IDs (perhaps in a header) while setting up the long-poll.
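
The filtering step might look something like the sketch below. It is only an illustration in plain Ruby: the class name LongPollRegistry, the helper task_done_message, and the JSON message shape (a 'session_id' field plus an 'event' field) are all assumptions, not anything prescribed by RabbitMQ, EventMachine, or Sinatra.

```ruby
require 'json'

# Hypothetical registry of the long-poll connections that one asynchronous
# webserver instance currently holds open, keyed by session ID.
class LongPollRegistry
  def initialize
    @waiting = {} # session ID => block that completes the long poll
  end

  # Called when a browser opens a long poll and supplies its session ID.
  def register(session_id, &on_complete)
    @waiting[session_id] = on_complete
  end

  # Called for every notification received from the exchange.
  # Notifications for sessions this instance does not own are ignored.
  # Returns true if a waiting long poll was completed, false otherwise.
  def handle(raw_message)
    session_id = JSON.parse(raw_message)['session_id']
    on_complete = @waiting.delete(session_id)
    return false unless on_complete

    on_complete.call
    true
  end

  # Number of long polls still waiting on this instance.
  def waiting_count
    @waiting.size
  end
end

# What a worker might publish when a task finishes (assumed message shape).
def task_done_message(session_id)
  JSON.generate('session_id' => session_id, 'event' => 'task_done')
end
```

In a real server the block passed to register would finish the held HTTP response. Because each entry is removed as soon as it fires, a notification for a session that is no longer waiting (or that belongs to another instance) is simply dropped.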

Normally a webserver will keep long-poll HTTP connections open for quite a long time; however, if for any reason a connection has been broken, it doesn't matter much; the RabbitMQ queue (configurably) doesn't keep a message once the webserver has received it (so they won't pile up anywhere). Also if the webserver is restarted, the old queue automatically will be deleted.

This is because, in RabbitMQ, (named) exchanges actually manage the flow of messages; each receiver creates a queue of its own (from that exchange) which receives all the messages (applying to all server instances on the website).
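
To make that flow concrete, here is a small in-memory model of the behavior described above. It is not a RabbitMQ client and talks to no broker; FanoutExchangeModel and its method names are hypothetical, chosen only to show that every receiver's own queue gets a copy of every message, that delivery empties the queue, and that removing a queue discards whatever was undelivered.

```ruby
# In-memory stand-in for an exchange that copies each message to every
# bound queue. Purely illustrative; a real deployment would use a broker.
class FanoutExchangeModel
  def initialize
    @queues = {} # consumer name => array of undelivered messages
  end

  # Each webserver instance binds a private queue of its own.
  def bind(consumer)
    @queues[consumer] = []
  end

  # Removing the queue discards anything undelivered, as happens when
  # a restarted webserver's old queue is deleted.
  def unbind(consumer)
    @queues.delete(consumer)
  end

  # Every bound queue receives its own copy of the message.
  def publish(message)
    @queues.each_value { |queue| queue << message }
  end

  # Delivery removes the messages; nothing is retained afterwards.
  def deliver_all(consumer)
    queue = @queues.fetch(consumer)
    delivered = queue.dup
    queue.clear
    delivered
  end

  # Names of the consumers that currently have a queue.
  def consumers
    @queues.keys
  end
end
```

With two consumers bound, one publish puts a copy in both queues; after one of them is unbound, later messages reach only the remaining queue and nothing piles up for the absent server.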

Receipt confirmation also is unnecessary. If some messages might be dropped when the server is busy, so what? Nothing much bad will happen further; in that case the user may refresh the webpage—so the scheme is quite tolerant of individual failures.

After getting the new messages, the asynchronous webservers merely return from those long polls. After return, AJAX in the browser knows it can make a short-poll Rails request and be guaranteed of receiving more information for the webpage. Even if the connection merely is broken accidentally, the overall process is still safe, and will result in valid short-poll AJAX results. In other words (for simplicity), the normal way Rails responds to AJAX short-polling should not change.

This cycle of long-poll, short-poll should automatically repeat till Rails effectively tells the AJAX code (in the browser) all the work is done for the page—i.e., till no worker jobs are queued.
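
The control flow of that cycle can be sketched as follows. In practice this loop would live in the browser's AJAX code; it is written here in Ruby only to show the logic. The method name finish_page, its parameters, the max_cycles safety limit, and the shape of the short-poll result (a hash with :content and :done) are all assumptions.

```ruby
# Repeats long-poll then short-poll until the short poll reports that
# no worker jobs remain queued for the page.
#
# long_poll:  callable that returns when the asynchronous server answers
#             or the connection breaks; its return value is ignored.
# short_poll: callable returning a hash such as
#             { content: 'some fragment', done: false }.
# max_cycles: assumed safety limit so the loop cannot run forever.
def finish_page(long_poll:, short_poll:, max_cycles: 100)
  fragments = []
  max_cycles.times do
    long_poll.call                 # wait for "something is ready"
    result = short_poll.call       # normal Rails AJAX request
    fragments << result[:content] if result[:content]
    return fragments if result[:done]
  end
  fragments
end
```

Because the return value of the long poll is ignored, a long poll that ends through a broken connection is handled the same way as one that ends through a notification: the short poll that follows is still valid and may just bring nothing new.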

Perhaps (the default schedule of) AJAX short-polling can most easily be put off by increasing (to some large value) the time delay on the first short-poll. Presumably this is configurable in Rails. Long-polling of the other (asynchronous) webserver should be added to Rails page view layouts.

Thin and Ruby EventMachine are asynchronous and non-blocking, just like Node.js. They can accept thousands of simultaneous HTTP connections to browsers, each consuming only a few KB of memory. That Thin is based partly on Ruby EventMachine demonstrates the latter's quality.
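
A rough back-of-the-envelope comparison shows the scale of the difference. The three figures below are assumptions picked to match the phrases "thousands", "several tens of MB" and "a few KB"; they are not measurements of any server.

```ruby
# All three figures are assumptions chosen only to illustrate the scale.
connections        = 5_000              # "thousands" of waiting browsers
rails_bytes_each   = 30 * 1024 * 1024   # "several tens of MB" per Rails instance
evented_bytes_each = 4 * 1024           # "a few KB" per evented connection

# Converts a byte count to megabytes (1 MB taken as 1024 * 1024 bytes).
def megabytes(bytes)
  bytes / (1024.0 * 1024.0)
end

rails_total_mb   = megabytes(connections * rails_bytes_each)
evented_total_mb = megabytes(connections * evented_bytes_each)

puts format('Waiting inside Rails:    %.1f MB', rails_total_mb)
puts format('Waiting on evented host: %.1f MB', evented_total_mb)
```

Under these assumed numbers, holding the waiting connections inside Rails would need on the order of 150,000 MB, while holding them on an evented server would need about 20 MB.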

The job queue for Rails worker programs also probably should be a RabbitMQ exchange, since we're using it.

Some other asynchronous Ruby servers are Cramp, Goliath, Rainbows! and Puma.

Actually, probably the best asynchronous webserver for this purpose is the paid service Pusher (or the open-source Slanger equivalent, to keep it in-house).

References:

blog.headius.com/2008/08/qa-what-thread-safe-rails-means.html
confreaks.com/videos/727-rockymtnruby2011-real-time-rack
github.com/igrigorik/async-rails/
github.com/igrigorik/em-synchrony
github.com/jjb/threaded-rails-example
jordanhollinger.com/2011/04/22/how-to-use-thin-effectivly [sic]
www.igvita.com/2009/05/13/fibers-cooperative-scheduling-in-ruby/
thechangelog.com/post/927103350/episode-0-3-1-websockets
www.igvita.com/2010/03/22/untangling-evented-code-with-ruby-fibers/
www.igvita.com/2010/06/07/rails-performance-needs-an-overhaul/
www.igvita.com/2011/03/08/goliath-non-blocking-ruby-19-web-server/
www.tumblr.com/tagged/pusher?before=1323905509
yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/

Copyright (c) 2012 Mark D. Blackwell.

Monday, July 2, 2012
