So, there were some generated xml pages that needed caching. After some assorted reading, the basic caches_page approach seemed sufficient.
First step was to tweak config/routes.rb to bring the url params into the path to make the url cacheable. Something like:
map.connect "feed/resorts_and_holidays/:country_name/:how_recent", :controller => 'feed', :action => 'resorts_and_holidays'
Then add the following to app/controllers/feed_controller.rb, to cache the output of the resorts_and_holidays page:
caches_page :resorts_and_holidays
Oh yes, and the following added to config/environments/development.rb to ensure that caching takes place in development:
config.action_controller.perform_caching = true
And it worked! Hurrah! Stuff was duly being cached into public/feed/…
However, on closer inspection, all was not well. All the cache filenames ended with .html, which was a bad thing since they contained xml. Argh. Luckily, better minds than mine had reached the same argh. There was lots of discussion and several alternatives, but it seemed right to plump for altering the map.connect config to explicitly specify a .xml extension:
map.connect "feed/:country_name/:how_recent/resorts_and_holidays.xml", …
And this did indeed work, although now you could not have default values for the url params: country_name and how_recent must be set explicitly, e.g. feed/all/3/resorts_and_holidays.xml. A small price to pay. Hurrah once again.
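For the curious, here is a rough sketch of why the extension in the route matters. It is a simplified imitation of how page caching appears to pick a file name from the request path, written in plain Ruby so it runs without Rails. The names page_cache_file and DEFAULT_EXTENSION are my own stand-ins here, not a quote of the Rails source.

```ruby
# Simplified imitation of page-cache file naming; not the Rails source.
# DEFAULT_EXTENSION is an assumed stand-in for the configured extension.
DEFAULT_EXTENSION = ".html"

def page_cache_file(path)
  # The site root is stored as "index"; otherwise drop any trailing slash.
  name = (path.empty? || path == "/") ? "/index" : path.chomp("/")
  # Only add the default extension when the last segment has none of its own.
  last_segment = name.split("/").last || name
  name += DEFAULT_EXTENSION unless last_segment.include?(".")
  name
end

puts page_cache_file("/feed/resorts_and_holidays/all/3")
# prints /feed/resorts_and_holidays/all/3.html
puts page_cache_file("/feed/all/3/resorts_and_holidays.xml")
# prints /feed/all/3/resorts_and_holidays.xml
```

So with the params at the end of the url there is no extension to keep, and .html gets tacked on; with resorts_and_holidays.xml as the last segment the name is left alone.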
Flushed with success, that cache now needed expiring. The rails-caching-tutorial describes sweeping, and sure enough it does work. The direct approach, though, needed lots of faffing to plonk expire_page into all the relevant places in the models, taking care to ensure the right cache entries were swept for particular calls. It became clear that being precise was not a quick route to happiness.
After further reading of the caching tutorial, and an article about lazy sweeping, it seemed that blatting the entire cache every time new data was written was the way to go. This would not be terribly efficient if you had lots of little writes that each affect only a few cache entries, but with a daily update that more or less rewrites everything, a full sweep was in fact likely to be quicker than a little sweep per model update.
Following the instructions to sweep lazily, the first step was to move the cache directory (presumably to make the recursive file deletion easier/safer), by tweaking config/environment.rb:
config.action_controller.page_cache_directory = RAILS_ROOT + '/public/cache/'
Then define a new sweeper class (by copying the lazy sweeping code) in a new directory, app/sweepers/site_sweeper.rb, not forgetting to add the following to config/environment.rb:
config.load_paths += %W( #{RAILS_ROOT}/app/sweepers )
To hook the sweeper into the controller, add the following into app/controllers/application.rb, in the parent class of the controller to be cached:
cache_sweeper :site_sweeper,
  :only => [ :new, :create, :edit, :update, :destroy, :do_upload_from_file ]
specifying the actions which are to trigger the lazy sweeping.
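As a way of picturing what that :only list does, here is a toy model in plain Ruby. It is not Rails code: ToySweepHook and its after method are made up for illustration, and the real cache_sweeper machinery is considerably more involved. The idea is just that the hook is consulted after every action, but only sweeps for the ones listed.

```ruby
# Toy model of an action-filtered sweep hook; hypothetical, not Rails code.
class ToySweepHook
  def initialize(only:, &sweep)
    @only = only
    @sweep = sweep
  end

  # Called after every action; runs the sweep only for listed actions.
  def after(action)
    @sweep.call if @only.include?(action)
  end
end

sweeps = 0
hook = ToySweepHook.new(
  only: [:new, :create, :edit, :update, :destroy, :do_upload_from_file]
) { sweeps += 1 }

# Two of these four actions are in the list, so two sweeps happen.
[:index, :create, :show, :destroy].each { |action| hook.after(action) }
puts sweeps  # prints 2
```

Read-only actions like index and show leave the cache alone, which is the whole point: only writes should blat it.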
Another opportunity for a hurrah perhaps? No, something was wrong. The cache files were being written with the correct .xml extension, and swept, but were otherwise ignored when a request came in. Every response was still being generated in full rather than being served from cache.
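It turned out, after much delving and hair pulling and googling, that the problem was with the change to config.action_controller.page_cache_directory. Rails only writes page-cache files; it never reads them back. Serving them is left to the web server, which looks for a file matching the url under public/, so files moved to public/cache/ are simply never found.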
One solution was to rely on Apache rewrite rules, so that Apache checks the cache before passing the request on to Rails. This is the way to go in production anyway (and is probably why the various blogs which had suggested changing the cache dir had not noticed or mentioned the problem). A non-Apache option was to leave page_cache_directory on its default setting, and amend the site_sweeper to sweep only the directory public/feed (where feed is the controller being cached). Thus, the caches_page approach was finally working.
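To see why the relocated cache was never served, here is a toy model of the check a front-end web server makes before handing a request to Rails. It is plain Ruby, with a made-up resolve helper and a temporary directory standing in for public/; it is a sketch of the idea, not of any particular server's behaviour.

```ruby
require "tmpdir"
require "fileutils"

# Toy model: if a file matching the url path exists under the document
# root, serve it statically; otherwise fall through to the application.
# resolve is a hypothetical helper, not part of Rails or Apache.
def resolve(document_root, request_path)
  candidate = File.join(document_root, request_path)
  File.file?(candidate) ? [:static, candidate] : [:rails, nil]
end

Dir.mktmpdir do |public_dir|
  url = "/feed/all/3/resorts_and_holidays.xml"

  # Cache written under a relocated directory: the url path does not match.
  moved = File.join(public_dir, "cache", url)
  FileUtils.mkdir_p(File.dirname(moved))
  File.write(moved, "<feed/>")
  p resolve(public_dir, url).first  # prints :rails

  # Cache written directly under the document root: the url path matches.
  default = File.join(public_dir, url)
  FileUtils.mkdir_p(File.dirname(default))
  File.write(default, "<feed/>")
  p resolve(public_dir, url).first  # prints :static
end
```

A rewrite rule is, in effect, a way of telling the server to try the cache/ prefixed path as well; leaving the directory on its default means no such rule is needed.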
There were two further, minor tweaks, paying homage to the spirit of DRY; a last few polishes to the gleaming wonder of cached xml pages:
- refer to the cache root dir in the sweeper class using ActionController::Base.page_cache_directory, rather than hard-coding it.
- use a named route when constructing urls. So, instead of map.connect "…" in config/routes.rb, you can have map.feed_route "…", and construct the url for that route in the controller by invoking feed_route_url(…) instead of using url_for(…).
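The first of these tweaks, together with the public/feed sweep described earlier, might look something like the sketch below. It is not the lazy sweeping code from the article, just my own plain-Ruby approximation: sweep_page_cache is a hypothetical helper, cache_root stands in for ActionController::Base.page_cache_directory, and the guard against deleting outside the cache root is an extra of mine.

```ruby
require "fileutils"
require "tmpdir"

# Sketch of a "sweep everything" helper; hypothetical, not the original code.
# cache_root stands in for ActionController::Base.page_cache_directory,
# subdir is the cached controller's directory beneath it ("feed" here).
def sweep_page_cache(cache_root, subdir)
  root = File.expand_path(cache_root)
  target = File.expand_path(File.join(cache_root, subdir))
  # Refuse to delete anything that is not strictly inside the cache root.
  unless target.start_with?(root + File::SEPARATOR)
    raise ArgumentError, "#{target} is not inside #{root}"
  end
  FileUtils.rm_rf(target)
end

Dir.mktmpdir do |public_dir|
  cached = File.join(public_dir, "feed", "all", "3", "resorts_and_holidays.xml")
  FileUtils.mkdir_p(File.dirname(cached))
  File.write(cached, "<feed/>")
  File.write(File.join(public_dir, "index.html"), "<html/>")

  sweep_page_cache(public_dir, "feed")

  puts File.exist?(cached)                               # prints false
  puts File.exist?(File.join(public_dir, "index.html"))  # prints true
end
```

Sweeping only the feed subdirectory matters once the cache lives directly under public/, since everything else in there (stylesheets, images, the odd static page) is decidedly not cache.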
Looking back on the coding needed to get this working, it really amounts to a few lines here and there, and could have been done by One Who Knows in about 20 mins. Instead, it took the best part of two days. Hopefully this write-up will be of some use to Others Who Come After.