About 15 years ago I made an R:Base application that tracked my collection of music LPs and CDs. It was a 3rd normal form database with a menu-driven front end written using the R:Base 3GL. It tracked recordings: composer, piece, performers, location, etc. Also tracked reviewers - myself and a couple of fellow audiophile friends and our equipment. Some reviewers (including myself) had more than one audio system and reviewed recordings separately on each. Reviews were a M-M link table from recordings to reviewers. Each had numerical ratings for sound & performance, as well as text notes. Basically, it was a simple example of a normalized database, with transactional access and a front end interface.
BACK TO THE FUTURE: BUILDING THE RAILS APP
Recently, as an exercise in learning Ruby & Rails, I decided to resurrect this app in Rails. First I got Rails working with Postgres, since I preferred it to MySQL. I defined the Ruby objects and schema using the Rails API to define the 1-M and M-M relationships, auto-created a Rails app, and ported the data from my old application into the new one. I defined some basic features such as clickable tags to sort the data, automatic aggregatation of data to compute and display average ratings, simple wizard style step by step forms for entering new recordings & reviews, etc. So far so good.
THE PROBLEM
The problem was, with only about 400 recordings in the system, the main screen was issuing hundreds of SQL selects and taking about 4 seconds to display. The problem was the way Rails was automatically joining recordings, reviewers and reviews from the Ruby data model to the M-M relationship in the DB.
What was happening was this: I gave each recording 2 read-only computed fields: avg_performance and avg_sound. These were the averages of the numerical ratings for performance & sound for each recording. To compute this, it has to read each review linked to each recording. In SQL, this means for each recording, it finds all related reviews. All my primary and foreign keys were integers and indexed in the RDB, so the join performance should have been good. But looking at the Rails console output when I use the app, I saw that Rails wasn't forming a join. For each recording it issued a separate SQL select to fetch the related reviews. What should have been a single SQL join select, instead was hundreds of individual SQL commands.
The problem in greater detail: Rails data objects inherit a find() command (among many other things) through ActiveRecord::Base. By default, to display the list of recordings the app says: Audiorec.find(:all). This Audiorec class uses has_many/:through declarations for its relation to reviews and reviewers. But Rails optimizes the loading of records by only loading the base object - in find(:all), Rails does not load any objects referenced by has_many/:through declarations. Instead, it loads them on demand when they are accessed. Problem is, in my application they are always going to be used (since they are needed to sort the records), so this optimization kills performance.
THE SOLUTION
My first reaction was to cache the data, but I didn't really like this because (A) it opened a Pandora's box of data sync/stale problems and (B) it was a lot of custom coding that shouldn't be necessary in a system like Rails, and (C) it undermined the Rails framework, which should have support for what I want to do.
It seemed a better idea to find a way force Rails to load the child objects up front.
That is, instead of issuing hundreds of individual SQL selects,
issue a single SQL join that loads every recording linked with all its reviews.
This was a daunting task, as the Rails API docs have hundreds of classes
each with a multitude of methods, parameters and options.
Instead of wasting hours groping blind through these API docs,
I Googled a few keywords and found an article from Oracle about optimizing Rails for their database.
Long story short (if it's not already too late to say that),
it turned out to be a single parameter to the ActiveRecord::Base.find() method.
It now says: ActiveRecord.find(:all, :include=>[:reviews])
This tells Rails to load the :reviews field/property/object along with each Audiorec.
The schema definition (create_audiorec.rb) and class (audiorec.rb) files
tell Rails everything it needs to know in order to form the SQL join.
The bottom line: hundreds of individual SQL commands became a single SQL join. Performance was about 10x faster and the main screen went to about half-second response time with about 400 recordings each with 1-3 reviews.
NOTES
Rails is a huge system and Ruby is a language so terse it can appear downright cryptic. The Ruby language docs are poorly organized and the Rails documentation is much like Unix "man" pages - it assumes you already know what the system can do and only explains the details of how to do it.
Case in point: this experience. A simple feature that is buried so deeply in cryptic documentation that it takes a 3rd party review/tutorial to reveal.
I'm normally a guy who likes to learn systems by first getting a high level idea what the system does and how it's put together, then working from the bottom up actually building stuff. But learning Rails this way by reading API docs would be futile as the system is so large & complex. The 'net is rife with tutorials and examples written by people who have learned the system. This is an invaluable aid even to those of us coming to Ruby/Rails with decades of experience in other systems.