
Benchmarks: acts-as-taggable-on vs PostgreSQL Arrays

While looking at performance optimizations for a Rails project, I noticed this line in my debug console:

  ActsAsTaggableOn::Tag Load (0.5ms)  SELECT "tags".* FROM "tags" INNER JOIN "taggings" ON "tags"."id" = "taggings"."tag_id" WHERE "taggings"."taggable_id" = $1 AND "taggings"."taggable_type" = $2 AND (taggings.context = ('tags'))  [["taggable_id", 103], ["taggable_type", "Prediction"]]

This makes sense: my project uses acts-as-taggable-on to tag models. However, our tagging needs are quite simple, and since we are using PostgreSQL, I wondered whether Postgres array columns might be more efficient. To get a feel for the basic concept, see 41 studio’s writeup.
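To make the difference between the two layouts concrete, here is a plain-Ruby sketch that uses hashes in place of tables. The taggings columns (tag_id, taggable_id, taggable_type, context) mirror the query in the log line above; the sample rows and helper names are purely illustrative. This is not acts-as-taggable-on’s code, just a model of the data each approach has to touch.

```ruby
# Join-table layout: tag names live in one table, links in another.
TAGS = { 1 => 'TAG1', 2 => 'TAG2' } # tags.id => tags.name

TAGGINGS = [ # one row per (tag, tagged record) pair
  { tag_id: 1, taggable_id: 103, taggable_type: 'Prediction', context: 'tags' },
  { tag_id: 2, taggable_id: 103, taggable_type: 'Prediction', context: 'tags' },
  { tag_id: 1, taggable_id: 104, taggable_type: 'Prediction', context: 'tags' }
]

# Rough equivalent of the logged query: filter taggings, then look up each tag by id.
def tags_via_join(taggable_id, taggable_type)
  TAGGINGS.select { |t| t[:taggable_id] == taggable_id && t[:taggable_type] == taggable_type && t[:context] == 'tags' }
          .map { |t| TAGS.fetch(t[:tag_id]) }
end

# Array layout: the tag names are stored on the row itself.
PREDICTIONS = {
  103 => { title: 'example', tags: %w[TAG1 TAG2] },
  104 => { title: 'another', tags: %w[TAG1] }
}

def tags_via_array(id)
  PREDICTIONS.fetch(id)[:tags]
end

p tags_via_join(103, 'Prediction') # ["TAG1", "TAG2"]
p tags_via_array(103)              # ["TAG1", "TAG2"]
```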

However, before going through all the trouble, I’d like to see whether the performance gains are appreciable. Ruby’s standard Benchmark library makes this pretty easy to measure from a rake task.

Full source code is available at https://github.com/adamnengland/rails-tag-bench, or you can follow along step by step for the full experience.

Getting Started
You’ll need

  • Rails 4.0.2
  • Ruby 2.0.0
  • Postgres.app on OS X – though you can certainly modify this to work with any postgres install

Create a new Rails project

rails new rails-tag-bench
cd rails-tag-bench

Open the Gemfile and add:
gem 'pg', '0.17.1'

(I had to do this first: gem install pg -- --with-pg-config=/Applications/Postgres93.app/Contents/MacOS/bin/pg_config)

Then run bundle install to get your dependencies.

Replace config/database.yml with

development:
  adapter: postgresql
  encoding: unicode
  database: rails_tag_bench
  pool: 5
  username: rails_tag_bench
  password:
  
test:
  adapter: sqlite3
  database: db/test.sqlite3
  pool: 5
  timeout: 5000

production:
  adapter: sqlite3
  database: db/production.sqlite3
  pool: 5
  timeout: 5000

We’ll need a database user, so open psql and run:

create user rails_tag_bench with SUPERUSER;

Okay, let’s create the database:

rake db:create

In psql, type
\c rails_tag_bench
to confirm that the database is set up.

To do this, we’ll also need acts-as-taggable-on, so update the Gemfile:

gem 'acts-as-taggable-on', '2.4.1'

and run bundle install, then generate and run its migration:

rails g acts_as_taggable_on:migration
rake db:migrate

Let’s start with the taggable version:
rails g model ArticleTaggable title:string body:text
rake db:migrate

Open the generated app/models/article_taggable.rb and edit it:

class ArticleTaggable < ActiveRecord::Base
  acts_as_taggable
end

Let’s set up the benchmark:

rails g task bench

Fill out the body of the generated lib/tasks/bench.rake like so:

require 'benchmark'
namespace :bench do
  # Create 1,000 articles, each tagged TAG1, via acts-as-taggable-on
  task writes: :environment do
    Benchmark.bmbm do |x|
      x.report("Benchmark 1") do
        1_000.times do
          ArticleTaggable.create(:title => ('a'..'z').to_a.shuffle[0,8].join, :body => ('a'..'z').to_a.shuffle[0,100].join, :tag_list => ['TAG1'])
        end
      end
    end
  end

  # Load 1,000 random articles along with their tags
  task reads: :environment do
    Benchmark.bmbm do |x|
      x.report("Benchmark 1") do
        1_000.times do
          ArticleTaggable.includes(:tags).find_by_id(Random.new.rand(1000..2000))
        end
      end
    end
  end
end
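One quirk in the task above: ('a'..'z').to_a has only 26 elements, so ('a'..'z').to_a.shuffle[0,100].join yields a 26-character body, not 100. It doesn’t skew the comparison later on, since both models receive the same kind of data, but if you want bodies of a specific length, a small helper does it. random_letters is a hypothetical name, not part of the original project:

```ruby
LETTERS = ('a'..'z').to_a

# Hypothetical helper: n random lowercase letters, sampled with replacement,
# so any length works (a shuffled 26-letter array can't exceed 26 letters).
def random_letters(n)
  Array.new(n) { LETTERS.sample }.join
end

puts LETTERS.shuffle[0, 100].join.length # 26, as in the original expression
puts random_letters(100).length          # 100
```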

You can run the benchmarks like so:

rake db:reset
rake bench:writes
rake bench:reads

This should produce output like the following:

➜  rails-tag-bench  rake bench:writes
Rehearsal -----------------------------------------------
Benchmark 1   8.620000   0.340000   8.960000 ( 10.716852)
-------------------------------------- total: 8.960000sec

                  user     system      total        real
Benchmark 1   8.540000   0.320000   8.860000 ( 10.543746)
➜  rails-tag-bench  rake bench:reads
Rehearsal -----------------------------------------------
Benchmark 1   2.930000   0.160000   3.090000 (  3.906484)
-------------------------------------- total: 3.090000sec

                  user     system      total        real
Benchmark 1   2.880000   0.150000   3.030000 (  3.825437)

So, on my MacBook Air, acts-as-taggable-on wrote 1,000 records in 10.5437 seconds and read 1,000 records in 3.8254 seconds (the real, wall-clock column of the measured pass).
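Benchmark.bmbm runs each report twice: a rehearsal pass intended to stabilize the runtime environment (mainly garbage collection), then the measured pass, and it returns one Benchmark::Tms per report from that second pass. Here is a self-contained sketch that reads those values programmatically; the two toy reports are illustrations only, not part of the tag benchmark.

```ruby
require 'benchmark'

letters = ('a'..'z').to_a

# bmbm prints a rehearsal table and a measured table, and returns one
# Benchmark::Tms per report from the measured (second) pass.
results = Benchmark.bmbm do |x|
  x.report('shuffle + slice') { 1_000.times { letters.shuffle[0, 8].join } }
  x.report('sample')          { 1_000.times { letters.sample(8).join } }
end

results.each do |tms|
  # tms.real is wall-clock seconds; tms.total is total CPU time
  puts format('%-16s real %.4fs  cpu %.4fs', tms.label, tms.real, tms.total)
end
```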

Now, let’s implement the example using Postgres arrays and see where we land.

rails g model ArticlePa title:string body:text tags:string

Edit the generated migration as follows:

class CreateArticlePas < ActiveRecord::Migration
  def change
    create_table :article_pas do |t|
      t.string :title
      t.text :body
      t.string :tags, array: true, default: []

      t.timestamps
    end
  end
end

Then run the migration:

rake db:migrate
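With array: true, Rails’ PostgreSQL adapter converts between a Ruby array like ['TAG1'] and a Postgres array value, so you never write the array syntax yourself. Purely to illustrate what ends up in the column, here is a minimal hypothetical encoder for one-dimensional string arrays in Postgres’s {...} literal syntax; it is not the adapter’s code and ignores NULL elements and nesting.

```ruby
# Hypothetical encoder for PostgreSQL's array literal syntax ({...}).
# Each element is double-quoted; embedded quotes and backslashes are
# escaped with a backslash. NULLs and nested arrays are not handled.
def pg_array_literal(strings)
  quoted = strings.map { |s| '"' + s.gsub(/["\\]/) { |c| "\\#{c}" } + '"' }
  "{#{quoted.join(',')}}"
end

puts pg_array_literal(['TAG1'])        # {"TAG1"}
puts pg_array_literal([])              # {}
puts pg_array_literal(['TAG1', 'a"b']) # {"TAG1","a\"b"}
```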

Update our benchmarking code:

require 'benchmark'
namespace :bench do
  # Create 1,000 tagged articles with each approach
  task writes: :environment do
    Benchmark.bmbm do |x|
      x.report("Using Taggable") do
        1_000.times do
          ArticleTaggable.create(:title => ('a'..'z').to_a.shuffle[0,8].join, :body => ('a'..'z').to_a.shuffle[0,100].join, :tag_list => ['TAG1'])
        end
      end
      x.report("Using Postgres Arrays") do
        1_000.times do
          ArticlePa.create(:title => ('a'..'z').to_a.shuffle[0,8].join, :body => ('a'..'z').to_a.shuffle[0,100].join, :tags => ['TAG1'])
        end
      end
    end
  end

  # Load 1,000 random articles; the array version loads tags with the row itself
  task reads: :environment do
    Benchmark.bmbm do |x|
      x.report("Using Taggable") do
        1_000.times do
          ArticleTaggable.includes(:tags).find_by_id(Random.new.rand(1000..2000))
        end
      end
      x.report("Using Postgres Arrays") do
        1_000.times do
          ArticlePa.find_by_id(Random.new.rand(1000..2000))
        end
      end
    end
  end
end

Then reset the database and run both benchmarks again:

rake db:reset
rake bench:writes
rake bench:reads

The Results

➜  rails-tag-bench  rake bench:writes
Rehearsal ---------------------------------------------------------
Using Taggable          8.520000   0.330000   8.850000 ( 10.532700)
Using Postgres Arrays   1.460000   0.110000   1.570000 (  2.082705)
----------------------------------------------- total: 10.420000sec

                            user     system      total        real
Using Taggable          8.340000   0.310000   8.650000 ( 10.221277)
Using Postgres Arrays   1.410000   0.110000   1.520000 (  2.012559)

➜  rails-tag-bench  rake bench:reads
Rehearsal ---------------------------------------------------------
Using Taggable          2.920000   0.160000   3.080000 (  3.898911)
Using Postgres Arrays   0.420000   0.060000   0.480000 (  0.700684)
------------------------------------------------ total: 3.560000sec

                            user     system      total        real
Using Taggable          2.870000   0.140000   3.010000 (  3.805598)
Using Postgres Arrays   0.400000   0.060000   0.460000 (  0.677917)

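For my money, Postgres arrays are much faster, which comes as little surprise: in the measured pass, about 5x faster for writes (2.01s vs. 10.22s for 1,000 records) and about 5.6x faster for reads (0.68s vs. 3.81s). Storing the tags on the row itself avoids the extra queries against the tags and taggings tables, including the join shown at the top of this post.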
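The ratios are easy to check from the measured (second-pass) real times above. Since each report performs 1,000 operations, total seconds also read directly as milliseconds per operation:

```ruby
# Measured-pass wall-clock times (seconds for 1,000 operations), from the results above
TIMES = {
  'writes' => { taggable: 10.221277, arrays: 2.012559 },
  'reads'  => { taggable: 3.805598,  arrays: 0.677917 }
}

def speedup(slow, fast)
  slow / fast
end

TIMES.each do |op, t|
  # seconds per 1,000 operations == milliseconds per operation
  puts format('%-6s %.2fx faster (%.2f ms vs %.2f ms per operation)',
              op, speedup(t[:taggable], t[:arrays]), t[:taggable], t[:arrays])
end
# writes 5.08x faster (10.22 ms vs 2.01 ms per operation)
# reads  5.61x faster (3.81 ms vs 0.68 ms per operation)
```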

However, it’s important to note that this isn’t an apples-to-apples comparison. Acts-As-Taggable-On provides a lot of functionality that simple arrays do not. More importantly, array columns lock you into PostgreSQL, which may or may not be a problem for you. Still, if your tagging needs really are simple, the performance improvements might be worth it.
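As one example of what you give up: acts-as-taggable-on can report tag counts for you, whereas with arrays you compute them yourself. In SQL that would be something like SELECT tag, count(*) FROM article_pas, unnest(tags) AS tag GROUP BY tag. The plain-Ruby sketch below does the same aggregation over hypothetical rows, plus a membership filter roughly equivalent to WHERE 'TAG1' = ANY (tags). Neither helper comes from the original project.

```ruby
# Hypothetical rows from article_pas, reduced to the id and tags columns
ROWS = [
  { id: 1, tags: %w[TAG1] },
  { id: 2, tags: %w[TAG1 TAG2] },
  { id: 3, tags: [] }
]

# Tag counts: flatten every tags array and count occurrences
# (each_with_object keeps this compatible with Ruby 2.0; newer Rubies have Array#tally).
def tag_counts(rows)
  rows.flat_map { |r| r[:tags] }.each_with_object(Hash.new(0)) { |tag, counts| counts[tag] += 1 }
end

# Rows carrying a given tag, roughly WHERE 'TAG2' = ANY (tags)
def tagged_with(rows, tag)
  rows.select { |r| r[:tags].include?(tag) }
end

p tag_counts(ROWS)
p tagged_with(ROWS, 'TAG2').map { |r| r[:id] } # [2]
```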


A Tale of 2 Authentications

I’ve recently had the opportunity to work with the REST APIs for two great services, Dropbox & SugarSync.

For the uninitiated, SugarSync and Dropbox are two competing cloud storage providers. While Dropbox seems to have better name recognition, SugarSync offers a better pricing model and great support across different operating systems and mobile platforms. However, I’m not here to talk about the merits of the products; I’m here to talk about the APIs, specifically the authentication methods they use, and what other API developers can learn from them.

Authentication

Dropbox – The first obstacle with a service like Dropbox is authentication. Dropbox uses oAuth v1. While I usually find oAuth documentation confusing (and Dropbox is no exception), it is pretty easy to master with modern toolkits. Within a few minutes, I was able to get my Dropbox request token (in Ruby):

require 'dropbox_sdk'

dropbox_session = DropboxSession.new('myKey', 'mySecret')
@dropbox_request_token = dropbox_session.get_request_token

As is typical with oAuth, I then send the user to Dropbox with this request token; Dropbox asks the user to authorize it, and I exchange the authorized request token for my access token.

SugarSync – This one is a little different. SugarSync avoids the oAuth route and instead provides a simple authentication method: you make a call providing the user’s credentials along with your app key and secret, and SugarSync responds with an authorization token. As opposed to oAuth, I just have to make a single POST call to https://api.sugarsync.com/authorization, and I’m done in one step. Cool, huh?

What’s wrong with SugarSync’s authentication model?

I see this time after time:

  1. Developer creates API
  2. Developer wants to avoid storing user names and passwords, so developer looks at access-token based authentication.
  3. Developer finds oAuth to be too cumbersome.
  4. Developer creates one-off token based authentication.
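Step 4 usually ends up looking something like the hypothetical sketch below: trade a username and password for a random token once, then accept the token on later calls. The class and method names are made up for illustration, and a real service would also hash stored passwords, expire tokens, and compare secrets in constant time. Notice that authorize still receives the raw password from the client.

```ruby
require 'securerandom'

# Hypothetical one-off token scheme (illustration only).
class TokenAuthenticator
  def initialize(users)
    @users  = users # username => password (plain text only to keep the sketch short)
    @tokens = {}    # issued token => username
  end

  # Exchange credentials for a random token; nil if they don't match.
  def authorize(username, password)
    return nil unless @users.key?(username) && @users[username] == password
    token = SecureRandom.hex(16)
    @tokens[token] = username
    token
  end

  # Resolve a token presented on a later request.
  def user_for(token)
    @tokens[token]
  end
end

auth  = TokenAuthenticator.new({ 'alice' => 's3cret' })
token = auth.authorize('alice', 's3cret') # the client had to hold the password here
puts auth.user_for(token)                 # alice
```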

The error is the idea that non-oAuth, token-based authentication is any more secure than simply using HTTP Basic Authentication (the simplest of all authentication schemes) over SSL. The argument goes like this:

Token Authentication Defender: If we use HTTP Basic Authentication, the API clients will either have to store the user name and password, or ask the user for them each time they access the service. We don’t want the clients to store the password, to avoid potential exposure.

Me: How are you going to stop me from storing the password? If I have to get the credentials from the user once and send them to an “Authorize” call, what is going to stop me from saving them into my database? Nothing.
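For comparison, here is what HTTP Basic Authentication actually puts on the wire: the credentials joined with a colon and Base64-encoded, which is an encoding rather than encryption. That is why SSL is non-negotiable with Basic Auth. The sketch uses Ruby’s standard net/http library and only builds a request; the URL is a placeholder and nothing is sent.

```ruby
require 'net/http'
require 'uri'

# "Basic " + Base64(username:password); pack('m0') is strict Base64 with no newlines
def basic_auth_header(username, password)
  'Basic ' + ["#{username}:#{password}"].pack('m0')
end

# Net::HTTP can set the same header for you (placeholder URL, request never sent)
request = Net::HTTP::Get.new(URI('https://api.example.com/files'))
request.basic_auth('Aladdin', 'open sesame')

puts request['Authorization']                         # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
puts 'QWxhZGRpbjpvcGVuIHNlc2FtZQ=='.unpack('m').first # Aladdin:open sesame
```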

Some Simple Guidelines

In this particular situation, I’d suggest that the SugarSync team move to an oAuth service. It really is bad practice for their third-party apps to gain access to their users’ credentials. oAuth, while slightly more complicated, provides the separation of concerns that this kind of API needs: SugarSync should worry about username/password verification, and your app should simply sign its requests appropriately.
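“Signing the requests” is less mysterious than it sounds. Below is a minimal sketch of OAuth 1.0’s HMAC-SHA1 signature as described in RFC 5849: percent-encode and sort the parameters, build a signature base string, and HMAC it with the consumer and token secrets. It is generic, not Dropbox’s or SugarSync’s code (OAuth 1.0 also allows a PLAINTEXT method over HTTPS); the URL and parameter values are placeholders, and it assumes the caller passes an already-normalized base URL plus the full set of oauth_* and request parameters. In practice an OAuth library handles this for you.

```ruby
require 'openssl'

# RFC 3986 unreserved characters, which OAuth 1.0 leaves unescaped
UNRESERVED = (('A'..'Z').to_a + ('a'..'z').to_a + ('0'..'9').to_a + %w[- . _ ~]).map(&:ord)

# Percent-encode every other byte as %XX (uppercase hex), per RFC 5849 section 3.6
def oauth_escape(value)
  value.to_s.bytes.map { |b| UNRESERVED.include?(b) ? b.chr : format('%%%02X', b) }.join
end

# METHOD & encoded base URL & encoded, sorted "key=value" parameter string
def signature_base_string(http_method, base_url, params)
  normalized = params.map { |k, v| [oauth_escape(k), oauth_escape(v)] }
                     .sort
                     .map { |k, v| "#{k}=#{v}" }
                     .join('&')
  [http_method.upcase, oauth_escape(base_url), oauth_escape(normalized)].join('&')
end

# HMAC-SHA1 keyed with "consumer_secret&token_secret", Base64-encoded
def hmac_sha1_signature(base_string, consumer_secret, token_secret = '')
  key = "#{oauth_escape(consumer_secret)}&#{oauth_escape(token_secret)}"
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA1'), key, base_string)
  [digest].pack('m0')
end

base = signature_base_string('post', 'https://api.example.com/1/files',
                             { 'oauth_consumer_key' => 'myKey', 'oauth_nonce' => 'abc123',
                               'oauth_signature_method' => 'HMAC-SHA1',
                               'oauth_timestamp' => '1343700000', 'oauth_version' => '1.0' })
puts base
puts hmac_sha1_signature(base, 'mySecret')
```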

As for the APIs that I’ll design in the future, I’ll probably follow these (overly general) guidelines:

  1. Is the API intended for a large number of third-party apps to access? Is it essentially useless without the user providing credentials? In these cases, oAuth seems like a winner.
  2. Is this an API mostly for use between various products within the same company? Do I feel comfortable with the client applications passing through a password? In this case, HTTP Basic Auth is the easy way out, both for the API designer and consumer. Just make sure you use SSL.
  3. Do I have a really, really, really good reason to build a brand new authentication method? Is my problem really unique compared to what Facebook, Twitter, or Dropbox do? Am I prepared to open source some code and write about my solution to prevent it from becoming a one-off monster that the team will regret? If so, maybe, just maybe, I’ll write a new authentication method…

Update 7-31-2012: Since this was written, SugarSync has changed their API to remove the need to store a user’s password. Definitely a step in the right direction. You can see the process here.
