I recently encountered a MySQL table backing an ActiveRecord model that grew so unexpectedly fast that it ran out of space. Our engineering team did what anybody would do in an emergency: moved all the existing data somewhere with more space and created a new, empty table so we wouldn’t lose new writes.
Due to the nature of this data and the code that references it, we couldn’t immediately move all queries to the new storage location. We had to keep both datasets for a while, and we needed existing references to the ActiveRecord model to just keep working.
Note: Our “fix” was purely temporary. What I’m about to describe is a gross hack that got us through the immediate crisis, and we started work right away on properly collapsing these two datasets into one with a better runway for future growth. It’s a useful example of Ruby’s flexibility but please don’t put what follows into production unless you absolutely have to.
What we did was swap out the ActiveRecord model for a bare class that delegates the messages it receives to both of the MySQL datastores, presenting a unified view of two totally different tables on two different hosts.
Here’s how we did it.
Delegating everything
All of our code was referencing TheModel and expecting it to behave like ActiveRecord. So the first step was to hide the actual model as an inner class of a class that inherits from BasicObject (which has almost no methods defined). Then we created a second inner class that referenced the new dataset. Then we took all the existing, shared behavior from the original model and put it in a module to be included by both.
class TheModel < BasicObject
  module ExistingBehavior       # methods that used to be inside TheModel
    def self.included(model)
      model.class_eval do       # method calls that used to be in the class
        # e.g. `belongs_to :user`
      end
    end
  end

  class New < ::ActiveRecord::Base
    establish_connection "original_datastore" # Connect to the right db
    self.table_name = :records
    include ExistingBehavior
  end

  class Old < ::ActiveRecord::Base
    include ExistingBehavior
    establish_connection "other_datastore"    # Connect to the right db
    self.table_name = :records_other
  end
end
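The included hook is what lets one module carry both the instance methods and the class-level declarations. Here is a minimal plain-Ruby sketch of that mechanism. FakeBase, FakeNew, FakeOld and display_name are hypothetical stand-ins invented for illustration (FakeBase only records what was declared), not ActiveRecord and not our real models.

```ruby
# Hypothetical stand-in for ActiveRecord::Base: it only records which
# associations were declared through the `belongs_to` class macro.
class FakeBase
  def self.belongs_to(name)
    associations << name
  end

  def self.associations
    @associations ||= []
  end
end

module ExistingBehavior
  # Instance methods that used to live in the model go here.
  def display_name
    "record #{id}"
  end

  # Ruby calls this hook with the including class, so class-level
  # declarations can be replayed on every model that includes the module.
  def self.included(model)
    model.class_eval do
      belongs_to :user
    end
  end
end

class FakeNew < FakeBase
  include ExistingBehavior
  attr_accessor :id
end

class FakeOld < FakeBase
  include ExistingBehavior
  attr_accessor :id
end

p FakeNew.associations # => [:user]
p FakeOld.associations # => [:user]
```

Both classes end up with the same declarations and the same instance methods, while each keeps its own connection and table name.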
Reading from two places
What we need is a way to make calls like TheModel.all or TheModel.where(some_condition).all actually read from two different datastores. It’s not too hard to proxy to just one datastore because all we need to do is delegate a few calls on the TheModel class to the inner model:
class TheModel < BasicObject
  class << self
    delegate :all,
             :where,   # This isn't a complete list
             :first,   # of query and creation methods,
             :last,    # but it'll do for now
             :new,
             :create,
             :create!,
             :to => :'TheModel::New'
  end

  # Top-level constants need a leading `::` inside a BasicObject subclass
  class New < ::ActiveRecord::Base
  end
end
Now if you run TheModel.all you’re actually calling TheModel::New.all. And any chained messages you add on to the result of that call will go to the right place.
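To make the mechanics concrete without ActiveSupport, here is a rough plain-Ruby sketch of what that delegation amounts to: one forwarding method per name, defined on the class itself. FakeNew and TheModelSketch are hypothetical names made up for this example, and define_method is swapped in for delegate.

```ruby
# Hypothetical stand-in for TheModel::New; real code would hit the database.
class FakeNew
  ROWS = [{ id: 1 }, { id: 2 }].freeze

  def self.all
    ROWS
  end

  def self.first
    ROWS.first
  end
end

class TheModelSketch < BasicObject
  class << self
    # Plain-Ruby equivalent of `delegate ..., :to => ...`: define one
    # forwarding class method per name.
    %i[all first].each do |name|
      define_method(name) do |*args, &block|
        # Leading `::` because top-level constants are not visible from
        # inside a BasicObject subclass.
        ::FakeNew.public_send(name, *args, &block)
      end
    end
  end
end

p TheModelSketch.all.length # => 2
p TheModelSketch.first[:id] # => 1
```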
But what we really want is to have TheModel.all return TheModel::New.all + TheModel::Old.all. Let’s start with a naive approach:
class TheModel < BasicObject
  class << self                        # Keep delegating constructor methods
    delegate :new, :create, :create!,  # to the new dataset
             :to => :'TheModel::New'
  end

  def self.method_missing(method_name, *args, &block) # all other calls get sent to
    New.send(method_name, *args, &block) +            # both models and joined
      Old.send(method_name, *args, &block)
  end
end
But that assumes that every operation immediately concatenates two values. That’s not true: sometimes we want to chain up successive where calls. To make this possible we need some kind of intermediate object that holds state until we’re ready to concatenate the results:
class TheModel < BasicObject
  def self.method_missing(method_name, *args, &block)
    FindInTwoPlaces.new(New, Old).send(method_name, *args, &block)
  end

  class FindInTwoPlaces
    def initialize(new_dataset, old_dataset)
      @new_dataset, @old_dataset = new_dataset, old_dataset
    end

    def all
      @new_dataset.all + @old_dataset.all
    end

    def method_missing(method_name, *args, &block)
      FindInTwoPlaces.new(                              # All calls except `all`
        @new_dataset.send(method_name, *args, &block),  # just create a new instance
        @old_dataset.send(method_name, *args, &block)
      )
    end
  end
end
This FindInTwoPlaces class is initialized with two datastores. Initially that’s just the bare models representing the complete datastores. But as we chain messages each datastore object gets further refined. If you call TheModel.where("id=1") you get a FindInTwoPlaces instance that contains TheModel::New.where("id=1") and TheModel::Old.where("id=1"). When you then call .all on that object they get concatenated together.
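The chaining behavior is easier to see with something you can run without a database. The following is only a sketch: FakeRelation is a hypothetical array-backed stand-in for an ActiveRecord relation, and its hash-based where is an assumption made for illustration.

```ruby
# Hypothetical stand-in for an ActiveRecord relation: wraps an array of hashes.
class FakeRelation
  def initialize(rows)
    @rows = rows
  end

  # Returns a narrowed relation, like ActiveRecord's chainable `where`.
  def where(conditions)
    FakeRelation.new(
      @rows.select { |row| conditions.all? { |key, value| row[key] == value } }
    )
  end

  def all
    @rows
  end
end

class FindInTwoPlaces
  def initialize(new_dataset, old_dataset)
    @new_dataset, @old_dataset = new_dataset, old_dataset
  end

  # Terminal call: this is where the two halves finally get concatenated.
  def all
    @new_dataset.all + @old_dataset.all
  end

  # Every other call refines both halves and wraps them in a fresh proxy.
  def method_missing(method_name, *args, &block)
    FindInTwoPlaces.new(
      @new_dataset.public_send(method_name, *args, &block),
      @old_dataset.public_send(method_name, *args, &block)
    )
  end

  def respond_to_missing?(method_name, include_private = false)
    @new_dataset.respond_to?(method_name, include_private) || super
  end
end

new_rows = FakeRelation.new([{ id: 3, user: "ann" }, { id: 4, user: "bob" }])
old_rows = FakeRelation.new([{ id: 1, user: "ann" }, { id: 2, user: "cat" }])

anns = FindInTwoPlaces.new(new_rows, old_rows).where(user: "ann")
p anns.all.map { |row| row[:id] } # => [3, 1]
```

Each where call returns another proxy holding two narrowed relations. Nothing is merged until all runs, and the rows from the new dataset come first.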
So far so good, but we’re getting a lot of complexity for just the .all class method. We can do better by defining all of the final methods that we will want to use and proxying all other messages to a new intermediate object. For now let’s just implement all, count, and to_a:
class TheModel < BasicObject
  def self.method_missing(method_name, *args, &block)
    FindInTwoPlaces.new(New, Old).send(method_name, *args, &block)
  end

  class FindInTwoPlaces
    def initialize(new_dataset, old_dataset)
      @new_dataset, @old_dataset = new_dataset, old_dataset
    end

    def all
      @new_dataset.all + @old_dataset.all
    end

    def count
      @new_dataset.count + @old_dataset.count
    end

    def to_a
      @new_dataset.all + @old_dataset.all
    end

    def method_missing(method_name, *args, &block)
      FindInTwoPlaces.new(
        @new_dataset.send(method_name, *args, &block),
        @old_dataset.send(method_name, *args, &block)
      )
    end
  end
end
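Now TheModel.where("id=1").order("created_at DESC").limit(2).all works: it returns the records from both datasets that match those conditions. Keep in mind that order and limit are applied to each datastore separately, so the combined result can hold up to four records and is only sorted within each half. However, if you call .first at the end instead of .all it’ll totally fail to work, because method_missing wraps the two records in yet another FindInTwoPlaces instead of returning one of them. Let’s fix that.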
class TheModel < BasicObject
  def self.method_missing(method_name, *args, &block)
    FindInTwoPlaces.new(New, Old).send(method_name, *args, &block)
  end

  class FindInTwoPlaces
    def initialize(new_dataset, old_dataset)
      @new_dataset, @old_dataset = new_dataset, old_dataset
    end

    def first
      @new_dataset.first || @old_dataset.first
    end

    def all
      @new_dataset.all + @old_dataset.all
    end

    def count
      @new_dataset.count + @old_dataset.count
    end

    def to_a
      @new_dataset.all + @old_dataset.all
    end

    def method_missing(method_name, *args, &block)
      FindInTwoPlaces.new(
        @new_dataset.send(method_name, *args, &block),
        @old_dataset.send(method_name, *args, &block)
      )
    end
  end
end
Here we’ve introduced our first application-specific decision. It may be that you want one dataset prioritized over another. For my case here I need the newer one if it exists. We’ve also introduced our first operation that isn’t concatenative. If we implement all possible final methods we’ll have to be careful to use + or || or other operators as is appropriate.
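The difference between the two operators is small but easy to get wrong, so here is a tiny sketch of both. Plain arrays stand in for the two datasets (an assumption for illustration only), and combined_count and combined_first are hypothetical helper names.

```ruby
# Arrays stand in for the two datasets; they respond to `count` and
# `first` much like loaded query results do.

# Concatenative: the answer is the sum of both halves.
def combined_count(new_rows, old_rows)
  new_rows.count + old_rows.count
end

# Selective: prefer the new dataset, fall back to the old one only
# when the new one has nothing to offer (`first` returned nil).
def combined_first(new_rows, old_rows)
  new_rows.first || old_rows.first
end

p combined_count([:c], [:a, :b]) # => 3
p combined_first([:c], [:a, :b]) # => :c
p combined_first([], [:a, :b])   # => :a
```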
Let’s implement all of the rest of the methods that might appear as the final part of an ActiveRecord query chain. And to save on some typing I’m going to go ahead and refactor them into lists of method names grouped by operator. I’ll also be fixing a bug that exists in the above implementations by ensuring that all arguments to these methods get properly forwarded to the internal datastore objects.
class TheModel < BasicObject
  def self.method_missing(method_name, *args, &block)
    FindInTwoPlaces.new(New, Old).send(method_name, *args, &block)
  end

  # `::Struct` because top-level constants aren't visible inside a BasicObject subclass
  class FindInTwoPlaces < ::Struct.new(:new_dataset, :old_dataset) # Making this a struct lets us skip
    %w{all count to_a pluck}.each do |m|                           # writing an initializer and it
      define_method m do |*args, &block|                           # gives us accessors for free
        new_dataset.send(m, *args, &block) + old_dataset.send(m, *args, &block)
      end
    end

    %w{first last}.each do |m|
      define_method m do |*args, &block|
        new_dataset.send(m, *args, &block) || old_dataset.send(m, *args, &block)
      end
    end

    def empty?
      new_dataset.empty? && old_dataset.empty?
    end

    def method_missing(method_name, *args, &block)
      FindInTwoPlaces.new(
        new_dataset.send(method_name, *args, &block),
        old_dataset.send(method_name, *args, &block)
      )
    end
  end
end
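Here is a cut-down, runnable sketch of the Struct-plus-define_method arrangement, again with plain arrays standing in for the datasets (an assumption for illustration). FindInTwoPlacesSketch is a hypothetical name and it only covers a handful of methods.

```ruby
# Arrays stand in for the two datasets here.
FindInTwoPlacesSketch = Struct.new(:new_dataset, :old_dataset) do
  # Results that can simply be added together.
  %w[count to_a].each do |m|
    define_method(m) do |*args, &block|
      new_dataset.public_send(m, *args, &block) +
        old_dataset.public_send(m, *args, &block)
    end
  end

  # Results where the new dataset wins and the old one is the fallback.
  %w[first last].each do |m|
    define_method(m) do |*args, &block|
      new_dataset.public_send(m, *args, &block) ||
        old_dataset.public_send(m, *args, &block)
    end
  end
end

proxy = FindInTwoPlacesSketch.new([:c, :d], [:a, :b])
p proxy.count # => 4
p proxy.to_a  # => [:c, :d, :a, :b]
p proxy.first # => :c
p FindInTwoPlacesSketch.new([], [:a, :b]).first # => :a
```

Two things to watch for with this design. Struct mixes in Enumerable, so names such as select, map and find resolve to the struct's own implementation (iterating over the two members) and never reach method_missing. And a call like first(1) returns an array, which is truthy even when empty, so the fallback to the old dataset only kicks in for the argument-free form.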
You may notice I haven’t implemented the entire API of an ActiveRecord model. That’s because the full API includes a ton more methods, including both .forty_two and .forty_two!.
If we needed all of those operations we’d probably have deeper problems because the objects in an application need to communicate over the narrowest API possible to keep the app simple. However, blindly passing them off to method_missing will have indeterminate results. So we should explicitly disallow their use.
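As a rough sketch of what explicitly disallowing looks like next to the catch-all proxying: blocked names get a real method that raises, so they never fall through to method_missing. GuardedProxy and BlockedOperation are hypothetical names for this example, and arrays stand in for the datasets.

```ruby
# Hypothetical error class, so callers can rescue blocked calls specifically.
class BlockedOperation < StandardError; end

GuardedProxy = Struct.new(:new_dataset, :old_dataset) do
  # Operations with no sensible two-dataset meaning fail loudly.
  %w[find maximum].each do |m|
    define_method(m) do |*_args, &_block|
      raise BlockedOperation, "Sorry, #{m} isn't possible via GuardedProxy, do it by hand"
    end
  end

  # Anything not defined above is applied to both halves and re-wrapped.
  def method_missing(method_name, *args, &block)
    self.class.new(
      new_dataset.public_send(method_name, *args, &block),
      old_dataset.public_send(method_name, *args, &block)
    )
  end

  def respond_to_missing?(method_name, include_private = false)
    new_dataset.respond_to?(method_name, include_private) || super
  end
end

proxy = GuardedProxy.new([3, 4], [1, 2])

# `reverse` is not defined on the struct, so it is proxied to both arrays.
p proxy.reverse.new_dataset # => [4, 3]

begin
  proxy.find(1)
rescue BlockedOperation => e
  puts e.message
end
```

Raising a dedicated error class rather than a bare RuntimeError is a design choice of this sketch, not something our original code did.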
Here, then, is the final version of our wacky meta-model. It presents two totally separate ActiveRecord datastores (possibly on different hosts or even using different database technology) as a single, unified datastore:
class TheModel < BasicObject
  class << self
    delegate :new, :create, :create!, :to => :'TheModel::New'
  end

  def self.method_missing(method_name, *args, &block)
    FindInTwoPlaces.new(New, Old).send(method_name, *args, &block)
  end

  module ExistingBehavior
    def self.included(model)
      model.class_eval do
      end
    end
  end

  class New < ::ActiveRecord::Base
    establish_connection "original_datastore"
    self.table_name = :records
    include ExistingBehavior
  end

  class Old < ::ActiveRecord::Base
    include ExistingBehavior
    establish_connection "other_datastore"
    self.table_name = :records_other
  end

  # `::Struct` because top-level constants aren't visible inside a BasicObject subclass
  FindInTwoPlaces = ::Struct.new(:new_dataset, :old_dataset) do
    ## Concatenating operations
    %w{
      all
      count
      delete_all
      destroy_all
      explain
      ids
      pluck
      sum
      to_a
      update_all
    }.each do |m|
      define_method m do |*args, &block|
        new_dataset.send(m, *args, &block) + old_dataset.send(m, *args, &block)
      end
    end

    ## Selecting a value from one or the other
    %w{
      any?
      first
      include?
      last
      many?
    }.each do |m|
      define_method m do |*args, &block|
        new_dataset.send(m, *args, &block) || old_dataset.send(m, *args, &block)
      end
    end

    ## Empty only if both datasets are empty
    def empty?
      new_dataset.empty? && old_dataset.empty?
    end

    ## Impossible operations
    ## (`first` and `last` are handled above, so they must not be listed here)
    %w{
      average
      calculate
      create
      create_with
      delete
      destroy
      exec_explain
      exists?
      fifth
      find
      find_by
      find_each
      find_in_batches
      find_or_create_by
      find_or_initialize_by
      first_or_create
      first_or_initialize
      forty_two
      fourth
      fourth!
      lock
      maximum
      minimum
      readonly
      second
      take
      third
      update
    }.each do |m|
      define_method m do |*args, &block|
        raise "Sorry, #{m} isn't possible via FindInTwoPlaces, do it by hand"
      end
    end

    def method_missing(method_name, *args, &block)
      self.class.new(
        new_dataset.send(method_name, *args, &block),
        old_dataset.send(method_name, *args, &block)
      )
    end
  end
end
Again, please don’t use this in production unless you absolutely have to.