Small Footprint MongoDB

Posted by code_monkey_steve on Mar 25, 2012

MongoDB comes configured out of the box for maximum performance and reliability on production databases. But it can be a bit of a disk hog, and if, like me, you’re using a development machine with an SSD (which I highly recommend), disk space might be scarce. After a little research, I found several configuration settings that significantly reduce MongoDB’s disk usage.

Edit your MongoDB configuration file (usually /etc/mongod.conf) and add some or all of the following:

smallfiles = true

Uses smaller data files, starting at 16MB instead of 128MB, and creates fewer files initially. Data files are allocated per database, so this can save almost 200MB for each small database.

oplogSize = 100

If you’re using an oplog for replication (or just for update notifications), you can make it much smaller than the default of 5% of free disk space. The value is in megabytes.

nojournal = true

Starting with MongoDB 2.0, journaling is enabled by default on 64-bit builds. That’s great for production environments, but not very useful in development, so you can disable it and save several GB of preallocated journal files.


Excluding a bad RPM package

Posted by code_monkey_steve on May 12, 2011

I’m a big fan of KDE, as both a user and a developer, and Akregator is my RSS feed reader of choice. I’m also a big fan of RSS feeds, using them for almost all my regular daily information consumption.

So imagine my notable lack of delight when, after doing a regular YUM update, I discovered that the latest version of Akregator has a serious bug that makes it almost unusable. And I didn’t even want the new version anyway.

Ah, but since I’m using RPM and YUM, the fix for this sort of thing is actually pretty simple. It took me a few minutes of reading man pages to work it out, though, so I thought I’d share the fruits of my labor. Here’s how to exclude a bad RPM package:

First, find the last good version of the appropriate package:


$ rpm -qf `which akregator`
kdepim-4.4.11.1-2.fc14.x86_64

$ sudo yum list kdepim --showduplicates
...
Installed Packages
kdepim.x86_64        7:4.4.11.1-2.fc14
Available Packages
kdepim.x86_64        6:4.4.6-2.fc14
kdepim.x86_64        7:4.4.11.1-2.fc14

In this case, the desired version is 4.4.6. In the full string 6:4.4.6-2.fc14, the “6:” is the epoch, “4.4.6” is the version, and “2.fc14” is the release (“fc14” being the Fedora 14 dist tag); the architecture is x86_64.
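
To make the epoch:version-release breakdown concrete, here is a small Ruby sketch that splits the strings printed by yum list and orders them. It is a simplification, not RPM’s real comparison: the helper names parse_evr and newer? are hypothetical, Gem::Version stands in for RPM’s own rpmvercmp algorithm, and releases are ignored when ordering.

```ruby
require "rubygems" # for Gem::Version (already loaded by default in modern Ruby)

# Epoch, version, and release of an RPM package, e.g. "6:4.4.6-2.fc14".
EVR = Struct.new(:epoch, :version, :release)

# Split an "epoch:version-release" string as printed by `yum list`.
# A missing epoch means epoch 0.
def parse_evr(str)
  epoch, rest = str.include?(":") ? str.split(":", 2) : ["0", str]
  # The release is everything after the last hyphen.
  version, _, release = rest.rpartition("-")
  EVR.new(epoch.to_i, version, release)
end

# Simplified ordering: the epoch wins outright, then versions are compared
# segment by segment. Gem::Version stands in for rpmvercmp here, so unusual
# version strings may sort differently than they would in RPM itself.
def newer?(a, b)
  return a.epoch > b.epoch unless a.epoch == b.epoch
  Gem::Version.new(a.version) > Gem::Version.new(b.version)
end

good = parse_evr("6:4.4.6-2.fc14")
bad  = parse_evr("7:4.4.11.1-2.fc14")
p good.release      # => "2.fc14"
p newer?(bad, good) # => true
```

Because the epoch is compared first, even a hypothetical 8:1.0 package would count as newer than 7:4.4.11.1, which is why the epoch matters when picking a version to downgrade to.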

Next, downgrade the package. If yum reports any dependency errors, you’ll need to downgrade those packages as well.


$ sudo yum downgrade kdepim-4.4.6
...
Error: Package: 7:kdepim-libs-4.4.11.1-2.fc14.x86_64
...

$ sudo yum downgrade kdepim-4.4.6 kdepim-libs
...
Removed:
  kdepim.x86_64 7:4.4.11.1-2.fc14                                                   kdepim-libs.x86_64 7:4.4.11.1-2.fc14

Installed:
  kdepim.x86_64 6:4.4.6-2.fc14                                                      kdepim-libs.x86_64 6:4.4.6-2.fc14

Complete!

And finally, edit /etc/yum.conf and add the offending package version to an exclude line:


[main]
...
exclude=kdepim-4.4.11.1 kdepim-libs-4.4.11.1

Take that, kdepim-4.4.11.1. Plonk!

Tags: rpm and tips

AudioBook Log

Posted by code_monkey_steve on May 8, 2011

For over a year I’ve been commuting for almost 2 hours a day, and in an attempt to stave off potentially lethal boredom, I’ve been passing the time listening to audiobooks. Here’s what I’ve been reading/listening to:

Note, format is: title (year), authors (readers)

While all of the books were good, the quality of an audiobook depends almost entirely on the voice of the person reading it. So far my favorites are Tom Weiner’s rich baritone and Richard Dawkins’ pleasant English accent. By far the worst, though, is anything read by Carl Sagan: I can highly recommend Carl’s plodding pace and odd word emphasis to insomniacs who don’t respond to strong drugs.

Tags: books

Toward a More Perfect Mongo ODM

Posted by code_monkey_steve on Apr 23, 2011

MongoDB, MongoMapper, and Mongoid

I’m now well into my third Rails project using the MongoDB document database, and while I’m still a big fan of Mongo, I’ve been underwhelmed by the ODMs I’ve used. On the first project, I started with MongoMapper, which is very mature and well-supported, but a little clunky, and it tried a little too hard to be like ActiveRecord.

For the second project, I switched to Mongoid, which was a huge improvement. It played nicely with ActiveSupport and ActiveModel, and had better support for doing things the Mongo way. But in the end, it had several nasty bugs related to associations and embedded documents, and the better I understood what I wanted from an ODM, the more I realized that Mongoid wasn’t it.

The Alternatives

Candy

I looked into Candy and found its approach intriguingly fresh. Models don’t have to specify field names or types, and can be Arrays or Hashes or any other Ruby type. But I don’t like the lack of control over the serialization process (find, save, callbacks, validations, etc.), nor the absence of any sort of relational mechanism. Like Mongoid, it does have a nice query Criteria DSL, though.

Mongomatic

I’m not sure Mongomatic even qualifies as an ODM, as it doesn’t seem to do any mapping. From what I’ve seen, it’s just a thin wrapper around the Ruby MongoDB driver, adding little. I don’t know why anyone would bother using it.

MongoODM (my fork)

It could use a better name, but it’s a nice ODM, if a bit immature. I especially like its support for embedded documents: you don’t have to do anything special, just assign a value of any Mongo-serializable type (Document or otherwise) to a field, and it Just Works. It also supports Arrays and Hashes that can hold any heterogeneous collection of types.

It’s also better designed under the hood than Mongoid or MongoMapper, taking full advantage of Ruby conventions to be easily hackable. MongoODM is definitely the best candidate for Perfect ODM I’ve yet seen.

The Perfect ODM

Here’s what I really want in my perfect Mongo ODM:

Plays Well with Rails

Like it or not, the ActiveRecord API is the standard convention for performing DB operations. And to the extent that SQL and MongoDB are conceptually similar, they should maintain the same API. This makes it easier to integrate with other software that may assume AR conventions, but more importantly, it keeps me from having to learn and remember a whole new set of only-slightly-different APIs.

Duck typing and Other Ruby-isms

This is one big feature that ActiveRecord does not (and cannot) have, but which Mongo gives us almost for free: dynamic typing, just like native Ruby. Mongoid supports this for polymorphism, but MongoODM also supports dynamic types in Hashes and Arrays, and it was this that originally attracted me to it. I have no problem with declaring document fields, but why should I have to specify the type? For that matter, why should I be constrained to a static type?
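
Here is a rough sketch of how heterogeneous Arrays and Hashes can round-trip through a document store without declared types. It rests on two assumptions. First, each embedded object records its class name under a "_type" key; Mongoid uses a _type field for polymorphism, but this is not MongoODM’s actual code. Second, any object that responds to the hypothetical to_mongo_fields method can be embedded.

```ruby
# Plain values that the MongoDB driver can store as-is.
NATIVE = [String, Symbol, Numeric, Time, TrueClass, FalseClass, NilClass].freeze

# Two example embeddable classes (hypothetical, echoing the schema example).
class Car
  attr_reader :make
  def initialize(make); @make = make; end
  def to_mongo_fields; { "make" => make }; end
  def self.from_mongo_fields(fields); new(fields["make"]); end
end

class Boat
  attr_reader :length
  def initialize(length); @length = length; end
  def to_mongo_fields; { "length" => length }; end
  def self.from_mongo_fields(fields); new(fields["length"]); end
end

# Convert any Ruby value into driver-friendly Arrays, Hashes, and scalars.
def to_mongo(value)
  case value
  when Array then value.map { |v| to_mongo(v) }
  when Hash  then value.each_with_object({}) { |(k, v), h| h[k.to_s] = to_mongo(v) }
  when *NATIVE then value
  else
    # Duck typing: anything that can list its fields can be embedded.
    unless value.respond_to?(:to_mongo_fields)
      raise ArgumentError, "don't know how to store a #{value.class}"
    end
    to_mongo(value.to_mongo_fields).merge("_type" => value.class.name)
  end
end

# Rebuild Ruby objects; hashes tagged with "_type" become instances again.
# Real code should whitelist the class names it is willing to instantiate.
def from_mongo(value)
  case value
  when Array then value.map { |v| from_mongo(v) }
  when Hash
    fields = {}
    value.each { |k, v| fields[k] = from_mongo(v) unless k == "_type" }
    type = value["_type"]
    type ? Object.const_get(type).from_mongo_fields(fields) : fields
  else value
  end
end

vehicles = [Car.new("DeLorean"), Boat.new(12), "a bicycle", { "wheels" => 2 }]
stored = to_mongo(vehicles) # only Arrays, Hashes, and scalars remain
p from_mongo(stored)[0].make # => "DeLorean"
```

The Array holds a Car, a Boat, a String, and a Hash, and nothing about it had to be declared up front.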

Schema DSL

Even though I want the freedom to store any value of any type in any field, I know that schemas are still important, both for validation and configuration management. All ODMs provide ActiveRecord-style type-specifiers and validations (Mongoid and MongoODM also use ActiveModel), but I’d like to see document schemata become a top-level object, some superset of JSON Schema, with a friendly and extensible DSL. Something like this:

class Person
  schema do
    property(:name) {
      type   String
      length 1..20
      required
    }
    property(:phone) {
      type Phone
      optional
    }
    property(:aliases) {
      type Array.of(String)
      optional
    }
    property(:vehicles) {
      type Array.of(Car, Boat, Spaceship)
      required
      default []
    }
    additional_properties false
  end
end

Once the schema exists in object form, there’s a whole bunch of interesting things you can do in addition to validations:

  • Schema versioning and heterogeneous collections
  • Data migrations and schema management
  • Client-side validations (via JSON Schema)
  • Automatic form generation (think: ActiveScaffold on steroids)
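
The schema-as-object idea doesn't need much machinery. Here is a minimal plain-Ruby sketch in which instance_eval turns the property blocks into objects that can validate a document and export a partial JSON Schema. Schema, Property, and HasSchema are hypothetical names, not from any existing gem. array_of stands in for the Array.of notation, and default is left out.

```ruby
# One field of a schema; its methods double as the property { ... } DSL.
class Property
  attr_reader :name, :types, :length_range

  def initialize(name)
    @name = name
    @types = [Object] # no type given: accept anything
    @required = false
  end

  def type(*types); @types = types; end
  def length(range); @length_range = range; end
  def required; @required = true; end
  def optional; @required = false; end
  def required?; @required; end

  # Matcher for an Array whose elements are all one of the given types.
  def array_of(*types)
    ->(v) { v.is_a?(Array) && v.all? { |e| types.any? { |t| t === e } } }
  end

  def errors_for(value)
    if value.nil?
      return required? ? ["#{name} is required"] : []
    end
    errors = []
    errors << "#{name} has the wrong type" unless types.any? { |t| t === value }
    if length_range && value.respond_to?(:length) && !length_range.cover?(value.length)
      errors << "#{name} length must be in #{length_range}"
    end
    errors
  end
end

class Schema
  def initialize(&block)
    @properties = {}
    @additional = true
    instance_eval(&block)
  end

  def property(name, &block)
    prop = Property.new(name)
    prop.instance_eval(&block) if block
    @properties[name] = prop
  end

  def additional_properties(allowed); @additional = allowed; end

  # Returns a list of error messages (empty when the document is valid).
  def validate(doc)
    errors = @properties.values.flat_map { |p| p.errors_for(doc[p.name]) }
    unless @additional
      (doc.keys - @properties.keys).each { |k| errors << "#{k} is not allowed" }
    end
    errors
  end

  # A partial JSON Schema export, e.g. for client-side validation.
  def to_json_schema
    props = {}
    @properties.each do |name, p|
      spec = {}
      if p.length_range
        spec["minLength"] = p.length_range.min
        spec["maxLength"] = p.length_range.max
      end
      props[name.to_s] = spec
    end
    { "type" => "object", "properties" => props,
      "required" => @properties.values.select(&:required?).map { |p| p.name.to_s },
      "additionalProperties" => @additional }
  end
end

# Class-level `schema do ... end` macro.
module HasSchema
  def schema(&block)
    @schema = Schema.new(&block) if block
    @schema
  end
end

class Person
  extend HasSchema
  schema do
    property(:name) { type String; length 1..20; required }
    property(:aliases) { type array_of(String); optional }
    additional_properties false
  end
end

p Person.schema.validate({ name: "Steve" }) # => []
```

Once the schema is an object, the other uses in the list (versioning, migrations, form generation) become a matter of walking its properties, much as to_json_schema does.
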
References and Associations

This area gets a bit tricky, partly because of SQL’s wretched legacy of foreign keys and join tables, but also because the problem is inherently difficult. Ideally, the database or ODM would provide an equivalent of Ruby’s garbage-collected memory management, where any document field could reference any other object of any type, and objects would be destroyed automatically once nothing referenced them.

MongoDB actually comes pretty close with its support for Database References (DBRefs), which let you assign to a document field a reference to any document in any collection. I’ve expanded MongoODM with a transparent Reference proxy, so assigning a reference to a field is as simple as calling .reference (or .ref) on a document. I’ve also been looking at adding something similar for GridFS attachments, but with a reference count to allow easy sharing of large binary objects between documents.
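
A transparent reference proxy can be sketched in a few lines of plain Ruby. A BasicObject subclass forwards every unknown method to the referenced document and loads that document only on first use. Document, Reference, and the in-memory DB hash are hypothetical stand-ins for an ODM and the driver, not MongoODM’s actual classes. The {"$ref" => ..., "$id" => ...} shape follows MongoDB’s DBRef convention.

```ruby
# Hypothetical in-memory "database": collection name => { id => Document }.
DB = Hash.new { |h, name| h[name] = {} }

class Document
  attr_reader :collection, :id

  def initialize(collection, id, fields = {})
    @collection, @id, @fields = collection, id, fields
  end

  def [](key); @fields[key]; end
  def []=(key, value); @fields[key] = value; end
  def save; DB[collection][id] = self; end

  # A lazy reference to this document.
  def ref; Reference.new(collection, id); end
  alias_method :reference, :ref
end

# BasicObject has almost no methods of its own, so nearly every call
# (including is_a? and []) falls through to method_missing.
class Reference < BasicObject
  def initialize(collection, id)
    @collection, @id = collection, id
  end

  # The DBRef-style hash that would be stored in the parent document.
  def to_dbref
    { "$ref" => @collection, "$id" => @id }
  end

  # Load the target on first use, then forward the call to it.
  # (::DB is needed because BasicObject subclasses don't see Object's constants.)
  def method_missing(name, *args, &block)
    @target ||= ::DB[@collection].fetch(@id)
    @target.__send__(name, *args, &block)
  end
end

steve = Document.new("people", 1, { "name" => "Steve" })
steve.save
post = Document.new("posts", 7, { "author" => steve.ref })
p post["author"]["name"] # => "Steve"
```

A reference-counted GridFS attachment could follow the same pattern, with the proxy also incrementing and decrementing a counter on the shared file.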

The Future — No ODM?

This post is mostly just a dump of ideas I’ve had while working to expand MongoODM. But these days I find myself working more and more in JavaScript, taking advantage of things like jQuery and Backbone to build rich client applications in the browser. In that situation, which I think will become more common, I need less ODM support in Ruby/Rails and more in JavaScript. So now I’m contemplating a MongoDB interface in JavaScript that passes through some sort of Rack proxy to perform access control. I’ll let you know how it turns out …

Tags: mongodb

Back from the Dead

Posted by code_monkey_steve on Mar 19, 2011

Greetings, everyone! Welcome to my new-and-improved blog.

I’ve decided to merge my coding blog (“One-Banana Problem”), which has gone woefully under-updated, with my website (“finagle.org”), which has gone woefully under-used, and you’re now looking at the result. If you’re thinking it looks a lot like One-Banana Problem …

  1. You actually read my blog? Wow thanks, you’re awesome, that almost doubles my audience!
  2. It looks the same because it’s a port of the original content and style from OBP to a new platform.

Why a new platform? I’m glad you asked, therein lies the story …

The Story

It all started a few months back, when I went to compose a long-overdue OBP post and discovered that Mephisto wasn’t letting me log in. How rude. A quick Google not only failed to provide a solution to the problem, but indicated that Mephisto was no longer maintained or supported. The consensus appeared to be to migrate to a combination of Jekyll, a static-site generator, and the Disqus commenting system. Over the months since, I’ve followed the same path, deploying the new blog on Heroku, my preferred hosting service. Porting the content was fairly easy; porting the theme, not so much.

Jekyll

The best I can say about Jekyll is that it works well enough to get things running again. At first, I was considering using it for some static brochure sites for friends and family, but after getting it working I’ve decided it’s not good for me for a few reasons:

  • No HAML support. Although there are lots of hacks and extensions that claim to fix this, I could never get any of them to work.
  • Heroku provides Varnish to automatically cache pages, so generating static HTML content doesn’t give any significant speed improvement.
  • I still have to check all that static content in to Git, which needlessly bloats the Heroku slug.
  • Heroku provides a Rack interface, which means I still need to use a (tiny) Sinatra app to serve that static content.
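
A Rack app is any object that responds to call(env) and returns [status, headers, body], so that (tiny) serving app has very little to do. The sketch below is a hypothetical stand-in, not the Sinatra app mentioned above. It serves files from Jekyll’s default _site output directory using only the standard library; in practice, Rack::Static from the rack gem covers the same ground.

```ruby
# Minimal Rack-compatible static file server (hypothetical sketch).
class StaticSite
  TYPES = { ".html" => "text/html", ".css" => "text/css",
            ".js" => "application/javascript", ".xml" => "application/xml" }.freeze

  def initialize(root)
    @root = File.expand_path(root)
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    path = "/" if path.empty?
    path += "index.html" if path.end_with?("/")
    file = File.expand_path(File.join(@root, path))
    # Refuse anything outside the site root (e.g. "/../secret") or missing.
    unless file.start_with?(@root + File::SEPARATOR) && File.file?(file)
      return [404, { "content-type" => "text/plain" }, ["Not Found"]]
    end
    type = TYPES.fetch(File.extname(file), "application/octet-stream")
    [200, { "content-type" => type }, [File.binread(file)]]
  end
end
```

In a config.ru, it would be mounted with run StaticSite.new("_site"). Heroku’s cache then sits in front of it, which is why pre-generating the HTML buys so little.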

While I can appreciate the idea of storing the content in the blog code itself, with metadata in a YAML header block and the copy in Textile, I think I’ll stick to Sinatra for my static sites. I’m already working on a CMS-ish project that will probably replace Jekyll as soon as it’s done enough.

Moving Forward

I originally started One-Banana Problem in 2006, after getting laid off from a failing startup. I wanted to make the switch from C++ developer to Ruby/Rails guy, and I thought a blog would help not only to share my hard-won knowledge but also to market myself to the Rails community in hopes of finding a good job. To that end, I tried to keep the tone professional and the content mostly related to Ruby and Rails.

But here we are, five years later, and I’m living the dream of a full-time Ruby developer (woo-hoo!). Having achieved that goal, however, I find myself with less motivation for blogging and, more importantly, less time.

To counter this trend, I’ve decided, as part of The Great Blog Renaming, to expand the scope from a mostly coding-centric blog to hacking of all kinds. I am a geek, after all, and to my brain all problems have technological solutions. I have some interesting ideas for things like secure voting, digital currency, and even improvements to the democratic process (which hasn’t changed significantly in centuries), and I’ll try to find the time to comment more freely on these sorts of political and social issues in the future.

I’ve also come to the conclusion that trying to keep things “professional” is boring and pointless. If you can’t be an open and honest bastard with the whole Internet, viewable by all and archived forever, then why bother blogging?

Tags: blog jekyll heroku