Sunday, 20 May 2012

Why a PaaS needs a DBaaS

I have recently been exploring using a PaaS to deploy large-scale applications and have come to the conclusion that in order to truly scale horizontally, I need to integrate a PaaS with a DBaaS (Database as a Service). Here is how I got to this conclusion.

The great thing about today’s PaaS offerings is that they increase developer agility by allowing application teams to push out applications without having to be involved with the underlying infrastructure.

How this works is that the PaaS takes on the responsibility of deploying the application to a set of application nodes fronted by a web server and binding the application to a set of supporting services (databases etc.). Here is a simple example for a Java application deployed within a PaaS:
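
From the application’s point of view, the binding typically surfaces as connection details injected into the environment. A minimal sketch, assuming the PaaS exposes the binding through a hypothetical DATABASE_URL environment variable (each platform documents its own convention):

public class BoundServiceExample {
    public static void main(String[] args) {
        // A PaaS typically injects connection details for bound services into
        // the environment; the exact variable name varies by platform.
        String dbUrl = System.getenv("DATABASE_URL");
        if (dbUrl == null) {
            throw new IllegalStateException("No database service bound by the PaaS");
        }
        System.out.println("Connecting to PaaS-provided database at " + dbUrl);
    }
}
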
If I write my application correctly and avoid state at the application server layer, then I can simply get the PaaS to add more application server nodes dynamically.

There is no real surprise here that at large scale the application will be bottlenecked at the database layer. Unfortunately, many PaaS offerings treat the database as a supporting service and don’t offer a way to scale horizontally at the service layer. To be fair, some PaaS offerings provide multiple database instances for HA, but this is not the same as horizontal, consistent read and write scalability.

All is not lost: we are starting to see the rise of Database as a Service (DBaaS). It almost goes without saying that in order to meet the requirements above you will need to use a database that scales horizontally. Examples of such services are MongoHQ and MongoLab.

Now, instead of pointing my application to the embedded database within the PaaS, I simply point it to my DBaaS API.
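
With a hosted MongoDB service, for example, this is just a connection to the provider’s endpoint. A minimal sketch using the MongoDB Java driver (the host, credentials and database name are placeholders for whatever your provider issues):

import com.mongodb.DB;
import com.mongodb.Mongo;

public class DbaasConnectExample {
    public static void main(String[] args) throws Exception {
        // Connect to the externally hosted database rather than a PaaS-embedded one
        Mongo mongo = new Mongo("example.mongohq.com", 27017);
        DB db = mongo.getDB("mydb");
        // Hosted services require authentication with the issued credentials
        boolean ok = db.authenticate("user", "password".toCharArray());
        System.out.println("Authenticated: " + ok);
    }
}
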
The great thing about this is that I can now dynamically scale both the application and database layers horizontally, as required.

This is actually not that difficult to achieve, as a PaaS will typically allow outbound HTTP and many DBaaS offerings expose a REST API.
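
To illustrate the idea only, here is a sketch of an application calling a DBaaS over HTTP; the endpoint is deliberately hypothetical, and a real provider’s documented URL and authentication scheme would go here:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class DbaasRestExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; substitute your provider's documented REST API
        URL url = new URL("https://api.example-dbaas.com/databases/mydb/collections");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
    }
}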

Monday, 19 December 2011

Enable MongoDB Profiling within MMS


MMS is 10gen’s cloud-based monitoring tool for MongoDB. As you would expect, MMS monitors your MongoDB environment, including mongod, mongos and config servers. In this blog I am going to walk through setting up MongoDB query profiling within MMS.

Installing the MMS agent
If you have not downloaded the MMS agent already, click the “download agent” link immediately after you sign into MMS to download an agent specifically configured for your account.

The agent is a small Python application, so you will also need to install Python and PyMongo.
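
For example, assuming pip is available on your system:

$pip install pymongo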

Starting the MMS agent
Once you have the agent installed, simply start it with:

nohup python agent.py > /[LOG-DIRECTORY]/agent.log 2>&1 &

Adding a Host
Log in to MMS (mms.10gen.com) and click on the “+” icon next to “Hosts”.


Enter your host details and press “Add”.

In a few minutes MMS will start collecting metrics about your MongoDB setup.

Enabling MMS Profiling
Now that you are collecting metrics about your host, the next thing to do is enable profile collection.

Let’s start by enabling profiling on a database. There are actually a number of different ways you could do this; however, in this example I am going to use the MongoDB shell and dynamically turn on profiling on an existing database.

There are three profiling levels to choose from:

0 - off
1 - log slow operations
2 - log all operations


Note that, by default, MongoDB considers an operation that takes longer than 100ms to be slow.

Log in to the mongo shell and execute the setProfilingLevel command on your database. For example:

db.setProfilingLevel(1);
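
If you want a different slow-operation threshold, the shell helper also accepts the threshold in milliseconds as a second argument. For example, to log operations slower than 200ms:

db.setProfilingLevel(1, 200);

You can confirm the current setting at any time with db.getProfilingLevel().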

To enable the profile data collection within MMS, click on the pie icon next to your host:


A basic message box should appear that explains the effect of profiling. Click the enable button:


Viewing Profile Data
At this point everything should be setup and ready to go. Drill into the hosts that you enabled profiling on and navigate to the “Profile Data” tab. 


Sunday, 11 December 2011

MongoDB File Store with Java

I was recently working with the Java MongoDB driver to deliver the ability to store large files within MongoDB. MongoDB provides a large file store via an implementation of GridFS.

A single document within MongoDB has a maximum size of 16MB; however, many files are over this size. It is GridFS’s responsibility to break a file down into small chunks and to manage the metadata about the file chunks.

In this blog, I am going to walk through a simple example of using GridFS from version 2.7.0 of the MongoDB Java Driver.

Connecting to MongoDB
The first thing to do is to create a connection to MongoDB. This is a fairly standard process: simply create an instance of Mongo and get hold of a database. To create a file store from this database, simply create an instance of GridFS and pass it the database.
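
A minimal sketch, assuming a local mongod and a hypothetical database name:

Mongo mongo = new Mongo("localhost", 27017);
DB db = mongo.getDB("filestore");
GridFS gridFs = new GridFS(db);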

Saving a File
As a next step, let’s save a file within the GridFS file store. The first thing to do is to create an instance of the standard File class that points to the file you would like to store. Once we have a File, we pass the reference to the GridFS createFile method.

The createFile method gives you back a reference to a GridFSInputFile, which you can then populate with additional meta-data before saving.
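
For example (the file path and “owner” meta-data are placeholders):

File file = new File("/tmp/largefile.bin");
GridFSInputFile gfsFile = gridFs.createFile(file);
gfsFile.setFilename("largefile.bin");
gfsFile.put("owner", "demo");   // any custom meta-data
gfsFile.save();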


If the file is over 16MB, GridFS will break the file into document chunks for you behind the scenes.

Finding a File
Clearly, we need the ability to find the file now that it is in MongoDB. As you would expect, the GridFS interface provides a number of “finder” methods. You can find a file by its id, by filename or with a custom query, for example:
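
Continuing from the save above, a sketch of each style (the “owner” key is the meta-data set earlier):

GridFSDBFile byName = gridFs.findOne("largefile.bin");
GridFSDBFile byId = gridFs.findOne((ObjectId) gfsFile.getId());
List<GridFSDBFile> byQuery = gridFs.find(new BasicDBObject("owner", "demo"));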

Writing a File
Just for completeness, let’s write a temporary file that contains the file contents we found in the previous step.

Simply create a standard temp file and pass it to the “writeTo” method of your GridFSDBFile instance.
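
For example, continuing with the file found above:

File tempFile = File.createTempFile("gridfs", ".bin");
byName.writeTo(tempFile);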


Complete Source
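
Putting the snippets above together into a single runnable sketch (host, database name, file path and meta-data are all placeholders):

import java.io.File;

import com.mongodb.DB;
import com.mongodb.Mongo;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;
import com.mongodb.gridfs.GridFSInputFile;

public class GridFSExample {
    public static void main(String[] args) throws Exception {
        // Connect to MongoDB and create the file store
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("filestore");
        GridFS gridFs = new GridFS(db);

        // Save a file, adding custom meta-data before the save
        File file = new File("/tmp/largefile.bin");
        GridFSInputFile gfsFile = gridFs.createFile(file);
        gfsFile.setFilename("largefile.bin");
        gfsFile.put("owner", "demo");
        gfsFile.save();

        // Find the file again by filename
        GridFSDBFile found = gridFs.findOne("largefile.bin");

        // Write the stored contents out to a temporary file
        File tempFile = File.createTempFile("gridfs", ".bin");
        found.writeTo(tempFile);
        System.out.println("Wrote " + found.getLength() + " bytes to " + tempFile);
    }
}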

Wednesday, 7 December 2011

Exporting MongoDB Data from Cloud Foundry


Following on from “Tunnelling to MongoDB on Cloud Foundry”, a powerful capability of having a tunnel into MongoDB on Cloud Foundry is that you can now easily export and back up your data.

Here is a quick overview of how to hook the MongoDB export and backup tools to Cloud Foundry.

Establishing a Tunnel
Following the steps in “Tunnelling to MongoDB on Cloud Foundry”, set up a tunnel that will allow you to connect MongoDB tools to the MongoDB node (“myDB”) you created in Cloud Foundry. To create the tunnel execute:

$vmc tunnel myDB none

Note: I am using the “none” argument to say I will connect the client later. If you don’t include “none”, vmc will list your client options.

Before establishing the tunnel, VMC will prompt for your Cloud Foundry password. Assuming you enter the correct password, VMC will then establish the tunnel and print out the tunnel properties:

Service connection info:
      username : <USER_NAME>
      password : <PASSWORD>
      name     : db

Starting tunnel to myDB on port 10000.
Open another shell to run command-line clients or
use a UI tool to connect using the displayed information.
Press Ctrl-C to exit...

Data Import/Export
Now that the tunnel is open, you can use it to export your data. MongoDB comes with a tool called “mongoexport” that allows you to export a target collection or query result. Here is a simple example:

$mongoexport --port 10000 -u <USER_NAME> -p <PASSWORD> -d db -c <COLLECTION> -o <EXPORT_LOCATION>

You can also import data into a collection with “mongoimport”:

$mongoimport --port 10000 -u <USER_NAME> -p <PASSWORD> -d db -c <COLLECTION> <DATA_FILE>

Data Backup/Restore
The Cloud Foundry tunnel also provides a simple way of backing up and restoring MongoDB data hosted on cloudfoundry.com. MongoDB comes with a tool called “mongodump” that is designed to take hot backups from mongod. Here is an example of connecting “mongodump” to the Cloud Foundry tunnel:

$mongodump --port 10000 -u <USER_NAME> -p <PASSWORD> -d db

If required you can use the tool “mongorestore” to restore your backup:

$mongorestore --port 10000 -u <USER_NAME> -p <PASSWORD> -d db <BACKUP_DIR>



Saturday, 3 December 2011

Tunnelling to MongoDB on Cloud Foundry


The Cloud Foundry team just released the ability to tunnel into data services hosted on cloudfoundry.com. Here is a short overview on how to tunnel to MongoDB.

Installing Cloud Foundry Tools
The Cloud Foundry client tools are Ruby applications, so the first thing to do is install Ruby 1.9.2. If you are using a Mac, a simple way to install Ruby is via RVM. You can check which version of Ruby you have installed with:

$ruby -v

Cloud Foundry has a command-line tool called ‘vmc’. The tunnelling command within vmc was made available in vmc 0.3.14.beta and is not currently in the vmc GA release. To install the beta version execute:

$gem install vmc --pre

The tunnel itself is provided by a Cloud Foundry project called Caldecott. To install Caldecott execute:

$gem install caldecott

Creating a MongoDB
To create a MongoDB service within Cloud Foundry, simply execute the "create-service" command within vmc and provide it with a name for your service. In this example I am going to create a MongoDB service called “myDB”.

$vmc create-service mongodb myDB

Creating a Tunnel
Let’s set up a tunnel that will allow us to connect MongoDB tools to the MongoDB node (“myDB”) we just created in Cloud Foundry. To create the tunnel execute:

$vmc tunnel myDB none

Note: I am using the “none” argument to say I will connect the client later. If you don’t include “none”, vmc will list your client options.

Before establishing the tunnel VMC will prompt for your Cloud Foundry password. Assuming you enter the correct password, VMC will then establish the tunnel and print out the tunnel properties:

Service connection info:
  username : <USER_NAME>
  password : <PASSWORD>
  name     : db

Starting tunnel to myDB on port 10000.
Open another shell to run command-line clients or
use a UI tool to connect using the displayed information.
Press Ctrl-C to exit...

Connecting the MongoDB Shell
At this point your tunnel should be established on local port 10000. To connect, simply start the MongoDB shell with the tunnel port, username, password and database. For example:

$./mongo --port 10000 -u <USER_NAME> -p <PASSWORD> db

Once connected you can now use the mongo shell as normal.



You can also connect other MongoDB tools to allow you to import/export data. I may cover MongoDB data import/export in a future blog.  

Friday, 2 December 2011

MongoDB Multi Data Centre Tagging with Java


Following on from my blog “Overview of MongoDB Java Write Concern Options”, here is a short introduction to MongoDB tagging across data centres and an example of using a write concern to check that a write has reached each data centre.

Tagging Introduction
Nodes within a MongoDB replica set can be tagged with additional metadata. This metadata is really just a document that can contain any custom properties. In this example let’s imagine we have a replica set distributed across three data centres (New York, London and Hong Kong). During the configuration of the replica set we can add a tag with a “dc” property that identifies the node’s location. For example, each New York node could have the tag ‘{"dc": "ny"}’:

{_id : 0, host : "A", tags : {"dc": "ny"}}

GetLastError Mode
One of the powerful aspects of tagging MongoDB replica set nodes is that you can refer to tags within a custom GetLastError mode. In this example we can create a simple GetLastError mode called "allDcs" that requires the write to reach three nodes with distinct "dc" tags:

settings : {
    getLastErrorModes : {
        allDcs : {"dc" : 3}
    }
}


Complete Replica Set Configuration
For completeness, here is the complete replica set configuration.
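
A representative version (the replica set name and host names are hypothetical placeholders), as you would pass it to rs.initiate() in the mongo shell:

config = {
    _id : "rs0",
    members : [
        {_id : 0, host : "A", tags : {"dc": "ny"}},
        {_id : 1, host : "B", tags : {"dc": "london"}},
        {_id : 2, host : "C", tags : {"dc": "hk"}}
    ],
    settings : {
        getLastErrorModes : {
            allDcs : {"dc" : 3}
        }
    }
}
rs.initiate(config)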

Java Write Concern
Using a custom GetLastError mode within a write concern is straightforward. All you need to do is create a new instance of WriteConcern and pass the name of the GetLastError mode into the constructor.

In this example, if we want our save method to block until the write has reached a node in all three data centres, we create an instance of WriteConcern with the constructor parameter "allDcs".

coll.save(dbObj, new WriteConcern("allDcs"));
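
In context, a minimal sketch (the connection details, database, collection and document are placeholders):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;
import com.mongodb.WriteConcern;

public class TaggedWriteExample {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("A", 27017);   // any member of the replica set
        DBCollection coll = mongo.getDB("test").getCollection("events");
        DBObject dbObj = new BasicDBObject("msg", "hello");
        // Blocks until a node in each of the three data centres has the write
        coll.save(dbObj, new WriteConcern("allDcs"));
        System.out.println("Write acknowledged by all data centres");
    }
}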

Summary
As tags are simply documents on a node’s configuration, you can add any custom tag properties you wish. This example showed the use of tagging to apply a write concern across data centres; another common use case would be to tag nodes with a rack identifier, so that you can check that a write has been distributed across racks.