Tuesday, November 13, 2012

Building a chat application using Node.js and Couchbase

After some basic articles about Couchbase installation and Node.js integration, let's now dive into a more complete example: a chat application.

The first version of the chat should be compliant with the following requirements:

  • web based
  • single room
  • a user just needs to enter a login to start interacting with other connected users
  • a user should be able to navigate through the chat history

The Couchbase Chat application is built using the following components:
  • Node.js for the application
  • Couchbase to persist all the messages

I won't go into all the details of the design of the Node.js application; you can find many examples of Node-based chat applications. I prefer to focus on how I designed the persistence using Couchbase rather than on the application itself. If you want more detail about the complete application, feel free to drop me a message/comment and I will add it.

What are the challenges with persisting the messages?
Storing the information is quite easy: just "dump" the message into your database. The challenge is that users want to access the message history. So the key point is how to store the information in a way that makes it easy to get back in sorted order.

You will find many different ways of achieving that, depending on the technology you are using and the query capabilities of your persistence engine. Using Couchbase, you have two ways to access/find the data:

  • Using views, which allow you to query a secondary index and do advanced operations such as sorting, querying on a key range, ...
  • Directly accessing the data using its key

In this post I will show you how you can use these two options to build your application and retrieve the stored information in a specific order:

  • First option: using a view to get the message history
  • Second option: using a counter as a key for the messages

The source code of the application is available on GitHub: https://github.com/tgrall/couchbase-chat


Get the Couchbase connection

The following code is used to connect to Couchbase. Once the connection is established, the web server is started:

var express = require('express');
var app = express();
var http = require('http');
var server = http.createServer(app);
var io = require('socket.io').listen(server);

var driver = require('couchbase');

driver.connect({
 "username": "",
 "password": "",
 "hostname": "localhost:8091",
 "bucket": "default"}, 
 function(err, couchbase) {
  if (err) {
   throw (err)
  }

  server.listen(8080);

  app.get('/', function(req, res) {
   res.sendfile(__dirname + '/index.html');
  });
...
// Application code
// Socket.io events
...
});

Let's now see how Couchbase is used in the chat application.

First Option : Using views to get the message history

Post a new message
In this example messages are formatted using the following information:

{
  "type": "message",
  "user": "Tug",
  "message": "Hello all !",
  "timestamp": 1349836768909
}

The key is based on the timestamp and the user name: 1349836768909-Tug. I add the user name so that two users posting in the same millisecond get different keys; that way I do not have to manage conflicts.

The insertion of the message :

  socket.on('postMessage', function(data) {
    // create a new message
    var message = {
      type: "message",
      user: socket.username,
      message: data,
      timestamp: Date.now()
    }
    var messageKey = message.timestamp +"-"+ message.user;
    io.sockets.emit('updateChatWindow', message);
    couchbase.set(messageKey, JSON.stringify(message),function(err) {  }); 
  });

  • The postMessage event is sent by the client when the user posts a new message.
  • A new message object is created with a type, the user, the message itself and a timestamp.
  • The message is sent to the different clients using the io.sockets.emit() function (line 10).
  • Finally, the message is saved into Couchbase (line 11). As you can see, the only thing you have to do is send the JavaScript object as a simple JSON string.
At this point your application works perfectly: all the connected users will see the new messages, since they are sent by the server as soon as they are created. But it is not possible for a user to navigate the chat history and see older messages.


Retrieve messages from Couchbase

As explained earlier, it is possible to use a view to retrieve the messages from the database in the proper order. The view looks like this:

function (doc, meta) { 
  if ( meta.type == "json" && doc.type == "message" ) {
   emit(doc.timestamp, null);
  }
}

Each time a new document is inserted in the database, the index is updated if the document is JSON and its type is "message". When this view is called, each row of the result looks like:

{"id":"1352733392477-JOHN","key":1352733392477,"value":null},

As you can see the id of the document (timestamp-username) is automatically inserted in the response.
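
To make the shape of the index more concrete, here is a minimal in-memory sketch of what the map function produces. It is only an illustration: the buildIndex helper and the sample documents are hypothetical, and the real index is built and sorted by Couchbase on the server.

```javascript
// In-memory illustration only: Couchbase runs the map function on the server.
// buildIndex and sampleDocuments are hypothetical names used for this sketch.
function buildIndex(documents) {
  var rows = [];
  documents.forEach(function (entry) {
    var doc = entry.doc;
    var meta = entry.meta;
    // Stand-in for the emit() function that Couchbase provides to views.
    function emit(key, value) {
      rows.push({ id: meta.id, key: key, value: value });
    }
    // Same condition and emit call as the view's map function.
    if (meta.type == "json" && doc.type == "message") {
      emit(doc.timestamp, null);
    }
  });
  // Rows come back sorted by key, ascending unless descending is requested.
  rows.sort(function (a, b) { return a.key - b.key; });
  return rows;
}

var sampleDocuments = [
  { meta: { id: "1352733392477-JOHN", type: "json" },
    doc: { type: "message", user: "JOHN", message: "Hi", timestamp: 1352733392477 } },
  { meta: { id: "1352733390000-Tug", type: "json" },
    doc: { type: "message", user: "Tug", message: "Hello all !", timestamp: 1352733390000 } },
  // Not a message: the map function ignores it, so it never reaches the index.
  { meta: { id: "some-other-document", type: "json" },
    doc: { type: "settings" } }
];

var indexRows = buildIndex(sampleDocuments);
console.log(JSON.stringify(indexRows));
```

Only the two documents of type "message" end up in the index, ordered by timestamp, and each row carries the document id next to the emitted key.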


You can use the following command to create the view in your Couchbase Server (adjust the server address, port and bucket to your environment):
curl -X PUT -H 'Content-Type: application/json' http://127.0.0.1:8092/default/_design/chat -d '{"views":{"message_hisory":{"map":"function (doc, meta) {\n  \n  if ( meta.type == \"json\" && doc.type == \"message\" ) {\n   emit(doc.timestamp, null);\n  }\n}"}}}'



The application now calls the view using the following code

socket.on('showHistory', function(limit,startkey) {
  limit = (limit == undefined || limit == 0)? 5 : limit;
  var options = {"descending": "true", "limit" : limit, "stale" : "false"};
  if (startkey > 0) {
    options.startkey = startkey-1;
  }
  couchbase.view("chat","message_hisory", options , function(err, resp, view) {
    var rows = view.rows;
    var keys = new Array();
    for( var i = 0; i < rows.length ; i++  ) {
      keys.push( rows[i].id );
    }
    couchbase.get(keys,function(err, doc, meta) {
      socket.emit('updateChatWindow', doc, true);
    });
  });
});

When the client sends a showHistory event, the application captures this event and calls the view with the proper parameters to send the list of messages back to the client.

The options object contains the different parameters that will be used to call the view:
  • Use descending order to return the messages from the newest to the oldest.
  • The number of messages to return (limit).
  • Ask the view to update the index before returning the rows, using the stale=false parameter.
  • Use the startkey parameter if the client sends a specific starting point.
On line 7, the view is called using the Node.js SDK with the design document name ("chat"), the view name ("message_hisory", as spelled when the view was created) and the options object.

In the callback function, the application creates an array containing the document ids (the keys of the documents themselves); then, on line 13, the messages are retrieved from Couchbase using the get() function. (Note: this function may have a small issue when multiple messages are sent in the same millisecond and sit just on the edge of the offset, because the next page starts at startkey-1.)
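
The paging logic, and the edge case mentioned in the note, can be illustrated with a small in-memory sketch. This is not the SDK: queryHistory and historyIndex are hypothetical names, and the sketch assumes that the client sends the timestamp of the oldest message it currently displays as startkey.

```javascript
// In-memory stand-in for the view query; not the Couchbase SDK.
// It mimics descending=true, limit and startkey = (client value - 1).
function queryHistory(indexRows, limit, startkey) {
  limit = (limit == undefined || limit == 0) ? 5 : limit;
  // descending=true: newest key first.
  var rows = indexRows.slice().sort(function (a, b) { return b.key - a.key; });
  if (startkey > 0) {
    // With a descending view, startkey acts as an inclusive upper bound.
    var upperBound = startkey - 1;
    rows = rows.filter(function (row) { return row.key <= upperBound; });
  }
  return rows.slice(0, limit);
}

// Hypothetical index content: two users posted during the same "millisecond" 200.
var historyIndex = [
  { id: "100-ANN", key: 100 },
  { id: "200-ANN", key: 200 },
  { id: "200-BOB", key: 200 },
  { id: "300-ANN", key: 300 },
  { id: "400-BOB", key: 400 }
];

var firstPage = queryHistory(historyIndex, 3, 0);
// Assumption: the client asks for the next page from its oldest displayed key.
var oldestKey = firstPage[firstPage.length - 1].key;
var secondPage = queryHistory(historyIndex, 3, oldestKey);

console.log(firstPage.map(function (row) { return row.id; }));
console.log(secondPage.map(function (row) { return row.id; }));
```

With a page size of 3, the first page stops in the middle of the two messages that share the key 200. The second page starts at 199, so the other message with key 200 is never returned: four of the five messages are displayed.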

There is an interesting point to discuss here: the view is used only to return the list of keys, and the application then does a multiple get call with this list. Most of the time this is better than returning too much data in the view.

In this first option, the application uses a view to get the message history. This works well; the only thing to look at closely is that this approach uses an index, and indexes are stored on disk. So you need to be sure that the message is saved and the index updated before showing the message in the history, which is why stale=false is required in this specific scenario.


Second Option : Using a counter as document Key

Let's now see how it is possible, with a few changes in the application, to do the same without using a view. With this approach the application only uses the keys, which are all held in the memory of the server (memcached).

The application logic stays the same:
  1. When user connects to the server the system returns the last 5 messages from the database
  2. Each time the user posts a message it should be persisted
  3. The user can manually load older messages from the database to view the complete chat history

Post a new message
The key associated with the message is now a counter, and the application uses the increment feature of Couchbase:
socket.on('postMessage', function(data) {
  // create a new message
  var message = {
    type: "message",
    user: socket.username,
    message: data,
    timestamp: Date.now()
  }
  couchbase.incr("chat:msg_count", function (data, error, key, cas, value ) { 
    var messageKey = "chat:"+ value;
    message.id = value;
    io.sockets.emit('updateChatWindow', message);
    couchbase.set(messageKey, JSON.stringify(message),function(err) {  }); 
    });
});

Once the message object is created (line 3), the application increments a value, chat:msg_count, that is used as the message counter (line 9). Note that the Node Couchbase SDK will automatically create the key if it is not present when the incr() method is called.

When the server has returned the new value (incremented by 1, with a default value of 0), the callback function is called:

  • The value is used to create a new key for the message (line 10)
  • The message is pushed to the different users (line 12)
  • Then the message is saved into Couchbase (line 13)


So what we have here:
  • a new item that contains the counter, associated to the key : chat:msg_count
  • each message will have a key that looks like chat:0, chat:1, chat:2, ... 
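
The resulting key layout can be reproduced with a small in-memory sketch. Everything here is a stand-in: createMemoryStore and saveMessage are hypothetical names, the callbacks use a simplified (err, value) signature that differs from the SDK's, and the store assumes that the first incr() call creates the counter with the default value 0.

```javascript
// In-memory stand-in for the bucket; not the Couchbase SDK.
function createMemoryStore() {
  var items = {};
  return {
    // Assumption: a missing counter is created with the default value 0,
    // an existing one is incremented by 1.
    incr: function (key, callback) {
      if (!(key in items)) {
        items[key] = 0;
      } else {
        items[key] += 1;
      }
      callback(null, items[key]);
    },
    set: function (key, value, callback) {
      items[key] = value;
      callback(null);
    },
    get: function (key, callback) {
      callback(null, items[key]);
    }
  };
}

// Same steps as the postMessage handler, without the socket.io broadcast.
function saveMessage(store, user, text, callback) {
  var message = { type: "message", user: user, message: text, timestamp: Date.now() };
  store.incr("chat:msg_count", function (err, value) {
    if (err) { return callback(err); }
    var messageKey = "chat:" + value;
    message.id = value;
    store.set(messageKey, JSON.stringify(message), function (err) {
      callback(err, messageKey);
    });
  });
}

var store = createMemoryStore();
var savedKeys = [];
["Hello all !", "Hi Tug", "How are you?"].forEach(function (text) {
  saveMessage(store, "Tug", text, function (err, key) {
    if (err) { throw err; }
    savedKeys.push(key);
  });
});
console.log(savedKeys);
```

After three messages the store holds chat:0, chat:1 and chat:2, plus the counter chat:msg_count, whose value is the id of the latest message.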

Retrieve messages from Couchbase
Retrieving the older messages from Couchbase is very easy since every message has a unique and sequential id. The showHistory event handler just needs to create a list of keys based on the correct numbers and get them from Couchbase.

socket.on('showHistory', function(limit,startkey) {
  var keys = new Array();
  for (var i = startkey; i > (startkey-limit) && i >= 0 ; i--) {
    keys.push("chat:"+i);
  }
  couchbase.get(keys,function(err, doc, meta) {
    socket.emit('updateChatWindow', doc, true);
  });
});

Lines 3-5 are used to create an array of keys; then, on line 6, this array is used to do a multiple get and send the messages to the client using socket.emit.

Here the logic is almost the same as in the previous example. The only difference is that we do not call the Couchbase server to create the list of keys used to show the message history.
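
As a quick check of the key arithmetic, the loop can be isolated in a small function and run without any server. This is only a sketch: historyKeys and allPages are hypothetical helpers, and it assumes that the client sends the id of the newest message it has not displayed yet as startkey.

```javascript
// Same loop as in the showHistory handler, isolated so it runs without Couchbase.
function historyKeys(startkey, limit) {
  var keys = [];
  for (var i = startkey; i > (startkey - limit) && i >= 0; i--) {
    keys.push("chat:" + i);
  }
  return keys;
}

// Hypothetical helper: walk the whole history page by page, newest first.
function allPages(latestId, limit) {
  var pages = [];
  for (var start = latestId; start >= 0; start -= limit) {
    pages.push(historyKeys(start, limit));
  }
  return pages;
}

var pages = allPages(7, 5);
console.log(pages);
```

With eight messages (ids 0 to 7) and a page size of 5, the first page contains chat:7 down to chat:3 and the second page contains chat:2, chat:1 and chat:0. The i >= 0 condition stops the loop at the first message.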

Conclusion

As you can see, when working with a NoSQL database, as with any other persistence store, you often have different ways of achieving the same thing. In this example I used two approaches: one using a view, the other using the key directly.

The important thing here is to take some time when designing your application to see which approach is best for it. For this chat application I would probably stay with the "Key/Counter" approach, which should be the most efficient in terms of performance and scalability since it does not use a secondary index.







Monday, November 5, 2012

Couchbase : Create a large dataset using Twitter and Java

An easy way to create a large dataset when playing with or demonstrating Couchbase (or any other NoSQL engine) is to inject a Twitter feed into your database.

For this small application I am using:

In this example I am using Java to inject Tweets into Couchbase; you can obviously use another language if you want to.

The sources of this project are available in my GitHub repository, Twitter Injector for Couchbase. You can also download the binary version and execute the application from the command line; see the "Run the Java Application" section. Do not forget to create your Twitter OAuth keys (see the next section).


Create oAuth Keys

The first thing to do to be able to use the Twitter API is to create a set of keys. If you want to learn more about all these keys/tokens, take a look at the OAuth protocol: http://oauth.net/


1. Log in to the Twitter Development Portal: https://dev.twitter.com/

2. Create a new Application 
Click on the "Create an App" link or go into the "User Menu > My Applications > Create a new application"

3. Enter the Application Details information



4. Click "Create Your Twitter Application" button

Your application's OAuth settings are now available :


5. Scroll down the Application Settings page and click on the "Create My Access Token" button



You now have all the necessary information to create your application:
  • Consumer key 
  • Consumer secret
  • Access token
  • Access token secret

These keys will be used in the twitter4j.properties file when running the Java application from the command line; see the "Run the Java Application" section.



Create the Java Application

The following code is the main code of the application:

Some basic explanation:

  • The setUp() method simply reads the twitter4j.properties file from the classpath to build the Couchbase connection string.
  • The injectTweets method opens the Couchbase connection (line 76) and calls the TwitterStream API.
  • A listener is created to receive the status updates from Twitter. Its most important method is onStatus(Status status), which receives the message and saves it into Couchbase.
  • One interesting thing: since Couchbase is a JSON document database, it allows you to just take the JSON string and save it directly.
    cbClient.add(idStr,0 ,twitterMessage);

Packaging
To be able to execute the application directly from the JAR file, I am using the assembly plugin with the following information from the pom.xml:




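  ...
  <archive>
    <manifest>
     <mainClass>com.couchbase.demo.TwitterInjector</mainClass>
    </manifest>
    <manifestEntries>
     <Class-Path>.</Class-Path>
    </manifestEntries>
  </archive>
  ...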

Some information:

  • The mainClass entry allows you to set which class to execute when running java -jar command.
  • The Class-Path entry allows you to set the current directory as part of the classpath where the program will search for the twitter4j.properties file.
  • The assembly file is also configured to include all the dependencies (Twitter4J, Couchbase client SDK, ...)
If you want to build it from the sources, simply run:

mvn clean package

This will create the following JAR file: ./target/CouchbaseTwitterInjector.jar



Run the Java Application

Before running the application you must create a twitter4j.properties file with the following information :

twitter4j.jsonStoreEnabled=true

oauth.consumerKey=[YOUR CONSUMER KEY]
oauth.consumerSecret=[YOUR CONSUMER SECRET KEY]
oauth.accessToken=[YOUR ACCESS TOKEN]
oauth.accessTokenSecret=[YOUR ACCESS TOKEN SECRET]

couchbase.uri.list=http://127.0.0.1:8091/pools
couchbase.bucket=default
couchbase.password=

Save the properties file and from the same location run:


java -jar [path-to-jar]/CouchbaseTwitterInjector.jar


This will inject Tweets into your Couchbase Server. Enjoy !