Moving My Beers From Couchbase to MongoDB
Moving my Java from Couchbase to MongoDB pic.twitter.com/Wnn3pXfMGi
— Tugdual Grall (@tgrall) January 26, 2015
So I decided to move it from a simple picture to a real project. Let’s look at the two phases of this so-called project:
- Moving the data from Couchbase to MongoDB
- Updating the application code to use MongoDB
Moving the data
I have created a replication server that uses the Couchbase XDCR protocol to get the documents out and insert them into MongoDB. This server uses the Couchbase CAPI Server project available here. It receives all the mutations made in Couchbase:
- When a document is inserted or updated the full document is sent
- When a document is deleted, only the metadata is sent
- The replication server saves the data into MongoDB (inserts and/or updates - no delete), and then returns the list to Couchbase as part of the XDCR protocol.
- If the JSON document does not contain a type field, all the documents will be saved in a single collection
- If the JSON document contains a type field then a collection will be created for each type and documents will be inserted/updated in these collections
- MongoDB does not allow attribute keys to contain . and $ signs, so such keys must be renamed using alternative characters. This is done automatically while the data is copied.
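The mapping rules in the list above can be sketched in a few lines of JavaScript. This is only an illustration, not the replicator's actual code: the function names, the default collection name passed by the caller, and the choice of the full-width characters U+FF0E and U+FF04 as replacements are all assumptions.

```javascript
// Hypothetical helpers illustrating the mapping rules described above.

// Recursively rename keys containing "." or "$" (assumed replacement characters).
function sanitizeKeys(value) {
  if (Array.isArray(value)) {
    return value.map(sanitizeKeys);
  }
  if (value !== null && typeof value === "object") {
    var result = {};
    Object.keys(value).forEach(function (key) {
      var safeKey = key.replace(/\./g, "\uFF0E").replace(/\$/g, "\uFF04");
      result[safeKey] = sanitizeKeys(value[key]);
    });
    return result;
  }
  // Strings, numbers, booleans and null are copied unchanged.
  return value;
}

// Pick the target collection: one per "type" value, else a single default one.
function targetCollection(doc, defaultName) {
  var hasType = doc && typeof doc.type === "string" && doc.type.length > 0;
  return hasType ? doc.type : defaultName;
}
```

With these helpers, a document such as `{ type: "beer", "abv.value": 5 }` would go to a `beer` collection with its dotted key renamed, while a document without a type would go to whatever default collection the caller names.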
As you can see in the screencast, this is straightforward. (Note that I have only tested very simple use cases and deployments.)
You can download the tool and the source code here:
- https://github.com/tgrall/mongodb-cb-replicator
- Download the MongoCBReplicator.jar file.
Updating the application code
The next step is to use this data in an application. For this I simply used the Beer Sample Java application available in the Couchbase repository. I just recreated the project and modified a few things to get the application up and running:
- Change the connection string
- Remove the code that generates views
- Replace set/get with MongoDB operations
- Replace calls to the views with simple queries
I did not change any business logic, add features, or even replace the way navigation and page rendering are done. I focused only on the database access.
I did not attempt to optimize the MongoDB code; I just replaced as few lines of code as possible.
Note: I have not created any index during the process. Obviously, if your application holds more and more data and you do intense work with it, you must analyze your application and queries to see which indexes should be created.
Adding new features
Once you have the data in MongoDB, you can do a lot more with nothing more than MongoDB itself:
Full Text Search
You can create a text index on various fields in the collection to provide advanced search capabilities to your users.
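As a minimal sketch, a text index could be created from the MongoDB shell as follows. The collection name `brewery`, the indexed fields `name` and `description`, the index name and the weights are assumptions based on the beer-sample data set, not the original listing.

```javascript
// Assumed field names from the beer-sample brewery documents.
var textIndexFields = { name: "text", description: "text" };

// Optional: a match in the name counts more than one in the description.
var textIndexOptions = {
  name: "brewery_text",
  weights: { name: 10, description: 5 }
};

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.brewery.createIndex(textIndexFields, textIndexOptions);
}
```

A collection can have only one text index, so every field that should be searchable has to be listed in it.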
Then you can query the database using the $text operator, for example to find all breweries that match Belgium but not Ale.
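A sketch of such a query, assuming a text index already exists on an assumed `brewery` collection. The helper name `textSearchFilter` is hypothetical; the `$search` string uses the standard syntax in which a leading hyphen negates a term.

```javascript
// Build a $text filter: terms to match, plus terms to exclude (prefixed with "-").
function textSearchFilter(includeTerms, excludeTerms) {
  var negated = excludeTerms.map(function (term) {
    return "-" + term;
  });
  return { $text: { $search: includeTerms.concat(negated).join(" ") } };
}

var belgiumNotAle = textSearchFilter(["belgium"], ["ale"]);

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.brewery.find(belgiumNotAle, { name: 1, country: 1 }).forEach(printjson);
}
```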
Some analytics
Not sure these queries really make sense, but they show that you can now leverage your documents without the need for any third-party tool.
First, the number of beers by category, from the most common to the least common.
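One way to express this count with the aggregation framework is sketched below. The collection name `beer` and the field name `category` are assumptions taken from the beer-sample data set.

```javascript
// Group beers by category, count them, and sort by descending count.
var beersByCategory = [
  { $group: { _id: "$category", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.beer.aggregate(beersByCategory).forEach(printjson);
}
```

Beers without a category end up grouped under a `null` key; a leading `$match` stage could filter them out if that is not wanted.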
Then, the number of beers at or above a given ABV by brewery, for example the top 3 breweries with the most beers whose abv is greater than or equal to a value, let’s say 5.
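A possible pipeline for this, again as a sketch: the function name `topBreweriesByAbv` is hypothetical, and the `beer` collection with its `abv` and `brewery_id` fields is assumed from the beer-sample data set.

```javascript
// Keep beers with abv >= minAbv, count them per brewery, keep the top `limit`.
function topBreweriesByAbv(minAbv, limit) {
  return [
    { $match: { abv: { $gte: minAbv } } },
    { $group: { _id: "$brewery_id", count: { $sum: 1 } } },
    { $sort: { count: -1 } },
    { $limit: limit }
  ];
}

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.beer.aggregate(topBreweriesByAbv(5, 3)).forEach(printjson);
}
```

Putting `$match` first means only the matching beers reach the `$group` stage, and it lets MongoDB use an index on `abv` if one exists.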
Geospatial queries
The first thing to do is to change the data structure so that the location is stored in GeoJSON format; a simple script in the MongoDB shell is enough for this.
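Such a script could look like the sketch below. It assumes that brewery documents live in a `brewery` collection and carry a `geo` sub-document with numeric `lat` and `lon` fields, as in the beer-sample data set; the helper name `toGeoJsonPoint` is hypothetical.

```javascript
// Convert { lat, lon } into a GeoJSON Point (longitude first, then latitude).
function toGeoJsonPoint(geo) {
  if (!geo || typeof geo.lat !== "number" || typeof geo.lon !== "number") {
    return null; // no usable coordinates
  }
  return { type: "Point", coordinates: [geo.lon, geo.lat] };
}

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.brewery.find({ geo: { $exists: true } }).forEach(function (brewery) {
    var loc = toGeoJsonPoint(brewery.geo);
    if (loc) {
      // Add the new attribute and leave the original geo data untouched.
      db.brewery.updateOne({ _id: brewery._id }, { $set: { loc: loc } });
    }
  });
  // Geospatial queries on GeoJSON data need a 2dsphere index.
  db.brewery.createIndex({ loc: "2dsphere" });
}
```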
The script takes all the breweries and adds a new attribute, named loc, as a GeoJSON point. I could also have chosen to remove the old geo information using a ‘$unset’, but I did not; let’s imagine that some APIs or applications are still using it. This is a good example of a flexible schema.
Now I can search for all the breweries that are less than 30 km from the Golden Gate in San Francisco, at [-122.478255,37.819929].
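A sketch of this proximity search using the `$near` operator, which expects the distance in meters and a 2dsphere index on the queried field. The helper name `nearFilter`, the `brewery` collection and the projected fields are assumptions.

```javascript
// Build a $near filter on the loc field; maxMeters is the search radius in meters.
function nearFilter(lon, lat, maxMeters) {
  return {
    loc: {
      $near: {
        $geometry: { type: "Point", coordinates: [lon, lat] },
        $maxDistance: maxMeters
      }
    }
  };
}

// Breweries within 30 km of the Golden Gate.
var nearGoldenGate = nearFilter(-122.478255, 37.819929, 30 * 1000);

// Only runs inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.brewery.find(nearGoldenGate, { name: 1, city: 1 }).forEach(printjson);
}
```

Results of a `$near` query come back sorted from the nearest to the farthest.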
You can also use geospatial indexes and operators in the aggregation queries used above.
Conclusion
As I said in the introduction, this weekend project started as a joke on Twitter and finished with a small blog post and GitHub repositories.
My goal here is not to compare the two solutions (I made my choice a few months back) but simply to show how you can move from one to the other with almost no effort, not only the data but also the application code.
2 comments:
I've been evaluating MongoDB & Couchbase, and I hear that there is a restriction on the total number of documents or inserts per second. Is that true, while Couchbase writes can scale linearly?
Hello,
You do not have any "restriction" per se. The fact is, it depends a lot on your use case and the hardware you are using.
In any case both solutions support sharding, so you will be able to distribute the writes (and reads) across many machines.
The way it is done is very different in each solution, and both have pros and cons. The only correct answer for me is:
- look at your use case in detail
Based on this you can find the proper solution. Do not hesitate to reach me at tugdual[at]gmail[dot]com to discuss in detail.
Tug