For this example I will start by creating a simple view based on the beer-sample dataset, the view is used to find brewery by country:
function (doc, meta) { if (doc.type == "brewery" && doc.country){ emit(doc.country); } }
This view list all the breweries by country, the index looks like:
Doc id | Key | Value |
---|---|---|
bersaglier | Argentina | null |
cervecera_jerome | Argentina | null |
brouwerij_nacional_balashi | Aruba | null |
australian_brewing_corporation | Australia | null |
carlton_and_united_breweries | Australia | null |
coopers_brewery | Australia | null |
foster_s_australia_ltd | Australia | null |
gold_coast_brewery | Australia | null |
lion_nathan_australia_hunter_street | Australia | null |
little_creatures_brewery | Australia | null |
malt_shovel_brewery | Australia | null |
matilda_bay_brewing | Australia | null |
... | ... | ... |
... | ... | ... |
... | ... | ... |
yellowstone_valley_brewing | United States | null |
yuengling_son_brewing | United States | null |
zea_rotisserie_and_brewery | United States | null |
fosters_tien_gang | Viet Nam | null |
hue_brewery | Viet Nam | null |
So now you want to navigate in this index with a page size of 5 rows.
Using skip / limit Parameters
The most simplistic approach is to use limit and skip parameters for example:Page 1 : ?limit=5&skip0
Page 2: ?limit=5&skip=5
...
Page x: ?limit=5&skip(limit*(page-1))
You can obviously use any other parameters you need to do range or key queries (startkey/endkey, key, keys) and sort option (descending).
This is simple but not the most efficient way, since the query engine has to read all the rows that match the query, until the skip value is reached.
Some code sample in python that paginate using this view :
This application loops on all the pages until the end of the index.
As I said before this is not the best approach since the system must read all the values until the skip is reached. The following example shows a better way to deal with this.
Using startkey / startkey_docid parameters
To make this pagination more efficient it is possible to take another approach. This approach uses the startkey and startkey_docid to select the proper documents.
- The startkey parameter will be the value of the key where the query should start to read (based on the last key of the "previous page"
- Since for a key for example "Germany" you may have one or more ids (documents) it is necessary to say to Couchbase query engine where to start, for this you need to use the startkey_docid parameter, and ignore this id since it is the last one of the previous page.
So if we look at the index, and add a row number to explain the pagination
Row num | Doc id | Key | Value | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Query for page 1 ?limit=5 | ||||||||||||
1 | bersaglier | Argentina | null | |||||||||
2 | cervecera_jerome | Argentina | null | |||||||||
3 | brouwerij_nacional_balashi | Aruba | null | |||||||||
4 | australian_brewing_corporation | Australia | null | |||||||||
5 | carlton_and_united_breweries | Australia | null | |||||||||
| ||||||||||||
6 | coopers_brewery | Australia | null | |||||||||
7 | foster_s_australia_ltd | Australia | null | |||||||||
8 | gold_coast_brewery | Australia | null | |||||||||
9 | lion_nathan_australia_hunter_street | Australia | null | |||||||||
10 | little_creatures_brewery | Australia | null | |||||||||
Query for page 3
| ||||||||||||
11 | malt_shovel_brewery | Australia | null | |||||||||
12 | matilda_bay_brewing | Australia | null | |||||||||
... | ... | ... | ||||||||||
... | ... | ... | ||||||||||
... | ... | ... | ||||||||||
... | yellowstone_valley_brewing | United States | null | |||||||||
... | yuengling_son_brewing | United States | null | |||||||||
... | zea_rotisserie_and_brewery | United States | null | |||||||||
... | fosters_tien_gang | Viet Nam | null | |||||||||
... | hue_brewery | Viet Nam | null |
So as you can see in the examples above, the query uses the startkey, a document id, and just passes it using skip=1.
Let's now look at the application code, once again in Python
This application loops on all the pages until the end of the index
Using this approach, the application start to read the index at a specific key (startkey parameter), and only loop on the necessary entry in the index. This is more efficient than using the simple skip approach.
Views with Reduce function
When your view is using a reduce function, if you want to paginate on the various keys only (with the reduce function) you need to use the skip and limit parameters.
When you are using the paramater startkey_docid with a reduce function it will calculate the reduce only to the subset of document ids that are part of your query.
When you are using the paramater startkey_docid with a reduce function it will calculate the reduce only to the subset of document ids that are part of your query.
Couchbase Java SDK Paginator
In the previous examples, I have showed how to do pagination using the various query parameters. The Java SDK provides a Paginator object to help developers to deal with pagination. The following example is using the same view with the Paginator API.
So as you can see you can easily paginate on the results of a Query using the Java Paginator.
The Java Paginator is aware of the fact that they query is using a reduce or not, so you can use it with all type of queries - Internally it will switch between the skip/limit approach and the doc_id approaches. You can see how it is done in the Paginator class.
Note that if you want to do that in a Web application between HTTP request you must keep the Paginator object in the user session since the current API keeps the current page in its state.
So as you can see you can easily paginate on the results of a Query using the Java Paginator.
- At the line #37, the Paginator is created from using the view and query objects and a page size is specified
- Then you just need to use the hasNext() and next() methods to navigate in the results.
The Java Paginator is aware of the fact that they query is using a reduce or not, so you can use it with all type of queries - Internally it will switch between the skip/limit approach and the doc_id approaches. You can see how it is done in the Paginator class.
Note that if you want to do that in a Web application between HTTP request you must keep the Paginator object in the user session since the current API keeps the current page in its state.
Conclusion
In this blog post you have learned how to deal with pagination in Couchbase views; to summarize- The pagination is based on some specific parameters that you send when executing a query.
- Java developers can use the Paginator class that simplifies pagination.
I am inviting you to look at the new Couchbase Query Language N1QL, still under development, that will provide more options to developers including pagination, using LIMIT & OFFSET parameters, for example:
SELECT fname, age FROM tutorial WHERE age > 30 LIMIT 2 OFFSET 2
Hi Tug, I really like your samples, and I was wondering if is possible for you to show us how we can implement a "LIKE" query.
ReplyDeleteSELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern;
You know when we use views we can emit some keys, but if Im looking for a pattern inside that main key, how we can retrieve all the keys that contains that pattern?.
Thnx a lot.
How about the performance between n1ql and map reduce? On a large bucket, say 100m documents. No insight info everywhere.
ReplyDeleteHello, we do not have official numbers of performance of N1QL for now. As you know it is still under development. But the goal is to have predictable results like we have today with Views.
ReplyDeleteAristoteles,
ReplyDeleteI have a draft of content here that could help you:
https://gist.github.com/tgrall/c27591db3af256abf45c
As you can see with view you can only do a "LIKE 'X%'" so a startsWith (not a contains)
Will write a blog post soon
Hi Tug, thnx for the information, I was looking for the LIKE search, and some people use the following approach:
ReplyDeleteTe main idea is to split the keyword, and emit all the words resulting in something like this:
[
{ key: "and", value: { text: "Live long and prosper." } },
{ key: "Live", value: { text: "Live long and prosper." } },
{ key: "long", value: { text: "Live long and prosper." } },
{ key: "prosper", value: { text: "Live long and prosper." } }]
]
What do you think about this approach?.
Personally I have some issues with duplicated data in the views, but if you want to include some information about this, in some future post will be great.
Thnx a lot.
Hi Tug,
ReplyDeleteWe are having an issue in Couchbase 2.1.1. Basically when we can the REST interface like so: ?startkey=[[2013,10,10,23,12,0],"EVERYONE"]&startkey_docid=6e57a775-1e96-4ac3-95c1-f2218355aa3d&skip=1 we still get the startkey_docid and the only doc that is skipped in the first being returned from our view. There is an open question here: http://stackoverflow.com/questions/19915575/couchbase-startkey-docid-not-working-as-we-expect-it