Google AppEngine Full Text Search with Cloud SQL
Introduction
Many Google
AppEngine developers have been waiting for the Full Text Search feature,
especially coming from Google the biggest search engine on the Web. I
was quite happy to see that Google team is working on it as you can
check in the Google I/O 2011 session : Full Text Search by Bo Majewski, Ged Ellis . As far as I know the very promising indexing service is not yet available.
In
this article I will explain how you can provide some kind of full text
search in your application using services available App Engine services.
In
my specific use case I do not ask for a lot of feature, I just need to
have simple search a string in various attributes of my entities
independently of the case, and possible special characters (such as è,é,
... ). I am far of being an expert of Google Datastore API but I did
not find any simple way to achieve this directly using the Java API.
What I have done to solve this issue is to duplicate a part of my data
into the Google Cloud SQL to use the MySQL fulltext search capabilities.
Prerequisites
To achieve the following tasks you need to :
- Install the Google AppEngine SDK for Java
- Install MySQL 5.5
- A Google Account with AppEngine and Cloud SQL services.
- Optionally : Eclipse Plugin to develop, debug and deploy my Web application to GAE
Content
In the following paragraphs I will explain the basics of the integration of Cloud SQL for full text search, but you can, if you want, jump to :
- the full application code : https://github.com/tgrall/gae-full-text-search
- take a look of the running application : http://gae-fulltext-search.appspot.com/
Start by creating some simple entities with some attributes for example, an entity name Article, with title and body attributes.
import com.google.appengine.api.datastore.DatastoreService; import com.google.appengine.api.datastore.DatastoreServiceFactory; import com.google.appengine.api.datastore.Entity; //... //... Entity article = new Entity("Article"); article.setProperty("title", "MySQL Tutorial"); article.setProperty("body", "DBMS stands for DataBase ..."); datastore.put(article); article = new Entity("Article"); article.setProperty("title", "Datastore Index Selection and Advanced Search"); article.setProperty("body", "Learn how recent improvements to the query planner ... function in your application"); datastore.put(article);
If you look in the Datastore API, or even JDO or JPA you have no simple way to look for all the articles that are related to Triathlon, or Database, or Entities. Google DataStore does not support clause where with a "OR" between different fields; and I do not want to mention the fact that it is not possible to ignore the text case in a simple way.
This is why we need to have some full text features. Some of you are surely thinking about using Apache Lucene to do the trick, and yes it is possible. You can use for example the GAELucene project : http://code.google.com/p/gaelucene/. I use another approach, may be less advanced in term of "indexing/searching" options but sufficient for my use case:
- I store the text values on which I want to do some search in Google Cloud SQL and use the Full Text features of MySQL.
2. Create a SQL Table to store Text values (in development environment)
When using Google AppEngine, the Cloud SQL instances are accessed using a specific driver and configuration that we will see later. For now, we are still in development environment, this is where you have to use your local MySQL instance.
In this specific use case we will copy in a table the two fields and add a new unique key based on the entity key. So here the SQL to create this:
CREATE SCHEMA search_values DEFAULT CHARACTER SET utf8 ; USE search_values; CREATE TABLE articles ( entity_key varchar(250), title text, body text, PRIMARY KEY RESULTS_PK (entity_key), FULLTEXT (title,body) ) ENGINE=MyISAM DEFAULT CHARSET=utf8;Lines 1 and 3 are here to create the database schema and use it; then the script create a table that will contain a copy of the title and body from the entity.
3. Configure your development environment
This section is a short explanation of the Cloud SQL Documentation : Getting Started: Java
- Copy the MySQL JDBC driver into your Google App Engine SDK directory, under
/lib/impl/
. You can download the MySQL JDBC driver here. - In Eclipse, select your Java package.
- Click Run > Run Configurations.
- Expand the Web Application menu item.
- Add the following lines into the VM Arguments pane:
-Drdbms.server=local -Drdbms.driver=com.mysql.jdbc.Driver -Drdbms.url=jdbc:mysql://localhost:3306/search_values?user=username&password=password
- Click the Classpath tab.
- Select your project and click Add External JARs...
- Navigate to the Google App Engine SDK directory, then
lib/impl
, and select the JDBC driver JAR file. Click Open. The driver JAR is listed under User Entries. - Click Apply.
4. Use your MySQL table and copy the text values from Google Datastore to MySQL Table
Copying the data from Datastore entity to the table is quite easy:
Connection conn = null; try { DriverManager.registerDriver(new AppEngineDriver()); conn = DriverManager.getConnection("jdbc:google:rdbms://[your db instance]/search_values"); conn.setAutoCommit(false); String statement = "REPLACE INTO articles (entity_key, title, body) VALUES( ? , ? , ? )"; PreparedStatement stmt = conn.prepareStatement(statement); DatastoreService datastore = DatastoreServiceFactory.getDatastoreService(); Query q = new Query("Article"); PreparedQuery pq = datastore.prepare(q); // loop on each entity and insert the values in the SQL Table for (Entity result : pq.asIterable()) { stmt.setString(1, KeyFactory.keyToString(result.getKey()) ); stmt.setString(2, result.getProperty("title").toString() ); stmt.setString(3, result.getProperty("body").toString() ); stmt.executeUpdate(); conn.commit(); } } catch (SQLException e) { e.printStackTrace(); } finally { if (conn != null) try { conn.close(); } catch (SQLException ignore) {} }
Some specials things here, compare to standard Java Web Development:
- I manage the connection directly in my code (I have not looked yet if I can use datasources/connection pool in the context of Google AppEngine)
- Line #3: registering the AppEngine driver that is responsible of managing the connection, expecially work in development -local MySQL- or production mode -CloudSQL-.
- Line #4 : Get the connection. It is interesting to mention that in development the connection URL is grabbed from the environment variable Drdbms.url you have set previously. We will see later how we move this to the cloud. This is the magical part of the AppEngineDriver that manages different connection types Local MySQL or CloudSQL depending of the context
- After these lines, the code is quite simple :
- Get all the Articles entities from the datastore and loop
- "Upsert" the database record (REPLACE INTO syntax)
- Line #15 is storing the Key of the entity in a safe string using the KeyFactory.keyToString() method.
If you want to test this code just put this lines in a servlet to "sycnhronize" the data from datastore into the MySQL table. Obviously this code is just here for learning propose and should be integrated in a better way in a real application; starting with pushing the data in the database when entities are created/updated (and deleted ;) ). The sample code available from GitHub contains these methods.
5. Implement a search method
The goal is simple return a list of entities returned by a simple search criteria :
- public Iterable
searchEntities(String query)
The logic is here quite simple:
- Execute a SQL query
- For each result, get the Entity using the Key
- Return the list of Entities
public IterablesearchEntity(String query) { List results = new ArrayList (); Connection conn = null; try { DriverManager.registerDriver(new AppEngineDriver()); conn = DriverManager.getConnection("jdbc:google:rdbms://[your db instance]/search_values"); String statement = "SELECT entity_key FROM articles WHERE MATCH (title,body) AGAINST (? WITH QUERY EXPANSION);"; PreparedStatement stmt = conn.prepareStatement(statement); stmt.setString(1, query); ResultSet rs = stmt.executeQuery(); while (rs.next()) { String keyAsString = rs.getString(1); Entity article = DatastoreServiceFactory.getDatastoreService().get( KeyFactory.stringToKey(keyAsString) ); results.add(article); } } catch (SQLException e) { e.printStackTrace(); } catch (EntityNotFoundException e) { e.printStackTrace(); } finally { if (conn != null) try { conn.close(); } catch (SQLException ignore) {} } return results; }
In this method, the system connect to the database and then execute a query to search data using any type of SQL/MySQL query. In this exampe I am using the full text function with the "WITH QUERY EXPANSION". You can obviously use any type of SQL queries for example simple LIKE statement if this is enough four your application.
With this approach when I search for :
- "database" : the method returns all the articles concerning database, mysql, RDBMS independently of the case.
- "index" " the method returns all the articles talking about indexing/indexes or search.
6. Deploy to GAE
Once you have created your application, and activated and configure your CloudSQL instance (here), you can deploy your application and enjoy an easy way of using Full Text Search with GAE.
Conclusion
In this article I explained how you can use Google Cloud SQL to easily supports Full Text Search queries, based on the Full Text support of MySQL.
The code snippets that I have shared in this article are really basic and not ready for real life usage but still a good starting point. For example I have been using this in my application with GAE Queues to manage my indexes on larger volume of data.
As said before, you can test the application online at http://gae-fulltext-search.appspot.com/ and the source code is available on GitHub : https://github.com/tgrall/gae-full-text-search
No comments:
Post a Comment