How to sync a MySQL database to an external data source

I have a MySQL database table called search that I need to keep up to date with an Elasticsearch index. I have already exported the table to the ES index, but now I need to keep the data in sync, or else the search will become stale quite quickly.

The only way I can think of is by exporting the table every x minutes and then comparing it with what was last imported. This isn’t feasible since the table has about 10M rows and I don’t want to be doing table exports every five minutes all day long. What would be a good solution for this? Note that I only have read-access to the database.

Answers:

Method 1

I would leverage Logstash with a jdbc input plugin and an elasticsearch output plugin. There’s a blog article showing a full example of this solution.

After installing Logstash, you can create a configuration file with the plugins I mentioned above like this:

input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
        jdbc_user => "user"
        jdbc_password => "1234"
        jdbc_validate_connection => true
        jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        # cron syntax: run every 5 minutes
        schedule => "*/5 * * * *"
        statement => "SELECT * FROM search WHERE timestamp > :sql_last_value"
    }
}
output {
    elasticsearch {
        hosts => ["ES_NODE_HOST"]
        index => "searches"
        document_type => "search"
        document_id => "%{uid}"
    }
}

You will need to change a few values (connection string, credentials, driver path) to match your environment, but otherwise this should do what you need. Since the pipeline only runs a SELECT, read access to the database is enough.

Every 5 minutes, the query runs and fetches all search records whose timestamp (change that name to match your data) is more recent than the last time the query ran. The selected records are then indexed into the searches index on your Elasticsearch server at ES_NODE_HOST. Make sure to change the index and type names accordingly, as well as the name of the primary key field (i.e. uid), to match your data.
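To make the polling idea concrete, here is a minimal Python sketch of the same incremental pattern. It is not what Logstash does internally: it uses an in-memory SQLite table as a stand-in for the MySQL search table and a plain dict as a stand-in for the Elasticsearch index. The body column and the function names are hypothetical, and the sketch tracks the highest timestamp it has seen rather than the time of the last run, which is a variation on the approach described above.

```python
import sqlite3


def fetch_changed_rows(conn, last_value):
    """Return rows whose timestamp is newer than last_value, oldest first."""
    cur = conn.execute(
        "SELECT uid, body, timestamp FROM search "
        "WHERE timestamp > ? ORDER BY timestamp",
        (last_value,),
    )
    return cur.fetchall()


def sync_once(conn, index, last_value):
    """Copy changed rows into index (keyed by uid) and return the new watermark."""
    for uid, body, ts in fetch_changed_rows(conn, last_value):
        # Upsert by document id: an existing entry is overwritten, not duplicated.
        index[uid] = {"body": body, "timestamp": ts}
        last_value = max(last_value, ts)
    return last_value


# Stand-in for the MySQL table (assumption: integer timestamps for simplicity).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search (uid INTEGER PRIMARY KEY, body TEXT, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO search VALUES (?, ?, ?)",
    [(1, "first", 100), (2, "second", 200)],
)

index = {}  # stand-in for the Elasticsearch index
watermark = sync_once(conn, index, 0)  # initial pass picks up both rows

# Simulate changes made between two scheduled runs.
conn.execute(
    "UPDATE search SET body = 'second, edited', timestamp = 300 WHERE uid = 2"
)
conn.execute("INSERT INTO search VALUES (3, 'third', 310)")

watermark = sync_once(conn, index, watermark)  # only the changed rows are fetched
```

Because entries are keyed by uid, re-fetching an updated row overwrites the existing entry instead of duplicating it, which is the role document_id plays in the output section above. Note that polling on a timestamp cannot detect rows that were deleted from the table.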


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.