Sphinx without using an auto_increment id

I am currently planning to create a big database (2+ million rows) with a variety of data from separate sources. I would like to avoid structuring the database around auto_increment IDs, partly to help prevent sync issues with replication, and partly because each inserted item will have an alphanumeric product code that is guaranteed to be unique, so it makes more sense to me to use that instead.

I am looking for a search engine to index this database, and Sphinx looks rather appealing because it is designed around indexing relational databases. However, the tutorials and documentation I have found all seem to depend on an auto_increment field in one form or another, and the documentation states rather bluntly that document IDs must be 32- or 64-bit integers or things will break.

Is there a way to have a database indexed by Sphinx without auto_increment fields as the id?



Method 1

Sure – that's easy to work around. If you need to make up your own IDs just for Sphinx and you don't want them to collide, you can do something like this in your sphinx.conf (example code for MySQL):

source products {
  # Connection settings (placeholder values; substitute your own)
  type     = mysql
  sql_host = localhost
  sql_user = sphinx
  sql_pass = secret
  sql_db   = shop

  # Use a user variable to store a throwaway ID value
  sql_query_pre = SET @id := 0

  # Keep incrementing the throwaway ID; the first column is the document ID.
  # "code" is selected twice because Sphinx does not full-text index attributes
  sql_query = SELECT @id := @id + 1 AS id, code AS code_attr, code, description FROM products

  # Return the code so that your app will know which records were matched.
  # This only works in Sphinx 0.9.10 and higher!
  sql_attr_string = code_attr
}

The only problem is that you still need a way to know what records were matched by your search. Sphinx will return the id (which is now meaningless) plus any columns that you mark as “attributes”.

Sphinx 0.9.10 and above will be able to return your product code to you as part of the search results because it supports string attributes.

0.9.10 is not an official release yet but it is looking great. It looks like Zawodny is running it over at Craig’s List so I wouldn’t be too nervous about relying on this feature.

Method 2

Sphinx only requires IDs to be unique integers; it doesn't care whether they are auto-incremented or not, so you can roll your own logic. For example, generate integer hashes of your string keys.
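A minimal sketch of the hashing idea, assuming a 64-bit-ID build of Sphinx. The function names `sphinx_doc_id` and `assign_ids` are hypothetical, and the choice of SHA-1 truncated to 63 bits is just one reasonable option (it keeps values positive in a signed BIGINT column too). Because a hash can collide, the sketch checks for collisions rather than silently letting two products share a document ID:

```python
import hashlib

def sphinx_doc_id(code: str, bits: int = 63) -> int:
    """Map a product code to a positive integer document ID (hypothetical scheme).

    Takes the first 8 bytes of a SHA-1 digest and keeps the low `bits` bits.
    63 bits fits both unsigned 64-bit Sphinx IDs and signed BIGINT columns.
    """
    digest = hashlib.sha1(code.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)
    # Sphinx document IDs must be non-zero
    return value or 1

def assign_ids(codes):
    """Hash every code; raise on a collision instead of merging two documents."""
    ids = {}
    for code in codes:
        doc_id = sphinx_doc_id(code)
        other = ids.get(doc_id)
        if other is not None and other != code:
            raise ValueError(f"hash collision: {code!r} and {other!r} -> {doc_id}")
        ids[doc_id] = code
    return ids
```

The width matters at this scale: with 2 million codes, a 32-bit hash would be expected to produce hundreds of colliding pairs, while with 63 bits the chance of any collision is roughly one in a few million. The collision check is still worth keeping.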

Method 3

Sphinx doesn't depend on auto-increment; it just needs unique integer document IDs. Maybe you can keep a surrogate unique integer ID in your tables to work with Sphinx; integer key lookups are also generally faster than lookups on alphanumeric keys. By the way, how long is your alphanumeric product code? Do you have any samples?
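A minimal sketch of the surrogate-ID approach, using an in-memory SQLite database as a stand-in for MySQL. The table layout, the `doc_id` column, the sample rows and the `codes_for_matches` helper are all hypothetical. The product code stays the real primary key, `doc_id` exists only for Sphinx (the source would then use something like `sql_query = SELECT doc_id, code, description FROM products`), and the app translates the IDs Sphinx returns back into codes. How `doc_id` values are assigned is left open, e.g. a hash of the code as in Method 2:

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL database in this sketch
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        code        TEXT PRIMARY KEY,         -- natural key, still the real identifier
        doc_id      INTEGER NOT NULL UNIQUE,  -- surrogate integer used only by Sphinx
        description TEXT
    )
""")
# Hypothetical sample rows with application-assigned doc_id values
conn.executemany(
    "INSERT INTO products (code, doc_id, description) VALUES (?, ?, ?)",
    [("ABC123", 1001, "red widget"), ("XYZ789", 1002, "blue gadget")],
)

def codes_for_matches(conn, doc_ids):
    """Translate Sphinx document IDs into product codes, keeping Sphinx's ranking order."""
    if not doc_ids:
        return []
    placeholders = ",".join("?" * len(doc_ids))
    rows = conn.execute(
        f"SELECT doc_id, code FROM products WHERE doc_id IN ({placeholders})",
        list(doc_ids),
    ).fetchall()
    by_id = dict(rows)
    # Skip IDs that no longer exist (e.g. rows deleted since the last reindex)
    return [by_id[i] for i in doc_ids if i in by_id]
```

For example, if Sphinx ranked document 1002 above 1001, `codes_for_matches(conn, [1002, 1001])` gives the codes in that same order.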

Method 4

I think it's possible to generate an XML stream from your data,
then create the IDs in software (Ruby, Java, PHP).
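A minimal sketch of that idea in Python, assuming Sphinx's xmlpipe2 source type (configured with `type = xmlpipe2` and `xmlpipe_command` pointing at a script like this one; the script path is up to you). The ID scheme, the field/attribute names and the sample rows are hypothetical, and the `string` attribute type carries the same Sphinx 0.9.10+ caveat as Method 1:

```python
import hashlib
import sys
from xml.sax.saxutils import escape

def doc_id(code):
    # Hypothetical ID scheme: 63-bit slice of a SHA-1 hash of the product code, never zero
    digest = hashlib.sha1(code.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") & ((1 << 63) - 1)) or 1

def xmlpipe2(products):
    """Build an xmlpipe2 stream from (code, description) pairs."""
    out = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="code"/>',
        '<sphinx:field name="description"/>',
        # Stored copy of the code so search results can report it
        '<sphinx:attr name="code_attr" type="string"/>',
        "</sphinx:schema>",
    ]
    for code, description in products:
        out.append(f'<sphinx:document id="{doc_id(code)}">')
        out.append(f"<code>{escape(code)}</code>")
        out.append(f"<description>{escape(description)}</description>")
        out.append(f"<code_attr>{escape(code)}</code_attr>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)

if __name__ == "__main__":
    # Hypothetical sample rows; a real script would read them from the database
    sys.stdout.write(xmlpipe2([("ABC123", "red widget"), ("XYZ789", "nuts & bolts")]))
```

Since the IDs are computed from the code itself, re-running the script yields the same IDs every time, regardless of row order or which source a product came from.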

Take a look at

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
