I have a table A with a column ‘template_phash’, which stores the phashes generated from 400K images.
Now I take a random image and generate a phash from that image.
How do I query table A for the records whose Hamming distance from that phash is less than a threshold value, say 20?
I have seen Hamming distance on binary strings in SQL, but couldn’t figure it out.
I think I need to write a function to achieve this, but how?
Both of my phashes are BigInt values, e.g. 7641692061273169067.
Please help me write the function so that I can query like this:
SELECT product_id, HAMMING_DISTANCE(phash1, phash2) as hd FROM A WHERE hd < 20 ORDER BY hd ASC;
Answers:
Method 1
The Hamming distance is just the number of bit positions in which the two hashes differ. First XOR the two hashes, then count the one bits in the result:
SELECT product_id, BIT_COUNT(phash1 ^ phash2) AS hd FROM A ORDER BY hd ASC;
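To sanity-check the XOR-then-count idea outside the database, here is a minimal Python sketch. The function name `hamming_distance`, the `candidates` dictionary, and its product ids are made up for illustration. Only the example hash value and the threshold of 20 come from the question. It assumes the hashes are non-negative integers.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two non-negative integers differ."""
    # XOR leaves a 1 exactly where the two values disagree;
    # counting those 1s gives the Hamming distance.
    return bin(a ^ b).count("1")


# Example hash taken from the question.
query_hash = 7641692061273169067

# Hypothetical stored hashes keyed by a made-up product id.
candidates = {
    1: query_hash,                  # identical hash, distance 0
    2: query_hash ^ 0b111,          # lowest 3 bits flipped, distance 3
    3: query_hash ^ (2**25 - 1),    # lowest 25 bits flipped, distance 25
}

THRESHOLD = 20  # threshold value from the question

# Keep only the candidates under the threshold, closest first.
matches = sorted(
    (hamming_distance(query_hash, h), product_id)
    for product_id, h in candidates.items()
    if hamming_distance(query_hash, h) < THRESHOLD
)

for hd, product_id in matches:
    print(product_id, hd)
```

This mirrors what `BIT_COUNT(phash1 ^ phash2)` computes per row. The sorted list plays the role of `ORDER BY hd ASC`.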
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.