I am researching for sha1 collision.
The size of the hashset in all cases of sha1 is 2^160. Assuming you put it in the aws database, how long will it take to retrieve the record? Is there an area defined as how many seconds it takes to retrieve a record in mysql or mssql documentation? Also, what database type of aws do you think is good for a huge amount of 2^160?
Thanks in advance for your reply.
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
To retrieve one row from any suitably indexed table will take “a few” milliseconds.
Said another way, a single program can fetch hundreds, maybe thousands, of row, one at a time, from the table indexed by that sha1 column in one second.
To understand how this is possible, read about how a
The number of bits in the key (160) has very little impact on what I said above. And 2^160 is simply the maximum number of distinct values for that key.
The only significant difference between looking it up on AWS and our personal laptop is the network latency.
If you are inserting a row but want to first check for the sha1 already being there, simply do the insert, then check for a “duplicate key” error. Or use
INSERT IGNORE... Or use
INSERT ... ON DUPLICATE KEY UPDATE... The Insert must (effectively) do a
SELECT first to see if the will be a dup key — essentially no extra effort.
Think back to how your grandparents used to look up a word in a book called a dictionary or telephone directory or atlas.
There is no way to store that much data, on aws or anywhere else. Nor is it possible to generate that much data in any conceivable amount of time.
I just want to calculate an imaginary estimated time
About 4 seconds.
Imagine B-Tree on a 160 bit key. If each Page is 8000 bytes, and the page pointer is also 160 bits that’s 25 key values per page. 2^160 ~ 25^35, so there would be 35 levels in the B-Tree, so 35 IO’s to get to the leaf level.
This would necessarily be a planet-sized distributed system, so allow 100ms per IO, and that’s something under 4 seconds.