I am a university student and our task is to create a search engine. I am having difficulty generating a unique ID to assign to each URL when it is added to the frontier. I have tried the SHA-256 hashing algorithm as well as Guid. Here is the code that I used to implement the Guid approach:
public string generateID(string url_add)
{
    long i = 1;
    foreach (byte b in Guid.NewGuid().ToByteArray())
    {
        i *= ((int)b + 1);
    }
    string number = String.Format("{0:d9}", (DateTime.Now.Ticks / 10) % 1000000000);
    return number;
}
Answers:
Method 1
Why not just use ToString?
public string generateID()
{
    return Guid.NewGuid().ToString("N");
}
If you would like it to be based on a URL, you could simply do the following:
public string generateID(string sourceUrl)
{
    return string.Format("{0}_{1:N}", sourceUrl, Guid.NewGuid());
}
If you want to hide the URL, you could use some form of SHA1 on the sourceURL, but I’m not sure what that might achieve.
Method 2
Why not use a GUID?
Guid guid = Guid.NewGuid();
string str = guid.ToString();
Method 3
Here is a YouTube-video-ID-style generator that produces IDs such as “UcBKmq2XE5a”:
StringBuilder builder = new StringBuilder();
Enumerable
.Range(65, 26)
.Select(e => ((char)e).ToString())
.Concat(Enumerable.Range(97, 26).Select(e => ((char)e).ToString()))
.Concat(Enumerable.Range(0, 10).Select(e => e.ToString()))
.OrderBy(e => Guid.NewGuid())
.Take(11)
.ToList().ForEach(e => builder.Append(e));
string id = builder.ToString();
It creates random IDs that are 11 characters long. You can make them longer or shorter by changing the argument passed to the Take method.
Roughly 0.001% duplicates in 100 million generated IDs.
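The same shuffle-and-take idea can be wrapped in a method so that the length becomes a parameter. This is only a minimal sketch: the names ShortId, Generate and Alphabet are hypothetical and chosen for illustration. Because the alphabet is shuffled rather than sampled, no character repeats within one ID, and the length cannot exceed the 62 available characters.

```csharp
using System;
using System.Linq;

public static class ShortId
{
    // A-Z, a-z and 0-9: the same 62 characters the snippet above builds with Enumerable.Range.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    public static string Generate(int length)
    {
        // Shuffling cannot produce more characters than the alphabet contains.
        if (length < 1 || length > Alphabet.Length)
            throw new ArgumentOutOfRangeException(nameof(length));

        // Sort the characters by a fresh GUID each (a shuffle), then keep the first 'length'.
        char[] chars = Alphabet.OrderBy(c => Guid.NewGuid()).Take(length).ToArray();
        return new string(chars);
    }
}
```

Calling ShortId.Generate(11) gives an ID of the same shape as the original snippet.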
Method 4
Why not build a unique ID as follows?
We can combine DateTime.Now.Ticks and Guid.NewGuid().ToString() to make a unique ID.
Because DateTime.Now.Ticks is part of the ID, we can later find out the date and time at which the ID was created.
Please see the code:
var ticks = DateTime.Now.Ticks;
var guid = Guid.NewGuid().ToString();
var uniqueSessionId = ticks.ToString() + '-' + guid; // ID created by combining ticks and GUID
var datetime = new DateTime(ticks); // for checking purposes
var datetimenow = DateTime.Now; // these two date-times are different
We can even extract the ticks part of the unique ID later and recover the date and time for future reference.
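Reading the timestamp back out could look roughly like the sketch below. It assumes the ID has the same "ticks-guid" shape as uniqueSessionId above. The names TimestampedId, Create and GetCreationTime are hypothetical, and DateTime.UtcNow is used instead of DateTime.Now as an assumption, so that IDs from machines in different time zones stay comparable.

```csharp
using System;

public static class TimestampedId
{
    // Builds "<ticks>-<guid>". The ticks part contains digits only,
    // so the first '-' always marks where the GUID starts.
    public static string Create()
    {
        return DateTime.UtcNow.Ticks + "-" + Guid.NewGuid();
    }

    // Recovers the creation time from the part before the first '-'.
    public static DateTime GetCreationTime(string id)
    {
        int separator = id.IndexOf('-');
        if (separator <= 0)
            throw new FormatException("Expected an ID of the form <ticks>-<guid>.");

        long ticks = long.Parse(id.Substring(0, separator));
        return new DateTime(ticks, DateTimeKind.Utc);
    }
}
```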
Method 5
If you want to use SHA-256 (a GUID would be faster), you would need to do something like this:
SHA256 shaAlgorithm = new SHA256Managed();
byte[] shaDigest = shaAlgorithm.ComputeHash(ASCIIEncoding.ASCII.GetBytes(url));
return BitConverter.ToString(shaDigest);
Of course, the encoding doesn’t have to be ASCII, and you can use any other hashing algorithm as well.
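One property that may matter for a crawler frontier is that a hash is deterministic: the same URL string always yields the same ID, unlike a fresh GUID. Below is a minimal sketch of the hashing approach as a complete method. It assumes UTF-8 instead of ASCII (so that non-ASCII characters are not all replaced with '?') and strips the dashes that BitConverter.ToString inserts. The names UrlHashId and FromUrl are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class UrlHashId
{
    // Returns the SHA-256 digest of the URL as 64 lowercase hex characters.
    public static string FromUrl(string url)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(url));

            // BitConverter.ToString yields "AB-CD-..."; drop the dashes and lowercase it.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Note that the sketch does not normalize the URL, so two spellings of the same address (for example, with and without a trailing slash) produce different IDs.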
Method 6
This question seems to be answered; however, for completeness, I would add another approach.
You can use a unique ID number generator which is based on Twitter’s Snowflake id generator. C# implementation can be found here.
var id64Generator = new Id64Generator();
// ...
public string generateID(string sourceUrl)
{
    return string.Format("{0}_{1}", sourceUrl, id64Generator.GenerateId());
}
Note that one very nice feature of this approach is that you can run multiple generators on independent nodes (probably something useful for a search engine), each producing globally unique identifiers in real time.
// node 0
var id64Generator = new Id64Generator(0);
// node 1
var id64Generator = new Id64Generator(1);
// ... node 10
var id64Generator = new Id64Generator(10);
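To show why IDs from different nodes cannot clash, here is a rough sketch of a Snowflake-style generator. It is not the Id64Generator class mentioned above, and its internals may well differ. The class name SnowflakeSketch, the 2020 epoch, and the 10-bit node / 12-bit sequence split are all assumptions made for illustration. Each ID packs a millisecond timestamp, the node number and a per-millisecond counter into one 64-bit value.

```csharp
using System;

// Hypothetical illustration only; not the Id64Generator library.
public sealed class SnowflakeSketch
{
    private const int NodeBits = 10;       // assumed: up to 1024 nodes
    private const int SequenceBits = 12;   // assumed: up to 4096 IDs per millisecond
    private const long MaxSequence = (1L << SequenceBits) - 1;

    // Arbitrary custom epoch chosen for this sketch.
    private static readonly DateTime Epoch =
        new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    private readonly object gate = new object();
    private readonly long nodeId;
    private long lastMillis = -1;
    private long sequence;

    public SnowflakeSketch(long nodeId)
    {
        if (nodeId < 0 || nodeId >= (1L << NodeBits))
            throw new ArgumentOutOfRangeException(nameof(nodeId));
        this.nodeId = nodeId;
    }

    public long NextId()
    {
        lock (gate)
        {
            long now = (long)(DateTime.UtcNow - Epoch).TotalMilliseconds;

            if (now > lastMillis)
            {
                // New millisecond: restart the counter.
                lastMillis = now;
                sequence = 0;
            }
            else if (sequence < MaxSequence)
            {
                // Same millisecond (or the clock stepped back): bump the counter.
                sequence++;
            }
            else
            {
                // Counter exhausted: move the logical timestamp forward by 1 ms.
                lastMillis++;
                sequence = 0;
            }

            // Layout: [timestamp][node][sequence]
            return (lastMillis << (NodeBits + SequenceBits))
                 | (nodeId << SequenceBits)
                 | sequence;
        }
    }
}
```

Two generators created as new SnowflakeSketch(0) and new SnowflakeSketch(1) never return the same value, because the node number occupies its own bits in every ID.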
Method 7
We can do something like this:
string TransactionID = "BTRF"+DateTime.Now.Ticks.ToString().Substring(0, 10);
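One caveat with this expression: Ticks counts 100-nanosecond intervals and currently has 18 digits, so the first 10 digits change only once every 10^8 ticks, that is, every 10 seconds. Two IDs generated within the same 10-second window are therefore identical. The sketch below makes this testable by passing the timestamp in, and shows one possible fix, appending a GUID. The names TickPrefixDemo, FromTicks and WithGuid are hypothetical.

```csharp
using System;

public static class TickPrefixDemo
{
    // Same expression as above, with the timestamp passed in instead of DateTime.Now.
    public static string FromTicks(DateTime time)
    {
        return "BTRF" + time.Ticks.ToString().Substring(0, 10);
    }

    // Hypothetical variant: the GUID suffix keeps IDs distinct inside one 10-second window.
    public static string WithGuid(DateTime time)
    {
        return FromTicks(time) + "-" + Guid.NewGuid().ToString("N");
    }
}
```

With a fixed start time t at midnight, FromTicks(t) and FromTicks(t.AddSeconds(1)) return the same string, while the two WithGuid results differ.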
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.