Vincent SHAO wrote:
> Search engines have to record every query string. I now have a
> search engine log which contains 10 million query strings, but most
> of them are repeats; no more than 3 million of them are
> distinct.
> My task is to pick the top 10 most popular query strings, with memory < 1G;
> the length of a query string is no more than 255.
>
> The faster, the better.
> I am looking for the principal solutions: algorithm and data structure.
>
> Thank you.:-)
My first attempt would be to stuff the query strings into a map, with the
query string (or a hash of it) as the key and the number of times it occurs
as the data.
Then loop over the map and either sort by count, or simply compare counts
and keep the keys of the top 10.
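
A minimal sketch of that idea in C++, assuming the log holds one query
string per line. The names count_queries and top_k are made up for
illustration, and std::unordered_map plus std::partial_sort are just one
reasonable choice, not the only one. Ties are broken alphabetically, which
is also an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using CountMap = std::unordered_map<std::string, std::size_t>;
using QueryCount = std::pair<std::string, std::size_t>;

// Count how often each distinct line (one query string per line) occurs.
CountMap count_queries(std::istream& in) {
    CountMap counts;
    std::string line;
    while (std::getline(in, line)) {
        ++counts[line];  // inserts with count 0 on first sight, then increments
    }
    return counts;
}

// Return the k most frequent entries, most frequent first.
// Sorts pointers into the map so the query strings are not copied.
std::vector<QueryCount> top_k(const CountMap& counts, std::size_t k) {
    std::vector<const CountMap::value_type*> ptrs;
    ptrs.reserve(counts.size());
    for (const auto& entry : counts) {
        ptrs.push_back(&entry);
    }

    k = std::min(k, ptrs.size());
    std::partial_sort(
        ptrs.begin(), ptrs.begin() + static_cast<std::ptrdiff_t>(k), ptrs.end(),
        [](const CountMap::value_type* a, const CountMap::value_type* b) {
            if (a->second != b->second) return a->second > b->second;
            return a->first < b->first;  // assumed tie-break: alphabetical
        });

    std::vector<QueryCount> result;
    result.reserve(k);
    for (std::size_t i = 0; i < k; ++i) {
        result.emplace_back(ptrs[i]->first, ptrs[i]->second);
    }
    return result;
}
```

Usage would be something like opening the log with std::ifstream, calling
count_queries on it, then top_k(counts, 10). Whether 3 million keys of up to
255 characters plus the map overhead actually stay under 1G depends on the
real string lengths and the library implementation, so measure before
relying on it.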
--
Jim Langston
tazmaster@[EMAIL PROTECTED]

