The set is sorted – searching is faster!

In my last blog entry, I had said that the set container was better than a vector container only in the fact that its elements were sorted. Well this gives it a huge advantage when it comes to finding an element. Take a look at the following code snippet that populates a vector, a list and a set with 100,000 randomly generated integers :-

srand((unsigned int)time(0));
srand(rand() + rand() + rand());
vector<int> vecint;
list<int> listint;
set<int> setint;
_tcout << _T("Populating numbers... ") << endl;
for(int i=0; i<100000; i++)
{
    if(i%1000 == 0)
        _tcout << _T("\b\b\b") << i/1000 << _T("%");
    int num = rand();
    vecint.push_back(num);
    setint.insert(num);
    listint.push_back(num);
}
_tcout << _T("\b\b\b") << _T("100%") << endl << endl;

Here’s some code that generates a 1000 random numbers to search for :-

vector<int> numstofind;
for(int i=0; i<1000; i++)
    numstofind.push_back(vecint[(int)((
    (double)rand() / (double) RAND_MAX) * 1000)]);

I wrote a templated function that’ll iterate through the entire collection looking for these numbers :-

template<typename T> int FindNum(T t,
    vector<int> numstofind, const TCHAR* mesg)
{
    _tcout << mesg << endl;
    DWORD init = GetTickCount();


    for each(int i in numstofind)
        *find(t.begin(), t.end(), i);


    DWORD diff = GetTickCount() - init;
    _tcout << diff << _T(" milliseconds elapsed") << endl << endl;
    return diff;
}

And I call the function for each container class :-

FindNum< vector<int> >(vecint, numstofind, _T("Find numbers in vector"));
FindNum< list<int> >(listint, numstofind, _T("Find numbers in list"));
FindNum< set<int> >(setint, numstofind, _T("Find numbers in set"));

Here are my results :-

Populating numbers...
100%

Find numbers in vector
360 milliseconds elapsed

Find numbers in list
441 milliseconds elapsed

Find numbers in set
26589 milliseconds elapsed

Hey, something’s wrong. Wasn’t the set container supposed to be a lot faster than vector or list at finding elements? Oops, we’ve used the find() algorithm to find an element which iterates through each element till a match is found. We don’t have to do that for a set, since a set is sorted and has its find() method which we should have used. Here’s a specialization of the above template function for set :-

template<> int FindNum< set<int> >(
    set<int> t, vector<int> numstofind, const TCHAR* mesg)
{
    _tcout << mesg
        << _T(" (specialized version for set<int>) ") << endl;
    DWORD init = GetTickCount();


    for each(int i in numstofind)
        *t.find(i);


    DWORD diff = GetTickCount() - init;
    _tcout << diff << _T(" milliseconds elapsed") << endl << endl;
    return diff;
}

Here are the results I got after I added this specialization :-

Populating numbers...
100%

Find numbers in vector
360 milliseconds elapsed

Find numbers in list
461 milliseconds elapsed

Find numbers in set (specialized version for set)
20 milliseconds elapsed

Ah, now we have got the expected results, haven’t we?

Advertisements
Posted in C++

One thought on “The set is sorted – searching is faster!

  1. If true perf is what you’re after, a sorted vector will actually be faster
    than a set for lookup operations. Of course this is only feasible if the data in the set is
    constant, since vectors are horrible at deletion. And a hash_set probably beats both of them for large data sets.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s