In my last blog entry, I had said that the `set`

container was better than a `vector`

container only in the fact that its elements were sorted. Well this gives it a huge advantage when it comes to finding an element. Take a look at the following code snippet that populates a `vector`

, a `list`

and a `set`

with 100,000 randomly generated integers :-

srand((unsigned int)time(0)); srand(rand() + rand() + rand()); vector<int> vecint; list<int> listint; set<int> setint; _tcout << _T("Populating numbers... ") << endl; for(int i=0; i<100000; i++) { if(i%1000 == 0) _tcout << _T("\b\b\b") << i/1000 << _T("%"); int num = rand(); vecint.push_back(num); setint.insert(num); listint.push_back(num); } _tcout << _T("\b\b\b") << _T("100%") << endl << endl;

Here’s some code that generates a 1000 random numbers to search for :-

vector<int> numstofind; for(int i=0; i<1000; i++) numstofind.push_back(vecint[(int)(( (double)rand() / (double) RAND_MAX) * 1000)]);

I wrote a templated function that’ll iterate through the entire collection looking for these numbers :-

template<typename T> int FindNum(T t, vector<int> numstofind, const TCHAR* mesg) { _tcout << mesg << endl; DWORD init = GetTickCount(); for each(int i in numstofind) *find(t.begin(), t.end(), i); DWORD diff = GetTickCount() - init; _tcout << diff << _T(" milliseconds elapsed") << endl << endl; return diff; }

And I call the function for each container class :-

FindNum< vector<int> >(vecint, numstofind, _T("Find numbers in vector")); FindNum< list<int> >(listint, numstofind, _T("Find numbers in list")); FindNum< set<int> >(setint, numstofind, _T("Find numbers in set"));

Here are my results :-

Populating numbers... 100% Find numbers in vector 360 milliseconds elapsed Find numbers in list 441 milliseconds elapsed Find numbers in set 26589 milliseconds elapsed

Hey, something’s wrong. Wasn’t the `set`

container supposed to be a lot faster than `vector`

or `list`

at finding elements? Oops, we’ve used the `find()`

algorithm to find an element which iterates through each element till a match is found. We don’t have to do that for a `set`

, since a `set`

is sorted and has its `find()`

method which we should have used. Here’s a specialization of the above template function for `set`

:-

template<> int FindNum< set<int> >( set<int> t, vector<int> numstofind, const TCHAR* mesg) { _tcout << mesg << _T(" (specialized version for set<int>) ") << endl; DWORD init = GetTickCount(); for each(int i in numstofind) *t.find(i); DWORD diff = GetTickCount() - init; _tcout << diff << _T(" milliseconds elapsed") << endl << endl; return diff; }

Here are the results I got after I added this specialization :-

Populating numbers... 100% Find numbers in vector 360 milliseconds elapsed Find numbers in list 461 milliseconds elapsed Find numbers in set (specialized version for set) 20 milliseconds elapsed

Ah, now we have got the expected results, haven’t we?

If true perf is what you’re after, a sorted vector will actually be faster

than a set for lookup operations. Of course this is only feasible if the data in the set is

constant, since vectors are horrible at deletion. And a hash_set probably beats both of them for large data sets.