What's Wrong With This Code? Volume #6
The STL
Tip:
Remember to consider possible resource leaks when removing elements from containers that hold pointers.
Correct code would (probably) first iterate through v1 and call delete on each element and then do the assignment, unless you can guarantee that the objects have other references besides the elements of v1.
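A minimal sketch of that approach, assuming (as in the questions below) that V is a container of raw A* pointers, that v1 owns the objects it points to, and that the assignment in question is something like v1 = v2:

    void f()
    {
        // free every object v1 currently owns before its pointers are overwritten
        for (V::iterator i = v1.begin(); i != v1.end(); ++i)
        {
            delete *i;
        }
        // the assignment can no longer leak the old elements
        v1 = v2;
    }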
Question #2:
    // delete the element in position 5
    void f()
    {
        if (v1.size() > 5)
        {
            delete v1[5];
        }
    }
Why is calling f() most likely going to lead to a program crash?
Answer #2:
The error here is that after element 5 is deleted, the vector still holds the pointer to the deleted object. This is a classic dangling-pointer bug. It is a good idea to remove v1[5] from the vector after you call delete.
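A minimal sketch of that cleanup, assuming v1 is a vector of raw A* pointers that owns its elements:

    // delete the element in position 5 and remove the dangling pointer
    void f()
    {
        if (v1.size() > 5)
        {
            delete v1[5];                  // free the object
            v1.erase(v1.begin() + 5);      // then drop the now-dangling pointer
        }
    }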
Tip:
Removing an element from a container can invalidate iterators currently pointing into that container.
Question #3:
    // copy v1 into v2
    void f()
    {
        std::copy(v1.begin(), v1.end(), v2.begin());
    }
How can this blow up?
Answer #3:
This is a correct call to copy. It copies the range defined by the first two iterators into a corresponding range starting at the third parameter, v2.begin(). Unfortunately, there is potential for disastrous results. Notice that this is the entire body of a function, which emphasizes the fact that this code is run in isolation.
What's the problem? There is no check that the destination range is big enough to hold all the items. If v2 is not large enough, the call to copy causes a buffer overrun, corrupting your program and (if you're lucky) crashing it. Suppose vector v1 had a size of 10, but v2 had a size of only 3. After the third element is copied, the next element written to v2 is out of bounds.
Remember that STL algorithms assume that their arguments are correct. They don't perform any checking on their arguments. Therefore, it is up to the author of f() to check that v2 is at least as large as v1 before calling the copy algorithm.
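A minimal sketch of that check, keeping the copy-into-existing-elements approach:

    void f()
    {
        // only copy if the destination already has room for every element of v1
        if (v2.size() >= v1.size())
        {
            std::copy(v1.begin(), v1.end(), v2.begin());
        }
    }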
Also, if v2 is longer than v1, the extra elements at the end of v2 are not cleared. Thus, it's incorrect to assume that v2 is an exact copy of v1 after returning from f(). All we can say is that for a v1 of size N, after returning from this call to copy(), v2's first N elements are copies of v1's corresponding elements; everything after N in v2 is unchanged. To ensure that v1 and v2 are identical after the call to copy, resize v2 to the size of v1 first.
Here is a safer implementation:
    // copy v1 into v2
    void f()
    {
        v2.resize(v1.size());
        std::copy(v1.begin(), v1.end(), v2.begin());
    }
But that should set off an alarm in your head. Why? Because you are now reimplementing operator=, and probably less efficiently than the provided operator=.
Therefore, I suggest replacing f() with:
    void f()
    {
        v2 = v1;
    }
All considerations for potential memory leaks from #1 above also apply to this situation. If the assignment would leak, we need to delete the items in v2 first. For pointers, using reference counting would be very helpful here.
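For example, a minimal sketch of the reference-counting approach, assuming a reference-counted smart pointer such as C++11's std::shared_ptr is available (the types here are illustrative stand-ins):

    #include <memory>
    #include <vector>

    struct A { };  // stand-in for the real class A

    typedef std::vector< std::shared_ptr<A> > SharedV;

    void f(SharedV & v1, SharedV & v2)
    {
        // copying shared_ptrs bumps reference counts; each A is destroyed
        // automatically when its last reference goes away, so nothing leaks
        // and nothing dangles
        v2 = v1;
    }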
Question #4:
    // find an element and remove it from container.
    // assume the caller keeps a reference to a so it
    // isn't leaked.
    void f(A *a)
    {
        V::iterator i = std::find(v1.begin(), v1.end(), a);
        if (i)
        {
            v1.erase(i);
        }
    }
What is the subtle bug here that has serious consequences?
Answer #4:
This time, the problem is subtle. If you don't see it, take a second look. The problem here is frequent enough and sinister enough to be worth a minute of thought. Hint: v1.erase(i) is *always* called!
What the programmer had intended to do was to search for the element and remove it from the container if it was found. But the code contains a subtle (and nasty!) bug.
The programmer correctly searches v1 for a particular element a by calling find(), and correctly gets back an iterator into the sequence. So far, all is well. But the bug occurs in the if() condition, because (contrary to what the programmer thought) this code is not testing whether the item was found. The comparison is actually testing whether the iterator is non-zero. Many iterators are implemented as plain pointers, so this will compile (since a pointer can be implicitly compared against zero), but the meaning is flawed, because find() will always return an iterator that is not "null".
When the element is not found, find() returns the past-the-end iterator of the sequence, which is equal to the iterator returned by calling end(). We therefore must compare the returned iterator against the container's end() value, not against zero.
Note:
Given find(iter1, iter2, value), find returns iter2 if value is not found; find does not return NULL. Don't compare the return value of find with NULL or zero, because the comparison will never be true!
The code should be like this:
    V::iterator i = std::find(v1.begin(), v1.end(), a);
    if (i != v1.end()) // Note the change!!!
    {
        v1.erase(i);
    }
Again, consider the possibility of leaking the object to which the erased pointer still points. We remove the element but never delete the pointer. Here we assume that the object *i points to has other references (see the comment at the top of f()), so it is not leaked by this operation.
Question #5:
    // count all a's in the range that are "special"
    extern bool is_special_value(A *);

    int count_special_values(V const & v)
    {
        int count = 0;
        V::iterator i = v.begin();
        while (i != v.end())
        {
            if (is_special_value(*i))
            {
                ++count;
            }
            i++;
        }
        return count;
    }
Why doesn't this compile? Once it compiles, what can be improved about this function?
Answer #5:
Actually, there are several things that can be improved here. First is a minor efficiency detail, but it comes up frequently enough to be worth mentioning. In the loop, i is postfix-incremented (i++). That means the old value is returned in a temporary copy, and the iterator itself is incremented. Because that temporary iterator is never used, it is wasted overhead. ALWAYS use prefix increment (++i) whenever the return value is not used, for *any* data type. On some platforms, it's even faster for integers. Unless someone overloads the prefix operator in a nonsensical way, it's always at least as fast as postfix.
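To see where that overhead comes from, here is a sketch of how a typical iterator might implement the two operators (Iter and its ptr_ member are hypothetical, for illustration only):

    // Hypothetical iterator over an array of int, for illustration only.
    struct Iter
    {
        int * ptr_;

        Iter & operator++()       // prefix: advance and return *this, no copy made
        {
            ++ptr_;
            return *this;
        }

        Iter operator++(int)      // postfix: copy the old value, advance, return the copy
        {
            Iter old(*this);
            ++ptr_;
            return old;
        }
    };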
Tip:
Prefer ++i over i++ whenever possible. I think the name of the language, C++, is partially to blame for the overuse of the postfix operator.
Next, since v is a const reference, the function must use a V::const_iterator or it will not compile. For read-only operations, using const_iterator is always the best choice. Const is a safety measure. Most professionals use safety equipment: football players wear pads, doctors wear latex gloves, and (good) computer programmers use const whenever possible. After all, it's crazy to enter a construction yard without a hard hat.
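Putting those two fixes together, a corrected hand-written version might look like this (a sketch, before we get to the better approach below):

    // count all a's in the range that are "special"
    extern bool is_special_value(A *);

    int count_special_values(V const & v)
    {
        int count = 0;
        for (V::const_iterator i = v.begin(); i != v.end(); ++i)  // const_iterator, prefix ++
        {
            if (is_special_value(*i))
            {
                ++count;
            }
        }
        return count;
    }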
Now for the main point of this exercise. This question was designed to point out that the standard library supplies a great deal of functionality that many people do not use. Consider the count_if() algorithm to implement this function, since it does what the programmer wants, straight out of the box.
Tip:
Use the standard library as often as possible. If you don't know what's in the standard library, it's worth the time to learn. The upfront cost of learning what is available is more than paid for when you can actually make use of the standard library. It's tested, efficient, correct, and standardized.
A better way to write this function, then, is to use the standard library algorithm count_if().
    // count all a's in the range that are "special"
    extern bool is_special_value(A *);

    int count_special_values(V const & v)
    {
        return std::count_if(v.begin(), v.end(), is_special_value);
    }
What are the chances of this new version being buggy? Practically zero. What are the chances of accidentally using an inefficient operation? Practically zero, since you write no code. Of course, the existence of count_if() almost eliminates the need for count_special_values() altogether.
Tip:
Avoid writing home-brew algorithms whenever possible, because doing so will often turn out to be a waste of your time.
Question #6:
    // erase all members of v1
    void f()
    {
        V::iterator i;
        V::iterator end = v1.end();
        for (i = v1.begin(); i != end; ++i)
        {
            v1.erase(i);
        }
    }
What's wrong with the assumptions in this code? How can this be made more efficient?
Answer #6:
First and foremost, i is an iterator into the vector v1, but after you erase an element that iterator becomes invalid.
When a vector removes an element, it shifts the remaining elements around, and the old iterators end up pointing at the wrong things, possibly at the wrong memory entirely. Do not use invalid iterators; the bugs they cause are some of the toughest to track down!
We could replace the body of f() with this:
    void f()
    {
        int size = v1.size();
        for (int ii = 0; ii < size; ++ii)
        {
            v1.erase(v1.begin());
        }
    }
This is now correct, but extremely slow. The vector must shift every element after the removed element to fill in the "hole". Therefore, it shifts on the order of N elements N times, making this an O(N^2) algorithm.
Can we do better? Yes.
One might try a more efficient algorithm: instead of removing the head of the vector, remove the tail. Then nothing needs shifting, because it's only the tail that is removed, so no holes are introduced into the vector.
    void f()
    {
        size_t size = v1.size();
        for (size_t ii = 0; ii < size; ++ii)
        {
            v1.erase(v1.end());
        }
    }
Unfortunately, it is undefined behavior to erase end() from a sequence, since end() does not refer to an actual element in the container. We could do this instead:
    void f()
    {
        size_t size = v1.size();
        for (size_t ii = 0; ii < size; ++ii)
        {
            v1.erase(--v1.end());
        }
    }
It may seem strange to decrement end(), but for a bidirectional iterator it's OK, since the loop bound guarantees that the container is not empty at each iteration. However, this still creates a temporary iterator and then modifies it on every pass through the loop. A more efficient way is to call pop_back().
    void f()
    {
        size_t size = v1.size();
        for (size_t ii = 0; ii < size; ++ii)
        {
            v1.pop_back();
        }
    }
The above algorithm operates in linear time. Again I ask, is there a better way? Again the answer is "yes".
The solution is to use the right operation. All standard containers have a clear() member function that efficiently removes all elements. clear() must also be linear, but its constant overhead is certainly smaller than that of the code in the above implementation of f().
In situations like this, there is absolutely zero benefit to rolling your own code. Let the library do the work; it knows the best way to operate on itself.
    void f()
    {
        v1.clear();
    }
Now the question of the century: why does f() exist at all, if clearing v1 is all we want to accomplish? If there are no other preconditions and no other postconditions, calling f() costs an extra function call without ANY of the benefits of being a separate function. It requires more documentation, more testing, and adds complexity. Without other benefits, in this case, it is better to just call clear() directly.
Question #7:
    // remove even-valued entries from the list
    typedef std::list<A*> List;

    void f(List & lst)
    {
        // now operate on the list
        List::iterator i = lst.begin();
        while (i != lst.end())
        {
            if ((*i)->is_even())
            {
                lst.erase(i);
            }
            ++i;
        }
    }
What common mistake is the programmer making?
Answer #7:
Again we have iterator invalidation. After the element i refers to is erased from the list, i becomes invalid. But the next statement increments that invalid iterator, resulting in undefined behavior.
But this item highlights a special case: std::list guarantees that insertions and removals affect only the specific element being added or removed. They do NOT affect other elements, nor do they invalidate iterators referring to other elements.
With that in mind, we can rewrite the above loop to be correct.
    //
    // Correct implementation
    //
    void f(List & lst)
    {
        // now operate on the list
        List::iterator i = lst.begin();
        while (i != lst.end())
        {
            if ((*i)->is_even())
            {
                lst.erase(i++);
            }
            else
            {
                ++i;
            }
        }
    }
The "trick" (if you consider it one) is to postfix-increment the iterator in the call to erase(), or prefix increment it otherwise. Why does this work? Remember how postfix increment works: It creates a temporary copy of the iterator, increments the real iterator, and returns the temporary. Since it is done in that exact order, the temporary is a valid iterator when we increment. The real iterator is safely advanced to the next node, and then the temporary is returned to the erase() function, which then removes that element that our iterator USED to point to. Since lists only invalidate iterators that point to the element when it's deleted, the real iterator is already on the next node before the old element i referred to is removed.
Also, you will notice that in the correct implementation above, the increment of i is moved into an else block. This way, if *i is even, we don't increment twice.
    //
    // incorrect implementation
    //
    void f(List & lst)
    {
        // now operate on the list
        List::iterator i = lst.begin();
        while (i != lst.end())
        {
            if ((*i)->is_even())
            {
                lst.erase(i++);
            }
            ++i;    // PROBLEM: i may be incremented twice
        }
    }
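As an aside, there is an equivalent way to write the correct version that avoids the postfix trick entirely: list::erase() returns an iterator to the element following the one removed, so you can simply reassign i. A sketch:

    void f(List & lst)
    {
        List::iterator i = lst.begin();
        while (i != lst.end())
        {
            if ((*i)->is_even())
            {
                i = lst.erase(i);   // erase() hands back the next valid iterator
            }
            else
            {
                ++i;
            }
        }
    }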
Tip:
Never blindly double-increment an iterator. When you must increment an iterator, be sure it is not already equal to end().
Question #8:
    std::mismatch(v1.begin(), v2.begin(), v2.end());
Something is wrong with this call that results in undefined runtime behavior (though you can reasonably expect a crash). What?
Answer #8:
std::mismatch scans two sequences to find the first element that is not equal. It takes a range for one of the sequences, and a starting point for the second sequence. It assumes that the second sequence is at least as long as the first.
The functionality of mismatch is really irrelevant to the problem with the code though. The argument list in the above code is where you will find the problem. The programmer is passing in the wrong iterators in the wrong places. A parameter mismatch, if you like puns.
std::mismatch is declared like this:
    template <class InputIterator1, class InputIterator2>
    pair<InputIterator1, InputIterator2>
    mismatch(InputIterator1 first1, InputIterator1 last1,
             InputIterator2 first2);
If the template syntax is distracting, just look at the parameters the function takes: three input iterators. first1 and last1 specify a range in the first sequence, and first2 is an iterator to the beginning of the second sequence. That is, the first two arguments form a range over one sequence, and the third argument is the start of another sequence.
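For reference, a minimal sketch of a well-formed call and its result, using the same global v1, v2, and V as in the rest of this article and assuming v2 is at least as long as v1 (g() is a hypothetical name):

    void g()
    {
        // mismatch returns iterators to the first pair of elements that differ
        std::pair<V::iterator, V::iterator> p =
            std::mismatch(v1.begin(), v1.end(), v2.begin());

        if (p.first == v1.end())
        {
            // no mismatch: v1 matches the beginning of v2
        }
        else
        {
            // *p.first and *p.second are the first elements that differ
        }
    }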
So what's wrong with the original code? The programmer thought that the first argument was the start of a sequence, the second parameter was the start of the second sequence, and the third parameter was the end of the second sequence. Passing the iterators in the wrong order has some very serious consequences:
1) The range specified is invalid. Iterating through the range that the programmer provided would probably cause a memory fault or a program crash. Actually, it's undefined, but "undefined" can be translated to mean "bad things will happen." Usually a crash. A range specified by a pair of iterators has some rules: the two iterators must be in the SAME sequence, and the first iterator must be able to reach the second by repeated application of operator++.

2) Because the third parameter is expected to be the iterator corresponding to the beginning of the second sequence, the mismatch algorithm will start at one past the end of v2 and increment from there, a recipe for disaster.
Unfortunately, the code compiles without any problems. Here is the correct way to call the function:
    V * seq1;
    V * seq2;

    if (v1.size() < v2.size())
    {
        seq1 = &v1;
        seq2 = &v2;
    }
    else
    {
        seq1 = &v2;
        seq2 = &v1;
    }

    std::mismatch(seq1->begin(), seq1->end(), seq2->begin());
The extra work done before the call to mismatch simply sets up intermediate pointers to the two containers so that we can guarantee that the second sequence is at least as long as the first. This way we never walk past the end of a sequence.
Tip:
There is no way for the compiler to detect a problem when the programmer passes the iterators in the wrong order. Therefore, when you use STL algorithms, slow down just a bit and make sure you're using them correctly and passing the arguments in the correct order. Check the documentation (or header files!) to make SURE that what the algorithm expects is really what you provide.
Tip:
If a program is crashing in an STL algorithm, always think of potential problems with iterators first: could they have been invalidated? Are the ranges correct? Are my arguments to functions in the correct order? When applicable, is the second sequence long enough (for example, mismatch(), equal(), search(), etc.)?
Question #9:
    // call do_something_with() for each item in v1 that is not even
    extern void do_something_with(A*);

    void f()
    {
        V::iterator i = v1.begin();
        while (i != v1.end())
        {
            if ((*i)->is_even())
            {
                // skip this item
                ++i;
            }
            do_something_with(*i);
            i++;
        }
    }
What bugs are in this code? It not only contains logic bugs (it doesn't do what the documentation (comment at top) says), but it has iterator problems as well. How many can you find?
Answer #9:
Hopefully, from the earlier problems, this one is a piece of cake. Do you see the problem? Two things already discussed should jump out, one big, one small. Plus, though perhaps a little subtle, there is a logic bug lurking in there as well, but fixing the first two problems will also fix the logic bug for free!
First the small problem: i++ by itself is inefficient. It should be ++i, or we are wasting CPU cycles. This is especially important in loops, where the overhead is multiplied by the number of iterations. This was discussed earlier in item #5.
The bigger problem is that, once again, we are doing a blind double-increment, without checking that i is not already equal to end() after the first increment. (I say "blind" because it is not checking for the problem, assuming the problem cannot (or will not) happen.)
The iterator bug lives in the "skip even" logic and is a direct result of the blind double-increment. If the very last element is even, we advance the iterator to one past the end. (In fact, the damage starts right there: do_something_with(*i) then dereferences the end iterator before the second increment ever runs.) Since we then blindly increment again, the iterator ends up one past the one-past-the-end position, and program behavior is undefined. This is the danger of double increments: you can accidentally jump "over" the loop termination condition and effectively enter an infinite loop (at least until your program crashes). Accessing past the end of a sequence is undefined and out of bounds.
As for the logic bug, suppose the vector has all even numbers in it (or is_even() returns true for every element). This code will skip the first even element, and then immediately call do_something_with() on the next value without first checking that it is not even. Since the documentation at the top of the function says that do_something_with() shall be called only on items that are not even, this is a logic bug. The above code is capable of calling do_something_with() on elements that are even. This is another side effect of blindly incrementing the loop variable.
Tip:
Avoid incrementing your loop control variable in more than one place in the body of your loop whenever you can help it. When you think you need to, think twice before coding it. That is, it's wise to treat double-incrementing a loop variable as "incorrect" until proven otherwise.
Here is a better implementation that restructures the logic to eliminate the extra increment, so the iterator is incremented in only one place. This also fixes the logic bug described earlier. (Isn't it funny how removing a double-increment can fix more than one problem?) Hopefully you're convinced by now that incrementing a loop variable in multiple places can be bad if the conditions are not checked before each increment. This is especially true of iterators: even outside a loop, they can walk past the end if they're blindly double-incremented.
    //
    // correct rewrite
    //
    void f()
    {
        V::iterator i = v1.begin();
        while (i != v1.end())
        {
            if (!(*i)->is_even())
            {
                do_something_with(*i);
            }
            ++i;
        }
    }
Here we simplified it to call do_something_with() only if the value is not even. This code is much more straightforward, and now bug-free. (Well, assuming that elements cannot be null.)
The above function can still be improved, for those wishing to squeeze every last ounce of efficiency out of the compiler. The remaining concern is the overhead of the temporary value returned by end() each time we go through the loop. This is solely an efficiency issue, and while probably minor, there is no reason not to fix it, because the fix is clean, elegant, and faster.
Since the container cannot change in size (the body of the loop does not modify it), we can be sure that end() will always return an iterator to the same position. Therefore, there is no need to call end() and recalculate the end on each iteration, since it always returns the same value. Even though it's an inline function, it is not "free", due to the construction of a temporary iterator. Iterators are objects (or can be implemented that way), and they have a cost to create. Fortunately, we can cache the return value and compare against the cached value rather than recalculating it on each iteration. We can be more efficient with the following minor change:
    void f()
    {
        V::iterator i = v1.begin();
        V::iterator e = v1.end();    // cache end()'s return value
        while (i != e)
        {
            if (!(*i)->is_even())
            {
                do_something_with(*i);
            }
            ++i;
        }
    }
And there you have it, fast, bug free, and easy.