Mean of two floating point numbers can be dangerous

Today I discovered an interesting bug in my code. I had a list of one-dimensional points of type double and wanted to find a value somewhere in between that would split the list into two lists based on the <= relation. I computed the splitting point by finding the best place in the list and taking the exact split value as the mean of two ordered neighboring points.
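A minimal sketch of that computation (identifiers are illustrative, not the original code):

```cpp
// Split value for two neighboring points, point1 <= point2.
double computeSplitValue(double point1, double point2)
{
    return (point1 + point2) / 2.0;
}
```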

This worked fine until the following happened.
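Any pair of adjacent doubles reproduces the problem; for instance (values chosen for illustration):

```cpp
#include <cmath>
#include <iostream>

int main()
{
    // Two neighboring doubles: they differ only in the last bit of the mantissa.
    double point1 = std::nextafter(1.0, 2.0);     // 1 + 2^-52
    double point2 = std::nextafter(point1, 2.0);  // 1 + 2 * 2^-52

    double splitValue = (point1 + point2) / 2.0;
    std::cout << (splitValue == point2) << '\n';  // prints 1: the mean rounded up to point2
}
```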

Notice that the two points differ only in the least significant bit of the mantissa. No representable value exists between them, so the mean has to round to one of them, and on my hardware splitValue came out equal to point2. Because I used the <= relation and these two points were at the end of the list, no splitting actually happened, and my code fell into infinite recursion until the stack overflowed.

I solved it by extending the code to handle this case explicitly.
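In sketch form, with the same illustrative identifiers as above:

```cpp
double computeSplitValue(double point1, double point2)
{
    double splitValue = (point1 + point2) / 2.0;
    // Adjacent doubles have no representable midpoint, so the mean can
    // round up to point2, and the <= split would then separate nothing.
    if (splitValue == point2)
        splitValue = point1;  // fall back to the lesser point
    return splitValue;
}
```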

If the above situation happens, splitValue is assigned the value of the lesser point. This guarantees that the split always divides the list, because the right point (point2) is strictly greater than the left one.

The power of <algorithm>

Have you seen Bjarne Stroustrup's presentation Why you should avoid Linked Lists? It has become quite popular among programmers, even though the example Bjarne used to compare the performance of linked lists and vectors was rather misleading.

He set out to show that vectors are better than linked lists even in scenarios that most programmers consider better suited for linked lists. The problem is that in his example the asymptotic complexity is the same whether you use a linked list or a vector. I don't think any competent programmer would even consider a linked list in that case; they would probably use some kind of binary search tree, which dominates both, as shown in another response to the presentation.

In Bjarne's defense, he did correctly mention that the real problem in his example is the linear search. Unfortunately, many people ignore that and jump to wrong conclusions like: arrays are good, all other fancy containers are bad, and it is all because of the cache.

In this post, I wanted to show an example where linked lists dominate arrays to demonstrate that they are not useless at all. The example is simply the following task, which might be quite common in real life: Remove all odd integers from a sequence.

This can be solved easily by the following, quite ordinary code.
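For example (a sketch of the obvious approach, names are mine):

```cpp
// Works for std::list and std::vector alike: erase returns the iterator
// following the removed element.
template <typename Container>
void removeOdd(Container& c)
{
    for (auto it = c.begin(); it != c.end(); )
    {
        if (*it % 2 != 0)
            it = c.erase(it);
        else
            ++it;
    }
}
```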

For linked lists this code has complexity O(n), because we already have an iterator to the element we are erasing and do not have to perform a linear search for every element as in Bjarne's example. For vectors the complexity is O(n^2), because every erase has to shift all of the following elements.

Benchmarking on my computer now shows the expected decisive victory for linked lists.

This clearly shows that linked lists are not useless. As with everything, you just need to use them in the right situation: here we already hold an iterator into the list, so we can insert or remove an item in constant time. Of course, in real life we do not have only linked lists and vectors. It is true that for most tasks std::deque would outperform linked lists; pure linked lists have rather niche applications. As an example, they can possibly outperform std::deque when the items are huge.

Now comes the gotcha and the main point of this post. There is another way to perform this task, using the standard library.
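That is, the erase-remove idiom (a sketch, with the same task as above):

```cpp
#include <algorithm>

template <typename Container>
void removeOddStd(Container& c)
{
    // std::remove_if compacts the elements we keep toward the front and
    // returns the new logical end; erase then trims the tail in one call.
    c.erase(std::remove_if(c.begin(), c.end(),
                           [](int x) { return x % 2 != 0; }),
            c.end());
}
```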

Benchmarking this version produced results that may seem surprising.

The vector is now slightly faster than the linked list. How can that be? Where did the complexity go? The answer is that std::remove_if is cleverer than our approach above: instead of erasing elements one by one, it walks the sequence once and moves each kept element toward the front, so every element is moved at most once and the final erase merely trims the tail. The complexity for the vector is therefore again O(n).

The lesson from this is that you should really, really look at the <algorithm> library and, instead of imperfectly reinventing the wheel, use it whenever you can.

For reference, the benchmarking code looked like the following.
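A minimal reconstruction of such a harness (sizes, names, and structure are my assumptions, not the original listing):

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <iterator>
#include <list>
#include <numeric>
#include <vector>

template <typename Container>
void removeOddByErase(Container& c)
{
    for (auto it = c.begin(); it != c.end(); )
        it = (*it % 2 != 0) ? c.erase(it) : std::next(it);
}

template <typename Container>
void removeOddByAlgorithm(Container& c)
{
    c.erase(std::remove_if(c.begin(), c.end(),
                           [](int x) { return x % 2 != 0; }),
            c.end());
}

template <typename Container, typename Func>
void benchmark(const char* name, Func removeOdd)
{
    Container c(100000);               // size is an arbitrary choice
    std::iota(c.begin(), c.end(), 0);  // fill with 0, 1, 2, ...

    const auto start = std::chrono::steady_clock::now();
    removeOdd(c);
    const auto end = std::chrono::steady_clock::now();

    std::cout << name << ": "
              << std::chrono::duration<double, std::milli>(end - start).count()
              << " ms\n";
}

int main()
{
    benchmark<std::vector<int>>("vector, erase loop", removeOddByErase<std::vector<int>>);
    benchmark<std::list<int>>("list,   erase loop", removeOddByErase<std::list<int>>);
    benchmark<std::vector<int>>("vector, remove_if ", removeOddByAlgorithm<std::vector<int>>);
    benchmark<std::list<int>>("list,   remove_if ", removeOddByAlgorithm<std::list<int>>);
}
```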


Converting int to float while preserving its binary layout

So it happened today that I was forced to use a C++ library which stores RGB8 colors in floats. I personally think that is a really bad design decision, but I had to deal with it.

A simple function like the following one doesn't work.
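It looked roughly like this (the function name is illustrative):

```cpp
#include <cstdint>

float colorToFloat(uint32_t color)
{
    return color;  // implicit value conversion: produces the nearest float
                   // to the integer's value, not its bit pattern
}
```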

The problem is that the uint32_t is implicitly converted to float and the binary layout is lost.

After a quick Google search I arrived at this.
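Very likely a byte copy via std::memcpy (a reconstruction; the exact snippet may have differed):

```cpp
#include <cstdint>
#include <cstring>

float colorToFloat(uint32_t color)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "sizes must match");
    float result;
    std::memcpy(&result, &color, sizeof(result));  // copy the raw bytes
    return result;                                 // bit pattern preserved
}
```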

It works correctly, and it may even be the idiomatic way to achieve this, but to me it seems a bit hacky. After a while I recalled unions and decided to use the following code.
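Something like this (names are mine):

```cpp
#include <cstdint>

float colorToFloat(uint32_t color)
{
    // Both members share the same storage: write the integer, read the float.
    // (Note: well defined in C, formally undefined in standard C++, though
    // compilers commonly support it.)
    union
    {
        uint32_t asInt;
        float asFloat;
    } converter;

    converter.asInt = color;
    return converter.asFloat;
}
```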

It is a bit longer, but to me it seems much more readable. Also, my inner geek is satisfied that I have used a union and it actually made sense (:

How to use Qt only in a part of your application

I have been developing an application in ROS and needed to create a GUI for one specific subtask. Because the GUI might get quite complicated, I decided to use Qt. However, I didn't want to use Qt in the whole application, because this “GUI task” might never be invoked during the application's lifetime.

The standard way Qt sets itself up is this.
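A single QApplication is created in main() and exec() runs the global event loop for the whole lifetime of the program:

```cpp
#include <QApplication>

int main(int argc, char* argv[])
{
    QApplication app(argc, argv);  // must exist before any widget

    // ... create and show the main window ...

    return app.exec();             // blocks in the global event loop
}
```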

I simply couldn't do this for the reasons listed. After some experimentation I arrived at this solution (it's just a prototype).
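The sketch below reconstructs it from the points that follow (class and method names are from the text; the dialog contents are a placeholder):

```cpp
#include <QApplication>
#include <QDialog>
#include <memory>

class ManualFeatureMatcher
{
public:
    // Blocking call: shows the GUI and returns when the user closes it.
    void matchFeatures()
    {
        initializeQt();

        QDialog dialog;
        // ... build the feature-matching GUI inside the dialog ...
        dialog.exec();  // QDialog runs its own event loop
    }

private:
    static void initializeQt()
    {
        // Check for any existing QApplication, not just our own pointer,
        // so the module keeps working if Qt is used elsewhere too.
        if (QApplication::instance() != nullptr)
            return;

        application.reset(new QApplication(argc, argv));
    }

    // QApplication keeps the addresses of argc/argv, so the fake arguments
    // must live as long as the QApplication does, hence static.
    static int argc;
    static char* argv[1];

    // Created once, shared by all instances of the class, never recreated.
    static std::unique_ptr<QApplication> application;
};

int ManualFeatureMatcher::argc = 1;
char* ManualFeatureMatcher::argv[1] = { const_cast<char*>("dummy") };
std::unique_ptr<QApplication> ManualFeatureMatcher::application;
```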

The important things are:

  1. QApplication is created only once, and the pointer to it is a private static member. That way, even if you create more instances of the class (ManualFeatureMatcher in this case), there is still only one instance of QApplication. I have also run into bugs when the QApplication was destroyed and created again. A smart pointer isn't really necessary here; a normal pointer would achieve the same.
  2. QDialogs are used because I wanted to make the method matchFeatures() blocking. QDialogs run their own event loop, so there is no need to call QApplication::exec().
  3. The fake argc and argv are static. This gave me a headache for a while: the fake arguments need to live as long as the QApplication does, because it keeps their addresses.
  4. The check for an existing QApplication uses QApplication::instance() rather than testing the private pointer for null. It isn't needed for now, but I do not want to break this module when I decide to use a GUI somewhere else.

This solution seems quite solid and works fine for me. I am a Qt newbie, however (I started yesterday), so there is a possibility I have overlooked something.

Think twice before pulling up

Pull up refactoring means moving a member from a subclass to a superclass. It can be useful for removing duplicated code, but it can also damage your design quite a lot.

Many times when I wanted to perform a pull up, I ended up with a better solution instead. The thing is that the behavior has to make sense in the context of the superclass. Yesterday I came up with this code.
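Reconstructed for illustration; only WeaponBehavior and CreateDirectionVector come from the original, the surrounding details are made up:

```cpp
#include <cmath>

struct Vector2
{
    float x, y;
};

class WeaponBehavior
{
public:
    virtual ~WeaponBehavior() = default;
    virtual void Shoot(const Vector2& from, const Vector2& target) = 0;

protected:
    // Pulled up because several subclasses shared it, yet generic vector
    // math has nothing to do with weapon behavior.
    Vector2 CreateDirectionVector(const Vector2& from, const Vector2& to) const
    {
        const float dx = to.x - from.x;
        const float dy = to.y - from.y;
        const float length = std::sqrt(dx * dx + dy * dy);
        return Vector2{ dx / length, dy / length };
    }
};
```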


This is the result of a bad pull up refactoring. Some of the subclasses shared the method CreateDirectionVector, so I decided to pull it up to the parent. As you can see, the method doesn't really fit there: one may well ask why WeaponBehavior is responsible for this vector calculation. Also, none of the other methods use it, so it isn't obvious why the method even exists.

I solved this by creating a class VectorUtilities and putting the method there. I am always anxious when I want to create such a class, but I think that in this case the utility class is justifiable in my architecture.
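In sketch form, reusing the illustrative Vector2 (and <cmath>) from above:

```cpp
class VectorUtilities
{
public:
    // The shared math lives in one obvious place; subclasses that need a
    // direction vector call it explicitly.
    static Vector2 CreateDirectionVector(const Vector2& from, const Vector2& to)
    {
        const float dx = to.x - from.x;
        const float dy = to.y - from.y;
        const float length = std::sqrt(dx * dx + dy * dy);
        return Vector2{ dx / length, dy / length };
    }
};
```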


Returning input arguments

I have finally laid my eyes on the book Clean Code by Robert C. Martin. I am just browsing it quickly, because I think I am already familiar with most of the practices, but sometimes something catches my eye.

In the chapter about monadic functions (functions with one argument) the author mentions this function.
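The function in the book is Java's void transform(StringBuffer out); in C++ terms it is roughly this (the body is a placeholder):

```cpp
#include <string>

// The transformation happens through the output argument; the void return
// gives the caller no hint of it.
void transform(std::string& out)
{
    out += " (transformed)";  // placeholder mutation
}
```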

He says:

Using an output argument instead of a return value for a transformation is confusing. If a function is going to transform its input argument, the transformation should appear as the return value.

And he suggests refactoring to something like this (where you can return the input argument).
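The book's version is StringBuffer transform(StringBuffer in), which simply returns the input argument; a C++ rendering:

```cpp
#include <string>

// Same mutation, but the transformation now also "appears as the return
// value": the returned reference is the input argument itself.
std::string& transform(std::string& in)
{
    in += " (transformed)";  // placeholder mutation
    return in;
}
```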

I am not quite sure about happily returning the input argument. From the signature of the function it is not obvious to me that the returned instance is the same as the input one. It seems more like it is returning a different object (maybe a copy?).

Another thing is that if we return the argument and change the object, the function still uses the argument as an output argument. That doesn't seem too DRY to me, because the function then returns the same object through two different channels.

My conclusion is that if a function transforms its input argument, the clearer way to communicate that is to leave it visible as an output argument, not to mask it with a somewhat redundant return value.

Visualization of Pi

Interesting Pi visualization from a different angle

This image landed on my Facebook feed (original source). The lines connect neighboring digits in the first 10000 digits of the decimal representation of Pi. A lot of people have commented on how special and beautiful Pi is. They are right, but the special feature of Pi is a bit hidden in this case: any sequence of independent, uniformly distributed random digits would produce a similar result.

There is a special term for numbers with this interesting property: they are called normal numbers, and Wikipedia says this about them:

In mathematics, a normal number is a real number whose infinite sequence of digits in every base b is distributed uniformly in the sense that each of the b digit values has the same natural density 1/b, also all possible b^2 pairs of digits are equally likely with density b^-2, all b^3 triplets of digits equally likely with density b^-3, and so on.

The thing is that it has not yet been proven whether Pi is a normal number or not. However, it has been tested for a vast number of digits, so the first 10000 digits really do have a uniform distribution.

Because I couldn't quickly find the transition probabilities on the internet, I made some simple statistics myself.
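As a sketch, this is how such transition counts can be computed; the short digit string below stands in for the full 10000 digits:

```cpp
#include <iostream>
#include <string>

int main()
{
    // Stand-in input: in the real experiment this would hold 10000 digits of Pi.
    const std::string digits = "3141592653589793";

    // transitions[a][b] counts how often digit a is followed by digit b.
    long long transitions[10][10] = {};
    for (std::size_t i = 0; i + 1 < digits.size(); ++i)
        ++transitions[digits[i] - '0'][digits[i + 1] - '0'];

    // Print the 10x10 count matrix; dividing each row by its sum gives the
    // empirical transition probabilities.
    for (int from = 0; from < 10; ++from)
    {
        for (int to = 0; to < 10; ++to)
            std::cout << transitions[from][to] << '\t';
        std::cout << '\n';
    }
}
```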


Why, why, why, why …?

I bet you know this situation. You encounter an interesting theorem, but you are not 100% sure why it is true. You find a proof of the theorem, only to see that it refers to another theorem which is also interesting, but which you are also not 100% sure about. You can go on and on like this, exploring seemingly unrelated things and admiring how everything is connected with everything :). I personally love doing this, because I think it gives me a lot of insight.

I can give an example of this that I really enjoyed a few months ago. I asked myself this question: Why can every possible rotation in 3D space be expressed in the axis-angle representation?

When you look at this statement it seems reasonable, but there is always a seed of doubt, so I started looking for a proof. The proof I found was really elegant. It stated that because every rotation in 3D space can be expressed by a 3×3 matrix, and this matrix has to have at least one real eigenvalue, you can pick one of the eigenvectors of the matrix as the direction vector of the axis of rotation.

I really liked this proof, but I had to ask myself the next question: why does there have to be at least one real eigenvalue? The answer came a few seconds later: because the characteristic polynomial has degree 3, and such polynomials always have at least one real root. And because I was already in this mood, I spent the next few minutes experimenting with odd-degree polynomials to really prove that there has to be at least one real root. This is what a simple question about rotations led me to that day.
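For completeness, the root argument in two lines (my notation): for a 3×3 matrix R, the characteristic polynomial is

```latex
p(\lambda) = \det(R - \lambda I) = -\lambda^3 + c_2 \lambda^2 + c_1 \lambda + c_0,
\qquad
\lim_{\lambda \to -\infty} p(\lambda) = +\infty,
\quad
\lim_{\lambda \to +\infty} p(\lambda) = -\infty,
```

so by the intermediate value theorem the continuous p must cross zero somewhere, i.e. R has a real eigenvalue. (For the rotation axis one actually wants the eigenvalue 1: orthogonality forces any real eigenvalue to be ±1, and det R = 1 then guarantees that 1 occurs.)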