A union is a structure where data members share the same memory space. A union can be used for saving memory space by allowing two data members that are never used at the same time to share the same piece of memory. See page 91 for an example.
A union can also be used for accessing the same data in different ways. Example:
// Example 7.43
union {
    float f;
    int i;
} x;
x.f = 2.0f;
x.i |= 0x80000000;   // set sign bit of f
cout << x.f;         // will give -2.0
In this example, the sign bit of f is set by using the bitwise OR operator, which can only be applied to integers.
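Reading a union member other than the one last written works on common compilers, but the C++ standard does not guarantee it. As a minimal sketch of a portable alternative, the same sign-bit manipulation can be done by copying the bit pattern with std::memcpy. The helper name SetSignBit is hypothetical, and the code assumes that float is a 32-bit IEEE 754 value.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns f with its sign bit set.
// Assumption: float is a 32-bit IEEE 754 value.
float SetSignBit(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // copy the bit pattern of f into an integer
    bits |= 0x80000000u;                   // set the sign bit
    std::memcpy(&f, &bits, sizeof f);      // copy the modified pattern back
    return f;
}
```

Compilers typically reduce the two memcpy calls to register moves, so this form should not be slower than the union.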
7.27 Bitfields
Bitfields may be useful for making data more compact. Accessing a member of a bitfield is less efficient than accessing a member of a structure. The extra time may be justified in case of large arrays if it can save cache space or make files smaller.
It is faster to compose a bitfield by the use of << and | operations than to write the members individually. Example:
// Example 7.44a
struct Bitfield {
    int a:4;
    int b:2;
    int c:2;
};
Bitfield x;
int A, B, C;
x.a = A;
x.b = B;
x.c = C;
Assuming that the values of A, B and C are too small to cause overflow, this code can be improved in the following way:
// Example 7.44b
union Bitfield {
    struct {
        int a:4;
        int b:2;
        int c:2;
    };
    char abc;
};
Bitfield x;
int A, B, C;
x.abc = A | (B << 4) | (C << 6);
Or, if protection against overflow is needed:
// Example 7.44c
x.abc = (A & 0x0F) | ((B & 3) << 4) | ((C & 3) << 6);
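Example 7.44b relies on the compiler placing a in the lowest bits of the storage unit, which is implementation-defined. The following sketch avoids that dependency by packing and unpacking with explicit shifts and masks. The helper names are hypothetical, the fields are treated as unsigned (the int bitfields above are signed), and the layout with a in bits 0-3, b in bits 4-5 and c in bits 6-7 is an assumption chosen to match the shifts in example 7.44b.

```cpp
#include <cstdint>

// Hypothetical helpers. Assumed layout: a in bits 0-3, b in bits 4-5, c in bits 6-7.
// The masks discard any bits that do not fit into the field.
std::uint8_t PackABC(unsigned a, unsigned b, unsigned c) {
    return static_cast<std::uint8_t>((a & 0x0Fu) | ((b & 3u) << 4) | ((c & 3u) << 6));
}

// Extract the individual fields again
unsigned GetA(std::uint8_t p) { return p & 0x0Fu; }
unsigned GetB(std::uint8_t p) { return (p >> 4) & 3u; }
unsigned GetC(std::uint8_t p) { return (p >> 6) & 3u; }
```

For example, PackABC(5, 2, 1) stores 5 in the low four bits, 2 in the next two and 1 in the top two.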
7.28 Overloaded functions
The different versions of an overloaded function are simply treated as different functions. There is no performance penalty for using overloaded functions.
7.29 Overloaded operators
An overloaded operator is equivalent to a function. Using an overloaded operator is exactly as efficient as using a function that does the same thing.
An expression with multiple overloaded operators will cause the creation of temporary objects for intermediate results, which may be undesired. Example:
// Example 7.45a
class vector {                                  // 2-dimensional vector
public:
    float x, y;                                 // x,y coordinates
    vector() {}                                 // default constructor
    vector(float a, float b) {x = a; y = b;}    // constructor
    vector operator + (vector const & a) {      // sum operator
        return vector(x + a.x, y + a.y);        // add elements
    }
};
vector a, b, c, d;
a = b + c + d;   // makes intermediate object for (b + c)
The creation of a temporary object for the intermediate result (b+c) can be avoided by joining the operations:
// Example 7.45b
a.x = b.x + c.x + d.x;
a.y = b.y + c.y + d.y;
Fortunately, most compilers will do this optimization automatically in simple cases.
7.30 Templates
A template is similar to a macro in the sense that the template parameters are replaced by their values before compilation. The following example illustrates the difference between a function parameter and a template parameter:
// Example 7.46
int Multiply (int x, int m) { return x * m;}
template <int m>
int MultiplyBy (int x) { return x * m;}
int a, b;
a = Multiply(10,8); b = MultiplyBy<8>(10);
a and b will both get the value 10 * 8 = 80. The difference lies in the way m is transferred to the function. In the simple function, m is transferred at runtime from the caller to the called function. But in the template function, m is replaced by its value at compile time so that the compiler sees the constant 8 rather than the variable m. The advantage of using a template parameter rather than a function parameter is that the overhead of parameter transfer is avoided. The disadvantage is that the compiler needs to make a new instance of the template function for each different value of the template parameter. If MultiplyBy in this example is called with many different factors as template parameters then the code can become very big.
In the above example, the template function is faster than the simple function because the compiler knows that it can multiply by a power of 2 by using a shift operation. x*8 is replaced by x<<3, which is faster. In the case of the simple function, the compiler does not know the value of m and therefore cannot do the optimization unless the function can be inlined. (In the above example, the compiler is able to inline and optimize both functions and simply put 80 into a and b. But in more complex cases it might not be able to do so).
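The fact that the compiler sees m as a constant can be made visible with a small sketch. Here the template function from example 7.46 is marked constexpr, which is an addition for illustration and not part of the original example, so that a static_assert can check the result during compilation. Scale8 is a hypothetical caller.

```cpp
// Template version from example 7.46. The constexpr specifier (C++11) is an
// addition so that the result can be checked at compile time.
template <int m>
constexpr int MultiplyBy(int x) { return x * m; }

// m = 8 is known when this line is compiled, so the product can be verified here
static_assert(MultiplyBy<8>(10) == 80, "evaluated at compile time");

// Hypothetical caller: inside this instance the factor is the constant 8,
// so the compiler is free to use a shift instead of a multiplication.
int Scale8(int x) { return MultiplyBy<8>(x); }
```

Each distinct value of m that is used, such as MultiplyBy<8> and MultiplyBy<3>, produces its own instance of the function.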
A template parameter can also be a type. The example on page 38 shows how you can make arrays of different types with the same template.
Templates are efficient because the template parameters are always resolved at compile time. Templates make the source code more complex, but not the compiled code. In general, there is no cost in terms of execution speed to using templates.
Two or more template instances will be joined into one if the template parameters are exactly the same. If the template parameters differ then you will get one instance for each set of template parameters. A template with many instances makes the compiled code big and uses more cache space.
Excessive use of templates makes the code difficult to read. If a template has only one instance then you may as well use a #define, const or typedef instead of a template parameter.
Templates may be used for metaprogramming, as explained on page 160.
Using templates for polymorphism
A template class can be used for implementing a compile-time polymorphism, which is more efficient than the runtime polymorphism that is obtained with virtual member functions. The following example shows first the runtime polymorphism:
// Example 7.47a. Runtime polymorphism with virtual functions
class CHello {
public:
    void NotPolymorphic();      // Non-polymorphic functions go here
    virtual void Disp();        // Virtual function
    void Hello() {
        cout << "Hello ";
        Disp();                 // Call to virtual function
    }
};
class C1 : public CHello {
public:
    virtual void Disp() {
        cout << 1;
    }
};
class C2 : public CHello {
public:
    virtual void Disp() {
        cout << 2;
    }
};
void test () {
    C1 Object1; C2 Object2;
    CHello * p;
    p = &Object1;
    p->NotPolymorphic();        // Called directly
    p->Hello();                 // Writes "Hello 1"
    p = &Object2;
    p->Hello();                 // Writes "Hello 2"
}
The dispatching to C1::Disp() or C2::Disp() is done at runtime here if the compiler does not know what class of object p points to (see page 75). Current compilers are not very good at optimizing away p and inlining the call to Object1.Hello(), though future compilers may be able to do so.
If it is known at compile-time whether the object belongs to class C1 or C2, then we can avoid the inefficient virtual function dispatch process. This can be done with a special trick which is used in the Active Template Library (ATL) and Windows Template Library (WTL):
// Example 7.47b. Compile-time polymorphism with templates
// Place non-polymorphic functions in the grandparent class:
class CGrandParent {
public:
    void NotPolymorphic();
};
// Any function that needs to call a polymorphic function goes in the
// parent class. The child class is given as a template parameter:
template <typename MyChild>
class CParent : public CGrandParent {
public:
    void Hello() {
        cout << "Hello ";
        // call polymorphic child function:
        (static_cast<MyChild*>(this))->Disp();
    }
};
// The child classes implement the functions that have multiple
// versions:
class CChild1 : public CParent<CChild1> {
public:
    void Disp() {
        cout << 1;
    }
};
class CChild2 : public CParent<CChild2> {
public:
    void Disp() {
        cout << 2;
    }
};
void test () {
    CChild1 Object1; CChild2 Object2;
    CChild1 * p1;
    p1 = &Object1;
    p1->Hello();                // Writes "Hello 1"
    CChild2 * p2;
    p2 = &Object2;
    p2->Hello();                // Writes "Hello 2"
}
Here CParent is a template class which gets information about its child class through a template parameter. It can call the polymorphic member of its child class by type-casting its 'this' pointer to a pointer to its child class. This is only safe if it has the correct child class name as template parameter. In other words, you must make sure that the declaration
class CChild1 : public CParent<CChild1> {
has the same name for the child class name and the template parameter.
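The requirement that the two names match can be enforced by the compiler. The following is a sketch of one common way to do this, not part of the ATL/WTL method described here: the parent's constructor is made private and the template parameter is declared a friend, so that only the class named in the template parameter can construct the parent. The example is a simplified variant of example 7.47b that omits CGrandParent and returns strings instead of writing to cout.

```cpp
#include <string>

template <typename MyChild>
class CParent {
public:
    std::string Hello() {
        // Call the child's version of Disp without virtual dispatch
        return "Hello " + static_cast<MyChild*>(this)->Disp();
    }
private:
    CParent() {}        // private: only MyChild is allowed to construct its parent
    friend MyChild;     // grants MyChild access to the private constructor
};

class CChild1 : public CParent<CChild1> {
public:
    std::string Disp() { return "1"; }
};

class CChild2 : public CParent<CChild2> {
public:
    std::string Disp() { return "2"; }
};

// A mismatched declaration such as
//     class CBad : public CParent<CChild1> { ... };
// would be rejected when a CBad object is created, because CBad has no
// access to the constructor of CParent<CChild1>.
```

The check costs nothing at runtime because the constructor is empty and access control is resolved during compilation.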
The order of inheritance is now as follows. The first generation class (CGrandParent) contains any non-polymorphic member functions. The second generation class (CParent<>) contains any member functions that need to call a polymorphic function. The third generation classes contain the different versions of the polymorphic functions. The second generation class gets information about the third generation class through a template parameter.
No time is wasted on runtime dispatch to virtual member functions if the class of the object is known. This information is contained in p1 and p2 having different types. A disadvantage is that CParent::Hello() has multiple instances that take up cache space.
The syntax in example 7.47b is admittedly very kludgy. The few clock cycles that we may save by avoiding the virtual function dispatch mechanism are rarely enough to justify such complicated code, which is difficult to understand and therefore difficult to maintain. If the compiler is able to do the devirtualization (see page 75) automatically then it is certainly more convenient to rely on compiler optimization than to use this complicated template method.
7.31 Threads
Threads are used for doing two or more jobs simultaneously or seemingly simultaneously. Modern CPUs have multiple cores, which makes it possible to run multiple threads simultaneously. Each thread will get time slices of typically 30 ms for foreground jobs and 10 ms for background jobs when there are more threads than CPU cores. The context switches after each time slice are quite costly because all caches have to adapt to the new context. It is possible to reduce the number of context switches by making longer time slices. This will make applications run faster at the cost of longer response times for user input.
Threads are useful for assigning different priorities to different tasks. For example, in a word processor the user expects an immediate response to pressing a key or moving the mouse. This task must have a high priority. Other tasks such as spell-checking and repagination are running in other threads with lower priority. If the different tasks were not divided into threads with different priorities, then the user might experience unacceptably long response times to keyboard and mouse inputs when the program is busy doing the spell checking. Any task that takes a long time, such as heavy mathematical calculations, should be scheduled in a separate thread if the application has a graphical user interface. Otherwise the program will be unable to respond quickly to keyboard or mouse input.
It is possible to make a thread-like scheduling in an application program without invoking the overhead of the operating system thread scheduler. This can be accomplished by doing the heavy background calculations piece by piece in a function that is called from the message loop of a graphical user interface (OnIdle in Windows MFC). This method may be faster than making a separate thread in systems with few CPU cores, but it requires that the background job can be divided into small pieces of a suitable duration.
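The piece-by-piece approach might be sketched as follows. The class BackgroundSum and its member names are hypothetical, and summing an array merely stands in for a heavy calculation. The object remembers how far it has come, so that each call does a bounded amount of work and then returns control to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical background job: sums a large array one slice at a time.
// The caller must keep the vector alive while the job is in use.
class BackgroundSum {
public:
    explicit BackgroundSum(const std::vector<int>& data) : data_(data) {}

    // Process at most chunk elements, then return.
    // Returns true as long as there is more work left to do.
    bool DoChunk(std::size_t chunk) {
        const std::size_t n = std::min(chunk, data_.size() - pos_);
        for (std::size_t i = 0; i < n; ++i) sum_ += data_[pos_ + i];
        pos_ += n;                          // remember where to continue next time
        return pos_ < data_.size();
    }

    long long Result() const { return sum_; }

private:
    const std::vector<int>& data_;
    std::size_t pos_ = 0;
    long long sum_ = 0;
};
```

An idle handler would call DoChunk once per invocation and go back to the message loop in between. The chunk size has to be tuned so that one call is short enough to keep the user interface responsive.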
The best way to fully utilize systems with multiple CPU cores is to divide the job into multiple threads. Each thread can then run on its own CPU core.
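As a minimal sketch of dividing a job into threads, the following function splits an array into one block per thread using std::thread from C++11. The function name ParallelSum is hypothetical. Each thread accumulates into a local variable and writes its result only once, so no synchronization is needed until the threads are joined.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums data using numThreads worker threads.
long long ParallelSum(const std::vector<int>& data, unsigned numThreads) {
    if (numThreads == 0) numThreads = 1;
    std::vector<long long> partial(numThreads, 0);   // one result slot per thread
    std::vector<std::thread> workers;
    // Block size, rounded up so that all elements are covered
    const std::size_t block = (data.size() + numThreads - 1) / numThreads;

    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&data, &partial, block, t] {
            const std::size_t begin = std::min(data.size(), t * block);
            const std::size_t end = std::min(data.size(), begin + block);
            long long local = 0;             // accumulate locally
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;              // single write to the shared array
        });
    }
    for (std::thread& w : workers) w.join(); // wait for all threads to finish
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A reasonable choice for numThreads is std::thread::hardware_concurrency(), which may return 0 if the number of cores cannot be determined. For small arrays the cost of starting the threads will exceed the gain.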
There are four kinds of costs to multithreading that we have to take into account when optimizing multithreaded applications:
• The cost of starting and stopping threads. Do not put a task into a separate thread if it is short in duration compared with the time it takes to start and stop the thread.
• The cost of task switching. This cost is minimized if the number of threads with the same priority is no more than the number of CPU cores.
• The cost of synchronizing and communicating between threads. The overhead of semaphores, mutexes, etc. is considerable. If two threads are often waiting for each other in order to get access to the same resource, then it may be better to join them into one thread. A variable that is shared between multiple threads must not be kept in a register by the compiler, because registers are not shared between threads. Declaring the variable volatile prevents this, but volatile does not make accesses atomic or ordered between threads in standard C++; use std::atomic or a mutex where that is required.
• The different threads need separate storage. No function or class that is used by multiple threads should rely on static or global variables (see thread-local storage, page 28). Each thread has its own stack. This can cause cache contention if the threads share the same cache.
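The synchronization cost in the list above can be illustrated with a shared counter. This is a sketch using std::atomic from C++11, with the hypothetical function name CountWithAtomic. Every update is an indivisible read-modify-write operation, which plain or volatile variables do not guarantee.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical example: numThreads threads each add perThread to a shared counter.
long CountWithAtomic(unsigned numThreads, long perThread) {
    std::atomic<long> counter(0);            // shared between all threads
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&counter, perThread] {
            for (long i = 0; i < perThread; ++i) {
                // Atomic increment; no updates are lost when threads collide
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return counter.load();
}
```

Each atomic increment is considerably more expensive than an ordinary one when several cores compete for the same cache line. Where possible, it is cheaper to let each thread count in a local variable and add the result to the shared counter once at the end.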
Multithreaded programs must use thread-safe functions. A thread-safe function should never use static variables.
See chapter 10 page 107 for further discussion of the techniques of multithreading.