Defining new types

When developing complex programs, you only seldom deal with ints or doubles directly. Most of the time you will work with objects that correspond to real-world (or virtual-world) objects like persons, credit cards, bank accounts, players, monsters, books, dates and so on. Dealing with these by explicitly juggling C++’s built-in types quickly becomes cumbersome and error-prone.

For example, assume we are developing a game in which two players can move across the screen and each has a certain number of points. To represent a player in our program we could use two ints for the x- and y-coordinates of the position and another one for the points:

#include <iostream>

void draw_player(int player_x, int player_y, int player_points)
{
    std::cout
        << "Player at " << player_x << "," << player_y
        << " with " << player_points << " points.\n";
}

void next_round(unsigned& game_round)
{
    ++game_round;
    std::cout << "Round " << game_round << ":\n";
}

int main()
{
    auto game_round = 0u;
    auto player_a_x = 0, player_a_y = 0, player_a_points = 0;
    auto player_b_x = 0, player_b_y = 0, player_b_points = 0;

    next_round(game_round);
    draw_player(player_a_x, player_a_y, player_a_points);
    draw_player(player_b_x, player_b_y, player_b_points);

    next_round(game_round);
    // Move player a right a bit:
    player_a_x += 5;
    draw_player(player_a_x, player_a_y, player_a_points);
}

This code suffers from several problems:

  • The player variables are not explicitly separated from each other or other unrelated variables.

  • The variables are ordinary ints, so nothing prevents you from accidentally writing nonsensical statements like:

    // Oops, wrong order!
    draw_player(player_a_points, player_a_x, player_a_y);
    
    // Oops, players mixed!
    draw_player(player_a_x, player_a_y, player_b_points);
    
    // More nonsense:
    player_a_points += game_round;
    draw_player(player_b_x, player_b_y, game_round);
    
  • Adding a new data component to a player involves changing all places where a player was used.

Simple compound types: struct

By using the struct construct and the . (dot) operator, we can improve our program as follows:

#include <iostream>

struct Player {
    int x;
    int y;
    int n_points;
};

void draw_player(Player player)
{
    std::cout
        << "Player at " << player.x << "," << player.y
        << " with " << player.n_points << " points.\n";
}

void next_round(unsigned& game_round)
{
    ++game_round;
    std::cout << "Round " << game_round << ":\n";
}

int main()
{
    auto game_round = 0u;
    Player player_a = {}, player_b = {};

    next_round(game_round);
    draw_player(player_a);
    draw_player(player_b);

    next_round(game_round);
    player_a.x += 5; // Move player to the right a bit.
    draw_player(player_a);
}

First, we define the Player structure in lines 3 to 7 of the listing. It starts with the keyword struct followed by a name. Then, inside curly braces {}, come the data members of the struct: these are just explicitly typed variable definitions, like local variables. After the closing curly brace follows a semicolon ; (unlike function definitions, which do not need one).

This structure is its own type. It can be used much like built-in types such as int: for example, as a function parameter type (line 9) or for local variables (line 25). However, since it is a distinct type, you cannot assign a Player variable to an int variable or vice versa, which solves the second of the aforementioned problems. The program above uses the following operations on whole Player objects:

  • Initialize all members to zero by using = {} (bools would become false; see line 25).
  • Copy players (happens implicitly when passing the player by value to draw_player; see line 9 and lines 28, 29 and 33).

However, in the end a struct only consists of its data members, and hence the most important operation on a struct object is accessing its members. This is done with the . (dot) operator in lines 12, 13 and 32 of the above program. A member access expression like player.x can be used like a normal variable: both reading the data member (e.g. writing it to std::cout or saving it into an int variable) and writing to it (i.e. assigning to it using = or a compound assignment operator like +=) are possible, because it yields an lvalue. As a consequence of being an lvalue, it can also be bound to a reference:

Player player = {};
int& rx = player.x;
rx = 3; // Now player.x is also 3.
player.x = 42; // Now rx is also 42.

Note that the two Player variables are of course independent of each other, just like the int variables were in the initial program. This means that changing data members of one player will not change the other player.
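
The independence of Player objects can be sketched as follows. The helper functions move_copy_right and x_after_changing_others are hypothetical names introduced only for this illustration; the sketch shows that neither changing a copy nor passing a player by value affects the original object.

```cpp
struct Player {
    int x;
    int y;
    int n_points;
};

// Hypothetical helper: receives the player by value, i.e. as a copy.
void move_copy_right(Player player)
{
    player.x += 5; // Modifies only the local copy, not the caller's object.
}

// Hypothetical helper: returns player_a.x after other objects were changed.
int x_after_changing_others()
{
    Player player_a = {1, 2, 3};
    Player player_b = player_a; // player_b starts as a copy of player_a ...
    player_b.x = 42;            // ... but changing it leaves player_a alone.
    move_copy_right(player_a);  // The function works on its own copy.
    return player_a.x;          // Unchanged by everything above.
}
```

In contrast to the reference rx from the snippet above, a copy shares nothing with the object it was copied from.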

struct initialization with {}

In the above example programs we have always used = {} to initialize all of a struct’s members with zero. However, we can also choose our own values for the data members by writing them inside the braces, separated by commas:

#include <iostream>

struct Player {
    int x;
    int y;
    int n_points;
};

void print_player(Player player)
{
    std::cout << player.x << ',' << player.y << ": " << player.n_points << '\n';
}

Player new_player(int n_points) {
    return {0, 0, n_points};
}

int main()
{
    Player player = {5, 3, 100};
    print_player(player);
    player = {1, 2, 3};
    print_player(player);

    Player player_b = {5};
    print_player(player_b);

    print_player({10, 7, 250});
    print_player(new_player(310));
}

Output:

5,3: 100
1,2: 3
5,0: 0
10,7: 250
0,0: 310

If you study the above program and its output you will also notice that

  • = {...} can also be used for reassigning variables that have a struct type.
  • If you specify fewer values in the braces than the struct has members, the rest is filled with zero/false. In fact the = {} syntax we have seen before is just a special case of this.
  • If you have a function with a parameter of a struct type, you can use {...} to create an object of that type with the given values on the fly, without needing a variable.
  • Similarly, if you have a function with a struct return type, you can use {...} to create and return an object of that type with the given values on the fly.

Naturally, in all cases where literals like 42 or 5 were used, you can use arbitrary expressions like variables or computations of the right type.
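
As a small sketch of braces filled with expressions instead of literals, consider the following two functions. Their names (moved and player_at) are made up for this illustration and do not appear elsewhere in the tutorial.

```cpp
struct Player {
    int x;
    int y;
    int n_points;
};

// Hypothetical helper: returns a copy of `player` shifted by (dx, dy).
// The braces contain computations rather than literals.
Player moved(Player player, int dx, int dy)
{
    return {player.x + dx, player.y + dy, player.n_points};
}

// Hypothetical helper: a player at the given position.
Player player_at(int x, int y)
{
    return {x, y}; // n_points is not listed and is therefore set to 0.
}
```

A call like moved(player, 2, -1) leaves player itself untouched, because the parameter is passed by value and a new object is returned.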

const and structs

Of course variables with struct types can also be made const, meaning that neither the object as a whole nor any of its data members may be modified:

const Player player = {5, 5, 100};
print_player(player);
//player = {3, 3, 500}; // Does not compile, player is const.
std::cout << player.x;
//player.x = 6; // Does not compile, player is const.

Nested structs

Like nearly everything in C++, structs can be nested too. For example, it is probable that in a game, we will need x/y-positions not just for the player but also for other things like monsters, doors, coins or whatever else is lying around in the game’s virtual world. Thus it makes sense to break out these positions from the Player type into a new type Point:

struct Point {
  int x;
  int y;
};

struct Player {
  Point position;
  int n_points;
};

Initialization and member access are then naturally nested as well:

Player pl = {{1, 2}, 0};
std::cout << pl.position.y << '\n'; // Prints “2”.
std::cout << pl.n_points << '\n'; // Prints “0”.
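
To illustrate why breaking out Point pays off, here is a sketch with a second type that reuses it. The Monster struct and both functions are assumptions made up for this example; only Point and Player come from the text above.

```cpp
struct Point {
    int x;
    int y;
};

struct Player {
    Point position;
    int n_points;
};

// Hypothetical second user of Point; name and members are assumptions.
struct Monster {
    Point position;
    int health;
};

// Works for anything that has a position, because all positions are Points.
bool same_place(Point a, Point b)
{
    return a.x == b.x && a.y == b.y;
}

bool monster_caught_player(Monster monster, Player player)
{
    return same_place(monster.position, player.position);
}
```

Without the shared Point type, same_place would need four int parameters, which brings back the risk of mixing up arguments.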

Efficiently passing structs to functions

As explained in Background Information: The stack, passing an argument to a function involves pushing it on the stack. While this is fine for small arguments like ints and doubles, most struct types are much bigger. For example, our Player has at least the size of three ints. We can check the exact size with the unary sizeof operator. This operator can be applied to a type or to any expression and evaluates to the size of the (expression’s) type measured in bytes [1]:

std::cout << sizeof(int) << '\n';
std::cout << sizeof(Player) << '\n';
Player player_a = {};
std::cout << sizeof(player_a) << '\n';

With the MSVC compiler, Player in fact occupies twelve bytes. To avoid copying all these bytes to the stack when passing a Player to a function, we could use references, e.g.:

void print_player(Player& player) { /* ... */ }

Now only the address (4 bytes on 32-bit systems and 8 bytes on 64-bit systems) is pushed on the stack. There are two problems though: first, we cannot pass const Player variables or expressions like {5, 5, 100} (an rvalue), and second, we could accidentally change the passed-in variable. To avoid both problems we use const references:

void print_player(const Player& player) { /* ... */ }

This solves both of the above problems.
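
A minimal sketch of a function taking a const reference follows. The function distance_from_origin is a hypothetical example (it computes a Manhattan distance only to have something to compute); it accepts const objects as well as braced values.

```cpp
struct Player {
    int x;
    int y;
    int n_points;
};

// Takes the player by const reference: no copy is made, and the function
// cannot modify the caller's object.
int distance_from_origin(const Player& player)
{
    // player.x = 0; // Would not compile: player refers to a const object.
    int dx = player.x < 0 ? -player.x : player.x;
    int dy = player.y < 0 ? -player.y : player.y;
    return dx + dy; // Manhattan distance, chosen to keep the sketch short.
}
```

Both distance_from_origin(some_const_player) and distance_from_origin({-3, 4, 0}) compile, which would not be the case with a plain Player& parameter.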

Of course you can also use const references to built-in types like int, but this usually makes no sense and can in fact be slower than just copying the int, because accessing the value through a reference involves an extra indirection.

Encapsulation and member functions

structs can be more than simple data containers.

Member functions

So far, we have only had data members in structs, but it turns out that we can also add functions to them. For example, we could write a member function draw() instead of a “free” function draw_player(const Player&) like this:

#include <iostream>

struct Point {
    int x;
    int y;
};

struct Player {
    Point position;
    int n_points;

    void draw() const
    {
        std::cout
            << "Player at " << position.x << "," << position.y
            << " with " << n_points << " points.\n";
    }
};

int main()
{
    Player player_a = {}, player_b = {{2, 3}, 1};
    player_a.draw();
    player_b.draw();
}

While data members like position and n_points are stored in each Player object, functions are not really a part of the object. They behave like normal functions with the difference that you can only call them on a particular object (“instance”) of the struct type, here Player. Behind the scenes, the member function receives a hidden additional argument. You can think of it as having the type Player const& if you write const after the function, as done above, or Player& otherwise (note that this means that you can only call member functions marked as const on const objects). Then, when a variable or function name appears in the member function’s body, the compiler looks in these places:

  1. local variables and function parameters
  2. members of the hidden additional parameter
  3. global variables and free functions
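
The lookup order listed above can be sketched with a deliberately confusing example. The global variable bonus and both member functions are hypothetical and exist only to show which name is found in which step.

```cpp
int bonus = 1000; // Global variable (found in step 3).

struct Player {
    int n_points; // Data member (found in step 2).

    int with_bonus(int bonus) const
    {
        // The parameter `bonus` is found first (step 1) and hides the
        // global variable of the same name.
        return n_points + bonus;
    }

    int with_global_bonus() const
    {
        // There is no local and no member called `bonus`, so the global
        // variable is used (step 3). n_points is a member (step 2).
        return n_points + bonus;
    }
};
```

Reusing one name on several levels like this is legal, but it is best avoided in real code because it is easy to misread.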

Considering all this, what is the difference to the version with the free function? If you were to look very closely at the machine code that is generated from both versions, you would typically find that they are in fact identical. So at runtime there is no difference. It may now seem to be only a matter of style, but member functions do in fact have a drawback: Later, when we learn how to split a program into multiple files, we will see that a struct definition must not span multiple files. So without touching the file where the struct is defined, it is impossible to add new member functions. With free functions, you can add more functions for working with the struct without changing the definition.
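
As a sketch of that last point, assume the Player definition below lives in a file we do not want to modify. The member function has_points and the free function is_ahead_of are hypothetical names chosen for this illustration.

```cpp
// Imagine this definition lives in a file that we cannot or do not want
// to change.
struct Player {
    int x;
    int y;
    int n_points;

    bool has_points() const { return n_points > 0; }
};

// Added later as a free function; the struct definition stays untouched.
// The explicit parameter `player` plays the role that the hidden
// parameter plays in has_points().
bool is_ahead_of(const Player& player, const Player& other)
{
    return player.n_points > other.n_points;
}
```

The only visible difference for the caller is the syntax: a.has_points() versus is_ahead_of(a, b).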

Encapsulation and access control

There is an important technique that only works with member functions: encapsulation (also called information hiding). By encapsulating members of a type, you restrict access to them to member functions of the type only. This has two advantages: First, you can often change the (now) internal representation of the type without having to change code that uses the type. Secondly, it allows you to enforce invariants on the members, that is to enforce that some conditions always hold.

For the first advantage, consider how, when we combined the x and y members of the player into a new Point type, we also had to change draw_player and all other code using x and y. Had the position been encapsulated in the right way, we wouldn’t have had to change anything but the Player struct itself. The following example demonstrates this part of encapsulation.

#include <iostream>

struct Point {
    int x;
    int y;
};

struct Player {
public:
    int x() const { return m_position.x; }
    void set_x(int x_) { m_position.x = x_; } 

    int y() const { return m_position.y; }
    void set_y(int y_) { m_position.y = y_; }

    int n_points() const { return m_n_points; }
    void set_n_points(int n_points_) { m_n_points = n_points_; }

    Point position() const { return m_position; }

private:
    Point m_position;
    int m_n_points;
};

void draw_player(Player const& player)
{
    std::cout
        << "Player at " << player.x() << "," << player.y()
        << " with " << player.n_points() << " points.\n";
}

int main()
{
    Player player;
    player.set_x(1);
    player.set_y(2);
    player.set_n_points(10);

    draw_player(player);
}

Everything after private: is only accessible to member functions of the struct. On the other hand, everything after public: is also accessible to outside functions. These keywords are called access control keywords. It is customary not to indent them (relative to the struct itself).

Note

Access-control is per-type not per-object.

That is, having a member function like the following:

void steal_points(Player& other) { m_n_points += other.m_n_points; }

in Player and using it like this:

player_a.steal_points(player_b);

is completely OK.
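
A complete sketch of this could look as follows. The getter and setter are taken from the example above; that steal_points also sets the other player's points to zero is an assumption added here to make the effect on both objects visible.

```cpp
struct Player {
public:
    int n_points() const { return m_n_points; }
    void set_n_points(int n_points_) { m_n_points = n_points_; }

    // Accesses a private member of *another* Player object. This is
    // allowed because access control is checked per type, not per object.
    void steal_points(Player& other)
    {
        m_n_points += other.m_n_points;
        other.m_n_points = 0; // Assumption: the victim keeps nothing.
    }

private:
    int m_n_points;
};
```

A free function, in contrast, could not touch m_n_points of either player and would have to go through n_points() and set_n_points().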

Notice how instead of having a public data member n_points, Player now has a private data member m_n_points (the m_ is just a fairly customary name prefix to mark private data members) and two functions for accessing it, namely int n_points() to get the value and set_n_points(int) to set it (the parameter name n_points_ has a trailing underscore to distinguish it from the n_points() function). The former function is called a getter and the latter a setter. Note that not every getter must have a corresponding setter (or vice versa, though that is more unusual), e.g. the position property of a Player can only be read as a whole, because only a getter Point position() const is provided. On the other hand, there may be getters and setters for “properties” that are not even directly backed by data members, as is the case with the x and y properties of the player.

Note

Getters are often used in expressions involving more than one function call and thus should be side-effect free; see Side effects and their order.

Constructors

There is a problem with the previous example: it fails to realize the second advantage of encapsulation, namely maintained invariants. Although we have not explicitly stated any, it is clearly undesirable to have a Player whose members are uninitialized. In fact, the situation regarding initialization in this partially encapsulated example is worse than before, since we cannot use the = { } syntax anymore because the data members are private (in fact, this aggregate initialization cannot be used at all as soon as any data member of the initialized type is private). To remedy this, you can define a special member function called the constructor. Constructors have no return type and the same name as the type. They can have parameters but don’t have to. A constructor that can be called without arguments is called a default-constructor. If we add a default-constructor to our struct Player, it looks like this:

struct Player {
public:
    Player():
        m_position{0, 0},
        m_n_points{0}
    {
    }

    // ... getters and setters as before ...
private:
    Point m_position;
    int m_n_points;
};

The effect of this change is that every Player variable will be automatically initialized with a position of (0, 0) and zero points.
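
To sketch how a constructor and a setter can work together to maintain an invariant, the following version of Player adds one assumed rule that the text does not state: n_points must never be negative. The bool return value of set_n_points is likewise just one possible design for reporting a rejected value.

```cpp
struct Point {
    int x;
    int y;
};

struct Player {
public:
    // Default constructor: every Player starts in a well-defined state.
    Player():
        m_position{0, 0},
        m_n_points{0}
    {
    }

    int x() const { return m_position.x; }
    int y() const { return m_position.y; }
    int n_points() const { return m_n_points; }

    // Assumed invariant for this sketch: n_points is never negative.
    // The setter rejects values that would break it and reports success.
    bool set_n_points(int n_points_)
    {
        if (n_points_ < 0) {
            return false;
        }
        m_n_points = n_points_;
        return true;
    }

private:
    Point m_position;
    int m_n_points;
};
```

Because m_n_points is private, the constructor and the setter are the only places that can change it, so checking the rule there is enough to guarantee it everywhere.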

Apart from the name, which is the same as the type’s, and the missing return type, constructors have one more feature that normal functions don’t have: initializer lists. This comma-delimited list starts at the : after the parameter list, and its entries have the form data_member_name{initial_value}. In the case of Player above, this has the same effect as if we had written:

Player()
{
    m_position = {0, 0};
    m_n_points = 0;
}

There are cases where this makes a difference, namely when some data members have constructors themselves.

Instead of a default-constructor, we could also add a constructor that accepts the initial values as arguments:

Player(Point position_, int n_points_):
    m_position{position_},
    m_n_points{n_points_}
{
}

Now I can also explain where the initializer list behaves differently from using =: if, for example, Point had a constructor Point(int x, int y), it would no longer have a default constructor. We would then have to use the initializer list, because if we leave it out, the compiler will try to insert a call to the default constructor as if we had written:

Player(Point position_, int n_points_):
    m_position{}
{
    m_position = position_;
    m_n_points = n_points_;
}

Since the constructor of m_position requires two ints as parameters, however, this would be an error.
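
A working variant of this situation can be sketched as follows. The Point below, with private members m_x and m_y and getters x() and y(), is an assumption made for this example; the text only says that Point has a constructor taking two ints.

```cpp
struct Point {
public:
    // Because this constructor exists, Point has no default constructor.
    Point(int x_, int y_):
        m_x{x_},
        m_y{y_}
    {
    }

    int x() const { return m_x; }
    int y() const { return m_y; }

private:
    int m_x;
    int m_y;
};

struct Player {
public:
    // m_position has to be initialized in the initializer list. Assigning
    // to it in the body instead would require a default-constructed Point
    // first, which does not exist.
    Player(Point position_, int n_points_):
        m_position{position_},
        m_n_points{n_points_}
    {
    }

    Point position() const { return m_position; }
    int n_points() const { return m_n_points; }

private:
    Point m_position;
    int m_n_points;
};
```

With these definitions, Player player{Point{3, 4}, 50}; compiles, whereas a plain Player player; does not, because Player has no default-constructor either.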

A quick glance at templates

One thing that is a bit limiting about the struct Point is that it only has ints as the x and y components. If we only use the Point for one application and only need int, that may be fine. But in general, we might need Points of double or float, or maybe in one case we need to optimize for memory and even want to use a Point of signed char. We could just write a struct for every case, such as:

struct IntPoint { int x; int y; };
struct FloatPoint { float x; float y; };
struct DoublePoint { double x; double y; };
// ...

In the case of Point this may even be feasible as the struct is so tiny. But now imagine we also wanted to provide some functions for Points, such as:

IntPoint move_int_point(IntPoint p, IntPoint by)
{
    return {p.x + by.x, p.y + by.y};
}

void print_int_point(IntPoint p)
{
    std::cout << '(' << p.x << ',' << p.y << ')';
}

If we had \(n\) functions and \(m\) types of points, we would need to implement all of the \(n\) functions separately for all \(m\) types which results in \(m \cdot n\) functions. Clearly this does not scale up well.

Class templates

With the template mechanism of C++, we can instead write a struct template (usually called a class-template) for Point:

template <typename T>
struct Point {
    T x;
    T y;
};

The template keyword tells the compiler that what follows is a template definition, and inside the angle brackets (which are just the usual less-than and greater-than signs, repurposed) come the template parameters. In this case the template has one parameter, a type (typename) named T. Inside the template definition, you can use T just like a normal type.

To use this class template, we now need to specify the template parameter:

void print_point(Point<int> p)
{
    std::cout << '(' << p.x << ',' << p.y << ')';
}

int main()
{
    Point<int> mypoint = {2, 5};
    print_point(mypoint);
}

This is very similar to functions. In fact, you could view the class template Point as a “type function” that takes a component-type as parameter and “returns” a point-type. The big difference is that this “computation” is done by the compiler and not when the program runs.
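
The “type function” view can be sketched by instantiating the same template with different component types. The two functions below are hypothetical and only serve to show that Point<int> and Point<double> are separate types with their own member types.

```cpp
template <typename T>
struct Point {
    T x;
    T y;
};

// Each instantiation is its own type: this function accepts only
// Point<double>, not Point<int>.
double half_of_x(Point<double> p)
{
    return p.x / 2; // Floating-point division, since p.x is a double.
}

int sum_of_components(Point<int> p)
{
    return p.x + p.y;
}
```

A memory-saving Point<signed char> is obtained in exactly the same way, without writing any additional struct.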

Function templates

Now we need to write only one Point class-template instead of several SomeTypePoint classes, but we still would need \(m \cdot n\) functions. But function templates come to the rescue!

template <typename T>
void print_point(Point<T> p)
{
    std::cout << '(' << p.x << ',' << p.y << ')';
}

Transforming print_point from a function into a function-template should look extremely similar to the previous transformation of Point into a class-template. When calling a function template, however, there is a pleasant difference: you usually don’t need to specify the template parameter, because the compiler figures it out automatically by comparing the function argument with the parameter. That is, the main function from before still looks exactly the same!

By using class- and function-templates, instead of \(m \cdot n\) functions, we now need to write each function only once for all Point types, greatly improving maintainability and readability of the program.
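
As a further sketch, the move_int_point function from earlier can be generalized in the same way. The name move_point is an assumption; the static_cast is added because small integer types such as signed char are promoted to int when added, and the result has to be converted back to T explicitly inside braces.

```cpp
template <typename T>
struct Point {
    T x;
    T y;
};

// Generic replacement for move_int_point and its would-be siblings.
template <typename T>
Point<T> move_point(Point<T> p, Point<T> by)
{
    // Cast back to T: for small integer types the sum has type int.
    return {static_cast<T>(p.x + by.x), static_cast<T>(p.y + by.y)};
}
```

When calling move_point(a, b) with two Point<int> objects, the compiler deduces T as int; with two Point<double> objects it deduces double. Mixing a Point<int> with a Point<double> in one call does not compile, because T cannot be both.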


Footnotes

[1] Actually, the C++ standard defines sizeof to return the size in multiples of char’s size (sizeof(char) is defined to be 1). Although the standard only requires that a char has at least 8 bits, in practice it has exactly 8 bits, i.e. one byte in the everyday sense, on all but some exotic platforms.