From source code to executables

When you write a C++ program, you start with the source code. That’s a human-readable notation of the program, in the C++ programming language. For example, a program that writes the text “Hello world!” to a console window looks like this:

#include <iostream>

int main()
{
    std::cout << "Hello world!\n";
}

You write this text using any plain text editor you like (e.g. Notepad++ or Vim, but not a rich text editor like Word or OpenOffice Writer). The colors are added automatically, here and by good editors, to make the code more readable (that feature is called syntax highlighting); you do not set them yourself in any way and they do not influence the program’s behaviour. The file is usually saved with a .cpp file name extension to make it clear that this is a C++ source code file. You might also encounter the .cc or .cxx extensions.

Computers cannot execute such a source file directly; it must first be translated into binary processor instructions. Thus, the next step is to feed the source file to a compiler, which does this translation for us. How exactly you do that depends on the compiler. The compiler takes the source file and “compiles” it into an executable program, which you can then start.

More complex programs will usually consist of multiple C++ files, which are compiled separately into intermediate output files (usually with a .o or .obj extension) and then “linked” together into an executable by a program called the linker. In fact, even if just a single source file is compiled into an executable, the compiler automatically invokes the linker behind the scenes.

Thus, the steps are

  1. Write source code. Tool: editor; result: .cpp file
  2. Compile source code to object file. Tool: compiler; result: intermediate object file
  3. Link object file to executable. Tool: linker; result: executable file (.exe on Windows, without extension on Unix-like systems like macOS and Linux).

[Figure: (start) --Editor--> Source file (*.cpp, *.cc) --Compiler--> Object file (*.o, *.obj) --Linker--> Executable file (*.exe, or without extension)]

From source code to executables.

How To: Install a compiler

Note that nowadays every compiler comes with a linker, so there is no need to install one separately.

Windows

On Windows, I recommend using Microsoft’s free Visual Studio Community Edition. At the time of this writing, the most recent version (which we will of course use) is 2015 Update 3, downloadable at https://www.visualstudio.com/. When you download and run the installer, be sure to tick the checkbox under Programming Languages ‣ Visual C++ ‣ Common Tools For Visual C++ 2015 (Visual Studio supports multiple programming languages and C++ is not among the defaults). You can untick any checkboxes saying something about Windows Phone or .NET.

I will refer to this compiler as MSVC.

Debian/Ubuntu based Linux

On Debian/Ubuntu based Linux, we will use the g++ compiler, which is part of GCC (the GNU Compiler Collection). Installing is as easy as opening a terminal and typing sudo apt-get install g++, but since you’re using Linux, I assume you know what you’re doing anyway. ;-)

How To: Compile your source code

(Assuming you saved your file as hello_world.cpp)

Windows

Open the VS2015 x86 Native Tools Command Prompt. It is located in the start menu (or whatever you call it nowadays) at Visual Studio 2015 ‣ Visual Studio Tools. Then change the current directory to the folder where you saved your file using cd, e.g. cd C:\Users\YOURNAME\somefolder\somesubfolder.

Note

If your source code is on a drive different from the one where the initial directory of the command prompt is, use the cd /D YOURDIRECTORY command (the /D option means “change the Drive if necessary”).

Compiling is then done with the command cl /EHsc /W4 hello_world.cpp. You should now see (among others) a new file hello_world.exe. You can check in the command prompt by typing dir. If you want to execute your program now, type hello_world, and you should see a line Hello world! appearing in the console window.

The compiler command

Let’s break down the compiler command.

cl /EHsc /W4 hello_world.cpp

cl is the name of Microsoft’s compiler. The things preceded by slashes / are options which tell the compiler how to process your source code. /EHsc tells it to enable so-called exceptions, which really should be enabled by default but are not. You don’t need to understand this, just remember to add this option whenever you compile C++ code. /W4 stands for “warning level 4” and tells the compiler to tell you about any suspicious things it notices in your source code (this also should be a default but sadly isn’t in any compiler I know of). hello_world.cpp is obviously the name of the source file to compile.

Linux

Open a terminal and change the current directory to the one containing your source file. Then execute g++ -std=c++14 -Wall -Wextra -ohello_world hello_world.cpp. An executable hello_world should now appear. You can check in the terminal by typing ls. If you want to execute your program now, type ./hello_world, and you should see a line Hello world! appearing in the terminal.

The compiler command

Let’s break down the compiler command.

g++ -std=c++14 -Wall -Wextra -ohello_world hello_world.cpp

g++ is the name of the compiler. The things preceded by dashes - are options which tell the compiler how to process your source code. -std=c++14 tells it that we want to use the C++14 standard. -Wall and -Wextra tell the compiler to tell you about any suspicious things it notices in your source code (this should be a default but sadly isn’t in any compiler I know of). -ohello_world means “output (-o) as hello_world” and simply specifies the name of the executable to produce. If you leave this option out, it will be called a.out. hello_world.cpp is obviously the name of the source file to compile.

A few command line tips

To use the command line effectively, I recommend reading a good tutorial [1]. I will, however, give a few tips here:

  • You can use the Up and Down keys to cycle through your last commands. For example, if you want to run the last command again, just press Up once, and the command will reappear, ready to be edited or directly started with Return.
  • You can use the Tab key to complete filenames: for instance, if you typed hel and the file hello_world.cpp is in the current directory, you can press Tab to complete the filename.
  • Pressing Ctrl+C quits the running console program and drops you back to the command prompt, without closing the console window.
  • On Linux, use the ls or ls -l command to show the files and subdirectories in the current directory. On Windows, dir does that.
  • To open the current directory in your GUI file system browser (e.g. Nautilus, Dolphin or Windows Explorer), use xdg-open . on Linux and explorer . on Windows.

The Hello World program analyzed

Now that you have compiled and run the Hello World program, you are probably curious what all these cryptic lines in the source code mean. Let’s repeat our little program here:

#include <iostream>

int main()
{
    std::cout << "Hello world!\n";
}

Linewise breakdown

Preprocessor #includes

Let’s start with the first line: This is a so-called preprocessor directive, more specifically an include directive. It instructs the compiler to read the iostream file and paste its contents in place of the directive, just as if we had written the whole file’s content there instead of just the #include. We need this file because it tells the compiler what std::cout in line 5 means. It is part of the standard library, a set of files that comes with every C++ compiler.

Whitespace

The second line is empty; it has no special meaning. I just inserted it to make the source code look nicer. Generally, C++ ignores all whitespace (that’s what programmers collectively call spaces, tab(ulator)s and line breaks), except that it is sometimes needed to separate words. The only place where this was needed in this program is the space between int and main in line 3. The second case where whitespace has a meaning is preprocessor directives: here, a line break is needed to tell the compiler where the directive ends. This means that, if I wanted, I could have written the entire program as short as

#include<iostream>
int main(){std::cout<<"Hello world!\n";}

and it would mean exactly the same as the long version. Oh yes, I have also left the space between Hello and world there, but that will become obvious when we get to line 5. One can also insert as much whitespace as one wants, for example:

#include              <iostream>
              int
main
                      (
               ){
   std  ::         cout <<
"Hello world!\n"

;



}

Pretty much the only place where you can’t freely sprinkle whitespace is in preprocessor directives (the #include directive in our Hello World program), where everything must be written on one line. In the program above, exactly the following line breaks would therefore be illegal:

#
include
<
iostream
>

Of course, you also can’t insert whitespace in the middle of words (e.g. writing c out instead of cout). There are a few other exceptions in the form of symbols that are written using more than one character, namely the << and the :: in the program above, which could not have been written as e.g. < < and :  :.

However, I think you see why I wrote the program like I did in the first place: it is just easier to read.

Defining the main program

Now it starts to get interesting!

#include <iostream>

int main()
{
    std::cout << "Hello world!\n";
}

The third line basically says to the compiler: “Here comes the main program!” [2], and the curly braces show the compiler where it starts ({ in line 4) and ends (} in line 6). Everything between them will be executed when the program is started.

Writing to the console

#include <iostream>

int main()
{
    std::cout << "Hello world!\n";
}

Line 5 (the one starting with std::cout) really is the only one that is actually executed at runtime. It is a so-called statement. Statements are “instructions” or “commands” that instruct the computer to do something. In this case, it tells the computer to print the text “Hello world!” followed by a line break (\n) to the standard output (std::cout) [3]. For now, you can thus think of the << symbol as meaning “Print the value on the right-hand side to the thing on the left-hand side”. [4] The value here is the text between the quotes ("). Such quoted text will be printed as you typed it, except that a backslash \ is used to give the next character a special meaning (like the line break/newline character for \n).

Statements are always terminated with a semicolon (;).

Successive Statements

Of course we can write more than just one statement in the main program. Let’s throw in some more output statements:

#include <iostream>

int main()
{
   std::cout << "This is executed first.\n";
   std::cout << "Hello world!\n";
   std::cout << "Then this,\n"; std::cout << "And this at last.\n";
}

As you probably expected, this will print:

This is executed first.
Hello world!
Then this,
And this at last.

The statements are executed in the same order as you write them.

It has no special meaning that I wrote the last two statements on the same line. As I already said, the compiler does not care about whitespace such as line breaks. It determines where one statement ends and the next one starts based on the semicolons. However, you should generally write only one statement per line. The above was just for demonstration purposes; it is not good style.

More output

Here are two more things you can do with cout, which will be useful later on:

You can omit the trailing \n in the output text, so that the next output statement will continue in the line started by the former:

std::cout << "This is ";
std::cout << "on a single line.\n";

yields

This is on a single line.

And you can use multiple << in a single statement:

std::cout << "Something.\n" << "Something " << "other.\n";

yields

Something.
Something other.

Note that in the code examples above, I have only included what is written between the curly braces of the main program. You already know the rest and always reading/writing the same boilerplate code starts getting boring.

Comments

And while we are in the process of introducing useful miscellaneous things, let’s do one more: A comment is a part of the code which is ignored by the compiler:

// Hello World: Print a text to the terminal and exit.

#include <iostream> // For std::cout.

int main()
{
   std::cout << /* The classic text: */ "Hello world!";
}

This program really just prints “Hello world!”. The source code showcases the two kinds of comments in C++:

  • Line comments start with // and extend until the line ends.
  • C-Style comments start with /* and end with */. Note that they cannot be nested, so the following is a syntax error:
/* In comment. /* Still in comment. */ Already outside comment. */

Use comments whenever you think that other people (including your future self) will have an easier time reading the code with them than without them. Always assume that these people know C++ and do not comment what the code already says.

You can also use comments to temporarily disable (comment out) parts of your code:

#include <iostream>

int main()
{
   std::cout << "Hello world!";
   //std::cout << "Won't be executed.";
}

This still prints only “Hello world!”. Note that I have not put a space between the double slash and the comment’s content. That’s just my personal style: For a real comment, I separate the text from the comment sign; for commented-out code, I don’t.

Summary

  • Spaces, tabs and line breaks are collectively called whitespace.
  • Except in words, multi-character symbols and preprocessor directives, you can freely add whitespace (for the latter, you can still freely add spaces and tabs).
  • The #include preprocessor directive inserts the contents of the file with the given name literally.
  • The iostream file is necessary for std::cout.
  • The main program is executed when the program starts. It is introduced by int main() and surrounded by curly braces ({ and }).
  • A statement tells the computer to do something. The semicolon ; tells the compiler where the statement ends.
  • A program can contain multiple statements. They are executed in the order in which they occur in the source code.
  • Text between /* and */ and the remainder of a line after // are comments: they are ignored by the compiler.
  • A value can be printed to the screen with std::cout << value.
  • << can be used multiple times with std::cout, e.g. std::cout << "x = " << 42;
  • Text (strings) that should be printed literally has to be enclosed in double quotes, like "Some text.".
  • A \n in a string will be printed as a line break.

Footnotes

[1]Although I have not read it and it uses PowerShell on Windows instead of the ordinary cmd, http://cli.learncodethehardway.org/book/ made a good first impression on me, if you can find nothing else.
[2]More technically, it says “Here comes a function called main that returns an integer (int) and takes no arguments (()).” Because exactly such a function is defined by the standard as being the main program, it really does, in effect, say “Here comes the main program!”. We will get to the other parts later, step by step.
[3]To be more precise, it of course does not directly tell the computer to print anything; it tells the compiler to emit the binary machine instructions into the output executable that will cause the computer to print “Hello World” when the program is executed.
[4]In fact, this line uses many features of the C++ language that we will partly come to only very late, e.g. classes, namespaces and operator overloading.