NOTE: Every post ends with "END OF POST". If you don't see it then open the full post in a separate page!

C++ reflections (1 of N)

There was this article that generated a bit of a heated series of comments among C++ developers. Some words in this article hurt (the ego) and could have been said differently… anyway at least it may make people reflect a bit.

Not sure why, but domain specific languages (DSLs) don’t seem to be very prevalent yet.

C++ has many design flaws, many of them due to legacy reasons. Some of them will probably never be fixed or changed because of backwards compatibility with the endless amount of existing, mission-critical C++ code. And we will need C++ developers to maintain these existing and critical systems/libraries.

For cases where performance really matters, unfortunately C++ is a very strong (or sometimes the only) candidate.

The problem is when C++ is used in the wrong context/domain. For example, nowadays hardly anybody would write a desktop GUI application in C++, but on some embedded devices you would (Qt).

Due to its complexity, C++ should be used only in a limited context. C++ is painful to use unless you’re familiar with its weird idiosyncrasies. People with domain knowledge are usually not C++ experts, but domain experts. A domain expert has enough things to think about, and adding C++ to the mix would make them less efficient and more annoyed.

For research (science, with expensive and rare domain knowledge), you would not use C++ because it just slows your scientist mind down with unnecessary language/machine details. But when the research goes into application, you may need to rewrite the researcher’s Python code in some other language for performance reasons. That language is sometimes C++, sometimes something else.

The complexity of C++ may also make many people accidentally and necessarily become “C++ experts”.

Sometimes it is also difficult to foresee what performance a certain language implementation would give. For example, you choose Java/etc, and then after one year of development you face performance issues. Then you may have to pick up a different language that compiles to more efficient code, and rewrite parts or the entire program. Again, choosing this “more efficient language” is tricky, so you are likely to pick one that is already battle-tested even if it hurts your brain (C++).

Using DSLs in a “right tool for the right job” manner is a great idea. BUT 🙂 In a single project you’re likely to have a limited budget. So you may not be able to afford polyglot software engineers, or multiple software engineers for each DSL. You may also want your team to be “full-stack” (no silos), and also want to deliver on time. The more tools your team has to master, the more difficult and costly development becomes. Unless all the DSLs use similar/familiar syntax, but that may be problematic too, because programmers may introduce bugs/perf issues when writing code in DSL-A while thinking in DSL-B. 🙂

In web/enterprise development the tendency is already kind of DSL-like. You use Javascript for UI/frontend (with its endless frameworks/”DSLs”), Java/Kotlin/C#/Scala/etc for backend, Go/Rust/C/C++/etc for very performance critical backend.

And then job security… you may choose a well established language (Java/C#/C++) to have a chance on the job market. Every language needs to be learnt and mastered at some level because each has its own idiosyncrasies (you need to think in language-A to write “good” code in language-A).
If you want to be a general software engineer then mastering 2-3 languages is hard enough. With a multitude of DSLs for different domains, you may not be able to be a “general software engineer”. You would have to become a domain expert. Doing that in multiple domains is probably more difficult than “just” learning a few general purpose languages.

Yes, I hate C++ for many reasons. It makes me waste time with language details. Beginners and even experts are bugged with seemingly unimportant things: how to pass this parameter (value/ref/const&/*/optional/etc), how to return this parameter, how to assign this result (const/&/auto/etc.), how to initialize this (default/value/braced/etc), whether to use templates or runtime polymorphism here, how to design your class (explicit/default special functions, virtual destructor, etc), and so on.

C++ is good for embedded/high-perf work because there by nature you MUST think about these language details. That is part of the domain knowledge of the embedded developer when using C++. It would be great to have a DSL for embedded. But that DSL would also have to be somewhat extensible by the programmer. And that may lead to an overcomplicated DSL. And if it’s extensible only via a committee then that may be a slow process and people move on to some other DSL (or non-DSL) 🙂

I wish we wouldn’t need C++! I really do! 🙂

END OF POST


Using auto for declaring variables of move-only types.


This post demonstrates how to use auto with move-only types.

In Declaration and initialization with auto, we showed that using auto for local declarations makes the code safer and more readable. It avoids writing declarations where we easily lose sight of the variable name. Like the first declaration in this example:

// Bad: The variable name is surrounded with "junk".
std::vector<std::string> myVariable { CreateStringVector(input1, input2) };

// Good: The variable name is always on the left side of "=".
auto myVariable = std::vector<std::string>{ CreateStringVector(input1, input2) };

Unfortunately, if you want to declare a variable of a move-only type with auto then you might try this:

#include <atomic>
#include <iostream>

int main()
{
    auto Counter = std::atomic<int>{22}; // Error: Won't compile.
    std::cout << Counter << std::endl;
    return 0;
}

But then you get this compilation error (clang 3.5.0):

prog.cc:6:10: error: call to implicitly-deleted copy constructor of 'std::__1::atomic<int>'
    auto Counter = std::atomic<int>{22};
         ^         ~~~~~~~~~~~~~~~~~~~~
/usr/local/libcxx-3.5/include/c++/v1/atomic:730:7: note: copy constructor of 'atomic<int>' is implicitly deleted because base class '__atomic_base<int>' has a deleted copy constructor
    : public __atomic_base<_Tp>
      ^
/usr/local/libcxx-3.5/include/c++/v1/atomic:649:7: note: copy constructor of '__atomic_base<int, true>' is implicitly deleted because base class '__atomic_base<int, false>' has a deleted copy constructor
    : public __atomic_base<_Tp, false>
      ^
/usr/local/libcxx-3.5/include/c++/v1/atomic:634:5: note: '__atomic_base' has been explicitly marked deleted here
    __atomic_base(const __atomic_base&) = delete;
    ^
1 error generated.

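You get the error because std::atomic has a deleted copy constructor. Strictly speaking, std::atomic is neither copyable nor movable, so before C++17 this form of initialization is rejected even though the copy would be elided. Since C++17, guaranteed copy elision makes the declaration above well-formed.
Luckily, all is not lost on our quest for Almost Always Auto. The solution is to use Universal References (also known as forwarding references):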

#include <atomic>
#include <iostream>

int main()
{
    auto && Counter = std::atomic<int>{22}; // OK
    std::cout << Counter << std::endl;
    return 0;
}

The above code compiles and runs.
To see what’s happening, let’s define our own move-only type:

#include <iostream>

class MoveOnly
{
public:
    MoveOnly() { std::cout << "MoveOnly::MoveOnly()" << std::endl; }
    MoveOnly(const MoveOnly &) = delete;
    MoveOnly(MoveOnly &&) { std::cout << "MoveOnly::MoveOnly(&&)" << std::endl; }
    MoveOnly & operator=(const MoveOnly &) = delete;
    MoveOnly & operator=(MoveOnly &&) = delete;
    ~MoveOnly() { std::cout << "MoveOnly::~MoveOnly()" << std::endl; }
    void Func() {}
};

int main()
{
    auto && moveOnlyRValue = MoveOnly{};
    moveOnlyRValue.Func();
    return 0;
}

Below is the output of the above code. No surprises or extra work, just as we’d want:

MoveOnly::MoveOnly()
MoveOnly::~MoveOnly()

To see the type that the compiler deduced for our variables, we can use the “Type Displayer” trick by Scott Meyers:

#include <iostream>

class MoveOnly
{
public:
    MoveOnly() { std::cout << "MoveOnly::MoveOnly()" << std::endl; }
    MoveOnly(const MoveOnly &) = delete;
    MoveOnly(MoveOnly &&) { std::cout << "MoveOnly::MoveOnly(&&)" << std::endl; }
    MoveOnly & operator=(const MoveOnly &) = delete;
    MoveOnly & operator=(MoveOnly &&) = delete;
    ~MoveOnly() { std::cout << "MoveOnly::~MoveOnly()" << std::endl; }
    void Func() {}
};

template <typename T>
class TypeDisplayer;

int main()
{
    auto && moveOnlyRValue = MoveOnly{};
    auto td1 = TypeDisplayer<decltype(moveOnlyRValue)>{};
    moveOnlyRValue.Func();
    
    auto moveOnly = std::move(moveOnlyRValue);
    auto td2 = TypeDisplayer<decltype(moveOnly)>{};
    moveOnly.Func();

    return 0;
}

This is what we get when we attempt to compile:

prog.cc:21:16: error: implicit instantiation of undefined template 'TypeDisplayer<MoveOnly &&>'
    auto td1 = TypeDisplayer<decltype(moveOnlyRValue)>{};
               ^
prog.cc:16:7: note: template is declared here
class TypeDisplayer;
      ^
prog.cc:25:16: error: implicit instantiation of undefined template 'TypeDisplayer<MoveOnly>'
    auto td2 = TypeDisplayer<decltype(moveOnly)>{};
               ^
prog.cc:16:7: note: template is declared here
class TypeDisplayer;
      ^
2 errors generated.

The compiler tells us that moveOnlyRValue is deduced as “MoveOnly &&” and moveOnly is deduced as “MoveOnly”.
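
If you would rather have a check that passes silently than read a type out of an error message, the same two deductions can be pinned down with static_assert. This is only a sketch and not part of the Type Displayer trick: the function name deducedTypesMatch is made up for illustration, and MoveOnly is a trimmed copy of the class above.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Trimmed copy of the MoveOnly class from the post (no logging).
struct MoveOnly {
    MoveOnly() = default;
    MoveOnly(const MoveOnly&) = delete;
    MoveOnly(MoveOnly&&) = default;
};

// Hypothetical helper: compiles only if the deductions match the text.
inline bool deducedTypesMatch() {
    auto&& moveOnlyRValue = MoveOnly{};
    static_assert(std::is_same<decltype(moveOnlyRValue), MoveOnly&&>::value,
                  "auto&& bound to a temporary deduces MoveOnly&&");

    auto moveOnly = std::move(moveOnlyRValue);
    static_assert(std::is_same<decltype(moveOnly), MoveOnly>::value,
                  "plain auto deduces the value type MoveOnly");

    (void)moveOnly; // silence unused-variable warnings
    return true;
}
```

If either deduction were different, the translation unit would fail to compile with the message given in the static_assert.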

Is it safe to do this? Is it safe to declare a move-only variable as “auto &&”? What happens?
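Well, our variable will be an rvalue reference to a temporary object, and binding a temporary directly to a reference extends the temporary's lifetime to that of the reference.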

  • The question is the same as “Is it safe to declare our variable with “auto &””
  • Which is the same as “Is it safe to declare our variable with “Type &””
  • Which is similar to “Is it safe to declare our variable with “const Type &””

And we do this latter all the time to avoid unnecessary copies of temporaries:

const BigStuff & bigStuff = MakeBigStuff();

This causes trouble only if this reference outlives the object. For example, if we return it from a function, but luckily compilers already know about this pitfall and shout at us:

#include <iostream>

class MoveOnly
{
public:
    MoveOnly() { std::cout << "MoveOnly::MoveOnly()" << std::endl; }
    MoveOnly(const MoveOnly &) = delete;
    MoveOnly(MoveOnly &&) { std::cout << "MoveOnly::MoveOnly(&&)" << std::endl; }
    MoveOnly & operator=(const MoveOnly &) = delete;
    MoveOnly & operator=(MoveOnly &&) = delete;
    ~MoveOnly() { std::cout << "MoveOnly::~MoveOnly()" << std::endl; }
    void Func() const {}
};

MoveOnly & Shmunc()
{
    auto && moveOnlyRValue = MoveOnly{};
    return moveOnlyRValue; // Danger: Returning reference to temporary
}

int main()
{
    const auto & n = Shmunc();
    n.Func();
    return 0;
}

You get this warning if you enable warnings of your compiler, which you always should (e.g. -Wall for gcc and clang):

prog.cc:18:12: warning: returning reference to local temporary object [-Wreturn-stack-address]
    return moveOnlyRValue;
           ^~~~~~~~~~~~~~
prog.cc:17:13: note: binding reference variable 'moveOnlyRValue' here
    auto && moveOnlyRValue = MoveOnly{};
            ^                ~~~~~~~~~~
1 warning generated.

The solution is not to return references to local temporaries 🙂 :

#include <iostream>
#include <utility>

class MoveOnly
{
public:
    MoveOnly() { std::cout << "MoveOnly::MoveOnly()" << std::endl; }
    MoveOnly(const MoveOnly &) = delete;
    MoveOnly(MoveOnly &&) { std::cout << "MoveOnly::MoveOnly(&&)" << std::endl; }
    MoveOnly & operator=(const MoveOnly &) = delete;
    MoveOnly & operator=(MoveOnly &&) = delete;
    ~MoveOnly() { std::cout << "MoveOnly::~MoveOnly()" << std::endl; }
    void Func() const {}
};

MoveOnly Hunc()
{
    auto && moveOnlyRValue = MoveOnly{};
    return std::move(moveOnlyRValue);
}

int main()
{
    const auto & n = Hunc();
    n.Func();
    return 0;
}

We get this output:

MoveOnly::MoveOnly()
MoveOnly::MoveOnly(&&)
MoveOnly::~MoveOnly()
MoveOnly::~MoveOnly()

So, local variables with move-only types can be declared using “auto &&”.
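
To convince yourself that the temporary really lives as long as the reference, you can record the order of events. The following is a small sketch under my own naming (eventLog, Tracer and bindTemporaryToReference are not from any library); Tracer is deliberately neither copyable nor movable, like std::atomic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log shared by the sketch.
inline std::vector<std::string>& eventLog() {
    static std::vector<std::string> log;
    return log;
}

// Neither copyable nor movable; records its construction and destruction.
struct Tracer {
    Tracer() { eventLog().push_back("ctor"); }
    ~Tracer() { eventLog().push_back("dtor"); }
    Tracer(const Tracer&) = delete;
    Tracer(Tracer&&) = delete;
};

inline void bindTemporaryToReference() {
    auto&& t = Tracer{};                    // temporary bound directly to the reference
    (void)t;
    eventLog().push_back("still alive");    // the temporary has not been destroyed yet
}                                           // "dtor" is recorded here, at scope exit
```

After one call, the log should contain "ctor", "still alive", "dtor" in that order: the destructor runs when the reference goes out of scope, not at the end of the declaration.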

END OF POST


Review: “Andrei Alexandrescu, Systematic Error Handling in C++” (Expected, ScopeGuard)

This is a review of a video lecture given by Andrei Alexandrescu on Systematic Error Handling in C++.

Video:
* http://channel9.msdn.com/Shows/Going+Deep/C-and-Beyond-2012-Andrei-Alexandrescu-Systematic-Error-Handling-in-C
Slides:
* http://sdrv.ms/RXjNPR or
* https://onedrive.live.com/view.aspx?resid=F1B8FF18A2AEC5C5!1158&app=WordPdf&authkey=!APo6bfP5sJ8EmH4

Andrei Alexandrescu introduces two nice C++ classes to manage and handle errors in C++.
* Expected<T>
* ScopeGuard

Expected<T> is a template class. It is a bit similar to std::optional<T> (or boost::optional<T>) but it also carries the reason for an invalid value in the form of an exception object. Internally, it is a union that has either a T object or an exception.

It is easiest to understand it through a use-case, taken from Slide 28:

Expected<int> parseInt(const std::string& s)
{
    int result;
    ...
    if (nonDigit)
    {
        return Expected<int>::fromException(std::invalid_argument("not a number"));
    }
    ...
    if (tooManyDigits)
    {
        return Expected<int>::fromException(std::out_of_range("overflow"));
    }
    ...
    return result;
}

In the example above, we either return a valid result or an invalid result represented by an exception object. Note, that we do not throw the exception here. We just return it.

The caller then can decide whether to handle the exception immediately or (re)throw it. Example from Slide 29:

// Caller
string s = readline();
auto x = parseInt(s).get(); // Throw on error
auto y = parseInt(s); // Won’t throw
if (!y.valid()) {
    // Handle locally
    if (y.hasException<std::invalid_argument>()) {
        // Not an int
    }
    y.get(); // Just "re"throw
}
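
To make the two slides concrete, here is a deliberately simplified sketch of Expected<T> and a working parseInt. It is not the implementation from the talk: instead of a union it stores both a T and a std::exception_ptr (so T must be default constructible), and the parsing rules (non-negative decimal digits only) are my own assumption.

```cpp
#include <cassert>
#include <exception>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified Expected<T>: holds either a value or a stored exception.
template <typename T>
class Expected {
public:
    Expected(T value) : m_value(std::move(value)) {}

    template <typename E>
    static Expected fromException(const E& e) {
        Expected result;
        result.m_error = std::make_exception_ptr(e);
        return result;
    }

    bool valid() const { return !m_error; }

    // Returns the value, or throws the stored exception.
    const T& get() const {
        if (m_error) std::rethrow_exception(m_error);
        return m_value;
    }

    // True if the stored exception can be caught as E.
    template <typename E>
    bool hasException() const {
        if (!m_error) return false;
        try {
            std::rethrow_exception(m_error);
        } catch (const E&) {
            return true;
        } catch (...) {
            return false;
        }
    }

private:
    Expected() = default;
    T m_value{};
    std::exception_ptr m_error;
};

// Parses a non-negative decimal integer; errors are returned, not thrown.
inline Expected<int> parseInt(const std::string& s) {
    if (s.empty())
        return Expected<int>::fromException(std::invalid_argument("not a number"));
    long long result = 0;
    for (char c : s) {
        if (c < '0' || c > '9')
            return Expected<int>::fromException(std::invalid_argument("not a number"));
        result = result * 10 + (c - '0');
        if (result > std::numeric_limits<int>::max())
            return Expected<int>::fromException(std::out_of_range("overflow"));
    }
    return static_cast<int>(result);
}
```

With this sketch, parseInt("42").get() yields 42, while parseInt("4x") is invalid and only throws std::invalid_argument once the caller asks for the value with get().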

ScopeGuard is a class that executes the provided callable object at scope exit (i.e. at function exit and when exceptions occur).
Andrei provides some convenience macros for using ScopeGuard, but I think it is easier to see what it does without macros. Here is a usage example from slide 42:

// Assuming that action1() and action2() need neither cleanup nor rollback if they fail.

action1(); // Throws if fails
auto cleanup1 = ScopeGuard{[&] { doCleanup1(); }};
auto rollback1 = ScopeGuard{[&] { doRollback1(); }};

action2(); // Throws if fails
auto cleanup2 = ScopeGuard{[&] { doCleanup2(); }};
auto rollback2 = ScopeGuard{[&] { doRollback2(); }};

nextAction(); // Throws if fails

rollback1.dismiss();
rollback2.dismiss();

// cleanup1 and cleanup2 called at scope exit.
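
For readers who have not seen the slides, a ScopeGuard can be sketched in a few lines. This is my own minimal version, not the one from the talk: the factory makeScopeGuard and the demo function runTransaction are hypothetical names, and the guard assumes the callable does not throw.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Runs the stored callable at scope exit unless dismiss() was called.
template <typename Fun>
class ScopeGuard {
public:
    explicit ScopeGuard(Fun f) : m_fun(std::move(f)), m_active(true) {}
    ScopeGuard(ScopeGuard&& other)
        : m_fun(std::move(other.m_fun)), m_active(other.m_active) {
        other.dismiss(); // the moved-from guard must not fire
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
    ~ScopeGuard() { if (m_active) m_fun(); }
    void dismiss() { m_active = false; }

private:
    Fun m_fun;
    bool m_active;
};

// Factory so that the lambda type is deduced (works before C++17).
template <typename Fun>
ScopeGuard<Fun> makeScopeGuard(Fun f) { return ScopeGuard<Fun>(std::move(f)); }

// Demo: returns the order in which the guards fired.
inline std::vector<std::string> runTransaction(bool fail) {
    std::vector<std::string> log;
    try {
        auto cleanup = makeScopeGuard([&] { log.push_back("cleanup"); });
        auto rollback = makeScopeGuard([&] { log.push_back("rollback"); });
        if (fail) throw std::runtime_error("nextAction failed");
        rollback.dismiss(); // success: keep the changes, still clean up
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    return log;
}
```

On failure the guards fire in reverse order of declaration (rollback, then cleanup) before the exception is caught. On success only cleanup runs, because the rollback guard was dismissed.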

Although you can abuse e.g. std::unique_ptr to do RAII with a fake pointer (a pointer that does not point to a valid address) and a custom deleter, the resulting code may be misleading because it has nothing to do with pointers (the use of unique_ptr, the mysterious 0xfefefefe, the unused void* in doCleanup()).

// This code is a bit misleading. We just want to clean up but we mess with an API meant for pointers.

void doCleanup(void *)
{
    ...
}

void Action()
{
    ...
    auto cleanup = std::unique_ptr<void, decltype(doCleanup)*>{
                       reinterpret_cast<void *>(0xfefefefe), doCleanup };
    ...
    // doCleanup() called at scope exit or upon exception.
}

ScopeGuard provides a clear API for using RAII with unsafe/legacy code.

END OF POST


Using std::unique_ptr (RAII) with malloc() and free()


This is a short post about using std::unique_ptr with malloc() and free(). The same technique can be used with other resource management functions too (files, sockets, etc.).

Often, when working with legacy C and C++ code, we see usages of malloc() and free(). Just like new and delete, explicit memory management should be hidden in the guts of libraries whenever possible and never be exposed to the casual programmer.

It would be great if the legacy code could be easily changed to use std::make_unique() to make it nice, clean and safe at the same time. Unfortunately, std::make_unique() uses the new and delete operators internally. So, if our legacy code uses some custom functions to allocate and deallocate memory then we may be forced to do more refactoring than we might have time for (e.g. to use new expressions instead of custom or malloc based allocators).

Luckily, we can still get the benefit of RAII by using std::unique_ptr by trading off some cleanliness.
But std::unique_ptr takes a deleter type. No problem, we have decltype() to the rescue!

#include <cstdlib>
#include <memory>

int main()
{
    auto Data =
        std::unique_ptr<double, decltype(free)*>{
            static_cast<double*>(malloc(sizeof(double)*50)),
            free };
    return 0;
}

The decltype(free) gives back a function type of “void (void*)” but we need a pointer to a function. So, we say “decltype(free)*” which gives us “void (*)(void*)”. Excellent!

A bit awkward, but it is still nice, since it does RAII (automatic free) and both the allocator (malloc()) and the deallocator (free()) are clearly visible to the reader.

With decltype() we don’t have to write our own deleter functor like in this example:

#include <cstdlib>
#include <memory>

struct MyDeleter
{
    void operator()(double *p) { free(p); }
};

int main()
{
    auto Data =
        std::unique_ptr<double, MyDeleter>{
            static_cast<double*>(malloc(sizeof(double) * 50)) };
    return 0;
}

Also, with decltype() we don’t have to spell out the type of the deleter like in this example:

#include <cstdlib>
#include <memory>

int main()
{
    auto Data =
        std::unique_ptr<double, void(*)(void*)>{
            static_cast<double*>(malloc(sizeof(double) * 50)),
            free };
            free };
    return 0;
}

So, with std::unique_ptr you can quickly hack RAII into legacy code. However, as a general guideline, prefer refactoring legacy code using modern C++.
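
If the same pattern shows up in many places, the noise can be hidden behind an alias and a small factory. This is a sketch only: MallocPtr, mallocArray and sumOfIndices are names I made up, and because malloc() does not run constructors, the factory is meant for trivially constructible types such as double or int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// unique_ptr that releases its memory with free() instead of delete.
template <typename T>
using MallocPtr = std::unique_ptr<T, decltype(std::free)*>;

// Allocates room for `count` objects of T with malloc(); may hold nullptr.
template <typename T>
MallocPtr<T> mallocArray(std::size_t count) {
    return MallocPtr<T>{static_cast<T*>(std::malloc(sizeof(T) * count)), std::free};
}

// Example use: fill the buffer with 0, 1, 2, ... and add the values up.
inline double sumOfIndices(std::size_t count) {
    auto data = mallocArray<double>(count);
    if (!data) { return -1.0; }            // allocation failed
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        data.get()[i] = static_cast<double>(i);
        sum += data.get()[i];
    }
    return sum;                            // free() runs automatically here
}
```

The call sites then read auto data = mallocArray<double>(50); and the pairing of malloc() with free() lives in exactly one place.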

END OF POST


shared_ptr polymorphic magic pitfall


Pitfall in the polymorphic “magic” of std::shared_ptr

In [Sutter] we read:
“If A is intended to be used as a base class, and if callers should be able to destroy polymorphically, then make A::~A public and virtual. Otherwise make it protected (and not-virtual).”

Then in [ACCU], the author writes that if you use shared_ptr for polymorphic destruction then you may omit the virtual destructor. The reason for this can be read from [Arena].

In short, through its template code, the shared_ptr remembers the pointer type used during construction. For example, if you say “shared_ptr<Base>{new Derived{}}” then shared_ptr will internally store a Derived*. If you say “shared_ptr<Base>{new Base{}}” then it stores a Base*. Then when the shared_ptr is destructed, it calls delete on the stored pointer. Naturally, with non-virtual destructors, for Base* it will call Base::~Base and for Derived* it will call Derived::~Derived.
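
A well-defined way to watch this “remembering” at work is std::shared_ptr<void>: void has no destructor at all, yet the object is destroyed correctly because the deleter was created for the pointer type passed to the constructor. The sketch below uses names of my own choosing (Tracked, trackedDestroyed, destroyThroughSharedPtrVoid).

```cpp
#include <cassert>
#include <memory>

// Counts how many Tracked objects have been destroyed.
inline int& trackedDestroyed() {
    static int count = 0;
    return count;
}

struct Tracked {
    ~Tracked() { ++trackedDestroyed(); }
};

// Returns how many destructors ran when the shared_ptr<void> went away.
inline int destroyThroughSharedPtrVoid() {
    trackedDestroyed() = 0;
    {
        // The constructor sees a Tracked*, so the stored deleter deletes a Tracked*.
        std::shared_ptr<void> sp{new Tracked{}};
    }
    return trackedDestroyed();
}
```

The function returns 1: the shared_ptr only knows “void” in its static type, but the deleter it captured at construction still knows Tracked.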

If you are not careful and blindly follow the example in [ACCU], there is a nasty surprise waiting for you!
Let’s take the following example:

#include <iostream>
#include <memory>

struct A
{
    ~A() { std::cout << __FUNCTION__ << std::endl; }
};

struct Base
{
    // No virtual dtor!
};

struct Derived : public Base
{
    Derived() : m_pA{new A{}} {}
    A m_A;
    std::unique_ptr<A> m_pA;
};

int main()
{
    {
        std::cout << "\nDelete Derived via Base* : A::~A() NOT CALLED" << std::endl;
        Base *p = new Derived{};
        delete p;
    }
    {
        std::cout << "\nDelete Derived via shared_ptr<Base> with Derived* : A::~A() called" << std::endl;
        std::shared_ptr<Base> sp{new Derived{}};
    }
    {
        std::cout << "\nDelete Derived via make_shared<Derived> : A::~A() called" << std::endl;
        std::shared_ptr<Base> sp{std::make_shared<Derived>()};
    }
    {
        std::cout << "\nDelete Derived via shared_ptr<Base> with Base* : A::~A() NOT CALLED" << std::endl;
        Base *p = new Derived{};
        std::shared_ptr<Base> sp{p};
    }
    return 0;
}

Base does not have a virtual destructor, so if we delete a Derived object via a Base pointer then A::~A will not be called (formally, the behavior is undefined). Classic memory leak candidate.
The above code gives the following output:

Delete Derived via Base* : A::~A() NOT CALLED

Delete Derived via shared_ptr<Base> with Derived* : A::~A() called
~A
~A

Delete Derived via make_shared<Derived> : A::~A() called
~A
~A

Delete Derived via shared_ptr<Base> with Base* : A::~A() NOT CALLED

The first call is as expected: since Base has a non-virtual destructor, A is not destructed. On the other hand, via the “magic” of shared_ptr, in the 2nd and 3rd cases, Derived::~Derived (and thus A::~A) gets called even though we have a shared_ptr<Base>. The 4th example has a surprise: although we again use a shared_ptr with a pointer to a Derived object, shared_ptr calls Base::~Base because it was initialized with a Base*. In this case shared_ptr cannot see that the provided Base* pointer actually points to a Derived object.

shared_ptr<Base> calls Derived::~Derived only if it is constructed directly with a pointer of type Derived*. If you construct shared_ptr<Base> with a Base* then it will not call Derived::~Derived, it will call Base::~Base. The magic does not happen!

Also, in [Flaming] we read this:

“Classes that have custom destructors, copy/move constructors or copy/move assignment operators should deal exclusively with ownership. Other classes should not have custom destructors, copy/move constructors or copy/move assignment operators.”

But notice in our example above that neither Base nor Derived manage resources. Derived uses only self-destructing objects internally. So one might think that there is no ownership issue here, so we erroneously decide not to provide a “custom (virtual) destructor”. So the above should read like this:

“Classes that have custom destructors, copy/move constructors or copy/move assignment operators should deal exclusively with ownership. Other classes should not have custom destructors, copy/move constructors or copy/move assignment operators except for [Sutter].” 🙂

Conclusion
No matter how “smart” your pointer to Derived is, and no matter whether you use only self-destructing objects (e.g. RAII) everywhere, use a virtual destructor for polymorphic deletion!

Again, as stated in [Sutter]:
“If A is intended to be used as a base class, and if callers should be able to destroy polymorphically, then make A::~A public and virtual. Otherwise make it protected (and not-virtual).”
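
As a sanity check of the conclusion, here is a sketch of the same hierarchy with the one-line fix applied (a virtual destructor in Base). The counter and the two helper functions are hypothetical names of mine. With the fix, both of the paths that previously skipped A::~A now destroy both A objects.

```cpp
#include <cassert>
#include <memory>

// Counts destroyed A objects.
inline int& destroyedCount() {
    static int count = 0;
    return count;
}

struct A {
    ~A() { ++destroyedCount(); }
};

struct Base {
    virtual ~Base() = default; // the fix: polymorphic deletion is now safe
};

struct Derived : public Base {
    A m_A;
    std::unique_ptr<A> m_pA{new A{}};
};

// Case 1 from the post: plain delete through a Base*.
inline int destroyViaBasePointer() {
    destroyedCount() = 0;
    Base* p = new Derived{};
    delete p;
    return destroyedCount();
}

// Case 4 from the post: shared_ptr<Base> built from a Base*.
inline int destroyViaSharedPtrBuiltFromBasePointer() {
    destroyedCount() = 0;
    {
        Base* p = new Derived{};
        std::shared_ptr<Base> sp{p};
    }
    return destroyedCount();
}
```

Both functions return 2, one for m_A and one for the object owned by m_pA.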

References

END OF POST


Emulating in, out and inout function parameters – Part 2


Emulating in, out and inout function parameters in C++.

This is the continuation of Part 1 of this post.

In Part 1, we created classes to represent Input, Output and Input-Output parameters and arguments.
Here is an example how those classes can be used:

#include "param_inout.hpp"
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace param_inout;
using namespace std;
#define PRINT(arg) cout << __FUNCTION__ << ":" << __LINE__ << ": "<< arg << " " << endl

double func1(inp<int> p) {
    // p = 1; // Error: p is read-only.
    return p * 2.2;
}

void func2(outp<int> p) {
    // int a = p; // Error: p is write-only.
    p = 88;
}

void func3(inoutp<string> p) {
    auto t = string(p); // p is readable.
    p = "Hello ";       // p is writable.
    p = string(p) + t;  // p is readable and writable.
}

void func4(inp<string> pattern,
           inp<vector<string>> items,
           inoutp<string> message,
           outp<int> matchCount) {
    PRINT(message.arg());
    auto& ritems = items.arg();
    matchCount = count_if(begin(ritems), end(ritems),
        [&](const string& a) { return a.find(pattern) != string::npos; });
    message = "Done";
}

void func5(inp<int> p1, outp<int> p2, inoutp<string> p3) {
    // func1(p1); // Error! Good! inp::inp(const inp&) is private.
    func1(ina(p1));
    func2(outa(p2));
    func3(inouta(p3));
}

int main() {
    // auto a = func1(ina(2.2)); // Error: Cannot convert inp<double> to inp<int>
    // auto a = func1(2); // Error: inp::inp(const int&) is private.
    auto a0 = func1(ina(static_cast<int>(2.2)));   PRINT(a0);
    auto a1 = func1(ina(2));                       PRINT(a1);
    auto a2 = 0;                func2(outa(a2));   PRINT(a2);
    auto a3 = string{"world!"}; func3(inouta(a3)); PRINT(a3);

    auto a4 = vector<string>{"avocado", "apple", "plum", "apricot", "orange"};
    auto a5 = string{"Searching..."};
    auto a6 = 0;
    func4(ina(string{ "ap" }), ina(a4), inouta(a5), outa(a6));
    PRINT(a5);  PRINT(a6);

    func5(ina(5), outa(a6), inouta(a5));  PRINT(a5); PRINT(a6);

    return 0;
}

This example code produces the following output:

main:47: 4.4
main:48: 4.4
main:49: 88
main:50: Hello world!
func4:30: Searching...
main:56: Done
main:56: 2
main:58: Hello Done
main:58: 88

In the code example above, it is clear at the parameter declaration how each parameter is used by the function. Also, for declaration, we chose the convention to include a trailing “p” after the category. For example, outp signifies an Output Function Parameter. It says that it’s for output and it also says that it’s a parameter (i.e. not an argument).

At the call sites, we pass function arguments using ina (input), outa (output) and inouta (input and output). This way, we can clearly see that they are Function Arguments (not parameters); and how the function is going to use those arguments. No surprises.
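
The “Error:” comments in the example can also be turned into compile-time checks with type traits. To keep the sketch self-contained it uses a trimmed copy of inp and outp (only the parts needed here, and with parentheses instead of braces for the reference members); the functions doubled and setTo88 are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Trimmed copies of inp/outp from Part 1.
template <typename T> class inp;
template <typename T> inp<T> ina(const T&);
template <typename T>
class inp {
public:
    inp(inp&& other) : m_arg(other.m_arg) {}
    operator const T&() const { return m_arg; }
private:
    inp(const inp&) = delete;
    inp(const T& arg) : m_arg(arg) {}
    friend inp<T> ina<T>(const T&);
    const T& m_arg;
};
template <typename T>
inp<T> ina(const T& arg) { return inp<T>{arg}; }

template <typename T> class outp;
template <typename T> outp<T> outa(T&);
template <typename T>
class outp {
public:
    outp(outp&& other) : m_arg(other.m_arg) {}
    outp& operator=(const T& value) { m_arg = value; return *this; }
private:
    outp(const outp&) = delete;
    outp(T& arg) : m_arg(arg) {}
    friend outp<T> outa<T>(T&);
    T& m_arg;
};
template <typename T>
outp<T> outa(T& arg) { return outp<T>{arg}; }

// The guarantees, checked at compile time.
static_assert(std::is_convertible<inp<int>, const int&>::value, "inp is readable");
static_assert(!std::is_assignable<inp<int>&, int>::value, "inp is read-only");
static_assert(!std::is_convertible<outp<int>, int>::value, "outp is write-only");
static_assert(std::is_assignable<outp<int>&, int>::value, "outp is writable");
static_assert(!std::is_convertible<int, inp<int>>::value, "a bare int is not an argument");
static_assert(!std::is_copy_constructible<inp<int>>::value, "parameters must be re-wrapped");

inline int doubled(inp<int> p) { return p * 2; }
inline void setTo88(outp<int> p) { p = 88; }
```

If someone later loosens one of the classes (say, by adding a conversion operator to outp), the corresponding static_assert stops the build.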

It is also obvious how to construct function declarations ourselves. For example, the simple functions at the beginning of Part 1 may be declared like this, regardless of who writes them:

void f(inoutp<string> s);
void g(inp<string> s);

void caller() {
  auto a1 = string{"hello"};
  f(inouta(a1));
  g(ina(a1));
}

And our std::vector example will become this:

void f(inp<vector<string>> v);

It is clear what is happening both at the declaration and at the call site. Also, this convention is much easier and more obvious to follow.

END OF POST


Emulating in, out and inout function parameters – Part 1


Emulating in, out and inout function parameters in C++.

In C++, passing arguments to functions can be done in a variety of ways. If you are not careful, even though your function works as intended, the way its parameters are declared can easily mislead the caller.

Consider the following simple examples:

void f(string *s);
void g(string &s);

void caller() {
  auto a1 = string{"hello"};
  f(&a1);
  g(a1);
}

Is function f going to change s? Or did the writer mean “const string *” and the const qualifier is missing by mistake?
Same applies to function g. Also, when calling g, it is not apparent that it may change a1. It looks like it takes a1 by value.

These and similar issues can be avoided by laying down coding conventions about “how to name functions” and “how to declare function parameters” for the team members on your project. The problem with such guidelines is that they may be hard to follow. Even in simple cases they may not be obvious. For example, we may know that a std::vector can be moved by merely copying a few pointers, so passing a temporary by value is cheap. So you automatically declare the input parameter like this:

void f(vector<string> v);

On the other hand, other people may not know this. In this case, they tend to declare the same kind of input parameter like this:

void f(const vector<string>& v);

Both of these are correct, but mixing them leads to inconsistency and confusion. Also, guidelines about “how to declare function parameters” can become quite complex considering when and how to use pointers, references, const qualifiers, pass-by-value, etc. in function declarations.

99% of the time, we can categorize function parameters as

  • Input: Parameters that are only read by the function. They are not changed.
  • Output: Parameters that are only written by the function. The value of the corresponding argument that the caller passes in is not relevant to the function. These parameters are products of the function and the caller sees their new values when the function returns.
  • Input and output: Parameters that are both read and written by the function.

Some languages, like C#, provide standard tools for specifying these categories for function parameters. C++ does not provide standard tools for this. Fortunately, C++ is a very flexible language and we can roll our own tools to achieve this.

Here is one way to implement such a feature:

#pragma once
namespace param_inout {
    // Input
    template <typename T> class inp;
    template <typename T> inp<T> ina(const T&);
    template <typename T> inp<T> ina(const inp<T>&);
    template <typename T>
    class inp {
    public:
        inp(inp&& other) : m_arg{other.m_arg} { /* empty */ }
        operator const T&() const { return m_arg; }
        const T& arg() const { return m_arg; }
    private:
        inp(const inp&) = delete;
        inp(const T& arg) : m_arg{arg} { /* empty */ }
        friend inp<T> ina<T>(const T&);
        friend inp<T> ina<T>(const inp<T>& arg);
        const T& m_arg;
    };
    template <typename T>
    inp<T> ina(const T& arg) { return inp<T>{arg}; }
    template <typename T>
    inp<T> ina(const inp<T>& param) { return inp<T>{param.m_arg}; }

    // Output
    template <typename T> class outp;
    template <typename T> outp<T> outa(T&);
    template <typename T> outp<T> outa(outp<T>&);
    template <typename T>
    class outp {
    public:
        outp(outp&& other) : m_arg{other.m_arg} { /* empty */ }
        outp& operator=(const T& otherArg) { m_arg = otherArg; return *this; }
    private:
        outp(const outp&) = delete;
        outp(T& arg) : m_arg{arg} { /* empty */ }
        friend outp<T> outa<T>(T&);
        friend outp<T> outa<T>(outp<T>&);
        T& m_arg;
    };
    template <typename T>
    outp<T> outa(T& arg) { return outp<T>{arg}; }
    template <typename T>
    outp<T> outa(outp<T>& param) { return outp<T>{param.m_arg}; }

    // Input and output
    template <typename T> class inoutp;
    template <typename T> inoutp<T> inouta(T&);
    template <typename T> inoutp<T> inouta(inoutp<T>&);
    template <typename T>
    class inoutp {
    public:
        inoutp(inoutp&& other) : m_arg{other.m_arg} { /* empty */ }
        operator T&() { return m_arg; }
        T& arg() const { return m_arg; }
        inoutp& operator=(const T& otherArg) { m_arg = otherArg; return *this; }
    private:
        inoutp(const inoutp&) = delete;
        inoutp(T& arg) : m_arg{arg} { /* empty */ }
        friend inoutp<T> inouta<T>(T&);
        friend inoutp<T> inouta<T>(inoutp<T>&);
        T& m_arg;
    };
    template <typename T>
    inoutp<T> inouta(T& arg) { return inoutp<T>{arg}; }
    template <typename T>
    inoutp<T> inouta(inoutp<T>& param) { return inoutp<T>{param.m_arg}; }
}

These simple classes wrap around references to the actual function arguments. Why?

  • For Output and Input-Output parameters, taking a reference is necessary since we want to write into the arguments and make those writes visible to the caller.
  • For Input parameters, we take a const reference. It is a reference because most of the time it is “efficient enough”. It is const because it is a reference and we want it to be read-only. If we want, we can specialize it for simple types, like char or int, to store a copy and not a reference. For the sake of consistency and simplicity, even if you choose to specialize it, it’s probably better to always treat inp<T> as a reference. That is, make a copy of its contents (inp<T>.arg()) if you want to save it somewhere after the function returns.
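
The “store a copy for simple types” idea from the last bullet could be sketched with std::conditional instead of a hand-written specialization. This is just one possible approach, not part of the header above: in_storage and in_holder are hypothetical names, and “simple type” is taken to mean std::is_arithmetic here.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Chooses how an input argument is stored: by value for arithmetic
// types (char, int, double, ...), by const reference for everything else.
template <typename T>
struct in_storage {
    using type = typename std::conditional<std::is_arithmetic<T>::value,
                                           T, const T&>::type;
};

static_assert(std::is_same<in_storage<int>::type, int>::value,
              "int is stored as a copy");
static_assert(std::is_same<in_storage<std::string>::type, const std::string&>::value,
              "string is stored as a const reference");

// Minimal holder to show the difference at run time.
template <typename T>
class in_holder {
public:
    explicit in_holder(const T& arg) : m_arg(arg) {}
    const T& arg() const { return m_arg; }
private:
    typename in_storage<T>::type m_arg;
};
```

The behavioural difference is visible when the original changes after the holder was created: an in_holder<int> keeps the old value, while an in_holder<std::string> sees the new one. That is exactly why the advice above is to always treat the wrapper as a reference and copy arg() if the value must outlive the call.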

In Part 2 of this post, we look at how these classes can be used.
Continue to Part 2.

END OF POST


Declaration and initialization with auto


Guidelines about the auto keyword of C++11 for variable declaration.

The guidelines around the usage of the auto keyword in C++11 are controversial. One argument is that “auto hides the type”. In practice, I found that this is not a concern, unless you use “notepad.exe” as your IDE. This post lists a collection of examples and guidelines for local variable declaration and initialization with auto.

Initialization in C++ is a mess.

int i;
float f(3.3);
double d = 4.4;
char c = i + 2;
class A {
  A() : s(), i() {}
  string s;
  int i;
};
string t(); // This is not even a variable declaration!

All the above variables are declared and initialized using different syntax and semantics.
For example, variable i is Default Initialized. In case of int, it is effectively not initialized at all.
Variable f is initialized with 3.3 but it looks more like a function call.
The A::s and A::i variables are Value Initialized, A::i with 0 and A::s with the default constructor of string.
On the other hand, the last one is not a variable definition but a function declaration. The t is a function that takes no arguments and returns a string.
This is anything but consistent. If I were new to C++ land, I would find this confusing and I might prematurely turn away from learning C++ in favor of some other language. I think that would be a pity.
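The function-declaration surprise can be confirmed by the compiler itself. The following is a minimal sketch using type traits; the names u and zero_initialized are made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

std::string t();   // Declares a function named t, not a variable.
std::string u{};   // Defines an empty string variable (C++11 braces).

static_assert(std::is_function<decltype(t)>::value, "t is a function");
static_assert(std::is_same<decltype(u), std::string>::value, "u is a string");

// A value-initialized int is guaranteed to be zero.
inline int zero_initialized() {
    int j{};
    assert(j == 0);
    return j;
}
```

With braces there is no way to accidentally declare a function, which is one reason the brace syntax is attractive.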

Fortunately with C++11, the language gained a new way of declaring variables. The same code looks like this when using the auto keyword and Braced Initialization:

auto i = 1;
auto f = 3.3f;
auto d = 4.4;
auto c = static_cast<char>(i + 2); // char{i + 2} would be rejected: narrowing from int
class A {
  A() : s{}, i{} {}
  string s;
  int i;
};

This looks much more consistent and also safer (see the guidelines at the end of this post).

Here is a more elaborate example about the basic usage:

#include <iostream>
#include <string>
#include <typeinfo>
#include <vector>
#include <memory>
using namespace std;

template<typename T>
void print(const T& a) { cout << typeid(T).name() << "=(" << a.i << "," << a.d << "," << a.s << ") " << endl; }

struct A {
  int i;  double d;  string s;
};

struct B {
  explicit B(int ai, string as) : i{ai}, s{as} {}
  virtual ~B() {}
  int i;  double d;  string s;
};

struct C : public B {
  explicit C() : B{1, "C"} {}
};

int main() {                          // Type of declared variable
                                      // ---------------------------
  auto i1 = 11;                       // int
  auto i2 = 12u;                      // unsigned int
  auto i3 = 13.0f;                    // float
  auto i4 = 13.0;                     // double
  auto i5 = "hello";                  // const char* (the const char[6] array decays)
  auto i6 = string{"hello"};          // string
  // auto i7 = int{i3};               // n/a (Error: narrowing from float to int!)

  auto v1 = vector<int>{};            // vector<int> (empty vector)
  auto v2 = vector<int>{1, 2, 3, 4};  // vector<int> (with elements 1, 2, 3, 4)
 
  auto p1 = unique_ptr<B>{new C{}};   // unique_ptr<B>
  auto p2 = shared_ptr<B>{new C{}};   // shared_ptr<B>
  auto p3 = new A{};                  // A*
  delete p3;
  auto p4 = static_cast<B*>(new C{}); // B* (Ugly! Good! Discourages polymorphic raw pointers.)
  delete p4;
 
  // auto a0 = A;                     // n/a (Error: Initialization required! Good!)
  auto a1 = A{};                      // A (Value initialization)
  auto a2 = A{11, 3.3, "A"};          // A (Aggregate initialization)

  // auto b1 = B{};                   // n/a (Error: No such user-defined constructor!)
  auto b2 = B{22, "B"};               // B
  // auto b3 = B{1, 4.4, "B"};        // n/a (Error: No such user-defined constructor!)

  print(a1);  print(a2);
  print(b2);
  return 0;
}

Here is the output of the above program:

1A=(0,0,)
1A=(11,3.3,A)
1B=(22,-0.553252,B)

In some situations the resulting type of the declaration may not look obvious, but it all makes sense, and remembering the following helps:

  • “auto a = expression;” in general:
    • Creates a new object called a.
    • Its type is the type of expression, with references and top-level CV qualifiers dropped (arrays and functions decay to pointers).
    • It is initialized with a copy of the value of expression.
  • “auto& a = referencable_expression;”
    • Creates a new reference to the object denoted by the expression. Because it is a reference, the CV qualifier of the expression is “inherited” by a. For example, this creates a const reference if the expression itself is const. Makes sense!
  • “auto* a = pointer_expression;” same as
    “auto a = pointer_expression;”
    • Creates a new pointer holding a copy of the pointer value. The CV qualifier of the pointed-to object is kept, so a pointer to const stays a pointer to const.

Here is an example of some “not so obvious” declarations denoting the source type and the resulting type. Pay attention that the first column is the resulting type and the second column is the source type (the “a <- b” notation indicates that we get type a from type b).

#include <initializer_list>
#include <iostream>
#include <typeinfo>
using namespace std;

int main() {
  //                                        Result      Source
  //                                    -----------------------------------
  auto i = int{3};                       // int         <-  int

  auto v1 = i;                           // int         <- int
  const auto v2 = i;                     // const int   <- int
  auto& v3 = i;                          // int&        <- int
  const auto& v4 = i;                    // const int&  <- int
  auto v5 = static_cast<const int>(i);   // int         <- const int
  auto v6 = static_cast<const int&>(v3); // int         <- const int&

  auto& t4 = v4;                         // const int&  <- const int& (CV inherited)
  const auto& t5 = v4;                   // const int&  <- const int&
  auto& t6 = v3;                         // int&        <- int&

  auto w1 = v3;                          // int         <- int&
  auto w2 = v4;                          // int         <- const int&

  auto p1 = &i;                          // int*        <- int*
  auto p2 = p1;                          // int*        <- int*
  const auto p3 = p1;                    // int* const  <- int*
  auto& p4 = p1;                         // int*&       <- int*
 
  auto q1 = static_cast<const int*>(&i); // const int*  <- const int* (CV inherited)
  auto q2 = q1;                          // const int*  <- const int* (CV inherited)
  auto q3 = *q1;                         // int         <- const int

  auto* r1 = static_cast<const int*>(&i);// const int*  <- const int* (CV inherited)
  auto* r2 = r1;                         // const int*  <- const int* (CV inherited)
  // auto* r3 = *r1;                     // n/a (Error: *r1 is a const int, not a pointer)

  auto z1 = {1, 2, 3};                   // std::initializer_list<int>
}
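The type comments in a listing like the one above do not have to be taken on faith. Here is a small sketch that lets the compiler verify a few of them with static_assert; the function name deduction_checks and the variable names are made up for this illustration.

```cpp
#include <cassert>
#include <type_traits>

inline int deduction_checks() {
    int i = 3;
    const int& cr = i;
    const int* cp = &i;

    auto a1 = cr;        // new object: reference and const are dropped
    static_assert(std::is_same<decltype(a1), int>::value, "a1");
    auto& a2 = cr;       // reference: const is kept
    static_assert(std::is_same<decltype(a2), const int&>::value, "a2");
    const auto a3 = i;   // const requested explicitly
    static_assert(std::is_same<decltype(a3), const int>::value, "a3");
    auto a4 = cp;        // pointer copy: const of the pointee is kept
    static_assert(std::is_same<decltype(a4), const int*>::value, "a4");
    auto* a5 = cp;       // same as a4
    static_assert(std::is_same<decltype(a5), const int*>::value, "a5");
    const auto a6 = &i;  // const applies to the pointer itself
    static_assert(std::is_same<decltype(a6), int* const>::value, "a6");
    auto a7 = "hello";   // array decays to a pointer
    static_assert(std::is_same<decltype(a7), const char*>::value, "a7");

    a1 = 4;              // a1 is an independent copy, so i stays 3
    assert(i == 3);
    (void)a2; (void)a3; (void)a4; (void)a5; (void)a6; (void)a7;
    return i;
}
```

If any of the expected types were wrong, the file would simply fail to compile, which makes this a cheap way to test your own understanding of auto.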

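One other important rule to remember is that, with brace initialization, the Initializer-list Constructor wins over normal constructors. In other words, whenever the values in the braces can be converted to the element type of the initializer list, the compiler picks the Initializer-list Constructor, and it falls back to the normal constructors only when that is not possible. Here is an example that demonstrates this: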

#include <iostream>
#include <string>
#include <typeinfo>
#include <initializer_list>
#include <vector>
using namespace std;

template<typename T>
void print(const T& a) {
  cout << typeid(T).name() << "=(" << a.i << "," << a.d << "," << a.s << ",[";
  for (const auto i : a.v) { cout << i << " "; }
  cout << "])" << endl;
}

struct D {
  explicit D() {}
  explicit D(initializer_list<int> il) : v{il} {}
  int i;  double d;  string s;  vector<int> v;
};

struct C {
  explicit C(int ai, string as) : i{ai}, s{as} {}
  explicit C(int ai1, int ai2) : i{ai1 + ai2} {}
  explicit C(initializer_list<int> il) : v{il} {}
  int i;  double d;  string s;  vector<int> v;
};

int main() {
  auto v1 = vector<int>{};     // Empty vector
  auto v2 = vector<int>{1, 5}; // Vector with values [1, 5]
  auto v3 = vector<int>(1, 5); // Vector with values [5] (count 1, value 5)

  auto d1 = D{};                 // D::D() user-defined
  auto d2 = D{1, 2, 3, 4 ,5, 6}; // D::D(initializer_list<int>)

  auto c1 = C{};                 // C::C(initializer_list<int>), no user-defined C::C()
  // auto c2 = C{33, 4.4, "C"};  // Error:
                                 // Compiler wants C::C(initializer_list<int>)
  auto c3 = C{10, 20};           // C::C(initializer_list<int>)
  auto c4 = C{33, "C"};          // C::C(int, string), compiler cannot deduce init-list
  auto c5 = C{1, 2, 3, 4, 5, 6}; // C::C(initializer_list<int>)

  print(d1);  print(d2);
  print(c1);  print(c3);  print(c4);  print(c5);
  return 0;
}

Here is the output of this example program:

1D=(-1217180992,3.64111e-314,,[])
1D=(-1217352208,-5.02222e-42,,[1 2 3 4 5 6 ])
1C=(-1218706460,-0.318174,,[])
1C=(-1220249883,4.88997e-270,,[10 20 ])
1C=(33,4.85918e-270,C,[])
1C=(-1217346088,4.8697e-270,,[1 2 3 4 5 6 ])

Guidelines

  • Almost Always Auto: Use auto for local variable declaration whenever you can.
    • Safety: Discourages uninitialized variables (Default Initialization).
    • Safety: Discourages using naked pointers (the polymorphic case).
    • Correctness: No automatic narrowing! You get the exact type.
    • Convenience: With auto, the compiler “figures out” the type for you.
    • Convenience: Use standard literal suffixes and prefixes for less typing.
      • Integer suffixes: u (unsigned int), l (long int), ul (unsigned long int).
      • Floating point suffixes: L (long double), f (float).
      • Character prefixes: L'' (wchar_t), L"" (const wchar_t[]).
  • Almost Always Brace Initializers: Use brace initializers whenever you can.
    • Consistency: Provides one syntax for initialization.
    • Initialization != Function call != Type cast
      • “auto w = Widget(12)”: This looks like a function call, but it’s not. It is a type cast of int to Widget (Explicit Type Conversion).
      • “auto w = Widget{12}”: This looks like a brace initializer and it is a brace initializer.
  • Almost: The compiler will tell you when you cannot use auto.
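The literal suffixes and prefixes from the guidelines can be checked the same way as any other deduced type. This is a small sketch with made-up names; note that a string literal assigned to an auto variable decays to a pointer.

```cpp
#include <cassert>
#include <type_traits>

inline bool literal_types_ok() {
    auto a = 1u;    static_assert(std::is_same<decltype(a), unsigned int>::value, "u");
    auto b = 1l;    static_assert(std::is_same<decltype(b), long>::value, "l");
    auto c = 1ul;   static_assert(std::is_same<decltype(c), unsigned long>::value, "ul");
    auto d = 1.0f;  static_assert(std::is_same<decltype(d), float>::value, "f");
    auto e = 1.0L;  static_assert(std::is_same<decltype(e), long double>::value, "L");
    auto f = L'x';  static_assert(std::is_same<decltype(f), wchar_t>::value, "L''");
    auto g = L"x";  static_assert(std::is_same<decltype(g), const wchar_t*>::value, "L\"\"");

    assert(g[0] == f);  // the wide string starts with the wide character
    (void)a; (void)b; (void)c; (void)d; (void)e;
    return true;
}
```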

END OF POST


The A::Restricted idiom


Fine grained access control to private members of a class

Sometimes I wish I could control the access to a class in a finer way.
Usually you have these tools in your arsenal:

  • private
  • protected
  • public
  • friend

Private is fine, since most of the stuff in a class should be hidden from all, in order to maximize encapsulation.

Protected parts can be accessed by anybody who derives from your class. This opens your class too much, i.e. to every deriving class. Making these protected parts private later on can be impossible due to the number of deriving classes or due to not knowing who may derive from you (e.g. if you are a library). This can considerably hinder refactoring efforts.

Public parts have the same problems as the protected parts and more. The whole world can see your public parts.

Making a non-related entity a friend of your class also exposes too much. The friend can access every part of your class even if it does not need to.

The bottom line is, if you open up your class with protected or public then the encapsulation of your class is hurt badly. Also, making non-members friends is almost always unnecessarily generous.

We would need better support in the core language for more specific control over who can access what. Something like this:

// WARNING! This is NOT C++! The "public(...)" is fictional.

class A {
public(class B, class D): // Private for all but public for B and D.
  void f();
private: // Private for all.
  int a;
};

class B : public A {
   void h() { f(); } // OK! A::f() is public for B.
};

class C : public A {
   void h() { f(); } // Error! A::f() is private for C.
};

class D {
   void h(A& a) { a.f(); } // OK! A::f() is public for D.
};

In the above code, A::f() is marked as public only for classes B and D. For everybody else, it is private (default access for class members).
Unfortunately, the public access modifier in C++ does not support this syntax and semantics.

Luckily, there is a workaround to emulate this kind of behavior. Here is one way to do this:

// a.hpp
#pragma once

class A {
private:
    void f();
    virtual void g();
    double d;

public:
    A();

    class Restricted {
    private:
        A& parent;
        Restricted(A& p);
        // Proxy functions
        void f();
        void g();
        // Friends of Restricted
        friend class A;
        friend class B;
        friend class D;
    };
    Restricted restricted;
};
// a.cpp
#include "a.hpp"
#include <iostream>

A::A() : restricted{*this} { }
void A::f() { std::cout << "A::f()" << std::endl; }
void A::g() { std::cout << "A::g()" << std::endl; }

A::Restricted::Restricted(A& p) : parent{p} { }
void A::Restricted::f() { parent.f(); }
void A::Restricted::g() { parent.g(); }

Class A has some private parts, f, g and d. From these parts, we want to expose only the functions and we want to strictly control who can access them.

For this, we define an inner Restricted class. In Restricted, everything is private. We open it up only for selected entities; here A, B and D. We create proxy functions whose task is to forward the call to those parts in A that we want to make accessible to the friends of Restricted. The Restricted class has a reference to the outer A object to which it forwards the calls.

The friends of Restricted can access Restricted::parent, but this is not a problem at all. The parent is a reference, so it cannot be changed to point to some other A after construction. Also, only the public parts of A can be accessed through parent. The encapsulation of A is not weakened.

The friends of Restricted can access Restricted::Restricted(A&) and construct Restricted objects, but this is not a problem either. In the worst case, this can result in multiple Restricted objects referencing the same A object. Again, the encapsulation of A is not weakened because through these Restricted::parent references only the public parts of A can be accessed. Also, through these Restricted objects only the selected private parts of A can be accessed (here A::f() via A::Restricted::f() and A::g() via A::Restricted::g()).

In class A, everything is private and only the very minimum is made public. A::A() is public because we want to allow anybody to create A objects. A::restricted is public because otherwise the friends (e.g. B) specified inside Restricted cannot access it.
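One caveat that may be worth checking in your own code: the compiler-generated copy constructor of A copies the Restricted member as well, so the copy's parent reference still refers to the original object. The sketch below uses a condensed stand-in for A with access control omitted, and a hypothetical variant called RewiredA whose copy constructor rebinds the proxy. It is an illustration under these assumptions, not part of the idiom as described above.

```cpp
#include <cassert>

// Condensed stand-in for class A (access control omitted for brevity).
class A {
public:
    A() : restricted(*this) {}
    class Restricted {
    public:
        explicit Restricted(A& p) : parent(p) {}
        A& parent;
    };
    Restricted restricted;
};

// Hypothetical variant: the copy constructor wires the proxy to the new object.
class RewiredA {
public:
    RewiredA() : restricted(*this) {}
    RewiredA(const RewiredA&) : restricted(*this) {}
    class Restricted {
    public:
        explicit Restricted(RewiredA& p) : parent(p) {}
        RewiredA& parent;
    };
    Restricted restricted;
};

inline bool copy_caveat_holds() {
    A a1;
    A a2 = a1;                              // implicit copy constructor
    assert(&a2.restricted.parent == &a1);   // the copy's proxy refers to a1

    RewiredA r1;
    RewiredA r2 = r1;                       // user-defined copy constructor
    assert(&r2.restricted.parent == &r2);   // the copy's proxy refers to itself
    return true;
}
```

If A objects are never copied, deleting the copy operations is a simpler alternative to rewiring.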

The examples below show how class A can be used. Access control to A::Restricted is managed strictly by A via the friends of A::Restricted. So, nobody else can gain access without the “permission” of A.

We create a class B deriving from A. B is a friend of A::Restricted. This gives B access to the restricted parts of A.

// b.hpp
#pragma once
#include "a.hpp"
class B : public A {
public:
    void h();
private:
    virtual void g() override;
};
// b.cpp
#include "b.hpp"
#include <iostream>

void B::h() {
    std::cout << "B::h() enter" << std::endl;
    // ++d;         // Error! A::d is private.
    // f();         // Error! A::f() is private.
    g();            // OK! B::g() is accessible here.
    restricted.f(); // OK! A::Restricted::f() is public here.
    restricted.g(); // OK! A::Restricted::g() is public here.
    std::cout << "B::h() exit" << std::endl;
}

void B::g() { std::cout << "B::g()" << std::endl; }

We create a class C deriving from A, but we do not give it access to the restricted parts of A.

// c.hpp
#pragma once
#include "a.hpp"
class C : public A {
public:
    void h();
};
// c.cpp
#include "c.hpp"
void C::h() {
    // f();             // Error! A::f() is private.
    // g();             // Error! A::g() is private.
    // restricted.f();  // Error! A::Restricted::f() is private.
    // restricted.g();  // Error! A::Restricted::g() is private.
}

We create a class D that is not related to A. Yet, it can access the restricted parts of A because we explicitly allow it.

// d.hpp
#pragma once
class D {
public:
    void h(class A&);
};
// d.cpp
#include "d.hpp"
#include "a.hpp"
void D::h(A& a) {
    // a.f();            // Error! A::f() is private.
    // a.g();            // Error! A::g() is private.
    a.restricted.f(); // OK! A::Restricted::f() is public here.
    a.restricted.g(); // OK! A::Restricted::g() is public here.
}
// main.cpp
#include "b.hpp"
#include "c.hpp"
#include "d.hpp"
int main() {
    auto b = B{};    b.h();
    // b.f();            // Error! A::f() is private.
    // b.g();            // Error! B::g() is private.
    // b.restricted.f(); // Error! A::Restricted::f() is private.
    auto c = C{};    c.h();
    auto d = D{};    d.h(b);
    return 0;
}

In this example, only B and D can access A::f() and A::g(), but only through A::Restricted. Nobody else can. A::d remains private for everybody. And this I call the A::Restricted idiom.
Here is the output of this example program:

B::h() enter
B::g()
A::f()
B::g()
B::h() exit
A::f()
B::g()

This idiom is a variation of the Attorney-Client Idiom. I prefer this variant (i.e. A::Restricted) because it provides a more convenient and more intuitive syntax for accessing the restricted parts. This is achieved by the automatic wiring between class A and class Restricted.
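For comparison, here is a minimal sketch of the general Attorney-Client pattern as it is commonly described. The class names Client, Attorney and Helper are hypothetical; the point is only to show the difference in call syntax.

```cpp
#include <cassert>

class Client {
public:
    Client() : calls{0} {}
    int call_count() const { return calls; }
private:
    void f() { ++calls; }    // restricted: reachable only via the attorney
    int calls;               // stays private for everybody else
    friend class Attorney;   // the only friend of Client
};

class Attorney {
private:
    // Forwards only the parts of Client that should be exposed.
    static void f(Client& c) { c.f(); }
    friend class Helper;     // the attorney decides who gets access
};

class Helper {
public:
    // OK: Helper is a friend of Attorney, so it can reach Client::f().
    static void use(Client& c) { Attorney::f(c); }
};

inline int attorney_demo() {
    Client c;
    Helper::use(c);
    Helper::use(c);
    assert(c.call_count() == 2);
    return c.call_count();
}
```

Here the caller writes Attorney::f(c) and passes the object explicitly, whereas with A::Restricted the call reads a.restricted.f() and the proxy already knows its object.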

END OF POST


Variable declaration and initialization


Initialization in C++ is a complex topic!
Some of the initialization types of C++ are the following:

  • Default Initialization: Happens when a variable is not initialized explicitly. For built-in types with automatic or dynamic storage, the value is indeterminate (garbage).
  • Value Initialization: Initializes with the type’s default initial value (e.g. default constructor for user types, zero for number types).
  • Direct Initialization: Initializes with the given value.

The example below shows what happens to variables when initialized with these different initialization types.

#include <iostream>
#include <cstdint> // uint64_t
#include <algorithm> // fill
#include <iterator> // begin, end
#include <new> // placement new
#include <string>
using namespace std;

static uint64_t garbage[32];
template<typename T>
void print(const string& m, const T& a)
{ cout << m << ": (" << a.d << "," << a.f << "," << a.s << "," << a.i << ") "
       << endl; }
static void reset() { fill(begin(garbage), end(garbage), 0xDEADBEAF); }

class A {
public: double d;  float f;  string s; int i;
};

class B {
public: double d;  float f;  string s; int32_t i;
  B() {}
  explicit B(int8_t a) : i{a} {}
  explicit B(uint8_t a) : d{}, f{}, s{}, i{a} {}
  explicit B(int16_t a) : d{}, s{}, i{a} {}
  explicit B(double ad, float af, string as, int ai)
           : d{ad}, f{af}, s{as}, i{ai} {}
};

int main() {
  reset();

  auto v1 = new (garbage) A; // Placement New syntax.
  print("Default", *v1); reset();

  auto v2 = new (garbage) A{};
  print("Value without default ctor", *v2); reset();

  auto v4 = new (garbage) B{};
  print("Value with default ctor", *v4); reset();

  auto v5 = new (garbage) B{int8_t{2}};
  print("Direct with Default in ctor", *v5); reset();

  auto v6 = new (garbage) B{uint8_t{2}};
  print("Direct with Value in ctor", *v6); reset();

  auto v7 = new (garbage) B{int16_t{2}};
  print("Direct with Value and Default in ctor", *v7); reset();

  auto v8 = new (garbage) B{1.0, 2.0, "Moikkelis!", 3};
  print("Direct with Direct in ctor", *v8); reset();

  auto v3 = new (garbage) A{1.0, 2.0, "Hello!", 3};
  print("Aggregate", *v3); reset();

  return 0;
}

Here is the output of the above code:

Default: (1.84579e-314,-6.25982e+18,,-559038801) 
Value without default ctor: (0,0,,0) 
Value with default ctor: (1.84579e-314,-6.25982e+18,,-559038801) 
Direct with Default in ctor: (1.84579e-314,-6.25982e+18,,2) 
Direct with Value in ctor: (0,0,,2) 
Direct with Value and Default in ctor: (0,-6.25982e+18,,2) 
Direct with Direct in ctor: (1,2,Moikkelis!,3) 
Aggregate: (1,2,Hello!,3) 

The “ugly” numbers (such as 1.84579e-314 and -559038801) are uninitialized variables that still show the fill pattern. As you can see, sometimes initialization does not happen even though you thought it would.

Here are some tips in order to avoid common pitfalls:

  • Always initialize during declaration!
  • Prefer Direct Initialization.
  • Careful with Value Initialization of user types with user-defined default constructor: It can leave things uninitialized if the programmer accidentally left some member variables Default Initialized.
  • Avoid Default Initialization (i.e. no initialization).
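One way to follow these tips for class members is to give every member a default member initializer (available since C++11), so that a constructor that forgets a member cannot leave it uninitialized. The following is a sketch with a hypothetical struct named Safe; it builds the object on deliberately dirty storage, similar in spirit to the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <string>

struct Safe {
    double d{};        // value-initialized unless a constructor overrides it
    float f{};
    std::string s{};
    std::int32_t i{};
    Safe() {}                                // user-provided, yet nothing is left out
    explicit Safe(std::int32_t a) : i{a} {}  // only i is listed; the rest use defaults
};

// Constructs a Safe on top of storage filled with a non-zero byte pattern.
inline bool stays_clean_on_dirty_storage() {
    alignas(Safe) unsigned char storage[sizeof(Safe)];
    std::memset(storage, 0xAB, sizeof(storage));
    Safe* p = new (storage) Safe{};
    bool ok = (p->d == 0.0) && (p->f == 0.0f) && p->s.empty() && (p->i == 0);
    p->~Safe();                              // placement new needs a manual destructor call
    assert(ok);
    return ok;
}
```

With this approach, both Safe{} and Safe{7} produce fully initialized objects even though neither constructor mentions d, f or s.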

References:
http://en.cppreference.com/w/cpp/language/default_initialization
http://en.cppreference.com/w/cpp/language/value_initialization
http://en.cppreference.com/w/cpp/language/direct_initialization
http://en.wikipedia.org/wiki/Placement_syntax

END OF POST