Optimisation Techniques

In this post: I outline a variety of commonly-used, simple approaches for optimising algorithms.
Useful background: None in particular!

Discovering and implementing efficient algorithms is rarely an easy task; however, less involved ways to improve the overall efficiency of a computation, such as heuristic/probabilistic approaches and shortcuts in code, can be employed quite easily and to great effect, usually in a tradeoff with other resources. Let's look at a handful of commonly-used tricks for improving average efficiency:

Tail recursion

Consider some arbitrary function that takes a form like this:

int f(int x)
{
    //do stuff, culminating in:
    return g(x);
}

What's going on with the call stack when we call f() from, say, main()? main() transfers control to f(), which in turn transfers control to g(). g() then transfers control back to f(), which transfers control back to main(). The argument and the return value are passed along on the stack in tandem.

f() branches to g() with the understanding that control will return to it later... or does it? In this example, the g() call is the very last thing f() does: this is known as a tail call. There's no reason why g() can't return its result directly to main(), since all f() would do with it is pass it straight on. A clever compiler will detect this and make f() simply jump to g() without needing to return. This is called tail call optimisation. In contrast, this cannot be done if f() needs to do anything after the g() call; for example:

int f(int x)
{
    //do stuff, culminating in this non-tail call:
    return 2 * g(x);
}

Tail call optimisation can become very significant when dealing with recursive functions. Let's consider one of the most classic examples of a recursive function, the factorial x!.

Note: Here, I've simply used long as the integer type. This will quickly become too small to represent the factorial. Later in this article, I'll change to an arbitrary-precision arithmetic implementation. Many are available open-source; hereinafter, I use InfInt.

long factorialR(long x)
{
    if (x < 2) return 1;
    return x * factorialR(x - 1);
}

long factorialTR(long x, long y = 1)
{
    if (x < 2) return y;
    return factorialTR(x - 1, y * x);
}

The first function above is a direct interpretation of the recurrence relation. It isn't tail recursive, as it must multiply the result of the recursive call by x before returning control -- for large x, it will cause a stack overflow! The second function is mathematically equivalent but uses a second argument as a running total, which allows the recursive call to sit in tail position (the default argument of 1 initialises the accumulator; this can be hidden behind a wrapper function instead of a default, as sketched below). A modern, clever compiler will be able to perform tail call optimisation on the second function and so avoid the stack overflow.
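
For instance, the accumulator can be hidden behind a single-argument wrapper along these lines (a sketch; factorialTRImpl and factorialTRWrapped are illustrative names):

//Tail-recursive helper carrying the running total
long factorialTRImpl(long x, long acc)
{
    if (x < 2) return acc;
    return factorialTRImpl(x - 1, acc * x);
}

//Public-facing wrapper: callers never see the accumulator
long factorialTRWrapped(long x)
{
    return factorialTRImpl(x, 1);
}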

Having said that, an even more modern, clever compiler will be able to convert the first function into the optimised form, too, so it's increasingly unlikely one has to actively worry about tail recursion. However, not all languages and compilers support tail call optimisation, so it remains an extremely important low-level concept to be aware of.

Note: For example, whilst tail call optimisation is not part of the C++ standard, major modern compilers will perform it at optimisation levels of -O2 and above. Python does not support it as a deliberate design choice, though third-party solutions are available.
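
For intuition, the effect of tail call optimisation on factorialTR is essentially the following hand-written loop (a sketch, not actual compiler output): the recursive call becomes an argument update plus a jump back to the top, so the stack never grows.

long factorialLoop(long x, long y = 1)
{
    while (x >= 2)
    {
        y *= x; //the accumulator update from the 'recursive call'
        --x;    //the new argument for the 'next call'
    }
    return y;
}
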
(Image: Ouroboros, the snake eating its own tail, confirming the ancient Egyptians' grasp of recursive tail call optimisation techniques)


Memoisation

If you've used a computer for a non-negligible amount of time, you've probably heard about the concept of a cache, most likely in relation to an internet browser. Caching is a technique based on the concept of locality of reference, which in layman's terms is a very simple observation: if something is done once, it's likely to be done again. If a user visits a website, a browser will store elements of that site in a local cache because it's likely the user will visit it again.

Memoisation refers to the application of caching to function calls. Let's begin by looking at a simple example: an iterative implementation of the factorial. Here, I've used the InfInt bignum class. I'll also use the <chrono> library for measuring the execution time of a function, using a simple timeFunc function which returns the time in seconds (source code is linked at the end of this article).
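
The exact timeFunc used throughout is in the linked source code; a minimal sketch of such a helper, built on <chrono>, might look like the following (note that the arguments are taken by value in this sketch):

#include <chrono>

//Call f with the given arguments and return the elapsed wall-clock time in seconds
template <typename F, typename... Args>
double timeFunc(F f, Args... args)
{
    auto start = std::chrono::steady_clock::now();
    f(args...);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}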

InfInt factorial(InfInt x)
{
    InfInt y = 1;
    for (InfInt i = 2; i <= x; ++i)
        y *= i;

    return y;
}

For large values, this can take a non-negligible amount of time. However, locality of reference implies that if we calculate this for a given x, we'll likely need it again. To memoise this function, we simply need a map structure which stores {x, x!} pairs. When computing a factorial, we first check whether the value has already been computed and, if so, just look it up; otherwise, we compute it from scratch and make sure to record it for future use.

InfInt factorial(InfInt x)
{
    static std::map<InfInt, InfInt> mem;

    //If it's already known, just return it
    auto it = mem.find(x);
    if (it != mem.end())
        return it->second;

    //Else, calculate from scratch
    InfInt y = 1;
    for (InfInt i = 2; i <= x; ++i)
        y *= i;

    //Log for future use
    mem[x] = y;
    return y;
}

Test:
std::cout << timeFunc(factorial, 10778) << std::endl;
std::cout << timeFunc(factorial, 10778) << std::endl;
Output:
4.63005
2.6403e-05

I've used an (ordered) std::map rather than an std::unordered_map because InfInt doesn't implement a hash. The time taken to retrieve the value from the map is orders of magnitude smaller than the time taken to compute the value manually, and this gap only widens for larger values. We can take this further, however, by considering the recursive definition once again. If we already know the value of n!, we only need to multiply it by n+1 in order to find (n+1)!.

InfInt factorial(InfInt x)
{
    static std::map<InfInt, InfInt> mem;

    auto it = mem.find(x);
    if (it != mem.end())
        return it->second;

    //Recursive calculation, as earlier
    InfInt y = (x < 2) ? 1 : x * factorial(x - 1);

    mem[x] = y;
    return y;
}

Test:
std::cout << timeFunc(factorial, 5050) << std::endl;
std::cout << timeFunc(factorial, 5051) << std::endl;
Output:
0.914094
0.000388162

This exploits a rather general concept in computing and mathematical optimisation: dynamic programming. Here a top-down approach is used, which calculates 5051! by breaking it down into subproblems -- namely 5050!, and then 5050!×5051 -- reducing the overall time significantly because 5050! is already known from the first call.

This isn't necessarily restricted to subsequent calls of a function, either. In this case of the factorial of x, each n! for n < x is only retrieved once. But this concept can also apply to a single call of a function which uses subproblem results more than once each, such as the Fibonacci sequence, F(n) = F(n-1) + F(n-2).

InfInt fibRaw(InfInt x)
{
    if (x < 2) return x;
    else return fibRaw(x - 1) + fibRaw(x - 2);
}

InfInt fib(InfInt x)
{
    static std::map<InfInt, InfInt> mem;

    if (x < 2) return x;

    auto it = mem.find(x);
    if (it != mem.end())
        return it->second;

    InfInt y = fib(x - 1) + fib(x - 2);

    mem[x] = y;
    return y;
}

Test:
std::cout << timeFunc(fibRaw, 32) << std::endl;
std::cout << timeFunc(fib, 32) << std::endl;
std::cout << timeFunc(fib, 32) << std::endl;
Output:
18.7622
0.0003058
3.94e-06

The first function directly implements the recurrence relation. Even for an input as low as 32, it takes what we would reasonably consider a long time to complete -- this algorithm is of exponential complexity. Adding in memoisation (i.e. introducing dynamic programming) as in the second function decreases the time taken hugely, because each F(n) subproblem only needs to be recursively solved once, rather than each time it occurs in the full problem. Finally, as expected, calling the same calculation once again cuts the time even shorter, as no recursion is needed at all.
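
For contrast with the top-down approach used above, the same subproblem structure can also be exploited bottom-up, building from the base cases so that no recursion (and no map) is needed at all. A minimal sketch, reusing the InfInt type:

InfInt fibBottomUp(InfInt x)
{
    if (x < 2) return x;

    //Walk up from the base cases, keeping only the two most recent values
    InfInt prev = 0, curr = 1;
    for (InfInt i = 2; i <= x; ++i)
    {
        InfInt next = prev + curr;
        prev = curr;
        curr = next;
    }

    return curr;
}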

Self-organising lists

Let's step away from functional optimisation and look at the most basic type of object storage: the unsorted list. Elements are stored in a linear fashion in no particular order. For lookup, we'd generally prefer using hash maps or search trees, but if our elements are neither hashable nor sortable, we're stuck with a linear search.

Just as insertion sort outperforms divide-and-conquer approaches like quicksort at small input sizes, a linear search should not be underestimated for small arrays. Nevertheless, it is far from desirable with medium-to-large lists. A useful approach to improving its effectiveness in the long run is, as mentioned hereinbefore, to take advantage of locality of reference: if we search for an element once, we'll likely search for it again.

Let's initialise a vector of n values. The worst case for a linear search is an absence of the target element, which is of no interest to us here. The worst positive case is the element being at the very end, index n-1. First, let's try a simple idea of self-organisation: if we find an element, swap it straight to the front.

Note: To ensure the vector is passed by reference, remember to adjust the timeFunc parameter list appropriately.

template <typename T>
bool linearSearch(std::vector<T> &v, T x)
{
    bool found = false;
    size_t i = 0;
    for (; i < v.size() && !found; ++i)
        found = v[i] == x;

    //Front transpose
    --i;
    if (found && i != 0)
        std::swap(v[i], v[0]);

    return found;
}

Test:
size_t n = 30000000;
std::vector<InfInt> v(n);
for (size_t i = 0; i < n; ++i)
    v[i] = i;
--n;
std::cout << timeFunc(linearSearch<InfInt>, v, n) << std::endl;
std::cout << timeFunc(linearSearch<InfInt>, v, n) << std::endl;
std::cout << timeFunc(linearSearch<InfInt>, v, n) << std::endl;
std::cout << timeFunc(linearSearch<InfInt>, v, n) << std::endl;
Output:
2.84668
2.365e-06
2.758e-06
1.97e-06

Obviously, the time taken to find the element plummets as soon as it's found once already, because the target element being in the first position is the best-case scenario for the linear search. This is fantastic for recalling one element in the array! Unfortunately, it is far too drastic when considering more than one element: effectively, this only keeps a 'cache' of one, at the beginning of the list, over-rewarding an element searched once whilst instantly demoting the element currently in prime position. Let's take a less drastic measure and swap an element into its preceding position, instead:

//Predecessor transpose
if (found && i != 0)
    std::swap(v[i], v[i - 1]);

Output:
2.81738
2.81649
2.81442
2.82053

This method more appropriately rewards elements at their own rate, without disrupting the rewarding of others too much. However, the magnitude of the reward is clearly far too low: the time barely improves, and indeed the fourth time anomalously increases simply due to the variance involved (some test runs show no decreasing pattern at all). Instead, let's attempt a compromise wherein we move the element half-way up to the front, i.e. index i -> i/2.

//Binary transpose
if (found && i != 0)
    std::swap(v[i], v[i / 2]);

Output:
2.81354
1.42044
0.720225
0.351029

The reward remains largely independent of the rewarding of other elements, whilst the search time roughly halves with each repeated search -- far more effective than moving the element forward by a constant number of positions.

The concept of a self-organising list is simple but effective. It enjoys use and research in areas such as embedded systems and artificial intelligence, but also conceptual application in human-computer interaction: it's exceedingly common for a list of options in a UI to organise the most-commonly selected elements to the top/front by some scheme.

Sentinel values

Let's stay on the topic of linear search and take a brief look at its complexity. Quite intuitively, it is of linear time complexity, but it may be interesting to look more closely at the constant factors. In each iteration of the loop, we perform two comparisons:
  • has the element been found?
  • after increasing the index, are we still within bounds?
The linear search returns in the negative if our index exceeds the bounds of the input before the element has been found. Thus, for an input size n, we are performing roughly 2n comparisons in total. There's a way we can slightly improve linear search by exploiting a different aspect of our data: not hashing or ordering, but the existence of a null value. This can take the form of a value that cannot validly occur in our context; for instance, a null pointer, or a zero in a list of positive integers.
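
To make those two per-iteration comparisons explicit, here is the plain (non-self-organising) linear search the analysis above refers to, as a minimal sketch:

template <typename T>
bool linearSearchPlain(const std::vector<T> &v, const T &x)
{
    for (size_t i = 0; i < v.size(); ++i) //comparison 1: are we still within bounds?
        if (v[i] == x)                    //comparison 2: has the element been found?
            return true;

    return false;
}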

We call this a sentinel value; it sees use in a number of contexts beyond linear search. The most well-known example is the null-terminated string, i.e. the C-style string, which is nothing more than an array of characters. Given an arbitrary run of characters in memory, how do we know when to stop reading the string? One possibility is to always provide a size argument in tandem with the string, i.e. the array's bound, but this is clunky and turns one conceptual object, the string, into at least a pair of arguments which must be kept together at all times. Instead, strings are terminated with a '\0' character, which uniquely identifies the end of the string. (The more modern C++ std::string instead tracks its length with a size member.) Another common example of a sentinel value is the null pointer marking the end of a linked list.
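
As a small illustration of the idea (a sketch, not a standard library routine), finding the length of a C-style string is just a walk until the sentinel:

//Count characters until the '\0' sentinel; no separate length argument is needed
size_t cStringLength(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}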

Using a sentinel value in an explicit linear search is instead an optimisation approach. If our main loop only ever needs to check against a sentinel value to detect the end of an array, we are essentially merging the element-found check with the bounds check. We append the element we are searching for to the end of the list, regardless of whether it already exists somewhere in it. Our main loop then only needs to check whether the current element matches while incrementing the index; we cannot run out of bounds because the target is guaranteed to exist at least at the very end. A final standalone check then determines the result: the element was genuinely present only if the match occurred before the appended sentinel. This reduces our total comparisons to roughly n+1.

template <typename T>
bool linearSearchSentinel(const std::vector<T> &v, T x)
{
    size_t i = 0;
    while (v[i] != x)
        ++i;

    return i < v.size() - 1;
}

Test:
size_t n = 20000000;
std::vector<InfInt> v(n);
for (size_t i = 0; i < n; ++i)
    v[i] = i;
 
std::cout << timeFunc(linearSearch<InfInt>, v, n) << std::endl;
v.push_back(n); //Sentinel value is target
std::cout << timeFunc(linearSearchSentinel<InfInt>, v, n) << std::endl;
Output:
1.81005
1.52224

The test shows a small but noticeable improvement in search time. In reality, however, this is often unlikely to be a useful approach, as it requires that the list be mutable: you may not have write access in a searching context. Even if you do, it is only beneficial if the decrease in search time is more significant than the time taken to construct the new element and, in the case of a dynamic array such as the vector, potentially allocate new space. Despite this, the sentinel value in general is a useful concept and a sentinel-optimised search can be beneficial in the correct context.

Approximate computing

Finally, another extremely important concept to keep in mind takes a very different form from the specific techniques demonstrated in this article:

The importance of an exact solution/strategy can be overrated.

If an exact solution takes two years, but a sufficiently approximate solution two seconds, it's clear which is likely to be a more useful approach in practice. Perhaps we can use an approximate solution that can gradually become more accurate to some arbitrary extent we desire (hill climbing)? Perhaps a solution has only some probability to be incorrect, in an acceptable cost-benefit analysis (Monte Carlo algorithm)? Perhaps an approximate solution can help us find an exact solution much more quickly than brute-force (heuristics)? For now, these are topics for another day's post.

Source code
