Two Sum (two solutions)¶

Michael Snowden, 24 January 2024, last updated 16 March 2024

This simple algorithmic essay aims to solve the classic Two Sum problem from LeetCode.

Readers should have an intermediate understanding of Python to understand this essay.

1 Problem¶

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.

$-109 \leq$ nums[i] $\leq 109$
$-109 \leq$ target $\leq 109$
Only one valid answer exists.

2 Algorithms¶

With our problem defined, the next step is to think of ways to solve it. This section presents two approaches to solving Two Sum: brute force, and mapping.

2.1 Brute force¶

Generally speaking, a brute force algorithm tries all possibilities, and selects a correct one. For this problem, the possibilities are all sums that can be obtained by pairing each number in nums with every other number, and the correct pair is selected when the sum matches target.

Brute force algorithm: An outer loop iterates through each number in nums, then for each number, an inner loop iterates nums again. For each pair of numbers, if their indices are different and their sum matches target, return their indices.

2.2 Mapping¶

In the Brute force algorithm, we checked each pair of numbers in nums to see if the resulting sum was equal to target. Since we are already checking every number in the list, why not store some piece information from each number that will help us find our matching pair?

For every number in nums, we can map the difference between it and the target (target - number) to its corresponding index using a hashtable. This allows us to check the hashtable for matching numbers much faster.

Mapping algorithm: For each number in nums, if it's in the hashmap, return its index and the index mapped to it. Otherwise, calculate the difference (target - number) and map it to the corresponding index of number.

3 Code¶

In this section we will implement and test the algorithms.

In [1]:

Copied!

import os

if 'COLAB_GPU' in os.environ:  # if we are in Google Colab...
    !pip install algoesup --no-deps
    !pip install allowed ruff
import os

if 'COLAB_GPU' in os.environ:  # if we are in Google Colab...
    !pip install algoesup --no-deps
    !pip install allowed ruff

In [2]:

Copied!

from algoesup import test, time_functions
from algoesup import test, time_functions

In [3]:

Copied!

%load_ext algoesup.magics
%ruff on
%allowed on
%load_ext algoesup.magics
%ruff on
%allowed on

ruff was activated
allowed was activated

3.1 Testing¶

We start off by writing some tests.

To test the above solutions, we need to consider edge cases and other important functional tests. We should include tests for the minimum input size, and any extremes values that can be present. When integers are part of the input, and there are no restrictions, negative numbers and zero should be added to the tests.

In [4]:

Copied!





two_sum_tests = [
  # ["description", nums, target, expected_output],
    ["minimum size for nums", [1, 2], 3, (0, 1)],
    ["non-adjacent indices", [1, 4, 9, 7], 8, (0, 3)],
    ["first two elements", [5, 7, 1, 2, 8], 12, (0, 1)],
    ["last two elements", [1, 3, 5, 7, 8], 15, (3, 4)],
    ["repeated elements", [6, 2, 3, 2], 4, (1, 3)],
    ["max and min range", [-109, 109, 0], 0, (0, 1)],
    ["lowest target value", [-50, 1, -59], -109, (0, 2)],
    ["highest target value", [50, 1, 59], 109, (0, 2)],
]
two_sum_tests = [
  # ["description", nums, target, expected_output],
    ["minimum size for nums", [1, 2], 3, (0, 1)],
    ["non-adjacent indices", [1, 4, 9, 7], 8, (0, 3)],
    ["first two elements", [5, 7, 1, 2, 8], 12, (0, 1)],
    ["last two elements", [1, 3, 5, 7, 8], 15, (3, 4)],
    ["repeated elements", [6, 2, 3, 2], 4, (1, 3)],
    ["max and min range", [-109, 109, 0], 0, (0, 1)],
    ["lowest target value", [-50, 1, -59], -109, (0, 2)],
    ["highest target value", [50, 1, 59], 109, (0, 2)],
]

3.2 Implementations¶

The next cell implements the Brute force algorithm using nested for loops

In [5]:

Copied!





def two_sum_bf(nums, target):
    """Given a list of integers return the indices of the pair that sums to target."""
    for index_1 in range(len(nums)):
        for index_2 in range(len(nums)):
            if index_1 != index_2 and nums[index_1] + nums[index_2] == target:
                return index_1, index_2

test(two_sum_bf, two_sum_tests)
def two_sum_bf(nums, target):
    """Given a list of integers return the indices of the pair that sums to target."""
    for index_1 in range(len(nums)):
        for index_2 in range(len(nums)):
            if index_1 != index_2 and nums[index_1] + nums[index_2] == target:
                return index_1, index_2

test(two_sum_bf, two_sum_tests)

Testing two_sum_bf...
Tests finished: 8 passed, 0 failed.

Next up is the Mapping algorithm implemented using Python's dict.

In [6]:

Copied!





def two_sum_map(nums, target):
    """Given a list of integers return the indices of the pair that sums to target."""
    differences = {}
    for index in range(len(nums)):
        difference = target - nums[index]
        if nums[index] in differences:
            return differences[nums[index]], index
        differences[difference] = index

test(two_sum_map, two_sum_tests)
def two_sum_map(nums, target):
    """Given a list of integers return the indices of the pair that sums to target."""
    differences = {}
    for index in range(len(nums)):
        difference = target - nums[index]
        if nums[index] in differences:
            return differences[nums[index]], index
        differences[difference] = index

test(two_sum_map, two_sum_tests)

Testing two_sum_map...
Tests finished: 8 passed, 0 failed.

4 Performance¶

In this section we measure the run-times of our solutions under certain conditions and discuss the results.

4.1 Generating inputs¶

Since time_functions from the algoesup library requires code to generate inputs, we shall write that first.

It is often useful to measure the run-times of a solution when it is doing the most work; this is called the worst-case. We want to generate inputs that will take our solution the most time to complete, and this happens when the two numbers that sum to target are the last two.

In [7]:

Copied!





def worst(size):
    """Given a size, generate a worst-case problem instance for two sum."""
    nums = [0] * (size - 2) + [1, 1]
    target = 2
    return (nums, target)
def worst(size):
    """Given a size, generate a worst-case problem instance for two sum."""
    nums = [0] * (size - 2) + [1, 1]
    target = 2
    return (nums, target)

4.2 Run-times for each solution¶

We now compare worst-case runtimes for both solutions. The input has an initial size of 100, and is doubled 4 times; the run times are measured for the initial size, then each time it is doubled.

In [8]:

Copied!

solutions = [two_sum_bf, two_sum_map]
time_functions(solutions, worst, start=100, double=4, chart=True)
solutions = [two_sum_bf, two_sum_map]
time_functions(solutions, worst, start=100, double=4, chart=True)

Inputs generated by worst

Input size      two_sum_bf     two_sum_map 
       100           454.4             5.7 µs
       200          1906.9            11.1 µs
       400          7516.0            22.5 µs
       800         30797.2            45.9 µs
      1600        125003.9            93.3 µs

No description has been provided for this image

The numbers printed before the chart represent the run-times of our solutions in microseconds (µs) for increasing input sizes.

On the chart, the data points for two_sum_bf almost instantly eclipse that of two_sum_map. It looks as if the run-times for two_sum_map are not growing at all, but we know by looking at numbers above that this is not the case.

Let us see if we can modify the inputs of time_functions for a better visual representation.

In [9]:

Copied!

solutions = [two_sum_bf, two_sum_map]
time_functions(solutions, worst, start=1, double=4, text=False, chart=True)
solutions = [two_sum_bf, two_sum_map]
time_functions(solutions, worst, start=1, double=4, text=False, chart=True)

We changed the initial input size to 1, and the trend of the run-times is a little clearer now. The Brute force algorithm's run-time's still accelerate off into the stratosphere, but we can see the separation and trend of the Mapping algorithm a little better.

5 Conclusion¶

We started this essay with the definition of the Two Sum problem. Next, we outlined two algorithms: brute force, and mapping . After that, we implemented and tested our solutions using Python, and in the penultimate section we used empirical testing and discussed the results. Now we must decide which of our algorithms is best.

The Brute force algorithm is not very efficient when it comes to run-times. When the size of the input increases the run-times increase by a large amount. The one redeeming aspect of this algorithm is the efficient memory usage which is achieved by not using any additional data structures like dictionaries.

In contrast, the Mapping algorithm is reasonably efficient in terms of run times, but this is achieved by using extra memory in the form of the dictionary. In the final analysis, the slow run-times of the brute force algorithm cannot be ignored. The small trade of memory for faster run-times is worth it in this instance. We therefore conclude the mapping algorithm is best.