Sum of pairwise Hamming Distance | InterviewBit | Solution Explained in Detail

InterviewBit: Sum of pairwise Hamming Distance is an interesting interview question. I explain the bit manipulation approach in detail, along with visuals to help understand things better. IB Problem Link. LC Problem Link.

0. Sum of pairwise Hamming Distance: Problem Discussion

0.0. IO Format

  • input:
    • a list of integers, A
  • output:
    • a single integer value, which is the sum of pairwise hamming distances
  • constraints:
    • len(A) is 2 * 105 on IB and 104 on LC
    • A[i] can be 0 to 109

Hamming distance between two non-negative integers is the amount of bits they differ in – for example hamming distance between 2 and 4 is 2.

 Sum of pairwise Hamming Distance: hamming distance example
Sum of pairwise Hamming Distance: hamming distance example

Similarly, hamming distance between 6 and 4 is given as:

 Sum of pairwise Hamming Distance: hamming distance example
Sum of pairwise Hamming Distance: hamming distance example 2

0.1. Examples

To understand the problem better, let’s look at an example.

For [2,4,6], we have 9 different possible pairs. We compute hamming distance for all of them. The final expression comes out to be (0 + 2 + 1) + (2 + 0 + 1) + (1 + 1 + 0) which is equal to 8.

 Sum of pairwise Hamming Distance: example explanation
Sum of pairwise Hamming Distance: example explanation

1. Sum of pairwise Hamming Distance: Naïve Approach

1.0. Intuitions

One way to approach this problem is to implement exactly as what the question says – we can go over all the N2 possibilities and then find hd(x, y) for each pair of (x, y).

Finding hd(x, y) is the easy part – since we can take the XOR of x and y and find the number of bits set. XOR because we want to find the number of disagreements between x and y.

If you are confused about this part – and it feels non-obvious, I’d recommend doing some more bit related questions.

So,

  1. iterate over all pairs of possibilities O(N2)
  2. find the hd for each pair O(1)

2. Sum of pairwise Hamming Distance: Naïve Implementation

2.0. Code

int hd(int x, int y) {
    int count = 0;
    for (int i = 0; i < 32; i++) {
        if ((1 << i) & (x^y)) {
            count++;
        }
    }
    return count;
}

int Solution::hammingDistance(const vector<int> &A) {
    long long ans = 0;
    long long MOD = 1000000007;

    for (int x: A) {
        for (int y: A) {
            ans = (ans + hd(x, y)) % MOD;
        }
    }

    return ans;
}
class Solution:
    # @param A : tuple of integers
    # @return an integer
    def hammingDistance(self, A):

        def get_dist(x, y):
            count = 0
            for i in range(31):
                if (1 << i) & (x^y): count += 1
            return count
        
        ans = 0
        MOD = 1000000007
        for i in range(len(A)):
            for j in range(i+1, len(A)):
                ans += get_dist(A[i], A[j]) % MOD
        return (ans*2)%MOD

2.1. Complexity Analysis

  • Time: O(N2), where N is the length of the array – we iterate over all pairs of elements from A.
  • Space: O(1) since we only ever store some variables.

3. Sum of pairwise Hamming Distance: Observations & Logic

3.0. Need for Optimization

Maybe its your interviewer testing you out for an efficient approach, or its the red TLE on an OJ. How in the world can we optimize this?

3.1. A Change in Perspective

When the question does not budge, change the way you look at it. In our case, its a simple change in the way we compute hamming distance.

Till now, we defined hamming distance with the function hd(x, y), taking in two inputs, iterating over the XOR of the bits and counting the number of bits set – basically, we counted the number of times x and y disagreed.

review of how we computed hamming distance for two integers
review of how we computed hamming distance for two integers

But what if … instead of looking at numbers –> bits, we look at bits themselves?

3.2. Bitwise Computations

Let’s say we have the numbers: [1, 1, 0, 1, 0]. To find out hd, we find all the disagreements between these numbers. The number 1 only disagrees with the number 0 – twice. Similarly, the number 0 only disagrees with the number one – thrice.

  • 1 disagreements = 2
  • 0 disagreements = 3

Now, what is the “Sum of pairwise Hamming Distance”? there are 3 ones and each disagrees by 2 amounts; then there are 2 zeros and each disagree with 3. So, we have (3 * 2 + 2 * 3) = 12. This is just:

number_of_ones * number_of_zeros * 2

Feel free to check that its the correct answer!

efficient logic for finding out Sum of pairwise Hamming Distance
efficient logic for finding out Sum of pairwise Hamming Distance

Simple, right? This is the entire solution.

Wait what?

While we only found this for 5 simple integers, if we write down the binary representations of all the numbers, and then use the same logic as above, we can find the hd for all pairs of integers in O(31 * N) time – 31 for computing this for each bit (in an integer value range).

 Sum of pairwise Hamming Distance: Optimized logic dry run: iteration 1
Sum of pairwise Hamming Distance: Optimized logic dry run: iteration 1
 Sum of pairwise Hamming Distance: Optimized logic dry run: iteration 2
Sum of pairwise Hamming Distance: Optimized logic dry run: iteration 2
 Sum of pairwise Hamming Distance: Optimized logic dry run: iteration 3
Sum of pairwise Hamming Distance: Optimized logic dry run: iteration 3

4. Sum of pairwise Hamming Distance: Optimized Implementation

4.0. Code

int Solution::hammingDistance(const vector<int> &A) {
    long long ans = 0;
    long long MOD = 1000000007;

    for (int i = 0; i < 32; i++) {
        long long int number_of_ones = 0;
        long long int number_of_zeros = 0;

        for (int x: A) {
            if ((1 << i) & x) number_of_ones++;
            else number_of_zeros++;
        }

        ans = (ans + (number_of_ones * number_of_zeros * 2) % MOD) % MOD;
    }

    return ans;
}
class Solution:
# @param A : tuple of integers
# @return an integer
def hammingDistance(self, A):
        n = len(A)
        total = 0
        MOD = 1000000007
        for i in range(31):
            one_count = 0
            for x in A:
                if (1 << i) & x: one_count += 1
            total += 2*one_count*(n - one_count) % MOD
        return total % MOD

4.1. Complexity Analysis

  • Time: O(N), to iterate over the N items in the A array given. It is actually O(31*N) but that’s the same as O(N).
  • Space: O(1), since we only ever store a couple of variables.

Avatar for Tanishq Chaudhary

Producing high-quality intuitive explanations of interview problems. Currently covering LeetCode and InterviewBit.

    Comments

    1. Please write the proof.

    2. This solution need an amendment for 2 digit number.

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    This site uses Akismet to reduce spam. Learn how your comment data is processed.