Sum of pairwise Hamming Distance | InterviewBit | Solution Explained in Detail

January 11, 2022

InterviewBit Math Solutions, InterviewBit Solutions

Tanishq Chaudhary

InterviewBit: Sum of pairwise Hamming Distance is an interesting interview question. I explain the bit manipulation approach in detail, along with visuals to help understand things better. IB Problem Link. LC Problem Link.

Contents hide

0. Sum of pairwise Hamming Distance: Problem Discussion

0.0. IO Format

0.1. Examples

1. Sum of pairwise Hamming Distance: Naïve Approach

1.0. Intuitions

2. Sum of pairwise Hamming Distance: Naïve Implementation

2.0. Code

2.1. Complexity Analysis

3. Sum of pairwise Hamming Distance: Observations & Logic

3.0. Need for Optimization

3.1. A Change in Perspective

3.2. Bitwise Computations

4. Sum of pairwise Hamming Distance: Optimized Implementation

4.0. Code

4.1. Complexity Analysis

0. Sum of pairwise Hamming Distance: Problem Discussion

0.0. IO Format

input:
- a list of integers, A
output:
- a single integer value, which is the sum of pairwise hamming distances
constraints:
- len(A) is 2 * 10⁵ on IB and 10⁴ on LC
- A[i] can be 0 to 10⁹

Hamming distance between two non-negative integers is the amount of bits they differ in – for example hamming distance between 2 and 4 is 2.

Sum of pairwise Hamming Distance: hamming distance example

Similarly, hamming distance between 6 and 4 is given as:

0.1. Examples

To understand the problem better, let’s look at an example.

For [2,4,6], we have 9 different possible pairs. We compute hamming distance for all of them. The final expression comes out to be (0 + 2 + 1) + (2 + 0 + 1) + (1 + 1 + 0) which is equal to 8.

Sum of pairwise Hamming Distance: example explanation

1. Sum of pairwise Hamming Distance: Naïve Approach

1.0. Intuitions

One way to approach this problem is to implement exactly as what the question says – we can go over all the N² possibilities and then find hd(x, y) for each pair of (x, y).

Finding hd(x, y) is the easy part – since we can take the XOR of x and y and find the number of bits set. XOR because we want to find the number of disagreements between x and y.

If you are confused about this part – and it feels non-obvious, I’d recommend doing some more bit related questions.

So,

iterate over all pairs of possibilities O(N²)
find the hd for each pair O(1)

2. Sum of pairwise Hamming Distance: Naïve Implementation

2.0. Code

int hd(int x, int y) {
    int count = 0;
    for (int i = 0; i < 32; i++) {
        if ((1 << i) & (x^y)) {
            count++;
        }
    }
    return count;
}

int Solution::hammingDistance(const vector<int> &A) {
    long long ans = 0;
    long long MOD = 1000000007;

    for (int x: A) {
        for (int y: A) {
            ans = (ans + hd(x, y)) % MOD;
        }
    }

    return ans;
}

class Solution:
    # @param A : tuple of integers
    # @return an integer
    def hammingDistance(self, A):

        def get_dist(x, y):
            count = 0
            for i in range(31):
                if (1 << i) & (x^y): count += 1
            return count
        
        ans = 0
        MOD = 1000000007
        for i in range(len(A)):
            for j in range(i+1, len(A)):
                ans += get_dist(A[i], A[j]) % MOD
        return (ans*2)%MOD

2.1. Complexity Analysis

Time: O(N²), where N is the length of the array – we iterate over all pairs of elements from A.
Space: O(1) since we only ever store some variables.

3. Sum of pairwise Hamming Distance: Observations & Logic

3.0. Need for Optimization

Maybe its your interviewer testing you out for an efficient approach, or its the red TLE on an OJ. How in the world can we optimize this?

3.1. A Change in Perspective

When the question does not budge, change the way you look at it. In our case, its a simple change in the way we compute hamming distance.

Till now, we defined hamming distance with the function hd(x, y), taking in two inputs, iterating over the XOR of the bits and counting the number of bits set – basically, we counted the number of times x and y disagreed.

review of how we computed hamming distance for two integers

But what if … instead of looking at numbers –> bits, we look at bits themselves?

3.2. Bitwise Computations

Let’s say we have the numbers: [1, 1, 0, 1, 0]. To find out hd, we find all the disagreements between these numbers. The number 1 only disagrees with the number 0 – twice. Similarly, the number 0 only disagrees with the number one – thrice.

1 disagreements = 2
0 disagreements = 3

Now, what is the “Sum of pairwise Hamming Distance”? there are 3 ones and each disagrees by 2 amounts; then there are 2 zeros and each disagree with 3. So, we have (3 * 2 + 2 * 3) = 12. This is just:

number_of_ones * number_of_zeros * 2

Feel free to check that its the correct answer!

efficient logic for finding out Sum of pairwise Hamming Distance

Simple, right? This is the entire solution.

Wait what?

While we only found this for 5 simple integers, if we write down the binary representations of all the numbers, and then use the same logic as above, we can find the hd for all pairs of integers in O(31 * N) time – 31 for computing this for each bit (in an integer value range).

Sum of pairwise Hamming Distance: Optimized logic dry run: iteration 1

Sum of pairwise Hamming Distance: Optimized logic dry run: iteration 2

Sum of pairwise Hamming Distance: Optimized logic dry run: iteration 3

4. Sum of pairwise Hamming Distance: Optimized Implementation

4.0. Code

int Solution::hammingDistance(const vector<int> &A) {
    long long ans = 0;
    long long MOD = 1000000007;

    for (int i = 0; i < 32; i++) {
        long long int number_of_ones = 0;
        long long int number_of_zeros = 0;

        for (int x: A) {
            if ((1 << i) & x) number_of_ones++;
            else number_of_zeros++;
        }

        ans = (ans + (number_of_ones * number_of_zeros * 2) % MOD) % MOD;
    }

    return ans;
}

class Solution:
# @param A : tuple of integers
# @return an integer
def hammingDistance(self, A):
        n = len(A)
        total = 0
        MOD = 1000000007
        for i in range(31):
            one_count = 0
            for x in A:
                if (1 << i) & x: one_count += 1
            total += 2*one_count*(n - one_count) % MOD
        return total % MOD

4.1. Complexity Analysis

Time: O(N), to iterate over the N items in the A array given. It is actually O(31*N) but that’s the same as O(N).
Space: O(1), since we only ever store a couple of variables.

math

PRV POST

NXT POST

Tanishq Chaudhary

Producing high-quality intuitive explanations of interview problems. Currently covering LeetCode and InterviewBit.

Sum of pairwise Hamming Distance | InterviewBit | Solution Explained in Detail

January 11, 2022

InterviewBit Math Solutions, InterviewBit Solutions

0. Sum of pairwise Hamming Distance: Problem Discussion

0.0. IO Format

0.1. Examples

1. Sum of pairwise Hamming Distance: Naïve Approach

1.0. Intuitions

2. Sum of pairwise Hamming Distance: Naïve Implementation

2.0. Code

2.1. Complexity Analysis

3. Sum of pairwise Hamming Distance: Observations & Logic

3.0. Need for Optimization

3.1. A Change in Perspective

3.2. Bitwise Computations

4. Sum of pairwise Hamming Distance: Optimized Implementation

4.0. Code

4.1. Complexity Analysis

Tanishq Chaudhary

Recommended Posts

Gas Station | InterviewBit | Solution Explained

Majority Element | InterviewBit | Solution Explained

Assign Mice to Holes | InterviewBit | Solution Explained

Comments

Leave a Reply Cancel reply