# Sum of pairwise Hamming Distance | InterviewBit | Solution Explained in Detail

InterviewBit: Sum of pairwise Hamming Distance is an interesting interview question. I explain the bit manipulation approach in detail, along with visuals to help understand things better. IB Problem Link. LC Problem Link.

## 0. Sum of pairwise Hamming Distance: Problem Discussion

### 0.0. IO Format

- input:
- a list of integers, A

- output:
- a single integer value, which is the sum of pairwise hamming distances

- constraints:
- len(A) is 2 * 10
^{5}on IB and 10^{4}on LC - A[i] can be 0 to 10
^{9}

- len(A) is 2 * 10

Hamming distance between two non-negative integers is the amount of bits they differ in – for example hamming distance between 2 and 4 is 2.

Similarly, hamming distance between 6 and 4 is given as:

### 0.1. Examples

To understand the problem better, let’s look at an example.

For [2,4,6], we have 9 different possible pairs. We compute hamming distance for all of them. The final expression comes out to be `(0 + 2 + 1) + (2 + 0 + 1) + (1 + 1 + 0)`

which is equal to 8.

## 1. Sum of pairwise Hamming Distance: Naïve Approach

### 1.0. Intuitions

One way to approach this problem is to implement exactly as what the question says – we can go over all the N^{2} possibilities and then find hd(x, y) for each pair of (x, y).

Finding hd(x, y) is the easy part – since we can take the XOR of x and y and find the number of bits set. XOR because we want to find the number of **disagreements** between x and y.

If you are confused about this part – and it feels non-obvious, I’d recommend doing some more bit related questions.

So,

- iterate over all pairs of possibilities
`O(N`

^{2}) - find the hd for each pair
`O(1)`

## 2. Sum of pairwise Hamming Distance: Naïve Implementation

### 2.0. Code

int hd(int x, int y) { int count = 0; for (int i = 0; i < 32; i++) { if ((1 << i) & (x^y)) { count++; } } return count; } int Solution::hammingDistance(const vector<int> &A) { long long ans = 0; long long MOD = 1000000007; for (int x: A) { for (int y: A) { ans = (ans + hd(x, y)) % MOD; } } return ans; }

class Solution: # @param A : tuple of integers # @return an integer def hammingDistance(self, A): def get_dist(x, y): count = 0 for i in range(31): if (1 << i) & (x^y): count += 1 return count ans = 0 MOD = 1000000007 for i in range(len(A)): for j in range(i+1, len(A)): ans += get_dist(A[i], A[j]) % MOD return (ans*2)%MOD

### 2.1. Complexity Analysis

- Time:
`O(N`

, where N is the length of the array – we iterate over all pairs of elements from A.^{2}) - Space:
`O(1)`

since we only ever store some variables.

## 3. Sum of pairwise Hamming Distance: Observations & Logic

### 3.0. Need for Optimization

Maybe its your interviewer testing you out for an efficient approach, or its the red TLE on an OJ. How in the world can we optimize this?

### 3.1. A Change in Perspective

When the question does not budge, change the way you look at it. In our case, its a simple change in the way we compute hamming distance.

Till now, we defined hamming distance with the function `hd(x, y)`

, taking in two inputs, iterating over the XOR of the bits and counting the number of bits set – basically, we counted the number of times x and y disagreed.

But what if … instead of looking at numbers –> bits, we look at bits themselves?

### 3.2. Bitwise Computations

Let’s say we have the numbers: [1, 1, 0, 1, 0]. To find out hd, we find all the disagreements between these numbers. The number 1 only disagrees with the number 0 – twice. Similarly, the number 0 only disagrees with the number one – thrice.

`1`

disagreements = 2`0`

disagreements = 3

Now, what is the “Sum of pairwise Hamming Distance”? there are 3 ones and each disagrees by 2 amounts; then there are 2 zeros and each disagree with 3. So, we have `(3 * 2 + 2 * 3) = 12`

. This is just:

**number_of_ones * number_of_zeros * 2**

Feel free to check that its the correct answer!

Simple, right? This is the entire solution.

Wait what?

While we only found this for 5 simple integers, if we write down the binary representations of all the numbers, and then use the same logic as above, we can find the hd for all pairs of integers in O(31 * N) time – 31 for computing this for each bit (in an integer value range).

## 4. Sum of pairwise Hamming Distance: Optimized Implementation

### 4.0. Code

int Solution::hammingDistance(const vector<int> &A) { long long ans = 0; long long MOD = 1000000007; for (int i = 0; i < 32; i++) { long long int number_of_ones = 0; long long int number_of_zeros = 0; for (int x: A) { if ((1 << i) & x) number_of_ones++; else number_of_zeros++; } ans = (ans + (number_of_ones * number_of_zeros * 2) % MOD) % MOD; } return ans; }

class Solution: # @param A : tuple of integers # @return an integer def hammingDistance(self, A): n = len(A) total = 0 MOD = 1000000007 for i in range(31): one_count = 0 for x in A: if (1 << i) & x: one_count += 1 total += 2*one_count*(n - one_count) % MOD return total % MOD

### 4.1. Complexity Analysis

- Time:
`O(N)`

, to iterate over the N items in the A array given. It is actually O(31*N) but that’s the same as O(N). - Space:
`O(1)`

, since we only ever store a couple of variables.

### Recommended Posts

##### Water Flow | InterviewBit | Solution Explained

May 11, 2022