Number of Valid Words for Each Puzzle | LeetCode | Bitmaksing Solution Explained in Detail

LeetCode 1178. Number of Valid Words for Each Puzzle. It can seem hard, but is really a breakdown of a couple of simple ideas. I explain the bitmasking solution in detail. Problem Link.

0. Number of Valid Words for Each Puzzle: Problem Discussion

0.0. Input/Output format

Given: words list, puzzles list

Goal: return the number of valid words a puzzle has, for each puzzle

Constraints:

  • words list (N): 105
  • puzzles list (M): 104
  • word length (W): 50
  • puzzle length (P): 7

0.1. Example

Number of Valid Words for Each Puzzle: Sample test case (for understanding)
Number of Valid Words for Each Puzzle: Sample test case (for understanding)

1. Number of Valid Words for Each Puzzle: Observations & Logic

1.0. Bruteforce Solution

Let’s do exactly as the question says.

class Solution:
    def findNumOfValidWords(self, words: List[str], puzzles: List[str]) -> List[int]:
        
        ans = []
        for puzzle in puzzles:
            count = 0
            for word in words:
                valid = 1
                if puzzle[0] not in word:
                    valid = 0
                else:
                    for c in word:
                        if c not in puzzle:
                            valid = 0
                            break
                count += valid
                
            ans.append(count)
            
        return ans

Complexity:

  • Time: O(MNWP) for all the nested loops and the if condition: if c not in puzzle.
  • Space: O(1). Storing ans is not counted, since it is required as the answer.

1.1. Key Observations

Take the example of words = ["aasdf", "dsa", "cat"] and puzzles = ["asdf", "meow"]

We note one interesting thing.

Frequency and ordering of characters in a word W does not matter – we simply care about its existence.

The puzzle word “asdf” maps to both “aasdf” and “dsa” because both these words follow both the conditions.

  • puzzle[0] = “asdf”[0] = ‘a’ is present in both the words
  • all the characters of both the words are present in puzzle.

This is where we introduce bitmasks.

Since we only care about the existence of characters, we can have 26 total positions, each encoding the information whether the ith character is present in the word or not.

Number of Valid Words for Each Puzzle: Understanding the bitmasking
Number of Valid Words for Each Puzzle: Understanding the bitmasking

Now, we can store these bit sequences as strings, or, there’s one more way.

We can directly store them as integers. Each character represents one bit position – “aasdf” with sequence as “100101…0” will then be stored as “0…101001” (reversed) in an integer format – giving us the integer value 262185.

1.2. More Efficiency!

If you notice, we still haven’t talked about the giant O(MN) nested loop – taking 109 time. We can now start to optimize that.

Since we know the mask(P) and the mask(W) for all Ws, given a mask(P), we note the second key point:

All the valid words have mask(word) a subset of mask(P), for any given P.

Recall how the mask of “asdf” puzzle word looks like, and compare it to “aasdf” and “dsa”. Intuitively speaking, a word should be the subset of the puzzle word for it to be valid.

1.3. Implementation Detail

If you realize, there may be cases of same masks for two different words. In that case, it would be a better idea to store all of this interesting information in a dictionary (key: masked word, value: frequency).

1.4. Formalizing Logic

  • convert all the words to masked words
  • iterate over all the puzzles
    • extract the first character of puzzle (its masked value)
    • iterate over all the submasks
    • while submask: submask = (submask - 1) & mask (common trick, good to remember!)
    • check if the submask is present in the dictionary O(1) average instead of iterating over all the characters.

While life is now much easier – with compressed representation and subset logic, there’s one more sneaky thing left.

2. Number of Valid Words for Each Puzzle: Implementation

2.0. Code

        def get_mask(word):
            mask = 0
            for c in word:
                mask |= 1 << (ord(c)-ord('a'))
            return mask
        
        words = [get_mask(word) for word in words]        
        d = Counter(words)
        
        ans = []
        for puzzle in puzzles:
            first = 1 << (ord(puzzle[0])-ord('a'))
            puzzle = get_mask(puzzle[1:])
            count = d[first]
            submask = puzzle
            while submask:
                count += d[submask | first]
                submask = (submask - 1) & puzzle
            ans.append(count)
        return ans

2.1. Complexity Analysis

  • Time: O(NW + M(P + 2P))
    • NW for creating all the masks of each word in words
    • M for iterating over all the puzzles
      • P for getting the mask
      • 2P for iterating over all the bitmasks
  • Space: O(N) to store all the dictionary
    • Note: We reuse words to store mask(word), simply replacing word

Avatar for Tanishq Chaudhary

Producing high-quality intuitive explanations of interview problems. Currently covering LeetCode and InterviewBit.

    Comments

    1. If we use the method suggested with 2,2,3,3,0,0 we will also get 010 as an answer, yet it’s not the majority element

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    This site uses Akismet to reduce spam. Learn how your comment data is processed.