Number of Valid Words for Each Puzzle | LeetCode | Bitmaksing Solution Explained in Detail
LeetCode 1178. Number of Valid Words for Each Puzzle. It can seem hard, but is really a breakdown of a couple of simple ideas. I explain the bitmasking solution in detail. Problem Link.
0. Number of Valid Words for Each Puzzle: Problem Discussion
0.0. Input/Output format
Given: words list, puzzles list
Goal: return the number of valid words a puzzle has, for each puzzle
- words list (N): 105
- puzzles list (M): 104
- word length (W): 50
- puzzle length (P): 7
1. Number of Valid Words for Each Puzzle: Observations & Logic
1.0. Bruteforce Solution
Let’s do exactly as the question says.
class Solution: def findNumOfValidWords(self, words: List[str], puzzles: List[str]) -> List[int]: ans =  for puzzle in puzzles: count = 0 for word in words: valid = 1 if puzzle not in word: valid = 0 else: for c in word: if c not in puzzle: valid = 0 break count += valid ans.append(count) return ans
- Time: O(MNWP) for all the nested loops and the if condition:
if c not in puzzle.
- Space: O(1). Storing
ansis not counted, since it is required as the answer.
1.1. Key Observations
Take the example of
words = ["aasdf", "dsa", "cat"] and
puzzles = ["asdf", "meow"]
We note one interesting thing.
Frequency and ordering of characters in a word W does not matter – we simply care about its existence.
The puzzle word “asdf” maps to both “aasdf” and “dsa” because both these words follow both the conditions.
- puzzle = “asdf” = ‘a’ is present in both the words
- all the characters of both the words are present in puzzle.
This is where we introduce bitmasks.
Since we only care about the existence of characters, we can have 26 total positions, each encoding the information whether the
ith character is present in the word or not.
Now, we can store these bit sequences as strings, or, there’s one more way.
We can directly store them as integers. Each character represents one bit position – “aasdf” with sequence as “100101…0” will then be stored as “0…101001” (reversed) in an integer format – giving us the integer value 262185.
1.2. More Efficiency!
If you notice, we still haven’t talked about the giant O(MN) nested loop – taking 109 time. We can now start to optimize that.
Since we know the mask(P) and the mask(W) for all Ws, given a mask(P), we note the second key point:
All the valid words have mask(word) a subset of mask(P), for any given P.
Recall how the mask of “asdf” puzzle word looks like, and compare it to “aasdf” and “dsa”. Intuitively speaking, a word should be the subset of the puzzle word for it to be valid.
1.3. Implementation Detail
If you realize, there may be cases of same masks for two different words. In that case, it would be a better idea to store all of this interesting information in a dictionary (key: masked word, value: frequency).
1.4. Formalizing Logic
- convert all the words to masked words
- iterate over all the puzzles
- extract the first character of puzzle (its masked value)
- iterate over all the submasks
while submask: submask = (submask - 1) & mask(common trick, good to remember!)
- check if the submask is present in the dictionary O(1) average instead of iterating over all the characters.
While life is now much easier – with compressed representation and subset logic, there’s one more sneaky thing left.
2. Number of Valid Words for Each Puzzle: Implementation
def get_mask(word): mask = 0 for c in word: mask |= 1 << (ord(c)-ord('a')) return mask words = [get_mask(word) for word in words] d = Counter(words) ans =  for puzzle in puzzles: first = 1 << (ord(puzzle)-ord('a')) puzzle = get_mask(puzzle[1:]) count = d[first] submask = puzzle while submask: count += d[submask | first] submask = (submask - 1) & puzzle ans.append(count) return ans
2.1. Complexity Analysis
- Time: O(NW + M(P + 2P))
- NW for creating all the masks of each word in words
- M for iterating over all the puzzles
- P for getting the mask
- 2P for iterating over all the bitmasks
- Space: O(N) to store all the dictionary
- Note: We reuse
mask(word), simply replacing
- Note: We reuse
January 24, 2022
January 20, 2022
January 15, 2022