LeetCode 2539: Count the Number of Good Subsequences

Brute force

Checking all substrings will take time $\mathcal{O}(n \cdot 2^n)$ .

Observations

We can use divide-and-conquer in two ways to simplify the counting problem:

Each char in a good subsequence appears the same number of times. If in the input str, the char $c$ has the max frequency $n_{max}$ , then we can partition the good subsequences in $n_{max}$ groups: having 1 of each char, 2 of each char, …, $n_{max}$ of each char. We can then count the good subsequences in each of these groups separately. Sum of these will be the answer.
There are total $2^n$ subsequences of a $n$ -length str. Say, there are m types of char. So, $n = \sum_{i=1}^{m} n_i$ . We note that $2^n = 2^{n_1} \cdot 2^{n_2} \cdots 2^{n_m}$ . Therefore, we can count the possible subsequences for each type of char separately and then multiply them together. The interleaving of these different char’s in the input does not affect the total count. We can use this idea to count $k$ -length good subsequences having a single char type. Then, the product of the counts for each distinct char in the alphabet will be the total count of good subsequences having $k$ instances of each char type for the input str. For example, if there are three $a$ ‘s and four $b$ ‘s, the count of 2-length good subsequences having $a$ only is: ${{3}\choose{2}} = 3$ and the count of 2-length good subsequences having $b$ only is ${{4}\choose{2}} = 6$ . Now, we can multiply these two counts to have the total count of good subsequences having two $a \rq$ s and two $b \rq$ s.

Count of good subsequences with $k$ instances of each char-type is thus product $\prod_{c \in \texttt{alphabet}} \left({{n_c}\choose{k}} + 1\right)$ . We add $1$ to the count for each char type to accommodate leaving that char-type altogether from the good subsequence. For example, with three $a$ ‘s, three $b$ ‘s, and three $x$ ‘s, a good subsequence having 2 of each char type is $aaxx$ where we have left $b$ altogether.

For each $k$ , in the end, we subtract 1, to reduce the count contributed by the empty subsequence, which by definition is not a good subsequence.

Efficiency of computing ${n}\choose{k}$ determines the complexity.

Dynamic Programming

Subproblem

${{n}\choose{k}} = \begin{cases} 1, & \text{if } k = 0 \\ 0, & \text{if } n = 0, k > 0 \\ {{n-1}\choose{k-1}} + {{n-1}\choose{k}} & \text{otherwise} \end{cases}$

Order

Subproblems are on a grid. Edges flow from subproblems on an earlier row and on an earlier (or current) column. So, we process in increasing order of row ( $n$ ) and on each row in increasing order of column ( $k$ ).

Time: $\mathcal{O}(n^2)$ , space: $\mathcal{O}(n^2)$ .

class Solution:
    def countGoodSubsequences(self, s: str) -> int:
        mod = pow(10, 9)+7
        alphabet = [ chr( ord('a')+offset ) for offset in range(26) ]
        n = defaultdict(int)
        for c in s:
            n[c] += 1
        n_max = max( n.values() )

        C = [
            [0] * (n_max+1) for _ in range(n_max+1)
        ]
        for i in range(n_max+1):
            C[i][0] = 1
            
        for j in range(1, n_max+1):
            for i in range(1, n_max+1):
                C[i][j] = C[i-1][j-1] + C[i-1][j]

        total_count = 0
        for k in range(1, n_max+1):
            count = 1
            for c in alphabet:
                if n[c] < k:
                    continue
                count *= (C[n[c]][k] + 1)
            count -= 1
            total_count = (total_count + count) % mod

        return total_count

Since a subproblem only depends on previous and current columns, we can keep track of 2 columns worth of ${n}\choose{k}$ and embed it into the counting loop.

Time: $\mathcal{O}(n^2)$ , space: $\mathcal{O}(n)$ .

class Solution:
    def countGoodSubsequences(self, s: str) -> int:
        mod = pow(10, 9)+7
        alphabet = [ chr( ord('a')+offset ) for offset in range(26) ]
        n = defaultdict(int)
        for c in s:
            n[c] += 1
        n_max = max( n.values() )

        C = [
            [0, 0] for _ in range(n_max+1)
        ]
        for i in range(n_max+1):
            C[i][0] = 1
            
        prev, curr = 0, 1
        total_count = 0
        for k in range(1, n_max+1):
            C[0][curr] = 0
            for i in range(1, n_max+1):
                C[i][curr] = C[i-1][prev] + C[i-1][curr]
            count = 1
            for c in alphabet:
                if n[c] < k:
                    continue
                count *= (C[n[c]][curr] + 1)
            count -= 1
            total_count = (total_count + count) % mod
            prev, curr = curr, prev

        return total_count

From factorials

We can use the direct definition of n-choose-k: ${{n}\choose{k}} = \frac{n!}{(k!)\ (n-k)!} = n! \cdot (k!)^{-1} \cdot ((n-k)!)^{-1}$ . We just need efficient ways to compute $n!$ and ${(n!)}^{-1}$ . Fortunately, the problem asks us to give the answer modulo $10^9 + 7$ which happens to be a prime number. We use Dynamic Programming:

Subproblem

$n! = \begin{cases} 1, & \text{if } n = 0 \\ n \cdot (n-1)! & \text{otherwise} \end{cases}$

Since our mod, $10^9 + 7$ , is prime, we can utilize Fermat’s little theorem to efficiently compute $(n!)^{-1}$ in the following way:

Fermat’s Little Theorem: If $p$ and $a$ are two numbers such that (1) $p$ is a prime number (2) $p \nmid a$ , then: $a^{p-1} \equiv 1 (\texttt{mod} \ p)$ . From this, we can derive: $a^{-1} \equiv a^{p-2} (\texttt{mod} \ p)$ . In turn this helps compute: $(n!)^{-1} \equiv (n!)^{10^9+7} (\texttt{mod} \ 10^9+7)$ . It just happens that the input str size is limited to $10^4$ making it suitable for Fermat’s since $10^9+7 \nmid x!$ for $x \le 10^4$ .

Let $Inv(x)$ be $(x!)^{-1}$ . Then,

$Inv(x) = \begin{cases} Inv(x+1) \cdot (x+1), & \text{if } x < n \\ (n!)^{\left(10^9+7 \right) \ - \ 2} (\texttt{mod} \ 10^9+7) & \text{otherwise} \end{cases}$ .

We precompute factorials and $Inv$ which takes $\mathcal{O}(n)$ time. With these, computing ${n}\choose{k}$ is now $\mathcal{O}(1)$ time.

Time: $\mathcal{O}(n)$ , space: $\mathcal{O}(n)$ .

class Solution:
    def compute_modular_factorials(self, n: int, mod: int) -> List[int]:
        fact = [0] * (n+1)
        fact[0] = 1
        for i in range(1, n+1):
            fact[i] = (i * fact[i-1]) % mod
        return fact

    def compute_modular_exp(self, x: int, n: int, mod: int) -> int:
        exp = 1
        while n > 0:
            half_n, is_odd = divmod(n, 2)
            if is_odd:
                exp = (exp * x) % mod
            x = (x * x) % mod
            n = half_n
        return exp

    def compute_modular_inv(self, fact: List[int], mod: int) -> List[int]:
        n = len(fact)-1
        inv = [0] * (n+1)
        inv[n] = self.compute_modular_exp( fact[n], mod-2, mod )
        for i in range(n-1, -1, -1):
            inv[i] = (inv[i+1] * (i+1)) % mod
        return inv

    def choose(self, n: int, k: int, fact: List[int], inv: List[int]) -> int:
        if k == 0:
            return 1
        if n == 0:
            return 0
        return fact[n] * inv[k] * inv[n-k]

    def countGoodSubsequences(self, s: str) -> int:
        mod = pow(10, 9)+7
        alphabet = [ chr( ord('a')+offset ) for offset in range(26) ]
        n = defaultdict(int)
        for c in s:
            n[c] += 1
        n_max = max( n.values() )

        fact = self.compute_modular_factorials( n_max, mod )
        inv  = self.compute_modular_inv( fact, mod )

        total_count = 0
        for k in range(1, n_max+1):
            count = 1
            for c in alphabet:
                if n[c] < k:
                    continue
                count *= ( self.choose(n[c], k, fact, inv) + 1 )
            total_count = (total_count + (count-1)) % mod
        return total_count

unreasonably effective