# Dynamic Programming on Distinct Subsequences

Datetime:2016-08-23 03:11:42         Topic: Dynamic Programming          Share        Original >>

Dynamic programming (DP) is a group of very useful algorithms to solve searching problems. In many cases, it is easy to realize that a particular problem can be solved in DP, but you may spend a lot of time on finding the iterative equations. Distinct Subsequences is one such problem.

Here is the description from leetcode.

Given a string S and a string T , count the number of distinct subsequences of T in S .

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, `"ACE"` is a subsequence of `"ABCDE"` while `"AEC"` is not).

Here is an example:

S= `"rabbbit"` , T = `"rabbit"`

Return `3` .

If you are sensitive enough, "two strings" and "subsequence" will guide you to DP immediately. From some previous experience like Edit Distance , you know it is a good idea to construct an `m` by `n` matrix, where `m = T.length()` and `n = S.length()` . Each element in the matrix, say `matrix[i][j]` , represents the number of distinct subsequences of string `T[0:i+1]` and `S[0:j+1]` . Now comes the hard part: how to fill in the matrix?

The matrix can be filled row by row, i.e. each time we add a single character in `T` and evaluate the number of distinct subsequences for all prefix of `S` . If `T[i] != S[j]` , since `T[0:i+1]` is a subsequence of `S[0:j+1]` (if exists), the new character `T[i]` should be matched before `S[j]` . As a result we have the equation `matrix[i][j] = matrix[i][j-1]` .

If `T[i] == S[j]` , there are two possible cases: (1) `T[i]` is the character that matches to `S[j]` , so we should throw away both `T[i]` and `S[j]` , and the number of subsequences should equal to `matrix[i-1][j-1]` ; or (2) `T[i]` is matched to a character before `S[j]` , where we can get the number from `matrix[i][j-1]` , as we have illustrated before. The total number of distinct subsequences is the sum of those two cases, so the equation should be `matrix[i][j] = matrix[i-1][j-1] + matrix[i][j-1]` .

To summarize, the core logic of this DP problem is shown in the following code block

`if (S[j] != T[i])    matrix[i][j] = matrix[i][j-1];else    matrix[i][j] = matrix[i-1][j-1] + matrix[i][j-1];`

Since we only use two adjacent rows in the matrix at any time, the memory consumption can be cut down to \$O(n)\$ instead of \$O(mn)\$.

In conclusion, the difficulty of this common DP problem stands on finding the correct iterative equations. Some patterns are well-known for similar problems, such as the allocation of `m` by `n` matrix. However you will need to try hard to understand the exact meaning of a cell, and how to connect the meaning among multiple cells.