76.Minimum Window Substring¶

76.Minimum Window Substring¶

Tags: Hard String Hash Table Two Pointers Sliding Window

Links: https://leetcode.com/problems/minimum-window-substring/

Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

Example:

Input: S = "ADOBECODEBANC", T = "ABC"
Output: "BANC"

Note:

If there is no such window in S that covers all characters in T, return the empty string "".
If there is such window, you are guaranteed that there will always be only one unique minimum window in S.

最初没有经过优化的版本, 思路是首先找到一个包含字符串t所有字符的子串, 然后将start不断的前移,直到无法包含t的所有字符为止, 时间复杂度为 $O(n)$ ，空间复杂度为 $O(1)$ 。

class Solution {
public:
    string minWindow(string s, string t) {
        std::ios_base::sync_with_stdio(false);
        cin.tie(NULL);
        cout.tie(NULL);

        int m = s.size(), n = t.size(); if (m < n) return "";
        vector<int> text(256, 0), pattern(256, 0);
        for (auto &e : t) ++pattern[e];
        for (int i = 0; i < n; ++i) ++text[s[i]];

        int resStart = 0, len = INT_MAX;
        int start = 0, end = n - 1;
        while (true) {
            while (check(pattern, text)) {
                if (end - start + 1 < len) {
                    len = end - start + 1;
                    resStart = start;
                }
                --text[s[start++]];
            }

            if (end + 1 >= m) break;

            while (end + 1 < m && !check(pattern, text)) {
                ++text[s[++end]];
            }
        }

        return len == INT_MAX ? "" : s.substr(resStart, len);
    }

    inline bool check(vector<int> &pattern, vector<int> &text)
    {
        for (int i = 0; i < 256; ++i) {
            if (text[i] < pattern[i]) return false;
        }

        return true;
    }
};

上面的这个版本最开始让len长度为m，但是会在s = "a", t = "b"的时候出错，所以增加了一步判断。但是时间性能不是很好，因为每次去check的时候要最坏可能进行256次比较，而且存在很多无意义的比较，那么这里可以存在进一步的优化。

思路是正确的，但是我们可以把while循环里的部分进行一下精简。另外这道题对于题意的理解也是很重要的一部分，究竟是包含所有字符，比如一个字符重复出现aa，那么究竟是包含一个a就可以，还是需要包含所有的a，从题目角度来看，应该是需要考虑包含重复字符的。

统计好t串中字母的个数了之后，开始遍历s串，对于s中的每个遍历到的字母，都在 hash中的映射值减1，如果减1后的映射值仍大于等于0，说明当前遍历到的字母是T串中的字母，使用一个计数器 cnt，使其自增1。当 cnt 和t串字母个数相等时，说明此时的窗口已经包含了T串中的所有字母，此时更新一个 len和起始位置pos，这里的 len用来记录出现过的包含T串所有字母的最短的子串的长度。然后开始收缩左边界，由于遍历的时候，对映射值减了1，所以此时去除字母的时候，就要把减去的1加回来，此时如果加1后的值大于0了，说明此时少了一个t中的字母，那么 cnt值就要减1了，然后移动左边界 start。

class Solution {
public:
    string minWindow(string s, string t) {
        std::ios_base::sync_with_stdio(false);
        cin.tie(NULL);
        cout.tie(NULL);

        int m = s.size(), n = t.size(); if (m < n) return "";
        vector<int> hash(256, 0);
        for (auto & e: t) ++hash[e];
        int cnt = 0, len = INT_MAX, pos = 0;
        int start = 0;
        for (int i = 0; i < m; ++i) {
            if (--hash[s[i]] >= 0) ++cnt; //如果对应字符出现过，计数器+1
            while (cnt == n) {
                if (len > i - start + 1) { //更新最小长度和起始位置
                    len = i - start + 1;
                    pos = start;
                }
                if (++hash[s[start]] > 0) --cnt; //说明对应字符在t中出现过
                ++start; //缩小左边界
            }
        }

        return len == INT_MAX ? "" : s.substr(pos, len);
    }
};

考虑到都是字符实现的，那么只需要开一个256的数组肯定够用，避免了去用map。