跳转至

1410.HTML Entity Parser

Tags: Medium String Stack

Links: https://leetcode.com/problems/html-entity-parser/


HTML entity parser is the parser that takes HTML code as input and replace all the entities of the special characters by the characters itself.

The special characters and their entities for HTML are:

  • Quotation Mark: the entity is " and symbol character is ".
  • Single Quote Mark: the entity is ' and symbol character is '.
  • Ampersand: the entity is & and symbol character is &.
  • Greater Than Sign: the entity is > and symbol character is >.
  • Less Than Sign: the entity is < and symbol character is <.
  • Slash: the entity is and symbol character is /.

Given the input text string to the HTML parser, you have to implement the entity parser.

Return the text after replacing the entities by the special characters.

Example 1:

Input: text = "&amp; is an HTML entity but &ambassador; is not."
Output: "& is an HTML entity but &ambassador; is not."
Explanation: The parser will replace the &amp; entity by &

Example 2:

Input: text = "and I quote: &quot;...&quot;"
Output: "and I quote: \"...\""

Example 3:

Input: text = "Stay home! Practice on Leetcode :)"
Output: "Stay home! Practice on Leetcode :)"

Example 4:

Input: text = "x &gt; y &amp;&amp; x &lt; y is always false"
Output: "x > y && x < y is always false"

Example 5:

Input: text = "leetcode.com&frasl;problemset&frasl;all"
Output: "leetcode.com/problemset/all"

Constraints:

  • 1 <= text.length <= 10^5
  • The string may contain any possible characters out of all the 256 ASCII characters.

class Solution {
public:
    string entityParser(string text) {
        std::ios_base::sync_with_stdio(false);
        cin.tie(NULL);
        cout.tie(NULL);

        string res;
        int n = text.size();
        int pos = 0; 
        while (pos < n) {
            if (text[pos] == '&') {
                if (text.substr(pos, 6) == "&quot;") {
                    res += "\"";
                    pos += 6;
                }
                else if (text.substr(pos, 6) == "&apos;") {
                    res += "'";
                    pos += 6;
                }
                else if (text.substr(pos, 5) == "&amp;") {
                    res.push_back('&');
                    pos += 5;
                }
                else if (text.substr(pos, 4) == "&gt;") {
                    res.push_back('>');
                    pos += 4;
                }
                else if (text.substr(pos, 4) == "&lt;") {
                    res.push_back('<'); pos += 4;
                }
                else if (text.substr(pos, 7) == "&frasl;") {
                    res.push_back('/'); pos += 7;
                }
                else {
                    res.push_back('&'); ++pos;
                }
            }
            else {
                res.push_back(text[pos]);
                ++pos;
            }
        }

        return res;
    }
};

如果之前关注过《C++ Primer》校审的叶劲峰老师的GitHub,其中就有一个关于json parser的小练习,接触过再来看这道题就不会觉得很陌生。

这个非常类似于写一个json parser的思路,写json parser每次只需要根据字符串的首字母就可以很方便的确定后续内容,这个题目同理,发现每一个特殊字符的第一个字符的开头都是&,所以就可以据此判断。另外为什么不选取&后一个字符作为进一步判断,因为比如&apos;&amp;,就需要额外判断两个字符,代码写起来不是很优雅,所以直接substr

另外以HTML为背景的题目

  • POJ 2271 HTML