LeetCode 271: Encode and Decode Strings

link

If we can mark the end of each word, we can decode. To mark the end of a word, we cannot use a character from the alphabet because we would not know if it is a marker or part of a word.

Characters as numbers

The ord() of a character is a non-negative integer. So, if we encode the ord()‘s, then -1 could serve as a marker for word ends.

Say total character count is $n$ .

Time: $\mathcal{O}(n)$ , space: $\mathcal{O}(n)$ .

class Codec:
    def encode(self, strs: List[str]) -> str:
        """Encodes a list of strings to a single string.
        """
        encoded = []
        for s in strs:
            for c in s:
                encoded.append(ord(c))
            encoded.append(-1)
        
        return ",".join(str(x) for x in encoded)


    def decode(self, s: str) -> List[str]:
        """Decodes a single string to a list of strings.
        """
        decoded = []
        curr = []
        letters = s.split(",")
        for l in letters:
            if l == "-1":
                decoded.append("".join(curr))
                curr = []
                continue
            
            curr.append(chr(int(l)))
        
        return decoded

# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(strs))

Append word length

We can append the length of each word in the beginning followed by some marker like “|”. So, the encoded string looks like: $n_1|\texttt{word}_1n_2|\texttt{word}_2\ldots$ . As soon as we see “|”, we know where the next word begins and ends. After processing the word, we have the same pattern again.

class Codec:
    def encode(self, strs: List[str]) -> str:
        """Encodes a list of strings to a single string.
        """
        encoded = []
        for s in strs:
            encoded.append(f"{len(s)}|")
            encoded.append(s)
        
        return "".join(encoded)


    def decode(self, s: str) -> List[str]:
        """Decodes a single string to a list of strings.
        """
        decoded = []
        # We have repeated pattern like: n1|word1
        # `begin` marks the start of pattern
        i = begin = 0
        while i < len(s):
            if s[i] != "|":
                i += 1
                continue

            word_len = int(s[begin:i])
            word_start = i+1
            word_end = word_start+word_len
            decoded.append( s[word_start : word_end] )
            i = begin = word_end
        
        return decoded

# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(strs))

unreasonably effective