Sliding window
Each 10-letter sequence is a window. We keep the window counts in a map.
Say len(s) = and there are
distinct 10-letter repeated sequences.
Time: , space:
.
class Solution:
def findRepeatedDnaSequences(self, s: str) -> List[str]:
sequence_count = {}
begin = 0
for end, c in enumerate(s):
if end >= 9:
seq = s[begin : end+1]
if seq not in sequence_count:
sequence_count[seq] = 1
else:
sequence_count[seq] += 1
begin += 1
return [seq for seq, count in sequence_count.items() if count > 1]
Since there are four different letters, the window can be considered as a base-4, 10-digit number. Say first window is: . The next window should be:
. In other words,
.
In this way:
- We do not need to do string slice/copy per iteration, saving time.
- In the maps, we save numbers instead of len-10 strings, saving space.
class Solution:
def findRepeatedDnaSequences(self, s: str) -> List[str]:
if len(s) < 10:
return []
digit_of = {
"A": 0,
"C": 1,
"G": 2,
"T": 3,
}
base = 4
window = 0
for i in range(10):
window = base * window + digit_of[s[i]]
window_seq = {window: (0, 9)}
window_count = {window: 1}
msb = pow(base, 9)
lsb = 1
begin = 0
for end in range(10, len(s)):
begin_digit = digit_of[s[begin]]
end_digit = digit_of[s[end]]
window -= msb * begin_digit
window *= base
window += lsb * end_digit
begin += 1
if window in window_count:
window_count[window] += 1
else:
window_count[window] = 1
window_seq[window] = (begin, end)
repeated_seq = []
for window, count in window_count.items():
if count > 1:
begin, end = window_seq[window]
repeated_seq.append(s[begin : end + 1])
return repeated_seq
Leave a comment