首页 / 算法 / 经典模式匹配算法总结及实现
经典模式匹配算法总结及实现
内容导读
互联网集市收集整理的这篇技术教程文章主要介绍了经典模式匹配算法总结及实现,小编现在分享给大家,供广大互联网技能从业者学习和参考。文章包含3602字,纯文字阅读大概需要6分钟。
内容图文
前言
读书笔记,整理自 [美] Goodrich et al. 所著《Data Structures and Algorithms in Python》。
模式匹配
模式匹配是数据结构中字符串的一种基本运算场景,给定一个子串,要求在某个字符串中找出与该子串相同的所有子串。尽管早已可以通过 Python 下的 re 库使用正则表达式高效而简洁地实现模式匹配,但了解相关算法背后机理亦不失其学习的意义。
1. 穷举算法
穷举的思想在于从 T 的第一个字符开始遍历每一个字符,依次匹配 P。是最简单,也最低效的匹配方法。
def find_brute(T, P):
"""Return the lowest index of T at which substring P begins (or else -1)."""
n, m = len(T), len(P) # introduce convenient notations
for i in range(n-m+1): # try every potential starting index within T
k = 0 # an index into pattern P
while k < m and T[i + k] == P[k]: # kth character of P matches
k += 1
if k == m: # if we reached the end of pattern,
return i # substring T[i:i+m] matches P
return -1 # failed to find a match starting with any i
2. Boyer-Moore算法
通过跳跃启发式算法避免大量无用的比较。每次逐字符匹配从 P 最后一个字符开始。
def find_boyer_moore(T, P):
"""Return the lowest index of T at which substring P begins (or else -1)."""
n, m = len(T), len(P) # introduce convenient notations
if m == 0: return 0 # trivial search for empty string
last = {} # build 'last' dictionary
for k in range(m):
last[ P[k] ] = k # later occurrence overwrites
# align end of pattern at index m-1 of text
i = m-1 # an index into T
k = m-1 # an index into P
while i < n:
if T[i] == P[k]: # a matching character
if k == 0:
return i # pattern begins at index i of text
else:
i -= 1 # examine previous character
k -= 1 # of both T and P
else:
j = last.get(T[i], -1) # last(T[i]) is -1 if not found
i += m - min(k, j + 1) # case analysis for jump step
k = m - 1 # restart at end of pattern
return -1
3. Knuth-Morris-Pratt算法
穷举算法和 Boyer-Moore 算法在完全匹配中必然进行 len(P) 次匹配,KMP 算法充分利用 P 内部的字符串重叠,做进一步优化。
def find_kmp(T, P):
"""Return the lowest index of T at which substring P begins (or else -1)."""
n, m = len(T), len(P) # introduce convenient notations
if m == 0: return 0 # trivial search for empty string
fail = compute_kmp_fail(P) # rely on utility to precompute
j = 0 # index into text
k = 0 # index into pattern
while j < n:
if T[j] == P[k]: # P[0:1+k] matched thus far
if k == m - 1: # match is complete
return j - m + 1
j += 1 # try to extend match
k += 1
elif k > 0:
k = fail[k-1] # reuse suffix of P[0:k]
else:
j += 1
return -1 # reached end without match
def compute_kmp_fail(P):
"""Utility that computes and returns KMP 'fail' list."""
m = len(P)
fail = [0] * m # by default, presume overlap of 0 everywhere
j = 1
k = 0
while j < m: # compute f(j) during this pass, if nonzero
if P[j] == P[k]: # k + 1 characters match thus far
fail[j] = k + 1
j += 1
k += 1
elif k > 0: # k follows a matching prefix
k = fail[k-1]
else: # no match found starting at j
j += 1
return fail
内容总结
以上是互联网集市为您收集整理的经典模式匹配算法总结及实现全部内容,希望文章能够帮你解决经典模式匹配算法总结及实现所遇到的程序开发问题。 如果觉得互联网集市技术教程内容还不错,欢迎将互联网集市网站推荐给程序员好友。
内容备注
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 gblab@vip.qq.com 举报,一经查实,本站将立刻删除。
内容手机端
扫描二维码推送至手机访问。