Python RegEx - How to handle optional parts of a string
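This is the source code I currently use to parse messages from a fire department pager with regular expressions. Everything works except the pAddress line.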
import re
sInput = '(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326) ALARM-STRUC (Alarm Type THERMAL SMOKE) (Box 12345) APPLE INC - 1 INFINITE LOOP CUPERTINO. (XStr DE ANZA BLVD/MARIANI AVE) .BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED. #F987654321'
# Matches truck names using the consistent four uppercase letters followed by three - four numbers.
pTrucks = ','.join(re.findall(r'\w[A-Z]{3}\d[0-9]{2,3}', sInput))
# Matches source and job type using the - as a guide. This section is always preceded by the trucks on the job,
# so it is always preceded by a ) and a space. Allows 2-8 characters on either side of the - to cover
# variations such as 911-RESC, FAA-AIRCRAFT etc.
pJobSource = ''.join(re.findall(r'\) ([A-Za-z1-9]{2,8}-[A-Za-z1-9]{2,8})', sInput))
# Gets address by starting at (but not capturing) the job source, e.g. -RESC, and capturing everything up to the
# next period; the address section always ends with a period. The goal is to skip up to two sets of brackets that
# may appear in the string for things such as box numbers or alarm types (this pattern does not yet do that).
pAddress = ''.join(re.findall(r'-[A-Z1-9]{2,8} (.*?)\. \(', sInput))
# Finds the specified cross streets, which are always within () brackets; each bracket has a space immediately
# before or after and the word XStr is always present.
pCrossStreet = ''.join(re.findall(r' \((XStr.*?)\) ', sInput))
# The job details / description is always contained between two . periods e.g. .42YOM CARDIAC ARREST. each period
# has a space either immediately before or after.
pJobDetails = ''.join(re.findall(r' \.(.*?)\. ', sInput))
# Job number is in the format #F followed by digits. The # is always preceded by a space. Note that \d{0,7}
# captures at most seven digits, so longer numbers are truncated.
pJobNumber = ''.join(re.findall(r' (#F\d{0,7})', sInput))
# Get optional Alarm type which is always presented with a space (Alarm
pAlarmDetails = ''.join(re.findall(r' \((Alarm .*?)\) ', sInput))
# Get optional Box type which is always presented with a space (Box
pBoxDetails = ''.join(re.findall(r' (\(Box .*?\))', sInput))
print("Responding Trucks: " + pTrucks)
print("Job Source / Type: " + pJobSource)
print("Address: " + pAddress)
print("Cross Streets: " + pCrossStreet)
print("Job Details: " + pJobDetails)
print("Additional Info: " + pAlarmDetails + ", " + pBoxDetails)
print("\n\nJob Number: " + pJobNumber)
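The problem is that the pager input has two optional fields,
(Alarm Type *) and (Box *),
which, depending on the job, may be present, absent, or appear in any combination. The current code returns: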
Responding Trucks: CUPE123,CUPE124,MTVW211,MTVW215,SUNV5326
Job Source / Type: ALARM-STRUC
Address: (Alarm Type THERMAL SMOKE) (Box 12345) APPLE INC - 1 INFINITE LOOP CUPERTINO
Cross Streets: XStr DE ANZA BLVD/MARIANI AVE
Job Details: BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED
Additional Info: Alarm Type THERMAL SMOKE, (Box 12345)
Job Number: #F9876543
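Everything is perfect except the address line, which also pulls in the Alarm type and Box #.
How can I modify the RegEx so that the (Alarm Type) and (Box) fields are treated as optional? I tried this from another SO thread, and it works perfectly with the current sInput string: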
pAddress = ''.join(re.findall(r'-[A-Z1-9]{2,8}(?: \(Alarm .*?\))(?: \(Box .*\)) (.*?)\. \(', sInput))
which returns:
Responding Trucks: CUPE123,CUPE124,MTVW211,MTVW215,SUNV5326
Job Source / Type: ALARM-STRUC
Address: APPLE INC - 1 INFINITE LOOP CUPERTINO
Cross Streets: XStr DE ANZA BLVD/MARIANI AVE
Job Details: BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED
Additional Info: Alarm Type THERMAL SMOKE, (Box 12345)
Job Number: #F9876543
That is perfect and exactly the result I want. However, when I change the sInput string so that it contains neither (Alarm Type *) nor (Box *):
sInput = '(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326) ALARM-STRUC APPLE INC - 1 INFINITE LOOP CUPERTINO. (XStr DE ANZA BLVD/MARIANI AVE) .BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED. #F987654321'
the output returns nothing in the address field:
Responding Trucks: CUPE123,CUPE124,MTVW211,MTVW215,SUNV5326
Job Source / Type: ALARM-STRUC
Address:
Cross Streets: XStr DE ANZA BLVD/MARIANI AVE
Job Details: BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED
Additional Info: ,
Job Number: #F9876543
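I feel like I'm so close and just missing something... Sorry for the long post, it is possibly a bit TMI.
TL;DR: how do I modify the RegEx for the pAddress variable so that it ignores the (Alarm Type *) and (Box *) fields, whether or not they are present?
Solution:
You just need to add the ? (zero or one match) quantifier to both non-capturing groups.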
-[A-Z1-9]{2,8}(?: \(Alarm .*?\))?(?: \(Box .*\))? (.*?)\. \(
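A point that is easy to miss here: (?: ...) only stops a group from capturing, the group still has to match unless it carries its own quantifier. The sketch below illustrates the difference on two shortened, made-up test strings (they are assumptions for illustration, not real pager messages), using only the Box group.

```python
import re

# Hypothetical, shortened inputs: one with the bracketed field, one without.
with_box = "-STRUC (Box 12345) APPLE INC. ("
without_box = "-STRUC APPLE INC. ("

# Non-capturing but mandatory: the (Box ...) text must be present.
required = r'-[A-Z1-9]{2,8}(?: \(Box .*\)) (.*?)\. \('
# Non-capturing and optional: the trailing ? allows zero or one occurrence.
optional = r'-[A-Z1-9]{2,8}(?: \(Box .*\))? (.*?)\. \('

required_hits = [re.findall(required, s) for s in (with_box, without_box)]
optional_hits = [re.findall(optional, s) for s in (with_box, without_box)]

print(required_hits)  # the second list is empty: the mandatory group failed
print(optional_hits)  # both inputs yield the address
```

With the mandatory group the second string produces no match at all, which is why the Address field came back empty in the question.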
Now it should work whether or not the alarm type and box are present.
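To check that claim against all four combinations, here is a minimal sketch that rebuilds the sample message from the question with each optional field switched on or off. The build_message helper and the ADDRESS_RE name are hypothetical and exist only for this test; the pattern itself is the one given above.

```python
import re

# Pattern from the answer: both bracketed groups are non-capturing and optional.
ADDRESS_RE = re.compile(
    r'-[A-Z1-9]{2,8}(?: \(Alarm .*?\))?(?: \(Box .*\))? (.*?)\. \('
)


def build_message(alarm=True, box=True):
    """Hypothetical helper: rebuild the sample page with or without each optional field."""
    parts = ['(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326)', 'ALARM-STRUC']
    if alarm:
        parts.append('(Alarm Type THERMAL SMOKE)')
    if box:
        parts.append('(Box 12345)')
    parts.append('APPLE INC - 1 INFINITE LOOP CUPERTINO.')
    parts.append('(XStr DE ANZA BLVD/MARIANI AVE)')
    parts.append('.BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED.')
    parts.append('#F987654321')
    return ' '.join(parts)


# Extract the address for every present/absent combination of the two fields.
addresses = {}
for alarm in (True, False):
    for box in (True, False):
        message = build_message(alarm, box)
        addresses[(alarm, box)] = ''.join(ADDRESS_RE.findall(message))

for key, address in sorted(addresses.items()):
    print(key, address)
```

All four combinations yield the same address. Note that \(Box .*\) is greedy and only lands on the right closing bracket through backtracking; if that ever becomes a problem, \(Box [^)]*\) is a possible tighter alternative, offered here as a suggestion rather than part of the original answer.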