(Python) Splitting an output text file into tokens
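In short: I have an output file from a system in which every token is wrapped in pipes and the tokens are separated by commas (|content|,). I need to get the content between the pipes "|" and write it to another file.
The output file looks like this: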
|Operation_ID|,|Operation_Name|,|business_group_name|,|business_unit_name|,|Program_ID|,|Program_Name|,|Project_ID|,|Project_Name|,|Program_Type_Name|,|Program_Cost_Type_Name|,|Start_date|,|Estimated_End_Date|,|End_Date|,|SQA_Name|,|CMA_Name|,|SSE_Name|,|PMs|,|TLs|,|PortfolioManager|,|Finished|,|Research|,|SQA_ID|,|CMA_ID|,|SSE_ID|
|20|,|XXX|,|YYY|,|ZZZ|,|1|,|WWW|,|2163|,|QQQ|,||,||,|15/12/2008|,||,|22/01/2009|,||,||,||,|EEE EEE |,||,||,|True|,||,||,||,||
|22|,|XXX|,|YYY|,|ZZZ|,|3|,|WWW|,|2165|,|QQQ|,||,||,|01/01/2009|,||,|09/04/2010|,||,||,||,|EEE EEE EEE|,||,||,|True|,|False|,||,||,||
|20|,|XXX|,|YYY|,|ZZZ|,|10|,|WWW|,|2164|,|QQQ|,|Development|,|Direct|,|15/12/2008|,||,|26/02/2010|,||,||,||,|EEE |,|EEE EEE ; EEE EEE ; EEE EEE |,||,|True|,|False|,||,||,||
|22|,|XXX|,|YYY|,|ZZZ|,|3|,|WWW|,|2166|,|QQQ|,||,||,|15/12/2008|,||,|31/05/2010|,||,||,||,||,||,||,|True|,|False|,||,||,||
|20|,|XXX|,|YYY|,|ZZZ|,|10|,|WWW|,|2168|,|QQQ|,|Development|,|Direct|,|05/01/2009|,||,|20/05/2009|,||,||,||,|EEE EEE EEE|,|EEE EEE |,||,|True|,||,||,||,||
|20|,|XXX|,|YYY|,|ZZZ|,|1|,|WWW|,|2169|,|QQQ|,||,||,|13/01/2009|,||,|22/05/2009|,||,||,||,|EEE EEE EEE|,|EEE EEE EEE EEE|,||,|True|,||,||,||,||
|21|,|XXX|,|YYY|,|ZZZ|,|2|,|WWW|,|2174|,|QQQ|,||,||,|08/01/2009|,||,|20/04/2009|,||,||,||,|EEE EEE |,|EEE EEE|,||,|True|,||,||,||,||
|23|,|XXX|,|YYY|,|ZZZ|,|47|,|WWW|,|2176|,|QQQ|,|Internal|,|Indirect|,|21/01/2009|,||,|17/12/2010|,||,||,||,|EEE EEE; EEE EEE|,||,||,|True|,|True|,||,||,||
|20|,|XXX|,|YYY|,|ZZZ|,|1|,|WWW|,|2142|,|QQQ|,||,||,|21/10/2008|,||,|13/05/2009|,||,||,||,|EEE EEE |,||,||,|True|,||,||,||,||
|20|,|XXX|,|YYY|,|ZZZ|,|1|,|WWW|,|2147|,|QQQ|,||,||,|07/11/2008|,||,|26/11/2008|,||,||,||,|EEE EEE EEE EEE |,|EEE EEE |,||,|True|,||,||,||,||
|20|,|XXX|,|YYY|,|ZZZ|,|1|,|WWW|,|2148|,|QQQ|,||,||,|07/11/2008|,||,|09/04/2009|,||,||,||,||,||,||,|True|,||,||,||,||
|22|,|XXX|,|YYY|,|ZZZ|,|3|,|WWW|,|2149|,|QQQ|,||,||,|01/11/2008|,|31/01/2011|,|01/12/2010|,||,||,||,|EEE EEE ; EEE EEE|,|EEE EEE; EEE EEE|,||,|True|,|False|,||,||,||
|22|,|XXX|,|YYY|,|ZZZ|,|20|,|WWW|,|2150|,|QQQ|,|Development|,||,|31/10/2008|,|31/10/2010|,|29/10/2010|,||,||,||,|EEE EEE |,|EEE EEE |,||,|True|,|False|,||,||,||
|20|,|XXX|,|YYY|,|ZZZ|,|1|,|WWW|,|2152|,|QQQ|,||,||,|26/11/2008|,||,|03/07/2009|,||,||,||,|EEE EEE EEE ; EEE EEE EEE EEE |,|EEE EEE |,||,|True|,||,||,||,||
|22|,|XXX|,|YYY|,|ZZZ|,|3|,|WWW|,|2151|,|QQQ|,||,||,|01/11/2008|,||,|29/01/2009|,||,||,||,||,||,||,|True|,||,||,||,||
|23|,|XXX|,|YYY|,|ZZZ|,|47|,|WWW|,|2187|,|QQQ|,|Internal|,|Indirect|,|21/01/2009|,||,|03/12/2009|,||,||,||,|EEE EEE|,|EEE EEE EEE|,||,|True|,|True|,||,||,||
|23|,|XXX|,|YYY|,|ZZZ|,|47|,|WWW|,|2192|,|QQQ|,|Internal|,|Indirect|,|21/01/2009|,||,|11/01/2011|,||,||,||,|EEE EEE EEE; EEE EEE|,||,||,|True|,|True|,||,||,||
|20|,|XXX|,|YYY|,|ZZZ|,|1|,|WWW|,|2196|,|QQQ|,||,||,|23/01/2009|,||,|24/03/2010|,||,||,||,|EEE EEE |,||,||,|True|,|False|,||,||,||
|21|,|XXX|,|YYY|,|ZZZ|,|41|,|WWW|,|2231|,|QQQ|,|Research|,||,|21/05/2009|,||,|01/12/2009|,||,||,||,||,||,||,|True|,|False|,||,||,||
|21|,|XXX|,|YYY|,|ZZZ|,|41|,|WWW|,|2230|,|QQQ|,|Research|,||,|21/05/2009|,||,|30/11/2009|,||,||,||,||,||,||,|True|,|False|,||,||,||
|21|,|XXX|,|YYY|,|ZZZ|,|41|,|WWW|,|2232|,|QQQ|,|Research|,||,|21/05/2009|,||,|09/07/2010|,||,||,||,||,|EEE EEE EEE|,||,|True|,|True|,||,||,||
|24|,|XXX|,|YYY|,|ZZZ|,|44|,|WWW|,|2237|,|QQQ|,|Research|,|Indirect|,|21/05/2009|,||,|22/01/2010|,||,||,||,||,||,||,|True|,|False|,||,||,||
|21|,|XXX|,|YYY|,|ZZZ|,|41|,|WWW|,|2238|,|QQQ|,|Research|,||,|21/05/2009|,||,|25/02/2010|,||,||,||,||,||,||,|True|,|False|,||,||,||
|21|,|XXX|,|YYY|,|ZZZ|,|41|,|WWW|,|2239|,|QQQ|,|Research|,||,|21/05/2009|,||,|04/01/2011|,||,||,||,||,||,||,|True|,|True|,||,||,||
|21|,|XXX|,|YYY|,|ZZZ|,|41|,|WWW|,|2240|,|QQQ|,|Research|,||,|21/05/2009|,||,|05/01/2011|,||,||,||,||,||,||,|True|,|True|,||,||,||
|26|,|XXX|,|YYY|,|ZZZ|,|50|,|WWW|,|2242|,|QQQ|,|Internal|,|Indirect|,|21/05/2009|,||,|14/10/2010|,||,||,||,||,||,||,|True|,|True|,||,||,||
|22|,|XXX|,|YYY|,|ZZZ|,|3|,|WWW|,|2273|,|QQQ|,||,||,|25/05/2009|,||,|29/01/2010|,||,||,||,||,|EEE EEE|,||,|True|,|False|,||,||,||
I'm new to Python and to programming, so I tried to write the following algorithm:
# => Reads the file test.txt;
# => Scans character by character for '|' character;
# => If character '|' is found, skips to next character and add subsequent
# characters to a 'token' array, until next character is '|' again;
# => When next character is '|', add 'token' array to 'array_of_tokens';
# => Once END OF FILE arrives, writes 'array_of_tokens' to 'test_output.txt'
# file;
test_file = 'test.txt'
test_output = 'test_output.txt'
token = []
array_of_tokens = []
index = 0
# => Reads the file test.txt;
with open(test_file) as file:
    while True:
        # => Scans character by character for '|' character;
        character = file.read(1)
        # => If character '|' is found,
        if character == "|"
            # skips to next character
            character = next(character),
            # until next character is '|' again;
            while not character == '|'
                # add subsequent characters to a 'token' array
                token(index) = character
                index ++
                character = next(character)
            # => When next character is '|', add 'token' array to 'array_of_tokens';
            if next(character) == '|'
                array_of_tokens = token
        else if not character:
            break
        print "Read a character: ", character
# => Once END OF FILE arrives, writes 'array_of_tokens' to 'test_output.txt'
# file;
test_output.write(str(array_of_tokens))
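It obviously doesn't work. The problem is that I'm not sure what to do next: I know the result I need (it is written in the comments), but I don't know how to make the code work. Can anyone help? Also, if you have any tips on where to find advice or resources I could study to become a better programmer, a real programmer, I would be very grateful!
Thanks in advance!
Solution:
Just use str.translate to remove the |, split on "," and filter out the empty strings (Python 2 syntax):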
In [9]: s="|22|,|XXX|,|YYY|,|ZZZ|,|3|,|WWW|,|2273|,|QQQ|,||,||,|25/05/2009|,||,|29/01/2010|,||,||,||,||,|EEE EEE|,||,|True|,|False|,||,||,||"
In [10]: print(filter(None,s.translate(None,"|").split(",")))
['22', 'XXX', 'YYY', 'ZZZ', '3', 'WWW', '2273', 'QQQ', '25/05/2009', '29/01/2010', 'EEE EEE', 'True', 'False']
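If you need the data to stay aligned with the columns, don't filter.
So, depending on how you want to write the data to the output file, working directly from the input file would look something like this: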
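As a rough illustration of that point, here is a minimal Python 3 sketch (not part of the original answer) that removes the pipes but keeps the empty fields, so each row keeps one value per column. The helper name split_keep_empty and the shortened sample rows are made up for this example, and it assumes no field contains a comma or a pipe.

```python
def split_keep_empty(line):
    """Strip the pipes and split on commas, keeping empty fields."""
    # Hypothetical helper; assumes no "," or "|" inside a field.
    return line.rstrip("\n").replace("|", "").split(",")


# Shortened, made-up rows in the same format as the file above
header = "|Operation_ID|,|Program_Type_Name|,|Start_date|,|Finished|"
row = "|20|,||,|15/12/2008|,|True|"

print(split_keep_empty(header))
print(split_keep_empty(row))
# ['20', '', '15/12/2008', 'True'] -> the empty Program_Type_Name keeps its slot
```

Because nothing is filtered, the row has the same number of fields as the header, so a value can still be matched to its column by position.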
import csv

with open("test.txt") as f, open('test_output.txt', "w") as out:
    wr = csv.writer(out, delimiter=",")
    for line in f:
        # Python 2: drop the pipes, split on commas, discard empty fields
        wr.writerow(filter(None, line.rstrip().translate(None, "|").split(",")))
Your output will be:
Operation_ID,Operation_Name,business_group_name,business_unit_name,Program_ID,Program_Name,Project_ID,Project_Name,Program_Type_Name,Program_Cost_Type_Name,Start_date,Estimated_End_Date,End_Date,SQA_Name,CMA_Name,SSE_Name,PMs,TLs,PortfolioManager,Finished,Research,SQA_ID,CMA_ID,SSE_ID
20,XXX,YYY,ZZZ,1,WWW,2163,QQQ,15/12/2008,22/01/2009,EEE EEE ,True
22,XXX,YYY,ZZZ,3,WWW,2165,QQQ,01/01/2009,09/04/2010,EEE EEE EEE,True,False
20,XXX,YYY,ZZZ,10,WWW,2164,QQQ,Development,Direct,15/12/2008,26/02/2010,EEE ,EEE EEE ; EEE EEE ; EEE EEE ,True,False
22,XXX,YYY,ZZZ,3,WWW,2166,QQQ,15/12/2008,31/05/2010,True,False
20,XXX,YYY,ZZZ,10,WWW,2168,QQQ,Development,Direct,05/01/2009,20/05/2009,EEE EEE EEE,EEE EEE ,True
20,XXX,YYY,ZZZ,1,WWW,2169,QQQ,13/01/2009,22/05/2009,EEE EEE EEE,EEE EEE EEE EEE,True
etc.................
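As tdelaney mentioned in the comments, this assumes there are no pipes inside the pipe-delimited fields.
For Python 3 we need to do a little more work, because str.translate behaves slightly differently. We need to create a table with str.maketrans: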
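One way to relax part of that assumption, offered here as a sketch rather than as part of the original answer, is to let Python 3's csv module treat | as the quote character. Commas inside a field are then preserved and empty fields keep their position. A literal pipe inside a field would still have to be doubled (||) under the csv module's default doublequote rule, and whether the system that produces this file does that is unknown. The sample rows below are made up for illustration.

```python
import csv
import io

# Made-up sample; the second field of the data row contains a comma.
sample = io.StringIO(
    "|Project_ID|,|PMs|,|Finished|\n"
    "|2164|,|EEE, EEE ; EEE|,||\n"
)

# quotechar="|" makes the reader treat |...| as one quoted field
rows = list(csv.reader(sample, delimiter=",", quotechar="|"))

for row in rows:
    print(row)
```

The same reader can be pointed at an open file object instead of the io.StringIO sample, and each parsed row can be passed to csv.writer's writerow.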
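import csv

with open("test.txt") as f, open('test_output.txt', "w", newline="") as out:
    wr = csv.writer(out, delimiter=",")
    # Table that maps each "|" to ","; the extra commas only produce
    # empty strings, which filter(None, ...) discards
    table = str.maketrans("|", ",")
    for line in f:
        wr.writerow(list(filter(None, line.rstrip().translate(table).split(","))))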
Another approach is to split only on "|" and filter out the commas and the empty strings:
import csv

with open("in.txt") as f, open('test_output.txt', "w") as out:
    wr = csv.writer(out, delimiter=",")
    for line in f:
        # Splitting on "|" leaves "," and "" between the fields; drop both
        wr.writerow(filter(lambda x: x not in {",", ""}, line.rstrip().split("|")))
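To see why the commas have to be filtered out, it may help to look at what split("|") returns for a single line. The Python 3 sketch below is an illustration on a made-up row, not part of the original answer. It also shows a variation (taking every second element) that keeps empty fields in place, under the assumption that every field is wrapped in exactly one pair of pipes.

```python
line = "|20|,|XXX|,||,|True|"

parts = line.split("|")
print(parts)
# ['', '20', ',', 'XXX', ',', '', ',', 'True', '']

# The answer's filter: drop the separators and the empty strings
filtered = [p for p in parts if p not in {",", ""}]
print(filtered)
# ['20', 'XXX', 'True']

# Variation (an assumption, not from the answer): field contents sit at
# the odd indexes, so slicing keeps empty fields and column alignment
aligned = parts[1::2]
print(aligned)
# ['20', 'XXX', '', 'True']
```

The filtered version also drops any field whose whole content is a single comma, which the slicing variation would keep.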