Fetching HTML over the network and parsing out the URLs it contains
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    public class TestIndex {

        private String rootUrl = "http://localhost/apk/";
        private String listUrl = rootUrl + "test-index.htm";
        private static List<String> imageUrlList = new ArrayList<String>();

        public static void main(String[] args) {
            TestIndex ti = new TestIndex();
            ti.getData();
            // Print how many URLs were found, then one URL per line.
            System.out.println(imageUrlList.size());
            for (int i = 0; i < imageUrlList.size(); i++) {
                System.out.println(imageUrlList.get(i));
            }
        }

        // Opens a connection to urlStr and returns the response body as a stream.
        // Failures are propagated instead of being swallowed and turned into null.
        private InputStream getNetInputStream(String urlStr) throws IOException {
            URL url = new URL(urlStr);
            URLConnection conn = url.openConnection();
            conn.connect();
            return conn.getInputStream();
        }

        private void getData() {
            StringBuilder html = new StringBuilder();
            // try-with-resources closes the reader and the underlying stream.
            // Assumption: the page is UTF-8 (the original used the platform default charset).
            try (BufferedReader br = new BufferedReader(new InputStreamReader(
                    getNetInputStream(listUrl), StandardCharsets.UTF_8))) {
                String s;
                while ((s = br.readLine()) != null) {
                    html.append(s);
                }
            } catch (IOException e) {
                e.printStackTrace();
                return;
            }

            String startStr = "src=\"https://";
            String endStr = " width=";
            int index = 0;
            imageUrlList.clear();
            while (true) {
                int start = html.indexOf(startStr, index);
                if (start < 0) {
                    break;
                }
                int end = html.indexOf(endStr, start);
                if (end < 0) {
                    break; // no " width=" after this src, so stop instead of throwing
                }
                // start + 5 skips src=" and end - 1 drops the closing quote.
                imageUrlList.add(html.substring(start + 5, end - 1));
                index = end;
            }
        }
    }
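The program fetches test-index.htm over HTTP and parses out the URLs it contains: each address that begins at src="https:// and runs up to the closing quote just before the following width= attribute.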
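The extraction step can also be tried without a network connection. The following is only a sketch of an alternative and not the original author's method: it swaps the indexOf scan for a regular expression from java.util.regex, so that a match no longer depends on a width= attribute following src. The class name SrcUrlExtractor, the method extractSrcUrls, and the sample markup are all hypothetical, and the pattern assumes double-quoted src attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
class SrcUrlExtractor {

    // Matches src="https://..." and captures the URL up to the closing quote.
    // Assumption: src values are always wrapped in double quotes.
    private static final Pattern SRC_HTTPS =
            Pattern.compile("src=\"(https://[^\"]*)\"");

    // Returns every https URL found in a src attribute, in document order.
    static List<String> extractSrcUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = SRC_HTTPS.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample markup, not taken from test-index.htm.
        String html = "<img src=\"https://example.com/a.png\" width=\"10\">"
                + "<img height=\"5\" src=\"https://example.com/b.png\">";
        for (String url : extractSrcUrls(html)) {
            System.out.println(url);
        }
    }
}
```

Keeping the extraction in a method that takes a String makes it easy to check against small hand-written snippets before pointing the program at a real page. For anything beyond simple, regular markup, a real HTML parser is usually the more robust choice.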
Result:
20
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcRvQgUjsVDBncM3mVIgIyIuE87BnlyJUy2BNsAp8kUoTanrC_css5mVAw
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcThd8cYjOTmCgYJZxX5ls-xpxaAlH1_yocOSCqI5_7OkL29SNtbCZ7q2Yoj
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTl-FzKmsppxuwzmTITGCv9uDxmrWr1pG0lw8mUD9wkWIloASxQeBEMnVjz
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcQWbmiZJIXKHV2IoTBp7zSY6kD5g5VPzVtBTLJYYR5nwTtKi2-0_u93qL4e
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSlrLi_GtVgUehU7coFe1eMdrJxPdvS42iTqXkla0g75s31NBfAq2u1LE4
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcSkrlyGxSs8Dr_7k3MUvoGq1vE45LgHZ0zEhIEdD9LLZiaoMcE7IAqn8ho
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTu__OUSJ4R4EKBu4jOi2ZAdHohpVQIBy3-SfnI8FYpN8wVC9kJG_aWuk_w
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcR3Bf7YtsHJ813A5_wWzpxIy4MbEmqz5NLw3qv1nPxOZqVjH7QlY-qYSCg
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcToB4nJPqVwnzn0xeasnXyhxGgOqHXdypE6KZIMTfV9k52eIrE3iYsA6Ixm
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcTkKw0LpqdB2eQMUpwdQdvM9DTeNtq1mrvMNivoQtN37p3m0OPsx4ME9i4O
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSZGzMf_3hmdDktz91yp5ZQi-eGWLCenZ0U446sXT2nqYuwlWRI_V_BVIWi
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTQF-55T5GM3dLdaoafPdlIYK0ESNvM6-Bsb4-B2rQTeyD5gGoCKxokExM-
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcRoRjo4TFeXmx47zE6VH0ylcO0IQ2HBsOHYIMJCI9MsRyg_PF1WhHbqG76Q
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcRrdegt1koEy51dLWrJAbVMJBlCEZ7fPl2mztYYM6onvxocRCq030Ft1gE
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcTtnQpte0uq9Ue9nsg25GeO1kw_-Hcn69ozTQkiMBHrXKwlANutyhwKD9XM
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRNRdxzmuFKABoGgyv2SC0gMticosL2LB3V1fBMOwNtVBZxHkyMw4IcWBFj
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQr40CEf75nWCj5dg-oeKtb9zK6mhktu7vnfoYAh5ioy34goC3c9ptDkQwP
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQUnyHrVEbppqhZnWnQrijhBFP0X34gRf7pKw6PdT4ggepB2k9g-p71sgGh
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcR9Us9qblbTJaw47gULXCI8sHKN4I61gYsT2ijebtZzgsMDI8GmYqQpIIw
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSIrW-IbBZjM9Ztn60r9QE1_FIMjt494qGX12tqsLsibYPLuFVwyVSgz1I
Original: http://www.cnblogs.com/hixin/p/4158930.html