Integrating IKAnalyzer with Solr 5.3.1
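Solr 5.3.1 does not handle Chinese word segmentation well out of the box, and mmseg4j's segmentation results were unsatisfactory, so I chose IK for segmentation. However, after following http://www.superwu.cn/2015/05/08/2134/ and dropping the jar downloaded from Google into the Solr directory, Solr immediately threw the following exception: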
```text
SEVERE: Servlet.service() for servlet [default] in context with path [/solr] threw exception [Filter execution threw an exception] with root cause
java.lang.AbstractMethodError
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:179)
    at org.apache.solr.handler.AnalysisRequestHandlerBase.analyzeValue(AnalysisRequestHandlerBase.java:91)
    at org.apache.solr.handler.FieldAnalysisRequestHandler.analyzeValues(FieldAnalysisRequestHandler.java:221)
    at org.apache.solr.handler.FieldAnalysisRequestHandler.handleAnalysisRequest(FieldAnalysisRequestHandler.java:182)
    at org.apache.solr.handler.FieldAnalysisRequestHandler.doAnalysis(FieldAnalysisRequestHandler.java:102)
    at org.apache.solr.handler.AnalysisRequestHandlerBase.handleRequestBody(AnalysisRequestHandlerBase.java:63)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
    at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2522)
    at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2511)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
```
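At first I assumed it was a configuration problem, but no configuration I tried made any difference. Reading the source revealed the cause: in the Lucene version bundled with Solr 5.3.1, the createComponents method of the abstract class Analyzer no longer takes its second (Reader) parameter. The downloaded IK jar only implements the old two-argument createComponents(String, Reader), so when Analyzer.tokenStream calls the new one-argument createComponents(String), the JVM finds no implementation and throws java.lang.AbstractMethodError. Modifying the IK source is therefore unavoidable. For the required changes, see http://iamyida.iteye.com/blog/2193513; alternatively, take the already-modified source and simply rebuild the jar.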
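To make the signature mismatch concrete, the following stdlib-only sketch uses reflection to check whether a class provides a concrete (non-abstract) createComponents with given parameter types. The helper CreateComponentsCheck and the stand-in classes Lucene5StyleAnalyzer, OldIkStyleAnalyzer and PatchedIkStyleAnalyzer are hypothetical; they only mimic the Lucene 4 vs. Lucene 5 signatures and are not the real Lucene or IK classes. Passing a real class name instead (for example org.wltea.analyzer.lucene.IKAnalyzer, assuming the IK and Lucene jars are on the classpath) would be expected to report false for an unpatched IK jar.

```java
import java.io.Reader;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/**
 * Hypothetical diagnostic (not part of IK or Lucene): does a class provide a
 * concrete createComponents method with the given parameter types?
 */
public class CreateComponentsCheck {

    static boolean hasConcreteCreateComponents(Class<?> clazz, Class<?>... paramTypes) {
        for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod("createComponents", paramTypes);
                // The closest declaration in the class chain is the one that gets called;
                // if it is abstract, invoking it fails with AbstractMethodError.
                return !Modifier.isAbstract(m.getModifiers());
            } catch (NoSuchMethodException e) {
                // Not declared at this level; continue with the superclass.
            }
        }
        return false;
    }

    // Stand-ins for illustration only; they are NOT the real Lucene/IK classes.
    static abstract class Lucene5StyleAnalyzer {
        protected abstract Object createComponents(String fieldName);
    }

    // Like an IK jar built against Lucene 4: only the old two-argument method exists.
    static abstract class OldIkStyleAnalyzer extends Lucene5StyleAnalyzer {
        protected Object createComponents(String fieldName, Reader reader) {
            return null;
        }
    }

    // Like the patched IKAnalyzer: implements the Lucene 5 one-argument method.
    static class PatchedIkStyleAnalyzer extends Lucene5StyleAnalyzer {
        @Override
        protected Object createComponents(String fieldName) {
            return null;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // With real jars on the classpath, pass a class name such as
        // org.wltea.analyzer.lucene.IKAnalyzer; otherwise the stand-in is checked.
        Class<?> target = args.length > 0 ? Class.forName(args[0]) : OldIkStyleAnalyzer.class;
        boolean ok = hasConcreteCreateComponents(target, String.class);
        System.out.println(target.getName() + ": createComponents(String) implemented = " + ok);
    }
}
```

The check stops at the nearest declaration because that is the method virtual dispatch would select within the class chain; interface default methods are not considered, which is sufficient for Analyzer subclasses.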
The main change, in IKAnalyzer.java:
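```java
/**
 * Override Analyzer.createComponents to build the tokenizer components.
 * In Lucene 5.x the only parameter is the field name, not the text to analyze;
 * the text itself is attached later through Tokenizer.setReader().
 */
@Override
protected TokenStreamComponents createComponents(String fieldName) {
    Reader reader = new BufferedReader(new StringReader(fieldName));
    Tokenizer _IKTokenizer = new IKTokenizer(reader, this.useSmart());
    return new TokenStreamComponents(_IKTokenizer);
}
```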
Add the following constructor to IKTokenizer.java:
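```java
/**
 * Lucene 5.x style constructor: no Reader argument.
 * The factory passes in the AttributeFactory; the text arrives later via setReader().
 */
public IKTokenizer(AttributeFactory factory, boolean useSmart) {
    super(factory);
    offsetAtt = addAttribute(OffsetAttribute.class);
    termAtt = addAttribute(CharTermAttribute.class);
    typeAtt = addAttribute(TypeAttribute.class);
    // At construction time `input` is still Lucene's placeholder reader;
    // the real text only becomes available after setReader() and reset().
    _IKImplement = new IKSegmenter(input, useSmart);
}
```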
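Why can the new constructor hand input to IKSegmenter without receiving a Reader? In Lucene 5.x a Tokenizer is created without any text: the text is attached with setReader() and only becomes the active input when reset() is called, which lets one tokenizer instance be reused across many calls. The sketch below models that lifecycle in plain Java. TokenizerLifecycleSketch, SketchTokenizer and analyze are hypothetical names, the whitespace splitting merely stands in for IK's segmentation, and end() as well as Lucene's attribute machinery are omitted for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, stdlib-only model of the Lucene 5.x Tokenizer lifecycle (not Lucene code). */
public class TokenizerLifecycleSketch {

    /** Like a Lucene 5.x Tokenizer: constructed without any Reader. */
    static class SketchTokenizer {
        private Reader pending; // set by setReader(), like Tokenizer's pending input
        private Reader input;   // becomes active in reset(), like Tokenizer.input
        private String term;    // stands in for CharTermAttribute

        void setReader(Reader reader) {
            pending = reader;
        }

        void reset() {
            if (pending == null) {
                throw new IllegalStateException("setReader() must be called before reset()");
            }
            input = pending;
            pending = null;
            // A real IKTokenizer would point its segmenter at the new input here.
        }

        /** Emits the next whitespace-separated token; returns false at end of input. */
        boolean incrementToken() throws IOException {
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = input.read()) != -1) {
                if (!Character.isWhitespace(ch)) {
                    sb.append((char) ch);
                } else if (sb.length() > 0) {
                    break; // end of the current token
                }
            }
            if (sb.length() == 0) {
                return false;
            }
            term = sb.toString();
            return true;
        }

        String term() {
            return term;
        }

        void close() throws IOException {
            input.close();
            input = null;
        }
    }

    /** One analysis pass; the same tokenizer instance is reused on every call. */
    static List<String> analyze(SketchTokenizer tokenizer, String text) {
        List<String> tokens = new ArrayList<>();
        try {
            tokenizer.setReader(new StringReader(text));
            tokenizer.reset();
            while (tokenizer.incrementToken()) {
                tokens.add(tokenizer.term());
            }
            tokenizer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        SketchTokenizer tokenizer = new SketchTokenizer(); // created once, no Reader
        System.out.println(analyze(tokenizer, "solr ik analyzer")); // [solr, ik, analyzer]
        System.out.println(analyze(tokenizer, "reused tokenizer")); // [reused, tokenizer]
    }
}
```

Because the text only arrives between setReader() and reset(), anything that depends on it (in IK's case, the segmenter) has to be re-initialised in reset() rather than in the constructor.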
The remaining changes are minor; see the modified source files for details.
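Next, create a separate project (IK-Analyzer-extra in the attached files) and add a factory class, IKTokenizerFactory. Solr instantiates tokenizers through such a factory, and keeping it in its own project makes the code easier to extend and maintain.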
```java
package org.wltea.analyzer.util;

import java.util.Map;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.util.AttributeFactory;
import org.wltea.analyzer.lucene.IKTokenizer;

/** Solr tokenizer factory for IK, referenced from schema.xml. */
public class IKTokenizerFactory extends TokenizerFactory {
    private boolean useSmart;

    public IKTokenizerFactory(Map<String, String> args) {
        super(args);
        // Read the useSmart attribute from schema.xml; defaults to false
        useSmart = getBoolean(args, "useSmart", false);
    }

    @Override
    public Tokenizer create(AttributeFactory attributeFactory) {
        // Uses the new Lucene 5.x style constructor added to IKTokenizer
        return new IKTokenizer(attributeFactory, useSmart);
    }
}
```
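In the factory, getBoolean(args, "useSmart", false) turns the useSmart attribute from schema.xml into a boolean and falls back to false when the attribute is absent. The stdlib-only sketch below models that step, assuming (as we understand Lucene's AbstractAnalysisFactory) that getBoolean removes the key from the map and parses it with Boolean.parseBoolean. The unknown-parameter check is an addition borrowed from a convention in Lucene's own factories and is not part of the factory above; it turns a typo such as usesmart into an error instead of a silent fallback to false. FactoryArgsSketch and Settings are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for IKTokenizerFactory's argument handling (no Lucene classes). */
public class FactoryArgsSketch {

    /** Models getBoolean(args, name, default): consumes the key, then parses it. */
    static boolean getBoolean(Map<String, String> args, String name, boolean defaultVal) {
        String value = args.remove(name);
        return value == null ? defaultVal : Boolean.parseBoolean(value);
    }

    static final class Settings {
        final boolean useSmart;

        /** schemaAttributes: the tokenizer's attributes other than "class". */
        Settings(Map<String, String> schemaAttributes) {
            Map<String, String> args = new HashMap<>(schemaAttributes); // leave caller's map intact
            useSmart = getBoolean(args, "useSmart", false);
            // Extra check (not in the original factory): reject attributes nobody consumed.
            if (!args.isEmpty()) {
                throw new IllegalArgumentException("Unknown parameters: " + args);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> indexAttrs = new HashMap<>();
        indexAttrs.put("useSmart", "true");
        System.out.println(new Settings(indexAttrs).useSmart);      // true
        System.out.println(new Settings(new HashMap<>()).useSmart); // false (default)
    }
}
```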
Then add the following field type to schema.xml:
```xml
<fieldType name="text_ik" class="solr.TextField">
    <!-- tokenizer used at index time -->
    <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.util.IKTokenizerFactory" useSmart="true"/>
    </analyzer>
    <!-- tokenizer used at query time -->
    <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.util.IKTokenizerFactory" useSmart="false"/>
    </analyzer>
</fieldType>
```
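To double-check which mode each analysis phase ends up with, the sketch below parses a fieldType fragment with the JDK's built-in DOM parser and applies the same default as the factory (a missing or unparsable useSmart counts as false). SchemaUseSmartCheck and useSmartByType are hypothetical names; the sketch only reads the XML and does not load any Solr or IK classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: reports the effective useSmart value per analyzer type. */
public class SchemaUseSmartCheck {

    static Map<String, Boolean> useSmartByType(String fieldTypeXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fieldTypeXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse fieldType XML", e);
        }
        Map<String, Boolean> result = new LinkedHashMap<>();
        NodeList analyzers = doc.getElementsByTagName("analyzer");
        for (int i = 0; i < analyzers.getLength(); i++) {
            Element analyzer = (Element) analyzers.item(i);
            Element tokenizer = (Element) analyzer.getElementsByTagName("tokenizer").item(0);
            if (tokenizer == null) {
                continue; // no tokenizer declared for this analyzer
            }
            // getAttribute returns "" when absent; parseBoolean("") is false, like the factory default.
            result.put(analyzer.getAttribute("type"),
                    Boolean.parseBoolean(tokenizer.getAttribute("useSmart")));
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<fieldType name=\"text_ik\" class=\"solr.TextField\">"
                + "<analyzer type=\"index\">"
                + "<tokenizer class=\"org.wltea.analyzer.util.IKTokenizerFactory\" useSmart=\"true\"/>"
                + "</analyzer>"
                + "<analyzer type=\"query\">"
                + "<tokenizer class=\"org.wltea.analyzer.util.IKTokenizerFactory\" useSmart=\"false\"/>"
                + "</analyzer>"
                + "</fieldType>";
        System.out.println(useSmartByType(xml)); // {index=true, query=false}
    }
}
```

For the configuration above this reports smart segmentation at index time and IK's fine-grained segmentation at query time.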
Finally, copy IK-Analyzer-5.3.1.jar and IK-Analyzer-extra-5.3.1.jar into the lib directory of the Solr project, and you are done.
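A quick way to confirm that the two jars actually provide the classes referenced above is to try loading them. The hypothetical helper below calls Class.forName(name, false, loader) for each name and lists the ones that cannot be loaded. Running it from the command line only approximates Solr's web-application class loader; the default class names come from the code above and assume both jars, plus the Lucene jars they depend on, are on -cp.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the class names the given class loader cannot load. */
public class ClasspathCheck {

    static List<String> missingClasses(ClassLoader loader, String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize = false: only locate and load the class, don't run static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers e.g. NoClassDefFoundError when a dependency (Lucene) is missing
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "org.wltea.analyzer.lucene.IKTokenizer",     // expected in IK-Analyzer-5.3.1.jar
            "org.wltea.analyzer.util.IKTokenizerFactory" // expected in IK-Analyzer-extra-5.3.1.jar
        };
        List<String> missing = missingClasses(ClasspathCheck.class.getClassLoader(), names);
        System.out.println(missing.isEmpty() ? "all classes found" : "missing: " + missing);
    }
}
```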
Also note that the IK source code has moved to: http://git.oschina.net/wltea/IK-Analyzer-2012FF/
Project files:
http://pan.baidu.com/s/1skv1jCp
http://pan.baidu.com/s/1c1o0gI8
References:
http://iamyida.iteye.com/blog/2220474
http://iamyida.iteye.com/blog/2193513
Original article: http://www.cnblogs.com/rwxwsblog/p/5048935.html