PHP word indexing, performance and reasonable results
I'm currently developing an indexer for a search feature. The indexer will process data from "fields".
A field looks like this:
Field_id  Field_type  Field_name   Field_Data
101       text        Name         Intel i7
102       integer     Cores        4 physical, 4 virtual
103       select      Vendor       Intel
104       multitext   Description  The i7 is intel's next gen range of cpus.
The indexer would generate the following results/index:
Keyword   Occurrences
intel     101, 103, 104
i7        101, 104
physical  102
virtual   102
next      104
gen       104
range     104
cpus      104 (*)
cpu       104 (*)
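So it all looks more or less fine, but there are some issues I would like to sort out:
- Filtering out common words (as you may have noticed, "the", "is", "of" and "intel's" are missing from the list).
- Regarding "cpus" (plural vs. singular): is it best to index one particular form (singular or plural), both, or the exact word (i.e. "cpus" is different from "cpu")?
- Continuing from the previous item, how can I determine a plural? There are different flavors: test => tests, fish => fish, leaf => leaves.
- I'm currently using MySQL and I'm very concerned about performance; we have 500 categories and we haven't even launched the site yet.
- Suppose I want to use the search term "vendor:intel", where vendor specifies the field name (field_name). Do you think that would have a huge impact on the SQL server?
- Search throttling: I don't like this at all, but it is a possibility. If you know any workarounds, let's hear them!
- There are probably other issues I'm forgetting; if you spot any, you're welcome to shout at me ;-)
- I do not need the search engine to crawl links; in fact, I specifically want it not to crawl links.
(By the way, I'm not biased towards Intel; it just happens that I own an i7-based PC ;-))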
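For reference, the indexer described in the question can be sketched in a few lines of PHP. This is only an illustration, assuming each field is a `field_id => text` pair; the function name `buildIndex`, the tokenizing rules (lowercasing, dropping a trailing `'s`, skipping purely numeric tokens) and the stop-word list are assumptions, not something stated in the question. It does not attempt the singular/plural handling marked (*).

```php
<?php
/**
 * Build a keyword => list-of-field-IDs map (an inverted index).
 * $fields is field_id => text; $stopwords is a list of words to skip.
 * All names and rules here are illustrative assumptions.
 */
function buildIndex(array $fields, array $stopwords): array
{
    $stop  = array_flip($stopwords);
    $index = [];
    foreach ($fields as $fieldId => $text) {
        // Lowercase, then extract runs of letters/digits (an apostrophe may join two runs).
        preg_match_all('/[a-z0-9]+(?:\'[a-z]+)?/', strtolower($text), $m);
        foreach ($m[0] as $token) {
            $token = preg_replace('/\'s$/', '', $token); // intel's -> intel
            if (preg_match('/^\d+$/', $token) || isset($stop[$token])) {
                continue; // skip pure numbers and stop words
            }
            $index[$token][$fieldId] = true; // set semantics: one entry per field
        }
    }
    // Turn the per-keyword sets into plain lists of field IDs.
    return array_map('array_keys', $index);
}

$index = buildIndex(
    [
        101 => 'Intel i7',
        102 => '4 physical, 4 virtual',
        103 => 'Intel',
        104 => "The i7 is intel's next gen range of cpus.",
    ],
    ['the', 'is', 'of']
);
print_r($index);
```

Keeping the index in a PHP array only works for small data sets; with MySQL the same keyword-to-field mapping would live in a table, which is where the performance concern comes from.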
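Answer:
This is a response to your original question, as well as to your later answer/question.
I have used the Sphinx search engine before (a long time ago, so I'm a bit rusty) and found it very good, even if the documentation is a little lacking at times.
I'm sure there are other ways of doing this, whether with your own custom code or with other search engines; Sphinx just happens to be the one I've used. I'm not saying it will do everything you want in exactly the way you want, but I'm reasonably sure it will handle most of it quite easily, and much faster than anything written in PHP/MySQL alone.
I would suggest reading Build a custom search engine with PHP before delving into the Sphinx documentation. If you decide it isn't suitable after reading that, fair enough.
In answer to your specific questions, I have put together some links to the documentation along with relevant quotes:
Filtering out common words (as you may have noticed, "the", "is", "of" and "intel's" are missing from the list)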
Stopwords are the words that will not
be indexed. Typically you’d put most
frequent words in the stopwords list
because they do not add much value to
search results but consume a lot of
resources to process.
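Since the quote suggests putting the most frequent words in the stopwords list, one way to find candidates is to count word frequencies over the data that will be indexed. The following is a rough sketch; `frequentWords` is a hypothetical helper, and its output would still need reviewing by hand before being used as a stopwords list.

```php
<?php
/**
 * Return the $limit most frequent words across a set of texts.
 * Hypothetical helper for picking stop-word candidates.
 */
function frequentWords(array $texts, int $limit): array
{
    $counts = [];
    foreach ($texts as $text) {
        preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
        foreach ($m[0] as $word) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    arsort($counts); // highest counts first, keys (the words) preserved
    $top = array_keys(array_slice($counts, 0, $limit, true));
    // Numeric-looking words become integer keys in PHP arrays; cast them back.
    return array_map('strval', $top);
}

print_r(frequentWords(
    ['The i7 is the next gen range of cpus', 'The vendor is Intel'],
    2
));
```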
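Regarding "cpus" (plural vs. singular): is it best to index one particular form (singular or plural), both, or the exact word (i.e. "cpus" is different from "cpu")?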
Word forms are applied after
tokenizing the incoming text by
charset_table rules. They essentially
let you replace one word with another.
Normally, that would be used to bring
different word forms to a single
normal form (eg. to normalize all the
variants such as “walks”, “walked”,
“walking” to the normal form “walk”).
It can also be used to implement
stemming exceptions, because stemming
is not applied to words found in the
forms list.
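To make the idea concrete, here is a small PHP sketch of word-form normalization: a lookup table maps each variant to its normal form before the word is indexed or searched. The `source > normal` line format is modeled on Sphinx's wordforms file, but treat the exact syntax as an assumption and check the Sphinx documentation; `parseWordforms` and `normalizeWord` are hypothetical names, not Sphinx functions.

```php
<?php
/** Parse "variant > normal" lines into a variant => normal lookup table. */
function parseWordforms(string $text): array
{
    $forms = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $pair = explode('>', $line, 2);
        if (count($pair) !== 2) {
            continue; // blank or malformed line
        }
        $from = strtolower(trim($pair[0]));
        $to   = strtolower(trim($pair[1]));
        if ($from !== '' && $to !== '') {
            $forms[$from] = $to;
        }
    }
    return $forms;
}

/** Replace a word with its normal form if one is listed, else keep it. */
function normalizeWord(string $word, array $forms): string
{
    $word = strtolower($word);
    return $forms[$word] ?? $word;
}

$forms = parseWordforms("walks > walk\nwalked > walk\nwalking > walk\nleaves > leaf");
echo normalizeWord('Walking', $forms), "\n";
echo normalizeWord('leaves', $forms), "\n";
```

With this approach "cpus" and "cpu" end up as a single keyword in the index, so the question of storing both forms goes away.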
Continuing from the previous item, how can I determine a plural? (Different flavors: test => tests, fish => fish, leaf => leaves.)
Sphinx supports the Porter Stemming Algorithm:
The Porter stemming algorithm (or
‘Porter stemmer’) is a process for
removing the commoner morphological
and inflexional endings from words in
English. Its main use is as part of a
term normalisation process that is
usually done when setting up
Information Retrieval systems.
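To give a feel for what a stemmer does with plurals, the sketch below implements only Step 1a of the Porter algorithm (the plural-suffix rules: SSES to SS, IES to I, SS unchanged, S removed). It is deliberately not the full algorithm and not Sphinx's implementation; the function name `porterStep1a` is just a label for this example.

```php
<?php
/**
 * Step 1a of the Porter stemmer only (plural suffixes).
 * The first rule that matches wins; input is expected in lowercase.
 */
function porterStep1a(string $word): string
{
    if (substr($word, -4) === 'sses') {
        return substr($word, 0, -2); // caresses -> caress
    }
    if (substr($word, -3) === 'ies') {
        return substr($word, 0, -2); // ponies -> poni
    }
    if (substr($word, -2) === 'ss') {
        return $word;                // caress -> caress
    }
    if (substr($word, -1) === 's') {
        return substr($word, 0, -1); // cpus -> cpu
    }
    return $word;                    // fish -> fish
}

foreach (['tests', 'fish', 'leaves', 'cpus'] as $w) {
    echo $w, ' => ', porterStep1a($w), "\n";
}
```

Note that this step turns "leaves" into "leave", which still does not match "leaf". Irregular plurals like that are the kind of case the word forms list mentioned earlier is meant for.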
Suppose I want to use the search term "vendor:intel", where vendor specifies the field name (field_name). Do you think that would have a huge impact on the SQL server?
A good example for attributes would be
a forum posts table. Assume that only
title and content fields need to be
full-text searchable – but that
sometimes it is also required to limit
search to a certain author or a
sub-forum (ie. search only those rows
that have some specific values of
author_id or forum_id columns in the
SQL table); or to sort matches by
post_date column; or to group matching
posts by month of the post_date and
calculate per-group match counts.

This can be achieved by specifying all
the mentioned columns (excluding title
and content, that are full-text
fields) as attributes, indexing them,
and then using API calls to setup
filtering, sorting, and grouping.
You can also use the 5.3. Extended query syntax to search specific fields (rather than filtering results by attribute):
field search operator:
@vendor intel
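A user-facing term such as "vendor:intel" could be translated into that field operator before the query is sent. The sketch below is a minimal illustration; `toExtendedQuery` is a hypothetical helper, it only handles single-word `field:value` terms, and it does no escaping of special query characters. As far as I understand, a field operator also applies to the keywords that follow it, so unqualified terms are emitted first; verify this against the extended query syntax section.

```php
<?php
/**
 * Rewrite "field:value" terms as "@field value" and put free terms first.
 * Hypothetical helper; no escaping or phrase handling is attempted.
 */
function toExtendedQuery(string $search): string
{
    $free   = [];
    $scoped = [];
    foreach (preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY) as $term) {
        if (preg_match('/^([a-z_][a-z0-9_]*):(.+)$/i', $term, $m)) {
            $scoped[] = '@' . $m[1] . ' ' . $m[2]; // vendor:intel -> @vendor intel
        } else {
            $free[] = $term; // searched in all fields
        }
    }
    return implode(' ', array_merge($free, $scoped));
}

echo toExtendedQuery('vendor:intel'), "\n";
echo toExtendedQuery('vendor:intel i7'), "\n";
```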
How does the search engine index a set of fields and tie the phrases/keywords/etc. it finds to a specific field ID?
On success, Query() returns a result set that contains some of the found matches (as requested by SetLimits()) and additional general per-query statistics. The result set is a hash (PHP specific; other languages might utilize other structures instead of hash) with the following keys and values:
“matches”:
Hash which maps found document IDs to another small hash containing document weight and attribute values (or an array of the similar small hashes if SetArrayResult() was enabled).
“total”:
Total amount of matches retrieved on server (ie. to the server side result set) by this query. You can retrieve up to this amount of matches from server for this query text with current query settings.
“total_found”:
Total amount of matching documents in index (that were found and processed on server).
“words”:
Hash which maps query keywords (case-folded, stemmed, and otherwise processed) to a small hash with per-keyword statistics (“docs”, “hits”).
“error”:
Query error message reported by searchd (string, human readable). Empty if there were no errors.
“warning”:
Query warning message reported by searchd (string, human readable). Empty if there were no warnings.
See also Listing 11 and Listing 13.
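As a rough illustration of working with a result set shaped like the one quoted above, the sketch below pulls out the matched document IDs with their weights and the per-keyword document counts. It assumes each field was indexed as one document whose ID is the field ID, and that SetArrayResult() was not enabled; `summarizeResult` and the sample array are made up for this example, so no running searchd is needed to try it.

```php
<?php
/**
 * Reduce a Query()-style result hash to the parts needed to tie
 * keywords back to field IDs. Assumes document ID == field ID.
 */
function summarizeResult(array $result): array
{
    $summary = [
        'field_ids'   => [], // document (field) ID => weight
        'keywords'    => [], // processed keyword => number of matching documents
        'total_found' => (int) ($result['total_found'] ?? 0),
    ];
    foreach ($result['matches'] ?? [] as $docId => $match) {
        $summary['field_ids'][$docId] = $match['weight'];
    }
    foreach ($result['words'] ?? [] as $word => $stats) {
        $summary['keywords'][$word] = $stats['docs'];
    }
    return $summary;
}

// Hand-written sample in the documented shape, not real searchd output.
$sample = [
    'matches' => [
        101 => ['weight' => 2, 'attrs' => []],
        103 => ['weight' => 2, 'attrs' => []],
        104 => ['weight' => 1, 'attrs' => []],
    ],
    'total'       => 3,
    'total_found' => 3,
    'words'       => ['intel' => ['docs' => 3, 'hits' => 3]],
    'error'       => '',
    'warning'     => '',
];
print_r(summarizeResult($sample));
```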