Filters in HBase (or intra row scanning part II)

Filters in HBase are a somewhat obscure and under-documented feature. (Even we committers are often not aware of their usefulness; see HBASE-5229 and HBASE-4256. Or maybe it's just me...)

Intra-row scanning can be done using ColumnRangeFilter. Other filters such as ColumnPrefixFilter or MultipleColumnPrefixFilter might also be handy for this. All three filters have in common that they can provide scanners (see scanning in HBase) with what I will call "seek hints". These hints allow a scanner to seek to the next column, the next row, or an arbitrary next cell determined by the filter. This is far more efficient than having a dumb filter that is passed each cell and only decides whether that cell is included in the result or not.
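
To make the difference concrete, here is a minimal stdlib-only sketch of a column range lookup inside one wide row. It does not use any HBase classes: a TreeMap stands in for the sorted cells of a row, and RangeSeekSketch, wideRow, examinedWithoutSeek and examinedWithSeek are hypothetical names made up for this illustration. It only models the idea of seeking; it is not how ColumnRangeFilter is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one wide row: qualifiers are kept sorted, as HBase keeps them.
// All names here are hypothetical; nothing in this class is HBase API.
final class RangeSeekSketch {

    // Builds a row with n columns; zero padding makes string order match numeric order.
    static NavigableMap<String, String> wideRow(int n) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            row.put(String.format(Locale.ROOT, "col%07d", i), "v" + i);
        }
        return row;
    }

    // "Dumb" filter: every cell is passed to the filter, which answers include or skip.
    // Returns the number of cells examined.
    static int examinedWithoutSeek(NavigableMap<String, String> row,
                                   String min, String max, List<String> matches) {
        int examined = 0;
        for (String qualifier : row.keySet()) {
            examined++;
            if (qualifier.compareTo(min) >= 0 && qualifier.compareTo(max) <= 0) {
                matches.add(qualifier);
            }
        }
        return examined;
    }

    // Seek style: jump to the first qualifier >= min, stop at the first one > max.
    static int examinedWithSeek(NavigableMap<String, String> row,
                                String min, String max, List<String> matches) {
        int examined = 0;
        String qualifier = row.ceilingKey(min);       // the "seek"
        while (qualifier != null) {
            examined++;
            if (qualifier.compareTo(max) > 0) {
                break;                                // models "move on to the next row"
            }
            matches.add(qualifier);
            qualifier = row.higherKey(qualifier);
        }
        return examined;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = wideRow(100_000);
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        System.out.println(examinedWithoutSeek(row, "col0050000", "col0050009", a));
        System.out.println(examinedWithSeek(row, "col0050000", "col0050009", b));
    }
}
```

For a row of 100,000 columns and a range covering 10 of them, the first method examines all 100,000 cells while the second examines 11 (the 10 matches plus the first cell past the range). Both return the same matches.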

Many other filters also provide these "seek hints". The exceptions are filters that filter on column values: there is no inherent ordering between column values, so these filters need to look at the value of each column.

For example, check out this code in MultipleColumnPrefixFilter (ASF 2.0 license):

    // Prefixes that sort at or before the current column qualifier.
    TreeSet<byte []> lesserOrEqualPrefixes =
        (TreeSet<byte []>) sortedPrefixes.headSet(qualifier, true);
    if (lesserOrEqualPrefixes.size() != 0) {
        byte [] largestPrefixSmallerThanQualifier = lesserOrEqualPrefixes.last();
        if (Bytes.startsWith(qualifier, largestPrefixSmallerThanQualifier)) {
            return ReturnCode.INCLUDE;
        }
        if (lesserOrEqualPrefixes.size() == sortedPrefixes.size()) {
            // No larger prefix is left, so nothing else in this row can match.
            return ReturnCode.NEXT_ROW;
        } else {
            hint = sortedPrefixes.higher(largestPrefixSmallerThanQualifier);
            return ReturnCode.SEEK_NEXT_USING_HINT;
        }
    } else {
        hint = sortedPrefixes.first();
        return ReturnCode.SEEK_NEXT_USING_HINT;
    }

(The hint is used later to skip ahead to that column prefix.)

See how this code snippet allows the filter to:

- seek to the next row if all prefixes are known to be less than or equal to the current qualifier (and the largest didn't match the passed column qualifier). Note that a single seek to the next row can potentially skip millions of columns.
- seek to the next larger prefix if there are more prefixes, but the current one does not match the qualifier.
- seek to the first prefix (the smallest) if none of the prefixes are less than or equal to the current qualifier.
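
The three cases can also be tried out without an HBase cluster. The following is a rough stdlib-only model of the same decision logic, with Strings standing in for byte[] qualifiers. PrefixSeekSketch, decide, scanRow and demoRow are hypothetical names, not HBase API, and the scan loop is only an assumption about how a scanner might act on the return codes. The sketch also assumes that no prefix in the set is itself a prefix of another one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

// Stdlib-only model of the quoted decision logic. Hypothetical names, not HBase API.
final class PrefixSeekSketch {

    enum Code { INCLUDE, NEXT_ROW, SEEK_NEXT_USING_HINT }

    final NavigableSet<String> sortedPrefixes;
    String hint;   // set whenever decide() returns SEEK_NEXT_USING_HINT

    PrefixSeekSketch(List<String> prefixes) {
        this.sortedPrefixes = new TreeSet<>(prefixes);
    }

    Code decide(String qualifier) {
        NavigableSet<String> lesserOrEqual = sortedPrefixes.headSet(qualifier, true);
        if (lesserOrEqual.isEmpty()) {
            hint = sortedPrefixes.first();            // case 3: seek to the smallest prefix
            return Code.SEEK_NEXT_USING_HINT;
        }
        String largest = lesserOrEqual.last();
        if (qualifier.startsWith(largest)) {
            return Code.INCLUDE;
        }
        if (lesserOrEqual.size() == sortedPrefixes.size()) {
            return Code.NEXT_ROW;                     // case 1: no larger prefix remains
        }
        hint = sortedPrefixes.higher(largest);        // case 2: seek to the next larger prefix
        return Code.SEEK_NEXT_USING_HINT;
    }

    // Drives decide() over one sorted row; returns how many cells the filter was asked about.
    int scanRow(NavigableSet<String> row, List<String> matches) {
        int examined = 0;
        String q = row.isEmpty() ? null : row.first();
        while (q != null) {
            examined++;
            Code code = decide(q);
            if (code == Code.NEXT_ROW) {
                break;
            } else if (code == Code.INCLUDE) {
                matches.add(q);
                q = row.higher(q);
            } else {
                q = row.ceiling(hint);                // skip ahead using the hint
            }
        }
        return examined;
    }

    // A row with 3005 columns in five groups; only "evt:" and "tag:" are wanted.
    static NavigableSet<String> demoRow() {
        NavigableSet<String> row = new TreeSet<>();
        addGroup(row, "addr:", 1000);
        addGroup(row, "evt:", 3);
        addGroup(row, "meta:", 1000);
        addGroup(row, "tag:", 2);
        addGroup(row, "zone:", 1000);
        return row;
    }

    static void addGroup(NavigableSet<String> row, String prefix, int n) {
        for (int i = 0; i < n; i++) {
            row.add(String.format(Locale.ROOT, "%s%04d", prefix, i));
        }
    }

    public static void main(String[] args) {
        PrefixSeekSketch filter = new PrefixSeekSketch(Arrays.asList("tag:", "evt:"));
        List<String> matches = new ArrayList<>();
        int examined = filter.scanRow(demoRow(), matches);
        System.out.println(examined + " cells examined, " + matches.size() + " matched");
    }
}
```

In this toy row the filter is consulted for 8 of the 3005 cells and returns the 5 matching columns: one seek jumps over the "addr:" group, one over the "meta:" group, and NEXT_ROW cuts off the "zone:" group.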

If you didn't feel like looking at the code, you can take away from this that these filters can be safely and efficiently used in very wide rows. If a filter could only indicate INCLUDE or SKIP, it would be forced to examine every version of every column of every row, which would make it inefficient for wide rows with hundreds of thousands or millions of columns.

I'm in the process of adding more information about these filters to the HBase ~~Book~~ Reference Guide.

Original article: Filters in HBase (or intra row scanning part II). Thanks to the original author for sharing.
