python - How to use elasticsearch.helpers.streaming_bulk

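Can someone advise how to use the function elasticsearch.helpers.streaming_bulk instead of elasticsearch.helpers.bulk to index data into Elasticsearch?

If I simply change bulk to streaming_bulk, nothing gets indexed, so I guess it needs to be used in a different form.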

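The code below creates the index, the type, and the mapping, then reads a CSV file and indexes its data into Elasticsearch in chunks of 500 elements. It works fine, but I am wondering whether the performance can be improved. That is why I want to try the streaming_bulk function.

Currently it takes 10 minutes to index 1 million rows from a 200 MB CSV document. I use two machines: CentOS 6.6, 8 CPUs, x86_64, CPU MHz: 2499.902, memory: 15.574 GB.
I am not sure whether it can go any faster.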

import csv
import json

import elasticsearch
from elasticsearch import helpers

# config, file_to_index, fields, id_name, id_increment and
# transform_row_for_indexing are defined elsewhere in the script.
es = elasticsearch.Elasticsearch([{'host': 'uxmachine-test', 'port': 9200}])
index_name = 'new_index'
type_name = 'new_type'
with open(config["index_mapping"]) as mapping_file:
    mapping = json.load(mapping_file)  # read mapping from json file

es.indices.create(index=index_name)
es.indices.put_mapping(index=index_name, doc_type=type_name, body=mapping)

with open(file_to_index, newline='') as csvfile:
    reader = csv.reader(csvfile)  # read documents for indexing from CSV file, more than million rows
    batch_chunks = []

    for row in reader:
        var = transform_row_for_indexing(row, fields, index_name, type_name, id_name, id_increment)
        id_increment = id_increment + 1
        batch_chunks.append(var)
        if len(batch_chunks) == 500:  # send a full batch of 500 actions
            helpers.bulk(es, batch_chunks)
            del batch_chunks[:]
            print("ispucalo batch")
    # indexing of last batch_chunk
    if len(batch_chunks) != 0:
        helpers.bulk(es, batch_chunks)

Solution:

streaming_bulk returns an iterator. This means that nothing happens until you start iterating over it. The code of the bulk function looks like this:

success, failed = 0, 0

# list of errors to be collected if not stats_only
errors = []

for ok, item in streaming_bulk(client, actions, **kwargs):
    # go through request-response pairs and detect failures
    if not ok:
        if not stats_only:
            errors.append(item)
        failed += 1
    else:
        success += 1

return success, failed if stats_only else errors

So basically, just calling streaming_bulk(client, actions, **kwargs) does not actually do anything. Indexing only starts once you iterate over it, as this for loop does.
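
The lazy behaviour can be seen without a cluster. The following is a minimal sketch that uses a hypothetical stand-in generator, fake_streaming_bulk, which is not part of elasticsearch-py; it only mimics the (ok, item) pairs that the real helper yields, and the shape of each item is an assumption made for illustration.

```python
def fake_streaming_bulk(actions, sent):
    """Hypothetical stand-in for streaming_bulk: records each action in
    `sent` and yields an (ok, item) pair for it."""
    for action in actions:
        sent.append(action)  # pretend the action was sent to the cluster
        yield True, {"index": {"_id": action["_id"]}}


sent = []
actions = [{"_id": 1}, {"_id": 2}, {"_id": 3}]

results = fake_streaming_bulk(actions, sent)  # only creates the generator
sent_before = len(sent)                       # nothing has been sent yet

for ok, item in results:                      # iterating drives the work
    pass
sent_after = len(sent)                        # every action has been sent

print(sent_before, sent_after)  # prints "0 3"
```

Creating the generator sends nothing; the actions are only processed while the for loop consumes it.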

So in your code, you are welcome to change bulk to streaming_bulk, but you need to iterate over the results of streaming_bulk in order to actually index anything.
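
Applied to the loop in the question, one possible shape is the sketch below. It is not the asker's code: generate_actions, count_results and index_csv are hypothetical helpers, generate_actions stands in for transform_row_for_indexing with an assumed column layout, and chunk_size=500 and raise_on_error=False are assumptions that may not suit every setup. The document type is left out because recent Elasticsearch versions removed types. The elasticsearch import sits inside index_csv so that the other two helpers can be tried without a cluster.

```python
import csv
import io


def generate_actions(csvfile, index_name, fields):
    """Yield one bulk action per CSV row (hypothetical row layout)."""
    for doc_id, row in enumerate(csv.reader(csvfile), start=1):
        yield {"_index": index_name, "_id": doc_id,
               "_source": dict(zip(fields, row))}


def count_results(results):
    """Consume (ok, item) pairs; this loop is what triggers the indexing."""
    success, errors = 0, []
    for ok, item in results:
        if ok:
            success += 1
        else:
            errors.append(item)
    return success, errors


def index_csv(es, csvfile, index_name, fields, chunk_size=500):
    """Stream a CSV file into Elasticsearch in chunks of `chunk_size`."""
    from elasticsearch import helpers  # requires the elasticsearch package
    actions = generate_actions(csvfile, index_name, fields)
    results = helpers.streaming_bulk(es, actions, chunk_size=chunk_size,
                                     raise_on_error=False)
    return count_results(results)


# Dry run without a cluster: check the generated actions and the counting loop.
sample = io.StringIO("alice,30\nbob,25\n")
actions = list(generate_actions(sample, "new_index", ["name", "age"]))
success, errors = count_results([(True, {}), (False, {"index": {"status": 400}})])
```

With a real client, the call would look like index_csv(es, open(file_to_index, newline=''), index_name, fields). Because bulk is itself a thin loop over streaming_bulk, as the excerpt above shows, the switch on its own is unlikely to change throughput much; its main benefit is that batches no longer have to be built and held in memory by hand.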

Article link: https://qyyshop.com/info/693687.html

Source: [anonymous]
