Indexing Documents in Elasticsearch
admin
2023-01-27 16:37:46

This content is mainly translated from the official documentation, version 7.10.

1. Indexing a document (using curl)

curl -X PUT "localhost:9200/twitter/_doc/1" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'


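-X option: sets the HTTP method curl uses. The default is GET; PUT, POST, and DELETE are also available.

-H option: passes a request header.

-d option: supplies the request body (data).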
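
For readers who prefer a script to the command line, the three curl options map directly onto the fields of an HTTP request. The following is a minimal sketch using only the Python standard library; it builds the request without sending it, and it assumes a node listening on localhost:9200 as in the curl example.

```python
import json
import urllib.request

# Same document body as in the curl example.
body = {
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "trying out Elasticsearch",
}

# Build (but do not send) the request that the curl command describes.
request = urllib.request.Request(
    "http://localhost:9200/twitter/_doc/1",
    data=json.dumps(body).encode("utf-8"),          # curl -d
    headers={"Content-Type": "application/json"},   # curl -H
    method="PUT",                                   # curl -X
)

# Sending it requires a running node, which this sketch only assumes:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```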


If the index does not exist, it is created automatically. This behavior can be controlled with the action.auto_create_index setting.

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "twitter,index10,-index1*,+ind*"
    }
}

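Note: this allows automatic creation of indices named twitter or index10, blocks indices that match index1*, and allows any other index that matches ind*. Patterns are evaluated in the order listed.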
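
The ordered matching can be easier to follow as code. The sketch below is a simplified model, not Elasticsearch's implementation: the function names are hypothetical, only the * wildcard is handled, and it assumes that the first matching pattern decides and that a name matching no pattern is rejected.

```python
import re


def parse_patterns(setting: str):
    """Split the setting into (pattern, allow) pairs, keeping their order."""
    rules = []
    for raw in setting.split(","):
        raw = raw.strip()
        allow = not raw.startswith("-")      # "-" blocks, "+" or nothing allows
        pattern = raw[1:] if raw[:1] in ("+", "-") else raw
        rules.append((pattern, allow))
    return rules


def wildcard_match(pattern: str, name: str) -> bool:
    """Match a name against a pattern where * stands for any characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def may_auto_create(setting: str, index_name: str) -> bool:
    """Return True if this model would allow auto-creating index_name."""
    if setting == "true":
        return True
    if setting == "false":
        return False
    for pattern, allow in parse_patterns(setting):
        if wildcard_match(pattern, index_name):
            return allow                     # first matching pattern wins
    return False                             # no pattern matched


SETTING = "twitter,index10,-index1*,+ind*"
for name in ("twitter", "index10", "index11", "indigo", "logs"):
    print(name, may_auto_create(SETTING, name))
```

Here index10 is allowed because its exact name appears before the blocking pattern -index1*, while index11 reaches that pattern first and is rejected.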


PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "false"
    }
}

Note: this disables automatic index creation entirely. Indexing into a missing index then fails with an error such as: {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [mytwitter]"


PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "true"
    }
}

Note: this allows automatic creation of every index, which is the default.


Under the default mapping rules, an index may contain only one type.

For example, attempting to create a second type named mydoc:

curl -X PUT "localhost:9200/twitter/mydoc/1" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'

This produces an error: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Rejecting mapping update to [twitter] as the final mapping would have more than 1 type: [_doc, mydoc]"}],"type":"illegal_argument_exception","reason":"Rejecting mapping update to [twitter] as the final mapping would have more than 1 type


2. The op_type option (create only; an existing document is never overwritten):

curl -X PUT "localhost:9200/twitter/_doc/1?op_type=create" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'

If the document twitter/_doc/1 already exists, the create operation fails.


An equivalent form:

curl -X PUT "localhost:9200/twitter/_create/1" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'
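
The difference between a plain index request and a create request can be modeled with an in-memory dictionary. This is an illustrative sketch only: index_document and VersionConflict are hypothetical names, and the dictionary stands in for an index.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 response in this sketch."""


def index_document(store: dict, doc_id: str, source: dict, op_type: str = "index") -> str:
    """Store a document; with op_type='create', refuse to overwrite."""
    exists = doc_id in store
    if op_type == "create" and exists:
        raise VersionConflict(f"document [{doc_id}] already exists")
    store[doc_id] = source
    return "updated" if exists else "created"


store = {}
print(index_document(store, "1", {"user": "kimchy"}, op_type="create"))
print(index_document(store, "1", {"user": "kimchy"}))  # plain index overwrites
try:
    index_document(store, "1", {"user": "kimchy"}, op_type="create")
except VersionConflict as err:
    print("conflict:", err)
```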


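Automatic document ID generation:

If no document ID is specified, Elasticsearch generates a unique one. Such a request therefore always creates a new document and never updates an existing one.

Example: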

curl -X POST "localhost:9200/twitter/_doc/" -H 'Content-Type: application/json' -d'
{
    "user" : "mjj",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "test Elasticsearch"
}
'

Response (partial): {"_index":"twitter","_type":"_doc","_id":"olLK42oBqV8-hMggVV3X"


3. Optimistic concurrency control:

Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409. See Optimistic concurrency control for more details.

In other words, every index response includes the sequence number (_seq_no) and primary term (_primary_term) assigned to that change. A client that wants a conditional write sends the values it last read as if_seq_no and if_primary_term. If another writer has modified the document in the meantime, the stored values no longer match and the request fails with a 409 error.
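
The compare-then-write behavior can be sketched with a small in-memory model. The Shard and StoredDoc classes below are hypothetical and greatly simplified (one shard, a fixed primary term); they only illustrate how a stale if_seq_no leads to a conflict.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Stands in for the HTTP 409 response in this sketch."""


@dataclass
class StoredDoc:
    source: dict
    seq_no: int
    primary_term: int


class Shard:
    """Toy single-shard model that assigns increasing sequence numbers."""

    def __init__(self):
        self.docs = {}
        self.next_seq_no = 0
        self.primary_term = 1   # assumed constant in this sketch

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            expected = (if_seq_no, if_primary_term)
            # The write is conditional: it must match the last modification.
            if current is None or (current.seq_no, current.primary_term) != expected:
                raise VersionConflict(f"document [{doc_id}] was modified by another writer")
        doc = StoredDoc(source, self.next_seq_no, self.primary_term)
        self.next_seq_no += 1
        self.docs[doc_id] = doc
        return doc


shard = Shard()
first = shard.index("1", {"message": "v1"})
# Writer A updates using the values it read; this succeeds.
shard.index("1", {"message": "v2"}, if_seq_no=first.seq_no, if_primary_term=first.primary_term)
# Writer B still holds the old values; its write is rejected.
try:
    shard.index("1", {"message": "v3"}, if_seq_no=first.seq_no, if_primary_term=first.primary_term)
except VersionConflict as err:
    print("conflict:", err)
```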


4. Routing (which physical shard stores the document)

By default, shard placement (or routing) is controlled by using a hash of the document's id value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter. For example:


POST twitter/_doc?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}


In the example above, the "_doc" document is routed to a shard based on the routing parameter provided: "kimchy".


When setting up explicit mapping, the _routing field can be optionally used to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.

In short: by default, the target shard is chosen by hashing the document id. Passing the routing parameter makes the hash function use the supplied value instead.

The _routing field can also be configured in the mapping. If it is marked as required and a request supplies no routing value, the index operation fails with an error.
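
The idea of hashing a routing value to pick a shard can be sketched in a few lines. This is a simplification under stated assumptions: zlib.crc32 is a stand-in hash chosen for the sketch (it is not the hash function Elasticsearch uses), the shard count of 5 is arbitrary, and the function names are hypothetical.

```python
import zlib
from typing import Optional


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number via hash modulo shard count."""
    # crc32 is only a deterministic stand-in hash for this sketch.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def shard_for(doc_id: str, num_primary_shards: int, routing: Optional[str] = None) -> int:
    """Use the explicit routing value if given, otherwise the document id."""
    value = routing if routing is not None else doc_id
    return pick_shard(value, num_primary_shards)


# Documents with different ids but the same routing value share a shard.
print(shard_for("1", 5, routing="kimchy") == shard_for("2", 5, routing="kimchy"))
```

Because the result depends only on the routing value and the shard count, every document indexed with routing=kimchy lands on the same shard, which is what makes routed searches cheaper.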


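5. Wait For Active Shards

By default, an index operation proceeds as soon as the primary shard is active.

This can be changed with index.write.wait_for_active_shards, which requires a given number of shard copies to be active before the operation starts. The default is 1 (the primary shard counts as one copy).

If it is set to 2, the primary and at least one replica shard must be active; otherwise the operation waits until they are, or times out. The check happens before the write begins, so it reduces, but does not eliminate, the chance that fewer copies than requested receive the change.

If index.write.wait_for_active_shards is set to all, every copy must be active, that is, number_of_replicas + 1. When the cluster cannot allocate that many copies (for example, because it has fewer nodes than copies), the operation cannot proceed until another node joins.

number_of_replicas counts only replica shards, whereas the active shard count also includes the primary shard.

Example: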

For example, suppose we have a cluster of three nodes, A, B, and C and we create an index named index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If we attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down, and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all 3 nodes are up), then the indexing operation will require 3 active shard copies before proceeding, a requirement which should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if we set wait_for_active_shards to all (or to 4, which is the same), the indexing operation will not proceed as we do not have all 4 copies of each shard active in the index. The operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.
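
The arithmetic in this example can be checked with a short sketch. The function names are hypothetical; the model only compares the number of required copies with the number of active ones and ignores timeouts.

```python
def required_copies(wait_for_active_shards, number_of_replicas: int) -> int:
    """Resolve the setting to a number of shard copies (primary included)."""
    total_copies = number_of_replicas + 1    # replicas plus the primary
    if wait_for_active_shards == "all":
        return total_copies
    return int(wait_for_active_shards)


def can_proceed(wait_for_active_shards, number_of_replicas: int, active_copies: int) -> bool:
    """True if enough shard copies are active for the operation to start."""
    return active_copies >= required_copies(wait_for_active_shards, number_of_replicas)


# Three nodes, number_of_replicas = 3: at most 3 of the 4 copies can be active.
print(can_proceed(1, 3, active_copies=1))      # default: primary only
print(can_proceed(3, 3, active_copies=3))      # all three nodes up
print(can_proceed("all", 3, active_copies=3))  # needs a fourth copy
```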

 

6. Noop updates


When updating a document using the index API a new version of the document is always created even if the document hasn't changed. If this isn't acceptable use the _update API with detect_noop set to true. This option isn't available on the index API because the index API doesn't fetch the old source and isn't able to compare it against the new source.


There isn't a hard and fast rule about when noop updates aren't acceptable. It's a combination of lots of factors like how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.

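In short: when a document is updated through the index API, a new version is created whether or not the content actually changed. If that is unacceptable, use the _update API with detect_noop set to true. The option does not exist on the index API because the index API never fetches the old source to compare it with the new one.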
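
The contrast between the two APIs can be modeled as follows. This is a rough sketch with hypothetical function names: the store is a plain dictionary, and the update merges fields shallowly, which is simpler than what the real _update API does.

```python
def index_api(store: dict, doc_id: str, source: dict) -> str:
    """Always writes a new version, even if the source is unchanged."""
    version = store[doc_id]["version"] + 1 if doc_id in store else 1
    store[doc_id] = {"source": dict(source), "version": version}
    return "created" if version == 1 else "updated"


def update_api(store: dict, doc_id: str, partial_doc: dict, detect_noop: bool = True) -> str:
    """Fetches the old source, merges, and skips the write if nothing changed."""
    current = store[doc_id]
    merged = {**current["source"], **partial_doc}   # shallow merge (simplified)
    if detect_noop and merged == current["source"]:
        return "noop"
    store[doc_id] = {"source": merged, "version": current["version"] + 1}
    return "updated"


store = {}
index_api(store, "1", {"user": "kimchy"})
print(index_api(store, "1", {"user": "kimchy"}), store["1"]["version"])
print(update_api(store, "1", {"user": "kimchy"}), store["1"]["version"])
```

Re-indexing the same source bumps the version, while the update with noop detection leaves the stored document untouched.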


7. Timeout

The primary shard assigned to perform the index operation might not be available when the index operation is executed. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the index operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error. The timeout parameter can be used to explicitly specify how long it waits. Here is an example of setting it to 5 minutes:

curl -X PUT "localhost:9200/twitter/_doc/1?timeout=5m" -H 'Content-Type: application/json' -d'
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
'

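In short: if the primary shard is unavailable when the operation runs, the index operation waits for it. By default it waits up to one minute and then fails with a timeout error. The example above raises the timeout to 5 minutes.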


8. Versioning

Each indexed document is given a version number. By default, internal versioning is used that starts at 1 and increments with each update, deletes included. Optionally, the version number can be set to an external value (for example, if maintained in a database). To enable this functionality, version_type should be set to external. The value provided must be a numeric, long value greater than or equal to 0, and less than around 9.2e+18.

When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. For example:

curl -X PUT "localhost:9200/twitter/_doc/1?version=2&version_type=external" -H 'Content-Type: application/json' -d'
{
    "message" : "elasticsearch now has versioning support, double cool!"
}
'


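Document version control.

Every indexed document has a version number. By default it is managed internally by Elasticsearch: it starts at 1 and increases with every update or delete.

The version can also be controlled by an external system by setting version_type to external and passing a version value. The request fails with a version conflict if the supplied version is less than or equal to the stored one; it succeeds only when the supplied version is greater. The example above sets the version explicitly.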
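
The external version check reduces to a single comparison, sketched below. The names are hypothetical, the store is a plain dictionary, and the upper bound of 2**63 is an assumption based on the "long value ... less than around 9.2e+18" wording.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 response in this sketch."""


def index_with_external_version(store: dict, doc_id: str, source: dict, version: int) -> int:
    """Index a document only if the supplied version is strictly greater."""
    if not 0 <= version < 2**63:          # assumed bounds of a signed long
        raise ValueError("version must be a non-negative long value")
    current = store.get(doc_id)
    if current is not None and version <= current["version"]:
        raise VersionConflict(
            f"current version [{current['version']}] is higher or equal to the one provided [{version}]"
        )
    store[doc_id] = {"source": source, "version": version}
    return version


store = {}
index_with_external_version(store, "1", {"message": "first"}, version=2)
index_with_external_version(store, "1", {"message": "second"}, version=5)
try:
    index_with_external_version(store, "1", {"message": "stale"}, version=5)
except VersionConflict as err:
    print("conflict:", err)
```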

