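A brief introduction to Elasticsearch. Note: these notes are based on Elasticsearch 7; version 8 has removed the concept of types.
Elasticsearch (ES for short) is an open-source search engine built on Apache Lucene™. Lucene is arguably the most advanced, best-performing, and most feature-complete search engine library available, open source or proprietary. Note, though, that Lucene is only a library: to make use of it you have to work in Java and integrate it into your application yourself.
Lucene is also very complex, and you need a solid understanding of information retrieval to see how it works. It is a bit like starting with servlets before learning Spring MVC: tedious and complicated. Solr and Elasticsearch grew out of this need. Elasticsearch is written in Java and uses Lucene for indexing and searching, but its goal is to make full-text search simple and to hide Lucene's complexity behind a simple, consistent RESTful API.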
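The introduction above is taken from the blog post "Elasticsearch基本概念" by 波斯_辣椒, reproduced under the CC 4.0 BY-SA license. (A bit lazy of me, but introductions all read much the same (❁´◡`❁).)
Note: Solr is another search server built on Lucene.
Elasticsearch: 权威指南 | Elastic. Chinese documentation; possibly out of date.
Elasticsearch Guide | Elastic. English documentation.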
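1. Installing Elasticsearch
For the Docker commands used here, see the blog post "Docker命令_各种参数简介".
Download via Docker
The version used here is 7.6.2.

    docker pull elasticsearch:7.6.2

Configuration

    # Directories on the host for the configuration and the data
    mkdir -p /mydata/elasticsearch/config
    mkdir -p /mydata/elasticsearch/data
    # Allow access from any host
    echo "http.host: 0.0.0.0" > /mydata/elasticsearch/config/elasticsearch.yml
    # Make the directories readable and writable from inside the container
    chmod -R 777 /mydata/elasticsearch/
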
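Starting Elasticsearch
The command below has to be run as a whole, with no comments between the continued lines. Notes:
docker run is only used the first time (afterwards use docker start). It creates a new container from the image and starts it, so it does not take an existing container ID or name.
docker start starts an existing container again. Use it for every later restart; it needs the container ID or name.
View the startup log: docker logs <container name or ID>
If the installation or configuration went wrong, you can delete the container (not the image):
docker stop <container ID>
docker rm <container ID>
and then run docker run again.

    docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
      -e "discovery.type=single-node" \
      -e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
      -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
      -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
      -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
      -d elasticsearch:7.6.2
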
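Start on boot (optional)

    docker update --restart=always elasticsearch

Open ports 9200 and 9300.
Access
Open http://<host IP>:9200 directly; 9200 is the port mapped above. If the installation failed, check the log with docker logs <container ID>.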
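2. Installing Kibana
Download via Docker; the version must match the Elasticsearch version.

    docker pull kibana:7.6.2

Start Kibana and tell it where Elasticsearch is. Use one of the two commands below, not both.

    # Option 1: reach Elasticsearch through the host's IP address
    docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.6.128:9200 -p 5601:5601 -d kibana:7.6.2

    # Option 2: link the containers and reach Elasticsearch by its container name
    # (localhost inside the Kibana container would point at Kibana itself)
    docker run -d -p 5601:5601 --link elasticsearch -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" kibana:7.6.2

Open port 5601.
Accessing Kibana
If the installation succeeded, Kibana is reachable at http://<host IP>:5601. If it failed, check the log with docker logs <container ID>.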
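3. Elasticsearch concepts
Basic concepts and their rough relational-database counterparts:
Index (indices) ------------------- Database
Type ---------------------- Table. One or more types could be defined under an index. (Deprecated in ES 7, removed in ES 8.)
Document --------------- Row. Stored as JSON.
Field --------------------- Column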
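Forward index:
A structure that uses each document's unique ID as the key and the document content as the record, much like a primary-key ID in a relational database.

docID | value
1     | 动态规划 (dynamic programming)
2     | 动态壁纸超好看 (live wallpapers look great)
3     | 好看动态图 (good-looking animated images)
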
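Inverted index
A structure that uses the words in the document content as keys and the docIDs of the documents containing each word as the record.
First, the documents are numbered as in the forward index, so that each one has a unique identifier (the docID in the table above).
Then the field values are tokenized. (This is why there are so many different analyzers.)
Finally, the inverted index table is built from the tokens. A term is a word; its posting list records which docIDs have a value containing that word.

term | posting list
动态 | 1, 2, 3
规划 | 1
好看 | 2, 3
壁纸 | 2
图   | 3
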
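These words are the terms, and the posting list stores, for each term, the IDs of all documents that contain it.
When you search for 动态高清壁纸 ("live HD wallpaper"), the query is split into terms such as 动态, 高清 and 壁纸. Document 2 is hit twice (动态 and 壁纸), while documents 1 and 3 are each hit once (动态). After a series of scoring algorithms, 动态壁纸超好看 gets the highest score. Every document that was hit is returned, but the others score lower.
Of course, storing the index this naively would use a lot of memory, and looking up a term would be slow; Lucene's actual implementation underneath is considerably more sophisticated.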
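The tables above can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, not how Lucene stores its index: the token lists are written by hand (an assumption standing in for a real analyzer), and the "score" is just the number of query terms that hit a document.

```python
from collections import defaultdict

# Pre-tokenized documents. Assumption: these token lists are hand-written
# and stand in for the output of a real analyzer.
DOCS = {
    1: ["动态", "规划"],
    2: ["动态", "壁纸", "好看"],
    3: ["好看", "动态", "图"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of docIDs that contain it."""
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def count_hits(index, query_terms):
    """Count, per docID, how many of the query terms hit the document."""
    hits = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term, []):
            hits[doc_id] += 1
    return dict(hits)


inverted = build_inverted_index(DOCS)
hit_counts = count_hits(inverted, ["动态", "高清", "壁纸"])
# Most hits first; ties are ordered by docID.
ranking = sorted(hit_counts, key=lambda doc_id: (-hit_counts[doc_id], doc_id))

print(inverted["动态"])   # posting list of the term 动态
print(hit_counts)
print(ranking)
```

Document 2 comes out on top because two of the three query terms appear in its posting lists; 高清 has no posting list at all and simply contributes nothing.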
PS: OK, I admit I don't really understand that part 😭
4. Elasticsearch API
The API is RESTful.
List everything the cat API supports
GET http://localhost:9200/_cat

    =^.^=
    /_cat/allocation
    /_cat/shards
    /_cat/shards/{index}
    /_cat/master
    /_cat/nodes
    /_cat/tasks
    /_cat/indices
    /_cat/indices/{index}
    /_cat/segments
    /_cat/segments/{index}
    /_cat/count
    /_cat/count/{index}
    /_cat/recovery
    /_cat/recovery/{index}
    /_cat/health
    /_cat/pending_tasks
    /_cat/aliases
    /_cat/aliases/{alias}
    /_cat/thread_pool
    /_cat/thread_pool/{thread_pools}
    /_cat/plugins
    /_cat/fielddata
    /_cat/fielddata/{fields}
    /_cat/nodeattrs
    /_cat/repositories
    /_cat/snapshots/{repository}
    /_cat/templates

View node information (_cat is Elasticsearch's compact, human-readable text API; it is not part of Kibana)
GET http://localhost:9200/_cat/nodes

    127.0.0.1 17 97 1 0.01 0.02 0.00 dilm * a0afe6713d7f

View cluster health
GET http://localhost:9200/_cat/health

    1680096955 13:35:55 elasticsearch yellow 1 1 6 6 0 0 3 0 - 66.7%

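The health line is easier to read once each value is paired with its column name. The sketch below does that for the line shown above. The column names are an assumption: they are the headers that GET /_cat/health?v prints on a 7.x cluster, so check them against your own version.

```python
# Column headers as printed by `GET /_cat/health?v` on 7.x (assumed here).
HEALTH_COLUMNS = [
    "epoch", "timestamp", "cluster", "status",
    "node.total", "node.data", "shards", "pri", "relo", "init",
    "unassign", "pending_tasks", "max_task_wait_time",
    "active_shards_percent",
]


def parse_cat_health(line):
    """Pair every whitespace-separated value with its column name."""
    values = line.split()
    if len(values) != len(HEALTH_COLUMNS):
        raise ValueError(
            f"expected {len(HEALTH_COLUMNS)} columns, got {len(values)}"
        )
    return dict(zip(HEALTH_COLUMNS, values))


health = parse_cat_health(
    "1680096955 13:35:55 elasticsearch yellow 1 1 6 6 0 0 3 0 - 66.7%"
)
print(health["status"], health["unassign"], health["active_shards_percent"])
```

Read this way, the sample shows a yellow cluster with 3 unassigned shards. On a single-node setup this usually just means that replica shards have no second node to be placed on.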
View the master node
GET http://localhost:9200/_cat/master

    Z8a8Ekp4TVuqkv9rEQEzzA 127.0.0.1 127.0.0.1 a0afe6713d7f

View the indices in ES (the "databases")
GET http://localhost:9200/_cat/indices
The columns are health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size and pri.store.size.

    yellow open website                  LA03aN0qStmXdO8nmuzWDw 1 1    2 2   8.6kb   8.6kb
    yellow open bank                     tAFAIEHTSkK-Oavb2Wd3JQ 1 1 1000 0 414.3kb 414.3kb
    green  open .kibana_task_manager_1   V13OX5BcTiaN4YLaXsObfQ 1 0    2 0  21.7kb  21.7kb
    green  open .apm-agent-configuration isjY4x8cRvCW2XMevs0ypw 1 0    0 0    283b    283b
    green  open .kibana_1                mK0GqkVdREKfabACDAxPKQ 1 0    8 0  25.2kb  25.2kb
    yellow open customer                 X34j-t18T12nerLUajXISQ 1 1    3 0   3.7kb   3.7kb

Retrieving a document
GET http://localhost:9200/customer/external/1
Retrieves the document with ID 1 of type external in the customer index.

    {
      "_index": "customer",     # index
      "_type": "external",      # type
      "_id": "1",               # ID
      "_version": 1,            # version number
      "_seq_no": 10,            # sequence number, used for concurrency control
      "_primary_term": 1,
      "found": true,
      "_source": {              # the document's own key-value data
        "name": "John Doe"
      }
    }

Creating and updating with POST
The ID is optional. Without an ID, ES generates one and creates a new document. With an ID, the document is created if that ID does not exist yet and overwritten if it does; on an update both _version and _seq_no increase.
POST http://localhost:9200/customer/external/2
Request:
Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "2",
      "_version": 8,
      "result": "updated",      # the result is an update
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 14,
      "_primary_term": 1
    }

Creating and updating with PUT
The ID is mandatory. If no document with that ID exists, one is created; otherwise the existing document is updated.
PUT http://localhost:9200/customer/external/1
Request:
Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "1",
      "_version": 5,
      "result": "updated",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 18,
      "_primary_term": 1
    }

Updating with optimistic locking
_seq_no and _primary_term can be used for optimistic concurrency control. Pass the values you last read as if_seq_no and if_primary_term (for example if_seq_no=1&if_primary_term=1); the write only succeeds if the document still has exactly that sequence number and primary term.
PUT http://localhost:9200/customer/external/1?if_seq_no=1&if_primary_term=1
Request:
Response (the document has moved on to sequence number 18 in the meantime, so the write is rejected):

    {
      "error": {
        "root_cause": [
          {
            "type": "version_conflict_engine_exception",
            "reason": "[1]: version conflict, required seqNo [1], primary term [1]. current document has seqNo [18] and primary term [1]",
            "index_uuid": "X34j-t18T12nerLUajXISQ",
            "shard": "0",
            "index": "customer"
          }
        ],
        "type": "version_conflict_engine_exception",
        "reason": "[1]: version conflict, required seqNo [1], primary term [1]. current document has seqNo [18] and primary term [1]",
        "index_uuid": "X34j-t18T12nerLUajXISQ",
        "shard": "0",
        "index": "customer"
      },
      "status": 409
    }

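The create/update and conflict behaviour described above can be modelled in memory. The ToyIndex class below is a hypothetical stand-in written for these notes, not Elasticsearch's implementation: it only mimics how _version and _seq_no move and how a stale if_seq_no is rejected.

```python
class VersionConflict(Exception):
    """Raised when the expected _seq_no / _primary_term do not match."""


class ToyIndex:
    """In-memory toy model of a single shard; NOT Elasticsearch itself."""

    def __init__(self):
        self.docs = {}          # doc_id -> source plus metadata
        self.next_seq_no = 0    # sequence numbers are handed out per shard
        self.primary_term = 1

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            # Compare-and-set: only write if nobody else wrote in between.
            if (current is None
                    or current["_seq_no"] != if_seq_no
                    or current["_primary_term"] != if_primary_term):
                raise VersionConflict(f"[{doc_id}]: version conflict")
        self.docs[doc_id] = {
            "_source": dict(source),
            "_version": current["_version"] + 1 if current else 1,
            "_seq_no": self.next_seq_no,
            "_primary_term": self.primary_term,
        }
        self.next_seq_no += 1
        return "updated" if current else "created"


demo = ToyIndex()
print(demo.put("1", {"name": "John Doe"}))                  # first write
print(demo.put("1", {"name": "Jane Doe"}, if_seq_no=0, if_primary_term=1))
try:
    # A second writer still holds the old sequence number 0.
    demo.put("1", {"name": "stale"}, if_seq_no=0, if_primary_term=1)
except VersionConflict as exc:
    print("rejected:", exc)
```

The last write fails for the same reason as the 409 response above: the caller's sequence number is older than the one currently stored, so the caller has to re-read the document and retry.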
Updating with POST, an ID and _update
This is a partial update. If the submitted data changes nothing, no operation is performed.
POST http://localhost:9200/customer/external/1/_update
Request:

    {
      "doc": {
        "name": "john name"
      }
    }

Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "1",
      "_version": 7,            # the version number increased
      "result": "updated",      # the result is an update
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 23,            # the sequence number increased
      "_primary_term": 1
    }

Sending the same request again:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "1",
      "_version": 7,            # unchanged
      "result": "noop",         # no operation
      "_shards": {
        "total": 0,
        "successful": 0,
        "failed": 0
      },
      "_seq_no": 23,            # unchanged
      "_primary_term": 1
    }

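The noop behaviour can be sketched as "merge, then compare". The partial_update function below is a hypothetical helper for illustration: it merges only the top level of the document, whereas a real _update also merges nested objects.

```python
def partial_update(stored, partial):
    """Merge `partial` into the stored source; report 'noop' if nothing changes."""
    merged = {**stored["_source"], **partial}
    if merged == stored["_source"]:
        # Nothing would change, so version and content stay as they are.
        return stored, "noop"
    updated = {"_source": merged, "_version": stored["_version"] + 1}
    return updated, "updated"


doc = {"_source": {"name": "John Doe"}, "_version": 6}

doc, first = partial_update(doc, {"name": "john name"})
print(first, doc["_version"])

doc, second = partial_update(doc, {"name": "john name"})
print(second, doc["_version"])
```

The first call bumps the version from 6 to 7; the second call sees identical content and leaves the version at 7, which matches the two responses shown above.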
Deleting a document
DELETE http://localhost:9200/customer/external/1
Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "1",
      "_version": 8,
      "result": "deleted",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 24,
      "_primary_term": 1
    }

Bulk operations (run in Kibana here)
The _bulk body is newline-delimited JSON rather than a single JSON document: each action line is followed by the document it applies to. Any HTTP client can send it with the Content-Type application/x-ndjson, but the Kibana console is the most convenient place to try it. The example below adds two documents in one request.
Test data: elasticsearch-test-data: es测试数据 (gitee.com)

    POST /bank/account/_bulk
    {"index":{"_id":"1"}}
    {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"[email protected]","city":"Brogan","state":"IL"}
    {"index":{"_id":"6"}}
    {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"[email protected]","city":"Dante","state":"TN"}

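If you want to produce such a bulk body from a program instead of typing it, something like the following sketch would do. The function name build_bulk_body and the choice of account_number as the ID field are assumptions made for this example; the format itself (one action line, one document line, a trailing newline) is what _bulk expects.

```python
import json


def build_bulk_body(docs, id_field="account_number"):
    """Build an NDJSON body: an `index` action line before every document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


accounts = [
    {"account_number": 1, "balance": 39225, "firstname": "Amber"},
    {"account_number": 6, "balance": 5686, "firstname": "Hattie"},
]
bulk_body = build_bulk_body(accounts)
print(bulk_body, end="")
```

Each document has to stay on a single line, which is why json.dumps is used without indentation.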
5. ES search API
Every search request is addressed to the index name followed by _search.
Query in the URL
GET http://localhost:9200/bank/_search?q=*&sort=account_number:asc
Searches the bank index for all documents (q=*), sorted by account_number in ascending order.

    {
      "took": 43,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 1000,        # 1000 documents match; only 10 are returned by default
          "relation": "eq"      # the total is exact ("gte" would mean a lower bound)
        },
        "max_score": null,      # no score: the results are sorted by a field instead
        "hits": [
          {
            "_index": "bank",
            "_type": "account",
            "_id": "0",
            "_score": null,
            "_source": {
              "account_number": 0,
              "balance": 16623,
              "firstname": "Bradshaw",
              "lastname": "Mckenzie",
              "age": 29,
              "gender": "F",
              "address": "244 Columbus Place",
              "employer": "Euron",
              "email": "[email protected]",
              "city": "Hobucken",
              "state": "CO"
            },
            "sort": [0]
          }
        ]
      }
    }

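When the query lives in the URL, the parameters have to be URL-encoded. A small sketch using only the Python standard library follows; search_url is a hypothetical helper, and keeping * and : unescaped is a readability choice for this example, not something ES requires.

```python
from urllib.parse import urlencode


def search_url(base, index, **params):
    """Build a URI search URL; `*` and `:` are left readable on purpose."""
    query = urlencode(params, safe="*:")
    return f"{base}/{index}/_search?{query}"


url = search_url("http://localhost:9200", "bank", q="*", sort="account_number:asc")
print(url)

# Values with spaces are encoded automatically.
print(search_url("http://localhost:9200", "bank", q="address:Kings Place"))
```

URI search is handy for quick checks; anything more complex is easier to express in the request body, as the next section shows.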
Query in the JSON body (Query DSL)
Searches the documents in the bank index.
GET http://localhost:9200/bank/_search
Request:

    {
      "query": {
        "match_all": {}                        # match every document
      },
      "sort": [
        { "account_number": "asc" },           # sort rule, short form
        { "balance": { "order": "desc" } }     # sort rule, full form
      ],
      "from": 0,
      "size": 5,                               # pagination: start at offset 0, return 5 documents
      "_source": [                             # return only the balance and firstname fields
        "balance",
        "firstname"
      ]
    }

Response:

    {
      "took": 2,
      "timed_out": false,
      "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
      "hits": {
        "total": { "value": 1000, "relation": "eq" },
        "max_score": null,
        "hits": [
          {
            "_index": "bank", "_type": "account", "_id": "0", "_score": null,
            "_source": { "firstname": "Bradshaw", "balance": 16623 },
            "sort": [0, 16623]
          },
          {
            "_index": "bank", "_type": "account", "_id": "1", "_score": null,
            "_source": { "firstname": "Amber", "balance": 39225 },
            "sort": [1, 39225]
          },
          {
            "_index": "bank", "_type": "account", "_id": "2", "_score": null,
            "_source": { "firstname": "Roberta", "balance": 28838 },
            "sort": [2, 28838]
          },
          {
            "_index": "bank", "_type": "account", "_id": "3", "_score": null,
            "_source": { "firstname": "Levine", "balance": 44947 },
            "sort": [3, 44947]
          },
          {
            "_index": "bank", "_type": "account", "_id": "4", "_score": null,
            "_source": { "firstname": "Rodriquez", "balance": 27658 },
            "sort": [4, 27658]
          }
        ]
      }
    }

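The from and size fields are offsets, not page numbers, so a caller that thinks in pages has to convert. The sketch below builds the same request body for an arbitrary page; search_page is a hypothetical helper and pages are counted from 0 here.

```python
import json


def search_page(page, page_size, fields):
    """Build a match_all search body for the given zero-based page."""
    return {
        "query": {"match_all": {}},
        "sort": [
            {"account_number": "asc"},
            {"balance": {"order": "desc"}},
        ],
        "from": page * page_size,   # offset of the first hit on this page
        "size": page_size,
        "_source": list(fields),
    }


first_page = search_page(0, 5, ["balance", "firstname"])
third_page = search_page(2, 5, ["balance", "firstname"])
print(json.dumps(third_page, indent=2))
```

By default from + size may not exceed 10000 (the index.max_result_window setting), so this style of paging is meant for the first pages of a result, not for walking through an entire index.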
Full-text search (analyzed queries)
match is an analyzed query: the query text is split into terms, and the results are ordered by relevance score.
GET http://localhost:9200/bank/_search
Request:

    {
      "query": {
        "match": {
          # Find documents whose address matches the analyzed text "Kings Place".
          # The text is split into two terms, Kings and Place; either one is enough for a hit.
          "address": "Kings Place"
        }
      }
    }

Response (only three of the hits are shown):

    {
      "took": 1,
      "timed_out": false,
      "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
      "hits": {
        "total": {
          "value": 182,             # 182 documents match
          "relation": "eq"          # the total is exact
        },
        "max_score": 7.6978617,     # highest score
        "hits": [
          {
            "_index": "bank", "_type": "account", "_id": "20",
            "_score": 7.6978617,    # score of this hit (the highest one)
            "_source": {
              "account_number": 20, "balance": 16418,
              "firstname": "Elinor", "lastname": "Ratliff",
              "age": 36, "gender": "M",
              "address": "282 Kings Place",     # the field that was matched
              "employer": "Scentric", "email": "[email protected]",
              "city": "Ribera", "state": "WA"
            }
          },
          {
            "_index": "bank", "_type": "account", "_id": "722",
            "_score": 5.9908285,    # score of this hit
            "_source": {
              "account_number": 722, "balance": 27256,
              "firstname": "Roberts", "lastname": "Beasley",
              "age": 34, "gender": "F",
              "address": "305 Kings Hwy",
              "employer": "Quintity", "email": "[email protected]",
              "city": "Hayden", "state": "PA"
            }
          },
          {
            "_index": "bank", "_type": "account", "_id": "37",
            "_score": 1.7070332,
            "_source": {
              "account_number": 37, "balance": 18612,
              "firstname": "Mcgee", "lastname": "Mooney",
              "age": 39, "gender": "M",
              "address": "826 Fillmore Place",
              "employer": "Reversus", "email": "[email protected]",
              "city": "Tooleville", "state": "OK"
            }
          }
        ]
      }
    }

Phrase matching
match_phrase matches a whole phrase. The query text is still analyzed, but a document only matches if it contains all of the resulting terms, next to each other and in the same order. That makes it stricter than match, and it is also different from the exact matching of term, which does not analyze the query at all.
GET http://localhost:9200/bank/_search
Request:

    {
      "query": {
        "match_phrase": {
          "address": "Kings Place"
        }
      }
    }

Response:

    {
      "took": 11,
      "timed_out": false,
      "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
      "hits": {
        "total": { "value": 1, "relation": "eq" },
        "max_score": 7.6978617,
        "hits": [
          {
            "_index": "bank", "_type": "account", "_id": "20",
            "_score": 7.6978617,
            "_source": {
              "account_number": 20, "balance": 16418,
              "firstname": "Elinor", "lastname": "Ratliff",
              "age": 36, "gender": "M",
              "address": "282 Kings Place",     # the only document containing the whole phrase
              "employer": "Scentric", "email": "[email protected]",
              "city": "Ribera", "state": "WA"
            }
          }
        ]
      }
    }

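The difference between match and match_phrase can be imitated on the three addresses from the responses above. This is a toy model under a strong assumption: analyze() just lowercases and splits on whitespace, which is only a rough stand-in for the standard analyzer, and no scoring is done.

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on spaces."""
    return text.lower().split()


def match(field_value, query):
    """match: at least one query term occurs in the field."""
    doc_terms = set(analyze(field_value))
    return any(term in doc_terms for term in analyze(query))


def match_phrase(field_value, query):
    """match_phrase: all query terms occur adjacent and in order."""
    doc_terms = analyze(field_value)
    phrase = analyze(query)
    n = len(phrase)
    return any(
        doc_terms[i:i + n] == phrase
        for i in range(len(doc_terms) - n + 1)
    )


addresses = ["282 Kings Place", "305 Kings Hwy", "826 Fillmore Place"]
match_hits = [a for a in addresses if match(a, "Kings Place")]
phrase_hits = [a for a in addresses if match_phrase(a, "Kings Place")]
print(match_hits)
print(phrase_hits)
```

All three addresses satisfy match because each contains kings or place, while only 282 Kings Place contains the two terms side by side.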
Matching several fields
multi_match runs the query against several fields. A document is a hit if any of the fields matches, comparable to an OR condition in SQL. The query text is analyzed.
GET http://localhost:9200/bank/_search
Request:

    {
      "query": {
        "multi_match": {
          "query": "mill movico",
          "fields": [
            "address",
            "city"          # these two fields are matched against the query
          ]
        }
      }
    }

Response:

    {
      "took": 4,
      "timed_out": false,
      "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
      "hits": {
        "total": { "value": 4, "relation": "eq" },
        "max_score": 6.505949,
        "hits": [
          {
            "_index": "bank", "_type": "account", "_id": "472",
            "_score": 6.505949,
            "_source": {
              "account_number": 472, "balance": 25571,
              "firstname": "Lee", "lastname": "Long",
              "age": 32, "gender": "F",
              "address": "288 Mill Street",
              "employer": "Comverges", "email": "[email protected]",
              "city": "Movico", "state": "MT"
            }
          },
          {
            "_index": "bank", "_type": "account", "_id": "970",
            "_score": 5.4032025,
            "_source": {
              "account_number": 970, "balance": 19648,
              "firstname": "Forbes", "lastname": "Wallace",
              "age": 28, "gender": "M",
              "address": "990 Mill Road",
              "employer": "Pheast", "email": "[email protected]",
              "city": "Lopezo", "state": "AK"
            }
          },
          {
            "_index": "bank", "_type": "account", "_id": "136",
            "_score": 5.4032025,
            "_source": {
              "account_number": 136, "balance": 45801,
              "firstname": "Winnie", "lastname": "Holland",
              "age": 38, "gender": "M",
              "address": "198 Mill Lane",
              "employer": "Neteria", "email": "[email protected]",
              "city": "Urie", "state": "IL"
            }
          },
          {
            "_index": "bank", "_type": "account", "_id": "345",
            "_score": 5.4032025,
            "_source": {
              "account_number": 345, "balance": 9812,
              "firstname": "Parker", "lastname": "Hines",
              "age": 38, "gender": "M",
              "address": "715 Mill Avenue",
              "employer": "Baluba", "email": "[email protected]",
              "city": "Blackgum", "state": "KY"
            }
          }
        ]
      }
    }

bool compound queries
Used to combine several query clauses.
must: conditions that have to match. They act as conditions and also contribute to the score, which is the difference from filter.
must_not: conditions that must not match. They do not contribute to the score, the same as filter.
should: conditions that ought to match. A match contributes to the score.
GET http://localhost:9200/bank/_search
Request:

    {
      "query": {
        "bool": {
          "must": [                   # conditions that have to match
            {
              "match": {
                "gender": "M"         # gender is M
              }
            },
            {
              "match": {
                "address": "mill"
              }
            }
          ],
          "must_not": [               # conditions that must not match
            {
              "match": {
                "age": 38             # age is not 38
              }
            }
          ],
          "should": [                 # a match adds to the score; a miss still returns the document, just without the bonus
            {
              "match": {
                "lastname": "Wallace"
              }
            }
          ]
        }
      }
    }

Response:

    {
      "took": 7,
      "timed_out": false,
      "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
      "hits": {
        "total": { "value": 1, "relation": "eq" },
        "max_score": 12.585751,
        "hits": [
          {
            "_index": "bank", "_type": "account", "_id": "970",
            "_score": 12.585751,
            "_source": {
              "account_number": 970, "balance": 19648,
              "firstname": "Forbes", "lastname": "Wallace",
              "age": 28, "gender": "M",
              "address": "990 Mill Road",
              "employer": "Pheast", "email": "[email protected]",
              "city": "Lopezo", "state": "AK"
            }
          }
        ]
      }
    }

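Deeply nested bool bodies are easy to get wrong by hand. One way to keep them readable in client code is a small builder; bool_query below is a hypothetical helper for illustration (it is not part of any Elasticsearch client) that simply leaves out the clause types you do not use.

```python
def bool_query(must=(), must_not=(), should=(), filter_=()):
    """Assemble a bool query body, omitting clause lists that are empty."""
    clauses = {
        "must": must,
        "must_not": must_not,
        "should": should,
        "filter": filter_,
    }
    bool_body = {name: list(items) for name, items in clauses.items() if items}
    return {"query": {"bool": bool_body}}


body = bool_query(
    must=[{"match": {"gender": "M"}}, {"match": {"address": "mill"}}],
    must_not=[{"match": {"age": 38}}],
    should=[{"match": {"lastname": "Wallace"}}],
)
print(sorted(body["query"]["bool"]))
```

The result is the same structure as the request above; because no filter clause was passed, the filter key does not appear in the body at all.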
Filtering with filter
filter acts purely as a selection condition and does not contribute to the score. In this it differs from must and behaves like must_not.
GET http://localhost:9200/bank/_search
Request:

    {
      "query": {
        "bool": {
          "filter": [
            {
              "range": {            # range query
                "age": {            # age between 18 and 20, inclusive
                  "gte": 18,
                  "lte": 20
                }
              }
            }
          ]
        }
      }
    }

Response (only two of the hits are shown):

    {
      "took": 1,
      "timed_out": false,
      "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
      "hits": {
        "total": { "value": 44, "relation": "eq" },
        "max_score": 0.0,           # the highest score is 0, confirming that filter adds no score
        "hits": [
          {
            "_index": "bank", "_type": "account", "_id": "157",
            "_score": 0.0,
            "_source": {
              "account_number": 157, "balance": 39868,
              "firstname": "Claudia", "lastname": "Terry",
              "age": 20, "gender": "F",
              "address": "132 Gunnison Court",
              "employer": "Lumbrex", "email": "[email protected]",
              "city": "Castleton", "state": "MD"
            }
          },
          {
            "_index": "bank", "_type": "account", "_id": "215",
            "_score": 0.0,
            "_source": {
              "account_number": 215, "balance": 37427,
              "firstname": "Copeland", "lastname": "Solomon",
              "age": 20, "gender": "M",
              "address": "741 McDonald Avenue",
              "employer": "Recognia", "email": "[email protected]",
              "city": "Edmund", "state": "ME"
            }
          }
        ]
      }
    }

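term and exact matching on text
term matches exactly and does not analyze the query. Use it for exact lookups on non-text fields, such as ages, amounts and other numbers.
For an exact match on a text field, query its keyword sub-field: <field>.keyword.
GET http://localhost:9200/bank/_search
Request:

    {
      "query": {
        "term": {
          "age": 28
        }
      }
    }

Exact match on text:

    {
      "query": {
        "match": {
          "address.keyword": "789 Madison Street"     # the keyword sub-field holds the whole, unanalyzed value
        }
      }
    }
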
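Why term works on address.keyword but not on the analyzed address field becomes clearer with a toy model. The sketch below assumes, as a simplification, that a text field is stored as lowercased, whitespace-separated terms and that the keyword sub-field stores the original string as one single term.

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on spaces."""
    return text.lower().split()


value = "789 Madison Street"
text_terms = set(analyze(value))   # what the analyzed `address` field holds
keyword_terms = {value}            # what `address.keyword` holds


def term_query(indexed_terms, query):
    """term: the query string is looked up as-is, without any analysis."""
    return query in indexed_terms


on_text = term_query(text_terms, "789 Madison Street")
on_keyword = term_query(keyword_terms, "789 Madison Street")
single_term = term_query(text_terms, "madison")
print(on_text, on_keyword, single_term)
```

The full address is not one of the terms stored for the text field, so the term lookup misses there; only an individual lowercased term such as madison is found. The keyword sub-field keeps the whole string, so the exact lookup succeeds.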
Aggregations (aggs)
Aggregations analyze and summarize the documents matched by the query, similar to GROUP BY and the aggregate functions in SQL.
Example: find everyone whose address contains mill, and compute their age distribution and average age, without returning the documents themselves.
GET http://localhost:9200/bank/_search
Request:

    {
      "query": {
        "match": {
          "address": "mill"         # documents whose address contains mill
        }
      },
      "aggs": {
        "ageAgg": {                 # user-chosen aggregation name: ageAgg
          "terms": {                # terms aggregation, like COUNT(*) ... GROUP BY age
            "field": "age",
            "size": 10              # show only the top 10 buckets
          }
        },
        "ageAvg": {                 # user-chosen aggregation name: ageAvg
          "avg": {                  # avg aggregation
            "field": "age"          # average age
          }
        }
      },
      "size": 0                     # return no documents, only the aggregation results
    }

Response:

    {
      "took": 5,
      "timed_out": false,
      "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
      "hits": {
        "total": { "value": 4, "relation": "eq" },
        "max_score": null,
        "hits": []
      },
      "aggregations": {
        "ageAgg": {                 # the aggregation named ageAgg
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": 38, "doc_count": 2 },      # two people are 38
            { "key": 28, "doc_count": 1 },
            { "key": 32, "doc_count": 1 }
          ]
        },
        "ageAvg": {                 # the aggregation named ageAvg
          "value": 34.0             # the average age is 34.0
        }
      }
    }

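To see what the two aggregations compute, here is the same calculation in plain Python on the ages of the four mill documents (32, 28, 38 and 38, taken from the multi_match response earlier in these notes). The tie-breaking by ascending key is an assumption chosen to reproduce the bucket order shown above.

```python
from collections import Counter


def terms_agg(values, size=10):
    """Count occurrences per value; most frequent first, ties by key."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


def avg_agg(values):
    """Arithmetic mean of the values."""
    return sum(values) / len(values)


ages = [32, 28, 38, 38]
age_buckets = terms_agg(ages)
age_avg = avg_agg(ages)
print(age_buckets)
print(age_avg)
```

The buckets and the average of 34.0 agree with the aggregations section of the response.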
Nested aggregations: count by age; count by age and gender; average balance by age and gender; average balance by age.

    {
      "query": {
        "match_all": {}
      },
      "aggs": {                             # aggregations
        "ageAgg": {                         # user-chosen aggregation name
          "terms": {
            "field": "age",                 # count per age
            "size": 2                       # show the top 2 buckets
          },
          "aggs": {                         # nested: computed within each age bucket
            "genderAgg": {                  # user-chosen aggregation name
              "terms": {
                "field": "gender.keyword",  # group by the exact gender value (the field itself is text)
                "size": 10
              },
              "aggs": {                     # nested: computed within each age and gender bucket
                "balanceAvg": {             # user-chosen aggregation name
                  "avg": {                  # average balance
                    "field": "balance"
                  }
                }
              }
            },
            "ageBalanceAvg": {              # user-chosen aggregation name
              "avg": {                      # average balance per age
                "field": "balance"
              }
            }
          }
        }
      },
      "size": 0
    }

Response:

    {
      "took": 9,
      "timed_out": false,
      "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
      "hits": {
        "total": { "value": 1000, "relation": "eq" },
        "max_score": null,
        "hits": []
      },
      "aggregations": {
        "ageAgg": {                                     # aggregation name
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 879,
          "buckets": [
            {
              "key": 31,                                # age 31
              "doc_count": 61,                          # number of documents
              "genderAgg": {                            # aggregation name
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "M",                         # gender M
                    "doc_count": 35,                    # number of documents
                    "balanceAvg": {
                      "value": 29565.628571428573       # average balance for age 31, gender M
                    }
                  },
                  {
                    "key": "F",                         # gender F
                    "doc_count": 26,                    # number of documents
                    "balanceAvg": {
                      "value": 26626.576923076922       # average balance for age 31, gender F
                    }
                  }
                ]
              },
              "ageBalanceAvg": {
                "value": 28312.918032786885             # average balance for age 31
              }
            },
            {
              "key": 39,
              "doc_count": 60,
              "genderAgg": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "F",
                    "doc_count": 38,
                    "balanceAvg": { "value": 26348.684210526317 }
                  },
                  {
                    "key": "M",
                    "doc_count": 22,
                    "balanceAvg": { "value": 23405.68181818182 }
                  }
                ]
              },
              "ageBalanceAvg": {
                "value": 25269.583333333332
              }
            }
          ]
        }
      }
    }

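Nested bucket responses are awkward to read, so client code often flattens them into rows. The sketch below does that for a trimmed copy of the age 31 bucket from the response; flatten_age_gender is a hypothetical helper, and the aggregation names are the ones chosen in the request. It also checks that the per-gender averages, weighted by their document counts, add up to the per-age average.

```python
import math

# Trimmed copy of the "aggregations" part of the response (age 31 only).
SAMPLE = {
    "ageAgg": {
        "buckets": [
            {
                "key": 31,
                "doc_count": 61,
                "genderAgg": {
                    "buckets": [
                        {"key": "M", "doc_count": 35,
                         "balanceAvg": {"value": 29565.628571428573}},
                        {"key": "F", "doc_count": 26,
                         "balanceAvg": {"value": 26626.576923076922}},
                    ]
                },
                "ageBalanceAvg": {"value": 28312.918032786885},
            }
        ]
    }
}


def flatten_age_gender(aggregations):
    """Turn nested buckets into (age, gender, count, avg_balance) rows."""
    rows = []
    for age_bucket in aggregations["ageAgg"]["buckets"]:
        for gender_bucket in age_bucket["genderAgg"]["buckets"]:
            rows.append((
                age_bucket["key"],
                gender_bucket["key"],
                gender_bucket["doc_count"],
                round(gender_bucket["balanceAvg"]["value"], 2),
            ))
    return rows


rows = flatten_age_gender(SAMPLE)
for row in rows:
    print(row)

# Consistency check: the count-weighted gender averages give the age average.
bucket = SAMPLE["ageAgg"]["buckets"][0]
genders = bucket["genderAgg"]["buckets"]
weighted = sum(g["doc_count"] * g["balanceAvg"]["value"] for g in genders)
weighted /= sum(g["doc_count"] for g in genders)
print(math.isclose(weighted, bucket["ageBalanceAvg"]["value"], rel_tol=1e-9))
```

Each level of aggs in the request adds one level of buckets in the response, so one loop per level is enough to walk it.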
Mapping
A mapping is similar to the column data types you declare when creating an SQL table (not to be confused with a type). Types under an index are optional in ES 7 and removed in ES 8.
ES infers the mapping automatically when documents are first indexed.
GET http://localhost:9200/bank/_mapping
Returns the mappings of all fields.
Response:

    {
      "bank": {
        "mappings": {
          "properties": {
            "account_number": {
              "type": "long"        # the type is long
            },
            "address": {
              "type": "text",       # the type is text
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "age": {
              "type": "long"
            },
            "balance": {
              "type": "long"
            },
            "city": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "email": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "employer": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "firstname": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "gender": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "lastname": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            },
            "state": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            }
          }
        }
      }
    }

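A mapping response is long but regular, so it is easy to summarize in code. The sketch below works on a trimmed copy of the response with three fields; field_types and keyword_subfields are hypothetical helpers written for this example.

```python
# Trimmed copy of the mapping response (three fields only).
MAPPING = {
    "bank": {
        "mappings": {
            "properties": {
                "age": {"type": "long"},
                "address": {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": 256}
                    },
                },
                "gender": {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": 256}
                    },
                },
            }
        }
    }
}


def field_types(mapping_response, index):
    """Return {field name: type} for the given index."""
    properties = mapping_response[index]["mappings"]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}


def keyword_subfields(mapping_response, index):
    """List the <field>.keyword names usable for exact matching and aggs."""
    properties = mapping_response[index]["mappings"]["properties"]
    return sorted(
        f"{name}.keyword"
        for name, spec in properties.items()
        if "keyword" in spec.get("fields", {})
    )


types = field_types(MAPPING, "bank")
exact_fields = keyword_subfields(MAPPING, "bank")
print(types)
print(exact_fields)
```

The second list shows which names can be used where an unanalyzed value is needed, for example gender.keyword in the nested aggregation above.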
Creating a mapping by hand. A mapping can be given explicitly when the index is created; the example creates the index my_index.
PUT http://localhost:9200/my_index
Request:

    {
      "mappings": {             # required when creating the index with a mapping
        "properties": {         # the field definitions
          "age": {
            "type": "integer"
          },
          "email": {
            "type": "keyword"
          },
          "name": {
            "type": "text"
          }
        }
      }
    }

Adding a field to a mapping
PUT http://localhost:9200/my_index/_mapping
Request:

    {
      "properties": {           # the fields to add
        "employee-id": {
          "type": "keyword",
          "index": false        # the field is stored but not searchable
        }
      }
    }

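Migrating data
The mapping of an existing field cannot be changed in place, so the recommended route is to migrate the data: create a new index with the desired mapping first, then reindex into it.
POST http://localhost:9200/_reindex
Request:

    {
      "source": {               # where to migrate from
        "index": "bank",
        "type": "account"
      },
      "dest": {                 # where to migrate to
        "index": "newbank"
      }
    }
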
6. Installing an analyzer
These notes use the IK analyzer: medcl/elasticsearch-analysis-ik: The IK Analysis plugin integrates Lucene IK analyzer into elasticsearch, support customized dictionary. (github.com). Download the IK release that matches your ES version.
Then unzip it into the plugins directory that was mounted earlier, and restart Elasticsearch so that the plugin is loaded.
Using the IK analyzer
GET http://localhost:9200/_analyze
Request:

    {
      "analyzer": "ik_smart",       # use the IK analyzer
      "text": "Elasticsearch(简称ES)是一个基于Apache Lucene™的开源搜索引擎"
    }

Response:

    {
      "tokens": [
        { "token": "elasticsearch", "start_offset": 0,  "end_offset": 13, "type": "ENGLISH", "position": 0 },
        { "token": "简称",          "start_offset": 14, "end_offset": 16, "type": "CN_WORD", "position": 1 },
        { "token": "es",            "start_offset": 16, "end_offset": 18, "type": "ENGLISH", "position": 2 },
        { "token": "是",            "start_offset": 19, "end_offset": 20, "type": "CN_CHAR", "position": 3 },
        { "token": "一个",          "start_offset": 20, "end_offset": 22, "type": "CN_WORD", "position": 4 },
        { "token": "基于",          "start_offset": 22, "end_offset": 24, "type": "CN_WORD", "position": 5 },
        { "token": "apache",        "start_offset": 24, "end_offset": 30, "type": "ENGLISH", "position": 6 },
        { "token": "lucene",        "start_offset": 31, "end_offset": 37, "type": "ENGLISH", "position": 7 },
        { "token": "的",            "start_offset": 38, "end_offset": 39, "type": "CN_CHAR", "position": 8 },
        { "token": "开源",          "start_offset": 39, "end_offset": 41, "type": "CN_WORD", "position": 9 },
        { "token": "搜索引擎",      "start_offset": 41, "end_offset": 45, "type": "CN_WORD", "position": 10 }
      ]
    }

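The start_offset and end_offset values point back into the original text, which is what highlighting relies on. The sketch below checks this for a few tokens copied from the response; it assumes the text uses plain ASCII parentheses, so that every character counts as one position.

```python
TEXT = "Elasticsearch(简称ES)是一个基于Apache Lucene™的开源搜索引擎"

# A few tokens copied from the _analyze response.
TOKENS = [
    {"token": "elasticsearch", "start_offset": 0, "end_offset": 13},
    {"token": "简称", "start_offset": 14, "end_offset": 16},
    {"token": "es", "start_offset": 16, "end_offset": 18},
    {"token": "lucene", "start_offset": 31, "end_offset": 37},
    {"token": "搜索引擎", "start_offset": 41, "end_offset": 45},
]


def source_span(text, token):
    """Return the piece of the original text that a token was built from."""
    return text[token["start_offset"]:token["end_offset"]]


spans = [source_span(TEXT, token) for token in TOKENS]
print(spans)
```

The spans keep the original capitalization (Elasticsearch, ES, Lucene), while the tokens themselves are lowercased; punctuation such as the parentheses and the ™ sign produces no token at all.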