Elasticsearch Basics

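  1. Installing Elasticsearch
  2. Installing Kibana
  3. Elasticsearch concepts
    3.1. Basic concepts
    3.2. Forward index
    3.3. Inverted index
  4. Elasticsearch APIs
    4.1. Listing every command supported by _cat
    4.2. Viewing node information
    4.3. Checking cluster health
    4.4. Viewing the master node
    4.5. Listing ES indices (the "databases")
    4.6. Retrieving an indexed document
    4.7. Creating and updating with POST
    4.8. Creating and updating with PUT
    4.9. Updating with optimistic locking
    4.10. Updating with POST, an ID, and _update
    4.11. Deleting a document
    4.12. Bulk operations
  5. ES search APIs
    5.1. Query conditions in the URL
    5.2. Query conditions in a JSON body (Query DSL)
    5.3. Full-text search (analyzed query)
    5.4. Phrase matching
    5.5. Multi-field matching
    5.6. Compound queries with bool
    5.7. Filtering with filter
    5.8. term and exact text queries
    5.9. Aggregations (aggs)
    5.10. Mapping
  6. Installing an analyzer
    6.1. Using the IK analyzer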

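A brief introduction to Elasticsearch. Note: these notes are based on Elasticsearch 7; version 8 has removed the concept of types.

Elasticsearch (ES for short) is an open-source search engine built on Apache Lucene™. Lucene is arguably the most advanced, best-performing, and most fully featured search engine library available, open source or proprietary. Note, however, that Lucene is only a library: to use it you have to work in Java and integrate it into your application.

Lucene is very complex, and you need a solid understanding of information retrieval to see how it works. It is a bit like starting from servlets before learning Spring MVC: tedious, low-level work. Solr and Elasticsearch emerged to address this. Elasticsearch is written in Java and uses Lucene to build indexes and run searches, but its goal is to make full-text search simple by hiding Lucene's complexity behind a simple, coherent RESTful API.

The introduction above is excerpted from the blog post "Elasticsearch基本概念" by 波斯_辣椒, reproduced under the CC 4.0 BY-SA license. (A bit lazy of me, but introductions all say much the same thing (❁´◡`❁).)
Note: Solr is another search server built on top of Lucene.

Elasticsearch: 权威指南 | Elastic. Chinese documentation; may be out of date.

Elasticsearch Guide | Elastic. English documentation.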

1. Installing Elasticsearch

For the Docker commands used here, see the blog post "Docker命令_各种参数简介".

  1. Pull the image with Docker

The version used here is 7.6.2.

docker pull elasticsearch:7.6.2
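  2. Configure
# Create the directories that will later be mounted into the container
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
# Write `http.host: 0.0.0.0` into elasticsearch.yml under config.
# This accepts connections from any IP; restrict access at the firewall instead (my local IP is dynamic)
echo "http.host: 0.0.0.0" >/mydata/elasticsearch/config/elasticsearch.yml

# Make everything under /mydata/elasticsearch/ readable and writable by everyone (-R means recursive); production should use tighter permissions
chmod -R 777 /mydata/elasticsearch/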
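  3. Start Elasticsearch

Remove the comment lines from the command below, then run what remains as one command. Notes:

  • docker run is used only the first time: it creates a new container from the image, so no container ID or name is needed. Use docker start afterwards.
  • docker start restarts an existing container and needs the container ID or name.
  • View the startup logs: docker logs <container name or ID>
  • If the installation is misconfigured, consider removing the container (not the image):
    1. docker stop <container ID>
    2. docker rm <container ID>
    3. Run docker run again
# Creates a container named elasticsearch and maps container ports 9200 (HTTP requests) and 9300 (cluster communication) to the same ports on the host
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
# Startup mode: single node. `-e` sets an environment variable in the container
-e "discovery.type=single-node" \
# Limit the JVM heap size (ES is very memory-hungry)
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
# Mount the container's configuration to the host. `-v` mounts a volume
# Mount elasticsearch.yml, the data directory, and the plugins directory to their counterparts on the host
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
# Run in the background
-d elasticsearch:7.6.2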
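  4. Start automatically on boot (optional)
docker update elasticsearch --restart=always
  5. Open ports 9200 and 9300
  6. Access

Browse to ip:port, where the port is the 9200 configured above. If the installation failed, check the logs with docker logs <container ID>.

(Screenshot: accessing Elasticsearch after installation)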

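2. Installing Kibana

  1. Pull the image with Docker; the version must match the Elasticsearch version
docker pull kibana:7.6.2
  2. Start Kibana and tell it where Elasticsearch is
# Option 1: pass the Elasticsearch address as an IP (the host's LAN IP)
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.6.128:9200 -p 5601:5601 -d kibana:7.6.2
# Option 2 (recommended): link the containers so that no host IP is needed
# `-d` runs in the background; `-p` maps host port 5601 to container port 5601; `--link` links the container named elasticsearch
# `-e` sets ELASTICSEARCH_HOSTS. Inside the Kibana container, localhost is Kibana itself, so the address uses the linked container's name; 9200 is the Elasticsearch port
docker run -d -p 5601:5601 --link elasticsearch -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" kibana:7.6.2
  3. Open port 5601
  4. Access Kibana

If the installation succeeded, Kibana is reachable at host ip:port, here port 5601. If it failed, check the logs with docker logs <container ID>.

(Screenshot: accessing Kibana)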

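3. Elasticsearch concepts

Basic concepts:

Index (indices) ------------------- Database
Type ------------------------------ Table. One or more types can be defined under an index. (Removed in ES 8.)
Document -------------------------- Row. Stored as JSON.
Field ----------------------------- Column

Forward index:

A structure keyed by each document's unique ID, with the document content as the record, much like the ID column of a relational database.

docID   value
1       动态规划 (dynamic programming)
2       动态壁纸超好看 (animated wallpapers look great)
3       好看动态图 (nice-looking animated image)

Inverted index

A structure keyed by the words that occur in the document content, with the docIDs of the documents containing each word as the record.

  1. Build the forward index first, giving each document a number as its unique identifier (the docID in the forward-index table above).
  2. Tokenize the field. (This is why there are so many analyzers.)
  3. Build the inverted index table from the tokens. The term is the word; the posting list records which docIDs have a value containing that word.
term   posting list
动态   1,2,3
规划   1
好看   2,3
壁纸   2
图     3

These words are the terms. The posting list holds the original IDs: the IDs of all documents that contain a given term.

When searching for 动态高清壁纸 (animated HD wallpaper), ID=2 is hit twice (动态 and 壁纸), while ID=1 and ID=3 are hit once each (动态). After the scoring algorithm runs, 动态壁纸超好看 has the highest score. Every document that was hit is returned, but the others score lower.

Of course, storing things this way would use a lot of memory, and looking up a term would be slow, so Lucene's real implementation underneath is more sophisticated.
PS: OK, I admit I don't really understand it 😭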
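The table-building steps above can be sketched in a few lines of Python. This is only an illustration: the token lists are hard-coded assumptions standing in for what a real analyzer would produce, and the "score" is a plain hit count, not Lucene's scoring.

```python
from collections import defaultdict

# Hypothetical pre-tokenized documents mirroring the table above;
# a real analyzer (such as IK) would produce the tokens.
DOCS = {
    1: ["动态", "规划"],
    2: ["动态", "壁纸", "好看"],
    3: ["好看", "动态", "图"],
}

def build_inverted_index(docs):
    """Map each term to the sorted list of docIDs that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query_tokens):
    """Count how many query tokens hit each document, highest count first."""
    hits = defaultdict(int)
    for token in query_tokens:
        for doc_id in index.get(token, []):
            hits[doc_id] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))

INDEX = build_inverted_index(DOCS)
# Assumed tokenization of the query 动态高清壁纸.
RESULT = search(INDEX, ["动态", "高清", "壁纸"])
print(INDEX["动态"])
print(RESULT)
```

Document 2 comes first because two of the three query tokens appear in its posting lists; 高清 has no posting list at all, so it contributes nothing.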

4. Elasticsearch APIs

The APIs are RESTful.

Listing every command supported by _cat

GET: http://localhost:9200/_cat

Response (plain text):
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates

Viewing node information (_cat output is plain text meant for reading at a console, such as the Kibana console)

GET: http://localhost:9200/_cat/nodes

127.0.0.1 17 97 1 0.01 0.02 0.00 dilm * a0afe6713d7f

Checking cluster health

GET http://localhost:9200/_cat/health

1680096955 13:35:55 elasticsearch yellow 1 1 6 6 0 0 3 0 - 66.7%

Viewing the master node

GET http://localhost:9200/_cat/master

Z8a8Ekp4TVuqkv9rEQEzzA 127.0.0.1 127.0.0.1 a0afe6713d7f

Listing ES indices (the "databases")

GET http://localhost:9200/_cat/indices

yellow open website                  LA03aN0qStmXdO8nmuzWDw 1 1    2 2   8.6kb   8.6kb
yellow open bank tAFAIEHTSkK-Oavb2Wd3JQ 1 1 1000 0 414.3kb 414.3kb
green open .kibana_task_manager_1 V13OX5BcTiaN4YLaXsObfQ 1 0 2 0 21.7kb 21.7kb
green open .apm-agent-configuration isjY4x8cRvCW2XMevs0ypw 1 0 0 0 283b 283b
green open .kibana_1 mK0GqkVdREKfabACDAxPKQ 1 0 8 0 25.2kb 25.2kb
yellow open customer X34j-t18T12nerLUajXISQ 1 1 3 0 3.7kb 3.7kb
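The _cat/indices rows are easier to work with once they have column names. Below is a minimal parsing sketch, assuming the default column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size); the function name and the sample text are illustrative only. A yellow health value means the primary shards are allocated but a replica is not, which is normal on a single-node setup like this one.

```python
# Assumed default column order of _cat/indices when no `h` parameter is given.
COLUMNS = ["health", "status", "index", "uuid", "pri", "rep",
           "docs.count", "docs.deleted", "store.size", "pri.store.size"]

def parse_cat_indices(text):
    """Turn whitespace-separated _cat/indices rows into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        values = line.split()
        if len(values) == len(COLUMNS):  # skip blank or malformed lines
            rows.append(dict(zip(COLUMNS, values)))
    return rows

# Two rows copied from the output above.
SAMPLE = """\
yellow open bank tAFAIEHTSkK-Oavb2Wd3JQ 1 1 1000 0 414.3kb 414.3kb
green open .kibana_1 mK0GqkVdREKfabACDAxPKQ 1 0 8 0 25.2kb 25.2kb
"""

ROWS = parse_cat_indices(SAMPLE)
YELLOW = [row["index"] for row in ROWS if row["health"] == "yellow"]
print(YELLOW)
```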

Retrieving an indexed document

GET http://localhost:9200/customer/external/1
Fetches the document with ID 1 under the external type of the customer index.

{
"_index": "customer", # index
"_type": "external", # type
"_id": "1", # id
"_version": 1, # version number
"_seq_no": 10, # sequence number, used for concurrency control
"_primary_term": 1,
"found": true,
"_source": {
# the actual key-value pairs
"name": "John Doe"
}
}

Creating and updating with POST

The ID is optional. Without one, ES generates an ID and creates a new document. With one, ES creates the document if that ID does not exist yet; otherwise it updates the document, and both _version and _seq_no increase.
POST: http://localhost:9200/customer/external/2

Request body:

{
"name":"John Doe"
}

Response:

{
"_index": "customer",
"_type": "external",
"_id": "2",
"_version": 8,
"result": "updated", # the result is an update
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 14,
"_primary_term": 1
}

Creating and updating with PUT

An ID is required. If no document with that ID exists, the document is created; otherwise it is updated.
PUT: http://localhost:9200/customer/external/1

Request body:

{
"name":"John Doe"
}

Response:

{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 5,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 18,
"_primary_term": 1
}

Updating with optimistic locking

_seq_no and _primary_term can be used for optimistic-locking updates. Pass if_seq_no=1&if_primary_term=1 and the write succeeds only if the document's current sequence number and primary term still match those values. Here the document has already moved on to sequence number 18, so the request is rejected with a 409 version conflict.
PUT: http://localhost:9200/customer/external/1?if_seq_no=1&if_primary_term=1

Request body:

{
"name":"aa"
}

Response:

{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[1]: version conflict, required seqNo [1], primary term [1]. current document has seqNo [18] and primary term [1]",
"index_uuid": "X34j-t18T12nerLUajXISQ",
"shard": "0",
"index": "customer"
}
],
"type": "version_conflict_engine_exception",
"reason": "[1]: version conflict, required seqNo [1], primary term [1]. current document has seqNo [18] and primary term [1]",
"index_uuid": "X34j-t18T12nerLUajXISQ",
"shard": "0",
"index": "customer"
},
"status": 409
}
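To make the compare-and-set idea concrete, here is a small in-memory simulation. It is a sketch only: TinyStore and VersionConflict are hypothetical names, not part of any Elasticsearch client, and a single counter stands in for the per-shard sequence number.

```python
class VersionConflict(Exception):
    """Raised when if_seq_no / if_primary_term do not match the stored document."""

class TinyStore:
    """Hypothetical in-memory stand-in for one shard; not an Elasticsearch client."""

    def __init__(self):
        self.docs = {}          # doc_id -> (source, seq_no, primary_term)
        self.next_seq_no = 0
        self.primary_term = 1

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            # Conditional write: succeed only if nobody changed the document since we read it.
            if current is None or (current[1], current[2]) != (if_seq_no, if_primary_term):
                raise VersionConflict(f"[{doc_id}]: version conflict")
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (source, seq_no, self.primary_term)
        return seq_no

store = TinyStore()
first = store.put("1", {"name": "John Doe"})
second = store.put("1", {"name": "aa"}, if_seq_no=first, if_primary_term=1)
try:
    # A stale writer still holds the old sequence number, so this write is rejected.
    store.put("1", {"name": "bb"}, if_seq_no=first, if_primary_term=1)
    conflict = False
except VersionConflict:
    conflict = True
print(second, conflict)
```

The stale writer would normally re-read the document, pick up the new sequence number, and retry.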

Updating with POST, an ID, and _update

If the submitted data changes nothing, ES performs no operation.

POST: http://localhost:9200/customer/external/1/_update

Request body:

{
"doc" : {
"name" : "john name"
}
}

Response:

{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 7, # version number increased
"result": "updated", # the result is an update
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 23, # sequence number increased
"_primary_term": 1
}

Response when the same request is sent again:

{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 7, # unchanged
"result": "noop", # no operation was performed
"_shards": {
"total": 0,
"successful": 0,
"failed": 0
},
"_seq_no": 23, # unchanged
"_primary_term": 1
}
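The updated-then-noop behaviour can be imitated with a shallow dictionary merge. This is a simplified sketch under stated assumptions: partial_update is a made-up helper, and it only merges top-level fields, whereas a real partial update also merges nested objects.

```python
def partial_update(stored, doc):
    """Merge `doc` into `stored`; report "noop" when nothing would change."""
    merged = {**stored, **doc}
    if merged == stored:
        return stored, "noop"
    return merged, "updated"

source = {"name": "John Doe"}
source, first_result = partial_update(source, {"name": "john name"})
source, second_result = partial_update(source, {"name": "john name"})
print(first_result, second_result)
```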

Deleting a document

DELETE: http://localhost:9200/customer/external/1

Response:

{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 8,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 24,
"_primary_term": 1
}

Bulk operations (run from the Kibana console here)

The request below adds two documents at once. It is written in Kibana console syntax. _bulk is an ordinary REST endpoint, but other clients must send the body as newline-delimited JSON (Content-Type: application/x-ndjson) that ends with a newline.

elasticsearch-test-data: ES test data (gitee.com)

POST /bank/account/_bulk
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}

5 ES search APIs

Every search request puts the index name first, followed by _search.

Query conditions in the URL

GET http://localhost:9200/bank/_search?q=*&sort=account_number:asc

Searches the bank index for all documents (q=*), sorted by account_number in ascending order.

Response:

{
"took": 43,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1000, # 1000 documents in total; a search returns 10 by default
"relation": "eq" # the total is exact (eq)
},
"max_score": null, # no max score, because the results are sorted by a field rather than by relevance
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "0",
"_score": null,
"_source": {
"account_number": 0,
"balance": 16623,
"firstname": "Bradshaw",
"lastname": "Mckenzie",
"age": 29,
"gender": "F",
"address": "244 Columbus Place",
"employer": "Euron",
"email": "bradshawmckenzie@euron.com",
"city": "Hobucken",
"state": "CO"
},
"sort": [
0
]
}
]
}
}

Query conditions in a JSON body (Query DSL)

Searches the documents in the bank index.

GET http://localhost:9200/bank/_search

Request body:

{
"query": {
"match_all": {} # match every document
},
"sort": [
{
"account_number": "asc" # sort rule, short form
},
{
"balance": {
"order": "desc" # sort rule, full form
}
}
],
"from": 0,
"size": 5, # pagination: start at offset 0 and return only 5 documents
"_source": [ # return only the balance and firstname fields
"balance",
"firstname"
]
}

Response:

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1000,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "0",
"_score": null,
"_source": {
"firstname": "Bradshaw",
"balance": 16623
},
"sort": [
0,
16623
]
},
{
"_index": "bank",
"_type": "account",
"_id": "1",
"_score": null,
"_source": {
"firstname": "Amber",
"balance": 39225
},
"sort": [
1,
39225
]
},
{
"_index": "bank",
"_type": "account",
"_id": "2",
"_score": null,
"_source": {
"firstname": "Roberta",
"balance": 28838
},
"sort": [
2,
28838
]
},
{
"_index": "bank",
"_type": "account",
"_id": "3",
"_score": null,
"_source": {
"firstname": "Levine",
"balance": 44947
},
"sort": [
3,
44947
]
},
{
"_index": "bank",
"_type": "account",
"_id": "4",
"_score": null,
"_source": {
"firstname": "Rodriquez",
"balance": 27658
},
"sort": [
4,
27658
]
}
]
}
}
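The from and size values are easy to get wrong when paging. As a small sketch, the helper below derives them from a 1-based page number and returns a request body shaped like the one above; page_query and its parameters are illustrative names, not an Elasticsearch API.

```python
def page_query(page, page_size, fields):
    """Build a match_all search body for a 1-based page number."""
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,
        "_source": fields,
    }

FIRST = page_query(1, 5, ["balance", "firstname"])
THIRD = page_query(3, 5, ["balance", "firstname"])
print(FIRST["from"], THIRD["from"])
```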

Full-text search (analyzed query)

match runs an analyzed (tokenized) query, and the results are sorted by relevance score.

GET http://localhost:9200/bank/_search

Request body:

{
"query": {
"match":{
# Finds documents whose address contains any token of the analyzed query.
# "Kings Place" is split into two tokens, Kings and Place.
"address": "Kings Place"
}
}
}

Response:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 182, # 182 matching documents
"relation": "eq" # the total is exact (eq)
},
"max_score": 7.6978617, # highest score
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "20",
"_score": 7.6978617, # score of this hit (the highest)
"_source": {
"account_number": 20,
"balance": 16418,
"firstname": "Elinor",
"lastname": "Ratliff",
"age": 36,
"gender": "M",
"address": "282 Kings Place", # the field that matched the query
"employer": "Scentric",
"email": "elinorratliff@scentric.com",
"city": "Ribera",
"state": "WA"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "722",
"_score": 5.9908285, # score of this hit
"_source": {
"account_number": 722,
"balance": 27256,
"firstname": "Roberts",
"lastname": "Beasley",
"age": 34,
"gender": "F",
"address": "305 Kings Hwy",
"employer": "Quintity",
"email": "robertsbeasley@quintity.com",
"city": "Hayden",
"state": "PA"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "37",
"_score": 1.7070332,
"_source": {
"account_number": 37,
"balance": 18612,
"firstname": "Mcgee",
"lastname": "Mooney",
"age": 39,
"gender": "M",
"address": "826 Fillmore Place",
"employer": "Reversus",
"email": "mcgeemooney@reversus.com",
"city": "Tooleville",
"state": "OK"
}
}
]
}
}

Phrase matching

match_phrase matches a whole phrase. The query is still analyzed into tokens, but a document matches only if it contains all of the tokens, next to each other and in the same order. This differs from term's exact matching, which does not analyze the query at all.

GET http://localhost:9200/bank/_search

Request body:

{
"query": {
"match_phrase":{
"address": "Kings Place"
}
}
}

Response:

{
"took": 11,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 7.6978617,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "20",
"_score": 7.6978617,
"_source": {
"account_number": 20,
"balance": 16418,
"firstname": "Elinor",
"lastname": "Ratliff",
"age": 36,
"gender": "M",
"address": "282 Kings Place", # the only document that contains the whole phrase
"employer": "Scentric",
"email": "elinorratliff@scentric.com",
"city": "Ribera",
"state": "WA"
}
}
]
}
}
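The difference between match and match_phrase can be imitated on plain strings. The sketch below is a toy model, not how Lucene evaluates queries: analyze is a rough stand-in for an analyzer (lowercase, split on whitespace), and there is no scoring.

```python
def analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_value, query):
    """True if the field shares at least one token with the query (OR semantics)."""
    return bool(set(analyze(field_value)) & set(analyze(query)))

def match_phrase(field_value, query):
    """True if the query tokens appear consecutively and in order in the field."""
    tokens, phrase = analyze(field_value), analyze(query)
    return any(tokens[i:i + len(phrase)] == phrase
               for i in range(len(tokens) - len(phrase) + 1))

# Addresses taken from the responses in this post.
ADDRESSES = ["282 Kings Place", "305 Kings Hwy", "826 Fillmore Place", "990 Mill Road"]
MATCHED = [a for a in ADDRESSES if match(a, "Kings Place")]
PHRASE = [a for a in ADDRESSES if match_phrase(a, "Kings Place")]
print(MATCHED)
print(PHRASE)
```

match keeps every address that shares a token with the query, while match_phrase keeps only the address that contains both tokens side by side.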

Multi-field matching

multi_match runs the query against several fields. A document matches if any of the fields matches, much like an OR condition in SQL. The query is analyzed.

GET: http://localhost:9200/bank/_search

Request body:

{
"query": {
"multi_match":{
"query": "mill movico",
"fields": [
"address", "city" # the query is matched against these two fields
]
}
}
}

Response:

{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": 6.505949,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "472",
"_score": 6.505949,
"_source": {
"account_number": 472,
"balance": 25571,
"firstname": "Lee",
"lastname": "Long",
"age": 32,
"gender": "F",
"address": "288 Mill Street",
"employer": "Comverges",
"email": "leelong@comverges.com",
"city": "Movico",
"state": "MT"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 5.4032025,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "136",
"_score": 5.4032025,
"_source": {
"account_number": 136,
"balance": 45801,
"firstname": "Winnie",
"lastname": "Holland",
"age": 38,
"gender": "M",
"address": "198 Mill Lane",
"employer": "Neteria",
"email": "winnieholland@neteria.com",
"city": "Urie",
"state": "IL"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "345",
"_score": 5.4032025,
"_source": {
"account_number": 345,
"balance": 9812,
"firstname": "Parker",
"lastname": "Hines",
"age": 38,
"gender": "M",
"address": "715 Mill Avenue",
"employer": "Baluba",
"email": "parkerhines@baluba.com",
"city": "Blackgum",
"state": "KY"
}
}
]
}
}

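Compound queries with bool

bool combines multiple query clauses.

must: conditions that must match. Besides restricting the results they also contribute to the score, which is how must differs from filter.

must_not: conditions that must not match. Like filter, they do not contribute to the score.

should: conditions that should match. Matching them adds to the score.

GET: http://localhost:9200/bank/_search

Request body:

{
"query": {
"bool":{
"must":[ # conditions that must match
{
"match":{ # match query
"gender": "M" # gender is M
}
},
{
"match":{
"address": "mill"
}
}
],
"must_not" : [ # conditions that must not match
{
"match":{
"age": 38 # age is not 38
}
}
],
"should": [ # optional: a match raises the score; a non-matching document is still returned, just without the bonus
{
"match":{
"lastname": "Wallace"
}
}
]
}
}
}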

Response:

{
"took": 7,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 12.585751,
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "970",
"_score": 12.585751,
"_source": {
"account_number": 970,
"balance": 19648,
"firstname": "Forbes",
"lastname": "Wallace",
"age": 28,
"gender": "M",
"address": "990 Mill Road",
"employer": "Pheast",
"email": "forbeswallace@pheast.com",
"city": "Lopezo",
"state": "AK"
}
}
]
}
}
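How the three clause types interact can be shown with a toy evaluator. This is a sketch under loud assumptions: bool_query is a made-up function, clauses are plain Python predicates, and every matching must or should clause is worth exactly one point, which is nothing like real relevance scoring.

```python
def bool_query(docs, must=(), must_not=(), should=()):
    """Return (score, doc) pairs; each clause is a predicate taking a document."""
    results = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # every must clause is required
        if any(clause(doc) for clause in must_not):
            continue  # any must_not match excludes the document
        # must and should matches add to the score; here each is worth 1 point.
        score = len(must) + sum(1 for clause in should if clause(doc))
        results.append((score, doc))
    return sorted(results, key=lambda pair: -pair[0])

# The four accounts whose address contains "mill", trimmed to the fields used here.
DOCS = [
    {"lastname": "Long", "gender": "F", "age": 32, "address": "288 Mill Street"},
    {"lastname": "Wallace", "gender": "M", "age": 28, "address": "990 Mill Road"},
    {"lastname": "Holland", "gender": "M", "age": 38, "address": "198 Mill Lane"},
    {"lastname": "Hines", "gender": "M", "age": 38, "address": "715 Mill Avenue"},
]
HITS = bool_query(
    DOCS,
    must=[lambda d: d["gender"] == "M", lambda d: "mill" in d["address"].lower()],
    must_not=[lambda d: d["age"] == 38],
    should=[lambda d: d["lastname"] == "Wallace"],
)
print(HITS)
```

Only the Wallace account survives, the same single hit as in the response above.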

Filtering with filter

filter is used purely as a selection condition. Unlike must, it does not contribute to the score.

GET: http://localhost:9200/bank/_search

Request body:

{
"query": {
"bool":{
"filter":[
{
"range":{ # range query
"age":{ # age between 18 and 20, inclusive
"gte": 18,
"lte": 20
}
}
}
]
}
}
}

Response:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 44,
"relation": "eq"
},
"max_score": 0.0, # the highest score is 0, confirming that filter does not contribute to the score
"hits": [
{
"_index": "bank",
"_type": "account",
"_id": "157",
"_score": 0.0,
"_source": {
"account_number": 157,
"balance": 39868,
"firstname": "Claudia",
"lastname": "Terry",
"age": 20,
"gender": "F",
"address": "132 Gunnison Court",
"employer": "Lumbrex",
"email": "claudiaterry@lumbrex.com",
"city": "Castleton",
"state": "MD"
}
},
{
"_index": "bank",
"_type": "account",
"_id": "215",
"_score": 0.0,
"_source": {
"account_number": 215,
"balance": 37427,
"firstname": "Copeland",
"lastname": "Solomon",
"age": 20,
"gender": "M",
"address": "741 McDonald Avenue",
"employer": "Recognia",
"email": "copelandsolomon@recognia.com",
"city": "Edmund",
"state": "ME"
}
}
]
}
}

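term and exact text queries

term matches exactly and does not analyze the query. Use it for exact lookups on non-text fields, such as numeric ages or amounts.

For an exact match on a text field, query its .keyword sub-field.

GET: http://localhost:9200/bank/_search

Request body:

{
"query": {
"term": {
"age": 28
}
}
}

Exact query on a text field:

{
"query": {
"match": {
"address.keyword": "789 Madison Street" # the .keyword sub-field holds the whole, unanalyzed value
}
}
}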
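Why term works on numbers and on .keyword but is a poor fit for analyzed text comes down to what ends up in the index. The sketch below models that with plain lists; analyze is a rough stand-in for an analyzer and term_query is an illustrative helper, not an Elasticsearch call.

```python
def analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

ADDRESS = "789 Madison Street"
TEXT_TERMS = analyze(ADDRESS)   # what a text field puts in the inverted index
KEYWORD_TERMS = [ADDRESS]       # a keyword field indexes the whole value as one term

def term_query(indexed_terms, value):
    """term compares the query value with the indexed terms without analyzing it."""
    return value in indexed_terms

ON_TEXT = term_query(TEXT_TERMS, "789 Madison Street")
ON_KEYWORD = term_query(KEYWORD_TERMS, "789 Madison Street")
SINGLE_TOKEN = term_query(TEXT_TERMS, "madison")
print(ON_TEXT, ON_KEYWORD, SINGLE_TOKEN)
```

The full string is not one of the analyzed tokens, so the lookup against the text field fails, while the keyword field stores it verbatim.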

Aggregations (aggs)

Aggregations analyze and summarize the documents matched by a query, similar to GROUP BY and the aggregate functions in SQL.

Find everyone whose address contains mill, then compute their age distribution and average age, without returning the individual documents.

GET: http://localhost:9200/bank/_search

Request body:

{
"query": {
"match": {
"address": "mill" # documents whose address contains mill
}
},
"aggs": {
"ageAgg": { # custom aggregation name: ageAgg
"terms" : { # terms aggregation, similar to COUNT ... GROUP BY `age`: counts documents per age
"field": "age",
"size": 10 # show only the top 10 buckets
}
},
"ageAvg": { # custom aggregation name: ageAvg
"avg" : { # avg aggregation
"field": "age" # average age
}
}
},
"size" : 0 # return no documents, only the aggregation results
}

Response:

{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"ageAgg": { # the aggregation named ageAgg
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 38, # two people are aged 38
"doc_count": 2
},
{
"key": 28,
"doc_count": 1
},
{
"key": 32,
"doc_count": 1
}
]
},
"ageAvg": { # the aggregation named ageAvg
"value": 34.0 # the average age is 34.0
}
}
}
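The two aggregations above boil down to a grouped count and a mean. As a sanity check, the sketch below recomputes them in plain Python from the ages of the four "mill" accounts shown earlier in this post; terms_agg and avg_agg are illustrative names, and ties are broken by key simply to mirror the bucket order in the response.

```python
from collections import Counter

# Ages of the four accounts whose address matches "mill" in the responses above.
AGES = [32, 28, 38, 38]

def terms_agg(values, size=10):
    """Count documents per distinct value, most frequent first, ties by key."""
    counts = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in counts[:size]]

def avg_agg(values):
    """Arithmetic mean, like an avg aggregation."""
    return sum(values) / len(values)

BUCKETS = terms_agg(AGES)
AVERAGE = avg_agg(AGES)
print(BUCKETS)
print(AVERAGE)
```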

Count by age; count by age and gender; average balance by age and gender; average balance by age.

Request body:

{
"query": {
"match_all": {}
},
"aggs": { # aggregations
"ageAgg": { # custom aggregation name

"terms": {
"field": "age", # count by age
"size": 2 # show the top 2 buckets
},
"aggs": { # nested aggregation, computed within each age bucket

"genderAgg": { # custom aggregation name
"terms": {
"field": "gender.keyword", # aggregate on the exact (keyword) sub-field, because gender is a text field
"size": 10
},
"aggs": { # nested aggregation, computed within each age and gender bucket
"balanceAvg": { # custom aggregation name
"avg": { # average balance
"field": "balance"
}
}
}
},

"ageBalanceAvg": { # custom aggregation name
"avg": { # average (the average balance for each age)
"field": "balance"
}
}
}
}
},
"size": 0
}

Response:

{
"took": 9,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1000,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"ageAgg": { # aggregation name
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 879,
"buckets": [
{
"key": 31, # age 31
"doc_count": 61, # number of documents
"genderAgg": { # aggregation name
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "M", # gender M
"doc_count": 35, # number of documents
"balanceAvg": { # aggregation name
"value": 29565.628571428573 # average balance for age 31, gender M
}
},
{
"key": "F", # gender F
"doc_count": 26, # 26 documents
"balanceAvg": { # aggregation name
"value": 26626.576923076922 # average balance for age 31, gender F
}
}
]
},
"ageBalanceAvg": { # aggregation name
"value": 28312.918032786885 # average balance for age 31
}
},
{
"key": 39,
"doc_count": 60,
"genderAgg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "F",
"doc_count": 38,
"balanceAvg": {
"value": 26348.684210526317
}
},
{
"key": "M",
"doc_count": 22,
"balanceAvg": {
"value": 23405.68181818182
}
}
]
},
"ageBalanceAvg": {
"value": 25269.583333333332
}
}
]
}
}
}

Mapping

A mapping is similar to the column data types you declare when creating a SQL table (not to be confused with the type concept). Types under an index are optional in ES 7 and removed in ES 8.

ES infers the mapping automatically when documents are first indexed.

GET http://localhost:9200/bank/_mapping

Returns the mapping of every field.

Response:

{
"bank": {
"mappings": {
"properties": {
"account_number": {
"type": "long" # the type is long
},
"address": {
"type": "text", # the type is text
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"balance": {
"type": "long"
},
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"employer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"gender": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}

Creating a mapping manually. A mapping can be supplied when the index is created; this creates the my_index index.

PUT http://localhost:9200/my_index

Request body:

{
"mappings": { # required when creating the index with a mapping
"properties": { # the field mappings
"age": {
"type": "integer"
},
"email": {
"type": "keyword"
},
"name": {
"type": "text"
}
}
}
}

Adding a field to a mapping

PUT http://localhost:9200/my_index/_mapping

Request body:

{
"properties": { # the new field mapping
"employee-id": {
"type": "keyword",
"index": false
}
}
}

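Migrating data

The mapping of an existing field cannot be changed, so migrate the data instead: create the new index first, then reindex into it.

POST http://localhost:9200/_reindex

Request body:

{
"source": { # where to migrate from
"index": "bank",
"type": "account"
},
"dest": { # where to migrate to
"index": "newbank"
}
}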

6 Installing an analyzer

This uses the IK analyzer: medcl/elasticsearch-analysis-ik: The IK Analysis plugin integrates Lucene IK analyzer into elasticsearch, support customized dictionary. (github.com). Download the IK release that matches your ES version.
Then extract it into the plugins directory mounted earlier and restart the container so that ES loads the plugin.

Using the IK analyzer

GET http://localhost:9200/_analyze

Request body:

{
"analyzer": "ik_smart", # use the IK analyzer
"text": "Elasticsearch(简称ES)是一个基于Apache Lucene™的开源搜索引擎"
}

Response:

{
"tokens": [
{
"token": "elasticsearch",
"start_offset": 0,
"end_offset": 13,
"type": "ENGLISH",
"position": 0
},
{
"token": "简称",
"start_offset": 14,
"end_offset": 16,
"type": "CN_WORD",
"position": 1
},
{
"token": "es",
"start_offset": 16,
"end_offset": 18,
"type": "ENGLISH",
"position": 2
},
{
"token": "是",
"start_offset": 19,
"end_offset": 20,
"type": "CN_CHAR",
"position": 3
},
{
"token": "一个",
"start_offset": 20,
"end_offset": 22,
"type": "CN_WORD",
"position": 4
},
{
"token": "基于",
"start_offset": 22,
"end_offset": 24,
"type": "CN_WORD",
"position": 5
},
{
"token": "apache",
"start_offset": 24,
"end_offset": 30,
"type": "ENGLISH",
"position": 6
},
{
"token": "lucene",
"start_offset": 31,
"end_offset": 37,
"type": "ENGLISH",
"position": 7
},
{
"token": "的",
"start_offset": 38,
"end_offset": 39,
"type": "CN_CHAR",
"position": 8
},
{
"token": "开源",
"start_offset": 39,
"end_offset": 41,
"type": "CN_WORD",
"position": 9
},
{
"token": "搜索引擎",
"start_offset": 41,
"end_offset": 45,
"type": "CN_WORD",
"position": 10
}
]
}