Elasticsearch Basics

2023-03-26
Author: songbirds
~49.03K characters

A brief introduction to Elasticsearch. Note that these notes are based on Elasticsearch 7; version 8 has removed the type concept.

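Elasticsearch (ES for short) is an open-source search engine built on Apache Lucene™. Lucene is arguably the most advanced, best-performing, and most fully featured search engine library available, whether open source or proprietary. Note, however, that Lucene is only a library: to use it, you have to work in Java and integrate it into your application yourself.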

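Lucene is very complex, and you need a solid understanding of information retrieval to see how it works, much as one starts with servlets before learning Spring MVC. Solr and Elasticsearch emerged to take over that tedious, complicated work. Elasticsearch is written in Java and uses Lucene to build indexes and run searches, but its goal is to make full-text search simple and to hide Lucene's complexity behind a simple, coherent RESTful API.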

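The introduction above is taken from the blog post "Elasticsearch基本概念" (波斯_辣椒's blog) and reproduced under the CC 4.0 BY-SA license. (A bit lazy of me, but introductions are all much the same (❁´◡`❁).)
Note: Solr is another search server built on Lucene.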

Elasticsearch: The Definitive Guide | Elastic. Chinese documentation; may be out of date.

Elasticsearch Guide | Elastic. English documentation.

1. Installing Elasticsearch

For Docker commands, see the blog post "Docker命令_各种参数简介".

Download via Docker

The version downloaded here is 7.6.2

docker pull elasticsearch:7.6.2

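Configuration

# Create directories so the container's config, data, and plugins can be mapped to the host
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
mkdir -p /mydata/elasticsearch/plugins
# Write `http.host: 0.0.0.0` into elasticsearch.yml under config.
# This accepts connections from any IP; restrict source IPs at the firewall instead (the local IP is dynamic).
echo "http.host: 0.0.0.0" > /mydata/elasticsearch/config/elasticsearch.yml

# Make everything under /mydata/elasticsearch/ readable and writable (-R means recursive).
# Production setups should use tighter permissions.
chmod -R 777 /mydata/elasticsearch/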

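Starting Elasticsearch

Run the command below as a whole. Notes:

docker run is used only the first time (use docker start afterwards); it creates and starts a new container from the image. No container ID or name needs to be given.
docker start starts an existing container again and is used for later restarts. It requires the container ID or name.
View the startup logs: docker logs <container name or ID>
If the installation or configuration is wrong, consider deleting the container (not the image):
1. docker stop <container ID>
2. docker rm <container ID>
3. Run docker run again

# Creates a container named elasticsearch and maps container ports 9200 (HTTP requests)
# and 9300 (cluster communication) to the same ports on the host.
# -e "discovery.type=single-node": start in single-node mode (`-e` sets an environment variable in the container).
# -e ES_JAVA_OPTS: cap the JVM heap; otherwise ES may take a large share of the memory (it is memory hungry).
# -v: bind mounts; maps elasticsearch.yml, the data directory, and the plugins directory in the container to the host.
# -d: run in the background.
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.6.2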

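Start on boot (optional)

docker update elasticsearch --restart=always

Open ports: 9200 and 9300
Access

Visit ip:port directly; the port here is 9200, as configured above. If the installation failed, check the logs with docker logs <container ID>.

Figure: accessing Elasticsearch after installation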

2. Installing Kibana

Download via Docker; the version must match the Elasticsearch version

docker pull kibana:7.6.2

Start Kibana and tell it where Elasticsearch is

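# Option 1: point Kibana at the host's (LAN) IP address
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.6.128:9200 -p 5601:5601 -d kibana:7.6.2
# Option 2 (recommended): link the containers so that no IP address is needed
# `-d` runs in the background; `-p` maps host port 5601 to container port 5601; `--link` links the container named elasticsearch,
# which makes it reachable from the Kibana container under the hostname elasticsearch.
# `-e` sets ELASTICSEARCH_HOSTS (the variable Kibana 7 reads). Inside the Kibana container, localhost is Kibana itself, so use the linked hostname; 9200 is the Elasticsearch port.
docker run -d -p 5601:5601 --link elasticsearch -e "ELASTICSEARCH_HOSTS=http://elasticsearch:9200" kibana:7.6.2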

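Open port: 5601
Accessing Kibana

If the installation succeeded, Kibana is reachable at host ip:port, here port 5601. If the installation failed, check the logs with docker logs <container ID>.

Figure: accessing Kibana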

3. Elasticsearch Concepts

Basic concepts:

Index (indices) ------------------- Database
Type ---------------------- Table. One or more types can be defined under an index. (Removed in ES 8)
Document --------------- Row. Stored as JSON
Field --------------------- Column

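Forward index:

A structure that uses each document's unique ID as the key and the document content as the record, for example the ID column of a relational database.

docID	value
1	动态规划 (dynamic programming)
2	动态壁纸超好看 (dynamic wallpapers look great)
3	好看动态图 (good-looking dynamic images)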

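Inverted index

A structure that uses the words in the document content as keys and the docIDs of the documents containing each word as the record.

1. Build the forward index first, giving each document a number as its unique identifier (the docID in the forward index table above).
2. Tokenize the field (which is why there are various analyzers).
3. Build the inverted index table from the tokens. A term is a token; its posting list records which docIDs have a value containing that token.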

term	posting list
动态 (dynamic)	1, 2, 3
规划 (programming)	1
好看 (good-looking)	2, 3
壁纸 (wallpaper)	2
图 (image)	3

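These words are the terms, and the posting list stores the original IDs: the IDs of all documents that contain a given term.

When searching for 动态高清壁纸 ("dynamic HD wallpaper"), ID=2 is hit twice (动态 and 壁纸), while ID=1 and ID=3 are hit once each (动态). After a series of scoring algorithms, 动态壁纸超好看 gets the highest score. Every document that was hit is returned, but the others score lower.

Of course, storing things this naively would take a lot of memory, and looking up a term would be slow; Lucene has its own, much more sophisticated implementation underneath.
PS: OK, I admit I just don't understand it 😭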
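The steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the tokenization is hard-coded to match the table (a real analyzer would produce it), and the "score" is a plain hit count rather than Lucene's actual scoring. The names `DOCS`, `build_inverted_index`, and `search` are made up for this illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# docID -> terms. The segmentation is hard-coded (an assumption);
# a real analyzer would derive these terms from the raw text.
DOCS: Dict[int, List[str]] = {
    1: ["动态", "规划"],
    2: ["动态", "壁纸", "好看"],
    3: ["好看", "动态", "图"],
}


def build_inverted_index(docs: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of docIDs that contain it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index: Dict[str, List[int]], query_terms: List[str]) -> List[Tuple[int, int]]:
    """Return (docID, hit count) pairs, most hits first, ties broken by docID."""
    hits = Counter()
    for term in query_terms:
        for doc_id in index.get(term, []):
            hits[doc_id] += 1
    return sorted(hits.items(), key=lambda pair: (-pair[1], pair[0]))


index = build_inverted_index(DOCS)
# The query 动态高清壁纸, pre-split into three terms; 高清 is not in the index.
ranking = search(index, ["动态", "高清", "壁纸"])
print(index["动态"])  # [1, 2, 3]
print(ranking)        # [(2, 2), (1, 1), (3, 1)]
```

Document 2 comes first because two of the query terms appear in its posting lists. Real scoring also weighs how often and how rarely a term occurs, so treat the hit count as a rough stand-in.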

4. Elasticsearch APIs

The API is RESTful.

List all commands supported by cat

GET: http://localhost:9200/_cat

Response (plain text)

=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates

View node information (the _cat APIs return compact, human-readable text)

GET: http://localhost:9200/_cat/nodes

127.0.0.1 17 97 1 0.01 0.02 0.00 dilm * a0afe6713d7f

Check cluster health

GET http://localhost:9200/_cat/health

1680096955 13:35:55 elasticsearch yellow 1 1 6 6 0 0 3 0 - 66.7%

View master node information

GET http://localhost:9200/_cat/master

Z8a8Ekp4TVuqkv9rEQEzzA 127.0.0.1 127.0.0.1 a0afe6713d7f

List the ES indices (analogous to databases)

GET http://localhost:9200/_cat/indices

yellow open website                  LA03aN0qStmXdO8nmuzWDw 1 1    2 2   8.6kb   8.6kb
yellow open bank                     tAFAIEHTSkK-Oavb2Wd3JQ 1 1 1000 0 414.3kb 414.3kb
green  open .kibana_task_manager_1   V13OX5BcTiaN4YLaXsObfQ 1 0    2 0  21.7kb  21.7kb
green  open .apm-agent-configuration isjY4x8cRvCW2XMevs0ypw 1 0    0 0    283b    283b
green  open .kibana_1                mK0GqkVdREKfabACDAxPKQ 1 0    8 0  25.2kb  25.2kb
yellow open customer                 X34j-t18T12nerLUajXISQ 1 1    3 0   3.7kb   3.7kb

Get a document

GET http://localhost:9200/customer/external/1
Fetches the document with ID 1 of type external in the customer index

{
    "_index": "customer", # index
    "_type": "external", # type
    "_id": "1", # id
    "_version": 1, # version number
    "_seq_no": 10, # sequence number, used for concurrency control
    "_primary_term": 1,
    "found": true,
    "_source": {
        # the actual key-value pairs
        "name": "John Doe"
    }
}

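Create and update with POST

The ID is optional. Without an ID, one is generated automatically and a new document is created. With an ID, the document is created if it does not exist yet; otherwise it is updated, and both _version and _seq_no increase.
POST: http://localhost:9200/customer/external/2

Request body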

{
 "name":"John Doe"
}

Response

{
    "_index": "customer",
    "_type": "external",
    "_id": "2",
    "_version": 8,
    "result": "updated", # the result is an update
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 14,
    "_primary_term": 1
}

Create and update with PUT

The ID is required. If no document with that ID exists, it is created; otherwise it is updated
PUT: http://localhost:9200/customer/external/1

Request body:

{
 "name":"John Doe"
}

Response:

{
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 5,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 18,
    "_primary_term": 1
}

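Updating with optimistic locking

_seq_no and _primary_term can be used for optimistic-locking updates by passing if_seq_no=1&if_primary_term=1: the write succeeds only if both values still match the document's current sequence number and primary term.
PUT: http://localhost:9200/customer/external/1?if_seq_no=1&if_primary_term=1

Request body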

{
 "name":"aa"
}

Response (the document is already at seqNo 18, so the request is rejected with a 409 conflict)

{
    "error": {
        "root_cause": [
            {
                "type": "version_conflict_engine_exception",
                "reason": "[1]: version conflict, required seqNo [1], primary term [1]. current document has seqNo [18] and primary term [1]",
                "index_uuid": "X34j-t18T12nerLUajXISQ",
                "shard": "0",
                "index": "customer"
            }
        ],
        "type": "version_conflict_engine_exception",
        "reason": "[1]: version conflict, required seqNo [1], primary term [1]. current document has seqNo [18] and primary term [1]",
        "index_uuid": "X34j-t18T12nerLUajXISQ",
        "shard": "0",
        "index": "customer"
    },
    "status": 409
}
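To make the read, conflict, re-read, retry cycle concrete, here is a small in-memory sketch. `FakeIndex` and `VersionConflict` are hypothetical stand-ins written for this note, not an Elasticsearch client; they only mimic the compare-then-write rule behind if_seq_no and if_primary_term.

```python
from typing import Dict, Optional


class VersionConflict(Exception):
    """Stands in for the HTTP 409 version_conflict_engine_exception."""


class FakeIndex:
    """Hypothetical in-memory stand-in for one index; not a real ES client."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}
        self._next_seq_no = 0
        self.primary_term = 1

    def put(self, doc_id: str, source: dict,
            if_seq_no: Optional[int] = None,
            if_primary_term: Optional[int] = None) -> dict:
        current = self._docs.get(doc_id)
        if if_seq_no is not None:
            # Conditional write: reject unless the caller saw the latest state.
            if (current is None
                    or current["_seq_no"] != if_seq_no
                    or current["_primary_term"] != if_primary_term):
                raise VersionConflict(doc_id)
        stored = {"_source": source, "_seq_no": self._next_seq_no,
                  "_primary_term": self.primary_term}
        self._next_seq_no += 1
        self._docs[doc_id] = stored
        return stored

    def get(self, doc_id: str) -> dict:
        return self._docs[doc_id]


index = FakeIndex()
index.put("1", {"name": "John Doe"})       # stored with seq_no 0
seen = index.get("1")                      # our reader remembers seq_no 0
index.put("1", {"name": "someone else"})   # a concurrent writer gets seq_no 1

try:
    index.put("1", {"name": "aa"}, if_seq_no=seen["_seq_no"],
              if_primary_term=seen["_primary_term"])
    outcome = "updated"
except VersionConflict:
    outcome = "conflict"

# Re-read the document and retry with the fresh values.
fresh = index.get("1")
retried = index.put("1", {"name": "aa"}, if_seq_no=fresh["_seq_no"],
                    if_primary_term=fresh["_primary_term"])
print(outcome, retried["_seq_no"])
```

The first conditional write fails because another writer got in between the read and the write; only the retry with up-to-date values goes through.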

Partial update with POST, using an ID and `_update`

If the update would not change the document, nothing is done

POST: http://localhost:9200/customer/external/1/_update

Request body:

{
    "doc" : {
        "name" : "john name"
    }
}

Response

{
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 7, # version number increased
    "result": "updated", # the result is an update
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 23, # sequence number increased
    "_primary_term": 1
}

Calling it again with the same body

{
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 7, # unchanged
    "result": "noop", # noop: nothing was done
    "_shards": {
        "total": 0,
        "successful": 0,
        "failed": 0
    },
    "_seq_no": 23, # unchanged
    "_primary_term": 1
}

Deleting data

DELETE: http://localhost:9200/customer/external/1

Response

{
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 8,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 24,
    "_primary_term": 1
}

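Bulk operations (shown in the Kibana console; other clients must send the body as newline-delimited JSON)

The following adds two documents in bulk

elasticsearch-test-data: ES test data (gitee.com)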

POST /bank/account/_bulk

{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"[email protected]","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"[email protected]","city":"Dante","state":"TN"}
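The body above is newline-delimited JSON: one action line, then one document line, repeated. A minimal sketch of building such a body in Python follows; `build_bulk_body` is a helper invented here, and the documents are trimmed to a few fields for brevity.

```python
import json
from typing import Iterable, Tuple


def build_bulk_body(docs: Iterable[Tuple[str, dict]]) -> str:
    """Build an NDJSON body: an `index` action line followed by the document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


accounts = [
    ("1", {"account_number": 1, "firstname": "Amber", "lastname": "Duke"}),
    ("6", {"account_number": 6, "firstname": "Hattie", "lastname": "Bond"}),
]
body = build_bulk_body(accounts)
print(body, end="")
```

When sending this from a client other than the Kibana console, the request would go to /bank/account/_bulk with the Content-Type header set to application/x-ndjson.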

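5. ES Search APIs

Every search request is the index name followed by _search.

Query conditions in the URL

GET http://localhost:9200/bank/_search?q=*&sort=account_number:asc

Searches the bank index for all documents (q=*), sorted by account_number in ascending order

Response JSON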

{
    "took": 43,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
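    },
    "hits": {
        "total": {
            "value": 1000,  # 1000 documents in total; this query returns 10 by default
            "relation": "eq" # the total count is exact (eq)
        },
        "max_score": null, # no max score, because the results are sorted by a field rather than by score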
        "hits": [
            {
                "_index": "bank",
                "_type": "account",
                "_id": "0",
                "_score": null,
                "_source": {
                    "account_number": 0,
                    "balance": 16623,
                    "firstname": "Bradshaw",
                    "lastname": "Mckenzie",
                    "age": 29,
                    "gender": "F",
                    "address": "244 Columbus Place",
                    "employer": "Euron",
                    "email": "[email protected]",
                    "city": "Hobucken",
                    "state": "CO"
                },
                "sort": [
                    0
                ]
            }
        ]
    }
}

Query conditions in JSON (Query DSL)

Search the documents in the bank index

GET http://localhost:9200/bank/_search

Request body

{
    "query": {
        "match_all": {}  # match everything
    },
    "sort": [
        {
            "account_number": "asc" # sort rule, short form
        },
        {
            "balance": {
                "order": "desc" # sort rule, full form
            }
        }
    ],
    "from": 0,
    "size": 5,  # pagination: start at offset 0 and return only 5 documents
    "_source": [   # return only the balance and firstname fields
        "balance",
        "firstname"
    ]
}

Response JSON

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1000,
            "relation": "eq"
        },
        "max_score": null,
        "hits": [
            {
                "_index": "bank",
                "_type": "account",
                "_id": "0",
                "_score": null,
                "_source": {
                    "firstname": "Bradshaw",
                    "balance": 16623
                },
                "sort": [
                    0,
                    16623
                ]
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "1",
                "_score": null,
                "_source": {
                    "firstname": "Amber",
                    "balance": 39225
                },
                "sort": [
                    1,
                    39225
                ]
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "2",
                "_score": null,
                "_source": {
                    "firstname": "Roberta",
                    "balance": 28838
                },
                "sort": [
                    2,
                    28838
                ]
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "3",
                "_score": null,
                "_source": {
                    "firstname": "Levine",
                    "balance": 44947
                },
                "sort": [
                    3,
                    44947
                ]
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "4",
                "_score": null,
                "_source": {
                    "firstname": "Rodriquez",
                    "balance": 27658
                },
                "sort": [
                    4,
                    27658
                ]
            }
        ]
    }
}
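The `#` notes in the request bodies throughout these notes are annotations for the reader; they are not valid JSON and need to be removed before sending. One way to sidestep that is to build the body in code. The sketch below does so for the paginated query above; `build_search_body` and its zero-based `page` parameter are assumptions made for this example.

```python
import json
from typing import List, Optional


def build_search_body(page: int, page_size: int,
                      source_fields: Optional[List[str]] = None) -> dict:
    """Build a match_all search body for a zero-based page number."""
    body = {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}, {"balance": {"order": "desc"}}],
        "from": page * page_size,  # offset of the first hit to return
        "size": page_size,         # number of hits per page
    }
    if source_fields is not None:
        body["_source"] = source_fields  # restrict the returned fields
    return body


first_page = build_search_body(0, 5, ["balance", "firstname"])
third_page = build_search_body(2, 5)
print(json.dumps(first_page, indent=4))
```

`first_page` reproduces the request shown above, and `third_page` would ask for hits 10 to 14 with the full `_source`.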

Full-text search (analyzed query)

match runs an analyzed (tokenized) query and sorts the results by score

GET http://localhost:9200/bank/_search

Request body

{
    "query": {
        "match":{
            # Find documents whose address contains any term of the analyzed "Kings Place".
            # The query is split into two terms, Kings and Place.
            "address": "Kings Place"
        }
    }
}

Response JSON

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 182, # 182 documents
            "relation": "eq"  # the total count is exact
        },
        "max_score": 7.6978617, # highest score
        "hits": [
            {
                "_index": "bank",
                "_type": "account",
                "_id": "20",
                "_score": 7.6978617,  # score of this hit (the highest)
                "_source": {
                    "account_number": 20,
                    "balance": 16418,
                    "firstname": "Elinor",
                    "lastname": "Ratliff",
                    "age": 36,
                    "gender": "M",
                    "address": "282 Kings Place", # the field the query matched
                    "employer": "Scentric",
                    "email": "[email protected]",
                    "city": "Ribera",
                    "state": "WA"
                }
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "722",
                "_score": 5.9908285, # score of this hit
                "_source": {
                    "account_number": 722,
                    "balance": 27256,
                    "firstname": "Roberts",
                    "lastname": "Beasley",
                    "age": 34,
                    "gender": "F",
                    "address": "305 Kings Hwy",
                    "employer": "Quintity",
                    "email": "[email protected]",
                    "city": "Hayden",
                    "state": "PA"
                }
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "37",
                "_score": 1.7070332,
                "_source": {
                    "account_number": 37,
                    "balance": 18612,
                    "firstname": "Mcgee",
                    "lastname": "Mooney",
                    "age": 39,
                    "gender": "M",
                    "address": "826 Fillmore Place",
                    "employer": "Reversus",
                    "email": "[email protected]",
                    "city": "Tooleville",
                    "state": "OK"
                }
            }
        ]
    }
}

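Phrase matching

Use match_phrase for phrase matching. The query is still analyzed, but a document matches only if it contains all the resulting terms, adjacent and in the same order. This differs from a term query, which looks up the value without analyzing it.

GET http://localhost:9200/bank/_search

Request body: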

{
    "query": {
        "match_phrase":{
            "address": "Kings Place"
        }
    }
}

Response JSON

{
    "took": 11,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 7.6978617,
        "hits": [
            {
                "_index": "bank",
                "_type": "account",
                "_id": "20",
                "_score": 7.6978617,
                "_source": {
                    "account_number": 20,
                    "balance": 16418,
                    "firstname": "Elinor",
                    "lastname": "Ratliff",
                    "age": 36,
                    "gender": "M",
                    "address": "282 Kings Place", # the only document containing the exact phrase
                    "employer": "Scentric",
                    "email": "[email protected]",
                    "city": "Ribera",
                    "state": "WA"
                }
            }
        ]
    }
}
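The difference between match and match_phrase can be imitated on a handful of the addresses from the responses above. This is a rough model only: `analyze` is a crude stand-in for a real analyzer (lowercase, split on whitespace), and all function names are invented for the illustration.

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(field_value: str, query: str) -> bool:
    """True if the field shares at least one term with the query."""
    return bool(set(analyze(field_value)) & set(analyze(query)))


def match_phrase(field_value: str, query: str) -> bool:
    """True if the query terms appear adjacent and in order in the field."""
    tokens, phrase = analyze(field_value), analyze(query)
    return any(tokens[i:i + len(phrase)] == phrase
               for i in range(len(tokens) - len(phrase) + 1))


addresses = ["282 Kings Place", "305 Kings Hwy", "826 Fillmore Place", "990 Mill Road"]
matched = [a for a in addresses if match(a, "Kings Place")]
phrase_matched = [a for a in addresses if match_phrase(a, "Kings Place")]
print(matched)
print(phrase_matched)
```

`matched` keeps every address that contains Kings or Place, while `phrase_matched` keeps only "282 Kings Place", which mirrors the 182 hits versus 1 hit seen in the two responses.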

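Multi-field matching

Use multi_match to match against several fields. A document matches if any of the fields matches the query, similar to an OR condition in SQL. The query is analyzed.

GET: http://localhost:9200/bank/_search

Request body: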

{
    "query": {
        "multi_match":{
            "query": "mill movico",
            "fields": [
                "address", "city"  # the query is matched against these two fields
            ]
        }
    }
}

Response JSON

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": 6.505949,
        "hits": [
            {
                "_index": "bank",
                "_type": "account",
                "_id": "472",
                "_score": 6.505949,
                "_source": {
                    "account_number": 472,
                    "balance": 25571,
                    "firstname": "Lee",
                    "lastname": "Long",
                    "age": 32,
                    "gender": "F",
                    "address": "288 Mill Street",
                    "employer": "Comverges",
                    "email": "[email protected]",
                    "city": "Movico",
                    "state": "MT"
                }
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "970",
                "_score": 5.4032025,
                "_source": {
                    "account_number": 970,
                    "balance": 19648,
                    "firstname": "Forbes",
                    "lastname": "Wallace",
                    "age": 28,
                    "gender": "M",
                    "address": "990 Mill Road",
                    "employer": "Pheast",
                    "email": "[email protected]",
                    "city": "Lopezo",
                    "state": "AK"
                }
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "136",
                "_score": 5.4032025,
                "_source": {
                    "account_number": 136,
                    "balance": 45801,
                    "firstname": "Winnie",
                    "lastname": "Holland",
                    "age": 38,
                    "gender": "M",
                    "address": "198 Mill Lane",
                    "employer": "Neteria",
                    "email": "[email protected]",
                    "city": "Urie",
                    "state": "IL"
                }
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "345",
                "_score": 5.4032025,
                "_source": {
                    "account_number": 345,
                    "balance": 9812,
                    "firstname": "Parker",
                    "lastname": "Hines",
                    "age": 38,
                    "gender": "M",
                    "address": "715 Mill Avenue",
                    "employer": "Baluba",
                    "email": "[email protected]",
                    "city": "Blackgum",
                    "state": "KY"
                }
            }
        ]
    }
}

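bool compound queries

Used to combine multiple query clauses.

must: conditions that must be satisfied; they also contribute to the score, unlike filter

must_not: conditions that must not be satisfied; they do not contribute to the score, same as filter

should: optional conditions; satisfying them contributes to the score.

GET: http://localhost:9200/bank/_search

Request JSON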

{
    "query": {
        "bool":{
            "must":[  # conditions that must be satisfied
                {
                    "match":{  # match query
                        "gender": "M"  # the gender field matches M
                    }
                },
                {
                    "match":{
                        "address": "mill"
                    }
                }
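            ],
            "must_not" : [  # conditions that must not be satisfied
                {
                    "match":{
                        "age": 38 # age must not be 38
                    }
                }
            ],
            "should": [  # optional: a match adds to the score; documents that do not match are still returned, just without the bonus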
                {
                    "match":{
                        "lastname": "Wallace"
                    }
                }
            ]
        }
    }
}

Response JSON

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 12.585751,
        "hits": [
            {
                "_index": "bank",
                "_type": "account",
                "_id": "970",
                "_score": 12.585751,
                "_source": {
                    "account_number": 970,
                    "balance": 19648,
                    "firstname": "Forbes",
                    "lastname": "Wallace",
                    "age": 28,
                    "gender": "M",
                    "address": "990 Mill Road",
                    "employer": "Pheast",
                    "email": "[email protected]",
                    "city": "Lopezo",
                    "state": "AK"
                }
            }
        ]
    }
}
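The clause semantics can be replayed on the four "mill" accounts that appeared in the multi_match response. The sketch below is a simplification made up for this note: clauses are plain Python predicates, and the score is just the number of matched must and should clauses, not the relevance score ES computes.

```python
from typing import Callable, Dict, List, Tuple

ACCOUNTS: List[Dict] = [
    {"id": 472, "gender": "F", "age": 32, "lastname": "Long", "address": "288 Mill Street"},
    {"id": 970, "gender": "M", "age": 28, "lastname": "Wallace", "address": "990 Mill Road"},
    {"id": 136, "gender": "M", "age": 38, "lastname": "Holland", "address": "198 Mill Lane"},
    {"id": 345, "gender": "M", "age": 38, "lastname": "Hines", "address": "715 Mill Avenue"},
]

Clause = Callable[[Dict], bool]


def bool_query(docs: List[Dict], must: List[Clause], must_not: List[Clause],
               should: List[Clause]) -> List[Tuple[int, int]]:
    """Return (id, score) pairs; the score counts matched must and should clauses."""
    results = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # every must clause is required
        if any(clause(doc) for clause in must_not):
            continue  # any must_not match excludes the document
        score = len(must) + sum(1 for clause in should if clause(doc))
        results.append((doc["id"], score))
    return sorted(results, key=lambda pair: -pair[1])


hits = bool_query(
    ACCOUNTS,
    must=[lambda d: d["gender"] == "M",
          lambda d: "mill" in d["address"].lower().split()],
    must_not=[lambda d: d["age"] == 38],
    should=[lambda d: d["lastname"] == "Wallace"],
)
print(hits)  # [(970, 3)]
```

Only account 970 survives, as in the response above: 472 fails the gender clause, and 136 and 345 are excluded by must_not.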

filter

Used as a filtering condition; it does not contribute to the score, unlike must (and like must_not)

GET: http://localhost:9200/bank/_search

Request body

{
    "query": {
        "bool":{
            "filter":[
                {
                    "range":{ # range query
                        "age":{  # age between 18 and 20, inclusive
                            "gte": 18,
                            "lte": 20
                        }
                    }
                }
            ]
        }
    }
}

Response JSON

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 44,
            "relation": "eq"
        },
        "max_score": 0.0,  # the highest score is 0, which confirms that filter contributes no score
        "hits": [
            {
                "_index": "bank",
                "_type": "account",
                "_id": "157",
                "_score": 0.0,
                "_source": {
                    "account_number": 157,
                    "balance": 39868,
                    "firstname": "Claudia",
                    "lastname": "Terry",
                    "age": 20,
                    "gender": "F",
                    "address": "132 Gunnison Court",
                    "employer": "Lumbrex",
                    "email": "[email protected]",
                    "city": "Castleton",
                    "state": "MD"
                }
            },
            {
                "_index": "bank",
                "_type": "account",
                "_id": "215",
                "_score": 0.0,
                "_source": {
                    "account_number": 215,
                    "balance": 37427,
                    "firstname": "Copeland",
                    "lastname": "Solomon",
                    "age": 20,
                    "gender": "M",
                    "address": "741 McDonald Avenue",
                    "employer": "Recognia",
                    "email": "[email protected]",
                    "city": "Edmund",
                    "state": "ME"
                }
            }
        ]
    }
}

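term and exact queries on text

term is an exact match: the query value is not analyzed. Use it for exact queries on non-text fields, such as numbers like age or amounts.

For exact matching on a text field, use field.keyword.

GET: http://localhost:9200/bank/_search

Request body: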

{
    "query": {
        "term": {
            "age": 28
        }
    }
}

Exact query on text

{
    "query": {
        "match": {
            "address.keyword": "789 Madison Street" # the keyword sub-field
        }
    }
}
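Why term works on numbers but the full address needs .keyword comes down to what gets indexed. A toy sketch, assuming a crude `analyze` function (lowercase, split on whitespace) as a stand-in for the default analyzer; all names here are invented for the illustration.

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Crude stand-in for the default analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def term_query(indexed_terms: List[str], value: str) -> bool:
    """A term query compares the value as-is against the indexed terms."""
    return value in indexed_terms


address = "789 Madison Street"
text_terms = analyze(address)   # roughly what a `text` field indexes
keyword_terms = [address]       # the `.keyword` sub-field keeps the whole string

on_text = term_query(text_terms, "789 Madison Street")
on_keyword = term_query(keyword_terms, "789 Madison Street")
on_text_single = term_query(text_terms, "madison")
print(on_text, on_keyword, on_text_single)  # False True True
```

The full string is not one of the indexed terms of the text field, so an exact lookup finds it only in the keyword sub-field, while a single lowercased term does match the text field.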

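Aggregations (aggs)

Used to analyze and summarize the results of a query, similar to GROUP BY and the aggregate functions in SQL.

Search for accounts whose address contains mill, and return the age distribution and the average age of those people, without returning their details.

GET: http://localhost:9200/bank/_search

Request body: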

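{
    "query": {
        "match": {
            "address": "mill"   # documents whose address contains mill
        }
    },
    "aggs": {
        "ageAgg": { # custom aggregation name: ageAgg
            "terms" : { # terms aggregation, similar to COUNT with GROUP BY `age`: counts documents per age
                "field": "age",
                "size": 10 # show only the top 10 buckets
            }
        },
        "ageAvg": { # custom aggregation name: ageAvg
            "avg" : { # avg aggregation
                "field": "age" # compute the average age
            }
        }
    },
    "size" : "0"  # page size 0: return no documents, only the aggregation results
}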

Response JSON

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 4,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "ageAgg": {  # the aggregation named ageAgg
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": 38,  # two people are aged 38
                    "doc_count": 2
                },
                {
                    "key": 28,
                    "doc_count": 1
                },
                {
                    "key": 32,
                    "doc_count": 1
                }
            ]
        },
        "ageAvg": {  # the aggregation named ageAvg
            "value": 34.0  # the average age is 34.0
        }
    }
}
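To check the numbers by hand, the same two aggregations can be computed locally on the ages of the four "mill" accounts seen earlier (32, 28, 38, 38). This is plain Python, not an ES call; ties between buckets are broken by key here, which is an assumption chosen to reproduce the order in the response.

```python
from collections import Counter

# Ages of the four accounts whose address contains "mill" (from the responses above).
ages = [32, 28, 38, 38]

# terms aggregation: count documents per distinct value, largest bucket first.
counts = Counter(ages)
buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:10]

# avg aggregation: arithmetic mean of the field.
age_avg = sum(ages) / len(ages)

print(buckets)  # [(38, 2), (28, 1), (32, 1)]
print(age_avg)  # 34.0
```

Both results agree with the buckets and the ageAvg value in the response.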

Count by age; count by age and gender; average balance by age and gender; average balance by age

Request JSON

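{
    "query": {
        "match_all": {}
    },
    "aggs": {  # aggregations
        "ageAgg": {  # custom aggregation name
            "terms": {
                "field": "age",  # count by age
                "size": 2   # show the top 2 buckets
            },
            "aggs": { # nested aggregation, computed within each age bucket
                "genderAgg": {  # custom aggregation name
                    "terms": {
                        "field": "gender.keyword",  # aggregate on the exact (keyword) sub-field, because gender is a text field
                        "size": 10
                    },
                    "aggs": {  # nested aggregation, computed within each age and gender bucket
                        "balanceAvg": { # custom aggregation name
                            "avg": { # average of balance
                                "field": "balance"
                            }
                        }
                    }
                },

                "ageBalanceAvg": { # custom aggregation name
                    "avg": { # average (the average balance for each age)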
                        "field": "balance"
                    }
                }
            }
        }
    },
    "size": "0"
}

Response JSON

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1000,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "ageAgg": { # aggregation name
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 879,
            "buckets": [
                {
                    "key": 31, # age 31
                    "doc_count": 61, # document count
                    "genderAgg": { # aggregation name
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0,
                        "buckets": [
                            {
                                "key": "M",  # gender M
                                "doc_count": 35, # document count
                                "balanceAvg": {  # aggregation name
                                    "value": 29565.628571428573  # average balance (age 31, gender M)
                                }
                            },
                            {
                                "key": "F", # gender F
                                "doc_count": 26, # document count: 26
                                "balanceAvg": { # aggregation name
                                    "value": 26626.576923076922 # average balance (age 31, gender F)
                                }
                            }
                        ]
                    },
                    "ageBalanceAvg": {  # aggregation name
                        "value": 28312.918032786885 # average balance for age 31
                    }
                },
                {
                    "key": 39,
                    "doc_count": 60,
                    "genderAgg": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0,
                        "buckets": [
                            {
                                "key": "F",
                                "doc_count": 38,
                                "balanceAvg": {
                                    "value": 26348.684210526317
                                }
                            },
                            {
                                "key": "M",
                                "doc_count": 22,
                                "balanceAvg": {
                                    "value": 23405.68181818182
                                }
                            }
                        ]
                    },
                    "ageBalanceAvg": {
                        "value": 25269.583333333332
                    }
                }
            ]
        }
    }
}

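Mapping

Similar to the column data types defined when creating a SQL table (not to be confused with type). The type under an index is optional in ES 7 and removed in version 8.

ES infers the mapping automatically when data is first indexed

GET http://localhost:9200/bank/_mapping

Gets the mappings of all fields

Response JSON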

{
    "bank": {
        "mappings": {
            "properties": {
                "account_number": {
                    "type": "long" # the type is long
                },
                "address": {
                    "type": "text",  # the type is text
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "age": {
                    "type": "long"
                },
                "balance": {
                    "type": "long"
                },
                "city": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "email": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "employer": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "firstname": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "gender": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "lastname": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "state": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}

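Creating a mapping manually: a mapping can be supplied when the index is created. The request below creates the my_index index.

PUT http://localhost:9200/my_index

Request body (the mappings key is required when the mapping is supplied at index creation; properties holds the field definitions):

{
    "mappings": {
        "properties": {
            "age": {
                "type": "integer"
            },
            "email": {
                "type": "keyword"
            },
            "name": {
                "type": "text"
            }
        }
    }
}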
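As a rough illustration of the request above, here is a minimal Python sketch that builds the same PUT request using only the standard library. The ES_URL value and the helper name build_create_index_request are assumptions made for this example; the request is only constructed and printed, not sent.

```python
import json
import urllib.request

# Assumption: the single-node instance from the installation section.
ES_URL = "http://localhost:9200"


def build_create_index_request(index, properties):
    """Build (without sending) a PUT request that creates `index` with explicit mappings."""
    body = {"mappings": {"properties": properties}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(
    "my_index",
    {
        "age": {"type": "integer"},
        "email": {"type": "keyword"},
        "name": {"type": "text"},
    },
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the request construction separate from the sending step makes it easy to inspect the body before anything reaches the cluster.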

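Adding a mapping

PUT http://localhost:9200/my_index/_mapping

Request body (properties holds the fields to add; "index": false means the field is not indexed, so it cannot be searched on):

{
    "properties": {
        "employee-id": {
            "type": "keyword",
            "index": false
        }
    }
}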
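The _mapping endpoint can add new fields, but it cannot change the type of a field that already exists. The sketch below is a hypothetical helper (classify_mapping_changes is not part of any Elasticsearch client) that splits a desired set of field definitions into those that can simply be added and those that differ from the existing mapping. It treats any difference as a conflict, which is stricter than Elasticsearch itself, since a few mapping parameters can be updated in place.

```python
def classify_mapping_changes(existing, desired):
    """Split desired fields into ones that can be added and ones that conflict."""
    addable = {}
    conflicting = {}
    for field, definition in desired.items():
        if field not in existing:
            # New field: can be sent to PUT /<index>/_mapping.
            addable[field] = definition
        elif existing[field] != definition:
            # Existing field with a different definition: plan a reindex instead.
            conflicting[field] = definition
    return addable, conflicting


# The mapping created for my_index earlier in this section.
existing = {
    "age": {"type": "integer"},
    "email": {"type": "keyword"},
    "name": {"type": "text"},
}
# Illustrative wish list: one new field, one changed type, one unchanged field.
desired = {
    "employee-id": {"type": "keyword", "index": False},
    "age": {"type": "long"},
    "name": {"type": "text"},
}

addable, conflicting = classify_mapping_changes(existing, desired)
print("add via PUT /my_index/_mapping:", addable)
print("needs a new index and a reindex:", conflicting)
```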

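Migrating data

The mapping of an existing field cannot be modified, so the recommended approach is to migrate the data: first create a new index with the desired mapping, then reindex into it.

POST http://localhost:9200/_reindex

Request body (source is the index to migrate from, dest is the index to migrate to):

{
    "source": {
        "index": "bank",
        "type": "account"
    },
    "dest": {
        "index": "newbank"
    }
}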
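The two-step order described above (create the destination index, then reindex) can be sketched as a small plan builder. The helper name reindex_plan and the destination field definitions are illustrative assumptions, since the notes do not show the mapping of newbank. The sketch also leaves out source.type, which as far as I know is deprecated in 7.x; add it back if the source index still relies on a custom type.

```python
import json


def reindex_plan(source_index, dest_index, dest_properties):
    """Return the ordered (method, path, body) steps for migrating an index."""
    return [
        # Step 1: create the destination index with the mapping you want.
        ("PUT", f"/{dest_index}", {"mappings": {"properties": dest_properties}}),
        # Step 2: copy every document from the source into the destination.
        (
            "POST",
            "/_reindex",
            {"source": {"index": source_index}, "dest": {"index": dest_index}},
        ),
    ]


# Hypothetical target mapping, for illustration only.
steps = reindex_plan(
    "bank",
    "newbank",
    {"age": {"type": "integer"}, "city": {"type": "keyword"}},
)
for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=4))
```

Each tuple can then be sent with whatever HTTP client is at hand; the point of the sketch is only the ordering and the shape of the two bodies.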

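6 Installing an analyzer

This section uses the IK analyzer: medcl/elasticsearch-analysis-ik: The IK Analysis plugin integrates Lucene IK analyzer into elasticsearch, support customized dictionary. (github.com). Download the IK release that matches your ES version.
Then unzip it into its own subdirectory (for example ik) under the plugins directory mounted earlier, and restart Elasticsearch so that the plugin is loaded.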

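Using the IK analyzer

GET http://localhost:9200/_analyze

Request body (analyzer selects the IK analyzer, here in ik_smart mode), followed by the response it returns:

{
    "analyzer": "ik_smart",
    "text": "Elasticsearch（简称ES）是一个基于Apache Lucene™的开源搜索引擎"
}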

{
    "tokens": [
        {
            "token": "elasticsearch",
            "start_offset": 0,
            "end_offset": 13,
            "type": "ENGLISH",
            "position": 0
        },
        {
            "token": "简称",
            "start_offset": 14,
            "end_offset": 16,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "es",
            "start_offset": 16,
            "end_offset": 18,
            "type": "ENGLISH",
            "position": 2
        },
        {
            "token": "是",
            "start_offset": 19,
            "end_offset": 20,
            "type": "CN_CHAR",
            "position": 3
        },
        {
            "token": "一个",
            "start_offset": 20,
            "end_offset": 22,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "基于",
            "start_offset": 22,
            "end_offset": 24,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "apache",
            "start_offset": 24,
            "end_offset": 30,
            "type": "ENGLISH",
            "position": 6
        },
        {
            "token": "lucene",
            "start_offset": 31,
            "end_offset": 37,
            "type": "ENGLISH",
            "position": 7
        },
        {
            "token": "的",
            "start_offset": 38,
            "end_offset": 39,
            "type": "CN_CHAR",
            "position": 8
        },
        {
            "token": "开源",
            "start_offset": 39,
            "end_offset": 41,
            "type": "CN_WORD",
            "position": 9
        },
        {
            "token": "搜索引擎",
            "start_offset": 41,
            "end_offset": 45,
            "type": "CN_WORD",
            "position": 10
        }
    ]
}
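To see how start_offset and end_offset in the response above relate to the input, the following sketch slices the original text with each token's offsets. The token list is copied from the response. Python indexes strings by code point while Elasticsearch reports UTF-16 offsets; the two agree here only because every character in this sentence is in the Basic Multilingual Plane.

```python
text = "Elasticsearch（简称ES）是一个基于Apache Lucene™的开源搜索引擎"

# (token, start_offset, end_offset) copied from the _analyze response.
tokens = [
    ("elasticsearch", 0, 13),
    ("简称", 14, 16),
    ("es", 16, 18),
    ("是", 19, 20),
    ("一个", 20, 22),
    ("基于", 22, 24),
    ("apache", 24, 30),
    ("lucene", 31, 37),
    ("的", 38, 39),
    ("开源", 39, 41),
    ("搜索引擎", 41, 45),
]

covered = set()
mismatches = []
for token, start, end in tokens:
    surface = text[start:end]
    covered.update(range(start, end))
    # The analyzer lowercases English words, so compare case-insensitively.
    if surface.lower() != token:
        mismatches.append((token, surface))
    print(f"{token} <- {surface}")

# Characters that belong to no token, i.e. what the analyzer dropped.
skipped = [text[i] for i in range(len(text)) if i not in covered]
print("mismatches:", mismatches)
print("characters not covered by any token:", skipped)
```

The uncovered characters are the full-width parentheses, the space, and the trademark sign, which shows that ik_smart drops punctuation and symbols while lowercasing the English words.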

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
