CN117312478B - Address positioning method, device, electronic equipment and storage medium

Info

Publication number: CN117312478B
Application number: CN202311618045.0A
Authority: CN
Inventors: 王成波; 常原飞; 权一卓; 刘朔; 张迎; 刘丽; 荐军; 乔彦友
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2023-11-30
Filing date: 2023-11-30
Publication date: 2024-03-22
Anticipated expiration: 2043-11-30
Also published as: CN117312478A

Abstract

The invention provides an address positioning method, an address positioning device, electronic equipment and a storage medium, belonging to the technical field of positioning and searching, wherein the method comprises the following steps: determining a geographic grid range and an embedded representation vector of a query address based on the query address input by a user; determining a plurality of candidate address information matched with the geographical grid range from an address coordinate library, and determining an embedded representation vector of each candidate address information; the address coordinate library is constructed based on all address text data of the space region range and corresponding geographic coordinate information; and carrying out matching analysis on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information, and determining the geographic position information of the query address. The invention can realize effective address positioning, improve the address inquiry matching efficiency and improve the address positioning precision.

Description

Address positioning method, device, electronic equipment and storage medium

Technical Field

The present invention relates to the technical field of positioning and search, and in particular to an address positioning method, device, electronic equipment and storage medium.

Background

With the rapid development of Internet technology and intelligent positioning technology, a growing number of Internet services (such as logistics and intelligent navigation) require users to enter address text in order to obtain the corresponding real-world geographic location. To meet this need, address resolution (geocoding) algorithms are commonly used; based on the address information entered by the user, they resolve the actual geographic location corresponding to the text address from a geocoding database.

However, owing to the limitations of existing geocoding technology, the number of points of interest (POIs) in a geocoding database is limited and cannot cover every text address a user may enter. Moreover, user-entered text addresses may be non-standard, so text addresses that do not appear in the geocoding database are encountered frequently. For such addresses, the usual approach is to search the geocoding database one by one for related nearby POIs and then output an approximate geographic location by interpolation or similar means. If no nearby POI can be found, or the query is inaccurate, positioning fails or is invalid, and address query matching is also inefficient.

Summary of the Invention

The present invention provides an address positioning method, device, electronic equipment and storage medium to overcome the defects of the prior art, in which the geographic location information provided by geocoding technology has low accuracy and address query matching is inefficient.

The present invention provides an address positioning method, comprising:

determining, based on a query address input by a user, a geographic grid range and an embedded representation vector of the query address;

determining, from an address coordinate library, a plurality of pieces of candidate address information matching the geographic grid range, and determining an embedded representation vector of each piece of candidate address information, wherein the address coordinate library is constructed based on all address text data of a spatial region range and the corresponding geographic coordinate information;

performing matching analysis on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information to determine the geographic location information of the query address.

According to an address positioning method provided by the present invention, determining the geographic grid range and the embedded representation vector of the query address based on the query address input by the user comprises:

inputting the query address input by the user into a positioning prediction neural network model to obtain the geographic grid range and the embedded representation vector of the query address output by the positioning prediction neural network model;

wherein the positioning prediction neural network model is trained on text address samples and their corresponding sample labels.

According to an address positioning method provided by the present invention, the positioning prediction neural network model comprises a basic backbone sub-network, a first prediction sub-network and a second prediction sub-network; inputting the query address input by the user into the positioning prediction neural network model to obtain the geographic grid range and the embedded representation vector of the query address output by the positioning prediction neural network model comprises:

inputting the query address input by the user into the basic backbone sub-network to obtain a text semantic vector of the query address output by the basic backbone sub-network, wherein the basic backbone sub-network is constructed based on a Transformer encoder;

inputting the text semantic vector of the query address into the first prediction sub-network to obtain the geographic grid range of the query address output by the first prediction sub-network, wherein the first prediction sub-network is constructed based on a fully connected layer neural network;

inputting the text semantic vector of the query address into the second prediction sub-network to obtain the embedded representation vector of the query address output by the second prediction sub-network, wherein the second prediction sub-network is constructed based on a sentence-vector pooling layer neural network and a fully connected layer neural network.

According to an address positioning method provided by the present invention, performing matching analysis on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information to determine the geographic location information of the query address comprises:

using a similarity matching method, calculating the distance between the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information, so as to determine similarity information of the embedded representation vector of the query address relative to the embedded representation vector of each piece of candidate address information;

determining, from among the pieces of candidate address information, the target candidate address information corresponding to the maximum value in the similarity information;

obtaining the geographic location information of the query address according to the target candidate address information.
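As an illustration only, the similarity matching steps above can be sketched as follows. The sketch assumes cosine similarity as the similarity measure (the text only requires some distance-based similarity), and the record layout (`address`, `coord`, `embedding`) and the function names are hypothetical, not taken from the invention.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_candidate(query_vec, candidates):
    """Return (record, similarity) for the most similar candidate, or None if empty.

    Each candidate is a dict with the hypothetical keys
    'address', 'coord' and 'embedding'.
    """
    if not candidates:
        return None
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

The coordinate stored with the returned record would then serve as the geographic location of the query address.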

According to an address positioning method provided by the present invention, before the query address input by the user is input into the positioning prediction neural network model, the method further comprises:

taking each text address sample and its corresponding sample label as one group of training samples, and obtaining multiple groups of training samples;

training the positioning prediction neural network model using the multiple groups of training samples.

According to an address positioning method provided by the present invention, training the positioning prediction neural network model using the multiple groups of training samples comprises:

for any group of training samples, inputting the training sample into the positioning prediction neural network model and outputting the prediction probability corresponding to the training sample;

calculating a loss value with a preset loss function according to the prediction probability corresponding to the training sample and the sample label in the training sample;

adjusting the model parameters of the positioning prediction neural network model based on the loss value until the number of training iterations reaches a preset number;

taking the model parameters obtained when the number of training iterations reaches the preset number as the model parameters of the trained positioning prediction neural network model.
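The training procedure above (forward pass, loss, parameter update, stop after a preset number of iterations) can be illustrated with a deliberately small stand-in. The sketch below assumes a single linear softmax classifier over grid labels and a cross-entropy loss; the invention's model is a Transformer-based network and its loss function is only described as "preset", so the classifier, the loss, the learning rate and all names here are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, features):
    """Predicted probability of each grid label for one feature vector."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(logits)


def train(samples, num_classes, num_features, preset_steps=200, lr=0.5):
    """Train for exactly `preset_steps` passes; returns (weights, loss_history).

    samples: list of (features, label) pairs, where `features` stands in for
    the encoded address text and `label` is the grid label index.
    """
    weights = [[0.0] * num_features for _ in range(num_classes)]
    history = []
    for _ in range(preset_steps):
        total_loss = 0.0
        for features, label in samples:
            probs = forward(weights, features)
            total_loss += -math.log(probs[label])  # cross-entropy (assumed loss)
            # Gradient of cross-entropy w.r.t. the logits is (probs - one_hot).
            for k in range(num_classes):
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(num_features):
                    weights[k][j] -= lr * grad * features[j]
        history.append(total_loss / len(samples))
    return weights, history
```

The weights returned after the final pass play the role of the "model parameters obtained when the number of training iterations reaches the preset number".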

According to an address positioning method provided by the present invention, before the geographic grid range and the embedded representation vector of the query address are determined based on the query address input by the user, the method further comprises:

dividing the spatial region range into geographic grids, numbering each resulting geographic grid, and determining the grid number of each geographic grid;

associating all address text data of the spatial region range and the corresponding geographic coordinate information with the grid number of each geographic grid, and obtaining the resulting association information;

based on the association information, constructing a geographic grid spatial index rule for all the address text data and the corresponding geographic coordinate information, and determining the embedded representation vectors of all the address text data;

storing the geographic grid spatial index rule, the embedded representation vectors of all the address text data, all the address text data and the corresponding geographic coordinate information in a spatial database to obtain the address coordinate library.
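One possible way to picture the library construction steps above is the in-memory sketch below. It assumes a uniform longitude/latitude grid over a rectangular region, row-major grid numbering, and a plain dictionary in place of the spatial database; the class name, its parameters and the `embed` callable are hypothetical and are not specified by the invention.

```python
import math
from collections import defaultdict


class AddressCoordinateLibrary:
    """Toy address coordinate library indexed by grid number (illustrative only)."""

    def __init__(self, min_lon, min_lat, cell_deg, cols, embed):
        self.min_lon = min_lon    # west edge of the region (assumed rectangular)
        self.min_lat = min_lat    # south edge of the region
        self.cell_deg = cell_deg  # grid cell size in degrees (assumed uniform)
        self.cols = cols          # number of grid columns in the region
        self.embed = embed        # callable: address text -> embedding vector
        self.index = defaultdict(list)  # grid number -> list of address records

    def grid_number(self, lon, lat):
        """Row-major grid number of the cell that contains (lon, lat)."""
        col = math.floor((lon - self.min_lon) / self.cell_deg)
        row = math.floor((lat - self.min_lat) / self.cell_deg)
        if not (0 <= col < self.cols) or row < 0:
            raise ValueError("coordinate lies outside the gridded region")
        return row * self.cols + col

    def add(self, address, lon, lat):
        """Associate an address with its grid number and store its embedding."""
        gid = self.grid_number(lon, lat)
        self.index[gid].append(
            {"address": address, "coord": (lon, lat), "embedding": self.embed(address)}
        )
        return gid

    def candidates(self, grid_ids):
        """All stored records whose grid number is in `grid_ids`."""
        return [rec for gid in grid_ids for rec in self.index.get(gid, [])]
```

Precomputing the embeddings at insertion time means that a query only needs one model pass for the query address itself; candidate vectors are simply read back by grid number.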

The present invention also provides an address positioning device, comprising:

a processing module, configured to determine, based on a query address input by a user, a geographic grid range and an embedded representation vector of the query address;

a matching module, configured to determine, from an address coordinate library, a plurality of pieces of candidate address information matching the geographic grid range, and to determine an embedded representation vector of each piece of candidate address information, wherein the address coordinate library is constructed based on all address text data of a spatial region range and the corresponding geographic coordinate information;

a positioning module, configured to perform matching analysis on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information to determine the geographic location information of the query address.

The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any of the address positioning methods described above.

The present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements any of the address positioning methods described above.

The present invention also provides a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements any of the address positioning methods described above.

In the address positioning method, device, electronic equipment and storage medium provided by the present invention, an address coordinate library is constructed in advance from all address text data of a spatial region range and the corresponding geographic coordinate information, and embedded representations of the semantic features of the address text are used to improve the efficiency of address text query matching. After a query address input by a user is received, its geographic grid range and embedded representation vector are determined, so that a plurality of pieces of candidate address information matching the geographic grid range, together with the embedded representation vector of each, can be quickly retrieved from the address coordinate library. Matching analysis is then performed on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information, and the geographic location information of the query address is obtained from the successfully matched candidate address information. This achieves effective address positioning and improves both the efficiency of address query matching and the accuracy of address positioning.

Brief Description of the Drawings

To explain the technical solutions of the present invention or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a first schematic flowchart of the address positioning method provided by the present invention;

FIG. 2 is a second schematic flowchart of the address positioning method provided by the present invention;

FIG. 3 is a schematic structural diagram of the address positioning device provided by the present invention;

FIG. 4 is a schematic diagram of the physical structure of the electronic device provided by the present invention.

Detailed Description

To make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

In the description of the invention, it should be noted that, unless otherwise expressly specified and limited, the terms "installed", "coupled" and "connected" should be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct, or indirect through an intermediate medium; and it may be an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

The address positioning method, device, electronic equipment and storage medium of the present invention are described below with reference to FIGS. 1 to 4.

FIG. 1 is a first schematic flowchart of the address positioning method provided by the present invention. As shown in FIG. 1, the method includes step 110, step 120 and step 130.

Step 110: based on a query address input by a user, determine the geographic grid range and the embedded representation vector of the query address;

Step 120: determine, from an address coordinate library, a plurality of pieces of candidate address information matching the geographic grid range, and determine the embedded representation vector of each piece of candidate address information, wherein the address coordinate library is constructed based on all address text data of a spatial region range and the corresponding geographic coordinate information;

Step 130: perform matching analysis on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information to determine the geographic location information of the query address.
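The way steps 110 to 130 fit together can be sketched as a small orchestration function. The three callables passed in (`model`, `lookup`, `similarity`) are placeholders for the positioning prediction model, the address coordinate library query and the similarity measure; their signatures and the record keys are assumptions made for this sketch only.

```python
def locate_address(query, model, lookup, similarity):
    """Chain steps 110-130 and return the matched coordinate, or None.

    model(query)      -> (grid_ids, query_vec)   # step 110 (hypothetical signature)
    lookup(grid_ids)  -> list of records with 'coord' and 'embedding' keys
    similarity(a, b)  -> float, larger means more similar
    """
    # Step 110: geographic grid range and embedded representation vector.
    grid_ids, query_vec = model(query)

    # Step 120: candidate addresses stored for that grid range.
    candidates = lookup(grid_ids)
    if not candidates:
        return None

    # Step 130: keep the candidate whose embedding is most similar to the query.
    best = max(candidates, key=lambda rec: similarity(query_vec, rec["embedding"]))
    return best["coord"]
```

Because the grid range narrows the candidate set before any vectors are compared, the similarity computation in step 130 runs over a small subset of the library rather than over every stored address.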

Specifically, the query address described in the embodiments of the present invention refers to text information that a user enters on a front-end human-computer interaction interface to describe the geographic location to be queried. It may be a place name at a regional level such as province, city, district or street/town, or detailed address information.

The embedded representation vector described in the embodiments of the present invention refers to the vector representation obtained by applying a text feature extraction algorithm to the query address input by the user; it can be used to characterize the semantic information of the query address text.

The geographic grid range described in the embodiments of the present invention refers to the geographic grid range to which the query address belongs, obtained by applying a geographic location prediction algorithm or model to the query address input by the user and performing prediction and query matching.

The address coordinate library described in the embodiments of the present invention is an address database constructed in advance from all address text data of a spatial region range and the corresponding geographic coordinate information. The database stores all the address data together with the embedded representation vector of the text semantic features of each piece of address data.

In an embodiment of the present invention, in step 110, the query address information entered by the user at the front end is received, text features are extracted from the query address, and geographic location prediction and embedded vector representation are performed based on the extracted text features, thereby obtaining the geographic grid range and the embedded representation vector of the query address.

Further, in an embodiment of the present invention, in step 120, a plurality of pieces of candidate address information matching the obtained geographic grid range are retrieved from the pre-constructed address coordinate library, and the embedded representation vector of each piece of candidate address information is determined.

Further, in an embodiment of the present invention, in step 130, after the pieces of candidate address information matching the geographic grid range and their embedded representation vectors have been determined, matching analysis is performed on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information. Specifically, the similarity between the embedded representation vector of the query address and that of each piece of candidate address information may be calculated to find the candidate vector with the highest similarity to the query vector. The candidate address information corresponding to that vector is then identified, and its geographic location is taken as the geographic location information of the query address, completing fast and accurate positioning of the query address input by the user.

In the address positioning method of the embodiments of the present invention, an address coordinate library is constructed in advance from all address text data of a spatial region range and the corresponding geographic coordinate information, and embedded representations of the semantic features of the address text are used to improve the efficiency of address text query matching. After a query address input by a user is received, its geographic grid range and embedded representation vector are determined, so that a plurality of pieces of candidate address information matching the geographic grid range, together with the embedded representation vector of each, can be quickly retrieved from the address coordinate library. Matching analysis is then performed on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information, and the geographic location information of the query address is obtained from the successfully matched candidate address information. This achieves effective address positioning and improves both the efficiency of address query matching and the accuracy of address positioning.

Based on the above embodiments, as an optional embodiment, determining the geographic grid range and the embedded representation vector of the query address based on the query address input by the user includes:

inputting the query address input by the user into the positioning prediction neural network model to obtain the geographic grid range and the embedded representation vector of the query address output by the positioning prediction neural network model;

wherein the positioning prediction neural network model is trained on text address samples and their corresponding sample labels.

Specifically, the positioning prediction neural network model described in the embodiments of the present invention is obtained by training a deep neural network model on text address samples and their corresponding sample labels. Drawing on the learning and feature extraction capabilities of deep neural networks, the model recognizes different text address information, learns the association between address text features and the true geographic grid range, and learns the mapping from address text features to embedded representation vectors of their text semantic features, so that it outputs the geographic grid range and the embedded representation vector of a text address at the same time.

It should be noted that, for the embedded representation vector, the positioning prediction neural network model may be configured to use a fixed neural network structure to perform the vector embedding operation on the text address, thereby extracting the embedded representation vector of the text address.

The deep neural network may be composed of a Transformer neural network model with a multi-head attention mechanism, a pooling layer neural network, a fully connected layer neural network and the like. Other neural networks capable of accurately identifying and outputting the geographic grid range and the embedded representation vector corresponding to a text address may also be used; the present invention imposes no specific limitation in this respect.

The sample labels described in the embodiments of the present invention are determined in advance from the text address samples and correspond one-to-one with them. That is, every text address sample in the training data is assigned a corresponding sample label in advance. The sample labels include the labels needed to train the model to predict the geographic grid range and the labels needed to train the model to generate embedded representation vectors.

Further, in an embodiment of the present invention, the query address input by the user is fed into the trained positioning prediction neural network model, and the deep neural network recognizes and processes it to obtain the geographic grid range and the embedded representation vector of the query address at the same time, in preparation for subsequent address query matching.

The method of the embodiments of the present invention uses a deep neural network model with a dual-head output to compute, in a single pass, both the geographic grid range to which the query address belongs and the query address embedded representation vector used for subsequent address matching. Because the model processes the query address only once, cumbersome and time-consuming multi-step processing is avoided, and the effectiveness of text address query matching and positioning can be improved.

Based on the above embodiments, as an optional embodiment, the positioning prediction neural network model includes a basic backbone sub-network, a first prediction sub-network and a second prediction sub-network; inputting the query address input by the user into the positioning prediction neural network model to obtain the geographic grid range and the embedded representation vector of the query address output by the positioning prediction neural network model includes:

inputting the query address input by the user into the basic backbone sub-network to obtain the text semantic vector of the query address output by the basic backbone sub-network, wherein the basic backbone sub-network is constructed based on a Transformer encoder;

inputting the text semantic vector of the query address into the first prediction sub-network to obtain the geographic grid range of the query address output by the first prediction sub-network, wherein the first prediction sub-network is constructed based on a fully connected layer neural network;

inputting the text semantic vector of the query address into the second prediction sub-network to obtain the embedded representation vector of the query address output by the second prediction sub-network, wherein the second prediction sub-network is constructed based on a sentence-vector pooling layer neural network and a fully connected layer neural network.

具体地，本发明实施例所描述的基础骨干子网络用于提取用户输入的查询地址的文本特征，输出查询地址的文本语义向量，其也可以描述为主干网络。其是基于Transformer编码器构建的，具体可以使用基础版Roberta-wwm预训练模型来构建。Specifically, the basic backbone subnetwork described in the embodiment of the present invention is used to extract text features of the query address input by the user and output the text semantic vector of the query address, which can also be described as a backbone network. It is built based on the Transformer encoder, which can be built using the basic Roberta-wwm pre-trained model.

The base version of the Roberta-wwm model is a stack of 12 Transformer encoder modules, each consisting of a multi-head self-attention module with 12 attention heads, a feed-forward neural network, and residual connections. The multi-head self-attention mechanism projects the input representation into multiple subspaces and computes self-attention in each subspace separately, so the model can attend to information from different subspaces at the same time and build a more complete understanding of the input sequence. As for the training strategy, Roberta-wwm drops the next-sentence prediction task proposed in the BERT language model and updates the model parameters with a dynamic whole-word-masking pre-training scheme.

本发明实施例所描述的第一预测子网络用于根据查询地址的文本语义向量，预测查询地址文本所属的地理位置信息，如地理网格编号，其也可以描述为地理网格预测子网络。其是基于全连接层神经网络构建的，具体可以采用两层全连接神经网络构建，两个全连接层之间通过Tanh激活函数进行连接。The first prediction sub-network described in the embodiment of the present invention is used to predict the geographical location information, such as the geographical grid number, to which the query address text belongs based on the text semantic vector of the query address. It can also be described as a geographical grid prediction sub-network. It is constructed based on a fully connected layer neural network. Specifically, it can be constructed using a two-layer fully connected neural network. The two fully connected layers are connected through the Tanh activation function.

The second prediction sub-network described in the embodiment of the present invention further performs feature mapping on the text semantic vector of the query address to obtain the embedded representation vector of the query address, which is used for subsequent embedded-representation-vector matching; it can also be described as a query matching sub-network. It is constructed based on a sentence-vector pooling layer neural network and a fully connected layer neural network, and may specifically consist of one sentence-vector pooling layer and one fully connected layer.

Optionally, in the embodiment of the present invention, when the user inputs a piece of query address text, two special tokens, [CLS] and [SEP], can first be added at its beginning and end respectively. [CLS] is placed at the beginning of the sentence, aggregates the text semantic vector representation of the whole text, and serves as the input of the first prediction sub-network; [SEP] marks the end of the sentence. A tokenizer then splits the input query address into individual characters, which are input into the basic backbone sub-network (the Roberta-wwm model) for feature extraction to obtain the text semantic vector of the query address.
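The input preparation described above can be illustrated with a minimal sketch. It assumes a plain character-level split; the function name `to_model_tokens` and the sample address are illustrative only, and a real Roberta-wwm tokenizer may treat digits, Latin letters, and unknown characters differently.

```python
def to_model_tokens(address: str) -> list[str]:
    """Split an address into single characters and wrap it in [CLS] ... [SEP].

    Assumption: whitespace carries no meaning in the address and is dropped.
    """
    chars = [ch for ch in address if not ch.isspace()]
    return ["[CLS]", *chars, "[SEP]"]


# Sample query address, used only to show the token layout.
tokens = to_model_tokens("北京市海淀区邓庄南路9号")
print(tokens)
```

The vector produced by the backbone at the `[CLS]` position is what the grid prediction sub-network consumes, while the per-character vectors feed the pooling layer of the query matching sub-network.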

需要说明的是，本发明实施例中，预先会对地址库所属空间区域范围使用地理网格划分，每个网格由唯一的编号进行表示。It should be noted that in this embodiment of the present invention, the spatial area to which the address database belongs is divided into geographical grids in advance, and each grid is represented by a unique number.

Further, in the embodiment of the present invention, the text semantic vector of the [CLS] token of the query address can be input into the first prediction sub-network and processed by its two fully connected layers. After the second fully connected layer, a softmax function converts the output vector into the probability that the query address belongs to each geographical grid number, and the result with the highest probability is output. This yields the geographical grid range of the query address, that is, the number of the geographical grid to which the query address belongs.
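A minimal sketch of this grid prediction head follows, written in plain Python so that each step is visible. The weights, dimensions, and the function name `predict_grid` are toy assumptions; in the described model the input would be the [CLS] vector from the backbone and the weights would come from training.

```python
import math


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_grid(cls_vector, w1, b1, w2, b2):
    """Two fully connected layers joined by Tanh, then softmax and argmax."""
    hidden = [math.tanh(v) for v in linear(cls_vector, w1, b1)]
    probs = softmax(linear(hidden, w2, b2))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Toy example: a 2-dimensional [CLS] vector and 3 candidate grid numbers.
cls_vector = [0.5, -1.0]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]
grid_number, probs = predict_grid(cls_vector, w1, b1, w2, b2)
```

The index of the largest probability is interpreted as the geographical grid number; the full probability vector is also returned in case a caller wants to inspect runner-up grids, which is a design choice of this sketch rather than something the description requires.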

Further, in the embodiment of the present invention, the text semantic vector of the query address is at the same time input into the sentence-vector pooling layer and the fully connected layer of the second prediction sub-network. The sentence-vector pooling layer pools the semantic vectors of the individual characters of the query address, as output by the basic backbone sub-network, into a vector representation of fixed dimension; the fully connected layer then projects these features further to obtain the embedded representation vector of the query address that is ultimately used for the subsequent similarity matching calculation.
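The pooling and projection steps can be sketched as follows. The description does not state which pooling operation is used, so mean pooling is assumed here; the weights and the names `mean_pool` and `project` are illustrative.

```python
def mean_pool(token_vectors):
    """Average per-character vectors into one fixed-size sentence vector."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def project(vector, weights, bias):
    """Fully connected projection to the embedding dimension."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


# Two characters, each with a 3-dimensional semantic vector (toy values).
token_vectors = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
pooled = mean_pool(token_vectors)

# Project from 3 dimensions down to a 2-dimensional embedding.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bias = [0.0, 0.5]
embedding = project(pooled, weights, bias)
```

Because pooling removes the dependence on the number of characters, addresses of different lengths map to embeddings of the same dimension and can be compared directly.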

The method of the embodiment of the present invention uses the basic backbone sub-network, the first prediction sub-network and the second prediction sub-network to build a deep neural network model with a dual-head output, which simultaneously computes the number of the geographical grid to which the query address belongs and the semantic-feature embedded representation vector used for address similarity calculation. The predicted geographical grid is used to filter a set of candidate addresses out of the global address coordinate library, which avoids computing similarity against all addresses and improves query matching efficiency; the embedded representation vector of the query address is used for the similarity calculation, which improves the quality of query matching.

基于上述实施例的内容，作为一种可选的实施例，对查询地址的嵌入表示向量与每个候选地址信息的嵌入表示向量进行匹配分析，确定查询地址的地理位置信息，包括：Based on the contents of the above embodiments, as an optional embodiment, a matching analysis is performed on the embedded representation vector of the query address and the embedded representation vector of each candidate address information to determine the geographical location information of the query address, including:

利用相似度匹配方法，计算查询地址的嵌入表示向量与每个候选地址信息的嵌入表示向量的距离，以确定查询地址的嵌入表示向量相对于每个候选地址信息的嵌入表示向量的相似度信息；Using a similarity matching method, calculate the distance between the embedded representation vector of the query address and the embedded representation vector of each candidate address information to determine the similarity information of the embedded representation vector of the query address relative to the embedded representation vector of each candidate address information;

从各个候选地址信息中，确定相似度信息中最大值所对应的目标候选地址信息；From each candidate address information, determine the target candidate address information corresponding to the maximum value in the similarity information;

根据目标候选地址信息，得到查询地址的地理位置信息。According to the target candidate address information, the geographical location information of the query address is obtained.

Specifically, the target candidate address information described in the embodiment of the present invention is the candidate address information found to have the greatest similarity when the embedded representation vector of the query address is matched against the embedded representation vector of each candidate address information.

在本发明的实施例中，通过计算查询地址的嵌入表示向量与每个候选地址信息的嵌入表示向量的欧式距离，可以根据计算的欧式距离大小进行相似度排序。欧式距离越小，相似度越大，欧式距离越大，相似度越小。以此，可以确定出查询地址的嵌入表示向量相对于每个候选地址信息的嵌入表示向量的相似度信息，进而可以将各候选地址信息按照相似度大小顺序排序。In embodiments of the present invention, by calculating the Euclidean distance between the embedded representation vector of the query address and the embedded representation vector of each candidate address information, similarity ranking can be performed according to the calculated Euclidean distance. The smaller the Euclidean distance, the greater the similarity; the greater the Euclidean distance, the smaller the similarity. In this way, the similarity information of the embedded representation vector of the query address with respect to the embedded representation vector of each candidate address information can be determined, and then the candidate address information can be sorted in order of similarity.

Further, in the embodiment of the present invention, the target candidate address information corresponding to the maximum value in the similarity information is determined from among the candidate address information, and this target candidate address information is taken as the geographical location information of the query address, thereby locating the query address.

The method of the embodiment of the present invention filters a set of candidate addresses out of the global address coordinate library and computes similarity between the embedded representation vector of the query address and the embedded representation vectors of the candidate addresses. This avoids the cost of computing similarity against every address in the library, improves the efficiency of query address matching, and improves the quality of address positioning.
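The matching step can be sketched as a ranking by Euclidean distance, where the smallest distance corresponds to the greatest similarity. The candidate records, their field names, and the vectors below are made-up placeholders for illustration.

```python
import math


def rank_candidates(query_vec, candidates):
    """Sort candidates by Euclidean distance to the query, closest first."""
    return sorted(candidates, key=lambda c: math.dist(query_vec, c["embedding"]))


# Hypothetical candidates already filtered by the predicted geographical grid.
candidates = [
    {"address": "candidate A", "lon": 116.25, "lat": 40.05, "embedding": [0.9, 0.1]},
    {"address": "candidate B", "lon": 116.27, "lat": 40.07, "embedding": [0.2, 0.8]},
    {"address": "candidate C", "lon": 116.21, "lat": 40.02, "embedding": [0.5, 0.5]},
]
query_vec = [0.25, 0.75]

ranked = rank_candidates(query_vec, candidates)
target = ranked[0]  # smallest distance, i.e. greatest similarity
location = (target["lon"], target["lat"])
```

Sorting the whole candidate list keeps the full similarity ordering available; if only the best match is needed, a single `min` over the candidates would be enough.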

基于上述实施例的内容，作为一种可选的实施例，在将用户输入的查询地址输入至定位预测神经网络模型之前，该方法还包括：Based on the contents of the above embodiments, as an optional embodiment, before inputting the query address input by the user into the positioning prediction neural network model, the method further includes:

将文本地址样本及其对应的样本标签作为一组训练样本，获取多组训练样本；Use text address samples and their corresponding sample labels as a set of training samples to obtain multiple sets of training samples;

利用多组训练样本，对定位预测神经网络模型进行训练。Multiple sets of training samples are used to train the positioning prediction neural network model.

具体地，在本发明的实施例中，在将用户输入的查询地址输入至定位预测神经网络模型之前，还需对定位预测神经网络模型进行训练，以得到训练好的定位预测神经网络模型。Specifically, in the embodiment of the present invention, before inputting the query address input by the user into the positioning prediction neural network model, the positioning prediction neural network model needs to be trained to obtain a trained positioning prediction neural network model.

在本发明的实施例中，利用训练集数据对定位预测神经网络模型进行训练，具体训练过程如下：In the embodiment of the present invention, the positioning prediction neural network model is trained using training set data. The specific training process is as follows:

将文本地址样本及其对应的样本标签作为一组训练样本，针对不同的文本地址样本，则可以获取到多组训练样本。The text address samples and their corresponding sample labels are used as a set of training samples. For different text address samples, multiple sets of training samples can be obtained.

在本发明的实施例中，文本地址样本与其携带的样本标签是一一对应的。In the embodiment of the present invention, there is a one-to-one correspondence between text address samples and the sample tags they carry.

然后，在获得多组训练样本之后，再将多组训练样本依次输入至定位预测神经网络模型中，利用多组训练样本对定位预测神经网络模型进行训练，即：Then, after obtaining multiple sets of training samples, the multiple sets of training samples are sequentially input into the positioning prediction neural network model, and the multiple sets of training samples are used to train the positioning prediction neural network model, that is:

将每组训练样本中的文本地址样本及其对应的样本标签同时输入至定位预测神经网络模型中，根据定位预测神经网络模型中的每一次输出的预测结果，通过计算损失函数值，对定位预测神经网络模型中的模型参数进行调整，在满足预设训练终止条件的情况下，最终完成定位预测神经网络模型的整个训练过程，得到训练好的定位预测神经网络模型。The text address samples and their corresponding sample labels in each group of training samples are simultaneously input into the positioning prediction neural network model. According to the prediction results of each output in the positioning prediction neural network model, the positioning prediction is performed by calculating the loss function value. The model parameters in the neural network model are adjusted, and when the preset training termination conditions are met, the entire training process of the positioning prediction neural network model is finally completed, and the trained positioning prediction neural network model is obtained.

The method of the embodiment of the present invention takes each text address sample together with its corresponding sample label as one set of training samples and trains the positioning prediction neural network model with multiple sets of training samples, which helps improve the accuracy of the trained positioning prediction neural network model.

基于上述实施例的内容，作为一种可选的实施例，利用多组训练样本，对定位预测神经网络模型进行训练，包括：Based on the contents of the above embodiments, as an optional embodiment, multiple sets of training samples are used to train the positioning prediction neural network model, including:

对于任意一组训练样本，将训练样本输入至定位预测神经网络模型，输出训练样本对应的预测概率；For any set of training samples, input the training samples into the positioning prediction neural network model and output the prediction probability corresponding to the training samples;

利用预设损失函数，根据训练样本对应的预测概率和训练样本中的标签计算损失值；Use the preset loss function to calculate the loss value based on the predicted probability corresponding to the training sample and the label in the training sample;

基于损失值，对定位预测神经网络模型的模型参数进行调整，直至模型训练次数达到预设次数；Based on the loss value, adjust the model parameters of the positioning prediction neural network model until the number of model training times reaches the preset number;

将模型训练次数达到预设次数时所得到的模型参数作为训练好的定位预测神经网络模型的模型参数。The model parameters obtained when the number of model training times reaches the preset number are used as the model parameters of the trained positioning prediction neural network model.

Specifically, the preset loss function described in the embodiment of the present invention is a loss function configured in advance in the positioning prediction neural network model and used to evaluate the model; the preset threshold is a loss threshold configured in advance for the model, below which the loss value is regarded as minimal and model training is complete; and the preset number of times is the maximum number of training iterations configured in advance for the model.

在获得多组训练样本之后，对于任意一组训练样本，将每组训练样本中的文本地址样本及其对应的样本标签同时输入至定位预测神经网络模型，输出该训练样本对应结果的预测概率。After obtaining multiple sets of training samples, for any set of training samples, the text address samples and their corresponding sample labels in each set of training samples are simultaneously input to the positioning prediction neural network model, and the predicted probability of the corresponding result of the training sample is output.

在此基础上，利用预设损失函数，根据该训练样本对应的预测概率和该训练样本对应的样本标签，计算损失值。On this basis, the preset loss function is used to calculate the loss value based on the predicted probability corresponding to the training sample and the sample label corresponding to the training sample.

Further, once the loss value has been calculated, the current training iteration ends. An algorithm such as back propagation (BackPropagation, BP) is then used to adjust the model parameters of the positioning prediction neural network model based on the loss value, updating the weight parameters of each layer of the model, after which the next training iteration is carried out; model training proceeds iteratively in this way.

During training, if the training result for a given set of training samples satisfies the preset training termination condition, for example the calculated loss value is less than the preset threshold, or the current number of iterations reaches the preset number, the loss value of the model is considered to be within the convergence range and model training ends. The model parameters obtained at that point are used as the model parameters of the trained positioning prediction neural network model, which completes the training and yields the trained positioning prediction neural network model.

The method of the embodiment of the present invention trains the positioning prediction neural network model iteratively with multiple sets of training samples and keeps the loss value of the model within the convergence range, which helps improve the accuracy of the model output and the precision of the address positioning prediction results.
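The training procedure and its two termination conditions can be illustrated with a small sketch. As a stand-in for the full network it trains a single-layer softmax classifier by plain gradient descent on a cross-entropy loss; the choice of cross-entropy, the learning rate, the threshold, and the toy samples are all assumptions, since the description only refers to a preset loss function.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(samples, num_classes, lr=0.5, max_iters=200, loss_threshold=0.05):
    """Iterate until the mean loss drops below the threshold or max_iters is reached."""
    dim = len(samples[0][0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    loss, iteration = float("inf"), 0
    for iteration in range(1, max_iters + 1):
        loss = 0.0
        grads = [[0.0] * dim for _ in range(num_classes)]
        for features, label in samples:
            logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
            probs = softmax(logits)
            loss -= math.log(probs[label])  # cross-entropy for this sample
            for k in range(num_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(dim):
                    grads[k][j] += err * features[j]
        loss /= len(samples)
        if loss < loss_threshold:  # termination condition 1: loss below threshold
            break
        for k in range(num_classes):
            for j in range(dim):
                weights[k][j] -= lr * grads[k][j] / len(samples)
    # Termination condition 2 is the loop bound: iteration == max_iters.
    return weights, loss, iteration


# Toy samples: (feature vector standing in for the backbone output, grid label).
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
weights, final_loss, iterations = train(samples, num_classes=2)
```

Whichever condition is met first ends the loop, and the weights held at that moment are returned as the trained parameters, mirroring the description above.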

基于上述实施例的内容，作为一种可选的实施例，在基于用户输入的查询地址，确定查询地址的地理网格范围及嵌入表示向量之前，该方法还包括：Based on the contents of the above embodiments, as an optional embodiment, before determining the geographical grid range and embedded representation vector of the query address based on the query address input by the user, the method further includes:

对空间区域范围进行地理网格划分，并对划分后的各个地理网格进行编号，确定每个地理网格的网格编号；Divide the spatial area into geographical grids, number each divided geographical grid, and determine the grid number of each geographical grid;

将空间区域范围的所有地址文本数据以及对应的地理坐标信息，与每个地理网格的网格编号进行关联处理，并获取关联处理后的关联信息；Associate all address text data and corresponding geographic coordinate information in the spatial area with the grid number of each geographic grid, and obtain the associated information after association processing;

基于关联信息，为所有地址文本数据以及对应的地理坐标信息构建地理网格空间索引规则，并确定所有地址文本数据的嵌入表示向量；Based on the associated information, construct geographic grid spatial index rules for all address text data and corresponding geographic coordinate information, and determine the embedded representation vectors of all address text data;

将地理网格空间索引规则、所有地址文本数据的嵌入表示向量、所有地址文本数据以及对应的地理坐标信息存入空间数据库，得到地址坐标库。The geographical grid spatial index rules, the embedded representation vectors of all address text data, all address text data and the corresponding geographical coordinate information are stored in the spatial database to obtain the address coordinate library.

具体地，在本发明的实施例中，在基于用户输入的查询地址，确定查询地址的地理网格范围及嵌入表示向量之前，还需要预先构建地址坐标库。Specifically, in the embodiment of the present invention, before determining the geographical grid range and embedding representation vector of the query address based on the query address input by the user, the address coordinate library needs to be pre-constructed.

在本发明的实施例中，空间区域范围指的是涵盖所需查询地址的空间区域范围，可以是地球上某一特定空间区域范围，也可以是全球区域范围。In the embodiment of the present invention, the space area range refers to the space area range covering the required query address, which can be a specific space area range on the earth or a global area range.

在本发明的实施例中，首先，对空间区域范围进行地理网格划分，并对划分后的各个地理网格进行唯一编号，依次设定每个地理网格唯一的网格编号。In the embodiment of the present invention, first, the spatial area range is divided into geographical grids, and each divided geographical grid is uniquely numbered, and a unique grid number is set for each geographical grid in turn.

进而，将空间区域范围的所有地址文本数据以及对应的地理坐标信息，与每个地理网格的网格编号进行关联处理，使得每个地理网格的网格编号都对应有其相关的地址数据以及地理坐标信息（如经纬度信息），以此构建出地址文本数据及对应的地理坐标信息与每个地理网格编号之间的关联信息。Furthermore, all address text data and corresponding geographical coordinate information in the spatial area are associated with the grid number of each geographical grid, so that the grid number of each geographical grid corresponds to its related address data. and geographical coordinate information (such as latitude and longitude information), to construct the association information between the address text data and the corresponding geographical coordinate information and each geographical grid number.

Further, in the embodiment of the present invention, a geographical grid spatial index rule is constructed, based on the above association information, for all address text data and the corresponding geographical coordinate information, so that under this rule the corresponding geographical coordinate information can be looked up by the grid number of a geographical grid.

同时，将地址坐标库中的所有地址文本输入至前述定位预测神经网络模型的基础骨干子网络以及查询匹配子网络中，由此获取到所有地址文本数据的嵌入表示向量。At the same time, all address texts in the address coordinate library are input into the basic backbone subnetwork and query matching subnetwork of the aforementioned positioning prediction neural network model, thereby obtaining the embedded representation vectors of all address text data.

Finally, in the embodiment of the present invention, the above geographical grid spatial index rule, the embedded representation vectors of all address text data, all address text data and the corresponding geographical coordinate information are stored in a spatial database, thereby constructing the address coordinate library.

The method of the embodiment of the present invention relies on the learning and feature extraction capabilities of the deep neural network, so address data and geographical coordinate data can be stored in the database directly and query matching can be performed with semantic-feature embedded representation vectors, without constructing complex manual matching rules. Compared with traditional methods based on statistical word segmentation, address models and manual matching rules, the method of constructing the address coordinate library in the present invention is simpler and more efficient.
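The library construction steps can be sketched with an in-memory dictionary in place of a spatial database. The sketch assumes a regular longitude/latitude grid numbered row by row from the south-west corner; the description does not fix a particular grid scheme, and `toy_embed`, the records, and the grid parameters are placeholders.

```python
from collections import defaultdict


def grid_id(lon, lat, west, south, cell_deg, cols):
    """Number cells row by row, starting at the south-west corner (assumed scheme)."""
    col = int((lon - west) // cell_deg)
    row = int((lat - south) // cell_deg)
    return row * cols + col


def build_library(records, embed, west, south, cell_deg, cols):
    """Associate every address with its grid number and store its embedding."""
    index = defaultdict(list)
    for address, lon, lat in records:
        index[grid_id(lon, lat, west, south, cell_deg, cols)].append(
            {"address": address, "lon": lon, "lat": lat, "embedding": embed(address)}
        )
    return index


def toy_embed(text):
    """Placeholder for the backbone plus query matching sub-network."""
    return [float(len(text))]


# Hypothetical address records: (text, longitude, latitude).
records = [
    ("address A", 116.25, 40.05),
    ("address B", 116.35, 40.05),
    ("address C", 116.27, 40.07),
]
library = build_library(records, toy_embed, west=116.0, south=40.0, cell_deg=0.1, cols=10)
```

Looking up `library[grid_number]` then returns every address in that cell together with its coordinates and precomputed embedding, which is the access pattern the grid index rule is meant to support. Coordinates that fall exactly on a cell boundary are sensitive to floating-point rounding in this sketch and would need explicit handling in practice.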

Figure 2 is the second schematic flow chart of the address positioning method provided by the present invention. As shown in Figure 2, in a specific embodiment of the present invention, the query address input by the front-end user is received and first input into the backbone network to obtain the text semantic vector of the query address. The text semantic vector of the query address is then input into the geographical grid prediction sub-network and the query matching sub-network respectively: the geographical grid prediction sub-network predicts the number of the geographical grid to which the query address belongs, and the query matching sub-network outputs the embedded representation vector of the query address.

Further, in this embodiment, the coordinate range of the geographical grid is obtained from the number of the geographical grid to which the query address belongs. A spatial query is then used to select, from the address coordinate library, all address data within that coordinate range as candidate address information, together with the corresponding set of candidate address embedded representation vectors, which contains the embedded representation vector of each candidate address information.

Further, in this embodiment, the embedded representation vector of the query address and the set of candidate address embedded representation vectors are input into the similarity calculation network layer for similarity matching analysis. The similarity between the embedded representation vector of the query address and the embedded representation vector of each candidate address information is calculated, the best matching address and its coordinates are determined, and the positioning of the query address input by the user is completed.

Compared with existing text address positioning technology, the method of the embodiment of the present invention simultaneously computes the geographical grid information to which the query address belongs and the semantic-feature embedded representation vector used for query matching. Compared with using two deep neural network models of the same scale, one to predict the geographical grid and one for query matching, this reduces the number of processing steps and saves computing resources.

相比于仅利用基于深度神经网络的查询匹配模型进行全局检索的方法，本发明通过地理网格的筛选可以有效减少待查询匹配的候选地址数量，提高了计算的效率。此外，如果查询地址在地址坐标库中尚未存储，本方法能够根据预测的地理网格编号输出查询地址所属的地理范围。Compared with the method of only using a query matching model based on a deep neural network for global retrieval, the present invention can effectively reduce the number of candidate addresses to be query matched through geographical grid screening, and improve the calculation efficiency. In addition, if the query address has not been stored in the address coordinate library, this method can output the geographical range to which the query address belongs based on the predicted geographical grid number.
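The overall query flow, including the fallback to the grid extent when the address is not yet in the library, can be sketched as follows. The model is replaced by a stub returning a grid number and an embedding, and the grid layout (regular cells numbered row by row) is an assumption, as are all names and values.

```python
import math


def grid_bounds(grid, west, south, cell_deg, cols):
    """Return (min_lon, min_lat, max_lon, max_lat) of a grid cell."""
    row, col = divmod(grid, cols)
    return (
        west + col * cell_deg,
        south + row * cell_deg,
        west + (col + 1) * cell_deg,
        south + (row + 1) * cell_deg,
    )


def locate_address(query, model, library, grid_spec):
    """Predict grid and embedding, then match inside the grid or fall back to its extent."""
    grid, query_vec = model(query)
    candidates = library.get(grid, [])
    if not candidates:
        # Address not stored: report the geographical range of the predicted grid.
        return {"matched": None, "extent": grid_bounds(grid, **grid_spec)}
    best = min(candidates, key=lambda c: math.dist(query_vec, c["embedding"]))
    return {"matched": best["address"], "lon": best["lon"], "lat": best["lat"]}


def stub_model(query):
    """Stand-in for the trained dual-head network (hypothetical outputs)."""
    return (5, [0.2, 0.8]) if query.startswith("stored") else (6, [0.0, 0.0])


grid_spec = {"west": 116.0, "south": 39.5, "cell_deg": 0.5, "cols": 4}
library = {
    5: [
        {"address": "candidate A", "lon": 116.6, "lat": 40.1, "embedding": [0.9, 0.1]},
        {"address": "candidate B", "lon": 116.7, "lat": 40.2, "embedding": [0.25, 0.75]},
    ]
}

hit = locate_address("stored address", stub_model, library, grid_spec)
miss = locate_address("new address", stub_model, library, grid_spec)
```

Only the candidates in the predicted grid are compared with the query, so the cost of the similarity step depends on the size of one cell rather than on the size of the whole library.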

下面对本发明提供的地址定位装置进行描述，下文描述的地址定位装置与上文描述的地址定位方法可相互对应参照。The address positioning device provided by the present invention will be described below. The address positioning device described below and the address positioning method described above can be referenced correspondingly.

图3是本发明提供的地址定位装置的结构示意图，如图3所示，包括：Figure 3 is a schematic structural diagram of the address positioning device provided by the present invention. As shown in Figure 3, it includes:

The processing module 310 is configured to determine, based on the query address input by the user, the geographical grid range and the embedded representation vector of the query address;

The matching module 320 is configured to determine, from the address coordinate library, multiple candidate address information matching the geographical grid range, and to determine the embedded representation vector of each candidate address information; the address coordinate library is constructed based on all address text data of the spatial area range and the corresponding geographical coordinate information;

定位模块330，用于对查询地址的嵌入表示向量与每个候选地址信息的嵌入表示向量进行匹配分析，确定查询地址的地理位置信息。The positioning module 330 is used to perform matching analysis on the embedded representation vector of the query address and the embedded representation vector of each candidate address information, and determine the geographical location information of the query address.

本实施例所述的地址定位装置可以用于执行上述地址定位方法实施例，其原理和技术效果类似，此处不再赘述。The address positioning device described in this embodiment can be used to execute the above embodiment of the address positioning method. Its principles and technical effects are similar and will not be described again here.

The address positioning device of the embodiment of the present invention constructs the address coordinate library in advance from all address text data of the spatial area range and the corresponding geographical coordinate information, and uses the semantic-feature embedded representation of the address text to improve the efficiency of address text query matching. After a query address input by the user is received, the geographical grid range and the embedded representation vector of the query address are determined, so that multiple candidate address information matching the geographical grid range, together with the embedded representation vector of each candidate address information, can be quickly retrieved from the address coordinate library. The embedded representation vector of the query address is then matched against the embedded representation vector of each candidate address information, and the geographical location information of the query address is obtained from the successfully matched candidate address information. This achieves effective address positioning and improves both the efficiency of address query matching and the precision of address positioning.

Figure 4 is a schematic diagram of the physical structure of the electronic device provided by the present invention. As shown in Figure 4, the electronic device may include a processor 410, a communications interface 420, a memory 430 and a communication bus 440, where the processor 410, the communications interface 420 and the memory 430 communicate with one another through the communication bus 440. The processor 410 can call logical instructions in the memory 430 to execute the address positioning method provided by the above method embodiments, the method including: determining, based on the query address input by the user, the geographical grid range and the embedded representation vector of the query address; determining, from the address coordinate library, multiple candidate address information matching the geographical grid range, and determining the embedded representation vector of each candidate address information, the address coordinate library being constructed based on all address text data of the spatial area range and the corresponding geographical coordinate information; and performing matching analysis on the embedded representation vector of the query address and the embedded representation vector of each candidate address information to determine the geographical location information of the query address.

In addition, when the above logical instructions in the memory 430 are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.

In another aspect, the present invention also provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the address positioning method provided by the above methods, the method including: determining, based on the query address input by the user, the geographical grid range and embedded representation vector of the query address; determining, from the address coordinate library, multiple pieces of candidate address information matching the geographical grid range, and determining the embedded representation vector of each piece of candidate address information, the address coordinate library being constructed based on all address text data within the spatial area range and the corresponding geographical coordinate information; and performing matching analysis on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information to determine the geographical location information of the query address.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the address positioning method provided by the above methods, the method including: determining, based on the query address input by the user, the geographical grid range and embedded representation vector of the query address; determining, from the address coordinate library, multiple pieces of candidate address information matching the geographical grid range, and determining the embedded representation vector of each piece of candidate address information, the address coordinate library being constructed based on all address text data within the spatial area range and the corresponding geographical coordinate information; and performing matching analysis on the embedded representation vector of the query address and the embedded representation vector of each piece of candidate address information to determine the geographical location information of the query address.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement it without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An address positioning method, characterized by comprising:

Based on the query address input by the user, determine the geographical grid range and embedded representation vector of the query address;

Determine, from the address coordinate library, multiple pieces of candidate address information matching the geographical grid range, and determine the embedded representation vector of each piece of candidate address information; the address coordinate library is constructed based on all address text data within the spatial area range and the corresponding geographical coordinate information;

Perform matching analysis on the embedded representation vector of the query address and the embedded representation vector of each candidate address information to determine the geographical location information of the query address;

Wherein, based on the query address input by the user, determining the geographical grid range and embedded representation vector of the query address includes:

Input the query address input by the user into the positioning prediction neural network model, and obtain the geographical grid range and embedded representation vector of the query address output by the positioning prediction neural network model;

The positioning prediction neural network model is trained based on text address samples and their corresponding sample labels;

Wherein, the positioning prediction neural network model includes a basic backbone sub-network, a first prediction sub-network and a second prediction sub-network; inputting the query address input by the user into the positioning prediction neural network model to obtain the geographical grid range and embedded representation vector of the query address output by the positioning prediction neural network model includes:

Inputting the query address input by the user into the basic backbone sub-network to obtain a text semantic vector of the query address output by the basic backbone sub-network; the basic backbone sub-network is constructed based on a Transformer encoder;

Input the text semantic vector of the query address into the first prediction sub-network to obtain the geographical grid range of the query address output by the first prediction sub-network; the first prediction sub-network is constructed based on a fully connected layer neural network;

Input the text semantic vector of the query address into the second prediction sub-network to obtain the embedded representation vector of the query address output by the second prediction sub-network; the second prediction sub-network is constructed based on a sentence vector pooling layer neural network and a fully connected layer neural network;

Wherein, before determining the geographical grid range and embedded representation vector of the query address based on the query address input by the user, the method further includes:

Divide the spatial area into geographical grids, number each divided geographical grid, and determine the grid number of each geographical grid;

Associate all address text data and corresponding geographic coordinate information in the spatial area with the grid number of each geographic grid, and obtain the associated information after association processing;

Based on the associated information, construct geographical grid spatial index rules for all address text data and corresponding geographical coordinate information, and determine embedded representation vectors of all address text data;

Store the geographical grid spatial index rules, the embedded representation vectors of all address text data, all the address text data and the corresponding geographical coordinate information in a spatial database to obtain the address coordinate library.
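As an illustration only, the grid division, grid numbering and association steps recited above can be sketched as follows. The claim does not fix a grid scheme or a database, so the uniform longitude/latitude grid, the row-major numbering, the in-memory dictionary standing in for the spatial database, and the names `GridIndex`, `AddressRecord`, `build_address_library` and `candidates_for` are all assumptions introduced for this sketch. Embedded representation vectors are omitted for brevity, and points are assumed to lie inside the bounding box.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRecord:
    """One address text with its geographic coordinates (hypothetical type)."""
    text: str
    lon: float
    lat: float


class GridIndex:
    """Uniform lon/lat grid over a bounding box (an assumed grid scheme)."""

    def __init__(self, min_lon, min_lat, max_lon, max_lat, cell_deg):
        self.min_lon, self.min_lat = min_lon, min_lat
        self.cell_deg = cell_deg
        self.cols = math.ceil((max_lon - min_lon) / cell_deg)
        self.rows = math.ceil((max_lat - min_lat) / cell_deg)

    def grid_number(self, lon, lat):
        # Row-major numbering; clamp so points on the upper edge stay in range.
        col = min(int((lon - self.min_lon) / self.cell_deg), self.cols - 1)
        row = min(int((lat - self.min_lat) / self.cell_deg), self.rows - 1)
        return row * self.cols + col


def build_address_library(records, index):
    """Associate each record with its grid number and bucket by that number."""
    library = defaultdict(list)
    for rec in records:
        library[index.grid_number(rec.lon, rec.lat)].append(rec)
    return library


def candidates_for(library, grid_numbers):
    """Return all records stored under the given grid numbers."""
    return [rec for g in grid_numbers for rec in library.get(g, [])]
```

Under these assumptions, a predicted geographical grid range is simply a list of grid numbers, and `candidates_for` restricts the later vector comparison to the addresses stored under those numbers rather than the whole library.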

2. The address positioning method according to claim 1, characterized in that performing matching analysis on the embedded representation vector of the query address and the embedded representation vector of each candidate address information to determine the geographical location information of the query address includes:

Using a similarity matching method, calculate the distance between the embedded representation vector of the query address and the embedded representation vector of each of the candidate address information to determine similarity information of the embedded representation vector of the query address relative to the embedded representation vector of each of the candidate address information;

From each of the candidate address information, determine the target candidate address information corresponding to the maximum value in the similarity information;

According to the target candidate address information, the geographical location information of the query address is obtained.
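The matching steps of claim 2 can be sketched as below. The claim does not name a particular similarity measure, so cosine similarity is used here purely as an assumed example; the function names `cosine_similarity` and `locate` and the tuple layout of each candidate are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def locate(query_vec, candidates):
    """Pick the candidate whose embedding is most similar to the query.

    candidates: list of (address_text, (lon, lat), embedding) tuples.
    Returns (address_text, (lon, lat)) of the best match, or None if empty.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: cosine_similarity(query_vec, c[2]))
    return best[0], best[1]
```

The coordinates of the best-matching candidate are then taken as the geographical location of the query address. Any other distance or similarity measure could be substituted for `cosine_similarity` without changing the selection logic.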

3. The address positioning method according to claim 1, characterized in that, before inputting the query address input by the user into the positioning prediction neural network model, the method further includes:

Use the text address sample and its corresponding sample label as a set of training samples to obtain multiple sets of training samples;

The multiple sets of training samples are used to train the positioning prediction neural network model.

4. The address positioning method according to claim 3, characterized in that said utilizing the plurality of sets of training samples to train a positioning prediction neural network model includes:

For any set of training samples, input the training samples into the positioning prediction neural network model, and output the prediction probability corresponding to the training samples;

Using a preset loss function, calculate the loss value based on the predicted probability corresponding to the training sample and the sample label in the training sample;

Based on the loss value, adjust the model parameters of the positioning prediction neural network model until the number of training iterations reaches a preset number;

The model parameters obtained when the number of training iterations reaches the preset number are used as the model parameters of the trained positioning prediction neural network model.
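The training procedure of claim 4 can be illustrated with a deliberately small stand-in model. In this sketch a linear softmax classifier over grid classes takes the place of the positioning prediction neural network model, cross-entropy is assumed as the preset loss function, and plain gradient descent is assumed as the parameter update; none of these choices, nor the names `softmax` and `train`, come from the claims.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(samples, num_features, num_classes, preset_steps, lr=0.5):
    """Train a linear softmax classifier for a fixed number of passes.

    samples: list of (feature_vector, class_label) pairs.
    Returns the final weights and the mean loss of each pass.
    """
    weights = [[0.0] * num_features for _ in range(num_classes)]
    losses = []
    for _ in range(preset_steps):  # stop once the preset count is reached
        total_loss = 0.0
        for x, label in samples:
            logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
            probs = softmax(logits)  # predicted probability per class
            total_loss += -math.log(probs[label])  # cross-entropy loss
            for k in range(num_classes):  # gradient step on each class row
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(num_features):
                    weights[k][j] -= lr * grad * x[j]
        losses.append(total_loss / len(samples))
    return weights, losses
```

The weights returned after the final pass correspond to the model parameters obtained when the training count reaches the preset number. A real implementation would replace the linear model with the Transformer-based network and its two prediction sub-networks.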

5. An address positioning device, characterized by comprising:

A processing module configured to determine the geographical grid range and embedded representation vector of the query address based on the query address input by the user;

A matching module, configured to determine, from the address coordinate library, multiple pieces of candidate address information matching the geographical grid range, and to determine an embedded representation vector of each piece of candidate address information; the address coordinate library is constructed based on all address text data within the spatial area range and the corresponding geographical coordinate information;

A positioning module, configured to perform matching analysis on the embedded representation vector of the query address and the embedded representation vector of each candidate address information, and determine the geographical location information of the query address;

Wherein, the processing module is specifically used for:

Wherein, the positioning prediction neural network model includes a basic backbone sub-network, a first prediction sub-network and a second prediction sub-network; the processing module is also specifically used for:

Input the query address input by the user into the basic backbone sub-network to obtain the text semantic vector of the query address output by the basic backbone sub-network; the basic backbone sub-network is constructed based on the Transformer encoder;

Wherein, the device is specifically used for:

6. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that when the processor executes the program, the address positioning method according to any one of claims 1 to 4 is implemented.

7. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, the address positioning method according to any one of claims 1 to 4 is implemented.