CN117131860A - Address quality determining method and device based on knowledge graph and storage medium - Google Patents

Address quality determining method and device based on knowledge graph and storage medium

Info

Publication number
CN117131860A
Authority
CN
China
Legal status
Pending
Application number
CN202311154182.3A
Other languages
Chinese (zh)
Inventor
左珑
孙能林
刘丁
王明
Current Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Haier Uplus Intelligent Technology Beijing Co Ltd
Application filed by Qingdao Haier Technology Co Ltd, Haier Smart Home Co Ltd, Haier Uplus Intelligent Technology Beijing Co Ltd
Priority to CN202311154182.3A
Publication of CN117131860A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an address quality determining method and device based on a knowledge graph, and a storage medium, and relates to the technical field of computers. The address quality determining method based on the knowledge graph comprises: acquiring a preset address to be detected; inputting the address to be detected into an entity identification model, which outputs a fine-grained address corresponding to the address to be detected; matching the fine-grained address against preset address samples to obtain an address matching result; and determining the quality type of the address to be detected according to the matching result, wherein the entity identification model is obtained by address-structure analysis training on a plurality of address samples. According to the embodiments of the application, the address to be detected is parsed so that the fine-grained address is obtained efficiently, and the quality type of the address to be detected is determined accurately from the result of matching the fine-grained address against the address samples.

Description

Address quality determining method and device based on knowledge graph and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for determining address quality based on a knowledge graph, and a storage medium.
Background
Residential addresses matter greatly in daily life. For example, a courier can accurately locate a user from the residential address the user provides in order to pick up and deliver parcels, and medical staff can quickly and accurately locate a user from the address information the user provides in order to carry out a rescue.
However, a location can only be determined accurately if the address the user provides is sufficiently detailed. It is therefore crucial to analyze the address provided by the user and determine its quality type.
Disclosure of Invention
In view of the above, the application provides a method, a device and a storage medium for determining the quality of an address based on a knowledge graph, which are used for efficiently obtaining a fine-grained address by analyzing the address to be detected and accurately determining the quality type of the address to be detected according to the matching result of the fine-grained address and an address sample.
In a first aspect, the present application provides a method for determining address quality based on a knowledge graph, including:
acquiring a preset address to be detected;
inputting the address to be detected into an entity identification model, and outputting a fine-grained address corresponding to the address to be detected;
matching the fine-grained address against a preset address sample to obtain an address matching result;
determining the quality type of the address to be detected according to the matching result;
the entity identification model is obtained by performing address structural analysis training on a plurality of address samples.
Preferably, according to the method for determining address quality based on a knowledge graph provided by the present application, the entity identification model at least includes: a dictionary module, a pre-training language model module and a regular matching module;
wherein inputting the address to be detected into the entity identification model and outputting the fine-grained address corresponding to the address to be detected comprises:
performing structural analysis on the address to be detected by utilizing a suffix tree dictionary structure of the dictionary module to obtain a first entity address;
performing address extraction processing on the address to be detected by utilizing the pre-training language model module and a preset address knowledge graph to obtain a second entity address;
performing address extraction processing on the address to be detected by using a rule engine of the regular matching module to obtain a third entity address;
and performing address fusion on the first entity address, the second entity address and the third entity address to obtain the fine-grained address.
Preferably, according to the method for determining address quality based on a knowledge graph provided by the present application, the address extraction model of the pre-training language model module includes an input layer, an encoding layer and an output layer, and the address extraction processing is performed on the address to be detected by using the pre-training language model module and a preset address knowledge graph to obtain a second entity address, including:
constructing, by the input layer, representation sequences of the head entity, head entity type, entity relationship, tail entity and tail entity type of the input address to be detected; and concatenating the representation sequences of the head entity, head entity type, entity relationship, tail entity and tail entity type into an input sequence;
coding the input sequence by utilizing the coding layer, extracting semantic features of different layers of the coded input sequence, and splicing the semantic features of different layers;
calculating the prediction probability of the spliced semantic features by using the output layer, and outputting an output entity address with the prediction probability larger than a preset probability threshold;
and carrying out address extraction processing based on the output entity address and the knowledge graph to obtain a second entity address.
Preferably, according to the method for determining address quality based on a knowledge graph provided by the present application, the encoding of the input sequence, extracting semantic features of different levels of the encoded input sequence, and splicing the semantic features of different levels, includes:
encoding the input sequence with a bidirectional Transformer encoder, and concatenating, by means of a multi-head attention mechanism, the semantic features of different levels extracted from the encoded input sequence;
the coding layer further comprises an input embedding layer and a position embedding layer, and the generating step of the input sequence comprises the following steps:
mapping an input address to be detected into an input vector by utilizing the input embedding layer;
and constructing a position vector of the address to be detected by utilizing the position embedding layer, and splicing the input vector and the position vector together to form an input representation of the input sequence.
Preferably, according to the method for determining address quality based on a knowledge graph provided by the present application, the address sample at least includes: a first address sample, a second address sample, wherein the sample quality of the first address sample is higher than the sample quality of the second address sample;
the address matching result at least comprises: a first address matching result and a second address matching result;
and matching the fine-grained address against the preset address samples to obtain the address matching result comprises:
under the condition that the fine-grained address is matched with the first address sample, obtaining a first address matching result;
and under the condition that the fine-grained address is matched with the second address sample, obtaining a second address matching result.
Preferably, according to the method for determining the quality of the address based on the knowledge graph provided by the present application, the determining the quality type of the address to be detected according to the matching result includes:
according to the first address matching result, determining that the quality type of the address to be detected is a first quality type;
according to the second address matching result, determining that the quality type of the address to be detected is a second quality type;
wherein the first quality type corresponds to a higher quality than the second quality type.
Preferably, according to the method for determining the quality of the address based on the knowledge graph provided by the present application, after the step of determining that the quality type of the address to be detected is the second quality type according to the second address matching result, the method further includes:
under the condition that the quality type of the address to be detected is the second quality type, acquiring the sample type of the second address sample corresponding to the second address matching result;
and determining, according to the sample type of the second address sample, the reason for the quality defect that makes the address to be detected of the second quality type.
Preferably, according to the method for determining address quality based on a knowledge graph provided by the present application, after the step of inputting the address to be detected into the entity identification model and outputting the fine-grained address corresponding to the address to be detected, the method further includes:
judging the address type of the fine-grained address, and determining that the quality type of the fine-grained address is the first quality type under the condition that the address type indicates an organization type.
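The matching and quality-typing steps above (first/second address samples, first/second quality types, defect reason from the sample type, and the organization shortcut) can be sketched as follows. This is a minimal illustration only: the `AddressSample` record, the level names and the "same set of filled levels" matching rule are all assumptions, since the text does not say how a fine-grained address is compared with a sample.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

FIRST_QUALITY = "first quality type"    # higher quality
SECOND_QUALITY = "second quality type"  # lower quality


@dataclass(frozen=True)
class AddressSample:
    """Hypothetical address sample: which levels it fills and its quality tier."""
    levels: FrozenSet[str]   # filled address levels, e.g. {"province", "city"}
    high_quality: bool       # True: first address sample; False: second address sample
    sample_type: str = ""    # for second samples, a label describing the defect


def determine_quality(address: Dict[str, str], samples: List[AddressSample],
                      is_organization: bool = False) -> Tuple[Optional[str], Optional[str]]:
    """Return (quality type, defect reason) for a fine-grained address.

    Matching rule (an assumption): a sample matches when it fills exactly the
    same set of levels as the address.
    """
    if is_organization:
        # Organization-type addresses are assigned the first quality type directly.
        return FIRST_QUALITY, None
    filled = frozenset(level for level, value in address.items() if value)
    for sample in samples:
        if sample.levels == filled:
            if sample.high_quality:
                return FIRST_QUALITY, None
            # Second quality type: the sample type explains the defect.
            return SECOND_QUALITY, sample.sample_type
    return None, None  # no sample matched


# Hypothetical sample set for illustration.
SAMPLES = [
    AddressSample(frozenset({"province", "city", "county", "community",
                             "group", "house_number"}), True),
    AddressSample(frozenset({"province", "city", "county"}), False,
                  "missing community and house-number levels"),
]
```

A real system would likely use a softer matching rule than exact level-set equality; the point here is only the control flow from match result to quality type and defect reason.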
In a second aspect, the present application further provides an address quality determining device based on a knowledge graph, including:
the acquisition module is used for acquiring a preset address to be detected;
the identification module is used for inputting the address to be detected into an entity identification model and outputting a fine-grained address corresponding to the address to be detected;
the matching module is used for matching the fine-grained address against a preset address sample to obtain an address matching result;
the determining module is used for determining the quality type of the address to be detected according to the matching result; the entity identification model is obtained by performing address structural analysis training on a plurality of address samples.
In a third aspect, the present application also provides an electronic device comprising a memory and a processor, the memory having stored therein a computer program, the processor being arranged to implement a knowledge-graph based address quality determination method as described in any of the above by execution of the computer program.
In a fourth aspect, the present application further provides a computer readable storage medium, the computer readable storage medium comprising a stored program, wherein the program when run performs an address quality determination method based on a knowledge-graph as described in any one of the above.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements a knowledge-graph based address quality determination method as described in any of the above.
The application provides a method, a device and a storage medium for determining address quality based on a knowledge graph: a preset address to be detected is acquired; the address to be detected is input into an entity identification model, which outputs a corresponding fine-grained address; the fine-grained address is matched against preset address samples to obtain an address matching result; and the quality type of the address to be detected is determined according to the matching result, the entity identification model being obtained by address-structure analysis training on a plurality of address samples. By parsing the address to be detected, the fine-grained address is obtained efficiently, and the quality type of the address to be detected is determined accurately from the result of matching the fine-grained address against the address samples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware environment of the address quality determining method based on a knowledge graph provided by the present application;
FIG. 2 is a flow chart of the address quality determining method based on a knowledge graph provided by the present application;
FIG. 3 is a schematic structural diagram of the address quality determining device based on a knowledge graph provided by the present application;
FIG. 4 is a second schematic structural diagram of the address quality determining device based on a knowledge graph provided by the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
To help those skilled in the art better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The method, the device and the storage medium for determining the address quality based on the knowledge graph are described below with reference to fig. 1 to 5, and the embodiment provided by the application can be used for efficiently obtaining the fine-grained address by analyzing the address to be detected, and accurately determining the quality type of the address to be detected according to the matching result of the fine-grained address and the address sample.
According to an aspect of the embodiments of the application, a method for determining address quality based on a knowledge graph is provided. The method is widely applicable to whole-house intelligent digital control scenarios such as the smart home, smart home device ecosystems and intelligent house ecosystems. Optionally, in the present embodiment, the address quality determining method based on the knowledge graph may be applied to a hardware environment formed by the terminal device 102 and the server 104 shown in FIG. 1. As shown in FIG. 1, the server 104 is connected to the terminal device 102 through a network and may provide services (such as application services) for the terminal or for a client installed on the terminal. A database may be set up on the server or independently of it to provide data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independently of it to provide data computing services for the server 104.
The network may include, but is not limited to, at least one of: a wired network and a wireless network. The wired network may include, but is not limited to, at least one of: a wide area network, a metropolitan area network and a local area network; the wireless network may include, but is not limited to, at least one of: WIFI (Wireless Fidelity) and Bluetooth. The terminal device 102 may include, but is not limited to, a PC, a mobile phone, a tablet computer, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart cooking range, a smart washing machine, a smart water heater, a smart washing device, a smart dishwasher, a smart projection device, a smart television, a smart clothes hanger, a smart curtain, a smart video device, a smart socket, a smart speaker, a smart fresh-air device, a smart kitchen and bathroom device, a smart bathroom device, a smart sweeping robot, a smart window-cleaning robot, a smart mopping robot, a smart air purifier, a smart steam oven, a smart microwave oven, a smart kitchen appliance, a smart purifier, a smart water dispenser, a smart door lock, and the like.
Fig. 2 is a flow chart of the address quality determining method based on a knowledge graph provided by the present application, and the address quality determining method based on a knowledge graph may include, but is not limited to, steps S100 to S400.
S100, acquiring a preset address to be detected;
S200, inputting the address to be detected into an entity identification model, and outputting a fine-grained address corresponding to the address to be detected;
S300, matching the fine-grained address against a preset address sample to obtain an address matching result;
S400, determining the quality type of the address to be detected according to the matching result;
the entity identification model is obtained by performing address structural analysis training on a plurality of address samples.
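Steps S100 to S400 form a straight pipeline, sketched below. This is a minimal illustration under stated assumptions: the component signatures (`Recognizer`, `Matcher`, `Classifier`) are hypothetical because the text does not define interfaces, and the `toy_*` functions are placeholders so the sketch runs end to end, not the patented models.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interfaces: the document does not specify component signatures.
Recognizer = Callable[[str], Dict[str, str]]          # S200
Matcher = Callable[[Dict[str, str]], Optional[str]]   # S300
Classifier = Callable[[Optional[str]], str]           # S400


def run_pipeline(addresses: List[str], recognize: Recognizer,
                 match: Matcher, classify: Classifier) -> List[str]:
    """Apply steps S200 to S400 to each address acquired in S100."""
    quality_types = []
    for address in addresses:                          # S100: addresses already acquired
        fine_grained = recognize(address)              # S200: entity recognition
        match_result = match(fine_grained)             # S300: match against address samples
        quality_types.append(classify(match_result))   # S400: quality type
    return quality_types


# Toy stand-ins so the sketch runs end to end (not the patented models).
def toy_recognize(address: str) -> Dict[str, str]:
    return {"has_house_number": "yes" if "No." in address else ""}


def toy_match(fine_grained: Dict[str, str]) -> Optional[str]:
    return "first" if fine_grained["has_house_number"] else "second"


def toy_classify(match_result: Optional[str]) -> str:
    labels = {"first": "first quality type", "second": "second quality type"}
    return labels.get(match_result or "", "unknown")
```

Passing the components in as callables keeps the pipeline independent of how each step is implemented, so the dictionary, language-model and rule-engine parts can be swapped or tested separately.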
It should be noted that, the execution body of the address quality determining method based on the knowledge graph according to the embodiment of the present application may be a hardware device with data information processing capability and/or software necessary for driving the hardware device to work.
Alternatively, the execution body may include, but is not limited to, workstations, servers, computers, user terminals, and other intelligent devices. The user terminal comprises, but is not limited to, a mobile phone, a computer, intelligent voice interaction equipment, intelligent household appliances, vehicle-mounted terminals and the like.
In step S100 of some embodiments, a preset address to be detected is obtained.
The address to be detected can be obtained from a database, crawled from the network, or entered by a user through an input platform.
In step S200 of some embodiments, the address to be detected is input into an entity recognition model, and a fine-grained address corresponding to the address to be detected is output.
That is, to judge the quality type of the address to be detected, the address is fed into the entity recognition model for recognition, and the model directly and efficiently outputs the corresponding fine-grained address.
It should be noted that the entity recognition model at least includes: dictionary module, pre-training language model module, regular matching module. The entity identification model is obtained by performing address structural analysis training on a plurality of address samples.
The step of inputting the address to be detected into the entity identification model and outputting the fine-grained address corresponding to the address to be detected may specifically be:
Structural analysis is performed on the address to be detected using the suffix tree dictionary structure of the dictionary module to obtain a first entity address.
Address extraction is performed on the address to be detected using the pre-training language model module and a preset address knowledge graph to obtain a second entity address.
Address extraction is performed on the address to be detected using the rule engine of the regular matching module to obtain a third entity address.
Address fusion is performed on the first entity address, the second entity address and the third entity address to obtain the fine-grained address.
It should be further noted that the first entity address includes: province name address, city name address, county name address.
The second entity address includes: street name address, community name address, cell name address, organization name address.
The third entity address includes: building name address, unit name address, floor name address, house number name address.
A fine-grained address is composed of several component addresses with different priorities, which fall into three groups: the first entity address, the second entity address and the third entity address. Each entity address is further organized into a hierarchy of levels, which makes addresses easier to compare and improves the accuracy of the address quality judgment.
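One way to see why a level hierarchy "facilitates comparison" is a coarse-to-fine walk that reports the first level at which two fine-grained addresses disagree. The sketch below assumes the level names and ordering from the three entity-address lists above; the dictionary representation and both helper functions are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed level ordering, coarse to fine, following the three tiers:
# first entity address (province/city/county), second (street/community/cell/
# organization) and third (building/unit/floor/house number).
LEVELS: List[str] = [
    "province", "city", "county",
    "street", "community", "cell", "organization",
    "building", "unit", "floor", "house_number",
]


def first_mismatch(a: Dict[str, str], b: Dict[str, str]) -> Optional[str]:
    """Return the coarsest level at which two fine-grained addresses differ, or None."""
    for level in LEVELS:
        if a.get(level, "") != b.get(level, ""):
            return level
    return None


def filled_levels(address: Dict[str, str]) -> List[str]:
    """List the levels an address actually fills, in hierarchy order."""
    return [level for level in LEVELS if address.get(level)]
```

Because levels are compared in a fixed order, a mismatch at "county" is reported before any finer-level difference, which is the kind of structured comparison the hierarchy enables.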
The dictionary step works as follows: structural analysis is performed on the address to be detected using the suffix tree dictionary structure of the dictionary module, which yields the first entity address.
A suffix tree is a data structure: the suffix tree of a string S is a tree whose edges are labeled with strings, such that each suffix of S corresponds to exactly one path from the root node to a leaf node. It is therefore a radix tree (compressed trie) of the suffixes of S, and a special type of prefix tree (trie).
Suffix trees were proposed to support efficient string matching and querying. The suffix tree T of a string S of m characters is a rooted directed tree with exactly m leaves, numbered 1 to m. Each internal node other than the root has at least two children, and each edge is labeled with a non-empty substring of S. No two edges leaving the same node have labels that begin with the same character. The key feature of the suffix tree is that, for any leaf i, concatenating the edge labels on the path from the root to leaf i spells out the suffix of S starting at position i, i.e. S[i..m]. In practice a terminal character $ that does not occur in S is appended, so that no suffix is a prefix of another and every suffix ends at a leaf. The label of a node is defined as the concatenation of the labels of all edges on the path from the root to that node.
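The definition above can be illustrated with an uncompressed suffix trie, which is the same structure before single-child chains are collapsed into string-labeled edges. This is a minimal sketch for clarity, not the patent's dictionary implementation: it uses O(n²) space, whereas a real suffix tree (e.g. built with Ukkonen's algorithm) uses linear space. The `#leaf` key and the class name are hypothetical choices.

```python
from typing import Dict, List


class SuffixTrie:
    """Uncompressed suffix trie. A suffix tree is this structure with
    single-child chains collapsed into string-labeled edges."""

    LEAF = "#leaf"  # multi-character key, so it cannot collide with a single character

    def __init__(self, text: str, terminal: str = "$") -> None:
        self.root: Dict = {}
        s = text + terminal  # terminal ensures no suffix is a prefix of another
        for i in range(len(s)):
            node = self.root
            for ch in s[i:]:
                node = node.setdefault(ch, {})
            node[self.LEAF] = i + 1  # leaf i spells the suffix S[i..m] (1-based)

    def _walk(self, pattern: str):
        node = self.root
        for ch in pattern:
            if ch not in node:
                return None
            node = node[ch]
        return node

    def contains(self, pattern: str) -> bool:
        """A pattern is a substring of S iff it labels a path from the root."""
        return self._walk(pattern) is not None

    def occurrences(self, pattern: str) -> List[int]:
        """1-based start positions: the leaf numbers below the pattern's node."""
        node = self._walk(pattern)
        if node is None:
            return []
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key == self.LEAF:
                    found.append(child)
                else:
                    stack.append(child)
        return sorted(found)
```

For example, `SuffixTrie("banana").occurrences("ana")` returns `[2, 4]`, since "ana" starts at positions 2 and 4 of "banana".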
In the suffix tree dictionary structure of the present application, for example, the root node is Henan Province, and its child nodes include, but are not limited to: Zhoukou, Chunchang, Kaifeng and Luoyang.
Taking Zhoukou City as the root of a subtree, its child nodes include, but are not limited to: Chuanhui District, Fugou County and Taikang County.
Thus, the paths from the root node to the leaf nodes include: Henan Province, Zhoukou City, Chuanhui District; Henan Province, Zhoukou City, Fugou County; and Henan Province, Zhoukou City, Taikang County.
Therefore, by querying the suffix tree dictionary structure with the address to be detected and comparing the results, the first entity address of the address to be detected, namely its province name address, city name address and county name address, can be determined.
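The province, city and county lookup described above can be approximated by a top-down walk over a hierarchical administrative dictionary. This is a simplified stand-in, not the patent's suffix tree: it greedily matches prefixes level by level on a raw Chinese address string. The tiny `ADMIN_TREE` is hypothetical and truncated, and inferring an omitted city from its county is an added assumption.

```python
from typing import Dict, List, Tuple

# Toy administrative dictionary (hypothetical, heavily truncated):
# province -> city -> counties, using the Henan example from the text.
ADMIN_TREE: Dict[str, Dict[str, List[str]]] = {
    "河南省": {                                  # Henan Province
        "周口市": ["川汇区", "扶沟县", "太康县"],  # Zhoukou City: Chuanhui, Fugou, Taikang
        "开封市": [],                             # Kaifeng City (counties omitted)
        "洛阳市": [],                             # Luoyang City (counties omitted)
    },
}


def parse_admin(address: str) -> Tuple[Dict[str, str], str]:
    """Greedily match province, city and county prefixes, top-down.

    Returns the first-entity levels and the unparsed remainder. When the city
    is omitted but a known county follows, the city is inferred from the tree.
    """
    levels = {"province": "", "city": "", "county": ""}
    pos = 0
    for province, cities in ADMIN_TREE.items():
        if not address.startswith(province, pos):
            continue
        levels["province"] = province
        pos += len(province)
        for city in cities:
            if address.startswith(city, pos):
                levels["city"] = city
                pos += len(city)
                break
        for city, counties in cities.items():
            if levels["city"] and city != levels["city"]:
                continue  # city already fixed: only search its counties
            county = next((c for c in counties if address.startswith(c, pos)), "")
            if county:
                levels["city"] = city  # fills in the city if it was omitted
                levels["county"] = county
                pos += len(county)
                break
        break  # only one province can match
    return levels, address[pos:]
```

The remainder returned alongside the levels is what the later steps (language model and rule engine) would still need to parse.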
For example, if the address to be detected is No. 13, Group 5, Liji Village, Fugou County, Zhoukou City, Henan Province, structural analysis with the suffix tree dictionary structure of the dictionary module yields the first entity address Henan Province, Zhoukou City, Fugou County.
For another example, if the address to be detected is 116, Building No. 57, Street East-Close Home, city area, Beijing, the first entity address is the city area of Beijing.
Further, address extraction processing is carried out on the address to be detected by utilizing a rule engine of the regular matching module, and a third entity address is obtained.
It should be noted that a rule engine evolved from the inference engine. It is a component embedded in an application program that separates business decisions from the application code, with the business decisions written as rules using predefined semantic modules. The rule engine accepts input data, interprets the business rules, and makes business decisions according to those rules.
For example, if the address to be detected is No. 13, Group 5, Liji Village, Fugou County, Henan Province, the rule engine of the regular matching module identifies the third entity address as Group 5, No. 13.
For another example, if the address to be detected is 116, first floor, Building No. 57, Street East-Close Home, mountain area, Beijing City, the third entity address is 116, first floor, Building No. 57.
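The rule-engine idea (rules kept as data, separate from the code that applies them) can be sketched with regular expressions over raw Chinese address strings. This is a minimal illustration under stated assumptions: the field names and every pattern in `RULES` are hypothetical, not the patent's rule set, and a production rule engine would also handle rule priority and conflicts.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule set kept as data, separate from the matching code.
# Each rule: (field name, pattern whose first group captures the number).
RULES: List[Tuple[str, re.Pattern]] = [
    ("building", re.compile(r"(\d+)(?:号楼|栋|幢)")),  # building, e.g. 57号楼
    ("unit", re.compile(r"(\d+)单元")),                # unit, e.g. 1单元
    ("floor", re.compile(r"(\d+)层")),                 # floor, e.g. 3层
    ("room", re.compile(r"(\d+)(?:室|户)")),          # room, e.g. 116室
    ("group", re.compile(r"(\d+)组")),                 # rural group, e.g. 5组
    ("house_number", re.compile(r"(\d+)号(?!楼)")),    # house number, e.g. 13号 (not 57号楼)
]


def apply_rules(address: str) -> Dict[str, str]:
    """Return the first match of every rule that fires on the address."""
    fields: Dict[str, str] = {}
    for field, pattern in RULES:
        found = pattern.search(address)
        if found:
            fields[field] = found.group(1)
    return fields
```

For the rural example, `apply_rules("河南省周口市扶沟县李集村5组13号")` returns `{"group": "5", "house_number": "13"}`. The negative lookahead `(?!楼)` keeps "57号楼" from also being read as a house number.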
Further, address extraction is performed on the address to be detected using the pre-training language model module and a preset address knowledge graph to obtain the second entity address.
The address extraction model of the pre-training language model module comprises an input layer, an encoding layer and an output layer, wherein,
constructing, by the input layer, representation sequences of the head entity, head entity type, entity relationship, tail entity and tail entity type of the input address to be detected; and concatenating these representation sequences into an input sequence.
And coding the input sequence by using the coding layer, extracting semantic features of different layers of the coded input sequence, and splicing the semantic features of different layers.
Calculating the prediction probability of the spliced semantic features by using the output layer, and outputting an output entity address with the prediction probability larger than a preset probability threshold;
and carrying out address extraction processing based on the output entity address and the knowledge graph to obtain a second entity address.
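The input-layer and output-layer steps above can be sketched in isolation. This is a minimal illustration under stated assumptions: character-level tokens, BERT-style `[CLS]`/`[SEP]` markers, a sigmoid over one score per candidate and a default threshold of 0.5 are all choices not specified in the text, and the encoder in between is omitted.

```python
import math
from typing import Dict, List, Tuple


def build_input_sequence(head: str, head_type: str, relation: str,
                         tail: str, tail_type: str) -> List[str]:
    """Concatenate the five representation sequences into one token sequence.

    Character-level tokens and [CLS]/[SEP] markers are assumptions borrowed
    from the BERT convention.
    """
    sequence = ["[CLS]"]
    for part in (head, head_type, relation, tail, tail_type):
        sequence.extend(part)        # one token per character
        sequence.append("[SEP]")     # boundary between the five parts
    return sequence


def select_entities(scores: Dict[str, float],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Turn each candidate's score into a probability with a sigmoid and keep
    candidates whose probability exceeds the preset threshold (highest first)."""
    kept = []
    for candidate, z in scores.items():
        probability = 1.0 / (1.0 + math.exp(-z))
        if probability > threshold:
            kept.append((candidate, probability))
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, with scores `{"李集村": 2.0, "5组": -1.0}` only "李集村" survives, because its sigmoid probability (about 0.88) exceeds 0.5 while the other is about 0.27.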
The output entity addresses whose prediction probability exceeds the preset probability threshold are matched against the preset address knowledge graph: the semantic relations among the address words are extracted first, and the second entity address is then determined from these semantic relations and the semantic features of the address words.
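The knowledge-graph matching step can be sketched as a lookup over (head, relation, tail) triples: a candidate is kept only if the graph types it as a second-entity level and links it to the county already parsed. This is a hypothetical illustration, not the patent's procedure: the relation names `type` and `located_in`, the graph contents and the filtering rule are all assumptions.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

# Toy address knowledge graph (hypothetical contents and relation names).
ADDRESS_KG: Set[Triple] = {
    ("扶沟县", "type", "county"),
    ("李集村", "type", "community"),       # village treated as community level
    ("李集村", "located_in", "扶沟县"),
    ("深圳大学", "type", "organization"),   # Shenzhen University
}

SECOND_ENTITY_TYPES = {"street", "community", "cell", "organization"}


def confirm_second_entity(candidates: List[str], county: str) -> Optional[str]:
    """Pick the first candidate that the graph types as a second-entity level
    and that is linked to the already-parsed county."""
    for candidate in candidates:
        types = {tail for head, rel, tail in ADDRESS_KG
                 if head == candidate and rel == "type"}
        if not types & SECOND_ENTITY_TYPES:
            continue  # e.g. a county name leaking into the candidates
        if (candidate, "located_in", county) in ADDRESS_KG:
            return candidate
    return None
```

Checking the `located_in` link against the first entity address is one plausible use of "semantic relations among address words"; it filters candidates that are valid places but belong to a different county.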
The second entity address includes: street name address, community name address, cell name address, organization name address.
For example, if the address to be detected is No. 13, Group 5, Liji Village, Chives Town, Fugou County, Henan Province, the second entity address obtained is Liji Village.
For another example, if the address to be detected is 116, Building No. 57, Street East-Close Home, Good Town Arch, mountain area, Beijing City, the second entity address detected is Street East-Close Home, Good Town Arch.
As a further example, if the address to be detected is Shenzhen University, Shenli Street, mountain area, Shenzhen, the corresponding second entity address is Shenli Street, Shenzhen University.
And performing address fusion processing on the first entity address, the second entity address and the third entity address according to a preset address ordering sequence to obtain a fine-grained address.
The address ordering sequence arranges the first entity address, the second entity address and the third entity address one after another, in that order.
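The fusion step can be sketched as an ordered concatenation. This is an assumption about what "address fusion" involves in practice: missing parts are skipped, an exact repeat is dropped, and `sep` is a hypothetical parameter (a Chinese address would normally be joined with an empty separator).

```python
from typing import List, Optional


def fuse_addresses(first: Optional[str], second: Optional[str],
                   third: Optional[str], sep: str = ", ") -> str:
    """Join the entity addresses in the preset order: first, second, third."""
    parts: List[str] = []
    for part in (first, second, third):
        if part and part not in parts:  # skip missing parts and exact duplicates
            parts.append(part)
    return sep.join(parts)


print(fuse_addresses("Henan Province, Fugou County", "Liji Village", None))
# Henan Province, Fugou County, Liji Village
```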
The coding layer uses a bidirectional Transformer encoder to encode the input sequence, and a multi-head attention mechanism to splice the semantic features extracted from different layers of the encoded input sequence.
The coding layer also comprises an input embedding layer and a position embedding layer. The input embedding layer is used for mapping the input address to be detected into an input vector.
The position embedding layer constructs a position vector for the address to be detected, and the input vector and the position vector are spliced together to form the input representation of the input sequence.
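A toy version of the input embedding and position embedding layers, in plain Python. The lookup tables are randomly initialized instead of learned, and the dimensions are arbitrary assumptions. The vectors are spliced (concatenated) as the text describes; note that the original BERT model instead adds its token and position embeddings element-wise.

```python
import random
from typing import Dict, List


class InputRepresentation:
    """Toy input layer: token embedding spliced with a position embedding."""

    def __init__(self, vocab: List[str], max_len: int,
                 tok_dim: int = 4, pos_dim: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.index: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}
        # Random tables stand in for learned embedding weights.
        self.tok_table = [[rng.uniform(-1, 1) for _ in range(tok_dim)] for _ in vocab]
        self.pos_table = [[rng.uniform(-1, 1) for _ in range(pos_dim)] for _ in range(max_len)]

    def __call__(self, tokens: List[str]) -> List[List[float]]:
        if len(tokens) > len(self.pos_table):
            raise ValueError("sequence longer than max_len")
        # List concatenation = splicing the input vector and the position vector.
        return [self.tok_table[self.index[tok]] + self.pos_table[pos]
                for pos, tok in enumerate(tokens)]


layer = InputRepresentation(["[CLS]", "Liji", "Village", "[SEP]"], max_len=8)
vectors = layer(["[CLS]", "Liji", "Village", "[SEP]"])
print(len(vectors), len(vectors[0]))  # 4 6
```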
BERT, the address extraction model, is built from multiple layers of bidirectional Transformer encoders, each of which uses a multi-head attention mechanism to fuse the context around each address word of the address to be detected and thereby establish the strength (weight) of the association between address words. A simple linear model can be stacked directly on top of the address extraction model, which is then adapted to a specific task by fine-tuning. The Transformer uses neither recurrent nor convolutional networks; it obtains global information through the attention mechanism, which can be trained in parallel. To capture the syntactic or semantic features at different positions of a long address, the Transformer uses a multi-head self-attention mechanism to obtain the semantic features among the address words of the long address, which improves the performance of the model.
Feature acquisition based on a multi-head self-attention mechanism. The Transformer uses multi-head attention: stacking multiple heads widens the range of information captured, each head attends to the information within its own range, and together the heads cover the information in all ranges while allowing the computation to be parallelized efficiently.
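The multi-head self-attention step can be sketched in plain Python as scaled dot-product attention, softmax(QKᵀ/√d)V, applied per head, followed by splicing the head outputs. To keep the sketch short, the learned query/key/value and output projections are assumed to be identity matrices, so each head simply attends over its own slice of the feature vector.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention for one head."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split features into heads, attend within each head, then splice the heads."""
    dim = len(x[0])
    if dim % num_heads:
        raise ValueError("feature dimension must be divisible by num_heads")
    size = dim // num_heads
    heads = []
    for h in range(num_heads):
        sub = [row[h * size:(h + 1) * size] for row in x]  # identity projections assumed
        heads.append(attention(sub, sub, sub))
    # Concatenate the per-head outputs for every token position.
    return [sum((head[t] for head in heads), []) for t in range(len(x))]


print(multi_head_self_attention([[1.0, 2.0]], num_heads=2))  # [[1.0, 2.0]]
```

With a single token each head can only attend to itself, so the output equals the input; with more tokens every output coordinate is a weighted average of that column.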
Implementation in the BERT framework is divided into two stages: pre-training and fine-tuning. First, a large amount of unlabeled address data is used for pre-training, i.e., unsupervised pre-training; BERT is initialized with the parameters learned during pre-training, and pre-training on unlabeled address data greatly reduces the cost of building a corpus. In the downstream task, a small number of labeled triples are used to adjust the initialized parameters, i.e., supervised fine-tuning.
First, a large amount of unlabeled address data is used for pre-training, in which masked-word prediction and next-sentence prediction are combined to obtain the initial pre-trained parameters. Then, without changing the internal structure of BERT, an output layer is added after the coding layer, and a small number of labeled triples are used to adjust the parameters. Finally, the encoding of the [CLS] special token is taken as the output of the model.
Pre-training based on an unsupervised BERT model. The BERT model is pre-trained by combining the masked language model task (Masked LM) with next sentence prediction, so that the model obtains better pre-training results. Because most of the parameters have already been pre-trained and have sufficient capacity to extract higher-level features, they are largely retained, and only small adjustments are made to the parameters in downstream applications, which speeds up the model.
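The masked-word (Masked LM) part of pre-training can be sketched as below. It follows the commonly described BERT recipe: a selected token is replaced with `[MASK]` 80% of the time, with a random vocabulary token 10% of the time, and left unchanged 10% of the time. Selecting each token independently with probability `rate` is a simplification of BERT's fixed selection ratio, and the set of special tokens is an assumption. Next-sentence prediction is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"
SPECIAL = {"[CLS]", "[SEP]"}  # assumed special tokens that are never masked


def mask_tokens(tokens: Sequence[str], vocab: Sequence[str], rate: float = 0.15,
                seed: int = 0) -> Tuple[List[str], List[Optional[str]]]:
    """Return (masked tokens, labels); a label holds the original token to predict."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPECIAL or rng.random() >= rate:
            continue  # not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK  # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, labels


masked, labels = mask_tokens(["[CLS]", "Liji", "Village", "[SEP]"],
                             vocab=["Liji", "Village", "Town"], rate=1.0)
print(labels)  # [None, 'Liji', 'Village', None]
```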
In step S300 of some embodiments, the fine-grained address is matched against preset address samples to obtain an address matching result.
It should be noted that the address samples include at least a first address sample and a second address sample, where the sample quality of the first address sample is higher than that of the second address sample.
The address matching result includes at least a first address matching result and a second address matching result.
It can be understood that, after step S200 (inputting the address to be detected into the entity identification model and outputting the corresponding fine-grained address) is performed, this step may specifically be implemented by matching the fine-grained address against the preset address samples:
When the fine-grained address matches the first address sample, the first address matching result is obtained.
When the fine-grained address matches the second address sample, the second address matching result is obtained.
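One way to realize this matching step is to describe each address sample as a pattern of address levels that must be present or absent, and to return the first sample whose pattern the fine-grained address satisfies. The sample names, level keys and patterns below are hypothetical; the two patterns loosely mirror the community-name cases given later in this section.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional


class AddressSample(NamedTuple):
    name: str                # hypothetical sample identifier
    tier: str                # "first" (high quality) or "second" (low quality)
    present: FrozenSet[str]  # levels that must be present
    absent: FrozenSet[str]   # levels that must be absent


def match_sample(levels: Dict[str, str],
                 samples: List[AddressSample]) -> Optional[AddressSample]:
    """Return the first sample matched by the fine-grained address, if any."""
    have = {key for key, value in levels.items() if value}
    for sample in samples:
        if sample.present <= have and not (sample.absent & have):
            return sample
    return None


SAMPLES = [
    AddressSample("community_with_building", "first",
                  frozenset({"community", "building"}), frozenset({"cell"})),
    AddressSample("community_only", "second", frozenset({"community"}),
                  frozenset({"cell", "building", "unit", "house_number"})),
]

fine_grained = {"street": "Jiushan Street", "community": "Mengshahe Community"}
result = match_sample(fine_grained, SAMPLES)
print(result.name, result.tier)  # community_only second
```

Because the first matching sample wins, more specific patterns should be listed before more general ones.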
Example address samples for determining whether the village name is missing:
If there is no village name, no community name and no organization name, the address is judged to lack a village name and is a second address sample, i.e., a low-quality sample. Example: an address in the Fu-nationality county of Fu City, Liaoning Province, that stops at a town ("clamped in the river town").
If there is no village name, no community name and no organization name, but there is a town name and an "x group" designation, the address is judged not to lack a village name and is a first address sample, i.e., a high-quality sample.
Example address samples for determining whether the city name is missing:
If there is no city name, the address is judged to lack a city name and is a second address sample, i.e., a low-quality sample. Example: Mu Zhuangcun (Muzhuang Village) in the bergamot town of the bergamot of the deceased.
If there is a province name and a county name but no city name, the address is not judged to lack a city name (counties in Hainan Province can be administered directly by the province) and is a first address sample, i.e., a high-quality sample. Example: No. 1111 Jinlong Road, Longhe Town, Hainan Province.
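The village-name and city-name checks above can be written as small rule functions over a parsed address, here a dictionary from hypothetical level keys to names. Each returns `"first"` (high-quality sample) or `"second"` (low-quality sample). Treating an address that does have a village, community or organization name as passing the village check is an assumption; the text only lists cases in which all three are absent.

```python
from typing import Dict

Levels = Dict[str, str]


def _has(levels: Levels, key: str) -> bool:
    return bool(levels.get(key))


def village_name_tier(levels: Levels) -> str:
    """Village-name check."""
    if any(_has(levels, k) for k in ("village", "community", "organization")):
        return "first"   # assumption: the rule only flags addresses lacking all three
    if _has(levels, "town") and _has(levels, "group"):
        return "first"   # town + 'x group': village name not considered missing
    return "second"      # village name judged missing


def city_name_tier(levels: Levels) -> str:
    """City-name check."""
    if _has(levels, "city"):
        return "first"
    if _has(levels, "province") and _has(levels, "county"):
        return "first"   # e.g. a county directly under Hainan Province
    return "second"      # city name judged missing


# "Example County" is a hypothetical placeholder; the source does not name the county.
hainan = {"province": "Hainan Province", "county": "Example County", "town": "Longhe Town"}
print(city_name_tier(hainan))  # first
```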
Example address samples for determining whether the cell name is missing:
If there is a village, gou (ditch) or tun name, but no cell name and no organization name, the address is judged to be a rural address, so the cell name is not considered missing, and the address is a first address sample, i.e., a high-quality sample. Example: the animal's ditch, Jinlong Town, Jintang County, Sichuan.
If there is no village, gou, tun or other village-level name, no organization name and no cell name, but there is a street name, the address is judged to lack a cell name and is a second address sample, i.e., a low-quality sample. Example: Qianquan Street, Zhou, Jining City, Shandong Province.
In some embodiments of the present application, after the step of inputting the address to be detected into the entity recognition model and outputting the fine-grained address corresponding to the address to be detected, the method further includes:
judging the address type of the fine-grained address, and determining the quality type of the fine-grained address to be the first quality type when the address type indicates the organization type.
Specifically, it is first judged whether the fine-grained address lacks a third entity address. If it does not, the third entity address is looked up in an organization address database, and the address type of the fine-grained address is determined from the lookup result. If the lookup result shows that the address type is the organization type, the fine-grained address is directly determined to be of the first quality type, i.e., a high-quality address.
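The organization shortcut can be sketched as a lookup performed before sample matching. The organization address database is modeled as a hypothetical in-memory dictionary, and returning `None` means the address is left for the sample-matching step.

```python
from typing import Dict, Optional


def organization_shortcut(third_entity: Optional[str],
                          org_db: Dict[str, str]) -> Optional[str]:
    """Return 'first' if the third entity address is a known organization, else None."""
    if not third_entity:
        return None  # third entity address missing: fall through to sample matching
    return "first" if third_entity in org_db else None


org_db = {"Central Primary School of Duze Town": "school"}  # hypothetical database content
print(organization_shortcut("Central Primary School of Duze Town", org_db))  # first
print(organization_shortcut(None, org_db))  # None
```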
If there is an organization name but no cell name, the address is judged to be an organization address (school, restaurant, company, building, industrial park, etc.), which is a complete address; the cell name is not considered missing, and the address is a first address sample, i.e., a high-quality sample. Example: the Central Primary School of Duze Town, Zhejiang Province.
If there is no village name, no organization name and no cell, but there is an "xx Street/Lane No. xx" component, the cell name is judged not to be missing, and the address is a first address sample, i.e., a high-quality sample. Example: No. 18 Commercial Street, Paoche Street, Xuzhou City, Jiangsu Province.
If there is a community name but no cell name, and the address contains building + unit, building + house number, or unit + house number, the cell name is judged not to be missing, and the address is a first address sample, i.e., a high-quality sample. Example: Building 2, Unit 907, No. 8, Baolong International Community, Hongmen Town, Xinxiang City, Henan Province.
If there is a community name but no cell name and no building, unit or house number, the address is judged to lack a cell name and is a second address sample, i.e., a low-quality sample. Example: Mengshahe Community, Jiushan Street, Jimo District, Qingdao City, Shandong Province.
If there is no cell and only a "street office" (subdistrict office), the cell name is judged not to be missing, and the address is a first address sample, i.e., a high-quality sample. Example: a street-office address in Jiangxi Province.
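The cell-name cases above can be collected into one rule function, again over a hypothetical level dictionary. The order in which the cases are checked is an assumption (the text lists them independently); `road_number` stands for an "xx Street No. xx" component, while `street` stands for a subdistrict name.

```python
from typing import Dict


def cell_name_tier(levels: Dict[str, str]) -> str:
    """Cell-name check: 'first' = cell name not missing, 'second' = cell name missing."""
    has = {key for key, value in levels.items() if value}
    if "cell" in has or "organization" in has:
        return "first"   # a cell, or an organization address (school, company, ...)
    if has & {"village", "gou", "tun"}:
        return "first"   # rural address: no cell expected
    if "road_number" in has:
        return "first"   # 'xx Street No. xx'
    if "community" in has:
        # building+unit, building+house number or unit+house number
        details = has & {"building", "unit", "house_number"}
        return "first" if len(details) >= 2 else "second"
    if "street_office" in has:
        return "first"   # only a street (subdistrict) office
    return "second"      # e.g. only a street (subdistrict) name


print(cell_name_tier({"street": "Jiushan Street", "community": "Mengshahe Community"}))  # second
```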
Example address samples for determining whether the building name is missing:
If there is a cell that is an "xx Apartment" with a house number but no building name, the building name is judged not to be missing, and the address is a first address sample, i.e., a high-quality sample. Example: Unit 3, No. 602, Horse Collar Apartment, College Road Extension, Yumin Street, Hulan District, Harbin City.
If there is a cell name that is not an "xx Apartment", and there is no building name, the address is judged to lack a building name and is a second address sample, i.e., a low-quality sample. Example: Unit 6, Hua Luyu New City cell, Dingtao District, Heze City, Shandong Province.
Example address sample for determining whether the house number is missing:
If there is a cell name but no house number, the address is judged to lack a house number and is a second address sample, i.e., a low-quality sample. Example: Unit 6, Hua Luyu New City cell, Dingtao District, Heze City, Shandong Province.
Because the address samples are fine-grained, the quality type of the address to be detected can be judged accurately.
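A sketch of the building-name and house-number checks. Detecting an "xx Apartment" cell by its English suffix is a stand-in assumption (a Chinese address would be checked for its own suffix), and the level keys are hypothetical.

```python
from typing import Dict


def building_name_tier(levels: Dict[str, str]) -> str:
    """Building-name check: 'first' = not missing, 'second' = missing."""
    cell = levels.get("cell", "")
    if not cell or levels.get("building"):
        return "first"   # assumption: only cell addresses without a building are checked
    if cell.endswith("Apartment") and levels.get("house_number"):
        return "first"   # an apartment with a house number needs no building name
    return "second"


def house_number_tier(levels: Dict[str, str]) -> str:
    """House-number check: a cell address without a house number is low quality."""
    if levels.get("cell") and not levels.get("house_number"):
        return "second"
    return "first"


heze = {"district": "Dingtao District", "cell": "Hua Luyu New City", "unit": "6"}
print(building_name_tier(heze), house_number_tier(heze))  # second second
```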
In step S400 of some embodiments, a quality type of the address to be detected is determined according to the matching result.
It can be understood that, after step S300 (matching the fine-grained address against the preset address samples to obtain the address matching result) is performed, this step may specifically be implemented as follows:
The quality type of the address to be detected is determined to be the first quality type according to the first address matching result. In that case, the location the address refers to can be found quickly and accurately from this high-quality address.
The quality type of the address to be detected is determined to be the second quality type according to the second address matching result. In that case, the location the address refers to cannot be determined accurately from this low-quality address, and the address needs to be collected again.
The quality corresponding to the first quality type is higher than the quality corresponding to the second quality type.
Further, when the quality type of the address to be detected is determined to be the second quality type, the sample type of the second address sample corresponding to the second address matching result is obtained.
The quality defect that causes the address to be detected to be of the second quality type is then determined from the sample type of that second address sample.
For example, the address to be detected is Group 5, Liji Village, Zhoukou City, Henan Province. This is a low-quality address, and its quality defects are a missing county name and a missing house number.
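The mapping from matching results to a quality type, and from matched low-quality samples to defect reasons, can be sketched as follows. The sample-type identifiers and reason strings are hypothetical; the example reproduces the two defects stated for the Henan address above.

```python
from typing import Dict, List

# Hypothetical sample types of second (low-quality) address samples.
DEFECT_REASONS: Dict[str, str] = {
    "missing_city": "city name missing",
    "missing_county": "county name missing",
    "missing_village": "village name missing",
    "missing_cell": "cell name missing",
    "missing_building": "building name missing",
    "missing_house_number": "house number missing",
}


def quality_report(second_sample_types: List[str]) -> Dict[str, object]:
    """No second-sample match -> first quality type; otherwise second type plus reasons."""
    if not second_sample_types:
        return {"quality": "first", "reasons": []}
    reasons = [DEFECT_REASONS.get(t, t) for t in second_sample_types]  # fall back to raw type
    return {"quality": "second", "reasons": reasons}


print(quality_report(["missing_county", "missing_house_number"]))
# {'quality': 'second', 'reasons': ['county name missing', 'house number missing']}
```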
Fig. 3 is a schematic diagram of the structure of the address quality determining device based on a knowledge graph provided by the present application. The address to be detected is passed from the input module to the entity recognition model module for recognition: the dictionary module recognizes the address to be detected to obtain a first entity address, the pre-trained language model module recognizes it to obtain a second entity address, and the regular matching module recognizes it to obtain a third entity address. The output module fuses the first, second and third entity addresses according to the address ordering sequence to obtain a fine-grained address. The sample matching module then matches the fine-grained address against the address samples, so that the quality type of the address to be detected can be determined directly and accurately.
The present application provides a method, a device and a storage medium for determining address quality based on a knowledge graph: a preset address to be detected is obtained; the address to be detected is input into an entity identification model, which outputs the corresponding fine-grained address; the fine-grained address is matched against preset address samples to obtain an address matching result; and the quality type of the address to be detected is determined according to the matching result, where the entity identification model is obtained by address structure analysis training on a plurality of address samples. By parsing the address to be detected, the fine-grained address is obtained efficiently, and the quality type of the address is determined accurately from the result of matching the fine-grained address against the address samples.
The address quality determining apparatus based on a knowledge graph provided by the present application is described below; the apparatus described below and the method described above may be cross-referenced with each other.
Fig. 4 is a second schematic structural diagram of an address quality determining device based on a knowledge graph according to the present application, where the address quality determining device based on a knowledge graph includes:
an obtaining module 410, configured to obtain a preset address to be detected;
the identification module 420 is configured to input the address to be detected into an entity identification model, and output a fine-grained address corresponding to the address to be detected;
the matching module 430 is configured to perform matching processing on the fine-grained address and a preset address sample, so as to obtain an address matching result;
a determining module 440, configured to determine a quality type of the address to be detected according to the matching result; the entity identification model is obtained by performing address structural analysis training on a plurality of address samples.
Preferably, according to the address quality determining device based on a knowledge graph provided by the present application, the entity identification model at least includes: dictionary module, pre-training language model module, regular matching module;
The identifying module 420 is specifically configured to perform structural analysis on the address to be detected by using a suffix tree dictionary structure of the dictionary module, so as to obtain a first entity address;
performing address extraction processing on the address to be detected by utilizing the pre-training language model module and a preset address knowledge graph to obtain a second entity address;
performing address extraction processing on the address to be detected by using a rule engine of the regular matching module to obtain a third entity address;
and carrying out address fusion processing on the first entity address, the second entity address and the third entity address to obtain the fine-grained address.
Preferably, according to the knowledge-graph-based address quality determining apparatus provided by the present application, the address extraction model of the pre-training language model module used by the identification module 420 includes an input layer, an encoding layer and an output layer, wherein
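The dictionary and regular-matching modules can be illustrated with a simplified sketch. Instead of a full suffix tree, it stores a small, hypothetical dictionary of administrative suffixes in a reversed trie and cuts the address wherever the longest suffix ends (first entity address); a regular expression applied to the unrecognized remainder then picks out an organization name (third entity address). The suffix list, organization keywords and example strings are assumptions, and a name that itself contains a suffix character would be split incorrectly.

```python
import re
from typing import Dict, List, Optional, Tuple

END = "<end>"  # multi-character key, so it can never collide with an address character

# Hypothetical suffix dictionary: administrative suffix -> address level.
SUFFIXES: Dict[str, str] = {
    "省": "province", "市": "city", "区": "district", "县": "county",
    "镇": "town", "街道": "street", "村": "village", "组": "group",
    "路": "road", "号": "number",
}
# Hypothetical rule for organization names (school, company, building, park).
ORG_RE = re.compile(r".*?(?:小学|中学|大学|公司|大厦|产业园)")


def build_reverse_trie(suffixes: Dict[str, str]) -> dict:
    """Insert every suffix reversed so it can be matched walking backwards."""
    root: dict = {}
    for word, level in suffixes.items():
        node = root
        for ch in reversed(word):
            node = node.setdefault(ch, {})
        node[END] = level  # end marker storing the level
    return root


def segment(address: str, trie: dict) -> List[Tuple[str, str]]:
    """Cut the address after each position where a dictionary suffix ends."""
    parts: List[Tuple[str, str]] = []
    start = 0
    for end in range(len(address)):
        node, level, j = trie, None, end
        while j >= start and address[j] in node:
            node = node[address[j]]
            level = node.get(END, level)  # keep the longest match found so far
            j -= 1
        if level:
            parts.append((address[start:end + 1], level))
            start = end + 1
    if start < len(address):
        parts.append((address[start:], "unknown"))
    return parts


def third_entity(parts: List[Tuple[str, str]]) -> Optional[str]:
    """Apply the organization rule to the unrecognized remainder."""
    for text, level in parts:
        if level == "unknown" and ORG_RE.fullmatch(text):
            return text
    return None


trie = build_reverse_trie(SUFFIXES)
print(segment("河南省周口市扶沟县韭园镇李集村5组13号", trie))
print(third_entity(segment("浙江省衢州市衢江区杜泽镇中心小学", trie)))  # 中心小学
```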
constructing, by using the input layer, a representation sequence for each of the head entity, head entity type, entity relationship, tail entity and tail entity type of the input address to be detected, and splicing these representation sequences into an input sequence;
encoding the input sequence by using the coding layer, extracting semantic features of different layers from the encoded input sequence, and splicing the semantic features of the different layers;
calculating, by using the output layer, the prediction probability of the spliced semantic features, and outputting each output entity address whose prediction probability is greater than a preset probability threshold;
and carrying out address extraction processing based on the output entity address and the knowledge graph to obtain a second entity address.
Preferably, according to the address quality determining device based on a knowledge graph provided by the present application, the identification module 420 is specifically configured such that the encoding layer encodes the input sequence with a bidirectional Transformer encoder and uses a multi-head attention mechanism to splice the semantic features extracted from different layers of the encoded input sequence;
the coding layer further comprises an input embedding layer and a position embedding layer, and the generating step of the input sequence comprises the following steps:
mapping an input address to be detected into an input vector by utilizing the input embedding layer;
and constructing a position vector of the address to be detected by utilizing the position embedding layer, and splicing the input vector and the position vector together to form an input representation of the input sequence.
Preferably, according to the address quality determining device based on the knowledge graph provided by the application, the address sample at least comprises: a first address sample, a second address sample, wherein the sample quality of the first address sample is higher than the sample quality of the second address sample;
the address matching result at least comprises: a first address matching result and a second address matching result;
a matching module 430, configured to obtain the first address matching result when the fine-grained address and the first address sample are matched;
and under the condition that the fine-grained address is matched with the second address sample, obtaining a second address matching result.
Preferably, according to the address quality determining device based on a knowledge graph provided by the present application, a determining module 440 is configured to determine, according to the first address matching result, that a quality type of the address to be detected is a first quality type;
and determining the quality type of the address to be detected as a second quality type according to the second address matching result, wherein the quality corresponding to the first quality type is higher than the quality corresponding to the second quality type.
Preferably, according to the address quality determining device based on a knowledge graph provided by the present application, when the quality type of the address to be detected is determined to be the second quality type, the sample type of the second address sample corresponding to the second address matching result is obtained;
and the quality defect that causes the address to be detected to be of the second quality type is determined according to the sample type of the second address sample.
Preferably, according to the address quality determining device based on a knowledge graph provided by the present application, after the address to be detected is input into the entity identification model and the corresponding fine-grained address is output, the device is further configured to determine the address type of the fine-grained address, and to determine the quality type of the fine-grained address to be the first quality type if the address type indicates the organization type.
Fig. 5 illustrates a physical schematic diagram of an electronic device. As shown in Fig. 5, the electronic device may include a processor 510, a communication interface (Communications Interface) 520, a memory 530 and a communication bus 540, where the processor 510, the communication interface 520 and the memory 530 communicate with one another through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a knowledge-graph-based address quality determining method comprising: acquiring a preset address to be detected; inputting the address to be detected into an entity identification model and outputting a fine-grained address corresponding to the address to be detected; matching the fine-grained address with a preset address sample to obtain an address matching result; and determining the quality type of the address to be detected according to the matching result, where the entity identification model is obtained by address structure analysis training on a plurality of address samples.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
In another aspect, the present application further provides a computer program product including a computer program that can be stored on a computer-readable storage medium; when the computer program is executed by a processor, the computer can perform the knowledge-graph-based address quality determining method provided above, the method including: acquiring a preset address to be detected; inputting the address to be detected into an entity identification model and outputting a fine-grained address corresponding to the address to be detected; matching the fine-grained address with a preset address sample to obtain an address matching result; and determining the quality type of the address to be detected according to the matching result, where the entity identification model is obtained by address structure analysis training on a plurality of address samples.
In still another aspect, the present application further provides a computer-readable storage medium including a stored program, where the program, when run, performs the knowledge-graph-based address quality determining method provided above, the method including: acquiring a preset address to be detected; inputting the address to be detected into an entity identification model and outputting a fine-grained address corresponding to the address to be detected; matching the fine-grained address with a preset address sample to obtain an address matching result; and determining the quality type of the address to be detected according to the matching result, where the entity identification model is obtained by address structure analysis training on a plurality of address samples.
The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. An address quality determining method based on a knowledge graph, characterized by comprising the following steps:
acquiring a preset address to be detected;
inputting the address to be detected into an entity identification model, and outputting a fine-grained address corresponding to the address to be detected;
matching the fine-grained address with a preset address sample to obtain an address matching result;
determining the quality type of the address to be detected according to the matching result;
the entity identification model is obtained by performing address structural analysis training on a plurality of address samples.
2. The method for determining the quality of an address based on a knowledge-graph according to claim 1, wherein,
The entity recognition model at least comprises: dictionary module, pre-training language model module, regular matching module;
the inputting the address to be detected into an entity identification model and outputting a fine-grained address corresponding to the address to be detected comprises:
performing structural analysis on the address to be detected by utilizing a suffix tree dictionary structure of the dictionary module to obtain a first entity address;
performing address extraction processing on the address to be detected by utilizing the pre-training language model module and a preset address knowledge graph to obtain a second entity address;
performing address extraction processing on the address to be detected by using a rule engine of the regular matching module to obtain a third entity address;
and carrying out address fusion processing on the first entity address, the second entity address and the third entity address to obtain the fine-grained address.
3. The method for determining the quality of an address based on a knowledge-graph according to claim 2, wherein,
the address extraction model of the pre-training language model module comprises an input layer, a coding layer and an output layer;
the address extraction processing is performed on the address to be detected by using the pre-training language model module and a preset address knowledge graph to obtain a second entity address, including:
constructing, by using the input layer, a representation sequence for each of the head entity, head entity type, entity relationship, tail entity and tail entity type of the input address to be detected; and splicing these representation sequences into an input sequence;
coding the input sequence by utilizing the coding layer, extracting semantic features of different layers of the coded input sequence, and splicing the semantic features of different layers;
calculating the prediction probability of the spliced semantic features by using the output layer, and outputting an output entity address with the prediction probability larger than a preset probability threshold;
and carrying out address extraction processing based on the output entity address and the knowledge graph to obtain a second entity address.
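The input-layer and output-layer steps of claim 3 can be sketched without the encoder itself. The sketch assumes BERT-style `[CLS]`/`[SEP]` separators between the five spliced segments, character-level tokens for entity text, single tokens for type and relation labels, and a knowledge graph stored as a set of (head, relation, tail) triples; the final step keeps only thresholded entities that are nodes of the graph. All of these choices and names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical special tokens in the style of BERT-like encoders; the claims do not name them.
CLS, SEP = "[CLS]", "[SEP]"
Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def build_input_sequence(head: str, head_type: str, relation: str,
                         tail: str, tail_type: str) -> List[str]:
    """Splice the five representation sequences into one token sequence.

    Entity texts are split into characters (common for Chinese text);
    type and relation labels are kept as single tokens.
    """
    segments = [list(head), [head_type], [relation], list(tail), [tail_type]]
    sequence = [CLS]
    for segment in segments:
        sequence.extend(segment)
        sequence.append(SEP)
    return sequence


def select_by_threshold(scored: Iterable[Tuple[str, float]], threshold: float) -> List[str]:
    """Keep output entity addresses whose predicted probability exceeds the threshold."""
    return [text for text, prob in scored if prob > threshold]


def filter_by_graph(entities: List[str], graph: Set[Triple]) -> List[str]:
    """Keep entities that appear as a head or tail node in the knowledge graph."""
    nodes = {h for h, _, _ in graph} | {t for _, _, t in graph}
    return [e for e in entities if e in nodes]
```

For example, `select_by_threshold([("崂山区", 0.92), ("海尔", 0.31)], 0.5)` keeps only `"崂山区"`, which `filter_by_graph` then retains if the graph contains it as a node.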
4. The method for determining address quality based on a knowledge graph according to claim 3, wherein:
encoding the input sequence, extracting semantic features from different layers of the encoded input sequence, and splicing the semantic features of the different layers comprises:
encoding the input sequence with a bidirectional Transformer encoder, and splicing the semantic features extracted from different layers of the encoded input sequence by means of a multi-head attention mechanism;
the encoding layer further comprises an input embedding layer and a position embedding layer, and generating the input sequence comprises:
mapping the input address to be detected into an input vector using the input embedding layer;
and constructing a position vector of the address to be detected using the position embedding layer, and splicing the input vector and the position vector together to form the input representation of the input sequence.
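The embedding and attention steps of claim 4 can be illustrated in plain Python. This sketch follows the claim's wording and concatenates ("splices") each token's input vector with its position vector; note that BERT-style encoders usually add the two instead. The sinusoidal position scheme, the toy embedding table, and the identity Q/K/V projections (real layers learn these) are simplifying assumptions, not details from the source.

```python
import math
from typing import Dict, List

Vector = List[float]


def position_vector(pos: int, dim: int) -> Vector:
    """Sinusoidal position vector (one common choice; the claims do not fix the scheme)."""
    return [
        math.sin(pos / 10000 ** (2 * (i // 2) / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** (2 * (i // 2) / dim))
        for i in range(dim)
    ]


def input_representation(tokens: List[str], table: Dict[str, Vector], pos_dim: int) -> List[Vector]:
    """Concatenate each token's input vector with its position vector."""
    return [table[tok] + position_vector(p, pos_dim) for p, tok in enumerate(tokens)]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x: List[Vector], num_heads: int) -> List[Vector]:
    """Scaled dot-product self-attention per head, with identity projections for brevity.

    Each head attends over its own slice of the feature dimensions; the head outputs
    are concatenated, so the output width equals the input width.
    """
    dim = len(x[0])
    if dim % num_heads:
        raise ValueError("feature width must be divisible by num_heads")
    d = dim // num_heads
    out: List[Vector] = [[] for _ in x]
    for h in range(num_heads):
        part = [row[h * d:(h + 1) * d] for row in x]
        for i, q in enumerate(part):
            weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in part])
            out[i].extend(sum(w * v[c] for w, v in zip(weights, part)) for c in range(d))
    return out
```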
5. The method for determining address quality based on a knowledge graph according to claim 1, wherein:
the address samples comprise at least a first address sample and a second address sample, the sample quality of the first address sample being higher than the sample quality of the second address sample;
the address matching result comprises at least a first address matching result and a second address matching result;
performing matching processing on the fine-grained address and the preset address samples to obtain the address matching result comprises:
when the fine-grained address matches the first address sample, obtaining the first address matching result;
and when the fine-grained address matches the second address sample, obtaining the second address matching result.
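The matching step of claim 5 can be sketched under one assumed notion of "matching": each address sample is a structural signature (the ordered tuple of hierarchy levels an address contains), and the fine-grained address matches a sample when its own signature is identical. The higher-quality first samples are checked before the second samples. The level hierarchy and sample sets are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Address = Dict[str, str]     # level -> text (hypothetical representation)
Signature = Tuple[str, ...]  # ordered levels present in an address

LEVEL_ORDER = ("province", "city", "district", "road", "number")  # hypothetical hierarchy


def signature(address: Address) -> Signature:
    """Structural signature: which hierarchy levels the address contains, in canonical order."""
    return tuple(level for level in LEVEL_ORDER if address.get(level))


def match_address(address: Address,
                  first_samples: List[Signature],
                  second_samples: List[Signature]) -> Optional[Tuple[str, Signature]]:
    """Return ("first", sample) or ("second", sample), checking the higher-quality set first."""
    sig = signature(address)
    for name, samples in (("first", first_samples), ("second", second_samples)):
        for sample in samples:
            if sample == sig:
                return name, sample
    return None  # the claims do not say what happens when no sample matches
```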
6. The method for determining address quality based on a knowledge graph according to claim 5, wherein:
determining the quality type of the address to be detected according to the address matching result comprises:
determining, according to the first address matching result, that the quality type of the address to be detected is a first quality type;
determining, according to the second address matching result, that the quality type of the address to be detected is a second quality type;
wherein the quality corresponding to the first quality type is higher than the quality corresponding to the second quality type.
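The mapping in claim 6 from matching result to quality type is small, but encoding the quality types as an `IntEnum` makes the stated ordering (first quality type higher than second) directly comparable. The result labels `"first"`/`"second"` and returning `None` for an unmatched address are assumptions of this sketch.

```python
from enum import IntEnum
from typing import Optional


class QualityType(IntEnum):
    """A larger value means higher quality, so FIRST > SECOND as claim 6 requires."""
    SECOND = 1
    FIRST = 2


def quality_type(match_result: Optional[str]) -> Optional[QualityType]:
    """Map an address matching result ("first"/"second") to a quality type; None if unmatched."""
    return {"first": QualityType.FIRST, "second": QualityType.SECOND}.get(match_result)
```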
7. The method for determining address quality based on a knowledge graph according to claim 6, wherein:
after the step of determining, according to the second address matching result, that the quality type of the address to be detected is the second quality type, the method further comprises:
when the quality type of the address to be detected is the second quality type, acquiring the sample type of the second address sample corresponding to the second address matching result;
and determining, according to the sample type of the second address sample, the quality defect reason why the address to be detected is of the second quality type.
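Claim 7 derives a defect reason from the sample type of the matched second address sample. A minimal sketch, assuming each second sample is a structural signature tagged with a sample type and that sample types map to human-readable reasons; both tables are hypothetical examples, not data from the source.

```python
from typing import Dict, Optional, Tuple

Signature = Tuple[str, ...]

# Hypothetical second (lower-quality) address samples, each tagged with a sample type.
SECOND_SAMPLE_TYPES: Dict[Signature, str] = {
    ("province", "city", "district", "road"): "missing_house_number",
    ("province", "city", "road", "number"): "missing_district",
}

# Hypothetical mapping from sample type to a human-readable defect reason.
DEFECT_REASONS: Dict[str, str] = {
    "missing_house_number": "address lacks a house number",
    "missing_district": "address lacks a district",
}


def defect_reason(quality: str, matched_sample: Signature) -> Optional[str]:
    """Only second-quality-type addresses get a defect reason, looked up via the sample type."""
    if quality != "second":
        return None
    sample_type = SECOND_SAMPLE_TYPES.get(matched_sample)
    return DEFECT_REASONS.get(sample_type) if sample_type else None
```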
8. The method for determining address quality based on a knowledge graph according to claim 6, wherein:
after the step of inputting the address to be detected into the entity recognition model and outputting the fine-grained address corresponding to the address to be detected, the method further comprises:
determining the address type of the fine-grained address, and, when the address type indicates an organization type, determining that the quality type of the fine-grained address is the first quality type.
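Claim 8 lets organization addresses skip sample matching and receive the first quality type directly. The claims do not say how the address type is judged; this sketch assumes a hypothetical keyword test on the suffixes of the fine-grained address components.

```python
from typing import Dict, Optional

# Hypothetical keywords used to decide that a fine-grained address names an organization.
ORGANIZATION_SUFFIXES = ("公司", "大学", "医院", "学校", "银行")


def address_type(fine_grained: Dict[str, str]) -> str:
    """Classify as 'organization' if any component ends with an organization keyword."""
    for value in fine_grained.values():
        if value.endswith(ORGANIZATION_SUFFIXES):
            return "organization"
    return "other"


def shortcut_quality(fine_grained: Dict[str, str]) -> Optional[str]:
    """Return 'first' for organization addresses; None means 'continue with sample matching'."""
    return "first" if address_type(fine_grained) == "organization" else None
```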
9. An address quality determining device based on a knowledge graph, comprising:
an acquisition module, configured to acquire a preset address to be detected;
a recognition module, configured to input the address to be detected into an entity recognition model and output a fine-grained address corresponding to the address to be detected;
a matching module, configured to perform matching processing on the fine-grained address and preset address samples to obtain an address matching result;
and a determining module, configured to determine the quality type of the address to be detected according to the address matching result; wherein the entity recognition model is obtained by address structural parsing training on a plurality of address samples.
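The four modules of the claim 9 device can be wired together as injected callables. This is a structural sketch only: the class name and field names are hypothetical, and each callable stands for whatever implementation of that module is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Address = Dict[str, str]


@dataclass
class AddressQualityDevice:
    """Wires the four modules of claim 9 together; each module is an injected callable."""
    acquire: Callable[[], str]                           # acquisition module
    recognize: Callable[[str], Address]                  # recognition module (entity model)
    match: Callable[[Address], Optional[str]]            # matching module
    determine: Callable[[Optional[str]], Optional[str]]  # determining module

    def run(self) -> Optional[str]:
        raw = self.acquire()
        fine_grained = self.recognize(raw)
        result = self.match(fine_grained)
        return self.determine(result)
```

Injecting the modules keeps the pipeline order fixed while letting each stage (for example, the entity recognition model) be swapped independently.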
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program, when run, performs the method according to any one of claims 1 to 8.
CN202311154182.3A 2023-09-07 2023-09-07 Address quality determining method and device based on knowledge graph and storage medium Pending CN117131860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311154182.3A CN117131860A (en) 2023-09-07 2023-09-07 Address quality determining method and device based on knowledge graph and storage medium

Publications (1)

Publication Number Publication Date
CN117131860A 2023-11-28

Family

ID=88859830

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination