CN116992880A - Building name identification method, device, electronic equipment and storage medium - Google Patents

Publication number
CN116992880A
Authority
CN
China
Prior art keywords
enterprise
building
name
identifying
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310838486.5A
Other languages
Chinese (zh)
Inventor
李真真
宋保国
张勇
赵济朋
司晨雨
张孝
杨羽飞
张愿
张高恒
韩盈盈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Online Services Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Online Services Co Ltd
Application filed by China Mobile Communications Group Co Ltd, China Mobile Online Services Co Ltd
Priority to CN202310838486.5A
Publication of CN116992880A

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention provides a method and a device for identifying a building name, an electronic device and a storage medium, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring enterprise data of an enterprise to be queried; and identifying, from the enterprise data and based on a preset multi-level algorithm, the target name of the building to which the enterprise to be queried belongs, wherein the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm. By identifying the building name with a multi-level algorithm that comprises the BERT-CRF algorithm and the electronic fence algorithm, the invention avoids misidentifying names that contain orientation words and covers data whose registration address alone does not allow identification, so that the range of identifiable data is more comprehensive, the matching is more accurate, and the accuracy of building name identification is improved.

Description

Building name identification method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for identifying a building name, an electronic device, and a storage medium.
Background
With the number of operating enterprises continuing to grow, and most of them undergoing digital transformation, identifying the names of the buildings in which government and enterprise customers are concentrated, so as to help account managers carry out precision marketing, has become a new way of expanding the market.
At present, building names in the government and enterprise market are generally identified by regular-expression matching. However, regular-expression matching cannot accurately identify words that have multiple meanings, and it cannot identify a building name from data whose registration address contains no building name or cannot be matched, so the accuracy of building name identification is low.
Disclosure of Invention
The invention provides a method, a device, an electronic device and a storage medium for identifying a building name, which are used for overcoming the defects of the prior art that words with multiple meanings cannot be identified and that building names cannot be identified from data whose registration address cannot be matched, both of which reduce the accuracy of building name identification.
The invention provides a method for identifying a building name, which comprises the following steps:
acquiring enterprise data of an enterprise to be queried;
identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm;
the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
According to the method for identifying a building name provided by the invention, the identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on the preset multi-level algorithm comprises the following steps:
Converting the registration address in the enterprise data into sentence vector based on the BERT model in the BERT-CRF algorithm;
decoding the sentence vector based on a CRF model in the BERT-CRF algorithm, and determining a labeling sequence of the sentence vector;
identifying a target sequence representing the entity name from the labeling sequence;
and identifying a target sentence vector corresponding to the target sequence, and determining the target name of the building to which the enterprise to be queried belongs based on the target sentence vector.
According to the method for identifying the building name provided by the invention, the identifying of the target sentence vector corresponding to the target sequence and the determining of the target name of the building to which the enterprise to be queried belongs based on the target sentence vector comprise the following steps:
identifying a target sentence vector corresponding to the target sequence;
if the target sentence vector does not include the target name of the building to which the enterprise to be queried belongs, dictionary matching is performed on the target sentence vector based on a preset corpus, and the target name of the building to which the enterprise to be queried belongs is determined.
According to the method for identifying a building name provided by the invention, the target name of the building to which the enterprise to be queried belongs is identified from the enterprise data based on the preset multi-level algorithm, and the method further comprises the following steps:
If the target name of the building to which the enterprise to be queried belongs cannot be identified based on the BERT-CRF algorithm and the registration address in the enterprise data, the longitude and latitude information of the enterprise to be queried is screened out from the enterprise data;
and identifying the target name of the building to which the enterprise to be queried belongs based on the electronic fence algorithm and the longitude and latitude information.
According to the method for identifying a building name provided by the invention, the identifying the target name of the building to which the enterprise to be queried belongs based on the electronic fence algorithm and the longitude and latitude information comprises the following steps:
determining fences to be tested from a preset electronic fence table library based on the longitude and latitude information and a preset query range;
determining the longitude-latitude point of the enterprise to be queried based on the longitude and latitude information;
taking the longitude-latitude point as an endpoint, drawing one virtual ray in each of two opposite directions, and determining the number of intersection points between each virtual ray and each fence to be tested;
and determining a target fence corresponding to the enterprise to be queried based on the number of intersection points, and determining the building name corresponding to the target fence.
According to the method for identifying a building name provided by the invention, before the registration address in the enterprise data is converted into a sentence vector based on the BERT model in the BERT-CRF algorithm, the method further comprises the following steps:
acquiring sample data labeled with complete entity names;
training a preset model to be trained based on the sample data to obtain a preliminary BERT-CRF model, and performing pre-identification based on the BERT model to obtain a training name;
if the training name differs from the labeled entity name, adjusting the labels of the sample data based on the difference and training the preliminary BERT-CRF model again, until the training name no longer differs from the entity name, whereupon the BERT-CRF algorithm is determined;
wherein the model to be trained is obtained by stacking a BERT model framework and a CRF model framework.
According to the method for identifying a building name provided by the invention, before the registration address in the enterprise data is converted into a sentence vector based on the BERT model in the BERT-CRF algorithm, the method further comprises the following steps:
if an entity name is screened out from the registration address of the enterprise data, identifying the target name of the building to which the enterprise to be queried belongs from a preset corpus based on the entity name;
if no entity name is screened out from the registration address, or the target name of the building to which the enterprise to be queried belongs cannot be identified based on the entity name, identifying the target name of the building to which the enterprise to be queried belongs by using the BERT-CRF algorithm.
The invention also provides a device for identifying the name of the building, which comprises:
the acquisition module is used for acquiring enterprise data of an enterprise to be queried;
the first identification module is used for identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm;
the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of identifying a building name as described in any of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of identifying a building name as described in any of the above.
Compared with the prior art, in which words with multiple meanings cannot be identified and building names cannot be identified from data whose registration address cannot be matched, so that the accuracy of building name identification is reduced, the method, device, electronic device and storage medium for identifying a building name provided by the application acquire enterprise data of an enterprise to be queried, and identify the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm, wherein the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm. In the application, after the data of the enterprise to be queried is acquired, the building name of the enterprise to be queried is identified from the enterprise data by applying the BERT-CRF algorithm and the electronic fence algorithm in sequence. The BERT-CRF (Bidirectional Encoder Representations from Transformers - Conditional Random Field) algorithm analyzes the semantic relations among words and performs sequence labeling on the enterprise data, and the electronic fence algorithm handles the data that the BERT-CRF algorithm cannot resolve, so that data coverage is more complete and matching is more accurate. In other words, identifying the building name with a multi-level algorithm that comprises the BERT-CRF algorithm and the electronic fence algorithm avoids misidentifying names that contain orientation words and overcomes the limitation of identification that relies only on the registration address, so that the range of identifiable data is more comprehensive, the matching is more accurate, and the accuracy of building name identification is improved.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for identifying building names according to the present invention;
FIG. 2 is a second flow chart of the method for identifying building names according to the present invention;
FIG. 3 is a schematic workflow diagram of a method for identifying building names according to the present invention;
FIG. 4 is a schematic diagram of a BERT-CRF algorithm recognition process in the building name recognition method provided by the invention;
FIG. 5 is a schematic diagram of the construction process of an electronic fence list library in the identification method of building names provided by the invention;
FIG. 6 is a schematic diagram of a BERT-CRF algorithm training flow in the method for identifying building names provided by the invention;
FIG. 7 is a schematic diagram illustrating a ray method for identifying a building name according to the present invention;
FIG. 8 is a schematic structural diagram of a device for identifying a building name according to the present invention;
FIG. 9 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
An embodiment of the present application provides a method for identifying a building name, in a first embodiment of the method for identifying a building name of the present application, referring to fig. 1, the method for identifying a building name includes:
step S10, obtaining enterprise data of an enterprise to be queried;
step S20, identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm;
the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
The present embodiment aims at: the specific meaning of the ambiguous words in the enterprise data is accurately identified, and when the target name of the building to which the enterprise to be queried belongs cannot be identified based on the registration address, the range of the identification data can be expanded, so that the accuracy of identifying the building name is improved.
In this embodiment, it should be noted that the method for identifying a building name may be applied to a building name identification device, the device belonging to building name identification equipment, and the equipment belonging to a building name identification system.
The building name recognition mainly refers to recognition of entity names, namely, recognition of entities with specific meanings in texts, wherein the entities can be person names, place names, organization names, proper nouns and the like, and the method is not limited in detail.
The BERT-CRF algorithm is formed by stacking a BERT model and a CRF model.
The BERT model considers the context on both sides of each word, so that it more accurately identifies the meaning of an orientation word or an ambiguous word in the registered address.
The CRF model labels the enterprise data as a sequence, using the semantic information to model the relations between the states in the sequence.
In this embodiment, referring to FIG. 3, the multi-level algorithm further includes a dictionary matching method. After the enterprise data (i.e., the business enterprise data information) is received, the dictionary matching method is first used to identify the target name of the building to which the enterprise to be queried belongs. If the dictionary matching method cannot identify the target name, the BERT-CRF algorithm is used. If the BERT-CRF algorithm cannot accurately identify the target name, the electronic fence algorithm is applied to the result obtained by the BERT-CRF algorithm, and the target name of the building to which the enterprise to be queried belongs is finally obtained. That is, the enterprise data is processed by the multi-level algorithm level by level until the target name of the building to which the enterprise to be queried belongs is identified from the enterprise data, so as to improve the accuracy of the identification.
It should be noted that the dictionary matching method operates directly on the registered address in the enterprise data and screens the entity name out of it directly, which avoids further analysis of the registered address and the city-level address and improves the efficiency of building name identification.
In this embodiment, when the building name of the enterprise to be queried is identified with the BERT-CRF algorithm, the entity name in the registration address is identified. When the registration address is incomplete, the BERT-CRF algorithm can predict the building name of the enterprise to be queried; after the prediction, the dictionary matching method is used again to complete or check the predicted building name and to determine which part of the prediction is accurate, so as to improve the accuracy of the identification.
It should be noted that the BERT-CRF algorithm predicts a more detailed registration address of the enterprise to be queried. When the building name still cannot be identified from that address, the address is used to further determine the longitude and latitude information of the enterprise to be queried. This improves the accuracy of the longitude and latitude information and, in turn, both the proportion of enterprises whose building name the electronic fence algorithm can identify and the accuracy of that identification.
In this embodiment, when the BERT-CRF algorithm is used to identify the entity name in the registered address, the BERT model interprets the natural-language semantics and thereby determines what an orientation word in the registered address actually denotes, which reduces the probability of misidentification and improves the accuracy of building name identification. For example, the "west" in "a city B district temple" is part of a proper noun and does not denote an orientation, so when identifying "a city B district temple" the BERT-CRF algorithm treats "west" as part of the proper noun according to the semantics rather than as an orientation word.
The method comprises the following specific steps:
step S10, obtaining enterprise data of an enterprise to be queried;
the enterprise data may be data of one enterprise or may be a data set of a plurality of enterprises.
The enterprise data at least comprises a registration address of an enterprise, an enterprise name, longitude and latitude information of the enterprise and the like.
Step S20, identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm;
the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
In this embodiment, the multi-level algorithm further includes a dictionary matching method. After the enterprise data is acquired, the dictionary matching method is first used to identify the target name of the building to which the enterprise to be queried belongs, that is, the name of the building in which the enterprise is located. For example, if the registered address of company Q is No. 1234, Building 14, we park, District B, City A, the building name identified by the dictionary matching method is "we park".
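The dictionary matching level can be illustrated with a minimal sketch. It assumes that the corpus is simply a collection of known building names and that, when several names occur in the address, the longest one is preferred; the function name `match_building`, the corpus contents and the longest-match rule are illustrative assumptions and are not specified in this description.

```python
def match_building(address, corpus):
    """Return the longest known building name contained in the address, or None.

    `corpus` is a hypothetical collection of building names; the longest-match
    preference is an assumption made for this sketch.
    """
    hits = [name for name in corpus if name and name in address]
    return max(hits, key=len) if hits else None


# Hypothetical corpus entries, for illustration only.
corpus = {"we park", "we park east tower"}
print(match_building("No. 1234, Building 14, we park, District B, City A", corpus))
print(match_building("Building 14, District B, City A", corpus))
```

A result of `None` corresponds to the case in which dictionary matching fails and the next level of the multi-level algorithm takes over.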
In this embodiment, the building name is identified with the multi-level algorithm: the dictionary matching method is used first; if it does not identify the building name, the BERT-CRF algorithm is used; and if the BERT-CRF algorithm does not identify the building name or the identification fails, the electronic fence algorithm is used.
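The level-by-level fallback can be sketched as a chain of recognizers that is tried in order until one returns a name. The three recognizers below are hypothetical stand-ins (a fixed name list, a placeholder for a trained BERT-CRF tagger, and a coordinate lookup table with made-up values); only the ordering reflects this embodiment.

```python
from typing import Callable, Optional, Sequence

Recognizer = Callable[[dict], Optional[str]]


def identify_building(record: dict, levels: Sequence[Recognizer]) -> Optional[str]:
    """Try each level in order and return the first building name found."""
    for level in levels:
        name = level(record)
        if name:
            return name
    return None


def by_dictionary(record: dict) -> Optional[str]:
    # Stand-in for dictionary matching against a corpus (hypothetical name list).
    known = ("we park",)
    address = record.get("address", "")
    return next((n for n in known if n in address), None)


def by_bert_crf(record: dict) -> Optional[str]:
    # Placeholder: a trained BERT-CRF tagger would be called here.
    return None


def by_fence(record: dict) -> Optional[str]:
    # Stand-in for the electronic fence lookup (hypothetical coordinates).
    fences = {(116.4, 39.9): "we park"}
    return fences.get(record.get("lon_lat"))


levels = [by_dictionary, by_bert_crf, by_fence]
print(identify_building({"address": "Building 14, District B", "lon_lat": (116.4, 39.9)}, levels))
```

Keeping each level behind the same call signature means a level can be replaced or reordered without touching the fallback logic.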
It should be noted that, the BERT-CRF algorithm may predict a complete registration address, and accurately identify the meaning of each word in the registration address from the registration address, that is, accurately identify the meaning of the word with multiple meanings expressed in the registration address, so as to improve the accuracy of identifying the name of the building.
It should be noted that, the building name is identified by using the electronic fence algorithm, so that the defect that the building name cannot be identified by using the keywords (entity names) can be overcome, the coverage of identifiable data is more comprehensive, and the matching is more accurate.
Specifically, the step of identifying, from the enterprise data, the target name of the building to which the enterprise to be queried belongs based on the preset multi-hierarchy algorithm includes:
s21, converting the registered address in the enterprise data into sentence vectors based on a BERT model in the BERT-CRF algorithm;
s22, decoding the sentence vector based on a CRF model in the BERT-CRF algorithm, and determining a labeling sequence of the sentence vector;
step S23, identifying a target sequence for representing the entity name from the labeling sequence;
and step S24, identifying a target sentence vector corresponding to the target sequence, and determining the target name of the building to which the enterprise to be queried belongs based on the target sentence vector.
It should be noted that the BERT-CRF algorithm comprises a BERT model and a CRF model. First, the BERT model converts the registered address in the enterprise data into sentence vectors and analyzes the semantic relations among the words in the registered address, so that the meaning of each word is identified accurately. Then the CRF model performs sequence labeling on the sentence vectors according to the semantic relations, so that the entity name in the registered address is identified and the accuracy of building name identification is improved.
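The decoding step of a linear-chain CRF is commonly carried out with the Viterbi algorithm. The sketch below assumes that the BERT model has already produced one score per label for each position (the emission scores) and that the CRF holds a table of label-to-label transition scores; all numbers are made up for illustration, and this description does not state which decoder is actually used.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j at position t (assumed to come from BERT)
    transitions[i][j] : score of moving from label i to label j (CRF parameters)
    """
    n = len(labels)
    score = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n):
            # Best previous label for ending in label j at this position.
            best_i = max(range(n), key=lambda i: prev[i] + transitions[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + transitions[best_i][j] + emit[j])
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return [labels[j] for j in reversed(path)]


labels = ["B", "I", "O"]
# Illustrative transition table: only O -> I is penalised, since an entity
# cannot continue without having begun.
transitions = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, -10.0, 0.0]]
emissions = [[0.1, 0.2, 2.0], [2.0, 0.5, 0.3], [0.2, 1.5, 1.0], [0.1, 0.3, 2.0]]
print(viterbi(emissions, transitions, labels))
```

The transition table is what lets the CRF reject sequences that are locally plausible but structurally invalid, such as an "I" label directly after an "O" label.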
In this embodiment, the CRF model may use the BIO labeling scheme, in which B (begin) marks the beginning of an entity, I (inside) marks the middle and end of an entity, and O (outside) marks a non-entity part. For example, the labeling sequence of the address "No. 1234, Building 14, we park, District B, City A" is "OOOOBIIIOOOOOOO", with the labels assigned in the character order of the original address, that is, "city (O) B (O) area (O) w (B) e (I) park (I) area (I) 1 (O) 4 (O) building number (O) 1 (O) 2 (O) 3 (O) 4 (O)".
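Reading the entity back out of a BIO label sequence amounts to finding each run that starts with B and continues with I. The following is a minimal sketch; the function name `extract_spans` is hypothetical, and a stray I without a preceding B is simply ignored, which is an assumption of this sketch.

```python
def extract_spans(tags):
    """Return (start, end) index pairs, end exclusive, for each B-I... run."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:      # a new entity begins right after another
                spans.append((start, i))
            start = i
        elif tag == "I" and start is not None:
            continue                   # still inside the current entity
        else:
            if start is not None:      # "O" (or a stray "I") closes the entity
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


print(extract_spans("OOOOBIIIOOOOOOO"))
```

Slicing the original character sequence of the address with each returned pair yields the text of the entity, which is then used as the candidate building name.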
In this embodiment, after converting the registration address into the sentence vector, the CRF model and the semantic relationship are used to label the registration address, and then the entity name in the registration address is identified according to the label sequence, and the entity name, that is, the target name of the building to which the enterprise to be queried belongs, is output.
In the embodiment, the strong language characterization capability and the feature extraction capability of the BERT-CRF algorithm are utilized, so that the automatic learning of the features can be realized, the manual participation is reduced, the problem of serious dependence on a corpus is solved, the segmentation, the part-of-speech labeling, the named entity recognition and the like are more accurate, and the accuracy of recognizing the name of the building is further improved.
Specifically, the step of identifying a target sentence vector corresponding to the target sequence and determining, based on the target sentence vector, a target name of a building to which the enterprise to be queried belongs includes:
step A10, identifying a target sentence vector corresponding to the target sequence;
and step A20, if the target sentence vector does not comprise the target name of the building to which the enterprise to be queried belongs, performing dictionary matching on the target sentence vector based on a preset corpus, and determining the target name of the building to which the enterprise to be queried belongs.
The corpus comprises enterprise names and detailed address information corresponding to the enterprises; the address information includes the city, area and park or building address to which the business belongs, etc.
In this embodiment, the registration address in the acquired enterprise data may be incomplete, for example "building No. 14 in a city B area". In that case the BERT-CRF algorithm is needed to predict and complete the registration address, and dictionary matching is then performed against the preset corpus to check the accuracy of the prediction.
In this embodiment, referring to FIG. 4, when the registration address lacks an entity name or is not detailed enough, the target name of the building to which the enterprise to be queried belongs cannot be matched in the corpus. Because the BERT-CRF algorithm is a deep learning algorithm, after the registration address is input, the BERT-CRF algorithm first determines how detailed the registration address is and predicts the target name of the building to which the enterprise to be queried belongs according to that level of detail; the prediction is then checked by the dictionary matching method, so as to improve the accuracy of building name identification.
Specifically, the step of identifying, from the enterprise data, the target name of the building to which the enterprise to be queried belongs based on the preset multi-hierarchy algorithm further includes:
step B10, if the target name of the building to which the enterprise to be queried belongs cannot be identified based on the BERT-CRF algorithm and the registration address in the enterprise data, the longitude and latitude information of the enterprise to be queried is screened out from the enterprise data;
and step B20, identifying the target name of the building to which the enterprise to be queried belongs based on the electronic fence algorithm and the longitude and latitude information.
In this embodiment, if the registration address predicted by the BERT-CRF algorithm cannot accurately identify the target name of the building to which the enterprise to be queried belongs, the longitude and latitude information of the enterprise to be queried is screened out from the enterprise data, and then the building name of the enterprise to be queried is determined by using the electronic fence algorithm and the longitude and latitude information.
It should be noted that, when determining the target name of the building to which the enterprise to be queried belongs by using the electronic fence algorithm, the range to which the enterprise to be queried belongs needs to be primarily determined, so as to reduce the number of calculated electronic fences and improve the recognition efficiency.
In this embodiment, the electronic fence algorithm compensates for the lack of interpretability of the BERT-CRF algorithm and for the data that it cannot identify, which addresses the low utilization of the data and of orientation words, effectively reduces the cost of manual labeling and manual maintenance, avoids wasted resources and human error, and enhances the robustness and generalization capability of the model.
In this embodiment, before the target name of the building to which the enterprise to be queried belongs is identified with the electronic fence algorithm, a building list sample library needs to be built. Referring to FIG. 5, the building list may come from lists collected by the branch companies of an operator, from lists obtained from internet data, or from lists identified through deep learning. After the building list sample library is obtained, the longitude and latitude of each building in the list are collected with internet crawler technology, an electronic fence is formed for each building from its longitude and latitude, and the electronic fences are gathered into a longitude and latitude list library. To avoid repeated and abnormal data in the building list sample library, the collected building list and the longitude and latitude list library are cleaned so that the collected result is consistent with the input building name and the building name identified by the electronic fence algorithm is accurate.
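The cleaning step can be pictured with a small sketch. It assumes that each collected row is a building name paired with a list of (longitude, latitude) vertices, that "repeated data" means a repeated building name, and that "abnormal data" means an empty name, a fence with fewer than three vertices, or coordinates outside the valid range; these rules and the function name `clean_fence_table` are assumptions made for illustration.

```python
def clean_fence_table(rows):
    """Build a name -> polygon table, dropping duplicate and abnormal rows.

    rows: iterable of (building_name, [(lon, lat), ...]) pairs (assumed layout).
    """
    table = {}
    for name, polygon in rows:
        name = name.strip()
        valid = (
            bool(name)
            and len(polygon) >= 3
            and all(-180 <= lon <= 180 and -90 <= lat <= 90 for lon, lat in polygon)
        )
        if valid and name not in table:   # keep the first occurrence of each name
            table[name] = list(polygon)
    return table


# Hypothetical rows, for illustration only.
rows = [
    ("we park", [(116.0, 39.0), (116.1, 39.0), (116.1, 39.1)]),
    ("we park", [(116.0, 39.0), (116.1, 39.0), (116.1, 39.1)]),  # duplicate
    ("tower x", [(116.0, 39.0), (116.1, 39.0)]),                 # too few vertices
    ("tower y", [(116.0, 95.0), (116.1, 39.0), (116.1, 39.1)]),  # invalid latitude
]
print(sorted(clean_fence_table(rows)))
```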
Specifically, the step of identifying the target name of the building to which the enterprise to be queried belongs based on the electronic fence algorithm and the longitude and latitude information includes:
step B21, determining fences to be tested from a preset electronic fence table library based on the longitude and latitude information and a preset query range;
step B22, determining the longitude-latitude point of the enterprise to be queried based on the longitude and latitude information;
step B23, taking the longitude-latitude point as an endpoint, drawing one virtual ray in each of two opposite directions, and determining the number of intersection points between each virtual ray and each fence to be tested;
and step B24, determining a target fence corresponding to the enterprise to be queried based on the number of intersection points, and determining the building name corresponding to the target fence.
The electronic fence list library is obtained by integrating a building list sample library and a longitude and latitude list library.
In this embodiment, the latitude and longitude information may be latitude and longitude information in enterprise data, or may be updated latitude and longitude information after predicting the enterprise to be queried through a BERT-CRF algorithm, which is not specifically limited.
In this embodiment, the range in which the enterprise to be queried is located may be preliminarily determined from the longitude and latitude information and the electronic fence table library, and the electronic fences within that range are taken as the fences to be tested. This reduces the number of electronic fences to be examined, reduces the amount of calculation, and improves the efficiency of building name identification.
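One simple way to narrow the set of fences is a bounding-box test. The sketch below assumes that the preset query range is a margin in degrees added around each fence's bounding box and that the fence table maps building names to lists of (longitude, latitude) vertices; neither detail is given in this description, and the function name `candidate_fences` is hypothetical.

```python
def candidate_fences(point, fences, query_range):
    """Return the names of fences whose padded bounding box contains the point.

    point       : (lon, lat) of the enterprise to be queried
    fences      : dict of building name -> [(lon, lat), ...]
    query_range : margin in degrees (an assumed unit) added around each box
    """
    lon, lat = point
    selected = []
    for name, polygon in fences.items():
        lons = [p[0] for p in polygon]
        lats = [p[1] for p in polygon]
        if (min(lons) - query_range <= lon <= max(lons) + query_range
                and min(lats) - query_range <= lat <= max(lats) + query_range):
            selected.append(name)
    return selected


# Hypothetical fences, for illustration only.
fences = {
    "near": [(116.40, 39.90), (116.41, 39.90), (116.41, 39.91), (116.40, 39.91)],
    "far": [(121.40, 31.20), (121.41, 31.20), (121.41, 31.21), (121.40, 31.21)],
}
print(candidate_fences((116.405, 39.905), fences, 0.01))
```

Only the fences returned here need to go through the more expensive ray test.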
In this embodiment, when the target name of the building to which the enterprise to be queried belongs is identified using the electronic fence algorithm, the identification uses a ray method: a ray is extended in a given direction with the longitude and latitude of the enterprise to be queried as its endpoint, the number of points at which the ray intersects the electronic fence is counted, and the building to which the enterprise to be queried belongs is determined from that count. When the number of intersection points is odd, the point is inside the electronic fence, that is, the enterprise to be queried is located in the identified building; when the number of intersection points is even, the point is outside the electronic fence, that is, the enterprise to be queried is located outside the identified building.
Specifically, referring to fig. 7: in (a), a ray is cast to the right from the black point and intersects the polygon (the electronic fence) at 1 point (the gray point), so the black point is judged to be inside the polygon, that is, the enterprise to be queried is located in the building and the recognition result is the name of that building. In (b), the rightward ray also has 1 intersection point, but judging the black point to be inside the polygon on that basis would be an error. For this special case, a ray is therefore also cast to the left (that is, along the extension of the rightward ray); it has no intersection point with the polygon, so the number of intersection points is 0, which is even, and the point is judged not to be in the polygon, that is, the enterprise to be queried is not in the building. In (c), the ray exactly overlaps an edge of the polygon; the overlapped edge is considered to have 1 intersection point with the ray, so the black point is judged to be in the polygon. In (d), the black point lies on an edge of the polygon, the rightward ray has an intersection point, and the leftward extension ray has none; in this situation the black point may be judged either to be in the polygon or not to be in the polygon, which is not specifically limited.
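The parity test with two opposite rays can be sketched as follows. This is an illustrative variant, not the application's exact procedure: it uses the common half-open crossing rule, which skips edges collinear with the ray instead of counting them as 1 intersection as in case (c), and it treats a point lying exactly on an edge as outside, which is one of the two outcomes allowed in case (d). The function names are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def count_crossings(point: Point, polygon: List[Point], direction: int) -> int:
    """Count polygon edges crossed by a horizontal ray starting at `point`.

    direction=+1 casts the ray toward increasing longitude, -1 toward decreasing.
    """
    px, py = point
    crossings = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Half-open rule: the edge counts only if its endpoints lie on opposite
        # sides of the ray's latitude, so a vertex touched by the ray is not
        # counted twice and edges collinear with the ray are skipped.
        if (y1 > py) != (y2 > py):
            x_at_py = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if (x_at_py - px) * direction > 0:
                crossings += 1
    return crossings


def point_in_fence(point: Point, polygon: List[Point]) -> bool:
    """Judge the point inside only if both opposite rays cross an odd number of edges."""
    right = count_crossings(point, polygon, +1)
    left = count_crossings(point, polygon, -1)
    return right % 2 == 1 and left % 2 == 1
```

Requiring both rays to agree mirrors the second-ray check used for case (b): a ray that merely grazes a vertex contributes no crossing, so a point outside the fence is not misjudged as inside.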
It should be noted that if the enterprise to be queried is not located in any building, a building near the enterprise to be queried may be taken as a reference to identify a specific building name, and the corpus may be updated with that building name.
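The application does not define how a nearby building is chosen. One possible sketch, assuming "near" means the smallest great-circle (haversine) distance from the enterprise's point to the centroid of each fence's vertices, is shown below; the function names and the fence table layout are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (longitude, latitude) points in kilometres."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_building(point: Point, fences: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the fence whose vertex centroid is closest to `point`."""
    if not fences:
        return None

    def centroid(polygon: List[Point]) -> Point:
        n = len(polygon)
        return (sum(p[0] for p in polygon) / n, sum(p[1] for p in polygon) / n)

    return min(fences, key=lambda name: haversine_km(point, centroid(fences[name])))
```

In practice a distance cap would likely be needed so that an enterprise far from every fence is not assigned to an unrelated building; no such threshold is stated in the text, so none is assumed here.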
In this embodiment, the BERT-CRF algorithm and the electronic fence algorithm are fused. Compared with a traditional single regular-expression matching model, the fused multi-level model focuses on two main feature parameters, the enterprise registration address and the longitude and latitude, so that when the building cannot be identified from the enterprise registration address, it can be identified from the longitude and latitude. This improves the flexibility of identifying the building name and improves the data utilization rate.
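The fallback between levels can be sketched as a simple cascade in which each level is tried in order until one returns a name. The recognizers below are stubs standing in for dictionary matching, the BERT-CRF model, and the electronic fence lookup; their names, the dictionary keys, and the returned value are placeholders, not part of the application.

```python
from typing import Callable, Optional, Sequence

Recognizer = Callable[[dict], Optional[str]]


def identify_building(enterprise: dict, levels: Sequence[Recognizer]) -> Optional[str]:
    """Try each recognition level in order and return the first building name found."""
    for recognize in levels:
        name = recognize(enterprise)
        if name:
            return name
    return None


# Stub levels for illustration only.
def by_dictionary(enterprise: dict) -> Optional[str]:
    return None  # pretend the registration address matched no corpus entry


def by_bert_crf(enterprise: dict) -> Optional[str]:
    return None  # pretend the model found no building entity in the address


def by_fence(enterprise: dict) -> Optional[str]:
    # Pretend the fence lookup succeeds whenever coordinates are present.
    return "Example Tower" if "lon" in enterprise and "lat" in enterprise else None
```

With these stubs, an enterprise record that carries coordinates is resolved by the last level even though both address-based levels fail, which is the situation the paragraph above describes.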
The application provides a method, a device, equipment and a storage medium for identifying a building name. In the prior art, words with ambiguous meanings cannot be identified, and the building name cannot be identified for data whose registration address cannot be matched, which reduces the accuracy of identifying building names. In contrast, the application acquires enterprise data of an enterprise to be queried and identifies the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm, where the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm. That is, after the data of the enterprise to be queried is acquired, the building name of the enterprise to be queried is identified from the enterprise data by applying the BERT-CRF algorithm and the electronic fence algorithm in sequence. The BERT-CRF (Bidirectional Encoder Representations from Transformers - Conditional Random Field) algorithm can analyze the semantic relations among words and label the enterprise data as a sequence, and the electronic fence method can handle the cases that the BERT-CRF algorithm fails to resolve, so that data coverage is more complete and matching is more accurate. Identifying the building name with a multi-level algorithm comprising the BERT-CRF algorithm and the electronic fence algorithm therefore avoids the situation in which names containing azimuth words cannot be accurately identified, and also fills the gap in which the building cannot be identified from the registration address, so that the coverage of identifiable data is more comprehensive, the matching is more accurate, and the accuracy of identifying the building name is improved.
Further, based on the foregoing embodiment of the present application, another embodiment of the present application is provided, in which, referring to fig. 2, before the step of converting the registered address in the enterprise data into the sentence vector based on the BERT model in the BERT-CRF algorithm, the method further includes:
step S01, if the entity name is screened out from the registration address of the enterprise data, identifying the target name of the building to which the enterprise to be queried belongs from a preset corpus based on the entity name;
step S02, if the entity name is not screened from the registration address or the target name of the building to which the enterprise to be queried belongs cannot be identified based on the entity name, identifying the target name of the building to which the enterprise to be queried belongs by utilizing the BERT-CRF algorithm.
In this embodiment, before identifying the target name of the building to which the enterprise to be queried belongs by using the BERT-CRF algorithm, the entity name may be screened from the registered address by using a dictionary matching method based on a preset corpus. If the entity name is not recognized based on the corpus, judging that the recognition fails through a dictionary matching method, and then recognizing the target name of the building to which the enterprise to be queried belongs by utilizing a BERT-CRF algorithm.
When the identification is performed through the dictionary matching method, the entity names can be directly screened out from the registration addresses based on the corpus, and the entity names are output as target names, so that the identification rate is improved on the premise of improving the accuracy of identifying the building names.
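A minimal sketch of dictionary matching, assuming the preset corpus is simply a list of known building names and that the longest name contained in the registration address wins. The longest-match rule and the sample names are assumptions made for illustration.

```python
from typing import Iterable, Optional


def dictionary_match(address: str, corpus: Iterable[str]) -> Optional[str]:
    """Return the longest corpus building name that occurs in the address, if any."""
    matches = [name for name in corpus if name and name in address]
    return max(matches, key=len) if matches else None
```

When this returns None, recognition by dictionary matching has failed and the address would be passed on to the BERT-CRF algorithm, as described above.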
Further, based on the foregoing embodiment of the present application, there is provided another embodiment of the present application, in which, before the step of converting the registered address in the enterprise data into the sentence vector based on the BERT model in the BERT-CRF algorithm, the method further includes:
step C10, obtaining sample data marked with complete entity names;
step C20, training a preset model to be trained based on the sample data to obtain a BERT-CRF primary model, and pre-identifying based on the BERT model to obtain a training name;
step C30, if the training name differs from the marked entity name, adjusting the label of the sample data based on the difference and then training the BERT-CRF primary model again, until the training name no longer differs from the entity name, thereby determining the BERT-CRF algorithm;
the model to be trained is obtained by superposing a BERT model framework and a CRF model framework.
In this embodiment, the sample data marked with the complete entity name may be obtained from a corpus, or may be output based on the characteristics and rules of each cluster market name and a building list provided by a branch company of an operator by a dictionary matching method.
The sample data at least comprises an enterprise registration address and a marked building name.
In this embodiment, referring to fig. 6, after the sample data is obtained, it is converted into labeling data that a computer can process, that is, the entity names are labeled. The entity names may be labeled using the BIO labeling method, in which B (begin) marks the beginning of an entity, I (inside) marks the middle and the end of an entity, and O (outside) marks the non-entity part. After the entity names are labeled, the labeled sample data is used to train the model to be trained.
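The BIO labeling can be illustrated with a character-level sketch, assuming each sample consists of a registration address and the marked building name that appears in it. The function name and the character-level granularity are assumptions; the application does not state how the text is tokenized.

```python
from typing import List, Tuple


def bio_tags(address: str, building: str) -> List[Tuple[str, str]]:
    """Pair every character of the address with a B, I, or O label."""
    tags = ["O"] * len(address)
    start = address.find(building) if building else -1
    if start >= 0:
        tags[start] = "B"  # first character of the building name
        for i in range(start + 1, start + len(building)):
            tags[i] = "I"  # remaining characters of the building name
    return list(zip(address, tags))
```

For example, labeling the address "1 Sun Rd" with the building name "Sun" marks "S" as B, "u" and "n" as I, and every other character as O.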
When the model to be trained is trained, the output prediction result is compared with the marked entity name, and the labeling data is adjusted according to the difference between the two, which improves the prediction accuracy of the BERT-CRF algorithm.
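The comparison between predictions and marked names can be sketched as a helper that collects the samples whose prediction differs from the label, so that their labeling can be reviewed before the next training round. The `predict` callable is a hypothetical stand-in for the BERT-CRF primary model, and the sample layout is assumed.

```python
from typing import Callable, List, Optional, Tuple

Sample = Tuple[str, str]  # (registration address, marked building name); assumed layout


def find_mismatches(
    samples: List[Sample], predict: Callable[[str], Optional[str]]
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (address, marked name, predicted name) for every disagreeing sample."""
    mismatches = []
    for address, marked_name in samples:
        predicted = predict(address)
        if predicted != marked_name:
            mismatches.append((address, marked_name, predicted))
    return mismatches
```

Training would stop, under the criterion stated in step C30, once this list is empty.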
The device for identifying a building name provided by the invention is described below; the device described below and the method for identifying a building name described above may be read with reference to each other.
Fig. 8 is a schematic structural diagram of a device for identifying a building name according to the present application. As shown in fig. 8, the device for identifying a building name includes:
an obtaining module 810, configured to obtain enterprise data of an enterprise to be queried;
a first identifying module 820, configured to identify, from the enterprise data, a target name of a building to which the enterprise to be queried belongs, based on a preset multi-level algorithm;
the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
In the prior art, words with ambiguous meanings cannot be identified, and the building name cannot be identified for data whose registration address cannot be matched, which reduces the accuracy of identifying building names. In contrast, the device for identifying a building name provided by the embodiment of the application acquires enterprise data of an enterprise to be queried and identifies the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm, where the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm. That is, after the data of the enterprise to be queried is acquired, the building name of the enterprise to be queried is identified from the enterprise data by applying the BERT-CRF algorithm and the electronic fence algorithm in sequence. The BERT-CRF (Bidirectional Encoder Representations from Transformers - Conditional Random Field) algorithm can analyze the semantic relations among words and label the enterprise data as a sequence, and the electronic fence method can handle the cases that the BERT-CRF algorithm fails to resolve, so that data coverage is more complete and matching is more accurate. Identifying the building name with a multi-level algorithm comprising the BERT-CRF algorithm and the electronic fence algorithm therefore avoids the situation in which names containing azimuth words cannot be accurately identified, and also fills the gap in which the building cannot be identified from the registration address, so that the coverage of identifiable data is more comprehensive, the matching is more accurate, and the accuracy of identifying the building name is improved.
Optionally, the first identifying module 820 includes:
the conversion module is used for converting the registered address in the enterprise data into sentence vectors based on a BERT model in the BERT-CRF algorithm;
the decoding module is used for decoding the sentence vector based on a CRF model in the BERT-CRF algorithm and determining a labeling sequence of the sentence vector;
the first identification sub-module is used for identifying a target sequence representing the entity name from the labeling sequence;
and the second recognition sub-module is used for recognizing the target sentence vector corresponding to the target sequence and determining the target name of the building to which the enterprise to be queried belongs based on the target sentence vector.
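The decoding and identification modules listed above can be illustrated with a small pure-Python sketch: Viterbi decoding over per-character label scores, followed by collecting the characters labeled B and I. In a real system the emission scores would come from the BERT model and the transition scores from the trained CRF layer; here both are invented numbers, and the function names are assumptions.

```python
from typing import List

LABELS = ["B", "I", "O"]


def viterbi_decode(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][j] scores label j at position t;
    transitions[i][j] scores moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emission[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


def extract_entities(chars: List[str], labels: List[str]) -> List[str]:
    """Join each run of a B label followed by I labels into one entity string."""
    entities, current = [], []
    for ch, label in zip(chars, labels):
        if label == "B":
            if current:
                entities.append("".join(current))
            current = [ch]
        elif label == "I" and current:
            current.append(ch)
        else:
            if current:
                entities.append("".join(current))
            current = []
    if current:
        entities.append("".join(current))
    return entities
```

The transition scores are what distinguish CRF decoding from picking the best label at each position independently: a large penalty on the O-to-I transition, for instance, prevents an entity from starting with an I label.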
Optionally, the second identifying submodule includes:
the first recognition unit is used for recognizing a target sentence vector corresponding to the target sequence;
and the matching module is used for carrying out dictionary matching on the target sentence vector based on a preset corpus if the target sentence vector does not comprise the target name of the building to which the enterprise to be queried belongs, and determining the target name of the building to which the enterprise to be queried belongs.
Optionally, the matching module includes:
the screening module is used for screening longitude and latitude information of the enterprise to be queried from the enterprise data if the target name of the building to which the enterprise to be queried belongs cannot be identified based on the BERT-CRF algorithm and the registration address in the enterprise data;
and the third identification sub-module is used for identifying the target name of the building to which the enterprise to be queried belongs based on the electronic fence algorithm and the longitude and latitude information.
Optionally, the third recognition submodule includes:
the first determining module is used for determining the fence to be detected from a preset electronic fence table library based on the longitude and latitude information and a preset query range;
the second determining module is used for determining the longitude-latitude point of the enterprise to be queried based on the longitude and latitude information;
the judging module is used for taking the longitude-latitude point as an endpoint, drawing one virtual ray in each of two opposite directions, and counting the points at which each virtual ray intersects each fence to be tested;
and the second identification unit is used for determining a target fence corresponding to the enterprise to be queried based on the number of the intersecting points and determining a building name corresponding to the target fence.
Optionally, the identifying device of building names further includes:
an acquisition sub-module for acquiring sample data marked with a complete entity name;
the first training module is used for training a preset model to be trained based on the sample data to obtain a BERT-CRF primary model, and pre-identifying based on the BERT model to obtain a training name;
the second training module is used for, if the training name differs from the marked entity name, adjusting the label of the sample data based on the difference and then training the BERT-CRF primary model again, until the training name no longer differs from the entity name, thereby determining the BERT-CRF algorithm;
the model to be trained is obtained by superposing a BERT model framework and a CRF model framework.
Optionally, the apparatus further comprises:
the second identifying module is used for identifying the target name of the building to which the enterprise to be queried belongs from a preset corpus based on the entity name if the entity name is screened out from the registration address of the enterprise data;
and the selection module is used for identifying the target name of the building to which the enterprise to be queried belongs by utilizing the BERT-CRF algorithm if the entity name is not screened from the registration address or the target name of the building to which the enterprise to be queried cannot be identified based on the entity name.
Fig. 9 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 9, the electronic device may include: a processor 910, a communication interface (Communications Interface) 920, a memory 930, and a communication bus 940, wherein the processor 910, the communication interface 920, and the memory 930 communicate with each other via the communication bus 940. The processor 910 may invoke logic instructions in the memory 930 to perform a method of identifying a building name, the method comprising: acquiring enterprise data of an enterprise to be queried; identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm; the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
Further, the logic instructions in the memory 930 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art or in a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing a method of identifying a building name provided by the methods described above, the method comprising: acquiring enterprise data of an enterprise to be queried; identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm; the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of identifying a building name provided by the methods described above, the method comprising: acquiring enterprise data of an enterprise to be queried; identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm; the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of identifying a name of a building, comprising:
acquiring enterprise data of an enterprise to be queried;
identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm;
the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
2. The method for identifying a building name according to claim 1, wherein the identifying, from the enterprise data, the target name of the building to which the enterprise to be queried belongs based on a preset multi-level algorithm includes:
converting the registration address in the enterprise data into sentence vector based on the BERT model in the BERT-CRF algorithm;
Decoding the sentence vector based on a CRF model in the BERT-CRF algorithm, and determining a labeling sequence of the sentence vector;
identifying a target sequence representing the entity name from the labeling sequence;
and identifying a target sentence vector corresponding to the target sequence, and determining the target name of the building to which the enterprise to be queried belongs based on the target sentence vector.
3. The method for identifying a building name according to claim 2, wherein the identifying a target sentence vector corresponding to the target sequence and determining, based on the target sentence vector, a target name of a building to which the enterprise to be queried belongs comprises:
identifying a target sentence vector corresponding to the target sequence;
if the target sentence vector does not include the target name of the building to which the enterprise to be queried belongs, dictionary matching is performed on the target sentence vector based on a preset corpus, and the target name of the building to which the enterprise to be queried belongs is determined.
4. The method for identifying a building name according to claim 1, wherein the identifying, based on a preset multi-hierarchy algorithm, a target name of a building to which the enterprise to be queried belongs from the enterprise data, further comprises:
If the target name of the building to which the enterprise to be queried belongs cannot be identified based on the BERT-CRF algorithm and the registration address in the enterprise data, the longitude and latitude information of the enterprise to be queried is screened out from the enterprise data;
and identifying the target name of the building to which the enterprise to be queried belongs based on the electronic fence algorithm and the longitude and latitude information.
5. The method for identifying a building name according to claim 4, wherein identifying the target name of the building to which the enterprise to be queried belongs based on the electronic fence algorithm and the latitude and longitude information comprises:
determining a fence to be detected from a preset electronic fence table library based on the longitude and latitude information and a preset query range;
determining the longitude-latitude point of the enterprise to be queried based on the longitude and latitude information;
taking the longitude-latitude point as an endpoint, drawing one virtual ray in each of two opposite directions, and counting the points at which each virtual ray intersects each fence to be tested;
and determining a target fence corresponding to the enterprise to be queried based on the intersection points, and determining a building name corresponding to the target fence.
6. The method of claim 2, wherein before the converting of the registered address in the enterprise data into a sentence vector based on the BERT model in the BERT-CRF algorithm, the method further comprises:
acquiring sample data marked with complete entity names;
training a preset model to be trained based on the sample data to obtain a BERT-CRF primary model, and pre-identifying based on the BERT model to obtain a training name;
if the training names and the entity names marked are different, the labels of the sample data are adjusted based on the differences, then the BERT-CRF primary model is trained until the training names and the entity names are not different, and then a BERT-CRF algorithm is determined;
the model to be trained is obtained by superposing a BERT model framework and a CRF model framework.
7. The method of claim 2, wherein before the converting of the registered address in the enterprise data into a sentence vector based on the BERT model in the BERT-CRF algorithm, the method further comprises:
if the entity name is screened out from the registration address of the enterprise data, identifying the target name of the building to which the enterprise to be queried belongs from a preset corpus based on the entity name;
If the entity name is not screened from the registration address or the target name of the building to which the enterprise to be queried belongs cannot be identified based on the entity name, identifying the target name of the building to which the enterprise to be queried belongs by utilizing the BERT-CRF algorithm.
8. A device for identifying a name of a building, comprising:
the acquisition module is used for acquiring enterprise data of an enterprise to be queried;
the first identification module is used for identifying the target name of the building to which the enterprise to be queried belongs from the enterprise data based on a preset multi-level algorithm;
the multi-level algorithm at least comprises a BERT-CRF algorithm and an electronic fence algorithm.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of identifying a building name according to any one of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method of identifying a building name according to any one of claims 1 to 7.
CN202310838486.5A 2023-07-10 2023-07-10 Building name identification method, device, electronic equipment and storage medium Pending CN116992880A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310838486.5A CN116992880A (en) 2023-07-10 2023-07-10 Building name identification method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310838486.5A CN116992880A (en) 2023-07-10 2023-07-10 Building name identification method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116992880A true CN116992880A (en) 2023-11-03

Family

ID=88520550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310838486.5A Pending CN116992880A (en) 2023-07-10 2023-07-10 Building name identification method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116992880A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472885A (en) * 2023-12-27 2024-01-30 图灵人工智能研究院(南京)有限公司 Method and system for enterprise information statistics in regional boundary

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472885A (en) * 2023-12-27 2024-01-30 图灵人工智能研究院(南京)有限公司 Method and system for enterprise information statistics in regional boundary
CN117472885B (en) * 2023-12-27 2024-03-19 图灵人工智能研究院(南京)有限公司 Method and system for enterprise information statistics in regional boundary

Similar Documents

Publication Publication Date Title
CN108399428B (en) Triple loss function design method based on trace ratio criterion
CN110837550A (en) Knowledge graph-based question and answer method and device, electronic equipment and storage medium
CN112163424A (en) Data labeling method, device, equipment and medium
CN114092742B (en) Multi-angle-based small sample image classification device and method
CN116303971A (en) Few-sample form question-answering method oriented to bridge management and maintenance field
CN115273112A (en) Table identification method and device, electronic equipment and readable storage medium
CN116992880A (en) Building name identification method, device, electronic equipment and storage medium
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
CN115658846A (en) Intelligent search method and device suitable for open-source software supply chain
CN117743601B (en) Natural resource knowledge graph completion method, device, equipment and medium
CN113434631A (en) Emotion analysis method and device based on event, computer equipment and storage medium
CN110866172B (en) Data analysis method for block chain system
CN113886602B (en) Domain knowledge base entity identification method based on multi-granularity cognition
CN115187839B (en) Image-text semantic alignment model training method and device
CN116431827A (en) Information processing method, information processing device, storage medium and computer equipment
CN116431746A (en) Address mapping method and device based on coding library, electronic equipment and storage medium
CN116976341A (en) Entity identification method, entity identification device, electronic equipment, storage medium and program product
CN114925681A (en) Knowledge map question-answer entity linking method, device, equipment and medium
CN113947195A (en) Model determination method and device, electronic equipment and memory
CN111143691B (en) Joint information extraction method and device
CN111950875A (en) Intelligent contract reviewing method
CN112926309B (en) Safety information distinguishing method and device and electronic equipment
CN118132738B (en) Extraction type question-answering method for bridge evaluation text
CN113515677B (en) Address matching method, device and computer readable storage medium
CN113886547B (en) Client real-time dialogue switching method and device based on artificial intelligence and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination