CN115905582A - Abnormal POI data detection method and device, electronic equipment and storage medium - Google Patents

Publication number
CN115905582A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211448123.2A
Other languages
Chinese (zh)
Inventor
隆盼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211448123.2A
Publication of CN115905582A

Abstract

The disclosure provides a method and a device for detecting abnormal POI data, an electronic device, and a storage medium, and relates to the field of data processing, in particular to the field of map data processing. The specific implementation scheme is as follows: for a POI to be detected in a map, the building information corresponding to the POI in a database and the building information corresponding to the POI in the map are each segmented into words to obtain a first word segmentation result and a second word segmentation result; each first word in the first word segmentation result is matched with each second word in the second word segmentation result to obtain the binarization matching results of each first word; and if there is a first word whose binarization matching results are all a second preset value, the building information in the map is determined to be abnormal. The existence of such a first word means that the first word segmentation result differs from the second word segmentation result, that is, the building information corresponding to the POI in the database differs from the building information corresponding to the POI in the map. The POI data in the map can therefore be determined to be abnormal, so that abnormal POI data can be effectively detected.

Description

Abnormal POI data detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to the field of map data processing technologies.
Background
Because of data hanging errors on the map side, that is, building data attached to the wrong POI, the information displayed for a POI (Point of Interest) in the map may be wrong. For example, information about multiple hotels may be displayed at the same POI, or information about the same hotel may be displayed at multiple POI locations.
Disclosure of Invention
The disclosure provides a method, a device, equipment and a storage medium for detecting abnormal POI data, which are used for detecting the abnormal POI data.
According to an aspect of the present disclosure, there is provided a method for detecting abnormal POI data, including:
for a POI to be detected in a map, acquiring reference building information corresponding to the POI in a database, and acquiring hanging building information corresponding to the POI in the map;
performing word segmentation on the reference building information and the hanging building information respectively to obtain a first word segmentation result and a second word segmentation result, wherein the first word segmentation result is the word segmentation result of one of the reference building information and the hanging building information, and the second word segmentation result is the word segmentation result of the other;
for each first word in the first word segmentation result, matching the first word with each second word in the second word segmentation result to obtain the binarization matching results of the first word, wherein each binarization matching result is either a first preset value or a second preset value, the first preset value indicates that the first word and the second word are successfully matched, and the second preset value indicates that the first word and the second word are not successfully matched; and
if an abnormal first word exists in the first word segmentation result, determining that the hanging building information is abnormal POI data, wherein an abnormal first word is a first word whose binarization matching results are all the second preset value.
According to another aspect of the present disclosure, there is provided an apparatus for detecting abnormal POI data, including:
an acquisition module, configured to, for a POI to be detected in a map, acquire reference building information corresponding to the POI in a database and acquire hanging building information corresponding to the POI in the map;
a word segmentation module, configured to perform word segmentation on the reference building information and the hanging building information respectively to obtain a first word segmentation result and a second word segmentation result, wherein the first word segmentation result is the word segmentation result of one of the reference building information and the hanging building information, and the second word segmentation result is the word segmentation result of the other;
a matching module, configured to match each first word in the first word segmentation result with each second word in the second word segmentation result to obtain the binarization matching results of each first word, wherein each binarization matching result is either a first preset value or a second preset value, the first preset value indicates that the first word and the second word are successfully matched, and the second preset value indicates that the first word and the second word are not successfully matched; and
a detection module, configured to determine that the hanging building information is abnormal POI data if an abnormal first word exists in the first word segmentation result, wherein an abnormal first word is a first word whose binarization matching results are all the second preset value.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the above methods for detecting abnormal POI data.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute any one of the above methods for detecting abnormal POI data.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements any one of the above methods for detecting abnormal POI data.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
Fig. 1 is a schematic diagram of a first embodiment of a method for detecting abnormal POI data provided in accordance with the present disclosure;
Fig. 2 is a schematic diagram of a second embodiment of a method for detecting abnormal POI data provided in accordance with the present disclosure;
Fig. 3 is a schematic flowchart of matching a first word segmentation result with a second word segmentation result in the method for detecting abnormal POI data provided in accordance with the present disclosure;
Fig. 4 is a schematic diagram of a third embodiment of a method for detecting abnormal POI data provided in accordance with the present disclosure;
Fig. 5 is a schematic structural diagram of an apparatus for detecting abnormal POI data provided in accordance with the present disclosure;
Fig. 6 is a block diagram of an electronic device for implementing a method for detecting abnormal POI data provided in an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
At present, various types of APPs (applications) access map data for route planning, information display, and the like. For example, the navigation APP performs route planning from a starting location to a destination based on map data, and the travel APP performs hotel information presentation, scenic spot information presentation, and the like based on map data.
Because of data hanging errors on the map side, POI data in the map may differ from the data in the database. The data in the database is generally collected by relevant personnel or uploaded by customers and reflects the real situation, so POI data in the map that differs from the information in the database is abnormal. For example, a POI corresponds to hotel A in the database, but because of a data hanging error the same POI corresponds to hotels A, B, and C in the map.
Abnormal POI data in the map can strongly affect the related APPs. For example, a travel APP usually shows the information of only one hotel for a POI location; if multiple hotels correspond to the same POI location, the hotel information shown by the APP is incomplete, which narrows the user's range of choices and degrades the user experience. It is therefore necessary to detect abnormal POI data in the map and to correct the map based on the detected abnormal POI data, so as to reduce the occurrence of such situations.
In order to detect abnormal POI data in a map, the disclosure provides a method and a device for detecting abnormal POI data, an electronic device, and a storage medium. The method for detecting abnormal POI data provided by the present disclosure is first described below by way of example.
The method for detecting abnormal POI data can be applied to any electronic device capable of detecting abnormal POI data. The electronic device may be a personal computer, a mobile terminal, a server, or the like.
Fig. 1 is a flowchart of the method for detecting abnormal POI data provided by the present disclosure. As shown in fig. 1, the method may specifically include the following steps:
Step S101, for a POI to be detected in a map, acquiring reference building information corresponding to the POI in a database and acquiring hanging building information corresponding to the POI in the map;
Step S102, performing word segmentation on the reference building information and the hanging building information respectively to obtain a first word segmentation result and a second word segmentation result, wherein the first word segmentation result is the word segmentation result of one of the reference building information and the hanging building information, and the second word segmentation result is the word segmentation result of the other;
Step S103, for each first word in the first word segmentation result, matching the first word with each second word in the second word segmentation result to obtain the binarization matching results of the first word, wherein each binarization matching result is either a first preset value or a second preset value, the first preset value indicates that the first word and the second word are successfully matched, and the second preset value indicates that the first word and the second word are not successfully matched;
Step S104, if an abnormal first word exists in the first word segmentation result, determining that the hanging building information is abnormal POI data, wherein an abnormal first word is a first word whose binarization matching results are all the second preset value.
By applying the embodiment of the disclosure, for a POI to be detected in a map, the building information corresponding to the POI in a database and the building information corresponding to the POI in the map are each segmented to obtain a first word segmentation result and a second word segmentation result. Each first word in the first word segmentation result is matched one by one with each second word in the second word segmentation result to obtain the binarization matching results of each first word, where the first preset value indicates a successful match and the second preset value indicates an unsuccessful match. If there is a first word whose binarization matching results are all the second preset value, that first word differs from every second word, so the first word segmentation result differs from the second word segmentation result, and the building information corresponding to the POI in the database differs from the building information corresponding to the POI in the map. The building information corresponding to the POI in the map can therefore be determined to be abnormal. In this way, abnormal POI data can be effectively detected.
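Steps S101 to S104 can be sketched roughly as follows. This is a minimal illustration rather than the disclosed implementation: the function name `find_abnormal_pois`, the use of plain dictionaries keyed by a POI identifier in place of the database and the map interface, and whitespace splitting as the word segmenter are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def find_abnormal_pois(
    database: Dict[str, str],
    map_data: Dict[str, str],
    segment: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Return the ids of POIs whose hanging building information looks abnormal.

    `database` and `map_data` are hypothetical stand-ins that map a POI id to
    reference building information and hanging building information.
    """
    abnormal: List[str] = []
    for poi_id, hanging_info in map_data.items():
        # Step S101: look up the reference building information for this POI.
        reference_info = database.get(poi_id)
        if reference_info is None:
            continue  # no reference record; this case is not covered here

        # Step S102: segment both pieces of building information.
        first_words = segment(reference_info)
        second_words = segment(hanging_info)

        # Steps S103 and S104: a first word is abnormal when it matches
        # none of the second words.
        if any(
            all(first != second for second in second_words)
            for first in first_words
        ):
            abnormal.append(poi_id)
    return abnormal
```

With `database = {"p1": "Sunrise Hotel", "p2": "Lake Inn"}` and `map_data = {"p1": "Sunrise Hotel", "p2": "River Inn"}`, only `"p2"` is reported, because "Lake" matches no word of "River Inn".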
Furthermore, in the related art, the character string of the name or address of the location corresponding to the POI to be detected in the map and the character string of the name or address corresponding to the POI in the database are both converted into vectors, a similarity distance between the two vectors (such as the cosine distance or the Euclidean distance) is calculated, and if the similarity distance is greater than a preset threshold, the POI data in the map is determined to be abnormal. By contrast, when the similarity between two groups of building information is calculated in the present disclosure, the two character strings do not need to be converted into vectors and compared through a distance between vectors; the similarity of the two word segmentation results is obtained by simple word matching and binarization on the word segmentation results corresponding to the two groups of building information. This simplifies the similarity calculation and improves the efficiency of abnormal POI data detection.
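For contrast, the vector-based comparison of the related art might look like the sketch below. The text does not say how the character strings are converted into vectors, so the character-count vectors used here, like the function name `cosine_distance`, are purely an assumption for illustration.

```python
import math
from collections import Counter


def cosine_distance(a: str, b: str) -> float:
    """Cosine distance between character-count vectors of two strings.

    The vectorization (one dimension per character) is an assumption; the
    related art may use any string-to-vector conversion.
    """
    vec_a, vec_b = Counter(a), Counter(b)
    dot = sum(vec_a[ch] * vec_b[ch] for ch in vec_a)
    norm = math.sqrt(sum(v * v for v in vec_a.values())) * math.sqrt(
        sum(v * v for v in vec_b.values())
    )
    # An empty string has no direction, so treat it as maximally distant.
    return 1.0 - dot / norm if norm else 1.0
```

A detector built this way would flag a POI when the distance exceeds a preset threshold, which requires both the vector conversion and a tuned threshold; the word-matching approach of the present disclosure needs neither.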
Steps S101 to S104 are described below by way of example.
The POI to be detected can be any POI in the map. In step S101, for the POI to be detected, the building information corresponding to the POI in the database, including the name and address of the building, may be obtained, and a map interface may be called to acquire the hanging basic information of the POI in the map. The hanging basic information is the basic information of the building hung at the POI location in the map, and may include the name, address, telephone number, and status (open or not open) of the building. The name, address, and the like of the building at the POI location can then be extracted from the hanging basic information. For convenience of description, the building information corresponding to the POI in the database is hereinafter referred to as reference building information, and the building information corresponding to the POI in the map is hereinafter referred to as hanging building information. The building information may be a building name and/or a building address.
Then, it can be determined whether the reference building information and the hanging building information point to the same building, for example, whether they give the same name or the same address. When the two are compared, the reference building information and the hanging building information must be of the same type: if the reference building information is a building name, the hanging building information is also a building name; if the reference building information is a building name and address, the hanging building information is also a building name and address.
The above steps S102 to S104 will be described below by taking the calculation of the similarity between building names as an example.
In step S102, the reference building information and the hanging building information may be segmented into groups of a fixed number of characters, for example, groups of two characters or groups of three characters. For example, if the hotel name "AABC hotel" is segmented into two-character groups, the segments "AA", "AB", "BC", "C wine" and "hotel" are obtained, where "C wine" is a fragment that straddles the boundary between "C" and the word "hotel". This approach is therefore prone to producing meaningless segments.
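A minimal sketch of fixed-length segmentation with a sliding window is shown below. The function name `fixed_length_segments` is hypothetical, and the sample string "AABC酒店" assumes that "hotel" in the original name is the two-character Chinese word 酒店, which is what makes the boundary fragment appear.

```python
from typing import List


def fixed_length_segments(text: str, size: int = 2) -> List[str]:
    """Split text into overlapping groups of `size` consecutive characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if len(text) <= size:
        return [text] if text else []
    # Slide a window of `size` characters one position at a time.
    return [text[i:i + size] for i in range(len(text) - size + 1)]
```

Under that assumption, `fixed_length_segments("AABC酒店")` yields five segments, one of which ("C酒") joins the last letter of the name to the first character of the word for hotel and carries no meaning on its own.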
As another specific embodiment, based on fig. 1 and as shown in fig. 2, the reference building information and the hanging building information may be segmented according to the following steps.
Step S201, performing word segmentation on the reference building information and the hanging building information respectively based on a preset basic word list to obtain first candidate word segments and second candidate word segments;
Step S202, matching the first candidate word segments and the second candidate word segments with the words in a preset stop word list;
Step S203, removing, from the first candidate word segments and the second candidate word segments, the words that are successfully matched with words in the preset stop word list, to obtain the first word segmentation result and the second word segmentation result.
In step S201, the preset basic word list may include words and phrases preset for the names or addresses of buildings. For example, it may include words such as "preference" and "chain", hotel brands, city names, district names, and the like, which may be set in advance according to the actual situation.
When the reference building information and the hanging building information are segmented based on the preset basic word list, the words in the preset basic word list can be matched one by one against the reference building information and the hanging building information to obtain the first candidate word segments and the second candidate word segments. The first candidate word segments are the result of segmenting one of the reference building information and the hanging building information based on the preset basic word list, and the second candidate word segments are the result of segmenting the other.
The first candidate word segments and the second candidate word segments may include both the words in the preset basic word list that are successfully matched in the reference building information and the hanging building information, and the parts of the reference building information and the hanging building information that match no word in the preset basic word list. For example, if the reference building information is "AABC hotel" and only "BC" and "hotel" are matched in the preset basic word list, the obtained word segmentation result is "AA", "BC" and "hotel".
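One way to realize step S201 is sketched below. The text only says that the words of the preset basic word list are matched against the building information, so the left-to-right, longest-word-first strategy, the function name `segment_with_word_list`, and the sample string "AABC酒店" (assuming "hotel" is the Chinese word 酒店) are all assumptions.

```python
from typing import Iterable, List


def segment_with_word_list(text: str, basic_words: Iterable[str]) -> List[str]:
    """Segment text using a word list; unmatched runs are kept as segments."""
    # Try longer words first so the longest listed word wins at each position.
    words = sorted((w for w in set(basic_words) if w), key=len, reverse=True)
    result: List[str] = []
    pending = ""  # consecutive characters not covered by any listed word
    i = 0
    while i < len(text):
        hit = next((w for w in words if text.startswith(w, i)), None)
        if hit is None:
            pending += text[i]
            i += 1
            continue
        if pending:
            result.append(pending)
            pending = ""
        result.append(hit)
        i += len(hit)
    if pending:
        result.append(pending)
    return result
```

For the example in the text, `segment_with_word_list("AABC酒店", ["BC", "酒店"])` keeps the unmatched prefix "AA" as its own segment and returns it together with the two listed words.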
The first candidate word segments and the second candidate word segments can then be filtered based on the preset stop word list. The preset stop word list may include words without specific semantics, such as conjunctions and symbols, for example "and", "·", "-", and the like.
For example, the entries in the preset stop word list may be matched one by one against the first candidate word segments and the second candidate word segments, and the successfully matched words are removed from the first candidate word segments and the second candidate word segments to obtain the first word segmentation result and the second word segmentation result.
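Steps S202 and S203 amount to a simple filter, sketched below. The function name `remove_stop_words` is hypothetical, and exact string equality is assumed as the matching rule.

```python
from typing import Iterable, List


def remove_stop_words(candidates: List[str], stop_words: Iterable[str]) -> List[str]:
    """Drop candidate segments that appear in the stop word list, keeping order."""
    stop = set(stop_words)  # set lookup avoids rescanning the list per segment
    return [word for word in candidates if word not in stop]
```

For instance, filtering the candidates `["A", "and", "B", "-"]` with the stop words `["and", "·", "-"]` leaves `["A", "B"]`.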
It can be understood that hotel names and addresses usually contain fixed collocations, such as "fast hotel", "special hotel", "a city", and the like, and that the preset basic word list includes words preset for hotel names and/or addresses. Therefore, segmenting the reference building information and the hanging building information based on the preset basic word list and the preset stop word list reduces the possibility of meaningless words and symbols appearing in the word segmentation result, for example, the possibility that segmenting "special hotel" yields the meaningless fragment "wine preference", and thereby improves the efficiency of the subsequent similarity calculation.
After the first word segmentation result and the second word segmentation result are obtained, the similarity between them can be calculated; this similarity is the similarity between the reference building information and the hanging building information.
When the similarity between the first word segmentation result and the second word segmentation result is calculated, each first word in the first word segmentation result is matched one by one with each second word in the second word segmentation result to obtain the binarization matching results of each first word. Each binarization matching result is either a first preset value or a second preset value, where the first preset value indicates that the first word and the second word are successfully matched, and the second preset value indicates that the first word and the second word are not successfully matched. For example, the first preset value may be 1, and the second preset value may be 0. Of course, the first preset value and the second preset value may also be other values, which is not specifically limited in this disclosure. A successful match may mean that the first word is identical to the second word.
As a specific embodiment, as shown in fig. 3, each first word and each second word may be matched by the following steps:
Step S301, acquiring a first word from the first word segmentation result.
In this step, the first words may be acquired in any order, or in the order in which they appear in the corresponding building information.
Step S302, for the first word, acquiring a second word from the second word segmentation result.
In this step, the second words may be acquired in any order, or in the order in which they appear in the corresponding building information.
Step S303, matching the second word with the first word.
Step S304, determining whether the second word and the first word are successfully matched; if so, executing step S305; otherwise, executing step S306.
Step S305, determining the binarization matching result of the first word and the second word to be the first preset value, and executing step S309.
Step S306, determining the binarization matching result of the first word and the second word to be the second preset value.
Step S307, determining whether all second words have been acquired; if not, executing step S308; if so, executing step S309.
Here, all second words having been acquired means that every second word in the second word segmentation result has been matched with the first word.
Step S308, acquiring a new second word from the second word segmentation result, and returning to step S303.
Step S309, determining whether all first words have been acquired; if so, ending the process; otherwise, executing step S310.
Step S310, acquiring a new first word from the first word segmentation result, and returning to step S302.
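The flow of steps S301 to S310 can be sketched as two nested loops. The function name `match_first_words` is hypothetical, and 1 and 0 are used as the first and second preset values, following the example values mentioned earlier. Note that, as in the flowchart, a successful match stops the scan of second words for the current first word.

```python
from typing import List


def match_first_words(first_words: List[str], second_words: List[str]) -> List[List[int]]:
    """Return, for each first word, the binarization matching results recorded."""
    results: List[List[int]] = []
    for first in first_words:          # S301 / S310: take the next first word
        row: List[int] = []
        for second in second_words:    # S302 / S308: take the next second word
            if first == second:        # S303, S304: compare the two words
                row.append(1)          # S305: first preset value
                break                  # go to S309 without scanning further
            row.append(0)              # S306: second preset value
        results.append(row)            # S307 exhausted or S305 reached
    return results
```

A first word is abnormal exactly when its row contains no 1, so the early exit after a successful match does not change the outcome of step S104.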
As a specific implementation, the first words in the first word segmentation result and the second words in the second word segmentation result may be matched one by one in matrix form to obtain the binarization matching results of each first word, so as to calculate the similarity between the first word segmentation result and the second word segmentation result. For example, the first word segmentation result and the second word segmentation result may each be treated as a one-dimensional array, the elements of the two arrays are matched one by one, and the matching result of each pair of elements is recorded in a matrix. For example, after the ith word in the array corresponding to the first word segmentation result is matched with the jth word in the array corresponding to the second word segmentation result, the matching result is recorded at row i, column j (or at row j, column i) of the matrix, where i is any positive integer from 1 to N, N is the number of words in the first word segmentation result, j is any positive integer from 1 to M, and M is the number of words in the second word segmentation result.
For example, suppose the first word segmentation result is word 1, word 2, word 3, and the second word segmentation result is word 4, word 5, word 3. Then the array formed by the first word segmentation result is [word 1, word 2, word 3] and the array formed by the second word segmentation result is [word 4, word 5, word 3]. Taking the first preset value as 1 and the second preset value as 0, with rows corresponding to first words and columns to second words, the result of matching the first word segmentation result against the second word segmentation result is:
$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
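The matrix form can be sketched as follows, assuming rows correspond to first words, columns to second words, and 1 and 0 are the first and second preset values. The function names `match_matrix` and `abnormal_first_words` are hypothetical.

```python
from typing import List


def match_matrix(first_words: List[str], second_words: List[str]) -> List[List[int]]:
    """Build the N x M matrix whose entry (i, j) records whether the ith first
    word equals the jth second word."""
    return [
        [1 if first == second else 0 for second in second_words]
        for first in first_words
    ]


def abnormal_first_words(first_words: List[str], second_words: List[str]) -> List[str]:
    """Return the first words whose matrix row contains only the value 0."""
    matrix = match_matrix(first_words, second_words)
    return [word for word, row in zip(first_words, matrix) if not any(row)]
```

For the arrays in the example, the only nonzero entry is in the third row and third column, so word 1 and word 2 are abnormal first words and the hanging building information would be judged abnormal.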
The first word segmentation result may be the word segmentation result of the reference building information or that of the hanging building information, and the second word segmentation result is the other of the two. During matching, the words in the word segmentation result of the reference building information can be matched one by one with the words in the word segmentation result of the hanging building information, or the words in the word segmentation result of the hanging building information can be matched one by one with the words in the word segmentation result of the reference building information.
Depending on the number of words contained in the first word segmentation result and the second word segmentation result, the one-by-one matching described above may differ. The cases are described below:
Case one: the number of words contained in the first word segmentation result is different from the number of words contained in the second word segmentation result.
In this case, suppose that, of the word segmentation result of the reference building information and that of the hanging building information, the one with fewer words is used as the first word segmentation result and the one with more words as the second word segmentation result. Then, even if every first word has a successfully matched second word, the second word segmentation result still contains second words that match no first word, so the two results are not consistent. However, because no first word fails to match, the erroneous conclusion that there is no abnormal POI data may be reached.
Therefore, in this case, the word segmentation result with more words, of the word segmentation result of the reference building information and that of the hanging building information, may be used as the first word segmentation result, and the one with fewer words as the second word segmentation result. Matching each first word in the first word segmentation result with each second word in the second word segmentation result one by one is then equivalent to simultaneously matching each second word with each first word one by one. When no first word in the first word segmentation result fails to match, no second word in the second word segmentation result fails to match either, which indicates that the first word segmentation result is consistent with the second word segmentation result.
Case two: the number of words contained in the first word segmentation result is equal to the number of words contained in the second word segmentation result.
In this case, either the word segmentation result of the reference building information or that of the hanging building information may be used as the first word segmentation result, and the other as the second word segmentation result. Matching each first word in the first word segmentation result with each second word in the second word segmentation result one by one is equivalent to matching each second word with each first word one by one. When no first word in the first word segmentation result fails to match every second word, no second word in the second word segmentation result fails to match every first word, and the two word segmentation results are consistent.
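The choice of which result plays the role of the first word segmentation result in the two cases can be sketched as follows. The function names `order_by_length` and `has_abnormal_first_word` are hypothetical, and the sample word lists are invented for illustration.

```python
from typing import List, Tuple


def order_by_length(
    reference_words: List[str], hanging_words: List[str]
) -> Tuple[List[str], List[str]]:
    """Return (first, second) so that the result with more words comes first.

    When both have the same number of words (case two), either order is
    allowed; this sketch keeps the reference result first.
    """
    if len(hanging_words) > len(reference_words):
        return hanging_words, reference_words
    return reference_words, hanging_words


def has_abnormal_first_word(first_words: List[str], second_words: List[str]) -> bool:
    """True when some first word matches none of the second words."""
    return any(first not in second_words for first in first_words)
```

With reference words `["A", "Hotel"]` and hanging words `["A", "Hotel", "B"]`, using the shorter result as the first word segmentation result finds no abnormal first word and misses the extra "B", whereas `has_abnormal_first_word(*order_by_length(reference, hanging))` detects it.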
When the first words in the first word segmentation result are matched with the words in the second word segmentation result one by one, the words in the two word segmentation results can be matched in any order; for example, they can be matched in the order in which the words appear, or in some other order. The present disclosure does not specifically limit this.
If the binarization matching result of each first word in the first word segmentation result contains the first preset value, it indicates that a second word successfully matched with each first word in the first word segmentation result exists, and indicates that the first word segmentation result is consistent with the second word segmentation result. Therefore, the hitching building information can be determined to be normal POI data.
If a first word whose binarization matching result is the second preset value exists in the first word segmentation result, this indicates that unmatched words exist in the two word segmentation results, that the first word segmentation result is inconsistent with the second word segmentation result, and that the hitching building information is abnormal information. The hitching can therefore be released, that is, the hitched building information corresponding to the POI in the map is deleted, and the release can be recorded in a hitching change record table in the database, so that the wrongly hitched data is prevented from being hitched again. In this way, the correctness of the map POI data can be improved.
For the above-mentioned problem that the hotel information displayed by a travel APP using map data is incomplete because of abnormal POI data in the map, which narrows the user's range of choices, applying this embodiment can improve the accuracy of the map POI data, thereby making the hotel information displayed by the travel APP more comprehensive, increasing the number of hotel quotations, and widening the user's range of choices.
If the number of words included in the first word segmentation result and the second word segmentation result is different, it can also be stated that the two word segmentation results are different and the information of the two buildings is different.
In a practical application scenario, a hotel name may contain repeated words. If the above matching method is adopted, the same second word may be matched multiple times, and abnormal POI data may be mistaken for correct POI data.
Therefore, as a specific implementation manner, the step S302 of obtaining, for the first word, a second word from the second word segmentation result may include:
and aiming at the first word, obtaining a second word from the words to be matched of the second word segmentation result, wherein the words to be matched are all the second words in the second word segmentation result initially.
If the obtained second word is successfully matched with the first word, the second word can be deleted from the words to be matched, so as to avoid repeated matching of the words in the second word segmentation result.
In the step S307, it is determined whether all the second words are obtained, specifically, it is determined whether all the second words in the words to be matched are obtained; if not, executing step S308, and if all the second terms in the terms to be matched are acquired, executing step S309.
In step S308, a new second word is obtained from the second word segmentation result, specifically, the new second word may be obtained from the word to be matched of the second word segmentation result.
Illustratively, the first word segmentation result includes a word A, a word B, and a word C, the second word segmentation result also includes a word A, a word B, and a word C, and the first words are matched with the second words in the order of word A, word B, and word C. The word A in the first word segmentation result is successfully matched with the word A in the second word segmentation result, so the word A can be removed from the second word segmentation result, and the remaining words are used as the words to be matched for subsequent matching. For example, when the word B in the first word segmentation result is matched, it is matched only with the word B and the word C in the second word segmentation result.
As one embodiment, the first and second segmentation result arrays may be maintained separately. Initially, the first segmentation result array includes all the first terms, and the second segmentation result array includes all the second terms. If there is a second word successfully matched with the first word in the matching process, the second word successfully matched with the first word may be deleted from the second word segmentation result array.
Therefore, the possibility of matching errors caused by repeated words contained in the word segmentation result is reduced, and the similarity calculation accuracy is improved, so that the detection accuracy of abnormal POI data is improved.
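The effect of deleting matched words from the words to be matched can be sketched as follows. The sketch assumes that the first preset value is 1 and the second preset value is 0; the text states only that 1 is returned on a hit, so the value 0 and the function name `match_with_removal` are illustrative assumptions.

```python
from typing import List

MATCHED = 1    # first preset value (1 follows the example in the text)
UNMATCHED = 0  # second preset value (assumed; the text does not fix it)


def match_with_removal(first_words: List[str], second_words: List[str]) -> List[int]:
    """Return one binarization matching result per first word.

    A second word that has been matched is deleted from the words to be
    matched, so it cannot be matched again by a repeated first word.
    """
    to_match = list(second_words)  # initially all second words
    results = []
    for word in first_words:
        if word in to_match:
            to_match.remove(word)  # deletes only the first equal occurrence
            results.append(MATCHED)
        else:
            results.append(UNMATCHED)
    return results


# "AA" appears twice in the first result but only once in the second one.
print(match_with_removal(["AA", "AA", "hotel"], ["AA", "BC", "hotel"]))
```

Without the deletion, the second "AA" would be matched against the same second word again and all three results would be 1, hiding the fact that "BC" has no counterpart.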
Because building information comes from many sources, such as data uploaded by merchants and data collected by related personnel, the same building may be described differently by different sources, so that different hotel names with similar meanings may be uploaded for the same hotel. For example, there may be two names "special Hui Jiudian" and "special hotel" for the same hotel. In this case, if the above matching method is adopted, correct POI data may be determined to be abnormal POI data.
Therefore, as a specific embodiment, based on fig. 1, as shown in fig. 2, the first segmentation result and the second segmentation result can be matched through the following steps:
step S204, obtaining synonyms of all second words in the second word segmentation result based on a preset synonym word list;
step S205, for each first word in the first word segmentation result, matching the first word with each second word and the synonyms of each second word to obtain a binarization matching result of each first word, where the binarization matching result includes a first preset value and a second preset value, the first preset value indicates that the first word is the same as the second word or a synonym of the second word, and the second preset value indicates that the first word is different from both the second word and the synonyms of the second word.
In step S204, the synonym table may be preset according to actual situations. For example, "offer" and "special offer" may be set as synonyms for each other.
By combining the synonym vocabulary to match the first segmentation result and the second segmentation result, misjudgment possibly caused by different building names due to different data sources can be avoided to a certain extent, and the accuracy of detecting abnormal POI data is improved.
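Steps S204 and S205 can be sketched as follows. The synonym table is modeled here as a dictionary from a word to a set of synonyms, which is an assumption about its form; the function name `match_with_synonyms` and the values 1 and 0 for the two preset values are likewise illustrative.

```python
from typing import Dict, List, Set


def match_with_synonyms(
    first_words: List[str],
    second_words: List[str],
    synonym_table: Dict[str, Set[str]],
) -> List[int]:
    """Return 1 for a first word equal to a second word or to one of its synonyms, else 0."""
    # Step S204: collect every second word together with its synonyms.
    candidates: Set[str] = set()
    for second in second_words:
        candidates.add(second)
        candidates |= synonym_table.get(second, set())
    # Step S205: binarization matching result for each first word.
    return [1 if first in candidates else 0 for first in first_words]


# "offer" and "special offer" are set as synonyms of each other, as in the text.
table = {"offer": {"special offer"}, "special offer": {"offer"}}
print(match_with_synonyms(["special offer", "hotel"], ["offer", "hotel"], table))
```

This sketch leaves out the deletion of matched words from the words to be matched; the two refinements are independent and can be combined.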
As shown in fig. 4, fig. 4 is a schematic flowchart of a specific embodiment of a method for detecting abnormal POI data provided in the present disclosure, which may include the following steps:
step S401, obtaining a hotel name corresponding to a target POI from a database, calling a map interface, obtaining hitching basic information of the target POI in a map, and extracting the hotel name corresponding to the target POI from the hitching basic information.
Illustratively, the hotel name corresponding to the target POI obtained from the database is "AABC hotel", and the hotel name corresponding to the target POI obtained from the map is "DEBC hotel".
And step S402, performing word segmentation processing on the two hotel names respectively.
Illustratively, the "AABC hotel" and the "DEBC hotel" are segmented based on a preset basic word list, and stop words in the obtained segmentation results are filtered out based on a preset stop word list, so that two word segmentation results are obtained. The word segmentation result of the "AABC hotel" is "AA", "BC", and "hotel", and the word segmentation result of the "DEBC hotel" is "DE", "BC", and "hotel".
And S403, performing similarity calculation on the two word segmentation results, and returning 1 when a synonym is hit.
Illustratively, synonyms for "DE", "BC", and "hotel" are obtained in conjunction with the synonym table. The words "AA", "BC", and "hotel" contained in the word segmentation result of the "AABC hotel" are matched one by one with "DE", "BC", "hotel", and their synonyms, and if a match succeeds, the similarity between the corresponding words is recorded as 1. The overall similarity is calculated based on the word matching results. Specifically, if all the words are successfully matched, the overall similarity between the first word segmentation result and the second word segmentation result is 1.
The above steps S401-S403 are directed to the hotel name in one database and the hotel name in one map at a time. If the target POI in the map corresponds to multiple hotel names, steps S401-S403 may be performed multiple times.
And S404, screening out hotels with similarity not 1.
And S405, issuing a hitching release event to the database, namely releasing the hitching between the DEBC hotel and the target POI in the database, that is, deleting the DEBC hotel information on the target POI in the map. And recording the release in a hitching change record table in the database, to prevent the wrongly hitched data from being hitched again, for example, to prevent the DEBC hotel information from being hitched to the target POI again.
By applying the embodiment of the disclosure, abnormal POI data in the map is detected and deleted. This effectively reduces the probability that a plurality of hotels are hitched to the same POI, improves the hitching accuracy of hotels, avoids hotel redundancy caused by POI hitching, increases the diversity of hotel price information displayed by using the map, widens the user's choices, and enhances the user experience.
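Steps S404 and S405 can be sketched as follows, assuming an in-memory stand-in for the database and the map. The class `HitchStore`, the function `release_abnormal`, and the POI identifier "poi-1" are hypothetical, and the similarity function is passed in as a parameter; the usage lines plug in a plain string comparison instead of the word-level matching described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class HitchStore:
    """In-memory stand-in for the hitching data and the hitching change record table."""
    hitched: Set[Tuple[str, str]] = field(default_factory=set)      # (poi_id, hotel_name)
    change_log: Set[Tuple[str, str]] = field(default_factory=set)   # released hitchings

    def hitch(self, poi_id: str, hotel_name: str) -> bool:
        # A hitching that was released before is refused, so it is not repeated.
        if (poi_id, hotel_name) in self.change_log:
            return False
        self.hitched.add((poi_id, hotel_name))
        return True

    def release(self, poi_id: str, hotel_name: str) -> None:
        self.hitched.discard((poi_id, hotel_name))
        self.change_log.add((poi_id, hotel_name))


def release_abnormal(
    store: HitchStore,
    poi_id: str,
    reference_name: str,
    similarity: Callable[[str, str], int],
) -> List[str]:
    """Release every hotel hitched to `poi_id` whose similarity to the reference name is not 1."""
    released = []
    for pid, name in sorted(store.hitched):  # sorted() copies, so releasing inside the loop is safe
        if pid == poi_id and similarity(reference_name, name) != 1:
            store.release(pid, name)
            released.append(name)
    return released


store = HitchStore()
store.hitch("poi-1", "AABC hotel")
store.hitch("poi-1", "DEBC hotel")
exact = lambda a, b: 1 if a == b else 0  # placeholder similarity
print(release_abnormal(store, "poi-1", "AABC hotel", exact))
print(store.hitch("poi-1", "DEBC hotel"))
```

Keeping the released pairs in a separate record is what lets a later hitching attempt for the same hotel and POI be rejected.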
As shown in fig. 5, fig. 5 is a schematic structural diagram of an apparatus for detecting abnormal POI data according to the present disclosure. The apparatus may include:
the acquiring module 501 is configured to acquire, for a POI to be detected in a map, reference building information corresponding to the POI in a database and hitching building information corresponding to the POI in the map;
a word segmentation module 502, configured to perform word segmentation on the reference building information and the hitching building information respectively to obtain a first word segmentation result and a second word segmentation result; wherein the first word segmentation result is a word segmentation result of one of the reference building information and the hitching building information; the second word segmentation result is the word segmentation result of the other one of the reference building information and the hitching building information;
a matching module 503, configured to match, for each first word in the first word segmentation result, the first word with each second word in the second word segmentation result to obtain a binarization matching result of each first word, where the binarization matching result includes a first preset value and a second preset value, the first preset value indicates that the first word and the second word are successfully matched, and the second preset value indicates that the first word and the second word are not successfully matched;
a detection module 504, configured to determine that the hitching building information is abnormal POI data if an abnormal first word exists in the first word segmentation result, where the abnormal first word is a first word whose binarization matching result is a second preset value.
In a possible embodiment, the matching, for each first term in the first segmentation result, the first term with each second term in the second segmentation result to obtain a binarization matching result of each first term includes:
obtaining synonyms of all second words in the second word segmentation result based on a preset synonym word table;
and for each first word in the first word segmentation result, matching the first word with each second word and the synonyms of each second word to obtain a binarization matching result of each first word, wherein the binarization matching result comprises a first preset value and a second preset value, the first preset value indicates that the first word is the same as the second word or a synonym of the second word, and the second preset value indicates that the first word is different from both the second word and the synonyms of the second word.
In a possible embodiment, the matching, for each first term in the first segmentation result, the first term with each second term in the second segmentation result to obtain a binarization matching result of each first term includes:
obtaining a first word from the first word segmentation result;
aiming at the first word, obtaining a second word from the second word segmentation result;
matching the second term with the first term;
if the second word and the first word are successfully matched, determining a binarization matching result of the first word and the second word as a first preset value, and if the second word and the first word are not successfully matched, determining a binarization matching result of the first word and the second word as a second preset value;
acquiring a new second word from the second word segmentation result, and returning to execute the step of matching the second word with the first word until the second word is successfully matched with the first word, or all the second words in the second word segmentation result aiming at the first word are acquired;
and acquiring a new first word from the first word segmentation result, and returning to execute the step of acquiring a second word from the second word segmentation result aiming at the first word until all the first words in the first word segmentation result are acquired.
In a possible embodiment, the obtaining, for the first word, a second word from the second word segmentation result includes:
aiming at the first word, obtaining a second word from the words to be matched of the second word segmentation result, wherein the words to be matched are all the second words in the second word segmentation result initially;
the matching module is used for deleting the second word from the words to be matched if the second word is successfully matched with the first word;
the obtaining a new second word from the second word segmentation result, and returning to perform the step of matching the second word with the first word until all second words in the second word segmentation result for the first word are obtained includes:
and acquiring a new second word from the words to be matched of the second word segmentation result, and returning to execute the step of matching the second word with the first word until all second words in the words to be matched of the second word segmentation result aiming at the first word are acquired.
In a possible embodiment, the segmenting the reference building information and the hitching building information respectively to obtain a first segmentation result and a second segmentation result includes:
segmenting the reference building information and the hitching building information respectively based on a preset basic word list to obtain a first candidate word segmentation and a second candidate word segmentation;
matching the first candidate word segmentation and the second candidate word segmentation with words in a preset stop word list;
and eliminating words which are successfully matched with the words in the preset stop word list in the first candidate word segmentation and the second candidate word segmentation to obtain a first word segmentation result and a second word segmentation result.
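The segmentation and stop word filtering described above can be sketched as follows. The text does not say how the basic word list is applied, so forward maximum matching is used here purely as an assumption; the function names `segment` and `remove_stop_words` and the stop word "of" are hypothetical.

```python
from typing import List, Set


def segment(text: str, base_words: Set[str]) -> List[str]:
    """Greedy forward maximum matching against a basic word list (assumed technique).

    Whitespace is dropped first; a character that starts no listed word
    becomes a one-character candidate word.
    """
    text = "".join(text.split())
    max_len = max((len(word) for word in base_words), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first and shrink until one is accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in base_words:
                words.append(piece)
                i += size
                break
    return words


def remove_stop_words(candidates: List[str], stop_words: Set[str]) -> List[str]:
    """Eliminate candidate words that match a word in the stop word list."""
    return [word for word in candidates if word not in stop_words]


base = {"AA", "BC", "DE", "hotel", "of"}
print(remove_stop_words(segment("AABC hotel", base), {"of"}))
print(remove_stop_words(segment("DE of BC hotel", base), {"of"}))
```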
In a possible embodiment, the detection module is configured to determine that the hitched building information is normal POI data if the binarization matching result of each first term in the first segmentation result includes the first preset value.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information of the users involved all comply with relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, and the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 executes the respective methods and processes described above, such as the method for detecting abnormal POI data. For example, in some embodiments, the method for detecting abnormal POI data may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method for detecting abnormal POI data described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the method for detecting abnormal POI data in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combining a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A method for detecting abnormal POI data comprises the following steps:
aiming at a POI to be detected in a map, acquiring reference building information corresponding to the POI in a database and acquiring hitching building information corresponding to the POI in the map;
performing word segmentation on the reference building information and the hitching building information respectively to obtain a first word segmentation result and a second word segmentation result; wherein the first word segmentation result is a word segmentation result of one of the reference building information and the hitching building information; the second word segmentation result is the word segmentation result of the other one of the reference building information and the hitching building information;
for each first word in the first word segmentation result, matching the first word with each second word in the second word segmentation result to obtain a binarization matching result of each first word, wherein the binarization matching result comprises a first preset value and a second preset value, the first preset value indicates that the first word and the second word are successfully matched, and the second preset value indicates that the first word and the second word are not successfully matched;
and if an abnormal first word exists in the first word segmentation result, determining that the hitching building information is abnormal POI data, wherein the abnormal first word is a first word of which the binarization matching results are all the second preset value.
2. The method of claim 1, wherein the matching, for each first word in the first segmentation result, the first word with each second word in the second segmentation result to obtain a binarization matching result for each first word comprises:
obtaining synonyms of all second words in the second word segmentation result based on a preset synonym word table;
and for each first word in the first word segmentation result, matching the first word with each second word and the synonyms of each second word to obtain a binarization matching result of each first word, wherein the binarization matching result comprises a first preset value and a second preset value, the first preset value indicates that the first word is the same as the second word or a synonym of the second word, and the second preset value indicates that the first word is different from both the second word and the synonyms of the second word.
3. The method of claim 1, wherein the matching, for each first word in the first segmentation result, the first word with each second word in the second segmentation result to obtain a binarization matching result for each first word comprises:
obtaining a first word from the first word segmentation result;
aiming at the first word, obtaining a second word from the second word segmentation result;
matching the second term with the first term;
if the second word and the first word are successfully matched, determining a binarization matching result of the first word and the second word as a first preset value, and if the second word and the first word are not successfully matched, determining a binarization matching result of the first word and the second word as a second preset value;
acquiring a new second word from the second word segmentation result, and returning to execute the step of matching the second word with the first word until the second word is successfully matched with the first word, or all the second words in the second word segmentation result aiming at the first word are acquired;
and acquiring a new first word from the first word segmentation result, and returning to execute the step of acquiring a second word from the second word segmentation result aiming at the first word until all the first words in the first word segmentation result are acquired.
4. The method of claim 3, wherein the obtaining, for the first term, a second term from the second segmentation result comprises:
aiming at the first word, obtaining a second word from the words to be matched of the second word segmentation result, wherein the words to be matched are all the second words in the second word segmentation result initially;
the method further comprises the following steps:
if the second word is successfully matched with the first word, deleting the second word from the word to be matched;
the obtaining a new second word from the second word segmentation result, and returning to perform the step of matching the second word with the first word until all second words in the second word segmentation result for the first word are obtained includes:
and acquiring a new second word from the words to be matched of the second word segmentation result, and returning to execute the step of matching the second word with the first word until all second words in the words to be matched of the second word segmentation result aiming at the first word are acquired.
5. The method of claim 1, wherein the segmenting the reference building information and the hitching building information to obtain a first segmentation result and a second segmentation result comprises:
segmenting the reference building information and the hitching building information respectively based on a preset basic word list to obtain a first candidate word segmentation and a second candidate word segmentation;
matching the first candidate word segmentation and the second candidate word segmentation with words in a preset stop word list;
and eliminating words which are successfully matched with the words in the preset stop word list in the first candidate word segmentation and the second candidate word segmentation to obtain a first word segmentation result and a second word segmentation result.
6. The method of claim 1, further comprising:
and if the binarization matching result of each first word in the first word segmentation result contains the first preset value, determining that the hitching building information is normal POI data.
7. An apparatus for detecting abnormal POI data, comprising:
the acquisition module is used for acquiring, for a POI to be detected in a map, reference building information corresponding to the POI in a database and hitching building information corresponding to the POI in the map;
the word segmentation module is used for segmenting words of the reference building information and the hitching building information respectively to obtain a first word segmentation result and a second word segmentation result; wherein the first word segmentation result is a word segmentation result of one of the reference building information and the hitching building information; the second word segmentation result is the word segmentation result of the other one of the reference building information and the hitching building information;
the matching module is used for matching each first word in the first word segmentation result with each second word in the second word segmentation result to obtain a binarization matching result of each first word, wherein the binarization matching result comprises a first preset value and a second preset value, the first preset value indicates that the first word and the second word are successfully matched, and the second preset value indicates that the first word and the second word are not successfully matched;
and the detection module is used for determining that the articulated building information is abnormal POI data if an abnormal first word exists in the first word segmentation result, wherein an abnormal first word is a first word whose binarization matching results are all the second preset value.
8. The apparatus of claim 7, wherein the matching, for each first word in the first segmentation result, the first word with each second word in the second segmentation result to obtain a binarization matching result for each first word comprises:
obtaining synonyms of all second words in the second word segmentation result based on a preset synonym word table;
and for each first word in the first word segmentation result, matching the first word with each second word and the synonyms of the second words to obtain a binarization matching result of each first word, wherein the binarization matching result comprises a first preset value and a second preset value, the first preset value indicates that the first word is the same as a second word or a synonym of a second word, and the second preset value indicates that the first word is different from the second words and from the synonyms of the second words.
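Synonym-aware matching as described in this claim can be sketched as below. The function name, the dictionary-shaped synonym table, and the 1/0 encoding of the two preset values are assumptions made for illustration.

```python
def match_with_synonyms(first_words, second_words, synonym_table):
    """Binarize each first word against the second words and their synonyms.

    synonym_table is a hypothetical mapping from a word to an iterable
    of its synonyms. Returns a list of (first_word, value) pairs, where
    1 stands for the first preset value and 0 for the second.
    """
    # Collect the second words together with all of their synonyms.
    accepted = set(second_words)
    for word in second_words:
        accepted.update(synonym_table.get(word, ()))
    return [(w, 1 if w in accepted else 0) for w in first_words]
```

With `second_words = ["building", "square"]` and a table that lists "tower" as a synonym of "building", the first word "tower" is marked as matched while "plaza" is not.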
9. The apparatus of claim 7, wherein the matching, for each first word in the first segmentation result, the first word with each second word in the second segmentation result to obtain a binarization matching result for each first word comprises:
obtaining a first word from the first word segmentation result;
for the first word, obtaining a second word from the second word segmentation result;
matching the second word with the first word;
if the second word and the first word are successfully matched, determining a binarization matching result of the first word and the second word as a first preset value, and if the second word and the first word are not successfully matched, determining a binarization matching result of the first word and the second word as a second preset value;
acquiring a new second word from the second word segmentation result, and returning to the step of matching the second word with the first word until the second word is successfully matched with the first word or all the second words in the second word segmentation result have been acquired for the first word;
and acquiring a new first word from the first word segmentation result, and returning to the step of obtaining, for the first word, a second word from the second word segmentation result until all the first words in the first word segmentation result have been acquired.
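The iteration in this claim is a nested loop that stops scanning second words as soon as one matches. A minimal sketch, assuming exact string equality as the matching test and 1/0 as the two preset values (both are illustrative choices, as is the `match_all` name):

```python
def match_all(first_words, second_words, is_match=lambda a, b: a == b):
    """Return {first_word: [binarization results]} for every first word."""
    results = {}
    for first in first_words:           # acquire a new first word
        row = []
        for second in second_words:     # acquire a new second word
            matched = is_match(first, second)
            row.append(1 if matched else 0)
            if matched:
                break                   # stop once a match succeeds
        results[first] = row
    return results
```

For `match_all(["a", "b"], ["x", "a", "y"])`, the row for "a" ends at the first success, giving `[0, 1]`, while "b" is compared with every second word, giving `[0, 0, 0]`.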
10. The apparatus of claim 9, wherein the obtaining, for the first word, a second word from the second word segmentation result comprises:
for the first word, obtaining a second word from the words to be matched of the second word segmentation result, wherein the words to be matched are initially all the second words in the second word segmentation result;
the matching module is used for deleting the second word from the words to be matched if the second word is successfully matched with the first word;
the acquiring a new second word from the second word segmentation result, and returning to the step of matching the second word with the first word until all second words in the second word segmentation result have been acquired for the first word, comprises:
and acquiring a new second word from the words to be matched of the second word segmentation result, and returning to the step of matching the second word with the first word until all second words among the words to be matched of the second word segmentation result have been acquired for the first word.
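Deleting a matched second word from the words to be matched means each second word can pair with at most one first word. The sketch below assumes exact equality as the matching test, 1/0 as the preset values, and the stopping condition of claim 9 (stop at the first success); the function and variable names are hypothetical.

```python
def match_with_pool(first_words, second_words):
    """Match first words against a shrinking pool of second words."""
    pending = list(second_words)  # words to be matched: initially all second words
    results = []
    for first in first_words:
        row = []
        # Iterate over a snapshot because pending may shrink on a match.
        for second in list(pending):
            if first == second:
                row.append(1)
                pending.remove(second)  # a matched word cannot be reused
                break
            row.append(0)
        results.append((first, row))
    return results
```

For first words `["tower", "tower"]` and second words `["tower", "plaza"]`, the first "tower" consumes the only matching second word, so the second "tower" gets `[0]` and would count as an abnormal first word. Without the pool, both would have matched.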
11. The apparatus of claim 7, wherein the segmenting the reference building information and the articulated building information, respectively, to obtain a first word segmentation result and a second word segmentation result comprises:
segmenting the reference building information and the articulated building information respectively based on a preset basic word list to obtain a first candidate word segmentation and a second candidate word segmentation;
matching the first candidate word segmentation and the second candidate word segmentation with words in a preset stop word list;
and eliminating, from the first candidate word segmentation and the second candidate word segmentation, the words that successfully match words in the preset stop word list, to obtain a first word segmentation result and a second word segmentation result.
12. The apparatus of claim 7, wherein the detection module is used for determining that the articulated building information is normal POI data if the binarization matching result of each first word in the first word segmentation result contains the first preset value.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202211448123.2A 2022-11-18 2022-11-18 Abnormal POI data detection method and device, electronic equipment and storage medium Pending CN115905582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211448123.2A CN115905582A (en) 2022-11-18 2022-11-18 Abnormal POI data detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211448123.2A CN115905582A (en) 2022-11-18 2022-11-18 Abnormal POI data detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115905582A (en) 2023-04-04

Family

ID=86483577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211448123.2A Pending CN115905582A (en) 2022-11-18 2022-11-18 Abnormal POI data detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115905582A (en)

Similar Documents

Publication Publication Date Title
WO2020168750A1 (en) Address information standardization method and apparatus, computer device and storage medium
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
US20220128372A1 (en) Method for path planning, electronic device and storage medium
CN111062208B (en) File auditing method, device, equipment and storage medium
US20220188292A1 (en) Data processing method, apparatus, electronic device and readable storage medium
CN115600592A (en) Method, device, equipment and medium for extracting key information of text content
CN115759100A (en) Data processing method, device, equipment and medium
CN115905582A (en) Abnormal POI data detection method and device, electronic equipment and storage medium
CN111339776B (en) Resume parsing method and device, electronic equipment and computer-readable storage medium
CN114492370A (en) Webpage identification method and device, electronic equipment and medium
CN114417862A (en) Text matching method, and training method and device of text matching model
CN114116688A (en) Data processing and data quality inspection method, device and readable storage medium
CN113850072A (en) Text emotion analysis method, emotion analysis model training method, device, equipment and medium
CN113961672A (en) Information labeling method and device, electronic equipment and storage medium
CN113051926A (en) Text extraction method, equipment and storage medium
CN114328687B (en) Event extraction model training method and device and event extraction method and device
CN115659067A (en) POI processing method and device, electronic equipment and storage medium
CN116521866A (en) Training sample construction method and device, electronic equipment and medium
CN115512146A (en) POI information mining method, device, equipment and storage medium
CN114201568A (en) Information processing method, generating method, device, electronic equipment and storage medium
CN115828915A (en) Entity disambiguation method, apparatus, electronic device and storage medium
CN115292444A (en) Text data screening method and device, electronic equipment and storage medium
CN116049335A (en) POI classification and model training method, device, equipment and storage medium
CN116052188A (en) Form detection method, form detection device, electronic equipment and storage medium
CN114036263A (en) Website identification method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination