CN112910674B - Physical site screening method and device, electronic equipment and storage medium - Google Patents


Publication number
CN112910674B
Authority
CN
China
Prior art keywords
logical
word
site
station
bag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911227234.9A
Other languages
Chinese (zh)
Other versions
CN112910674A (en)
Inventor
梁童
邱超
赵培
刘庆
姜书敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Design Institute Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Design Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Design Institute Co Ltd
Priority to CN201911227234.9A
Publication of CN112910674A
Application granted
Publication of CN112910674B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a physical site screening method and device, electronic equipment, and a storage medium. The method comprises: if the similarity between the name of a first logical site and the name of a second logical site is greater than a first preset threshold, calculating the distance between the first logical site and the second logical site; and if that distance is smaller than a second preset threshold, determining that the first logical site and the second logical site belong to the same physical site. The method, device, electronic equipment, and storage medium identify the similarity of logical site names with an artificial intelligence algorithm and then use the longitude and latitude of the logical sites to screen out the logical sites that belong to the same physical site, which improves the efficiency and accuracy of physical site screening.

Description

Physical site screening method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for screening a physical site, an electronic device, and a storage medium.
Background
A base station is a wireless signal transmitting device. Base stations are divided by generation into 2G, 3G, 4G, and 5G base stations and so on, and base stations of the same generation include base stations operating in different frequency bands. Usually, several base stations of different types are installed in one machine room. A physical site refers to the machine room site itself: regardless of the number and configuration of the base stations in the machine room, it counts as one physical site. A logical site refers to an individual base station in the machine room.
In the prior art, physical sites are generally classified in one of two ways: 1. A dedicated physical site label is added to each logical site through association processing, and logical sites with the same label are classified into the same physical site. 2. Longitude and latitude matching is used: a fault-tolerant distance is set, the distance between two logical sites is calculated, and two logical sites whose distance is smaller than a preset threshold are classified into the same physical site.
However, classifying physical sites by means of logical site labels requires the data to be labeled manually in a standardized manner before the physical sites are classified through database processing, so the manual workload is large. Longitude and latitude matching by traversal, in turn, is inefficient and computationally complex, and the accuracy with which the longitude and latitude were collected strongly affects the matching result.
Disclosure of Invention
The embodiment of the invention provides a physical site screening method and device, electronic equipment and a storage medium, which are used for solving the technical problems of low physical site screening efficiency and low physical site screening precision in the prior art.
In order to solve the foregoing technical problem, an embodiment of the present invention provides a method for screening physical sites, where the method includes:
if the similarity between the name of a first logical site and the name of a second logical site is greater than a first preset threshold value, calculating the distance between the first logical site and the second logical site;
and if the distance between the first logical site and the second logical site is smaller than a second preset threshold, determining that the first logical site and the second logical site belong to the same physical site.
Further, before calculating the distance between the first logical site and the second logical site, the method further includes:
performing text word segmentation on the name of the first logical site to obtain a first word segmentation result; performing text word segmentation on the name of the second logical site to obtain a second word segmentation result;
generating a first bag-of-words vector by utilizing a pre-constructed bag-of-words model based on the first word segmentation result; generating a second bag-of-words vector by using the bag-of-words model based on the second word segmentation result;
and calculating cosine values of the first bag-of-words vector and the second bag-of-words vector, and taking the cosine values as the similarity of the name of the first logical site and the name of the second logical site.
Further, the specific steps of constructing the bag-of-words model are as follows:
acquiring the names of all logical stations in a preset area;
performing text word segmentation on the name of each logical site to obtain a plurality of words;
deduplicating all the words, serializing them, and giving each word a unique identification code;
calculating the word frequency inverse document frequency value of each word, wherein the word frequency inverse document frequency value is the product of the word frequency value of the word and the inverse document frequency value of the word;
and constructing the bag-of-words model according to the identification code and the word frequency inverse document frequency value of each word.
Further, the generating a first bag-of-words vector by using a pre-constructed bag-of-words model based on the first segmentation result specifically includes:
extracting a word frequency inverse document frequency value corresponding to each word in the first word segmentation result from the word bag model; the bag-of-words model comprises identification codes of a plurality of words and word frequency inverse document frequency values corresponding to each word;
and generating the first bag-of-words vector based on the word frequency inverse document frequency value corresponding to each word in the first word segmentation result.
Further, the text word segmentation of the name of each logical site specifically includes:
if the current word is a registered word, generating a directed acyclic graph of the possible word formations, searching for the maximum probability path by dynamic programming, and determining the most probable segmentation combination based on word frequency;
if the current word is an unknown word, generating a probability model of Chinese character word formation by a hidden Markov model method, and determining the segmentation combination with the maximum word formation sequence probability.
Further, before the serializing all the words after the deduplication, the method further includes:
and optimizing the word segmentation result by using a user-defined word list and a stop word list.
Further, after the calculating the distance between the first logical site and the second logical site, the method further includes:
and if the distance between the first logical site and the second logical site is greater than or equal to the second preset threshold, and the similarity between the name of the first logical site and the name of the second logical site is greater than a third preset threshold, prompting a check of the longitude and latitude of the first logical site and of the second logical site, wherein the third preset threshold is greater than the first preset threshold.
In another aspect, an embodiment of the present invention provides a physical site screening apparatus, including:
a calculation module, used for calculating the distance between a first logical site and a second logical site if the similarity between the name of the first logical site and the name of the second logical site is greater than a first preset threshold;
and a screening module, used for determining that the first logical site and the second logical site belong to the same physical site if the distance between the first logical site and the second logical site is smaller than a second preset threshold.
In another aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In yet another aspect, the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the above method.
According to the physical site screening method, the physical site screening device, the electronic equipment and the storage medium, the similarity of the names of the logical sites is identified by adopting an artificial intelligence algorithm, and the logical sites belonging to the same physical site are screened by combining the longitude and the latitude of the logical sites, so that the efficiency and the accuracy of physical site screening are improved.
Drawings
Fig. 1 is a schematic diagram of a physical site screening method according to an embodiment of the present invention;
Fig. 2 is a logic flow diagram of physical site screening according to an embodiment of the present invention;
Fig. 3 is a logic flow diagram for calculating site name similarity according to an embodiment of the present invention;
Fig. 4 is a logic flow diagram for constructing a bag-of-words model according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a physical site screening apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of a physical site screening method according to an embodiment of the present invention, and as shown in fig. 1, an execution subject of the physical site screening method according to the embodiment of the present invention is a physical site screening apparatus. The method comprises the following steps:
Step S101, if the similarity between the name of the first logical site and the name of the second logical site is greater than a first preset threshold, calculating the distance between the first logical site and the second logical site.
Specifically, fig. 2 is a flowchart of a physical site screening logic provided by an embodiment of the present invention, and as shown in fig. 2, when performing physical site screening, a screening range is first determined, where the screening range is usually in units of a certain administrative division, for example, in units of districts.
A text processing algorithm based on artificial intelligence is then used to judge, for any two logical sites within the screening range, whether the similarity of their names is greater than a first preset threshold, which can be set according to specific conditions, for example to 80%. The similarity characterizes how alike the names of the two logical sites are: the larger the value, the more similar the names.
If the similarity between the name of the first logical site and the name of the second logical site is greater than the first preset threshold, the distance between the first logical site and the second logical site is further calculated from their longitude and latitude.
If the similarity between the name of the first logical site and the name of the second logical site is less than or equal to the first preset threshold, the similarity of the names of the next pair of logical sites is judged.
Step S102, if the distance between the first logical site and the second logical site is smaller than a second preset threshold, determining that the first logical site and the second logical site belong to the same physical site.
Specifically, after the distance between the first logical site and the second logical site is calculated from their longitude and latitude, it is compared with a second preset threshold, which may be set according to specific conditions, for example to 50 meters.
If the distance between the first logical site and the second logical site is smaller than the second preset threshold, it is determined that the first logical site and the second logical site belong to the same physical site.
If the distance between the first logical site and the second logical site is greater than or equal to the second preset threshold, it is determined that the first logical site and the second logical site do not belong to the same physical site.
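Steps S101 and S102 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the haversine formula is one common way to obtain a distance from longitude and latitude (the text does not name a formula), the dictionary keys `id`, `name`, `lat`, and `lon` are hypothetical, and `name_similarity` stands for whatever name similarity function is used.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumption of this sketch)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def screen_pairs(sites, name_similarity, sim_threshold=0.8, dist_threshold_m=50.0):
    """Return the pairs of site ids judged to belong to the same physical site.

    sites: list of dicts with the hypothetical keys "id", "name", "lat", "lon".
    name_similarity: any callable returning a similarity in [0, 1].
    """
    same_site = []
    for a, b in combinations(sites, 2):
        # Step S101: compute the distance only when the names are similar enough.
        if name_similarity(a["name"], b["name"]) <= sim_threshold:
            continue
        distance = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
        # Step S102: a distance below the threshold means the same physical site.
        if distance < dist_threshold_m:
            same_site.append((a["id"], b["id"]))
    return same_site
```

Checking the cheap name comparison first means the distance is computed only for pairs that already look related, which is the ordering the two steps describe.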
According to the physical site screening method provided by the embodiment of the invention, the similarity of the names of the logical sites is identified by adopting an artificial intelligence algorithm, and then the logical sites belonging to the same physical site are screened by combining the longitudes and latitudes of the logical sites, so that the efficiency and the accuracy of physical site screening are improved.
Based on any of the above embodiments, further before the calculating the distance between the first logical site and the second logical site, the method further includes:
performing text word segmentation on the name of the first logical site to obtain a first word segmentation result; performing text word segmentation on the name of the second logical site to obtain a second word segmentation result;
generating a first bag-of-words vector by utilizing a pre-constructed bag-of-words model based on the first word segmentation result; generating a second bag-of-words vector by using the bag-of-words model based on the second word segmentation result;
and calculating cosine values of the first bag-of-words vector and the second bag-of-words vector, and taking the cosine values as the similarity of the name of the first logical site and the name of the second logical site.
Specifically, fig. 3 is a logic flow diagram for calculating site name similarity according to the embodiment of the present invention. As shown in fig. 3, when calculating the similarity between the name of the first logical site and the name of the second logical site, the texts of the two new site names, that is, the names of the first logical site and the second logical site, are first obtained.
Then, a text word segmentation algorithm is applied to the name of the first logical site and to the name of the second logical site. For example, the Jieba Chinese word segmentation component for Python may be used. Segmenting the name of the first logical site yields a first word segmentation result, and segmenting the name of the second logical site yields a second word segmentation result.
Then, a first bag-of-words vector is generated by using a pre-constructed bag-of-words (Bagofwords) model based on the first segmentation result, and a second bag-of-words vector is generated by using the bag-of-words model based on the second segmentation result.
Generating a first bag-of-words vector by utilizing a pre-constructed bag-of-words model based on the first word segmentation result, which specifically comprises the following steps:
mapping through the bag-of-words model, and extracting from it the word frequency inverse document frequency value corresponding to each word in the first word segmentation result; the bag-of-words model comprises the identification codes of a plurality of words and the word frequency inverse document frequency value corresponding to each word.
The extracted word frequency inverse document frequency value of each word in the first word segmentation result is then assigned to the corresponding position of the bag-of-words vector, and the remaining positions are filled with 0, which generates the first bag-of-words vector. The length of the first bag-of-words vector equals the number of words in the bag-of-words model.
The method for generating the second bag-of-words vector using the bag-of-words model based on the second segmentation result is the same as the above method.
For example, suppose the bag-of-words model includes five words, namely word 1, word 2, word 3, word 4, and word 5, whose word frequency inverse document frequency values are value1, value2, value3, value4, and value5, respectively. The first word segmentation result comprises two words, word 1 and word 3. The second word segmentation result comprises three words, word 2, word 3, and word 5. The generated first bag-of-words vector, denoted F1, is F1 = [value1, 0, value3, 0, 0], and the generated second bag-of-words vector, denoted F2, is F2 = [0, value2, value3, 0, value5].
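The five-word example can be reproduced with a short Python sketch. The numeric TF-IDF values are hypothetical stand-ins for value1 to value5, and ordering the vector positions by word ID is an assumption of this sketch.

```python
def bag_of_words_vector(tokens, model):
    """Build a vector with one position per word in the bag-of-words model.

    model: dict mapping word ID -> TF-IDF value (the key-value model).
    tokens: word IDs taken from one word segmentation result.
    """
    present = set(tokens)
    # Positions follow the sorted word IDs; words absent from the name get 0.
    return [model[word_id] if word_id in present else 0.0 for word_id in sorted(model)]


# Hypothetical TF-IDF values standing in for value1..value5.
model = {1: 0.5, 2: 0.4, 3: 0.3, 4: 0.2, 5: 0.1}
f1 = bag_of_words_vector([1, 3], model)     # [value1, 0, value3, 0, 0]
f2 = bag_of_words_vector([2, 3, 5], model)  # [0, value2, value3, 0, value5]
```

Both vectors have the same length as the model, so they can be compared position by position.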
Finally, the similarity is calculated with a cosine similarity model: the inner product of the bag-of-words vectors of the two logical sites is calculated first, then the product of the moduli (lengths) of the two bag-of-words vectors, and the quotient of the two is taken as the final similarity value, that is, cos(F1, F2) = (F1 · F2) / (|F1| |F2|). In other words, the cosine of the first bag-of-words vector and the second bag-of-words vector is taken as the similarity between the name of the first logical site and the name of the second logical site.
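The cosine calculation can be sketched in Python as follows. The vector values are hypothetical, and returning 0 for an all-zero vector is a convention chosen for this sketch, since the text does not cover that case.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention of this sketch: an empty vector matches nothing
    return dot / (norm_u * norm_v)


# Hypothetical bag-of-words vectors that share only the third word.
f1 = [0.5, 0.0, 0.3, 0.0, 0.0]
f2 = [0.0, 0.4, 0.3, 0.0, 0.1]
similarity = cosine_similarity(f1, f2)
```

Because TF-IDF values are non-negative, the result lies between 0 and 1, which is what allows it to be compared with a percentage threshold such as 80%.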
According to the physical site screening method provided by the embodiment of the invention, the similarity of the names of the logical sites is identified by adopting an artificial intelligence algorithm, and the logical sites belonging to the same physical site are screened by combining the longitude and latitude of the logical sites, so that the efficiency and the accuracy of physical site screening are improved.
Based on any of the above embodiments, further, the specific steps of constructing the bag-of-words model are as follows:
acquiring the names of all logical stations in a preset area;
performing text word segmentation on the name of each logical site to obtain a plurality of words;
deduplicating all the words, serializing them, and giving each word a unique identification code;
calculating the word frequency inverse document frequency value of each word, wherein the word frequency inverse document frequency value is the product of the word frequency value of the word and the inverse document frequency value of the word;
and constructing the bag-of-words model according to the identification code and the word frequency inverse document frequency value of each word.
Specifically, fig. 4 is a logic flow diagram for constructing a bag-of-words model according to an embodiment of the present invention, and as shown in fig. 4, the specific steps for constructing the bag-of-words model are as follows:
First, a site name corpus is acquired, that is, the text data of the names of all logical sites in a preset area. The preset area is generally defined in units of an administrative division and is larger than the range used for screening physical sites; for example, the preset area is a city, while the screening range used during physical site screening is a district.
The text data of the names of all logical sites in the preset area can be extracted from a pre-established site resource library, using the province and city of the preset area as query conditions. The site resource library contains basic data such as site names, site longitude and latitude, and province and city.
Then, a text word segmentation algorithm is applied to the name of each logical site. For example, the Jieba Chinese word segmentation component for Python can be used.
Next, the word segmentation results are deduplicated and the words are serialized, with each word given a unique identification code (ID).
Then, the word frequency (TF) value of each word is calculated, with normalization applied so that a word does not score higher merely because it occurs in a long document rather than a short one. The inverse document frequency (IDF) value of each word is then calculated, and the two values are multiplied to obtain the word frequency inverse document frequency (TF-IDF) value of each word.
Finally, with the word ID as the key and the TF-IDF value as the value, the data is stored as key-value pairs in a map data structure, and a bag-of-words model file is generated. The bag-of-words model file can take various formats, such as txt.
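The construction steps can be sketched in Python as follows. The text does not give the exact TF and IDF formulas, so two assumptions are made here and should be read as such: TF is the word's count divided by the total number of words in the corpus, and IDF uses a smoothed form, log((1 + N) / (1 + df)) + 1, which keeps every value positive. The input is assumed to be already segmented.

```python
import math
from collections import Counter


def build_bag_of_words_model(segmented_names):
    """Build (word -> ID, ID -> TF-IDF) maps from a list of token lists."""
    # Deduplicate and serialize: each distinct word gets a unique integer ID.
    vocabulary = sorted({w for tokens in segmented_names for w in tokens})
    word_ids = {w: i for i, w in enumerate(vocabulary, start=1)}

    total_tokens = sum(len(tokens) for tokens in segmented_names)
    term_counts = Counter(w for tokens in segmented_names for w in tokens)
    doc_counts = Counter(w for tokens in segmented_names for w in set(tokens))
    n_docs = len(segmented_names)

    model = {}
    for word, word_id in word_ids.items():
        tf = term_counts[word] / total_tokens  # normalized word frequency (assumed form)
        idf = math.log((1 + n_docs) / (1 + doc_counts[word])) + 1  # smoothed IDF (assumed form)
        model[word_id] = tf * idf  # key = word ID, value = TF-IDF
    return word_ids, model


# Hypothetical, already segmented site names.
names = [["park", "road", "4G"], ["park", "road", "5G"], ["river", "street", "4G"]]
word_ids, model = build_bag_of_words_model(names)
```

The resulting `model` dictionary is the key-value map described above; writing it to a txt file, one `ID value` pair per line, would be one way to persist it.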
According to the physical site screening method provided by the embodiment of the invention, the similarity of the names of the logical sites is identified by adopting an artificial intelligence algorithm, and then the logical sites belonging to the same physical site are screened by combining the longitudes and latitudes of the logical sites, so that the efficiency and the accuracy of physical site screening are improved.
Based on any of the above embodiments, further, the performing text segmentation on the name of each logical site specifically includes:
if the current word is a registered word, generating a directed acyclic graph of the possible word formations, searching for the maximum probability path by dynamic programming, and determining the most probable segmentation combination based on word frequency;
and if the current word is an unknown word, generating a probability model of Chinese character word formation by a hidden Markov model method, and determining the segmentation combination with the maximum word formation sequence probability.
Specifically, when a text word segmentation algorithm is applied to the names of logical sites, if the current word is a registered word, a directed acyclic graph (DAG) of the possible word formations is generated, dynamic programming is used to search for the maximum probability path, and the most probable segmentation combination based on word frequency is found. A registered word is a word that is already included in the word segmentation vocabulary.
If the current word is an unknown word, a hidden Markov model (HMM) method is used to generate a probability model of Chinese character word formation, and the segmentation combination with the maximum word formation sequence probability is found. Unknown words are words that are not included in the word segmentation vocabulary but still need to be segmented, including various proper nouns (names of people, places, organizations, and the like) and new words.
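The registered-word branch (directed acyclic graph plus dynamic programming) can be illustrated with a toy Python sketch. The dictionary and its frequencies are invented for illustration, unknown single characters are simply given a count of 1, and the hidden Markov model branch for unknown words is not shown.

```python
import math


def build_dag(text, freq):
    """For each start index, list the end indices of dictionary words starting there."""
    dag = {}
    for i in range(len(text)):
        ends = [j for j in range(i, len(text)) if text[i:j + 1] in freq]
        dag[i] = ends or [i]  # fall back to a single character if nothing matches
    return dag


def segment(text, freq):
    """Pick the segmentation whose product of word probabilities is largest."""
    log_total = math.log(sum(freq.values()))
    dag = build_dag(text, freq)
    n = len(text)
    # route[i] = (best log-probability of text[i:], end index of the first word)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):  # dynamic programming from right to left
        route[i] = max(
            (math.log(freq.get(text[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    tokens, i = [], 0
    while i < n:  # follow the best path from left to right
        j = route[i][1]
        tokens.append(text[i:j + 1])
        i = j + 1
    return tokens


# Hypothetical dictionary with word counts.
freq = {"朝阳": 50, "朝阳区": 30, "区": 20, "公园": 40, "朝": 5, "阳": 5, "公": 5, "园": 5}
tokens = segment("朝阳区公园", freq)
```

With these counts, splitting into 朝阳区 and 公园 has a higher path probability than 朝阳, 区, 公园, so the dynamic programming step selects the two-word segmentation.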
According to the physical site screening method provided by the embodiment of the invention, the similarity of the names of the logical sites is identified by adopting an artificial intelligence algorithm, and the logical sites belonging to the same physical site are screened by combining the longitude and latitude of the logical sites, so that the efficiency and the accuracy of physical site screening are improved.
Based on any of the foregoing embodiments, further before the serializing all the words after the deduplication, the method further includes:
and optimizing the word segmentation result by using a user-defined word list and a stop word list.
Specifically, to improve word segmentation accuracy, a user-defined word list and a stop word list are introduced to optimize the word segmentation results before all the deduplicated words are serialized. The user-defined word list mainly stores the names of provinces, cities, districts, counties, and towns; the stop word list mainly stores word segmentation results that are excluded from the statistics.
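One way to picture the effect of the two word lists is as a post-processing step on an existing segmentation result, sketched below in Python. This is an assumption made for illustration: the text does not say at which point the lists are applied, and a segmenter such as Jieba would more typically load a user dictionary before segmenting (for example with `jieba.load_userdict`). The word lists shown are hypothetical.

```python
def refine_tokens(tokens, custom_words, stop_words):
    """Merge adjacent tokens that form a user-defined word, then drop stop words."""
    merged, i = [], 0
    while i < len(tokens):
        # Prefer the longest run of adjacent tokens that forms a user-defined word.
        for j in range(len(tokens), i, -1):
            candidate = "".join(tokens[i:j])
            if j - i > 1 and candidate in custom_words:
                merged.append(candidate)
                i = j
                break
        else:
            merged.append(tokens[i])  # no user-defined word starts here
            i += 1
    return [t for t in merged if t not in stop_words]


# Hypothetical word lists: an administrative name and a word excluded from statistics.
custom_words = {"朝阳区"}
stop_words = {"基站"}
refined = refine_tokens(["朝阳", "区", "公园", "基站"], custom_words, stop_words)
```

Here the district name is kept as a single word and the generic word 基站 ("base station"), which appears in many names and carries little distinguishing information, is removed.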
According to the physical site screening method provided by the embodiment of the invention, the similarity of the names of the logical sites is identified by adopting an artificial intelligence algorithm, and the logical sites belonging to the same physical site are screened by combining the longitude and latitude of the logical sites, so that the efficiency and the accuracy of physical site screening are improved.
Based on any of the foregoing embodiments, further after the calculating the distance between the first logical site and the second logical site, the method further includes:
and if the distance between the first logical site and the second logical site is greater than or equal to the second preset threshold, and the similarity between the name of the first logical site and the name of the second logical site is greater than a third preset threshold, prompting a check of the longitude and latitude of the first logical site and of the second logical site, wherein the third preset threshold is greater than the first preset threshold.
Specifically, the embodiment of the invention can also carry out the accuracy check of longitude and latitude and screen the longitude and latitude data which possibly have acquisition problems.
After the distance between the first logical site and the second logical site is calculated, it is compared with a second preset threshold, which may be set according to specific conditions, for example to 50 meters.
If the distance between the first logical site and the second logical site is greater than or equal to the second preset threshold, and the similarity between the name of the first logical site and the name of the second logical site is greater than a third preset threshold, a prompt is issued to check the longitude and latitude of the first logical site and of the second logical site, wherein the third preset threshold is greater than the first preset threshold. The first preset threshold and the third preset threshold may be set according to specific conditions; for example, the first preset threshold is set to 80% and the third preset threshold is set to 90%.
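The interplay of the three thresholds can be summarized in a small Python decision function. It is a sketch using the example values from the text (80%, 50 meters, 90%); the function name and the returned labels are hypothetical.

```python
def classify_pair(similarity, distance_m,
                  first_threshold=0.8, second_threshold_m=50.0, third_threshold=0.9):
    """Classify one pair of logical sites from name similarity and distance."""
    if similarity <= first_threshold:
        return "different"            # names not similar enough; distance is not used
    if distance_m < second_threshold_m:
        return "same_physical_site"   # similar names and close together
    if similarity > third_threshold:
        return "check_coordinates"    # near-identical names but far apart
    return "different"
```

For example, a pair with similarity 0.95 and a distance of 500 meters yields `check_coordinates`: the names match almost exactly, so the recorded longitude and latitude are the more likely source of the disagreement.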
Finally, the logical sites judged to belong to the same physical site are organized in the database according to a tree structure. When a logical site is added later, it is placed in the tree under its corresponding physical site if one exists; otherwise it is treated as an independent physical site.
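A two-level tree (physical site, then its logical sites) can be sketched in Python as follows. The text does not describe how pairwise results are merged, so the union-find grouping used here is an assumption of this sketch, as is using the smallest member id as the label of each physical site.

```python
def group_into_physical_sites(site_ids, same_site_pairs):
    """Return a dict mapping a physical site label to its logical site ids."""
    parent = {s: s for s in site_ids}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_site_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smallest id as the label

    tree = {}
    for s in site_ids:
        tree.setdefault(find(s), []).append(s)
    return tree


# Hypothetical ids: A, B, and C were judged pairwise to share a physical site.
tree = group_into_physical_sites(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

Site D has no match and therefore forms its own physical site, which mirrors the rule for a newly added logical site without a corresponding physical site.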
According to the physical site screening method provided by the embodiment of the invention, the similarity of the names of the logical sites is identified by adopting an artificial intelligence algorithm, and the logical sites belonging to the same physical site are screened by combining the longitude and latitude of the logical sites, so that the efficiency and the accuracy of physical site screening are improved.
The method in the above example is illustrated below with a set of experimental results:
A comparative experiment was performed between the method of the above embodiment and the word2vec method, using the P value (precision), R value (recall), F value, and time consumed (seconds) as evaluation indexes. The experimental results are shown in Table 1:
TABLE 1 Experimental results
Processing method | P (precision) | R (recall) | F value | Time consumed (seconds)
Bagofwords | 80.17% | 93.66% | 86.39% | 830
Word2vec | 79.13% | 88% | 83.33% | 780
The comparative experiment shows the following: the P value of the bag-of-words model is slightly higher than that of the word2vec method (80.17% versus 79.13%), its recall is 5.66 percentage points higher (93.66% versus 88%), and its F value is 3.06 percentage points higher (86.39% versus 83.33%).
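The F values in Table 1 are consistent with the usual harmonic mean of P and R, F = 2PR / (P + R). The text does not state which F definition was used, so the short check below treats that formula as an assumption and simply recomputes the figures from the reported P and R values.

```python
def f_value(p, r):
    """Harmonic mean of the P value and the R value (standard F1 definition)."""
    return 2 * p * r / (p + r)


bow_f = f_value(0.8017, 0.9366)   # bag-of-words row of Table 1
w2v_f = f_value(0.7913, 0.88)     # word2vec row of Table 1

recall_gain_points = (0.9366 - 0.88) * 100   # difference in percentage points
f_gain_points = (0.8639 - 0.8333) * 100      # difference in percentage points

print(f"bag-of-words F: {bow_f:.2%}, word2vec F: {w2v_f:.2%}")
```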
In terms of time consumption, the two schemes were compared on 9569 sites; the difference in running time is about 50 seconds, so the two are basically equivalent. Therefore, the method in the embodiment of the invention can improve the performance of the algorithm while maintaining efficiency.
Based on any of the above embodiments, further, fig. 5 is a schematic diagram of a physical site screening apparatus provided in an embodiment of the present invention, and as shown in fig. 5, an embodiment of the present invention provides a physical site screening apparatus, which includes a calculating module 501 and a screening module 502, where:
the calculating module 501 is configured to calculate a distance between a first logical site and a second logical site if similarity between the name of the first logical site and the name of the second logical site is greater than a first preset threshold; the screening module 502 is configured to determine that the first logical station and the second logical station belong to the same physical station if the distance between the first logical station and the second logical station is smaller than a second preset threshold.
According to the physical site screening device provided by the embodiment of the invention, the similarity of the names of the logical sites is firstly identified by adopting an artificial intelligence algorithm, and then the logical sites belonging to the same physical site are screened by combining the longitude and latitude of the logical sites, so that the efficiency and the accuracy of physical site screening are improved.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device includes: a processor 601, a communication interface 602, a memory 603, and a communication bus 604, wherein the processor 601, the communication interface 602, and the memory 603 communicate with each other through the communication bus 604. The processor 601 may call logic instructions in the memory 603 to perform the following method:
if the similarity between the name of a first logical site and the name of a second logical site is greater than a first preset threshold value, calculating the distance between the first logical site and the second logical site;
and if the distance between the first logical station and the second logical station is smaller than a second preset threshold, determining that the first logical station and the second logical station belong to the same physical station.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Further, embodiments of the present invention provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the steps in the above-described method embodiments, for example, comprising:
if the similarity between the name of a first logical site and the name of a second logical site is greater than a first preset threshold value, calculating the distance between the first logical site and the second logical site;
and if the distance between the first logical station and the second logical station is smaller than a second preset threshold value, determining that the first logical station and the second logical station belong to the same physical station.
Further, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the foregoing method embodiments, for example, including:
if the similarity between the name of a first logical site and the name of a second logical site is greater than a first preset threshold value, calculating the distance between the first logical site and the second logical site;
and if the distance between the first logical station and the second logical station is smaller than a second preset threshold value, determining that the first logical station and the second logical station belong to the same physical station.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and can certainly also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A physical site screening method is characterized by comprising the following steps:
if the similarity between the name of a first logical site and the name of a second logical site is greater than a first preset threshold value, calculating the distance between the first logical site and the second logical site;
if the distance between the first logical station and the second logical station is smaller than a second preset threshold value, determining that the first logical station and the second logical station belong to the same physical station, and performing database combing on the first logical station and the second logical station according to a tree structure.
2. The physical site screening method of claim 1, wherein before the calculating the distance between the first logical site and the second logical site, the method further comprises:
performing text word segmentation on the name of the first logical site to obtain a first word segmentation result; performing text word segmentation on the name of the second logical site to obtain a second word segmentation result;
generating a first bag-of-words vector by utilizing a pre-constructed bag-of-words model based on the first word segmentation result; generating a second bag-of-words vector by using the bag-of-words model based on the second word segmentation result;
and calculating cosine values of the first bag-of-words vector and the second bag-of-words vector, and taking the cosine values as the similarity of the name of the first logical site and the name of the second logical site.
3. The physical site screening method of claim 2, wherein the specific steps of constructing the bag-of-words model are as follows:
acquiring the names of all logical stations in a preset area;
performing text word segmentation on the name of each logical site to obtain a plurality of words;
serializing all the words after the duplication removal, and giving a unique identification code to each word;
calculating the word frequency inverse document frequency value of each vocabulary, wherein the word frequency inverse document frequency value is the product of the word frequency value of the target vocabulary and the inverse document frequency value of the target vocabulary;
and constructing the bag-of-words model according to the identification code of each word and the word frequency inverse document frequency value.
4. The physical site screening method according to claim 2, wherein the generating a first bag-of-words vector by using a pre-constructed bag-of-words model based on the first segmentation result specifically comprises:
extracting a word frequency inverse document frequency value corresponding to each word in the first word segmentation result from the bag-of-words model; the bag-of-words model comprises identification codes of a plurality of words and word frequency inverse document frequency values corresponding to each word;
and generating the first bag-of-words vector based on the word frequency inverse document frequency value corresponding to each word in the first word segmentation result.
5. The physical site screening method according to claim 3, wherein the text word segmentation of the name of each logical site specifically includes:
if the current word is a registered word, generating a directed acyclic graph with possible word formation, searching a maximum probability path by adopting dynamic programming, and determining a maximum segmentation combination based on word frequency;
if the current word is an unknown word, generating a probability model of Chinese character word formation by adopting a hidden Markov model method, and determining the segmentation combination with the maximum word formation sequence probability.
6. The method as claimed in claim 3, wherein before the serializing all the de-duplicated words, the method further comprises:
and optimizing the word segmentation result by using a user-defined word list and a stop word list.
7. The physical site screening method according to any one of claims 1 to 6, wherein after the calculating the distance between the first logical site and the second logical site, the method further comprises:
and if the distance between the first logical station and the second logical station is greater than or equal to the second preset threshold value, and the similarity between the name of the first logical station and the name of the second logical station is greater than a third preset threshold value, prompting to check the longitude and latitude of the first logical station and the longitude and latitude of the second logical station, wherein the third preset threshold value is greater than the first preset threshold value.
8. A physical site screening apparatus, comprising:
the calculation module is used for calculating the distance between the first logical station and the second logical station if the similarity between the name of the first logical station and the name of the second logical station is greater than a first preset threshold value;
and the screening module is used for determining that the first logical station and the second logical station belong to the same physical station if the distance between the first logical station and the second logical station is smaller than a second preset threshold value, and performing database combing on the first logical station and the second logical station according to a tree structure.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the physical site screening method according to any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, performs the steps of the physical site screening method according to any one of claims 1 to 7.
CN201911227234.9A 2019-12-04 2019-12-04 Physical site screening method and device, electronic equipment and storage medium Active CN112910674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911227234.9A CN112910674B (en) 2019-12-04 2019-12-04 Physical site screening method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112910674A CN112910674A (en) 2021-06-04
CN112910674B true CN112910674B (en) 2023-04-18

Family

ID=76110660

Country Status (1)

Country Link
CN (1) CN112910674B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069141A (en) * 2015-08-19 2015-11-18 北京工商大学 Construction method and construction system for stock standard news library
CN107832306A (en) * 2017-11-28 2018-03-23 武汉大学 A kind of similar entities method for digging based on Doc2vec
CN108268539A (en) * 2016-12-31 2018-07-10 上海交通大学 Video matching system based on text analyzing
CN108710613A (en) * 2018-05-22 2018-10-26 平安科技(深圳)有限公司 Acquisition methods, terminal device and the medium of text similarity
CN109635077A (en) * 2018-12-18 2019-04-16 武汉斗鱼网络科技有限公司 Calculation method, device, electronic equipment and the storage medium of text similarity
CN110225541A (en) * 2019-05-28 2019-09-10 广东南方通信建设有限公司 Base station site physical message management method, system, computer and can storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170139899A1 (en) * 2015-11-18 2017-05-18 Le Holdings (Beijing) Co., Ltd. Keyword extraction method and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
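Development of a TD base station longitude and latitude verification tool; Su Qi et al.; Guangxi Communication Technology (《广西通信技术》); 20121231; main text pp. 40-45, Figs. 1-9 *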

Also Published As

Publication number Publication date
CN112910674A (en) 2021-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant