CN113987114B - Address matching method and device based on semantic analysis and electronic equipment - Google Patents
- Publication number
- CN113987114B, CN202111092055.6A
- Authority
- CN
- China
- Prior art keywords
- corpus
- address
- detected
- content
- word frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Remote Sensing (AREA)
- Acoustics & Sound (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Abstract
An embodiment of the specification provides an address matching method based on semantic analysis. The method collects sample address content, constructs a standard library arrangement rule, performs character conversion on the sample address content by using a regular expression in the standard library arrangement rule, and constructs a bag-of-words model and an address corpus. Word frequency vectorization is performed on the objects in the corpus according to character semantics to obtain semantic word frequency vectors. A fault hotline module carries out a voice conversation with a user and obtains the corpus content to be detected from the conversation; after cleaning, the corpus content to be detected undergoes word frequency vectorization to obtain its semantic word frequency vector. The similarity between this vector and each object in the corpus is calculated, the results are sorted, the corpus object with the maximum similarity is identified, and fault processing is performed. Similarity matching improves the matching success rate and avoids the situation in which an address cannot be retrieved.
Description
Technical Field
The present application relates to the field of computers, and in particular, to an address matching method and apparatus based on semantic analysis, and an electronic device.
Background
With the continuous improvement of the informatization of geographic address matching, the address matching function is increasingly applied in the geographic information technology industry. Most address matching retrieves by the input road name, house number, and POI information; if the input address information has no match in the database, retrieval fails, so the matching success rate is low.
Therefore, it is necessary to provide a method with a high matching success rate.
Disclosure of Invention
The embodiments of the specification provide an address matching method and device based on semantic analysis, and electronic equipment, which are used to improve the matching success rate.
An embodiment of the present specification further provides an address matching method based on semantic analysis, including:
collecting sample address content, constructing a standard library arrangement rule, performing character conversion on the sample address content by using a regular expression in the standard library arrangement rule, and storing the converted sample address content into a created standard library;
constructing a bag-of-words model and an address corpus by using a standard library;
performing word frequency vectorization processing on objects in a corpus according to character semantics to obtain semantic word frequency vectors;
providing a fault hotline module, carrying out voice conversation with a user through the fault hotline module, acquiring the corpus content to be detected in the conversation, cleaning the corpus content to be detected, and carrying out word frequency vectorization processing to obtain semantic word frequency vectors of the corpus content to be detected;
calculating the similarity between the semantic word frequency vector of the corpus content to be detected and each object in the corpus, sorting the results by similarity, and identifying the corpus object with the maximum similarity;
and carrying out fault processing according to the address characteristics of the identified corpus objects.
Optionally, the performing fault processing according to the address of the identified corpus object includes:
and sending the identified address to a field cruise system to assist the troubleshooting personnel in carrying out field cruise.
Optionally, the obtaining of the corpus content to be detected in the dialog includes:
obtaining conversation voice, constructing a corpus recognition model, and recognizing the corpus content to be detected in the conversation voice in the corpus recognition model.
Optionally, the constructing a corpus recognition model includes:
setting sample labels according to whether a stop word is present, and training the corpus recognition model with the labeled samples.
Optionally, the method further comprises:
and performing word segmentation processing on the corpus content to be detected.
Optionally, the method further comprises:
and adjusting the word segmentation precision until the word segmentation result meets the preset precision condition.
Optionally, the method further comprises:
and constructing a word segmentation model in a machine learning mode, and performing word segmentation processing on the corpus content to be detected.
Optionally, the word segmentation model is further configured to merge the interrupted semantic texts.
Optionally, the performing fault processing according to the address feature of the identified corpus object further includes:
and feeding back the matched and identified address characteristics to the user in real time.
An embodiment of the present specification further provides an address matching apparatus based on semantic analysis, including:
the corpus library module is used for collecting sample address contents, constructing a standard library arrangement rule, performing character conversion on the sample address contents by using a regular expression in the standard library arrangement rule, and storing the converted sample address contents into a created standard library;
constructing a bag-of-words model and an address corpus by using a standard library;
performing word frequency vectorization processing on objects in the corpus according to character semantics to obtain semantic word frequency vectors;
the fault hotline module is used for carrying out a voice conversation with a user, acquiring the corpus content to be detected in the conversation, cleaning the corpus content to be detected, and carrying out word frequency vectorization processing to obtain a semantic word frequency vector of the corpus content to be detected;
the matching module is used for calculating the similarity between the semantic word frequency vector of the corpus content to be detected and each object in the corpus, sorting the results by similarity, and identifying the corpus object with the maximum similarity;
and performing fault processing according to the address characteristics of the identified corpus object.
An embodiment of the present specification further provides an electronic device, where the electronic device includes:
a processor; and
a memory storing a computer executable program which, when executed, causes the processor to perform any of the methods described above.
The present specification also provides a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement any of the above methods.
The method includes the steps of collecting sample address content, constructing a standard library arrangement rule, performing character conversion on the sample address content by using a regular expression in the standard library arrangement rule, constructing a bag-of-words model and an address corpus, performing word frequency vectorization processing on objects in the corpus according to character semantics to obtain semantic word frequency vectors, performing voice conversation with a user through a fault hotline module, obtaining to-be-detected corpus content in the conversation, performing word frequency vectorization processing after the to-be-detected corpus content is cleaned to obtain the semantic word frequency vectors of the to-be-detected corpus content, calculating similarity between the semantic word frequency vectors of the to-be-detected corpus content and each object in the corpus, sorting, identifying the corpus object with the maximum similarity, and performing fault processing. By means of similarity matching, matching success rate is improved, and the situation that addresses cannot be retrieved is avoided.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a schematic diagram of a semantic analysis based address matching method according to an embodiment of the present disclosure;
Fig. 2 is a schematic structural diagram of an address matching apparatus based on semantic analysis according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a computer-readable medium provided in an embodiment of the present specification.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The exemplary embodiments, however, may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art. The same reference numerals denote the same or similar elements, components, or portions in the drawings, and thus, a repetitive description thereof will be omitted.
Features, structures, characteristics or other details described in a particular embodiment do not preclude the fact that the features, structures, characteristics or other details may be combined in a suitable manner in one or more other embodiments in accordance with the technical idea of the invention.
In describing particular embodiments, the present invention has been described with reference to features, structures, characteristics or other details that are within the purview of one skilled in the art to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific features, structures, characteristics, or other details.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The term "and/or" includes all combinations of any one or more of the associated listed items.
Fig. 1 is a schematic diagram of a semantic analysis based address matching method provided in an embodiment of the present disclosure, where the method may include:
S101: collecting sample address content, constructing a standard library arrangement rule, performing character conversion on the sample address content by using a regular expression in the standard library arrangement rule, storing the converted sample address content into a created standard library, and constructing a bag-of-words model and an address corpus by using the standard library.
Existing address matching methods in the industry have the following defects. First, they cannot handle suffix identifiers, such as an entrance or a door, that are appended to an original standard address, for example: zhajia river way 1033 entrance, tian Ke bridge 30 getting side 1 door, etc. Second, they cannot perform complex address matching, for example: the maple forest road 333 breaks into building 1 of the Hongjing garden. Third, they have difficulty resolving descriptive addresses, for example: Tian-Ke Qiao, zhajia, lubei 10 m flower bed, tian-Ke Qiao Hui Shanxi, 100 m before Shanxi, etc.
Therefore, the sample address content needs to be converted before it is stored.
Objects stored in the standard library may have address names and coordinate information.
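The character conversion in S101 can be illustrated with a minimal sketch. The specification does not list its regular expressions, so the rules below, the English placeholder addresses, and the function names `normalize_address` and `build_standard_library` are all assumptions made for illustration only.

```python
import re

# Hypothetical arrangement rules as (pattern, replacement) pairs.
# The patent does not disclose its actual rules; these only show the idea.
RULES = [
    (re.compile(r"\s+"), " "),                                         # collapse whitespace
    (re.compile(r"\b(no\.?|number)\s*(\d+)", re.IGNORECASE), r"\2"),   # "No. 12" -> "12"
    (re.compile(r"\b(entrance|side door|doorway)\b", re.IGNORECASE), ""),  # drop suffix markers
]

def normalize_address(raw: str) -> str:
    """Apply every rule in order, then tidy up leftover spaces."""
    text = raw.strip()
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    # Removing a suffix can leave doubled or trailing spaces behind.
    return re.sub(r"\s+", " ", text).strip().lower()

def build_standard_library(samples):
    """Map each normalized address name to its coordinate information."""
    return {normalize_address(name): coords for name, coords in samples}
```

Under these assumed rules, `normalize_address("  Maple Road  No. 333 entrance ")` returns `"maple road 333"`, so an address spoken with a suffix and the stored standard address collapse to the same key.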
S102: performing word frequency vectorization processing on the objects in the corpus according to the character semantics to obtain semantic word frequency vectors.
To improve the address matching accuracy, an artificial intelligence approach can be adopted: NLP semantic analysis combined with a cosine similarity method is selected to find the best-matching address.
Word frequency vectorization (TF-IDF), a commonly used weighting technique for information retrieval and data mining, can be used to achieve more precise semantic matching.
Word frequency vectorization is disclosed in the prior art, and refers to weighting according to the occurrence frequency of the vocabulary to be matched, so that the matching accuracy can be improved, and the details are not described herein.
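Since the specification leaves the details of word frequency vectorization to the prior art, the following is only a sketch of one common formulation. It assumes already-segmented token lists and a smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1; the function names are hypothetical and the weighting actually used by the inventors is not stated.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    """Bag-of-words vocabulary: token -> column index, in sorted order."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}

def idf_weights(docs, vocab):
    """Smoothed inverse document frequency (an assumed variant)."""
    n = len(docs)
    df = Counter(tok for d in docs for tok in set(d))  # documents containing each token
    return {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}

def tfidf_vector(tokens, vocab, idf):
    """Dense TF-IDF vector for one token list; unknown tokens are ignored."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) or 1  # avoid division by zero for empty input
    vec = [0.0] * len(vocab)
    for tok, count in counts.items():
        vec[vocab[tok]] = (count / total) * idf[tok]
    return vec
```

Tokens that appear in many addresses, such as a generic word for "road", receive a lower weight than tokens that identify a single address, which is what lets the later similarity step focus on the distinguishing parts.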
S103: providing a fault hotline module, carrying out a voice conversation with a user through the fault hotline module, acquiring the corpus content to be detected in the conversation, cleaning the corpus content to be detected, and carrying out word frequency vectorization processing to obtain the semantic word frequency vector of the corpus content to be detected.
In an embodiment of this specification, the acquiring the corpus content to be detected in the dialog includes:
obtaining conversation voice, constructing a corpus recognition model, and recognizing the corpus content to be detected in the conversation voice in the corpus recognition model.
In an embodiment of this specification, the building a corpus recognition model includes:
setting sample labels according to whether a stop word is present, and training the corpus recognition model with the labeled samples.
In this way, stop words can be recognized by the corpus recognition model.
In the embodiment of the present specification, the method further includes:
and performing word segmentation processing on the corpus content to be detected. By word segmentation, correct semantic parsing can be performed.
In an embodiment of this specification, the method further includes:
adjusting the word segmentation precision until the word segmentation result meets the preset precision condition.
In an embodiment of this specification, the method further includes:
constructing a word segmentation model by machine learning, and performing word segmentation processing on the corpus content to be detected.
In this embodiment of the present specification, the word segmentation model is further configured to merge interrupted semantic texts.
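The segmentation and cleaning of the corpus content to be detected can be sketched as follows. The specification calls for a segmentation model built by machine learning; this sketch swaps in a much simpler dictionary-based forward maximum matching so that it stays self-contained. The dictionary, the stop words, and the function names are hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary
    entry; fall back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def clean(tokens, stop_words):
    """Drop stop words and whitespace-only tokens, keeping address words."""
    return [t for t in tokens if t not in stop_words and not t.isspace()]
```

With a hypothetical dictionary `{"枫林路", "发现", "漏水"}`, `segment("我在枫林路发现漏水", dictionary)` yields `["我", "在", "枫林路", "发现", "漏水"]`: the three characters of the road name are kept together as one unit rather than split apart. Passing stop words `{"我", "在", "发现"}` to `clean` then leaves `["枫林路", "漏水"]`.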
S104: calculating the similarity between the semantic word frequency vector of the corpus content to be detected and each object in the corpus, sorting the results by similarity, identifying the corpus object with the maximum similarity, and performing fault processing according to the address feature of the identified corpus object.
The method comprises the steps of collecting sample address contents, constructing a standard library arrangement rule, performing character conversion on the sample address contents by using a regular expression in the standard library arrangement rule, constructing a bag-of-words model and an address corpus, performing word frequency vectorization processing on objects in the corpus according to character semantics to obtain semantic word frequency vectors, performing voice conversation with a user through a fault hotline module, obtaining to-be-detected corpus contents in the conversation, performing word frequency vectorization processing after the to-be-detected corpus contents are cleaned, obtaining the semantic word frequency vectors of the to-be-detected corpus contents, calculating the similarity between the semantic word frequency vectors of the to-be-detected corpus contents and each object in the corpus, sequencing, identifying the corpus object with the maximum similarity, and performing fault processing. By means of similarity matching, matching success rate is improved, and the situation that addresses cannot be retrieved is avoided.
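The similarity calculation and sorting of S104 can be sketched with cosine similarity, which the description names as the matching method. The sketch assumes dense vectors of equal length, for example TF-IDF vectors over a shared vocabulary; `rank_matches` and the dictionary layout of `corpus_vecs` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_matches(query_vec, corpus_vecs):
    """Return (address, similarity) pairs sorted from most to least similar."""
    scored = [(addr, cosine_similarity(query_vec, vec))
              for addr, vec in corpus_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The first element of the returned list is the corpus object with the maximum similarity. Because every query receives a ranked list, a query that has no exact match in the standard library still returns its nearest candidates instead of an empty result.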
In an embodiment of this specification, the performing fault processing according to the address of the identified corpus object includes:
sending the identified address to a field cruise system to assist the troubleshooting personnel in carrying out field cruise.
In an embodiment of this specification, the performing fault processing according to the address feature of the identified corpus object further includes:
feeding back the matched and identified address characteristics to the user in real time.
In a specific implementation, the process is as follows:
1. Create an address standard library: obtain the POI information and the road and house-number information provided by map applications, collect it, and import it into a database.
2. Data cleaning: sort the data with regular expressions, applying multiple rules so that the data meets the required conditions.
3. Collect customer service corpora: for example, collect address information obtained from customer service calls in the same type of industry.
4. Remove irregular words: because the customer service corpus may contain non-standard address information, remove irregular words, such as "side door" or "doorway", that are attached to addresses.
5. Build a bag-of-words model and a corpus for the address standard library, using the standard library created earlier.
6. Perform word frequency vectorization processing on the corpus.
7. Segment the customer service corpus into words, then remove stop words.
8. Calculate the word frequency vector of the text to be detected.
9. Match the customer service corpus information to be detected against the address standard library by calculating the cosine similarity between vectors.
10. Sort the calculated text similarity values and select the address with the maximum similarity.
11. Verify whether the matching result is accurate.
In one application scenario, when a troubleshooting person performs a pipeline patrol, the person uses the fault hotline module to input voice containing address information, such as "I find pipeline leakage 10 meters before the Tianyin bridge connects with the shopping mall".
The background system extracts the corpus content, analyzes and processes it, segments sentences according to semantic units, then removes characters irrelevant to addresses and words that do not conform to the rules, keeping the address and direction words. Next, TF-IDF vectors are calculated for the input corpus and for the addresses stored in the semantic library, the similarity between the vectors is computed, and the similarity results are sorted from high to low to obtain the address with the highest similarity.
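The scenario can be traced end to end with a compact sketch. It assumes English placeholder addresses split on whitespace, a hypothetical stop-word list, sparse dictionary vectors, and a smoothed IDF; none of these details come from the specification, and speech recognition is assumed to have already produced the utterance text.

```python
import math
from collections import Counter

# Hypothetical stop words for the placeholder English utterances.
STOP_WORDS = {"i", "found", "a", "leak", "at", "the", "near"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def tfidf(tokens, idf):
    """Sparse TF-IDF vector; tokens outside the corpus vocabulary are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_address(utterance, standard_addresses):
    """Rank every standard address against the utterance, best match first."""
    docs = [tokenize(a) for a in standard_addresses]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    query = tfidf(tokenize(utterance), idf)
    scored = [(addr, cosine(query, tfidf(doc, idf)))
              for addr, doc in zip(standard_addresses, docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For the placeholder library `["maple road 333", "bridge road 30", "garden lane 1"]`, the utterance "I found a leak near maple road 333 entrance" ranks "maple road 333" first: the word "entrance" is not in the corpus vocabulary and is ignored, while "bridge road 30" scores lower because it shares only the generic token "road".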
Fig. 2 is a schematic structural diagram of an address matching apparatus based on semantic analysis according to an embodiment of the present disclosure, where the apparatus may include:
the corpus module 201 is used for collecting sample address contents, constructing a standard library arrangement rule, performing character conversion on the sample address contents by using a regular expression in the standard library arrangement rule, and storing the converted sample address contents into a created standard library;
constructing a bag-of-words model and an address corpus by using a standard library;
performing word frequency vectorization processing on objects in a corpus according to character semantics to obtain semantic word frequency vectors;
the fault hotline module 202 is used for performing voice conversation with a user, acquiring the corpus content to be detected in the conversation, cleaning the corpus content to be detected, and performing word frequency vectorization processing to obtain semantic word frequency vectors of the corpus content to be detected;
the matching module 203 is used for calculating the similarity between the semantic word frequency vector of the corpus content to be detected and each object in the corpus, sorting the results by similarity, and identifying the corpus object with the maximum similarity;
and carrying out fault processing according to the address characteristics of the identified corpus objects.
Optionally, the performing fault processing according to the address of the identified corpus object includes:
and sending the identified address to a field cruising system to assist the troubleshooting personnel to carry out field cruising.
Optionally, the obtaining of the corpus content to be detected in the dialog includes:
obtaining conversation voice, constructing a corpus recognition model, and recognizing the corpus content to be detected in the conversation voice in the corpus recognition model.
Optionally, the constructing the corpus recognition model includes:
setting sample labels according to whether a stop word is present, and training the corpus recognition model with the labeled samples.
Optionally, the method further comprises:
and performing word segmentation processing on the corpus content to be detected.
Optionally, the method further comprises:
and adjusting the word segmentation precision until the word segmentation result meets the preset precision condition.
Optionally, the method further comprises:
and constructing a word segmentation model in a machine learning mode, and performing word segmentation processing on the corpus content to be detected.
Optionally, the word segmentation model is further configured to merge the interrupted semantic text.
Optionally, the performing fault processing according to the address feature of the identified corpus object further includes:
and feeding back the matched and identified address characteristics to the user in real time.
The device comprises the steps of collecting sample address contents, constructing a standard library arrangement rule, utilizing a regular expression in the standard library arrangement rule to perform character conversion on the sample address contents, constructing a bag-of-words model and an address corpus, performing word frequency vectorization processing on objects in the corpus according to character semantics to obtain semantic word frequency vectors, performing voice conversation with a user through a fault hotline module, obtaining to-be-detected corpus contents in the conversation, performing word frequency vectorization processing after the to-be-detected corpus contents are cleaned, obtaining the semantic word frequency vectors of the to-be-detected corpus contents, calculating the similarity between the semantic word frequency vectors of the to-be-detected corpus contents and each object in the corpus, sequencing, identifying the corpus object with the maximum similarity, and performing fault processing. By means of similarity matching, matching success rate is improved, and the situation that addresses cannot be retrieved is avoided.
Based on the same inventive concept, the embodiment of the specification further provides the electronic equipment.
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as specific physical implementations for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
Fig. 3 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure. An electronic device 300 according to this embodiment of the invention is described below with reference to fig. 3. The electronic device 300 shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in Fig. 3, the electronic device 300 is embodied in the form of a general purpose computing device. The components of the electronic device 300 may include, but are not limited to: at least one processing unit 310, at least one storage unit 320, a bus 330 that couples various system components including the storage unit 320 and the processing unit 310, a display unit 340, and the like.
The storage unit 320 stores program code executable by the processing unit 310 to cause the processing unit 310 to perform the steps according to various exemplary embodiments of the present invention described in the above-mentioned processing method section of the present specification. For example, the processing unit 310 may perform the steps shown in Fig. 1.
The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 3201 and/or a cache storage unit 3202, and may further include a read only memory unit (ROM) 3203.
The storage unit 320 may also include programs/utilities 3204 having a set (at least one) of program modules 3205, such program modules 3205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 300 may also communicate with one or more external devices 400 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 300, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 300 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 350. Also, the electronic device 300 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 360. Network adapter 360 may communicate with other modules of electronic device 300 via bus 330. It should be understood that although not shown in FIG. 3, other hardware and/or software modules may be used in conjunction with electronic device 300, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (which can be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, or a network device, etc.) execute the above method according to the present invention. When the computer program is executed by a data processing apparatus, the above-described method of the invention, such as the method shown in Fig. 1, is implemented.
Fig. 4 is a schematic diagram of a computer-readable medium provided in an embodiment of the present specification.
A computer program implementing the method shown in fig. 1 may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
In summary, the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components in embodiments in accordance with the invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP). The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, or provided on a carrier signal, or provided in any other form.
While the foregoing detailed description has described certain embodiments of the invention with reference to specific aspects, embodiments and advantages thereof, it should be understood that the invention is not limited to any particular computer, virtual machine, or electronic device, as various general purpose machines may implement the invention. The invention is not to be considered as limited to the specific embodiments thereof, but is to be understood as covering all modifications, changes, and equivalents that come within the spirit and scope of the invention.
All the embodiments in the present specification are described in a progressive manner. For parts that are the same or similar among the embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the present application pertains. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (11)
1. An address matching method based on semantic analysis is characterized by comprising the following steps:
creating a standard library of addresses based on the obtained POI information and the road doorplate information of the map application program; collecting sample address content, constructing a standard library sorting rule, performing character conversion on the sample address content by using a regular expression in the standard library sorting rule, and storing the converted sample address content into a created standard library;
collecting address information obtained by customer service telephones in the same industry, and removing non-standard words; constructing a bag-of-words model and an address corpus by using a standard library;
performing word frequency vectorization processing on the objects in the corpus according to the character semantics to obtain semantic word frequency vectors;
providing a fault hot line module, carrying out voice conversation with a user through the fault hot line module, acquiring the corpus content to be detected in the conversation, carrying out word segmentation and stop word elimination on the corpus content to be detected, and carrying out word-frequency vectorization on the processed corpus content to be detected to obtain a semantic word-frequency vector of the corpus content to be detected;
calculating the similarity between the semantic word frequency vector of the corpus content to be detected and each object in the corpus, sequencing, and identifying the corpus object with the maximum similarity;
verifying whether the matching result is accurate; and performing fault processing according to the address characteristics of the identified corpus object.
2. The method of claim 1, wherein the fault handling based on the address of the identified corpus object comprises:
and sending the identified address to a field cruising system to assist the troubleshooting personnel to carry out field cruising.
3. The method according to claim 1, wherein the obtaining the corpus content to be detected in the dialog comprises:
obtaining conversation voice, constructing a corpus recognition model, and recognizing the corpus content to be detected in the conversation voice in the corpus recognition model.
4. The method according to claim 3, wherein said constructing a corpus recognition model comprises:
setting sample labels according to whether each sample contains stop words, and training the corpus recognition model with the labeled samples.
5. The method of claim 1, further comprising:
and adjusting the word segmentation precision until the word segmentation result meets the preset precision condition.
6. The method of claim 1, further comprising:
and constructing a word segmentation model in a machine learning mode, and performing word segmentation processing on the corpus content to be detected.
7. The method of claim 6, wherein the segmentation model is further configured to merge broken semantic texts.
8. The method of claim 1, wherein the fault handling based on the address characteristics of the identified corpus objects, further comprises:
and feeding back the matched and identified address characteristics to the user in real time.
9. An address matching apparatus based on semantic analysis, comprising:
the corpus module is used for creating a standard library of addresses based on the obtained POI information and road doorplate information of the map application program; collecting sample address content, constructing a standard library sorting rule, performing character conversion on the sample address content by using a regular expression in the standard library sorting rule, and storing the converted sample address content into a created standard library; collecting customer service linguistic data, and removing non-standard words in the customer service linguistic data;
constructing a bag-of-words model and an address corpus by using a standard library;
performing word frequency vectorization processing on the objects in the corpus according to the character semantics to obtain semantic word frequency vectors;
the fault hotline module is used for carrying out voice conversation with a user, acquiring the corpus content to be detected in the conversation, cleaning the corpus content to be detected, and carrying out word frequency vectorization processing to obtain a semantic word frequency vector of the corpus content to be detected;
the matching module is used for calculating the similarity between the semantic word frequency vector of the corpus content to be detected and each object in the corpus, sequencing the similarity and identifying the corpus object with the maximum similarity;
verifying whether the matching result is accurate; and performing fault processing according to the address characteristics of the identified corpus object.
10. An electronic device, wherein the electronic device comprises:
a processor; and
a memory storing a computer executable program that, when executed, causes the processor to perform the method of any of claims 1-8.
11. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-8.
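For illustration only, the cleaning, word frequency vectorization, similarity calculation, and ranking steps recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the stop-word list, the punctuation-stripping regular expression, the whitespace tokenizer (a real Chinese address pipeline would need a proper word segmenter), the sample addresses, and the choice of cosine similarity over plain term-frequency vectors are all hypothetical, since the claims do not fix any of them.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the patent does not disclose the actual one.
STOP_WORDS = {"uh", "the", "is", "at", "please"}


def tokenize(text: str) -> list[str]:
    """Lowercase, strip punctuation with a regular expression,
    split on whitespace, and drop stop words."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def tf_vector(tokens: list[str]) -> Counter:
    """Bag-of-words term-frequency vector (token -> count)."""
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors.
    Cosine is an assumed measure; the claims only say 'similarity'."""
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_address(query: str, corpus: list[str]) -> list[tuple[str, float]]:
    """Score every corpus address against the query and rank them,
    highest similarity first."""
    q_vec = tf_vector(tokenize(query))
    scored = [(addr, cosine(q_vec, tf_vector(tokenize(addr)))) for addr in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical standard-library entries and a transcribed hotline utterance.
CORPUS = [
    "12 Renmin Road Gulou District",
    "12 Renmin Road Xuanwu District",
    "88 Zhongshan Road Gulou District",
]
ranked = match_address("uh the power is out at 12 renmin road, gulou district", CORPUS)
best_address, best_score = ranked[0]
```

Here `ranked[0]` holds the corpus object with the maximum similarity. The verification step of claim 1 would then confirm that result, for example by reading the address back to the caller, before fault handling proceeds.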
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111092055.6A CN113987114B (en) | 2021-09-17 | 2021-09-17 | Address matching method and device based on semantic analysis and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113987114A (en) | 2022-01-28 |
CN113987114B (en) | 2023-04-07 |
Family
ID=79736017
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111092055.6A Active CN113987114B (en) | 2021-09-17 | 2021-09-17 | Address matching method and device based on semantic analysis and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113987114B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105786800A (en) * | 2016-03-23 | 2016-07-20 | 苏州数字地图信息科技股份有限公司 | Police standard address acquiring method and system |
CN110019575A (en) * | 2017-08-04 | 2019-07-16 | 北京京东尚科信息技术有限公司 | The method and apparatus that geographical address is standardized |
WO2020168750A1 (en) * | 2019-02-18 | 2020-08-27 | 平安科技(深圳)有限公司 | Address information standardization method and apparatus, computer device and storage medium |
CN111638422A (en) * | 2020-06-11 | 2020-09-08 | 国家电网有限公司 | Rapid positioning method based on electric power big data power distribution network fault |
CN113326267A (en) * | 2021-06-24 | 2021-08-31 | 中国科学技术大学智慧城市研究院(芜湖) | Address matching method based on inverted index and neural network algorithm |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6302391B2 (en) * | 2014-10-07 | 2018-03-28 | ヤンマー株式会社 | Remote server |
CN106598953A (en) * | 2016-12-28 | 2017-04-26 | 上海博辕信息技术服务有限公司 | Address resolution method and device |
CN110442603B (en) * | 2019-07-03 | 2024-01-19 | 平安科技(深圳)有限公司 | Address matching method, device, computer equipment and storage medium |
CN110928971B (en) * | 2019-11-21 | 2023-05-09 | 深圳无域科技技术有限公司 | Method and device for improving address identification accuracy |
CN111522838B (en) * | 2020-04-23 | 2023-07-21 | 数网金融有限公司 | Address similarity calculation method and device |
CN112347222B (en) * | 2020-10-22 | 2022-03-18 | 中科曙光南京研究院有限公司 | Method and system for converting non-standard address into standard address based on knowledge base reasoning |
CN112559658B (en) * | 2020-12-08 | 2022-12-30 | 中国科学技术大学 | Address matching method and device |
CN112818685B (en) * | 2021-01-29 | 2024-07-26 | 上海寻梦信息技术有限公司 | Address matching method and device, electronic equipment and storage medium |
CN112767924A (en) * | 2021-02-26 | 2021-05-07 | 北京百度网讯科技有限公司 | Voice recognition method and device, electronic equipment and storage medium |
2021-09-17: CN application CN202111092055.6A (CN113987114B), status Active
Non-Patent Citations (3)
Title |
---|
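Improvement in Semantic Address Matching using Natural Language Processing; Vansh Gupta et al.; 《2021 2nd International Conference for Emerging Technology (INCET)》; 20210622; 1-5 * |
基于自然语言的中文地址匹配研究 [Research on Chinese address matching based on natural language]; 徐兵 (Xu Bing) et al.; 《电子设计工程》 [Electronic Design Engineering]; 20200818; Vol. 28, No. 16; 7-10+16 * |
基于语音识别的智能故障报修系统的研究与应用 [Research and application of an intelligent fault repair reporting system based on speech recognition]; 孙林檀 (Sun Lintan) et al.; 《电子科学技术》 [Electronic Science and Technology]; 20170910; Vol. 04, No. 05; 73-76 * |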
Also Published As
Publication number | Publication date |
---|---|
CN113987114A (en) | 2022-01-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107273503B (en) | Method and device for generating parallel text in same language | |
KR101806151B1 (en) | Method and device for extracting alternative words automatically, recording medium for performing the method | |
CN110276023B (en) | POI transition event discovery method, device, computing equipment and medium | |
US11822568B2 (en) | Data processing method, electronic equipment and storage medium | |
CN111191445B (en) | Advertisement text classification method and device | |
CN109086265B (en) | Semantic training method and multi-semantic word disambiguation method in short text | |
CN111930792B (en) | Labeling method and device for data resources, storage medium and electronic equipment | |
US20180373989A1 (en) | Relation extraction using co-training with distant supervision | |
CN111626055B (en) | Text processing method and device, computer storage medium and electronic equipment | |
CN113220999B (en) | User characteristic generation method and device, electronic equipment and storage medium | |
US10049108B2 (en) | Identification and translation of idioms | |
CN116402166B (en) | Training method and device of prediction model, electronic equipment and storage medium | |
CN110610003A (en) | Method and system for assisting text annotation | |
US10354013B2 (en) | Dynamic translation of idioms | |
CN111597800A (en) | Method, device, equipment and storage medium for obtaining synonyms | |
CN115438149A (en) | End-to-end model training method and device, computer equipment and storage medium | |
CN115798661A (en) | Knowledge mining method and device in clinical medicine field | |
CN118378631A (en) | Text examination method, device, equipment and storage medium | |
CN112989050B (en) | Form classification method, device, equipment and storage medium | |
CN110889717A (en) | Method and device for filtering advertisement content in text, electronic equipment and storage medium | |
CN114580383A (en) | Log analysis model training method and device, electronic equipment and storage medium | |
CN112148958A (en) | Method, apparatus, and computer storage medium for information recommendation | |
CN110929499B (en) | Text similarity obtaining method, device, medium and electronic equipment | |
CN110222139A (en) | Road solid data De-weight method, calculates equipment and medium at device | |
CN111460224B (en) | Comment data quality labeling method, comment data quality labeling device, comment data quality labeling equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||