CN111353309A - Method and system for processing communication quality complaint address based on text analysis - Google Patents


Info

Publication number: CN111353309A
Authority: CN (China)
Prior art keywords: address, level, tree, rule, result
Legal status: Pending
Application number: CN202010114162.3A
Other languages: Chinese (zh)
Inventors: 刘德厚, 雷晓宇, 王福君, 李言良
Current assignee: Beijing Heli Yijie Polytron Technologies Inc
Original assignee: Beijing Heli Yijie Polytron Technologies Inc
Priority date: 2019-12-25
Filing date: 2020-02-24
Publication date: 2020-06-30

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/29 Geographical information databases

Abstract

A method and system for processing communication quality complaint addresses based on text analysis. The method includes: performing Chinese address word segmentation on the source text with a forward longest matching algorithm; reasoning over the segmentation result with a multi-level administrative division address tree to obtain an accurate multi-level place name recognition result; after receiving the input passed on by the address tree, automatically parsing more complex and more general multi-level address fields with a rule-based multi-level address recognition algorithm; and fusing the inference result of the address tree with the rule-matching result as the final output of the algorithm system. By applying Chinese word segmentation, regular expression address extraction, and address tree reasoning to the complaint text recorded in a communication quality complaint work order, the standard address of the work order is extracted automatically.

Description

Method and system for processing communication quality complaint address based on text analysis
Technical Field
The invention relates to the technical field of computer networks, in particular to a method and a system for processing a communication quality complaint address based on text analysis.
Background
After a telecommunications customer calls the service hotline to complain about communication quality, a front-line agent records the fault in the work order system, which routes the order to the network department. The order is then dispatched level by level (province, city, county, and so on) until it reaches the unit ultimately responsible for handling it. Existing telecommunications customer service work order systems have no text-analysis-based method for processing communication quality complaint addresses, so orders must be dispatched manually at every stage, from the call center through each level of network department. As a result, the work order cycle is too long and efficiency is too low.
Disclosure of Invention
The scheme applies Chinese word segmentation, regular expression address extraction, and address tree reasoning to the complaint text recorded in the communication quality complaint work order, and thereby extracts the work order's standard address automatically.
The invention aims to solve the problem of tight coupling between events throughout the push process, and allows services to be combined freely by configuring rules.
The invention provides a method for processing telecommunication industry communication quality complaint addresses based on text analysis, which comprises the following steps:
Step 1: perform Chinese address word segmentation on the source text using a forward longest matching algorithm;
Step 2: reason over the word segmentation result with a multi-level administrative division address tree to obtain an accurate multi-level place name recognition result;
Step 3: after receiving the input passed on by the address tree, automatically parse more complex and more general multi-level address fields using a rule-based multi-level address recognition algorithm;
Step 4: fuse the inference result of the address tree with the rule-matching result as the final output of the algorithm system.
In an embodiment of the present disclosure, in the first step, the administrative division prefix index database is queried.
In an embodiment of the present disclosure, in the second step, the multi-level administrative division record data is queried.
In an embodiment of the present disclosure, in the third step, the recognition rule base of each level is queried.
In an embodiment of the present disclosure, steps one to three include named entity recognition processing at each level.
The invention also provides a system for processing telecommunication industry communication quality complaint addresses based on text analysis, comprising the following modules:
the forward longest matching algorithm module is used for performing Chinese address word segmentation on the source text;
the multi-level administrative division address tree reasoning module is used for reasoning and analyzing the word segmentation result to obtain an accurate multi-level place name recognition result;
the rule-based multi-level address identification module is used for automatically analyzing more complicated and more general multi-level address fields after receiving input transmitted by an address tree;
and the result fusion output module fuses the inference result of the address tree and the recognition result matched with the rule and takes the fusion result as the final output of the algorithm system.
In an embodiment of the present disclosure, the forward longest match algorithm module queries an administrative division prefix index database.
In an embodiment of the disclosure, the multi-level administrative division address tree inference module queries the multi-level administrative division record data.
In one embodiment of the present disclosure, the rule-based multi-level address identification module queries a multi-level identification rule base.
In one embodiment of the disclosure, named entity recognition processing modules are included at various levels.
The technical effect of the method and system is that they provide automatic, text-analysis-based processing of telecommunication quality complaint addresses, which speeds up the circulation of communication quality complaint work orders and strengthens the means available for evaluating and managing such complaints.
Additional features and advantages of embodiments of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solutions of the embodiments of the present invention are further described in detail with reference to the accompanying drawings and embodiments.
Drawings
FIG. 1 is a schematic diagram of a Chinese address resolution system;
FIG. 2 is a flow chart of Chinese address resolution;
FIG. 3 is a flow chart of a forward longest match algorithm;
FIG. 4 is a schematic diagram of address tree reasoning.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention applies Chinese word segmentation, regular expression address extraction, and address tree reasoning to the complaint text recorded in the communication quality complaint work order, so that the specific address of the complaint is extracted automatically and converted into a 9-segment standard address format.
Implementation approach
The province, city, county, and similar components of a Chinese address are national administrative divisions. Their names are fixed and rarely change, so they can serve as a standard database against which the addresses to be parsed are compared. The forward longest matching strategy from Chinese word segmentation can quickly segment out the addresses that exist in this database. Because hierarchical addresses form a tree, an address tree can be built and reasoned over to identify the Chinese address.
Because the number of Chinese addresses grows exponentially as levels are subdivided, finer-grained addresses are difficult to compile into a standardized place name database. Analysis shows, however, that the addresses at each level share common patterns; if all of these patterns can be captured, the Chinese address recognition problem can be solved. Carefully designed rules for a specific address recognition task can complete that task with high accuracy. The implementation therefore builds a series of regular expressions and applies them in ascending order of address level.
1. System architecture
FIG. 1 shows the architecture of the method and system for processing telecommunication industry complaint addresses based on text analysis. The architecture comprises five components: an infrastructure layer, a data resource layer, a data comprehensive analysis layer, a data service layer, and a problem solving layer. The scheme runs on an x86 server in a Java runtime environment on Linux. Its database stores national three-level administrative division benchmark data published by the National Bureau of Statistics, together with unofficial national five-level administrative division benchmark data; the data is current and can be updated promptly. The data service layer exposes a Chinese address resolution Java API through which the scheme is invoked. The problem solving layer verifies the algorithm's accuracy on a training set and then processes the test set. The data comprehensive analysis layer, called by the problem solving layer, parses Chinese addresses using the stored rules. The basic rules include the forward longest match rule, the address tree inference rules, and other pattern matching rules.
The selection of this architecture has several advantages:
a) Java code runs across platforms, and the algorithm system exposes an external API, so it is simple and convenient to call.
b) The algorithm relies on up-to-date administrative division data, so irregular place names can be corrected. When administrative divisions change, only the data needs to be replaced; the code does not need to be rewritten.
c) Pattern matching is rule-based, with each rule expressed as a regular expression. This representation is concise and efficient in practice. Because Chinese address conventions keep the inherent pattern of each address level largely unchanged, a given rule set is widely applicable; if a new pattern is found, the rule base is updated.
d) The algorithm is modular and easy to extend. For place names of existing administrative divisions, the scheme's accuracy is close to 100%. For unregistered place names, accuracy can be improved further by adding algorithm modules. For place name recognition in other countries, only the administrative division database and the rule database need to be modified, so the system is easy to extend within this framework.
2. Overall process flow of system
The flow of the algorithm is shown in fig. 2. The system relies on known place name data and matching rules.
The embodiment of the invention takes the communication quality complaint work order text as the object of Chinese address analysis. The system first performs forward longest matching Chinese address segmentation on the source text by querying the administrative division prefix index data, and then obtains an accurate multi-level place name recognition result by querying the multi-level administrative division record data and reasoning over the multi-level administrative division address tree. Because this step is highly accurate, it is reasonable to remove the recognized place names from the source text and pass the remaining text to the next module.
The rule-based multi-level address recognition algorithm analyzes the place names at each level to find the most general rules in the data, from which reasonable and effective recognition rules are designed. The rules are written as regular expressions, which makes the method simple, fast, and general. It can identify place names that do not appear in the database, so it generalizes well. After the rule-based multi-level address recognition module receives the input passed on by the address tree, it queries the recognition rule base for each level, selects the appropriate address recognition rules, and automatically parses more complex and more general multi-level address fields.
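As a minimal sketch of how a suffix rule generalizes beyond the database, the Java class below recognizes a town-level name from its suffix alone. The class name `TownRule` and its pattern (two to eight Chinese characters followed by 镇, 乡 or 街道) are assumptions for illustration; the patent's own town expression in Section 3.3.2 is only partially preserved. The sketch also assumes the province, city and district names have already been removed by the address tree step, because a lazy quantifier still starts matching at the leftmost possible position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical town-level rule: recognizes a town by its suffix, not by a database lookup. */
class TownRule {
    // Assumed pattern: 2 to 8 CJK characters (lazy) followed by a town-level suffix.
    private static final Pattern TOWN =
            Pattern.compile("[\\u4e00-\\u9fa5]{2,8}?(?:镇|乡|街道)");

    /** Returns the first town-level name found in the text, or null if none matches. */
    static String recognize(String text) {
        Matcher m = TOWN.matcher(text);
        return m.find() ? m.group() : null;
    }
}
```

For instance, `TownRule.recognize("凤凰镇幸福路12号")` yields 凤凰镇 even if that name is absent from the place name database, while text without a town-level suffix yields null.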
Finally, the algorithm system fuses the inference result of the address tree with the rule-matching result to produce its final output.
The entire algorithm system is implemented in Java and exposes an external interface.
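The hand-off between modules and the final fusion step can be sketched as follows. This is a hypothetical illustration, not the patent's interface: `ResultFusion`, `removeRecognized` and `fuse` are invented names, and the output is a plain ordered list of segments rather than the 9-segment standard format, whose fields the text does not specify. `removeRecognized` mirrors the step in which recognized place names are cut from the source text before the rule module runs; `fuse` places the address tree path, reordered from the root down, ahead of the rule-matched fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the module hand-off and result fusion. */
class ResultFusion {
    /** Cuts each recognized surface string out of the source text before rule matching. */
    static String removeRecognized(String text, List<String> surfaceForms) {
        String rest = text;
        for (String s : surfaceForms) {
            rest = rest.replace(s, "");
        }
        return rest;
    }

    /**
     * Combines the tree inference result (given leaf to root) with the rule fields
     * (in the order the rules produced them) into one root-to-leaf segment list.
     */
    static List<String> fuse(List<String> treePathLeafToRoot, Map<String, String> ruleFields) {
        List<String> segments = new ArrayList<>(treePathLeafToRoot);
        Collections.reverse(segments);          // country, province, city, district
        segments.addAll(ruleFields.values());   // then town, road, number, ...
        return segments;
    }
}
```

With a `LinkedHashMap` of rule fields such as road 中山路 and number 8号, fusing the path [玄武区, 南京市, 江苏省, 中国] gives [中国, 江苏省, 南京市, 玄武区, 中山路, 8号]. Using an ordered map keeps the rule fields in the sequence the rules ran.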
3. Detailed system design
3.1 Forward longest match algorithm module
The algorithm flow is shown in FIG. 3, and its design is as follows:
1) Build a prefix hash table of place names and an address tree from the three-level administrative division data.
2) Set i to 0.
3) For the input string S, starting at the ith character of S, search for the longest substring beginning with that character that appears in the prefix hash table. If its length exceeds a set threshold, the match is considered successful and the place name is identified; otherwise, set i ← i + 1 and go to step 5).
4) Add the place name to the address tree with a score of $X/2^{\mathrm{count}}$, where X is any non-zero constant and count is the total number of place names identified so far. Set i to the position immediately after the identified substring.
5) If i is less than the length of S, repeat step 3); otherwise, stop.
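The five steps above can be sketched in Java as follows. Assumptions not stated in the text: the prefix hash table maps every prefix of a registered place name to that name (the first registration wins when prefixes collide), `minLength` stands in for the unspecified threshold, and count is incremented before each score is computed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the forward longest-match segmenter (Section 3.1). */
class ForwardLongestMatch {
    /** A recognized place name and the score X / 2^count it contributes to the address tree. */
    static final class Hit {
        final String name;
        final double score;
        Hit(String name, double score) { this.name = name; this.score = score; }
    }

    private final Map<String, String> prefixTable = new HashMap<>();
    private final int minLength; // assumed stand-in for the matching threshold

    ForwardLongestMatch(Collection<String> placeNames, int minLength) {
        this.minLength = minLength;
        // Step 1: every prefix of every place name points back to that name.
        for (String name : placeNames) {
            for (int end = 1; end <= name.length(); end++) {
                prefixTable.putIfAbsent(name.substring(0, end), name);
            }
        }
    }

    List<Hit> segment(String s, double x) {
        List<Hit> hits = new ArrayList<>();
        int count = 0;
        int i = 0;                                      // Step 2
        while (i < s.length()) {                        // Step 5: stop at the end of S
            // Step 3: longest substring starting at i that is in the prefix table.
            int best = 0;
            for (int end = i + 1; end <= s.length(); end++) {
                if (!prefixTable.containsKey(s.substring(i, end))) {
                    break; // prefixes of stored prefixes are stored, so nothing longer matches
                }
                best = end - i;
            }
            if (best > minLength) {
                // Step 4: score the place name and jump past the matched substring.
                count++;
                String name = prefixTable.get(s.substring(i, i + best));
                hits.add(new Hit(name, x / Math.pow(2, count)));
                i += best;
            } else {
                i++;
            }
        }
        return hits;
    }
}
```

With 江苏省, 南京市 and 玄武区 registered and `minLength = 1`, segmenting 南京市玄武区政府 with X = 1 yields 南京市 (score 0.5) and 玄武区 (score 0.25). Because lookups go through the prefix table, an abbreviated form such as 南京 would also resolve to 南京市.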
3.2 Address tree reasoning module
FIG. 4 is a schematic diagram of the address tree reasoning module. The algorithm works as follows:
Enumerate the nodes on the address tree that have received scores, and for each one compute the total score along its path to the root; the nodes on that path form a multi-level address. The node with the highest total is taken as the answer, giving the final inference result. For example, consider the address "Xuanwu District Government, Nanjing City". The scored nodes are Nanjing City and Xuanwu District, with path totals of x and 1.5x respectively (Xuanwu District's path includes Nanjing City's score, so its total is larger). The Xuanwu District node is therefore selected as the recognition result, and its path to the root is Xuanwu District - Nanjing City - Jiangsu Province - China.
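A minimal sketch of this inference follows, assuming each node stores the score accumulated during segmentation and that place names serve as unique keys (a real division table would need division codes, since district names repeat across cities). `AddressTree`, `add`, `addScore` and `infer` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of address tree reasoning (Section 3.2). */
class AddressTree {
    static final class Node {
        final String name;
        final Node parent;
        double score; // score accumulated from segmentation; 0 if never matched
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    private final Node root = new Node("中国", null);
    private final Map<String, Node> byName = new HashMap<>();

    AddressTree() { byName.put(root.name, root); }

    /** Registers a division under an already-registered parent. */
    void add(String name, String parentName) {
        byName.put(name, new Node(name, byName.get(parentName)));
    }

    void addScore(String name, double score) {
        Node n = byName.get(name);
        if (n != null) n.score += score;
    }

    /** Picks the scored node with the highest path total and returns its path, leaf to root. */
    List<String> infer() {
        Node best = null;
        double bestTotal = Double.NEGATIVE_INFINITY;
        for (Node n : byName.values()) {
            if (n.score == 0) continue; // enumerate only nodes that received a score
            double total = 0;
            for (Node p = n; p != null; p = p.parent) total += p.score;
            if (total > bestTotal) { bestTotal = total; best = n; }
        }
        List<String> path = new ArrayList<>();
        for (Node p = best; p != null; p = p.parent) path.add(p.name);
        return path;
    }
}
```

Registering 江苏省, 南京市 and 玄武区 under 中国 and adding the scores 0.5 and 0.25 from segmentation gives path totals of 0.5 for 南京市 and 0.75 for 玄武区, that is x and 1.5x, so `infer()` returns [玄武区, 南京市, 江苏省, 中国].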
3.3 Named entity recognition module
3.3.1 Data cleansing logic design
Each level's named entity recognition module first cleans the data to remove interfering text. For example, "leisure area" is treated as interference and removed using the regular expression ++ (
3.3.2 Per-level entity recognition design
Each of the following named entities is removed from the text once it has been identified.
a) Identification of towns and roads
The regular expression for identifying a town is [\u4e00-\u9fa5_0-9]+?(
The regular expression for identifying a road is [\u4e00-\u9fa5_0-9]+{2,10}?(
b) Identification of road numbers
The regular expression for identifying a road number is: (
c) Identification of building number, unit number and house number
The regular expression for identifying a building number is: ([\u4e00-\u9fa5_0-9]+{2,}
The regular expression for identifying a unit number is [0-9_零_一_二_三_四_五_六_七_八_九_十]+?(
The regular expression for identifying a house number is [0-9_零_一_二_三_四_五_六_七_八_九_十_a-Z_-]+?(
d) Data cleansing
If a result contains a repeated prefix, as in "Xitang Town", it is corrected automatically.
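The identify-then-remove behavior of Section 3.3.2 and the repeated-prefix correction in d) can be sketched as below. Because the source expressions are truncated, every pattern here is an assumption: the suffixes 号楼/栋/幢 (building), 单元 (unit) and 室/户 (house) are illustrative, as are the class and method names. The repeated-prefix check is also an interpretation: it treats a result such as 西塘西塘镇 as a doubled leading segment and keeps one copy, and it ignores one-character repeats so that names with reduplicated characters are left alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of per-level entity extraction with removal, plus prefix repair. */
class EntityCascade {
    // Assumed numeral class: Arabic digits plus Chinese numerals.
    private static final String NUM = "[0-9零一二三四五六七八九十]+";
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    static {
        // Rules run in insertion order: building, then unit, then house number.
        RULES.put("building", Pattern.compile(NUM + "(?:号楼|栋|幢)"));
        RULES.put("unit", Pattern.compile(NUM + "单元"));
        RULES.put("house", Pattern.compile("[0-9A-Za-z-]+(?:室|户)"));
    }

    /** Applies each rule once; a matched entity is recorded and then cut out of the text. */
    static Map<String, String> extract(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        String rest = text;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(rest);
            if (m.find()) {
                result.put(rule.getKey(), m.group());
                rest = rest.substring(0, m.start()) + rest.substring(m.end());
            }
        }
        result.put("rest", rest); // leftover text for the remaining rules
        return result;
    }

    /** Collapses one immediately repeated leading segment of two or more characters. */
    static String dedupePrefix(String s) {
        for (int len = s.length() / 2; len >= 2; len--) {
            if (s.startsWith(s.substring(0, len), len)) {
                return s.substring(len);
            }
        }
        return s;
    }
}
```

Running `extract` on 幸福路12号3号楼二单元502室 returns building 3号楼, unit 二单元 and house 502室, and leaves 幸福路12号 for the road-number rule; `dedupePrefix("西塘西塘镇")` returns 西塘镇. `dedupePrefix` is kept out of `extract` on purpose, because a number such as 1212 is legitimate.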
By applying forward longest matching word segmentation, regular expression address extraction, and Chinese address tree reasoning to the address text recorded in the communication quality complaint work order, the method automatically extracts the specific address of the complaint and converts it into a 9-segment standard address format. This supports automatic work order dispatch, so that an order can be sent to the final responsible unit with a single action, which speeds up the processing of communication quality complaint work orders and improves overall service efficiency and customer perception. In addition, complaint volume indicators can be built nationwide for every level (province, city, district, and county) and used to evaluate and manage the network departments at each level.
The above describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for processing a communication quality complaint address based on text analysis, characterized by comprising the following steps:
step one, performing Chinese address word segmentation on a source text through a forward longest matching algorithm;
step two, performing reasoning analysis on the word segmentation result with a multi-level administrative division address tree to obtain an accurate multi-level place name recognition result;
step three, after receiving the input transmitted by the address tree, automatically parsing more complex and more general multi-level address fields based on a rule-based multi-level address recognition algorithm;
and step four, fusing the inference result of the address tree and the rule-matching recognition result as the final output of the algorithm system.
2. The method of claim 1, wherein in step one, the administrative division prefix index database is queried.
3. The method of claim 1, wherein in step two, multi-level administrative division record data is queried.
4. The method of claim 1, wherein in step three, the recognition rule base of each level is queried.
5. The method according to claim 1, wherein the steps one to three comprise various levels of named entity recognition processing.
6. A system for processing a communication quality complaint address based on text analysis, characterized by comprising the following modules:
the forward longest matching algorithm module is used for performing Chinese address word segmentation on the source text;
the multi-level administrative division address tree reasoning module is used for reasoning and analyzing the word segmentation result to obtain an accurate multi-level place name recognition result;
the rule-based multi-level address identification module is used for automatically analyzing more complicated and more general multi-level address fields after receiving input transmitted by an address tree;
and the result fusion output module fuses the inference result of the address tree and the recognition result matched with the rule and takes the fusion result as the final output of the algorithm system.
7. The system of claim 6, wherein the forward longest match algorithm module queries an administrative district prefix index database.
8. The system of claim 6, wherein the multi-level administrative district address tree inference module queries multi-level administrative district record data.
9. The system of claim 6, wherein the rule-based multi-level address identification module queries the identification rule base of each level.
10. The system of claim 6, comprising named entity recognition processing modules at various levels.
CN202010114162.3A 2019-12-25 2020-02-24 Method and system for processing communication quality complaint address based on text analysis Pending CN111353309A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019113624383 2019-12-25
CN201911362438 2019-12-25

Publications (1)

Publication Number Publication Date
CN111353309A (en) 2020-06-30

Family

ID=71195764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010114162.3A Pending CN111353309A (en) 2019-12-25 2020-02-24 Method and system for processing communication quality complaint address based on text analysis

Country Status (1)

Country Link
CN (1) CN111353309A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914557A (en) * 2020-07-31 2020-11-10 上海燕汐软件信息科技有限公司 Address resolution method, device, equipment and computer readable storage medium
CN112181978A (en) * 2020-08-19 2021-01-05 杭州数梦工场科技有限公司 Address storage structure, address resolution method, device, medium and computer equipment
CN112699683A (en) * 2020-12-31 2021-04-23 大唐融合通信股份有限公司 Named entity identification method and device fusing neural network and rule

Citations (11)

Publication number Priority date Publication date Assignee Title
WO2008107305A2 (en) * 2007-03-07 2008-09-12 International Business Machines Corporation Search-based word segmentation method and device for language without word boundary tag
CN101882163A (en) * 2010-06-30 2010-11-10 中国科学院地理科学与资源研究所 Fuzzy Chinese address geographic evaluation method based on matching rule
CN102024024A (en) * 2010-11-10 2011-04-20 百度在线网络技术(北京)有限公司 Method and device for constructing address database
US20170124497A1 (en) * 2015-10-28 2017-05-04 Fractal Industries, Inc. System for automated capture and analysis of business information for reliable business venture outcome prediction
CN106649464A (en) * 2016-09-26 2017-05-10 深圳市数字城市工程研究中心 Method of building Chinese address tree and device
CN107016084A (en) * 2017-03-31 2017-08-04 江苏速度信息科技股份有限公司 A kind of place name address quickly positions the method with inquiry
CN108763215A (en) * 2018-05-30 2018-11-06 中智诚征信有限公司 A kind of address storage method, device and computer equipment based on address participle
US20180365217A1 (en) * 2017-06-14 2018-12-20 Beijing Baidu Netcom Science And Technology Co., Ltd. Word segmentation method based on artificial intelligence, server and storage medium
CN109308674A (en) * 2017-07-26 2019-02-05 北京嘀嘀无限科技发展有限公司 Processing method, device and the terminal device of Order Address
CN109522335A (en) * 2018-09-19 2019-03-26 北京明略软件系统有限公司 A kind of information acquisition method, device and computer readable storage medium
CN109961259A (en) * 2019-03-28 2019-07-02 上海中通吉网络技术有限公司 Address Standardization processing method and equipment


Non-Patent Citations (2)

Title
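应申, 李威阳, 贺彪, 王维, 赵朝彬: "Address text matching method based on an urban address tree" *
李晓林, 张懿, 周华兵, 李霖: "Administrative division identification method for Chinese addresses based on the C-F model" *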


Similar Documents

Publication Publication Date Title
CN111353309A (en) Method and system for processing communication quality complaint address based on text analysis
US6735595B2 (en) Data structure and storage and retrieval method supporting ordinality based searching and data retrieval
US7281001B2 (en) Data quality system
Martinelli et al. Measuring knowledge persistence: a genetic approach to patent citation networks
Fogel et al. A note on representations and variation operators
US20060112133A1 (en) System and method for creating and maintaining data records to improve accuracy thereof
CN101699440B (en) Service-based retrieving method and service-based retrieving system
CN111176656B (en) Complex data matching method and medium
Goan et al. A grammar inference algorithm for the world wide web
CN110825919A (en) ID data processing method and device
Vilar Query learning of subsequential transducers
CN111737529B (en) Multi-source heterogeneous data acquisition method
CN115017251B (en) Standard mapping map establishing method and system for smart city
CN115146635B (en) Address segmentation method based on domain knowledge enhancement
CN108090185A (en) A kind of customer information duplicate checking method
CN115292448A (en) Language escaping method, device, equipment and storage medium
CN110765100B (en) Label generation method and device, computer readable storage medium and server
Sano et al. Modeling Prim's Algorithm for Tour Agencies' Minimum Traveling Paths to Increase Profitability
CN116414808A (en) Method, device, computer equipment and storage medium for normalizing detailed address
CN113742498A (en) Method for constructing and updating knowledge graph
CN112637432A (en) Extension identification method, system, equipment and storage medium under outbound scene
CN105843785A (en) Data custom calculation statement generation method embedded with organizational management level
CN110634019A (en) Matching method based on enterprise and region, electronic equipment and storage medium
CN116501897B (en) Method for constructing knowledge graph based on fuzzy matching
CN114490928B (en) Implementation method, system, computer equipment and storage medium of semantic search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination