CN113011188A - Method for intelligently mining complaint reporting object based on complaint reporting content - Google Patents

Info

Publication number: CN113011188A
Authority: CN (China)
Prior art keywords: complaint, content, complaint reporting, reporting, name
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202110313877.6A
Other languages: Chinese (zh)
Inventors: 侯居永, 张雷, 栾丽丽, 陈兆亮
Current Assignee: Inspur Cloud Information Technology Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2021-03-24 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2021-03-24
Publication date: 2021-06-22

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G06F8/315 Object-oriented languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Operations Research (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for intelligently mining the object of a complaint report based on the content of the complaint report, comprising the following steps. Step 1: acquire source data by connecting to the complaint reporting database table, opening it, and reading its content into a data set. Step 2: identify organization names in the complaint content once the complaint content information in the data set has been obtained. Step 3: perform accurate matching by calling market supervision registration information and using the identified organization name and the address information to obtain the detailed name of the enterprise through an accurate matching algorithm. Step 4: recommend a result; once the detailed name of the enterprise that is the object of the complaint has been obtained, the system recommends and displays the result calculated in the previous step for use by subsequent applications. With this method, collected complaint reporting information can be intelligently matched against market supervision enterprise registration information using artificial intelligence (AI) technology, so that accurate information about the enterprise that is the object of the complaint report is obtained.

Description

Method for intelligently mining complaint reporting object based on complaint reporting content
Technical Field
This patent belongs to the field of computer software and artificial intelligence. It is mainly applied to the complaint report handling workflow of government departments; with the results of this technique, the object of a complaint report can be identified quickly and accurately, which greatly improves work efficiency.
Background
Named Entity Recognition (NER) is a fundamental task of Natural Language Processing (NLP). It aims to identify named entities in a corpus, such as the names of people, places, and organizations. Because the number of named entities keeps growing, they usually cannot be listed exhaustively in dictionaries, and because their formation follows its own regularities, the recognition of these words is usually handled separately from lexical and morphological processing tasks (e.g. Chinese word segmentation) and is called named entity recognition. Named entity recognition is an indispensable component of many natural language processing technologies, such as information extraction, information retrieval, machine translation, and question answering systems.
Named entities are the object of study of named entity recognition. They are generally divided into 3 major classes (entity, time, and numeric) and 7 minor classes (person name, place name, organization name, time, date, currency, and percentage). Judging whether a named entity has been correctly recognized involves two aspects: whether the boundaries of the entity are correct, and whether the type of the entity is labeled correctly. The main error types follow from these two aspects: either the text is correct but the type may be wrong, or the text boundary is wrong while the main entity words and the type label it contains may still be correct.
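The two correctness criteria above (boundary and type) can be illustrated with a minimal Python sketch. The `Entity` tuple, the character-offset convention, and the result labels are assumptions made for illustration; they do not come from the patent.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # inclusive character offset (assumed convention)
    end: int     # exclusive character offset (assumed convention)
    label: str   # entity type, e.g. "ORG", "PER", "LOC"


def judge(predicted: Entity, gold: Entity) -> str:
    """Classify one predicted entity against its gold annotation."""
    boundary_ok = (predicted.start, predicted.end) == (gold.start, gold.end)
    type_ok = predicted.label == gold.label
    if boundary_ok and type_ok:
        return "correct"
    if boundary_ok:
        return "type_error"        # right text span, wrong type
    if type_ok:
        return "boundary_error"    # right type, wrong text span
    return "boundary_and_type_error"


gold = Entity(0, 4, "ORG")
print(judge(Entity(0, 4, "ORG"), gold))  # correct
print(judge(Entity(0, 4, "LOC"), gold))  # type_error
print(judge(Entity(2, 4, "ORG"), gold))  # boundary_error
```

An entity counts as correctly recognized only when both checks pass; the other return values separate the error types described above.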
Algorithmically, Conditional Random Field (CRF) models are often used for named entity recognition. Because the method is simple, easy to implement, and achieves good performance, it is favored by industry, has been widely applied to recognizing various types of named entities such as the names of people, places, and organizations, and continues to be improved in specific applications.
Government departments have accumulated large amounts of complaint reporting data. This mass of data needs to be processed, and the information about each supervised object named in the complaint reports needs to be extracted automatically using artificial intelligence technology. The technology provided by this patent was developed against this background.
Disclosure of Invention
The technical task of the invention is to remedy the defects of the prior art by providing a method for intelligently mining the object of a complaint report based on the content of the complaint report. It addresses the low efficiency of manually identifying the supervised object from complaint report content in current government complaint report handling; with this technology, the object of a complaint report is identified automatically and work efficiency is greatly improved.
The technical scheme adopted by the invention to solve the technical problem is as follows:
A method for intelligently mining a complaint reporting object based on complaint reporting content comprises the steps of acquiring source data, identifying organization names, accurate matching, and recommending results:
Step 1: acquiring source data: connect to the complaint reporting database table, open it, read its content into a data set, and obtain the complaint content information;
Step 2: identifying the organization name: once the complaint content information in the data set has been obtained, identify the organization names in the complaint content;
Step 3: accurate matching: call market supervision registration information and, using the identified organization name and the address information, obtain the detailed name of the enterprise through an accurate matching algorithm;
Step 4: recommending a result: once the detailed name of the enterprise that is the object of the complaint has been obtained, the system recommends and displays the result calculated in the previous step for use by subsequent applications.
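The flow of the steps above can be sketched as a small Python driver. This is only an outline under stated assumptions: the function name `mine_complaint_objects`, the `(content, address)` row shape, and the two toy stand-in functions are hypothetical, and the recognizer and matcher are passed in as parameters because the text does not fix their implementation.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def mine_complaint_objects(
    rows: Iterable[Tuple[str, str]],
    recognize_orgs: Callable[[str], List[str]],
    match_enterprise: Callable[[List[str], str], Optional[str]],
) -> List[Optional[str]]:
    """Run the identification and matching steps over (content, address) rows."""
    results = []
    for content, address in rows:                          # output of step 1
        names = recognize_orgs(content)                    # step 2
        results.append(match_enterprise(names, address))   # step 3
    return results                                         # handed to step 4 for display


# Toy stand-ins, for illustration only.
def toy_recognizer(content: str) -> List[str]:
    return ["示例软件"] if "示例软件" in content else []


def toy_matcher(names: List[str], address: str) -> Optional[str]:
    return "济南示例软件有限公司" if names else None


rows = [("投诉示例软件乱收费", "济南市历下区"), ("路灯不亮", "济南市")]
results = mine_complaint_objects(rows, toy_recognizer, toy_matcher)
print(results)  # ['济南示例软件有限公司', None]
```

Passing the recognizer and matcher as callables keeps the driver independent of any particular NER model or registry lookup.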
Further, several organization names may be identified in step 2, so data cleaning is required to find the one that is actually needed, namely the name of the enterprise that is the object of the complaint report. A data cleaning step is therefore included after step 2 and before step 3; it cleans and filters irrelevant organizations out of the set of organization names identified in step 2 and retains only organization names related to enterprises or companies.
Further, after the database is opened in step 1, the content of the complaint report data table is read into the data set by executing an SQL SELECT statement.
Further, the data set in step 1 includes the complaint content and the complaint address information.
Further, in step 2 a named entity recognition (NER) algorithm is called on the complaint content to identify the organization names in it.
Further, the irrelevant organizations to be cleaned and filtered out in the data cleaning step are government agencies or communities.
Further, the irrelevant organizations to be cleaned and filtered out in the data cleaning step are identified by a fixed set of phrases, mainly including: government, Jupiter, Authority, Disable Commission, administration, Party, court, college, planning agency, petition office, immigration office, city management, supervision agency, religious Commission, Committee, armed services, Primary school, university, school, State department, Law enforcement office, education agency, Party group office, high school, work group, affiliate office, public Security, Party, police station, House administration, Dispatch Commission, subway station, House administration, supervision administration, Industrial office, Commission, fire protection, quarantine office, State administration, team, subsidiary, travel, City construction Commission, civil institute, Master, Committee, official, Collection, price office, school, health Care, prison, environmental agency, Party office, and office.
Further, the detailed enterprise name in step 3 includes four components: administrative area, trade name, industry feature, and organizational form.
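The four-part structure of a detailed enterprise name can be modeled as a small record. In this minimal sketch the class name, the field names, the concatenation order, and the sample company are all illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnterpriseName:
    administrative_area: str   # e.g. a city or province
    trade_name: str            # the distinctive name chosen by the enterprise
    industry_feature: str      # the line of business
    organizational_form: str   # e.g. limited company

    def full_name(self) -> str:
        # Assumed order: area + trade name + industry + form.
        return (
            self.administrative_area
            + self.trade_name
            + self.industry_feature
            + self.organizational_form
        )


# Hypothetical company, for illustration only.
name = EnterpriseName("济南", "示例", "软件", "有限公司")
print(name.full_name())  # 济南示例软件有限公司
```

Keeping the parts separate makes it possible to match on a single component, for example the distinctive name alone, when a complaint mentions only part of the registered name.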
Compared with the prior art, the method for intelligently mining the complaint reporting object based on the complaint reporting content has the following beneficial effects:
After the method is applied, collected complaint reporting information can be intelligently matched against market supervision enterprise registration information using artificial intelligence (AI) technology, so that accurate information about the enterprise that is the object of the complaint report is obtained.
In this way, subsequent applications related to complaint reports can carry out the relevant processing on the information about the complaint report object.
Drawings
FIG. 1 is a schematic flow chart of the algorithm of the present invention.
Detailed Description
To describe the working principle of the method for intelligently mining the complaint report object based on the complaint report content more clearly, the method is further described below with reference to the accompanying drawing.
The invention provides a method for intelligently mining a complaint reporting object based on complaint reporting content, which comprises the steps of acquiring source data, identifying organization names, data cleaning, accurate matching, and recommending results:
Step 1: acquiring source data: connect to the complaint reporting database table, open it, read its content into a data set, and obtain the complaint content information;
Step 2: identifying the organization name: once the complaint content information in the data set has been obtained, identify the organization names in the complaint content; several organization names may be identified, so data cleaning is needed to find the one that is actually needed, namely the name of the enterprise that is the object of the complaint report;
Step 3: data cleaning: clean and filter irrelevant organizations out of the set of organization names identified in step 2, keeping only organization names related to enterprises or companies;
Step 4: accurate matching: call market supervision registration information and, using the identified organization name and the address information, obtain the detailed name of the enterprise through an accurate matching algorithm;
Step 5: recommending a result: once the detailed name of the enterprise that is the object of the complaint has been obtained, the system recommends and displays the result calculated in the previous step for use by subsequent applications.
After the database is opened in step 1, the content of the complaint report data table is read into the data set by executing an SQL SELECT statement.
The data set in step 1 includes the complaint content and the complaint address information.
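Step 1 can be sketched with Python's built-in `sqlite3` module. The patent names neither the database engine nor the schema, so SQLite, the table name `complaint_report`, and the column names `content` and `address` are assumptions made for illustration.

```python
import sqlite3
from typing import Dict, List


def load_complaints(conn: sqlite3.Connection) -> List[Dict[str, str]]:
    """Read complaint content and address into an in-memory data set."""
    cursor = conn.execute("SELECT content, address FROM complaint_report")
    return [{"content": c, "address": a} for c, a in cursor.fetchall()]


# Demonstration against an in-memory database with one hypothetical row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE complaint_report (content TEXT, address TEXT)")
conn.execute(
    "INSERT INTO complaint_report VALUES (?, ?)",
    ("投诉示例软件乱收费", "济南市历下区示例路1号"),
)
dataset = load_complaints(conn)
conn.close()
print(dataset)
```

Each row of the resulting data set carries both fields that later steps rely on: the complaint content for organization name recognition and the address for matching.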
In step 2, a named entity recognition (NER) algorithm is called on the complaint content to identify the organization names in it.
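The patent does not state which NER model or tagging scheme is used. Assuming a tagger that emits one BIO tag per character (`B-ORG`, `I-ORG`, `O`), organization names could be collected from its output as in the following sketch; the function name and the sample sentence are hypothetical.

```python
from typing import List


def extract_orgs(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect ORG spans from per-token BIO tags (assumed tag scheme)."""
    orgs: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-ORG":
            if current:                      # close the previous span
                orgs.append("".join(current))
            current = [token]
        elif tag == "I-ORG" and current:
            current.append(token)            # extend the open span
        else:
            if current:
                orgs.append("".join(current))
            current = []
    if current:                              # span running to end of text
        orgs.append("".join(current))
    return orgs


tokens = list("投诉济南示例公司")
tags = ["O", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "I-ORG", "I-ORG"]
print(extract_orgs(tokens, tags))  # ['济南示例公司']
```

Tokens are joined without spaces because the example assumes character-level tokens of Chinese text. A single complaint may yield several spans, which is why a cleaning step follows.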
The irrelevant organizations to be cleaned and filtered out in step 3 are government agencies or communities.
The irrelevant organizations to be cleaned and filtered out in step 3 are identified by a fixed set of phrases, mainly including: government, Jupiter, Authority, Disable Commission, administration, Party, court, college, planning agency, petition office, immigration office, city management, supervision agency, religious Commission, Committee, armed services, Primary school, university, school, State department, Law enforcement office, education agency, Party group office, high school, work group, affiliate office, public Security, Party, police station, House administration, Dispatch Commission, subway station, House administration, supervision administration, Industrial office, Commission, fire protection, quarantine office, State administration, team, subsidiary, travel, City construction Commission, civil institute, Master, Committee, official, Collection, price office, school, health Care, prison, environmental agency, Party office, and office.
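The cleaning step can be sketched as a filter against a fixed phrase list. The original Chinese phrases are not available in this text, so the five phrases below are illustrative back-translations (government, court, police station, university, primary school), and substring containment is an assumed matching rule.

```python
from typing import Iterable, List

# Illustrative subset only; the patent's actual phrase list is longer.
BLOCKED_PHRASES = ("政府", "法院", "派出所", "大学", "小学")


def clean_org_names(names: Iterable[str]) -> List[str]:
    """Drop any organization name that contains a blocked phrase."""
    return [
        name
        for name in names
        if not any(phrase in name for phrase in BLOCKED_PHRASES)
    ]


recognized = ["历下区人民政府", "济南示例软件有限公司", "某某派出所"]
print(clean_org_names(recognized))  # ['济南示例软件有限公司']
```

Only the enterprise name survives, which is the candidate passed on to the matching step.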
Further, the detailed enterprise name in step 4 includes four components: administrative area, trade name, industry feature, and organizational form.
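The patent does not disclose the matching algorithm itself. As one possible reading, the sketch below selects registry records whose registered name contains the recognized organization name and then prefers the record whose registered address is most similar to the complaint address, using `difflib.SequenceMatcher` from the standard library. The registry layout, the function name, and the sample records are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def match_enterprise(
    org_name: str, address: str, registry: List[Dict[str, str]]
) -> Optional[str]:
    """Return the registered name that best fits the name and address."""
    candidates = [rec for rec in registry if org_name in rec["name"]]
    if not candidates:
        return None
    # Rank candidates by similarity between complaint and registered address.
    best = max(
        candidates,
        key=lambda rec: SequenceMatcher(None, address, rec["address"]).ratio(),
    )
    return best["name"]


# Hypothetical registration records, for illustration only.
registry = [
    {"name": "济南示例软件有限公司", "address": "济南市历下区示例路1号"},
    {"name": "青岛示例软件有限公司", "address": "青岛市市南区样例路2号"},
]
matched = match_enterprise("示例软件", "历下区示例路1号附近", registry)
print(matched)  # 济南示例软件有限公司
```

The address acts as a tie-breaker here: both records contain the recognized name, and the complaint address decides between them.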
The following is part of the Python source code, for reference:
[Source code listing: published as five images in the original (Figures BDA0002990328230000051, BDA0002990328230000061, BDA0002990328230000071, BDA0002990328230000081, BDA0002990328230000091); the code text is not recoverable here.]

Claims (8)

1. A method for intelligently mining a complaint reporting object based on complaint reporting content, characterized by comprising the steps of acquiring source data, identifying organization names, accurate matching, and recommending results:
Step 1: acquiring source data: connect to the complaint reporting database table, open it, read its content into a data set, and obtain the complaint content information;
Step 2: identifying the organization name: once the complaint content information in the data set has been obtained, identify the organization names in the complaint content;
Step 3: accurate matching: call market supervision registration information and, using the identified organization name and the address information, obtain the detailed name of the enterprise through an accurate matching algorithm;
Step 4: recommending a result: once the detailed name of the enterprise that is the object of the complaint has been obtained, the system recommends and displays the result calculated in the previous step for use by subsequent applications.
2. The method for intelligently mining the complaint reporting object based on the complaint reporting content as claimed in claim 1, wherein several organization names may be identified in step 2, so that data cleaning is required to find the one that is actually needed, namely the name of the enterprise that is the object of the complaint report; the method further comprises a data cleaning step, after step 2 and before step 3, for cleaning and filtering irrelevant organizations out of the set of organization names identified in step 2 and keeping only organization names related to enterprises or companies.
3. The method for intelligently mining the complaint reporting object based on the complaint reporting content as claimed in claim 1, wherein, after the database is opened in step 1, the content of the complaint report data table is read into the data set by executing an SQL SELECT statement.
4. The method for intelligently mining the complaint reporting object based on the complaint reporting content as claimed in claim 1, wherein the data set in step 1 comprises the complaint content and the complaint address information.
5. The method for intelligently mining the complaint reporting object based on the complaint reporting content as claimed in claim 1, wherein in step 2 a named entity recognition (NER) algorithm is called on the complaint content to identify the organization names in it.
6. The method for intelligently mining the complaint reporting object based on the complaint reporting content as claimed in claim 2, wherein the irrelevant organizations to be cleaned and filtered out in the data cleaning step are government agencies or communities.
7. The method for intelligently mining the complaint reporting object based on the complaint reporting content as claimed in claim 2, wherein the irrelevant organizations to be cleaned and filtered out in the data cleaning step are identified by a fixed set of phrases, mainly comprising: government, Jupiter, Authority, Disable Commission, administration, Party, court, college, planning agency, petition office, immigration office, city management, supervision agency, religious Commission, Committee, armed services, Primary school, university, school, State department, Law enforcement office, education agency, Party group office, high school, work group, affiliate office, public Security, Party, police station, House administration, Dispatch Commission, subway station, House administration, supervision administration, Industrial office, Commission, fire protection, quarantine office, State administration, team, subsidiary, travel, City construction Commission, civil institute, Master, Committee, official, Collection, price office, school, health Care, prison, environmental agency, Party office, and office.
8. The method for intelligently mining the complaint reporting object based on the complaint reporting content as claimed in claim 1, wherein the detailed enterprise name in step 3 comprises four components: administrative area, trade name, industry feature, and organizational form.
CN202110313877.6A 2021-03-24 2021-03-24 Method for intelligently mining complaint reporting object based on complaint reporting content Pending CN113011188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110313877.6A CN113011188A (en) 2021-03-24 2021-03-24 Method for intelligently mining complaint reporting object based on complaint reporting content

Publications (1)

Publication Number Publication Date
CN113011188A true CN113011188A (en) 2021-06-22

Family

ID=76406155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110313877.6A Pending CN113011188A (en) 2021-03-24 2021-03-24 Method for intelligently mining complaint reporting object based on complaint reporting content

Country Status (1)

Country Link
CN (1) CN113011188A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614524A (en) * 2018-11-26 2019-04-12 汉纳森(厦门)数据股份有限公司 A kind of method of keyword bi-directional matching
CN109670843A (en) * 2018-11-12 2019-04-23 平安科技(深圳)有限公司 Data processing method, device, computer equipment and the storage medium of complaint business
CN110619124A (en) * 2019-09-19 2019-12-27 成都数之联科技有限公司 Named entity identification method and system combining attention mechanism and bidirectional LSTM
CN111553817A (en) * 2020-04-24 2020-08-18 北京北大软件工程股份有限公司 Analysis method and system for goodness of fit of complaint reporting case and treatment department

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210622