CN115510025A - Construction method of government affair industry knowledge base based on natural language and user behavior analysis - Google Patents


Info

Publication number
CN115510025A
CN115510025A CN202211116425.XA CN202211116425A CN115510025A CN 115510025 A CN115510025 A CN 115510025A CN 202211116425 A CN202211116425 A CN 202211116425A CN 115510025 A CN115510025 A CN 115510025A
Authority
CN
China
Prior art keywords
data
user behavior
knowledge base
natural language
behavior analysis
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211116425.XA
Other languages
Chinese (zh)
Inventor
朱俊伟
贝聿运
陈祺
徐智蕴
方海宾
贝文馨
王倩璐
张晓东
陈飞飞
毛亚青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Big Data Center
Original Assignee
Shanghai Big Data Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-09-14
Filing date
2022-09-14
Publication date
2022-12-23
Application filed by Shanghai Big Data Center
Priority to CN202211116425.XA
Publication of CN115510025A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis, comprising the following steps: S1) acquiring unstructured, semi-structured and structured data from third-party databases, manually uploaded files, data crawled from web pages and user behavior data; S2) uniformly converting the unstructured and semi-structured data into structured data, and performing information extraction and data fusion to generate item classifications and material classifications; S3) analyzing the user behavior data and establishing item relations; and S4) reviewing all results, evaluating and quantifying the credibility of the knowledge, and storing it as applicable knowledge. The data collected by the invention are comprehensive and diverse; establishing item relations through user behavior analysis yields knowledge with hierarchy and logical structure; and quantifying the credibility of the knowledge helps ensure long-term effective operation and broader, more comprehensive application of the knowledge base after it is built.

Description

Construction method of government affair industry knowledge base based on natural language and user behavior analysis
Technical Field
The invention relates to a construction method of a knowledge base, in particular to a construction method of a government affair industry knowledge base based on natural language and user behavior analysis.
Background
Knowledge bases in the government affairs industry have traditionally been built by manual combing and updating. The data come from a single source, so the content of the knowledge base lacks hierarchy and logical structure. Heavy manual intervention makes maintenance costly, updates are often late, and the timeliness of the knowledge cannot be guaranteed, so the overall usability of the knowledge base deteriorates some time after it is built. Artificial intelligence techniques therefore need to be introduced so that comprehensive and diverse data can be acquired promptly through machine learning, ensuring the integrity and validity of the data.
As government services change, comprehensive service windows are being required in order to give applicants a better experience, improve service content and raise handling efficiency. How to understand the relations between items, so as to provide key information for setting up comprehensive windows, has become a new problem. Building item combinations or events from the inputs and outputs of item handling is simple, but it can only establish serial relations. Supplementing this with human experience only improves data within a given post's responsibilities; data that cross posts and departments are missed, so the understanding of item relations is incomplete, and the service content and number of comprehensive windows are then set unreasonably. It is therefore necessary to add user behavior data and introduce more comprehensive parallel relations between items, together with cross-post, cross-department and even cross-region attribute information, so as to provide a complete item relation graph, establish item combinations and events more comprehensively and reasonably, and set the content and number of comprehensive service windows.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis that can build the knowledge base quickly and effectively and ensure its long-term effective operation and broader, more comprehensive application after it is built.
To solve this problem, the invention provides a method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis, comprising the following steps: S1) acquiring unstructured, semi-structured and structured data from third-party databases, manually uploaded files, data crawled from web pages and user behavior data; S2) uniformly converting the unstructured and semi-structured data into structured data, and performing information extraction and data fusion to generate item classifications and material classifications; S3) analyzing the user behavior data and establishing item relations; and S4) reviewing all results, evaluating and quantifying the credibility of the knowledge, and storing it as applicable knowledge.
Further, the data acquired in step S1 include collected service guides, application materials, approval operation manuals, and related laws and regulations.
Further, step S2 extracts entities, entity attributes, and relations between entities using a BERT-based deep learning model.
Further, step S2 extracts triples of the form <entity, relation, entity>, performs fusion and disambiguation with the entities as nodes and the relations between entities as edges, and standardizes the names of relations commonly used in the business.
Further, step S2 also includes labeling items and materials, combing the hierarchy and logic of the data, and managing and searching the materials in graph form.
Further, step S3 includes: applying behavior event analysis, page click analysis and user behavior path analysis to establish cross-post, cross-department, cross-region and parallel relations between items.
Further, in step S3 the Apriori algorithm is used to analyze service-handling records, that is, the sequences of items handled within a given period, to mine frequent item sets and, after cleaning, to obtain associated item sequences with relatively high probability, thereby obtaining the associations between items in terms of business logic.
Further, step S4 includes: establishing an entity ontology, adding classification labels, inferring new entity relations from existing entity relations by computer reasoning, and using scheduled tasks to ensure the timeliness and availability of the knowledge base.
Compared with the prior art, the invention has the following beneficial effects: the method ensures that the data are comprehensive and diverse, establishes item relations through user behavior analysis to form knowledge with hierarchy and logical structure, evaluates and quantifies the credibility of the knowledge, and ensures long-term effective operation and broader, more comprehensive application of the knowledge base after it is built.
Drawings
FIG. 1 is a schematic diagram of the general construction process of the government affairs knowledge base according to the present invention;
FIG. 2 is a schematic diagram of the construction process of the government affairs knowledge base according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
FIG. 1 is a general construction flow diagram of the government affairs industry knowledge base of the present invention; FIG. 2 is a schematic flow chart of the construction of the government affairs industry knowledge base according to an embodiment of the present invention.
Referring to FIG. 1 and FIG. 2, the method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to the present invention includes:
S1) Acquiring unstructured, semi-structured and structured data from third-party databases, manually uploaded files, data crawled from web pages and user behavior data. The rich data and varied acquisition modes ensure that the knowledge base obtains comprehensive and diverse data from many sources, and address the acquisition of unstructured data from open online sources, web crawling and paper documents. The acquired data include collected service guides, application materials, approval operation manuals, and related laws and regulations.
S2) Uniformly converting the unstructured and semi-structured data into structured data, and performing information extraction and data fusion to generate item classifications and material classifications, which addresses the fusion of internal government databases with public data. Step S2 uses a deep learning model based on BERT (Bidirectional Encoder Representations from Transformers) to extract entities, entity attributes and the relations between entities. Step S2 extracts triples of the form <entity, relation, entity>, performs fusion and disambiguation with the entities as nodes and the relations between entities as edges, and standardizes the names of relations commonly used in the business.
S3) Analyzing the user behavior data and establishing item relations. Specifically, behavior event analysis, page click analysis and user behavior path analysis are applied to establish cross-post, cross-department, cross-region and parallel relations between items. This expands the data range of the knowledge base, speeds up its construction and supports effective operation.
S4) Reviewing all results, evaluating and quantifying the credibility of the knowledge, and storing it as applicable knowledge. Knowledge processing ensures the quality of the knowledge base. Specifically, an entity ontology is established, classification labels are added, new entity relations are inferred from existing entity relations by computer reasoning, and scheduled tasks are used to ensure the timeliness and availability of the knowledge base.
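The reasoning part of step S4 can be illustrated with a minimal sketch. The patent does not specify the inference rules or how credibility is scored, so everything here is an assumption made for illustration: the rule table `RULES`, the relation names, the confidence values, and the choice to score an inferred relation as the product of the confidences of the two facts it is derived from.

```python
# Hypothetical composition rule: if A -r1-> B and B -r2-> C, then A -new-> C.
RULES = {("requires material", "issued by"): "involves department"}


def infer(facts, rules, threshold):
    """Derive new triples from pairs of existing triples (single pass).

    facts: {(head, relation, tail): confidence in [0, 1]}
    Returns only inferred triples whose score reaches the threshold.
    """
    inferred = {}
    for (h1, r1, t1), c1 in facts.items():
        for (h2, r2, t2), c2 in facts.items():
            new_rel = rules.get((r1, r2))
            if new_rel is None or t1 != h2:
                continue  # no rule applies, or the two facts do not chain
            triple = (h1, new_rel, t2)
            score = c1 * c2  # assumed scoring: product of confidences
            if triple not in facts and score >= threshold:
                inferred[triple] = max(score, inferred.get(triple, 0.0))
    return inferred


# Hypothetical facts with made-up confidence scores.
facts = {
    ("restaurant opening", "requires material", "business license"): 0.9,
    ("business license", "issued by", "market supervision bureau"): 0.8,
    ("restaurant opening", "requires material", "lease contract"): 0.5,
    ("lease contract", "issued by", "landlord"): 0.6,
}
new_facts = infer(facts, RULES, threshold=0.5)
for triple, score in new_facts.items():
    print(triple, round(score, 2))
```

This sketch makes a single pass over the facts rather than iterating to a fixed point, and a scheduled job (not shown) could rerun it as the underlying data change.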
Regarding information extraction, step S2 extracts entities, entity attributes, and relations between entities. The extraction model is a BERT-based deep learning model trained on manually annotated government affairs corpora. The model has an end-to-end structure and extracts, in a single pass, the entities in the input text and the triples of the form <entity, relation, entity>. For example: <civil affairs bureau, reception time, workday 8.
Regarding data fusion, the triples extracted in step S2 are fused and disambiguated at the level of nodes (entities) and edges (relations), and the names of relations commonly used in the business are standardized, for example: capital, qualification, location, equipment, personnel, management regulations, subject of transaction, time of transaction, location of transaction, materials of transaction. Step S2 also includes labeling items and materials and combing the hierarchy and logic of the data, so that materials can be conveniently managed and searched in graph form.
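The fusion step can be sketched as follows. This is a minimal illustration, not the patented implementation: the alias tables `ENTITY_ALIASES` and `RELATION_ALIASES`, the sample triples and their values are all hypothetical, and a simple lookup table stands in for whatever disambiguation technique is actually used. Only the standardized relation names "time of transaction" and "location of transaction" come from the text above.

```python
from collections import defaultdict

# Hypothetical alias tables mapping surface forms to canonical names.
ENTITY_ALIASES = {
    "civil bureau": "Civil Affairs Bureau",
    "civil affairs bureau": "Civil Affairs Bureau",
}
RELATION_ALIASES = {
    "reception time": "time of transaction",
    "office hours": "time of transaction",
    "address": "location of transaction",
}


def normalize(name, aliases):
    """Collapse whitespace and case, then map to a canonical name if known."""
    key = " ".join(name.lower().split())
    return aliases.get(key, name.strip())


def fuse(triples):
    """Merge triples into a graph keyed by (canonical entity, canonical relation)."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        node = normalize(head, ENTITY_ALIASES)
        edge = normalize(relation, RELATION_ALIASES)
        graph[(node, edge)].add(tail.strip())
    return dict(graph)


# Hypothetical extracted triples with inconsistent naming.
triples = [
    ("civil bureau", "address", "1 Example Road"),
    ("Civil Affairs Bureau", "office hours", "weekdays"),
    ("Civil  Affairs Bureau", "reception time", "weekdays"),
]
graph = fuse(triples)
for key, values in graph.items():
    print(key, sorted(values))
```

After fusion the three input triples collapse onto one node with two standardized edges, which is what allows materials and items to be looked up consistently in the graph.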
Step S3 of the invention supplements and refines the data on item relations, item groups and events through user behavior analysis. It is implemented as follows: the Apriori algorithm is used to analyze service-handling records (for both individuals and enterprises), that is, the sequences of items handled within a given period (half a year or one year); frequent item sets are mined and, after cleaning, associated item sequences with relatively high probability are obtained (ordered, with a sequential logical relation). This yields the associations between items in terms of business logic, from which a library of application events is formed. The method provided by the invention thus not only constructs the knowledge base but also enhances the timeliness, generality and usability of the constructed knowledge base in practical application scenarios.
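The frequent item set mining described above can be sketched with a small, self-contained Apriori implementation. The item names, the sample records and the `min_support` value are hypothetical. The sketch mines unordered item sets only; the cleaning step and the recovery of the handling order mentioned in the text are not shown.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-sets that have size k.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count support of the surviving candidates.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical records: the items each applicant handled within the period.
records = [
    ["business_license", "tax_registration", "social_insurance_account"],
    ["business_license", "tax_registration"],
    ["business_license", "food_permit"],
    ["business_license", "tax_registration", "food_permit"],
    ["residence_permit"],
]
frequent_sets = apriori(records, min_support=3)
for items, support in sorted(frequent_sets.items(), key=lambda kv: -kv[1]):
    print(sorted(items), support)
```

With these sample records, the pair of business license and tax registration is the only frequent item set of size two, which is the kind of association that could seed an item combination.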
Although the present invention has been described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis, characterized by comprising the following steps:
S1) acquiring unstructured, semi-structured and structured data from third-party databases, manually uploaded files, data crawled from web pages and user behavior data;
S2) uniformly converting the unstructured and semi-structured data into structured data, and performing information extraction and data fusion to generate item classifications and material classifications;
S3) analyzing the user behavior data and establishing item relations;
and S4) reviewing all results, evaluating and quantifying the credibility of the knowledge, and storing it as applicable knowledge.
2. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein the data acquired in step S1 include collected service guides, application materials, approval operation manuals, and related laws and regulations.
3. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S2 extracts entities, entity attributes, and relations between entities using a BERT-based deep learning model.
4. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 3, wherein step S2 extracts triples of the form <entity, relation, entity>, performs fusion and disambiguation with the entities as nodes and the relations between entities as edges, and standardizes the names of relations commonly used in the business.
5. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S2 further comprises labeling items and materials, combing the hierarchy and logic of the data, and managing and searching the materials in graph form.
6. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S3 comprises: applying behavior event analysis, page click analysis and user behavior path analysis to establish cross-post, cross-department, cross-region and parallel relations between items.
7. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S3 uses the Apriori algorithm to analyze service-handling records, that is, the sequences of items handled within a given period, mines frequent item sets and, after cleaning, obtains associated item sequences with relatively high probability, thereby obtaining the associations between items in terms of business logic.
8. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S4 comprises: establishing an entity ontology, adding classification labels, inferring new entity relations from existing entity relations by computer reasoning, and using scheduled tasks to ensure the timeliness and availability of the knowledge base.
CN202211116425.XA 2022-09-14 2022-09-14 Construction method of government affair industry knowledge base based on natural language and user behavior analysis Pending CN115510025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211116425.XA CN115510025A (en) 2022-09-14 2022-09-14 Construction method of government affair industry knowledge base based on natural language and user behavior analysis

Publications (1)

Publication Number Publication Date
CN115510025A (en) 2022-12-23

Family

ID=84503375

Country Status (1)

Country Link
CN (1) CN115510025A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628172A (en) * 2023-07-24 2023-08-22 北京酷维在线科技有限公司 Dialogue method for multi-strategy fusion in government service field based on knowledge graph
CN116628172B (en) * 2023-07-24 2023-09-19 北京酷维在线科技有限公司 Dialogue method for multi-strategy fusion in government service field based on knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination