CN115510025A - Construction method of government affair industry knowledge base based on natural language and user behavior analysis - Google Patents
- Publication number: CN115510025A
- Application number: CN202211116425.XA
- Authority: CN (China)
- Prior art keywords: data, user behavior, knowledge base, natural language, behavior analysis
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis, comprising the following steps: S1) acquiring unstructured data, semi-structured data and structured data in various ways from a third-party database, manually uploaded files, data captured from web pages, and user behavior data; S2) uniformly converting the unstructured data and the semi-structured data into structured data, and performing information extraction and data fusion to generate an item classification and a material classification; S3) analyzing the user behavior data and establishing item relationships; and S4) reviewing all results, quantitatively evaluating the credibility of the knowledge, and storing it as usable knowledge. The data collected by the invention is comprehensive and diverse; establishing item relationships through user behavior analysis yields knowledge with hierarchy and logical structure; and quantitatively evaluating the credibility of the knowledge ensures that the knowledge base, once built, operates effectively over the long term and can be applied more widely and comprehensively.
Description
Technical Field
The invention relates to a method for constructing a knowledge base, and in particular to a method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis.
Background
Knowledge bases in the government affairs industry are currently built by manual sorting and updating. The data comes from a single source, so the content of the knowledge base lacks hierarchy and logical structure. Heavy manual intervention makes maintenance too costly, data updates are often late, and the timeliness of the knowledge cannot be guaranteed, so the overall usability of a knowledge base deteriorates some time after it is built. Artificial intelligence technology therefore needs to be introduced so that comprehensive and diverse data can be obtained promptly through machine self-learning and the integrity and validity of the data can be ensured.
As government services change, integrated service windows are required in order to give ordinary applicants a better handling experience, improve service content, and raise handling efficiency. How to understand the relationships between items, so as to provide key information for setting up integrated windows, has become a new problem. Establishing item combinations or events solely from the inputs and outputs of item handling is simplistic and can only yield serial relationships. Supplementing this with human experience can only improve data within the scope of a post's responsibilities; data that spans posts and departments is omitted, so the understanding of item relationships is incomplete and the service content and number of integrated windows are set unreasonably. It is therefore necessary to add user behavior data and introduce more comprehensive attribute information on parallel, cross-post, cross-department and even cross-region relationships between items, so as to provide a comprehensive item relationship graph, establish item combinations and events more completely and reasonably, and set the content and number of integrated service windows.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis that can build the knowledge base quickly and effectively and ensure that, once built, it operates effectively over the long term and can be applied more widely and comprehensively.
The technical solution adopted by the invention to solve this problem is a method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis, comprising the following steps: S1) acquiring unstructured data, semi-structured data and structured data in various ways from a third-party database, manually uploaded files, data captured from web pages, and user behavior data; S2) uniformly converting the unstructured data and the semi-structured data into structured data, and performing information extraction and data fusion to generate an item classification and a material classification; S3) analyzing the user behavior data and establishing item relationships; and S4) reviewing all results, quantitatively evaluating the credibility of the knowledge, and storing it as usable knowledge.
Further, the data acquired in step S1 includes collected service guides, declaration materials, approval operation manuals, and relevant laws and regulations.
Further, step S2 uses a BERT-based deep learning model to extract entities, entity attributes, and relationships between entities.
Further, step S2 extracts <entity, relationship, entity> triples, performs fusion and disambiguation with the entities as nodes and the relationships between entities as edges, and standardizes the names of relationships commonly used in the business.
Further, step S2 also includes labeling items and materials, sorting out the hierarchy and logic of the data, and managing and searching materials in a graph-based manner.
Further, step S3 includes applying behavior event analysis, page click analysis and user behavior path analysis to establish cross-post, cross-department, cross-region and parallel relationships between items.
Further, step S3 uses the Apriori algorithm to analyze the transaction sequences in the transaction records over a certain period, mines frequent item sets, and after cleaning obtains associated transaction sequences of higher probability, thereby obtaining item associations in terms of business logic.
Further, step S4 includes establishing an entity ontology, adding classification labels, deriving new entity relationships from existing entity relationships through computer reasoning, and using scheduled tasks to ensure the timeliness and availability of the knowledge base.
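The reasoning over existing entity relationships mentioned for step S4 can be pictured with a minimal sketch. The source does not say which reasoning technique is used; the composition rule, relation names and sample triples below are hypothetical and only illustrate deriving a new relationship from two existing ones.

```python
def infer_relations(triples, rules):
    """Derive new (head, relation, tail) triples by chaining existing ones.

    rules maps a pair of relation names (r1, r2) to a new relation name:
    if (a, r1, b) and (b, r2, c) are known, then (a, new_relation, c) is added.
    """
    known = set(triples)
    added = True
    while added:  # repeat until no rule produces anything new
        added = False
        for h1, r1, t1 in list(known):
            for h2, r2, t2 in list(known):
                new_rel = rules.get((r1, r2))
                if new_rel is not None and t1 == h2:
                    candidate = (h1, new_rel, t2)
                    if candidate not in known:
                        known.add(candidate)
                        added = True
    return known - set(triples)


# Hypothetical sample data, not taken from the source.
existing = {
    ("food business permit", "requires material", "business license copy"),
    ("business license copy", "issued by", "market supervision bureau"),
}
# Hypothetical rule: an item that requires a material involves the issuing department.
composition_rules = {("requires material", "issued by"): "involves department"}

inferred = infer_relations(existing, composition_rules)
```

In a deployment, a function like this could be run from a scheduled task so that newly extracted triples are periodically checked for derivable relationships; that scheduling arrangement is likewise an assumption.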
Compared with the prior art, the invention has the following beneficial effects: the method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis ensures that the data is comprehensive and diverse, establishes item relationships through user behavior analysis to form knowledge with hierarchy and logical structure, quantitatively evaluates the credibility of the knowledge, and ensures that the knowledge base, once built, operates effectively over the long term and can be applied more widely and comprehensively.
Drawings
FIG. 1 is a schematic diagram of the general construction process of the government affairs industry knowledge base of the present invention;
FIG. 2 is a schematic diagram of the construction process of the government affairs industry knowledge base according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
FIG. 1 is a general construction flow diagram of the government affairs industry knowledge base of the present invention; FIG. 2 is a schematic flow chart of the construction of the government affairs industry knowledge base according to an embodiment of the present invention.
Referring to FIG. 1 and FIG. 2, the method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to the present invention includes:
S1) Acquiring unstructured data, semi-structured data and structured data in various ways from a third-party database, manually uploaded files, data captured from web pages, and user behavior data. The richness of the data and the variety of acquisition methods ensure that the knowledge base obtains comprehensive, rich and diverse data from many sources, and solve the problem of acquiring unstructured data from publicly available online sources, captured web content, and paper documents. The acquired data includes collected service guides, declaration materials, approval operation manuals, and relevant laws and regulations.
S2) Uniformly converting the unstructured data and the semi-structured data into structured data, and performing information extraction and data fusion to generate an item classification and a material classification, which solves the problem of fusing internal government databases with public data. Step S2 uses a deep learning model based on BERT (Bidirectional Encoder Representations from Transformers) to extract entities, entity attributes and the relationships between entities. Step S2 extracts <entity, relationship, entity> triples, performs fusion and disambiguation with the entities as nodes and the relationships between entities as edges, and standardizes the names of relationships commonly used in the business.
S3) Analyzing the user behavior data and establishing item relationships. Specifically, behavior event analysis, page click analysis and user behavior path analysis are applied to establish cross-post, cross-department, cross-region and parallel relationships between items. This expands the data coverage of the knowledge base, speeds up its construction, and ensures effective operation.
S4) Reviewing all results, quantitatively evaluating the credibility of the knowledge, and storing it as usable knowledge. Knowledge processing ensures the quality of the knowledge base and specifically includes: establishing an entity ontology, adding classification labels, deriving new entity relationships from existing entity relationships through computer reasoning, and using scheduled tasks to ensure the timeliness and availability of the knowledge base.
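Of the steps above, the user behavior path analysis in S3 can be sketched as follows. The source does not describe how serial and parallel relationships are told apart; the heuristic here (a pair of items handled in both orders is treated as parallel, a pair always handled in one order as serial), the `min_count` threshold and the sample sessions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def classify_relations(sessions, min_count=2):
    """Label item pairs as 'serial' or 'parallel' from ordered user paths."""
    before = Counter()
    for path in sessions:
        ordered = list(dict.fromkeys(path))  # drop repeats, keep first-visit order
        for a, b in combinations(ordered, 2):
            before[(a, b)] += 1  # a was handled before b in this session

    relations = {}
    for a, b in {tuple(sorted(pair)) for pair in before}:
        ab, ba = before[(a, b)], before[(b, a)]
        if ab >= min_count and ba >= min_count:
            relations[(a, b)] = "parallel"  # order does not matter
        elif ab >= min_count and ba == 0:
            relations[(a, b)] = "serial"    # a always precedes b
        elif ba >= min_count and ab == 0:
            relations[(b, a)] = "serial"    # b always precedes a
    return relations


# Hypothetical sessions: each list is the order in which one user handled items.
sessions = [
    ["business_license", "tax_registration", "social_insurance"],
    ["business_license", "social_insurance", "tax_registration"],
    ["business_license", "tax_registration", "social_insurance"],
    ["business_license", "social_insurance", "tax_registration"],
]
relations = classify_relations(sessions)
```

Attaching the responsible post, department or region to each item would let the same pair counts flag cross-post, cross-department and cross-region relationships; that extension is not shown here.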
Regarding information extraction, step S2 extracts entities, entity attributes, and relationships between entities. The extraction model is obtained by training a BERT-based deep learning model on manually annotated government affairs corpora. The model has an end-to-end structure and can extract, in a single pass, the entities in the input text and the <entity, relationship, entity> triples they form. For example: < civil bureau, reception time, workday 8.
Regarding data fusion, based on the triple knowledge extracted in step S2, fusion and disambiguation are performed on the nodes (entities) and edges (relationships), and the names of relationships commonly used in the business are standardized, for example: capital, qualification, location, equipment, personnel, management regulations, subject of transaction, time of transaction, location of transaction, materials of transaction. Step S2 also includes labeling items and materials and sorting out the hierarchy and logic of the data, so that materials can be conveniently managed and searched in a graph-based manner.
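As an illustration of the fusion and name standardization described above, the following minimal sketch maps variant relation and entity names to canonical ones and merges the resulting triples into a graph keyed by node. The alias tables and sample triples are hypothetical; the source does not specify how disambiguation is implemented, and a real system would probably need similarity-based matching rather than fixed lookup tables.

```python
from collections import defaultdict

# Hypothetical alias tables: variant name -> canonical name.
# The canonical relation names reuse the standardized names listed in the text.
RELATION_ALIASES = {
    "office hours": "time of transaction",
    "handling time": "time of transaction",
    "handling place": "location of transaction",
    "required documents": "materials of transaction",
}
ENTITY_ALIASES = {
    "civil affairs office": "civil affairs bureau",
}


def canonical(name, aliases):
    """Lower-case, collapse whitespace, then apply the alias table."""
    key = " ".join(name.lower().split())
    return aliases.get(key, key)


def fuse(triples):
    """Merge triples into {entity: {(relation, entity), ...}} after normalization."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        head = canonical(head, ENTITY_ALIASES)
        tail = canonical(tail, ENTITY_ALIASES)
        relation = canonical(relation, RELATION_ALIASES)
        graph[head].add((relation, tail))  # the set drops duplicate edges
    return dict(graph)


# Hypothetical extracted triples with inconsistent naming.
raw_triples = [
    ("Civil Affairs Bureau", "office hours", "weekdays"),
    ("civil affairs office", "handling time", "weekdays"),
    ("Civil Affairs Bureau", "handling place", "service hall"),
]
knowledge_graph = fuse(raw_triples)
```

After fusion the three raw triples collapse to one node with two distinct edges, which is the effect the name standardization is meant to achieve.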
Step S3 of the invention supplements and refines the data on item relationships, item groups and events through user behavior analysis. The specific implementation is as follows: the Apriori algorithm is used to analyze the transaction records (including both personal and enterprise transactions), that is, the transaction sequences within a certain period (half a year or one year); frequent item sets are mined and, after cleaning, associated transaction sequences of higher probability (ordered, with a sequential logical relationship) are obtained. This yields item associations in terms of business logic and, from them, a declaration event library. The invention thus not only provides a method for constructing the knowledge base but also improves the timeliness, generality and usability of the resulting knowledge base in practical application scenarios.
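The frequent item set mining named above can be sketched in plain Python. This is a minimal Apriori implementation under stated assumptions: each record is treated as an unordered set of items handled by one applicant within the analysis period, the support threshold and sample records are hypothetical, and the ordering and cleaning of the resulting sequences described in the text are not reproduced.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in >= min_support records."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for record in transactions:
        for item in record:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for record in transactions if c <= record) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical transaction records: items handled by one applicant in the period.
records = [
    {"business_license", "tax_registration", "food_permit"},
    {"business_license", "tax_registration"},
    {"business_license", "food_permit"},
    {"marriage_registration", "household_transfer"},
    {"business_license", "tax_registration", "food_permit"},
]
frequent_sets = apriori(records, min_support=3)
```

Because Apriori works on unordered sets, recovering the sequential logic the text mentions would need an additional step, for example checking handling timestamps within each frequent set.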
Although the present invention has been described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (8)
1. A method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis, characterized by comprising the following steps:
S1) acquiring unstructured data, semi-structured data and structured data in various ways from a third-party database, manually uploaded files, data captured from web pages, and user behavior data;
S2) uniformly converting the unstructured data and the semi-structured data into structured data, and performing information extraction and data fusion to generate an item classification and a material classification;
S3) analyzing the user behavior data and establishing item relationships;
and S4) reviewing all results, quantitatively evaluating the credibility of the knowledge, and storing it as usable knowledge.
2. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein the data acquired in step S1 includes collected service guides, declaration materials, approval operation manuals, and relevant laws and regulations.
3. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S2 uses a BERT-based deep learning model to extract entities, entity attributes, and relationships between entities.
4. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 3, wherein step S2 extracts <entity, relationship, entity> triples, performs fusion and disambiguation with the entities as nodes and the relationships between entities as edges, and standardizes the names of relationships commonly used in the business.
5. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S2 further comprises labeling items and materials, sorting out the hierarchy and logic of the data, and managing and searching materials in a graph-based manner.
6. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S3 comprises applying behavior event analysis, page click analysis and user behavior path analysis to establish cross-post, cross-department, cross-region and parallel relationships between items.
7. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S3 uses the Apriori algorithm to analyze the transaction sequences in the transaction records over a certain period, mines frequent item sets, and after cleaning obtains associated transaction sequences of higher probability, thereby obtaining item associations in terms of business logic.
8. The method for constructing a government affairs industry knowledge base based on natural language and user behavior analysis according to claim 1, wherein step S4 comprises establishing an entity ontology, adding classification labels, deriving new entity relationships from existing entity relationships through computer reasoning, and using scheduled tasks to ensure the timeliness and availability of the knowledge base.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211116425.XA CN115510025A (en) | 2022-09-14 | 2022-09-14 | Construction method of government affair industry knowledge base based on natural language and user behavior analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115510025A (en) | 2022-12-23 |
Family
ID=84503375
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211116425.XA Pending CN115510025A (en) | 2022-09-14 | 2022-09-14 | Construction method of government affair industry knowledge base based on natural language and user behavior analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115510025A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116628172A (en) * | 2023-07-24 | 2023-08-22 | 北京酷维在线科技有限公司 | Dialogue method for multi-strategy fusion in government service field based on knowledge graph |
CN116628172B (en) * | 2023-07-24 | 2023-09-19 | 北京酷维在线科技有限公司 | Dialogue method for multi-strategy fusion in government service field based on knowledge graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||