CN117874240B - Audit text classification method, system and equipment based on knowledge graph - Google Patents

Info

Publication number
CN117874240B
Authority
CN
China
Prior art keywords
audit
text
data
knowledge graph
audit text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410278278.9A
Other languages
Chinese (zh)
Other versions
CN117874240A (en
Inventor
孟庆霖
邱巧红
陈蕾
宫成
周飞
熊德意
药炜
高镇
李森
宋岩
谭真勇
王端瑞
韩琨
葛晓舰
吕元旭
柴博
李丽娜
吴新维
孙浩然
徐邵洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Tianjin Electric Power Co Chengxi Power Supply Branch
Tianjin Chengxi Guangyuan Power Engineering Co ltd
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Original Assignee
Tianjin Chengxi Guangyuan Power Engineering Co ltd
State Grid Tianjin Electric Power Co Chengxi Power Supply Branch
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Chengxi Guangyuan Power Engineering Co ltd, State Grid Tianjin Electric Power Co Chengxi Power Supply Branch, State Grid Corp of China SGCC, State Grid Tianjin Electric Power Co Ltd
Priority to CN202410278278.9A
Publication of CN117874240A
Application granted
Publication of CN117874240B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an audit text classification method, system and equipment based on a knowledge graph, and relates to the field of electric power audit. The method comprises the following steps: acquiring a first audit text data set, and constructing an audit text knowledge graph based on the first audit text data set; acquiring a second audit text data set, acquiring word vectors of the audit problem categories in the audit texts to be classified, and calculating a matching index based on the audit text knowledge graph and the word vectors; when the matching index is smaller than a first preset value, complementing the word vectors to the audit text knowledge graph and creating a text database corresponding to the audit problem categories; and when the matching index is larger than a second preset value, acquiring the next audit text to be classified in the second audit text data set, until the whole second audit text data set has been processed. The method and the system help power company staff quickly obtain the information contained in audit texts, improve the accuracy and appropriateness of audit text classification in subsequent work, and automate audit text classification.

Description

Audit text classification method, system and equipment based on knowledge graph
Technical Field
The invention relates to the field of electric power audit, in particular to an audit text classification method, an audit text classification system and audit text classification equipment based on a knowledge graph.
Background
Internal audit is a strong guarantee for the internal management of a power company: through internal audit, problems existing in the power company can be found and solved, the management level is improved, and the safety and stability of the power supply are maintained.
With social development, the audit workload of each power company is gradually increasing, the number of audit texts it generates is rising, and the types of audit texts are varied. Classifying audit texts has therefore become an important part of audit work in the power system. How to improve the efficiency and accuracy of audit text classification, improve the efficiency of audit work and automate audit text classification is a problem that the power system urgently needs to solve.
The audit text classification methods studied so far have the following disadvantages:
(1) Because audit texts are specialized and complex, text classification methods based on pre-trained language models need professional auditors to label the data, and labelling a large amount of data is costly and inefficient;
(2) Audit text classification algorithms based on deep learning need a large number of training samples to improve the classification accuracy of the model;
(3) Because audit texts are specialized and complex, the various items of information they contain are difficult to represent visually.
Disclosure of Invention
Aiming at the problems, the invention provides an audit text classification method, an audit text classification system and audit text classification equipment based on a knowledge graph, which aim to improve the accuracy and the appropriateness of audit text classification, improve the working efficiency of audit work of a power system and realize the automation of audit text classification.
In one aspect, the invention provides an audit text classification method based on a knowledge graph, which comprises the following steps:
S1, acquiring a first audit text data set, and constructing an audit text knowledge graph based on the first audit text data set;
S2, acquiring a second audit text data set, acquiring word vectors of audit problem categories in audit texts to be classified, calculating a matching index based on the audit text knowledge graph and the word vectors, complementing the word vectors to the audit text knowledge graph when the matching index is smaller than a first preset value, and creating a text database corresponding to the audit problem categories; and when the matching index is larger than a second preset value, acquiring the next audit text to be classified in the second audit text data set, and executing S2 until the whole second audit text data set has been processed.
Further, the first audit text data set is obtained from power audit text data sets and from power audit texts collected from web pages.
Further, the second audit text data set is obtained from the power audit texts in the daily audit records of each power company.
Further, the construction process of the audit text knowledge graph comprises the following steps:
Acquiring entity data in the first audit text data set; the entity data comprises audited units, item types, audit problem titles, audit problem categories, system basis and audit opinions;
Acquiring relation data in the first audit text data set; the relation data comprises belonging, occurrence, reason and basis;
Acquiring attribute data in the first audit text data set; the attribute data comprise unqualified contractual service, nonstandard subcontract management, nonstandard subcontract and insufficient research depth;
The entity data, the relation data and the attribute data are subjected to data fusion and stored in a graph database in a mode of entity-relation-entity-attribute;
And obtaining the audit text knowledge graph by adopting a visualization tool.
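The "entity-relation-entity-attribute" storage mode described above can be illustrated with a minimal in-memory sketch. The class `AuditGraph`, its method names and the sample records are hypothetical and stand in for a real graph database; the sketch also assumes that the attribute is attached to the second (tail) entity of each record, which the text does not state.

```python
from dataclasses import dataclass, field


@dataclass
class AuditGraph:
    """Tiny in-memory stand-in for a graph database (hypothetical structure)."""
    triples: set = field(default_factory=set)        # (head, relation, tail)
    attributes: dict = field(default_factory=dict)   # entity -> set of attribute strings

    def add(self, head, relation, tail, tail_attributes=()):
        """Store one entity-relation-entity record plus attributes of the tail entity."""
        self.triples.add((head, relation, tail))
        self.attributes.setdefault(tail, set()).update(tail_attributes)

    def neighbours(self, head, relation):
        """Return all tail entities linked to `head` by `relation`, sorted by name."""
        return sorted(t for h, r, t in self.triples if h == head and r == relation)


# Sample records are invented for illustration only.
graph = AuditGraph()
graph.add("Unit A", "occurrence", "Subcontract without approval")
graph.add("Subcontract without approval", "belonging", "Subcontract management",
          tail_attributes=["nonstandard subcontract management"])
```

Keeping relations and attributes in separate containers mirrors how a property graph separates edges from node properties, so the same records could later be loaded into a real graph database.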
Further, the calculation formula of the word vector is as follows:
$$Y = \frac{U_{m \times n}^{(\mathrm{outside})} + V_{m \times n}^{(\mathrm{center})}}{2}$$
where m is the number of words contained in the audit problem category, n is the word vector length of each word contained in the audit problem category, $U_{m \times n}$ is the vector when the word contained in the audit problem category is the center word, $V_{m \times n}$ is the vector when the word contained in the audit problem category is the non-center word, and Y is the word vector.
Further, the calculation formula of the matching index is as follows:
$$E = \max_{j \in [1, t]} \left( Y_i Q_j \right)$$
where E is the matching index, $Y_i$ is the i-th field of the word vector, $Q_j$ is the feature vector of the j-th audit problem category entity contained in the audit text knowledge graph, t is the number of audit problem category entities contained in the audit text knowledge graph, and max takes the maximum value.
Further, the first preset value is 40%.
Further, the second preset value is 95%.
Further, the creating process of the text database includes:
based on the word vector newly added to the audit text knowledge graph, the background automatically captures the data information and exports it;
when the data is exported, a file database named after the audit problem category corresponding to the word vector is generated automatically using an online export technology.
In another aspect, the invention provides an audit text classification system based on a knowledge graph, comprising: an audit text knowledge graph construction module and an audit text classification module;
the audit text knowledge graph construction module is used for acquiring a first audit text data set and constructing an audit text knowledge graph based on the first audit text data set;
The audit text classification module is used for acquiring a second audit text data set, acquiring word vectors of audit problem categories in audit texts to be classified, calculating a matching index based on the audit text knowledge graph and the word vectors, complementing the word vectors to the audit text knowledge graph when the matching index is smaller than a first preset value, and creating a text database corresponding to the audit problem categories; and when the matching index is larger than a second preset value, acquiring the next audit text to be classified in the second audit text data set, and re-executing the audit text classification module until the whole second audit text data set has been processed.
Further, the first audit text data set is obtained from power audit text data sets and from power audit texts collected from web pages.
Further, the second audit text data set is obtained from the power audit texts in the daily audit records of each power company.
Further, the audit text knowledge graph construction module includes: the system comprises an entity data acquisition unit, a relation data acquisition unit, an attribute data acquisition unit, a fusion storage unit and a visualization unit;
The entity data acquisition unit is used for acquiring entity data in the first audit text data set; the entity data comprises audited units, item types, audit problem titles, audit problem categories, system basis and audit opinions;
The relation data acquisition unit is used for acquiring relation data in the first audit text data set; the relation data comprises belonging, occurrence, reason and basis;
The attribute data acquisition unit is used for acquiring attribute data in the first audit text data set; the attribute data comprise unqualified contractual service, nonstandard subcontract management, nonstandard subcontract and insufficient research depth;
The fusion storage unit is used for carrying out data fusion on the entity data, the relation data and the attribute data and storing the data into a graph database in a mode of entity-relation-entity-attribute;
And the visualization unit is used for obtaining the audit text knowledge graph by adopting a visualization tool.
Further, the calculation formula of the word vector is as follows:
$$Y = \frac{U_{m \times n}^{(\mathrm{outside})} + V_{m \times n}^{(\mathrm{center})}}{2}$$
where m is the number of words contained in the audit problem category, n is the word vector length of each word contained in the audit problem category, $U_{m \times n}$ is the vector when the word contained in the audit problem category is the center word, $V_{m \times n}$ is the vector when the word contained in the audit problem category is the non-center word, and Y is the word vector.
Further, the calculation formula of the matching index is as follows:
$$E = \max_{j \in [1, t]} \left( Y_i Q_j \right)$$
where E is the matching index, $Y_i$ is the i-th field of the word vector, $Q_j$ is the feature vector of the j-th audit problem category entity contained in the audit text knowledge graph, t is the number of audit problem category entities contained in the audit text knowledge graph, and max takes the maximum value.
Further, the first preset value is 40%.
Further, the second preset value is 95%.
Further, the creating process of the text database includes:
based on the word vector newly added to the audit text knowledge graph, the background automatically captures the data information and exports it;
when the data is exported, a file database named after the audit problem category corresponding to the word vector is generated automatically using an online export technology.
In yet another aspect, the invention also provides audit text classification equipment based on the knowledge graph, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the audit text classification methods based on the knowledge graph when executing the program stored in the memory.
The invention has at least the following beneficial effects:
According to the invention, the entity data, relation data and attribute data in the audit texts are obtained to construct the audit text knowledge graph, so that the associations among different entity data in the audit texts are visualized; this helps power company staff quickly obtain the information contained in the audit texts and improves working efficiency. The audit problem category is converted into a word vector, the matching index is calculated, and the entity data of any audit problem category whose matching index is lower than the first preset value is complemented to the audit text knowledge graph, which improves the accuracy and appropriateness of audit text classification in subsequent work. Because the background automatically creates a text database for each such audit problem category, the classification efficiency of audit texts and the working efficiency of power company staff are improved, and audit text classification is automated.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an audit text knowledge graph according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
With social development, the audit workload of each power company is gradually increasing, the number of audit texts it generates is rising, and the types of audit texts are varied. Classifying audit texts has therefore become an important part of audit work in the power system. How to improve the efficiency and accuracy of audit text classification, improve the efficiency of audit work and automate audit text classification is a problem that the power system urgently needs to solve.
Therefore, the invention provides an audit text classification method, an audit text classification system and audit text classification equipment based on a knowledge graph.
FIG. 1 is a flowchart of an audit text classification method based on a knowledge graph according to an embodiment of the present invention. Referring to FIG. 1, this embodiment provides an audit text classification method based on a knowledge graph, which includes:
S1, acquiring a first audit text data set, and constructing an audit text knowledge graph based on the first audit text data set;
Specifically, the first audit text data set is obtained from power audit text data sets and from power audit texts collected from web pages, so that it covers as many types of information in power audit texts as possible;
FIG. 2 is a schematic diagram of an audit text knowledge graph according to an embodiment of the present invention. Referring to FIG. 2, the process for constructing the audit text knowledge graph includes:
Because the audit text is unstructured data, an LSTM-CNNs-CRF model is adopted to acquire entity data in the first audit text data set; the entity data comprises audited units, item types, audit problem titles, audit problem categories, system basis and audit opinions; the LSTM-CNNs-CRF model comprises a forward LSTM layer, a reverse LSTM layer and a CRF layer, characters in the audit text are used as input, and the entity data is output through the forward LSTM layer, the reverse LSTM layer and the CRF layer in sequence;
acquiring relationship data in a first audit text data set by adopting a BERT model; the relationship data comprises belonging, occurrence, reason and basis;
Acquiring attribute data in a first audit text data set; the attribute data comprise unqualified contractual service, non-standardization of subcontract management, non-standardization of subcontract and insufficient research depth;
The entity data, the relation data and the attribute data are subjected to data fusion and stored in a graph database, and the process of data fusion comprises the following steps:
Performing data preprocessing on the entity data, relation data and attribute data, wherein the data preprocessing comprises grammar normalization, grammar matching, removal of irrelevant symbols, replacement of shorthand forms and the like;
Calculating the pairwise similarity of the entity data by adopting K-Means clustering;
And performing entity matching on each pair of entity data with high similarity by adopting the LIMES method to finish data fusion;
The storage mode is entity-relation-entity-attribute;
And obtaining an audit text knowledge graph by adopting a visualization tool.
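The fusion steps above (preprocessing, pairwise similarity, entity matching) can be sketched as follows. This is only an illustration under stated assumptions: it replaces K-Means clustering and the LIMES method with the standard-library `difflib.SequenceMatcher` string similarity, and the shorthand table `ABBREVIATIONS`, the threshold 0.9 and the sample entity names are all hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

ABBREVIATIONS = {"mgmt": "management"}  # hypothetical shorthand table


def normalize(name):
    """Preprocess one entity name: lower-case, drop irrelevant symbols, expand shorthand."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def fuse(entities, threshold=0.9):
    """Map each raw entity name to a canonical name; highly similar names share one."""
    canonical = {e: normalize(e) for e in entities}
    for a, b in combinations(entities, 2):
        # Stand-in similarity measure (not K-Means, not LIMES).
        if SequenceMatcher(None, canonical[a], canonical[b]).ratio() >= threshold:
            canonical[b] = canonical[a]
    return canonical


merged = fuse(["Subcontract Mgmt.", "subcontract management", "Contract performance"])
```

In this sketch the first two names collapse to the same canonical entity after preprocessing, while the third stays separate because its similarity to the others is below the threshold.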
S2, acquiring a second audit text data set, acquiring word vectors of audit problem categories in the audit text to be classified, calculating a matching index based on the audit text knowledge graph and the word vectors, complementing the word vectors to the audit text knowledge graph when the matching index is smaller than a first preset value, and creating a text database corresponding to the audit problem categories; when the matching index is larger than a second preset value, acquiring the next audit text to be classified in the second audit text data set and executing S2 until the second audit text data set is completely executed;
Specifically, the second audit text data set is obtained from the power audit texts in the daily audit records of each power company, and the audit texts to be classified are the audit texts contained in the second audit text data set;
Specifically, the calculation formula of the word vector of the audit problem category in the audit text to be classified is as follows:
$$Y = \frac{U_{m \times n}^{(\mathrm{outside})} + V_{m \times n}^{(\mathrm{center})}}{2}$$
where m is the number of words contained in the audit problem category, n is the length of the word vector of each word contained in the audit problem category, $U_{m \times n}$ represents the vector when a word contained in the audit problem category is used as the center word, $V_{m \times n}$ represents the vector when a word contained in the audit problem category is used as a non-center word, and Y is the word vector, namely the final word vector of the audit problem category.
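As a small illustration of the averaging step, the sketch below takes two m x n matrices (one row per word of the audit problem category) and returns their element-wise mean. The function name `category_vector` and the numeric values are hypothetical; how U and V are trained is not covered here.

```python
def category_vector(U, V):
    """Element-wise average of two m x n matrices given as lists of rows."""
    if len(U) != len(V) or any(len(u) != len(v) for u, v in zip(U, V)):
        raise ValueError("U and V must have the same m x n shape")
    return [[(u + v) / 2 for u, v in zip(row_u, row_v)]
            for row_u, row_v in zip(U, V)]


# Invented 2 x 2 example: m = 2 words, word vector length n = 2.
U = [[1.0, 0.0], [0.5, 0.5]]
V = [[0.0, 1.0], [0.5, 1.5]]
Y = category_vector(U, V)
```

Because the formula is a plain average, the result does not depend on which of the two matrices holds the center-word vectors.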
Specifically, given that the audit text knowledge graph is visualized, the matching index is calculated based on the audit text knowledge graph and the word vector, and the calculation formula of the matching index is as follows:
$$E = \max_{j \in [1, t]} \left( Y_i Q_j \right)$$
where E is the matching index, $Y_i$ is the i-th field of the word vector, $Q_j$ is the feature vector of the j-th audit problem category entity contained in the audit text knowledge graph, t is the number of audit problem category entities contained in the audit text knowledge graph, and max takes the maximum value.
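One possible reading of the matching index is "the best similarity between the category vector and any audit problem category already in the graph". The sketch below follows that reading under two assumptions that are not in the source: the word vector is flattened to one dimension, and the product of Y and Q_j is normalized as a cosine similarity so that it can be compared with percentage thresholds. The names `cosine`, `matching_index` and `graph_vectors` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matching_index(y, graph_vectors):
    """Return (E, best_category): the largest similarity over all graph categories."""
    best = max(graph_vectors, key=lambda name: cosine(y, graph_vectors[name]))
    return cosine(y, graph_vectors[best]), best


# Invented feature vectors for two categories already present in the graph.
graph_vectors = {
    "subcontract management": [1.0, 0.0, 0.0],
    "contract performance": [0.0, 1.0, 0.0],
}
E, category = matching_index([0.9, 0.1, 0.0], graph_vectors)
```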
Specifically, when the matching index is between 40% and 95%, the audit problem category corresponding to the audit text to be classified is considered to already exist in the audit text knowledge graph; S2 is not executed further for that audit text, and the text is deleted (filtered out), which speeds up audit text classification and reduces the amount of data held in the background.
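The three outcomes of S2 can be summarized as a small decision function. The thresholds 40% and 95% come from the text; the function name `decide`, the returned labels and the handling of values exactly equal to a threshold (treated here as belonging to the middle band) are assumptions made for illustration.

```python
FIRST_PRESET = 0.40   # below this: complete the graph and create a text database
SECOND_PRESET = 0.95  # above this: move on to the next audit text


def decide(matching_index):
    """Return the S2 action for one audit text, given its matching index in [0, 1]."""
    if matching_index < FIRST_PRESET:
        return "complete_graph_and_create_database"
    if matching_index > SECOND_PRESET:
        return "next_text"
    # Between the two presets: the category is taken to exist already.
    return "filter_out"


actions = [decide(e) for e in (0.20, 0.60, 0.99)]
```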
Specifically, the process of completing the word vector to the audit text knowledge graph comprises the following steps:
Because the "audit problem category" corresponding to the word vector to be completed is one of the entity data, in this embodiment the entity data to be completed is referred to as the first audit problem category entity, and the entity data "audit problem category" already existing in the audit text knowledge graph is referred to as the second audit problem category entity. The matching degree between the first audit problem category entity and the second audit problem category entity is calculated by adopting the evaluation index w(a, b), and the calculation formula of the evaluation index w(a, b) is as follows:
$$w(a, b) = a \star b$$
where a is the first audit problem category entity, b is the second audit problem category entity, and $\star$ is the circular correlation operation.
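For reference, circular correlation of two vectors of equal length d is commonly defined component-wise as $[a \star b]_k = \sum_{i=0}^{d-1} a_i \, b_{(i+k) \bmod d}$. The sketch below implements that standard definition directly; the function name and the sample vectors are hypothetical, and the text does not say how the resulting vector is reduced to a single matching degree, so no reduction is attempted here.

```python
def circular_correlation(a, b):
    """Circular correlation: result[k] = sum over i of a[i] * b[(i + k) mod d]."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    d = len(a)
    return [sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)]


# Invented embeddings of the first and second audit problem category entities.
w = circular_correlation([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The component at k = 0 equals the ordinary dot product of the two vectors; the remaining components compare a with cyclically shifted copies of b.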
Specifically, the text database creation process includes:
Based on the word vector newly added to the audit text knowledge graph, the background automatically captures the data information and exports it; when the data is exported, a BLOB online export technology is adopted to automatically generate a file database named after the audit problem category corresponding to the word vector.
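The idea of "one file database per audit problem category" can be sketched with the standard-library `sqlite3` module. This is a stand-in chosen for illustration, not the BLOB online export technology named in the text; the function name `create_text_database`, the table layout and the file-naming rule are all assumptions.

```python
import os
import re
import sqlite3
import tempfile


def create_text_database(category, texts, directory):
    """Create <directory>/<category>.db holding the audit texts of one category."""
    # Replace characters that are awkward in file names with underscores.
    safe_name = re.sub(r"[^\w\-]+", "_", category).strip("_")
    path = os.path.join(directory, safe_name + ".db")
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS audit_text "
                "(id INTEGER PRIMARY KEY, body TEXT)"
            )
            conn.executemany(
                "INSERT INTO audit_text (body) VALUES (?)", [(t,) for t in texts]
            )
    finally:
        conn.close()
    return path


with tempfile.TemporaryDirectory() as tmp:
    db_path = create_text_database(
        "subcontract management", ["Example audit finding."], tmp
    )
    db_name = os.path.basename(db_path)
```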
It should be noted that, after S1 and S2 have been executed in sequence, the method further includes: combining the first audit text data set and the second audit text data set acquired in this embodiment to obtain a third audit text data set, dividing the third audit text data set into a training set and a test set, and building a deep learning model to classify the audit texts. Because part of the second audit text data set is filtered out in S2, the training set that must be labelled to train the deep learning model is greatly reduced without lowering the classification accuracy of the model, which further reduces the workload and improves classification efficiency. The deep learning model includes, but is not limited to, the high-precision models commonly used in the prior art, which are not described further in this embodiment.
This embodiment provides an audit text classification system based on a knowledge graph, comprising: an audit text knowledge graph construction module and an audit text classification module;
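The merging and splitting step can be sketched as below. The test fraction of 0.2, the fixed random seed and the placeholder text identifiers are assumptions chosen for illustration; the source does not specify a split ratio.

```python
import random


def split_dataset(first, second_kept, test_fraction=0.2, seed=0):
    """Merge two data sets into a third one and split it into (train, test)."""
    third = list(first) + list(second_kept)
    random.Random(seed).shuffle(third)          # reproducible shuffle
    n_test = int(len(third) * test_fraction)
    return third[n_test:], third[:n_test]


# `second_kept` stands for the second data set after the filtering done in S2.
first = ["a1", "a2", "a3", "a4", "a5", "a6"]
second_kept = ["b1", "b2", "b3", "b4"]
train, test = split_dataset(first, second_kept)
```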
the audit text knowledge graph construction module is used for acquiring a first audit text data set and constructing an audit text knowledge graph based on the first audit text data set;
The audit text classification module is used for acquiring a second audit text data set, acquiring word vectors of audit problem categories in the audit text to be classified, calculating a matching index based on the audit text knowledge graph and the word vectors, complementing the word vectors to the audit text knowledge graph when the matching index is smaller than a first preset value, and creating a text database corresponding to the audit problem categories; and when the matching index is larger than a second preset value, acquiring the next audit text to be classified in the second audit text data set, and re-executing the audit text classification module until the whole second audit text data set has been processed.
In a specific implementation, the implementation of the audit text classification system based on the knowledge graph corresponds one-to-one to that of the audit text classification method based on the knowledge graph, and is not repeated here.
The invention provides audit text classification equipment based on a knowledge graph, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
And the processor is used for realizing any audit text classification method based on the knowledge graph when executing the program stored in the memory.
According to the invention, the entity data, relation data and attribute data in the audit texts are obtained to construct the audit text knowledge graph, so that the associations among different entity data in the audit texts are visualized; this helps power company staff quickly obtain the information contained in the audit texts and improves working efficiency. The audit problem category is converted into a word vector, the matching index is calculated, and the entity data of any audit problem category whose matching index is lower than the first preset value is complemented to the audit text knowledge graph, which improves the accuracy and appropriateness of audit text classification in subsequent work. Because the background automatically creates a text database for each such audit problem category, the classification efficiency of audit texts and the working efficiency of power company staff are improved, and audit text classification is automated.
Although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (15)

1. The audit text classification method based on the knowledge graph is characterized by comprising the following steps of:
S1, acquiring a first audit text data set, and constructing an audit text knowledge graph based on the first audit text data set;
S2, acquiring a second audit text data set, acquiring word vectors of audit problem categories in audit texts to be classified, calculating a matching index based on the audit text knowledge graph and the word vectors, complementing the word vectors to the audit text knowledge graph when the matching index is smaller than a first preset value, and creating a text database corresponding to the audit problem categories; when the matching index is larger than a second preset value, acquiring the next audit text to be classified in the second audit text data set and executing S2 until the whole second audit text data set has been processed;
The calculation formula of the word vector is as follows:
$$Y = \frac{U_{m \times n}^{(\mathrm{outside})} + V_{m \times n}^{(\mathrm{center})}}{2} \qquad (1)$$
Wherein m is the number of words contained in the audit problem category, n is the word vector length of each word contained in the audit problem category, $U_{m \times n}$ represents a vector when the word contained in the audit problem category is used as a central word, $V_{m \times n}$ represents a vector when the word contained in the audit problem category is used as a non-central word, and Y is the word vector;
the calculation formula of the matching index is as follows:
$$E = \max_{j \in [1, t]} \left( Y_i Q_j \right) \qquad (2)$$
E is the matching index, $Y_i$ is the i-th field of the word vector, $Q_j$ is the feature vector of the entity data audit problem category contained in the audit text knowledge graph, t is the number of the entity data audit problem category contained in the audit text knowledge graph, and max is the maximum value;
The process of completing the word vector to the audit text knowledge graph comprises the following steps:
the entity data to be completed is called a first audit problem category entity, the existing entity data "audit problem category" in the audit text knowledge graph is called a second audit problem category entity, the matching degree between the first audit problem category entity and the second audit problem category entity is calculated by adopting the evaluation index w(a, b), and the calculation formula of the evaluation index w(a, b) is as follows:
$$w(a, b) = a \star b \qquad (3)$$
Wherein a is the first audit problem category entity, b is the second audit problem category entity, and $\star$ is the circular correlation operation.
2. The audit text classification method according to claim 1 wherein the means for obtaining the first audit text data set includes a power audit text data set and power audit text obtained from a web page.
3. The audit text classification method according to claim 1 wherein the second audit text data set is obtained as power audit text for each power company's daily audit record.
4. The audit text classification method according to claim 1, wherein the audit text knowledge graph construction process includes:
Acquiring entity data in the first audit text data set; the entity data comprises audited units, item types, audit problem titles, audit problem categories, system basis and audit opinions;
Acquiring relation data in the first audit text data set; the relation data comprises belonging, occurrence, reason and basis;
Acquiring attribute data in the first audit text data set; the attribute data comprise unqualified contractual service, nonstandard subcontract management, nonstandard subcontract and insufficient research depth;
The entity data, the relation data and the attribute data are subjected to data fusion and stored in a graph database in a mode of entity-relation-entity-attribute;
And obtaining the audit text knowledge graph by adopting a visualization tool.
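The entity-relation-entity-attribute storage form described in claim 4 can be sketched as follows. This is a minimal in-memory illustration, assuming that attributes are attached to each stored record; the claim does not say where they attach. The class names, the relation labels and every example value are hypothetical, and the plain Python list stands in for the graph database, which the claim does not name.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    kind: str  # one of the entity kinds in the claim, e.g. "audited unit"
    name: str


@dataclass
class Record:
    """One entity-relation-entity-attribute record."""
    head: Entity
    relation: str
    tail: Entity
    attributes: list = field(default_factory=list)


class AuditGraphStore:
    """In-memory stand-in for the graph database named in the claim."""

    def __init__(self):
        self.records = []

    def add(self, head, relation, tail, attributes=()):
        record = Record(head, relation, tail, list(attributes))
        self.records.append(record)
        return record

    def entities_of_kind(self, kind):
        """Distinct entity names of one kind, in insertion order."""
        names = []
        for record in self.records:
            for entity in (record.head, record.tail):
                if entity.kind == kind and entity.name not in names:
                    names.append(entity.name)
        return names


# Hypothetical example data; none of these values come from the patent.
store = AuditGraphStore()
unit = Entity("audited unit", "Example Power Supply Branch")
title = Entity("audit problem title", "Subcontract signed without approval")
category = Entity("audit problem category", "Subcontract management")
rule = Entity("system basis", "Example subcontracting regulation")

# Relation labels loosely follow the claim's four kinds
# (belonging/affiliation, occurrence, cause, basis).
store.add(unit, "occurrence", title)
store.add(title, "affiliation", category,
          attributes=["nonstandard subcontract management"])
store.add(title, "basis", rule)

print(store.entities_of_kind("audit problem category"))
```

A real deployment would replace the list with calls to an actual graph database; the record shape is the only part taken from the claim.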
5. The audit text classification method according to claim 1 wherein the first preset value is 40%.
6. The audit text classification method according to claim 1 wherein the second preset value is 95%.
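Taken together with the branching that the claims attach to the index E (below the first preset value the knowledge graph is completed and a text database is created; above the second preset value the next audit text is taken), the two preset values split the index range into three bands. The sketch below assumes E is expressed as a fraction in [0, 1]. The function name and the returned labels are hypothetical, and the behaviour between the two thresholds is not stated in this section, so it is reported as unspecified rather than guessed.

```python
FIRST_PRESET = 0.40   # first preset value, claim 5
SECOND_PRESET = 0.95  # second preset value, claim 6


def decide(index_value):
    """Map an index value E to the action the claims attach to it."""
    if index_value < FIRST_PRESET:
        # Low match: treat the category as new.
        return "complete_graph_and_create_database"
    if index_value > SECOND_PRESET:
        # High match: the category already exists; move on.
        return "next_text"
    # Not described in this section; left open on purpose.
    return "unspecified"


for value in (0.20, 0.60, 0.99):
    print(value, decide(value))
```

Both comparisons are strict, matching the wording "smaller than" and "larger than", so a value of exactly 40% or 95% falls in the middle band.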
7. The audit text classification method according to claim 1, wherein the process of creating the text database includes:
based on the word vector newly added to the audit text knowledge graph, automatically capturing the data information in the background and exporting the data;
and when the data is exported, automatically generating, using an online export technique, a file database named after the audit problem category corresponding to the word vector.
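The creation of a file database named after the audit problem category can be illustrated as below. This is only a sketch: the claim does not identify the export technique or the database format, so SQLite (via Python's standard `sqlite3` module) is an assumption, as are the function name, the table layout and the file-name sanitising rule.

```python
import re
import sqlite3
from pathlib import Path


def create_category_database(category, rows, out_dir):
    """Create <out_dir>/<category>.db and store the exported texts in it.

    SQLite and the single-table layout are assumptions of this sketch.
    """
    # Replace characters that are unsafe in file names on common systems.
    safe_name = re.sub(r'[\\/:*?"<>|]+', "_", category).strip() or "unnamed"
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{safe_name}.db"

    conn = sqlite3.connect(str(path))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS audit_text ("
                "id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO audit_text (body) VALUES (?)",
                [(row,) for row in rows],
            )
    finally:
        conn.close()
    return path
```

Calling `create_category_database("Subcontract management", ["example audit text"], "exports")` would write `exports/Subcontract management.db`; the category name and text here are hypothetical.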
8. An audit text classification system based on a knowledge graph, characterized by comprising: an audit text knowledge graph construction module and an audit text classification module;
the audit text knowledge graph construction module is used for acquiring a first audit text data set and constructing an audit text knowledge graph based on the first audit text data set;
the audit text classification module is used for acquiring a second audit text data set, acquiring the word vector of the audit problem category in an audit text to be classified, calculating a matching index based on the audit text knowledge graph and the word vector, completing the audit text knowledge graph with the word vector when the matching index is smaller than a first preset value, and creating a text database corresponding to the audit problem category; when the matching index is larger than a second preset value, acquiring the next audit text to be classified in the second audit text data set and re-executing the audit text classification module until the second audit text data set has been fully processed;
The word vector is calculated as:
$Y = \left( U_{m \times n}(\text{outside}) + V_{m \times n}(\text{center}) \right) / 2$ (1)
Wherein m is the number of words contained in the audit problem category, n is the word vector length of each word contained in the audit problem category, $U_{m \times n}$ represents the vector when the words contained in the audit problem category are used as central words, $V_{m \times n}$ represents the vector when the words contained in the audit problem category are used as non-central words, and Y is the word vector;
the matching index is calculated as:
$E = \max_{j \in [1, t]} \left( Y_i Q_j \right)$ (2)
Wherein E is the matching index, $Y_i$ is the i-th field of the word vector, $Q_j$ is the feature vector of the j-th audit problem category entity contained in the audit text knowledge graph, t is the number of audit problem category entities contained in the audit text knowledge graph, and max denotes taking the maximum value;
the process of completing the audit text knowledge graph with the word vector comprises:
the entity data to be added is called the first audit problem category entity, and the existing "audit problem category" entity data in the audit text knowledge graph is called the second audit problem category entity; the matching degree between the first audit problem category entity and the second audit problem category entity is calculated using the evaluation index w(a, b), which is given by:
$w(a, b) = a \star b$ (3)
Wherein a is the first audit problem category entity, b is the second audit problem category entity, and $\star$ denotes the circular correlation operation.
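The word vector formula and the index E of formula (2) can be sketched numerically as follows. Several points are assumptions of this sketch and not statements of the claim: the m rows of Y are mean-pooled into one vector before comparison, the product of Y and Q_j is read as a cosine similarity so that E is comparable with percentage thresholds, and all function names and example numbers are hypothetical.

```python
import math


def word_vector(U, V):
    """Element-wise mean of two m x n matrices given as lists of rows."""
    if len(U) != len(V) or any(len(u) != len(v) for u, v in zip(U, V)):
        raise ValueError("U and V must have the same m x n shape")
    return [[(u + v) / 2.0 for u, v in zip(ru, rv)] for ru, rv in zip(U, V)]


def cosine(x, y):
    """Cosine similarity; an assumed reading of the product in formula (2)."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def matching_index(Y, Q):
    """Return (E, j): the best similarity and the index of the best category.

    Y is the m x n word vector; Q is a list of t category feature vectors.
    Mean-pooling the rows of Y is an assumption of this sketch.
    """
    m = len(Y)
    pooled = [sum(row[k] for row in Y) / m for k in range(len(Y[0]))]
    scores = [cosine(pooled, q) for q in Q]
    best_j = max(range(len(scores)), key=scores.__getitem__)
    return scores[best_j], best_j


# Hypothetical numbers: a two-word category with 2-dimensional embeddings.
U = [[1.0, 0.0], [0.8, 0.2]]
V = [[1.0, 0.0], [1.0, 0.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]  # feature vectors of two existing categories

Y = word_vector(U, V)
E, j = matching_index(Y, Q)
print(f"E = {E:.3f}, closest existing category index = {j}")
```

The value E returned here is what would then be compared against the first and second preset values.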
9. The audit text classification system of claim 8, wherein the sources of the first audit text data set include a power audit text data set and power audit texts obtained from web pages.
10. The audit text classification system according to claim 8, wherein the second audit text data set is obtained from the power audit texts in the daily audit records of each power company.
11. The audit text classification system of claim 8, wherein the audit text knowledge graph construction module comprises: an entity data acquisition unit, a relation data acquisition unit, an attribute data acquisition unit, a fusion storage unit and a visualization unit;
the entity data acquisition unit is used for acquiring entity data in the first audit text data set; the entity data comprises audited units, item types, audit problem titles, audit problem categories, system basis and audit opinions;
the relation data acquisition unit is used for acquiring relation data in the first audit text data set; the relation data comprises affiliation, occurrence, cause and basis;
the attribute data acquisition unit is used for acquiring attribute data in the first audit text data set; the attribute data comprises unqualified contractual service, nonstandard subcontract management, nonstandard subcontract and insufficient research depth;
the fusion storage unit is used for fusing the entity data, the relation data and the attribute data and storing them in a graph database in the form entity-relation-entity-attribute;
and the visualization unit is used for obtaining the audit text knowledge graph using a visualization tool.
12. The audit text classification system according to claim 8 wherein the first preset value is 40%.
13. The audit text classification system according to claim 8 wherein the second preset value is 95%.
14. The audit text classification system according to claim 8, wherein the process of creating the text database includes:
based on the word vector newly added to the audit text knowledge graph, automatically capturing the data information in the background and exporting the data;
and when the data is exported, automatically generating, using an online export technique, a file database named after the audit problem category corresponding to the word vector.
15. Audit text classification equipment based on a knowledge graph, characterized by comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is used for storing a computer program;
the processor is used for implementing the knowledge-graph-based audit text classification method of any one of claims 1-7 when executing the program stored in the memory.
CN202410278278.9A 2024-03-12 2024-03-12 Audit text classification method, system and equipment based on knowledge graph Active CN117874240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410278278.9A CN117874240B (en) 2024-03-12 2024-03-12 Audit text classification method, system and equipment based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410278278.9A CN117874240B (en) 2024-03-12 2024-03-12 Audit text classification method, system and equipment based on knowledge graph

Publications (2)

Publication Number Publication Date
CN117874240A CN117874240A (en) 2024-04-12
CN117874240B CN117874240B (en) 2024-06-14

Family

ID=90597173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410278278.9A Active CN117874240B (en) 2024-03-12 2024-03-12 Audit text classification method, system and equipment based on knowledge graph

Country Status (1)

Country Link
CN (1) CN117874240B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303976A (en) * 2023-05-12 2023-06-23 中国人民解放军国防科技大学 Penetration test question-answering method, system and medium based on network security knowledge graph

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558492A (en) * 2018-10-16 2019-04-02 中山大学 A kind of listed company's knowledge mapping construction method and device suitable for event attribution
CN110334212A (en) * 2019-07-01 2019-10-15 南京审计大学 A kind of territoriality audit knowledge mapping construction method based on machine learning
CN110990567A (en) * 2019-11-25 2020-04-10 国家电网有限公司 Electric power audit text classification method for enhancing domain features
CN112100396B (en) * 2020-08-28 2023-10-27 泰康保险集团股份有限公司 Data processing method and device
CN112035637A (en) * 2020-08-28 2020-12-04 康键信息技术(深圳)有限公司 Medical field intention recognition method, device, equipment and storage medium
CN115168575A (en) * 2022-06-27 2022-10-11 北京至臻云智能科技有限公司 Subject supplement method applied to audit field and related equipment
CN115545468A (en) * 2022-09-28 2022-12-30 国网山东省电力公司淄博供电公司 Audit risk measurement method based on knowledge graph
CN116844731A (en) * 2023-07-07 2023-10-03 中国平安人寿保险股份有限公司 Disease classification method, disease classification device, electronic device, and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303976A (en) * 2023-05-12 2023-06-23 中国人民解放军国防科技大学 Penetration test question-answering method, system and medium based on network security knowledge graph

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
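Research on Knowledge Graph Modeling for Internal Control Auditing of Power Grid Enterprises (面向电网企业内控审计的知识图谱建模研究); 邵磊落; 《会计之友》; 20211031 (No. 20); main text, Chapters 5 and 6, pp. 124-126 *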

Also Published As

Publication number Publication date
CN117874240A (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN107704512A (en) Financial product based on social data recommends method, electronic installation and medium
US20220138193A1 (en) Conversion method and systems from natural language to structured query language
CN113051382A (en) Intelligent power failure question-answering method and device based on knowledge graph
CN111967761A (en) Monitoring and early warning method and device based on knowledge graph and electronic equipment
CN110765277A (en) Online equipment fault diagnosis platform of mobile terminal based on knowledge graph
CN113360582B (en) Relation classification method and system based on BERT model fusion multi-entity information
CN116340530A (en) Intelligent design method based on mechanical knowledge graph
CN109902305A (en) Template generation, search and text generation apparatus and method for based on name Entity recognition
CN116521898A (en) Construction method of power plant power generation equipment fault knowledge graph
CN115358481A (en) Early warning and identification method, system and device for enterprise ex-situ migration
CN113626571B (en) Method, device, computer equipment and storage medium for generating answer sentence
Yin et al. Sentence-bert and k-means based clustering technology for scientific and technical literature
CN113343701B (en) Extraction method and device for text named entities of power equipment fault defects
CN117874240B (en) Audit text classification method, system and equipment based on knowledge graph
CN116187323A (en) Knowledge graph in field of numerical control machine tool and construction method thereof
CN114722159B (en) Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources
CN112668836B (en) Risk spectrum-oriented associated risk evidence efficient mining and monitoring method and apparatus
CN116226371A (en) Digital economic patent classification method
Hu et al. A classification model of power operation inspection defect texts based on graph convolutional network
CN114969341A (en) Fine-grained emotion analysis method and device for catering industry comments
Sharma et al. Comprehensive study of semantic annotation: Variant and praxis
CN113688233A (en) Text understanding method for semantic search of knowledge graph
CN113505117A (en) Data quality evaluation method, device, equipment and medium based on data indexes
CN112559739A (en) Method for processing insulation state data of power equipment
Feng et al. The core technique and application of knowledge graph in power grid company administrative duty

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240517

Address after: 300000 No. 278, Hongqi Road, Nankai District, Tianjin

Applicant after: State Grid Tianjin electric power company Chengxi power supply branch

Country or region after: China

Applicant after: STATE GRID TIANJIN ELECTRIC POWER Co.

Applicant after: STATE GRID CORPORATION OF CHINA

Applicant after: Tianjin Chengxi Guangyuan Power Engineering Co.,Ltd.

Address before: No.153 Xiangwei Road, Hebei District, Tianjin 300143

Applicant before: TIANJIN ELECTRIC POWER ENGINEERING SUPERVISION Co.,Ltd.

Country or region before: China

Applicant before: State Grid Tianjin electric power company construction branch

Applicant before: Tianjin Sanyuan Power Intelligent Technology Co.,Ltd.

Applicant before: Tianjin Chengxi Guangyuan Power Engineering Co.,Ltd.

Applicant before: Tianjin Tianyuan Electric Power Engineering Co.,Ltd.

Applicant before: Tianjin Ninghe District Ningdong Shengyuan Power Engineering Co.,Ltd.

Applicant before: STATE GRID TIANJIN ELECTRIC POWER Co.

Applicant before: STATE GRID CORPORATION OF CHINA

GR01 Patent grant