CN114595459B - Question rectification suggestion generation method based on deep learning - Google Patents

Question rectification suggestion generation method based on deep learning Download PDF

Info

Publication number
CN114595459B
CN114595459B CN202111584344.8A CN202111584344A
Authority
CN
China
Prior art keywords
entity
definition
relation
training
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111584344.8A
Other languages
Chinese (zh)
Other versions
CN114595459A (en)
Inventor
黄鹏
贾梦妮
刘钰
邱杰
刘德安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Digital Intelligence Technology Co Ltd
Original Assignee
China Telecom Digital Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Digital Intelligence Technology Co Ltd filed Critical China Telecom Digital Intelligence Technology Co Ltd
Priority to CN202111584344.8A priority Critical patent/CN114595459B/en
Publication of CN114595459A publication Critical patent/CN114595459A/en
Application granted granted Critical
Publication of CN114595459B publication Critical patent/CN114595459B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a problem rectification suggestion generation method based on deep learning, which comprises the following steps: adding entity labels to the security issue data of the training set and verification set, and splicing the data with the various entity definition types; constructing an entity recognition model; constructing a relation extraction model; performing level protection evaluation on a system to be evaluated, then performing entity recognition with the entity recognition model and outputting entity information, performing entity relation extraction with the relation extraction model and outputting entity relation information; performing knowledge graph reasoning on the determined entity information and entity relation information to obtain rectification-suggestion-related information; and generating the rectification suggestion for the security issues found in the level protection evaluation by combining the entity information, the entity relation information and the rectification-suggestion-related information. The invention improves the quality and efficiency of generating security issue rectification suggestions and reduces labor cost.

Description

Question rectification suggestion generation method based on deep learning
Technical Field
The invention relates to the technical field of rectification suggestion generation. More particularly, the invention relates to a problem rectification suggestion generation method based on deep learning.
Background
Information security level protection evaluation refers to an evaluation organization, qualified and certified by the Ministry of Public Security and entrusted by the relevant units in accordance with the national regulations on information security level protection, testing and evaluating the security level protection status of an information system against the relevant management requirements and technical standards.
For the security issues found when an evaluation organization performs level protection evaluation on the security of an information system, evaluators currently have to write the corresponding security issue rectification suggestions by hand, and manual writing has the following problems:
Low efficiency: a large amount of material has to be consulted manually, so efficiency is hard to improve;
High labor cost: professional staff must be involved, which is costly;
No uniform standard: because of the diversity of the issues and the subjective judgment of individual evaluators, the suggestions follow no uniform standard and their quality is uneven.
A problem rectification suggestion generation method that is efficient, reduces labor cost and follows a unified standard is therefore urgently needed.
Disclosure of Invention
It is an object of the present invention to address at least the above problems and to provide at least the advantages described hereinafter.
Another object of the invention is to provide a problem rectification suggestion generation method based on deep learning, which performs semantic analysis on security issue data in combination with its context, performs knowledge reasoning on the security issues through a knowledge graph on the basis of knowledge extraction, and then combines a text generation model (GPT-2) with the key information to generate highly readable security issue rectification suggestions, thereby reducing labor cost and improving the quality of the generated suggestions.
To achieve these objects and other advantages and in accordance with the purpose of the invention, there is provided a problem rectification suggestion generation method based on deep learning, comprising the following steps:
adding entity labels to the security issue data of the training set and the verification set, and then splicing the data with the various entity definition types serving as prior knowledge to obtain an entity definition training set and an entity definition verification set;
the method comprises the steps of taking RoBERTA-wwm-large-ext as a pre-training model, taking an entity definition training set as input, training two classifiers respectively used for predicting an entity starting position and an entity ending position, and constructing an entity identification model, wherein the entity definition verification set is used for verifying and determining whether the entity identification model is trained completely;
establishing entity pairs from the output result of the entity recognition model, adding the entity boundary, entity definition type and subject/object role as identifiers before and after each entity, and splicing the result with the various relation definition types serving as prior knowledge to obtain a relation definition training set and a relation definition verification set;
taking RoBERTa-wwm-large-ext as the pre-training model and the relation definition training set as input, training two classifiers, a Softmax classifier and a binary classifier, then computing a dynamically weighted average of the two classifiers' results to obtain the predicted entity relation classification, and thereby constructing a relation extraction model, wherein the relation definition verification set is used to verify whether training of the relation extraction model is complete;
performing level protection evaluation on the system to be evaluated, then performing entity recognition with the entity recognition model and outputting entity information, and performing entity relation extraction with the relation extraction model on the basis of the output entity information to output entity relation information;
and searching the knowledge graph according to the determined entity information and entity relation information, performing knowledge graph reasoning to obtain rectification-suggestion-related information, and generating the rectification suggestion for the security issues found in the level protection evaluation by combining the entity information, the entity relation information and the rectification-suggestion-related information.
Preferably, the splicing with the various entity definition types as prior knowledge is specifically: concatenating the definition of each entity type with the security issue data of the training set and verification set as strings, then splitting by length with the text length limited to 512, and re-concatenating any portion exceeding 512 characters with the entity type definitions as new samples.
Preferably, before the entity definition training set is used as input to train the pre-training model, the method further comprises: randomly filtering out samples accounting for 35-40% of the total samples of the entity definition training set, wherein the negative-sample ratio of each entity definition type is determined, the entity definition types are sorted by negative-sample ratio from largest to smallest, and the number of samples filtered from an earlier entity definition type is not less than the number filtered from a later one; the negative-sample ratio of an entity definition type is the number of its negative samples divided by its total number of samples, and a negative sample of an entity definition type is a sample whose security issue data does not match that entity definition type.
Preferably, verifying with the entity definition verification set whether training of the entity recognition model is complete specifically comprises:
after the pre-training model has iterated a preset number of times, obtaining the loss values of the entity definition training set and of the entity definition verification set respectively, and determining whether training of the entity recognition model is complete according to the trend of the two loss values, wherein the loss value is the sum of the loss values of the two classifiers used respectively for predicting the entity start position and the entity end position.
Preferably, before the relation definition training set is used as input to train the pre-training model, the method further comprises:
randomly filtering out samples accounting for 35-40% of the total samples of the relation definition training set, wherein the negative-sample ratio of each relation definition type is determined, the relation definition types are sorted by negative-sample ratio from largest to smallest, and the number of samples filtered from an earlier relation definition type is not less than the number filtered from a later one; the negative-sample ratio of a relation definition type is the number of its negative samples divided by its total number of samples, and a negative sample of a relation definition type is a sample whose security issue data does not match that relation definition type.
Preferably, verifying with the relation definition verification set whether training of the relation extraction model is complete specifically comprises:
after the pre-training model has iterated a preset number of times, obtaining the loss values of the relation definition training set and of the relation definition verification set respectively, and determining whether training of the relation extraction model is complete according to the trend of the two loss values, wherein the loss value is the error between the predicted entity relation classification and the true entity relation classification.
Preferably, the number of training set samples is 70% to 90% of the total number of training set and validation set samples.
Preferably, performing entity recognition with the entity recognition model and outputting the entity information is specifically:
concatenating each entity definition type with the security issue data of the system to be evaluated respectively, then splitting the concatenated text by length, treating any portion exceeding 512 characters as new security issue data and concatenating it again with the entity definition types, to obtain an entity definition test set;
and taking the entity definition test set as the input of the entity recognition model, recognizing the entities, and outputting entity information, wherein the entity information comprises the entity and the entity position.
Preferably, performing entity relation extraction with the relation extraction model and outputting the entity relation information is specifically:
after obtaining the output result of the entity recognition model, establishing entity pairs, and adding the entity boundary, entity definition type and subject/object role as identifiers before and after each entity, to obtain a relation definition test set;
and taking the relation definition test set as the input of the relation extraction model to obtain entity relation information.
Preferably, the weights of each output layer are updated during the training of the entity recognition model.
The invention at least comprises the following beneficial effects:
firstly, semantic analysis is performed on the security issue data in combination with its context, knowledge reasoning is performed on the security issues through a knowledge graph on the basis of knowledge extraction, and a highly readable security issue rectification suggestion is then generated by combining a text generation model (GPT-2) with the key information, which reduces labor cost and improves the quality of the generated suggestions;
secondly, the output of the entity recognition model is improved: the weight of each output (representation) layer is updated during the model's iterative training, and giving different layers different weights improves entity recognition accuracy;
thirdly, the input of the relation extraction model is improved: the relation definition types are spliced with the input data, the entity subject/object boundary and entity type information are then fused, a Softmax classifier and a binary classifier are trained, and the results of the two classifiers are finally dynamically weighted-averaged to obtain the final entity relation classification, which improves model accuracy;
fourthly, unmatched data are filtered out during model training, thinning the training data and effectively improving the performance of the knowledge extraction models (the entity recognition model and the relation extraction model).
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Detailed Description
The present invention is further described in detail below with reference to examples so that those skilled in the art can practice the invention with reference to the description.
The problem rectification suggestion generation method based on deep learning comprises the following steps:
step one, constructing a database
1.1, acquiring security issue data generated during level protection evaluation;
"security issue data" is, for example: the application system does not provide a security audit module, the audit record is directly written into the database, and the Elasticissearch does not start the audit function;
1.2, preprocessing the security issue data to obtain preprocessed security issue data, wherein the preprocessing includes handling special characters (such as single quotation marks, spaces, abnormal symbols, case conversion and the like), so that special characters are not interpreted with unintended special meanings during subsequent model training, which would deviate from the design intent and cause confusion at execution time;
1.3, adding entity labels (i.e. performing BIESO labeling) to the preprocessed security issue data to obtain sample data, all the sample data forming a sample set which is stored in a database, wherein the number of samples in the sample set is not less than 20,000;
Here BIESO labeling is specifically: "B" marks the beginning character of a term, "I" a middle character, "E" the ending character, "S" a term consisting of a single character, and "O" a character not belonging to any term. For example, for the security issue data "the application system does not provide a security audit module, audit records are written directly into the database, and the Elasticsearch audit function is not enabled", the labeling is: B-sys I-sys I-sys E-sys O O B-fun I-fun I-fun I-fun I-fun E-fun O B-info I-info I-info E-info O O B-app I-app E-app O B-app I-app I-app I-app I-app I-app I-app E-app O B-fun I-fun I-fun E-fun;
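The BIESO tagging scheme above can be reproduced with a few lines of code. The following is a minimal sketch, assuming entities are supplied as (start, end, type) character spans; the function name and the example spans are illustrative and not taken from the patent's data set.

```python
def bieso_tag(text, entities):
    """Return one BIESO tag per character of `text`.

    `entities` is a list of (start, end, etype) character spans, end exclusive.
    """
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        if end - start == 1:
            tags[start] = f"S-{etype}"
        else:
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"
            tags[end - 1] = f"E-{etype}"
    return tags

# toy example: "应用系统" as a sys entity, "安全审计模块" as a fun entity
text = "应用系统未提供安全审计模块"
print(list(zip(text, bieso_tag(text, [(0, 4, "sys"), (7, 13, "fun")]))))
```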
step two, training to obtain an entity recognition model
2.1: randomly selecting 70%-90% of the sample data from the sample set to form the training set;
the remaining sample data form the verification set;
2.2: data processing
2.2.1, constructing an entity definition type data set comprising the entity definition types involved in level protection evaluation security issue data, such as system entities (abbreviated sys), function entities (fun), software entities, information entities (info), application entities (app), and the like;
2.2.2 for the training set:
Taking each entity definition type as a query, concatenating it as a string with the sample data of the training set, then splitting the concatenated text by length with the text length limited to 512 (texts shorter than 512 are padded); the portion exceeding 512 is treated as new sample data and concatenated again with the entity definition types; this yields the entity definition training set;
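A minimal sketch of this splicing and 512-character splitting, written in Python; the "[SEP]" separator and the space padding are stand-in assumptions, since the exact separator and padding token are not specified above.

```python
MAX_LEN = 512

def build_entity_definition_samples(issue_texts, entity_types):
    """Concatenate each entity definition type (used as a query) with each
    security issue text, split anything longer than 512 characters, and
    re-concatenate the overflow with the type as a new sample."""
    samples = []
    for etype in entity_types:
        for text in issue_texts:
            joined = etype + "[SEP]" + text
            while len(joined) > MAX_LEN:
                samples.append(joined[:MAX_LEN])
                joined = etype + "[SEP]" + joined[MAX_LEN:]
            samples.append(joined.ljust(MAX_LEN))  # pad short texts up to 512
    return samples

# toy usage: 1 issue text and 2 entity definition types yield 2 samples
print(len(build_entity_definition_samples(
    ["the application system does not provide a security audit module"],
    ["sys", "fun"])))
```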
For the entity definition training set, randomly filtering out samples accounting for 35-40% of the total samples, wherein the negative-sample ratio of each entity definition type is determined, the entity definition types are sorted by negative-sample ratio from largest to smallest, and the number of samples filtered from an earlier entity definition type is not less than that filtered from a later one, giving the filtered training set; an entity negative sample is security issue data that does not match the entity definition type;
for example, the negative sample content of each entity definition type is randomly filtered in all entity definition samples according to the proportion of the negative sample content, specifically: the safety problem data of the training set is K, the entity definition type is N types, and the data are respectively marked as (N) 1 、N 2 、N 3 、·····、N i 、······、N n ) Correspondingly, the content of the negative sample corresponding to each entity definition type is respectively marked as (W) 1 、W 2 、W 3 、·····、W i 、······、W n ) If the total number of samples is (N × K), the number of samples to be filtered is M ═ 35% × N × K, and in summary, the type N is defined for the ith entity i In other words, the negative sample content W i Corresponding to the number of samples to be randomly filtered
X i =(W i /(W 1 +W 2 +W 3 +·····+W i +······+W n ))*M;
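The per-type filtering quota X_i above can be computed directly; here is a small sketch with illustrative numbers (the 35% fraction is the lower end of the 35-40% range given above).

```python
def filter_quota(num_issues_K, negative_ratios_W, filter_fraction=0.35):
    """Return X_i, the number of samples to randomly drop for each entity
    definition type, proportional to its negative-sample ratio W_i."""
    n = len(negative_ratios_W)
    M = filter_fraction * n * num_issues_K     # total number of samples to filter
    W_sum = sum(negative_ratios_W)
    return [W_i / W_sum * M for W_i in negative_ratios_W]

# e.g. K = 10,000 issue samples and 5 entity types with these negative ratios
print(filter_quota(10_000, [0.9, 0.8, 0.7, 0.6, 0.5]))
```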
2.2.3 for the verification set:
Taking each entity definition type as a query, concatenating it as a string with the sample data of the verification set, then splitting the concatenated text by length with the text length limited to 512 (texts shorter than 512 are padded); the portion exceeding 512 is treated as new sample data and concatenated again with the entity definition types, giving the entity definition verification set;
2.3: entity identification
2.3.1, taking the Chinese pre-trained model RoBERTa-wwm-large-ext, released by the Harbin Institute of Technology and iFLYTEK (HIT-iFLYTEK) joint laboratory, as the pre-training model;
2.3.2, taking the filtered entity definition training set as the input of the pre-training model;
2.3.3, fine-tuning the pre-training model; after a preset number of iterations, outputting the word vector (entity type representation) corresponding to each token of each security issue sample, and obtaining the initial entity recognition model once this round of training finishes, wherein the word vector of each token is obtained as a weighted sum over the 24 output (representation) layers, the weight of each output layer is updated during the model's iterative training (a code sketch of this layer weighting is given after the epoch note below), and the preset number of iterations can be set by the operator according to the actual situation;
when the screened training set passes through the pre-training model once and returns once, this process is called an Epoch (one of the predetermined number of times), (that is, all the training samples are forward propagated and backward propagated in the pre-training model, and then, a bit, an Epoch is a process of training all the screened training sets once.
2.3.4, after obtaining the output corresponding to the filtered training set, training two classifiers used respectively for predicting the entity span start position and the entity span end position, and obtaining the loss value A of the initial entity recognition model from the prediction results (a code sketch of the two span classifiers follows the note below); specifically, the loss value A is the sum of the loss values of the two classifiers, and the loss value of each classifier is the error between its prediction and the true label, for example: loss(start) = CE(predicted start, labeled start) for the classifier predicting the entity span start position, loss(end) = CE(predicted end, labeled end) for the classifier predicting the entity span end position, and loss value A = loss(start) + loss(end);
Here the entity span start position sequence and end position sequence are each a 0/1 sequence of the same length as the input text, where 1 marks the start or end position of an entity segment;
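The two span classifiers and the summed loss value A can be sketched as follows, again in PyTorch; each head is a token-level binary classifier trained with cross entropy against the 0/1 start and end sequences described above. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class SpanHeads(nn.Module):
    def __init__(self, hidden_size=1024):
        super().__init__()
        self.start_clf = nn.Linear(hidden_size, 2)  # per-token: start / not start
        self.end_clf = nn.Linear(hidden_size, 2)    # per-token: end / not end

    def forward(self, token_repr):
        # token_repr: (batch, seq_len, hidden_size) from the pooled encoder
        return self.start_clf(token_repr), self.end_clf(token_repr)

def span_loss(start_logits, end_logits, start_labels, end_labels):
    ce = nn.CrossEntropyLoss()
    loss_start = ce(start_logits.reshape(-1, 2), start_labels.reshape(-1))
    loss_end = ce(end_logits.reshape(-1, 2), end_labels.reshape(-1))
    return loss_start + loss_end                    # loss value A

# toy usage with random representations and random 0/1 label sequences
heads = SpanHeads()
s_logits, e_logits = heads(torch.randn(2, 512, 1024))
labels = torch.randint(0, 2, (2, 512))
print(span_loss(s_logits, e_logits, labels, labels).item())
```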
2.3.5, taking the entity definition verification set as the input of the initial entity recognition model, outputting the word vectors corresponding to each security issue sample; after the initial entity recognition model has produced the word vectors for the entity definition verification set, the two classifiers predict the entity span start and end positions respectively, and the loss value B of the initial entity recognition model is obtained from the prediction results;
2.3.6, determining whether an optimal loss value A has been obtained according to the relation between loss value A and loss value B and the convergence of the pre-training model during training; if so, the trained entity recognition model is obtained; if not, steps 2.3.3-2.3.5 are repeated until the optimal loss value A is determined;
The preset number of iterations in step 2.3.3 can be set by the operator based on the actual situation, and the new preset number of iterations when repeating steps 2.3.3-2.3.5 can be adjusted with reference to how loss value A and loss value B trend relative to each other over multiple runs;
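One common way to operationalise this stopping rule is simple patience-based early stopping on the verification loss B while training loss A keeps decreasing; the patience value below is an illustrative assumption, not something the text prescribes.

```python
def should_stop(val_loss_history, patience=3):
    """Stop once the verification loss has not improved for `patience` evaluations."""
    if len(val_loss_history) <= patience:
        return False
    best_so_far = min(val_loss_history[:-patience])
    return min(val_loss_history[-patience:]) >= best_so_far

# loss B improves, then stalls for three evaluations -> stop and keep the best model
print(should_stop([0.9, 0.7, 0.6, 0.61, 0.62, 0.63]))  # True
```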
step three: training acquisition relation extraction model
3.1: data processing
3.1.1, constructing a relation definition type data set comprising the relation definition types involved in level protection evaluation security issue data, such as a calling relation, a storage relation, a function-providing relation and the like;
3.1.2 for training set
After obtaining the output results of the entity recognition model for the filtered training set data, establishing entity pairs, and adding the entity boundary (from the first character of the entity to the last), the entity definition type and the subject/object role (Subject, abbreviated S; Object, abbreviated O) as identifiers before and after the entity span, to obtain the processed training set output. For example, for the entity pair (application system, security audit module) in the security issue data "the application system does not provide a security audit module", the marked text takes the form "<S:sys> application system </S:sys> does not provide <O:fun> security audit module </O:fun>";
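A small sketch of inserting the boundary, type and subject/object-role identifiers around an entity pair, in the spirit of the example above; the exact marker syntax is an assumption for illustration.

```python
def mark_entity_pair(text, subj, subj_type, obj, obj_type):
    """Wrap the subject and object mentions with <S:type>...</S:type> and
    <O:type>...</O:type> identifiers."""
    spans = sorted(
        [(text.find(subj), len(subj), "S", subj_type),
         (text.find(obj), len(obj), "O", obj_type)],
        key=lambda x: x[0], reverse=True)   # insert the rightmost marker first so offsets stay valid
    for start, length, role, etype in spans:
        text = (text[:start] + f"<{role}:{etype}>" + text[start:start + length]
                + f"</{role}:{etype}>" + text[start + length:])
    return text

print(mark_entity_pair("应用系统未提供安全审计模块",
                       "应用系统", "sys", "安全审计模块", "fun"))
# <S:sys>应用系统</S:sys>未提供<O:fun>安全审计模块</O:fun>
```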
Splicing the various relation definition types, serving as prior knowledge, with the processed output result to obtain the relation definition training set;
the method comprises the steps of filtering samples which account for 35-40% of total samples randomly for a relation definition training set, determining the content of negative samples of each relation definition type, sequencing the relation definition types from large to small according to the content of the negative samples, and obtaining a filtered training set output result, wherein the filtered sample amount of a previous relation definition type is not less than that of a next relation definition type, and the relation negative sample is a training set output result which is not matched with the relation definition types;
Here the negative-sample ratio of each relation definition type (the number of negative samples of that relation definition type divided by its total number of samples) determines how the filtering is distributed over all relation definition samples. Specifically: the training set contains H security issue samples and there are g relation definition types, denoted G_1, G_2, G_3, ..., G_i, ..., G_g; the corresponding negative-sample ratios are denoted L_1, L_2, L_3, ..., L_i, ..., L_g. The total number of spliced samples is then g*H, so the number of samples to be filtered is Y = 35% * g * H. For the i-th relation definition type G_i with negative-sample ratio L_i, the number of samples to be randomly filtered is
S_i = (L_i / (L_1 + L_2 + L_3 + ... + L_i + ... + L_g)) * Y;
3.1.3 for the verification set:
After obtaining the output results of the entity recognition model for the verification set data, establishing entity pairs, and adding the entity boundary and entity type as identifiers before and after the entity span to obtain the processed verification set output; splicing the various relation definition types, as prior knowledge, with the processed output to obtain the relation definition verification set;
3.2: relationship extraction
3.2.1, taking the Chinese pre-trained model RoBERTa-wwm-large-ext, released by the Harbin Institute of Technology and iFLYTEK (HIT-iFLYTEK) joint laboratory, as the pre-training model;
3.2.2, taking the relation definition training set as the input of the pre-training model;
3.2.3, fine-tuning the pre-training model; after a preset number of iterations, outputting the encoding corresponding to each token, and obtaining the initial relation extraction model once this round of training finishes;
3.2.4, after obtaining the output corresponding to the relation definition training set, training two classifiers, a Softmax classifier and a binary classifier, then computing a dynamically weighted average of the two classifiers' results to obtain the predicted entity relation classification (a code sketch of this combination follows step 3.2.5 below), and obtaining the loss value C of the initial relation extraction model from the prediction results, wherein the loss value C is the error between the predicted entity relation classification and the true entity relation classification;
3.2.5, taking the relation definition verification set as the input of the initial relation extraction model, outputting the encoding (entity relation classification) corresponding to each token, applying the two classifiers, the Softmax classifier and the binary classifier, then computing a dynamically weighted average of their results to obtain the predicted entity relation classification, and obtaining the loss value D of the initial relation extraction model from the prediction results, wherein the loss value D is the error between the predicted entity relation classification and the true entity relation classification;
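The text does not spell out how the dynamic weights for averaging the Softmax head and the binary head are obtained, so the sketch below uses a learnable per-sample gate as one plausible reading; layer sizes and the number of relation types are illustrative.

```python
import torch
import torch.nn as nn

class RelationHeads(nn.Module):
    def __init__(self, hidden_size=1024, num_relations=10):
        super().__init__()
        self.softmax_clf = nn.Linear(hidden_size, num_relations)  # which relation
        self.binary_clf = nn.Linear(hidden_size, 2)               # any relation vs. none
        self.gate = nn.Linear(hidden_size, 1)                     # dynamic weight

    def forward(self, pair_repr):
        # pair_repr: (batch, hidden_size), e.g. the encoding of the marked entity pair
        p_rel = torch.softmax(self.softmax_clf(pair_repr), dim=-1)
        p_has = torch.softmax(self.binary_clf(pair_repr), dim=-1)[:, 1:2]
        w = torch.sigmoid(self.gate(pair_repr))
        # dynamically weighted average of the two classifiers' evidence
        return w * p_rel + (1 - w) * p_has * p_rel

heads = RelationHeads()
scores = heads(torch.randn(4, 1024))
print(scores.argmax(dim=-1))  # predicted relation class per sample
```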
3.2.6, determining whether an optimal loss value C has been obtained according to the relation between loss value C and loss value D and the convergence of the pre-training model during training; if so, the trained relation extraction model is obtained; if not, steps 3.2.3-3.2.5 are repeated until the optimal loss value C is determined;
The preset number of iterations in step 3.2.3 can be set by the operator based on the actual situation, and the new preset number of iterations when repeating steps 3.2.3-3.2.5 can be adjusted with reference to how loss value C and loss value D trend relative to each other over multiple runs;
step four, generation of rectification suggestion
4.1, performing level protection evaluation on the system to be evaluated and collecting the resulting security issue data to form the test set;
4.2, preprocessing the security issue data of the test set, the preprocessing being specifically the handling of special characters;
4.3, concatenating each entity definition type as a string with the preprocessed security issue data of the test set respectively, then splitting the concatenated text by length, treating any portion exceeding 512 characters as new security issue data and concatenating it again with the entity type definitions, to obtain the entity definition test set;
4.4, taking the entity definition test set as the input of the entity recognition model, performing entity recognition, and outputting the entities and entity positions corresponding to the security issue data as the output result of the entity recognition model;
4.5, after obtaining the output result of the entity recognition model, establishing entity pairs, and adding the entity boundary, entity definition type and subject/object role as identifiers before and after the entity span, to obtain the relation definition test set;
4.6, taking the relation definition test set as the input of the relation extraction model to obtain the entity relations;
4.7, searching the knowledge graph according to the determined entity information and entity relation information, and then performing knowledge graph reasoning to obtain the rectification-suggestion-related information;
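A toy sketch of the knowledge graph lookup, using a small in-memory graph; the graph content, edge names and entries are purely illustrative, since the patent does not describe the graph schema.

```python
# illustrative in-memory graph keyed by (entity, relation)
kg = {
    ("security audit module", "missing_in"): ["application system"],
    ("security audit module", "rectified_by"): ["enable the built-in audit function",
                                                "deploy a log audit system"],
}

def related_rectification_info(entity, relation="rectified_by"):
    """Follow one edge type from the extracted entity to rectification hints."""
    return kg.get((entity, relation), [])

print(related_rectification_info("security audit module"))
```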
4.8, generating the rectification suggestion for the security issues found in the level protection evaluation with the pre-trained model GPT-2, combining the entity information, entity relation information and rectification-suggestion-related information obtained in the preceding steps.
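A sketch of the final generation step with a GPT-2 model via the Hugging Face transformers library; the checkpoint name, prompt format and decoding settings are assumptions for illustration, since the text only states that a pre-trained GPT-2 is combined with the key information.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")              # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

# key information packed into a prompt (contents are illustrative)
prompt = ("Security issue: the application system does not provide a security audit module. "
          "Entities: application system (sys); security audit module (fun). "
          "Relation: provides-function (negated). "
          "Knowledge-graph hints: enable the audit function; store audit records centrally. "
          "Rectification suggestion:")

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=120, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```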
While embodiments of the invention have been described above, the invention is not limited to the applications set forth in the description and the embodiments; it can be applied in all fields suitable for the invention, and further modifications will readily occur to those skilled in the art. The invention is therefore not limited to the specific details shown and described herein, so long as the general concept defined by the claims and their equivalents is not departed from.

Claims (10)

1. A problem rectification suggestion generation method based on deep learning, characterized by comprising the following steps:
adding entity labels to the security issue data of a training set and a verification set, and then splicing the data with various entity definition types serving as prior knowledge to obtain an entity definition training set and an entity definition verification set;
the method comprises the steps of taking RoBERTA-wwm-large-ext as a pre-training model, taking an entity definition training set as input, training two classifiers respectively used for predicting an entity starting position and an entity ending position, and constructing an entity identification model, wherein an entity definition verification set is used for verifying and determining whether the entity identification model is trained completely;
establishing entity pairs according to the output result of the entity recognition model, adding the entity boundary, entity definition type and subject/object role as identifiers before and after each entity, and splicing the result with various relation definition types serving as prior knowledge to obtain a relation definition training set and a relation definition verification set;
taking RoBERTa-wwm-large-ext as the pre-training model and the relation definition training set as input, training two classifiers, a Softmax classifier and a binary classifier, then computing a dynamically weighted average of the two classifiers' results to obtain the predicted entity relation classification, and thereby constructing a relation extraction model, wherein the relation definition verification set is used to verify whether training of the relation extraction model is complete;
performing level protection evaluation on the system to be evaluated, then performing entity recognition with the entity recognition model and outputting entity information, and performing entity relation extraction with the relation extraction model on the basis of the output entity information to output entity relation information;
and searching the knowledge graph according to the determined entity information and entity relation information, then performing knowledge graph reasoning to obtain rectification-suggestion-related information, and generating the rectification suggestion for the security issues found in the level protection evaluation by combining the entity information, the entity relation information and the rectification-suggestion-related information.
2. The method for generating a problem rectification suggestion based on deep learning as claimed in claim 1, wherein the splicing with various entity definition types as prior knowledge is specifically: concatenating the definition of each entity type with the security issue data of the training set and verification set as strings, then splitting by length with the text length limited to 512, and re-concatenating any portion exceeding 512 characters with the entity type definitions.
3. The method of claim 1, wherein before training the pre-training model with the entity definition training set as input, the method further comprises: randomly filtering out samples accounting for 35-40% of the total samples of the entity definition training set, wherein the negative-sample ratio of each entity definition type is determined, the entity definition types are sorted by negative-sample ratio from largest to smallest, and the number of samples filtered from an earlier entity definition type is not less than the number filtered from a later one; the negative-sample ratio of an entity definition type is the number of its negative samples divided by its total number of samples, and a negative sample of an entity definition type is a sample whose security issue data does not match that entity definition type.
4. The method for generating a problem rectification suggestion based on deep learning as claimed in claim 1, wherein verifying with the entity definition verification set whether training of the entity recognition model is complete specifically comprises:
after the pre-training model has iterated a preset number of times, obtaining the loss values of the entity definition training set and of the entity definition verification set respectively, and determining whether training of the entity recognition model is complete according to the trend of the two loss values, wherein the loss value is the sum of the loss values of the two classifiers used respectively for predicting the entity start position and the entity end position.
5. The method of claim 1, wherein before training a pre-trained model using a relationship definition training set as input, the method further comprises:
randomly filtering out samples accounting for 35-40% of the total samples of the relation definition training set, wherein the negative-sample ratio of each relation definition type is determined, the relation definition types are sorted by negative-sample ratio from largest to smallest, and the number of samples filtered from an earlier relation definition type is not less than the number filtered from a later one; the negative-sample ratio of a relation definition type is the number of its negative samples divided by its total number of samples, and a negative sample of a relation definition type is a sample whose security issue data does not match that relation definition type.
6. The method for generating a problem rectification suggestion based on deep learning as claimed in claim 1, wherein verifying with the relation definition verification set whether training of the relation extraction model is complete specifically comprises:
after the pre-training model has iterated a preset number of times, obtaining the loss values of the relation definition training set and of the relation definition verification set respectively, and determining whether training of the relation extraction model is complete according to the trend of the two loss values, wherein the loss value is the error between the predicted entity relation classification and the true entity relation classification.
7. The method of claim 1, wherein the number of training set samples is 70% -90% of the total number of training set and validation set samples.
8. The method for generating a problem rectification suggestion based on deep learning as claimed in claim 1, wherein performing entity recognition with the entity recognition model and outputting entity information is specifically:
concatenating each entity definition type as a string with the security issue data of the system to be evaluated respectively, then splitting the concatenated text by length, treating any portion exceeding 512 characters as new security issue data and concatenating it again with the entity definition types, to obtain an entity definition test set;
and taking the entity definition test set as the input of the entity recognition model, recognizing the entities, and outputting entity information, wherein the entity information comprises the entity and the entity position.
9. The method for generating a problem rectification suggestion based on deep learning as claimed in claim 8, wherein performing entity relation extraction with the relation extraction model and outputting entity relation information is specifically:
after obtaining the output result of the entity recognition model, establishing entity pairs, and adding the entity boundary, entity definition type and subject/object role as identifiers before and after each entity, to obtain a relation definition test set;
and taking the relation definition test set as the input of the relation extraction model to obtain entity relation information.
10. The method of claim 1, wherein the weights for each output layer are updated during training of the entity recognition model.
CN202111584344.8A 2021-12-22 2021-12-22 Question rectification suggestion generation method based on deep learning Active CN114595459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111584344.8A CN114595459B (en) 2021-12-22 2021-12-22 Question rectification suggestion generation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111584344.8A CN114595459B (en) 2021-12-22 2021-12-22 Question rectification suggestion generation method based on deep learning

Publications (2)

Publication Number Publication Date
CN114595459A CN114595459A (en) 2022-06-07
CN114595459B true CN114595459B (en) 2022-08-16

Family

ID=81814025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111584344.8A Active CN114595459B (en) 2021-12-22 2021-12-22 Question rectification suggestion generation method based on deep learning

Country Status (1)

Country Link
CN (1) CN114595459B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598000A (en) * 2019-08-01 2019-12-20 达而观信息科技(上海)有限公司 Relationship extraction and knowledge graph construction method based on deep learning model
CN111563166A (en) * 2020-05-28 2020-08-21 浙江学海教育科技有限公司 Pre-training model method for mathematical problem classification
CN112163097A (en) * 2020-09-23 2021-01-01 中国电子科技集团公司第十五研究所 Military knowledge graph construction method and system
CN113242236A (en) * 2021-05-08 2021-08-10 国家计算机网络与信息安全管理中心 Method for constructing network entity threat map
CN113254667A (en) * 2021-06-07 2021-08-13 成都工物科云科技有限公司 Scientific and technological figure knowledge graph construction method and device based on deep learning model and terminal
CN113590837A (en) * 2021-07-29 2021-11-02 华中农业大学 Deep learning-based food and health knowledge map construction method
CN113779260A (en) * 2021-08-12 2021-12-10 华东师范大学 Domain map entity and relationship combined extraction method and system based on pre-training model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335544A1 (en) * 2015-05-12 2016-11-17 Claudia Bretschneider Method and Apparatus for Generating a Knowledge Data Model

Also Published As

Publication number Publication date
CN114595459A (en) 2022-06-07

Similar Documents

Publication Publication Date Title
CN111145052A (en) Structured analysis method and system of judicial documents
CN109872162B (en) Wind control classification and identification method and system for processing user complaint information
CN106296195A (en) A kind of Risk Identification Method and device
CN110413319B (en) Code function taste detection method based on deep semantics
CN110162478B (en) Defect code path positioning method based on defect report
CN111428504B (en) Event extraction method and device
CN112800232B (en) Case automatic classification method based on big data
CN111428511B (en) Event detection method and device
CN115470354B (en) Method and system for identifying nested and overlapped risk points based on multi-label classification
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
CN112966708A (en) Chinese crowdsourcing test report clustering method based on semantic similarity
CN110197213B (en) Image matching method, device and equipment based on neural network
CN111144462B (en) Unknown individual identification method and device for radar signals
CN111581346A (en) Event extraction method and device
CN111723852A (en) Robust training method for target detection network
CN114595459B (en) Question rectification suggestion generation method based on deep learning
CN117009223A (en) Software testing method, system, storage medium and terminal based on abstract grammar
CN114564942B (en) Text error correction method, storage medium and device for supervision field
CN114610882A (en) Abnormal equipment code detection method and system based on electric power short text classification
CN113657986A (en) Hybrid neural network-based enterprise illegal funding risk prediction method
CN114764913A (en) Case element identification method integrated with label information
CN114330350A (en) Named entity identification method and device, electronic equipment and storage medium
CN113569957A (en) Object type identification method and device of business object and storage medium
CN116028631B (en) Multi-event detection method and related equipment
CN117251599B (en) Video corpus intelligent test optimization method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant