CN114969337A - Method for automatically generating test questions based on case text, storage medium and electronic equipment - Google Patents


Info

Publication number
CN114969337A
CN114969337A
Authority
CN
China
Prior art keywords
information
case
question
case text
test question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210561195.1A
Other languages
Chinese (zh)
Inventor
杨烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zehuo Times Technology Xiamen Co ltd
Original Assignee
Zehuo Times Technology Xiamen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zehuo Times Technology Xiamen Co ltd
Priority to CN202210561195.1A
Publication of CN114969337A
Legal status: Pending

Classifications

    • G06F 16/35 Information retrieval of unstructured textual data; Clustering; Classification
    • G06F 16/316 Information retrieval of unstructured textual data; Indexing; Indexing structures
    • G06F 18/241 Pattern recognition; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/047 Neural networks; Probabilistic or stochastic networks
    • G06N 3/08 Neural networks; Learning methods
    • G06Q 50/2057 Education; Education administration or guidance; Career enhancement or continuing education service

Abstract

The invention provides a method, a storage medium and electronic equipment for automatically generating test questions based on case texts, wherein the method comprises the following steps: acquiring case text information; performing data preprocessing on the case text information to construct a training set; inputting the training set into a training model for training to obtain key-value pairs of test question questions and test question answers; and generating the questions related to each piece of case text information according to the plurality of key-value pairs to obtain a case knowledge base. Because the test question information is obtained by inputting data preprocessed from the case text information into the training model for training, the training model can compose test question information of various different types for different case texts. Compared with manual question setting, this greatly improves question-setting efficiency, avoids repeated questions to the greatest extent, and helps the person answering the questions to grasp the knowledge points of the related cases comprehensively.

Description

Method for automatically generating test questions based on case text, storage medium and electronic equipment
Technical Field
The invention relates to the field of case informatization, in particular to a method, a storage medium and electronic equipment for automatically generating test questions based on case texts.
Background
With the continuous improvement of the social and economic level, the degree of attention paid to production safety also shows a continuously rising trend. Strengthening safety-production training management has become a key task for work units at all levels. Taking fire safety as an example, the most basic requirement for doing fire safety work well is to comprehensively master the relevant laws, regulations and policies and to be familiar with the production and operation management characteristics of various industries. In recent years, owing to factors such as economic and social development and information technology, how to become familiar with policies, regulations and safety accident cases as soon as possible and to apply them skillfully has become a primary problem faced by every fire safety worker.
At present, relatively mature computer semantic understanding technology in China has been applied to fields such as fire safety, but research on test question generation systems for safety accident case texts is scarce. If test question information related to a safety accident case is to be compiled, the question setter needs to read the whole case text in advance and then compose corresponding test questions according to an understanding of the text content. The whole process is time-consuming and labor-intensive; limited by the question setter's inertia of thinking, the test question information generated from the same accident case text tends to be the same every time, which is not conducive to the person answering the questions mastering the knowledge points in all aspects.
Disclosure of Invention
Therefore, it is necessary to provide a technical solution for automatically generating test questions based on case texts, so as to solve the problems that the existing way of generating test question information based on cases is inefficient, the question repetition rate is high, and the like.
In a first aspect, the present invention provides a method for automatically generating test questions based on case texts, comprising the following steps:
acquiring case text information;
carrying out data preprocessing on the case text information to construct a training set;
inputting the training set into a training model for training to obtain key value pairs of test question questions and test question answers;
and generating the questions related to each piece of case text information according to the plurality of key-value pairs to obtain a case knowledge base.
As an optional embodiment, the performing data preprocessing on the case text information includes:
removing symbols except the text content in the case text information, dividing the extracted text content into sentences according to the sentence unit, and marking each sentence in a BIOES coding mode; where "B" represents the beginning of an entity, "I" represents the middle of an entity, "O" represents a non-entity, used to mark an unrelated character, "E" represents the end of an entity, and "S" represents a single word or word mark.
As an alternative embodiment, the training model comprises a BERT layer, a FLAT layer and a CRF layer;
the BERT layer is used for enhancing semantic representation of sentences to obtain serialized text input;
the FLAT layer is used for constructing two position codes for each character or vocabulary information and converting the character or vocabulary information into the position codes;
and the CRF layer is used for modeling the label sequence based on the dependency relationship existing among the label information so as to obtain the optimal sequence.
As an alternative embodiment, the FLAT layer includes a Transformer structure; the FLAT layer is configured to construct two position codes for each character or vocabulary item, and converting the character or vocabulary information into the position codes specifically includes:
head[i] and tail[i] are adopted to respectively represent the head and the tail of a character or vocabulary item, and four relative position distances are then adopted to represent the relative relationship between two spans x_i and x_j, where the four relative position distances are calculated as follows:
d_{ij}^{(hh)} = head[i] - head[j]
d_{ij}^{(ht)} = head[i] - tail[j]
d_{ij}^{(th)} = tail[i] - head[j]
d_{ij}^{(tt)} = tail[i] - tail[j]
wherein d_{ij}^{(hh)} denotes the distance from the head of x_i to the head of x_j; d_{ij}^{(ht)} denotes the distance from the head of x_i to the tail of x_j; d_{ij}^{(th)} denotes the distance from the tail of x_i to the head of x_j; and d_{ij}^{(tt)} denotes the distance from the tail of x_i to the tail of x_j.
A relative position code is calculated based on the above four relative position distances according to the following formula:
R_{ij} = ReLU( W_r ( p_{d_{ij}^{(hh)}} ⊕ p_{d_{ij}^{(ht)}} ⊕ p_{d_{ij}^{(th)}} ⊕ p_{d_{ij}^{(tt)}} ) )
wherein W_r is a parameter that needs to be trained, and ⊕ represents the join (concatenation) operator. The position embedding p_d is calculated as follows:
p_d^{(2k)} = sin( d / 10000^{2k/d_model} )
p_d^{(2k+1)} = cos( d / 10000^{2k/d_model} )
wherein d is one of the four relative distances above and d_model is the dimension of the position code.
The calculation result can then be input to the self-attention mechanism layer in the following manner:
A*_{i,j} = W_q^T E_{x_i}^T E_{x_j} W_{k,E} + W_q^T E_{x_i}^T R_{ij} W_{k,R} + u^T E_{x_j} W_{k,E} + v^T R_{ij} W_{k,R}
wherein W_q, W_{k,R}, W_{k,E}, u and v are parameters to be trained, and E represents the word embedding lookup table or the output of the last Transformer layer.
As an alternative embodiment, the CRF layer is configured to model the tag sequences based on the dependency relationship existing between the tag information, so as to obtain the optimal sequences, including:
the CRF layer is trained by using maximum likelihood estimation, and a transition matrix W is introduced into the CRF layer as a parameter; for a sentence X, the probability of the model labeling the sequence Y = (y1, y2, ..., yn) is:
P(Y|X) = exp( s(X, Y) ) / Σ_{Y'} exp( s(X, Y') )
s(X, Y) = Σ_{i=1}^{n} P_{i, y_i} + Σ_{i=2}^{n} W_{y_{i-1}, y_i}
wherein X represents the input sentence x1, x2, ..., xn; Y is the tag sequence y1, y2, ..., yn; P_{i, y_i} represents the probability value of the i-th character being classified as the tag y_i; W_{y_{i-1}, y_i} represents the state transition value from the tag y_{i-1} to the tag y_i; and exp represents the exponential function with the natural constant e as its base;
the optimal sequence is the sequence with the maximum probability value.
As an alternative embodiment, the case text information includes event cause information; generating the questions related to each piece of case text information according to the plurality of key-value pairs includes:
using a regular expression to acquire the direct cause entries of the event from the case text information, storing the acquired direct cause entries into a first database, and performing matching in a rule-based manner;
and/or the case text information includes event countermeasure information; generating the questions related to each piece of case text information according to the plurality of key-value pairs includes:
using a regular expression to acquire the event countermeasure entries from the case text information, storing the acquired event countermeasure entries into a second database, and performing matching in a rule-based manner.
As an alternative embodiment, matching the event cause information in a rule-based manner includes:
when the number of direct cause entries is greater than 0 and less than 3, two random numbers in [0,1], random_index and random_index1, are generated; if random_index is equal to 0, a judgment question with a false statement is generated; if random_index1 is equal to 0, a cause R_O that does not belong to the event is randomly selected from the first database to generate the test question information;
and/or matching the event countermeasure information in a rule-based manner includes:
when the number of event countermeasure entries is greater than 0 and less than 3, two random numbers in [0,1], random_index and random_index1, are generated; if random_index is equal to 0, a judgment question with a false statement is generated; if random_index1 is equal to 0, a countermeasure C_O that does not belong to the event is randomly selected from the second database to generate the test question information.
As an optional embodiment, the case knowledge base includes a plurality of different types of test question information, and the types of test question information include choice questions, judgment questions and answer questions;
the method further comprises the following steps:
receiving a question answering request of a user terminal, and recording geographical position information of the user terminal and timestamp information for initiating the question answering request;
acquiring historical answer data of the user terminal in all preset area ranges within a preset time period according to the geographical position information of the user terminal and the timestamp information for initiating the answer request, and taking the type of the test question information with the highest answer error rate as a push question type; the preset time period is a certain time length before the timestamp information of the answering request; the preset area range is an area within a preset radius range by taking the geographical position information of the user terminal as a center;
and pushing answer information corresponding to the pushed question type to a user terminal which initiates an answer request at present.
In a second aspect, the invention also provides a storage medium storing a computer program which, when executed by a processor, performs the method steps according to the first aspect of the invention.
In a third aspect, the present invention also provides an electronic device comprising a processor and a storage medium, the storage medium being as in the second aspect;
the processor is adapted to execute a computer program stored in the storage medium to perform the method steps as in the first aspect.
Different from the prior art, the invention provides a method, a storage medium and electronic equipment for automatically generating test questions based on case texts, wherein the method comprises the following steps: acquiring case text information; performing data preprocessing on the case text information to construct a training set; inputting the training set into a training model for training to obtain key-value pairs of test question questions and test question answers; and generating the questions related to each piece of case text information according to the plurality of key-value pairs to obtain a case knowledge base. Because the test question information is obtained by inputting data preprocessed from the case text information into the training model for training, the training model can compose test question information of various different types for different case texts. Compared with manual question setting, this greatly improves question-setting efficiency, avoids repeated questions to the greatest extent, and helps the person answering the questions to grasp the knowledge points of the related cases comprehensively.
Drawings
Fig. 1 is a flowchart of a method for automatically generating test questions based on case texts according to a first embodiment of the present invention;
fig. 2 is a flowchart of a method for automatically generating test questions based on case texts according to a second embodiment of the present invention;
fig. 3 is a flowchart of a method for automatically generating test questions based on case texts according to a third embodiment of the present invention;
FIG. 4 is a diagram of the architecture of the BERT-FLAT-CRF model according to one embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a BERT model according to an embodiment of the present invention;
fig. 6 is a block diagram of an electronic device according to an embodiment of the present invention;
reference numerals:
10. an electronic device;
101. a processor;
102. a storage medium.
Detailed Description
In order to explain in detail possible application scenarios, technical principles, practical embodiments, and the like of the present application, the following detailed description is given with reference to the accompanying drawings in conjunction with the listed embodiments. The embodiments described herein are merely for more clearly illustrating the technical solutions of the present application, and therefore, the embodiments are only used as examples, and the scope of the present application is not limited thereby.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or related to other embodiments specifically defined. In principle, in the present application, the technical features mentioned in the embodiments can be combined in any manner to form a corresponding implementable technical solution as long as there is no technical contradiction or conflict.
Unless defined otherwise, technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the use of relational terms herein is intended only to describe particular embodiments and is not intended to limit the present application.
In the description of the present application, the term "and/or" is an expression describing a logical relationship between objects, indicating that three relationships may exist; for example, "A and/or B" indicates three cases: only A, only B, and both A and B. In addition, the character "/" herein generally indicates that the associated objects before and after it are in a logical "or" relationship.
In this application, terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
Without further limitation, in this application, the use of "including", "comprising", "having" or similar expressions is intended to cover a non-exclusive inclusion, so that a process, method or article that includes a list of elements may include not only those elements but also other elements not expressly listed or inherent to such process, method or article.
As understood in accordance with the Examination Guidelines, in this application the terms "greater than", "less than", "more than" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. In addition, in the description of the embodiments of the present application, unless specifically defined otherwise, "a plurality" means two or more (including two), and similar expressions involving "a plurality", such as "a plurality of groups" and "a plurality of times", are understood in the same way.
Natural Language Processing (NLP) covers the theories and methods for enabling computers to understand and generate natural language, and is an important, even core, branch of artificial intelligence. In the current age of continuously expanding data, text information grows exponentially, providing broad practical application scenarios for natural language processing technology. Early NLP technology was somewhat difficult to apply, but after natural language processing entered the era of deep learning, NLP began to fully show its value at the application level. Recently popular application tasks include information extraction, named entity recognition, text summarization, sentiment analysis, question answering systems, machine translation and dialogue systems, all of which have certain social significance and let people see the practical ability of natural language processing. Pre-trained models, represented by BERT, have successively exhibited capabilities that even exceed the human level on some tasks. Models created in recent years have also been developed around pre-trained models, with no lack of models for the Chinese domain among them. It is precisely the powerful ability of pre-trained models that has made NLP applications gradually popular.
The invention provides a question generation method based on safety accident case texts. For any input accident case text, test questions related to the text content can be generated automatically according to the text content. The test questions can be used to test the degree to which a person who has read the text has mastered its content, thereby achieving the purpose of providing safety education for workers, improving their safety awareness and reducing the occurrence of safety accidents.
Specifically, the generated test questions can include various types. For example, when a choice question is constructed, a valid time, place or number in a sentence can be extracted as the answer of the correct option, with the rest of the sentence used as the question stem; the main causes of the safety accident case can be extracted as the answer to a question; or the contents of the precautionary measures and rectification measures in the accident case can be extracted as the answers of choice questions or judgment questions to generate related questions.
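For illustration only, the following Python sketch shows one possible way to assemble such a choice question by blanking out a recognized time, place or number entity; the function names and the sample data are assumptions and do not reproduce the exact implementation of the invention:

```python
import random

def build_choice_question(sentence, entity, distractors):
    """Build a multiple-choice item from one sentence of the case text.

    sentence    -- the original sentence
    entity      -- (text, start, end) of a time/place/number entity found by NER
    distractors -- wrong options, e.g. drawn from other cases
    """
    text, start, end = entity
    stem = sentence[:start] + "____" + sentence[end:]   # blank the entity to form the question stem
    options = distractors[:3] + [text]                  # up to three wrong options plus the correct one
    random.shuffle(options)
    return {"stem": stem, "options": options, "answer": text}

# Example with made-up data
question = build_choice_question(
    "2021年3月5日，某仓库因电气线路老化发生火灾。",
    ("2021年3月5日", 0, 9),
    ["2020年1月1日", "2021年5月3日", "2022年3月5日"],
)
```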
As shown in fig. 1, in a first aspect, the present invention provides a method for automatically generating test questions based on case texts, comprising the following steps:
firstly, step S101 is carried out to obtain case text information;
then, performing data preprocessing on the case text information in step S102 to construct a training set;
then step S103 is entered to input the training set into a training model for training to obtain key value pairs of test question questions and test question answers;
and then, step S104 is carried out to generate the questions related to the text information of each case according to the plurality of key value pairs to obtain a case knowledge base.
In the present embodiment, the case text information includes event basic information, event cause information, and event countermeasure information. The event basic information includes a time point, a place, etc. where the event occurs. The event reason information is a reason for occurrence of an event, and the event countermeasure information is countermeasures for the event. The event basic information, the event reason information, and the event countermeasure information are generally recorded in the case text information in the form of words or characters. The event can be a safety accident, such as a fire safety accident, a power safety accident and the like.
In this embodiment, a key-value pair refers to the matching information corresponding to a test question and its answer, and this matching information can be stored as key-value pairs for subsequent indexing. The test question questions and answers can be generated in real time based on the training model to avoid repetition of the generated test question information. The case knowledge base can store a plurality of different types of test question information so that it can be called in real time when an answer request from a user terminal is received.
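As a minimal sketch of how such question-answer key-value pairs and the case knowledge base might be organized in memory (the field names are illustrative assumptions, not terms defined by the invention):

```python
from dataclasses import dataclass, field

@dataclass
class CaseRecord:
    case_id: str
    basic_info: dict                                   # e.g. {"time": ..., "place": ...}
    causes: list                                       # direct cause entries
    measures: list                                     # countermeasure entries
    qa_pairs: dict = field(default_factory=dict)       # test question -> answer key-value pairs

knowledge_base = {}                                    # case_id -> CaseRecord

record = CaseRecord(
    case_id="case_001",
    basic_info={"time": "2021-03-05", "place": "warehouse"},
    causes=["aging electrical wiring"],
    measures=["inspect wiring regularly"],
)
record.qa_pairs["Where did the accident occur?"] = "warehouse"
knowledge_base[record.case_id] = record
```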
After the above scheme is adopted, the test question information of the invention is obtained by inputting data preprocessed from the case text information into the training model for training, and the training model can compose test question information of various different types for different case texts. Compared with manual question setting, this greatly improves question-setting efficiency, avoids repeated questions to the greatest extent, and helps the person answering the questions to master the knowledge points of the related cases more comprehensively.
In some embodiments, the data preprocessing the case text information comprises: removing symbols except the text content in the case text information, dividing the extracted text content into sentences according to the sentence unit, and marking each sentence in a BIOES coding mode; where "B" represents the beginning of an entity, "I" represents the middle of an entity, "O" represents a non-entity, used to mark an unrelated character, "E" represents the end of an entity, and "S" represents a single word or word mark.
Furthermore, the symbols other than the text content include extra spaces and special symbols such as plus, minus, multiplication and division signs in the case text information. After the extracted text content is divided into sentences, the leading and trailing blank characters of each sentence can be removed before labeling, and the punctuation mark at the end of a sentence, if present, is removed as well.
After the case text information is preprocessed in this way, a training set can be constructed. The training set contains sentences that have each undergone this preliminary symbol filtering, so the subsequent training model can process the data more conveniently to generate the corresponding test question information.
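One plausible implementation of this cleaning and BIOES labeling step is sketched below; the regular expressions and the entity types are illustrative assumptions rather than the exact rules of the invention:

```python
import re

def clean_and_split(case_text):
    """Strip non-text symbols and split the case text into sentences."""
    text = re.sub(r"[+\-*/×÷]+", "", case_text)          # remove arithmetic-style special symbols
    text = re.sub(r"\s+", "", text)                       # remove extra whitespace
    return [s for s in re.split(r"[。！？]", text) if s]  # split on sentence-ending punctuation

def bioes_tags(sentence, entities):
    """Return one BIOES tag per character; entities is a list of (start, end, type) spans."""
    tags = ["O"] * len(sentence)
    for start, end, etype in entities:
        if end - start == 1:
            tags[start] = f"S-{etype}"                    # single-character entity
        else:
            tags[start] = f"B-{etype}"                    # entity beginning
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"                    # entity middle
            tags[end - 1] = f"E-{etype}"                  # entity end
    return tags
```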
In certain embodiments, the training model includes a BERT layer, a FLAT layer, and a CRF layer;
the BERT layer is used for enhancing semantic representation of sentences to obtain serialized text input;
the FLAT layer is used for constructing two position codes for each character or vocabulary information and converting the character or vocabulary information into the position codes;
and the CRF layer is used for modeling the label sequence based on the dependency relationship existing among the label information so as to obtain the optimal sequence.
Preferably, the training model is a BERT-FLAT-CRF named entity recognition model. The training model can segment and recognize the sentences in the training set and identify the corresponding named entities, such as time, place or number, for sorting; the recognized labels are then split through regular-expression matching, for example into time, place, primary cause, secondary cause and the like, and the splitting results are stored into a database.
The BERT layer (the first layer of the model) is the BERT word embedding layer, which employs a Chinese BERT pre-trained model. The BERT model is built on the basis of the Transformer model and is composed of stacked bidirectional Transformer layers, but it only contains the Encoder part of the Transformer. In the BERT model, the multi-head attention layer is the core processing layer; BERT models of different scales contain different numbers of multi-head attention layers (such as 12 or 24 layers), and the number of attention heads contained in each multi-head attention layer also differs (such as 12 or 16 heads). Since no weights are shared between the layers of the BERT model, one BERT model can effectively contain up to 384 (24x16) different attention mechanism results. The BERT model is only a pre-training-stage model; when a specific NLP task is to be solved, other processing models need to be added after the BERT model, completing it into an Encoder-Decoder configuration. The structure of the BERT model is shown in FIG. 5.
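For reference, a minimal way to obtain such contextual character representations with a Chinese BERT pre-trained model is sketched below; the checkpoint name bert-base-chinese and the use of the Hugging Face transformers library are assumptions, since the patent does not specify them:

```python
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

sentence = "某仓库因电气线路老化发生火灾。"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)
char_embeddings = outputs.last_hidden_state   # (1, seq_len, 768) serialized text representation
```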
In certain embodiments, the FLAT layer comprises a Transformer structure; the FLAT layer is configured to construct two position codes for each character or vocabulary item, and converting the character or vocabulary information into the position codes specifically includes:
head[i] and tail[i] are adopted to respectively represent the head and the tail of a character or vocabulary item, and four relative position distances are then adopted to represent the relative relationship between two spans x_i and x_j, where the four relative position distances are calculated as follows:
d_{ij}^{(hh)} = head[i] - head[j]
d_{ij}^{(ht)} = head[i] - tail[j]
d_{ij}^{(th)} = tail[i] - head[j]
d_{ij}^{(tt)} = tail[i] - tail[j]
wherein d_{ij}^{(hh)} denotes the distance from the head of x_i to the head of x_j; d_{ij}^{(ht)} denotes the distance from the head of x_i to the tail of x_j; d_{ij}^{(th)} denotes the distance from the tail of x_i to the head of x_j; and d_{ij}^{(tt)} denotes the distance from the tail of x_i to the tail of x_j.
A relative position code is calculated based on the above four relative position distances according to the following formula:
R_{ij} = ReLU( W_r ( p_{d_{ij}^{(hh)}} ⊕ p_{d_{ij}^{(ht)}} ⊕ p_{d_{ij}^{(th)}} ⊕ p_{d_{ij}^{(tt)}} ) )
wherein W_r is a parameter that needs to be trained, and ⊕ represents the join (concatenation) operator; W_r may be a weight parameter of a neural network, and the join operator may be AND, NAND (not-AND), OR, etc.
The position embedding p_d is calculated as follows:
p_d^{(2k)} = sin( d / 10000^{2k/d_model} )
p_d^{(2k+1)} = cos( d / 10000^{2k/d_model} )
wherein d is one of the four relative distances above and d_model is the dimension of the position code.
The calculation result can then be input to the self-attention mechanism layer in the following manner:
A*_{i,j} = W_q^T E_{x_i}^T E_{x_j} W_{k,E} + W_q^T E_{x_i}^T R_{ij} W_{k,R} + u^T E_{x_j} W_{k,E} + v^T R_{ij} W_{k,R}
wherein W_q, W_{k,R}, W_{k,E}, u and v are parameters to be trained, and E represents the word embedding lookup table or the output of the last Transformer layer.
Preferably, as shown in fig. 4, the left and lower parts of the FLAT layer form a classical Transformer structure, and only one Transformer layer is used. The four matrices on the right side are the flattening of the lattice structure. In general, a Transformer structure needs to represent position information: the original Transformer uses absolute positions, while later structures often represent positional relationships through relative positions, and there are many ways to express relative position.
Through this FLAT layer design, the problem of mining the vocabulary information of the input text is solved, and the structure can capture long-distance dependencies. In addition, the relative position representation in the FLAT structure solves the parallelization problem, so the time efficiency is greatly improved.
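To make the relative-position computation concrete, the following PyTorch sketch implements the four span distances and the relative position code R_ij as described above; the embedding dimension and module layout are illustrative assumptions:

```python
import torch
import torch.nn as nn

def span_distances(head, tail):
    """head, tail: (n,) index tensors; returns the four (n, n) signed distance matrices."""
    d_hh = head[:, None] - head[None, :]
    d_ht = head[:, None] - tail[None, :]
    d_th = tail[:, None] - head[None, :]
    d_tt = tail[:, None] - tail[None, :]
    return d_hh, d_ht, d_th, d_tt

def sinusoid(d, dim):
    """Sinusoidal position embedding p_d of a (n, n) signed distance matrix."""
    k = torch.arange(0, dim, 2, dtype=torch.float)
    angle = d.unsqueeze(-1).float() / torch.pow(10000.0, k / dim)
    emb = torch.zeros(*d.shape, dim)
    emb[..., 0::2] = torch.sin(angle)
    emb[..., 1::2] = torch.cos(angle)
    return emb

class RelativePosition(nn.Module):
    """R_ij = ReLU(W_r [p_hh ⊕ p_ht ⊕ p_th ⊕ p_tt])."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.w_r = nn.Linear(4 * dim, dim)                # trainable parameter W_r

    def forward(self, head, tail):
        distances = span_distances(head, tail)
        p = torch.cat([sinusoid(d, self.dim) for d in distances], dim=-1)
        return torch.relu(self.w_r(p))                    # (n, n, dim) relative position codes
```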
In some embodiments, the CRF layer is configured to model the tag sequences based on the dependency relationship existing between the tag information, so as to obtain the optimal sequences, including:
the CRF layer is trained by using maximum likelihood estimation, and a transition matrix W is introduced into the CRF layer as a parameter; for a sentence X, the probability of the model labeling the sequence Y = (y1, y2, ..., yn) is:
P(Y|X) = exp( s(X, Y) ) / Σ_{Y'} exp( s(X, Y') )
s(X, Y) = Σ_{i=1}^{n} P_{i, y_i} + Σ_{i=2}^{n} W_{y_{i-1}, y_i}
wherein X represents the input sentence x1, x2, ..., xn; Y is the tag sequence y1, y2, ..., yn; P_{i, y_i} represents the probability value of the i-th character being classified as the tag y_i; W_{y_{i-1}, y_i} represents the state transition value from the tag y_{i-1} to the tag y_i; and exp represents the exponential function with the natural constant e as its base. The optimal sequence is the sequence with the maximum probability value.
Unlike the Softmax function employed in common deep learning models, CRF can learn dependencies between tags, for example: in the industry type tag, B-TIME cannot appear after M-TIME.
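The "sequence with the maximum probability value" is typically recovered with Viterbi decoding over the emission scores P and the transition matrix W; the following is a compact sketch of that decoding step (not the invention's own code):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (n, k) scores P[i, tag]; transitions: (k, k) scores W[prev_tag, tag].
    Returns the tag index sequence with the maximum total score."""
    n, k = emissions.shape
    score = emissions[0].copy()
    backpointers = []
    for i in range(1, n):
        total = score[:, None] + transitions + emissions[i][None, :]   # (prev, cur)
        backpointers.append(total.argmax(axis=0))
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for bp in reversed(backpointers):
        best.append(int(bp[best[-1]]))
    return best[::-1]
```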
In some embodiments, the case text information includes event cause information, and generating the questions related to each piece of case text information according to the plurality of key-value pairs comprises: using a regular expression to acquire the direct cause entries of the event from the case text information, storing the acquired direct cause entries into a first database, and performing matching in a rule-based manner.
And/or the case text information includes event countermeasure information, and generating the questions related to each piece of case text information according to the plurality of key-value pairs comprises: using a regular expression to acquire the event countermeasure entries from the case text information, storing the acquired event countermeasure entries into a second database, and performing matching in a rule-based manner.
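The concrete regular expressions are not given in the patent; the sketch below only illustrates the general idea of pulling cause and countermeasure entries out of a Chinese accident report text with assumed cue patterns:

```python
import re

# Assumed cue patterns; a real system would tune these to the report corpus.
CAUSE_PATTERN = re.compile(r"(?:直接原因|事故原因)[是为：:]*([^。；]+)")
MEASURE_PATTERN = re.compile(r"(?:整改措施|防范措施)[是为：:]*([^。；]+)")

def extract_entries(case_text):
    """Return (direct cause entries, countermeasure entries) found in the case text."""
    return CAUSE_PATTERN.findall(case_text), MEASURE_PATTERN.findall(case_text)

causes, measures = extract_entries("事故原因为电气线路老化。整改措施为定期检查电气线路。")
# causes   -> ['电气线路老化'], stored into the first database
# measures -> ['定期检查电气线路'], stored into the second database
```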
Preferably, matching the event cause information in a rule-based manner includes: when the number of direct cause entries is greater than 0 and less than 3, two random numbers in [0,1], random_index and random_index1, are generated; if random_index is equal to 0, a judgment question with a false statement is generated; if random_index1 is equal to 0, a cause R_O that does not belong to the event is randomly selected from the first database to generate the test question information.
And/or matching the event countermeasure information in a rule-based manner includes: when the number of event countermeasure entries is greater than 0 and less than 3, two random numbers in [0,1], random_index and random_index1, are generated; if random_index is equal to 0, a judgment question with a false statement is generated; if random_index1 is equal to 0, a countermeasure C_O that does not belong to the event is randomly selected from the second database to generate the test question information.
The two random numbers are set so that different types of test question information (such as choice questions or judgment questions) are generated at random, which effectively avoids repeated test question information and helps the person answering the questions to master the knowledge points of the related test questions more comprehensively.
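A rough sketch of this rule-based branching is given below; the databases are represented as plain lists and the wording of the generated items is purely illustrative:

```python
import random

def generate_cause_question(event_causes, first_database):
    """Apply the branching on random_index / random_index1 to the direct cause entries."""
    if not (0 < len(event_causes) < 3):
        return None
    random_index, random_index1 = random.randint(0, 1), random.randint(0, 1)
    cause = random.choice(event_causes)
    others = [c for c in first_database if c not in event_causes]   # candidate causes R_O
    if random_index == 0 and others:
        # judgment (true/false) question whose statement is deliberately wrong
        return {"type": "judgment", "stem": f"本起事故的直接原因是{random.choice(others)}。", "answer": "错误"}
    if random_index1 == 0 and others:
        # choice question using a cause R_O from another case as a distractor
        options = [cause, random.choice(others)]
        random.shuffle(options)
        return {"type": "choice", "stem": "本起事故的直接原因是？", "options": options, "answer": cause}
    return {"type": "answer", "stem": "请写出本起事故的直接原因。", "answer": cause}
```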
As shown in fig. 2, the case knowledge base includes a plurality of different types of test question information, and the types of test question information include choice questions, judgment questions and answer questions;
the method further comprises the following steps:
firstly, entering step S201 to receive an answer request of a user terminal, and recording geographical position information of the user terminal and timestamp information for initiating the answer request;
then step S202 is carried out, according to the geographical position information of the user terminal and the timestamp information of the question answering request, historical question answering data of the user terminal in all preset area ranges in a preset time period are obtained, and the type of the test question information with the highest question answering error rate is used as a push question type; the preset time period is a certain time length before the timestamp information of the answering request; the preset area range is an area within a preset radius range by taking the geographical position information of the user terminal as a center;
and then step S203 is carried out to push answer information corresponding to the pushed question type to the user terminal which initiates the answer request currently.
Taking the preset time period as one month before the timestamp information of the answer request and a preset radius of 1 kilometer as an example: when the system receives an answer request sent by a user terminal, it retrieves the historical answer data of all user terminals within 1 kilometer of that terminal's geographical position over the past month and counts the answer data. If, for example, judgment questions have been answered incorrectly far more often than choice questions in that area, the system determines that the judgment question is the push question type and pushes related judgment question information to the user terminal currently initiating the answer request. Through this design, in addition to knowing its own answering situation, the user terminal can also learn from the pushed test questions how people nearby are answering, which makes it easier to digest and absorb error-prone knowledge points and then pass them on, thereby improving the popularization of safety knowledge.
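A simplified sketch of this push-type selection is shown below; the distance computation and the record layout are assumptions, since the patent only specifies "the question type with the highest error rate within a preset radius and time window":

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pick_push_type(lat, lon, request_ts, history, radius_km=1.0, window_s=30 * 24 * 3600):
    """history: records with lat, lon, ts, qtype, correct; returns the type with the highest error rate."""
    stats = {}                                            # qtype -> [wrong, total]
    for rec in history:
        if not (0 <= request_ts - rec["ts"] <= window_s):
            continue                                      # outside the preset time period
        if haversine_km(lat, lon, rec["lat"], rec["lon"]) > radius_km:
            continue                                      # outside the preset area range
        wrong, total = stats.setdefault(rec["qtype"], [0, 0])
        stats[rec["qtype"]] = [wrong + (0 if rec["correct"] else 1), total + 1]
    return max(stats, key=lambda t: stats[t][0] / stats[t][1]) if stats else None
```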
Certainly, in order to make the pushed test question information more representative and timely, the system may first push test question information generated from cases whose recorded time information is close to the timestamp information of the answer request. Secondly, based on the geographic position information of the current user terminal, the system may obtain from a history database the event types that have recently occurred within the preset area range corresponding to that geographic position. For example, if a fire has occurred twice in the past year, the system will mainly call the case text information related to fire to generate the related test question information and push it to the terminal; if many traffic accidents have occurred near the location of the user terminal in the past year, the system will call the case text information related to traffic safety to generate the related test question information and push it to the terminal. In this way, the pushing of test question information is more targeted and representative.
As shown in fig. 3, in some application scenarios, the method provided by the present invention may further include the following steps:
step S1 may be entered to first obtain text data corresponding to the incident case from a plurality of different channels. Specifically, the text materials of the safety accident related cases can be obtained from different channels such as fire safety policy and regulation, actual case libraries of enterprises, the Internet and the like.
And then step S2 is carried out to preprocess the text data and establish a Named Entity Recognition (NER), a text abstract and an information extraction model. In this step, data preprocessing can be performed on the acquired accident case text material, labeling is performed, a training set is constructed, and a knowledge base is established.
Then, step S3 is entered to generate related test questions based on information such as time, place and number, and the list in the NER results is traversed and judged so as to generate the answers corresponding to the test questions.
The method then enters step S4, in which regular expression matching, NER and other technologies are applied to obtain the cause entries from the accident case text, and step S5, in which regular expression matching, NER, abstract generation, information extraction and other technologies are applied to obtain the accident-related solutions.
By repeating steps S3-S5, the questions that can be extracted from the input safety accident case text are produced one by one, and the generated accident case text test questions are summarized to form a knowledge base, as sketched below.
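Putting steps S1 to S5 together, a high-level orchestration could look like the following sketch, which reuses the illustrative helpers from the earlier sketches (clean_and_split, build_choice_question, extract_entries) and an assumed ner_model callable returning entity spans:

```python
def build_case_knowledge_base(case_texts, ner_model):
    """S1-S5: from raw accident case texts to a knowledge base of generated test questions."""
    knowledge_base = []
    for text in case_texts:                               # S1: case texts collected from various channels
        questions = []
        for sentence in clean_and_split(text):            # S2: preprocessing
            for entity in ner_model(sentence):            # S2/S3: NER, entity = (text, start, end)
                questions.append(build_choice_question(sentence, entity, distractors=[]))
        causes, measures = extract_entries(text)          # S4/S5: cause and countermeasure entries
        knowledge_base.append({"text": text, "questions": questions,
                               "causes": causes, "measures": measures})
    return knowledge_base
```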
The invention provides a question generation method based on event case texts, which fully considers the characteristics of event safety knowledge and overcomes the drawbacks of current manual question setting, namely that the whole text must be read, that composing questions according to the text content is time-consuming and labor-intensive, and that the questions generated from the same event case text are the same every time. In order to achieve semantic understanding and automatic intelligence, the invention uses a word segmentation algorithm to segment the accident case text information input by a user in natural language form, and at the same time preprocesses and labels the text to form a standardized training sample library. Constructing a question generation knowledge base based on safety accident case texts through knowledge acquisition, representation and reasoning is the key to improving the safety awareness of workers and reducing safety accidents.
For any input accident case text, the invention can automatically generate test questions related to the text content according to the text content. The test questions can be used to test the degree to which a person who has read the text has mastered its content, thereby achieving the purpose of providing safety education for workers, improving their safety awareness and reducing the occurrence of safety accidents. After an accident case text is input into the system, the system can complete the generation of a set of questions related to the text content in a short time, which greatly improves the test question generation efficiency.
In a second aspect, the present invention also provides a storage medium storing a computer program which, when executed, performs the method steps of the first aspect of the present invention.
As shown in fig. 6, in a third aspect, the present invention further provides an electronic device 10, comprising a processor 101 and a storage medium 102, wherein the storage medium 102 is the storage medium according to the second aspect; the processor 101 is adapted to execute a computer program stored in the storage medium 102 to implement the method steps as in the first aspect.
In this embodiment, the electronic device is a computer device, including but not limited to: personal computer, server, general-purpose computer, special-purpose computer, network equipment, embedded equipment, programmable equipment, intelligent mobile terminal, intelligent home equipment, wearable intelligent equipment, vehicle-mounted intelligent equipment, etc. Storage media include, but are not limited to: RAM, ROM, magnetic disk, magnetic tape, optical disk, flash memory, USB flash drive, removable hard disk, memory card, memory stick, network server storage, network cloud storage, etc. Processors include, but are not limited to, a CPU (central processing unit), a GPU (graphics processing unit), an MCU (microcontroller unit), and the like.
As will be appreciated by one skilled in the art, the above-described embodiments may be provided as a method, apparatus, or computer program product. These embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. All or part of the steps of the methods related to the above embodiments may be implemented by relevant hardware instructed by a program, and the program may be stored in a storage medium readable by a computer device and used for executing all or part of the steps of the methods related to the above embodiments.
The various embodiments described above are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a computer apparatus to produce a machine, such that the instructions, which execute via the processor of the computer apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer apparatus to cause a series of operational steps to be performed on the computer apparatus to produce a computer implemented process such that the instructions which execute on the computer apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the embodiments have been described above, those skilled in the art, once they have grasped the basic inventive concept, can make other variations and modifications to these embodiments. Therefore, these embodiments are only examples of the present invention and are not intended to limit its scope; all equivalent structures or equivalent processes made using the contents of the specification and drawings, or applied directly or indirectly in other related technical fields, are likewise encompassed by the present invention.

Claims (10)

1. A method for automatically generating test questions based on case texts is characterized by comprising the following steps:
acquiring case text information;
carrying out data preprocessing on the case text information to construct a training set;
inputting the training set into a training model for training to obtain key value pairs of test question questions and test question answers;
and generating the questions related to each piece of case text information according to the plurality of key-value pairs to obtain a case knowledge base.
2. The method for automatically generating test questions for case text as recited in claim 1, wherein the data preprocessing of the case text information comprises:
removing symbols except the text content in the case text information, dividing the extracted text content into sentences according to the sentence unit, and marking each sentence in a BIOES coding mode; where "B" represents the beginning of an entity, "I" represents the middle of an entity, "O" represents a non-entity, used to mark an unrelated character, "E" represents the end of an entity, and "S" represents a single word or word mark.
3. The method for automatically generating test questions for case text as set forth in claim 2, wherein the training model comprises a BERT layer, a FLAT layer and a CRF layer;
the BERT layer is used for enhancing semantic representation of sentences to obtain serialized text input;
the FLAT layer is used for constructing two position codes for each character or vocabulary information and converting the character or vocabulary information into the position codes;
and the CRF layer is used for modeling the label sequence based on the dependency relationship among the label information so as to obtain the optimal sequence.
4. The method for automatically generating test questions for case text as recited in claim 3, wherein the FLAT layer comprises a Transformer structure; the FLAT layer is configured to construct two position codes for each character or vocabulary item, and converting the character or vocabulary information into the position codes specifically comprises:
head[i] and tail[i] are adopted to respectively represent the head and the tail of a character or vocabulary item, and four relative position distances are then adopted to represent the relative relationship between two spans x_i and x_j, where the four relative position distances are calculated as follows:
d_{ij}^{(hh)} = head[i] - head[j]
d_{ij}^{(ht)} = head[i] - tail[j]
d_{ij}^{(th)} = tail[i] - head[j]
d_{ij}^{(tt)} = tail[i] - tail[j]
wherein d_{ij}^{(hh)} denotes the distance from the head of x_i to the head of x_j; d_{ij}^{(ht)} denotes the distance from the head of x_i to the tail of x_j; d_{ij}^{(th)} denotes the distance from the tail of x_i to the head of x_j; and d_{ij}^{(tt)} denotes the distance from the tail of x_i to the tail of x_j;
a relative position code is calculated based on the above four relative position distances according to the following formula:
R_{ij} = ReLU( W_r ( p_{d_{ij}^{(hh)}} ⊕ p_{d_{ij}^{(ht)}} ⊕ p_{d_{ij}^{(th)}} ⊕ p_{d_{ij}^{(tt)}} ) )
wherein W_r is a parameter that needs to be trained, and ⊕ represents the join (concatenation) operator; the position embedding p_d is calculated as follows:
p_d^{(2k)} = sin( d / 10000^{2k/d_model} )
p_d^{(2k+1)} = cos( d / 10000^{2k/d_model} )
wherein d is one of the four relative distances above and d_model is the dimension of the position code;
the calculation result can then be input to the self-attention mechanism layer in the following manner:
A*_{i,j} = W_q^T E_{x_i}^T E_{x_j} W_{k,E} + W_q^T E_{x_i}^T R_{ij} W_{k,R} + u^T E_{x_j} W_{k,E} + v^T R_{ij} W_{k,R}
wherein W_q, W_{k,R}, W_{k,E}, u and v are parameters to be trained, and E represents the word embedding lookup table or the output of the last Transformer layer.
5. The method for automatically generating test questions for case text according to claim 3, wherein the CRF layer being configured to model the tag sequence based on the dependency relationship existing between the tag information so as to obtain the optimal sequence comprises:
the CRF layer is trained by using maximum likelihood estimation, and a transition matrix W is introduced into the CRF layer as a parameter; for a sentence X, the probability of the model labeling the sequence Y = (y1, y2, ..., yn) is:
P(Y|X) = exp( s(X, Y) ) / Σ_{Y'} exp( s(X, Y') )
s(X, Y) = Σ_{i=1}^{n} P_{i, y_i} + Σ_{i=2}^{n} W_{y_{i-1}, y_i}
wherein X represents the input sentence x1, x2, ..., xn; Y is the tag sequence y1, y2, ..., yn; P_{i, y_i} represents the probability value of the i-th character being classified as the tag y_i; W_{y_{i-1}, y_i} represents the state transition value from the tag y_{i-1} to the tag y_i; and exp represents the exponential function with the natural constant e as its base;
the optimal sequence is the sequence with the maximum probability value.
6. The method for automatically generating test questions for case text according to claim 1, wherein the case text information includes event cause information; generating the questions related to each piece of case text information according to the plurality of key-value pairs comprises the following steps:
using a regular expression to acquire the direct cause entries of the event from the case text information, storing the acquired direct cause entries into a first database, and performing matching in a rule-based manner;
and/or the case text information includes event countermeasure information; generating the questions related to each piece of case text information according to the plurality of key-value pairs comprises:
using a regular expression to acquire the event countermeasure entries from the case text information, storing the acquired event countermeasure entries into a second database, and performing matching in a rule-based manner.
7. The method for automatically generating test questions based on case text according to claim 1, wherein
matching the event cause information in a rule-based manner comprises:
when the number of direct cause entries is greater than 0 and less than 3, two random numbers in [0,1], random_index and random_index1, are generated; if random_index is equal to 0, a judgment question with a false statement is generated; if random_index1 is equal to 0, a cause R_O that does not belong to the event is randomly selected from the first database to generate the test question information;
and/or matching the event countermeasure information in a rule-based manner comprises:
when the number of event countermeasure entries is greater than 0 and less than 3, two random numbers in [0,1], random_index and random_index1, are generated; if random_index is equal to 0, a judgment question with a false statement is generated; if random_index1 is equal to 0, a countermeasure C_O that does not belong to the event is randomly selected from the second database to generate the test question information.
8. The method for automatically generating test questions based on case text according to claim 1, wherein the case knowledge base comprises a plurality of different types of test question information, the types of test question information comprising choice questions, judgment questions and answer questions;
the method further comprises the following steps:
receiving an answer request of a user terminal, and recording geographical position information of the user terminal and timestamp information for initiating the answer request;
acquiring historical answer data of the user terminal in all preset area ranges within a preset time period according to the geographical position information of the user terminal and the timestamp information for initiating the answer request, and taking the type of the test question information with the highest answer error rate as a push question type; the preset time period is a certain time length before the timestamp information of the answering request; the preset area range is an area within a preset radius range by taking the geographical position information of the user terminal as a center;
and pushing answer information corresponding to the pushed question type to a user terminal which initiates an answer request at present.
9. A storage medium, characterized in that the storage medium stores a computer program which, when executed, implements the method of any one of claims 1 to 8.
10. An electronic device comprising a processor and a storage medium according to claim 9;
the processor is configured to execute a computer program stored in the storage medium to implement the method of any one of claims 1 to 8.
CN202210561195.1A 2022-05-23 2022-05-23 Method for automatically generating test questions based on case text, storage medium and electronic equipment Pending CN114969337A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210561195.1A CN114969337A (en) 2022-05-23 2022-05-23 Method for automatically generating test questions based on case text, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210561195.1A CN114969337A (en) 2022-05-23 2022-05-23 Method for automatically generating test questions based on case text, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114969337A true CN114969337A (en) 2022-08-30

Family

ID=82984549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210561195.1A Pending CN114969337A (en) 2022-05-23 2022-05-23 Method for automatically generating test questions based on case text, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114969337A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination