CN110222345A - Cloze Test answer method, apparatus, electronic equipment and storage medium - Google Patents
Cloze test answering method and apparatus, electronic device, and storage medium
- Publication number: CN110222345A
- Application number: CN201910528256.2A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/126 — Handling natural language data; text processing; use of codes for handling textual entities; character encoding
- G06F40/205 — Handling natural language data; natural language analysis; parsing
- G06F40/284 — Handling natural language data; natural language analysis; recognition of textual entities; lexical analysis, e.g. tokenisation or collocates
Abstract
The present invention relates to a cloze test answering method and apparatus, an electronic device, and a storage medium. The method includes: constructing word embeddings for the words in the passage data; encoding the word embeddings of the passage data with a bidirectional long short-term memory (BiLSTM) network to generate a contextual vector for each word; constructing word embeddings for the words in the question data and encoding them to generate a question vector; taking the dot product of the contextual vector of each word in the passage data with the question vector and normalizing to obtain a probability value for each word in the passage data, then merging the probabilities of identical words to obtain a probability for every distinct word in the passage data; judging whether the highest-probability word in the passage data is identical to a word in the answer option data; and determining the final answer according to the judgment result and the answer option data.
Description
Technical field
The present invention relates to the field of computer processing, and in particular to a cloze test answering method and apparatus, an electronic device, and a storage medium.
Background technique
The cloze test is one of the most common question types in middle-school English examinations across China, and also one of the most difficult, with low average scores. In general, answering a cloze test is a typical problem of mining the relationship between a document and a query, and the distractor options of a cloze question are chosen mainly for their relation to the meaning of the passage. Existing automatic cloze-answering systems are simple systems built on n-gram models: they judge the answer mainly by the frequency with which phrases occur in the document, so they can only answer with word combinations that have already appeared in the document. They lack inference ability, and for a regular collocation that does not occur in the text they cannot give an answer.
Summary of the invention
In view of the foregoing, it is necessary to propose a cloze test answering method and apparatus, an electronic device, and a computer-readable storage medium that solve the problem that no answer can be given automatically when the combination of the cloze question and its answer does not occur as a regular collocation in the text.
A first aspect of the application provides a cloze test answering method, the method including:
obtaining question-answering data, where the question-answering data includes passage data, question data, and answer option data;
constructing a word embedding for each word in the passage data;
encoding the word embeddings of the passage data with a bidirectional long short-term memory (BiLSTM) network to generate a contextual vector for each word in the passage data;
constructing a word embedding for each word in the question data, encoding the word embeddings of the question data with a BiLSTM network, and outputting the encoded vectors through the output layer of the BiLSTM network to generate a question vector;
taking the dot product of the contextual vector of each word in the passage data with the question vector and normalizing to obtain a probability value for each word in the passage data, and merging the probabilities of identical words in the passage data to obtain a probability for every distinct word in the passage data;
judging whether the highest-probability word in the passage data is identical to a word in the answer option data; and
determining the final answer according to the judgment result and the answer option data.
Preferably, determining the final answer according to the judgment result and the answer option data includes:
when the highest-probability word in the passage data is identical to a word in the answer option data, confirming the word in the answer option data that is identical to the highest-probability word as the final answer; and
when the highest-probability word in the passage data is not identical to any word in the answer option data, confirming as the final answer the word in the answer option data whose vector has the maximum cosine distance value to the contextual vector of the highest-probability word in the passage data.
Preferably, constructing the word embeddings of the words in the passage data includes:
applying one-hot encoding to each word in the question-answering data to generate a coding vector for each word;
constructing a word embedding matrix over all words in the question-answering data; and
multiplying the coding vector of each word in the passage data by the word embedding matrix to obtain, after dimensionality reduction, the word embedding of each word in the passage data.
Preferably, applying one-hot encoding to each word in the question-answering data to generate its coding vector includes:
determining the dimensionality of the coding vectors from the total number of words in the question-answering data, determining the target dimension of a word's coding vector from the word's sorted position in the question-answering data, setting the coding vector to 1 at the target dimension, and setting every dimension other than the target dimension to 0.
Preferably, encoding the word embeddings of the passage data with a BiLSTM network to generate the contextual vectors of the words in the passage data includes:
establishing a first BiLSTM network over the word embeddings of the words in the passage data, training the first BiLSTM network, and encoding the word embeddings of the passage data with the trained first BiLSTM network according to a formula whose inputs are E(x), the word embedding of a word in the passage data, and s, the position of the current hidden layer in the first BiLSTM network.
Preferably, encoding the word embeddings of the question data with a BiLSTM network and outputting the encoded vectors through the output layer of the BiLSTM network to generate the question vector includes:
establishing a second BiLSTM network over the word embeddings of the words in the question data, training the second BiLSTM network, and encoding the word embeddings of the question data with the trained second BiLSTM network according to a formula whose inputs are G(x), the word embedding of a word in the question data, and t, the position of the current hidden layer in the second BiLSTM network.
Preferably, taking the dot product of the contextual vector of each word in the passage data with the question vector and normalizing to obtain the probability value of each word in the passage data includes:
taking the dot product of the two-way spliced vector of each word produced by the first BiLSTM network over the passage data with the head-and-tail spliced vector produced by the second BiLSTM network over the question data, and normalizing to obtain the probability value of each word in the passage data.
A second aspect of the application provides a cloze test answering apparatus, the apparatus including:
a data acquisition module for obtaining question-answering data, where the question-answering data includes passage data, question data, and answer option data;
a word embedding construction module for constructing the word embeddings of the words in the passage data;
a contextual vector generation module for encoding the word embeddings of the words in the passage data with a bidirectional long short-term memory (BiLSTM) network to generate the contextual vectors of the words in the passage data;
a question vector generation module for constructing the word embeddings of the words in the question data, encoding them with a BiLSTM network, and outputting the encoded vectors through the output layer of the BiLSTM network to generate a question vector;
a computing module for taking the dot product of the contextual vector of each word in the passage data with the question vector and normalizing to obtain the probability value of each word in the passage data, and for merging the probabilities of identical words in the passage data to obtain a probability for every distinct word in the passage data;
a judgment module for judging whether the highest-probability word in the passage data is identical to a word in the answer option data; and
an answer determination module for confirming, when the highest-probability word in the passage data is identical to a word in the answer option data, the word in the answer option data that is identical to the highest-probability word as the final answer.
A third aspect of the application provides an electronic device including a processor, the processor being configured to implement the cloze test answering method when executing a computer program stored in a memory.
A fourth aspect of the application provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the cloze test answering method when executed by a processor.
In the present invention, the dot product of the contextual vector of each word in the passage data with the question vector is taken and normalized to obtain the corresponding probability value of each word in the passage data, and the probability values of identical words in the passage data are merged to obtain a probability for every distinct word in the passage data. When the highest-probability word is identical to a word in the answer option data, the identical word in the answer option data is confirmed as the final answer; when the highest-probability word is not identical to any word in the answer option data, the cosine distance value between the vector of each word in the answer option data and the contextual vector of the highest-probability word in the passage data is computed, and the word corresponding to the maximum cosine distance value is confirmed as the final answer. This solves the problem that no answer can be given automatically when the combination of the cloze question and its answer does not occur as a regular collocation in the text.
Description of the drawings
Fig. 1 is a flowchart of the cloze test answering method in an embodiment of the present invention.
Fig. 2 is a structural diagram of the cloze test answering apparatus in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the electronic device in an embodiment of the present invention.
Specific embodiments
To make the objects, features, and advantages of the present invention easier to understand, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, in the absence of conflict, the embodiments of the application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth to facilitate a full understanding of the present invention. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as normally understood by those skilled in the technical field of the invention. The terms used in the specification of the present invention are intended only to describe specific embodiments and are not intended to limit the invention.
Preferably, the cloze test answering method of the present invention is applied in one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable gate array (Field-Programmable Gate Array, FPGA), a digital signal processor (Digital Signal Processor, DSP), an embedded device, and the like.
The electronic device may be a computing device such as a desktop computer, a laptop, a tablet computer, or a cloud server. The device may interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like.
Embodiment 1
Fig. 1 is a flowchart of the cloze test answering method in an embodiment of the present invention. According to different requirements, the order of the steps in the flowchart may be changed, and certain steps may be omitted.
As shown in Fig. 1, the cloze test answering method specifically includes the following steps.
Step S11: obtain question-answering data, where the question-answering data includes passage data, question data, and answer option data.
In this embodiment, the question-answering data may be obtained from the database of a server, for example from an electronic document containing the answering information stored in the database of the server. In another embodiment, the question-answering data is obtained by a scanning device from a paper document containing the answering information. Specifically, the scanning device uses optical technology to obtain a reflected light signal from the paper document, where the reflected light signal carries the information content of the paper document; the scanning device then analyzes the information content carried in the reflected light signal through optical character recognition (Optical Character Recognition, OCR) technology to obtain the question-answering data. In this embodiment, the question-answering data may be English-language data.
Step S12: construct the word embeddings of the words in the passage data.
In this embodiment, constructing the word embeddings of the words in the passage data includes:
a) applying one-hot encoding to each word in the question-answering data to generate a coding vector for each word;
b) constructing a word embedding matrix over all words in the question-answering data; and
c) multiplying the coding vector of each word in the passage data by the word embedding matrix to obtain, after dimensionality reduction, the word embedding of each word in the passage data.
In a specific embodiment, applying one-hot encoding to each word in the question-answering data to generate its coding vector includes: determining the dimensionality of the coding vectors from the total number of words in the question-answering data, determining the target dimension of a word's coding vector from the word's sorted position in the question-answering data, setting the coding vector to 1 at the target dimension, and setting every dimension other than the target dimension to 0, thereby constructing the coding vector of the word. For example, if the question-answering data contains 1000 words, the coding vector generated by one-hot encoding for each word has 1000 dimensions, and if a word's sorted position in the question-answering data is the i-th position, the word's coding vector is set to 1 at the i-th dimension and to 0 at every dimension other than the target dimension. In this embodiment, the word embedding matrix of the question-answering data is constructed according to the features of the words in the question-answering data; for example, if each word in the question-answering data has 300 features, the dimension of the word embedding matrix is configured as 1000 × 300. After the coding vectors of the words in the passage data have been generated and the word embedding matrix over all words in the question-answering data has been constructed, the word embedding of each word in the passage data is computed according to the formula e(x) = x · w_e, where x is the coding vector of a word in the passage data and w_e is the word embedding matrix over all words in the question-answering data.
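The one-hot-plus-embedding construction of step S12 can be sketched as follows. The vocabulary, the 4-dimensional embedding size, and the random matrix are illustrative stand-ins only (the example above uses 1000 words and 300 features, with a learned embedding matrix):

```python
import numpy as np

# Illustrative stand-ins: the example in the text uses a 1000-word vocabulary
# and a 300-dimensional embedding; the matrix here is random, not learned.
vocab = sorted(["the", "cat", "sat", "on", "mat"])  # sorted position -> target dimension
vocab_size, embed_dim = len(vocab), 4

rng = np.random.default_rng(0)
w_e = rng.standard_normal((vocab_size, embed_dim))  # word embedding matrix

def one_hot(word):
    # Coding vector: 1 at the word's sorted position, 0 at every other dimension.
    x = np.zeros(vocab_size)
    x[vocab.index(word)] = 1.0
    return x

def word_embedding(word):
    # e(x) = x . w_e : the product reduces the vocab_size-dimensional
    # coding vector to an embed_dim-dimensional word embedding.
    return one_hot(word) @ w_e
```

Multiplying a one-hot row vector by `w_e` simply selects the matrix row for that word, which is why the product performs the dimensionality reduction described above.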
Step S13: encode the word embeddings of the words in the passage data with a bidirectional long short-term memory (Long Short-Term Memory, LSTM) network to generate the contextual vectors of the words in the passage data.
In this embodiment, a first BiLSTM network is established over the word embeddings of the words in the passage data and trained, and the word embeddings of the passage data are encoded with the trained first BiLSTM network according to a formula whose inputs are E(x), the word embedding of a word in the passage data, and s, the position of the current hidden layer in the first BiLSTM network.
Step S14: construct the word embeddings of the words in the question data, encode them with a BiLSTM network, and output the encoded vectors through the output layer of the BiLSTM network to generate a question vector.
In this embodiment, constructing the word embeddings of the words in the question data includes: multiplying the coding vector of each word in the question data by the word embedding matrix to obtain the word embedding of each word in the question data.
In this embodiment, a second BiLSTM network is established over the word embeddings of the words in the question data and trained, and the word embeddings of the question data are encoded with the trained second BiLSTM network according to a formula whose inputs are G(x), the word embedding of a word in the question data, and t, the position of the current hidden layer in the second BiLSTM network.
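The bidirectional encoding of steps S13 and S14 can be illustrated with a toy bidirectional recurrence. A plain tanh RNN cell stands in for the LSTM cell, one set of random weights is shared across both directions purely for brevity, and all sizes are arbitrary stand-ins for trained parameters. The per-word "two-way splicing" and the question's "head-and-tail splicing" follow the description in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h = 4, 3  # embedding size and hidden size (illustrative)
W = rng.standard_normal((d_h, d_in))
U = rng.standard_normal((d_h, d_h))

def run(seq):
    # One direction of the recurrence; a tanh RNN cell stands in for an LSTM cell.
    h, states = np.zeros(d_h), []
    for e in seq:
        h = np.tanh(W @ e + U @ h)
        states.append(h)
    return states

def encode_words(embeddings):
    # Per-word contextual vectors: forward and backward states spliced
    # together (the "two-way splicing" of each word).
    fwd = run(embeddings)
    bwd = run(embeddings[::-1])[::-1]  # align backward states with word positions
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

def encode_question(embeddings):
    # "Head-and-tail splicing": final forward state with final backward state.
    fwd = run(embeddings)
    bwd = run(embeddings[::-1])[::-1]
    return np.concatenate([fwd[-1], bwd[0]])

doc_embeds = [rng.standard_normal(d_in) for _ in range(5)]
doc_vecs = encode_words(doc_embeds)   # one 2*d_h vector per word
q_vec = encode_question(doc_embeds)   # a single 2*d_h question vector
```

Each word thus gets a vector of twice the hidden size, while the whole question collapses to a single vector of the same dimensionality, so the dot product in the next step is well defined.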
Step S15: take the dot product of the contextual vector of each word in the passage data with the question vector and normalize to obtain the probability value of each word in the passage data, and merge the probabilities of identical words in the passage data to obtain a probability for every distinct word in the passage data.
In this embodiment, the dot product of the two-way spliced vector of each word produced by the bidirectional LSTM over the passage data with the head-and-tail spliced vector produced by the LSTM over the question data is taken and normalized to obtain the probability value of each word in the passage data.
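The dot-product, normalization, and probability-merging of step S15 can be sketched as follows. Softmax is assumed as the normalization, which the text does not name explicitly, and the vectors and words are toy values:

```python
import numpy as np

def softmax(z):
    # Numerically stable normalization of dot-product scores into probabilities.
    e = np.exp(z - z.max())
    return e / e.sum()

def word_probabilities(doc_vecs, q_vec, doc_words):
    # Dot each word's contextual vector with the question vector, normalize,
    # then merge the probability mass of words that repeat in the passage.
    scores = np.array([v @ q_vec for v in doc_vecs])
    probs = softmax(scores)
    merged = {}
    for word, p in zip(doc_words, probs):
        merged[word] = merged.get(word, 0.0) + p
    return merged

doc_words = ["the", "cat", "sat", "the", "mat"]  # "the" repeats in the passage
doc_vecs = [np.eye(3)[i % 3] for i in range(5)]  # toy contextual vectors
q_vec = np.array([1.0, 0.0, 0.0])                # toy question vector
probs = word_probabilities(doc_vecs, q_vec, doc_words)
best_word = max(probs, key=probs.get)            # highest-probability word
```

Merging keeps one probability per distinct word, so a word that appears several times in the passage accumulates the mass of all its occurrences.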
Step S16: judge whether the highest-probability word in the passage data is identical to a word in the answer option data.
Step S17: determine the final answer according to the judgment result and the answer option data.
In this embodiment, when the highest-probability word in the passage data is identical to a word in the answer option data, the word in the answer option data that is identical to the highest-probability word is confirmed as the final answer.
In this embodiment, when the highest-probability word in the passage data is not identical to any word in the answer option data, a vector is constructed for each word in the answer option data, the cosine distance value between the vector of each word in the answer option data and the contextual vector of the highest-probability word in the passage data is computed separately, and the word in the answer option data corresponding to the maximum cosine distance value is confirmed as the final answer.
In this embodiment, constructing the vector of a word in the answer option data includes: judging whether the word in the answer option data is contained in the passage data; when the word is contained in the passage data, multiplying the word's coding vector by the word embedding matrix to obtain its vector; and when the word is not contained in the passage data, setting its vector to the zero vector.
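The cosine-based fallback of step S17 can be sketched as follows, reading the text's "maximum cosine distance value" as maximum cosine similarity and using the zero-vector convention above for answer words absent from the passage. All words and vectors are illustrative:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; defined as 0.0 when either
    # vector is zero (the zero-vector convention for absent answer words).
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(a @ b) / (na * nb)

def pick_answer(candidate_vecs, best_doc_vec):
    # Choose the answer option whose vector is closest, by cosine, to the
    # contextual vector of the highest-probability word in the passage.
    return max(candidate_vecs,
               key=lambda w: cosine_similarity(candidate_vecs[w], best_doc_vec))

candidates = {                           # illustrative option vectors
    "run":  np.array([1.0, 0.2, 0.0]),
    "walk": np.array([0.0, 1.0, 0.0]),
    "jump": np.zeros(3),                 # option word absent from the passage
}
best_doc_vec = np.array([0.9, 0.1, 0.0])  # vector of the top passage word
answer = pick_answer(candidates, best_doc_vec)
```

An option absent from the passage gets a zero vector and therefore a similarity of 0, so it can never beat an option that actually resembles the highest-probability passage word.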
In the present invention, the dot product of the contextual vector of each word in the passage data with the question vector is taken and normalized to obtain the corresponding probability value of each word in the passage data, and the probability values of identical words in the passage data are merged to obtain a probability for every distinct word in the passage data. When the highest-probability word is identical to a word in the answer option data, the identical word in the answer option data is confirmed as the final answer; when the highest-probability word is not identical to any word in the answer option data, the cosine distance value between the vector of each word in the answer option data and the contextual vector of the highest-probability word in the passage data is computed, and the word corresponding to the maximum cosine distance value is confirmed as the final answer. This solves the problem that no answer can be given automatically when the combination of the cloze question and its answer does not occur as a regular collocation in the text.
Embodiment 2
Fig. 2 is a structural diagram of the cloze test answering apparatus 40 in an embodiment of the present invention.
In some embodiments, the cloze test answering apparatus 40 runs in an electronic device. The cloze test answering apparatus 40 may include a plurality of functional modules composed of program code segments. The program code of each segment in the cloze test answering apparatus 40 may be stored in a memory and executed by at least one processor to perform the cloze test answering function.
In this embodiment, the cloze test answering apparatus 40 may be divided into a plurality of functional modules according to the functions it performs. As shown in Fig. 2, the cloze test answering apparatus 40 may include a data acquisition module 401, a word embedding construction module 402, a contextual vector generation module 403, a question vector generation module 404, a computing module 405, a judgment module 406, and an answer determination module 407. A module in the present invention refers to a series of computer program segments that are executed by at least one processor, can complete a fixed function, and are stored in a memory. In some embodiments, the function of each module will be described in detail in subsequent embodiments.
The data acquisition module 401 obtains question-answering data, where the question-answering data includes passage data, question data, and answer option data.
In this embodiment, the data acquisition module 401 obtains the question-answering data from the database of a server, for example from an electronic document containing the answering information stored in the database of the server. In another embodiment, the data acquisition module 401 obtains the question-answering data through a scanning device from a paper document containing the answering information. Specifically, the scanning device uses optical technology to obtain a reflected light signal from the paper document, where the reflected light signal carries the information content of the paper document; the scanning device then analyzes the information content carried in the reflected light signal through optical character recognition technology to obtain the question-answering data. In this embodiment, the question-answering data may be English-language data.
The word embedding construction module 402 constructs the word embeddings of the words in the passage data.
In this embodiment, the construction by the word embedding construction module 402 of the word embeddings of the words in the passage data includes:
a) applying one-hot encoding to each word in the question-answering data to generate a coding vector for each word;
b) constructing a word embedding matrix over all words in the question-answering data; and
c) multiplying the coding vector of each word in the passage data by the word embedding matrix to obtain, after dimensionality reduction, the word embedding of each word in the passage data.
In a specific embodiment, the application by the word embedding construction module 402 of one-hot encoding to each word in the question-answering data to generate its coding vector includes: determining the dimensionality of the coding vectors from the total number of words in the question-answering data, determining the target dimension of a word's coding vector from the word's sorted position in the question-answering data, setting the coding vector to 1 at the target dimension, and setting every dimension other than the target dimension to 0, thereby constructing the coding vector of the word. For example, if the question-answering data contains 1000 words, the coding vector generated by one-hot encoding for each word has 1000 dimensions, and if a word's sorted position in the question-answering data is the i-th position, the word's coding vector is set to 1 at the i-th dimension and to 0 at every dimension other than the target dimension. In this embodiment, the word embedding construction module 402 also constructs the word embedding matrix of the question-answering data according to the features of the words in the question-answering data; for example, if each word in the question-answering data has 300 features, the dimension of the word embedding matrix is configured as 1000 × 300. After the coding vectors of the words in the passage data have been generated and the word embedding matrix over all words in the question-answering data has been constructed, the word embedding of each word in the passage data is computed according to the formula e(x) = x · w_e, where x is the coding vector of a word in the passage data and w_e is the word embedding matrix over all words in the question-answering data.
The contextual vector generation module 403 encodes the word embeddings of the words in the passage data with a BiLSTM network to generate the contextual vectors of the words in the passage data.
In this embodiment, the contextual vector generation module 403 establishes a first BiLSTM network over the word embeddings of the words in the passage data, trains the first BiLSTM network, and encodes the word embeddings of the passage data with the trained first BiLSTM network according to a formula whose inputs are E(x), the word embedding of a word in the passage data, and s, the position of the current hidden layer in the first BiLSTM network.
The question-vector generation module 404 constructs the term vectors of the words in the question data, applies bidirectional long short-term memory network encoding to the term vectors of the words in the question data, and generates the question vector after passing the encoded term vectors through the output layer of the bidirectional long short-term memory network.
In this embodiment, constructing the term vectors of the words in the question data by the question-vector generation module 404 includes: performing a product calculation between the coding vectors of the words in the question data and the word embedding matrix to obtain the term vectors of the words in the question data.
In this embodiment, the question-vector generation module 404 establishes a second bidirectional long short-term memory network over the term vectors of the words in the question data and trains the second bidirectional long short-term memory network, then encodes the term vectors of the words in the question data with the trained second bidirectional long short-term memory network according to the formula h_t = BiLSTM(g(x), t), where g(x) is the term vector of a word in the question data and t is the current position of the hidden layer in the second bidirectional long short-term memory network.
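The fixed-size question vector produced at the output layer can be illustrated as the head-and-tail splicing of the second network's two directions: the forward pass's final state summarizes the question read left-to-right, the backward pass's final state summarizes it read right-to-left. The hidden states below are random stand-ins for what a trained second BiLSTM would emit, so only the splicing itself is being shown:

```python
import numpy as np

rng = np.random.default_rng(2)
H, T = 3, 4                                   # hidden size, question length (illustrative)

# Stand-ins for the per-position hidden states of the second BiLSTM, indexed by
# word position in the question.
fwd_states = [rng.normal(size=H) for _ in range(T)]
bwd_states = [rng.normal(size=H) for _ in range(T)]

# Head-and-tail splicing: the forward final state sits at the tail (last position),
# the backward final state sits at the head (position 0); concatenating the two
# yields one fixed-size question vector q regardless of question length T.
q = np.concatenate([fwd_states[-1], bwd_states[0]])
assert q.shape == (2 * H,)
```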
The computing module 405 takes the dot product of the word vector of each word in the topic document data with the question vector of the question data and normalizes the result to obtain the probability value of each word in the topic document data, then merges the probabilities of identical words in the topic document data to obtain the probabilities of all words in the topic document data.
In this embodiment, the computing module 405 obtains the probability value of each word in the topic document data by taking the dot product of the bidirectional spliced vector of each word produced by the bidirectional long short-term memory network over the topic document data with the head-and-tail spliced vector produced by the long short-term memory network over the question data, and normalizing the result.
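The dot-product-and-normalize step, followed by merging the probabilities of repeated words, amounts to the following sketch. The vectors are random stand-ins for what the trained networks would produce; softmax is assumed as the normalization, which the patent does not name explicitly:

```python
import numpy as np

rng = np.random.default_rng(3)
H2 = 6                                          # size of the spliced vectors (2*H)

doc_tokens = ["river", "bank", "of", "the", "river"]   # note "river" repeats
doc_vecs = [rng.normal(size=H2) for _ in doc_tokens]   # BiLSTM word vectors (stand-ins)
q = rng.normal(size=H2)                                # question vector (stand-in)

# Dot product of each word vector with the question vector, then normalize
# (softmax) so the scores form a probability distribution over positions.
scores = np.array([v @ q for v in doc_vecs])
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# Merge the probability mass of identical words across their occurrences.
merged = {}
for tok, p in zip(doc_tokens, probs):
    merged[tok] = merged.get(tok, 0.0) + p

best = max(merged, key=merged.get)              # word of maximum probability
```

Merging over repeated occurrences means a word mentioned several times in the passage accumulates the probability of all its positions before the maximum is taken.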
The judgment module 406 judges whether the word of maximum probability in the topic document data is identical to a word in the answer data.
The answer determining module 407 determines the final answer according to the judgment result and the answer data.
In this embodiment, the answer determining module 407 is configured to, when the word of maximum probability in the topic document data is identical to a word in the answer data, confirm the word in the answer data that is identical to the word of maximum probability as the final answer.
In this embodiment, the answer determining module 407 is further configured to, when the word of maximum probability in the topic document data is not identical to any word in the answer data, construct the word vectors of the words in the answer data, calculate the cosine distance value (cosine similarity) between the word vector of each word in the answer data and the word vector corresponding to the word of maximum probability in the topic document data, and confirm the word in the answer data corresponding to the maximum cosine distance value as the final answer.
In this embodiment, constructing the word vectors of the words in the answer data by the answer determining module 407 includes: judging whether a word in the answer data is included in the topic document data; when the word in the answer data is included in the topic document data, performing a product calculation between the coding vector of the word and the word embedding matrix to obtain the term vector of the word; and when the word in the answer data is not included in the topic document data, setting the word vector of that word in the answer data to the zero vector.
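A sketch of this fallback path: each candidate answer word gets a vector (the zero vector when the word is absent from the topic document data), and the candidate closest to the maximum-probability document word wins. Names and vectors are illustrative stand-ins, and "cosine distance value" is treated here as cosine similarity, consistent with the maximum value being selected:

```python
import numpy as np

rng = np.random.default_rng(4)
H2 = 6

best_doc_vec = rng.normal(size=H2)   # vector of the max-probability document word

# Stand-in word vectors for the candidate answers: a candidate that also appears
# in the document would get its document-derived vector; one absent from the
# document gets the zero vector, as described above.
candidates = {
    "bank":   rng.normal(size=H2),
    "shore":  rng.normal(size=H2),
    "absent": np.zeros(H2),
}

def cosine(a, b):
    """Cosine similarity; defined as 0 when either vector is all-zero."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0.0 or nb == 0.0 else float(a @ b) / (na * nb)

sims = {w: cosine(v, best_doc_vec) for w, v in candidates.items()}
final_answer = max(sims, key=sims.get)   # candidate with maximum similarity
```

Defining the zero-vector case as similarity 0 keeps absent candidates from dominating while still leaving them selectable when every in-document candidate points away from the document word.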
In the present invention, the dot product of the word vector of each word in the topic document data with the question vector of the question data is taken and normalized to obtain the probability value corresponding to each word in the topic document data, and the probability values of identical words in the topic document data are merged to obtain the probabilities of all words in the topic document data. When the word of maximum probability is identical to a word in the answer data, the word in the answer data that is identical to the word of maximum probability is confirmed as the final answer; and when the word of maximum probability is not identical to any word in the answer data, the cosine distance value between the word vector of each word in the answer data and the word vector corresponding to the word of maximum probability in the topic document data is calculated, and the word corresponding to the maximum cosine distance value is confirmed as the final answer. This solves the problem that no answer can be given automatically when the combination of a cloze question and its answer does not follow a regular collocation appearing in the text.
Embodiment 3
Fig. 3 is a schematic diagram of an electronic device 6 in an embodiment of the present invention.
The electronic device 6 includes a memory 61, a processor 62, and a computer program 63 that is stored in the memory 61 and executable on the processor 62. When executing the computer program 63, the processor 62 implements the steps of the above cloze-test answering method embodiment, such as steps S11–S17 shown in Fig. 1. Alternatively, when executing the computer program 63, the processor 62 implements the functions of the modules/units of the above cloze-test answering apparatus embodiment, such as modules 401–407 in Fig. 2.
Illustratively, the computer program 63 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 61 and executed by the processor 62 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program 63 in the electronic device 6. For example, the computer program 63 may be divided into the data acquisition module 401, term-vector construction module 402, word-vector generation module 403, question-vector generation module 404, computing module 405, judgment module 406, and answer determining module 407 in Fig. 3; the specific functions of each module are described in Embodiment 2.
In this embodiment, the electronic device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud terminal device. Those skilled in the art will understand that the schematic diagram is merely an example of the electronic device 6 and does not constitute a limitation on the electronic device 6; the device may include more or fewer components than illustrated, combine certain components, or use different components; for example, the electronic device 6 may also include input/output devices, network access devices, buses, and the like.
The processor 62 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor 62 may be any conventional processor; the processor 62 is the control center of the electronic device 6 and connects the various parts of the entire electronic device 6 through various interfaces and lines.
The memory 61 may be used to store the computer program 63 and/or the modules/units; the processor 62 implements the various functions of the electronic device 6 by running or executing the computer program and/or modules/units stored in the memory 61 and by calling the data stored in the memory 61. The memory 61 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (for example, a sound playing function, an image playing function, etc.), and the data storage area may store data created according to the use of the electronic device 6 (such as audio data, a phone book, etc.). In addition, the memory 61 may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, a memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the integrated modules/units of the electronic device 6 are implemented in the form of software function modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the above embodiment methods of the present invention may also be completed by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of each of the above method embodiments can be implemented. The computer program includes computer program code, which may be in source code form, object code form, an executable file, certain intermediate forms, or the like. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed electronic device and method may be implemented in other ways. For example, the electronic device embodiments described above are merely schematic; the division of the modules is only a division by logical function, and there may be other division manners in actual implementation.
In addition, each functional module in each embodiment of the present invention may be integrated in the same processing module, or each module may exist physically alone, or two or more modules may be integrated in the same module. The above integrated modules may be implemented in the form of hardware, or in the form of hardware plus software function modules.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from the spirit or essential attributes of the invention. Therefore, from whatever point of view, the present embodiments are to be considered illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and scope of the equivalent elements of the claims be included in the present invention. Any reference signs in the claims should not be construed as limiting the claims involved. Furthermore, it is clear that the word "comprising" does not exclude other modules or steps, and the singular does not exclude the plural. Multiple modules or devices stated in the electronic device claims may also be implemented by the same module or device through software or hardware. Words such as "first" and "second" are used to indicate names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention and are not limiting; although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.
Claims (10)
1. A cloze test answering method, characterized in that the method comprises:
acquiring answering data, wherein the answering data comprise topic document data, question data, and answer data;
constructing the term vectors of the words in the topic document data;
generating the word vectors of the words in the topic document data after applying bidirectional long short-term memory network encoding to the term vectors of the words in the topic document data;
constructing the term vectors of the words in the question data, applying bidirectional long short-term memory network encoding to the term vectors of the words in the question data, and generating a question vector after the encoded term vectors pass through the output layer of the bidirectional long short-term memory network;
taking the dot product of the word vector of each word in the topic document data with the question vector of the question data and normalizing to obtain the probability value of each word in the topic document data, and merging the probabilities of identical words in the topic document data to obtain the probabilities of all words in the topic document data;
judging whether the word of maximum probability in the topic document data is identical to a word in the answer data; and
determining the final answer according to the judgment result and the answer data.
2. The cloze test answering method according to claim 1, characterized in that determining the final answer according to the judgment result and the answer data comprises:
when the word of maximum probability in the topic document data is identical to a word in the answer data, confirming the word in the answer data that is identical to the word of maximum probability as the final answer.
3. The cloze test answering method according to claim 1, characterized in that constructing the term vectors of the words in the topic document data comprises:
performing one-hot encoding on each word in the answering data to generate the coding vector of each word;
constructing the word embedding matrix of all words in the answering data; and
obtaining the term vectors of the words in the topic document data after performing a product calculation, with dimensionality reduction, between the coding vectors of the words in the topic document data and the word embedding matrix.
4. The cloze test answering method according to claim 3, characterized in that performing one-hot encoding on each word in the answering data to generate the coding vector of each word comprises:
determining the dimensionality of the coding vector of a word in the answering data according to the number of all words in the answering data, determining the position of the target dimension of the coding vector of the word according to the sorted position of the word in the answering data, setting the coding vector to 1 at the target dimension position, and setting the dimension positions of the coding vector other than the target dimension position to 0.
5. The cloze test answering method according to claim 3, characterized in that generating the word vectors of the words in the topic document data after applying bidirectional long short-term memory network encoding to the term vectors of the words in the topic document data comprises:
establishing a first bidirectional long short-term memory network over the term vectors of the words in the topic document data and training the first bidirectional long short-term memory network, and encoding the term vectors of the words in the topic document data with the trained first bidirectional long short-term memory network according to the formula h_s = BiLSTM(e(x), s), where e(x) is the term vector of a word in the topic document data and s is the current position of the hidden layer in the first bidirectional long short-term memory network.
6. The cloze test answering method according to claim 5, characterized in that applying bidirectional long short-term memory network encoding to the term vectors of the words in the question data and generating the question vector after the encoded term vectors pass through the output layer of the bidirectional long short-term memory network comprises:
establishing a second bidirectional long short-term memory network over the term vectors of the words in the question data and training the second bidirectional long short-term memory network, and encoding the term vectors of the words in the question data with the trained second bidirectional long short-term memory network according to the formula h_t = BiLSTM(g(x), t), where g(x) is the term vector of a word in the question data and t is the current position of the hidden layer in the second bidirectional long short-term memory network.
7. The cloze test answering method according to claim 6, characterized in that taking the dot product of the word vector of each word in the topic document data with the question vector of the question data and normalizing to obtain the probability value of each word in the topic document data comprises:
taking the dot product of the bidirectional spliced vector of each word produced by the first bidirectional long short-term memory network over the topic document data with the head-and-tail spliced vector produced by the second long short-term memory network over the question data, and normalizing, to obtain the probability value of each word in the topic document data.
8. A cloze test answering apparatus, characterized in that the apparatus comprises:
a data acquisition module for acquiring answering data, wherein the answering data comprise topic document data, question data, and answer data;
a term-vector construction module for constructing the term vectors of the words in the topic document data;
a word-vector generation module for generating the word vectors of the words in the topic document data after applying bidirectional long short-term memory network encoding to the term vectors of the words in the topic document data;
a question-vector generation module for constructing the term vectors of the words in the question data, applying bidirectional long short-term memory network encoding to the term vectors of the words in the question data, and generating a question vector after the encoded term vectors pass through the output layer of the bidirectional long short-term memory network;
a computing module for taking the dot product of the word vector of each word in the topic document data with the question vector of the question data and normalizing to obtain the probability value of each word in the topic document data, and merging the probabilities of identical words in the topic document data to obtain the probabilities of all words in the topic document data;
a judgment module for judging whether the word of maximum probability in the topic document data is identical to a word in the answer data; and
an answer determining module for confirming, when the word of maximum probability in the topic document data is identical to a word in the answer data, the word in the answer data that is identical to the word of maximum probability as the final answer.
9. An electronic device, characterized in that the electronic device comprises a processor, and the processor is configured to implement the cloze test answering method according to any one of claims 1 to 7 when executing a computer program stored in a memory.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the cloze test answering method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910528256.2A CN110222345A (en) | 2019-06-18 | 2019-06-18 | Cloze Test answer method, apparatus, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110222345A (en) | 2019-09-10 |
Family
ID=67817572
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910528256.2A Pending CN110222345A (en) | 2019-06-18 | 2019-06-18 | Cloze Test answer method, apparatus, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110222345A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018060273A (en) * | 2016-10-03 | 2018-04-12 | NTT Resonant Inc. | Information processing device, information processing method, and program |
CN108415977A (en) * | 2018-02-09 | 2018-08-17 | South China University of Technology | Generative machine reading comprehension method based on deep neural networks and reinforcement learning |
CN108829672A (en) * | 2018-06-05 | 2018-11-16 | Ping An Technology (Shenzhen) Co., Ltd. | Text sentiment analysis method, apparatus, computer device and storage medium |
CN108845990A (en) * | 2018-06-12 | 2018-11-20 | Beijing Huiwen Technology Development Co., Ltd. | Answer selection method and apparatus based on a bidirectional attention mechanism, and electronic device |
CN108959556A (en) * | 2018-06-29 | 2018-12-07 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Neural-network-based entity question answering method, apparatus and terminal |
US20190043379A1 * | 2017-08-03 | 2019-02-07 | Microsoft Technology Licensing, Llc | Neural models for key phrase detection and question generation |
CN109670029A (en) * | 2018-12-28 | 2019-04-23 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus, computer device and storage medium for determining answers to questions |
Non-Patent Citations (1)
Title |
---|
CHEN, Zhigang: "Research on Automatic Examination Answering Technology", China Doctoral Dissertations Full-text Database * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10592607B2 (en) | Iterative alternating neural attention for machine reading | |
CN110110041A (en) | Wrong word correcting method, device, computer installation and storage medium | |
CN111241237B (en) | Intelligent question-answer data processing method and device based on operation and maintenance service | |
CN109033068A (en) | It is used to read the method, apparatus understood and electronic equipment based on attention mechanism | |
WO2021063089A1 (en) | Rule matching method, rule matching apparatus, storage medium and electronic device | |
CN112131881A (en) | Information extraction method and device, electronic equipment and storage medium | |
CN114003682A (en) | Text classification method, device, equipment and storage medium | |
KR20200041199A (en) | Method, apparatus and computer-readable medium for operating chatbot | |
US11869128B2 (en) | Image generation based on ethical viewpoints | |
CN113486659B (en) | Text matching method, device, computer equipment and storage medium | |
CN114662484A (en) | Semantic recognition method and device, electronic equipment and readable storage medium | |
CN116402166B (en) | Training method and device of prediction model, electronic equipment and storage medium | |
WO2021217866A1 (en) | Method and apparatus for ai interview recognition, computer device and storage medium | |
CN112036439B (en) | Dependency relationship classification method and related equipment | |
CN115510193B (en) | Query result vectorization method, query result determination method and related devices | |
CN117764373A (en) | Risk prediction method, apparatus, device and storage medium | |
CN110263346A (en) | Lexical analysis method, electronic equipment and storage medium based on small-sample learning | |
CN116090471A (en) | Multitasking model pre-training method and device, storage medium and electronic equipment | |
CN110222345A (en) | Cloze Test answer method, apparatus, electronic equipment and storage medium | |
US11790885B2 (en) | Semi-structured content aware bi-directional transformer | |
CN114970535B (en) | Intention recognition method, system, device and storage medium | |
CN116933800B (en) | Template-based generation type intention recognition method and device | |
Chang et al. | Applying code transform model to newly generated program for improving execution performance | |
CN117453273A (en) | Intelligent program code complement method and device | |
CN115617944A (en) | Content recommendation method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| AD01 | Patent right deemed abandoned | Effective date of abandoning: 20240105 |