CN110222345A - Cloze test answering method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
CN110222345A
Authority
CN
China
Prior art keywords
word
data
vector
document data
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910528256.2A
Other languages
Chinese (zh)
Inventor
吴良顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Original Assignee
Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Priority to CN201910528256.2A
Publication of CN110222345A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a cloze test answering method and apparatus, an electronic device, and a storage medium. The method includes: constructing word embeddings for the words in the passage data; encoding the word embeddings of the passage words with a bidirectional long short-term memory (BiLSTM) network to generate a contextual word vector for each word; constructing word embeddings for the words in the question data and encoding them to generate a question vector; taking the dot product of each passage word's contextual vector with the question vector and normalizing the results to obtain a probability for each word in the passage data, then merging the probabilities of identical words to obtain the probability of every distinct word in the passage data; judging whether the highest-probability word is identical to a word in the answer data; and determining the final answer according to the judgment result and the answer data.

Description

Cloze test answering method and apparatus, electronic device, and storage medium
Technical field
The present invention relates to the field of computer processing, and in particular to a cloze test answering method and apparatus, an electronic device, and a storage medium.
Background
Cloze questions are among the most common question types in middle-school English examinations across China, and among the most difficult, with a low scoring rate. A cloze test is a typical reading-comprehension problem that requires mining the relationship between a document and a query, and the distractors in a cloze question are set mainly according to the meaning of the passage. Existing automatic cloze systems are simple systems built on n-gram models. They judge mainly by how often a phrase occurs in the document, so they can only answer with word combinations that already appear in the document. They lack inferential capability and cannot give an answer when the required fixed collocation does not occur in the text.
Summary of the invention
In view of the foregoing, it is necessary to provide a cloze test answering method and apparatus, an electronic device, and a computer-readable storage medium that solve the problem that an answer cannot be given automatically when the fixed collocation formed by the cloze question and its answer does not appear in the text.
A first aspect of the present application provides a cloze test answering method, the method comprising:
obtaining item data, wherein the item data include passage data, question data, and answer data;
constructing word embeddings for the words in the passage data;
encoding the word embeddings of the words in the passage data with a bidirectional long short-term memory (BiLSTM) network to generate contextual word vectors for the words in the passage data;
constructing word embeddings for the words in the question data, encoding them with a BiLSTM network, and generating a question vector after the encoded vectors are output through the output layer of the BiLSTM network;
taking the dot product of the contextual word vector of each word in the passage data with the question vector of the question data and normalizing the results to obtain a probability for each word in the passage data, and merging the probabilities of identical words in the passage data to obtain the probability of every distinct word in the passage data;
judging whether the highest-probability word in the passage data is identical to a word in the answer data; and
determining the final answer according to the judgment result and the answer data.
Preferably, determining the final answer according to the judgment result and the answer data comprises:
when the highest-probability word in the passage data is identical to a word in the answer data, confirming the identical word in the answer data as the final answer.
Preferably, constructing the word embeddings of the words in the passage data comprises:
generating a coding vector for each word in the item data by one-hot encoding;
constructing a word embedding matrix for all words in the item data; and
multiplying the coding vector of each word in the passage data by the word embedding matrix, which reduces its dimensionality, to obtain the word embedding of that word.
Preferably, generating the coding vector of each word in the item data by one-hot encoding comprises:
determining the dimension of the coding vectors from the total number of words in the item data; determining the target dimension of a word's coding vector from the word's sorted position in the item data; setting the coding vector to 1 at the target dimension; and setting every other dimension to 0.
Preferably, encoding the word embeddings of the words in the passage data with a BiLSTM network to generate their contextual word vectors comprises:
establishing a first BiLSTM network for the word embeddings of the words in the passage data and training it, and then encoding the word embeddings of the passage words with the trained first BiLSTM network according to a formula in which E(x) is the word embedding of a word in the passage data and s is the position of the current hidden layer within the first BiLSTM network.
Preferably, encoding the word embeddings of the words in the question data with a BiLSTM network and generating the question vector after the encoded vectors are output through the output layer of the BiLSTM network comprises:
establishing a second BiLSTM network for the word embeddings of the words in the question data and training it, and then encoding the word embeddings of the question words with the trained second BiLSTM network according to a formula in which G(x) is the word embedding of a word in the question data and t is the position of the current hidden layer within the second BiLSTM network.
Preferably, taking the dot product of the contextual word vector of each word in the passage data with the question vector and normalizing the results to obtain the probability of each word in the passage data comprises:
taking the dot product of the bidirectionally concatenated vector produced for each passage word by the first BiLSTM network with the head-to-tail concatenated vector produced by the second BiLSTM network for the question data, and normalizing the results to obtain the probability of each word in the passage data.
A second aspect of the present application provides a cloze test answering apparatus, the apparatus comprising:
a data acquisition module, configured to obtain item data, wherein the item data include passage data, question data, and answer data;
a word embedding construction module, configured to construct word embeddings for the words in the passage data;
a word vector generation module, configured to encode the word embeddings of the words in the passage data with a BiLSTM network to generate contextual word vectors for those words;
a question vector generation module, configured to construct word embeddings for the words in the question data, encode them with a BiLSTM network, and generate a question vector after the encoded vectors are output through the output layer of the BiLSTM network;
a computing module, configured to take the dot product of the contextual word vector of each word in the passage data with the question vector and normalize the results to obtain the probability of each word in the passage data, and to merge the probabilities of identical words in the passage data to obtain the probability of every distinct word in the passage data;
a judgment module, configured to judge whether the highest-probability word in the passage data is identical to a word in the answer data; and
an answer determining module, configured to confirm, when the highest-probability word in the passage data is identical to a word in the answer data, the identical word in the answer data as the final answer.
A third aspect of the present application provides an electronic device comprising a processor, the processor being configured to implement the cloze test answering method when executing a computer program stored in a memory.
A fourth aspect of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the cloze test answering method.
In the present invention, the contextual word vector of each word in the passage data is dot-multiplied with the question vector of the question data and normalized to obtain a probability for each word in the passage data, and the probabilities of identical words in the passage data are merged to obtain the probability of every distinct word. When the highest-probability word is identical to a word in the answer data, that word in the answer data is confirmed as the final answer. When it is not, the cosine similarity between the word vector of each word in the answer data and the word vector corresponding to the highest-probability word in the passage data is calculated, and the word with the largest cosine similarity is confirmed as the final answer. This solves the problem that an answer cannot be given automatically when the fixed collocation formed by the cloze question and its answer does not appear in the text.
Description of the drawings
Fig. 1 is a flowchart of the cloze test answering method according to an embodiment of the present invention.
Fig. 2 is a structural diagram of the cloze test answering apparatus according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of the electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, features, and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with each other.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present invention. The terms used in this specification are intended only to describe specific embodiments and are not intended to limit the present invention.
Preferably, the cloze test answering method of the present invention is applied in one or more electronic devices. An electronic device is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), and embedded devices.
The electronic device may be a computing device such as a desktop computer, a laptop, a tablet, or a cloud server. The device can interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like.
Embodiment 1
Fig. 1 is a flowchart of the cloze test answering method according to an embodiment of the present invention. Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
As shown in Fig. 1, the cloze test answering method specifically includes the following steps:
Step S11: obtain item data, wherein the item data include passage data, question data, and answer data.
In this embodiment, the item data may be obtained from a database on a server, for example from an electronic document containing answering information stored in the server's database. In another embodiment, the item data are obtained by a scanning device from a paper document containing answering information. Specifically, the scanning device uses optical techniques to capture the light signal reflected from the paper document, which carries the document's information content, and then analyzes that content with optical character recognition (OCR) technology to obtain the item data. In this embodiment, the item data may be English-language data.
Step S12: construct word embeddings for the words in the passage data.
In this embodiment, constructing the word embeddings of the words in the passage data includes:
A) generating a coding vector for each word in the item data by one-hot encoding;
B) constructing a word embedding matrix for all words in the item data; and
C) multiplying the coding vector of each word in the passage data by the word embedding matrix, which reduces its dimensionality, to obtain the word embedding of that word.
In a specific embodiment, generating the coding vector of each word in the item data by one-hot encoding includes: determining the dimension of the coding vectors from the total number of words in the item data; determining the target dimension of a word's coding vector from the word's sorted position in the item data; and setting the coding vector to 1 at the target dimension and to 0 at every other dimension, thereby constructing the word's coding vector. For example, if the item data contain 1000 words, the coding vector generated for each word is 1000-dimensional, and if a word is at the i-th sorted position, its coding vector is 1 at the i-th dimension and 0 at every other dimension. In this embodiment, the word embedding matrix of the item data is constructed from the features of the words in the item data. For example, if each word is described by 300 features, the word embedding matrix has dimensions 1000 × 300. After the coding vectors of the words in the passage data are generated and the word embedding matrix of all words in the item data is constructed, the word embedding of each word in the passage data is calculated as $e(x) = x \cdot w_e$, where $x$ is the coding vector of a word in the passage data and $w_e$ is the word embedding matrix of all words in the item data.
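A minimal sketch of Step S12, assuming a toy five-word vocabulary sorted alphabetically (the text only says "sorting position") and a random 4-column matrix standing in for the 1000 × 300 embedding matrix. The names `one_hot`, `embed`, `vocab` and `w_e` are illustrative. It shows that $e(x) = x \cdot w_e$ with a one-hot $x$ simply selects one row of $w_e$, which is why the product also reduces the dimension from the vocabulary size to the feature count.

```python
import random
from typing import Dict, List

Vector = List[float]


def one_hot(index: int, vocab_size: int) -> Vector:
    """Coding vector: 1 at the word's sorted position, 0 everywhere else."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec


def embed(x: Vector, w_e: List[Vector]) -> Vector:
    """e(x) = x . w_e: a (1 x V) row vector times a (V x d) matrix gives a d-dim vector."""
    d = len(w_e[0])
    return [sum(x[v] * w_e[v][j] for v in range(len(x))) for j in range(d)]


# Toy stand-ins for the item vocabulary (assumption: alphabetical sorting).
vocab = sorted({"the", "cat", "sat", "on", "mat"})  # ['cat', 'mat', 'on', 'sat', 'the']
index_of: Dict[str, int] = {w: i for i, w in enumerate(vocab)}
d = 4  # hypothetical feature count (300 in the text's example)
rng = random.Random(0)
w_e = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in vocab]  # V x d matrix

x = one_hot(index_of["sat"], len(vocab))
e_x = embed(x, w_e)  # 5-dim one-hot reduced to a 4-dim embedding
```

In practice the multiplication is usually replaced by a direct row lookup, since both give the same vector.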
Step S13: encode the word embeddings of the words in the passage data with a bidirectional long short-term memory (BiLSTM) network to generate contextual word vectors for the words in the passage data.
In this embodiment, a first BiLSTM network is established for the word embeddings of the words in the passage data and trained, and the trained first BiLSTM network then encodes the word embeddings of the passage words according to a formula in which E(x) is the word embedding of a word in the passage data and s is the position of the current hidden layer within the first BiLSTM network.
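A plausible form of this encoding is shown below. It assumes the standard bidirectional LSTM formulation and matches the per-word "bidirectional splicing" used in Step S15. For a passage $x = (x_1, \dots, x_n)$, the symbols $\overrightarrow{h}_s$, $\overleftarrow{h}_s$, $\tilde{d}_s$ and $n$ are illustrative and do not come from the original text.

```latex
\begin{aligned}
\overrightarrow{h}_s &= \overrightarrow{\mathrm{LSTM}}\big(E(x_s),\, \overrightarrow{h}_{s-1}\big), && s = 1, \dots, n,\\
\overleftarrow{h}_s  &= \overleftarrow{\mathrm{LSTM}}\big(E(x_s),\, \overleftarrow{h}_{s+1}\big), && s = n, \dots, 1,\\
\tilde{d}_s &= \big[\, \overrightarrow{h}_s \,;\, \overleftarrow{h}_s \,\big], && s = 1, \dots, n,
\end{aligned}
\qquad \overrightarrow{h}_0 = \overleftarrow{h}_{n+1} = \mathbf{0}.
```

Under this reading, $\tilde{d}_s$ is the contextual word vector of the $s$-th passage word. The forward and backward states are concatenated, so its dimension is twice the LSTM hidden size.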
Step S14: construct word embeddings for the words in the question data, encode them with a BiLSTM network, and generate a question vector after the encoded vectors are output through the output layer of the BiLSTM network.
In this embodiment, the word embeddings of the words in the question data are constructed by multiplying the coding vector of each word in the question data by the word embedding matrix.
In this embodiment, a second BiLSTM network is established for the word embeddings of the words in the question data and trained, and the trained second BiLSTM network then encodes the word embeddings of the question words according to a formula in which G(x) is the word embedding of a word in the question data and t is the position of the current hidden layer within the second BiLSTM network.
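The sketch below contrasts the two encodings used in Steps S13 to S15. Each passage word gets its own forward/backward splice. The question gets a single head-to-tail splice, which this sketch assumes means the forward LSTM's last state joined with the backward LSTM's state at the first word, as in attention-sum readers. It is a minimal, untrained pure-Python BiLSTM with random weights, hidden size 2 and no training loop. `LSTMCell`, `encode_passage` and `encode_question` are hypothetical names, not the patent's trained networks.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """A single untrained LSTM cell with randomly initialised weights."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size + 1  # [input ; previous hidden ; bias]
        # 4h rows: input, forget, output and candidate gate weights, in that order.
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                        for _ in range(4 * hidden_size)]

    def step(self, x: Vector, h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
        z = x + h_prev + [1.0]
        pre = [sum(w * v for w, v in zip(row, z)) for row in self.weights]
        n = self.hidden_size
        i = [sigmoid(v) for v in pre[0:n]]            # input gate
        f = [sigmoid(v) for v in pre[n:2 * n]]        # forget gate
        o = [sigmoid(v) for v in pre[2 * n:3 * n]]    # output gate
        g = [math.tanh(v) for v in pre[3 * n:4 * n]]  # candidate cell values
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def run(cell: LSTMCell, xs: List[Vector]) -> List[Vector]:
    """Run the cell left to right and return the hidden state at every step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def bidirectional_states(fwd: LSTMCell, bwd: LSTMCell, xs: List[Vector]):
    forward = run(fwd, xs)
    # Run the backward cell on the reversed input, then re-align to positions.
    backward = list(reversed(run(bwd, list(reversed(xs)))))
    return forward, backward


def encode_passage(fwd: LSTMCell, bwd: LSTMCell, xs: List[Vector]) -> List[Vector]:
    """Per-word splice [forward_s ; backward_s] for every passage position s."""
    forward, backward = bidirectional_states(fwd, bwd, xs)
    return [fs + bs for fs, bs in zip(forward, backward)]


def encode_question(fwd: LSTMCell, bwd: LSTMCell, xs: List[Vector]) -> Vector:
    """Head-to-tail splice: last forward state + backward state at the first word."""
    forward, backward = bidirectional_states(fwd, bwd, xs)
    return forward[-1] + backward[0]


rng = random.Random(42)
emb_dim, hidden = 3, 2
first_fwd, first_bwd = LSTMCell(emb_dim, hidden, rng), LSTMCell(emb_dim, hidden, rng)
second_fwd, second_bwd = LSTMCell(emb_dim, hidden, rng), LSTMCell(emb_dim, hidden, rng)
passage = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(5)]
question = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(4)]
passage_vectors = encode_passage(first_fwd, first_bwd, passage)     # 5 vectors of size 4
question_vector = encode_question(second_fwd, second_bwd, question)  # one vector of size 4
```

The two separate cell pairs mirror the text's "first" and "second" BiLSTM networks. Both outputs have size `2 * hidden`, so every passage vector can be dot-multiplied with the question vector.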
Step S15: take the dot product of the contextual word vector of each word in the passage data with the question vector and normalize the results to obtain the probability of each word in the passage data, then merge the probabilities of identical words in the passage data to obtain the probability of every distinct word in the passage data.
In this embodiment, the bidirectionally concatenated vector produced for each passage word by the BiLSTM network is dot-multiplied with the head-to-tail concatenated vector produced by the BiLSTM network for the question data, and the results are normalized to obtain the probability of each word in the passage data.
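A minimal sketch of Step S15, assuming the normalization is a softmax over passage positions (the text only says "normalize") and that merging means summing the position probabilities of identical words. The toy vectors are hand-picked and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Normalise scores into probabilities (max subtracted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_probabilities(words: List[str], word_vectors: List[Vector],
                       question_vec: Vector) -> Dict[str, float]:
    """Dot product + softmax per position, then merge identical words by summing."""
    position_probs = softmax([dot(v, question_vec) for v in word_vectors])
    merged: Dict[str, float] = defaultdict(float)
    for word, p in zip(words, position_probs):
        merged[word] += p
    return dict(merged)


# Hand-picked toy vectors; "cat" occurs twice, so its two position probabilities add up.
words = ["cat", "sat", "cat", "mat"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
question = [2.0, 0.0]
probs = word_probabilities(words, vectors, question)
best = max(probs, key=probs.get)  # "cat", with probability 1 / (1 + e**-2) ≈ 0.881
```

Merging lets a word that occurs several times in the passage accumulate evidence from all of its occurrences. This is what makes the highest-probability word in Step S16 a word type rather than a single position.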
Step S16: judge whether the highest-probability word in the passage data is identical to a word in the answer data.
Step S17: determine the final answer according to the judgment result and the answer data.
In this embodiment, when the highest-probability word in the passage data is identical to a word in the answer data, the identical word in the answer data is confirmed as the final answer.
In this embodiment, when the highest-probability word in the passage data is not identical to any word in the answer data, word vectors are constructed for the words in the answer data. The cosine similarity between the word vector of each word in the answer data and the word vector corresponding to the highest-probability word in the passage data is then calculated, and the word in the answer data with the largest cosine similarity is confirmed as the final answer.
In this embodiment, constructing the word vectors of the words in the answer data includes: judging whether each word in the answer data is contained in the passage data; when it is, multiplying the coding vector of that word by the word embedding matrix to obtain its vector; and when it is not, setting its word vector to the zero vector.
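A minimal sketch of Steps S16 and S17, including the cosine-similarity fallback. It assumes the "word vector corresponding to the highest-probability word" is that word's embedding, and that a zero vector (an option absent from the passage) scores a cosine of 0.0, since cosine similarity is undefined for zero vectors. The embeddings are hand-picked. The dictionary lookup stands in for multiplying a one-hot vector by the embedding matrix, which selects the same row. All names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; a zero vector is scored 0.0 (assumed convention)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def option_vector(option: str, passage_words: List[str],
                  embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Embedding if the option word occurs in the passage, otherwise the zero vector."""
    if option in passage_words:
        return embeddings[option]
    return [0.0] * dim


def choose_answer(word_probs: Dict[str, float], options: List[str],
                  passage_words: List[str], embeddings: Dict[str, Vector]) -> str:
    best = max(word_probs, key=word_probs.get)  # highest-probability passage word
    if best in options:  # "identical" branch: take the matching option directly
        return best
    best_vec = embeddings[best]
    dim = len(best_vec)
    # Fallback: the option whose vector is most cosine-similar to the best word's vector.
    return max(options, key=lambda o: cosine(
        option_vector(o, passage_words, embeddings, dim), best_vec))


embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "mat": [0.0, 1.0], "sat": [0.5, 0.5]}
passage_words = ["cat", "dog", "sat", "mat"]
word_probs = {"cat": 0.6, "dog": 0.1, "sat": 0.2, "mat": 0.1}
direct = choose_answer(word_probs, ["cat", "mat", "bird"], passage_words, embeddings)    # "cat"
fallback = choose_answer(word_probs, ["dog", "mat", "bird"], passage_words, embeddings)  # "dog"
```

If every option is absent from the passage, all options score 0.0 and `max` returns the first one. The text does not specify how such ties should be broken.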
In the present invention, the contextual word vector of each word in the passage data is dot-multiplied with the question vector of the question data and normalized to obtain a probability for each word in the passage data, and the probabilities of identical words in the passage data are merged to obtain the probability of every distinct word. When the highest-probability word is identical to a word in the answer data, that word in the answer data is confirmed as the final answer. When it is not, the cosine similarity between the word vector of each word in the answer data and the word vector corresponding to the highest-probability word in the passage data is calculated, and the word with the largest cosine similarity is confirmed as the final answer. This solves the problem that an answer cannot be given automatically when the fixed collocation formed by the cloze question and its answer does not appear in the text.
Embodiment 2
Fig. 2 is a structural diagram of the cloze test answering apparatus 40 according to an embodiment of the present invention.
In some embodiments, the cloze test answering apparatus 40 runs on an electronic device. The apparatus 40 may include multiple functional modules composed of program code segments. The program code of each segment in the apparatus 40 may be stored in a memory and executed by at least one processor to perform the cloze test answering function.
In this embodiment, the cloze test answering apparatus 40 may be divided into multiple functional modules according to the functions it performs. As shown in Fig. 2, the apparatus 40 may include a data acquisition module 401, a word embedding construction module 402, a word vector generation module 403, a question vector generation module 404, a computing module 405, a judgment module 406, and an answer determining module 407. A module, as used in the present invention, is a series of computer program segments that are stored in a memory, can be executed by at least one processor, and perform a fixed function. The functions of each module are described in detail in the following embodiments.
The data acquisition module 401 obtains item data, wherein the item data include passage data, question data, and answer data.
In this embodiment, the data acquisition module 401 obtains the item data from a database on a server, for example from an electronic document containing answering information stored in the server's database. In another embodiment, the data acquisition module 401 obtains the item data by a scanning device from a paper document containing answering information. Specifically, the scanning device uses optical techniques to capture the light signal reflected from the paper document, which carries the document's information content, and then analyzes that content with optical character recognition technology to obtain the item data. In this embodiment, the item data may be English-language data.
The word embedding construction module 402 constructs word embeddings for the words in the passage data.
In this embodiment, the word embedding construction module 402 constructs the word embeddings of the words in the passage data by:
A) generating a coding vector for each word in the item data by one-hot encoding;
B) constructing a word embedding matrix for all words in the item data; and
C) multiplying the coding vector of each word in the passage data by the word embedding matrix, which reduces its dimensionality, to obtain the word embedding of that word.
In a specific embodiment, the word embedding construction module 402 generates the coding vector of each word in the item data by one-hot encoding as follows: it determines the dimension of the coding vectors from the total number of words in the item data, determines the target dimension of a word's coding vector from the word's sorted position in the item data, and sets the coding vector to 1 at the target dimension and to 0 at every other dimension, thereby constructing the word's coding vector. For example, if the item data contain 1000 words, the coding vector generated for each word is 1000-dimensional, and if a word is at the i-th sorted position, its coding vector is 1 at the i-th dimension and 0 at every other dimension. In this embodiment, the word embedding construction module 402 also constructs the word embedding matrix of the item data from the features of the words in the item data. For example, if each word is described by 300 features, the word embedding matrix has dimensions 1000 × 300. After the coding vectors of the words in the passage data are generated and the word embedding matrix of all words in the item data is constructed, the word embedding of each word in the passage data is calculated as $e(x) = x \cdot w_e$, where $x$ is the coding vector of a word in the passage data and $w_e$ is the word embedding matrix of all words in the item data.
The word vector generation module 403 encodes the word embeddings of the words in the passage data with a BiLSTM network to generate contextual word vectors for the words in the passage data.
In this embodiment, the word vector generation module 403 establishes a first BiLSTM network for the word embeddings of the words in the passage data and trains it, and then encodes the word embeddings of the passage words with the trained first BiLSTM network according to a formula in which E(x) is the word embedding of a word in the passage data and s is the position of the current hidden layer within the first BiLSTM network.
Described problem vector generation module 404 constructs the term vector of the word in described problem data, and to described problem The term vector of word in data carries out two-way length memory network coding and passes through the term vector for passing through coding described two-way Problem vector is generated after the output layer output of length memory network.
In present embodiment, described problem vector generation module 404 constructs the term vector packet of word in described problem data It includes: the coding vector of the word in described problem data and institute's predicate embeded matrix being subjected to product calculation and obtain described problem number According to the term vector of middle word.
In this embodiment, the question vector generation module 404 builds a second BiLSTM network over the word embeddings of the words in the question data, trains the second BiLSTM network, and encodes the word embeddings of the words in the question data with the trained second BiLSTM network according to the formula $(\overrightarrow{h}_t, \overleftarrow{h}_t) = \mathrm{BiLSTM}_2(G(x))_t$, where $G(x)$ is the word embeddings of the words in the question data and $t$ is the current position of the hidden layer in the second BiLSTM network.
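The question vector from module 404 is later used as the "head-tail splice vector" of the question BiLSTM. Assuming this means the final forward hidden state concatenated with the backward hidden state at the first word (each direction has read the whole question at that point), it can be sketched as below. The function name is hypothetical, and the state sequences are assumed to come from any BiLSTM implementation, with the backward states already aligned to the original word order.

```python
def head_tail_splice(forward_states, backward_states):
    """Hypothetical helper. Assumed question vector q = [h_fwd_m ; h_bwd_1]:
    last forward state plus first (position-aligned) backward state."""
    if not forward_states or len(forward_states) != len(backward_states):
        raise ValueError("expected two non-empty state sequences of equal length")
    return list(forward_states[-1]) + list(backward_states[0])

if __name__ == "__main__":
    fwd = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # forward states for a 3-word question
    bwd = [[0.7, 0.8], [0.9, 1.0], [1.1, 1.2]]   # backward states, aligned to word positions
    print(head_tail_splice(fwd, bwd))          # [0.5, 0.6, 0.7, 0.8]
```

The result has twice the hidden size, matching the per-word splice vectors of the topic document, so the two can be combined by a dot product.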
The computing module 405 takes the dot product of the word vector of each word in the topic document data with the question vector of the question data and normalizes the results to obtain a probability value for each word in the topic document data; it then merges (adds together) the probabilities of identical words in the topic document data to obtain the probability of every distinct word in the topic document data.
In this embodiment, the computing module 405 takes the dot product of the bidirectional splice vector of each word in the topic document data (the concatenation of the forward and backward hidden states of the BiLSTM network at that word) with the head-tail splice vector of the BiLSTM network for the question data (the concatenation of its hidden states at the last and first positions of the question), and normalizes the results to obtain the probability value of each word in the topic document data.
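The scoring performed by the computing module 405 can be sketched in plain Python as below. The sketch assumes a softmax as the normalization, since the text only says the dot products are normalized, and it reads "merging" as adding up the probabilities of every occurrence of the same word; the function name is hypothetical. Merging matters when a candidate word occurs several times in the topic document: each occurrence contributes to that word's total.

```python
import math
from collections import defaultdict

def word_probabilities(doc_words, doc_vectors, question_vector):
    """Hypothetical helper: dot product with the question vector, softmax over
    positions (assumed normalization), then sum the probabilities of repeated words."""
    scores = [sum(a * b for a, b in zip(v, question_vector)) for v in doc_vectors]
    top = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    merged = defaultdict(float)
    for word, e in zip(doc_words, exps):
        merged[word] += e / total                  # merge occurrences of the same word
    return dict(merged)

if __name__ == "__main__":
    words = ["cat", "sat", "cat"]
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    probs = word_probabilities(words, vectors, [1.0, 0.0])
    best = max(probs, key=probs.get)
    print(best, round(probs[best], 4))              # cat 0.8446
```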
The judgment module 406 judges whether the word with the highest probability in the topic document data is identical to a word in the answer data.
The answer determining module 407 determines the final answer according to the judgment result and the answer data.
In this embodiment, when the word with the highest probability in the topic document data is identical to a word in the answer data, the answer determining module 407 confirms the matching word in the answer data as the final answer.
In this embodiment, when the word with the highest probability in the topic document data is not identical to any word in the answer data, the answer determining module 407 constructs the word vectors of the words in the answer data, computes the cosine similarity between the word vector of each word in the answer data and the word embedding of the highest-probability word in the topic document data, and confirms the word in the answer data with the largest cosine similarity as the final answer.
In this embodiment, the answer determining module 407 constructs the word vectors of the words in the answer data as follows: it judges whether each word in the answer data is contained in the topic document data; when a word in the answer data is contained in the topic document data, its coding vector is multiplied by the word embedding matrix to obtain its word embedding; and when a word in the answer data is not contained in the topic document data, its word vector is set to the null (zero) vector.
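The answer-determination logic of module 407 can be sketched as below. This is a hedged illustration with hypothetical function names: the highest-probability word's vector is taken to be its row of the word embedding matrix, option vectors follow the rule above (embedding row if the option occurs in the topic document data, null vector otherwise), and a null vector is assigned a cosine similarity of 0, an assumption made because the cosine is undefined for a zero vector.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; assumed to be 0.0 if either is a null vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def answer_vector(word, doc_words, vocab, w_e):
    """Hypothetical helper: embedding row of an answer word if it occurs in the passage, else a null vector."""
    if word in doc_words and word in vocab:
        return list(w_e[vocab[word]])
    return [0.0] * len(w_e[0])

def choose_answer(word_probs, options, doc_words, vocab, w_e):
    """Hypothetical helper: pick the final answer from merged word probabilities and answer options."""
    best = max(word_probs, key=word_probs.get)
    if best in options:
        # Case 1: the most probable passage word is itself one of the options.
        return best
    # Case 2: choose the option most similar to the best word's embedding.
    best_vec = w_e[vocab[best]]
    sims = {opt: cosine_similarity(answer_vector(opt, doc_words, vocab, w_e), best_vec)
            for opt in options}
    return max(sims, key=sims.get)

if __name__ == "__main__":
    vocab = {"cat": 0, "dog": 1, "car": 2, "sat": 3}
    w_e = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
    doc_words = ["cat", "sat", "car"]
    probs = {"cat": 0.7, "sat": 0.2, "car": 0.1}
    print(choose_answer(probs, ["cat", "car"], doc_words, vocab, w_e))         # cat
    print(choose_answer(probs, ["dog", "car", "sat"], doc_words, vocab, w_e))  # sat
```

In the second call, `dog` scores 0 even though its embedding is close to that of `cat`, because the rule above assigns a null vector to any option that does not occur in the topic document data.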
In the present invention, the dot product of the word vector of each word in the topic document data with the question vector of the question data is normalized to obtain a probability value for each word in the topic document data, and the probability values of identical words in the topic document data are merged to obtain the probability of every word in the topic document data. When the highest-probability word is identical to a word in the answer data, that word in the answer data is confirmed as the final answer; when it is not, the cosine similarity between the word vector of each word in the answer data and the vector of the highest-probability word in the topic document data is computed, and the word with the largest cosine similarity is confirmed as the final answer. This solves the problem that an answer cannot be provided automatically when the combination of a cloze question and its answer does not appear as a fixed collocation in the text.
Embodiment 3
Fig. 3 is a schematic diagram of the electronic device 6 in an embodiment of the present invention.
The electronic device 6 includes a memory 61, a processor 62, and a computer program 63 stored in the memory 61 and executable on the processor 62. When executing the computer program 63, the processor 62 implements the steps of the above cloze test answering method embodiment, such as steps S11 to S17 shown in Fig. 1. Alternatively, when executing the computer program 63, the processor 62 implements the functions of the modules/units in the above cloze test answering device embodiment, such as modules 401 to 407 in Fig. 2.
Illustratively, the computer program 63 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 62 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution of the computer program 63 in the electronic device 6. For example, the computer program 63 may be divided into the data acquisition module 401, the word embedding construction module 402, the word vector generation module 403, the question vector generation module 404, the computing module 405, the judgment module 406 and the answer determining module 407 in Fig. 3; see Embodiment 2 for the specific function of each module.
In this embodiment, the electronic device 6 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud terminal device. Those skilled in the art will understand that the schematic diagram is only an example of the electronic device 6 and does not limit it: the electronic device 6 may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 6 may also include input/output devices, network access devices, buses and the like.
The processor 62 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components and the like. A general-purpose processor may be a microprocessor, or the processor 62 may be any conventional processor. The processor 62 is the control center of the electronic device 6 and connects the various parts of the entire electronic device 6 through various interfaces and lines.
The memory 61 may be used to store the computer program 63 and/or the modules/units; the processor 62 implements the various functions of the electronic device 6 by running or executing the computer program and/or modules/units stored in the memory 61 and by calling data stored in the memory 61. The memory 61 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the electronic device 6 (such as audio data or a phone book). In addition, the memory 61 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
If the integrated modules/units of the electronic device 6 are implemented as software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electric carrier signals, telecommunication signals, software distribution media and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electric carrier signals and telecommunication signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed electronic device and method may be implemented in other ways. For example, the electronic device embodiment described above is only illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
In addition, the functional modules in the embodiments of the present invention may be integrated into a single processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware, or in hardware plus software functional modules.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as illustrative and not restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the present invention. Any reference signs in the claims should not be construed as limiting the claims concerned. Furthermore, the word "comprising" clearly does not exclude other modules or steps, and the singular does not exclude the plural. Multiple modules or electronic devices recited in the electronic device claims may also be implemented by a single module or electronic device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A cloze test answering method, characterized in that the method comprises:
obtaining answering data, wherein the answering data comprises topic document data, question data and answer data;
constructing the word embeddings of the words in the topic document data;
encoding the word embeddings of the words in the topic document data with a bidirectional long short-term memory (BiLSTM) network to generate the word vectors of the words in the topic document data;
constructing the word embeddings of the words in the question data, encoding the word embeddings of the words in the question data with a BiLSTM network, and generating a question vector after the encoded word embeddings are output through the output layer of the BiLSTM network;
taking the dot product of the word vector of each word in the topic document data with the question vector of the question data and normalizing it to obtain a probability value for each word in the topic document data, and merging the probabilities of identical words in the topic document data to obtain the probability of every word in the topic document data;
judging whether the word with the highest probability in the topic document data is identical to a word in the answer data; and determining the final answer according to the judgment result and the answer data.
2. The cloze test answering method according to claim 1, characterized in that determining the final answer according to the judgment result and the answer data comprises:
when the word with the highest probability in the topic document data is identical to a word in the answer data, confirming the word in the answer data that is identical to the highest-probability word as the final answer.
3. The cloze test answering method according to claim 1, characterized in that constructing the word embeddings of the words in the topic document data comprises:
performing one-hot encoding on each word in the answering data to generate a coding vector for each word;
constructing the word embedding matrix of all words in the answering data; and
multiplying the coding vectors of the words in the topic document data by the word embedding matrix, thereby reducing dimensionality, to obtain the word embeddings of the words in the topic document data.
4. The cloze test answering method according to claim 3, characterized in that performing one-hot encoding on each word in the answering data to generate a coding vector for each word comprises:
determining the dimension of the coding vectors of the words in the answering data according to the total number of words in the answering data, determining the position of the target dimension of a word's coding vector according to the word's sorting position in the answering data, setting the coding vector to 1 at the target dimension position, and setting every dimension position of the coding vector other than the target dimension position to 0.
5. The cloze test answering method according to claim 3, characterized in that encoding the word embeddings of the words in the topic document data with a BiLSTM network to generate the word vectors of the words in the topic document data comprises:
building a first BiLSTM network over the word embeddings of the words in the topic document data and training the first BiLSTM network, and encoding the word embeddings of the words in the topic document data with the trained first BiLSTM network according to the formula $\tilde{p}_s = \mathrm{BiLSTM}_1(E(x))_s = [\overrightarrow{h}_s; \overleftarrow{h}_s]$, where $E(x)$ is the word embeddings of the words in the topic document data and $s$ is the current position of the hidden layer in the first BiLSTM network.
6. The cloze test answering method according to claim 5, characterized in that encoding the word embeddings of the words in the question data with a BiLSTM network and generating the question vector after the encoded word embeddings are output through the output layer of the BiLSTM network comprises:
building a second BiLSTM network over the word embeddings of the words in the question data and training the second BiLSTM network, and encoding the word embeddings of the words in the question data with the trained second BiLSTM network according to the formula $(\overrightarrow{h}_t, \overleftarrow{h}_t) = \mathrm{BiLSTM}_2(G(x))_t$, where $G(x)$ is the word embeddings of the words in the question data and $t$ is the current position of the hidden layer in the second BiLSTM network.
7. The cloze test answering method according to claim 6, characterized in that taking the dot product of the word vector of each word in the topic document data with the question vector of the question data and normalizing it to obtain the probability value of each word in the topic document data comprises:
taking the dot product of the bidirectional splice vector of each word produced by the first BiLSTM network for the topic document data with the head-tail splice vector produced by the second BiLSTM network for the question data, and normalizing it to obtain the probability value of each word in the topic document data.
8. A cloze test answering device, characterized in that the device comprises:
a data acquisition module, for obtaining answering data, wherein the answering data comprises topic document data, question data and answer data;
a word embedding construction module, for constructing the word embeddings of the words in the topic document data;
a word vector generation module, for encoding the word embeddings of the words in the topic document data with a BiLSTM network to generate the word vectors of the words in the topic document data;
a question vector generation module, for constructing the word embeddings of the words in the question data, encoding them with a BiLSTM network, and generating a question vector after the encoded word embeddings are output through the output layer of the BiLSTM network;
a computing module, for taking the dot product of the word vector of each word in the topic document data with the question vector of the question data and normalizing it to obtain the probability value of each word in the topic document data, and merging the probabilities of identical words in the topic document data to obtain the probability of every word in the topic document data;
a judgment module, for judging whether the word with the highest probability in the topic document data is identical to a word in the answer data; and
an answer determining module, for confirming, when the word with the highest probability in the topic document data is identical to a word in the answer data, the word in the answer data that is identical to the highest-probability word as the final answer.
9. An electronic device, characterized in that the electronic device comprises a processor, and the processor is configured to implement the cloze test answering method according to any one of claims 1-7 when executing a computer program stored in a memory.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the cloze test answering method according to any one of claims 1-7.
CN201910528256.2A 2019-06-18 2019-06-18 Cloze Test answer method, apparatus, electronic equipment and storage medium Pending CN110222345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910528256.2A CN110222345A (en) 2019-06-18 2019-06-18 Cloze Test answer method, apparatus, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110222345A (en) 2019-09-10

Family

ID=67817572

Country Status (1)

Country Link
CN (1) CN110222345A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018060273A * 2016-10-03 2018-04-12 NTT Resonant Inc. Information processing device, information processing method, and program
CN108415977A * 2018-02-09 2018-08-17 South China University of Technology Generative machine reading comprehension method based on deep neural networks and reinforcement learning
CN108829672A * 2018-06-05 2018-11-16 Ping An Technology (Shenzhen) Co., Ltd. Text sentiment analysis method, apparatus, computer device and storage medium
CN108845990A * 2018-06-12 2018-11-20 Beijing Huiwen Technology Development Co., Ltd. Answer selection method, device and electronic device based on a bidirectional attention mechanism
CN108959556A * 2018-06-29 2018-12-07 Beijing Baidu Netcom Science and Technology Co., Ltd. Neural-network-based entity question answering method, device and terminal
US20190043379A1 (en) * 2017-08-03 2019-02-07 Microsoft Technology Licensing, Llc Neural models for key phrase detection and question generation
CN109670029A * 2018-12-28 2019-04-23 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus, computer device and storage medium for determining answers to questions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
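Chen Zhigang (陈志刚): "Research on Automatic Answering Technology Applied to Examinations" (《应用考试自动答题技术的研究》), China Doctoral Dissertations Full-text Database (《中国博士论文全文库》) *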

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandonment: 20240105