CN112487806B - English text concept understanding method

English text concept understanding method

Info

Publication number: CN112487806B
Authority: CN (China)
Prior art keywords: concept, word, text, candidate, nouns
Legal status: Active (granted)
Application number: CN202011382136.5A
Other languages: Chinese (zh)
Other versions: CN112487806A
Inventors: 李俊 (Li Jun), 姜兰兰 (Jiang Lanlan), 黄桂敏 (Huang Guimin)
Current Assignee: Guilin University of Electronic Technology
Original Assignee: Guilin University of Electronic Technology
Application filed by Guilin University of Electronic Technology
Priority to CN202011382136.5A
Publication of CN112487806A; application granted; publication of CN112487806B

Classifications

    • G06F40/00: Handling natural language data (G: Physics; G06: Computing; G06F: Electric digital data processing)
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/194: Calculation of difference between files
    • G06F40/216: Parsing using statistical methods
    • G06F40/30: Semantic analysis


Abstract

The invention discloses an English text concept understanding method, built on an understanding model composed of four sequentially connected modules: an English text understanding preprocessing module, an English text keyword concept semantic feature extraction module, an English text keyword and concept semantic dependency extraction module, and a candidate answer selection module. After an English text and the questions related to it are processed by the understanding method, concept-level answers to the questions are obtained. The method solves the problem of English text concept understanding, and its answers are more accurate than those of traditional English text understanding methods.

Description

English text concept understanding method
Technical Field
The invention relates to natural language processing technology, in particular to an English text concept understanding method; it applies only to English text, not to Chinese text.
Background
In machine-automated English text understanding, a passage of English text and a number of questions related to it are input, and the machine relies on its own algorithms to find the answers to the questions within the input text. Traditional English text understanding methods fall into two main categories: text-question semantic analysis methods and text-question vocabulary matching methods. Semantic analysis methods rely mainly on predefined rule templates and manually designed linguistic features to learn the relation between the text and the question; they require large amounts of manually annotated data, which causes sparse semantic features, and they are suitable only for certain limited domains. Vocabulary matching methods compute the semantic similarity between keywords in the text and in the question and select the most similar words or phrases as answers; because they only match similarity between question words and text words, they struggle to capture the precise sense of ambiguous words in the English text, which leads to inaccurate answer selection in reading comprehension. Addressing these problems, the invention proposes an English text concept understanding method that acquires concept-level semantic information of the English text by mining the deep concept semantic features of its keywords, and finally obtains more accurate answers through the concept semantic dependency relations between words in the English text and in the questions.
Disclosure of Invention
The overall processing flow of the English text concept understanding method is shown in FIG. 1. It comprises an English text understanding preprocessing module, an English text keyword concept semantic feature extraction module, an English text keyword and concept semantic dependency extraction module, and a candidate answer selection module.
The processing flow of the English text understanding preprocessing module is as follows: first, the English text to be read and the questions are input, and each is tokenized, stripped of stop words, and lowercased; the text to be read is split into sentences, forming a text sequence of several sentences. Second, word segmentation, phrase segmentation, and part-of-speech tagging are applied to the text sequence output by the first step, yielding sequences of words and phrases for the English text to be read and for the questions. Third, the lists of nouns and noun phrases, verbs, and adjectives are output for the sentence sequences of the English text to be read and for the question sentence sequences, respectively.
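A minimal sketch of this preprocessing stage in Python, assuming NLTK as the tokenizer, stop-word list, and part-of-speech tagger (the patent does not name specific tools):

```python
# Sketch of the preprocessing module; NLTK is an assumed stand-in toolkit.
import nltk
from nltk.corpus import stopwords

def preprocess(text):
    """Split into sentences, tokenize, lowercase, drop stop words, POS-tag."""
    stops = set(stopwords.words("english"))
    tagged_sentences = []
    for sent in nltk.sent_tokenize(text):
        tokens = [t.lower() for t in nltk.word_tokenize(sent) if t.isalnum()]
        tokens = [t for t in tokens if t not in stops]
        tagged_sentences.append(nltk.pos_tag(tokens))
    return tagged_sentences

def pos_lists(tagged_sentences):
    """Collect the noun, verb, and adjective lists the module outputs."""
    nouns, verbs, adjectives = [], [], []
    for sent in tagged_sentences:
        for word, tag in sent:
            if tag.startswith("NN"):
                nouns.append(word)
            elif tag.startswith("VB"):
                verbs.append(word)
            elif tag.startswith("JJ"):
                adjectives.append(word)
    return nouns, verbs, adjectives
```

Both the text to be read and each question would pass through the same pipeline.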
The processing flow of the English text keyword concept semantic feature extraction module is as follows: first, the preprocessing results of the English text to be read and of the questions are input from the English text preprocessing module, and the nouns or noun phrases in the English text to be read are selected. Second, the nouns or noun phrases selected in the first step are represented as word vectors using a pre-trained reading comprehension data set. Third, the cosine similarity between each noun or noun phrase in the questions and each selected noun or noun phrase in the English text to be read is computed; the results are sorted in descending order, and the top five are selected as candidate key nouns or noun phrases. Fourth, the co-occurrence probability of each candidate key noun or noun phrase with each of its candidate concepts is computed; if every probability is zero, the fifth step is executed, otherwise the highest-probability result is selected as the concept the candidate key noun or noun phrase belongs to. Fifth, if the co-occurrence probability with every candidate concept is zero, the current noun or noun phrase itself is used as its concept. Sixth, the importance of each selected keyword is computed: the weight coefficients between the current keyword and its context words are calculated and then summed with weights, yielding the final importance score of the current keyword.
The processing flow of the English text keyword and concept semantic dependency extraction module is as follows: first, the word vector representations of the candidate key nouns or noun phrases are input. Second, the concept representations of the candidate key nouns or noun phrases are input. Third, the semantic dependency relations among the candidate key nouns or noun phrases are extracted using a pre-trained semantic dependency set. Fourth, the concept dependency relations among the candidate key nouns or noun phrases are extracted using a pre-trained concept dependency set. Fifth, the cosine similarity between each candidate's semantic dependency and concept dependency is computed; the results are sorted in descending order, and the most similar result is selected as the current keyword and its concept semantic dependency.
The processing flow of the candidate answer selection module is as follows: first, the concept representations of the candidate key nouns or noun phrases are input. Second, the selected keywords and their concept semantic dependencies are input. Third, a concept semantic graph model is built, with the concept representations of the candidate key nouns or phrases as nodes and the selected keywords' concept semantic dependencies as edges. Fourth, the Euclidean distance between each node vector and the weighted average vector of all nodes in the concept semantic graph model is computed, and the probability distribution over these distances provides each node's weight value. Fifth, the node with the highest weight value is selected as the final answer.
The definition of the invention is as follows:
1. word part of speech tagging structure
Part-of-speech tagging in the invention tags the part of speech of each word in the text to be read and in the questions, chiefly nouns, verbs, and adjectives; the tagging format is as follows:
word₁ [#part-of-speech₁* #part-of-speech₂* #part-of-speech₃* ……]
word₂ [#part-of-speech₁* #part-of-speech₂* #part-of-speech₃* ……]
……
wordₙ [#part-of-speech₁* #part-of-speech₂* #part-of-speech₃* ……]
2. Word segmentation and phrase segmentation structure
Word segmentation and phrase segmentation in the invention split out the nouns or noun phrases in the text to be read and in the questions; the segmentation format is as follows:
noun or noun phrase₁  segmentation mark₁
noun or noun phrase₂  segmentation mark₂
……
noun or noun phraseₙ  segmentation markₙ
3. Concept structure to which nouns or noun phrases belong
In general, the semantic concepts expressed by the same noun in different texts are not identical; for example, "apple" can express the concept "fruit" and also the concept "company". Assigning concepts to nouns or noun phrases in the invention means dividing the concepts of the nouns or noun phrases in the text to be read and in the questions, thereby ensuring the accuracy of the semantic concept of the current noun or noun phrase. The structure is as follows:
noun or noun phrase₁ [possible concept₁, possible concept₂, ……, possible conceptₙ]
noun or noun phrase₂ [possible concept₁, possible concept₂, ……, possible conceptₙ]
……
noun or noun phraseₙ [possible concept₁, possible concept₂, ……, possible conceptₙ]
4. Keyword and concept semantic dependency structure thereof
Besides understanding the keyword information in the text, the semantic dependency relations among keywords must also be determined, since different semantic dependencies usually express different meanings of the text. The keywords and their concept semantic dependencies in the invention refer to the semantic dependency relations extracted and determined between nouns or noun phrases in the English text to be read and in the questions; the structure is as follows:
[keyword₁ dependency₁,₂ keyword₂]
[keyword₁ dependency₁,₃ keyword₃]
[keyword₁ dependency₁,₄ keyword₄]
……
[keyword₂ dependency₂,₃ keyword₃]
[keyword₂ dependency₂,₄ keyword₄]
……
[keywordₙ dependencyₙ,ₙ₊₁ keywordₙ₊₁]
[keywordₙ dependencyₙ,ₙ₊₂ keywordₙ₊₂]
[concept₁ dependency₁,₂ concept₂]
[concept₁ dependency₁,₃ concept₃]
[concept₁ dependency₁,₄ concept₄]
……
[concept₂ dependency₂,₃ concept₃]
[concept₂ dependency₂,₄ concept₄]
……
[conceptₙ dependencyₙ,ₙ₊₁ conceptₙ₊₁]
[conceptₙ dependencyₙ,ₙ₊₂ conceptₙ₊₂]
……
5. Directed edge structure between key words
With the keywords as nodes and the inter-word weight values as edges, a graph model is formed; the edges are directed because the weight value from word a to word b differs from the weight value from word b to word a. The structure is as follows:
[keyword₁ directed edge weight₁,₂ keyword₂]  [keyword₂ directed edge weight₂,₁ keyword₁]
[keyword₁ directed edge weight₁,₃ keyword₃]  [keyword₃ directed edge weight₃,₁ keyword₁]
……
[keyword₁ directed edge weight₁,ₙ keywordₙ]  [keywordₙ directed edge weightₙ,₁ keyword₁]
[keyword₂ directed edge weight₂,₃ keyword₃]  [keyword₃ directed edge weight₃,₂ keyword₂]
[keyword₂ directed edge weight₂,₄ keyword₄]  [keyword₄ directed edge weight₄,₂ keyword₂]
……
[keyword₂ directed edge weight₂,ₙ keywordₙ]  [keywordₙ directed edge weightₙ,₂ keyword₂]
[keywordₙ directed edge weightₙ,ₙ₊₁ keywordₙ₊₁]  [keywordₙ₊₁ directed edge weightₙ₊₁,ₙ keywordₙ]
[keywordₙ directed edge weightₙ,ₙ₊₂ keywordₙ₊₂]  [keywordₙ₊₂ directed edge weightₙ₊₂,ₙ keywordₙ]
……
[keywordₙ directed edge weightₙ,₂ₙ keyword₂ₙ]  [keyword₂ₙ directed edge weight₂ₙ,ₙ keywordₙ].
6. Certain concept calculation formula of noun or noun phrase
To determine the particular concept a noun or noun phrase belongs to in the current text, the co-occurrence relation between the current noun or noun phrase and its candidate concepts is used, computed with the following formula:
$$P(c_j \mid w) = \frac{N(w, c_j)}{\sum_{k=1}^{n} N(w, c_k)} \qquad (1)$$

In formula (1), N(w, c) is the number of co-occurrences of the noun or noun phrase w with concept c, so the probability of w belonging to concept c_j is its co-occurrence count with c_j divided by the sum of its co-occurrence counts with all n possible concepts.
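As an illustration, a small Python sketch of formula (1); the co-occurrence counts would come from the training text set, and the dictionary of counts shown here is hypothetical:

```python
# Sketch of formula (1): P(concept | word) as the word-concept co-occurrence
# count divided by the word's total co-occurrence count over all candidate
# concepts. The counts dictionary is a hypothetical stand-in for corpus counts.
def concept_probability(word, concepts, cooccur_counts):
    """cooccur_counts maps (word, concept) pairs to co-occurrence counts."""
    total = sum(cooccur_counts.get((word, c), 0) for c in concepts)
    if total == 0:
        # Fallback of the module's fifth step: the word is its own concept.
        return {word: 1.0}
    return {c: cooccur_counts.get((word, c), 0) / total for c in concepts}

counts = {("Sam", "merchant"): 40, ("Sam", "entrepreneurs"): 38, ("Sam", "president"): 10}
print(concept_probability("Sam", ["merchant", "entrepreneurs", "president"], counts))
```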
7. formula for calculating semantic similarity of nouns or noun phrases in English text to be read and questions
$$\mathrm{sim}(w_i, w_j) = \frac{\vec{v}_i \cdot \vec{v}_j}{\lVert \vec{v}_i \rVert \, \lVert \vec{v}_j \rVert} \qquad (2)$$
Formula (2) calculates the similarity between a noun or noun phrase in the question and a noun or phrase in the text to be read; the word vectors are obtained through training.
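A sketch of formula (2) together with the module's top-five candidate selection, assuming the 200-dimensional word vectors are NumPy arrays from a pre-trained embedding model:

```python
# Sketch of formula (2) and the extraction module's third step: rank text
# nouns by cosine similarity to the question nouns and keep the top 5.
import numpy as np

def cosine_similarity(v1, v2):
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def top_candidates(question_vecs, text_vecs, k=5):
    """question_vecs, text_vecs: dicts mapping words to embedding vectors.
    Scoring each text word by its best question-word match is an assumption."""
    scores = {}
    for q_vec in question_vecs.values():
        for t_word, t_vec in text_vecs.items():
            s = cosine_similarity(q_vec, t_vec)
            scores[t_word] = max(scores.get(t_word, -1.0), s)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```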
8. Weight coefficient calculation formula between current word and context word thereof
$$\alpha_{ij} = \frac{\mathrm{rel}(w_i, w_j)}{\sum_{k=1}^{n} \mathrm{rel}(w_i, w_k)} \qquad (3)$$
In formula (3), the numerator represents the correlation between the current term i and the term j in its context, and the denominator represents the sum of the correlations between the current term i and the n terms in its context.
9. Formula for calculating importance degree of current words or phrases
With the weight coefficient between the current word i and its context word j obtained from formula (3), the importance score of the current word or phrase in the text is obtained by weighted summation; the calculation formula is as follows:
$$\mathrm{score}(w_i) = \sum_{j=1}^{n} \alpha_{ij}\, \mathrm{rel}(w_i, w_j) \qquad (4)$$
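A combined sketch of formulas (3) and (4), assuming cosine similarity as the correlation measure rel(·,·), which the patent leaves abstract:

```python
# Sketch of formulas (3) and (4): normalize the correlations of word i with
# its context words into weight coefficients, then take their weighted sum
# as the importance score.
import numpy as np

def importance_score(current_vec, context_vecs):
    """current_vec: embedding of word i; context_vecs: list of context embeddings."""
    rels = np.array([
        float(np.dot(current_vec, c) /
              (np.linalg.norm(current_vec) * np.linalg.norm(c)))
        for c in context_vecs
    ])
    alphas = rels / rels.sum()            # formula (3): weight coefficients
    return float(np.dot(alphas, rels))    # formula (4): weighted summation
```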
10. directional edge weight value calculation formula among key words
The weight value of a candidate keyword in the current graph model is its proportion of the sum of the Euclidean distances between the candidate keyword and all other adjacent node words; the calculation formula is as follows:
$$w_{ij} = \frac{d(\vec{v}_i, \vec{v}_j)}{\sum_{k=1}^{n} d(\vec{v}_i, \vec{v}_k)} \qquad (5)$$

where d(·,·) is the Euclidean distance between node vectors.
11. word weight value normalization processing formula
After the word weight values are obtained, each word's normalized score is computed through normalization, and the final answer word is selected after sorting in descending order. The normalized score of a word in the English text is the ratio of the word's weight value in the current graph model to the sum of the weight values of all words in the current graph model; the calculation formula is as follows:
$$S(w_i) = \frac{W(w_i)}{\sum_{k=1}^{n} W(w_k)} \qquad (6)$$
In formula (6), the weight value of word i in the current graph model is given by formula (5).
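A sketch of formulas (5) and (6) over a matrix of concept node vectors; aggregating incoming edge weights into a per-node weight is an assumption where the patent is not explicit:

```python
# Sketch of formulas (5) and (6): directed edge weights from row-normalized
# Euclidean distances, then normalized per-node scores.
import numpy as np

def directed_edge_weights(node_vecs):
    """node_vecs: (n, d) array. w[i, j] = d(i, j) / sum_k d(i, k); the
    row-wise denominators make the weights asymmetric, hence directed."""
    d = np.linalg.norm(node_vecs[:, None, :] - node_vecs[None, :, :], axis=-1)
    return d / d.sum(axis=1, keepdims=True)       # formula (5)

def normalized_node_scores(edge_weights):
    """Formula (6): each node's weight over the sum of all node weights;
    a node's raw weight is taken as its aggregated incoming edge weight."""
    node_w = edge_weights.sum(axis=0)
    return node_w / node_w.sum()
```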
As shown in FIG. 2, the processing flow of the English text understanding preprocessing module is as follows:
P201 begins;
P202 reads in the English text to be read and the questions;
P203 separates the text to be read from the questions using markers;
P204 removes stop words from the text to be read and the questions;
P205 lowercases the words in the text to be read and the questions;
P206 splits the text to be read and the questions into sentences, forming sentence sequences;
P207 performs word segmentation and phrase segmentation on the text to be read and the questions;
P208 tags the parts of speech of the segmented text sequence and outputs the lists of nouns or noun phrases, verbs, and adjectives in the text to be read;
P209 tags the parts of speech of the segmented question sequence and outputs the lists of nouns or noun phrases, verbs, and adjectives in the questions;
P210 counts the total number of words in the segmented text and question sequences;
P211 groups the segmented text sequence, one group per 20 words, padding groups of fewer than 20 words with NULL (see the sketch after this list);
P212 groups the segmented question sequence, which is generally shorter than 20 words, padding it with NULL;
P213 ends.
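A sketch of the grouping performed in steps P211 and P212, padding with the literal string NULL:

```python
# Sketch of P211-P212: split a token sequence into fixed groups of 20 words,
# padding the last (or only) group with "NULL".
def group_tokens(tokens, size=20, pad="NULL"):
    groups = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    if groups and len(groups[-1]) < size:
        groups[-1] += [pad] * (size - len(groups[-1]))
    return groups

# A question shorter than 20 words becomes a single NULL-padded group:
print(group_tokens(["where", "was", "sam", "born"]))
```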
As shown in FIG. 3, the processing flow of the English text keyword concept semantic feature extraction module is as follows:
P301 begins;
P302 reads the segmented text and question sequences;
P303 computes distributed word vectors for the words in the text to be read and in the questions, generating 200-dimensional vector representations;
P304 uses formula (2) to compute the cosine similarity between nouns or noun phrases in the questions and those in the English text to be read;
P305 sorts the computed cosine similarities in descending order and selects the top five results as candidate keywords or phrases of the text related to the questions;
P306 uses formula (1) to compute the co-occurrence probability of each candidate keyword or phrase with its candidate concepts;
P307 checks whether the co-occurrence probability of the keyword and its candidate concepts is zero; if so, P308 is executed, otherwise P309 is executed (see the sketch after this list);
P308 takes the current candidate keyword or phrase itself as its concept, with its 200-dimensional word vector as its concept representation;
P309 sorts the co-occurrence probabilities of the current candidate keyword with its possible concepts in descending order and determines the concept it belongs to;
P310 vectorizes the determined concept of each keyword, generating a 200-dimensional vector representation;
P311 uses formula (3) to compute the weight coefficients of the current concept within its context;
P312 uses formula (4) to compute the importance score of the current concept within its context;
P313 sorts the importance scores of the current concept within its context in descending order, yielding the semantic features of the current candidate keyword concept;
P314 ends.
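A sketch of the concept decision in steps P306 through P309, reusing the concept_probability function from the formula (1) sketch above:

```python
# Sketch of P306-P309: pick the highest-probability concept; when every
# co-occurrence count is zero, concept_probability (defined above) already
# falls back to the keyword itself, which covers P308.
def select_concept(keyword, candidate_concepts, cooccur_counts):
    probs = concept_probability(keyword, candidate_concepts, cooccur_counts)
    return max(probs, key=probs.get)
```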
As shown in FIG. 4, the processing flow of the English text keyword and concept semantic dependency extraction module is as follows:
P401 begins;
P402 reads the word vector representations of the candidate key nouns or noun phrases;
P403 reads the concept representations of the candidate key nouns or noun phrases;
P404 matches the concept representations against a pre-trained concept semantic dependency set and selects the two top-ranked candidate concept dependencies;
P405 position-encodes the candidate concept dependencies, i.e., computes the positional distance between each concept dependency and its concept pair;
P406 fuses the concept representations of the candidate key nouns or phrases with the concept dependency position codes and feeds the fused vectors into a convolutional neural network;
P407 fuses the word vector representations of the candidate key nouns or noun phrases with the concept dependency position codes and feeds the fused vectors into another convolutional neural network;
P408 applies convolution-layer computation to the input vectors of P406 and P407, which share network parameters;
P409 pools the word vectors and concept vectors produced by the convolutions of P408;
P410 concatenates the pooling results of P409;
P411 classifies the concatenated result of P410 with a softmax function to obtain the final concept dependency result (a sketch of P406-P411 follows this list);
P412 ends.
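A sketch of the two-branch convolutional classifier of steps P406 through P411, written with PyTorch as an assumed framework; a single shared convolution layer stands in for the three-layer structure described in the embodiment, and every dimension beyond the stated 200-dimensional vectors is illustrative:

```python
# Sketch of P406-P411: two input branches (concept stream, word stream) that
# share one convolution (P408), are average-pooled (P409), concatenated
# (P410), and classified with softmax (P411).
import torch
import torch.nn as nn

class DependencyClassifier(nn.Module):
    def __init__(self, dim=200, pos_dim=20, filters=64, n_relations=10):
        super().__init__()
        self.conv = nn.Conv1d(dim + pos_dim, filters, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(1)      # average pooling (P409)
        self.fc = nn.Linear(2 * filters, n_relations)

    def forward(self, concept_seq, word_seq):
        # Each input: (batch, dim + pos_dim, seq_len); the position codes of
        # P405 are assumed to be concatenated onto the vectors beforehand.
        h1 = self.pool(torch.relu(self.conv(concept_seq))).squeeze(-1)
        h2 = self.pool(torch.relu(self.conv(word_seq))).squeeze(-1)   # shared conv
        logits = self.fc(torch.cat([h1, h2], dim=1))                  # P410
        return torch.softmax(logits, dim=1)                           # P411
```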
As shown in FIG. 5, the processing flow of the candidate answer selection module is as follows:
P501 begins;
P502 inputs the concept representations of the candidate key nouns or noun phrases;
P503 inputs the selected keywords and their concept semantic dependencies;
P504 builds a concept semantic graph model with the concept representations of the candidate key nouns or phrases as nodes and the selected keywords and their concept semantic dependencies as edges;
P505 uses formula (5) to compute the Euclidean distance between any two nodes in the concept semantic graph model;
P506 uses formula (6) to compute the directed edge weight values between nodes;
P507 sorts the weight values of all nodes in descending order;
P508 selects the node with the maximum weight value and takes its candidate keyword and concept as the final answer (see the sketch after this list);
P509 ends.
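A sketch tying the selection steps together, reusing the formula (5) and (6) functions defined above:

```python
# Sketch of P504-P508: score the concept nodes and return the candidate
# keyword of the highest-weighted node as the answer.
import numpy as np

def select_answer(keywords, concept_vecs):
    """keywords: list of candidate keywords; concept_vecs: (n, 200) array
    of their concept representations, aligned by index."""
    w = directed_edge_weights(concept_vecs)    # formula (5), defined above
    scores = normalized_node_scores(w)         # formula (6), defined above
    return keywords[int(np.argmax(scores))]
```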
The method solves the problem of English text concept understanding, and its answer results are more accurate than those of traditional English text understanding methods. After an English text and the questions related to it are processed by the understanding method, the concept-level answers to the questions are finally obtained.
Drawings
FIG. 1 is a general processing flow diagram of the method of the invention;
FIG. 2 is a processing flow chart of the English text preprocessing module of the method of the invention;
FIG. 3 is a processing flow chart of the English text keyword concept semantic feature extraction module of the method of the invention;
FIG. 4 is a processing flow chart of the English text keyword and concept semantic dependency extraction module of the method of the invention;
FIG. 5 is a processing flow chart of the candidate answer selection module of the method of the invention.
Detailed Description
The specific implementation of the English text concept understanding method proceeds in the following steps.
First step: execute the "English text preprocessing module".
The English text input in this embodiment of the invention is taken from a standard reading comprehension passage, its questions, and its answers in the Stanford reading comprehension data set. The English text content and questions are as follows:
The English text to be read is as follows:
On June 14, 1946, Sam was born in New York City. After graduating from the military school in 1964, Sam entered the Wharton School of the University of Pennsylvania. In college, Sam carefully learned new knowledge in the business field and cultivated a smart business savvy. In college, Sam entered a real estate company founded by his father. His father's business secrets taught Sam more experience. When he was a senior, he wanted to make a breakthrough in the business world. From time to time, he went abroad to inspect the latest and future economic trends, and deeply realized that the most important corporate business strategy today is "marketing." In 1999, Sam was again active in investment activities in the real estate, casino, entertainment, sports and transportation sectors. His assets have exceeded $3 billion.
The questions and answers are as follows:
Where was Sam born?
Ground Truth Answers: [New York] [New York] [New York]
Prediction: New York
When was Sam born?
Ground Truth Answers: [1946] [1946] [1946]
Prediction: 1946
When did Sam become the President of the United States?
Ground Truth Answers: <No Answer>
Prediction: <No Answer>
(1) After word segmentation and part-of-speech tagging of the English text to be read, the generated part-of-speech tagging result is as follows:
On[on#IN*],[#null*],June[june#NNP*],[#null*],14[14#CD*],[#null*],1946[1946#CD*],[#null*],Donald[donald#NNP*],[#null*],Sam[Sam#NNP*],[#null*],was[is#VBD*],[#null*],born[born#VBN*],[#null*],in[in#IN*],[#null*],New York[new york#NNP*],[#null*],City[city#NNP*],[#null*],After[after#IN*],[#null*],graduating[graduate#VBG*],[#null*],from[from#IN*],[#null*],the[the#DT*],[#null*],military[military#JJ*],[#null*],school[school#NN*],[#null*],in[in#IN*],[#null*],1964[1946#CD*],[#null*],Sam[Sam#NNP*],[#null*],entered[enter#VBD*],[#null*],the[the#DT*],[#null*],Wharton[wharton#NNP*],[#null*],School[school#NNP*],[#null*],of[of#IN*],[#null*],the[the#DT*],[#null*],University[university#NNP*],[#null*],of[of#IN*],[#null*],Pennsylvania[pennsylvania#NNP*],[#null*],college[college#NN*],[#null*],Sam[Sam#NNP*],[#null*],carefully[carefully#RB*],[#null*],learned[learn#VBD*],[#null*],new[new#JJ*],[#null*],knowledge[knowledge#NN*],[#null*],in[in#IN*],[#null*],the[the#DT*],[#null*],business[business#NN*],[#null*],field[field#NN*],[#null*],and[and#CC*],[#null*],cultivated[cultivate#VBD*],[#null*],a[a#DT*],[#null*],smart[smart#JJ*],[#null*],business[business#NN*],[#null*],savvy[savvy#NN*],[#null*],In[in#IN*],[#null*],college[college#NN*],[#null*],Sam[Sam#NNP*],[#null*],entered[enter#VBD*],[#null*],a[a#DT*],[#null*],real[real#JJ*],[#null*],estate[estate#NN*],[#null*],company[company#NN*],[#null*],founded[found#VBD*],[#null*],by[by#IN*],[#null*],his[his#PRP*],[#null*],father[father#NN*],[#null*],His[his#PRP*],[#null*],father[father#NN*],[#null*],business[business#NN*],[#null*],secrets[secret#NNS*],[#null*],taught[teach#VBD*],[#null*],Sam[Sam#NNP*],[#null*],more[more#JJR*],[#null*],experience[experience#NN*],[#null*],When[when#WRB*],[#null*],he[he#PRP*],[#null*],was[is#VBD*],[#null*],a[a#DT*],[#null*],senior[senior#JJ*],[#null*],he[he#PRP*],[#null*],wanted[want#VBD*],[#null*],make[make#VB*],[#null*],breakthrough[breakthrough#NN*],[#null*],business[business#NN*],[#null*],world[world#NN*],[#null*],From[from#IN*],[#null*],time[time#NN*],[#null*],to[to#TO*],[#null*],time[time#NN*],[#null*],he[he#PRP*],[#null*],went[go#VBD*],[#null*],abroad[abroad#RB*],[#null*],inspect[inspect#VB*],[#null*],latest[latest#JJS*],[#null*],and[and#CC*],[#null*],future[future#JJ*],[#null*],economic[economic#JJ*],[#null*],trends[trend#NNS*],[#null*],and[and#CC*],[#null*],deeply[deeply#RB*],[#null*],realized[realize#VBD*],[#null*],that[that#IN*],[#null*],the[the#DT*],[#null*],most[most#RBS*],[#null*],important[important#JJ*],[#null*],corporate[corporate#JJ*],[#null*],business[business#NN*],[#null*],strategy[strategy#NN*],[#null*],today[today#NN*],[#null*],is[is#VBZ*],[#null*],marketing[market#NN*],[#null*],In[in#IN*],[#null*],1999[1999#CD*],[#null*],Sam[Sam#NNP*],[#null*],was[is#VBD*],[#null*],again[again#RB*],[#null*],active[active#JJ*],[#null*],in[in#IN*],[#null*],investment[investment#NN*],[#null*],activities[activity#NNS*],[#null*],real[real#JJ*],[#null*],estate[estate#NN*],[#null*],casino[casino#NN*],[#null*],entertainment[entertainment#NN*],[#null*],sports[sport#NNS*],[#null*],and[and#CC*],[#null*],transportation[transportation#NN*],[#null*],sectors[sector#NNS*],[#null*],His[his#PRP*],[#null*],assets[asset#NNS*],[#null*],have[have#VBP*],[#null*],exceeded[exceed#VBN*],[#null*],billion[billion#CD*][#null*]
Part-of-speech tagging of the question text:
[question#1,Where[where#WRB*],was[is#VBD*],Donald[donald#NNP*],Sam[Sam#NNP*],born[born#VBN*]]
[question#2,When[when#WRB*],was[is#VBD*],Donald[donald#NNP*],Sam[Sam#NNP*],born[born#VBN*]]
[question#3,When[when#WRB*],did[do#VBD*],Sam[Sam#NNP*],become[become#VB*],the[the#DT*],President[president#NNP*],of[of#IN*],the[the#DT*],United[united#NNP*],States[states#NNP*]]
(2) After the nouns and noun phrases in the English text to be read are segmented, the generated segmentation result is as follows:
/On[on#IN*]/June[june#NNP*]/14[14#CD*]/1946[1946#CD*]/DonaldSam[donaldSam#NNP*]/was[is#VBD*]/born[born#VBN*]/in[in#IN*]/NewYork[newyork#NNP*]/City[city#NNP*]/After[after#IN*]/graduating[graduate#VBG*]/from[from#IN*]/the[the#DT*]/military[military#JJ*]/school[school#NN*]/in[in#IN*]/1964[1946#CD*]/Sam[Sam#NNP*]/entered[enter#VBD*]/the[the#DT*]/Wharton[wharton#NNP*]/School[school#NNP*]/of[of#IN*]/the[the#DT*]/University[university#NNP*]/of[of#IN*]/Pennsylvania[pennsylvania#NNP*]/college[college#NN*]/Sam[Sam#NNP*]/carefully[carefully#RB*]/learned[learn#VBD*]/new[new#JJ*]/knowledge[knowledge#NN*]/in[in#IN*]/the[the#DT*]/business[business#NN*]/field[field#NN*]/and[and#CC*]/cultivated[cultivate#VBD*]/a[a#DT*]/smart[smart#JJ*]/business[business#NN*]/savvy[savvy#NN*]/In[in#IN*]/college[college#NN*]/Sam[Sam#NNP*]/entered[enter#VBD*]/a[a#DT*]/real[real#JJ*]/estate[estate#NN*]/company[company#NN*]/founded[found#VBD*]/by[by#IN*]/his[his#PRP*]/father[father#NN*]/His[his#PRP*]/father[father#NN*]/business[business#NN*]/secrets[secret#NNS*]/taught[teach#VBD*]/Sam[Sam#NNP*]/more[more#JJR*]/experience[experience#NN*]/When[when#WRB*]/he[he#PRP*]/was[is#VBD*]/a[a#DT*]/senior[senior#JJ*]/he[he#PRP*]/wanted[want#VBD*]/make[make#VB*]/breakthrough[breakthrough#NN*]/business[business#NN*]/world[world#NN*]/From[from#IN*]/time[time#NN*]/to[to#TO*]/time[time#NN*]/he[he#PRP*]/went[go#VBD*]/abroad[abroad#RB*]/inspect[inspect#VB*]/latest[latest#JJS*]/and[and#CC*]/future[future#JJ*]/economic[economic#JJ*]/trends[trend#NNS*]/and[and#CC*]/deeply[deeply#RB*]/realized[realize#VBD*]/that[that#IN*]/the[the#DT*]/most[most#RBS*]/important[important#JJ*]/corporate[corporate#JJ*]/business[business#NN*]/strategy[strategy#NN*]/today[today#NN*]/is[is#VBZ*]/marketing[market#NN*]/In[in#IN*]/1999[1999#CD*]/Sam[Sam#NNP*]/was[is#VBD*]/again[again#RB*]/active[active#JJ*]/in[in#IN*]/investment[investment#NN*]/activities[activity#NNS*]/real[real#JJ*]/estate[estate#NN*]/casino[casino#NN*]/entertainment[entertainment#NN*]/sports[sport#NNS*]/and[and#CC*]/transportation[transportation#NN*]/sectors[sector#NNS*]/His[his#PRP*]/assets[asset#NNS*]/have[have#VBP*]/exceeded[exceed#VBN*]/billion[billion#CD*]/
Second step: execute the "English text keyword concept semantic feature extraction module".
(1) On the basis of the first step, the nouns or noun phrases in the preprocessed English text are represented as word vectors, each a 200-dimensional vector; the word vector results for some of the words are as follows:
business:[-2.59042799e-01 1.56627929e+00 -1.55328619e+00 1.16095312e-01
8.28763063e-04 1.13678873e+00 1.07951772e+00 6.84864402e-01
-3.05663824e-01 -9.47709203e-01 -9.14580405e-01 1.78567588e-01
9.55694243e-02 1.46830523e+00 4.33245957e-01 5.62674284e-01
-1.20297933e+00 -3.30155420e+00 2.39313304e-01 5.39111316e-01
1.37632453e+00 -5.18846154e-01 -1.72100616e+00 -7.81766713e-01
8.12833726e-01 -6.71297908e-01 -2.55080253e-01 -9.63443890e-02
3.75874341e-02 -1.85547560e-01 -5.85621536e-01 -1.32061994e+00
-1.15084291e+00 1.19156432e+00 6.12567663e-01 -4.88826752e-01
2.49715820e-01 -1.13945462e-01 -4.11442071e-01 7.39667833e-01
7.39755988e-01 6.95835590e-01 -2.12423000e-02 -6.15605295e-01
-8.16631496e-01 -4.95573401e-01 1.19313017e-01 -2.32566208e-01
-7.09587812e-01 -2.01330781e+00 6.02940023e-01 2.97293991e-01
-8.00344229e-01 2.30241203e+00 -7.61904955e-01 -4.40068513e-01
5.51879108e-01 4.55911309e-01 7.38105178e-01 1.89581215e+00
1.05786526e+00 1.08144259e+00 -2.95965791e-01 -9.70735908e-01
7.77064264e-01 1.23684049e+00 -1.16662085e+00 1.25651217e+00
-5.55168211e-01 1.06070185e+00 6.27060890e-01 1.89990854e+00
-4.69613642e-01 3.78263712e-01 1.10785294e+00 5.32317340e-01
1.78810787e+00 -1.90469372e+00 -6.32371485e-01 5.51381886e-01
-2.27715746e-01 -1.09175253e+00 -1.68093562e+00 1.41336232e-01
8.34236890e-02 -2.33603567e-01 -1.16054632e-01 -6.98961541e-02
5.63091874e-01 1.23674989e+00 -5.66389710e-02 -9.67171729e-01
4.83761936e-01 -1.42906487e-01 6.26178682e-01 1.67304240e-02
1.24199748e+00 -3.84036869e-01 4.28546637e-01 -6.10349886e-02
1.66938648e-01 3.96170676e-01 4.63583052e-01 -9.17208970e-01
-5.85813046e-01 -6.92225516e-01 -9.51395154e-01 -6.38596237e-01
3.08472663e-01 -5.36561683e-02 -7.41630197e-02 -1.49298131e-01
-6.27747476e-01 1.96738780e+00 2.24164918e-01 3.24346006e-01
2.43802595e+00 -3.70077312e-01 8.90044630e-01 9.88620240e-03
1.34185135e-01 6.29028857e-01 -1.10365725e+00 -3.79670203e-01
5.07582128e-01 7.99743831e-01 -8.41116905e-01 -1.29741180e+00
-2.33467355e-01 -8.41176212e-01 2.48963069e-02 5.14094293e-01
1.13484383e+00 -7.05592871e-01 5.25330365e-01 -3.20291258e-02
-2.67125368e-01 -4.17263657e-01 2.82960385e-01 -9.61873531e-01
3.51352364e-01 -6.42272592e-01 -2.43765354e+00 2.40605965e-01
-1.68029988e+00 3.13021213e-01 -9.40301061e-01 1.38528538e+00
-1.08122826e-01 -8.73246133e-01 1.75076559e-01 5.97331882e-01
-1.39861321e+00 -3.17869186e-01 3.57864857e-01 -1.39695033e-01
6.25059903e-01 9.22169983e-01 -8.13591704e-02 -9.10186917e-02
-4.52748924e-01 1.60742199e+00 4.60776240e-01 -7.78419793e-01
-1.02559980e-02 1.52036750e+00 -1.84489512e+00 -6.73551381e-01
1.20446825e+00 2.46079013e-01 8.50453556e-01 -7.69736469e-01
1.84337378e-01 1.13760567e+00 4.32253242e-01 -6.89828217e-01
-7.06000090e-01 9.13547158e-01 1.73478693e-01 1.42103589e+00
7.80944586e-01 8.11390400e-01 -7.83208683e-02 -5.13207555e-01
-1.06880486e+00 -7.83280969e-01 -5.65739870e-01 -2.30160475e-01
6.54523432e-01 -9.24793482e-01 -2.84793049e-01 1.01340890e+00
9.57501888e-01 2.22771317e-01 3.90049964e-01 1.60163665e+00
2.16183096e-01 7.16380775e-01 8.28462422e-01 1.71259999e-01]
savvy:[0.06921814 0.08985148 0.10130031 -0.01975576 0.00613875 0.06860386
0.07878992 0.15682952 -0.079765 -0.01364678 0.05102079 0.00548506
0.03024285 0.11446191 0.09568619 0.04286152 -0.13500483 -0.08419026
-0.01513231 0.11023535 0.06145927 0.00069024 -0.06334386 0.02397627
-0.13211721 0.10869574 -0.01575115 0.01712319 0.10889407 -0.03390257
-0.08128685 -0.00774771 0.07443068 -0.02511345 0.02655445 0.10193694
0.01160171 -0.03776457 0.18400234 -0.05345958 0.03763071 0.01195812
0.202218 0.0132231 -0.19167267 0.04500511 0.0789397 -0.01589778
0.13028212 -0.06922863 -0.06018286 0.08444316 -0.03776797 -0.14269106
-0.13448288 0.01259283 0.01702782 -0.00926038 0.01356861 0.03965648
-0.08855332 0.06088002 -0.10612214 -0.09905583 0.06241861 0.1188715
-0.04242382 0.06692507 0.02515559 -0.00878243 0.02058123 -0.00600162
0.05146226 0.10495976 0.06806118 0.03343373 0.11794326 -0.11481091
-0.12138966 0.02585844 -0.03958427 -0.02640601 -0.05624481 -0.01868268
-0.15891208 0.03756193 -0.03025833 0.01944492 0.10282031 -0.03299379
-0.00475729 0.14685485 -0.06587423 0.0149247 0.04896393 -0.06590062
0.11573595 -0.03508269 0.0751999 -0.04895703 0.01599983 0.07251011
-0.09170596 -0.02906534 -0.04846796 0.06372514 -0.07596011 -0.02131839
-0.05209391 0.13131613 -0.22141725 -0.00611135 -0.04040148 -0.03427979
0.0410597 -0.02699451 0.04695193 0.01251158 0.03160017 0.00255954
-0.07341788 -0.05954413 -0.10209412 0.00679443 0.00787201 0.00381293
-0.05103155 -0.14217651 0.05005223 -0.00610479 0.06478029 -0.1646596
0.09607032 -0.09883969 -0.05145364 0.00964217 0.14213578 0.01998526
-0.06588282 -0.0529303 0.06216754 0.02636117 -0.11312462 0.01608072
-0.01465175 -0.00260696 -0.04901178 0.00495274 0.05634578 0.00028076
0.06987215 0.09869573 0.11174746 0.01768979 -0.12532751 -0.04939596
-0.05851451 -0.17550679 0.24233076 0.0345888 0.08057397 -0.02626101
0.00672352 -0.03837141 0.01871823 -0.07934792 0.01752568 -0.133829
0.0478517 0.0792998 -0.02651287 0.05125243 0.09184576 0.15655527
0.03717348 -0.01241744 -0.08104452 0.06890302 0.01926608 -0.10523076
-0.11265913 -0.09659582 0.04266785 0.04144118 -0.14290997 -0.02705677
-0.02053294 0.05827883 -0.01985832 -0.05965782 0.14561172 -0.04690978
0.10358934 0.04019428 0.06787848 0.01593667 -0.13111904 -0.06707609
0.08144604 0.04385952]
……
(2) The similarity between words in the text to be read and in the questions is calculated, and the 20 most related words are ranked by similarity; partial results are as follows:
The first 20 words most relevant to the word business are:
financial 0.6826215982437134
consumer 0.6628485918045044
banking 0.6589778661727905
marketing 0.6573569178581238
corporate 0.6446224451065063
firms 0.6148818731307983
investments 0.6143110990524292
insurance 0.6100685596466064
retail 0.604107141494751
financing 0.5926154851913452
management 0.5904277563095093
buying 0.5883773565292358
businesses 0.5873700380325317
markets 0.5868954062461853
employees 0.5846246480941772
customer 0.583165168762207
marketplace 0.5821336507797241
enterprise 0.5816493034362793
welfare 0.5800684690475464
jobs 0.5792907476425171
The first 20 words most relevant to the word born are:
married 0.6355774402618408
christened 0.5665861368179321
novelist 0.5470004677772522
actress 0.5364381670951843
apprenticed 0.530538022518158
maclean 0.5302119255065918
interred 0.525600790977478
beatrice 0.525336503982544
desmond 0.5203564763069153
beecher 0.5200093388557434
lafcadio 0.5169895887374878
corinne 0.5124737024307251
louise 0.5076141357421875
patricia 0.5058313012123108
anna 0.5041660070419312
sarah 0.5030679702758789
ballerina 0.5028273463249207
angela 0.500499963760376
died 0.4998953342437744
anton 0.4994434416294098
The first 20 words most relevant to the word assets are:
investments 0.7772245407104492
profits 0.7662760019302368
revenues 0.7530128359794617
revenue 0.7483336925506592
funds 0.7441127896308899
investors 0.7420588731765747
firms 0.7401308417320251
debts 0.7333177924156189
loans 0.7315413951873779
shareholders 0.7296478748321533
businesses 0.7258060574531555
employees 0.7210573554039001
costs 0.7146604061126709
expenses 0.7083866596221924
purchases 0.7039198279380798
earnings 0.7029934525489807
subsidies 0.7015666961669922
payments 0.7007849812507629
goods 0.6995357275009155
connections 0.6982542872428894
The first 20 words most relevant to the word government are:
governments 0.7221421003341675
administration 0.6922751069068909
regime 0.6741224527359009
parliament 0.6391890048980713
electorate 0.6347169876098633
prc 0.6314117908477783
legislature 0.6243986487388611
legislation 0.6075990796089172
authorities 0.6037262082099915
senate 0.5914326906204224
parliamentary 0.5884313583374023
coalition 0.5815113186836243
policies 0.5814124345779419
policy 0.5776035785675049
junta 0.5771560668945312
privatization 0.5765987038612366
economy 0.5755563974380493
taxation 0.5730693340301514
autonomy 0.5683175325393677
kmt 0.5680544376373291
The first 20 words most relevant to the word merchant are:
shipyards 0.6756232976913452
kaiserliche 0.6664568185806274
sailing 0.6592236757278442
ship 0.6573899984359741
tonnage 0.647135853767395
ships 0.635455846786499
marine 0.6257590651512146
fleet 0.6237657070159912
marines 0.6213807463645935
warship 0.6195002794265747
aboard 0.619187593460083
sailors 0.6180980205535889
frigate 0.6149691343307495
navy 0.612155556678772
surveyors 0.6083635687828064
harbours 0.6074026823043823
submarines 0.6049712896347046
hms 0.6042121052742004
escort 0.6040891408920288
cruiser 0.6031404733657837
The first 20 words most relevant to the word entrepreneurs are:
journalists 0.6671593189239502
professionals 0.6548882722854614
intellectuals 0.6519579887390137
pioneers 0.6428285241127014
hackers 0.6421672105789185
capitalists 0.6376326084136963
consultants 0.6374378204345703
comedians 0.6370235681533813
economists 0.6340476274490356
executives 0.633492112159729
distributors 0.6326943635940552
businessmen 0.6269378662109375
firms 0.6252130270004272
producers 0.6200482249259949
filmmakers 0.6186020970344543
ventures 0.6152722239494324
investors 0.6144982576370239
charities 0.6126831769943237
engineers 0.6111494302749634
writers 0.6104857325553894
The first 20 words most relevant to the word president are:
presidency 0.7305980324745178
chairman 0.7099910974502563
governor 0.6958410739898682
presidents 0.6945462226867676
taoiseach 0.6547336578369141
chancellor 0.6463114023208618
senator 0.6372398138046265
presidential 0.6284170150756836
deputy 0.6119073629379272
democrat 0.6081264019012451
incumbent 0.5973949432373047
eisenhower 0.5925225019454956
senate 0.5860650539398193
reagan 0.583078145980835
mayor 0.5807799696922302
secretary 0.5800341367721558
pinochet 0.578060507774353
resigns 0.576712429523468
ould 0.5762377381324768
taya 0.5750265121459961
……
(3) Selecting candidate keywords based on similarity ranking of key nouns or noun phrases
The top five keywords are as follows:
Sam:0.8765474881873130
business:0.7866258742548321
business savvy:0.7456898574232562
government:0.7120154685214523
assets:0.6956024587541035
……
(4) Calculating the probability of the concept each candidate keyword belongs to
The calculated concept probabilities for the candidate keyword Sam are as follows:
Sam[merchant,entrepreneurs,president]
Probability of belonging to the concept merchant: 0.8532689542652531
Probability of belonging to the concept entrepreneurs: 0.8325621421303526
Probability of belonging to the concept president: 0.2102145741021432
……
From these probability results it can be seen that, in the text to be read, the probability that the word Sam belongs to the merchant or entrepreneurs concepts is far greater than the probability that it belongs to the president concept.
Third step: execute the "English text keyword and concept semantic dependency extraction module".
(1) Semantic dependencies between key nouns are extracted, and partial results are as follows:
[Sam,born in,New York]
[Sam,born in year,1946]
[Sam,university of,Pennsylvania]
……
After the key nouns or noun phrases are conceptualized, the candidate semantic dependencies between concepts extracted from the knowledge base are as follows:
[Sam,born in,New York]
[Sam,president of,United States]
……
(2) Selection and validation of candidate semantic dependencies
After the semantic dependencies among the candidate key nouns or noun phrases and the candidate dependencies of their conceptualized forms have been extracted, the two kinds of dependency are fed into two independent convolutional networks for feature extraction; in this step, to fully capture the semantic features of both, a three-layer convolutional structure is used to obtain the hidden-layer semantic information of each.
The two independent convolutional networks pool their hidden-layer outputs separately; in this step an average pooling operation takes the weighted average of the hidden-layer information, and the pooled output is fed into the fully connected layer.
In the fully connected layer, the outputs of the two independent pooling layers are concatenated into a new feature vector.
The concatenated vector of the fully connected layer is classified with a softmax function to obtain probability scores for the candidate semantic dependencies; the scores are sorted in descending order, and the highest-probability result is selected as the final semantic dependency.
Fourth step: execute the "candidate answer selection module".
The concept representations of the candidate key nouns or phrases extracted in the second step, i.e., the 200-dimensional vectors, are input.
The concept semantic dependencies between nouns or noun phrases extracted in the third step are input.
A concept semantic graph model is built with the concept representations of the candidate key nouns or phrases as nodes and the concept semantic dependencies as edges.
The directed edge weight values between nodes in the graph model are calculated using formula (5).
After the word weight values are obtained, the normalized score of each word is computed with formula (6), and the final answer word is obtained after sorting in descending order.

Claims (7)

1. An English text concept understanding method, characterized in that the method comprises an understanding model composed of an English text understanding preprocessing module, an English text keyword concept semantic feature extraction module, an English text keyword and concept semantic dependency extraction module, and a candidate answer selection module connected in sequence, wherein the understanding method comprises the following steps:
(1) The English text understanding preprocessing module inputs the English text to be read and the questions and performs word segmentation, stop-word removal, and lowercasing on each; performs part-of-speech tagging and phrase segmentation on the English text and questions so processed; and outputs the preprocessing results of the English text to be read and of the questions;
(2) The English text keyword concept semantic feature extraction module, first, inputs the preprocessing results of the English text to be read and of the questions from the preprocessing module and marks the nouns or noun phrases in both; second, computes word vectors for the marked nouns or noun phrases in the English text and in the questions; third, computes the cosine similarity between nouns or noun phrases in the text to be read and those in the questions, sorts the results in descending order, and selects the top five as candidate key nouns or noun phrases; fourth, computes the co-occurrence probability of each candidate key noun or noun phrase with its candidate concepts and, if every probability is zero, continues to the fifth step, otherwise selects the highest-probability result as the concept the candidate belongs to; fifth, if the co-occurrence probability with every candidate concept is zero, uses the current noun or noun phrase itself as its concept; sixth, computes the weight coefficients between the current keyword and its context words and obtains the final importance score of the current keyword by weighted summation;
(3) The English text keyword and concept semantic dependency extraction module inputs the word vector representations of the candidate key nouns or noun phrases; inputs their concept representations; extracts the semantic dependency relations among the candidate key nouns or noun phrases using a pre-trained semantic dependency set; extracts the concept dependency relations among them using a pre-trained concept dependency set; computes the cosine similarity between each candidate's semantic dependency and concept dependency, sorts the results in descending order, and selects the most similar result as the current keyword and its concept semantic dependency;
(4) The candidate answer selection module inputs the concept representations of the candidate key nouns or noun phrases; inputs the selected keywords and their concept semantic dependencies; builds a concept semantic graph model with the concept representations as nodes and with the selected keywords and their concept semantic dependencies as edges; computes the Euclidean distance between each node vector and the weighted average vector of all nodes, taking the probability distribution over these distances as the node weights; and selects the node with the highest weight as the final answer.
2. The understanding method according to claim 1, characterized in that the English text understanding preprocessing module comprises the following processing steps:
P201 begins;
P202 reads in the English text to be read and the questions;
P203 separates the text to be read from the questions using markers;
P204 removes stop words from the text to be read and the questions;
P205 lowercases the words in the text to be read and the questions;
P206 splits the text to be read and the questions into sentences, forming sentence sequences;
P207 performs word segmentation and phrase segmentation on the text to be read and the questions;
P208 tags the parts of speech of the segmented text sequence and outputs the lists of nouns or noun phrases, verbs, and adjectives in the text to be read;
P209 tags the parts of speech of the segmented question sequence and outputs the lists of nouns or noun phrases, verbs, and adjectives in the questions;
P210 counts the total number of words in the segmented text and question sequences;
P211 groups the segmented text sequence, one group per 20 words, padding groups of fewer than 20 words with NULL;
P212 groups the segmented question sequence, which is generally shorter than 20 words, padding it with NULL;
P213 ends.
3. The understanding method according to claim 1, characterized in that the calculation formulas of the English text keyword concept semantic feature extraction module are defined as follows:
(1) Formula for the concept a noun or noun phrase belongs to
The probability that a noun or noun phrase in the English text belongs to a concept is the ratio of the co-occurrence count of the word and that concept in the current text to the sum of the co-occurrence counts of the word and all its possible concepts in the training text set:
$$P(c_j \mid w) = \frac{N(w, c_j)}{\sum_{k=1}^{n} N(w, c_k)} \qquad (1)$$
(2) Formula for the semantic similarity of nouns or noun phrases in the English text to be read and in the questions
The semantic similarity between nouns or phrases in the English text and in the questions is the ratio of the inner product of their word vectors to the product of the norms of the word vectors:
$$\mathrm{sim}(w_i, w_j) = \frac{\vec{v}_i \cdot \vec{v}_j}{\lVert \vec{v}_i \rVert \, \lVert \vec{v}_j \rVert} \qquad (2)$$
In formula (2), the word vectors are obtained through training;
(3) Formula for the weight coefficient between the current word and its context words
The weight coefficient between the current word and a context word is the ratio of their correlation to the sum of the correlations between the current word and all the words in its context:
$$\alpha_{ij} = \frac{\mathrm{rel}(w_i, w_j)}{\sum_{k=1}^{n} \mathrm{rel}(w_i, w_k)} \qquad (3)$$
(4) Formula for the importance of the current word or phrase
With the weight coefficient between the current word i and context word j from formula (3), the importance score of the current word or phrase in the text is obtained by weighted summation:
$$\mathrm{score}(w_i) = \sum_{j=1}^{n} \alpha_{ij}\, \mathrm{rel}(w_i, w_j) \qquad (4)$$
4. The understanding method according to claim 3, characterized in that the processing steps of the English text keyword concept semantic feature extraction module are as follows:
P301 begins;
P302 reads the segmented text and question sequences;
P303 computes distributed word vectors for the words in the text to be read and in the questions, generating 200-dimensional vector representations;
P304 uses formula (2) to compute the cosine similarity between nouns or noun phrases in the questions and those in the English text to be read;
P305 sorts the computed cosine similarities in descending order and selects the top five results as candidate keywords or phrases of the text related to the questions;
P306 uses formula (1) to compute the co-occurrence probability of each candidate keyword or phrase with its candidate concepts;
P307 checks whether the co-occurrence probability of the keyword and its candidate concepts is zero; if so, P308 is executed, otherwise P309 is executed;
P308 takes the current candidate keyword or phrase itself as its concept, with its 200-dimensional word vector as its concept representation;
P309 sorts the co-occurrence probabilities of the current candidate keyword with its possible concepts in descending order and determines the concept it belongs to;
P310 vectorizes the determined concept of each keyword, generating a 200-dimensional vector representation;
P311 uses formula (3) to compute the weight coefficients of the current concept within its context;
P312 uses formula (4) to compute the importance score of the current concept within its context;
P313 sorts the importance scores of the current concept within its context in descending order, yielding the semantic features of the current candidate keyword concept;
P314 ends.
5. The understanding method according to claim 1, characterized in that: the processing steps of the English text keyword and the concept semantic dependency relation extraction module are as follows:
P401 starts;
P402 reads the word vector representations of the candidate key nouns or noun phrases;
P403 reads the concept representations of the candidate key nouns or noun phrases;
P404 inputs the concept representations into a pre-trained concept semantic dependency set and selects the two top-ranked candidate concept dependencies;
P405 position-encodes the candidate concept dependencies, i.e., computes the positional distance between each concept dependency and the concept pair to which it belongs;
P406 fuses the concept representations of the candidate key nouns or phrases with the concept dependency position codes and inputs the fused vector into a convolutional neural network;
P407 fuses the word vector representations of the candidate key nouns or noun phrases with the concept dependency position codes and inputs the fused vector into another convolutional neural network;
P408 performs the convolution-layer computation on the input vectors of P406 and P407 respectively, with P406 and P407 sharing network parameters;
P409 performs pooling operations on the word vector and concept vector convolution results of P408;
P410 concatenates the respective pooling results obtained in P409;
P411 classifies the concatenated result of P410 with a softmax function to obtain the final concept dependency result;
P412 ends.
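A schematic PyTorch sketch of the dual-channel network in steps P406-P411 follows. The 200-dimensional representations come from the claims, while the position-encoding width, filter count, kernel size, and number of relation classes are assumptions; the single shared `Conv1d` realizes the parameter sharing required by P408.

```python
# Schematic sketch of steps P406-P411; all dimensions other than the
# 200-dim representations are assumed values for illustration.
import torch
import torch.nn as nn

class ConceptDependencyCNN(nn.Module):
    def __init__(self, dim=200, pos_dim=20, n_filters=128, n_relations=10):
        super().__init__()
        # one Conv1d used by both channels, implementing the shared
        # network parameters of P408
        self.conv = nn.Conv1d(dim + pos_dim, n_filters, kernel_size=3,
                              padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.classifier = nn.Linear(2 * n_filters, n_relations)

    def forward(self, concept_repr, word_repr, pos_enc):
        # P406/P407: fuse each representation with the position encoding;
        # inputs are (batch, seq_len, dim) and (batch, seq_len, pos_dim)
        x1 = torch.cat([concept_repr, pos_enc], dim=-1).transpose(1, 2)
        x2 = torch.cat([word_repr, pos_enc], dim=-1).transpose(1, 2)
        h1 = self.pool(torch.relu(self.conv(x1))).squeeze(-1)  # P409
        h2 = self.pool(torch.relu(self.conv(x2))).squeeze(-1)  # P409
        h = torch.cat([h1, h2], dim=-1)                        # P410
        return torch.softmax(self.classifier(h), dim=-1)       # P411
```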
6. The understanding method according to claim 1, characterized in that: the calculation formulas of the candidate answer selection module are defined as follows:
(1) Directed edge weight calculation formula between keywords
The weight value of a directed edge from a candidate keyword in the current graph model refers to the proportion that the Euclidean distance between the candidate keyword and one adjacent node word takes of the sum of the Euclidean distances between the candidate keyword and all of its adjacent node words; the Euclidean distance and the edge weight are calculated as follows:
$$d(v_i, v_j) = \lVert \vec{c}_i - \vec{c}_j \rVert_2 \tag{5}$$
$$w_{ij} = \frac{d(v_i, v_j)}{\sum_{k \in N(i)} d(v_i, v_k)} \tag{6}$$
where $\vec{c}_i$ is the concept vector of node $i$ and $N(i)$ is the set of nodes adjacent to node $i$;
(2) Word weight value normalization formula
The normalized score of a term in the English text refers to the ratio of the term's weight value in the current graph model to the sum of the weight values of all terms in the current graph model; it is calculated as follows:
$$\mathrm{score}(w_i) = \frac{W_i}{\sum_{j=1}^{N} W_j}$$
where $W_i$ is the weight value of term $i$ and $N$ is the number of terms in the current graph model.
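As a small illustration, the graph formulas above can be computed as in the following numpy sketch, assuming each node carries a concept vector; the dictionary-based graph representation is an implementation choice for this sketch, not part of the claims.

```python
# Sketch of formulas (5)/(6) and the normalization formula; the graph is
# represented with plain dictionaries, an assumption for illustration.
import numpy as np

def edge_weights(node_vec: dict, neighbours: dict) -> dict:
    """Formulas (5)/(6): Euclidean distances from each node to its
    neighbours, normalized into directed edge weights."""
    w = {}
    for i, adj in neighbours.items():
        dists = {j: float(np.linalg.norm(node_vec[i] - node_vec[j]))
                 for j in adj}
        total = sum(dists.values())
        for j, d in dists.items():
            w[(i, j)] = d / total          # share of node i's total distance
    return w

def normalize_weights(node_weight: dict) -> dict:
    """Normalization formula: each term's weight as a share of the sum
    of all term weights in the graph model."""
    total = sum(node_weight.values())
    return {n: v / total for n, v in node_weight.items()}
```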
7. The understanding method according to claim 6, characterized in that: the candidate answer selection module comprises the following processing steps:
P501 starts;
P502 inputs the concept representations of the candidate key nouns or noun phrases;
P503 inputs the selected keywords and their concept semantic dependencies;
P504 builds a concept semantic graph model, using the concept representations of the candidate key nouns or phrases as nodes and the concept semantic dependencies of the selected keywords as edges;
P505 uses formula (5) to compute the Euclidean distance between any two nodes in the concept semantic graph model;
P506 uses formula (6) to compute the directed edge weight values between nodes;
P507 sorts the weight values between all nodes in descending order;
P508 selects the maximum weight value between nodes and takes that node's candidate keyword and its concept as the final answer;
P509 ends.
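Putting steps P504-P508 together, a compact end-to-end sketch might look as follows; `concept_vec` and `dependencies` are assumed to come from the earlier modules, and the data structures are illustrative rather than prescribed by the claims.

```python
# End-to-end sketch of steps P504-P508; inputs are assumed to come from
# the preceding extraction modules.
import numpy as np

def select_answer(concept_vec: dict, dependencies: list):
    """concept_vec: keyword -> 200-dim concept vector;
    dependencies: (keyword_a, keyword_b) dependency pairs."""
    neighbours = {k: [] for k in concept_vec}
    for a, b in dependencies:                    # P504: dependencies as edges
        neighbours[a].append(b)
    weights = {}
    for i, adj in neighbours.items():
        if not adj:
            continue
        dists = {j: float(np.linalg.norm(concept_vec[i] - concept_vec[j]))
                 for j in adj}                   # P505: formula (5)
        total = sum(dists.values())
        for j, d in dists.items():
            weights[(i, j)] = d / total          # P506: formula (6)
    best_edge = max(weights, key=weights.get)    # P507/P508: largest weight
    keyword = best_edge[0]
    return keyword, concept_vec[keyword]         # the keyword and its concept
```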
CN202011382136.5A 2020-11-30 2020-11-30 English text concept understanding method Active CN112487806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011382136.5A CN112487806B (en) 2020-11-30 2020-11-30 English text concept understanding method

Publications (2)

Publication Number Publication Date
CN112487806A CN112487806A (en) 2021-03-12
CN112487806B (en) 2023-05-23

Family

ID=74938475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011382136.5A Active CN112487806B (en) 2020-11-30 2020-11-30 English text concept understanding method

Country Status (1)

Country Link
CN (1) CN112487806B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417865B (en) * 2022-01-24 2023-05-26 Ping An Technology (Shenzhen) Co., Ltd. Description text processing method, device and equipment for disaster event and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9015031B2 (en) * 2011-08-04 2015-04-21 International Business Machines Corporation Predicting lexical answer types in open domain question and answering (QA) systems

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN107357783A * 2017-07-04 2017-11-17 Guilin University of Electronic Technology A quality analysis method for Chinese-to-English translation
CN111027314A * 2019-12-10 2020-04-17 Communication University of China Character attribute extraction method based on language fragment
CN111737980A * 2020-06-22 2020-10-02 Guilin University of Electronic Technology Method for correcting English text word use errors

Non-Patent Citations (1)

Title
A survey of machine reading comprehension based on neural networks; Gu Yingjie; Gui Xiaolin; Li Defu; Shen Yi; Liao Dong; Journal of Software (07); full text *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant