CN106776550B

CN106776550B - Method for analyzing the discourse coherence quality of English compositions

Info

Publication number: CN106776550B
Application number: CN201611109331.4A
Authority: CN
Inventors: 黄桂敏; 冯其良; 黄思睿
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2016-12-06
Filing date: 2016-12-06
Publication date: 2019-12-13
Anticipated expiration: 2036-12-06
Also published as: CN106776550A

Abstract

The invention provides a method for analyzing the discourse coherence quality of English compositions. The method uses an analysis model consisting of six sequentially connected modules: an English composition preprocessing module, an English composition grammatical role labeling module, an English composition feature extraction module, an English composition coreference resolution module, an English composition entity chain grid construction module and an English composition coherence analysis module. After an English composition is processed by the analysis model, a coherence quality analysis result for the composition is obtained. The method solves the problem of analyzing coherence when the reference relations and word usage of an English text are not fixed, and its analysis results are better than those of traditional methods for analyzing the coherence quality of English texts.

Description

Method for analyzing the discourse coherence quality of English compositions

Technical Field

The invention relates to natural language processing technology, machine learning algorithms and English discourse content analysis technology, and in particular to a method for analyzing the discourse coherence quality of English compositions.

Background

Traditional methods for analyzing the coherence quality of English texts mainly comprise the latent semantic analysis method and the entity grid analysis method. Latent semantic analysis constructs a word-document matrix, reduces its dimension by singular value decomposition, and analyzes the internal semantic relations among words in the reduced space. However, because singular value decomposition is a purely mathematical transformation, the newly generated matrix is hard to interpret; in addition, latent semantic analysis cannot handle word ambiguity and ignores the order in which words appear. In recent years the entity grid analysis method has gradually replaced latent semantic analysis and become the more widely used method for analyzing the coherence quality of English texts. Moreover, traditional methods generally address the coherence of English news reports. Because the discourse structure, reference relations and word usage of such texts are relatively fixed, traditional methods obtain good analysis results on them. The discourse structure, reference relations and word usage of English compositions, however, are not fixed, so the results that traditional methods obtain when analyzing the coherence quality of English compositions are not ideal. To solve these problems, the invention provides a method for analyzing the discourse coherence quality of English compositions.

Disclosure of Invention

1. A method for analyzing the discourse coherence quality of English compositions, characterized in that: the method comprises an analysis model consisting of an English composition preprocessing module, an English composition grammatical role labeling module, an English composition feature extraction module, an English composition coreference resolution module, an English composition entity chain grid construction module and an English composition coherence analysis module, connected in sequence. The overall processing steps of the analysis model are shown in FIG. 1.

In the analysis model: first, the English composition preprocessing module reads in an English composition, performs paragraph segmentation, sentence splitting, word tokenization, part-of-speech tagging and dependency parsing on it, and outputs the preprocessing result of the English composition; second, the English composition grammatical role labeling module reads the preprocessing result, finds the dependency relations of each entity word in it, labels the grammatical role of each entity word in its sentence according to those dependency relations, and outputs the grammatical roles of the entity words; third, the English composition feature extraction module reads in the preprocessing result, defines the semantic level of each entity word in it, extracts the coreference resolution features of the entity words, and outputs those features; fourth, the English composition coreference resolution module reads in the coreference resolution features output by the feature extraction module, analyzes the coreference relations among the entity words using the coreference resolution model and those features, and outputs a coreference chain list formed by the entity words; fifth, the English composition entity chain grid construction module reads the coreference chain list output by the coreference resolution module, constructs the entity chain grid of the English composition from it, replaces lower-priority grammatical roles in the entity chain grid with higher-priority grammatical roles, and outputs the entity chain grid; sixth, the English composition coherence analysis module reads the entity chain grid output by the entity chain grid construction module, analyzes the sentence coherence of the English composition through the entity chain grid, calculates the coherence quality of the English composition, and outputs the coherence quality analysis result. The processing steps of each module in the analysis model are as follows:

(1) The processing steps of the English composition preprocessing module, shown in FIG. 2, are as follows:

P201 starts;

P202 reads in the English composition;

P203 segments the English composition into paragraphs;

P204 splits the English composition into sentences;

P205 tokenizes the English composition into words;

P206 performs part-of-speech tagging on the English composition according to the word part-of-speech tagging set;

P207 generates a directed word graph of the English composition from the part-of-speech tagging result; in the directed graph, each node is a word together with its part-of-speech tag, and the nodes are connected by directed edges;

P208 assigns each directed edge in the directed graph the weight corresponding to its dependency relation, according to the dependency relation weight set;

P209 generates the sentence dependency relation library of the English composition by greedy search and outputs the dependency parsing result of the English composition;

P210 ends;
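The segmentation steps P203 to P205 can be illustrated with a minimal Python sketch. It uses only regular expressions from the standard library; the function names and the splitting rules are assumptions made for illustration, not the tools used by the invention, and tagging and parsing (P206 to P209) are left out.

```python
import re

def segment_paragraphs(text):
    """P203 (sketch): split a composition into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    """P204 (sketch): break after '.', '!' or '?' when a capital letter follows.

    The whitespace is optional, so text such as "society.After" is also split.
    Abbreviations are not handled; this is a deliberately naive rule.
    """
    parts = re.split(r"(?<=[.!?])\s*(?=[A-Z])", paragraph)
    return [s.strip() for s in parts if s.strip()]

def tokenize(sentence):
    """P205 (sketch): words, clitics such as 'm, and single punctuation marks."""
    return re.findall(r"\w+|'\w+|[^\w\s]", sentence)

if __name__ == "__main__":
    para = "I'm eager to know the society.I will do so from now on."
    for sent in split_sentences(para):
        print(tokenize(sent))
```

The tokenizer separates the clitic in "I'm" into "I" and "'m", which matches the token boundaries a Penn Treebank style tagger expects.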

(2) The processing steps of the English composition grammatical role labeling module, shown in FIG. 3, are as follows:

P301 starts;

P302 reads the dependency parsing result of the English composition;

P303 traverses the leaf nodes of the dependency tree in the sentence dependency relations of the English composition;

P304: if the word at the current node is an entity word, go to P305; otherwise go to P303;

P305 finds the sibling nodes of the current node;

P306 queries the word part-of-speech tagging set for the constituent formed by the current node and its sibling nodes;

P307: if a result is found in the word part-of-speech tagging set, go to P308; otherwise go to P315;

P308 tags the current phrase according to the phrase syntax tagging set;

P309: if the current phrase is a prepositional phrase, go to P310; otherwise go to P311;

P310 processes the prepositional phrase, then goes to P315;

P311: if the current phrase is a noun phrase, go to P312; otherwise go to P313;

P312 processes the noun phrase, then goes to P315;

P313: if the current phrase is a "Most of" phrase, go to P314; otherwise go to P315;

P314 processes the "Most of" phrase, then goes to P315;

P315 obtains all dependency relations involving the current entity word from the sentence dependency relation library of the English composition;

P316: if a nominal subject relation or a clausal subject relation exists, go to P317; otherwise go to P318;

P317 marks the grammatical role of the current entity word as subject, then goes to P321;

P318: if a direct object relation or an indirect object relation exists, go to P319; otherwise go to P320;

P319 marks the grammatical role of the current entity word as object, then goes to P321;

P320 marks the grammatical role of the current entity word as "present";

P321 stores the grammatical role labeling result in the grammatical role labeling list;

P322: if the traversal of the dependency syntax tree is finished, go to P323; otherwise go to P303;

P323 outputs the grammatical role labeling result of the English composition;

P324 ends;
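The decision in P316 to P320 amounts to a priority check over the dependency relations in which the entity word takes part. A minimal Python sketch follows; the relation labels `nsubj`, `csubj`, `dobj` and `iobj` are assumed from the Stanford dependency scheme used in the example output of the embodiment, and the function name is hypothetical.

```python
# Assumed mapping from dependency labels to the roles named in P316-P320.
SUBJECT_RELATIONS = {"nsubj", "csubj"}   # nominal subject, clausal subject
OBJECT_RELATIONS = {"dobj", "iobj"}      # direct object, indirect object

def grammatical_role(relations):
    """Return the role of an entity word given the labels of the
    dependency relations in which it appears as the dependent."""
    labels = set(relations)
    if labels & SUBJECT_RELATIONS:       # P316 -> P317
        return "subject"
    if labels & OBJECT_RELATIONS:        # P318 -> P319
        return "object"
    return "present"                     # P320

if __name__ == "__main__":
    # "society-10" is the dependent of both dobj and nsubj in the embodiment.
    print(grammatical_role(["dobj", "nsubj"]))   # subject is checked first
    print(grammatical_role(["nmod:about"]))
```

Because the subject test runs before the object test, a word that is both an object of one verb and the subject of a relative clause is labeled as subject.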

(3) The processing steps of the English composition feature extraction module, shown in FIG. 4, are as follows:

P401 starts;

P402 reads the dependency parsing result of the English composition;

P403 traverses the leaf nodes of the dependency syntax tree;

P404 detects entity words according to the dependency parsing result;

P405: if the current word is detected as an entity word, go to P406; otherwise go to P403;

P406 creates an "expression" object for storing feature information according to the coreference resolution feature set;

P407 finds the leaf node representing the current entity word in the dependency syntax tree according to the position information of the entity word;

P408 searches upward from the current leaf node for the parent node of the current node;

P409: if the parent node is a noun phrase node, go to P410; otherwise go to P408;

P410 merges all leaf nodes under the noun phrase node into one phrase and stores the phrase in the current "expression" object;

P411 extracts the general features of the entity word according to the coreference resolution feature set and stores them in the current "expression" object;

P412 queries the male name vocabulary and the female name vocabulary respectively;

P413: if the gender of the current entity word is found, go to P414; otherwise go to P415;

P414 stores the gender feature in the current "expression" object, then goes to P416;

P415 sets the gender feature to unknown and stores it in the current "expression" object;

P416 queries the vocabulary of words that commonly denote persons;

P417: if the current entity word is found in the vocabulary of words that commonly denote persons, go to P418; otherwise go to P419;

P418 takes the corresponding semantic level type as the semantic level of the current entity word, then goes to P426;

P419 performs named entity recognition on the entity word;

P420: if the named entity recognition result is a named entity, go to P421; otherwise go to P422;

P421 converts the named entity recognition result into the corresponding semantic level, then goes to P426;

P422 performs date detection on the entity word;

P423: if the entity word denotes a date, go to P424; otherwise go to P425;

P424 sets the semantic level of the entity word to date, then goes to P426;

P425 labels the semantic level of the entity word as "object";

P426 queries the animacy vocabulary, which contains words denoting living beings;

P427: if the current entity word is found in the animacy vocabulary, go to P428; otherwise go to P429;

P428 marks the animacy feature of the entity word as true, then goes to P430;

P429 marks the animacy feature of the entity word as false;

P430 queries the alias table;

P431: if the current entity word is found in the alias table, go to P433; otherwise go to P432;

P432 sets the alias feature to null, then goes to P434;

P433 stores the alias information found in the alias feature of the current entity word;

P434 queries the English entity word common sense knowledge base for facts about the current entity and stores the result in the coreference resolution feature set;

P435: if the traversal of the dependency syntax tree is finished, go to P436; otherwise go to P403;

P436 outputs the feature extraction result of the English composition;

P437 ends;
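Steps P416 to P425 form a fallback cascade for the semantic level: person vocabulary first, then named entity recognition, then date detection, and finally the default "object". The Python sketch below shows only that control flow. The vocabulary set, the named entity lookup table and the date pattern are hypothetical stand-ins, not the resources used by the invention.

```python
import re

# Hypothetical date pattern: ISO dates or English month names.
_MONTHS = {"january", "february", "march", "april", "may", "june", "july",
           "august", "september", "october", "november", "december"}

def looks_like_date(word):
    return bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", word)) or word.lower() in _MONTHS

def semantic_level(word, person_vocab, named_entities):
    """Return the semantic level of an entity word.

    person_vocab   -- set of words that commonly denote persons (assumed resource)
    named_entities -- dict mapping a word to an entity type; stands in for a
                      real named entity recognizer
    """
    if word.lower() in person_vocab:          # P416-P418
        return "person"
    if word in named_entities:                # P419-P421
        return named_entities[word]
    if looks_like_date(word):                 # P422-P424
        return "date"
    return "object"                           # P425

if __name__ == "__main__":
    vocab = {"student", "students", "people"}
    ner = {"Guilin": "location"}
    for w in ["students", "Guilin", "2016-12-06", "campus"]:
        print(w, semantic_level(w, vocab, ner))
```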

(4) The processing steps of the English composition coreference resolution module, shown in FIG. 5, are as follows:

P501 starts;

P502 reads the feature extraction result of the English composition;

P503 traverses the feature extraction result of the English composition;

P504 detects whether the word "it" occurs in a sentence pattern or with a meaning in which it does not refer to any antecedent;

P505: if the word "it" is non-referential, go to P519; otherwise go to P506;

P506: if the current entity word has an appositive, go to P507; otherwise go to P508;

P507 adds the current entity word to the coreference chain containing the appositive, then goes to P521;

P508: if the current entity word is in a clause beginning with "As", go to P509; otherwise go to P511;

P509: if a subject relation exists in the "As" clause, go to P510; otherwise go to P511;

P510 adds the current entity word to the entity chain containing the subject entity word of the "As" clause, then goes to P521;

P511 constructs the list of candidate antecedents;

P512 looks up the word senses of the entity words in the English vocabulary network dictionary;

P513 calculates the lexical semantic similarity between the current "expression" and every candidate antecedent using the semantic classification tree, the word senses of the entity words and formula (1), and adds the similarity to the coreference resolution feature set of the English composition;

P514 converts the feature information of the current entity word and of all candidate antecedents into the format required by the coreference resolution algorithm;

P515 loads the coreference resolution model trained on the coreference resolution training English composition set;

P516 calculates the information gain ratio of each coreference resolution feature according to formulas (2), (3), (4), (5) and (6), and performs coreference resolution using the coreference resolution algorithm;

P517: if the entity words corefer, go to P518; otherwise go to P519;

P518 adds the anaphor to the coreference chain containing the antecedent, then goes to P521;

P519 creates a new coreference chain for the anaphor;

P520 stores the newly created coreference chain in the coreference chain list;

P521: if the traversal of the feature extraction result is finished, go to P522; otherwise go to P503;

P522 outputs the coreference chain list of the English composition;

P523 ends;
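The chain bookkeeping in P517 to P520 can be sketched independently of the classifier: each entity word either joins the chain of an antecedent it corefers with or starts a new chain. In the Python sketch below, `corefers` is a hypothetical callback standing in for the trained coreference resolution model, and the rule-based shortcuts of P504 to P510 are omitted.

```python
def build_chains(mentions, corefers):
    """Group mentions into coreference chains.

    mentions -- entity words in document order
    corefers -- function (antecedent, anaphor) -> bool; a stand-in for the
                trained coreference resolution model of P515-P516
    """
    chains = []
    for mention in mentions:
        for chain in chains:
            # Candidate antecedents: try the most recent member first.
            if any(corefers(a, mention) for a in reversed(chain)):
                chain.append(mention)      # P518: join the antecedent's chain
                break
        else:
            chains.append([mention])       # P519-P520: start a new chain
    return chains

if __name__ == "__main__":
    same_word = lambda a, b: a.lower() == b.lower()   # toy decision rule
    print(build_chains(["society", "students", "Society", "they"], same_word))
```

With the toy string-matching rule, "they" is not linked to "students"; resolving such pronouns is exactly what the feature-based model is for.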

(5) The processing steps of the English composition entity chain grid construction module, shown in FIG. 6, are as follows:

P601 starts;

P602 creates a two-dimensional matrix for storing the grammatical role information of the coreference chains and initializes the matrix;

P603 reads the coreference chain list of the English composition;

P604 moves to the next coreference chain;

P605 moves to the next "expression" of the current coreference chain;

P606 obtains the new grammatical role information of the current "expression";

P607 obtains the old grammatical role of the "expression" from the two-dimensional matrix using the position information of the current "expression";

P608: if the priority of the new grammatical role is higher than that of the old grammatical role, go to P609; otherwise go to P610;

P609 replaces the old grammatical role with the new grammatical role in the two-dimensional matrix at the position of the current "expression";

P610: if unprocessed "expressions" remain in the current coreference chain, go to P605; otherwise go to P611;

P611: if unprocessed coreference chains remain, go to P604; otherwise go to P612;

P612 generates the entity chain grid of the English composition;

P613 ends;
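The replacement rule of P606 to P609 can be sketched in Python as follows. The grid layout (one row per sentence, one column per coreference chain), the single-letter role codes and the priority order subject > object > present > absent are assumptions for illustration; the text above only states that a higher-priority role replaces a lower-priority one.

```python
# Assumed role codes and priorities: S(ubject) > O(bject) > X (present) > - (absent)
PRIORITY = {"-": 0, "X": 1, "O": 2, "S": 3}

def build_grid(chains, n_sentences):
    """Build an entity chain grid.

    chains      -- list of chains; each chain is a list of
                   (sentence_index, role) pairs, one per "expression"
    n_sentences -- number of sentences in the composition
    """
    # P602: initialize every cell to "absent"
    grid = [["-"] * len(chains) for _ in range(n_sentences)]
    for col, chain in enumerate(chains):              # P604
        for sent_idx, new_role in chain:              # P605-P606
            old_role = grid[sent_idx][col]            # P607
            if PRIORITY[new_role] > PRIORITY[old_role]:   # P608
                grid[sent_idx][col] = new_role        # P609
    return grid

if __name__ == "__main__":
    chains = [[(0, "X"), (1, "O"), (1, "S")], [(0, "S")]]
    for row in build_grid(chains, 2):
        print(" ".join(row))
```

When a chain has two expressions in the same sentence, as in sentence 1 of the first chain here, only the higher-priority role is kept in the cell.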

(6) The processing steps of the English composition coherence analysis module, shown in FIG. 7, are as follows:

P701 starts;

P702 initializes the grammatical role transition frequency matrix;

P703 reads in the entity chain grid of the English composition;

P704 traverses the entity chain grid;

P705: if the current entity word is the first element of its entity chain, go to P706; otherwise go to P707;

P706 caches the current grammatical role, then goes to P704;

P707 forms a transition sequence of length 2 from the cached grammatical role and the current grammatical role;

P708 adds 1 to the frequency of the current transition sequence in the grammatical role transition frequency matrix;

P709 replaces the cached grammatical role with the current grammatical role;

P710: if the traversal of the entity chain grid is finished, go to P711; otherwise go to P704;

P711 loads the entity chain grid model trained on the discourse coherence training English composition set;

P712 calculates the transition probability of the grammatical role sequences in the entity chain grid according to formula (7);

P713 calculates the coherence score of the English composition according to formula (8);

P714 weights the score according to the number of different entity words in each coreference chain;

P715 calculates the semantic similarity between adjacent sentences according to formula (9), and calculates the shallow semantic coherence of the English composition according to formula (10);

P716 generates the coherence analysis result of the English composition from the coherence score and the shallow semantic coherence;

P717 ends.
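The counting loop of P702 to P710 can be sketched in Python as follows. The grid is assumed to be a list of rows (one per sentence) whose columns are entity chains, and a `Counter` keyed by role pairs stands in for the grammatical role transition frequency matrix; both choices are illustrative assumptions.

```python
from collections import Counter

def count_transitions(grid):
    """Count length-2 grammatical role transitions down each column."""
    counts = Counter()                        # P702
    n_cols = len(grid[0]) if grid else 0
    for col in range(n_cols):                 # one entity chain per column
        cached = None
        for row in grid:                      # P704
            role = row[col]
            if cached is not None:            # P705: not the first element
                counts[(cached, role)] += 1   # P707-P708
            cached = role                     # P706 / P709
    return counts

if __name__ == "__main__":
    grid = [["X", "S"],
            ["S", "-"],
            ["S", "-"]]
    for pair, n in sorted(count_transitions(grid).items()):
        print(pair, n)
```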

2. The basic concepts of the method of the invention are defined as follows:

(1) Coreference resolution training English composition set and discourse coherence training English composition set

The coreference resolution training English composition set is drawn from English model essays that contain no word errors, grammatical errors or expression errors and that contain coreference phenomena. The discourse coherence training English composition set is drawn from English model essays that contain no word errors, grammatical errors or expression errors and that are coherent as discourse.

(2) Word part-of-speech tagging set and phrase syntax tagging set

Both the word part-of-speech tagging set and the phrase syntax tagging set use the Penn Treebank tag set.

(3) Sentence dependency relation library

The sentence dependency relation library is the collection of all dependency relations of an English composition. In the library, each row is one dependency record, and each dependency relation is stored in the following structure:

dependency relationship (word 1-position number of word 1, word 2-position number of word 2)
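Records in this structure can be read back with a single regular expression. The Python sketch below is illustrative only: the names `Dependency` and `parse_dependencies` are hypothetical, and treating word 1 as the governor and word 2 as the dependent is an assumption based on the Stanford-style output shown in the embodiment.

```python
import re
from typing import NamedTuple

class Dependency(NamedTuple):
    relation: str
    governor: str        # word 1 (assumed to be the head)
    governor_pos: int
    dependent: str       # word 2
    dependent_pos: int

# relation(word1-pos1,word2-pos2); relation labels may contain ':' and '_'
_RECORD = re.compile(r"([\w:]+)\(([^,()\s]+?)-(\d+),\s*([^,()\s]+?)-(\d+)\)")

def parse_dependencies(text):
    """Extract every dependency record found in text, in order."""
    return [Dependency(rel, w1, int(p1), w2, int(p2))
            for rel, w1, p1, w2, p2 in _RECORD.findall(text)]

if __name__ == "__main__":
    line = "nsubj(necessary-3,It-1)cop(necessary-3,is-2)root(ROOT-0,necessary-3)"
    for dep in parse_dependencies(line):
        print(dep)
```

The pattern does not require separators between records, so it also handles lines in which several records are run together.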

(4) Coreference resolution feature set

The coreference resolution feature set is the set of all feature information about the anaphor and the antecedent, and is shown in Table 1:

Table 1: Coreference resolution feature set

(5) English vocabulary network dictionary

The English vocabulary network dictionary is an English dictionary containing the sense information of common English words. Its storage structure is as follows:

part of speech (word frequency) { offset } < affiliated dictionary filename > [ dictionary file number ] (word # meaning number) (meaning "example sentence"

(6) Semantic classification tree

The semantic classification tree is a set of English word relations comprising English words together with their synonymy, antonymy, part-whole, attribute, modification, hypernymy and hyponymy relations. Its storage structure is as follows:

(7) English entity word common sense knowledge base

The English entity word common sense knowledge base is a collection of facts about everyday entities. In the knowledge base, each row is one fact, and each fact consists of a sequence number and a triple <entity 1, relationship, entity 2>. The sequence number is the number of the fact in the knowledge base, entity 1 is the subject, the relationship is the predicate, and entity 2 is the object. Each fact is stored in the following structure:

< sequence number > < entity 1> < relationship > < entity 2>

3. The calculation formula of the method of the invention is defined as follows:

(1) Calculation formula of lexical semantic similarity

In formula (1), path₁ is the path from the lowest common parent node of the two word senses to the root node of the semantic classification tree; path₂ is the path from the sense of the first word to the lowest common parent node of the two word senses; path₃ is the path from the sense of the second word to the lowest common parent node of the two word senses. The lowest common parent node is the first parent node shared by the two sense nodes that is found when searching upward from the nodes where the senses of the two words are located;
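Formula (1) itself is not reproduced in this text. A widely used path-based measure built from the same three quantities is the Wu-Palmer similarity, shown here only as an assumed illustration of how path₁, path₂ and path₃ can be combined; the symbol names are chosen to match the definitions above, and |·| denotes path length:

```latex
\mathrm{sim}(w_1, w_2) =
  \frac{2\,|\mathrm{path}_1|}
       {|\mathrm{path}_2| + |\mathrm{path}_3| + 2\,|\mathrm{path}_1|}
```

Under this form, two senses that share a deep common parent and lie close to it score near 1, while senses whose only common parent is the root score 0.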

(2) Calculation formula of entity word set information entropy

In formula (2), decision category attribute_i is the i-th attribute in the decision category; i is the index of the current attribute, i = 1, 2, …, m; m is the total number of attributes in the decision category.

(3) Calculation formula of coreference resolution feature information gain

$\text{information gain}(\text{feature}_A) = \text{entity word set information entropy} - \text{expected information}(\text{feature}_A)$    (4)

In formula (4), the entity word set information entropy is calculated by formula (2); the coreference resolution feature expected information is calculated by formula (3);

(4) Calculation formula of coreference resolution feature expected information

In formula (3), feature_A is the feature currently being calculated; j is the index of the current attribute, j = 1, 2, …, v; v is the number of attribute categories of feature_A in the entity word set; attribute_j is the j-th attribute of feature_A; the entity word set information entropy is calculated by formula (2);

(5) Calculation formula of coreference resolution feature split information

In formula (5), feature_A is the feature currently being calculated; j is the index of the current attribute, j = 1, 2, …, v; v is the number of attribute categories of feature_A in the entity word set; attribute_j is the j-th attribute of feature_A;

(6) Calculation formula of coreference resolution feature information gain ratio

In formula (6), the information gain is calculated by formula (4); the split information is calculated by formula (5).
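The quantities named in formulas (2) to (6) correspond to the standard gain-ratio computation used in decision tree learning. The Python sketch below implements that standard computation as an illustration; whether the formulas of the invention match it term for term is an assumption, and the function and variable names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy of a list of decision labels (cf. formula (2))."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Information gain ratio of one feature (cf. formulas (3)-(6))."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected information after splitting on the feature (cf. formula (3))
    expected = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - expected                      # cf. formula (4)
    split = -sum(len(g) / total * math.log2(len(g) / total)
                 for g in groups.values())                 # cf. formula (5)
    return gain / split if split > 0 else 0.0              # cf. formula (6)

if __name__ == "__main__":
    corefer = [1, 1, 0, 0]                       # toy decision labels
    gender_match = ["same", "same", "diff", "diff"]
    print(gain_ratio(gender_match, corefer))
```

A feature with a single value has zero split information, so the sketch returns 0.0 in that case instead of dividing by zero.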

(7) Calculation formula of entity chain sequence occurrence probability

In formula (7), i is the index of the current grammatical role in the grammatical role sequence, i = 1, 2, …, v; v is the total number of grammatical roles in the current entity chain; grammatical role_i is the grammatical role currently being computed; grammatical role_{i+1} is the grammatical role that follows it in the sequence; the co-occurrence count is the total number of times the two grammatical roles co-occur in the discourse coherence training English composition set;
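One common way to turn such co-occurrence counts into a sequence probability is a first-order (bigram) model: each transition is scored by its count divided by the total count of transitions leaving the same role, and the scores are multiplied along the chain. The Python sketch below follows that assumption; it is not a reproduction of formula (7), and the names are hypothetical.

```python
def sequence_probability(roles, counts):
    """Probability of a grammatical role sequence under a bigram model.

    roles  -- role sequence of one entity chain, e.g. ["S", "S", "O"]
    counts -- dict mapping (role_i, role_i_plus_1) to its co-occurrence
              count in the training set
    """
    prob = 1.0
    for current, following in zip(roles, roles[1:]):
        # Total number of transitions that start from the current role
        total = sum(c for (first, _), c in counts.items() if first == current)
        if total == 0:
            return 0.0          # unseen starting role: no estimate available
        prob *= counts.get((current, following), 0) / total
    return prob

if __name__ == "__main__":
    counts = {("S", "S"): 3, ("S", "O"): 1, ("O", "S"): 2}
    print(sequence_probability(["S", "S", "O"], counts))
```

A practical model would usually smooth the counts so that an unseen transition does not force the whole sequence to probability zero.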

(8) Calculation formula of English composition coherence score

In formula (8), i is the index of the current sentence, i = 1, 2, …, n; j is the index of the current entity chain, j = 1, 2, …, m; n is the total number of sentences; m is the total number of entity chains; the weight of an entity chain is the number of different entity words in that entity chain; the entity chain sequence occurrence probability is calculated by formula (7);

(9) Calculation formula of sentence semantic coherence

In formula (9), entity chain_i is the entity chain currently being computed; i is the index of the current entity chain, i = 1, 2, …, n; n is the total number of different entity chains in sentence₁ and sentence₂; sentence₁ and sentence₂ are the two adjacent sentences currently being compared;

(10) Calculation formula of shallow semantic coherence

In formula (10), i is the index of the sentence currently being calculated, i = 1, 2, …, n; n is the total number of sentences in the English composition; the sentence semantic coherence is calculated by formula (9).

(V) Description of the drawings

FIG. 1 is a diagram of the overall processing steps of the analysis model of the method of the present invention;

FIG. 2 is a diagram of the processing steps of the English composition preprocessing module of the method of the present invention;

FIG. 3 is a diagram of the processing steps of the English composition grammatical role labeling module of the method of the present invention;

FIG. 4 is a diagram of the processing steps of the English composition feature extraction module of the method of the present invention;

FIG. 5 is a diagram of the processing steps of the English composition coreference resolution module of the method of the present invention;

FIG. 6 is a diagram of the processing steps of the English composition entity chain grid construction module of the method of the present invention;

FIG. 7 is a diagram of the processing steps of the English composition coherence analysis module of the method of the present invention;

(VI) Detailed description of the preferred embodiment

The specific implementation of the method for analyzing the discourse coherence quality of English compositions is divided into the following six steps.

The first step: execute the "English composition preprocessing module"

The title of the English composition in the embodiment of the present invention is "Is it necessary for college students to know about the society". The implementation results are as follows:

(1) The content of the English composition is as follows:

Is it necessary for college students to know about the society?

It is necessary for college students to know about the society. After graduating from campus, they will enter the society, which is quite different from university.

In order to adapt to the complicated society in the future, they should do something now. There are many ways to know the world outside the campus. For example, students can know about it through mass medium, such as TV, newspapers and etc. In addition, they should take part in various activities of society.

As far as I am concerned, I'm eager to know the society. I'm a student of Journalism Department, so I can know many people. I mean to make friends with them after each interview. I will keep in touch with them and communicate with each other. From them, I may learn about part of the world. I will do so from now on.

(2) After part-of-speech tagging is performed on the English composition, the generated part-of-speech tagging result is as follows:

It/PRP is/VBZ necessary/JJ for/IN college/NN students/NNS to/TO know/VB about/IN the/DT society/NN ./. After/IN graduating/VBG from/IN campus/NN ,/, they/PRP will/MD enter/VB the/DT society/NN ,/, which/WDT is/VBZ quite/RB different/JJ from/IN university/NN ./. In/IN order/NN to/TO adapt/VB to/TO the/DT complicated/JJ society/NN in/IN the/DT future/NN ,/, they/PRP should/MD do/VB something/NN now/RB ./.

There/EX are/VBP many/JJ ways/NNS to/TO know/VB the/DT world/NN outside/IN the/DT campus/NN ./. For/IN example/NN ,/, students/NNS can/MD know/VB about/IN the/DT world/NN through/IN mass/NN medium/NN ,/, such/JJ as/IN TV/NN ,/, newspapers/NNS and/CC etc./FW ./. In/IN addition/NN ,/, they/PRP should/MD take/VB part/NN in/IN various/JJ activities/NNS of/IN society/NN ./.

As/RB far/RB as/IN I/PRP am/VBP concerned/VBN ,/, I/PRP 'm/VBP eager/JJ to/TO know/VB the/DT society/NN ./. I/PRP 'm/VBP a/DT student/NN of/IN Journalism/NNPS Department/NNP ,/, so/IN I/PRP can/MD know/VB many/JJ people/NNS ./. I/PRP mean/VBP to/TO make/VB friends/NN with/IN them/PRP after/IN each/DT interview/NN ./. I/PRP will/MD keep/VB in/RB touch/NN with/IN them/PRP and/CC communicate/VB with/IN each/DT other/JJ ./. From/IN them/PRP ,/, I/PRP may/MD learn/VB about/IN part/NN of/IN the/DT world/NN ./. I/PRP will/MD do/VB so/RB from/IN now/RB on/IN ./.

(3) After dependency parsing is performed on the English composition, the generated sentence dependency relation library is as follows:

nsubj(necessary-3,It-1)cop(necessary-3,is-2)root(ROOT-0,necessary-3)mark(know-8,for-4)compound(students-6,college-5)nsubj(know-8,students-6)mark(know-8,to-7)advcl(necessary-3,know-8)case(society-11,about-9)det(society-11,the-10)nmod:about(know-8,society-11)

mark(graduating-2,After-1)advcl(enter-8,graduating-2)case(campus-4,from-3)nmod:from(graduating-2,campus-4)nsubj(enter-8,they-6)aux(enter-8,will-7)root(ROOT-0,enter-8)det(society-10,the-9)dobj(enter-8,society-10)nsubj(different-15,society-10)ref(society-10,which-12)cop(different-15,is-13)advmod(different-15,quite-14)acl:relcl(society-10,different-15)case(university-17,from-16)nmod:from(different-15,university-17)

mark(adapt-4,In-1)mwe(In-1,order-2)mark(adapt-4,to-3)advcl(do-15,adapt-4)case(society-8,to-5)det(society-8,the-6)amod(society-8,complicated-7)nmod:to(adapt-4,society-8)case(future-11,in-9)det(future-11,the-10)nmod:in(adapt-4,future-11)nsubj(do-15,they-13)aux(do-15,should-14)root(ROOT-0,do-15)dobj(do-15,something-16)advmod(do-15,now-17)

expl(are-2,There-1)root(ROOT-0,are-2)amod(ways-4,many-3)nsubj(are-2,ways-4)mark(know-6,to-5)acl(ways-4,know-6)det(world-8,the-7)dobj(know-6,world-8)case(campus-11,outside-9)det(campus-11,the-10)nmod:outside(know-6,campus-11)

case(example-2,For-1)nmod:for(know-6,example-2)nsubj(know-6,students-4)aux(know-6,can-5)root(ROOT-0,know-6)case(world-9,about-7)det(world-9,the-8)nmod:about(know-6,world-9)case(medium-12,through-10)compound(medium-12,mass-11)nmod:through(world-9,medium-12)case(TV-16,such-14)mwe(such-14,as-15)nmod:such_as(medium-12,TV-16)nmod:such_as(medium-12,newspapers-18)conj:and(TV-16,newspapers-18)cc(TV-16,and-19)nmod:such_as(medium-12,etc.-20)conj:and(TV-16,etc.-20)

case(addition-2,In-1)nmod:in(take-6,addition-2)nsubj(take-6,they-4)aux(take-6,should-5)root(ROOT-0,take-6)dobj(take-6,part-7)case(activities-10,in-8)amod(activities-10,various-9)nmod:in(take-6,activities-10)case(society-12,of-11)nmod:of(activities-10,society-12)

advmod(far-2,As-1)advmod(concerned-6,far-2)mark(concerned-6,as-3)nsubjpass(concerned-6,I-4)auxpass(concerned-6,am-5)advcl(eager-10,concerned-6)nsubj(eager-10,I-8)nsubj(know-12,I-8)cop(eager-10,'m-9)root(ROOT-0,eager-10)mark(know-12,to-11)xcomp(eager-10,know-12)det(society-14,the-13)dobj(know-12,society-14)

nsubj(student-4,I-1)cop(student-4,'m-2)det(student-4,a-3)root(ROOT-0,student-4)case(Department-7,of-5)compound(Department-7,Journalism-6)nmod:of(student-4,Department-7)dep(student-4,so-9)nsubj(know-12,I-10)aux(know-12,can-11)parataxis(student-4,know-12)amod(people-14,many-13)dobj(know-12,people-14)

nsubj(mean-2,I-1)nsubj(make-4,I-1)root(ROOT-0,mean-2)mark(make-4,to-3)xcomp(mean-2,make-4)dobj(make-4,friends-5)case(them-7,with-6)nmod:with(make-4,them-7)case(interview-10,after-8)det(interview-10,each-9)nmod:after(make-4,interview-10)

nsubj(keep-3,I-1)nsubj(communicate-9,I-1)aux(keep-3,will-2)root(ROOT-0,keep-3)advmod(keep-3,in-4)dobj(keep-3,touch-5)case(them-7,with-6)nmod:with(keep-3,them-7)cc(keep-3,and-8)conj:and(keep-3,communicate-9)case(other-12,with-10)det(other-12,each-11)nmod:with(communicate-9,other-12)

case(them-2,From-1)nmod:from(learn-6,them-2)nsubj(learn-6,I-4)aux(learn-6,may-5)root(ROOT-0,learn-6)case(part-8,about-7)nmod:about(learn-6,part-8)case(world-11,of-9)det(world-11,the-10)nmod:of(part-8,world-11)

nsubj(do-3,I-1)aux(do-3,will-2)root(ROOT-0,do-3)advmod(do-3,so-4)case(now-6,from-5)advcl:on(do-3,now-6)case(now-6,on-7)
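Each entry in the dependency relationship library above has the form relation(governor-index,dependent-index), with the entries of one sentence concatenated on a single line. The following is a minimal Python sketch for splitting such a line into tuples. The regular expression and the name `parse_dependencies` are illustrative assumptions and are not part of the described method.

```python
import re

# relation(governor-index,dependent-index); relation names may contain
# ':' or '_' (for example 'nmod:such_as').
DEP_RE = re.compile(r"([A-Za-z_:]+)\((\S+?)-(\d+),(\S+?)-(\d+)\)")


def parse_dependencies(line):
    """Return a list of (relation, governor, gov_index, dependent, dep_index)."""
    deps = []
    for rel, gov, gov_idx, dep, dep_idx in DEP_RE.findall(line):
        deps.append((rel, gov, int(gov_idx), dep, int(dep_idx)))
    return deps


line = "nsubj(necessary-3,It-1)cop(necessary-3,is-2)root(ROOT-0,necessary-3)"
for dep in parse_dependencies(line):
    print(dep)
```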

The second step: executing the 'English composition grammar role labeling module'

The English composition grammar role labeling module takes the part-of-speech tagging result and the sentence dependency relationship library output by the English composition preprocessing module in the first step, detects the entity words, labels each entity word with its grammatical role, and finally generates the grammar role labeling result of the English composition. Because many entity words have the grammatical role "not present", only the entity words whose grammatical role is not "not present" are listed below.

It(1)/S society(3)/X society(6)/O society(8)/X students(2)/S they(5)/S they(10)/S campus(4)/X university(7)/X future(9)/X something(11)/O ways(1)/S world(2)/O world(5)/X campus(3)/X students(4)/S they(10)/S medium(6)/X TV(7)/X newspapers(8)/X addition(9)/X part(11)/O activities(12)/X I(1)/O I(2)/S I(4)/S I(6)/S I(8)/S I(12)/S I(16)/S I(18)/S society(3)/O student(5)/X people(7)/O friends(9)/O them(10)/X them(14)/X them(15)/X interview(11)/X touch(13)/O part(17)/X
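The labels above can be compared with the dependency relationship library by a simple lookup. The sketch below reads S, O and X as subject, object and other, the usual entity-grid convention (an assumption here). The mapping itself is inferred from the listed example only: nsubj gives S, dobj and nsubjpass give O, and any other relation gives X. It is not the rule set of the module, whose steps are not reproduced in full, and the function name `grammatical_role` is hypothetical.

```python
def grammatical_role(relation):
    """Map a dependency relation name to S, O or X (assumed mapping)."""
    # Drop a sub-type such as ':about' in 'nmod:about'.
    base = relation.split(":")[0]
    if base == "nsubj":
        return "S"
    if base in ("dobj", "nsubjpass"):
        return "O"
    return "X"


for rel in ("nsubj", "dobj", "nsubjpass", "nmod:about"):
    print(rel, grammatical_role(rel))
```

For instance, nsubj(necessary-3,It-1) corresponds to It(1)/S, dobj(enter-8,society-10) to society(6)/O, and nmod:about(know-8,society-11) to society(3)/X in the listing.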

The third step: executing the 'English composition feature extraction module'

The English composition feature extraction module takes the part-of-speech tagging result, the sentence dependency relationship library and the grammar role labeling result output by the first and second steps, extracts the coreference resolution features of the entity word set of the English composition, and finally outputs the coreference resolution feature set of that entity word set. Because the coreference resolution feature set of the English composition is too large to list in full, only the feature set of the first paragraph is listed below, and the remaining data is replaced by an ellipsis.

[college students,It]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.084746,28.559322,FALSE,TRUE,FALSE,FALSE

[the society,college students]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.084746,0,FALSE,TRUE,TRUE,FALSE

[the society,It]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.444444,0.847458,0,FALSE,TRUE,FALSE,FALSE

[campus,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.235294,0.677966,0.084746,FALSE,FALSE,FALSE,FALSE

[campus,college students]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0,0.084746,FALSE,TRUE,FALSE,FALSE

[campus,It]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.235294,0.677966,0.084746,FALSE,FALSE,FALSE,FALSE

[they,campus]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.762712,0.423729,FALSE,FALSE,FALSE,FALSE

[they,the society]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,2,0,18.389831,28.559322,FALSE,FALSE,TRUE,FALSE

[they,college students]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,0,0,0.084746,28.559322,FALSE,TRUE,TRUE,FALSE

[the society,they]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.847458,6.101695,FALSE,TRUE,TRUE,FALSE

[the society,campus]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.235294,0.762712,0.169492,FALSE,FALSE,FALSE,FALSE

[the society,the society]:TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,1,2.79661,6.101695,FALSE,FALSE,TRUE,TRUE

[university,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.75,0.677966,0.254237,FALSE,FALSE,FALSE,FALSE

[university,they]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.508475,0.084746,FALSE,TRUE,FALSE,FALSE

[university,campus]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.421053,3.220339,0,FALSE,FALSE,FALSE,TRUE

[the complicated society,university]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.75,0.847458,0.254237,FALSE,FALSE,FALSE,FALSE

[the complicated society,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,1,0.59322,0.847458,FALSE,FALSE,TRUE,TRUE

[the future,the complicated society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.625,2.79661,0.677966,FALSE,FALSE,FALSE,FALSE

[the future,university]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.705882,0.762712,0,FALSE,FALSE,FALSE,FALSE

[the future,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.625,2.79661,0.677966,FALSE,FALSE,FALSE,FALSE

[the future,they]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.847458,0.677966,FALSE,TRUE,FALSE,FALSE

[the future,campus]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.705882,0.762712,0,FALSE,FALSE,FALSE,FALSE

[the future,college students]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.847458,0.677966,FALSE,TRUE,FALSE,FALSE

[the future,It]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.625,2.79661,0.677966,FALSE,FALSE,FALSE,FALSE

[they,the future]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,2,0,0.847458,0.677966,FALSE,FALSE,FALSE,FALSE

[they,the complicated society]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,2,0,18.389831,28.559322,FALSE,FALSE,TRUE,FALSE

[they,university]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.762712,0.423729,FALSE,FALSE,FALSE,FALSE

[they,they]:TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,0,0,18.389831,28.559322,FALSE,TRUE,TRUE,TRUE

[something,they]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.508475,6.101695,FALSE,TRUE,FALSE,FALSE

[something,the future]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.461538,0.677966,0.677966,FALSE,FALSE,FALSE,FALSE

[something,the complicated society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.333333,0.677966,6.101695,FALSE,FALSE,FALSE,FALSE

[something,university]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.428571,3.220339,0.169492,FALSE,FALSE,FALSE,FALSE

[something,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.333333,0.677966,6.101695,FALSE,FALSE,FALSE,FALSE

[something,campus]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.428571,3.220339,0.169492,FALSE,FALSE,FALSE,FALSE

[something,college students]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.508475,6.101695,FALSE,TRUE,FALSE,FALSE

[something,It]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.333333,0.677966,6.101695,FALSE,FALSE,FALSE,FALSE

……
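Each line of the feature set above pairs two entity expressions with 18 comma-separated values: ten TRUE/FALSE flags, four numbers and four more TRUE/FALSE flags. The following is a minimal Python sketch for loading one such line. The name `parse_feature_line` is hypothetical, and because the meaning of each position is not stated in this passage, the values are kept positional rather than named.

```python
def parse_feature_line(line):
    """Parse '[a,b]:v1,v2,...' into ((a, b), [values])."""
    pair_part, _, value_part = line.partition("]:")
    first, second = pair_part.lstrip("[").split(",", 1)
    values = []
    for raw in value_part.split(","):
        raw = raw.strip()  # tolerate stray spaces after commas
        if raw in ("TRUE", "FALSE"):
            values.append(raw == "TRUE")
        else:
            values.append(float(raw))
    return (first, second), values


line = ("[they,campus]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,"
        "FALSE,2,0,0.762712,0.423729,FALSE,FALSE,FALSE,FALSE")
pair, values = parse_feature_line(line)
print(pair, len(values))
```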

The fourth step: executing the 'English composition coreference resolution module'

The English composition coreference resolution module takes the coreference resolution feature set output by the English composition feature extraction module in the third step, performs coreference resolution on the entity words in the English composition, and finally generates the coreference chain list of the English composition, as shown in Table 2:

Table 2: Coreference chain list of the English composition

The fifth step: executing the 'English composition entity chain grid construction module'

The English composition entity chain grid construction module takes the coreference chain list output by the English composition coreference resolution module in the fourth step and the grammar role labeling result output by the English composition grammar role labeling module in the second step, and constructs the entity chain grid of the English composition. The constructed entity chain grid is as follows:

The sixth step: executing the 'English composition consistency analysis module'

The English composition consistency analysis module takes the entity chain grid output by the English composition entity chain grid construction module in the fifth step and analyzes the consistency of the English composition according to formula (7), formula (8), formula (9) and formula (10). The analysis result is as follows:

Overall coherence score: 0.3267 (coherent)

Paragraph 1 coherence score: 0.3733 (fairly good coherence)

Sentence 1 and sentence 2 transition score: 6 (very good transition)

Sentence 2 and sentence 3 transition score: 5 (very good transition)

Paragraph 2 coherence score: 0.2119 (barely coherent)

Sentence 1 and sentence 2 transition score: 2 (average transition)

Sentence 2 and sentence 3 transition score: 3 (good transition)

Paragraph 3 coherence score: 0.3949 (fairly good coherence)

Sentence 1 and sentence 2 transition score: 3 (good transition)

Sentence 2 and sentence 3 transition score: 3 (good transition)

Sentence 3 and sentence 4 transition score: 4 (good transition)

Sentence 4 and sentence 5 transition score: 4 (good transition)

Sentence 5 and sentence 6 transition score: 3 (good transition).
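As a quick arithmetic check on the figures above, the overall score of 0.3267 equals the arithmetic mean of the three per-paragraph (segment) scores. This is an observation about the listed numbers only; it is not a statement of formula (8), which is not reproduced in this text.

```python
# Per-paragraph scores as listed in the analysis result.
paragraph_scores = [0.3733, 0.2119, 0.3949]

overall = sum(paragraph_scores) / len(paragraph_scores)
print(round(overall, 4))  # 0.3267
```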

Claims

1. A method for analyzing the consistency quality of English compositions, characterized in that: the analysis method uses an analysis model consisting of an English composition preprocessing module, an English composition grammar role labeling module, an English composition feature extraction module, an English composition coreference resolution module, an English composition entity chain grid construction module and an English composition consistency analysis module which are connected in sequence, and comprises the following steps:

(1) The English composition preprocessing module reads an English composition, performs paragraph segmentation, word segmentation, sentence segmentation, part-of-speech tagging and dependency syntax analysis on the English composition, and outputs the preprocessing result of the English composition;

(2) The English composition grammar role labeling module reads in a preprocessing result of the English composition, finds out the dependency relationship of each entity word in the preprocessing result, labels the grammar roles of the entity words in sentences according to the dependency relationship and outputs the grammar roles of the entity words in the sentences;

(3) the English composition feature extraction module reads in a preprocessing result of the English composition, performs semantic grade definition on entity words in the preprocessing result, extracts coreference resolution features of the entity words at the same time, and outputs the coreference resolution features of the entity words;

(4) The English composition coreference resolution module reads in the entity word coreference resolution features output by the English composition feature extraction module, analyzes the coreference relations of the entity words using the coreference resolution model and the entity word coreference resolution features, and outputs a coreference chain list formed by the entity words;

(5) The English composition entity chain grid construction module reads in the entity word coreference chain list output by the English composition coreference resolution module, constructs the entity chain grid of the English composition from the entity word coreference chain list, replaces lower-priority grammatical roles in the entity chain grid with higher-priority grammatical roles, and outputs the entity chain grid of the English composition;

(6) The English composition consistency analysis module reads in the entity chain grid of the English composition output by the English composition entity chain grid construction module, analyzes the sentence consistency of the English composition using the entity chain grid, calculates the consistency quality of the English composition, and outputs the consistency quality analysis result of the English composition.

2. The analysis method of claim 1, wherein the processing steps of the English composition preprocessing module are as follows:

P201 begins;

P202 reads the English composition;

P203 segments the English composition into paragraphs;

P204 splits the English composition into sentences;

P205 performs word segmentation on the English composition;

P207 generates a word directed graph of the English composition according to the part-of-speech tagging result of the English composition, wherein each node of the directed graph is a word together with its part-of-speech tag, and the nodes are connected by directed edges;

P210 ends.
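The segmentation steps P203 to P205 can be sketched in Python as follows. This is a deliberately naive illustration that uses only regular expressions. The function name `segment` and the splitting rules are assumptions, and a sentence splitter this simple would, for example, break wrongly after an abbreviation such as "etc.". Part-of-speech tagging and dependency parsing require a trained tagger and parser and are not shown.

```python
import re


def segment(composition):
    """Return paragraphs -> sentences -> tokens as nested lists."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", composition) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # A sentence is a run of text ending in '.', '!' or '?'.
        sentences = re.findall(r"[^.!?]+[.!?]+", paragraph)
        # A token is a word (optionally with an apostrophe part) or a
        # single punctuation mark.
        result.append([re.findall(r"\w+(?:'\w+)?|[^\w\s]", s) for s in sentences])
    return result


text = "I like tea. You like milk.\n\nWe agree."
for paragraph in segment(text):
    print(paragraph)
```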

3. The analysis method of claim 1, wherein the processing steps of the English composition grammar role labeling module are as follows:

P301 begins;

P302 reads the dependency syntax analysis result of the English composition;

P305 finds the sibling node of the current node;

P310 processes prepositional phrases, then goes to P315;

P312 processes noun phrases, then goes to P315;

P314 processes the phrase "Most of", then goes to P315;

P320 marks the grammatical role of the current entity word as "present";

P323 outputs the grammar role labeling result of the English composition;

P324 ends.

4. The analysis method of claim 1, wherein the processing steps of the English composition feature extraction module are as follows:

P401 begins;

P402 reads the dependency syntax analysis result of the English composition;

P403 traverses the leaf nodes of the dependency syntax tree;

P416 queries a vocabulary of words that commonly denote persons;

P419 performs named entity recognition on the entity word;

P422 performs date detection on the entity word;

P425 labels the semantic level of the entity word as "object";

P426 queries the animacy vocabulary, which contains words denoting living beings;

P429 labels the animacy feature of the entity word as false;

P430 queries the alias table;

P432 sets the alias feature to null, then goes to P434;

P436 outputs the feature extraction result of the English composition;

P437 ends.

5. The analysis method of claim 1, wherein the calculation formulas of the English composition coreference resolution module are defined as follows:

(1) Calculation formula of lexical semantic similarity

In formula (1), path₁ is the path in the semantic classification tree from the lowest common parent node of the two word senses to the root node; path₂ is the path in the semantic classification tree from the word sense of the first word to the lowest common parent node of the two word senses; path₃ is the path in the semantic classification tree from the word sense of the second word to the lowest common parent node of the two word senses, where the lowest common parent node is the first common parent node found when searching upward from the nodes of the two word senses;

(2) Calculation formula of the information entropy of the entity word set

In formula (2), decision category attribute_i is the i-th attribute in the decision category; i is the index of the current attribute, i = 1, 2, …, m; m is the total number of attributes in the decision category; the total of all decision category attributes is the total number of attributes in the decision category;

(3) Calculation formula of the expected information of a coreference resolution feature

(4) Calculation formula of the information gain of a coreference resolution feature

(5) Calculation formula of the split information of a coreference resolution feature

In formula (6), the information gain is calculated by formula (4) above, and the split information is calculated by formula (5) above.
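The formula images of this claim are not reproduced in this text, so the following Python sketch rests on two stated assumptions. First, `path_similarity` assumes a Wu-Palmer-style ratio, 2·path₁ / (path₂ + path₃ + 2·path₁), which uses the same three path lengths that are defined for formula (1); whether formula (1) has exactly this form is not confirmed here. Second, `entropy` and `gain_ratio` use the standard decision-tree definitions of information entropy, expected information, information gain, split information and their quotient, which match the titles of formulas (2) to (6). All function names are hypothetical.

```python
import math
from collections import Counter


def path_similarity(path1, path2, path3):
    """Assumed Wu-Palmer-style similarity from three path lengths."""
    denominator = path2 + path3 + 2 * path1
    # Both senses sit on the root itself: treat them as identical.
    return 1.0 if denominator == 0 else 2 * path1 / denominator


def entropy(labels):
    """Information entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Information gain of a feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected information after splitting on the feature.
    expected = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - expected
    # Split information is the entropy of the feature values themselves.
    split = entropy(feature_values)
    return gain / split if split > 0 else 0.0


labels = [True, True, False, False]          # coreferent or not (toy data)
print(gain_ratio([True, True, False, False], labels))
print(gain_ratio([True, False, True, False], labels))
print(path_similarity(2, 1, 1))
```

In the toy data, the first feature separates the two classes perfectly and the second carries no information about them.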

6. The analysis method of claim 5, wherein the processing steps of the English composition coreference resolution module are as follows:

P501 begins;

P502 reads the feature extraction result of the English composition;

P503 traverses the feature extraction result of the English composition;

P511 constructs the candidate antecedent list;

P513 calculates the lexical semantic similarity between the current expression and all candidate antecedents using the semantic classification tree, the word senses of the entity words and formula (1), and adds the semantic similarity to the coreference resolution feature set of the English composition;

P516 calculates the information gain rate of each coreference resolution feature according to formula (2), formula (3), formula (4), formula (5) and formula (6), and performs coreference resolution using a coreference resolution algorithm;

P519 creates a new coreference chain for the referent;

P520 stores the newly created coreference chain in the coreference chain list;

P522 outputs the coreference chain list of the English composition;

P523 ends.
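The chain bookkeeping in the steps above (join an existing chain, or create a new chain and store it in the list) can be sketched in Python as follows. The decision function `is_coreferent` stands in for the trained coreference resolution model and is supplied by the caller. Scanning candidates from nearest to farthest and taking the first match is an assumption of this sketch, and all names are hypothetical.

```python
def build_chains(mentions, is_coreferent):
    """Group mentions (in document order) into coreference chains."""
    chains = []    # each chain is a list of mention indices
    chain_of = {}  # mention index -> the chain that contains it
    for i, mention in enumerate(mentions):
        # Candidate antecedents: all earlier mentions, nearest first.
        for j in range(i - 1, -1, -1):
            if is_coreferent(mention, mentions[j]):
                chain_of[j].append(i)
                chain_of[i] = chain_of[j]
                break
        else:
            # No antecedent accepted: start a new chain for this mention.
            chain_of[i] = [i]
            chains.append(chain_of[i])
    return [[mentions[k] for k in chain] for chain in chains]


def toy_decision(mention, antecedent):
    """Toy stand-in for the model: exact match, or 'they' -> students."""
    return mention == antecedent or (
        mention == "they" and antecedent == "college students")


mentions = ["college students", "the society", "they", "the society"]
print(build_chains(mentions, toy_decision))
```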

7. The analysis method of claim 1, wherein the processing steps of the English composition entity chain grid construction module are as follows:

P601 begins;

P603 reads the coreference chain list of the English composition;

P604 traverses the next coreference chain;

P605 traverses the next "expression" of the current coreference chain;

P606 acquires the new grammar role information of the current "expression";

P612 generates the entity chain grid of the English composition;

P613 ends.
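A minimal Python sketch of the grid construction follows. Each coreference chain becomes one column and each sentence one row position; when a chain has several expressions in the same sentence, the higher-priority grammatical role is kept. The priority order S > O > X > "-" (with "-" meaning not present) is the usual entity-grid convention and is an assumption here, as are the names `build_grid` and `ROLE_PRIORITY` and the toy input.

```python
# Assumed priority order: subject > object > other > not present.
ROLE_PRIORITY = {"S": 3, "O": 2, "X": 1, "-": 0}


def build_grid(num_sentences, chains):
    """chains maps a chain name to a list of (sentence_index, role)."""
    grid = {name: ["-"] * num_sentences for name in chains}
    for name, expressions in chains.items():
        for sentence_index, role in expressions:
            current = grid[name][sentence_index]
            # Keep the higher-priority role for this chain and sentence.
            if ROLE_PRIORITY[role] > ROLE_PRIORITY[current]:
                grid[name][sentence_index] = role
    return grid


# Illustrative input only, not the grid of the embodiment.
chains = {
    "students": [(0, "S"), (1, "S"), (2, "S")],
    "society": [(0, "X"), (1, "O"), (1, "X"), (2, "X")],
    "campus": [(1, "X")],
}
for name, column in build_grid(3, chains).items():
    print(name, column)
```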

8. The analysis method of claim 1, wherein the calculation formulas of the English composition consistency analysis module are defined as follows:

(1) Calculation formula of the occurrence probability of an entity chain sequence

(2) Calculation formula of the English composition coherence score

(3) Calculation formula of sentence semantic consistency

In formula (9), entity chain_i is the entity chain currently being computed; i is the index of the current entity chain, i = 1, 2, …, n; n is the total number of distinct entity chains in sentence₁ and sentence₂; sentence₁ and sentence₂ are the sentences currently being computed;

(4) Calculation formula of shallow semantic consistency

9. The analysis method of claim 8, wherein the processing steps of the English composition consistency analysis module are as follows:

P701 begins;

P702 initializes the grammar role transition frequency matrix;

P703 reads in the entity chain grid of the English composition;

P704 traverses the entity chain grid;

P706 caches the current grammar role, then goes to P704;

P709 replaces the cached grammar role with the current grammar role;

P712 calculates the occurrence probability of the entity chain sequence according to formula (7);

P713 calculates the coherence score of the English composition according to formula (8);

P715 calculates the sentence semantic similarity of the English composition according to formula (9), and calculates the shallow semantic coherence of the English composition according to formula (10);

P717 ends.
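The role-transition counting in the steps above (initialize a frequency matrix, traverse the grid, cache the previous role, update the cache) can be sketched in Python as follows. Formula (7) is not reproduced in this text; turning the counts into relative frequencies over all length-two transitions is the usual entity-grid estimate and is an assumption of this sketch, as are all names.

```python
from collections import Counter
from itertools import product

ROLES = ["S", "O", "X", "-"]


def transition_probabilities(grid):
    """grid maps a chain name to its list of roles, one per sentence."""
    # Frequency matrix with every role pair initialized to zero.
    counts = Counter({pair: 0 for pair in product(ROLES, repeat=2)})
    for column in grid.values():
        cached = None  # role of this chain in the previous sentence
        for role in column:
            if cached is not None:
                counts[(cached, role)] += 1
            cached = role  # the current role becomes the cached role
    total = sum(counts.values())
    return {pair: (n / total if total else 0.0) for pair, n in counts.items()}


# Illustrative grid only, not the grid of the embodiment.
grid = {"students": ["S", "S", "S"], "society": ["X", "O", "X"]}
probs = transition_probabilities(grid)
print(probs[("S", "S")], probs[("X", "O")], probs[("O", "X")])
```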