CN106776550B - method for analyzing consistency quality of English literary texts - Google Patents

Publication number
CN106776550B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611109331.4A
Other languages
Chinese (zh)
Other versions
CN106776550A (en
Inventor
黄桂敏
冯其良
黄思睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN201611109331.4A
Publication of CN106776550A
Application granted
Publication of CN106776550B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for analyzing the consistency (discourse coherence) quality of English compositions. The method uses an analysis model consisting of six modules connected in sequence: an English composition preprocessing module, an English composition grammar role labeling module, an English composition feature extraction module, an English composition coreference resolution module, an English composition entity chain grid construction module and an English composition consistency analysis module. After an English composition has been processed by the analysis model, a consistency quality analysis result for the composition is obtained. The method addresses consistency analysis for English compositions, whose reference relations and wording are not fixed, and its analysis results are better than those of traditional methods for analyzing the consistency quality of English texts.

Description

method for analyzing consistency quality of English literary texts
Technical Field
The invention relates to natural language processing, machine learning algorithms and English text content analysis, and in particular to a method for analyzing the consistency quality of English compositions.
Background
Traditional methods for analyzing the consistency quality of English texts mainly comprise latent semantic analysis and entity grid analysis. Latent semantic analysis constructs a word-document matrix and reduces its dimensionality by singular value decomposition in order to analyze the internal semantic relations among words. However, because singular value decomposition is a purely mathematical transformation, the resulting matrix is hard to interpret; latent semantic analysis also cannot handle word ambiguity and ignores the order in which words appear. In recent years, entity grid analysis has gradually replaced latent semantic analysis and become the more widely used method. Moreover, the traditional methods are generally applied to English news reports. Because the discourse structure, reference relations and wording of such texts are relatively fixed, the traditional methods obtain good results on them. The discourse structure, reference relations and wording of English compositions are not fixed, however, so the results that the traditional methods obtain on English compositions are not ideal. To solve these problems, the invention provides a method for analyzing the consistency quality of English compositions.
Disclosure of Invention
1. A method for analyzing the consistency quality of English compositions, characterized in that the method comprises an analysis model consisting of an English composition preprocessing module, an English composition grammar role labeling module, an English composition feature extraction module, an English composition coreference resolution module, an English composition entity chain grid construction module and an English composition consistency analysis module, connected in sequence. The overall processing steps of the analysis model are shown in FIG. 1.
In the analysis model: first, the English composition preprocessing module reads in an English composition, performs paragraph segmentation, sentence splitting, word segmentation, part-of-speech tagging and dependency syntax analysis on it, and outputs the preprocessing result of the English composition; second, the English composition grammar role labeling module reads the preprocessing result, finds the dependency relations of each entity word in it, labels the grammar role of each entity word in its sentence according to those dependency relations, and outputs the grammar roles of the entity words; third, the English composition feature extraction module reads the preprocessing result, assigns a semantic category to each entity word, extracts the coreference resolution features of the entity words, and outputs those features; fourth, the English composition coreference resolution module reads the coreference resolution features output by the feature extraction module, analyzes the coreference relations among the entity words using the coreference resolution model and the features, and outputs a coreference linked list formed by the entity words; fifth, the English composition entity chain grid construction module reads the coreference linked list output by the coreference resolution module, constructs the entity chain grid of the English composition from it, replaces lower-priority grammar roles in the grid with higher-priority grammar roles, and outputs the entity chain grid; sixth, the English composition consistency analysis module reads the entity chain grid output by the entity chain grid construction module, analyzes the sentence consistency of the English composition through the grid, calculates the consistency quality of the English composition, and outputs the consistency quality analysis result. The processing steps of each module in the analysis model are as follows:
(1) The processing steps of the English composition preprocessing module are as follows, as shown in FIG. 2:
P201 starts;
P202 reads the English composition;
P203 segments the English composition into paragraphs;
P204 splits the English composition into sentences;
P205 performs word segmentation on the English composition;
P206 performs part-of-speech tagging on the English composition according to the word part-of-speech tagging set;
P207 generates a word directed graph of the English composition from the part-of-speech tagging result; in the directed graph, each node is a word together with its part-of-speech tag, and the nodes are connected by directed edges;
P208 assigns each directed edge in the directed graph the weight corresponding to its dependency relation, according to the directed-edge dependency relation weight set;
P209 generates the sentence dependency relationship library of the English composition by greedy search and outputs the dependency syntax analysis result of the English composition;
P210 ends;
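Steps P203 to P205 can be illustrated with a minimal Python sketch that uses only regular expressions. The function names and splitting rules below are assumptions made for illustration and are not specified by the patent; part-of-speech tagging and dependency parsing (P206 to P209) need a trained tagger and parser and are not shown.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """P203 (assumed rule): one paragraph per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_sentences(paragraph: str) -> list[str]:
    """P204 (assumed rule): break after '.', '?' or '!' when a capital follows.

    This naive rule ignores abbreviations such as 'e.g.'.
    """
    parts = re.split(r"(?<=[.?!])\s*(?=[A-Z])", paragraph)
    return [part.strip() for part in parts if part.strip()]


def tokenize(sentence: str) -> list[str]:
    """P205 (assumed rule): words, numbers and single punctuation marks."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]", sentence)


def preprocess(text: str) -> list[list[list[str]]]:
    """Return paragraphs -> sentences -> tokens."""
    return [
        [tokenize(sentence) for sentence in split_sentences(paragraph)]
        for paragraph in split_paragraphs(text)
    ]
```

Note that this tokenizer keeps contractions such as "I'm" as one token, whereas Penn Treebank style tokenization splits them.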
(2) The processing steps of the English composition grammar role labeling module are as follows, as shown in FIG. 3:
P301 starts;
P302 reads the dependency syntax analysis result of the English composition;
P303 traverses the leaf nodes of the dependency tree in the sentence dependency relationships of the English composition;
P304 if the word at the current node is an entity word, go to P305, otherwise go to P303;
P305 finds the sibling nodes of the current node;
P306 looks up, in the word part-of-speech tagging set, the constituent formed by the current node and its sibling nodes;
P307 if a result is found in the word part-of-speech tagging set, go to P308, otherwise go to P315;
P308 labels the current phrase according to the phrase syntax tagging set;
P309 if the current phrase is a prepositional phrase, go to P310, otherwise go to P311;
P310 processes the prepositional phrase, then go to P315;
P311 if the current phrase is a noun phrase, go to P312, otherwise go to P313;
P312 processes the noun phrase, then go to P315;
P313 if the current phrase is a "Most of" phrase, go to P314, otherwise go to P315;
P314 processes the "Most of" phrase, then go to P315;
P315 obtains all dependency relations involving the current entity word from the sentence dependency relationship library of the English composition;
P316 if a noun subject relation or a clausal subject relation exists, go to P317, otherwise go to P318;
P317 marks the grammar role of the current entity word as subject, then go to P321;
P318 if a direct object relation or an indirect object relation exists, go to P319, otherwise go to P320;
P319 marks the grammar role of the current entity word as object, then go to P321;
P320 marks the grammar role of the current entity word as "present";
P321 stores the grammar role labeling result of the English composition in the grammar role labeling linked list;
P322 if the traversal of the dependency syntax tree has finished, go to P323, otherwise go to P303;
P323 outputs the grammar role labeling result of the English composition;
P324 ends;
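The decision in P316 to P320 reduces to a priority check over the dependency relations of an entity word. The sketch below is one possible reading: the relation labels `nsubj`, `csubj`, `dobj` and `iobj` and the one-letter role codes are assumptions borrowed from common dependency and entity grid conventions, not names given in the patent.

```python
# Assumed labels for the relations named in P316 and P318.
SUBJECT_RELATIONS = {"nsubj", "csubj"}  # noun subject, clausal subject
OBJECT_RELATIONS = {"dobj", "iobj"}     # direct object, indirect object


def grammatical_role(relations):
    """Map the dependency relations of one entity word to a grammar role.

    Returns "S" (subject), "O" (object) or "X" (the "present" role of P320:
    the entity occurs but is neither subject nor object).
    """
    found = set(relations)
    if found & SUBJECT_RELATIONS:   # P316 -> P317
        return "S"
    if found & OBJECT_RELATIONS:    # P318 -> P319
        return "O"
    return "X"                      # P320
```

For example, `grammatical_role(["det", "nsubj"])` yields the subject role, because the subject test is applied before the object test.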
(3) The processing steps of the English composition feature extraction module are as follows, as shown in FIG. 4:
P401 starts;
P402 reads the dependency syntax analysis result of the English composition;
P403 traverses the leaf nodes of the dependency syntax tree;
P404 detects entity words according to the dependency syntax analysis result;
P405 if the current word is detected as an entity word, go to P406, otherwise go to P403;
P406 creates an "expression" object for storing feature information according to the coreference resolution feature set;
P407 finds the leaf node representing the current entity word in the dependency syntax tree according to the position information of the entity word;
P408 starting from the current leaf node, searches upward for the parent node of the current node;
P409 if the parent node is a noun phrase node, go to P410, otherwise go to P408;
P410 merges all leaf nodes under the noun phrase node into one phrase and stores the phrase in the current "expression" object;
P411 extracts the general features of the entity word according to the coreference resolution feature set and stores them in the current "expression" object;
P412 queries the male name vocabulary and the female name vocabulary;
P413 if the gender of the current entity word is found, go to P414, otherwise go to P415;
P414 stores the gender feature in the current "expression" object, then go to P416;
P415 sets the gender feature to unknown and stores it in the current "expression" object;
P416 queries the vocabulary of words that commonly denote persons;
P417 if the current entity word is found in that vocabulary, go to P418, otherwise go to P419;
P418 takes the corresponding semantic category as the semantic category of the current entity word, then go to P426;
P419 performs named entity recognition on the entity word;
P420 if the recognition result is a named entity, go to P421, otherwise go to P422;
P421 converts the named entity recognition result into the corresponding semantic category, then go to P426;
P422 performs date detection on the entity word;
P423 if the entity word represents a date, go to P424, otherwise go to P425;
P424 sets the semantic category of the entity word to date, then go to P426;
P425 labels the semantic category of the entity word as "object";
P426 queries the animacy vocabulary, which contains words denoting living things;
P427 if the current entity word is found in the animacy vocabulary, go to P428, otherwise go to P429;
P428 marks the animacy feature of the entity word as true, then go to P430;
P429 marks the animacy feature of the entity word as false;
P430 queries the alias table;
P431 if the current entity word is found in the alias table, go to P433, otherwise go to P432;
P432 sets the alias feature to null, then go to P434;
P433 stores the alias information found in the alias feature of the current entity word;
P434 queries the fact information of the current entity in the English entity word common sense knowledge base and stores the result in the coreference resolution feature set;
P435 if the traversal of the dependency syntax tree has finished, go to P436, otherwise go to P403;
P436 outputs the feature extraction result of the English composition;
P437 ends;
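Steps P416 to P425 form a fallback cascade: person vocabulary first, then named entity recognition, then date detection, and finally the default category "object". The sketch below shows that cascade under stated assumptions: the tiny vocabulary, the date pattern and the pluggable `ner` callable are all hypothetical stand-ins for resources the patent does not list.

```python
import re

# Hypothetical stand-in for the vocabulary of words that denote persons.
PERSON_WORDS = {"student": "person", "teacher": "person", "friend": "person"}

# Hypothetical date detector: a four-digit year, a weekday or a month name.
DATE_PATTERN = re.compile(
    r"\d{4}|monday|tuesday|wednesday|thursday|friday|saturday|sunday|"
    r"january|february|march|april|may|june|july|august|september|"
    r"october|november|december"
)


def semantic_category(word, person_words=PERSON_WORDS, ner=None):
    """Return the semantic category of an entity word (P416 to P425).

    `ner` is an optional callable returning a label such as "LOCATION",
    or None when the word is not a named entity.
    """
    key = word.lower()
    if key in person_words:                      # P417 -> P418
        return person_words[key]
    if ner is not None:                          # P419 to P421
        label = ner(word)
        if label:
            return label.lower()
    if DATE_PATTERN.fullmatch(key):              # P422 to P424
        return "date"
    return "object"                              # P425
```

Passing the recognizer as a parameter keeps the sketch independent of any particular named entity recognition library.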
(4) The processing steps of the English composition coreference resolution module are as follows, as shown in FIG. 5:
P501 starts;
P502 reads the feature extraction result of the English composition;
P503 traverses the feature extraction result of the English composition;
P504 detects whether the word "it" occurs in a sentence pattern or with a meaning in which it does not refer to any antecedent;
P505 if the word "it" is non-referential, go to P519, otherwise go to P506;
P506 if the current entity word has an appositive, go to P507, otherwise go to P508;
P507 adds the current entity word to the coreference chain that contains the appositive, then go to P521;
P508 if the current entity word is in a clause beginning with "As", go to P509, otherwise go to P511;
P509 if a subject relation exists in the "As" clause, go to P510, otherwise go to P511;
P510 adds the current entity word to the entity chain that contains the subject entity word of the "As" clause, then go to P521;
P511 constructs the candidate antecedent list;
P512 queries the word senses of the entity words in the English vocabulary network dictionary;
P513 calculates the vocabulary semantic similarity between the current "expression" and all candidate antecedents using the semantic classification tree, the word senses of the entity words and formula (1) below, and adds the similarity to the coreference resolution feature set of the English composition;
P514 converts the feature information of the current entity word and of all candidate antecedents into the format required by the coreference resolution algorithm;
P515 loads the coreference resolution model trained on the coreference resolution training English composition set;
P516 calculates the information gain rate of each coreference resolution feature according to formulas (2), (3), (4), (5) and (6) below, and performs coreference resolution using the coreference resolution algorithm;
P517 if the entity words are coreferent, go to P518, otherwise go to P519;
P518 adds the anaphor to the coreference chain that contains the antecedent, then go to P521;
P519 creates a new coreference chain for the entity word;
P520 stores the newly created coreference chain in the coreference linked list;
P521 if the traversal of the feature extraction result has finished, go to P522, otherwise go to P503;
P522 outputs the coreference linked list of the English composition;
P523 ends;
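The chain bookkeeping in P511 and P517 to P520 can be sketched independently of the trained model. In the sketch below, `is_coreferent` is a hypothetical stand-in for the model's pairwise decision, and scanning candidate antecedents from the nearest to the farthest is an assumption; the patent does not state the search order.

```python
def build_coreference_chains(mentions, is_coreferent):
    """Group mentions into coreference chains.

    mentions: entity words in document order.
    is_coreferent(antecedent, anaphor): pairwise decision (assumed interface).
    """
    chains = []        # the coreference linked list: one list per chain
    chain_index = {}   # position of a mention -> index of its chain
    for i, mention in enumerate(mentions):
        # P511: candidate antecedents are all earlier mentions, nearest first.
        for j in range(i - 1, -1, -1):
            if is_coreferent(mentions[j], mention):        # P517
                chains[chain_index[j]].append(mention)     # P518
                chain_index[i] = chain_index[j]
                break
        else:
            chain_index[i] = len(chains)                   # P519
            chains.append([mention])                       # P520
    return chains
```

A toy predicate that links "students" with "they" is enough to exercise the function; a real system would call the trained classifier instead.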
(5) The processing steps of the English composition entity chain grid construction module are as follows, as shown in FIG. 6:
P601 starts;
P602 creates a two-dimensional matrix for storing the grammar role information of the coreference chains and initializes the matrix;
P603 reads the coreference linked list of the English composition;
P604 moves to the next coreference chain;
P605 moves to the next "expression" of the current coreference chain;
P606 obtains the new grammar role information of the current "expression";
P607 obtains the old grammar role of the "expression" from the two-dimensional matrix using the position information of the current "expression";
P608 if the priority of the new grammar role is higher than that of the old grammar role, go to P609, otherwise go to P610;
P609 replaces the old grammar role with the new grammar role in the two-dimensional matrix at the position of the current "expression";
P610 if the current coreference chain still contains unprocessed "expressions", go to P605, otherwise go to P611;
P611 if unprocessed coreference chains remain, go to P604, otherwise go to P612;
P612 generates the entity chain grid of the English composition;
P613 ends;
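The grid construction in P602 to P612 can be sketched as filling a sentence-by-chain matrix and keeping the highest-priority role per cell. The role codes and the priority order subject > object > other > absent are assumptions taken from the usual entity grid convention; the patent only says that a higher-priority role replaces a lower-priority one.

```python
# Assumed priority order; "-" means the chain is absent from the sentence.
ROLE_PRIORITY = {"-": 0, "X": 1, "O": 2, "S": 3}


def build_entity_chain_grid(chains, num_sentences):
    """Build the entity chain grid.

    chains: one list per coreference chain, each item a
            (sentence_index, role) pair for one "expression".
    Returns a matrix with one row per sentence and one column per chain.
    """
    grid = [["-"] * len(chains) for _ in range(num_sentences)]   # P602
    for column, chain in enumerate(chains):                      # P604
        for sentence_index, new_role in chain:                   # P605, P606
            old_role = grid[sentence_index][column]              # P607
            if ROLE_PRIORITY[new_role] > ROLE_PRIORITY[old_role]:  # P608
                grid[sentence_index][column] = new_role          # P609
    return grid                                                  # P612
```

When a chain is mentioned twice in one sentence, for instance once as "X" and once as "S", the cell keeps "S".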
(6) The processing steps of the English composition consistency analysis module are as follows, as shown in FIG. 7:
P701 starts;
P702 initializes the grammar role transition frequency matrix;
P703 reads in the entity chain grid of the English composition;
P704 traverses the entity chain grid;
P705 if the current entity word is the first element of its entity chain, go to P706, otherwise go to P707;
P706 caches the current grammar role, then go to P704;
P707 forms a transition sequence of length 2 from the cached grammar role and the current grammar role;
P708 adds 1 to the frequency of the current transition sequence in the grammar role transition frequency matrix;
P709 replaces the cached grammar role with the current grammar role;
P710 if the traversal of the entity chain grid has finished, go to P711, otherwise go to P704;
P711 loads the entity chain grid model trained on the discourse coherence training English composition set;
P712 calculates the transition probability of the grammar role sequences in the entity chain grid according to formula (7) below;
P713 calculates the consistency score of the English composition according to formula (8) below;
P714 weights the score according to the number of different entity words in each coreference chain;
P715 calculates the semantic similarity between adjacent sentences according to formula (9) below, and calculates the shallow semantic consistency of the English composition according to formula (10) below;
P716 generates the consistency analysis result of the English composition from the consistency score and the shallow semantic consistency;
P717 ends.
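The counting loop in P702 to P710 can be sketched as follows. The grid layout (rows are sentences, columns are entity chains) is an assumption, and the relative-frequency helper is only an illustrative normalization; it is not the patent's formula (7), which is not reproduced in this text.

```python
from collections import Counter


def count_role_transitions(grid):
    """Count length-2 grammar role transitions down each entity chain column."""
    counts = Counter()                       # P702
    if not grid:
        return counts
    for column in range(len(grid[0])):       # P704: one entity chain at a time
        cached = None
        for row in grid:
            role = row[column]
            if cached is not None:           # P707, P708
                counts[(cached, role)] += 1
            cached = role                    # P706 for the first element, else P709
    return counts


def relative_frequencies(counts):
    """Illustrative normalization: share of each transition among all transitions."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}
```

With three sentences and two chains, each column contributes two transitions, so four transitions are counted in total.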
2. The basic concepts of the method of the invention are defined as follows:
(1) Coreference resolution training English composition set and discourse coherence training English composition set
The coreference resolution training English composition set is obtained from English model essays that contain no word errors, grammar errors or expression errors and that exhibit coreference phenomena; the discourse coherence training English composition set is obtained from English model essays that contain no word errors, grammar errors or expression errors and that are coherent as discourse.
(2) Word part-of-speech tagging set and phrase syntax tagging set
The word part-of-speech tagging set and the phrase syntax tagging set are those of the Penn Treebank.
(3) sentence dependency relationship library
The sentence dependency relationship library is the collection of all dependency relations of an English composition. In the library, each row is one dependency record, and each dependency relation is stored in the following structure:
dependency relation (word 1-position number of word 1, word 2-position number of word 2)
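A record in this storage structure can be read with one regular expression. The sketch below is a minimal parser; the sample record `nsubj(know-3, students-2)` and the field names are assumptions made for illustration, since the patent gives only the generic pattern.

```python
import re
from typing import NamedTuple, Optional


class Dependency(NamedTuple):
    relation: str
    word1: str
    position1: int
    word2: str
    position2: int


# relation(word1-pos1, word2-pos2); words may themselves contain hyphens.
_RECORD = re.compile(r"\s*([\w:]+)\((.+?)-(\d+),\s*(.+)-(\d+)\)\s*")


def parse_dependency(record: str) -> Optional[Dependency]:
    """Parse one row of the sentence dependency relationship library."""
    match = _RECORD.fullmatch(record)
    if match is None:
        return None
    relation, word1, pos1, word2, pos2 = match.groups()
    return Dependency(relation, word1, int(pos1), word2, int(pos2))
```

Rows that do not follow the pattern return `None` rather than raising, so a caller can skip malformed lines.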
(4) coreference resolution feature set
The coreference resolution feature set is the set of all feature information of the anaphor and the antecedent, as shown in Table 1:
Table 1: Coreference resolution feature set
(5) English vocabulary network dictionary
The English vocabulary network dictionary is an English dictionary containing the word sense information of common English words; its storage structure is as follows:
part of speech (word frequency) { offset } < affiliated dictionary filename > [ dictionary file number ] (word # meaning number) (meaning "example sentence"
(6) semantic classification tree
The semantic classification tree is a set of relations among English words, comprising the words themselves and their synonymy, antonymy, part-whole, attribute, modification, hypernymy and hyponymy relations; its storage structure is as follows:
(7) English entity word common sense knowledge base
The English entity word common sense knowledge base is a collection of facts about everyday entities. In the knowledge base, each row represents one fact. Each fact consists of a sequence number and a triple <entity 1, relationship, entity 2>: the sequence number is the number of the fact in the knowledge base, entity 1 is the subject, the relationship is the predicate, and entity 2 is the object. The storage structure of each fact is as follows:
< sequence number > < entity 1> < relationship > < entity 2>
3. The calculation formulas of the method of the invention are defined as follows:
(1) Calculation formula of the vocabulary semantic similarity
In formula (1), path 1 is the path from the lowest common parent node of the two word senses to the root node of the semantic classification tree; path 2 is the path from the word sense of the first word to the lowest common parent node of the two word senses; path 3 is the path from the word sense of the second word to the lowest common parent node of the two word senses. The lowest common parent node is the first common parent node found when searching upward from the nodes at which the word senses of the two words are located;
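Formula (1) itself is not reproduced in this text. The three paths it names match the ingredients of the Wu-Palmer similarity, so the sketch below assumes the form similarity = 2 × path 1 / (path 2 + path 3 + 2 × path 1), with path lengths counted in edges. This is an assumption, not a confirmed reading of the patent, and the toy tree is hypothetical.

```python
def path_to_root(node, parent):
    """Return [node, parent of node, ..., root] using a child -> parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def semantic_similarity(sense1, sense2, parent):
    """Wu-Palmer style similarity (assumed form of formula (1))."""
    if sense1 == sense2:
        return 1.0
    distance_from_2 = {n: d for d, n in enumerate(path_to_root(sense2, parent))}
    for path2, node in enumerate(path_to_root(sense1, parent)):
        if node in distance_from_2:             # lowest common parent node
            path1 = len(path_to_root(node, parent)) - 1
            path3 = distance_from_2[node]
            return 2 * path1 / (path2 + path3 + 2 * path1)
    raise ValueError("the two senses share no common parent node")


# Hypothetical fragment of a semantic classification tree (child -> parent).
TOY_TREE = {
    "person": "entity", "place": "entity",
    "student": "person", "teacher": "person", "campus": "place",
}
```

Under this convention two senses whose only common parent is the root get similarity 0, because path 1 then has length 0.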
(2) Calculation formula of the entity word set information entropy
In formula (2), decision category attribute i is the i-th attribute of the decision category; i is the serial number of the current attribute, i = 1, 2, …, m; m is the total number of attributes of the decision category.
(3) Calculation formula of the coreference resolution feature information gain
coreference resolution feature information gain = entity word set information entropy - coreference resolution feature expected information (4)
In formula (4), the entity word set information entropy is calculated by formula (2), and the coreference resolution feature expected information is calculated by formula (3);
(4) Calculation formula of the coreference resolution feature expected information
In formula (3), feature A is the feature currently being calculated; j is the serial number of the current attribute, j = 1, 2, …, v; v is the number of attribute categories of feature A in the entity word set; attribute j is the j-th attribute of feature A; the entity word set information entropy is calculated by formula (2);
(5) Calculation formula of the coreference resolution feature split information
In formula (5), feature A is the feature currently being calculated; j is the serial number of the current attribute, j = 1, 2, …, v; v is the number of attribute categories of feature A in the entity word set; attribute j is the j-th attribute of feature A;
(6) Calculation formula of the coreference resolution feature information gain rate
In formula (6), the information gain is calculated by formula (4) above, and the split information is calculated by formula (5) above.
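Only formula (4) survives in this text. The quantities named in formulas (2) to (6) correspond to the standard information gain ratio used by C4.5-style decision trees, so the sketch below assumes those standard definitions with base-2 logarithms; the function names and the toy data in the usage note are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Assumed formula (2): Shannon entropy of the decision categories."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Assumed formulas (3) to (6) for one coreference resolution feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Formula (3): expected information after splitting on the feature.
    expected = sum(len(g) / total * entropy(g) for g in groups.values())
    # Formula (4): information gain = entropy - expected information.
    gain = entropy(labels) - expected
    # Formula (5): split information of the feature itself.
    split = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    # Formula (6): gain rate; defined as 0 when the feature has a single value.
    return gain / split if split > 0 else 0.0
```

A feature that separates coreferent from non-coreferent pairs perfectly gets the highest ratio, while a feature that is independent of the label gets 0.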
(7) Calculation formula of the occurrence probability of an entity chain sequence
In formula (7), i is the serial number of the current grammar role in the grammar role sequence, i = 1, 2, …, v; v is the total number of grammar roles in the current entity chain; grammar role i is the grammar role currently being calculated; grammar role i+1 is the grammar role that follows it in the sequence; the co-occurrence count is the total number of times the two grammar roles occur together in the discourse coherence training English composition set;
(8) Calculation formula of the English composition consistency score
In formula (8), i is the serial number of the current sentence, i = 1, 2, …, n; j is the serial number of the current entity chain, j = 1, 2, …, m; n is the total number of sentences; m is the total number of entity chains; the weight of an entity chain is the number of different entity words in that entity chain; the occurrence probability of the entity chain sequence is calculated by formula (7);
(9) Calculation formula of the sentence semantic consistency
In formula (9), entity chain i is the entity chain currently being calculated; i is the serial number of the current entity chain, i = 1, 2, …, n; n is the total number of distinct entity chains in sentence 1 and sentence 2; sentence 1 and sentence 2 are the two sentences currently being calculated;
(10) Calculation formula of the shallow semantic consistency
In formula (10), i is the serial number of the sentence currently being calculated, i = 1, 2, …, n; n is the total number of sentences in the English composition; the sentence semantic consistency is calculated by formula (9) above.
(V) description of the drawings
FIG. 1 is a diagram of the overall processing steps of the English composition of the method of the present invention;
FIG. 2 is a diagram of the processing steps of the English composition preprocessing module of the method of the present invention;
FIG. 3 is a diagram of the processing steps of the English composition grammar role labeling module of the method of the present invention;
FIG. 4 is a diagram of the processing steps of the English composition feature extraction module of the method of the present invention;
FIG. 5 is a diagram of English composition coreference resolution module processing steps of the method of the present invention;
FIG. 6 is a diagram of the processing steps of the English composition entity chain grid building module of the method of the present invention;
FIG. 7 is a diagram of the processing steps of the English composition consistency analysis module of the method of the present invention;
(VI) detailed description of the preferred embodiment
The specific implementation of the method for analyzing the consistency quality of English compositions is divided into the following six steps.
Step 1: execute the English composition preprocessing module
The title of the English composition in this embodiment of the invention is "Is it necessary for college students to know about the society?", and the implementation results are as follows:
(1) The content of the English composition is as follows:
Is it necessary for college students to know about the society?
It is necessary for college students to know about the society. After graduating from campus, they will enter the society, which is quite different from university.
In order to adapt to the complicated society in the future, they should do something now. There are many ways to know the world outside the campus. For example, students can know about it through mass medium, such as TV, newspapers and etc. In addition, they should take part in various activities of society.
As far as I am concerned, I'm eager to know the society. I'm a student of Journalism Department, so I can know many people. I mean to make friends with them after each interview. I will keep in touch with them and communicate with each other. From them, I may learn about part of the world. I will do so from now on.
(2) After part-of-speech tagging is performed on the English composition, the generated part-of-speech tagging result is as follows:
It/PRP is/VBZ necessary/JJ for/IN college/NN students/NNS to/TO know/VB about/IN the/DT society/NN ./. After/IN graduating/VBG from/IN campus/NN ,/, they/PRP will/MD enter/VB the/DT society/NN ,/, which/WDT is/VBZ quite/RB different/JJ from/IN university/NN ./. In/IN order/NN to/TO adapt/VB to/TO the/DT complicated/JJ society/NN in/IN the/DT future/NN ,/, they/PRP should/MD do/VB something/NN now/RB ./.
There/EX are/VBP many/JJ ways/NNS to/TO know/VB the/DT world/NN outside/IN the/DT campus/NN ./. For/IN example/NN ,/, students/NNS can/MD know/VB about/IN the/DT world/NN through/IN mass/NN medium/NN ,/, such/JJ as/IN TV/NN ,/, newspapers/NNS and/CC etc./FW ./. In/IN addition/NN ,/, they/PRP should/MD take/VB part/NN in/IN various/JJ activities/NNS of/IN society/NN ./.
As/RB far/RB as/IN I/PRP am/VBP concerned/VBN ,/, I/PRP 'm/VBP eager/JJ to/TO know/VB the/DT society/NN ./. I/PRP 'm/VBP a/DT student/NN of/IN Journalism/NNPS Department/NNP ,/, so/IN I/PRP can/MD know/VB many/JJ people/NNS ./. I/PRP mean/VBP to/TO make/VB friends/NN with/IN them/PRP after/IN each/DT interview/NN ./. I/PRP will/MD keep/VB in/RB touch/NN with/IN them/PRP and/CC communicate/VB with/IN each/DT other/JJ ./. From/IN them/PRP ,/, I/PRP may/MD learn/VB about/IN part/NN of/IN the/DT world/NN ./. I/PRP will/MD do/VB so/RB from/IN now/RB on/IN ./.
(3) After dependency parsing is performed on the English composition, the generated sentence dependency relationship library is as follows:
nsubj(necessary-3,It-1)cop(necessary-3,is-2)root(ROOT-0,necessary-3)mark(know-8,for-4)compound(students-6,college-5)nsubj(know-8,students-6)mark(know-8,to-7)advcl(necessary-3,know-8)case(society-11,about-9)det(society-11,the-10)nmod:about(know-8,society-11)
mark(graduating-2,After-1)advcl(enter-8,graduating-2)case(campus-4,from-3)nmod:from(graduating-2,campus-4)nsubj(enter-8,they-6)aux(enter-8,will-7)root(ROOT-0,enter-8)det(society-10,the-9)dobj(enter-8,society-10)nsubj(different-15,society-10)ref(society-10,which-12)cop(different-15,is-13)advmod(different-15,quite-14)acl:relcl(society-10,different-15)case(university-17,from-16)nmod:from(different-15,university-17)
mark(adapt-4,In-1)mwe(In-1,order-2)mark(adapt-4,to-3)advcl(do-15,adapt-4)case(society-8,to-5)det(society-8,the-6)amod(society-8,complicated-7)nmod:to(adapt-4,society-8)case(future-11,in-9)det(future-11,the-10)nmod:in(adapt-4,future-11)nsubj(do-15,they-13)aux(do-15,should-14)root(ROOT-0,do-15)dobj(do-15,something-16)advmod(do-15,now-17)
expl(are-2,There-1)root(ROOT-0,are-2)amod(ways-4,many-3)nsubj(are-2,ways-4)mark(know-6,to-5)acl(ways-4,know-6)det(world-8,the-7)dobj(know-6,world-8)case(campus-11,outside-9)det(campus-11,the-10)nmod:outside(know-6,campus-11)
case(example-2,For-1)nmod:for(know-6,example-2)nsubj(know-6,students-4)aux(know-6,can-5)root(ROOT-0,know-6)case(world-9,about-7)det(world-9,the-8)nmod:about(know-6,world-9)case(medium-12,through-10)compound(medium-12,mass-11)nmod:through(world-9,medium-12)case(TV-16,such-14)mwe(such-14,as-15)nmod:such_as(medium-12,TV-16)nmod:such_as(medium-12,newspapers-18)conj:and(TV-16,newspapers-18)cc(TV-16,and-19)nmod:such_as(medium-12,etc.-20)conj:and(TV-16,etc.-20)
case(addition-2,In-1)nmod:in(take-6,addition-2)nsubj(take-6,they-4)aux(take-6,should-5)root(ROOT-0,take-6)dobj(take-6,part-7)case(activities-10,in-8)amod(activities-10,various-9)nmod:in(take-6,activities-10)case(society-12,of-11)nmod:of(activities-10,society-12)
advmod(far-2,As-1)advmod(concerned-6,far-2)mark(concerned-6,as-3)nsubjpass(concerned-6,I-4)auxpass(concerned-6,am-5)advcl(eager-10,concerned-6)nsubj(eager-10,I-8)nsubj(know-12,I-8)cop(eager-10,'m-9)root(ROOT-0,eager-10)mark(know-12,to-11)xcomp(eager-10,know-12)det(society-14,the-13)dobj(know-12,society-14)
nsubj(student-4,I-1)cop(student-4,'m-2)det(student-4,a-3)root(ROOT-0,student-4)case(Department-7,of-5)compound(Department-7,Journalism-6)nmod:of(student-4,Department-7)dep(student-4,so-9)nsubj(know-12,I-10)aux(know-12,can-11)parataxis(student-4,know-12)amod(people-14,many-13)dobj(know-12,people-14)
nsubj(mean-2,I-1)nsubj(make-4,I-1)root(ROOT-0,mean-2)mark(make-4,to-3)xcomp(mean-2,make-4)dobj(make-4,friends-5)case(them-7,with-6)nmod:with(make-4,them-7)case(interview-10,after-8)det(interview-10,each-9)nmod:after(make-4,interview-10)
nsubj(keep-3,I-1)nsubj(communicate-9,I-1)aux(keep-3,will-2)root(ROOT-0,keep-3)advmod(keep-3,in-4)dobj(keep-3,touch-5)case(them-7,with-6)nmod:with(keep-3,them-7)cc(keep-3,and-8)conj:and(keep-3,communicate-9)case(other-12,with-10)det(other-12,each-11)nmod:with(communicate-9,other-12)
case(them-2,From-1)nmod:from(learn-6,them-2)nsubj(learn-6,I-4)aux(learn-6,may-5)root(ROOT-0,learn-6)case(part-8,about-7)nmod:about(learn-6,part-8)case(world-11,of-9)det(world-11,the-10)nmod:of(part-8,world-11)
nsubj(do-3,I-1)aux(do-3,will-2)root(ROOT-0,do-3)advmod(do-3,so-4) case(now-6,from-5)advcl:on(do-3,now-6)case(now-6,on-7)
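The relation strings above all follow the pattern relation(head-index,dependent-index). As a minimal sketch of how such a line could be turned into structured records for the later modules, the following Python splits one line into tuples. The names `Dependency` and `parse_dependencies` are illustrative assumptions, not part of the described method.

```python
import re
from typing import List, NamedTuple


class Dependency(NamedTuple):
    relation: str         # e.g. "nsubj" or "nmod:about"
    head: str             # governing word ("ROOT" for the sentence root)
    head_index: int       # 1-based token position; 0 for ROOT
    dependent: str        # dependent word
    dependent_index: int  # 1-based token position


# relation(head-index,dependent-index); words may contain "'" or "." ('m, etc.)
_DEP_RE = re.compile(r"([A-Za-z_:]+)\(([^\s(),]+)-(\d+),([^\s(),]+)-(\d+)\)")


def parse_dependencies(line: str) -> List[Dependency]:
    """Extract every relation from one line of the dependency library."""
    return [
        Dependency(rel, head, int(h_idx), dep, int(d_idx))
        for rel, head, h_idx, dep, d_idx in _DEP_RE.findall(line)
    ]


if __name__ == "__main__":
    sample = "nsubj(necessary-3,It-1)cop(necessary-3,is-2)root(ROOT-0,necessary-3)"
    for dependency in parse_dependencies(sample):
        print(dependency)
```

Because the pattern does not rely on whitespace between relations, it works on the run-together form in which the library is listed here.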
The second step: executing the "English composition grammar role labeling module"
Through the second step, the English composition grammar role labeling module takes the part-of-speech tagging result and the sentence dependency relationship library output by the English composition preprocessing module in the first step, detects the entity words, labels the grammatical role of each entity word, and finally generates the grammatical role labeling result of the English composition. Since many entity words have the grammatical role "not present", only the labeling results of entity words whose grammatical role is not "not present" are listed below.
It(1)/S society(3)/X society(6)/O society(8)/X students(2)/S they(5)/S they(10)/S campus(4)/X university(7)/X future(9)/X something(11)/O ways(1)/S world(2)/O world(5)/X campus(3)/X students(4)/S they(10)/S medium(6)/X TV(7)/X newspapers(8)/X addition(9)/X part(11)/O activities(12)/X I(1)/O I(2)/S I(4)/S I(6)/S I(8)/S I(12)/S I(16)/S I(18)/S society(3)/O student(5)/X people(7)/O friends(9)/O them(10)/X them(14)/X them(15)/X interview(11)/X touch(13)/O part(17)/X
The third step: executing the "English composition feature extraction module"
Through the third step, the English composition feature extraction module extracts the coreference resolution features of the entity word set of the English composition from the part-of-speech tagging result, the sentence dependency relationship library and the grammatical role labeling result output by the first and second steps, and finally outputs the coreference resolution feature set of the entity word set. Because the coreference resolution feature set of the English composition is too large to list in full, only the feature set of the first paragraph is listed below; the remaining data are replaced by an ellipsis.
[college students,It]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.084746,28.559322,FALSE,TRUE,FALSE,FALSE
[the society,college students]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.084746,0,FALSE,TRUE,TRUE,FALSE
[the society,It]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.444444,0.847458,0,FALSE,TRUE,FALSE,FALSE
[campus,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.235294,0.677966,0.084746,FALSE,FALSE,FALSE,FALSE
[campus,college students]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0,0.084746,FALSE,TRUE,FALSE,FALSE
[campus,It]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.235294,0.677966,0.084746,FALSE,FALSE,FALSE,FALSE
[they,campus]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.762712,0.423729,FALSE,FALSE,FALSE,FALSE
[they,the society]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,2,0,18.389831,28.559322,FALSE,FALSE,TRUE,FALSE
[they,college students]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,0,0,0.084746,28.559322,FALSE,TRUE,TRUE,FALSE
[the society,they]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.847458,6.101695,FALSE,TRUE,TRUE,FALSE
[the society,campus]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.235294,0.762712,0.169492,FALSE,FALSE,FALSE,FALSE
[the society,the society]:TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,1,2.79661,6.101695,FALSE,FALSE,TRUE,TRUE
[university,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.75,0.677966,0.254237,FALSE,FALSE,FALSE,FALSE
[university,they]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.508475,0.084746,FALSE,TRUE,FALSE,FALSE
[university,campus]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.421053,3.220339,0,FALSE,FALSE,FALSE,TRUE
[the complicated society,university]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.75,0.847458,0.254237,FALSE,FALSE,FALSE,FALSE
[the complicated society,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,1,0.59322,0.847458,FALSE,FALSE,TRUE,TRUE
[the future,the complicated society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.625,2.79661,0.677966,FALSE,FALSE,FALSE,FALSE
[the future,university]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.705882,0.762712,0,FALSE,FALSE,FALSE,FALSE
[the future,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.625,2.79661,0.677966,FALSE,FALSE,FALSE,FALSE
[the future,they]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.847458,0.677966,FALSE,TRUE,FALSE,FALSE
[the future,campus]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.705882,0.762712,0,FALSE,FALSE,FALSE,FALSE
[the future,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.625,2.79661,0.677966,FALSE,FALSE,FALSE,FALSE
[the future,college students]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.847458,0.677966,FALSE,TRUE,FALSE,FALSE
[the future,It]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.625,2.79661,0.677966,FALSE,FALSE,FALSE,FALSE
[they,the future]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,2,0,0.847458,0.677966,FALSE,FALSE,FALSE,FALSE
[they,the complicated society]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,2,0,18.389831,28.559322,FALSE,FALSE,TRUE,FALSE
[they,university]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.762712,0.423729,FALSE,FALSE,FALSE,FALSE
[they,the society]:FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,2,0,18.389831,28.559322,FALSE,FALSE,TRUE,FALSE
[they,they]:TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,0,0,18.389831,28.559322,FALSE,TRUE,TRUE,TRUE
[something,they]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.508475,6.101695,FALSE,TRUE,FALSE,FALSE
[something,the future]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.461538,0.677966,0.677966,FALSE,FALSE,FALSE,FALSE
[something,the complicated society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.333333,0.677966,6.101695,FALSE,FALSE,FALSE,FALSE
[something,university]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.428571,3.220339,0.169492,FALSE,FALSE,FALSE,FALSE
[something,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.333333,0.677966,6.101695,FALSE,FALSE,FALSE,FALSE
[something,they]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.508475,6.101695,FALSE,TRUE,FALSE,FALSE
[something,campus]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,1,0.428571,3.220339,0.169492,FALSE,FALSE,FALSE,FALSE
[something,the society]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.333333,0.677966,6.101695,FALSE,FALSE,FALSE,FALSE
[something,college students]:FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,2,0,0.508475,6.101695,FALSE,TRUE,FALSE,FALSE
[something,It]:FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,1,0.333333,0.677966,6.101695,FALSE,FALSE,FALSE,FALSE
……
The fourth step: executing the "English composition coreference resolution module"
Through the fourth step, the English composition coreference resolution module takes the coreference resolution feature set output by the English composition feature extraction module in the third step, performs coreference resolution on the entity words in the English composition, and finally generates the coreference linked list of the English composition, as shown in Table 2 below:
Table 2: Coreference linked list of the English composition
The fifth step: executing the "English composition entity chain grid construction module"
Through the fifth step, the English composition entity chain grid construction module takes the coreference linked list output by the English composition coreference resolution module in the fourth step and the grammatical role labeling result output by the English composition grammar role labeling module in the second step, and constructs the entity chain grid of the English composition. The constructed entity chain grid is as follows:
The sixth step: executing the "English composition consistency analysis module"
Through the sixth step, the English composition consistency analysis module takes the entity chain grid output by the English composition entity chain grid construction module in the fifth step and analyzes the consistency of the English composition according to formula (7), formula (8), formula (9) and formula (10). The analysis result is as follows:
Consistency score: 0.3267 (coherent)
Paragraph 1 coherence score: 0.3733 (better consistency)
Sentence 1 and sentence 2 transition score: 6 (very good transition)
Sentence 2 and sentence 3 transition score: 5 (very good transition)
Paragraph 2 coherence score: 0.2119 (barely coherent)
Sentence 1 and sentence 2 transition score: 2 (average transition)
Sentence 2 and sentence 3 transition score: 3 (good transition)
Paragraph 3 coherence score: 0.3949 (better consistency)
Sentence 1 and sentence 2 transition score: 3 (good transition)
Sentence 2 and sentence 3 transition score: 3 (good transition)
Sentence 3 and sentence 4 transition score: 4 (good transition)
Sentence 4 and sentence 5 transition score: 4 (good transition)
Sentence 5 and sentence 6 transition score: 3 (good transition).
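One observation on the figures reported above: the overall score of 0.3267 agrees, to four decimal places, with the arithmetic mean of the three segment (paragraph) coherence scores. The short Python check below only verifies that arithmetic; whether the method actually defines the overall score as this mean is an assumption, since the text does not say so.

```python
# Segment (paragraph) coherence scores as reported in the analysis result.
paragraph_scores = [0.3733, 0.2119, 0.3949]

# Arithmetic mean of the three scores, rounded to the reported precision.
mean_score = round(sum(paragraph_scores) / len(paragraph_scores), 4)

print(mean_score)  # 0.3267
```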

Claims (9)

1. A method for analyzing the consistency quality of English literary texts, characterized in that: the analysis method uses an analysis model consisting of an English composition preprocessing module, an English composition grammar role labeling module, an English composition feature extraction module, an English composition coreference resolution module, an English composition entity chain grid construction module and an English composition consistency analysis module which are connected in sequence, and comprises the following steps:
(1) the English composition preprocessing module reads in an English composition, performs paragraph segmentation, word segmentation, sentence segmentation, part-of-speech tagging and dependency syntax analysis on the English composition, and outputs the preprocessing result of the English composition;
(2) the English composition grammar role labeling module reads in the preprocessing result of the English composition, finds the dependency relationships of each entity word in the preprocessing result, labels the grammatical role of each entity word in its sentence according to the dependency relationships, and outputs the grammatical roles of the entity words in the sentences;
(3) the English composition feature extraction module reads in the preprocessing result of the English composition, defines the semantic level of the entity words in the preprocessing result, extracts the coreference resolution features of the entity words at the same time, and outputs the coreference resolution features of the entity words;
(4) the English composition coreference resolution module reads in the entity word coreference resolution features output by the English composition feature extraction module, analyzes the coreference relations of the entity words through the coreference resolution model and the entity word coreference resolution features, and outputs a coreference linked list formed by the entity words;
(5) the English composition entity chain grid construction module reads in the entity word coreference linked list output by the English composition coreference resolution module, constructs the English composition entity chain grid from the entity word coreference linked list, replaces lower-priority grammatical roles in the entity chain grid with higher-priority grammatical roles, and outputs the English composition entity chain grid;
(6) the English composition consistency analysis module reads in the English composition entity chain grid output by the English composition entity chain grid construction module, analyzes the sentence consistency of the English composition through the entity chain grid, calculates the consistency quality of the English composition, and outputs the consistency quality analysis result of the English composition.
2. The analysis method of claim 1, wherein the processing steps of the English composition preprocessing module are as follows:
P201 begins;
P202 reads the English composition;
P203 segments the English composition into paragraphs;
P204 splits the English composition into sentences;
P205 performs word segmentation on the English composition;
P206 performs part-of-speech tagging on the English composition according to the word part-of-speech tagging set;
P207 generates a word directed graph of the English composition according to the part-of-speech tagging result, in which each node is a word together with its part-of-speech tag and all nodes are connected by directed edges;
P208 assigns each directed edge in the directed graph the weight corresponding to its dependency relationship, according to the dependency relationship weight set of the directed edges;
P209 generates the sentence dependency relationship library of the English composition through greedy search and outputs the dependency syntax analysis result of the English composition;
P210 ends.
3. The analysis method of claim 1, wherein the processing steps of the English composition grammar role labeling module are as follows:
P301 begins;
P302 reads the dependency syntax analysis result of the English composition;
P303 traverses the leaf nodes of the dependency tree in the sentence dependency relationships of the English composition;
P304 if the word at the current node is an entity word, go to P305; otherwise go to P303;
P305 finds the sibling nodes of the current node;
P306 queries the word part-of-speech tagging set for the constituent formed by the current node and its sibling nodes;
P307 if a result is found in the word part-of-speech tagging set, go to P308; otherwise go to P315;
P308 labels the current phrase according to the phrase syntax tagging set;
P309 if the current phrase is a prepositional phrase, go to P310; otherwise go to P311;
P310 processes the prepositional phrase, then go to P315;
P311 if the current phrase is a noun phrase, go to P312; otherwise go to P313;
P312 processes the noun phrase, then go to P315;
P313 if the current phrase is a "Most of" phrase, go to P314; otherwise go to P315;
P314 processes the "Most of" phrase, then go to P315;
P315 obtains all dependency relationships related to the current entity word from the sentence dependency relationship library of the English composition;
P316 if a nominal subject relationship or a clausal subject relationship exists, go to P317; otherwise go to P318;
P317 labels the grammatical role of the current entity word as subject, then go to P321;
P318 if a direct object relationship or an indirect object relationship exists, go to P319; otherwise go to P320;
P319 labels the grammatical role of the current entity word as object, then go to P321;
P320 labels the grammatical role of the current entity word as "present";
P321 stores the grammatical role labeling result of the English composition in the grammatical role labeling linked list;
P322 if the traversal of the dependency syntax tree is finished, go to P323; otherwise go to P303;
P323 outputs the grammatical role labeling result of the English composition;
P324 ends.
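The decision in steps P315 to P320 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the relation names `nsubj`, `csubj`, `dobj` and `iobj` are taken to stand for the nominal subject, clausal subject, direct object and indirect object relationships, and the label "X" is taken to stand for the role that P320 calls "present", following the S/O/X labels listed in the embodiment. The phrase handling of P305 to P314 is omitted, and entity words that carry several relations at once, or other relation types such as passive subjects, may be resolved differently by the actual method.

```python
from typing import Iterable

# Assumed relation names for the subject and object tests of P316 and P318.
SUBJECT_RELATIONS = {"nsubj", "csubj"}
OBJECT_RELATIONS = {"dobj", "iobj"}


def grammatical_role(relations: Iterable[str]) -> str:
    """Label an entity word from the relations in which it is the dependent."""
    found = set(relations)
    if found & SUBJECT_RELATIONS:   # P316 -> P317
        return "S"
    if found & OBJECT_RELATIONS:    # P318 -> P319
        return "O"
    return "X"                      # P320: present, neither subject nor object


if __name__ == "__main__":
    # Relations taken from the dependency library of the embodiment.
    print(grammatical_role(["nsubj"]))        # students in sentence 1
    print(grammatical_role(["nmod:about"]))   # society in sentence 1
    print(grammatical_role(["dobj"]))         # something in sentence 3
```

For these three words the sketch gives S, X and O, which matches the labels students(2)/S, society(3)/X and something(11)/O in the embodiment.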
4. The analysis method of claim 1, wherein the processing steps of the English composition feature extraction module are as follows:
P401 begins;
P402 reads the dependency syntax analysis result of the English composition;
P403 traverses the leaf nodes of the dependency syntax tree;
P404 detects entity words according to the dependency syntax analysis result;
P405 if the current word is detected as an entity word, go to P406; otherwise go to P403;
P406 creates an "expression" object for storing feature information according to the coreference resolution feature set;
P407 finds the leaf node representing the current entity word in the dependency syntax tree according to the position information of the entity word;
P408 starting from the current leaf node, searches upward for the parent node of the current node;
P409 if the parent node is a noun phrase node, go to P410; otherwise go to P408;
P410 merges all leaf nodes under the noun phrase node into one phrase and stores the phrase in the current "expression" object;
P411 extracts the common features of the entity word according to the coreference resolution feature set and stores them in the current "expression" object;
P412 queries the male name vocabulary and the female name vocabulary respectively;
P413 if the gender of the current entity word is found, go to P414; otherwise go to P415;
P414 stores the gender feature in the current "expression" object, then go to P416;
P415 sets the gender feature to unknown and stores it in the current "expression" object;
P416 queries the vocabulary of words that commonly denote persons;
P417 if the current entity word is found in the vocabulary of words that commonly denote persons, go to P418; otherwise go to P419;
P418 takes the corresponding semantic level type as the semantic level of the current entity word, then go to P426;
P419 performs named entity recognition on the entity word;
P420 if the named entity recognition result is a named entity, go to P421; otherwise go to P422;
P421 converts the named entity recognition result into the corresponding semantic level, then go to P426;
P422 performs date detection on the entity word;
P423 if the entity word represents a date, go to P424; otherwise go to P425;
P424 sets the semantic level of the entity word to date, then go to P426;
P425 labels the semantic level of the entity word as "object";
P426 queries the living vocabulary, which contains words denoting things with vital signs;
P427 if the current entity word is found in the living vocabulary, go to P428; otherwise go to P429;
P428 labels the living feature of the entity word as true, then go to P430;
P429 labels the living feature of the entity word as false;
P430 queries the alias table;
P431 if the current entity word is found in the alias table, go to P433; otherwise go to P432;
P432 sets the alias feature to null, then go to P434;
P433 stores the alias information found in the alias feature of the current entity word;
P434 queries the English entity word common-sense library for factual information about the current entity and stores the result in the coreference resolution feature set;
P435 if the traversal of the dependency syntax tree is finished, go to P436; otherwise go to P403;
P436 outputs the feature extraction result of the English composition;
P437 ends.
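Steps P416 to P425 form a fall-through cascade for the semantic level: person vocabulary first, then named entity recognition, then date detection, with "object" as the default. A minimal Python sketch of that control flow is given below. Everything concrete in it is a hypothetical stand-in: the `person_vocabulary` mapping, the `named_entity_type` argument (which represents the output of whatever named entity recognizer is used) and the toy date pattern are assumptions for illustration only.

```python
import re
from typing import Dict, Optional

# Toy date detector: a four-digit year, optionally followed by -month-day.
_DATE_RE = re.compile(r"^\d{4}(-\d{1,2}-\d{1,2})?$")


def semantic_level(
    word: str,
    person_vocabulary: Dict[str, str],
    named_entity_type: Optional[str] = None,
) -> str:
    """Return the semantic level of an entity word via the P416-P425 cascade."""
    key = word.lower()
    if key in person_vocabulary:        # P416-P418: person vocabulary lookup
        return person_vocabulary[key]
    if named_entity_type is not None:   # P419-P421: named entity result
        return named_entity_type.lower()
    if _DATE_RE.match(word):            # P422-P424: date detection
        return "date"
    return "object"                     # P425: default semantic level


if __name__ == "__main__":
    vocabulary = {"students": "person", "people": "person"}
    for entity in ["students", "campus", "2016-12-06"]:
        print(entity, semantic_level(entity, vocabulary))
```

The order of the checks matters: an entity word found in the person vocabulary never reaches the named entity or date tests, which mirrors the jumps to P426 in the step list.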
5. The analysis method of claim 1, wherein the calculation formulas of the English composition coreference resolution module are defined as follows:
(1) Calculation formula of vocabulary semantic similarity
In formula (1), path_1 is the path in the semantic classification tree from the lowest common parent node of the two word senses to the root node; path_2 is the path in the semantic classification tree from the word sense of the first word to the lowest common parent node of the two word senses; path_3 is the path in the semantic classification tree from the word sense of the second word to the lowest common parent node of the two word senses, where the lowest common parent node is the first common parent node found when searching upward from the nodes of the two word senses;
(2) Calculation formula of entity word set information entropy
In formula (2), decision category attribute_i is the i-th attribute in the decision category; i is the serial number of the current attribute, i = 1, 2, …, m; m is the total number of attributes in the decision category; the total number of attributes is the total number of all attributes in the decision category;
(3) Calculation formula of coreference resolution feature expected information
In formula (3), feature_A is the feature currently being calculated; j is the serial number of the current attribute, j = 1, 2, …, v; v is the number of attribute categories of feature_A in the entity word set; attribute_j is the j-th attribute of feature_A; the entity word set information entropy is calculated by formula (2);
(4) Calculation formula of coreference resolution feature information gain
coreference resolution feature information gain = entity word set information entropy - coreference resolution feature expected information (4)
In formula (4), the entity word set information entropy is calculated by formula (2); the coreference resolution feature expected information is calculated by formula (3);
(5) Calculation formula of coreference resolution feature split information
In formula (5), feature_A is the feature currently being calculated; j is the serial number of the current attribute, j = 1, 2, …, v; v is the number of attribute categories of feature_A in the entity word set; attribute_j is the j-th attribute of feature_A;
(6) Calculation formula of coreference resolution feature information gain ratio
In formula (6), the information gain is calculated by formula (4) above; the split information is calculated by formula (5) above.
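The formula bodies themselves are not reproduced in this text, so the following Python is only a sketch under assumptions. For formulas (2) to (6) it uses the standard decision-tree definitions of entropy, expected information, information gain, split information and gain ratio, which match the quantities named above; only formula (4) is confirmed by the text. For formula (1) it uses the Wu-Palmer form, a well-known similarity measure built from exactly the three path lengths defined above; that the patent's formula (1) has this form is an assumption. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def wu_palmer_similarity(path1: int, path2: int, path3: int) -> float:
    """Assumed form of formula (1): 2*path1 / (path2 + path3 + 2*path1)."""
    denominator = path2 + path3 + 2 * path1
    return 1.0 if denominator == 0 else 2 * path1 / denominator


def entropy(labels: Sequence[Hashable]) -> float:
    """Assumed form of formula (2): Shannon entropy of the label frequencies."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def expected_information(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Assumed form of formula (3): size-weighted entropy of each value group."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(group) / len(labels) * entropy(group) for group in groups.values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Formula (4) divided by split information (assumed forms of (5) and (6))."""
    gain = entropy(labels) - expected_information(values, labels)  # formula (4)
    split_information = entropy(values)  # entropy of the feature's own values
    return gain / split_information if split_information > 0 else 0.0


if __name__ == "__main__":
    feature = ["TRUE", "TRUE", "FALSE", "FALSE"]   # one feature column
    coreferent = [True, True, False, False]        # decision category
    print(gain_ratio(feature, coreferent))         # 1.0
    print(wu_palmer_similarity(3, 1, 1))           # 0.75
```

A feature that takes a single value for every pair has zero split information; the sketch returns a gain ratio of 0.0 in that case rather than dividing by zero.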
6. The analysis method of claim 5, wherein the processing steps of the English composition coreference resolution module are as follows:
P501 begins;
P502 reads the feature extraction result of the English composition;
P503 traverses the feature extraction result of the English composition;
P504 detects whether the word "it" occurs in a sentence pattern or sense in which it does not refer to any antecedent;
P505 if the word "it" is non-referential, go to P519; otherwise go to P506;
P506 if the current entity word has an appositive, go to P507; otherwise go to P508;
P507 adds the current entity word to the coreference chain containing its appositive, then go to P521;
P508 if the current entity word is in a clause beginning with "As", go to P509; otherwise go to P511;
P509 if the subject in the "As" clause has a subject relationship, go to P510; otherwise go to P511;
P510 adds the current entity word to the entity chain containing the subject entity word of the "As" clause, then go to P521;
P511 constructs the candidate antecedent list;
P512 queries the word senses of the entity words in an English lexical network dictionary;
P513 calculates the vocabulary semantic similarity between the current expression and all candidate antecedents using the semantic classification tree, the word senses of the entity words and formula (1), and adds it to the coreference resolution feature set of the English composition;
P514 converts the feature information of the current entity word and all candidate antecedents into the format required by the coreference resolution algorithm;
P515 loads the coreference resolution model trained on the coreference resolution training set of English compositions;
P516 calculates the information gain ratio of each coreference resolution feature according to formula (2), formula (3), formula (4), formula (5) and formula (6), and performs coreference resolution using the coreference resolution algorithm;
P517 if the entity words are coreferent, go to P518; otherwise go to P519;
P518 adds the anaphor to the coreference chain containing the antecedent, then go to P521;
P519 creates a new coreference chain for the referring expression;
P520 stores the newly created coreference chain in the coreference linked list;
P521 if the traversal of the feature extraction result is finished, go to P522; otherwise go to P503;
P522 outputs the coreference linked list of the English composition;
P523 ends.
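The chain bookkeeping at the end of this flow (join an existing chain when a coreferent antecedent is found, otherwise open a new chain and store it) can be sketched in Python as follows. The `is_coreferent` callback is a hypothetical placeholder for the trained coreference resolution model; the special handling of "it", appositives and "As" clauses is omitted, and the toy head-word test in the usage example is an assumption made purely for demonstration.

```python
from typing import Callable, List


def build_coreference_chains(
    mentions: List[str],
    is_coreferent: Callable[[str, str], bool],
) -> List[List[str]]:
    """Group mentions, in document order, into coreference chains."""
    chains: List[List[str]] = []
    for mention in mentions:
        for chain in chains:
            # Join the first chain that holds a coreferent antecedent.
            if any(is_coreferent(mention, antecedent) for antecedent in chain):
                chain.append(mention)
                break
        else:
            # No antecedent found: create a new chain and store it.
            chains.append([mention])
    return chains


if __name__ == "__main__":
    # Toy stand-in for the model: two mentions corefer if their last words match.
    def same_head(a: str, b: str) -> bool:
        return a.split()[-1].lower() == b.split()[-1].lower()

    mentions = ["the society", "college students", "the complicated society", "campus"]
    print(build_coreference_chains(mentions, same_head))
```

With the toy test, "the society" and "the complicated society" end up in one chain and the other two mentions each start their own.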
7. The analysis method of claim 1, wherein the English composition entity chain grid construction module comprises the following processing steps:
P601 starts;
P602 creates a two-dimensional matrix for storing the grammatical role information of the coreference chains and initializes the matrix;
P603 reads the coreference chain list of the English composition;
P604 traverses to the next coreference chain;
P605 traverses to the next "expression" of the current coreference chain;
P606 obtains the new grammatical role of the current "expression";
P607 obtains the old grammatical role of the "expression" from the two-dimensional matrix using the position information of the current "expression";
P608 goes to P609 if the priority of the new grammatical role is higher than that of the old grammatical role, otherwise goes to P610;
P609 replaces the old grammatical role with the new grammatical role in the two-dimensional matrix at the position of the current "expression";
P610 goes to P605 if the current coreference chain still contains an unprocessed "expression", otherwise goes to P611;
P611 goes to P604 if unprocessed coreference chains remain, otherwise goes to P612;
P612 generates the entity chain grid of the English composition;
P613 ends.
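Steps P602 to P612 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the role labels and their priority order (subject over object over other over absent), the layout (one row per sentence, one column per chain), and the use of a sentence index as the "position information" are all assumptions that this section does not spell out.

```python
from typing import List, Tuple

# Assumed priority order: absent < other < object < subject.
ROLE_PRIORITY = {"-": 0, "X": 1, "O": 2, "S": 3}

# Hypothetical representation of an "expression": (sentence index, grammatical role).
Expression = Tuple[int, str]


def build_entity_grid(
    chains: List[List[Expression]], n_sentences: int
) -> List[List[str]]:
    """Return a grid with one row per sentence and one column per chain."""
    # P602: create the matrix and initialise every cell to "absent".
    grid = [["-"] * len(chains) for _ in range(n_sentences)]
    for col, chain in enumerate(chains):          # P604, P611
        for sentence_idx, new_role in chain:      # P605, P606, P610
            old_role = grid[sentence_idx][col]    # P607
            # P608: keep the higher-priority role for this cell.
            if ROLE_PRIORITY[new_role] > ROLE_PRIORITY[old_role]:
                grid[sentence_idx][col] = new_role  # P609
    return grid  # P612


example_chains = [
    [(0, "S"), (1, "O"), (1, "S")],  # chain 0 appears twice in sentence 1
    [(0, "X"), (2, "O")],
]
example_grid = build_entity_grid(example_chains, n_sentences=3)
print(example_grid)
```

When a chain has several expressions in the same sentence, only the highest-priority role survives in that cell, which is the effect of the comparison in P608.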
8. The analysis method of claim 1, wherein the calculation formulas of the English composition entity chain grid construction module are defined as follows:
(1) Formula for the occurrence probability of an entity chain sequence
In formula (7), i is the index of the current grammatical role in the grammatical role sequence, i = 1, 2, …, v; v is the total number of grammatical roles of the current entity chain; grammatical role_i is the grammatical role currently being computed; grammatical role_{i+1} is the next grammatical role in the sequence; the co-occurrence count is the total number of times the two grammatical roles co-occur in the discourse coherence training set of English compositions;
(2) Formula for the sentence coherence score of the English composition
In formula (8), i is the index of the current sentence, i = 1, 2, …, n; j is the index of the current entity chain, j = 1, 2, …, m; n is the total number of sentences; m is the total number of entity chains; the weight of an entity chain is the number of different entity words in that entity chain; the occurrence probability of the entity chain sequence is calculated according to formula (7);
(3) Formula for sentence semantic coherence
In formula (9), entity chain_i is the entity chain currently being computed; i is the index of the current entity chain, i = 1, 2, …, n; n is the total number of different entity chains in sentence_1 and sentence_2; sentence_1 and sentence_2 are the sentences currently being computed;
(4) Formula for shallow semantic coherence
In formula (10), i is the index of the sentence currently being computed, i = 1, 2, …, n; n is the total number of sentences of the English composition; the sentence semantic coherence is calculated by formula (9).
9. The analysis method of claim 8, wherein the processing steps of the English composition sentence coherence analysis module are as follows:
P701 starts;
P702 initializes the grammatical role transition frequency matrix;
P703 reads in the entity chain grid of the English composition;
P704 traverses the entity chain grid;
P705 goes to P706 if the current entity word is the first element of its entity chain, otherwise goes to P707;
P706 caches the current grammatical role and goes to P704;
P707 forms a transition sequence of length 2 from the cached grammatical role and the current grammatical role;
P708 adds 1 to the frequency of the current transition sequence in the grammatical role transition frequency matrix;
P709 replaces the cached grammatical role with the current grammatical role;
P710 goes to P711 if the traversal of the entity chain grid is finished, otherwise goes to P704;
P711 loads the entity chain grid model trained on the discourse coherence training set of English compositions;
P712 calculates the occurrence probability of the entity chain sequence according to formula (7);
P713 calculates the coherence score of the English composition according to formula (8);
P714 weights the scores according to the number of different entity words in the coreference chain;
P715 calculates the sentence semantic similarity of the English composition according to formula (9), and calculates the shallow semantic coherence of the English composition according to formula (10);
P716 generates the coherence analysis result of the English composition from its coherence score and its shallow semantic coherence;
P717 ends.
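The transition counting in steps P702 to P710 can be sketched as below. This is a hedged illustration only: the grid layout (rows are sentences, columns are entity chains), the role labels, and the choice to count transitions through absent cells ("-") are assumptions, and the function name `count_role_transitions` is hypothetical. The probability and scoring steps P711 to P716 depend on formulas (7) to (10) and are not reproduced here.

```python
from collections import Counter
from typing import List, Optional


def count_role_transitions(grid: List[List[str]]) -> Counter:
    """Count length-2 grammatical role transitions along each entity chain."""
    counts: Counter = Counter()  # P702: the transition frequency "matrix"
    n_chains = len(grid[0]) if grid else 0
    for col in range(n_chains):              # one entity chain per column
        cached: Optional[str] = None
        for row in grid:                     # P704: walk the chain sentence by sentence
            role = row[col]
            if cached is None:               # P705: first element of the chain
                cached = role                # P706: cache it and move on
                continue
            counts[(cached, role)] += 1      # P707 and P708: count the transition
            cached = role                    # P709: update the cached role
    return counts


example_grid = [
    ["S", "X"],
    ["S", "-"],
    ["-", "O"],
]
print(count_role_transitions(example_grid))
```

A `Counter` keyed by role pairs plays the part of the frequency matrix; a dense matrix indexed by role would work equally well when the role inventory is fixed.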
CN201611109331.4A 2016-12-06 2016-12-06 method for analyzing consistency quality of English literary texts Active CN106776550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611109331.4A CN106776550B (en) 2016-12-06 2016-12-06 method for analyzing consistency quality of English literary texts

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611109331.4A CN106776550B (en) 2016-12-06 2016-12-06 method for analyzing consistency quality of English literary texts

Publications (2)

Publication Number Publication Date
CN106776550A (en) 2017-05-31
CN106776550B (en) 2019-12-13

Family

ID=58879164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611109331.4A Active CN106776550B (en) 2016-12-06 2016-12-06 method for analyzing consistency quality of English literary texts

Country Status (1)

Country Link
CN (1) CN106776550B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647625A (en) * 2018-06-27 2020-01-03 上海意仕腾教育科技有限公司 Training method of English writing evaluation system
CN110287291B (en) * 2019-07-03 2021-11-02 桂林电子科技大学 Unsupervised method for analyzing running questions of English short sentences
CN111709224B (en) * 2020-06-22 2023-04-07 桂林电子科技大学 Method for analyzing continuity of English short sentence level topics
CN113553830B (en) * 2021-08-11 2023-01-03 桂林电子科技大学 Graph-based English text sentence language piece coherent analysis method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446943A (en) * 2008-12-10 2009-06-03 苏州大学 Reference and counteraction method based on semantic role information in Chinese character processing
CN101901213A (en) * 2010-07-29 2010-12-01 哈尔滨工业大学 Instance-based dynamic generalization coreference resolution method
CN102831558A (en) * 2012-07-20 2012-12-19 桂林电子科技大学 System and method for automatically scoring college English compositions independent of manual pre-scoring
CN102866989A (en) * 2012-08-30 2013-01-09 北京航空航天大学 Viewpoint extracting method based on word dependence relationship
CN106021229A (en) * 2016-05-19 2016-10-12 苏州大学 Chinese event co-reference resolution method and system

Also Published As

Publication number Publication date
CN106776550A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
CN106776550B (en) method for analyzing consistency quality of English literary texts
Creutz et al. Inducing the morphological lexicon of a natural language from unannotated text
CN107392143B (en) Resume accurate analysis method based on SVM text classification
Mori et al. A machine learning approach to recipe text processing
Hadni et al. Hybrid part-of-speech tagger for non-vocalized Arabic text
CN110287497B (en) Semantic structure coherent analysis method for English text
CN109033085B (en) Chinese word segmentation system and Chinese text word segmentation method
KR101729461B1 (en) Natural language processing system, natural language processing method, and natural language processing program
CN115858758A (en) Intelligent customer service knowledge graph system with multiple unstructured data identification
CN112860896A (en) Corpus generalization method and man-machine conversation emotion analysis method for industrial field
CN113704416A (en) Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
JP6145059B2 (en) Model learning device, morphological analysis device, and method
CN111737980A (en) Method for correcting English text word use errors
CN116258137A (en) Text error correction method, device, equipment and storage medium
CN114492470A (en) Commodity title text translation method and device, equipment, medium and product thereof
CN115455197A (en) Dialogue relation extraction method integrating position perception refinement
Nehar et al. An efficient stemming for arabic text classification
Albogamy et al. Unsupervised stemmer for Arabic tweets
JP6969431B2 (en) Morphological analysis learning device, morphological analysis device, method, and program
Asghari et al. A probabilistic approach to persian ezafe recognition
Deka et al. A study of t’nt and crf based approach for pos tagging in assamese language
JP4148247B2 (en) Vocabulary acquisition method and apparatus, program, and computer-readable recording medium
CN109446537B (en) Translation evaluation method and device for machine translation
Shamsfard et al. A Hybrid Morphology-Based POS Tagger for Persian.
Rojan et al. Natural Language Processing based Text Imputation for Malayalam Corpora

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20170531

Assignee: Guilin Dazhi Technology Co.,Ltd.

Assignor: Guilin University of Electronic Technology

Contract record no.: X2022450000184

Denomination of invention: An Analysis Method of the Quality of Discourse Coherence in English Composition

Granted publication date: 20191213

License type: Common License

Record date: 20221125

Application publication date: 20170531

Assignee: Guilin Ruisen Education Service Co.,Ltd.

Assignor: Guilin University of Electronic Technology

Contract record no.: X2022450000186

Denomination of invention: An Analysis Method of the Quality of Discourse Coherence in English Composition

Granted publication date: 20191213

License type: Common License

Record date: 20221125