CN113553835B - Method for automatically correcting sentence grammar errors in English text - Google Patents


Info

Publication number
CN113553835B
Authority
CN
China
Prior art keywords
sentence
word
english text
processed
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110916902.XA
Other languages
English (en)
Other versions
CN113553835A (zh)
Inventor
黄桂敏
王家浩
张晓薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202110916902.XA
Publication of CN113553835A
Application granted
Publication of CN113553835B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

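The invention provides a method for automatically correcting sentence grammar errors in English text. The method is a correction model made up of three sequentially connected modules: a sentence context word-vector representation module, a best-candidate-sentence recommendation module, and a grammar-error correction generation module. After the sentences of the English text to be processed pass through this correction model, grammar-correction suggestions for those sentences are obtained. The method solves the low correction accuracy of rule-based methods for correcting sentence grammar errors in English text, and the small number of error types that statistics-based methods can correct.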

Description

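A method for automatically correcting sentence grammar errors in English text
Technical Field
The invention relates to natural language processing technology and is a method for automatically correcting sentence grammar errors in English text. The method is suitable only for automatically correcting grammar errors in sentences of English text; it is not suitable for automatically correcting grammar errors in sentences of Chinese text.
Background Art
Traditional methods for correcting grammar errors in English text fall into two main categories: rule-based methods and statistics-based methods. Rule-based methods require grammar rules to be defined by hand; a large number of such rules are assembled into a grammar rule base, which is then used to correct the grammar errors in English text. Because rule-based methods can hardly cover all grammar errors, their correction accuracy is low and difficult to improve. Statistics-based methods build a statistical grammar-correction model and use it to correct the grammar errors in English text. Because statistics-based methods cannot correct long-distance grammar errors, the range of error types they can correct is small and difficult to extend. To address these problems, the invention proposes a method for automatically correcting sentence grammar errors in English text, which solves the low correction accuracy of rule-based methods and the small number of error types that statistics-based methods can correct.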
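Summary of the Invention
The method of the invention for automatically correcting sentence grammar errors in English text comprises a sentence context word-vector representation module, a best-candidate-sentence recommendation module, and a grammar-error correction generation module. The overall processing flow is shown in Figure 1.
The processing flow of the sentence context word-vector representation module is as follows. First, read in the English text to be processed, split it into sentences, and tag each word with its part of speech. Second, based on the part-of-speech tags, perform syntactic dependency analysis and word dependency analysis to obtain the syntactic relation tree and the word dependency tree of the text. Third, based on these two trees, vectorize the words of the sentences in the text to obtain the word vectors of the words in each sentence. Fourth, initialize the search weight matrix, the tag weight matrix and the result weight matrix, and compute the search vector, tag vector and result vector of each word in the sentence. Fifth, compute the word attention weights, the inter-sentence attention vector and the context word vectors, and finally output the context word-vector representation of the sentences in the text.
The processing flow of the best-candidate-sentence recommendation module is as follows. First, read the context word-vector representation of one sentence of the English text to be processed, and reduce the dimensionality of the sentence's context word vectors by singular value decomposition. Second, merge the reduced context word vectors with the word vectors of the words. Third, scale and normalize the merged word vectors. Fourth, compute, for every word in the English word dictionary, its probability of being the next word; take the 5 most probable words as candidate words, and run inference from each candidate word to obtain new candidate words based on it. Fifth, among the sentences formed from the candidate words, take the most probable sentence as a candidate sentence, accumulate the probabilities of the selected candidate sentences, and check whether the accumulated probability has reached the preset threshold; if so, stop selecting new candidate sentences and output all selected candidate sentences; if not, continue adding the most probable remaining sentence to the candidate sentence set. Sixth, check whether all sentences of the text have been processed; if so, output the candidate sentence sets of all sentences in the text; otherwise return to the second step and continue with the remaining sentences until every sentence of the text has been processed.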
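The first three steps of this module (SVD reduction, merging, scaling and normalization) can be sketched with NumPy. The text does not say how many dimensions are kept, how the vectors are merged, or which scaling is used, so the rank `k`, the concatenation, the min-max scaling and the row-wise L2 normalization below are all assumptions, as are the function names and the toy data.

```python
import numpy as np

def reduce_with_svd(context, k):
    """Project an (n_words, d) matrix onto its k strongest singular directions."""
    u, s, _ = np.linalg.svd(context, full_matrices=False)
    return u[:, :k] * s[:k]  # shape (n_words, k)

def merge_and_normalise(reduced, word_vectors):
    """Concatenate, min-max scale to [0, 1], then L2-normalise each row."""
    merged = np.concatenate([reduced, word_vectors], axis=1)
    span = merged.max() - merged.min()
    scaled = (merged - merged.min()) / (span if span > 0 else 1.0)
    norms = np.linalg.norm(scaled, axis=1, keepdims=True)
    return scaled / np.where(norms == 0, 1.0, norms)

rng = np.random.default_rng(0)
context = rng.random((6, 8))       # toy context word vectors: 6 words, 8 dims
word_vectors = rng.random((6, 8))  # toy word vectors for the same 6 words

reduced = reduce_with_svd(context, k=3)
features = merge_and_normalise(reduced, word_vectors)
```

Here `features` has one unit-length row per word, with 3 reduced context dimensions followed by the 8 original word-vector dimensions.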
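The processing flow of the grammar-error correction generation module is as follows. First, read all candidate sentences of each sentence in the English text to be processed, and select the most probable candidate sentence as the grammar-correction result. Second, count the total number of grammar errors occurring in the text, compute the grammar-correction score of the text, and output the corresponding grammar-correction suggestions according to that score.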
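Definitions of the calculation formulas used by the correction method
(1) Formulas for the search vector, tag vector and result vector
search vector j = search weight matrix × word j of the English text (1)
tag vector j = tag weight matrix × word j of the English text (2)
result vector j = result weight matrix × word j of the English text (3)
In formulas (1), (2) and (3), j is the sequence number of the word in the English text.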
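Formulas (1) to (3) are three matrix-vector products per word. A minimal NumPy sketch follows; the dimensions, the random initialization and all variable names are illustrative assumptions, since the text does not state them.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 8   # assumed word-vector size (not stated in the text)
d_attn = 4    # assumed size of the search/tag/result vectors
n_words = 5   # number of words in one toy sentence

# Word vectors of one sentence, one row per word j.
words = rng.normal(size=(n_words, d_model))

# Randomly initialised weight matrices.
W_search = rng.normal(size=(d_attn, d_model))
W_tag = rng.normal(size=(d_attn, d_model))
W_result = rng.normal(size=(d_attn, d_model))

def project(weight, word_vectors):
    """Compute `weight @ word_j` for every word j at once."""
    return word_vectors @ weight.T

search = project(W_search, words)  # formula (1)
tag = project(W_tag, words)        # formula (2)
result = project(W_result, words)  # formula (3)
```

Row j of `search`, `tag` and `result` is the search, tag and result vector of word j.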
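(2) Formula for the word attention weight
[Formula (4): supplied as image BDA0003205949390000021 in the original publication; not reproduced in this text]
In formula (4), i is the index of the i-th word in the English text; the search vector, tag vector and result vector are computed by formulas (1), (2) and (3).
(3) Formula for the inter-sentence attention vector
[Formula (5): supplied as image BDA0003205949390000022 in the original publication; not reproduced in this text]
In formula (5), i is the index of the i-th word in the English text, j is the sequence number of the word in the English text, and N is the total number of words in the English text.
(4) Formula for the context word vector
[Formula (6): supplied as image BDA0003205949390000023 in the original publication; not reproduced in this text]
In formula (6), i is the index of the i-th word in the English text, j is the sequence number of the word in the English text, and N is the total number of words in the English text.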
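Because formulas (4) to (6) are not legible in this text, the following is only a stand-in: standard scaled dot-product attention, in which the search, tag and result vectors play the roles usually called query, key and value. Whether the patent's formulas match this form is an assumption; the function names and toy inputs are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(search, tag, result):
    """Return (weights, context) for N words.

    weights[i, j] is the attention word i pays to word j (sums to 1 over j);
    context[i] is the weighted sum of result vectors over all N words.
    """
    scores = search @ tag.T / np.sqrt(search.shape[-1])  # shape (N, N)
    weights = softmax(scores, axis=-1)
    context = weights @ result
    return weights, context

rng = np.random.default_rng(0)
search, tag, result = (rng.normal(size=(5, 4)) for _ in range(3))
weights, context = attend(search, tag, result)
```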
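(5) Formula for the grammar-correction score of the English text
[Formula (7): supplied as image BDA0003205949390000031 in the original publication; not reproduced in this text]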
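Detailed processing steps of the correction method
As shown in Figure 2, the processing flow of the sentence context word-vector representation module is as follows:
P201 Start;
P202 Read in the English text to be processed;
P203 Perform sentence splitting, tokenization and part-of-speech tagging on the text;
P204 Perform syntactic dependency analysis and word dependency analysis on the text to obtain its syntactic relation tree and word dependency tree;
P205 Read each sentence of the text in turn;
P206 Based on the syntactic relation tree and word dependency tree of the text, vectorize the words of each sentence to obtain the word vectors of the words in each sentence;
P207 Initialize the values of the search weight matrix, tag weight matrix and result weight matrix;
P208 Use formulas (1), (2) and (3) to compute the search vector, tag vector and result vector of each word in each sentence;
P209 Use formula (4) to compute the word attention weight of each word in each sentence, and formula (5) to compute the inter-sentence attention vector of each sentence;
P210 Update, for each sentence, the values of the search weight matrix, tag weight matrix, result weight matrix, word attention weights and inter-sentence attention vector;
P211 Based on the updated word attention weights of each sentence, update the inter-sentence attention vector of each sentence, and use formula (6) to compute the context word vectors of each sentence;
P212 Output the context word-vector representation of the sentences in the text;
P213 End.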
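Step P203 starts with sentence splitting and tokenization. A minimal regex sketch is given below; it is not the tool used in the patent, which is not named. It assumes sentences end in `.`, `!` or `?` followed (with or without a space) by an uppercase letter, and it treats every punctuation mark as its own token. The function names are hypothetical.

```python
import re

def split_sentences(text):
    """Split after . ! ? when the next non-space character is uppercase."""
    return [s for s in re.split(r"(?<=[.!?])\s*(?=[A-Z])", text.strip()) if s]

def tokenize(sentence):
    """Return runs of letters/digits and single punctuation marks."""
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", sentence)

text = ("Above all,we customers must erect a sense of self-protection."
        "If we all have bright eyes,fake commodities will have to be hidden.")
sentences = split_sentences(text)
tokens = tokenize(sentences[0])
```

A rule this simple would mis-split abbreviations such as "e.g. The", so a real system would need a trained sentence splitter.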
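As shown in Figure 3, the processing flow of the best-candidate-sentence recommendation module is as follows:
P301 Start;
P302 Read the context word vectors of one sentence of the English text to be processed;
P303 Reduce the dimensionality of the sentence's context word vectors by singular value decomposition;
P304 Merge the reduced context word vectors with the word vectors of the words;
P305 Scale and normalize the merged word vectors;
P306 Compute, for every word in the English word dictionary, its probability of being the next word, and take the 5 most probable words as candidate words;
P307 Run inference from each candidate word to obtain new candidate words based on it;
P308 Among the sentences formed from the candidate words, take the most probable sentence as a candidate sentence, and accumulate the probabilities of the selected candidate sentences;
P309 Check whether the accumulated probability of the candidate sentences has reached the preset threshold; if so, go to P310, otherwise go to P308;
P310 Stop selecting new candidate sentences and output all selected candidate sentences;
P311 Check whether all sentences of the text have been processed; if so, go to P312, otherwise go to P302;
P312 Output the candidate sentence sets of all sentences in the text;
P313 End.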
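Steps P306 and P308 to P310 amount to a top-5 selection followed by a loop that stops once a running probability sum reaches a threshold. The sketch below shows only that control flow; the probabilities are made-up toy values, the threshold is not given in the text, and both function names are hypothetical.

```python
import heapq

def top_candidates(next_word_probs, k=5):
    """Return the k most probable next words with their probabilities."""
    return heapq.nlargest(k, next_word_probs.items(), key=lambda kv: kv[1])

def select_until_threshold(sentence_probs, threshold):
    """Take sentences in descending probability until the running sum
    of their probabilities reaches `threshold`."""
    chosen, total = [], 0.0
    ranked = sorted(sentence_probs.items(), key=lambda kv: kv[1], reverse=True)
    for sentence, prob in ranked:
        chosen.append((sentence, prob))
        total += prob
        if total >= threshold:
            break
    return chosen

# Toy values for illustration only.
next_word_probs = {"dangerous": 0.41, "danger": 0.22, "real": 0.12,
                   "hidden": 0.10, "great": 0.08, "common": 0.07}
candidates = top_candidates(next_word_probs)

sentence_probs = {"s1": 0.5, "s2": 0.3, "s3": 0.15, "s4": 0.05}
selected = select_until_threshold(sentence_probs, threshold=0.9)
```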
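As shown in Figure 4, the processing flow of the grammar-error correction generation module is as follows:
P401 Start;
P402 Read the candidate sentences of all sentences of the English text to be processed, and select the most probable candidate sentence as the grammar-correction result;
P403 Count the total number of grammar errors in the text;
P404 Use formula (7) to compute the grammar-correction score of the text, and generate the corresponding grammar-correction suggestions;
P405 End.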
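To turn a chosen candidate into a suggestion, the changed words have to be located. One possible way, sketched here with Python's standard `difflib`, is a word-level diff between the original sentence and the chosen candidate. This illustrates the idea only and is not claimed to be the patent's procedure; `word_changes` is a hypothetical helper.

```python
import difflib

def word_changes(original, corrected):
    """Return (old, new) pairs for every word span that differs."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

original = ("In modern society,we live on commodities,and the fake "
            "commodities is a danger enemy in the darkness.")
corrected = ("In modern society,we live on commodities,and the fake "
             "commodities is a dangerous enemy in the darkness.")
changes = word_changes(original, corrected)
```

For this pair the only change found is `danger` to `dangerous`. Counting the returned pairs over all sentences would give one possible error total for step P403.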
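With the method of the invention, after the sentences of the English text to be processed have been processed by the correction method, grammar-correction suggestions for those sentences are obtained. The method solves the low correction accuracy of rule-based methods for correcting sentence grammar errors in English text, and the small number of error types that statistics-based methods can correct.
Brief Description of the Drawings
Figure 1 is the overall processing flowchart of the invention;
Figure 2 is the processing flowchart of the sentence context word-vector representation module of the invention;
Figure 3 is the processing flowchart of the best-candidate-sentence recommendation module of the invention;
Figure 4 is the processing flowchart of the grammar-error correction generation module of the invention.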
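Detailed Description of Embodiments
The invention is further described below with reference to an embodiment, which does not limit the invention.
Embodiment
The English text to be processed in this embodiment is a Chinese student's English composition taken from the Chinese Learner English Corpus. The method for automatically correcting sentence grammar errors in this text comprises the following steps:
Step 1: run the "sentence context word-vector representation module"
The English text to be processed is as follows: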
In modern society,we live on commodities,and the fake commodities is a danger enemy in the darkness.They not only cannot afford us the useful aspect what they should have,but also intimid our lives and our possessions.For example,a high-pressure pain is not quantified,and its explosion can cause a tragical accident.The like that has been printed in newspaper not for the first time.As we all know,the substance CH3OH in fake wine will make bright eyes dim.On the other hand,the fake commodities can also affect the fame of some firm badly.Now it is time for us to get rid of all the fake commodities from the shelf in the shops.Above all,we customers must erect a sense of self-protection.If we all have bright eyes,fake commodities will have to be hidden.Second,the government must act on a stiffer law to prohibit the production of fake commodities.I believe the day without any fake commodities will come soon in spite of some difficulties.
(1) Sentence splitting and tokenization of the text give the following result:
Sentence 1
[In modern society,we live on commodities,and the fake commodities is a danger enemy in the darkness.]
Sentence 2
[They not only cannot afford us the useful aspect what they should have,but also intimid our lives and our possessions.]
Sentence 3
[For example,a high-pressure pain is not quantified,and its explosion can cause a tragical accident.]
Sentence 4
[The like that has been printed in newspaper not for the first time.]
Sentence 5
[As we all know,the substance CH3OH in fake wine will make bright eyes dim.]
Sentence 6
[On the other hand,the fake commodities can also affect the fame of some firm badly.]
Sentence 7
[Now it is time for us to get rid of all the fake commodities from the shelf in the shops.]
Sentence 8
[Above all,we customers must erect a sense of self-protection.]
Sentence 9
[If we all have bright eyes,fake commodities will have to be hidden.]
Sentence 10
[Second,the government must act on a stiffer law to prohibit the production of fake commodities.]
Sentence 11
[I believe the day without any fake commodities will come soon in spite of some difficulties.]
(2) Part-of-speech tagging of the split and tokenized text gives the following result:
Sentence 1
[In/IN modern/JJ society/NN,/,we/PRP live/VBP on/IN commodities/NNS and/CC the/DT fake/JJ commodities/NNS,/,is/VBZ a/DT danger/NN enemy/NN in/IN the/DT darkness/NN./.]
Sentence 2
[They/PRP not/RB only/RB can/MD not/RB afford/VB us/PRP the/DT useful/JJ aspect/NN what/WP they/PRP should/MD have/VB,/,but/CC also/RB intimid/VBD our/PRP$ lives/NNS and/CC our/PRP$ possessions/NNS./.]
Sentence 3
[For/IN example/NN,/,a/DT high/JJ-/HYPH pressure/NN pain/NN is/VBZ not/RB quantified/VBN,/,and/CC its/PRP$ explosion/NN can/MD cause/VB a/DT tragical/JJ accident/NN./.]
Sentence 4
[The/DT like/NN that/WDT has/VBZ been/VBN printed/VBN in/IN newspaper/NN not/RB for/IN the/DT first/JJ time/NN./.]
Sentence 5
[As/IN we/PRP all/RB know/VBP,/,the/DT substance/NN CH3OH/NN in/IN fake/JJ wine/NN will/MD make/VB bright/JJ eyes/NNS dim/JJ./.]
Sentence 6
[On/IN the/DT other/JJ hand/NN,/,the/DT fake/JJ commodities/NNS can/MD also/RB affect/VB the/DT fame/NN of/IN some/DT firm/NN badly/RB./.]
Sentence 7
[Now/RB it/PRP is/VBZ time/NN for/IN us/PRP to/TO get/VB rid/VBN of/IN all/PDT the/DT fake/JJ commodities/NNS from/IN the/DT shelf/NN in/IN the/DT shops/NNS./.]
Sentence 8
[Above/IN all/DT,/,we/PRP customers/NNS must/MD erect/VB a/DT sense/NN of/IN self/NN-/HYPH protection/NN./.]
Sentence 9
[If/IN we/PRP all/RB have/VBP bright/JJ eyes/NNS,/,fake/JJ commodities/NNS will/MD have/VB to/TO be/VB hidden/VBN./.]
Sentence 10
[Second/RB,/,the/DT government/NN must/MD act/VB on/IN a/DT stiffer/JJR law/NN to/TO prohibit/VB the/DT production/NN of/IN fake/JJ commodities/NNS./.]
Sentence 11
[I/PRP believe/VBP the/DT day/NN without/IN any/DT fake/JJ commodities/NNS will/MD come/VB soon/RB in/IN spite/NN of/IN some/DT difficulties/NNS./.]
(3) Syntactic dependency analysis of the sentences in the text gives syntactic relation trees with the following structure:
Sentence 1
(ROOT(S(S(PP(IN In)(NP(JJ modern)(NN society)))(,,)(NP(PRP we))(VP(VBP live)(PP(IN on)(NP(NNS commodities)))))(,,)(CC and)(S(NP(DT the)(JJ fake)(NNS commodities))(VP(VBZ is)(NP(NP(DT a)(NN danger)(NN enemy))(PP(IN in)(NP(DT the)(NN darkness))))))(..)))
Sentence 2
(ROOT(S(NP(PRP They))(VP(CONJP(RB not)(RB only))(VP(MD can)(RB not)(VP(VB afford)(S(NP(PRP us))(NP(NP(DT the)(JJ useful)(NN aspect))(SBAR(WHNP(WP what))(S(NP(PRP they))(VP(MD should)(VP(VB have)))))))))(,,)(CONJP(CC but)(RB also))(VP(VBD intimid)(NP(NP(PRP$ our)(NNS lives))(CC and)(NP(PRP$ our)(NNS possessions)))))(..)))
Sentence 3
(ROOT(S(S(PP(IN For)(NP(NN example)))(,,)(NP(DT a)(NML(JJ high)(HYPH-)(NN pressure))(NN pain))(VP(VBZ is)(RB not)(VP(VBN quantified))))(,,)(CC and)(S(NP(PRP$ its)(NN explosion))(VP(MD can)(VP(VB cause)(NP(DT a)(JJ tragical)(NN accident)))))(..)))
Sentence 4
(ROOT(S(NP(NP(NNP The))(PP(IN like)(NP(DT that))))(VP(VBZ has)(VP(VBN been)(VP(VBN printed)(PP(IN in)(NP(NP(NN newspaper)(RB not))(PP(IN for)(NP(DT the)(JJ first)(NN time))))))))(..)))
Sentence 5
(ROOT(S(SBAR(IN As)(S(NP(PRP we))(ADVP(RB all))(VP(VBP know))))(,,)(NP(NP(DT the)(NN substance)(NN CH3OH))(PP(IN in)(NP(JJ fake)(NN wine))))(VP(MD will)(VP(VB make)(S(NP(JJ bright)(NNS eyes))(ADJP(JJ dim)))))(..)))
Sentence 6
(ROOT(S(PP(IN On)(NP(DT the)(JJ other)(NN hand)))(,,)(NP(DT the)(JJ fake)(NNS commodities))(VP(MD can)(ADVP(RB also))(VP(VB affect)(NP(NP(DT the)(NN fame))(PP(IN of)(NP(DT some)(NN firm))))(ADVP(RB badly))))(..)))
Sentence 7
(ROOT(S(ADVP(RB Now))(NP(PRP it))(VP(VBZ is)(NP(NP(NN time))(PP(IN for)(NP(PRP us)))(S(VP(TO to)(VP(VB get)(VP(VBN rid)(PP(IN of)(NP(NP(PDT all)(DT the)(JJ fake)(NNS commodities))(PP(IN from)(NP(NP(DT the)(NN shelf))(PP(IN in)(NP(DT the)(NNS shops)))))))))))))(..)))
Sentence 8
(ROOT(S(PP(IN Above)(NP(DT all)))(,,)(NP(PRP we)(NNS customers))(VP(MD must)(VP(VB erect)(NP(NP(DT a)(NN sense))(PP(IN of)(NP(NN self)(HYPH-)(NN protection))))))(..)))
Sentence 9
(ROOT(S(SBAR(IN If)(S(NP(PRP we))(ADVP(RB all))(VP(VBP have)(NP(JJ bright)(NNS eyes)))))(,,)(NP(JJ fake)(NNS commodities))(VP(MD will)(VP(VB have)(S(VP(TO to)(VP(VB be)(VP(VBN hidden)))))))(..)))
Sentence 10
(ROOT(S(ADVP(RB Second))(,,)(NP(DT the)(NN government))(VP(MD must)(VP(VB act)(PP(IN on)(NP(DT a)(JJR stiffer)(NN law)))(S(VP(TO to)(VP(VB prohibit)(NP(NP(DT the)(NN production))(PP(IN of)(NP(JJ fake)(NNS commodities)))))))))(..)))
Sentence 11
(ROOT(S(NP(PRP I))(VP(VBP believe)(SBAR(S(NP(NP(DT the)(NN day))(PP(IN without)(NP(DT any)(JJ fake)(NNS commodities))))(VP(MD will)(VP(VB come)(ADVP(RB soon))(PP(IN in)(NP(NP(NN spite))(PP(IN of)(NP(DT some)(NNS difficulties))))))))))(..)))
(4) Word dependency analysis of the words in the text gives word dependency trees with the following structure:
Sentence 1
[case(society-3,In-1)amod(society-3,modern-2)obl:in(live-6,society-3)nsubj(live-6,we-5)root(ROOT-0,live-6)case(commodities-8,on-7)obl:on(live-6,commodities-8)cc(enemy-17,and-10)det(commodities-13,the-11)amod(commodities-13,fake-12)nsubj(enemy-17,commodities-13)cop(enemy-17,is-14)det(enemy-17,a-15)compound(enemy-17,danger-16)conj:and(live-6,enemy-17)case(darkness-20,in-18)det(darkness-20,the-19)nmod:in(enemy-17,darkness-20)]
Sentence 2
[nsubj(afford-6,They-1)nsubj(intimid-18,They-1)advmod(only-3,not-2)cc:preconj(afford-6,only-3)aux(afford-6,can-4)advmod(afford-6,not-5)root(ROOT-0,afford-6)nsubj(aspect-10,us-7)det(aspect-10,the-8)amod(aspect-10,useful-9)xcomp(afford-6,aspect-10)obj(have-14,aspect-10)ref(aspect-10,what-11)nsubj(have-14,they-12)aux(have-14,should-13)acl:relcl(aspect-10,have-14)cc(intimid-18,but-16)advmod(intimid-18,also-17)conj:and(afford-6,intimid-18)nmod:poss(lives-20,our-19)obj(intimid-18,lives-20)cc(possessions-23,and-21)nmod:poss(possessions-23,our-22)obj(intimid-18,possessions-23)conj:and(lives-20,possessions-23)]
Sentence 3
[case(example-2,For-1)obl:for(quantified-11,example-2)det(pain-8,a-4)amod(pressure-7,high-5)punct(pressure-7,--6)compound(pain-8,pressure-7)nsubj:pass(quantified-11,pain-8)aux:pass(quantified-11,is-9)advmod(quantified-11,not-10)root(ROOT-0,quantified-11)cc(cause-17,and-13)nmod:poss(explosion-15,its-14)nsubj(cause-17,explosion-15)aux(cause-17,can-16)conj:and(quantified-11,cause-17)det(accident-20,a-18)amod(accident-20,tragical-19)obj(cause-17,accident-20)]
Sentence 4
[nsubj:pass(printed-6,The-1)case(that-3,like-2)nmod:like(The-1,that-3)aux(printed-6,has-4)aux:pass(printed-6,been-5)root(ROOT-0,printed-6)case(newspaper-8,in-7)obl:in(printed-6,newspaper-8)advmod(newspaper-8,not-9)case(time-13,for-10)det(time-13,the-11)amod(time-13,first-12)nmod:for(newspaper-8,time-13)]
Sentence 5
[mark(know-4,As-1)nsubj(know-4,we-2)advmod(know-4,all-3)advcl(make-13,know-4)det(CH3OH-8,the-6)compound(CH3OH-8,substance-7)nsubj(make-13,CH3OH-8)case(wine-11,in-9)amod(wine-11,fake-10)nmod:in(CH3OH-8,wine-11)aux(make-13,will-12)root(ROOT-0,make-13)amod(eyes-15,bright-14)nsubj(dim-16,eyes-15)xcomp(make-13,dim-16)]
Sentence 6
[case(hand-4,On-1)det(hand-4,the-2)amod(hand-4,other-3)obl:on(affect-11,hand-4)det(commodities-8,the-6)amod(commodities-8,fake-7)nsubj(affect-11,commodities-8)aux(affect-11,can-9)advmod(affect-11,also-10)root(ROOT-0,affect-11)det(fame-13,the-12)obj(affect-11,fame-13)case(firm-16,of-14)det(firm-16,some-15)nmod:of(fame-13,firm-16)advmod(affect-11,badly-17)]
Sentence 7
[advmod(time-4,Now-1)nsubj(time-4,it-2)cop(time-4,is-3)root(ROOT-0,time-4)case(us-6,for-5)nmod:for(time-4,us-6)mark(rid-9,to-7)aux:pass(rid-9,get-8)acl(time-4,rid-9)case(commodities-14,of-10)det:predet(commodities-14,all-11)det(commodities-14,the-12)amod(commodities-14,fake-13)obl:of(rid-9,commodities-14)case(shelf-17,from-15)det(shelf-17,the-16)nmod:from(commodities-14,shelf-17)case(shops-20,in-18)det(shops-20,the-19)nmod:in(shelf-17,shops-20)]
Sentence 8
[case(all-2,Above-1)obl:above(erect-7,all-2)dep(customers-5,we-4)nsubj(erect-7,customers-5)aux(erect-7,must-6)root(ROOT-0,erect-7)det(sense-9,a-8)obj(erect-7,sense-9)case(protection-13,of-10)compound(protection-13,self-11)punct(protection-13,--12)nmod:of(sense-9,protection-13)]
Sentence 9
[mark(have-4,If-1)nsubj(have-4,we-2)advmod(have-4,all-3)advcl(have-11,have-4)amod(eyes-6,bright-5)obj(have-4,eyes-6)amod(commodities-9,fake-8)nsubj(have-11,commodities-9)nsubj:pass:xsubj(hidden-14,commodities-9)aux(have-11,will-10)root(ROOT-0,have-11)mark(hidden-14,to-12)aux:pass(hidden-14,be-13)xcomp(have-11,hidden-14)]
Sentence 10
[advmod(act-6,Second-1)det(government-4,the-3)nsubj(act-6,government-4)nsubj:xsubj(prohibit-12,government-4)aux(act-6,must-5)root(ROOT-0,act-6)case(law-10,on-7)det(law-10,a-8)amod(law-10,stiffer-9)obl:on(act-6,law-10)mark(prohibit-12,to-11)xcomp(act-6,prohibit-12)det(production-14,the-13)obj(prohibit-12,production-14)case(commodities-17,of-15)amod(commodities-17,fake-16)nmod:of(production-14,commodities-17)]
Sentence 11
[nsubj(believe-2,I-1)root(ROOT-0,believe-2)det(day-4,the-3)nsubj(come-10,day-4)case(commodities-8,without-5)det(commodities-8,any-6)amod(commodities-8,fake-7)nmod:without(day-4,commodities-8)aux(come-10,will-9)ccomp(believe-2,come-10)advmod(come-10,soon-11)case(difficulties-16,in-12)fixed(in-12,spite-13)fixed(in-12,of-14)det(difficulties-16,some-15)obl:in_spite_of(come-10,difficulties-16)]
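Each dependency listing above is a run of `relation(head-index,dependent-index)` items with no separators. A short regex sketch for turning such a run into tuples is given below; the pattern and the function name are assumptions made for illustration, written against the notation as printed here.

```python
import re

# relation ( head - index , dependent - index )
DEP = re.compile(r"([A-Za-z_:]+)\((.+?)-(\d+),(.+?)-(\d+)\)")

def parse_dependencies(text):
    """Return (relation, head, head_index, dependent, dependent_index) tuples."""
    return [(rel, head, int(h), dep, int(d))
            for rel, head, h, dep, d in DEP.findall(text)]

line = ("det(pain-8,a-4)amod(pressure-7,high-5)"
        "punct(pressure-7,--6)compound(pain-8,pressure-7)")
triples = parse_dependencies(line)
```

The lazy `.+?` groups let a token that is itself a hyphen, as in `punct(pressure-7,--6)`, be read as the word `-` at position 6.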
(5) Converting the words of the text into word vectors gives the following word vectors:
Sentence 1
[0.88731223,0.58120215,-0.73104781,...,-0.38501585,0.54886746,-0.03811252],[0.64540702,0.84005779,-0.32642967,...,-0.68850678,0.20182693,0.09689900],[0.28777501,0.73943686,-0.11752694,...,-0.72764307,0.56701452,0.44484282],[0.32574126,1.01410854,-0.37209913,...,-0.49188718,0.40403485,-0.33792970],[0.82257861,1.04121339,-0.16380487,...,-0.39518330,0.71957588,0.31918916],[0.89457726,0.47683927,-0.56336206,...,-0.49055418,0.18090129,0.07754472],[0.22887111,0.40329373,-0.01253630,...,-0.50055373,0.48401821,0.4236083],[0.54691792,0.66339368,-0.59164178,...,-0.61900127,0.66203475,-0.12971932],[0.52764875,0.75389832,-0.47884265,...,-0.73180723,0.22470111,-0.40799180],[0.12549956,0.69425756,0.35147083,...,-0.91356879,0.44520065,-0.02031172],[0.22887111,0.40329373,-0.01253630,...,-0.50055373,0.48401821,0.42360830],[0.61284226,0.76920104,-0.82114655,...,-0.56082326,0.07730889,-0.48182729],[0.40888742,0.56879914,-0.46132466,...,-0.43315104,0.12292353,-0.08168960],[0.71680045,0.44602990,-0.08714306,...,-0.56923527,0.46241698,0.10988426],[0.88731223,0.58120215,-0.73104781,...,-0.38501585,0.54886746,-0.03811252],[0.52764875,0.75389832,-0.47884265,...,-0.73180723,0.22470111,-0.40799180],[0.01973449,0.40741289,0.23051713,...,-0.34422147,0.17322083,-0.32863113],[0.77281857,0.30524546,-0.63670730,...,-0.71217430,0.52426460,0.93458830],
……
Sentence 11
[0.45000613,0.80553681,-0.10446999,...,-0.51769769,0.27324462,-0.23227419],[0.40934685,0.56205034,-0.17857145,...,-0.72519159,0.56253004,0.41420683],[0.52764875,0.75389832,-0.47884265,...,-0.73180723,0.22470111,-0.40799180],[0.15220518,0.37932172,-0.12466386,...,-0.60083771,0.35271147,0.08316841],[0.38012400,0.41926789,-0.39678419,...,-0.85321313,0.52345985,-0.00418444],[0.33414388,0.47365859,-0.48332623,...,-0.33296272,0.46423438,-0.14165024],[0.72166508,0.58148539,-0.44393054,...,-0.74636704,0.23864335,-0.11923205],[0.59502685,0.82335049,-0.64003140,...,-0.54264212,0.68246937,0.14163448],[0.49008131,0.38584661,0.07494428,...,-0.50990921,0.10206913,0.39514568],[0.52764875,0.75389832,-0.47884265,...,-0.73180723,0.22470111,-0.40799180],[0.06046878,0.74936205,-0.10149002,...,-0.27482945,1.08982205,-0.21852523],[0.76665276,0.50959057,-0.63455814,...,-0.59576172,0.23965351,-0.14607368],[0.37342623,0.45145273,-0.03400040,...,-0.53484255,0.39585698,-0.31821975],[0.76665276,0.50959057,-0.63455814,...,-0.59576172,0.23965351,-0.14607368],[0.67337489,0.77567345,-0.53990513,...,-0.53218424,0.31343362,0.01751496],[0.41714790,0.35483381,-0.07002024,...,-0.55396628,0.22909264,0.21319027]
(6) Initializing the search weight matrix, tag weight matrix and result weight matrix gives the following initial values:
Search weight matrix
[-0.93852663 -0.57928514 -0.9754391 0.9433651 0.8345357 -0.19638540.9410325 0.2755371 -0.94585985 -0.99999636 -0.7588035 0.97825813 0.981277050.7931257 0.9414303 -0.7641323 -0.5325371 -0.6604417 0.48289928 -0.500185670.80085874 0.9999998 -0.42725858 0.32169098 0.5716769 0.9986829 -0.82804880.9420337 0.9622615 0.7180853 -0.80044353 0.2599983 -0.9926198 -0.27187952 -0.97995365 -0.9949787 0.5513077 -0.69439924 0.005424826 -0.02851493 -0.92623085 0.31212965 0.99999803 0.31744084 0.7138329 -0.30178043 -1.00.40791273 -0.9089964 0.9856001 0.95775354 0.96267927 0.31973028 0.57932490.5936054 -0.47324258 -0.1081081 0.26481277 -0.3450029 -0.56022626 -0.66203040.51201 -0.96771824 -0.8957188 0.9652928 0.9167802 -0.27903348 -0.31508788 -0.27735722 -0.06555849 0.94002247 0.4078891 -0.18808922 -0.906149740.85171497 0.29182288 -0.69974375 1.0 -0.74834865 -0.97941846 0.95751050.90557516 0.60767704 -0.5465612 0.65718335 -1.0 0.5289216 0.04478532 -0.99036914 0.39699554 0.67770135 -0.4028355 0.74035925 0.7142711 -0.63405085-0.7261532 -0.5376098 -0.9439443 -0.47505817 -0.47676566 0.15480274 -0.4105868 -0.5695629 -0.5274124 0.51932883 -0.5541567 -0.6824901 0.72205910.5015991 0.78165793 0.53702337 -0.47133535 0.66415066 -0.9613365 0.73404795-0.46821955 -0.9914113 -0.6865679 -0.99135035 0.7203648 -0.61735064 -0.20672026 0.9701276 -0.7293478 0.6253504 -0.1658886 -0.9830006 -1.0-0.7836291 -0.7457903 -0.4168746 -0.41568932 -0.9808697 -0.9610197 0.66763440.9610069 0.30807412 0.99999183 -0.46265262 0.9579391 -0.73210585 -0.853247050.8805271 -0.48206675 0.9132179 0.54531676 -0.65999603 0.27384743 -0.54864990.7381528 -0.85510635 -0.4125382 -0.92722934 -0.94044137 -0.470008280.9606229 -0.8138718 -0.98284966 -0.3483187 -0.28044567 -0.603799 0.90349150.8237358 0.47757703 -0.52120495 0.4420451 0.27747053 0.69998723 -0.8934448 -0.5452992 0.51825696 -0.4423273 -0.9640199 -0.9795761 -0.6279196 0.704966960.99389327 0.82892776 0.36524254 0.924795 -0.33834696 0.876235 -0.973326440.9867138 -0.3693307 
0.4390758 -0.712964 0.5258218 -0.8733913 0.384611850.91503936 -0.8722628 -0.8089284 -0.0982373 -0.5335075 -0.5572353 -0.92141120.5875499 -0.4277193 -0.47889253 -0.17448896 0.9454572 0.9891623 0.89272590.60432297 0.8651404 -0.9226203 -0.55179673 0.2318353 0.47967187 0.23585440.99457157 -0.88839304 -0.14081924 -0.95551693 -0.98840755 0.044587657 -0.9282594 -0.23548827 -0.79045653 0.8534265 -0.57432944 0.73235446 0.59040433-0.9860356 -0.83869904 0.49752986 -0.623013 0.48438412 -0.27723688 0.8783370.9776588 -0.65762806 0.66433024 0.9228069 -0.96029663 -0.84734094 0.8427519-0.4912734 0.92624027 -0.79313713 0.99188876 0.9775371 0.8679769 -0.9441205 -0.8685749 -0.856022 -0.8269005 -0.20677428 0.19313015 0.9550734 0.686444760.57362485 0.15790954 -0.8087857 0.9989102 -0.84788394 -0.9612054 -0.46476302-0.54779977 -0.99062073 0.96064585 0.32870626 0.7347536 -0.65325516 -0.81362396 -0.96787184 0.9361749 0.1918277 0.98614925 -0.44271922 -0.9677956-0.7365571 -0.944983 0.020219954 -0.2970671 -0.6612086 0.09309814 -0.95848220.5780944 0.65581214 0.6090926 -0.9651666 0.99960613 1.0 0.980573360.89051276 0.93666524 -0.9999921 -0.55241376 0.9999993 -0.9977021 -1.0 -0.93428266 -0.7813177 0.442978 -1.0 -0.13297546 -0.086355336 -0.93614320.82980186 0.97704726 0.9965491 0.87467694]
Tag weight matrix
[0.9732773 0.7348044 0.6228178 -0.32777885 0.46181533 -0.9760962 -0.94163823 -0.80169123 -0.8691235 0.99980026 0.27033848 -0.805028 -0.925854860.7415514 -0.03183183 0.26345888 -0.97040737 -0.43909094 0.839970470.89861715 0.3148959 0.37373593 -0.72968465 0.4496323 0.25093934 0.34726110.6917669 -0.9550405 -0.6180419 -0.21123545 0.2854544 -0.83094734 -0.96598750.970728 -0.3359741 0.9723012 1.0 0.5905968 -0.9126012 0.83292776 0.42350197-0.57857996 1.0 0.8546204 -0.9842684 -0.6837396 0.8285335 -0.6901522 -0.8237249 0.9998405 -0.2803522 -0.8799769 -0.7137745 0.98063505 -0.993346330.9993464 -0.9352548 -0.9823275 0.96964484 0.9521962 -0.68265945 -0.831955550.18496192 -0.80025256 0.5008123 -0.93724686 0.8101333 0.6330687 -0.229535270.89323217 -0.87479544 -0.6644085 0.36399633 -0.6636714 -0.4254855 0.980965260.6702441 -0.38487446 -0.10170595 -0.45299453 -0.8679591 -0.979773760.7959597 1.0-0.408101 0.9313571 -0.54952115 -0.0797783 0.00286539950.6877464 0.69033056 -0.43368292 -0.9394509 0.92910355 -0.9715225 -0.99057320.80351007 0.25813422 -0.32561827 0.99999964 0.6117077 0.38629717 0.429654360.99757993 -0.06804277 0.5906618 0.9687944 0.98618084 -0.482932 0.68550050.84409165 -0.9677905 -0.42126963 -0.73167545 0.15805046 -0.93898820.118746065 -0.95660996 0.97227156 0.98656315 0.5933755 0.39894286 0.87168731.0-0.9271177 0.57872427 -0.13623634 0.8372727 -0.99998003 -0.8284159 -0.46975562 -0.20838195 -0.9349155 -0.51858497 0.44831672 -0.9629988 0.95798240.92290026 -0.9942093 -0.99028236 -0.48909596 0.92428124 0.18013635 -0.99738324 -0.8055185 -0.5644878 0.8887961 -0.3815287 -0.94340175 -0.68667674-0.5677376 0.64246285 -0.42589936 0.66641265 0.9327008 0.72508246 -0.89856136-0.4272885 -0.09482083 -0.83392894 0.91348124 -0.8615762 -0.9898126 -0.25428835 1.0 -0.508361 0.95597446 0.7863634 0.7940155 -0.355824230.28512347 0.9863842 0.3516465 -0.7890025 -0.96538055 -0.5692717 -0.67058330.7851611 0.83232665 0.83772707 0.90096194 0.9235191 0.20323493 -0.06770899 -0.095291555 0.99985546 
-0.42753133 -0.25634903 -0.56413096 -0.2539562 -0.44034418 -0.3597228 1.0 0.385305 0.8229532 -0.9928271 -0.96011597 -0.9443362 1.0 0.84886146 -0.8160441 0.7532111 0.59750855 -0.0249570520.83749825 -0.32840902 -0.39816692 0.27313402 0.2699922 0.9570294 -0.6658057-0.97672814 -0.77847016 0.57142115 -0.9706531 0.9999978 -0.6693475 -0.5731653-0.55459416 -0.41579917 0.093631476 -0.11751255 -0.9829011 -0.396143850.39634296 0.964925 0.285559 -0.6878334 -0.9165842 0.9316707 0.8943601 -0.97349924 -0.9712057 0.96567255 -0.98713046 0.7581767 1.0 0.391858550.5673435 0.37038493 -0.6254468 0.5251907 -0.5285827 0.7726603 -0.9521132 -0.40844926 -0.28094995 0.5304313 -0.3595196 -0.60880595 0.77281857 0.30524546-0.6367073 -0.7121743 -0.23052841 0.5242646 0.9345883 -0.42915082 -0.177219260.2742938 -0.15892437 -0.94877285 -0.4855368 -0.5819405 -0.99999976 0.7694981-1.0 0.8080647 0.5325984 -0.39299208 0.86971176 0.4949563 0.89314055 -0.84562975 -0.95855373 0.4144704 0.86852527 -0.49621144 -0.8203894 -0.74488580.41348496 -0.096635774 0.4079483 -0.8333679 0.8128923 -0.26106665 1.00.21010002 -0.83741623 -0.9825915 0.20455872 -0.33447865 1.0 -0.8994603 -0.9601328 0.5013205 -0.8236577 -0.8710756 0.545023 0.011208467 -0.8411487 -0.9880564 0.9667554 0.888109 -0.69324565]
Result weight matrix
[-0.9419668 -0.55713624 -0.97269773 0.92488754 0.8181052 -0.202435450.90414405 0.2930623 -0.9482085 -0.99999684 -0.81892914 0.97855 0.98339260.74806386 0.9339398 -0.78607893 -0.3959001 -0.6722102 0.40020847 -0.463024260.7437414 0.99999946 -0.2939085 0.32978788 0.5903137 0.99867517 -0.848594670.94220465 0.9629802 0.6101735 -0.7324193 0.36757135 -0.9905183 -0.28683075 -0.9679988 -0.995032 0.5821432 -0.71134156 0.06046868 -0.10328594 -0.9129920.39078587 0.9999973 0.3814591 0.6462977 -0.37037757 -1.0 0.39619774 -0.90487635 0.9875689 0.9537641 0.98144215 0.30634275 0.57031864 0.5890024 -0.5006227 -0.052588645 0.2526485 -0.34883696 -0.59646815 -0.67667030.51449054 -0.9446721 -0.9128197 0.97171396 0.9296414 -0.3472259 -0.32634053-0.24326883 -0.055432178 0.93608046 0.3155784 -0.2402385 -0.878385660.8854268 0.36244097 -0.7534438 1.0 -0.7779765 -0.9802859 0.9222548 0.90494880.68042 -0.5741393 0.6493024 -1.0 0.62394863 -0.018053856 -0.98904380.42681798 0.72159046 -0.34553525 0.59267575 0.7368815 -0.60032135 -0.71327764 -0.49316147 -0.925915 -0.4990491 -0.51251775 0.19760951 -0.41279778 -0.4972197 -0.52303976 0.51873356 -0.62001085 -0.654799 0.640343840.4336283 0.74944854 0.5512493 -0.4976758 0.6573349 -0.97128695 0.7927335 -0.44955623 -0.9892971 -0.75191087 -0.9890073 0.6808466 -0.5788558 -0.191558670.9688536 -0.6220369 0.5623865 -0.18882789 -0.9895727 -1.0 -0.82296455 -0.7020875 -0.44070876 -0.4478782 -0.97990793 -0.96851236 0.6708143 0.96322320.34590402 0.9999847 -0.46911094 0.9504683 -0.6881798 -0.82538337 0.8977076 -0.59039074 0.8950203 0.5158567 -0.67266273 0.21538389 -0.6035279 0.7172157 -0.8362736 -0.39726895 -0.9118482 -0.9262451 -0.49587554 0.9511121 -0.7488941-0.9829855 -0.29264233 -0.41023806 -0.6130626 0.83953136 0.8865214 0.44873998-0.49498907 0.5513412 0.3527894 0.720476 -0.8771519 -0.6215287 0.54062635 -0.5068145 -0.95950264 -0.982895 -0.5892084 0.73281 0.99301857 0.82734470.3646386 0.94895196 -0.37511384 0.89096826 -0.97179073 0.9848797 -0.2866440.3654667 
-0.5929482 0.56288224 -0.8091314 0.20238139 0.90750605 -0.8648725 -0.8120762 -0.16418605 -0.54305947 -0.530553 -0.9232808 0.5631609 -0.33315083-0.5069858 -0.2199384 0.93628484 0.98721886 0.8268996 0.5875212 0.87341 -0.9095007 -0.47507694 0.24327968 0.41347852 0.19497763 0.99503374 -0.8018322-0.21967615 -0.9461079 -0.9885726 0.02160253 -0.94249004 -0.24583887 -0.7975322 0.85434157 -0.542995 0.807365 0.63300043 -0.9864112 -0.79892610.52959883 -0.64361125 0.491298 -0.30995166 0.7962095 0.97642547 -0.628604950.6607291 0.9071111 -0.9267494 -0.8181853 0.83287925 -0.4871497 0.9078692 -0.7866857 0.9904054 0.9800968 0.8991177 -0.93559647 -0.8363416 -0.8212061 -0.85371083 -0.24178748 0.26606172 0.9656702 0.75601685 0.5927866 0.24516703 -0.77907753 0.99828136 -0.7094333 -0.96048135 -0.50148124 -0.44417462 -0.9871508 0.9571242 0.4171309 0.64616376 -0.6206483 -0.8046106 -0.957092340.926217 0.18128477 0.9884775 -0.5399007 -0.9571306 -0.77829236 -0.9307564 -0.04540403 -0.3640887 -0.23292497 -0.10815491 -0.9295621 0.83573747 0.9758290.9952732 -1.0 0.8535448 0.95422226 -0.7506009 0.99138236 -0.64398980.97208685 0.51043063 0.61659235 -0.32631555 0.45685592 -0.9639011 -0.9168702-0.8006079]
(7) The sentence context word-vector representations computed by formula (6) are as follows:
Sentence 1
[0.2805 0.3642 0.2743 0.7160 … 0.6938 0.4496 0.7118 0.5727]
[0.2792 0.2689 0.3060 0.7268 … 0.7278 0.6788 0.7179 0.3130]
[0.4023 0.3380 0.5987 0.3863 … 0.6778 0.7311 0.4270 0.5817]
[0.3247 0.5909 0.2708 0.4288 … 0.2753 0.2699 0.6416 0.3293]
[0.5151 0.4742 0.2864 0.5965 … 0.7311 0.5942 0.6562 0.4084]
[0.2689 0.5978 0.2880 0.7286 … 0.7219 0.7274 0.5760 0.6388]
[0.3370 0.6259 0.2800 0.2864 … 0.7255 0.7170 0.4141 0.4191]
[0.4395 0.4861 0.7183 0.5782 … 0.4402 0.2935 0.7079 0.5896]
[0.3201 0.7311 0.3148 0.2728 … 0.7155 0.7120 0.6638 0.3603]
[0.6730 0.4145 0.6440 0.6763 … 0.3543 0.3289 0.3791 0.2838]
[0.3778 0.3746 0.5492 0.3982 … 0.3782 0.3721 0.6269 0.3498]
[0.3419 0.6548 0.6067 0.6791 … 0.6344 0.3781 0.6587 0.2746]
[0.3592 0.4523 0.7249 0.3493 … 0.6370 0.4529 0.2710 0.2689]
[0.3051 0.3313 0.3916 0.3899 … 0.2729 0.2752 0.6617 0.7238]
[0.5856 0.7311 0.3848 0.7212 … 0.3344 0.3046 0.7105 0.3565]
[0.3023 0.4020 0.2866 0.2837 … 0.3785 0.7213 0.3211 0.2723]
[0.4274 0.3989 0.3514 0.6984 … 0.7082 0.6103 0.3787 0.6344]
[0.5873 0.6727 0.2938 0.3494 … 0.6320 0.3759 0.2770 0.2723]
……
Sentence 11
[0.4073 0.7091 0.2745 0.7281 … 0.4288 0.5904 0.3560 0.6371]
[0.3081 0.5504 0.7125 0.2963 … 0.3074 0.4590 0.3675 0.3704]
[0.2843 0.6372 0.4175 0.3759 … 0.4452 0.7183 0.7285 0.6957]
[0.6428 0.7055 0.2871 0.3834 … 0.5605 0.6019 0.5486 0.7301]
[0.3096 0.4453 0.2797 0.2712 … 0.5054 0.2804 0.4388 0.3106]
[0.7015 0.3675 0.6915 0.6532 … 0.2716 0.3103 0.6294 0.3444]
[0.6204 0.4231 0.6892 0.7264 … 0.3478 0.6594 0.7124 0.2836]
[0.3061 0.6970 0.3806 0.7126 … 0.3129 0.7292 0.7271 0.7108]
[0.2818 0.3023 0.3055 0.2987 … 0.4398 0.5661 0.7243 0.6805]
[0.6440 0.5610 0.3145 0.7307 … 0.3297 0.2768 0.3772 0.3907]
[0.2715 0.7225 0.6028 0.6561 … 0.3496 0.3090 0.2775 0.7163]
[0.5452 0.7288 0.3682 0.2775 … 0.3147 0.2828 0.4887 0.4100]
[0.3342 0.5002 0.2778 0.6231 … 0.6594 0.6521 0.2755 0.7310]
[0.7311 0.7277 0.7102 0.7166 … 0.2689 0.3575 0.7311 0.2693]
[0.2689 0.2851 0.3190 0.5779 … 0.2689 0.4420 0.4730 0.2830]
[0.6976 0.7263 0.7301 0.2689 … 0.7013 0.7220 0.3207 0.7294]
Step 2: run the "best-candidate-sentence recommendation module"
The candidate sentence sets output for all sentences of the text are as follows (each candidate is followed by its probability):
Sentence 1
In modern society,we live on commodities,and the fake commodities is a dangerous enemy in the darkness. 0.9621
In modern society,we live on commodities,and the fake commodities is a danger enemy in the darkness. 0.8036
In modern society,we live in commodities,and the fake commodities is a danger enemy in the darkness. 0.7829
In modern society,we live in commodities,and the phony commodities is a dangerous enemy in the darkness. 0.7643
In modern society,we live in commodities,and the phony commodities is a danger enemy in the dark. 0.7017
Sentence 2
They not only cannot afford us the useful aspect what they should have,but also intimidate our lives and possessions. 0.9513
They not only cannot afford us useful aspect what they should have,but also intimidate our lives and our possessions. 0.8961
They not only cannot provide us the useful aspect what they should have,but also intimidate our lives and our possessions. 0.8614
They not only cannot afford us the useful things what they should provide,but also intimidate our lives and our possessions. 0.8506
They not only cannot afford us the useful aspect that they should have,but also intimidate our lives and possessions. 0.8441
……
Sentence 11
I believe that the day without any fake commodities will come soon in spite of some difficulties. 0.9223
I believe the day without any fake commodities will come soon in spite of suffering some difficulties. 0.9036
I believe that the day without any fake product will come soon in spite of some difficulties. 0.8720
I believe that the day without any fake commodities will come soon in spite of suffering some difficulties. 0.8663
I believe the day without any fake products will come soon after suffering some difficulties. 0.8432
Step 3: Run the "sentence grammatical error correction generation module"
(1) Select the candidate sentence with the highest probability as the correction result, as follows:
Sentence 1
In modern society, we live on commodities, and the fake commodities is a dangerous enemy in the darkness.
Sentence 2
They not only cannot afford us the useful aspect what they should have, but also intimidate our lives and possessions.
……
Sentence 11
I believe that the day without any fake commodities will come soon in spite of some difficulties.
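The selection in (1) amounts to taking the maximum over each sentence's candidate set. A minimal sketch, assuming the candidate sets are held as a mapping from sentence number to (candidate, probability) pairs; the container layout and the helper name `pick_best_candidates` are illustrative assumptions, and the two candidates shown are taken from the example output for sentence 11 (with word spacing repaired):

```python
from typing import Dict, List, Tuple

# Hypothetical container: sentence number -> list of (candidate, probability).
Candidates = Dict[int, List[Tuple[str, float]]]


def pick_best_candidates(candidate_sets: Candidates) -> Dict[int, str]:
    """Return, for every sentence, the candidate with the highest probability."""
    best = {}
    for sentence_no, candidates in candidate_sets.items():
        # max() with a key compares the pairs by their probability only.
        text, _prob = max(candidates, key=lambda pair: pair[1])
        best[sentence_no] = text
    return best


candidate_sets = {
    11: [
        ("I believe the day without any fake commodities will come soon "
         "in spite of suffering some difficulties.", 0.9036),
        ("I believe that the day without any fake commodities will come soon "
         "in spite of some difficulties.", 0.9223),
    ],
}
best = pick_best_candidates(candidate_sets)
print(best[11])
```

The order of the candidates in the list does not matter, since the maximum is taken over the probability field.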
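(2) Count the grammatical errors in the English text to be processed, compute its grammar correction score according to formula (7), and give the corresponding grammar correction suggestions:
Total number of grammatical errors in the English text: 4
Erroneous sentence 1: In modern society, we live on commodities, and the fake commodities is a danger enemy in the darkness.
Error type: part-of-speech error
Suggested correction: change "danger" to "dangerous"
Example expression: A Wolf in a sheep's skin is our most dangerous enemy.
Erroneous sentence 2: For example, a high-pressure pain is not quantified, and its explosion can cause a tragical accident.
Error type: inappropriate word choice
Suggested correction: change "tragical" to "tragic"
Example expression: These tragic incidents have had an immediate effect.
Erroneous sentence 3: The like that has been printed in newspaper not for the first time.
Error type: noun number (singular/plural) error
Suggested correction: change "newspaper" to "newspapers"
Example expression: Newspapers lack the immediacy of television.
Erroneous sentence 4: On the other hand, the fake commodities can also affect the fame of some firm badly.
Error type: noun number (singular/plural) error
Suggested correction: change "firm" to "firms"
Example expression: Some smallish firms may close.
Grammatical correctness score of the English text: 81.82
Assessment of the degree of grammatical error in the English text: a small number of grammatical errors; overall fairly good.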
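The error statistics in this report can be illustrated with a small tally. This is only a sketch: the record type `GrammarError` and its field names are assumptions made for illustration, and the score itself is not computed here because formula (7) is not reproduced in this text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class GrammarError:
    # Field names are illustrative assumptions, not taken from the patent.
    error_type: str
    original: str
    suggestion: str


def summarize(errors: List[GrammarError]) -> dict:
    """Count the total number of errors and the number of errors per type."""
    by_type = Counter(err.error_type for err in errors)
    return {"total": len(errors), "by_type": dict(by_type)}


# The four errors listed in the report above.
errors = [
    GrammarError("part-of-speech error", "danger", "dangerous"),
    GrammarError("inappropriate word choice", "tragical", "tragic"),
    GrammarError("noun number error", "newspaper", "newspapers"),
    GrammarError("noun number error", "firm", "firms"),
]
summary = summarize(errors)
print(summary["total"])
print(summary["by_type"])
```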

Claims (5)

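1. A method for automatically correcting sentence grammatical errors in English text, characterized in that it comprises a correction model composed of a sentence context word vector representation module, a best candidate sentence recommendation module, and a sentence grammatical error correction generation module connected in sequence, and comprises the following steps:
(1) The processing flow of the sentence context word vector representation module is:
First, read in the English text to be processed, split it into sentences, and tag each word with its part of speech;
Second, perform syntactic dependency analysis and word dependency analysis based on the part-of-speech tags to obtain the syntactic relation tree and the word dependency tree of the English text to be processed;
Third, vectorize the words in each sentence of the English text to be processed according to its syntactic relation tree and word dependency tree, obtaining the word vector of each word in the sentence;
Fourth, initialize the search weight matrix, tag weight matrix, and result weight matrix, and compute the search vector, tag vector, and result vector of each word in the sentence;
Fifth, compute the word attention weights, inter-sentence attention vectors, and context word vectors of the words in the sentence, and finally output the sentence context word vector representation of the English text to be processed;
(2) The processing flow of the best candidate sentence recommendation module is:
First, read the context word vector representation of one sentence in the English text to be processed, and reduce the dimensionality of the sentence context word vectors by singular value decomposition;
Second, merge the dimension-reduced sentence context word vectors with the word vectors of the words;
Third, scale and normalize the merged word vectors;
Fourth, compute the next-word occurrence probability for every word in the English word dictionary, take the 5 words with the highest probability as candidate words, and perform inference from each candidate word in turn to obtain new candidate words based on that word;
Fifth, among the sentences formed by the candidate words, take the sentence with the highest probability as a candidate sentence, accumulate the probabilities of the selected candidate sentences, and determine whether the cumulative probability of the candidate sentences has reached the set threshold; if so, stop selecting new candidate sentences and output all selected candidate sentences; if not, continue adding the highest-probability sentence to the candidate sentence set;
Sixth, determine whether all sentences in the English text have been processed; if so, output the candidate sentence sets of all sentences in the English text; otherwise return to the second step and continue processing the remaining sentences until every sentence in the English text to be processed has been handled;
(3) The processing flow of the sentence grammatical error correction generation module is:
First, read all candidate sentences of each sentence in the English text to be processed, and select the candidate sentence with the highest probability as the grammar correction result;
Second, count the total number of grammatical errors in the English text to be processed, compute the grammar correction score of the English text to be processed, and output the corresponding grammar correction suggestions according to that score;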
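The calculation formulas of the sentence context word vector representation module described in step (1) are defined as follows:
(1.1) Formulas for the search vector, tag vector, and result vector
[Formula image not available in the source text] (1)
[Formula image not available in the source text] (2)
[Formula image not available in the source text] (3)
In formulas (1), (2), and (3), j is the sequence number of a word in the English text;
(1.2) Formula for the word attention weight
[Formula image not available in the source text] (4)
In formula (4), i is the index of the i-th word in the English text, and the search vector, tag vector, and result vector are computed by formulas (1), (2), and (3);
(1.3) Formula for the inter-sentence attention vector
[Formula image not available in the source text] (5)
In formula (5), i is the index of the i-th word in the English text, j is the sequence number of a word in the English text, and N is the total number of words in the English text;
(1.4) Formula for the context word vector
[Formula image not available in the source text] (6)
In formula (6), i is the index of the i-th word in the English text, j is the sequence number of a word in the English text, and N is the total number of words in the English text.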
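Because the formula images for (1) to (6) are not reproduced in this text, the exact definitions cannot be restated here. The naming (search, tag, and result vectors obtained from three weight matrices, followed by attention weights and a weighted combination over N words) resembles standard scaled dot-product self-attention, so the sketch below implements that standard formulation as an assumption. It is not the patent's own formula set; treating search/tag/result as query/key/value, the scaling by the square root of the dimension, and the softmax are all assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Row vector x (length d_in) times matrix w (d_in x d_out)."""
    return [sum(x[k] * w[k][c] for k in range(len(x))) for c in range(len(w[0]))]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(words: List[Vector], w_search: Matrix, w_tag: Matrix,
                   w_result: Matrix) -> List[Vector]:
    """Assumed standard scaled dot-product self-attention over N word vectors."""
    search = [matvec(x, w_search) for x in words]   # one search vector per word
    tag = [matvec(x, w_tag) for x in words]         # one tag vector per word
    result = [matvec(x, w_result) for x in words]   # one result vector per word
    d = len(tag[0])
    n = len(words)
    context = []
    for i in range(n):
        # Attention weights of word i over all words j = 1..N.
        scores = [sum(a * b for a, b in zip(search[i], tag[j])) / math.sqrt(d)
                  for j in range(n)]
        weights = softmax(scores)
        # Context vector of word i: attention-weighted sum of result vectors.
        context.append([sum(weights[j] * result[j][c] for j in range(n))
                        for c in range(len(result[0]))])
    return context


identity = [[1.0, 0.0], [0.0, 1.0]]           # toy weight matrices
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dimensional word vectors
context = self_attention(words, identity, identity, identity)
print(context)
```

In this toy setup every context vector is a convex combination of the result vectors, so each component stays between 0 and 1.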
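2. The correction method according to claim 1, characterized in that the processing flow of the sentence context word vector representation module is as follows:
P201 Start;
P202 Read in the English text to be processed;
P203 Perform sentence splitting, word segmentation, and part-of-speech tagging on the English text to be processed;
P204 Perform syntactic dependency analysis and word dependency analysis on the English text to be processed to obtain its syntactic relation tree and word dependency tree;
P205 Read each sentence of the English text to be processed in turn;
P206 According to the syntactic relation tree and word dependency tree of the English text to be processed, vectorize the words of each sentence to obtain the word vector of each word in each sentence;
P207 Initialize the values of the search weight matrix, tag weight matrix, and result weight matrix;
P208 Use formulas (1), (2), and (3) to compute the search vector, tag vector, and result vector of each word in each sentence;
P209 Use formula (4) to compute the word attention weight of each word in each sentence, and use formula (5) to compute the inter-sentence attention vector of each sentence;
P210 Update the values of the search weight matrix, tag weight matrix, result weight matrix, word attention weights, and inter-sentence attention vector of each sentence;
P211 According to the updated word attention weights of each sentence, update the inter-sentence attention vector of each sentence, and use formula (6) to compute the context word vectors of each sentence;
P212 Output the context word vector representation of the sentences in the English text to be processed;
P213 End.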
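The first part of P203 (sentence splitting and word segmentation) can be illustrated with a deliberately naive, regex-based sketch. The claims do not name the tools used, so the rules below are assumptions; part-of-speech tagging and the dependency analysis of P204 would normally rely on an NLP toolkit and are left out.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> List[str]:
    """Naive tokenizer: words (allowing internal ' or -), numbers, or single
    punctuation marks."""
    return re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*|\d+|[^\w\s]", sentence)


text = ("These tragic incidents have had an immediate effect. "
        "Some smallish firms may close.")
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens[1])
```

A splitter this simple will mis-handle abbreviations such as "e.g." or "Dr."; it is only meant to make the data flow of P202 to P205 concrete.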
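3. The correction method according to claim 1, characterized in that the processing flow of the best candidate sentence recommendation module is as follows:
P301 Start;
P302 Read the context word vectors of one sentence in the English text to be processed;
P303 Reduce the dimensionality of the sentence's context word vectors by singular value decomposition;
P304 Merge the dimension-reduced sentence context word vectors with the word vectors of the words;
P305 Scale and normalize the merged word vectors;
P306 Compute the next-word occurrence probability for every word in the English word dictionary, and take the 5 words with the highest probability as candidate words;
P307 Perform inference from each candidate word in turn to obtain new candidate words based on that word;
P308 Among the sentences formed by the candidate words, take the sentence with the highest probability as a candidate sentence, and accumulate the probabilities of the selected candidate sentences;
P309 Determine whether the cumulative probability of the candidate sentences has reached the set threshold; if so, go to P310; otherwise, go to P308;
P310 Stop selecting new candidate sentences and output all selected candidate sentences;
P311 Determine whether all sentences in the English text to be processed have been handled; if so, go to P312; otherwise, go to P302;
P312 Output the candidate sentence sets of all sentences in the English text to be processed;
P313 End.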
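Steps P306 and P308 to P310 can be sketched as a top-k selection followed by a cumulative-probability cutoff. The function names, the toy probabilities, and the threshold value of 0.9 are hypothetical; the claims do not state the threshold, and the language model that produces the probabilities is not shown.

```python
from typing import Dict, List, Tuple


def top_k_words(next_word_probs: Dict[str, float],
                k: int = 5) -> List[Tuple[str, float]]:
    """P306: return the k dictionary words with the highest next-word probability."""
    ranked = sorted(next_word_probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def select_candidates(scored_sentences: List[Tuple[str, float]],
                      threshold: float) -> List[Tuple[str, float]]:
    """P308 to P310: add sentences in descending probability order until the
    running sum of their probabilities reaches the threshold (or none are left)."""
    remaining = sorted(scored_sentences, key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for sentence, prob in remaining:
        selected.append((sentence, prob))
        cumulative += prob
        if cumulative >= threshold:
            break
    return selected


# Toy values, invented for illustration only.
next_word_probs = {"dangerous": 0.41, "danger": 0.22, "dark": 0.12,
                   "big": 0.09, "real": 0.07, "the": 0.05, "of": 0.04}
top5 = top_k_words(next_word_probs)

scored = [("s3", 0.15), ("s1", 0.5), ("s4", 0.05), ("s2", 0.3)]
selected = select_candidates(scored, threshold=0.9)
print(top5)
print(selected)
```

With these toy numbers the running sum passes 0.9 only after the third sentence, so three candidates are kept and the fourth is never added.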
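4. The correction method according to claim 1, characterized in that the calculation formula of the sentence grammatical error correction generation module is defined as follows:
Formula for the grammar correction score of the English text
[Formula image not available in the source text] (7).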
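5. The correction method according to claim 4, characterized in that the processing flow of the sentence grammatical error correction generation module is as follows:
P401 Start;
P402 Read the candidate sentences of all sentences in the English text to be processed, and select the candidate sentence with the highest probability as the grammar correction result;
P403 Count the total number of grammatical errors in the English text to be processed;
P404 Use formula (7) to compute the grammar correction score of the English text to be processed, and generate the corresponding grammar correction suggestions;
P405 End.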
CN202110916902.XA 2021-08-11 2021-08-11 An Automatic Correction Method for Sentence Grammar Errors in English Text Active CN113553835B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110916902.XA CN113553835B (zh) 2021-08-11 2021-08-11 An Automatic Correction Method for Sentence Grammar Errors in English Text

Publications (2)

Publication Number Publication Date
CN113553835A CN113553835A (zh) 2021-10-26
CN113553835B CN113553835B (zh) 2022-12-09

Family

ID=78133791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110916902.XA Active CN113553835B (zh) 2021-08-11 2021-08-11 An Automatic Correction Method for Sentence Grammar Errors in English Text

Country Status (1)

Country Link
CN (1) CN113553835B (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
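CN106776549A (zh) * 2016-12-06 2017-05-31 Guilin University of Electronic Technology Rule-based method for correcting grammatical errors in English compositions
CN107357775A (zh) * 2017-06-05 2017-11-17 Baidu Online Network Technology (Beijing) Co., Ltd. Artificial-intelligence-based text error correction method and apparatus using a recurrent neural network
CN111428470A (zh) * 2020-03-23 2020-07-17 Beijing Century TAL Education Technology Co., Ltd. Text coherence determination and model training method therefor, electronic device, and readable medium
CN111737980A (zh) * 2020-06-22 2020-10-02 Guilin University of Electronic Technology Method for correcting word usage errors in English text
CN112466279A (zh) * 2021-02-02 2021-03-09 Shenzhen Acadsoc Information Co., Ltd. Method and apparatus for automatically correcting spoken English pronunciation
CN112613323A (zh) * 2020-12-21 2021-04-06 University of Science and Technology of China Grammar-dependency-enhanced semantic recognition and reasoning method and system for math word problems
CN112686030A (zh) * 2020-12-29 2021-04-20 iFLYTEK Co., Ltd. Grammatical error correction method and apparatus, electronic device, and storage medium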

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
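CN103365838B (zh) * 2013-07-24 2016-04-20 Guilin University of Electronic Technology Method for automatically correcting grammatical errors in English compositions based on multiple features
CN106484682B (zh) * 2015-08-25 2019-06-25 Alibaba Group Holding Limited Statistics-based machine translation method and apparatus, and electronic device
CN108519974A (zh) * 2018-03-31 2018-09-11 South China University of Technology Method for automatically detecting and analyzing grammatical errors in English compositions
CN109543022B (zh) * 2018-12-17 2020-10-13 Beijing Baidu Netcom Science and Technology Co., Ltd. Text error correction method and apparatus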

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
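BERT-based Contextual Semantic analysis for English Preposition Error Correction; Guimin Huang et al.; 《Journal of Physics》; 20201231; pp. 1-4 *
Research and Design of a Hierarchical Language Model for English Grammatical Error Correction; Li Canrun; 《China Master's Theses Full-text Database, Information Science and Technology》; 20180115 (No. 1); pp. I138-2005 *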

Also Published As

Publication number Publication date
CN113553835A (zh) 2021-10-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Huang Guimin

Inventor after: Wang Jiahao

Inventor after: Zhang Xiaowei

Inventor before: Huang Guimin

Inventor before: Zhang Xiaowei

Inventor before: Wang Jiahao

GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20211026

Assignee: Guilin ruiweisaide Technology Co., Ltd.

Assignor: Guilin University of Electronic Technology

Contract record no.: X2023980046266

Denomination of invention: An Automatic Correction Method for Sentence Grammar Errors in English Text

Granted publication date: 20221209

License type: Common License

Record date: 20231108
