CN111709224B - Method for analyzing sentence hierarchical topic coherence in English short essays - Google Patents

Method for analyzing sentence hierarchical topic coherence in English short essays

Info

Publication number
CN111709224B
Authority
CN
China
Prior art keywords
topic
english short
sentence
text
english
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010573975.9A
Other languages
English (en)
Other versions
CN111709224A (zh)
Inventor
黄桂敏
范春丽
黄思睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202010573975.9A
Publication of CN111709224A
Application granted
Publication of CN111709224B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

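The invention provides a method for analyzing sentence hierarchical topic coherence in English short essays. The method is an analysis model composed of four sequentially connected modules: an English short essay sentence preprocessing module, an English short essay hierarchical topic tree hybrid semantic space analysis module, an English short essay sentence hierarchical topic coherence analysis module, and an English short essay sentence hierarchical topic coherence analysis output module. After an English short essay is processed by this model and method, the sentence hierarchical topic coherence analysis result for that essay is obtained. The method addresses the automatic analysis of sentence hierarchical topic coherence in English short essays, and its results are better than those of conventional methods for this analysis.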

Description

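Method for analyzing sentence hierarchical topic coherence in English short essays
Technical field
The invention relates to natural language processing, and in particular to a method that uses a computer to automatically analyze whether the hierarchical topics of the sentences in an English short essay are coherent. The method applies only to English short essays and is not suitable for Chinese short essays.
Background
In an English short essay, the degree of sentence topic coherence determines whether the sentences stay on topic. Existing methods for analyzing sentence topic coherence in English short essays, in China and abroad, fall into two groups: unsupervised and supervised. Supervised methods need model essays as references and are therefore unsuitable for sentence hierarchical topic coherence analysis over large numbers of essays. Unsupervised methods compute the sentence hierarchical topic coherence semantic similarity directly from distributed vectors to judge how coherent the sentences are, and lack any analysis of the features of sentence hierarchical topic coherence. To solve these problems, the invention provides a method for analyzing sentence hierarchical topic coherence in English short essays.
Summary of the invention
The overall processing flow of the method is shown in Figure 1. It comprises the English short essay sentence preprocessing module, the English short essay hierarchical topic tree hybrid semantic space analysis module, the English short essay sentence hierarchical topic coherence analysis module, and the English short essay sentence hierarchical topic coherence analysis output module.
The processing flow of the English short essay sentence preprocessing module is as follows. First, the title and full text of the essay are input, and word and sentence segmentation, stop-word removal, and stemming are applied to each. Second, part-of-speech tagging and relation triple extraction are applied to the segmented, stop-word-filtered, stemmed title and full text. Third, the preprocessing results of the title and full text from these two steps are output.
The processing flow of the English short essay hierarchical topic tree hybrid semantic space analysis module is as follows. First, the preprocessing results of the title and full text are input, and the constructed relation triple hierarchical topic tree model is used to cluster by topic the relation triple information of the title, full text, paragraphs, and sentences, respectively. Second, the topic clusters are mapped into a distributed semantic space to generate the title, full-text, paragraph, and sentence topic relation-triple distributed vectors. Third, for these generated vectors, semantic concepts in an English knowledge base are matched, adjacent relation triples are extracted, the optimal candidate topic relation triple sets for the title, full text, paragraphs, and sentences are determined iteratively, and the title, full-text, paragraph, and sentence topic relation-triple distributed vectors are expanded.
The processing flow of the English short essay sentence hierarchical topic coherence analysis module is as follows. First, the title, full-text, paragraph, and sentence topic relation-triple distributed vectors are input, and the hierarchical topic coherence semantic similarity between the title and each sentence and between each paragraph and each sentence is computed. Second, based on these similarities, weights are set for the title-sentence and paragraph-sentence hierarchical topic coherence semantic similarities, the hierarchical topic coherence semantic similarity of each sentence is computed, and from the sentence similarities the hierarchical topic coherence semantic similarity score value of the essay is computed. Third, the hierarchical topic coherence values between sentences and paragraphs, between paragraphs, and between each paragraph and the full text are computed. Fourth, the sentence-paragraph hierarchical topic coherence values are sorted, and a hierarchical topic coherence threshold is set to extract the topic-incoherent sentences of the essay. Fifth, from the sentence-paragraph, paragraph-paragraph, and paragraph-full-text hierarchical topic coherence values, the mean hierarchical topic coherence score of the essay is computed.
The processing flow of the English short essay sentence hierarchical topic coherence analysis output module is as follows. First, the hierarchical topic coherence semantic similarity score value and the mean hierarchical topic coherence score produced by the sentence hierarchical topic coherence analysis module are input. Second, from these two values, the topic coherence score of the essay is computed and a topic coherence analysis comment for the essay is generated.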
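The concepts and structures of the invention are defined as follows:
1. Part-of-speech tagging of English short essay words
Part-of-speech tagging assigns a part-of-speech (POS) tag to each word in the sentences of the English short essay. The tagging format is as follows:
sentence word 1/POS,
sentence word 2/POS,
sentence word 3/POS,
...
sentence word n/POS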
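As a minimal sketch of the word/POS layout defined above, the following Python helpers write and read that format. The function names are hypothetical and not part of the patent; the sample tags are taken from the title tagging result given later in the embodiment.

```python
def format_pos_tags(tagged):
    """Render (word, tag) pairs as word/TAG items joined by commas."""
    return ",".join(f"{word}/{tag}" for word, tag in tagged)


def parse_pos_tags(line):
    """Parse a word/TAG,word/TAG,... line back into (word, tag) pairs."""
    pairs = []
    for item in line.split(","):
        # rpartition splits on the last "/", so the tag is always the final field
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


# Hypothetical usage with the first four tagged words of the essay title
title_tags = [("Whether", "IN"), ("it", "PRP"), ("is", "VBZ"), ("important", "JJ")]
line = format_pos_tags(title_tags)
```

The parser assumes that no word contains a comma, which holds for the tagged output shown in the embodiment.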
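2. Segmentation of the English short essay
The essay is segmented at two levels, paragraphs and sentences. The paragraph segmentation format is as follows:
[paragraph 1]-paragraph segmentation mark 1,[paragraph 2]-paragraph segmentation mark 2,...[paragraph m]-paragraph segmentation mark k
The sentence segmentation format is as follows:
[[sentence 1]-sentence segmentation mark 1,[sentence 2]-sentence segmentation mark 2,...[sentence n]-sentence segmentation mark p]
3. Structure of relation triples
A relation triple consists of the subject, predicate, and object of a sentence in the English short essay. The structure is as follows:
Title 1[relation triple 1,relation triple 2,...,relation triple n]
Paragraph 1[relation triple 1,relation triple 2,...,relation triple i]
Paragraph 2[relation triple 1,relation triple 2,...,relation triple j]
...
Paragraph n[relation triple 1,relation triple 2,...,relation triple k]
4. Structure of the relation triple hierarchical topic tree model
The relation triple hierarchical topic tree model contains the topic relation triples of the title, paragraphs, sentences, and full text of the English short essay. The structure is as follows:
Title 1[topic relation triple 1,topic relation triple 2,...,topic relation triple n]
Paragraph 1[topic relation triple 1,topic relation triple 2,...,topic relation triple i]
Paragraph 2[topic relation triple 1,topic relation triple 2,...,topic relation triple j]
...
Paragraph m[topic relation triple 1,topic relation triple 2,...,topic relation triple k]
...
Sentence 1[topic relation triple 1,topic relation triple 2,...,topic relation triple l]
Sentence 2[topic relation triple 1,topic relation triple 2,...,topic relation triple o]
...
Sentence n[topic relation triple 1,topic relation triple 2,...,topic relation triple p]
...
Full text[topic relation triple 1,topic relation triple 2,...,topic relation triple r]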
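One way to hold the hierarchical topic tree in memory is sketched below. The class and field names are assumptions made for illustration, and the choice of which triples count as topic triples at each level is also illustrative; the patent only fixes the four levels (title, paragraphs, sentences, full text) and the fact that each level carries a list of topic relation triples.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class TopicNode:
    """One level entry of the tree: a label plus its topic relation triples."""
    label: str
    topic_triples: List[Triple] = field(default_factory=list)


@dataclass
class HierarchicalTopicTree:
    """Hypothetical container mirroring the title/paragraph/sentence/full-text layout."""
    title: TopicNode
    paragraphs: List[TopicNode]
    sentences: List[TopicNode]
    full_text: TopicNode


# Illustrative instance; the triples are copied from the embodiment's extraction output
tree = HierarchicalTopicTree(
    title=TopicNode("title 1", [("college student", "have", "part time job")]),
    paragraphs=[TopicNode("paragraph 1", [("college student", "do", "part-time job")])],
    sentences=[TopicNode("sentence 1", [("it", "be", "common")])],
    full_text=TopicNode("full text", [("college student", "do", "part-time job")]),
)
```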
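5. Structure of the topic relation-triple distributed vector space
The structure of the topic relation-triple distributed vector space is as follows:
topic relation triple 1[300-dimensional vector]
topic relation triple 2[300-dimensional vector]
...
topic relation triple n[300-dimensional vector]
6. Structure of the English knowledge base
In the English knowledge base, a concept is the meaning of a word in the English short essay, a relation is a subject-predicate relation between words in the essay, and a weight is the number of times that subject-predicate relation occurs. The structure of the English knowledge base is as follows:
[concept 1,relation and weight,concept n+1]
[concept 2,relation and weight,concept n+2]
...
[concept n,relation and weight,concept n+m]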
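The knowledge base layout above can be sketched as a counter keyed by (concept, relation, concept), with the occurrence count as the weight. The class and method names are hypothetical, and the lookup of adjacent entries is only one plausible reading of how adjacent relation triples might be retrieved.

```python
from collections import Counter


class KnowledgeBase:
    """Stores [concept, (relation, weight), concept] entries; weight = occurrence count."""

    def __init__(self):
        self._counts = Counter()

    def observe(self, head, relation, tail):
        """Record one occurrence of the relation between two concepts."""
        self._counts[(head, relation, tail)] += 1

    def entries(self):
        """Return all entries in the [concept, (relation, weight), concept] layout."""
        return [[h, (r, w), t] for (h, r, t), w in sorted(self._counts.items())]

    def adjacent(self, concept):
        """Return entries in which the concept appears on either side."""
        return [e for e in self.entries() if concept in (e[0], e[2])]


# Hypothetical usage with made-up observations
kb = KnowledgeBase()
kb.observe("student", "have", "job")
kb.observe("student", "have", "job")
kb.observe("student", "earn", "money")
```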
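7. Formula for the title-sentence hierarchical topic coherence semantic similarity
[Formula (1): images BDA0002550360860000031 and BDA0002550360860000041, not reproduced in the text]
In formula (1), n is the upper dimension index: the title topic relation-triple distributed vector and the sentence topic relation-triple distributed vector are taken from dimension i to dimension n of the hierarchical topic tree hybrid semantic vector space of the English short essay.
8. Formula for the paragraph-sentence hierarchical topic coherence semantic similarity
[Formula (2): image BDA0002550360860000042, not reproduced in the text]
In formula (2), n is the upper dimension index: the paragraph topic relation-triple distributed vector and the sentence topic relation-triple distributed vector are taken from dimension i to dimension n of the hierarchical topic tree hybrid semantic vector space of the English short essay.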
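The images for formulas (1) and (2) are not available in this text, so their exact form cannot be confirmed here. A common choice for comparing two distributed vectors dimension by dimension is cosine similarity, and the sketch below uses it purely as an assumption; the function name is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed stand-in for formulas (1)/(2))."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)
```

Under this assumption, the title-sentence similarity would be `cosine_similarity(title_vector, sentence_vector)` and the paragraph-sentence similarity would use the paragraph vector in place of the title vector.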
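9. Formula for the sentence hierarchical topic coherence semantic similarity
sentence hierarchical topic coherence semantic similarity = δ1 × title-sentence hierarchical topic coherence semantic similarity + δ2 × paragraph-sentence hierarchical topic coherence semantic similarity  (3)
In formula (3), δ1 and δ2 are the weights of the title-sentence and paragraph-sentence hierarchical topic coherence semantic similarities in the sentence hierarchical topic coherence semantic similarity, with δ1 + δ2 = 1. The title-sentence similarity is given by formula (1), and the paragraph-sentence similarity by formula (2).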
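Formula (3) is a weighted sum with weights that add up to 1, which the following sketch implements directly. The function name and the default weight of 0.5 are assumptions; the text does not state the weight values used.

```python
def sentence_similarity(title_sim, paragraph_sim, delta1=0.5):
    """Formula (3): delta1 * title-sentence + delta2 * paragraph-sentence, delta1 + delta2 = 1."""
    if not 0.0 <= delta1 <= 1.0:
        raise ValueError("delta1 must lie in [0, 1]")
    delta2 = 1.0 - delta1  # enforces delta1 + delta2 = 1
    return delta1 * title_sim + delta2 * paragraph_sim
```

For example, with hypothetical inputs 0.8 and 0.6 and `delta1=0.25`, the result is 0.25 × 0.8 + 0.75 × 0.6 = 0.65.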
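10. Formula for the hierarchical topic coherence semantic similarity score value of the English short essay
[Formula (4): image BDA0002550360860000043, not reproduced in the text]
In formula (4), the sentence hierarchical topic coherence semantic similarity is given by formula (3), and n is the total number of sentences in the English short essay.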
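The image for formula (4) is missing from this text. Since it combines the per-sentence similarities from formula (3) with the sentence count n, an arithmetic mean is a plausible reading; the sketch below assumes that form and should not be taken as the patent's exact formula.

```python
def essay_similarity_score(sentence_sims):
    """Assumed form of formula (4): mean of the sentence similarities over n sentences."""
    if not sentence_sims:
        raise ValueError("at least one sentence similarity is required")
    return sum(sentence_sims) / len(sentence_sims)
```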
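11. Formula for the sentence-paragraph hierarchical topic coherence value
[Formula (5): image BDA0002550360860000044, not reproduced in the text]
In formula (5), n is the number of all sentence topic relation-triple distributed vectors in the English short essay, and i indexes the i-th sentence topic relation-triple distributed vector.
12. Formula for the paragraph-paragraph hierarchical topic coherence value
[Formula (6): image BDA0002550360860000045, not reproduced in the text]
In formula (6), n is the number of all paragraph topic relation-triple distributed vectors in the English short essay, and j indexes the j-th paragraph topic relation-triple distributed vector.
13. Formula for the paragraph-full-text hierarchical topic coherence value
[Formula (7): image BDA0002550360860000051, not reproduced in the text]
In formula (7), n is the number of all paragraph topic relation-triple distributed vectors in the English short essay, and k indexes the k-th paragraph topic relation-triple distributed vector.
14. Formula for the mean hierarchical topic coherence score of the English short essay
[Formula (8): image BDA0002550360860000052, not reproduced in the text]
In formula (8), ε1, ε2, and ε3 are the weights of the sentence-paragraph, paragraph-paragraph, and paragraph-full-text hierarchical topic coherence values in the mean hierarchical topic coherence score, with ε1 + ε2 + ε3 = 1. N is the number of sentence topic relation-triple distributed vectors in the essay, and M is the number of paragraph topic relation-triple distributed vectors. The sentence-paragraph value is given by formula (5), the paragraph-paragraph value by formula (6), and the paragraph-full-text value by formula (7).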
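The image for formula (8) is also missing, so the following is only a sketch under a stated assumption: each of the three groups of coherence values is averaged over its own count, and the three averages are combined with weights ε1, ε2, ε3 that sum to 1. The function name and the default weights are hypothetical.

```python
def mean_coherence_score(sent_para, para_para, para_full, eps=(0.4, 0.3, 0.3)):
    """Assumed form of formula (8): weighted sum of three per-level averages."""
    e1, e2, e3 = eps
    if abs(e1 + e2 + e3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    def mean(values):
        # An empty level contributes 0.0 rather than raising
        return sum(values) / len(values) if values else 0.0

    return e1 * mean(sent_para) + e2 * mean(para_para) + e3 * mean(para_full)
```

With hypothetical inputs `[0.6, 0.8]`, `[0.5]`, `[0.9, 0.7]` and weights `(0.5, 0.25, 0.25)`, this gives 0.5 × 0.7 + 0.25 × 0.5 + 0.25 × 0.8 = 0.675.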
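15. Formula for the topic coherence score of the English short essay
topic coherence score = (0.5 × hierarchical topic coherence semantic similarity score value + 0.5 × mean hierarchical topic coherence score) × 100  (9)
In formula (9), the hierarchical topic coherence semantic similarity score value is given by formula (4), and the mean hierarchical topic coherence score by formula (8).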
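Formula (9) is stated in full, so it can be written down directly. Only the function name is an assumption.

```python
def topic_coherence_score(similarity_score, mean_coherence):
    """Formula (9): equal-weight combination of the two scores, scaled to 0-100."""
    return (0.5 * similarity_score + 0.5 * mean_coherence) * 100
```

For hypothetical inputs 0.7 and 0.675, the score is (0.35 + 0.3375) × 100 = 68.75.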
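The processing flows of the English short essay sentence preprocessing module, the hierarchical topic tree hybrid semantic space analysis module, the sentence hierarchical topic coherence analysis module, and the sentence hierarchical topic coherence analysis output module are described below.
As shown in Figure 2, the processing flow of the English short essay sentence preprocessing module is as follows:
P201 Start;
P202 Read the title and full text of the English short essay;
P203 Perform word and sentence segmentation on the title and output the title segmentation result;
P204 Perform word and sentence segmentation on the full text and output the full-text segmentation result;
P205 Remove stop words from the title by matching the stop-word set with regular expressions;
P206 Remove stop words from the full text by matching the stop-word set with regular expressions;
P207 Stem the title;
P208 Stem the full text;
P209 Tag parts of speech in the title and output the title part-of-speech tagging result;
P210 Tag parts of speech in the full text and output the full-text part-of-speech tagging result;
P211 Extract relation triples from the title and output the title relation triple analysis result;
P212 Extract relation triples from the full text and output the full-text relation triple analysis result;
P213 End.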
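Steps P203 to P208 (segmentation, regular-expression stop-word removal, stemming) can be sketched with the standard library alone. Everything below is a simplified stand-in: the stop-word list is a tiny hypothetical sample, the sentence splitter is naive, and the stemmer is a crude suffix stripper rather than a real stemming algorithm. Part-of-speech tagging and relation triple extraction (P209 to P212) need an NLP toolkit and are not shown.

```python
import re

# Hypothetical, deliberately tiny stop-word set
STOP_WORDS = {"a", "an", "and", "for", "is", "it", "of", "the", "to"}
STOP_WORD_RE = re.compile(r"\b(?:" + "|".join(sorted(STOP_WORDS)) + r")\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence split: a run of text up to ., ! or ? (no spaces required after it)."""
    return [s.strip() for s in re.findall(r"[^.!?]+[.!?]?", text) if s.strip()]


def remove_stop_words(sentence):
    """Delete stop words by regular-expression matching, then collapse whitespace."""
    return re.sub(r"\s+", " ", STOP_WORD_RE.sub(" ", sentence)).strip()


def tokenize(sentence):
    """Keep alphabetic tokens, allowing inner hyphens and apostrophes (e.g. part-time)."""
    return re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", sentence)


def stem(word):
    """Crude suffix stripping, used here only as a placeholder for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Return one list of stemmed, stop-word-free tokens per sentence."""
    return [[stem(tok) for tok in tokenize(remove_stop_words(s))]
            for s in split_sentences(text)]
```

Calling `preprocess` on the title and on the full text separately mirrors the paired steps in the flow, where each operation is applied once to the title and once to the full text.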
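As shown in Figure 3, the processing flow of the English short essay hierarchical topic tree hybrid semantic space analysis module is as follows:
P301 Start;
P302 Read the preprocessing results of the title and full text of the English short essay;
P303 Cluster the relation triple information of the title by topic using the relation triple hierarchical topic tree model, and output the title topic clustering result;
P304 Cluster the relation triple information of the full text by topic using the relation triple hierarchical topic tree model, and output the full-text topic clustering result;
P305 Cluster the relation triple information of each paragraph by topic using the relation triple hierarchical topic tree model, and output the topic clustering result of each paragraph;
P306 Cluster the relation triple information of each sentence by topic using the relation triple hierarchical topic tree model, and output the topic clustering result of each sentence;
P307 Read the title topic clustering result and map it into the distributed semantic space to generate the title topic relation-triple distributed vector;
P308 Read the full-text topic clustering result and map it into the distributed semantic space to generate the full-text topic relation-triple distributed vector;
P309 Read the topic clustering result of each paragraph and map it into the distributed semantic space to generate the paragraph topic relation-triple distributed vectors;
P310 Read the sentence topic clustering results and map them into the distributed semantic space to generate the sentence topic relation-triple distributed vectors;
P311 Match the knowledge base to expand the title topic relation-triple distributed vector, and output the title topic relation-triple distributed vector;
P312 Match the knowledge base to expand the full-text topic relation-triple distributed vector, and output the full-text topic relation-triple distributed vector;
P313 Match the knowledge base to expand the paragraph topic relation-triple distributed vectors, and output the paragraph topic relation-triple distributed vectors;
P314 Match the knowledge base to expand the sentence topic relation-triple distributed vectors, and output the sentence topic relation-triple distributed vectors;
P315 End.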
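The text does not say how a topic cluster is mapped into the distributed semantic space in steps P307 to P310. One simple possibility, shown here purely as an assumption, is to average pre-trained word vectors over the words of each triple and then over the triples of a level. The function names and the two-dimensional toy embeddings are hypothetical; the patent works with 300-dimensional vectors.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def triple_vector(triple, embeddings, dim):
    """Average the embeddings of all known words in (subject, predicate, object)."""
    words = [w for part in triple for w in part.split()]
    return mean_vector([embeddings[w] for w in words if w in embeddings], dim)


def topic_vector(triples, embeddings, dim):
    """Average the triple vectors of one level (title, paragraph, sentence, or full text)."""
    return mean_vector([triple_vector(t, embeddings, dim) for t in triples], dim)


# Toy 2-dimensional embeddings, invented for this example
toy = {"we": [1.0, 0.0], "earn": [0.0, 1.0], "money": [2.0, 2.0]}
vec = topic_vector([("we", "earn", "money")], toy, dim=2)
```

The knowledge-base expansion of steps P311 to P314 is not modeled in this sketch.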
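As shown in Figure 4, the processing flow of the English short essay sentence hierarchical topic coherence analysis module is as follows:
P401 Start;
P402 Read the title, full-text, paragraph, and sentence topic relation-triple distributed vectors of the English short essay;
P403 Compute the title-sentence hierarchical topic coherence semantic similarity according to formula (1), and output the similarity between the title and every sentence;
P404 Compute the paragraph-sentence hierarchical topic coherence semantic similarity according to formula (2), and output the similarity between the paragraphs and every sentence;
P405 Compute the sentence hierarchical topic coherence semantic similarity according to formula (3), and output it;
P406 Check whether any sentence or paragraph hierarchical topic coherence semantic similarity remains to be analyzed; if so, go to P403, otherwise go to P407;
P407 Read all sentence hierarchical topic coherence semantic similarities, compute the hierarchical topic coherence semantic similarity score value of the essay according to formula (4), and output it;
P408 Compute the sentence-paragraph hierarchical topic coherence values according to formula (5), and output all sentence-paragraph values;
P409 Based on all sentence-paragraph hierarchical topic coherence values, set a hierarchical topic coherence threshold to extract the topic-incoherent sentences of the essay, and generate the set of topic-incoherent sentences;
P410 Compute the paragraph-paragraph hierarchical topic coherence values according to formula (6), and output all paragraph-paragraph values;
P411 Compute the paragraph-full-text hierarchical topic coherence values according to formula (7), and output all paragraph-full-text values;
P412 Check whether any sentence or paragraph hierarchical topic coherence value remains to be analyzed; if so, go to P407, otherwise go to P413;
P413 Read all sentence-paragraph, paragraph-paragraph, and paragraph-full-text hierarchical topic coherence values, compute the mean hierarchical topic coherence score of the essay according to formula (8), and output it;
P414 End.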
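Step P409 sorts the sentence-paragraph coherence values and applies a threshold to pick out incoherent sentences. A minimal sketch follows; the function name, the sample values, and the threshold of 0.5 are all invented for illustration, since the text does not give the threshold used.

```python
def incoherent_sentences(coherence_values, threshold):
    """Return sentence numbers whose coherence value is below the threshold, lowest first.

    coherence_values maps a sentence number to its sentence-paragraph coherence value.
    """
    ranked = sorted(coherence_values.items(), key=lambda item: item[1])
    return [number for number, value in ranked if value < threshold]


# Hypothetical coherence values for four sentences
values = {1: 0.82, 2: 0.35, 3: 0.61, 4: 0.20}
flagged = incoherent_sentences(values, threshold=0.5)
```

Here sentences 4 and 2 fall below the threshold and are returned in ascending order of coherence.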
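As shown in Figure 5, the processing flow of the English short essay sentence hierarchical topic coherence analysis output module is as follows:
P501 Start;
P502 Read the hierarchical topic coherence semantic similarity score value produced by the sentence hierarchical topic coherence analysis module;
P503 Read the mean hierarchical topic coherence score produced by the sentence hierarchical topic coherence analysis module;
P504 Compute the topic coherence score of the essay according to formula (9), output it, and generate the topic coherence comment for the essay;
P505 End.
Brief description of the drawings
Figure 1 is the overall processing flowchart of the method;
Figure 2 is the processing flowchart of the English short essay sentence preprocessing module;
Figure 3 is the processing flowchart of the English short essay hierarchical topic tree hybrid semantic space analysis module;
Figure 4 is the processing flowchart of the English short essay sentence hierarchical topic coherence analysis module;
Figure 5 is the processing flowchart of the English short essay sentence hierarchical topic coherence analysis output module.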
Detailed description of the embodiment
The embodiment of the method for analyzing sentence hierarchical topic coherence in English short essays is divided into the following four steps.
Step 1: Run the "English short essay sentence preprocessing module"
The English short essay input in this embodiment is titled "Whether it is important for college students to have a part time job." The content of the essay is as follows:
Nowadays,it is a common phenomenon that college students do part-time job,even I have tried hard to get a decent part-time job.I strongly agree and advocate college students to find part-time job.I have the reasons and evidence to support my point.
Finding a part-time job can make college students life more colorful and enrich their life.Moreover,doing part-time job they can gain experience which is the best way to their future career.They can gain the treasure to deal with various problems.Thirdly,it is a great opportunity to develop the ability to widen their work efficiently which will increase the quality of their career and life.Fourthly,part-time job can help us earn money.So we can reduce our parents'burden and buy something we like.For example,I even used some money from my part-time to buy presents to my mother.Last but not least,acquiring the ability to deal with many problems can also benefit them and they can make the best of the knowledge they have learned in order to get comprehensive development.
In a word,getting part-time jobs are our pre-classes of our future career.Sufficient knowledge allow we massively experience is for the mental promise to our life.
(1) The part-of-speech tagging results for the title and text of the student's English short essay are as follows:
The part-of-speech tagging result for the essay title is as follows:
Whether/IN,it/PRP,is/VBZ,important/JJ,for/IN,college/NN,students/NNS,to/TO,have/VB,a/DT,part/NN,time/NN,job/NN
The part-of-speech tagging result for the essay content is as follows:
Nowadays/RB,it/PRP,is/VBZ,a/DT,common/JJ,phenomenon/NN,that/WDT,college/NN,students/NNS,do/VBP,part-time/JJ,job/NN,even/RB,I/PRP,have/VBP,tried/VBN,hard/JJ,to/TO,get/VB,a/DT,decent/JJ,part-time/JJ,job/NN,I/PRP,strongly/RB,agree/VBP,and/CC,advocate/VBP,college/NN,students/NNS,to/TO,find/VB,part-time/JJ,job/NN,I/PRP,have/VBP,the/DT,reasons/NNS,and/CC,evidence/NN,to/TO,support/VB,my/PRP,point/NN.Finding/VBG,a/DT,part-time/JJ,job/NN,can/MD,make/VB,college/NN,students/NNS,life/NN,more/RBR,colorful/JJ,and/CC,enrich/VB,their/PRP,life/NN.Moreover/RB,doing/VBG,part-time/JJ,job/NN,they/PRP,can/MD,gain/VB,experience/NN,which/WDT,is/VBZ,the/DT,best/JJS,way/NN,to/TO,their/PRP,future/JJ,career/NN.They/PRP,can/MD,gain/VB,the/DT,treasure/NN,to/TO,deal/VB,with/IN,various/JJ,problems/NNS.Thirdly/RB,it/PRP,is/VBZ,a/DT,great/JJ,opportunity/NN,to/TO,develop/VB,the/DT,ability/NN,to/TO,widen/VB,their/PRP,work/NN,efficiently/RB,which/WDT,will/MD,increase/VB,the/DT,quality/NN,of/IN,their/PRP,career/NN,and/CC,life/NN,Fourthly/RB,part-time/JJ,job/NN,can/MD,help/VB,us/PRP,earn/VB,money/NN,So/IN,we/PRP,can/MD,reduce/VB,our/PRP,parents/NNS,burden/NN,and/CC,buy/VB,something/NN,we/PRP,like/VBP,For/IN,example/NN,I/PRP,even/RB,used/VBD,some/DT,money/NN,from/IN,my/PRP,part-time/JJ,to/TO,buy/VB,presents/NNS,to/TO,my/PRP,mother/NN.Last/JJ,but/CC,not/RB,least/JJS,acquiring/VBG,the/DT,ability/NN,to/TO,deal/VB,with/IN,many/JJ,problems/NNS,can/MD,also/RB,benefit/VB,them/PRP,and/CC,they/PRP,can/MD,make/VB,the/DT,best/JJS,of/IN,the/DT,knowledge/NN,they/PRP,have/VBP,learned/VBN,in/IN,order/NN,to/TO,get/VB,comprehensive/JJ,development/NN,In/IN,a/DT,word/NN,getting/VBG,part-time/JJ,jobs/NNS,are/VBP,our/PRP,pre-classes/NNS,of/IN,our/PRP,future/JJ,career/NN,Sufficient/JJ,knowledge/NN,allow/VBP,we/PRP,massively/RB,experience/NN,is/VBZ,for/IN,the/DT,mental/JJ,promise/NN,to/TO,our/PRP,life/NN.
(2) The segmentation results for the essay title and text are as follows:
The paragraph segmentation result is as follows:
[Nowadays,it is a common phenomenon that college students do part-time job,even I have tried hard to get a decent part-time job.I strongly agree and advocate college students to find part-time job.I have the reasons and evidence to support my point.]-1
[Finding a part-time job can make college students life more colorful and enrich their life.Moreover,doing part-time job they can gain experience which is the best way to their future career.They can gain the treasure to deal with various problems.Thirdly,it is a great opportunity to develop the ability to widen their work efficiently which will increase the quality of their career and life.Fourthly,part-time job can help us earn money.So we can reduce our parents'burden and buy something we like.For example,I even used some money from my part-time to buy presents to my mother.Last but not least,acquiring the ability to deal with many problems can also benefit them and they can make the best of the knowledge they have learned in order to get comprehensive development.]-2
[In a word,getting part-time jobs are our pre-classes of our future career.Sufficient knowledge allow we massively experience is for the mental promise to our life.]-3
The sentence segmentation result is as follows:
[[nowadays,,,it,be,a,common,phenomenon,that,college,student,do,part-time,job,,,even,I,have,try,hard,to,get,a,decent,part-time,job,.]-1,[I,strongly,agree,and,advocate,college,student,to,find,part-time,job,.]-2,[I,have,the,reason,and,evidence,to,support,my,point,.]-3,[find,a,part-time,job,can,make,college,student,life,more,colorful,and,enrich,they,life,.]-4,[moreover,,,do,part-time,job,they,can,gain,experience,which,be,the,best,way,to,they,future,career,.]-5,[they,can,gain,the,treasure,to,deal,with,various,problem,.]-6,[thirdly,,,it,be,a,great,opportunity,to,develop,the,ability,to,widen,they,work,efficiently,which,will,increase,the,quality,of,they,career,and,life,.]-7,[fourthly,,,part-time,job,can,help,we,earn,money,.]-8,[so,we,can,reduce,we,parent,',burden,and,buy,something,we,like,.]-9,[for,example,,,I,even,use,some,money,from,my,part-time,to,buy,present,to,my,mother,.]-10,[last,but,not,least,,,acquire,the,ability,to,deal,with,many,problem,can,also,benefit,they,and,they,can,make,the,best,of,the,knowledge,they,have,learn,in,order,to,get,comprehensive,development,.]-11,[in,a,word,,,get,part-time,job,be,we,pre-class,of,we,future,career,.]-12,[sufficient,knowledge,allow,we,massively,experience,be,for,the,mental,promise,to,we,life,.]-13]
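(3) The relation triple extraction results for the essay title and text are as follows:
The relation triples extracted from the essay title are as follows:
[college student_have_part time job],[it_be_important]
The relation triples extracted from the essay text are as follows:
Paragraph 1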
[[college student_do_part-time job],
[college student_do_job],
[it_be_common],
[I_have try_hard],
[I_advocate_college student],
[I_have_evidence]]
Paragraph 2
[[college student_enrich_they life],
[they_can gain_treasure],
[treasure_deal with_various problem],
[treasure_deal with_problem],
[it_be_great],
[we_earn_money],
[we_so can reduce_we parent'burden],
[we_can reduce_we parent'burden],
[I_use from_my part-time],
[I_even use money from_my part-time],
[I_use for_example],
[I_even use_money],
[I_even use from_my part-time],
[I_use_money],
[I_use money for_example],
[I_buy present to_my mother],
[I_use money from_my part-time],
[I_even use money for_example],
[I_buy_present],
[I_even use for_example],
[they_get_development],
[they_get_comprehensive development],
[they_can make_best]]
Paragraph 3
[[we pre-class_be in_word],
[we_be for_promise],
[we_massively experience for_promise to we life],
[we_experience for_promise],
[we_be for_promise to we life],
[we_massively experience for_promise],
[we_experience for_mental promise],
[we_be for_mental promise to we life],
[knowledge_allow_we massively experience],
[we_massively experience for_mental promise to we life],
[we_experience for_mental promise to we life],
[sufficient knowledge_allow_we massively experience],
[we_be for_mental promise],
[we_massively experience for_mental promise],
[knowledge_allow_we experience],
[we_experience for_promise to we life],
[sufficient knowledge_allow_we experience]].
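The extraction output above writes each triple as subject_predicate_object inside square brackets. A small parser for that notation might look as follows; the function name is hypothetical, and the parser assumes that underscores appear only as the two field separators, which holds for every triple listed above.

```python
def parse_triple(text):
    """Split one bracketed 'subject_predicate_object' item into a 3-tuple."""
    body = text.strip().strip("[],")  # drop list brackets and a trailing comma
    parts = body.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two underscores in {text!r}")
    subject, predicate, obj = parts
    return subject, predicate, obj
```

For instance, `parse_triple("[college student_have_part time job]")` yields the subject `college student`, the predicate `have`, and the object `part time job`.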
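Step 2: Run the "English short essay hierarchical topic tree hybrid semantic space analysis module"
The hierarchical topic tree hybrid semantic space analysis module takes the segmentation results and relation triple extraction results for the essay title and text produced by the sentence preprocessing module in Step 1. It clusters the relation triples of the title, full text, paragraphs, and sentences by topic, maps the clusters into the distributed vector space to generate the title, full-text, paragraph, and sentence topic relation-triple distributed vectors, and matches semantic concepts in the knowledge base to extract relation triples and expand the full-text, paragraph, sentence, and title topic relation-triple distributed vectors. The hierarchical topic tree hybrid semantic space analysis results for this embodiment are as follows:
The 300-dimensional title topic relation-triple distributed vector is as follows: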
[0.79,0.67,-0.19,-0.62,-0.29,0.37,0.98,-0.23,-0.73,0.26,0.08,0.21,-0.06,-0.13,0.43,0.12,0.00,-0.02,0.55,-0.05,0.20,0.39,-0.01,-0.27,0.05,0.06,-0.09,0.49,0.01,0.10,-0.05,-0.15,0.13,0.01,-0.25,-0.03,-0.19,0.04,0.23,0.08,-0.24,0.05,-0.15,-0.04,0.23,-0.29,-0.12,-0.10,0.37,0.09,0.11,-0.26,-0.07,-0.01,0.10,0.51,0.10,-0.13,-0.07,0.25,0.05,-0.09,-0.18,0.11,-0.30,0.23,-0.11,0.20,-0.19,0.15,-0.14,-0.05,-0.28,-0.30,0.51,-0.35,0.07,-0.06,0.07,0.03,-0.26,0.34,-0.15,-0.09,0.04,0.16,0.32,0.05,-0.05,-0.15,0.30,-0.12,0.14,0.02,-0.08,0.35,0.05,0.16,-0.10,0.01,0.25,-0.09,0.32,0.16,-0.17,0.04,-0.09,-0.25,0.13,0.27,-0.14,-0.23,-0.22,0.25,-0.16,-0.17,0.21,0.37,0.22,0.06,0.07,-0.15,0.10,-0.37,0.07,-0.09,0.17,0.11,-0.16,-0.04,-0.04,-0.32,0.24,-0.06,-0.13,-0.38,-0.09,0.16,-0.01,-0.08,0.06,0.22,-0.16,0.02,0.01,0.13,-0.06,-0.14,-0.14,-0.41,0.10,0.16,0.00,0.41,0.36,-0.12,-0.02,-0.08,0.08,0.04,0.03,0.13,-0.12,-0.09,0.01,0.13,-0.05,0.12,0.17,0.09,0.23,-0.07,-0.04,0.03,-0.08,0.10,-0.32,0.05,0.08,0.16,0.05,0.01,-0.11,0.10,0.19,-0.09,0.02,0.19,-0.36,0.08,0.14,0.18,0.11,-0.27,0.14,0.23,0.07,0.09,0.04,0.16,0.02,0.06,0.03,-0.30,-0.09,-0.26,0.03,-0.11,0.09,0.20,0.16,-0.05,0.02,-0.06,-0.15,-0.16,-0.08,0.05,-0.12,-0.03,0.04,-0.13,0.06,0.14,-0.12,0.07,-0.09,-0.05,-0.04,0.01,-0.17,-0.22,-0.09,-0.01,0.06,-0.14,-0.19,0.09,0.01,-0.01,-0.16,-0.14,0.06,-0.06,0.10,0.13,0.19,0.05,-0.08,0.21,0.13,-0.14,-0.07,0.05,0.02,-0.08,-0.06,0.00,-0.20,0.05,0.07,-0.08,-0.01,0.03,-0.10,-0.04,-0.05,0.13,0.09,0.17,0.08,-0.26,-0.18,0.20,0.05,-0.23,-0.20,0.02,0.05,-0.28,0.16,-0.17,0.03,0.17,-0.18,-0.04,0.00,-0.10,-0.00,-0.13,0.15,-0.03,0.09,0.01,-0.08,0.09,-0.02,-0.09,0.03,-0.22]
The 300-dimensional full-text topic relation-triple distributed vector is as follows:
[0.64,0.52,-0.16,-0.57,-0.29,0.42,0.83,-0.17,-0.66,0.25,-0.03,0.25,-0.01,-0.15,0.34,0.27,-0.09,-0.07,0.44,-0.06,0.20,0.41,-0.05,-0.36,0.05,0.09,0.01,0.49,0.05,0.09,-0.00,-0.12,0.24,-0.14,-0.13,-0.06,-0.23,-0.02,0.14,0.12,-0.29,0.00,-0.06,-0.06,0.16,-0.27,-0.20,-0.20,0.26,0.11,0.12,-0.28,-0.09,-0.06,0.12,0.55,0.01,-0.13,-0.13,0.17,0.14,-0.15,-0.21,0.02,-0.22,0.24,-0.14,0.16,-0.21,0.14,-0.10,-0.07,-0.38,-0.24,0.39,-0.26,0.11,-0.16,0.04,0.03,-0.25,0.30,-0.18,-0.12,0.11,0.13,0.29,0.06,0.01,-0.15,0.30,-0.03,0.16,-0.00,-0.10,0.38,0.16,0.10,-0.07,-0.02,0.20,-0.12,0.36,0.11,-0.17,0.14,-0.10,-0.20,0.14,0.22,-0.17,-0.23,-0.20,0.13,-0.21,-0.17,0.18,0.33,0.15,0.09,-0.01,-0.07,0.03,-0.27,0.08,-0.09,0.12,0.05,-0.14,-0.03,-0.07,-0.27,0.16,-0.04,-0.16,-0.32,-0.03,0.09,0.01,-0.11,-0.01,0.08,-0.16,-0.01,0.01,0.13,-0.13,-0.18,-0.21,-0.36,0.08,0.15,0.05,0.40,0.30,-0.03,0.01,-0.02,0.09,0.10,-0.03,0.15,-0.16,-0.19,0.01,0.10,-0.02,0.15,0.14,-0.01,0.18,-0.12,-0.13,0.02,-0.06,0.09,-0.33,-0.01,0.04,0.14,-0.01,0.05,-0.11,0.02,0.23,-0.15,0.00,0.14,-0.23,0.18,0.17,0.09,0.09,-0.29,0.14,0.30,0.04,0.15,0.04,0.10,-0.02,0.08,0.02,-0.21,-0.17,-0.16,-0.04,-0.09,0.04,0.11,0.13,-0.06,-0.00,0.03,-0.17,-0.07,-0.08,0.04,-0.04,0.00,-0.07,-0.21,0.08,0.10,-0.12,0.03,-0.13,-0.05,-0.05,-0.11,-0.16,-0.21,-0.08,-0.10,-0.02,-0.11,-0.17,0.04,-0.04,-0.07,-0.12,-0.06,0.07,-0.05,0.12,0.08,0.17,0.02,0.03,0.15,0.08,-0.11,-0.14,0.06,0.06,-0.10,-0.06,-0.06,-0.20,-0.05,0.02,-0.06,0.02,0.05,-0.02,-0.09,-0.06,0.04,0.10,0.17,0.08,-0.23,-0.13,0.17,0.01,-0.20,-0.18,0.01,0.11,-0.20,0.20,-0.11,0.13,0.12,-0.11,-0.01,-0.02,-0.10,-0.01,-0.14,0.04,-0.07,0.07,-0.00,-0.07,0.19,0.04,-0.04,0.00,-0.20]
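The 300-dimensional paragraph topic relation-triple distributed vectors are as follows (段落1, 段落2, and 段落3 denote paragraphs 1, 2, and 3):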
段落1[[3.53,2.45,-0.32,-2.29,-0.86,0.54,2.72,-0.15,-2.24,0.84,0.33,0.01,-0.25,-0.76,1.22,0.29,-0.25,-0.03,1.47,-0.22,0.98,0.96,0.05,-0.62,0.49,-0.15,-0.30,0.86,-0.18,-0.23,-0.30,-0.12,-0.33,0.29,-0.48,0.39,-0.43,0.14,0.29,0.33,-0.49,-0.11,-0.35,-0.15,0.73,-1.09,-0.57,-0.16,1.03,0.23,0.76,-0.02,-0.40,-0.37,0.36,1.18,-0.13,0.36,-0.30,0.32,-0.04,-0.41,-0.13,0.56,-0.88,0.22,-0.02,0.25,-0.63,0.18,-0.46,-0.39,-0.56,-0.58,1.19,-0.55,-0.02,0.16,0.07,0.02,-0.81,0.94,-0.45,-0.65,0.54,0.46,0.75,-0.28,-0.67,-1.11,0.91,-0.18,-0.04,-0.26,-0.23,0.37,-0.29,-0.02,-0.38,-0.66,-0.02,-0.62,0.12,0.36,-0.47,-0.07,-0.01,-0.24,0.07,0.46,-0.13,-0.42,-0.61,0.34,-0.62,-0.51,0.68,0.65,0.35,0.39,-0.00,-0.23,0.22,-1.08,0.23,0.13,0.16,0.25,-0.44,0.05,0.18,-0.57,-0.01,0.03,-0.83,-0.90,0.26,-0.03,-0.35,0.08,-0.47,0.25,-0.79,-0.13,0.12,0.84,-0.49,-0.62,-0.76,-0.72,0.01,0.24,0.51,0.63,0.70,-0.11,-0.07,-0.15,-0.03,-0.24,0.21,0.25,0.18,-0.44,0.49,0.35,-0.16,0.30,-0.61,0.08,0.47,-0.38,-0.24,-0.29,-0.18,0.22,-1.45,0.05,-0.16,0.84,0.19,-0.04,-0.46,-0.32,0.10,-0.25,-0.24,0.43,-0.14,0.01,0.62,0.51,0.18,-0.48,0.44,0.81,0.17,0.54,-0.28,0.07,0.09,0.44,0.33,-0.29,-0.76,-0.47,0.00,-0.31,0.14,-0.18,-0.10,0.06,-0.45,0.14,-0.12,0.11,-0.25,0.53,-0.31,-0.16,0.05,-1.05,0.24,0.04,-0.27,0.42,-0.38,0.14,-0.23,-0.44,-0.25,-1.11,0.08,-0.52,0.14,0.09,-0.72,-0.27,-0.12,-0.21,-0.26,0.22,0.34,0.03,0.31,-0.07,0.21,-0.18,0.23,0.24,0.46,0.29,-0.80,0.25,0.32,-0.35,-0.37,-0.17,-0.58,-0.13,0.23,-0.53,0.08,0.21,-0.15,-0.27,-0.01,0.19,0.02,0.34,0.23,-0.62,-0.29,0.91,-0.07,-0.60,-0.45,-0.37,0.18,-0.20,0.09,-0.31,0.47,0.05,-0.20,-0.14,-0.15,-0.29,0.06,-0.34,0.06,-0.39,-0.01,0.16,-0.39,0.35,-0.09,0.07,-0.14,-0.23]]
段落2[[3.26,2.25,-0.03,-1.37,-1.26,-0.17,2.56,-0.22,-1.58,0.79,0.94,-0.18,0.07,-0.27,1.38,0.04,-0.51,-0.45,-0.43,0.36,0.16,-0.25,-0.43,-1.35,0.44,1.05,0.17,-0.33,0.60,-0.42,-0.23,-0.30,-0.20,-0.57,-0.58,-0.86,-0.58,0.08,0.34,0.19,-0.20,0.44,-0.26,-0.05,0.59,-1.43,-0.81,-0.67,0.87,-0.47,0.28,-0.41,-0.66,0.25,0.76,0.84,-0.34,0.48,-0.32,0.22,0.22,0.13,-0.29,0.50,-1.17,0.38,0.18,-0.07,-0.75,0.58,0.16,-0.12,-0.42,-0.07,0.18,-1.22,0.06,0.26,-0.56,0.36,-0.26,0.67,-0.65,-0.25,1.31,0.31,-0.11,0.45,0.17,-1.23,0.59,0.08,0.01,0.05,-0.01,0.21,0.09,0.24,0.33,-0.42,0.01,-0.03,-0.03,0.01,-0.45,-0.00,0.43,0.13,-0.43,0.47,0.31,-1.24,-0.39,0.71,-0.27,-0.52,0.85,0.85,0.43,-0.08,0.10,0.08,0.11,-0.24,0.63,-0.39,0.40,0.03,-0.25,-0.70,-0.13,-0.60,-0.34,-0.34,-0.40,-0.58,0.62,-0.18,-0.29,-0.11,-0.61,0.05,-0.57,0.66,-0.58,0.83,-0.23,0.18,-0.71,-0.20,-0.05,0.41,0.46,0.18,0.41,0.19,-0.12,0.12,0.02,0.24,-0.30,0.25,0.02,0.25,0.32,0.71,-0.43,0.41,-0.14,0.38,0.59,-0.27,0.50,0.28,0.14,-0.19,-1.00,-0.13,0.34,0.40,0.37,0.34,-0.27,-0.22,0.08,-0.57,0.14,-0.03,0.01,0.19,-0.28,0.50,-0.40,-0.39,0.05,0.30,-0.05,0.23,-0.35,0.62,-0.09,0.85,0.27,-0.06,-0.31,-0.02,-0.06,-0.10,0.41,0.06,0.04,0.40,-0.05,0.21,-0.07,-0.06,-0.08,-0.15,-0.34,-0.68,-0.34,-0.68,0.13,0.61,-0.17,0.31,-0.02,0.10,-0.31,-0.44,0.07,-0.51,0.14,-0.48,0.03,-0.23,-0.39,0.26,0.07,0.11,-0.22,-0.18,0.26,-0.45,0.22,-0.09,0.08,-0.33,0.45,0.12,0.10,0.43,-0.22,-0.11,0.05,0.16,-0.64,-0.19,-0.71,0.10,0.42,-0.25,0.35,0.03,-0.53,-0.68,-0.13,-0.09,0.36,0.34,0.34,-0.36,0.08,0.28,0.11,-0.35,-0.35,-0.16,-0.36,-0.06,0.28,-0.35,0.49,0.31,-0.08,0.04,-0.29,-0.11,0.24,0.11,-0.03,-0.08,0.04,-0.23,0.16,0.27,-0.21,-0.19,-0.02,-0.06]]
段落3[[1.08,0.89,-0.24,-0.47,-0.57,0.31,1.31,-0.16,-0.91,0.55,0.14,0.24,-0.05,-0.17,0.40,0.15,0.04,-0.16,0.19,0.05,0.09,0.09,-0.07,-0.56,-0.12,0.52,-0.23,0.53,0.36,-0.22,0.11,-0.21,-0.26,-0.26,-0.55,-0.14,-0.29,0.29,0.17,0.28,-0.15,0.05,-0.41,-0.09,0.01,-0.18,-0.34,-0.17,0.70,-0.46,0.42,-0.15,-0.44,0.10,0.56,0.77,-0.03,-0.09,0.16,0.18,0.05,0.21,-0.11,0.27,-0.53,0.52,0.14,0.29,-0.08,0.24,0.11,0.29,-0.15,-0.21,0.26,-0.24,-0.03,-0.29,-0.12,0.18,-0.28,0.45,-0.09,-0.04,-0.00,0.06,0.24,0.15,0.14,-0.29,-0.12,0.17,0.37,0.28,-0.10,-0.08,-0.08,0.17,0.22,0.26,-0.11,0.09,0.17,0.14,-0.17,-0.15,0.10,-0.12,-0.17,-0.05,0.10,-0.73,-0.26,-0.01,-0.10,0.08,0.37,0.66,-0.06,-0.15,0.37,-0.15,-0.25,0.24,0.10,-0.17,0.09,0.00,-0.07,-0.42,-0.05,-0.28,-0.40,-0.06,-0.18,-0.70,0.04,-0.21,-0.03,0.12,0.10,-0.21,-0.12,0.11,-0.24,0.42,-0.11,0.05,-0.12,-0.06,0.07,0.05,0.07,0.38,0.45,0.27,0.20,-0.19,0.05,0.08,0.12,-0.14,-0.18,-0.25,0.07,0.29,-0.16,0.18,-0.12,0.04,-0.09,0.05,-0.28,0.11,0.30,-0.15,-0.29,-0.14,0.18,0.02,0.12,0.09,-0.44,0.23,0.15,-0.28,0.17,0.06,-0.04,0.08,0.09,0.34,-0.28,0.14,-0.01,-0.03,0.12,0.16,-0.17,0.00,-0.30,0.24,-0.01,0.14,-0.12,0.10,0.22,-0.13,0.09,0.08,0.44,0.03,0.18,0.30,-0.12,-0.11,-0.15,0.33,-0.18,-0.24,-0.30,0.02,0.21,0.34,-0.05,-0.01,-0.21,0.02,-0.18,-0.44,0.08,0.01,0.12,-0.17,-0.16,-0.01,-0.31,0.08,-0.22,0.07,-0.19,-0.13,0.22,-0.39,0.00,0.02,0.00,-0.09,0.05,-0.09,-0.17,0.10,-0.50,-0.04,0.17,-0.08,0.13,-0.06,-0.35,-0.10,-0.10,0.03,0.10,-0.08,-0.04,-0.17,0.01,0.00,0.09,0.06,0.07,-0.25,0.26,0.22,-0.12,-0.03,0.04,0.17,-0.08,0.32,-0.06,-0.20,-0.11,0.18,-0.36,0.14,0.07,0.13,-0.11,-0.30,-0.19,-0.09,-0.06,-0.07,-0.18,0.01,0.25,-0.21,0.17,-0.25]]
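The 300-dimensional sentence topic relation-triple distributed vectors are as follows:
The topic relation-triple distributed vector of sentence 1 is as follows: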
[2.23,1.73,0.09,-1.11,-0.35,0.17,1.77,-0.34,-1.33,1.08,0.26,0.10,0.27,-0.46,0.98,0.15,0.39,-0.10,0.84,0.49,0.31,0.69,0.12,-0.62,0.44,0.49,-0.40,0.86,-0.07,0.30,0.07,0.05,0.17,0.09,-0.60,0.23,-0.38,0.11,0.63,0.26,-0.35,-0.25,-0.71,-0.10,0.64,-0.60,-0.41,-0.15,0.58,-0.20,0.45,0.07,-0.44,-0.17,0.25,0.52,-0.18,-0.24,-0.22,0.17,-0.21,-0.07,-0.34,0.60,-0.70,0.21,-0.11,-0.09,-0.44,0.23,0.15,-0.04,-0.23,-0.19,0.86,-0.37,0.07,0.14,-0.15,-0.07,-0.45,0.56,-0.63,-0.24,0.27,0.27,0.86,-0.02,-0.34,-0.81,0.64,-0.01,-0.07,-0.09,0.01,0.04,-0.10,0.19,-0.16,-0.26,0.38,-0.16,-0.09,0.06,-0.09,-0.37,-0.04,-0.19,0.11,-0.00,-0.02,-0.22,-0.23,0.52,-0.35,-0.32,0.28,0.41,0.23,0.27,-0.16,-0.09,0.00,-0.37,0.22,0.15,0.24,0.16,-0.16,-0.10,0.32,0.04,0.23,0.25,-0.67,-0.67,0.18,-0.15,-0.18,-0.11,-0.34,0.17,-0.51,0.09,-0.15,0.35,-0.25,-0.37,-0.24,-0.33,0.07,0.10,0.23,0.36,0.38,-0.03,-0.13,-0.26,0.06,0.04,-0.21,-0.13,0.03,-0.34,0.42,0.34,-0.07,0.19,-0.25,-0.01,0.21,-0.29,-0.16,-0.05,0.08,0.08,-0.88,0.26,-0.08,0.49,0.21,0.25,-0.35,-0.17,-0.09,-0.26,0.04,0.22,-0.16,0.09,0.25,0.35,0.08,-0.45,0.09,0.59,0.05,0.31,-0.01,-0.00,-0.05,0.51,0.21,-0.15,-0.56,-0.43,-0.11,-0.01,0.12,0.09,-0.15,-0.13,-0.04,-0.03,0.07,0.19,-0.05,0.21,-0.16,-0.20,-0.03,-0.53,0.22,0.33,-0.13,0.03,-0.19,-0.09,-0.13,-0.18,-0.00,-0.62,0.01,-0.58,0.04,0.01,-0.41,-0.03,-0.32,-0.16,-0.26,0.02,0.35,0.02,0.41,-0.06,-0.14,-0.05,-0.03,0.11,0.06,0.33,-0.45,-0.03,0.10,0.07,-0.22,-0.16,-0.22,-0.01,0.07,-0.18,0.10,0.10,-0.05,-0.15,0.02,0.24,0.02,0.43,0.14,-0.14,-0.21,0.52,-0.02,-0.16,-0.09,-0.22,0.12,0.09,-0.06,-0.36,0.28,0.20,-0.20,-0.07,-0.13,-0.27,0.13,-0.24,0.14,-0.18,-0.15,0.03,-0.15,0.25,-0.26,0.06,-0.10,-0.19]
The topic relation-triple distributed vector of sentence 2 is as follows:
[1.12,0.91,-0.30,-0.67,-0.27,0.16,1.14,-0.09,-0.73,0.19,0.17,-0.04,-0.27,-0.15,0.47,0.13,-0.43,-0.07,0.52,-0.03,0.41,0.57,-0.13,-0.13,0.19,-0.29,-0.02,0.37,-0.10,-0.04,0.04,-0.10,-0.12,0.23,-0.09,0.21,-0.13,-0.03,0.11,0.02,-0.33,-0.04,-0.00,-0.14,0.13,-0.43,-0.12,-0.05,0.23,0.12,0.27,-0.10,-0.09,-0.17,0.01,0.40,0.14,0.29,0.14,0.16,-0.01,-0.24,-0.08,0.12,-0.34,0.20,-0.16,0.24,-0.16,0.06,-0.23,-0.10,-0.18,-0.21,0.54,-0.11,0.10,0.08,0.12,0.09,-0.31,0.36,-0.03,-0.24,0.24,0.11,0.06,-0.02,-0.22,-0.14,0.17,-0.15,-0.05,-0.05,-0.21,0.23,-0.11,0.09,-0.23,-0.14,-0.10,-0.26,0.11,0.27,-0.02,0.06,-0.06,-0.21,0.07,0.08,-0.05,0.10,-0.11,0.03,-0.21,-0.18,0.34,0.30,0.01,0.26,0.12,-0.12,0.06,-0.53,0.02,0.02,0.02,-0.00,-0.25,0.03,0.12,-0.18,0.06,-0.15,-0.31,-0.35,-0.03,0.06,-0.20,0.07,-0.18,0.20,-0.13,-0.13,0.24,0.26,-0.17,-0.33,-0.38,-0.14,0.02,-0.08,0.04,0.23,0.21,0.04,0.05,-0.10,-0.10,-0.22,0.04,0.09,0.19,-0.32,0.13,0.16,-0.04,0.20,-0.13,-0.06,0.30,0.00,-0.14,-0.07,-0.19,0.11,-0.37,-0.05,0.06,0.27,0.04,-0.10,-0.07,-0.10,0.14,-0.11,-0.20,0.19,-0.02,-0.04,0.27,0.14,0.19,-0.04,0.31,0.31,0.28,0.23,-0.02,-0.12,-0.08,0.17,-0.05,-0.16,-0.26,-0.09,-0.03,-0.21,0.03,0.07,0.06,0.03,-0.16,0.06,-0.01,0.11,-0.13,0.13,-0.06,-0.06,0.06,-0.37,0.07,-0.08,-0.08,0.12,-0.09,-0.02,-0.19,-0.10,-0.06,-0.37,-0.03,-0.18,-0.03,-0.01,-0.26,-0.16,-0.04,-0.06,-0.04,0.16,0.09,0.05,0.05,-0.13,0.18,0.02,0.01,0.17,0.14,0.03,-0.15,0.04,0.21,-0.08,0.02,-0.02,-0.31,-0.01,-0.04,-0.24,-0.03,0.01,-0.03,-0.19,0.01,-0.10,-0.15,0.09,0.03,-0.21,-0.00,0.15,-0.17,-0.27,-0.21,-0.10,0.00,-0.06,0.12,-0.07,0.15,-0.02,-0.10,-0.06,-0.05,-0.24,0.06,-0.19,0.04,-0.06,0.05,0.06,-0.11,0.23,0.10,0.17,-0.05,-0.17]
The topic relation-triple distributed vector of sentence 3 is as follows:
[0.22,0.14,0.02,-0.28,-0.01,0.06,0.14,-0.26,-0.14,-0.14,0.11,-0.23,-0.06,-0.15,-0.18,0.10,0.03,0.11,0.00,0.11,0.13,-0.07,-0.16,0.07,0.18,0.32,-0.14,-0.29,-0.03,0.02,0.18,0.05,0.05,-0.05,0.05,0.08,-0.05,-0.10,0.00,-0.05,-0.05,-0.20,0.08,0.15,-0.06,0.07,-0.05,-0.02,-0.09,-0.01,-0.12,-0.11,-0.01,-0.02,0.05,0.05,-0.10,0.06,0.04,0.03,-0.13,0.02,-0.05,0.00,-0.04,0.03,0.14,-0.06,0.02,-0.02,0.00,0.09,0.16,-0.07,0.01,-0.04,0.13,-0.01,-0.13,0.00,0.02,-0.15,0.04,-0.08,0.06,-0.07,0.06,0.12,-0.00,0.05,-0.12,-0.01,-0.09,-0.03,-0.15,-0.11,-0.09,0.11,-0.10,-0.14,-0.06,0.11,-0.01,-0.07,0.05,0.07,0.20,0.02,-0.08,-0.04,0.16,-0.02,-0.04,-0.05,-0.02,-0.00,0.01,0.03,-0.05,0.03,0.05,0.17,-0.04,-0.22,0.15,0.08,0.08,-0.05,-0.11,0.09,0.06,-0.10,0.05,0.01,-0.01,-0.08,0.16,0.13,-0.01,0.07,-0.12,0.05,-0.10,0.09,-0.04,-0.03,0.01,-0.01,-0.01,-0.01,-0.01,-0.03,0.02,0.02,-0.05,-0.04,0.08,0.04,-0.22,0.09,-0.09,-0.09,0.06,-0.04,0.12,-0.02,0.03,0.06,-0.02,0.04,0.02,0.07,-0.15,-0.09,-0.02,0.02,0.02,-0.06,-0.07,-0.08,-0.03,0.06,0.03,0.05,-0.12,-0.05,0.07,0.10,-0.10,0.04,0.09,-0.04,-0.07,-0.08,0.03,-0.04,0.01,0.02,0.09,-0.07,-0.06,-0.00,0.03,0.02,0.09,0.05,0.00,-0.04,0.07,0.02,-0.06,-0.02,-0.08,0.04,0.07,0.02,0.13,0.02,-0.13,0.03,0.05,0.01,-0.00,-0.10,-0.02,0.10,0.07,0.03,0.06,0.02,-0.07,0.00,0.01,0.02,-0.02,-0.07,0.00,-0.03,-0.04,-0.01,-0.01,0.07,-0.07,-0.04,-0.09,0.05,0.02,-0.08,-0.09,0.02,-0.00,-0.01,-0.12,-0.03,0.04,0.01,0.01,0.06,-0.02,-0.07,0.08,0.05,-0.02,0.03,-0.03,0.01,-0.09,-0.01,-0.05,0.05,-0.03,-0.07,-0.16,0.07,-0.06,0.01,0.07,0.04,-0.02,0.09,-0.11,-0.05,0.06,-0.05,-0.04,0.03,-0.01,-0.09,-0.02,0.00,0.06,0.02,-0.00,-0.03,0.04,-0.06,-0.06,0.07,0.02,0.13]
...
The topic relation-triple distributed vector of sentence 12 is as follows:
[1.20,1.01,-0.28,-0.63,-0.44,0.45,1.35,-0.36,-0.98,0.53,0.20,0.11,-0.14,-0.25,0.19,0.09,0.21,-0.10,0.33,-0.02,0.17,0.08,-0.07,-0.38,-0.08,0.68,-0.20,0.45,0.37,-0.22,0.18,-0.27,-0.35,-0.37,-0.57,-0.18,-0.32,0.27,0.16,0.27,-0.14,-0.01,-0.42,-0.02,-0.03,-0.15,-0.34,-0.29,0.62,-0.44,0.35,-0.19,-0.48,0.01,0.72,0.71,-0.06,0.00,0.11,0.27,-0.03,0.25,-0.13,0.24,-0.57,0.51,0.18,0.24,-0.05,0.17,0.14,0.41,-0.11,-0.19,0.22,-0.29,0.06,-0.31,-0.12,0.13,-0.22,0.41,-0.14,-0.04,0.02,0.08,0.25,0.16,0.11,-0.27,-0.23,0.02,0.35,0.25,-0.17,-0.06,-0.15,0.11,0.26,0.19,-0.04,0.07,0.14,0.18,-0.28,-0.16,0.11,-0.21,-0.22,0.03,0.07,-0.75,-0.34,0.08,-0.13,0.13,0.34,0.73,-0.14,-0.20,0.41,-0.10,-0.15,0.16,0.17,-0.21,0.12,-0.01,-0.14,-0.42,-0.13,-0.24,-0.37,-0.09,-0.20,-0.72,0.12,-0.21,-0.03,0.15,0.07,-0.16,-0.13,0.12,-0.25,0.46,-0.11,-0.02,-0.09,-0.06,0.08,0.04,0.15,0.36,0.43,0.24,0.20,-0.21,0.02,0.05,0.10,-0.16,-0.11,-0.19,0.14,0.21,-0.18,0.21,-0.19,0.07,-0.09,0.09,-0.25,0.12,0.20,-0.19,-0.31,-0.18,0.16,-0.07,0.09,0.03,-0.57,0.22,0.20,-0.33,0.13,-0.00,-0.04,0.04,0.08,0.33,-0.20,0.12,-0.00,0.02,0.13,0.13,-0.16,-0.03,-0.33,0.25,-0.02,0.16,-0.07,0.09,0.25,-0.15,0.06,0.09,0.45,-0.04,0.26,0.31,-0.13,-0.16,-0.15,0.29,-0.20,-0.28,-0.30,0.02,0.28,0.37,0.02,0.05,-0.16,0.01,-0.14,-0.46,0.08,0.04,0.11,-0.20,-0.11,0.02,-0.26,0.09,-0.22,0.05,-0.17,-0.12,0.18,-0.36,-0.01,0.04,0.01,-0.12,0.04,-0.11,-0.17,0.04,-0.52,-0.05,0.14,-0.12,0.11,-0.02,-0.42,-0.07,-0.14,0.03,0.12,-0.01,-0.05,-0.17,0.06,-0.08,0.05,0.03,0.10,-0.28,0.22,0.20,-0.23,-0.11,0.05,0.19,0.01,0.29,-0.13,-0.28,-0.07,0.19,-0.29,0.13,0.05,0.13,-0.07,-0.35,-0.20,-0.17,-0.04,-0.11,-0.16,-0.05,0.23,-0.18,0.22,-0.22]
The topic relation-triple distributed vector of sentence 13 of the English short essay is as follows:
[1.50,0.90,-0.26,-0.90,0.21,-0.16,1.17,0.12,-0.40,0.57,0.51,0.12,0.11,-0.01,0.68,-0.10,-0.40,-0.04,-0.05,0.15,-0.28,-0.35,-0.26,0.10,0.46,0.12,-0.13,-0.37,-0.26,-0.17,-0.03,-0.06,-0.34,-0.48,-0.59,0.18,-0.03,-0.05,0.15,0.31,0.57,0.17,0.21,0.04,0.05,-0.17,-0.04,0.16,0.61,0.18,0.14,0.02,-0.34,-0.06,0.36,-0.05,0.01,-0.08,0.20,0.04,0.23,-0.07,-0.03,0.42,-0.23,0.02,0.31,0.14,-0.08,0.09,0.12,0.33,0.31,0.05,0.25,-0.28,0.00,0.20,-0.13,0.59,-0.07,0.22,-0.30,-0.39,0.10,0.01,-0.09,0.35,0.06,-0.14,-0.36,-0.00,0.15,0.15,-0.20,0.14,0.05,-0.09,0.05,-0.14,-0.12,0.00,0.12,0.16,-0.23,0.34,0.21,0.43,-0.26,-0.06,0.26,-0.65,0.02,0.09,0.05,0.29,-0.06,0.50,0.12,0.01,0.35,-0.19,-0.08,0.14,0.21,0.02,-0.15,0.03,0.08,-0.42,-0.04,0.02,-0.22,-0.11,0.13,-0.32,0.06,0.19,-0.00,0.25,-0.03,-0.10,0.07,-0.22,-0.05,0.26,0.06,-0.01,-0.28,0.03,-0.17,-0.17,0.05,0.28,0.03,0.18,0.46,-0.14,-0.21,0.22,-0.00,0.01,0.14,0.09,0.07,0.15,-0.37,-0.01,-0.20,0.04,0.34,0.07,0.05,-0.11,0.01,0.31,-0.06,0.18,0.44,0.05,0.01,0.00,0.03,0.16,-0.07,0.00,0.17,-0.19,0.11,0.01,-0.28,0.10,-0.13,0.05,-0.38,-0.15,0.08,0.11,-0.02,-0.31,-0.22,0.24,-0.01,0.17,0.25,0.19,0.20,0.21,-0.04,-0.16,0.16,0.28,0.36,0.16,0.06,-0.47,-0.16,-0.22,0.25,-0.28,-0.02,0.11,-0.11,-0.02,0.22,0.15,-0.13,0.09,-0.01,-0.19,0.50,0.20,0.18,-0.00,-0.31,-0.12,0.04,0.10,0.02,0.24,-0.04,0.01,-0.01,-0.17,-0.05,0.10,0.15,0.04,0.30,-0.03,0.05,0.07,-0.07,-0.08,0.04,0.06,-0.19,-0.05,-0.26,0.00,0.24,-0.18,0.02,-0.10,-0.30,-0.51,-0.12,-0.03,0.18,-0.35,0.07,-0.10,0.31,0.02,-0.28,-0.30,-0.08,0.02,0.04,0.10,-0.24,0.26,-0.23,0.00,0.02,0.01,-0.15,0.17,0.06,0.25,0.07,0.06,-0.00,0.02,-0.10,0.15,0.09,-0.23,0.16,-0.06]。
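Step 3: Execute the "English short essay sentence-level topic coherence analysis module"
The English short essay sentence-level topic coherence analysis module uses the title, full-text, paragraph, and sentence topic relation-triple distributed vectors output in Step 2 by the English short essay hierarchical topic tree hybrid semantic space analysis module. First, it calculates the title-to-sentence hierarchical topic coherence semantic similarity and the paragraph-to-sentence hierarchical topic coherence semantic similarity; from these two results it calculates the final sentence hierarchical topic coherence semantic similarity; and from all of the sentence hierarchical topic coherence semantic similarities it calculates the essay's hierarchical topic coherence semantic similarity score. Then, it calculates the sentence-to-paragraph, paragraph-to-paragraph, and paragraph-to-full-text hierarchical topic coherence values and, based on the sentence-to-paragraph values, sets a hierarchical topic coherence threshold to extract the topically incoherent sentences in the essay. From the sentence-to-paragraph, paragraph-to-paragraph, and paragraph-to-full-text hierarchical topic coherence values, it calculates the essay's hierarchical topic coherence mean score. The sentence hierarchical topic coherence semantic similarities, sentence hierarchical topic coherence values, hierarchical topic coherence semantic similarity score, and hierarchical topic coherence mean score of this embodiment are shown below: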
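The sentence hierarchical topic coherence semantic similarities of the English short essay are as follows:
Paragraph 1, sentence 1 hierarchical topic coherence semantic similarity: 0.7786710858345032
Paragraph 1, sentence 2 hierarchical topic coherence semantic similarity: 0.7784239053726196
Paragraph 1, sentence 3 hierarchical topic coherence semantic similarity: 0.2951989471912384
Paragraph 2, sentence 4 hierarchical topic coherence semantic similarity: 0.7096657752990723
Paragraph 2, sentence 5 hierarchical topic coherence semantic similarity: 0.6682606935501099
Paragraph 2, sentence 6 hierarchical topic coherence semantic similarity: 0.23642082512378693
Paragraph 2, sentence 7 hierarchical topic coherence semantic similarity: 0.45290467143058777
Paragraph 2, sentence 8 hierarchical topic coherence semantic similarity: 0.6086656451225281
Paragraph 2, sentence 9 hierarchical topic coherence semantic similarity: 0.35007521510124207
Paragraph 2, sentence 10 hierarchical topic coherence semantic similarity: 0.5189481973648071
Paragraph 2, sentence 11 hierarchical topic coherence semantic similarity: 0.425748735666275
Paragraph 3, sentence 12 hierarchical topic coherence semantic similarity: 0.6587570309638977
Paragraph 3, sentence 13 hierarchical topic coherence semantic similarity: 0.3443957269191742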
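The sentence hierarchical topic coherence values of the English short essay are as follows:
Paragraph 1, sentence 1 hierarchical topic coherence value: 0.7558869123458862
Paragraph 1, sentence 2 hierarchical topic coherence value: 0.7963798642158508
Paragraph 1, sentence 3 hierarchical topic coherence value: 0.2205403298139572
Paragraph 2, sentence 4 hierarchical topic coherence value: 0.7034226655960083
Paragraph 2, sentence 5 hierarchical topic coherence value: 0.6429765820503235
Paragraph 2, sentence 6 hierarchical topic coherence value: 0.2636907994747162
Paragraph 2, sentence 7 hierarchical topic coherence value: 0.4748536944389343
Paragraph 2, sentence 8 hierarchical topic coherence value: 0.6296454071998596
Paragraph 2, sentence 9 hierarchical topic coherence value: 0.3357328772544861
Paragraph 2, sentence 10 hierarchical topic coherence value: 0.5160608887672424
Paragraph 2, sentence 11 hierarchical topic coherence value: 0.4081385135650635
Paragraph 3, sentence 12 hierarchical topic coherence value: 0.6920690655708313
Paragraph 3, sentence 13 hierarchical topic coherence value: 0.385113388299942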
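The present invention sets the extraction threshold for topically incoherent sentences to 0.32: when the hierarchical topic coherence value between a sentence and its paragraph is less than 0.32, the sentence is judged to be topically incoherent. The results are as follows:
The topically incoherent sentences are as follows:
Paragraph 1, sentence 3: I have the reasons and evidence to support my point.#
Paragraph 2, sentence 6: They can gain the treasure to deal with various problems.#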
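The threshold rule can be illustrated with a minimal Python sketch. The coherence values are the ones listed in this embodiment (rounded to four decimals here), and 0.32 is the stated threshold; the names `COHERENCE_VALUES` and `extract_incoherent` are hypothetical and chosen only for illustration.

```python
# Sentence-to-paragraph hierarchical topic coherence values from this
# embodiment, keyed by sentence number (rounded to 4 decimals).
COHERENCE_VALUES = {
    1: 0.7559, 2: 0.7964, 3: 0.2205, 4: 0.7034, 5: 0.6430,
    6: 0.2637, 7: 0.4749, 8: 0.6296, 9: 0.3357, 10: 0.5161,
    11: 0.4081, 12: 0.6921, 13: 0.3851,
}

THRESHOLD = 0.32  # extraction threshold stated in this embodiment


def extract_incoherent(values, threshold=THRESHOLD):
    """Return the sentence numbers whose coherence value is below the threshold."""
    return sorted(n for n, v in values.items() if v < threshold)


incoherent = extract_incoherent(COHERENCE_VALUES)
print(incoherent)  # [3, 6]
```

Only sentences 3 and 6 fall below 0.32, which matches the two sentences reported above; sentence 9 (0.3357) is the closest sentence that still passes.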
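English short essay hierarchical topic coherence semantic similarity score: 0.73 points;
English short essay hierarchical topic coherence mean score: 0.70 points.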
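Step 4: Execute the "English short essay sentence-level topic coherence analysis output module"
The English short essay sentence-level topic coherence analysis output module takes as input the essay's hierarchical topic coherence semantic similarity score and hierarchical topic coherence mean score output by the sentence-level topic coherence analysis module in Step 3, weights these two scores to calculate the essay's topic coherence score, and outputs a topic coherence analysis comment for the essay.
The format of the sentence-level topic coherence analysis result in this embodiment is as follows:
English short essay sentence-level topic coherence score: 71.50 points.
English short essay sentence-level topic coherence analysis comment:
The content of the essay is basically topically coherent; the expression of topic coherence is average, and some parts are not clear enough.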
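As a worked check of the weighting in Step 4, the sketch below applies formula (9) from claim 6, which weights the two scores equally at 0.5 and scales the result by 100, to the two scores reported in this embodiment. The function name `topic_coherence_score` is hypothetical.

```python
def topic_coherence_score(similarity_score, mean_coherence_score):
    """Formula (9): equal-weight average of the two scores, scaled to 0-100."""
    return (0.5 * similarity_score + 0.5 * mean_coherence_score) * 100


# Scores reported in this embodiment: 0.73 and 0.70.
score = topic_coherence_score(0.73, 0.70)
print(f"{score:.2f}")  # 71.50
```

The result agrees with the 71.50 points reported above. The rule that maps a score to a textual comment is not specified in this text, so it is not sketched here.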

Claims (7)

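1. A method for analyzing topic coherence at the sentence level in English short essays, characterized in that it comprises an analysis model consisting of a sequentially connected English short essay sentence preprocessing module, English short essay hierarchical topic tree hybrid semantic space analysis module, English short essay sentence-level topic coherence analysis module, and English short essay sentence-level topic coherence analysis output module, the analysis method comprising the following steps:
(1) the English short essay sentence preprocessing module takes the title and full text of an English short essay as input; performs word and sentence segmentation, stop-word removal, and stemming on the title and on the full text separately; performs part-of-speech tagging and relation-triple extraction on the title and full text so processed; and outputs the preprocessing results for the title and full text of the essay;
(2) the English short essay hierarchical topic tree hybrid semantic space analysis module takes the preprocessing results for the title and full text as input and, using the constructed relation-triple hierarchical topic tree model, performs topic clustering separately on the relation-triple information of the title, full text, paragraphs, and sentences of the essay; maps the topic clusters into a distributed semantic space to generate the title, full-text, paragraph, and sentence topic relation-triple distributed vectors of the essay; and, for the generated title, full-text, paragraph, and sentence topic relation-triple distributed vectors, matches semantic concepts in an English knowledge base, extracts adjacent relation triples, and iteratively determines the optimal candidate topic relation-triple sets for the title, full text, paragraphs, and sentences, thereby expanding the title, full-text, paragraph, and sentence topic relation-triple distributed vectors;
(3) the English short essay sentence-level topic coherence analysis module takes the title, full-text, paragraph, and sentence topic relation-triple distributed vectors as input and separately calculates the title-to-sentence hierarchical topic coherence semantic similarity and the paragraph-to-sentence hierarchical topic coherence semantic similarity; based on these two similarities, sets a weight for the title-to-sentence hierarchical topic coherence semantic similarity and a weight for the paragraph-to-sentence hierarchical topic coherence semantic similarity, and calculates the hierarchical topic coherence semantic similarity of each sentence in the essay; from the calculated sentence hierarchical topic coherence semantic similarities, calculates the essay's hierarchical topic coherence semantic similarity score; calculates the sentence-to-paragraph, paragraph-to-paragraph, and paragraph-to-full-text hierarchical topic coherence values; sorts the sentence-to-paragraph hierarchical topic coherence values and sets a hierarchical topic coherence threshold to extract the topically incoherent sentences in the essay; and, from the sentence-to-paragraph, paragraph-to-paragraph, and paragraph-to-full-text hierarchical topic coherence values, calculates the essay's hierarchical topic coherence mean score;
(4) the English short essay sentence-level topic coherence analysis output module takes as input the essay's hierarchical topic coherence semantic similarity score and hierarchical topic coherence mean score from the sentence-level topic coherence analysis module; from these two scores it calculates the essay's topic coherence score and generates a topic coherence analysis comment for the essay.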
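2. The analysis method according to claim 1, characterized in that the processing steps of the English short essay sentence preprocessing module are as follows:
P201 Start;
P202 Read the title and full text of the English short essay;
P203 Perform word and sentence segmentation on the title, and output the segmentation result for the title;
P204 Perform word and sentence segmentation on the full text, and output the segmentation result for the full text;
P205 Remove stop words from the title by matching it against a stop-word set using regular expressions;
P206 Remove stop words from the full text by matching it against a stop-word set using regular expressions;
P207 Stem the title;
P208 Stem the full text;
P209 Perform part-of-speech tagging on the title, and output the part-of-speech tagging result for the title;
P210 Perform part-of-speech tagging on the full text, and output the part-of-speech tagging result for the full text;
P211 Extract relation triples from the title, and output the relation-triple analysis result for the title;
P212 Extract relation triples from the full text, and output the relation-triple analysis result for the full text;
P213 End.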
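The segmentation, stop-word removal, and stemming steps (P203 to P208) can be sketched in Python as below. This is only an illustration under stated assumptions: the stop-word set is a tiny hypothetical sample, `naive_stem` is a toy suffix stripper standing in for whatever stemmer the method actually uses, and part-of-speech tagging and relation-triple extraction (P209 to P212) are omitted because this text does not specify the tools involved.

```python
import re

# Hypothetical, deliberately tiny stop-word set; the patent does not list its own.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "i", "my",
              "they", "can", "have", "with"}


def split_sentences(text):
    """Split text into sentences at whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-case a sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Toy suffix stripper (an assumption, not the patent's stemmer)."""
    for suffix in ("ing", "es", "s", "ed"):
        # Keep at least three characters of the stem.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return one list of stemmed, stop-word-free tokens per sentence."""
    return [[naive_stem(t) for t in remove_stop_words(tokenize(s))]
            for s in split_sentences(text)]


result = preprocess("I have the reasons and evidence to support my point.")
print(result)  # [['reason', 'evidence', 'support', 'point']]
```

The same `preprocess` function would be applied once to the title and once to the full text, mirroring the paired steps in the claim.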
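3. The analysis method according to claim 1, characterized in that the processing steps of the English short essay hierarchical topic tree hybrid semantic space analysis module are as follows:
P301 Start;
P302 Read the preprocessing results for the title and full text of the English short essay;
P303 Perform topic clustering on the relation-triple information of the title based on the relation-triple hierarchical topic tree model, and output the topic clustering result for the title;
P304 Perform topic clustering on the relation-triple information of the full text based on the relation-triple hierarchical topic tree model, and output the topic clustering result for the full text;
P305 Perform topic clustering on the relation-triple information of each paragraph based on the relation-triple hierarchical topic tree model, and output the topic clustering result for each paragraph;
P306 Perform topic clustering on the relation-triple information of each sentence based on the relation-triple hierarchical topic tree model, and output the topic clustering result for each sentence;
P307 Read the topic clustering result for the title and map it into the distributed semantic space to generate the title topic relation-triple distributed vector;
P308 Read the topic clustering result for the full text and map it into the distributed semantic space to generate the full-text topic relation-triple distributed vector;
P309 Read the topic clustering result for each paragraph and map it into the distributed semantic space to generate the paragraph topic relation-triple distributed vectors;
P310 Read the topic clustering result for each sentence and map it into the distributed semantic space to generate the sentence topic relation-triple distributed vectors;
P311 Match the knowledge base to expand the title topic relation-triple distributed vector, and output the title topic relation-triple distributed vector;
P312 Match the knowledge base to expand the full-text topic relation-triple distributed vector, and output the full-text topic relation-triple distributed vector;
P313 Match the knowledge base to expand the paragraph topic relation-triple distributed vectors, and output the paragraph topic relation-triple distributed vectors;
P314 Match the knowledge base to expand the sentence topic relation-triple distributed vectors, and output the sentence topic relation-triple distributed vectors;
P315 End.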
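4. The analysis method according to claim 1, characterized in that the calculation formulas of the English short essay sentence-level topic coherence analysis module are defined as follows:
(1) Formula for the title-to-sentence hierarchical topic coherence semantic similarity
[Formula (1): image FDA0004014547860000031 in the original publication]
In formula (1), n denotes the number of dimensions of the hierarchical topic tree hybrid semantic vector space, with i indexing the dimensions (from the i-th to the n-th) of the title topic relation-triple distributed vector and the sentence topic relation-triple distributed vector;
(2) Formula for the paragraph-to-sentence hierarchical topic coherence semantic similarity
[Formula (2): image FDA0004014547860000041 in the original publication]
In formula (2), n denotes the number of dimensions of the hierarchical topic tree hybrid semantic vector space, with i indexing the dimensions (from the i-th to the n-th) of the paragraph topic relation-triple distributed vector and the sentence topic relation-triple distributed vector;
(3) Formula for the sentence hierarchical topic coherence semantic similarity
Sentence hierarchical topic coherence semantic similarity = δ1 × title-to-sentence hierarchical topic coherence semantic similarity + δ2 × paragraph-to-sentence hierarchical topic coherence semantic similarity    (3)
In formula (3), δ1 and δ2 denote the weights of the title-to-sentence hierarchical topic coherence semantic similarity and the paragraph-to-sentence hierarchical topic coherence semantic similarity, respectively, in the sentence hierarchical topic coherence semantic similarity, with δ1+δ2=1; the title-to-sentence hierarchical topic coherence semantic similarity is given by formula (1), and the paragraph-to-sentence hierarchical topic coherence semantic similarity is given by formula (2);
(4) Formula for the essay hierarchical topic coherence semantic similarity score
[Formula (4): image FDA0004014547860000042 in the original publication]
In formula (4), the sentence hierarchical topic coherence semantic similarity is given by formula (3), and n denotes the total number of sentences in the essay;
(5) Formula for the sentence-to-paragraph hierarchical topic coherence value
[Formula (5): image FDA0004014547860000043 in the original publication]
In formula (5), n denotes the number of sentence topic relation-triple distributed vectors contained in the essay, and i denotes the i-th sentence topic relation-triple distributed vector in the essay;
(6) Formula for the paragraph-to-paragraph hierarchical topic coherence value
[Formula (6): images FDA0004014547860000044 and FDA0004014547860000051 in the original publication]
In formula (6), n denotes the number of paragraph topic relation-triple distributed vectors contained in the essay, and j denotes the j-th paragraph topic relation-triple distributed vector in the essay;
(7) Formula for the paragraph-to-full-text hierarchical topic coherence value
[Formula (7): image FDA0004014547860000052 in the original publication]
In formula (7), n denotes the number of paragraph topic relation-triple distributed vectors contained in the essay, and k denotes the k-th paragraph topic relation-triple distributed vector in the essay;
(8) Formula for the essay hierarchical topic coherence mean score
[Formula (8): image FDA0004014547860000053 in the original publication]
In formula (8), ε1, ε2, and ε3 denote the weights assigned to the sentence-to-paragraph, paragraph-to-paragraph, and paragraph-to-full-text hierarchical topic coherence values, respectively, in the essay hierarchical topic coherence mean score, with ε1+ε2+ε3=1; N denotes the number of sentence topic relation-triple distributed vectors in the essay, and M denotes the number of paragraph topic relation-triple distributed vectors in the essay; the sentence-to-paragraph hierarchical topic coherence value is given by formula (5), the paragraph-to-paragraph hierarchical topic coherence value by formula (6), and the paragraph-to-full-text hierarchical topic coherence value by formula (7).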
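The weighted combination in formula (3) can be sketched in Python as follows. Because formulas (1) and (2) survive only as image references in this text, the sketch substitutes cosine similarity between two n-dimensional vectors as a stand-in; that substitution is an assumption, not the patent's confirmed definition. The function names and the default weight `delta1=0.5` are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed stand-in
    for formulas (1) and (2), which are not reproduced in this text)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def sentence_similarity(title_vec, paragraph_vec, sentence_vec, delta1=0.5):
    """Formula (3): delta1 * title-to-sentence similarity
    + delta2 * paragraph-to-sentence similarity, with delta1 + delta2 = 1."""
    if not 0.0 <= delta1 <= 1.0:
        raise ValueError("delta1 must lie in [0, 1]")
    delta2 = 1.0 - delta1
    return (delta1 * cosine_similarity(title_vec, sentence_vec)
            + delta2 * cosine_similarity(paragraph_vec, sentence_vec))


# Toy 2-dimensional vectors, for illustration only.
value = sentence_similarity([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], delta1=0.25)
print(value)  # 0.25
```

Deriving `delta2` from `delta1` inside the function enforces the constraint δ1+δ2=1 stated for formula (3), so a caller cannot pass inconsistent weights.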
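5. The analysis method according to claim 4, characterized in that the processing steps of the English short essay sentence-level topic coherence analysis module are as follows:
P401 Start;
P402 Read the title, full-text, paragraph, and sentence topic relation-triple distributed vectors of the English short essay;
P403 Calculate the title-to-sentence hierarchical topic coherence semantic similarity according to formula (1), and output the hierarchical topic coherence semantic similarities between the title and all sentences of the essay;
P404 Calculate the paragraph-to-sentence hierarchical topic coherence semantic similarity according to formula (2), and output the hierarchical topic coherence semantic similarities between the paragraphs and all sentences of the essay;
P405 Calculate the sentence hierarchical topic coherence semantic similarity according to formula (3), and output the sentence hierarchical topic coherence semantic similarity;
P406 Determine whether any sentence or paragraph hierarchical topic coherence semantic similarity in the essay remains unanalyzed; if so, go to P403; otherwise, go to P407;
P407 Read all sentence hierarchical topic coherence semantic similarities, calculate the essay hierarchical topic coherence semantic similarity score according to formula (4), and output the essay hierarchical topic coherence semantic similarity score;
P408 Calculate the sentence-to-paragraph hierarchical topic coherence values according to formula (5), and output all sentence-to-paragraph hierarchical topic coherence values;
P409 Based on all sentence-to-paragraph hierarchical topic coherence values, set a hierarchical topic coherence threshold to extract the topically incoherent sentences of the essay, and generate the set of topically incoherent sentences;
P410 Calculate the paragraph-to-paragraph hierarchical topic coherence values according to formula (6), and output all paragraph-to-paragraph hierarchical topic coherence values;
P411 Calculate the paragraph-to-full-text hierarchical topic coherence values according to formula (7), and output all paragraph-to-full-text hierarchical topic coherence values;
P412 Determine whether any sentence or paragraph hierarchical topic coherence value in the essay remains unanalyzed; if so, go to P407; otherwise, go to P413;
P413 Read all sentence-to-paragraph, paragraph-to-paragraph, and paragraph-to-full-text hierarchical topic coherence values, calculate the essay hierarchical topic coherence mean score according to formula (8), and output the essay hierarchical topic coherence mean score;
P414 End.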
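6. The analysis method according to claim 5, characterized in that the calculation formula of the English short essay sentence-level topic coherence analysis output module is defined as follows:
English short essay topic coherence score =
(0.5 × essay hierarchical topic coherence semantic similarity score + 0.5 × essay hierarchical topic coherence mean score) × 100    (9)
In formula (9), the essay hierarchical topic coherence semantic similarity score is given by formula (4), and the essay hierarchical topic coherence mean score is given by formula (8).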
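7. The analysis method according to claim 6, characterized in that the processing steps of the English short essay sentence-level topic coherence analysis output module are as follows:
P501 Start;
P502 Read the essay hierarchical topic coherence semantic similarity score from the English short essay sentence-level topic coherence analysis module;
P503 Read the essay hierarchical topic coherence mean score from the English short essay sentence-level topic coherence analysis module;
P504 Calculate the essay's topic coherence score according to formula (9), output the essay's topic coherence score, and generate the essay's topic coherence comment;
P505 End.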
CN202010573975.9A 2020-06-22 2020-06-22 A Method for Analyzing Topic Coherence at the Sentence Level in English Short Essays Active CN111709224B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010573975.9A CN111709224B (zh) 2020-06-22 2020-06-22 一种英语短文句子层次主题连贯分析方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010573975.9A CN111709224B (zh) 2020-06-22 2020-06-22 一种英语短文句子层次主题连贯分析方法

Publications (2)

Publication Number Publication Date
CN111709224A CN111709224A (zh) 2020-09-25
CN111709224B true CN111709224B (zh) 2023-04-07

Family

ID=72541343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010573975.9A Active CN111709224B (zh) 2020-06-22 2020-06-22 A Method for Analyzing Topic Coherence at the Sentence Level in English Short Essays

Country Status (1)

Country Link
CN (1) CN111709224B (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5794050A (en) * 1995-01-04 1998-08-11 Intelligent Text Processing, Inc. Natural language understanding system
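CN106776550A (zh) * 2016-12-06 2017-05-31 Guilin University of Electronic Technology Method for analyzing the discourse coherence quality of English compositions
CN107423282A (zh) * 2017-05-24 2017-12-01 Nanjing University Method for concurrently extracting semantically coherent topics and word vectors from text based on hybrid features
CN110287497A (zh) * 2019-07-03 2019-09-27 Guilin University of Electronic Technology Method for analyzing the semantic structure coherence of English text
CN111104789A (zh) * 2019-11-22 2020-05-05 Central China Normal University Text scoring method, device and system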

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10831793B2 (en) * 2018-10-23 2020-11-10 International Business Machines Corporation Learning thematic similarity metric from article text units

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Measuring the coherence of writing using topic-based analysis; Richard Watson Todd; 《Assessing Writing》; 20040819; full text *
Off-topic English Essay Detection Model Based on Hybrid Semantic Space for Automated English Essay Scoring System; Guimin Huang, Jian Liu, Chunli Fan, Tingting Pan; 《2018 2nd International Conference on Electronic Information Technology and Computer Engineering (EITCE 2018)》; 20181119; full text *
Unsupervised Learning by Probabilistic Latent Semantic Analysis; Thomas Hofmann; 《Machine Learning》; 20010131; full text *
Research on multi-document summarization based on the hLDA hierarchical topic model; Liu Hongyan; 《China Master's Theses Full-text Database, Information Science and Technology》; 20120815; full text *
Text coherence analysis based on latent semantic analysis; Tang Shiping et al.; 《Computer Applications and Software》; 20080215 (No. 02); full text *

Also Published As

Publication number Publication date
CN111709224A (zh) 2020-09-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20200925

Assignee: Guilin ruiweisaide Technology Co.,Ltd.

Assignor: Guilin University of Electronic Technology

Contract record no.: X2023980046266

Denomination of invention: A Method for Analyzing Topic Coherence at the Sentence Level in English Short Essays

Granted publication date: 20230407

License type: Common License

Record date: 20231108