CN111709224B - Method for analyzing sentence-level hierarchical topic coherence in short English essays - Google Patents
- Publication number: CN111709224B (application CN202010573975.9A)
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
Classifications
- G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30: Semantic analysis
Abstract
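The present invention provides a method for analyzing sentence-level hierarchical topic coherence in short English essays. The method is an analysis model made up of four sequentially connected modules:

- an essay sentence preprocessing module;
- a hierarchical topic tree hybrid semantic space analysis module;
- a sentence-level hierarchical topic coherence analysis module;
- a sentence-level hierarchical topic coherence output module.

When a short English essay is processed with this model and method, the result is a sentence-level topic coherence analysis of that essay. The method addresses the problem of analyzing sentence-level topic coherence in short English essays automatically. According to the applicant, its results are better than those of traditional sentence-level topic coherence analysis.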
Description
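Technical Field
The present invention relates to natural language processing. Specifically, it is a method in which a computer automatically analyzes whether the sentences of a short English essay are coherent with the essay's topic at each level of the text. The analysis method applies only to short English essays, not to short Chinese essays.
Background
In a short English essay, a sentence's degree of topic coherence determines whether the sentence stays on the topic. Existing methods for analyzing sentence topic coherence in short English essays, in China and abroad, fall into two categories: supervised and unsupervised.

- Supervised methods need reference model essays. This makes them unsuitable for sentence-level hierarchical topic coherence analysis of large numbers of essays.
- Unsupervised methods judge how coherent a sentence is by computing semantic similarity directly from distributed vectors. They do not analyze the features of sentence-level hierarchical topic coherence.

To solve these problems, the present invention provides a method for analyzing sentence-level hierarchical topic coherence in short English essays.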
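Summary of the Invention
The overall processing flow of the method is shown in Figure 1. It comprises four modules: an essay sentence preprocessing module, a hierarchical topic tree hybrid semantic space analysis module, a sentence-level hierarchical topic coherence analysis module, and a sentence-level hierarchical topic coherence output module.
The essay sentence preprocessing module works in three steps:

1. Input the title and full text of the essay. Apply word tokenization and sentence splitting, stop-word removal, and stemming to the title and the full text separately.
2. Apply part-of-speech tagging and relation triple extraction to the processed title and full text.
3. Output the preprocessing results for the title and full text produced by the two steps above.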
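The first preprocessing step can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the patented implementation:

- The stop-word list is a tiny hypothetical sample.
- Stop words are removed with a set lookup, whereas the patent matches a stop-word set with regular expressions.
- `toy_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.
- Part-of-speech tagging and relation triple extraction are omitted, because they require a trained tagger and parser.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would load a full list.
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "for", "and", "of", "that"}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' plus any whitespace."""
    parts = re.split(r"(?<=[.!?])\s*", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    """Lower-cased word tokens; hyphenated words such as 'part-time' stay whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())


def toy_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[list[str]]:
    """Sentence split -> tokenize -> drop stop words (set lookup) -> stem."""
    return [
        [toy_stem(tok) for tok in tokenize(sent) if tok not in STOP_WORDS]
        for sent in split_sentences(text)
    ]


print(preprocess("College students need jobs. It is important!"))
# [['college', 'student', 'need', 'job'], ['important']]
```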
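The hierarchical topic tree hybrid semantic space analysis module works in three steps:

1. Input the preprocessing results for the title and full text. Using the relation triple hierarchical topic tree model, cluster by topic the relation triples of the essay's title, full text, paragraphs, and sentences separately.
2. Map the topic clusters into a distributed semantic space. This produces four kinds of topic relation triple distributed vectors: title, full text, paragraph, and sentence.
3. Match these vectors against semantic concepts in an English knowledge base and extract adjacent relation triples. Iteratively determine the optimal candidate topic relation triple sets for the title, full text, paragraphs, and sentences, and use them to expand the corresponding topic relation triple distributed vectors.

The sentence-level hierarchical topic coherence analysis module works in five steps:

1. Input the title, full-text, paragraph, and sentence topic relation triple distributed vectors. Compute the title-sentence and paragraph-sentence hierarchical topic coherence semantic similarities.
2. Set a weight for each of these two similarities and combine them into a hierarchical topic coherence semantic similarity for each sentence. From the sentence similarities, compute the essay's hierarchical topic coherence semantic similarity score.
3. Compute the hierarchical topic coherence values at three levels: sentence-paragraph, paragraph-paragraph, and paragraph-full-text.
4. Rank the sentence-paragraph coherence values and apply a hierarchical topic coherence threshold to extract the topic-incoherent sentences of the essay.
5. From the sentence-paragraph, paragraph-paragraph, and paragraph-full-text coherence values, compute the essay's mean hierarchical topic coherence score.

The sentence-level hierarchical topic coherence output module works in two steps:

1. Input the essay's hierarchical topic coherence semantic similarity score and mean hierarchical topic coherence score from the analysis module.
2. From these two values, compute the essay's topic coherence score and generate a topic coherence comment for the essay.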
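The concepts and structures used in the invention are defined as follows:
1. Part-of-speech tagging of essay words
Part-of-speech tagging assigns a part of speech to each word of an essay sentence, in the following format:
sentence word 1/POS,
sentence word 2/POS,
sentence word 3/POS,
…
sentence word n/POS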
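The `word/POS` format above can be written and read back with a couple of helper functions. The names `format_pos` and `parse_pos` are hypothetical illustrations. The tags follow the Penn Treebank style that appears in the worked example (IN, PRP, VBZ, …). The sketch assumes no token itself contains a comma, since the format uses commas as separators.

```python
def format_pos(tagged: list[tuple[str, str]]) -> str:
    """Render (word, tag) pairs as comma-separated 'word/TAG' items."""
    return ",".join(f"{word}/{tag}" for word, tag in tagged)


def parse_pos(line: str) -> list[tuple[str, str]]:
    """Inverse of format_pos; rsplit keeps any '/' inside a word attached to the word."""
    pairs = []
    for item in line.split(","):
        if item:
            word, tag = item.rsplit("/", 1)
            pairs.append((word, tag))
    return pairs


print(parse_pos("Whether/IN,it/PRP,is/VBZ"))
# [('Whether', 'IN'), ('it', 'PRP'), ('is', 'VBZ')]
```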
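2. Essay segmentation
Essay segmentation splits the essay into paragraphs and sentences. The paragraph segmentation format is:
[paragraph 1]-paragraph marker 1,[paragraph 2]-paragraph marker 2,…[paragraph m]-paragraph marker k
The sentence segmentation format is:
[[sentence 1]-sentence marker 1,[sentence 2]-sentence marker 2,…[sentence n]-sentence marker p]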
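The two segmentation formats can be produced as follows. This is a hypothetical sketch that assumes each marker is simply the 1-based position of the segment, as in the worked example later in this document (`]-1`, `]-2`, …).

```python
def mark_paragraphs(paragraphs: list[str]) -> str:
    """Paragraph format: [text]-k with a 1-based marker k."""
    return ",".join(f"[{p}]-{k}" for k, p in enumerate(paragraphs, start=1))


def mark_sentences(sentences: list[list[str]]) -> str:
    """Sentence format: comma-joined tokens in brackets, the whole list bracketed."""
    inner = ",".join(
        f"[{','.join(tokens)}]-{k}" for k, tokens in enumerate(sentences, start=1)
    )
    return f"[{inner}]"


print(mark_sentences([["I", "agree", "."], ["ok", "."]]))
# [[I,agree,.]-1,[ok,.]-2]
```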
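3. Structure of relation triples
A relation triple consists of the subject, predicate, and object of a sentence in the essay. Triples are grouped as follows:
title 1[relation triple 1,relation triple 2,…,relation triple n]
paragraph 1[relation triple 1,relation triple 2,…,relation triple i]
paragraph 2[relation triple 1,relation triple 2,…,relation triple j]
…
paragraph n[relation triple 1,relation triple 2,…,relation triple k]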
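A relation triple maps naturally onto a small immutable record. This sketch assumes the `subject_predicate_object` notation used in the worked example, such as `[college student_have_part time job]`. It also assumes that underscores appear only as separators. `RelationTriple` and `parse_triple` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTriple:
    subject: str
    predicate: str
    obj: str


def parse_triple(text: str) -> RelationTriple:
    """Parse '[subject_predicate_object]' into a RelationTriple."""
    parts = text.strip().strip("[]").split("_")
    if len(parts) != 3:
        raise ValueError(f"expected subject_predicate_object, got {text!r}")
    return RelationTriple(*parts)


# Grouping by text unit, mirroring "title 1[...]" above.
title_triples = [
    parse_triple(t) for t in ("[college student_have_part time job]", "[it_be_important]")
]
print(title_triples[0])
# RelationTriple(subject='college student', predicate='have', obj='part time job')
```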
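4. Structure of the relation triple hierarchical topic tree model
The relation triple hierarchical topic tree model holds the topic relation triples of the essay's title, paragraphs, sentences, and full text, structured as follows:
title 1[topic relation triple 1,topic relation triple 2,…,topic relation triple n]
paragraph 1[topic relation triple 1,topic relation triple 2,…,topic relation triple i]
paragraph 2[topic relation triple 1,topic relation triple 2,…,topic relation triple j]
…
paragraph m[topic relation triple 1,topic relation triple 2,…,topic relation triple k]
…
sentence 1[topic relation triple 1,topic relation triple 2,…,topic relation triple l]
sentence 2[topic relation triple 1,topic relation triple 2,…,topic relation triple o]
…
sentence n[topic relation triple 1,topic relation triple 2,…,topic relation triple p]
…
full text[topic relation triple 1,topic relation triple 2,…,topic relation triple r]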
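The four levels of the topic tree can be held in one container. The structure above does not say which paragraph each sentence belongs to, but the paragraph-sentence similarity needs that link. This sketch therefore adds a hypothetical `sentence_paragraph` list mapping each sentence to a 0-based paragraph index; this is an assumption, not part of the patent's structure.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class TopicTree:
    title: list[Triple] = field(default_factory=list)
    paragraphs: list[list[Triple]] = field(default_factory=list)
    sentences: list[list[Triple]] = field(default_factory=list)
    # Assumed extra field: paragraph index of each sentence (0-based).
    sentence_paragraph: list[int] = field(default_factory=list)
    full_text: list[Triple] = field(default_factory=list)

    def sentences_of(self, paragraph: int) -> list[list[Triple]]:
        """Topic triples of every sentence that belongs to the given paragraph."""
        return [
            s for s, p in zip(self.sentences, self.sentence_paragraph) if p == paragraph
        ]


tree = TopicTree(
    sentences=[[("it", "be", "common")], [("we", "earn", "money")]],
    sentence_paragraph=[0, 1],
)
print(tree.sentences_of(1))  # [[('we', 'earn', 'money')]]
```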
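5. Structure of the topic relation triple distributed vector space
The topic relation triple distributed vector space is structured as follows:
topic relation triple 1[300-dimensional vector]
topic relation triple 2[300-dimensional vector]
…
topic relation triple n[300-dimensional vector]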
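The patent does not specify how a topic cluster is mapped to its 300-dimensional vector. One common, simple choice is shown here as an assumption:

- A triple's vector is the mean of the word vectors of its known words.
- A topic cluster's vector is the mean of its triple vectors.

The embedding table is a hypothetical toy with 3 dimensions instead of 300. In practice it would come from a pretrained model.

```python
def triple_vector(triple, embeddings, dim):
    """Mean of the word vectors of all known words in (subject, predicate, object)."""
    words = [w for part in triple for w in part.split()]
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def topic_vector(triples, embeddings, dim=300):
    """Mean of the triple vectors of one topic cluster (assumed aggregation)."""
    vectors = [triple_vector(t, embeddings, dim) for t in triples]
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 3-dimensional toy embeddings.
EMB = {"student": [1.0, 0.0, 0.0], "job": [0.0, 1.0, 0.0]}
print(topic_vector([("student", "have", "job"), ("job", "be", "good")], EMB, dim=3))
# [0.25, 0.75, 0.0]
```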
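6. Structure of the English knowledge base
In the English knowledge base:

- a concept is the meaning of a word in the essay;
- a relation is a subject-predicate relation between words in the essay;
- a weight is the number of times that subject-predicate relation occurs.

The knowledge base is structured as follows:
[concept 1,relation and weight,concept n+1]
[concept 2,relation and weight,concept n+2]
…
[concept n,relation and weight,concept n+m]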
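The knowledge base entries can be indexed by their first concept. This makes it quick to find neighbouring concepts when expanding a topic vector. The sketch covers only the lookup. The patent's iterative selection of the optimal candidate triple set is not reproduced. The entries and `top_k` are hypothetical.

```python
from collections import defaultdict

# (concept, relation, weight, concept); weight = occurrence count of the relation.
ENTRIES = [
    ("student", "do", 5, "job"),
    ("student", "have", 2, "reason"),
    ("student", "earn", 3, "money"),
]


def build_index(entries):
    """Map each head concept to its (relation, weight, tail concept) edges."""
    index = defaultdict(list)
    for head, relation, weight, tail in entries:
        index[head].append((relation, weight, tail))
    return index


def expand(concept, index, top_k=2):
    """Return up to top_k related concepts, highest-weight relations first."""
    edges = sorted(index.get(concept, []), key=lambda edge: -edge[1])
    return [tail for _, _, tail in edges[:top_k]]


INDEX = build_index(ENTRIES)
print(expand("student", INDEX))  # ['job', 'money']
```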
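7. Formula for the title-sentence hierarchical topic coherence semantic similarity
In formula (1), the title topic relation triple distributed vector and the sentence topic relation triple distributed vector are taken over dimensions i through n of the essay's hierarchical topic tree hybrid semantic vector space, where n is the number of dimensions.
8. Formula for the paragraph-sentence hierarchical topic coherence semantic similarity
In formula (2), the paragraph topic relation triple distributed vector and the sentence topic relation triple distributed vector are taken over dimensions i through n of the essay's hierarchical topic tree hybrid semantic vector space, where n is the number of dimensions.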
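Formulas (1) and (2) are not reproduced in this text. Their description is a dimension-by-dimension comparison of two distributed vectors over dimensions i to n. That description is consistent with cosine similarity,

sim(A, B) = Σ A_i·B_i / (√(Σ A_i²) · √(Σ B_i²)),

which this sketch assumes. The patent may use a different measure.

```python
import math
from collections.abc import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Formula (1): title vs. sentence; formula (2): paragraph vs. sentence (assumed cosine).
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0
```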
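9. Formula for the sentence hierarchical topic coherence semantic similarity
sentence hierarchical topic coherence semantic similarity = δ1 × title-sentence hierarchical topic coherence semantic similarity + δ2 × paragraph-sentence hierarchical topic coherence semantic similarity (3)
In formula (3), δ1 and δ2 are the weights of the title-sentence and paragraph-sentence similarities in the sentence similarity, with δ1 + δ2 = 1. The title-sentence similarity is given by formula (1), and the paragraph-sentence similarity by formula (2).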
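Formula (3) is a convex combination of the two similarities. Because δ1 + δ2 = 1, only δ1 needs to be supplied. The patent does not state the weight values, so the default δ1 = 0.5 below is a hypothetical placeholder.

```python
def sentence_similarity(title_sim: float, para_sim: float, delta1: float = 0.5) -> float:
    """Formula (3): delta1 * title-sentence + delta2 * paragraph-sentence, delta2 = 1 - delta1."""
    if not 0.0 <= delta1 <= 1.0:
        raise ValueError("delta1 must lie in [0, 1]")
    delta2 = 1.0 - delta1
    return delta1 * title_sim + delta2 * para_sim


print(round(sentence_similarity(0.8, 0.6), 6))  # 0.7
```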
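10. Formula for the essay's hierarchical topic coherence semantic similarity score
In formula (4), the sentence hierarchical topic coherence semantic similarity is given by formula (3), and n is the total number of sentences in the essay.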
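Formula (4) itself is not reproduced here. Its only parameters are the per-sentence similarities from formula (3) and the sentence count n, which suggests an arithmetic mean over the n sentences. This sketch assumes that mean.

```python
def essay_similarity_score(sentence_sims: list[float]) -> float:
    """Assumed formula (4): mean of the n sentence similarities from formula (3)."""
    if not sentence_sims:
        raise ValueError("the essay must contain at least one sentence")
    return sum(sentence_sims) / len(sentence_sims)


print(essay_similarity_score([0.5, 1.0]))  # 0.75
```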
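11. Formula for the sentence-paragraph hierarchical topic coherence value
In formula (5), n is the number of sentence topic relation triple distributed vectors in the essay, and i refers to the i-th sentence topic relation triple distributed vector.
12. Formula for the paragraph-paragraph hierarchical topic coherence value
In formula (6), n is the number of paragraph topic relation triple distributed vectors in the essay, and j refers to the j-th paragraph topic relation triple distributed vector.
13. Formula for the paragraph-full-text hierarchical topic coherence value
In formula (7), n is the number of paragraph topic relation triple distributed vectors in the essay, and k refers to the k-th paragraph topic relation triple distributed vector.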
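Formulas (5) to (7) are not reproduced in this text. The sketch below is one plausible reading, stated as an assumption, with every value computed as a cosine similarity:

- **Formula (5):** each sentence vector is compared with the vector of its own paragraph.
- **Formula (6):** each paragraph vector is compared with the next paragraph's vector.
- **Formula (7):** each paragraph vector is compared with the full-text vector.

The pairing of adjacent paragraphs in formula (6) is in particular a guess.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sentence_paragraph_values(sent_vecs, para_vecs, sent_para):
    """Assumed formula (5): sentence i vs. the paragraph it belongs to."""
    return [cosine(s, para_vecs[p]) for s, p in zip(sent_vecs, sent_para)]


def paragraph_paragraph_values(para_vecs):
    """Assumed formula (6): paragraph j vs. paragraph j + 1."""
    return [cosine(a, b) for a, b in zip(para_vecs, para_vecs[1:])]


def paragraph_fulltext_values(para_vecs, full_vec):
    """Assumed formula (7): paragraph k vs. the full-text vector."""
    return [cosine(p, full_vec) for p in para_vecs]


paras = [[1.0, 0.0], [0.0, 1.0]]
print(sentence_paragraph_values([[1.0, 0.0], [1.0, 0.0]], paras, [0, 1]))  # [1.0, 0.0]
```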
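14. Formula for the essay's mean hierarchical topic coherence score
In formula (8), ε1, ε2, and ε3 are the weights of the sentence-paragraph, paragraph-paragraph, and paragraph-full-text hierarchical topic coherence values in the mean score, with ε1 + ε2 + ε3 = 1. N is the number of sentence topic relation triple distributed vectors in the essay, and M is the number of paragraph topic relation triple distributed vectors. The sentence-paragraph, paragraph-paragraph, and paragraph-full-text values are given by formulas (5), (6), and (7), respectively.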
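Formula (8) is not reproduced here. Given its parameters (three weights summing to 1 and the counts N and M), this sketch assumes the following:

- the sentence-paragraph values are averaged over the N sentences;
- the two paragraph-level value lists are each averaged over their own lengths;
- the three averages are combined with the ε weights.

The ε values used here are hypothetical.

```python
def mean(values: list[float]) -> float:
    """Arithmetic mean; 0.0 for an empty list (e.g. no adjacent paragraph pairs)."""
    return sum(values) / len(values) if values else 0.0


def coherence_mean(sp_values, pp_values, pf_values, eps=(0.4, 0.3, 0.3)) -> float:
    """Assumed formula (8): eps1*mean(5) + eps2*mean(6) + eps3*mean(7)."""
    if abs(sum(eps) - 1.0) > 1e-9:
        raise ValueError("eps1 + eps2 + eps3 must equal 1")
    e1, e2, e3 = eps
    return e1 * mean(sp_values) + e2 * mean(pp_values) + e3 * mean(pf_values)


print(coherence_mean([1.0, 0.0], [0.5], [1.0, 1.0], eps=(0.5, 0.25, 0.25)))  # 0.625
```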
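15. Formula for the essay's topic coherence score
essay topic coherence score = (0.5 × hierarchical topic coherence semantic similarity score + 0.5 × mean hierarchical topic coherence score) × 100 (9)
In formula (9), the hierarchical topic coherence semantic similarity score is given by formula (4), and the mean hierarchical topic coherence score by formula (8).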
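Formula (9) is fully specified in the text: it averages the two essay-level scores with equal weight and rescales the result by 100. If both inputs lie in [0, 1], the score lies in [0, 100]. Cosine-based inputs can in principle be negative, which this direct transcription does not clamp.

```python
def coherence_score(similarity_score: float, coherence_mean: float) -> float:
    """Formula (9): (0.5 * formula (4) + 0.5 * formula (8)) * 100."""
    return (0.5 * similarity_score + 0.5 * coherence_mean) * 100


print(round(coherence_score(0.8, 0.6), 6))  # 70.0
```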
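The processing flows of the four modules are described below: the essay sentence preprocessing module, the hierarchical topic tree hybrid semantic space analysis module, the sentence-level hierarchical topic coherence analysis module, and the sentence-level hierarchical topic coherence output module.
As shown in Figure 2, the essay sentence preprocessing module proceeds as follows:
P201 Start;
P202 Read the title and full text of the essay;
P203 Tokenize and sentence-split the title, and output the title segmentation result;
P204 Tokenize and sentence-split the full text, and output the full-text segmentation result;
P205 Remove stop words from the title by matching it against a stop-word set with regular expressions;
P206 Remove stop words from the full text by matching it against a stop-word set with regular expressions;
P207 Stem the title;
P208 Stem the full text;
P209 Tag the parts of speech in the title, and output the title POS tagging result;
P210 Tag the parts of speech in the full text, and output the full-text POS tagging result;
P211 Extract relation triples from the title, and output the title relation triple result;
P212 Extract relation triples from the full text, and output the full-text relation triple result;
P213 End.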
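As shown in Figure 3, the hierarchical topic tree hybrid semantic space analysis module proceeds as follows:
P301 Start;
P302 Read the preprocessing results for the essay title and full text;
P303 Cluster the title's relation triples by topic using the relation triple hierarchical topic tree model, and output the title topic clustering result;
P304 Cluster the full text's relation triples by topic using the relation triple hierarchical topic tree model, and output the full-text topic clustering result;
P305 Cluster each paragraph's relation triples by topic using the relation triple hierarchical topic tree model, and output the topic clustering result for each paragraph;
P306 Cluster each sentence's relation triples by topic using the relation triple hierarchical topic tree model, and output the topic clustering result for each sentence;
P307 Map the title topic clustering result into the distributed semantic space to generate the title topic relation triple distributed vector;
P308 Map the full-text topic clustering result into the distributed semantic space to generate the full-text topic relation triple distributed vector;
P309 Map each paragraph's topic clustering result into the distributed semantic space to generate the paragraph topic relation triple distributed vectors;
P310 Map each sentence's topic clustering result into the distributed semantic space to generate the sentence topic relation triple distributed vectors;
P311 Expand the title topic relation triple distributed vector by matching it against the knowledge base, and output the expanded vector;
P312 Expand the full-text topic relation triple distributed vector by matching it against the knowledge base, and output the expanded vector;
P313 Expand the paragraph topic relation triple distributed vectors by matching them against the knowledge base, and output the expanded vectors;
P314 Expand the sentence topic relation triple distributed vectors by matching them against the knowledge base, and output the expanded vectors;
P315 End.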
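As shown in Figure 4, the sentence-level hierarchical topic coherence analysis module proceeds as follows:
P401 Start;
P402 Read the topic relation triple distributed vectors of the essay's title, full text, paragraphs, and sentences;
P403 Compute the title-sentence hierarchical topic coherence semantic similarity according to formula (1), and output the similarity between the title and every sentence;
P404 Compute the paragraph-sentence hierarchical topic coherence semantic similarity according to formula (2), and output the similarity between the paragraphs and every sentence;
P405 Compute the sentence hierarchical topic coherence semantic similarity according to formula (3), and output it;
P406 If any sentence or paragraph similarity has not yet been analyzed, go to P403; otherwise go to P407;
P407 Read all sentence hierarchical topic coherence semantic similarities, compute the essay's hierarchical topic coherence semantic similarity score according to formula (4), and output it;
P408 Compute the sentence-paragraph hierarchical topic coherence values according to formula (5), and output all of them;
P409 Apply a hierarchical topic coherence threshold to the sentence-paragraph coherence values to extract the essay's topic-incoherent sentences, and generate the set of topic-incoherent sentences;
P410 Compute the paragraph-paragraph hierarchical topic coherence values according to formula (6), and output all of them;
P411 Compute the paragraph-full-text hierarchical topic coherence values according to formula (7), and output all of them;
P412 If any sentence or paragraph coherence value has not yet been analyzed, go to P407; otherwise go to P413;
P413 Read all sentence-paragraph, paragraph-paragraph, and paragraph-full-text hierarchical topic coherence values, compute the essay's mean hierarchical topic coherence score according to formula (8), and output it;
P414 End.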
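Step P409 ranks the sentence-paragraph coherence values and keeps the sentences that fall below a threshold. The patent gives no threshold value, so the `threshold` argument here is a hypothetical parameter. Sentences are numbered from 1, as in the worked example.

```python
def incoherent_sentences(values: list[float], threshold: float) -> list[tuple[int, float]]:
    """Return (sentence number, value) for values below threshold, lowest first."""
    ranked = sorted(enumerate(values, start=1), key=lambda pair: pair[1])
    return [(number, value) for number, value in ranked if value < threshold]


print(incoherent_sentences([0.9, 0.2, 0.5, 0.1], threshold=0.3))
# [(4, 0.1), (2, 0.2)]
```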
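As shown in Figure 5, the sentence-level hierarchical topic coherence output module proceeds as follows:
P501 Start;
P502 Read the essay's hierarchical topic coherence semantic similarity score from the sentence-level hierarchical topic coherence analysis module;
P503 Read the essay's mean hierarchical topic coherence score from the sentence-level hierarchical topic coherence analysis module;
P504 Compute the essay's topic coherence score according to formula (9), output it, and generate the essay's topic coherence comment;
P505 End.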
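The patent does not say how the comment in P504 is worded. A simple approach maps score bands to canned sentences and names any sentences flagged in P409. The bands (80 and 60) and the wording below are hypothetical.

```python
def coherence_comment(score: float, incoherent_ids=()) -> str:
    """Map a formula (9) score to a comment; bands and wording are hypothetical."""
    if score >= 80:
        comment = "The essay stays closely on its topic."
    elif score >= 60:
        comment = "The essay is mostly on topic, with some digressions."
    else:
        comment = "Several parts of the essay drift away from the topic."
    if incoherent_ids:
        ids = ", ".join(str(i) for i in incoherent_ids)
        comment += f" Check sentence(s): {ids}."
    return comment


print(coherence_comment(55.0, [4, 2]))
# Several parts of the essay drift away from the topic. Check sentence(s): 4, 2.
```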
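Brief Description of the Drawings
Figure 1 is the overall processing flowchart of the method;
Figure 2 is the processing flowchart of the essay sentence preprocessing module;
Figure 3 is the processing flowchart of the hierarchical topic tree hybrid semantic space analysis module;
Figure 4 is the processing flowchart of the sentence-level hierarchical topic coherence analysis module;
Figure 5 is the processing flowchart of the sentence-level hierarchical topic coherence output module.
Detailed Description of the Embodiments
The embodiment of the method consists of the following four steps.
Step 1: Run the essay sentence preprocessing module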
The input in this embodiment is a short English essay titled "Whether it is important for college students to have a part time job." Its content is as follows:
Nowadays, it is a common phenomenon that college students do part-time job, even I have tried hard to get a decent part-time job. I strongly agree and advocate college students to find part-time job. I have the reasons and evidence to support my point.
Finding a part-time job can make college students life more colorful and enrich their life. Moreover, doing part-time job they can gain experience which is the best way to their future career. They can gain the treasure to deal with various problems. Thirdly, it is a great opportunity to develop the ability to widen their work efficiently which will increase the quality of their career and life. Fourthly, part-time job can help us earn money. So we can reduce our parents' burden and buy something we like. For example, I even used some money from my part-time to buy presents to my mother. Last but not least, acquiring the ability to deal with many problems can also benefit them and they can make the best of the knowledge they have learned in order to get comprehensive development.
In a word, getting part-time jobs are our pre-classes of our future career. Sufficient knowledge allow we massively experience is for the mental promise to our life.
(1) Part-of-speech tagging results for the student essay's title and text:
Title POS tagging result:
Whether/IN,it/PRP,is/VBZ,important/JJ,for/IN,college/NN,students/NNS,to/TO,have/VB,a/DT,part/NN,time/NN,job/NN
Essay text POS tagging result:
Nowadays/RB,it/PRP,is/VBZ,a/DT,common/JJ,phenomenon/NN,that/WDT,college/NN,students/NNS,do/VBP,part-time/JJ,job/NN,even/RB,I/PRP,have/VBP,tried/VBN,hard/JJ,to/TO,get/VB,a/DT,decent/JJ,part-time/JJ,job/NN,I/PRP,strongly/RB,agree/VBP,and/CC,advocate/VBP,college/NN,students/NNS,to/TO,find/VB,part-time/JJ,job/NN,I/PRP,have/VBP,the/DT,reasons/NNS,and/CC,evidence/NN,to/TO,support/VB,my/PRP,point/NN.Finding/VBG,a/DT,part-time/JJ,job/NN,can/MD,make/VB,college/NN,students/NNS,life/NN,more/RBR,colorful/JJ,and/CC,enrich/VB,their/PRP,life/NN.Moreover/RB,doing/VBG,part-time/JJ,job/NN,they/PRP,can/MD,gain/VB,experience/NN,which/WDT,is/VBZ,the/DT,best/JJS,way/NN,to/TO,their/PRP,future/JJ,career/NN.They/PRP,can/MD,gain/VB,the/DT,treasure/NN,to/TO,deal/VB,with/IN,various/JJ,problems/NNS.Thirdly/RB,it/PRP,is/VBZ,a/DT,great/JJ,opportunity/NN,to/TO,develop/VB,the/DT,ability/NN,to/TO,widen/VB,their/PRP,work/NN,efficiently/RB,which/WDT,will/MD,increase/VB,the/DT,quality/NN,of/IN,their/PRP,career/NN,and/CC,life/NN,Fourthly/RB,part-time/JJ,job/NN,can/MD,help/VB,us/PRP,earn/VB,money/NN,So/IN,we/PRP,can/MD,reduce/VB,our/PRP,parents/NNS,burden/NN,and/CC,buy/VB,something/NN,we/PRP,like/VBP,For/IN,example/NN,I/PRP,even/RB,used/VBD,some/DT,money/NN,from/IN,my/PRP,part-time/JJ,to/TO,buy/VB,presents/NNS,to/TO,my/PRP,mother/NN.Last/JJ,but/CC,not/RB,least/JJS,acquiring/VBG,the/DT,ability/NN,to/TO,deal/VB,with/IN,many/JJ,problems/NNS,can/MD,also/RB,benefit/VB,them/PRP,and/CC,they/PRP,can/MD,make/VB,the/DT,best/JJS,of/IN,the/DT,knowledge/NN,they/PRP,have/VBP,learned/VBN,in/IN,order/NN,to/TO,get/VB,comprehensive/JJ,development/NN,In/IN,a/DT,word/NN,getting/VBG,part-time/JJ,jobs/NNS,are/VBP,our/PRP,pre-classes/NNS,of/IN,our/PRP,future/JJ,career/NN,Sufficient/JJ,knowledge/NN,allow/VBP,we/PRP,massively/RB,experience/NN,is/VBZ,for/IN,the/DT,mental/JJ,promise/NN,to/TO,our/PRP,life/NN.
(2) Segmentation results for the essay title and text:
Paragraph segmentation result:
[Nowadays, it is a common phenomenon that college students do part-time job, even I have tried hard to get a decent part-time job. I strongly agree and advocate college students to find part-time job. I have the reasons and evidence to support my point.]-1
[Finding a part-time job can make college students life more colorful and enrich their life. Moreover, doing part-time job they can gain experience which is the best way to their future career. They can gain the treasure to deal with various problems. Thirdly, it is a great opportunity to develop the ability to widen their work efficiently which will increase the quality of their career and life. Fourthly, part-time job can help us earn money. So we can reduce our parents' burden and buy something we like. For example, I even used some money from my part-time to buy presents to my mother. Last but not least, acquiring the ability to deal with many problems can also benefit them and they can make the best of the knowledge they have learned in order to get comprehensive development.]-2
[In a word, getting part-time jobs are our pre-classes of our future career. Sufficient knowledge allow we massively experience is for the mental promise to our life.]-3
Sentence segmentation result:
[[nowadays,,,it,be,a,common,phenomenon,that,college,student,do,part-time,job,,,even,I,have,try,hard,to,get,a,decent,part-time,job,.]-1,[I,strongly,agree,and,advocate,college,student,to,find,part-time,job,.]-2,[I,have,the,reason,and,evidence,to,support,my,point,.]-3,[find,a,part-time,job,can,make,college,student,life,more,colorful,and,enrich,they,life,.]-4,[moreover,,,do,part-time,job,they,can,gain,experience,which,be,the,best,way,to,they,future,career,.]-5,[they,can,gain,the,treasure,to,deal,with,various,problem,.]-6,[thirdly,,,it,be,a,great,opportunity,to,develop,the,ability,to,widen,they,work,efficiently,which,will,increase,the,quality,of,they,career,and,life,.]-7,[fourthly,,,part-time,job,can,help,we,earn,money,.]-8,[so,we,can,reduce,we,parent,',burden,and,buy,something,we,like,.]-9,[for,example,,,I,even,use,some,money,from,my,part-time,to,buy,present,to,my,mother,.]-10,[last,but,not,least,,,acquire,the,ability,to,deal,with,many,problem,can,also,benefit,they,and,they,can,make,the,best,of,the,knowledge,they,have,learn,in,order,to,get,comprehensive,development,.]-11,[in,a,word,,,get,part-time,job,be,we,pre-class,of,we,future,career,.]-12,[sufficient,knowledge,allow,we,massively,experience,be,for,the,mental,promise,to,we,life,.]-13]
(3) Relation triple extraction results for the essay title and text:
Title relation triples:
[college student_have_part time job],[it_be_important]
Essay text relation triples:
Paragraph 1
[[college student_do_part-time job],
[college student_do_job],
[it_be_common],
[I_have try_hard],
[I_advocate_college student],
[I_have_evidence]]
Paragraph 2
[[college student_enrich_they life],
[they_can gain_treasure],
[treasure_deal with_various problem],
[treasure_deal with_problem],
[it_be_great],
[we_earn_money],
[we_so can reduce_we parent'burden],
[we_can reduce_we parent'burden],
[I_use from_my part-time],
[I_even use money from_my part-time],
[I_use for_example],
[I_even use_money],
[I_even use from_my part-time],
[I_use_money],
[I_use money for_example],
[I_buy present to_my mother],
[I_use money from_my part-time],
[I_even use money for_example],
[I_buy_present],
[I_even use for_example],
[they_get_development],
[they_get_comprehensive development],
[they_can make_best]]
Paragraph 3
[[we pre-class_be in_word],
[we_be for_promise],
[we_massively experience for_promise to we life],
[we_experience for_promise],
[we_be for_promise to we life],
[we_massively experience for_promise],
[we_experience for_mental promise],
[we_be for_mental promise to we life],
[knowledge_allow_we massively experience],
[we_massively experience for_mental promise to we life],
[we_experience for_mental promise to we life],
[sufficient knowledge_allow_we massively experience],
[we_be for_mental promise],
[we_massively experience for_mental promise],
[knowledge_allow_we experience],
[we_experience for_promise to we life],
[sufficient knowledge_allow_we experience]]
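Step 2: Run the hierarchical topic tree hybrid semantic space analysis module
This module takes the outputs of the essay sentence preprocessing module in Step 1: the segmentation results for the title and essay, and the relation triple extraction results. It works in three stages:

- It clusters the relation triples of the title, full text, paragraphs, and sentences by topic.
- It maps the clusters into the distributed vector space to generate the title, full-text, paragraph, and sentence topic relation triple distributed vectors.
- It matches semantic concepts in the knowledge base to extract relation triples that expand these vectors.

The hierarchical topic tree hybrid semantic space analysis results of this embodiment are as follows:
300-dimensional title topic relation triple distributed vector: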
[0.79,0.67,-0.19,-0.62,-0.29,0.37,0.98,-0.23,-0.73,0.26,0.08,0.21,-0.06,-0.13,0.43,0.12,0.00,-0.02,0.55,-0.05,0.20,0.39,-0.01,-0.27,0.05,0.06,-0.09,0.49,0.01,0.10,-0.05,-0.15,0.13,0.01,-0.25,-0.03,-0.19,0.04,0.23,0.08,-0.24,0.05,-0.15,-0.04,0.23,-0.29,-0.12,-0.10,0.37,0.09,0.11,-0.26,-0.07,-0.01,0.10,0.51,0.10,-0.13,-0.07,0.25,0.05,-0.09,-0.18,0.11,-0.30,0.23,-0.11,0.20,-0.19,0.15,-0.14,-0.05,-0.28,-0.30,0.51,-0.35,0.07,-0.06,0.07,0.03,-0.26,0.34,-0.15,-0.09,0.04,0.16,0.32,0.05,-0.05,-0.15,0.30,-0.12,0.14,0.02,-0.08,0.35,0.05,0.16,-0.10,0.01,0.25,-0.09,0.32,0.16,-0.17,0.04,-0.09,-0.25,0.13,0.27,-0.14,-0.23,-0.22,0.25,-0.16,-0.17,0.21,0.37,0.22,0.06,0.07,-0.15,0.10,-0.37,0.07,-0.09,0.17,0.11,-0.16,-0.04,-0.04,-0.32,0.24,-0.06,-0.13,-0.38,-0.09,0.16,-0.01,-0.08,0.06,0.22,-0.16,0.02,0.01,0.13,-0.06,-0.14,-0.14,-0.41,0.10,0.16,0.00,0.41,0.36,-0.12,-0.02,-0.08,0.08,0.04,0.03,0.13,-0.12,-0.09,0.01,0.13,-0.05,0.12,0.17,0.09,0.23,-0.07,-0.04,0.03,-0.08,0.10,-0.32,0.05,0.08,0.16,0.05,0.01,-0.11,0.10,0.19,-0.09,0.02,0.19,-0.36,0.08,0.14,0.18,0.11,-0.27,0.14,0.23,0.07,0.09,0.04,0.16,0.02,0.06,0.03,-0.30,-0.09,-0.26,0.03,-0.11,0.09,0.20,0.16,-0.05,0.02,-0.06,-0.15,-0.16,-0.08,0.05,-0.12,-0.03,0.04,-0.13,0.06,0.14,-0.12,0.07,-0.09,-0.05,-0.04,0.01,-0.17,-0.22,-0.09,-0.01,0.06,-0.14,-0.19,0.09,0.01,-0.01,-0.16,-0.14,0.06,-0.06,0.10,0.13,0.19,0.05,-0.08,0.21,0.13,-0.14,-0.07,0.05,0.02,-0.08,-0.06,0.00,-0.20,0.05,0.07,-0.08,-0.01,0.03,-0.10,-0.04,-0.05,0.13,0.09,0.17,0.08,-0.26,-0.18,0.20,0.05,-0.23,-0.20,0.02,0.05,-0.28,0.16,-0.17,0.03,0.17,-0.18,-0.04,0.00,-0.10,-0.00,-0.13,0.15,-0.03,0.09,0.01,-0.08,0.09,-0.02,-0.09,0.03,-0.22]
300-dimensional full-text topic relation triple distributed vector:
[0.64,0.52,-0.16,-0.57,-0.29,0.42,0.83,-0.17,-0.66,0.25,-0.03,0.25,-0.01,-0.15,0.34,0.27,-0.09,-0.07,0.44,-0.06,0.20,0.41,-0.05,-0.36,0.05,0.09,0.01,0.49,0.05,0.09,-0.00,-0.12,0.24,-0.14,-0.13,-0.06,-0.23,-0.02,0.14,0.12,-0.29,0.00,-0.06,-0.06,0.16,-0.27,-0.20,-0.20,0.26,0.11,0.12,-0.28,-0.09,-0.06,0.12,0.55,0.01,-0.13,-0.13,0.17,0.14,-0.15,-0.21,0.02,-0.22,0.24,-0.14,0.16,-0.21,0.14,-0.10,-0.07,-0.38,-0.24,0.39,-0.26,0.11,-0.16,0.04,0.03,-0.25,0.30,-0.18,-0.12,0.11,0.13,0.29,0.06,0.01,-0.15,0.30,-0.03,0.16,-0.00,-0.10,0.38,0.16,0.10,-0.07,-0.02,0.20,-0.12,0.36,0.11,-0.17,0.14,-0.10,-0.20,0.14,0.22,-0.17,-0.23,-0.20,0.13,-0.21,-0.17,0.18,0.33,0.15,0.09,-0.01,-0.07,0.03,-0.27,0.08,-0.09,0.12,0.05,-0.14,-0.03,-0.07,-0.27,0.16,-0.04,-0.16,-0.32,-0.03,0.09,0.01,-0.11,-0.01,0.08,-0.16,-0.01,0.01,0.13,-0.13,-0.18,-0.21,-0.36,0.08,0.15,0.05,0.40,0.30,-0.03,0.01,-0.02,0.09,0.10,-0.03,0.15,-0.16,-0.19,0.01,0.10,-0.02,0.15,0.14,-0.01,0.18,-0.12,-0.13,0.02,-0.06,0.09,-0.33,-0.01,0.04,0.14,-0.01,0.05,-0.11,0.02,0.23,-0.15,0.00,0.14,-0.23,0.18,0.17,0.09,0.09,-0.29,0.14,0.30,0.04,0.15,0.04,0.10,-0.02,0.08,0.02,-0.21,-0.17,-0.16,-0.04,-0.09,0.04,0.11,0.13,-0.06,-0.00,0.03,-0.17,-0.07,-0.08,0.04,-0.04,0.00,-0.07,-0.21,0.08,0.10,-0.12,0.03,-0.13,-0.05,-0.05,-0.11,-0.16,-0.21,-0.08,-0.10,-0.02,-0.11,-0.17,0.04,-0.04,-0.07,-0.12,-0.06,0.07,-0.05,0.12,0.08,0.17,0.02,0.03,0.15,0.08,-0.11,-0.14,0.06,0.06,-0.10,-0.06,-0.06,-0.20,-0.05,0.02,-0.06,0.02,0.05,-0.02,-0.09,-0.06,0.04,0.10,0.17,0.08,-0.23,-0.13,0.17,0.01,-0.20,-0.18,0.01,0.11,-0.20,0.20,-0.11,0.13,0.12,-0.11,-0.01,-0.02,-0.10,-0.01,-0.14,0.04,-0.07,0.07,-0.00,-0.07,0.19,0.04,-0.04,0.00,-0.20]
300-dimensional paragraph topic relation triple distributed vectors:
Paragraph 1[[3.53,2.45,-0.32,-2.29,-0.86,0.54,2.72,-0.15,-2.24,0.84,0.33,0.01,-0.25,-0.76,1.22,0.29,-0.25,-0.03,1.47,-0.22,0.98,0.96,0.05,-0.62,0.49,-0.15,-0.30,0.86,-0.18,-0.23,-0.30,-0.12,-0.33,0.29,-0.48,0.39,-0.43,0.14,0.29,0.33,-0.49,-0.11,-0.35,-0.15,0.73,-1.09,-0.57,-0.16,1.03,0.23,0.76,-0.02,-0.40,-0.37,0.36,1.18,-0.13,0.36,-0.30,0.32,-0.04,-0.41,-0.13,0.56,-0.88,0.22,-0.02,0.25,-0.63,0.18,-0.46,-0.39,-0.56,-0.58,1.19,-0.55,-0.02,0.16,0.07,0.02,-0.81,0.94,-0.45,-0.65,0.54,0.46,0.75,-0.28,-0.67,-1.11,0.91,-0.18,-0.04,-0.26,-0.23,0.37,-0.29,-0.02,-0.38,-0.66,-0.02,-0.62,0.12,0.36,-0.47,-0.07,-0.01,-0.24,0.07,0.46,-0.13,-0.42,-0.61,0.34,-0.62,-0.51,0.68,0.65,0.35,0.39,-0.00,-0.23,0.22,-1.08,0.23,0.13,0.16,0.25,-0.44,0.05,0.18,-0.57,-0.01,0.03,-0.83,-0.90,0.26,-0.03,-0.35,0.08,-0.47,0.25,-0.79,-0.13,0.12,0.84,-0.49,-0.62,-0.76,-0.72,0.01,0.24,0.51,0.63,0.70,-0.11,-0.07,-0.15,-0.03,-0.24,0.21,0.25,0.18,-0.44,0.49,0.35,-0.16,0.30,-0.61,0.08,0.47,-0.38,-0.24,-0.29,-0.18,0.22,-1.45,0.05,-0.16,0.84,0.19,-0.04,-0.46,-0.32,0.10,-0.25,-0.24,0.43,-0.14,0.01,0.62,0.51,0.18,-0.48,0.44,0.81,0.17,0.54,-0.28,0.07,0.09,0.44,0.33,-0.29,-0.76,-0.47,0.00,-0.31,0.14,-0.18,-0.10,0.06,-0.45,0.14,-0.12,0.11,-0.25,0.53,-0.31,-0.16,0.05,-1.05,0.24,0.04,-0.27,0.42,-0.38,0.14,-0.23,-0.44,-0.25,-1.11,0.08,-0.52,0.14,0.09,-0.72,-0.27,-0.12,-0.21,-0.26,0.22,0.34,0.03,0.31,-0.07,0.21,-0.18,0.23,0.24,0.46,0.29,-0.80,0.25,0.32,-0.35,-0.37,-0.17,-0.58,-0.13,0.23,-0.53,0.08,0.21,-0.15,-0.27,-0.01,0.19,0.02,0.34,0.23,-0.62,-0.29,0.91,-0.07,-0.60,-0.45,-0.37,0.18,-0.20,0.09,-0.31,0.47,0.05,-0.20,-0.14,-0.15,-0.29,0.06,-0.34,0.06,-0.39,-0.01,0.16,-0.39,0.35,-0.09,0.07,-0.14,-0.23]]
Paragraph 2[[3.26,2.25,-0.03,-1.37,-1.26,-0.17,2.56,-0.22,-1.58,0.79,0.94,-0.18,0.07,-0.27,1.38,0.04,-0.51,-0.45,-0.43,0.36,0.16,-0.25,-0.43,-1.35,0.44,1.05,0.17,-0.33,0.60,-0.42,-0.23,-0.30,-0.20,-0.57,-0.58,-0.86,-0.58,0.08,0.34,0.19,-0.20,0.44,-0.26,-0.05,0.59,-1.43,-0.81,-0.67,0.87,-0.47,0.28,-0.41,-0.66,0.25,0.76,0.84,-0.34,0.48,-0.32,0.22,0.22,0.13,-0.29,0.50,-1.17,0.38,0.18,-0.07,-0.75,0.58,0.16,-0.12,-0.42,-0.07,0.18,-1.22,0.06,0.26,-0.56,0.36,-0.26,0.67,-0.65,-0.25,1.31,0.31,-0.11,0.45,0.17,-1.23,0.59,0.08,0.01,0.05,-0.01,0.21,0.09,0.24,0.33,-0.42,0.01,-0.03,-0.03,0.01,-0.45,-0.00,0.43,0.13,-0.43,0.47,0.31,-1.24,-0.39,0.71,-0.27,-0.52,0.85,0.85,0.43,-0.08,0.10,0.08,0.11,-0.24,0.63,-0.39,0.40,0.03,-0.25,-0.70,-0.13,-0.60,-0.34,-0.34,-0.40,-0.58,0.62,-0.18,-0.29,-0.11,-0.61,0.05,-0.57,0.66,-0.58,0.83,-0.23,0.18,-0.71,-0.20,-0.05,0.41,0.46,0.18,0.41,0.19,-0.12,0.12,0.02,0.24,-0.30,0.25,0.02,0.25,0.32,0.71,-0.43,0.41,-0.14,0.38,0.59,-0.27,0.50,0.28,0.14,-0.19,-1.00,-0.13,0.34,0.40,0.37,0.34,-0.27,-0.22,0.08,-0.57,0.14,-0.03,0.01,0.19,-0.28,0.50,-0.40,-0.39,0.05,0.30,-0.05,0.23,-0.35,0.62,-0.09,0.85,0.27,-0.06,-0.31,-0.02,-0.06,-0.10,0.41,0.06,0.04,0.40,-0.05,0.21,-0.07,-0.06,-0.08,-0.15,-0.34,-0.68,-0.34,-0.68,0.13,0.61,-0.17,0.31,-0.02,0.10,-0.31,-0.44,0.07,-0.51,0.14,-0.48,0.03,-0.23,-0.39,0.26,0.07,0.11,-0.22,-0.18,0.26,-0.45,0.22,-0.09,0.08,-0.33,0.45,0.12,0.10,0.43,-0.22,-0.11,0.05,0.16,-0.64,-0.19,-0.71,0.10,0.42,-0.25,0.35,0.03,-0.53,-0.68,-0.13,-0.09,0.36,0.34,0.34,-0.36,0.08,0.28,0.11,-0.35,-0.35,-0.16,-0.36,-0.06,0.28,-0.35,0.49,0.31,-0.08,0.04,-0.29,-0.11,0.24,0.11,-0.03,-0.08,0.04,-0.23,0.16,0.27,-0.21,-0.19,-0.02,-0.06]]
Paragraph 3[[1.08,0.89,-0.24,-0.47,-0.57,0.31,1.31,-0.16,-0.91,0.55,0.14,0.24,-0.05,-0.17,0.40,0.15,0.04,-0.16,0.19,0.05,0.09,0.09,-0.07,-0.56,-0.12,0.52,-0.23,0.53,0.36,-0.22,0.11,-0.21,-0.26,-0.26,-0.55,-0.14,-0.29,0.29,0.17,0.28,-0.15,0.05,-0.41,-0.09,0.01,-0.18,-0.34,-0.17,0.70,-0.46,0.42,-0.15,-0.44,0.10,0.56,0.77,-0.03,-0.09,0.16,0.18,0.05,0.21,-0.11,0.27,-0.53,0.52,0.14,0.29,-0.08,0.24,0.11,0.29,-0.15,-0.21,0.26,-0.24,-0.03,-0.29,-0.12,0.18,-0.28,0.45,-0.09,-0.04,-0.00,0.06,0.24,0.15,0.14,-0.29,-0.12,0.17,0.37,0.28,-0.10,-0.08,-0.08,0.17,0.22,0.26,-0.11,0.09,0.17,0.14,-0.17,-0.15,0.10,-0.12,-0.17,-0.05,0.10,-0.73,-0.26,-0.01,-0.10,0.08,0.37,0.66,-0.06,-0.15,0.37,-0.15,-0.25,0.24,0.10,-0.17,0.09,0.00,-0.07,-0.42,-0.05,-0.28,-0.40,-0.06,-0.18,-0.70,0.04,-0.21,-0.03,0.12,0.10,-0.21,-0.12,0.11,-0.24,0.42,-0.11,0.05,-0.12,-0.06,0.07,0.05,0.07,0.38,0.45,0.27,0.20,-0.19,0.05,0.08,0.12,-0.14,-0.18,-0.25,0.07,0.29,-0.16,0.18,-0.12,0.04,-0.09,0.05,-0.28,0.11,0.30,-0.15,-0.29,-0.14,0.18,0.02,0.12,0.09,-0.44,0.23,0.15,-0.28,0.17,0.06,-0.04,0.08,0.09,0.34,-0.28,0.14,-0.01,-0.03,0.12,0.16,-0.17,0.00,-0.30,0.24,-0.01,0.14,-0.12,0.10,0.22,-0.13,0.09,0.08,0.44,0.03,0.18,0.30,-0.12,-0.11,-0.15,0.33,-0.18,-0.24,-0.30,0.02,0.21,0.34,-0.05,-0.01,-0.21,0.02,-0.18,-0.44,0.08,0.01,0.12,-0.17,-0.16,-0.01,-0.31,0.08,-0.22,0.07,-0.19,-0.13,0.22,-0.39,0.00,0.02,0.00,-0.09,0.05,-0.09,-0.17,0.10,-0.50,-0.04,0.17,-0.08,0.13,-0.06,-0.35,-0.10,-0.10,0.03,0.10,-0.08,-0.04,-0.17,0.01,0.00,0.09,0.06,0.07,-0.25,0.26,0.22,-0.12,-0.03,0.04,0.17,-0.08,0.32,-0.06,-0.20,-0.11,0.18,-0.36,0.14,0.07,0.13,-0.11,-0.30,-0.19,-0.09,-0.06,-0.07,-0.18,0.01,0.25,-0.21,0.17,-0.25]]
300-dimensional sentence topic relation triple distributed vectors:
Topic relation triple distributed vector of sentence 1:
[2.23,1.73,0.09,-1.11,-0.35,0.17,1.77,-0.34,-1.33,1.08,0.26,0.10,0.27,-0.46,0.98,0.15,0.39,-0.10,0.84,0.49,0.31,0.69,0.12,-0.62,0.44,0.49,-0.40,0.86,-0.07,0.30,0.07,0.05,0.17,0.09,-0.60,0.23,-0.38,0.11,0.63,0.26,-0.35,-0.25,-0.71,-0.10,0.64,-0.60,-0.41,-0.15,0.58,-0.20,0.45,0.07,-0.44,-0.17,0.25,0.52,-0.18,-0.24,-0.22,0.17,-0.21,-0.07,-0.34,0.60,-0.70,0.21,-0.11,-0.09,-0.44,0.23,0.15,-0.04,-0.23,-0.19,0.86,-0.37,0.07,0.14,-0.15,-0.07,-0.45,0.56,-0.63,-0.24,0.27,0.27,0.86,-0.02,-0.34,-0.81,0.64,-0.01,-0.07,-0.09,0.01,0.04,-0.10,0.19,-0.16,-0.26,0.38,-0.16,-0.09,0.06,-0.09,-0.37,-0.04,-0.19,0.11,-0.00,-0.02,-0.22,-0.23,0.52,-0.35,-0.32,0.28,0.41,0.23,0.27,-0.16,-0.09,0.00,-0.37,0.22,0.15,0.24,0.16,-0.16,-0.10,0.32,0.04,0.23,0.25,-0.67,-0.67,0.18,-0.15,-0.18,-0.11,-0.34,0.17,-0.51,0.09,-0.15,0.35,-0.25,-0.37,-0.24,-0.33,0.07,0.10,0.23,0.36,0.38,-0.03,-0.13,-0.26,0.06,0.04,-0.21,-0.13,0.03,-0.34,0.42,0.34,-0.07,0.19,-0.25,-0.01,0.21,-0.29,-0.16,-0.05,0.08,0.08,-0.88,0.26,-0.08,0.49,0.21,0.25,-0.35,-0.17,-0.09,-0.26,0.04,0.22,-0.16,0.09,0.25,0.35,0.08,-0.45,0.09,0.59,0.05,0.31,-0.01,-0.00,-0.05,0.51,0.21,-0.15,-0.56,-0.43,-0.11,-0.01,0.12,0.09,-0.15,-0.13,-0.04,-0.03,0.07,0.19,-0.05,0.21,-0.16,-0.20,-0.03,-0.53,0.22,0.33,-0.13,0.03,-0.19,-0.09,-0.13,-0.18,-0.00,-0.62,0.01,-0.58,0.04,0.01,-0.41,-0.03,-0.32,-0.16,-0.26,0.02,0.35,0.02,0.41,-0.06,-0.14,-0.05,-0.03,0.11,0.06,0.33,-0.45,-0.03,0.10,0.07,-0.22,-0.16,-0.22,-0.01,0.07,-0.18,0.10,0.10,-0.05,-0.15,0.02,0.24,0.02,0.43,0.14,-0.14,-0.21,0.52,-0.02,-0.16,-0.09,-0.22,0.12,0.09,-0.06,-0.36,0.28,0.20,-0.20,-0.07,-0.13,-0.27,0.13,-0.24,0.14,-0.18,-0.15,0.03,-0.15,0.25,-0.26,0.06,-0.10,-0.19]
Topic relation triple distributed vector of sentence 2:
[1.12,0.91,-0.30,-0.67,-0.27,0.16,1.14,-0.09,-0.73,0.19,0.17,-0.04,-0.27,-0.15,0.47,0.13,-0.43,-0.07,0.52,-0.03,0.41,0.57,-0.13,-0.13,0.19,-0.29,-0.02,0.37,-0.10,-0.04,0.04,-0.10,-0.12,0.23,-0.09,0.21,-0.13,-0.03,0.11,0.02,-0.33,-0.04,-0.00,-0.14,0.13,-0.43,-0.12,-0.05,0.23,0.12,0.27,-0.10,-0.09,-0.17,0.01,0.40,0.14,0.29,0.14,0.16,-0.01,-0.24,-0.08,0.12,-0.34,0.20,-0.16,0.24,-0.16,0.06,-0.23,-0.10,-0.18,-0.21,0.54,-0.11,0.10,0.08,0.12,0.09,-0.31,0.36,-0.03,-0.24,0.24,0.11,0.06,-0.02,-0.22,-0.14,0.17,-0.15,-0.05,-0.05,-0.21,0.23,-0.11,0.09,-0.23,-0.14,-0.10,-0.26,0.11,0.27,-0.02,0.06,-0.06,-0.21,0.07,0.08,-0.05,0.10,-0.11,0.03,-0.21,-0.18,0.34,0.30,0.01,0.26,0.12,-0.12,0.06,-0.53,0.02,0.02,0.02,-0.00,-0.25,0.03,0.12,-0.18,0.06,-0.15,-0.31,-0.35,-0.03,0.06,-0.20,0.07,-0.18,0.20,-0.13,-0.13,0.24,0.26,-0.17,-0.33,-0.38,-0.14,0.02,-0.08,0.04,0.23,0.21,0.04,0.05,-0.10,-0.10,-0.22,0.04,0.09,0.19,-0.32,0.13,0.16,-0.04,0.20,-0.13,-0.06,0.30,0.00,-0.14,-0.07,-0.19,0.11,-0.37,-0.05,0.06,0.27,0.04,-0.10,-0.07,-0.10,0.14,-0.11,-0.20,0.19,-0.02,-0.04,0.27,0.14,0.19,-0.04,0.31,0.31,0.28,0.23,-0.02,-0.12,-0.08,0.17,-0.05,-0.16,-0.26,-0.09,-0.03,-0.21,0.03,0.07,0.06,0.03,-0.16,0.06,-0.01,0.11,-0.13,0.13,-0.06,-0.06,0.06,-0.37,0.07,-0.08,-0.08,0.12,-0.09,-0.02,-0.19,-0.10,-0.06,-0.37,-0.03,-0.18,-0.03,-0.01,-0.26,-0.16,-0.04,-0.06,-0.04,0.16,0.09,0.05,0.05,-0.13,0.18,0.02,0.01,0.17,0.14,0.03,-0.15,0.04,0.21,-0.08,0.02,-0.02,-0.31,-0.01,-0.04,-0.24,-0.03,0.01,-0.03,-0.19,0.01,-0.10,-0.15,0.09,0.03,-0.21,-0.00,0.15,-0.17,-0.27,-0.21,-0.10,0.00,-0.06,0.12,-0.07,0.15,-0.02,-0.10,-0.06,-0.05,-0.24,0.06,-0.19,0.04,-0.06,0.05,0.06,-0.11,0.23,0.10,0.17,-0.05,-0.17]
Topic relation triple distributed vector of sentence 3:
[0.22,0.14,0.02,-0.28,-0.01,0.06,0.14,-0.26,-0.14,-0.14,0.11,-0.23,-0.06,-0.15,-0.18,0.10,0.03,0.11,0.00,0.11,0.13,-0.07,-0.16,0.07,0.18,0.32,-0.14,-0.29,-0.03,0.02,0.18,0.05,0.05,-0.05,0.05,0.08,-0.05,-0.10,0.00,-0.05,-0.05,-0.20,0.08,0.15,-0.06,0.07,-0.05,-0.02,-0.09,-0.01,-0.12,-0.11,-0.01,-0.02,0.05,0.05,-0.10,0.06,0.04,0.03,-0.13,0.02,-0.05,0.00,-0.04,0.03,0.14,-0.06,0.02,-0.02,0.00,0.09,0.16,-0.07,0.01,-0.04,0.13,-0.01,-0.13,0.00,0.02,-0.15,0.04,-0.08,0.06,-0.07,0.06,0.12,-0.00,0.05,-0.12,-0.01,-0.09,-0.03,-0.15,-0.11,-0.09,0.11,-0.10,-0.14,-0.06,0.11,-0.01,-0.07,0.05,0.07,0.20,0.02,-0.08,-0.04,0.16,-0.02,-0.04,-0.05,-0.02,-0.00,0.01,0.03,-0.05,0.03,0.05,0.17,-0.04,-0.22,0.15,0.08,0.08,-0.05,-0.11,0.09,0.06,-0.10,0.05,0.01,-0.01,-0.08,0.16,0.13,-0.01,0.07,-0.12,0.05,-0.10,0.09,-0.04,-0.03,0.01,-0.01,-0.01,-0.01,-0.01,-0.03,0.02,0.02,-0.05,-0.04,0.08,0.04,-0.22,0.09,-0.09,-0.09,0.06,-0.04,0.12,-0.02,0.03,0.06,-0.02,0.04,0.02,0.07,-0.15,-0.09,-0.02,0.02,0.02,-0.06,-0.07,-0.08,-0.03,0.06,0.03,0.05,-0.12,-0.05,0.07,0.10,-0.10,0.04,0.09,-0.04,-0.07,-0.08,0.03,-0.04,0.01,0.02,0.09,-0.07,-0.06,-0.00,0.03,0.02,0.09,0.05,0.00,-0.04,0.07,0.02,-0.06,-0.02,-0.08,0.04,0.07,0.02,0.13,0.02,-0.13,0.03,0.05,0.01,-0.00,-0.10,-0.02,0.10,0.07,0.03,0.06,0.02,-0.07,0.00,0.01,0.02,-0.02,-0.07,0.00,-0.03,-0.04,-0.01,-0.01,0.07,-0.07,-0.04,-0.09,0.05,0.02,-0.08,-0.09,0.02,-0.00,-0.01,-0.12,-0.03,0.04,0.01,0.01,0.06,-0.02,-0.07,0.08,0.05,-0.02,0.03,-0.03,0.01,-0.09,-0.01,-0.05,0.05,-0.03,-0.07,-0.16,0.07,-0.06,0.01,0.07,0.04,-0.02,0.09,-0.11,-0.05,0.06,-0.05,-0.04,0.03,-0.01,-0.09,-0.02,0.00,0.06,0.02,-0.00,-0.03,0.04,-0.06,-0.06,0.07,0.02,0.13]
……
Topic relation triple distributed vector of sentence 12:
[1.20,1.01,-0.28,-0.63,-0.44,0.45,1.35,-0.36,-0.98,0.53,0.20,0.11,-0.14,-0.25,0.19,0.09,0.21,-0.10,0.33,-0.02,0.17,0.08,-0.07,-0.38,-0.08,0.68,-0.20,0.45,0.37,-0.22,0.18,-0.27,-0.35,-0.37,-0.57,-0.18,-0.32,0.27,0.16,0.27,-0.14,-0.01,-0.42,-0.02,-0.03,-0.15,-0.34,-0.29,0.62,-0.44,0.35,-0.19,-0.48,0.01,0.72,0.71,-0.06,0.00,0.11,0.27,-0.03,0.25,-0.13,0.24,-0.57,0.51,0.18,0.24,-0.05,0.17,0.14,0.41,-0.11,-0.19,0.22,-0.29,0.06,-0.31,-0.12,0.13,-0.22,0.41,-0.14,-0.04,0.02,0.08,0.25,0.16,0.11,-0.27,-0.23,0.02,0.35,0.25,-0.17,-0.06,-0.15,0.11,0.26,0.19,-0.04,0.07,0.14,0.18,-0.28,-0.16,0.11,-0.21,-0.22,0.03,0.07,-0.75,-0.34,0.08,-0.13,0.13,0.34,0.73,-0.14,-0.20,0.41,-0.10,-0.15,0.16,0.17,-0.21,0.12,-0.01,-0.14,-0.42,-0.13,-0.24,-0.37,-0.09,-0.20,-0.72,0.12,-0.21,-0.03,0.15,0.07,-0.16,-0.13,0.12,-0.25,0.46,-0.11,-0.02,-0.09,-0.06,0.08,0.04,0.15,0.36,0.43,0.24,0.20,-0.21,0.02,0.05,0.10,-0.16,-0.11,-0.19,0.14,0.21,-0.18,0.21,-0.19,0.07,-0.09,0.09,-0.25,0.12,0.20,-0.19,-0.31,-0.18,0.16,-0.07,0.09,0.03,-0.57,0.22,0.20,-0.33,0.13,-0.00,-0.04,0.04,0.08,0.33,-0.20,0.12,-0.00,0.02,0.13,0.13,-0.16,-0.03,-0.33,0.25,-0.02,0.16,-0.07,0.09,0.25,-0.15,0.06,0.09,0.45,-0.04,0.26,0.31,-0.13,-0.16,-0.15,0.29,-0.20,-0.28,-0.30,0.02,0.28,0.37,0.02,0.05,-0.16,0.01,-0.14,-0.46,0.08,0.04,0.11,-0.20,-0.11,0.02,-0.26,0.09,-0.22,0.05,-0.17,-0.12,0.18,-0.36,-0.01,0.04,0.01,-0.12,0.04,-0.11,-0.17,0.04,-0.52,-0.05,0.14,-0.12,0.11,-0.02,-0.42,-0.07,-0.14,0.03,0.12,-0.01,-0.05,-0.17,0.06,-0.08,0.05,0.03,0.10,-0.28,0.22,0.20,-0.23,-0.11,0.05,0.19,0.01,0.29,-0.13,-0.28,-0.07,0.19,-0.29,0.13,0.05,0.13,-0.07,-0.35,-0.20,-0.17,-0.04,-0.11,-0.16,-0.05,0.23,-0.18,0.22,-0.22]
Topic relation-triple distributed vector of sentence 13 of the English short essay:
[1.50,0.90,-0.26,-0.90,0.21,-0.16,1.17,0.12,-0.40,0.57,0.51,0.12,0.11,-0.01,0.68,-0.10,-0.40,-0.04,-0.05,0.15,-0.28,-0.35,-0.26,0.10,0.46,0.12,-0.13,-0.37,-0.26,-0.17,-0.03,-0.06,-0.34,-0.48,-0.59,0.18,-0.03,-0.05,0.15,0.31,0.57,0.17,0.21,0.04,0.05,-0.17,-0.04,0.16,0.61,0.18,0.14,0.02,-0.34,-0.06,0.36,-0.05,0.01,-0.08,0.20,0.04,0.23,-0.07,-0.03,0.42,-0.23,0.02,0.31,0.14,-0.08,0.09,0.12,0.33,0.31,0.05,0.25,-0.28,0.00,0.20,-0.13,0.59,-0.07,0.22,-0.30,-0.39,0.10,0.01,-0.09,0.35,0.06,-0.14,-0.36,-0.00,0.15,0.15,-0.20,0.14,0.05,-0.09,0.05,-0.14,-0.12,0.00,0.12,0.16,-0.23,0.34,0.21,0.43,-0.26,-0.06,0.26,-0.65,0.02,0.09,0.05,0.29,-0.06,0.50,0.12,0.01,0.35,-0.19,-0.08,0.14,0.21,0.02,-0.15,0.03,0.08,-0.42,-0.04,0.02,-0.22,-0.11,0.13,-0.32,0.06,0.19,-0.00,0.25,-0.03,-0.10,0.07,-0.22,-0.05,0.26,0.06,-0.01,-0.28,0.03,-0.17,-0.17,0.05,0.28,0.03,0.18,0.46,-0.14,-0.21,0.22,-0.00,0.01,0.14,0.09,0.07,0.15,-0.37,-0.01,-0.20,0.04,0.34,0.07,0.05,-0.11,0.01,0.31,-0.06,0.18,0.44,0.05,0.01,0.00,0.03,0.16,-0.07,0.00,0.17,-0.19,0.11,0.01,-0.28,0.10,-0.13,0.05,-0.38,-0.15,0.08,0.11,-0.02,-0.31,-0.22,0.24,-0.01,0.17,0.25,0.19,0.20,0.21,-0.04,-0.16,0.16,0.28,0.36,0.16,0.06,-0.47,-0.16,-0.22,0.25,-0.28,-0.02,0.11,-0.11,-0.02,0.22,0.15,-0.13,0.09,-0.01,-0.19,0.50,0.20,0.18,-0.00,-0.31,-0.12,0.04,0.10,0.02,0.24,-0.04,0.01,-0.01,-0.17,-0.05,0.10,0.15,0.04,0.30,-0.03,0.05,0.07,-0.07,-0.08,0.04,0.06,-0.19,-0.05,-0.26,0.00,0.24,-0.18,0.02,-0.10,-0.30,-0.51,-0.12,-0.03,0.18,-0.35,0.07,-0.10,0.31,0.02,-0.28,-0.30,-0.08,0.02,0.04,0.10,-0.24,0.26,-0.23,0.00,0.02,0.01,-0.15,0.17,0.06,0.25,0.07,0.06,-0.00,0.02,-0.10,0.15,0.09,-0.23,0.16,-0.06]。
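Step 3: execute the "English short essay sentence-level hierarchical topic coherence analysis module"
The English short essay sentence-level hierarchical topic coherence analysis module takes the results output in Step 2 by the English short essay hierarchical topic tree hybrid semantic space analysis module: the title, full-text, paragraph, and sentence topic relation-triple distributed vectors. First, it computes the title-sentence hierarchical topic coherence semantic similarity and the paragraph-sentence hierarchical topic coherence semantic similarity. From these two outputs it computes the final sentence-level hierarchical topic coherence semantic similarity of each sentence, and from all of the sentence-level similarities it computes the essay's hierarchical topic coherence semantic similarity score value. Then it computes the sentence-paragraph, paragraph-paragraph, and paragraph-full-text hierarchical topic coherence values, and applies a hierarchical topic coherence threshold to the sentence-paragraph values to extract the topically incoherent sentences in the essay. Finally, from the sentence-paragraph, paragraph-paragraph, and paragraph-full-text coherence values it computes the essay's hierarchical topic coherence mean score. The sentence-level hierarchical topic coherence semantic similarities, sentence-level hierarchical topic coherence values, hierarchical topic coherence semantic similarity score value, and hierarchical topic coherence mean score for this embodiment are as follows: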
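Sentence-level hierarchical topic coherence semantic similarities of the English short essay:
Paragraph 1, sentence 1 hierarchical topic coherence semantic similarity: 0.7786710858345032
Paragraph 1, sentence 2 hierarchical topic coherence semantic similarity: 0.7784239053726196
Paragraph 1, sentence 3 hierarchical topic coherence semantic similarity: 0.2951989471912384
Paragraph 2, sentence 4 hierarchical topic coherence semantic similarity: 0.7096657752990723
Paragraph 2, sentence 5 hierarchical topic coherence semantic similarity: 0.6682606935501099
Paragraph 2, sentence 6 hierarchical topic coherence semantic similarity: 0.23642082512378693
Paragraph 2, sentence 7 hierarchical topic coherence semantic similarity: 0.45290467143058777
Paragraph 2, sentence 8 hierarchical topic coherence semantic similarity: 0.6086656451225281
Paragraph 2, sentence 9 hierarchical topic coherence semantic similarity: 0.35007521510124207
Paragraph 2, sentence 10 hierarchical topic coherence semantic similarity: 0.5189481973648071
Paragraph 2, sentence 11 hierarchical topic coherence semantic similarity: 0.425748735666275
Paragraph 3, sentence 12 hierarchical topic coherence semantic similarity: 0.6587570309638977
Paragraph 3, sentence 13 hierarchical topic coherence semantic similarity: 0.3443957269191742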
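Sentence-level hierarchical topic coherence values of the English short essay:
Paragraph 1, sentence 1 hierarchical topic coherence value: 0.7558869123458862
Paragraph 1, sentence 2 hierarchical topic coherence value: 0.7963798642158508
Paragraph 1, sentence 3 hierarchical topic coherence value: 0.2205403298139572
Paragraph 2, sentence 4 hierarchical topic coherence value: 0.7034226655960083
Paragraph 2, sentence 5 hierarchical topic coherence value: 0.6429765820503235
Paragraph 2, sentence 6 hierarchical topic coherence value: 0.2636907994747162
Paragraph 2, sentence 7 hierarchical topic coherence value: 0.4748536944389343
Paragraph 2, sentence 8 hierarchical topic coherence value: 0.6296454071998596
Paragraph 2, sentence 9 hierarchical topic coherence value: 0.3357328772544861
Paragraph 2, sentence 10 hierarchical topic coherence value: 0.5160608887672424
Paragraph 2, sentence 11 hierarchical topic coherence value: 0.4081385135650635
Paragraph 3, sentence 12 hierarchical topic coherence value: 0.6920690655708313
Paragraph 3, sentence 13 hierarchical topic coherence value: 0.385113388299942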
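The invention sets the threshold for extracting topically incoherent sentences to 0.32: a sentence whose hierarchical topic coherence value with respect to its paragraph is below 0.32 is judged to be topically incoherent. The results are as follows:
Topically incoherent sentences:
Paragraph 1, sentence 3: I have the reasons and evidence to support my point.#
Paragraph 2, sentence 6: They can gain the treasure to deal with various problems.#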
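The threshold rule above can be reproduced directly from the listed sentence-paragraph coherence values. The minimal Python sketch below hard-codes the thirteen values of this embodiment, sorts them in ascending order (claim 1 mentions sorting before thresholding), and keeps those below 0.32. The dictionary layout and the function name `incoherent_sentences` are illustrative choices, not part of the patent.

```python
# Sentence-paragraph hierarchical topic coherence values from this embodiment,
# keyed by (paragraph number, sentence number).
COHERENCE_VALUES = {
    (1, 1): 0.7558869123458862,
    (1, 2): 0.7963798642158508,
    (1, 3): 0.2205403298139572,
    (2, 4): 0.7034226655960083,
    (2, 5): 0.6429765820503235,
    (2, 6): 0.2636907994747162,
    (2, 7): 0.4748536944389343,
    (2, 8): 0.6296454071998596,
    (2, 9): 0.3357328772544861,
    (2, 10): 0.5160608887672424,
    (2, 11): 0.4081385135650635,
    (3, 12): 0.6920690655708313,
    (3, 13): 0.385113388299942,
}

THRESHOLD = 0.32  # threshold stated in the embodiment


def incoherent_sentences(values, threshold=THRESHOLD):
    """Return the keys whose coherence value is below the threshold, lowest first."""
    ranked = sorted(values.items(), key=lambda item: item[1])  # ascending by value
    return [key for key, value in ranked if value < threshold]


for paragraph, sentence in incoherent_sentences(COHERENCE_VALUES):
    print(f"Paragraph {paragraph}, sentence {sentence}")
```

With the embodiment's data this yields paragraph 1, sentence 3 and paragraph 2, sentence 6, the same two sentences reported above; sentence 9 (about 0.336) lies just above the cut.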
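Hierarchical topic coherence semantic similarity score value of the English short essay: 0.73;
Hierarchical topic coherence mean score of the English short essay: 0.70.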
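Step 4: execute the "English short essay sentence-level hierarchical topic coherence analysis output module"
The English short essay sentence-level hierarchical topic coherence analysis output module takes as input the hierarchical topic coherence semantic similarity score value and the topic coherence mean score output by the analysis module in Step 3, computes a weighted combination of the two to obtain the essay's topic coherence score, and outputs a topic coherence analysis comment for the essay.
The sentence-level hierarchical topic coherence analysis result of this embodiment has the following format:
Sentence-level hierarchical topic coherence score of the English short essay: 71.50 points.
Sentence-level hierarchical topic coherence analysis comment for the English short essay:
The content of the essay is basically topically coherent; the expression of topic coherence is average, and some parts are not clear enough.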
Claims (7)
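1. A sentence-level hierarchical topic coherence analysis method for English short essays, characterized in that it comprises an analysis model composed of a sequentially connected English short essay sentence preprocessing module, English short essay hierarchical topic tree hybrid semantic space analysis module, English short essay sentence-level hierarchical topic coherence analysis module, and English short essay sentence-level hierarchical topic coherence analysis output module, and the analysis method comprises the following steps:
(1) The English short essay sentence preprocessing module takes as input the title and full text of the English short essay; performs word and sentence segmentation, stop-word removal, and stemming on the title and on the full text separately; performs part-of-speech tagging and relation-triple extraction on the processed title and full text; and outputs the preprocessing results of the title and full text;
(2) The English short essay hierarchical topic tree hybrid semantic space analysis module takes as input the preprocessing results of the title and full text; uses the constructed relation-triple hierarchical topic tree model to perform topic clustering separately on the relation-triple information of the title, full text, paragraphs, and sentences; maps the topic clusters into a distributed semantic space to generate the title, full-text, paragraph, and sentence topic relation-triple distributed vectors; matches these vectors against semantic concepts in an English knowledge base, extracts adjacent relation triples, and iteratively determines the optimal candidate topic relation-triple sets for the title, full text, paragraphs, and sentences, thereby expanding the title, full-text, paragraph, and sentence topic relation-triple distributed vectors;
(3) The English short essay sentence-level hierarchical topic coherence analysis module takes as input the title, full-text, paragraph, and sentence topic relation-triple distributed vectors; computes the hierarchical topic coherence semantic similarity between the title and the sentences and between the paragraphs and the sentences; sets a weight for the title-sentence similarity and a weight for the paragraph-sentence similarity and combines them into the hierarchical topic coherence semantic similarity of each sentence; computes the essay's hierarchical topic coherence semantic similarity score value from the sentence similarities; computes the sentence-paragraph, paragraph-paragraph, and paragraph-full-text hierarchical topic coherence values; sorts the sentence-paragraph coherence values and applies a hierarchical topic coherence threshold to extract the topically incoherent sentences; and computes the essay's hierarchical topic coherence mean score from the sentence-paragraph, paragraph-paragraph, and paragraph-full-text coherence values;
(4) The English short essay sentence-level hierarchical topic coherence analysis output module takes as input the hierarchical topic coherence semantic similarity score value and the hierarchical topic coherence mean score from the analysis module, computes from them the essay's topic coherence score, and generates a topic coherence analysis comment for the essay.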
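2. The analysis method according to claim 1, characterized in that the English short essay sentence preprocessing module performs the following steps:
P201 Start;
P202 Read the title and full text of the English short essay;
P203 Perform word and sentence segmentation on the title, and output the segmentation result of the title;
P204 Perform word and sentence segmentation on the full text, and output the segmentation result of the full text;
P205 Remove stop words from the title by regular-expression matching against a stop-word set;
P206 Remove stop words from the full text by regular-expression matching against a stop-word set;
P207 Stem the title;
P208 Stem the full text;
P209 Perform part-of-speech tagging on the title, and output the part-of-speech tagging result of the title;
P210 Perform part-of-speech tagging on the full text, and output the part-of-speech tagging result of the full text;
P211 Extract relation triples from the title, and output the relation-triple analysis result of the title;
P212 Extract relation triples from the full text, and output the relation-triple analysis result of the full text;
P213 End.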
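Steps P203 to P208 can be illustrated with the Python standard library alone. The sketch below is an assumption-laden stand-in, not the patent's implementation: the stop-word set is a hypothetical toy list, `crude_stem` is a naive suffix stripper standing in for a real stemmer such as Porter's, and part-of-speech tagging and relation-triple extraction (P209 to P212) are omitted because they need a trained tagger and parser.

```python
import re

# Hypothetical, deliberately tiny stop-word set; a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "my", "have"}
# P205/P206: stop words are matched with a regular expression built from the set.
STOP_RE = re.compile(
    r"^(?:" + "|".join(map(re.escape, sorted(STOP_WORDS))) + r")$", re.IGNORECASE
)


def split_sentences(text):
    """Sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Word segmentation: alphabetic tokens, allowing one internal apostrophe."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)


def remove_stop_words(tokens):
    return [t for t in tokens if not STOP_RE.match(t)]


def crude_stem(token):
    """Toy suffix stripper (hypothetical stand-in for a real stemmer)."""
    token = token.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return one list of stemmed content words per sentence."""
    return [
        [crude_stem(t) for t in remove_stop_words(tokenize(s))]
        for s in split_sentences(text)
    ]


print(preprocess("I have the reasons and evidence to support my point. "
                 "They can gain the treasure."))
```

The same function would be applied once to the title (P203, P205, P207) and once to the full text (P204, P206, P208).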
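3. The analysis method according to claim 1, characterized in that the English short essay hierarchical topic tree hybrid semantic space analysis module performs the following steps:
P301 Start;
P302 Read the preprocessing results of the title and full text of the English short essay;
P303 Perform topic clustering on the relation-triple information of the title based on the relation-triple hierarchical topic tree model, and output the topic clustering result of the title;
P304 Perform topic clustering on the relation-triple information of the full text based on the relation-triple hierarchical topic tree model, and output the topic clustering result of the full text;
P305 Perform topic clustering on the relation-triple information of each paragraph based on the relation-triple hierarchical topic tree model, and output the topic clustering result of each paragraph;
P306 Perform topic clustering on the relation-triple information of each sentence based on the relation-triple hierarchical topic tree model, and output the topic clustering result of each sentence;
P307 Map the topic clustering result of the title into the distributed semantic space to generate the title topic relation-triple distributed vector;
P308 Map the topic clustering result of the full text into the distributed semantic space to generate the full-text topic relation-triple distributed vector;
P309 Map the topic clustering result of each paragraph into the distributed semantic space to generate the paragraph topic relation-triple distributed vectors;
P310 Map the topic clustering result of each sentence into the distributed semantic space to generate the sentence topic relation-triple distributed vectors;
P311 Expand the title topic relation-triple distributed vector by matching against the knowledge base, and output the title topic relation-triple distributed vector;
P312 Expand the full-text topic relation-triple distributed vector by matching against the knowledge base, and output the full-text topic relation-triple distributed vector;
P313 Expand the paragraph topic relation-triple distributed vectors by matching against the knowledge base, and output the paragraph topic relation-triple distributed vectors;
P314 Expand the sentence topic relation-triple distributed vectors by matching against the knowledge base, and output the sentence topic relation-triple distributed vectors;
P315 End.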
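The claim does not say how a topic cluster of relation triples is mapped into the distributed semantic space (P307 to P310). One common, simple choice, assumed here purely for illustration, is to average the word vectors of the words in the triples. The sketch uses hypothetical 3-dimensional toy embeddings and single-word triple slots, and it leaves out the knowledge-base expansion of P311 to P314.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical toy word vectors; real systems use pretrained embeddings
# with far more dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "student": [1.0, 0.0, 0.0],
    "read": [0.0, 1.0, 0.0],
    "book": [0.0, 0.0, 1.0],
}


def triples_to_vector(
    triples: Iterable[Triple], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of every in-vocabulary word in the triples (assumed mapping)."""
    total = [0.0] * dim
    count = 0
    for triple in triples:
        for word in triple:
            vec = embeddings.get(word)
            if vec is None:
                continue  # skip out-of-vocabulary words
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    # An all-zero vector is returned when no word is known.
    return [t / count for t in total] if count else total


print(triples_to_vector([("student", "like", "book")], TOY_EMBEDDINGS, 3))
```

Averaging keeps every level (title, full text, paragraph, sentence) in the same vector space, which is what the later similarity formulas require.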
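4. The analysis method according to claim 1, characterized in that the calculation formulas of the English short essay sentence-level hierarchical topic coherence analysis module are defined as follows:
(1) Formula for the title-sentence hierarchical topic coherence semantic similarity of the English short essay
In formula (1), n denotes the dimensions, from the i-th to the n-th, of the title topic relation-triple distributed vector and the sentence topic relation-triple distributed vector in the hierarchical topic tree hybrid semantic vector space of the English short essay;
(2) Formula for the paragraph-sentence hierarchical topic coherence semantic similarity of the English short essay
In formula (2), n denotes the dimensions, from the i-th to the n-th, of the paragraph topic relation-triple distributed vector and the sentence topic relation-triple distributed vector in the hierarchical topic tree hybrid semantic vector space of the English short essay;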
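The bodies of formulas (1) and (2) are not reproduced in this text. Their definitions (a sum over dimensions 1 to n of two vectors in the same space) are consistent with cosine similarity, which this minimal Python sketch assumes; the patent text here does not confirm it. The zero-vector convention is also an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two n-dimensional vectors (assumed form of (1) and (2))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension n")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector shares no topic with anything
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional vectors standing in for the embodiment's vectors.
title_vec = [0.2, 0.1, -0.3]
paragraph_vec = [0.3, 0.0, -0.1]
sentence_vec = [0.1, 0.2, -0.2]
title_sentence = cosine_similarity(title_vec, sentence_vec)          # role of formula (1)
paragraph_sentence = cosine_similarity(paragraph_vec, sentence_vec)  # role of formula (2)
```

The same function serves both formulas; only the first argument (title vector or paragraph vector) changes.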
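(3) Formula for the sentence-level hierarchical topic coherence semantic similarity of the English short essay
Sentence-level hierarchical topic coherence semantic similarity = δ1 × title-sentence hierarchical topic coherence semantic similarity + δ2 × paragraph-sentence hierarchical topic coherence semantic similarity   (3)
In formula (3), δ1 and δ2 are the weights of the title-sentence and paragraph-sentence hierarchical topic coherence semantic similarities in the sentence-level hierarchical topic coherence semantic similarity, with δ1 + δ2 = 1; the title-sentence similarity is given by formula (1) and the paragraph-sentence similarity by formula (2);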
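Formula (3) is a convex combination of the two similarities. The sketch below enforces δ1 + δ2 = 1 by deriving δ2 from δ1. The default δ1 = 0.5 is a hypothetical value (the embodiment does not state the weights), and restricting δ1 to [0, 1] is likewise an assumption.

```python
def sentence_similarity(title_sim: float, paragraph_sim: float, delta1: float = 0.5) -> float:
    """Formula (3): delta1 * title_sim + delta2 * paragraph_sim, with delta2 = 1 - delta1."""
    if not 0.0 <= delta1 <= 1.0:
        raise ValueError("delta1 must lie in [0, 1] so that both weights are non-negative")
    delta2 = 1.0 - delta1  # guarantees delta1 + delta2 = 1
    return delta1 * title_sim + delta2 * paragraph_sim


print(sentence_similarity(0.8, 0.6))              # equal weights
print(sentence_similarity(0.8, 0.6, delta1=0.7))  # title weighted more heavily
```

Deriving δ2 inside the function removes one way to violate the constraint stated in the claim.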
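(4) Formula for the hierarchical topic coherence semantic similarity score value of the English short essay
In formula (4), the sentence-level hierarchical topic coherence semantic similarity is given by formula (3), and n is the total number of sentences in the English short essay;
(5) Formula for the sentence-paragraph hierarchical topic coherence value of the English short essay
In formula (5), n is the number of sentence topic relation-triple distributed vectors in the English short essay, and i indexes the i-th sentence topic relation-triple distributed vector;
(6) Formula for the paragraph-paragraph hierarchical topic coherence value of the English short essay
In formula (6), n is the number of paragraph topic relation-triple distributed vectors in the English short essay, and j indexes the j-th paragraph topic relation-triple distributed vector;
(7) Formula for the paragraph-full-text hierarchical topic coherence value of the English short essay
In formula (7), n is the number of paragraph topic relation-triple distributed vectors in the English short essay, and k indexes the k-th paragraph topic relation-triple distributed vector;
(8) Formula for the hierarchical topic coherence mean score of the English short essay
In formula (8), ε1, ε2, and ε3 are the weights assigned to the sentence-paragraph, paragraph-paragraph, and paragraph-full-text hierarchical topic coherence values in the hierarchical topic coherence mean score, with ε1 + ε2 + ε3 = 1; N is the number of sentence topic relation-triple distributed vectors and M the number of paragraph topic relation-triple distributed vectors in the English short essay; the sentence-paragraph value is given by formula (5), the paragraph-paragraph value by formula (6), and the paragraph-full-text value by formula (7).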
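The body of formula (8) is not reproduced here either. Its definition names three weights summing to 1 and the counts N and M, which suggests a weighted sum of per-level averages; the sketch below assumes exactly that, and the equal default weights are hypothetical. It should not be read as the patent's exact formula.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def coherence_mean_score(
    sentence_paragraph: Sequence[float],
    paragraph_paragraph: Sequence[float],
    paragraph_fulltext: Sequence[float],
    eps: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Assumed form of formula (8): eps-weighted sum of the mean value at each level."""
    if not math.isclose(sum(eps), 1.0):
        raise ValueError("eps1 + eps2 + eps3 must equal 1")
    e1, e2, e3 = eps
    return (
        e1 * fmean(sentence_paragraph)     # average over the N sentence values (formula 5)
        + e2 * fmean(paragraph_paragraph)  # average of formula (6) values
        + e3 * fmean(paragraph_fulltext)   # average over the M paragraph values (formula 7)
    )


print(coherence_mean_score([0.2, 0.4], [0.6], [0.8], eps=(0.5, 0.25, 0.25)))
```

`statistics.fmean` requires Python 3.8 or later and raises `StatisticsError` on an empty sequence, which flags a missing level instead of silently scoring it as zero.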
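5. The analysis method according to claim 4, characterized in that the English short essay sentence-level hierarchical topic coherence analysis module performs the following steps:
P401 Start;
P402 Read the topic relation-triple distributed vectors of the title, full text, paragraphs, and sentences of the English short essay;
P403 Compute the title-sentence hierarchical topic coherence semantic similarity according to formula (1), and output the similarity between the title and every sentence;
P404 Compute the paragraph-sentence hierarchical topic coherence semantic similarity according to formula (2), and output the similarity between the paragraphs and every sentence;
P405 Compute the sentence-level hierarchical topic coherence semantic similarity according to formula (3), and output it;
P406 Determine whether any sentence or paragraph hierarchical topic coherence semantic similarity remains to be analyzed; if so, go to P403, otherwise go to P407;
P407 Read all sentence-level hierarchical topic coherence semantic similarities, compute the hierarchical topic coherence semantic similarity score value according to formula (4), and output it;
P408 Compute the sentence-paragraph hierarchical topic coherence values according to formula (5), and output all of them;
P409 Based on all sentence-paragraph hierarchical topic coherence values, apply the hierarchical topic coherence threshold to extract the topically incoherent sentences of the essay and generate the set of topically incoherent sentences;
P410 Compute the paragraph-paragraph hierarchical topic coherence values according to formula (6), and output all of them;
P411 Compute the paragraph-full-text hierarchical topic coherence values according to formula (7), and output all of them;
P412 Determine whether any sentence or paragraph hierarchical topic coherence value remains to be analyzed; if so, go to P407, otherwise go to P413;
P413 Read all sentence-paragraph, paragraph-paragraph, and paragraph-full-text hierarchical topic coherence values, compute the hierarchical topic coherence mean score according to formula (8), and output it;
P414 End.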
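6. The analysis method according to claim 5, characterized in that the calculation formula of the English short essay sentence-level hierarchical topic coherence analysis output module is defined as follows:
Topic coherence score of the English short essay = (0.5 × hierarchical topic coherence semantic similarity score value + 0.5 × hierarchical topic coherence mean score) × 100   (9)
In formula (9), the hierarchical topic coherence semantic similarity score value is given by formula (4), and the hierarchical topic coherence mean score is given by formula (8).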
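Formula (9) is fully specified, so it can be checked against the embodiment. The short Python function below (its name is illustrative) applies the equal 0.5/0.5 weighting and the ×100 scaling; with the embodiment's score value 0.73 and mean score 0.70 it reproduces the reported 71.50 points.

```python
def topic_coherence_score(similarity_score: float, coherence_mean: float) -> float:
    """Formula (9): equal weighting of formula (4) and formula (8), scaled to 100."""
    return (0.5 * similarity_score + 0.5 * coherence_mean) * 100


score = topic_coherence_score(0.73, 0.70)
print(f"{score:.2f}")  # prints 71.50, matching the embodiment
```

Formatting to two decimals matters: the raw floating-point result may differ from 71.5 in the last binary digit.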
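7. The analysis method according to claim 6, characterized in that the English short essay sentence-level hierarchical topic coherence analysis output module performs the following steps:
P501 Start;
P502 Read the hierarchical topic coherence semantic similarity score value from the sentence-level hierarchical topic coherence analysis module;
P503 Read the hierarchical topic coherence mean score from the sentence-level hierarchical topic coherence analysis module;
P504 Compute the topic coherence score of the English short essay according to formula (9), output it, and generate the topic coherence comment for the essay;
P505 End.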
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010573975.9A CN111709224B (zh) | 2020-06-22 | 2020-06-22 | A method for analyzing topic coherence at the sentence level in English short essays |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111709224A CN111709224A (zh) | 2020-09-25 |
CN111709224B true CN111709224B (zh) | 2023-04-07 |
Family ID: 72541343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010573975.9A Active CN111709224B (zh) | 2020-06-22 | 2020-06-22 | A method for analyzing topic coherence at the sentence level in English short essays |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111709224B (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5794050A (en) * | 1995-01-04 | 1998-08-11 | Intelligent Text Processing, Inc. | Natural language understanding system |
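CN106776550A (zh) * | 2016-12-06 | 2017-05-31 | Guilin University of Electronic Technology | Method for analyzing the discourse coherence quality of English compositions |
CN107423282A (zh) * | 2017-05-24 | 2017-12-01 | Nanjing University | Method for concurrent extraction of semantically coherent topics and word vectors in text based on hybrid features |
CN110287497A (zh) * | 2019-07-03 | 2019-09-27 | Guilin University of Electronic Technology | Semantic structure coherence analysis method for English text |
CN111104789A (zh) * | 2019-11-22 | 2020-05-05 | Central China Normal University | Text scoring method, apparatus and system |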
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10831793B2 (en) * | 2018-10-23 | 2020-11-10 | International Business Machines Corporation | Learning thematic similarity metric from article text units |
2020-06-22 CN CN202010573975.9A, patent CN111709224B (zh), Active
Non-Patent Citations (5)
Title |
---|
Measuring the coherence of writing using topic-based analysis; Richard Watson Todd; Assessing Writing; 20040819; full text * |
Off-topic English Essay Detection Model Based on Hybrid Semantic Space for Automated English Essay Scoring System; Guimin Huang, Jian Liu, Chunli Fan, Tingting Pan; 2018 2nd International Conference on Electronic Information Technology and Computer Engineering (EITCE 2018); 20181119; full text * |
Unsupervised Learning by Probabilistic Latent Semantic Analysis; Thomas Hofmann; Machine Learning; 20010131; full text * |
Research on multi-document summarization based on the hLDA hierarchical topic model; Liu Hongyan; China Master's Theses Full-text Database, Information Science and Technology Series; 20120815; full text * |
Text coherence analysis based on latent semantic analysis; Tang Shiping et al.; Computer Applications and Software; 20080215 (No. 02); full text * |
Also Published As
Publication number | Publication date |
---|---|
CN111709224A (zh) | 2020-09-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108287858B (zh) | | Semantic extraction method and device for natural language |
CN104464736B (zh) | | Error correction method and device for speech recognition text |
CN107818141B (zh) | | Biomedical event extraction method incorporating structured element recognition |
Gómez-Adorno et al. | | Improving feature representation based on a neural network for author profiling in social media texts |
CN106649783A (zh) | | Synonym mining method and device |
CN105975478A (zh) | | Method and device for detecting the event to which an online article belongs based on word vector analysis |
CN110287497B (zh) | | Semantic structure coherence analysis method for English text |
Riza et al. | | Question generator system of sentence completion in TOEFL using NLP and k-nearest neighbor |
Lee et al. | | Spoken knowledge organization by semantic structuring and a prototype course lecture system for personalized learning |
Petzell et al. | | Grammatical and lexical comparison of the Greater Ruvu Bantu languages |
Tasharofi et al. | | Evaluation of statistical part of speech tagging of Persian text |
CN112765319A (zh) | | Text processing method and apparatus, electronic device, and storage medium |
JP6145059B2 (ja) | | Model learning device, morphological analysis device, and method |
Almeman et al. | | Towards developing a multi-dialect morphological analyser for arabic |
CN111709224B (zh) | | A method for analyzing topic coherence at the sentence level in English short essays |
CN114138969A (zh) | | Text processing method and device |
Mahmoodvand et al. | | Semi-supervised approach for Persian word sense disambiguation |
CN110750632B (zh) | | Improved Chinese ALICE intelligent question-answering method and system |
Elbarougy et al. | | A proposed natural language processing preprocessing procedures for enhancing arabic text summarization |
Khoury | | Microtext normalization using probably-phonetically-similar word discovery |
CN112487806B (zh) | | English text concept understanding method |
Rofiq | | Indonesian news extractive text summarization using latent semantic analysis |
CN113886521A (zh) | | Automatic text relation annotation method based on a similar-vocabulary table |
JP7044245B2 (ja) | | Dialogue system reinforcement device and computer program |
Nurtomo | | Greedy algorithms to optimize a sentence set near-uniformly distributed on syllable units and punctuation marks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20200925 Assignee: Guilin ruiweisaide Technology Co.,Ltd. Assignor: GUILIN University OF ELECTRONIC TECHNOLOGY Contract record no.: X2023980046266 Denomination of invention: A Method for Analyzing Topic Coherence at the Sentence Level in English Short Essays Granted publication date: 20230407 License type: Common License Record date: 20231108 |