CN108509415B - Sentence similarity calculation method based on word order weighting - Google Patents


Info

Publication number
CN108509415B
CN108509415B (application CN201810217211.9A)
Authority
CN
China
Prior art keywords
word
corpus
sentence
sen1
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810217211.9A
Other languages
Chinese (zh)
Other versions
CN108509415A (en)
Inventor
王清琛
沈盛宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Yunwen Network Technology Co ltd
Original Assignee
Nanjing Yunwen Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Yunwen Network Technology Co ltd filed Critical Nanjing Yunwen Network Technology Co ltd
Priority to CN201810217211.9A priority Critical patent/CN108509415B/en
Publication of CN108509415A publication Critical patent/CN108509415A/en
Application granted granted Critical
Publication of CN108509415B publication Critical patent/CN108509415B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking


Abstract

The invention provides a sentence similarity calculation method based on word order weighting. The method comprises the following steps: obtain a training corpus A with entries of the form <Label1_i, Sen1_i> and train word vector models for all words in corpus A; construct a test corpus B with entries of the form <Label2_j, Sen2_j> and obtain, by incremental training, word vector models for all words of the sentences Sen2_j in corpus B; using the word vector models obtained from corpus B, compute the sentence vectors SenVec1_i and SenVec2_j of the sentences Sen1_i and Sen2_j by word order weighting; compute one by one the similarity between a sentence Sen2_j and each sentence Sen1_i, and check whether the Label1_i of the most similar sentence Sen1_i is identical to Label2_j; if so, the result is correct, otherwise store <Sen1_i, Sen2_j> in a training corpus C; further process training corpus C to obtain a new word vector model for the next sentence similarity calculation. These steps improve the accuracy of sentence similarity calculation.

Description

Sentence similarity calculation method based on word order weighting
Technical Field
The invention relates to natural language processing within the field of computer technology, and in particular to a sentence similarity calculation method based on word order weighting.
Background
Sentence similarity calculation is a fundamental problem in natural language processing with wide application across the field. For example, in machine translation, text similarity is used to measure how well words in a text can substitute for one another; in FAQ question-answering systems, similarity drives question retrieval by scoring how well a user's question matches the knowledge in a knowledge base. Similarity calculation has therefore long been an important research topic.
In 2013, in the course of research on statistical language models, Google released Word2vec, an open-source tool for training word vectors. Given a corpus, Word2vec can quickly and effectively express each word as a vector through an optimized training model, providing a new tool for applied research in natural language processing.
Disclosure of Invention
The invention aims to overcome the problems in the prior art by providing a sentence similarity calculation method based on word order weighting.
To achieve the above object, the present invention provides a sentence similarity calculation method based on word order weighting, comprising the following steps:
1) Obtain a corpus A using a web crawler and add classification labels to all sentences in corpus A according to their semantics, yielding entries of the form <Label1_i, Sen1_i>, where Sen1_i is a single sentence in corpus A and Label1_i is its category label; then train word vector models for all words in corpus A with the Word2Vec algorithm;
2) Construct a test corpus B from the corpus A obtained in step 1), with entries of the form <Label2_j, Sen2_j>, where Label2_j is a category of corpus A and Sen2_j belongs to class Label2_j and is semantically similar to the corpus A sentences of that category; then, starting from the word vector model obtained in step 1), obtain word vector models for all words in corpus B by incremental training with the Word2Vec algorithm;
3) Take a pair <Label1_1, Sen1_1> from the corpus A of step 1), segment Sen1_1 into words, and obtain from the word vector model of step 2) the word vector V_1k of each segmented word, where k denotes the word's position in the sentence Sen1_1;
4) From the word vectors V_1k obtained in step 3) and each word's position in Sen1_1, calculate the word order weight value weight of each word, and multiply each word vector V_1k by its word order weight to obtain the weighted word vector V_1k';
5) From the weighted word vectors V_1k' obtained in step 4), compute the sentence vector SenVec1_1 of the sentence Sen1_1;
6) Repeat steps 3) to 5) to compute the sentence vectors SenVec1_i of all sentences in corpus A;
7) Repeat steps 3) to 5) to compute the sentence vectors SenVec2_j of all sentences in corpus B;
8) Using the sentence vectors obtained in steps 6) and 7), select each sentence Sen2_j in test corpus B with its sentence vector SenVec2_j in turn, compute its similarity to each sentence Sen1_i in corpus A, and take the sentence Sen1_i with the highest similarity; compare its Label1_i with Label2_j: if they are identical the result is correct, otherwise store the pair <Sen1_i, Sen2_j> in a training corpus C;
9) Label the training corpus C obtained in step 8) with SemEval-2017 similarity values, train an LSTM regression model to obtain new word vectors, update the word vector model of step 2) with the newly trained word vectors, and then execute step 3) for the next sentence similarity calculation.
Preferably, step 1) further comprises, before adding the classification labels:
removing redundant punctuation and web-page tags in corpus A by regular-expression matching, keeping only single sentences.
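The cleanup step above can be sketched in Python; the concrete regular expressions are assumptions, since the text only states that regular-expression matching removes web-page tags and redundant punctuation:

```python
import re

def clean_sentence(text):
    """Keep a single plain sentence: strip web-page tags and runs of
    redundant punctuation.  The exact patterns are illustrative; the
    patent only names the technique (regular-expression matching)."""
    text = re.sub(r"<[^>]+>", "", text)          # remove web-page tags
    text = re.sub(r"[.!?\u3002\uff01\uff1f]{2,}", "", text)  # drop repeated punctuation
    return text.strip()
```

For example, `clean_sentence("<p>How to apply for a refund???</p>")` yields `"How to apply for a refund"`.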
Preferably, in step 3) the HanLP open-source word segmenter is used to segment Sen1_1.
Preferably, the word order weight value weight in step 4) is calculated by the following formula:
[Formula image: word order weight value weight as a function of the word position k, the weighted start position Loc, and the constant λ]
where k represents the position of the word in the sentence, Loc denotes the weighted start position, and λ is a constant in the range 1 to 3.
Preferably, step 5) is calculated by the following formula:
SenVec_i = (1/n) Σ_{k=1}^{n} V'_ik
where n represents the total number of words in the sentence and V'_ik the weighted word vector of the kth word in the ith sentence.
Preferably, the similarity in step 8) is calculated by the following formula:
[Formula image: similarity between the sentence vectors SenVec1_i and SenVec2_j]
where SenVec1_i denotes the sentence vector of the sentence Sen1_i and SenVec2_j the sentence vector of the sentence Sen2_j.
The word order weighting in this method improves the accuracy of the sentence similarity task; in addition, incrementally training the word vectors in a supervised manner improves accuracy further.
Drawings
Fig. 1 is a flowchart of a method for calculating sentence similarity based on word order weighting according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described below completely with reference to the accompanying drawings. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In conventional sentence similarity methods, the sentence vector weighting schemes are complex and their effectiveness is uneven.
Fig. 1 is a flowchart of a sentence similarity calculation method based on word order weighting according to an embodiment of the present invention.
In step 101, a corpus A of 8,000 sentences is obtained with a web crawler; redundant punctuation and web-page tags are removed by regular-expression matching, and only single sentences are retained. The 8,000 sentences are then classified, giving the corpus A shown in the following table:
Label1_i | Single sentence Sen1_i
Refund shipping costs | How to apply for return freight
Refund application | How to apply for refund
Refund application | I want to refund
Application for change of goods | How to apply for changing goods
... | ...
Word vector models for all words in corpus A are then trained with the Word2Vec algorithm.
In step 102, a test corpus B of 30,000 sentences is constructed, with entries of the form <Label2_j, Sen2_j>, where Label2_j comes from the categories of corpus A and Sen2_j belongs to class Label2_j and is semantically similar to the corpus A sentences of that category. Word vector models for all words in corpus B are then obtained by incremental training with the Word2Vec algorithm, starting from the word vector model of step 101.
The sentence pattern in corpus B is shown in the following table:
Label2_j | Sen2_j
Refund shipping costs | How to withdraw the freight
Refund application | The clothes don't fit, can they be returned?
Application for change of goods | There's a problem with the clothes, please help me exchange them
... | ...
Training yields word vectors in the following format:
Word W | Word vector V
Word_1 | v_11, v_12 ... v_1d
... | ...
Word_n | v_n1, v_n2 ... v_nd
where n represents the total number of words and d the dimensionality of the word vectors.
In step 103, a sentence pair <Label1_1, Sen1_1> is selected from corpus A, in this embodiment the pair <Refund shipping costs, How to apply for return freight>. The sentence Sen1_1, "How to apply for return freight", is segmented into (how, apply, return, freight), and the corresponding word vectors are looked up in the word vector model trained in step 102. The word vectors in this embodiment have 200 dimensions; partial results are:
Word W | Word vector V_1k
How | -0.15166749, 0.10850359 ... -0.097950
Apply | 0.099820456, 0.11322714 ... 0.06855157
Return | 0.04588356, 0.08467035 ... -0.15038626
Freight | -0.010142227, -0.02377942 ... -0.09789387
In step 104, the word order weight of each word is calculated from its position using the following formula:
[Formula image: word order weight calculation]
where k represents the position of the word in the sentence, counting from 1; Loc denotes the weighted start position, usually set between 1 and 3 when the sentence is short (fewer than 6 words); and λ is a constant that can be set between 1 and 3. For the test sentence Sen1_1 "How to apply for return freight" in this embodiment, Loc is set to 1 and λ to 1.5; the word order weights are shown in the following table:
Word W | Word order weight
How | 0.8123090300973813
Apply | 1.0
Return | 1.3004891818915623
Freight | 1.6149794589701247
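The weight column above grows with word position. Since the patent's exact formula survives only as an image, the sketch below is a qualitative illustration rather than the patented formula; the function name and the expression `lam ** ((k - loc) / n)` are assumptions:

```python
def order_weight(k, n, loc=1, lam=1.5):
    """Illustrative word order weight for the word at 1-based position k
    in a sentence of n words.

    NOT the patent's formula (which is unrendered in the source); this
    only reproduces the qualitative behaviour of the worked example:
    weights increase with position, controlled by the start position
    Loc and a constant lambda chosen in the range 1 to 3.
    """
    return lam ** ((k - loc) / n)

# Weights for a 4-word sentence such as (how, apply, return, freight)
weights = [order_weight(k, 4) for k in range(1, 5)]
```

With `lam > 1` the weights increase strictly from position `loc` onward, mirroring the increasing values in the table.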
Multiplying the word vectors obtained in step 103 by the calculated word order weights gives the new weighted word vectors shown in the following table:
Word W | Weighted word vector V_1k'
How | 0.123200872, 0.088138446 ... 0.079565669
Apply | 0.099820456, 0.11322714 ... 0.06855157
Return | 0.059671073, 0.110112874 ... -0.195575704
Freight | -0.016379488, -0.038403275 ... -0.158096589
In step 105, the weighted word vectors V_1k' obtained in step 104 are added and averaged to obtain the sentence vector SenVec1_1 of the sentence "How to apply for return freight". The formula is:
SenVec_i = (1/n) Σ_{k=1}^{n} V'_ik
where n is the total number of words in the sentence and V'_ik is the weighted word vector of the kth word of the ith sentence. This yields:
Sentence Sen1_1 | SenVec1_1
How to apply for return freight | 0.066578228, 0.068268796 ... -0.051388763
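The weighting and averaging that produce such a sentence vector can be illustrated with toy numbers (2 dimensions instead of 200; all values below are made up for illustration, not taken from the embodiment):

```python
import numpy as np

# Toy 2-dimensional stand-ins for the word vectors of (how, apply, return, freight)
word_vectors = np.array([
    [-0.15,  0.10],
    [ 0.10,  0.11],
    [ 0.05,  0.08],
    [-0.01, -0.02],
])
order_weights = np.array([0.81, 1.00, 1.30, 1.61])  # word order weights from step 104

weighted = word_vectors * order_weights[:, None]    # V'_1k = weight_k * V_1k
sentence_vector = weighted.mean(axis=0)             # SenVec1_1 = (1/n) * sum_k V'_1k
```

The result is one vector of the same dimensionality as the word vectors, obtained by the "add and average" rule of step 105.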
In step 106, steps 103 to 105 are repeated to compute the sentence vectors of all 8,000 sentences in corpus A.
In step 107, steps 103 to 105 are repeated to compute the sentence vectors of all 30,000 sentences in corpus B.
In step 108, the sentence vectors of corpus A from step 106 are combined with those of corpus B from step 107. Each sentence Sen2_j in corpus B, with its sentence vector SenVec2_j, is selected in turn, and the similarity between Sen2_j and each sentence Sen1_i in corpus A is computed using the formula:
[Formula image: similarity between SenVec1_i and SenVec2_j]
where SenVec1_i is the sentence vector of a sentence in corpus A and SenVec2_j the sentence vector of the sentence Sen2_j in corpus B.
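Because the similarity formula itself is an image in the source, the sketch below assumes cosine similarity, the standard choice for comparing sentence vectors; treat that choice as an assumption rather than the patent's definition:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two sentence vectors.

    Assumption: the patent's similarity formula is not recoverable from
    the extracted text, so cosine similarity is used as the customary
    measure for sentence vectors.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical directions score 1.0 and orthogonal vectors score 0.0, matching the 0-to-~1 similarity values in the tables below.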
For example, the similarity between the corpus B sentence Sen2_j "How to withdraw the freight" and each sentence Sen1_i in corpus A is computed with the above formula; sorted by word-order-weighted similarity, the results are shown in the following table:
Similar sentence Sen1 | Category Label1 | Similarity (with word order weighting)
How to apply for return freight | Refund shipping costs | 0.9334955
How to apply for refund | Refund application | 0.8528203
How to apply for changing goods | Application for change of goods | 0.7556491
How to purchase freight insurance | Purchasing freight insurance | 0.6946413
In the table above, the most similar sentence is "How to apply for return freight", and its category "Refund shipping costs" matches the category label of "How to withdraw the freight".
Without word order weighting, the most similar sentences are as shown in the following table:
Similar sentence Sen1 | Category Label1 | Similarity (without word order weighting)
How to apply for refund | Refund application | 0.8910549
How to apply for return freight | Refund shipping costs | 0.8341876
How to apply for changing goods | Application for change of goods | 0.7948803
How to purchase freight insurance | Purchasing freight insurance | 0.7148501
Here the most similar sentence is "How to apply for refund", whose category label "Refund application" does not match the category label "Refund shipping costs" of "How to withdraw the freight". This shows that word order weighting is effective for the similarity task.
As another example, for the corpus B pair <How to fix order information filling errors, Telephone number filled in wrong>, the calculation results against the related sentences in corpus A are shown in the following table:
Similar sentence Sen1 | Category Label1 | Similarity (with word order weighting)
Return telephone number wrongly written | Return information filling error | 0.91249734
Information filled in wrong | How to fix order information filling errors | 0.8882772
Order number wrongly written | How to fix order information filling errors | 0.78467226
Address wrongly written | How to fix order information filling errors | 0.7377515
The Label2 of the sentence Sen2 "Telephone number filled in wrong" is "How to fix order information filling errors", which does not match the category Label1 "Return information filling error" of the most similar sentence "Return telephone number wrongly written"; the pair <Return telephone number wrongly written, Telephone number filled in wrong> is therefore stored in corpus C.
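Step 108's match-and-collect procedure can be sketched as follows; the (label, sentence, vector) triples and the helper names are hypothetical, since the patent describes the procedure but not a concrete data layout, and cosine similarity is an assumed choice:

```python
import numpy as np

def cosine(a, b):
    # Assumed similarity measure; the patent's formula is an unrendered image.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_and_collect(b_items, a_items):
    """For each (label, sentence, vector) triple from corpus B, find the
    most similar sentence in corpus A.  When the labels disagree, the
    pair <Sen1_i, Sen2_j> is stored for training corpus C."""
    correct, corpus_c = 0, []
    for label2, sen2, vec2 in b_items:
        best_label, best_sen, _ = max(a_items, key=lambda t: cosine(t[2], vec2))
        if best_label == label2:
            correct += 1
        else:
            corpus_c.append((best_sen, sen2))   # store <Sen1_i, Sen2_j>
    return correct, corpus_c
```

Running this over all of corpus B gives both the accuracy figure reported below and the mismatch pairs that seed corpus C.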
From the obtained similarities, the overall effect on test corpus B is shown in the following table:
[Table image: overall similarity accuracy on test corpus B]
The experiment shows that word order weighting improves the accuracy of the similarity calculation.
In step 109, the corpus C obtained in step 108 is labeled with similarity values following Daniel Cer et al., "SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Cross-lingual Focused Evaluation". The similarity value standard is illustrated in the following table:
Sentence one | Sentence two | Similarity value
He is bathing | He is in the bath | 5
Two people walking on the road | The two people walk hand in hand | 4
He is unsuspecting about her | He has little doubt about her | 3
The two build a nest together | The two walk into the nest together | 2
The girl likes listening to music | He is playing the piano | 1
A dog runs by itself | He is flying at high speed | 0
The pair obtained in step 108, <Return telephone number wrongly written, Telephone number filled in wrong>, can accordingly be labeled as <Return telephone number wrongly written, Telephone number filled in wrong, 4>.
After corpus C has been processed and labeled in this way, new word vectors are obtained by training an LSTM regression model; the word vector model of step 102 is updated with the newly trained word vectors, and step 103 is then executed again, so that the next sentence similarity calculation uses the updated model. The overall effect on test corpus B is shown in the following table:
[Table image: overall similarity accuracy on test corpus B after retraining]
according to the experimental result, the new word vector is trained by the method, and great help is provided for the task of sentence similarity calculation.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A sentence similarity calculation method based on word order weighting is characterized by comprising the following steps:
1) Obtain a corpus A using a web crawler and add classification labels to all sentences in corpus A according to their semantics, yielding entries of the form <Label1_i, Sen1_i>, where Sen1_i is the ith sentence in corpus A and Label1_i is its category label; then train word vector models for all words in corpus A with the Word2Vec algorithm;
2) Construct a test corpus B from the corpus A obtained in step 1), with entries of the form <Label2_j, Sen2_j>, where Label2_j is a category of corpus A and Sen2_j, the jth sentence in corpus B, belongs to class Label2_j and is semantically similar to the corpus A sentences of that category; then, starting from the word vector model obtained in step 1), obtain word vector models for all words in corpus B by incremental training with the Word2Vec algorithm;
3) Take a pair <Label1_1, Sen1_1> from the corpus A of step 1), segment Sen1_1 into words, and obtain from the word vector model of step 2) the word vector V_1k of each segmented word, where k denotes the word's position in the sentence Sen1_1;
4) From the word vectors V_1k obtained in step 3) and each word's position in Sen1_1, calculate the word order weight value weight of each word, and multiply each word vector V_1k by its word order weight to obtain the weighted word vector V_1k';
5) From the weighted word vectors V_1k' obtained in step 4), compute the sentence vector SenVec1_1 of the sentence Sen1_1;
6) Repeat steps 3) to 5) to compute the sentence vectors SenVec1_i of all sentences in corpus A;
7) Repeat steps 3) to 5) to compute the sentence vectors SenVec2_j of all sentences in corpus B;
8) Using the sentence vectors obtained in steps 6) and 7), select each sentence Sen2_j in test corpus B with its sentence vector SenVec2_j in turn, compute its similarity to each sentence Sen1_i in corpus A, and take the sentence Sen1_i with the highest similarity; compare its Label1_i with Label2_j: if they are identical the result is correct, otherwise store the pair <Sen1_i, Sen2_j> in a training corpus C;
9) Label the training corpus C obtained in step 8) with SemEval-2017 similarity values, train an LSTM regression model to obtain new word vectors, update the word vector model of step 2) with the newly trained word vectors, and then execute step 3) for the next sentence similarity calculation.
2. The method for calculating sentence similarity based on word order weighting according to claim 1, wherein step 1) further comprises, before adding the classification labels:
removing redundant punctuation and web-page tags in corpus A by regular-expression matching, keeping only single sentences.
3. The method for calculating sentence similarity based on word order weighting according to claim 1, wherein in step 3) the HanLP open-source word segmenter is used to segment Sen1_1.
4. The method for calculating sentence similarity based on word-order weighting according to claim 1, wherein the word-order weight in step 4) is calculated by the following formula:
[Formula image: word order weight value weight as a function of the word position k, the weighted start position Loc, and the constant λ]
where k represents the position of the word in the sentence; loc denotes the weighted start position, and λ is a constant with a value in the range of 1-3.
5. The method for calculating sentence similarity based on word order weighting according to claim 1, wherein step 5) is calculated by the following formula:
SenVec_i = (1/n) Σ_{k=1}^{n} V'_ik
where n represents the total number of words in the sentence Sen1_i and V'_ik the weighted word vector of its kth word.
6. The method for calculating sentence similarity based on word order weighting according to claim 1, wherein the similarity in step 8) is calculated by the following formula:
[Formula image: similarity between the sentence vectors SenVec1_i and SenVec2_j]
where SenVec1_i denotes the sentence vector of the sentence Sen1_i and SenVec2_j the sentence vector of the sentence Sen2_j.
CN201810217211.9A 2018-03-16 2018-03-16 Sentence similarity calculation method based on word order weighting Active CN108509415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810217211.9A CN108509415B (en) 2018-03-16 2018-03-16 Sentence similarity calculation method based on word order weighting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810217211.9A CN108509415B (en) 2018-03-16 2018-03-16 Sentence similarity calculation method based on word order weighting

Publications (2)

Publication Number Publication Date
CN108509415A CN108509415A (en) 2018-09-07
CN108509415B true CN108509415B (en) 2021-09-24

Family

ID=63376592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810217211.9A Active CN108509415B (en) 2018-03-16 2018-03-16 Sentence similarity calculation method based on word order weighting

Country Status (1)

Country Link
CN (1) CN108509415B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109739956B (en) * 2018-11-08 2020-04-10 第四范式(北京)技术有限公司 Corpus cleaning method, apparatus, device and medium
CN109697286A (en) * 2018-12-18 2019-04-30 众安信息技术服务有限公司 A kind of diagnostic standardization method and device based on term vector
CN109710762B (en) * 2018-12-26 2023-08-01 南京云问网络技术有限公司 Short text clustering method integrating multiple feature weights
CN109766547B (en) * 2018-12-26 2022-10-18 重庆邮电大学 Sentence similarity calculation method
CN109902159A (en) * 2019-01-29 2019-06-18 华融融通(北京)科技有限公司 A kind of intelligent O&M statement similarity matching process based on natural language processing
CN110162627B (en) * 2019-04-28 2022-04-15 平安科技(深圳)有限公司 Data increment method and device, computer equipment and storage medium
CN113204612B (en) * 2021-04-24 2024-05-03 上海赛可出行科技服务有限公司 Priori knowledge-based network about vehicle similar address identification method
CN113535919B (en) * 2021-07-16 2022-11-08 北京元年科技股份有限公司 Data query method and device, computer equipment and storage medium
CN114048285A (en) * 2021-10-22 2022-02-15 盐城金堤科技有限公司 Fuzzy retrieval method, device, terminal and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095188A (en) * 2015-08-14 2015-11-25 北京京东尚科信息技术有限公司 Sentence similarity computing method and device
CN106055673A (en) * 2016-06-06 2016-10-26 中国人民解放军国防科学技术大学 Chinese short-text sentiment classification method based on text characteristic insertion
CN106610950A (en) * 2016-09-29 2017-05-03 四川用联信息技术有限公司 Improved text similarity solution method
CN106844350A (en) * 2017-02-15 2017-06-13 广州索答信息科技有限公司 A kind of computational methods of short text semantic similarity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105824797B (en) * 2015-01-04 2019-11-12 华为技术有限公司 A kind of methods, devices and systems for evaluating semantic similarity

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095188A (en) * 2015-08-14 2015-11-25 北京京东尚科信息技术有限公司 Sentence similarity computing method and device
CN106055673A (en) * 2016-06-06 2016-10-26 中国人民解放军国防科学技术大学 Chinese short-text sentiment classification method based on text characteristic insertion
CN106610950A (en) * 2016-09-29 2017-05-03 四川用联信息技术有限公司 Improved text similarity solution method
CN106844350A (en) * 2017-02-15 2017-06-13 广州索答信息科技有限公司 A kind of computational methods of short text semantic similarity

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Cross-lingual Focused Evaluation"; Daniel Cer et al.; Proceedings of the 11th International Workshop on Semantic Evaluations; 2017-08-04; pp. 419-424 *
"A Study of a Sentence Similarity Algorithm Based on Vector Word Order"; Cheng Zhiqiang et al.; Computer Simulation; July 2014; Vol. 31, No. 7; pp. 1-14 *

Also Published As

Publication number Publication date
CN108509415A (en) 2018-09-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant