CN108509415B - Sentence similarity calculation method based on word order weighting - Google Patents
- Publication number
- CN108509415B CN108509415B CN201810217211.9A CN201810217211A CN108509415B CN 108509415 B CN108509415 B CN 108509415B CN 201810217211 A CN201810217211 A CN 201810217211A CN 108509415 B CN108509415 B CN 108509415B
- Authority
- CN
- China
- Prior art keywords
- word
- corpus
- sentence
- sen1
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a sentence similarity calculation method based on word order weighting. The method comprises the following steps: obtain a training corpus A whose entries have the form <Label1i, Sen1i> and train word vector models for all words in corpus A; construct a test corpus B whose entries have the form <Label2j, Sen2j> and obtain word vector models for all words of the sentences Sen2j in corpus B by incremental training; using the word vector model obtained from corpus B, compute the sentence vectors SenVec1i and SenVec2j of the sentences Sen1i and Sen2j by word order weighting; calculate one by one the similarity between a sentence Sen2j and each sentence Sen1i, and compare the Label1i of the most similar sentence Sen1i with Label2j: if they are identical the result is correct, otherwise store <Sen1i, Sen2j> in a training corpus C; finally, further process training corpus C to obtain a new word vector model for the next round of similarity calculation. These steps improve the accuracy of sentence similarity calculation.
Description
Technical Field
The invention relates to the field of natural language processing within computer technology, and in particular to a method for calculating sentence similarity based on word order weighting.
Background
Sentence similarity calculation is a very important basic problem in natural language processing and has wide application across the field. For example, in machine translation, text similarity is used to measure how well words in a text can substitute for one another; in question-answering (FAQ) systems, similarity is used for question retrieval, computing the degree of match between a user's question and the knowledge in a knowledge base. Similarity calculation has long been an important concern of researchers in related technologies.
In the context of research on statistical language models, Google open-sourced Word2vec, a software tool for training word vectors, in 2013. Given a corpus, Word2vec can quickly and effectively express each word in vector form through an optimized training model, providing a new tool for applied research in natural language processing.
Disclosure of Invention
The invention aims to overcome the problems in the prior art, and provides a method for calculating sentence similarity based on word order weighting.
To achieve the above object, the present invention provides a sentence similarity calculation method based on word order weighting. The method comprises the following steps:
1) obtaining a corpus A by using a web crawler and adding classification labels to all sentences in corpus A according to their semantics, so as to obtain entries of the form <Label1i, Sen1i>, wherein Sen1i is a single sentence in corpus A and Label1i is the category label corresponding to Sen1i; then training word vector models for all words in corpus A by using the Word2Vec algorithm;
2) constructing a test corpus B from the corpus A obtained in step 1), the entries of the test corpus B having the form <Label2j, Sen2j>, wherein Label2j is a category in corpus A and Sen2j belongs to category Label2j and is semantically similar to the sentences of category Label2j in corpus A; then, starting from the word vector model obtained in step 1), obtaining word vector models for all words in corpus B by incremental training with the Word2Vec algorithm;
3) taking a pair <Label11, Sen11> from the corpus A of step 1), performing word segmentation on Sen11, and obtaining from the word vector model of step 2) the word vector V1k corresponding to each segmentation result, wherein k denotes the position of the word in the sentence Sen11;
4) calculating the word order weight of each word according to the word vector V1k of each word in the sentence Sen11 obtained in step 3) and the position of that word in Sen11, and multiplying each word vector V1k by its word order weight to obtain a new weighted word vector V1k';
5) obtaining the sentence vector SenVec11 of the sentence Sen11 from the weighted word vectors V1k' obtained in step 4);
6) repeating steps 3) to 5) to calculate the sentence vectors SenVec1i of all sentences in corpus A;
7) repeating steps 3) to 5) to calculate the sentence vectors SenVec2j of all sentences in corpus B;
8) according to the sentence vectors obtained in steps 6) and 7), selecting in turn each sentence Sen2j in test corpus B together with its sentence vector SenVec2j, calculating its similarity to each sentence Sen1i in corpus A, selecting the sentence Sen1i ranked highest in similarity, and comparing its Label1i with Label2j; if they are identical the result is correct, otherwise storing the pair <Sen1i, Sen2j> in a training corpus C;
9) labeling the training corpus C obtained in step 8) with similarity values following SemEval-2017, training an LSTM regression model to obtain new word vectors, updating the word vector model of step 2) with the newly trained word vectors, and then returning to step 3) for the similarity calculation of the next sentence.
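Steps 4) and 5) together amount to a weighted average of the word vectors; using weight_k for the word order weight of step 4) (whose closed form depends on the position k, the start position Loc, and the constant λ), the construction can be summarized as:

```latex
V'_{1k} = \mathrm{weight}_k \cdot V_{1k},
\qquad
\mathrm{SenVec1}_{i} = \frac{1}{n}\sum_{k=1}^{n} V'_{1k}
```

where n is the number of words in the sentence.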
Preferably, the step 1) further comprises, before adding the classification labels:
removing redundant punctuation and web-page tags from corpus A by regular-expression matching, keeping only single sentences.
Preferably, in step 3) the HanLP open-source word segmenter is used to perform the word segmentation of Sen11.
Preferably, the word order weight in step 4) is calculated from the position k of the word in the sentence together with a weighted start position Loc and a constant λ whose value lies in the range 1-3.
Preferably, the sentence vector in step 5) is obtained by averaging the weighted word vectors:

SenVec1i = (1/n) · Σ (k = 1 … n) V'ik

where n represents the total number of words in the sentence and V'ik represents the weighted word vector of the kth word in the ith sentence.
Preferably, the similarity in step 6) is calculated from the sentence vectors, wherein SenVec1i represents the sentence vector of sentence Sen1i and SenVec2j represents the sentence vector of sentence Sen2j.
The method for calculating sentence similarity based on word order weighting improves the accuracy of the sentence similarity task by means of word order weighting; in addition, accuracy is further improved by incrementally training the word vectors in a supervised manner.
Drawings
Fig. 1 is a flowchart of a method for calculating sentence similarity based on word order weighting according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly understood, the technical solutions in the embodiments of the present invention are described completely below with reference to the accompanying drawings. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In conventional sentence similarity calculation methods, the weighting schemes for sentence vectors are complex and their effectiveness is uneven.
Fig. 1 is a flowchart of a sentence similarity calculation method based on word order weighting according to an embodiment of the present invention.
In step 101, a corpus A of 8,000 sentences is obtained through a web crawler; redundant punctuation and web-page tags are removed by regular-expression matching, and only single sentences are retained. The 8,000 sentences are then classified, giving the corpus A shown in the following table:
Label1i | Single sentence Sen1i
---|---
Refund shipping costs | How to apply for return freight
Refund application | How to apply for refund
Refund application | I want to refund
Application for change of goods | How to apply for changing goods
... | ...
Word vector models of all words in corpus A are then trained with the Word2Vec algorithm.
In step 102, a test corpus B of 30,000 sentences is constructed, the entries of which have the form <Label2j, Sen2j>, wherein Label2j is taken from the categories contained in corpus A, and Sen2j belongs to category Label2j and is semantically similar to the sentences of that category in corpus A. Starting from the word vector model obtained in step 101, word vector models of all words in corpus B are obtained by incremental training with the Word2Vec algorithm.
Example sentences in corpus B are shown in the following table:

Label2j | Sen2j
---|---
Refund shipping costs | How to withdraw the freight
Refund application | The clothes do not fit; can they be returned
Application for change of goods | The clothes are troublesome; please help me change them
... | ...
Training yields word vectors in the following format:

Word W | Word vector V
---|---
Word1 | v11, v12 … v1d
... | ...
Wordn | vn1, vn2 … vnd

where n represents the total number of words and d represents the dimension of the word vectors.
In step 103, a pair of sentences <Label11, Sen11> is selected from corpus A, in this embodiment the pair <refund shipping costs, How to apply for return freight>. The sentence Sen11 "How to apply for return freight" is segmented; the segmentation result is (how, apply, return, freight), and the corresponding word vectors are obtained from the word vector model trained in step 102. The dimension of the word vectors in this embodiment is 200; part of the result is as follows:
Word W | Word vector V1k
---|---
How | -0.15166749, 0.10850359 … -0.097950
Apply | 0.099820456, 0.11322714 … 0.06855157
Return | 0.04588356, 0.08467035 … -0.15038626
Freight | -0.010142227, -0.02377942 … -0.09789387
In step 104, the word order weight of each word is calculated from its position, wherein k represents the position of the word in the sentence, counting from 1; Loc denotes the weighted start position, which is usually set (to a value between 1 and 3) when the sentence is short, i.e. contains fewer than 6 words; and λ is a constant that can be set between 1 and 3. For the test sentence Sen1i "How to apply for return freight" in this embodiment, Loc is set to 1 and λ to 1.5; the resulting word order weights are shown in the following table:
Word W | Word order weight
---|---
How | 0.8123090300973813
Apply | 1.0
Return | 1.3004891818915623
Freight | 1.6149794589701247
Multiplying the word vectors obtained in step 103 by the calculated word order weights yields new weighted word vectors, as shown in the following table:
Word W | Word vector V1k'
---|---
How | 0.123200872, 0.088138446 … 0.079565669
Apply | 0.099820456, 0.11322714 … 0.06855157
Return | 0.059671073, 0.110112874 … -0.195575704
Freight | -0.016379488, -0.038403275 … -0.158096589
In step 105, the weighted word vectors V1k' obtained in step 104 are added and averaged to obtain the sentence vector SenVec11 of the sentence "How to apply for return freight". The formula is:

SenVec1i = (1/n) · Σ (k = 1 … n) V'ik

where n represents the total number of words in the sentence and V'ik represents the weighted word vector of the kth word in the ith sentence. This gives:
Sentence Sen11 | SenVec11
---|---
How to apply for return freight | 0.066578228, 0.068268796 … -0.051388763
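Steps 104 and 105 together amount to a weighted average of the word vectors. A minimal numpy sketch (the weight values would come from the patent's word order formula, which is not reproduced here, so they are passed in as a plain list):

```python
import numpy as np

def sentence_vector(word_vectors, order_weights):
    """Multiply each word vector by its word order weight, then average."""
    V = np.asarray(word_vectors, dtype=float)            # (n_words, dim)
    w = np.asarray(order_weights, dtype=float)[:, None]  # (n_words, 1)
    return (V * w).mean(axis=0)                          # (dim,)
```

For example, `sentence_vector([[1, 2], [3, 4]], [1.0, 2.0])` averages [1, 2] and [6, 8] to [3.5, 5.0].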
In step 106, steps 103 to 105 above are repeated, and the sentence vectors of all 8,000 sentences in corpus A are calculated in turn.
In step 107, steps 103 to 105 above are repeated, and the sentence vectors of all 30,000 sentences in corpus B are calculated in turn.
In step 108, the sentence vectors of all sentences in corpus A obtained in step 106 are combined with the sentence vectors of all sentences in corpus B obtained in step 107. Each sentence Sen2j in corpus B is selected in turn together with its sentence vector SenVec2j, and the similarity between Sen2j and each sentence Sen1i in corpus A is calculated in turn, wherein SenVec1i represents the sentence vector of a sentence in corpus A and SenVec2j represents the sentence vector of the sentence Sen2j in corpus B.
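The similarity formula itself is not reproduced in this text; assuming the conventional cosine similarity between sentence vectors, the matching of step 108 can be sketched as:

```python
import numpy as np

def cosine_similarity(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(query_vec, candidate_vecs):
    """Return the index and similarity of the most similar candidate vector."""
    sims = [cosine_similarity(query_vec, c) for c in candidate_vecs]
    idx = int(np.argmax(sims))
    return idx, sims[idx]
```

The label of the best-matching corpus-A sentence is then compared with the corpus-B label, as described above.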
For example, the similarity between the sentence Sen2j "How to withdraw the freight" in corpus B and each sentence Sen1i in corpus A is calculated with the above formula and ranked by similarity; the word-order-weighted results are shown in the following table:
Similar sentence Sen1 | Category Label1 | Similarity with word order weighting
---|---|---
How to apply for return freight | Refund shipping costs | 0.9334955
How to apply for refund | Refund application | 0.8528203
How to apply for changing goods | Application for change of goods | 0.7556491
How to purchase freight insurance | Purchasing freight insurance | 0.6946413
In the table above, the sentence with the highest similarity is "How to apply for return freight", and its category "refund shipping costs" is consistent with the category label "refund shipping costs" of "How to withdraw the freight".
The ranking obtained without word order weighting is shown in the following table:
Similar sentence Sen1 | Category Label1 | Similarity without word order weighting
---|---|---
How to apply for refund | Refund application | 0.8910549
How to apply for return freight | Refund shipping costs | 0.8341876
How to apply for changing goods | Application for change of goods | 0.7948803
How to purchase freight insurance | Purchasing freight insurance | 0.7148501
Here the sentence with the highest similarity is "How to apply for refund", whose category label "refund application" is inconsistent with the category label "refund shipping costs" of "How to withdraw the freight". This shows that word order weighting is effective for the similarity calculation task.
Another example: for the sentence "telephone number filled in incorrectly", taken from the corpus B sentence "What to do if the order information is filled in incorrectly; telephone number filled in incorrectly", the calculation results against the related sentences in corpus A are shown in the following table:
Similar sentence Sen1 | Category Label1 | Similarity with word order weighting
---|---|---
Return telephone number written incorrectly | Return information filled in incorrectly | 0.91249734
Information filled in incorrectly | What to do if the order information is filled in incorrectly | 0.8882772
Order number written incorrectly | What to do if the order information is filled in incorrectly | 0.78467226
Address written incorrectly | What to do if the order information is filled in incorrectly | 0.7377515
The category Label2 of the sentence Sen2 "telephone number filled in incorrectly" is "What to do if the order information is filled in incorrectly", which is inconsistent with the category Label1 "return information filled in incorrectly" of the most similar sentence "return telephone number written incorrectly" in the final calculation result; the pair <return telephone number written incorrectly, telephone number filled in incorrectly> is therefore stored in corpus C.
According to the obtained similarity, the overall effect in the test corpus B is shown in the following table:
according to the experiment, the accuracy of calculating the similarity is improved by adopting a word order weighting mode.
In step 109, according to the corpus C obtained in step 108, the similarity-value labeling method of Daniel Cer et al. in "SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation" is adopted; the similarity scale is shown in the following table:
Sentence one | Sentence two | Similarity value
---|---|---
He is bathing | He is taking a bath | 5
Two people are walking on the road | Two people are walking hand in hand | 4
He is unsuspecting of her | He has little doubt about her | 3
The two build a nest together | The two walk into the nest together | 2
The girl likes listening to music | He is playing the piano | 1
A dog runs by itself | He is flying at high speed | 0
The pair obtained in step 108, "return telephone number written incorrectly, telephone number filled in incorrectly", can accordingly be labeled as "return telephone number written incorrectly, telephone number filled in incorrectly, 4".
After corpus C is processed and labeled in this way, new word vectors are obtained by training the LSTM regression model; the word vector model of step 102 is updated with the newly trained word vectors, and step 103 is then executed so that the next sentence similarity calculation uses the updated word vector model. The overall effect on test corpus B is shown in the following table:
according to the experimental result, the new word vector is trained by the method, and great help is provided for the task of sentence similarity calculation.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (6)
1. A sentence similarity calculation method based on word order weighting is characterized by comprising the following steps:
1) obtaining a corpus A by using a web crawler and adding classification labels to all sentences in corpus A according to their semantics, so as to obtain entries of the form <Label1i, Sen1i>, wherein Sen1i is the ith sentence in corpus A and Label1i is the category label corresponding to Sen1i; then training word vector models for all words in corpus A by using the Word2Vec algorithm;
2) constructing a test corpus B from the corpus A obtained in step 1), the entries of the test corpus B having the form <Label2j, Sen2j>, wherein Label2j is a category in corpus A, Sen2j is the jth sentence in corpus B, and Sen2j belongs to category Label2j and is semantically similar to the sentences of category Label2j in corpus A; then, starting from the word vector model obtained in step 1), obtaining word vector models for all words in corpus B by incremental training with the Word2Vec algorithm;
3) taking a pair <Label11, Sen11> from the corpus A of step 1), performing word segmentation on Sen11, and obtaining from the word vector model of step 2) the word vector V1k corresponding to each segmentation result, wherein k denotes the position of the word in the sentence Sen11;
4) calculating the word order weight of each word according to the word vector V1k of each word in the sentence Sen11 obtained in step 3) and the position of that word in Sen11, and multiplying each word vector V1k by its word order weight to obtain a new weighted word vector V1k';
5) obtaining the sentence vector SenVec11 of the sentence Sen11 from the weighted word vectors V1k' obtained in step 4);
6) repeating steps 3) to 5) to calculate the sentence vectors SenVec1i of all sentences in corpus A;
7) repeating steps 3) to 5) to calculate the sentence vectors SenVec2j of all sentences in corpus B;
8) according to the sentence vectors obtained in steps 6) and 7), selecting in turn each sentence Sen2j in test corpus B together with its sentence vector SenVec2j, calculating its similarity to each sentence Sen1i in corpus A, selecting the sentence Sen1i ranked highest in similarity, and comparing its Label1i with Label2j; if they are identical the result is correct, otherwise storing the pair <Sen1i, Sen2j> in a training corpus C;
9) labeling the training corpus C obtained in step 8) with similarity values following SemEval-2017, training an LSTM regression model to obtain new word vectors, updating the word vector model of step 2) with the newly trained word vectors, and then returning to step 3) for the similarity calculation of the next sentence.
2. The method for calculating sentence similarity based on word order weighting according to claim 1, wherein the step 1) further comprises, before adding the classification labels:
removing redundant punctuation and web-page tags from corpus A by regular-expression matching, keeping only single sentences.
3. The method for calculating sentence similarity based on word order weighting according to claim 1, wherein in step 3) the HanLP open-source word segmenter is used to perform the word segmentation of Sen11.
4. The method for calculating sentence similarity based on word order weighting according to claim 1, wherein the word order weight in step 4) is calculated from the position k of the word in the sentence together with a weighted start position Loc and a constant λ whose value lies in the range 1-3.
5. The method for calculating sentence similarity based on word order weighting according to claim 1, wherein the sentence vector in step 5) is obtained by averaging the weighted word vectors:

SenVec1i = (1/n) · Σ (k = 1 … n) V'ik

wherein n represents the total number of words in the sentence Sen1i and V'ik represents the weighted word vector of the kth word of sentence Sen1i.
6. The method for calculating sentence similarity based on word order weighting according to claim 1, wherein the similarity in step 6) is calculated from the sentence vectors, wherein SenVec1i represents the sentence vector of sentence Sen1i and SenVec2j represents the sentence vector of sentence Sen2j.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810217211.9A CN108509415B (en) | 2018-03-16 | 2018-03-16 | Sentence similarity calculation method based on word order weighting |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810217211.9A CN108509415B (en) | 2018-03-16 | 2018-03-16 | Sentence similarity calculation method based on word order weighting |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108509415A CN108509415A (en) | 2018-09-07 |
CN108509415B true CN108509415B (en) | 2021-09-24 |
Family
ID=63376592
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810217211.9A Active CN108509415B (en) | 2018-03-16 | 2018-03-16 | Sentence similarity calculation method based on word order weighting |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108509415B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109739956B * | 2018-11-08 | 2020-04-10 | 4Paradigm (Beijing) Technology Co., Ltd. | Corpus cleaning method, apparatus, device and medium |
CN109697286A * | 2018-12-18 | 2019-04-30 | ZhongAn Information Technology Service Co., Ltd. | A kind of diagnostic standardization method and device based on term vector |
CN109710762B * | 2018-12-26 | 2023-08-01 | Nanjing Yunwen Network Technology Co., Ltd. | Short text clustering method integrating multiple feature weights |
CN109766547B * | 2018-12-26 | 2022-10-18 | Chongqing University of Posts and Telecommunications | Sentence similarity calculation method |
CN109902159A * | 2019-01-29 | 2019-06-18 | Huarong Rongtong (Beijing) Technology Co., Ltd. | A kind of intelligent O&M statement similarity matching process based on natural language processing |
CN110162627B * | 2019-04-28 | 2022-04-15 | Ping An Technology (Shenzhen) Co., Ltd. | Data increment method and device, computer equipment and storage medium |
CN113204612B * | 2021-04-24 | 2024-05-03 | Shanghai Saike Chuxing Technology Service Co., Ltd. | Priori knowledge-based network about vehicle similar address identification method |
CN113535919B * | 2021-07-16 | 2022-11-08 | Beijing Yuannian Technology Co., Ltd. | Data query method and device, computer equipment and storage medium |
CN114048285A * | 2021-10-22 | 2022-02-15 | Yancheng Jindi Technology Co., Ltd. | Fuzzy retrieval method, device, terminal and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095188A * | 2015-08-14 | 2015-11-25 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Sentence similarity computing method and device |
CN106055673A * | 2016-06-06 | 2016-10-26 | National University of Defense Technology of the People's Liberation Army | Chinese short-text sentiment classification method based on text characteristic insertion |
CN106610950A * | 2016-09-29 | 2017-05-03 | Sichuan Yonglian Information Technology Co., Ltd. | Improved text similarity solution method |
CN106844350A * | 2017-02-15 | 2017-06-13 | Guangzhou Suoda Information Technology Co., Ltd. | A kind of computational methods of short text semantic similarity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105824797B * | 2015-01-04 | 2019-11-12 | Huawei Technologies Co., Ltd. | A kind of methods, devices and systems for evaluating semantic similarity |
-
2018
- 2018-03-16 CN CN201810217211.9A patent/CN108509415B/en active Active
Non-Patent Citations (2)
Title |
---|
Daniel Cer et al., "SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation", Proceedings of the 11th International Workshop on Semantic Evaluation, 2017-08-04, pp. 419-424 * |
Cheng Zhiqiang et al., "Research on a Sentence Similarity Algorithm Based on Vector Word Order", Computer Simulation, July 2014, Vol. 31, No. 7, pp. 1-14 * |
Also Published As
Publication number | Publication date |
---|---|
CN108509415A (en) | 2018-09-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||