CN102693279B - Method, device and system for fast calculating comment similarity - Google Patents


Info

Publication number
CN102693279B
Authority
CN
China
Prior art keywords
text
comment
similarity
new
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210132078.XA
Other languages
Chinese (zh)
Other versions
CN102693279A (en)
Inventor
陈学文
张宇峰
姚键
潘柏宇
卢述奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Youku Network Technology Beijing Co Ltd
Original Assignee
1Verge Internet Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 1Verge Internet Technology Beijing Co Ltd
Priority to CN201210132078.XA
Publication of CN102693279A
Application granted
Publication of CN102693279B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device and a system for quickly calculating comment similarity. The method comprises: first extracting keywords from a new comment; then, for each extracted keyword, looking up an inverted index and the stored text information to find the texts that share keywords with the new comment text; counting the number of keywords shared by the new comment text and each index text; calculating the similarity between the new text and each index text from that count; and obtaining the highest similarity score of the new text so as to find the text most similar to the new comment text. The method, device and system are particularly suited to similarity analysis of short texts such as film reviews: short-text similarity can be calculated quickly, and the program trades space for time to reduce CPU computation time.

Description

A method, device and system for quickly calculating comment similarity
Technical field
The invention belongs to the technical field of text similarity analysis, and in particular relates to a method, device and system for quickly calculating comment similarity.
Background art
In interactive settings on information networks, users often wish to comment on the information they receive. Because some comments are highly similar to one another, analyzing comment similarity plays an important role in the analysis and processing of comment data, for example in extracting featured comments and analyzing spam comment content.
Existing approaches generally apply a similarity algorithm directly to compute the similarity between every pair of comments, take the score of the most similar comment, and thereby find the comments with higher similarity. This approach has to compare each new comment with the historical comments one by one, so the amount of computation is large. On the one hand this slows down the server; on the other hand it increases the number of accesses to the server's database that stores the comment content.
Summary of the invention
In view of the problems in the prior art, the object of the present invention is to provide a method, device and system for quickly calculating comment similarity, directed at internet information and in particular at comments on and replies to internet information. For short texts of this kind, a similarity calculation method suited to short text is adopted, which reduces both the dependence on server CPU computation and the number of accesses to the server's comment database, and thereby improves the processing efficiency of the server.
To achieve the above object, the invention provides a method for quickly calculating comment similarity, characterized by comprising the following steps:
S1, extracting the keywords of a new comment, comprising:
S11, converting the raw comment text into processable text;
S12, then segmenting the processed comment text into words using a word segmentation program;
S13, extracting the sentence trunk according to the word segmentation result;
S14, further filtering the feature keywords obtained in step S13 against a stop-word list, finally obtaining the useful keywords of the new comment;
S2, for each extracted keyword, looking up the inverted index and text information to find the texts that share keywords with the new comment text;
S3, counting the number of keywords shared by the new comment text and each index text;
S4, calculating the similarity between the new text and each text in the index from the number of keywords shared by the new comment text and the index text, comprising:
S41, calculating the feature keyword weights using Boolean weighting;
S42, according to the keyword weights obtained in step S41, calculating text similarity with the Dice coefficient, so that the number of keywords shared by two texts and the weight of each keyword measure how similar the texts are;
S5, obtaining the highest similarity score of the new text, thereby finding the text most similar to the new comment text;
S6, adding the new comment text to the index to generate a new index, so that by the time the next comment is calculated all known comments have been added to the inverted index.
In addition, the invention provides a device for quickly calculating comment similarity, characterized by comprising the following modules:
Keyword extraction module, for extracting the keywords of a new comment, comprising:
a module for converting the raw comment text into processable text;
a module for segmenting the processed comment text into words using a word segmentation program;
a module for extracting the sentence trunk according to the word segmentation result;
a module for further filtering the resulting feature keywords against a stop-word list, finally obtaining the useful keywords of the new comment;
Inverted index module, for looking up the inverted index and text information for each extracted keyword to find the texts that share keywords with the new comment text;
Shared-keyword counting module, for counting the number of keywords shared by the new comment text and each index text;
Similarity calculation module, for calculating the similarity between the new text and each text in the index from the number of keywords shared by the new comment text and the index text, comprising:
a module for calculating the feature keyword weights using Boolean weighting;
a module for calculating text similarity with the Dice coefficient according to the obtained keyword weights, so that the number of keywords shared by two texts and the weight of each keyword measure how similar the texts are;
Similar text determination module, for obtaining the highest similarity score of the new text, thereby finding the text most similar to the new comment text;
Index adding module, for adding the new comment text to the index to generate a new index, so that by the time the next comment is calculated all known comments have been added to the inverted index.
In addition, the invention provides a system for quickly calculating comment similarity, characterized by comprising the following devices:
Keyword extraction device, for extracting the keywords of a new comment, comprising:
a module for converting the raw comment text into processable text;
a module for segmenting the processed comment text into words using a word segmentation program;
a module for extracting the sentence trunk according to the word segmentation result;
a module for further filtering the resulting feature keywords against a stop-word list, finally obtaining the useful keywords of the new comment;
Inverted index device, for looking up the inverted index and text information for each extracted keyword to find the texts that share keywords with the new comment text;
Shared-keyword counting device, for counting the number of keywords shared by the new comment text and each index text;
Similarity calculation device, for calculating the similarity between the new text and each text in the index from the number of keywords shared by the new comment text and the index text, comprising:
a module for calculating the feature keyword weights using Boolean weighting;
a module for calculating text similarity with the Dice coefficient according to the obtained keyword weights, so that the number of keywords shared by two texts and the weight of each keyword measure how similar the texts are;
Similar text determination device, for obtaining the highest similarity score of the new text, thereby finding the text most similar to the new comment text;
Index adding device, for adding the new comment text to the index to generate a new index, so that by the time the next comment is calculated all known comments have been added to the inverted index.
The method, device and system of the present invention can calculate short-text similarity quickly; the program trades space for time and reduces CPU computation time. Specifically:
1. Text feature keywords are stored in an inverted index, which speeds up the search for similar texts; similarity no longer has to be calculated against every text one by one, which reduces the amount of computation;
2. The intermediate values from each text similarity calculation are retained and used directly in later similarity calculations, so they do not have to be recalculated.
Brief description of the drawings
Fig. 1 is a flow chart of the method for quickly calculating comment similarity according to the invention;
Fig. 2 is a block diagram of the device for quickly calculating comment similarity according to the invention;
Fig. 3 is a block diagram of the system for quickly calculating comment similarity according to the invention.
Detailed description of the embodiments
To make the above objects, features and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of the method for quickly calculating comment similarity according to the invention. As shown in Fig. 1, the method is carried out as follows:
S1, extracting the keywords of the new comment. The extraction proceeds as follows:
Step S11, converting the raw comment text into processable text, for example by removing embedded tags, emoticons and similar information;
The conversion can be carried out by an in-house text-processing program. For a short text such as a microblog post, for example, the tags embedded in the text can be removed: Sina Weibo tags, the user names that appear after "//@" in forwarded posts, topic tags delimited by "##", and so on, so that only the content of the comment itself is extracted. In addition, emoticon tags, which are stored in the database inside "[]" (for example "[praise]"), can also be removed from the short text.
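The cleaning in step S11 might be sketched with regular expressions as below. This is a minimal illustration only: the three patterns and the function name `clean_comment` are assumptions made for this example, since the description names the markers ("//@", "##", "[]") but does not give the exact rules.

```python
import re

# Hypothetical patterns; the description names the markers but not the exact rules.
FORWARD_RE = re.compile(r"//@[^:：\s]+[:：]?")  # forwarded-user marker such as "//@alice:"
TOPIC_RE = re.compile(r"#[^#]+#")               # topic tag such as "#topic#"
EMOTICON_RE = re.compile(r"\[[^\[\]]+\]")       # emoticon tag such as "[praise]"


def clean_comment(raw: str) -> str:
    """Strip forwarding markers, topic tags and emoticon tags, then tidy whitespace."""
    text = FORWARD_RE.sub(" ", raw)
    text = TOPIC_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    return " ".join(text.split())


print(clean_comment("Great film [praise] #movie night# //@alice: agreed"))
# prints: Great film agreed
```

Only the user name after "//@" is removed here, which follows the wording of the step; whether the forwarded text itself should also be dropped is left open.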
Step S12, then segmenting the processed comment text into words using a word segmentation program;
This step can be implemented with an in-house program or with a third-party Chinese word segmentation program. Dictionary entries are crawled from the internet, so the local segmentation dictionary can be enriched continually. The segmentation algorithm uses the reverse maximum matching principle and segments the text according to the words in the dictionary.
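Reverse maximum matching can be sketched as follows. The function name, the `max_len` limit and the tiny dictionary are assumptions for illustration; the Chinese strings are assumed renderings of the names used in the worked example later in the description, and a real dictionary would be far larger.

```python
def backward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Segment text from right to left, taking the longest dictionary word at each step."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, shrinking toward one character.
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            # A single character is always accepted so that the scan can make progress.
            if size == 1 or piece in dictionary:
                words.append(piece)
                end -= size
                break
    words.reverse()  # words were collected right to left
    return words


# Hypothetical dictionary: "Huang Xiaoming", "plays", "Xiao Ming".
demo_dictionary = {"黄晓明", "扮演", "小明"}
print(backward_max_match("黄晓明扮演小明", demo_dictionary))
# prints: ['黄晓明', '扮演', '小明']
```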
Step S13, extracting the sentence trunk, such as nouns and verbs, according to the word segmentation result;
Nouns, verbs, adjectives and so on are identified by part-of-speech tagging, which is performed by an external program.
For example, a comment about Huang Xiaoming and the development of the plot is tagged as "Huang Xiaoming/nh heartily/o plot/n development/v".
For some complex sentence structures, tagging errors may occur and lead to extraction errors. Tests show that part-of-speech tagging accuracy is above 95%; the small share of errors may affect the final similarity score, but because accuracy is high the effect on the score is not large. The sentence trunk can therefore be extracted fairly accurately.
Step S14, finally filtering the feature keywords obtained in step S13 against a stop-word list to obtain the useful keywords of the new comment.
A word in the stop-word list has little effect on the meaning of the text and can be ignored. Part of the stop-word list comes from the internet and a small part is derived statistically: for example, statistics over a large number of comments show that the keyword "sofa" scores very low, so it can be added to the stop-word list. Further stop words include, for example, "seem" and "certain".
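Steps S13 and S14 together amount to a filter over tagged tokens, which might look like the sketch below. The `word/tag` input format mirrors the tagged example above; the set of tags that are kept, the underscore used to keep a multi-word name in one token, and the function name are assumptions for illustration.

```python
# Assumed tag set: noun, person name, verb, adjective.
KEEP_TAGS = {"n", "nh", "v", "a"}
# Stop words mentioned in the description.
STOP_WORDS = {"sofa", "seem", "certain"}


def extract_keywords(tagged: str) -> list[str]:
    """Keep words whose tag is in KEEP_TAGS and that are not stop words, without repeats."""
    keywords = []
    for token in tagged.split():
        word, _, tag = token.rpartition("/")
        if tag in KEEP_TAGS and word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords


print(extract_keywords("Huang_Xiaoming/nh heartily/o plot/n development/v sofa/n"))
# prints: ['Huang_Xiaoming', 'plot', 'development']
```

The "/o" token is dropped by the part-of-speech filter and "sofa" by the stop-word list, so only trunk words remain.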
S2, for each extracted keyword, looking up the inverted index and text information to find the texts that share keywords with the new comment text. An index entry is built for each keyword, and the index texts are the texts against which similarity is analyzed. The purpose of the inverted index is to find texts and text information quickly;
An inverted index is a technique used in search engines. In essence, it builds a lookup mechanism from the keywords in the texts and uses it to find texts. Each entry in the index table contains an attribute value and the addresses of all records that have that attribute value. Because the attribute value determines the position of the records, rather than the records determining the attribute value, it is called an inverted index. A file with an inverted index is called an inverted index file, or inverted file for short.
The invention builds the inverted index as follows:
Two tables, a and b, are defined. Each row of table a stores the text of a comment, the extracted feature keyword information, and a unique id that identifies the text. Each row of table b stores a keyword and a sequence of ids. The id sequence for each keyword is generated from the texts in table a. Table b is generated by the following rule: traverse all texts in table a and, for each keyword that appears in a text, append the text's id to the id sequence of that keyword in table b; if the keyword is not yet in table b, add a new keyword row.
As an example of using the inverted index, to find the documents that contain the keyword "hello", the keyword "hello" is located quickly in table b and its id sequence is retrieved; the corresponding documents are then looked up in table a by id.
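The two tables can be sketched with plain dictionaries. The field names, the sequential ids and the sample comments are assumptions for illustration; the description only fixes what each table stores.

```python
from collections import defaultdict


def build_tables(comments):
    """Build table a (id -> text and keywords) and table b (keyword -> list of ids)."""
    table_a = {}
    table_b = defaultdict(list)
    for doc_id, (text, keywords) in enumerate(comments, start=1):
        unique = list(dict.fromkeys(keywords))  # drop repeats, keep order
        table_a[doc_id] = {"text": text, "keywords": unique}
        for keyword in unique:
            # A keyword seen for the first time gets a new row automatically.
            table_b[keyword].append(doc_id)
    return table_a, table_b


comments = [
    ("hello world", ["hello", "world"]),
    ("good film", ["good", "film"]),
    ("hello film", ["hello", "film"]),
]
table_a, table_b = build_tables(comments)

# Lookup: table b gives the ids for "hello", table a gives the documents.
ids = table_b.get("hello", [])
print(ids, [table_a[i]["text"] for i in ids])
# prints: [1, 3] ['hello world', 'hello film']
```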
S3, counting the number of keywords shared by the new comment text and each index text;
The process is as follows:
From the index texts found in step S2 that contain the same keywords as the new comment text, the number of keywords shared by the new comment text and each of these texts is counted. Because step S2 has already found the texts that share keywords with the new text, "all texts" in this step is a reduced set. The result is the number of keywords shared by two texts, which is the value comm(s1, s2) in the Dice similarity formula below.
The information on features shared by each text and the new text is then collected. This information can be keywords: in the present invention a text's features are represented only by its keywords, so only the feature keywords extracted in step S1 are used when calculating similarity. Other methods may treat further information, such as text length or symbol information, as text features and use it as characteristic information for text analysis.
The comment's characteristic information is the value leng(s2) in the formula, which is the text information value calculated from the feature keywords. When the Dice method is used to calculate text similarity, as in the present invention, this value is the number of feature keywords in the text. It can be stored in table a of step S2 so that it is readily available when similarity with other texts is calculated.
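Counting shared keywords through the inverted index might be sketched as follows. Only texts that appear in some posting list are ever touched, which is the "reduced set" described above. The function name and the sample index are assumptions for illustration.

```python
from collections import Counter


def count_shared(new_keywords, table_b):
    """Return {doc_id: number of keywords shared with the new comment}."""
    shared = Counter()
    for keyword in set(new_keywords):
        for doc_id in table_b.get(keyword, ()):
            shared[doc_id] += 1  # one more keyword in common with this text
    return shared


# Hypothetical index: keyword -> ids of the texts that contain it.
demo_index = {
    "film": [2],
    "Huang Xiaoming": [2],
    "acting": [2],
    "plays": [3],
    "Zhao Wei": [3],
    "Xiao Wei": [3, 4],
    "girl": [4],
}
counts = count_shared(["film", "Huang Xiaoming", "plays", "Xiao Ming"], demo_index)
print(sorted(counts.items()))
# prints: [(2, 2), (3, 1)]
```

Text 4 shares no keyword with the new comment, so it never appears in the result and no similarity is computed for it.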
S4, calculating the similarity between the new text and each text in the index from the number of keywords shared by the new comment text and the index text. This step is carried out as follows:
Step S41, calculating the feature keyword weights using Boolean weighting. Commonly used feature weighting methods are Boolean weight, term frequency (tf) weight and tf-idf weight. Because comment content is short text and each text contains few feature words, Boolean weighting is adopted. Experiments show that calculating feature weights with tf-idf increases the amount of computation without a significant change in the quality of the calculated similarity, which is why Boolean weighting is used.
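The difference between Boolean and term-frequency weights can be seen in a few lines. The helper names are assumptions for illustration, and the tf variant is shown only for contrast.

```python
from collections import Counter


def boolean_weights(keywords):
    """Boolean weighting: every keyword present in the text has weight 1."""
    return {keyword: 1 for keyword in keywords}


def tf_weights(keywords):
    """Term-frequency weighting, shown for contrast: weight is the occurrence count."""
    return dict(Counter(keywords))


tokens = ["film", "good", "good"]
print(boolean_weights(tokens))  # prints: {'film': 1, 'good': 1}
print(tf_weights(tokens))       # prints: {'film': 1, 'good': 2}
```

With Boolean weights the total weight of a text is simply its number of distinct feature keywords, so a single integer per text is enough for the later similarity calculation.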
Step S42, according to the keyword weights obtained in step S41, calculating text similarity with the Dice coefficient, so that the number of keywords shared by two texts and the weight of each keyword measure how similar the texts are;
The Dice coefficient formula is:
Dice(s1,s2)=2×comm(s1,s2)/(leng(s1)+leng(s2))
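where comm(s1, s2) is the number of identical characters in s1 and s2, and leng(s1) and leng(s2) are the lengths of the strings s1 and s2. As applied in steps S3 and S4, comm(s1, s2) is the number of keywords shared by the two texts and leng(s) is the number of feature keywords in text s.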
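Under Boolean weights the formula reduces to set arithmetic over keywords, as in this minimal sketch. The function name and the convention of returning 0.0 for two empty keyword lists are assumptions, since the description does not cover that case.

```python
def dice(keywords1, keywords2):
    """Dice(s1, s2) = 2 * comm(s1, s2) / (leng(s1) + leng(s2)) over keyword sets."""
    a, b = set(keywords1), set(keywords2)
    if not a and not b:
        return 0.0  # assumed convention: two empty texts are treated as dissimilar
    return 2 * len(a & b) / (len(a) + len(b))


print(dice(["a", "b"], ["a", "b"]))  # prints: 1.0
print(dice(["a"], ["b"]))            # prints: 0.0
```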
An example follows. After processing and keyword extraction, the new text is C1: film, Huang Xiaoming, plays, Xiao Ming;
The existing index texts are:
Index text C2: film, Huang Xiaoming, acting skills
Index text C3: Zhao Wei, plays, Xiao Wei
Index text C4: Xiao Wei, girl
First, the keywords "film", "Huang Xiaoming", "plays" and "Xiao Ming" are looked up in the inverted index, which returns the corresponding documents C2 and C3 (C2, C3 and C4 have already been added to the inverted index).
Then the number of keywords C1 shares with C2 and with C3 is counted, which is comm(s1, s2) in the formula.
Finally, the Dice similarity formula is used to calculate the similarity of C1 and C2 and the similarity of C1 and C3.
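The example can be worked through numerically as below. The keyword lists are the English renderings of C1 to C4, and the resulting values follow from the Dice formula; they are computed here for illustration and are not stated in the description.

```python
index_texts = {
    "C2": ["film", "Huang Xiaoming", "acting skills"],
    "C3": ["Zhao Wei", "plays", "Xiao Wei"],
    "C4": ["Xiao Wei", "girl"],
}
c1 = ["film", "Huang Xiaoming", "plays", "Xiao Ming"]

scores = {}
for name, keywords in index_texts.items():
    comm = len(set(c1) & set(keywords))  # number of shared keywords
    if comm:  # C4 shares nothing with C1, so the index lookup never returns it
        scores[name] = 2 * comm / (len(set(c1)) + len(set(keywords)))

for name, score in scores.items():
    print(name, round(score, 3))
# prints:
# C2 0.571
# C3 0.286
```

C1 and C2 share two keywords out of 4 + 3, giving 4/7; C1 and C3 share one, giving 2/7. C2 is therefore the most similar text.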
S5, obtaining the highest similarity score of the new text, thereby finding the text most similar to the new comment text. The result is a similarity score between 0 and 1, where 1 means the text contents are closest and 0 means they are least close. The purpose of obtaining the similarity score of a new comment is to judge whether the new comment is plagiarized; on this basis the most frequently quoted comment can also be identified as a featured comment, and the featured-comment score of plagiarizing comments can be lowered.
S6, adding the new comment text to the index to produce a new index, so that by the time the next comment is calculated all known comments have been added to the inverted index.
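Steps S2 to S6 can be combined into one small in-memory structure, sketched below. The class and attribute names are assumptions for illustration, and a real deployment would presumably keep the tables in a database. The per-text keyword count is cached when a text is added, which is one way to retain an intermediate value instead of recomputing it.

```python
from collections import Counter, defaultdict


class CommentIndex:
    """In-memory inverted index with a cached keyword count per text."""

    def __init__(self):
        self.keyword_count = {}            # doc_id -> leng(s), stored once on insertion
        self.postings = defaultdict(list)  # keyword -> ids of texts containing it
        self.next_id = 1

    def most_similar(self, keywords):
        """S2 to S5: return (doc_id, score) of the best match, or (None, 0.0)."""
        new_set = set(keywords)
        shared = Counter()
        for keyword in new_set:
            for doc_id in self.postings.get(keyword, ()):
                shared[doc_id] += 1
        best_id, best_score = None, 0.0
        for doc_id, comm in shared.items():
            score = 2 * comm / (len(new_set) + self.keyword_count[doc_id])
            if score > best_score:
                best_id, best_score = doc_id, score
        return best_id, best_score

    def add(self, keywords):
        """S6: add the text to the index so that later comments are compared with it."""
        new_set = set(keywords)
        doc_id = self.next_id
        self.next_id += 1
        self.keyword_count[doc_id] = len(new_set)
        for keyword in new_set:
            self.postings[keyword].append(doc_id)
        return doc_id


index = CommentIndex()
index.add(["film", "Huang Xiaoming", "acting skills"])  # id 1
index.add(["Zhao Wei", "plays", "Xiao Wei"])            # id 2
index.add(["Xiao Wei", "girl"])                         # id 3

new_comment = ["film", "Huang Xiaoming", "plays", "Xiao Ming"]
best_id, best_score = index.most_similar(new_comment)
print(best_id, round(best_score, 3))  # prints: 1 0.571
index.add(new_comment)                # the new comment becomes id 4
```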
The technical solution of the invention can be implemented in a single device, which yields a physical device capable of carrying out the solution. Fig. 2 is a block diagram of the device for quickly calculating comment similarity according to the invention; it comprises the following modules:
Keyword extraction module, for extracting the keywords of a new comment; it operates in the same way as method step S1.
Inverted index module, for looking up the inverted index and text information for each extracted keyword to find the texts that share keywords with the new comment text; it operates in the same way as method step S2.
Shared-keyword counting module, for counting the number of keywords shared by the new comment text and each index text; it operates in the same way as method step S3.
Similarity calculation module, for calculating the similarity between the new text and each text in the index from the number of keywords shared by the new comment text and the index text; it operates in the same way as method step S4.
Similar text determination module, for obtaining the highest similarity score of the new text, thereby finding the text most similar to the new comment text; it operates in the same way as method step S5.
Index adding module, for adding the new comment text to the index to produce a new index, so that by the time the next comment is calculated all known comments have been added to the inverted index.
In addition, the invention can be carried out by separate devices working together, which yields a system capable of carrying out the solution. Fig. 3 is a block diagram of the system for quickly calculating comment similarity according to the invention; it comprises the following devices:
Keyword extraction device, for extracting the keywords of a new comment; it operates in the same way as method step S1.
Inverted index device, for looking up the inverted index and text information for each extracted keyword to find the texts that share keywords with the new comment text; it operates in the same way as method step S2.
Shared-keyword counting device, for counting the number of keywords shared by the new comment text and each index text; it operates in the same way as method step S3.
Similarity calculation device, for calculating the similarity between the new text and each text in the index from the number of keywords shared by the new comment text and the index text; it operates in the same way as method step S4.
Similar text determination device, for obtaining the highest similarity score of the new text, thereby finding the text most similar to the new comment text; it operates in the same way as method step S5.
Index adding device, for adding the new comment text to the index to produce a new index, so that by the time the next comment is calculated all known comments have been added to the inverted index.
In summary, in the method, device and system for quickly calculating comment similarity according to the invention, because comment content is short text and each text contains few feature words, feature weights are calculated with Boolean weighting and the similarity of two strings is calculated with the Dice coefficient, which reduces the complexity of the similarity calculation. The advantages are as follows: short-text similarity can be calculated quickly, and the program trades space for time to reduce CPU computation time; text feature keywords are stored in an inverted index, which speeds up the search for similar texts, so that similarity does not have to be calculated against every text one by one and the amount of computation is reduced.
The above is a detailed description of preferred embodiments of the invention. Those of ordinary skill in the art will appreciate that, within the scope and spirit of the invention, various improvements, additions and substitutions are possible, such as adjusting the order of interface calls, changing message formats and contents, or implementing the invention in a different programming language (such as C, C++ or Java). All of these fall within the scope of protection defined by the claims of the invention.

Claims (1)

1. A method for quickly calculating comment similarity, characterized by comprising the following steps:
S1, extracting the keywords of a new comment;
S2, for each extracted keyword, looking up the inverted index and text information to find the texts that share keywords with the new comment text;
S3, counting the number of keywords shared by the new comment text and each index text;
S4, calculating the similarity between the new text and each text in the index from the number of keywords shared by the new comment text and the index text;
S5, obtaining the highest similarity score of the new text, thereby finding the text most similar to the new comment text;
S6, adding the new comment text to the index to produce a new index, so that by the time the next comment is calculated all known comments have been added to the inverted index, and retaining the intermediate values from each text similarity calculation;
wherein step S1 specifically comprises the following steps:
S11, converting the raw comment text into processable text;
S12, then segmenting the processed comment text into words using a word segmentation program;
S13, extracting the sentence trunk according to the word segmentation result;
S14, further filtering the feature keywords obtained in step S13 against a stop-word list, finally obtaining the useful keywords of the new comment;
wherein step S4 specifically comprises:
S21, calculating the feature keyword weights using Boolean weighting;
S22, according to the keyword weights obtained in step S21, calculating text similarity with the Dice coefficient, so that the number of keywords shared by two texts and the weight of each keyword measure how similar the texts are,
the Dice coefficient formula being:
Dice(s1,s2)=2×comm(s1,s2)/(leng(s1)+leng(s2))
where comm(s1, s2) is the number of identical characters in s1 and s2, and leng(s1) and leng(s2) are the lengths of the strings s1 and s2.
CN201210132078.XA 2012-04-28 2012-04-28 Method, device and system for fast calculating comment similarity Expired - Fee Related CN102693279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210132078.XA CN102693279B (en) 2012-04-28 2012-04-28 Method, device and system for fast calculating comment similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210132078.XA CN102693279B (en) 2012-04-28 2012-04-28 Method, device and system for fast calculating comment similarity

Publications (2)

Publication Number Publication Date
CN102693279A CN102693279A (en) 2012-09-26
CN102693279B true CN102693279B (en) 2014-09-03

Family

ID=46858713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210132078.XA Expired - Fee Related CN102693279B (en) 2012-04-28 2012-04-28 Method, device and system for fast calculating comment similarity

Country Status (1)

Country Link
CN (1) CN102693279B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104375739B (en) * 2013-08-12 2019-07-26 联想(北京)有限公司 The method and electronic equipment of information processing
CN103646029B (en) * 2013-11-04 2017-03-15 北京中搜网络技术股份有限公司 A kind of similarity calculating method for blog article
CN104778171A (en) * 2014-01-10 2015-07-15 携程计算机技术(上海)有限公司 Character string matching system and method
CN104778184A (en) * 2014-01-15 2015-07-15 腾讯科技(深圳)有限公司 Feedback keyword determining method and device
CN104809117B (en) * 2014-01-24 2018-10-30 深圳市云帆世纪科技有限公司 Video data aggregation processing method, paradigmatic system and video search platform
CN105224518B (en) * 2014-06-17 2020-03-17 腾讯科技(深圳)有限公司 Text similarity calculation method and system and similar text search method and system
CN104217016B (en) * 2014-09-22 2018-02-02 北京国双科技有限公司 Webpage search keyword statistical method and device
CN105868236A (en) * 2015-12-09 2016-08-17 乐视网信息技术(北京)股份有限公司 Synonym data mining method and system
CN110019660A (en) * 2017-08-06 2019-07-16 北京国双科技有限公司 A kind of Similar Text detection method and device
CN110866407B (en) * 2018-08-17 2024-03-01 阿里巴巴集团控股有限公司 Analysis method, device and equipment for determining similarity between text of mutual translation
CN109615001B (en) * 2018-12-05 2020-03-10 上海恺英网络科技有限公司 Method and device for identifying similar articles
CN110674256B (en) * 2019-09-25 2023-05-12 携程计算机技术(上海)有限公司 Method and system for detecting correlation degree of comment and reply of OTA hotel
CN111913912A (en) * 2020-07-16 2020-11-10 北京字节跳动网络技术有限公司 File processing method, file matching device, electronic equipment and medium
CN113836886A (en) * 2021-08-18 2021-12-24 北京清博智能科技有限公司 News title similarity identification method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1728134A (en) * 2004-07-30 2006-02-01 国际商业机器公司 Multi-language network information search method and system based on supertext
CN101059805A (en) * 2007-03-29 2007-10-24 复旦大学 Network flow and delaminated knowledge library based dynamic file clustering method
CN101174273A (en) * 2007-12-04 2008-05-07 清华大学 News event detecting method based on metadata analysis
CN102117339A (en) * 2011-03-30 2011-07-06 曹晓晶 Filter supervision method specific to unsecure web page texts
CN102411583A (en) * 2010-09-20 2012-04-11 阿里巴巴集团控股有限公司 Method and device for matching texts

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
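Research on Internet Text Clustering and Retrieval Technology; Meng Xianjun (孟宪军); China Doctoral Dissertations Full-text Database; 20110531; pp. 7-10, 35-36, 81-82 *
Meng Xianjun (孟宪军). Research on Internet Text Clustering and Retrieval Technology. China Doctoral Dissertations Full-text Database. 2011.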

Also Published As

Publication number Publication date
CN102693279A (en) 2012-09-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee after: Youku network technology (Beijing) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: 1VERGE INTERNET TECHNOLOGY (BEIJING) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20200623

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: Youku network technology (Beijing) Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140903

Termination date: 20200428