CN107315731A - Text similarity computing method - Google Patents

Text similarity computing method

Info

Publication number
CN107315731A
Authority
CN
China
Prior art keywords
text
phrase
computing method
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610268995.9A
Other languages
Chinese (zh)
Inventor
俞晓光
陶玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201610268995.9A
Publication of CN107315731A
Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A text similarity calculation method, comprising: step (S1), creating, according to preset classification themes divided based on user intention and from historical text, an intent recognition classification model for the phrases in the historical text, the intent recognition classification model reflecting the probability of each phrase under each classification theme; step (S2), segmenting an object text that is the object of the similarity calculation into object phrases corresponding to the phrases in the intent recognition classification model, and, based on the intent recognition classification model, summing and normalizing the probabilities of the object phrases to obtain an intent classification vector of the object text, the intent classification vector reflecting the probability of the object text under each classification theme; and step (S3), calculating the similarity of two object texts from their intent classification vectors using the cosine method.

Description

Text similarity calculation method
Technical field
The present invention relates to a text similarity calculation method, and more particularly to a text similarity calculation method that uses an intent recognition classification model.
Background art
Text similarity refers to algorithms that determine whether two texts (for example, two questions) are similar. As one of the most basic algorithms it is widely used, and it lies at the core of a series of problems such as search engines, text ranking and related-question mining. If the pairwise similarity between texts can be calculated effectively, this series of problems can be solved as well.
Intent recognition means recognizing the intention behind a behavior. For example, in a question-and-answer dialog, every sentence of the questioner carries a certain intention, and the answerer replies according to that intention. Intent recognition is widely used in scenarios such as related-question search engines and chatbots. In a chatbot in particular, intent recognition is the core module of the whole system. When answering users' questions, all questions are divided in advance into classification themes, one theme per user intention (taking dialogs between a company's customer service and users as an example, a theme is a service point, for example returns and exchanges, or delivery address). Each time the user asks a question, the question is mapped to one of the themes, and the answer corresponding to that theme is then given.
Machine learning is a science of artificial intelligence. The main research subject of the field is artificial intelligence, in particular how to improve the performance of specific algorithms through learning from experience. Common machine learning methods can be divided into supervised learning, semi-supervised learning and unsupervised learning.
In so-called supervised learning, a function is learned from a given training data set, and when new data arrive, a result can be predicted according to that function. The training set of supervised learning is required to contain inputs and outputs, which may also be called features and targets. The targets in the training set can be labeled in advance.
A so-called topic model is a method of modeling the latent topics of texts. Given a training corpus, it automatically divides the corpus into different topics and is used to predict which topic a new piece of text belongs to.
LR (logistic regression) is a commonly used supervised learning algorithm.
Bag of words is a document representation method.
For example, given the dictionary:
{"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}
and the text:
John likes to watch movies. Mary likes too.
the text can be converted, according to the dictionary, into the following vector:
[1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
where 1 indicates that the corresponding word of the dictionary occurs in the text, and 0 indicates that it does not.
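Purely as an illustration (this code is not part of the original patent text), the binary bag-of-words vector above can be reproduced with a short sketch like the following; the function name and the tokenization rule are assumptions.

```python
import re

# The example dictionary above: word -> 1-based index in the vector.
dictionary = {"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5,
              "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}

def bag_of_words(text, dictionary):
    """Binary bag of words: 1 if the dictionary word occurs in the text, else 0."""
    tokens = set(re.findall(r"[A-Za-z]+", text))
    ordered_words = sorted(dictionary, key=dictionary.get)
    return [1 if word in tokens else 0 for word in ordered_words]

print(bag_of_words("John likes to watch movies. Mary likes too.", dictionary))
# Prints [1, 1, 1, 1, 1, 0, 0, 0, 1, 1], matching the vector above.
```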
There are many existing methods for calculating text similarity, for example converting the texts into word vectors and then taking the cosine of the angle between the vectors, or a series of algorithms such as BM25 (BM stands for Best Matching, an optimal matching criterion) and LCS (Longest Common Subsequence).
However, existing algorithms for calculating text similarity can often only reflect the similarity of texts in certain respects, and the algorithms are essentially strongly tied to the literal wording of the texts. On the one hand, when two texts match on a core word or merely match on a common stop word, the similarity given by such an algorithm is the same and the two cases cannot be distinguished; on the other hand, if two texts contain synonyms, the similarity comes out very low because the wording differs, even though they express the same meaning. As for general topic models, since each topic is generated by automatic clustering, on the one hand the generated topics are often not humanly interpretable, and on the other hand some unrelated questions are grouped into the same topic, so the effect rarely meets expectations.
In addition, in actual use multiple similarity algorithms often need to be fused, and even then the effect is difficult to make satisfactory.
Summary of the invention
In view of the above problems of the prior art, namely that it essentially always has a strong correlation with the literal wording of the text and cannot really judge text similarity from the semantic level of the text, the object of the present invention is to provide a text similarity calculation method that completely avoids the drawback in the prior art of calculating similarity according to the literal wording, and that has higher accuracy and a better effect.
The text similarity calculation method of one aspect of the present invention comprises: step (S1), creating, according to preset classification themes divided based on user intention and from historical text, an intent recognition classification model for the phrases in the historical text, the intent recognition classification model reflecting the probability of each phrase under each classification theme; step (S2), segmenting an object text that is the object of the similarity calculation into object phrases corresponding to the phrases in the intent recognition classification model, and, based on the intent recognition classification model, summing and normalizing the probabilities of the object phrases to obtain an intent classification vector of the object text, the intent classification vector reflecting the probability of the object text under each classification theme; and step (S3), calculating the similarity of two object texts from their intent classification vectors using the cosine method.
In the text similarity calculation method according to one aspect of the present invention, the formula of the cosine method is:
\cos\theta = \frac{\sum_{i=1}^{n}(A_i \times B_i)}{\sqrt{\sum_{i=1}^{n}(A_i)^2} \times \sqrt{\sum_{i=1}^{n}(B_i)^2}}
where cos θ denotes the similarity, i is the index of the classification theme of the intent classification vector and takes positive integer values from 1 to n, A denotes the first object text, B denotes the second object text, and A_i and B_i denote the probability of the first object text and of the second object text, respectively, under the current classification theme.
In the text similarity calculation method according to one aspect of the present invention, the intent recognition classification model is created by the bag-of-words method combined with the logistic regression algorithm.
In the text similarity calculation method according to one aspect of the present invention, the classification themes are service points of dialogs between customer service and users.
In the text similarity calculation method according to one aspect of the present invention, the historical text is text in historical consultation logs of dialogs between customer service and users.
In the text similarity calculation method according to one aspect of the present invention, the phrases are a subset of phrases filtered out of the historical text as needed.
In the text similarity calculation method according to one aspect of the present invention, the number of classification themes is the dimension of the intent classification vector.
In the text similarity calculation method according to one aspect of the present invention, the probabilities are the component values of the intent classification vector.
In summary, the above technical solution of the text similarity calculation method of the present invention realizes a text similarity calculation method with higher accuracy and a better effect, and avoids the drawback in the prior art of calculating similarity entirely according to the literal wording.
Brief description of the drawings
Fig. 1 is the general block diagram of the text similarity calculation method of the present invention.
Fig. 2 is a flowchart of step S1 of the text similarity calculation method of the present invention, in which the intent recognition classification model is created.
Fig. 3 is a flowchart of step S2 of the text similarity calculation method of the present invention, in which the intent classification vector of the object text is obtained.
Embodiment
The present invention is a text similarity calculation method that makes use of an intent recognition classification model. According to classification themes that have been divided in advance, the intent recognition classification model can map a text onto the corresponding classification themes and thereby obtain information about its semantic level. Text similarity is then calculated on this basis, so a better effect can be obtained.
In order to make the object, technical solution and advantages of the present invention clearer, the present invention is described in detail below in conjunction with specific embodiments and with reference to the drawings.
Fig. 1 is the general block diagram of the text similarity calculation method of the present invention. As shown in Fig. 1, the text similarity calculation method includes: step S1 of creating the intent recognition classification model; step S2 of obtaining the intent classification vector of the object text; and step S3 of calculating the similarity.
Fig. 2 is a flowchart of step S1 of the text similarity calculation method of the present invention, in which the intent recognition classification model is created.
As shown in Fig. 2 in the step S1 for creating intention assessment disaggregated model, first, setting in advance Surely the classification scheme (step S1-1) classified by the intention of user.With company's customer service and user couple Exemplified by words, a classification scheme is exactly a service point, and each problem (text) of user can be with Corresponding service point correspondence in these service points.For example, it is assumed herein that being divided into 3 kinds of classification schemes: " relevant freight charges ", " relevant goods return and replacement ", " relevant delivery address ".
Then, obtain history text and (by taking company's customer service and user session as an example, then seek advice from day for history Text in will), and history text is subjected to cutting word, to determine modeling phrase (step S1-2). That is, by taking above-mentioned Bag of words (bag of words) method as an example, it can be cut into and the dictionary number in bag of words According to corresponding phrase one by one, modeling phrase is used as.Here, can not be all conducts of all phrases Modeling phrase, but actually useful a part of phrase can be filtered out as needed as modeling Use phrase.
Then, for each identified modeling phrase, according to above-mentioned default classification scheme, profit With known algorithm (for example, using Bag of words (bag of words) method, every text is converted to Vector, is that logistic regression algorithm carries out model training using LR (Logistic regression) then), Create the intention assessment disaggregated model (step S1-3) for each phrase.
Here, the output of intention assessment disaggregated model is a vector (also referred to as theme vector), to The classification scheme number of the dimension of amount and above-mentioned division is consistent (in this example, being " 3 "), per one-dimensional Numerical value represent text or phrase belongs to the probability of corresponding classification scheme, probability is bigger to represent text This or phrase are more likely to belong to current class theme, and vectorial all dimensions add up to 1.
It is following【Table 1】, there is shown one of the intention assessment disaggregated model for phrase created shows Example.(here, table 1 indicates an example, numerical value not actual numerical value.Moreover, the intention assessment Disaggregated model is a kind of existing machine learning algorithm, more than one, different its algorithm logic of algorithm It is different)
【Table 1】
Phrase          Freight    Returns and exchanges    Delivery address
item            0.33       0.33                     0.33
shipping        0.45       0.10                     0.45
free shipping   0.80       0.10                     0.10
where           0.15       0.05                     0.80
freight         0.80       0.10                     0.10
...             ...        ...                      ...
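The patent gives no code for this step; the following is a non-authoritative sketch, assuming scikit-learn's CountVectorizer and LogisticRegression as the "bag of words plus LR" combination named above, with a tiny hypothetical training set. A row of Table 1 then corresponds to the predicted theme probabilities of a single modeling phrase.

```python
# A minimal sketch, assuming scikit-learn; not the patent's actual implementation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical pre-segmented historical texts (space-joined phrases) and their
# preset classification themes: 0 = freight, 1 = returns/exchanges, 2 = delivery address.
texts = ["freight who pays", "item free-shipping", "return the item",
         "exchange the goods", "where is it shipped from", "change delivery address"]
labels = [0, 0, 1, 1, 2, 2]

vectorizer = CountVectorizer(token_pattern=r"[^ ]+")   # bag-of-words over phrases
X = vectorizer.fit_transform(texts)
model = LogisticRegression(max_iter=1000).fit(X, labels)

def phrase_probabilities(phrase):
    """Probability of a single modeling phrase under each theme (one row of Table 1)."""
    return model.predict_proba(vectorizer.transform([phrase]))[0]

print(phrase_probabilities("freight"))   # e.g. roughly [0.8, 0.1, 0.1] for a freight-related phrase
```

Any classifier that outputs one probability per preset classification theme could play the same role here; bag of words plus logistic regression is simply the combination the description names as an example.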
Fig. 3 is a flowchart of step S2 of the text similarity calculation method of the present invention, in which the intent classification vector of the object text is obtained.
As shown in Fig. 3, in step S2 of obtaining the intent classification vector of the object text, first, an object text that is an object of the similarity assessment is obtained (step S2-1).
Then, the intent recognition vector of the object text is obtained using the intent recognition classification model created above (step S2-2). Specifically, the input of the intent recognition classification model is the object text, and its output is a vector (also called a theme vector). The dimension of the vector equals the number of classification themes divided above (in this example, 3), the value of each dimension represents the probability that the text or phrase belongs to the corresponding classification theme, a larger probability means that the text or phrase is more likely to belong to that classification theme, and all dimensions of the vector sum to 1.
For example, assume that the object text is "who pays the freight for shipping the item". Word segmentation according to the bag-of-words method gives the phrases "item", "shipping", "freight" and "who pays". Then, according to the intent recognition classification model for phrases in 【Table 1】 above, the intent recognition vector of the object text, i.e. the probability that the object text belongs to each corresponding classification theme, is obtained by summing and normalizing. For example, the specific calculation (the sum-and-normalize algorithm) is as follows.
In the first step, the probability that the text belongs to each classification theme is calculated:
probability of belonging to classification theme 1 (e.g. "freight"): P1 = 0.33 + 0.45 + 0.80;
probability of belonging to classification theme 2 (e.g. "returns and exchanges"): P2 = 0.33 + 0.10 + 0.10;
·····
probability of belonging to classification theme n: Pn = xxx + xxx + xxx.
In the second step, each probability is normalized:
final probability of belonging to classification theme 1 = P1 / (P1 + P2 + ... + Pn);
final probability of belonging to classification theme 2 = P2 / (P1 + P2 + ... + Pn);
·····
final probability of belonging to classification theme n = Pn / (P1 + P2 + ... + Pn).
Here, this is again only an example and the values are not actual values. Moreover, this is not the only possible algorithm.
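As an illustrative sketch only (the per-phrase rows are the example values from Table 1 above; the code itself is assumed and not taken from the patent), the sum-and-normalize step could look like this:

```python
# Example per-phrase probabilities from Table 1 above
# (themes: freight, returns and exchanges, delivery address).
phrase_model = {
    "item":          [0.33, 0.33, 0.33],
    "shipping":      [0.45, 0.10, 0.45],
    "free shipping": [0.80, 0.10, 0.10],
    "where":         [0.15, 0.05, 0.80],
    "freight":       [0.80, 0.10, 0.10],
}

def intent_vector(object_phrases, phrase_model, n_themes=3):
    """Sum the per-theme probabilities of the known phrases, then normalize to 1."""
    sums = [0.0] * n_themes
    for phrase in object_phrases:
        for i, p in enumerate(phrase_model.get(phrase, [0.0] * n_themes)):
            sums[i] += p                                 # first step: per-theme sums P1..Pn
    total = sum(sums)
    return [s / total for s in sums] if total else sums  # second step: normalize

# Segmentation of the example object text "who pays the freight for shipping the item";
# "who pays" is not in the model and therefore contributes nothing.
print(intent_vector(["item", "shipping", "freight", "who pays"], phrase_model))
```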
Then, it is judged whether all object texts to be assessed for similarity have been obtained. If not ("No"), the process returns to step S2-1 to obtain the next object text; if so ("Yes"), the process proceeds to step S3.
The following 【Table 2】 shows an example of the intent classification vectors of the obtained object texts. (Here, Table 2 is also only an example; the values are not actual values and do not fully match 【Table 1】 above.)
【Table 2】
In step S3 of calculating the similarity, the similarity of two texts is calculated according to the following cosine formula (formula 1):
\cos\theta = \frac{\sum_{i=1}^{n}(A_i \times B_i)}{\sqrt{\sum_{i=1}^{n}(A_i)^2} \times \sqrt{\sum_{i=1}^{n}(B_i)^2}}    (formula 1)
where cos θ denotes the similarity, i is the dimension index of the vector, i.e. the index of the classification theme, and takes positive integer values from 1 to n (in this example, n = 3), A denotes the first object text, B denotes the second object text, and A_i and B_i denote the vector values, i.e. the probabilities, of the first object text and the second object text, respectively, under the current classification theme.
Thus, according to 【Table 2】 above, the similarity between "who pays the freight for shipping the item" and "does the product have free shipping" obtained by the above formula is 0.9967, while the similarity between "who pays the freight for shipping the item" and "where is the item shipped from" is 0.0819.
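A minimal sketch of step S3 follows, computing the cosine similarity of two intent classification vectors according to formula 1 (illustrative only; the vectors below are made-up placeholders, since the values of Table 2 are not reproduced here).

```python
import math

def cosine_similarity(a, b):
    """Formula 1: sum(A_i * B_i) / (sqrt(sum(A_i^2)) * sqrt(sum(B_i^2)))."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical intent classification vectors of two object texts.
vector_a = [0.53, 0.18, 0.29]
vector_b = [0.62, 0.15, 0.23]
print(cosine_similarity(vector_a, vector_b))
```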
It can be seen that when two texts express the same intention, the method reflects well that their similarity is high; conversely, when the intentions are far apart, the text similarity is also low. Moreover, the similarity has little to do with the literal wording: it is not the case that the closer the wording, the more similar the texts.
Thus, the similarity calculated and obtained by the above method of the present invention is an understanding at the level of text semantics and is at a higher level of abstraction than ordinary similarity calculation methods. It does not simply determine similarity by whether the literal wording of the texts is consistent, but judges, from the real intention of the texts, whether the two texts express the same meaning. Compared with ordinary literal similarity algorithms, it avoids the drawback mentioned above of calculating similarity entirely according to the literal wording. Compared with general topic models, the intent recognition classification model has a higher accuracy rate, so the effect is also better.
The specific embodiments described above further describe the object, technical solution and beneficial effects of the present invention in detail. It should be understood that the above is only a specific example of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (8)

1. A text similarity calculation method, comprising:
step (S1): creating, according to preset classification themes divided based on user intention and from historical text, an intent recognition classification model for the phrases in the historical text, the intent recognition classification model reflecting the probability of each phrase under each classification theme;
step (S2): segmenting an object text that is the object of the similarity calculation into object phrases corresponding to the phrases in the intent recognition classification model, and, based on the intent recognition classification model, summing and normalizing the probabilities of the object phrases to obtain an intent classification vector of the object text, the intent classification vector reflecting the probability of the object text under each classification theme; and
step (S3): calculating the similarity of two object texts from their intent classification vectors using the cosine method.
2. The text similarity calculation method according to claim 1, wherein
the formula of the cosine method is:
\cos\theta = \frac{\sum_{i=1}^{n}(A_i \times B_i)}{\sqrt{\sum_{i=1}^{n}(A_i)^2} \times \sqrt{\sum_{i=1}^{n}(B_i)^2}}
where cos θ denotes the similarity, i is the index of the classification theme of the intent classification vector and takes positive integer values from 1 to n, A denotes the first object text, B denotes the second object text, and A_i and B_i denote the probability of the first object text and of the second object text, respectively, under the current classification theme.
3. The text similarity calculation method according to claim 1, wherein
the intent recognition classification model is created by the bag-of-words method combined with the logistic regression algorithm.
4. The text similarity calculation method according to claim 1, wherein
the classification themes are service points of dialogs between customer service and users.
5. The text similarity calculation method according to claim 1, wherein
the historical text is text in historical consultation logs of dialogs between customer service and users.
6. The text similarity calculation method according to claim 1, wherein
the phrases are a subset of phrases filtered out of the historical text as needed.
7. The text similarity calculation method according to claim 1, wherein
the number of classification themes is the dimension of the intent classification vector.
8. The text similarity calculation method according to claim 1, wherein
the probabilities are the component values of the intent classification vector.
CN201610268995.9A 2016-04-27 2016-04-27 Text similarity computing method Pending CN107315731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610268995.9A CN107315731A (en) 2016-04-27 2016-04-27 Text similarity computing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610268995.9A CN107315731A (en) 2016-04-27 2016-04-27 Text similarity computing method

Publications (1)

Publication Number Publication Date
CN107315731A true CN107315731A (en) 2017-11-03

Family

ID=60184590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610268995.9A Pending CN107315731A (en) 2016-04-27 2016-04-27 Text similarity computing method

Country Status (1)

Country Link
CN (1) CN107315731A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1755687A (en) * 2004-09-30 2006-04-05 微软公司 Forming intent-based clusters and employing same by search engine
CN101621391A (en) * 2009-08-07 2010-01-06 北京百问百答网络技术有限公司 Method and system for classifying short texts based on probability topic
CN102681983A (en) * 2011-03-07 2012-09-19 北京百度网讯科技有限公司 Alignment method and device for text data
CN102662987A (en) * 2012-03-14 2012-09-12 华侨大学 Classification method of web text semantic based on Baidu Baike
CN102880723A (en) * 2012-10-22 2013-01-16 深圳市宜搜科技发展有限公司 Searching method and system for identifying user retrieval intention
CN103823844A (en) * 2014-01-26 2014-05-28 北京邮电大学 Question forwarding system and question forwarding method on the basis of subjective and objective context and in community question-and-answer service
CN104050256A (en) * 2014-06-13 2014-09-17 西安蒜泥电子科技有限责任公司 Initiative study-based questioning and answering method and questioning and answering system adopting initiative study-based questioning and answering method
CN104408153A (en) * 2014-12-03 2015-03-11 中国科学院自动化研究所 Short text hash learning method based on multi-granularity topic models
CN104516986A (en) * 2015-01-16 2015-04-15 青岛理工大学 Method and device for recognizing sentence
CN104731958A (en) * 2015-04-03 2015-06-24 北京航空航天大学 User-demand-oriented cloud manufacturing service recommendation method
CN104951433A (en) * 2015-06-24 2015-09-30 北京京东尚科信息技术有限公司 Method and system for intention recognition based on context
CN105653738A (en) * 2016-03-01 2016-06-08 北京百度网讯科技有限公司 Search result broadcasting method and device based on artificial intelligence

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111373391B (en) * 2017-11-29 2023-10-20 三菱电机株式会社 Language processing device, language processing system, and language processing method
CN111373391A (en) * 2017-11-29 2020-07-03 三菱电机株式会社 Language processing device, language processing system, and language processing method
CN110019715A (en) * 2017-12-08 2019-07-16 阿里巴巴集团控股有限公司 Response determines method, apparatus, equipment, medium and system
CN110019715B (en) * 2017-12-08 2023-07-14 阿里巴巴集团控股有限公司 Response determination method, device, equipment, medium and system
CN108334891A (en) * 2017-12-15 2018-07-27 北京奇艺世纪科技有限公司 A kind of Task intent classifier method and device
CN108388914B (en) * 2018-02-26 2022-04-01 中译语通科技股份有限公司 Classifier construction method based on semantic calculation and classifier
CN108388914A (en) * 2018-02-26 2018-08-10 中译语通科技股份有限公司 A kind of grader construction method, grader based on semantic computation
CN109344857A (en) * 2018-08-14 2019-02-15 重庆邂智科技有限公司 Text similarity measurement method and device, terminal and storage medium
CN109284486B (en) * 2018-08-14 2023-08-22 重庆邂智科技有限公司 Text similarity measurement method, device, terminal and storage medium
CN109344857B (en) * 2018-08-14 2022-05-13 重庆邂智科技有限公司 Text similarity measurement method and device, terminal and storage medium
CN109284486A (en) * 2018-08-14 2019-01-29 重庆邂智科技有限公司 Text similarity measure, device, terminal and storage medium
CN109635105A (en) * 2018-10-29 2019-04-16 厦门快商通信息技术有限公司 A kind of more intension recognizing methods of Chinese text and system
CN111428010A (en) * 2019-01-10 2020-07-17 北京京东尚科信息技术有限公司 Man-machine intelligent question and answer method and device
CN111428010B (en) * 2019-01-10 2024-01-12 北京汇钧科技有限公司 Man-machine intelligent question-answering method and device
CN112527985A (en) * 2020-12-04 2021-03-19 杭州远传新业科技有限公司 Unknown problem processing method, device, equipment and medium
CN115187153B (en) * 2022-09-14 2022-12-09 杭银消费金融股份有限公司 Data processing method and system applied to business risk tracing
CN115187153A (en) * 2022-09-14 2022-10-14 杭银消费金融股份有限公司 Data processing method and system applied to business risk tracing

Similar Documents

Publication Publication Date Title
CN107315731A (en) Text similarity computing method
CN109493166B (en) Construction method for task type dialogue system aiming at e-commerce shopping guide scene
CN104951433B (en) The method and system of intention assessment is carried out based on context
CN104111933B (en) Obtain business object label, set up the method and device of training pattern
CN104820629B (en) A kind of intelligent public sentiment accident emergent treatment system and method
CN103150333B (en) Opinion leader identification method in microblog media
CN109189904A (en) Individuation search method and system
CN110147445A (en) Intension recognizing method, device, equipment and storage medium based on text classification
CN109767318A (en) Loan product recommended method, device, equipment and storage medium
CN107977415A (en) Automatic question-answering method and device
CN103116588A (en) Method and system for personalized recommendation
CN106126751A (en) A kind of sorting technique with time availability and device
CN105844424A (en) Product quality problem discovery and risk assessment method based on network comments
CN102193936A (en) Data classification method and device
CN105022754A (en) Social network based object classification method and apparatus
Seret et al. A new SOM-based method for profile generation: Theory and an application in direct marketing
CN105787025A (en) Network platform public account classifying method and device
CN109766557A (en) A kind of sentiment analysis method, apparatus, storage medium and terminal device
CN104750674A (en) Man-machine conversation satisfaction degree prediction method and system
CN106844407A (en) Label network production method and system based on data set correlation
CN112215629B (en) Multi-target advertisement generating system and method based on construction countermeasure sample
CN104572915A (en) User event relevance calculation method based on content environment enhancement
CN110134866A (en) Information recommendation method and device
Catapang et al. A bilingual chatbot using support vector classifier on an automatic corpus engine dataset
CN114036289A (en) Intention identification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171103