CN106484664A - Method for calculating similarity between short texts - Google Patents

Info

Publication number
CN106484664A
Authority
CN
China
Prior art keywords
question
user input
similarity
candidate
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610920608.5A
Other languages
Chinese (zh)
Other versions
CN106484664B (en
Inventor
简仁贤
陈秀龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intelligent Technology (shanghai) Co Ltd
Original Assignee
Intelligent Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intelligent Technology (shanghai) Co Ltd filed Critical Intelligent Technology (shanghai) Co Ltd
Priority to CN201610920608.5A priority Critical patent/CN106484664B/en
Publication of CN106484664A publication Critical patent/CN106484664A/en
Application granted granted Critical
Publication of CN106484664B publication Critical patent/CN106484664B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for calculating the similarity between short texts. Corpus data are obtained and preprocessed to produce a training corpus. From the training corpus, a keyword extraction model is obtained, the corpus is segmented into words with a word segmentation tool, and a word vector set is trained with word2vec. The user input text and the question of a candidate question-answer pair are obtained, and a word segmentation result and a keyword extraction result are produced for each. From these results, the word vectors of the candidate question and of the user input text are obtained from the word vector set, a sentence vector is computed from the word vectors of each text, and the similarity between the two sentence vectors is calculated. The similarity is then corrected using information contained in the user input text and in the candidate question, yielding a corrected similarity. The invention calculates the cosine similarity between the sentence vectors of the user input and the candidate question, and corrects that similarity according to sentence type, named entities, and pronouns.

Description

Method for calculating similarity between short texts
Technical field
The present invention relates to the field of Internet technology, and in particular to intelligent human-computer dialogue.
Background
As society becomes increasingly information-driven and the cost of manual service keeps rising, people increasingly want to communicate with computers in natural language. Intelligent human-computer chat systems are a product of this trend.
Existing human-computer dialogue systems are implemented mainly in two ways: retrieval models and generative models. A retrieval model treats one round of human-computer dialogue as one information retrieval operation. A certain number of question-answer pairs (each consisting of one question and several answers) are prepared in advance, and an index is built over their questions. When the user enters one or several sentences, the input is treated as a query: the system finds, among all candidate question-answer pairs, the question that is semantically closest to the input and returns that question's answer to the user, completing one round of dialogue. The key to obtaining an appropriate answer is therefore finding the question most semantically similar to the user input. Because both the user input and the questions of candidate question-answer pairs are usually short texts made up of one or a few short sentences, the problem reduces to calculating similarity between short texts.
In the prior art, similarity between short texts is calculated by converting the user input and the question of each candidate question-answer pair into sentence vectors of the same dimension, where each dimension holds the TF*IDF value of a word (token) in the user input or the candidate question. The similarity between the two is then measured, for example by cosine similarity, and used to rank all candidate question-answer pairs. This is a common method in search engines. However, finding the most similar question by the cosine similarity of TF*IDF vectors considers only the textual similarity between sentences, that is, it judges similarity by how many words literally repeat. This is clearly insufficient. For example, "I am very tired" and "I want to sleep" have the same meaning but share almost no words, and the method cannot handle such cases. In addition, because dialogue systems usually deal with short sentences, TF is essentially always 1 and contributes little, which further limits the method.
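The limitation described above can be seen in a small sketch. It uses English stand-ins for the two example sentences and raw word counts instead of true TF*IDF weights; both simplifications, and the function name `bow_cosine`, are assumptions made for illustration.

```python
import math
from collections import Counter

def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Only words that appear in both sentences contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

tired = ["I", "am", "very", "tired"]
sleep = ["I", "want", "to", "sleep"]
score = bow_cosine(tired, sleep)  # the only shared word is "I"
```

The score stays low even though the two sentences mean nearly the same thing, because a purely lexical vector has no way to relate "tired" to "sleep".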
The defect of the prior art is therefore that computing the cosine similarity between the user input and the candidate question from the TF*IDF values of their words considers only textual similarity: sentence similarity is judged solely by how many words literally repeat. The resulting similarity judgments are very inaccurate, which directly makes the replies given to the user by the human-computer dialogue system inaccurate.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for calculating similarity between short texts. The method segments the user input and the question of a candidate question-answer pair into words and extracts keywords, obtains the corresponding word vectors, computes the corresponding sentence vectors from the word vectors, and then calculates the similarity between the two sentence vectors. Finally, the similarity is corrected according to sentence type, named entities, and pronouns, making it more accurate and thereby improving the accuracy of the replies given to the user in a human-computer dialogue system.
To solve the above technical problem, the present invention provides the following technical solution:
The present invention provides a method for calculating similarity between short texts, comprising:
Step S1: obtain corpus data and preprocess the corpus data to obtain a training corpus;
Step S2: obtain a keyword extraction model from the training corpus, segment the training corpus into words with a word segmentation tool, and train a word vector set with word2vec;
Step S3: obtain the user input text and the question of a candidate question-answer pair; segment the candidate question and the user input text separately with the word segmentation tool, and extract keywords from each with the keyword extraction model, to obtain the word segmentation result and keyword extraction result of the candidate question and the word segmentation result and keyword extraction result of the user input text;
Step S4: from the word segmentation result and keyword extraction result of the candidate question, obtain the word vectors of the candidate question from the word vector set; from the word segmentation result and keyword extraction result of the user input text, obtain the word vectors of the user input text from the word vector set;
Step S5: from the word vectors of the candidate question, compute the sentence vector of the candidate question; from the word vectors of the user input text, compute the sentence vector of the user input text;
Step S6: from the sentence vector of the candidate question and the sentence vector of the user input text, calculate the similarity between the two sentence vectors;
Step S7: starting from the similarity between the sentence vectors, correct the similarity using the information contained in the user input text and in the candidate question, to obtain a corrected similarity.
In this technical solution, corpus data are first obtained and preprocessed to produce a training corpus. From the training corpus, a keyword extraction model is obtained, the training corpus is segmented with a word segmentation tool, and a word vector set is trained with word2vec. The user input text and the question of a candidate question-answer pair are then obtained; each is segmented with the word segmentation tool and passed through the keyword extraction model, giving a word segmentation result and a keyword extraction result for the candidate question and for the user input text. From these results, the word vectors of the candidate question and of the user input text are obtained from the word vector set. The sentence vector of the candidate question is computed from its word vectors, and the sentence vector of the user input text from its word vectors. The similarity between the two sentence vectors is calculated, and finally this similarity is corrected using the information contained in the user input text and in the candidate question, to obtain the corrected similarity.
The method segments the user input text and the candidate question into words and extracts their keywords, obtains the word vectors of both texts from the segmented words and keywords, and computes a sentence vector for each text from its word vectors. The cosine similarity between the two sentence vectors is then calculated and further corrected using the information contained in the user input text and the candidate question. The result is a more accurate similarity, which makes the answers returned to the user by the human-computer dialogue system more accurate.
Further, the information contained in the user input text and in the candidate question comprises sentence type, named entities, and personal pronouns, where the named entities include place names and organization names.
Calculating the similarity between the sentence vectors of the user input text and the candidate question gives an accurate result in most cases. When sentence type, named entities, and personal pronouns matter, however, relying solely on the similarity between the two sentence vectors to judge whether the texts are semantically similar is still not accurate enough, so the similarity must be corrected. The present invention therefore also analyzes the information in the user input text and the candidate question, namely sentence type, named entities, and personal pronouns, and uses it to correct the similarity further, improving the accuracy with which the human-computer dialogue system replies to user questions.
Further, in step S2, obtaining the keyword extraction model comprises:
Step S21: obtain a keyword training corpus and segment it into words to obtain a word segmentation result;
Step S22: manually label the keywords in the word segmentation result to obtain a manually labeled keyword training corpus;
Step S23: train on the manually labeled keyword training corpus by maximum entropy to obtain the keyword extraction model.
The keyword extraction model extracts the keywords from the segmented words; that is, the segmented words include the keywords. Because keywords better represent the meaning of a text, extracting them and combining them with the segmented words gives a more accurate similarity than using the segmented words alone. To train the keyword extraction model, a keyword training corpus is first obtained; this corpus may differ from the corpus used to train the word vectors. Keywords among the segmented words are then labeled manually, and a model is trained by the maximum entropy method. Given any new, unlabeled text, the model automatically outputs which words are keywords and which are not, producing the keyword set that helps improve the similarity between sentence vectors.
Further, the keyword extraction model is a two-class classifier. This classifier predicts which words in a sentence are keywords, improving the accuracy of keyword extraction.
Further, the word vector set is obtained by training a word2vec model. word2vec is a neural network model; in the word vectors it produces, words that frequently occur together are more similar, that is, the semantic information in the word vectors is captured from word co-occurrence. With word vectors trained by word2vec, combined with the keyword information, more accurate sentence vectors can be computed, making the similarity more accurate.
Further, the word vectors of the candidate question and the word vectors of the user input text have the same dimension. Identical dimensions make it straightforward to compute the mean of the word vectors of all segmented words and the mean of the keyword word vectors, then the sentence vectors of the candidate question and of the user input text, and finally the similarity between the two sentence vectors.
Further, the named entities and personal pronouns are obtained by a dictionary method. The similarity can be corrected using named entities and personal pronouns; for named entities, the present invention mainly considers place names and organization names. Named entity correction addresses the case where a difference in place names makes two sentences semantically dissimilar. For example, "What is good to eat in Beijing?" and "What is good to eat in Tianjin?" are identical except for the place name, so their computed similarity is high even though they ask about different places, and the similarity must be corrected using the named entities. Whether the place names or organization names in the two sentences match can be checked directly, which improves the similarity judgment between sentences. The present invention therefore uses a dictionary method: the dictionary file contains the main prefecture-level cities of China, and the place names are mutually independent, with no inclusion relation.
Further, the corpus data are obtained by web crawling. A web crawler is a program that automatically retrieves web page content. A large volume of material is crawled from post bars (Tieba), question-and-answer communities, forums, microblogs, encyclopedias, news sites, and similar sources; posts and replies that are fairly long, semantically rich, and at the same time relatively colloquial are preferred as training material, so that the corpus is comprehensive and rich. The choice of corpus affects the quality of the trained models and ultimately the similarity.
Further, the similarity between the sentence vectors is calculated as the cosine similarity. The cosine of the angle between the two sentence vectors is computed; the closer the cosine is to 1, the closer the angle is to 0 degrees, that is, the more similar the two sentence vectors are. The cosine computation is fast and simple, which improves system performance.
Further, the word segmentation tool is the Chinese language processing toolkit HanLP. The training corpus is segmented with this tool. HanLP (Han Language Processing) is a free, open-source Chinese language processing package: a Java toolkit made up of a series of models and algorithms that supports Chinese word segmentation, keyword extraction, index-mode segmentation, and related functions, and also provides lexical analysis, syntactic analysis, and semantic understanding. HanLP is feature-complete, efficient, clearly architected, trained on up-to-date corpora, and customizable. The present invention therefore selects HanLP as the word segmentation tool.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below.
Fig. 1 is a flow chart of a method for calculating similarity between short texts according to a first embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples only and do not limit the scope of protection of the present invention.
Embodiment 1
Fig. 1 is a flow chart of a method for calculating similarity between short texts according to a first embodiment of the present invention. As shown in Fig. 1, the method according to the first embodiment comprises:
Step S1: obtain corpus data and preprocess the corpus data to obtain a training corpus;
Step S2: obtain a keyword extraction model from the training corpus, segment the training corpus into words with a word segmentation tool, and train a word vector set with word2vec;
Step S3: obtain the user input text and the question of a candidate question-answer pair; segment the candidate question and the user input text separately with the word segmentation tool, and extract keywords from each with the keyword extraction model, to obtain the word segmentation result and keyword extraction result of the candidate question and the word segmentation result and keyword extraction result of the user input text;
Step S4: from the word segmentation result and keyword extraction result of the candidate question, obtain the word vectors of the candidate question from the word vector set; from the word segmentation result and keyword extraction result of the user input text, obtain the word vectors of the user input text from the word vector set;
Step S5: from the word vectors of the candidate question, compute the sentence vector of the candidate question; from the word vectors of the user input text, compute the sentence vector of the user input text;
Step S6: from the sentence vector of the candidate question and the sentence vector of the user input text, calculate the similarity between the two sentence vectors;
Step S7: starting from the similarity between the sentence vectors, correct the similarity using the information contained in the user input text and in the candidate question, to obtain a corrected similarity.
In this technical solution, corpus data are first obtained and preprocessed to produce a training corpus. From the training corpus, a keyword extraction model is obtained, the training corpus is segmented with a word segmentation tool, and a word vector set is trained with word2vec. The user input text and the question of a candidate question-answer pair are then obtained; each is segmented with the word segmentation tool and passed through the keyword extraction model, giving a word segmentation result and a keyword extraction result for the candidate question and for the user input text. From these results, the word vectors of both texts are obtained from the word vector set. The sentence vector of the candidate question is computed from its word vectors, and the sentence vector of the user input text from its word vectors. The similarity between the two sentence vectors is calculated and then corrected using the information contained in the user input text and in the candidate question, to obtain the corrected similarity.
The method segments the user input text and the candidate question into words and extracts their keywords, obtains the word vectors of both texts from the segmented words, the keywords, and the word vector set, and computes a sentence vector for each text from its word vectors. The cosine similarity between the two sentence vectors is then calculated and further corrected using the information contained in the user input text and the candidate question. The result is a more accurate similarity, which makes the answers returned to the user by the human-computer dialogue system more accurate.
Specifically, the corpus data are obtained by web crawling. A web crawler is a program that automatically retrieves web page content. A large volume of material is crawled from post bars (Tieba), question-and-answer communities, forums, microblogs, encyclopedias, news sites, and similar sources; posts and replies that are fairly long, semantically rich, and at the same time relatively colloquial are preferred as training material, so that the corpus is comprehensive and rich. The choice of corpus affects the quality of the trained models, and therefore directly affects word segmentation, keyword extraction, and the word vector set, and ultimately the similarity.
The crawled corpus data must also be preprocessed to obtain the training corpus. Preprocessing mainly filters out non-Chinese content, pornographic content, advertisements, and the like. Multiple paragraphs belonging to the same piece of content are then joined into a single line, traditional Chinese characters are converted to simplified characters, the text is segmented into words, and punctuation marks are removed and replaced with spaces.
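A minimal sketch of part of this preprocessing follows. It covers only three of the steps (dropping content with no Chinese characters, joining paragraphs into one line, and replacing punctuation with spaces). Filtering of pornographic content and advertisements, traditional-to-simplified conversion, and word segmentation need external resources and are left out. The function name `preprocess` and the choice of the basic CJK Unicode block as the test for "Chinese content" are assumptions.

```python
import re

# Basic CJK Unified Ideographs block; an assumed definition of "Chinese content".
_CJK = re.compile(r"[\u4e00-\u9fff]")
# Anything that is neither a word character nor whitespace is treated as punctuation.
_PUNCT = re.compile(r"[^\w\s]")

def preprocess(document):
    """Return one cleaned line for a crawled document, or None if it has no Chinese text."""
    if not _CJK.search(document):
        return None  # drop non-Chinese content
    # Join the paragraphs of the same document into a single line.
    one_line = " ".join(part.strip() for part in document.splitlines() if part.strip())
    # Replace punctuation marks with spaces, then collapse repeated whitespace.
    no_punct = _PUNCT.sub(" ", one_line)
    return re.sub(r"\s+", " ", no_punct).strip()

cleaned = preprocess("北京有什么好吃的？\n我想去天坛。")
```

Here `cleaned` is a single line with the two paragraphs separated by a space and the punctuation removed, ready to be passed to a word segmentation tool.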
After the corpus data are preprocessed, the word vector of each word is obtained by training a word2vec model. word2vec is a neural network model, and the semantic information in the word vectors it produces is captured from word co-occurrence. With the word vector set trained by word2vec, combined with the keyword information, more accurate sentence vectors can be computed, making the similarity more accurate.
The keyword extraction model is obtained from the training corpus. The training corpus is first segmented into words, keywords in the sentences are then labeled manually (unlabeled words are non-keywords), and a two-class classifier is trained by maximum entropy. After segmentation, the user input text and the candidate question are fed into the keyword extraction model, which performs a two-class prediction on each segmented word to decide whether it is a keyword, yielding the keyword set of each text. To further improve system performance, word segmentation and keyword extraction for all questions in the question-answer pairs can be performed in advance. The word segmentation tool and the keyword extraction model thus provide the segmented words and keywords of the user input text and of the questions in the question-answer pairs, and the word2vec word vector set then provides the word vector of each word in them.
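A two-class maximum entropy model is mathematically equivalent to logistic regression, so the classifier described above can be sketched as follows. The source does not say which features its model uses; the feature encoding here (a bias term, a content-word flag, a function-word flag), the learning rate, the number of epochs, and the toy training data are all assumptions made for illustration.

```python
import math

def _score(weights, features):
    """Linear score of one word's feature vector."""
    return sum(w * x for w, x in zip(weights, features))

def train_binary_maxent(samples, labels, epochs=200, learning_rate=0.5):
    """Fit a two-class maximum entropy (logistic regression) model by stochastic gradient ascent."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            prob = 1.0 / (1.0 + math.exp(-_score(weights, features)))
            # The log-likelihood gradient for one sample is (label - prob) * features.
            weights = [w + learning_rate * (label - prob) * x
                       for w, x in zip(weights, features)]
    return weights

def is_keyword(weights, features):
    """Classify one segmented word: True means keyword, False means non-keyword."""
    return _score(weights, features) > 0.0

# Hypothetical features per word: [bias, is_content_word, is_function_word].
train_x = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 1]]
train_y = [1, 0, 1, 0]  # manual labels: 1 = keyword, 0 = non-keyword
model = train_binary_maxent(train_x, train_y)
```

A real system would derive much richer features from the labeled corpus; the point of the sketch is only that each segmented word receives an independent keyword or non-keyword decision.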
The sentence vectors of the user input text and of the question in the question-answer pair are computed the same way: 0.8 * (mean of the word vectors of all segmented words) + 0.2 * (mean of the keyword word vectors). The mean of a set of vectors is obtained by adding the vectors dimension by dimension and dividing by the number of vectors. Because the segmented words already include the keywords, this computation gives the keywords extra weight, since keywords better represent the meaning of the text. The weights 0.8 and 0.2 were determined through repeated experiments. Because the word vectors all have 300 dimensions, the sentence vectors of the user input and of the candidate question also have 300 dimensions.
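The weighted average above can be sketched in a few lines. The function names are hypothetical, the example uses 2-dimensional vectors instead of 300 for readability, and the fallback for a text with no extracted keywords is an assumption, since the source does not cover that case.

```python
def mean_vector(vectors):
    """Element-wise mean: add each dimension, then divide by the number of vectors."""
    count = len(vectors)
    return [sum(dim) / count for dim in zip(*vectors)]

def sentence_vector(word_vectors, keyword_vectors, w_all=0.8, w_key=0.2):
    """Compute w_all * mean(all word vectors) + w_key * mean(keyword vectors)."""
    all_mean = mean_vector(word_vectors)
    if not keyword_vectors:
        # Assumption: with no keywords, fall back to the plain average.
        return all_mean
    key_mean = mean_vector(keyword_vectors)
    return [w_all * a + w_key * k for a, k in zip(all_mean, key_mean)]

# Two segmented words, the first of which is also a keyword.
words = [[1.0, 0.0], [0.0, 1.0]]
keywords = [[1.0, 0.0]]
vec = sentence_vector(words, keywords)  # approximately [0.6, 0.4]
```

The keyword's vector is counted once in the overall mean and again in the keyword mean, which is how the formula gives keywords their extra weight.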
Specifically, the word segmentation tool is the Chinese language processing toolkit HanLP. The training corpus is segmented with this tool. HanLP (Han Language Processing) is a free, open-source Chinese language processing package: a Java toolkit made up of a series of models and algorithms that supports Chinese word segmentation, keyword extraction, index-mode segmentation, and related functions, and also provides lexical analysis, syntactic analysis, and semantic understanding. HanLP is feature-complete, efficient, clearly architected, trained on up-to-date corpora, and customizable. The present invention therefore selects HanLP as the word segmentation tool.
After the sentence vectors of the user input text and of the question in the question-answer pair are obtained, the similarity between them is calculated as the cosine similarity, with values taken in the range [0,1]. The cosine of the angle between the two vectors is computed; the closer the cosine is to 1, the closer the angle is to 0 degrees, that is, the more similar the two sentence vectors are. The cosine computation is fast and simple, which improves system performance.
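A minimal sketch of the cosine computation is given below. The function name is hypothetical, and returning 0.0 for an all-zero vector is an assumption. Note that cosine similarity in general lies in [-1, 1]; the sketch does not clamp negative values, so how the [0,1] range is ensured is left open here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity([3.0, 4.0], [6.0, 8.0])  # angle of 0 degrees
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # angle of 90 degrees
```

Vectors pointing the same way score close to 1 regardless of length, which suits averaged sentence vectors whose magnitudes are not meaningful.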
Most importantly, the present invention corrects the similarity using information contained in the user input text and the candidate question. This information mainly comprises sentence type, named entities, and personal pronouns; for named entities, the present invention mainly considers place names and organization names. The sentence vector similarity described above does not take this information into account, so it must be corrected with it.
Based on everyday conversational experience and experimental results, the similarity is corrected in the following three cases:
Case 1: the similarity is corrected according to sentence type.
When the user input is a yes-no question, for example "Did you go to the Temple of Heaven yesterday?", or an A-not-A question, for example "Did you go to the Temple of Heaven or not?", it usually differs considerably in meaning from a candidate question that is a declarative sentence. That is, if the user's short text is a yes-no or A-not-A question and the question of the candidate question-answer pair is a declarative sentence, the computed similarity must be reduced. Likewise, if the user input is a declarative sentence and the candidate question is a yes-no or A-not-A question, the computed similarity must be reduced. (The exact reduction ratio should be determined experimentally on the question-answer corpus the system uses; based on its existing corpus and experiments, the present invention suggests reducing the similarity by about 30%.)
Similar also have " assertive sentence " and " negative ".System judges sentence using linguistic rules with Sentence Template on realizing Type.
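The sentence-pattern correction can be sketched as below. The source does not list its linguistic rules or sentence templates, so the three regular expressions here are hypothetical stand-ins, the function names are assumptions, and the Chinese sentences in the usage note are back-translations of the examples above chosen for illustration. Only the mismatches named in the text are penalized, using a factor of 0.7 for the suggested reduction of about 30%.

```python
import re

# Hypothetical rules for illustration; the source does not publish its rules.
A_NOT_A = re.compile(r"(.)[不没]\1")   # patterns such as 有没有, 是不是, 去不去
YES_NO = re.compile(r"吗[？?]?$")       # sentence-final question particle 吗
NEGATION = re.compile(r"[不没]")        # simple negation markers

POLAR_QUESTIONS = {"yes_no", "a_not_a"}
SENTENCE_PATTERN_FACTOR = 0.7  # about a 30% reduction, as the text suggests


def sentence_type(text: str) -> str:
    """Classify a short text with toy rules, checked from most to least specific."""
    text = text.strip()
    if A_NOT_A.search(text):
        return "a_not_a"
    if YES_NO.search(text):
        return "yes_no"
    if NEGATION.search(text):
        return "negative"
    return "declarative"


def correct_by_sentence_pattern(similarity: float, user_text: str,
                                candidate_question: str) -> float:
    """Reduce the similarity when the two sentence patterns conflict."""
    types = {sentence_type(user_text), sentence_type(candidate_question)}
    question_vs_declarative = "declarative" in types and bool(types & POLAR_QUESTIONS)
    declarative_vs_negative = types == {"declarative", "negative"}
    if question_vs_declarative or declarative_vs_negative:
        return similarity * SENTENCE_PATTERN_FACTOR
    return similarity
```

For instance, `correct_by_sentence_pattern(0.9, "你昨天去天坛了吗？", "我昨天去了天坛")` scales the score by 0.7, while a yes-no question paired with an A-not-A question is left unchanged because the text groups those two patterns together.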
In the second case, the similarity is corrected according to named entities.
If the user input text and the question of the candidate question-answer pair each contain a named entity of the same type (for example, both contain a place name, or both contain an organization name), but the place names differ and neither contains the other (Beijing and Haidian District, for example, are in a containment relationship), the computed similarity must be reduced. (The specific reduction ratio should be determined experimentally from the corpus the system uses; based on its existing corpus and experiments, the present invention suggests reducing the similarity by about 50%.) In the implementation, to bound the scope of the problem and keep the system efficient, the present invention uses a dictionary method: the dictionary file contains the main prefecture-level cities of China, and the place names in it are mutually independent, with no containment relationship between them. This avoids an excessively high computed semantic similarity caused by place names being correlated in the corpus. For example, in "What is good to eat in Beijing?" and "What is good to eat in Shanghai?", "Beijing" and "Shanghai" often occur together in the corpus, so their word vectors are highly correlated, yet the two sentences differ greatly in meaning. Organization names are handled in the same way.
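A dictionary-based version of the place-name correction might look like the sketch below. The four-entry `PLACE_NAMES` set, the substring matching, and the function names are assumptions made for illustration; the source's dictionary file is not reproduced here. Because the dictionary is assumed to contain no nested place names, the containment check described in the text reduces to comparing the two sets of matches.

```python
from typing import Set

# Hypothetical, tiny dictionary; entries are assumed to be mutually independent
# (no place name contains another), as the text requires of its dictionary file.
PLACE_NAMES = {"Beijing", "Shanghai", "Guangzhou", "Shenzhen"}
PLACE_NAME_FACTOR = 0.5  # about a 50% reduction, as the text suggests


def find_entities(text: str, dictionary: Set[str]) -> Set[str]:
    """Return every dictionary entry that occurs in the text."""
    return {name for name in dictionary if name in text}


def correct_by_place_name(similarity: float, user_text: str,
                          candidate_question: str,
                          dictionary: Set[str] = PLACE_NAMES) -> float:
    """Reduce the similarity when both texts name places and the places differ."""
    user_places = find_entities(user_text, dictionary)
    candidate_places = find_entities(candidate_question, dictionary)
    if user_places and candidate_places and user_places != candidate_places:
        return similarity * PLACE_NAME_FACTOR
    return similarity
```

With the example from the text, a raw score of 0.8 between "What is good to eat in Beijing?" and "What is good to eat in Shanghai?" drops to 0.4, while a pair in which only one text names a place keeps its score. The same function could be reused for organization names by passing a different dictionary.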
In the third case, the similarity is corrected according to personal pronouns.
If the user input text and the question of the candidate question-answer pair each contain a pronoun, for example the user input text is "I am going to the Temple of Heaven today" and the question of the candidate question-answer pair is "He is going to the Temple of Heaven today", the personal pronouns in the two sentences differ, and the computed similarity must be reduced further. (The specific reduction ratio should be determined experimentally from the corpus the system uses; based on its existing corpus and experiments, the present invention suggests reducing the similarity by about 50%.) The implementation again uses a dictionary method; the dictionary file contains common pronouns.
It should be noted that, besides the three ways of correcting the similarity described above, the present invention may also judge the semantics of the text by other means in order to correct the similarity further.
By the above method, the present invention can calculate the semantic similarity between short texts in an interactive system more accurately, make fuller use of the limited question-answer pair data, and improve the user experience of the interactive system.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they shall all be covered by the claims and the description of the present invention.
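The pronoun correction can be sketched in the same spirit. The pronoun list, the whole-word tokenization of the English translations, and the function names are assumptions for illustration only; the source's dictionary file of common pronouns is not reproduced here.

```python
import re
from typing import Set

# Hypothetical pronoun dictionary for the translated English examples.
PRONOUNS = {"i", "you", "he", "she", "we", "they"}
PRONOUN_FACTOR = 0.5  # about a 50% reduction, as the text suggests


def find_pronouns(text: str) -> Set[str]:
    """Return the dictionary pronouns that appear as whole words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {token for token in tokens if token in PRONOUNS}


def correct_by_pronoun(similarity: float, user_text: str,
                       candidate_question: str) -> float:
    """Reduce the similarity when both texts use pronouns and the pronouns differ."""
    user_pronouns = find_pronouns(user_text)
    candidate_pronouns = find_pronouns(candidate_question)
    if user_pronouns and candidate_pronouns and user_pronouns != candidate_pronouns:
        return similarity * PRONOUN_FACTOR
    return similarity
```

Matching whole tokens rather than substrings keeps "the" from being mistaken for "he". With the example from the text, "I am going to the Temple of Heaven today" and "He is going to the Temple of Heaven today" yield the pronoun sets `{"i"}` and `{"he"}`, so the score is halved.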

Claims (10)

1. A method for calculating similarity between short texts, characterized by comprising:
Step S1: obtaining corpus data and preprocessing the corpus data to obtain a corpus;
Step S2: obtaining a keyword extraction model according to the corpus, segmenting the corpus into words with a word segmentation tool, and training with word2vec to obtain a word vector set;
Step S3: obtaining a user input text and the question of a candidate question-answer pair; segmenting the question of the candidate question-answer pair and the user input text separately with the word segmentation tool, and extracting keywords from the question of the candidate question-answer pair and from the user input text separately with the keyword extraction model, to obtain the word segmentation result and keyword extraction result of the question of the candidate question-answer pair, and the word segmentation result and keyword extraction result of the user input text;
Step S4: obtaining word vectors from the word vector set according to the word segmentation result and keyword extraction result of the question of the candidate question-answer pair; and obtaining the word vectors of the user input text from the word vector set according to the word segmentation result of the user input text and the keyword extraction result of the question of the candidate question-answer pair;
Step S5: calculating the sentence vector of the question of the candidate question-answer pair according to the word vectors of that question, and calculating the sentence vector of the user input text according to the word vectors of the user input text;
Step S6: calculating the similarity between the two sentence vectors according to the sentence vector of the question of the candidate question-answer pair and the sentence vector of the user input text;
Step S7: according to the similarity between the sentence vectors, correcting the similarity using the information contained in the user input text and in the question of the candidate question-answer pair, to obtain a corrected similarity.
2. The method for calculating similarity between short texts according to claim 1, characterized in that
the information contained in the user input text and in the question of the candidate question-answer pair is the text sentence pattern, named entities, and personal pronouns, the named entities including place names and organization names.
3. The method for calculating similarity between short texts according to claim 1, characterized in that
in step S2, obtaining the keyword extraction model comprises:
Step S21: obtaining a keyword corpus and segmenting the keyword corpus into words to obtain a word segmentation result;
Step S22: marking the keywords in the word segmentation result by manual annotation, to obtain a manually annotated keyword corpus;
Step S23: obtaining the keyword extraction model by maximum entropy training on the manually annotated keyword corpus.
4. The method for calculating similarity between short texts according to claim 1, characterized in that
the keyword extraction model is a two-class classifier.
5. The method for calculating similarity between short texts according to claim 1, characterized in that
the word vector set is obtained by training a word2vec model.
6. The method for calculating similarity between short texts according to claim 1, characterized in that
the word vectors of the question of the candidate question-answer pair have the same dimension as the word vectors of the user input text.
7. The method for calculating similarity between short texts according to claim 2, characterized in that
the named entities and personal pronouns are obtained by a dictionary method.
8. The method for calculating similarity between short texts according to claim 1, characterized in that
the corpus data is obtained by web crawler technology.
9. The method for calculating similarity between short texts according to claim 1, characterized in that
the similarity between the sentence vectors is calculated by the cosine similarity method.
10. The method for calculating similarity between short texts according to claim 1, characterized in that
the word segmentation tool is the Chinese language processing toolkit HanLP.
CN201610920608.5A 2016-10-21 2016-10-21 Similarity calculating method between a kind of short text Active CN106484664B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610920608.5A CN106484664B (en) 2016-10-21 2016-10-21 Similarity calculating method between a kind of short text

Publications (2)

Publication Number Publication Date
CN106484664A true CN106484664A (en) 2017-03-08
CN106484664B CN106484664B (en) 2019-03-01

Family

ID=58271016

Country Status (1)

Country Link
CN (1) CN106484664B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013107345A1 (en) * 2012-01-18 2013-07-25 腾讯科技(深圳)有限公司 User question processing method and system
EP2833271A1 (en) * 2012-05-14 2015-02-04 Huawei Technologies Co., Ltd Multimedia question and answer system and method
CN105095444A (en) * 2015-07-24 2015-11-25 百度在线网络技术(北京)有限公司 Information acquisition method and device
CN105426354A (en) * 2015-10-29 2016-03-23 杭州九言科技股份有限公司 Sentence vector fusion method and apparatus

Also Published As

Publication number Publication date
CN106484664B (en) 2019-03-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant