CN104462060A - Method and device for calculating text similarity and realizing search processing through computer - Google Patents
- Publication number
- CN104462060A (application CN201410728432.4A)
- Authority
- CN
- China
- Prior art keywords
- text string
- translated text
- similarity value
- model
- semantic similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a computer-implemented method and device for calculating text similarity and for search processing. The method comprises: acquiring a first text string and a second text string; decoding the first text string according to a preset phrase translation model and a dependency structure model to obtain K translated text strings; calculating a first semantic similarity value between each of the K translated text strings and the second text string; and calculating a second semantic similarity value between the first text string and the second text string from the K first semantic similarity values. The method and device address long-distance dependency relations within sentences, so that the semantics of a search statement are expressed more faithfully, the search statement is matched more accurately against web page titles, and the user receives semantically matching search result entries, which improves the search experience.
Description
Technical field
The present invention relates to natural language processing, and in particular to a computer-implemented method and device for calculating text similarity and for search processing.
Background
In a search engine, the search words (the query) entered by a user must be matched as well as possible against the fields of a document (for example, title and content). This matching is usually implemented by exact word-based matching.
Methods based on a translation model also exist. They treat the title and the search words (the query) as if they were written in different sub-languages and achieve semantic matching through phrase translations, for example translating "effective" into the similar phrase "useful". However, such methods cannot handle long-distance dependencies in the target language and perform only simple semantic matching. They therefore fail to capture the real semantics of the search statement, which causes mismatches between search statements and web page titles, degrades the display and ranking of search results, and harms the user experience. For example, the sentence "why did Guan Yu not kill Cao Cao back then" may be matched to "why did Cao Cao not kill Guan Yu back then". In the original sentence (the query), "Guan Yu" is the subject and "Cao Cao" is the object; because the long-distance dependency is not resolved, the search statement and the web page title are matched only word by word, and the actual dependency relations of the sentence are not reflected.
Summary of the invention
The object of the present invention is to provide a computer-implemented method and device for calculating text similarity and for search processing that better characterize non-local dependencies and resolve long-distance dependency relations, thereby achieving better matching.
According to one aspect of the present invention, a computer-implemented method for calculating text similarity is provided, comprising: obtaining a first text string and a second text string; decoding the first text string according to a preset phrase translation model and a dependency structure model to obtain K translated text strings; calculating a first semantic similarity value between each of the K translated text strings and the second text string; and calculating a second semantic similarity value between the first text string and the second text string from the K calculated first semantic similarity values.
According to another aspect of the present invention, a search processing method is provided, comprising: receiving a search word; obtaining multiple search result entries according to the search word; calculating, by the above computer-implemented method for calculating text similarity, semantic similarity values between the search word and the content titles of the search result entries; sorting the search result entries according to the calculated semantic similarity values; and sending the sorted search result entries.
According to a further aspect of the present invention, a device for calculating text similarity is provided, comprising: a text string acquiring unit for obtaining a first text string and a second text string; a text string decoding unit for decoding the first text string according to a preset phrase translation model and a dependency structure model to obtain K translated text strings; and a similarity value computing unit for calculating a first semantic similarity value between each of the K translated text strings and the second text string, and for calculating a second semantic similarity value between the first text string and the second text string from the K calculated first semantic similarity values.
According to a further aspect of the present invention, a search processing device is provided, comprising: a search word receiving unit for receiving a search word; a search result acquiring unit for obtaining multiple search result entries according to the search word; a semantic similarity value computing unit for calculating, by the above device for calculating text similarity, semantic similarity values between the search word and the content titles of the search result entries; a sorting unit for sorting the search result entries according to the calculated semantic similarity values; and a sending unit for sending the sorted search result entries.
The computer-implemented method and device for calculating text similarity and for search processing provided by the embodiments of the present invention decode the first text string (such as a search keyword or query entered by a user) with a phrase translation model and a dependency structure model to obtain multiple translated text strings, calculate a first semantic similarity value between each translated text string and the second text string (such as the content title of a search result entry), and calculate a second semantic similarity value between the first text string and the second text string from the calculated first semantic similarity values. This addresses long-distance dependencies within text strings and allows the similarity between text strings to be calculated comprehensively and accurately.
In search technology, performing the semantic similarity calculation described above between the search word and the content titles of the retrieved search result entries represents the semantics of the search statement better. The returned search results can then be sorted by taking this similarity value and the first text string into account, so that the best search results are presented to the user. Long-distance dependencies within text strings are thereby addressed, search statements are matched better with web page titles, semantically matching search result entries are provided to the user, and the user's search experience is improved.
Brief description of the drawings
Fig. 1 is a flowchart of the computer-implemented method for calculating text similarity according to an exemplary embodiment of the present invention.
Fig. 2 is an example diagram of the dependency relations of a sentence according to an exemplary embodiment of the present invention.
Fig. 3 is a flowchart of the search processing method according to an exemplary embodiment of the present invention.
Fig. 4 is a structural block diagram of the device for calculating text similarity according to an exemplary embodiment of the present invention.
Fig. 5 is a structural block diagram of the search processing device according to an exemplary embodiment of the present invention.
Detailed description of embodiments
The basic concept of the present invention is to match semantic structure by adding a dependency structure model of the target language to the translation model. During text matching, the translation model and the dependency structure model are combined to decode a text string and produce the top-K translated text strings. These translated text strings are then compared with the other text string to be matched, so that semantic structure is matched and semantic structure information is strengthened. Through the calculation of semantic similarity, web page titles that match the search statement are pushed to the user.
A conventional phrase translation model, when translating a search word into the top-K titles, uses an n-gram language model to check whether the translated titles conform to the regularities of the target language. The present invention additionally introduces a dependency structure model in order to examine the dependency structure of the target language.
Specifically, the dependency relations of a sentence S = (w1, w2 ... wn) are modification relations between pairs of words (wi, wj) in the sentence: the dependency arc (wi, wj) denotes that wj modifies wi. In addition, to describe chains of modification, a special root node w0 is added, and (w0, wi) denotes the initial relation.
The dependency structure probability of sentence S can be calculated by the following equation:
where p(wi, wj) denotes the probability of the dependency arc in which wj modifies wi, and p(wi) is the probability that the word wi occurs. Both p(wi, wj) and p(wi) can be obtained by counting over a pre-stored dependency treebank or over large-scale data; i and j denote the positions at which the words occur in the sentence.
Fig. 2 shows an example of the dependency relations of a sentence according to an exemplary embodiment of the present invention. For example, if p(water conservancy) = 0.6 and p(water conservancy, engineering) = 0.5, then the above equation gives:
By analogy, the dependency structure probability between each pair of words in the sentence can be calculated; multiplying these dependency structure probabilities together yields the dependency structure probability of the sentence.
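As an illustration of the calculation above, the following minimal Python sketch assumes that the per-arc term is the ratio p(wi, wj) / p(wi) and that the sentence-level value is the product of these terms over all arcs. The ratio form is an assumption inferred from the definitions of p(wi, wj) and p(wi), since the equation itself does not appear in this text; the function names are hypothetical.

```python
from math import prod

def arc_probability(w_i, w_j, p_arc, p_word):
    """Per-arc term for the arc (w_i, w_j), i.e. w_j modifies w_i.

    ASSUMPTION: the term is p(w_i, w_j) / p(w_i); the source equation
    is not available, so this is one reading of its definitions.
    """
    return p_arc[(w_i, w_j)] / p_word[w_i]

def sentence_dependency_probability(arcs, p_arc, p_word):
    """Multiply the per-arc terms over all dependency arcs of a sentence."""
    return prod(arc_probability(w_i, w_j, p_arc, p_word) for w_i, w_j in arcs)

# The two probabilities below are the ones quoted in the text.
p_word = {"water conservancy": 0.6}
p_arc = {("water conservancy", "engineering"): 0.5}
arcs = [("water conservancy", "engineering")]

print(sentence_dependency_probability(arcs, p_arc, p_word))
```

Under this assumption the single arc in the example contributes 0.5 / 0.6, roughly 0.83; a sentence with several arcs would multiply one such term per arc.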
Based on the above calculation of the dependency structure probability of a sentence, the dependency structure model is trained from a large number of dependency trees. The training of the dependency structure model is not an improvement made by the present invention and is therefore not described in detail here.
The computer-implemented method and device for calculating text similarity and for search processing according to exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the computer-implemented method for calculating text similarity according to an exemplary embodiment of the present invention.
Referring to Fig. 1, in step S110, a first text string and a second text string are obtained. The first text string may be a search statement (a query) entered by a user, and the second text string may be the web page title of a search result entry obtained with that search statement.
In step S120, the first text string is decoded according to a preset phrase translation model and a dependency structure model to obtain K translated text strings.
In natural language processing, statistical machine translation is one of the main machine translation approaches. Its basic idea is to treat machine translation as an information transmission process and to decode the translation through a channel model. According to a preferred embodiment of the present invention, the first text string is decoded by a beam search decoder to obtain the top-K translated text strings.
Specifically, in step S120, translated text strings corresponding to the first text string can be computed according to the phrase translation model, and the dependency structure between long-distance words is then determined according to the dependency structure model, so as to determine whether the first text string and a translated text string are semantically similar.
Preferably, the first text string is decoded according to the phrase translation model, the dependency structure model, an n-gram language model, and a word-order distortion model to obtain the top-K translated text strings. In natural language processing, the word-order distortion model examines the positional relation between corresponding phrases in the source language and the target language, and a conventional n-gram language model examines the probability that a sentence occurs. By scoring each candidate text string with the phrase translation model, the dependency structure model, the n-gram language model, and the word-order distortion model, semantically closer top-K translated text strings can be produced.
Preferably, a combined score Score(T) is calculated for any candidate text string T by the following formula:
$$\mathrm{Score}(T) = \lambda_1\,\mathrm{LM}(T) + \lambda_2\,\mathrm{TM}(Q,T) + \lambda_3\,D(Q,T) + \lambda_4\,\mathrm{DEP}(T)$$
where LM(T) is the score given to the translated text string T by the n-gram language model, TM(Q, T) is the probability score, according to the phrase translation model, of translating the first text string Q into T, D(Q, T) is the score, calculated by the word-order distortion model, of translating Q into T, DEP(T) is the score given to T by the dependency structure model, and λ1 to λ4 are the weights assigned to the scores of these four models. The K translated text strings are then chosen from the candidate text strings according to the combined score.
Specifically, the beam search decoder sorts the candidate translated text strings by the combined score Score(T) and selects the K highest-scoring (top-K) translated text strings (TOP1, TOP2, TOP3 ... TOPK). For example, if the first text string is "hard", the translated text strings obtained by the beam search decoder include "hard", "firm", "firm", "hard", "hard" and "solid". As another example, if the first text string is "peach", the translated text strings obtained by the beam search decoder may include "peach", "carambola", "honey peach", "honey peach" and "peach". The beam search decoder then selects the K highest-scoring translated text strings from these candidates according to their combined scores.
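The combined scoring and top-K selection described above can be sketched in Python as follows. This is a minimal illustration, not the decoder itself: the beam search is not implemented, the four model scores are taken as given numbers, and the candidate names, score values, and weights are all hypothetical.

```python
import heapq

MODEL_NAMES = ("LM", "TM", "D", "DEP")

def combined_score(scores, weights):
    """Linear combination of the four model scores for one candidate."""
    return sum(weights[name] * scores[name] for name in MODEL_NAMES)

def top_k(candidates, weights, k):
    """Return the k candidate strings with the highest combined score."""
    return heapq.nlargest(
        k, candidates, key=lambda text: combined_score(candidates[text], weights)
    )

# Hypothetical weights and per-model scores (e.g. log-probabilities).
weights = {"LM": 0.25, "TM": 0.25, "D": 0.25, "DEP": 0.25}
candidates = {
    "a": {"LM": -1.0, "TM": -0.5, "D": -0.2, "DEP": -0.3},
    "b": {"LM": -2.0, "TM": -1.0, "D": -0.5, "DEP": -0.5},
    "c": {"LM": -0.4, "TM": -0.4, "D": -0.4, "DEP": -0.4},
}

print(top_k(candidates, weights, 2))
```

In a real decoder the scores would be accumulated while hypotheses are expanded; the sketch only shows how the final ranking depends on the weighted sum.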
In step S130, a first semantic similarity value between each of the K translated text strings and the second text string is calculated, and a second semantic similarity value between the first text string and the second text string is calculated from the K calculated first semantic similarity values.
Specifically, the calculation of the first semantic similarity values between the K translated text strings and the second text string preferably proceeds as follows.
First, at least one second dependency arc is obtained by performing dependency analysis on the second text string. Because the second dependency arcs obtained from the second text string are used repeatedly, they can be kept in a cache after the dependency analysis and reused, so that the dependency analysis does not have to be repeated each time they are needed.
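The caching of the second text string's dependency arcs can be sketched with Python's `functools.lru_cache`. The function `dependency_arcs` below is a hypothetical stand-in for a real dependency parser: its chaining rule is a toy, chosen only so that the example runs, and the call counter exists purely to make the cache effect visible.

```python
from functools import lru_cache

PARSE_CALLS = 0  # counts how often the (toy) analysis actually runs

@lru_cache(maxsize=1024)
def dependency_arcs(text):
    """Hypothetical parser stand-in returning a tuple of (w_i, w_j) arcs.

    Toy rule, NOT a real parse: the first word hangs off the root and
    every later word is attached to the word before it.
    """
    global PARSE_CALLS
    PARSE_CALLS += 1
    words = text.split()
    return tuple(zip(["<root>"] + words[:-1], words))

# The same title is requested once per translated text string,
# but it is analysed only on the first request.
for _ in range(3):
    title_arcs = dependency_arcs("baby fever")

print(title_arcs, PARSE_CALLS)
```

The arcs are returned as a tuple so that the cached value cannot be modified by a caller.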
Then, for each translated text string, the following is performed: dependency analysis is carried out on the translated text string to obtain at least one first dependency arc, and the first semantic similarity value between that translated text string and the second text string is calculated from the at least one first dependency arc and the at least one second dependency arc.
Preferably, the cosine similarity between the at least one first dependency arc and the at least one second dependency arc is calculated as the first semantic similarity value between that translated text string and the second text string.
For example, K translated text strings are obtained in step S120, the first and second dependency arcs are obtained in step S130, and the cosine of the first and second dependency arcs is computed, giving K first semantic similarity values. If the sets of dependency arcs of a translated text string t and the second text string w are written as arcs(t) = {(t0, ti), (ti, tj), ...} and arcs(w) = {(w0, wi), (wi, wj), ...}, the cosine similarity (that is, the first semantic similarity value) Similarity(t, w) of the translated text string t and the second text string w is calculated by the following equation:
where numbersof(wi, wj) and numbersof(ti, tj) denote the numbers of dependency arcs (wi, wj) and (ti, tj), respectively.
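A cosine similarity over dependency arcs can be sketched as follows, assuming the standard definition of cosine similarity applied to vectors of arc counts (the role played by numbersof in the text). The function name and the example arcs are hypothetical, and the exact form of the patent's equation is not confirmed by this text.

```python
from collections import Counter
from math import sqrt

def arc_cosine_similarity(arcs_t, arcs_w):
    """Cosine similarity between two multisets of (w_i, w_j) arcs."""
    count_t, count_w = Counter(arcs_t), Counter(arcs_w)
    # Dot product over arcs; arcs missing from count_w contribute 0.
    dot = sum(n * count_w[arc] for arc, n in count_t.items())
    norm_t = sqrt(sum(n * n for n in count_t.values()))
    norm_w = sqrt(sum(n * n for n in count_w.values()))
    if norm_t == 0 or norm_w == 0:
        return 0.0  # no arcs on one side: treat as no similarity
    return dot / (norm_t * norm_w)

# Hypothetical arcs: the two strings share one arc out of two.
arcs_t = [("<root>", "fever"), ("fever", "baby")]
arcs_w = [("<root>", "fever"), ("fever", "child")]

print(arc_cosine_similarity(arcs_t, arcs_w))
```

Because whole arcs are compared rather than single words, two strings that contain the same words attached differently share fewer arcs and receive a lower value.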
Preferably, with the score given to each translated text string by the dependency structure model as the weight, a weighted sum of the K first semantic similarity values is computed to obtain the second semantic similarity value between the first text string and the second text string.
For example, the second semantic similarity value is calculated by the following equation:
where DEP(t) is the score given to the translated text string t by the dependency structure model, which can be obtained from the dependency structure probability calculation described above, and K is the number of translated text strings.
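The weighted summation can be sketched as below. The sketch uses a plain weighted sum of the K first semantic similarity values with DEP(t) as the weight, as the paragraph above states; whether the patent's equation also normalizes the sum (for example by K or by the total weight) cannot be determined from this text, so the absence of normalization is an assumption. All numbers are hypothetical.

```python
def second_similarity(dep_scores, first_similarities):
    """Weighted sum of the first similarity values, weighted by DEP(t).

    ASSUMPTION: no normalization is applied to the sum.
    """
    if len(dep_scores) != len(first_similarities):
        raise ValueError("need one DEP score per first similarity value")
    return sum(dep * sim for dep, sim in zip(dep_scores, first_similarities))

# Hypothetical values for K = 3 translated text strings.
dep_scores = [0.5, 0.3, 0.2]
first_similarities = [0.9, 0.6, 0.4]

print(second_similarity(dep_scores, first_similarities))
```

With these inputs the result is approximately 0.71 (0.45 + 0.18 + 0.08).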
Fig. 3 is a flowchart of the search processing method according to an exemplary embodiment of the present invention.
Referring to Fig. 3, in step S210, a search word is received, that is, the search keyword that the user enters in the search engine.
In step S220, multiple search result entries are obtained according to the search word. For example, if the search keyword received from the user in step S210 is "baby fever", the search result entries obtained may be "baby's cat fever", "baby's fever", "neonate has a high fever", "child's fever" or "baby's heating".
In step S230, the semantic similarity values between the search word and the content titles of the search result entries are calculated by the method for calculating text similarity described above.
In step S240, the search result entries are sorted according to the semantic similarity values calculated in step S230.
Continuing the "baby fever" example, suppose the semantic similarity values (denoted Similarity) calculated in step S230 are Similarity(baby's cat fever, baby fever) = 0.87, Similarity(baby's fever, baby fever) = 0.71, Similarity(neonate has a high fever, baby fever) = 0.83, Similarity(child's fever, baby fever) = 0.65, and Similarity(baby's heating, baby fever) = 0.79. Sorting the similarity values in descending order gives: Similarity(baby's cat fever, baby fever), Similarity(neonate has a high fever, baby fever), Similarity(baby's heating, baby fever), Similarity(baby's fever, baby fever), Similarity(child's fever, baby fever).
In step S250, the sorted search result entries are sent. For the "baby fever" example, the search result entries are finally sent in the following order: "baby's cat fever", "neonate has a high fever", "baby's heating", "baby's fever" and "child's fever".
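The sorting in steps S240 and S250 can be sketched in a few lines of Python. The titles and similarity values are the ones given for the "baby fever" example; the variable names are hypothetical, and the similarity values are taken as already computed rather than derived here.

```python
# Similarity of each result title to the search word "baby fever".
similarity = {
    "baby's cat fever": 0.87,
    "baby's fever": 0.71,
    "neonate has a high fever": 0.83,
    "child's fever": 0.65,
    "baby's heating": 0.79,
}

# Step S240: sort the entries by similarity value, largest first.
ranked = sorted(similarity, key=similarity.get, reverse=True)

# Step S250: the entries are sent in this order.
for title in ranked:
    print(title, similarity[title])
```

The resulting order matches the one listed in the text: 0.87, 0.83, 0.79, 0.71, 0.65.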
The computer-implemented method for calculating text similarity and for search processing provided by the embodiments of the present invention decodes the first text string (such as a search keyword or query entered by a user) with a phrase translation model and a dependency structure model to obtain multiple translated text strings, calculates a first semantic similarity value between each translated text string and the second text string (such as the content title of a search result entry), and calculates a second semantic similarity value between the first text string and the second text string from the calculated first semantic similarity values. This addresses long-distance dependencies within text strings and allows the similarity between text strings to be calculated comprehensively and accurately.
In search technology, performing the semantic similarity calculation described above between the search word and the content titles of the retrieved search result entries represents the semantics of the search statement better. The returned search results can then be sorted by taking this similarity value and the first text string into account, so that the best search results are presented to the user. Long-distance dependencies within text strings are thereby addressed, search statements are matched better with web page titles, semantically matching search result entries are provided to the user, and the user's search experience is improved.
Fig. 4 is a structural block diagram of the device for calculating text similarity according to an exemplary embodiment of the present invention.
As shown in Fig. 4, the device for calculating text similarity comprises a text string acquiring unit 310, a text string decoding unit 320, and a similarity value computing unit 330.
The text string acquiring unit 310 obtains the first text string and the second text string.
For example, the first text string may be a search statement entered by a user, and the second text string may be the web page title of a pre-stored document to be matched.
The text string decoding unit 320 decodes the first text string according to a preset phrase translation model and a dependency structure model to obtain K translated text strings.
Preferably, the text string decoding unit 320 decodes the first text string according to the phrase translation model, the dependency structure model, an n-gram language model, and a word-order distortion model to obtain the K translated text strings, the first text string being decoded by a beam search decoder.
Preferably, the text string decoding unit 320 calculates a combined score Score(T) for any candidate text string T by the following formula:
$$\mathrm{Score}(T) = \lambda_1\,\mathrm{LM}(T) + \lambda_2\,\mathrm{TM}(Q,T) + \lambda_3\,D(Q,T) + \lambda_4\,\mathrm{DEP}(T)$$
where LM(T) is the score given to the translated text string T by the n-gram language model, TM(Q, T) is the probability score, according to the phrase translation model, of translating the first text string Q into T, D(Q, T) is the score, calculated by the word-order distortion model, of translating Q into T, DEP(T) is the score given to T by the dependency structure model, and λ1 to λ4 are the weights assigned to the scores of these four models. The K translated text strings are chosen from the candidate text strings according to the combined score.
The similarity value computing unit 330 calculates a first semantic similarity value between each of the K translated text strings and the second text string, and calculates a second semantic similarity value between the first text string and the second text string from the K calculated first semantic similarity values.
Preferably, the similarity value computing unit 330 obtains at least one second dependency arc produced by performing dependency analysis on the second text string and, for each translated text string, performs the following: dependency analysis is carried out on the translated text string to obtain at least one first dependency arc, and the first semantic similarity value between that translated text string and the second text string is calculated from the at least one first dependency arc and the at least one second dependency arc.
Preferably, the similarity value computing unit 330 calculates the cosine similarity between the at least one first dependency arc and the at least one second dependency arc as the first semantic similarity value between that translated text string and the second text string.
Preferably, the similarity value computing unit 330 uses the score given to each translated text string by the dependency structure model as the weight and computes a weighted sum of the K first semantic similarity values to obtain the second semantic similarity value between the first text string and the second text string.
Fig. 5 is a structural block diagram of the search processing device according to an exemplary embodiment of the present invention.
Referring to Fig. 5, the search processing device comprises a search word receiving unit 410, a search result acquiring unit 420, a semantic similarity value computing unit 430, a sorting unit 440, and a sending unit 450.
The search word receiving unit 410 receives a search word, that is, the search keyword that the user enters in the search engine.
The search result acquiring unit 420 obtains multiple search result entries according to the search word received by the search word receiving unit 410.
The semantic similarity value computing unit 430 calculates, by means of the device for calculating text similarity described above, the semantic similarity values between the search word and the content titles of the search result entries.
The sorting unit 440 sorts the search result entries according to the calculated semantic similarity values.
The sending unit 450 sends the sorted search result entries.
The computer-implemented device for calculating text similarity and for search processing provided by the embodiments of the present invention decodes the first text string (such as a search keyword or query entered by a user) with a phrase translation model and a dependency structure model to obtain multiple translated text strings, calculates a first semantic similarity value between each translated text string and the second text string (such as the content title of a search result entry), and calculates a second semantic similarity value between the first text string and the second text string from the calculated first semantic similarity values. This addresses long-distance dependencies within text strings and allows the similarity between text strings to be calculated comprehensively and accurately.
In search technology, performing the semantic similarity calculation described above between the search word and the content titles of the retrieved search result entries represents the semantics of the search statement better. The returned search results can then be sorted by taking this similarity value and the first text string into account, so that the best search results are presented to the user. Long-distance dependencies within text strings are thereby addressed, search statements are matched better with web page titles, semantically matching search result entries are provided to the user, and the user's search experience is improved.
It should be noted that, depending on the needs of the implementation, each step described in this application can be split into more steps, and two or more steps or partial operations of steps can be combined into a new step, in order to achieve the object of the present invention.
The above method according to the present invention can be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the method described here can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be understood that a computer, processor, microprocessor controller, or programmable hardware includes memory components (for example, RAM, ROM, flash memory) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing method described here is carried out. In addition, when a general-purpose computer accesses code for implementing the processing shown here, execution of the code converts the general-purpose computer into a special-purpose computer for performing that processing.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.
Claims (16)
1. by a method for computer implemented calculating text similarity, it is characterized in that, described method comprises:
Obtain the first text string and the second text string;
According to the phrase translation model pre-set and dependency structure model, described first text string is decoded, obtain K cypher text string;
Calculate the first semantic similitude angle value between described K cypher text string and described second text string respectively, and calculate the second semantic similitude angle value between described first text string and the second text string according to K the first semantic similitude angle value calculated.
2. The method according to claim 1, characterized in that the process of calculating the first semantic similarity value between each of the K translated text strings and the second text string comprises:
obtaining at least one second dependency arc produced by performing dependency parsing on the second text string, and, for any one of the translated text strings, performing the following processing:
performing dependency parsing on the translated text string to obtain at least one first dependency arc,
calculating the first semantic similarity value between that translated text string and the second text string based on the at least one first dependency arc and the at least one second dependency arc.
3. The method according to claim 2, characterized in that the process of calculating the first semantic similarity value between that translated text string and the second text string based on the at least one first dependency arc and the at least one second dependency arc comprises:
calculating the cosine similarity between the at least one first dependency arc and the at least one second dependency arc as the first semantic similarity value between that translated text string and the second text string.
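The cosine similarity over dependency arcs in claims 2 and 3 can be illustrated with a minimal Python sketch. The claims do not say how arcs are turned into vectors, so the representation here is an assumption: each arc is a hypothetical `(head, dependent, relation)` tuple, and each text string becomes a bag-of-arcs count vector. The function name `arc_cosine` and the sample arcs are illustrative only.

```python
from collections import Counter
from math import sqrt


def arc_cosine(arcs_a, arcs_b):
    """Cosine similarity between two bags of dependency arcs.

    Assumption: an arc is a hashable (head, dependent, relation) tuple and
    each text string is represented by the counts of its arcs.
    """
    va, vb = Counter(arcs_a), Counter(arcs_b)
    # Dot product over the arcs that occur in both strings.
    dot = sum(va[arc] * vb[arc] for arc in va.keys() & vb.keys())
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty arc set shares nothing with any other set
    return dot / (norm_a * norm_b)


# Hypothetical arcs for one translated text string and for the second text string.
translated_arcs = [("buy", "phone", "obj"), ("phone", "cheap", "amod")]
title_arcs = [("buy", "phone", "obj"), ("phone", "new", "amod")]
similarity = arc_cosine(translated_arcs, title_arcs)
```

Because whole arcs are compared rather than single words, two strings only score highly when they share head-dependent relations, which is how the dependency information enters the similarity value.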
4. The method according to any one of claims 1 to 3, characterized in that the process of calculating the second semantic similarity value between the first text string and the second text string according to the K calculated first semantic similarity values comprises:
using the score that the dependency structure model assigns to each translated text string as a weight, performing a weighted summation of the K first semantic similarity values to obtain the second semantic similarity value between the first text string and the second text string.
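The weighted summation of claim 4 can be sketched as follows. The claim only states that the dependency structure model scores serve as weights. The optional normalization by the sum of the weights is an assumption added here so that the result stays on the same scale as the individual similarity values, and it only makes sense when the scores are non-negative. The name `combine_similarities` and the sample numbers are hypothetical.

```python
def combine_similarities(dep_scores, first_similarities, normalize=False):
    """Weighted sum of the K first similarity values.

    dep_scores[i] is the dependency structure model score of the i-th
    translated text string and is used as the weight of its similarity.
    """
    if len(dep_scores) != len(first_similarities):
        raise ValueError("expected one weight per similarity value")
    weighted = sum(w * s for w, s in zip(dep_scores, first_similarities))
    if not normalize:
        return weighted
    # Assumption, not part of the claim: divide by the total weight.
    total = sum(dep_scores)
    return weighted / total if total else 0.0


# Hypothetical values for K = 3 translated text strings.
dep_scores = [0.5, 0.3, 0.2]
first_similarities = [0.8, 0.6, 0.4]
second_similarity = combine_similarities(dep_scores, first_similarities)
```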
5. The method according to claim 4, characterized in that the process of decoding the first text string according to the preset phrase translation model and the dependency structure model to obtain K translated text strings comprises:
decoding the first text string according to the phrase translation model, the dependency structure model, an N-gram language model, and a word-order distortion model to obtain the K translated text strings.
6. The method according to claim 5, characterized in that, in the process of decoding the first text string Q according to the phrase translation model, the dependency structure model, the N-gram language model, and the word-order distortion model to obtain the K translated text strings T,
a composite score Score(T) is calculated for any candidate text string T by the following formula:
$$\mathrm{Score}(T) = \lambda_1\,\mathrm{LM}(T) + \lambda_2\,\mathrm{TM}(Q,T) + \lambda_3\,D(Q,T) + \lambda_4\,\mathrm{DEP}(T)$$
where LM(T) is the score of the translated text string T according to the N-gram language model, TM(Q,T) is the probability score of translating the first text string Q into the translated text string T according to the phrase translation model, D(Q,T) is the score, calculated according to the word-order distortion model, of translating the first text string Q into the translated text string T, DEP(T) is the score of the translated text string T according to the dependency structure model, and $\lambda_1$ to $\lambda_4$ are the weights assigned to the scores of the four models respectively,
and the K translated text strings are selected from among the candidate text strings by the composite score.
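The composite score and the selection of K candidates in claim 6 can be sketched in Python as below. Only the linear combination and the top-K selection follow the claim. The four scoring functions are toy stand-ins chosen so the example runs, and they do not represent the trained models described in the patent. All names (`combined_score`, `select_top_k`, `toy_models`) are hypothetical.

```python
import heapq


def combined_score(q, t, models, weights):
    """Score(T) = l1*LM(T) + l2*TM(Q,T) + l3*D(Q,T) + l4*DEP(T)."""
    lm, tm, dist, dep = models
    l1, l2, l3, l4 = weights
    return l1 * lm(t) + l2 * tm(q, t) + l3 * dist(q, t) + l4 * dep(t)


def select_top_k(q, candidates, k, models, weights):
    """Keep the k candidate strings with the highest composite score."""
    return heapq.nlargest(
        k, candidates, key=lambda t: combined_score(q, t, models, weights)
    )


# Toy stand-ins for the four models (assumptions for illustration only).
toy_models = (
    lambda t: -0.1 * len(t.split()),                     # LM: prefers shorter strings
    lambda q, t: len(set(q.split()) & set(t.split())),   # TM: counts shared words
    lambda q, t: 0.0,                                    # D: no reordering penalty
    lambda t: 1.0 if t.split() else 0.0,                 # DEP: constant for non-empty T
)
weights = (1.0, 1.0, 1.0, 1.0)  # lambda_1 .. lambda_4, chosen arbitrarily

candidates = ["cheap phone buy", "buy cheap phone online", "weather today"]
best = select_top_k("buy cheap phone", candidates, 2, toy_models, weights)
```

In a real decoder the candidates would be partial hypotheses expanded step by step rather than a fixed list; the sketch only shows how the four scores are combined and ranked.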
7. The method according to claim 6, characterized in that the first text string is decoded by a beam search decoder to obtain the K translated text strings.
8. A search processing method, characterized in that it comprises:
receiving a search term;
obtaining a plurality of search result entries according to the search term;
calculating, by the method according to any one of claims 1 to 7, semantic similarity values between the search term and the content titles of the plurality of search result entries;
sorting the plurality of search result entries according to the calculated semantic similarity values;
sending the sorted search result entries.
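The ranking step of claim 8 can be sketched as a sort over search result entries keyed by the similarity between the search term and each entry's title. The similarity function is passed in as a parameter. The word-overlap (Jaccard) function used in the example is only a placeholder so the sketch runs, not the translation-based similarity of claims 1 to 7. The entry layout (`id`, `title`) and all names are assumptions.

```python
def rerank(search_term, entries, similarity):
    """Sort entries by similarity(search_term, title), highest first."""
    return sorted(
        entries,
        key=lambda entry: similarity(search_term, entry["title"]),
        reverse=True,
    )


def word_overlap(a, b):
    """Placeholder similarity: Jaccard overlap of lower-cased words."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Hypothetical search result entries.
entries = [
    {"id": 1, "title": "Weather forecast today"},
    {"id": 2, "title": "Cheap phone deals this week"},
    {"id": 3, "title": "Phone repair shop"},
]
ranked = rerank("cheap phone deals", entries, word_overlap)
ranked_ids = [entry["id"] for entry in ranked]
```

Passing the similarity function as an argument keeps the sorting logic independent of how the semantic similarity value is actually computed.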
9. A device for calculating text similarity, characterized in that the device comprises:
a text string acquiring unit, configured to obtain a first text string and a second text string;
a text string decoding unit, configured to decode the first text string according to a preset phrase translation model and a dependency structure model to obtain K translated text strings;
a similarity value computing unit, configured to calculate a first semantic similarity value between each of the K translated text strings and the second text string, and to calculate a second semantic similarity value between the first text string and the second text string according to the K calculated first semantic similarity values.
10. The device according to claim 9, characterized in that the similarity value computing unit obtains at least one second dependency arc produced by performing dependency parsing on the second text string,
and, for any one of the translated text strings, performs the following processing:
performing dependency parsing on the translated text string to obtain at least one first dependency arc,
calculating the first semantic similarity value between that translated text string and the second text string based on the at least one first dependency arc and the at least one second dependency arc.
11. The device according to claim 10, characterized in that the similarity value computing unit calculates the cosine similarity between the at least one first dependency arc and the at least one second dependency arc as the first semantic similarity value between that translated text string and the second text string.
12. The device according to any one of claims 9 to 11, characterized in that the similarity value computing unit uses the score that the dependency structure model assigns to each translated text string as a weight and performs a weighted summation of the K first semantic similarity values to obtain the second semantic similarity value between the first text string and the second text string.
13. The device according to claim 12, characterized in that the text string decoding unit decodes the first text string according to the phrase translation model, the dependency structure model, an N-gram language model, and a word-order distortion model to obtain the K translated text strings.
14. The device according to claim 13, characterized in that the text string decoding unit calculates a composite score Score(T) for any candidate text string T by the following formula:
$$\mathrm{Score}(T) = \lambda_1\,\mathrm{LM}(T) + \lambda_2\,\mathrm{TM}(Q,T) + \lambda_3\,D(Q,T) + \lambda_4\,\mathrm{DEP}(T)$$
where LM(T) is the score of the translated text string T according to the N-gram language model, TM(Q,T) is the probability score of translating the first text string Q into the translated text string T according to the phrase translation model, D(Q,T) is the score, calculated according to the word-order distortion model, of translating the first text string Q into the translated text string T, DEP(T) is the score of the translated text string T according to the dependency structure model, and $\lambda_1$ to $\lambda_4$ are the weights assigned to the scores of the four models respectively,
and the K translated text strings are selected from among the candidate text strings by the composite score.
15. The device according to claim 14, characterized in that the first text string is decoded by a beam search decoder to obtain the K translated text strings.
16. A search processing device, characterized in that it comprises:
a search term receiving unit, configured to receive a search term;
a search result acquiring unit, configured to obtain a plurality of search result entries according to the search term;
a semantic similarity value computing unit, configured to calculate, by the device according to any one of claims 9 to 15, semantic similarity values between the search term and the content titles of the plurality of search result entries;
a sorting unit, configured to sort the plurality of search result entries according to the calculated semantic similarity values;
a sending unit, configured to send the sorted search result entries.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410728432.4A CN104462060B (en) | 2014-12-03 | 2014-12-03 | Pass through computer implemented calculating text similarity and search processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410728432.4A CN104462060B (en) | 2014-12-03 | 2014-12-03 | Pass through computer implemented calculating text similarity and search processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104462060A (en) | 2015-03-25 |
CN104462060B (en) | 2017-08-01 |
Family
ID=52908130
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410728432.4A Active CN104462060B (en) | 2014-12-03 | 2014-12-03 | Pass through computer implemented calculating text similarity and search processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104462060B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021346A (en) * | 2016-05-09 | 2016-10-12 | 北京百度网讯科技有限公司 | A retrieval processing method and device |
CN106227771A (en) * | 2016-07-15 | 2016-12-14 | 浙江大学 | A kind of domain expert based on socialization's programming website finds method |
CN106503175A (en) * | 2016-11-01 | 2017-03-15 | 上海智臻智能网络科技股份有限公司 | The inquiry of Similar Text, problem extended method, device and robot |
CN106776782A (en) * | 2016-11-21 | 2017-05-31 | 北京百度网讯科技有限公司 | Semantic similarity acquisition methods and device based on artificial intelligence |
CN107729300A (en) * | 2017-09-18 | 2018-02-23 | 百度在线网络技术(北京)有限公司 | Processing method, device, equipment and the computer-readable storage medium of text similarity |
CN107784037A (en) * | 2016-08-31 | 2018-03-09 | 北京搜狗科技发展有限公司 | Information processing method and device, the device for information processing |
CN107885737A (en) * | 2017-12-27 | 2018-04-06 | 传神语联网网络科技股份有限公司 | A kind of human-computer interaction interpretation method and system |
CN111708942A (en) * | 2020-06-12 | 2020-09-25 | 北京达佳互联信息技术有限公司 | Multimedia resource pushing method, device, server and storage medium |
CN111881669A (en) * | 2020-06-24 | 2020-11-03 | 百度在线网络技术(北京)有限公司 | Synonymy text acquisition method and device, electronic equipment and storage medium |
CN112182348A (en) * | 2020-11-09 | 2021-01-05 | 百度国际科技(深圳)有限公司 | Semantic matching judgment method and device, electronic equipment and computer readable medium |
US11216844B2 (en) * | 2017-03-29 | 2022-01-04 | Ebay Inc. | Generating keywords by associative context with input words |
CN111414531B (en) * | 2020-03-20 | 2023-08-08 | 北京百度网讯科技有限公司 | Event searching method and device and electronic equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010029443A1 (en) * | 2000-03-27 | 2001-10-11 | International Business Machines Corporation | Machine translation system, machine translation method, and storage medium storing program for executing machine translation method |
CN101169780A (en) * | 2006-10-25 | 2008-04-30 | 华为技术有限公司 | Semantic ontology retrieval system and method |
CN101286161A (en) * | 2008-05-28 | 2008-10-15 | 华中科技大学 | Intelligent Chinese request-answering system based on concept |
CN102184169A (en) * | 2011-04-20 | 2011-09-14 | 北京百度网讯科技有限公司 | Method, device and equipment used for determining similarity information among character string information |
CN102567306A (en) * | 2011-11-07 | 2012-07-11 | 苏州大学 | Acquisition method and acquisition system for similarity of vocabularies between different languages |
CN102637163A (en) * | 2011-01-09 | 2012-08-15 | 华东师范大学 | Method and system for controlling multi-level ontology matching based on semantemes |
CN102737013A (en) * | 2011-04-02 | 2012-10-17 | 三星电子(中国)研发中心 | Device and method for identifying statement emotion based on dependency relation |
EP2541435A1 (en) * | 2010-02-26 | 2013-01-02 | National Institute of Information and Communication Technology | Relational information expansion device, relational information expansion method and program |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106021346A (en) * | 2016-05-09 | 2016-10-12 | 北京百度网讯科技有限公司 | A retrieval processing method and device |
CN106021346B (en) * | 2016-05-09 | 2020-01-07 | 北京百度网讯科技有限公司 | Retrieval processing method and device |
CN106227771B (en) * | 2016-07-15 | 2019-05-07 | 浙江大学 | A kind of domain expert's discovery method based on socialization programming website |
CN106227771A (en) * | 2016-07-15 | 2016-12-14 | 浙江大学 | A kind of domain expert based on socialization's programming website finds method |
CN107784037B (en) * | 2016-08-31 | 2022-02-01 | 北京搜狗科技发展有限公司 | Information processing method and device, and device for information processing |
CN107784037A (en) * | 2016-08-31 | 2018-03-09 | 北京搜狗科技发展有限公司 | Information processing method and device, the device for information processing |
CN106503175A (en) * | 2016-11-01 | 2017-03-15 | 上海智臻智能网络科技股份有限公司 | The inquiry of Similar Text, problem extended method, device and robot |
CN106503175B (en) * | 2016-11-01 | 2019-03-29 | 上海智臻智能网络科技股份有限公司 | Inquiry, problem extended method, device and the robot of Similar Text |
CN106776782B (en) * | 2016-11-21 | 2020-05-22 | 北京百度网讯科技有限公司 | Semantic similarity obtaining method and device based on artificial intelligence |
CN106776782A (en) * | 2016-11-21 | 2017-05-31 | 北京百度网讯科技有限公司 | Semantic similarity acquisition methods and device based on artificial intelligence |
US11769173B2 (en) | 2017-03-29 | 2023-09-26 | Ebay Inc. | Generating keywords by associative context with input words |
US11216844B2 (en) * | 2017-03-29 | 2022-01-04 | Ebay Inc. | Generating keywords by associative context with input words |
CN107729300A (en) * | 2017-09-18 | 2018-02-23 | 百度在线网络技术(北京)有限公司 | Processing method, device, equipment and the computer-readable storage medium of text similarity |
CN107729300B (en) * | 2017-09-18 | 2021-12-24 | 百度在线网络技术(北京)有限公司 | Text similarity processing method, device and equipment and computer storage medium |
CN107885737A (en) * | 2017-12-27 | 2018-04-06 | 传神语联网网络科技股份有限公司 | A kind of human-computer interaction interpretation method and system |
CN107885737B (en) * | 2017-12-27 | 2021-04-27 | 传神语联网网络科技股份有限公司 | Man-machine interactive translation method and system |
CN111414531B (en) * | 2020-03-20 | 2023-08-08 | 北京百度网讯科技有限公司 | Event searching method and device and electronic equipment |
CN111708942B (en) * | 2020-06-12 | 2023-08-08 | 北京达佳互联信息技术有限公司 | Multimedia resource pushing method, device, server and storage medium |
CN111708942A (en) * | 2020-06-12 | 2020-09-25 | 北京达佳互联信息技术有限公司 | Multimedia resource pushing method, device, server and storage medium |
CN111881669A (en) * | 2020-06-24 | 2020-11-03 | 百度在线网络技术(北京)有限公司 | Synonymy text acquisition method and device, electronic equipment and storage medium |
CN112182348A (en) * | 2020-11-09 | 2021-01-05 | 百度国际科技(深圳)有限公司 | Semantic matching judgment method and device, electronic equipment and computer readable medium |
CN112182348B (en) * | 2020-11-09 | 2024-03-29 | 百度国际科技(深圳)有限公司 | Semantic matching judging method, device, electronic equipment and computer readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN104462060B (en) | 2017-08-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20180116. Address after: 100085 Beijing, Haidian District, No. ten on the ground floor, No. 10 Baidu building, floor 2. Patentee after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. Address before: 100085 Beijing, Haidian District, No. ten on the ground floor, No. 10 Baidu building, floor 3. Patentee before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. |