CN104462060B - Computer-implemented method and apparatus for calculating text similarity and for search processing - Google Patents

Computer-implemented method and apparatus for calculating text similarity and for search processing

Info

Publication number
CN104462060B
Authority
CN
China
Prior art keywords
text string
translation
semantic similarity
similarity value
string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410728432.4A
Other languages
Chinese (zh)
Other versions
CN104462060A (en)
Inventor
张军
吴先超
刘占一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410728432.4A priority Critical patent/CN104462060B/en
Publication of CN104462060A publication Critical patent/CN104462060A/en
Application granted granted Critical
Publication of CN104462060B publication Critical patent/CN104462060B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a computer-implemented method and apparatus for calculating text similarity and for search processing. The method includes: obtaining a first text string and a second text string; decoding the first text string according to a preset phrase translation model and a preset dependency structure model to obtain K translated text strings; calculating first semantic similarity values between each of the K translated text strings and the second text string, and calculating a second semantic similarity value between the first text string and the second text string from the K first semantic similarity values. This resolves long-distance dependencies within a sentence and better represents the semantics of a search query, so that the query can be better matched against web page titles, giving the user search result entries that match semantically and improving the user's search experience.

Description

Computer-implemented method and apparatus for calculating text similarity and for search processing
Technical field
The present invention relates to natural language processing technology, and in particular to a computer-implemented method and apparatus for calculating text similarity and for search processing.
Background technology
In a search engine, in order to match the search term (or query) entered by a user as well as possible against the text in each field of a document (for example, its title or content), the matching is usually performed with exact word matching.
There are also approaches that use a translation model: the title and the search term (query) are assumed to be written in two different sub-languages, and phrase translations such as "useful" to "effective" are used to achieve semantic matching. However, this approach cannot resolve long-distance dependencies in the target language; it can only perform shallow semantic matching and therefore cannot truly capture and represent the semantics of the search query. As a result, the search query is mismatched against web page titles, which affects the display and ranking of search results and in turn the user experience. For example, the sentence "why did Guan Yu not kill Cao Cao" is matched as "why did Cao Cao not kill Guan Yu": in the original sentence (query) "Guan Yu" is the subject and "Cao Cao" is the object, but because the long-distance dependency problem is not resolved, the search query and the web page title are matched only word by word, and the actual dependency relations of the sentence are not reflected.
Summary of the invention
It is an object of the present invention to provide a computer-implemented method and apparatus for calculating text similarity and for search processing that better capture non-local dependencies and resolve long-distance dependency relations, thereby achieving better matching.
According to one aspect of the present invention, there is provided a computer-implemented method for calculating text similarity, including: obtaining a first text string and a second text string; decoding the first text string according to a preset phrase translation model and a preset dependency structure model to obtain K translated text strings; and calculating first semantic similarity values between each of the K translated text strings and the second text string, and calculating a second semantic similarity value between the first text string and the second text string from the K calculated first semantic similarity values.
According to an aspect of the present invention, there is provided a search processing method, including: receiving a search term; obtaining a plurality of search result entries according to the search term; calculating, by the computer-implemented method for calculating text similarity, semantic similarity values between the search term and the content titles of the plurality of search result entries; sorting the plurality of search result entries according to the calculated semantic similarity values; and sending the sorted search result entries.
According to another aspect of the present invention, there is provided an apparatus for calculating text similarity, including: a text string acquiring unit for obtaining a first text string and a second text string; a text string decoding unit for decoding the first text string according to a preset phrase translation model and a preset dependency structure model to obtain K translated text strings; and a similarity value calculating unit for calculating first semantic similarity values between each of the K translated text strings and the second text string, and calculating a second semantic similarity value between the first text string and the second text string from the K calculated first semantic similarity values.
According to another aspect of the present invention, there is provided a search processing apparatus, including: a search term receiving unit for receiving a search term; a search result acquiring unit for obtaining a plurality of search result entries according to the search term; a semantic similarity value calculating unit for calculating, with the apparatus for calculating text similarity, semantic similarity values between the search term and the content titles of the plurality of search result entries; a sorting unit for sorting the plurality of search result entries according to the calculated semantic similarity values; and a sending unit for sending the sorted search result entries.
In the computer-implemented methods and apparatuses for calculating text similarity and for search processing provided by the embodiments of the present invention, the first text string (for example, a search keyword or query entered by a user) is decoded with a phrase translation model and a dependency structure model to obtain a plurality of translated text strings; first semantic similarity values between each of the translated text strings and the second text string (for example, the content title of a search result entry) are calculated, and a second semantic similarity value between the first text string and the second text string is calculated from the plurality of calculated first semantic similarity values. This resolves the long-distance dependency problem within text strings and allows the similarity between text strings to be calculated comprehensively and accurately.
In search technology, by performing the above semantic similarity calculation between a search term and the content titles of the search result entries obtained for it, the semantics of the search query can be better represented, and the returned search results can be ranked based on the similarity values together with the first text string, so that the best search results are obtained for the user to view. In this way, the long-distance dependency problem within text strings is resolved, the search query is better matched against web page titles, search result entries that match the query semantically are provided to the user, and the user's search experience is improved.
Brief description of the drawings
Fig. 1 is a flow chart of the computer-implemented method for calculating text similarity according to an exemplary embodiment of the present invention.
Fig. 2 is a diagram illustrating the dependency relations of a sentence according to an exemplary embodiment of the present invention.
Fig. 3 is a flow chart of the search processing method according to an exemplary embodiment of the present invention.
Fig. 4 is a structural block diagram of the apparatus for calculating text similarity according to an exemplary embodiment of the present invention.
Fig. 5 is a structural block diagram of the search processing apparatus according to an exemplary embodiment of the present invention.
Detailed description of the embodiments
The basic concept of the present invention is, in the information processing technology, to introduce a dependency structure model of the target language into the translation model so as to match semantic structure. In text matching, a text string is decoded with the translation model combined with the dependency structure model to produce the top-K translated text strings; these translated text strings are then compared and matched against the other text string to be compared, so that semantic structure is matched and semantic structural information is reinforced, and, through the semantic similarity calculation, the web page titles that match the search query are pushed to the user.
When a traditional phrase translation model translates a search term into top-K titles, an N-gram language model can be used to check whether the translated titles conform to the language conventions of the target language. In the present invention, in order to further examine the dependency structure of the target language, a dependency structure model is additionally introduced.
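For illustration, the following is a minimal Python sketch of such an N-gram (here bigram) language model check; the add-alpha smoothing and the vocabulary size are assumptions made for the sketch and are not part of the patent.

```python
import math
from collections import defaultdict

def train_bigram_lm(corpus):
    """Count unigrams and bigrams over a list of tokenised sentences."""
    unigrams, bigrams = defaultdict(int), defaultdict(int)
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        for w in tokens:
            unigrams[w] += 1
        for a, b in zip(tokens, tokens[1:]):
            bigrams[(a, b)] += 1
    return unigrams, bigrams

def lm_score(title, unigrams, bigrams, alpha=1.0, vocab_size=10000):
    """Add-alpha smoothed bigram log-probability of a candidate title,
    usable as the LM(T) term: higher means the title reads more like
    well-formed target-language text."""
    tokens = ["<s>"] + title + ["</s>"]
    logp = 0.0
    for a, b in zip(tokens, tokens[1:]):
        p = (bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab_size)
        logp += math.log(p)
    return logp
```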
Specifically, the dependency relations of a sentence S = (w1, w2, ..., wn) describe, between two words (wi, wj) in the sentence, the modification relation in which wj modifies wi, represented by the dependency arc (wi, wj). In addition, in order to describe chained modification relations, a special root node w0 is added, and its starting relation is represented as (w0, wi).
The dependency structure probability of the sentence S can be calculated by the following equation:
P(S) = ∏ p(wi, wj) / p(wi)
where the product runs over the dependency arcs of the sentence, p(wi, wj) is the probability of the dependency arc in which wj modifies wi, and p(wi) is the probability of the word wi occurring; p(wi, wj) and p(wi) can be obtained by counting over a pre-stored dependency treebank or over large-scale data, and i and j denote the positions at which the words appear in the sentence.
Fig. 2 is a diagram illustrating the dependency relations of a sentence according to an exemplary embodiment of the present invention. For example, with p(water conservancy) = 0.6 and p(water conservancy, engineering) = 0.5, the above equation gives 0.5 / 0.6 ≈ 0.83 for that arc.
By analogy, the dependency probability between each pair of words in the sentence can be calculated in the same way, and multiplying these dependency probabilities together gives the dependency structure probability of the whole sentence.
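A minimal Python sketch of this calculation follows; the per-arc ratio p(wi, wj)/p(wi) follows the definitions and the worked example above, and the probability tables are assumed to have been estimated from a dependency treebank or large-scale data.

```python
def dependency_structure_probability(arcs, p_arc, p_word):
    """arcs   : dependency arcs (w_i, w_j), meaning w_j modifies w_i;
                the root arc is (w0, w_i) with w0 the artificial root node.
       p_arc  : dict mapping (w_i, w_j) -> arc probability p(w_i, w_j)
       p_word : dict mapping w_i -> word probability p(w_i)
       Returns the product over all arcs of p(w_i, w_j) / p(w_i)."""
    prob = 1.0
    for head, modifier in arcs:
        prob *= p_arc[(head, modifier)] / p_word[head]
    return prob

# Worked example from the description of Fig. 2:
# p(water conservancy) = 0.6, p(water conservancy, engineering) = 0.5,
# so this single arc contributes 0.5 / 0.6 ≈ 0.83.
p_word = {"water conservancy": 0.6}
p_arc = {("water conservancy", "engineering"): 0.5}
print(dependency_structure_probability(
    [("water conservancy", "engineering")], p_arc, p_word))
```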
Based on the above calculation of the dependency structure probability of a sentence, a dependency structure model can be trained on a large number of dependency trees. The training of the dependency structure model is not the improvement made by the present invention and is therefore not described in detail here.
The computer-implemented method for calculating text similarity and the search processing method and apparatus of the exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flow chart of the computer-implemented method for calculating text similarity according to an exemplary embodiment of the present invention.
Referring to Fig. 1, in step S110, a first text string and a second text string are obtained. The first text string may be a search query entered by a user, and the second text string may be the web page title of a search result entry obtained for that query.
In step S120, the first text string is decoded according to a preset phrase translation model and a preset dependency structure model to obtain K translated text strings.
In natural language processing, statistical machine translation is one of the main machine translation approaches. Its basic idea is to treat machine translation as an information transfer process and to perform the translation by decoding with a channel model. According to a preferred embodiment of the present invention, the first text string can be decoded by a beam search decoder to obtain the top-K translated text strings.
Specifically, in the processing of step S120, the translated text strings corresponding to the first text string can be computed according to the phrase translation model, and the dependency structure between long-distance word pairs is then determined according to the dependency structure model, so as to determine whether the first text string and a translated text string are semantically similar.
Preferably, the first text string is decoded according to the phrase translation model, the dependency structure model, an N-gram language model and a word-order distortion model to obtain the top-K translated text strings. The word-order distortion model is a model used in natural language processing to examine the positional relationship between corresponding source-language and target-language phrases, and the traditional N-gram language model is used to estimate the probability that a sentence occurs. By scoring each candidate text string with the phrase translation model, the dependency structure model, the N-gram language model and the word-order distortion model, top-K translated text strings that are semantically closer can be produced.
Preferably, a composite score Score(T) is calculated for each candidate text string T by the following equation:
Score(T) = λ1·LM(T) + λ2·TM(Q,T) + λ3·D(Q,T) + λ4·DEP(T)
where LM(T) is the score of the translated text string T under the N-gram language model, TM(Q,T) is the probability score of translating the first text string Q into the translated text string T under the phrase translation model, D(Q,T) is the score of translating the first text string Q into the translated text string T calculated with the word-order distortion model, DEP(T) is the score of the translated text string T under the dependency structure model, and λ1–λ4 are the weights assigned to the scores of the four models, respectively. The K translated text strings are then selected from among the candidate text strings by this composite score.
Specifically, the beam search decoder sorts the candidate translated text strings by the composite score Score(T) and selects the K highest-scoring (TOP-K) translated text strings (TOP1, TOP2, TOP3, ..., TOPK). For example, if the first text string is "hard", the translated text strings obtained by the beam search decoder may include synonyms such as "hard", "firm" and "solid"; if the first text string is "peach", the translated text strings obtained by the beam search decoder may include "peach", "carambola" and "honey peach". The beam search decoder then selects, from these, the K translated text strings with the highest composite scores.
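A minimal Python sketch of this composite scoring and top-K selection follows; the model score functions and the weights are placeholders standing in for the trained phrase translation, language, distortion and dependency structure models, and a full beam search decoder is not shown.

```python
import heapq
from typing import Callable, List, Tuple

def composite_score(query: str, candidate: str,
                    lm: Callable[[str], float],
                    tm: Callable[[str, str], float],
                    distortion: Callable[[str, str], float],
                    dep: Callable[[str], float],
                    weights: Tuple[float, float, float, float]) -> float:
    """Score(T) = lambda1*LM(T) + lambda2*TM(Q,T) + lambda3*D(Q,T) + lambda4*DEP(T)."""
    l1, l2, l3, l4 = weights
    return (l1 * lm(candidate)
            + l2 * tm(query, candidate)
            + l3 * distortion(query, candidate)
            + l4 * dep(candidate))

def top_k_translations(query: str, candidates: List[str], k: int,
                       lm, tm, distortion, dep,
                       weights=(0.25, 0.25, 0.25, 0.25)) -> List[Tuple[float, str]]:
    """Keep the K highest-scoring candidate translated text strings (TOP1 ... TOPK)."""
    scored = [(composite_score(query, c, lm, tm, distortion, dep, weights), c)
              for c in candidates]
    return heapq.nlargest(k, scored)
```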
In step S130, first semantic similarity values between each of the K translated text strings and the second text string are calculated, and a second semantic similarity value between the first text string and the second text string is calculated from the K calculated first semantic similarity values.
Preferably, the first semantic similarity values between the K translated text strings and the second text string are calculated as follows. Specifically, the calculation of the first semantic similarity values includes:
First, at least one second dependency arc obtained by dependency analysis of the second text string is obtained. Here, because the second dependency arcs obtained from the second text string are used repeatedly, the at least one second dependency arc obtained by the dependency analysis of the second text string can be kept in a cache and reused, so that the dependency analysis does not have to be repeated each time the at least one second dependency arc is needed.
Then, for each translated text string, the following processing is performed: dependency analysis is performed on the translated text string to obtain at least one first dependency arc, and the first semantic similarity value between that translated text string and the second text string is calculated based on the at least one first dependency arc and the at least one second dependency arc.
Preferably, the cosine similarity between the at least one first dependency arc and the at least one second dependency arc is calculated as the first semantic similarity value between the translated text string and the second text string.
For example, K translated text strings are obtained in step S120, and in step S130 the first and second dependency arcs are obtained and their cosine similarity is calculated, yielding K first semantic similarity values. For instance, if the dependency arc sets of a translated text string t and the second text string w are denoted arcs(t) = {(t0, ti), ..., (ti, tj), ...} and arcs(w) = {(w0, wi), ..., (wi, wj), ...}, the cosine similarity (i.e. the first semantic similarity value) Similarity(t, w) between the translated text string t and the second text string w can be calculated by the following equation:
Similarity(t, w) = Σ numbersof(ti, tj)·numbersof(wi, wj) / ( sqrt(Σ numbersof(ti, tj)²) · sqrt(Σ numbersof(wi, wj)²) )
where the sum in the numerator runs over the arcs shared by arcs(t) and arcs(w), and numbersof(wi, wj) and numbersof(ti, tj) denote the numbers of occurrences of the dependency arcs (wi, wj) and (ti, tj), respectively.
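A minimal Python sketch of this arc-based cosine similarity follows; the arcs are assumed to come from a dependency parser, and the reuse of the second text string's arcs described above is indicated only in a comment.

```python
import math
from collections import Counter
from typing import List, Tuple

Arc = Tuple[str, str]  # (head, modifier) dependency arc; the root arc uses w0 as head

def arc_cosine_similarity(arcs_t: List[Arc], arcs_w: List[Arc]) -> float:
    """Cosine similarity between the arc-count vectors of a translated text
    string t and the second text string w, i.e. the first semantic similarity
    value Similarity(t, w)."""
    count_t, count_w = Counter(arcs_t), Counter(arcs_w)
    shared = set(count_t) & set(count_w)
    dot = sum(count_t[a] * count_w[a] for a in shared)
    norm_t = math.sqrt(sum(v * v for v in count_t.values()))
    norm_w = math.sqrt(sum(v * v for v in count_w.values()))
    if norm_t == 0.0 or norm_w == 0.0:
        return 0.0
    return dot / (norm_t * norm_w)

# The second text string's arcs can be parsed once, kept in a cache
# (e.g. a dict keyed by the second text string) and reused for all K
# translated text strings, as described above.
```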
Preferably, the score given to each translated text string by the dependency structure model is used as a weight, and a weighted sum of the K first semantic similarity values is calculated to obtain the second semantic similarity value between the first text string and the second text string.
For example, the second semantic similarity value can be calculated by the following equation:
Similarity(Q, w) = Σ DEP(t) · Similarity(t, w), summed over the K translated text strings t,
where DEP(t) is the score of the translated text string t under the dependency structure model, which can be obtained from the dependency structure probability calculation described above, and K is the number of translated text strings.
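A minimal Python sketch of this weighted combination follows; whether the DEP weights are normalised by their sum is an assumption made here for illustration and is not stated in the text above.

```python
from typing import List

def second_semantic_similarity(first_similarities: List[float],
                               dep_scores: List[float],
                               normalise: bool = True) -> float:
    """Weighted sum of the K first semantic similarity values, each weighted by
    the dependency structure model score DEP(t) of its translated text string t."""
    assert len(first_similarities) == len(dep_scores)
    weighted = sum(d * s for d, s in zip(dep_scores, first_similarities))
    if normalise:
        total = sum(dep_scores)
        return weighted / total if total > 0 else 0.0
    return weighted
```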
Fig. 3 is a flow chart of the search processing method according to an exemplary embodiment of the present invention.
Referring to Fig. 3, in step S210, a search term is received, i.e. the search keyword entered by the user in the search engine.
In step S220, a plurality of search result entries are obtained according to the search term. For example, if the search keyword received from the user in step S210 is "baby fever", the search result entries obtained may include "baby's cat fever", "baby has a fever", "neonate has a high fever", "child's fever" and "baby heating".
In step S230, semantic similarity values between the search term and the content titles of the plurality of search result entries are calculated by the method for calculating text similarity described above.
In step S240, the plurality of search result entries are sorted according to the semantic similarity values calculated in step S230.
Here, still taking "baby fever" as an example, suppose the semantic similarity values calculated in step S230 (denoted Similarity) are Similarity(baby's cat fever, baby fever) = 0.87, Similarity(baby has a fever, baby fever) = 0.71, Similarity(neonate has a high fever, baby fever) = 0.83, Similarity(child's fever, baby fever) = 0.65 and Similarity(baby heating, baby fever) = 0.79. Sorting the similarity values in descending order gives: Similarity(baby's cat fever, baby fever), Similarity(neonate has a high fever, baby fever), Similarity(baby heating, baby fever), Similarity(baby has a fever, baby fever), Similarity(child's fever, baby fever).
In step S250, the sorted search result entries are sent. In the "baby fever" example, the search result entries finally sent, in order, are: baby's cat fever, neonate has a high fever, baby heating, baby has a fever, and child's fever.
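Putting steps S210 to S250 together, a minimal Python sketch of this flow might look as follows; retrieve_entries and semantic_similarity are assumed names standing in for the retrieval backend and the similarity method described above.

```python
from typing import Callable, List

def process_search(query: str,
                   retrieve_entries: Callable[[str], List[str]],
                   semantic_similarity: Callable[[str, str], float]) -> List[str]:
    """S210: the query has been received; S220: fetch candidate result entries;
    S230: score each content title against the query; S240: sort by the
    semantic similarity value; S250: return the sorted entries for sending."""
    entries = retrieve_entries(query)                       # S220
    scored = [(semantic_similarity(query, title), title)    # S230
              for title in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)     # S240
    return [title for _, title in scored]                   # S250

# Example with the "baby fever" similarity values given above:
sims = {"baby's cat fever": 0.87, "neonate has a high fever": 0.83,
        "baby heating": 0.79, "baby has a fever": 0.71, "child's fever": 0.65}
print(process_search("baby fever",
                     lambda q: list(sims),
                     lambda q, title: sims[title]))
```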
In the computer-implemented methods for calculating text similarity and for search processing provided by the embodiments of the present invention, the first text string (for example, a search keyword or query entered by a user) is decoded with a phrase translation model and a dependency structure model to obtain a plurality of translated text strings; first semantic similarity values between each of the translated text strings and the second text string (for example, the content title of a search result entry) are calculated, and a second semantic similarity value between the first text string and the second text string is calculated from the plurality of calculated first semantic similarity values. This resolves the long-distance dependency problem within text strings and allows the similarity between text strings to be calculated comprehensively and accurately.
In search technology, by performing the above semantic similarity calculation between a search term and the content titles of the search result entries obtained for it, the semantics of the search query can be better represented, and the returned search results can be ranked based on the similarity values together with the first text string, so that the best search results are obtained for the user to view. In this way, the long-distance dependency problem within text strings is resolved, the search query is better matched against web page titles, search result entries that match the query semantically are provided to the user, and the user's search experience is improved.
Fig. 4 is a structural block diagram of the apparatus for calculating text similarity according to an exemplary embodiment of the present invention.
As shown in Fig. 4, the apparatus for calculating text similarity includes a text string acquiring unit 310, a text string decoding unit 320 and a similarity value calculating unit 330.
The text string acquiring unit 310 is used to obtain a first text string and a second text string.
For example, the first text string may be a search query entered by a user, and the second text string may be the web page title of a pre-stored document to be matched.
The text string decoding unit 320 is used to decode the first text string according to a preset phrase translation model and a preset dependency structure model to obtain K translated text strings.
Preferably, the text string decoding unit 320 decodes the first text string according to the phrase translation model, the dependency structure model, an N-gram language model and a word-order distortion model to obtain the K translated text strings, wherein the first text string is decoded by a beam search decoder to obtain the K translated text strings.
Preferably, the text string decoding unit 320 calculates a composite score Score(T) for each candidate text string T by the following equation:
Score(T) = λ1·LM(T) + λ2·TM(Q,T) + λ3·D(Q,T) + λ4·DEP(T)
where LM(T) is the score of the translated text string T under the N-gram language model, TM(Q,T) is the probability score of translating the first text string Q into the translated text string T under the phrase translation model, D(Q,T) is the score of translating the first text string Q into the translated text string T calculated with the word-order distortion model, DEP(T) is the score of the translated text string T under the dependency structure model, and λ1–λ4 are the weights assigned to the scores of the four models, respectively; the K translated text strings are selected from among the candidate text strings by this composite score.
The similarity value calculating unit 330 is used to calculate first semantic similarity values between each of the K translated text strings and the second text string, and to calculate a second semantic similarity value between the first text string and the second text string from the K calculated first semantic similarity values.
Preferably, the similarity value calculating unit 330 obtains at least one second dependency arc obtained by dependency analysis of the second text string, and, for each translated text string, performs the following processing: dependency analysis is performed on the translated text string to obtain at least one first dependency arc, and the first semantic similarity value between that translated text string and the second text string is calculated based on the at least one first dependency arc and the at least one second dependency arc.
Preferably, the similarity value calculating unit 330 calculates the cosine similarity between the at least one first dependency arc and the at least one second dependency arc as the first semantic similarity value between the translated text string and the second text string.
Preferably, the similarity value calculating unit 330 uses the score given to each translated text string by the dependency structure model as a weight and computes a weighted sum of the K first semantic similarity values to obtain the second semantic similarity value between the first text string and the second text string.
Fig. 5 is a structural block diagram of the search processing apparatus according to an exemplary embodiment of the present invention.
Referring to Fig. 5, the search processing apparatus includes a search term receiving unit 410, a search result acquiring unit 420, a semantic similarity value calculating unit 430, a sorting unit 440 and a sending unit 450.
The search term receiving unit 410 is used to receive a search term, i.e. the search keyword entered by the user in the search engine.
The search result acquiring unit 420 is used to obtain a plurality of search result entries according to the search term received by the search term receiving unit 410.
The semantic similarity value calculating unit 430 is used to calculate, with the apparatus for calculating text similarity described above, semantic similarity values between the search term and the content titles of the plurality of search result entries.
The sorting unit 440 is used to sort the plurality of search result entries according to the calculated semantic similarity values.
The sending unit 450 is used to send the sorted search result entries.
In the computer-implemented apparatuses for calculating text similarity and for search processing provided by the embodiments of the present invention, the first text string (for example, a search keyword or query entered by a user) is decoded with a phrase translation model and a dependency structure model to obtain a plurality of translated text strings; first semantic similarity values between each of the translated text strings and the second text string (for example, the content title of a search result entry) are calculated, and a second semantic similarity value between the first text string and the second text string is calculated from the plurality of calculated first semantic similarity values. This resolves the long-distance dependency problem within text strings and allows the similarity between text strings to be calculated comprehensively and accurately.
In search technology, by performing the above semantic similarity calculation between a search term and the content titles of the search result entries obtained for it, the semantics of the search query can be better represented, and the returned search results can be ranked based on the similarity values together with the first text string, so that the best search results are obtained for the user to view. In this way, the long-distance dependency problem within text strings is resolved, the search query is better matched against web page titles, search result entries that match the query semantically are provided to the user, and the user's search experience is improved.
It should be noted that, depending on the implementation, each step described in this application may be split into more steps, and two or more steps or parts of steps may also be combined into a new step, in order to achieve the object of the present invention.
The above method according to the present invention may be implemented in hardware or firmware, or implemented as software or computer code storable in a recording medium (such as a CD-ROM, a RAM, a floppy disk, a hard disk or a magneto-optical disk), or implemented as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network and stored in a local recording medium, so that the method described here can be processed by such software, stored on a recording medium, using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware (such as an ASIC or an FPGA). It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes storage components (for example, RAM, ROM, flash memory, etc.) that can store or receive software or computer code, and when the software or computer code is accessed and executed by the computer, processor or hardware, the processing methods described here are implemented. Furthermore, when a general-purpose computer accesses code for implementing the processing shown here, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the processing shown here.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any person familiar with the art can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these should all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the claims.

Claims (14)

1. A computer-implemented method for calculating text similarity, characterized in that the method includes:
obtaining a first text string and a second text string;
decoding the first text string according to a preset phrase translation model and a preset dependency structure model to obtain K translated text strings;
calculating first semantic similarity values between each of the K translated text strings and the second text string, and calculating a second semantic similarity value between the first text string and the second text string from the K calculated first semantic similarity values, wherein the processing of calculating the second semantic similarity value between the first text string and the second text string from the K calculated first semantic similarity values includes: using the score given to each translated text string by the dependency structure model as a weight, computing a weighted sum of the K first semantic similarity values to obtain the second semantic similarity value between the first text string and the second text string.
2. The method according to claim 1, characterized in that the processing of calculating the first semantic similarity values between the K translated text strings and the second text string includes:
obtaining at least one second dependency arc obtained by dependency analysis of the second text string, and, for each translated text string, performing the following processing:
performing dependency analysis on the translated text string to obtain at least one first dependency arc,
calculating the first semantic similarity value between the translated text string and the second text string based on the at least one first dependency arc and the at least one second dependency arc.
3. The method according to claim 2, characterized in that the processing of calculating the first semantic similarity value between the translated text string and the second text string based on the at least one first dependency arc and the at least one second dependency arc includes:
calculating the cosine similarity between the at least one first dependency arc and the at least one second dependency arc as the first semantic similarity value between the translated text string and the second text string.
4. The method according to claim 1, characterized in that the processing of decoding the first text string according to the preset phrase translation model and the preset dependency structure model to obtain the K translated text strings includes:
decoding the first text string according to the phrase translation model, the dependency structure model, an N-gram language model and a word-order distortion model to obtain the K translated text strings.
5. The method according to claim 4, characterized in that, in the processing of decoding the first text string Q according to the phrase translation model, the dependency structure model, the N-gram language model and the word-order distortion model to obtain the K translated text strings T,
a composite score Score(T) is calculated for each candidate text string T by the following equation:
Score(T) = λ1·LM(T) + λ2·TM(Q,T) + λ3·D(Q,T) + λ4·DEP(T)
where LM(T) is the score of the translated text string T under the N-gram language model, TM(Q,T) is the probability score of translating the first text string Q into the translated text string T under the phrase translation model, D(Q,T) is the score of translating the first text string Q into the translated text string T calculated with the word-order distortion model, DEP(T) is the score of the translated text string T under the dependency structure model, and λ1–λ4 are the weights assigned to the scores of the four models, respectively,
and the K translated text strings are selected from among the candidate text strings by the composite score.
6. The method according to claim 5, characterized in that the first text string is decoded by a beam search decoder to obtain the K translated text strings.
7. A search processing method, characterized by including:
receiving a search term;
obtaining a plurality of search result entries according to the search term;
calculating, by the method according to any one of claims 1 to 6, semantic similarity values between the search term and the content titles of the plurality of search result entries;
sorting the plurality of search result entries according to the calculated semantic similarity values;
sending the sorted search result entries.
8. An apparatus for calculating text similarity, characterized in that the apparatus includes:
a text string acquiring unit for obtaining a first text string and a second text string;
a text string decoding unit for decoding the first text string according to a preset phrase translation model and a preset dependency structure model to obtain K translated text strings;
a similarity value calculating unit for calculating first semantic similarity values between each of the K translated text strings and the second text string, and calculating a second semantic similarity value between the first text string and the second text string from the K calculated first semantic similarity values, wherein the similarity value calculating unit uses the score given to each translated text string by the dependency structure model as a weight and computes a weighted sum of the K first semantic similarity values to obtain the second semantic similarity value between the first text string and the second text string.
9. The apparatus according to claim 8, characterized in that the similarity value calculating unit obtains at least one second dependency arc obtained by dependency analysis of the second text string,
and, for each translated text string, performs the following processing:
performing dependency analysis on the translated text string to obtain at least one first dependency arc,
calculating the first semantic similarity value between the translated text string and the second text string based on the at least one first dependency arc and the at least one second dependency arc.
10. The apparatus according to claim 9, characterized in that the similarity value calculating unit calculates the cosine similarity between the at least one first dependency arc and the at least one second dependency arc as the first semantic similarity value between the translated text string and the second text string.
11. The apparatus according to claim 8, characterized in that the text string decoding unit decodes the first text string according to the phrase translation model, the dependency structure model, an N-gram language model and a word-order distortion model to obtain the K translated text strings.
12. The apparatus according to claim 11, characterized in that the text string decoding unit calculates a composite score Score(T) for each candidate text string T by the following equation:
Score(T) = λ1·LM(T) + λ2·TM(Q,T) + λ3·D(Q,T) + λ4·DEP(T)
where LM(T) is the score of the translated text string T under the N-gram language model, TM(Q,T) is the probability score of translating the first text string Q into the translated text string T under the phrase translation model, D(Q,T) is the score of translating the first text string Q into the translated text string T calculated with the word-order distortion model, DEP(T) is the score of the translated text string T under the dependency structure model, and λ1–λ4 are the weights assigned to the scores of the four models, respectively,
and the K translated text strings are selected from among the candidate text strings by the composite score.
13. The apparatus according to claim 12, characterized in that the first text string is decoded by a beam search decoder to obtain the K translated text strings.
14. A search processing apparatus, characterized by including:
a search term receiving unit for receiving a search term;
a search result acquiring unit for obtaining a plurality of search result entries according to the search term;
a semantic similarity value calculating unit for calculating, with the apparatus according to any one of claims 8 to 13, semantic similarity values between the search term and the content titles of the plurality of search result entries;
a sorting unit for sorting the plurality of search result entries according to the calculated semantic similarity values;
a sending unit for sending the sorted search result entries.
CN201410728432.4A 2014-12-03 2014-12-03 Computer-implemented method and apparatus for calculating text similarity and for search processing Active CN104462060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410728432.4A CN104462060B (en) 2014-12-03 2014-12-03 Computer-implemented method and apparatus for calculating text similarity and for search processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410728432.4A CN104462060B (en) 2014-12-03 2014-12-03 Computer-implemented method and apparatus for calculating text similarity and for search processing

Publications (2)

Publication Number Publication Date
CN104462060A CN104462060A (en) 2015-03-25
CN104462060B true CN104462060B (en) 2017-08-01

Family

ID=52908130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410728432.4A Active CN104462060B (en) 2014-12-03 2014-12-03 Pass through computer implemented calculating text similarity and search processing method and device

Country Status (1)

Country Link
CN (1) CN104462060B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021346B (en) * 2016-05-09 2020-01-07 北京百度网讯科技有限公司 Retrieval processing method and device
CN106227771B (en) * 2016-07-15 2019-05-07 浙江大学 A kind of domain expert's discovery method based on socialization programming website
CN107784037B (en) * 2016-08-31 2022-02-01 北京搜狗科技发展有限公司 Information processing method and device, and device for information processing
CN106503175B (en) * 2016-11-01 2019-03-29 上海智臻智能网络科技股份有限公司 Inquiry, problem extended method, device and the robot of Similar Text
CN106776782B (en) * 2016-11-21 2020-05-22 北京百度网讯科技有限公司 Semantic similarity obtaining method and device based on artificial intelligence
US10699302B2 (en) 2017-03-29 2020-06-30 Ebay Generating keywords by associative context with input words
CN107729300B (en) * 2017-09-18 2021-12-24 百度在线网络技术(北京)有限公司 Text similarity processing method, device and equipment and computer storage medium
CN107885737B (en) * 2017-12-27 2021-04-27 传神语联网网络科技股份有限公司 Man-machine interactive translation method and system
CN111708942B (en) * 2020-06-12 2023-08-08 北京达佳互联信息技术有限公司 Multimedia resource pushing method, device, server and storage medium
CN111881669B (en) * 2020-06-24 2023-06-09 百度在线网络技术(北京)有限公司 Synonymous text acquisition method and device, electronic equipment and storage medium
CN112182348B (en) * 2020-11-09 2024-03-29 百度国际科技(深圳)有限公司 Semantic matching judging method, device, electronic equipment and computer readable medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101169780A (en) * 2006-10-25 2008-04-30 华为技术有限公司 Semantic ontology retrieval system and method
CN101286161A (en) * 2008-05-28 2008-10-15 华中科技大学 Intelligent Chinese request-answering system based on concept
CN102184169A (en) * 2011-04-20 2011-09-14 北京百度网讯科技有限公司 Method, device and equipment used for determining similarity information among character string information
CN102567306A (en) * 2011-11-07 2012-07-11 苏州大学 Acquisition method and acquisition system for similarity of vocabularies between different languages
CN102637163A (en) * 2011-01-09 2012-08-15 华东师范大学 Method and system for controlling multi-level ontology matching based on semantemes
CN102737013A (en) * 2011-04-02 2012-10-17 三星电子(中国)研发中心 Device and method for identifying statement emotion based on dependency relation
EP2541435A1 (en) * 2010-02-26 2013-01-02 National Institute of Information and Communication Technology Relational information expansion device, relational information expansion method and program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001282786A (en) * 2000-03-27 2001-10-12 Internatl Business Mach Corp <Ibm> System and method for machine translation and storage medium with program for executing the same method stored thereon

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101169780A (en) * 2006-10-25 2008-04-30 华为技术有限公司 Semantic ontology retrieval system and method
CN101286161A (en) * 2008-05-28 2008-10-15 华中科技大学 Intelligent Chinese request-answering system based on concept
EP2541435A1 (en) * 2010-02-26 2013-01-02 National Institute of Information and Communication Technology Relational information expansion device, relational information expansion method and program
CN102637163A (en) * 2011-01-09 2012-08-15 华东师范大学 Method and system for controlling multi-level ontology matching based on semantemes
CN102737013A (en) * 2011-04-02 2012-10-17 三星电子(中国)研发中心 Device and method for identifying statement emotion based on dependency relation
CN102184169A (en) * 2011-04-20 2011-09-14 北京百度网讯科技有限公司 Method, device and equipment used for determining similarity information among character string information
CN102567306A (en) * 2011-11-07 2012-07-11 苏州大学 Acquisition method and acquisition system for similarity of vocabularies between different languages

Also Published As

Publication number Publication date
CN104462060A (en) 2015-03-25

Similar Documents

Publication Publication Date Title
CN104462060B (en) Computer-implemented method and apparatus for calculating text similarity and for search processing
Thakur et al. Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models
Do et al. Developing a BERT based triple classification model using knowledge graph embedding for question answering system
US8346756B2 (en) Calculating valence of expressions within documents for searching a document index
JP2020522044A5 (en)
CN104199833B (en) The clustering method and clustering apparatus of a kind of network search words
US11977854B2 (en) Computer implemented methods for the automated analysis or use of data, including use of a large language model
US11829725B2 (en) Computer implemented method for the automated analysis or use of data
US11989507B2 (en) Computer implemented methods for the automated analysis or use of data, including use of a large language model
US11989527B2 (en) Computer implemented methods for the automated analysis or use of data, including use of a large language model
Tomar et al. Towards Twitter hashtag recommendation using distributed word representations and a deep feed forward neural network
US20230259705A1 (en) Computer implemented methods for the automated analysis or use of data, including use of a large language model
CN111813923B (en) Text summarization method, electronic device and storage medium
US20230274089A1 (en) Computer implemented methods for the automated analysis or use of data, including use of a large language model
Bhutani et al. Open information extraction from question-answer pairs
Omeliyanenko et al. Lm4kg: Improving common sense knowledge graphs with language models
Sotudeh et al. Guir at semeval-2020 task 12: Domain-tuned contextualized models for offensive language detection
Ayoobkhan et al. Web page recommendation system by integrating ontology and stemming algorithm
Mao et al. DIGAT: modeling news recommendation with dual-graph interaction
González et al. ELiRF-UPV at SemEval-2019 task 3: snapshot ensemble of hierarchical convolutional neural networks for contextual emotion detection
Nambiar et al. Attention based abstractive summarization of malayalam document
Calizzano et al. Ordering sentences and paragraphs with pre-trained encoder-decoder transformers and pointer ensembles
Pedraza et al. Automatic service retrieval in converged environments based on natural language request
CN116226677B (en) Parallel corpus construction method and device, storage medium and electronic equipment
Xi et al. Multi-Feature and Multi-Channel GCNs for Aspect Based Sentiment Analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20180116

Address after: 2nd floor, Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing 100085

Patentee after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Address before: 3rd floor, Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing 100085

Patentee before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.