CN107688563A - Synonym recognition method and recognition device - Google Patents
Synonym recognition method and recognition device
- Publication number
- CN107688563A CN201610641371.7A
- Authority
- CN
- China
- Prior art keywords
- segmented word
- similarity
- query result
- address
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a synonym recognition method and recognition device that improve the accuracy of synonym recognition and thereby the user's query experience. The method is as follows: for a first segmented word and a second segmented word that belong to the same category, the address similarity and the literal similarity between the two segmented words are calculated; a comprehensive similarity between them is then calculated from the address similarity and the literal similarity; and when the comprehensive similarity is determined to be not less than a preset threshold, the first segmented word and the second segmented word are judged to be synonyms of each other. Because both the address similarity and the literal similarity between the two segmented words are taken into account, the calculated comprehensive similarity is more accurate, and so is the synonym recognition result. Moreover, calculating the comprehensive similarity only for two segmented words that belong to the same category further improves the accuracy of synonym recognition.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a synonym recognition method and recognition device.
Background
Synonyms include not only words with the same or similar meanings but also words with related meanings. For example, two different words that both mean "potato" are synonyms with the same meaning, "strict" and "severe" are synonyms with similar meanings, and "employment" and "recruitment" are synonyms with related meanings.
In practice, in the Internet field and particularly in query search, synonym mining is very important work: it helps to understand the query information entered by the user in depth, to enrich the query results, and to provide the user with a better query experience. At present there are two main ways to obtain synonyms. In one, language experts compile a thesaurus from their accumulated knowledge of words; in the other, semantic analysis technology is used to measure the relatedness of words and mine synonyms automatically. Because obtaining synonyms manually consumes substantial human and material resources and is relatively inefficient, automatic recognition of synonyms by semantic analysis is becoming more and more common.
The prior art proposes the following two methods for automatic synonym recognition:
First method: after determining that the minimum edit distance between the two Chinese words to be recognized is less than or equal to an edit distance threshold, judge whether the two Chinese words are synonyms by checking whether both of them exist in a preset thesaurus.
Second method: first divide each piece of query information in the query log into words, and pair each resulting word with the result addresses in the query log to form word-address matching pairs. Screen all matching pairs according to how frequently users query each matching pair and the number of matching pairs corresponding to each result address, and form a set from the matching pairs that pass the screening. Then, for a given result address, look up the words in the set that are matched with that result address, and treat the words found as synonyms.
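As a rough illustration of the first prior-art method, the sketch below computes the minimum edit distance with the standard Levenshtein dynamic program and then applies the thesaurus check. The function names, the `thesaurus` set, and the `max_distance` parameter are hypothetical and are not taken from the cited prior art.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b (Levenshtein)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def prior_art_is_synonym(a, b, thesaurus, max_distance):
    """Assumed reading of the first method: the words must be close in edit
    distance AND both must already be present in the preset thesaurus."""
    return edit_distance(a, b) <= max_distance and a in thesaurus and b in thesaurus
```

The sketch makes the dependence on the thesaurus visible: a pair of words that is missing from `thesaurus` can never be recognized, however close the two words are.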
Based on the above analysis, the synonym recognition methods proposed in the prior art have the following drawbacks:
(1) For the first method: if two words are synonyms but are not close in their text, that is, the edit distance between them is large, the two synonyms may go unrecognized; if two words are not synonyms but are very close in their text, that is, the edit distance between them is small, they may be wrongly recognized as synonyms. For example, the edit distance between "Chanel" and "double C" is large, but they are synonyms; conversely, the edit distance between "milk" and "milk cow" is small, but they are not synonyms. Moreover, in the Internet era, as the volume of text grows explosively, new words appear constantly. If the synonym recognition method relies too heavily on a thesaurus compiled in advance, newly coined synonyms may go unrecognized because the thesaurus covers only a limited vocabulary.
(2) For the second method: although this method depends neither on a thesaurus nor on the edit distance between two words, and improves recognition accuracy relative to the first method, the recognition algorithm is rather simple. It has no quantified value with which to measure how similar two candidate synonyms are, so the accuracy of the recognized synonyms is still low, which in turn degrades the user's query experience.
Summary of the invention
Embodiments of the present invention provide a synonym recognition method and recognition device to solve the problem that synonym recognition methods in the prior art have low recognition accuracy, which in turn degrades the user's query experience.
The specific technical solutions provided by the embodiments of the present invention are as follows:
A synonym recognition method, including:
For a first segmented word and a second segmented word that belong to the same category, calculating the address similarity between the first segmented word and the second segmented word, where the address similarity characterizes the similarity between a first user-clicked query result address set corresponding to the first segmented word and a second user-clicked query result address set corresponding to the second segmented word;
Calculating the literal similarity between the first segmented word and the second segmented word, where the literal similarity characterizes the similarity between a first character group contained in the first segmented word and a second character group contained in the second segmented word;
Calculating the comprehensive similarity between the first segmented word and the second segmented word based on the address similarity and the literal similarity;
When the comprehensive similarity is determined to be not less than a preset threshold, judging that the first segmented word and the second segmented word are synonyms of each other.
Preferably, before calculating the address similarity between the first segmented word and the second segmented word that belong to the same category, the method further includes:
Collecting user query logs, where a user query log includes at least: the query information entered by the user, all query result addresses displayed to the user based on the query information, and all query result addresses clicked by the user;
Performing word segmentation on each piece of query information within a preset time range to obtain the corresponding segmented words, and counting, for each segmented word, all the user-clicked query result addresses that correspond to it;
Generating, for each segmented word, a corresponding user-clicked query result address set based on the segmented word and all the user-clicked query result addresses corresponding to it.
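The preparation steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the log record layout (`query`, `shown`, `clicked`), the `segment` callback, and the toy URLs are all assumptions, and filtering by the preset time range is omitted.

```python
from collections import defaultdict


def build_click_sets(log_entries, segment):
    """Map each segmented word to the set of result addresses that users
    clicked after issuing a query containing that word."""
    click_sets = defaultdict(set)
    for entry in log_entries:
        for word in segment(entry["query"]):
            # Only clicked addresses are kept; addresses that were merely
            # displayed ("shown") are ignored.
            click_sets[word].update(entry["clicked"])
    return dict(click_sets)


# Toy log with whitespace segmentation standing in for a real word segmenter.
log = [
    {"query": "flower shop", "shown": ["u1", "u2", "u3"], "clicked": ["u1", "u2"]},
    {"query": "flower market", "shown": ["u2", "u4"], "clicked": ["u4"]},
]
click_sets = build_click_sets(log, str.split)
```

A production system would replace `str.split` with a proper Chinese word segmenter and would restrict `log_entries` to the preset time range before calling the function.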
Preferably, calculating the address similarity between the first segmented word and the second segmented word includes:
Calculating a first query result address total based on the domain names of all user-clicked query result addresses contained in the first user-clicked query result address set and the domain names of all user-clicked query result addresses contained in the second user-clicked query result address set, where the first query result address total characterizes the total number of query result addresses whose domain names are identical between the first and the second user-clicked query result address sets;
Calculating a second query result address total based on the number of user-clicked query result addresses contained in the first user-clicked query result address set and the number contained in the second user-clicked query result address set, where the second query result address total characterizes the total number of query result addresses across the first and the second user-clicked query result address sets;
Calculating the address similarity between the first segmented word and the second segmented word based on the first query result address total and the second query result address total.
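One possible reading of this calculation is sketched below. Two points are assumptions rather than statements of the text: the two totals are combined as a ratio (first total divided by second total), and the domain name is taken to be the host name returned by `urllib.parse.urlparse`. The function names and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def domain(url):
    """Host name of a URL, used here as its domain name (an assumption)."""
    return urlparse(url).hostname


def address_similarity(clicked_a, clicked_b):
    """Ratio of the number of shared domain names to the summed sizes of the
    two user-clicked address sets."""
    domains_a = {domain(u) for u in clicked_a}
    domains_b = {domain(u) for u in clicked_b}
    first_total = len(domains_a & domains_b)         # identical domain names
    second_total = len(clicked_a) + len(clicked_b)   # all clicked addresses
    return first_total / second_total if second_total else 0.0


set_1 = {"https://a.example/x", "https://b.example/y", "https://d.example/z"}
set_2 = set_1 | {"https://c.example/w"}
similarity = address_similarity(set_1, set_2)  # 3 shared domains over 3 + 4 addresses
```

Under this reading two identical sets score 0.5 rather than 1, because the denominator adds the two set sizes instead of counting distinct addresses; any threshold would need to be chosen with that scale in mind.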
Preferably, calculating the literal similarity between the first segmented word and the second segmented word includes:
Counting all identical characters between the first character group and the second character group, and determining from them the total number of identical characters between the first segmented word and the second segmented word;
Determining the minimum character total, that is, the smaller of the first character total of the first character group and the second character total of the second character group;
Calculating the literal similarity between the first segmented word and the second segmented word based on the identical character total and the minimum character total.
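A minimal sketch of the literal similarity follows. It assumes that the two totals are combined as a ratio (identical characters divided by the minimum character total) and that repeated characters are matched as a multiset; neither detail is spelled out in the text above, and the function name is hypothetical.

```python
from collections import Counter


def literal_similarity(word_a, word_b):
    """Shared character count divided by the length of the shorter word."""
    if not word_a or not word_b:
        return 0.0
    # Counter intersection keeps, per character, the smaller of the two counts.
    identical_total = sum((Counter(word_a) & Counter(word_b)).values())
    minimum_total = min(len(word_a), len(word_b))
    return identical_total / minimum_total
```

For instance, `literal_similarity("milk", "silk")` gives 0.75 (three shared characters over a minimum length of four), while `literal_similarity("milk", "milk cow")` gives 1.0 even though the two are not synonyms, which suggests why the literal measure is combined with the address similarity instead of being used alone.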
Preferably, calculating the comprehensive similarity between the first segmented word and the second segmented word based on the address similarity and the literal similarity includes:
Determining a first constant that characterizes the weight of the address similarity and a second constant that characterizes the weight of the literal similarity, where the sum of the first constant and the second constant is 1;
Calculating the comprehensive similarity between the first segmented word and the second segmented word based on the address similarity and the first constant, and the literal similarity and the second constant.
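The weighting step can be illustrated with the sketch below. It assumes a linear weighted sum, which is one natural way to use two weights that add up to 1; the parameter name `alpha`, its default value, and the default threshold are hypothetical.

```python
def comprehensive_similarity(address_sim, literal_sim, alpha=0.6):
    """Weighted sum of the two similarities; alpha is the first constant and
    1 - alpha is the second constant, so the weights always sum to 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * address_sim + (1.0 - alpha) * literal_sim


def are_synonyms(address_sim, literal_sim, threshold=0.5, alpha=0.6):
    """Two segmented words are judged synonyms when the comprehensive
    similarity is not less than the preset threshold."""
    return comprehensive_similarity(address_sim, literal_sim, alpha) >= threshold
```

The comparison uses `>=` so that a score exactly equal to the threshold still counts, matching the "not less than" wording.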
A synonym recognition device, including:
A first calculation unit, configured to calculate, for a first segmented word and a second segmented word that belong to the same category, the address similarity between the first segmented word and the second segmented word, where the address similarity characterizes the similarity between a first user-clicked query result address set corresponding to the first segmented word and a second user-clicked query result address set corresponding to the second segmented word;
A second calculation unit, configured to calculate the literal similarity between the first segmented word and the second segmented word, where the literal similarity characterizes the similarity between a first character group contained in the first segmented word and a second character group contained in the second segmented word;
A third calculation unit, configured to calculate the comprehensive similarity between the first segmented word and the second segmented word based on the address similarity and the literal similarity;
A recognition unit, configured to judge, when the comprehensive similarity is determined to be not less than a preset threshold, that the first segmented word and the second segmented word are synonyms of each other.
Preferably, the recognition device further includes a collection unit, a preprocessing unit, and a set generation unit, where, before the first calculation unit calculates the address similarity between the first segmented word and the second segmented word that belong to the same category:
The collection unit is configured to collect user query logs, where a user query log includes at least: the query information entered by the user, all query result addresses displayed to the user based on the query information, and all query result addresses clicked by the user;
The preprocessing unit is configured to perform word segmentation on each piece of query information within a preset time range to obtain the corresponding segmented words, and to count, for each segmented word, all the user-clicked query result addresses that correspond to it;
The set generation unit is configured to generate, for each segmented word, a corresponding user-clicked query result address set based on the segmented word and all the user-clicked query result addresses corresponding to it.
Preferably, when calculating the address similarity between the first segmented word and the second segmented word, the first calculation unit is specifically configured to:
Calculate a first query result address total based on the domain names of all user-clicked query result addresses contained in the first user-clicked query result address set and the domain names of all user-clicked query result addresses contained in the second user-clicked query result address set, where the first query result address total characterizes the total number of query result addresses whose domain names are identical between the first and the second user-clicked query result address sets;
Calculate a second query result address total based on the number of user-clicked query result addresses contained in the first user-clicked query result address set and the number contained in the second user-clicked query result address set, where the second query result address total characterizes the total number of query result addresses across the first and the second user-clicked query result address sets;
Calculate the address similarity between the first segmented word and the second segmented word based on the first query result address total and the second query result address total.
Preferably, when calculating the literal similarity between the first segmented word and the second segmented word, the second calculation unit is specifically configured to:
Count all identical characters between the first character group and the second character group, and determine from them the total number of identical characters between the first segmented word and the second segmented word;
Determine the minimum character total, that is, the smaller of the first character total of the first character group and the second character total of the second character group;
Calculate the literal similarity between the first segmented word and the second segmented word based on the identical character total and the minimum character total.
Preferably, when calculating the comprehensive similarity between the first segmented word and the second segmented word based on the address similarity and the literal similarity, the third calculation unit is specifically configured to:
Determine a first constant that characterizes the weight of the address similarity and a second constant that characterizes the weight of the literal similarity, where the sum of the first constant and the second constant is 1;
Calculate the comprehensive similarity between the first segmented word and the second segmented word based on the address similarity and the first constant, and the literal similarity and the second constant.
The beneficial effects of the embodiments of the present invention are as follows:
In the embodiments of the present invention, whether two segmented words are synonyms can be judged by calculating the comprehensive similarity between them. The approach applies to synonym recognition between any two segmented words and no longer depends on a thesaurus compiled in advance, which avoids the problem that newly coined synonyms cannot be recognized because the thesaurus covers only a limited vocabulary. Moreover, because both the address similarity and the literal similarity between the two segmented words are taken into account, the calculated comprehensive similarity is more accurate, which improves the accuracy of synonym recognition. Furthermore, calculating the comprehensive similarity only for two segmented words that belong to the same category improves the accuracy of synonym recognition further.
Brief description of the drawings
Fig. 1 is an overview schematic diagram of the synonym recognition method in an embodiment of the present invention;
Fig. 2 is a detailed flow diagram of the synonym recognition method in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the functional structure of the synonym recognition device in an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To solve the problem that synonym recognition methods in the prior art have low recognition accuracy, which in turn degrades the user's query experience, in the embodiments of the present invention, for a first segmented word and a second segmented word that belong to the same category, the address similarity and the literal similarity between the two are calculated first; the comprehensive similarity between them is then calculated from the address similarity and the literal similarity; finally, when the comprehensive similarity is determined to be not less than a preset threshold, the first segmented word and the second segmented word can be judged to be synonyms of each other.
The solution of the present invention is described in detail below through specific embodiments; of course, the present invention is not limited to the following embodiments.
Referring to Fig. 1, the synonym recognition method provided by the embodiments of the present invention can be applied to, but is not limited to, a search engine server. Specifically, the flow of the synonym recognition method used by the search engine server is as follows:
Step 100: for a first segmented word and a second segmented word that belong to the same category, calculate the address similarity between the first segmented word and the second segmented word, where the address similarity characterizes the similarity between a first user-clicked query result address set corresponding to the first segmented word and a second user-clicked query result address set corresponding to the second segmented word.
In practice, before performing step 100, the search engine server may also perform, but is not limited to, the following steps:
First, the search engine server collects user query logs in real time, where a user query log includes at least: the query information entered by the user, all query result addresses displayed to the user based on the query information, and all query result addresses clicked by the user.
Then, the search engine server performs word segmentation on each piece of query information within a preset time range to obtain the corresponding segmented words, classifies the segmented words, and, for each segmented word in each category, counts all the user-clicked query result addresses corresponding to it.
It is worth noting that, before performing word segmentation on the query information within the preset time range, the search engine server may also preprocess each piece of query information, for example by removing special characters and stop words. For example, for the query information "fresh flower shop (Zhichun Road shop)", the search engine server can remove the brackets; for the query information "the fresh flower shop of Zhichun Road", the search engine server can remove the stop word "of"; and so on. The methods for removing special characters and stop words are the same as in the prior art and are not described again here.
Finally, the search engine server generates, for each segmented word, a corresponding user-clicked query result address set based on the segmented word and all the user-clicked query result addresses corresponding to it.
For example, in user log information 1 collected by the search engine server, query information 1 entered by the user is: the fresh flower shop of Haidian District Zhichun Road; all the query result addresses displayed to the user by the search engine server are: URL (Uniform Resource Locator) 1, URL 2, URL 3, URL 4 and URL 5; and the query result addresses clicked by users are: URL 1, URL 2 and URL 4.
In user log information 2 collected by the search engine server, query information 2 entered by the user is: Haidian fresh flower shop (Zhichun Road shop); all the query result addresses displayed to the user by the search engine server are: URL 1, URL 2, URL 3, URL 4 and URL 5; and the query result addresses clicked by users are: URL 1, URL 2, URL 3 and URL 4.
For all the query information within 1 hour (that is, within the preset time range; assume it consists of query information 1 and query information 2), the search engine server removes the stop word "of" from query information 1 to obtain the processed query information 1, "Haidian District Zhichun Road fresh flower shop", and removes the brackets from query information 2 to obtain the processed query information 2, "Haidian fresh flower shop Zhichun Road shop".
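The cleanup applied to the two queries can be sketched as below, using the English renderings of the queries as stand-ins for the original text. The character class, the stop-word list, and the function name are assumptions chosen for this illustration; the text itself only says that prior-art methods are used.

```python
import re

# Brackets (ASCII and full-width) treated as special characters to strip.
SPECIAL_CHARS = re.compile(r"[()\[\]{}（）【】]")
# Tiny illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "of"}


def clean_query(query):
    """Remove special characters, then drop stop words."""
    without_special = SPECIAL_CHARS.sub(" ", query)
    kept = [w for w in without_special.split() if w.lower() not in STOP_WORDS]
    return " ".join(kept)


cleaned_1 = clean_query("the fresh flower shop of Zhichun Road")
cleaned_2 = clean_query("fresh flower shop (Zhichun Road shop)")
```

Here `cleaned_1` is "fresh flower shop Zhichun Road" and `cleaned_2` is "fresh flower shop Zhichun Road shop". Whitespace tokenization only works for the English stand-ins; Chinese text would need stop-word removal at the character or segmenter level.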
The search engine server performs word segmentation on query information 1, "Haidian District Zhichun Road fresh flower shop", and obtains the segmented words: Haidian District, and Zhichun Road fresh flower shop. It performs word segmentation on query information 2, "Haidian fresh flower shop Zhichun Road shop", and obtains the segmented words: Haidian District, and fresh flower shop Zhichun Road shop. That is, the segmented words obtained by the search engine server are: Haidian District, Zhichun Road fresh flower shop, and fresh flower shop Zhichun Road shop.
The search engine server classifies the 3 segmented words obtained; for example, it assigns the segmented word "Haidian District" to the "area" category, and assigns the segmented words "Zhichun Road fresh flower shop" and "fresh flower shop Zhichun Road shop" to the "sweets shop" category.
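Because similarity is only calculated for two segmented words in the same category, candidate pairs can be generated per category. The sketch below assumes the classification result is available as a word-to-category mapping; how the categories are assigned is not described here, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(word_to_category):
    """Return every unordered pair of segmented words that share a category."""
    by_category = defaultdict(list)
    for word, category in word_to_category.items():
        by_category[category].append(word)
    pairs = []
    for words in by_category.values():
        # Sorting makes the pair order deterministic.
        pairs.extend(combinations(sorted(words), 2))
    return pairs


pairs = candidate_pairs({
    "Haidian District": "area",
    "Zhichun Road fresh flower shop": "sweets shop",
    "fresh flower shop Zhichun Road shop": "sweets shop",
})
```

With the three segmented words from the example, the only candidate pair is the two shop names; "Haidian District" is alone in its category and is never compared with them.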
The following description takes only the segmented words "Zhichun Road fresh flower shop" and "fresh flower shop Zhichun Road shop" in the "sweets shop" category as an example.
For the segmented word "Zhichun Road fresh flower shop" in the "sweets shop" category, the search engine server counts all the user-clicked query result addresses corresponding to it: URL 1, URL 2 and URL 4. For the segmented word "fresh flower shop Zhichun Road shop" in the "sweets shop" category, it counts all the user-clicked query result addresses corresponding to it: URL 1, URL 2, URL 3 and URL 4.
Based on the segmented word "Zhichun Road fresh flower shop" (hereinafter KW1) and all the user-clicked query result addresses corresponding to it (URL 1, URL 2 and URL 4), the search engine server generates user-clicked query result address set 1: {KW1, URL 1, URL 2, URL 4}.
Based on the segmented word "fresh flower shop Zhichun Road shop" (hereinafter KW2) and all the user-clicked query result addresses corresponding to it (URL 1, URL 2, URL 3 and URL 4), the search engine server generates user-clicked query result address set 2: {KW2, URL 1, URL 2, URL 3, URL 4}.
Preferably, among all the query result addresses that the search engine server displays to the user, some may be only weakly related to the query information entered by the user. If the address similarity were calculated from all displayed query result addresses, inaccurate addresses provided by the search engine server would make the calculated similarity between two segmented words less accurate. To avoid this problem, in the embodiments of the present invention the address similarity between the first segmented word and the second segmented word is calculated from the query result addresses that users clicked. After the search engine server displays all query result addresses, the user initiates access requests to the corresponding query result addresses according to the user's own needs and expectations, so the query result addresses that the user clicks are more closely related to the query information the user entered, and the address similarity calculated from them is correspondingly more accurate.
Specifically, after generating the corresponding user-clicked query result address set for each word
segment in each category, the search engine server may, for a first word segment and a second word segment
belonging to the same category, calculate the address similarity between them in, but not limited to, the
following manner:
First, based on the domain names of all user-clicked query result addresses in the first user-clicked
query result address set corresponding to the first word segment, and the domain names of all user-clicked
query result addresses in the second user-clicked query result address set corresponding to the second word
segment, the search engine server calculates a first query result address total. The first query result
address total is the number of query result addresses whose domain names are identical between the first
user-clicked query result address set and the second user-clicked query result address set.
Then, based on the number of user-clicked query result addresses in the first user-clicked query result
address set and the number of user-clicked query result addresses in the second user-clicked query result
address set, the search engine server calculates a second query result address total. The second query
result address total is the total number of query result addresses across the first user-clicked query
result address set and the second user-clicked query result address set.
Finally, based on the first query result address total and the second query result address total, the
search engine server calculates the address similarity between the first word segment and the second word
segment.
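The two totals described above can be sketched in Python. This is a minimal illustration under stated assumptions: "identical domain name" is read as a matching host part of the URL, each shared domain is counted once, and the helper names `domain` and `address_totals` as well as the example.com URLs are hypothetical and do not come from the patent.

```python
from urllib.parse import urlparse


def domain(url: str) -> str:
    """Return the host part of a URL (empty string if it has none)."""
    return urlparse(url).hostname or ""


def address_totals(clicked_a, clicked_b):
    """Return (first_total, second_total) for two lists of clicked URLs."""
    domains_a = {domain(u) for u in clicked_a} - {""}
    domains_b = {domain(u) for u in clicked_b} - {""}
    # First total: addresses whose domain name appears in both sets
    # (assumption: counted once per shared domain).
    first_total = len(domains_a & domains_b)
    # Second total: every clicked address in both sets, counted per set.
    second_total = len(clicked_a) + len(clicked_b)
    return first_total, second_total


set_1 = ["http://a.example.com/1", "http://b.example.com/2", "http://d.example.com/4"]
set_2 = ["http://a.example.com/1", "http://b.example.com/2",
         "http://c.example.com/3", "http://d.example.com/4"]
print(address_totals(set_1, set_2))  # (3, 7)
```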
Specifically, the search engine server may calculate the address similarity between the first word segment
and the second word segment using, but not limited to, the following formula:
$$\mathrm{SIM}_{clickedurl}(KW_i, KW_{i+1}) = \frac{URL(KW_i) \cap URL(KW_{i+1})}{URL(KW_i) \cup URL(KW_{i+1})} \qquad \text{formula (1)}$$
where, in formula (1), $\mathrm{SIM}_{clickedurl}(KW_i, KW_{i+1})$ is the address similarity between word
segment $KW_i$ and word segment $KW_{i+1}$, $URL(KW_i) \cap URL(KW_{i+1})$ is the first query result address
total, and $URL(KW_i) \cup URL(KW_{i+1})$ is the second query result address total.
For example, continuing the example above, based on user-clicked query result address set 1 corresponding
to KW1, {KW1, URL 1, URL 2, URL 4}, and user-clicked query result address set 2 corresponding to KW2,
{KW2, URL 1, URL 2, URL 3, URL 4}, the search engine server determines that the query result addresses
whose domain names are identical between set 1 and set 2 are URL 1, URL 2 and URL 4, and therefore that the
first query result address total is 3.
Based on the 3 user-clicked query result addresses in {KW1, URL 1, URL 2, URL 4} and the 4 user-clicked
query result addresses in {KW2, URL 1, URL 2, URL 3, URL 4}, the search engine server determines that the
second query result address total is 3+4=7.
Based on the first query result address total of 3 and the second query result address total of 7, the
search engine server calculates the address similarity between KW1 and KW2 as:
$\mathrm{SIM}_{clickedurl}(KW_1, KW_2) = 3/7 \approx 42.9\%$.
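The calculation in this example can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `address_similarity` is hypothetical, the URL labels are taken from the example, and the denominator follows the worked example (the sum of the two set sizes, 3 + 4 = 7, rather than the size of the merged set).

```python
def address_similarity(urls_a: set, urls_b: set) -> float:
    """Ratio of shared clicked addresses to the summed size of both sets."""
    first_total = len(urls_a & urls_b)        # addresses present in both sets
    second_total = len(urls_a) + len(urls_b)  # 3 + 4 = 7 in the example
    return first_total / second_total if second_total else 0.0


kw1_urls = {"URL 1", "URL 2", "URL 4"}
kw2_urls = {"URL 1", "URL 2", "URL 3", "URL 4"}
print(round(address_similarity(kw1_urls, kw2_urls), 4))  # 0.4286
```

Under this reading, two identical click sets score 0.5 rather than 1, so weights and thresholds would need to be chosen with that range in mind.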
Step 101: Calculate the literal similarity between the first word segment and the second word segment,
where the literal similarity characterizes the similarity between the first character group contained in
the first word segment and the second character group contained in the second word segment.
Specifically, the search engine server may calculate the literal similarity between the first word segment
and the second word segment in, but not limited to, the following manner:
First, the search engine server counts all identical characters between the first character group
contained in the first word segment and the second character group contained in the second word segment,
and, based on the counted identical characters, determines the identical character total between the first
word segment and the second word segment.
Then, based on the first character total of the first character group and the second character total of
the second character group, the search engine server determines the minimum character total, that is, the
smaller of the first character total and the second character total.
Finally, based on the identical character total and the minimum character total, the search engine server
calculates the literal similarity between the first word segment and the second word segment.
It is worth noting that the search engine server may calculate the literal similarity between the first
word segment and the second word segment using, but not limited to, the following formula:
$$\mathrm{SIM}_{typeface}(KW_i, KW_{i+1}) = \frac{|KW_i \cap KW_{i+1}|}{\mathrm{Min}(|KW_i|, |KW_{i+1}|)} \qquad \text{formula (2)}$$
where, in formula (2), $\mathrm{SIM}_{typeface}(KW_i, KW_{i+1})$ is the literal similarity between word
segment $KW_i$ and word segment $KW_{i+1}$, $|KW_i \cap KW_{i+1}|$ is the identical character total between
$KW_i$ and $KW_{i+1}$, and $\mathrm{Min}(|KW_i|, |KW_{i+1}|)$ is the minimum character total of $KW_i$ and
$KW_{i+1}$.
For example, continuing the example above, based on character group 1 contained in KW1, "Zhichun Road
fresh flower shop", and character group 2 contained in KW2, "fresh flower shop Zhichun Road shop", the
search engine server counts the identical characters between character group 1 and character group 2 as
"Zhichun Road fresh flower shop", and from these determines that the identical character total between KW1
and KW2 is 6 (characters are counted in the original Chinese).
Based on the character total of 6 for character group 1 and the character total of 7 for character group
2, the search engine server determines that the minimum character total is 6.
Based on the identical character total of 6 and the minimum character total of 6, the search engine server
calculates the literal similarity between KW1 and KW2 as:
$\mathrm{SIM}_{typeface}(KW_1, KW_2) = 6/6 = 1$.
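The literal-similarity calculation can be sketched in Python as follows. The Chinese strings are an assumed back-translation of the two word segments (they are not given in this translated text), counting identical characters with a multiset intersection is only one possible reading of "all identical characters", and `literal_similarity` is a hypothetical name.

```python
from collections import Counter


def literal_similarity(word_a: str, word_b: str) -> float:
    """Identical characters divided by the length of the shorter word."""
    # Multiset intersection: a character counts as often as it occurs in both.
    identical_total = sum((Counter(word_a) & Counter(word_b)).values())
    minimum_total = min(len(word_a), len(word_b))
    return identical_total / minimum_total if minimum_total else 0.0


kw1 = "知春路鲜花店"    # assumed original of "Zhichun Road fresh flower shop" (6 characters)
kw2 = "鲜花店知春路店"  # assumed original of "fresh flower shop Zhichun Road shop" (7 characters)
print(literal_similarity(kw1, kw2))  # 1.0
```

Because the denominator is the shorter length, a short word fully contained in a longer one scores 1, which is why the method combines this score with the address similarity rather than using it alone.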
Step 102: Based on the address similarity and the literal similarity, calculate the comprehensive
similarity between the first word segment and the second word segment.
Specifically, the search engine server may calculate the comprehensive similarity between the first word
segment and the second word segment in, but not limited to, the following manner:
The search engine server determines a first constant characterizing the weight of the address similarity
and a second constant characterizing the weight of the literal similarity, and then calculates the
comprehensive similarity between the first word segment and the second word segment based on the address
similarity and the first constant, and the literal similarity and the second constant, where the first
constant and the second constant sum to 1.
Preferably, the search engine server may calculate the comprehensive similarity between the first word
segment and the second word segment using, but not limited to, the following formula:
$$\mathrm{SIM}_{combined}(KW_i, KW_{i+1}) = \alpha \times \mathrm{SIM}_{clickedurl}(KW_i, KW_{i+1}) + \beta \times \mathrm{SIM}_{typeface}(KW_i, KW_{i+1}) \qquad \text{formula (3)}$$
where, in formula (3), $\mathrm{SIM}_{combined}(KW_i, KW_{i+1})$ is the comprehensive similarity between
word segment $KW_i$ and word segment $KW_{i+1}$, $\mathrm{SIM}_{clickedurl}(KW_i, KW_{i+1})$ is the address
similarity between $KW_i$ and $KW_{i+1}$, $\mathrm{SIM}_{typeface}(KW_i, KW_{i+1})$ is the literal
similarity between $KW_i$ and $KW_{i+1}$, $\alpha$ is the first constant, and $\beta$ is the second
constant.
It is worth noting that the first constant and the second constant can be configured flexibly for
different application scenarios. Specifically, to raise the weight of the address similarity, the first
constant can be increased; to raise the weight of the literal similarity, the second constant can be
increased.
For example, continuing the example above, assume that the first constant α=0.6 and the second constant
β=0.4.
Based on the calculated address similarity between KW1 and KW2,
$\mathrm{SIM}_{clickedurl}(KW_1, KW_2) = 3/7$, the literal similarity
$\mathrm{SIM}_{typeface}(KW_1, KW_2) = 1$, the first constant α=0.6 and the second constant β=0.4, the
search engine server calculates the comprehensive similarity between KW1 and KW2 as:
$\mathrm{SIM}_{combined}(KW_1, KW_2) = 0.6 \times 3/7 + 0.4 \times 1 \approx 65.7\%$.
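The weighted combination of formula (3) can be checked with a short Python sketch. The function name `comprehensive_similarity` and the choice to derive β as 1 − α (so that the two constants always sum to 1) are assumptions made for illustration.

```python
def comprehensive_similarity(address_sim: float, literal_sim: float,
                             alpha: float = 0.6) -> float:
    """Weighted sum of the two similarities; the weights always sum to 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha  # second constant
    return alpha * address_sim + beta * literal_sim


print(f"{comprehensive_similarity(3 / 7, 1.0):.1%}")  # 65.7%
```

Raising `alpha` shifts weight toward click behavior, while lowering it shifts weight toward the characters of the word segments themselves.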
Step 103: When the comprehensive similarity is determined to be not less than a preset threshold,
determine that the first word segment and the second word segment are synonyms of each other.
In practice, when the search engine server determines that the comprehensive similarity between the first
word segment and the second word segment is not less than the preset threshold, it determines that the
first word segment and the second word segment are synonyms of each other. It is worth noting that the
preset threshold can also be set flexibly for different application scenarios.
For example, continuing the example above, assume that the preset threshold is 60%.
After calculating that the comprehensive similarity between KW1 and KW2 is 65.7%, the search engine server
determines that 65.7% is greater than the preset threshold of 60%, and therefore that KW1 and KW2 are
synonyms of each other.
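The decision rule of Step 103 is small, but the boundary case is easy to get wrong: "not less than" means a score exactly equal to the threshold still qualifies. A minimal Python sketch, with the hypothetical name `are_synonyms` and the 60% threshold from the example as the default:

```python
def are_synonyms(comprehensive_sim: float, threshold: float = 0.60) -> bool:
    """True when the similarity is not less than the threshold (>=, not >)."""
    return comprehensive_sim >= threshold


print(are_synonyms(0.657))  # True
print(are_synonyms(0.60))   # True: equality also satisfies "not less than"
print(are_synonyms(0.59))   # False
```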
The above embodiment is described in further detail below using a specific application scenario. Referring
to Fig. 2, in this embodiment of the present invention, the specific flow of the synonym recognition method
is as follows:
Step 200: The search engine server collects user query logs in real time.
In collected user log information 1, query information 1 entered by the user is: Haidian District Zhichun
Road's fresh flower shop; the query result addresses that the search engine server displayed to the user
are: URL 1, URL 2, URL 3, URL 4 and URL 5; and the query result addresses clicked by users are: URL 1,
URL 2 and URL 4.
In collected user log information 2, query information 2 entered by the user is: Haidian fresh flower shop
(Zhichun Road shop); the query result addresses that the search engine server displayed to the user are:
URL 1, URL 2, URL 3, URL 4 and URL 5; and the query result addresses clicked by users are: URL 1, URL 2,
URL 3 and URL 4.
Step 201: For all query information within 1 hour (assumed here to be query information 1 and query
information 2), the search engine server removes the " " from query information 1 to obtain the
corresponding query information 1 "Haidian District Zhichun Road fresh flower shop", and removes the
brackets from query information 2 to obtain the corresponding query information 2 "Haidian fresh flower
shop Zhichun Road shop".
Step 202: The search engine server performs word segmentation on query information 1 "Haidian District
Zhichun Road fresh flower shop" and query information 2 "Haidian fresh flower shop Zhichun Road shop"
respectively, and obtains the corresponding word segments: Haidian District, Zhichun Road fresh flower
shop, and fresh flower shop Zhichun Road shop.
Step 203: The search engine server classifies the 3 word segments obtained, assigning the word segment
"Haidian District" to the "area class", and the word segments "Zhichun Road fresh flower shop" and "fresh
flower shop Zhichun Road shop" to the "sweets shop class".
The detailed description below uses only the word segments "Zhichun Road fresh flower shop" and "fresh
flower shop Zhichun Road shop" in the "sweets shop class" as an example.
Step 204: For the word segment "Zhichun Road fresh flower shop" in the "sweets shop class", the search
engine server counts the query result addresses clicked by users for that word segment as: URL 1, URL 2
and URL 4; and for the word segment "fresh flower shop Zhichun Road shop" in the "sweets shop class", it
counts the query result addresses clicked by users for that word segment as: URL 1, URL 2, URL 3 and
URL 4.
Step 205: Based on the word segment "Zhichun Road fresh flower shop" (hereinafter referred to as KW1) and
the query result addresses clicked by users for that word segment, URL 1, URL 2 and URL 4, the search
engine server generates user-clicked query result address set 1, which is {KW1, URL 1, URL 2, URL 4}.
Step 206: Based on the word segment "fresh flower shop Zhichun Road shop" (hereinafter referred to as KW2)
and the query result addresses clicked by users for that word segment, URL 1, URL 2, URL 3 and URL 4, the
search engine server generates user-clicked query result address set 2, which is {KW2, URL 1, URL 2,
URL 3, URL 4}.
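Steps 204 to 206 amount to grouping clicked addresses by word segment. A minimal Python sketch, assuming that every address clicked for a query is attributed to each word segment of that query; `build_click_sets` is a hypothetical name, and the patent's set notation {KW1, URL 1, ...} is represented here as a dictionary whose key is the word segment.

```python
from collections import defaultdict


def build_click_sets(records):
    """Map each word segment to the set of addresses users clicked for it.

    records: iterable of (word_segments, clicked_urls) pairs, one per query log.
    """
    click_sets = defaultdict(set)
    for word_segments, clicked_urls in records:
        for segment in word_segments:
            click_sets[segment].update(clicked_urls)
    return dict(click_sets)


logs = [
    (["KW1"], ["URL 1", "URL 2", "URL 4"]),           # user log information 1
    (["KW2"], ["URL 1", "URL 2", "URL 3", "URL 4"]),  # user log information 2
]
click_sets_by_kw = build_click_sets(logs)
print(sorted(click_sets_by_kw["KW1"]))  # ['URL 1', 'URL 2', 'URL 4']
```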
Step 207: Based on user-clicked query result address set 1, {KW1, URL 1, URL 2, URL 4}, and user-clicked
query result address set 2, {KW2, URL 1, URL 2, URL 3, URL 4}, the search engine server determines that the
query result addresses whose domain names are identical are URL 1, URL 2 and URL 4, and therefore that the
first query result address total is 3.
Step 208: Based on the 3 user-clicked query result addresses in {KW1, URL 1, URL 2, URL 4} and the 4
user-clicked query result addresses in {KW2, URL 1, URL 2, URL 3, URL 4}, the search engine server
determines that the second query result address total is 3+4=7.
Step 209: Based on the first query result address total of 3 and the second query result address total of
7, the search engine server calculates the address similarity between KW1 and KW2 as:
$\mathrm{SIM}_{clickedurl}(KW_1, KW_2) = 3/7 \approx 42.9\%$.
Step 210: Based on character group 1 contained in KW1, "Zhichun Road fresh flower shop", and character
group 2 contained in KW2, "fresh flower shop Zhichun Road shop", the search engine server counts the
identical characters between character group 1 and character group 2 as "Zhichun Road fresh flower shop",
and from these determines that the identical character total between KW1 and KW2 is 6.
Step 211: Based on the character total of 6 for character group 1 and the character total of 7 for
character group 2, the search engine server determines that the minimum character total is 6.
Step 212: Based on the identical character total of 6 and the minimum character total of 6, the search
engine server calculates the literal similarity between KW1 and KW2 as:
$\mathrm{SIM}_{typeface}(KW_1, KW_2) = 6/6 = 1$.
Step 213: Based on the calculated address similarity between KW1 and KW2,
$\mathrm{SIM}_{clickedurl}(KW_1, KW_2) = 3/7$, the literal similarity
$\mathrm{SIM}_{typeface}(KW_1, KW_2) = 1$, the first constant α=0.6 and the second constant β=0.4, the
search engine server calculates the comprehensive similarity between KW1 and KW2 as:
$\mathrm{SIM}_{combined}(KW_1, KW_2) = 0.6 \times 3/7 + 0.4 \times 1 \approx 65.7\%$.
Step 214: The search engine server judges whether the calculated comprehensive similarity of 65.7% between
KW1 and KW2 is not less than the preset threshold of 60%; if so, step 215 is performed; otherwise, step 216
is performed.
Step 215: The search engine server determines that KW1 and KW2 are synonyms of each other.
Step 216: The search engine server determines that KW1 and KW2 are not synonyms.
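Steps 207 to 216 can be strung together in one short Python sketch. It is an illustration under assumptions, not the patent's implementation: the function name `recognize` is hypothetical, the Chinese strings are an assumed back-translation of KW1 and KW2, identical characters are counted with a multiset intersection, and the defaults α=0.6 and threshold 60% are the values used in the example.

```python
from collections import Counter


def recognize(word_a, urls_a, word_b, urls_b, alpha=0.6, threshold=0.60):
    """Return (comprehensive_similarity, is_synonym) for two word segments."""
    # Steps 207-209: address similarity from the clicked-address sets.
    second_total = len(urls_a) + len(urls_b)
    address_sim = len(urls_a & urls_b) / second_total if second_total else 0.0
    # Steps 210-212: literal similarity from shared characters.
    identical = sum((Counter(word_a) & Counter(word_b)).values())
    shortest = min(len(word_a), len(word_b))
    literal_sim = identical / shortest if shortest else 0.0
    # Step 213: weighted combination, with beta = 1 - alpha.
    combined = alpha * address_sim + (1 - alpha) * literal_sim
    # Steps 214-216: compare against the preset threshold.
    return combined, combined >= threshold


score, is_synonym = recognize(
    "知春路鲜花店", {"URL 1", "URL 2", "URL 4"},
    "鲜花店知春路店", {"URL 1", "URL 2", "URL 3", "URL 4"},
)
print(f"{score:.1%}", is_synonym)  # 65.7% True
```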
Based on the above embodiment, referring to Fig. 3, in this embodiment of the present invention, the synonym
identification device comprises at least:
a first computing unit 303, configured to calculate, for a first word segment and a second word segment
belonging to the same category, the address similarity between the first word segment and the second word
segment, where the address similarity characterizes the similarity between the first user-clicked query
result address set corresponding to the first word segment and the second user-clicked query result address
set corresponding to the second word segment;
a second computing unit 304, configured to calculate the literal similarity between the first word segment
and the second word segment, where the literal similarity characterizes the similarity between the first
character group contained in the first word segment and the second character group contained in the second
word segment;
a third computing unit 305, configured to calculate the comprehensive similarity between the first word
segment and the second word segment based on the address similarity and the literal similarity; and
a recognition unit 306, configured to determine, when the comprehensive similarity is determined to be not
less than a preset threshold, that the first word segment and the second word segment are synonyms of each
other.
Preferably, the identification device further comprises a collecting unit 300, a preprocessing unit 301,
and a set generation unit 302, where, before the first computing unit 303 calculates the address similarity
between the first word segment and the second word segment belonging to the same category:
the collecting unit 300 is configured to collect user query logs, where a user query log comprises at
least: the query information entered by a user, all query result addresses displayed to the user based on
the query information, and the query result addresses clicked by users;
the preprocessing unit 301 is configured to perform word segmentation on all query information within a
preset time range to obtain the corresponding word segments, and to count, for each word segment, the query
result addresses clicked by users; and
the set generation unit 302 is configured to generate, based on each word segment and the query result
addresses clicked by users for that word segment, the corresponding user-clicked query result address set.
Preferably, when calculating the address similarity between the first word segment and the second word
segment, the first computing unit 303 is specifically configured to:
calculate a first query result address total based on the domain names of all user-clicked query result
addresses in the first user-clicked query result address set and the domain names of all user-clicked query
result addresses in the second user-clicked query result address set, where the first query result address
total is the number of query result addresses whose domain names are identical between the first
user-clicked query result address set and the second user-clicked query result address set;
calculate a second query result address total based on the number of user-clicked query result addresses
in the first user-clicked query result address set and the number of user-clicked query result addresses in
the second user-clicked query result address set, where the second query result address total is the total
number of query result addresses across the first user-clicked query result address set and the second
user-clicked query result address set; and
calculate the address similarity between the first word segment and the second word segment based on the
first query result address total and the second query result address total.
Preferably, when calculating the literal similarity between the first word segment and the second word
segment, the second computing unit 304 is specifically configured to:
count all identical characters between the first character group and the second character group, and,
based on the counted identical characters, determine the identical character total between the first word
segment and the second word segment;
determine, based on the first character total of the first character group and the second character total
of the second character group, the minimum character total, that is, the smaller of the first character
total and the second character total; and
calculate the literal similarity between the first word segment and the second word segment based on the
identical character total and the minimum character total.
Preferably, when calculating the comprehensive similarity between the first word segment and the second
word segment based on the address similarity and the literal similarity, the third computing unit 305 is
specifically configured to:
determine a first constant characterizing the weight of the address similarity and a second constant
characterizing the weight of the literal similarity, where the first constant and the second constant sum
to 1; and
calculate the comprehensive similarity between the first word segment and the second word segment based on
the address similarity and the first constant, and the literal similarity and the second constant.
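The division of the device into units 303 to 306 can be pictured as a small Python class in which the two similarity calculations are pluggable. This is only a structural sketch: the class name, the field and method names, and the simple lambda stand-ins for units 303 and 304 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class SynonymIdentificationDevice:
    # Unit 303: address similarity of two clicked-address sets.
    first_computing_unit: Callable[[Set[str], Set[str]], float]
    # Unit 304: literal similarity of two word segments.
    second_computing_unit: Callable[[str, str], float]
    alpha: float = 0.6       # first constant; the second constant is 1 - alpha
    threshold: float = 0.60  # preset threshold

    def third_computing_unit(self, address_sim: float, literal_sim: float) -> float:
        """Unit 305: weighted comprehensive similarity."""
        return self.alpha * address_sim + (1.0 - self.alpha) * literal_sim

    def recognition_unit(self, word_a, urls_a, word_b, urls_b) -> bool:
        """Unit 306: synonyms when the similarity is not less than the threshold."""
        address_sim = self.first_computing_unit(urls_a, urls_b)
        literal_sim = self.second_computing_unit(word_a, word_b)
        return self.third_computing_unit(address_sim, literal_sim) >= self.threshold


device = SynonymIdentificationDevice(
    first_computing_unit=lambda a, b: len(a & b) / (len(a) + len(b)),
    second_computing_unit=lambda a, b: len(set(a) & set(b)) / min(len(a), len(b)),
)
print(device.recognition_unit("abc", {"u1", "u2"}, "abcd", {"u1", "u2"}))  # True
```

Keeping the weight and threshold as fields mirrors the text's point that both can be tuned per application scenario.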
In summary, in the embodiments of the present invention, for a first word segment and a second word segment
belonging to the same category, the address similarity and the literal similarity between the two are
calculated, the comprehensive similarity between them is then calculated from the address similarity and
the literal similarity, and, when the comprehensive similarity is determined to be not less than a preset
threshold, the first word segment and the second word segment are determined to be synonyms of each other.
In this way, calculating the comprehensive similarity between two word segments is enough to judge whether
they are synonyms. The approach applies to synonym recognition between any two word segments and no longer
depends on a pre-compiled thesaurus, which avoids the problem that newly emerging synonyms cannot be
recognized because a thesaurus covers only a limited vocabulary. Moreover, considering both the address
similarity and the literal similarity makes the calculated comprehensive similarity more accurate, which in
turn improves the accuracy of synonym recognition. Furthermore, calculating the comprehensive similarity
only for two word segments belonging to the same category further improves the accuracy of synonym
recognition.
Those skilled in the art should understand that embodiments of the present invention may be provided as a
method, a system, or a computer program product. Accordingly, the present invention may take the form of an
entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and
hardware aspects. Moreover, the present invention may take the form of a computer program product
implemented on one or more computer-usable storage media (including but not limited to magnetic disk
storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device
(system), and computer program product according to embodiments of the present invention. It should be
understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows
and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
These computer program instructions may be provided to the processor of a general-purpose computer, a
special-purpose computer, an embedded processor, or another programmable data processing device to produce
a machine, so that the instructions executed by the processor of the computer or other programmable data
processing device produce a means for implementing the functions specified in one or more flows of the
flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a
computer or another programmable data processing device to operate in a particular manner, so that the
instructions stored in the computer-readable memory produce an article of manufacture including an
instruction means that implements the functions specified in one or more flows of the flowcharts and/or one
or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data
processing device, so that a series of operational steps is performed on the computer or other programmable
device to produce computer-implemented processing, whereby the instructions executed on the computer or
other programmable device provide steps for implementing the functions specified in one or more flows of
the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can
make further changes and modifications to these embodiments once they grasp the basic inventive concept.
The appended claims are therefore intended to be construed as covering the preferred embodiments and all
changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the
present invention without departing from the spirit and scope of the embodiments. Thus, if these
modifications and variations of the embodiments fall within the scope of the claims of the present
invention and their technical equivalents, the present invention is also intended to cover them.
Claims (10)
- A kind of 1. recognition methods of synonym, it is characterised in that including:For belonging to the same category of first participle and the second participle, calculate between the first participle and second participle Address similarity;Wherein, the first user corresponding to the first participle described in the address similarity characterization clicks on Query Result Gather the similarity between second user click Query Result address set corresponding with the described second participle in location;Calculate the literal similarity between the first participle and second participle;Wherein, the literal similarity characterization institute State the similarity between the first character group that the first participle includes and the second character group that second participle includes;Based on the address similarity and the literal similarity, calculate comprehensive between the first participle and second participle Close similarity;When determining that the comprehensive similarity is not less than predetermined threshold value, judge that the first participle and the described second participle are synonymous each other Word.
- 2. recognition methods as claimed in claim 1, it is characterised in that for belonging to the same category of first participle and second Participle, before calculating the address similarity between the first participle and second participle, further comprise:User's inquiry log is gathered, wherein, user's inquiry log comprises at least:The Query Information of user's input, based on institute State all Query Result addresses that Query Information is shown to user, and the Query Result address that all users click on;All Query Informations in preset time range are carried out with word segmentation processing respectively, obtains corresponding each participle, and respectively The Query Result address that all users corresponding to counting each participle click on;The Query Result address clicked on based on all users corresponding to each participle and each participle, generation is corresponding respectively User clicks on Query Result address set.
- 3. recognition methods as claimed in claim 1 or 2, it is characterised in that calculate the first participle and the described second participle Between address similarity, including:The Query Result address domain name for all users click that Query Result address set includes is clicked on based on first user, And the second user clicks on the Query Result address domain name that all users that Query Result address set includes click on, and calculates First Query Result address sum, wherein, the first Query Result address sum characterizes first user and clicks on inquiry knot Query Result address domain name identical is all between fruit address set and second user click Query Result address set looks into Ask the summation of result address;The Query Result address number for all users click that Query Result address set includes is clicked on based on first user, And the second user clicks on the Query Result address number that all users that Query Result address set includes click on, and calculates Second Query Result address sum, wherein, the second Query Result address sum characterizes first user and clicks on inquiry knot Fruit address set and the second user click on the summation of all Query Result addresses between Query Result address set;Based on the total and described second Query Result address in the first Query Result address sum, calculate the first participle with Address similarity between second participle.
- 4. recognition methods as claimed in claim 1 or 2, it is characterised in that calculate the first participle and the described second participle Between literal similarity, including:All identical characters between first character group and second character group are counted, all same words based on statistics Symbol, determine the identical characters sum between the first participle and second participle;The the second character sum included based on total and described second character group of the first character that first character group includes, really The total minimum character sum between the second character sum of fixed first character,Based on the total and described minimum character sum of the identical characters, calculate between the first participle and second participle Literal similarity.
- 5. the recognition methods as described in claim any one of 1-4, it is characterised in that based on the address similarity and the word Face similarity, the comprehensive similarity between the first participle and second participle is calculated, including:It is determined that characterize the first constant of the address similarity weight and characterize the second constant of the literal similarity weight, its In, the first constant is 1 with the second constant sum;Based on the address similarity and the first constant, and the literal similarity and the second constant, institute is calculated State the comprehensive similarity between the first participle and second participle.
- A kind of 6. identification device of synonym, it is characterised in that including:First computing unit, for for belonging to the same category of first participle and the second participle, calculating the first participle With the address similarity between the described second participle;Wherein, first corresponding to the first participle described in the address similarity characterization User is clicked between Query Result address set second user click Query Result address set corresponding with the described second participle Similarity;Second computing unit, for calculating the literal similarity between the first participle and second participle;Wherein, it is described Between the second character group that the first character group and second participle that the first participle described in literal similarity characterization includes include Similarity;3rd computing unit, for based on the address similarity and the literal similarity, calculating the first participle and institute State the comprehensive similarity between the second participle;Recognition unit, during for determining that the comprehensive similarity is not less than predetermined threshold value, judge the first participle and described the Two segment synonym each other.
- 7. The recognition device according to claim 6, characterized by further comprising a collecting unit, a preprocessing unit, and a set generation unit, wherein, before the first computing unit calculates the address similarity between the first participle and the second participle belonging to the same category: the collecting unit is configured to collect user query logs, wherein a user query log comprises at least the query information input by a user, all query result addresses shown to the user based on the query information, and all query result addresses clicked by users; the preprocessing unit is configured to perform word segmentation on each piece of query information within a preset time range to obtain the corresponding participles, and to count, for each participle, the query result addresses clicked by all users; and the set generation unit is configured to generate, based on each participle and the query result addresses clicked by all users for that participle, the corresponding user-clicked query result address set.
- 8. The recognition device according to claim 6 or 7, characterized in that, when calculating the address similarity between the first participle and the second participle, the first computing unit is specifically configured to: calculate a first query result address total based on the domain names of the query result addresses clicked by all users contained in the first user-clicked query result address set and the domain names of the query result addresses clicked by all users contained in the second user-clicked query result address set, wherein the first query result address total characterizes the total number of query result addresses whose domain names are identical between the first user-clicked query result address set and the second user-clicked query result address set; calculate a second query result address total based on the number of query result addresses clicked by all users contained in the first user-clicked query result address set and the number of query result addresses clicked by all users contained in the second user-clicked query result address set, wherein the second query result address total characterizes the total number of all query result addresses across the first user-clicked query result address set and the second user-clicked query result address set; and calculate the address similarity between the first participle and the second participle based on the first query result address total and the second query result address total.
- 9. The recognition device according to claim 6 or 7, characterized in that, when calculating the literal similarity between the first participle and the second participle, the second computing unit is specifically configured to: count all identical characters between the first character group and the second character group and, based on the identical characters counted, determine the total number of identical characters between the first participle and the second participle; based on the first character total contained in the first character group and the second character total contained in the second character group, determine the minimum character total, that is, the smaller of the first character total and the second character total; and, based on the identical character total and the minimum character total, calculate the literal similarity between the first participle and the second participle.
- 10. The recognition device according to any one of claims 6-9, characterized in that, when calculating the comprehensive similarity between the first participle and the second participle based on the address similarity and the literal similarity, the third computing unit is specifically configured to: determine a first constant characterizing the weight of the address similarity and a second constant characterizing the weight of the literal similarity, wherein the sum of the first constant and the second constant is 1; and, based on the address similarity and the first constant, and on the literal similarity and the second constant, calculate the comprehensive similarity between the first participle and the second participle.
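The units recited in the device claims can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation. The claims only say that each similarity is calculated "based on" certain totals, so the ratio forms used here (same-domain addresses divided by all addresses, identical characters divided by the minimum character total, and a weighted sum for the comprehensive similarity) are assumptions. The function names, the whitespace tokenizer standing in for word segmentation, the default weight of 0.5, and the threshold of 0.6 are likewise hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def build_click_sets(log_entries, tokenize=str.split):
    """Map each participle to the set of result addresses users clicked.

    log_entries: iterable of (query, clicked_urls) pairs (hypothetical log shape).
    tokenize: stand-in for a real word segmenter (assumption: whitespace split).
    """
    click_sets = defaultdict(set)
    for query, clicked_urls in log_entries:
        for participle in tokenize(query):
            click_sets[participle].update(clicked_urls)
    return dict(click_sets)


def address_similarity(urls_a, urls_b):
    """Assumed ratio: addresses whose domain occurs in both sets / all addresses."""
    shared_domains = ({urlparse(u).netloc for u in urls_a}
                      & {urlparse(u).netloc for u in urls_b})
    all_urls = set(urls_a) | set(urls_b)  # assumption: totals counted over the union
    if not all_urls:
        return 0.0
    same_domain_total = sum(1 for u in all_urls if urlparse(u).netloc in shared_domains)
    return same_domain_total / len(all_urls)


def literal_similarity(word_a, word_b):
    """Assumed ratio: identical characters / minimum of the two character totals."""
    if not word_a or not word_b:
        return 0.0
    # Multiset intersection keeps, per character, the smaller of the two counts.
    identical_total = sum((Counter(word_a) & Counter(word_b)).values())
    return identical_total / min(len(word_a), len(word_b))


def comprehensive_similarity(addr_sim, lit_sim, addr_weight=0.5):
    """Weighted sum; the two weights sum to 1, as the claims require."""
    if not 0.0 <= addr_weight <= 1.0:
        raise ValueError("addr_weight must lie in [0, 1]")
    return addr_weight * addr_sim + (1.0 - addr_weight) * lit_sim


def is_synonym(urls_a, urls_b, word_a, word_b, addr_weight=0.5, threshold=0.6):
    """Judge synonymy when the comprehensive similarity is not less than the threshold."""
    score = comprehensive_similarity(
        address_similarity(urls_a, urls_b),
        literal_similarity(word_a, word_b),
        addr_weight,
    )
    return score >= threshold


if __name__ == "__main__":
    # Made-up log entries purely for demonstration.
    log = [
        ("notebook cheap", ["http://a.com/1", "http://b.com/2"]),
        ("laptop cheap", ["http://a.com/3", "http://c.com/4"]),
    ]
    sets = build_click_sets(log)
    print(address_similarity(sets["notebook"], sets["laptop"]))
    print(literal_similarity("notebook", "laptop"))
    print(is_synonym(sets["notebook"], sets["laptop"], "notebook", "laptop"))
```

Counting shared characters as a multiset intersection means a repeated character is matched only as many times as it appears in both participles. Whether the totals in the address similarity should be taken over the union of the two sets or over the sum of their sizes is not settled by the claim wording, so the union used above is one possible reading.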
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610641371.7A CN107688563B (en) | 2016-08-05 | 2016-08-05 | Synonym recognition method and recognition device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610641371.7A CN107688563B (en) | 2016-08-05 | 2016-08-05 | Synonym recognition method and recognition device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107688563A (en) | 2018-02-13 |
CN107688563B (en) | 2021-03-19 |
Family
ID=61152084
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610641371.7A (Active) CN107688563B (en) | 2016-08-05 | 2016-08-05 | Synonym recognition method and recognition device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107688563B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101576916A (en) * | 2009-06-18 | 2009-11-11 | 清华大学 | Method and device for obtaining synonyms |
CN102184169A (en) * | 2011-04-20 | 2011-09-14 | 北京百度网讯科技有限公司 | Method, device and equipment used for determining similarity information among character string information |
CN103106189A (en) * | 2011-11-11 | 2013-05-15 | 北京百度网讯科技有限公司 | Method and device for excavating synonymous attribute words |
CN103136223A (en) * | 2011-11-24 | 2013-06-05 | 北京百度网讯科技有限公司 | Method and device for mining query with similar requirements |
US20140297261A1 (en) * | 2013-03-28 | 2014-10-02 | Hewlett-Packard Development Company, L.P. | Synonym determination among n-grams |
Non-Patent Citations (1)
Title |
---|
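侯汉清 (Hou Hanqing) et al.: "利用字面相似度识别汉语同义词的实验" [An experiment in identifying Chinese synonyms using literal similarity], 《第15届全国计算机信息管理学术研讨会论文集》 [Proceedings of the 15th National Symposium on Computer Information Management] * |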
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110309432A (en) * | 2018-06-11 | 2019-10-08 | 腾讯科技(北京)有限公司 | Method, map point of interest processing method are determined based on the synonym of point of interest |
CN110309432B (en) * | 2018-06-11 | 2024-06-07 | 腾讯科技(北京)有限公司 | Synonym determining method based on interest points and map interest point processing method |
CN110427381A (en) * | 2019-08-07 | 2019-11-08 | 北京嘉和海森健康科技有限公司 | A kind of data processing method and relevant device |
CN111126048A (en) * | 2019-12-25 | 2020-05-08 | 腾讯科技(深圳)有限公司 | Candidate synonym determination method, device, server and storage medium |
CN113326686A (en) * | 2020-02-28 | 2021-08-31 | 株式会社斯库林集团 | Similarity calculation device, recording medium, and similarity calculation method |
CN113326686B (en) * | 2020-02-28 | 2024-05-10 | 株式会社斯库林集团 | Similarity calculation device, recording medium, and similarity calculation method |
CN113343688A (en) * | 2021-06-22 | 2021-09-03 | 南京星云数字技术有限公司 | Address similarity determination method and device and computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107688563B (en) | 2021-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107688563A (en) | A kind of recognition methods of synonym and identification device | |
JP6211605B2 (en) | Ranking search results based on click-through rate | |
CN105302810B (en) | A kind of information search method and device | |
US20210042664A1 (en) | Model training and service recommendation | |
US8751470B1 (en) | Context sensitive ranking | |
CN104391999B (en) | Information recommendation method and device | |
US9317550B2 (en) | Query expansion | |
CN105183781B (en) | Information recommendation method and device | |
US20150356072A1 (en) | Method and Apparatus of Matching Text Information and Pushing a Business Object | |
WO2008106668A1 (en) | User query mining for advertising matching | |
US20220383427A1 (en) | Method and apparatus for group display | |
CN107908616B (en) | Method and device for predicting trend words | |
WO2013192093A1 (en) | Search method and apparatus | |
CN104537341A (en) | Human face picture information obtaining method and device | |
TW201923629A (en) | Data processing method and apparatus | |
CN105930507A (en) | Method and apparatus for obtaining Web browsing interest of user | |
CN103425650A (en) | Recommendation searching method and recommendation searching system | |
WO2010096986A1 (en) | Mobile search method and device | |
CN106919576A (en) | Using the method and device of two grades of classes keywords database search for application now | |
CN105653546B (en) | A kind of search method and system of target topic | |
CN104933099B (en) | Method and device for providing target search result for user | |
CN105095203B (en) | Determination, searching method and the server of synonym | |
CN110674387A (en) | Method, apparatus, and computer storage medium for data search | |
CN104408036A (en) | Correlated topic recognition method and device | |
CN105357189B (en) | Corpse account detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||