CN109918661A - Synonym acquisition methods and device - Google Patents
Synonym acquisition method and device
- Publication number
- CN109918661A (CN201910160822.9A)
- Authority
- CN
- China
- Prior art keywords
- synonym
- sentence
- candidate
- candidate synonym
- query
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiments of the present application disclose a synonym acquisition method and device. They rely on the idea that the larger the number of sentence pairs corresponding to two keywords, the higher the probability that the two keywords are synonyms. A sentence pair corresponding to two keywords is a sentence pair whose query statement contains one keyword and whose hit results sentence contains the other keyword. At least one candidate synonym corresponding to a query word is obtained first, and then the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym is obtained. The weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in a sentence set, where the query statement of a first sentence pair contains the query word and the hit results sentence of the first sentence pair contains the candidate synonym. A synonym of the query word is then obtained from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym. The purpose of obtaining synonyms of the query word is thereby achieved.
Description
Technical field
This application relates to the technical field of natural language processing, and more specifically to a synonym acquisition method and device.
Background
Human-computer interaction scenarios involve synonym mining. For example, during intelligent search, more comprehensive retrieval results can be obtained from the keywords contained in the query statement entered by the user together with the mined synonyms of those keywords.
In summary, synonym mining is of great significance to human-computer interaction scenarios.
Summary of the invention
In view of this, the embodiments of the present application provide a synonym acquisition method and device, so as to achieve the purpose of obtaining synonyms.
To achieve the above object, the present invention provides the following technical solutions:
A synonym acquisition method, comprising:
obtaining at least one candidate synonym corresponding to a query word contained in a query statement, the candidate synonym being a word contained in the hit results sentence corresponding to the query statement;
obtaining the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, wherein the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in a sentence set, the query statement of a first sentence pair contains the query word, and the hit results sentence of the first sentence pair contains the candidate synonym;
obtaining a synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym.
A synonym acquisition device, comprising:
a first obtaining module, used to obtain at least one candidate synonym corresponding to a query word contained in a query statement, the candidate synonym being a word contained in the hit results sentence corresponding to the query statement;
a second obtaining module, used to obtain the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, wherein the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in a sentence set, the query statement of a first sentence pair contains the query word, and the hit results sentence of the first sentence pair contains the candidate synonym;
a third obtaining module, used to obtain a synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym.
As can be seen from the above technical solutions, compared with the prior art, the embodiments of the present application disclose a synonym acquisition method that relies on the idea that the larger the number of sentence pairs corresponding to two keywords, the higher the probability that the two keywords are synonyms. A sentence pair corresponding to two keywords is a sentence pair whose query statement contains one keyword and whose hit results sentence contains the other keyword. At least one candidate synonym corresponding to a query word is obtained first, and then the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym is obtained. The weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in a sentence set, where the query statement of a first sentence pair contains the query word and the hit results sentence of the first sentence pair contains the candidate synonym. A synonym of the query word is then obtained from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym. The purpose of obtaining synonyms of the query word is thereby achieved.
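The overall flow summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method as claimed: sentences are assumed to be already segmented into keyword lists, the weighted co-occurrence frequency is reduced to a plain count of first sentence pairs, and the `min_count` threshold in `pick_synonyms` is a hypothetical stand-in for the final selection step. All function names and the sample data are illustrative.

```python
from collections import Counter

def candidate_synonyms(query_word, pairs):
    """Words of hit results sentences whose query statement contains query_word."""
    candidates = set()
    for query_tokens, hit_tokens in pairs:
        if query_word in query_tokens:
            candidates.update(hit_tokens)
    candidates.discard(query_word)  # assumption: a word is not listed as its own synonym
    return candidates

def co_occurrence_counts(query_word, candidates, pairs):
    """Per candidate, count the first sentence pairs: the query statement
    contains query_word and the hit results sentence contains the candidate."""
    counts = Counter()
    for query_tokens, hit_tokens in pairs:
        if query_word in query_tokens:
            for candidate in candidates.intersection(hit_tokens):
                counts[candidate] += 1
    return counts

def pick_synonyms(counts, min_count=2):
    """Hypothetical selection rule: keep candidates seen in at least min_count pairs."""
    return [word for word, n in counts.most_common() if n >= min_count]

# Each pair is (keywords of the query statement, keywords of the hit results sentence).
pairs = [
    (["Construction Bank", "loan", "interest"], ["Construction Bank", "loan", "interest rate"]),
    (["loan", "interest"], ["borrowing", "interest rate"]),
    (["Construction Bank", "site"], ["Construction Bank", "address"]),
]
candidates = candidate_synonyms("interest", pairs)
counts = co_occurrence_counts("interest", candidates, pairs)
print(pick_synonyms(counts))
```

In this toy data only "interest rate" appears in the hit results sentences of two query statements that contain "interest", so it is the only candidate that passes the assumed threshold.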
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1a is a schematic diagram of one interface of an intelligent robot system, intelligent question answering system, human-machine chat system, or intelligent customer service system provided by the embodiments of the present application;
Fig. 1b to Fig. 1c are schematic diagrams of one interface of an intelligent search system provided by the embodiments of the present application;
Fig. 1d to Fig. 1e are schematic diagrams of one interface of a machine writing system provided by the embodiments of the present application;
Fig. 2 is a structural diagram of one implementation of a synonym acquisition system provided by the embodiments of the present application;
Fig. 3 is a flowchart of one implementation of the synonym acquisition method provided by the embodiments of the present application;
Fig. 4 is a schematic diagram of one implementation of a sentence set provided by the embodiments of the present application;
Fig. 5 is a schematic diagram of one implementation, provided by the embodiments of the present application, of using a first function to suppress the multiplicative growth of the dependent variable corresponding to a first parameter or a second parameter;
Fig. 6 is a schematic diagram of one implementation of a synonym acquisition device provided by the embodiments of the present application;
Fig. 7 is a schematic diagram of one implementation of an electronic device provided by the embodiments of the present application.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The synonym acquisition method provided by the embodiments of the present application can be applied to human-computer interaction scenarios, including but not limited to: intelligent robot systems, human-machine chat systems, intelligent question answering systems, intelligent customer service systems, intelligent search systems, and machine writing systems.
Intelligent robot systems, intelligent customer service systems, human-machine chat systems, and intelligent question answering systems use an intelligent machine in place of a human customer service agent, that is, they implement human-machine dialogue. The user can input voice or text, and the intelligent machine obtains a corresponding feedback result based on the keywords contained in the voice or text and the synonyms of those keywords. The feedback result can be voice or text.
Optionally, these systems can use the synonyms of keywords to quickly understand the user's true intention and thus give an accurate feedback result.
It can be understood that the feedback result may include multiple result sentences, from which the user can select the one they need. In the embodiments of the present application, the voice or text input by the user is called the query statement, and the result sentence that the user selects from the one or more result sentences is called the hit results sentence.
It can be understood that the feedback result may contain only one result sentence. For example, when an intelligent robot system, human-machine chat system, or intelligent question answering system answers a user's question, it may directly return a single result sentence; that result sentence is then the hit results sentence in the embodiments of the present application.
Fig. 1a is a schematic diagram of one interface of an intelligent robot system, intelligent question answering system, human-machine chat system, or intelligent customer service system provided by the embodiments of the present application.
As shown in Fig. 1a, the user inputs the voice or text "where is the site of Construction Bank?". Optionally, the intelligent robot system, intelligent question answering system, human-machine chat system, or intelligent customer service system can use the synonym acquisition method provided by the embodiments of the present application to obtain, in real time, the synonyms corresponding to each keyword in the query statement "where is the site of Construction Bank?", for example the synonyms of "Construction Bank", the synonyms of "site", and the synonyms of "where", so as to quickly understand the user's true intention.
The query word referred to in the embodiments of the present application is a keyword contained in the query statement.
Optionally, the intelligent robot system, intelligent question answering system, human-machine chat system, or intelligent customer service system may include a synonym database that stores the synonyms corresponding to each keyword, obtained using the synonym acquisition method provided by the embodiments of the present application. The system can then obtain the synonyms of "Construction Bank", the synonyms of "site", and the synonyms of "where" directly from the synonym database.
After understanding the user's true intention, the intelligent robot system, intelligent question answering system, human-machine chat system, or intelligent customer service system replies with the feedback result it considers most accurate, for example "the nearest address of Construction Bank is located at XX". In the embodiments of the present application, "where is the site of Construction Bank?" can then be called the query statement, and "the nearest address of Construction Bank is located at XX" can be called the hit results sentence.
Optionally, the intelligent search system may include an information retrieval system. The intelligent search system may include a web client or an application client, in which the user can input voice or text. Based on the keywords contained in that voice or text and the synonyms of those keywords, the intelligent search system obtains a corresponding feedback result for the user to view. The feedback result can be text or voice.
It can be understood that the feedback result may include one or more result sentences, from which the user can select the one they need. In the embodiments of the present application, the voice or text input by the user is called the query statement, and the result sentence that the user selects from the one or more result sentences is called the hit results sentence.
Fig. 1b to Fig. 1c are schematic diagrams of one interface of an intelligent search system provided by the embodiments of the present application.
The intelligent search system may include a web client or an application client. Suppose the user inputs the query statement "how much is the loan interest of Construction Bank" in the web client or application client, as shown in Fig. 1b.
After the user clicks search button 11, the intelligent search system obtains the query statement "how much is the loan interest of Construction Bank" and segments it to obtain the keywords "Construction Bank", "loan", and "interest".
Optionally, the intelligent search system can use the synonym acquisition method provided by the embodiments of the present application to obtain, in real time, the synonyms of "Construction Bank", the synonyms of "loan", and the synonyms of "interest", and store the obtained synonyms of these keywords in a synonym database.
Optionally, the intelligent search system may include a synonym database that stores the synonyms corresponding to each keyword, obtained using the synonym acquisition method provided by the embodiments of the present application. The intelligent search system can obtain the synonyms of "Construction Bank", the synonyms of "loan", and the synonyms of "interest" from the synonym database. Suppose the synonyms of "Construction Bank" include "Construction Bank" and "China Construction Bank"; the synonyms of "loan" include "borrowing"; and the synonyms of "interest" include "interest", "interest money", and "interest rate".
The intelligent search system can build a retrieval expression from "Construction Bank", "loan", "interest", and their synonyms, perform retrieval based on that expression to obtain search results, and return the results to the web client or application client.
Optionally, the retrieval expression built from "Construction Bank", "loan", "interest", and their synonyms can be: (Construction Bank OR China Construction Bank OR Construction Bank) AND (loan OR borrowing) AND (interest OR interest OR interest money OR interest rate).
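As an illustration of how such a retrieval expression might be assembled, the following Python sketch joins each keyword with its synonyms using OR and joins the groups using AND. The function name `build_retrieval_expression`, the dictionary layout of the synonym database, and the OR/AND syntax are assumptions made for this example; the source does not specify a concrete query syntax.

```python
def build_retrieval_expression(synonyms_by_keyword):
    """Build '(k OR s1 OR ...) AND (...)' from a keyword -> synonyms mapping."""
    groups = []
    for keyword, synonyms in synonyms_by_keyword.items():
        # dict.fromkeys drops duplicate terms while keeping their order.
        terms = list(dict.fromkeys([keyword, *synonyms]))
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)

# Hypothetical synonym database content for the three keywords of the example.
synonym_db = {
    "Construction Bank": ["China Construction Bank"],
    "loan": ["borrowing"],
    "interest": ["interest rate"],
}
expression = build_retrieval_expression(synonym_db)
print(expression)
```

Duplicate terms are removed here because repeating a term inside an OR group does not change what the expression matches.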
The web client or application client can display the search results as shown in Fig. 1c, which uses four result sentences as an example. If the user selects "How much is the 2018 Construction Bank loan interest rate? What are the Construction Bank loan conditions? _Rong360", then that sentence is the hit results sentence.
A machine writing system is one in which an intelligent robot system automatically writes a draft from captured information according to a preset structure. After the draft is completed, the machine writing system can rewrite the original draft by replacing keywords it contains with their synonyms.
Fig. 1d to Fig. 1e are schematic diagrams of one interface of a machine writing system provided by the embodiments of the present application.
Fig. 1d shows the original draft completed by the machine writing system; Fig. 1e shows the draft after modification by the machine writing system.
To highlight the changes, the changed parts are marked in bold and underlined in Fig. 1e; for example, "in midair" is changed to "in the air". In the embodiments of the present application, the original sentence "Large White falls in midair" can be called the query statement, and the changed sentence "Large White falls in the air" can be called the hit results sentence.
The synonym acquisition method provided by the embodiments of the present application is described below with reference to the above application scenarios.
It can be understood that the intelligent robot system, human-machine chat system, intelligent question answering system, intelligent customer service system, intelligent search system, or machine writing system may include a terminal device and a server; alternatively, the human-machine chat system, intelligent question answering system, intelligent customer service system, intelligent search system, or machine writing system includes only a terminal device.
The terminal device can be an electronic device such as a desktop computer, a mobile terminal (such as a smartphone), an iPad, or an intelligent robot.
The server may be a single server, a server cluster composed of several servers, or a cloud computing service center.
Fig. 2 is a structural diagram of one implementation of a synonym acquisition system provided by the embodiments of the present application. The synonym acquisition system can be an intelligent robot system, human-machine chat system, intelligent question answering system, intelligent customer service system, intelligent search system, or machine writing system.
The synonym acquisition system includes terminal device 21 and server 22.
Terminal device 21 is used to obtain the query statement.
Optionally, the query statement can be text or voice input by the user; optionally, the query statement can be an original sentence in an original draft.
Optionally, terminal device 21 can obtain at least one query word contained in the query statement.
In the embodiments of the present application, a query word is any word in the query statement.
Taking Fig. 1b again as an example, the query statement is "how much is the loan interest of Construction Bank", and the query words in the query statement include "Construction Bank", "loan", and "interest".
Optionally, server 22 is used to obtain at least one query word contained in the query statement.
In an optional embodiment, server 22 is also used to obtain, in real time, the synonyms corresponding to the at least one query word using the synonym acquisition method provided by the embodiments of the present application.
In another optional embodiment, server 22 is also used to obtain the synonyms corresponding to the at least one query word from synonym database 23.
Synonym database 23 stores the synonyms corresponding to each query word, obtained using the synonym acquisition method provided by the embodiments of the present application.
In an optional embodiment, synonym database 23 may belong to server 22; in another optional embodiment, synonym database 23 may be independent of server 22.
Server 22 is also used to obtain a feedback result based on the at least one query word and the synonyms corresponding to the at least one query word.
In an optional embodiment, server 22 identifies the user's true intention based on the at least one query word and its corresponding synonyms, so as to obtain a more accurate result.
In another optional embodiment, server 22 obtains a more comprehensive feedback result based on the at least one query word and its corresponding synonyms.
Optionally, taking Fig. 1b again as an example, server 22 can build a retrieval expression from "Construction Bank", "loan", "interest", and their synonyms, perform retrieval based on that expression to obtain a feedback result, and return the result to terminal device 21.
Optionally, the retrieval expression built from "Construction Bank", "loan", "interest", and their synonyms can be: (Construction Bank OR China Construction Bank OR Construction Bank) AND (loan OR borrowing) AND (interest OR interest OR interest money OR interest rate).
Terminal device 21 is also used to play or display the feedback result.
The synonym acquisition method is described below with reference to the synonym acquisition system shown in Fig. 2. Fig. 3 is a flowchart of one implementation of the synonym acquisition method provided by the embodiments of the present application; the method includes:
Step S301: obtain at least one candidate synonym corresponding to a query word contained in a query statement, the candidate synonym being a word contained in the hit results sentence corresponding to the query statement.
Suppose the query statement is "how much is the loan interest of Construction Bank" and the corresponding hit results sentence is "How much is the 2018 Construction Bank loan interest rate? What are the Construction Bank loan conditions? _Rong360".
Suppose that segmenting "how much is the loan interest of Construction Bank" yields the keywords Construction Bank, loan, and interest, and that segmenting "How much is the 2018 Construction Bank loan interest rate? What are the Construction Bank loan conditions? _Rong360" yields the keywords Construction Bank, loan, interest rate, condition, and which.
The query word can then be determined from the at least one keyword contained in the query statement; for example, "Construction Bank" is determined as the query word.
It can be understood that any keyword contained in the hit results sentence may be a synonym of "Construction Bank". Therefore, optionally, all keywords contained in the hit results sentence are used as candidate synonyms.
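Step S301 as described above can be sketched as follows. The sketch assumes that the query statement and the hit results sentence have already been segmented into keyword lists (the segmenter itself is out of scope here), and the function name `candidate_synonyms` is hypothetical.

```python
def candidate_synonyms(query_keywords, hit_keywords, query_word):
    """Return every keyword of the hit results sentence as a candidate synonym,
    provided the query statement actually contains the query word."""
    if query_word not in query_keywords:
        return []
    # Keep each keyword once, in order of first appearance.
    return list(dict.fromkeys(hit_keywords))

# Keywords taken from the segmentation example in the text.
query_keywords = ["Construction Bank", "loan", "interest"]
hit_keywords = ["Construction Bank", "loan", "interest rate", "condition", "which"]

candidates = candidate_synonyms(query_keywords, hit_keywords, "Construction Bank")
print(candidates)
```

At this stage nothing is filtered out: every keyword of the hit results sentence is kept, and the later steps decide which candidates are actual synonyms.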
The flowchart in Fig. 3 shows the method for obtaining the synonyms of one query word. To obtain the synonyms corresponding to each of multiple query words, the steps shown in Fig. 3 can be executed multiple times, or executed in parallel.
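Running the per-word flow for several query words in parallel could look like the sketch below. `synonyms_for` is a deliberately simplified placeholder for the flow of Fig. 3 (it only collects the words of matching hit results sentences), and the use of a thread pool is an assumption made to show the structure; the source does not say how the parallel execution is realized.

```python
from concurrent.futures import ThreadPoolExecutor

def synonyms_for(query_word, pairs):
    """Placeholder for the per-word flow: words of hit results sentences
    whose query statement contains query_word, excluding the word itself."""
    found = set()
    for query_tokens, hit_tokens in pairs:
        if query_word in query_tokens:
            found.update(hit_tokens)
    found.discard(query_word)
    return sorted(found)

def synonyms_for_all(query_words, pairs, max_workers=4):
    """Run the per-word flow once per query word, in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as query_words.
        results = pool.map(lambda word: synonyms_for(word, pairs), query_words)
        return dict(zip(query_words, results))

pairs = [
    (["Construction Bank", "loan", "interest"], ["Construction Bank", "loan", "interest rate"]),
    (["loan", "interest"], ["borrowing", "interest rate"]),
    (["Construction Bank", "site"], ["Construction Bank", "address"]),
]
result = synonyms_for_all(["loan", "site"], pairs)
print(result)
```

Because the per-word runs do not share state, they can equally be executed one after another; the parallel form only changes how the calls are scheduled.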
Step S302: obtain the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, where the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in the sentence set, the query statement of a first sentence pair contains the query word, and the hit results sentence of the first sentence pair contains the candidate synonym.
The sentence set is described below.
Fig. 4 is a schematic diagram of one implementation of a sentence set provided by the embodiments of the present application.
In the embodiments of the present application, a query statement is denoted "query" and a hit results sentence is denoted "title".
Suppose the query statements corresponding to the same title include query1, query2, query3, ..., query n; the sentence set may then include set 41, where n is a positive integer greater than or equal to 1.
Suppose the hit results sentences corresponding to the same query include title1, title2, title3, ..., title m; the sentence set may then include set 42, where m is a positive integer greater than or equal to 1.
Set 41 and set 42 shown in Fig. 4 may or may not intersect.
Optionally, the sentence set may include set 41; or the sentence set includes set 42; or the sentence set includes both set 41 and set 42. Fig. 4 is only an example.
In an optional embodiment, if the sentence set includes set 41 and set 42, set 41 and set 42 can have at least one of the following associations:
The query in set 42 belongs to the queries in set 41; for example, the queries in set 42 include query1, and/or query2, and/or query3, and so on, from set 41.
And/or
The title in set 41 belongs to the titles in set 42; for example, the titles in set 41 include title1, and/or title2, and/or title3, and so on, from set 42.
If the queries in set 42 include query1, query2, and query3 from set 41, then set 42 includes the one or more hit results sentences corresponding to query1, the one or more hit results sentences corresponding to query2, and the one or more hit results sentences corresponding to query3.
If the titles in set 41 include title1, title2, and title3 from set 42, then set 41 includes the one or more query statements corresponding to title1, the one or more query statements corresponding to title2, and the one or more query statements corresponding to title3.
In an optional embodiment, if the sentence set is set 41, the sentence set optionally includes only one title, or optionally includes multiple different titles. In an optional embodiment, if the sentence set is set 42, the sentence set optionally includes only one query, or optionally includes multiple different queries.
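One way to picture set 41 and set 42 is as two groupings of the same (query, title) sentence pairs: by title, and by query. The sketch below is only an illustration of that view; the function names and the sample pairs are hypothetical.

```python
from collections import defaultdict

def group_by_title(pairs):
    """Set-41 style view: for each title, the query statements that hit it."""
    groups = defaultdict(list)
    for query, title in pairs:
        if query not in groups[title]:
            groups[title].append(query)
    return dict(groups)

def group_by_query(pairs):
    """Set-42 style view: for each query, the hit results sentences it led to."""
    groups = defaultdict(list)
    for query, title in pairs:
        if title not in groups[query]:
            groups[query].append(title)
    return dict(groups)

# Each pair is (query statement, hit results sentence).
pairs = [
    ("query1", "title1"),
    ("query2", "title1"),
    ("query1", "title2"),
    ("query3", "title1"),
]
by_title = group_by_title(pairs)
by_query = group_by_query(pairs)
print(by_title)
print(by_query)
```

Here the group for "title1" plays the role of a set 41 with n = 3, and the group for "query1" plays the role of a set 42 with m = 2; the pair ("query1", "title1") belongs to both, which corresponds to the case where the two sets intersect.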
In an optional embodiment, the statement that the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in the sentence set covers either of the following cases:
First case: the weighted co-occurrence frequency of a candidate synonym = the number of first sentence pairs in the sentence set whose query statement contains the query word and whose corresponding hit results sentence contains the candidate synonym.
Suppose the query word is denoted w1 and one of the at least one candidate synonym is w2. In an optional embodiment, the weighted co-occurrence frequency of candidate synonym w2 = the number of first sentence pairs corresponding to w1 and w2 contained in the sentence set.
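The first case is a plain count and can be sketched directly. Sentences are again assumed to be pre-segmented keyword lists, and `first_pair_count` and the sample data are names and values chosen for this example.

```python
def first_pair_count(w1, w2, pairs):
    """Number of first sentence pairs for (w1, w2): the query statement
    contains w1 and the hit results sentence contains w2."""
    return sum(
        1
        for query_tokens, hit_tokens in pairs
        if w1 in query_tokens and w2 in hit_tokens
    )

# Each pair is (keywords of the query statement, keywords of the hit results sentence).
pairs = [
    (["Construction Bank", "loan", "interest"], ["Construction Bank", "loan", "interest rate"]),
    (["loan", "interest"], ["borrowing", "interest rate"]),
    (["Construction Bank", "site"], ["Construction Bank", "address"]),
]
print(first_pair_count("interest", "interest rate", pairs))
print(first_pair_count("loan", "borrowing", pairs))
```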
Here, a first sentence pair corresponding to w1 and w2 is a first sentence pair whose query statement contains the query word and whose hit results sentence contains candidate synonym w2. Optionally, a first sentence pair corresponding to w1 and w2 can also be called a first sentence pair corresponding to candidate synonym w2.
Second case: this case relies on the idea that the larger the number of first sentence pairs corresponding to w1 and w2, the higher the probability that w1 and w2 are synonyms, and the larger the number of second sentences corresponding to w1 and w2, the lower the probability that w1 and w2 are synonyms.
Here, a second sentence corresponding to w1 and w2 is a second sentence that contains both the query word and candidate synonym w2. Optionally, a second sentence corresponding to w1 and w2 can also be called a second sentence corresponding to candidate synonym w2.
In an optional embodiment, obtaining the weighted co-occurrence frequency of any candidate synonym w2 includes:
obtaining a first number of first sentence pairs in the sentence set whose query statement contains the query word and whose corresponding hit results sentence contains candidate synonym w2;
obtaining a second number of second sentences in the sentence set that contain both the query word and candidate synonym w2, a second sentence being a query statement or a hit results sentence;
obtaining the weighted co-occurrence frequency of candidate synonym w2 based on the first number and the second number.
In an optional embodiment, the weighted co-occurrence frequency of candidate synonym w2 is positively correlated with the first number and negatively correlated with the second number.
The first number is the number of first sentence pairs corresponding to the query word and candidate synonym w2. The second number is the number of second sentences in the sentence set in which the query word and candidate synonym w2 appear together, a second sentence being a query statement or a hit results sentence.
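To make the two counts concrete, the sketch below computes a first number and a second number and combines them. The combination `n1 / (1 + n2)` is purely an assumption chosen because it rises with the first number and falls with the second; it is not the formula of this application, and the function names and sample data are likewise illustrative.

```python
def first_number(w1, w2, pairs):
    """Pairs whose query statement contains w1 and whose hit results sentence contains w2."""
    return sum(1 for query, hit in pairs if w1 in query and w2 in hit)

def second_number(w1, w2, sentences):
    """Second sentences that contain both w1 and w2."""
    return sum(1 for tokens in sentences if w1 in tokens and w2 in tokens)

def weighted_co_occurrence(w1, w2, pairs, sentences):
    n1 = first_number(w1, w2, pairs)
    n2 = second_number(w1, w2, sentences)
    # Illustrative combination only (an assumption): grows with n1, shrinks as n2 grows.
    return n1 / (1.0 + n2)

pairs = [
    (["Construction Bank", "loan", "interest"], ["Construction Bank", "loan", "interest rate"]),
    (["loan", "interest"], ["borrowing", "interest rate"]),
    (["Construction Bank", "site"], ["Construction Bank", "address"]),
]
# Here the second sentences are taken to be the query statements.
queries = [query for query, _ in pairs]

print(weighted_co_occurrence("interest", "interest rate", pairs, queries))
print(weighted_co_occurrence("Construction Bank", "loan", pairs, queries))
```

"Construction Bank" and "loan" occur together inside one query statement, which is evidence that they play different roles in the sentence, so their score is pulled down; "interest" and "interest rate" never share a sentence in this toy data and keep their full count.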
In an optional embodiment, the weighted co-occurrence frequency of a candidate synonym w2 can be calculated as follows:
In an optional embodiment, if the sentence set is set 41, the second sentence is a query statement; in another optional embodiment, if the sentence set is set 42, the second sentence is a hit results sentence; in yet another optional embodiment, if the sentence set includes set 41 and set 42, the second sentence can be a query statement or a hit results sentence.
The first number and the second number are described below, taking a sentence set that includes set 41 as an example.
Suppose the sentence pairs contained in the sentence set include: (query1, title), (query2, title), (query3, title), ..., (query n, title).
The first number of a candidate synonym w2 is the number of first sentence pairs contained in the sentence set, where the query statement of a first sentence pair contains the query word and the hit results sentence of the first sentence pair contains candidate synonym w2.
Suppose that query2 in (query2, title) contains the query word and title contains candidate synonym w2, and that query n in (query n, title) contains the query word and title contains candidate synonym w2; the first number of that candidate synonym is then 2.
The second number of a candidate synonym w2 is the number of query statements in the sentence set in which the query word and candidate synonym w2 both appear; alternatively, the second number of a candidate synonym w2 is the sum of the number of query statements in the sentence set in which the query word and candidate synonym w2 both appear and the number of hit results sentences in the sentence set in which they both appear.
For example, if the query word and candidate synonym w2 both appear in query2, query4, query8, and query n, the second number of candidate synonym w2 can be 4.
For example, if the query word and candidate synonym w2 also both appear in title, then optionally the second number of candidate synonym w2 = 4 + 1 = 5.
Optionally, if the sentence set includes set 42, the second number of a candidate synonym is the number of hit results sentences in the sentence set in which the query word and the candidate synonym both appear; alternatively, the second number of a candidate synonym is the sum of the number of hit results sentences in the sentence set in which the query word and the candidate synonym both appear and the number of query statements in the sentence set in which they both appear.
It is still illustrated by taking set 42 shown in Fig. 4 as an example, the sentence that set 42 includes is to as follows:
(query, title1), (query, title2), (query, title3), (query, title4) ...,
(query, title m).
It is assumed that query word w1 and candidate synonym w2 appear in hit results sentence in sentence set include: title1,
Title2, title3, title4 and title 5, then the second number of candidate synonym w2 can be 5.
If query word and candidate synonym w2 are appeared in query simultaneously, the second number of optional candidate synonym w2
=5+1=6.
Optionally, if sentence set includes set 41 and set 42, the second number of a candidate synonym, which refers to, to be looked into
It askes word and the candidate synonym appears in the number 1 of query statement in sentence set, alternatively, the second number of a candidate synonym
Mesh refers to that query word and the candidate synonym appear in the number 2 of hit results sentence in sentence set, alternatively, a candidate is same
Second number of adopted word refers to the sum of number 1 and number 2.
Optionally, if sentence set includes set 41 and set 42, then sentence set may include a plurality of different
Hit results sentence and a plurality of different query statement.
Step S303: at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, obtain the synonym of the query word from the at least one candidate synonym.
In an alternative embodiment, step S303 may comprise the following steps:
Step 1: at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, obtain the synonym weight corresponding to each of the at least one candidate synonym.
Step 2: based on the synonym weight corresponding to each of the at least one candidate synonym, obtain the synonym of the query word from the at least one candidate synonym.
The synonym acquisition method provided by the embodiments of the present application relies on the idea that the larger the number of sentence pairs corresponding to two keywords, the greater the probability that the two keywords are synonyms. A sentence pair corresponding to two keywords is a sentence pair whose query statement contains one keyword and whose hit results sentence contains the other keyword. First, at least one candidate synonym corresponding to the query word is obtained, and the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym is obtained. The weighting co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs included in the sentence set, where the query statement of a first sentence pair contains the query word and the hit results sentence of that first sentence pair contains the candidate synonym. Then, at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, the synonym of the query word is obtained from the at least one candidate synonym, thereby achieving the purpose of obtaining the synonym of the query word.
With reference to the synonym acquisition method shown in Fig. 3, the process of obtaining the synonym candidate set in the synonym acquisition method is described below. The method of obtaining the synonym candidate set includes:
Step 1: obtain the sentence set.
For a description of the sentence set, refer to the description of Fig. 4; it is not repeated here.
Step 2: determine a sentence pair to be processed from the sentence set.
Taking a sentence set that includes set 41 as an example, assume that the sentence pair to be processed is (query1, title), that query1 is "how much is the loan interest of CCB" (CCB being the short form of "Construction Bank"), and that title is "What is the 2018 Construction Bank loan interest rate? What are the Construction Bank loan conditions? _Rong360".
Step 3: segment the query statement and the hit results sentence included in the sentence pair to be processed, to obtain at least one keyword included in the query statement of the sentence pair to be processed and at least one keyword corresponding to the hit results sentence of the sentence pair to be processed.
query1 and title can each be segmented into words; optionally, the HanLP word segmentation tool can be used. Other word segmentation tools can also be used; the embodiments of the present application are not limited to the HanLP word segmentation tool.
Optionally, a user can add user-defined keywords to the HanLP word segmentation tool, and during segmentation the tool treats a user-defined keyword as a single word. For example, if the user defines "I like" as a keyword, the HanLP word segmentation tool keeps "I" and "like" together as one word during segmentation; if the user has not entered the custom word "I like" in the HanLP word segmentation tool, the tool splits "I like" into two words.
Assume that the results of segmenting query1 and title are as follows:
The word segmentation result of query1 includes: CCB (the short form of "Construction Bank"), loan, interest, how, much; the word segmentation result of title includes: 2018, Construction Bank, loan, interest rate, how, much, condition, have, which, Rong, 360.
In an alternative embodiment, the stop words included in the word segmentation result can be removed. Optionally, the stop words can be preset; for example, modal particles, and/or prepositions, and/or numerals can be used as stop words.
Optionally, after the stop words are removed from the word segmentation result of query1 and from the word segmentation result of title, the following results are obtained:
The word segmentation processing result of query1 includes: CCB, loan, interest; the word segmentation processing result of title includes: Construction Bank, loan, interest rate, condition, which.
Step 4: determine the query word from the at least one keyword included in the query statement of the sentence pair to be processed.
Step 5: take the at least one keyword corresponding to the hit results sentence of the sentence pair to be processed as the elements of the synonym candidate set.
Assuming that "CCB" (the short form of "Construction Bank") in query1 is taken as the query word, the word segmentation processing result of title, namely Construction Bank, loan, interest rate, condition and which, constitutes the elements of the synonym candidate set.
Optionally, the synonym candidate set may include the following candidate synonym pairs:
(CCB, Construction Bank), (CCB, loan), (CCB, interest rate), (CCB, condition), (CCB, which).
In summary, Construction Bank, loan, interest rate, condition and which are all possible synonyms of CCB.
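Steps 3 to 5 above can be sketched as follows. This is only an illustration under stated assumptions: the tokenizer is a plain whitespace split standing in for a real word segmentation tool such as HanLP, the English tokens are stand-ins for the Chinese keywords of the example (with "CCB" standing for the short form of the bank name), and `candidate_pairs` and `STOP_WORDS` are hypothetical names.

```python
from typing import Callable, Iterable, List, Set, Tuple


def candidate_pairs(
    query: str,
    title: str,
    query_word: str,
    tokenize: Callable[[str], Iterable[str]],
    stop_words: Set[str],
) -> List[Tuple[str, str]]:
    """Pair the query word with every non-stop keyword of the hit results sentence."""
    query_keywords = [t for t in tokenize(query) if t not in stop_words]
    if query_word not in query_keywords:
        return []  # the query word must come from the query statement
    pairs: List[Tuple[str, str]] = []
    seen: Set[str] = set()
    for token in tokenize(title):
        if token in stop_words or token in seen:
            continue  # drop stop words and repeated keywords
        seen.add(token)
        pairs.append((query_word, token))
    return pairs


# Assumption: sentences are pre-segmented with spaces, so str.split acts as the tokenizer.
STOP_WORDS = {"2018", "how", "much", "have", "Rong", "360"}
query = "CCB loan interest how much"
title = ("2018 Construction_Bank loan interest_rate how much "
         "Construction_Bank loan condition have which Rong 360")

pairs = candidate_pairs(query, title, "CCB", str.split, STOP_WORDS)
print(pairs)
```

Passing the tokenizer as a parameter keeps the sketch independent of any particular segmentation library.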
It can be understood that, in mining synonyms, the synonyms mined should be not only comprehensive but also accurate; otherwise the result obtained will be inaccurate.
To further improve the accuracy of the mined synonyms of the query word, the following idea is used: the longer the query statement and the hit results sentence included in the first sentence pair corresponding to query word w1 and candidate synonym w2, the lower the probability that query word w1 and candidate synonym w2 are synonyms.
In any of the above synonym acquisition method embodiments, step S303 includes:
Step 1: obtain the length characteristic information corresponding to each of the at least one candidate synonym.
The length characteristic information of a candidate synonym at least characterizes the length information of the query statement and the length information of the hit results sentence of each of the at least one first sentence pair corresponding to the candidate synonym.
In an alternative embodiment, the way in which the length characteristic information of a candidate synonym w2 characterizes the length information of the query statement and the length information of the hit results sentence in the at least one first sentence pair corresponding to candidate synonym w2 includes any of the following cases:
First case: the length characteristic information of candidate synonym w2 is negatively correlated with the length information of the query statement and the length information of the hit results sentence in the at least one first sentence pair corresponding to candidate synonym w2.
Negative correlation means that the length characteristic information of candidate synonym w2 decreases (increases) as the length information of the query statement and the length information of the hit results sentence in the first sentence pair corresponding to candidate synonym w2 increase (decrease).
Second case: two ideas are used together. The longer the query statement and the hit results sentence included in the first sentence pair corresponding to query word w1 and candidate synonym w2, the lower the probability that query word w1 and candidate synonym w2 are synonyms; and the longer the second sentence corresponding to query word w1 and candidate synonym w2, the higher the probability that query word w1 and candidate synonym w2 are synonyms.
Here, a second sentence corresponding to query word w1 and candidate synonym w2 is a sentence that contains both query word w1 and candidate synonym w2.
In an alternative embodiment, obtaining the length characteristic information of any candidate synonym w2 includes:
obtaining first length information based on the length information of the query statement and the length information of the hit results sentence of each of the first number of first sentence pairs corresponding to candidate synonym w2;
obtaining second length information based on the length information of each of the second number of second sentences corresponding to candidate synonym w2;
obtaining the length characteristic information of candidate synonym w2 based on the first length information and the second length information.
In an alternative embodiment, the length characteristic information of candidate synonym w2 is negatively correlated with the first length information and positively correlated with the second length information.
Negative correlation means that the length characteristic information of candidate synonym w2 decreases (increases) as the first length information increases (decreases). Positive correlation means that the length characteristic information of candidate synonym w2 increases (decreases) as the second length information increases (decreases).
Assume that the first sentence pairs corresponding to query word w1 and candidate synonym w2 included in the sentence set are: (query1, title), (query2, title), (query3, title).
Assume that (query1, title) is ("how much is the loan interest of CCB", "What is the 2018 Construction Bank loan interest rate? What are the Construction Bank loan conditions? _Rong360"). Counting characters in the original Chinese sentences, the character length of query1 is len(query1) = 9 and the character length of title is len(title) = 33.
Assume that the second sentences corresponding to query word w1 and candidate synonym w2 included in the sentence set are query4 and query5, with len(query4) = 11 and len(query5) = 30.
Step 2: at least based on the length characteristic information corresponding to each of the at least one candidate synonym and the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, obtain the synonym of the query word from the at least one candidate synonym.
In an alternative embodiment, a first weight can be obtained at least based on the length characteristic information corresponding to each of the at least one candidate synonym and the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym.
Optionally, the calculation formula of the first weight of a candidate synonym w2 can be as follows:
first weight(w1, w2) = ∑(w1,w2)∈(query,title)[I(w1,w2,query,title)/(len(query)*len(title))] / max(∑(w1,w2)∈s[I(w1,w2,s)/(len(s)*len(s))], beta)
where w1 denotes the query word, w2 is any candidate synonym, and (query, title) denotes a sentence pair included in the sentence set. I(w1, w2, query, title) indicates whether w1 appears in the query of the sentence pair and w2 appears in the title of that sentence pair: if w1 appears in the query of the sentence pair and w2 appears in the title of that sentence pair, the value of I(w1, w2, query, title) is 1; otherwise the value of I(w1, w2, query, title) is 0.
s denotes a query statement or hit results sentence included in the sentence set, and I(w1, w2, s) indicates whether w1 and w2 appear in the same sentence s: if w1 and w2 both appear in the same sentence s, the value of I(w1, w2, s) is 1; otherwise the value of I(w1, w2, s) is 0.
len(query), len(title) and len(s) denote the character lengths of the corresponding sentences.
beta is a bias value that prevents the denominator from being too small; optionally, its value in the embodiments of the present application can be 8.0.
In an alternative embodiment, the synonym weight of candidate synonym w2 can be the first weight.
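As a rough illustration, the first weight can be sketched as the ratio of the length-weighted pair count to the length-weighted same-sentence count, following the first parameter and second parameter given in this description. The sketch makes simplifying assumptions: sentences are raw strings, keyword occurrence is tested by substring containment rather than by word segmentation, and the function name `first_weight` and the toy data are hypothetical.

```python
from typing import Iterable, Tuple


def first_weight(
    pairs: Iterable[Tuple[str, str]],
    sentences: Iterable[str],
    w1: str,
    w2: str,
    beta: float = 8.0,
) -> float:
    """Length-weighted pair co-occurrence divided by length-weighted same-sentence co-occurrence."""
    # Numerator: each first sentence pair contributes 1 / (len(query) * len(title)).
    numerator = sum(
        1.0 / (len(query) * len(title))
        for query, title in pairs
        if w1 in query and w2 in title
    )
    # Denominator: each second sentence contributes 1 / len(s)^2, floored at beta.
    same_sentence = sum(
        1.0 / (len(s) * len(s)) for s in sentences if w1 in s and w2 in s
    )
    return numerator / max(same_sentence, beta)


# Toy data (assumption): one first sentence pair and one second sentence.
pairs = [("w1 a", "w2 bcd")]   # lengths 4 and 6
sentences = ["w1 w2"]          # length 5

weight_default = first_weight(pairs, sentences, "w1", "w2")            # beta = 8.0 dominates
weight_small_beta = first_weight(pairs, sentences, "w1", "w2", 0.01)   # the sum dominates
print(weight_default, weight_small_beta)
```

Longer query statements and hit results sentences shrink the numerator, which matches the stated idea that long first sentence pairs are weaker evidence of synonymy.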
Assume that the sentence set includes 10 sentence pairs, namely: (query1, title), (query2, title), (query3, title), (query4, title), (query5, title), (query6, title), (query7, title), (query8, title), (query9, title), (query10, title).
Among them, the number of first sentence pairs corresponding to query word w1 and candidate synonym w2 included in the sentence set is 4, namely: (query1, title), (query2, title), (query3, title), (query4, title); the number of second sentences corresponding to query word w1 and candidate synonym w2 included in the sentence set is 2, namely query5 and query6.
Optionally,
In an alternative embodiment, different candidate synonyms have different first numbers. The aim is to prevent the numerator (or denominator) of the first weight from growing by a large multiple as the first number (or second number) increases. For example, the first number of candidate synonym w2 and the first number of candidate synonym w3 may differ by only 1, while the numerator of the first weight of candidate synonym w2 is a multiple of, for example twice, the numerator of the first weight of candidate synonym w3, which does not match reality. Logically, if the first number of candidate synonym w2 and the first number of candidate synonym w3 differ by 1, the numerators of their first weights should be very close.
In summary, the detailed process of obtaining the synonym of the query word from the at least one candidate synonym, at least based on the length characteristic information corresponding to each of the at least one candidate synonym and the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, includes:
Step 1: for any candidate synonym w2 among the at least one candidate synonym, obtain a first parameter based on the first number of candidate synonym w2 and the first length information of candidate synonym w2.
Optionally, the first parameter is ∑(w1,w2)∈(query,title)[I(w1,w2,query,title)/(len(query)*len(title))].
Step 2: obtain a second parameter based on the second number of candidate synonym w2 and the second length information of candidate synonym w2.
Optionally, the second parameter is max(∑(w1,w2)∈s[I(w1,w2,s)/(len(s)*len(s))], beta).
Step 3: use the first parameter and the second parameter respectively as the independent variable of a first function, to obtain a first dependent variable corresponding to the first parameter and a second dependent variable corresponding to the second parameter.
The first function is a function such that, when the difference between the parameters corresponding to different candidate synonyms is less than or equal to a first threshold, the difference between the dependent variables corresponding to those candidate synonyms is less than or equal to a second threshold. Here, the parameter is the first parameter and the dependent variable is the first dependent variable; or the parameter is the second parameter and the dependent variable is the second dependent variable.
In an alternative embodiment, the first function f(x) has two hyperparameters, k1 and k2, which can be the same or different and are both positive numbers.
In an alternative embodiment, k1 can take the value 50 and k2 the value 20.
Fig. 5 is a schematic diagram, provided by the embodiments of the present application, of one implementation in which the first function suppresses multiple-fold growth of the dependent variable corresponding to the first parameter or the second parameter.
As can be seen from Fig. 5, the first parameter or the second parameter is taken as the abscissa. If the first function is not used, the first parameter is the numerator of the first weight and the second parameter is the denominator of the first weight; that is, the relationship between the first parameter and the numerator of the first weight, and between the second parameter and the denominator of the first weight, is f(x) = x, shown by the single-dot dash line in Fig. 5.
If the value of k1 is very large, the curve of the first function is the dotted line shown in Fig. 5. If k1 takes the value 50 and k2 takes the value 20, the curve of the first function is the heavy solid line in Fig. 5. Here, x is the first parameter or the second parameter, and f(x) is the numerator or the denominator of the first weight.
The double-dot dash line in Fig. 5 is the curve of the first function when k1 and k2 are both 0.
The thin solid line in Fig. 5 is f(x) = k1 + 1.
As can be seen from Fig. 5, with the first function, when two different values of x are close to each other, the corresponding values of f(x) are also close, and multiple-fold growth does not occur.
Step 4: at least based on the first dependent variable and the second dependent variable corresponding to each of the at least one candidate synonym, obtain the synonym of the query word from the at least one candidate synonym.
Optionally, the first weight of candidate synonym w2 is obtained based on the first dependent variable and the second dependent variable of candidate synonym w2.
Optionally, the calculation formula of the first weight of candidate synonym w2 is as follows:
first weight(w1, w2) = f(first parameter) / f(second parameter), where f is the first function.
It can be understood that label words often appear in the hit results sentence title, for example invalid keywords such as Sogou, having, Ctrip and video. An invalid keyword is a common, general-purpose keyword; for example, no matter which star's TV series or film a user searches for, "video" appears in the hit results sentence. That is, the document frequency of a label word in the sentence set is very high.
The document frequency of a candidate synonym w2 is the number of sentences in the sentence set that contain candidate synonym w2.
In the method described above for determining the at least one candidate synonym corresponding to the query word, the keywords included in the hit results sentence corresponding to the query statement are used as candidate synonyms. If the at least one candidate synonym includes a label word, then because the document frequency of the label word in the sentence set is high, the finally determined synonyms of the query word (assuming that the query word is not a label word) may include the label word, making the determined synonyms of the query word inaccurate.
In summary, to avoid determining a label word as a synonym of the query word, the idea of reducing the influence of the document frequency of a candidate synonym on the synonym weight is proposed.
In any of the above synonym acquisition method embodiments, obtaining the synonym of the query word from the at least one candidate synonym, at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, includes:
Step 1: obtain the document frequency weight corresponding to each of the at least one candidate synonym.
The document frequency weight of a candidate synonym at least characterizes the number of sentences in the sentence set that contain the candidate synonym.
In an alternative embodiment, the way in which the document frequency weight of a candidate synonym w2 characterizes the number of sentences in the sentence set that contain candidate synonym w2 includes any of the following cases:
First case: the document frequency weight of candidate synonym w2 is negatively correlated with the number of sentences in the sentence set that contain candidate synonym w2.
In an alternative embodiment, the document frequency of candidate synonym w2, that is, the number of sentences in the sentence set that contain candidate synonym w2, is: a third number, the number of hit results sentences in the sentence set that contain candidate synonym w2; or a fourth number, the number of query statements in the sentence set that contain candidate synonym w2; or the sum of the third number and the fourth number.
In an alternative embodiment, the calculation formula of the document frequency weight of candidate synonym w2 is as follows:
log(M / document frequency of w2), where M is the total number of query statements included in the sentence set, or M is the total number of hit results sentences included in the sentence set, or M is the sum of the total number of query statements and the total number of hit results sentences included in the sentence set.
In an alternative embodiment, different candidate synonyms have different document frequencies. To prevent the document frequency weight from changing by a large multiple as the document frequency increases, the document frequency weight is optimized using the first function. Optionally, obtaining the document frequency weight corresponding to any candidate synonym includes:
obtaining the number of sentences in the sentence set that contain candidate synonym w2;
using the sentence number of candidate synonym w2 as the independent variable of the first function to obtain a third dependent variable;
where the first function is a function such that, when the difference between the sentence numbers corresponding to different candidate synonyms is less than or equal to a third threshold, the difference between the third dependent variables corresponding to those candidate synonyms is less than or equal to a fourth threshold;
obtaining the document frequency weight of candidate synonym w2 based on the third dependent variable and the total number of sentences included in the sentence set.
In an alternative embodiment, the calculation formula of the document frequency weight of candidate synonym w2 is as follows:
log(M / f(document frequency of w2)).
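The document frequency weight can be sketched as below. The sketch assumes sentences are sets of keywords, that M is the number of sentences passed in, and that the logarithm is natural (the text does not state the base). The optional `transform` argument stands in for the first function f and defaults to the identity; all names are hypothetical.

```python
import math
from typing import Callable, List, Set


def document_frequency(sentences: List[Set[str]], w2: str) -> int:
    """Number of sentences in the sentence set that contain w2."""
    return sum(1 for s in sentences if w2 in s)


def document_frequency_weight(
    sentences: List[Set[str]],
    w2: str,
    transform: Callable[[float], float] = lambda x: x,
) -> float:
    """log(M / f(document frequency)); `transform` plays the role of f."""
    m = len(sentences)
    damped = transform(float(document_frequency(sentences, w2)))
    if damped <= 0:
        raise ValueError("w2 must occur in at least one sentence")
    return math.log(m / damped)


# Toy hit results sentences (assumption): "video" is a label word present everywhere.
titles = [{"video", "a"}, {"video", "b"}, {"video", "c"}, {"bank", "video"}]
weight_video = document_frequency_weight(titles, "video")
weight_bank = document_frequency_weight(titles, "bank")
print(weight_video, weight_bank)
```

A keyword that occurs in every sentence receives a weight of zero, which is how a label word such as "video" loses influence on the synonym weight.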
Step 2: at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym and the document frequency weight corresponding to each of the at least one candidate synonym, obtain the synonym of the query word from the at least one candidate synonym.
In an alternative embodiment, the calculation formula of the synonym weight of candidate synonym w2 can be as follows:
The synonym acquisition method provided by the embodiments of the present application uses two ideas: the larger the number of first sentence pairs corresponding to w1 and w2, the greater the probability that w1 and w2 are synonyms; and the more often w1 and w2 appear in the same sentence, the smaller the probability that w1 and w2 are synonyms. However, a larger number of first sentence pairs corresponding to w1 and w2 does not necessarily mean a greater probability that w1 and w2 are synonyms.
For example, if the query statement is "Taobao" and the hit results sentence is "Taobao - Tao! I like", and the query word is assumed to be "Taobao", then "I like" is likely to be determined as a synonym of "Taobao". As another example, if the query statement is "which pillow on Taobao is better" and the hit results sentence is "the XX brand pillow on Jingdong is better", and the query word is assumed to be "Taobao", then "Jingdong" is likely to be determined as a synonym of "Taobao".
To avoid the above problem, similarity information between the query word and each candidate synonym, such as cosine similarity, can be obtained. In any of the above synonym acquisition method embodiments, obtaining the synonym weight corresponding to each of the at least one candidate synonym, at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, includes:
Step 1: obtain the semantic feature information corresponding to each of the at least one candidate synonym and the semantic feature information of the query word.
In an alternative embodiment, the semantic feature information corresponding to each candidate synonym and the semantic feature information corresponding to the query word can be obtained based on Word2Vec word vectors.
Optionally, word embedding semantic features can be added. Optionally, an existing known corpus can be used to train the Word2Vec word vectors.
In another alternative embodiment, a semantic feature matching model can be built in advance; the semantic feature matching model has the function of making the predicted semantic feature information of a keyword tend toward the true semantic feature information of that keyword.
Optionally, the semantic feature matching model is obtained by training a neural network.
Keywords can be extracted from the query statements and hit results sentences of the sentence pairs included in the sentence set to obtain sample keywords, the semantic feature information corresponding to each sample keyword being known.
The neural network can be trained based on the sample keywords to obtain the semantic feature matching model.
Step 2: for any candidate synonym among the at least one candidate synonym, obtain the similarity information between the candidate synonym and the query word based on the semantic feature information of the candidate synonym and the semantic feature information of the query word, so as to obtain the similarity information corresponding to each of the at least one candidate synonym.
The similarity information between query word w1 and candidate synonym w2 can be the cosine similarity; the cosine similarity of the semantic feature information of w1 and w2 can be denoted cos(w1, w2).
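For reference, cosine similarity between two semantic feature vectors can be computed as in the following sketch. The vectors here are toy values, not real Word2Vec embeddings, and the function name is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (|u| * |v|); returns 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors (assumption): parallel vectors score near 1, orthogonal vectors score 0.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Returning 0.0 for a zero vector is a design choice of this sketch so that a keyword without a usable embedding contributes no similarity.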
Step 3: at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym and the similarity information corresponding to each of the at least one candidate synonym, obtain the synonym of the query word from the at least one candidate synonym.
In an alternative embodiment, the calculation formula of the synonym weight of candidate synonym w2 can be as follows:
Alternatively,
In any of the above synonym acquisition method embodiments, implementations of Step 2 in step S303 include but are not limited to the following.
First implementation: sort the synonym weights corresponding to the at least one candidate synonym, and obtain the synonym of the query word based on the sorting result.
If sorted in descending order, the first L candidate synonyms can be taken as synonyms of the query word, where L is a positive integer greater than or equal to 1.
If sorted in ascending order, the last L candidate synonyms can be taken as synonyms of the query word.
Second implementation: determine, from the synonym weights corresponding to the at least one candidate synonym, the target candidate synonym with the largest synonym weight, and take the target candidate synonym as the synonym of the query word.
Third implementation: take the candidate synonyms whose synonym weights are greater than or equal to a weight threshold as synonyms of the query word.
Optionally, the weight threshold can be set according to the actual situation; the embodiments of the present application do not limit the specific value of the weight threshold.
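The three selection strategies above can be sketched as follows. The synonym weights are assumed to be already computed and held in a dictionary; the function names and the example weights are hypothetical.

```python
from typing import Dict, List


def top_l(weights: Dict[str, float], l: int) -> List[str]:
    """First implementation: sort in descending order and keep the first L candidates."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:l]]


def best(weights: Dict[str, float]) -> str:
    """Second implementation: the single candidate with the largest synonym weight."""
    return max(weights, key=lambda word: weights[word])


def above_threshold(weights: Dict[str, float], threshold: float) -> List[str]:
    """Third implementation: every candidate whose weight reaches the threshold."""
    return [word for word, weight in weights.items() if weight >= threshold]


# Made-up weights for illustration only.
weights = {"a": 0.9, "b": 0.2, "c": 0.5}
selected_top = top_l(weights, 2)
selected_best = best(weights)
selected_threshold = above_threshold(weights, 0.5)
print(selected_top, selected_best, selected_threshold)
```

Taking the last L items of an ascending sort yields the same set as `top_l`, so only the descending variant is shown.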
It is still illustrated by taking Fig. 1 b and Fig. 1 c as an example, if query word is " Construction Bank ", " Construction Bank " at least one corresponding candidate
Synonym includes: Construction Bank, loan, interest rate etc., then as shown in table 1, be query word respectively with Construction Bank, loan, benefit
The weighting co-occurrence frequency of rate, length characteristic information, document frequencies, similarity information are as shown in table 1.
Table 1
If the maximum candidate synonym of synonym weight to be determined as to the synonym of " Construction Bank ", then can determine " to build
If bank " be " Construction Bank " synonym.
The method is described in detail in the embodiments disclosed above. The method of the present invention can be implemented by devices in various forms; therefore, the present invention also discloses a device, and specific embodiments are described in detail below.
Fig. 6 is a schematic diagram of one implementation of the synonym acquisition device provided by the embodiments of the present application. The synonym acquisition device includes:
a first obtaining module 61, configured to obtain at least one candidate synonym corresponding to a query word included in a query statement, the candidate synonym being a word included in the hit results sentence corresponding to the query statement;
a second obtaining module 62, configured to obtain the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, where the weighting co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs included in the sentence set, the query statement of a first sentence pair containing the query word and the hit results sentence of that first sentence pair containing the candidate synonym;
a third obtaining module 63, configured to obtain the synonym of the query word from the at least one candidate synonym, at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym.
In an alternative embodiment, when obtaining the weighting co-occurrence frequency of any candidate synonym, the second obtaining module 62 specifically includes:
a first obtaining unit, configured to obtain the first number of first sentence pairs in the sentence set whose query statement contains the query word and whose corresponding hit results sentence contains the candidate synonym;
a second obtaining unit, configured to obtain the second number of second sentences in the sentence set that contain both the query word and the candidate synonym, a second sentence being a query statement or a hit results sentence;
a third obtaining unit, configured to obtain the weighting co-occurrence frequency of the candidate synonym based on the first number and the second number.
In an optional embodiment, the third obtaining module 63 includes:
A fourth acquiring unit, configured to obtain the length characteristic information corresponding to each of the at least one candidate synonym;
Wherein the length characteristic information of a candidate synonym at least characterizes, for at least one first sentence pair corresponding to the candidate synonym, the length information of the query statement and the length information of the hit result sentence of that pair;
A fifth acquiring unit, configured to obtain the synonym of the query word from the at least one candidate synonym, at least based on the length characteristic information and the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym.
In an optional embodiment, for obtaining the length characteristic information of any candidate synonym, the fourth acquiring unit specifically includes:
A first obtaining subunit, configured to obtain first length information based on the length information of the query statement and the length information of the hit result sentence of each of the first number of first sentence pairs corresponding to the candidate synonym;
A second obtaining subunit, configured to obtain second length information based on the length information of each of the second number of second sentences corresponding to the candidate synonym;
A third obtaining subunit, configured to obtain the length characteristic information of the candidate synonym based on the first length information and the second length information.
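As a non-limiting illustration, the three subunits might be sketched as below. Measuring length in words, averaging over the sentence pairs, and representing the length characteristic information as a simple pair of values are all assumptions of this example; the function names are hypothetical.

```python
from typing import List, Tuple


def first_length_info(pair_lengths: List[Tuple[int, int]]) -> float:
    """Average combined length (query + hit) over the first sentence pairs.

    Each tuple holds (query statement length, hit result sentence length).
    Averaging is an assumed aggregation, chosen only for illustration.
    """
    if not pair_lengths:
        return 0.0
    return sum(q + h for q, h in pair_lengths) / len(pair_lengths)


def second_length_info(sentence_lengths: List[int]) -> float:
    """Average length over the second sentences (assumed aggregation)."""
    if not sentence_lengths:
        return 0.0
    return sum(sentence_lengths) / len(sentence_lengths)


def length_feature(first_length: float, second_length: float) -> Tuple[float, float]:
    # Assumed representation: keep both values side by side so that a later
    # step can combine them with the first and second numbers.
    return (first_length, second_length)


feature = length_feature(
    first_length_info([(2, 3), (4, 5)]),
    second_length_info([3, 5]),
)
```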
In an optional embodiment, the fifth acquiring unit includes:
A fourth obtaining subunit, configured to, for any candidate synonym among the at least one candidate synonym, obtain a first parameter based on the first number and the first length information of the candidate synonym;
A fifth obtaining subunit, configured to obtain a second parameter based on the second number and the second length information of the candidate synonym;
A sixth obtaining subunit, configured to use the first parameter and the second parameter each as the independent variable of a first function, so as to obtain a first dependent variable corresponding to the first parameter and a second dependent variable corresponding to the second parameter;
Wherein the first function is a function such that, when the difference between the parameters corresponding to different candidate synonyms is less than or equal to a first threshold, the difference between the dependent variables corresponding to those candidate synonyms is less than or equal to a second threshold; the parameter is the first parameter and the dependent variable is the first dependent variable, or the parameter is the second parameter and the dependent variable is the second dependent variable;
A seventh obtaining subunit, configured to obtain the synonym of the query word from the at least one candidate synonym, at least based on the first dependent variable and the second dependent variable corresponding to each of the at least one candidate synonym.
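As a non-limiting illustration, one function with the stated property is `log(1 + x)`: for non-negative inputs its slope never exceeds 1, so two parameters that differ by at most a threshold `t` yield dependent variables that also differ by at most `t`. The sketch below assumes this choice of first function, assumes that each parameter is the count divided by the corresponding length information, and assumes a final score of the form `first_dep - alpha * second_dep`; none of these specifics is stated in the text above.

```python
import math
from typing import Tuple


def first_function(x: float) -> float:
    # Assumed first function: log(1 + x) compresses large values and keeps
    # nearby inputs close together (slope <= 1 for x >= 0).
    return math.log1p(x)


def dependent_variables(
    first_number: int,
    first_length: float,
    second_number: int,
    second_length: float,
) -> Tuple[float, float]:
    """Return (first dependent variable, second dependent variable)."""
    # Assumed parameters: counts normalized by the length information.
    first_param = first_number / first_length if first_length > 0 else 0.0
    second_param = second_number / second_length if second_length > 0 else 0.0
    return first_function(first_param), first_function(second_param)


def candidate_score(first_dep: float, second_dep: float, alpha: float = 1.0) -> float:
    # Assumed scoring rule, for illustration only.
    return first_dep - alpha * second_dep


first_dep, second_dep = dependent_variables(3, 6.0, 0, 0.0)
score = candidate_score(first_dep, second_dep)
```

Candidates could then be ranked by `score`, with the top-ranked ones, or those above a chosen cutoff, taken as synonyms of the query word.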
In an optional embodiment, the third obtaining module 63 includes:
A sixth acquiring unit, configured to obtain the document frequency weight corresponding to each of the at least one candidate synonym;
Wherein the document frequency weight of a candidate synonym at least characterizes the number of sentences in the sentence set that contain the candidate synonym;
A seventh acquiring unit, configured to obtain the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency and the document frequency weight corresponding to each of the at least one candidate synonym.
In an optional embodiment, for obtaining the document frequency weight corresponding to any candidate synonym, the sixth acquiring unit specifically includes:
An eighth obtaining subunit, configured to obtain the number of sentences in the sentence set that contain the candidate synonym;
A ninth obtaining subunit, configured to use the sentence number of the candidate synonym as the independent variable of the first function, so as to obtain a third dependent variable;
Wherein the first function is a function such that, when the difference between the sentence numbers corresponding to different candidate synonyms is less than or equal to a third threshold, the difference between the third dependent variables corresponding to those candidate synonyms is less than or equal to a fourth threshold;
A tenth obtaining subunit, configured to obtain the document frequency weight of the candidate synonym based on the third dependent variable and the total number of sentences contained in the sentence set.
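As a non-limiting illustration, the document frequency weight might be computed in the manner of an inverse document frequency. The sketch below assumes `log(1 + x)` as the first function and assumes the combination `log(1 + total) - log(1 + count)`; the text above only states that the weight is based on the third dependent variable and the total number of sentences. The function names and the whitespace tokenization are hypothetical.

```python
import math
from typing import List


def sentence_count(sentences: List[str], candidate: str) -> int:
    """Number of sentences in the sentence set that contain the candidate."""
    return sum(1 for sentence in sentences if candidate in sentence.split())


def document_frequency_weight(count: int, total: int) -> float:
    # Third dependent variable: the sentence number passed through the
    # assumed first function log(1 + x).
    third_dependent = math.log1p(count)
    # Assumed IDF-like combination: words that occur in almost every
    # sentence receive a weight close to zero.
    return math.log1p(total) - third_dependent


sentences = ["buy laptop", "cheap notebook sale", "notebook price list", "laptop price"]
count = sentence_count(sentences, "notebook")
weight = document_frequency_weight(count, len(sentences))
```

Under these assumptions a very common word such as a function word, which appears in nearly every hit result sentence, is down-weighted even if it co-occurs often with the query word.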
In an optional embodiment, the third obtaining module 63 includes:
An eighth acquiring unit, configured to obtain the semantic feature information corresponding to each of the at least one candidate synonym, and the semantic feature information of the query word;
A ninth acquiring unit, configured to, for any candidate synonym among the at least one candidate synonym, obtain similarity information between the candidate synonym and the query word based on the semantic feature information of the candidate synonym and the semantic feature information of the query word, so as to obtain the similarity information corresponding to each of the at least one candidate synonym;
A tenth acquiring unit, configured to obtain the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency and the similarity information corresponding to each of the at least one candidate synonym.
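As a non-limiting illustration, if the semantic feature information is taken to be a word vector, the similarity information might be the cosine similarity between vectors. Both the vector representation and the choice of cosine similarity are assumptions of this example, and the vectors shown are made-up values rather than the output of any real model.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no direction to compare against
    return dot / (norm_a * norm_b)


def candidate_similarities(
    query_vector: Sequence[float],
    candidate_vectors: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Similarity information for every candidate synonym."""
    return {
        word: cosine_similarity(query_vector, vector)
        for word, vector in candidate_vectors.items()
    }


# Hypothetical semantic feature vectors, for illustration only.
query_vector = [1.0, 0.0]
candidate_vectors = {"notebook": [2.0, 0.0], "price": [0.0, 3.0]}
similarities = candidate_similarities(query_vector, candidate_vectors)
```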
In an optional embodiment, the first obtaining module 61 includes:
An eleventh acquiring unit, configured to obtain the sentence set, the sentence set including at least one or more hit result sentences corresponding to one query statement, and/or one or more query statements corresponding to one hit result sentence;
A first determination unit, configured to determine a sentence pair to be processed from the sentence set;
A twelfth acquiring unit, configured to segment the query statement and the hit result sentence contained in the sentence pair to be processed, so as to obtain at least one keyword contained in the query statement of the sentence pair to be processed and at least one keyword corresponding to the hit result sentence of the sentence pair to be processed;
A second determination unit, configured to determine the query word from the at least one keyword contained in the query statement of the sentence pair to be processed;
A third determination unit, configured to take the at least one keyword corresponding to the hit result sentence of the sentence pair to be processed as the at least one candidate synonym.
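As a non-limiting illustration, the segmentation and determination units might be sketched as below. Splitting on whitespace stands in for a real word segmenter (which Chinese text would require), the stop-word list is hypothetical, and excluding the query word itself from the candidates is an assumption of this example.

```python
from typing import List, Tuple

# Hypothetical stop-word list used only for this illustration.
STOPWORDS = {"the", "a", "of", "for", "to", "in"}


def keywords(sentence: str) -> List[str]:
    """Segment a sentence and keep each non-stop-word once, in order."""
    seen = set()
    result = []
    for token in sentence.lower().split():
        if token not in STOPWORDS and token not in seen:
            seen.add(token)
            result.append(token)
    return result


def candidates_for(pair: Tuple[str, str], query_word: str) -> List[str]:
    """Candidate synonyms of query_word drawn from one sentence pair."""
    query, hit = pair
    # The query word must be one of the keywords of the query statement.
    if query_word not in keywords(query):
        return []
    # Keywords of the hit result sentence become candidate synonyms
    # (assumption: the query word itself is not its own candidate).
    return [word for word in keywords(hit) if word != query_word]


candidates = candidates_for(("price of laptop", "the notebook price list"), "laptop")
```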
Fig. 7 is a schematic diagram of an implementation of the electronic device provided by an embodiment of the present application. The electronic device includes:
A memory 71, configured to store a program;
A processor 72, configured to execute the program, the program being specifically used to:
Obtain at least one candidate synonym corresponding to a query word contained in a query statement, the candidate synonym being a word contained in the hit result sentence corresponding to the query statement;
Obtain the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym; the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in the sentence set, a first sentence pair being one whose query statement contains the query word and whose hit result sentence contains the candidate synonym;
Obtain the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The electronic device may further include a communication interface 73 and a communication bus 74, wherein the memory, the processor, and the communication interface communicate with one another via the communication bus.
Optionally, the communication interface may be an interface of a communication module, such as an interface of a GSM module.
An embodiment of the present invention further provides a readable storage medium storing a computer program which, when executed by a processor, implements the steps of the synonym acquisition method embodiments described above.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A synonym acquisition method, characterized by comprising:
Obtaining at least one candidate synonym corresponding to a query word contained in a query statement, the candidate synonym being a word contained in the hit result sentence corresponding to the query statement;
Obtaining the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym; wherein the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in a sentence set, a first sentence pair being one whose query statement contains the query word and whose hit result sentence contains the candidate synonym;
Obtaining the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym.
2. The synonym acquisition method according to claim 1, characterized in that obtaining the weighted co-occurrence frequency of any candidate synonym comprises:
Obtaining a first number of first sentence pairs in the sentence set, a first sentence pair being one whose query statement contains the query word and whose corresponding hit result sentence contains the candidate synonym;
Obtaining a second number of second sentences in the sentence set, a second sentence being a query statement or a hit result sentence that contains both the query word and the candidate synonym;
Obtaining the weighted co-occurrence frequency of the candidate synonym based on the first number and the second number.
3. The synonym acquisition method according to claim 2, characterized in that obtaining the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, comprises:
Obtaining the length characteristic information corresponding to each of the at least one candidate synonym;
Wherein the length characteristic information of a candidate synonym at least characterizes, for at least one first sentence pair corresponding to the candidate synonym, the length information of the query statement and the length information of the hit result sentence of that pair;
Obtaining the synonym of the query word from the at least one candidate synonym, at least based on the length characteristic information and the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym.
4. The synonym acquisition method according to claim 3, characterized in that obtaining the length characteristic information of any candidate synonym comprises:
Obtaining first length information based on the length information of the query statement and the length information of the hit result sentence of each of the first number of first sentence pairs corresponding to the candidate synonym;
Obtaining second length information based on the length information of each of the second number of second sentences corresponding to the candidate synonym;
Obtaining the length characteristic information of the candidate synonym based on the first length information and the second length information.
5. The synonym acquisition method according to claim 4, characterized in that obtaining the synonym of the query word from the at least one candidate synonym, at least based on the length characteristic information and the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, comprises:
For any candidate synonym among the at least one candidate synonym, obtaining a first parameter based on the first number and the first length information of the candidate synonym;
Obtaining a second parameter based on the second number and the second length information of the candidate synonym;
Using the first parameter and the second parameter each as the independent variable of a first function, so as to obtain a first dependent variable corresponding to the first parameter and a second dependent variable corresponding to the second parameter;
Wherein the first function is a function such that, when the difference between the parameters corresponding to different candidate synonyms is less than or equal to a first threshold, the difference between the dependent variables corresponding to those candidate synonyms is less than or equal to a second threshold; the parameter is the first parameter and the dependent variable is the first dependent variable, or the parameter is the second parameter and the dependent variable is the second dependent variable;
Obtaining the synonym of the query word from the at least one candidate synonym, at least based on the first dependent variable and the second dependent variable corresponding to each of the at least one candidate synonym.
6. The synonym acquisition method according to claim 1, characterized in that obtaining the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, comprises:
Obtaining the document frequency weight corresponding to each of the at least one candidate synonym;
Wherein the document frequency weight of a candidate synonym at least characterizes the number of sentences in the sentence set that contain the candidate synonym;
Obtaining the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency and the document frequency weight corresponding to each of the at least one candidate synonym.
7. The synonym acquisition method according to claim 6, wherein obtaining the document frequency weight corresponding to any candidate synonym comprises:
Obtaining the number of sentences in the sentence set that contain the candidate synonym;
Using the sentence number of the candidate synonym as the independent variable of a first function, so as to obtain a third dependent variable;
Wherein the first function is a function such that, when the difference between the sentence numbers corresponding to different candidate synonyms is less than or equal to a third threshold, the difference between the third dependent variables corresponding to those candidate synonyms is less than or equal to a fourth threshold;
Obtaining the document frequency weight of the candidate synonym based on the third dependent variable and the total number of sentences contained in the sentence set.
8. The synonym acquisition method according to claim 1, characterized in that obtaining the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, comprises:
Obtaining the semantic feature information corresponding to each of the at least one candidate synonym, and the semantic feature information of the query word;
For any candidate synonym among the at least one candidate synonym, obtaining similarity information between the candidate synonym and the query word based on the semantic feature information of the candidate synonym and the semantic feature information of the query word, so as to obtain the similarity information corresponding to each of the at least one candidate synonym;
Obtaining the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency and the similarity information corresponding to each of the at least one candidate synonym.
9. The synonym acquisition method according to any one of claims 1 to 8, characterized in that obtaining at least one candidate synonym corresponding to a query word contained in a query statement comprises:
Obtaining the sentence set, the sentence set including at least one or more hit result sentences corresponding to one query statement, and/or one or more query statements corresponding to one hit result sentence;
Determining a sentence pair to be processed from the sentence set;
Segmenting the query statement and the hit result sentence contained in the sentence pair to be processed, so as to obtain at least one keyword contained in the query statement of the sentence pair to be processed and at least one keyword corresponding to the hit result sentence of the sentence pair to be processed;
Determining the query word from the at least one keyword contained in the query statement of the sentence pair to be processed;
Determining the at least one keyword corresponding to the hit result sentence of the sentence pair to be processed as the at least one candidate synonym.
10. A synonym acquisition device, characterized by comprising:
A first obtaining module, configured to obtain at least one candidate synonym corresponding to a query word contained in a query statement, the candidate synonym being a word contained in the hit result sentence corresponding to the query statement;
A second obtaining module, configured to obtain the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym; wherein the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in a sentence set, a first sentence pair being one whose query statement contains the query word and whose hit result sentence contains the candidate synonym;
A third obtaining module, configured to obtain the synonym of the query word from the at least one candidate synonym, at least based on the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910160822.9A CN109918661B (en) | 2019-03-04 | 2019-03-04 | Synonym acquisition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910160822.9A CN109918661B (en) | 2019-03-04 | 2019-03-04 | Synonym acquisition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109918661A (en) | 2019-06-21 |
CN109918661B (en) | 2023-05-30 |
Family
ID=66963144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910160822.9A Active CN109918661B (en) | 2019-03-04 | 2019-03-04 | Synonym acquisition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109918661B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2970795A1 (en) * | 2011-01-25 | 2012-07-27 | Synomia | Method for filtering of synonyms in electronic document database in information system for searching information in e.g. Internet, involves performing reduction of number of synonyms of keyword based on score value of semantic proximity |
CN102622411A (en) * | 2012-02-17 | 2012-08-01 | 清华大学 | Structured abstract generating method |
CN102760134A (en) * | 2011-04-28 | 2012-10-31 | 北京百度网讯科技有限公司 | Method and device for mining synonyms |
WO2014002775A1 (en) * | 2012-06-25 | 2014-01-03 | 日本電気株式会社 | Synonym extraction system, method and recording medium |
CN103514269A (en) * | 2013-09-12 | 2014-01-15 | 百度在线网络技术(北京)有限公司 | Second query term determined to be related to first query term based on natural searching results |
JP2015032228A (en) * | 2013-08-05 | 2015-02-16 | Kddi株式会社 | Program, method, apparatus and server generating co-occurrence pattern for detecting near-synonym |
CN105279252A (en) * | 2015-10-12 | 2016-01-27 | 广州神马移动信息科技有限公司 | Related word mining method, search method and search system |
CN106933806A (en) * | 2017-03-15 | 2017-07-07 | 北京大数医达科技有限公司 | The determination method and apparatus of medical synonym |
CN107247745A (en) * | 2017-05-23 | 2017-10-13 | 华中师范大学 | A kind of information retrieval method and system based on pseudo-linear filter model |
CN107451212A (en) * | 2017-07-14 | 2017-12-08 | 北京京东尚科信息技术有限公司 | Synonymous method for digging and device based on relevant search |
CN108038096A (en) * | 2017-11-10 | 2018-05-15 | 平安科技(深圳)有限公司 | Knowledge database documents method for quickly retrieving, application server computer readable storage medium storing program for executing |
CN108509474A (en) * | 2017-09-15 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Search for the synonym extended method and device of information |
Non-Patent Citations (3)
Title |
---|
REKHA VAIDYANATHAN: "Query Expansion Strategy based on Pseudo Relevance Feedback and Term Weight Scheme for Monolingual Retrieval", 《ARXIV》 * |
王颖等: "基于专利搜索日志的同义词挖掘", 《计算机工程与设计》 * |
肖淋峰: "面向检索信息的同义词挖掘", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110413737A (en) * | 2019-07-29 | 2019-11-05 | 腾讯科技(深圳)有限公司 | A kind of determination method, apparatus, server and the readable storage medium storing program for executing of synonym |
CN110413737B (en) * | 2019-07-29 | 2022-10-14 | 腾讯科技(深圳)有限公司 | Synonym determination method, synonym determination device, server and readable storage medium |
CN110795565A (en) * | 2019-09-06 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Semantic recognition-based alias mining method, device, medium and electronic equipment |
CN110795565B (en) * | 2019-09-06 | 2023-10-27 | 腾讯科技(深圳)有限公司 | Alias mining method and device based on semantic recognition, medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109918661B (en) | 2023-05-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||