CN109918661A - Synonym acquisition methods and device - Google Patents

Publication number
CN109918661A
Authority
CN
China
Prior art keywords
synonym
sentence
candidate
candidate synonym
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910160822.9A
Other languages
Chinese (zh)
Other versions
CN109918661B (en)
Inventor
谭小龙
汤煌
张小鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910160822.9A
Publication of CN109918661A
Application granted
Publication of CN109918661B
Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application disclose a synonym acquisition method and device. The method relies on the idea that the more sentence pairs correspond to two keywords, the more likely the two keywords are synonyms, where a sentence pair corresponding to two keywords is one whose query statement contains one keyword and whose hit result sentence contains the other. At least one candidate synonym corresponding to a query word is obtained first. A weighted co-occurrence frequency is then obtained for each candidate synonym; the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs in a sentence set, a first sentence pair being one whose query statement contains the query word and whose hit result sentence contains the candidate synonym. Finally, the synonym of the query word is obtained from the at least one candidate synonym based at least on these weighted co-occurrence frequencies, which achieves the purpose of obtaining synonyms of the query word.

Description

Synonym acquisition method and device
Technical field
This application relates to the field of natural language processing, and more specifically to a synonym acquisition method and device.
Background
Human-computer interaction scenarios involve synonym mining. For example, in intelligent search, retrieving with both the keywords contained in the query statement input by the user and the synonyms mined for those keywords yields more comprehensive search results.
In summary, synonym mining is of great significance for human-computer interaction scenarios.
Summary of the invention
In view of this, the embodiments of the present application provide a synonym acquisition method and device for obtaining synonyms.
To achieve the above object, the present application provides the following technical solutions:
A synonym acquisition method, comprising:
Obtaining at least one candidate synonym corresponding to a query word contained in a query statement, where each candidate synonym is a word contained in the hit result sentence corresponding to the query statement;
Obtaining a weighted co-occurrence frequency for each of the at least one candidate synonym, where the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs in a sentence set, a first sentence pair being one whose query statement contains the query word and whose hit result sentence contains the candidate synonym;
Obtaining the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies of the at least one candidate synonym.
A synonym acquisition device, comprising:
A first obtaining module, configured to obtain at least one candidate synonym corresponding to a query word contained in a query statement, where each candidate synonym is a word contained in the hit result sentence corresponding to the query statement;
A second obtaining module, configured to obtain a weighted co-occurrence frequency for each of the at least one candidate synonym, where the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs in a sentence set, a first sentence pair being one whose query statement contains the query word and whose hit result sentence contains the candidate synonym;
A third obtaining module, configured to obtain the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies of the at least one candidate synonym.
As can be seen from the above technical solutions, compared with the prior art, the embodiments of the present application disclose a synonym acquisition method based on the idea that the more sentence pairs correspond to two keywords, the more likely the two keywords are synonyms. A sentence pair corresponding to two keywords is one whose query statement contains one keyword and whose hit result sentence contains the other. At least one candidate synonym corresponding to the query word is obtained first, and a weighted co-occurrence frequency is obtained for each candidate synonym. The weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs in the sentence set, a first sentence pair being one whose query statement contains the query word and whose hit result sentence contains the candidate synonym. The synonym of the query word is then obtained from the at least one candidate synonym based at least on these weighted co-occurrence frequencies, which achieves the purpose of obtaining synonyms of the query word.
Description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1a is a schematic diagram of an implementation interface of an intelligent robot system, intelligent question answering system, man-machine chat system or intelligent customer service system provided by an embodiment of the present application;
Figs. 1b to 1c are schematic diagrams of an implementation interface of an intelligent search system provided by an embodiment of the present application;
Figs. 1d to 1e are diagrams of an implementation interface of a machine writing system provided by an embodiment of the present application;
Fig. 2 is a structure diagram of an implementation of a synonym acquisition system provided by an embodiment of the present application;
Fig. 3 is a flow chart of an implementation of a synonym acquisition method provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of an implementation of a sentence set provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of an implementation, provided by an embodiment of the present application, in which a first function is used to suppress multiplicative growth of the dependent variable corresponding to the first parameter or the second parameter;
Fig. 6 is a schematic diagram of an implementation of a synonym acquisition device provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of an implementation of an electronic device provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The synonym acquisition method provided by the embodiments of the present application can be applied to human-computer interaction scenarios, which include but are not limited to: intelligent robot systems, man-machine chat systems, intelligent question answering systems, intelligent customer service systems, intelligent search systems and machine writing systems.
Intelligent robot systems, intelligent customer service systems, man-machine chat systems and intelligent question answering systems use an intelligent machine in place of human customer service, that is, they implement human-machine dialogue. The user can input voice or text, and the intelligent machine obtains a corresponding feedback result based on the keywords contained in the voice or text and the synonyms of those keywords. The feedback result can be voice or text.
Optionally, these systems can use the synonyms of the keywords to quickly understand the user's true intention and thus give an accurate feedback result.
It can be understood that a feedback result may contain multiple result sentences, from which the user can select the one they need. In the embodiments of the present application, the voice or text input by the user is called the query statement, and the result sentence that the user selects from the one or more result sentences is called the hit result sentence.
It can be understood that a feedback result may contain only one result sentence. For example, when an intelligent robot system, man-machine chat system or intelligent question answering system replies to a user's question, it may directly feed back a single result sentence; that result sentence is then the hit result sentence in the embodiments of the present application.
Fig. 1a shows a schematic diagram of an implementation interface of an intelligent robot system, intelligent question answering system, man-machine chat system or intelligent customer service system provided by an embodiment of the present application.
As shown in Fig. 1a, the user inputs the voice or text "where is the site of Construction Bank?". Optionally, the system can use the synonym acquisition method provided by the embodiments of the present application to obtain, in real time, the synonyms of each keyword in the query statement "where is the site of Construction Bank?", for example the synonyms of "Construction Bank", the synonyms of "site" and the synonyms of "where", so that it can quickly understand the user's true intention.
The query word referred to in the embodiments of the present application is a keyword contained in the query statement.
Optionally, the intelligent robot system, intelligent question answering system, man-machine chat system or intelligent customer service system may include a synonym database that stores the synonyms of each keyword obtained with the synonym acquisition method provided by the embodiments of the present application. The system can then obtain the synonyms of "Construction Bank", "site" and "where" directly from the synonym database.
After understanding the user's true intention, the system replies with the feedback result it considers most accurate, for example "the nearest address of Construction Bank is located at XX". In the embodiments of the present application, "where is the site of Construction Bank?" may then be called the query statement, and "the nearest address of Construction Bank is located at XX" may be called the hit result sentence.
Optionally, the intelligent search system may include an information retrieval system. The intelligent search system may include a web client or an application client in which the user can input voice or text. Based on the keywords contained in that voice or text and the synonyms of those keywords, the intelligent search system obtains a corresponding feedback result for the user to view. The feedback result can be text or voice.
It can be understood that a feedback result may contain one or more result sentences, from which the user can select the one they need. In the embodiments of the present application, the voice or text input by the user is called the query statement, and the result sentence that the user selects from the one or more result sentences is called the hit result sentence.
Figs. 1b to 1c show schematic diagrams of an implementation interface of an intelligent search system provided by an embodiment of the present application.
The intelligent search system may include a web client or an application client. Assume the user inputs the query statement "how much is the loan interest of Construction Bank" in the web client or application client, as shown in Fig. 1b.
After the user clicks search button 11, the intelligent search system obtains the query statement "how much is the loan interest of Construction Bank" and segments it to obtain the keywords "Construction Bank", "loan" and "interest".
Optionally, the intelligent search system can use the synonym acquisition method provided by the embodiments of the present application to obtain, in real time, the synonyms of "Construction Bank", "loan" and "interest", and store the obtained synonyms in the synonym database.
Optionally, the intelligent search system may include a synonym database that stores the synonyms of each keyword obtained with the synonym acquisition method provided by the embodiments of the present application, and the intelligent search system can obtain the synonyms of "Construction Bank", "loan" and "interest" from it. Assume the synonyms of "Construction Bank" include "Construction Bank" and "China Construction Bank"; the synonyms of "loan" include "borrowing"; and the synonyms of "interest" include "interest", "interest money" and "interest rate".
The intelligent search system can build a retrieval expression from "Construction Bank", "loan", "interest" and their synonyms, run the retrieval with that expression to obtain search results, and feed the results back to the web client or application client.
Optionally, the retrieval expression built from "Construction Bank", "loan", "interest" and their synonyms can be: (Construction Bank or China Construction Bank) and (loan or borrowing) and (interest or interest money or interest rate).
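The construction of such a retrieval expression can be sketched as follows. This is only a minimal illustration, not the implementation described in the application: the function name `build_retrieval_expression`, the dictionary layout and the English renderings of the example terms are assumptions made for the sketch.

```python
def build_retrieval_expression(synonyms_by_keyword):
    """Join each keyword with its synonyms by "or", then join the groups by "and".

    synonyms_by_keyword is a hypothetical mapping {keyword: [synonym, ...]}.
    """
    groups = []
    for keyword, synonyms in synonyms_by_keyword.items():
        # Keep the keyword first and drop repeated terms while preserving order.
        terms = list(dict.fromkeys([keyword, *synonyms]))
        groups.append("(" + " or ".join(terms) + ")")
    return " and ".join(groups)


expression = build_retrieval_expression({
    "Construction Bank": ["Construction Bank", "China Construction Bank"],
    "loan": ["borrowing"],
    "interest": ["interest", "interest money", "interest rate"],
})
print(expression)
```

Each keyword contributes one parenthesized "or" group, so a document must match at least one term from every group to be retrieved.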
The web client or application client can display the search results as shown in Fig. 1c, which uses four result sentences as an example. If the user selects "What are the 2018 Construction Bank loan interest rates? What are the Construction Bank loan conditions? _Rong360", then that sentence is the hit result sentence.
A machine writing system is one in which an intelligent robot system automatically writes a draft from automatically captured information according to a preset structure. After the draft is completed, the machine writing system can use the synonyms of the keywords contained in the original draft to rewrite its text by replacing those keywords.
Figs. 1d to 1e show diagrams of an implementation interface of a machine writing system provided by an embodiment of the present application.
Fig. 1d shows the original draft completed by the machine writing system; Fig. 1e shows the draft after modification by the machine writing system.
To highlight the changes, the changed parts in Fig. 1e are marked in bold and underlined; for example, "in midair" is changed to "in the air". In the embodiments of the present application, the original sentence "Large White falls in midair" may be called the query statement, and the changed sentence "Large White falls in the air" may be called the hit result sentence.
The synonym acquisition method provided by the embodiments of the present application is described below with reference to the above application scenarios.
It can be understood that the intelligent robot system, man-machine chat system, intelligent question answering system, intelligent customer service system, intelligent search system or machine writing system may include a terminal device and a server; alternatively, the man-machine chat system, intelligent question answering system, intelligent customer service system, intelligent search system or machine writing system may include only a terminal device.
The terminal device can be an electronic device such as a desktop computer, a mobile terminal (for example a smartphone), an iPad or an intelligent robot.
The server can be a single server, a server cluster composed of several servers, or a cloud computing service center.
Fig. 2 shows a structure diagram of an implementation of the synonym acquisition system provided by an embodiment of the present application. The synonym acquisition system can be an intelligent robot system, man-machine chat system, intelligent question answering system, intelligent customer service system, intelligent search system or machine writing system.
The synonym acquisition system includes a terminal device 21 and a server 22.
Terminal device 21 is configured to obtain the query statement.
Optionally, the query statement can be text or voice input by the user; optionally, the query statement can be an original sentence in an original draft.
Optionally, terminal device 21 can obtain at least one query word contained in the query statement.
A query word in the embodiments of the present application is any word in the query statement.
Taking Fig. 1b as an example again, the query statement is "how much is the loan interest of Construction Bank", and the query words in the query statement include "Construction Bank", "loan" and "interest".
Optionally, server 22 is configured to obtain at least one query word contained in the query statement.
In an optional embodiment, server 22 is also configured to obtain the synonyms of the at least one query word in real time, using the synonym acquisition method provided by the embodiments of the present application.
In another optional embodiment, server 22 is also configured to obtain the synonyms of the at least one query word from synonym database 23.
Synonym database 23 stores the synonyms of each query word obtained with the synonym acquisition method provided by the embodiments of the present application.
In an optional embodiment, synonym database 23 may belong to server 22; in another optional embodiment, synonym database 23 may be independent of server 22.
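The two ways of obtaining synonyms described above (reading them from the synonym database, or mining them in real time) can be combined in a simple lookup. The sketch below is illustrative only: the in-memory dictionary standing in for synonym database 23, the function names and the stand-in mining function are all assumptions, not part of the application.

```python
def get_synonyms(query_word, synonym_database, mine_synonyms):
    """Return stored synonyms when available; otherwise mine and store them.

    synonym_database: hypothetical dict standing in for synonym database 23.
    mine_synonyms: hypothetical callable standing in for real-time acquisition.
    """
    if query_word in synonym_database:
        return synonym_database[query_word]
    synonyms = mine_synonyms(query_word)
    synonym_database[query_word] = synonyms  # keep the result for later lookups
    return synonyms


def mine_in_real_time(word):
    # Stand-in for the synonym acquisition method; returns fixed toy data.
    return ["interest rate"] if word == "interest" else []


database = {"loan": ["borrowing"]}
loan_synonyms = get_synonyms("loan", database, mine_in_real_time)
interest_synonyms = get_synonyms("interest", database, mine_in_real_time)
print(loan_synonyms, interest_synonyms)
```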
Server 22 is also configured to obtain a feedback result based on the at least one query word and the synonyms of the at least one query word.
In an optional embodiment, server 22 identifies the user's true intention based on the at least one query word and its synonyms, and thereby obtains a more accurate result.
In another optional embodiment, server 22 obtains a more comprehensive feedback result based on the at least one query word and its synonyms.
Optionally, again taking Fig. 1b as an example, server 22 can build a retrieval expression from "Construction Bank", "loan", "interest" and their synonyms, run the retrieval with that expression to obtain a feedback result, and feed the result back to terminal device 21.
Optionally, the retrieval expression built from "Construction Bank", "loan", "interest" and their synonyms can be: (Construction Bank or China Construction Bank) and (loan or borrowing) and (interest or interest money or interest rate).
Terminal device 21 is also configured to play or display the feedback result.
The synonym acquisition method is described below with reference to the synonym acquisition system shown in Fig. 2. Fig. 3 shows a flow chart of an implementation of the synonym acquisition method provided by an embodiment of the present application. The method includes:
Step S301: obtain at least one candidate synonym corresponding to a query word contained in a query statement, where each candidate synonym is a word contained in the hit result sentence corresponding to the query statement.
Assume the query statement is "how much is the loan interest of Construction Bank" and its corresponding hit result sentence is "What are the 2018 Construction Bank loan interest rates? What are the Construction Bank loan conditions? _Rong360".
Assume that segmenting "how much is the loan interest of Construction Bank" yields the keywords Construction Bank, loan and interest, and that segmenting "What are the 2018 Construction Bank loan interest rates? What are the Construction Bank loan conditions? _Rong360" yields the keywords Construction Bank, loan, interest rate, condition and which.
A query word can then be determined from the at least one keyword contained in the query statement. For example, "Construction Bank" is determined to be the query word.
It can be understood that any keyword contained in the hit result sentence may be a synonym of "Construction Bank". Therefore, optionally, all keywords contained in the hit result sentence are used as candidate synonyms.
The flow shown in Fig. 3 obtains the synonyms of a single query word. To obtain the synonyms of multiple query words, the steps shown in Fig. 3 can be executed multiple times, or executed in parallel.
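Step S301 can be illustrated with a short sketch. It assumes that word segmentation has already been done and that keywords are given as lists; the function name `candidate_synonyms` and the English renderings of the example keywords are assumptions for illustration only.

```python
def candidate_synonyms(hit_result_keywords):
    """Use every keyword of the hit result sentence as a candidate synonym.

    Duplicates are dropped while the original keyword order is preserved.
    """
    return list(dict.fromkeys(hit_result_keywords))


# Keywords assumed to come from segmenting the query statement and its
# hit result sentence (segmentation itself is outside this sketch).
query_keywords = ["Construction Bank", "loan", "interest"]
hit_result_keywords = ["Construction Bank", "loan", "interest rate", "condition", "which"]

query_word = query_keywords[0]  # "Construction Bank" chosen as the query word
candidates = candidate_synonyms(hit_result_keywords)
print(query_word, candidates)
```

To handle several query words, the same call can simply be repeated per query word, sequentially or in parallel.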
Step S302: obtain a weighted co-occurrence frequency for each of the at least one candidate synonym, where the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs in a sentence set, a first sentence pair being one whose query statement contains the query word and whose hit result sentence contains the candidate synonym.
The sentence set is described below.
Fig. 4 shows a schematic diagram of an implementation of the sentence set provided by an embodiment of the present application.
In the embodiments of the present application, a query statement is denoted "query" and a hit result sentence is denoted "title".
Assume the query statements corresponding to the same title are query1, query2, query3, ..., query n; the sentence set may then include set 41, where n is a positive integer greater than or equal to 1.
Assume the hit result sentences corresponding to the same query are title1, title2, title3, ..., title m; the sentence set may then include set 42, where m is a positive integer greater than or equal to 1.
Set 41 and set 42 shown in Fig. 4 may or may not intersect.
Optionally, the sentence set may include set 41; or the sentence set includes set 42; or the sentence set includes both set 41 and set 42. Fig. 4 is only an example.
In an optional embodiment, if the sentence set includes set 41 and set 42, set 41 and set 42 can have at least one of the following associations:
The query in set 42 belongs to the queries in set 41; for example, the query in set 42 includes query1, and/or query2, and/or query3, etc. of set 41.
And/or
The title in set 41 belongs to the titles in set 42; for example, the title in set 41 includes title1, and/or title2, and/or title3, etc. of set 42.
If the queries in set 42 include query1, query2 and query3 of set 41, then set 42 includes the one or more hit result sentences corresponding to query1, the one or more hit result sentences corresponding to query2, and the one or more hit result sentences corresponding to query3.
If the titles in set 41 include title1, title2 and title3 of set 42, then set 41 includes the one or more query statements corresponding to title1, the one or more query statements corresponding to title2, and the one or more query statements corresponding to title3.
In an optional embodiment, if the sentence set is set 41, the sentence set optionally contains only one title, or optionally contains multiple different titles. In an optional embodiment, if the sentence set is set 42, the sentence set optionally contains only one query, or optionally contains multiple different queries.
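One possible way to hold such a sentence set in memory is as a list of (query, title) pairs. The sketch below is an assumption about representation only; the application does not prescribe a data structure, and the helper names are hypothetical.

```python
# Set 41: several query statements that share the same hit result sentence.
set_41 = [("query1", "title"), ("query2", "title"), ("query3", "title")]

# Set 42: one query statement with several hit result sentences.
set_42 = [("query", "title1"), ("query", "title2")]

# The sentence set may be set 41, set 42, or both together.
sentence_set = set_41 + set_42


def queries_for_title(pairs, title):
    """Return the query statements paired with the given hit result sentence."""
    return [q for q, t in pairs if t == title]


def titles_for_query(pairs, query):
    """Return the hit result sentences paired with the given query statement."""
    return [t for q, t in pairs if q == query]


print(queries_for_title(sentence_set, "title"))
print(titles_for_query(sentence_set, "query"))
```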
In an optional embodiment, the statement that the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs in the sentence set covers either of the following cases:
First case: the weighted co-occurrence frequency of a candidate synonym = the number of first sentence pairs in the sentence set, that is, pairs whose query statement contains the query word and whose corresponding hit result sentence contains the candidate synonym.
Assume the query word is denoted w1 and one of the at least one candidate synonym is denoted w2. In an optional embodiment, the weighted co-occurrence frequency of candidate synonym w2 = the number of first sentence pairs corresponding to w1 and w2 in the sentence set.
A first sentence pair corresponding to w1 and w2 is one whose query statement contains the query word and whose hit result sentence contains candidate synonym w2. Optionally, a first sentence pair corresponding to w1 and w2 may also be called a first sentence pair corresponding to candidate synonym w2.
Second case: this case uses the idea that the larger the number of first sentence pairs corresponding to w1 and w2, the more likely w1 and w2 are synonyms, and the larger the number of second sentences corresponding to w1 and w2, the less likely w1 and w2 are synonyms.
A second sentence corresponding to w1 and w2 is a sentence that contains both the query word and candidate synonym w2. Optionally, a second sentence corresponding to w1 and w2 may also be called a second sentence corresponding to candidate synonym w2.
In an optional embodiment, obtaining the weighted co-occurrence frequency of any candidate synonym w2 includes:
Obtaining a first number, which is the number of first sentence pairs in the sentence set whose query statement contains the query word and whose corresponding hit result sentence contains candidate synonym w2;
Obtaining a second number, which is the number of second sentences in the sentence set that contain both the query word and candidate synonym w2, a second sentence being a query statement or a hit result sentence;
Obtaining the weighted co-occurrence frequency of candidate synonym w2 based on the first number and the second number.
In an optional embodiment, the weighted co-occurrence frequency of candidate synonym w2 is positively correlated with the first number and negatively correlated with the second number.
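The correlation just described can be illustrated with a toy scoring function. The ratio used below is purely an assumption chosen for the sketch because it increases with the first number and decreases with the second number; it is not the formula of the application, and the function name is hypothetical.

```python
def weighted_cooccurrence(first_number, second_number):
    """Illustrative score only (NOT the application's formula).

    Grows with first_number and shrinks as second_number grows; the +1
    avoids division by zero when the two words never share a sentence.
    """
    return first_number / (1.0 + second_number)


# More first sentence pairs -> higher score.
print(weighted_cooccurrence(4, 1) > weighted_cooccurrence(2, 1))
# More second sentences -> lower score.
print(weighted_cooccurrence(2, 1) > weighted_cooccurrence(2, 5))
```

Any other function with the same monotonic behaviour would serve equally well for this illustration.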
The first number is the number of first sentence pairs corresponding to the query word and candidate synonym w2. The second number is the number of second sentences in the sentence set in which the query word and candidate synonym w2 both appear, a second sentence being a query statement or a hit result sentence.
In an optional embodiment, the weighted co-occurrence frequency of a candidate synonym w2 may be calculated as follows: [formula not reproduced in the source text]
In an optional embodiment, if the sentence set is set 41, the second sentence is a query statement. In another optional embodiment, if the sentence set is set 42, the second sentence is a hit result sentence. In yet another optional embodiment, if the sentence set includes set 41 and set 42, the second sentence can be a query statement or a hit result sentence.
The first number and the second number are described below, taking a sentence set that includes set 41 as an example.
Assume the sentence pairs in the sentence set are: (query1, title), (query2, title), (query3, title), ..., (query n, title).
The first number of a candidate synonym w2 is the number of first sentence pairs in the sentence set, a first sentence pair being one whose query statement contains the query word and whose hit result sentence contains candidate synonym w2.
Assume that in (query2, title), query2 contains the query word and title contains candidate synonym w2, and that in (query n, title), query n contains the query word and title contains candidate synonym w2. The first number of the candidate synonym is then 2.
The second number of a candidate synonym w2 is the number of query statements in the sentence set in which both the query word and candidate synonym w2 appear. Alternatively, the second number of a candidate synonym w2 is the sum of the number of query statements in the sentence set in which both the query word and candidate synonym w2 appear and the number of hit result sentences in the sentence set in which both appear.
For example, if the query word and candidate synonym w2 both appear in query2, query4, query8 and query n, the second number of candidate synonym w2 can be 4.
For example, if the query word and candidate synonym w2 also both appear in title, the second number of candidate synonym w2 is optionally 4 + 1 = 5.
Optionally, if sentence set includes set 42, the second number of a candidate synonym refers to query word and the time Synonym is selected to appear in the number of hit results sentence in sentence set;Alternatively, the second number of a candidate synonym refers to Query word and the candidate synonym appear in the number of hit results sentence and query word and the candidate synonym in sentence set Appear in the sum of the number of query statement in sentence set.
Taking set 42 shown in Fig. 4 as an example, the sentence pairs in set 42 are as follows:
(query, title1), (query, title2), (query, title3), (query, title4) ..., (query, title m).
Assume that the hit results sentences in the sentence set in which query word w1 and candidate synonym w2 both appear are title1, title2, title3, title4 and title5; then the second number of candidate synonym w2 may be 5.
If the query word and candidate synonym w2 also appear together in query, the second number of candidate synonym w2 may optionally be 5+1=6.
Optionally, if the sentence set includes set 41 and set 42, the second number of a candidate synonym is number 1, the number of query statements in the sentence set in which the query word and the candidate synonym both appear; or it is number 2, the number of hit results sentences in the sentence set in which the query word and the candidate synonym both appear; or it is the sum of number 1 and number 2.
Optionally, if the sentence set includes set 41 and set 42, the sentence set may include multiple different hit results sentences and multiple different query statements.
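The counting of the first number and the second number described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes sentences are already segmented into token lists, and the function names and toy tokens ("ccb", "jh") are hypothetical.

```python
def first_number(pairs, w1, w2):
    """Count sentence pairs whose query contains w1 and whose title contains w2."""
    return sum(1 for query, title in pairs if w1 in query and w2 in title)


def second_number(sentences, w1, w2):
    """Count sentences (queries or titles) that contain both w1 and w2."""
    return sum(1 for sentence in sentences if w1 in sentence and w2 in sentence)


# Toy, pre-segmented data: each sentence is a list of tokens (hypothetical).
pairs = [
    (["ccb", "loan"], ["jh", "loan", "rate"]),
    (["ccb", "jh", "card"], ["card", "apply"]),
    (["loan", "rate"], ["jh", "loan"]),
    (["ccb", "rate"], ["jh", "rate"]),
]
queries = [query for query, _ in pairs]
titles = [title for _, title in pairs]

n_first = first_number(pairs, "ccb", "jh")
# Variant 1: count only query statements that contain both words.
n_second_queries = second_number(queries, "ccb", "jh")
# Variant 2: add the hit results sentences that contain both words.
n_second_all = n_second_queries + second_number(titles, "ccb", "jh")
```

The two variants of the second number correspond to the alternatives in the text: counting only one kind of sentence, or summing the counts over query statements and hit results sentences.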
Step S303: obtain the synonym of the query word from the at least one candidate synonym, at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym.
In an alternative embodiment, step S303 may comprise the following steps:
Step 1: obtain the synonym weight corresponding to each of the at least one candidate synonym, at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym.
Step 2: obtain the synonym of the query word from the at least one candidate synonym, based on the synonym weight corresponding to each of the at least one candidate synonym.
The synonym acquisition method provided by the embodiments of the present application relies on the idea that the larger the number of sentence pairs corresponding to two keywords, the greater the probability that the two keywords are synonyms. A sentence pair corresponding to two keywords is one whose query statement contains one keyword and whose hit results sentence contains the other. The method first obtains at least one candidate synonym corresponding to the query word, and then obtains the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym. The weighting co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs in the sentence set, where the query statement of a first sentence pair contains the query word and the hit results sentence of that pair contains the candidate synonym. The synonym of the query word is then obtained from the at least one candidate synonym, at least based on these weighting co-occurrence frequencies, thereby achieving the purpose of obtaining the synonym of the query word.
With reference to the synonym acquisition method shown in Fig. 3, the process of obtaining the synonym candidate set is described below. The method for obtaining the synonym candidate set includes:
Step 1: obtain the sentence set.
For an explanation of the sentence set, refer to the description of Fig. 4; it is not repeated here.
Step 2: determine the sentence pair to be processed from the sentence set.
Taking a sentence set that includes set 41 as an example, assume the sentence pair to be processed is (query1, title), where query1 is "How much is the loan interest of Construction Bank" and title is "What is the Construction Bank loan interest rate in 2018? What are the Construction Bank loan conditions? _melt 360".
Step 3: segment the query statement and the hit results sentence of the sentence pair to be processed, to obtain at least one keyword contained in the query statement of the pair and at least one keyword contained in the hit results sentence of the pair.
query1 and title may each be segmented, optionally using the HanLP word segmentation tool. Other word segmentation tools may also be used; the embodiments of the present application are not limited to the HanLP word segmentation tool.
Optionally, a user may add user-defined keywords to the HanLP word segmentation tool, and the tool then treats each user-defined keyword as a single word during segmentation. For example, if the user defines "I like" as a keyword, the tool keeps "I" and "like" together as one word; if the user has not entered the custom word "I like", the tool splits "I like" into two words.
Assume the results of segmenting query1 and title are as follows:
The word segmentation result of query1 includes: Construction Bank, loan, interest, more, few; the word segmentation result of title includes: 2018, Construction Bank, loan, interest rate, more, less, condition, have, which, melt, 360.
In an alternative embodiment, stop words in the word segmentation results may be removed. Optionally, the stop words may be preset; for example, modal particles, prepositions and/or numerals may be used as stop words.
Optionally, removing the stop words from the word segmentation result of query1 and from the word segmentation result of title gives the following result:
The processed word segmentation result of query1 includes: Construction Bank, loan, interest; the processed word segmentation result of title includes: Construction Bank, loan, interest rate, condition, which.
Step 4: determine the query word from the at least one keyword contained in the query statement of the sentence pair to be processed.
Step 5: take the at least one keyword contained in the hit results sentence of the sentence pair to be processed as elements of the synonym candidate set.
Assuming "Construction Bank" in query1 is taken as the query word, the processed word segmentation results of title (Construction Bank, loan, interest rate, condition, which) are the elements of the synonym candidate set.
Optionally, synonym candidate collection may include following candidate synonym pair:
(Construction Bank, Construction Bank), (Construction Bank, loan), (Construction Bank, interest rate), (Construction Bank, condition), (Construction Bank, which).
In summary, Construction Bank, loan, interest rate, condition and which may each be a synonym of Construction Bank.
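Steps 3 to 5 above can be sketched as follows. This is a minimal illustration under stated assumptions: segmentation is taken as already done (for example by HanLP or any other tool), the token lists are the English renderings used in the example above, and the stop-word set and function names are hypothetical choices made so that the output matches that example.

```python
# Hypothetical stop-word list chosen to reproduce the example in the text.
STOP_WORDS = {"2018", "more", "few", "less", "have", "melt", "360"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop preset stop words from a segmented sentence."""
    return [token for token in tokens if token not in stop_words]


def candidate_pairs(query_tokens, title_tokens, query_word):
    """Pair the query word with every distinct keyword of the hit results sentence."""
    if query_word not in query_tokens:
        return []
    seen = []
    for token in title_tokens:
        if token not in seen:  # keep first occurrence, preserve order
            seen.append(token)
    return [(query_word, token) for token in seen]


# Segmentation results from the example, before stop-word removal.
query1 = ["Construction Bank", "loan", "interest", "more", "few"]
title = ["2018", "Construction Bank", "loan", "interest rate", "more", "less",
         "condition", "have", "which", "melt", "360"]

query_keywords = remove_stop_words(query1)
title_keywords = remove_stop_words(title)
candidates = candidate_pairs(query_keywords, title_keywords, "Construction Bank")
```

Note that at this stage the query word itself and unrelated words such as "which" are still candidates; the weights described in the following paragraphs are what separate real synonyms from the rest.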
It is understood that mining synonyms requires not only comprehensive coverage but also accuracy; otherwise the result obtained will be inaccurate.
To make the mined synonyms of the query word more accurate, the following idea is used: the longer the query statement and the hit results sentence of a first sentence pair corresponding to query word w1 and candidate synonym w2, the lower the probability that w1 and w2 are synonyms.
In any of the above synonym acquisition method embodiments, step S303 includes:
Step 1: obtain the length characteristic information corresponding to each of the at least one candidate synonym.
The length characteristic information of a candidate synonym at least characterizes the length information of the query statement and the length information of the hit results sentence of at least one first sentence pair corresponding to the candidate synonym.
In an alternative embodiment, the length characteristic information of a candidate synonym w2 at least characterizing the length information of the query statement and the length information of the hit results sentence in at least one first sentence pair corresponding to candidate synonym w2 includes either of the following cases:
First case: the length characteristic information of candidate synonym w2 is negatively correlated with the length information of the query statement and the length information of the hit results sentence in at least one first sentence pair corresponding to candidate synonym w2.
Negative correlation means that the length characteristic information of candidate synonym w2 decreases (increases) as the length information of the query statement and of the hit results sentence in the corresponding first sentence pair increases (decreases).
Second case: two ideas are combined. The longer the query statement and the hit results sentence of a first sentence pair corresponding to query word w1 and candidate synonym w2, the lower the probability that w1 and w2 are synonyms; and the longer a second sentence corresponding to query word w1 and candidate synonym w2, the higher the probability that w1 and w2 are synonyms.
Here, a second sentence corresponding to query word w1 and candidate synonym w2 is a sentence that contains both query word w1 and candidate synonym w2.
In an alternative embodiment, obtaining the length characteristic information of any candidate synonym w2 includes:
obtaining first length information based on the length information of the query statement and the length information of the hit results sentence of each of the first number of first sentence pairs corresponding to candidate synonym w2;
obtaining second length information based on the length information of each of the second number of second sentences corresponding to candidate synonym w2;
obtaining the length characteristic information of candidate synonym w2 based on the first length information and the second length information.
In an alternative embodiment, the length characteristic information of candidate synonym w2 is negatively correlated with the first length information and positively correlated with the second length information.
Negative correlation means that the length characteristic information of candidate synonym w2 decreases (increases) as the first length information increases (decreases). Positive correlation means that the length characteristic information of candidate synonym w2 increases (decreases) as the second length information increases (decreases).
Assume that the first sentence pairs in the sentence set corresponding to query word w1 and candidate synonym w2 are (query1, title), (query2, title) and (query3, title).
Assume that (query1, title) is ("How much is the loan interest of Construction Bank", "What is the Construction Bank loan interest rate in 2018? What are the Construction Bank loan conditions? _melt 360"); then the character length of query1 is len(query1)=9 and the character length of title is len(title)=33.
Assume that the second sentences in the sentence set corresponding to query word w1 and candidate synonym w2 are query4 and query5, with len(query4)=11 and len(query5)=30.
Step 2: obtain the synonym of the query word from the at least one candidate synonym, at least based on the length characteristic information and the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym.
In an alternative embodiment, a first weight may be obtained at least based on the length characteristic information and the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym.
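Optionally, the calculation formula of the first weight of a candidate synonym w2 may be as follows:
$$\text{first weight}(w1,w2)=\frac{\sum_{(w1,w2)\in(query,title)}\dfrac{I(w1,w2,query,title)}{len(query)\cdot len(title)}}{\max\left(\sum_{(w1,w2)\in s}\dfrac{I(w1,w2,s)}{len(s)\cdot len(s)},\ beta\right)}$$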
Here, w1 denotes the query word, w2 is any candidate synonym, and (query, title) denotes a sentence pair in the sentence set. I(w1, w2, query, title) indicates whether w1 appears in the query of a sentence pair and w2 appears in the title of that pair: its value is 1 if w1 appears in the query and w2 appears in the title of the sentence pair, and 0 otherwise.
s denotes a query statement or hit results sentence in the sentence set. I(w1, w2, s) indicates whether w1 and w2 appear in the same sentence s: its value is 1 if they do, and 0 otherwise.
beta is a bias term that prevents the denominator from being too small; optionally, beta may take the value 8.0 in the embodiments of the present application.
In an alternative embodiment, the synonym weight of candidate synonym w2 can be the first weight.
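A length-normalized first weight built from the indicator functions and beta defined above can be sketched as follows. This is an illustrative reading, not a definitive implementation: it assumes the numerator sums 1/(len(query)*len(title)) over first sentence pairs and the denominator is the larger of beta and the sum of 1/(len(s)*len(s)) over second sentences. Sentences are plain strings, lengths are character lengths, containment is tested by substring, and all names and toy strings are hypothetical.

```python
def first_parameter(pairs, w1, w2):
    """Sum 1 / (len(query) * len(title)) over pairs with w1 in query and w2 in title."""
    return sum(
        1.0 / (len(query) * len(title))
        for query, title in pairs
        if w1 in query and w2 in title
    )


def second_parameter(sentences, w1, w2, beta=8.0):
    """Sum 1 / len(s)^2 over sentences containing both words, floored at beta."""
    total = sum(
        1.0 / (len(s) * len(s))
        for s in sentences
        if w1 in s and w2 in s
    )
    return max(total, beta)


def first_weight(pairs, sentences, w1, w2, beta=8.0):
    """Ratio of the length-normalized pair count to the floored co-occurrence count."""
    return first_parameter(pairs, w1, w2) / second_parameter(sentences, w1, w2, beta)


# Toy data (hypothetical): "ccb" is the query word, "jh" the candidate synonym.
pairs = [("ccb loan", "jh loan rate"), ("ccb jh card", "card apply")]
queries = [query for query, _ in pairs]
weight = first_weight(pairs, queries, "ccb", "jh")
```

With the suggested beta of 8.0 and only a handful of short sentences, the floor dominates the denominator, so in this toy run the weight is determined by the numerator alone.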
Assume the sentence set includes 10 sentence pairs: (query1, title), (query2, title), (query3, title), (query4, title), (query5, title), (query6, title), (query7, title), (query8, title), (query9, title), (query10, title).
Among them, the number of first sentence pairs corresponding to query word w1 and candidate synonym w2 is 4, namely (query1, title), (query2, title), (query3, title) and (query4, title); the number of second sentences corresponding to query word w1 and candidate synonym w2 is 2, namely query5 and query6.
Optionally,
In an alternative embodiment, different candidate synonyms have different first numbers. As the first number (or the second number) increases, the numerator (or denominator) of the first weight should not grow by a large multiple. For example, the first number of candidate synonym w2 and the first number of candidate synonym w3 may differ by only 1, yet the numerators of their first weights may differ by a multiple, for instance the numerator of the first weight of w2 being twice that of w3, which does not match reality. Logically, if the first numbers of candidate synonyms w2 and w3 differ by 1, the numerators of their first weights should be very close.
In summary, the specific process of obtaining the synonym of the query word from the at least one candidate synonym, at least based on the length characteristic information and the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, includes:
Step 1: for any candidate synonym w2 among the at least one candidate synonym, obtain a first parameter based on the first number of candidate synonym w2 and the first length information of candidate synonym w2.
Optionally, the first parameter is $\sum_{(w1,w2)\in(query,title)} I(w1,w2,query,title)/(len(query)\cdot len(title))$.
Step 2: obtain a second parameter based on the second number of candidate synonym w2 and the second length information of candidate synonym w2.
Optionally, the second parameter is $\max\left(\sum_{(w1,w2)\in s} I(w1,w2,s)/(len(s)\cdot len(s)),\ beta\right)$.
Step 3: use the first parameter and the second parameter respectively as the independent variable of a first function, to obtain a first dependent variable corresponding to the first parameter and a second dependent variable corresponding to the second parameter.
The first function has the property that, when the difference between the parameters corresponding to different candidate synonyms is less than or equal to a first threshold, the difference between the dependent variables corresponding to those candidate synonyms is less than or equal to a second threshold. Here the parameter is the first parameter and the dependent variable is the first dependent variable; or the parameter is the second parameter and the dependent variable is the second dependent variable.
In an alternative embodiment, the first function f(x) is parameterized by k1 and k2, where k1 and k2 may be the same or different; k1 and k2 are hyperparameters and are positive numbers.
In an alternative embodiment, k1 may take the value 50 and k2 the value 20.
Fig. 5 is a schematic diagram of an implementation, provided by the embodiments of the present application, in which the first function prevents the dependent variable corresponding to the first parameter or the second parameter from growing by a multiple.
In Fig. 5, the first parameter or the second parameter is the abscissa. If the first function is not used, the first parameter is itself the numerator of the first weight and the second parameter is itself the denominator of the first weight; that is, the relationship between the first parameter and the numerator of the first weight, and between the second parameter and the denominator of the first weight, is f(x)=x, shown by the single-dot-dash line in Fig. 5.
If the value of k1 is very large, the curve of f(x) is the dotted line shown in Fig. 5. If k1 takes the value 50 and k2 takes the value 20, the curve of f(x) is the thick solid line in Fig. 5. Here, x is the first parameter or the second parameter, and f(x) is the numerator or the denominator of the first weight.
The double-dot-dash line in Fig. 5 is the curve of f(x) when k1 and k2 are both 0.
The thin solid line in Fig. 5 is f(x)=k1+1.
As can be seen from Fig. 5, when two different values of x are close, the corresponding values of f(x) are also close, and no multiplicative increase appears.
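The damping behaviour described for Fig. 5 can be illustrated with a saturating function. The exact form of the first function is not reproduced in this text, so the sketch below assumes a BM25-style form f(x) = (k1 + 1) * x / (x + k2); this form is an assumption chosen only because it approaches the horizontal line f(x) = k1 + 1 mentioned above as x grows. The function name `saturate` is hypothetical.

```python
def saturate(x, k1=50.0, k2=20.0):
    """Assumed BM25-style damping: grows with x but is bounded above by k1 + 1."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return (k1 + 1.0) * x / (x + k2)


# Doubling the input far from zero barely changes the output.
y_100 = saturate(100.0)
y_200 = saturate(200.0)
growth_ratio = y_200 / y_100
```

Under this assumed form, doubling x from 100 to 200 raises f(x) from 42.5 to roughly 46.4 rather than doubling it, which is the kind of suppression of multiplicative growth the figure is described as showing.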
Step 4: obtain the synonym of the query word from the at least one candidate synonym, at least based on the first dependent variable and the second dependent variable corresponding to each of the at least one candidate synonym.
Optionally, the first weight of candidate synonym w2 is obtained based on the first dependent variable and the second dependent variable of candidate synonym w2.
Optionally, the calculation formula of the first weight of candidate synonym w2 is as follows:
first weight = f(first parameter) / f(second parameter), where f is the first function.
It is understood that label words often appear in the hit results sentence title, for example invalid keywords such as "search dog", "having", "taking journey net" and "video". An invalid keyword is a very common, general keyword. For example, whichever star's TV series or film a user searches for, "video" appears in the hit results sentence; that is, the document frequency of a label word in the sentence set is very high.
The document frequency of a candidate synonym w2 is the number of sentences in the sentence set that contain candidate synonym w2.
The method for determining the at least one candidate synonym corresponding to the query word is described above: the keywords contained in the hit results sentence corresponding to the query statement are taken as candidate synonyms. If the at least one candidate synonym includes a label word, then because the document frequency of the label word in the sentence set is high, the synonyms finally determined for the query word (assuming the query word is not itself a label word) may include the label word, making the determined synonyms of the query word inaccurate.
In summary, to avoid determining a label word as a synonym of the query word, the idea of reducing the influence of a candidate synonym's document frequency on the synonym weight is proposed.
In any of the above synonym acquisition method embodiments, obtaining the synonym of the query word from the at least one candidate synonym, at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, includes:
Step 1: obtain the document frequency weight corresponding to each of the at least one candidate synonym.
The document frequency weight of a candidate synonym at least characterizes the number of sentences in the sentence set that contain the candidate synonym.
In an alternative embodiment, the document frequency weight of a candidate synonym w2 at least characterizing the number of sentences in the sentence set that contain candidate synonym w2 includes any of the following cases:
First case: the document frequency weight of candidate synonym w2 is negatively correlated with the number of sentences in the sentence set that contain candidate synonym w2.
In an alternative embodiment, the document frequency of candidate synonym w2, that is, the number of sentences in the sentence set that contain candidate synonym w2, is: a third number of hit results sentences in the sentence set that contain candidate synonym w2; or a fourth number of query statements in the sentence set that contain candidate synonym w2; or the sum of the third number and the fourth number.
In an alternative embodiment, the calculation formula of the document frequency weight of candidate synonym w2 is as follows:
log(M / document frequency of w2), where M is the total number of query statements in the sentence set, or M is the total number of hit results sentences in the sentence set, or M is the sum of the total number of query statements and the total number of hit results sentences in the sentence set.
In an alternative embodiment, different candidate synonyms have different document frequencies. To prevent the document frequency weight from changing by a large multiple as the document frequency increases, the document frequency weight is optimized using the first function. Optionally, obtaining the document frequency weight corresponding to any candidate synonym includes:
obtaining the number of sentences in the sentence set that contain candidate synonym w2;
using the sentence number of candidate synonym w2 as the independent variable of the first function to obtain a third dependent variable;
where the first function has the property that, when the difference between the sentence numbers corresponding to different candidate synonyms is less than or equal to a third threshold, the difference between the third dependent variables corresponding to those candidate synonyms is less than or equal to a fourth threshold;
obtaining the document frequency weight of candidate synonym w2 based on the third dependent variable and the total number of sentences in the sentence set.
In an alternative embodiment, the calculation formula of the document frequency weight of candidate synonym w2 is as follows:
log(M / f(document frequency of w2)).
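The document frequency weight log(M / document frequency) and its damped variant log(M / f(document frequency)) can be sketched as follows. This is an illustrative sketch: sentences are assumed to be pre-segmented token lists, M is taken as the number of sentences passed in, and the function names, the `transform` parameter and the toy titles are hypothetical.

```python
import math


def document_frequency(sentences, word):
    """Number of sentences in the set that contain the word."""
    return sum(1 for sentence in sentences if word in sentence)


def document_frequency_weight(sentences, word, transform=lambda df: df):
    """log(M / transform(df)); the identity transform gives plain log(M / df)."""
    df = document_frequency(sentences, word)
    if df == 0:
        raise ValueError("word does not occur in the sentence set")
    return math.log(len(sentences) / transform(df))


# Toy hit results sentences (hypothetical): "video" behaves like a label word.
titles = [
    ["star", "film", "video"],
    ["drama", "video"],
    ["loan", "rate", "video"],
    ["news", "video"],
]
weight_video = document_frequency_weight(titles, "video")
weight_rate = document_frequency_weight(titles, "rate")
```

A word that appears in every sentence, like the label word "video" here, gets a weight of zero, while a rarer word gets a larger weight; a damping function can be passed as `transform` to obtain the log(M / f(document frequency)) variant.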
Step 2: obtain the synonym of the query word from the at least one candidate synonym, at least based on the weighting co-occurrence frequency and the document frequency weight corresponding to each of the at least one candidate synonym.
In an alternative embodiment, the calculation formula of the synonym weight of candidate synonym w2 may be as follows:
The synonym acquisition method provided by the embodiments of the present application uses two ideas: the larger the number of first sentence pairs corresponding to w1 and w2, the greater the probability that w1 and w2 are synonyms; and the more often w1 and w2 appear in the same sentence, the smaller the probability that w1 and w2 are synonyms. However, a larger number of first sentence pairs corresponding to w1 and w2 does not necessarily mean a greater probability that w1 and w2 are synonyms.
For example, if the query statement is "Taobao" and the hit results sentence is "Taobao-washes in a pan! I like", and the query word is assumed to be "Taobao", then "I like" is likely to be determined as a synonym of "Taobao". As another example, if the query statement is "Which pillow on Taobao is better" and the hit results sentence is "The XX brand pillow on Jingdong is better", and the query word is assumed to be "Taobao", then "Jingdong" is likely to be determined as a synonym of "Taobao".
To avoid the above problem, similarity information between the query word and each candidate synonym, such as cosine similarity, may be obtained. In any of the above synonym acquisition method embodiments, obtaining the synonym weight corresponding to each of the at least one candidate synonym, at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, includes:
Step 1: obtain the semantic feature information corresponding to each of the at least one candidate synonym and the semantic feature information of the query word.
In an alternative embodiment, the semantic feature information corresponding to each candidate synonym and to the query word may be obtained based on Word2Vec word vectors.
Optionally, word embedding semantic features may be added. Optionally, Word2Vec word vectors may be trained on an existing known corpus.
In another alternative embodiment, a semantic feature matching model may be constructed in advance; the model has the function of making the predicted semantic feature information of a keyword tend toward the true semantic feature information of that keyword.
Optionally, the semantic feature matching model is obtained by training a neural network.
Keywords may be extracted from the query statements and hit results sentences of the sentence pairs in the sentence set to obtain sample keywords, the semantic feature information corresponding to each sample keyword being known.
The neural network may be trained on the sample keywords to obtain the semantic feature matching model.
Step 2: for any candidate synonym among the at least one candidate synonym, obtain similarity information between the candidate synonym and the query word based on the semantic feature information of the candidate synonym and the semantic feature information of the query word, so as to obtain the similarity information corresponding to each of the at least one candidate synonym.
The similarity information of query word w1 and candidate synonym w2 may be cosine similarity; the cosine similarity of the semantic feature information of w1 and w2 may be denoted cos(w1, w2).
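The cosine similarity cos(w1, w2) between two semantic feature vectors can be computed as in the sketch below. The vectors shown are made-up toy values, not real Word2Vec output, and the function name is hypothetical; in practice the vectors would come from whatever embedding model is used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy embedding vectors (hypothetical values).
vec_w1 = [1.0, 2.0, 0.0]
vec_w2 = [2.0, 4.0, 0.0]   # same direction as vec_w1
vec_w3 = [0.0, 0.0, 3.0]   # orthogonal to vec_w1

sim_close = cosine_similarity(vec_w1, vec_w2)
sim_far = cosine_similarity(vec_w1, vec_w3)
```

A candidate such as "I like" that often follows the query word in titles but points in a different direction in the embedding space would receive a low similarity, which is how this factor is meant to offset a high co-occurrence count.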
Step 3: obtain the synonym of the query word from the at least one candidate synonym, at least based on the weighting co-occurrence frequency and the similarity information corresponding to each of the at least one candidate synonym.
In an alternative embodiment, the calculation formula of the synonym weight of candidate synonym w2 may be as follows:
Alternatively,
In any of the above synonym acquisition method embodiments, implementations of step 2 of step S303 include but are not limited to the following.
First implementation: sort the synonym weights corresponding to the at least one candidate synonym, and obtain the synonym of the query word based on the sorting result.
If sorted in descending order, the first L candidate synonyms may be taken as synonyms of the query word, where L is a positive integer greater than or equal to 1.
If sorted in ascending order, the last L candidate synonyms may be taken as synonyms of the query word.
Second implementation: determine, from the synonym weights corresponding to the at least one candidate synonym, the target candidate synonym with the largest synonym weight, and take the target candidate synonym as the synonym of the query word.
Third implementation: take the candidate synonyms whose synonym weight is greater than or equal to a weight threshold as synonyms of the query word.
Optionally, the weight threshold may be set according to actual conditions; the embodiments of the present application do not limit the specific value of the weight threshold.
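The three selection strategies above can be sketched as follows. The synonym weights in the example are hypothetical numbers chosen only for illustration, and the function names are not from the source.

```python
def top_l(weights, l):
    """First implementation: sort by weight (descending) and keep the first L candidates."""
    if l < 1:
        raise ValueError("L must be a positive integer")
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:l]]


def best(weights):
    """Second implementation: keep only the candidate with the largest weight."""
    return max(weights, key=weights.get)


def above_threshold(weights, threshold):
    """Third implementation: keep every candidate whose weight reaches the threshold."""
    return [word for word, weight in weights.items() if weight >= threshold]


# Hypothetical synonym weights for the candidates of one query word.
synonym_weights = {"Construction Bank": 0.9, "loan": 0.3, "interest rate": 0.2}

selected_top2 = top_l(synonym_weights, 2)
selected_best = best(synonym_weights)
selected_threshold = above_threshold(synonym_weights, 0.25)
```

Top-L always returns a fixed number of synonyms, while the threshold variant may return none or many; which one fits depends on whether the downstream use tolerates an empty result.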
Still taking Fig. 1b and Fig. 1c as an example, if the query word is "Construction Bank" and the at least one candidate synonym corresponding to "Construction Bank" includes Construction Bank, loan, interest rate and so on, then Table 1 shows the weighting co-occurrence frequency, length characteristic information, document frequency and similarity information of the query word with respect to Construction Bank, loan and interest rate, respectively.
Table 1
If the candidate synonym with the largest synonym weight is determined as the synonym of "Construction Bank", then the candidate "Construction Bank" can be determined to be the synonym of the query word "Construction Bank".
The method has been described in detail in the embodiments disclosed above. The method of the present invention may be implemented by devices of various forms, so the present invention also discloses a device, which is described in detail below through specific embodiments.
Fig. 6 is a schematic diagram of an implementation of the synonym acquisition device provided by the embodiments of the present application. The synonym acquisition device includes:
a first obtaining module 61, configured to obtain at least one candidate synonym corresponding to a query word contained in a query statement, where a candidate synonym is a word contained in the hit results sentence corresponding to the query statement;
a second obtaining module 62, configured to obtain the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym, where the weighting co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs in the sentence set, the query statement of a first sentence pair containing the query word and the hit results sentence of that pair containing the candidate synonym;
a third obtaining module 63, configured to obtain the synonym of the query word from the at least one candidate synonym, at least based on the weighting co-occurrence frequency corresponding to each of the at least one candidate synonym.
In an optional embodiment, the second acquisition module 62, when obtaining the weighted co-occurrence frequency of any candidate synonym, specifically includes:
A first acquiring unit, configured to obtain a first number of first sentence pairs in the sentence set whose query statement contains the query word and whose corresponding hit result sentence contains the candidate synonym;
A second acquiring unit, configured to obtain a second number of second sentences in the sentence set that contain both the query word and the candidate synonym, where a second sentence is a query statement or a hit result sentence;
A third acquiring unit, configured to obtain the weighted co-occurrence frequency of the candidate synonym based on the first number and the second number.
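As a non-limiting illustration, the counting performed by the first to third acquiring units can be sketched in Python as follows. The sketch assumes that the sentence set is a list of (query statement, hit result sentence) pairs that have already been segmented into words, and it counts a sentence once for every pair in which it appears. The default ratio used to combine the two numbers is an assumption made only for this example, as are all function names.

```python
from typing import Callable, List, Tuple

# A sentence pair is (query statement words, hit result sentence words).
SentencePair = Tuple[List[str], List[str]]


def first_number(pairs: List[SentencePair], query_word: str, candidate: str) -> int:
    """Count pairs whose query statement contains the query word and whose
    hit result sentence contains the candidate synonym."""
    return sum(1 for q, h in pairs if query_word in q and candidate in h)


def second_number(pairs: List[SentencePair], query_word: str, candidate: str) -> int:
    """Count single sentences (query statement or hit result sentence) that
    contain both the query word and the candidate synonym."""
    count = 0
    for q, h in pairs:
        for sentence in (q, h):
            if query_word in sentence and candidate in sentence:
                count += 1
    return count


def weighted_cooccurrence(
    pairs: List[SentencePair],
    query_word: str,
    candidate: str,
    # ASSUMPTION: this ratio is illustrative; the text only states that the
    # frequency is obtained from the first number and the second number.
    combine: Callable[[int, int], float] = lambda n1, n2: n1 / (1.0 + n2),
) -> float:
    return combine(
        first_number(pairs, query_word, candidate),
        second_number(pairs, query_word, candidate),
    )
```

The `combine` argument is kept pluggable so that a different combination of the two numbers can be substituted without changing the counting code.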
In an optional embodiment, the third acquisition module 63 includes:
A fourth acquiring unit, configured to obtain the length feature information corresponding to each of the at least one candidate synonym;
where the length feature information of a candidate synonym at least characterizes the length information of the query statement and the length information of the hit result sentence in each of the at least one first sentence pair corresponding to that candidate synonym;
A fifth acquiring unit, configured to obtain the synonym of the query word from the at least one candidate synonym based at least on the length feature information and the weighted co-occurrence frequencies corresponding to the at least one candidate synonym.
In an optional embodiment, the fourth acquiring unit, when obtaining the length feature information of any candidate synonym, specifically includes:
A first acquiring subunit, configured to obtain first length information based on the length information of the query statement and the length information of the hit result sentence in each of the first number of first sentence pairs corresponding to the candidate synonym;
A second acquiring subunit, configured to obtain second length information based on the length information of each of the second number of second sentences corresponding to the candidate synonym;
A third acquiring subunit, configured to obtain the length feature information of the candidate synonym based on the first length information and the second length information.
In an optional embodiment, the fifth acquiring unit includes:
A fourth acquiring subunit, configured to, for any candidate synonym among the at least one candidate synonym, obtain a first parameter based on the first number and the first length information of that candidate synonym;
A fifth acquiring subunit, configured to obtain a second parameter based on the second number and the second length information of that candidate synonym;
A sixth acquiring subunit, configured to use the first parameter and the second parameter respectively as the independent variable of a first function, so as to obtain a first dependent variable corresponding to the first parameter and a second dependent variable corresponding to the second parameter;
where the first function is a function such that, when the difference between the parameters corresponding to different candidate synonyms is less than or equal to a first threshold, the difference between the dependent variables corresponding to those candidate synonyms is less than or equal to a second threshold; the parameter is the first parameter and the dependent variable is the first dependent variable, or the parameter is the second parameter and the dependent variable is the second dependent variable;
A seventh acquiring subunit, configured to obtain the synonym of the query word from the at least one candidate synonym based at least on the first dependent variables and second dependent variables corresponding to the at least one candidate synonym.
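As a non-limiting illustration, one function with the stated property of the first function is log(1 + x) on non-negative inputs: its slope never exceeds 1, so two parameters that differ by at most the first threshold yield dependent variables that differ by at most the same amount. The Python sketch below assumes this choice; the text does not name a specific function, and the names `first_function` and `dependent_variables` are hypothetical.

```python
import math
from typing import Tuple


def first_function(x: float) -> float:
    """Assumed smoothing function: log(1 + x) for non-negative x."""
    if x < 0:
        raise ValueError("parameter must be non-negative")
    return math.log1p(x)


def dependent_variables(first_param: float, second_param: float) -> Tuple[float, float]:
    """Apply the first function to the first and second parameters in turn,
    giving the first and second dependent variables."""
    return first_function(first_param), first_function(second_param)
```

With this choice, large raw parameters are compressed, so candidate synonyms whose parameters are close receive close scores.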
In an optional embodiment, the third acquisition module 63 includes:
A sixth acquiring unit, configured to obtain the document frequency weight corresponding to each of the at least one candidate synonym;
where the document frequency weight of a candidate synonym at least characterizes the number of sentences in the sentence set that contain the candidate synonym;
A seventh acquiring unit, configured to obtain the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies and the document frequency weights corresponding to the at least one candidate synonym.
In an optional embodiment, the sixth acquiring unit, when obtaining the document frequency weight corresponding to any candidate synonym, specifically includes:
An eighth acquiring subunit, configured to obtain the number of sentences in the sentence set that contain the candidate synonym;
A ninth acquiring subunit, configured to use the sentence number of the candidate synonym as the independent variable of the first function, so as to obtain a third dependent variable;
where the first function is a function such that, when the difference between the sentence numbers corresponding to different candidate synonyms is less than or equal to a third threshold, the difference between the third dependent variables corresponding to those candidate synonyms is less than or equal to a fourth threshold;
A tenth acquiring subunit, configured to obtain the document frequency weight of the candidate synonym based on the third dependent variable and the total number of sentences contained in the sentence set.
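As a non-limiting illustration, the eighth to tenth acquiring subunits can be sketched in Python as follows. The sketch assumes that the first function is log(1 + x) and that the third dependent variable is combined with the total number of sentences in the manner of an inverse document frequency; both choices are assumptions, since the text only states which quantities the weight is based on. All function names are hypothetical.

```python
import math
from typing import List


def sentence_count(sentences: List[List[str]], candidate: str) -> int:
    """Number of sentences in the sentence set that contain the candidate."""
    return sum(1 for words in sentences if candidate in words)


def document_frequency_weight(sentences: List[List[str]], candidate: str) -> float:
    # Third dependent variable: the assumed first function applied to the count.
    third_dependent = math.log1p(sentence_count(sentences, candidate))
    # ASSUMPTION: an IDF-like combination with the total number of sentences,
    # so that words occurring in many sentences receive a smaller weight.
    return math.log1p(len(sentences)) - third_dependent
```

Under this assumption, a word that appears in most sentences of the sentence set is treated as less informative than a word that appears in only a few.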
In an optional embodiment, the third acquisition module 63 includes:
An eighth acquiring unit, configured to obtain the semantic feature information corresponding to each of the at least one candidate synonym, and the semantic feature information of the query word;
A ninth acquiring unit, configured to, for any candidate synonym among the at least one candidate synonym, obtain the similarity information between that candidate synonym and the query word based on the semantic feature information of the candidate synonym and the semantic feature information of the query word, so as to obtain the similarity information corresponding to each of the at least one candidate synonym;
A tenth acquiring unit, configured to obtain the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies and the similarity information corresponding to the at least one candidate synonym.
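As a non-limiting illustration, the ninth acquiring unit can be sketched in Python under the assumption that the semantic feature information of a word is a numeric vector (for example a word embedding) and that the similarity information is the cosine similarity of two such vectors. Neither the vector representation nor the cosine measure is stated in the text above; both are assumptions, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no direction
    return dot / (norm_u * norm_v)


def similarity_info(
    query_vector: Sequence[float],
    candidate_vectors: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Similarity between the query word and every candidate synonym."""
    return {
        word: cosine_similarity(query_vector, vector)
        for word, vector in candidate_vectors.items()
    }
```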
In an optional embodiment, the first acquisition module 61 includes:
An eleventh acquiring unit, configured to obtain the sentence set, where the sentence set includes at least one or more hit result sentences corresponding to one query statement, and/or one or more query statements corresponding to one hit result sentence;
A first determination unit, configured to determine a sentence pair to be processed from the sentence set;
A twelfth acquiring unit, configured to segment the query statement and the hit result sentence contained in the sentence pair to be processed, so as to obtain at least one keyword contained in the query statement of the sentence pair to be processed and at least one keyword corresponding to the hit result sentence of the sentence pair to be processed;
A second determination unit, configured to determine the query word from the at least one keyword contained in the query statement of the sentence pair to be processed;
A third determination unit, configured to take the at least one keyword corresponding to the hit result sentence of the sentence pair to be processed as the at least one candidate synonym.
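As a non-limiting illustration, the handling of one sentence pair to be processed can be sketched in Python as follows. Whitespace splitting stands in for a real word segmenter, which is an assumption made only to keep the example self-contained; the function names are hypothetical.

```python
from typing import Callable, List


def split_keywords(sentence: str) -> List[str]:
    # ASSUMPTION: whitespace splitting replaces a real word segmenter.
    return sentence.split()


def candidate_synonyms(
    query_statement: str,
    hit_sentence: str,
    query_word: str,
    segment: Callable[[str], List[str]] = split_keywords,
) -> List[str]:
    """Return the keywords of the hit result sentence as candidate synonyms
    of query_word, provided query_word is a keyword of the query statement."""
    if query_word not in segment(query_statement):
        return []
    # dict.fromkeys removes repeated keywords while keeping their order.
    return list(dict.fromkeys(segment(hit_sentence)))
```

Repeating this over every sentence pair in the sentence set yields the candidate synonyms of each query word.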
As shown in Fig. 7, which is a schematic diagram of one implementation of the electronic device provided by the embodiments of the present application, the electronic device includes:
A memory 71, configured to store a program;
A processor 72, configured to execute the program, the program being specifically used for:
Obtaining at least one candidate synonym corresponding to a query word contained in a query statement, where each candidate synonym is a word contained in the hit result sentence corresponding to the query statement;
Obtaining the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, where the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in a sentence set, the query statement of a first sentence pair containing the query word and the hit result sentence of that first sentence pair containing the candidate synonym;
Obtaining the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies corresponding to the at least one candidate synonym.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The electronic device may also include a communication interface 73 and a communication bus 74, where the memory, the processor, and the communication interface communicate with one another through the communication bus.
Optionally, the communication interface may be an interface of a communication module, such as an interface of a GSM module.
An embodiment of the present invention also provides a readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps included in the above synonym acquisition method embodiments are implemented.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A synonym acquisition method, characterized by comprising:
Obtaining at least one candidate synonym corresponding to a query word contained in a query statement, wherein each candidate synonym is a word contained in the hit result sentence corresponding to the query statement;
Obtaining the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, wherein the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in a sentence set, the query statement of a first sentence pair containing the query word and the hit result sentence of that first sentence pair containing the candidate synonym;
Obtaining the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies corresponding to the at least one candidate synonym.
2. The synonym acquisition method according to claim 1, characterized in that obtaining the weighted co-occurrence frequency of any candidate synonym comprises:
Obtaining a first number of first sentence pairs in the sentence set whose query statement contains the query word and whose corresponding hit result sentence contains the candidate synonym;
Obtaining a second number of second sentences in the sentence set that contain both the query word and the candidate synonym, wherein a second sentence is a query statement or a hit result sentence;
Obtaining the weighted co-occurrence frequency of the candidate synonym based on the first number and the second number.
3. The synonym acquisition method according to claim 2, characterized in that obtaining the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies corresponding to the at least one candidate synonym comprises:
Obtaining the length feature information corresponding to each of the at least one candidate synonym;
wherein the length feature information of a candidate synonym at least characterizes the length information of the query statement and the length information of the hit result sentence in each of the at least one first sentence pair corresponding to that candidate synonym;
Obtaining the synonym of the query word from the at least one candidate synonym based at least on the length feature information and the weighted co-occurrence frequencies corresponding to the at least one candidate synonym.
4. The synonym acquisition method according to claim 3, characterized in that obtaining the length feature information of any candidate synonym comprises:
Obtaining first length information based on the length information of the query statement and the length information of the hit result sentence in each of the first number of first sentence pairs corresponding to the candidate synonym;
Obtaining second length information based on the length information of each of the second number of second sentences corresponding to the candidate synonym;
Obtaining the length feature information of the candidate synonym based on the first length information and the second length information.
5. The synonym acquisition method according to claim 4, characterized in that obtaining the synonym of the query word from the at least one candidate synonym based at least on the length feature information and the weighted co-occurrence frequencies corresponding to the at least one candidate synonym comprises:
For any candidate synonym among the at least one candidate synonym, obtaining a first parameter based on the first number and the first length information of that candidate synonym;
Obtaining a second parameter based on the second number and the second length information of that candidate synonym;
Using the first parameter and the second parameter respectively as the independent variable of a first function, so as to obtain a first dependent variable corresponding to the first parameter and a second dependent variable corresponding to the second parameter;
wherein the first function is a function such that, when the difference between the parameters corresponding to different candidate synonyms is less than or equal to a first threshold, the difference between the dependent variables corresponding to those candidate synonyms is less than or equal to a second threshold; the parameter is the first parameter and the dependent variable is the first dependent variable, or the parameter is the second parameter and the dependent variable is the second dependent variable;
Obtaining the synonym of the query word from the at least one candidate synonym based at least on the first dependent variables and second dependent variables corresponding to the at least one candidate synonym.
6. The synonym acquisition method according to claim 1, characterized in that obtaining the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies corresponding to the at least one candidate synonym comprises:
Obtaining the document frequency weight corresponding to each of the at least one candidate synonym;
wherein the document frequency weight of a candidate synonym at least characterizes the number of sentences in the sentence set that contain the candidate synonym;
Obtaining the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies and the document frequency weights corresponding to the at least one candidate synonym.
7. The synonym acquisition method according to claim 6, wherein obtaining the document frequency weight corresponding to any candidate synonym comprises:
Obtaining the number of sentences in the sentence set that contain the candidate synonym;
Using the sentence number of the candidate synonym as the independent variable of a first function, so as to obtain a third dependent variable;
wherein the first function is a function such that, when the difference between the sentence numbers corresponding to different candidate synonyms is less than or equal to a third threshold, the difference between the third dependent variables corresponding to those candidate synonyms is less than or equal to a fourth threshold;
Obtaining the document frequency weight of the candidate synonym based on the third dependent variable and the total number of sentences contained in the sentence set.
8. The synonym acquisition method according to claim 1, characterized in that obtaining the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies corresponding to the at least one candidate synonym comprises:
Obtaining the semantic feature information corresponding to each of the at least one candidate synonym and the semantic feature information of the query word;
For any candidate synonym among the at least one candidate synonym, obtaining the similarity information between that candidate synonym and the query word based on the semantic feature information of the candidate synonym and the semantic feature information of the query word, so as to obtain the similarity information corresponding to each of the at least one candidate synonym;
Obtaining the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies and the similarity information corresponding to the at least one candidate synonym.
9. The synonym acquisition method according to any one of claims 1 to 8, characterized in that obtaining the at least one candidate synonym corresponding to the query word contained in the query statement comprises:
Obtaining the sentence set, wherein the sentence set includes at least one or more hit result sentences corresponding to one query statement, and/or one or more query statements corresponding to one hit result sentence;
Determining a sentence pair to be processed from the sentence set;
Segmenting the query statement and the hit result sentence contained in the sentence pair to be processed, so as to obtain at least one keyword contained in the query statement of the sentence pair to be processed and at least one keyword corresponding to the hit result sentence of the sentence pair to be processed;
Determining the query word from the at least one keyword contained in the query statement of the sentence pair to be processed;
Determining the at least one keyword corresponding to the hit result sentence of the sentence pair to be processed as the at least one candidate synonym.
10. A synonym acquisition device, characterized by comprising:
A first acquisition module, configured to obtain at least one candidate synonym corresponding to a query word contained in a query statement, wherein each candidate synonym is a word contained in the hit result sentence corresponding to the query statement;
A second acquisition module, configured to obtain the weighted co-occurrence frequency corresponding to each of the at least one candidate synonym, wherein the weighted co-occurrence frequency of a candidate synonym at least characterizes the number of first sentence pairs contained in a sentence set, the query statement of a first sentence pair containing the query word and the hit result sentence of that first sentence pair containing the candidate synonym;
A third acquisition module, configured to obtain the synonym of the query word from the at least one candidate synonym based at least on the weighted co-occurrence frequencies corresponding to the at least one candidate synonym.
CN201910160822.9A 2019-03-04 2019-03-04 Synonym acquisition method and device Active CN109918661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910160822.9A CN109918661B (en) 2019-03-04 2019-03-04 Synonym acquisition method and device

Publications (2)

Publication Number Publication Date
CN109918661A true CN109918661A (en) 2019-06-21
CN109918661B CN109918661B (en) 2023-05-30

Family

ID=66963144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910160822.9A Active CN109918661B (en) 2019-03-04 2019-03-04 Synonym acquisition method and device

Country Status (1)

Country Link
CN (1) CN109918661B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2970795A1 (en) * 2011-01-25 2012-07-27 Synomia Method for filtering of synonyms in electronic document database in information system for searching information in e.g. Internet, involves performing reduction of number of synonyms of keyword based on score value of semantic proximity
CN102622411A (en) * 2012-02-17 2012-08-01 清华大学 Structured abstract generating method
CN102760134A (en) * 2011-04-28 2012-10-31 北京百度网讯科技有限公司 Method and device for mining synonyms
WO2014002775A1 (en) * 2012-06-25 2014-01-03 日本電気株式会社 Synonym extraction system, method and recording medium
CN103514269A (en) * 2013-09-12 2014-01-15 百度在线网络技术(北京)有限公司 Second query term determined to be related to first query term based on natural searching results
JP2015032228A (en) * 2013-08-05 2015-02-16 Kddi株式会社 Program, method, apparatus and server generating co-occurrence pattern for detecting near-synonym
CN105279252A (en) * 2015-10-12 2016-01-27 广州神马移动信息科技有限公司 Related word mining method, search method and search system
CN106933806A (en) * 2017-03-15 2017-07-07 北京大数医达科技有限公司 The determination method and apparatus of medical synonym
CN107247745A (en) * 2017-05-23 2017-10-13 华中师范大学 A kind of information retrieval method and system based on pseudo-linear filter model
CN107451212A (en) * 2017-07-14 2017-12-08 北京京东尚科信息技术有限公司 Synonymous method for digging and device based on relevant search
CN108038096A (en) * 2017-11-10 2018-05-15 平安科技(深圳)有限公司 Knowledge database documents method for quickly retrieving, application server computer readable storage medium storing program for executing
CN108509474A (en) * 2017-09-15 2018-09-07 腾讯科技(深圳)有限公司 Search for the synonym extended method and device of information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
REKHA VAIDYANATHAN: "Query Expansion Strategy based on Pseudo Relevance Feedback and Term Weight Scheme for Monolingual Retrieval", 《ARXIV》 *
王颖等: "基于专利搜索日志的同义词挖掘", 《计算机工程与设计》 *
肖淋峰: "面向检索信息的同义词挖掘", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413737A (en) * 2019-07-29 2019-11-05 腾讯科技(深圳)有限公司 A kind of determination method, apparatus, server and the readable storage medium storing program for executing of synonym
CN110413737B (en) * 2019-07-29 2022-10-14 腾讯科技(深圳)有限公司 Synonym determination method, synonym determination device, server and readable storage medium
CN110795565A (en) * 2019-09-06 2020-02-14 腾讯科技(深圳)有限公司 Semantic recognition-based alias mining method, device, medium and electronic equipment
CN110795565B (en) * 2019-09-06 2023-10-27 腾讯科技(深圳)有限公司 Alias mining method and device based on semantic recognition, medium and electronic equipment

Also Published As

Publication number Publication date
CN109918661B (en) 2023-05-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant