CN106126588A - Method and apparatus for providing related words - Google Patents
- Publication number
- CN106126588A (application CN201610445489.2A)
- Authority
- CN
- China
- Prior art keywords
- word
- related term
- tested
- word set
- entry
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
Abstract
The invention discloses a method for providing related words, comprising: taking a keyword input by a user as the input word, obtaining a lower-level related word set of the keyword from an entry database, and determining the relevance between the keyword and each lower-level related word in that set; obtaining, according to the lower-level related word set of the keyword, an upper-level related word set of the keyword from the entry database, and determining the relevance between the keyword and each upper-level related word in that set; and taking the union of the lower-level and upper-level related word sets as the output related word set of the keyword, and selecting from the output related word set, according to the relevance of each output related word, the related words to be provided to the user. Correspondingly, the invention also discloses a device for providing related words. The embodiments of the invention can provide more related words, and more accurate ones.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for providing related words.
Background art
At present, shopping websites and search-engine service websites all provide a keyword search function: the user enters a keyword for the goods or technology to be searched, and the server searches according to this keyword and returns the corresponding results to the user. To provide accurate search results, the server generally expands the keyword, i.e., it finds the related words corresponding to the keyword entered by the user and provides them to the user, so that when a keyword search fails to return satisfactory results, the user can search with the related words instead. However, existing related-word expansion relies on existing dictionaries such as WordNet and the "Synonym Forest" (Tongyici Cilin). The related words obtained in this way are quite limited, and they may fail to keep up with the development and change of language, so they cannot meet the timeliness requirement for related words.
Summary of the invention
The embodiments of the present invention propose a method and apparatus for providing related words that can provide more related words, and more accurate ones.
A method for providing related words proposed by an embodiment of the present invention comprises:
taking a keyword input by a user as the input word, obtaining a lower-level related word set of the keyword from an entry database, and determining the relevance between the keyword and each lower-level related word in the lower-level related word set;
obtaining, according to the lower-level related word set of the keyword, an upper-level related word set of the keyword from the entry database, and determining the relevance between the keyword and each upper-level related word in the upper-level related word set;
taking the union of the lower-level related word set and the upper-level related word set of the keyword as the output related word set of the keyword, and selecting from the output related word set, according to the relevance of each output related word in it, the related words to be provided to the user.
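The selection rule is left open above (selection is only said to follow the relevance values). A minimal Python sketch, assuming a simple top-k rule, is shown below; the function name `select_related`, the parameter `k` and the relevance values are hypothetical and only illustrate the idea.

```python
from __future__ import annotations


def select_related(output_set: dict[str, float], k: int) -> list[str]:
    """Return up to k output related words, highest relevance first.

    Ties are broken alphabetically so the result is deterministic.
    The top-k rule itself is an assumption, not taken from the patent.
    """
    ranked = sorted(output_set.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# Toy relevance values, invented purely for illustration.
output_set = {"JVM": 7.0, "Spring": 5.5, "Python": 3.0, "Eclipse": 5.5}
print(select_related(output_set, 3))  # ['JVM', 'Eclipse', 'Spring']
```

A fixed k is only one option; a relevance cut-off would fit the same description equally well.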
As a further improvement of the embodiment of the present invention, obtaining the upper-level related word set of the keyword from the entry database according to the lower-level related word set of the keyword, and determining the relevance between the keyword and each upper-level related word in the upper-level related word set, specifically comprises:
for each lower-level related word in the lower-level related word set, updating the input word to that lower-level related word, and obtaining the lower-level related word set of the updated input word from the entry database;
judging whether the total number of lower-level related word sets is greater than a preset threshold;
if so, filtering out, from the lower-level related word sets, those that contain the keyword, and taking the input words corresponding to those sets as upper-level related words, thereby obtaining the upper-level related word set of the keyword; the relevance of the keyword within the lower-level related word set of a given input word is used as the relevance between that input word, as an upper-level related word, and the keyword;
if not, continuing as follows: for each lower-level related word in the lower-level related word set of each updated input word, updating the input word again to that lower-level related word and obtaining the lower-level related word set of the re-updated input word from the entry database, until the total number of lower-level related word sets is greater than the preset threshold.
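The iterative expansion above can be sketched in Python as follows. This is a sketch under stated assumptions, not the patented implementation: `lower_set` is a hypothetical callable standing in for "obtain the lower-level related word set from the entry database", N is taken to be the number of lower-level sets fetched for updated input words, words already used as input words (and the keyword itself) are skipped, and `max_rounds` is an added safeguard so the loop stops even if the preset threshold S is never exceeded. The graph is toy data.

```python
from __future__ import annotations

from typing import Callable


def upper_related(
    keyword: str,
    keyword_lower: dict[str, float],
    lower_set: Callable[[str], dict[str, float]],
    threshold_s: int,
    max_rounds: int = 10,
) -> dict[str, float]:
    """Collect upper-level related words of `keyword` (sketch of the loop above)."""
    collected: dict[str, dict[str, float]] = {}  # input word -> its lower-level set
    frontier = list(keyword_lower)                # first round of updated input words
    for _ in range(max_rounds):                   # safeguard, not in the patent
        next_frontier: list[str] = []
        for word in frontier:
            if word == keyword or word in collected:
                continue                          # assumption: skip repeats
            collected[word] = lower_set(word)
            next_frontier.extend(collected[word])
        # Is the total number N of lower-level sets above the preset threshold S?
        if len(collected) > threshold_s or not next_frontier:
            break
        frontier = next_frontier                  # otherwise expand one more level
    # Input words whose lower-level set contains the keyword become upper-level
    # related words; their relevance is the keyword's relevance in that set.
    return {w: s[keyword] for w, s in collected.items() if keyword in s}


GRAPH = {
    "Java": {"JVM": 6, "Spring": 4},
    "JVM": {"Java": 5, "bytecode": 3},
    "Spring": {"Java": 4, "Kotlin": 2},
    "bytecode": {"JVM": 3},
    "Kotlin": {"Java": 2, "Spring": 1},
}
print(upper_related("Java", GRAPH["Java"], lambda w: GRAPH.get(w, {}), threshold_s=3))
# {'JVM': 5, 'Spring': 4, 'Kotlin': 2}
```

With the toy graph, JVM, Spring and Kotlin list Java among their lower-level related words, so they become its upper-level related words; bytecode does not, so it is dropped.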
Further, obtaining a lower-level related word set from the entry database specifically comprises:
obtaining, according to the input word, the entries containing the input word from the entry database, and segmenting and screening these entries to obtain a candidate related word set (the related words to be tested);
for each candidate related word in the candidate related word set, obtaining the entries containing that candidate from the entry database, and segmenting and screening these entries to obtain the comparison word set of the candidate;
when the cardinality (size) of the intersection of the candidate's comparison word set with the candidate related word set is judged to be greater than a screening threshold, determining that the candidate is a lower-level related word of the input word, thereby obtaining the lower-level related word set; this cardinality is used as the relevance between the lower-level related word and the keyword.
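The intersection test can be sketched in Python as follows, assuming a hypothetical helper `word_set_of(word)` that already performs "retrieve the entries containing the word, segment them, keep the core words". Excluding the input word from its own candidate set is an assumption, and the word sets are toy data.

```python
from __future__ import annotations

from typing import Callable


def lower_related(
    input_word: str,
    word_set_of: Callable[[str], set[str]],
    p: int,
) -> dict[str, int]:
    """Lower-level related words of `input_word` with relevance r = |A ∩ B_a|."""
    candidates = word_set_of(input_word) - {input_word}   # candidate set A
    result: dict[str, int] = {}
    for a in sorted(candidates):
        comparison = word_set_of(a)                        # comparison set B_a
        r = len(candidates & comparison)                   # |A ∩ B_a|
        if r > p:                                          # screening threshold p
            result[a] = r
    return result


# Toy word sets standing in for "retrieve entries, segment, keep core words".
WORD_SETS = {
    "Java": {"JVM", "Spring", "class", "paper"},
    "JVM": {"Java", "Spring", "class", "bytecode"},
    "Spring": {"Java", "JVM", "class"},
    "class": {"Python", "object"},
    "paper": {"abstract", "journal"},
}
print(lower_related("Java", lambda w: WORD_SETS.get(w, set()), p=1))
# {'JVM': 2, 'Spring': 2}
```

Here class and paper share no words with the candidate set, so they are screened out as noise, while JVM and Spring each share two candidates with Java.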
As a further improvement on the present invention, described according to described input word, obtain from entry data base described in comprising
The entry of input word, and described entry is carried out participle and screening, it is thus achieved that relevant word set to be tested, specifically include:
According to described input word, obtain from entry data base and comprise described input word and sequence entry before M position;
According to standard words wiht strip-lattice type, the entry obtained is carried out Format adjusting;
Call participle instrument;
Utilize described participle instrument that the entry after Format adjusting is carried out participle, it is thus achieved that the first word collection;
Concentrate the word extracting the core word belonged to user-oriented dictionary as related term to be tested from described first word, it is thus achieved that
Relevant word set to be tested;Wherein, described user-oriented dictionary is provided by described participle instrument;
And, described according to described related term to be tested, obtain from described entry data base and comprise described related term to be tested
Entry, and the entry of described related term to be tested is carried out participle and screening, it is thus achieved that the comparison word set of described related term to be tested, tool
Body includes:
According to described related term to be tested, obtain from entry data base and comprise described related term to be tested and sequence in M position
Front entry;
According to described standard words wiht strip-lattice type, comprise described related term to be tested and sequence entry before M position enters to described
Row format adjusts;
Call described participle instrument;
Utilize described participle instrument to comprising described related term to be tested and sequence entry before M position after Format adjusting
Carry out participle, it is thus achieved that the second word collection;
Concentrate the word extracting the core word belonged to user-oriented dictionary as comparison word from described second word, it is thus achieved that comparison
Word set.
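The "standard entry format" is not defined beyond the examples given in the detailed embodiment (upper-casing, removing redundant spaces, unifying punctuation and unifying full-width and half-width forms). One possible Python sketch uses Unicode NFKC normalization, which maps full-width ASCII letters, the full-width comma and the ideographic space to their half-width equivalents; the punctuation table `PUNCT_MAP` and the function name `standardize` are hypothetical choices.

```python
import unicodedata

# Hypothetical table: the text only says punctuation is "unified".
PUNCT_MAP = str.maketrans({"\u3002": ".", "\u3001": ","})  # 。 -> .   、 -> ,


def standardize(entry: str) -> str:
    """Bring an entry into one assumed 'standard entry format'."""
    text = unicodedata.normalize("NFKC", entry)  # full-width -> half-width
    text = text.translate(PUNCT_MAP)             # remaining CJK punctuation
    text = text.upper()                          # lower case -> upper case
    return " ".join(text.split())                # drop redundant whitespace


# Full-width "Java", an ideographic space and a full-width comma.
print(standardize("  \uff2a\uff41\uff56\uff41\u3000vm\uff0c  jvm\u3002 "))
# JAVA VM, JVM.
```

Applying the same function to the entries of the input word and of every candidate keeps the first and second word sets comparable.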
Specifically, the intersection of the lower-level and upper-level related word sets of the keyword is contained in the output related word set of the keyword, and the relevance of each output related word in this intersection is $T = (T_1 + T_2)/2$, where $T_1$ is the relevance between that output related word and the keyword when it is a lower-level related word, and $T_2$ is its relevance to the keyword when it is an upper-level related word.
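The merge rule can be written out directly. A minimal Python sketch (the function name `merge_output` and the data are hypothetical): words found in only one set keep their relevance, and words found in both get the mean T = (T1 + T2) / 2.

```python
from __future__ import annotations


def merge_output(lower: dict[str, float], upper: dict[str, float]) -> dict[str, float]:
    """Union of the two sets; words in both get T = (T1 + T2) / 2."""
    output: dict[str, float] = {}
    for word in sorted(lower.keys() | upper.keys()):
        if word in lower and word in upper:
            output[word] = (lower[word] + upper[word]) / 2  # T1 lower, T2 upper
        elif word in lower:
            output[word] = lower[word]
        else:
            output[word] = upper[word]
    return output


print(merge_output({"JVM": 6, "Spring": 4}, {"JVM": 5, "Kotlin": 2}))
# {'JVM': 5.5, 'Kotlin': 2, 'Spring': 4}
```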
As a further development of the present invention, the method further comprises:
subtracting the screening threshold from the relevance between the keyword and each lower-level related word in the lower-level related word set of the keyword;
subtracting the screening threshold from the relevance between the keyword and each upper-level related word in the upper-level related word set of the keyword, thereby completing the normalization of relevance.
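Normalization is a uniform shift by the screening threshold p. Because every retained lower-level relevance satisfies r > p, and an upper-level relevance is the keyword's relevance inside some lower-level set (which also passed r > p), all normalized values are positive. A short Python sketch with made-up values (the function name `normalize` is hypothetical); it also checks that the shift commutes with the averaging rule T = (T1 + T2) / 2, so it does not matter whether normalization happens before or after the union.

```python
from __future__ import annotations


def normalize(scores: dict[str, float], p: float) -> dict[str, float]:
    """Subtract the screening threshold p from every relevance value."""
    return {word: score - p for word, score in scores.items()}


p = 3
lower = {"JVM": 6, "Spring": 4}
upper = {"JVM": 5, "Kotlin": 4}
print(normalize(lower, p))  # {'JVM': 3, 'Spring': 1}
print(normalize(upper, p))  # {'JVM': 2, 'Kotlin': 1}

# The shift commutes with the averaging T = (T1 + T2) / 2:
# ((6 - 3) + (5 - 3)) / 2 == (6 + 5) / 2 - 3 == 2.5
assert ((lower["JVM"] - p) + (upper["JVM"] - p)) / 2 == (lower["JVM"] + upper["JVM"]) / 2 - p
```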
Correspondingly, an embodiment of the present invention also provides a device for providing related words, comprising:
a lower-level related word set module, configured to take a keyword input by a user as the input word, obtain the lower-level related word set of the keyword from an entry database, and determine the relevance between the keyword and each lower-level related word in the lower-level related word set;
an upper-level related word set module, configured to obtain, according to the lower-level related word set of the keyword, the upper-level related word set of the keyword from the entry database, and determine the relevance between the keyword and each upper-level related word in the upper-level related word set;
an output related word set module, configured to take the union of the lower-level and upper-level related word sets of the keyword as the output related word set of the keyword, and select from the output related word set, according to the relevance of each output related word in it, the related words to be provided to the user.
As a further improvement of the embodiment of the present invention, the upper-level related word set module specifically comprises a lower-level word set acquisition unit, a threshold judgment unit and an upper-level word set acquisition unit, wherein:
the lower-level word set acquisition unit is configured to, for each lower-level related word in the lower-level related word set, update the input word to that lower-level related word and obtain the lower-level related word set of the updated input word from the entry database;
the threshold judgment unit is configured to judge whether the total number of lower-level related word sets is greater than a preset threshold;
the upper-level word set acquisition unit is configured to, when the total number of lower-level related word sets is judged to be greater than the preset threshold, filter out from the lower-level related word sets those that contain the keyword, and take the input words corresponding to those sets as upper-level related words, thereby obtaining the upper-level related word set of the keyword; the relevance of the keyword within the lower-level related word set of a given input word is used as the relevance between that input word, as an upper-level related word, and the keyword;
the lower-level word set acquisition unit is further configured to, when the total number of lower-level related word sets is judged to be not greater than the preset threshold, continue as follows: for each lower-level related word in the lower-level related word set of each updated input word, update the input word to that lower-level related word and obtain the lower-level related word set of the re-updated input word from the entry database, until the total number of lower-level related word sets is greater than the preset threshold.
Further, the lower-level related word set module and the lower-level word set acquisition unit each further comprise units for obtaining a lower-level related word set from the entry database, specifically:
a candidate related word set unit, configured to obtain, according to the input word, the entries containing the input word from the entry database, and segment and screen these entries to obtain a candidate related word set;
a comparison word set unit, configured to, for each candidate related word in the candidate related word set, obtain the entries containing that candidate from the entry database, and segment and screen these entries to obtain the comparison word set of the candidate; and
a judgment and acquisition unit, configured to, when the cardinality of the intersection of a candidate's comparison word set with the candidate related word set is judged to be greater than the screening threshold, determine that the candidate is a lower-level related word of the input word, thereby obtaining the lower-level related word set, wherein this cardinality is used as the relevance between the lower-level related word and the keyword.
Further, the candidate related word set unit specifically comprises:
a first entry subunit, configured to obtain, according to the input word, the top-M ranked entries containing the input word from the entry database;
a first adjustment subunit, configured to adjust the format of the obtained entries according to the standard entry format;
a first calling subunit, configured to call a word segmentation tool;
a first segmentation subunit, configured to segment the format-adjusted entries with the word segmentation tool to obtain a first word set; and
a first extraction subunit, configured to extract from the first word set the words that are core words in the user dictionary as candidate related words, thereby obtaining the candidate related word set, wherein the user dictionary is provided by the word segmentation tool;
and the comparison word set unit specifically comprises:
a second entry subunit, configured to obtain, according to the candidate related word, the top-M ranked entries containing the candidate from the entry database;
a second adjustment subunit, configured to adjust the format of these entries according to the standard entry format;
a second calling subunit, configured to call the word segmentation tool;
a second segmentation subunit, configured to segment the format-adjusted entries containing the candidate with the word segmentation tool to obtain a second word set; and
a second extraction subunit, configured to extract from the second word set the words that are core words in the user dictionary as comparison words, thereby obtaining the comparison word set.
Further, the device for providing related words also comprises a normalization module:
the normalization module is configured to subtract the screening threshold from the relevance between the keyword and each lower-level related word in the lower-level related word set of the keyword, and to subtract the screening threshold from the relevance between the keyword and each upper-level related word in the upper-level related word set of the keyword, thereby completing the normalization of relevance.
Implementing the embodiments of the present invention has the following beneficial effects:
With the method and device for providing related words of the embodiments, the lower-level related word set of the keyword provided by the user is obtained from the entry database, the upper-level related word set of the keyword is then derived from that lower-level set, and finally the union of the two sets is taken as the output related word set of the keyword. This expands the number of related words offered to the user for selection. In addition, determining the relevance of each related word accurately describes the degree of correlation between the related word and the keyword, so the related words to be provided to the user can subsequently be selected by relevance, and accurate related words can be provided to the user.
Description of the drawings
Fig. 1 is a schematic flowchart of an embodiment of the method for providing related words provided by the present invention;
Fig. 2 is a schematic flowchart of an embodiment of step S2 of the method for providing related words shown in Fig. 1;
Fig. 3 is a schematic flowchart of an implementation of step S3 of the method for providing related words shown in Fig. 1;
Fig. 4 is a schematic flowchart of another embodiment of the method for providing related words provided by the present invention;
Fig. 5 is a schematic structural diagram of an embodiment of the device for providing related words provided by the present invention;
Fig. 6 is a schematic structural diagram of an embodiment of the upper-level related word set module of the device for providing related words provided by the present invention;
Fig. 7 is a schematic structural diagram of an embodiment of the unit for obtaining the lower-level related word set of the device for providing related words provided by the present invention;
Fig. 8 is a schematic structural diagram of an embodiment of the candidate related word set unit of the device for providing related words provided by the present invention;
Fig. 9 is a schematic structural diagram of an embodiment of the comparison word set unit of the device for providing related words provided by the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 2 and Fig. 3: Fig. 1 is a schematic flowchart of an embodiment of the method for providing related words provided by the present invention, Fig. 2 is a schematic flowchart of an embodiment of step S2 of the method shown in Fig. 1, and Fig. 3 is a schematic flowchart of an implementation of step S3 of the method shown in Fig. 1. With reference to these three flowcharts, the method for providing related words of this embodiment is described in detail, taking a paper database (such as the "National IP Network") as the entry database and obtaining the related words of the keyword Java from it as an example. The method comprises the following steps:
S1: taking the keyword Java input by the user as the input word, obtain the lower-level related word set of the keyword Java from the entry database, and determine the relevance between the keyword and each lower-level related word in the lower-level related word set. Step S1 comprises steps S11 to S13, as follows:
S11: according to the input word Java, obtain the entries containing Java from the paper database, and segment and screen these entries to obtain the candidate related word set $A = \{a_1, \dots, a_n\}$. This step is implemented as follows:
using a search engine, obtain from the paper database the top-M ranked entries containing the input word Java, for example the 50 paper abstracts on the first results page, or the first 500 summaries returned by searching Wikipedia for the keyword Java;
adjust the format of the entries according to the standard entry format, for example by converting lowercase letters in the entries to uppercase, deleting redundant spaces, unifying punctuation marks, and unifying full-width and half-width characters into a single form.
Call participle instrument;Preferably, described participle instrument is jieba participle instrument, but is not limited to this participle instrument.
Utilize described participle instrument that the entry after Format adjusting is carried out participle, it is thus achieved that the first word collection;
according to a keyword extraction algorithm, extract from the first word set the words related to the input word as the candidate related words $a_1, \dots, a_n$, obtaining the candidate related word set $A = \{a_1, \dots, a_n\}$. Note that a dictionary can be added through the word segmentation tool or through the device for providing related words, and the core words provided by this dictionary are then used to extract core words from the first word set as candidate related words.
S12: for each candidate related word in the candidate related word set $A = \{a_1, \dots, a_n\}$, obtain from the entry database the entries containing that candidate, and segment and screen these entries to obtain the comparison word set of the candidate. Note that step S12 is implemented in the same way as step S11, except that the input word of step S11 is replaced by the candidate related words $a_1, \dots, a_n$; the candidate related word set $B_{a_i} = \{b_{i1}, \dots, b_{in}\}$ obtained for candidate $a_i$ is then used as the comparison word set of $a_i$. The details are therefore not repeated here.
S13: when the cardinality $r = |B_{a_i} \cap A|$ of the intersection of the comparison word set $B_{a_i} = \{b_{i1}, \dots, b_{in}\}$ of candidate $a_i$ with the candidate related word set $A = \{a_1, \dots, a_n\}$ is greater than the screening threshold $p$, i.e., when $B_{a_i}$ and $A$ share more than $p$ elements, candidate $a_i$ is a lower-level related word of the input word Java. This yields the lower-level related word set of the keyword
$$A' = \{a_j \mid j \in \{1, \dots, n\},\ |A \cap B_{a_j}| > p\}, \qquad |A'| \le n,$$
where the intersection size $r$ is the relevance of the lower-level related word within the lower-level related word set. Note that the relevance expresses the degree of correlation between a related word in a set and the input word of that set.
Obtaining the lower-level related word set of the input word through steps S11, S12 and S13 filters out noise words and improves the efficiency of obtaining lower-level related words.
S2: according to the lower-level related word set $A' = \{a_j\}$ of the keyword, obtain the upper-level related word set of the keyword from the entry database, and determine the relevance between the keyword and each upper-level related word in the upper-level related word set. This step specifically comprises the following steps S21 to S24:
S21: for each lower-level related word $a_j$ in the lower-level related word set $A' = \{a_j\}$, update the input word to $a_j$, i.e., use $a_j$ as the input word, and obtain the lower-level related word set $A''$ of the updated input word $a_j$ from the paper database. Note that in this embodiment the lower-level related word set in step S21 is preferably obtained in the same way as in step S1 above, so the details are not repeated here.
S22: judge whether the total number $N$ of lower-level related word sets obtained so far is greater than the preset threshold $S$;
S23: if so, filter out all lower-level related word sets that contain the keyword, and take the input words corresponding to those sets as upper-level related words, obtaining the upper-level related word set $C$ of the keyword; the relevance of the keyword within the lower-level related word set of a given input word is used as the relevance between that input word, as an upper-level related word, and the keyword;
S24: if not, continue as follows: for each lower-level related word in the lower-level related word set of each updated input word, update the input word to that lower-level related word and obtain the lower-level related word set of the re-updated input word from the entry database, until the total number $N$ of lower-level related word sets is greater than the preset threshold $S$. Note that the lower-level related word sets in steps S21 and S24 are also obtained in the same way as in step S1 above, so the details are not repeated here.
In other words, for the keyword Java, the upper-level related word set of Java is the set of input words whose lower-level related word sets contain Java as an element. Deriving the upper-level related word set of the keyword in reverse, with the same procedure used to obtain lower-level related word sets, allows related words to be provided to the user from multiple dimensions.
S3: take the union of the lower-level and upper-level related word sets of the keyword as the output related word set of the keyword, and select from the output related word set, according to the relevance of each output related word in it, the related words to be provided to the user.
The intersection of the lower-level and upper-level related word sets of the keyword is contained in the output related word set of the keyword, and the relevance of each output related word in this intersection is $T = (T_1 + T_2)/2$, where $T_1$ is the relevance between that output related word and the keyword when it is a lower-level related word, and $T_2$ is its relevance to the keyword when it is an upper-level related word. That is, after the union, the relevance of a related word that appears in both the upper-level and the lower-level related word sets is the mean of its relevance values in the two sets.
As a further improvement on the present invention, described acquisition methods also includes being normalized degree of association:
Each the next related term concentrated by the next related term of described key word is equal with the degree of association of described key word
Deduct described screening threshold value;
Each the upper related term concentrated by the upper related term of described key word is equal with the degree of association of described key word
Deduct described screening threshold value, complete the normalization of degree of association.
Normalization places the relevance values of the related terms in the keyword's output related term set on a common baseline of 0. The higher the value, the more closely the related term is related to the keyword. This makes it easier to select the related terms to be provided to the user from the output related term set in step S4.
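The normalization is a constant shift, which this minimal sketch shows. It assumes each related term set is a dictionary mapping a term to its relevance. `normalize` is a hypothetical name and the numbers are illustrative. A lower-level related term is kept only when its relevance exceeds the screening threshold. An upper-level relevance is the keyword's relevance inside such a lower-level set. Every normalized value is therefore positive.

```python
from typing import Dict


def normalize(relevance: Dict[str, float], screening_threshold: float) -> Dict[str, float]:
    """Shift every relevance value down by the screening threshold."""
    return {term: score - screening_threshold for term, score in relevance.items()}


# Illustrative values: intersection sizes 8 and 5 with a screening threshold of 3.
lower = normalize({"neural network": 8, "perceptron": 5}, 3)
print(lower)  # {'neural network': 5, 'perceptron': 2}
```

The same call is applied separately to the lower-level and the upper-level related term sets.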
The method for providing related terms of this embodiment checks each candidate lower-level related term against its control word set. Only candidates that pass this check are used as lower-level related terms. This filters out the influence of noise words and improves the quality of the related terms obtained, which ensures the accuracy of the related terms provided to the user. After the keyword's lower-level related term set is obtained, the method continues through further lower-level related term sets and derives the keyword's upper-level related terms in reverse. This can greatly increase the number of related terms provided to the user while ensuring the quality of the upper-level related terms.
Fig. 4 is a schematic flowchart of another embodiment of the method for providing related terms provided by the present invention. In this embodiment, a paper database and a Wikipedia database are each used as the entry database, yielding a first output related term set and a second output related term set respectively. The union of the first and second output related term sets is then used as the keyword's final output related term set. The second output related term set is obtained from the Wikipedia database in the same way that the output related term set is obtained from the paper database in the previous embodiment. This embodiment mines related terms from two different entry databases, the paper database and the Wikipedia database. Doing so makes the expansion of related terms more targeted and avoids the one-sided related terms that a single corpus would give the user.
Correspondingly, Fig. 5 is a structural diagram of an embodiment of the apparatus for providing related terms provided by the present invention. The apparatus can implement the entire flow of both embodiments above and includes:
a lower-level related term set module 10, configured to take the keyword input by the user as the input word, obtain the keyword's lower-level related term set from the entry database, and determine the relevance of each lower-level related term in that set to the keyword;
an upper-level related term set module 20, configured to obtain the keyword's upper-level related term set from the entry database according to the keyword's lower-level related term set, and determine the relevance of each upper-level related term in that set to the keyword;
an output related term set module 30, configured to take the union of the keyword's lower-level and upper-level related term sets as the keyword's output related term set and, according to the relevance of each output related term in that set, select from it the related terms to be provided to the user.
As a further improvement of the embodiment of the present invention, Fig. 6 is a structural diagram of an embodiment of the upper-level related term set module of the apparatus for providing related terms provided by the present invention. The upper-level related term set module 20 specifically includes a lower-level word set acquisition unit 31, a threshold judgment unit 32 and an upper-level word set acquisition unit 33, where:
the lower-level word set acquisition unit 31 is configured to, for each lower-level related term in the lower-level related term set, update the input word to that lower-level related term and obtain the lower-level related term set of the updated input word from the entry database;
the threshold judgment unit 32 is configured to judge whether the total number of lower-level related term sets is greater than a preset threshold;
the upper-level word set acquisition unit 33 is configured to act when the total number of lower-level related term sets is judged to be greater than the preset threshold. It selects, from the lower-level related term sets, those that contain the keyword, and takes the input word corresponding to each such set as an upper-level related term, thereby obtaining the keyword's upper-level related term set. Within a lower-level related term set that contains the keyword, the keyword has a relevance to that set's input word. This relevance is used as the input word's relevance to the keyword when the input word acts as an upper-level related term;
the lower-level word set acquisition unit 31 is further configured to act when the total number of lower-level related term sets is judged to be not greater than the preset threshold. It then continues as follows: for each lower-level related term in the lower-level related term set of the updated input word, it updates the input word again to that lower-level related term and obtains the lower-level related term set of the re-updated input word from the entry database. This repeats until the total number of lower-level related term sets is greater than the preset threshold.
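The iteration performed by units 31 to 33 can be sketched as below. This is a hedged illustration that relies on several assumptions not stated in the text:
- `get_lower` is a hypothetical callback that returns the lower-level related term set (term to relevance) of any input word.
- The "total number of lower-level related term sets" is taken to be the number of input words expanded so far.
- Already-expanded words and the keyword itself are skipped.
- `max_rounds` is an added safety cap.

The toy data is invented for demonstration.

```python
from typing import Callable, Dict, List

LowerSet = Dict[str, float]  # lower-level related term -> relevance


def upper_level_set(
    keyword: str,
    get_lower: Callable[[str], LowerSet],
    set_count_threshold: int,
    max_rounds: int = 10,
) -> Dict[str, float]:
    """Derive upper-level related terms by expanding lower-level sets outward."""
    collected: Dict[str, LowerSet] = {}  # input word -> its lower-level set
    frontier: List[str] = list(get_lower(keyword))
    for _ in range(max_rounds):  # safety cap (an assumption, not in the text)
        next_frontier: List[str] = []
        for word in frontier:
            if word == keyword or word in collected:
                continue  # assumption: never re-expand a word already seen
            lower = get_lower(word)  # update the input word to this term
            collected[word] = lower
            next_frontier.extend(lower)
        # Stop once enough lower-level sets exist, or nothing new can be expanded.
        if len(collected) > set_count_threshold or not next_frontier:
            break
        frontier = next_frontier
    # Each input word whose lower-level set contains the keyword is an upper-level
    # term; its relevance is the keyword's relevance inside that set.
    return {word: s[keyword] for word, s in collected.items() if keyword in s}


TOY: Dict[str, LowerSet] = {
    "cnn": {"convolution": 3, "pooling": 3},
    "convolution": {"kernel": 2, "cnn": 4},
    "pooling": {"max pooling": 2},
    "kernel": {"convolution": 2},
    "max pooling": {"pooling": 3, "cnn": 1},
}


def toy_lower(word: str) -> LowerSet:
    return TOY.get(word, {})


print(upper_level_set("cnn", toy_lower, set_count_threshold=2))
# {'convolution': 4, 'max pooling': 1}
```

In this toy run, "convolution" is both a lower-level related term (relevance 3) and an upper-level related term (relevance 4) of "cnn". That is exactly the overlap case the averaging rule T = (T1 + T2)/2 is meant to resolve.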
Further, the lower-level related term set module 10 and the lower-level word set acquisition unit 31 each also include a unit for obtaining a lower-level related term set from the entry database. Fig. 7 is a structural diagram of an embodiment of this unit of the apparatus for providing related terms provided by the present invention. The unit specifically includes:
a candidate related term set unit 1, configured to obtain, according to the input word, the entries containing the input word from the entry database, and segment and screen those entries to obtain a candidate related term set;
a control word set unit 2, configured to, for each candidate related term in the candidate related term set, obtain from the entry database the entries containing that candidate related term, and segment and screen those entries to obtain the control word set of the candidate related term; and
a judgment and acquisition unit 3. When the absolute value (that is, the size) of the intersection of a candidate related term's control word set and the candidate related term set is judged to be greater than the screening threshold, this unit determines that the candidate related term is a lower-level related term of the input word, thereby obtaining the lower-level related term set. This absolute value is used as the relevance of the lower-level related term to the keyword.
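The candidate-versus-control check of units 1 to 3 amounts to a set-intersection filter, sketched below. The sketch assumes a hypothetical helper `word_set_for(word)` that returns the segmented and screened word set built from the entries containing `word`. Skipping the input word itself is also an assumption, and the toy word sets are invented. The size of the intersection plays the role of the "absolute value" in the text and doubles as the relevance.

```python
from typing import Callable, Dict, Set


def lower_level_set(
    input_word: str,
    word_set_for: Callable[[str], Set[str]],
    screening_threshold: int,
) -> Dict[str, int]:
    """Keep candidates whose control word set overlaps the candidate set enough."""
    candidates = word_set_for(input_word)  # candidate related term set
    result: Dict[str, int] = {}
    for cand in candidates:
        if cand == input_word:
            continue  # assumption: a word is not its own related term
        control = word_set_for(cand)  # control word set of this candidate
        overlap = len(control & candidates)  # size of the intersection
        if overlap > screening_threshold:
            result[cand] = overlap  # the overlap size doubles as the relevance
    return result


TOY_WORD_SETS: Dict[str, Set[str]] = {
    "cnn": {"cnn", "convolution", "pooling", "neural network"},
    "convolution": {"convolution", "cnn", "pooling", "kernel"},
    "pooling": {"pooling", "cnn", "convolution"},
    "neural network": {"neural network", "perceptron"},
}


def toy_word_set(word: str) -> Set[str]:
    return TOY_WORD_SETS.get(word, set())


print(sorted(lower_level_set("cnn", toy_word_set, screening_threshold=1).items()))
# [('convolution', 3), ('pooling', 3)]
```

In this example, "neural network" is rejected as noise. Its control word set shares only one word with the candidate set, which does not exceed the threshold of 1.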
Further, Fig. 8 is a structural diagram of an embodiment of the candidate related term set unit of the apparatus for providing related terms provided by the present invention. The candidate related term set unit 1 specifically includes:
a first entry subunit 11, configured to obtain from the entry database, according to the input word, the entries that contain the input word and are ranked in the top M;
a first adjustment subunit 12, configured to adjust the format of the obtained entries according to a standard entry format;
a first calling subunit 13, configured to call a word segmentation tool;
a first segmentation subunit 14, configured to segment the format-adjusted entries with the word segmentation tool to obtain a first word set; and
a first extraction subunit 15, configured to extract, from the first word set, the words that belong to the core words in a user dictionary as candidate related terms, thereby obtaining the candidate related term set. The user dictionary is provided by the word segmentation tool.
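The word-set pipeline of subunits 11 to 15 might look like the sketch below. The control word set subunits 21 to 25 follow the same pipeline. Every concrete choice in the sketch is an assumption:
- Entries are plain strings already ranked by the database.
- The "standard entry format" is modelled as lowercasing and whitespace collapsing.
- The text does not name the word segmentation tool, so a small forward-maximum-matching segmenter over whitespace tokens stands in for it.
- Every user-dictionary entry is treated as a core word.

The function names and sample data are hypothetical.

```python
import re
from typing import Iterable, List, Set


def segment(text: str, user_dict: Set[str], max_len: int = 3) -> List[str]:
    """Stand-in segmenter: forward maximum matching of multi-token dictionary words."""
    tokens = re.findall(r"\w+", text)
    words: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if n == 1 or phrase in user_dict:  # single tokens are always accepted
                words.append(phrase)
                i += n
                break
    return words


def build_word_set(word: str, entries: Iterable[str], m: int, user_dict: Set[str]) -> Set[str]:
    # Entries are assumed pre-ranked; keep the top M that contain the word.
    top_m = [e for e in entries if word.lower() in e.lower()][:m]
    # Hypothetical "standard entry format": lowercase text with single spaces.
    adjusted = [re.sub(r"\s+", " ", e.lower()).strip() for e in top_m]
    first_word_set: Set[str] = set()
    for text in adjusted:
        first_word_set.update(segment(text, user_dict))
    # Keep only the words that are core words of the user dictionary.
    return first_word_set & user_dict


ENTRIES = [
    "A CNN uses Convolution and pooling layers.",
    "Pooling reduces   the size of a CNN feature map.",
    "Decision trees split data.",
]
USER_DICT = {"cnn", "convolution", "pooling", "feature map"}
print(sorted(build_word_set("cnn", ENTRIES, m=2, user_dict=USER_DICT)))
# ['cnn', 'convolution', 'feature map', 'pooling']
```

Consulting the user dictionary during segmentation keeps multi-word terms such as "feature map" intact. The final intersection then discards ordinary words such as "uses" or "size".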
In addition, Fig. 9 is a structural diagram of an embodiment of the control word set unit of the apparatus for providing related terms provided by the present invention. The control word set unit 2 specifically includes:
a second entry subunit 21, configured to obtain from the entry database, according to the candidate related term, the entries that contain the candidate related term and are ranked in the top M;
a second adjustment subunit 22, configured to adjust the format of these entries according to the standard entry format;
a second calling subunit 23, configured to call the word segmentation tool;
a second segmentation subunit 24, configured to segment the format-adjusted entries with the word segmentation tool to obtain a second word set; and
a second extraction subunit 25, configured to extract, from the second word set, the words that belong to the core words in the user dictionary as control words, thereby obtaining the control word set.
Specifically, the keyword's output related term set contains the intersection of the keyword's lower-level related term set and upper-level related term set. The relevance of each output related term in this intersection is T, T = (T1 + T2)/2, where T1 is the relevance of the output related term to the keyword when it acts as a lower-level related term, and T2 is its relevance to the keyword when it acts as an upper-level related term.
Further, as shown in Fig. 5, the apparatus for providing related terms also includes a normalization module 40.
The normalization module subtracts the screening threshold from the relevance, to the keyword, of each lower-level related term in the keyword's lower-level related term set. It also subtracts the screening threshold from the relevance, to the keyword, of each upper-level related term in the keyword's upper-level related term set. This completes the normalization of the relevance values.
The apparatus for providing related terms of this embodiment checks each candidate lower-level related term against its control word set. Only candidates that pass this check are used as lower-level related terms. This filters out the influence of noise words and improves the quality of the related terms obtained, which ensures the accuracy of the related terms provided to the user. After the keyword's lower-level related term set is obtained, the apparatus continues through further lower-level related term sets and derives the keyword's upper-level related terms in reverse. This can greatly increase the number of related terms provided to the user while ensuring the quality of the upper-level related terms.
A person of ordinary skill in the art will understand that all or part of the flow of the methods in the above embodiments can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flow of the embodiments of the methods above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above are preferred embodiments of the present invention. A person of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications are also regarded as falling within the protection scope of the present invention.
Claims (11)
1. A method for providing related terms, characterized by comprising:
taking a keyword input by a user as the input word, obtaining the keyword's lower-level related term set from an entry database, and determining the relevance of each lower-level related term in the lower-level related term set to the keyword;
obtaining, according to the keyword's lower-level related term set, the keyword's upper-level related term set from the entry database, and determining the relevance of each upper-level related term in the upper-level related term set to the keyword;
taking the union of the keyword's lower-level related term set and upper-level related term set as the keyword's output related term set, and selecting from the output related term set, according to the relevance of each output related term in that set, the related terms to be provided to the user.
2. The method for providing related terms according to claim 1, characterized in that obtaining the keyword's upper-level related term set from the entry database according to the keyword's lower-level related term set, and determining the relevance of each upper-level related term in the upper-level related term set to the keyword, specifically comprises:
for each lower-level related term in the lower-level related term set, updating the input word to that lower-level related term, and obtaining the lower-level related term set of the updated input word from the entry database;
judging whether the total number of lower-level related term sets is greater than a preset threshold;
if so, selecting, from the lower-level related term sets, those that contain the keyword, and taking the input word corresponding to each lower-level related term set that contains the keyword as an upper-level related term, thereby obtaining the keyword's upper-level related term set; wherein the relevance of the keyword, within a lower-level related term set that contains the keyword, to the input word corresponding to that set is used as the relevance of that input word to the keyword when it acts as an upper-level related term;
if not, continuing with the following operation: for each lower-level related term in the lower-level related term set of the updated input word, updating the input word again to that lower-level related term, and obtaining the lower-level related term set of the re-updated input word from the entry database, until the total number of lower-level related term sets is greater than the preset threshold.
3. The method for providing related terms according to claim 2, characterized in that obtaining a lower-level related term set from the entry database specifically comprises:
obtaining, according to the input word, the entries containing the input word from the entry database, and segmenting and screening the entries to obtain a candidate related term set;
for each candidate related term in the candidate related term set, obtaining, according to the candidate related term, the entries containing the candidate related term from the entry database, and segmenting and screening those entries to obtain a control word set of the candidate related term;
when the absolute value of the intersection of the candidate related term's control word set and the candidate related term set is judged to be greater than a screening threshold, determining that the candidate related term is a lower-level related term of the input word, thereby obtaining the lower-level related term set; wherein the absolute value is used as the relevance of the lower-level related term to the keyword.
4. The method for providing related terms according to claim 3, characterized in that obtaining, according to the input word, the entries containing the input word from the entry database, and segmenting and screening the entries to obtain the candidate related term set, specifically comprises:
obtaining from the entry database, according to the input word, the entries that contain the input word and are ranked in the top M;
adjusting the format of the obtained entries according to a standard entry format;
calling a word segmentation tool;
segmenting the format-adjusted entries with the word segmentation tool to obtain a first word set;
extracting, from the first word set, the words that belong to the core words in a user dictionary as candidate related terms, thereby obtaining the candidate related term set; wherein the user dictionary is provided by the word segmentation tool;
and in that obtaining, according to the candidate related term, the entries containing the candidate related term from the entry database, and segmenting and screening those entries to obtain the control word set of the candidate related term, specifically comprises:
obtaining from the entry database, according to the candidate related term, the entries that contain the candidate related term and are ranked in the top M;
adjusting the format of these entries according to the standard entry format;
calling the word segmentation tool;
segmenting the format-adjusted entries with the word segmentation tool to obtain a second word set;
extracting, from the second word set, the words that belong to the core words in the user dictionary as control words, thereby obtaining the control word set.
5. The method for providing related terms according to claim 1, characterized in that the intersection of the keyword's lower-level related term set and upper-level related term set is included in the keyword's output related term set, and the relevance of each output related term in the intersection is T, T = (T1 + T2)/2; wherein T1 is the relevance of the output related term to the keyword when it acts as a lower-level related term, and T2 is its relevance to the keyword when it acts as an upper-level related term.
6. The method for providing related terms according to claim 3, characterized in that the method further comprises:
subtracting the screening threshold from the relevance, to the keyword, of each lower-level related term in the keyword's lower-level related term set;
subtracting the screening threshold from the relevance, to the keyword, of each upper-level related term in the keyword's upper-level related term set, thereby completing the normalization of the relevance.
7. An apparatus for providing related terms, characterized by comprising:
a lower-level related term set module, configured to take a keyword input by a user as the input word, obtain the keyword's lower-level related term set from an entry database, and determine the relevance of each lower-level related term in the lower-level related term set to the keyword;
an upper-level related term set module, configured to obtain, according to the keyword's lower-level related term set, the keyword's upper-level related term set from the entry database, and determine the relevance of each upper-level related term in the upper-level related term set to the keyword;
an output related term set module, configured to take the union of the keyword's lower-level related term set and upper-level related term set as the keyword's output related term set, and select from the output related term set, according to the relevance of each output related term in that set, the related terms to be provided to the user.
8. The apparatus for providing related terms according to claim 7, characterized in that the upper-level related term set module specifically comprises a lower-level word set acquisition unit, a threshold judgment unit and an upper-level word set acquisition unit, wherein:
the lower-level word set acquisition unit is configured to, for each lower-level related term in the lower-level related term set, update the input word to that lower-level related term, and obtain the lower-level related term set of the updated input word from the entry database;
the threshold judgment unit is configured to judge whether the total number of lower-level related term sets is greater than a preset threshold;
the upper-level word set acquisition unit is configured to, when the total number of lower-level related term sets is judged to be greater than the preset threshold, select from the lower-level related term sets those that contain the keyword, and take the input word corresponding to each lower-level related term set that contains the keyword as an upper-level related term, thereby obtaining the keyword's upper-level related term set; wherein the relevance of the keyword, within a lower-level related term set that contains the keyword, to the input word corresponding to that set is used as the relevance of that input word to the keyword when it acts as an upper-level related term;
the lower-level word set acquisition unit is further configured to, when the total number of lower-level related term sets is judged to be not greater than the preset threshold, continue with the following operation: for each lower-level related term in the lower-level related term set of the updated input word, update the input word again to that lower-level related term, and obtain the lower-level related term set of the re-updated input word from the entry database, until the total number of lower-level related term sets is greater than the preset threshold.
9. The apparatus for providing related terms according to claim 8, characterized in that the lower-level related term set module and the lower-level word set acquisition unit each further comprise a unit for obtaining a lower-level related term set from the entry database, specifically:
a candidate related term set unit, configured to obtain, according to the input word, the entries containing the input word from the entry database, and segment and screen the entries to obtain a candidate related term set;
a control word set unit, configured to, for each candidate related term in the candidate related term set, obtain, according to the candidate related term, the entries containing the candidate related term from the entry database, and segment and screen those entries to obtain a control word set of the candidate related term; and
a judgment and acquisition unit, configured to, when the absolute value of the intersection of the candidate related term's control word set and the candidate related term set is judged to be greater than a screening threshold, determine that the candidate related term is a lower-level related term of the input word, thereby obtaining the lower-level related term set; wherein the absolute value is used as the relevance of the lower-level related term to the keyword.
10. The apparatus for providing related terms according to claim 9, characterized in that the candidate related term set unit specifically comprises:
a first entry subunit, configured to obtain from the entry database, according to the input word, the entries that contain the input word and are ranked in the top M;
a first adjustment subunit, configured to adjust the format of the obtained entries according to a standard entry format;
a first calling subunit, configured to call a word segmentation tool;
a first segmentation subunit, configured to segment the format-adjusted entries with the word segmentation tool to obtain a first word set; and
a first extraction subunit, configured to extract, from the first word set, the words that belong to the core words in a user dictionary as candidate related terms, thereby obtaining the candidate related term set; wherein the user dictionary is provided by the word segmentation tool;
and in that the control word set unit specifically comprises:
a second entry subunit, configured to obtain from the entry database, according to the candidate related term, the entries that contain the candidate related term and are ranked in the top M;
a second adjustment subunit, configured to adjust the format of these entries according to the standard entry format;
a second calling subunit, configured to call the word segmentation tool;
a second segmentation subunit, configured to segment the format-adjusted entries with the word segmentation tool to obtain a second word set; and
a second extraction subunit, configured to extract, from the second word set, the words that belong to the core words in the user dictionary as control words, thereby obtaining the control word set.
11. The apparatus for providing related terms according to claim 10, characterized in that the apparatus further comprises a normalization module:
the normalization module is configured to subtract the screening threshold from the relevance, to the keyword, of each lower-level related term in the keyword's lower-level related term set, and to subtract the screening threshold from the relevance, to the keyword, of each upper-level related term in the keyword's upper-level related term set, thereby completing the normalization of the relevance.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610445489.2A CN106126588B (en) | 2016-06-17 | 2016-06-17 | The method and apparatus of related term are provided |
PCT/CN2016/113175 WO2017215244A1 (en) | 2016-06-17 | 2016-12-29 | Method and device for providing relevant words |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610445489.2A CN106126588B (en) | 2016-06-17 | 2016-06-17 | The method and apparatus of related term are provided |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106126588A true CN106126588A (en) | 2016-11-16 |
CN106126588B CN106126588B (en) | 2019-09-20 |
Family ID: 57470913
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610445489.2A Active CN106126588B (en) | 2016-06-17 | 2016-06-17 | The method and apparatus of related term are provided |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106126588B (en) |
WO (1) | WO2017215244A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103810274A (en) * | 2014-02-12 | 2014-05-21 | 北京联合大学 | Multi-feature image tag sorting method based on WordNet semantic similarity |
CN104008097A (en) * | 2013-02-21 | 2014-08-27 | 日电(中国)有限公司 | Method and device for achieving query understanding |
CN104123351A (en) * | 2014-07-09 | 2014-10-29 | 百度在线网络技术(北京)有限公司 | Interactive search method and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5044236B2 (en) * | 2007-01-12 | 2012-10-10 | 富士フイルム株式会社 | Content search device and content search method |
TW201214163A (en) * | 2010-09-21 | 2012-04-01 | Inventec Corp | Searching system and method thereof with generating extending keywords according to input keywords |
CN103778262B (en) * | 2014-03-06 | 2017-07-21 | 北京林业大学 | Information retrieval method and device based on thesaurus |
CN106126588B (en) * | 2016-06-17 | 2019-09-20 | 广州视源电子科技股份有限公司 | The method and apparatus of related term are provided |
Events
- 2016-06-17: application CN201610445489.2A filed in China (CN106126588B, status: active)
- 2016-12-29: application PCT/CN2016/113175 filed (WO2017215244A1)
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017215244A1 (en) * | 2016-06-17 | 2017-12-21 | Guangzhou Shiyuan Electronic Technology Co., Ltd. | Method and device for providing relevant words |
CN108304366A (en) * | 2017-03-21 | 2018-07-20 | Tencent Technology (Shenzhen) Co., Ltd. | Hypernym detection method and device |
WO2018171499A1 (en) * | 2017-03-21 | 2018-09-27 | Tencent Technology (Shenzhen) Co., Ltd. | Information detection method, device and storage medium |
CN108628832A (en) * | 2018-05-08 | 2018-10-09 | China United Network Communications Group Co., Ltd. | Information keyword acquisition method and device |
CN108628832B (en) * | 2018-05-08 | 2022-03-18 | China United Network Communications Group Co., Ltd. | Method and device for acquiring information keywords |
CN109241525A (en) * | 2018-08-20 | 2019-01-18 | Shenzhen Zhuiyi Technology Co., Ltd. | Keyword extraction method, device and system |
CN109241525B (en) * | 2018-08-20 | 2022-05-06 | Shenzhen Zhuiyi Technology Co., Ltd. | Keyword extraction method, device and system |
Also Published As
Publication number | Publication date |
---|---|
CN106126588B (en) | 2019-09-20 |
WO2017215244A1 (en) | 2017-12-21 |
Similar Documents
Publication | Title |
---|---|
CN102915342B (en) | Providing a topic-based search index |
CN102279851B (en) | Intelligent navigation method, device and system |
CN106126588A (en) | Method and apparatus for providing related terms |
CN105260362B (en) | New word extraction method and apparatus |
CN105446989B (en) | Search method and device, and display device |
CN105956161A (en) | Information recommendation method and apparatus |
CN103942712A (en) | E-commerce recommendation system and method based on product similarity |
US20190197071A1 | System and method for evaluating nodes of funnel model |
CN106126589A (en) | Resume searching method and device |
CN105302815B (en) | Method and device for filtering webpage uniform resource locators (URLs) |
CN105847288A (en) | Verification code processing method and device |
CN106294535A (en) | Website recognition method and device |
CN108876470A (en) | Tagged-user expansion method, computer device and storage medium |
CN105069077A (en) | Search method and device |
CN105631007A (en) | Industry technical information collection method and system |
CN106156114A (en) | Patent retrieval method and device |
CN104699837B (en) | Method, device and server for selecting illustrated pictures of web pages |
CN102902788B (en) | Automatic grouping system and method for browser web page tabs |
CN108197243A (en) | Input association recommendation method and device based on user identity |
CN107220378A (en) | Table sorting method and device, and web page display method and device |
CN104268572A (en) | Feature extraction and feature selection method oriented to background multi-source data |
CN107783962A (en) | Method and device for query statement |
CN104102704B (en) | System control display method and device |
CN106650610A (en) | Facial expression data collection method and device |
CN111488434B (en) | Recommendation method and device for input associative words, storage medium and electronic equipment |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |