CN106126588A - Method and apparatus for providing related terms - Google Patents

Method and apparatus for providing related terms

Info

Publication number
CN106126588A
Authority
CN
China
Prior art keywords
word
related term
tested
word set
entry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610445489.2A
Other languages
Chinese (zh)
Other versions
CN106126588B (en)
Inventor
李贤�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN201610445489.2A priority Critical patent/CN106126588B/en
Publication of CN106126588A publication Critical patent/CN106126588A/en
Priority to PCT/CN2016/113175 priority patent/WO2017215244A1/en
Application granted granted Critical
Publication of CN106126588B publication Critical patent/CN106126588B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The invention discloses a method for providing related terms, comprising: taking a keyword input by a user as the input word, obtaining a lower related-term set of the keyword from an entry database, and determining the relevance between each lower related term in that set and the keyword; obtaining, according to the lower related-term set of the keyword, an upper related-term set of the keyword from the entry database, and determining the relevance between each upper related term in that set and the keyword; and taking the union of the lower and upper related-term sets as the output related-term set of the keyword and, according to the relevance of each output related term in that set, selecting from it the related terms to be provided to the user. Correspondingly, the invention also discloses an apparatus for providing related terms. The embodiments of the invention can provide more, and more accurate, related terms.

Description

Method and apparatus for providing related terms
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for providing related terms.
Background art
At present, shopping websites and search-engine websites all offer keyword search: the user enters a keyword for the goods or technology to be searched, and the server searches for corresponding results according to that keyword and returns them to the user. To provide accurate search results, the server usually expands the keyword, that is, it finds the related terms corresponding to the keyword entered by the user and offers them to the user, so that when a search by the keyword fails to yield satisfactory results, the user can search with the related terms instead. However, existing related-term expansion relies on existing dictionaries such as WordNet or the Chinese thesaurus Tongyici Cilin. The related terms obtained this way are rather limited and may not keep up with the development of the language, so they fail to meet the timeliness requirement for related terms.
Summary of the invention
Embodiments of the present invention propose a method and apparatus for providing related terms that can provide more, and more accurate, related terms.
A method for providing related terms proposed by an embodiment of the present invention comprises:
taking a keyword input by a user as the input word, obtaining a lower related-term set of the keyword from an entry database, and determining the relevance between each lower related term in the lower related-term set and the keyword;
obtaining, according to the lower related-term set of the keyword, an upper related-term set of the keyword from the entry database, and determining the relevance between each upper related term in the upper related-term set and the keyword;
taking the union of the lower and upper related-term sets of the keyword as the output related-term set of the keyword, and selecting, according to the relevance of each output related term in that set, the related terms to be provided to the user from the output related-term set.
As a further improvement of the embodiment of the present invention, obtaining the upper related-term set of the keyword from the entry database according to its lower related-term set, and determining the relevance between each upper related term in that set and the keyword, specifically comprises:
for each lower related term in the lower related-term set, updating the input word with that lower related term, and obtaining the lower related-term set of the updated input word from the entry database;
judging whether the total number of lower related-term sets is greater than a preset threshold;
if so, filtering out, from the lower related-term sets, those that contain the keyword, and taking the input words corresponding to these sets as upper related terms to obtain the upper related-term set of the keyword, where the relevance of the keyword within the lower related-term set of a given input word serves as the relevance between that input word, as an upper related term, and the keyword;
if not, continuing with the following operation: for each lower related term in the lower related-term set of the updated input word, updating the input word again with that lower related term and obtaining the lower related-term set of the re-updated input word from the entry database, until the total number of lower related-term sets is greater than the preset threshold.
Further, obtaining a lower related-term set from the entry database specifically comprises:
according to the input word, obtaining the entries containing the input word from the entry database, and segmenting and screening those entries to obtain a candidate related-term set;
for each candidate related term in the candidate related-term set, obtaining the entries containing that candidate from the entry database, and segmenting and screening them to obtain the comparison word set of the candidate;
when the cardinality of the intersection of the candidate's comparison word set and the candidate related-term set is judged to be greater than a screening threshold, taking the candidate as a lower related term of the input word, thereby obtaining the lower related-term set, where this cardinality serves as the relevance between the lower related term and the keyword.
As a further improvement of the present invention, obtaining the entries containing the input word from the entry database according to the input word, and segmenting and screening them to obtain the candidate related-term set, specifically comprises:
according to the input word, obtaining from the entry database the top M ranked entries that contain the input word;
adjusting the format of the obtained entries according to a standard entry format;
calling a word-segmentation tool;
segmenting the format-adjusted entries with the segmentation tool to obtain a first word set;
extracting from the first word set the words that are core words in a user dictionary as candidate related terms, to obtain the candidate related-term set, where the user dictionary is provided by the segmentation tool;
and, obtaining from the entry database the entries containing a candidate related term, and segmenting and screening them to obtain the comparison word set of the candidate, specifically comprises:
according to the candidate related term, obtaining from the entry database the top M ranked entries that contain it;
adjusting the format of those entries according to the standard entry format;
calling the segmentation tool;
segmenting the format-adjusted entries with the segmentation tool to obtain a second word set;
extracting from the second word set the words that are core words in the user dictionary as comparison words, to obtain the comparison word set.
Specifically, the intersection of the lower and upper related-term sets of the keyword is included in the output related-term set of the keyword, and the relevance of each output related term in that intersection is $T = (T_1 + T_2)/2$, where $T_1$ is the relevance between this output related term, as a lower related term, and the keyword, and $T_2$ is its relevance to the keyword as an upper related term.
As a further improvement of the present invention, the method further comprises:
subtracting the screening threshold from the relevance between each lower related term in the keyword's lower related-term set and the keyword;
subtracting the screening threshold from the relevance between each upper related term in the keyword's upper related-term set and the keyword, thereby normalizing the relevance.
Correspondingly, an embodiment of the present invention also provides an apparatus for providing related terms, comprising:
a lower related-term set module, configured to take the keyword input by a user as the input word, obtain the lower related-term set of the keyword from an entry database, and determine the relevance between each lower related term in that set and the keyword;
an upper related-term set module, configured to obtain the upper related-term set of the keyword from the entry database according to its lower related-term set, and determine the relevance between each upper related term in that set and the keyword;
an output related-term set module, configured to take the union of the keyword's lower and upper related-term sets as its output related-term set and, according to the relevance of each output related term in that set, select from it the related terms to be provided to the user.
As a further improvement of the embodiment of the present invention, the upper related-term set module specifically comprises a lower word-set acquisition unit, a threshold judgment unit and an upper word-set acquisition unit, wherein:
the lower word-set acquisition unit is configured to, for each lower related term in the lower related-term set, update the input word with that lower related term and obtain the lower related-term set of the updated input word from the entry database;
the threshold judgment unit is configured to judge whether the total number of lower related-term sets is greater than the preset threshold;
the upper word-set acquisition unit is configured to, when the total number of lower related-term sets is judged to be greater than the preset threshold, filter out the lower related-term sets that contain the keyword and take the input words corresponding to them as upper related terms, obtaining the upper related-term set of the keyword, where the relevance of the keyword within the lower related-term set of a given input word serves as the relevance between that input word, as an upper related term, and the keyword;
the lower word-set acquisition unit is further configured to, when the total number of lower related-term sets is judged not to be greater than the preset threshold, continue with the following operation: for each lower related term in the lower related-term set of the updated input word, update the input word again with that lower related term and obtain the lower related-term set of the re-updated input word from the entry database, until the total number of lower related-term sets is greater than the preset threshold.
Further, the lower related-term set module and the lower word-set acquisition unit each include units for obtaining a lower related-term set from the entry database, specifically:
a candidate related-term set unit, configured to obtain the entries containing the input word from the entry database according to the input word, and segment and screen those entries to obtain the candidate related-term set;
a comparison word-set unit, configured to, for each candidate related term in the candidate related-term set, obtain the entries containing that candidate from the entry database, and segment and screen them to obtain the comparison word set of the candidate; and
a judgment and acquisition unit, configured to, when the cardinality of the intersection of a candidate's comparison word set and the candidate related-term set is judged to be greater than the screening threshold, take the candidate as a lower related term of the input word, thereby obtaining the lower related-term set, where this cardinality serves as the relevance between the lower related term and the keyword.
Further, the candidate related-term set unit specifically comprises:
a first entry subunit, configured to obtain from the entry database, according to the input word, the top M ranked entries that contain the input word;
a first adjustment subunit, configured to adjust the format of the obtained entries according to a standard entry format;
a first calling subunit, configured to call a word-segmentation tool;
a first segmentation subunit, configured to segment the format-adjusted entries with the segmentation tool to obtain a first word set; and
a first extraction subunit, configured to extract from the first word set the words that are core words in a user dictionary as candidate related terms, to obtain the candidate related-term set, where the user dictionary is provided by the segmentation tool;
and the comparison word-set unit specifically comprises:
a second entry subunit, configured to obtain from the entry database, according to the candidate related term, the top M ranked entries that contain it;
a second adjustment subunit, configured to adjust the format of those entries according to the standard entry format;
a second calling subunit, configured to call the segmentation tool;
a second segmentation subunit, configured to segment the format-adjusted entries with the segmentation tool to obtain a second word set; and
a second extraction subunit, configured to extract from the second word set the words that are core words in the user dictionary as comparison words, to obtain the comparison word set.
Further, the apparatus for providing related terms also comprises a normalization module:
the normalization module is configured to subtract the screening threshold from the relevance between each lower related term in the keyword's lower related-term set and the keyword, and to subtract the screening threshold from the relevance between each upper related term in the keyword's upper related-term set and the keyword, thereby normalizing the relevance.
Implementing the embodiments of the present invention has the following beneficial effects:
With the method and apparatus for providing related terms of the embodiments, the lower related-term set of the keyword supplied by the user is obtained from the entry database, the upper related-term set of the keyword is then derived from that lower set, and finally the union of the two is taken as the output related-term set of the keyword. A large number of related terms can thus be generated for the user to choose from. In addition, the relevance determined for each related term describes how closely it is related to the keyword, so the related terms to be provided to the user can subsequently be selected according to their relevance, and accurate related terms can be offered to the user.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an embodiment of the method for providing related terms provided by the present invention;
Fig. 2 is a schematic flowchart of an embodiment of step S2 of the method shown in Fig. 1;
Fig. 3 is a schematic flowchart of an embodiment of step S3 of the method shown in Fig. 1;
Fig. 4 is a schematic flowchart of another embodiment of the method for providing related terms provided by the present invention;
Fig. 5 is a schematic structural diagram of an embodiment of the apparatus for providing related terms provided by the present invention;
Fig. 6 is a schematic structural diagram of an embodiment of the upper related-term set module of the apparatus for providing related terms provided by the present invention;
Fig. 7 is a schematic structural diagram of an embodiment of the unit for obtaining a lower related-term set of the apparatus for providing related terms provided by the present invention;
Fig. 8 is a schematic structural diagram of an embodiment of the candidate related-term set unit of the apparatus for providing related terms provided by the present invention;
Fig. 9 is a schematic structural diagram of an embodiment of the comparison word-set unit of the apparatus for providing related terms provided by the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 2 and Fig. 3: Fig. 1 is a schematic flowchart of an embodiment of the method for providing related terms provided by the present invention, Fig. 2 is a schematic flowchart of an embodiment of step S2 of the method in Fig. 1, and Fig. 3 is a schematic flowchart of an embodiment of step S3 of the method in Fig. 1. With reference to these three flowcharts, the method of this embodiment is described in detail below, taking a paper database (such as CNKI) as the entry database and obtaining the related terms of the keyword Java as an example. The method comprises the following steps:
S1: taking the keyword Java input by the user as the input word, obtain the lower related-term set of Java from the entry database, and determine the relevance between each lower related term in that set and the keyword. Step S1 comprises steps S11 to S13, as follows:
S11: according to the input word Java, obtain the entries containing Java from the paper database, and segment and screen these entries to obtain the candidate related-term set $A = \{a_1, \dots, a_n\}$. This step is implemented as follows:
Use a search engine to obtain from the paper database the top M ranked entries that contain the input word Java, for example the abstracts of the first 50 papers, or the first 500 summaries returned by searching Wikipedia for Java;
Adjust the format of the entries according to the standard entry format, for example by converting lowercase letters to uppercase, deleting redundant spaces, unifying punctuation marks, and unifying full-width and half-width characters into a single form.
Call a word-segmentation tool. Preferably, the segmentation tool is jieba, but it is not limited to this tool.
Segment the format-adjusted entries with the segmentation tool to obtain a first word set;
According to a keyword-extraction algorithm, extract from the first word set the words related to the input word as candidate related terms $a_1, \dots, a_n$, obtaining the candidate related-term set $A = \{a_1, \dots, a_n\}$. Note that a dictionary can be added through the segmentation tool or through the apparatus for providing related terms; the core words provided by that dictionary are then extracted from the first word set as candidate related terms.
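The format adjustment in S11 can be pictured with a short Python sketch. It is an illustration under assumptions, not the patent's implementation: Unicode NFKC normalization folds full-width characters into half-width ones, a small hand-picked table (hypothetical, not taken from the text) unifies the CJK punctuation that NFKC leaves alone, letters are upper-cased as in the example above, and runs of whitespace are collapsed.

```python
import unicodedata

# Hypothetical table for CJK punctuation that NFKC does not fold.
_PUNCT_MAP = str.maketrans({"。": ".", "、": ",", "“": '"', "”": '"', "‘": "'", "’": "'"})


def normalize_entry(text: str) -> str:
    """Bring one entry into an assumed standard format before segmentation."""
    # NFKC turns full-width letters, digits and punctuation (e.g. "Ｊ", "，")
    # and the ideographic space into their half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Unify the CJK punctuation that NFKC leaves unchanged.
    text = text.translate(_PUNCT_MAP)
    # Unify letter case: lowercase becomes uppercase.
    text = text.upper()
    # Delete redundant whitespace (runs of spaces, tabs, newlines).
    return " ".join(text.split())


print(normalize_entry("Ｊａｖａ　虚拟机，  jvm。"))  # JAVA 虚拟机, JVM.
```

Whatever rules are chosen, the same function has to be applied to every entry and to the dictionary words, otherwise the later set comparisons would miss matches that differ only in width or case.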
S12: for each candidate related term in $A = \{a_1, \dots, a_n\}$, obtain the entries containing that candidate from the entry database, and segment and screen them to obtain the comparison word set of the candidate. Note that step S12 is implemented in the same way as step S11; the only difference is that the input word of step S11 is replaced by the candidate related term $a_i$, and the candidate set obtained for $a_i$, $B_{a_i} = \{b_{i1}, \dots, b_{in}\}$, serves as the comparison word set of $a_i$. The details are therefore not repeated here.
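Since S12 repeats S11 with a candidate in place of the input word, one function can serve both steps. The sketch below assumes a hypothetical `search(word, m)` that returns the top-M entries containing `word` (already format-adjusted), a `tokenize` callable standing in for the segmentation tool (in practice this could wrap jieba), and a set `core_words` standing in for the user dictionary. Excluding the query word from its own set is also an assumption; the text does not say.

```python
from typing import Callable, Iterable, List, Set


def build_word_set(
    word: str,
    search: Callable[[str, int], List[str]],
    tokenize: Callable[[str], Iterable[str]],
    core_words: Set[str],
    m: int = 50,
) -> Set[str]:
    """Sketch of S11 (word = the keyword) and S12 (word = a candidate a_i)."""
    words: Set[str] = set()
    for entry in search(word, m):          # top-M entries containing `word`
        for token in tokenize(entry):      # segmentation
            if token in core_words:        # keep only user-dictionary core words
                words.add(token)
    words.discard(word)                    # assumption: a word is not its own related term
    return words


# Toy stand-ins (hypothetical) for the paper database and the segmentation tool.
corpus = ["Java JVM Spring", "Java JVM", "Java Python"]


def toy_search(word: str, m: int) -> List[str]:
    return [entry for entry in corpus if word in entry.split()][:m]


core = {"Java", "JVM", "Spring", "Python"}
print(sorted(build_word_set("Java", toy_search, str.split, core)))  # ['JVM', 'Python', 'Spring']
```

Passing the search and segmentation steps in as callables keeps the sketch independent of any particular database or tokenizer.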
S13: when the cardinality $r = |B_{a_i} \cap A|$ of the intersection of the comparison word set $B_{a_i}$ of candidate $a_i$ and the candidate set $A = \{a_1, \dots, a_n\}$ is greater than the screening threshold $p$, that is, when $B_{a_i}$ and $A$ share more than $p$ elements, the candidate $a_i$ is a lower related term of the input word Java. This yields the lower related-term set of the keyword, $A' = \{a_j \mid j \in \{1, \dots, n\},\ |A \cap B_{a_j}| > p\}$, where $|A'| \le n$. The cardinality $r$ of the intersection is the relevance of the lower related term within the lower related-term set. Note that relevance denotes the degree of correlation between a related term in a set and the input word of that set.
Obtaining the lower related-term set of the input word through steps S11, S12 and S13 filters out noise words and improves the efficiency of obtaining lower related terms.
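The screening in S13 reduces to a set intersection and a threshold test. In this sketch, `word_set` is a hypothetical callable returning the S11/S12 word set of any word (the candidate set $A$ for the keyword, $B_{a_i}$ for a candidate), and the toy data are invented for illustration.

```python
from typing import Callable, Dict, Set


def lower_related_set(
    word: str,
    word_set: Callable[[str], Set[str]],
    p: int,
) -> Dict[str, int]:
    """Sketch of S13: keep candidate a_i when r = |A ∩ B_{a_i}| > p."""
    candidates = word_set(word)                 # candidate set A
    related: Dict[str, int] = {}
    for a_i in candidates:
        r = len(candidates & word_set(a_i))     # overlap with comparison set B_{a_i}
        if r > p:                               # screening threshold p
            related[a_i] = r                    # r is the relevance of a_i
    return related


# Invented word sets: "Noise" shares nothing with A and is filtered out.
toy = {
    "Java": {"JVM", "Spring", "Noise"},
    "JVM": {"Java", "Spring", "GC"},
    "Spring": {"Java", "JVM", "Bean"},
    "Noise": {"Foo"},
}
print(sorted(lower_related_set("Java", lambda w: toy.get(w, set()), p=0).items()))
# [('JVM', 1), ('Spring', 1)]
```

A candidate survives only if the words that co-occur with it also co-occur with the input word, which is why an isolated noise word drops out.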
S2: according to the lower related-term set $A' = \{a_j\}$ of the keyword, obtain the upper related-term set of the keyword from the entry database, and determine the relevance between each upper related term in that set and the keyword. This step comprises the following steps S21 to S24:
S21: for each lower related term $a_j$ in $A'$, update the input word with $a_j$, that is, use $a_j$ as the input word, and obtain the lower related-term set $A''$ of the updated input word $a_j$ from the paper database. Note that in this embodiment the lower related-term set in step S21 is preferably obtained in the same way as in step S1 above, so the details are not repeated here.
S22: judge whether the total number $N$ of lower related-term sets obtained so far is greater than the preset threshold $S$;
S23: if so, filter out all lower related-term sets that contain the keyword, and take the input words corresponding to these sets as upper related terms, obtaining the upper related-term set $C$ of the keyword, where the relevance of the keyword within the lower related-term set of a given input word serves as the relevance between that input word, as an upper related term, and the keyword;
S24: if not, continue with the following operation: for each lower related term in the lower related-term set of the updated input word, update the input word again with that lower related term and obtain the lower related-term set of the re-updated input word from the entry database, until the total number $N$ of lower related-term sets is greater than the preset threshold $S$. Note that the lower related-term sets in steps S21 and S24 are also obtained in the same way as in step S1 above, so the details are not repeated here.
In other words, taking the keyword Java as an example, the upper related-term set of Java is the set of input words whose lower related-term sets contain Java as an element. By deriving the upper related-term set of the keyword in reverse, with the same procedure used to obtain lower related-term sets, related terms can be provided to the user from multiple dimensions.
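One possible reading of S21 to S24 is a breadth-first expansion that stops once more than $S$ lower related-term sets have been collected, followed by the inversion just described. This sketch assumes a hypothetical `lower(word)` that performs step S1 for any input word and returns a mapping from lower related terms to relevance. Checking the threshold once per expansion round, skipping words already expanded, and stopping when no new words remain are assumptions added so the loop always terminates.

```python
from typing import Callable, Dict, List


def upper_related_set(
    keyword: str,
    lower: Callable[[str], Dict[str, int]],
    s: int,
) -> Dict[str, int]:
    """Sketch of S21-S24: expand lower sets until more than s exist, then invert."""
    collected: Dict[str, Dict[str, int]] = {}  # input word -> its lower related-term set
    frontier: List[str] = list(lower(keyword))  # start from A'
    seen = {keyword}                            # assumption: never re-expand a word
    while frontier and len(collected) <= s:     # S22: continue until N > S
        next_frontier: List[str] = []
        for word in frontier:                   # S21/S24: each term becomes the input word
            if word in seen:
                continue
            seen.add(word)
            collected[word] = lower(word)
            next_frontier.extend(collected[word])
        frontier = next_frontier
    # S23: input words whose lower set contains the keyword become upper related terms,
    # scored by the keyword's relevance inside that set.
    return {word: rel[keyword] for word, rel in collected.items() if keyword in rel}


# Invented lower related-term sets with relevance values.
lowers = {
    "Java": {"JVM": 3, "Spring": 2},
    "JVM": {"Java": 4, "GC": 1},
    "Spring": {"Bean": 2},
    "GC": {"Java": 1},
}
print(upper_related_set("Java", lambda w: lowers.get(w, {}), s=1))  # {'JVM': 4}
```

A larger threshold explores further from the keyword: with `s=5` the same data also reach `GC`, whose lower set contains Java with relevance 1.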
S3: take the union of the lower and upper related-term sets of the keyword as the output related-term set of the keyword, and select, according to the relevance of each output related term in that set, the related terms to be provided to the user from the output related-term set.
The intersection of the lower and upper related-term sets of the keyword is included in the output related-term set, and the relevance of each output related term in that intersection is $T = (T_1 + T_2)/2$, where $T_1$ is its relevance to the keyword as a lower related term and $T_2$ is its relevance to the keyword as an upper related term. That is, after the union, a related term that appears in both the upper and the lower related-term set takes the mean of its relevance values in the two sets.
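The union with averaged relevance can be written directly over two dictionaries that map terms to relevance. This is a minimal sketch; representing each related-term set as a dictionary is an assumption.

```python
from typing import Dict


def merge_output_set(lower: Dict[str, float], upper: Dict[str, float]) -> Dict[str, float]:
    """Sketch of S3: union of both sets; a term in both gets T = (T1 + T2) / 2."""
    output: Dict[str, float] = {}
    for term in lower.keys() | upper.keys():
        if term in lower and term in upper:
            output[term] = (lower[term] + upper[term]) / 2  # T1 from lower, T2 from upper
        elif term in lower:
            output[term] = lower[term]
        else:
            output[term] = upper[term]
    return output


print(sorted(merge_output_set({"JVM": 3, "Spring": 2}, {"JVM": 4, "GC": 1}).items()))
# [('GC', 1), ('JVM', 3.5), ('Spring', 2)]
```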
As a further improvement of the present invention, the method further includes normalizing the relevance values:
subtracting the screening threshold from the relevance between the keyword and each lower related term in the keyword's lower related term set;
subtracting the screening threshold from the relevance between the keyword and each upper related term in the keyword's upper related term set, which completes the normalization of relevance.
It should be noted that the purpose of normalization is to put the relevance values of the related terms in the keyword's output related term set on a common baseline of 0: the higher the value, the more closely the related term is related to the keyword. This makes it convenient to select, in step S3, the related terms to be provided to the user from the output related term set.
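As a worked example with made-up numbers: with a screening threshold of 3, lower related terms whose intersection sizes are 7 and 4 receive normalized relevance values of 4 and 1. Since a lower related term is only kept when its intersection size exceeds the threshold, every term that passed the screening step ends up with a positive value. A one-line sketch, assuming relevance values are held in a dict:

```python
from typing import Dict


def normalize(relevance: Dict[str, float], screening_threshold: float) -> Dict[str, float]:
    """Shift relevance values so that the screening threshold maps to 0."""
    return {word: value - screening_threshold for word, value in relevance.items()}
```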
With the method for providing related terms of the embodiments of the present invention, the candidate lower related terms obtained are cross-checked against their comparison word sets, and only those that pass are used as lower related terms. This filters out the influence of noise words and improves the quality of the related terms obtained; that is, it ensures the accuracy of the related terms provided to the user. On the other hand, after the keyword's lower related term set has been obtained, the keyword's upper related terms are derived in reverse through further lower related term sets, which can greatly expand the number of related terms provided to the user while ensuring the quality of the upper related terms.
Referring to Fig. 4, which is a schematic flowchart of another embodiment of the method for providing related terms according to the present invention. In this embodiment, a paper database and a Wikipedia database are each used as the entry database, from which a first output related term set and a second output related term set are obtained respectively; the union of the first and second output related term sets is then used as the final output related term set of the keyword. The second output related term set is obtained from the Wikipedia database in the same way as the output related term set is obtained from the paper database in the previous embodiment. By mining related terms from two different entry databases, namely a paper database and a Wikipedia database, this embodiment on the one hand extends related terms in a more targeted way, and on the other hand avoids the one-sided related terms that a single corpus would give the user.
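A sketch of combining the two databases' output related term sets, assuming each set is a dict from term to relevance. The text does not say how relevance is merged when a term comes from both the paper database and the Wikipedia database; keeping the higher value here is purely an illustrative assumption.

```python
from typing import Dict


def union_across_databases(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """Union of the output related term sets obtained from two entry databases.

    Assumption: when a term appears in both sets, the higher relevance is kept.
    """
    combined = dict(first)
    for word, value in second.items():
        combined[word] = max(value, combined.get(word, value))
    return combined
```

Any other merge rule (for example the averaging used within a single database) would slot into the same loop.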
Correspondingly, referring to Fig. 5, which is a schematic structural diagram of an embodiment of the apparatus for providing related terms according to the present invention. The apparatus can implement the entire flow of both embodiments above, and includes:
a lower related term set module 10, configured to take the keyword input by the user as the input word, obtain the lower related term set of the keyword from the entry database, and determine the relevance between the keyword and each lower related term in the lower related term set;
an upper related term set module 20, configured to obtain the upper related term set of the keyword from the entry database according to the keyword's lower related term set, and determine the relevance between the keyword and each upper related term in the upper related term set; and
an output related term set module 30, configured to take the union of the keyword's lower related term set and upper related term set as the keyword's output related term set, and, according to the relevance of each output related term in the output related term set, select from the output related term set the related terms to be provided to the user.
As a further improvement of the embodiment of the present invention, as shown in Fig. 6, which is a schematic structural diagram of an embodiment of the upper related term set module of the apparatus for providing related terms according to the present invention, the upper related term set module 30 specifically includes: a lower word set acquiring unit 31, a threshold judging unit 32 and an upper word set acquiring unit 33, where:
the lower word set acquiring unit 31 is configured to, for each lower related term in the lower related term set, update the input word with that lower related term, and obtain the lower related term set of the updated input word from the entry database;
the threshold judging unit 32 is configured to judge whether the total number of lower related term sets exceeds the preset threshold;
the upper word set acquiring unit 33 is configured to, when the total number of lower related term sets is judged to exceed the preset threshold, filter out, from the lower related term sets, those that contain the keyword, and take the input words corresponding to the lower related term sets containing the keyword as upper related terms, thereby obtaining the upper related term set of the keyword; the relevance of the keyword within a lower related term set containing the keyword is used as the corresponding input word's relevance to the keyword when that input word serves as an upper related term;
the lower word set acquiring unit 31 is further configured to, when the total number of lower related term sets is judged not to exceed the preset threshold, continue with the following operation: for each lower related term in the lower related term set of the updated input word, update the input word again with that lower related term, and obtain the lower related term set of the re-updated input word from the entry database, until the total number of lower related term sets exceeds the preset threshold.
Further, the lower related term set module 20 and the lower word set acquiring unit 31 each further include a unit for obtaining lower related term sets from the entry database. As shown in Fig. 7, which is a schematic structural diagram of an embodiment of the unit for obtaining lower related term sets in the apparatus for providing related terms according to the present invention, this unit specifically includes:
a candidate related term set unit 1, configured to obtain, according to the input word, entries containing the input word from the entry database, and segment and screen the entries to obtain a candidate related term set;
a comparison word set unit 2, configured to, for each candidate related term in the candidate related term set, obtain, according to the candidate related term, entries containing the candidate related term from the entry database, and segment and screen those entries to obtain the comparison word set of the candidate related term; and
a judging and acquiring unit 3, configured to, when the cardinality (absolute value) of the intersection of a candidate related term's comparison word set and the candidate related term set is judged to exceed the screening threshold, take the candidate related term as a lower related term of the input word, thereby obtaining the lower related term set; this cardinality is used as the relevance of the lower related term to the keyword.
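The screening performed by units 1 to 3 is a cross-check: a candidate is kept only if enough of its own comparison words also appear among the input word's candidates. A minimal sketch, assuming `comparison_of` is a hypothetical callable returning a candidate's comparison word set (units 1 and 2 would supply these sets in practice):

```python
from typing import Callable, Dict, Set


def lower_related_words(candidates: Set[str],
                        comparison_of: Callable[[str], Set[str]],
                        screening_threshold: int) -> Dict[str, int]:
    """Keep candidates whose comparison word set overlaps the candidate set enough.

    The size of the intersection becomes the lower related term's relevance.
    """
    result: Dict[str, int] = {}
    for candidate in candidates:
        overlap = len(comparison_of(candidate) & candidates)
        if overlap > screening_threshold:
            result[candidate] = overlap
    return result
```

A candidate whose comparison words share nothing with the other candidates has an intersection size of 0 and is therefore discarded for any non-negative screening threshold, which is how noise words are filtered out.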
Further, as shown in Fig. 8, which is a schematic structural diagram of an embodiment of the candidate related term set unit of the apparatus for providing related terms according to the present invention, the candidate related term set unit 1 specifically includes:
a first entry subunit 11, configured to obtain, according to the input word, the entries from the entry database that contain the input word and are ranked in the top M;
a first adjusting subunit 12, configured to adjust the format of the obtained entries according to a standard entry format;
a first calling subunit 13, configured to call a word segmentation tool;
a first segmentation subunit 14, configured to segment the format-adjusted entries with the word segmentation tool to obtain a first word set; and
a first extracting subunit 15, configured to extract, from the first word set, the words that belong to the core words of a user dictionary as candidate related terms, thereby obtaining the candidate related term set, where the user dictionary is provided by the word segmentation tool;
and, as shown in Fig. 9, which is a schematic structural diagram of an embodiment of the comparison word set unit of the apparatus for providing related terms according to the present invention, the comparison word set unit 2 specifically includes:
a second entry subunit 21, configured to obtain, according to the candidate related term, the entries from the entry database that contain the candidate related term and are ranked in the top M;
a second adjusting subunit 22, configured to adjust the format of the entries containing the candidate related term and ranked in the top M according to the standard entry format;
a second calling subunit 23, configured to call the word segmentation tool;
a second segmentation subunit 24, configured to segment the format-adjusted entries containing the candidate related term and ranked in the top M with the word segmentation tool to obtain a second word set; and
a second extracting subunit 25, configured to extract, from the second word set, the words that belong to the core words of the user dictionary as comparison words, thereby obtaining the comparison word set.
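Subunits 11 to 15 and 21 to 25 run the same pipeline, once with the input word and once with each candidate related term. A sketch of that shared pipeline under stated assumptions: entries are given as plain strings already in rank order, the "standard entry format" is taken to mean collapsing whitespace, a regex tokenizer stands in for the real word segmentation tool, and the query word is removed from its own set. All of these are illustrative choices, not details from the text.

```python
import re
from itertools import islice
from typing import Iterable, Set


def extract_word_set(word: str, entries: Iterable[str], user_dict: Set[str], m: int) -> Set[str]:
    """Build a candidate (or comparison) word set for `word`."""
    # Top-M entries containing the word; `entries` is assumed to be in rank order.
    hits = list(islice((e for e in entries if word in e), m))
    words: Set[str] = set()
    for entry in hits:
        # Hypothetical "standard entry format": collapse runs of whitespace.
        normalized = " ".join(entry.split())
        # Stand-in segmenter: a regex tokenizer instead of a real segmentation tool.
        tokens = re.findall(r"\w+", normalized)
        # Keep only tokens that are core words in the user dictionary.
        words.update(t for t in tokens if t in user_dict)
    words.discard(word)  # assumption: the query word is not its own related term
    return words
```

For Chinese text, a real word segmentation tool would replace the `re.findall` call; the filtering against the user dictionary stays the same.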
Specifically, the be correlated with common factor of word set and described upper relevant word set of the bottom of described key word is included in described key The output related term of word is concentrated, then the degree of association of each the output related term being included in described common factor is T, T=(T1+ T2)/2;Wherein, T1 is the degree of association when this output related term is as the next related term with described key word, and T2 is as at this Output related term as during upper related term with the degree of association of described key word.
Further, as shown in Fig. 5, the apparatus for providing related terms further includes a normalization module 40:
the normalization module is configured to subtract the screening threshold from the relevance between the keyword and each lower related term in the keyword's lower related term set, and to subtract the screening threshold from the relevance between the keyword and each upper related term in the keyword's upper related term set, thereby completing the normalization of relevance.
In the apparatus for providing related terms provided by the embodiments of the present invention, candidate lower related terms are cross-checked against their comparison word sets, and only those that pass are used as lower related terms. This filters out the influence of noise words and improves the quality of the related terms obtained; that is, it ensures the accuracy of the related terms provided to the user. On the other hand, after the keyword's lower related term set has been obtained, the keyword's upper related terms are derived in reverse through further lower related term sets, which can greatly expand the number of related terms provided to the user while ensuring the quality of the upper related terms.
Those of ordinary skill in the art will appreciate that all or part of the flow of the methods in the embodiments above can be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the methods above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (11)

1. A method for providing related terms, characterized by comprising:
taking a keyword input by a user as an input word, obtaining a lower related term set of the keyword from an entry database, and determining the relevance between the keyword and each lower related term in the lower related term set;
obtaining an upper related term set of the keyword from the entry database according to the keyword's lower related term set, and determining the relevance between the keyword and each upper related term in the upper related term set;
taking the union of the keyword's lower related term set and upper related term set as an output related term set of the keyword, and, according to the relevance of each output related term in the output related term set, selecting from the output related term set the related terms to be provided to the user.
2. The method for providing related terms according to claim 1, characterized in that obtaining the upper related term set of the keyword from the entry database according to the keyword's lower related term set, and determining the relevance between the keyword and each upper related term in the upper related term set, specifically comprises:
for each lower related term in the lower related term set, updating the input word with that lower related term, and obtaining the lower related term set of the updated input word from the entry database;
judging whether the total number of lower related term sets exceeds a preset threshold;
if so, filtering out, from the lower related term sets, those that contain the keyword, and taking the input words corresponding to the lower related term sets containing the keyword as upper related terms, thereby obtaining the upper related term set of the keyword; wherein the relevance of the keyword within a lower related term set containing the keyword is used as the corresponding input word's relevance to the keyword when that input word serves as an upper related term;
if not, continuing with the following operation: for each lower related term in the lower related term set of the updated input word, updating the input word again with that lower related term, and obtaining the lower related term set of the re-updated input word from the entry database, until the total number of lower related term sets exceeds the preset threshold.
3. The method for providing related terms according to claim 2, characterized in that obtaining a lower related term set from the entry database specifically comprises:
obtaining, according to the input word, entries containing the input word from the entry database, and segmenting and screening the entries to obtain a candidate related term set;
for each candidate related term in the candidate related term set, obtaining, according to the candidate related term, entries containing the candidate related term from the entry database, and segmenting and screening those entries to obtain a comparison word set of the candidate related term;
when the cardinality (absolute value) of the intersection of the candidate related term's comparison word set and the candidate related term set is judged to exceed a screening threshold, taking the candidate related term as a lower related term of the input word, thereby obtaining the lower related term set; wherein the cardinality is used as the relevance of the lower related term to the keyword.
4. The method for providing related terms according to claim 3, characterized in that obtaining, according to the input word, entries containing the input word from the entry database, and segmenting and screening the entries to obtain a candidate related term set, specifically comprises:
obtaining, according to the input word, the entries from the entry database that contain the input word and are ranked in the top M;
adjusting the format of the obtained entries according to a standard entry format;
calling a word segmentation tool;
segmenting the format-adjusted entries with the word segmentation tool to obtain a first word set;
extracting, from the first word set, the words that belong to the core words of a user dictionary as candidate related terms, thereby obtaining the candidate related term set; wherein the user dictionary is provided by the word segmentation tool;
and in that obtaining, according to the candidate related term, entries containing the candidate related term from the entry database, and segmenting and screening those entries to obtain the comparison word set of the candidate related term, specifically comprises:
obtaining, according to the candidate related term, the entries from the entry database that contain the candidate related term and are ranked in the top M;
adjusting the format of the entries containing the candidate related term and ranked in the top M according to the standard entry format;
calling the word segmentation tool;
segmenting the format-adjusted entries containing the candidate related term and ranked in the top M with the word segmentation tool to obtain a second word set;
extracting, from the second word set, the words that belong to the core words of the user dictionary as comparison words, thereby obtaining the comparison word set.
5. The method for providing related terms according to claim 1, characterized in that the intersection of the keyword's lower related term set and upper related term set is included in the keyword's output related term set, and the relevance of each output related term in the intersection is T, T = (T1 + T2)/2; wherein T1 is the relevance of this output related term to the keyword when it serves as a lower related term, and T2 is its relevance to the keyword when it serves as an upper related term.
6. The method for providing related terms according to claim 3, characterized in that the method further comprises:
subtracting the screening threshold from the relevance between the keyword and each lower related term in the keyword's lower related term set;
subtracting the screening threshold from the relevance between the keyword and each upper related term in the keyword's upper related term set, thereby completing the normalization of relevance.
7. An apparatus for providing related terms, characterized by comprising:
a lower related term set module, configured to take a keyword input by a user as an input word, obtain a lower related term set of the keyword from an entry database, and determine the relevance between the keyword and each lower related term in the lower related term set;
an upper related term set module, configured to obtain an upper related term set of the keyword from the entry database according to the keyword's lower related term set, and determine the relevance between the keyword and each upper related term in the upper related term set;
an output related term set module, configured to take the union of the keyword's lower related term set and upper related term set as an output related term set of the keyword, and, according to the relevance of each output related term in the output related term set, select from the output related term set the related terms to be provided to the user.
8. The apparatus for providing related terms according to claim 7, characterized in that the upper related term set module specifically comprises: a lower word set acquiring unit, a threshold judging unit and an upper word set acquiring unit, wherein:
the lower word set acquiring unit is configured to, for each lower related term in the lower related term set, update the input word with that lower related term, and obtain the lower related term set of the updated input word from the entry database;
the threshold judging unit is configured to judge whether the total number of lower related term sets exceeds a preset threshold;
the upper word set acquiring unit is configured to, when the total number of lower related term sets is judged to exceed the preset threshold, filter out, from the lower related term sets, those that contain the keyword, and take the input words corresponding to the lower related term sets containing the keyword as upper related terms, thereby obtaining the upper related term set of the keyword; wherein the relevance of the keyword within a lower related term set containing the keyword is used as the corresponding input word's relevance to the keyword when that input word serves as an upper related term;
the lower word set acquiring unit is further configured to, when the total number of lower related term sets is judged not to exceed the preset threshold, continue with the following operation: for each lower related term in the lower related term set of the updated input word, update the input word again with that lower related term, and obtain the lower related term set of the re-updated input word from the entry database, until the total number of lower related term sets exceeds the preset threshold.
9. The apparatus for providing related terms according to claim 8, characterized in that the lower related term set module and the lower word set acquiring unit each further comprise a unit for obtaining lower related term sets from the entry database, specifically:
a candidate related term set unit, configured to obtain, according to the input word, entries containing the input word from the entry database, and segment and screen the entries to obtain a candidate related term set;
a comparison word set unit, configured to, for each candidate related term in the candidate related term set, obtain, according to the candidate related term, entries containing the candidate related term from the entry database, and segment and screen those entries to obtain a comparison word set of the candidate related term; and
a judging and acquiring unit, configured to, when the cardinality (absolute value) of the intersection of the candidate related term's comparison word set and the candidate related term set is judged to exceed a screening threshold, take the candidate related term as a lower related term of the input word, thereby obtaining the lower related term set; wherein the cardinality is used as the relevance of the lower related term to the keyword.
10. The apparatus for providing related terms according to claim 9, characterized in that the candidate related term set unit specifically comprises:
a first entry subunit, configured to obtain, according to the input word, the entries from the entry database that contain the input word and are ranked in the top M;
a first adjusting subunit, configured to adjust the format of the obtained entries according to a standard entry format;
a first calling subunit, configured to call a word segmentation tool;
a first segmentation subunit, configured to segment the format-adjusted entries with the word segmentation tool to obtain a first word set; and
a first extracting subunit, configured to extract, from the first word set, the words that belong to the core words of a user dictionary as candidate related terms, thereby obtaining the candidate related term set; wherein the user dictionary is provided by the word segmentation tool;
and in that the comparison word set unit specifically comprises:
a second entry subunit, configured to obtain, according to the candidate related term, the entries from the entry database that contain the candidate related term and are ranked in the top M;
a second adjusting subunit, configured to adjust the format of the entries containing the candidate related term and ranked in the top M according to the standard entry format;
a second calling subunit, configured to call the word segmentation tool;
a second segmentation subunit, configured to segment the format-adjusted entries containing the candidate related term and ranked in the top M with the word segmentation tool to obtain a second word set; and
a second extracting subunit, configured to extract, from the second word set, the words that belong to the core words of the user dictionary as comparison words, thereby obtaining the comparison word set.
11. The apparatus for providing related terms according to claim 10, characterized in that the apparatus further comprises a normalization module:
the normalization module is configured to subtract the screening threshold from the relevance between the keyword and each lower related term in the keyword's lower related term set, and to subtract the screening threshold from the relevance between the keyword and each upper related term in the keyword's upper related term set, thereby completing the normalization of relevance.
CN201610445489.2A 2016-06-17 2016-06-17 The method and apparatus of related term are provided Active CN106126588B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610445489.2A CN106126588B (en) 2016-06-17 2016-06-17 The method and apparatus of related term are provided
PCT/CN2016/113175 WO2017215244A1 (en) 2016-06-17 2016-12-29 Method and device for providing relevant words

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610445489.2A CN106126588B (en) 2016-06-17 2016-06-17 The method and apparatus of related term are provided

Publications (2)

Publication Number Publication Date
CN106126588A true CN106126588A (en) 2016-11-16
CN106126588B CN106126588B (en) 2019-09-20

Family

ID=57470913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610445489.2A Active CN106126588B (en) 2016-06-17 2016-06-17 The method and apparatus of related term are provided

Country Status (2)

Country Link
CN (1) CN106126588B (en)
WO (1) WO2017215244A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017215244A1 (en) * 2016-06-17 2017-12-21 广州视源电子科技股份有限公司 Method and device for providing relevant words
CN108304366A (en) * 2017-03-21 2018-07-20 腾讯科技(深圳)有限公司 A kind of hypernym detection method and equipment
CN108628832A (en) * 2018-05-08 2018-10-09 中国联合网络通信集团有限公司 A kind of information keyword acquisition methods and device
CN109241525A (en) * 2018-08-20 2019-01-18 深圳追科技有限公司 Extracting method, the device and system of keyword

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810274A (en) * 2014-02-12 2014-05-21 北京联合大学 Multi-feature image tag sorting method based on WordNet semantic similarity
CN104008097A (en) * 2013-02-21 2014-08-27 日电(中国)有限公司 Method and device for achieving query understanding
CN104123351A (en) * 2014-07-09 2014-10-29 百度在线网络技术(北京)有限公司 Interactive search method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5044236B2 (en) * 2007-01-12 2012-10-10 富士フイルム株式会社 Content search device and content search method
TW201214163A (en) * 2010-09-21 2012-04-01 Inventec Corp Searching system and method thereof with generating extending keywords according to input keywords
CN103778262B (en) * 2014-03-06 2017-07-21 北京林业大学 Information retrieval method and device based on thesaurus
CN106126588B (en) * 2016-06-17 2019-09-20 广州视源电子科技股份有限公司 The method and apparatus of related term are provided

Cited By (7)

Publication number Priority date Publication date Assignee Title
WO2017215244A1 (en) * 2016-06-17 2017-12-21 广州视源电子科技股份有限公司 Method and device for providing relevant words
CN108304366A (en) * 2017-03-21 2018-07-20 腾讯科技(深圳)有限公司 Hypernym detection method and device
WO2018171499A1 (en) * 2017-03-21 2018-09-27 腾讯科技(深圳)有限公司 Information detection method, device and storage medium
CN108628832A (en) * 2018-05-08 2018-10-09 中国联合网络通信集团有限公司 Method and device for acquiring information keywords
CN108628832B (en) * 2018-05-08 2022-03-18 中国联合网络通信集团有限公司 Method and device for acquiring information keywords
CN109241525A (en) * 2018-08-20 2019-01-18 深圳追一科技有限公司 Keyword extraction method, device and system
CN109241525B (en) * 2018-08-20 2022-05-06 深圳追一科技有限公司 Keyword extraction method, device and system

Also Published As

Publication number Publication date
CN106126588B (en) 2019-09-20
WO2017215244A1 (en) 2017-12-21

Similar Documents

Publication Title
CN102915342B (en) Providing a topic-based search index
CN102279851B (en) Intelligent navigation method, device and system
CN106126588A (en) Method and apparatus for providing related terms
CN105260362B (en) New word extraction method and apparatus
CN105446989B (en) Searching method and device, display device
CN105956161A (en) Information recommendation method and apparatus
CN103942712A (en) Product similarity based e-commerce recommendation system and method thereof
US20190197071A1 (en) System and method for evaluating nodes of funnel model
CN106126589A (en) Resume searching method and device
CN105302815B (en) Method and device for filtering uniform resource locators (URLs) of web pages
CN105847288A (en) Verification code processing method and device
CN106294535A (en) Website recognition method and device
CN108876470A (en) Tagged-user expansion method, computer device and storage medium
CN105069077A (en) Search method and device
CN105631007A (en) Industry technical information collecting method and system
CN106156114A (en) Patent retrieval method and device
CN104699837B (en) Method, device and server for selecting illustrated pictures of web pages
CN102902788B (en) System and method for automatically grouping browser web page tabs
CN108197243A (en) Input association recommendation method and device based on user identity
CN107220378A (en) Table sorting method and device, and web page display method and device
CN104268572A (en) Feature extraction and feature selection method oriented to background multi-source data
CN107783962A (en) Method and device for query statement
CN104102704B (en) System control display method and device
CN106650610A (en) Facial expression data collection method and device
CN111488434B (en) Recommendation method and device for input associative words, storage medium and electronic equipment

Legal Events

Code Title
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant