CN103186647A - Method and device for sequencing according to contribution degree - Google Patents


Info

Publication number
CN103186647A
Authority
CN
China
Prior art keywords
textual analysis
lexical
frequency
analysis item
contribution degree
Prior art date
Legal status
Granted
Application number
CN2011104606657A
Other languages
Chinese (zh)
Other versions
CN103186647B (en)
Inventor
田建峰
张朝胜
于亮
Current Assignee
Beijing Kingsoft Office Software Inc
Original Assignee
Beijing Kingsoft Software Co Ltd
Beijing Jinshan Digital Entertainment Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kingsoft Software Co Ltd and Beijing Jinshan Digital Entertainment Technology Co Ltd
Priority to CN201110460665.7A
Publication of CN103186647A
Application granted
Publication of CN103186647B
Status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for sorting according to contribution degree. The method comprises: determining the target items to be sorted and selecting at least two dimensions according to the characteristics of those items; computing sorting parameters of the target items on the at least two dimensions; computing, from the sorting parameters, contribution degree vectors of the target items on the at least two dimensions; normalizing the contribution degree vectors and combining them, according to the weight of each dimension, into a composite contribution degree vector; and sorting the target items according to the composite contribution degree vector. Because the contribution degree vectors on at least two dimensions are computed and combined, and the target items are sorted by the composite vector, the sort is more accurate and the resulting order better matches users' habits.

Description

Method and device for sorting according to contribution degree
Technical field
The present invention relates to the field of sorting, and in particular to a method and device for sorting according to contribution degree vectors.
Background art
In a dictionary, an entry usually has several definitions, but these definitions are not equally useful to the user. Some are used by most people in daily life, and users want to see them first; others are rare and seldom needed. To make a dictionary easier to use, compilers therefore usually place the common definitions first, so that the user immediately sees the explanation they most likely want, and move the rarer definitions toward the end.
Dictionary compilation is a specialized field. For a general dictionary formed by merging many dictionaries, the prior art either orders the definitions arbitrarily or relies on manual arrangement, which is time-consuming, labor-intensive, and prone to subjective bias. Another approach sorts definitions by machine statistics, but it considers only a single factor and gives poor results; for example, it sorts the definitions solely by how often they occur in a corpus. Manual comparison shows that the resulting order is not accurate enough and the method is not intelligent. First, whether a definition is commonly used is not determined only by its frequency in a corpus: a definition that occurs often in a corpus is not necessarily a common one. Second, definitions with the same corpus frequency cannot be distinguished and therefore cannot be ordered. Consequently, the prior art cannot sort the definitions of a dictionary entry effectively and accurately.
The same problem arises in search engines and in forum comments: a search or a comment thread likewise returns many results, and the prior art offers no effective way to sort them either.
Summary of the invention
The invention provides a method and device for sorting according to contribution degree. By computing contribution degree vectors of the target items on at least two dimensions, combining them, and sorting the target items by the resulting composite contribution degree vector, the invention improves the accuracy of the sort and makes the order of the target items better match users' habits.
The invention provides a method for sorting according to contribution degree, the method comprising:
Determining the target items to be sorted, and selecting at least two dimensions according to the characteristics of the target items;
Computing sorting parameters of the target items on the at least two dimensions;
Computing, according to the sorting parameters, contribution degree vectors of the target items on the at least two dimensions;
Computing a composite contribution degree vector from the normalized contribution degree vectors according to the weights of the dimensions;
Sorting the target items according to the composite contribution degree vector.
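The five steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each dimension is assumed to already yield one non-negative contribution value per target item, normalization is assumed to be "divide by the sum" so every vector adds up to 1, and the dimension names and weights in the usage example are hypothetical.

```python
from typing import Dict, List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Scale a contribution vector so its components sum to 1 (assumed scheme)."""
    total = sum(values)
    if total == 0:
        # No evidence on this dimension: contribute nothing instead of dividing by zero.
        return [0.0] * len(values)
    return [v / total for v in values]


def rank_by_contribution(
    items: Sequence[str],
    contributions: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[str]:
    """Combine per-dimension contribution vectors and sort items, highest first.

    contributions maps a dimension name to one value per item (same order as items);
    weights maps the same (hypothetical) dimension names to their weights.
    """
    composite = [0.0] * len(items)
    for dim, vector in contributions.items():
        if len(vector) != len(items):
            raise ValueError(f"dimension {dim!r} has {len(vector)} values, expected {len(items)}")
        for i, v in enumerate(normalize(vector)):
            composite[i] += weights.get(dim, 0.0) * v
    # Larger composite value -> earlier position; the sort is stable, so ties keep input order.
    order = sorted(range(len(items)), key=lambda i: composite[i], reverse=True)
    return [items[i] for i in order]
```

For example, with hypothetical dimensions `word_freq` = [1, 3, 1] and `order` = [3, 2, 1], each weighted 0.5, `rank_by_contribution(["W1", "W2", "W3"], ...)` returns `["W2", "W1", "W3"]`: W2's lead on word frequency outweighs W1's earlier position.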
Preferably, when the target items are the representative definitions of a dictionary entry, the at least two dimensions are at least two of the following:
Word frequency; length frequency; frequency of occurrence in example sentences; frequency of occurrence in a corpus; frequency of occurrence in the dictionary; order; standardization.
Computing the sorting parameters of the target items on the at least two dimensions comprises:
Computing the sorting parameters of the representative definitions on the at least two dimensions;
Computing, according to the sorting parameters, the contribution degree vectors of the target items on the at least two dimensions comprises:
Computing, according to the sorting parameters, the contribution degree vectors of the representative definitions on the at least two dimensions;
Sorting the target items according to the composite contribution degree vector comprises:
Sorting the representative definitions according to the composite contribution degree vector.
Preferably, when one of the at least two dimensions is word frequency, computing the sorting parameters of the representative definitions on the at least two dimensions comprises:
Counting the frequency of each word in all definitions of the entry;
Computing the average word frequency of each representative definition from those counts;
Computing, according to the sorting parameters, the contribution degree vectors of the representative definitions on the at least two dimensions comprises:
Computing the word frequency contribution degree vector of the representative definitions from the average word frequency of each representative definition and the sum of the average word frequencies of all representative definitions.
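A minimal sketch of the word frequency dimension. It rests on assumptions the text does not state: each Chinese character counts as one "word", the average word frequency of a definition is the mean count of its characters, and each component of the contribution degree vector is that average divided by the sum of the averages. The sample input is hypothetical, modelled on the glosses of the English word *same* with bracketed text already removed.

```python
from collections import Counter
from typing import List, Sequence


def word_frequency_contributions(
    all_definitions: Sequence[str], representatives: Sequence[str]
) -> List[float]:
    """Return one word-frequency contribution per representative definition."""
    # Frequency of every character across ALL definitions of the entry.
    freq = Counter(ch for d in all_definitions for ch in d)
    # Average character frequency of each representative definition.
    averages = [
        sum(freq[ch] for ch in d) / len(d) if d else 0.0 for d in representatives
    ]
    # Each component is the definition's share of the summed averages.
    total = sum(averages)
    return [a / total if total else 0.0 for a in averages]
```

With all definitions `["同一的", "相同的", "一模一样的", "相同的", "同样的"]` and representatives `["同一的", "相同的"]`, the averages are 12/3 and 11/3, so the vector is (12/23, 11/23), roughly 0.522 and 0.478.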
Preferably, when one of the at least two dimensions is length frequency, computing the sorting parameters of the representative definitions on the at least two dimensions comprises:
Counting the lengths of all definitions of the entry and the frequency of each length;
Obtaining the frequency of the length of each representative definition;
Computing, according to the sorting parameters, the contribution degree vectors of the representative definitions on the at least two dimensions comprises:
Computing the length contribution degree vector of the representative definitions from the frequency of the length of each representative definition and the sum of those frequencies.
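The length frequency steps can be sketched as below, assuming length is measured in characters and each component is the definition's length frequency divided by the sum over the representative definitions. The sketch only mirrors the listed steps; the text does not specify these details, and the sample strings are placeholders.

```python
from collections import Counter
from typing import List, Sequence


def length_contributions(
    all_definitions: Sequence[str], representatives: Sequence[str]
) -> List[float]:
    """Return one length-frequency contribution per representative definition."""
    # How many definitions of the entry have each length (in characters).
    length_freq = Counter(len(d) for d in all_definitions)
    # Frequency of the length of each representative definition.
    scores = [length_freq[len(d)] for d in representatives]
    total = sum(scores)
    return [s / total if total else 0.0 for s in scores]
```

For placeholder definitions `["ab", "cd", "ef", "ghij"]` with representatives `["ab", "ghij"]`, length 2 occurs three times and length 4 once, giving (0.75, 0.25).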
Preferably, when one of the at least two dimensions is frequency of occurrence in example sentences, computing the sorting parameters of the representative definitions on the at least two dimensions comprises:
Counting how often each representative definition occurs in the example sentences of the query results;
Computing the sum of those frequencies over the representative definitions;
Computing, according to the sorting parameters, the contribution degree vectors of the representative definitions on the at least two dimensions comprises:
Computing the example-sentence frequency contribution degree vector of the representative definitions from each definition's frequency in the example sentences of the query results and the sum of those frequencies.
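A minimal sketch of the example-sentence dimension. The text does not say how the example sentences of the query results are obtained or matched, so here they are simply passed in as a list and matched by plain substring counting (an assumption); the sample sentences are hypothetical.

```python
from typing import List, Sequence


def example_sentence_contributions(
    representatives: Sequence[str], example_sentences: Sequence[str]
) -> List[float]:
    """Share of example-sentence occurrences for each representative definition."""
    # str.count counts non-overlapping occurrences of the definition in a sentence.
    counts = [sum(s.count(d) for s in example_sentences) for d in representatives]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

With the sentences `["the same day", "identical twins", "an identical copy", "identical results"]`, "same" occurs once and "identical" three times, so the vector is (0.25, 0.75).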
Preferably, when one of the at least two dimensions is frequency of occurrence in a corpus, computing the sorting parameters of the representative definitions on the at least two dimensions comprises:
Looking up the frequency of each representative definition in the word frequency list of the corpus;
Computing the sum of those frequencies over the representative definitions;
Computing, according to the sorting parameters, the contribution degree vectors of the representative definitions on the at least two dimensions comprises:
Computing the corpus frequency contribution degree vector of the representative definitions from each definition's frequency in the word frequency list of the corpus and the sum of those frequencies.
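The corpus dimension reduces to a lookup in a word frequency list. In this sketch the list is assumed to be a mapping from a definition string to its count, and definitions missing from the list are treated as frequency 0; both choices, and the sample counts, are assumptions.

```python
from typing import List, Mapping, Sequence


def corpus_contributions(
    representatives: Sequence[str], word_frequency_list: Mapping[str, int]
) -> List[float]:
    """Share of corpus frequency for each representative definition."""
    # Definitions absent from the word frequency list count as 0 (assumption).
    counts = [word_frequency_list.get(d, 0) for d in representatives]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

For a hypothetical list `{"same": 300, "identical": 100}`, the representatives `["same", "identical", "alike"]` receive (0.75, 0.25, 0.0).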
Preferably, when one of the at least two dimensions is frequency of occurrence in the dictionary, computing the sorting parameters of the representative definitions on the at least two dimensions comprises:
Counting how often each representative definition occurs among the definitions of the entry in the dictionary;
Computing the sum of those frequencies over the representative definitions;
Computing, according to the sorting parameters, the contribution degree vectors of the representative definitions on the at least two dimensions comprises:
Computing the dictionary frequency contribution degree vector of the representative definitions from each definition's frequency among the definitions of the entry in the dictionary and the sum of those frequencies.
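A sketch of the dictionary dimension, assuming the definitions of the same entry are available from one or more reference dictionaries (one list per dictionary) and that an occurrence means an exact string match. Neither the data layout nor the matching rule is specified in the text.

```python
from typing import List, Sequence


def dictionary_contributions(
    representatives: Sequence[str], other_dictionaries: Sequence[Sequence[str]]
) -> List[float]:
    """Share of dictionary occurrences for each representative definition."""
    # Count exact matches of each definition among the entry's definitions
    # in every reference dictionary.
    counts = [sum(defs.count(d) for defs in other_dictionaries) for d in representatives]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

If three hypothetical dictionaries list `["same", "identical"]`, `["same", "alike"]` and `["same"]`, then "same" occurs 3 times and "identical" once, giving (0.75, 0.25).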
Preferably, when one of the at least two dimensions is the order dimension, computing the sorting parameters of the representative definitions on the at least two dimensions comprises:
Assigning scores according to the order of the representative definitions in the current dictionary, where a representative definition placed earlier receives a higher score than one placed later;
Computing, according to the sorting parameters, the contribution degree vectors of the representative definitions on the at least two dimensions comprises:
Computing the order contribution degree vector of the representative definitions from those scores.
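The text only requires that earlier definitions score higher than later ones. The sketch below uses a hypothetical linear scheme (n points for the first of n definitions, down to 1 for the last) and then takes each score's share of the total.

```python
from typing import List, Sequence


def order_contributions(representatives_in_order: Sequence[str]) -> List[float]:
    """Order contribution: earlier in the current dictionary -> higher value."""
    n = len(representatives_in_order)
    # Hypothetical linear scoring: the first gets n, the second n - 1, ..., the last 1.
    scores = [n - pos for pos in range(n)]
    total = sum(scores)  # n * (n + 1) / 2, zero only for an empty list
    return [s / total for s in scores] if total else []
```

For three definitions the scores 3, 2, 1 become (1/2, 1/3, 1/6). Any strictly decreasing scheme would satisfy the text equally well.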
Preferably, when one of the at least two dimensions is the standardization dimension, computing the sorting parameters of the representative definitions on the at least two dimensions comprises:
Identifying non-standard characters in the representative definitions and recording which representative definition each such character belongs to;
Computing, according to the sorting parameters, the contribution degree vectors of the representative definitions on the at least two dimensions comprises:
Generating the standardization contribution degree vector of the representative definitions by reducing, in an initialized standardization contribution degree vector, the values of the representative definitions that contain the identified non-standard characters, according to the recorded correspondence.
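A sketch of the standardization dimension. The text does not say which characters count as non-standard, what the initial value is, or how much to reduce it, so these are all assumptions here: the caller supplies the set of non-standard characters, every definition starts at 1.0, and each occurrence subtracts a fixed, hypothetical `penalty`, with a floor of 0.

```python
from typing import Dict, Iterable, List, Sequence


def standardization_contributions(
    representatives: Sequence[str],
    nonstandard_chars: Iterable[str],
    penalty: float = 0.25,
) -> List[float]:
    """Start from an initialized vector and penalize definitions with non-standard characters."""
    bad = set(nonstandard_chars)
    # Correspondence: non-standard character -> indices of the definitions containing it
    # (an index is repeated once per occurrence).
    owners: Dict[str, List[int]] = {}
    for i, d in enumerate(representatives):
        for ch in d:
            if ch in bad:
                owners.setdefault(ch, []).append(i)
    # Initialized vector: every definition starts fully standard.
    vector = [1.0] * len(representatives)
    for indices in owners.values():
        for i in indices:
            vector[i] = max(0.0, vector[i] - penalty)
    return vector
```

With placeholder definitions `["same", "s@me", "id#nt#cal"]` and non-standard characters `"@#"`, the vector is (1.0, 0.75, 0.5).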
The invention also provides a device for sorting according to contribution degree, the device comprising:
A determining unit, configured to determine the target items to be sorted;
A selecting unit, configured to select at least two dimensions according to the characteristics of the target items;
A sorting parameter unit, configured to compute the sorting parameters of the target items on the at least two dimensions;
A contribution degree vector unit, configured to compute, according to the sorting parameters, the contribution degree vectors of the target items on the at least two dimensions;
A composite contribution degree vector unit, configured to compute a composite contribution degree vector from the normalized contribution degree vectors according to the weights of the dimensions;
A sorting unit, configured to sort the target items according to the composite contribution degree vector.
Preferably, when the target items are the representative definitions of a dictionary entry, the at least two dimensions are at least two of the following:
Word frequency; length frequency; frequency of occurrence in example sentences; frequency of occurrence in a corpus; frequency of occurrence in the dictionary; order; standardization.
The sorting parameter unit is further configured to compute the sorting parameters of the representative definitions on the at least two dimensions;
The contribution degree vector unit is further configured to compute, according to the sorting parameters, the contribution degree vectors of the representative definitions on the at least two dimensions;
The composite contribution degree vector unit is further configured to sort the representative definitions according to the composite contribution degree vector.
Preferably, when one of the at least two dimensions is word frequency, the sorting parameter unit comprises:
A word frequency unit, configured to count the frequency of each word in all definitions of the entry;
An average word frequency unit, configured to compute the average word frequency of each representative definition from those counts;
The contribution degree vector unit comprises:
A word frequency contribution degree vector unit, configured to compute the word frequency contribution degree vector of the representative definitions from the average word frequency of each representative definition and the sum of those averages.
Preferably, when one of the at least two dimensions is length frequency, the sorting parameter unit comprises:
A first length frequency unit, configured to count the lengths of all definitions of the entry and the frequency of each length;
A second length frequency unit, configured to obtain the frequency of the length of each representative definition;
The contribution degree vector unit comprises:
A length contribution degree vector unit, configured to compute the length contribution degree vector of the representative definitions from the frequency of the length of each representative definition and the sum of those frequencies.
Preferably, when one of the at least two dimensions is frequency of occurrence in example sentences, the sorting parameter unit comprises:
A first example-sentence frequency unit, configured to count how often each representative definition occurs in the example sentences of the query results;
A second example-sentence frequency unit, configured to compute the sum of those frequencies;
The contribution degree vector unit comprises:
An example-sentence contribution degree vector unit, configured to compute the example-sentence frequency contribution degree vector of the representative definitions from each definition's frequency in the example sentences of the query results and the sum of those frequencies.
Preferably, when one of the at least two dimensions is frequency of occurrence in a corpus, the sorting parameter unit comprises:
A first corpus frequency unit, configured to look up the frequency of each representative definition in the word frequency list of the corpus;
A second corpus frequency unit, configured to compute the sum of those frequencies;
The contribution degree vector unit comprises:
A corpus contribution degree vector unit, configured to compute the corpus frequency contribution degree vector of the representative definitions from each definition's frequency in the word frequency list of the corpus and the sum of those frequencies.
Preferably, when one of the at least two dimensions is frequency of occurrence in the dictionary, the sorting parameter unit comprises:
A first dictionary frequency unit, configured to count how often each representative definition occurs among the definitions of the entry in the dictionary;
A second dictionary frequency unit, configured to compute the sum of those frequencies;
The contribution degree vector unit comprises:
A dictionary contribution degree vector unit, configured to compute the dictionary frequency contribution degree vector of the representative definitions from each definition's frequency among the definitions of the entry in the dictionary and the sum of those frequencies.
Preferably, when one of the at least two dimensions is the order dimension, the sorting parameter unit comprises:
A scoring unit, configured to assign scores according to the order of the representative definitions in the current dictionary, where a representative definition placed earlier receives a higher score than one placed later;
The contribution degree vector unit comprises:
An order contribution degree vector unit, configured to compute the order contribution degree vector of the representative definitions from those scores.
Preferably, when one of the at least two dimensions is the standardization dimension, the sorting parameter unit comprises:
A standardization unit, configured to identify non-standard characters in the representative definitions and record which representative definition each such character belongs to;
The contribution degree vector unit comprises:
A standardization contribution degree vector unit, configured to generate the standardization contribution degree vector of the representative definitions by reducing, in an initialized standardization contribution degree vector, the values of the representative definitions that contain those non-standard characters, according to the recorded correspondence.
Compared with the prior art, the invention has the following beneficial effects:
The invention computes contribution degree vectors of the target items on at least two dimensions, assigns each dimension a weight according to its contribution, combines the resulting contribution degree vectors, and then screens and sorts the target items by the composite contribution degree vector. The resulting order is more accurate than manual screening and better matches users' habits.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of Embodiment 1 of the invention;
Fig. 2 is a flowchart of Embodiment 2 of the invention;
Fig. 3 is a flowchart of Embodiment 3 of the invention;
Fig. 4 is a flowchart of Embodiment 4 of the invention;
Fig. 5 is a flowchart of Embodiment 5 of the invention;
Fig. 6 is a flowchart of Embodiment 6 of the invention;
Fig. 7 is a flowchart of Embodiment 7 of the invention;
Fig. 8 is a flowchart of Embodiment 8 of the invention;
Fig. 9 is a flowchart of Embodiment 9 of the invention;
Fig. 10 is a structural diagram of device Embodiment 10 of the invention.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments fall within the scope of protection of the invention.
Referring to Fig. 1, Embodiment 1 of the invention provides a method for sorting according to contribution degree. The method comprises:
S11: Determine the target items, and select at least two dimensions according to the characteristics of the target items.
In the invention, a target item is an object that needs to be sorted, such as the definitions of an entry in a dictionary, the results returned by a search engine, or the comments in a forum.
Target items with different characteristics are influenced by different dimensions. For a search engine, for example, whether a result should appear near the top depends on the page views of the corresponding web page, and much less on how often the words or characters in the result occur in a corpus. The dimensions must therefore be selected according to the characteristics of the target items when the invention is applied.
The embodiments of the invention take the definitions of an entry in a dictionary as the example. For dictionary definitions, in specific embodiments the at least two dimensions are preferably at least two of the following: word frequency; length frequency; frequency of occurrence in example sentences; frequency of occurrence in a corpus; frequency of occurrence in other dictionaries; order; standardization. Generally, the more dimensions are used, the more factors are taken into account and the more accurate the sort. In a preferred embodiment, when definitions have identical contribution degree vectors on the selected dimensions, further dimensions can be selected and computed to separate them.
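The fallback to further dimensions for ties can be expressed as a lexicographic sort key: first the composite value, then each extra dimension in turn. This is one possible reading, assuming the extra dimensions are supplied as per-item values where larger is better; the item names and numbers below are hypothetical.

```python
from typing import Dict, List, Sequence


def rank_with_tiebreak(
    items: Sequence[str],
    composite: Dict[str, float],
    extra_dimensions: Sequence[Dict[str, float]],
) -> List[str]:
    """Sort by composite value, then by each extra dimension in turn (all descending)."""
    return sorted(
        items,
        # Negate the values so that the ascending sort puts larger values first.
        key=lambda item: (-composite[item], *(-dim[item] for dim in extra_dimensions)),
    )
```

If W1 and W2 both have a composite value of 0.4, an extra dimension giving W2 0.3 and W1 0.1 places W2 first.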
A dictionary entry often has several parts of speech; for example, an entry may be both a noun and an adjective, and users expect definitions of the same part of speech to be listed together. For ease of description and understanding, in the embodiments of the invention the selected definitions are the definitions of one part of speech of one entry.
An entry may have synonymous definitions, that is, definitions that express the same or a similar meaning in different words; in a dictionary they usually share the same label (sense number). For example, the Chinese gloss of the English word *same* is: adj. 1. 同一的 ("the same one"); 2. 相同的; 一模一样的; (与...)相同的; 同样的 ("identical; exactly alike; identical (with ...); likewise"). Among the definitions of *same*, the single definition labeled 1 forms one group of synonymous definitions, and the four definitions labeled 2 form another. When sorting, the definitions in a synonym group are kept together, so only one definition from each group is selected as the representative definition; this avoids a confused order and speeds up sorting. The groups are then sorted by computing the contribution degree vectors of the representative definitions on at least two dimensions.
Specifically, the first definition in each synonym group can be chosen as the representative definition. Denote the representative definitions by Wi; for *same* above, W1 is 同一的 ("same") and W2 is 相同的 ("identical"). The representative definition can also be chosen by computation, for example by computing a value for each synonymous definition with the same method used below for the word frequency contribution degree vector and choosing the definition with the largest value. The invention does not limit the specific selection process.
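Selecting one representative per synonym group can be sketched as follows, assuming the definitions arrive as (label, definition) pairs in dictionary order. The default takes the first definition of each group, as the text describes; the optional `score` function stands for the computed alternative and is hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def select_representatives(
    labelled: Sequence[Tuple[int, str]],
    score: Optional[Callable[[str], float]] = None,
) -> List[str]:
    """Pick one representative definition per synonym group (same label)."""
    groups: Dict[int, List[str]] = {}
    for label, definition in labelled:
        groups.setdefault(label, []).append(definition)  # dicts keep insertion order
    if score is None:
        # Default rule: the first definition of each group.
        return [defs[0] for defs in groups.values()]
    # Alternative rule: the definition with the largest score (the first one wins ties).
    return [max(defs, key=score) for defs in groups.values()]
```

For the glosses of *same*, the default rule yields `["同一的", "相同的"]`, that is, W1 and W2.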
S12: Compute the sorting parameters of the target items on the at least two dimensions.
A sorting parameter is a property of the target item itself that influences its position in the sort. Sorting parameters differ with the dimension and with the kind of target item.
S13: Compute, according to the sorting parameters, the contribution degree vectors of the target items on the at least two dimensions.
S14: Compute a composite contribution degree vector from the normalized contribution degree vectors according to the weights of the dimensions.
Because each dimension has its own computation rule and base, the contribution degree vectors must first be normalized before they can be combined.
The normalized contribution degree vectors are then combined to generate the composite contribution degree vector.
This yields a composite contribution degree vector P = (V1, V2, ..., Vn), whose components correspond, in order, to the representative definitions (W1, W2, ..., Wn).
S15, according to comprehensive contribution degree vector to the ordering target item sort.
Concrete can be according to the ordering target item ordering to correspondence of the size of comprehensive contribution degree vector.Usually, vector value is more big, and it is more forward to sort.
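Steps S14 and S15 can be sketched in Python as follows. The sketch makes two assumptions. First, each dimension's vector is normalized by dividing by its sum, which is the form used for P1 to P5 below. Second, the weights are supplied as a dictionary keyed by dimension name. The names `normalize`, `combine` and `rank`, and the dimension keys, are hypothetical.

```python
def normalize(vec: list[float]) -> list[float]:
    """Scale a contribution-degree vector so its components sum to 1."""
    total = sum(vec)
    # An all-zero dimension carries no ordering information; keep it at zero
    return [v / total for v in vec] if total else [0.0] * len(vec)


def combine(vectors: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Step S14: weighted sum of the normalized per-dimension vectors."""
    n = len(next(iter(vectors.values())))
    combined = [0.0] * n
    for dim, vec in vectors.items():
        w = weights.get(dim, 0.0)  # a dimension without a weight contributes nothing
        for i, v in enumerate(normalize(vec)):
            combined[i] += w * v
    return combined


def rank(items: list[str], combined: list[float]) -> list[str]:
    """Step S15: order items by descending combined value; ties keep input order."""
    order = sorted(range(len(items)), key=lambda i: combined[i], reverse=True)
    return [items[i] for i in order]


items = ["同一的", "相同的"]
vectors = {"char_freq": [4, 11 / 3], "length_freq": [4, 4]}
weights = {"char_freq": 0.6, "length_freq": 0.4}  # hypothetical weights
p = combine(vectors, weights)
print(rank(items, p))  # ['同一的', '相同的']
```

Python's `sorted` stays stable with `reverse=True`, so items with equal combined values keep their current dictionary order.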
The calculation of the contribution-degree vector on each dimension is described below, using the representative definition items as the example.
Embodiment 2 of the present invention computes the contribution-degree vector of the definition items on the character-frequency dimension.
Character frequency is chosen as a ranking dimension for the following reason. The more often the characters of a representative definition item occur, the more likely that item is to be used, that is, the more common it is.
First, the ranking parameters of the representative definition items on the at least two dimensions are collected. Referring to Fig. 2, this specifically includes:
S21: Count the frequency of each character across all definition items of the entry.
Denote each character A_i. Counting the frequency of each character across all definition items of "same" gives: A_1 = 同, frequency 4; A_2 = 一, frequency 3; A_3 = 的, frequency 5; A_4 = 相, frequency 2; A_5 = 样, frequency 2; A_6 = 模, frequency 1. These statistics can be stored in %hash so that later calculations can use them conveniently.
In particular, characters with no practical meaning, such as brackets or the character 与 ("with"), can be deleted from the definition items in advance so that they do not affect the ranking. This does not mean that every 与 is deleted. Only occurrences that make no real contribution to the statistics of this technical solution are removed. In practice, this can be decided by checking whether the character lies inside brackets, as with 与 in "(与…)相同的" in the example above. This is only one concrete approach, and the present invention does not limit it.
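The bracket heuristic above could be sketched in Python as follows. This sketch assumes that both half-width and full-width bracket pairs, together with everything inside them, count as "no practical meaning". The regular expression and the name `strip_meaningless` are illustrative choices, not the patent's own rule.

```python
import re

# A half-width or full-width bracket pair and the text inside it (assumed pattern)
BRACKETED = re.compile(r"[(（][^()（）]*[)）]")


def strip_meaningless(definition: str) -> str:
    """Remove bracketed annotations such as '(与…)' before counting characters."""
    return BRACKETED.sub("", definition).strip()


print(strip_meaningless("(与…)相同的"))  # 相同的
```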
S22: Compute the average character frequency of each representative definition item from the counted frequencies.
A representative definition item that contains many characters has a large total character frequency and could be mistaken for a common item. To avoid this, the average character frequency of the item is used instead.
Denote the characters of a representative definition item Z_1, Z_2, ..., Z_m, so that W_i = Z_1Z_2...Z_m. The character frequency of W_i is the sum of the frequencies of its characters, and its average character frequency is that sum divided by the number of characters m. Denote the average character frequency SCORE(W_i), and let hash{Z_j} be the frequency of Z_j stored in %hash. Then:
$$SCORE(W_i) = \frac{1}{m} \sum_{j=1}^{m} \mathrm{hash}\{Z_j\}$$
For example, for "same", W_2 = Z_1Z_2Z_3 with Z_1 = 相, Z_2 = 同 and Z_3 = 的, so SCORE(W_2) = (2 + 4 + 5)/3 = 11/3.
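S21 and S22 can be sketched in Python as follows. The sketch uses `collections.Counter` in place of the %hash table and `fractions.Fraction` so that the averages are exact and match the worked example. The item list assumes that the bracketed "(与…)" has already been removed. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction


def char_frequencies(definitions: list[str]) -> Counter:
    """S21: frequency of every character across all definition items of an entry."""
    return Counter(ch for d in definitions for ch in d)


def average_char_frequency(rep: str, freq: Counter) -> Fraction:
    """S22: SCORE(W_i), the mean frequency of the characters in a representative item."""
    return Fraction(sum(freq[ch] for ch in rep), len(rep))


# All definition items of "same", with the bracketed '(与…)' already removed
cleaned = ["同一的", "相同的", "同样的", "相同的", "一模一样的"]
freq = char_frequencies(cleaned)
print(freq["同"], freq["一"], freq["的"])          # 4 3 5
print(average_char_frequency("同一的", freq))      # 4
print(average_char_frequency("相同的", freq))      # 11/3
```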
The contribution-degree vectors of the representative definition items on the at least two dimensions are then computed from the ranking parameters. Referring to Fig. 2, this specifically includes:
S23: Compute the character-frequency contribution-degree vector of the representative definition items from each item's average character frequency and the sum of the average character frequencies.
The sum of the average character frequencies is the total over all representative definition items. Denote it SUM:
$$SUM = \sum_{i=1}^{n} SCORE(W_i)$$
For "same", SCORE(W_1) = (4 + 3 + 5)/3 = 4 for 同一的, so SUM = SCORE(W_1) + SCORE(W_2) = 4 + 11/3 = 23/3.
Denote the character-frequency contribution-degree vector P1. Then P1 = (SCORE(W_1), SCORE(W_2), ..., SCORE(W_n))/SUM.
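Dividing the scores by SUM in S23 is the same operation the text later applies for P2 to P5. A minimal Python sketch follows. It uses `fractions.Fraction` so the result reproduces the exact values of the "same" example. The name `contribution_vector` is illustrative, and returning zeros when SUM is 0 is an added assumption.

```python
from fractions import Fraction


def contribution_vector(scores: list[Fraction]) -> list[Fraction]:
    """P = (SCORE(W_1), ..., SCORE(W_n)) / SUM, with SUM the sum of all scores."""
    total = sum(scores)
    if total == 0:
        # Assumed guard: a dimension with no signal contributes zeros
        return [Fraction(0)] * len(scores)
    return [s / total for s in scores]


scores = [Fraction(4), Fraction(11, 3)]   # average character frequencies from S22
print(sum(scores))                        # 23/3
print(contribution_vector(scores))        # [Fraction(12, 23), Fraction(11, 23)]
```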
Embodiment 3 of the present invention computes the contribution-degree vector of the representative definition items on the length dimension.
Length frequency is chosen as a ranking dimension for the following reason. If the length of a representative definition item occurs often among the lengths of the definition items, that length is a common one. In other words, the representative item of that length is commonly used or better matches users' habits.
First, the ranking parameters of the representative definition items on the at least two dimensions are collected. Referring to Fig. 3, this specifically includes:
S31: Count the length of every definition item of the entry and the frequency of each length.
In this embodiment, the length of a definition item is the number of characters it contains, with one character as one unit of length. The length frequency is the number of times a given length occurs. For example, the five definition items of "same" have lengths 3, 3, 3, 3 and 5, so length 3 has frequency 4 and length 5 has frequency 1. The lengths and length frequencies can be stored in %hash_length. For text in languages other than Chinese, length may be measured in words or in phrases, as the situation requires.
S32: Obtain the length frequency of each representative definition item from these statistics.
For representative definition item W_i, denote its length frequency SCORE(W_i). It is read directly from %hash_length: SCORE(W_i) = $hash_length{length(W_i)}.
Taking the first representative definition item of "same", 同一的, as an example: its length is 3, and the frequency of that length is 4.
Computing the contribution-degree vectors of the representative definition items on the at least two dimensions from the ranking parameters then includes:
S33: Compute the length contribution-degree vector of the representative definition items from their length frequencies and the sum of those length frequencies.
The sum of the length frequencies of the representative definition items is:
$$SUM = \sum_{i=1}^{n} SCORE(W_i)$$
Denote the length contribution-degree vector P2. Then P2 = (SCORE(W_1), SCORE(W_2), ..., SCORE(W_n))/SUM.
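S31 to S33 can be sketched together in Python as follows. The sketch assumes that length is measured in characters, which suits Chinese definition items, and uses a `Counter` in the role of %hash_length. The name `length_contribution` is illustrative.

```python
from collections import Counter
from fractions import Fraction


def length_contribution(all_defs: list[str], reps: list[str]) -> list[Fraction]:
    """S31-S33: length-frequency contribution-degree vector P2."""
    hash_length = Counter(len(d) for d in all_defs)   # S31: length -> frequency
    scores = [hash_length[len(w)] for w in reps]       # S32: SCORE(W_i)
    total = sum(scores)                                # S33: SUM
    return [Fraction(s, total) for s in scores]


cleaned = ["同一的", "相同的", "同样的", "相同的", "一模一样的"]
print(Counter(len(d) for d in cleaned))                 # Counter({3: 4, 5: 1})
print(length_contribution(cleaned, ["同一的", "相同的"]))  # [Fraction(1, 2), Fraction(1, 2)]
```

Both representatives of "same" have length 3, so this dimension does not separate them. That is one reason the method combines several dimensions.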
Embodiment 4 of the present invention computes the contribution-degree vector on the dimension of how often the definition items occur in example sentences.
Dictionaries usually provide example sentences for some or all definition items to help readers understand them. Definition items that occur more often in example sentences are generally more important and more common, so how often an item occurs in example sentences is relevant to its ranking.
First, the ranking parameters of the representative definition items on the at least two dimensions are collected. Referring to Fig. 4, this step specifically includes:
S41: Count how often each representative definition item occurs in the example sentences of the query result.
In this counting process, only the occurrences of the representative definition items in the example sentences are counted.
Alternatively, the occurrences of all definition items in the example sentences can be counted, with the occurrences of each synonymous item credited to its representative. For example, the occurrences of 同样的, (与…)相同的 and 一模一样的 from the second synonym group of "same" are added to the count of the representative item 相同的.
Denote the number of times representative definition item W_i occurs in the example sentences SCORE(W_i). Then SCORE(W_i) = times(W_i) in example, that is, the number of times W_i occurs in the example sentences.
S42: Compute the sum of the occurrence counts of the representative definition items in the example sentences of the query result.
Denote this sum SUM:
$$SUM = \sum_{i=1}^{n} SCORE(W_i)$$
Referring to Fig. 4, computing the contribution-degree vectors of the representative definition items on the at least two dimensions from the ranking parameters includes:
S43: Compute the example-sentence frequency contribution-degree vector of the representative definition items from their occurrence counts in the example sentences of the query result and the sum of those counts.
Denote the example-sentence frequency contribution-degree vector P3. Then P3 = (SCORE(W_1), SCORE(W_2), ..., SCORE(W_n))/SUM.
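The variant of S41 that credits synonym occurrences to the representative can be sketched in Python as follows. The sketch rests on several assumptions. Occurrences are counted as plain substring matches. `groups` maps each representative to its synonym-group members, including itself. The example sentences are hypothetical. Plain substring matching would double-count a member that contains another member; for example, "(与…)相同的" contains "相同的". For that reason, this sketch lists only non-overlapping members.

```python
from fractions import Fraction


def example_sentence_contribution(groups: dict[str, list[str]],
                                  sentences: list[str]) -> list[Fraction]:
    """S41-S43: example-sentence frequency contribution-degree vector P3.

    Occurrences of any member of a synonym group are credited to the
    group's representative (the dict key).
    """
    scores = [
        sum(s.count(member) for s in sentences for member in members)
        for members in groups.values()
    ]
    total = sum(scores)
    if total == 0:
        return [Fraction(0)] * len(scores)
    return [Fraction(c, total) for c in scores]


groups = {"同一的": ["同一的"], "相同的": ["相同的", "同样的", "一模一样的"]}
sentences = ["两份报告的结论是相同的。", "我也有同样的问题。", "他们来自同一个城市。"]
print(example_sentence_contribution(groups, sentences))  # [Fraction(0, 1), Fraction(1, 1)]
```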
Embodiment 5 of the present invention computes the contribution-degree vector on the dimension of how often the definition items occur in a corpus.
When a definition item occurs many times in a corpus, the item itself is clearly common.
First, the ranking parameters of the representative definition items on the at least two dimensions are collected. Referring to Fig. 5, this step specifically includes:
S51: Count how often each representative definition item occurs in the word-frequency list of the corpus.
Existing corpora range in size from the million scale upward. To query over a wider range, a preferred embodiment of the invention uses a larger, multi-million-scale corpus.
A corpus usually contains text in several languages, so the part that matches the language of the definition items must be extracted. For example, when the definition items are Chinese, the Chinese part of the corpus is extracted. This Chinese text is then segmented with a vocabulary-based segmentation method, where the vocabulary may be the extracted definition items. The number of times each representative definition item occurs is then counted and saved in %hash_fre. Each representative definition item W_i thus has a frequency SCORE(W_i) = $hash_fre{W_i}.
S52: Compute the sum of the frequencies of the representative definition items in the corpus word-frequency list.
Specifically, the sum can be defined as:
$$SUM = \sum_{i=1}^{n} SCORE(W_i)$$
Computing the contribution-degree vectors of the representative definition items on the at least two dimensions from the ranking parameters then specifically includes (see Fig. 5):
S53: Compute the corpus-frequency contribution-degree vector of the representative definition items.
Define the corpus-frequency contribution-degree vector P4 = (SCORE(W_1), SCORE(W_2), ..., SCORE(W_n))/SUM.
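The text names only a "vocabulary-based segmentation method". The Python sketch below substitutes forward maximum matching, a common dictionary-based technique, as an assumed stand-in, and uses the representative items as the vocabulary. It also assumes that the corpus string is already the extracted Chinese part. The corpus text and function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def fmm_segment(text: str, vocab: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest vocabulary
    word, falling back to a single character when nothing matches."""
    max_len = max(map(len, vocab))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def corpus_contribution(corpus: str, reps: list[str]) -> list[Fraction]:
    """S51-S53: corpus-frequency contribution-degree vector P4."""
    vocab = set(reps)
    hash_fre = Counter(t for t in fmm_segment(corpus, vocab) if t in vocab)
    scores = [hash_fre[w] for w in reps]
    total = sum(scores)
    if total == 0:
        return [Fraction(0)] * len(scores)
    return [Fraction(c, total) for c in scores]


corpus = "结论是相同的。方法是相同的。来源是同一的。"  # hypothetical corpus text
print(corpus_contribution(corpus, ["同一的", "相同的"]))  # [Fraction(1, 3), Fraction(2, 3)]
```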
Embodiment 6 of the present invention computes the contribution-degree vector on the dimension of how often the definition items occur in other dictionaries.
The more often a definition item occurs in other dictionaries, the more important and common it is. The importance and commonness of a definition item can therefore be judged from how often it occurs in other dictionaries, and the items ranked accordingly.
First, the ranking parameters of the representative definition items on the at least two dimensions are collected. Referring to Fig. 6, this specifically includes:
S61: Count how often each representative definition item occurs among the definition items of the same entry in other dictionaries.
The definition items of the corresponding headword in other dictionaries are collected according to the entry of the representative definition items and stored in %hash_mini_dict. Denote the number of times representative definition item W_i occurs among them SCORE(W_i). Then SCORE(W_i) = times(W_i) in $hash_mini_dict{word}.
S62: Compute the sum of these occurrence counts in the definition items of the entry in other dictionaries. Define the sum as:
$$SUM = \sum_{i=1}^{n} SCORE(W_i)$$
Computing the contribution-degree vectors of the representative definition items on the at least two dimensions from the ranking parameters then specifically includes (see Fig. 6):
S63: Compute the other-dictionary frequency contribution-degree vector of the representative definition items from each item's occurrence count among the definition items of the entry in other dictionaries and the sum of those counts.
Specifically, define the other-dictionary frequency contribution-degree vector of the representative definition items as P5 = (SCORE(W_1), SCORE(W_2), ..., SCORE(W_n))/SUM.
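S61 to S63 can be sketched in Python as follows. The sketch assumes that `hash_mini_dict` maps a headword to a list of definition strings collected from other dictionaries. It also interprets times(W_i) as substring occurrences summed over those strings. The data and the name `other_dict_contribution` are hypothetical.

```python
from fractions import Fraction


def other_dict_contribution(hash_mini_dict: dict[str, list[str]], word: str,
                            reps: list[str]) -> list[Fraction]:
    """S61-S63: other-dictionary frequency contribution-degree vector P5."""
    others = hash_mini_dict.get(word, [])
    # times(W_i) in $hash_mini_dict{word}: substring occurrences across those items
    scores = [sum(d.count(w) for d in others) for w in reps]
    total = sum(scores)
    if total == 0:
        return [Fraction(0)] * len(scores)
    return [Fraction(c, total) for c in scores]


mini = {"same": ["相同的", "同一的", "相同的；一样的", "同样的"]}  # hypothetical data
print(other_dict_contribution(mini, "same", ["同一的", "相同的"]))
# [Fraction(1, 3), Fraction(2, 3)]
```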
This embodiment can compute the contribution-degree vector separately for each of several dictionaries. Taking the characteristics of each dictionary into account helps to obtain a more valuable contribution-degree vector.
Embodiment 7 of the present invention computes the contribution-degree vector of the definition items on the order dimension.
The order of the definition items in the current dictionary reflects, to some extent, their importance and how common they are. The current order can therefore be used as a dimension.
Collecting the ranking parameters of the representative definition items on the at least two dimensions specifically includes (see Fig. 7):
S71: Assign scores according to the order of the representative definition items in the current dictionary, so that an earlier item receives a higher score than a later one.
For example, with n representative definition items, the first is assigned a score of n, the scores decrease by one for each subsequent item, and the last is assigned a score of 1.
Computing the contribution-degree vectors of the representative definition items on the at least two dimensions from the ranking parameters then specifically includes (see Fig. 7):
S72: Compute the order contribution-degree vector of the representative definition items from the scores.
Define the order contribution-degree vector of the representative definition items as P6.
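The source does not give the formula for P6. The Python sketch below assumes, by analogy with P1 to P5, that the rank scores n, n-1, ..., 1 are divided by their sum n(n+1)/2. The name `order_contribution` is illustrative.

```python
from fractions import Fraction


def order_contribution(reps: list[str]) -> list[Fraction]:
    """S71-S72: order scores n, n-1, ..., 1, divided by their sum (assumed form of P6)."""
    n = len(reps)
    scores = list(range(n, 0, -1))   # first item gets n, last item gets 1
    total = n * (n + 1) // 2         # sum of 1..n
    return [Fraction(s, total) for s in scores]


print(order_contribution(["同一的", "相同的"]))  # [Fraction(2, 3), Fraction(1, 3)]
```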
Embodiment 8 of the present invention computes the contribution-degree vector of the definition items on the standardization dimension.
Nonstandard characters interfere with the normal use of a representative definition item. Nonstandard usage in the representative definition items can therefore be counted and used as a basis for ranking.
First, the ranking parameters of the representative definition items on the at least two dimensions are collected. Referring to Fig. 8, this specifically includes:
S81: Identify the nonstandard characters in the representative definition items and record which representative item each one belongs to.
A nonstandard character is one that does not conform to normal usage conventions. Examples include a character or symbol that does not exist, a bracket of which only one half is present, or a combination of two characters that does not exist.
In a concrete application, commonly made standardization errors can be collected in advance into an error-correction database. The nonstandard characters in a representative definition item are then found by comparing its characters against that database.
Then, according to the ranking parameters, the contribution-degree vectors of the representative definition items on the at least two dimensions are computed. Referring to Fig. 8, this specifically includes:
S82: According to the nonstandard characters found and the representative items they belong to, reduce the corresponding components of the initialized standardization contribution-degree vector.
Specifically, the initial contribution-degree vector of the standardization dimension can be defined as P7 = (V_1, V_2, ..., V_n) = (1, 1, ..., 1).
When a representative definition item contains a nonstandard character, its component is reduced, for example by 0.1 per occurrence. This yields the contribution-degree vector of the standardization dimension.
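S81 and S82 can be sketched in Python as follows. The sketch assumes the error-correction database is a set of known bad strings and counts each unpaired bracket as one nonstandard occurrence. It also clamps the result at 0, which is an added assumption. The example error `"一样得"` (的/得 confusion) and the name `standardization_vector` are hypothetical.

```python
def standardization_vector(reps: list[str], error_db: set[str],
                           penalty: float = 0.1) -> list[float]:
    """S81-S82: start every component at 1 and subtract `penalty` per nonstandard hit."""
    vec = []
    for w in reps:
        # Known mistakes collected in advance in the (hypothetical) error database
        hits = sum(w.count(bad) for bad in error_db)
        # A bracket whose partner is missing also counts as one nonstandard hit
        hits += abs(w.count("(") - w.count(")")) + abs(w.count("（") - w.count("）"))
        vec.append(max(0.0, 1.0 - penalty * hits))  # clamping at 0 is an assumption
    return vec


print(standardization_vector(["相同的", "一模一样得)"], {"一样得"}))
```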
In preferred Embodiment 9 of the present invention, the definition items of one part of speech of an entry are ranked using all seven of the above dimensions at once. The process is shown in Fig. 9:
S91: Determine all definition items of an entry and select a representative definition item W_i from each synonym group. This gives the sequence to be ranked, (W_1, W_2, ..., W_n).
S92: Collect the ranking parameters of the representative definition items and compute their contribution-degree vectors on the seven dimensions: P1 = (V_1, ..., V_n), P2 = (V_1, ..., V_n), P3 = (V_1, ..., V_n), P4 = (V_1, ..., V_n), P5 = (V_1, ..., V_n), P6 = (V_1, ..., V_n) and P7 = (V_1, ..., V_n).
S93: Normalize the seven contribution-degree vectors P1 to P7, and assign each normalized vector a weight according to its contribution.
S94: Combine the seven normalized and weighted contribution-degree vectors to obtain the comprehensive contribution-degree vector P = (V_1, V_2, ..., V_n).
S95: Rank the representative definition items according to the comprehensive contribution-degree vector.
Specifically, the representative definition items are arranged from front to back in descending order of their vector values.
The comprehensive contribution is configured through a configuration file that sets the contribution (weight) of each dimension. This yields a reasonable combined vector, and ranking the definition items by that vector achieves the intended result.
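The text states only that the weights come from "a configuration file". The Python sketch below assumes a JSON file keyed by hypothetical dimension names. Rescaling the weights so that they sum to 1 is a design choice of this sketch, not something the source specifies. The vectors passed to `comprehensive_vector` are assumed to be already normalized (S93).

```python
import json

# Hypothetical keys for the seven dimensions of Embodiment 9
DIMENSIONS = ("char_freq", "length_freq", "example_freq", "corpus_freq",
              "other_dict_freq", "order", "standardization")


def load_weights(config_text: str) -> dict[str, float]:
    """Read per-dimension weights from JSON and rescale them to sum to 1."""
    raw = json.loads(config_text)
    missing = [d for d in DIMENSIONS if d not in raw]
    if missing:
        raise ValueError(f"missing weights for: {missing}")
    total = sum(float(raw[d]) for d in DIMENSIONS)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {d: float(raw[d]) / total for d in DIMENSIONS}


def comprehensive_vector(normalized: dict[str, list[float]],
                         weights: dict[str, float]) -> list[float]:
    """S94: weighted sum of the seven normalized contribution-degree vectors."""
    n = len(normalized[DIMENSIONS[0]])
    return [sum(weights[d] * normalized[d][i] for d in DIMENSIONS) for i in range(n)]


config = ('{"char_freq": 3, "length_freq": 1, "example_freq": 2, "corpus_freq": 2, '
          '"other_dict_freq": 1, "order": 2, "standardization": 1}')
weights = load_weights(config)
print(round(weights["char_freq"], 3))  # 0.25
```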
Note that in the above embodiments the steps are executed by a computer.
Embodiment 10 of the present invention further provides a device for ranking according to contribution-degree vectors. Referring to Fig. 10, the device includes:
Determining unit 101, configured to determine the items to be ranked.
Selecting unit 102, configured to select at least two dimensions according to the features of the items to be ranked.
In the present invention, the definition items of dictionary entries are used as the example. For the definition items of an entry, specific embodiments preferably use at least two of the following dimensions: character frequency; length frequency; occurrence frequency in example sentences; occurrence frequency in a corpus; occurrence frequency in other dictionaries; order; standardization. The more dimensions are used, the more factors are considered and the more accurate the ranking. In a preferred embodiment, if definition items have identical contribution-degree vectors on the selected dimensions, further dimensions can be brought into the calculation.
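Falling back to further dimensions on ties can be sketched in Python as a lexicographic sort. The sketch makes two assumptions. First, the fallback vectors are consulted in a fixed order. Second, ties are detected by exact equality, so real scores might need rounding first. The name `rank_with_tiebreak` and the item labels are illustrative.

```python
def rank_with_tiebreak(items: list[str], primary: list[float],
                       fallbacks: list[list[float]]) -> list[str]:
    """Rank by the primary vector; on equal values, consult fallback dimensions in turn."""
    keys = list(zip(primary, *fallbacks))  # one tuple of per-dimension values per item
    # Tuples compare element by element, so later dimensions only matter on ties
    order = sorted(range(len(items)), key=lambda i: keys[i], reverse=True)
    return [items[i] for i in order]


print(rank_with_tiebreak(["W1", "W2", "W3"], [0.5, 0.5, 0.2], [[0.1, 0.3, 0.9]]))
# ['W2', 'W1', 'W3']
```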
A dictionary entry usually has several parts of speech; for example, an entry may be both a noun and an adjective. Users expect the definition items of one part of speech to appear together. For ease of description and understanding, "definition items" in the embodiments of the invention means the definition items of one part of speech of one entry.
An entry may have synonymous definition items. These are items that express the same or a similar meaning in different wording, and in a dictionary they usually share the same label. When the definition items are ranked, the items of a synonym group are kept together. To avoid scrambling the order, only one item from each synonym group may therefore be selected as the representative definition item. Ranking is then performed by computing the contribution-degree vectors of the representative definition items on at least two dimensions.
Specifically, the first item of each synonym group may be selected as the representative definition item, denoted W_i. The representative may also be selected by computation. For example, each item in a synonym group can be scored with the same method used for the character-frequency contribution-degree vector, and the item with the largest value chosen. The present invention does not limit the specific selection procedure.
Ranking parameter unit 103, configured to collect the ranking parameters of the items to be ranked on the at least two dimensions.
Ranking parameters are properties of the items to be ranked that influence their order. They differ according to the dimension and to the items being ranked. For example, suppose the items to be ranked are the representative definition items of an entry and the selected dimension is character frequency. The parameters influencing the ranking are then the frequency of each character in the representative items and the average character frequency of each representative item.
Contribution-degree vector unit 104, configured to compute, from the ranking parameters, the contribution-degree vectors of the items to be ranked on the at least two dimensions.
Comprehensive contribution-degree vector unit 105, configured to compute the comprehensive contribution-degree vector from the normalized contribution-degree vectors according to the weights of the dimensions.
Each dimension uses a different calculation rule and base, so the contribution-degree vectors must first be normalized before they can be combined.
The normalized contribution-degree vectors are then combined to produce the comprehensive contribution-degree vector.
Ranking unit 106, configured to rank the items to be ranked according to the comprehensive contribution-degree vector.
Specifically, the items are ranked by the magnitude of their components in the comprehensive contribution-degree vector. Generally, the larger the value, the earlier the item is placed.
In Embodiment 11 of the present invention, when one of the selected dimensions is character frequency, ranking parameter unit 103 includes:
A character-frequency unit, configured to count the frequency of each character across all definition items of the entry.
An average character-frequency unit, configured to compute the average character frequency of each representative definition item from the counted frequencies.
Contribution-degree vector unit 104 includes:
A character-frequency contribution-degree vector unit, configured to compute the character-frequency contribution-degree vector of the representative definition items from their average character frequencies and the sum of those average frequencies.
In Embodiment 12 of the present invention, when one of the selected dimensions is length frequency, ranking parameter unit 103 includes:
A first length-frequency unit, configured to count the length of every definition item of the entry and the frequency of each length;
A second length-frequency unit, configured to obtain the frequency of the length of each representative definition item.
Contribution-degree vector unit 104 includes:
A length contribution-degree vector unit, configured to compute the length contribution-degree vector of the representative definition items from the frequencies of their lengths and the sum of those frequencies.
In Embodiment 13 of the present invention, when one of the at least two dimensions is occurrence frequency in example sentences, ranking parameter unit 103 includes:
A first example-sentence frequency unit, configured to count how often each representative definition item occurs in the example sentences of the query result.
A second example-sentence frequency unit, configured to compute the sum of those occurrence counts.
Contribution-degree vector unit 104 includes:
An example-sentence contribution-degree vector unit, configured to compute the example-sentence frequency contribution-degree vector of the representative definition items from their occurrence counts in the example sentences of the query result and the sum of those counts.
In Embodiment 14 of the present invention, when one of the at least two dimensions is occurrence frequency in a corpus, ranking parameter unit 103 includes:
A first corpus frequency unit, configured to count how often each representative definition item occurs in the word-frequency list of the corpus.
A second corpus frequency unit, configured to compute the sum of those frequencies.
Contribution-degree vector unit 104 includes:
A corpus contribution-degree vector unit, configured to compute the corpus-frequency contribution-degree vector of the representative definition items from their frequencies in the corpus word-frequency list and the sum of those frequencies.
In Embodiment 15 of the present invention, when one of the at least two dimensions is occurrence frequency in other dictionaries, ranking parameter unit 103 includes:
A first other-dictionary frequency unit, configured to count how often each representative definition item occurs among the definition items of the corresponding entry in other dictionaries.
A second other-dictionary frequency unit, configured to compute the sum of those occurrence counts.
Contribution-degree vector unit 104 includes:
An other-dictionary contribution-degree vector unit, configured to compute the other-dictionary frequency contribution-degree vector of the representative definition items from their occurrence counts among the definition items of the corresponding entry in other dictionaries and the sum of those counts.
In Embodiment 16 of the present invention, when one of the at least two dimensions is the order dimension, ranking parameter unit 103 includes:
A scoring unit, configured to assign scores according to the order of the representative definition items in the current dictionary, so that an earlier item receives a higher score than a later one.
Contribution-degree vector unit 104 includes:
An order contribution-degree vector unit, configured to compute the order contribution-degree vector of the representative definition items from their scores.
In Embodiment 17 of the present invention, when one of the at least two dimensions is the standardization dimension, ranking parameter unit 103 includes:
A standardization unit, configured to identify the nonstandard characters in the representative definition items and record which representative item each one belongs to.
In a concrete application, commonly made standardization errors can be collected in advance into an error-correction database. The nonstandard characters in a representative definition item are then found by comparing its characters against that database.
Contribution-degree vector unit 104 includes:
A standardization contribution-degree vector unit, configured to reduce the corresponding components of the initialized standardization contribution-degree vector according to the nonstandard characters found and the representative items they belong to. This produces the standardization contribution-degree vector of the representative definition items.
The method and device for sorting according to contribution degree vectors provided by the present invention have been described in detail above. Specific examples were used herein to explain the principle and embodiments of the invention; the above embodiments are intended only to help in understanding the method of the invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the invention, make changes to both the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the invention.

Claims (18)

1. A method for sorting according to contribution degree, characterized in that the method comprises:
Determining the sorting target items, and choosing at least two dimensions according to the features of the sorting target items;
Collecting statistics on the sorting parameters of the sorting target items in the at least two dimensions;
Calculating, from the sorting parameters, the contribution degree vectors of the sorting target items in the at least two dimensions;
Normalizing the contribution degree vectors and combining them, weighted by the weight of each dimension, into a comprehensive contribution degree vector;
Sorting the sorting target items according to the comprehensive contribution degree vector.
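The overall flow of claim 1 (normalize each per-dimension contribution degree vector, combine them with the dimension weights, then sort) can be sketched as below. The claims name neither the normalization nor the weights, so L1 normalization (divide by the sum) and the example weights are assumptions; the dimension names and raw values are invented for illustration.

```python
from typing import Dict, List


def normalize(vector: List[float]) -> List[float]:
    """L1-normalize a vector; an all-zero vector stays all zero (assumption)."""
    total = sum(vector)
    return [v / total for v in vector] if total else [0.0] * len(vector)


def comprehensive_vector(vectors: Dict[str, List[float]],
                         weights: Dict[str, float]) -> List[float]:
    """Weighted sum of the normalized contribution degree vectors."""
    size = len(next(iter(vectors.values())))
    combined = [0.0] * size
    for dim, vec in vectors.items():
        for i, value in enumerate(normalize(vec)):
            combined[i] += weights[dim] * value
    return combined


def sort_targets(items: List[str], vectors: Dict[str, List[float]],
                 weights: Dict[str, float]) -> List[str]:
    """Sort the target items by comprehensive contribution, highest first."""
    scores = comprehensive_vector(vectors, weights)
    order = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    return [items[i] for i in order]


items = ["堤", "河岸", "银行"]                          # current dictionary order
vectors = {"order": [3, 2, 1], "corpus": [5, 15, 80]}  # hypothetical raw values
weights = {"order": 0.5, "corpus": 0.5}                # hypothetical weights
print(sort_targets(items, vectors, weights))  # ['银行', '堤', '河岸']
```

Normalizing before weighting keeps a dimension with large raw counts (here the corpus) from dominating merely because of its scale, so the weights alone decide each dimension's influence.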
2. The method according to claim 1, characterized in that, when the sorting target items are the representative lexical or textual analysis items of a headword entry, the at least two dimensions are at least two of the following dimensions:
Word frequency; length frequency; occurrence frequency in example sentences; occurrence frequency in a corpus; occurrence frequency in dictionaries; ordering; standardization;
Collecting statistics on the sorting parameters of the sorting target items in the at least two dimensions comprises: collecting statistics on the sorting parameters of the representative lexical or textual analysis items in the at least two dimensions;
Calculating, from the sorting parameters, the contribution degree vectors of the sorting target items in the at least two dimensions comprises: calculating, from the sorting parameters, the contribution degree vectors of the representative lexical or textual analysis items in the at least two dimensions;
Sorting the sorting target items according to the comprehensive contribution degree vector comprises: sorting the representative lexical or textual analysis items according to the comprehensive contribution degree vector.
3. The method according to claim 2, characterized in that, when one of the at least two dimensions is word frequency, collecting statistics on the sorting parameters of the representative lexical or textual analysis items in the at least two dimensions comprises:
Counting the frequency of each word across all lexical or textual analysis items of the entry;
Calculating the average word frequency of each representative lexical or textual analysis item from the counted frequencies;
Calculating, from the sorting parameters, the contribution degree vectors of the representative lexical or textual analysis items in the at least two dimensions comprises:
Calculating the word frequency contribution degree vector of the representative lexical or textual analysis items from the average word frequency of each representative item and the sum of the average word frequencies of all representative items.
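Claim 3 can be sketched as follows. The sketch assumes whitespace-tokenized definition strings (a Chinese dictionary would need a word segmenter instead) and assumes each vector component is the item's average word frequency divided by the sum of the averages, which is how the claim's "average word frequency and the sum" wording reads; the example entry is invented.

```python
from collections import Counter
from typing import List


def average_word_frequency(item: str, counts: Counter) -> float:
    """Mean, over the item's words, of each word's count in the whole entry."""
    words = item.split()
    return sum(counts[w] for w in words) / len(words) if words else 0.0


def word_frequency_contribution(all_items: List[str],
                                representative: List[str]) -> List[float]:
    """Word frequency contribution degree vector of the representative items."""
    # Count every word across all definition items of the entry.
    counts = Counter(w for text in all_items for w in text.split())
    averages = [average_word_frequency(r, counts) for r in representative]
    total = sum(averages)
    return [a / total for a in averages] if total else [0.0] * len(averages)


all_items = ["money bank", "river bank", "bank of a river", "money store"]
representative = ["money bank", "bank of a river"]
# The first item scores higher: its words recur more often across the entry.
print(word_frequency_contribution(all_items, representative))
```

Averaging rather than summing keeps long definitions from winning simply because they contain more words.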
4. The method according to claim 2, characterized in that, when one of the at least two dimensions is length frequency, collecting statistics on the sorting parameters of the representative lexical or textual analysis items in the at least two dimensions comprises:
Counting the lengths of all lexical or textual analysis items of the entry and the frequency of each length;
Obtaining the frequency of the length of each representative lexical or textual analysis item;
Calculating, from the sorting parameters, the contribution degree vectors of the representative lexical or textual analysis items in the at least two dimensions comprises:
Calculating the length contribution degree vector of the representative lexical or textual analysis items from the frequency of each representative item's length and the sum of those frequencies.
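Claim 4 favours representative items whose length is common among the entry's definitions. A minimal sketch, assuming length is measured in characters and that each component is the frequency of the item's length divided by the sum of those frequencies; the definitions of the entry "bank" are illustrative.

```python
from collections import Counter
from typing import List


def length_contribution(all_items: List[str],
                        representative: List[str]) -> List[float]:
    """Length contribution degree vector of the representative items.

    Assumption: length is counted in characters, and each component is the
    frequency of the item's length divided by the sum of those frequencies.
    """
    length_counts = Counter(len(text) for text in all_items)
    freqs = [length_counts[len(r)] for r in representative]
    total = sum(freqs)
    return [f / total for f in freqs] if total else [0.0] * len(freqs)


# All definition items of a hypothetical entry "bank"; three have length 2.
all_items = ["银行", "河岸", "堤岸", "储蓄所"]
print(length_contribution(all_items, ["银行", "储蓄所"]))  # [0.75, 0.25]
```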
5. The method according to claim 2, characterized in that, when one of the at least two dimensions is the occurrence frequency in example sentences, collecting statistics on the sorting parameters of the representative lexical or textual analysis items in the at least two dimensions comprises:
Counting the frequency with which each representative lexical or textual analysis item occurs in the example sentences of the query result;
Calculating the sum of these frequencies over the representative lexical or textual analysis items;
Calculating, from the sorting parameters, the contribution degree vectors of the representative lexical or textual analysis items in the at least two dimensions comprises:
Calculating the example sentence frequency contribution degree vector of the representative lexical or textual analysis items from each item's frequency in the example sentences of the query result and the sum of those frequencies.
6. The method according to claim 2, characterized in that, when one of the at least two dimensions is the occurrence frequency in a corpus, collecting statistics on the sorting parameters of the representative lexical or textual analysis items in the at least two dimensions comprises:
Counting the frequency with which each representative lexical or textual analysis item occurs in the word frequency list of the corpus;
Calculating the sum of these frequencies over the representative lexical or textual analysis items;
Calculating, from the sorting parameters, the contribution degree vectors of the representative lexical or textual analysis items in the at least two dimensions comprises:
Calculating the corpus frequency contribution degree vector of the representative lexical or textual analysis items from each item's frequency in the word frequency list of the corpus and the sum of those frequencies.
7. The method according to claim 2, characterized in that, when one of the at least two dimensions is the occurrence frequency in dictionaries, collecting statistics on the sorting parameters of the representative lexical or textual analysis items in the at least two dimensions comprises:
Counting the frequency with which each representative lexical or textual analysis item occurs in the lexical or textual analysis items of the entry in the dictionary;
Calculating the sum of these frequencies over the representative lexical or textual analysis items;
Calculating, from the sorting parameters, the contribution degree vectors of the representative lexical or textual analysis items in the at least two dimensions comprises:
Calculating the dictionary frequency contribution degree vector of the representative lexical or textual analysis items from each item's frequency in the lexical or textual analysis items of the entry in the dictionary and the sum of those frequencies.
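Claims 5 to 7 share one pattern: a raw frequency per representative item, turned into a contribution degree vector using those frequencies and their sum. The sketch below assumes each component is the item's frequency divided by that sum, and it uses small invented data. The counting rules are also assumptions: substring occurrences for example sentences, a dictionary lookup for the corpus word frequency list, and containment for the entry's definition items.

```python
from typing import Dict, List


def share_vector(freqs: List[int]) -> List[float]:
    """Divide each frequency by the sum over all representative items."""
    total = sum(freqs)
    return [f / total for f in freqs] if total else [0.0] * len(freqs)


def example_sentence_contribution(representative: List[str],
                                  sentences: List[str]) -> List[float]:
    # Claim 5: total occurrences of each item in the query result's example sentences.
    return share_vector([sum(s.count(r) for s in sentences) for r in representative])


def corpus_contribution(representative: List[str],
                        frequency_list: Dict[str, int]) -> List[float]:
    # Claim 6: look each item up in the corpus word frequency list (0 if absent).
    return share_vector([frequency_list.get(r, 0) for r in representative])


def dictionary_contribution(representative: List[str],
                            entry_definitions: List[str]) -> List[float]:
    # Claim 7 (assumption): number of the entry's definition items in the
    # dictionary data that contain each representative item.
    return share_vector([sum(r in d for d in entry_definitions) for r in representative])


reps = ["银行", "河岸"]
sentences = ["他去银行取钱。", "银行今天关门。", "我们沿着河岸散步。"]
print(example_sentence_contribution(reps, sentences))        # 2 vs 1 occurrences
print(corpus_contribution(reps, {"银行": 900, "河岸": 100}))  # [0.9, 0.1]
print(dictionary_contribution(reps, ["银行", "银行业", "河岸", "岸"]))
```

Because all three dimensions reduce to the same `share_vector` step, only the counting source differs; that uniformity is what lets claim 1 normalize and weight them on a common scale.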
8. The method according to claim 2, characterized in that, when one of the at least two dimensions is the ordering dimension, collecting statistics on the sorting parameters of the representative lexical or textual analysis items in the at least two dimensions comprises:
Assigning a score to each representative lexical or textual analysis item according to its position in the current dictionary, where a representative item listed earlier receives a higher score than one listed later;
Calculating, from the sorting parameters, the contribution degree vectors of the representative lexical or textual analysis items in the at least two dimensions comprises:
Calculating the order contribution degree vector of the representative lexical or textual analysis items from the scores.
9. The method according to claim 2, characterized in that, when one of the at least two dimensions is the standardization dimension, collecting statistics on the sorting parameters of the representative lexical or textual analysis items in the at least two dimensions comprises:
Identifying the nonstandard characters in the representative lexical or textual analysis items and recording the correspondence between those characters and the representative items;
Calculating, from the sorting parameters, the contribution degree vectors of the representative lexical or textual analysis items in the at least two dimensions comprises:
Lowering, according to the identified nonstandard characters and the recorded correspondence, the values of the affected representative lexical or textual analysis items in an initialized standardization contribution degree vector, thereby generating the standardization contribution degree vector of the representative items.
10. A device for sorting according to contribution degree, characterized in that the device comprises:
A determining unit, configured to determine the sorting target items;
A selection unit, configured to choose at least two dimensions according to the features of the sorting target items;
A sorting parameter unit, configured to collect statistics on the sorting parameters of the sorting target items in the at least two dimensions;
A contribution degree vector unit, configured to calculate, from the sorting parameters, the contribution degree vectors of the sorting target items in the at least two dimensions;
A comprehensive contribution degree vector unit, configured to normalize the contribution degree vectors and combine them, weighted by the weight of each dimension, into a comprehensive contribution degree vector;
A sorting unit, configured to sort the sorting target items according to the comprehensive contribution degree vector.
11. The device according to claim 10, characterized in that, when the sorting target items are the representative lexical or textual analysis items of a headword entry, the at least two dimensions are at least two of the following dimensions:
Word frequency; length frequency; occurrence frequency in example sentences; occurrence frequency in a corpus; occurrence frequency in dictionaries; ordering; standardization;
The sorting parameter unit is further configured to collect statistics on the sorting parameters of the representative lexical or textual analysis items in the at least two dimensions;
The contribution degree vector unit is further configured to calculate, from the sorting parameters, the contribution degree vectors of the representative lexical or textual analysis items in the at least two dimensions;
The comprehensive contribution degree vector unit is further configured to sort the representative lexical or textual analysis items according to the comprehensive contribution degree vector.
12. The device according to claim 11, characterized in that, when one of the at least two dimensions is word frequency, the sorting parameter unit comprises:
A word frequency unit, configured to count the frequency of each word across all lexical or textual analysis items of the entry;
An average word frequency unit, configured to calculate the average word frequency of each representative lexical or textual analysis item from the counted frequencies;
The contribution degree vector unit comprises:
A word frequency contribution degree vector unit, configured to calculate the word frequency contribution degree vector of the representative lexical or textual analysis items from the average word frequency of each representative item and the sum of the average word frequencies of all representative items.
13. The device according to claim 11, characterized in that, when one of the at least two dimensions is length frequency, the sorting parameter unit comprises:
A first length frequency unit, configured to count the lengths of all lexical or textual analysis items of the entry and the frequency of each length;
A second length frequency unit, configured to obtain the frequency of the length of each representative lexical or textual analysis item;
The contribution degree vector unit comprises:
A length contribution degree vector unit, configured to calculate the length contribution degree vector of the representative lexical or textual analysis items from the frequency of each representative item's length and the sum of those frequencies.
14. The device according to claim 11, characterized in that, when one of the at least two dimensions is the occurrence frequency in example sentences, the sorting parameter unit comprises:
A first example sentence frequency unit, configured to count the frequency with which each representative lexical or textual analysis item occurs in the example sentences of the query result;
A second example sentence frequency unit, configured to calculate the sum of these frequencies over the representative lexical or textual analysis items;
The contribution degree vector unit comprises:
An example sentence contribution degree vector unit, configured to calculate the example sentence frequency contribution degree vector of the representative lexical or textual analysis items from each item's frequency in the example sentences of the query result and the sum of those frequencies.
15. The device according to claim 11, characterized in that, when one of the at least two dimensions is the occurrence frequency in a corpus, the sorting parameter unit comprises:
A first corpus frequency unit, configured to count the frequency with which each representative lexical or textual analysis item occurs in the word frequency list of the corpus;
A second corpus frequency unit, configured to calculate the sum of these frequencies over the representative lexical or textual analysis items;
The contribution degree vector unit comprises:
A corpus contribution degree vector unit, configured to calculate the corpus frequency contribution degree vector of the representative lexical or textual analysis items from each item's frequency in the word frequency list of the corpus and the sum of those frequencies.
16. The device according to claim 11, characterized in that, when one of the at least two dimensions is the occurrence frequency in dictionaries, the sorting parameter unit comprises:
A first dictionary frequency unit, configured to count the frequency with which each representative lexical or textual analysis item occurs in the lexical or textual analysis items of the entry in the dictionary;
A second dictionary frequency unit, configured to calculate the sum of these frequencies over the representative lexical or textual analysis items;
The contribution degree vector unit comprises:
A dictionary contribution degree vector unit, configured to calculate the dictionary frequency contribution degree vector of the representative lexical or textual analysis items from each item's frequency in the lexical or textual analysis items of the entry in the dictionary and the sum of those frequencies.
17. The device according to claim 11, characterized in that, when one of the at least two dimensions is the ordering dimension, the sorting parameter unit comprises:
A scoring unit, configured to assign a score to each representative lexical or textual analysis item according to its position in the current dictionary, where a representative item listed earlier receives a higher score than one listed later;
The contribution degree vector unit comprises:
An order contribution degree vector unit, configured to calculate the order contribution degree vector of the representative lexical or textual analysis items from the scores.
18. The device according to claim 11, characterized in that, when one of the at least two dimensions is the standardization dimension, the sorting parameter unit comprises:
A specification unit, configured to identify the nonstandard characters in the representative lexical or textual analysis items and to record the correspondence between those characters and the representative items;
The contribution degree vector unit comprises:
A standardization contribution degree vector unit, configured to lower, according to the identified nonstandard characters and the recorded correspondence, the values of the affected representative lexical or textual analysis items in an initialized standardization contribution degree vector, thereby generating the standardization contribution degree vector of the representative items.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110460665.7A CN103186647B (en) 2011-12-31 2011-12-31 Method and device for sequencing according to contribution degree

Publications (2)

Publication Number Publication Date
CN103186647A true CN103186647A (en) 2013-07-03
CN103186647B CN103186647B (en) 2016-05-11

Family

ID=48677816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110460665.7A Active CN103186647B (en) 2011-12-31 2011-12-31 Method and device for sequencing according to contribution degree

Country Status (1)

Country Link
CN (1) CN103186647B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1996363A (en) * 2006-01-01 2007-07-11 Tencent Technology (Shenzhen) Co., Ltd. Information displaying method and system
CN101079033A (en) * 2006-06-30 2007-11-28 Tencent Technology (Shenzhen) Co., Ltd. Integrated search result ranking system and method
CN101105815A (en) * 2007-09-06 2008-01-16 Tencent Technology (Shenzhen) Co., Ltd. Internet music file ranking method and system, search method, and search engine
CN101354712A (en) * 2008-09-05 2009-01-28 Peking University System and method for automatically extracting Chinese technical terms
EP2199926A2 (en) * 2008-12-22 2010-06-23 Sap Ag Semantically weighted searching in a governed corpus of terms

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678576A (en) * 2013-12-11 2014-03-26 Central China Normal University Full-text retrieval system based on dynamic semantic analysis
CN103678576B (en) * 2013-12-11 2016-08-17 Central China Normal University Text retrieval system based on dynamic semantic analysis
CN106294567A (en) * 2016-07-26 2017-01-04 Tencent Technology (Shenzhen) Co., Ltd. Audio sorting method and apparatus
CN107368510A (en) * 2017-04-10 2017-11-21 Koubei Holding Co., Ltd. Shop search ranking method and device
CN107193806A (en) * 2017-06-08 2017-09-22 Tsinghua University Method and device for automatically predicting word sememes
CN107193806B (en) * 2017-06-08 2019-11-22 Tsinghua University Method and device for automatically predicting word sememes
CN109961199A (en) * 2017-12-25 2019-07-02 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and apparatus for analyzing data fluctuations

Also Published As

Publication number Publication date
CN103186647B (en) 2016-05-11

Similar Documents

Publication Publication Date Title
US11475209B2 (en) Device, system, and method for extracting named entities from sectioned documents
CN106708966B (en) Junk comment detection method based on similarity calculation
CN103885938B (en) Industry spelling mistake checking method based on user feedback
US9563665B2 (en) Product search method and system
Ahmed et al. Language identification from text using n-gram based cumulative frequency addition
CN102054006B (en) Vocabulary quality excavating evaluation method and device
EP3051432A1 (en) Semantic information acquisition method, keyword expansion method thereof, and search method and system
CN104915327A (en) Text information processing method and device
CN103324609A (en) Text proofreading apparatus and text proofreading method
CN101782898A (en) Method for analyzing tendentiousness of affective words
CN107102993B (en) User appeal analysis method and device
CN110727862B (en) Method and device for generating query strategy of commodity search
CN103186647A (en) Method and device for sequencing according to contribution degree
CN107133282B (en) Improved evaluation object identification method based on bidirectional propagation
CN105843796A (en) Microblog emotional tendency analysis method and device
CN111563384A (en) Evaluation object identification method and device for E-commerce products and storage medium
CN110532354A (en) Content search method and device
CN102789452A (en) Similar content extraction method
CN112633000A (en) Method and device for associating entities in text, electronic equipment and storage medium
CN109213998A (en) Chinese wrongly written character detection method and system
CN104778283A (en) User occupation classification method and system based on microblog
CN114625834A (en) Enterprise industry information determination method and device and electronic equipment
CN104881446A (en) Searching method and searching device
CN103970732B (en) Mining method and device of new word translation
CN105787004A (en) Text classification method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Free format text: FORMER OWNER: BEIJING JINSHAN DIGITAL ENTERTAINMENT SCIENCE AND TECHNOLOGY CO., LTD.

Effective date: 20140403

Owner name: BEIJING KINGSOFT OFFICE SOFTWARE CO., LTD.

Free format text: FORMER OWNER: BEIJING JINSHAN SOFTWARE CO., LTD.

Effective date: 20140403

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100085 HAIDIAN, BEIJING TO: 100080 HAIDIAN, BEIJING

TA01 Transfer of patent application right

Effective date of registration: 20140403

Address after: Commercial Office Area C, 2nd Floor, No. 33 Xiaoying West Road, Haidian District, Beijing 100080

Applicant after: Beijing Kingsoft WPS Office Co., Ltd.

Address before: Kingsoft 33 Building No. 100085 Beijing Haidian District City 1 Xiaoying Road West

Applicant before: Beijing Jinshan Software Co., Ltd.

Applicant before: Beijing Jinshan Digital Entertainment Science and Technology Co., Ltd.

C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP03 Change of name, title or address

Address after: Commercial Office Area C, 2nd Floor, No. 33 Xiaoying West Road, Haidian District, Beijing 100085

Patentee after: Beijing Kingsoft Office Software, Inc.

Address before: Commercial Office Area C, 2nd Floor, No. 33 Xiaoying West Road, Haidian District, Beijing 100080

Patentee before: Beijing Kingsoft WPS Office Co., Ltd.