CN103678356A - Method, device and equipment for obtaining application field attribute information of keywords - Google Patents


Info

Publication number
CN103678356A
Authority
CN
China
Prior art keywords
keyword
application
value information
word
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210335806.7A
Other languages
Chinese (zh)
Other versions
CN103678356B (en)
Inventor
高徽
王平
郎文静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210335806.7A
Publication of CN103678356A
Application granted
Publication of CN103678356B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Abstract

The invention provides a method, device and equipment for obtaining application field attribute information of keywords. The method includes the steps of: firstly, obtaining initial application field characteristic value information of at least one keyword contained in each keyword set of multiple keyword sets to be processed; secondly, conducting classified statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword to obtain application field distribution characteristic value information of each keyword set; thirdly, obtaining at least one piece of first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set; fourthly, conducting statistics on the at least one piece of first application field characteristic value information of each keyword to obtain second application field characteristic value information of each keyword.

Description

A method, device and equipment for obtaining application field attribute information of keywords
Technical field
The present invention relates to the field of computer technology, and in particular to a method, device and equipment for obtaining application field attribute information of keywords.
Background
In existing web information publishing systems, the application fields of the keywords set by an information publishing user are mostly divided manually. Alternatively, a statistical classification method is used to classify the keywords and obtain an initial application field division, after which the purchase relationships between a large number of information publishing users and the keywords are used for iterative computation, finally yielding the application field attribute information of the keywords. Because the prior art relies mainly on subjective human judgment to divide keywords, the accuracy and objectivity of the division results are low. Moreover, because information publishing users do not classify their own fields accurately enough, the application field attribute information obtained from the users' keyword purchase relationships is also relatively inaccurate. Meanwhile, as search technology develops, the requirements on the accuracy and granularity of keyword application field division keep rising, and existing division methods cannot meet the demand for high accuracy and fine granularity.
Therefore, how to provide a method, device and equipment that obtain the application field attribute information of keywords accurately and efficiently has become one of the problems that urgently need to be solved.
Summary of the invention
The object of the present invention is to provide a method, device and equipment for obtaining application field attribute information of keywords.
According to one aspect of the present invention, a method for obtaining application field attribute information of keywords is provided, the method comprising the following steps:
a. obtaining initial application field characteristic value information of at least one keyword contained in each keyword set of a plurality of keyword sets to be processed, wherein each keyword set contains a plurality of keywords;
b. performing classified statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword, to obtain application field distribution characteristic value information of each keyword set;
c. obtaining at least one piece of first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs;
d. performing statistics on the at least one piece of first application field characteristic value information of each keyword, to obtain second application field characteristic value information of each keyword;
taking the second application field characteristic value information of each keyword as the initial application field characteristic value information of that keyword, and repeating steps b, c and d until a predetermined stop condition is met;
wherein the method further comprises:
w. when the predetermined stop condition is met, obtaining the application field attribute information of each keyword according to the second application field characteristic value information of that keyword.
According to another aspect of the present invention, an acquisition device for obtaining application field attribute information of keywords is also provided, the acquisition device comprising:
an initial characteristic value acquisition device, for obtaining initial application field characteristic value information of at least one keyword contained in each keyword set of a plurality of keyword sets to be processed, wherein each keyword set contains a plurality of keywords;
an application field distribution acquisition device, for performing classified statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword, to obtain application field distribution characteristic value information of each keyword set;
a first characteristic value acquisition device, for obtaining at least one piece of first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs;
a second characteristic value acquisition device, for performing statistics on the at least one piece of first application field characteristic value information of each keyword, to obtain second application field characteristic value information of each keyword;
a control device, for, when a predetermined stop condition is not met, taking the second application field characteristic value information of each keyword as the initial application field characteristic value information of that keyword, so as to control the application field distribution acquisition device, the first characteristic value acquisition device and the second characteristic value acquisition device to repeat their corresponding operations until the predetermined stop condition is met;
wherein the acquisition device further comprises:
an application field attribute acquisition device, for obtaining, when the predetermined stop condition is met, the application field attribute information of each keyword according to the second application field characteristic value information of that keyword.
Compared with the prior art, the present invention has the following advantages. By obtaining the application field distribution characteristic value information of each keyword set, the first application field characteristic value information of each keyword, which may belong to one or more keyword sets, can be obtained, and the second application field characteristic value information of each keyword can then be obtained from the perspective of multiple keyword sets. Repeating these steps iteratively yields more accurate application field attribute information for each keyword, so that the application fields of a massive number of keywords are obtained accurately and the results are more objective. At the same time, the demand of existing search technology for finer-grained application field division of keywords is met. Furthermore, accurate classification of the application fields of keywords can guide information publishing users in establishing reasonable correspondences between keywords and published information, thereby effectively optimizing their information publishing strategies.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 shows a schematic structural diagram of an acquisition device for obtaining application field attribute information of keywords according to one aspect of the present invention;
Fig. 2 shows a schematic structural diagram of an acquisition device for obtaining relevance information between each keyword and a plurality of application field attributes according to a preferred embodiment of the present invention;
Fig. 3 shows a flowchart of a method for obtaining application field attribute information of keywords according to another aspect of the present invention;
Fig. 4 shows a flowchart of a method for obtaining relevance information between each keyword and a plurality of application field attributes according to a preferred embodiment of the present invention.
In the drawings, the same or similar reference numerals denote the same or similar components.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic structural diagram of an acquisition device for obtaining application field attribute information of keywords according to one aspect of the present invention. The acquisition device of this embodiment is contained in computer equipment and comprises an initial characteristic value acquisition device 1, an application field distribution acquisition device 2, a first characteristic value acquisition device 3, a second characteristic value acquisition device 4, a control device 5 and an application field attribute acquisition device 6.
The computer equipment includes, but is not limited to, network equipment and user equipment. The user equipment includes, but is not limited to, a PC. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The network in which the user equipment and the network equipment reside includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network or a VPN.
It should be noted that the user equipment and network equipment above are only examples; other existing or future user equipment, network equipment or networks, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
First, the initial characteristic value acquisition device 1 obtains the initial application field characteristic value information of at least one keyword contained in each keyword set of a plurality of keyword sets to be processed, wherein each keyword set contains a plurality of keywords.
Here, an application field is the field to which a keyword is applied, including but not limited to an industry.
The characteristic value information includes, but is not limited to, probability information.
Specifically, the initial characteristic value acquisition device 1 looks up the keywords contained in each keyword set in a preset application field classification table, and obtains the initial application field characteristic value information of at least one keyword contained in each keyword set.
The preset application field classification table contains a plurality of keywords and their respective initial application field characteristic value information, and can be obtained by means such as corpus training. Preferably, corpus training comprises labeling the keyword corpus of each preset application field with application field attributes, performing processing such as word segmentation and part-of-speech tagging on the keyword corpus, and then training on the keyword corpus with a classification algorithm, for example a maximum-entropy-based document classification algorithm, to obtain the initial application field characteristic value information corresponding to each of the keywords.
It should be noted that the above example only serves to better illustrate the technical solution of the present invention and does not limit it. Those skilled in the art should understand that any implementation that obtains the initial application field characteristic value information of at least one keyword contained in each keyword set of a plurality of keyword sets to be processed falls within the scope of the present invention.
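The table lookup described above can be sketched as follows. This is a minimal illustration only: the table contents, the probabilities and the names `FIELD_TABLE` and `lookup_initial_values` are assumptions made for the example and do not come from the patent, and the corpus training that would produce such a table is not shown.

```python
from typing import Dict, List

# Hypothetical preset classification table: keyword -> {application field: probability}.
# In the text this table would come from corpus training; here it is hard-coded.
FIELD_TABLE: Dict[str, Dict[str, float]] = {
    "word1": {"T1": 0.9},
    "word2": {"T2": 0.6},
}


def lookup_initial_values(
    keyword_sets: Dict[str, List[str]],
    table: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """For each keyword set, return initial values for the keywords found in the table."""
    result: Dict[str, Dict[str, Dict[str, float]]] = {}
    for set_name, keywords in keyword_sets.items():
        # Keywords missing from the table simply have no initial value.
        result[set_name] = {kw: dict(table[kw]) for kw in keywords if kw in table}
    return result


keyword_sets = {"Unit1": ["word1", "word2", "word6", "word7"]}
initial = lookup_initial_values(keyword_sets, FIELD_TABLE)
```

Only "at least one" keyword per set needs an initial value, which is why keywords absent from the table are skipped rather than treated as an error.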
Then, the application field distribution acquisition device 2 performs classified statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword obtained by the initial characteristic value acquisition device 1, to obtain the application field distribution characteristic value information of each keyword set.
Specifically, for each keyword set, the application field distribution acquisition device 2 aggregates the obtained initial application field characteristic value information of the at least one keyword in the set by application field, to obtain the application field distribution characteristic value information of that keyword set.
In one example, keyword set Unit1 contains the keywords word_1, word_2, word_6 and word_7, where the initial application field characteristic value information of word_1 is the probability Wa of belonging to application field T_1, and that of word_2 is the probability Wb of belonging to application field T_2. The application field distribution acquisition device 2 performs classified statistics on Unit1 and obtains its application field distribution characteristic value information: the probability that Unit1 belongs to application field T_1 is P(T_1 | Unit1) = Wa/(Wa+Wb), and the probability that Unit1 belongs to application field T_2 is P(T_2 | Unit1) = Wb/(Wa+Wb).
It should be noted that the above example only serves to better illustrate the technical solution of the present invention and does not limit it. Those skilled in the art should understand that any implementation that performs classified statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword, to obtain the application field distribution characteristic value information of each keyword set, falls within the scope of the present invention.
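The Unit1 example can be reproduced with a short sketch. The function name `set_distribution` and the concrete values chosen for Wa and Wb are assumptions for illustration; the normalization follows the Wa/(Wa+Wb) form used in the example.

```python
from collections import defaultdict
from typing import Dict


def set_distribution(initial_values: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the initial values of a set's keywords per field, then normalize to sum to 1."""
    totals: Dict[str, float] = defaultdict(float)
    for field_probs in initial_values.values():
        for field, p in field_probs.items():
            totals[field] += p
    norm = sum(totals.values())
    if norm == 0:
        return {}  # no keyword in the set has an initial value
    return {field: t / norm for field, t in totals.items()}


# Illustrative numbers only: Wa and Wb are left symbolic in the text.
Wa, Wb = 0.9, 0.6
unit1 = {"word1": {"T1": Wa}, "word2": {"T2": Wb}}
dist_unit1 = set_distribution(unit1)
# dist_unit1["T1"] is Wa/(Wa+Wb) and dist_unit1["T2"] is Wb/(Wa+Wb)
```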
Subsequently, the first characteristic value acquisition device 3 obtains at least one piece of first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set obtained by the application field distribution acquisition device 2, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs.
Each keyword may belong to one or more keyword sets.
Specifically, the first characteristic value acquisition device 3 discretizes the obtained application field distribution characteristic value information of each keyword set onto every keyword contained in that set, thereby obtaining at least one piece of first application field characteristic value information of each keyword.
In one example, keyword set Unit1 contains the keywords word_1, word_2, word_6 and word_7, and its application field distribution characteristic value information is P(T_1 | Unit1) = Wa/(Wa+Wb), P(T_2 | Unit1) = Wb/(Wa+Wb). The first characteristic value acquisition device 3 discretizes the application field distribution characteristic value information of Unit1 onto the keywords word_1, word_2, word_6 and word_7 in the set, and obtains:
the first application field characteristic value information of word_1, word_2, word_6 and word_7 belonging to application field T_1 is P(T_1 | word_1) = P(T_1 | word_2) = P(T_1 | word_6) = P(T_1 | word_7) = Wa/(Wa+Wb),
and the first application field characteristic value information of word_1, word_2, word_6 and word_7 belonging to application field T_2 is P(T_2 | word_1) = P(T_2 | word_2) = P(T_2 | word_6) = P(T_2 | word_7) = Wb/(Wa+Wb).
In addition, keyword set Unit2 contains the keywords word_2, word_3, word_6 and word_7, and its application field distribution characteristic value information is P(T_2 | Unit2) = Wb/(Wb+Wc), P(T_3 | Unit2) = Wc/(Wb+Wc). The first characteristic value acquisition device 3 discretizes the application field distribution characteristic value information of Unit2 onto the keywords word_2, word_3, word_6 and word_7 in the set, and obtains:
the first application field characteristic value information of word_2, word_3, word_6 and word_7 belonging to application field T_2 is P(T_2 | word_2) = P(T_2 | word_3) = P(T_2 | word_6) = P(T_2 | word_7) = Wb/(Wb+Wc);
the first application field characteristic value information of word_2, word_3, word_6 and word_7 belonging to application field T_3 is P(T_3 | word_2) = P(T_3 | word_3) = P(T_3 | word_6) = P(T_3 | word_7) = Wc/(Wb+Wc).
It should be noted that the above example only serves to better illustrate the technical solution of the present invention and does not limit it. Those skilled in the art should understand that any implementation that obtains at least one piece of first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set falls within the scope of the present invention.
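A minimal sketch of this discretization step, under the assumption that "discretizing" means copying a set's distribution unchanged to each of its member keywords, as in the Unit1 and Unit2 examples. The name `first_values` and the numeric distributions are illustrative, not part of the patent.

```python
from typing import Dict, List


def first_values(
    set_members: Dict[str, List[str]],
    set_dists: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Give every keyword one copy of the distribution of each set it belongs to.

    Returns keyword -> {set name -> {application field -> first value}}.
    """
    per_keyword: Dict[str, Dict[str, Dict[str, float]]] = {}
    for unit, keywords in set_members.items():
        for kw in keywords:
            per_keyword.setdefault(kw, {})[unit] = dict(set_dists[unit])
    return per_keyword


set_members = {
    "Unit1": ["word1", "word2", "word6", "word7"],
    "Unit2": ["word2", "word3", "word6", "word7"],
}
# Illustrative distributions standing in for Wa/(Wa+Wb), Wb/(Wa+Wb), and so on.
set_dists = {
    "Unit1": {"T1": 0.6, "T2": 0.4},
    "Unit2": {"T2": 0.8, "T3": 0.2},
}
first = first_values(set_members, set_dists)
```

A keyword such as word2 that belongs to both sets ends up with two pieces of first characteristic value information, one per set, which is what the later statistics step combines.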
Subsequently, the second characteristic value acquisition device 4 performs statistics on the at least one piece of first application field characteristic value information of each keyword obtained by the first characteristic value acquisition device 3, to obtain the second application field characteristic value information of each keyword.
Specifically, the ways of performing statistics on the at least one piece of first application field characteristic value information of each keyword to obtain its second application field characteristic value information include, but are not limited to:
1) the second characteristic value acquisition device 4 selects, from the at least one piece of first application field characteristic value information of each keyword, the maximum value as the second application field characteristic value information;
In one example, continuing the example above, in keyword set Unit1 the first application field characteristic value information of word_2 belonging to application field T_2 is P(T_2 | word_2) = Wb/(Wa+Wb), and in keyword set Unit2 it is P(T_2 | word_2) = Wb/(Wb+Wc); the larger of these two values is selected as the second application field characteristic value information of word_2 belonging to application field T_2.
2) the second characteristic value acquisition device 4 merges the at least one piece of first application field characteristic value information of each keyword according to the following formula (1), to obtain the second application field characteristic value information of each keyword:
$$P(T_i \mid word_j) = \frac{\sum_{unit_1}^{unit_N} P(T_i \mid word_j)}{\sum_{i=1}^{M} P(T_i \mid word_j)} \qquad (1)$$
where T_i denotes a given application field attribute;
word_j denotes a given keyword;
N denotes the total number of keyword sets;
M denotes the total number of application field attributes;
P(T_i | word_j) on the left-hand side denotes the second application field characteristic value information of keyword word_j belonging to application field T_i.
In one example, in keyword set Unit1 the first application field characteristic value information of the keywords word_1, word_2, word_6 and word_7 belonging to application field T_1 is P(T_1 | word_1) = P(T_1 | word_2) = P(T_1 | word_6) = P(T_1 | word_7) = Wa/(Wa+Wb), and that of belonging to application field T_2 is P(T_2 | word_1) = P(T_2 | word_2) = P(T_2 | word_6) = P(T_2 | word_7) = Wb/(Wa+Wb);
in keyword set Unit2 the first application field characteristic value information of the keywords word_2, word_3, word_6 and word_7 belonging to application field T_2 is P(T_2 | word_2) = P(T_2 | word_3) = P(T_2 | word_6) = P(T_2 | word_7) = Wb/(Wb+Wc), and that of belonging to application field T_3 is P(T_3 | word_2) = P(T_3 | word_3) = P(T_3 | word_6) = P(T_3 | word_7) = Wc/(Wb+Wc).
The second characteristic value acquisition device 4 merges the application field characteristic values of keyword word_2 belonging to application field T_2 according to formula (1) above, and obtains
$$P(T_2 \mid word_2) = \frac{\frac{Wb}{Wa+Wb} + \frac{Wb}{Wb+Wc}}{\frac{Wa}{Wa+Wb} + \frac{Wb}{Wa+Wb} + \frac{Wb}{Wb+Wc} + \frac{Wc}{Wb+Wc}}$$
which is the second application field characteristic value information of word_2 belonging to application field T_2.
It should be noted that the above example only serves to better illustrate the technical solution of the present invention and does not limit it. Those skilled in the art should understand that any implementation that performs statistics on the at least one piece of first application field characteristic value information of each keyword, to obtain the second application field characteristic value information of each keyword, falls within the scope of the present invention.
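The two statistics options above (maximum selection and merging by formula (1)) can be sketched side by side. The function names and the numeric values are assumptions for illustration. The merge variant reads the denominator of formula (1) as the sum over all application fields and all sets of the keyword's first values, which is how the worked example for word_2 expands it; that reading is an interpretation, not something the text states explicitly.

```python
from typing import Dict


def second_by_max(per_set: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Option 1: per field, keep the largest first value seen across the keyword's sets."""
    out: Dict[str, float] = {}
    for dist in per_set.values():
        for field, p in dist.items():
            out[field] = max(out.get(field, 0.0), p)
    return out


def second_by_merge(per_set: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Option 2: per field, sum the first values over sets, then divide by the grand total."""
    sums: Dict[str, float] = {}
    for dist in per_set.values():
        for field, p in dist.items():
            sums[field] = sums.get(field, 0.0) + p
    total = sum(sums.values())
    return {f: s / total for f, s in sums.items()} if total else {}


# First values of word2 in the two sets it belongs to (illustrative numbers).
word2_first = {
    "Unit1": {"T1": 0.6, "T2": 0.4},
    "Unit2": {"T2": 0.8, "T3": 0.2},
}
by_max = second_by_max(word2_first)
by_merge = second_by_merge(word2_first)
```

With these numbers the merged value for T2 is (0.4 + 0.8) / 2.0, mirroring the structure of the fraction in the worked example. Note that the maximum-selection variant does not renormalize, so its values need not sum to 1.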
Subsequently, when the predetermined stop condition is not met, the control device 5 takes the second application field characteristic value information of each keyword obtained by the second characteristic value acquisition device 4 as the initial application field characteristic value information of that keyword, so as to control the application field distribution acquisition device 2, the first characteristic value acquisition device 3 and the second characteristic value acquisition device 4 to repeat their corresponding operations until the predetermined stop condition is met.
The predetermined stop condition includes, but is not limited to:
1) when the number of repetitions exceeds a predetermined execution count threshold, the control device 5 stops repeating the operations;
2) when the application field attribute information corresponding to the maximum value in the second application field characteristic value information of a keyword is the same as the application field attribute information corresponding to the initial application field characteristic value information of that keyword, the control device 5 stops repeating the operations.
It should be noted that the above example only serves to better illustrate the technical solution of the present invention and does not limit it. Those skilled in the art should understand that any implementation that takes the second application field characteristic value information of each keyword as the initial application field characteristic value information of that keyword and repeats the operations of the application field distribution acquisition device, the first characteristic value acquisition device and the second characteristic value acquisition device until the predetermined stop condition is met falls within the scope of the present invention.
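The control loop can be sketched end to end as follows. This is one possible reading, not the patented implementation: it uses the merge variant for the statistics step, applies stop condition 2 to all keywords at once, caps the loop at `max_rounds` as a stand-in for stop condition 1, and uses made-up initial probabilities. The names `iterate` and `top_field` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Dist = Dict[str, float]


def top_field(dist: Dist) -> Optional[str]:
    """Application field with the largest value, or None for an empty distribution."""
    return max(dist, key=dist.get) if dist else None


def iterate(
    keyword_sets: Dict[str, List[str]],
    initial: Dict[str, Dist],
    max_rounds: int = 50,
) -> Tuple[Dict[str, Dist], int]:
    current = initial
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        # Step b: per-set distribution from the current per-keyword values.
        dists: Dict[str, Dist] = {}
        for unit, kws in keyword_sets.items():
            totals: Dist = {}
            for kw in kws:
                for f, p in current.get(kw, {}).items():
                    totals[f] = totals.get(f, 0.0) + p
            norm = sum(totals.values())
            dists[unit] = {f: t / norm for f, t in totals.items()} if norm else {}
        # Steps c and d: copy each set's distribution to its keywords and merge.
        sums: Dict[str, Dist] = {}
        for unit, kws in keyword_sets.items():
            for kw in kws:
                acc = sums.setdefault(kw, {})
                for f, p in dists[unit].items():
                    acc[f] = acc.get(f, 0.0) + p
        updated: Dict[str, Dist] = {}
        for kw, acc in sums.items():
            total = sum(acc.values())
            updated[kw] = {f: s / total for f, s in acc.items()} if total else {}
        # Stop condition 2: every keyword's top field is unchanged by this round.
        stable = all(
            top_field(updated[kw]) == top_field(current.get(kw, {})) for kw in updated
        )
        current = updated
        if stable:
            break
    return current, rounds


keyword_sets = {
    "Unit1": ["word1", "word2", "word6", "word7"],
    "Unit2": ["word2", "word3", "word6", "word7"],
}
initial = {"word1": {"T1": 0.9}, "word2": {"T2": 0.6}, "word3": {"T3": 0.3}}
final, rounds = iterate(keyword_sets, initial)
```

Each round feeds the merged second values back in as the next round's initial values, which is the substitution the control device performs.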
When the predetermined stop condition is met, the application field attribute acquisition device 6 obtains the application field attribute information of each keyword according to the second application field characteristic value information of that keyword.
Specifically, the application field attribute acquisition device 6 takes the application field corresponding to the maximum value in the second application field characteristic value information of each keyword as the application field attribute information of that keyword.
In one example, when the number of repetitions performed by the control device 5 reaches 51, which exceeds the predetermined execution count threshold of 50, the control device 5 stops repeating the operations. The application field attribute acquisition device 6 then looks at the maximum value in the second application field characteristic value information of each keyword. For instance, if the second application field characteristic value information of word_2 belonging to application field T_1 is 0.3 and that of belonging to application field T_2 is 0.65, the device takes the application field corresponding to the maximum value as the application field attribute information of word_2, so the application field attribute information of word_2 is application field T_2.
It should be noted that the above example only serves to better illustrate the technical solution of the present invention and does not limit it. Those skilled in the art should understand that any implementation that obtains, when the predetermined stop condition is met, the application field attribute information of each keyword according to the second application field characteristic value information of that keyword falls within the scope of the present invention.
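The final selection in the word_2 example amounts to taking the field with the largest second value. A minimal sketch, with `field_attribute` as a hypothetical helper name:

```python
from typing import Dict


def field_attribute(second_values: Dict[str, float]) -> str:
    """Return the application field whose second characteristic value is largest."""
    return max(second_values, key=second_values.get)


# Values taken from the word_2 example in the text.
word2_second = {"T1": 0.3, "T2": 0.65}
word2_field = field_attribute(word2_second)
```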
By obtaining the application field distribution characteristic value information of each keyword set, the first application field characteristic value information of each keyword, which may belong to one or more keyword sets, can be obtained, and the second application field characteristic value information of each keyword can then be obtained from the perspective of multiple keyword sets. Repeating these steps iteratively yields more accurate application field attribute information for each keyword, so that the application fields of a massive number of keywords are obtained accurately and the results are more objective. At the same time, the demand of existing search technology for finer-grained application field division of keywords is met. Furthermore, accurate classification of the application fields of keywords can guide information publishing users in establishing reasonable correspondences between keywords and published information, thereby effectively optimizing their information publishing strategies.
As one preferred scheme of this embodiment (with reference to Fig. 1), the acquisition device further comprises a first relevance acquisition device (not shown), and the first characteristic value acquisition device comprises a weighting device (not shown).
The first relevance acquisition device obtains the relevance information between each keyword and each of a plurality of application field attributes.
Specifically, the ways in which the first relevance acquisition device obtains the relevance information include, but are not limited to:
1) querying a preset relevance database, wherein the preset relevance database contains a plurality of keywords and the relevance information between each of them and a plurality of application field attributes; the relevance database includes, but is not limited to, a relational database, a key-value storage system or a file system.
In one example, the first relevance acquisition device performs a matching query for each keyword in the preset relevance database, to obtain the relevance information between that keyword and the plurality of application field attributes.
2) performing word segmentation on each keyword to obtain at least one segmented fragment of the keyword; querying, according to the at least one segmented fragment, the preset application field segmentation dictionaries corresponding to the respective application field attributes, to obtain the frequency-of-occurrence information of the fragment or fragments in each of those dictionaries; and obtaining, according to the frequency-of-occurrence information, the relevance information between each keyword and the plurality of application field attributes. This approach is described in detail in the embodiment shown in Fig. 2.
The weighting device weights the application field distribution characteristic value information of each keyword set obtained by application distribution acquisition device 2, in combination with the degree of correlation information obtained by the first degree of correlation acquisition device, to obtain at least one first application field characteristic value information of each keyword.
Specifically, the weighting device discretizes the obtained application field distribution characteristic value information of each keyword set onto each keyword contained in that set, and weights it by the obtained degree of correlation information, to obtain at least one first application field characteristic value information of each keyword.
In one example, keyword set Unit1 comprises keywords word1, word2, word6 and word7; the application field distribution characteristic value information of keyword set Unit1 is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb); the degrees of correlation of word1, word2, word6 and word7 with application field T1 are 0.7, 0.1, 0 and 0.2 respectively, and their degrees of correlation with application field T2 are 0.1, 0.8, 0 and 0 respectively.
The weighting device discretizes the application field distribution characteristic value information of Unit1 onto word1, word2, word6 and word7 in this set, and weights it in combination with the degree of correlation information, obtaining:
the first application field characteristic value information of word1 for application field T1 is P(T1|word1) = 0.7 * Wa/(Wa+Wb),
the first application field characteristic value information of word2 for application field T1 is P(T1|word2) = 0.1 * Wa/(Wa+Wb),
the first application field characteristic value information of word6 for application field T1 is P(T1|word6) = 0,
the first application field characteristic value information of word7 for application field T1 is P(T1|word7) = 0.2 * Wa/(Wa+Wb);
and obtaining:
the first application field characteristic value information of word1 for application field T2 is P(T2|word1) = 0.1 * Wb/(Wa+Wb),
the first application field characteristic value information of word2 for application field T2 is P(T2|word2) = 0.8 * Wb/(Wa+Wb),
the first application field characteristic value information of word6 for application field T2 is P(T2|word6) = 0,
the first application field characteristic value information of word7 for application field T2 is P(T2|word7) = 0.
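As a non-limiting illustration, the weighting in the example above can be sketched in Python as follows. The function name `weight_first_values`, the dictionary layout, and the concrete values chosen for Wa and Wb are assumptions made for this sketch; the description leaves Wa and Wb symbolic.

```python
def weight_first_values(set_distribution, relevance):
    """Scale a keyword set's distribution by each keyword's degree of correlation.

    set_distribution: {field: P(field | set)}
    relevance: {keyword: {field: degree of correlation}}
    Returns {keyword: {field: weighted first characteristic value}}.
    """
    return {
        keyword: {
            field: rel_by_field.get(field, 0.0) * p
            for field, p in set_distribution.items()
        }
        for keyword, rel_by_field in relevance.items()
    }


# Hypothetical numeric stand-ins for the symbolic Wa and Wb.
Wa, Wb = 0.6, 0.2
unit1 = {"T1": Wa / (Wa + Wb), "T2": Wb / (Wa + Wb)}

# Degrees of correlation taken from the example.
relevance = {
    "word1": {"T1": 0.7, "T2": 0.1},
    "word2": {"T1": 0.1, "T2": 0.8},
    "word6": {"T1": 0.0, "T2": 0.0},
    "word7": {"T1": 0.2, "T2": 0.0},
}

first_values = weight_first_values(unit1, relevance)
```

A keyword with zero degree of correlation to every application field, such as word6 here, contributes nothing from this set in later steps.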
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the degree of correlation information between each keyword and a plurality of application field attribute information, and weights the application field distribution characteristic value information of each keyword set in combination with the degree of correlation information obtained by the first degree of correlation acquisition device, to obtain at least one first application field characteristic value information of each keyword, shall fall within the scope of the present invention.
Considering that different keywords are semantically correlated with different application field attribute information to different degrees, the semantic degree of correlation information between each keyword and the plurality of application field attribute information is incorporated into the process of obtaining the first application field characteristic value information. A keyword whose meaning is highly correlated with a given application field attribute information thus receives a correspondingly higher weight for that application field, which yields more accurate first application field characteristic value information and provides a solid basis for finally obtaining more accurate application field attribute information.
As one of the preferred solutions of the present embodiment, Fig. 2 shows a structural diagram of an acquisition device for obtaining the degree of correlation information between each keyword and a plurality of application field attribute information, according to a preferred embodiment of the present invention. This acquisition device comprises initial characteristic value acquisition device 1, application distribution acquisition device 2, first characteristic value acquisition device 3, second characteristic value acquisition device 4, control device 5, application attribute acquisition device 6, cutting device 7, frequency of occurrence acquisition device 8 and second degree of correlation acquisition device 9.
Initial characteristic value acquisition device 1, application distribution acquisition device 2, first characteristic value acquisition device 3, second characteristic value acquisition device 4, control device 5 and application attribute acquisition device 6 have been described in detail with reference to the embodiment shown in Fig. 1 and are not described again here.
Cutting device 7 performs word segmentation (cut word processing) on each keyword, to obtain at least one keyword cut word fragment of that keyword.
Here, the word segmentation methods include but are not limited to forward maximum matching, reverse maximum matching, bidirectional maximum matching, language model methods, shortest path algorithms, etc.
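As a non-limiting illustration of one of the listed methods, forward maximum matching can be sketched in Python as follows. The vocabulary, the sample keyword and the `max_len` parameter are assumptions made for this sketch; an unsegmented ASCII string stands in for a keyword written without word delimiters.

```python
def forward_max_match(text, vocabulary, max_len=11):
    """Greedy left-to-right segmentation: at each position take the longest
    vocabulary entry that matches, falling back to a single character."""
    fragments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                fragments.append(candidate)
                i += size
                break
    return fragments


# Hypothetical vocabulary and keyword.
vocab = {"fresh", "flower", "delivery", "freshflower"}
fragments = forward_max_match("freshflowerdelivery", vocab)
```

Because the longest match wins, "freshflower" is preferred over "fresh" followed by "flower" whenever the longer entry is present in the vocabulary.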
Then, frequency of occurrence acquisition device 8 queries, according to the obtained at least one keyword cut word fragment of each keyword, the preset application field cut word dictionaries corresponding to the respective application field attribute information, to obtain the frequency of occurrence information of the at least one keyword cut word fragment in each of those dictionaries; wherein the application field cut word dictionary corresponding to each application field attribute information comprises the preset cut word fragments of that application field.
Subsequently, second degree of correlation acquisition device 9 obtains, according to the obtained frequency of occurrence information, the degree of correlation information between each keyword and the plurality of application field attribute information.
The ways in which second degree of correlation acquisition device 9 obtains the degree of correlation include but are not limited to:
1) Second degree of correlation acquisition device 9 obtains the degree of correlation information between each keyword and the plurality of application field attribute information according to a predetermined acquisition rule. For example, the predetermined acquisition rule is that, for a given application field attribute information, a keyword whose frequency of occurrence is greater than a predetermined first occurrence threshold has a degree of correlation of 0.8 with that application field attribute information; a keyword whose frequency of occurrence lies between a predetermined second threshold and a predetermined third threshold has a degree of correlation of 0.4; and a keyword whose frequency of occurrence is less than a predetermined fourth occurrence threshold has a degree of correlation of 0.
2) Second degree of correlation acquisition device 9 obtains the degree of correlation information between each keyword and the plurality of application field attribute information by applying the BM25 algorithm to the frequency of occurrence information.
Specifically, the mathematical expression of the BM25 algorithm is given by formulas 2) and 3) below:
$$\mathrm{score}(D,Q)=\sum_{i=1}^{n}\mathrm{IDF}(q_i)\cdot\frac{f(q_i,D)\cdot(k_1+1)}{f(q_i,D)+k_1\cdot\left(1-b+b\cdot\frac{|D|}{\mathrm{avgdl}}\right)} \tag{2}$$
$$\mathrm{IDF}(q_i)=\log\frac{N-n(q_i)+0.5}{n(q_i)+0.5} \tag{3}$$
Wherein, score(D, Q) represents the degree of correlation information between a given keyword and a given application field;
qi represents a keyword cut word fragment;
f(qi, D) represents the frequency of occurrence information of keyword cut word fragment qi in the application field cut word dictionary corresponding to a given application field;
|D| represents the total number of cut word fragments in the application field cut word dictionary corresponding to a given application field;
avgdl represents the average number of cut word fragments in the application field cut word dictionaries corresponding to all application fields;
k1 and b are parameters for adjusting precision; preferably, k1 = 2 and b = 0.75;
N represents the total number of application field categories;
n(qi) represents the number of application field categories whose application field cut word dictionaries contain keyword cut word fragment qi.
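As a non-limiting illustration, formulas 2) and 3) can be sketched in Python as follows. The sketch assumes that each application field cut word dictionary is stored as a mapping from cut word fragment to frequency, that |D| is the sum of those frequencies, and that avgdl is the mean of |D| over all dictionaries, as in the usual form of BM25. The function name, dictionary contents and field names are hypothetical.

```python
import math


def bm25_score(fragments, field, dictionaries, k1=2.0, b=0.75):
    """Degree of correlation between one keyword (given as its cut word
    fragments) and one application field, following formulas 2) and 3).

    dictionaries: {field: {fragment: frequency}}, assumed non-empty overall.
    """
    n_fields = len(dictionaries)                        # N
    lengths = {f: sum(d.values()) for f, d in dictionaries.items()}
    avgdl = sum(lengths.values()) / n_fields            # assumed: mean of |D|
    doc = dictionaries[field]
    score = 0.0
    for q in fragments:
        # n(q): number of fields whose dictionary contains the fragment.
        n_q = sum(1 for d in dictionaries.values() if q in d)
        idf = math.log((n_fields - n_q + 0.5) / (n_q + 0.5))
        f = doc.get(q, 0)                               # f(q, D)
        denom = f + k1 * (1 - b + b * lengths[field] / avgdl)
        score += idf * f * (k1 + 1) / denom
    return score


# Hypothetical dictionaries for three application fields.
dictionaries = {
    "T1": {"flower": 3, "gift": 1},
    "T2": {"loan": 4},
    "T3": {"bank": 2},
}
score_t1 = bm25_score(["flower"], "T1", dictionaries)
score_t2 = bm25_score(["flower"], "T2", dictionaries)
```

With formula 3) as written, the IDF term becomes negative for a fragment that appears in more than half of the dictionaries; an implementation may wish to clamp or rescale the score before using it as a weight.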
It should be noted that the cutting device, the frequency of occurrence acquisition device and the second degree of correlation acquisition device may be contained in the first degree of correlation acquisition device to obtain the degree of correlation information, or may be independent of the first degree of correlation acquisition device, in which case the first degree of correlation acquisition device obtains the already obtained degree of correlation information from the second degree of correlation acquisition device.
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs word segmentation on each keyword to obtain at least one keyword cut word fragment of that keyword; queries, according to the at least one keyword cut word fragment, the preset application field cut word dictionaries corresponding to the respective application field attribute information, to obtain the frequency of occurrence information of the at least one keyword cut word fragment in each of those dictionaries; and obtains, according to the frequency of occurrence information, the degree of correlation information between each keyword and the plurality of application field attribute information, shall fall within the scope of the present invention.
Fig. 3 shows a flow chart of a method for obtaining the application field attribute information of keywords according to another aspect of the present invention. The method of the present invention is mainly implemented by computer equipment; the method according to this preferred embodiment comprises step S1, step S2, step S3, step S4, step S5 and step S6.
The computer equipment includes but is not limited to network equipment and user equipment. The user equipment includes but is not limited to a PC, etc. The network equipment includes but is not limited to a single network server, a server group composed of a plurality of network servers, or a cloud composed of a large number of computers or network servers based on cloud computing (Cloud Computing), where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The network in which the user equipment and the network equipment reside includes but is not limited to the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, etc.
It should be noted that the user equipment and network equipment described above are only examples; other existing or future user equipment, network equipment or networks, where applicable to the present invention, shall also be included within the protection scope of the present invention and are incorporated herein by reference.
First, in step S1, the computer equipment obtains the original application field characteristic value information of at least one keyword contained in each of a plurality of keyword sets to be processed, wherein each keyword set comprises a plurality of keywords.
Here, an application field refers to the field to which a keyword is applied, and includes but is not limited to an industry, etc.
The characteristic value information includes but is not limited to probability information.
Specifically, in step S1, the computer equipment looks up the plurality of keywords contained in each keyword set in a preset application field classification table, to obtain the original application field characteristic value information of at least one keyword contained in that keyword set.
The preset application field classification table comprises a plurality of keywords and the original application field characteristic value information corresponding to each of them, and can be obtained by means such as corpus training. Preferably, the corpus training comprises labeling the keyword corpus of each preset application field with application field attributes, performing processing such as word segmentation and part-of-speech tagging on the keyword corpus, and then training on the keyword corpus with a classification algorithm, for example a text classification algorithm based on maximum entropy, to obtain the original application field characteristic value information corresponding to each of the plurality of keywords.
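As a non-limiting illustration, the table lookup of step S1 can be sketched in Python as follows. The table contents, the names `CLASSIFICATION_TABLE` and `lookup_original_values`, and the probability values are assumptions made for this sketch; how the table is trained is outside its scope.

```python
# Hypothetical preset classification table:
# keyword -> {application field: original characteristic value (probability)}.
CLASSIFICATION_TABLE = {
    "word1": {"T1": 0.9},
    "word2": {"T2": 0.6},
}


def lookup_original_values(keyword_sets, table):
    """For each keyword set, return the original values of those member
    keywords that appear in the table; unknown keywords are simply absent."""
    return {
        set_name: {kw: table[kw] for kw in keywords if kw in table}
        for set_name, keywords in keyword_sets.items()
    }


keyword_sets = {"Unit1": ["word1", "word2", "word6", "word7"]}
original = lookup_original_values(keyword_sets, CLASSIFICATION_TABLE)
```

Only some keywords of a set need to be found in the table, which matches the wording "at least one keyword" in step S1.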
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the original application field characteristic value information of at least one keyword contained in each of a plurality of keyword sets to be processed shall fall within the scope of the present invention.
Then, in step S2, the computer equipment performs classification statistics on each keyword set according to the original application field characteristic value information of the at least one keyword obtained in step S1, to obtain the application field distribution characteristic value information of each keyword set.
Specifically, in step S2, the computer equipment aggregates, per application field, the obtained original application field characteristic value information of the at least one keyword in each keyword set, to obtain the application field distribution characteristic value information of that keyword set.
In one example, keyword set Unit1 comprises keywords word1, word2, word6 and word7, where the original application field characteristic value information of word1 is that it belongs to application field T1 with probability Wa, and that of word2 is that it belongs to application field T2 with probability Wb. In step S2, the computer equipment performs classification statistics on Unit1 and obtains the application field distribution characteristic value information of Unit1: Unit1 belongs to application field T1 with probability P(T1|Unit1) = Wa/(Wa+Wb), and to application field T2 with probability P(T2|Unit1) = Wb/(Wa+Wb).
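As a non-limiting illustration, the classification statistics of step S2 can be sketched in Python as follows. The sketch assumes, as the resulting distribution in the example implies, that word1 contributes Wa to application field T1 and word2 contributes Wb to application field T2; the numeric values and the function name `set_distribution` are hypothetical.

```python
from collections import defaultdict


def set_distribution(original_values):
    """Sum the member keywords' values per application field and normalize.

    original_values: {keyword: {field: probability}} for one keyword set.
    Returns {field: P(field | set)}, or {} when no value is known.
    """
    totals = defaultdict(float)
    for by_field in original_values.values():
        for field, p in by_field.items():
            totals[field] += p
    norm = sum(totals.values())
    return {field: total / norm for field, total in totals.items()} if norm else {}


# Hypothetical numeric stand-ins for the symbolic Wa and Wb.
Wa, Wb = 0.9, 0.6
unit1 = set_distribution({"word1": {"T1": Wa}, "word2": {"T2": Wb}})
```

Keywords without original values (word6 and word7 in the example) do not affect the distribution of the set at this stage.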
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs classification statistics on each keyword set according to the original application field characteristic value information of the at least one keyword, to obtain the application field distribution characteristic value information of each keyword set, shall fall within the scope of the present invention.
Subsequently, in step S3, the computer equipment obtains at least one first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set obtained in step S2, wherein the at least one first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs.
Each keyword may belong to one or more keyword sets.
Specifically, in step S3, the computer equipment discretizes the obtained application field distribution characteristic value information of each keyword set onto each keyword contained in that set, to obtain at least one first application field characteristic value information of each keyword.
In one example, keyword set Unit1 comprises keywords word1, word2, word6 and word7, and the application field distribution characteristic value information of keyword set Unit1 is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb). In step S3, the computer equipment discretizes the application field distribution characteristic value information of Unit1 onto keywords word1, word2, word6 and word7 in this set, obtaining:
the first application field characteristic value information of keywords word1, word2, word6 and word7 for application field T1 is P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb),
and the first application field characteristic value information of keywords word1, word2, word6 and word7 for application field T2 is P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb).
In addition, keyword set Unit2 comprises keywords word2, word3, word6 and word7, and the application field distribution characteristic value information of keyword set Unit2 is P(T2|Unit2) = Wb/(Wb+Wc) and P(T3|Unit2) = Wc/(Wb+Wc). In step S3, the computer equipment discretizes the application field distribution characteristic value information of Unit2 onto keywords word2, word3, word6 and word7 in this set, obtaining:
the first application field characteristic value information of keywords word2, word3, word6 and word7 for application field T2 is P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc);
the first application field characteristic value information of keywords word2, word3, word6 and word7 for application field T3 is P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc).
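As a non-limiting illustration, the discretization of step S3 can be sketched in Python as follows. The function name `discretize`, the data layout, and the numeric distributions (which correspond to the hypothetical choice Wa = 0.6, Wb = 0.2, Wc = 0.2) are assumptions made for this sketch.

```python
from collections import defaultdict


def discretize(set_members, set_distributions):
    """Copy each keyword set's distribution to every keyword in that set.

    Returns {keyword: [(set_name, {field: value}), ...]}; a keyword that
    belongs to several sets receives one entry per set.
    """
    first_values = defaultdict(list)
    for set_name, keywords in set_members.items():
        for kw in keywords:
            first_values[kw].append((set_name, dict(set_distributions[set_name])))
    return dict(first_values)


set_members = {
    "Unit1": ["word1", "word2", "word6", "word7"],
    "Unit2": ["word2", "word3", "word6", "word7"],
}
# Hypothetical distributions for the two sets.
set_distributions = {
    "Unit1": {"T1": 0.75, "T2": 0.25},
    "Unit2": {"T2": 0.5, "T3": 0.5},
}
first = discretize(set_members, set_distributions)
```

Here word2, word6 and word7 each end up with two first application field characteristic value entries, one per set, while word1 and word3 each have one.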
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains at least one first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set shall fall within the scope of the present invention.
Subsequently, in step S4, the computer equipment performs statistical processing on the at least one first application field characteristic value information of each keyword obtained in step S3, to obtain the second application field characteristic value information of each keyword.
Specifically, the ways of performing statistical processing on the at least one first application field characteristic value information of each keyword to obtain the second application field characteristic value information of each keyword include but are not limited to:
1) In step S4, the computer equipment selects the maximum value among the at least one first application field characteristic value information of each keyword as the second application field characteristic value information.
In one example, continuing the example above, in keyword set Unit1 the first application field characteristic value information of word2 for application field T2 is P(T2|word2) = Wb/(Wa+Wb), and in keyword set Unit2 it is P(T2|word2) = Wb/(Wb+Wc); the larger of these two first application field characteristic value information is selected as the second application field characteristic value information of word2 for application field T2.
2) In step S4, the computer equipment merges the at least one first application field characteristic value information of each keyword according to formula 1) below, to obtain the second application field characteristic value information of each keyword:
$$P(T_i \mid \mathrm{word}_j)=\frac{\sum_{\mathrm{unit}_1}^{\mathrm{unit}_N} P(T_i \mid \mathrm{word}_j)}{\sum_{i=1}^{M} P(T_i \mid \mathrm{word}_j)} \tag{1}$$
Wherein, Ti represents a given application field attribute information;
wordj represents a given keyword;
N represents the total number of keyword sets;
M represents the total number of application field attribute information;
P(Ti|wordj) on the left-hand side represents the second application field characteristic value information of keyword wordj for application field Ti.
In one example, in keyword set Unit1, the first application field characteristic value information of keywords word1, word2, word6 and word7 for application field T1 is P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb), and for application field T2 it is P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb);
in keyword set Unit2, the first application field characteristic value information of keywords word2, word3, word6 and word7 for application field T2 is P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc), and for application field T3 it is P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc).
In step S4, the computer equipment merges the application field characteristic values of keyword word2 for application field T2 according to formula 1) above, obtaining
$$P(T_2 \mid \mathrm{word}_2)=\frac{\dfrac{Wb}{Wa+Wb}+\dfrac{Wb}{Wb+Wc}}{\dfrac{Wa}{Wa+Wb}+\dfrac{Wb}{Wa+Wb}+\dfrac{Wb}{Wb+Wc}+\dfrac{Wc}{Wb+Wc}}$$
which is the second application field characteristic value information of word2 for application field T2.
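As a non-limiting illustration, both statistical processing options of step S4 can be sketched in Python as follows. Following the worked example, the denominator of the merge is taken as the sum of the keyword's first values over all application fields and all keyword sets. The function names and the numeric values (corresponding to the hypothetical choice Wa = 0.6, Wb = 0.2, Wc = 0.2) are assumptions made for this sketch.

```python
def merge_first_values(first_values):
    """Option 2): sum the first values per field over all sets containing the
    keyword, then divide by the sum over all fields and sets.

    first_values: list of {field: value}, one entry per keyword set.
    """
    totals = {}
    for dist in first_values:
        for field, value in dist.items():
            totals[field] = totals.get(field, 0.0) + value
    norm = sum(totals.values())
    return {field: v / norm for field, v in totals.items()} if norm else {}


def max_first_values(first_values):
    """Option 1): keep, per field, the largest first value over all sets."""
    best = {}
    for dist in first_values:
        for field, value in dist.items():
            best[field] = max(value, best.get(field, value))
    return best


# First values of word2 in Unit1 and Unit2 (hypothetical numbers).
word2_first = [{"T1": 0.75, "T2": 0.25}, {"T2": 0.5, "T3": 0.5}]
merged = merge_first_values(word2_first)
maxed = max_first_values(word2_first)
```

The merged values sum to 1 over the application fields, whereas the per-field maxima of option 1) generally do not.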
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs statistical processing on the at least one first application field characteristic value information of each keyword, to obtain the second application field characteristic value information of each keyword, shall fall within the scope of the present invention.
Subsequently, when a predetermined stop condition is not met, in step S5 the computer equipment takes the second application field characteristic value information of each keyword obtained in step S4 as the original application field characteristic value information of that keyword, and repeats the corresponding operations of step S2, step S3 and step S4 until the predetermined stop condition is met.
The predetermined stop condition includes but is not limited to:
1) when the number of repetitions exceeds a predetermined execution count threshold, in step S5 the computer equipment stops repeating the operations;
2) when the application field attribute information corresponding to the maximum value in the second application field characteristic value information of a keyword is the same as the application field attribute information corresponding to the original application field characteristic value information of that keyword, in step S5 the computer equipment stops repeating the operations.
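As a non-limiting illustration, the repetition of steps S2 to S4 under the two stop conditions can be sketched in Python as follows. The sketch uses the unweighted discretization and the merge of formula 1), and interprets stop condition 2) as "no keyword changed its top-ranked application field in this round"; that interpretation, the function names, and the numeric original values are assumptions made for this sketch.

```python
def top_field(values):
    """Field with the largest value, or None when nothing is known yet."""
    return max(values, key=values.get) if values else None


def normalize(totals):
    norm = sum(totals.values())
    return {f: v / norm for f, v in totals.items()} if norm else {}


def iterate(keyword_sets, original, max_rounds=50):
    """keyword_sets: {set: [keywords]}; original: {keyword: {field: value}}."""
    current, rounds = original, 0
    while rounds < max_rounds:              # stop condition 1): round budget
        rounds += 1
        # S2: aggregate member values into one distribution per set.
        set_dist = {}
        for name, members in keyword_sets.items():
            totals = {}
            for kw in members:
                for field, v in current.get(kw, {}).items():
                    totals[field] = totals.get(field, 0.0) + v
            set_dist[name] = normalize(totals)
        # S3 + S4: hand each set's distribution to its members, then merge.
        sums = {}
        for name, members in keyword_sets.items():
            for kw in members:
                acc = sums.setdefault(kw, {})
                for field, v in set_dist[name].items():
                    acc[field] = acc.get(field, 0.0) + v
        updated = {kw: normalize(acc) for kw, acc in sums.items()}
        # Stop condition 2): every keyword kept its top-ranked field.
        stable = all(top_field(updated[kw]) == top_field(current.get(kw, {}))
                     for kw in updated)
        current = updated                   # S5: feed results back as input
        if stable:
            break
    return current, rounds


keyword_sets = {
    "Unit1": ["word1", "word2", "word6", "word7"],
    "Unit2": ["word2", "word3", "word6", "word7"],
}
# Hypothetical original values for the keywords found in the table.
original = {"word1": {"T1": 0.9}, "word2": {"T2": 0.6}, "word3": {"T3": 0.8}}
result, rounds = iterate(keyword_sets, original)
```

Without the degree of correlation weighting, repeated averaging over overlapping sets tends to pull the keywords of those sets towards the same application field, which is one reason the weighted variant of step S3 may be preferable.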
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that takes the second application field characteristic value information of each keyword as the original application field characteristic value information of that keyword and repeats the operations of step S2, step S3 and step S4 until a predetermined stop condition is met shall fall within the scope of the present invention.
When the predetermined stop condition is met, in step S6 the computer equipment obtains the application field attribute information of each keyword according to the second application field characteristic value information of that keyword.
Specifically, in step S6, the computer equipment takes the application field attribute information corresponding to the maximum value in the second application field characteristic value information of each keyword as the application field attribute information of that keyword.
In one example, when the number of repetitions, 51, exceeds the predetermined execution count threshold of 50, the computer equipment stops repeating the operations in step S5. In step S6, the computer equipment considers the maximum value in the second application field characteristic value information of each keyword; for example, if the second application field characteristic value information of word2 is 0.3 for application field T1 and 0.65 for application field T2, the computer equipment takes the application field attribute information corresponding to the maximum value as the application field attribute information of word2, so that the application field attribute information of word2 is application field T2.
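As a non-limiting illustration, the selection in step S6 amounts to taking the application field with the largest second application field characteristic value. The function name `pick_field` is an assumption made for this sketch; the values are those of the example.

```python
def pick_field(second_values):
    """Return the application field whose second characteristic value is largest."""
    return max(second_values, key=second_values.get)


# Values for word2 from the example.
word2_field = pick_field({"T1": 0.3, "T2": 0.65})
```

If two application fields tie, `max` returns the one encountered first; an implementation may wish to define an explicit tie-breaking rule.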
It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that, when the predetermined stop condition is met, obtains the application field attribute information of each keyword according to the second application field characteristic value information of that keyword shall fall within the scope of the present invention.
By obtaining the application field distribution characteristic value information of each keyword set, the first application field characteristic value information of each keyword in the one or more keyword sets to which it belongs can be obtained, and the second application field characteristic value information of each keyword can then be obtained from the perspective of a plurality of keyword sets. By repeating the above steps, the iterative computation yields more accurate application field attribute information for each keyword, which achieves accurate determination of the application fields of massive numbers of keywords and makes the results more objective. At the same time, this meets the demand of existing search technology for a finer division of keywords by application field. Furthermore, accurate classification of the application fields of keywords can guide information-publishing users in establishing a reasonable correspondence between keywords and published information, thereby effectively optimizing the users' information publishing strategy.
As one of the preferred solutions (with reference to Fig. 3), the method further comprises step S10 (not shown), and step S3 comprises step S301 (not shown).
In step S10, the computer equipment obtains the degree of correlation information between each keyword and a plurality of application field attribute information.
Specifically, in step S10, the ways in which the computer equipment obtains the degree of correlation information include but are not limited to:
1) Querying a preset degree of correlation database; wherein the preset degree of correlation database comprises a plurality of keywords and the degree of correlation information between each keyword and a plurality of application field attribute information; the degree of correlation database includes but is not limited to a relational database, a Key-Value storage system, a file system, etc.
In one example, in step S10, the computer equipment performs a matching query for each keyword in the preset degree of correlation database, to obtain the degree of correlation information between that keyword and each of the plurality of application field attribute information.
2) Performing word segmentation (cut word processing) on each keyword to obtain at least one keyword cut word fragment of that keyword; querying, according to the at least one keyword cut word fragment, the preset application field cut word dictionaries corresponding to the respective application field attribute information, to obtain the frequency of occurrence information of the at least one keyword cut word fragment in each of those dictionaries; and obtaining, according to the frequency of occurrence information, the degree of correlation information between each keyword and the plurality of application field attribute information. This way is described in detail in the embodiment shown in Fig. 4.
In step S301, the computer equipment weights the application field distribution characteristic value information of each keyword set obtained in step S2 in combination with the obtained degree of correlation information, to obtain at least one first application field characteristic value information of each keyword.
Specifically, in step S301, the computer equipment discretizes the obtained application field distribution characteristic value information of each keyword set onto each keyword contained in that set, and weights it by the obtained degree of correlation information, to obtain at least one first application field characteristic value information of each keyword.
In one example, keyword set Unit1 comprises keywords word1, word2, word6 and word7; the application field distribution characteristic value information of keyword set Unit1 is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb); the degrees of correlation of word1, word2, word6 and word7 with application field T1 are 0.7, 0.1, 0 and 0.2 respectively, and their degrees of correlation with application field T2 are 0.1, 0.8, 0 and 0 respectively.
In step S301, the computer equipment discretizes the application field distribution characteristic value information of Unit1 onto word1, word2, word6 and word7 in this set, and weights it in combination with the degree of correlation information, obtaining:
the first application field characteristic value information of word1 for application field T1 is P(T1|word1) = 0.7 * Wa/(Wa+Wb),
the first application field characteristic value information of word2 for application field T1 is P(T1|word2) = 0.1 * Wa/(Wa+Wb),
the first application field characteristic value information of word6 for application field T1 is P(T1|word6) = 0,
the first application field characteristic value information of word7 for application field T1 is P(T1|word7) = 0.2 * Wa/(Wa+Wb);
and obtaining:
the first application field characteristic value information of word1 for application field T2 is P(T2|word1) = 0.1 * Wb/(Wa+Wb),
the first application field characteristic value information of word2 for application field T2 is P(T2|word2) = 0.8 * Wb/(Wa+Wb),
the first application field characteristic value information of word6 for application field T2 is P(T2|word6) = 0,
the first application field characteristic value information of word7 for application field T2 is P(T2|word7) = 0.
It should be noted that, above-mentioned for example only for technical scheme of the present invention is described better, but not limitation of the present invention, those skilled in the art should understand that, any obtain each keyword respectively with the degree of correlation information of a plurality of application attribute informations, according to the application distribution characteristics value information of described each keyword set, and be weighted in conjunction with described degree of correlation information, to obtain the implementation of at least one the first application characteristic value information of described each keyword, all should be within the scope of the present invention.
Different keywords differ in how closely their meaning relates to different application attribute information. By incorporating the semantic degree of correlation between each keyword and the plurality of pieces of application attribute information into the process of obtaining the first application characteristic value information, a keyword whose meaning is highly correlated with a given piece of application attribute information receives a correspondingly higher weight for that application. This yields more accurate first application characteristic value information and thereby supports a more accurate final determination of the application attribute information.
As one preferred variant of this embodiment, Fig. 4 shows a flowchart of a method, according to a preferred embodiment of the present invention, for obtaining the degree of correlation information between each keyword and a plurality of pieces of application attribute information. The method according to this preferred embodiment comprises steps S1 to S9.
Steps S1 to S6 have been described in detail with reference to the embodiment shown in Fig. 3 and are not repeated here.
In step S7, the computer equipment performs word segmentation on each keyword to obtain at least one word-segmentation fragment of that keyword.
Here, the word segmentation methods include, but are not limited to, forward maximum matching, reverse maximum matching, bidirectional maximum matching, language model methods, shortest-path algorithms, and the like.
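As an illustration of one of the listed methods, forward maximum matching can be sketched as follows. The function name `forward_max_match`, the `max_len` parameter and the tiny vocabulary are assumptions made for this sketch; the source does not specify a vocabulary or a maximum fragment length.

```python
def forward_max_match(text, vocab, max_len=4):
    """Segment text left to right, always taking the longest vocabulary match.

    vocab: a set of known fragments (hypothetical segmentation vocabulary).
    max_len: longest fragment length to try (an assumed tuning parameter).
    A single character is emitted as a fallback when nothing matches.
    """
    fragments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                fragments.append(piece)
                i += size
                break
    return fragments


vocab = {"北京", "北京大学", "大学生", "学生"}
fragments = forward_max_match("北京大学生", vocab)
```

Because the scan is greedy from the left, "北京大学" is matched first and the remaining character is emitted on its own; reverse or bidirectional matching would resolve this ambiguity differently.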
Then, in step S8, the computer equipment looks up the at least one word-segmentation fragment obtained for each keyword in the preset application word-segmentation dictionaries corresponding to the respective pieces of application attribute information, and obtains the frequency of occurrence information of each fragment in each of these dictionaries. The application word-segmentation dictionary corresponding to each piece of application attribute information contains the preset word-segmentation fragments of that application.
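The dictionary lookup of step S8 might look like the sketch below. The representation of each application word-segmentation dictionary as a mapping from fragment to frequency, and the names `fragment_frequencies` and `domain_dicts`, are assumptions for illustration only.

```python
def fragment_frequencies(fragments, domain_dicts):
    """Look up each fragment in every application's segmentation dictionary.

    domain_dicts: {application: {fragment: frequency of occurrence}}.
    Fragments missing from a dictionary get a frequency of 0.
    """
    return {
        domain: {frag: freqs.get(frag, 0) for frag in fragments}
        for domain, freqs in domain_dicts.items()
    }


# Hypothetical dictionaries for two applications.
domain_dicts = {
    "travel": {"hotel": 3, "flight": 1},
    "medical": {"clinic": 2, "hotel": 1},
}
freq_info = fragment_frequencies(["hotel", "booking"], domain_dicts)
```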
Subsequently, in step S9, the computer equipment obtains, according to the obtained frequency of occurrence information, the degree of correlation information between each keyword and each of the plurality of pieces of application attribute information.
In step S9, the ways in which the computer equipment obtains the degree of correlation include, but are not limited to:
1) Applying a predetermined rule to obtain the degree of correlation information between each keyword and each of the plurality of pieces of application attribute information. For example, for a given piece of application attribute information, the rule may specify that a keyword whose frequency of occurrence is greater than a predetermined first threshold has a degree of correlation of 0.8 with that application attribute information; a keyword whose frequency of occurrence lies between a predetermined second threshold and a predetermined third threshold has a degree of correlation of 0.4; and a keyword whose frequency of occurrence is less than a predetermined fourth threshold has a degree of correlation of 0.
2) Computing, from the frequency of occurrence information and by means of the BM25 algorithm, the degree of correlation information between each keyword and each of the plurality of pieces of application attribute information.
Specifically, the BM25 algorithm is expressed by formulas 2) and 3) below:
$$\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)} \tag{2}$$
$$\mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} \tag{3}$$
where score(D, Q) denotes the degree of correlation information between a given keyword and a given application;
q_i denotes a word-segmentation fragment of the keyword;
f(q_i, D) denotes the frequency of occurrence of fragment q_i in the application word-segmentation dictionary corresponding to the given application;
|D| denotes the total number of word-segmentation fragments in the application word-segmentation dictionary corresponding to the given application;
avgdl denotes the average number of word-segmentation fragments over the application word-segmentation dictionaries corresponding to all applications;
k_1 and b are parameters for adjusting precision; preferably, k_1 = 2 and b = 0.75;
N denotes the total number of application categories;
n(q_i) denotes the number of application categories whose application word-segmentation dictionary contains fragment q_i.
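The BM25 scoring defined by formulas 2) and 3) can be sketched as below. This is a minimal sketch under stated assumptions: each application word-segmentation dictionary is modeled as a mapping from fragment to frequency, |D| is taken as the sum of those frequencies, avgdl is the mean of |D| over all applications, and the name `bm25_relevance` is hypothetical. At least one dictionary is assumed to be non-empty.

```python
import math


def bm25_relevance(fragments, domain, domain_dicts, k1=2.0, b=0.75):
    """Score one keyword (given as its fragments) against one application.

    domain_dicts: {application: {fragment: frequency of occurrence}}.
    k1 and b default to the preferred values stated in the text.
    """
    n_domains = len(domain_dicts)                        # N
    sizes = {d: sum(f.values()) for d, f in domain_dicts.items()}
    avgdl = sum(sizes.values()) / n_domains              # average |D|
    d_len = sizes[domain]                                # |D|

    score = 0.0
    for q in fragments:
        f = domain_dicts[domain].get(q, 0)               # f(q_i, D)
        n_q = sum(1 for freqs in domain_dicts.values() if q in freqs)
        idf = math.log((n_domains - n_q + 0.5) / (n_q + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * d_len / avgdl))
    return score


# Hypothetical dictionaries for three applications.
dicts = {
    "travel": {"hotel": 3, "flight": 1},
    "medical": {"clinic": 2},
    "games": {"console": 4},
}
travel_score = bm25_relevance(["hotel"], "travel", dicts)
medical_score = bm25_relevance(["hotel"], "medical", dicts)
```

Note that with formula 3) the IDF term becomes negative for a fragment that appears in more than half of the application dictionaries, so raw scores may need clamping or normalization before being used as weights; the source does not say how this case is handled.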
It should be noted that steps S7, S8 and S9 may be included in step S10 as the way of obtaining the degree of correlation information, or may be independent of step S10, in which case the computer equipment, in step S10, retrieves the degree of correlation information already obtained in step S9.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit the present invention. Those skilled in the art will understand that any implementation that performs word segmentation on each keyword to obtain at least one word-segmentation fragment of that keyword; looks up the at least one word-segmentation fragment of each keyword in the preset application word-segmentation dictionaries corresponding to the respective pieces of application attribute information, to obtain the frequency of occurrence information of each fragment in each of these dictionaries; and obtains, according to the frequency of occurrence information, the degree of correlation information between each keyword and each of the plurality of pieces of application attribute information, falls within the scope of the present invention.
It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to realize the steps or functions described above. Similarly, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention can be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
To those skilled in the art, it is evident that the present invention is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the present invention. No reference numeral in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a device claim may also be implemented by a single unit or device in software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (11)

1. A method for obtaining application attribute information of keywords, the method comprising the following steps:
a. obtaining original application domain characteristic value information of at least one keyword contained in each keyword set of a plurality of keyword sets to be processed, wherein each keyword set comprises a plurality of keywords;
b. performing classification statistics on each keyword set according to the original application domain characteristic value information of the at least one keyword, to obtain application distribution characteristic value information of each keyword set;
c. obtaining at least one piece of first application characteristic value information of each keyword according to the application distribution characteristic value information of each keyword set, wherein the at least one piece of first application characteristic value information corresponds to the application distribution characteristic value information of the at least one keyword set to which the keyword belongs;
d. performing statistical processing on the at least one piece of first application characteristic value information of each keyword, to obtain second application characteristic value information of each keyword;
taking the second application characteristic value information of each keyword as the original application domain characteristic value information of that keyword, and repeating steps b, c and d until a predetermined stop condition is met;
wherein the method further comprises:
w. when the predetermined stop condition is met, obtaining the application attribute information of each keyword according to the second application characteristic value information of each keyword.
2. The method according to claim 1, wherein step a comprises:
- looking up the plurality of keywords contained in each keyword set in a preset application classification table, to obtain the original application domain characteristic value information of the at least one keyword contained in each keyword set.
3. method according to claim 1 and 2, wherein, at least one first application characteristic value information of described each keyword is carried out to statistical treatment and to obtain the mode of the second application characteristic value information of described each keyword, also comprise:
-according to following formula, at least one first application characteristic value information of described each keyword is merged to processing, to obtain the second application characteristic value information of described each keyword:
P ( T i | word j ) = Σ unit 1 unitN P ( T i | word j ) Σ i = 1 M P ( T i | word j )
Wherein, T irepresent a certain application attribute information;
Word jrepresent a certain keyword;
N represents the sum of keyword set;
M represents the sum of application attribute information;
P(T i| word j) expression keyword word jbelong to application T ithe second application characteristic value information.
4. The method according to any one of claims 1 to 3, wherein the method further comprises:
- obtaining degree of correlation information between each keyword and each of a plurality of pieces of application attribute information;
wherein step c comprises:
- weighting the application distribution characteristic value information of each keyword set by the degree of correlation information, to obtain the at least one piece of first application characteristic value information of each keyword.
5. The method according to claim 4, wherein obtaining the degree of correlation information between each keyword and the plurality of pieces of application attribute information further comprises the following steps:
- performing word segmentation on each keyword, to obtain at least one word-segmentation fragment of that keyword;
- looking up the at least one word-segmentation fragment of each keyword in the preset application word-segmentation dictionaries corresponding to the respective pieces of application attribute information, to obtain the frequency of occurrence information of each fragment in each of these dictionaries;
- obtaining, according to the frequency of occurrence information, the degree of correlation information between each keyword and each of the plurality of pieces of application attribute information.
6. An acquisition device for obtaining application attribute information of keywords, the acquisition device comprising:
an initial characteristic value acquisition device, configured to obtain original application domain characteristic value information of at least one keyword contained in each keyword set of a plurality of keyword sets to be processed, wherein each keyword set comprises a plurality of keywords;
an application distribution acquisition device, configured to perform classification statistics on each keyword set according to the original application domain characteristic value information of the at least one keyword, to obtain application distribution characteristic value information of each keyword set;
a first characteristic value acquisition device, configured to obtain at least one piece of first application characteristic value information of each keyword according to the application distribution characteristic value information of each keyword set, wherein the at least one piece of first application characteristic value information corresponds to the application distribution characteristic value information of the at least one keyword set to which the keyword belongs;
a second characteristic value acquisition device, configured to perform statistical processing on the at least one piece of first application characteristic value information of each keyword, to obtain second application characteristic value information of each keyword;
a control device, configured to, when a predetermined stop condition is not met, take the second application characteristic value information of each keyword as the original application domain characteristic value information of that keyword, and control the application distribution acquisition device, the first characteristic value acquisition device and the second characteristic value acquisition device to repeat their respective operations until the predetermined stop condition is met;
wherein the acquisition device further comprises:
an application attribute acquisition device, configured to, when the predetermined stop condition is met, obtain the application attribute information of each keyword according to the second application characteristic value information of each keyword.
7. The acquisition device according to claim 6, wherein the initial characteristic value acquisition device is configured to look up the plurality of keywords contained in each keyword set in a preset application classification table, to obtain the original application domain characteristic value information of the at least one keyword contained in each keyword set.
8. The acquisition device according to claim 6 or 7, wherein performing statistical processing on the at least one piece of first application characteristic value information of each keyword to obtain the second application characteristic value information of each keyword comprises:
- merging the at least one piece of first application characteristic value information of each keyword according to the following formula, to obtain the second application characteristic value information of each keyword:
$$P(T_i \mid \mathrm{word}_j) = \frac{\sum_{\mathrm{unit}_1}^{\mathrm{unit}_N} P(T_i \mid \mathrm{word}_j)}{\sum_{i=1}^{M} P(T_i \mid \mathrm{word}_j)}$$
wherein T_i denotes a given piece of application attribute information;
word_j denotes a given keyword;
N denotes the total number of keyword sets;
M denotes the total number of pieces of application attribute information;
P(T_i | word_j) denotes the second application characteristic value information of keyword word_j belonging to application T_i.
9. The acquisition device according to any one of claims 6 to 8, wherein the acquisition device further comprises:
a first degree of correlation acquisition device, configured to obtain degree of correlation information between each keyword and each of a plurality of pieces of application attribute information;
wherein the first characteristic value acquisition device comprises:
a weighting device, configured to weight the application distribution characteristic value information of each keyword set by the degree of correlation information, to obtain the at least one piece of first application characteristic value information of each keyword.
10. The acquisition device according to claim 9, wherein the acquisition device further comprises:
a word-segmentation device, configured to perform word segmentation on each keyword, to obtain at least one word-segmentation fragment of that keyword;
a frequency of occurrence acquisition device, configured to look up the at least one word-segmentation fragment of each keyword in the preset application word-segmentation dictionaries corresponding to the respective pieces of application attribute information, to obtain the frequency of occurrence information of each fragment in each of these dictionaries;
a second degree of correlation acquisition device, configured to obtain, according to the frequency of occurrence information, the degree of correlation information between each keyword and each of the plurality of pieces of application attribute information.
11. A computer device, comprising the acquisition device according to any one of claims 6 to 10.
CN201210335806.7A 2012-09-11 2012-09-11 Method, device and equipment for obtaining application field attribute information of keywords Active CN103678356B (en)

Publications (2)

Publication Number Publication Date
CN103678356A true CN103678356A (en) 2014-03-26
CN103678356B CN103678356B (en) 2018-05-25

Family

ID=50315949


Country Link
CN (1) CN103678356B (en)

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant