CN103678356B - Method, apparatus and device for obtaining application field attribute information of a keyword - Google Patents
- Publication number
- CN103678356B (application CN201210335806.7A)
- Authority
- CN
- China
- Prior art keywords
- keyword
- application field
- word
- value information
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The object of the present invention is to provide a method, apparatus and device for obtaining application field attribute information of a keyword. The invention first obtains initial application field feature value information for at least one keyword included in each of a plurality of keyword sets to be processed. It then performs classification statistics on each keyword set, according to the initial application field feature value information of the at least one keyword, to obtain application field distribution feature value information for each keyword set. Next, according to the application field distribution feature value information of each keyword set, it obtains at least one piece of first application field feature value information for each keyword. Finally, it performs statistical processing on the at least one piece of first application field feature value information of each keyword to obtain second application field feature value information for each keyword.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method, apparatus and device for obtaining application field attribute information of a keyword.
Background art
In existing Web information publishing systems, the keywords set by information publishing users are mostly divided into application fields either manually or by classifying them with a statistical classification technique. This yields an initial application field division of the keywords. An iterative calculation is then performed over the purchase relationships between a large number of information publishing users and the keywords, and the application field attribute information of the keywords is finally obtained. Because the prior art relies mainly on subjective human judgment to divide the keywords, the accuracy and objectivity of the division are low. Moreover, because information publishing users do not classify their own field attributes accurately enough, application field attribute information derived from the purchase relationships between a large number of such users and the keywords is also of low accuracy. Meanwhile, as search technology continues to develop, the requirements on the accuracy and granularity of keyword application field division keep rising, and existing division approaches cannot meet the demand for high accuracy and fine granularity.
Therefore, how to provide a method, apparatus and device for obtaining application field attribute information of a keyword, so that this information can be obtained accurately and efficiently, has become one of the problems that urgently needs to be solved.
Summary of the invention
The object of the present invention is to provide a method, apparatus and device for obtaining application field attribute information of a keyword.
According to one aspect of the present invention, a method for obtaining application field attribute information of a keyword is provided, the method comprising the following steps:
a. obtaining initial application field feature value information of at least one keyword included in each keyword set among a plurality of keyword sets to be processed, wherein each keyword set includes a plurality of keywords;
b. performing classification statistics on each keyword set according to the initial application field feature value information of the at least one keyword, to obtain application field distribution feature value information of each keyword set;
c. obtaining, according to the application field distribution feature value information of each keyword set, at least one piece of first application field feature value information of each keyword, wherein the at least one piece of first application field feature value information corresponds to the application field distribution feature value information of the at least one keyword set to which the keyword belongs;
d. performing statistical processing on the at least one piece of first application field feature value information of each keyword, to obtain second application field feature value information of each keyword;
taking the second application field feature value information of each keyword as the initial application field feature value information of that keyword and repeating steps b, c and d until a predetermined stop condition is met;
wherein the method further comprises:
w. when the predetermined stop condition is met, obtaining the application field attribute information of each keyword according to the second application field feature value information of each keyword.
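The steps above can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout, the use of the maximum in step d, and the stop rule (every keyword's top-ranked field is unchanged, or a round limit is reached) are assumptions chosen from the options the description mentions later.

```python
from collections import defaultdict


def top_field(field_values):
    """Return the field with the largest value, or None if there are no values."""
    return max(field_values, key=field_values.get) if field_values else None


def set_distribution(words, values):
    """Step b: sum the members' per-field values and normalise to a distribution."""
    totals = defaultdict(float)
    for word in words:
        for field, value in values.get(word, {}).items():
            totals[field] += value
    norm = sum(totals.values())
    return {field: total / norm for field, total in totals.items()} if norm else {}


def classify_keywords(keyword_sets, initial_values, max_rounds=50):
    """Iterate steps b, c and d, then apply step w to pick one field per keyword."""
    values = initial_values
    for _ in range(max_rounds):
        # Step b: one field distribution per keyword set.
        dists = {name: set_distribution(words, values)
                 for name, words in keyword_sets.items()}
        # Step c: every keyword inherits the distribution of each set it belongs to.
        first = defaultdict(lambda: defaultdict(list))
        for name, words in keyword_sets.items():
            for word in words:
                for field, value in dists[name].items():
                    first[word][field].append(value)
        # Step d (assumed variant): keep the largest first value per field.
        second = {word: {field: max(vals) for field, vals in fields.items()}
                  for word, fields in first.items()}
        # Assumed stop rule: no keyword changed its top-ranked field this round.
        stable = all(top_field(second[w]) == top_field(values.get(w, {}))
                     for w in second)
        values = second
        if stable:
            break
    # Step w: the field with the largest second value becomes the attribute.
    return {word: top_field(fields) for word, fields in values.items()}


keyword_sets = {"U1": ["a", "b"], "U2": ["b", "c"]}
initial_values = {"a": {"T1": 0.9}, "b": {"T1": 0.7}, "c": {"T2": 0.8}}
labels = classify_keywords(keyword_sets, initial_values)
```

Keywords that appear in no keyword set, or only in sets without any initial values, receive no label in this sketch.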
According to another aspect of the present invention, an acquisition apparatus for obtaining application field attribute information of a keyword is also provided, the apparatus comprising:
an initial feature value acquisition device, for obtaining initial application field feature value information of at least one keyword included in each keyword set among a plurality of keyword sets to be processed, wherein each keyword set includes a plurality of keywords;
an application field distribution acquisition device, for performing classification statistics on each keyword set according to the initial application field feature value information of the at least one keyword, to obtain application field distribution feature value information of each keyword set;
a first feature value acquisition device, for obtaining, according to the application field distribution feature value information of each keyword set, at least one piece of first application field feature value information of each keyword, wherein the at least one piece of first application field feature value information corresponds to the application field distribution feature value information of the at least one keyword set to which the keyword belongs;
a second feature value acquisition device, for performing statistical processing on the at least one piece of first application field feature value information of each keyword, to obtain second application field feature value information of each keyword;
a control device, for taking, when a predetermined stop condition is not met, the second application field feature value information of each keyword as the initial application field feature value information of that keyword, so as to control the application field distribution acquisition device, the first feature value acquisition device and the second feature value acquisition device to repeat their respective operations until the predetermined stop condition is met;
wherein the apparatus further comprises:
an application field attribute acquisition device, for obtaining, when the predetermined stop condition is met, the application field attribute information of each keyword according to the second application field feature value information of each keyword.
Compared with the prior art, the present invention has the following advantages. By obtaining the application field distribution feature value information of each keyword set, it obtains first application field feature value information for each keyword with respect to the one or more keyword sets the keyword belongs to, and can therefore derive the second application field feature value information of each keyword from the perspective of multiple keyword sets. By repeating these steps iteratively, it obtains more accurate application field attribute information for each keyword, which allows the application fields of a massive number of keywords to be determined accurately and makes the result more objective. It also meets the demand of existing search technology for a finer-grained application field division of keywords. Furthermore, accurate classification of the application field to which a keyword belongs can guide information publishing users in establishing a reasonable correspondence between keywords and published information, thereby effectively optimizing their information publishing strategy.
Description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a structural diagram of an acquisition apparatus for obtaining application field attribute information of a keyword according to one aspect of the present invention;
Fig. 2 is a structural diagram of an acquisition apparatus for obtaining relevance information between each keyword and multiple pieces of application field attribute information according to a preferred embodiment of the present invention;
Fig. 3 is a flowchart of a method for obtaining application field attribute information of a keyword according to another aspect of the present invention;
Fig. 4 is a flowchart of a method for obtaining relevance information between each keyword and multiple pieces of application field attribute information according to a preferred embodiment of the present invention.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description of the embodiments
The present invention is described in further detail below in conjunction with the accompanying drawings.
Fig. 1 is a structural diagram of an acquisition apparatus for obtaining application field attribute information of a keyword according to one aspect of the present invention. The acquisition apparatus of this embodiment is contained in a computer device and includes an initial feature value acquisition device 1, an application field distribution acquisition device 2, a first feature value acquisition device 3, a second feature value acquisition device 4, a control device 5 and an application field attribute acquisition device 6.
The computer device includes, but is not limited to, network devices and user devices. User devices include, but are not limited to, PCs and the like. Network devices include, but are not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing and composed of a large number of computers or network servers, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The networks in which the user devices and network devices reside include, but are not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs and the like.
It should be noted that the above user devices, network devices and networks are only examples. Other existing or future user devices, network devices or networks, where applicable to the present invention, should also fall within its scope of protection and are incorporated herein by reference.
First, the initial feature value acquisition device 1 obtains initial application field feature value information of at least one keyword included in each keyword set among a plurality of keyword sets to be processed, wherein each keyword set includes a plurality of keywords.
Here, an application field is the field in which a keyword is used, including but not limited to an industry.
The feature value information includes, but is not limited to, probability information.
Specifically, the initial feature value acquisition device 1 looks up the keywords included in each keyword set in a preset application field classification table, to obtain the initial application field feature value information of at least one keyword included in each keyword set.
The preset application field classification table contains multiple keywords and their corresponding initial application field feature value information, and can be obtained by means such as corpus training. Preferably, corpus training comprises labeling a keyword corpus with application field attributes for each preset application field, applying word segmentation, part-of-speech tagging and similar processing to the keyword corpus, and then training on the keyword corpus with a classification algorithm, for example a maximum-entropy text classification algorithm, to obtain the initial application field feature value information corresponding to the multiple keywords.
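The table lookup described above might look like the following sketch. The table contents, the name FIELD_TABLE and the function name are hypothetical; the description does not specify how the classification table is stored, and keywords missing from the table are simply left without an initial value here.

```python
def initial_feature_values(keyword_sets, table):
    """Look up each keyword of each set in a preset field classification table.

    Keywords that are absent from the table get no initial value, which is why
    only "at least one" keyword per set needs to be covered.
    """
    return {
        name: {word: dict(table[word]) for word in words if word in table}
        for name, words in keyword_sets.items()
    }


# Hypothetical table: keyword -> {application field: probability}.
FIELD_TABLE = {"word1": {"T1": 0.9}, "word2": {"T2": 0.6}}
keyword_sets = {"Unit1": ["word1", "word2", "word6", "word7"]}

initial = initial_feature_values(keyword_sets, FIELD_TABLE)
```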
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art will understand that any implementation that obtains initial application field feature value information of at least one keyword included in each keyword set among a plurality of keyword sets to be processed falls within the scope of the present invention.
Then, the application field distribution acquisition device 2 performs classification statistics on each keyword set, according to the initial application field feature value information of the at least one keyword obtained by the initial feature value acquisition device 1, to obtain the application field distribution feature value information of each keyword set.
Specifically, the application field distribution acquisition device 2 aggregates, per application field, the initial application field feature value information of the at least one keyword in each keyword set, to obtain the application field distribution feature value information of each keyword set.
In one example, keyword set Unit1 includes the keywords word1, word2, word6 and word7, where the initial application field feature value information of word1 is a probability Wa of belonging to application field T1, and that of word2 is a probability Wb of belonging to application field T2. The application field distribution acquisition device 2 performs classification statistics on Unit1 and obtains its application field distribution feature value information: the probability that Unit1 belongs to application field T1 is P(T1|Unit1) = Wa/(Wa+Wb), and the probability that Unit1 belongs to application field T2 is P(T2|Unit1) = Wb/(Wa+Wb).
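The Unit1 example can be reproduced with a few lines of Python. The numeric values chosen for Wa and Wb and the function name are illustrative assumptions; the normalisation by the sum of all member values follows the formulas in the example.

```python
from collections import defaultdict


def set_distribution(words, values):
    """Sum the members' per-field values and normalise them to sum to 1."""
    totals = defaultdict(float)
    for word in words:
        for field, value in values.get(word, {}).items():
            totals[field] += value
    norm = sum(totals.values())
    return {field: total / norm for field, total in totals.items()} if norm else {}


Wa, Wb = 0.8, 0.4  # assumed sample values for the symbolic Wa and Wb
initial = {"word1": {"T1": Wa}, "word2": {"T2": Wb}}
unit1 = ["word1", "word2", "word6", "word7"]

dist = set_distribution(unit1, initial)  # P(T1|Unit1) and P(T2|Unit1)
```

Keywords without an initial value (word6 and word7 here) contribute nothing to the set distribution.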
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art will understand that any implementation that performs classification statistics on each keyword set according to the initial application field feature value information of the at least one keyword, to obtain the application field distribution feature value information of each keyword set, falls within the scope of the present invention.
Then, the first feature value acquisition device 3 obtains at least one piece of first application field feature value information of each keyword according to the application field distribution feature value information of each keyword set obtained by the application field distribution acquisition device 2, wherein the at least one piece of first application field feature value information corresponds to the application field distribution feature value information of the at least one keyword set to which the keyword belongs.
Each keyword may belong to one or more keyword sets.
Specifically, the first feature value acquisition device 3 distributes the application field distribution feature value information of each keyword set to every keyword included in that set, to obtain at least one piece of first application field feature value information of each keyword.
In one example, keyword set Unit1 includes the keywords word1, word2, word6 and word7, and its application field distribution feature value information is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb). The first feature value acquisition device 3 distributes the application field distribution feature value information of Unit1 to the keywords word1, word2, word6 and word7 in the set, and obtains:
the first application field feature value information of word1, word2, word6 and word7 for application field T1: P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb);
the first application field feature value information of word1, word2, word6 and word7 for application field T2: P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb).
In addition, keyword set Unit2 includes the keywords word2, word3, word6 and word7, and its application field distribution feature value information is P(T2|Unit2) = Wb/(Wb+Wc) and P(T3|Unit2) = Wc/(Wb+Wc). The first feature value acquisition device 3 distributes the application field distribution feature value information of Unit2 to the keywords word2, word3, word6 and word7 in the set, and obtains:
the first application field feature value information of word2, word3, word6 and word7 for application field T2: P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc);
the first application field feature value information of word2, word3, word6 and word7 for application field T3: P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc).
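As a sketch of this distribution step, the snippet below copies each set's distribution onto its member keywords, so that a keyword in several sets collects one value per set and field. The numeric distributions stand in for the symbolic Wa, Wb and Wc ratios and are assumptions, as are the names.

```python
from collections import defaultdict


def propagate(keyword_sets, set_dists):
    """Give every keyword the field distribution of each set it belongs to."""
    first = defaultdict(lambda: defaultdict(list))
    for name, words in keyword_sets.items():
        for word in words:
            for field, value in set_dists[name].items():
                first[word][field].append(value)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {word: dict(fields) for word, fields in first.items()}


keyword_sets = {
    "Unit1": ["word1", "word2", "word6", "word7"],
    "Unit2": ["word2", "word3", "word6", "word7"],
}
# Assumed sample distributions for P(T|Unit1) and P(T|Unit2).
set_dists = {
    "Unit1": {"T1": 0.6, "T2": 0.4},
    "Unit2": {"T2": 0.75, "T3": 0.25},
}

first_values = propagate(keyword_sets, set_dists)
```

Here word2 ends up with two first values for T2, one from each set, which is the situation the subsequent statistical processing has to resolve.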
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art will understand that any implementation that obtains at least one piece of first application field feature value information of each keyword according to the application field distribution feature value information of each keyword set falls within the scope of the present invention.
Then, the second feature value acquisition device 4 performs statistical processing on the at least one piece of first application field feature value information of each keyword obtained by the first feature value acquisition device 3, to obtain the second application field feature value information of each keyword.
Specifically, the ways of performing statistical processing on the at least one piece of first application field feature value information of each keyword, to obtain the second application field feature value information of each keyword, include but are not limited to:
1) The second feature value acquisition device 4 selects, from the at least one piece of first application field feature value information of each keyword, the maximum value as the second application field feature value information.
In one example, continuing the example above, in keyword set Unit1 the first application field feature value information of word2 for application field T2 is P(T2|word2) = Wb/(Wa+Wb), and in keyword set Unit2 it is P(T2|word2) = Wb/(Wb+Wc). The larger of these two first application field feature values is selected as the second application field feature value information of word2 for application field T2.
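Method 1) amounts to a per-field maximum over the values a keyword collected from its sets. A minimal sketch, with assumed sample numbers in place of the symbolic ratios and hypothetical names:

```python
def second_by_max(first_values):
    """For each keyword and field, keep the largest first feature value."""
    return {
        word: {field: max(vals) for field, vals in fields.items()}
        for word, fields in first_values.items()
    }


# word2 belongs to Unit1 and Unit2, so it has two candidate values for T2.
first_values = {"word2": {"T1": [0.6], "T2": [0.4, 0.75], "T3": [0.25]}}

second = second_by_max(first_values)
```

Note that the values kept this way no longer necessarily sum to 1 across fields for a keyword.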
2) The second feature value acquisition device 4 merges the at least one piece of first application field feature value information of each keyword according to the following formula (1), to obtain the second application field feature value information of each keyword:
[formula (1): not reproduced in the source text]
where Ti denotes an application field attribute;
wordj denotes a keyword;
N denotes the total number of keyword sets;
M denotes the total number of application field attributes;
P(Ti|wordj) denotes the second application field feature value information of keyword wordj for application field Ti.
In one example, in keyword set Unit1, the first application field feature value information of the keywords word1, word2, word6 and word7 for application field T1 is P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb), and for application field T2 it is P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb).
In keyword set Unit2, the first application field feature value information of the keywords word2, word3, word6 and word7 for application field T2 is P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc), and for application field T3 it is P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc).
The second feature value acquisition device 4 merges the application field feature values of keyword word2 for application field T2 according to the formula in 2) above, thereby obtaining the second application field feature value information of word2 for application field T2.
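Because the merging formula itself is not available in this text, the following sketch only illustrates one plausible way to merge values across sets: sum a keyword's first values per field over all sets, then normalise over all fields. This normalised sum is an assumption suggested by the symbols N and M, not the formula of the patent, and the names and numbers are hypothetical.

```python
def second_by_normalised_sum(fields):
    """Assumed merge: per-field sum over sets, divided by the sum over all fields."""
    sums = {field: sum(vals) for field, vals in fields.items()}
    total = sum(sums.values())
    return {field: s / total for field, s in sums.items()} if total else {}


# First values of word2 collected from Unit1 and Unit2 (assumed sample numbers).
word2_first = {"T1": [0.6], "T2": [0.4, 0.75], "T3": [0.25]}

word2_second = second_by_normalised_sum(word2_first)
```

Unlike the maximum in method 1), a merge of this kind rewards fields that are supported by several keyword sets and keeps the result a probability distribution.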
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art will understand that any implementation that performs statistical processing on the at least one piece of first application field feature value information of each keyword, to obtain the second application field feature value information of each keyword, falls within the scope of the present invention.
Then, when the predetermined stop condition is not met, the control device 5 takes the second application field feature value information of each keyword obtained by the second feature value acquisition device 4 as the initial application field feature value information of that keyword, so as to control the application field distribution acquisition device 2, the first feature value acquisition device 3 and the second feature value acquisition device 4 to repeat their respective operations until the predetermined stop condition is met.
The predetermined stop condition includes, but is not limited to:
1) when the number of repetitions exceeds a predetermined execution count threshold, the control device 5 stops repeating the operations;
2) when the application field attribute information corresponding to the maximum value in the second application field feature value information of a keyword is the same as the application field attribute information corresponding to the initial application field feature value information of that keyword, the control device 5 stops repeating the operations.
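The two stop conditions could be checked as in the sketch below. Applying condition 2) to all keywords at once, and comparing the top-ranked field of the new values with the top-ranked field of the previous values, are assumptions; the description states the condition for a single keyword. All names are hypothetical.

```python
def top_field(field_values):
    """Return the field with the largest value, or None if there are no values."""
    return max(field_values, key=field_values.get) if field_values else None


def should_stop(round_no, max_rounds, second, initial):
    """Return True when either of the two stop conditions holds."""
    # Condition 1: the repetition count exceeds the predetermined threshold.
    if round_no > max_rounds:
        return True
    # Condition 2 (assumed to apply to every keyword): the field with the
    # largest second value matches the field of the initial values.
    return all(
        top_field(second[word]) == top_field(initial.get(word, {}))
        for word in second
    )


changed = {"w": {"T1": 0.3, "T2": 0.65}}
unchanged = {"w": {"T1": 0.7, "T2": 0.2}}
previous = {"w": {"T1": 0.9}}
```

With these sample values, `should_stop(3, 50, changed, previous)` keeps iterating because the top field moved from T1 to T2, while round 51 with a threshold of 50 stops regardless.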
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art will understand that any implementation that takes the second application field feature value information of each keyword as the initial application field feature value information of that keyword and repeats the operations of the application field distribution acquisition device, the first feature value acquisition device and the second feature value acquisition device until the predetermined stop condition is met falls within the scope of the present invention.
When the predetermined stop condition is met, the application field attribute acquisition device 6 obtains the application field attribute information of each keyword according to the second application field feature value information of each keyword.
Specifically, the application field attribute acquisition device 6 takes the application field corresponding to the maximum value in the second application field feature value information of each keyword as the application field attribute information of that keyword, thereby obtaining the application field attribute information of each keyword.
In one example, when the number of repetitions performed by the control device 5 reaches 51 and thus exceeds the predetermined execution count threshold of 50, the control device 5 stops repeating the operations. The application field attribute acquisition device 6 then looks at the maximum value in the second application field feature value information of each keyword. For example, if the second application field feature value information of word2 is 0.3 for application field T1 and 0.65 for application field T2, the application field attribute acquisition device 6 takes the application field corresponding to the maximum value as the application field attribute information of word2, so the application field attribute information of word2 is application field T2.
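The final selection in this example is an argmax over the keyword's second feature values. A minimal sketch with a hypothetical function name:

```python
def field_attribute(second_values):
    """Pick the application field whose second feature value is largest."""
    return max(second_values, key=second_values.get)


word2_second = {"T1": 0.3, "T2": 0.65}
word2_field = field_attribute(word2_second)
```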
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art will understand that any implementation that, when the predetermined stop condition is met, obtains the application field attribute information of each keyword according to the second application field feature value information of each keyword falls within the scope of the present invention.
By obtaining the application field distribution feature value information of each keyword set, the apparatus obtains first application field feature value information for each keyword with respect to the one or more keyword sets the keyword belongs to, and can therefore derive the second application field feature value information of each keyword from the perspective of multiple keyword sets. By repeating these steps iteratively, it obtains more accurate application field attribute information for each keyword, which allows the application fields of a massive number of keywords to be determined accurately and makes the result more objective. It also meets the demand of existing search technology for a finer-grained application field division of keywords. Furthermore, accurate classification of the application field to which a keyword belongs can guide information publishing users in establishing a reasonable correspondence between keywords and published information, thereby effectively optimizing their information publishing strategy.
As one of the preferred embodiments of this solution (with reference to Fig. 1), the acquisition apparatus further includes a first relevance acquisition device (not shown), and the first feature value acquisition device includes a weighting device (not shown).
The first relevance acquisition device obtains relevance information between each keyword and each of multiple pieces of application field attribute information.
Specifically, the ways in which the first relevance acquisition device obtains the relevance information include, but are not limited to:
1) Querying a preset relevance database. The preset relevance database contains relevance information between multiple keywords and their corresponding multiple pieces of application field attribute information, and includes but is not limited to a relational database, a Key-Value storage system or a file system.
In one example, the first relevance acquisition device performs a matching query for each keyword in the preset relevance database, to obtain the relevance information between each keyword and the multiple pieces of application field attribute information.
2) Performing word segmentation on each keyword to obtain at least one keyword segment of each keyword; querying, according to the at least one keyword segment of each keyword, the application field segment lexicons corresponding to the preset multiple pieces of application field attribute information, to obtain the frequency with which the at least one keyword segment occurs in the application field segment lexicon of each piece of application field attribute information; and obtaining, according to this frequency information, the relevance information between each keyword and each of the multiple pieces of application field attribute information. This approach is described in detail in the embodiment shown in Fig. 2.
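Approach 2) can be illustrated roughly as follows. Everything specific here is an assumption made for the sketch: whitespace splitting stands in for a real word segmenter, each field's segment lexicon is modelled as a Counter of segment frequencies, and relevance is taken as the field's share of the total frequency. The detailed scoring belongs to the Fig. 2 embodiment, which is not part of this text.

```python
from collections import Counter


def relevance(keyword, lexicons):
    """Score a keyword against each field by the frequency of its segments."""
    segments = keyword.split()  # stand-in for a real word segmenter
    counts = {
        field: sum(lexicon[segment] for segment in segments)
        for field, lexicon in lexicons.items()
    }
    total = sum(counts.values())
    # Assumed normalisation: each field's share of the total segment frequency.
    return {field: (count / total if total else 0.0)
            for field, count in counts.items()}


# Hypothetical per-field segment lexicons with occurrence counts.
lexicons = {
    "travel": Counter({"flights": 6, "cheap": 2}),
    "retail": Counter({"cheap": 2}),
}

scores = relevance("cheap flights", lexicons)
```

A Counter returns 0 for segments it has never seen, so unknown segments simply do not contribute to a field's score.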
The weighting device takes the application field distribution feature value information of each keyword set obtained by the application field distribution acquisition device 2 and weights it with the relevance information obtained by the first relevance acquisition device, to obtain at least one piece of first application field feature value information of each keyword.
Specifically, the weighting device distributes the application field distribution feature value information of each keyword set to every keyword included in that set and weights it with the obtained relevance information, to obtain at least one piece of first application field feature value information of each keyword.
In one example, keyword set Unit1 includes the keywords word1, word2, word6 and word7, and the application field distribution feature value information of Unit1 is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb). The correlation degrees of word1, word2, word6 and word7 with application field T1 are 0.7, 0.1, 0 and 0.2 respectively, and their correlation degrees with application field T2 are 0.1, 0.8, 0 and 0 respectively.
The weighting device distributes the application field distribution feature value information of Unit1 to word1, word2, word6 and word7 in the set and weights it by the correlation information. The first application field feature value information for application field T1 is then:
P(T1|word1) = 0.7 × Wa/(Wa+Wb),
P(T1|word2) = 0.1 × Wa/(Wa+Wb),
P(T1|word6) = 0,
P(T1|word7) = 0.2 × Wa/(Wa+Wb);
and the first application field feature value information for application field T2 is:
P(T2|word1) = 0.1 × Wb/(Wa+Wb),
P(T2|word2) = 0.8 × Wb/(Wa+Wb),
P(T2|word6) = 0,
P(T2|word7) = 0.
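The weighted calculation in this example can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `weight_first_feature_values`, the dictionary layout, and the numeric values chosen for Wa and Wb are hypothetical.

```python
from typing import Dict


def weight_first_feature_values(
    set_distribution: Dict[str, float],
    correlation: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Distribute a set's field distribution to its keywords, weighted by correlation."""
    return {
        word: {
            field: corr.get(field, 0.0) * p
            for field, p in set_distribution.items()
        }
        for word, corr in correlation.items()
    }


# Hypothetical values for illustration: Wa = 0.6 and Wb = 0.4.
wa, wb = 0.6, 0.4
unit1 = {"T1": wa / (wa + wb), "T2": wb / (wa + wb)}

# Correlation degrees taken from the example in the text.
correlation = {
    "word1": {"T1": 0.7, "T2": 0.1},
    "word2": {"T1": 0.1, "T2": 0.8},
    "word6": {"T1": 0.0, "T2": 0.0},
    "word7": {"T1": 0.2, "T2": 0.0},
}

first_values = weight_first_feature_values(unit1, correlation)
```

With these assumed values, `first_values["word1"]["T1"]` corresponds to 0.7 × Wa/(Wa+Wb) in the example.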
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the correlation information between each keyword and the multiple pieces of application field attribute information, and performs a weighted calculation combining the application field distribution feature value information of each keyword set with the correlation information obtained by the first correlation acquisition device, to obtain at least one piece of first application field feature value information for each keyword, falls within the scope of the present invention.
Different keywords have different semantic correlations with different application field attribute information. For this reason, the semantic correlation information between each keyword and the multiple pieces of application field attribute information is incorporated into the process of obtaining the first application field feature value information, so that the weight with which a keyword belongs to an application field is raised when the keyword is semantically highly correlated with that application field attribute information. This yields more accurate first application field feature value information and supports obtaining more accurate application field attribute information in the end.
As one preferred embodiment of the present embodiment, Fig. 2 shows a structural diagram of an acquisition device, according to a preferred embodiment of the present invention, for obtaining the correlation information between each keyword and multiple pieces of application field attribute information. The acquisition device includes initial feature value acquisition device 1, application field distribution acquisition device 2, first feature value acquisition device 3, second feature value acquisition device 4, control device 5, application field attribute acquisition device 6, segmentation device 7, occurrence frequency acquisition device 8 and second correlation acquisition device 9.
The initial feature value acquisition device 1, application field distribution acquisition device 2, first feature value acquisition device 3, second feature value acquisition device 4, control device 5 and application field attribute acquisition device 6 have been described in detail with reference to the embodiment shown in Fig. 1 and are not described again here.
The segmentation device 7 performs word segmentation on each keyword to obtain at least one keyword segment of that keyword.
Here, the word segmentation methods include, but are not limited to, forward maximum matching, reverse maximum matching, bidirectional maximum matching, the language model method, and the shortest path method.
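Of the segmentation methods listed, forward maximum matching is the simplest to illustrate. The following is a minimal Python sketch, assuming a hypothetical in-memory vocabulary and a hypothetical maximum segment length `max_len`; neither is specified in the text.

```python
from typing import List, Set


def forward_max_match(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Segment text greedily from left to right, preferring the longest vocabulary entry."""
    segments: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan keeps advancing.
            if size == 1 or piece in vocabulary:
                segments.append(piece)
                i += size
                break
    return segments


# Hypothetical vocabulary used only for this illustration.
vocabulary = {"机器", "学习", "机器学习", "培训"}
segments = forward_max_match("机器学习培训", vocabulary)
# segments == ["机器学习", "培训"]
```

Reverse maximum matching follows the same idea but scans from the end of the keyword toward its start.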
Then, the occurrence frequency acquisition device 8 queries, according to the at least one keyword segment obtained for each keyword, the application field segment libraries corresponding to the preset multiple pieces of application field attribute information, to obtain the occurrence frequency information of the at least one keyword segment in each of those libraries. The application field segment library corresponding to each piece of application field attribute information contains the preset segments of that application field.
Then, the second correlation acquisition device 9 obtains, according to the obtained occurrence frequency information, the correlation information between each keyword and the multiple pieces of application field attribute information.
The ways in which the second correlation acquisition device 9 obtains the correlation degree include, but are not limited to:
1) The second correlation acquisition device 9 obtains the correlation information between each keyword and the multiple pieces of application field attribute information according to a predetermined acquisition rule. For example, the predetermined acquisition rule is that, for a given piece of application field attribute information, a keyword whose occurrence frequency is greater than a predetermined first occurrence threshold has a correlation degree of 0.8 with that application field attribute information; a keyword whose occurrence frequency lies between a predetermined second threshold and a predetermined third threshold has a correlation degree of 0.4; and a keyword whose occurrence frequency is less than a predetermined fourth occurrence threshold has a correlation degree of 0.
2) The second correlation acquisition device 9 computes the correlation information between each keyword and the multiple pieces of application field attribute information from the occurrence frequency information using the BM25 algorithm.
Specifically, the mathematical expression of the BM25 algorithm is given by formulas 2) and 3):
where Score(D, Q) represents the correlation information between a keyword word and an application field;
qi represents a keyword segment;
f(qi, D) represents the occurrence frequency information of keyword segment qi in the application field segment library corresponding to an application field;
|D| represents the total number of segments in the application field segment library corresponding to an application field;
avgdl represents the total number of segments in the application field segment libraries corresponding to all application fields;
k1 and b are parameters used to tune precision; preferably, k1 = 2 and b = 0.75;
N represents the total number of application field categories;
n(qi) represents the number of application field categories whose application field segment library contains keyword segment qi.
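The two ways of deriving a correlation degree can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation. The function names, the `Library` layout (segment mapped to occurrence count) and the sample libraries are hypothetical. The BM25 function uses the standard BM25 form and assumes that avgdl is the average library size. The rule-based function returns `None` for frequencies that fall outside the ranges named in the text, since the text does not say what happens there.

```python
import math
from typing import Dict, List, Optional

Library = Dict[str, int]  # segment -> occurrence count within one application field


def rule_based_correlation(
    freq: float, first: float, second: float, third: float, fourth: float
) -> Optional[float]:
    """Approach 1): map an occurrence frequency to a correlation degree by thresholds."""
    if freq > first:
        return 0.8
    if second <= freq <= third:
        return 0.4
    if freq < fourth:
        return 0.0
    return None  # range not covered by the rule described in the text


def bm25_correlation(
    segments: List[str],
    field: str,
    libraries: Dict[str, Library],
    k1: float = 2.0,
    b: float = 0.75,
) -> float:
    """Approach 2): standard BM25 score of a keyword's segments against one field."""
    library = libraries[field]
    n_fields = len(libraries)                      # N
    size = sum(library.values())                   # |D|
    # Assumption: avgdl is the average library size, as in standard BM25.
    avgdl = sum(sum(lib.values()) for lib in libraries.values()) / n_fields
    score = 0.0
    for q in segments:
        f = library.get(q, 0)                      # f(qi, D)
        n_q = sum(1 for lib in libraries.values() if q in lib)  # n(qi)
        idf = math.log((n_fields - n_q + 0.5) / (n_q + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * size / avgdl))
    return score


# Hypothetical, non-empty segment libraries for three application fields.
libraries = {
    "T1": {"loan": 5, "bank": 3},
    "T2": {"game": 4, "bank": 1},
    "T3": {"shoe": 6},
}
loan_vs_t1 = bm25_correlation(["loan"], "T1", libraries)
loan_vs_t2 = bm25_correlation(["loan"], "T2", libraries)
```

In this form the IDF term becomes negative when a segment appears in more than half of the libraries, so a practical system may want to clamp it; the text does not address this case.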
It should be noted that the segmentation device, the occurrence frequency acquisition device and the second correlation acquisition device may be included in the first correlation acquisition device to obtain the correlation information, or may be independent of the first correlation acquisition device, in which case the first correlation acquisition device obtains the correlation information from the second correlation acquisition device.
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs word segmentation on each keyword to obtain at least one keyword segment of that keyword; queries, according to the at least one keyword segment of each keyword, the application field segment libraries corresponding to the preset multiple pieces of application field attribute information, to obtain the occurrence frequency information of the at least one keyword segment in each of those libraries; and obtains, according to the occurrence frequency information, the correlation information between each keyword and the multiple pieces of application field attribute information, falls within the scope of the present invention.
Fig. 3 shows a flow chart of a method for obtaining the application field attribute information of keywords according to another aspect of the present invention. The method of the present invention is mainly implemented by computer equipment. The method according to this preferred embodiment includes step S1, step S2, step S3, step S4, step S5 and step S6.
The computer equipment includes, but is not limited to, network devices and user equipment. The user equipment includes, but is not limited to, a PC. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing (Cloud Computing) composed of a large number of computers or network servers, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The network in which the user equipment and the network device are located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, and a VPN network.
It should be noted that the user equipment, network devices and networks above are only examples. Other existing or future user equipment, network devices or networks, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
First, in step S1, the computer equipment obtains the initial application field feature value information of at least one keyword included in each keyword set among multiple keyword sets to be processed, where each keyword set includes multiple keywords.
Here, the application field refers to the field to which the keyword applies, including but not limited to an industry.
The feature value information includes, but is not limited to, probability information.
Specifically, in step S1, the computer equipment looks up the multiple keywords included in each keyword set in a preset application field classification table, to obtain the initial application field feature value information of at least one keyword included in each keyword set.
The preset application field classification table contains multiple keywords and their corresponding initial application field feature value information, and can be obtained by means such as corpus training. Preferably, the corpus training includes labeling the preset keyword corpus of each application field with application field attributes, applying processing such as word segmentation and part-of-speech tagging to the keyword corpus, and then training on the keyword corpus with a classification algorithm, for example a maximum-entropy text classification algorithm, to obtain the initial application field feature value information corresponding to the multiple keywords.
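The lookup in step S1 can be sketched as a dictionary query. This is a minimal sketch with hypothetical names and values: the classification table is assumed to be already trained and held in memory, and keywords missing from the table are simply skipped, which is consistent with obtaining values for "at least one" keyword per set.

```python
from typing import Dict, List

FieldValues = Dict[str, float]  # application field -> feature value (e.g. probability)


def initial_feature_values(
    keyword_sets: Dict[str, List[str]],
    classification_table: Dict[str, FieldValues],
) -> Dict[str, Dict[str, FieldValues]]:
    """Look up each keyword of each set in the preset classification table."""
    return {
        set_name: {
            word: classification_table[word]
            for word in words
            if word in classification_table
        }
        for set_name, words in keyword_sets.items()
    }


# Hypothetical table and keyword set for illustration only.
table = {"word1": {"T1": 0.6}, "word2": {"T2": 0.2}}
sets = {"Unit1": ["word1", "word2", "word6", "word7"]}
initial = initial_feature_values(sets, table)
```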
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the initial application field feature value information of at least one keyword included in each keyword set among multiple keyword sets to be processed falls within the scope of the present invention.
Then, in step S2, the computer equipment performs classified statistical processing on each keyword set according to the initial application field feature value information of the at least one keyword obtained in step S1, to obtain the application field distribution feature value information of each keyword set.
Specifically, in step S2, the computer equipment aggregates, by application field, the initial application field feature value information of the at least one keyword in each keyword set, to obtain the application field distribution feature value information of each keyword set.
In one example, keyword set Unit1 includes the keywords word1, word2, word6 and word7, where the initial application field feature value information of word1 is that it belongs to application field T1 with probability Wa, and word2 belongs to application field T2 with probability Wb. In step S2, the computer equipment performs classified statistical processing on Unit1 and obtains the application field distribution feature value information of Unit1: Unit1 belongs to application field T1 with probability P(T1|Unit1) = Wa/(Wa+Wb), and to application field T2 with probability P(T2|Unit1) = Wb/(Wa+Wb).
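One reading of this example is that the initial values are summed per application field and then normalized. The sketch below implements that reading in Python. The aggregation rule is an assumption inferred from the example, and the function name and the numeric values for Wa and Wb are hypothetical.

```python
from typing import Dict, List


def set_field_distribution(
    keywords: List[str],
    initial: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Sum the keywords' initial feature values per field, then normalize to a distribution."""
    totals: Dict[str, float] = {}
    for word in keywords:
        for field, value in initial.get(word, {}).items():
            totals[field] = totals.get(field, 0.0) + value
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no keyword in the set has an initial feature value
    return {field: value / grand_total for field, value in totals.items()}


# Hypothetical values: Wa = 0.6 for word1 in T1, Wb = 0.2 for word2 in T2.
initial_values = {"word1": {"T1": 0.6}, "word2": {"T2": 0.2}}
unit1_distribution = set_field_distribution(
    ["word1", "word2", "word6", "word7"], initial_values
)
```

With these assumed values, `unit1_distribution["T1"]` equals Wa/(Wa+Wb) and `unit1_distribution["T2"]` equals Wb/(Wa+Wb).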
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs classified statistical processing on each keyword set according to the initial application field feature value information of the at least one keyword, to obtain the application field distribution feature value information of each keyword set, falls within the scope of the present invention.
Then, in step S3, the computer equipment obtains at least one piece of first application field feature value information for each keyword according to the application field distribution feature value information of each keyword set obtained in step S2, where the at least one piece of first application field feature value information corresponds to the application field distribution feature value information of the at least one keyword set to which the keyword belongs.
Each keyword can belong to one or more keyword sets.
Specifically, in step S3, the computer equipment distributes the application field distribution feature value information of each keyword set to each keyword included in that set, to obtain at least one piece of first application field feature value information for each keyword.
In one example, keyword set Unit1 includes the keywords word1, word2, word6 and word7, and the application field distribution feature value information of Unit1 is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb). In step S3, the computer equipment distributes the application field distribution feature value information of Unit1 to the keywords word1, word2, word6 and word7 in the set, obtaining:
the first application field feature value information of word1, word2, word6 and word7 belonging to application field T1 is P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb),
and the first application field feature value information of word1, word2, word6 and word7 belonging to application field T2 is P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb).
In addition, keyword set Unit2 includes the keywords word2, word3, word6 and word7, and the application field distribution feature value information of Unit2 is P(T2|Unit2) = Wb/(Wb+Wc) and P(T3|Unit2) = Wc/(Wb+Wc). In step S3, the computer equipment distributes the application field distribution feature value information of Unit2 to the keywords word2, word3, word6 and word7 in the set, obtaining:
the first application field feature value information of word2, word3, word6 and word7 belonging to application field T2 is P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc);
the first application field feature value information of word2, word3, word6 and word7 belonging to application field T3 is P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc).
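The distribution of set-level values to keywords in step S3 can be sketched as follows. Because a keyword may belong to several sets, this sketch keeps a list of first feature values per keyword and field. The function name, the data layout and the numeric values for Wa, Wb and Wc are hypothetical.

```python
from typing import Dict, List


def distribute_to_keywords(
    keyword_sets: Dict[str, List[str]],
    set_distributions: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, List[float]]]:
    """Give every keyword the field distribution of each set it belongs to."""
    first: Dict[str, Dict[str, List[float]]] = {}
    for set_name, keywords in keyword_sets.items():
        for word in keywords:
            for field, value in set_distributions[set_name].items():
                first.setdefault(word, {}).setdefault(field, []).append(value)
    return first


# Hypothetical values: Wa = 0.6, Wb = 0.4, Wc = 0.4.
keyword_sets = {
    "Unit1": ["word1", "word2", "word6", "word7"],
    "Unit2": ["word2", "word3", "word6", "word7"],
}
set_distributions = {
    "Unit1": {"T1": 0.6, "T2": 0.4},  # Wa/(Wa+Wb), Wb/(Wa+Wb)
    "Unit2": {"T2": 0.5, "T3": 0.5},  # Wb/(Wb+Wc), Wc/(Wb+Wc)
}
first_values = distribute_to_keywords(keyword_sets, set_distributions)
```

Here word2 receives two first feature values for T2, one from Unit1 and one from Unit2, which is the situation step S4 then resolves.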
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains at least one piece of first application field feature value information for each keyword according to the application field distribution feature value information of each keyword set falls within the scope of the present invention.
Then, in step S4, the computer equipment performs statistical processing on the at least one piece of first application field feature value information of each keyword obtained in step S3, to obtain the second application field feature value information of each keyword.
Specifically, the ways of performing statistical processing on the at least one piece of first application field feature value information of each keyword to obtain its second application field feature value information include, but are not limited to:
1) In step S4, the computer equipment selects the maximum of the at least one piece of first application field feature value information of each keyword as the second application field feature value information.
In one example, continuing the example above, in keyword set Unit1 the first application field feature value information of word2 belonging to application field T2 is P(T2|word2) = Wb/(Wa+Wb), and in keyword set Unit2 it is P(T2|word2) = Wb/(Wb+Wc). The larger of these two values is selected as the second application field feature value information of word2 belonging to application field T2.
2) In step S4, the computer equipment merges the at least one piece of first application field feature value information of each keyword according to formula 1) below, to obtain the second application field feature value information of each keyword:
where Ti represents a piece of application field attribute information;
wordj represents a keyword;
N represents the total number of keyword sets;
M represents the total number of pieces of application field attribute information;
P(Ti|wordj) represents the second application field feature value information of keyword wordj belonging to application field Ti.
In one example, in keyword set Unit1, the first application field feature value information of the keywords word1, word2, word6 and word7 belonging to application field T1 is P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb), and that of belonging to application field T2 is P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb).
In keyword set Unit2, the first application field feature value information of the keywords word2, word3, word6 and word7 belonging to application field T2 is P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc), and that of belonging to application field T3 is P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc).
In step S4, the computer equipment merges the application field feature values of keyword word2 for application field T2 according to the formula in 2) above, and thereby obtains the second application field feature value information of word2 belonging to application field T2.
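Approach 1) above, selecting the maximum, can be sketched in Python as follows. Approach 2) depends on formula 1) and is not sketched here. The function name, the data layout and the numeric values are hypothetical.

```python
from typing import Dict, List


def second_values_by_max(
    first: Dict[str, Dict[str, List[float]]],
) -> Dict[str, Dict[str, float]]:
    """For each keyword and field, keep the largest first feature value."""
    return {
        word: {field: max(values) for field, values in fields.items()}
        for word, fields in first.items()
    }


# Hypothetical first feature values for word2, assuming Wa = 0.6, Wb = 0.4, Wc = 0.4:
# T2 gets Wb/(Wa+Wb) = 0.4 from Unit1 and Wb/(Wb+Wc) = 0.5 from Unit2.
first_values = {"word2": {"T1": [0.6], "T2": [0.4, 0.5], "T3": [0.5]}}
second_values = second_values_by_max(first_values)
```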
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs statistical processing on the at least one piece of first application field feature value information of each keyword, to obtain the second application field feature value information of each keyword, falls within the scope of the present invention.
Then, when a predetermined stop condition is not met, in step S5 the computer equipment takes the second application field feature value information of each keyword obtained in step S4 as the initial application field feature value information of that keyword, so that the computer equipment repeats the corresponding operations in step S2, step S3 and step S4 until the predetermined stop condition is met.
The predetermined stop condition includes, but is not limited to:
1) When the number of repetitions exceeds a predetermined execution count threshold, in step S5 the computer equipment stops repeating the operations;
2) When the application field attribute information corresponding to the maximum of a keyword's second application field feature value information is the same as the application field attribute information corresponding to that keyword's initial application field feature value information, in step S5 the computer equipment stops repeating the operations.
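The control flow of step S5 with both stop conditions can be sketched as a loop. This is a sketch under stated assumptions: `update` is a hypothetical callable standing in for one pass of steps S2 to S4, stop condition 2) is assumed to require that the top-ranked field is unchanged for every keyword, and the default threshold of 50 is only illustrative.

```python
from typing import Callable, Dict, Optional, Tuple

Values = Dict[str, Dict[str, float]]  # keyword -> application field -> feature value


def top_field(values: Dict[str, float]) -> Optional[str]:
    """Return the application field with the largest feature value, if any."""
    return max(values, key=values.get) if values else None


def iterate_until_stop(
    initial: Values,
    update: Callable[[Values], Values],
    max_rounds: int = 50,
) -> Tuple[Values, int]:
    """Repeat the update until the top fields stop changing or the round limit is exceeded."""
    current = initial
    rounds = 0
    while True:
        second = update(current)  # one pass of steps S2, S3 and S4
        rounds += 1
        # Stop condition 2): the top field of every keyword is unchanged.
        unchanged = all(
            top_field(second[word]) == top_field(current.get(word, {}))
            for word in second
        )
        # Stop condition 1): the number of repetitions exceeds the threshold.
        if unchanged or rounds > max_rounds:
            return second, rounds
        current = second  # second values become the next round's initial values


def swap(values: Values) -> Values:
    """Toy update that swaps the T1 and T2 values, so the top field never settles."""
    return {w: {"T1": v["T2"], "T2": v["T1"]} for w, v in values.items()}


start = {"word2": {"T1": 0.7, "T2": 0.3}}
stable_result, stable_rounds = iterate_until_stop(start, lambda v: v)
capped_result, capped_rounds = iterate_until_stop(start, swap, max_rounds=3)
```

The identity update stops after one round because nothing changes, while the toy `swap` update only stops once the round count exceeds the threshold.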
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that takes the second application field feature value information of each keyword as the initial application field feature value information of that keyword and repeats the operations in step S2, step S3 and step S4 until the predetermined stop condition is met falls within the scope of the present invention.
When the predetermined stop condition is met, in step S6 the computer equipment obtains the application field attribute information of each keyword according to the second application field feature value information of that keyword.
Specifically, in step S6, the computer equipment takes the application field corresponding to the maximum of each keyword's second application field feature value information as the application field attribute information of that keyword, to obtain the application field attribute information of each keyword.
In one example, when the number of repetitions, 51, exceeds the predetermined execution count threshold of 50, the computer equipment stops repeating the operations in step S5. In step S6, the computer equipment then selects the application field corresponding to the maximum of each keyword's second application field feature value information. For example, the second application field feature value information of word2 belonging to application field T1 is 0.3 and that of word2 belonging to application field T2 is 0.65, so the computer equipment takes the application field attribute information corresponding to the maximum as the application field attribute information of word2, that is, application field T2.
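The selection in step S6 amounts to taking the field with the largest second feature value for each keyword. A minimal Python sketch using the values from the example follows; the function name and data layout are hypothetical.

```python
from typing import Dict


def assign_application_fields(
    second: Dict[str, Dict[str, float]],
) -> Dict[str, str]:
    """Pick, for each keyword, the application field with the largest second feature value."""
    return {
        word: max(values, key=values.get)
        for word, values in second.items()
        if values  # keywords without any feature value are left unassigned
    }


# Values from the example: word2 scores 0.3 for T1 and 0.65 for T2.
assigned = assign_application_fields({"word2": {"T1": 0.3, "T2": 0.65}})
# assigned == {"word2": "T2"}
```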
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that, when the predetermined stop condition is met, obtains the application field attribute information of each keyword according to the second application field feature value information of that keyword falls within the scope of the present invention.
By obtaining the application field distribution feature value information of each keyword set, the first application field feature value information of each keyword with respect to the one or more keyword sets it belongs to is obtained, and the second application field feature value information of each keyword can then be obtained from the perspective of multiple keyword sets. By repeating the above steps and computing iteratively, more accurate application field attribute information is obtained for each keyword. This enables accurate acquisition of the application fields of massive numbers of keywords and makes the results more objective. It also meets the demand of existing search technology for a finer-grained division of keywords by application field. Further, accurate classification of the application field to which a keyword belongs can guide users who publish information in establishing reasonable correspondences between keywords and the published information, thereby effectively optimizing their information publishing strategies.
As one preferred embodiment of this solution (with reference to Fig. 3), the method further includes step S10 (not shown), and step S3 includes step S301 (not shown).
In step S10, the computer equipment obtains the correlation information between each keyword and the multiple pieces of application field attribute information.
Specifically, in step S10, the ways in which the computer equipment obtains the correlation information include, but are not limited to:
1) Querying a preset correlation database. The preset correlation database contains the correlation information between multiple keywords and their corresponding multiple pieces of application field attribute information. The correlation database includes, but is not limited to, a relational database, a Key-Value storage system, or a file system.
In one example, in step S10, the computer equipment performs a matching query for each keyword in the preset correlation database, to obtain the correlation information between that keyword and the multiple pieces of application field attribute information.
2) Performing word segmentation on each keyword to obtain at least one keyword segment of that keyword; querying, according to the at least one keyword segment of each keyword, the application field segment libraries corresponding to the preset multiple pieces of application field attribute information, to obtain the occurrence frequency information of the at least one keyword segment in each of those libraries; and obtaining, according to the occurrence frequency information, the correlation information between each keyword and the multiple pieces of application field attribute information. This approach is described in detail in the embodiment shown in Fig. 4.
In step S301, the computer equipment performs a weighted calculation that combines the application field distribution feature value information of each keyword set obtained in step S2 with the obtained correlation information, to obtain at least one piece of first application field feature value information for each keyword.
Specifically, in step S301, the computer equipment distributes the application field distribution feature value information of each keyword set to each keyword included in that set and weights it by the obtained correlation information, to obtain at least one piece of first application field feature value information for each keyword.
In one example, keyword set Unit1 includes the keywords word1, word2, word6 and word7, and the application field distribution feature value information of Unit1 is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb). The correlation degrees of word1, word2, word6 and word7 with application field T1 are 0.7, 0.1, 0 and 0.2 respectively, and their correlation degrees with application field T2 are 0.1, 0.8, 0 and 0 respectively.
In step S301, the computer equipment distributes the application field distribution feature value information of Unit1 to word1, word2, word6 and word7 in the set and weights it by the correlation information. The first application field feature value information for application field T1 is then:
P(T1|word1) = 0.7 × Wa/(Wa+Wb),
P(T1|word2) = 0.1 × Wa/(Wa+Wb),
P(T1|word6) = 0,
P(T1|word7) = 0.2 × Wa/(Wa+Wb);
and the first application field feature value information for application field T2 is:
P(T2|word1) = 0.1 × Wb/(Wa+Wb),
P(T2|word2) = 0.8 × Wb/(Wa+Wb),
P(T2|word6) = 0,
P(T2|word7) = 0.
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the degree of correlation information of each keyword with respect to each of multiple pieces of application field attribute information, and performs a weighted calculation according to the application field distribution characteristic value information of each keyword set in combination with that degree of correlation information, so as to obtain at least one piece of first application field characteristic value information of each keyword, falls within the scope of the present invention.
Different keywords correlate semantically with different application field attribute information to different degrees. By incorporating the semantic degree of correlation information between each keyword and the multiple pieces of application field attribute information into the process of obtaining the first application field characteristic value information, the weight with which a keyword belongs to an application field is raised when the keyword is semantically highly correlated with that application field attribute information. This yields more accurate first application field characteristic value information and provides a solid basis for finally obtaining application field attribute information of higher accuracy.
As one preferred embodiment of the present embodiment, Fig. 4 shows a flowchart of a method, according to a preferred embodiment of the present invention, for obtaining the degree of correlation information between each keyword and multiple pieces of application field attribute information. The method according to this preferred embodiment includes steps S1, S2, S3, S4, S5, S6, S7, S8, and S9.
Steps S1, S2, S3, S4, S5, and S6 have been described in detail with reference to the embodiment shown in Fig. 3 and are not repeated here.
In step S7, the computer equipment performs word segmentation (word cutting) on each keyword to obtain at least one keyword segment of each keyword.
Here, the word segmentation methods include, but are not limited to, forward maximum matching, reverse maximum matching, bidirectional maximum matching, language model methods, shortest path algorithms, and the like.
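As an illustration of the first of the listed methods, forward maximum matching can be sketched in Python as follows. The function name, the toy dictionary, and the sample keyword are hypothetical and serve only to show the greedy longest-match idea; the text does not prescribe a particular dictionary or implementation.

```python
def forward_maximum_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: always take the longest dictionary entry."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    max_len = max(1, max_len)
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # fall back to a single character when nothing matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                segments.append(piece)
                i += size
                break
    return segments


toy_dictionary = {"鲜花", "速递", "鲜花速递", "北京"}  # hypothetical entries
print(forward_maximum_match("北京鲜花速递", toy_dictionary))  # ['北京', '鲜花速递']
```

Reverse maximum matching would scan from the end of the keyword instead, and bidirectional matching would run both and pick between the two results by some criterion.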
Then, in step S8, the computer equipment queries, according to the obtained at least one keyword segment of each keyword, the preset application field segment libraries corresponding to the multiple pieces of application field attribute information, so as to obtain the frequency of occurrence information of the at least one keyword segment in each of those application field segment libraries; the application field segment library corresponding to each piece of application field attribute information contains the preset segments of that application field.
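The lookup in step S8 can be pictured with the following minimal Python sketch. It assumes, for illustration only, that each application field segment library is modeled as a plain list of preset segments; the function name and the sample libraries are hypothetical.

```python
from collections import Counter


def segment_frequencies(segments, field_libraries):
    """Return {field: {segment: occurrences of that segment in the field's library}}."""
    frequencies = {}
    for field, library in field_libraries.items():
        counts = Counter(library)  # a Counter returns 0 for segments it has not seen
        frequencies[field] = {seg: counts[seg] for seg in segments}
    return frequencies


# Hypothetical segment libraries for two application fields
libraries = {
    "T1": ["鲜花", "速递", "鲜花"],
    "T2": ["北京", "租车"],
}
freq = segment_frequencies(["鲜花", "北京"], libraries)
print(freq)
```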
Then, in step S9, the computer equipment obtains, according to the obtained frequency of occurrence information, the degree of correlation information of each keyword with respect to each of the multiple pieces of application field attribute information.
In step S9, the ways in which the computer equipment obtains the degree of correlation include, but are not limited to:
1) According to a predetermined acquisition rule, the computer equipment obtains the degree of correlation information of each keyword with respect to each of the multiple pieces of application field attribute information. For example, the predetermined acquisition rule may be that, for a given piece of application field attribute information, a keyword whose frequency of occurrence exceeds a predetermined first occurrence threshold has a degree of correlation of 0.8 with that application field attribute information; a keyword whose frequency of occurrence lies between a predetermined second threshold and a predetermined third threshold has a degree of correlation of 0.4 with that application field attribute information; and a keyword whose frequency of occurrence is below a predetermined fourth occurrence threshold has a degree of correlation of 0 with that application field attribute information.
2) According to the frequency of occurrence information, the computer equipment performs a calculation using the BM25 algorithm to obtain the degree of correlation information of each keyword with respect to each of the multiple pieces of application field attribute information.
Specifically, the mathematical expressions of the BM25 algorithm are the following formulas 2) and 3),
where Score(D, Q) represents the degree of correlation information between a certain keyword and a certain application field;
qi represents a keyword segment;
f(qi, D) represents the frequency of occurrence information of keyword segment qi in the application field segment library corresponding to a certain application field;
|D| represents the total number of segments in the application field segment library corresponding to a certain application field;
avgdl represents the total number of segments in the application field segment libraries corresponding to all application fields;
k1 and b are parameters for adjusting precision; preferably, k1 = 2 and b = 0.75;
N represents the total number of application field categories;
n(qi) represents the number of application field categories whose corresponding application field segment libraries contain keyword segment qi.
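The following Python sketch shows how such a score could be computed. It assumes that formulas 2) and 3) take the standard Okapi BM25 form that the symbol definitions above suggest, that is, Score(D, Q) = Σ IDF(qi) · f(qi, D) · (k1 + 1) / (f(qi, D) + k1 · (1 − b + b · |D| / avgdl)) with IDF(qi) = log((N − n(qi) + 0.5) / (n(qi) + 0.5)). The function name, the modeling of each segment library as a list, and the reading of avgdl as the average library size (as in standard BM25) are all assumptions of this sketch.

```python
import math


def bm25_score(segments, field, field_libraries, k1=2.0, b=0.75):
    """Okapi BM25 score of a keyword (given as its segments) against one application field."""
    library = field_libraries[field]
    n_fields = len(field_libraries)                               # N
    doc_len = len(library)                                        # |D|
    # Assumption: avgdl is the average library size, as in standard BM25.
    avgdl = sum(len(lib) for lib in field_libraries.values()) / n_fields
    if avgdl == 0:
        return 0.0
    score = 0.0
    for q in segments:
        f = library.count(q)                                      # f(qi, D)
        n_q = sum(1 for lib in field_libraries.values() if q in lib)  # n(qi)
        idf = math.log((n_fields - n_q + 0.5) / (n_q + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * doc_len / avgdl))
    return score


# Hypothetical segment libraries for three application fields
libraries = {
    "T1": ["flower", "delivery", "flower", "gift"],
    "T2": ["car", "engine", "rental"],
    "T3": ["phone", "laptop"],
}
keyword_segments = ["flower", "delivery"]
scores = {t: bm25_score(keyword_segments, t, libraries) for t in libraries}
print(scores)
```

Note that with this IDF form a segment that appears in more than half of the libraries contributes a negative term, so with very few application fields the raw score may need clipping or rescaling before it is used as a degree of correlation.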
It should be noted that steps S7, S8, and S9 may be included in step S10 so as to obtain the degree of correlation information, or may be independent of step S10, in which case the computer equipment in step S10 obtains the degree of correlation information that was obtained in step S9.
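To make mode 1) of step S9 concrete, the example acquisition rule can be sketched as a simple threshold lookup. The threshold values below are hypothetical (the text only calls them predetermined), and the handling of values that fall exactly on a threshold is an assumption of this sketch.

```python
def rule_based_relevance(frequency, first=100, second=10, third=100, fourth=10):
    """Map a frequency of occurrence to a degree of correlation using the example rule."""
    if frequency > first:
        return 0.8          # above the first occurrence threshold
    if second <= frequency <= third:
        return 0.4          # between the second and third thresholds (inclusive here)
    if frequency < fourth:
        return 0.0          # below the fourth occurrence threshold
    return None             # the example rule leaves any gap between thresholds undefined


for freq_value in (150, 50, 3):
    print(freq_value, rule_based_relevance(freq_value))
```

The degree of correlation produced this way is the quantity that step S10 would then use, whether steps S7 to S9 are run as part of step S10 or beforehand.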
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs word segmentation on each keyword to obtain at least one keyword segment of each keyword; queries, according to the at least one keyword segment of each keyword, the preset application field segment libraries corresponding to multiple pieces of application field attribute information, so as to obtain the frequency of occurrence information of the at least one keyword segment in each of those application field segment libraries; and obtains, according to the frequency of occurrence information, the degree of correlation information of each keyword with respect to each of the multiple pieces of application field attribute information, falls within the scope of the present invention.
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, by means of an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) may be stored in a computer-readable recording medium, for example, RAM memory, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with a processor to perform the steps or functions.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments are to be regarded in all respects as illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced within the present invention. Any reference sign in a claim should not be construed as limiting the claim concerned. In addition, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Claims (11)
1. A method for obtaining application field attribute information of a keyword, the method comprising the following steps:
a. obtaining initial application field characteristic value information of at least one keyword included in each keyword set of multiple keyword sets to be processed, wherein each keyword set includes multiple keywords;
b. performing classification statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword, to obtain application field distribution characteristic value information of each keyword set;
c. obtaining at least one piece of first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of at least one keyword set to which the keyword belongs;
d. performing statistical processing on the at least one piece of first application field characteristic value information of each keyword, to obtain second application field characteristic value information of each keyword;
taking the second application field characteristic value information of each keyword as the initial application field characteristic value information of that keyword, and repeating steps b, c, and d until a predetermined stop condition is met;
wherein the method further comprises:
w. when the predetermined stop condition is met, obtaining the application field attribute information of each keyword according to the second application field characteristic value information of each keyword;
wherein the predetermined stop condition includes any one of the following:
when the number of repetitions exceeds a predetermined execution count threshold, the computer equipment stops repeating the operations;
when the application field attribute information corresponding to the maximum value in the second application field characteristic value information of a keyword is identical to the application field attribute information corresponding to the initial application field characteristic value information of that keyword, the computer equipment stops repeating the operations.
2. The method according to claim 1, wherein step a comprises:
- querying a preset application field classification table for the multiple keywords included in each keyword set, to obtain the initial application field characteristic value information of at least one keyword included in each keyword set.
3. The method according to claim 1 or 2, wherein the manner of performing statistical processing on the at least one piece of first application field characteristic value information of each keyword to obtain the second application field characteristic value information of each keyword further comprises:
- merging the at least one piece of first application field characteristic value information of each keyword according to the following formula, to obtain the second application field characteristic value information of each keyword:
$$P(T_i \mid \mathrm{word}_j) = \frac{\sum_{\mathrm{Unit}1}^{\mathrm{Unit}N} P(T_i \mid \mathrm{word}_j)}{\sum_{i=1}^{M} P(T_i \mid \mathrm{word}_j)}$$
wherein Ti represents a certain piece of application field attribute information;
wordj represents a certain keyword;
N represents the total number of keyword sets;
M represents the total number of pieces of application field attribute information;
Unit1, ..., UnitN each represent a keyword set;
P(Ti|wordj) represents the second application field characteristic value information of keyword wordj belonging to application field Ti.
4. The method according to claim 1 or 2, wherein the method further comprises:
- obtaining degree of correlation information of each keyword with respect to each of multiple pieces of application field attribute information;
wherein step c comprises:
- performing a weighted calculation according to the application field distribution characteristic value information of each keyword set in combination with the degree of correlation information, to obtain at least one piece of first application field characteristic value information of each keyword.
5. The method according to claim 4, wherein the method of obtaining the degree of correlation information of each keyword with respect to the multiple pieces of application field attribute information further comprises the following steps:
- performing word segmentation on each keyword, to obtain at least one keyword segment of each keyword;
- querying, according to the at least one keyword segment of each keyword, preset application field segment libraries corresponding to the multiple pieces of application field attribute information, to obtain frequency of occurrence information of the at least one keyword segment in each of the application field segment libraries corresponding to the multiple pieces of application field attribute information;
- obtaining, according to the frequency of occurrence information, the degree of correlation information of each keyword with respect to each of the multiple pieces of application field attribute information.
6. An acquisition device for obtaining application field attribute information of a keyword, the device comprising:
an initial characteristic value acquisition device, for obtaining initial application field characteristic value information of at least one keyword included in each keyword set of multiple keyword sets to be processed, wherein each keyword set includes multiple keywords;
an application field distribution acquisition device, for performing classification statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword, to obtain application field distribution characteristic value information of each keyword set;
a first characteristic value acquisition device, for obtaining at least one piece of first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of at least one keyword set to which the keyword belongs;
a second characteristic value acquisition device, for performing statistical processing on the at least one piece of first application field characteristic value information of each keyword, to obtain second application field characteristic value information of each keyword;
a control device, for, when a predetermined stop condition is not met, taking the second application field characteristic value information of each keyword as the initial application field characteristic value information of that keyword, so as to control the application field distribution acquisition device, the first characteristic value acquisition device, and the second characteristic value acquisition device to repeat their corresponding operations until the predetermined stop condition is met;
wherein the device further comprises:
an application field attribute acquisition device, for, when the predetermined stop condition is met, obtaining the application field attribute information of each keyword according to the second application field characteristic value information of each keyword;
wherein the predetermined stop condition includes any one of the following:
when the number of repetitions exceeds a predetermined execution count threshold, the control device stops repeating the operations;
when the application field attribute information corresponding to the maximum value in the second application field characteristic value information of a keyword is identical to the application field attribute information corresponding to the initial application field characteristic value information of that keyword, the control device stops repeating the operations.
7. The acquisition device according to claim 6, wherein the initial characteristic value acquisition device is used for querying a preset application field classification table for the multiple keywords included in each keyword set, to obtain the initial application field characteristic value information of at least one keyword included in each keyword set.
8. The acquisition device according to claim 6 or 7, wherein the manner of performing statistical processing on the at least one piece of first application field characteristic value information of each keyword to obtain the second application field characteristic value information of each keyword further comprises:
- merging the at least one piece of first application field characteristic value information of each keyword according to the following formula, to obtain the second application field characteristic value information of each keyword:
$$P(T_i \mid \mathrm{word}_j) = \frac{\sum_{\mathrm{Unit}1}^{\mathrm{Unit}N} P(T_i \mid \mathrm{word}_j)}{\sum_{i=1}^{M} P(T_i \mid \mathrm{word}_j)}$$
wherein Ti represents a certain piece of application field attribute information;
wordj represents a certain keyword;
N represents the total number of keyword sets;
M represents the total number of pieces of application field attribute information;
Unit1, ..., UnitN each represent a keyword set;
P(Ti|wordj) represents the second application field characteristic value information of keyword wordj belonging to application field Ti.
9. The acquisition device according to claim 6 or 7, wherein the device further comprises:
a first correlation degree acquisition device, for obtaining degree of correlation information of each keyword with respect to each of multiple pieces of application field attribute information;
wherein the first characteristic value acquisition device comprises:
a weighting device, for performing a weighted calculation according to the application field distribution characteristic value information of each keyword set in combination with the degree of correlation information, to obtain at least one piece of first application field characteristic value information of each keyword.
10. The acquisition device according to claim 9, wherein the device further comprises:
a segmentation device, for performing word segmentation on each keyword, to obtain at least one keyword segment of each keyword;
a frequency of occurrence acquisition device, for querying, according to the at least one keyword segment of each keyword, preset application field segment libraries corresponding to the multiple pieces of application field attribute information, to obtain frequency of occurrence information of the at least one keyword segment in each of the application field segment libraries corresponding to the multiple pieces of application field attribute information;
a second correlation degree acquisition device, for obtaining, according to the frequency of occurrence information, the degree of correlation information of each keyword with respect to each of the multiple pieces of application field attribute information.
11. Computer equipment, comprising the acquisition device according to at least one of claims 6 to 10.
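For orientation only, the iteration recited in claims 1 and 3 can be pictured with the following Python sketch. It is one possible reading under stated assumptions, not the claimed implementation: characteristic value information is modeled as a dictionary from field to value, the classification statistics of step b are taken to be a normalized per-field sum, the merge of step d sums the values received from every keyword set and normalizes over the fields, the second stop condition is checked for all keywords at once, and the degree of correlation weighting of claim 4 is omitted. All function and variable names are hypothetical.

```python
def top_field(values):
    """Field with the largest value, or None when nothing is known yet."""
    return max(values, key=values.get) if values else None


def normalise(values):
    total = sum(values.values())
    return {f: v / total for f, v in values.items()} if total else {}


def propagate_fields(keyword_sets, initial, max_rounds=10):
    """keyword_sets: {set name: [keywords]}; initial: {keyword: {field: value}}."""
    current = {w: dict(v) for w, v in initial.items()}            # step a
    for _ in range(max_rounds):                                   # stop condition 1
        # step b: field distribution of each set from keywords that already have values
        distributions = {}
        for name, words in keyword_sets.items():
            dist = {}
            for w in words:
                for field, value in current.get(w, {}).items():
                    dist[field] = dist.get(field, 0.0) + value
            distributions[name] = normalise(dist)
        # step c: each keyword receives the distribution of every set it belongs to;
        # step d: the received values are summed per field and normalised over fields
        merged = {}
        for name, words in keyword_sets.items():
            for w in words:
                acc = merged.setdefault(w, {})
                for field, value in distributions[name].items():
                    acc[field] = acc.get(field, 0.0) + value
        second = {w: normalise(acc) for w, acc in merged.items()}
        # stop condition 2: the top field of every keyword is unchanged
        stable = all(top_field(second[w]) == top_field(current.get(w, {})) for w in second)
        current = second
        if stable:
            break
    return {w: top_field(v) for w, v in current.items()}          # step w


# Hypothetical keyword sets; word3 has no initial field and inherits one from its sets
sets = {
    "Unit1": ["word1", "word2", "word3"],
    "Unit2": ["word3", "word4"],
    "Unit3": ["word3", "word1"],
}
seed = {"word1": {"T1": 1.0}, "word2": {"T1": 1.0}, "word4": {"T2": 1.0}}
result = propagate_fields(sets, seed)
print(result)
```

In this toy run word3 starts without any field and ends up assigned to T1, because two of the three keyword sets it belongs to are dominated by T1 keywords.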
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210335806.7A CN103678356B (en) | 2012-09-11 | 2012-09-11 | A kind of method, apparatus and equipment of the application field attribute information for being used to obtain keyword |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103678356A CN103678356A (en) | 2014-03-26 |
CN103678356B true CN103678356B (en) | 2018-05-25 |
Family
ID=50315949
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210335806.7A Active CN103678356B (en) | 2012-09-11 | 2012-09-11 | A kind of method, apparatus and equipment of the application field attribute information for being used to obtain keyword |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103678356B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |