CN103678356B - Method, apparatus and device for obtaining application field attribute information of a keyword - Google Patents

Info

Publication number
CN103678356B
Authority
CN
China
Prior art keywords
keyword
application field
word
value information
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210335806.7A
Other languages
Chinese (zh)
Other versions
CN103678356A (en)
Inventor
高徽
王平
郎文静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210335806.7A
Publication of CN103678356A
Application granted
Publication of CN103678356B
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The object of the present invention is to provide a method, apparatus and device for obtaining application field attribute information of a keyword. The invention first obtains the initial application field characteristic value information of at least one keyword contained in each of multiple keyword sets to be processed; then, according to the initial application field characteristic value information of the at least one keyword, it performs classification statistics on each keyword set to obtain the application field distribution characteristic value information of each keyword set; then, according to the application field distribution characteristic value information of each keyword set, it obtains at least one piece of first application field characteristic value information for each keyword; finally, it performs statistical processing on the at least one piece of first application field characteristic value information of each keyword to obtain the second application field characteristic value information of each keyword.

Description

Method, apparatus and device for obtaining application field attribute information of a keyword
Technical field
The present invention relates to the field of computer technology, and in particular to a method, apparatus and device for obtaining application field attribute information of a keyword.
Background
In existing web information publishing systems, the application fields of the keywords set by information-publishing users are mostly assigned manually or by classifying the keywords with a statistical classification method, which yields an initial application field division of the keywords. The purchase relationships between a large number of information-publishing users and the keywords are then processed iteratively to obtain the final application field attribute information of the keywords. Because the prior art relies mainly on subjective human judgment to divide the keywords, the accuracy and objectivity of the division are low. In addition, because information-publishing users do not classify their own field attributes accurately enough, the application field attribute information derived from their keyword purchase relationships is also of low accuracy. Meanwhile, as search technology continues to develop, the requirements on the accuracy and granularity of keyword application field division keep rising, and existing division methods cannot meet the demand for high accuracy and fine granularity.
Therefore, how to provide a method, apparatus and device for obtaining application field attribute information of a keyword, so that this information can be obtained accurately and efficiently, has become one of the problems that urgently need to be solved.
Summary of the invention
The object of the present invention is to provide a method, apparatus and device for obtaining application field attribute information of a keyword.
According to one aspect of the present invention, a method for obtaining application field attribute information of a keyword is provided, the method comprising the following steps:
a. obtaining the initial application field characteristic value information of at least one keyword contained in each of multiple keyword sets to be processed, wherein each keyword set contains multiple keywords;
b. performing classification statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword, to obtain the application field distribution characteristic value information of each keyword set;
c. obtaining at least one piece of first application field characteristic value information for each keyword according to the application field distribution characteristic value information of each keyword set, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs;
d. performing statistical processing on the at least one piece of first application field characteristic value information of each keyword, to obtain the second application field characteristic value information of each keyword;
taking the second application field characteristic value information of each keyword as the initial application field characteristic value information of that keyword, and repeating steps b, c and d until a predetermined stop condition is met;
wherein the method further comprises:
w. when the predetermined stop condition is met, obtaining the application field attribute information of each keyword according to the second application field characteristic value information of that keyword.
According to another aspect of the present invention, an acquisition apparatus for obtaining application field attribute information of a keyword is also provided, the apparatus comprising:
an initial characteristic value acquisition device, for obtaining the initial application field characteristic value information of at least one keyword contained in each of multiple keyword sets to be processed, wherein each keyword set contains multiple keywords;
an application field distribution acquisition device, for performing classification statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword, to obtain the application field distribution characteristic value information of each keyword set;
a first characteristic value acquisition device, for obtaining at least one piece of first application field characteristic value information for each keyword according to the application field distribution characteristic value information of each keyword set, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs;
a second characteristic value acquisition device, for performing statistical processing on the at least one piece of first application field characteristic value information of each keyword, to obtain the second application field characteristic value information of each keyword;
a control device, for taking, when the predetermined stop condition is not met, the second application field characteristic value information of each keyword as the initial application field characteristic value information of that keyword, so as to control the application field distribution acquisition device, the first characteristic value acquisition device and the second characteristic value acquisition device to repeat their respective operations until the predetermined stop condition is met;
wherein the apparatus further comprises:
an application field attribute acquisition device, for obtaining, when the predetermined stop condition is met, the application field attribute information of each keyword according to the second application field characteristic value information of that keyword.
Compared with the prior art, the present invention has the following advantages. By obtaining the application field distribution characteristic value information of each keyword set, it obtains the first application field characteristic value information of each keyword for the one or more keyword sets to which that keyword belongs, and can therefore derive the second application field characteristic value information of each keyword from the perspective of multiple keyword sets. By repeating these steps iteratively, it obtains application field attribute information of higher accuracy for each keyword, so that the application fields of a massive number of keywords are identified accurately and the results are more objective. It also meets the demand of existing search technology for a finer-grained application field division of keywords. Furthermore, accurate classification of the application fields of keywords can guide information-publishing users in establishing a reasonable correspondence between keywords and published information, thereby effectively optimizing those users' information publishing strategies.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a schematic structural diagram of an acquisition apparatus for obtaining application field attribute information of a keyword according to one aspect of the present invention;
Fig. 2 is a schematic structural diagram of an acquisition apparatus for obtaining information on the degree of correlation between each keyword and multiple pieces of application field attribute information according to a preferred embodiment of the present invention;
Fig. 3 is a flowchart of a method for obtaining application field attribute information of a keyword according to another aspect of the present invention;
Fig. 4 is a flowchart of a method for obtaining information on the degree of correlation between each keyword and multiple pieces of application field attribute information according to a preferred embodiment of the present invention.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of an acquisition apparatus for obtaining application field attribute information of a keyword according to one aspect of the present invention. The acquisition apparatus of this embodiment is contained in a computer device. The acquisition apparatus comprises an initial characteristic value acquisition device 1, an application field distribution acquisition device 2, a first characteristic value acquisition device 3, a second characteristic value acquisition device 4, a control device 5 and an application field attribute acquisition device 6.
The computer device includes, but is not limited to, network devices and user equipment. The user equipment includes, but is not limited to, a PC and the like. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing (Cloud Computing) composed of a large number of computers or network servers, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The network in which the user equipment and the network device reside includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN and the like.
It should be noted that the user equipment, network devices and networks above are only examples; other existing or future user equipment, network devices or networks, where applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
First, the initial characteristic value acquisition device 1 obtains the initial application field characteristic value information of at least one keyword contained in each of multiple keyword sets to be processed, wherein each keyword set contains multiple keywords.
Here, an application field is the field to which a keyword applies, including but not limited to an industry.
The characteristic value information includes, but is not limited to, probability information.
Specifically, the initial characteristic value acquisition device 1 looks up the keywords contained in each keyword set in a preset application field classification table, to obtain the initial application field characteristic value information of at least one keyword contained in each keyword set.
The preset application field classification table contains multiple keywords and their corresponding initial application field characteristic value information, and can be obtained by means such as corpus training. Preferably, the corpus training comprises labelling the keyword corpus of each preset application field with application field attributes, performing word segmentation, part-of-speech tagging and similar processing on the keyword corpus, and then training on the keyword corpus with a classification algorithm, for example a maximum-entropy text classification algorithm, to obtain the initial application field characteristic value information corresponding to multiple keywords.
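The table lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the table contents, the probability values and the function name `initial_features` are all hypothetical, and keywords absent from the table are simply left without initial values.

```python
from typing import Dict, List

# Hypothetical preset classification table: keyword -> {application field: probability}.
# The values are illustrative only; in the text they come from corpus training.
CLASSIFICATION_TABLE: Dict[str, Dict[str, float]] = {
    "word1": {"T1": 0.6},
    "word2": {"T2": 0.2},
}


def initial_features(
    keyword_sets: Dict[str, List[str]],
    table: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """For each keyword set, look up the keywords that appear in the table."""
    return {
        set_id: {kw: dict(table[kw]) for kw in keywords if kw in table}
        for set_id, keywords in keyword_sets.items()
    }


sets = {"Unit1": ["word1", "word2", "word6", "word7"]}
found = initial_features(sets, CLASSIFICATION_TABLE)
```

Only word1 and word2 receive initial values here, which matches the "at least one keyword" wording: a set does not need every keyword to be present in the table.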
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the initial application field characteristic value information of at least one keyword contained in each of multiple keyword sets to be processed shall fall within the scope of the present invention.
Then, the application field distribution acquisition device 2 performs classification statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword obtained by the initial characteristic value acquisition device 1, to obtain the application field distribution characteristic value information of each keyword set.
Specifically, according to the initial application field characteristic value information of the at least one keyword in each keyword set, the application field distribution acquisition device 2 aggregates those characteristic values by application field, to obtain the application field distribution characteristic value information of each keyword set.
In one example, keyword set Unit1 contains the keywords word1, word2, word6 and word7, where the initial application field characteristic value information of word1 is that it belongs to application field T1 with probability Wa, and that of word2 is that it belongs to application field T2 with probability Wb. The application field distribution acquisition device 2 performs classification statistics on Unit1 and obtains its application field distribution characteristic value information: Unit1 belongs to application field T1 with probability P(T1|Unit1) = Wa/(Wa+Wb), and to application field T2 with probability P(T2|Unit1) = Wb/(Wa+Wb).
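A minimal sketch of this per-set aggregation follows. It assumes that Wa is the value of word1 for T1 and Wb the value of word2 for T2, which is what the stated result P(T2|Unit1) = Wb/(Wa+Wb) implies. The function name `set_distribution` and the numeric stand-ins for Wa and Wb are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict


def set_distribution(keyword_features: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the initial characteristic values per application field over the
    keywords of one set, then normalize so the values sum to 1."""
    totals: Dict[str, float] = defaultdict(float)
    for features in keyword_features.values():
        for field, value in features.items():
            totals[field] += value
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {field: value / norm for field, value in totals.items()}


# Illustrative numbers standing in for Wa and Wb.
Wa, Wb = 0.6, 0.2
unit1 = {"word1": {"T1": Wa}, "word2": {"T2": Wb}}
dist = set_distribution(unit1)
```

With these stand-in numbers, `dist["T1"]` equals Wa/(Wa+Wb) and `dist["T2"]` equals Wb/(Wa+Wb), matching the form of the example.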
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs classification statistics on each keyword set according to the initial application field characteristic value information of the at least one keyword, to obtain the application field distribution characteristic value information of each keyword set, shall fall within the scope of the present invention.
Then, the first characteristic value acquisition device 3 obtains at least one piece of first application field characteristic value information for each keyword according to the application field distribution characteristic value information of each keyword set obtained by the application field distribution acquisition device 2, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs.
Each keyword may belong to one or more keyword sets.
Specifically, the first characteristic value acquisition device 3 discretizes the application field distribution characteristic value information of each keyword set onto every keyword contained in that set, to obtain at least one piece of first application field characteristic value information for each keyword.
In one example, keyword set Unit1 contains the keywords word1, word2, word6 and word7, and its application field distribution characteristic value information is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb). The first characteristic value acquisition device 3 discretizes the application field distribution characteristic value information of Unit1 onto the keywords word1, word2, word6 and word7 in the set, and obtains:
the first application field characteristic value information of word1, word2, word6 and word7 for application field T1: P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb),
and the first application field characteristic value information of word1, word2, word6 and word7 for application field T2: P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb).
In addition, keyword set Unit2 contains the keywords word2, word3, word6 and word7, and its application field distribution characteristic value information is P(T2|Unit2) = Wb/(Wb+Wc) and P(T3|Unit2) = Wc/(Wb+Wc). The first characteristic value acquisition device 3 discretizes the application field distribution characteristic value information of Unit2 onto the keywords word2, word3, word6 and word7 in the set, and obtains:
the first application field characteristic value information of word2, word3, word6 and word7 for application field T2: P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc);
the first application field characteristic value information of word2, word3, word6 and word7 for application field T3: P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc).
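The discretization step can be sketched as copying each set's distribution to all of its member keywords, so that a keyword belonging to several sets collects several distributions. The function name `discretize` and the numeric distributions are illustrative assumptions, not values from the patent.

```python
from collections import defaultdict
from typing import Dict, List


def discretize(
    members: Dict[str, List[str]],
    distributions: Dict[str, Dict[str, float]],
) -> Dict[str, List[Dict[str, float]]]:
    """Copy each set's distribution to every keyword in that set."""
    first: Dict[str, List[Dict[str, float]]] = defaultdict(list)
    for set_id, keywords in members.items():
        for kw in keywords:
            first[kw].append(dict(distributions[set_id]))
    return dict(first)


members = {
    "Unit1": ["word1", "word2", "word6", "word7"],
    "Unit2": ["word2", "word3", "word6", "word7"],
}
# Illustrative distributions for the two sets.
distributions = {
    "Unit1": {"T1": 0.75, "T2": 0.25},
    "Unit2": {"T2": 0.4, "T3": 0.6},
}
first_values = discretize(members, distributions)
```

Here word1 receives one distribution (from Unit1 only), while word2, word6 and word7 each receive two, one per set they belong to.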
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains at least one piece of first application field characteristic value information for each keyword according to the application field distribution characteristic value information of each keyword set shall fall within the scope of the present invention.
Then, the second characteristic value acquisition device 4 performs statistical processing on the at least one piece of first application field characteristic value information of each keyword obtained by the first characteristic value acquisition device 3, to obtain the second application field characteristic value information of each keyword.
Specifically, the ways of performing statistical processing on the at least one piece of first application field characteristic value information of each keyword to obtain its second application field characteristic value information include, but are not limited to:
1) the second characteristic value acquisition device 4 selects, from the at least one piece of first application field characteristic value information of each keyword, the maximum first application field characteristic value as the second application field characteristic value information;
In one example, continuing the example above: in keyword set Unit1, the first application field characteristic value information of word2 for application field T2 is P(T2|word2) = Wb/(Wa+Wb); in keyword set Unit2, it is P(T2|word2) = Wb/(Wb+Wc). The larger of these two first application field characteristic values is selected as the second application field characteristic value information of word2 for application field T2.
2) the second characteristic value acquisition device 4 merges the at least one piece of first application field characteristic value information of each keyword according to the following formula, to obtain the second application field characteristic value information of each keyword:
[formula missing from the source text]
where Ti denotes a given piece of application field attribute information;
wordj denotes a given keyword;
N denotes the total number of keyword sets;
M denotes the total number of pieces of application field attribute information;
P(Ti|wordj) denotes the second application field characteristic value information of keyword wordj for application field Ti.
In one example, in keyword set Unit1, the first application field characteristic value information of the keywords word1, word2, word6 and word7 for application field T1 is P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb), and for application field T2 it is P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb);
in keyword set Unit2, the first application field characteristic value information of the keywords word2, word3, word6 and word7 for application field T2 is P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc), and for application field T3 it is P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc);
the second characteristic value acquisition device 4 merges the application field characteristic values of keyword word2 for application field T2 according to the above formula, and obtains
[result missing from the source text]
as the second application field characteristic value information of word2 for application field T2.
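The two merging options can be sketched as follows. `merge_by_max` follows method 1) as described. Because the merging formula of method 2) is not reproduced in this text, `merge_by_normalized_sum` is only an assumed stand-in (sum over the keyword's sets, then normalize over the application fields) and should not be read as the patent's formula. All names and numbers are illustrative.

```python
from typing import Dict, List

FirstValues = List[Dict[str, float]]  # one distribution per set the keyword belongs to


def merge_by_max(first_values: FirstValues) -> Dict[str, float]:
    """Method 1: per application field, keep the largest first characteristic value."""
    merged: Dict[str, float] = {}
    for dist in first_values:
        for field, value in dist.items():
            merged[field] = max(merged.get(field, 0.0), value)
    return merged


def merge_by_normalized_sum(first_values: FirstValues) -> Dict[str, float]:
    """ASSUMED rule, not the patent's formula: sum per field over the keyword's
    sets, then normalize over all fields so the values sum to 1."""
    totals: Dict[str, float] = {}
    for dist in first_values:
        for field, value in dist.items():
            totals[field] = totals.get(field, 0.0) + value
    norm = sum(totals.values())
    return {f: v / norm for f, v in totals.items()} if norm else {}


# word2 belongs to Unit1 and Unit2; the distributions are illustrative.
word2_first = [{"T1": 0.75, "T2": 0.25}, {"T2": 0.4, "T3": 0.6}]
by_max = merge_by_max(word2_first)
by_sum = merge_by_normalized_sum(word2_first)
```

The maximum rule keeps the strongest evidence from any single set, whereas a sum-based rule lets fields supported by several sets accumulate weight; which behaviour is preferable depends on how noisy the individual sets are.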
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that performs statistical processing on the at least one piece of first application field characteristic value information of each keyword, to obtain the second application field characteristic value information of each keyword, shall fall within the scope of the present invention.
Then, when the predetermined stop condition is not met, the control device 5 takes the second application field characteristic value information of each keyword obtained by the second characteristic value acquisition device 4 as the initial application field characteristic value information of that keyword, so as to control the application field distribution acquisition device 2, the first characteristic value acquisition device 3 and the second characteristic value acquisition device 4 to repeat their respective operations until the predetermined stop condition is met.
The predetermined stop condition includes, but is not limited to:
1) when the number of repetitions exceeds a predetermined execution count threshold, the control device 5 stops repeating the operations;
2) when the application field attribute information corresponding to the maximum value in the second application field characteristic value information of a keyword is the same as the application field attribute information corresponding to the initial application field characteristic value information of that keyword, the control device 5 stops repeating the operations.
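The two stop conditions and the feedback loop can be sketched as below. The names `should_stop`, `iterate` and `top_field` are hypothetical, the `step` argument stands for one pass of the distribution, first-value and second-value operations, and checking condition 2) over all keywords at once is an assumption, since the text states it for a single keyword.

```python
from typing import Callable, Dict

Features = Dict[str, Dict[str, float]]  # keyword -> {application field: value}


def top_field(values: Dict[str, float]) -> str:
    """Application field with the largest characteristic value."""
    return max(values, key=values.get)


def should_stop(iteration: int, max_iterations: int,
                initial: Features, second: Features) -> bool:
    # Condition 1: the repetition count exceeds the threshold.
    if iteration > max_iterations:
        return True
    # Condition 2 (assumed to be checked over all keywords): the top field
    # of every keyword is the same before and after this round.
    return all(
        bool(initial.get(kw)) and bool(second[kw])
        and top_field(initial[kw]) == top_field(second[kw])
        for kw in second
    )


def iterate(initial: Features, step: Callable[[Features], Features],
            max_iterations: int = 50) -> Features:
    """Feed each round's second values back in as the next round's initial values."""
    current, iteration = initial, 0
    while True:
        iteration += 1
        second = step(current)
        if should_stop(iteration, max_iterations, current, second):
            return second
        current = second


start = {"word2": {"T1": 0.3, "T2": 0.65}}
result = iterate(start, step=lambda features: features)  # trivial step for illustration
```

With the identity step used here the top fields never change, so the loop ends after the first round; a real `step` would chain the per-set aggregation, discretization and merging operations.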
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that takes the second application field characteristic value information of each keyword as the initial application field characteristic value information of that keyword and repeats the operations of the application field distribution acquisition device, the first characteristic value acquisition device and the second characteristic value acquisition device until a predetermined stop condition is met shall fall within the scope of the present invention.
When the predetermined stop condition is met, the application field attribute acquisition device 6 obtains the application field attribute information of each keyword according to the second application field characteristic value information of that keyword.
Specifically, the application field attribute acquisition device 6 takes the application field corresponding to the maximum value in the second application field characteristic value information of each keyword as the application field attribute information of that keyword.
In one example, when the number of repetitions performed by the control device 5 reaches 51 and thus exceeds the predetermined execution count threshold of 50, the control device 5 stops repeating the operations. The application field attribute acquisition device 6 then works from the maximum value in the second application field characteristic value information of each keyword. For example, if the second application field characteristic value information of word2 is 0.3 for application field T1 and 0.65 for application field T2, the application field attribute acquisition device 6 takes the application field corresponding to the maximum value as the application field attribute information of word2, which is therefore application field T2.
It should be noted that the above examples merely illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that, when the predetermined stop condition is met, obtains the application field attribute information of each keyword according to the second application field characteristic value information of that keyword shall fall within the scope of the present invention.
By obtaining the application field distribution characteristic value information of each keyword set, the first application field characteristic value information of each keyword is obtained for the one or more keyword sets to which that keyword belongs, and the second application field characteristic value information of each keyword can therefore be derived from the perspective of multiple keyword sets. By repeating these steps iteratively, application field attribute information of higher accuracy is obtained for each keyword, so that the application fields of a massive number of keywords are identified accurately and the results are more objective. This also meets the demand of existing search technology for a finer-grained application field division of keywords. Furthermore, accurate classification of the application fields of keywords can guide information-publishing users in establishing a reasonable correspondence between keywords and published information, thereby effectively optimizing those users' information publishing strategies.
In a preferred embodiment of this solution (with reference to Fig. 1), the acquisition apparatus further comprises a first degree-of-correlation acquisition device (not shown), and the first characteristic value acquisition device comprises a weighting device (not shown).
The first degree-of-correlation acquisition device obtains information on the degree of correlation between each keyword and each of multiple pieces of application field attribute information.
Specifically, the ways in which the first degree-of-correlation acquisition device obtains the degree of correlation information include, but are not limited to:
1) querying a preset degree-of-correlation database, wherein the preset degree-of-correlation database contains the degree of correlation information between multiple keywords and their corresponding multiple pieces of application field attribute information; the degree-of-correlation database includes, but is not limited to, a relational database, a Key-Value storage system or a file system.
In one example, the first degree-of-correlation acquisition device performs a matching query for each keyword in the preset degree-of-correlation database, to obtain the degree of correlation information between each keyword and the multiple pieces of application field attribute information.
2) performing word segmentation on each keyword to obtain at least one segment of each keyword; querying, according to the at least one segment of each keyword, the application field segmentation lexicons corresponding to the preset multiple pieces of application field attribute information, to obtain the frequency with which the at least one segment occurs in the lexicon corresponding to each piece of application field attribute information; and obtaining, from this frequency information, the degree of correlation information between each keyword and each of the multiple pieces of application field attribute information. This approach is described in detail in the embodiment shown in Fig. 2.
The weighting device performs a weighted calculation on the application field distribution characteristic value information of each keyword set obtained by the application field distribution acquisition device 2, combined with the degree of correlation information obtained by the first degree of correlation acquisition device, to obtain at least one first application field characteristic value information of each keyword.
Specifically, the weighting device distributes the application field distribution characteristic value information of each keyword set to each keyword that the set includes, and weights it with the obtained degree of correlation information, to obtain at least one first application field characteristic value information of each keyword.
In one example, keyword set Unit1 includes keywords word1, word2, word6 and word7; the application field distribution characteristic value information of keyword set Unit1 is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb); the degrees of correlation of word1, word2, word6 and word7 with application field T1 are 0.7, 0.1, 0 and 0.2 respectively, and their degrees of correlation with application field T2 are 0.1, 0.8, 0 and 0 respectively;
The weighting device distributes the application field distribution characteristic value information of Unit1 to word1, word2, word6 and word7 in the set and weights it with the degree of correlation information, obtaining:
the first application field characteristic value information of word1 belonging to application field T1 is P(T1|word1) = 0.7 × Wa/(Wa+Wb),
the first application field characteristic value information of word2 belonging to application field T1 is P(T1|word2) = 0.1 × Wa/(Wa+Wb),
the first application field characteristic value information of word6 belonging to application field T1 is P(T1|word6) = 0,
the first application field characteristic value information of word7 belonging to application field T1 is P(T1|word7) = 0.2 × Wa/(Wa+Wb);
and obtaining:
the first application field characteristic value information of word1 belonging to application field T2 is P(T2|word1) = 0.1 × Wb/(Wa+Wb),
the first application field characteristic value information of word2 belonging to application field T2 is P(T2|word2) = 0.8 × Wb/(Wa+Wb),
the first application field characteristic value information of word6 belonging to application field T2 is P(T2|word6) = 0,
the first application field characteristic value information of word7 belonging to application field T2 is P(T2|word7) = 0.
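The weighted calculation in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `weight_distribution`, the dictionary layout and the concrete values chosen for Wa and Wb are assumptions made for the illustration, not part of the described device.

```python
from typing import Dict


def weight_distribution(
    set_dist: Dict[str, float],
    relevance: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Weight a keyword set's field distribution by per-keyword relevance.

    For every keyword and field: P(T|word) = relevance(word, T) * P(T|Unit).
    """
    return {
        word: {field: rel.get(field, 0.0) * p for field, p in set_dist.items()}
        for word, rel in relevance.items()
    }


# Hypothetical values for Wa and Wb (the text leaves them symbolic).
Wa, Wb = 0.6, 0.2
unit1_dist = {"T1": Wa / (Wa + Wb), "T2": Wb / (Wa + Wb)}

# Degrees of correlation taken from the example in the text.
relevance = {
    "word1": {"T1": 0.7, "T2": 0.1},
    "word2": {"T1": 0.1, "T2": 0.8},
    "word6": {"T1": 0.0, "T2": 0.0},
    "word7": {"T1": 0.2, "T2": 0.0},
}

first_values = weight_distribution(unit1_dist, relevance)
```

With this layout, `first_values["word1"]["T1"]` equals 0.7 × Wa/(Wa+Wb), matching the first line of the example.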
It should be noted that the above examples are merely intended to illustrate the technical solution of the present invention rather than to limit it. Those skilled in the art should understand that any implementation that obtains degree of correlation information between each keyword and multiple application field attribute information, and performs a weighted calculation on the application field distribution characteristic value information of each keyword set combined with the degree of correlation information obtained by the first degree of correlation acquisition device, to obtain at least one first application field characteristic value information of each keyword, shall fall within the scope of the present invention.
Considering that different keywords have different semantic correlations with different application field attribute information, the semantic degree of correlation information between each keyword and the multiple application field attribute information is introduced into the process of obtaining the first application field characteristic value information. As a result, for a keyword that is semantically highly correlated with a certain application field attribute information, the weight with which it belongs to that application field is raised accordingly. This yields more accurate first application field characteristic value information and provides a strong guarantee for finally obtaining application field attribute information of higher accuracy.
As one preferred embodiment of the present embodiment, Fig. 2 shows a structure diagram of an acquisition device for obtaining the degree of correlation information between each keyword and multiple application field attribute information in accordance with a preferred embodiment of the present invention. The acquisition device includes an initial characteristic value acquisition device 1, an application field distribution acquisition device 2, a first characteristic value acquisition device 3, a second characteristic value acquisition device 4, a control device 5, an application field attribute acquisition device 6, a cutting device 7, a frequency of occurrence acquisition device 8 and a second degree of correlation acquisition device 9.
The initial characteristic value acquisition device 1, the application field distribution acquisition device 2, the first characteristic value acquisition device 3, the second characteristic value acquisition device 4, the control device 5 and the application field attribute acquisition device 6 have been described in detail with reference to the embodiment shown in Fig. 1 and are not described again here.
Cutting device 7 carries out cutting word processing to each keyword, to obtain at least one keyword of each keyword Cutting word segment.
Here, the cutting word methods include but are not limited to forward maximum matching, reverse maximum matching, bidirectional maximum matching, language model methods, shortest path methods, etc.
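As an illustration of one of the listed methods, forward maximum matching can be sketched as follows. The vocabulary, the `max_len` limit and the sample keyword are hypothetical; the text does not specify a dictionary or a maximum segment length.

```python
from typing import List, Set


def forward_max_match(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Segment `text` by greedily taking the longest vocabulary entry from the left.

    A single character is emitted when no vocabulary entry matches.
    """
    segments: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                segments.append(candidate)
                i += size
                break
    return segments


# Hypothetical dictionary and keyword, for illustration only.
vocab = {"北京", "鲜花", "速递", "鲜花速递"}
segments = forward_max_match("北京鲜花速递", vocab)
print(segments)  # ['北京', '鲜花速递']
```

Reverse maximum matching follows the same idea but scans from the right end of the keyword.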
Then, according to the obtained at least one keyword cutting word segment of each keyword, the frequency of occurrence acquisition device 8 queries the preset application field cutting word libraries corresponding to the multiple application field attribute information, to obtain frequency of occurrence information of the at least one keyword cutting word segment in the application field cutting word library corresponding to each application field attribute information; wherein the application field cutting word library corresponding to each application field attribute information includes the preset cutting word segments of that application field.
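The query described above amounts to counting how often each keyword cutting word segment appears in each application field's library. A minimal sketch, assuming each library is simply a list of preset segments (the storage format and the sample data are assumptions):

```python
from collections import Counter
from typing import Dict, List


def occurrence_frequencies(
    segments: List[str],
    field_libraries: Dict[str, List[str]],
) -> Dict[str, Dict[str, int]]:
    """Count occurrences of each segment in every application field library."""
    result: Dict[str, Dict[str, int]] = {}
    for field, library in field_libraries.items():
        counts = Counter(library)  # missing segments count as 0
        result[field] = {seg: counts[seg] for seg in segments}
    return result


# Hypothetical libraries for two application fields.
libraries = {
    "T1": ["鲜花", "速递", "鲜花", "玫瑰"],
    "T2": ["速递", "物流"],
}
freqs = occurrence_frequencies(["鲜花", "速递"], libraries)
```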
Then, the second degree of correlation acquisition device 9 is according to the frequency of occurrences information obtained, to obtain each keyword difference With the degree of correlation information of multiple application field attribute informations.
Wherein, the mode of the second degree of correlation acquisition device 9 acquisition degree of correlation includes but not limited to:
1) According to a predetermined acquisition rule, the second degree of correlation acquisition device 9 obtains the degree of correlation information between each keyword and the multiple application field attribute information. For example, the predetermined acquisition rule is that, for a given application field attribute information, a keyword whose frequency of occurrence exceeds a predetermined first occurrence threshold has a degree of correlation of 0.8 with that application field attribute information; a keyword whose frequency of occurrence lies between a predetermined second threshold and a predetermined third threshold has a degree of correlation of 0.4; and a keyword whose frequency of occurrence is below a predetermined fourth occurrence threshold has a degree of correlation of 0.
2) according to frequency of occurrences information, calculated by BM25 algorithms, the second degree of correlation acquisition device 9 is each to obtain The keyword degree of correlation information with multiple application field attribute informations respectively.
Specifically, the mathematic(al) representation of BM25 algorithms is following formula 2) and 3)
Wherein, Score (D, Q) represents the degree of correlation information of a certain keyword word and a certain application field;
Qi represents keyword cutting word segment;
F (qi, D) represents appearance of the keyword cutting word segment qi in the corresponding application field cutting word storehouse of a certain application field Frequency information;
| D | represent the cutting word segment sum in the corresponding application field cutting word storehouse of a certain application field;
Avgdl represents the cutting word segment sum in the corresponding application field cutting word storehouse of all application fields;
K1 and b represents to be used for the parameter for adjusting precision, it is preferable that k1=2, b=0.75;
N:Application field classification total quantity
n(qi):In the corresponding application field cutting word storehouse of all application fields comprising keyword cutting word segment qi should With field categorical measure.
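For reference, the standard Okapi BM25 form uses exactly the symbols defined above; whether formulas 2) and 3) of this document are identical to it is an assumption:

$$\mathrm{Score}(D,Q)=\sum_{i}\mathrm{IDF}(q_i)\cdot\frac{f(q_i,D)\,(k_1+1)}{f(q_i,D)+k_1\left(1-b+b\,\frac{|D|}{\mathrm{avgdl}}\right)},\qquad \mathrm{IDF}(q_i)=\log\frac{N-n(q_i)+0.5}{n(q_i)+0.5}$$

A minimal Python sketch of this standard form follows. It treats avgdl as the average library size, as in standard BM25, which is also an assumption; the function name and the sample libraries are hypothetical, and the libraries are assumed to be non-empty.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_relevance(
    segments: List[str],
    field_libraries: Dict[str, List[str]],
    k1: float = 2.0,
    b: float = 0.75,
) -> Dict[str, float]:
    """Score one keyword (given as its segments) against every application field."""
    n_fields = len(field_libraries)  # N
    # Assumption: avgdl is the average number of segments per library.
    avgdl = sum(len(lib) for lib in field_libraries.values()) / n_fields
    counts = {field: Counter(lib) for field, lib in field_libraries.items()}

    scores: Dict[str, float] = {}
    for field, lib in field_libraries.items():
        score = 0.0
        for q in segments:
            f = counts[field][q]  # f(qi, D)
            n_q = sum(1 for c in counts.values() if c[q] > 0)  # n(qi)
            idf = math.log((n_fields - n_q + 0.5) / (n_q + 0.5))
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(lib) / avgdl))
        scores[field] = score
    return scores


# Hypothetical libraries for three application fields.
libraries = {
    "T1": ["鲜花", "鲜花", "玫瑰"],
    "T2": ["物流", "速递"],
    "T3": ["手机", "电脑"],
}
scores = bm25_relevance(["鲜花"], libraries)
```

Note that with this IDF form a segment occurring in more than half of the application fields receives a negative weight, so implementations often clamp or smooth the IDF term.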
It should be noted that the cutting device, the frequency of occurrence acquisition device and the second degree of correlation acquisition device may be included in the first degree of correlation acquisition device to obtain the degree of correlation information, or may be independent of the first degree of correlation acquisition device, in which case the first degree of correlation acquisition device obtains the degree of correlation information from the second degree of correlation acquisition device.
It should be noted that the above examples are merely intended to illustrate the technical solution of the present invention rather than to limit it. Those skilled in the art should understand that any implementation that performs cutting word processing on each keyword to obtain at least one keyword cutting word segment of each keyword; queries, according to the at least one keyword cutting word segment of each keyword, the preset application field cutting word libraries corresponding to multiple application field attribute information, to obtain frequency of occurrence information of the at least one keyword cutting word segment in the application field cutting word library corresponding to each application field attribute information; and obtains, according to the frequency of occurrence information, the degree of correlation information between each keyword and the multiple application field attribute information, shall fall within the scope of the present invention.
Fig. 3 shows a flow chart of a method for obtaining the application field attribute information of keywords according to a further aspect of the present invention. The method of the present invention is mainly implemented by computer equipment; the method according to this preferred embodiment includes step S1, step S2, step S3, step S4, step S5 and step S6.
The computer equipment includes but is not limited to network equipment and user equipment. The user equipment includes but is not limited to a PC, etc.; the network equipment includes but is not limited to a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing (Cloud Computing) composed of a large number of computers or network servers, wherein cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The networks in which the user equipment and the network equipment are located include but are not limited to the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, etc.
It should be noted that the user equipment and the network equipment are only examples; other existing or future user equipment, network equipment or networks, if applicable to the present invention, shall also be included within the scope of the present invention and are incorporated herein by reference.
First, in step S1, the computer equipment obtains the initial application field characteristic value information of at least one keyword included in each keyword set of multiple keyword sets to be processed, wherein each keyword set includes multiple keywords.
Wherein, application field refers to the field that the keyword is applied to, and includes but not limited to, industry etc..
Wherein, the characteristic value information includes but not limited to probabilistic information.
Specifically, in step S1, the computer equipment looks up the multiple keywords included in each keyword set in a preset application field classification table, to obtain the initial application field characteristic value information of at least one keyword included in each keyword set.
The preset application field classification table includes multiple keywords and their corresponding initial application field characteristic value information, and can be obtained by means such as corpus training. Preferably, the corpus training includes labeling preset keyword corpora of each application field with application field attributes, performing processing such as word segmentation and part-of-speech tagging on the keyword corpora, and then training on the keyword corpora with a classification algorithm, for example a text classification algorithm based on maximum entropy, to obtain the initial application field characteristic value information corresponding to the multiple keywords.
It should be noted that the above examples are merely intended to illustrate the technical solution of the present invention rather than to limit it. Those skilled in the art should understand that any implementation that obtains the initial application field characteristic value information of at least one keyword included in each keyword set of multiple keyword sets to be processed shall fall within the scope of the present invention.
Then, in step S2, the computer equipment performs classification statistics processing on each keyword set according to the initial application field characteristic value information of the at least one keyword obtained in step S1, to obtain the application field distribution characteristic value information of each keyword set.
Specifically, in step S2, according to the obtained initial application field characteristic value information of at least one keyword in each keyword set, the computer equipment performs application field characteristic value statistics, by application field, on the initial application field characteristic value information of the at least one keyword, to obtain the application field distribution characteristic value information of each keyword set.
In one example, keyword set Unit1 includes keywords word1, word2, word6 and word7, wherein the initial application field characteristic value information of word1 is that the probability of belonging to application field T1 is Wa, and that of word2 is that the probability of belonging to application field T2 is Wb. In step S2, the computer equipment performs classification statistics processing on Unit1 and obtains the application field distribution characteristic value information of Unit1: the probability that Unit1 belongs to application field T1 is P(T1|Unit1) = Wa/(Wa+Wb), and the probability that Unit1 belongs to application field T2 is P(T2|Unit1) = Wb/(Wa+Wb).
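The classification statistics in this example are consistent with summing the initial values per application field and normalizing by the total. The sketch below assumes exactly that reading; the function name, the data layout and the concrete values for Wa and Wb are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def set_distribution(
    initial_values: Dict[str, Dict[str, float]],
    keyword_set: List[str],
) -> Dict[str, float]:
    """Aggregate keyword-level initial values into a per-set field distribution."""
    totals: Dict[str, float] = defaultdict(float)
    for word in keyword_set:
        # Keywords without an initial value (e.g. word6, word7) contribute nothing.
        for field, value in initial_values.get(word, {}).items():
            totals[field] += value
    norm = sum(totals.values())
    return {field: v / norm for field, v in totals.items()} if norm else {}


Wa, Wb = 0.6, 0.2  # hypothetical values; the text keeps them symbolic
initial = {"word1": {"T1": Wa}, "word2": {"T2": Wb}}
unit1_dist = set_distribution(initial, ["word1", "word2", "word6", "word7"])
```

Here `unit1_dist["T1"]` corresponds to Wa/(Wa+Wb) and `unit1_dist["T2"]` to Wb/(Wa+Wb).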
It should be noted that the above examples are merely intended to illustrate the technical solution of the present invention rather than to limit it. Those skilled in the art should understand that any implementation that performs classification statistics processing on each keyword set according to the initial application field characteristic value information of the at least one keyword, to obtain the application field distribution characteristic value information of each keyword set, shall fall within the scope of the present invention.
Then, in step S3, the computer equipment obtains at least one first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set obtained in step S2, wherein the at least one first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs.
Each keyword can belong to one or more keyword sets.
Specifically, in step S3, the computer equipment distributes the obtained application field distribution characteristic value information of each keyword set to each keyword that the set includes, to obtain at least one first application field characteristic value information of each keyword.
In one example, keyword set Unit1 includes keywords word1, word2, word6 and word7, and the application field distribution characteristic value information of keyword set Unit1 is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb). In step S3, the computer equipment distributes the application field distribution characteristic value information of Unit1 to keywords word1, word2, word6 and word7 in the set, obtaining:
the first application field characteristic value information of keywords word1, word2, word6 and word7 belonging to application field T1 is P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb),
and the first application field characteristic value information of keywords word1, word2, word6 and word7 belonging to application field T2 is P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb);
In addition, keyword set Unit2 includes keywords word2, word3, word6 and word7, and the application field distribution characteristic value information of keyword set Unit2 is P(T2|Unit2) = Wb/(Wb+Wc) and P(T3|Unit2) = Wc/(Wb+Wc). In step S3, the computer equipment distributes the application field distribution characteristic value information of Unit2 to keywords word2, word3, word6 and word7 in the set, obtaining:
the first application field characteristic value information of keywords word2, word3, word6 and word7 belonging to application field T2 is P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc);
the first application field characteristic value information of keywords word2, word3, word6 and word7 belonging to application field T3 is P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc).
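The distribution step in this example can be sketched as copying each set's distribution to every keyword it contains, so that a keyword belonging to several sets collects one distribution per set. The names and the numeric values below are illustrative assumptions.

```python
from typing import Dict, List


def distribute_to_keywords(
    set_dists: Dict[str, Dict[str, float]],
    keyword_sets: Dict[str, List[str]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Return first_values[word][set_name] = field distribution of that set."""
    first_values: Dict[str, Dict[str, Dict[str, float]]] = {}
    for name, words in keyword_sets.items():
        for word in words:
            first_values.setdefault(word, {})[name] = dict(set_dists[name])
    return first_values


# Hypothetical numeric distributions standing in for Wa/(Wa+Wb) and so on.
set_dists = {
    "Unit1": {"T1": 0.75, "T2": 0.25},
    "Unit2": {"T2": 0.4, "T3": 0.6},
}
keyword_sets = {
    "Unit1": ["word1", "word2", "word6", "word7"],
    "Unit2": ["word2", "word3", "word6", "word7"],
}
first_values = distribute_to_keywords(set_dists, keyword_sets)
```

In this sketch word2 ends up with two values for application field T2, one from Unit1 and one from Unit2, which is the situation step S4 resolves.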
It should be noted that the above examples are merely intended to illustrate the technical solution of the present invention rather than to limit it. Those skilled in the art should understand that any implementation that obtains at least one first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set shall fall within the scope of the present invention.
Then, in step S4, the computer equipment performs statistical processing on the at least one first application field characteristic value information of each keyword obtained in step S3, to obtain the second application field characteristic value information of each keyword.
Specifically, the ways of performing statistical processing on the at least one first application field characteristic value information of each keyword to obtain the second application field characteristic value information of each keyword include but are not limited to:
1) in step s 4, computer equipment is from least one first application field characteristic value information of each keyword The maximum of the first application field characteristic value information is selected as the second application field characteristic value information;
In one example, continuing the example above, in keyword set Unit1 the first application field characteristic value information of word2 belonging to application field T2 is P(T2|word2) = Wb/(Wa+Wb), and in keyword set Unit2 the first application field characteristic value information of word2 belonging to application field T2 is P(T2|word2) = Wb/(Wb+Wc); the larger of these two first application field characteristic values is then selected as the second application field characteristic value information of word2 belonging to application field T2.
2) In step S4, the computer equipment merges the at least one first application field characteristic value information of each keyword according to formula 1), to obtain the second application field characteristic value information of each keyword,
wherein Ti represents a certain application field attribute information;
wordj represents a certain keyword;
N represents the total number of keyword sets;
M represents the total number of application field attribute information;
P(Ti|wordj) represents the second application field characteristic value information of keyword wordj belonging to application field Ti.
In one example, in keyword set Unit1, the first application field characteristic value information of keywords word1, word2, word6 and word7 belonging to application field T1 is P(T1|word1) = P(T1|word2) = P(T1|word6) = P(T1|word7) = Wa/(Wa+Wb), and that belonging to application field T2 is P(T2|word1) = P(T2|word2) = P(T2|word6) = P(T2|word7) = Wb/(Wa+Wb);
In keyword set Unit2, the first application field characteristic value information of keywords word2, word3, word6 and word7 belonging to application field T2 is P(T2|word2) = P(T2|word3) = P(T2|word6) = P(T2|word7) = Wb/(Wb+Wc), and that belonging to application field T3 is P(T3|word2) = P(T3|word3) = P(T3|word6) = P(T3|word7) = Wc/(Wb+Wc);
In step S4, the computer equipment merges the application field characteristic values of keyword word2 belonging to application field T2 according to formula 1) above,
and thereby obtains the second application field characteristic value information of word2 belonging to application field T2.
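Way 1) above, selecting the maximum first application field characteristic value per application field, can be sketched as follows. The merge of way 2) is not sketched because it depends on formula 1). The function name, data layout and numeric values are hypothetical.

```python
from typing import Dict


def second_values_by_max(
    first_values: Dict[str, Dict[str, Dict[str, float]]],
) -> Dict[str, Dict[str, float]]:
    """For each keyword and field, keep the largest value seen across keyword sets."""
    second: Dict[str, Dict[str, float]] = {}
    for word, per_set in first_values.items():
        merged: Dict[str, float] = {}
        for dist in per_set.values():
            for field, value in dist.items():
                merged[field] = max(merged.get(field, 0.0), value)
        second[word] = merged
    return second


# word2 belongs to both Unit1 and Unit2 (hypothetical numeric values).
first_values = {
    "word2": {
        "Unit1": {"T1": 0.75, "T2": 0.25},
        "Unit2": {"T2": 0.4, "T3": 0.6},
    },
}
second_values = second_values_by_max(first_values)
```

For word2 and application field T2, the larger of the two candidates (0.4 from Unit2 rather than 0.25 from Unit1) is retained.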
It should be noted that the above examples are merely intended to illustrate the technical solution of the present invention rather than to limit it. Those skilled in the art should understand that any implementation that performs statistical processing on the at least one first application field characteristic value information of each keyword, to obtain the second application field characteristic value information of each keyword, shall fall within the scope of the present invention.
Then, when the predetermined stop condition is not met, in step S5 the computer equipment takes the second application field characteristic value information of each keyword obtained in step S4 as the initial application field characteristic value information of that keyword, so that the computer equipment repeats the corresponding operations in step S2, step S3 and step S4 until the predetermined stop condition is met.
Wherein, the predetermined stoppage condition includes but not limited to:
1) when the number of repetitions exceeds a predetermined execution count threshold, in step S5 the computer equipment stops repeating the operations;
2) when the application field attribute information corresponding to the maximum value in the second application field characteristic value information of a keyword is identical to the application field attribute information corresponding to the initial application field characteristic value information of that keyword, in step S5 the computer equipment stops repeating the operations.
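The interplay of step S5 with the two stop conditions can be sketched as a loop around a single "round" function that stands for steps S2 to S4. The sketch below is an assumption-laden illustration: `step` is a hypothetical callable, and stop condition 2) is applied to all keywords at once, which the text does not state explicitly.

```python
from typing import Callable, Dict, Optional

Values = Dict[str, Dict[str, float]]


def top_field(dist: Dict[str, float]) -> Optional[str]:
    """Application field with the largest characteristic value, if any."""
    return max(dist, key=dist.get) if dist else None


def iterate(initial: Values, step: Callable[[Values], Values], max_rounds: int = 50) -> Values:
    """Repeat steps S2-S4 (wrapped in `step`) until a stop condition holds."""
    current = initial
    for _ in range(max_rounds):  # stop condition 1): bounded number of rounds
        updated = step(current)
        # Stop condition 2): the top field of every keyword did not change.
        stable = all(
            top_field(updated[w]) == top_field(current.get(w, {})) for w in updated
        )
        current = updated  # step S5: second values become the new initial values
        if stable:
            break
    return current


# Toy round function that leaves the values unchanged, for illustration only.
calls = []


def toy_step(values: Values) -> Values:
    calls.append(1)
    return {w: dict(d) for w, d in values.items()}


result = iterate({"word2": {"T1": 0.3, "T2": 0.65}}, toy_step)
```

With the toy round function the top field of word2 is already stable, so the loop ends after one round.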
It should be noted that the above examples are merely intended to illustrate the technical solution of the present invention rather than to limit it. Those skilled in the art should understand that any implementation that takes the second application field characteristic value information of each keyword as the initial application field characteristic value information of that keyword and repeats the operations in step S2, step S3 and step S4 until the predetermined stop condition is met shall fall within the scope of the present invention.
When meeting the predetermined stoppage condition, in step s 6, computer equipment is according to the of each keyword Two application field characteristic value informations obtain the application field attribute information of each keyword.
Specifically, in step S6, the computer equipment takes the application field attribute information corresponding to the maximum value in the second application field characteristic value information of each keyword as the application field attribute information of that keyword, to obtain the application field attribute information of each keyword.
In one example, when the number of repetitions of the computer equipment, 51, exceeds the predetermined execution count threshold of 50, the computer equipment stops repeating the operations in step S5. In step S6, the computer equipment uses the maximum value in the second application field characteristic value information of each keyword: for example, if the second application field characteristic value information of word2 belonging to application field T1 is 0.3 and that of word2 belonging to application field T2 is 0.65, the computer equipment takes the application field attribute information corresponding to the maximum second application field characteristic value as the application field attribute information of word2, and thus obtains application field T2 as the application field attribute information of word2.
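The selection in step S6 is an arg-max over each keyword's second application field characteristic values. A minimal sketch using the numbers from the example (the function name is hypothetical):

```python
from typing import Dict


def application_field(second_values: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Pick, for each keyword, the application field with the largest value."""
    return {
        word: max(dist, key=dist.get)
        for word, dist in second_values.items()
        if dist  # keywords without any value are skipped
    }


fields = application_field({"word2": {"T1": 0.3, "T2": 0.65}})
print(fields)  # {'word2': 'T2'}
```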
It should be noted that the above examples are merely intended to illustrate the technical solution of the present invention rather than to limit it. Those skilled in the art should understand that any implementation that, when the predetermined stop condition is met, obtains the application field attribute information of each keyword according to the second application field characteristic value information of each keyword shall fall within the scope of the present invention.
By obtaining the application field distribution characteristic value information of each keyword set, the first application field characteristic value information of each keyword in the one or more keyword sets to which it belongs can be obtained, and the second application field characteristic value information of each keyword can then be obtained from the perspective of multiple keyword sets. By repeating the above steps, the iterative calculation yields application field attribute information of higher accuracy for each keyword, achieves accurate acquisition of the application fields of a massive number of keywords, and makes the result more objective. At the same time, this meets the demand of existing search technology for a finer application field division of keywords. Further, accurate classification of the application field to which a keyword belongs can guide users who publish information to establish reasonable correspondences between keywords and the published information, thereby effectively optimizing the users' information publishing strategy.
As one preferred embodiment of this solution (with reference to Fig. 3), the method further includes step S10 (not shown), and step S3 includes step S301 (not shown).
In step S10, the computer equipment obtains degree of correlation information between each keyword and each of multiple application field attribute information items.
Specifically, in step S10, the ways in which the computer equipment obtains the degree of correlation information include but are not limited to:
1) Querying a preset relevance database; wherein the preset relevance database includes multiple keywords and their degree of correlation information with the corresponding multiple application field attribute information; the relevance database includes but is not limited to a relational database, a Key-Value storage system, a file system, etc.
In one example, in step S10, the computer equipment performs a matching query for each keyword in the preset relevance database, to obtain the degree of correlation information between each keyword and the multiple application field attribute information.
2) Performing cutting word processing on each keyword, to obtain at least one keyword cutting word segment of each keyword; querying, according to the at least one keyword cutting word segment of each keyword, the preset application field cutting word libraries corresponding to the multiple application field attribute information, to obtain frequency of occurrence information of the at least one keyword cutting word segment in the application field cutting word library corresponding to each application field attribute information; and obtaining, according to the frequency of occurrence information, the degree of correlation information between each keyword and the multiple application field attribute information. This way is described in detail in the embodiment shown in Fig. 4.
In step S301, the computer equipment performs a weighted calculation on the application field distribution characteristic value information of each keyword set obtained in step S2, combined with the obtained degree of correlation information, to obtain at least one first application field characteristic value information of each keyword.
Specifically, in step S301, the computer equipment distributes the obtained application field distribution characteristic value information of each keyword set to each keyword that the set includes, and weights it with the obtained degree of correlation information, to obtain at least one first application field characteristic value information of each keyword.
In one example, keyword set Unit1 includes keywords word1, word2, word6 and word7; the application field distribution characteristic value information of keyword set Unit1 is P(T1|Unit1) = Wa/(Wa+Wb) and P(T2|Unit1) = Wb/(Wa+Wb); the degrees of correlation of word1, word2, word6 and word7 with application field T1 are 0.7, 0.1, 0 and 0.2 respectively, and their degrees of correlation with application field T2 are 0.1, 0.8, 0 and 0 respectively;
In step S301, the computer equipment distributes the application field distribution characteristic value information of Unit1 to word1, word2, word6 and word7 in the set and weights it with the degree of correlation information, obtaining:
the first application field characteristic value information of word1 belonging to application field T1 is P(T1|word1) = 0.7 × Wa/(Wa+Wb),
the first application field characteristic value information of word2 belonging to application field T1 is P(T1|word2) = 0.1 × Wa/(Wa+Wb),
the first application field characteristic value information of word6 belonging to application field T1 is P(T1|word6) = 0,
the first application field characteristic value information of word7 belonging to application field T1 is P(T1|word7) = 0.2 × Wa/(Wa+Wb);
and obtaining:
the first application field characteristic value information of word1 belonging to application field T2 is P(T2|word1) = 0.1 × Wb/(Wa+Wb),
the first application field characteristic value information of word2 belonging to application field T2 is P(T2|word2) = 0.8 × Wb/(Wa+Wb),
the first application field characteristic value information of word6 belonging to application field T2 is P(T2|word6) = 0,
the first application field characteristic value information of word7 belonging to application field T2 is P(T2|word7) = 0.
It should be noted that the above examples are merely intended to illustrate the technical solution of the present invention rather than to limit it. Those skilled in the art should understand that any implementation that obtains degree of correlation information between each keyword and multiple application field attribute information, and performs a weighted calculation on the application field distribution characteristic value information of each keyword set combined with the degree of correlation information, to obtain at least one first application field characteristic value information of each keyword, shall fall within the scope of the present invention.
Considering that different keywords have different semantic correlations with different application field attribute information, the semantic degree of correlation information between each keyword and the multiple application field attribute information is introduced into the process of obtaining the first application field characteristic value information. As a result, for a keyword that is semantically highly correlated with a certain application field attribute information, the weight with which it belongs to that application field is raised accordingly. This yields more accurate first application field characteristic value information and provides a strong guarantee for finally obtaining application field attribute information of higher accuracy.
As one preferred embodiment of the present embodiment, Fig. 4 shows a flow chart of a method for obtaining the correlation degree information of each keyword with multiple pieces of application field attribute information according to a preferred embodiment of the present invention. The method according to this preferred embodiment includes step S1, step S2, step S3, step S4, step S5, step S6, step S7, step S8 and step S9.
Steps S1, S2, S3, S4, S5 and S6 have been described in detail with reference to the embodiment shown in Fig. 3 and are not described again here.
In step S7, the computer equipment performs word segmentation on each keyword to obtain at least one keyword segment of each keyword.
Here, the word segmentation methods include, but are not limited to, forward maximum matching, reverse maximum matching, bidirectional maximum matching, language model methods, shortest path methods, and the like.
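As an illustration of one of the listed segmentation methods, the following is a minimal sketch of forward maximum matching. The vocabulary, the `max_len` limit and the function name `forward_max_match` are hypothetical and chosen only for the illustration; the text does not specify a dictionary or an implementation.

```python
def forward_max_match(text, vocabulary, max_len=8):
    """Segment text greedily from left to right.

    At each position the longest vocabulary entry (up to max_len characters)
    is taken; if nothing matches, a single character is emitted.
    """
    segments = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                segments.append(piece)
                i += size
                break
    return segments


# Hypothetical vocabulary and keyword.
vocabulary = {"flower", "delivery"}
print(forward_max_match("flowerdelivery", vocabulary))
```

Reverse maximum matching follows the same idea but scans from the end of the keyword, and bidirectional matching runs both and picks one result by a tie-breaking rule.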
Then, in step S8, the computer equipment uses the obtained at least one keyword segment of each keyword to query the preset application field segmentation libraries corresponding to the multiple pieces of application field attribute information, to obtain occurrence frequency information of the at least one keyword segment in each of those application field segmentation libraries; wherein each application field segmentation library includes the preset segments of the corresponding application field.
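The lookup in step S8 can be sketched as follows, assuming each application field segmentation library is held as a mapping from segment to count. The library contents and the function name `occurrence_frequencies` are hypothetical.

```python
from collections import Counter


def occurrence_frequencies(segments, field_libraries):
    """Return {field: {segment: count}} for the given keyword segments.

    field_libraries: {field: mapping from preset segment to its count}.
    Segments missing from a library get a count of 0.
    """
    return {
        field: {seg: library.get(seg, 0) for seg in segments}
        for field, library in field_libraries.items()
    }


# Hypothetical segmentation libraries for two application fields.
field_libraries = {
    "T1": Counter(["flower", "flower", "bouquet", "delivery"]),
    "T2": Counter(["phone", "delivery", "delivery"]),
}
frequencies = occurrence_frequencies(["flower", "delivery"], field_libraries)
print(frequencies)
```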
Then, in step S9, the computer equipment obtains, according to the obtained occurrence frequency information, the correlation degree information of each keyword with the multiple pieces of application field attribute information.
In step S9, the ways in which the computer equipment obtains the degree of correlation include, but are not limited to:
1) In step S9, the computer equipment obtains the correlation degree information of each keyword with the multiple pieces of application field attribute information according to a predetermined acquisition rule. For example, the predetermined acquisition rule is that, for a given piece of application field attribute information, a keyword whose occurrence frequency exceeds a predetermined first threshold has a degree of correlation of 0.8 with that application field attribute information, a keyword whose occurrence frequency lies between a predetermined second threshold and a predetermined third threshold has a degree of correlation of 0.4, and a keyword whose occurrence frequency is below a predetermined fourth threshold has a degree of correlation of 0.
2) In step S9, the computer equipment computes, from the occurrence frequency information and by means of the BM25 algorithm, the correlation degree information of each keyword with the multiple pieces of application field attribute information.
Specifically, the BM25 algorithm is expressed by the following formulas 2) and 3):
Wherein, Score(D, Q) denotes the correlation degree information between a keyword word and an application field;
qi denotes a keyword segment;
f(qi, D) denotes the occurrence frequency information of the keyword segment qi in the application field segmentation library corresponding to the application field;
|D| denotes the total number of segments in the application field segmentation library corresponding to the application field;
avgdl denotes the total number of segments in the application field segmentation libraries corresponding to all application fields;
k1 and b are parameters for adjusting precision; preferably, k1 = 2 and b = 0.75;
N denotes the total number of application field categories;
n(qi) denotes the number of application field categories whose application field segmentation libraries contain the keyword segment qi.
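The two ways of obtaining the degree of correlation can be sketched as below. This is a hedged illustration, not the patent's implementation: the threshold values are hypothetical, and the BM25 function assumes the widely used Okapi form, Score(D, Q) = Σ IDF(qi) · f(qi, D) · (k1 + 1) / (f(qi, D) + k1 · (1 − b + b · |D| / avgdl)) with IDF(qi) = log((N − n(qi) + 0.5) / (n(qi) + 0.5)), which matches the symbols defined above but is not confirmed by this text. The value of avgdl is passed in as an input; the usage sets it to the average library size, which is the conventional BM25 reading and an assumption here.

```python
import math


def rule_based_relevance(frequency, first=50, second=10, third=50, fourth=10):
    """Way 1): map an occurrence frequency to a degree of correlation.

    All four threshold values are hypothetical.
    """
    if frequency > first:
        return 0.8
    if second <= frequency <= third:
        return 0.4
    if frequency < fourth:
        return 0.0
    return None  # not covered by the example rule


def bm25_relevance(segments, field, field_libraries, avgdl, k1=2.0, b=0.75):
    """Way 2): Okapi-style BM25 score of a keyword's segments against one field.

    field_libraries: {field: {segment: count}}; assumed data layout.
    """
    library = field_libraries[field]
    size = sum(library.values())          # |D|
    n_fields = len(field_libraries)       # N
    score = 0.0
    for q in segments:
        f = library.get(q, 0)             # f(qi, D)
        n_q = sum(1 for lib in field_libraries.values() if lib.get(q, 0) > 0)
        idf = math.log((n_fields - n_q + 0.5) / (n_q + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * size / avgdl))
    return score


# Hypothetical segmentation libraries for three application fields.
field_libraries = {
    "T1": {"flower": 8, "delivery": 2},
    "T2": {"phone": 5, "delivery": 5},
    "T3": {"loan": 10},
}
avgdl = sum(sum(lib.values()) for lib in field_libraries.values()) / len(field_libraries)
score_t1 = bm25_relevance(["flower"], "T1", field_libraries, avgdl)
score_t2 = bm25_relevance(["flower"], "T2", field_libraries, avgdl)
print(score_t1, score_t2)
```

Note that with this classical IDF term a segment that occurs in more than half of the application fields contributes a negative value, so an implementation may prefer to clamp or smooth the IDF.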
It should be noted that steps S7, S8 and S9 may be included in step S10 to obtain the correlation degree information, or may be independent of step S10, in which case the computer equipment in step S10 obtains the correlation degree information that was obtained in step S9.
It should be noted that the above example merely illustrates the technical solution of the present invention and does not limit it. Those skilled in the art should understand that any implementation that performs word segmentation on each keyword to obtain at least one keyword segment of each keyword; queries, according to the at least one keyword segment of each keyword, the preset application field segmentation libraries corresponding to multiple pieces of application field attribute information, to obtain occurrence frequency information of the at least one keyword segment in each of those libraries; and obtains, according to the occurrence frequency information, the correlation degree information of each keyword with the multiple pieces of application field attribute information, falls within the scope of the present invention.
It should be noted that the present invention may be implemented in software and/or a combination of software and hardware; for example, it may be implemented using an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to realize the steps or functions described above. Likewise, the software program of the present invention (including related data structures) may be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present invention may be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential attributes. Therefore, the embodiments are to be considered in all respects as illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; all changes that fall within the meaning and scope of equivalents of the claims are therefore intended to be embraced by the present invention. No reference numeral in a claim should be construed as limiting the claim concerned. In addition, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a device claim may also be realized by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (11)

1. A method for obtaining application field attribute information of a keyword, the method comprising the following steps:
a. obtaining original application field characteristic value information of at least one keyword included in each keyword set of multiple keyword sets to be processed, wherein each keyword set includes multiple keywords;
b. performing classification statistics on each keyword set according to the original application field characteristic value information of the at least one keyword, to obtain application field distribution characteristic value information of each keyword set;
c. obtaining at least one piece of first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs;
d. performing statistical processing on the at least one piece of first application field characteristic value information of each keyword, to obtain second application field characteristic value information of each keyword;
taking the second application field characteristic value information of each keyword as the original application field characteristic value information of the keyword and repeating steps b, c and d until a predetermined stop condition is met;
Wherein, the method further comprises:
w. when the predetermined stop condition is met, obtaining the application field attribute information of each keyword according to the second application field characteristic value information of each keyword;
Wherein, the predetermined stop condition includes any one of the following:
when the number of repetitions exceeds a predetermined execution count threshold, the computer equipment stops repeating the operations;
when the application field attribute information corresponding to the maximum value in the second application field characteristic value information of a keyword is the same as the application field attribute information corresponding to the original application field characteristic value information of the keyword, the computer equipment stops repeating the operations.
2. The method according to claim 1, wherein step a comprises:
- querying a preset application field classification table for the multiple keywords included in each keyword set, to obtain the original application field characteristic value information of the at least one keyword included in each keyword set.
3. The method according to claim 1 or 2, wherein performing statistical processing on the at least one piece of first application field characteristic value information of each keyword to obtain the second application field characteristic value information of each keyword further comprises:
- merging the at least one piece of first application field characteristic value information of each keyword according to the following formula, to obtain the second application field characteristic value information of each keyword:
$$P(T_i \mid word_j) = \frac{\sum_{unit1}^{unitN} P(T_i \mid word_j)}{\sum_{i=1}^{M} P(T_i \mid word_j)}$$
Wherein, T_i denotes a piece of application field attribute information;
word_j denotes a keyword;
N denotes the total number of keyword sets;
M denotes the total number of pieces of application field attribute information;
Unit1, ..., UnitN respectively denote the keyword sets;
P(T_i | word_j) denotes the second application field characteristic value information of keyword word_j belonging to application field T_i.
4. The method according to claim 1 or 2, wherein the method further comprises:
- obtaining correlation degree information of each keyword with multiple pieces of application field attribute information;
Wherein, step c comprises:
- performing a weighted calculation according to the application field distribution characteristic value information of each keyword set in combination with the correlation degree information, to obtain the at least one piece of first application field characteristic value information of each keyword.
5. The method according to claim 4, wherein obtaining the correlation degree information of each keyword with the multiple pieces of application field attribute information further comprises the following steps:
- performing word segmentation on each keyword, to obtain at least one keyword segment of each keyword;
- querying, according to the at least one keyword segment of each keyword, the preset application field segmentation libraries corresponding to the multiple pieces of application field attribute information, to obtain occurrence frequency information of the at least one keyword segment in each of the application field segmentation libraries corresponding to the multiple pieces of application field attribute information;
- obtaining, according to the occurrence frequency information, the correlation degree information of each keyword with the multiple pieces of application field attribute information.
6. An acquisition device for obtaining application field attribute information of a keyword, the device comprising:
an initial characteristic value acquisition device, for obtaining original application field characteristic value information of at least one keyword included in each keyword set of multiple keyword sets to be processed, wherein each keyword set includes multiple keywords;
an application field distribution acquisition device, for performing classification statistics on each keyword set according to the original application field characteristic value information of the at least one keyword, to obtain application field distribution characteristic value information of each keyword set;
a first characteristic value acquisition device, for obtaining at least one piece of first application field characteristic value information of each keyword according to the application field distribution characteristic value information of each keyword set, wherein the at least one piece of first application field characteristic value information corresponds to the application field distribution characteristic value information of the at least one keyword set to which the keyword belongs;
a second characteristic value acquisition device, for performing statistical processing on the at least one piece of first application field characteristic value information of each keyword, to obtain second application field characteristic value information of each keyword;
a control device, for taking, when a predetermined stop condition is not met, the second application field characteristic value information of each keyword as the original application field characteristic value information of the keyword, so as to control the application field distribution acquisition device, the first characteristic value acquisition device and the second characteristic value acquisition device to repeat the corresponding operations until the predetermined stop condition is met;
Wherein, the device further comprises:
an application field attribute acquisition device, for obtaining, when the predetermined stop condition is met, the application field attribute information of each keyword according to the second application field characteristic value information of each keyword;
Wherein, the predetermined stop condition includes any one of the following:
when the number of repetitions exceeds a predetermined execution count threshold, the control device stops repeating the operations;
when the application field attribute information corresponding to the maximum value in the second application field characteristic value information of a keyword is the same as the application field attribute information corresponding to the original application field characteristic value information of the keyword, the control device stops repeating the operations.
7. The acquisition device according to claim 6, wherein the initial characteristic value acquisition device is used for querying a preset application field classification table for the multiple keywords included in each keyword set, to obtain the original application field characteristic value information of the at least one keyword included in each keyword set.
8. The acquisition device according to claim 6 or 7, wherein performing statistical processing on the at least one piece of first application field characteristic value information of each keyword to obtain the second application field characteristic value information of each keyword further comprises:
- merging the at least one piece of first application field characteristic value information of each keyword according to the following formula, to obtain the second application field characteristic value information of each keyword:
$$P(T_i \mid word_j) = \frac{\sum_{unit1}^{unitN} P(T_i \mid word_j)}{\sum_{i=1}^{M} P(T_i \mid word_j)}$$
Wherein, T_i denotes a piece of application field attribute information;
word_j denotes a keyword;
N denotes the total number of keyword sets;
M denotes the total number of pieces of application field attribute information;
Unit1, ..., UnitN respectively denote the keyword sets;
P(T_i | word_j) denotes the second application field characteristic value information of keyword word_j belonging to application field T_i.
9. The acquisition device according to claim 6 or 7, wherein the device further comprises:
a first correlation degree acquisition device, for obtaining correlation degree information of each keyword with multiple pieces of application field attribute information;
Wherein, the first characteristic value acquisition device comprises:
a weighting device, for performing a weighted calculation according to the application field distribution characteristic value information of each keyword set in combination with the correlation degree information, to obtain the at least one piece of first application field characteristic value information of each keyword.
10. The acquisition device according to claim 9, wherein the device further comprises:
a segmentation device, for performing word segmentation on each keyword, to obtain at least one keyword segment of each keyword;
an occurrence frequency acquisition device, for querying, according to the at least one keyword segment of each keyword, the preset application field segmentation libraries corresponding to the multiple pieces of application field attribute information, to obtain occurrence frequency information of the at least one keyword segment in each of the application field segmentation libraries corresponding to the multiple pieces of application field attribute information;
a second correlation degree acquisition device, for obtaining, according to the occurrence frequency information, the correlation degree information of each keyword with the multiple pieces of application field attribute information.
11. Computer equipment, comprising the acquisition device according to at least one of claims 6 to 10.
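The iteration recited in claims 1 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the classification statistics of step b are assumed to be a normalized sum of the member keywords' values, the optional correlation weighting of claim 4 is omitted, the stop test is applied to all keywords at once, and all function names and data are hypothetical.

```python
from collections import defaultdict


def normalize(scores):
    """Scale values so they sum to 1; return {} when there is nothing to scale."""
    total = sum(scores.values())
    return {t: v / total for t, v in scores.items()} if total > 0 else {}


def top_field(scores):
    """Field with the largest value, or None when no value is known."""
    return max(scores, key=scores.get) if scores else None


def propagate(keyword_sets, initial, max_rounds=10):
    """keyword_sets: {set name: [keywords]}; initial: {keyword: {field: value}}."""
    all_words = {w for words in keyword_sets.values() for w in words}
    current = {w: dict(initial.get(w, {})) for w in all_words}
    for _ in range(max_rounds):           # stop condition: round limit
        # Step b (assumed statistic): normalized sum of member values per set.
        set_dist = {}
        for name, words in keyword_sets.items():
            acc = defaultdict(float)
            for w in words:
                for field, value in current[w].items():
                    acc[field] += value
            set_dist[name] = normalize(acc)
        # Steps c and d: one first value per containing set, summed over sets
        # and normalized over fields, following the formula of claim 3.
        merged = {w: defaultdict(float) for w in all_words}
        for name, words in keyword_sets.items():
            for w in words:
                for field, p in set_dist[name].items():
                    merged[w][field] += p
        updated = {w: normalize(merged[w]) for w in all_words}
        # Stop condition: the top field of every keyword is unchanged.
        stable = all(top_field(updated[w]) == top_field(current[w])
                     for w in all_words)
        current = updated
        if stable:
            break
    # Step w: the field with the largest second value is the attribute.
    return {w: top_field(scores) for w, scores in current.items()}


# Hypothetical data: w3 and w5 are missing from the classification table.
keyword_sets = {"Unit1": ["w1", "w2", "w3"], "Unit2": ["w4", "w5"]}
initial = {"w1": {"T1": 1.0}, "w2": {"T1": 1.0}, "w4": {"T2": 1.0}}
result = propagate(keyword_sets, initial)
print(result)
```

In this toy run the unlabeled keywords w3 and w5 inherit the dominant field of the keyword set they belong to, which is the effect the iteration is meant to achieve.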
CN201210335806.7A 2012-09-11 2012-09-11 A kind of method, apparatus and equipment of the application field attribute information for being used to obtain keyword Active CN103678356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210335806.7A CN103678356B (en) 2012-09-11 2012-09-11 A kind of method, apparatus and equipment of the application field attribute information for being used to obtain keyword

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210335806.7A CN103678356B (en) 2012-09-11 2012-09-11 A kind of method, apparatus and equipment of the application field attribute information for being used to obtain keyword

Publications (2)

Publication Number Publication Date
CN103678356A CN103678356A (en) 2014-03-26
CN103678356B true CN103678356B (en) 2018-05-25

Family

ID=50315949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210335806.7A Active CN103678356B (en) 2012-09-11 2012-09-11 A kind of method, apparatus and equipment of the application field attribute information for being used to obtain keyword

Country Status (1)

Country Link
CN (1) CN103678356B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021866A (en) * 2007-03-13 2007-08-22 白云 Method for criminating electronci file and relative degree with certain field and application thereof
CN101770580A (en) * 2009-01-04 2010-07-07 中国科学院计算技术研究所 Training method and classification method of cross-field text sentiment classifier
CN102722503A (en) * 2011-03-31 2012-10-10 北京百度网讯科技有限公司 Method and device for sequencing search results
CN102682090A (en) * 2012-04-26 2012-09-19 焦点科技股份有限公司 System and method for matching and processing sensitive words on basis of polymerized word tree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"搜索引擎营销领域某关键词分析管理工具的测试";张琦;《中国优秀硕士学位论文全文数据库·信息科技辑》;20110315;I138-475 *

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant