CN102270212A - User interest feature extraction method based on hidden semi-Markov model - Google Patents

User interest feature extraction method based on hidden semi-Markov model

Info

Publication number
CN102270212A
CN102270212A (application CN2011100881918A / CN201110088191A)
Authority
CN
China
Prior art keywords
state
hsmm
model
user
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100881918A
Other languages
Chinese (zh)
Inventor
琚春华 (Ju Chunhua)
王蓓 (Wang Bei)
章敏 (Zhang Min)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN2011100881918A priority Critical patent/CN102270212A/en
Publication of CN102270212A publication Critical patent/CN102270212A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a user interest feature extraction method based on a hidden semi-Markov model (HSMM). The aim of the invention is to provide a user interest feature extraction method that better matches real-world conditions and has stronger modeling and analysis ability. The technical scheme comprises the steps of data collection, data preprocessing, model training, and user interest feature extraction. By introducing the hidden semi-Markov model, the method extracts user interest features more accurately under complex conditions.

Description

A user interest feature extraction method based on a hidden semi-Markov model
Technical field
The present invention relates to the fields of machine learning and information extraction, and in particular to a user interest feature extraction method based on a hidden semi-Markov model. By introducing the hidden semi-Markov model, it is applicable to extracting user interest features more accurately under complex conditions.
Background technology
Since the 1960s, the theory of text information extraction has developed continuously and has become an important research branch of natural language processing. At present there are mainly three classes of information extraction models: dictionary-based models; rule-based models, such as ontology-based ones; and statistics-based models, such as the hidden Markov model (HMM).
Compared with traditional text, web pages have many distinctive characteristics: they are numerous, frequently updated, and highly varied; a large part of a page consists of structured text blocks, and pages also contain hyperlinks. Transforming unstructured natural language text into a structured information base requires the cooperation of multiple natural language processing techniques. Information extraction can be summarized as automatic word segmentation, tagging, and template filling of text, and requires a certain amount of semantic analysis. Following traditional natural language processing, a Chinese information extraction module roughly comprises word segmentation, name analysis, syntactic analysis, semantic analysis, scenario matching, consistency analysis, inference and judgment, template matching and filling, and so on. As a "bridge" between unstructured data and databases, information extraction is crucial for mining multilingual and heterogeneous web text data.
Because the HMM has a statistical foundation well suited to natural language processing, together with advantages such as robust extraction, high precision, ease of construction, and strong adaptability, it has attracted increasing attention from researchers. The hidden semi-Markov model (Hidden Semi-Markov Model, HSMM) is an extension of the HMM that overcomes the limitations of HMM modeling caused by the Markov-chain assumption. Compared with the HMM, the HSMM is better suited to describing hidden Markov processes whose state durations follow arbitrary distributions.
Summary of the invention
The technical problem to be solved by the present invention is to provide, for e-commerce websites, a user interest feature extraction method based on a hidden semi-Markov model, so that the statistics-based model better matches real-world conditions and has stronger modeling and analysis ability.
The technical solution adopted by the present invention is a user interest feature extraction method based on a hidden semi-Markov model, characterized by comprising the steps:
Step 1, data collection:
Obtain information related to user characteristics, interest features, or demands from the user's implicit behavior; the interest features are obtained from web server logs and client-side data;
Step 2, data preprocessing:
Preprocessing mainly performs data cleaning, user identification, session identification, path completion, formatting, and event recognition on the user access logs to form user session files;
Step 3, model training,
3.1. Preliminarily choose an HSMM with N states, defined by the six-tuple λ = (N, M, π, A, B, p_j(d)), where:
N is the number of states, with finite state set S = {s_1, s_2, ..., s_N};
M is the number of observation values, with finite observation set V = {v_1, v_2, ..., v_M};
π = {π_1, π_2, ..., π_N} is the initial state distribution, describing the probability that the state q_1 occupied at time t = 1 of the observation sequence O belongs to each state of the model, i.e., π_i = P(q_1 = s_i);
A = [a_ij]_{N×N} is the state transition probability matrix; for a first-order HSMM the current state q_t depends only on q_{t-1}, i.e., a_ij = P(q_t = s_j | q_{t-1} = s_i);
B = [b_j(k)] is the observation probability matrix, giving the probability of observation O_k in state s_j; it is the distribution of a random variable or random vector over the observation probability space of each state, usually represented by a Gaussian mixture:
b_j(O_k) = Σ_{g=1}^{G} ω_jg · N(O_k; μ_jg, U_jg),
where G is the number of Gaussian components each state may contain, ω_jg is the weight, μ_jg the mean, and U_jg the covariance matrix of the g-th Gaussian of state j.
The state duration density function p_j(d) gives the probability that state s_j persists for d time units, represented by a single Gaussian distribution normalized over the allowed durations:
p_j(d) = N(d | μ_j, σ_j) / Σ_{d'=1}^{D} N(d' | μ_j, σ_j), 1 ≤ d ≤ D,
where μ_j is the mean and σ_j the variance of the duration of state j, and D is the maximum state duration in time units;
3.2. Train on the preprocessed samples with the BW algorithm:
An iterative optimization algorithm is adopted. Using Lagrange multipliers, an objective function Q is constructed that contains all HSMM parameters as variables; the partial derivative of Q with respect to each variable is set to 0, from which the relations between the new HSMM parameters that maximize Q and the old ones are derived, yielding an estimate of every HSMM parameter. The functional relations between new and old parameters are iterated until the HSMM parameters no longer change appreciably;
3.3. Initialize the HSMM;
3.4. Solve for the HSMM model λ:
From the chosen observation sequence O and the initial model λ = (π, A, B, p_i(d)), the reestimation formulas yield a new set of parameters π̄, Ā, B̄, and p̄_i(d), i.e., a new model λ̄ = (π̄, Ā, B̄, p̄_i(d)). The model λ̄ obtained from the reestimation formulas describes the observation sequence O better than λ, i.e., P(O | λ̄) ≥ P(O | λ). Repeating this process progressively improves the model parameters until P(O | λ) converges, i.e., no longer increases appreciably; the λ̄ at that point is the desired HSMM model.
Step 4, user interest feature extraction:
4.1. Scan the preprocessed text whose features are to be extracted, then use typesetting and separator information such as line breaks, colons, and double spaces to convert the tagged text sequence into a sequence of marked text segments. Combined with the HSMM model λ output by the training in step 3, run the Viterbi algorithm on the test samples obtained from the text segmentation to perform user interest feature extraction;
4.2. Input the preprocessed text observation sequence O = O_1 O_2 ... O_T into the HSMM model λ and find the state label sequence with maximum probability, Q* = q*_1 q*_2 ... q*_T; the observation text marked with the target state labels is the extracted user feature content.
The data cleaning in step 2 deletes data not needed in the mining process; user identification is the process of associating requested pages with users, mainly handling the situation in which multiple users access the website through a proxy server or firewall; session identification decomposes all page requests of one user within a period of time into user sessions; path completion fills in requested pages that are missing because of local or proxy-server caching.
The beneficial effects of the invention are as follows: by governing the user's browsing behavior with state duration probabilities, the method couples the hidden states describing interest features more closely with time; using the HSMM's ability to generate multiple observation sequences, the text information is divided into several text block partitions, each partition's features corresponding to one observation sequence; and the HSMM's tolerance of missing observations is used to extract feature information exhibiting missing behavior. Experiments show that feature extraction with the HSMM achieves higher precision and recall than the HMM method, matches the real problem better, and has stronger modeling and analysis ability.
Description of drawings
Fig. 1 is the workflow diagram of the present invention.
Fig. 2 compares the comprehensive evaluation index of the HSMM and the HMM.
Embodiment
The user interest feature extraction method of this embodiment, based on a hidden semi-Markov model, comprises the following steps:
Step 1, experimental data collection
Data collection is the process of obtaining information related to user characteristics, interest features, or demands. The present invention mainly infers a user's interests from the user's implicit behavior; the interest features can be obtained from web server logs and client-side data.
Step 2, data preprocessing
Preprocessing mainly performs data cleaning, user identification, session identification, path completion, formatting, event recognition, and similar processing on the user access logs to form user session files. Data cleaning deletes data not needed in the mining process; user identification is the process of associating requested pages with users, mainly handling the situation in which multiple users access the website through a proxy server or firewall; session identification decomposes all page requests of one user within a period of time into user sessions; path completion fills in requested pages that are missing because of local or proxy-server caching.
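The session identification step described above can be sketched as follows (an illustrative Python sketch, not the patent's implementation; the patent does not fix a timeout value, so the 30-minute threshold here is an assumption):

```python
# Split one user's timestamped page requests into sessions whenever the
# gap between consecutive requests exceeds a timeout. The 30-minute
# threshold is a common convention, assumed here for illustration.
TIMEOUT = 30 * 60  # seconds

def sessionize(requests, timeout=TIMEOUT):
    """requests: list of (timestamp_sec, url) pairs, sorted by time."""
    sessions = []
    for ts, url in requests:
        # Continue the current session if the gap to its last request
        # is within the timeout; otherwise open a new session.
        if sessions and ts - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((ts, url))
        else:
            sessions.append([(ts, url)])
    return sessions

log = [(0, "/a"), (60, "/b"), (5000, "/c")]
print(sessionize(log))  # [[(0, '/a'), (60, '/b')], [(5000, '/c')]]
```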
Data preprocessing mainly consists of the following steps:
1) Web page format check: a full scan of the page removes comments and some formatting characters that are meaningless for information extraction, while keeping the important node tags as auxiliary information for extraction.
2) Feature tagging: rules are used to describe fixed features, which makes it easier for the extraction model to handle fixed states or fixed emission probabilities, reduces algorithmic complexity, and improves efficiency.
3) Word segmentation: longer character strings encountered inside a node are segmented into words; the open-source JE segmenter can be used directly, which provides a simple API and allows new words to be added.
4) Text annotation: valuable information in the text is annotated to aid recognition by the extraction model. Careful manual annotation is needed in the model training stage to obtain suitable model parameters, and it has a significant impact on the accuracy of the extracted information.
HTMLParser is used to generate a node tree structure for a preliminary blocking of the data, converting the text into a pattern more easily handled by the information extraction system. HTMLParser is an open-source Java library that supports linear or nested parsing of HTML text. Using typesetting information such as separators, the HTML-tagged web log text sequence is converted into a sequence of text segments, each block carrying an HTML status tag.
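The block-segmentation idea can be sketched with Python's standard-library `html.parser` in place of the Java HTMLParser library named in the text (class and tag names here are illustrative, not the patent's code):

```python
# Split an HTML page into (tag, text) segments, one per block-level
# element, as a stand-in for the node-tree blocking described above.
from html.parser import HTMLParser

class BlockSegmenter(HTMLParser):
    """Collects the text inside block-level tags as one segment each."""
    BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3", "title"}

    def __init__(self):
        super().__init__()
        self.segments = []  # list of (tag, text) pairs
        self._stack = []    # currently open block tags
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            # A new block restarts the text buffer (fine for this sketch).
            self._stack.append(tag)
            self._buf = []

    def handle_data(self, data):
        if self._stack:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._stack and tag == self._stack[-1]:
            text = "".join(self._buf).strip()
            if text:
                self.segments.append((self._stack[-1], text))
            self._stack.pop()

seg = BlockSegmenter()
seg.feed("<html><title>Shop</title><div>red shoes</div></html>")
print(seg.segments)  # [('title', 'Shop'), ('div', 'red shoes')]
```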
Step 3, model training
Before information extraction, the model must first be trained on a large amount of accurately annotated source data; in practical applications an existing model can be used directly for extraction, while the model parameters are also adjusted according to new situations encountered during extraction, which embodies the self-adjusting characteristic of the hidden Markov model. The experimental data come from the Taobao website. Users' web browsing behavior was collected; after processing the pages, 2000 user behavior text samples were obtained, of which 1500 annotated texts were randomly chosen as the training set (for model training) and the other 500 samples as the test set (for information extraction).
The HSMM differs from the continuous, semi-continuous, and discrete HMMs in that it allows the underlying process to be a semi-Markov chain, each state having a variable cycle or residence time. To overcome the shortcomings of the conventional HMM, HMM variants have appeared that use an explicit p_i(d) to represent the state duration probability distribution.
3.1. First preliminarily choose an HSMM with N states, defined by the six-tuple λ = (N, M, π, A, B, p_j(d)), where:
N is the number of states, with finite state set S = {s_1, s_2, ..., s_N};
M is the number of observation values, with finite observation set V = {v_1, v_2, ..., v_M};
π = {π_1, π_2, ..., π_N} is the initial state distribution, describing the probability that the state q_1 occupied at time t = 1 of the observation sequence O belongs to each state of the model (in a real process more sequences can be observed; for example, if a link is clicked while browsing, the URL of that link is known, and the title of the page is also available), i.e., π_i = P(q_1 = s_i);
A = [a_ij]_{N×N} is the state transition probability matrix; for a first-order HSMM the current state q_t depends only on q_{t-1}, i.e., a_ij = P(q_t = s_j | q_{t-1} = s_i);
B = [b_j(k)] is the observation probability matrix, giving the probability of observation O_k in state s_j; it is the distribution of a random variable or random vector over the observation probability space of each state, usually represented by a Gaussian mixture:
b_j(O_k) = Σ_{g=1}^{G} ω_jg · N(O_k; μ_jg, U_jg),
where G is the number of Gaussian components each state may contain, ω_jg is the weight, μ_jg the mean, and U_jg the covariance matrix of the g-th Gaussian of state j;
the state duration density function (describing the density under which a state persists in time) p_j(d) gives the probability that state s_j lasts d time units, represented by a single Gaussian normalized over the allowed durations:
p_j(d) = N(d | μ_j, σ_j) / Σ_{d'=1}^{D} N(d' | μ_j, σ_j), 1 ≤ d ≤ D,
where μ_j is the mean and σ_j the variance of the duration of state j, and D is the maximum state duration in time units.
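The six-tuple above can be sketched as a small container class (an illustrative Python sketch for scalar observations; the class and field names are assumptions, not the patent's code):

```python
# Minimal container for λ = (N, M, π, A, B, p_j(d)), with the
# Gaussian-mixture emission b_j(O_k) and the truncated, normalised
# Gaussian duration density p_j(d) written out as in the formulas above.
import math

def gauss(x, mu, var):
    """Univariate Gaussian density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

class HSMM:
    def __init__(self, pi, A, mix, dur, D):
        self.pi = pi    # initial state distribution, length N
        self.A = A      # N x N state transition matrix
        self.mix = mix  # mix[j] = list of (weight, mean, var) Gaussians
        self.dur = dur  # dur[j] = (mu_j, var_j) of the duration Gaussian
        self.D = D      # maximum state duration in time units

    def b(self, j, o):
        """Emission probability b_j(o) = sum_g w_jg N(o; mu_jg, U_jg)."""
        return sum(w * gauss(o, mu, var) for w, mu, var in self.mix[j])

    def p_dur(self, j, d):
        """Duration probability p_j(d), normalised over d = 1..D."""
        mu, var = self.dur[j]
        z = sum(gauss(k, mu, var) for k in range(1, self.D + 1))
        return gauss(d, mu, var) / z

m = HSMM(pi=[1.0, 0.0],
         A=[[0.0, 1.0], [1.0, 0.0]],
         mix=[[(1.0, 0.0, 1.0)], [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]],
         dur=[(2.0, 1.0), (3.0, 1.0)],
         D=5)
# The normalisation makes each state's durations sum to 1 over 1..D.
assert abs(sum(m.p_dur(0, d) for d in range(1, 6)) - 1.0) < 1e-9
```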
The present invention is mainly used to extract user interest behavior information, combining explicit and implicit acquisition: the main sources of user interest information are the keywords the user enters into a search engine, the pages the user browses, and the behaviors exhibited while browsing. Seven states are chosen, with state set S = {User, Keys, Title, Time, Marks, Operations, Links}; within a given time period, Marks and Operations also contain series of sub-states, with {books, savepage} ∈ Marks and {cut, copy, scroll} ∈ Operations. The structure of the user interest information state set is shown in Table 1.
Table 1. User interest information state set structure
Keywords    Search keywords entered into the search engine
Title       Title of the requested web page
Time        Residence time on the page (seconds)
Marks       Marking behavior: add bookmark (books), save page (savepage)
Operations  Operation behavior: cut (cut), copy (copy), drag the scroll bar (scroll)
Links       Link behavior: whether a hyperlink is clicked while browsing a page
3.2. Training the HSMM model
The BW (Baum-Welch) algorithm is used to train on the samples preprocessed in step 2. The Baum-Welch algorithm mainly solves the model training and parameter reestimation problem and is in fact an application of the maximum likelihood criterion. It adopts an iterative optimization procedure: using Lagrange multipliers, an objective function Q containing all HSMM parameters as variables is constructed; the partial derivative of Q with respect to each variable is set to 0, from which the relations between the new HSMM parameters that maximize Q and the old model parameters are derived, yielding an estimate of every HSMM parameter. The functional relations between new and old model parameters are iterated until the HSMM model parameters no longer change appreciably.
3.3. Initializing the HSMM
When obtaining the parameters of the HSMM from the training data set, an important problem is the choice of the initial model: different initial models yield different training results, because the algorithm finds the model parameters at a local maximum of P(O | λ). It is therefore significant to choose a good initial model so that the local maximum finally obtained is close to the global maximum, but this problem still has no perfect answer, and empirical methods are usually adopted in practice. It is generally believed that the choice of initial values for π and A has little influence; they can be chosen randomly or uniformly as long as certain stochastic constraints are satisfied. The initial value of B, however, has a larger influence on HSMM training, and a more elaborate initialization method is generally adopted. The initial model λ here can be chosen arbitrarily: since P(O | λ̄) ≥ P(O | λ) for any λ̄ produced by the reestimation formulas, λ̄ is an improvement over λ, and using λ̄ in turn as the initial value of the reestimation formulas yields a further improved model. To a certain extent this avoids the consequences of an improper choice of initial values.
3.4. Solving for the HSMM model λ
From the chosen observation sequence O and the initial model λ = (π, A, B, p_i(d)), the reestimation formulas yield a new set of parameters π̄, Ā, B̄, and p̄_i(d), i.e., a new model λ̄ = (π̄, Ā, B̄, p̄_i(d)). The model λ̄ obtained from the reestimation formulas describes the observation sequence O better than λ, i.e., P(O | λ̄) ≥ P(O | λ). Repeating this process progressively improves the model parameters until P(O | λ) converges, i.e., no longer increases appreciably; the λ̄ at that point is the desired HSMM model.
Step 4, user interest feature extraction
4.1. Scan the preprocessed text whose features are to be extracted, then use typesetting and separator information such as line breaks, colons, and double spaces to convert the tagged text sequence into a sequence of marked text segments. Combined with the HSMM model λ output by the training in step 3, run the Viterbi algorithm on the 500 test samples obtained from the text segmentation to perform user interest feature extraction.
4.2. Input the preprocessed text observation sequence O = O_1 O_2 ... O_T into the HSMM model λ and find the state label sequence with maximum probability, Q* = q*_1 q*_2 ... q*_T; the observation text marked with the target state labels is the extracted user feature content.
The Viterbi algorithm solves the problem of determining, in the sense of optimality, a state label sequence Q* given an observation sequence O = (O_1, O_2, ..., O_T) and an HSMM model λ = (π, A, B, p_i(d)).
In the test, this example uses precision (P), recall (R), and the F value as the system performance evaluation criteria. The three indices are defined as follows:
P = N_1 / (N_1 + N_2), R = N_1 / (N_1 + N_3),
where N_1 is the number of correctly identified instances, N_2 the number of instances wrongly identified as this class, and N_3 the number of instances belonging to this class but wrongly identified as other classes.
Comprehensive evaluation index:
F = (β² + 1) × P × R / (β² × P + R),
where the parameter β assigns different weights to precision P and recall R; when β is 1, precision and recall receive the same weight. We take β = 1 in the experiments.
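The three indices can be written out directly (a straightforward Python transcription of the formulas above; the function names are illustrative):

```python
# P = N1/(N1+N2), R = N1/(N1+N3), F = (beta^2+1)*P*R / (beta^2*P + R).
def precision(n1, n2):
    return n1 / (n1 + n2)

def recall(n1, n3):
    return n1 / (n1 + n3)

def f_measure(p, r, beta=1.0):
    return (beta**2 + 1) * p * r / (beta**2 * p + r)

# The "Keywords" row of Table 2 (HMM): N1 = 317, N2 = 113, N3 = 70.
p, r = precision(317, 113), recall(317, 70)
print(round(100 * p, 2), round(100 * r, 2))  # 73.72 81.91, matching the table
```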
The comprehensive evaluation index F is used to evaluate the HSMM and the HMM respectively; see Tables 2 and 3.
Table 2. Test results and performance index statistics for feature extraction with the HMM model
            N1   N2   N3   P(%)   R(%)
Keywords    317  113  70   73.72  81.91
Title       280  103  117  73.11  70.53
Time        332  98   70   77.21  82.59
Operations  295  143  62   67.35  82.63
Marks       353  92   55   79.33  86.52
Links       382  51   67   88.22  85.08
Table 3. Test results and performance index statistics for feature extraction with the HSMM model
            N1   N2   N3   P(%)   R(%)
Keywords    363  79   58   82.13  85.37
Title       308  104  88   74.76  77.78
Time        402  56   42   87.77  90.54
Operations  391  38   71   91.14  84.63
Marks       381  67   52   85.04  87.99
Links       443  27   30   94.26  93.66
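Computing the β = 1 comprehensive index for every row of Tables 2 and 3 reproduces the comparison plotted in Fig. 2 (a verification sketch over the published P and R values; the dictionaries are transcriptions of the tables):

```python
# F = 2PR/(P+R) per feature state; the HSMM F value exceeds the HMM one
# for all six states, as Fig. 2 shows.
def f1(p, r):
    return 2 * p * r / (p + r)

hmm  = {"Keywords": (73.72, 81.91), "Title": (73.11, 70.53),
        "Time": (77.21, 82.59), "Operations": (67.35, 82.63),
        "Marks": (79.33, 86.52), "Links": (88.22, 85.08)}
hsmm = {"Keywords": (82.13, 85.37), "Title": (74.76, 77.78),
        "Time": (87.77, 90.54), "Operations": (91.14, 84.63),
        "Marks": (85.04, 87.99), "Links": (94.26, 93.66)}

for k in hmm:
    assert f1(*hsmm[k]) > f1(*hmm[k]), k
    print(f"{k:10s}  HMM F={f1(*hmm[k]):5.2f}  HSMM F={f1(*hsmm[k]):5.2f}")
```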

Claims (2)

1. A user interest feature extraction method based on a hidden semi-Markov model, characterized by comprising the steps:
Step 1, data collection:
Obtain information related to user characteristics, interest features, or demands from the user's implicit behavior; the interest features are obtained from web server logs and client-side data;
Step 2, data preprocessing:
Preprocessing mainly performs data cleaning, user identification, session identification, path completion, formatting, and event recognition on the user access logs to form user session files;
Step 3, model training:
3.1. Preliminarily choose an HSMM with N states, defined by the six-tuple λ = (N, M, π, A, B, p_j(d)), where:
N is the number of states, with finite state set S = {s_1, s_2, ..., s_N};
M is the number of observation values, with finite observation set V = {v_1, v_2, ..., v_M};
π = {π_1, π_2, ..., π_N} is the initial state distribution, describing the probability that the state q_1 occupied at time t = 1 of the observation sequence O belongs to each state of the model, i.e., π_i = P(q_1 = s_i);
A = [a_ij]_{N×N} is the state transition probability matrix; for a first-order HSMM the current state q_t depends only on q_{t-1}, i.e., a_ij = P(q_t = s_j | q_{t-1} = s_i);
B = [b_j(k)] is the observation probability matrix, giving the probability of observation O_k in state s_j; it is the distribution of a random variable or random vector over the observation probability space of each state, usually represented by a Gaussian mixture:
b_j(O_k) = Σ_{g=1}^{G} ω_jg · N(O_k; μ_jg, U_jg),
where G is the number of Gaussian components each state may contain, ω_jg is the weight, μ_jg the mean, and U_jg the covariance matrix of the g-th Gaussian of state j;
the state duration density p_j(d) gives the probability that state s_j lasts d time units, represented by a single Gaussian normalized over the allowed durations:
p_j(d) = N(d | μ_j, σ_j) / Σ_{d'=1}^{D} N(d' | μ_j, σ_j), 1 ≤ d ≤ D,
where μ_j is the mean and σ_j the variance of the duration of state j, and D is the maximum state duration in time units;
3.2. Train on the preprocessed samples with the BW algorithm:
An iterative optimization algorithm is adopted. Using Lagrange multipliers, an objective function Q is constructed that contains all HSMM parameters as variables; the partial derivative of Q with respect to each variable is set to 0, from which the relations between the new HSMM parameters that maximize Q and the old ones are derived, yielding an estimate of every HSMM parameter. The functional relations between new and old parameters are iterated until the HSMM parameters no longer change appreciably;
3.3. Initialize the HSMM;
3.4. Solve for the HSMM model λ:
From the chosen observation sequence O and the initial model λ = (π, A, B, p_i(d)), the reestimation formulas yield a new set of parameters π̄, Ā, B̄, and p̄_i(d), i.e., a new model λ̄ = (π̄, Ā, B̄, p̄_i(d)). The model λ̄ obtained from the reestimation formulas describes the observation sequence O better than λ, i.e., P(O | λ̄) ≥ P(O | λ). Repeating this process progressively improves the model parameters until P(O | λ) converges, i.e., no longer increases appreciably; the λ̄ at that point is the desired HSMM model.
Step 4, user interest feature extraction:
4.1. Scan the preprocessed text whose features are to be extracted, then use typesetting and separator information such as line breaks, colons, and double spaces to convert the tagged text sequence into a sequence of marked text segments. Combined with the HSMM model λ output by the training in step 3, run the Viterbi algorithm on the test samples obtained from the text segmentation to perform user interest feature extraction;
4.2. Input the preprocessed text observation sequence O = O_1 O_2 ... O_T into the HSMM model λ and find the state label sequence with maximum probability, Q* = q*_1 q*_2 ... q*_T; the observation text marked with the target state labels is the extracted user feature content.
2. The extraction method according to claim 1, characterized in that the data cleaning in step 2 deletes data not needed in the mining process; user identification is the process of associating requested pages with users, mainly handling the situation in which multiple users access the website through a proxy server or firewall; session identification decomposes all page requests of one user within a period of time into user sessions; path completion fills in requested pages that are missing because of local or proxy-server caching.
CN2011100881918A 2011-04-07 2011-04-07 User interest feature extraction method based on hidden semi-Markov model Pending CN102270212A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100881918A CN102270212A (en) 2011-04-07 2011-04-07 User interest feature extraction method based on hidden semi-Markov model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100881918A CN102270212A (en) 2011-04-07 2011-04-07 User interest feature extraction method based on hidden semi-Markov model

Publications (1)

Publication Number Publication Date
CN102270212A (en) 2011-12-07

Family

ID=45052519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100881918A Pending CN102270212A (en) 2011-04-07 2011-04-07 User interest feature extraction method based on hidden semi-Markov model

Country Status (1)

Country Link
CN (1) CN102270212A (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102438025B (en) * 2012-01-10 2015-03-25 中山大学 Indirect distributed denial-of-service attack defense method and system based on Web proxy
CN102438025A (en) * 2012-01-10 2012-05-02 中山大学 Indirect distributed denial-of-service attack defense method and system based on Web proxy
CN102999789A (en) * 2012-11-19 2013-03-27 浙江工商大学 Digital city safety precaution method based on hidden semi-Markov model
CN103020289A (en) * 2012-12-25 2013-04-03 浙江鸿程计算机系统有限公司 Method for providing personalized needs of search engine users based on log mining
CN103020289B (en) * 2012-12-25 2015-08-05 浙江鸿程计算机系统有限公司 Method for providing personalized needs of search engine users based on Web log mining
CN104123312B (en) * 2013-04-28 2018-02-16 国际商业机器公司 Data mining method and device
CN104123312A (en) * 2013-04-28 2014-10-29 国际商业机器公司 Data mining method and device
CN106803422A (en) * 2015-11-26 2017-06-06 中国科学院声学研究所 Language model re-estimation method based on long short-term memory network
CN106803422B (en) * 2015-11-26 2020-05-12 中国科学院声学研究所 Language model re-estimation method based on long short-term memory network
CN105740327B (en) * 2016-01-22 2019-04-19 天津中科智能识别产业技术研究院有限公司 Adaptive sampling method based on user preferences
CN105740327A (en) * 2016-01-22 2016-07-06 天津中科智能识别产业技术研究院有限公司 Self-adaptive sampling method based on user preferences
CN106651517B (en) * 2016-12-20 2021-11-30 广东技术师范大学 Drug recommendation method based on hidden semi-Markov model
CN106651517A (en) * 2016-12-20 2017-05-10 广东技术师范学院 Hidden semi-Markov model-based drug recommendation method
CN108303649A (en) * 2017-01-13 2018-07-20 重庆邮电大学 Cell health state recognition method
CN106685996A (en) * 2017-02-23 2017-05-17 上海万雍科技股份有限公司 Method for detecting abnormal account logins based on an HMM
CN110008334A (en) * 2017-08-04 2019-07-12 腾讯科技(北京)有限公司 Information processing method, device and storage medium
CN107808168A (en) * 2017-10-31 2018-03-16 北京科技大学 Social network user behavior prediction method based on strong and weak relationships
CN109947891A (en) * 2017-11-07 2019-06-28 北京国双科技有限公司 Document analysis method and device
WO2019128938A1 (en) * 2017-12-29 2019-07-04 北京神州绿盟信息安全科技股份有限公司 Method for extracting feature string, device, network apparatus, and storage medium
US11379687B2 (en) 2017-12-29 2022-07-05 Nsfocus Technologies Group Co., Ltd. Method for extracting feature string, device, network apparatus, and storage medium
CN108829808A (en) * 2018-06-07 2018-11-16 麒麟合盛网络技术股份有限公司 Personalized page ranking method and apparatus, and electronic device
CN108829808B (en) * 2018-06-07 2021-07-13 麒麟合盛网络技术股份有限公司 Personalized page ranking method and apparatus, and electronic device
CN109726292A (en) * 2019-01-02 2019-05-07 山东省科学院情报研究所 Text analysis method and apparatus for large-scale multilingual data
CN109933741A (en) * 2019-02-27 2019-06-25 京东数字科技控股有限公司 User network behavior feature extraction method, device and storage medium
CN109933741B (en) * 2019-02-27 2020-06-23 京东数字科技控股有限公司 Method, device and storage medium for extracting user network behavior characteristics
CN110224850A (en) * 2019-04-19 2019-09-10 北京亿阳信通科技有限公司 Telecommunication network fault early warning method, device and terminal device

Similar Documents

Publication Publication Date Title
CN102270212A (en) User interest feature extraction method based on hidden semi-Markov model
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
CN103544255B (en) Network public opinion analysis method based on text semantic relevance
US9009134B2 (en) Named entity recognition in query
Chen et al. A Two‐Step Resume Information Extraction Algorithm
CN103514183B (en) Information search method and system based on interactive document clustering
CN103150382B (en) Automatic short text semantic concept expansion method and system based on open knowledge base
CN103605658B (en) Search engine system based on text sentiment analysis
CN107885793A (en) Hot microblog topic analysis and prediction method and system
CN103246644B (en) Method and device for processing Internet public opinion information
CN103473280A (en) Method and device for mining comparable network language materials
CN103646112A (en) Domain adaptation method for dependency parsing based on web search
CN106844349A (en) Comment spam recognition method based on co-training
CN112989208B (en) Information recommendation method and device, electronic equipment and storage medium
Wang et al. Neural related work summarization with a joint context-driven attention mechanism
CN104346382B (en) Text analysis system and method using linguistic queries
CN111325018A (en) Domain dictionary construction method based on web retrieval and new word discovery
CN106126618B (en) Name-based email address recommendation method and system
CN105677684A (en) Method for making semantic annotations on content generated by users based on external data sources
Chen et al. Toward the understanding of deep text matching models for information retrieval
CN111753540B (en) Method and system for collecting text data to perform Natural Language Processing (NLP)
CN103744830A (en) Method for identifying identity information in Excel documents based on semantic analysis
Kalita et al. An extractive approach of text summarization of Assamese using WordNet
CN109597879B (en) Service behavior relation extraction method and device based on 'citation relation' data
CN113157857A (en) Hot topic detection method, device and equipment for news

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20111207