CN104008203A - User interest discovering method with ontology situation blended in - Google Patents

User interest discovering method with ontology situation blended in

Info

Publication number
CN104008203A
Authority
CN
China
Prior art keywords
user
interest
state
model
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410269562.6A
Other languages
Chinese (zh)
Other versions
CN104008203B (en)
Inventor
陈庭贵
周广澜
许翀寰
封毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University
Priority to CN201410269562.6A
Publication of CN104008203A
Application granted
Publication of CN104008203B
Legal status: Expired - Fee Related

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A user interest discovering method with an ontology situation blended in comprises the following steps. First, a user interest feature extraction model based on a second-order hidden Markov model is constructed for the complex, multi-dimensional Web user interest behavior data of an e-commerce website. Second, the situation (context) information that reflects user interests is analyzed, including individual information, environment information and device information. Third, a user interest model based on a situation ontology is constructed, and the degree of interest associated with the user's individual information is measured and expressed using fuzzy logic. Finally, a user interest drift detection method based on a hidden semi-Markov model builds a model from the user's browsing paths and uses the mean of the average log-likelihoods of the sequences as the threshold for judging whether the user's interest has drifted. The method constructs an interest model that meets user needs, supports personalized recommendation services, offers an effective means of improving user satisfaction, and has good application value.

Description

A user interest discovery method incorporating ontology context
Technical field
The present invention relates to the fields of data mining and ontology, and in particular to a user interest discovery method that is especially suitable for personalized user information services.
Background art
Network applications are becoming increasingly complex and data volumes keep growing, so tasks such as e-commerce and website design are becoming more complicated and burdensome. On the basis of existing user information, page structure needs to be adjusted dynamically according to behavioral aspects such as the user's access interests, access time and visit frequency, so that e-commerce can be targeted to user needs and personalized services can be provided. A personalized information service on the Internet organizes and adjusts information automatically according to the differing characteristics, interests and preferences of users, and uses a fast, efficient and accurate mode of information acquisition to address problems such as the uniformity of the information delivered to users. Accordingly, how to understand a user's information needs accurately amid rapidly expanding information, build a user model that characterizes the network user's features, interests, goals and behavioral preferences, and predict user behavior from that model has become a difficult problem in providing better personalized services. At the same time, how to detect user interest drift promptly and accurately, and how to build a dynamically updated user interest model that meets the personalized information needs of different users, have become key issues for personalized information services.
Summary of the invention
To overcome the deficiency that the interest models produced by existing data mining approaches cannot meet user needs when providing personalized recommendation services, the present invention provides a user interest discovery method incorporating ontology context, which builds an interest model that meets user needs so as to provide personalized recommendation services and an effective means of improving user satisfaction.
The technical solution adopted by the present invention to solve the technical problem is:
A user interest discovery method incorporating ontology context, the method comprising the following steps:
1) Establish a user interest feature extraction model based on a second-order hidden Markov model:
First, the data that reflect user interest are collected, as follows: user source data are obtained from the client, the server and the proxy server; once obtained, the source data are preprocessed and saved in a set format for later use in mining user interest.
Second, a second-order hidden Markov model (HMM) is used to extract user interest features; the procedure comprises a training part and an extraction part;
In the training part, feature information sequences containing user interest are preprocessed to form a text document. The text is then scanned, and layout delimiters (separators, spaces, line breaks and colons) are used to convert the text sequence into a sequence of labeled text segments. Finally, the following model parameters of the second-order HMM are computed from it, as given by the formulas below:
1. Initial probability distribution vector
$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j=1}^{N} \mathrm{Init}(j)}, \quad 1 \le i \le N \tag{1}$$
where $\mathrm{Init}(i)$ is the number of sequences in the whole labeled training sample whose initial state is $S_i$, and $\sum_{j=1}^{N} \mathrm{Init}(j)$ is the total number of sequences taken over all initial states;
2. State transition probabilities
$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{2}$$
$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{3}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$, $S_j$ at time $t$, and then becomes $S_k$ at time $t+1$; $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given $S_i$ at time $t-1$ and $S_j$ at time $t$;
3. Observation emission probabilities
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N \tag{4}$$
$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N,\ 1 \le k \le M \tag{5}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times $O_k$ is emitted when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$; $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given $S_i$ at time $t-1$ and $S_j$ at time $t$;
The extraction part comprises two steps: (a) the text from which features are to be extracted is preprocessed: it is scanned, and layout delimiters (separators, spaces, line breaks and colons) are used to convert the text sequence into a sequence of labeled text segments; (b) using the second-order HMM output by the training part, the Viterbi algorithm is applied: the trained HMM performs user interest feature extraction by taking the processed observation sequence $O = O_1 O_2 \ldots O_T$ as the model input and finding the state label sequence with the maximum probability. The extracted user features are the observed text segments labeled with the target state label;
2) Analyze the context information that reflects user interest: by analyzing the user's search, browsing behavior and purchase record information, the user's true interest over a period of time is derived;
3) Build the user interest ontology model incorporating context: first, six key factors that affect user interest, namely region, gender, age, marital status, education background and income, are taken as background indicators, and fuzzy processing is applied in combination with the user's historical purchase information and behavior features to obtain the user's degree of interest; then the ontology context representation is adopted and, through multi-granularity partitioning, the user interest ontology model is built;
4) User interest drift detection method based on the hidden semi-Markov model:
Two observations are chosen to describe the user's browsing behavior: a) the browsing path sequence of the pages the user visits; b) the time interval between one page and the next. The set of all states is written $S = \{S_1, S_2, \ldots, S_n\}$, the corresponding observation set is $V = \{v_1, v_2, \ldots, v_n\}$, and the time intervals form the set $I = \{1, 2, \ldots\}$. For a given browsing behavior, the number of links in the browsing path is a random variable, so the number of observations output in a given state is represented by the set $\{1, \ldots, D\}$. The user's browsing path sequence, a two-dimensional observation sequence, is written $O = \{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ is the Web content object the user browses and $\tau_t \in I$ is the time interval between $r_t$ and $r_{t-1}$ when the user jumps from one page to another. The output probability matrix of the model is $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is on page $r_t = v \in V$ with time interval $\tau_t = q \in I$ from the previous page, and $\sum_{v,q} b_i(v, q) = 1$. $P = \{p_i(d)\}$ denotes the probability that, in a given state $i$, the number of output observations is $d \in \{1, \ldots, D\}$; this is the state duration probability matrix of the hidden semi-Markov model, and $\sum_{d} p_i(d) = 1$. The state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$;
A user interest behavior record is defined as $U_{interest} = \{user, background, history, behavior, timestamp, content\}$, where user identifies the user, for example by ID; background denotes the user's specific context factors; history denotes the user's historical purchase records; behavior identifies the specific interest behavior operation and its result; timestamp denotes the time at which the behavior was performed; and content denotes the content of the interest topic;
During the user's access activity, an access transition probability $P(q_i \to q_j)$ exists between any two behavior operations; it expresses the interest weight as follows:
$$P(q_i \to q_j) = P(q_j \mid q_i) = \frac{P(q_i q_j)}{P(q_i)} = \begin{cases} \dfrac{\theta_1 W_B(q_i, q_j) + \theta_2 W_{HI}(q_i, q_j) + \theta_3 W_{IB}(q_i, q_j) + \theta_4 W_L(q_i, q_j)}{\theta_1 W_B(q_i) + \theta_2 W_{HI}(q_i) + \theta_3 W_{IB}(q_i) + \theta_4 W_L(q_i)}, & i \ne j \\ 0, & i = j \end{cases} \tag{6}$$
For each $q_j$ and its corresponding concept there is an observation probability distribution, namely the interest probability of that concept over all of user $u$'s accesses to $q_j$ [the symbols for the concept and the distribution are missing from the source text]. The set of access nodes contained in access sequence $i$ is $Q_i = \{q'_1, \ldots, q'_f \mid q' \in IC\}$; $Q_{i,j}$ denotes the set of all access nodes in sequence $i$ from $q_j$ onward; and a further set [symbol missing from the source text] denotes the nodes in $Q_{i,j}$ that contain the concept:
$$Q_{i,j} = \begin{cases} \{\, q'_{k+l} \mid q'_k = q_j,\ l = 0, \ldots, (f - k) \,\}, & q_j \in Q_i \\ \text{Null}, & \text{otherwise} \end{cases} \tag{7}$$
The observation probability distribution of $u$ at $q_j$ is defined as:
[Equation (8) is missing from the source text.]
Then, among all possible access sequences of user $u$, a state sequence with the maximum access probability is found, which establishes the hidden semi-Markov model of the user's interest behavior:
$$P_{\max}(\sigma_z^k) = \arg\max \prod P(q_k \to q_{k+1})\, P(\sigma_z^k \mid q_k) \tag{9}$$
In the process of detecting user interest drift, the observation sequences for the HSMM are first collected and the data are preprocessed before model training. After the model parameters are determined, the HSMM algorithm is invoked to obtain the probability that the user's interest is unchanged; this probability is computed as the average log-likelihood. When the user's interest value is within the normal range, the user's data are added to the training data set so as to update the parameters of the hidden semi-Markov model; otherwise the user is considered to have undergone interest drift.
Further, in step 1), there are two ways of obtaining personalized user information: (a) collection through online surveys in which users themselves participate; (b) obtaining the user's interest information by tracking user behavior. The feature extraction method for user behavior data is adopted.
Further, in step 2), the user's behavior information comprises the user's search keywords, historical purchase records and historical browsing behavior.
Still further, in step 3), when the user context ontology is built from the user's interest context information, the user context is divided into user individual context, user environment context and user device context. The ontology takes the form of a hierarchical concept tree; each element of the user context is represented by a node in the tree, and together the nodes form the context ontology tree.
The technical concept of the present invention is as follows: for the field of user-oriented personalized services, and in view of the concept drift involved in the method and the problem scenario, a user interest discovery method incorporating ontology context is proposed; it builds an interest model that meets user needs so as to provide personalized recommendation services and an effective means of improving user satisfaction.
On this basis, the present invention takes personalized user information services as its research object, introduces data mining and ontology, takes full account of users' individual characteristics, and proposes a user interest discovery method incorporating ontology context that effectively meets users' personalized service needs.
Specifically, first, for the complex, multi-dimensional Web user interest behavior data of e-commerce websites, a user interest feature extraction model based on a second-order hidden Markov model is built. Second, the context information that reflects user interest is analyzed, including the user's individual information, environment information and device information. Third, a user interest model based on a context ontology is built, with fuzzy logic used to measure and express the degree of interest associated with the user's individual information. Finally, a user interest drift detection method based on the hidden semi-Markov model (HSMM) builds a model from the user's browsing paths and uses the mean of the average log-likelihoods of the sequences as the threshold for judging whether interest drift has occurred.
The beneficial effect of the present invention is that it builds an interest model that meets user needs so as to provide personalized recommendation services and an effective means of improving user satisfaction, and therefore has good application value.
Brief description of the drawings
Fig. 1 is a flow chart of the interest feature extraction algorithm based on the second-order HMM.
Fig. 2 is the construction flow of the user context ontology.
Fig. 3 is a block diagram of interest drift detection.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings.
With reference to Fig. 1, Fig. 2 and Fig. 3, a user interest discovery method incorporating ontology context comprises the following steps:
5) Establish a user interest feature extraction model based on a second-order hidden Markov model: Web information extraction belongs to Web content mining; it is a class of information extraction methods that take the Web as the information source and extract data from semi-structured Web documents. This step comprises the collection of user data and the construction of the user interest feature extraction model.
To build the user interest model, the data that reflect user interest must first be collected. User data are usually plentiful and include user registration information, log information, page text content, website topology, user behavior data and page hyperlink information. These data can be obtained from sources such as the client, the server and the proxy server; once obtained, they are preprocessed and saved in a suitable format for later use in mining user interest. In summary, there are two main ways of obtaining personalized user information: (a) collection through online surveys in which users themselves participate, which obtains the user's interests and information needs directly but requires the user's active cooperation; (b) obtaining the user's interest information by tracking user behavior. Data obtained in the first way, such as registration information, are supplied directly by the user through forms and imported into the back-end database, so extracting interest features from them is comparatively easy, whereas the data used to infer user interest from tracked implicit behavior cannot be obtained directly. The feature extraction method for user behavior data is therefore mainly adopted here.
Second, the extraction of user interest features falls within text information extraction. Information extraction has become an important direction in natural language processing, and its theory continues to develop. Current information extraction models fall into three main classes: dictionary-based models; rule-based models, such as ontologies; and statistical models, such as the hidden Markov model (HMM). Because the HMM has a statistical foundation well suited to natural language processing and offers robust extraction, high precision, ease of construction and good adaptability, it has attracted increasing attention from researchers. Here a second-order hidden Markov model is used to extract user interest features; the flow chart is shown in Fig. 1. The procedure comprises two main parts, a training part and an extraction part.
In the training part, feature information sequences containing user interest are preprocessed to form a text document. The text is then scanned, and layout delimiters such as separators, spaces, line breaks and colons are used to convert the text sequence into a sequence of labeled text segments. Finally, the following model parameters of the second-order HMM are computed from it, as given by the formulas below:
1. Initial probability distribution vector
$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j=1}^{N} \mathrm{Init}(j)}, \quad 1 \le i \le N \tag{10}$$
where $\mathrm{Init}(i)$ is the number of sequences in the whole labeled training sample whose initial state is $S_i$, and $\sum_{j=1}^{N} \mathrm{Init}(j)$ is the total number of sequences taken over all initial states.
2. State transition probabilities
$$a_{ij} = \frac{C_{ij}}{\sum_{k=1}^{N} C_{ik}}, \quad 1 \le i, j \le N \tag{11}$$
$$a_{ijk} = \frac{C_{ijk}}{\sum_{u=1}^{N} C_{iju}}, \quad 1 \le i, j, k \le N \tag{12}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$, $S_j$ at time $t$, and then becomes $S_k$ at time $t+1$; $\sum_{k=1}^{N} C_{ik}$ is the total number of transitions from $S_i$ to all states, and $\sum_{u=1}^{N} C_{iju}$ is the total number of transitions to all states given $S_i$ at time $t-1$ and $S_j$ at time $t$.
3. Observation emission probabilities
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N \tag{13}$$
$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N,\ 1 \le k \le M \tag{14}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times $O_k$ is emitted when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$; $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations given $S_i$ at time $t-1$ and $S_j$ at time $t$.
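The counting formulas for the initial, transition and emission probabilities can be illustrated with a minimal Python sketch. It assumes the labeled training sample is a list of sequences of (state, observation) pairs; the function name `estimate_second_order_hmm` and the dictionary layout are illustrative choices, not part of the described method, and no smoothing of unseen events is applied.

```python
from collections import Counter


def estimate_second_order_hmm(sequences):
    """Estimate second-order HMM parameters by relative-frequency counting.

    `sequences` is a list of labeled sequences, each a list of
    (state, observation) pairs (an assumed input format).
    """
    init = Counter()  # Init(i): sequences starting in state i
    c2 = Counter()    # C_ij: transitions i -> j
    c3 = Counter()    # C_ijk: state i at t-1, j at t, k at t+1
    e1 = Counter()    # E_j(O_k): state j emits observation O_k
    e2 = Counter()    # E_ij(O_k): previous state i, current state j emits O_k

    for seq in sequences:
        if not seq:
            continue
        states = [s for s, _ in seq]
        init[states[0]] += 1
        for t, (s, o) in enumerate(seq):
            e1[(s, o)] += 1
            if t >= 1:
                c2[(states[t - 1], s)] += 1
                e2[(states[t - 1], s, o)] += 1
            if t >= 2:
                c3[(states[t - 2], states[t - 1], s)] += 1

    def normalize(counts):
        # Divide each count by the total over its last index,
        # which is the denominator of the corresponding formula.
        totals = Counter()
        for key, n in counts.items():
            totals[key[:-1]] += n
        return {key: n / totals[key[:-1]] for key, n in counts.items()}

    total_init = sum(init.values())
    pi = {s: n / total_init for s, n in init.items()}
    return pi, normalize(c2), normalize(c3), normalize(e1), normalize(e2)


# Tiny labeled sample with hypothetical state and observation names.
sample = [
    [("A", "x"), ("B", "y"), ("A", "x")],
    [("A", "x"), ("A", "z")],
]
pi, a2, a3, b1, b2 = estimate_second_order_hmm(sample)
```

Each returned dictionary is keyed by the index tuple of the corresponding parameter, for example `a3[("A", "B", "A")]` for $a_{ijk}$. A practical system would probably add smoothing so that unseen transitions do not receive zero probability; that choice is outside what the text specifies.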
The extraction part comprises two steps: (a) the text from which features are to be extracted is preprocessed: it is scanned, and layout delimiters such as separators, spaces, line breaks and colons are used to convert the text sequence into a sequence of labeled text segments; (b) using the second-order HMM output by the training part, the Viterbi algorithm is applied. The trained HMM performs user interest feature extraction by taking the processed observation sequence $O = O_1 O_2 \ldots O_T$ as the model input and finding the state label sequence with the maximum probability. The extracted user features are the observed text segments labeled with the target state label.
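The decoding step can be sketched as a second-order Viterbi search over pairs of consecutive states. The sketch below works in log space and assumes the parameters are stored in dictionaries keyed by index tuples; the function name `viterbi2`, the `floor` value used for unseen events, and the use of the first-order emission $b_j$ for the first observation are all assumptions made for illustration.

```python
import math


def viterbi2(obs, states, pi, a2, a3, b1, b2, floor=1e-12):
    """Return the most probable state sequence under a second-order HMM.

    pi[i], a2[(i, j)], a3[(i, j, k)], b1[(j, o)] and b2[(i, j, o)] are
    probability tables; missing entries are floored (an assumption).
    """
    def lg(table, key):
        return math.log(max(table.get(key, 0.0), floor))

    if not obs:
        return []
    if len(obs) == 1:
        return [max(states, key=lambda s: lg(pi, s) + lg(b1, (s, obs[0])))]

    # delta[(i, j)]: best log-probability of a path whose last two states are i, j.
    delta = {
        (i, j): lg(pi, i) + lg(b1, (i, obs[0]))
        + lg(a2, (i, j)) + lg(b2, (i, j, obs[1]))
        for i in states for j in states
    }
    backpointers = []
    for o in obs[2:]:
        new_delta, ptr = {}, {}
        for j in states:
            for k in states:
                # Best state two steps back, given the last two states (j, k).
                best_i = max(states, key=lambda i: delta[(i, j)] + lg(a3, (i, j, k)))
                new_delta[(j, k)] = (delta[(best_i, j)] + lg(a3, (best_i, j, k))
                                     + lg(b2, (j, k, o)))
                ptr[(j, k)] = best_i
        delta = new_delta
        backpointers.append(ptr)

    # Trace back from the best final state pair.
    path = list(max(delta, key=delta.get))
    for ptr in reversed(backpointers):
        path.insert(0, ptr[(path[0], path[1])])
    return path
```

The observed segments whose decoded label equals the target state label would then be taken as the extracted interest features.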
6) Analyze the context information that reflects user interest: a network user's interest characteristics are shaped mainly by internal and external factors related to user interest. Internal factors include gender, age, occupation, personality, education and income; external factors include cultural background, social environment and family background. Together these internal and external factors give rise to the differing behaviors of network users. For this reason users differ in many respects, and their level of interest in commodities and their inclination toward them also differ.
A user's interests are usually reflected in his or her own behavior: a user who is interested in something shows a certain inclination toward it, and the user's needs and interests are recorded in the behavior information. The user's true interest over a period of time can therefore be derived by analyzing information such as the user's searches, browsing behavior and purchase records. Here the user's behavior information mainly comprises the user's search keywords, historical purchase records and historical browsing behavior.
7) Build the user interest ontology model incorporating context: first, six key factors that affect user interest, namely region, gender, age, marital status, education background and income, are taken as background indicators, and fuzzy processing is applied in combination with the user's historical purchase information and behavior features to obtain the user's degree of interest; then the ontology context representation is adopted and, through multi-granularity partitioning, the user interest ontology model is built. The flow chart for building the user context ontology model is shown in Fig. 2.
When the user context ontology is built from the user's interest context information, the user context is divided into user individual context, user environment context and user device context. The ontology usually takes the form of a hierarchical concept tree; each element of the user context is represented by a node in the tree, and together the nodes form the context ontology tree.
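As a rough illustration of the hierarchical concept tree and of the fuzzy processing of a background factor, the following Python sketch builds a three-branch context tree and evaluates a triangular membership function. The class `ContextNode`, the helper `build_context_tree`, the node names, and the choice of a triangular membership function are assumptions; the text does not specify the membership functions or the lower levels of the environment and device branches.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextNode:
    """One concept in the hierarchical context ontology tree."""
    name: str
    children: list[ContextNode] = field(default_factory=list)

    def add(self, name):
        child = ContextNode(name)
        self.children.append(child)
        return child

    def paths(self, prefix=()):
        """Yield the root-to-leaf concept paths of the tree."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


def build_context_tree():
    root = ContextNode("UserContext")
    individual = root.add("Individual")
    # The six background factors named in the text.
    for factor in ("region", "gender", "age", "marital status",
                   "education", "income"):
        individual.add(factor)
    root.add("Environment")  # sub-concepts are not given in the text
    root.add("Device")       # sub-concepts are not given in the text
    return root


def triangular(x, low, peak, high):
    """Triangular fuzzy membership in [0, 1] (an assumed function shape)."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


tree = build_context_tree()
# Hypothetical fuzzy set "around 30 years old" applied to a 25-year-old user.
membership = triangular(25, 20, 30, 40)
```

A membership value of this kind could serve as one input when grading the degree of interest associated with a background factor, alongside the purchase history and behavior features.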
8) User interest drift detection method based on the hidden semi-Markov model: a user's online shopping behavior while browsing is a complex process influenced by many individual factors such as browsing purpose, cultural background and personal preferences. The user's interest is therefore considered in terms of context factors, user behavior and interest content, and a hidden semi-Markov model (HSMM) is built to detect whether the user's interest has drifted.
Suppose that the user's browsing behavior while browsing pages satisfies the Markov property. The following two observations are chosen to describe the user's browsing behavior: a) the browsing path sequence of the pages the user visits; b) the time interval between one page and the next. The set of all states is written $S = \{S_1, S_2, \ldots, S_n\}$, the corresponding observation set is $V = \{v_1, v_2, \ldots, v_n\}$, and the time intervals form the set $I = \{1, 2, \ldots\}$. For a given browsing behavior, the number of links in the browsing path is a random variable, so the number of observations output in a given state is represented by the set $\{1, \ldots, D\}$. The user's browsing path sequence, a two-dimensional observation sequence, is written $O = \{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ is the Web content object the user browses and $\tau_t \in I$ is the time interval between $r_t$ and $r_{t-1}$ when the user jumps from one page to another. The output probability matrix of the model is $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is on page $r_t = v \in V$ with time interval $\tau_t = q \in I$ from the previous page, and $\sum_{v,q} b_i(v, q) = 1$. $P = \{p_i(d)\}$ denotes the probability that, in a given state $i$, the number of output observations is $d \in \{1, \ldots, D\}$; this is the state duration probability matrix of the hidden semi-Markov model, and $\sum_{d} p_i(d) = 1$. The state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$.
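The parameter set $(\pi, A, B, P)$ and its normalization constraints can be written down as a small Python container. This is only a sketch of the data layout: the class name `HSMM`, the nested-dictionary representation and the state and page names in the example are assumptions, and the constraint that each row of $A$ sums to one is the usual convention rather than something stated in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class HSMM:
    """Parameters of a hidden semi-Markov model of browsing behavior."""
    pi: dict  # state -> initial probability pi_i
    A: dict   # state -> {next state -> a_ij}
    B: dict   # state -> {(page v, interval q) -> b_i(v, q)}
    P: dict   # state -> {duration d -> p_i(d)}

    def validate(self, tol=1e-9):
        """Check that every probability distribution sums to one."""
        ok = math.isclose(sum(self.pi.values()), 1.0, abs_tol=tol)
        for table in (self.A, self.B, self.P):
            for dist in table.values():
                ok = ok and math.isclose(sum(dist.values()), 1.0, abs_tol=tol)
        return ok


# Hypothetical two-state model; state and page names are invented.
model = HSMM(
    pi={"s1": 0.6, "s2": 0.4},
    A={"s1": {"s2": 1.0}, "s2": {"s1": 1.0}},
    B={"s1": {("home", 1): 0.7, ("item", 2): 0.3}, "s2": {("cart", 1): 1.0}},
    P={"s1": {1: 0.5, 2: 0.5}, "s2": {1: 1.0}},
)
is_valid = model.validate()
```

Keying $B$ by the pair (page, interval) mirrors the two-dimensional observation $(r_t, \tau_t)$ used above.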
A user interest behavior record is defined as $U_{interest} = \{user, background, history, behavior, timestamp, content\}$, where user identifies the user, for example by ID; background denotes the user's specific context factors; history denotes the user's historical purchase records; behavior identifies the specific interest behavior operation and its result; timestamp denotes the time at which the behavior was performed; and content denotes the content of the interest topic.
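The six-field record maps naturally onto a small data class. In the sketch below the field names follow the definition above, while the field types, the class name `InterestRecord`, the helper `in_time_order` and the example values are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class InterestRecord:
    """One interest behavior record U_interest of a user."""
    user: str            # user identifier, for example an ID
    background: dict     # context factors of the user
    history: list        # historical purchase records
    behavior: str        # the interest behavior operation and its result
    timestamp: datetime  # time at which the behavior was performed
    content: str         # content of the interest topic


def in_time_order(records):
    """Return the records sorted by execution time, oldest first."""
    return sorted(records, key=lambda r: r.timestamp)


# Hypothetical records for a single user.
records = [
    InterestRecord("u42", {"age": 28}, ["camera"], "purchase",
                   datetime(2014, 6, 2, 10, 30), "tripod"),
    InterestRecord("u42", {"age": 28}, ["camera"], "browse",
                   datetime(2014, 6, 1, 9, 0), "lens"),
]
ordered = in_time_order(records)
```

Sorting by timestamp gives the order in which behavior operations occurred, which is the order the access transition probabilities between operations would be computed over.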
During the user's access activity, an access transition probability $P(q_i \to q_j)$ exists between any two behavior operations; it can express the interest weight as follows:
$$P(q_i \to q_j) = P(q_j \mid q_i) = \frac{P(q_i q_j)}{P(q_i)} = \begin{cases} \dfrac{\theta_1 W_B(q_i, q_j) + \theta_2 W_{HI}(q_i, q_j) + \theta_3 W_{IB}(q_i, q_j) + \theta_4 W_L(q_i, q_j)}{\theta_1 W_B(q_i) + \theta_2 W_{HI}(q_i) + \theta_3 W_{IB}(q_i) + \theta_4 W_L(q_i)}, & i \ne j \\ 0, & i = j \end{cases} \tag{15}$$
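The access transition probability is a ratio of two weighted sums, which the following Python sketch evaluates directly. The text does not define the four weight functions $W_B$, $W_{HI}$, $W_{IB}$ and $W_L$ or how the coefficients $\theta_1, \ldots, \theta_4$ are chosen, so the sketch simply takes them as given tables; the function name, the table layout, the zero result for an empty denominator and the numbers in the example are all assumptions.

```python
WEIGHT_NAMES = ("B", "HI", "IB", "L")


def transition_probability(i, j, theta, joint, marginal):
    """Evaluate P(q_i -> q_j) as the ratio of weighted sums.

    theta:    four coefficients theta_1..theta_4
    joint:    weight name -> {(i, j): W(q_i, q_j)}
    marginal: weight name -> {i: W(q_i)}
    """
    if i == j:
        return 0.0  # the formula assigns zero probability when i = j
    numerator = sum(t * joint[name].get((i, j), 0.0)
                    for t, name in zip(theta, WEIGHT_NAMES))
    denominator = sum(t * marginal[name].get(i, 0.0)
                      for t, name in zip(theta, WEIGHT_NAMES))
    # Returning 0.0 for an empty denominator is an assumed convention.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical weights: every component contributes equally.
theta = (0.25, 0.25, 0.25, 0.25)
joint = {name: {("a", "b"): 0.2} for name in WEIGHT_NAMES}
marginal = {name: {"a": 0.4} for name in WEIGHT_NAMES}
p_ab = transition_probability("a", "b", theta, joint, marginal)
```

With equal coefficients and these example tables the ratio reduces to $0.2 / 0.4$, so `p_ab` is approximately 0.5.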
For each $q_j$ and its corresponding concept there is an observation probability distribution, namely the interest probability of that concept over all of user $u$'s accesses to $q_j$ [the symbols for the concept and the distribution are missing from the source text]. The set of access nodes contained in access sequence $i$ is $Q_i = \{q'_1, \ldots, q'_f \mid q' \in IC\}$; $Q_{i,j}$ denotes the set of all access nodes in sequence $i$ from $q_j$ onward; and a further set [symbol missing from the source text] denotes the nodes in $Q_{i,j}$ that contain the concept:
$$Q_{i,j} = \begin{cases} \{\, q'_{k+l} \mid q'_k = q_j,\ l = 0, \ldots, (f - k) \,\}, & q_j \in Q_i \\ \text{Null}, & \text{otherwise} \end{cases} \tag{16}$$
The observation probability distribution of $u$ at $q_j$ is defined as:
[Equation (17) is missing from the source text.]
Then, among all possible access sequences of user $u$, a state sequence with the maximum access probability is found, which establishes the hidden semi-Markov model of the user's interest behavior:
$$P_{\max}(\sigma_z^k) = \arg\max \prod P(q_k \to q_{k+1})\, P(\sigma_z^k \mid q_k) \tag{18}$$
In the process of detecting user interest drift, the observation sequences for the HSMM are first collected; here the user's browsing behavior data are mainly used as the observation sequence. The data are preprocessed before model training. After the model parameters are determined, the HSMM algorithm is invoked to obtain the probability that the user's interest is unchanged; this probability is computed as the average log-likelihood. When the user's interest value is within the normal range, the user's data are added to the training data set so as to update the parameters of the hidden semi-Markov model; otherwise the user is considered to have undergone interest drift. The implementation of drift detection is shown in Fig. 3.
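The thresholding logic of the drift check can be sketched in Python as follows. The sketch takes the sequence log-likelihood as a pluggable function, because a full HSMM forward computation is beyond a short example; `unigram_scorer` is a deliberately simple stand-in for that computation and is not the HSMM algorithm. Treating "within the normal range" as "average log-likelihood at or above the threshold" is an assumption, and the sketch only refreshes the threshold where the described method would re-estimate the HSMM parameters.

```python
import math
from statistics import mean


def unigram_scorer(probs):
    """Stand-in likelihood: treats observations as independent draws."""
    def log_likelihood(sequence):
        return sum(math.log(probs[x]) for x in sequence)
    return log_likelihood


class DriftDetector:
    """Flag interest drift using the mean average log-likelihood as threshold."""

    def __init__(self, log_likelihood, training_sequences):
        self.log_likelihood = log_likelihood
        self.training = [list(s) for s in training_sequences]
        self.threshold = self._mean_score()

    def _score(self, sequence):
        # Average log-likelihood per observation.
        return self.log_likelihood(sequence) / len(sequence)

    def _mean_score(self):
        return mean(self._score(s) for s in self.training)

    def has_drifted(self, sequence):
        if self._score(sequence) >= self.threshold:
            # Normal behavior: keep the data and refresh the threshold.
            self.training.append(list(sequence))
            self.threshold = self._mean_score()
            return False
        return True


# Hypothetical page categories "a" and "b" with assumed probabilities.
scorer = unigram_scorer({"a": 0.8, "b": 0.2})
detector = DriftDetector(scorer, [["a", "a", "a", "b"], ["a", "a", "b", "a"]])
```

Calling `detector.has_drifted(sequence)` on a new browsing sequence returns `False` and absorbs the sequence into the training data when it scores at or above the threshold, and returns `True` otherwise.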

Claims (4)

1. A user interest discovery method incorporating ontology context, characterized in that the method comprises the following steps:
1) Establish a user interest feature extraction model based on a second-order hidden Markov model:
First, the data that reflect user interest are collected, as follows: user source data are obtained from the client, the server and the proxy server; once obtained, the source data are preprocessed and saved in a set format for later use in mining user interest.
Second, a second-order hidden Markov model (HMM) is used to extract user interest features; the procedure comprises a training part and an extraction part;
In the training part, feature information sequences containing user interest are preprocessed to form a text document. The text is then scanned, and layout delimiters (separators, spaces, line breaks and colons) are used to convert the text sequence into a sequence of labeled text segments. Finally, the following model parameters of the second-order HMM are computed from it, as given by the formulas below:
1. initial probability distribution vector
π i = Init ( i ) Σ j = 1 N Init ( j ) , 1 ≤ i ≤ N - - - ( 1 )
Wherein, Init (i) refers in the whole training sample of mark, with state S ifor the number of initial state sequence, refer to the number summation taking all states as initial state sequence;
2. original state transition probability
a ij = C ij Σ k = 1 N C ik , 1 ≤ i , j ≤ N - - - ( 2 )
a ijk = C ijk Σ u = 1 N C iju , 1 ≤ i , j , k ≤ N - - - ( 3 )
Wherein, C ijand C ijkrepresent respectively from state S ito S jtransfer number, and the state S in t-1 moment i, t moment state S j, transferring to t+1 moment state is S knumber of times. with represent respectively from state S ito the transfer number sum of all states, and the state S in t-1 moment i, t moment state S j, transfer to the number of times sum of all states;
3. Observation emission probabilities
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{i=1}^{M} E_j(O_i)}, \quad 1 \le j \le N \tag{4}$$
$$b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{u=1}^{M} E_{ij}(O_u)}, \quad 1 \le i, j \le N, \ 1 \le k \le M \tag{5}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times observation $O_k$ is emitted when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$; $\sum_{i=1}^{M} E_j(O_i)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{u=1}^{M} E_{ij}(O_u)$ is the total number of emissions of all observations when the state is $S_i$ at time $t-1$ and $S_j$ at time $t$;
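The counting formulas (1) to (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `estimate_second_order_hmm`, the dictionary layout of the result, and the input format (each training sequence given as a list of `(state, observation)` pairs) are all chosen for the example, and no smoothing of zero counts is applied.

```python
from collections import Counter


def _normalize(counts, prefix_len):
    """Divide each count by the total of all counts sharing its key prefix."""
    totals = Counter()
    for key, n in counts.items():
        totals[key[:prefix_len]] += n
    return {key: n / totals[key[:prefix_len]] for key, n in counts.items()}


def estimate_second_order_hmm(sequences):
    """Estimate pi, a_ij, a_ijk, b_j(O_k) and b_ij(O_k) by relative frequency.

    `sequences` is a list of labeled sequences; each sequence is a list of
    (state, observation) pairs (an assumed input format).
    """
    init = Counter()                       # Init(i)
    c2, c3 = Counter(), Counter()          # C_ij and C_ijk
    e1, e2 = Counter(), Counter()          # E_j(O_k) and E_ij(O_k)
    for seq in sequences:
        if not seq:
            continue
        init[seq[0][0]] += 1
        for t, (state, obs) in enumerate(seq):
            e1[(state, obs)] += 1
            if t >= 1:
                prev = seq[t - 1][0]
                c2[(prev, state)] += 1
                e2[(prev, state, obs)] += 1
            if t >= 2:
                c3[(seq[t - 2][0], seq[t - 1][0], state)] += 1
    total_init = sum(init.values())
    if total_init == 0:
        raise ValueError("need at least one non-empty training sequence")
    return {
        "pi": {s: n / total_init for s, n in init.items()},  # formula (1)
        "a2": _normalize(c2, 1),                             # formula (2)
        "a3": _normalize(c3, 2),                             # formula (3)
        "b1": _normalize(e1, 1),                             # formula (4)
        "b2": _normalize(e2, 2),                             # formula (5)
    }
```

Keys that never occur in the training data are simply absent from the returned dictionaries, which corresponds to a probability of zero.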
The extraction part comprises two steps: (a) preprocess the text from which features are to be extracted: after scanning, use layout cues such as delimiters, spaces, line breaks and colons to convert the labeled text sequence into a sequence of labeled text segments; (b) using the second-order HMM output by the training part, apply the Viterbi algorithm: the established HMM performs user interest feature extraction, taking the processed observation sequence $O = O_1 O_2 \dots O_T$ as model input and finding the state label sequence of maximum probability; the extracted user features are the observed text labeled with the target state label;
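Step (b) can be illustrated with a small second-order Viterbi decoder in Python. This is a sketch under assumptions rather than the claimed procedure: parameters are passed as dictionaries keyed by tuples (`a2[(i, j)]`, `a3[(i, j, k)]`, `b1[(j, o)]`, `b2[(i, j, o)]`), missing keys are treated as probability zero, and the handling of the first two positions (the first observation scored with $\pi$ and $b_j$, the second with $a_{ij}$ and $b_{ij}$) is a choice made here because the text does not specify it.

```python
import math


def _log(p):
    """Logarithm that maps probability zero to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_second_order(obs, states, pi, a2, a3, b1, b2):
    """Return the most probable state sequence for `obs`."""
    T = len(obs)
    if T == 0:
        return []
    if T == 1:
        return [max(states, key=lambda s: _log(pi.get(s, 0))
                    + _log(b1.get((s, obs[0]), 0)))]
    # delta[(i, j)]: best log-score of a path ending in states i, j
    # at the two most recent time steps.
    delta = {}
    for i in states:
        for j in states:
            delta[(i, j)] = (_log(pi.get(i, 0)) + _log(b1.get((i, obs[0]), 0))
                             + _log(a2.get((i, j), 0))
                             + _log(b2.get((i, j, obs[1]), 0)))
    back = []  # back[n][(j, k)] = best state preceding the pair (j, k)
    for t in range(2, T):
        new_delta, ptr = {}, {}
        for j in states:
            for k in states:
                best_i = max(states, key=lambda i: delta[(i, j)]
                             + _log(a3.get((i, j, k), 0)))
                new_delta[(j, k)] = (delta[(best_i, j)]
                                     + _log(a3.get((best_i, j, k), 0))
                                     + _log(b2.get((j, k, obs[t]), 0)))
                ptr[(j, k)] = best_i
        delta = new_delta
        back.append(ptr)
    # Backtrack from the best final pair of states.
    path = list(max(delta, key=delta.get))
    for ptr in reversed(back):
        path.insert(0, ptr[(path[0], path[1])])
    return path
```

Given a decoded `path`, the extracted features are the observations whose label equals the target state, for example `[o for o, s in zip(obs, path) if s == target]`, where `target` is a hypothetical name for the target state label.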
2) Analyze the context information reflecting user interest: by analyzing the user's search and browsing behavior and purchase record information, derive the user's true interest over a period of time;
3) Build the user interest ontology model incorporating context: first, take several key factors that affect user interest (region, gender, age, marital status, educational background and income) as background indicators, and apply fuzzy processing to the user's purchase history and behavioral features to obtain the user's degree of interest; then, using the ontology context representation and multi-granularity partitioning, build the user interest ontology model;
4) User interest drift detection based on a hidden semi-Markov model (HSMM):
Two observed values are chosen to describe the user's browsing behavior: a) the browse path sequence of the web pages visited by the user; b) the time interval between leaving one web page and arriving at another. The set of all states is expressed as $S = \{S_1, S_2, \dots, S_n\}$, the corresponding observation set as $V = \{v_1, v_2, \dots, v_n\}$, and the time intervals as the set $I = \{1, 2, \dots\}$. For a given browsing behavior of a user, the number of links in its browse path is a random variable, so the number of observations output in a given state for this browsing behavior is expressed as the set $\{1, \dots, D\}$. The user's browse path sequence is expressed as the two-dimensional observation sequence $O = \{(r_1, \tau_1), \dots, (r_T, \tau_T)\}$, where $r_t \in V$ denotes the web content object browsed by the user and $\tau_t \in I$ denotes the time interval between $r_t$ and $r_{t-1}$, i.e. the time taken to jump from one page to the next. The output probability matrix of the model is $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is at page $r_t = v \in V$ with time interval $\tau_t = q \in I$ from the previous page, and $\sum_{v,q} b_i(v, q) = 1$. $P = \{p_i(d)\}$ denotes the probability that, in a given state $i$, the number of output observations is $d \in \{1, \dots, D\}$; this is the state duration probability matrix of the hidden semi-Markov model, and $\sum_d p_i(d) = 1$. The state transition probability matrix is $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$;
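The four parameter sets $\pi$, $A$, $B$ and $P$ and their normalization conditions can be held in a small Python container, sketched below. The class name `HSMMParams`, the dictionary layout and the method `is_normalized` are assumptions made for illustration. The conditions on $B$ and $P$ are those stated in the text; the checks that $\pi$ and each row of $A$ sum to 1 are the usual stochastic constraints and are assumed here.

```python
import math
from dataclasses import dataclass


@dataclass
class HSMMParams:
    pi: dict  # state -> initial probability pi_i
    A: dict   # (i, j) -> transition probability a_ij
    B: dict   # state -> {(page, interval): b_i(v, q)}
    P: dict   # state -> {d: p_i(d)} for d in 1..D

    def is_normalized(self, tol=1e-9):
        """Check that every probability distribution sums to 1."""
        def is_one(x):
            return math.isclose(x, 1.0, abs_tol=tol)

        states = list(self.pi)
        if not is_one(sum(self.pi.values())):
            return False
        for i in states:
            row = sum(self.A.get((i, j), 0.0) for j in states)
            if not (is_one(row)
                    and is_one(sum(self.B[i].values()))
                    and is_one(sum(self.P[i].values()))):
                return False
        return True
```

A check of this kind is a cheap way to catch mis-specified parameters before any likelihood computation is run on them.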
An important interest behavior record of a user is defined as: $U_{interest}$ = {user, background, history, behavior, timestamp, content}, where user identifies the user, e.g. by ID; background denotes the user's specific context factors; history denotes the user's historical purchase records; behavior identifies the result of a specific interest behavior operation; timestamp denotes the execution time of the user behavior; content denotes the interest topic content;
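As an illustration, the six-field record could be represented in Python as a dataclass. The field names follow the record definition; the class name `InterestRecord` and the concrete field types are assumptions, since the text does not fix a data format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class InterestRecord:
    user: str            # user identifier, e.g. an ID
    background: dict     # context factors such as region or age
    history: list        # historical purchase records
    behavior: str        # result of the interest behavior operation
    timestamp: datetime  # execution time of the behavior
    content: str         # interest topic content


# Example values below are invented for the illustration.
record = InterestRecord(
    user="u001",
    background={"region": "Zhejiang", "age": 30},
    history=["book", "laptop"],
    behavior="search",
    timestamp=datetime(2014, 6, 17, 9, 30),
    content="hidden Markov models",
)
```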
During the user's access process, between any two behavior operations there is an access transition probability $P(q_i \to q_j)$, which expresses the interest weight as follows:
$$P(q_i \to q_j) = P(q_j \mid q_i) = \frac{P(q_i q_j)}{P(q_i)} = \begin{cases} \dfrac{\theta_1 W_B(q_i, q_j) + \theta_2 W_{HI}(q_i, q_j) + \theta_3 W_{IB}(q_i, q_j) + \theta_4 W_L(q_i, q_j)}{\theta_1 W_B(q_i) + \theta_2 W_{HI}(q_i) + \theta_3 W_{IB}(q_i) + \theta_4 W_L(q_i)}, & i \ne j \\ 0, & i = j \end{cases} \tag{6}$$
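Formula (6) is a ratio of two weighted sums and can be sketched in Python as below. The function name and argument layout are assumptions: `pair_weights` holds the four values $W_B, W_{HI}, W_{IB}, W_L$ evaluated on $(q_i, q_j)$, `node_weights` holds the same four weights evaluated on $q_i$ alone, and `thetas` holds $\theta_1$ to $\theta_4$. How the weights themselves are obtained is not covered by this sketch, and returning 0.0 for a zero denominator is a choice made here.

```python
def access_transition_probability(pair_weights, node_weights, thetas, same_node=False):
    """Evaluate formula (6) for one pair of behavior operations."""
    if same_node:  # the case i = j
        return 0.0
    numerator = sum(t * w for t, w in zip(thetas, pair_weights))
    denominator = sum(t * w for t, w in zip(thetas, node_weights))
    if denominator == 0:
        return 0.0  # assumed convention when q_i carries no weight
    return numerator / denominator
```

For example, with `thetas = (0.4, 0.3, 0.2, 0.1)`, pair weights `(0.2, 0.1, 0.3, 0.4)` and node weights all equal to `0.5`, the numerator is 0.21 and the denominator 0.5, giving approximately 0.42.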
For each $q_j$ and its corresponding concept there is an observed-value probability distribution, namely the probability of interest in that concept over all of user $u$'s accesses to $q_j$. The set of access nodes contained in the $i$-th access record is $Q_i = \{q'_1, \dots, q'_f \mid q' \in IC\}$; $Q_{i,j}$ denotes the set of all access nodes in that record from $q_j$ onward, and a further set denotes the nodes of $Q_{i,j}$ that contain the concept:
$$Q_{i,j} = \begin{cases} \{\, q'_{k+l} \mid q'_k = q_j,\ l = 0, \dots, (f - k) \,\}, & q_j \in Q_i \\ \mathrm{Null}, & \text{otherwise} \end{cases} \tag{7}$$
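Formula (7) can be read as taking the tail of an access sequence starting at $q_j$. A minimal Python sketch follows, assuming that $Q_i$ is given as an ordered list and that Null is represented by `None`; the function name `nodes_from` is hypothetical. When $q_j$ occurs more than once, the union over all occurrences equals the tail from the first occurrence, which is what the slice returns.

```python
def nodes_from(q_i, q_j):
    """Return Q_{i,j}: the nodes of q_i from q_j onward, or None if q_j is absent."""
    if q_j not in q_i:
        return None
    k = q_i.index(q_j)  # position of the first occurrence of q_j
    return q_i[k:]      # corresponds to l = 0, ..., f - k
```

Because $l$ starts at 0, the returned list includes $q_j$ itself.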
The observed-value probability distribution of user $u$ at $q_j$ is defined as:
Then, among all possible access sequences of user $u$, a state sequence is found and the hidden semi-Markov model of user interest behavior is established such that it has the maximum access probability:
$$P_{\max}(\sigma_z^k) = \arg\max \prod P(q_k \to q_{k+1})\, P(\sigma_z^k \mid q_k) \tag{9}$$
In detecting user interest drift, the observation sequences for the HSMM are first collected, and the data are preprocessed before model training. Once the model parameters have been determined, the HSMM algorithm is called to obtain the probability that the user's interest is unchanged, computed as the average log-likelihood. If the user's interest value lies within the normal range, the user's data are added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift.
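The decision rule in this step can be sketched in Python as follows. The sketch does not implement the HSMM likelihood itself: `log_likelihood` is a hypothetical callable standing in for that computation, and the function names, the normal range and the use of a plain list as the training set are all assumptions made for illustration.

```python
def average_log_likelihood(log_likelihood, observations):
    """Log-likelihood of the sequence divided by its length."""
    if not observations:
        raise ValueError("observation sequence must not be empty")
    return log_likelihood(observations) / len(observations)


def check_interest_drift(observations, log_likelihood, normal_range, training_data):
    """Return True if the user is considered to have drifted.

    When the score lies inside `normal_range`, the observations are appended
    to `training_data` so that the model can later be re-estimated on them.
    """
    low, high = normal_range
    score = average_log_likelihood(log_likelihood, observations)
    if low <= score <= high:
        training_data.append(observations)
        return False
    return True
```

Re-estimating the model parameters on the enlarged training set is left to the caller, since it depends on the training procedure in use.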
2. The user interest mining method incorporating ontology context according to claim 1, characterized in that: in step 1), user personalization information is obtained in two ways: (a) through online surveys in which users themselves participate; (b) by tracking user behavior to obtain the user's interest information, using a feature extraction method on user behavior data.
3. The user interest mining method incorporating ontology context according to claim 1 or 2, characterized in that: in step 2), the user's behavior information comprises the user's search keywords, historical purchase records and historical browsing behavior.
4. The user interest mining method incorporating ontology context according to claim 1 or 2, characterized in that: in step 3), according to the user's interest context information, when constructing the user ontology context, the user context is divided into user personal context, user environment context and user device context. The ontology takes the form of a hierarchical concept tree in which each node represents an element of the user context, thereby building the context ontology tree.
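A hierarchical concept tree of this kind can be sketched in Python with a small node class. The class name `ContextNode`, its methods and the example leaf `age` are assumptions made for illustration; only the three top-level branches come from the claim.

```python
from dataclasses import dataclass, field


@dataclass
class ContextNode:
    name: str
    children: list = field(default_factory=list)

    def add(self, name):
        """Attach a child node and return it so calls can be chained."""
        child = ContextNode(name)
        self.children.append(child)
        return child

    def paths(self, prefix=()):
        """Yield the path from the root to every leaf as a tuple of names."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


root = ContextNode("user context")
personal = root.add("personal context")
personal.add("age")  # example leaf, chosen for the illustration
root.add("environment context")
root.add("device context")
```

Each root-to-leaf path then names one element of the user context at its full level of granularity.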
CN201410269562.6A 2014-06-17 2014-06-17 A kind of Users' Interests Mining method for incorporating body situation Expired - Fee Related CN104008203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410269562.6A CN104008203B (en) 2014-06-17 2014-06-17 A kind of Users' Interests Mining method for incorporating body situation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410269562.6A CN104008203B (en) 2014-06-17 2014-06-17 A kind of Users' Interests Mining method for incorporating body situation

Publications (2)

Publication Number Publication Date
CN104008203A true CN104008203A (en) 2014-08-27
CN104008203B CN104008203B (en) 2018-04-17

Family

ID=51368860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410269562.6A Expired - Fee Related CN104008203B (en) 2014-06-17 2014-06-17 A kind of Users' Interests Mining method for incorporating body situation

Country Status (1)

Country Link
CN (1) CN104008203B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100257131A1 (en) * 2007-12-28 2010-10-07 Kun-Oh Kim Apparatus and method for controlling hybrid motor
CN102043793A (en) * 2009-10-09 2011-05-04 卢健华 Knowledge-service-oriented recommendation method
CN103514289A (en) * 2013-10-08 2014-01-15 北京百度网讯科技有限公司 Method and device for building interest entity base

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718471A (en) * 2014-12-03 2016-06-29 中国科学院声学研究所 User preference modeling method, system, and user preference evaluation method and system
CN112948672A (en) * 2015-05-26 2021-06-11 谷歌有限责任公司 Predicting user needs for a particular context
CN106055661B (en) * 2016-06-02 2017-11-17 福州大学 More interest resource recommendations based on more Markov chain models
CN106055661A (en) * 2016-06-02 2016-10-26 福州大学 Multi-interest resource recommendation method based on multi-Markov-chain model
CN106776757B (en) * 2016-11-15 2020-03-27 中国银行股份有限公司 Method and device for indicating user to complete online banking operation
CN106776757A (en) * 2016-11-15 2017-05-31 中国银行股份有限公司 User completes the indicating means and device of Net silver operation
CN106651517A (en) * 2016-12-20 2017-05-10 广东技术师范学院 Hidden semi-Markov model-based drug recommendation method
CN106651517B (en) * 2016-12-20 2021-11-30 广东技术师范大学 Drug recommendation method based on hidden semi-Markov model
CN109388661B (en) * 2017-08-02 2020-04-21 创新先进技术有限公司 Model training method and device based on shared data
US11106804B2 (en) 2017-08-02 2021-08-31 Advanced New Technologies Co., Ltd. Model training method and apparatus based on data sharing
US11106802B2 (en) 2017-08-02 2021-08-31 Advanced New Technologies Co., Ltd. Model training method and apparatus based on data sharing
CN107609063A (en) * 2017-08-29 2018-01-19 重庆邮电大学 A kind of the mobile phone application commending system and its method of multi-tag classification
CN107609063B (en) * 2017-08-29 2020-03-17 重庆邮电大学 Multi-label classified mobile phone application recommendation system and method thereof
WO2019120037A1 (en) * 2017-12-18 2019-06-27 Oppo广东移动通信有限公司 Model construction method, network resource preloading method and apparatus, medium, and terminal
CN108038222B (en) * 2017-12-22 2022-01-11 冶金自动化研究设计院 System of entity-attribute framework for information system modeling and data access
CN108038222A (en) * 2017-12-22 2018-05-15 冶金自动化研究设计院 System for Information System Modeling and entity-property frame of data access
CN108596205A (en) * 2018-03-20 2018-09-28 重庆邮电大学 Behavior prediction method is forwarded based on the microblogging of region correlation factor and rarefaction representation
CN108596205B (en) * 2018-03-20 2022-02-11 重庆邮电大学 Microblog forwarding behavior prediction method based on region correlation factor and sparse representation
CN108809955B (en) * 2018-05-22 2019-05-24 南瑞集团有限公司 A kind of power consumer behavior depth analysis method based on hidden Markov model
CN108809955A (en) * 2018-05-22 2018-11-13 南瑞集团有限公司 A kind of power consumer behavior depth analysis method based on hidden Markov model
CN109741146A (en) * 2019-01-04 2019-05-10 平安科技(深圳)有限公司 Products Show method, apparatus, equipment and storage medium based on user behavior
CN109933741B (en) * 2019-02-27 2020-06-23 京东数字科技控股有限公司 Method, device and storage medium for extracting user network behavior characteristics
CN109933741A (en) * 2019-02-27 2019-06-25 京东数字科技控股有限公司 User network behaviors feature extracting method, device and storage medium
CN110162553A (en) * 2019-05-21 2019-08-23 南京邮电大学 Users' Interests Mining method based on attention-RNN
CN110297817A (en) * 2019-06-25 2019-10-01 哈尔滨工业大学 A method of the structure of knowledge is constructed based on personalized Bayes's knowledge tracing model
CN110866542A (en) * 2019-10-17 2020-03-06 西安交通大学 Depth representation learning method based on feature controllable fusion
CN110866542B (en) * 2019-10-17 2021-11-19 西安交通大学 Depth representation learning method based on feature controllable fusion
CN114169869A (en) * 2022-02-14 2022-03-11 北京大学 Attention mechanism-based post recommendation method and device
CN114169869B (en) * 2022-02-14 2022-06-07 北京大学 Attention mechanism-based post recommendation method and device

Also Published As

Publication number Publication date
CN104008203B (en) 2018-04-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180417