CN104008203A - User interest discovering method with ontology situation blended in - Google Patents
- Publication number
- CN104008203A CN201410269562.6A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A user interest discovery method that incorporates an ontology context comprises the following steps. First, a user interest feature extraction model based on a second-order hidden Markov model is constructed for the complex, multi-dimensional Web user interest behavior data of an e-commerce website. Second, the context information that reflects user interests is analyzed, including the user's individual information, environment information and device information. Third, a user interest model based on a context ontology is constructed, and fuzzy logic is used to measure and express the degree of interest associated with the user's individual information. Finally, a user interest drift detection method based on a hidden semi-Markov model builds a model from the user's browsing paths and takes the mean of the average log-likelihoods of the sequences as the threshold for judging whether the interest has drifted. The method constructs an interest model that meets user needs so that personalized recommendation services can be provided, offers an effective means of improving user satisfaction, and has good application value.
Description
Technical field
The present invention relates to the fields of data mining and ontology, and in particular to a user interest mining method that is especially suited to personalized user information services.
Background art
Network applications are becoming increasingly complex and data volumes keep growing, which makes tasks such as e-commerce and website design more complicated and burdensome. On the basis of the existing user information, web page structure therefore needs to be adjusted dynamically according to behavioral aspects such as the user's access interests, access times and visit frequency, so that e-commerce can be targeted, user needs can be met and personalized services can be provided. Personalized information service on the Internet automatically organizes and adjusts information according to each user's characteristics, interests and preferences, and relies on fast, efficient and accurate information acquisition to address problems such as every user being served the same information. Against this background, how to understand users' information needs accurately amid rapidly expanding information, and how to build user models that characterize network users' features, interests, goals and behavioral preferences and use them to predict user behavior, has become a difficult problem in providing better personalized services. At the same time, how to detect user interest drift promptly and accurately, and how to build a dynamically updated user interest model that meets the personalized information needs of different users, has become a key issue for personalized information services.
Summary of the invention
To overcome the deficiency that the interest models produced by existing data mining approaches cannot meet user needs and therefore cannot support personalized recommendation services, the present invention provides a user interest mining method that incorporates an ontology context. The method builds an interest model that meets user needs so as to provide personalized recommendation services, and offers an effective means of improving user satisfaction.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A user interest mining method incorporating an ontology context comprises the following steps:
1) Establish a user interest feature extraction model based on a second-order hidden Markov model:
First, data that reflect user interests are collected as follows: user source data are obtained from the client, the server and the proxy server; once obtained, the source data are preprocessed and stored in a set format for later use in mining user interests.
Second, a second-order hidden Markov model is used to extract user interest features; this comprises a training part and an extraction part;
The training part preprocesses the feature information sequences that contain user interests to form text documents. After the text is scanned, layout delimiters (separators, spaces, line feeds and colons) are used to convert the tagged text sequence into a sequence of tagged text fragments. Finally, the following parameters of the second-order HMM are computed, using the formulas below:
1. Initial probability distribution vector:
$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j} \mathrm{Init}(j)}$$
where $\mathrm{Init}(i)$ is the number of sequences in the whole tagged training sample whose initial state is $S_i$, and $\sum_{j} \mathrm{Init}(j)$ is the total number of sequences counted over all initial states;
2. State transition probabilities:
$$a_{ij} = \frac{C_{ij}}{\sum_{k} C_{ik}}, \qquad a_{ijk} = \frac{C_{ijk}}{\sum_{l} C_{ijl}}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$, $S_j$ at time $t$ and $S_k$ at time $t+1$; $\sum_{k} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{l} C_{ijl}$ is the total number of transitions to all states from state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
3. Observation emission probabilities:
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{m} E_j(O_m)}, \qquad b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{m} E_{ij}(O_m)}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times observation $O_k$ is emitted with state $S_i$ at time $t-1$ and state $S_j$ at time $t$; $\sum_{m} E_j(O_m)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{m} E_{ij}(O_m)$ is the total number of emissions of all observations with state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
The extraction part comprises two steps: (a) the text from which features are to be extracted is preprocessed; after the text is scanned, layout delimiters (separators, spaces, line feeds and colons) are used to convert the text sequence into a sequence of text fragments; (b) the second-order HMM output by the training part is combined with the Viterbi algorithm to extract user interest features: the processed observation sequence $O = O_1 O_2 \ldots O_T$ is taken as the model input, the state label sequence with the maximum probability is found, and the extracted user features are the observed text fragments tagged with the target state labels;
2) Analyze the context information that reflects user interests: by analyzing the user's searches, browsing behavior and purchase records, the user's true interests over a period of time are derived;
3) Build the user interest ontology model that incorporates context: first, the key factors that affect user interests (region, gender, age, marital status, educational background and income) are taken as background indicators, and fuzzy processing is applied to them together with the user's purchase history and behavioral features to obtain the user's interest level; then the ontology context representation is adopted and the user interest ontology model is built through multi-granularity partitioning;
4) User interest drift detection based on a hidden semi-Markov model:
Two observations are chosen to describe the user's browsing behavior: (a) the sequence of pages on the user's browsing path; (b) the time interval between one page and the next. The set of all states is $S=\{S_1, S_2, \ldots, S_n\}$, the corresponding observation set is $V=\{v_1, v_2, \ldots, v_n\}$, and time intervals take values in the set $I=\{1, 2, \ldots\}$. For a given browsing behavior of a user, the number of links on the browsing path is a random variable, so the number of observations emitted in a given state takes values in the set $\{1, \ldots, D\}$. The user's browsing path is expressed as a two-dimensional observation sequence $O=\{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ is the page content object browsed by the user and $\tau_t \in I$ is the time interval between pages $r_{t-1}$ and $r_t$. The output probability matrix of the model is $B=\{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is at page $r_t = v \in V$ with time interval $\tau_t = q \in I$ since the previous page, and $\sum_{v,q} b_i(v, q) = 1$. $P=\{p_i(d)\}$ gives the probability that, in a given state $i$, the number of emitted observations is $d \in \{1, \ldots, D\}$; it is the state-duration probability matrix of the hidden semi-Markov model and satisfies $\sum_{d} p_i(d) = 1$. The state transition probability matrix is $A=\{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi=\{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$;
An important interest behavior record of a user is defined as $U_{interest}=\{user, background, history, behavior, timestamp, content\}$, where user identifies the user, for example by ID; background represents the user's specific context factors; history represents the user's historical purchase records; behavior identifies the result of the specific interest behavior operation; timestamp is the execution time of the user behavior; and content is the interest topic content;
During the user's access operations, there is an access transition probability $P(q_i \rightarrow q_j)$ between any two behavior operations, and the interest weight is expressed as follows:
For each $q_j$ and its corresponding concept there is an observation probability distribution, that is, the probability of interest in that concept over all of the accesses of user $u$ to $q_j$. Let the set of access nodes contained in the $i$-th access sequence be $Q_i=\{q'_1, \ldots, q'_f \mid q' \in IC\}$, let $Q_{i,j}$ denote the set of all access nodes that follow $q_j$ in that sequence, and consider the set of nodes in $Q_{i,j}$ that contain the concept:
The observation probability distribution of $u$ at $q_j$ is defined as:
Then, among all possible access sequences of user $u$, a state sequence is found and the hidden semi-Markov model of the user's interest behavior is established so that it has the maximum access probability:
In detecting user interest drift, the observation sequences for the HSMM are collected first and the data are preprocessed before model training. After the model parameters have been determined, the HSMM algorithm is invoked to obtain the probability that the user's interest is unchanged; this probability is computed as the average log-likelihood. When the user's interest value lies within the normal range, the user's data are added to the training data set in order to update the parameters of the hidden semi-Markov model; otherwise the user is considered to have undergone interest drift.
Further, in step 1), there are two ways of obtaining users' personalized information: (a) collection with the user's own participation, through online surveys; (b) obtaining the user's interest information by tracking user behavior. The feature extraction method for user behavior data is adopted.
Further, in step 2), the user's behavioral information comprises the user's search keywords, historical purchase records and historical browsing behavior.
Further still, in step 3), when the user context ontology is built from the user's interest context information, the user context is divided into the user's individual context, the user's environment context and the user's device context. The ontology takes the form of a hierarchical concept tree in which each node represents one element of the user context, forming the context ontology tree.
The technical concept of the present invention is as follows: in the field of user-oriented personalized services, taking into account the concept drift involved in the method and the problem scenario, a user interest mining method incorporating an ontology context is proposed. It builds an interest model that meets user needs so as to provide personalized recommendation services, and offers an effective means of improving user satisfaction.
On this basis, the present invention takes personalized user information services as its research object, introduces data mining and ontology, takes full account of users' individual characteristics, and proposes a user interest mining method incorporating an ontology context that effectively meets users' personalized service needs.
First, for the complex, multi-dimensional Web user interest behavior data of an e-commerce website, a user interest feature extraction model based on a second-order hidden Markov model (Second-Order Hidden Markov Model) is built. Second, the context information that reflects user interests is analyzed, including the user's individual information, environment information and device information. Third, a user interest model based on a context ontology is built, and fuzzy logic is used to measure and express the degree of interest associated with the user's individual information. Finally, a user interest drift detection method based on a hidden semi-Markov model (Hidden Semi-Markov Model, HSMM) builds a model from the user's browsing paths and takes the mean of the average log-likelihoods of the sequences as the threshold for judging whether interest drift has occurred.
The beneficial effects of the present invention are as follows: the invention builds an interest model that meets user needs so as to provide personalized recommendation services, offers an effective means of improving user satisfaction, and has good application value.
Brief description of the drawings
Fig. 1 is a flow chart of the algorithm for interest feature extraction based on a second-order HMM.
Fig. 2 shows the construction flow of the user context ontology.
Fig. 3 is a block diagram of interest drift detection.
Embodiment
The invention is further described below with reference to the accompanying drawings.
With reference to Fig. 1, Fig. 2 and Fig. 3, a user interest mining method incorporating an ontology context comprises the following steps:
1) Establish the user interest feature extraction model based on a second-order hidden Markov model: Web information extraction (Web Information Extraction) belongs to the category of Web content mining; it is a class of information extraction methods that take the Web as the information source and extract data from semi-structured Web documents. This step comprises collecting user data and building the user interest feature extraction model.
To build the user interest model, data that reflect user interests must first be collected. A user usually generates a large amount of data, including registration information, log information, page text content, website topology, behavioral data and page hyperlink information. These data can be obtained from data sources such as the client, the server and the proxy server; once obtained, they can be preprocessed and stored in a suitable format for later use in mining user interests. In summary, there are two main ways of obtaining users' personalized information: (a) collection with the user's own participation, through online surveys. This approach captures the user's interests and information needs directly, but requires the user's active cooperation; (b) obtaining the user's interest information by tracking user behavior. Data obtained in the first way, such as registration information, are supplied directly by the user through forms and passed into the back-end database, so extracting interest features from them is relatively straightforward, whereas data from which interests are inferred by tracking the user's implicit behavior cannot be obtained directly. The feature extraction method for user behavior data is therefore the main one adopted here.
Second, user interest feature extraction belongs to the category of text information extraction. Information extraction has become an important direction in natural language processing, and its theory continues to develop. Current information extraction models fall mainly into three classes: dictionary-based models; rule-based models, such as ontologies; and statistical models, such as the hidden Markov model (HMM). Because the HMM has a statistical foundation well suited to natural language processing, together with advantages such as robust extraction, high precision, ease of construction and strong adaptability, it has attracted growing attention from researchers. Here a second-order hidden Markov model is used to extract user interest features; the flow chart is shown in Fig. 1. It comprises two main parts, namely the training part and the extraction part.
The training part preprocesses feature information sequences that contain user interests to form text documents. After the text is scanned, layout delimiters such as separators, spaces, line feeds and colons are used to convert the tagged text sequence into a sequence of tagged text fragments. Finally, the following parameters of the second-order HMM are computed, using the formulas below:
1. Initial probability distribution vector:
$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j} \mathrm{Init}(j)}$$
where $\mathrm{Init}(i)$ is the number of sequences in the whole tagged training sample whose initial state is $S_i$, and $\sum_{j} \mathrm{Init}(j)$ is the total number of sequences counted over all initial states.
2. State transition probabilities:
$$a_{ij} = \frac{C_{ij}}{\sum_{k} C_{ik}}, \qquad a_{ijk} = \frac{C_{ijk}}{\sum_{l} C_{ijl}}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$, $S_j$ at time $t$ and $S_k$ at time $t+1$; $\sum_{k} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{l} C_{ijl}$ is the total number of transitions to all states from state $S_i$ at time $t-1$ and state $S_j$ at time $t$.
3. Observation emission probabilities:
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{m} E_j(O_m)}, \qquad b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{m} E_{ij}(O_m)}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times observation $O_k$ is emitted with state $S_i$ at time $t-1$ and state $S_j$ at time $t$; $\sum_{m} E_j(O_m)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{m} E_{ij}(O_m)$ is the total number of emissions of all observations with state $S_i$ at time $t-1$ and state $S_j$ at time $t$.
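The counting scheme described for the training part can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names, the dictionary keys (`pi`, `a1`, `a2`, `b1`, `b2`) and the representation of a tagged sample as a list of `(state, observation)` pairs are assumptions made for the example.

```python
from collections import Counter


def _normalize(counts, prefix_len):
    """Divide each count by the total of all counts that share its key prefix."""
    totals = Counter()
    for key, c in counts.items():
        totals[key[:prefix_len]] += c
    return {key: c / totals[key[:prefix_len]] for key, c in counts.items()}


def estimate_second_order_hmm(tagged_sequences):
    """Relative-frequency estimates from sequences of (state, observation) pairs."""
    init = Counter()  # Init(i): sequences starting in state i
    c1 = Counter()    # C_ij: transitions i -> j
    c2 = Counter()    # C_ijk: i at t-1, j at t, k at t+1
    e1 = Counter()    # E_j(O_k): emissions of O_k in state j
    e2 = Counter()    # E_ij(O_k): emissions of O_k in j, preceded by i
    for seq in tagged_sequences:
        states = [s for s, _ in seq]
        if states:
            init[(states[0],)] += 1
        for t, (s, o) in enumerate(seq):
            e1[(s, o)] += 1
            if t >= 1:
                c1[(states[t - 1], s)] += 1
                e2[(states[t - 1], s, o)] += 1
            if t >= 2:
                c2[(states[t - 2], states[t - 1], s)] += 1
    return {
        "pi": _normalize(init, 0),
        "a1": _normalize(c1, 1),
        "a2": _normalize(c2, 2),
        "b1": _normalize(e1, 1),
        "b2": _normalize(e2, 2),
    }


# Hypothetical tagged samples: "T" = target state, "P" = other state.
samples = [
    [("T", "a"), ("P", "b"), ("T", "a")],
    [("T", "a"), ("T", "c")],
]
params = estimate_second_order_hmm(samples)
```

Each returned table is keyed by a tuple whose last element is the predicted item, so for example `params["a2"][("T", "P", "T")]` is the estimated probability of moving to `T` after the state pair `T`, `P`. Unseen events simply have no entry; a practical system would probably add smoothing, which the text does not specify.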
The extraction part comprises two steps: (a) the text from which features are to be extracted is preprocessed; after the text is scanned, layout delimiters such as separators, spaces, line feeds and colons are used to convert the text sequence into a sequence of text fragments; (b) the second-order HMM output by the training part is combined with the Viterbi algorithm. The established HMM is applied to extract user interest features: the processed observation sequence $O = O_1 O_2 \ldots O_T$ is taken as the model input, the state label sequence with the maximum probability is found, and the extracted user features are the observed text fragments tagged with the target state labels.
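The two extraction steps can be illustrated with the following Python sketch. It is written under several assumptions that are not stated in the text: fragments are split only on whitespace and colons, the first observation is scored with the first-order tables, unseen events receive a small probability floor, and the parameter tables are dictionaries keyed by tuples. The names `split_fragments` and `viterbi2` are hypothetical.

```python
import math
import re


def split_fragments(text):
    """Split raw text into fragments on whitespace (incl. line feeds) and colons."""
    return [f for f in re.split(r"[\s:]+", text) if f]


def viterbi2(obs, states, pi, a1, a2, b1, b2, floor=1e-12):
    """Most probable state sequence under a second-order HMM (log domain)."""
    def lg(table, key):
        return math.log(max(table.get(key, 0.0), floor))

    if not obs:
        return []
    if len(obs) == 1:
        return [max(states, key=lambda s: lg(pi, (s,)) + lg(b1, (s, obs[0])))]
    # delta[(i, j)]: best log-probability of a path ending in i (t-1), j (t).
    delta = {
        (i, j): lg(pi, (i,)) + lg(b1, (i, obs[0]))
        + lg(a1, (i, j)) + lg(b2, (i, j, obs[1]))
        for i in states for j in states
    }
    backptrs = []
    for o in obs[2:]:
        new_delta, ptr = {}, {}
        for j in states:
            for k in states:
                best_i = max(states,
                             key=lambda i: delta[(i, j)] + lg(a2, (i, j, k)))
                new_delta[(j, k)] = (delta[(best_i, j)]
                                     + lg(a2, (best_i, j, k))
                                     + lg(b2, (j, k, o)))
                ptr[(j, k)] = best_i
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state pair.
    j, k = max(delta, key=delta.get)
    path = [j, k]
    for ptr in reversed(backptrs):
        path.insert(0, ptr[(path[0], path[1])])
    return path
```

The fragments whose decoded label equals the target state are then taken as the extracted interest features. The search keeps one score per pair of consecutive states, which is what makes the decoder second-order.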
2) Analyze the context information that reflects user interests: a network user's interest characteristics are influenced mainly by internal and external factors related to the user's interests. Internal factors include gender, age, occupation, personality, education and income; external factors include cultural background, social environment and family background. These many internal and external factors give rise to the different behaviors of network users, and for this reason users differ in many respects, including the level and direction of their interest in commodities.
A user's interests are usually reflected in the user's own behavior: a user who is interested in something develops a certain tendency toward it, and the user's needs and interests are recorded in the behavioral information. The user's true interests over a period of time can therefore be derived by analyzing information such as the user's searches, browsing behavior and purchase records. Here, the user's behavioral information mainly comprises the user's search keywords, historical purchase records and historical browsing behavior.
3) Build the user interest ontology model that incorporates context: first, the key factors that affect user interests (region, gender, age, marital status, educational background and income) are taken as background indicators, and fuzzy processing is applied to them together with the user's purchase history and behavioral features to obtain the user's interest level; then the ontology context representation is adopted and the user interest ontology model is built through multi-granularity partitioning. The flow chart for building the user context ontology model is shown in Fig. 2.
When the user context ontology is built from the user's interest context information, the user context is divided into the user's individual context, the user's environment context and the user's device context. The ontology usually takes the form of a hierarchical concept tree in which each node represents one element of the user context, forming the context ontology tree.
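A hierarchical concept tree of this kind, together with one possible form of fuzzy processing, might look like the Python sketch below. The text does not specify the membership functions or the score being fuzzified, so the triangular functions, the three linguistic levels and the normalised behavior score in [0, 1] are purely illustrative assumptions, as are the class and function names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextNode:
    """One element of the user context in the hierarchical concept tree."""
    name: str
    children: list[ContextNode] = field(default_factory=list)

    def add(self, name):
        child = ContextNode(name)
        self.children.append(child)
        return child

    def paths(self, prefix=()):
        """Yield the root-to-leaf paths of the tree as tuples of names."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzy_interest(score):
    """Map a normalised behavior score in [0, 1] to fuzzy interest levels."""
    return {
        "low": triangular(score, -0.5, 0.0, 0.5),
        "medium": triangular(score, 0.0, 0.5, 1.0),
        "high": triangular(score, 0.5, 1.0, 1.5),
    }


# Three top-level contexts named in the text; the individual factors are the
# background indicators listed in step 3).
root = ContextNode("user_context")
individual = root.add("individual")
for factor in ("region", "gender", "age", "marital_status", "education", "income"):
    individual.add(factor)
root.add("environment")
root.add("device")
```

Finer granularity is obtained by adding children under any node, for example age bands under `age`; the fuzzy degrees could then be attached to the leaves as interest levels.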
4) User interest drift detection based on a hidden semi-Markov model: a user's online shopping behavior is a complex process influenced by many individual factors, such as the browsing objective, cultural background and hobbies. The user's interest is therefore considered in terms of context factors, user behavior and interest content, and a hidden semi-Markov model (HSMM) is established to detect whether the user's interest has drifted.
Suppose that the user's browsing behavior satisfies the Markov property while the user browses web pages. The following two observations are chosen to describe the user's browsing behavior: (a) the sequence of pages on the user's browsing path; (b) the time interval between one page and the next. The set of all states is $S=\{S_1, S_2, \ldots, S_n\}$, the corresponding observation set is $V=\{v_1, v_2, \ldots, v_n\}$, and time intervals take values in the set $I=\{1, 2, \ldots\}$. For a given browsing behavior of a user, the number of links on the browsing path is a random variable, so the number of observations emitted in a given state takes values in the set $\{1, \ldots, D\}$. The user's browsing path is expressed as a two-dimensional observation sequence $O=\{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ is the page content object browsed by the user and $\tau_t \in I$ is the time interval between pages $r_{t-1}$ and $r_t$. The output probability matrix of the model is $B=\{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is at page $r_t = v \in V$ with time interval $\tau_t = q \in I$ since the previous page, and $\sum_{v,q} b_i(v, q) = 1$. $P=\{p_i(d)\}$ gives the probability that, in a given state $i$, the number of emitted observations is $d \in \{1, \ldots, D\}$; it is the state-duration probability matrix of the hidden semi-Markov model and satisfies $\sum_{d} p_i(d) = 1$. The state transition probability matrix is $A=\{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is $\pi=\{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$.
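As an illustration of how the two-dimensional observation sequence might be built from a click log, the Python sketch below pairs each visited page with a discretised interval since the previous page, and checks the normalisation constraints stated for B, P and A. The bin width, the cap on the interval index and the choice to skip the first page (which has no preceding interval) are assumptions of this example, not details given in the text.

```python
import math


def to_observations(visits, bin_seconds=10, max_bin=30):
    """Turn [(page, time_in_seconds), ...] into [(r_t, tau_t), ...].

    tau_t is the interval since the previous page, discretised into bins of
    bin_seconds and clipped to {1, ..., max_bin}.
    """
    obs = []
    for (_, t_prev), (page, t_cur) in zip(visits, visits[1:]):
        tau = math.ceil((t_cur - t_prev) / bin_seconds)
        obs.append((page, min(max_bin, max(1, tau))))
    return obs


def is_stochastic(rows, tol=1e-9):
    """Check that every row of a {state: {outcome: prob}} table sums to 1."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in rows.values())


# Hypothetical click log for one session: (page id, seconds since start).
visits = [("home", 0), ("list", 5), ("item", 47)]
observations = to_observations(visits)  # [("list", 1), ("item", 5)]
```

The same `is_stochastic` check applies to the output table `B[i][(v, q)]`, the duration table `P[i][d]` and the transition table `A[i][j]`, since each is required to sum to 1 for every state `i`.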
An important interest behavior record of a user is defined as $U_{interest}=\{user, background, history, behavior, timestamp, content\}$, where user identifies the user, for example by ID; background represents the user's specific context factors; history represents the user's historical purchase records; behavior identifies the result of the specific interest behavior operation; timestamp is the execution time of the user behavior; and content is the interest topic content.
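The six-field record can be represented directly as a small data structure. The Python sketch below is one possible encoding; the field types and the helper `access_sequence`, which orders one user's records by timestamp, are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class InterestRecord:
    user: str            # user identifier, e.g. an ID
    background: dict     # context factors of the user
    history: tuple       # historical purchase records
    behavior: str        # result of the interest behavior operation
    timestamp: datetime  # execution time of the behavior
    content: str         # interest topic content


def access_sequence(records, user):
    """Return the interest topics of one user in order of execution time."""
    own = sorted((r for r in records if r.user == user),
                 key=lambda r: r.timestamp)
    return [r.content for r in own]


# Hypothetical records for illustration only.
records = [
    InterestRecord("u42", {"age": "25-34"}, ("tripod",), "purchase",
                   datetime(2014, 6, 2, 10, 30), "camera"),
    InterestRecord("u42", {"age": "25-34"}, (), "browse",
                   datetime(2014, 6, 1, 9, 0), "lens"),
    InterestRecord("u7", {"age": "35-44"}, (), "search",
                   datetime(2014, 6, 1, 12, 0), "laptop"),
]
```

Ordering the records by timestamp yields the per-user access sequence on which the transition probabilities between behavior operations are defined.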
During the user's access operations, there is an access transition probability $P(q_i \rightarrow q_j)$ between any two behavior operations, and the interest weight can be expressed as follows:
For each $q_j$ and its corresponding concept there is an observation probability distribution, that is, the probability of interest in that concept over all of the accesses of user $u$ to $q_j$. Let the set of access nodes contained in the $i$-th access sequence be $Q_i=\{q'_1, \ldots, q'_f \mid q' \in IC\}$, let $Q_{i,j}$ denote the set of all access nodes that follow $q_j$ in that sequence, and consider the set of nodes in $Q_{i,j}$ that contain the concept:
The observation probability distribution of $u$ at $q_j$ is defined as:
Then, among all possible access sequences of user $u$, a state sequence is found and the hidden semi-Markov model of the user's interest behavior is established so that it has the maximum access probability:
In detecting user interest drift, the observation sequences for the HSMM are collected first; here the user's browsing behavior data mainly serve as the observation sequences, and the data are preprocessed before model training. After the model parameters have been determined, the HSMM algorithm is invoked to obtain the probability that the user's interest is unchanged; this probability is computed as the average log-likelihood. When the user's interest value lies within the normal range, the user's data are added to the training data set in order to update the parameters of the hidden semi-Markov model; otherwise the user is considered to have undergone interest drift. The implementation of drift detection is shown in Fig. 3.
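The decision rule can be sketched in Python as follows. The sketch assumes that the log-likelihood of each observation sequence under the trained HSMM has already been computed elsewhere (the HSMM evaluation itself is not shown), and it adds a `tolerance` margin below the threshold as an illustrative way of defining the "normal range"; neither the class name nor that margin comes from the text.

```python
from statistics import mean


def average_log_likelihood(log_likelihood, length):
    """Per-observation log-likelihood of one sequence."""
    return log_likelihood / length


class DriftDetector:
    """Threshold test on average log-likelihoods of browsing sequences."""

    def __init__(self, training_scores, tolerance=0.0):
        self.scores = list(training_scores)  # scores of the training set
        self.tolerance = tolerance           # assumed width of the normal range

    @property
    def threshold(self):
        # Mean of the average log-likelihoods of the training sequences.
        return mean(self.scores)

    def check(self, score):
        """Return True if drift is detected; otherwise absorb the sequence."""
        if score >= self.threshold - self.tolerance:
            # Normal behavior: the sequence joins the training data. In a full
            # system the HSMM parameters would be re-estimated at this point.
            self.scores.append(score)
            return False
        return True


detector = DriftDetector([-2.0, -2.5, -3.0], tolerance=0.5)
normal = detector.check(average_log_likelihood(-28.0, 10))   # score -2.8
drifted = detector.check(average_log_likelihood(-40.0, 10))  # score -4.0
```

Because accepted sequences are appended to the training scores, the threshold follows gradual changes in behavior, while a sequence that scores well below it is reported as interest drift.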
Claims (4)
1. A user interest mining method incorporating an ontology context, characterized in that the user interest mining method comprises the following steps:
1) Establish a user interest feature extraction model based on a second-order hidden Markov model:
First, data that reflect user interests are collected as follows: user source data are obtained from the client, the server and the proxy server; once obtained, the source data are preprocessed and stored in a set format for later use in mining user interests.
Second, a second-order hidden Markov model is used to extract user interest features; this comprises a training part and an extraction part;
The training part preprocesses the feature information sequences that contain user interests to form text documents. After the text is scanned, layout delimiters (separators, spaces, line feeds and colons) are used to convert the tagged text sequence into a sequence of tagged text fragments. Finally, the following parameters of the second-order HMM are computed, using the formulas below:
1. Initial probability distribution vector:
$$\pi_i = \frac{\mathrm{Init}(i)}{\sum_{j} \mathrm{Init}(j)}$$
where $\mathrm{Init}(i)$ is the number of sequences in the whole tagged training sample whose initial state is $S_i$, and $\sum_{j} \mathrm{Init}(j)$ is the total number of sequences counted over all initial states;
2. State transition probabilities:
$$a_{ij} = \frac{C_{ij}}{\sum_{k} C_{ik}}, \qquad a_{ijk} = \frac{C_{ijk}}{\sum_{l} C_{ijl}}$$
where $C_{ij}$ is the number of transitions from state $S_i$ to state $S_j$, and $C_{ijk}$ is the number of times the state is $S_i$ at time $t-1$, $S_j$ at time $t$ and $S_k$ at time $t+1$; $\sum_{k} C_{ik}$ is the total number of transitions from state $S_i$ to all states, and $\sum_{l} C_{ijl}$ is the total number of transitions to all states from state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
3. Observation emission probabilities:
$$b_j(O_k) = \frac{E_j(O_k)}{\sum_{m} E_j(O_m)}, \qquad b_{ij}(O_k) = \frac{E_{ij}(O_k)}{\sum_{m} E_{ij}(O_m)}$$
where $E_j(O_k)$ is the number of times observation $O_k$ is emitted in state $S_j$, and $E_{ij}(O_k)$ is the number of times observation $O_k$ is emitted with state $S_i$ at time $t-1$ and state $S_j$ at time $t$; $\sum_{m} E_j(O_m)$ is the total number of emissions of all observations in state $S_j$, and $\sum_{m} E_{ij}(O_m)$ is the total number of emissions of all observations with state $S_i$ at time $t-1$ and state $S_j$ at time $t$;
The extraction part comprises two steps: (a) pre-process the text whose features are to be extracted: after the text is scanned, the delimiters, spaces, line feeds and colons used in its layout are used to mark the text sequence, which is converted into a sequence of labeled text segments; (b) using the second-order HMM output by the training part, apply the Viterbi algorithm with the established HMM to extract user interest features: the observation sequence $O = O_1 O_2 \ldots O_T$ obtained after processing is taken as the model input, the state label sequence with the maximum probability is found, and the extracted user features are exactly the observed text labeled with the target state label;
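The decoding in step (b) can be illustrated with a short Python sketch. Note the simplification: the text uses a second-order HMM, whereas this example runs the first-order Viterbi recursion to keep the code short. The function names, the probability floor and the dictionary layout of `pi`, `a` and `b` are assumptions made for the example.

```python
import math


def viterbi(observations, states, pi, a, b, floor=1e-12):
    """First-order Viterbi decoding in log space (a simplification of the
    second-order model in the text). Returns the most probable state path."""
    def log(p):
        # Floor probabilities so that unseen events do not produce log(0).
        return math.log(max(p, floor))

    score = {
        s: log(pi.get(s, 0.0)) + log(b.get(s, {}).get(observations[0], 0.0))
        for s in states
    }
    back = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(
                states, key=lambda r: score[r] + log(a.get(r, {}).get(s, 0.0))
            )
            new_score[s] = (
                score[best_prev]
                + log(a.get(best_prev, {}).get(s, 0.0))
                + log(b.get(s, {}).get(obs, 0.0))
            )
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def extract_target(observations, path, target):
    """Keep the observed text segments labeled with the target state."""
    return [o for o, s in zip(observations, path) if s == target]
```

Under these assumptions, `extract_target(obs, viterbi(obs, states, pi, a, b), "interest")` would return the segments that the model labels with a hypothetical `"interest"` state.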
2) Analyze the contextual information that reflects user interest: by analyzing the user's search and browsing behavior and purchase record information, derive the user's true interests over a period of time;
3) Build the user interest ontology model that incorporates context: first, take several key factors that affect user interest (region, gender, age, marital status, educational background and income) as background indicators, and apply fuzzy processing to them together with the user's purchase history and behavior features to obtain the user's interest level; then, using the ontology context representation, build the user interest ontology model through multi-granularity division;
4) User interest drift detection method based on the hidden semi-Markov model:
Two observations are chosen to describe the user's browsing behavior: a) the browse-path sequence of the web pages the user visits; b) the time interval between one web page and the next. The set of all states is expressed as $S = \{S_1, S_2, \ldots, S_n\}$, the corresponding observation set as $V = \{v_1, v_2, \ldots, v_n\}$, and the time intervals as the set $I = \{1, 2, \ldots\}$. For a given browsing behavior of the user, the number of links in its browse path is a random variable; the number of observations output in a given state for this browsing behavior is expressed as the set $\{1, \ldots, D\}$. The user's browse-path sequence, i.e. the two-dimensional observation sequence, is expressed as $O = \{(r_1, \tau_1), \ldots, (r_T, \tau_T)\}$, where $r_t \in V$ is the web content object the user browses and $\tau_t \in I$ is the time interval between $r_t$ and $r_{t-1}$ when the user jumps from one page to another. The output probability matrix of the model is written $B = \{b_i(v, q)\}$: for a given state $i \in S$, $b_i(v, q)$ is the probability that the user is on page $r_t = v \in V$ and the time interval from the previous page is $\tau_t = q \in I$, with $\sum_{v,q} b_i(v, q) = 1$. $P = \{p_i(d)\}$ denotes the probability that, in a given state $i$, the number of output observations is $d \in \{1, \ldots, D\}$; it is the state-duration probability matrix of the hidden semi-Markov model and satisfies $\sum_{d} p_i(d) = 1$. The state transition probability matrix is written $A = \{a_{ij}\}$, where $a_{ij}$ is the probability of a transition from $i \in S$ to $j \in S$. The initial probability vector is written $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the initial state is $i \in S$;
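As an illustration of the parameter set $(\pi, A, B, P)$ described above, the following Python sketch stores the four components and checks the stated normalization constraints. The class name, the dictionary layout and the `segment_log_prob` helper are assumptions made for the example. The helper also assumes that the observations within one state segment are conditionally independent given the state, which the text does not state explicitly.

```python
import math
from dataclasses import dataclass


@dataclass
class HSMMParams:
    pi: dict  # pi[i]: probability that the initial state is i
    A: dict   # A[i][j]: probability of a transition from state i to state j
    B: dict   # B[i][(v, q)]: probability of page v with time interval q in state i
    P: dict   # P[i][d]: probability that state i outputs d observations

    def validate(self, tol=1e-9):
        """Raise ValueError unless every distribution sums to 1."""
        def check(name, dist):
            if not math.isclose(sum(dist.values()), 1.0, abs_tol=tol):
                raise ValueError(f"{name} does not sum to 1")

        check("pi", self.pi)
        for i in self.pi:
            check(f"A[{i}]", self.A[i])
            check(f"B[{i}]", self.B[i])   # sum over (v, q) of b_i(v, q) = 1
            check(f"P[{i}]", self.P[i])   # sum over d of p_i(d) = 1
        return True

    def segment_log_prob(self, state, segment):
        """Log-probability that `state` outputs exactly the (page, interval)
        pairs in `segment`: log p_i(d) plus the sum of log b_i(v, q)."""
        log_p = math.log(self.P[state][len(segment)])
        for v, q in segment:
            log_p += math.log(self.B[state][(v, q)])
        return log_p
```

Calling `validate()` right after training is a cheap way to catch a distribution that was left unnormalized.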
An important interest behavior record of a user is defined as $U_{interest} = \{user, background, history, behavior, timestamp, content\}$, where user identifies the user, for example by ID; background denotes the user's specific contextual factors; history denotes the user's historical purchase records; behavior identifies the result of the specific interest behavior operation; timestamp denotes the time at which the user behavior was performed; content denotes the content of the interest topic;
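The six-field record above maps naturally onto a small data structure. The Python sketch below is one possible representation: the field types and the `records_in_window` helper, which selects a user's records within a period of time, are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class InterestRecord:
    user: str            # user identifier, e.g. an ID
    background: dict     # the user's contextual factors
    history: list        # historical purchase records
    behavior: str        # result of the interest behavior operation
    timestamp: datetime  # time at which the behavior was performed
    content: str         # interest topic content


def records_in_window(records, user, start, end):
    """Return one user's records with start <= timestamp < end, oldest first."""
    selected = [
        r for r in records
        if r.user == user and start <= r.timestamp < end
    ]
    return sorted(selected, key=lambda r: r.timestamp)
```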
During the user's access process, any two behavior operations are linked by an access transition probability $P(q_i \to q_j)$, which expresses the interest weight as follows:
For each $q_j$ and its corresponding concept […], there is an observation probability distribution […], namely the probability of u's interest in […] over all of u's accesses to $q_j$. It can be obtained as follows: the set of access nodes contained in […]$_i$ is $Q_i = \{q'_1, \ldots, q'_f \mid q' \in IC\}$; $Q_{i,j}$ denotes the set of all access nodes in […]$_i$ that come after $q_j$; and […] denotes the set of nodes in $Q_{i,j}$ that contain […]:
The observation probability distribution […] of u at $q_j$ is defined as:
Then, among all possible access sequences of user u based on […], a state sequence is found and the hidden semi-Markov model of the user's interest behavior is established so that it has the maximum access probability:
In the process of detecting user interest drift, the observation sequences for the HSMM are first collected, and the data are pre-processed before model training. After the model parameters have been determined, the HSMM algorithm is called to obtain the probability that the user's interest is unchanged; this probability is computed as the average log-likelihood. When the user's interest value is within the normal range, the user data are added to the training data set to update the parameters of the hidden semi-Markov model; otherwise, the user is considered to have undergone interest drift.
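The decision rule in this paragraph can be sketched in Python as follows. The example is deliberately independent of any particular HSMM implementation: `score_fn` is a hypothetical callable that returns the total log-likelihood of a sequence under the trained model, and `normal_range` is an assumed pair of lower and upper bounds, since the text does not say how the normal range is chosen.

```python
def average_log_likelihood(total_log_likelihood, n_observations):
    """Average log-likelihood per observation."""
    if n_observations <= 0:
        raise ValueError("sequence must contain at least one observation")
    return total_log_likelihood / n_observations


def detect_interest_drift(sequence, score_fn, training_set, normal_range):
    """Return True if the sequence indicates interest drift.

    score_fn(sequence) is assumed to return the total log-likelihood of the
    sequence under the trained HSMM. If the average value lies inside
    normal_range, the sequence is appended to training_set so the model can
    later be re-estimated; otherwise drift is reported.
    """
    low, high = normal_range
    avg = average_log_likelihood(score_fn(sequence), len(sequence))
    if low <= avg <= high:
        training_set.append(sequence)
        return False
    return True
```

Re-estimating the model parameters from the enlarged `training_set` is left to the caller.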
2. The user interest mining method incorporating ontology context as claimed in claim 1, characterized in that: in step 1), there are two ways of obtaining the user's personalized information: (a) collection through online surveys in which users themselves participate; (b) obtaining the user's interest information by tracking user behavior, using a feature extraction method for user behavior data.
3. The user interest mining method incorporating ontology context as claimed in claim 1 or 2, characterized in that: in step 2), the user's behavior information comprises the user's search keywords, the user's historical purchase records and the user's historical browsing behavior.
4. The user interest mining method incorporating ontology context as claimed in claim 1 or 2, characterized in that: in step 3), according to the user's interest context information, when the user ontology context is constructed, the user context is divided into user personal context, user environment context and user device context. The ontology takes the form of a hierarchical concept tree in which each node represents an element of the user context, thereby building the context ontology tree.
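The hierarchical concept tree described in claim 4 can be illustrated with a small Python sketch. The class and function names are assumptions. Placing the six background factors from step 3) under the personal context branch is likewise only an illustrative choice, because the text does not specify the lower levels of the tree.

```python
class ContextNode:
    """One element of the user context in the ontology tree."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add(self, name):
        """Create a child node and return it."""
        child = ContextNode(name)
        self.children.append(child)
        return child

    def find(self, name):
        """Depth-first search for a node by name; None if absent."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def paths(self, prefix=()):
        """Yield every root-to-leaf path as a tuple of node names."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


def build_user_context_tree():
    root = ContextNode("user context")
    personal = root.add("user personal context")
    root.add("user environment context")
    root.add("user device context")
    # Illustrative leaves only: the background factors named in step 3).
    for factor in ("region", "gender", "age", "marital status",
                   "educational background", "income"):
        personal.add(factor)
    return root
```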
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410269562.6A CN104008203B (en) | 2014-06-17 | 2014-06-17 | A kind of Users' Interests Mining method for incorporating body situation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410269562.6A CN104008203B (en) | 2014-06-17 | 2014-06-17 | A kind of Users' Interests Mining method for incorporating body situation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104008203A (en) | 2014-08-27 |
CN104008203B (en) | 2018-04-17 |
Family
ID=51368860
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410269562.6A Expired - Fee Related CN104008203B (en) | 2014-06-17 | 2014-06-17 | A kind of Users' Interests Mining method for incorporating body situation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104008203B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105718471A (en) * | 2014-12-03 | 2016-06-29 | 中国科学院声学研究所 | User preference modeling method, system, and user preference evaluation method and system |
CN106055661A (en) * | 2016-06-02 | 2016-10-26 | 福州大学 | Multi-interest resource recommendation method based on multi-Markov-chain model |
CN106651517A (en) * | 2016-12-20 | 2017-05-10 | 广东技术师范学院 | Hidden semi-Markov model-based drug recommendation method |
CN106776757A (en) * | 2016-11-15 | 2017-05-31 | 中国银行股份有限公司 | User completes the indicating means and device of Net silver operation |
CN107609063A (en) * | 2017-08-29 | 2018-01-19 | 重庆邮电大学 | A kind of the mobile phone application commending system and its method of multi-tag classification |
CN108038222A (en) * | 2017-12-22 | 2018-05-15 | 冶金自动化研究设计院 | System for Information System Modeling and entity-property frame of data access |
CN108596205A (en) * | 2018-03-20 | 2018-09-28 | 重庆邮电大学 | Behavior prediction method is forwarded based on the microblogging of region correlation factor and rarefaction representation |
CN108809955A (en) * | 2018-05-22 | 2018-11-13 | 南瑞集团有限公司 | A kind of power consumer behavior depth analysis method based on hidden Markov model |
CN109741146A (en) * | 2019-01-04 | 2019-05-10 | 平安科技(深圳)有限公司 | Products Show method, apparatus, equipment and storage medium based on user behavior |
CN109933741A (en) * | 2019-02-27 | 2019-06-25 | 京东数字科技控股有限公司 | User network behaviors feature extracting method, device and storage medium |
WO2019120037A1 (en) * | 2017-12-18 | 2019-06-27 | Oppo广东移动通信有限公司 | Model construction method, network resource preloading method and apparatus, medium, and terminal |
CN110162553A (en) * | 2019-05-21 | 2019-08-23 | 南京邮电大学 | Users' Interests Mining method based on attention-RNN |
CN110297817A (en) * | 2019-06-25 | 2019-10-01 | 哈尔滨工业大学 | A method of the structure of knowledge is constructed based on personalized Bayes's knowledge tracing model |
CN110866542A (en) * | 2019-10-17 | 2020-03-06 | 西安交通大学 | Depth representation learning method based on feature controllable fusion |
CN109388661B (en) * | 2017-08-02 | 2020-04-21 | 创新先进技术有限公司 | Model training method and device based on shared data |
CN112948672A (en) * | 2015-05-26 | 2021-06-11 | 谷歌有限责任公司 | Predicting user needs for a particular context |
CN114169869A (en) * | 2022-02-14 | 2022-03-11 | 北京大学 | Attention mechanism-based post recommendation method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100257131A1 (en) * | 2007-12-28 | 2010-10-07 | Kun-Oh Kim | Apparatus and method for controlling hybrid motor |
CN102043793A (en) * | 2009-10-09 | 2011-05-04 | 卢健华 | Knowledge-service-oriented recommendation method |
CN103514289A (en) * | 2013-10-08 | 2014-01-15 | 北京百度网讯科技有限公司 | Method and device for building interest entity base |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105718471A (en) * | 2014-12-03 | 2016-06-29 | 中国科学院声学研究所 | User preference modeling method, system, and user preference evaluation method and system |
CN112948672A (en) * | 2015-05-26 | 2021-06-11 | 谷歌有限责任公司 | Predicting user needs for a particular context |
CN106055661B (en) * | 2016-06-02 | 2017-11-17 | 福州大学 | More interest resource recommendations based on more Markov chain models |
CN106055661A (en) * | 2016-06-02 | 2016-10-26 | 福州大学 | Multi-interest resource recommendation method based on multi-Markov-chain model |
CN106776757B (en) * | 2016-11-15 | 2020-03-27 | 中国银行股份有限公司 | Method and device for indicating user to complete online banking operation |
CN106776757A (en) * | 2016-11-15 | 2017-05-31 | 中国银行股份有限公司 | User completes the indicating means and device of Net silver operation |
CN106651517A (en) * | 2016-12-20 | 2017-05-10 | 广东技术师范学院 | Hidden semi-Markov model-based drug recommendation method |
CN106651517B (en) * | 2016-12-20 | 2021-11-30 | 广东技术师范大学 | Drug recommendation method based on hidden semi-Markov model |
CN109388661B (en) * | 2017-08-02 | 2020-04-21 | 创新先进技术有限公司 | Model training method and device based on shared data |
US11106804B2 (en) | 2017-08-02 | 2021-08-31 | Advanced New Technologies Co., Ltd. | Model training method and apparatus based on data sharing |
US11106802B2 (en) | 2017-08-02 | 2021-08-31 | Advanced New Technologies Co., Ltd. | Model training method and apparatus based on data sharing |
CN107609063A (en) * | 2017-08-29 | 2018-01-19 | 重庆邮电大学 | A kind of the mobile phone application commending system and its method of multi-tag classification |
CN107609063B (en) * | 2017-08-29 | 2020-03-17 | 重庆邮电大学 | Multi-label classified mobile phone application recommendation system and method thereof |
WO2019120037A1 (en) * | 2017-12-18 | 2019-06-27 | Oppo广东移动通信有限公司 | Model construction method, network resource preloading method and apparatus, medium, and terminal |
CN108038222B (en) * | 2017-12-22 | 2022-01-11 | 冶金自动化研究设计院 | System of entity-attribute framework for information system modeling and data access |
CN108038222A (en) * | 2017-12-22 | 2018-05-15 | 冶金自动化研究设计院 | System for Information System Modeling and entity-property frame of data access |
CN108596205A (en) * | 2018-03-20 | 2018-09-28 | 重庆邮电大学 | Behavior prediction method is forwarded based on the microblogging of region correlation factor and rarefaction representation |
CN108596205B (en) * | 2018-03-20 | 2022-02-11 | 重庆邮电大学 | Microblog forwarding behavior prediction method based on region correlation factor and sparse representation |
CN108809955B (en) * | 2018-05-22 | 2019-05-24 | 南瑞集团有限公司 | A kind of power consumer behavior depth analysis method based on hidden Markov model |
CN108809955A (en) * | 2018-05-22 | 2018-11-13 | 南瑞集团有限公司 | A kind of power consumer behavior depth analysis method based on hidden Markov model |
CN109741146A (en) * | 2019-01-04 | 2019-05-10 | 平安科技(深圳)有限公司 | Products Show method, apparatus, equipment and storage medium based on user behavior |
CN109933741B (en) * | 2019-02-27 | 2020-06-23 | 京东数字科技控股有限公司 | Method, device and storage medium for extracting user network behavior characteristics |
CN109933741A (en) * | 2019-02-27 | 2019-06-25 | 京东数字科技控股有限公司 | User network behaviors feature extracting method, device and storage medium |
CN110162553A (en) * | 2019-05-21 | 2019-08-23 | 南京邮电大学 | Users' Interests Mining method based on attention-RNN |
CN110297817A (en) * | 2019-06-25 | 2019-10-01 | 哈尔滨工业大学 | A method of the structure of knowledge is constructed based on personalized Bayes's knowledge tracing model |
CN110866542A (en) * | 2019-10-17 | 2020-03-06 | 西安交通大学 | Depth representation learning method based on feature controllable fusion |
CN110866542B (en) * | 2019-10-17 | 2021-11-19 | 西安交通大学 | Depth representation learning method based on feature controllable fusion |
CN114169869A (en) * | 2022-02-14 | 2022-03-11 | 北京大学 | Attention mechanism-based post recommendation method and device |
CN114169869B (en) * | 2022-02-14 | 2022-06-07 | 北京大学 | Attention mechanism-based post recommendation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN104008203B (en) | 2018-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104008203A (en) | User interest discovering method with ontology situation blended in | |
Gurini et al. | Temporal people-to-people recommendation on social networks with sentiment-based matrix factorization | |
Ozsoy | From word embeddings to item recommendation | |
Chandra et al. | Estimating twitter user location using social interactions--a content based approach | |
CN102004774B (en) | Personalized user tag modeling and recommendation method based on unified probability model | |
Li et al. | Community detection using hierarchical clustering based on edge-weighted similarity in cloud environment | |
CN101216825B (en) | Indexing key words extraction/ prediction method | |
CN102254038B (en) | System and method for analyzing network comment relevance | |
CN105045931A (en) | Video recommendation method and system based on Web mining | |
CN106126582A (en) | Recommend method and device | |
Abebe et al. | Generic metadata representation framework for social-based event detection, description, and linkage | |
CN106484764A (en) | User's similarity calculating method based on crowd portrayal technology | |
US20200026759A1 (en) | Artificial intelligence engine for generating semantic directions for websites for automated entity targeting to mapped identities | |
CN104008109A (en) | User interest based Web information push service system | |
CN104376406A (en) | Enterprise innovation resource management and analysis system and method based on big data | |
CN105045901A (en) | Search keyword push method and device | |
CN104268271A (en) | Interest and network structure double-cohesion social network community discovering method | |
CN105718579A (en) | Information push method based on internet-surfing log mining and user activity recognition | |
CN103678431A (en) | Recommendation method based on standard labels and item grades | |
CN104484431A (en) | Multi-source individualized news webpage recommending method based on field body | |
CN104462253A (en) | Topic detection or tracking method for network text big data | |
CN103838756A (en) | Method and device for determining pushed information | |
CN104750789A (en) | Label recommendation method and device | |
CN103324666A (en) | Topic tracing method and device based on micro-blog data | |
CN105159930A (en) | Search keyword pushing method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180417 |