CN109727047A - Method and apparatus for determining a degree of data association, and data recommendation method and apparatus - Google Patents

Publication number
CN109727047A
Authority
CN
China
Prior art keywords
data
entry
entry data
association
category
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711032881.5A
Other languages
Chinese (zh)
Inventor
赵旭玲
李凯东
闫石
王经纬
王云涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201711032881.5A
Publication of CN109727047A
Legal status: Pending

Abstract

The invention discloses a method and apparatus for determining a degree of data association, and a data recommendation method and apparatus, and relates to the field of computer technology. A specific embodiment of the method includes: determining an associated data set according to the historical browsing data of a user, the associated data set including at least the data of two associated entries; performing data statistics on each entry data in the associated data set according to statistical indicators, to obtain the statistical values of each entry data; and determining, according to the statistical values of each entry data, the degree of association between each entry data and its associated entry data. The embodiment can improve the precision with which the degree of data association is computed and determine the degree of association between data accurately, which helps to understand the relationships between data more precisely from the user's perspective, so that data highly associated with the data the user is browsing can be recommended to the user, improving the user experience.

Description

Method and apparatus for determining a degree of data association, and data recommendation method and apparatus
Technical field
The present invention relates to the field of computer technology, and in particular to a method and apparatus for determining a degree of data association, and a method and apparatus for recommending data.
Background
With the spread of the Internet and the development of e-commerce, e-commerce has begun to replace traditional brick-and-mortar sales channels, and the products offered to users have become increasingly diverse. In the era dominated by the PC (computer), e-commerce merchants increased product exposure by showing as many products as possible on the pages a user browsed, to attract the user's attention and raise the conversion rate. With the rapid development of mobile terminals such as mobile phones, however, more and more users choose to shop on their phones, which poses two challenges to merchants who rely on the traditional PC-based exposure approach. First, mobile phone users used to care most about call charges, but now care most about data traffic. If a merchant keeps the earlier crude exposure approach with large numbers of pictures, it consumes a large amount of the user's mobile data, which not only fails to improve conversion but may also annoy the user. Second, a PC display is typically about 15 inches, whereas a mobile phone screen is far smaller, so the picture information a merchant can show to the user is more limited. For these two reasons, merchants must change the earlier exposure approach and move from crude exposure to refined exposure, pushing to each user, as far as possible, the products that user is interested in rather than the same products for everyone.
To recommend strongly associated products to users, existing schemes for analyzing product association fall into two kinds. One is based on the order behavior of an account, that is, it computes on the product categories bought together in the same order; this scheme has little data available, and its results are easily skewed by best-selling categories. The other is based on the browsing behavior of an account, that is, it computes on the user's browsing behavior after login over a period of time; this scheme tends to count together the browsing of categories that are not truly associated, such as categories browsed on different days or in different periods of the same day.
In the course of realizing the present invention, the inventors found that the prior art has at least the following problem:
Existing schemes analyze product association with insufficient accuracy and cannot accurately push to users the products they are interested in.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for determining a degree of data association, and a method and apparatus for recommending data, which can improve the precision with which the degree of data association is computed and determine the degree of association between data accurately. This helps to understand the relationships between data more precisely from the user's perspective, so that data highly associated with the data the user is browsing can be recommended to the user, improving the user experience.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for determining a degree of data association is provided.
A method for determining a degree of data association includes: determining an associated data set according to the historical browsing data of a user, the associated data set including at least the data of two associated entries; performing data statistics on each entry data in the associated data set according to statistical indicators, to obtain the statistical values of each entry data; and determining, according to the statistical values of each entry data, the degree of association between each entry data and its associated entry data.
Optionally, the step of determining the associated data set according to the historical browsing data of the user includes: determining the associated entry data of each entry data according to the information on the session to which each entry data in the historical browsing data belongs, where the associated entry data of an entry data is the data of one or more entries that belong to the same session as that entry data; for the data of each entry, computing the associated session count of each of its associated entry data, and sorting the associated entry data by that count to obtain the entry's associated entry data sequence, where the associated session count of one associated entry data of the entry data is the number of sessions to which the entry data and that associated entry data both belong; and screening the data of all entries according to a preset rule, so as to determine the associated data set from the data of the selected entries.
Optionally, the step of determining the degree of association between each entry data and its associated entry data according to the statistical values of each entry data includes: calculating the relevance score of each entry data and its associated entry data according to the statistical values of each entry data; and determining the degree of association between each entry data and its associated entry data according to the relevance score.
Optionally, the step of calculating the relevance score of each entry data and its associated entry data according to the statistical values of each entry data includes: for each entry data, calculating the relevance score of the entry data and one of its associated entry data by the formula Score = N1 * N2, where Score is the relevance score of the entry data and that associated entry data, N1 is the ratio of the associated session count of that associated entry data to the number of sessions to which the entry data belongs, and N2 is the ratio of the reverse count of that associated entry data to the number of sessions to which that associated entry data belongs; the values of N1 and N2 are obtained from the statistical values.
Optionally, the step of determining the degree of association between each entry data and its associated entry data according to the relevance score includes: comparing the relevance score of each entry data and its associated entry data with a first threshold and a second threshold, where: when the relevance score is greater than the first threshold, the degree of association between the entry data and the associated entry data is determined to be strong association; when the relevance score is less than the first threshold and greater than the second threshold, the degree of association is weak association; and when the relevance score is less than the second threshold, the degree of association is determined to be no association.
According to another aspect of the embodiments of the present invention, a method is provided for recommending data based on the degree of data association determined by the method for determining a degree of data association of the embodiments of the present invention.
A method for recommending data based on the degree of data association determined by the method of the embodiments of the present invention includes: obtaining the data of the entry the user is currently browsing; and, according to the determined degree of association between the data of the currently browsed entry and its associated entry data, recommending to the user the associated entry data that satisfies a preset recommendation condition.
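The recommendation step can be illustrated with a minimal Python sketch. The source does not say what the preset recommendation condition is; here it is assumed to be "association level is in an allowed set, best scores first, at most `limit` results". The function name `recommend`, the shape of the `association` mapping, and the level labels are all hypothetical.

```python
def recommend(current_entry, association, allowed=("strong",), limit=3):
    """Return associated entries to recommend for the entry being browsed.

    association maps (entry, associated_entry) -> (score, level).
    The recommendation condition used here is an assumption: the level
    must be in `allowed`, and at most `limit` entries are returned.
    """
    candidates = [
        (score, other)
        for (entry, other), (score, level) in association.items()
        if entry == current_entry and level in allowed
    ]
    # Highest score first; ties are broken by entry ID for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in candidates[:limit]]


association = {
    ("C1", "C2"): (0.5, "strong"),
    ("C1", "C3"): (0.1, "weak"),
    ("C1", "C4"): (0.8, "strong"),
}
recommended = recommend("C1", association)
```

Passing `allowed=("strong", "weak")` would widen the condition to weakly associated entries as well.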
According to another aspect of the embodiments of the present invention, an apparatus for determining a degree of data association is provided.
An apparatus for determining a degree of data association includes: an associated data set determining module, configured to determine an associated data set according to the historical browsing data of a user, the associated data set including at least the data of two associated entries; a data statistics module, configured to perform data statistics on each entry data in the associated data set according to statistical indicators, to obtain the statistical values of each entry data; and a data association degree determining module, configured to determine, according to the statistical values of each entry data, the degree of association between each entry data and its associated entry data.
Optionally, the associated data set determining module is further configured to: determine the associated entry data of each entry data according to the information on the session to which each entry data in the historical browsing data belongs, where the associated entry data of an entry data is the data of one or more entries that belong to the same session as that entry data; for the data of each entry, compute the associated session count of each of its associated entry data, and sort the associated entry data by that count to obtain the entry's associated entry data sequence, where the associated session count of one associated entry data of the entry data is the number of sessions to which the entry data and that associated entry data both belong; and screen the data of all entries according to a preset rule, so as to determine the associated data set from the data of the selected entries.
Optionally, the data association degree determining module is further configured to: calculate the relevance score of each entry data and its associated entry data according to the statistical values of each entry data; and determine the degree of association between each entry data and its associated entry data according to the relevance score.
Optionally, the data association degree determining module includes a calculation submodule, configured to: for each entry data, calculate the relevance score of the entry data and one of its associated entry data by the formula Score = N1 * N2, where Score is the relevance score of the entry data and that associated entry data, N1 is the ratio of the associated session count of that associated entry data to the number of sessions to which the entry data belongs, and N2 is the ratio of the reverse count of that associated entry data to the number of sessions to which that associated entry data belongs; the values of N1 and N2 are obtained from the statistical values.
Optionally, the data association degree determining module further includes a determining submodule, configured to: compare the relevance score of each entry data and its associated entry data with a first threshold and a second threshold, where: when the relevance score is greater than the first threshold, the degree of association between the entry data and the associated entry data is determined to be strong association; when the relevance score is less than the first threshold and greater than the second threshold, the degree of association is weak association; and when the relevance score is less than the second threshold, the degree of association is determined to be no association.
According to another aspect of the embodiments of the present invention, an apparatus is provided for recommending data based on the degree of data association determined by the apparatus for determining a degree of data association of the embodiments of the present invention.
An apparatus for recommending data based on the degree of data association determined by the apparatus of the embodiments of the present invention includes: an obtaining module, configured to obtain the data of the entry the user is currently browsing; and a recommending module, configured to recommend to the user, according to the determined degree of association between the data of the currently browsed entry and its associated entry data, the associated entry data that satisfies a preset recommendation condition.
According to another aspect of the embodiments of the present invention, a server is provided.
A server includes: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for determining a degree of data association, or the method for recommending data based on the degree of data association determined by that method.
According to another aspect of the embodiments of the present invention, a computer-readable medium is provided.
A computer-readable medium stores a computer program which, when executed by a processor, implements the method for determining a degree of data association, or the method for recommending data based on the degree of data association determined by that method.
An embodiment of the above invention has the following advantages or beneficial effects: an associated data set is determined according to the historical browsing data of the user; data statistics are then performed on each entry data in the associated data set according to statistical indicators, to obtain the statistical values of each entry data; and the degree of association between each entry data and its associated entry data is then determined from those statistical values. This can improve the precision with which the degree of data association is computed and determine the degree of association between data accurately, which helps to understand the relationships between data more precisely from the user's perspective, so that data highly associated with the data the user is browsing can be recommended to the user, improving the user experience.
Further effects of the above non-conventional optional implementations are explained below in connection with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the method for determining a degree of data association according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a preferred flow for determining the degree of association between product data of different categories according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the main modules of the apparatus for determining a degree of data association according to an embodiment of the present invention;
Fig. 4 is a diagram of an exemplary system architecture to which embodiments of the present invention can be applied;
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing the server of an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are described below with reference to the drawings, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
Fig. 1 is a schematic diagram of the main steps of the method for determining a degree of data association according to an embodiment of the present invention.
As shown in Fig. 1, the method for determining a degree of data association of the embodiment mainly includes the following steps S101 to S103.
Step S101: determine an associated data set according to the historical browsing data of the user.
The associated data set includes at least the data of two associated entries; of the two, the data of one entry is the associated entry data of the other. What an entry means can be defined as needed for the application scenario. For example, when determining the degree of association between product data of different categories in e-commerce, the data of one entry can be the ID of a product category at some level. In other scenarios, such as when the user browses web pages other than those of an e-commerce platform, the page data is not product data divided by category; entries can then be defined according to the page content, and the data of the content the user browses can serve as the data of an entry.
Determining the associated data set from the user's historical browsing data may specifically include the following. First, the associated entry data of each entry data is determined from the information on the session to which each entry data in the historical browsing data belongs. Second, for the data of each entry, the associated session count of each of its associated entry data is computed, and the associated entry data is sorted by that count to obtain the entry's associated entry data sequence; here, the associated session count of one associated entry data of an entry is the number of sessions to which the entry data and that associated entry data both belong. Third, the data of all entries is screened by a preset rule, and the associated data set is determined from the data of the selected entries. The preset rule is: if, for an entry data and one of its associated entry data, each lies within the first N positions of the other's associated entry data sequence, both are selected into the associated data set, where N is a natural number.
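The counting, sorting and screening described above can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: the source does not state the sort direction or how ties are ordered, so the sketch sorts by descending associated session count and breaks ties by entry ID. The function name `build_associated_set` and the input layout are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_associated_set(sessions, top_n):
    """sessions maps session_id -> iterable of entry IDs browsed in that session.

    Returns (co_count, selected): co_count[a][b] is the number of sessions
    containing both a and b; selected holds the unordered pairs that pass
    the mutual top-N rule.
    """
    co_count = defaultdict(Counter)
    for entries in sessions.values():
        # Each session contributes at most once per ordered pair.
        for a, b in permutations(set(entries), 2):
            co_count[a][b] += 1

    # Associated entry sequence per entry (assumed: descending count, then ID).
    ranked = {
        a: [b for b, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
        for a, counts in co_count.items()
    }

    selected = set()
    for a, sequence in ranked.items():
        for b in sequence[:top_n]:
            # Keep the pair only if a is also in b's first N positions.
            if a in ranked[b][:top_n]:
                selected.add(frozenset((a, b)))
    return co_count, selected


sessions = {"S1": ["C1", "C2", "C3"], "S2": ["C1", "C2"], "S3": ["C3", "C4"]}
co_count, selected = build_associated_set(sessions, top_n=1)
```

With `top_n=1`, only C1 and C2 rank each other first, so they form the only selected pair; C3 ranks C1 first, but C1 does not rank C3 first.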
The information on the session to which each entry data belongs can be the ID of that session (denoted session_id); each session has its own ID.
The associated entry data of an entry data is the data of one or more entries that belong to the same session as that entry data. A session is defined as follows: it runs from the moment the user opens the browser and starts browsing until the user closes all browsed pages, or from that moment until the user has paused browsing for a preset duration. The preset duration is the validity period of the session_id (the session ID) and can be set freely; it is commonly set to 30 minutes, so that if the user restarts the browser, or resumes browsing after pausing for 30 minutes, a new session begins.
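The timeout part of this session definition can be sketched as below. The sketch is a simplification under assumptions: it models only the pause rule (a gap of at least the preset duration starts a new session) and does not model browser restarts, which would need an explicit signal in the data. The function name `split_sessions` and the event layout are hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(events, timeout=timedelta(minutes=30)):
    """Split one user's browsing events into sessions.

    events is a list of (timestamp, entry_id) tuples in any order.
    A pause of at least `timeout` between consecutive events starts a
    new session (assumed boundary: a gap equal to the timeout counts).
    """
    sessions = []
    last_time = None
    for timestamp, entry_id in sorted(events):
        if last_time is None or timestamp - last_time >= timeout:
            sessions.append([])  # open a new session
        sessions[-1].append(entry_id)
        last_time = timestamp
    return sessions


events = [
    (datetime(2017, 10, 1, 10, 0), "C1"),
    (datetime(2017, 10, 1, 10, 10), "C2"),
    (datetime(2017, 10, 1, 10, 45), "C3"),  # 35 minutes after the previous event
]
user_sessions = split_sessions(events)
```

The third event follows a 35-minute pause, so it opens a second session.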
Step S102: perform data statistics on each entry data in the associated data set according to statistical indicators, to obtain the statistical values of each entry data that correspond to the statistical indicators.
Performing data statistics on each entry data in the associated data set according to the statistical indicators yields the statistical values of each entry data; each statistical value of an entry data corresponds to one statistical indicator.
The statistical indicators may specifically include, for the current entry data on which statistics are being performed: the associated session count of each of its associated entry data; the number of sessions to which the current entry data belongs; the number of sessions to which each of its associated entry data belongs; and the reverse count of each of its associated entry data. The reverse count of an associated entry data of the current entry data is defined as follows: taking that associated entry data as the new current entry data, and the original current entry data as an associated entry data of the new current entry data, it is the number of sessions to which the new current entry data and this associated entry data both belong.
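The four indicators can be collected with a short Python sketch. It follows a literal reading of the definitions above, under which the reverse count is obtained by swapping the roles of the two entries and counting shared sessions again. The function name `collect_statistics` and the dictionary keys are hypothetical.

```python
from collections import Counter


def collect_statistics(sessions, pairs):
    """sessions maps session_id -> iterable of entry IDs.
    pairs is an iterable of (entry, associated_entry) from the associated data set.
    """
    session_sets = [set(entries) for entries in sessions.values()]
    # Number of sessions each entry belongs to.
    session_count = Counter(entry for s in session_sets for entry in s)

    def shared(current, associated):
        # Sessions that contain both the current entry and its associated entry.
        return sum(1 for s in session_sets if current in s and associated in s)

    stats = {}
    for entry, associated in pairs:
        stats[(entry, associated)] = {
            "co_sessions": shared(entry, associated),
            "entry_sessions": session_count[entry],
            "associated_sessions": session_count[associated],
            # Roles swapped: the associated entry becomes the current entry.
            "reverse_count": shared(associated, entry),
        }
    return stats


sessions = {"S1": ["C1", "C2", "C3", "C4", "C5"], "S2": ["C1", "C2", "C3"]}
stats = collect_statistics(sessions, [("C1", "C2"), ("C1", "C4")])
```

Under this reading the reverse count always equals the associated session count, because both count the same sessions; what differs between N1 and N2 is the denominator.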
Step S103: determine the degree of association between each entry data and its corresponding associated entry data according to the statistical values of each entry data that correspond to the statistical indicators.
Specifically, the relevance score of each entry data and its corresponding associated entry data can be calculated from the statistical values of each entry data that correspond to the statistical indicators, and the degree of association between each entry data and its corresponding associated entry data can then be determined from the relevance score.
The step of calculating the relevance score of each entry data and its corresponding associated entry data from the statistical values may specifically include: for each entry data, calculating the relevance score of the entry data and one corresponding associated entry data by the following formula:
Score=N1*N2,
where Score is the relevance score of the entry data and the associated entry data, and the values of N1 and N2 can be obtained from the statistical values of the individual statistical indicators. Specifically, N1 is the ratio of the associated session count of that associated entry data to the number of sessions to which the entry data belongs, and N2 is the ratio of the reverse count of that associated entry data to the number of sessions to which that associated entry data belongs.
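As a small worked sketch of the formula, the function below takes the four statistical values as plain numbers and returns Score = N1 * N2. The function name and parameter names are hypothetical; the check for positive session counts is an added safeguard, not something the source specifies.

```python
def relevance_score(co_sessions, entry_sessions, reverse_count, associated_sessions):
    """Score = N1 * N2 for one (entry, associated entry) pair."""
    if entry_sessions <= 0 or associated_sessions <= 0:
        raise ValueError("session counts must be positive")
    n1 = co_sessions / entry_sessions          # share of the entry's sessions that include the associated entry
    n2 = reverse_count / associated_sessions   # share of the associated entry's sessions that include the entry
    return n1 * n2


# Entry A appears in 2 sessions, associated entry B in 1, and they share 1 session.
score = relevance_score(co_sessions=1, entry_sessions=2,
                        reverse_count=1, associated_sessions=1)
```

Here N1 = 1/2 and N2 = 1/1, so the score is 0.5. Multiplying the two ratios penalizes pairs in which one entry is simply very popular: a popular entry appears in many sessions, which lowers its own ratio.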
The step of determining the degree of association between each entry data and its corresponding associated entry data according to the relevance score may specifically include: comparing the relevance score of each entry data and its corresponding associated entry data with a first threshold and a second threshold, where: when the relevance score is greater than the first threshold, the degree of association between the entry data and the associated entry data is determined to be strong association; when the relevance score is less than the first threshold and greater than the second threshold, the degree of association is weak association; and when the relevance score is less than the second threshold, the degree of association is determined to be no association.
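The threshold comparison can be sketched as follows. The source does not say how a score exactly equal to a threshold is treated, so the sketch assumes that a score equal to the first threshold counts as weak and a score equal to the second threshold counts as none. The threshold values in the usage lines are invented for illustration only.

```python
def classify(score, first_threshold, second_threshold):
    """Map a relevance score to 'strong', 'weak' or 'none'."""
    if second_threshold >= first_threshold:
        raise ValueError("the first threshold must be greater than the second")
    if score > first_threshold:
        return "strong"
    if score > second_threshold:
        return "weak"  # assumption: score == first_threshold also lands here
    return "none"      # assumption: score == second_threshold also lands here


# Hypothetical thresholds, chosen only to exercise each branch.
levels = [classify(s, first_threshold=0.3, second_threshold=0.05)
          for s in (0.5, 0.1, 0.01)]
```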
The method for determining a degree of data association of the embodiment can improve the precision with which the degree of data association is computed and determine the degree of association between data accurately, which helps to understand the relationships between data more precisely from the user's perspective, so that data highly associated with the data the user is browsing can be recommended to the user, improving the user experience.
The method for determining a degree of data association of the embodiment is applicable to every scenario in which a degree of data association is determined. In e-commerce, for example, it can be used to determine the degree of association between product data of different categories. The method is described in detail below, taking that e-commerce case as the example.
Fig. 2 is a schematic diagram of a preferred flow for determining the degree of association between product data of different categories according to an embodiment of the present invention.
As shown in Fig. 2, the preferred flow for determining the degree of association between product data of different categories includes the following steps S201 to S211.
Step S201: obtain the user's historical browsing data for a preset time period.
Because human activity generally follows a cycle that is a multiple of 7 days, historical browsing data can be extracted in multiples of 7 days, and the optimal volume of the extracted data can then be verified, so that the best precision is obtained with the least computing resources. Taking product data divided into three levels as an example, 7 days of a user's historical browsing data are extracted from the user's browsing history, the user's sessions are delimited by session_id, and a user browsing category base table is built, where session_id is the session ID that identifies one session of the user. The data format in the base table may specifically include: the session_id, the IDs of the first-level, second-level and third-level categories the user browsed, and the product name.
Usually, when product data is divided into three levels, the first-level and second-level categories each cover products of several kinds, while a third-level category corresponds to products of one specific kind. For example, the first-level category is mother and baby products, the second-level category is milk powder, and the third-level category is milk powder for pregnant women, which covers such milk powder of all brands. The following therefore uses the third-level category ID as the category product data of the embodiment. Those skilled in the art will understand that the embodiments are not limited to three category levels: with more or fewer levels, it suffices to choose the category data at the level that corresponds to a single specific kind of product. With four levels, for instance, where the fourth-level category corresponds to a single specific kind of product, the fourth-level category ID can be chosen as the category product data.
Since the embodiment takes the third-level category ID as its category product data, the third-level category is referred to below simply as the category, for convenience.
After the historical browsing data has been extracted, it can first be cleaned. Data cleaning may include outlier handling, with a different strategy for each kind of outlier; for example, an outlier that can be repaired may be replaced by the average of neighboring values. Data cleaning may also combine the user's account data with the e-commerce platform's risk-control data to filter out enterprise user accounts, risk accounts (such as accounts merchants use to place fake orders), zombie accounts (such as accounts that have not logged in for a long time, or that over a long period only browse after logging in and never place an order; the time range meant by "long" can be set from experience), and similar accounts. After cleaning, the browsed category data of each session is output.
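The account-filtering part of the cleaning step can be sketched as below. The sketch assumes that the sets of enterprise, risk and zombie accounts have already been produced elsewhere (for example from risk-control data) and have been merged into one set; how those accounts are identified is outside this example. The row layout and the field names `account`, `session_id` and `category_id` are hypothetical.

```python
def clean_browsing_rows(rows, excluded_accounts):
    """Drop browsing rows that belong to accounts flagged for exclusion.

    rows is a list of dicts with at least an 'account' key.
    excluded_accounts is a set of account IDs (enterprise, risk, zombie).
    """
    return [row for row in rows if row["account"] not in excluded_accounts]


rows = [
    {"account": "u1", "session_id": "S1", "category_id": "C1"},
    {"account": "shop7", "session_id": "S9", "category_id": "C1"},  # flagged account
    {"account": "u1", "session_id": "S1", "category_id": "C2"},
]
cleaned = clean_browsing_rows(rows, excluded_accounts={"shop7"})
```

Outlier repair is not shown, since the source does not say which fields can hold outliers.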
Step S202: splice the category product data according to the information on the session to which each category product data in the user's historical browsing data belongs, to obtain a spliced data table.
Using the user's historical browsing data (which may be the cleaned data) and taking session_id as the join key, a Cartesian product of the user browsing category base table with itself is computed. Specifically, the third-level category IDs in the base table are combined pairwise in both orders to obtain the spliced data. For example, suppose the base table contains only one session, with session ID S1, and that session contains two category product data items, with third-level category IDs C1 and C2. The self-join then yields S1: C1, C2 and S1: C2, C1. The self Cartesian product thus joins the rows that share a session_id. This guarantees that any two categories spliced together appeared in the same user browsing behavior (that is, the same session), and only then can the mined categories be taken to be closely related, that is, the user browsed both category C1 and category C2 within one session. If session_id were not used as the constraint, the mined categories would more likely be the most-searched categories on the platform rather than categories that are actually associated. The spliced rows form the spliced data table, which can be as shown in Table 1.
Table 1
session_id | Commodity third-level category ID | Simultaneously browsed commodity third-level category ID
S1 | C1 | C2
S1 | C1 | C3
S1 | C1 | C4
S1 | C1 | C5
S2 | C1 | C2
S2 | C1 | C3
In Table 1, S1 and S2 are session_id values (i.e., session IDs) and denote two different sessions. C1, C2, C3, C4, and C5 are third-level category IDs. For example, the row in which session_id is S1, the commodity third-level category ID is C1, and the simultaneously browsed commodity third-level category ID is C2 means that, in the session whose ID is S1, the user browsed the third-level category whose ID is C1 and also browsed the third-level category whose ID is C2. The spliced data table thus shows that in the first session (session ID S1) the user accessed third-level categories C2, C3, C4, ..., C5 along with category C1, and that in the second session (session ID S2) the user accessed third-level categories C2 and C3 along with category C1.
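The splicing by session_id can be sketched in Python as follows. This is an illustrative sketch, not the implementation of the embodiment: the function name splice_sessions and the (session_id, category_id) row layout are assumptions, pairs of a category with itself are left out because the worked example above lists only S1: C1, C2 and S1: C2, C1, and repeated views of one category within a session are collapsed.

```python
from itertools import permutations


def splice_sessions(browse_rows):
    """Pair up the categories browsed in the same session.

    browse_rows: iterable of (session_id, category_id) tuples (assumed layout).
    Returns sorted (session_id, category_id, co_browsed_category_id) rows.
    """
    by_session = {}
    for session_id, category_id in browse_rows:
        by_session.setdefault(session_id, set()).add(category_id)

    spliced = []
    for session_id, categories in by_session.items():
        # Ordered pairs of distinct categories within one session only.
        for current, co_browsed in permutations(sorted(categories), 2):
            spliced.append((session_id, current, co_browsed))
    return sorted(spliced)


# The single-session example from the text: session S1 with categories C1 and C2.
spliced = splice_sessions([("S1", "C1"), ("S1", "C2")])
# spliced == [("S1", "C1", "C2"), ("S1", "C2", "C1")]
```

Because pairs are only formed inside one session, two categories that never share a session never appear in the same spliced row.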
Step S203: for each category commodity data in the spliced data table, count the associated session count corresponding to each of its associated category commodity data.
The associated category commodity data of a category commodity data are the one or more category commodity data that belong to the same session as that category commodity data. In other words, a category commodity data and its associated category commodity data are data that a user browsed together in one browsing behavior. Taking Table 1 as an example, in the session whose ID is S1 the associated category commodity data of category C1 are categories C2, C3, C4, ..., C5, and in the session whose ID is S2 the associated category commodity data of category C1 are categories C2 and C3.
The associated session count corresponding to an associated category commodity data of a given category commodity data is the number of sessions to which both belong. Taking the data in Table 1 as an example, category C1 and its associated category C2 both belong to sessions S1 and S2, so the associated session count corresponding to associated category C2 of category C1 is 2. Category C1 and its associated category C4 both belong only to session S1, so the associated session count corresponding to associated category C4 of category C1 is 1.
Step S204: sort the associated category commodity data of each category commodity data, to obtain the associated category commodity data sequence of each category commodity data.
The associated category commodity data of each category commodity data are sorted by the size of their corresponding associated session counts obtained from the statistics, yielding the associated category commodity data sequence of each category commodity data.
Taking the data in Table 1 as an example, the above steps count the number of times that each category commodity data and each of its associated category commodity data (for example, category C1 and its associated category C2) appear together across all user browsing behaviors (sessions), i.e., the associated session count. Within a single session the pair is not counted repeatedly: if the spliced data table contains several records in which C1 and C2 appear together in session S1, they are counted only once. Only categories that a user accessed together in one session can indicate some association between those categories, so the statistics focus on the user's browsing behavior within a session. If the statistics were not limited to one session (i.e., equal session_id), categories browsed by users in different sessions could be counted together, and what is mined would then not be the association between categories but an aggregation of hot-search categories.
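The counting and sorting of steps S203 and S204 can be sketched as below, using the rows of Table 1 as input. The function names are hypothetical. The sketch assumes descending order by count (so that the most frequently co-browsed categories come first) and breaks ties by category ID; the text does not state a tie-breaking rule.

```python
from collections import defaultdict


def associated_session_counts(spliced_rows):
    """Map (category, associated_category) to the number of distinct shared sessions."""
    shared_sessions = defaultdict(set)
    for session_id, category, associated in spliced_rows:
        # A set ensures a pair is counted at most once per session.
        shared_sessions[(category, associated)].add(session_id)
    return {pair: len(ids) for pair, ids in shared_sessions.items()}


def associated_sequences(counts):
    """Map each category to its associated categories, highest count first."""
    grouped = defaultdict(list)
    for (category, associated), count in counts.items():
        grouped[category].append((associated, count))
    return {
        category: [a for a, _ in sorted(items, key=lambda item: (-item[1], item[0]))]
        for category, items in grouped.items()
    }


# Rows of Table 1: (session_id, category ID, simultaneously browsed category ID).
table_1 = [
    ("S1", "C1", "C2"), ("S1", "C1", "C3"), ("S1", "C1", "C4"),
    ("S1", "C1", "C5"), ("S2", "C1", "C2"), ("S2", "C1", "C3"),
]
counts = associated_session_counts(table_1)
sequences = associated_sequences(counts)
# counts[("C1", "C2")] == 2 and counts[("C1", "C4")] == 1, as in the text.
```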
By analyzing which categories a user browses together in each browsing session, it is possible to mine which categories are associated in the user's mind.
Step S205: for each category commodity data and one of its associated category commodity data, judge whether each of the two is located within the top N positions of the other's associated category commodity data sequence. If so, execute step S206; if not, execute step S207.
The value of N can be set as needed. Taking N=20 as an example, and assuming that category C2 is an associated category of category C1, judge whether both of the following hold: category C1 is located within the first 20 positions of the associated category commodity data sequence of category C2, and category C2 is located within the first 20 positions of the associated category commodity data sequence of category C1.
Step S206: select both the category commodity data and the associated category commodity data into the associated data set.
Following the example in step S205, if category C1 is located within the first 20 positions of the associated category commodity data sequence of category C2, and category C2 is located within the first 20 positions of the associated category commodity data sequence of category C1, then category C1 and category C2 are selected into the associated data set.
After step S206 is executed, step S208 is executed.
Step S207: delete the association record of the category commodity data and the associated category commodity data from the spliced data table.
Following the example in step S205, if the condition is not satisfied, that is, category C1 is not within the first 20 positions of the associated category commodity data sequence of category C2, or category C2 is not within the first 20 positions of the associated category commodity data sequence of category C1, then the association record of category C1 and category C2 is deleted from the spliced data table.
In the embodiment of the present invention, two mutually associated categories are selected into the associated data set only if each is among the first 20 categories of the other's associated category commodity data sequence. If either of the two does not appear in the first 20 of the other's sequence, the record is deleted. This improves the precision of the degree of association determined between different category commodity data.
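The mutual top-N screening of steps S205 to S207 can be sketched as follows. The function name select_associated_pairs and the sample sequences are hypothetical; N is passed as a parameter, and a small N is used in the example so the effect is visible on three categories.

```python
def select_associated_pairs(sequences, n=20):
    """Keep (category, associated) pairs where each is in the other's top-n sequence.

    sequences: dict mapping a category to its associated categories, best first.
    Returns a set of ordered pairs; pairs that fail the mutual check are dropped.
    """
    top_n = {category: set(seq[:n]) for category, seq in sequences.items()}
    selected = set()
    for category, candidates in top_n.items():
        for associated in candidates:
            # Mutual condition: category must also rank in associated's top n.
            if category in top_n.get(associated, set()):
                selected.add((category, associated))
    return selected


# Hypothetical sequences; with n=1 only C1 and C2 rank first for each other.
example_sequences = {
    "C1": ["C2", "C3"],
    "C2": ["C1", "C3"],
    "C3": ["C2", "C1"],
}
selected_pairs = select_associated_pairs(example_sequences, n=1)
# selected_pairs == {("C1", "C2"), ("C2", "C1")}
```

In this example C3 ranks C2 first, but C2 does not rank C3 first, so the pair is dropped, which mirrors the deletion in step S207.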
Step S208: perform data statistics on each category commodity data in the associated data set according to statistical indicators, to obtain a category statistics table.
The category statistics table holds the statistical values, corresponding to the statistical indicators, of each category commodity data obtained by performing data statistics on each category commodity data in the associated data set. The statistical indicators can specifically include: the associated session count corresponding to each associated category commodity data of the current category commodity data on which the statistics are performed; the number of sessions to which the current category commodity data belongs; the number of sessions to which each associated category commodity data of the current category commodity data belongs; and the reverse count corresponding to each associated category commodity data of the current category commodity data. The reverse count corresponding to an associated category commodity data of the current category commodity data is defined as follows: when that associated category commodity data is taken as the new current category commodity data and the current category commodity data is taken as an associated category commodity data of the new current category commodity data, the reverse count is the number of sessions to which the new current category commodity data and that associated category commodity data both belong. Table 2 shows an example form of the category statistics table.
Table 2
As shown in Table 2, the six columns of the first row are the current category commodity data, the associated category commodity data, and four statistical indicators. The column for the current category commodity data holds third-level category IDs, and the associated category commodity data corresponding to each third-level category ID are the third-level category IDs that a user browsed together with it in one session. The column under each statistical indicator holds the statistical values of that indicator. For example, C1C2_Num denotes, when the current category commodity data is category C1, the associated session count corresponding to category C2, an associated category commodity data of category C1. It is the number of sessions to which category C1 and its associated category C2 both belong, in other words the number of sessions in which a user browsing category C1 also browsed category C2. C2C1_Num is the reverse count corresponding to category C2 as the associated category commodity data of category C1. It is the number of sessions to which category C2 and its associated category C1 both belong, in other words the number of sessions in which a user browsing category C2 also browsed category C1. For a given pair of current category commodity data and associated category commodity data, the associated session count and the reverse count have the same statistical value. C1_Num_Sum is the number of sessions to which category C1 belongs, i.e., the number of sessions in which users browsed category C1, and C2_Num_Sum is the number of sessions to which category C2 belongs, i.e., the number of sessions in which users browsed category C2.
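A sketch of how the four statistical indicators could be computed for each selected pair is given below. The function name, the dictionary keys (pair_num, reverse_num, current_num_sum, associated_num_sum), and the sample sessions are assumptions for illustration; they stand in for C1C2_Num, C2C1_Num, C1_Num_Sum, and C2_Num_Sum respectively. The sketch derives the counts from per-session browse rows, which is one possible reading of the text.

```python
from collections import defaultdict


def category_statistics(browse_rows, selected_pairs):
    """Build one statistics row per (current, associated) pair.

    browse_rows: iterable of (session_id, category_id) tuples (assumed layout).
    selected_pairs: ordered pairs kept in the associated data set.
    """
    sessions_of = defaultdict(set)
    for session_id, category_id in browse_rows:
        sessions_of[category_id].add(session_id)

    table = []
    for current, associated in sorted(selected_pairs):
        table.append({
            "current": current,
            "associated": associated,
            # Sessions shared by the pair, seen from the current category.
            "pair_num": len(sessions_of[current] & sessions_of[associated]),
            # The same intersection seen from the associated category.
            "reverse_num": len(sessions_of[associated] & sessions_of[current]),
            "current_num_sum": len(sessions_of[current]),
            "associated_num_sum": len(sessions_of[associated]),
        })
    return table


# Hypothetical sessions: C1 and C2 share S1 and S2; S3 has only C1, S4 only C2.
sample_rows = [
    ("S1", "C1"), ("S1", "C2"), ("S2", "C1"), ("S2", "C2"),
    ("S3", "C1"), ("S4", "C2"),
]
stats_table = category_statistics(sample_rows, {("C1", "C2")})
```

Since both counts are sizes of the same set intersection, the sketch makes it visible why the associated session count and the reverse count coincide for a given pair.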
Step S209: calculate the relevance score of each category commodity data and its corresponding associated category commodity data according to the statistical values, corresponding to the statistical indicators, of each category commodity data in the category statistics table.
Taking the case in Table 2 where the current category commodity data is category C1 and the associated category commodity data is category C2, the relevance score (Score) of category C1 and category C2 equals N1*N2, where N1 is the ratio of C1C2_Num to C1_Num_Sum and N2 is the ratio of C2C1_Num to C2_Num_Sum.
Because the associated session count and the reverse count have the same statistical value for a given pair of current category commodity data and associated category commodity data, in the formula Score=N1*N2 for the relevance score of each category commodity data and its corresponding associated category commodity data, N2 can also be the ratio of the associated session count corresponding to the associated category commodity data of the category commodity data to the number of sessions to which that associated category commodity data belongs.
The relevance score of each category commodity data and its corresponding associated category commodity data indicates how strongly the two category commodity data are associated. Its value ranges from 0 to 1, and the closer the value is to 1, the higher the degree of association between the two category commodity data. For example, the relevance score of category C1 and category C2 may be 0.09. The embodiment of the present invention sets two thresholds (a first threshold and a second threshold) to determine the degree of association between two category commodity data; the specific determination rule is described in detail below.
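The formula Score=N1*N2 can be written as a small Python function. The function and parameter names are hypothetical, and the input numbers are invented so that the result lands near the 0.09 mentioned above; they are not taken from Table 2.

```python
def relevance_score(pair_num, reverse_num, current_num_sum, associated_num_sum):
    """Score = N1 * N2.

    N1 = pair_num / current_num_sum        (e.g. C1C2_Num / C1_Num_Sum)
    N2 = reverse_num / associated_num_sum  (e.g. C2C1_Num / C2_Num_Sum)
    """
    n1 = pair_num / current_num_sum
    n2 = reverse_num / associated_num_sum
    return n1 * n2


# Hypothetical counts: C1 and C2 share 3 sessions; each appears in 10 sessions.
score_c1_c2 = relevance_score(pair_num=3, reverse_num=3,
                              current_num_sum=10, associated_num_sum=10)
# N1 = N2 = 0.3, so the score is approximately 0.09.
```

Each ratio is at most 1 because the shared sessions are a subset of the sessions of either category, which is why the product stays between 0 and 1.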
Step S210: compare the relevance score of each category commodity data and its corresponding associated category commodity data with the first threshold and the second threshold.
Taking the case in Table 2 where the current category commodity data is category C1 and the associated category commodity data is category C2, the relevance score Score (N1*N2) of category C1 and category C2 is compared with the first threshold and the second threshold.
Step S211: determine the degree of association of each category commodity data and its corresponding associated category commodity data according to the comparison result.
Following the example in step S210, when the relevance score Score (N1*N2) of category C1 and category C2 is greater than the first threshold, the degree of association of category C1 and category C2 is determined to be strong association; when the Score is less than the first threshold and greater than the second threshold, the degree of association is determined to be weak association; and when the Score is less than the second threshold, the degree of association is determined to be no association.
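The comparison in steps S210 and S211 can be sketched as a three-way classification. The threshold values 0.05 and 0.01 are invented for illustration. The text does not say how a score exactly equal to a threshold is handled; this sketch assumes such a score falls into the lower class, and that assumption is marked in the code.

```python
def classify_association(score, first_threshold, second_threshold):
    """Return the degree of association for a relevance score.

    Assumes first_threshold > second_threshold. Scores equal to a threshold
    are assigned to the lower class (an assumption; the text leaves it open).
    """
    if score > first_threshold:
        return "strong association"
    if score > second_threshold:
        return "weak association"
    return "no association"


# Hypothetical thresholds.
FIRST_THRESHOLD = 0.05
SECOND_THRESHOLD = 0.01

degree = classify_association(0.09, FIRST_THRESHOLD, SECOND_THRESHOLD)
# degree == "strong association"
```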
The embodiment of the present invention can construct a category relevance model to execute steps S202 to S211. After the historical browsing data is extracted, data cleansing is performed on it, and the cleansed historical browsing data serves as the input data of the category relevance model. The category relevance model executes steps S202 to S211 to calculate the relevance score of each category commodity data and its corresponding associated category commodity data, and the final output data of the model is the degree of association between the category commodity data (referred to simply as categories).
The input data of the category relevance model of the embodiment of the present invention rests on several model assumptions. First, the user browsing data fed back from the e-commerce platform is assumed to be the browsing behavior of normal users, which excludes cases such as competitors collecting information. Second, the user browsing behavior collected from the full-channel traffic details is assumed to be correct, i.e., free of outlier interference. Third, the information returned for each session_id is assumed to be correct, i.e., free of system errors. These three model assumptions mean that the historical browsing data after data cleansing (i.e., the input data of the model) is ideal data that satisfies all three assumptions and can be used by the model.
After the category relevance model determines the degree of association between the category commodity data, the output of the model can also be tested in the actual production system, for example with an A/B test of the model's results. One part of the category commodity data is taken as class A data: a recommendation condition is preset according to business requirements, and, according to the degree of association determined by the model, the associated category commodity data of the class A data that satisfy the preset recommendation condition are recommended to users. Another part of the category commodity data is taken as class B data, for which data is recommended to users without using the degree of association determined by the model. The purchase user conversion rate returned by the production system (the ratio of users who browse and then buy to all browsing users) is then used to check whether recommending data according to the degree of association determined by the model reaches the expected value. For example, if the purchase user conversion rate of the class B data is 5% and the purchase user conversion rate expected for the class A data is 10%, the expected value is 10%. If the expected value is not reached, the parameters of the category relevance model (i.e., the first threshold and the second threshold) are adjusted, which changes the degree of association determined for different category commodity data. The output of the category relevance model can be optimized continuously through repeated iterations of this loop.
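The check at the end of one A/B round can be sketched as follows. The function names and the user counts are hypothetical; how the thresholds are then adjusted is not specified in the text and is not shown.

```python
def purchase_conversion_rate(buying_users, browsing_users):
    """Ratio of users who browsed and bought to all browsing users."""
    if browsing_users == 0:
        return 0.0  # no traffic, so no conversion can be measured
    return buying_users / browsing_users


def needs_threshold_adjustment(a_buying_users, a_browsing_users, expected_rate):
    """True when class A (model-driven recommendations) misses the expected rate."""
    return purchase_conversion_rate(a_buying_users, a_browsing_users) < expected_rate


# Hypothetical round: 80 of 1000 class A users bought, against an expected 10%.
should_adjust = needs_threshold_adjustment(80, 1000, expected_rate=0.10)
# should_adjust is True, so the first and second thresholds would be tuned.
```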
The embodiment of the present invention can also set up an automatic execution script for the model input data and configure a scheduled task, so that the category relevance model runs automatically on a schedule. The category relevance model can also be connected to a business system (such as an e-commerce platform system) so that the output of the model is fed directly into the business system. This helps the e-commerce platform system understand more precisely, from the user's perspective, which categories are strongly or weakly associated. Based on the user's current browsing state, the e-commerce platform system can push to the user the categories that are strongly associated with the category being browsed, and can continuously optimize the recommended categories according to the user's browsing behavior. This increases the exposure of related categories and places the products that most interest the user in the best display positions, helping the user find products of interest, shortening the user's purchase cycle, and optimizing the user experience so that the user perceives the user-friendly design of the e-commerce platform system. The effect is an increase in the GMV (turnover) of a user's single purchase.
Another embodiment of the present invention provides a method for recommending data based on the degree of data association determined by the method for determining the degree of data association of the embodiment of the present invention.
Because the method for recommending data relies on the degree of data association determined by the method for determining the degree of data association of the embodiment of the present invention, the result of each step of that method can be applied in the method for recommending data.
The method for recommending data mainly includes: obtaining the data of the entry that the user is currently browsing, and, according to the determined degree of association between the data of the currently browsed entry and the corresponding associated entry data, recommending to the user the associated entry data that satisfy a preset recommendation condition.
The determined degree of association between the data of the entry that the user is currently browsing and the corresponding associated entry data is one of strong association, weak association, and no association. The preset recommendation condition can be set in advance according to business requirements. For example, the recommendation condition can be that the data of a recommended entry is the data of an entry strongly associated with the data of the currently browsed entry. Alternatively, it can be that the data of the recommended entries are the data of the M strongly associated entries with the highest relevance scores among the relevance scores of the data of the currently browsed entry with the data of each strongly associated entry (M is a natural number smaller than the total number of strongly associated entries of the data of the currently browsed entry).
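Both example recommendation conditions can be sketched with one Python function: with m left unset it returns every strongly associated entry, and with m set it returns the M strongly associated entries with the highest scores. The function name, the score dictionary, and the threshold value are hypothetical, and ties are broken by entry ID as an assumption.

```python
def recommend_entries(current, scores, first_threshold, m=None):
    """Recommend entries strongly associated with the currently browsed entry.

    scores: dict mapping (entry, associated_entry) to a relevance score.
    An entry is strongly associated when its score exceeds first_threshold.
    """
    strong = [
        (associated, score)
        for (entry, associated), score in scores.items()
        if entry == current and score > first_threshold
    ]
    # Highest score first; ties broken by entry ID (an assumption).
    strong.sort(key=lambda item: (-item[1], item[0]))
    if m is not None:
        strong = strong[:m]
    return [associated for associated, _ in strong]


# Hypothetical relevance scores.
example_scores = {
    ("C1", "C2"): 0.09,
    ("C1", "C3"): 0.20,
    ("C1", "C4"): 0.01,
    ("C2", "C1"): 0.09,
}
top_one = recommend_entries("C1", example_scores, first_threshold=0.05, m=1)
all_strong = recommend_entries("C1", example_scores, first_threshold=0.05)
# top_one == ["C3"]; all_strong == ["C3", "C2"]
```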
The method for recommending data of the embodiment of the present invention is applicable to recommending commodity data of associated categories in the e-commerce field, and is also applicable in other fields to recommending to a user data associated with the content that the user browses. The meaning of an entry of data can be defined as needed for the application scenario. For example, in the e-commerce field the data of an entry can be the ID of a commodity category at a certain level. In other application scenarios, such as a user browsing web pages other than e-commerce platform websites, the web page data is not commodity data divided by commodity category, so the data of one piece of content browsed by the user, determined from the content of the web page data, can serve as the data of an entry.
The method for determining the degree of data association and the method for recommending data of the embodiment of the present invention mine the degree of association between different categories from users' browsing behavior data, and introduce session_id (the ID of the session that a user initiates for one browsing visit) to bound the data used for category relevance. Only categories accessed together in one session of a user can be treated as associated categories; two categories that occur in different sessions are not considered associated. This is because each browsing visit (session) of a user is an outward expression of the user's mental activity, and a user browses categories with some purpose, so only categories within the same browsing behavior best reflect which categories are related in the user's mind. The category relevance model of the embodiment of the present invention is built on this consideration. It does not blindly pool the data of every browsing visit of a user, which avoids the low accuracy in determining the category degree of association that results from treating hot-search categories on the e-commerce platform as associated categories. In addition, the historical browsing behavior of all users is counted as a whole to generate the association between different categories, which takes into account both the individual behavior of a user and the group effect of users. Scoring different categories with the number of individual sessions as an implicit weight mines the strong and weak correlations between categories, which can improve both the operating efficiency of the model and the precision with which the model computes category relevance. Furthermore, when the e-commerce platform system obtains the user's current browsing behavior, it can provide targeted, personalized product recommendations according to the user's preferences. This provides guidance for the sale of consumer goods on the e-commerce platform and helps bring greater business value to the e-commerce platform system in actual e-commerce business.
Fig. 3 is a schematic diagram of the main modules of the device for determining the degree of data association according to an embodiment of the present invention.
The device 300 for determining the degree of data association of the embodiment of the present invention mainly includes an associated data set determining module 301, a data statistics module 302, and a data association degree determining module 303.
The associated data set determining module 301 is configured to determine an associated data set according to the user's historical browsing data.
The associated data set includes at least the data of two associated entries. The meaning of an entry of data can be defined as needed for the application scenario of the embodiment of the present invention. For example, when determining the degree of association between different category commodity data in the e-commerce field, the data of an entry can be the ID of a commodity category at a certain level. In other application scenarios, such as a user browsing web page data other than that of e-commerce platform websites, the web page data is not commodity data divided by commodity category, so the data of one piece of content browsed by the user, determined from the content of the web page data, can serve as the data of an entry.
Specifically, the associated data set determining module 301 is configured to: determine the associated entry data of each entry data according to the information of the session to which each entry data in the user's historical browsing data belongs, where the associated entry data of an entry data are the data of the one or more entries that belong to the same session as the entry data; for the data of each entry, count the associated session count corresponding to each associated entry data of the entry data, and sort the associated entry data of the entry data by the size of the associated session counts to obtain the associated entry data sequence of the entry data, where the associated session count corresponding to an associated entry data of the entry data is the number of sessions to which the entry data and the associated entry data both belong; and screen the data of all entries according to a preset rule, so as to determine the associated data set from the selected entry data. The preset rule may include: if, for an entry data and one of its associated entry data, each of the two is located within the top N positions of the other's associated entry data sequence, the two are selected into the associated data set, where N is a natural number.
The data statistics module 302 is configured to perform data statistics on each entry data in the associated data set according to statistical indicators, to obtain the statistical values, corresponding to the statistical indicators, of each entry data.
The statistical indicators can specifically include: the associated session count corresponding to each associated entry data of the current entry data on which the statistics are performed; the number of sessions to which the current entry data belongs; the number of sessions to which each associated entry data of the current entry data belongs; and the reverse count corresponding to each associated entry data of the current entry data. The reverse count corresponding to an associated entry data of the current entry data is defined as follows: when that associated entry data is taken as the new current entry data and the current entry data is taken as an associated entry data of the new current entry data, the reverse count is the number of sessions to which the new current entry data and that associated entry data both belong.
The data association degree determining module 303 is configured to determine the degree of association of each entry data and its corresponding associated entry data according to the statistical values, corresponding to the statistical indicators, of each entry data.
Specifically, the data association degree determining module 303 is configured to calculate the relevance score of each entry data and its corresponding associated entry data according to those statistical values, and to determine the degree of association of each entry data and its corresponding associated entry data according to the relevance score.
The data association degree determining module 303 can specifically include a calculation submodule configured to: for each entry data, calculate the relevance score of the entry data and a corresponding associated entry data according to the formula Score=N1*N2, where Score is the relevance score of the entry data and the associated entry data, N1 is the ratio of the associated session count corresponding to the associated entry data of the entry data to the number of sessions to which the entry data belongs, and N2 is the ratio of the reverse count corresponding to the associated entry data of the entry data to the number of sessions to which the associated entry data belongs.
The data association degree determining module 303 can also include a determining submodule configured to compare the relevance score of each entry data and its corresponding associated entry data with the first threshold and the second threshold, where: when the relevance score is greater than the first threshold, the degree of association of the entry data and the associated entry data is determined to be strong association; when the relevance score is less than the first threshold and greater than the second threshold, the degree of association is determined to be weak association; and when the relevance score is less than the second threshold, the degree of association is determined to be no association.
Another embodiment of the present invention further provides a device for recommending data based on the degree of data association determined by the device 300 for determining the degree of data association of the embodiment of the present invention.
The device for recommending data mainly includes an obtaining module and a recommending module.
The obtaining module is configured to obtain the data of the entry that the user is currently browsing.
The recommending module is configured to recommend to the user, according to the determined degree of association between the data of the currently browsed entry and the corresponding associated entry data, the associated entry data that satisfy a preset recommendation condition.
The determined degree of association between the data of the entry that the user is currently browsing and the corresponding associated entry data is one of strong association, weak association, and no association. The preset recommendation condition can be set in advance according to business requirements. For example, the recommendation condition can be that the data of a recommended entry is the data of an entry strongly associated with the data of the currently browsed entry. Alternatively, it can be that the data of the recommended entries are the data of the M strongly associated entries with the highest relevance scores among the relevance scores of the data of the currently browsed entry with the data of each strongly associated entry (M is a natural number smaller than the total number of strongly associated entries of the data of the currently browsed entry).
In addition, the specific implementation of the device 300 for determining the degree of data association, and of the device for recommending data based on the degree of data association determined by the device 300, has been described in detail above in the method for determining the degree of data association and in the method for recommending data based on the degree of data association so determined, so the repeated content is not described again here.
Fig. 4 show can using the embodiment of the present invention really fixed number according to the method for the degree of association, the method for recommending data, really Fixed number according to the device of the degree of association or the device of recommending data exemplary system architecture 400.
As shown in Fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404 and a server 405. The network 404 is the medium that provides communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 401, 402, 403 to interact with the server 405 through the network 404, to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 401, 402, 403, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software.
The terminal devices 401, 402, 403 may be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, laptop computers, and desktop computers.
The server 405 may be a server that provides various services, for example a back-end management server that supports a shopping website browsed by users with the terminal devices 401, 402, 403. The back-end management server may analyze and otherwise process received data such as product information query requests, and feed the processing results (for example recommendation information or product information) back to the terminal devices.
It should be noted that the method for determining data correlation degree or the method for recommending data provided by the embodiment of the present invention is generally executed by the server 405; correspondingly, the device for determining data correlation degree or the device for recommending data is generally arranged in the server 405.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 4 are merely illustrative. There may be any number of terminal devices, networks and servers, as required by the implementation.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 of a server suitable for implementing the embodiment of the present application. The server shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiment of the present application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse and the like; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to the embodiments disclosed in the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment disclosed in the present invention includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus or device. Also in the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and any combination of blocks in a block diagram or flowchart, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor including an associated data set determining module 301, a data statistics module 302, and a data correlation degree determining module 303. The names of these modules do not, under certain circumstances, limit the modules themselves; for example, the associated data set determining module 301 may also be described as "a module for determining an associated data set according to the historical browsing data of a user".
In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: determine an associated data set according to the historical browsing data of a user, the associated data set including at least the data of two associated entries; perform data statistics on each entry data in the associated data set according to statistical indicators, to obtain the statistical value of each entry data corresponding to the statistical indicators; and determine the degree of association between each entry data and the corresponding associated entry data according to the statistical value of each entry data corresponding to the statistical indicators.
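The first step, deriving associated entries from the sessions in the browsing history and ordering them by the number of shared sessions (as recited in claim 2), can be sketched as follows. This is only an illustrative sketch: the function name, the `sessions` layout and the sample data are assumptions, and the screening by a preset rule is omitted because the text here does not specify that rule.

```python
from collections import Counter, defaultdict
from itertools import permutations


def associated_entry_sequences(sessions):
    """Map each entry to its associated entries, sorted by shared-session count.

    sessions maps a session id to the list of entries browsed in that session
    (a hypothetical representation of the historical browsing data).
    """
    counts = defaultdict(Counter)
    for entries in sessions.values():
        # Each session is counted once per ordered pair of distinct entries.
        for entry, other in permutations(set(entries), 2):
            counts[entry][other] += 1
    # Sort by descending count; ties are broken by entry name for determinism.
    return {
        entry: sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        for entry, counter in counts.items()
    }


# Hypothetical browsing sessions.
sessions = {
    "s1": ["phone", "case", "charger"],
    "s2": ["phone", "case"],
    "s3": ["phone", "charger", "case"],
    "s4": ["laptop", "mouse"],
}
sequences = associated_entry_sequences(sessions)
```

In this sample, "phone" shares three sessions with "case" and two with "charger", so "case" comes first in the sequence for "phone".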
According to the technical solution of the embodiment of the present invention, an associated data set is determined according to the historical browsing data of a user; data statistics are then performed on each entry data in the associated data set according to statistical indicators, to obtain the statistical value of each entry data corresponding to the statistical indicators; and the degree of association between each entry data and the corresponding associated entry data is determined according to those statistical values. This can improve the precision with which the degree of association between data is calculated and determined, and helps to understand the association between data more accurately from the user's perspective, so that data with a high degree of association with the data the user is browsing is recommended to the user and the user experience is improved.
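The last step can be illustrated with the formula and thresholds recited in the claims, Score = N1 * N2, followed by a comparison with a first and a second threshold. The sketch below is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the reversion count is passed in as a given statistic rather than derived, the threshold values 0.5 and 0.1 are invented for the example, and the handling of a score exactly equal to a threshold (not specified in the claims) is an arbitrary choice.

```python
def relevance_score(assoc_session_count, entry_session_count,
                    reversion_count, assoc_entry_session_count):
    """Score = N1 * N2 as given in the claims.

    N1: associated session count / number of sessions of the entry.
    N2: reversion count / number of sessions of the associated entry.
    """
    n1 = assoc_session_count / entry_session_count
    n2 = reversion_count / assoc_entry_session_count
    return n1 * n2


def degree_of_association(score, first_threshold, second_threshold):
    """Classify a score as strong, weak or no association.

    Assumption: a score equal to a threshold falls into the lower class,
    since the claims only state the strict inequalities.
    """
    if score > first_threshold:
        return "strong"
    if score > second_threshold:
        return "weak"
    return "none"


# Hypothetical statistics: N1 = 3/4 and N2 = 3/6.
score = relevance_score(3, 4, 3, 6)
degree = degree_of_association(score, first_threshold=0.5, second_threshold=0.1)
```

With these sample numbers the score is 0.375, which lies between the two assumed thresholds and is therefore classified as a weak association.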
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations and substitutions may occur. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (14)

1. A method for determining data correlation degree, characterized by comprising:
determining an associated data set according to the historical browsing data of a user, the associated data set including at least the data of two associated entries;
performing data statistics on each entry data in the associated data set according to statistical indicators, to obtain the statistical value of each entry data;
determining the degree of association between each entry data and the associated entry data according to the statistical value of each entry data.
2. The method according to claim 1, characterized in that the step of determining an associated data set according to the historical browsing data of a user comprises:
determining the associated entry data of each entry data according to information on the sessions to which each entry data in the historical browsing data of the user belongs; wherein the associated entry data of an entry data is the data of one or more entries that belong to the same session as the entry data;
for the data of each entry, separately counting the associated session count corresponding to each associated entry data of the entry data, and sorting the associated entry data of the entry data by the size of the associated session count, to obtain an associated entry data sequence of the entry data; wherein the associated session count corresponding to an associated entry data of the entry data is the number of sessions to which the entry data and the associated entry data both belong;
screening the data of all entries according to a preset rule, so as to determine the associated data set from the data of the selected entries.
3. The method according to claim 1, characterized in that the step of determining the degree of association between each entry data and the associated entry data according to the statistical value of each entry data comprises:
calculating the relevance score between each entry data and the associated entry data according to the statistical value of each entry data;
determining the degree of association between each entry data and the associated entry data according to the relevance score.
4. The method according to claim 3, characterized in that the step of calculating the relevance score between each entry data and the associated entry data according to the statistical value of each entry data comprises:
for each entry data, calculating the relevance score between the entry data and an associated entry data according to the following formula: Score = N1 * N2, wherein Score is the relevance score between the entry data and the associated entry data, N1 is the ratio of the associated session count corresponding to the associated entry data of the entry data to the number of sessions to which the entry data belongs, and N2 is the ratio of the reversion count corresponding to the associated entry data of the entry data to the number of sessions to which the associated entry data of the entry data belongs, wherein the values of N1 and N2 are obtained from the statistical values.
5. The method according to claim 3, characterized in that the step of determining the degree of association between each entry data and the associated entry data according to the relevance score comprises:
comparing the relevance score between each entry data and the associated entry data with a first threshold and a second threshold, wherein:
when the relevance score is greater than the first threshold, the degree of association between the entry data and the associated entry data is determined to be strong association;
when the relevance score is less than the first threshold and greater than the second threshold, the degree of association between the entry data and the associated entry data is determined to be weak association;
when the relevance score is less than the second threshold, the degree of association between the entry data and the associated entry data is determined to be no association.
6. A method for recommending data based on the data correlation degree determined by the method according to any one of claims 1 to 5, characterized by comprising:
obtaining the data of the entry the user is currently browsing;
recommending to the user, according to the determined degree of association between the data of the entry the user is currently browsing and the associated entry data, the associated entry data that meets a preset recommendation condition.
7. A device for determining data correlation degree, characterized by comprising:
an associated data set determining module, for determining an associated data set according to the historical browsing data of a user, the associated data set including at least the data of two associated entries;
a data statistics module, for performing data statistics on each entry data in the associated data set according to statistical indicators, to obtain the statistical value of each entry data;
a data correlation degree determining module, for determining the degree of association between each entry data and the associated entry data according to the statistical value of each entry data.
8. The device according to claim 7, characterized in that the associated data set determining module is further used for:
determining the associated entry data of each entry data according to information on the sessions to which each entry data in the historical browsing data of the user belongs; wherein the associated entry data of an entry data is the data of one or more entries that belong to the same session as the entry data;
for the data of each entry, separately counting the associated session count corresponding to each associated entry data of the entry data, and sorting the associated entry data of the entry data by the size of the associated session count, to obtain an associated entry data sequence of the entry data; wherein the associated session count corresponding to an associated entry data of the entry data is the number of sessions to which the entry data and the associated entry data both belong;
screening the data of all entries according to a preset rule, so as to determine the associated data set from the data of the selected entries.
9. The device according to claim 7, characterized in that the data correlation degree determining module is further used for:
calculating the relevance score between each entry data and the associated entry data according to the statistical value of each entry data;
determining the degree of association between each entry data and the associated entry data according to the relevance score.
10. The device according to claim 9, characterized in that the data correlation degree determining module includes a calculating submodule, used for:
for each entry data, calculating the relevance score between the entry data and an associated entry data according to the following formula: Score = N1 * N2, wherein Score is the relevance score between the entry data and the associated entry data, N1 is the ratio of the associated session count corresponding to the associated entry data of the entry data to the number of sessions to which the entry data belongs, and N2 is the ratio of the reversion count corresponding to the associated entry data of the entry data to the number of sessions to which the associated entry data of the entry data belongs, wherein the values of N1 and N2 are obtained from the statistical values.
11. The device according to claim 9, characterized in that the data correlation degree determining module further includes a determining submodule, used for:
comparing the relevance score between each entry data and the associated entry data with a first threshold and a second threshold, wherein:
when the relevance score is greater than the first threshold, the degree of association between the entry data and the associated entry data is determined to be strong association;
when the relevance score is less than the first threshold and greater than the second threshold, the degree of association between the entry data and the associated entry data is determined to be weak association;
when the relevance score is less than the second threshold, the degree of association between the entry data and the associated entry data is determined to be no association.
12. A device for recommending data based on the data correlation degree determined by the device according to any one of claims 7 to 11, characterized by comprising:
an obtaining module, for obtaining the data of the entry the user is currently browsing;
a recommending module, for recommending to the user, according to the determined degree of association between the data of the entry the user is currently browsing and the associated entry data, the associated entry data that meets a preset recommendation condition.
13. A server, characterized by comprising:
one or more processors;
a memory, for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 6.
14. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN201711032881.5A 2017-10-30 2017-10-30 A kind of method and apparatus, data recommendation method and the device of determining data correlation degree Pending CN109727047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711032881.5A CN109727047A (en) 2017-10-30 2017-10-30 A kind of method and apparatus, data recommendation method and the device of determining data correlation degree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711032881.5A CN109727047A (en) 2017-10-30 2017-10-30 A kind of method and apparatus, data recommendation method and the device of determining data correlation degree

Publications (1)

Publication Number Publication Date
CN109727047A true CN109727047A (en) 2019-05-07

Family

ID=66291819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711032881.5A Pending CN109727047A (en) 2017-10-30 2017-10-30 A kind of method and apparatus, data recommendation method and the device of determining data correlation degree

Country Status (1)

Country Link
CN (1) CN109727047A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070022003A1 (en) * 2005-07-19 2007-01-25 Hui Chao Producing marketing items for a marketing campaign
CN102402766A (en) * 2011-12-27 2012-04-04 纽海信息技术(上海)有限公司 User interest modeling method based on web page browsing
CN103246980A (en) * 2012-02-02 2013-08-14 阿里巴巴集团控股有限公司 Information output method and server
CN103839167A (en) * 2012-11-21 2014-06-04 大连灵动科技发展有限公司 Commodity candidate set recommendation method
CN103839169A (en) * 2012-11-21 2014-06-04 大连灵动科技发展有限公司 Personalized commodity recommendation method based on frequency matrix and text similarity

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐清: "B2C电子商务中商品推荐模型研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532478A (en) * 2019-09-04 2019-12-03 北京人民在线网络有限公司 A kind of dissemination of news method based on big data processing
CN110532478B (en) * 2019-09-04 2022-05-03 北京人民在线网络有限公司 News dissemination method based on big data processing
CN111416904A (en) * 2020-03-13 2020-07-14 维沃移动通信有限公司 Data processing method, electronic device and medium
CN111784342A (en) * 2020-06-28 2020-10-16 广东金宇恒软件科技有限公司 Centralized payment dynamic monitoring management system based on big data
CN111784342B (en) * 2020-06-28 2023-08-25 广东金宇恒软件科技有限公司 Dynamic monitoring management system based on big data centralized payment
CN112633901A (en) * 2020-12-18 2021-04-09 深圳市思为软件技术有限公司 Guest judging method and related equipment
CN113515575A (en) * 2021-06-16 2021-10-19 北京格灵深瞳信息技术股份有限公司 Associated data processing method and device, electronic equipment and storage medium
CN114357411A (en) * 2022-01-19 2022-04-15 英才(广州)在线教育科技有限公司 Online education system based on block chain

Similar Documents

Publication Publication Date Title
CN109727047A (en) A kind of method and apparatus, data recommendation method and the device of determining data correlation degree
CN108153901A (en) The information-pushing method and device of knowledge based collection of illustrative plates
CN108664513B (en) Method, device and equipment for pushing keywords
CN107273436A (en) The training method and trainer of a kind of recommended models
CN107679211A (en) Method and apparatus for pushed information
WO2020088058A1 (en) Information generating method and device
CN110111167A (en) A kind of method and apparatus of determining recommended
CN110020162B (en) User identification method and device
CN109388548A (en) Method and apparatus for generating information
CN107944956A (en) Method and apparatus for generating information
CN110413872A (en) Method and apparatus for showing information
US20110004508A1 (en) Method and system of generating guidance information
CN107885873A (en) Method and apparatus for output information
CN107169077A (en) Method and apparatus for pushed information
CN107977678A (en) Method and apparatus for output information
CN109901987A (en) A kind of method and apparatus generating test data
CN107784076A (en) The method and apparatus of visualization structure user behavior data
CN105095357A (en) Method and device for processing consultation data
CN109190027A (en) Multi-source recommended method, terminal, server, computer equipment, readable medium
CN107291835A (en) A kind of recommendation method and apparatus of search term
CN108197298A (en) A kind of smart shopper exchange method and system based on natural language processing
CN109711917A (en) Information-pushing method and device
CN109785072A (en) Method and apparatus for generating information
CN110516033A (en) A kind of method and apparatus calculating user preference
CN109993566A (en) A kind of method and apparatus for predicting product objective data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190507