CN107122367B - User attribute value calculation method and device based on user browsing behavior - Google Patents

Publication number
CN107122367B
Authority
CN
China
Prior art keywords
user
attribute value
page
website
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610104707.6A
Other languages
Chinese (zh)
Other versions
CN107122367A (en
Inventor
李辉
高俊鑫
沈栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610104707.6A priority Critical patent/CN107122367B/en
Publication of CN107122367A publication Critical patent/CN107122367A/en
Application granted granted Critical
Publication of CN107122367B publication Critical patent/CN107122367B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The specific embodiment of the application provides a user attribute value calculation method and a calculation device based on user browsing behaviors, wherein the user attribute value calculation method comprises the following steps: calling a historical webpage which is accessed by a user daily from a user information database; acquiring a page attribute value corresponding to the historical webpage according to a page attribute value database; and obtaining the user attribute value of the corresponding user according to the page attribute value. The user attribute value calculation means includes: the scheduling equipment is used for calling the historical webpage which is accessed by the user in daily life from the user information database; the acquisition equipment is used for acquiring the page attribute value corresponding to the historical webpage according to a page attribute value database; and the obtaining device is used for obtaining the user attribute value of the corresponding user according to the page attribute value. According to the method and the device, the attribute information of the user can be fully known without the application of the user, and the service push or the information provision can be conveniently carried out in a targeted manner.

Description

User attribute value calculation method and device based on user browsing behavior
Technical Field
The present application relates to the field of computers, and in particular, to a method for obtaining user attribute values, and more particularly, to a method and an apparatus for calculating user attribute values based on user browsing behavior.
Background
With the development of the internet, and especially the close integration of the internet and finance in recent years, how to use the big data accumulated on the internet to serve the financial industry effectively has become a technical problem that urgently needs solving. In the big data era, most of the log information generated when users access websites is recorded, including login information, browsing behavior information, mouse movement information, keystroke behavior information, user attribute information and the like. Different websites often have different themes: some focus mainly on finance, some on science, and some on politics. Therefore, various behaviors of a user can be analyzed from the web pages the user browses daily.
Specifically, in the internet finance field, a financial website generally includes pages related to investment and financial management and pages related to loans. A user with an investment need tends to visit more investment and financial management pages and pays attention to information such as investment returns and risk, while a user with a loan need tends to visit more loan pages and pays attention to information such as loan interest rates and loan terms. Thus, the user's fund demand can be inferred from the finance-related web pages the user browses daily.
Knowing the value of a user's fund demand attribute is very useful, both for marketing to the user and for allocating funds. Those skilled in the art therefore urgently need a method for obtaining a user's fund demand from the user's browsing behavior, so that financial service providers can use the big data on the internet to serve the internet finance industry effectively and promote its development.
Disclosure of Invention
In view of this, the technical problem to be solved by the present application is to provide a user attribute value calculation method and a calculation apparatus based on a user browsing behavior, so as to solve the problem that a user attribute value cannot be obtained according to a behavior of a user browsing a webpage in the prior art.
In order to solve the above problem, a specific embodiment of the present application provides a user attribute value calculation method based on a user browsing behavior, including: calling a historical webpage which is accessed by a user daily from a user information database; acquiring a page attribute value corresponding to the historical webpage according to a page attribute value database; and obtaining the user attribute value of the corresponding user according to the page attribute value.
Another specific embodiment of the present application further provides a device for calculating a user attribute value based on a user browsing behavior, including: the scheduling equipment is used for calling the historical webpage which is accessed by the user in daily life from the user information database; the acquisition equipment is used for acquiring the page attribute value corresponding to the historical webpage according to a page attribute value database; and the obtaining device is used for obtaining the user attribute value of the corresponding user according to the page attribute value.
According to the foregoing embodiments of the present application, the method and the device for calculating a user attribute value based on user browsing behavior have at least the following beneficial effects or characteristics: the logs of a user's daily website visits are analyzed, big data information of the visited pages is mined, and a statistical model is designed to determine the user's attribute value, so that the user's needs can be fully understood without the user submitting any application. This makes it convenient to push information or provide services to the user in a targeted manner, thereby promoting the rapid development of the national economy.
Of course, it is not necessary for any product or method of the present application to achieve all of the above-described advantages at the same time.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the scope of the invention, as claimed.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a flowchart of a user attribute value calculation method based on user browsing behavior according to an embodiment of the present application;
Fig. 2 is a flowchart of generating a page attribute value database based on user browsing behavior according to an embodiment of the present application;
Fig. 3 is a comprehensive flowchart of a user attribute value calculation method based on user browsing behavior according to an embodiment of the present application;
Fig. 4 is a block diagram of an apparatus for calculating user attribute values based on user browsing behavior according to an embodiment of the present application;
Fig. 5 is a block diagram of a unit for solving web page attribute values based on user browsing behavior according to an embodiment of the present application;
Fig. 6 is a general block diagram of a user attribute value calculation apparatus based on user browsing behavior according to an embodiment of the present application.
Detailed Description
For the purpose of promoting a clear understanding of the objects, aspects and advantages of the embodiments of the present application, reference will now be made to the accompanying drawings and detailed description, wherein like reference numerals refer to like elements throughout.
The illustrative embodiments and descriptions of the present application are provided to explain the present application and not to limit the present application. Additionally, the same or similar numbered elements/components used in the drawings and the embodiments are used to represent the same or similar parts.
As used herein, "first," "second," …, etc., are not specifically intended to mean in a sequential or chronological order, nor are they intended to limit the application, but merely to distinguish between elements or operations described in the same technical language.
With respect to directional terminology used herein, for example: up, down, left, right, front or rear, etc., are simply directions with reference to the drawings. Accordingly, the directional terminology used is intended to be illustrative and is not intended to be limiting of the present teachings.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including, but not limited to.
As used herein, "and/or" includes any and all combinations of the described items.
As used herein, the terms "substantially", "about" and the like are used to modify any slight variation in quantity or error that does not alter the nature of the variation. Generally, the range of slight variations or errors modified by such terms may be 20% in some embodiments, 10% in some embodiments, 5% in some embodiments, or other values. It should be understood by those skilled in the art that the aforementioned values can be adjusted according to actual needs, and are not limited thereto.
Certain words used to describe the present application are discussed below or elsewhere in this specification to provide additional guidance to those skilled in the art in describing the present application.
Fig. 1 is a flowchart of a user attribute value calculation method based on user browsing behavior according to an embodiment of the present application. As shown in Fig. 1, the logs of a user's daily website visits are analyzed and data mining is performed on the visited web pages to obtain the user's demand (for example, an investment demand or a loan demand). The user's demand can thus be fully understood without the user submitting any application, and information or services can be pushed to the user in a targeted manner.
The figure shows an embodiment comprising:
Step 101: Retrieve the historical web pages visited daily by the user from the user information database. In an embodiment of the present application, the web pages browsed by a user in each service scenario are collected. The web pages visited by user A are denoted U(A) = {u_1, u_2, …, u_n}, where u_i is the URL of the i-th web page visited by user A and n is the number of web pages visited by user A during a preset history period. The preset history period may be one month, half a year, one year, three years and so on. The historical web pages include all kinds of web pages that the user visits daily, such as news, science and technology, financial loan, political and entertainment web pages. The user information database may hold browsing information recorded by one large website, or browsing information recorded jointly by several websites and stored in a shared database that multiple service systems can call; the present application does not limit this.
Step 102: Filter out the historical web pages that are irrelevant to the target attribute. In an embodiment of the present application, in order to save subsequent processing overhead, the web pages to be analyzed may be specified according to the business requirement or the target attribute. For example, historical web pages that do not belong to the specified sites are filtered out, so that only web pages of sites (websites) related to financial loans are retained, such as websites related to a financial institution, a personal loan, a your own loan, and the like. The URL list of the specified web pages to be analyzed is F = {f_1, f_2, …, f_m}, where f_i is the i-th specified URL and m is the number of specified URLs.
In an embodiment of the present application, step 102 further includes:
step 1021: web pages of the site related to the target attribute are collected. If the objective attributes of the present application require information in order to obtain the user's funds, then the sites associated with financial lending include: land and gold houses, personal credits, your own credits and the like, which are related to investing, managing and loan financing.
Step 1022: and filtering out webpages which do not belong to the website webpages in the historical webpages according to the URLs. Because each webpage has a unique URL, historical webpages which do not belong to the specified sites can be easily filtered out according to the URL, and therefore the accuracy of the user attribute value of the user is improved.
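The URL-based filtering of steps 1021 and 1022 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `filter_by_site`, the example host names, and the choice to match on the URL's host name are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def filter_by_site(history_urls, site_hosts):
    """Keep only the historical URLs whose host is one of the specified sites."""
    allowed = {host.lower() for host in site_hosts}
    kept = []
    for url in history_urls:
        # hostname is None for malformed URLs; treat those as not matching.
        host = (urlsplit(url).hostname or "").lower()
        if host in allowed:
            kept.append(url)
    return kept


# Hypothetical browsing history U(A) and specified sites.
history = [
    "https://news.example.com/today",
    "https://invest.example.com/funds/1",
    "https://loan.example.com/rates",
]
specified_sites = ["invest.example.com", "loan.example.com"]
relevant = filter_by_site(history, specified_sites)
```

Only the two finance-related URLs survive the filter; the news page is discarded before any further processing.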
Step 103: Obtain the page attribute value corresponding to each historical web page from a page attribute value database. The page attribute values of the historical web pages are data in the page attribute value database; the method for generating this database is described in detail below. A page attribute value represents the tendency of a page. For web pages related to investment and loans, for example, if -1 represents loans and +1 represents investment, the page attribute value lies between -1 and +1: the closer it is to -1, the more the page content relates to loans, and the closer it is to +1, the more the page content relates to investment.
Step 104: Obtain the user attribute value of the corresponding user from the page attribute values. The user attribute value represents the target tendency of the user. For example, if the user attribute value lies in the interval [-1, +1], where -1 indicates a loan demand and +1 indicates an investment demand, then the closer the value is to -1, the stronger the user's loan demand, and the closer it is to +1, the stronger the user's investment demand; the present application does not limit this.
In an embodiment of the present application, P(A) denotes the user attribute value determined for user A, where -1 ≤ P(A) ≤ 1. The closer P(A) is to 1, the more user A's demand lies in investment and financial management; the closer P(A) is to -1, the more user A's demand lies in loans.
In a specific embodiment of the present application, obtaining a user attribute value of a corresponding user according to the page attribute value specifically includes:
Step 1041: Obtain the time at which the user accessed each historical web page. The access time is recorded together with each historical web page when the user's visit is logged.
Step 1042: Assign a weight to each page attribute value according to the access time. In general, the further the access time is from the current time, the smaller the weight assigned to the accessed historical page. This reflects the fact that the user's target attribute may differ from period to period, and the activity closest to the current time best reflects the user's current target attribute.
Step 1043: Obtain the user attribute value of the corresponding user from the page attribute values and the weights. To obtain the user attribute value at the current time accurately, the present application takes into account that a user's attribute value may differ between time periods; for example, a user may have had an investment need last year but have a loan need this year. An access record (a visited web page) further from the current time contributes less to determining the current user attribute value, and an access record closer to the current time is more reliable for that purpose. The attribute values of the web pages visited by the user therefore need to be attenuated over time, and exponential decay is generally used. In a specific embodiment of the present application, decay(t_i) is defined as the decay function, decay(t_i) = exp(-δ(t_i)), where t_i is the time at which user A visited a historical web page, δ(t_i) > 0 is the time elapsed between t_i and the current time, and 0 ≤ decay(t_i) ≤ 1.
In a specific embodiment of the present application, a specific calculation formula of the user attribute value p (a) may be:
Figure GDA0002384200290000061
where P(A) is the current user attribute value of user A, with -1 ≤ P(A) ≤ 1; decay(t_i) is the decay function, decay(t_i) = exp(-δ(t_i)), with 0 ≤ decay(t_i) ≤ 1; t_i is the time at which user A visited a historical web page; δ(t_i) > 0 is the time elapsed between t_i and the current time; H(A) denotes the historical web pages visited by user A; H(A, t_i) denotes the historical web page visited by user A at time t_i; P(H(A, t_i)) denotes the page attribute value of the historical web page visited by user A at time t_i; and the subscript H(A, t_i) ∈ H(A) indicates that the web page visited by user A at time t_i belongs to the historical web pages visited by user A.
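The time-decayed combination of steps 1041 to 1043 might look like the sketch below. Because the formula image is not available in this text, the exact form is an assumption: the sketch uses a decay-weighted average of the page attribute values, which is one form consistent with the stated bound -1 ≤ P(A) ≤ 1. The parameter `scale_days`, which converts elapsed days into δ(t_i), and both function names are hypothetical.

```python
import math


def decay(elapsed_days, scale_days=30.0):
    """decay(t_i) = exp(-delta(t_i)), with delta measured in units of scale_days (assumed)."""
    return math.exp(-elapsed_days / scale_days)


def user_attribute_value(visits, scale_days=30.0):
    """visits: list of (elapsed_days, page_attribute_value) pairs for one user.

    Returns the decay-weighted average of the page attribute values.
    The normalisation by the sum of weights is an assumption of this sketch.
    """
    weights = [decay(elapsed, scale_days) for elapsed, _ in visits]
    total = sum(weights)
    if total == 0:
        return 0.0  # no usable history
    weighted = sum(w * value for w, (_, value) in zip(weights, visits))
    return weighted / total


# Two recent loan-leaning pages and one old investment-leaning page.
visits = [(1, -0.8), (5, -0.6), (300, 0.9)]
p_a = user_attribute_value(visits)
```

In this example the recent loan-related visits dominate, so `p_a` is negative even though the oldest page leans strongly towards investment.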
Step 105: Push specific information to the corresponding user according to the user attribute value. After the user attribute value is obtained, corresponding service information or service consultation information may be pushed to the user in a targeted manner, or specific information may be pushed to the user through a third-party channel; for example, specific information may be pushed to the user's mobile terminal over a mobile communication network.
Fig. 2 is a flowchart of generating a page attribute value database based on user browsing behavior according to an embodiment of the present application. As shown in Fig. 2, the web pages of sites related to the target attribute are collected (the historical web pages visited daily by a user are a subset of these site web pages). The processing frequency of each site web page is determined from the aggregated visit frequency of that page over all users, and the pages are processed at that frequency; normalization, word segmentation and word filtering are then performed in sequence. The words of all site web pages, or of a randomly selected subset of site web pages, are chosen to form a dictionary vocabulary, the attribute value of each word in the dictionary vocabulary is calculated, and finally the page attribute value of each site web page is computed from the words in the dictionary vocabulary.
The figure shows an embodiment comprising:
step 100: web pages of the site related to the target attribute are collected. If the objective attribute specified in the present application is to obtain the fund demand information of the user, the web page of the site related to the objective attribute is the web page related to the financial transaction, for example, the site related to the financial loan comprises: land and gold houses, personal credits, your own credits and the like, which are related to investing, managing and loan financing. In other specific embodiments of the present application, the website webpages related to the target attributes may not be specified in the early collection process, and the website webpages not belonging to the specified websites (the websites related to financial loan administration) may be filtered out according to the URLs of the webpages in the post-processing process, so that the post-processing overhead is saved.
Step 200: and processing the website webpage to obtain a word list corresponding to the website webpage.
In a specific embodiment of the present application, step 200 may specifically include:
Step 2001: Obtain the access heat of the site web pages, so that the information of the site web pages can be acquired according to the access heat. The frequency v_i with which each user visits each site web page f_i is counted, so that each user's visits are vectorized. For example, the visit frequencies of user A over the site web pages may be expressed as V(A) = {V(A_1), V(A_2), …, V(A_m)}; if a user has not visited a given site web page, the corresponding visit count is set to 0. Once the visit frequency of every user for each site web page f_i is obtained, the aggregated visit frequency of each site web page can be computed. The aggregated visit frequency V(f_i) of site web page f_i can be expressed as:
V(f_i) = Σ_{A ∈ user} V(A_i)
where V(A_i) is the frequency with which a user A visits site web page f_i, and A ∈ user ranges over all users who visit the specified sites. In a specific embodiment of the present application, the aggregated visit frequencies of the site web pages to be analyzed are vectorized as V(F) = {V(f_1), V(f_2), …, V(f_m)}, which gives the access heat of the site web pages.
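The aggregation in step 2001 amounts to an element-wise sum of the per-user visit vectors. The sketch below uses the three visit vectors given for users A, B and C in the worked example later in the description; the function name `aggregate_visits` and the dictionary layout are assumptions for illustration.

```python
def aggregate_visits(user_vectors):
    """Sum per-user visit vectors element-wise to obtain V(F)."""
    vectors = list(user_vectors.values())
    if not vectors:
        return []
    m = len(vectors[0])
    if any(len(v) != m for v in vectors):
        raise ValueError("all visit vectors must have the same length")
    return [sum(v[i] for v in vectors) for i in range(m)]


# Visit vectors over the pages f_1..f_5 (X1, X2, X3, Y1, Y2).
visits = {
    "A": [1, 1, 1, 1, 0],
    "B": [1, 0, 1, 1, 1],
    "C": [0, 0, 1, 0, 1],
}
heat = aggregate_visits(visits)
# Index of the page with the highest access heat.
hottest = max(range(len(heat)), key=heat.__getitem__)
```

Here `heat` is `[2, 1, 3, 2, 2]`, so the third page (X3) has the highest access heat and the second page (X2) the lowest.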
Step 2002: Acquire the information of the site web pages. In a specific implementation of the present application, the information of the site web pages can be acquired with web crawler technology. Because crawling consumes a large amount of system resources, different crawling strategies can be designed for site web pages with different access heat: the crawl frequency of each site web page is determined from its aggregated visit frequency (access heat), and once the crawl frequency is determined, a crawler program can be designed to crawl site web pages with different access heat. All site web pages are crawled at their crawl frequencies to obtain the information corresponding to each site web page. In this embodiment, the crawled information may be denoted C = {c_1, c_2, …, c_m}, where c_i is the information obtained by crawling site web page f_i.
Step 2003: Normalize the information to obtain standard information. The normalization includes: converting all uppercase letters to lowercase; converting traditional Chinese characters to simplified Chinese characters; converting half-width characters to full-width characters; and replacing synonyms in the short text using a synonym processing algorithm. This completes the normalization of the information C.
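Part of the normalization in step 2003 can be sketched with the standard library alone. The example below covers lowercasing, half-width to full-width conversion and a simple synonym substitution. The traditional-to-simplified conversion is left out because it needs a character mapping table that is not given here, and the synonym dictionary and function names are hypothetical placeholders rather than the synonym processing algorithm of the application.

```python
def to_fullwidth(text):
    """Map half-width ASCII characters (U+0021..U+007E) and space to full-width forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")  # ideographic (full-width) space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + 0xFEE0))  # offset into the full-width block
        else:
            out.append(ch)
    return "".join(out)


def normalize(text, synonyms=None):
    """Lowercase, replace synonyms (hypothetical table), then convert to full-width."""
    text = text.lower()
    for variant, canonical in (synonyms or {}).items():
        text = text.replace(variant, canonical)
    return to_fullwidth(text)


standard = normalize("Borrowing RATE", synonyms={"borrowing": "loan"})
```

The synonym keys are assumed to be lowercase, because the substitution runs after the case conversion.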
Step 2004: Perform word segmentation on the standard information to obtain the word list corresponding to each site web page, that is, convert the standard information into a number of words. In an embodiment of the present application, stop words in the word list may also be filtered out using a stop word list. In natural language processing, words, phrases and punctuation marks that carry no meaning or are irrelevant to the service are generally placed in a table, and the words in this table are excluded from subsequent analysis; this table is generally called a stop word list. In a specific embodiment of the present application, the stop words in the stop word list are denoted S = {s_1, s_2, …, s_t}, where s_i is the i-th stop word and t is the number of stop words.
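Stop word filtering after segmentation can be illustrated as follows. The whitespace-based `segment` function is only a stand-in, since real Chinese word segmentation needs a dedicated segmenter that the description does not name; the stop word set is likewise a made-up example.

```python
def segment(text):
    """Stand-in segmenter: split on whitespace (an assumption, not a real Chinese segmenter)."""
    return text.split()


def word_list(text, stop_words):
    """Segment the text and drop every word that appears in the stop word list S."""
    return [word for word in segment(text) if word not in stop_words]


stop_words = {"the", "of", "and", ","}
words = word_list("the interest rate of the loan", stop_words)
```

With this input, `words` is `["interest", "rate", "loan"]`; the order of the remaining words is preserved.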
Step 300: Randomly select, in units of web pages, a preset proportion of the words from the word lists, so that the attribute values of the selected words can be calculated. In an embodiment of the present application, to improve the accuracy of the user attribute value, the words (that is, the word lists) of all site web pages may be selected to form a dictionary vocabulary; to save processing resources, a preset proportion of the words may instead be selected at random, in units of web pages, to form the dictionary vocabulary. Let z be the number of selected site web pages. After this step, the selected site web pages need to be labeled, that is, each is assigned one of two classes, Y = {+1, -1}. If the target attribute of the present application is to obtain the user's fund demand information, +1 and -1 indicate an investment need and a loan need respectively. The selected site web pages may be expressed as WY = {(W(c_i), y_i) | 1 ≤ i ≤ z, y_i ∈ Y, i a positive integer}, where W(c_i) is the word segmentation result of the selected site web page c_i and y_i is the label (+1 or -1) assigned to c_i. In an embodiment of the present application, assume the selected site web pages contain s distinct words, the i-th of which is denoted d_i; these words form the dictionary vocabulary D, which can be expressed as D = {d_1, d_2, …, d_s}.
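Selecting pages at random and collecting their distinct words into the dictionary vocabulary D could be sketched like this. The function name `build_dictionary`, the rounding rule for z and the `seed` parameter are assumptions introduced for a reproducible example.

```python
import random


def build_dictionary(page_words, proportion, seed=None):
    """page_words: dict mapping page id -> word list W(c_i).

    Randomly selects a proportion of the pages (whole pages, not single words)
    and returns (selected page ids, sorted list of their distinct words).
    """
    page_ids = sorted(page_words)
    if not page_ids:
        return [], []
    rng = random.Random(seed)
    # z = number of selected pages; at least one, at most all (rounding is an assumption).
    z = min(len(page_ids), max(1, round(len(page_ids) * proportion)))
    selected = sorted(rng.sample(page_ids, z))
    vocabulary = sorted({w for pid in selected for w in page_words[pid]})
    return selected, vocabulary


pages = {
    "X1": ["fund", "yield"],
    "X2": ["fund", "risk"],
    "X3": ["yield", "rate"],
    "Y1": ["loan", "rate"],
    "Y2": ["loan", "term"],
}
all_ids, all_vocab = build_dictionary(pages, 1.0, seed=0)
some_ids, some_vocab = build_dictionary(pages, 0.4, seed=0)
```

With a proportion of 1.0 the dictionary covers every page; with 0.4 it is built from two of the five pages and is therefore a subset of the full vocabulary.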
Step 400: Calculate the attribute value of each selected word. If the words (that is, the word lists) of all site web pages are selected to form the dictionary vocabulary, the attribute value of every word in the word lists is calculated (the dictionary vocabulary then coincides with the word lists). If a preset proportion of the words is selected at random in units of web pages, so that the dictionary vocabulary is formed from the words of only some site web pages, the attribute value of each word in the dictionary vocabulary is calculated (the dictionary vocabulary is then a subset of the word lists). In an embodiment of the present application, after the selected site web pages are labeled, the number of occurrences |d_ij| of the i-th word d_i of the dictionary vocabulary D in the j-th selected site web page c_j can be counted and used to calculate the attribute value of each word. The attribute value P(d_i) of the i-th word d_i in the dictionary vocabulary D is calculated as:
Figure GDA0002384200290000081
where |d_ij| is the number of occurrences of the i-th dictionary word d_i in the j-th randomly selected site web page c_j, with i ≤ s, where s is the number of words in the dictionary vocabulary; y_j is the label of the j-th site web page, where +1 indicates an investment need and -1 indicates a loan need; and z is the number of randomly selected site web pages, with z ≤ m and j ≤ z, where m is the number of all site web pages processed by the crawler.
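A word attribute value built from the quantities |d_ij| and y_j might be computed as in the sketch below. The formula image is not reproduced in this text, so the form used here is an assumption: each word's label-weighted occurrence count divided by its total occurrence count, which keeps every value in [-1, +1]. The function name and the sample pages are hypothetical.

```python
from collections import Counter


def word_attribute_values(labeled_pages):
    """labeled_pages: list of (words, label) pairs with label +1 or -1.

    Assumed form: P(d_i) = sum_j(|d_ij| * y_j) / sum_j(|d_ij|).
    """
    signed = Counter()
    total = Counter()
    for words, label in labeled_pages:
        if label not in (1, -1):
            raise ValueError("label must be +1 or -1")
        for word, count in Counter(words).items():
            signed[word] += count * label
            total[word] += count
    return {word: signed[word] / total[word] for word in total}


labeled = [
    (["fund", "yield", "fund", "rate"], +1),  # investment page
    (["loan", "rate", "term"], -1),           # loan page
]
values = word_attribute_values(labeled)
```

Words seen only on investment pages score +1.0, words seen only on loan pages score -1.0, and a word such as "rate" that appears equally often on both scores 0.0.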
Step 500: Obtain the page attribute value of each site web page from the word attribute values. In an embodiment of the present application, the page attribute value can be obtained for every site web page (both the randomly selected and the unselected ones, that is, all site web pages processed by the crawler). The page attribute value P(c_i) of each site web page c_i is calculated as:
Figure GDA0002384200290000091
where P(d_i) is the attribute value of a word that appears both in the dictionary vocabulary and in site web page c_i; D denotes the dictionary vocabulary; W(c_i) denotes the words in site web page c_i; the subscript condition (Figure GDA0002384200290000092) restricts d_i to words that appear in site web page c_i and also belong to the dictionary vocabulary; the summation (Figure GDA0002384200290000093) accumulates the attribute values of all such words in site web page c_i; and |c_i| is the number of words in site web page c_i.
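The page attribute value described above can be sketched as an accumulation over the page's dictionary words divided by the page's word count. Dividing the accumulated sum by |c_i| is an assumption based on the listed symbols, since the formula image itself is not available; the function name and the sample values are hypothetical.

```python
def page_attribute_value(words, word_values):
    """Assumed form: P(c_i) = (sum of P(d) for words d of the page that are in D) / |c_i|."""
    if not words:
        return 0.0
    # Words outside the dictionary vocabulary contribute nothing to the sum.
    accumulated = sum(word_values[w] for w in words if w in word_values)
    return accumulated / len(words)


# Hypothetical word attribute values P(d_i) for a small dictionary D.
word_values = {"fund": 1.0, "yield": 1.0, "rate": 0.0, "loan": -1.0, "term": -1.0}
page = ["loan", "rate", "term", "apply"]
p_page = page_attribute_value(page, word_values)
```

For this page the result is -0.5: two loan-leaning words, one neutral word and one word outside the dictionary, averaged over four words. Storing such values keyed by page URL would give a lookup table of the kind the page attribute value database provides.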
Step 600: Generate the page attribute value database from the page attribute values of the site web pages. The page attribute values of all site web pages are stored in a database for use in calculating user attribute values.
Fig. 3 is a comprehensive flowchart of a user attribute value calculation method based on user browsing behavior according to an embodiment of the present application. As shown in Fig. 3, the access heat of all web pages of the specified sites is counted, and all site web pages are crawled according to the access heat and then normalized, segmented into words and filtered to obtain the word list corresponding to each site web page. The words of some or all site web pages are selected to form a dictionary vocabulary, and the attribute value of each word in the dictionary vocabulary is computed; the word attribute values are then used to obtain the page attribute value of each web page of the specified sites. For a specific user A, the historical access records of user A are collected, the web pages of the specified sites visited by user A are identified, and the user attribute value of user A at the current time is obtained from the page attribute values of those web pages and the times at which user A visited them.
For example, two sites related to finance and lending are specified: one site X has three web pages X1, X2 and X3; the other, loan-related site Y has two web pages Y1 and Y2. The web pages X1, X2 and X3 are labeled +1, indicating that the browsing user has an investment demand for funds; the web pages Y1 and Y2 are labeled -1, indicating that the browsing user has a loan demand for funds. The specified URL list to be analyzed is F = {f_1, f_2, f_3, f_4, f_5}, where f_1, f_2, f_3 correspond to X1, X2, X3 in order, and f_4, f_5 correspond to Y1, Y2 in order. Assume that three users A, B and C in total visited sites X and Y, that the frequency with which user A visited each web page is V(A) = {1, 1, 1, 1, 0}, that of user B is V(B) = {1, 0, 1, 1, 1}, and that of user C is V(C) = {0, 0, 1, 0, 1}. Summed over the three users, web page X3 has the highest access frequency (3 visits), that is, the highest access heat, so X3 should be crawled most heavily when the crawling policy is formulated; conversely, the crawl frequency of web page X2 (1 visit) can be slightly reduced.
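The access-heat count in this example can be sketched in a few lines of Python. The visit vectors come from the example; the function names and the "hottest first" ordering rule are hypothetical, since the source only says that hotter pages should be crawled more heavily.

```python
# Visit-frequency vectors from the example, ordered as X1, X2, X3, Y1, Y2.
PAGES = ["X1", "X2", "X3", "Y1", "Y2"]
VISITS = {
    "A": [1, 1, 1, 1, 0],
    "B": [1, 0, 1, 1, 1],
    "C": [0, 0, 1, 0, 1],
}

def access_heat(visits):
    """Sum the per-user visit counts column by column (one column per page)."""
    return [sum(column) for column in zip(*visits.values())]

def crawl_order(pages, heat):
    """Hypothetical policy: crawl hotter pages first (ties keep page order)."""
    ranked = sorted(zip(pages, heat), key=lambda pair: -pair[1])
    return [page for page, _ in ranked]

heat = access_heat(VISITS)
order = crawl_order(PAGES, heat)
```

A real crawler would more likely map heat to a crawl interval than to a strict ordering; the ordering is used here only to keep the sketch short.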
Denote the text information obtained by crawling the web pages as C = {c_1, c_2, c_3, c_4, c_5}. Normalization, word segmentation and word filtering are performed on C to obtain the words corresponding to web pages X1, X2, X3, Y1 and Y2, recorded as W(c_i) = {w_ij | 1 ≤ j ≤ |c_i|}, where w_ij denotes the j-th word in the i-th web page and |c_i| denotes the number of words finally obtained from the i-th web page. Assume that |c_1|, |c_2|, |c_3|, |c_4|, |c_5| sum to 80, with |c_1| + |c_4| = 45. If all web pages are selected to form the dictionary vocabulary D, the number of words in the dictionary vocabulary is at most 80 (the same page may contain repeated words after word segmentation, and different pages may share words, so the number of words in the dictionary vocabulary may be less than the sum of the word counts of the web pages); if only web pages X1 and Y1 are selected, the number of words in the dictionary vocabulary is at most 45. For convenience of description, assume that the words obtained from each web page are all distinct, that the words obtained from different web pages are also distinct, and that the words of all web pages are selected to form the dictionary vocabulary D = {d_1, d_2, …, d_80}. The attribute value P(d_i) of each word in the dictionary vocabulary D is then calculated; the attribute value of the i-th word d_i in D is calculated as:
Figure GDA0002384200290000101
where |d_ij| is the number of times the i-th word d_i of the dictionary vocabulary occurs in the j-th web page c_j, i ≤ s, and s is the number of words in the dictionary vocabulary; y_j is the label of the j-th web page, where a label of +1 indicates an investment demand and a label of -1 indicates a loan demand.
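The formula itself is rendered as an image that does not survive in this text. As a rough illustration only, the sketch below assumes one plausible reading of the symbols just defined: the label-weighted share of a word's occurrences, P(d_i) = Σ_j |d_ij|·y_j / Σ_j |d_ij|. This form is an assumption and is not confirmed by the source; the function name and the toy pages are hypothetical.

```python
from collections import Counter

def word_attribute_values(pages, labels):
    """ASSUMED formula: P(d_i) = sum_j |d_ij| * y_j / sum_j |d_ij|.

    pages:  list of word lists, one per web page c_j
    labels: list of +1 / -1 labels y_j, one per page
    """
    signed = Counter()   # accumulates |d_ij| * y_j per word
    total = Counter()    # accumulates |d_ij| per word
    for words, label in zip(pages, labels):
        for word, count in Counter(words).items():
            signed[word] += count * label
            total[word] += count
    return {word: signed[word] / total[word] for word in total}

# Hypothetical toy pages: two labeled +1 (investment), one labeled -1 (loan).
toy_pages = [["fund", "yield", "fund"], ["yield", "rate"], ["loan", "rate", "rate"]]
toy_labels = [+1, +1, -1]
values = word_attribute_values(toy_pages, toy_labels)
```

Under this reading a word that occurs only on +1 pages scores +1, a word that occurs only on -1 pages scores -1, and a word shared between both kinds of page falls in between.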
According to the attribute value P(d_i) of each word, the web page attribute value P(c_i) of each web page can be obtained: the attribute values of all words in each web page are summed and then divided by the number of words in web page c_i. The web page attribute value P(c_i) of each web page c_i is calculated as:
$$P(c_i) = \frac{1}{|c_i|} \sum_{d_i \in W(c_i) \cap D} P(d_i)$$
where P(d_i) is the attribute value of a word that appears both in the dictionary vocabulary and in web page c_i; D denotes the dictionary vocabulary; W(c_i) denotes the words in web page c_i; the subscript d_i ∈ W(c_i) ∩ D restricts d_i to words that appear in web page c_i and also belong to the dictionary vocabulary; the summation accumulates the attribute values of all such words in web page c_i; and |c_i| denotes the number of words in web page c_i.
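The averaging step described above (sum the attribute values of the page's dictionary words, then divide by the page's word count) can be sketched as follows. The function name, the toy values, and the choice to count a repeated word once per occurrence are assumptions; the source's example sidesteps the last point by assuming all words are distinct.

```python
def page_attribute_value(page_words, word_values):
    """P(c_i) = (1 / |c_i|) * sum of P(d) over words of c_i that are in D.

    page_words:  W(c_i), the words of the page after segmentation and filtering
    word_values: mapping from each dictionary word d in D to P(d)
    """
    if not page_words:
        return 0.0  # assumption: an empty page is treated as neutral
    in_dictionary = [word_values[w] for w in page_words if w in word_values]
    return sum(in_dictionary) / len(page_words)

# Hypothetical word attribute values and page, for illustration only.
toy_word_values = {"fund": 1.0, "yield": 1.0, "rate": -0.5, "loan": -1.0}
toy_page = ["fund", "yield", "rate", "today"]   # "today" is not in D
value = page_attribute_value(toy_page, toy_word_values)
```

Words outside the dictionary vocabulary contribute nothing to the sum but still count toward |c_i|, so a page dominated by unlisted words is pulled toward zero.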
To obtain the user attribute value P(A) of user A at the current time, the web page attribute value of each web page visited by user A and the time of each visit are needed. The frequency with which user A visited each web page is given as V(A) = {1, 1, 1, 1, 0}, that is, user A did not visit web page Y2. Since the time elapsed between user A's visits to web pages X1, X2, X3 and Y1 and the current time can also be obtained, the user attribute value P(A) of user A is calculated as:
Figure GDA0002384200290000111
where P(A) is the user attribute value of user A at the current time, -1 ≤ P(A) ≤ 1; decay(t_i) is the decay function, decay(t_i) = exp(-δ(t_i)), δ(t_i) > 0, where δ(t_i) denotes the time elapsed between t_i and the current time, in units of hours, days, weeks, months or years; t_i denotes the time at which user A visited a web page; 0 ≤ decay(t_i) ≤ 1; H(A) denotes the web pages visited by user A; H(A, t_i) denotes the web page visited by user A at time t_i; P(H(A, t_i)) denotes the web page attribute value of the web page visited by user A at time t_i; and the subscript H(A, t_i) ∈ H(A) indicates that the web page visited by user A at time t_i belongs to the web pages visited by user A. The user attribute values P(B) and P(C) of users B and C can be obtained in the same way.
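The decay function decay(t_i) = exp(-δ(t_i)) can be sketched directly from its definition. The function signature, the default unit of one day, and the sample dates are assumptions chosen for illustration; the source allows hours, days, weeks, months or years.

```python
import math
from datetime import datetime, timedelta

def decay(access_time, now, unit=timedelta(days=1)):
    """decay(t_i) = exp(-delta(t_i)), with delta measured in multiples of `unit`."""
    delta = (now - access_time) / unit   # elapsed time as a float
    return math.exp(-delta)

now = datetime(2016, 2, 25)              # hypothetical "current time"
recent = decay(now - timedelta(hours=6), now)
old = decay(now - timedelta(days=3), now)
```

The choice of unit sets how quickly old visits fade: with a unit of one day a visit from three days ago keeps a weight of exp(-3), while with a unit of one week the same visit keeps exp(-3/7).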
Suppose the web page attribute values P(c_i) of the web pages c_i are {0.8, 0.7, 0.5, -0.6, -0.9} respectively, and the values of the decay function for the web pages visited by user A are decay(t_i) = {0.4, 0.5, 0.8, 0.2, 0}; then the user attribute value of user A is
Figure GDA0002384200290000112
It can be seen that user A has a relatively strong investment demand, and a small amount of financing-related information can be pushed to user A. Similarly, the frequency with which user B visited each web page is V(B) = {1, 0, 1, 1, 1}; assuming the values of the decay function for the web pages visited by user B are decay(t_i) = {0.4, 0, 0.8, 0.2, 0.9}, the user attribute value P(B) of user B is
Figure GDA0002384200290000113
It can be seen that user B has a slight loan demand, and a small amount of loan-related information can be pushed to user B. The frequency with which user C visited each web page is V(C) = {0, 0, 1, 0, 1}; assuming the values of the decay function for the web pages visited by user C are decay(t_i) = {0, 0, 0.2, 0, 0.9}, the user attribute value of user C is
Figure GDA0002384200290000114
It can be seen that user C has a strong loan demand, and a large amount of loan-related information can be pushed to user C.
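The three numerical results in this example are given as formula images that do not survive in this text. The sketch below reworks the example under an assumption: that the user attribute value is the decay-weighted average of page attribute values, Σ decay(t_i)·P(H(A, t_i)) / Σ decay(t_i), which keeps the result within [-1, 1] as the text requires. The normalization and the function name are assumptions, not confirmed by the source; the input numbers are those of the example.

```python
PAGE_VALUES = [0.8, 0.7, 0.5, -0.6, -0.9]      # P(c_i) for X1, X2, X3, Y1, Y2

DECAY = {                                       # decay(t_i) per user and page
    "A": [0.4, 0.5, 0.8, 0.2, 0.0],
    "B": [0.4, 0.0, 0.8, 0.2, 0.9],
    "C": [0.0, 0.0, 0.2, 0.0, 0.9],
}

def user_attribute_value(page_values, weights):
    """ASSUMED form: decay-weighted average of page attribute values."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # assumption: no usable history means a neutral score
    weighted = sum(w * p for w, p in zip(weights, page_values))
    return weighted / total_weight

scores = {user: user_attribute_value(PAGE_VALUES, w) for user, w in DECAY.items()}
```

Under this assumption the signs and ordering agree with the qualitative reading in the text: user A scores positive (investment demand), user B slightly negative (slight loan demand), and user C clearly negative (strong loan demand).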
Fig. 4 is a block diagram of a user attribute value calculation apparatus based on user browsing behavior according to an embodiment of the present application. As shown in Fig. 4, the log of websites that a user visits daily is analyzed and text data mining is performed on the visited web pages to obtain the user's demand for funds (investment demand or loan demand). The user's demand for funds can thus be fully understood without the user having to apply, which helps capital providers market and release funds accurately and helps financing parties promote financing products and absorb funds in a more targeted manner.
In the specific embodiment shown in the figure, the user attribute value calculation apparatus includes a scheduling device 10, a filtering device 20, an acquiring device 30, an obtaining device 40, and an information pushing device 50. The scheduling device 10 is configured to retrieve, from a user information database, the historical web pages that a user visits daily; the filtering device 20 is configured to filter out the web pages in the historical web pages that are unrelated to the target attribute, so that page attribute values are calculated only for the filtered historical web pages; the acquiring device 30 is configured to acquire the page attribute values corresponding to the historical web pages from a page attribute value database; the obtaining device 40 is configured to obtain the user attribute value of the corresponding user according to the page attribute values; and the information pushing device 50 is configured to push specific information to the corresponding user according to the user attribute value. The user information database may hold user browsing information recorded by a single large website, or browsing information recorded jointly by multiple websites and stored in a shared database for use by multiple service systems; the present application does not limit this.
Referring to Fig. 4 again, the obtaining device 40 specifically includes an acquiring unit 401, a weight allocating unit 402, and a calculating unit 403. The acquiring unit 401 is configured to acquire the access time at which the user visited each historical web page; the weight allocating unit 402 is configured to assign a weight to the corresponding page attribute value according to the access time; and the calculating unit 403 is configured to obtain the user attribute value of the corresponding user according to the page attribute values and the weights.
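The first stages of this apparatus (filtering the history down to collected site pages, then attaching stored page attribute values) can be sketched as below. The URLs, the database contents, the function names and the representation of a visit as a (URL, hours since visit) pair are all hypothetical; the weighting and aggregation performed by the obtaining device 40 are left out.

```python
# Hypothetical page attribute value database, keyed by URL.
PAGE_VALUE_DB = {
    "https://x.example/1": 0.8,
    "https://x.example/3": 0.5,
    "https://y.example/1": -0.6,
}

def filter_history(history, known_urls):
    """Filtering device: keep only visits whose URL is a collected site page."""
    return [(url, t) for url, t in history if url in known_urls]

def lookup_values(history, db):
    """Acquiring device: attach the stored page attribute value to each visit."""
    return [(db[url], t) for url, t in history]

# Each visit is (url, hours since the visit); the second entry is unrelated.
history = [
    ("https://x.example/1", 5.0),
    ("https://news.example/story", 2.0),
    ("https://y.example/1", 30.0),
]
relevant = filter_history(history, PAGE_VALUE_DB.keys())
valued = lookup_values(relevant, PAGE_VALUE_DB)
```

Filtering before the lookup means the database only ever needs entries for pages of the specified sites, which matches the URL-based filtering described for the filtering device.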
Fig. 5 is a block diagram of a unit for calculating web page attribute values based on user browsing behavior according to an embodiment of the present application. As shown in Fig. 5, the crawl frequency of each site web page is determined from the aggregated frequency with which all users visit that page, and the site web pages are crawled at that frequency. After normalization, word segmentation and word filtering are performed in sequence, the words of all site web pages, or of a randomly selected subset of site web pages, are selected to form a dictionary vocabulary; the attribute value of each word in the dictionary vocabulary is calculated; the web page attribute value of each web page is calculated from the words in the dictionary vocabulary; and finally a web page attribute value database is generated from the web page attribute values corresponding to the site web pages.
In the specific embodiment shown in the figure, the generating unit 1 of the page attribute value database specifically includes a collecting module 11, a processing module 12, a word selecting module 13, a calculating module 14, an obtaining module 15, and a generating module 16, where the collecting module 11 is configured to collect website webpages related to target attributes; the processing module 12 is configured to process the website webpage to obtain a vocabulary corresponding to the website webpage; the word selecting module 13 is configured to randomly select words with a predetermined ratio from the word list by taking a webpage as a unit so as to calculate an attribute value of the randomly selected words; the calculation module 14 is configured to calculate an attribute value of each word in the word list; the obtaining module 15 is configured to obtain a page attribute value of each website webpage according to the attribute value; the generating module 16 is configured to generate a page attribute value database according to the page attribute value corresponding to the website webpage.
In a specific embodiment of the present application, the processing module 12 further includes an obtaining sub-module 121, an acquiring sub-module 122, a normalization sub-module 123, and a word segmentation sub-module 124. The obtaining sub-module 121 is configured to obtain the access heat of the site web pages, so that the information of the site web pages is acquired according to the access heat; the acquiring sub-module 122 is configured to acquire the information of the site web pages; the normalization sub-module 123 is configured to normalize the information to obtain standard information; and the word segmentation sub-module 124 is configured to perform word segmentation on the standard information to obtain a word list corresponding to the site web pages.
Fig. 6 is an overall block diagram of a user attribute value calculation apparatus based on user browsing behavior according to an embodiment of the present application. As shown in Fig. 6, the collection module 11 collects the site web pages related to the target attribute. The processing module 12 crawls, normalizes, word-segments and filters the site web pages to obtain the word list corresponding to each site web page; a site web page may contain the same word several times, but because of the normalization no near-synonyms or synonyms remain, and the filtering mainly uses a stop-word list to remove characters, words and punctuation marks that carry no meaning or are irrelevant to the service. The word selecting module 13 randomly selects a predetermined proportion of words from the word list in units of web pages; that is, site web pages are selected at random, and all words in the selected site web pages are treated as selected. The calculation module 14 calculates the attribute value of each word in the word list: if all site web pages are selected, the word list contains the words of all site web pages; if only some site web pages are selected, the word list contains only the words of the selected pages, which reduces the amount of data to process. In big data processing the number of site web pages is huge, and a random selection of site web pages can basically cover the words appearing in all of them. The obtaining module 15 obtains the page attribute value of each site web page according to the attribute values of the words in that page.
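The page-level random selection performed by the word selecting module can be sketched as follows. The function name, the rounding rule used to turn the proportion into a page count, the seed parameter and the toy word lists are assumptions; the source only states that pages are picked at random and that every word on a picked page is selected.

```python
import random

def select_vocabulary(page_words, ratio, seed=None):
    """Pick round(ratio * m) pages at random; every word on a picked page is kept.

    page_words: mapping from page id to its word list
    ratio:      the predetermined proportion of pages to select (0 < ratio <= 1)
    """
    rng = random.Random(seed)
    page_ids = sorted(page_words)
    z = max(1, round(ratio * len(page_ids)))       # number of selected pages
    chosen = rng.sample(page_ids, z)
    vocabulary = set()
    for page_id in chosen:
        vocabulary.update(page_words[page_id])
    return chosen, vocabulary

# Hypothetical word lists for the five example pages.
toy = {
    "X1": ["fund", "yield"], "X2": ["fund", "bond"], "X3": ["yield", "return"],
    "Y1": ["loan", "rate"], "Y2": ["loan", "credit"],
}
chosen, vocab = select_vocabulary(toy, ratio=0.4, seed=0)
```

Sampling whole pages rather than individual words keeps each selected page's word counts intact, which is what the per-page occurrence counts |d_ij| rely on.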
For example, for a specific user a, the collecting module 11 is configured to collect site webpages related to a target attribute, the processing module 12 processes the site webpages to obtain word lists corresponding to the site webpages, and the calculating module 14 calculates attribute values of each word in the word lists; the obtaining module 15 obtains the page attribute value of each website webpage accessed by the user according to the attribute value, and obtains the user attribute value of the user a at the current moment.
The embodiments of the present application provide a user attribute value calculation method and apparatus based on user browsing behavior. Big data processing is used to calculate the web page attribute values of all web pages of the sites related to the target attribute, and cloud technology is used to collect the historical browsing information of all users into a database. The user attribute value of each user can then be obtained from the historical web pages that the user browsed and the times at which they were browsed, and operations such as information pushing or service provision can be carried out in a targeted manner according to the user attribute value, thereby promoting the development of the network big data era and of the national economy.
The embodiments of the present application described above may be implemented in various hardware, software code, or a combination of both. For example, the embodiments of the present application may also be program codes executed in a Digital Signal Processor (DSP) to execute the above-described programs. The present application may also relate to a variety of functions performed by a computer processor, digital signal processor, microprocessor, or Field Programmable Gate Array (FPGA). The processor described above may be configured in accordance with the present application to perform certain tasks by executing machine-readable software code or firmware code that defines certain methods disclosed herein. Software code or firmware code may be developed in different programming languages and in different formats or forms. Software code may also be compiled for different target platforms. However, different code styles, types, and languages of software code and other types of configuration code for performing tasks according to the present application do not depart from the spirit and scope of the present application.
The foregoing is merely an illustrative embodiment of the present application, and any equivalent changes and modifications made by those skilled in the art without departing from the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (15)

1. A user attribute value calculation method based on user browsing behavior is characterized by comprising the following steps:
calling a historical webpage which is accessed by a user daily from a user information database;
acquiring a page attribute value corresponding to the historical webpage according to a page attribute value database; and
obtaining a user attribute value of a corresponding user according to the page attribute value;
the specific generation step of the page attribute value database comprises the following steps:
collecting website webpages related to the target attributes;
processing the website webpage to obtain a word list corresponding to the website webpage;
randomly selecting words with a preset proportion from the word list by taking a webpage as a unit, and calculating the attribute value of each randomly selected word;
obtaining a page attribute value corresponding to each website webpage according to the attribute value; and
generating a page attribute value database according to the page attribute value corresponding to the website webpage;
wherein the attribute value P (d) of a randomly selected wordi) The calculation formula of (2) is as follows:
Figure FDA0002384200280000011
wherein, | dijI is the d-th word in the word listiThe word is in the j site web page c selected at randomjThe number of times of occurrence in the word list is i not more than s, and s is the number of words in the word list; y isjThe label of the jth website webpage is +1 to represent positive attribute, and the label of-1 to represent negative attribute; z is the number of randomly selected website webpages, z is less than or equal to m, j is less than or equal to z, and m is the number of the website webpages.
2. The user attribute value calculation method based on user browsing behavior of claim 1, wherein before the step of acquiring the page attribute value corresponding to the historical webpage according to a page attribute value database, the user attribute value calculation method further comprises:
filtering out the webpages in the historical webpages that are irrelevant to the target attribute, so as to calculate the page attribute value of the filtered historical webpages.
3. The user attribute value calculation method based on user browsing behavior of claim 2, wherein filtering out the webpages in the historical webpages that are irrelevant to the target attribute specifically comprises:
collecting website webpages related to the target attribute; and
filtering out, according to URL, the webpages in the historical webpages that do not belong to the website webpages.
4. The user attribute value calculation method based on user browsing behavior of claim 1, wherein after the step of obtaining the user attribute value of the corresponding user according to the page attribute value, the user attribute value calculation method further comprises:
pushing specific information to the corresponding user according to the user attribute value.
5. The user attribute value calculation method based on user browsing behavior of claim 1, wherein obtaining the user attribute value of the corresponding user according to the page attribute value specifically comprises:
acquiring the access time at which the user accessed each historical webpage;
assigning a weight to the corresponding page attribute value according to the access time; and
obtaining the user attribute value of the corresponding user according to the page attribute value and the weight.
6. The user attribute value calculation method based on user browsing behavior of claim 1, wherein processing the website webpage specifically comprises:
acquiring information of the website webpage;
normalizing the information to obtain standard information; and
performing word segmentation on the standard information to obtain a word list corresponding to the website webpage.
7. The user attribute value calculation method based on user browsing behavior of claim 6, wherein the step of processing the website webpage further comprises, before the step of acquiring information of the website webpage:
obtaining the access heat of the website webpage, so as to acquire the information of the website webpage according to the access heat.
8. The user attribute value calculation method based on user browsing behavior of claim 1, wherein the page attribute value P(c_i) corresponding to a website webpage is calculated as:
$$P(c_i) = \frac{1}{|c_i|} \sum_{d_i \in W(c_i) \cap D} P(d_i)$$
wherein P(d_i) is the attribute value of a word that appears both in the word list and in website webpage c_i; D denotes the word list; W(c_i) denotes the words in website webpage c_i; the subscript d_i ∈ W(c_i) ∩ D restricts d_i to words that appear in website webpage c_i and also appear in the word list; the summation accumulates the attribute values of all such words in website webpage c_i; and |c_i| denotes the number of words in the corresponding website webpage c_i.
9. A user attribute value calculation apparatus based on user browsing behavior, the user attribute value calculation apparatus comprising:
a scheduling device, configured to call from a user information database a historical webpage accessed daily by a user;
an acquiring device, configured to acquire a page attribute value corresponding to the historical webpage according to a page attribute value database; and
an obtaining device, configured to obtain a user attribute value of a corresponding user according to the page attribute value;
and further comprising a generating unit of the page attribute value database, which specifically comprises:
a collection module, configured to collect website webpages related to the target attribute;
a processing module, configured to process the website webpage to obtain a word list corresponding to the website webpage;
a calculation module, configured to randomly select words with a preset proportion from the word list by taking a webpage as a unit, and to calculate the attribute value of each randomly selected word;
an obtaining module, configured to obtain the page attribute value of each website webpage according to the attribute value; and
a generating module, configured to generate a page attribute value database according to the page attribute value corresponding to the website webpage;
wherein the attribute value P (d) of a randomly selected wordi) The calculation formula of (2) is as follows:
Figure FDA0002384200280000031
wherein, | dijI is the d-th word in the word listiThe word is in the j site web page c selected at randomjThe number of times of occurrence in the word list is i not more than s, and s is the number of words in the word list; y isjThe label of the jth website webpage is +1 to represent positive attribute, and the label of-1 to represent negative attribute; z is the number of randomly selected website webpages, z is less than or equal to m, j is less than or equal to z, and m is the number of the website webpages.
10. The user attribute value calculation apparatus based on user browsing behavior of claim 9, wherein the user attribute value calculation apparatus further comprises:
a filtering device, configured to filter out the webpages in the historical webpages that are irrelevant to the target attribute, so as to calculate the page attribute value of the filtered historical webpages.
11. The user attribute value calculation apparatus based on user browsing behavior of claim 10, wherein the filtering device specifically comprises:
a collecting unit, configured to collect website webpages related to the target attribute; and
a filtering unit, configured to filter out, according to URL, the webpages in the historical webpages that do not belong to the website webpages.
12. The user attribute value calculation apparatus based on user browsing behavior of claim 9, wherein the user attribute value calculation apparatus further comprises:
an information pushing device, configured to push specific information to the corresponding user according to the user attribute value.
13. The user attribute value calculation apparatus based on user browsing behavior of claim 9, wherein the obtaining device specifically comprises:
an acquiring unit, configured to acquire the access time at which the user accessed each historical webpage;
a weight allocating unit, configured to assign a weight to the corresponding page attribute value according to the access time; and
a calculating unit, configured to obtain the user attribute value of the corresponding user according to the page attribute value and the weight.
14. The user attribute value calculation apparatus based on user browsing behavior of claim 9, wherein the processing module further comprises:
an acquiring sub-module, configured to acquire information of the website webpage;
a normalization sub-module, configured to normalize the information to obtain standard information; and
a word segmentation sub-module, configured to perform word segmentation on the standard information to obtain a word list corresponding to the website webpage.
15. The user attribute value calculation apparatus based on user browsing behavior of claim 14, wherein the processing module further comprises:
an obtaining sub-module, configured to obtain the access heat of the website webpage, so as to acquire the information of the website webpage according to the access heat.
CN201610104707.6A 2016-02-25 2016-02-25 User attribute value calculation method and device based on user browsing behavior Active CN107122367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610104707.6A CN107122367B (en) 2016-02-25 2016-02-25 User attribute value calculation method and device based on user browsing behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610104707.6A CN107122367B (en) 2016-02-25 2016-02-25 User attribute value calculation method and device based on user browsing behavior

Publications (2)

Publication Number Publication Date
CN107122367A CN107122367A (en) 2017-09-01
CN107122367B true CN107122367B (en) 2020-07-03

Family

ID=59717754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610104707.6A Active CN107122367B (en) 2016-02-25 2016-02-25 User attribute value calculation method and device based on user browsing behavior

Country Status (1)

Country Link
CN (1) CN107122367B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844548A (en) * 2017-10-30 2018-03-27 北京锐安科技有限公司 A kind of data label method and apparatus
CN108681961A (en) * 2018-05-24 2018-10-19 平安普惠企业管理有限公司 Credit product promotion method, apparatus, equipment and computer readable storage medium
CN109635184A (en) * 2018-11-02 2019-04-16 平安科技(深圳)有限公司 Financial product recommended method, device and computer equipment based on data analysis
CN111695073A (en) * 2020-05-18 2020-09-22 北京字节跳动网络技术有限公司 Information pushing method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622445A (en) * 2012-03-15 2012-08-01 华南理工大学 User interest perception based webpage push system and webpage push method
CN103440342A (en) * 2013-09-10 2013-12-11 广州市动景计算机科技有限公司 Information pushing method and information pushing device based on webpage types
CN104199874A (en) * 2014-08-20 2014-12-10 哈尔滨工程大学 Webpage recommendation method based on user browsing behaviors

Also Published As

Publication number Publication date
CN107122367A (en) 2017-09-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: Greater Cayman, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.