CN110110219B - Method and device for determining user preference according to network behavior - Google Patents


Info

Publication number
CN110110219B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810108024.7A
Other languages
Chinese (zh)
Other versions
CN110110219A (en)
Inventor
陈实如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FOUNDER BROADBAND NETWORK SERVICE CO LTD
Peking University Founder Group Co Ltd
Original Assignee
FOUNDER BROADBAND NETWORK SERVICE CO LTD
Peking University Founder Group Co Ltd
Application filed by FOUNDER BROADBAND NETWORK SERVICE CO LTD and Peking University Founder Group Co Ltd
Priority to CN201810108024.7A
Publication of CN110110219A
Application granted
Publication of CN110110219B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The invention provides a method and a device for determining user preference according to network behavior. The method comprises the following steps: acquiring access information of a user, wherein the access information comprises webpage information and access time; determining the category of each webpage accessed by the user according to the webpage information; determining, according to the access time and the category of the webpage, the number of times the user accesses each type of webpage on each day within a preset period; determining, from these counts, the average number of times the user accesses each type of webpage and the variance value of that number; and determining the preference of the user within the preset period according to the average number of times and the variance value for each type of webpage. The scheme makes full use of the access information generated when the user surfs the internet to determine the user's preference, so that a network operator can understand that preference and provide better-targeted service to the user.

Description

Method and device for determining user preference according to network behavior
Technical Field
The invention relates to internet technology, and in particular to a method and a device for determining user preference according to network behavior.
Background
At present, with the development of internet technology, users' demand for network services keeps growing. A network operator needs to understand the preferences of its users so that it can rebuild and optimize the network according to those preferences, design precisely targeted marketing packages, and thereby raise its service level to meet growing user requirements.
The inventor found that when a user accesses the network, a broadband operator can record the user's internet access information in a database. How to use such information to determine the user's preference is a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
The invention provides a method and a device for determining user preference according to network behavior. Access information of a user is acquired; the number of times the user accesses each type of webpage within a preset period is counted from the acquired information; the average number of times the user accesses each type of webpage and the variance value of that number are calculated from the counts; and the preference of the user within the preset period is determined according to the calculation result.
A first aspect of the present invention provides a method of determining user preferences from network behaviour, comprising:
acquiring access information of a user, wherein the access information comprises webpage information and access time;
determining the category of the webpage accessed by the user according to the webpage information;
determining the number of times the user accesses each type of webpage every day in a preset period according to the access time and the category of the webpage;
determining, according to the number of times, the average number of times the user accesses each type of webpage and the variance value of the number of times the user accesses each type of webpage;
and determining the preference of the user in the preset period according to the average number of times and the variance value of the number of times for each type of webpage.
Another aspect of the present invention provides an apparatus for determining user preferences based on network behavior, comprising:
an acquisition module, configured to acquire access information of a user, wherein the access information comprises webpage information and access time;
a category determining module, configured to determine the category of the webpage accessed by the user according to the webpage information;
a number determining module, configured to determine the number of times the user accesses each type of webpage every day in a preset period according to the access time and the category to which the webpage belongs;
a calculation module, configured to determine, according to the number of times, the average number of times the user accesses each type of webpage and the variance value of the number of times the user accesses each type of webpage;
and a preference determining module, configured to determine the preference of the user in the preset period according to the average number of times and the variance value of the number of times for each type of webpage.
The method and the device for determining user preference according to network behavior provided by the invention have the following technical effects:
The method and the device provided by this embodiment acquire access information generated when a user accesses webpages, wherein the access information comprises webpage information and access time; determine the category of each webpage accessed by the user according to the webpage information; determine the number of times the user accesses each type of webpage every day in a preset period according to the access time and the category to which the webpage belongs; determine, according to the number of times, the average number of times the user accesses each type of webpage and the variance value of that number; and determine the preference of the user in the preset period according to the average number of times and the variance value for each type of webpage. In this way the access information generated when the user browses webpages is fully used and the preference of the user is determined accurately, so that the network provider can understand the user's preference and further improve its service level.
Drawings
FIG. 1 is a flow diagram illustrating a method for determining user preferences based on network behavior in accordance with an exemplary embodiment of the invention;
FIG. 2 is a flow chart illustrating a method for determining user preferences based on network behavior in accordance with another exemplary embodiment of the invention;
FIG. 3 is a flowchart illustrating a method for determining user preferences based on network behavior in accordance with yet another exemplary embodiment of the invention;
FIG. 4 is a flowchart illustrating a method of determining user preferences based on network behavior in accordance with yet another exemplary embodiment of the invention;
FIG. 5 is a block diagram illustrating an apparatus for determining user preferences based on network behavior in accordance with an exemplary embodiment of the present invention;
FIG. 6 is a block diagram illustrating an apparatus for determining user preferences according to network behavior according to another exemplary embodiment of the present invention.
Detailed Description
Fig. 1 is a flowchart illustrating a method for determining user preferences based on network behavior according to an exemplary embodiment of the invention.
As shown in fig. 1, the method for determining user preferences according to network behaviors provided by this embodiment includes:
Step 101, obtaining access information of a user, wherein the access information comprises webpage information and access time.
Specifically, when a user browses webpages on the internet, a large amount of access information is generated. In general, the content a user browses while surfing the internet is content the user is interested in, so the user's preference can be determined from the access information generated while the user surfs the internet.
The user's access information may be recorded on the broadband operator side and then written into a database. Alternatively, the browser used by the user may collect the access information and send it to the browser's back-end database for storage. The recorded data may include the internet access time, the user ID, the source IP address, the destination IP address, the url list, the type of the internet access terminal, and so on. When the user's preference needs to be analyzed, this information can be read directly from the database. In addition, the method provided by this embodiment may be stored in the memory of a server and executed by the processor of the server. The method may also be packaged as an application program and installed on a server, so that the server can run it.
Furthermore, the access information generated when the user accesses webpages can be extracted from the collected data; either all of the user access information or only the required access information may be extracted. The access information is also grouped by user identifier, so that the preference of each user is determined from the access information corresponding to that user identifier. The user identifier may be the user's mobile phone number, account number, IP address, or the like.
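By way of illustration only, the grouping of access information by user identifier may be sketched as follows. The record layout (a dict with a `user_id` key) and the function name `group_by_user` are assumptions made for this sketch and are not specified by the embodiment.

```python
from collections import defaultdict


def group_by_user(records):
    """Group access records by user identifier.

    Each record is assumed (hypothetically) to be a dict with a
    "user_id" key; the other keys are carried along unchanged.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["user_id"]].append(record)
    return dict(grouped)


# Example with made-up records: two users, three accesses.
sample = [
    {"user_id": "u1", "url": "a"},
    {"user_id": "u2", "url": "b"},
    {"user_id": "u1", "url": "c"},
]
per_user = group_by_user(sample)
```

Each value in `per_user` can then be analyzed independently to determine the preference of the corresponding user.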
In practice, it may be sufficient to acquire only the webpage information and the access time generated when the user accesses a webpage. The webpage information specifically includes the url, the webpage content, and the like. For example, url is http:// games. The access time refers to the time at which the user accesses the webpage, for example, the user accesses the web page of the green game on November 23, 2017.
The user access information may be acquired on a daily basis: the access information of the user for each day is acquired and then analyzed.
Step 102, determining the category of the webpage accessed by the user according to the webpage information.
The acquired webpage information can be analyzed. If the webpage information is a url, keywords can be extracted from the url according to a preset rule, and the category of the webpage is determined according to the keywords. For example, symbols such as "." and "/" in the url may be removed to obtain the vocabulary combination { http games sina com }; the category corresponding to this vocabulary combination is then looked up in a preset category library and found to be game, which determines the category to which the webpage accessed by the user belongs.
The category library may hold all webpage categories or only some of them, for example only the categories that need to be examined. If no corresponding category can be found in the category library after the webpage information has been analyzed, the access record may be discarded, or it may be stored as abnormal data and handled by maintenance personnel. If the category library is meant to hold every category that can currently be determined together with its keywords, a failure to find a category may be caused by insufficient data in the library; in that case the access record can be stored as abnormal data, and maintenance personnel can add categories or keywords to the library based on the abnormal data, so that the library is enriched from the acquired access information. If, on the other hand, the category library holds only the categories to be examined and their keywords, a webpage that matches no category can be regarded as outside the scope of the examination, and its access information need not be counted.
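The lookup and the handling of unmatched records described above may be sketched as follows. This is a minimal sketch under stated assumptions: the library contents, the function names `classify_url` and `classify_records`, and the sample urls are hypothetical, and a dictionary is used as the category library only for illustration.

```python
import re

# Hypothetical category library: keyword -> webpage category.
CATEGORY_LIBRARY = {"games": "game", "sports": "sports"}


def classify_url(url, library=CATEGORY_LIBRARY):
    """Return the category of a url, or None when no keyword matches."""
    # Remove symbols such as ".", "/" and ":" to obtain the vocabulary combination.
    words = [w for w in re.split(r"[./:]+", url) if w]
    for word in words:
        if word in library:
            return library[word]
    return None


def classify_records(urls, library=CATEGORY_LIBRARY):
    """Split urls into (url, category) pairs and unmatched 'abnormal' urls."""
    classified, abnormal = [], []
    for url in urls:
        category = classify_url(url, library)
        if category is None:
            abnormal.append(url)  # kept for maintenance personnel, or dropped
        else:
            classified.append((url, category))
    return classified, abnormal
```

Whether the `abnormal` list is discarded or passed to maintenance personnel depends on which of the policies described above is adopted.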
Step 103, determining the number of times the user accesses each type of webpage every day in a preset period according to the access time and the category of the webpage.
Specifically, all webpage information accessed by the user on the same day can be selected according to the access time. The category of each selected webpage is then determined from the result of step 102, and the number of occurrences of each category is counted, which gives the number of times the user accessed that category of webpage on that day. If the user browses several webpages in a day, several webpage categories may be determined for that day. If the user browses no webpages on a given day, the data for that day may be set to 0. For example, the user visits websites in the game category 5 times on the first day and 0 times on the second day.
Furthermore, the categories corresponding to the webpage information accessed by the user on the same day can be traversed: when a webpage category is encountered for the first time, its access count is set to 1, and each time it is encountered again, 1 is added to the count. Alternatively, the access count of every webpage category may be initialized to 0, and 1 is added to the count of a category each time it is encountered. Other ways of counting the number of times the user accesses each category of webpage per day may also be used, and no limitation is imposed here.
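The daily counting described above may, for example, be sketched as follows. The input format, a list of `(access_time, category)` pairs, and the function name `count_daily_visits` are assumptions made for this sketch; counts start at 0 and are incremented by 1 on each traversal, which corresponds to the second counting variant.

```python
from collections import defaultdict
from datetime import datetime


def count_daily_visits(records):
    """Count accesses per day and per category.

    records: iterable of (access_time, category) pairs, where access_time
    is a datetime. Returns {date: {category: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))  # every count starts at 0
    for access_time, category in records:
        counts[access_time.date()][category] += 1
    return {day: dict(per_category) for day, per_category in counts.items()}


# Example with made-up records: two game accesses on one day, one news access the next.
sample = [
    (datetime(2017, 11, 23, 9, 0), "game"),
    (datetime(2017, 11, 23, 21, 30), "game"),
    (datetime(2017, 11, 24, 8, 0), "news"),
]
daily = count_daily_visits(sample)
```

Days or categories that do not appear in the result are treated as 0 accesses when the counts are used later.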
The preset period may be set as required, for example, five days, one week, one month, and the like. The number of times the user accesses each type of webpage every day is determined with the preset period as the unit. For example, with a period of five days, the number of times the user accesses each type of webpage on each of five consecutive days is determined.
Specifically, in order to facilitate calculation or analysis according to the statistical result, a first matrix may be established according to the determined data:
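$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$
where $a_{ij}$ is the number of times the user accessed the i-th type of webpage on the j-th day.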
Specifically, each row of data in the matrix a represents the number of times that the user accesses the category web page corresponding to the row every day, that is, the category of the web page corresponding to each row of data is the same, for example, the first row represents the related data of the first category web page. Each column of data in the matrix a represents the number of times that the user accesses each category of web pages on the same day, and the date of generation of each column of data is the same, for example, the first column represents the number of times that the user accesses each category of web pages on the first day in the preset period. The matrix A comprises m types of web pages in total, and the preset period is n days.
Further, the value of m may be determined according to the types of web pages to be examined, for example, if 5 types of web pages are to be examined in total, m may be set to 5, and m is equal to 5 in each preset period.
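A first matrix with m rows (webpage categories) and n columns (days) may be assembled from daily counts as in the following sketch. The input format `{date: {category: count}}` and the function name `build_first_matrix` are hypothetical; missing days and categories are filled with 0, as described for step 103.

```python
from datetime import date, timedelta


def build_first_matrix(daily_counts, categories, start_day, n_days):
    """Return matrix A as a list of rows: A[i][j] is the number of times
    category i was accessed on day j of the preset period."""
    days = [start_day + timedelta(days=j) for j in range(n_days)]
    return [
        [daily_counts.get(day, {}).get(category, 0) for day in days]
        for category in categories
    ]


# Example: m = 2 categories examined over a preset period of n = 3 days.
daily_counts = {
    date(2017, 11, 23): {"game": 5},
    date(2017, 11, 25): {"game": 1, "news": 2},
}
A = build_first_matrix(daily_counts, ["game", "news"], date(2017, 11, 23), 3)
```

Here the first row of `A` holds the daily counts for the game category and the second row those for the news category.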
Step 104, determining, according to the number of times, the average number of times the user accesses each type of webpage and the variance value of the number of times the user accesses each type of webpage.
The method for calculating the average number of times that the user accesses each type of web page may be:
$$\mu_i = \frac{1}{n}\sum_{j=1}^{n} a_{ij}$$
That is, the average number of times the user accesses the i-th type of webpage per day is obtained by summing the number of times the user accesses the i-th type of webpage over the n days and dividing by n.
Specifically, the method for calculating the variance value of the number of times that the user accesses each type of web page may be:
$$\sigma_i^2 = \frac{1}{n}\sum_{j=1}^{n}\left(a_{ij}-\mu_i\right)^2$$
where $\sigma_i^2$ denotes the variance value of the number of times the user accesses the i-th type of webpage.
By calculating the variance $\sigma_i^2$, it can further be learned how far the number of times the user accesses the i-th type of webpage each day deviates from the average number of times that type of webpage is accessed per day.
For convenience of statistics, a second matrix may also be established according to the determined average times and the time variance value:
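$$\begin{bmatrix} \mu_1 & \sigma_1^2 \\ \mu_2 & \sigma_2^2 \\ \vdots & \vdots \\ \mu_m & \sigma_m^2 \end{bmatrix}$$
The second matrix holds the average number of times and the variance value of the number of times for each type of webpage accessed by the user in the preset period.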
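The calculation of the average number of times and the variance value for each row of the first matrix may be sketched as follows. The sketch assumes the population form of the variance (division by n, the same divisor as in the average); the function name `mean_and_variance` is hypothetical.

```python
def mean_and_variance(matrix):
    """For each row of the first matrix, return (average, variance).

    Assumption: the variance divides by n (population variance),
    matching the divisor used for the average.
    """
    result = []
    for row in matrix:
        n = len(row)
        mu = sum(row) / n
        variance = sum((x - mu) ** 2 for x in row) / n
        result.append((mu, variance))
    return result


# Two categories with the same average but very different stability:
# 40 accesses on one day only, versus 8 accesses on each of five days.
stats = mean_and_variance([[40, 0, 0, 0, 0], [8, 8, 8, 8, 8]])
# stats == [(8.0, 256.0), (8.0, 0.0)]
```

The example shows why the average alone can be misleading: both rows average 8 accesses per day, but only the second reflects a steady interest.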
Step 105, determining the preference of the user in the preset period according to the average number of times and the variance value of the number of times for each type of webpage.
The number of times the user visits each type of webpage every day can be counted continuously, the total number of times the user visits each type of webpage calculated, and the webpage type with the highest number of visits determined and taken as the user preference. The webpage categories may also be ranked by the average number of times the user visits each category per day. When this determination method is adopted, the access information stored in the database can be handled as usual, for example overwritten or cleared, once the daily counts for each type of webpage have been compiled.
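Ranking webpage categories by total number of visits may, for instance, be sketched as follows. The function name `rank_categories` and the sample data are hypothetical; because every category is observed over the same number of days, ranking by total and ranking by daily average give the same order.

```python
def rank_categories(matrix, categories):
    """Rank categories by total visits over the period, highest first.

    matrix[i] holds the daily visit counts of categories[i].
    """
    totals = {category: sum(row) for category, row in zip(categories, matrix)}
    return sorted(totals, key=totals.get, reverse=True)


# Example: totals are game = 6, news = 2, video = 9.
ranking = rank_categories([[5, 0, 1], [0, 0, 2], [3, 3, 3]], ["game", "news", "video"])
# ranking == ["video", "game", "news"]
```

The first element of the ranking corresponds to the webpage type taken as the user preference under this simpler method.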
Specifically, the user preference may be determined according to the number of times that the user accesses each category of web pages each day in a preset period, and the user preference may be further determined according to the results determined in a plurality of preset periods.
Further, in the scheme provided by this embodiment, the preference of the user in the preset period may be determined according to the average number of times that the user accesses each type of web page and the variance value of the number of times.
The average number of times directly reflects how often the user accesses each type of webpage; if the average number of times the user accesses a certain type of webpage is high, the user can be considered to pay more attention to that type of content. In addition, the user preference may also be determined based on the variance value of the number of times. When the number of times the user accesses the i-th type of webpage each day is even and stable, $\sigma_i$ is small; conversely, if the number is uneven, for example the i-th type of webpage is accessed 40 times on the first day but not viewed on any other day, $\sigma_i$ is large. Therefore, the average number of times and the variance value can be considered together to determine whether the user pays more attention to a type of webpage content. For example, the user may be determined to be more interested in a type of webpage content when its average number of times is greater than a number threshold and its variance value is less than a variance threshold.
In practice, the user may visit several types of webpage within a preset period, and more than one category may be identified as being of interest to the user, in which case several user preferences are obtained. For example, a user is interested in both sports and real estate.
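The threshold comparison described above may be sketched as follows. The threshold values, the sample statistics, and the function name `preferred_categories` are assumptions chosen for illustration; the embodiment does not specify concrete thresholds.

```python
def preferred_categories(stats, count_threshold, variance_threshold):
    """Return the categories whose average number of visits exceeds
    count_threshold and whose variance is below variance_threshold.

    stats: {category: (average, variance)}.
    """
    return [
        category
        for category, (average, variance) in stats.items()
        if average > count_threshold and variance < variance_threshold
    ]


# Made-up statistics: "game" has a high average but is driven by a single day.
stats = {
    "sports": (6.0, 1.5),
    "real estate": (5.0, 2.0),
    "game": (8.0, 256.0),
    "news": (0.4, 0.2),
}
preferences = preferred_categories(stats, count_threshold=3, variance_threshold=10)
# preferences == ["sports", "real estate"]
```

More than one category can satisfy both conditions, which corresponds to a user having several preferences in the same period.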
The method for determining user preference according to network behavior provided by this embodiment comprises: obtaining access information generated when a user accesses webpages, wherein the access information comprises webpage information and access time; determining the category of each webpage accessed by the user according to the webpage information; determining the number of times the user accesses each type of webpage every day in a preset period according to the access time and the category of the webpage; determining, according to the number of times, the average number of times the user accesses each type of webpage and the variance value of that number; and determining the preference of the user in the preset period according to the average number of times and the variance value for each type of webpage. With this method, the access information generated when the user browses webpages is fully used, and because the average number of times and the variance value are considered together, the preference of the user is determined more accurately, so that a network provider can understand the user's preference and improve its service level.
Fig. 2 is a flowchart illustrating a method of determining user preferences based on network behavior according to another exemplary embodiment of the present invention.
As shown in fig. 2, the method for determining user preferences according to network behaviors provided by this embodiment includes:
Step 201, obtaining access information of a user accessing a webpage, wherein the access information includes webpage information and access time.
Step 201 follows the same principle and implementation as step 101, and the details are not repeated here.
Step 202, extracting keywords from the uniform resource locator (url) of the webpage and/or the content of the webpage.
The keyword of the webpage can be determined from the url of the webpage. The url of a webpage is a compact representation of the location of, and the method of access to, a resource available on the internet; it is the address of a standard resource on the internet. The url includes at least a mode/protocol (scheme) portion and a host IP address where the resource is stored. The obtained url may be processed according to a preset rule. For example, the preset rule may be to remove symbols such as "." and "/", to remove mode and protocol portions such as "http", "https" and "ftp", and to remove generic domain name parts and the world wide web identifier, such as "com", "cn" and "www", so that the useful information in the url remains. For example, after the generic content is removed from http://www.iqiyi.com/ according to the preset rule, the remaining word is "iqiyi", which can then be used as the keyword of the webpage.
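One possible form of such a preset rule is sketched below. The set of generic tokens contains only the examples named in the text, and the function name `extract_keywords` is hypothetical; a practical rule would likely need a longer token list.

```python
import re

# Generic tokens named in the text: schemes, generic domain parts, "www".
GENERIC_TOKENS = {"http", "https", "ftp", "com", "cn", "www"}


def extract_keywords(url):
    """Split a url on ".", "/" and ":" and drop generic tokens."""
    tokens = [t for t in re.split(r"[./:]+", url.lower()) if t]
    return [t for t in tokens if t not in GENERIC_TOKENS]


keywords = extract_keywords("http://www.iqiyi.com/")
# keywords == ["iqiyi"]
```

A url may yield more than one keyword; each of them can then be looked up in the category library.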
Keywords of the webpage can also be extracted from the content of the webpage. In general, every webpage includes the name of its website, and this website name can be obtained and used as a keyword of the webpage. For example, every webpage of the cool website includes the cool logo at the top, and this logo can be recognized to determine the keyword of the webpage.
Step 203, determining the category of the webpage in a preset category library according to the keyword, and/or taking the keyword as the category of the webpage.
Further, a category library may be preset, where the category library includes a correspondence between the keyword and the web page category. For example, a category of web pages may include a plurality of keywords. In order to maintain the category library, a correspondence table between the keywords and the web page categories may be further provided, for storing the correspondence between the keywords and the web page categories, such as table 1.
TABLE 1
[Table 1 image not reproduced: correspondence between url keywords and webpage categories]
In Table 1, the number of rows may be set according to the number of url keywords, that is, one row per url keyword. The number of rows may also be set according to the number of webpage categories, that is, one row per webpage category, in which case the url keywords belonging to the same webpage category are placed in one cell.
In practice, Table 1 can be maintained: url keywords and webpage categories in Table 1 can be deleted, added and modified, and so can the correspondences between them. For example, a webpage category to be examined and its corresponding url keywords may be added to the correspondence table. When none of the keywords in the correspondence table is found in the user's access information, that access is not recorded. With this implementation the category library contains little data and is easy to maintain.
If the category library needs to be preset, the method provided by this embodiment may further include:
receiving a correspondence between a keyword and a category, and storing the correspondence in the preset category library.
The correspondence between keywords and categories may be uploaded by the user on the user's own initiative; once received, the uploaded correspondence is stored in the preset category library.
The correspondence between a keyword and a category may also be determined by machine learning. When the user access information is being counted, it can be checked whether the preset category library contains the keyword of the webpage; if not, the keyword can be fed into a self-learning system of the computer, so that the computer automatically determines the category corresponding to the keyword and stores the correspondence between the keyword and the category in the preset category library. A machine learning framework from the prior art may be used to implement this function.
Alternatively, it can be checked whether the preset category library contains the keyword; if not, the keyword is added to the preset category library and the keyword itself is taken as its category.
That is, a keyword can be used directly as a webpage category: when a keyword is extracted from a webpage and the preset category library does not contain it, the keyword can be stored directly in the category library and used as a category.
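The fallback in which an unknown keyword becomes its own category may be sketched as follows. The dictionary-based library, its sample contents, and the function name `resolve_category` are assumptions made for this sketch.

```python
def resolve_category(keyword, library):
    """Return the category of keyword.

    If the library does not contain the keyword, the keyword is stored
    in the library and used as its own category.
    """
    if keyword not in library:
        library[keyword] = keyword
    return library[keyword]


# Hypothetical library with one known keyword.
library = {"iqiyi": "video"}
known = resolve_category("iqiyi", library)      # existing correspondence is used
unknown = resolve_category("example", library)  # keyword becomes its own category
```

After the second call the library also contains the entry for `"example"`, so the library grows as new keywords are encountered.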
If the keyword is directly used as the category to which the web page belongs, the category library may be set in the form of a table, as shown in table 2.
TABLE 2
[Table 2 image not reproduced: category library in which each keyword serves directly as the webpage category]
When the method is adopted, all webpage categories can be covered in the category library, and the data is rich.
Similarly, when the web pages are classified according to the keywords extracted from the web page content, the above manner may also be adopted, and only the maintained category libraries are different, which is not described herein again.
Step 204, determining the number of times the user accesses each type of webpage every day in a preset period according to the access time and the category of the webpage.
Step 204 follows the same principle and implementation as step 103, and the details are not repeated here.
Step 205, determining, according to the number of times, the average number of times the user accesses each type of webpage and the variance value of the number of times the user accesses each type of webpage.
Step 205 follows the same principle and implementation as step 104, and the details are not repeated here.
Step 206, determining the first category to which the user belongs according to the average number of times and the variance value of the number of times the user accesses each type of webpage in the preset period.
Specifically, the first category to which the user belongs may be determined according to the average number of times that the user visits each category of web pages within a preset period, and the type of the web page with the largest average number of visits by the user may be determined as the first category to which the user belongs, for example, μ2The larger of the plurality of average values, and the category of web pages corresponding to category 2 web pages is News, it can be determined that the user belongs to News, whose preference is News.
Further, a first category to which a plurality of users belong may be determined, for example, a user belongs to News category and video category at the same time.
In practical application, there is a situation that a user browses more webpages of one type on the same day and does not browse the webpages of the type at other times only due to some reason, and the average number of times that the user visits the webpages of the type is more due to the number of visits on the day.
Therefore, it is also possible to consider determining the first category to which the user belongs from the degree variance value. When the number of times of accessing the ith type of web pages by the user per day is more average and stable, the sigmaiIt will be smaller, and conversely, if the number of times the user accesses the i-th web page each day is uneven, for example, 40 times the i-th web page is accessed on the first day, but the i-th web page is not viewed on any other day, then σ isiIt will be larger. Thus, σ can be comparediAnd further determining whether the user belongs to the first category or not according to the preset value.
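As a worked illustration, assume a preset period of 5 days and the population variance (both are assumptions made here, not values fixed by the embodiment). A steady user with daily counts (8, 8, 8, 8, 8) and a bursty user with daily counts (40, 0, 0, 0, 0) have the same average but very different variance roots:

$$\mu_{\text{steady}}=\frac{8+8+8+8+8}{5}=8,\qquad \sigma_{\text{steady}}=\sqrt{\frac{5\cdot(8-8)^2}{5}}=0$$

$$\mu_{\text{bursty}}=\frac{40+0+0+0+0}{5}=8,\qquad \sigma_{\text{bursty}}=\sqrt{\frac{(40-8)^2+4\cdot(0-8)^2}{5}}=\sqrt{\frac{1024+256}{5}}=16$$

The average alone cannot separate the two cases, whereas comparing σ_i with a preset value can.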
In addition, this embodiment also provides another method for determining the first category to which the user belongs.
It can be judged whether μ_i is greater than
$$\frac{a}{m-1}\sum_{k=1,\,k\neq i}^{m}\left(\mu_k+\sigma_k\right)$$
If so, it is determined that the user belongs to the first category i. Here a is a preset correction value, usually set to 0.5, and m is the total number of web page categories; μ_i is the average number of times the user accesses the i-th category of web pages; μ_k is the average number of times the k-th category of web pages is accessed, and σ_k is the square root of the variance of the number of times the k-th category of web pages is accessed.
That is, the averages and variance roots of all categories except the i-th are added, and the sum is divided by m-1, the number of remaining categories, to obtain the average of (μ+σ); this average is then multiplied by the correction value a (0.5 in this example). In other words, the parameters of the i-th category are removed, and μ_i is compared with the corrected average of μ+σ over the other categories. If μ_i is greater than this value, the user can be considered to belong to the first category i. In this way, the user's access to the i-th category of web pages is compared with the overall access situation to decide whether the user belongs to the first category i, which makes the classification result more accurate.
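The comparison described above might be sketched as follows. This is an illustrative sketch only: the function name `first_categories`, the use of 0-based category indices and the sample values are assumptions, and the threshold is the one described in words in the preceding paragraph (the corrected average of μ+σ over all other categories).

```python
def first_categories(mu, sigma, a=0.5):
    """Return indices i (0-based) for which
    mu[i] > a / (m - 1) * sum(mu[k] + sigma[k] for k != i).

    mu, sigma: per-category average and variance root, same length m >= 2.
    a: preset correction value (0.5 is the value suggested in the text).
    """
    m = len(mu)
    if m < 2 or len(sigma) != m:
        raise ValueError("need at least two categories with matching lengths")
    selected = []
    for i in range(m):
        others = sum(mu[k] + sigma[k] for k in range(m) if k != i)
        threshold = a * others / (m - 1)
        if mu[i] > threshold:
            selected.append(i)
    return selected

# Category 0 dominates: threshold is 0.5 * ((2 + 1) + (1 + 0)) / 2 = 1.0.
print(first_categories([10, 2, 1], [1, 1, 0]))
```

Because the i-th category is left out of its own threshold, a single heavily visited category does not raise the bar against itself.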
Step 207, determining the user's preferences according to the first category.
After the first category to which the user belongs is determined, the preference of the user is determined according to the first category. For example, if the user belongs to the News category or the video category, the user's preference may be determined to be news or video content.
The method provided by the embodiment can determine the average times and the variance value of the times of the user accessing each type of webpage according to the access condition of the user in a preset period, and can accurately determine the type of the user according to the determined average times and the variance value, so that the user preference can be accurately determined.
Fig. 3 is a flowchart illustrating a method of determining user preferences based on network behavior according to yet another exemplary embodiment of the present invention.
The method provided by the embodiment can determine the preference of the user according to the access information of the user in a plurality of preset periods.
As shown in fig. 3, the method for determining user preferences according to network behaviors provided by this embodiment includes:
Step 301, determining, for each of P preset periods, the first category to which the user belongs, the average number of times each category of web pages is accessed, and the count variance value.
The number P of preset periods to be considered is predetermined, for example 10 or 15 periods; the user preference may also be examined over a long time as required. Since the user's access information cannot be stored in the database permanently, when P is large the numbers of visits can be recorded as soon as the access information is generated, and once a preset period ends, the first category to which the user belongs can be determined from the data of that period. This avoids the situation in which data is removed from the database before any preference analysis has been performed on it. In addition, when the user's preference is examined over a long time, P may be a dynamic value: each time a preset period ends, P is incremented by 1. The first category is then determined for the newly added preset period, and the second category is determined according to the first categories determined in the previous preset periods.
Specifically, the average number of times of accessing each type of web page and the variance value of the number of times may also be determined in each preset period, and the specific determination method may refer to step 201 to step 206 in the embodiment shown in fig. 2, which is not described herein again.
Step 302, determining a second category to which the user belongs from a plurality of first categories according to the average number of times of accessing each category of web pages and the number variance value determined in each preset period.
Further, since at least one first category can be determined in each preset period, a plurality of first categories can be determined in P preset periods. For example, the first category determined in the first preset period is the 1 st and 3 rd categories, the first category determined in the second preset period is the 1 st and 4 th categories, and the first category determined in the third preset period is the 1 st and 5 th categories.
A first category determined from a single preset period is a short-term result, that is, it reflects the content the user was most interested in during that period. However, the user may be more interested in one type of content during this period and in other content some time later. Determining the user preference from only one preset period therefore cannot capture the user's long-term preference. To examine the preference over a longer time, this embodiment considers the average number of times and the count variance value of each category of web pages over a plurality of preset periods, and screens the second category out of the determined first categories.
In practical application, the average number of times and the count variance value corresponding to each first category can be obtained for each preset period, and whether a first category meets the requirement of the second category is determined from these values. For example, with P = 3, the first categories determined in the first preset period are categories 1 and 3, those determined in the second preset period are categories 1 and 4, and those determined in the third preset period are categories 1 and 5. The average number of times and the count variance value for categories 1, 3, 4 and 5 in the first to third preset periods may then be obtained, and for any one of these first categories it may be checked whether its average number of times and its count variance value are both greater than their corresponding preset values; if so, that first category may be determined as the second category to which the user belongs.
All of the first categories may be traversed to screen out second categories that satisfy the condition.
If the preference of the user is considered for a long time, that is, the P value is increased according to the time change situation, after the P value is changed, the second category to which the user belongs can be re-determined according to the first category, the average number of times of accessing each category of web pages and the variance value of the number of times included in the newly determined preset period, so that the second category can be updated according to the change of the P value.
Step 303, determining the user's preferences according to the second category.
Step 303 is similar to the implementation of step 208, and is not described herein again.
It should be noted that the determined user preferences can vary as the second category varies.
The method for determining the user preference according to the network behavior provided by the embodiment can investigate the preference of the user for a long time according to the access information generated when the user accesses the webpage, so that the preference and the change of the preference of the user can be known for a long time, and personalized service can be provided for the user according to the preference of the user.
Fig. 4 is a flowchart illustrating a method of determining user preferences based on network behavior according to yet another exemplary embodiment of the invention.
As shown in fig. 4, the method for determining user preferences according to network behaviors provided by this embodiment includes:
Step 401, determining, for each of P preset periods, the first category to which the user belongs, the average number of times each category of web pages is accessed, and the count variance value.
The implementation principle and manner of step 401 are the same as those of step 301, and are not described herein again.
Step 402, obtaining a first category i in each preset period, and determining the preset period number P' for determining the category i as the first category.
Since the user's access information differs from one preset period to another, the first categories determined in each preset period also differ. The first categories i in each preset period are obtained. For example, with P = 3: in the first preset period, the first categories are web page categories 1 and 3; in the second preset period, web page categories 1 and 4; in the third preset period, web page categories 1 and 5. Then, for each category i, the number P' of preset periods in which it was determined to be a first category is determined. In this example, P' is 3 for web page category 1 and 1 for each of web page categories 3, 4 and 5.
To facilitate counting the number P', a set C' may be established from the first categories determined in each preset period, as follows:
A subset C'_n is established from the first categories i determined in each preset period, where C'_n includes all the first categories i determined in that period and n identifies the preset period (n = 1 for the first preset period, and so on). Since there are P preset periods, P subsets are obtained:
$$C'_1=\{\dots,\ C_e,\ \dots,\ C_f,\ \dots\}_1$$
$$C'_2=\{\dots,\ C_g,\ \dots,\ C_h,\ \dots\}_2$$
$$\dots$$
$$C'_P=\{\dots,\ C_c,\ \dots,\ C_d,\ \dots\}_P$$
where C_e, C_f, C_g, C_h and so on are the first categories to which the user belongs in each preset period.
The set C' is determined from these subsets:
$$C'=\{\{\dots,\ C_e,\ \dots,\ C_f,\ \dots\}_1,\ \{\dots,\ C_g,\ \dots,\ C_h,\ \dots\}_2,\ \dots,\ \{\dots,\ C_c,\ \dots,\ C_d,\ \dots\}_P\}$$
Further, since each category i can be determined as a first category at most once in each preset period, each category i appears at most once in each subset of the above set; the number P' of preset periods in which category i was determined to be a first category can therefore be determined from the number of times category i appears in the set.
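Counting P' from the per-period subsets might be sketched as follows, using the three-period example given for step 402. The function name `count_first_category_periods` and the representation of each subset as a Python set of category numbers are assumptions made for this sketch.

```python
from collections import Counter

def count_first_category_periods(subsets):
    """Return a Counter mapping category i to P', the number of preset
    periods in which i was determined to be a first category."""
    p_prime = Counter()
    for subset in subsets:
        # set() guarantees a category is counted at most once per period.
        p_prime.update(set(subset))
    return p_prime

# First categories per period: {1, 3}, {1, 4}, {1, 5}  (P = 3)
subsets = [{1, 3}, {1, 4}, {1, 5}]
p_prime = count_first_category_periods(subsets)
print(p_prime[1], p_prime[3], p_prime[4], p_prime[5])
```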
Step 403, determining whether the first category i satisfies the first condition according to the number P' and the preset number of periods P.
Further, a preset rule may be set; if the first category i satisfies the preset rule according to the numbers P' and P, the first category i is determined to satisfy the first condition.
In practice, it can be determined whether the number P' is greater than
Figure BDA0001568358090000141
If so, the preset rule is judged to be met; otherwise, it is judged not to be met.
Here
Figure BDA0001568358090000142
is a correction value that may be set as required; for example, it may be set to 1/3.
If the first category i satisfies the requirement of the first condition, step 404 is executed. Otherwise, whether the next first category meets the requirement of the first condition is continuously determined.
Step 404, determining the total average times of the user accessing the web pages of the category i in P preset periods.
Specifically, the number of times a_ij that the user visits each category of web pages on each day is recorded in each preset period (see matrix A for details). From these values, the total number of visits by the user to web pages of category i over the P preset periods can be determined; dividing this total by the total number of days in the P preset periods gives the total average number of visits μ_i for category i. The total number of days is the number of days per preset period multiplied by P; for example, with 5 days per preset period, the total number of days is 5 × P.
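The total average of step 404 might be sketched as follows. The function name `total_average` and the nested-list layout (one list of daily counts per preset period, for a single category i) are assumptions made for this sketch.

```python
def total_average(period_counts):
    """Total visits to one category over all preset periods, divided by
    the total number of days (days per period multiplied by P)."""
    total_visits = sum(sum(days) for days in period_counts)
    total_days = sum(len(days) for days in period_counts)
    return total_visits / total_days

# P = 3 preset periods of 5 days each, so the total number of days is 15.
period_counts = [
    [2, 0, 1, 3, 4],
    [1, 1, 1, 1, 1],
    [3, 3, 3, 3, 3],
]
print(total_average(period_counts))
```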
Step 405, determining whether the first category i meets the second condition according to the number P', the average number of times of accessing each category of web pages in each preset period, the number variance value and the total average number of times, and if so, determining that the first category i is the second category.
Further, the average number of times and the count variance value of each category of web pages in each preset period may be processed first. With P preset periods, P matrices A are obtained, and one matrix B is obtained from each matrix A, giving P matrices B, that is:
Figure BDA0001568358090000151
The P matrices B may be processed to obtain a third matrix B_i:
Figure BDA0001568358090000152
According to the μ_ij and σ_ij included in the third matrix, the number P' and the total average number of times μ_i, it is determined whether the first category i satisfies the second condition. Specifically, it may be judged whether the total average number of times μ_i satisfies:
Figure BDA0001568358090000161
If so, category i is judged to satisfy the second condition.
If the category i meets the first condition and the second condition, the category i is judged to be the second category, namely the second category to which the user belongs is determined according to the long-term access data of the user in a plurality of first categories generated in a plurality of preset periods, so that the final determined result is more accurate.
After determining whether the category i satisfies the first condition and the second condition, it may be continuously determined whether other categories i satisfy the first condition and the second condition.
Step 406, determining the user's preferences according to the second category.
Step 406 is similar to the implementation of step 208, and is not described herein again.
Step 407, if the first category i is determined to be the second category, determining the probability that the user belongs to the category i according to a preset algorithm, and outputting the probability;
wherein the preset algorithm is as follows:
$$q=\frac{P'}{P}$$
where P' is the number of preset periods in which category i was determined to be a first category, that is, the number of times category i appears in the set C'. Dividing P' by P gives q, the proportion of the P periods in which category i was determined to be a first category; the probability that the user belongs to category i can therefore be represented by the value q.
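For the three-period example used earlier (category 1 selected in all three periods, categories 3, 4 and 5 in one period each), the value q might be computed as in the following sketch. The function name `category_probability` and the input checks are assumptions added for illustration.

```python
def category_probability(p_prime, p):
    """Return q = P' / P, the share of preset periods in which the
    category was determined to be a first category."""
    if p <= 0:
        raise ValueError("P must be positive")
    if not 0 <= p_prime <= p:
        raise ValueError("P' must lie between 0 and P")
    return p_prime / p

print(category_probability(3, 3))  # category 1
print(category_probability(1, 3))  # categories 3, 4 and 5
```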
The execution sequence of step 406 and step 407 is not limited, and step 406 may be executed first, step 407 may be executed first, or steps 406 and 407 may be executed simultaneously.
In the method for determining the user preference according to the network behavior provided in this embodiment, on the basis of determining the first category to which the user belongs in each preset period, the second category to which the user belongs in the longer time of P preset periods is determined according to data in each preset period, so that the user can be classified in a longer time. In addition, by introducing the mean value and the variance value in a plurality of preset periods, the stationarity of the user in accessing the ith webpage can be inspected, so that the user can be classified more accurately, and the preference of the user can be analyzed more accurately.
Fig. 5 is a block diagram illustrating an apparatus for determining user preferences based on network behavior according to an exemplary embodiment of the present invention.
As shown in fig. 5, the apparatus for determining user preferences according to network behaviors provided in this embodiment includes:
an obtaining module 51, configured to obtain access information of a user, where the access information includes web page information and access time;
a category determining module 52, configured to determine, according to the web page information, a category to which a web page accessed by the user belongs;
the number determining module 53 is configured to determine, according to the access time and the category to which the webpage belongs, the number of times that the user accesses each type of webpage every day in a preset period;
the calculating module 54 is configured to determine, according to the number of times, an average number of times that the user accesses each type of web pages and a variance value of the number of times that the user accesses each type of web pages;
a preference determining module 55, configured to determine the user preference according to the number of times that the user accesses each type of web page each day.
The device for determining the user preference according to the network behavior provided by the embodiment comprises the steps of acquiring access information of a user accessing a webpage, wherein the access information comprises webpage information and access time; determining the category of the webpage accessed by the user according to the webpage information; determining the times of accessing each type of webpage by a user every day in a preset period according to the access time and the category of the webpage; according to the times, determining the average times of the user accessing each type of webpage and the variance value of the times of the user accessing each type of webpage; and determining the user preference in a preset period according to the average times and the time variance value of each type of webpage. By adopting the device provided by the embodiment, the access information generated when the user browses the webpage can be fully utilized, and the average times and the variance value of each type of webpage accessed by the user are determined according to the access information of the user, so that the preference of the user is determined more accurately by comprehensively considering the average times and the variance value of the times, a network provider can know the preference of the user, and the service level is improved.
The specific principle and implementation of the apparatus for determining user preferences according to network behavior provided in this embodiment are similar to those of the embodiment shown in fig. 1, and are not described herein again.
Fig. 6 is a block diagram illustrating an apparatus for determining user preferences according to network behavior according to another exemplary embodiment of the present invention.
As shown in fig. 6, on the basis of the above-mentioned embodiments, in the apparatus for determining user preferences according to network behavior provided by this embodiment,
the preference determination module 55 includes:
a first category determining unit 551, configured to determine a first category to which the user belongs according to the average number of times of accessing each category of web pages and the number variance value;
a preference determining unit 552 for determining the preferences of the user according to the first category.
The first category determining unit 551 is specifically configured to:
judge whether μ_i is greater than
$$\frac{a}{m-1}\sum_{k=1,\,k\neq i}^{m}\left(\mu_k+\sigma_k\right)$$
and if so, determine that the user belongs to a first category i;
wherein m is the total number of web page categories, a is a preset correction value, μ_i is the average number of times the user accesses the i-th category of web pages, μ_k is the average number of times the k-th category of web pages is accessed, and σ_k is the square root of the variance of the number of times the k-th category of web pages is accessed.
Specifically, the apparatus provided in this embodiment further includes a multi-cycle determination module 56, configured to:
determining the first category to which the user belongs, the average number of times of accessing each category of web pages and the number variance value in each preset period within P preset periods;
determining a second category to which the user belongs in the plurality of first categories according to the average times of accessing each category of web pages determined in each preset period and the time variance value;
determining the user's preferences according to the second category.
Optionally, the multi-cycle determining module 56 includes:
an obtaining unit 561, configured to obtain the first category i in each preset period, and determine a preset period number P' for determining the category i as a first category;
a first determining unit 562, configured to determine whether the first category i satisfies a first condition according to the number P', a preset number of cycles P;
if yes, the first determining unit 562 determines the total average number of times that the user accesses the webpages of the category i in P preset periods;
the first determining unit 562 is further configured to determine whether the first category i meets a second condition according to the number P', the average number of times of accessing each category of web pages in each preset period, the number variance value, and the total average number of times, and if yes, determine that the first category i is the second category.
Optionally, the apparatus provided in this embodiment further includes a probability output module 57, configured to:
if the first category i is determined to be the second category, determining the probability that the user belongs to the category i according to a preset algorithm, and outputting the probability;
wherein the preset algorithm is as follows:
$$q=\frac{P'}{P}$$
wherein q is the probability.
In addition, the category determination module 52 may further include:
an extracting unit 521, configured to extract keywords from the URL of the web page and/or the content of the web page;
the second determining unit 522 is configured to determine, according to the keyword, a category to which the web page belongs in a preset category library, and/or use the keyword as the category to which the web page belongs.
Optionally, the preset category library includes a corresponding relationship between the keyword and the category;
accordingly, the category determination module 52 further includes:
a receiving unit 523, configured to receive a correspondence between the keyword and the category, and store the correspondence in the preset category library;
and/or the adding unit 524 is configured to detect whether the preset category library includes the keyword, if not, add the keyword to the preset category library, and determine the category of the keyword as the keyword itself.
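The behaviour of the extracting, second determining and adding units might be sketched as follows. This is a simplified illustration under stated assumptions: keywords are taken naively from the path segments of the URL, the preset category library is a plain dictionary, and when no keyword is known the first extracted keyword is added as its own category. The function names `extract_keywords` and `classify` are hypothetical.

```python
from urllib.parse import urlparse

def extract_keywords(url):
    """Naive extractor: lowercase path segments, skipping pure numbers."""
    segments = urlparse(url).path.split("/")
    return [s.lower() for s in segments if s and not s.isdigit()]

def classify(url, library):
    """Look the URL's keywords up in the category library.

    library: dict mapping keyword -> category (the preset category library).
    If no keyword is known, the first keyword is added to the library with
    itself as its category (an assumption about which keyword to add).
    """
    keywords = extract_keywords(url)
    for keyword in keywords:
        if keyword in library:
            return library[keyword]
    if keywords:
        library[keywords[0]] = keywords[0]
        return keywords[0]
    return None

library = {"news": "News", "video": "Video"}
print(classify("https://www.example.com/news/2018/02/story.html", library))
print(classify("https://www.example.com/recipes/pasta", library))
```

After the second call the library also maps the keyword "recipes" to the category "recipes", so later pages with that keyword fall into the same category.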
The specific principle and implementation manner of the apparatus for determining user preference according to network behavior provided in this embodiment are similar to those of the embodiments shown in fig. 2 to 4, and are not described here again.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for determining user preferences based on network behavior, comprising:
acquiring access information of a user, wherein the access information comprises webpage information and access time;
determining the category of the webpage accessed by the user according to the webpage information;
determining the times of accessing each type of webpage by the user every day in a preset period according to the access time and the category of the webpage;
determining, according to the number of times, the average number of times the user accesses each type of webpage and the variance value of the number of times each type of webpage is accessed;
judging whether μ_i is greater than
$$\frac{a}{m-1}\sum_{k=1,\,k\neq i}^{m}\left(\mu_k+\sigma_k\right)$$
and if so, determining that the user belongs to a first category i;
wherein m is the total number of webpage categories, a is a preset correction value, μ_i is the average number of times the user accesses the i-th type of webpage in the preset period, μ_k is the average number of times the user accesses the k-th type of webpage in the preset period, and σ_k is the square root of the variance of the number of times the user accesses the k-th type of webpage in the preset period;
determining the first category to which the user belongs, the average number of times of accessing each category of web pages and the number variance value in each preset period within P preset periods;
determining a second category to which the user belongs in the plurality of first categories according to the average times of accessing each category of web pages determined in each preset period and the time variance value; the second category is one of a plurality of the first categories;
determining the user's preferences according to the second category.
2. The method according to claim 1, wherein determining a second category to which the user belongs among the plurality of first categories according to the average number of times of accessing each category of web pages determined in each of the preset periods and the number variance value comprises:
acquiring the first category i in each preset period, and determining the preset period number P' for determining the category i as a first category;
determining whether the first category i meets a first condition or not according to the number P' and a preset period number P;
if yes, determining the total average times of the user accessing the webpage of the category i in the P preset periods;
and determining whether the first category i meets a second condition according to the number P', the average number of times of accessing each category of webpages in each preset period, the number variance value and the total average number of times, and if so, determining that the first category i is the second category.
3. The method according to claim 2, wherein if the first category i is determined to be the second category, determining a probability that the user belongs to the category i according to a preset algorithm, and outputting the probability;
wherein the preset algorithm is as follows:
$$q=\frac{P'}{P}$$
wherein q is the probability.
4. The method according to any one of claims 1-3, wherein the determining the category to which the webpage accessed by the user belongs according to the webpage information comprises:
extracting keywords from a uniform resource locator url of the webpage and/or the content of the webpage;
and determining the category of the webpage in a preset category library according to the keyword, and/or taking the keyword as the category of the webpage.
5. The method according to claim 4, wherein the preset category library comprises a corresponding relationship between the keyword and the category;
the method further comprises the following steps:
receiving the corresponding relation between the keywords and the categories, and storing the corresponding relation into the preset category library;
and/or detecting whether the preset category library comprises the keyword or not, if not, adding the keyword into the preset category library, and determining the category of the keyword as the keyword.
6. An apparatus for determining user preferences based on network behavior, comprising:
an acquisition module, configured to acquire access information of a user, wherein the access information comprises webpage information and access time;
a category determining module, configured to determine the category of the webpage accessed by the user according to the webpage information;
a number determining module, configured to determine the number of times that the user accesses each type of webpage every day in a preset period according to the access time and the category to which the webpage belongs;
a calculation module, configured to determine, according to the number of times, the average number of times the user accesses each type of webpage and the variance value of the number of times the user accesses each type of webpage;
a multi-period determining module, configured to determine, in P preset periods, the first category to which the user belongs, the average number of times of accessing each category of webpages, and the number variance value in each preset period; determining a second category to which the user belongs in the plurality of first categories according to the average times of accessing each category of web pages determined in each preset period and the time variance value; the second category is one of a plurality of the first categories; determining preferences of the user according to the second category;
a preference determination module comprising:
the first category determining unit is used for determining a first category to which the user belongs according to the average times and the time variance value of visiting each category of webpages;
a preference determining unit for determining the preference of the user according to the first category;
the first category determining unit is specifically configured to:
judge whether [formula image 002] is greater than [formula image 004], and if yes, determine that the user belongs to first category i;
wherein m is the total number of webpage categories; a is a preset correction value; [formula image 002] is the average number of times the user accesses the i-th category of webpage within the preset period; [formula image 006] is the average number of times the user accesses the k-th category of webpage within the preset period; and [formula image 008] is the square root of the variance value of the number of times the user accesses the k-th category of webpage within the preset period.
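The per-category statistics recited in the claim can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the input shape (a list of `(timestamp, category)` pairs) and the use of the population variance are assumptions. In particular, the threshold expression in the claim is an image that is not reproduced in the text, so `first_categories` uses a clearly hypothetical stand-in threshold built from the same quantities (m, a, the per-category averages and the square roots of the variances).

```python
from collections import defaultdict
from datetime import date, datetime
from math import sqrt
from statistics import mean, pvariance


def daily_counts(visits: list[tuple[datetime, str]],
                 period_days: list[date]) -> dict[str, list[int]]:
    """Count the visits to each category on each day of the preset period.

    Visits whose date falls outside the period are ignored.
    """
    day_index = {d: i for i, d in enumerate(period_days)}
    counts = defaultdict(lambda: [0] * len(period_days))
    for timestamp, category in visits:
        i = day_index.get(timestamp.date())
        if i is not None:
            counts[category][i] += 1
    return dict(counts)


def category_stats(counts: dict[str, list[int]]) -> dict[str, tuple[float, float]]:
    """Return {category: (average daily count, variance of daily counts)}.

    Population variance is an assumption; the claim does not say which
    variance estimator is intended.
    """
    return {c: (mean(v), pvariance(v)) for c, v in counts.items()}


def first_categories(stats: dict[str, tuple[float, float]],
                     a: float = 1.0) -> list[str]:
    """Select the categories whose average exceeds a threshold.

    HYPOTHETICAL threshold: (1/m) * sum_k (avg_k + a * sqrt(var_k)).
    The real expression is in a formula image missing from the text;
    substitute it here once known.
    """
    m = len(stats)
    threshold = sum(avg + a * sqrt(var) for avg, var in stats.values()) / m
    return [c for c, (avg, _) in stats.items() if avg > threshold]
```

A caller would build `period_days` for the preset period, pass the categorized access log to `daily_counts`, and feed the result through `category_stats` and `first_categories`; repeating this over P periods would give the inputs for the second-category step, whose selection rule the claim does not spell out.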
CN201810108024.7A 2018-02-02 2018-02-02 Method and device for determining user preference according to network behavior Expired - Fee Related CN110110219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810108024.7A CN110110219B (en) 2018-02-02 2018-02-02 Method and device for determining user preference according to network behavior

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810108024.7A CN110110219B (en) 2018-02-02 2018-02-02 Method and device for determining user preference according to network behavior

Publications (2)

Publication Number Publication Date
CN110110219A (en) 2019-08-09
CN110110219B (en) 2022-02-18

Family

ID=67483141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810108024.7A Expired - Fee Related CN110110219B (en) 2018-02-02 2018-02-02 Method and device for determining user preference according to network behavior

Country Status (1)

Country Link
CN (1) CN110110219B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110650161B (en) * 2019-10-30 2021-09-24 华南师范大学 Safe website and working method thereof
CN112131561A (en) * 2020-09-11 2020-12-25 北京北信源软件股份有限公司 Access boundary determination method, device, electronic device and storage medium
CN114780882B (en) * 2022-03-26 2023-12-05 深圳市安睿信科技有限公司 Internet webpage display management method, equipment and computer storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217091B (en) * 2013-06-05 2016-12-28 北京齐尔布莱特科技有限公司 A kind of website visiting amount Forecasting Methodology based on history tendency weight
CN104077714B (en) * 2014-06-16 2017-06-09 微梦创科网络科技(中国)有限公司 Access preference acquisition, advertisement sending method and the system of the user of website

Also Published As

Publication number Publication date
CN110110219A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN106649316B (en) Video pushing method and device
US10218599B2 (en) Identifying referral pages based on recorded URL requests
US6151585A (en) Methods and apparatus for determining or inferring influential rumormongers from resource usage data
CN104426713B (en) The monitoring method and device of web site access effect data
CN103886068B (en) Data processing method and device for Internet user's behavioural analysis
CN108304410B (en) Method and device for detecting abnormal access page and data analysis method
CN110110219B (en) Method and device for determining user preference according to network behavior
US10482477B2 (en) Stratified sampling applied to A/B tests
CN102831114B (en) Realize method and the device of internet user access Statistic Analysis
Akgül Quality evaluation of E-government websites of Turkey
CA2769946A1 (en) A method and system for efficient and exhaustive url categorization
CN107578263A (en) A kind of detection method, device and the electronic equipment of advertisement abnormal access
CN101739402A (en) Method and device for interest analysis
US20110029505A1 (en) Method and system for characterizing web content
WO2013110357A1 (en) Social network analysis
JP2011227721A (en) Interest extraction device, interest extraction method, and interest extraction program
CN117235586B (en) Hotel customer portrait construction method, system, electronic equipment and storage medium
CN116015842A (en) Network attack detection method based on user access behaviors
CN107526748A (en) A kind of method and apparatus for identifying user and clicking on behavior
JP5234839B2 (en) Content management apparatus, content management method and program
CN112347457A (en) Abnormal account detection method and device, computer equipment and storage medium
CN112685618A (en) User feature identification method and device, computing equipment and computer storage medium
JP5497925B2 (en) Content management apparatus, content management method and program
CN106933885A (en) The acquisition methods and device of website propagating influence
US10482105B1 (en) External verification of content popularity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220218