CN117009629A - User classification method, device, electronic equipment and storage medium - Google Patents

Publication number
CN117009629A
Authority
CN
China
Prior art keywords
access time
time length
barrel
user
quantile
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210471141.6A
Other languages
Chinese (zh)
Inventor
张博
徐煦
张伟
朱宇昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Xiongan ICT Co Ltd
China Mobile System Integration Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Xiongan ICT Co Ltd
China Mobile System Integration Co Ltd
Application filed by China Mobile Communications Group Co Ltd, China Mobile Xiongan ICT Co Ltd, China Mobile System Integration Co Ltd
Priority to CN202210471141.6A
Publication of CN117009629A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention provides a user classification method, a device, electronic equipment and a storage medium. Each total access duration is stored in buckets by means of a target mask, so that storage space is used more effectively and bucket storage becomes more efficient. Because buckets are introduced, calculating the access duration quantile requires neither a full sort of the total access durations nor a full index over them, which reduces the latency of the calculation and improves its efficiency. Compared with the prior art, in which users are classified by a given access duration threshold, using the access duration quantile takes the relationship among the users' total access durations into account, so that the classification result is more accurate and appropriate service information can subsequently be recommended to each user.

Description

User classification method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of user classification technologies, and in particular, to a user classification method, apparatus, electronic device, and storage medium.
Background
To maintain user stickiness toward an application (APP), the server providing the APP generally uses the user's access duration to the APP as the metric for measuring that stickiness.
After the user's access duration to the APP is determined, the user is classified so that appropriate service information can be recommended. Users are typically classified by a given access duration threshold. Although this achieves user classification, the classification is inaccurate, and appropriate service information cannot be recommended to the user.
A user classification method is therefore urgently needed.
Disclosure of Invention
The invention provides a user classification method, a user classification device, electronic equipment and a storage medium to remedy the defects of the prior art.
The invention provides a user classification method, which comprises the following steps:
acquiring the total access duration of each user to a target object, and storing each total access duration in buckets based on a target mask;
determining an initial position corresponding to a preset quantile based on the number of total access durations and the preset quantile, and calculating the access duration quantile at the preset quantile based on the initial position and the number of elements in each bucket;
and classifying each user based on each total access duration and the access duration quantile.
According to the user classification method provided by the invention, calculating the access duration quantile at the preset quantile based on the initial position and the number of elements in each bucket specifically comprises the following steps:
sorting the buckets based on the identifier of each bucket to obtain a first sorting result;
starting from the first bucket in the first sorting result, determining, among the buckets, a target bucket and a target position of the preset quantile within the target bucket based on difference information between the current position and the number of elements in the current bucket; the current position is determined based on the difference information corresponding to the position preceding the current position, and the initial value of the current position is the initial position;
sorting the elements in the target bucket to obtain a second sorting result;
and determining the access duration quantile based on the target position and the second sorting result.
According to the user classification method provided by the invention, storing each total access duration in buckets based on the target mask comprises the following steps:
dividing each total access duration into an index segment and a content segment based on the target mask;
and storing the content segment of each total access duration into the bucket identified by its index segment by using multithreading.
According to the user classification method provided by the invention, dividing each total access duration into an index segment and a content segment based on the target mask comprises the following step:
dividing each total access duration into an index segment and a content segment by shift operations and bitwise logical operations, based on the number of non-zero bits of the target mask.
According to the user classification method provided by the invention, determining the initial position corresponding to the access duration quantile based on the number of total access durations and the preset quantile comprises the following steps:
storing each bucket in memory;
and, if the memory capacity is insufficient, selecting from the buckets a bucket whose number of content segments exceeds the number threshold corresponding to the target mask and writing it to disk.
According to the user classification method provided by the invention, the target mask is determined as follows:
determining an optimal mask range according to the number of total access durations and the content segment of each total access duration;
and taking any mask within the optimal mask range as the target mask.
According to the user classification method provided by the invention, after classifying each user based on each total access duration and the access duration quantile, the method further comprises the following step:
recommending service information to each user based on the classification result corresponding to that user.
The invention also provides a user classification device, comprising:
a bucket storage module, configured to acquire the total access duration of each user to a target object and to store each total access duration in buckets based on a target mask;
a quantile calculation module, configured to determine an initial position corresponding to a preset quantile based on the number of total access durations and the preset quantile, and to calculate the access duration quantile at the preset quantile based on the initial position and the number of elements in each bucket;
and a classification module, configured to classify each user based on each total access duration and the access duration quantile.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the user classification method as described in any of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a user classification method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a user classification method as described in any of the above.
According to the user classification method, device, electronic equipment and storage medium provided by the invention, each total access duration is stored in buckets by means of a target mask, so that storage space is used more effectively and bucket storage becomes more efficient. Because buckets are introduced, calculating the access duration quantile requires neither a full sort of the total access durations nor a full index over them, which reduces the latency of the calculation and improves its efficiency. Compared with the prior art, in which users are classified by a given access duration threshold, using the access duration quantile takes the relationship among the users' total access durations into account, so that the classification result is more accurate and appropriate service information can subsequently be recommended to each user.
Drawings
In order to illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of a user classification method provided by the invention;
FIG. 2 is a schematic structural diagram of a user classification device provided by the invention;
FIG. 3 is a schematic structural diagram of an electronic device provided by the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the prior art, after the access duration of a user to an application (APP) is determined, the user is typically classified by a given access duration threshold. This approach leads to inaccurate classification and thus to an inability to recommend appropriate service information to the user. Therefore, an embodiment of the invention provides a user classification method.
FIG. 1 is a flow chart of a user classification method provided in an embodiment of the present invention. As shown in FIG. 1, the method includes:
S11, acquiring the total access duration of each user to a target object, and storing each total access duration in buckets based on a target mask;
S12, determining an initial position corresponding to a preset quantile based on the number of total access durations and the preset quantile, and calculating the access duration quantile at the preset quantile based on the initial position and the number of elements in each bucket;
S13, classifying each user based on each total access duration and the access duration quantile.
Specifically, in the user classification method provided in the embodiment of the present invention, the execution subject is a user classification device, and the device may be configured in a server, where the server may be a local server or a cloud server, and the local server may be a computer, etc., which is not specifically limited in the embodiment of the present invention.
Step S11 is executed first to obtain the total access duration of each user to the target object. The target object may be an APP, a web page, a website, or the like, and is not specifically limited here. The total access duration is the accumulated duration for which a user has accessed the target object up to and including the current time, and its unit may be hours (h) or days. It will be appreciated that each user has one total access duration.
Each total access duration is then stored in buckets according to the target mask. The target mask and each total access duration may be represented as binary numbers, and the target mask may have the same number of bits as each total access duration. The target mask may include non-zero bits and zero bits. The number of non-zero bits may be set as needed and is not specifically limited here; for example, it may be set to 8, in which case the number of zero bits is the bit width of the target mask minus 8.
In the embodiment of the invention, each user's total access duration and the target mask may be of the long integer (long) data type, with a width of 64 bits.
Using the target mask, each total access duration can be split, and the split results are stored in buckets. The split result may include first information, which identifies the bucket, and second information, which represents the total access duration and is stored in that bucket.
Step S12 is then executed: an initial position corresponding to the preset quantile is determined based on the number of total access durations and the preset quantile. Since each user has one total access duration, the number of total access durations can be understood as the total number of users. The preset quantile is a preset quantile level expressed as a percentage, for example at least one of 20%, 30%, 40%, 60% and 80%; that is, there may be one or more preset quantiles.
The initial position corresponding to the preset quantile can be determined from the number of total access durations and the preset quantile, and may be their product. For example, if the position is denoted Rank, the number of total access durations is denoted total, and the preset quantile is denoted p, the initial position may be expressed as Rank = p × total.
The access duration quantile at the preset quantile may then be calculated from the initial position and the number of elements in each bucket. The number of elements in a bucket can be understood as the number of total access durations that share the same first information, and the elements in a bucket as the second information of the total access durations whose first information identifies that bucket.
To calculate the access duration quantile, the buckets are first sorted; the target bucket containing the preset quantile is then determined iteratively from the initial position, together with the position of the preset quantile within the target bucket; the element (second information) stored at that position in the target bucket is then determined; and the total access duration corresponding to that second information is the access duration quantile at the preset quantile.
It can be understood that, when a single preset quantile is given, a 2-quantile yields 1 access duration quantile and a 3-quantile yields 2 access duration quantiles; that is, the number of access duration quantiles = the preset quantile - 1. When multiple preset quantiles are given, there are multiple access duration quantiles.
Because massive data cannot be loaded into memory at once to calculate a quantile, two methods are commonly used at present. The first calculates the quantile by constructing a full binary tree index. The second calculates the quantile by external sorting.
Constructing a full binary tree index for fast quantile calculation effectively improves query efficiency, but its drawbacks become evident as the data volume grows. In big data scenarios, although querying a quantile directly through the full binary tree index has low latency, constructing the index over massive data is time-consuming and occupies a huge amount of storage space. Both are current pain points, especially when the data are normally distributed or the queries are relatively concentrated: an index is built and stored over a large number of attribute columns that are queried only a few times, so the resource cost of the index is not justified. The method of calculating quantiles by constructing a full binary tree index therefore suffers from excessive index construction time and an excessive persistent storage footprint.
External sorting proceeds in two stages. Each segment of the input file is first sorted with a suitable internal sorting method; the sorted runs produced in the first stage are then merged with a merging algorithm until only one run remains. The quantile is finally calculated from the sorted result. The drawback is that every query requires many I/O operations and takes a long time, and the latency is especially noticeable when quantiles are queried frequently. In other words, calculating quantiles directly by external sorting suffers from excessive query time.
In contrast to these prior-art quantile calculation methods, in the embodiment of the invention the initial position corresponding to the preset quantile is determined from the number of total access durations and the preset quantile, and the access duration quantile at the preset quantile is calculated from the initial position and the number of elements in each bucket. Because buckets are introduced, calculating the access duration quantile requires neither a full sort of the total access durations nor a full index over them, which reduces the latency of the calculation and improves its efficiency.
Finally, step S13 is executed. The access duration quantiles can be used to partition the users' total access durations and thereby classify the users. The number of user categories may equal the number of access duration quantiles + 1; that is, total access durations on either side of an access duration quantile belong to users of different categories.
Specifically, each total access duration can be compared with the access duration quantiles to determine the category of each user. The category of a user may characterize the user's level of activity with respect to the target object, and there may be multiple levels, e.g., very low, medium, high, very high, etc.
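The comparison in step S13 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name classify_users, the five level names, and the rule that a duration equal to a quantile falls into the lower category are all hypothetical and are not specified by the source.

```python
from bisect import bisect_left

# Hypothetical level names; the source only lists a few levels as examples.
LEVELS = ["very low", "low", "medium", "high", "very high"]


def classify_users(durations, quantile_values, levels=LEVELS):
    """Assign each user a level by comparing the user's total access
    duration with the access duration quantiles (the cut points)."""
    cuts = sorted(quantile_values)
    # k quantiles split the users into k + 1 categories.
    if len(levels) != len(cuts) + 1:
        raise ValueError("need exactly len(quantile_values) + 1 levels")
    # bisect_left counts the cut points strictly below the duration, so a
    # duration equal to a cut point lands in the lower category (assumption).
    return {user: levels[bisect_left(cuts, d)] for user, d in durations.items()}


durations = {"u1": 3, "u2": 15, "u3": 40, "u4": 90, "u5": 200}
result = classify_users(durations, [10, 30, 60, 120])
```

With four quantile values the sketch produces five categories, matching the relation "number of categories = number of access duration quantiles + 1".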
According to the user classification method provided by the embodiment of the invention, each total access duration is stored in buckets by means of the target mask, so that storage space is used more effectively and bucket storage becomes more efficient. Because buckets are introduced, calculating the access duration quantile requires neither a full sort of the total access durations nor a full index over them, which reduces the latency of the calculation and improves its efficiency. Compared with the prior art, in which users are classified by a given access duration threshold, using the access duration quantile takes the relationship among the users' total access durations into account, so that the classification result is more accurate and appropriate service information can subsequently be recommended to each user.
Based on the foregoing embodiment, in the user classification method provided by the embodiment of the present invention, calculating the access duration quantile at the preset quantile based on the initial position and the number of elements in each bucket specifically includes:
sorting the buckets based on the identifier of each bucket to obtain a first sorting result;
starting from the first bucket in the first sorting result, determining, among the buckets, a target bucket and a target position of the preset quantile within the target bucket based on difference information between the current position and the number of elements in the current bucket; the current position is determined based on the difference information corresponding to the position preceding the current position, and the initial value of the current position is the initial position;
sorting the elements in the target bucket to obtain a second sorting result;
and determining the access duration quantile based on the target position and the second sorting result.
Specifically, in the embodiment of the invention, when the access duration quantile is calculated, the buckets may be sorted by their identifiers, i.e., by the first information of the total access durations, in ascending order of identifier, to obtain the first sorting result.
Then, starting from the first bucket in the first sorting result, the target bucket and the target position of the preset quantile within the target bucket can be determined from the difference information between the current position and the number of elements in the current bucket. The difference information may comprise both the magnitude relation between the two and their difference.
It will be understood that the initial value of the current position is the initial position, and that the current position is determined by the difference information corresponding to the preceding position, i.e., the difference information between the preceding position and the number of elements in the bucket preceding the current bucket.
This process is iterative and can be implemented as a loop over the first sorting result, where target_bucket is the target bucket, bucket is the current bucket, and bucket[] is the first sorting result.
After the FOR loop exits through the BREAK at Line 5, the target bucket target_bucket and the target position Rank of the preset quantile in the target bucket are obtained.
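The iteration can be sketched in Python as follows. This is not the patent's own listing. The function name locate_target_bucket, the dictionary of per-bucket element counts, and the choice of a 1-based rank rounded up with math.ceil are assumptions made for illustration.

```python
import math


def locate_target_bucket(bucket_counts, total, p):
    """Return (target_bucket, rank_in_bucket) for the preset quantile p.

    bucket_counts maps each bucket identifier to its element count (cnt).
    """
    # Initial position Rank = p * total; treated as a 1-based rank rounded
    # up, which is an assumption not fixed by the source.
    rank = max(1, math.ceil(p * total))
    for bucket_id in sorted(bucket_counts):  # first sorting result, ascending
        cnt = bucket_counts[bucket_id]
        if rank > cnt:
            rank -= cnt  # the quantile lies in a later bucket
        else:
            return bucket_id, rank  # plays the role of the BREAK exit
    raise ValueError("rank exceeds the total number of elements")


counts = {0: 3, 1: 4, 2: 3}
median_location = locate_target_bucket(counts, total=10, p=0.5)
max_location = locate_target_bucket(counts, total=10, p=1.0)
```

Only the bucket identifiers and their counts are touched here; no element inside any bucket has to be sorted before the target bucket is known.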
The elements in the target bucket may then be sorted to obtain a second sorting result. When sorting, an index may be configured for each element; the second sorting result is the result of sorting the elements by index and may be in ascending order.
The corresponding access duration quantile is then found in the second sorting result according to the target position. The process of configuring the index and finding the access duration quantile can be realized by the following process:
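A minimal Python sketch of this lookup is given below, under several assumptions: the function name quantile_from_bucket is hypothetical, a plain ascending sort stands in for the per-element index, the target position is 1-based, and the total access duration is rebuilt with the division-based split described in this document (index segment = value / mask, content segment = value % mask).

```python
def quantile_from_bucket(index_segment, content_segments, rank, mask):
    """Sort only the target bucket and rebuild the quantile value."""
    ordered = sorted(content_segments)  # second sorting result, ascending
    tail = ordered[rank - 1]            # rank is assumed to be 1-based
    # Inverse of index = value // mask and tail = value % mask.
    return index_segment * mask + tail


# Bucket 1 holds the content segments of the durations 23, 18, 25 and 20
# when mask = 16 (all of them have index segment 1).
quantile = quantile_from_bucket(1, [7, 2, 9, 4], rank=2, mask=16)
```

The second-smallest content segment in the bucket is 4, so the rebuilt total access duration is 1 * 16 + 4.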
In the embodiment of the invention, determining the access duration quantile only requires sorting the buckets and the elements within the target bucket, and an index is configured only for the elements in the target bucket. Neither a full sort nor a full index over all total access durations is needed, which reduces the latency of calculating the access duration quantile, improves its calculation efficiency, and uses storage space more effectively, so that storage resources are not wasted even when the total access durations are normally distributed or relatively concentrated. In addition, the number of I/O operations needed to calculate the access duration quantile is reduced, so the calculation remains fast even when the access duration quantile must be calculated frequently.
Based on the foregoing embodiment, in the user classification method provided by the embodiment of the present invention, storing each total access duration in buckets based on the target mask includes:
dividing each total access duration into an index segment and a content segment based on the target mask;
and storing the content segment of each total access duration into the bucket identified by its index segment by using multithreading.
Specifically, in the embodiment of the present invention, when the total access durations are stored in buckets, each total access duration may first be divided by the target mask into an index segment and a content segment, where the index segment is the first information and the content segment is the second information.
The index segment may be obtained as the quotient of a division, i.e., by the function f_index(value) = value / mask, where value is a total access duration and mask is the target mask.
The content segment may be obtained as the remainder of the division, i.e., by the function f_tail(value) = value % mask.
A bucket may then be created with the index segment of each total access duration as its identifier, and the content segment of that total access duration is stored in the bucket; this may be done with multithreading. Since several different total access durations may share the same index segment, the buckets group the total access durations for storage.
In the embodiment of the invention, the complete procedure for storing the total access durations in buckets may include the following steps:
reading all total access duration values, and processing each value according to steps S21 to S23.
S21, grouping the values into different buckets by using the target mask. The index segment of value is obtained by the function f_index(value) = value / mask, and the content segment by f_tail(value) = value % mask.
S22, if no bucket with key f_index(value) exists, creating a bucket keyed by f_index(value); if it exists, proceeding directly to S23.
S23, storing f_tail(value) into the bucket and incrementing the bucket's element count cnt by one.
S24, after all values have been traversed, traversing all buckets once, storing the sequence as boxes[], and counting the total number of values, i.e., the number of total access durations, total.
These steps can be realized through the following process:
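Steps S21 to S24 can be sketched in Python as follows. The function name store_in_buckets and the bucket layout (a dictionary with "elements" and "cnt" fields) are hypothetical, and boxes is interpreted here as the list of bucket keys in ascending order, which is an assumption about what the source means by storing the sequence.

```python
def store_in_buckets(values, mask):
    """Group total access durations into buckets keyed by index segment."""
    buckets = {}
    for value in values:
        # S21: split the value into index segment and content segment.
        index, tail = value // mask, value % mask
        # S22: create the bucket if its key does not exist yet.
        if index not in buckets:
            buckets[index] = {"elements": [], "cnt": 0}
        # S23: store the content segment and increment the element count.
        buckets[index]["elements"].append(tail)
        buckets[index]["cnt"] += 1
    # S24: traverse the buckets once, record their order, and count the total.
    boxes = sorted(buckets)
    total = sum(bucket["cnt"] for bucket in buckets.values())
    return buckets, boxes, total


buckets, boxes, total = store_in_buckets([5, 17, 18, 35, 3], mask=16)
```

Here the values 5 and 3 share index segment 0, 17 and 18 share index segment 1, and 35 falls into bucket 2, so only the content segments 5, 3, 1, 2 and 3 are stored.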
In the embodiment of the invention, only the content segments of the total access durations are stored in the buckets, which reduces the storage space occupied.
On the basis of the foregoing embodiment, in the user classification method provided by the embodiment of the present invention, dividing each total access duration into an index segment and a content segment based on the target mask includes:
dividing each total access duration into an index segment and a content segment by shift operations and bitwise logical operations, based on the number of non-zero bits of the target mask.
Specifically, in the embodiment of the present invention, each total access duration may also be divided into an index segment and a content segment by shift operations and bitwise logical operations based on the number of non-zero bits of the target mask. The index segment may be determined by head = value >> (64 - digit), where value is a total access duration and digit is the number of non-zero bits in the target mask.
The content segment may likewise be determined by combining shift and bitwise logical operations, borrowing the idea of an IP mask, i.e., by tail = value & (~0L >>> digit).
It will be appreciated that splitting each total access duration into an index segment and a content segment by bit manipulation requires only four operations per total access duration: one subtraction, two shifts, and one AND.
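The bit-manipulation split can be sketched in Python as follows. Python integers are unbounded, so the 64-bit all-ones word ALL_ONES is written out explicitly to play the role of ~0L. The names split_by_bits and join_bits are hypothetical, and the sketch assumes a non-negative value below 2^64 and 0 < digit < 64.

```python
WORD_BITS = 64
ALL_ONES = (1 << WORD_BITS) - 1  # stands in for ~0L on a 64-bit long


def split_by_bits(value, digit):
    """Split a 64-bit value into its top `digit` bits (index segment)
    and its remaining low bits (content segment)."""
    head = value >> (WORD_BITS - digit)  # subtraction + shift
    tail = value & (ALL_ONES >> digit)   # shift + AND
    return head, tail


def join_bits(head, tail, digit):
    """Rebuild the original value from the two segments."""
    return (head << (WORD_BITS - digit)) | tail


value = 0x0123456789ABCDEF
head, tail = split_by_bits(value, digit=8)
```

With digit = 8 the index segment is the top byte of the value and the content segment is the remaining 56 bits.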
By contrast, when a computer calculates x/y, it actually subtracts y from x repeatedly. Let $r_i$ denote the remainder obtained after the i-th operation; then:
if $r_i \ge 0$, the quotient bit is 1, the remainder and quotient are shifted left by 1 bit, and the divisor is subtracted, i.e., $r_{i+1} = 2r_i - y$;
if $r_i < 0$, the quotient bit is 0, the remainder and quotient are shifted left by 1 bit, and the divisor is added, i.e., $r_{i+1} = 2r_i + y$.
A 64-bit division therefore requires 128 addition, subtraction and shift operations, 32 times the four operations needed by the bit-manipulation approach.
Division is thus far more expensive for the computer than bit operations, and splitting each total access duration into an index segment and a content segment by shift operations and logic operations can greatly improve calculation efficiency.
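The operation count quoted above can be checked with a small sketch of textbook non-restoring division. This is an illustration under stated assumptions rather than the procedure of any particular processor: the function name `nonrestoring_divide` is hypothetical, the dividend is brought in one bit per step (the integer form of the recurrence above), and each iteration is counted as one shift plus one addition or subtraction.

```python
def nonrestoring_divide(x, y, bits=64):
    """Divide unsigned x by positive y; return (quotient, remainder, operation_count)."""
    if y <= 0 or x < 0 or x >= (1 << bits):
        raise ValueError("x must fit in `bits` bits and y must be positive")
    r = 0    # partial remainder, may become negative
    q = 0    # quotient built one bit at a time
    ops = 0  # counted as one shift plus one add/subtract per iteration
    for i in range(bits - 1, -1, -1):
        bit = (x >> i) & 1           # next dividend bit, most significant first
        if r >= 0:
            r = 2 * r + bit - y      # shift left, then subtract the divisor
        else:
            r = 2 * r + bit + y      # shift left, then add the divisor
        ops += 2
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += y                       # final correction of a negative remainder
    return q, r, ops
```

With `bits=64` the loop runs 64 times, which gives the 128 operations mentioned in the text, compared with four operations for the shift-and-mask split.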
Based on the foregoing embodiment, in the user classification method provided in the embodiment of the present invention, sorting the elements in the target bucket to obtain a second sorting result includes:
creating an index for each element in the target bucket using multiple threads;
and sorting the elements based on their indexes to obtain the second sorting result.
Specifically, in the embodiment of the present invention, when the elements in the target bucket are sorted to obtain the second sorting result, multiple threads may be used to create an index for each element in the target bucket, and the elements are then sorted according to the created indexes to obtain the second sorting result. In this way, the multithreading capability of the computer operating system is used as fully as possible, and the efficiency of calculating the access duration quantile is improved.
Based on the foregoing embodiment, in the user classification method provided in the embodiment of the present invention, storing each total access duration in buckets based on the target mask includes:
storing each total access duration in buckets using multiple threads, based on the target mask.
Specifically, in the embodiment of the invention, the total access durations may be stored in buckets by multiple synchronized threads, so that the multithreading capability of the computer operating system is used as fully as possible and the efficiency of calculating the access duration quantile is improved.
According to tests, the loading rate for the access duration quantile calculation is directly proportional to the number of threads, and the read time is reduced by (1 - 1/thread count) × 100%.
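Multithreaded bucket storage can be sketched as follows, assuming 64-bit durations and a mask with `digit` non-zero bits. The function names, the chunking strategy and the default of four workers are illustrative assumptions, not details taken from the original text. Each worker fills private buckets that are merged afterwards, so no locking is needed. Note that in CPython, threads mainly help when loading is IO-bound, since pure computation is serialized by the interpreter lock.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

ALL_ONES = (1 << 64) - 1  # unsigned 64-bit all-ones word


def bucket_chunk(chunk, digit):
    """Bucket one chunk of durations into a private dict: head -> list of tails."""
    local = defaultdict(list)
    for value in chunk:
        value &= ALL_ONES
        head = value >> (64 - digit)        # index segment identifies the bucket
        tail = value & (ALL_ONES >> digit)  # only the content segment is stored
        local[head].append(tail)
    return local


def bucket_store(durations, digit, workers=4):
    """Split the durations into chunks, bucket each chunk in a thread, then merge."""
    size = max(1, -(-len(durations) // workers))  # ceiling division
    chunks = [durations[i:i + size] for i in range(0, len(durations), size)]
    buckets = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for local in pool.map(lambda c: bucket_chunk(c, digit), chunks):
            for head, tails in local.items():
                buckets[head].extend(tails)
    return dict(buckets)
```

Merging private per-thread buckets trades a little extra memory for the absence of contention; a shared structure with per-bucket locks would be an equally valid design.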
Based on the foregoing embodiment, in the user classification method provided in the embodiment of the present invention, determining, based on the number of total access durations and the preset quantile, an initial position corresponding to the preset quantile includes:
storing each bucket in memory;
and, if the capacity of the memory is insufficient, selecting from the buckets those whose number of content segments is larger than the number threshold corresponding to the target mask, and writing them to disk.
Specifically, in the embodiment of the invention, after the total access durations are stored in buckets according to the target mask, each bucket can first be stored in memory, which reduces the number of IO operations.
When the capacity of the memory is insufficient, buckets whose number of content segments is larger than the number threshold can be selected and written to disk. The number threshold may be set as needed, but should be neither too large nor too small: if it is too large, the memory may subsequently be underutilized; if it is too small, the number of IO operations may be excessive. According to experiments, the final calculation time of the access duration quantile is reduced by (1 - memory size/content segment number size) × 100%.
The number threshold corresponds to the target mask and is bounded by the maximum capacity of the buckets identified by the index segments after each total access duration is divided by the target mask. It can be adjusted flexibly together with the target mask, so that the number of IO operations is reduced as far as possible and the algorithm efficiency is optimized. In theory, the larger the number threshold, the larger the amount of data per IO batch, the fewer the IO operations, and the higher the overall efficiency of the algorithm; the number threshold is at most equal to the maximum capacity of each bucket under the target mask. In practice, however, the number threshold cannot be increased without limit, because memory is limited.
In the embodiment of the invention, the memory and the disk are used together as the storage space, so that the storage space is fully utilized and the number of IO operations is reduced.
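The combined use of memory and disk can be sketched as follows. This is a simplified illustration under several assumptions that are not in the original text: memory capacity is expressed as a number of content segments, each content segment is written as an unsigned 64-bit little-endian integer, and the helper names and file naming scheme are hypothetical.

```python
import os
import struct
import tempfile


def spill_large_buckets(buckets, memory_capacity, count_threshold, spill_dir):
    """Write buckets holding more than count_threshold elements to disk
    when the in-memory element count exceeds memory_capacity.
    Returns a dict mapping each spilled bucket id to its file path."""
    spilled = {}
    in_memory = sum(len(tails) for tails in buckets.values())
    if in_memory <= memory_capacity:
        return spilled                       # everything still fits in memory
    for head in sorted(buckets):
        tails = buckets[head]
        if len(tails) > count_threshold:
            path = os.path.join(spill_dir, "bucket_%d.bin" % head)
            with open(path, "ab") as f:      # append, so a bucket can be spilled repeatedly
                f.write(struct.pack("<%dQ" % len(tails), *tails))
            spilled[head] = path
            tails.clear()                    # release the in-memory copy
    return spilled


def read_bucket(path):
    """Load every content segment previously spilled for one bucket."""
    with open(path, "rb") as f:
        data = f.read()
    return list(struct.unpack("<%dQ" % (len(data) // 8), data))
```

Writing a whole bucket in a single call keeps each spill to one batched IO, which matches the motivation given above for choosing a reasonably large number threshold.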
On the basis of the above embodiment, the user classification method provided in the embodiment of the present invention, the target mask is determined based on the following method:
determining an optimal mask range according to the number of total access durations and the size of the content segment of each total access duration;
and determining any mask in the optimal mask range as the target mask.
Specifically, in the embodiment of the present invention, the target mask may be selected at random from an optimal mask range, where the optimal mask range may be determined according to the number of total access durations in the actual deployment scenario and the size of the content segment of each total access duration.
In the embodiment of the present invention, after the optimal mask range is determined, an optimal number threshold may be determined for each mask in the range, so as to find the optimal configuration [target mask, number threshold]; the number threshold corresponding to the target mask is the one determined in this way. The number of IO operations can thus be minimized, further improving the efficiency of calculating the access duration quantile.
On the basis of the above embodiment, the method for calculating the access duration quantile in the embodiment of the present invention can be compared with the external sorting method and the full-index method of the prior art. The IO count of a single query under the external sorting method is at least n times that of the other methods, and n is very large in big-data scenarios, so external sorting is the least efficient way to calculate access duration quantiles. With the calculation method provided by the embodiment of the invention, the IO count for index construction, n/m, is 1/log n of that of the full-index method; because n is far greater than 2, the index construction process in the embodiment of the invention requires fewer IO operations than the full-index method. Meanwhile, the IO count of a single query is 1, which is less than or equal to that of the full-index method. The results are shown in Table 1. In view of the above, the calculation method in the embodiment of the invention has the best time performance.
TABLE 1 Comparison of IO counts
Method                               Index construction process    Single quantile query
External sorting method              0                             n*log n/m
Full-index method                    n*log n/m                     log n/m > 1 ? log n/m : 1
Calculation method in the invention  n/m                           1
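The entries of Table 1 can be evaluated for concrete values with the short sketch below. It rests on assumptions that the text does not state: the logarithm is taken to base 2, and n*log n/m is read as (n × log n)/m, which is consistent with the statement that n/m is 1/log n of the full-index construction cost. The function name `io_counts` is illustrative.

```python
import math


def io_counts(n, m):
    """Return {method: (index_construction_io, single_query_io)} following Table 1."""
    log_n = math.log2(n)  # base 2 is an assumption; the text does not give the base
    return {
        "external_sort": (0, n * log_n / m),
        "full_index": (n * log_n / m, max(log_n / m, 1)),
        "proposed": (n / m, 1),
    }
```

For n = 1024 and m = 16, the full-index method needs 640 IO operations to build its index, whereas the bucket-based method needs 64, a factor of log n = 10.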
On the basis of the foregoing embodiments, the user classification method provided in the embodiment of the present invention further includes, after classifying each user according to the total access duration and the access duration quantile:
recommending service information for each user based on the classification result corresponding to each user.
Specifically, in the embodiment of the invention, after the users are classified, service information can be recommended to each user according to the corresponding classification result, that is, the category to which the user belongs. Recommending suitable service information to users in different categories makes the recommendation targeted and reasonable, improves the user experience, and increases user stickiness.
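One simple way to classify users against access duration quantiles is sketched below. The cut points, the category labels and the function name `classify_users` are hypothetical; the text does not specify how many quantiles or categories are used.

```python
from bisect import bisect_right


def classify_users(durations, cut_points, labels):
    """Assign each user a label according to where the total access duration
    falls among the sorted quantile cut points.
    len(labels) must equal len(cut_points) + 1."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return {
        user: labels[bisect_right(cut_points, duration)]
        for user, duration in durations.items()
    }
```

For instance, with cut points `[10, 100]` and labels `["light", "regular", "heavy"]`, a user with a total access duration of 50 falls into the `"regular"` category, and a recommendation step could then look up service information per category.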
As shown in fig. 2, on the basis of the above embodiment, in an embodiment of the present invention, there is provided a user classification device, including:
the bucket storage module 21 is configured to obtain the total access duration of each user for the target object, and to store each total access duration in buckets based on the target mask;
a quantile calculation module 22, configured to determine an initial position corresponding to a preset quantile based on a total number of access durations and the preset quantile, and calculate an access duration quantile at the preset quantile based on the initial position and the number of elements in each bucket;
and the classification module 23 is configured to classify each user based on the total access duration and the access duration quantile.
On the basis of the foregoing embodiments, the user classification device provided in the embodiment of the present invention, the quantile calculation module is specifically configured to:
sorting the buckets based on the identification of each bucket to obtain a first sorting result;
starting from the first bucket in the first sorting result, determining a target bucket among the buckets and a target position of the preset quantile in the target bucket, based on difference information between the current position and the number of elements in the current bucket; the current position is determined based on the difference information corresponding to the position before the current position, and the initial value of the current position is the initial position;
sorting the elements in the target bucket to obtain a second sorting result;
and determining the access duration quantile based on the target position and the second sorting result.
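The steps performed by the quantile calculation module can be sketched in Python as follows. The sketch assumes that buckets are held as a dict from index segment to a list of content segments, that durations are 64-bit values, and that the initial position follows the nearest-rank rule ceil(q × total); the rank rule and the function name `bucket_quantile` are assumptions made for illustration.

```python
import math


def bucket_quantile(buckets, q, digit):
    """Return the value at quantile q (0 < q <= 1) from bucketed 64-bit durations."""
    total = sum(len(tails) for tails in buckets.values())
    if total == 0:
        raise ValueError("no access durations stored")
    position = max(math.ceil(q * total), 1) - 1  # initial position, zero-based
    for head in sorted(buckets):                 # first sorting result: buckets by id
        count = len(buckets[head])
        if position < count:                     # the target bucket has been reached
            tail = sorted(buckets[head])[position]  # second sorting result
            return (head << (64 - digit)) | tail    # rebuild the full duration
        position -= count                        # carry the difference to the next bucket
    raise RuntimeError("position exceeds the number of stored elements")
```

Only the target bucket is sorted, so the cost of the second sort depends on the size of a single bucket rather than on the total number of access durations.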
Based on the foregoing embodiments, the user classification device provided in the embodiments of the present invention, the bucket storage module is specifically configured to:
dividing each total access time length into index segments and content segments based on the target mask;
and storing the content segments of each total access duration into a bucket identified by the index segment by adopting multithreading.
On the basis of the foregoing embodiments, in the user classification device provided in the embodiment of the present invention, the bucket storage module is further specifically configured to:
divide each total access duration into an index segment and a content segment by shift operations and logic operations, based on the number of non-zero bits of the target mask.
On the basis of the foregoing embodiment, the user classification device provided in the embodiment of the present invention further includes a storage module, configured to:
storing each bucket in memory;
and, if the capacity of the memory is insufficient, selecting from the buckets those whose number of content segments is larger than the number threshold corresponding to the target mask, and writing them to disk.
On the basis of the foregoing embodiment, the user classification device provided in the embodiment of the present invention further includes a mask determining module, configured to:
determining an optimal mask range according to the total access time length number and the size of the content segments of each total access time length;
and taking any mask in the optimal mask range as the target mask.
On the basis of the foregoing embodiment, the user classification device provided in the embodiment of the present invention further includes a recommendation module, configured to:
and recommending service information for each user based on the classification result corresponding to each user.
Specifically, the functions of each module in the user classification device provided in the embodiment of the present invention are in one-to-one correspondence with the operation flows of each step in the method embodiment, and the achieved effects are consistent.
Fig. 3 illustrates a physical schematic diagram of an electronic device, as shown in fig. 3, where the electronic device may include: processor (Processor) 310, communication interface (Communications Interface) 320, memory (Memory) 330 and communication bus 340, wherein Processor 310, communication interface 320, memory 330 accomplish communication with each other through communication bus 340. Processor 310 may invoke logic instructions in memory 330 to perform the user classification method provided in the embodiments described above, including: acquiring the total access time length of each user to the target object, and storing each total access time length in a barrel based on the target mask; determining an initial position corresponding to a preset quantile based on the total access time length number and the preset quantile, and calculating the access time length quantile at the preset quantile based on the initial position and the element number in each barrel; and classifying each user based on the total access time length and the access time length quantile.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing the user classification method provided by the methods described above, the method comprising: acquiring the total access time length of each user to the target object, and storing each total access time length in a barrel based on the target mask; determining an initial position corresponding to a preset quantile based on the total access time length number and the preset quantile, and calculating the access time length quantile at the preset quantile based on the initial position and the element number in each barrel; and classifying each user based on the total access time length and the access time length quantile.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the user classification method provided by the methods above, the method comprising: acquiring the total access time length of each user to the target object, and storing each total access time length in a barrel based on the target mask; determining an initial position corresponding to a preset quantile based on the total access time length number and the preset quantile, and calculating the access time length quantile at the preset quantile based on the initial position and the element number in each barrel; and classifying each user based on the total access time length and the access time length quantile.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of classifying users, comprising:
acquiring the total access duration of each user for a target object, and storing each total access duration in buckets based on a target mask;
determining an initial position corresponding to a preset quantile based on the number of total access durations and the preset quantile, and calculating the access duration quantile at the preset quantile based on the initial position and the number of elements in each bucket;
and classifying each user based on the total access time length and the access time length quantile.
2. The method for classifying users according to claim 1, wherein the calculating the access duration quantile at the preset quantile based on the initial position and the number of elements in each bucket specifically comprises:
sorting the buckets based on the identification of each bucket to obtain a first sorting result;
starting from the first bucket in the first sorting result, determining a target bucket among the buckets and a target position of the preset quantile in the target bucket, based on difference information between the current position and the number of elements in the current bucket; the current position is determined based on the difference information corresponding to the position before the current position, and the initial value of the current position is the initial position;
sorting the elements in the target bucket to obtain a second sorting result;
and determining the access duration quantile based on the target position and the second sorting result.
3. The method for classifying users according to claim 1, wherein the storing each total access duration in buckets based on the target mask comprises:
dividing each total access time length into index segments and content segments based on the target mask;
and storing the content segments of each total access duration into a bucket identified by the index segment by adopting multithreading.
4. The user classification method according to claim 3, wherein the dividing each total access duration into an index segment and a content segment based on the target mask comprises:
dividing each total access duration into an index segment and a content segment by shift operations and logic operations, based on the number of non-zero bits of the target mask.
5. The method for classifying users according to any one of claims 1 to 4, wherein determining an initial position corresponding to the preset quantile based on the number of total access durations and the preset quantile includes:
storing each bucket in memory;
and, if the capacity of the memory is insufficient, selecting from the buckets those whose number of content segments is larger than the number threshold corresponding to the target mask, and writing them to disk.
6. The user classification method according to any of claims 1-4, characterized in that the target mask is determined based on the following method:
determining an optimal mask range according to the total access time length number and the size of the content segments of each total access time length;
and taking any mask in the optimal mask range as the target mask.
7. The method of any one of claims 1-4, further comprising, after classifying each user based on the total access duration and the access duration quantile:
and recommending service information for each user based on the classification result corresponding to each user.
8. A user classification apparatus, comprising:
a bucket storage module, configured to acquire the total access duration of each user for the target object and to store each total access duration in buckets based on the target mask;
a quantile calculation module, configured to determine an initial position corresponding to a preset quantile based on the number of total access durations and the preset quantile, and to calculate the access duration quantile at the preset quantile based on the initial position and the number of elements in each bucket;
and a classification module, configured to classify each user based on the total access duration and the access duration quantile.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the user classification method of any of claims 1 to 7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the user classification method according to any one of claims 1 to 7.
CN202210471141.6A 2022-04-28 2022-04-28 User classification method, device, electronic equipment and storage medium Pending CN117009629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210471141.6A CN117009629A (en) 2022-04-28 2022-04-28 User classification method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210471141.6A CN117009629A (en) 2022-04-28 2022-04-28 User classification method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117009629A true CN117009629A (en) 2023-11-07

Family

ID=88569736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210471141.6A Pending CN117009629A (en) 2022-04-28 2022-04-28 User classification method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117009629A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination