CN117435756B - Data processing method for inquiring user retention based on bitmap - Google Patents
- Publication number
- CN117435756B (application CN202311737605.4A)
- Authority
- CN
- China
- Prior art keywords
- bitmap
- user
- retention
- time period
- bitmaps
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Library & Information Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a data processing method for querying user retention based on bitmaps. User ids are encoded as integers and converted into bitmaps to generate a daily active user bitmap table that stores each day's active users. When a user specifies a custom time range, retention period, and retention granularity, bitmaps are fetched from the daily active user bitmap table according to that information to obtain the number of retained users and the user retention rate. The invention overcomes the inability of the pre-aggregation approach of existing scheduling systems to support custom retention periods and retention granularities, and the use of bitmap storage greatly reduces data storage space and improves query efficiency.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to a data processing method for querying user retention based on bitmaps.
Background
With the rapid development of modern information technology and the growing maturity and adoption of big data technology, more and more industries and companies use big data to process and analyze their own data, for example to analyze the state of business development and the user life cycle. As the growth dividend of the Internet population weakens, users become harder to acquire and competition intensifies, so retaining users matters more than acquiring them. A user who uses a product in a given period and is still using it after some time is called a retained user, and the proportion of retained users to newly added users is the user retention rate.
User retention of a service or application is therefore an important indicator of whether it can develop sustainably.
The common existing way of calculating user retention is as follows: the retention granularity is preset to day, the retention periods are preset to, for example, the next day, three days, seven days, and thirty days, and a scheduling system runs at a fixed time every day to calculate user retention for each preset period. For example, to calculate next-day retained users or the next-day retention rate, the active user lists of the first day and the second day are obtained, and the user ids of the second day are joined with the user ids of the first day. The first-day user ids that can be joined to the second day are the next-day retained users; their count is the number of retained users, and the number of retained users divided by the number of first-day users is the user retention rate. The same applies when the retention granularity is week, month, year, and so on. This conventional calculation cannot return retention data for a custom retention period and retention granularity: the data model and logic must be developed in advance for each preset retention period, and additional development is needed whenever a new retention period or retention granularity is introduced. Query efficiency is also low, so business requests are answered slowly.
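The next-day calculation described above can be sketched with plain sets. This is only an illustration of the conventional join-based approach; the user ids and variable names are hypothetical sample data, not taken from the patent.

```python
# Hypothetical sample data: user ids active on two consecutive days.
day1_active = {"u1", "u2", "u3", "u4"}
day2_active = {"u2", "u4", "u5"}

# First-day users who can be matched on the second day are the next-day retained users.
retained_users = day1_active & day2_active
retained_count = len(retained_users)

# Retention rate = retained users / first-day active users.
retention_rate = retained_count / len(day1_active)

print(sorted(retained_users), retained_count, retention_rate)
```

Each new retention period or granularity needs its own pair of lists and its own join, which is the limitation the background section points out.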
Therefore, the invention provides a data processing method for querying user retention based on bitmaps, so as to solve at least some of the above technical problems.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a data processing method for querying user retention based on bitmaps that is free from the limitations of a pre-aggregation model and can calculate user retention for a custom retention period and retention granularity.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
A data processing method for querying user retention based on bitmaps comprises the following steps:
step 1, creating a daily active user table with day as the dimension, and writing the user ids that are active each day into the daily active user table under the corresponding day;
step 2, encoding each user id as a unique integer, and converting the integer into a bitmap;
step 3, creating a daily active user bitmap table with day as the dimension, merging the bitmaps of all user ids active on each day into one first bitmap, and writing the first bitmap into the daily active user bitmap table under the corresponding day;
step 4, determining the time to be calculated according to the set query time, retention granularity, and retention period, and extracting from the daily active user bitmap table the days within the time to be calculated and their first bitmaps to obtain a first data set;
step 5, dividing the time to be calculated into time periods according to the set retention granularity and retention period, merging all first bitmaps within each time period into one second bitmap, and creating a second data set containing each time period and its second bitmap;
step 6, appending, after the second bitmap of each time period in the second data set, the second bitmap of the next time period to obtain a third data set;
step 7, calculating the intersection of the second bitmap of each time period and the second bitmap of the next adjacent time period to obtain a user retention bitmap, and writing the user retention bitmap into the third data set to obtain a fourth data set;
step 8, obtaining the number of active users in each time period and the number of users retained in the next time period from the second bitmap and the user retention bitmap of each time period, and obtaining the user retention rate from the number of active users and the number of retained users.
Further, the method also comprises constructing user-defined functions, including a first function that converts an integer into a bitmap, a second function that merges multiple bitmaps into one bitmap, a third function that counts the elements of a bitmap, and a fourth function that calculates the intersection of two bitmaps.
Further, in step 2, the first function is used to convert the integer into a bitmap; in step 3, the second function is used to merge the bitmaps of all user ids active on each day into one first bitmap; in step 5, the second function is used to merge all first bitmaps within each time period into one second bitmap; in step 7, the fourth function is used to calculate the intersection of the second bitmap of each time period and the second bitmap of the next adjacent time period; in step 8, the third function is used to count the active users in the second bitmap and the retained users in the user retention bitmap of each time period.
Further, in step 2, an efficient compressed bitmap is used to convert the integer into a bitmap.
Further, in step 2, the integer is an auto-incrementing integer starting from 1.
Further, step 2 includes constructing a user dictionary table, establishing a one-to-one mapping between user ids and dictionary ids in the user dictionary table, and encoding the user ids through the mapping, wherein each dictionary id is a unique integer.
Further, in step 4, the time to be calculated is the query time plus one retention period after the end point of the query time.
Further, step 6 includes: step 61, sorting the second data set by time and adding two columns of auto-incrementing sequence numbers after the second bitmap of each time period, the first column being sequence number 1, which starts from 1, and the second column being sequence number 2, which starts from 0; step 62, extracting the pairs of second bitmaps for which sequence number 1 of one row equals sequence number 2 of another row, and appending the two extracted second bitmaps in order to the time period corresponding to sequence number 1 to obtain a third data set.
Further, step 8 includes: step 81, obtaining the number of active users in the current time period from the second bitmap of the current time period; step 82, obtaining the number of users retained in the next time period from the user retention bitmap; step 83, dividing the number of users retained in the next time period by the number of active users in the current time period to obtain the corresponding user retention rate.
Further, the retention granularity is a date dimension of day, week, month, or year.
Compared with the prior art, the invention has the following beneficial effects:
By converting user ids into integers and then into bitmaps, the invention generates a daily active user bitmap table that stores each day's active users. When the user specifies a custom time range, retention period, and retention granularity, bitmaps are fetched from the daily active user bitmap table according to that information, and their union, intersection, and cardinality are computed to obtain the number of retained users and the user retention rate. The invention overcomes the inability of the pre-aggregation approach of existing scheduling systems to support custom retention periods and retention granularities, and the use of bitmap storage greatly reduces data storage space and improves query efficiency.
The invention adopts bitmap storage, which occupies little space, and computing retention through bitmap intersection improves the efficiency of user retention queries; encoding the user id supports bitmap calculation for non-integer user ids; and retained users and user retention rates are calculated quickly in response to business needs.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
Term interpretation:
Hive is a data warehouse tool based on Hadoop.
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. Furthermore, the terms "first," "second," "third," "fourth," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
As shown in FIG. 1, the data processing method for querying user retention based on bitmaps provided by the invention comprises the following steps:
step 1, creating a daily active user table with day as the dimension, and writing the user ids that are active each day into the daily active user table under the corresponding day;
step 2, encoding each user id as a unique integer, and converting the integer into a bitmap;
step 3, creating a daily active user bitmap table with day as the dimension, merging the bitmaps of all user ids active on each day into one first bitmap, and writing the first bitmap into the daily active user bitmap table under the corresponding day;
step 4, determining the time to be calculated according to the set query time, retention granularity, and retention period, and extracting from the daily active user bitmap table the days within the time to be calculated and their first bitmaps to obtain a first data set;
step 5, dividing the time to be calculated into time periods according to the set retention granularity and retention period, merging all first bitmaps within each time period into one second bitmap, and creating a second data set containing each time period and its second bitmap;
step 6, appending, after the second bitmap of each time period in the second data set, the second bitmap of the next time period to obtain a third data set;
step 7, calculating the intersection of the second bitmap of each time period and the second bitmap of the next adjacent time period to obtain a user retention bitmap, and writing the user retention bitmap into the third data set to obtain a fourth data set;
step 8, obtaining the number of active users in each time period and the number of users retained in the next time period from the second bitmap and the user retention bitmap of each time period, and obtaining the user retention rate from the number of active users and the number of retained users.
The invention also constructs user-defined functions (UDFs) based on Hive and queries data in the Hive database through them. When the bitmap-related Hive UDFs are written, the efficient compressed bitmap RoaringBitmap (RBM) is used to convert integers into bitmaps. RBM splits a 32-bit integer (int) into its high 16 bits and low 16 bits (two shorts). The values of the high 16 bits are stored as an ordered array of 16-bit integers, and the low 16 bits are stored in one of three container types depending on the situation; here the bitmap container is chosen for the low 16 bits. When a 32-bit integer is stored into the RBM, its high 16 bits first determine which bucket the number belongs to. Once the bucket is determined, the low 16 bits of the number are put into the container of that bucket.
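The high/low split described above can be illustrated with a few lines of Python. This is a simplified sketch of the bucketing idea only, not the RoaringBitmap library itself: the function names `split_32bit` and `build_buckets` are hypothetical, and a plain Python set stands in for the real container types.

```python
from collections import defaultdict

def split_32bit(value):
    """Split an unsigned 32-bit integer into its high and low 16-bit halves."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 unsigned bits")
    return value >> 16, value & 0xFFFF

def build_buckets(values):
    """Group the low 16 bits of each value under its high-16-bit bucket key."""
    buckets = defaultdict(set)  # a set stands in for an RBM container here
    for value in values:
        high, low = split_32bit(value)
        buckets[high].add(low)
    # Keep bucket keys in ascending order, mirroring the ordered high-16-bit keys.
    return dict(sorted(buckets.items()))

buckets = build_buckets([1, 2, 65536, 65537, 131075])
print(buckets)
```

Values 1 and 2 share bucket 0, 65536 and 65537 share bucket 1, and 131075 falls into bucket 2 with a low part of 3.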
The user-defined functions include a first function (to_bitmap), a second function (bitmap_unit), a third function (bitmap_count), and a fourth function (bitmap_and). The first function (to_bitmap) inherits Hive's UDF class and receives one integer parameter; it first converts the incoming integer into a binary string, then traverses each bit of the binary string and adds it as an integer to the bitmap list, finally producing a bitmap that contains the value of each binary bit of the integer. The second function (bitmap_unit) inherits Hive's UDAF class, receives multiple bitmap values of the same field, merges them into one bitmap, and returns the union. The third function (bitmap_count) inherits Hive's UDF class, receives one bitmap parameter, and returns the number of elements contained in the bitmap. The fourth function (bitmap_and) inherits Hive's UDF class, receives two bitmap parameters, intersects the two bitmaps, and returns the intersection.
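The roles of the four functions can be sketched in Python. The function names mirror those in the text, but this is not the Hive UDF code: it assumes that a bitmap marks dictionary id n by setting bit n, and it uses an arbitrary-precision Python integer as an uncompressed stand-in for RoaringBitmap.

```python
from functools import reduce

def to_bitmap(n):
    """Return a bitmap in which only bit n is set (assumed encoding of id n)."""
    if n < 0:
        raise ValueError("id must be non-negative")
    return 1 << n

def bitmap_unit(bitmaps):
    """Merge any number of bitmaps into one by taking their union."""
    return reduce(lambda a, b: a | b, bitmaps, 0)

def bitmap_count(bitmap):
    """Return the number of ids contained in the bitmap (its cardinality)."""
    return bin(bitmap).count("1")

def bitmap_and(left, right):
    """Return the intersection of two bitmaps."""
    return left & right

day1 = bitmap_unit(to_bitmap(i) for i in (1, 2, 3))
day2 = bitmap_unit(to_bitmap(i) for i in (2, 3, 4))
print(bitmap_count(day1), bitmap_count(bitmap_and(day1, day2)))
```

With these four primitives, retention reduces to one union per period, one intersection per pair of periods, and two counts.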
In step 1, a daily active user table with day as the dimension is created, and the user ids found in each day's business data (such as event-tracking data), namely the user ids active on that day, are written into the daily active user table under the corresponding day. The daily active user table also includes the week, month, and year to which each day belongs.
In step 2, each user id is encoded as a unique integer, and the integer is converted into a bitmap using the first function (to_bitmap). The integer is an auto-incrementing integer starting from 1. Preferably, a user dictionary table is constructed to associate user ids with integers: a one-to-one mapping between user ids and dictionary ids is established in the user dictionary table, the user ids are encoded through this mapping, and each dictionary id is a unique integer. When a user id already exists in the dictionary table, it is skipped; when it does not exist, the user id is stored in the dictionary table and a dictionary id is generated, equal to the dictionary id of the most recently stored user id plus 1.
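The dictionary encoding of step 2 can be sketched as follows. This is a minimal in-memory illustration; the class name `UserDictionary` and its method are hypothetical, and a real deployment would keep the mapping in a table rather than a Python dict.

```python
class UserDictionary:
    """One-to-one mapping from user ids to auto-incrementing dictionary ids."""

    def __init__(self):
        self._ids = {}

    def encode(self, user_id):
        # Existing user ids are skipped; new ones get the last dictionary id plus 1,
        # so the first user id stored receives dictionary id 1.
        if user_id not in self._ids:
            self._ids[user_id] = len(self._ids) + 1
        return self._ids[user_id]

dictionary = UserDictionary()
encoded = [dictionary.encode(u) for u in ["alice", "bob", "alice", "carol"]]
print(encoded)
```

Because the ids are dense and start at 1, the resulting bitmaps stay compact even when the original user ids are long strings.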
In step 3, a daily active user bitmap table with day as the dimension is created, the bitmaps of all user ids active on each day are merged into one first bitmap using the second function (bitmap_unit), and the first bitmap is written into the daily active user bitmap table under the corresponding day. The daily active user bitmap table also includes the week, month, and year to which each day belongs.
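Step 3 can be sketched as a per-day union. The sample activity rows, the dictionary ids, and the function name `build_daily_bitmaps` are hypothetical, and a Python integer again stands in for a compressed bitmap, with bit n marking dictionary id n.

```python
from collections import defaultdict

def build_daily_bitmaps(activity, dict_ids):
    """Merge the bitmaps of all users active on each day into one first bitmap."""
    daily = defaultdict(int)
    for day, user_id in activity:
        # Setting the same bit twice is harmless, so duplicate rows are absorbed.
        daily[day] |= 1 << dict_ids[user_id]
    return dict(daily)

dict_ids = {"alice": 1, "bob": 2, "carol": 3}
activity = [
    ("2023-04-01", "alice"),
    ("2023-04-01", "bob"),
    ("2023-04-01", "alice"),
    ("2023-04-02", "bob"),
    ("2023-04-02", "carol"),
]
daily_bitmaps = build_daily_bitmaps(activity, dict_ids)
print({day: bin(bm) for day, bm in daily_bitmaps.items()})
```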
In step 4, the time to be calculated is determined according to the set query time (from its starting point to its end point), the retention granularity, and the retention period, and the days within the time to be calculated and their first bitmaps are extracted from the daily active user bitmap table to obtain a first data set. The time to be calculated is the query time plus one retention period, measured in units of the retention granularity, after the end point of the query time.
User retention is the retention of the users of a given time period in a subsequent period of the same granularity. The retention granularity and the retention period are the basic parameters that a user must provide to define the retention data to be queried. The retention granularity is a date dimension of day, week, month, or year, and the retention period is a number, such as 1, 3, 7, or 30, that is specified after the retention granularity has been selected. For example, setting the retention granularity to week and the retention period to 3 means querying, at weekly granularity, the retention of users in the first week after three weeks, that is, the week three weeks later.
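The extension of the query range in step 4 can be sketched with Python's standard `datetime` module. The function names and the rule of clamping to the last day of a shorter month are assumptions made for this illustration; the patent only states that one retention period is added after the end point.

```python
import calendar
from datetime import date, timedelta

def add_periods(day, granularity, count):
    """Shift a date forward by `count` units of the retention granularity."""
    if granularity == "day":
        return day + timedelta(days=count)
    if granularity == "week":
        return day + timedelta(weeks=count)
    if granularity == "month":
        year, month0 = divmod(day.year * 12 + (day.month - 1) + count, 12)
        last_day = calendar.monthrange(year, month0 + 1)[1]
        return date(year, month0 + 1, min(day.day, last_day))
    if granularity == "year":
        last_day = calendar.monthrange(day.year + count, day.month)[1]
        return date(day.year + count, day.month, min(day.day, last_day))
    raise ValueError(f"unknown granularity: {granularity}")

def time_to_calculate(start, end, granularity, retention_period):
    """Query time plus one retention period after its end point."""
    return start, add_periods(end, granularity, retention_period)

calc_range = time_to_calculate(date(2023, 4, 1), date(2023, 8, 1), "month", 1)
print(calc_range)
```

For a query time of 2023-04-01 to 2023-08-01 with monthly granularity and a retention period of 1, the range extends to 2023-09-01.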
In step 5, the time to be calculated is divided into time periods according to the set retention granularity (day, week, month, year, and so on), each period corresponding to one day, week, month, or year; all first bitmaps within each time period are merged into one second bitmap using the second function (bitmap_unit), and a second data set containing each time period and its second bitmap is created.
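Step 5 amounts to grouping the daily bitmaps by period and taking a union per group. In this sketch the period labels (for example `2023-04` for a month) and the use of ISO week numbers are assumptions of the illustration, as are the function names and sample bitmaps.

```python
from datetime import date

def period_key(day, granularity):
    """Label the period a day belongs to under the chosen granularity."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if granularity == "month":
        return f"{day.year}-{day.month:02d}"
    if granularity == "year":
        return str(day.year)
    raise ValueError(f"unknown granularity: {granularity}")

def merge_by_period(daily_bitmaps, granularity):
    """Union all first bitmaps of each period into one second bitmap."""
    merged = {}
    for day in sorted(daily_bitmaps):
        key = period_key(day, granularity)
        merged[key] = merged.get(key, 0) | daily_bitmaps[day]
    return merged

daily_bitmaps = {
    date(2023, 4, 1): 0b0110,
    date(2023, 4, 20): 0b1000,
    date(2023, 5, 3): 0b0100,
}
second_dataset = merge_by_period(daily_bitmaps, "month")
print(second_dataset)
```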
In step 6, the second bitmap of the next adjacent time period is appended after the second bitmap of each time period in the second data set to obtain a third data set. Preferably, step 6 includes: step 61, sorting the second data set by time and adding two columns of auto-incrementing sequence numbers after the second bitmap of each time period, the first column being sequence number 1, which starts from 1, and the second column being sequence number 2, which starts from 0; step 62, extracting the pairs of second bitmaps for which sequence number 1 of one row equals sequence number 2 of another row, and appending the two extracted second bitmaps in order to the time period corresponding to sequence number 1 to obtain a third data set.
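The two sequence-number columns of steps 61 and 62 act as the keys of a self-join that lines each period up with the one after it. The sketch below assumes inner-join behaviour, so the last period, which has no successor, yields no row; the function name and sample bitmaps are hypothetical.

```python
def pair_with_next(second_dataset):
    """Attach to each period the second bitmap of the next period.

    `second_dataset` is a list of (period, bitmap) tuples sorted by time.
    """
    # Step 61: sequence number 1 starts from 1, sequence number 2 starts from 0.
    indexed = [
        (period, bitmap, position + 1, position)
        for position, (period, bitmap) in enumerate(second_dataset)
    ]
    # Step 62: match rows where one row's sequence number 1 equals
    # another row's sequence number 2; the matched row is the next period.
    by_index2 = {index2: bitmap for _, bitmap, _, index2 in indexed}
    return [
        (period, bitmap, by_index2[index1])
        for period, bitmap, index1, _ in indexed
        if index1 in by_index2
    ]

second_dataset = [("2023-04", 0b0111), ("2023-05", 0b0110), ("2023-06", 0b1100)]
third_dataset = pair_with_next(second_dataset)
print(third_dataset)
```

Dropping the last period is harmless here because step 4 already extended the range by one retention period for exactly this purpose.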
In step 7, the fourth function (bitmap_and) is used to calculate the intersection of the second bitmap of each time period and the second bitmap of the next adjacent time period, giving a user retention bitmap, and the user retention bitmap is written into the third data set to obtain a fourth data set.
In step 8, the third function is applied to the second bitmap and the user retention bitmap of each time period to obtain the number of active users in each time period and the number of users retained in the next time period, and the user retention rate is obtained from the number of active users and the number of retained users. Preferably, step 8 includes: step 81, obtaining the number of active users in the current time period from the second bitmap of the current time period; step 82, obtaining the number of users retained in the next time period from the user retention bitmap; step 83, dividing the number of users retained in the next time period by the number of active users in the current time period to obtain the user retention rate of the current time period.
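Steps 7 and 8 together can be sketched as one pass over the third data set. The result keys `active_user`, `retain_user`, and `rate_retain_user` follow the column names used in the embodiment; the function name, the sample bitmaps, and the choice to report a rate of 0.0 for a period with no active users are assumptions of this illustration.

```python
def retention_report(third_dataset):
    """Compute active users, retained users, and retention rate per period.

    `third_dataset` is a list of (period, current_bitmap, next_bitmap) tuples.
    """
    report = []
    for period, current_bitmap, next_bitmap in third_dataset:
        # Step 7: the intersection is the user retention bitmap.
        retain_bitmap = current_bitmap & next_bitmap
        # Steps 81 and 82: count the ids in each bitmap.
        active_user = bin(current_bitmap).count("1")
        retain_user = bin(retain_bitmap).count("1")
        # Step 83: retained users divided by active users of the current period.
        rate = retain_user / active_user if active_user else 0.0
        report.append({
            "period": period,
            "active_user": active_user,
            "retain_user": retain_user,
            "rate_retain_user": rate,
        })
    return report

third_dataset = [("2023-04", 0b0111, 0b0110), ("2023-05", 0b0110, 0b1100)]
report = retention_report(third_dataset)
for row in report:
    print(row)
```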
In one embodiment of the invention, the query time ranges from 2023-04-01 to 2023-08-01, the retention period is 1, and the retention granularity is month; that is, the number of retained users and the user retention rate are queried for each month.
A dictionary table is constructed as shown in Table 1, where userid represents the user id and bicid represents the dictionary id.
TABLE 1
The integer corresponding to each userid is converted into a bitmap.
As shown in table 2, df_d, df_w, and df_y represent the dimensions of day, week, month, and year, respectively.
TABLE 2
A daily active user bitmap table is created, as shown in Table 3, where user_bitma represents the bitmap corresponding to userid.
TABLE 3
Given the query time 2023-04-01 to 2023-08-01, a retention period of 1, and a retention granularity of month, the time to be calculated starts at 2023-04-01 and ends one month after 2023-08-01, namely at 2023-09-01; the time to be calculated is therefore 2023-04-01 to 2023-09-01. The corresponding data are extracted from the daily active user table, and the bitmaps of all user ids active on each day are merged into one first bitmap, giving the first data set shown in Table 4.
TABLE 4
Since the retention granularity is month and the retention period is 1, the query concerns the retention of each month's users in the following month. The time to be calculated is divided into months, all first bitmaps of each month are merged into one second bitmap, and a second data set is created, as shown in Table 5, where montath represents each month.
TABLE 5
The second data set is sorted by time, and two columns of auto-incrementing sequence numbers are added after each month's second bitmap: the first column is sequence number 1, which starts from 1, and the second column is sequence number 2, which starts from 0. This gives Table 6, where index1 represents sequence number 1 and index2 represents sequence number 2.
TABLE 6
The pairs of second bitmaps for which sequence number 1 of one row equals sequence number 2 of another row are extracted from Table 6, and the two extracted second bitmaps are appended in order to the month corresponding to sequence number 1, giving the third data set shown in Table 7. Each month then carries both the second bitmap of its own active users and the second bitmap of the following month's active users; d1.User_bitma represents the second bitmap of the month, and d2.User_bitma represents the second bitmap of the following month.
TABLE 7
The intersection of each month's second bitmap and the following month's second bitmap is calculated to obtain the user retention bitmap, which is written into the third data set to obtain the fourth data set shown in Table 8, where retain_user_bitmap represents the user retention bitmap.
TABLE 8
The third function (bitmap_count) is applied to the second bitmap and the user retention bitmap of each month to obtain the number of active users in that month and the number of users retained in the following month. Dividing the number of users retained in the following month by the number of active users in the current month gives the corresponding user retention rate. The result is shown in Table 9, where active_user represents the number of active users in the current month, retain_user represents the number of users retained in the following month, and rate_retain_user represents the user retention rate.
TABLE 9
Finally, it should be noted that the above embodiments are merely preferred embodiments of the invention, intended to illustrate its technical solution and not to limit its scope of protection. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features replaced by equivalents, and that such modifications and substitutions do not depart from the spirit of the corresponding technical solutions. That is, where insubstantial changes or refinements are made on the basis of the main design concept and spirit of the invention and the technical problem solved remains the same as that of the invention, they fall within the scope of protection of the invention; likewise, direct or indirect application of the technical solution of the invention to other related technical fields falls within the scope of protection of the invention.
Claims (10)
1. A data processing method for querying user retention based on bitmaps, characterized by comprising the following steps:
step 1, creating a daily active user table with day as the dimension, and writing the user ids that are active each day into the daily active user table under the corresponding day;
step 2, encoding each user id as a unique integer, and converting the integer into a bitmap;
step 3, creating a daily active user bitmap table with day as the dimension, merging the bitmaps of all user ids active on each day into one first bitmap, and writing the first bitmap into the daily active user bitmap table under the corresponding day;
step 4, determining the time to be calculated according to the set query time, retention granularity, and retention period, and extracting from the daily active user bitmap table the days within the time to be calculated and their first bitmaps to obtain a first data set;
step 5, dividing the time to be calculated into time periods according to the set retention granularity and retention period, merging all first bitmaps within each time period into one second bitmap, and creating a second data set containing each time period and its second bitmap;
step 6, appending, after the second bitmap of each time period in the second data set, the second bitmap of the next time period to obtain a third data set;
step 7, calculating the intersection of the second bitmap of each time period and the second bitmap of the next adjacent time period to obtain a user retention bitmap, and writing the user retention bitmap into the third data set to obtain a fourth data set;
step 8, obtaining the number of active users in each time period and the number of users retained in the next time period from the second bitmap and the user retention bitmap of each time period, and obtaining the user retention rate from the number of active users and the number of retained users.
2. The bitmap-based data processing method for querying user retention according to claim 1, further comprising constructing user-defined functions, the user-defined functions comprising a first function that converts an integer into a bitmap, a second function that merges multiple bitmaps into one bitmap, a third function that counts the users recorded in a bitmap, and a fourth function that calculates the intersection of two bitmaps.
3. The bitmap-based data processing method for querying user retention according to claim 2, wherein in step 2, the first function is used to convert the integer into a bitmap; in step 3, the second function is used to merge the bitmaps of all user ids active on each day into the first bitmap; in step 5, the second function is used to merge all first bitmaps within each time period into the second bitmap; in step 7, the fourth function is used to calculate the intersection of the second bitmap of each time period and the second bitmap of the next adjacent time period; in step 8, the third function is used to count the active users in the second bitmap and the retained users in the user retention bitmap of each time period, respectively.
4. The bitmap-based data processing method for querying user retention according to claim 1, wherein in step 2, an efficient compressed bitmap is used to convert the integer into a bitmap.
5. The bitmap-based data processing method for querying user retention according to claim 1, wherein in step 2, the integers are auto-incrementing integers starting from 1.
6. The bitmap-based data processing method for querying user retention according to claim 1, wherein step 2 comprises constructing a user dictionary table, establishing in the user dictionary table a one-to-one mapping between user ids and dictionary ids, and encoding the user ids through this mapping, wherein each dictionary id is a unique integer.
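The dictionary encoding of claims 5 and 6 can be pictured with a small Python sketch. Here an in-memory `dict` stands in for the user dictionary table, and the helper name `encode_user` and the sample user ids are hypothetical; in practice the mapping would presumably be persisted in a table so that codes stay stable across runs.

```python
def encode_user(user_dictionary, user_id):
    """Return the integer code for user_id, assigning the next code on first sight.

    Codes auto-increment from 1, so the mapping stays one-to-one.
    """
    if user_id not in user_dictionary:
        user_dictionary[user_id] = len(user_dictionary) + 1
    return user_dictionary[user_id]


# Hypothetical user ids; a repeated id receives the same code again.
user_dictionary = {}
codes = [
    encode_user(user_dictionary, uid)
    for uid in ["u-alice", "u-bob", "u-alice", "u-carol"]
]
print(codes)  # [1, 2, 1, 3]
```

Dense, small integer codes keep the highest set bit low, which tends to keep bitmaps compact.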
7. The bitmap-based data processing method for querying user retention according to claim 1, wherein in step 4, the time to be calculated is the query time plus one retention period following the end of the query time.
8. The bitmap-based data processing method for querying user retention according to claim 1, wherein step 6 comprises: step 61, sorting the second data set by time and appending two columns of auto-incrementing sequence numbers after the second bitmap of each time period, the first column being sequence number 1, which starts from 1, and the second column being sequence number 2, which starts from 0; step 62, extracting each pair of second bitmaps for which sequence number 1 equals sequence number 2, and appending the extracted pair, in order, to the time period corresponding to sequence number 1 to obtain the third data set.
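The sequence-number pairing of steps 61 and 62 behaves like a self-join that attaches each period's successor to it. The Python sketch below is one possible reading: the function name `attach_next_bitmap`, the period labels, and the bitmap values are invented for illustration, and dropping the final period (which has no successor) assumes inner-join semantics that the claim does not spell out.

```python
def attach_next_bitmap(second_data_set):
    """Pair each period's bitmap with the bitmap of the following period.

    second_data_set: list of (period, bitmap) tuples.
    Returns a list of (period, bitmap, next_bitmap) tuples.
    """
    ordered = sorted(second_data_set)  # step 61: sort by time period
    # Sequence number 1 starts from 1, sequence number 2 starts from 0.
    numbered = [
        (period, bitmap, index + 1, index)
        for index, (period, bitmap) in enumerate(ordered)
    ]
    # Step 62: match rows where sequence number 1 equals another row's sequence number 2.
    by_seq2 = {seq2: bitmap for _, bitmap, _, seq2 in numbered}
    return [
        (period, bitmap, by_seq2[seq1])
        for period, bitmap, seq1, _ in numbered
        if seq1 in by_seq2
    ]


# Hypothetical weekly periods with integer bitmaps.
second_data_set = [
    ("2023-W49", 0b11110),
    ("2023-W50", 0b101100),
    ("2023-W51", 0b1001000),
]
third_data_set = attach_next_bitmap(second_data_set)
```

Because sequence number 2 lags sequence number 1 by one, a row whose sequence number 1 is k matches the row whose sequence number 2 is k, which is the next row in time order.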
9. The bitmap-based data processing method for querying user retention according to claim 1, wherein step 8 comprises: step 81, obtaining the number of active users in the current time period from the second bitmap of the current time period; step 82, obtaining the number of users retained in the next time period from the user retention bitmap; step 83, dividing the number of users retained in the next time period by the number of active users in the current time period to obtain the corresponding user retention rate.
10. The bitmap-based data processing method for querying user retention according to claim 1, wherein the retention granularity is a date dimension of day, week, month or year.
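To make the time handling of claims 7 and 10 concrete, the Python sketch below extends the query time by one retention period and splits the result into consecutive periods. It is a simplification under stated assumptions: the function name `split_into_periods` is hypothetical, and periods are expressed as a fixed number of days (7 for a week), so calendar months and years, which vary in length, are not covered.

```python
from datetime import date, timedelta


def split_into_periods(query_start, query_end, period_days):
    """Split the time to be calculated into consecutive periods of days.

    The time to be calculated is the query time plus one retention period
    after its end, so the last queried period still has a successor.
    """
    end = query_end + timedelta(days=period_days)
    periods = []
    start = query_start
    while start <= end:
        stop = min(start + timedelta(days=period_days - 1), end)
        span = (stop - start).days + 1
        periods.append([start + timedelta(days=i) for i in range(span)])
        start = stop + timedelta(days=1)
    return periods


# Two queried weeks become three periods once the extra week is added.
periods = split_into_periods(date(2023, 12, 4), date(2023, 12, 17), 7)
print(len(periods))  # 3
```

The extra period is only used as the "next time period" for the last queried period; no retention rate is reported for it.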
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311737605.4A CN117435756B (en) | 2023-12-18 | 2023-12-18 | Data processing method for inquiring user retention based on bitmap |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117435756A (en) | 2024-01-23 |
CN117435756B (en) | 2024-03-26 |
Family
ID=89555608
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311737605.4A Active CN117435756B (en) | 2023-12-18 | 2023-12-18 | Data processing method for inquiring user retention based on bitmap |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117435756B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107508917A (en) * | 2017-09-29 | 2017-12-22 | 济南浚达信息技术有限公司 | A kind of web site user activity statistical method and system based on bitmap |
CN108306936A (en) * | 2017-12-28 | 2018-07-20 | 深圳市创梦天地科技股份有限公司 | User's indicator-specific statistics method and server |
CN112269726A (en) * | 2020-10-22 | 2021-01-26 | 腾讯音乐娱乐科技(深圳)有限公司 | Data processing method and device |
CN112434085A (en) * | 2020-12-04 | 2021-03-02 | 四三九九网络股份有限公司 | Roaring Bitmap-based user data statistical method |
CN114328632A (en) * | 2021-12-06 | 2022-04-12 | 大箴(杭州)科技有限公司 | User data analysis method and device based on bitmap and computer equipment |
CN114579533A (en) * | 2022-02-25 | 2022-06-03 | 网易(杭州)网络有限公司 | Method and device for acquiring user activity index, electronic equipment and storage medium |
CN114791914A (en) * | 2022-05-07 | 2022-07-26 | 金腾科技信息(深圳)有限公司 | User behavior statistical method, device, equipment and medium based on Bitmap |
CN114968124A (en) * | 2022-06-28 | 2022-08-30 | 深圳前海微众银行股份有限公司 | Data storage method, server and storage medium |
CN115408381A (en) * | 2021-05-28 | 2022-11-29 | 腾讯科技(深圳)有限公司 | Data processing method and related equipment |
Also Published As
Publication number | Publication date |
---|---|
CN117435756A (en) | 2024-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110427434B (en) | Multidimensional data query method and device | |
CN109120272B (en) | RFID tag data compression method for discrete manufacturing workshop | |
CN113626448B (en) | HBase distributed storage-based space vector data indexing and query method | |
CN105260354A (en) | Chinese AC (Aho-Corasick) automaton working method based on keyword dictionary tree structure | |
CN113139227B (en) | BIM component construction code creation method based on Revit | |
CN113297435B (en) | Material management method and system based on gene codes | |
CN111274454B (en) | Spatio-temporal data processing method and device, electronic equipment and storage medium | |
CN110825733A (en) | Multi-sampling-stream-oriented time series data management method and system | |
CN103970842A (en) | Water conservancy big data access system and method for field of flood control and disaster reduction | |
CN115576998B (en) | Power distribution network data integration method and system based on multi-dimensional information fusion | |
CN110825830B (en) | Data retrieval method for grid space | |
CN114328981B (en) | Knowledge graph establishing and data acquiring method and device based on mode mapping | |
CN114443656A (en) | Customizable automated data model analysis tool and use method thereof | |
CN117435756B (en) | Data processing method for inquiring user retention based on bitmap | |
CN102867023B (en) | Method for storing and reading grid data and device | |
CN102999548B (en) | Geographical name data extended method and device in electronic chart | |
CN107423431A (en) | A kind of remotely-sensed data storage method and system based on distributed file system | |
CN113722533A (en) | Information pushing method and device, electronic equipment and readable storage medium | |
CN117971821A (en) | Data storage method, data reading method, device, and storage medium | |
US7624326B2 (en) | Encoding device and method, decoding device and method, program, and recording medium | |
CN110046343B (en) | Method for converting non-standard address into standard address and coding standard address | |
CN113138985B (en) | GPS data analysis method and system | |
CN115544305A (en) | Data storage method and device for digital steel coil system | |
CN107832345A (en) | The method of base station data unique numberization mark | |
CN114385624A (en) | Encoding method, encoding searching method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||