CN114328632A - User data analysis method and device based on bitmap and computer equipment - Google Patents


Info

Publication number
CN114328632A
Authority
CN
China
Prior art keywords
bitmap
user
label
query
tag
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111481124.2A
Other languages
Chinese (zh)
Inventor
刘一鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dazhu Hangzhou Technology Co ltd
Original Assignee
Dazhu Hangzhou Technology Co ltd
Application filed by Dazhu Hangzhou Technology Co ltd
Priority to CN202111481124.2A
Publication of CN114328632A


Abstract

The application discloses a bitmap-based user data analysis method and apparatus, and a computer device, in the field of data processing, which can solve the problems of long analysis time and heavy storage use when analyzing massive user data. The method includes: acquiring a user data query request that carries at least one query tag and, when it carries a plurality of query tags, the logical operation relationship among them; searching a tag bitmap database for the first tag bitmap corresponding to each of the at least one query tag; determining the target user identifiers that satisfy the user data query request based on the first tag bitmaps, the logical operation relationship among the query tags, and a mapping relationship table that represents the mapping between each user identifier and each binary bit position in the tag bitmap; and analyzing according to the target user identifiers to obtain a user data analysis result.

Description

User data analysis method and device based on bitmap and computer equipment
Technical Field
The present application relates to the field of data processing, and in particular, to a method and an apparatus for analyzing user data based on a bitmap, and a computer device.
Background
The rapid development of the Internet has produced massive amounts of user data. User data can generally be divided into user attribute data and user behavior data. User attribute data describe a user's basic information and status, which generally do not change, such as gender, city, and age; user behavior data describe how users interact with a product, such as whether they pay, how frequently they are active, and how long they use it. Analyzing user data makes it possible to build products that target users' needs. For example, retention analysis measures how many of the users who performed an initial behavior later perform a subsequent behavior; it is an important indicator of a product's value to users and guides targeted iteration of the product.
Existing methods for analyzing massive user data store the raw user data on disk and read all of it back from disk for every analysis. As a result, large volumes of user data are kept for long periods and occupy a great deal of storage space, every analysis must load the full data set, and the low parallelism makes analysis slow, so real-time feedback requirements cannot be met.
Disclosure of Invention
In view of this, the present application provides a bitmap-based user data analysis method and apparatus, and a computer device, in the field of data processing, which can solve the problems of long analysis time and heavy storage use when analyzing massive user data.
According to an aspect of the present application, there is provided a bitmap-based user data analysis method, the method including:
acquiring a user data query request, where the user data query request carries at least one query tag and, when it carries a plurality of query tags, also carries the logical operation relationship among those query tags;
searching a tag bitmap database for a first tag bitmap corresponding to each of the at least one query tag, where each binary bit in the first tag bitmap indicates whether one user has the query tag;
determining a target user identifier that satisfies the user data query request based on the first tag bitmap, the logical operation relationship among the plurality of query tags, and a preset mapping relationship table, where the preset mapping relationship table represents the mapping between each user identifier and each binary bit position in the tag bitmap; and
analyzing according to the target user identifier to obtain a user data analysis result.
According to another aspect of the present application, there is provided a bitmap-based user data analysis apparatus, the apparatus including:
a first acquisition module, configured to acquire a user data query request, where the user data query request carries at least one query tag and, when it carries a plurality of query tags, also carries the logical operation relationship among those query tags;
a query module, configured to query the tag bitmap database for a first tag bitmap corresponding to each of the at least one query tag, where each binary bit in the first tag bitmap indicates whether one user has the query tag;
a determining module, configured to determine, based on the first tag bitmap, the logical operation relationship among the plurality of query tags, and a preset mapping relationship table, a target user identifier that satisfies the user data query request, where the preset mapping relationship table represents the mapping between each user identifier and each binary bit position in the tag bitmap; and
an analysis module, configured to analyze according to the target user identifier to obtain a user data analysis result.
According to yet another aspect of the present application, there is provided a non-transitory readable storage medium having stored thereon a computer program which, when executed by a processor, implements the bitmap-based user data analysis method described above.
According to yet another aspect of the present application, there is provided a computer device comprising a non-volatile readable storage medium, a processor, and a computer program stored on the non-volatile readable storage medium and executable on the processor, the processor implementing the above bitmap-based user data analysis method when executing the program.
With the above technical solution, the present application obtains a user data query request carrying at least one query tag (and, when there are several query tags, the logical operation relationship among them); searches a tag bitmap database for the first tag bitmap corresponding to each query tag, where each binary bit indicates whether one user has that query tag; determines the target user identifiers satisfying the request based on the first tag bitmaps, the logical operation relationship, and a preset mapping relationship table that maps each user identifier to a binary bit position in the tag bitmap; and analyzes the target user identifiers to obtain a user data analysis result. When massive user data are analyzed on the basis of bitmaps, storing the data as bitmaps greatly reduces storage space, and bitwise operations between bitmaps can be executed in parallel, which speeds up real-time analysis.
The foregoing is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer so that it can be implemented according to the description, and to make the above and other objects, features, and advantages more readily understandable, specific embodiments are described in detail below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application to the disclosed embodiment. In the drawings:
Fig. 1 is a schematic flowchart of a bitmap-based user data analysis method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another bitmap-based user data analysis method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a bitmap-based user data analysis apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another bitmap-based user data analysis apparatus according to an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
To address the long analysis time and heavy storage use of existing massive-user-data analysis, an embodiment of the present application provides a bitmap-based user data analysis method. As shown in Fig. 1, the method includes the following steps:
101. Acquire a user data query request, where the request carries at least one query tag and, when it carries a plurality of query tags, also carries the logical operation relationship among them.
In this embodiment, as one implementation, tags characterize users, and each user has several tags; for example, user 1 is a programmer, is 26 years old, and is male. Conversely, the users who have a given tag can be listed. Among users 1, 2, 3, and 4, for example, the tag "gender male" is held by users 1 and 2, the tag "browsed the web page" by users 1 and 3, and the tag "new user" by users 1, 3, and 4. A query tag is a tag selected from these tags for querying. With a single query tag, the matching users follow directly from that tag: if query tag 1 is "gender male", users 1 and 2 are obtained. With two or more query tags, the logical operation relationship among them must also be specified, such as intersection, union, or difference. For example, if query tag 1 is "gender male", query tag 2 is "browsed the web page", and the user data query request asks for the intersection of query tag 1 and query tag 2, user 1 is obtained.
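The query semantics can be illustrated with plain Python sets before any bitmap is involved. This is a minimal sketch: the tag names, the `TAG_USERS` table, and the `evaluate` helper are hypothetical and only mirror the example users 1 to 4; intersection, union, and difference correspond to Python's `&`, `|`, and `-` set operators.

```python
# Hypothetical tag -> user-ID sets from the example above (users 1..4).
TAG_USERS = {
    "gender_male": {1, 2},
    "browsed_webpage": {1, 3},
    "new_user": {1, 3, 4},
}

def evaluate(tag_a: str, op: str, tag_b: str) -> set[int]:
    """Apply one logical relation between two query tags."""
    a, b = TAG_USERS[tag_a], TAG_USERS[tag_b]
    if op == "and":      # intersection
        return a & b
    if op == "or":       # union
        return a | b
    if op == "andnot":   # difference: users in a but not in b
        return a - b
    raise ValueError(f"unsupported relation: {op}")

print(evaluate("gender_male", "and", "browsed_webpage"))  # {1}
```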
102. Search the tag bitmap database for a first tag bitmap corresponding to each of the at least one query tag, where each binary bit in the first tag bitmap indicates whether one user has the query tag.
In this embodiment, as one implementation, data are stored as bitmaps. A bitmap is a data structure for representing a dense set over a finite domain: each element is either present or absent, and no other data are associated with it. A bitmap marks the value for each element with a single bit, and because data are stored in units of bits, storage space is greatly reduced. For example, storing a user identifier as a 4-byte integer takes 32 bits, whereas mapping it to a single bit takes only 1/32 of that space when the identifiers are densely numbered, so a massive data set needs only a small amount of server memory. Storing users in bitmaps also makes data easy to merge: if team 1 has built the bitmaps for tags 1, 2, and 3 and team 2 has built the bitmaps for tags 4, 5, and 6, the two teams' data can be merged without modifying any existing bitmap, which improves efficiency.
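The storage claim can be checked with a little arithmetic. The sketch below assumes, hypothetically, 100 million users with identifiers densely numbered 1 to 100,000,000; the 1/32 ratio holds only under that density assumption, because a bitmap's size is set by the largest identifier rather than by how many users actually have the tag.

```python
def int_list_bytes(num_users: int) -> int:
    # Each user ID stored as a 4-byte (32-bit) integer.
    return 4 * num_users

def bitmap_bytes(max_user_id: int) -> int:
    # One bit per possible ID in 1..max_user_id, rounded up to whole bytes.
    return (max_user_id + 7) // 8

n = 100_000_000  # hypothetical: 100 million users with IDs 1..n
print(int_list_bytes(n))                      # 400000000
print(bitmap_bytes(n))                        # 12500000
print(int_list_bytes(n) // bitmap_bytes(n))   # 32
```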
A tag bitmap corresponds uniquely to one tag, and the tag bitmap database stores many tag bitmaps. Query tags are selected from the tags, and for each query tag the unique matching tag bitmap is located; this bitmap is called the first tag bitmap. The difference between the two terms is only one of scope: every tag has a tag bitmap, while a first tag bitmap is the tag bitmap of a query tag.
Each tag bitmap consists of multiple binary bits, indexed by subscripts from left to right: the first bit from the left has subscript 1, the second has subscript 2, and so on. User 1 corresponds to subscript 1, user 2 to subscript 2, and so on. Each bit stores 0 or 1: 0 means the user does not have the tag of this tag bitmap, and 1 means the user has it. For example, a 1 at subscript 1 of the tag bitmap for tag 1 indicates that user 1 has tag 1, and a 0 at subscript 2 indicates that user 2 does not. To establish this correspondence between users and bit subscripts, each user is assigned in advance a user identifier that uniquely identifies the user and is a positive integer starting from 1. The user identifiers corresponding to the subscripts whose bit is 1 in the first tag bitmap are therefore exactly the users who have the query tag.
103. Determine a target user identifier that satisfies the user data query request based on the first tag bitmap, the logical operation relationship among the plurality of query tags, and a preset mapping relationship table, where the preset mapping relationship table represents the mapping between each user identifier and each binary bit position in the tag bitmap.
In this embodiment, when there is only one query tag, determining the target user identifier that satisfies the data query request based on the first tag bitmap and the preset mapping relationship table may specifically include: if the data query request carries one query tag, using the preset mapping relationship table to determine the target user identifiers that match the first tag bitmap.
When there are at least two query tags, as one implementation, the target user identifier that satisfies the data query request is determined based on the first tag bitmaps, the logical operation relationship among the query tags, and the preset mapping relationship table, specifically as follows: if the data query request carries a plurality of query tags, perform the logical operation on the first tag bitmaps bit by bit according to the logical operation relationship to obtain a second tag bitmap; then use the mapping relationship table to determine the target user identifiers that match the second tag bitmap.
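A minimal sketch of this whole-bitmap combination, assuming equal-length bitmaps stored as `bytes` with subscript 1 as the most significant bit of the first byte. The function name `combine` and the relation names are hypothetical. The loop works on 8 bits per byte at a time; because AND, OR, and AND-NOT are bitwise, the result is identical to processing one bit at a time.

```python
def combine(a: bytes, b: bytes, op: str) -> bytes:
    """Combine two equal-length first tag bitmaps into a second tag bitmap."""
    if len(a) != len(b):
        raise ValueError("tag bitmaps must share the same preset length")
    if op == "and":       # intersection
        return bytes(x & y for x, y in zip(a, b))
    if op == "or":        # union
        return bytes(x | y for x, y in zip(a, b))
    if op == "andnot":    # difference: bit set in a and clear in b
        return bytes(x & ~y & 0xFF for x, y in zip(a, b))
    raise ValueError(f"unsupported relation: {op}")

male = bytes([0b1100_0000])     # users 1, 2
browsed = bytes([0b1010_0000])  # users 1, 3
print(f"{combine(male, browsed, 'and')[0]:08b}")  # 10000000
```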
Correspondingly, when there are at least two query tags, as another implementation, the determination may further include: cutting each first tag bitmap into first tag sub-bitmaps of a second preset length, where the second preset length divides the first preset length evenly; performing the logical operation on the first tag sub-bitmaps bit by bit according to the logical operation relationship to obtain second tag sub-bitmaps; splicing the second tag sub-bitmaps in cutting order to obtain a second tag bitmap; and using the mapping relationship table to determine the target user identifiers that match the second tag bitmap.
The mapping relationship is the correspondence, established in step 102, between user identifiers and the subscripts of the binary bits in the first tag bitmap. Continuing the example among users 1, 2, 3, and 4: the tag "gender male" is held by users 1 and 2, "browsed the web page" by users 1 and 3, and "new user" by users 1, 3, and 4. With a single query tag, the matching users follow from that tag's bitmap: if query tag 1 is "gender male", bits 1 and 2 of its first tag bitmap are 1, so the mapping relationship table yields the target user identifiers users 1 and 2.
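Reading target users back out of a bitmap only needs the mapping relationship table. The sketch assumes the identity mapping described here (subscript i maps to user identifier i) and the same most-significant-bit-first layout; `MAPPING` and `target_users` are hypothetical names.

```python
# Hypothetical mapping table: subscript -> user identifier. User IDs here are
# positive integers starting from 1, so the mapping is the identity.
MAPPING = {s: s for s in range(1, 9)}

def target_users(bm: bytes, mapping: dict[int, int]) -> list[int]:
    """Return the user IDs whose bit is 1, in left-to-right order."""
    users = []
    for byte_index, byte in enumerate(bm):
        for offset in range(8):
            if byte & (0x80 >> offset):
                subscript = byte_index * 8 + offset + 1
                users.append(mapping[subscript])
    return users

male = bytes([0b1100_0000])  # users 1 and 2 have "gender male"
print(target_users(male, MAPPING))  # [1, 2]
```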
When there are at least two query tags, the logical operation relationship among them, such as intersection, union, or difference, must be determined. For example, query tag 1 is "gender male", so the bits of its first tag bitmap are 1, 1, 0, 0 from left to right; query tag 2 is "browsed the web page", so the bits of its first tag bitmap are 1, 0, 1, 0. The logical operation is intersection, so the first tag bitmaps are combined bit by bit from left to right to obtain the second tag bitmap: at subscript 1, 1 AND 1 = 1; at subscript 2, 1 AND 0 = 0; at subscript 3, 0 AND 1 = 0; at subscript 4, 0 AND 0 = 0. Only subscript 1 of the second tag bitmap is 1, so the target user identifier is user 1.
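The same bit-by-bit intersection written out, with B denoting a first tag bitmap read left to right from subscript 1:

```latex
\begin{aligned}
B_{\text{male}}    &= 1\,1\,0\,0 \\
B_{\text{browsed}} &= 1\,0\,1\,0 \\
B_{\text{male}} \wedge B_{\text{browsed}}
  &= (1 \wedge 1)\,(1 \wedge 0)\,(0 \wedge 1)\,(0 \wedge 0) = 1\,0\,0\,0
\end{aligned}
```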
As another embodiment for two or more query tags, the target user identifier is determined as follows. Each first tag bitmap is cut into first tag sub-bitmaps of a second preset length, where the second preset length divides the first preset length evenly. Preferably, the first preset length is determined from the maximum of all user identifiers: the number of user identifiers each first tag bitmap can hold is set to that maximum, which fixes the first preset length, and all first tag bitmaps have the same first preset length. A first tag sub-bitmap may contain one binary bit or several, and the sub-bitmaps cut from the different first tag bitmaps all contain the same number of bits. After cutting, the logical operations on the sub-bitmaps can run in parallel, which increases computation speed and shortens response time.
According to the logical operation relationship, the logical operation is performed on the first tag sub-bitmaps bit by bit to obtain second tag sub-bitmaps. For example, with users 1, 2, 3, and 4, query tag 1 is "gender male", and its first tag bitmap 1 has the bits 1, 1, 0, 0 from left to right; first tag bitmap 1 is cut into first tag sub-bitmaps of one bit each. Likewise, query tag 2 is "browsed the web page", and its first tag bitmap 2 has the bits 1, 0, 1, 0; it is also cut into one-bit first tag sub-bitmaps. The operations then need not proceed sequentially from left to right: the sub-bitmaps at each subscript are combined in parallel, for example subscript 1 of first tag bitmap 1 is intersected with subscript 1 of first tag bitmap 2. This yields the second tag sub-bitmaps: second tag sub-bitmap 1 (subscript 1) is 1, second tag sub-bitmap 2 (subscript 2) is 0, second tag sub-bitmap 3 (subscript 3) is 0, and second tag sub-bitmap 4 (subscript 4) is 0.
The second tag sub-bitmaps are spliced in cutting order to obtain the second tag bitmap, and the mapping relationship table is used to determine the target user identifiers that match it. For example, if cutting proceeded from left to right, splicing the second tag sub-bitmaps in the order of subscripts 1, 2, 3, and 4 gives a second tag bitmap with subscript 1: 1, subscript 2: 0, subscript 3: 0, subscript 4: 0, and the mapping relationship table maps subscript 1 to target user identifier 1.
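The cut, parallel-operate, and splice steps can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. Assumptions: sub-bitmaps are whole bytes (one byte each in the usage line), the relation is intersection, and the names `cut`, `and_chunk`, and `parallel_and` are hypothetical. `Executor.map` returns results in input order, which is what makes splicing in cutting order a simple join.

```python
from concurrent.futures import ThreadPoolExecutor

def cut(bm: bytes, chunk_bytes: int) -> list[bytes]:
    """Cut a first tag bitmap into sub-bitmaps of chunk_bytes each."""
    if len(bm) % chunk_bytes:
        raise ValueError("chunk size must divide the bitmap length evenly")
    return [bm[i:i + chunk_bytes] for i in range(0, len(bm), chunk_bytes)]

def and_chunk(pair: tuple[bytes, bytes]) -> bytes:
    """Intersect one pair of matching sub-bitmaps."""
    left, right = pair
    return bytes(x & y for x, y in zip(left, right))

def parallel_and(a: bytes, b: bytes, chunk_bytes: int) -> bytes:
    """Cut both bitmaps, AND matching sub-bitmaps concurrently, then splice."""
    pairs = list(zip(cut(a, chunk_bytes), cut(b, chunk_bytes)))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(and_chunk, pairs))  # map() keeps cutting order
    return b"".join(results)

a = bytes([0b1100_0000, 0b1111_0000])
b = bytes([0b1010_0000, 0b0011_1100])
print([f"{x:08b}" for x in parallel_and(a, b, 1)])  # ['10000000', '00110000']
```

In CPython, threads running pure-Python loops are limited by the global interpreter lock, so this sketch shows the structure rather than a real speed-up; a production system would more likely hand each chunk to native code or to separate processes.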
104. Analyze according to the target user identifier to obtain a user data analysis result.
As an optional embodiment, a first set of target user identifiers for a first time period and a second set of target user identifiers for a second, later time period are obtained; their intersection gives the retained users; and the user attribute information and user behavior information of the retained users are determined as the user data analysis result. The target user identifiers are determined through steps 101, 102, and 103. A retained user is a user who performs a subsequent behavior after an initial behavior, so retained users are identified by the same behavior in an earlier and a later time period. For example, if users 1, 2, 3, and 4 browsed the web page on June 1 and users 3, 4, 5, and 6 browsed it on July 1, the retained users are users 3 and 4. The analysis can then focus on the retained users' attribute and behavior information. User attribute information describes the user's own basic information and status, and can be divided into basic attributes (age, gender, city), economic attributes (occupation, income), and channel attributes (the channel and method through which the user was acquired). Retained users can be grouped by these features, and the features uploaded to a channel platform, where an algorithm finds similar users. User behavior information describes the user's interaction with the product and can be divided into activity (new or returning user, active frequency) and payment (whether the user is a paying user).
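The June/July retention example reduces to one bitwise AND. For brevity this sketch uses a Python integer as the bit set (bit i-1 stands for user i), which orders bits differently from the left-to-right layout above but yields the same intersection; `users_to_bitmap` and `bitmap_to_users` are hypothetical helpers.

```python
def users_to_bitmap(user_ids: set[int], length_bits: int) -> int:
    """Encode user IDs as a Python int used as a bit set (bit i-1 <-> user i)."""
    bm = 0
    for uid in user_ids:
        if not 1 <= uid <= length_bits:
            raise ValueError(f"user id {uid} outside 1..{length_bits}")
        bm |= 1 << (uid - 1)
    return bm

def bitmap_to_users(bm: int) -> list[int]:
    """Decode the set bits back into sorted user IDs."""
    return [i + 1 for i in range(bm.bit_length()) if bm >> i & 1]

june_1 = users_to_bitmap({1, 2, 3, 4}, 8)  # browsed the web page on June 1
july_1 = users_to_bitmap({3, 4, 5, 6}, 8)  # browsed the web page on July 1
retained = june_1 & july_1                 # intersection = retained users
print(bitmap_to_users(retained))  # [3, 4]
```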
Classification can also be performed according to the characteristics of the user behavior information, and products can be formulated for groups with similar characteristics.
Preferably, after the retained users are determined, the user data analysis result further includes a user retention rate, obtained as follows: determine the retained count (the number of retained users) and the number of users corresponding to the first target user identifiers; divide the retained count by that number of users to obtain the user retention rate; and include the user retention rate in the user data analysis result. The retention rate shows whether user retention is improving; if it is not, the marketing strategy can be adjusted promptly based on when users churned and which events caused the churn.
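With the June/July example, the retention rate works out as follows (when the sets are stored as bitmaps, each set size is a population count, i.e., the number of 1 bits):

```latex
\text{retention rate}
  = \frac{\lvert U_{\text{Jun 1}} \cap U_{\text{Jul 1}} \rvert}{\lvert U_{\text{Jun 1}} \rvert}
  = \frac{\lvert \{3, 4\} \rvert}{\lvert \{1, 2, 3, 4\} \rvert}
  = \frac{2}{4} = 50\%
```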
In summary, this embodiment determines the users matching a query by looking up one first tag bitmap per query tag, combining the bitmaps bit by bit according to the logical operation relationship among the query tags, and mapping the resulting bit positions back to user identifiers through the mapping relationship table, then analyzes those users to obtain a user data analysis result. Storing massive user data as bitmaps greatly reduces storage space, and bitwise operations between bitmaps can run in parallel, which speeds up real-time analysis.
Further, as a refinement and extension of the foregoing embodiment, and to fully illustrate its implementation, another bitmap-based user data analysis method is provided. As shown in Fig. 2, the method includes:
201. Acquire an original user data table containing user identifiers and at least one tag corresponding to each user identifier; group the original user data table by the at least one tag to obtain at least one first data table, in which the set of user identifiers corresponding to each tag is stored.
In this embodiment, as an optional implementation, the original user data table stores user identifiers and the tags corresponding to each user identifier. A user identifier uniquely identifies a user; at registration, a system might identify a user by mobile phone number or a randomly assigned string, but because the user identifiers will be stored in bitmaps, they are set in advance to positive integers starting from 1, e.g., user 1, user 2, user 3, and so on. Tags characterize users, such as age, occupation, or whether the user is new, and are recorded in the original user data table; each user has several tags, and query tags are selected from these tags when users are queried. Because a tag bitmap will be created for each tag, storing the user identifiers that have that tag, the original user data table is grouped by tag to obtain the set of user identifiers for each tag, and these sets together form the first data table.
202. Convert the first data table into tag bitmaps of a first preset length, and create the mapping between each user identifier and each binary bit position in the tag bitmap, so that each binary bit in a tag bitmap indicates whether the user with the corresponding user identifier has the tag; store the tag bitmaps in the tag bitmap database.
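Step 201's grouping can be sketched as below, assuming the original user data table arrives as (user identifier, tags) rows; the rows, tag names, and `group_by_tag` are hypothetical and reproduce the running example.

```python
from collections import defaultdict

# Hypothetical original user data table: (user identifier, tags) rows.
RAW_TABLE = [
    (1, ["gender_male", "browsed_webpage", "new_user"]),
    (2, ["gender_male"]),
    (3, ["browsed_webpage", "new_user"]),
    (4, ["new_user"]),
]

def group_by_tag(rows):
    """Build the first data table: tag -> sorted list of user identifiers."""
    groups = defaultdict(set)
    for user_id, tags in rows:
        for tag in tags:
            groups[tag].add(user_id)
    return {tag: sorted(ids) for tag, ids in groups.items()}

first_table = group_by_tag(RAW_TABLE)
print(first_table["new_user"])  # [1, 3, 4]
```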
For this embodiment, each tag corresponds to one tag bitmap, and each tag bitmap contains multiple binary bits indexed from left to right: the first bit from the left has subscript 1, the second has subscript 2, and so on. The mapping relationship between user identifiers and subscripts is that user 1 corresponds to subscript 1, user 2 corresponds to subscript 2, and so on. Each binary bit stores 0 or 1: 0 indicates that the user does not have the tag corresponding to the tag bitmap, and 1 indicates that the user has it. For example, a 1 at subscript 1 of the tag bitmap for tag 1 indicates that user 1 has tag 1, and a 0 at subscript 2 of the same bitmap indicates that user 2 does not have tag 1. Creating the mapping relationship between each user identifier and each binary bit position in the tag bitmap specifically includes: initializing every binary bit of the tag bitmap to 0; determining each user identifier contained in the first data table; and traversing the binary bits of the tag bitmap, setting to 1 each bit whose subscript equals one of those user identifiers.
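A minimal Python sketch of this mapping, assuming the bitmap is held in a `bytearray` with bits packed most-significant-bit first so that subscript 1 is the leftmost bit. The function names are hypothetical. Rather than visiting every bit, it jumps directly to each user's bit, which produces the same bitmap as the traversal described above.

```python
def build_tag_bitmap(user_ids, num_bits):
    """Return a bytearray in which bit k (1-based, counted from the left)
    is 1 iff user k has the tag. Bits are packed MSB first."""
    bitmap = bytearray((num_bits + 7) // 8)  # every bit starts at 0
    for uid in user_ids:
        if not 1 <= uid <= num_bits:
            raise ValueError(f"user ID {uid} outside 1..{num_bits}")
        pos = uid - 1                        # subscript 1 -> bit offset 0
        bitmap[pos // 8] |= 0x80 >> (pos % 8)
    return bitmap

def has_tag(bitmap, uid):
    """Read back the bit for user uid."""
    pos = uid - 1
    return bool(bitmap[pos // 8] & (0x80 >> (pos % 8)))

tag1 = build_tag_bitmap({1, 3, 4}, num_bits=8)
print(format(tag1[0], "08b"))  # 10110000
```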
Converting the first data table into a tag bitmap of a first preset length specifically includes the following. Every binary bit occupies the same fixed length, and the first preset length determines how many user identifiers the tag bitmap can hold: that number equals the first preset length divided by the fixed length of one bit. Preferably, the first preset length is derived from the maximum of all user identifiers: the number of user identifiers each tag bitmap can hold is set to that maximum, which determines the first preset length, and all tag bitmaps share the same first preset length.
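The length calculation can be sketched as follows. The 64-bit storage unit and the rounding up to whole words are assumptions added for illustration; the text only requires that every tag bitmap hold at least as many bits as the largest user identifier and that all tag bitmaps share the same length.

```python
WORD_BITS = 64  # assumed storage unit; the application does not fix one

def first_preset_length(all_user_ids):
    """Bits per tag bitmap: enough for the largest user ID,
    rounded up to a whole number of 64-bit words."""
    max_id = max(all_user_ids)
    words = -(-max_id // WORD_BITS)  # ceiling division
    return words * WORD_BITS

length = first_preset_length([1, 2, 3, 100])
print(length)  # 128
```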
For this embodiment, as a preferred mode, the tag bitmap database stores the tag bitmaps, and when a new tag is added, a tag bitmap is created for it as follows: acquiring the newly added tag and the user identifier set corresponding to it; generating, from the user identifier set, a tag bitmap of the first preset length for the newly added tag; and updating the tag bitmap database by storing the new tag bitmap in it.
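The new-tag path can be sketched as below, assuming the tag bitmap database is a plain Python dict from tag name to `bytearray` (bits packed most-significant-bit first, subscript 1 leftmost). `add_new_tag` is a hypothetical name. Existing tag bitmaps are left untouched.

```python
def add_new_tag(tag_bitmap_db, tag, user_ids, num_bits):
    """Build the bitmap of num_bits bits for a newly added tag and store it."""
    bitmap = bytearray((num_bits + 7) // 8)
    for uid in user_ids:
        if not 1 <= uid <= num_bits:
            raise ValueError(f"user ID {uid} outside 1..{num_bits}")
        pos = uid - 1
        bitmap[pos // 8] |= 0x80 >> (pos % 8)
    tag_bitmap_db[tag] = bitmap  # update-and-store; other entries unchanged
    return bitmap

db = {"new_user": bytearray([0b10010000])}
add_new_tag(db, "paying_user", {2, 4}, num_bits=8)
print(sorted(db))  # ['new_user', 'paying_user']
```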
203. Acquiring a user data query request, where the user data query request carries at least one query tag and, when it carries multiple query tags, also carries the logical operation relationship among them.
For the specific process, refer to the related description of step 101 in the foregoing embodiment; details are not repeated here.
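One possible shape for the query request, sketched as a Python dataclass. The class name, field names, and the operator vocabulary `AND`/`OR`/`ANDNOT` are assumptions; the application only states that the request carries one or more query tags and, for several tags, their logical operation relationship.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class UserDataQueryRequest:
    """Hypothetical request: query tags plus, for several tags,
    the logical operation that combines them."""
    tags: Tuple[str, ...]
    operation: Optional[str] = None  # assumed vocabulary: "AND", "OR", "ANDNOT"

    def __post_init__(self):
        if not self.tags:
            raise ValueError("at least one query tag is required")
        if len(self.tags) > 1 and self.operation not in {"AND", "OR", "ANDNOT"}:
            raise ValueError("multiple query tags require a logical operation")

req = UserDataQueryRequest(tags=("new_user", "engineer"), operation="AND")
```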
204. Searching the tag bitmap database for the first tag bitmap corresponding to each of the at least one query tag, where each binary bit in the first tag bitmap indicates whether a given user has the query tag.
For a specific process, reference may be made to the related description in step 102 of the embodiment, and details are not described here.
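Retrieving the first tag bitmaps can be sketched as a dictionary lookup, assuming the tag bitmap database is a Python dict from tag name to `bytearray`. The function name is hypothetical, and raising an error for an unknown tag is an illustrative choice, not behavior stated in the application.

```python
def retrieve_first_tag_bitmaps(tag_bitmap_db, query_tags):
    """Look up the first tag bitmap for every query tag, in query order."""
    missing = [t for t in query_tags if t not in tag_bitmap_db]
    if missing:
        raise KeyError(f"no tag bitmap for: {missing}")
    return [tag_bitmap_db[t] for t in query_tags]

db = {
    "new_user": bytearray([0b10010000]),  # users 1, 4
    "engineer": bytearray([0b00110000]),  # users 3, 4
}
bitmaps = retrieve_first_tag_bitmaps(db, ["new_user", "engineer"])
```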
205. Determining the target user identifiers that satisfy the user data query request based on the first tag bitmaps, the logical operation relationship among the multiple query tags, and a preset mapping relation table, where the preset mapping relation table represents the mapping between each user identifier and each binary bit position in the tag bitmap.
For the specific process, refer to the related description of step 103 in the foregoing embodiment; details are not repeated here.
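Step 205 can be sketched in Python as follows, assuming `bytearray` bitmaps of equal length with subscript 1 as the leftmost bit. The operator names and function names are hypothetical. A single query tag uses its first tag bitmap directly; several tags are folded together byte by byte, which applies the logical operation to all eight bits of each byte at once, and the mapping relation (bit k corresponds to user k) turns the result back into user identifiers.

```python
from functools import reduce

OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "ANDNOT": lambda a, b: a & ~b,  # bits set in a but not in b
}

def combine(bitmaps, operation=None):
    """One tag: copy its bitmap. Several tags: fold them left to right."""
    if len(bitmaps) == 1:
        return bytearray(bitmaps[0])
    op = OPS[operation]
    return reduce(lambda x, y: bytearray(op(a, b) for a, b in zip(x, y)), bitmaps)

def to_user_ids(bitmap):
    """Mapping relation: bit k (1-based, left to right) <-> user ID k."""
    return [
        byte_index * 8 + offset + 1
        for byte_index, byte in enumerate(bitmap)
        for offset in range(8)
        if byte & (0x80 >> offset)
    ]

new_user = bytearray([0b10010000])  # users 1, 4
engineer = bytearray([0b00110000])  # users 3, 4
print(to_user_ids(combine([new_user, engineer], "AND")))  # [4]
print(to_user_ids(combine([new_user, engineer], "OR")))   # [1, 3, 4]
```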
206. Acquiring first target user identifiers for a first time period and second target user identifiers for a second time period, where the second time period is later than the first; calculating the intersection of the first and second target user identifiers to obtain the retained users; and determining the user attribute information and user behavior information of the retained users as the user data analysis result.
For this embodiment, as a preferred mode, a retained user is a user who performs a subsequent behavior after performing an initial behavior, so retained users are identified by the same behavior occurring in an earlier and a later time period. For example, if users 1, 2, 3 and 4 browse a web page on June 1 and users 3, 4, 5 and 6 browse it on July 1, the retained users are users 3 and 4. The user data analysis result can then focus on the user attribute information and user behavior information of the retained users. User attribute information represents the user's own basic information and state; for example, user attributes may be divided into basic attributes (age, gender, city), economic attributes (occupation, income) and channel attributes (the channel and method through which the user was acquired). The retained users can be grouped by these characteristics, and the characteristics can be uploaded to a channel platform so that similar users are found by an algorithm. User behavior information represents the interaction between the user and the product; for example, it may be classified by activity (new or returning user, activity frequency) and payment status (whether the user is a paying user). Products can be classified according to these behavioral characteristics, and products with similar characteristics can be formulated.
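Using the June 1 / July 1 example above, the intersection can be computed as a bitwise AND of the two periods' bitmaps. This is a minimal sketch; the helper names are hypothetical and the bitmaps use the same left-to-right, 1-based layout as the tag bitmaps.

```python
def ids_to_bitmap(user_ids, num_bits):
    """Bit k (1-based, MSB first) is 1 iff user k is in user_ids."""
    bitmap = bytearray((num_bits + 7) // 8)
    for uid in user_ids:
        pos = uid - 1
        bitmap[pos // 8] |= 0x80 >> (pos % 8)
    return bitmap

def bitmap_to_ids(bitmap):
    """Inverse mapping: positions of set bits -> user IDs."""
    return [i * 8 + off + 1 for i, b in enumerate(bitmap)
            for off in range(8) if b & (0x80 >> off)]

june_1 = ids_to_bitmap({1, 2, 3, 4}, num_bits=8)  # first time period
july_1 = ids_to_bitmap({3, 4, 5, 6}, num_bits=8)  # second, later time period
retained = bytearray(a & b for a, b in zip(june_1, july_1))  # intersection
print(bitmap_to_ids(retained))  # [3, 4]
```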
After the retained users are obtained, determining the user data analysis result further includes: determining the retention count (the number of retained users) and the number of users corresponding to the first target user identifiers; calculating the user retention rate from the retention count and the user count; and determining the user retention rate as a user data analysis result. The user retention rate is the retention count divided by the user count. It shows whether user retention is improving; if users are not being retained, the marketing strategy can be adjusted promptly according to when users churned and which events caused the churn.
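The retention rate can be computed directly from the bitmaps by counting set bits. The sketch below is hypothetical and reuses the June 1 / July 1 example: four users in the first period and two retained users give a rate of 50%.

```python
def count_bits(bitmap):
    """Number of set bits, i.e. number of users in the bitmap."""
    return sum(bin(b).count("1") for b in bitmap)

def retention_rate(first_period_bitmap, retained_bitmap):
    """Retention rate = retained users / users in the first time period."""
    first_count = count_bits(first_period_bitmap)
    if first_count == 0:
        raise ValueError("no users in the first time period")
    return count_bits(retained_bitmap) / first_count

june_1 = bytearray([0b11110000])    # users 1-4
retained = bytearray([0b00110000])  # users 3, 4
print(f"{retention_rate(june_1, retained):.0%}")  # 50%
```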
The application discloses a bitmap-based user data analysis method and device and computer equipment, which relate to the field of data processing and can solve the problems of long analysis time and large storage resource occupation when analyzing massive user data. The method first obtains a user data query request carrying at least one query tag and the logical operation relationship among the query tags; searches a tag bitmap database for the first tag bitmap corresponding to each query tag, where each binary bit in the first tag bitmap indicates whether a user has the query tag; determines the target user identifiers satisfying the user data query request based on the first tag bitmaps, the logical operation relationship among the query tags and a preset mapping relation table, which represents the mapping between each user identifier and each binary bit position in the tag bitmap; and analyzes the target user identifiers to obtain a user data analysis result. When massive user data are analyzed in this way, storing the data in bitmaps greatly saves storage space, and bitwise operations between bitmaps, which can be executed in parallel, speed up real-time analysis. In addition, the scheme groups users by independent tags, so existing user groups do not need to be modified when tags are updated later.
Further, as a specific implementation of the method shown in fig. 1 and fig. 2, an embodiment of the present application provides a user data analysis device based on a bitmap, as shown in fig. 3, the device includes: a first acquisition module 31, a retrieval module 32, a determination module 33, and an analysis module 34;
the first obtaining module 31 may be configured to obtain a user data query request, where the user data query request carries at least one query tag, and when the data query request carries multiple query tags, the data query request also carries a logical operation relationship between the multiple query tags;
a retrieving module 32, configured to retrieve a first tag bitmap corresponding to each query tag in at least one query tag from the tag bitmap database, wherein each binary bit in the first tag bitmap is used to indicate whether a user has the query tag;
the determining module 33 is configured to determine a target user identifier meeting the user data query request based on the first tag bitmap, a logical operation relationship among the plurality of query tags, and a preset mapping relationship table, where the preset mapping relationship table is used to represent a mapping relationship between each user identifier and each binary bit position in the tag bitmap;
and the analysis module 34 may be configured to analyze the target user identifiers to obtain a user data analysis result.
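How modules 31 to 34 could fit together is sketched below as a single Python class. The class and method names, the dict-shaped request, and the simple count in the analysis step are illustrative assumptions; only AND and OR are wired in to keep the sketch short.

```python
from functools import reduce

class BitmapUserDataAnalyzer:
    """Hypothetical wiring of modules 31-34 of the device."""

    def __init__(self, tag_bitmap_db):
        # tag -> bytearray; bit k (1-based, MSB first) is 1 iff user k has the tag
        self.db = tag_bitmap_db

    def acquire(self, request):  # first acquisition module 31
        return request["tags"], request.get("operation")

    def retrieve(self, tags):  # retrieval module 32
        return [self.db[t] for t in tags]

    def determine(self, bitmaps, operation):  # determining module 33
        ops = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b}
        if len(bitmaps) == 1:
            combined = bitmaps[0]
        else:
            op = ops[operation]
            combined = reduce(
                lambda x, y: bytearray(op(a, b) for a, b in zip(x, y)), bitmaps)
        return [i * 8 + off + 1 for i, b in enumerate(combined)
                for off in range(8) if b & (0x80 >> off)]

    def analyze(self, user_ids):  # analysis module 34 (here: just a count)
        return {"target_users": user_ids, "count": len(user_ids)}

    def handle(self, request):
        tags, operation = self.acquire(request)
        return self.analyze(self.determine(self.retrieve(tags), operation))

analyzer = BitmapUserDataAnalyzer({
    "new_user": bytearray([0b10010000]),  # users 1, 4
    "engineer": bytearray([0b00110000]),  # users 3, 4
})
result = analyzer.handle({"tags": ["new_user", "engineer"], "operation": "AND"})
```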
In a specific application scenario, as shown in fig. 4, the apparatus further includes a second obtaining module 35, a grouping module 36, a converting module 37, and a storing module 38;
a second obtaining module 35, configured to obtain an original user data table, where the original user data table includes a user identifier and at least one tag corresponding to the user identifier;
the grouping module 36 may be configured to group the original user data table according to at least one tag to obtain at least one first data table, where a user identifier set corresponding to one tag is stored in the first data table;
a conversion module 37, configured to convert the first data table into a tag bitmap with a first preset length, and create a mapping relationship between each user identifier and each binary bit position in the tag bitmap, so that each binary bit in the tag bitmap indicates whether a user corresponding to one user identifier has a tag;
a storage module 38 operable to store the tag bitmap to a tag bitmap database.
In a specific application scenario, in order to determine the target user identifier, as shown in fig. 4, the determining module 33 includes: a first determination unit 331, a second determination unit 332;
the first determining unit 331 may be configured to determine, if the data query request carries one query tag, the target user identifiers matching the first tag bitmap by using the preset mapping relation table;
a second determining unit 332, configured to, if the data query request carries multiple query tags, perform a logical operation on the first tag bitmap with binary bits as a computing unit according to a logical operation relationship, to obtain a second tag bitmap; and determining the target user identification matched with the second label bitmap by using the mapping relation table.
Accordingly, in a specific application scenario, in order to determine the target user identifier, as shown in fig. 4, the determining module 33 further includes: a cutting unit 333, an arithmetic unit 334, a splicing unit 335, and a third determination unit 336;
the cutting unit 333 may be configured to cut the first tag bitmap into first tag sub-bitmaps of a second preset length, where the first preset length is an integer multiple of the second preset length;
an operation unit 334, configured to perform, according to the logical operation relationship, a logical operation on the first tag sub-bitmaps with the binary bit as the calculation unit, to obtain second tag sub-bitmaps;
the splicing unit 335 may be configured to splice the second label sub-bitmaps according to the cutting order to obtain a second label bitmap;
a third determining unit 336, configured to determine the target user identifier matching the second tag bitmap by using the mapping relation table.
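The cut, operate, and splice pipeline of units 333 to 335 can be sketched as follows, assuming `bytearray` bitmaps and a second preset length expressed in bytes. The function names are hypothetical. `ThreadPoolExecutor` shows the structure of per-chunk parallelism; in CPython the global interpreter lock limits speedups for pure-Python byte loops, so a production version would more likely use processes, native code, or vectorized operations.

```python
from concurrent.futures import ThreadPoolExecutor

def cut(bitmap, chunk_bytes):
    """Cutting unit: split a bitmap into equal-length sub-bitmaps."""
    if len(bitmap) % chunk_bytes:
        raise ValueError("second preset length must divide the first preset length")
    return [bitmap[i:i + chunk_bytes] for i in range(0, len(bitmap), chunk_bytes)]

def and_chunk(pair):
    """Operation unit: bitwise AND of two corresponding sub-bitmaps."""
    a, b = pair
    return bytearray(x & y for x, y in zip(a, b))

def chunked_and(bitmap_a, bitmap_b, chunk_bytes, workers=4):
    pairs = list(zip(cut(bitmap_a, chunk_bytes), cut(bitmap_b, chunk_bytes)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(and_chunk, pairs))  # map keeps the cutting order
    return bytearray().join(results)                # splicing unit

a = bytearray([0xF0, 0x0F, 0xFF, 0x00])
b = bytearray([0xAA, 0xFF, 0x0F, 0xFF])
print(chunked_and(a, b, chunk_bytes=2).hex())  # a00f0f00
```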
In a specific application scenario, as shown in fig. 4, the apparatus further includes an update module 39, which comprises a second acquiring unit 391, a generating unit 392, and an updating unit 393;
a second obtaining unit 391, configured to obtain a new tag and a user identifier set corresponding to the new tag;
the generating unit 392 may be configured to generate, according to the user identifier set, a tag bitmap of the first preset length corresponding to the newly added tag;
the updating unit 393 may be configured to update and store the tag bitmap into the tag bitmap database.
In a specific application scenario, to obtain the user data analysis result from the target user identifiers, as shown in fig. 4, the analysis module 34 includes: a first acquisition unit 341, a calculation unit 342, and a fourth determination unit 343;
a first obtaining unit 341, configured to obtain a first target user identifier of a first time period and a second target user identifier of a second time period, where the second time period is later than the first time period;
a calculating unit 342, configured to calculate an intersection of the first target user identifier and the second target user identifier to obtain a retained user;
the fourth determining unit 343 may be configured to determine the user attribute information and the user behavior information of the retained user as the user data analysis result.
It should be noted that other corresponding descriptions of the functional units related to the user data analysis device based on bitmap provided in this embodiment may refer to the corresponding descriptions in fig. 1 to fig. 2, and are not repeated herein.
Based on the method shown in fig. 1 to fig. 2, correspondingly, the present embodiment further provides a storage medium, which may be volatile or nonvolatile, and on which computer readable instructions are stored, and when the computer readable instructions are executed by a processor, the bitmap-based user data analysis method shown in fig. 1 to fig. 2 is implemented.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that enable a computer device (such as a personal computer, a server, or a network device) to execute the method of the embodiments of the present application.
Based on the method shown in fig. 1 to fig. 2 and the virtual device embodiments shown in fig. 3 and fig. 4, and in order to achieve the above object, this embodiment further provides a computer device comprising a storage medium and a processor; the storage medium is configured to store a computer program, and the processor is configured to execute the computer program to implement the bitmap-based user data analysis method shown in fig. 1 to fig. 2.
Optionally, the computer device may further include a user interface, a network interface, a camera, radio frequency (RF) circuitry, a sensor, audio circuitry, a WI-FI module, and so forth. The user interface may include a display screen and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a WI-FI interface), and the like.
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device described above, supporting the operation of information handling programs and other software and/or programs. The network communication module is used for realizing communication among components in the storage medium and communication with other hardware and software in the information processing entity device.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, and can also be implemented by hardware.
By applying the technical solution of the application, compared with the prior art, a user data query request is first acquired, which carries at least one query tag and, when it carries multiple query tags, the logical operation relationship among them; the first tag bitmap corresponding to each query tag is retrieved from the tag bitmap database, where each binary bit in the first tag bitmap indicates whether a user has the query tag; the target user identifiers satisfying the user data query request are determined based on the first tag bitmaps, the logical operation relationship among the query tags and the preset mapping relation table, which represents the mapping between each user identifier and each binary bit position in the tag bitmap; and the target user identifiers are analyzed to obtain a user data analysis result. When massive user data are analyzed in this way, storing the data in bitmaps greatly saves storage space, and bitwise operations between bitmaps, which can be executed in parallel, speed up real-time analysis.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.

Claims (10)

1. A bitmap-based user data analysis method is characterized by comprising the following steps:
acquiring a user data query request, wherein the user data query request carries at least one query tag, and when the data query request carries a plurality of query tags, the data query request also carries a logical operation relationship among the plurality of query tags;
searching a first label bitmap corresponding to each query label in the at least one query label in a label bitmap database, wherein each binary bit in the first label bitmap is used for indicating whether a given user has the query label;
determining a target user identifier meeting the user data query request based on the first label bitmap, the logical operation relationship among the plurality of query labels and a preset mapping relationship table, wherein the preset mapping relationship table is used for representing the mapping relationship between each user identifier and each binary bit position in the label bitmap;
and analyzing according to the target user identification to obtain a user data analysis result.
2. The method of claim 1, further comprising:
acquiring an original user data table, wherein the original user data table comprises a user identifier and at least one label corresponding to the user identifier;
grouping the original user data table according to the at least one label to obtain at least one first data table, wherein the user identification set corresponding to one label is stored in each first data table;
converting the first data table into a label bitmap with a first preset length, and creating a mapping relation between each user identifier and each binary bit position in the label bitmap, so that each binary bit in the label bitmap represents whether a user corresponding to one user identifier has the label or not;
storing the tag bitmap to the tag bitmap database.
3. The method of claim 1, wherein determining a target user identifier satisfying the user data query request based on the first tag bitmap, a logical operation relationship among the plurality of query tags, and a preset mapping relationship table comprises:
if the data query request carries one query tag, determining a target user identifier matched with the first tag bitmap by using the preset mapping relation table;
if the data query request carries a plurality of query tags, performing logical operation on the first tag bitmap by taking binary bits as a computing unit according to the logical operation relation to obtain a second tag bitmap;
and determining the target user identification matched with the second label bitmap by using the mapping relation table.
4. The method according to claim 1, wherein if the data query request carries a plurality of query tags, the determining a target user id satisfying the user data query request based on the first tag bitmap, a logical operation relationship among the plurality of query tags, and a preset mapping relationship table further comprises:
cutting the first label bitmap into first label sub-bitmaps of a second preset length, wherein the first preset length is an integer multiple of the second preset length;
according to the logical operation relation, performing logical operation on the first label sub-bitmap by taking binary bits as a calculation unit to obtain a second label sub-bitmap;
splicing the second label sub-bitmaps according to a cutting sequence to obtain a second label bitmap;
and determining the target user identification matched with the second label bitmap by using the mapping relation table.
5. The method according to any one of claims 1 to 4, further comprising:
acquiring a newly added label and the user identification set corresponding to the newly added label;
generating a label bitmap of a first preset length corresponding to the newly added label according to the user identification set;
and updating and storing the label bitmap to the label bitmap database.
6. The method of claim 1, wherein analyzing according to the target user identification to obtain a user data analysis result comprises:
acquiring a first target user identifier of a first time period and a second target user identifier of a second time period, wherein the second time period is later than the first time period;
calculating the intersection of the first target user identification and the second target user identification to obtain a retained user;
and determining the user attribute information and the user behavior information of the retained user as a user data analysis result.
7. The method of claim 6, wherein after calculating the intersection of the first target user identification and the second target user identification to obtain the retained user, the method further comprises:
determining the retention number of the retained user and the user number corresponding to the first target user identifier;
calculating to obtain a user retention rate according to the retention quantity and the user quantity;
and determining the user retention rate as a user data analysis result.
8. A bitmap-based user data analysis apparatus, comprising:
a first acquisition module, configured to acquire a user data query request, wherein the user data query request carries at least one query tag, and when the data query request carries a plurality of query tags, the data query request also carries a logical operation relationship among the plurality of query tags;
the query module is used for querying a first tag bitmap corresponding to each query tag in the at least one query tag in the tag bitmap database, wherein each binary bit in the first tag bitmap is used for indicating whether a user has the query tag or not;
a determining module, configured to determine, based on the first tag bitmap, a logical operation relationship among the plurality of query tags, and a preset mapping relationship table, a target user identifier that meets the user data query request, where the preset mapping relationship table is used to represent a mapping relationship between each user identifier and each binary bit position in the tag bitmap;
and the analysis module is used for analyzing according to the target user identification to obtain a user data analysis result.
9. A storage medium on which a computer program is stored, the program, when executed by a processor, implementing the bitmap based user data analysis method of any one of claims 1 to 7.
10. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the bitmap based user data analysis method of any one of claims 1 to 7 when executing the program.
CN202111481124.2A 2021-12-06 2021-12-06 User data analysis method and device based on bitmap and computer equipment Pending CN114328632A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111481124.2A CN114328632A (en) 2021-12-06 2021-12-06 User data analysis method and device based on bitmap and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111481124.2A CN114328632A (en) 2021-12-06 2021-12-06 User data analysis method and device based on bitmap and computer equipment

Publications (1)

Publication Number Publication Date
CN114328632A true CN114328632A (en) 2022-04-12

Family

ID=81049098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111481124.2A Pending CN114328632A (en) 2021-12-06 2021-12-06 User data analysis method and device based on bitmap and computer equipment

Country Status (1)

Country Link
CN (1) CN114328632A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251652A (en) * 2023-09-18 2023-12-19 北京数方科技有限公司 Binary data mask label construction query method and device
CN117251652B (en) * 2023-09-18 2024-04-30 北京数方科技有限公司 Binary data mask label construction query method and device
CN117435756A (en) * 2023-12-18 2024-01-23 云筑信息科技(成都)有限公司 Data processing method for inquiring user retention based on bitmap
CN117435756B (en) * 2023-12-18 2024-03-26 云筑信息科技(成都)有限公司 Data processing method for inquiring user retention based on bitmap

Similar Documents

Publication Publication Date Title
CN108846753B (en) Method and apparatus for processing data
CN107506495B (en) Information pushing method and device
CN110019367B (en) Method and device for counting data characteristics
CN109561117A (en) Collecting method and device
JP2021518021A (en) Data processing methods, equipment and computer readable storage media
CN103514209A (en) Method and equipment for generating promotion information of object to be promoted based on object information base
US20190065455A1 (en) Intelligent form creation
CN112835904A (en) Data processing method and data processing device
KR20200121744A (en) Method and device for processing user personal, server and storage medium
CN110569218B (en) Offline modification method and device for EXT file system and storage medium
CN114328632A (en) User data analysis method and device based on bitmap and computer equipment
CN110928917A (en) Target user determination method and device, computing equipment and medium
CN108011936B (en) Method and device for pushing information
CN111522854B (en) Data labeling method and device, storage medium and computer equipment
CN110147381B (en) Information processing method, system and recording medium
CN110189171B (en) Feature data generation method, device and equipment
CN107908724B (en) Data model matching method, device, equipment and storage medium
CN107291923B (en) Information processing method and device
CN114285896B (en) Information pushing method, device, equipment, storage medium and program product
CN115422270A (en) Information processing method and device
CN110827101A (en) Shop recommendation method and device
CN111552715B (en) User query method and device
CN110555053B (en) Method and apparatus for outputting information
CN110471708B (en) Method and device for acquiring configuration items based on reusable components
CN109308299B (en) Method and apparatus for searching information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination