CN115408381A - Data processing method and related equipment - Google Patents


Publication number
CN115408381A
Authority
CN
China
Prior art keywords
user
query
bitmap
data
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110593441.7A
Other languages
Chinese (zh)
Inventor
陈璐
周远远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110593441.7A priority Critical patent/CN115408381A/en
Publication of CN115408381A publication Critical patent/CN115408381A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of computers, and in particular discloses a data processing method and related equipment. The method comprises the following steps: acquiring a query request; analyzing the condition limiting information, and determining at least two query conditions corresponding to the condition limiting information and a Boolean logic relationship between the at least two query conditions; generating a query code for each of the at least two query conditions, wherein the query code is used for performing an object query in the high-efficiency compressed bitmap data table to obtain a first high-efficiency compressed bitmap of the object set meeting the corresponding query condition; and combining the query codes corresponding to the at least two query conditions according to the Boolean logic relationship between the at least two query conditions to obtain a combined code, wherein the combined code is used for performing a logical operation on the first high-efficiency compressed bitmaps according to the Boolean logic relationship between the at least two query conditions to obtain the target high-efficiency compressed bitmap.

Description

Data processing method and related equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and a related device.
Background
When a user needs to query information in a database system, query information needs to be input, the query information defines conditions of an object to be queried, and if the conditions defined by the query information are complex, the time spent on querying in the database is long, and the query efficiency is low.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present application provide a data processing method and related apparatus to alleviate them.
According to an aspect of an embodiment of the present application, there is provided a data processing method, including: acquiring a query request, wherein the query request indicates query information, and the query information comprises condition limiting information; analyzing the condition limiting information, and determining at least two query conditions corresponding to the condition limiting information and a Boolean logic relationship between the at least two query conditions; respectively generating a query code corresponding to each query condition in the at least two query conditions; the query code is used for performing object query in a high-efficiency compression bitmap data table to obtain a first high-efficiency compression bitmap of an object set meeting corresponding query conditions, and the high-efficiency compression bitmap data table comprises high-efficiency compression bitmaps of a plurality of initial object sets; the initial object set is determined by classifying objects according to attribute values; combining the query codes corresponding to the query conditions in the at least two query conditions according to the Boolean logic relationship between the at least two query conditions to obtain combined codes; the combination code is used for carrying out logical operation on the first high-efficiency compressed bitmap according to the Boolean logical relation between the at least two query conditions to obtain a target high-efficiency compressed bitmap; the target efficient compressed bitmap is used to determine query results.
According to an aspect of an embodiment of the present application, there is provided a data processing apparatus including: the query request acquisition module is used for acquiring a query request, wherein the query request indicates query information, and the query information comprises condition limiting information; the analysis module is used for analyzing the condition limiting information and determining at least two query conditions corresponding to the condition limiting information and a Boolean logic relationship between the at least two query conditions; the query code generation module is used for respectively generating a query code corresponding to each query condition in the at least two query conditions; the query code is used for carrying out object query in a high-efficiency compression bitmap data table to obtain a first high-efficiency compression bitmap of an object set meeting corresponding query conditions, and the high-efficiency compression bitmap data table comprises high-efficiency compression bitmaps of a plurality of initial object sets; the initial object set is determined by classifying objects according to attribute values; the combination module is used for combining the query codes corresponding to the query conditions in the at least two query conditions according to the Boolean logic relationship between the at least two query conditions to obtain combination codes; the combination code is used for carrying out logical operation on the first high-efficiency compressed bitmap according to the Boolean logical relation between the at least two query conditions to obtain a target high-efficiency compressed bitmap; the target efficient compressed bitmap is used to determine query results.
In some embodiments of the present application, based on the foregoing solution, the combination module includes: the acquisition unit is used for acquiring the high-efficiency compression bitmap function corresponding to the Boolean logic relation according to the Boolean logic relation among the at least two query conditions; and the combining unit is used for combining the query codes corresponding to the query conditions in the at least two query conditions according to the high-efficiency compression bitmap function corresponding to the Boolean logic relationship to obtain the combined codes.
In some embodiments of the present application, based on the foregoing solution, the query information further includes classification statistical information, where the classification statistical information indicates a target information item that needs to be counted according to an attribute value; the data processing apparatus further includes: a classification statistic indication code generation module used for generating a classification statistic indication code according to the classification statistic information; and the updating module is used for updating the combined code according to the classification statistic indication code, and the updated combined code is used for classifying the object set indicated by the target high-efficiency compression bitmap according to the attribute value of the target information item.
In some embodiments of the present application, based on the foregoing solution, the query request includes a query task identifier, where the query task identifier is generated by an initiator of the query request when an input operation for query information is detected; the data processing apparatus further includes: the query information acquisition module is used for acquiring the query information associated with the query task identifier from a specified information table; after the initiator of the query request detects the input operation aiming at the query information, the query task identifier and the detected query information are stored in the specified information table in an associated manner; a combination code storage module for storing the combination code corresponding to the query information and the query task identifier into the specified information table in an associated manner; and the storage indication information returning module is used for returning the storage indication information to the initiator of the query request so that the initiator of the query request acquires and executes the combined code from the specified information table according to the storage indication information.
In some embodiments of the present application, based on the foregoing scheme, the object is a user; the data processing apparatus further includes: the data acquisition module is used for acquiring user operation data and user attribute data, wherein the user operation data is used for indicating the interactive behavior of a user on a user interface of a product; the efficient bitmap compression module is used for performing efficient bitmap compression on a first user identifier corresponding to a user with the same attribute value according to the attribute value of each field in the user operation data and the attribute value of each field in the user attribute data to obtain an efficient compressed bitmap of the corresponding attribute value; and the storage module is used for storing the obtained high-efficiency compression bitmap and the corresponding attribute value into the high-efficiency compression bitmap data table.
In some embodiments of the present application, based on the foregoing scheme, the efficient compression bitmap data table includes a second data table and a third data table; an efficient bitmap compression module comprising: the generating unit is used for generating a user operation table according to the user operation data and generating a user attribute table according to the user attribute data, wherein the user operation table and the user attribute table comprise first user identifications corresponding to all users; a second high-efficiency compression bitmap generation unit, configured to perform high-efficiency bitmap compression on the first user identifier corresponding to the same attribute value in the user operation table according to the attribute value of each field in the user operation table, to obtain a second high-efficiency compression bitmap associated with the corresponding field and the corresponding attribute value; the third efficient compressed bitmap generation unit is used for performing efficient bitmap compression on the first user identification corresponding to the same attribute value in the user attribute table according to the attribute value of each field in the user attribute table to obtain a third efficient compressed bitmap associated with the corresponding field and the corresponding attribute value; in this embodiment, the storage module includes: a first storage unit to store the second efficient compressed bitmap and the corresponding fields and corresponding attribute values in the second data table; a second storage unit, configured to store the third efficient compressed bitmap, the corresponding fields, and the corresponding attribute values in the third data table.
In some embodiments of the present application, based on the foregoing scheme, the user operation data and the user attribute data include a user identifier; the data processing apparatus further includes: a first user identifier generating unit, configured to generate, if the user identifier is in a format that does not support efficient bitmap compression, a first user identifier corresponding to each user identifier according to a specified format that supports efficient bitmap compression; in this embodiment, the generating unit includes: the user operation table generating unit is used for generating the user operation table according to a first preset field, the user operation data and first user identifications corresponding to the user identifications; and the user attribute table generating unit is used for generating the user attribute table according to a second preset field, the user attribute data and the first user identification corresponding to each user identification.
In some embodiments of the present application, based on the foregoing solution, the data processing apparatus further includes: the accumulation module is used for accumulating the first user identification to obtain an accumulated number; the first user identifier acquisition module is used for acquiring a first user identifier corresponding to a user with the longest non-active time length if the accumulated number reaches a set number threshold; and the allocation module is used for allocating the first user identifier corresponding to the user with the longest non-updating time to the next user to be generated with the first user identifier.
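The identifier recycling described above can be illustrated with a minimal sketch. It assumes that first user identifiers are dense integers handed out in order, and that "longest inactive" is tracked by the order in which users were last seen; the class name `FirstIdAllocator` and the parameter `max_ids` (standing in for the set number threshold) are hypothetical and do not come from the application.

```python
from collections import OrderedDict


class FirstIdAllocator:
    """Maps raw user identifiers to dense integer IDs, recycling the stalest one."""

    def __init__(self, max_ids):
        self.max_ids = max_ids        # stands in for the "set number threshold"
        self.next_id = 0              # next unused integer ID
        self.by_user = OrderedDict()  # raw user id -> first id, least recent first

    def touch(self, user_id):
        """Return the first identifier for user_id and mark the user as active."""
        if user_id in self.by_user:
            self.by_user.move_to_end(user_id)
            return self.by_user[user_id]
        if len(self.by_user) >= self.max_ids:
            # Threshold reached: reuse the ID of the longest-inactive user.
            _, first_id = self.by_user.popitem(last=False)
        else:
            first_id = self.next_id
            self.next_id += 1
        self.by_user[user_id] = first_id
        return first_id


alloc = FirstIdAllocator(max_ids=2)
ids = [alloc.touch(u) for u in ("a", "b", "a", "c")]
```

In this sketch user "c" inherits the identifier previously held by "b", because "b" is the user seen least recently when the threshold is reached. Keeping the identifier range bounded in this way keeps the integers dense, which is generally favourable for bitmap compression.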
In some embodiments of the present application, based on the foregoing scheme, the efficient compression bitmap data table further includes a user group data table; the data processing apparatus includes: the user identification set determining module is used for determining a user identification set corresponding to a user group according to user identifications corresponding to users in the user group; a first user identifier set determining module, configured to determine, according to a mapping relationship between a user identifier and a first user identifier, a first user identifier corresponding to each user identifier in the user identifier set, to obtain a first user identifier set corresponding to the user group; the compression module is used for carrying out efficient bitmap compression on the first user identification in the first user identification set to obtain an efficient compression bitmap corresponding to the user group; and the user group data table storage module is used for storing the high-efficiency compression bitmap corresponding to the user group and the user group identification corresponding to the user group in the user group data table.
In some embodiments of the present application, based on the foregoing scheme, the query request includes at least one of an event analysis request, a retention analysis request, a funnel analysis request, and a user path analysis request.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement a data processing method as described above.
According to an aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon computer readable instructions, which, when executed by a processor, implement a data processing method as described above.
In the scheme of the present application, the condition limiting information corresponding to the query request is split into at least two query conditions, and a query code is generated for each query condition; executing each query code yields the first high-efficiency compressed bitmap that satisfies the corresponding query condition. The query codes are then combined according to the Boolean logic relationship between the at least two query conditions, so that the resulting combined code performs the Boolean logic operation on the first high-efficiency compressed bitmaps corresponding to the at least two query conditions and yields the target high-efficiency compressed bitmap of the object set that satisfies the condition defined by the condition limiting information. Because the condition defined by the condition limiting information is split into at least two query conditions and queried block by block with the corresponding query codes, the scheme reduces query difficulty and complexity and improves query efficiency compared with querying the defined condition directly.
Moreover, in this scheme the data is compressed into high-efficiency compressed bitmaps in advance and stored in the high-efficiency compressed bitmap data table, which greatly reduces data storage pressure. During a query, logical operations are performed on the high-efficiency compressed bitmaps in that data table; compared with operating on ordinary data tables, logical operations on high-efficiency compressed bitmaps run faster and more efficiently, so overall data query efficiency is improved.
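As a rough illustration of how per-condition query codes might be combined into a combined code, the sketch below builds SQL-like strings. Everything specific in it is an assumption made for illustration: the table name `bitmap_table`, the column names, and the bitmap function names `rb_and`, `rb_or` and `rb_andnot` are hypothetical and are not taken from the application or from any particular database.

```python
# Hypothetical bitmap function names; a real engine would supply its own.
BITMAP_FUNCTIONS = {"AND": "rb_and", "OR": "rb_or", "AND NOT": "rb_andnot"}


def query_code(field, value, table="bitmap_table"):
    """Build the query code that fetches the bitmap stored for one attribute value.

    Values are interpolated directly for brevity; production code would use
    parameterized queries instead.
    """
    return f"(SELECT bitmap FROM {table} WHERE field = '{field}' AND value = '{value}')"


def combine(relation, left_code, right_code):
    """Wrap two query codes in the bitmap function for the given Boolean relation."""
    return f"{BITMAP_FUNCTIONS[relation]}({left_code}, {right_code})"


code_a = query_code("product", "A")        # query condition 1
code_b = query_code("age_group", "18-28")  # query condition 2
combined = combine("AND", code_a, code_b)  # combined code
print(combined)
```

Each query code can be executed on its own to obtain one bitmap, while the combined code nests both inside a single bitmap function call, mirroring the block-by-block query described above.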
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1A and 1B are schematic diagrams showing an implementation environment to which the technical solution of the embodiment of the present application can be applied.
Fig. 2 is a flow diagram illustrating a data processing method according to one embodiment of the present application.
Fig. 3 is a flowchart illustrating a data processing method according to another embodiment of the present application.
FIG. 4 is a flow diagram illustrating the generation of an efficient compressed bitmap data table according to one embodiment of the present application.
FIG. 5 illustrates a system architecture diagram suitable for implementing embodiments of the present application.
FIG. 6 is a flow chart illustrating a data processing method according to an embodiment of the present application.
FIG. 7 is an interface diagram illustrating new event analysis, according to an embodiment.
FIG. 8 is an interface diagram of a newly created retention analysis according to an embodiment of the application.
FIG. 9 is an interface diagram illustrating a newly created funnel analysis according to one embodiment of the present application.
Fig. 10A-10C are schematic interface diagrams of a new user group according to an embodiment of the application.
Fig. 11 is a block diagram of a data processing apparatus according to an embodiment of the present application.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for use to implement the electronic device of the embodiments of the subject application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be noted that reference herein to "a plurality" means two or more. "And/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Roaring bitmap: the efficient bitmap compression referred to herein, which is a space-efficient method of storing bits. The main idea is as follows: 32-bit unsigned integers are divided into buckets (containers) according to their upper 16 bits, so there may be up to 2^16 = 65536 containers; when a value is stored, its container is located using the upper 16 bits of the value, and the lower 16 bits are put into that container.
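The bucketing idea can be sketched in a few lines. This is a toy model only: it uses a plain Python set for every container, whereas real Roaring implementations choose more compact container representations. The names `split` and `ToyRoaring` are hypothetical.

```python
from collections import defaultdict


def split(value):
    """Split a 32-bit unsigned integer into (upper 16 bits, lower 16 bits)."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    return value >> 16, value & 0xFFFF


class ToyRoaring:
    """Toy structure: one Python set per container, keyed by the upper 16 bits."""

    def __init__(self, values=()):
        self.containers = defaultdict(set)
        for v in values:
            self.add(v)

    def add(self, value):
        high, low = split(value)
        self.containers[high].add(low)  # only the lower 16 bits are stored

    def __contains__(self, value):
        high, low = split(value)
        return low in self.containers.get(high, ())

    def __len__(self):
        return sum(len(c) for c in self.containers.values())


rb = ToyRoaring([1, 65536, 65537, 131072])
```

Here the values 65536 and 65537 share the container keyed by 1 and are stored as the lower halves 0 and 1, so each stored element needs only 16 bits within its container.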
Fig. 1A and 1B are schematic diagrams showing an implementation environment to which the technical solution of the embodiment of the present application can be applied.
As shown in fig. 1A, the implementation environment includes a first device 110 and a second device 120, wherein the first device 110 is communicatively connected to the second device 120, and may be a wireless or wired network connection. The first device 110 may be a terminal device such as a tablet computer, a notebook computer, and a desktop computer, the second device 120 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as cloud service, cloud computing, cloud storage, middleware service, and big data, which are not specifically limited herein.
The first device 110 may run a client program that performs data analysis, which is served by the second device 120. The first device 110 may initiate a query request to the second device based on a user interface provided by the client program, and after obtaining the query information indicated by the query request, the second device 120 may perform data query according to the method of the present application, obtain a target high-efficiency compressed bitmap of an object that satisfies a defined condition in the query information, and further determine a query result for the query request according to the target high-efficiency compressed bitmap. In some embodiments of the present application, the second device may return the target efficient compressed bitmap as a query result to the first device.
Since the second device 120 performs the data query based on the high efficiency compressed bitmap in the high efficiency compressed bitmap set, the second device 120 or a database accessible to the second device 120 stores the high efficiency compressed bitmap data table. The high-efficiency compression bitmap data table comprises high-efficiency compression bitmaps of a plurality of initial object sets; the initial object set is determined by classifying the objects according to the attribute values. Generally, corresponding objects are represented by object identifiers, so that an initial object set can be understood as a set of object identifiers, and furthermore, an efficient compression bitmap can be understood as being obtained by performing efficient bitmap compression on the object identifiers in the initial object set.
The object may be a user of a software product, such as a WeChat user or a follower of an official account. The object may also be a device, such as a device in a power grid system (e.g., an electric meter or a transformer), a node device in a blockchain system, or a server in a server cluster.
Fig. 1B is a schematic diagram of another implementation environment according to an embodiment of the present application, and compared to fig. 1A, fig. 1B further includes a plurality of third devices 130 communicatively connected to the second device 120. The third device 130 is configured to provide a data basis for data processing, that is, object data related to an object, where if the object is a user of a software application, the third device may be a terminal device of a client where the user is located, or an application server of the software application; if the object is a server in a server cluster, the third device 130 may be a server in the server cluster.
The third device 130 may report object data of the object to the second device 120, and the second device 120 classifies the object data of each object according to the attribute value of the object, determines an initial object set corresponding to each attribute value, and performs high-efficiency bitmap compression on an object identifier in the initial object set to obtain a corresponding high-efficiency compressed bitmap. On the basis, the second device 120 may perform query according to the obtained high-efficiency compressed bitmap corresponding to each initial object set, or perform object analysis based on the high-efficiency compressed bitmap, or perform condition monitoring analysis on the object.
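The classification step can be sketched as follows, assuming each reported record carries an integer object identifier and a few attribute fields. The sample records and the function name `build_bitmap_table` are hypothetical, and a Python set stands in for each high-efficiency compressed bitmap.

```python
from collections import defaultdict

# Hypothetical object data reported by third devices: object ID plus attributes.
records = [
    {"id": 1, "product": "A", "age_group": "18-28"},
    {"id": 2, "product": "A", "age_group": "29-40"},
    {"id": 3, "product": "B", "age_group": "18-28"},
]


def build_bitmap_table(rows, id_field="id"):
    """Group object IDs by (field, attribute value) into initial object sets."""
    table = defaultdict(set)  # a set stands in for a compressed bitmap
    for row in rows:
        for field, value in row.items():
            if field != id_field:
                table[(field, value)].add(row[id_field])
    return dict(table)


table = build_bitmap_table(records)
```

Each key of `table` identifies one attribute value of one field, and the associated set is the initial object set that would be compressed and stored for that attribute value.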
In some embodiments of the present application, the second device 120 may also be a server in a cloud platform, and a plurality of second devices form the cloud platform, and the cloud platform may serve as a big data analysis platform to provide basic cloud computing services such as cloud service, cloud computing, cloud storage, middleware service, big data, and the like. In this scenario, the second device 120 may acquire object data from a plurality of third devices 130, and then perform a statistical analysis based on the object data, for example, if the acquired object data is behavior data of a user in a software product, the statistical analysis performed may be a user retention analysis, an event analysis of the user in the software, and the like, which is not limited in this respect.
In some embodiments of the present application, the scheme of the present application may be applied to a blockchain system, where the third device 130 may serve as a node in the blockchain system, the second device 120 serves as an analysis node in the blockchain system, operation data such as an operation log of each third device 130 is uploaded to the analysis node, and the analysis node analyzes and monitors operation conditions of other nodes, so as to find a node having a fault or an abnormal node in the blockchain system in time.
It should be noted that the data processing method provided in the present application is generally executed by the second device 120. However, in other embodiments of the present application, a terminal device having data processing capability may also have similar functions as the second device 120, so as to execute the data processing method provided by the present application.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present application. The method may be executed by a computer device with processing capability, such as a server or a terminal device, or by a system composed of a terminal device and a server, which is not specifically limited herein. Referring to fig. 2, the method includes at least steps 210 to 240, which are described in detail as follows:
Step 210, a query request is obtained, where the query request indicates query information, and the query information includes condition defining information.
The condition defining information is used for defining the conditions that an object to be queried must satisfy. In a specific embodiment, the condition defining information may define attribute values of the object. For example, if the objects to be queried are users who use software product A and whose age is in the range of 18 to 28 years, the conditions defined by the condition defining information include "uses software product A" and "age in the range of 18 to 28 years". Further, if software product A has multiple versions, the version number of software product A may also be specified.
An object may be related to multiple items of information, and if an item of information is referred to as an information item, each object has a corresponding value under an information item, and the value is referred to as an attribute value of the object under the information item, for example, in the above example, the age may be referred to as an information item, "18-28 years" is an attribute value range of the user under the information item of the age, and the software product used may be referred to as an information item, "software product a" is an attribute value of the user under the information item of the software product.
The condition defined by the condition defining information may be that by making an information item designation and defining an attribute value or an attribute value range of the designated information item, the information item designated in the condition defining information may include a plurality.
The condition limiting information may be a search formula input by a user, or may be content input by the user in each query option in the user interface, where the query option is used for specifying an information item, and the input content is used for limiting an attribute value or an attribute value range of the corresponding information.
Step 220, analyzing the condition limiting information, and determining at least two query conditions corresponding to the condition limiting information and a boolean logic relationship between the at least two query conditions.
In step 220, the condition defined by the condition defining information is split into at least two query conditions by parsing the condition defining information, thereby simplifying the data query process.
In an application scenario where the condition defining information specifies an attribute value or an attribute value range of an information item, in step 220, at least two query conditions are obtained by recombining a plurality of information items defined in the condition defining information, and a boolean logic relationship between the at least two query conditions is correspondingly determined. The Boolean logical relationship may include a logical AND, a logical OR, a logical NOT, a logical NOR, a logical NAND, etc.
Each query condition includes an attribute value or an attribute value range of at least one information item designated in the condition defining information. In some embodiments of the present application, the at least two query conditions may also share a common constraint, that is, each query condition includes, in addition to a constraint that distinguishes it from the other query conditions, the common constraint.
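By way of illustration only, the splitting described for step 220 may be sketched in Python as follows. The sketch assumes that the condition defining information arrives as a list of (information item, attribute value) pairs that must all hold at once, so the Boolean logical relationship is a logical AND. The names QueryCondition and split_conditions are hypothetical and do not appear in the present application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCondition:
    """One query condition: an information item and the value it must take."""
    item: str
    value: str


def split_conditions(defining_info):
    """Split condition defining information into query conditions.

    Assumption: every pair in `defining_info` must hold simultaneously,
    so the Boolean logical relationship between the conditions is AND.
    """
    conditions = [QueryCondition(item, value) for item, value in defining_info]
    if len(conditions) < 2:
        raise ValueError("the method expects at least two query conditions")
    return conditions, "AND"


conditions, relation = split_conditions(
    [("product", "software product A"), ("age", "18-28")]
)
```

A search formula containing OR or NOT would need a real parser; the sketch covers only the simplest case.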
Step 230, respectively generating a query code corresponding to each of the at least two query conditions; the query code is used for performing object query in the high-efficiency compression bitmap data table to obtain a first high-efficiency compression bitmap of an object set meeting corresponding query conditions, and the high-efficiency compression bitmap data table comprises high-efficiency compression bitmaps of a plurality of initial object sets; the initial object set is determined by classifying the objects according to the attribute values.
The object data corresponding to an object may include multiple items of information, each of which may be regarded as an information item. Each information item may be represented by a field, and the value of the field is referred to as the attribute value corresponding to the information item.
The object data may include attribute values of a plurality of information items, that is, a plurality of information items may be associated with an object. For example, for a user of a product, the user-related information items may include the user's age, gender, registration time, the product, the control triggered in the product, the time at which the control was triggered, and the like.
According to the attribute value corresponding to each information item in the object data, the objects corresponding to the same attribute value are assigned to the same initial object set, thereby obtaining a plurality of initial object sets. It is understood that an object may be identified by an object identifier, and thus an initial object set may be understood as a set of the object identifiers of several objects having the same attribute value. Furthermore, the high-efficiency compressed bitmap of an initial object set is obtained by performing high-efficiency bitmap compression on the object identifiers in the initial object set. The high-efficiency compressed bitmap may also be referred to as a bitmap element; thus, the high-efficiency compressed bitmap of an initial object set refers to a bitmap element generated from the object identifiers in the initial object set.
It is worth mentioning that high-efficiency bitmap compression places a restriction on the data format: only unsigned integer data (e.g. Uint16 (unsigned 16-bit integer), uint32 (unsigned 32-bit integer)) can be compressed in this way. Therefore, before high-efficiency bitmap compression is performed on the object identifiers in an initial object set, if an object identifier is not in a format that supports high-efficiency bitmap compression, either the object identifier needs to be converted so that the format-converted object identifier supports high-efficiency bitmap compression, or a new object identifier in a supported format needs to be generated, so that high-efficiency bitmap compression is performed on the new object identifiers corresponding to the initial object set.
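One possible way of regenerating object identifiers in a supported format is sketched below in Python. It assumes that the original identifiers are arbitrary strings and that a dense numbering starting from 0 is acceptable; the function name build_id_mapping is hypothetical, and the present application does not prescribe any particular conversion.

```python
def build_id_mapping(raw_ids):
    """Assign each distinct raw identifier a dense unsigned 32-bit integer."""
    mapping = {}
    for raw in raw_ids:
        if raw not in mapping:
            new_id = len(mapping)
            if new_id > 0xFFFFFFFF:
                raise OverflowError("more identifiers than fit in 32 bits")
            mapping[raw] = new_id
    return mapping


# Repeated identifiers keep the integer they were first given.
mapping = build_id_mapping(["user-a", "user-b", "user-a", "user-c"])
```

The mapping would have to be stored so that query results expressed as integers can be translated back to the original identifiers.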
The attribute value referred to in the process of dividing the initial object set may be an attribute value of one information item, or may be an attribute value of a plurality of information items, and may be specifically set according to actual needs.
In some embodiments of the present application, in order to facilitate statistical analysis after data query, objects may be classified according to the attribute values of a plurality of set information items, where the set information items may be set according to actual needs, or may be the information items corresponding to the query options in a query interface. For example, if the query options in the query interface include a query option for specifying an event, classification may be performed according to the attribute values of the information item corresponding to that query option to determine the initial object sets. This not only provides a data basis for data query, but also ensures that the high-efficiency compressed bitmaps of the initial object sets are actually used, and avoids storing too many high-efficiency compressed bitmaps in the high-efficiency compressed bitmap data table.
The high-efficiency compressed bitmap data table is used for storing the high-efficiency compressed bitmap of each initial object set. Because the initial object sets are determined by classifying objects according to attribute values, the attribute value shared by the objects of an initial object set can be stored in association with the high-efficiency compressed bitmap of that set, so that the high-efficiency compressed bitmap can be conveniently obtained from the corresponding attribute value in the query process.
The first high-efficiency compression bitmap refers to a high-efficiency compression bitmap of an object identification corresponding to an object meeting the query condition. On the basis of constructing the high-efficiency compressed bitmap of the initial object set, boolean logic operation can be performed on the high-efficiency compressed bitmap of each initial object set according to the attribute values of all the information items and the Boolean logic relationship among all the information items defined in the query condition, so that the high-efficiency compressed bitmap of the object identification corresponding to the object meeting the query condition is obtained.
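The derivation of a first bitmap from the stored bitmaps may be sketched in Python as follows. As a stated simplification, a plain Python integer is used as an uncompressed bitset (bit i set means that the object with integer identifier i belongs to the set), which stands in for the compressed bitmap of the present application; the table layout and the names first_bitmap and members are hypothetical.

```python
def first_bitmap(bitmap_table, item, allowed_values):
    """OR together the stored bitmaps of every allowed value of one item."""
    result = 0
    for value in allowed_values:
        result |= bitmap_table.get((item, value), 0)
    return result


def members(bitmap):
    """List the object identifiers whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


# Keys are (information item, attribute value); values are bitsets.
bitmap_table = {
    ("age", 18): 0b00011,
    ("age", 25): 0b00100,
    ("age", 40): 0b01000,
    ("product", "A"): 0b10101,
}
# An attribute value range (18 to 28) becomes an OR over per-value bitmaps.
young = first_bitmap(bitmap_table, "age", range(18, 29))
# Two conditions joined by a logical AND become a bitwise AND.
target = young & bitmap_table[("product", "A")]
```

Here members(target) lists objects 0 and 2, the only objects that satisfy both conditions.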
After the query codes corresponding to each query condition are generated respectively, the first efficient compression bitmaps with the same quantity as the query conditions can be obtained by executing the query codes corresponding to the query conditions.
In some embodiments of the present application, since the efficient compressed bitmap of the initial object set is stored in the efficient compressed bitmap data table, the Query code may be generated according to syntax rules of database system-oriented SQL (Structured Query Language).
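A minimal sketch of generating such query code is given below in Python. The table name bitmap_table and the column names item, value and bitmap are hypothetical, as are the helper names quote and make_query_code; the SQL text is illustrative only and has not been checked against any particular database system.

```python
def quote(text):
    """Quote a string literal for SQL by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def make_query_code(item, value, table="bitmap_table"):
    """Return SQL text that fetches the bitmap stored for one attribute value.

    The table name and the column names (item, value, bitmap) are
    assumptions made for this sketch.
    """
    return (
        f"SELECT bitmap FROM {table} "
        f"WHERE item = {quote(item)} AND value = {quote(value)}"
    )


query_code = make_query_code("age", "18-28")
```

In practice a parameterized query interface would be preferable to string building; the sketch only shows that one query condition maps to one piece of query code.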
Step 240, combining the query codes corresponding to each query condition in the at least two query conditions according to the boolean logic relationship between the at least two query conditions to obtain a combined code; the combination code is used for carrying out logical operation on the first high-efficiency compressed bitmap according to the Boolean logical relation between at least two query conditions to obtain a target high-efficiency compressed bitmap; the target efficient compressed bitmap is used to determine query results.
On the one hand, the obtained combined code can invoke the query code corresponding to each query condition to obtain the corresponding first high-efficiency compressed bitmap; on the other hand, it can perform a Boolean logic operation on the first high-efficiency compressed bitmaps corresponding to the at least two query conditions according to the Boolean logic relationship between the at least two query conditions.
In some embodiments of the present application, step 240 may comprise: obtaining a high-efficiency compression bitmap function corresponding to the Boolean logic relation according to the Boolean logic relation among at least two query conditions; and combining the query codes corresponding to the query conditions in the at least two query conditions according to the high-efficiency compressed bitmap function corresponding to the Boolean logic relationship to obtain combined codes.
It can be understood that, for the combination of the query codes, a function for performing a logical operation on the high-efficiency compressed bitmap is constructed in advance, and the function for performing the logical operation on the high-efficiency compressed bitmap is referred to as a high-efficiency compressed bitmap function.
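The combination of query codes through a bitmap function may be sketched in Python as follows. The function names bitmapAnd and bitmapOr are those named in the present application; the mapping table, the helper name combine_query_codes, the alias target_bitmap and the exact shape of the generated SQL are assumptions made for illustration and have not been run against a database.

```python
# Boolean logical relationship -> bitmap function name.
BITMAP_FUNCTIONS = {"AND": "bitmapAnd", "OR": "bitmapOr"}


def combine_query_codes(query_codes, relation):
    """Nest query codes inside the bitmap function for a Boolean relationship."""
    if len(query_codes) < 2:
        raise ValueError("at least two query codes are required")
    func = BITMAP_FUNCTIONS[relation]
    combined = f"({query_codes[0]})"
    # The functions take two arguments, so further codes are folded in pairwise.
    for code in query_codes[1:]:
        combined = f"{func}({combined}, ({code}))"
    return f"SELECT {combined} AS target_bitmap"


combined_code = combine_query_codes(["SELECT b1", "SELECT b2", "SELECT b3"], "AND")
```

Each query code is kept intact as a subquery, so the combined code both runs the individual queries and applies the logical operation to their results.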
Examples of high-efficiency compressed bitmap functions include the bitmapAnd, bitmapOr, groupBitmapAnd, groupBitmapOr, bitmapAndCardinality, groupBitmapOrState, groupBitmapAndState, and groupBitmapXor functions. The bitmapAnd function performs an AND operation on two bitmap objects (high-efficiency compressed bitmaps) and returns a new bitmap object; from the operation point of view, the AND is carried out bit by bit, and from the SQL point of view, the conditions of both parts are satisfied. The bitmapOr function performs a bitwise OR operation on two bitmap objects and returns a new bitmap object. The groupBitmapAnd function takes the intersection of a plurality of bitmap objects and returns the number of elements in the bitmap object obtained after intersection. The groupBitmapOr function takes the union of a plurality of bitmap objects, removes duplicates, and returns the number of elements in the obtained bitmap object. The bitmapAndCardinality function performs an AND operation on two bitmap objects and returns the cardinality of the result bitmap (i.e., the number of elements in the result bitmap). The groupBitmapOrState function performs an OR operation on a plurality of bitmap objects and returns the resulting bitmap object. The groupBitmapAndState function performs an AND operation on a plurality of (two or more) bitmap objects and returns the resulting bitmap object. The groupBitmapXor function removes the values repeated in any two of the bitmap objects and merges the remaining values (the operation is applied once for every two bitmap objects), and returns the final number of elements. Of course, the above is merely an illustrative example of high-efficiency compressed bitmap functions and should not be considered as limiting the scope of use of the present application.
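The set semantics of several of these functions may be illustrated in Python as follows. Python frozensets stand in for bitmap objects, and the snake_case function names are hypothetical counterparts of the functions named above; the sketch follows the descriptions in the text and is not a binding to any database implementation.

```python
from functools import reduce


def bitmap_and(a, b):
    """Counterpart of bitmapAnd: intersection, returned as a new bitmap."""
    return a & b


def bitmap_or(a, b):
    """Counterpart of bitmapOr: union, returned as a new bitmap."""
    return a | b


def bitmap_and_cardinality(a, b):
    """Counterpart of bitmapAndCardinality: size of the intersection."""
    return len(a & b)


def group_bitmap_and_state(bitmaps):
    """Counterpart of groupBitmapAndState: intersection of many bitmaps."""
    return reduce(lambda x, y: x & y, bitmaps)


def group_bitmap_or_state(bitmaps):
    """Counterpart of groupBitmapOrState: union of many bitmaps."""
    return reduce(lambda x, y: x | y, bitmaps)


def group_bitmap_xor(bitmaps):
    """Counterpart of groupBitmapXor: size of the pairwise exclusive-OR."""
    return len(reduce(lambda x, y: x ^ y, bitmaps))


a = frozenset({1, 2, 3})
b = frozenset({2, 3, 4})
c = frozenset({3, 4, 5})
```

The functions that return a count (for example groupBitmapAnd) correspond to taking the length of the matching "State" result.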
In the process of executing the combined code, the first high-efficiency compressed bitmap obtained by executing each query code is invoked, and the corresponding logical operation is performed on the first high-efficiency compressed bitmaps based on the corresponding high-efficiency compressed bitmap function in the combined code to obtain the target high-efficiency compressed bitmap. It will be appreciated that the target high-efficiency compressed bitmap is a high-efficiency compressed bitmap of the object identifiers corresponding to all objects that satisfy the conditions defined in the query information.
In the scheme of the present application, the condition defining information corresponding to the query request is split into at least two query conditions, and a query code is generated for each query condition; executing the query code corresponding to each query condition yields a first high-efficiency compressed bitmap satisfying that query condition. The query codes are then combined according to the Boolean logic relationship between the at least two query conditions, so that the resulting combined code can perform a Boolean logic operation on the first high-efficiency compressed bitmaps corresponding to the at least two query conditions according to that relationship, thereby obtaining the target high-efficiency compressed bitmap of the object set satisfying the conditions defined by the condition defining information. Because the conditions defined by the condition defining information are split into at least two query conditions and queried block by block using the corresponding query codes, the query difficulty and complexity are reduced and the query efficiency is improved compared with directly querying the conditions defined by the condition defining information as a whole.
Moreover, in the scheme, the data is subjected to efficient bitmap compression in advance and stored in the efficient compression bitmap data table, so that the data storage pressure is greatly reduced, in the data query process, the logical operation is performed based on the efficient compression bitmap in the efficient compression bitmap data table, and compared with the data table operation, the logical operation on the efficient compression bitmap is faster in operation speed and higher in efficiency, so that the data query efficiency can be improved on the whole.
In some embodiments of the present application, the query information further includes classification statistical information indicating a target information item that needs to be counted according to the attribute value; in this embodiment, the data processing method further includes: generating a classification statistic indication code according to the classification statistic information; after step 240, the data processing method further includes: and updating the combined code according to the classification statistic indication code, wherein the updated combined code is used for classifying the object set indicated by the target high-efficiency compression bitmap according to the attribute value of the target information item.
In some application scenarios, after an object set meeting the conditions defined by the query information is queried, data classification statistics further needs to be performed according to the attribute values.
For example, suppose that, in addition to the number of users who registered for product B on date A, the number of users who registered in each area on date A is also of interest. In this case, the classification statistical information indicates that the users who registered for product B on date A are to be classified by the area to which each user belongs, and that the number of users who registered for product B on date A is to be counted for each area. In this example, the target information item, i.e., the information item whose attribute values need to be expanded, is the area to which the user belongs.
Like the query code, the classification statistic indication code may also be SQL code. The updated combined code is obtained by incorporating the classification statistic indication code into the combined code.
By updating the combined code with the classification statistic indication code, executing the updated combined code yields, on the one hand, the target high-efficiency compressed bitmap of the target object set satisfying the condition defining information and, on the other hand, the high-efficiency compressed bitmap of the sub-object set corresponding to each attribute value in the target object set. Then, by means of a function among the high-efficiency compressed bitmap functions that counts the cardinality of a bitmap object, the number of objects satisfying the condition defining information and the number of objects in the sub-object set corresponding to each attribute value in the target object set can be obtained correspondingly.
In some embodiments of the present application, after the number of objects satisfying the condition defining information and/or the number of objects in the sub-object set corresponding to each attribute value in the target object set is obtained, the result can be displayed visually, so that the user can intuitively understand the query result.
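The classification statistics described above may be sketched in Python as follows, with Python sets standing in for bitmaps. The function name classify_and_count and the sample area names are hypothetical; the sketch assumes that one bitmap is stored per attribute value of the target information item.

```python
def classify_and_count(target, bitmaps_by_value):
    """Intersect the target set with each attribute value's set and count."""
    return {value: len(target & ids) for value, ids in bitmaps_by_value.items()}


# Users who registered for product B on date A (the target object set).
target = {1, 2, 3, 5}
# One set of user identifiers per area (the target information item).
by_region = {"north": {1, 2, 9}, "south": {3, 4}, "east": {5}}

total = len(target)
counts = classify_and_count(target, by_region)
```

Each per-area count is the cardinality of an intersection, which is the kind of result a cardinality-returning bitmap function would give.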
Fig. 3 is a flowchart illustrating a data processing method according to an embodiment of the present application, and in the embodiment illustrated in fig. 3, the data processing method is implemented by a client and a server interactively.
Referring to fig. 3, the client first performs step 310 to obtain the query information. An interactive interface for query information input is provided in the client, in which a user can input conditions for defining an object to be queried. Further, in the interactive interface, the user can perform designation of an information item and designation of an attribute value of the information item, for example, in fig. 3, the user can perform designation of an event (information item) and designation of an attribute value of the event through a control of "+ add event", the control of "+ add filter condition" designates other information items and designates attribute values of the other information items; the user may also specify the attribute value (i.e., the specific date) corresponding to the information item "date" in the "date selected" option. By detecting the operation triggered by the user in the interactive interface, the query information input by the user can be correspondingly determined.
In some embodiments, the query information may be information indicating controls triggered by the user in the interactive interface and inputs in the options. The server can determine the information item designated by the user and the designated attribute value according to the control triggered by the user in the query information and the input in the option, and further determine the condition of the defined object.
Step 320, a query task identifier is generated. After detecting a new input operation for a query, the client generates the query task identifier, and stores the query task identifier and the query information in a specified information table in an associated manner. The client and the server share the specified information table, so that both the client and the server can access it.
Step 330, a query request is sent. And the client sends the query request to the server based on the generated query task identifier, wherein the query request carries the query task identifier.
After the server receives the query request, step 340 is performed to read the query information. Specifically, the server obtains the query information associated with the query task identifier from the specified information table according to the query task identifier carried in the query request. For example, the query information is obtained from the specified information table according to the query task identifier by means of get_task_meta.
Step 350, generating a combined code. After the server obtains the query information, the server may generate a combination code according to the processes of steps 210 to 240 of the above embodiment; further, if the query information further includes classification statistical information, a classification statistical indication code is generated according to the classification statistical information, and the combination code is updated according to the classification statistical indication code.
After the combined code is generated, the server stores the combined code corresponding to the query information in the specified information table in association with the query task identifier, and executes step 360 to return storage indication information. The storage indication information is used for feeding back that the combined code has been successfully stored in the specified information table.
After the client receives the storage indication information, step 370 is executed to read the combined code, that is, the combined code associated with the query task identifier is read from the specified information table according to the query task identifier. In a specific embodiment, a program for reading the combined code from the specified information table, for example get_task_sql, is deployed in the client, and step 370 is performed by running this program.
Step 380, executing the combined code. By executing the combined code, a target high-efficiency compressed bitmap satisfying the condition defining information can be obtained; furthermore, if a classification statistic indication code has been incorporated into the combined code, the object set indicated by the target high-efficiency compressed bitmap can be further classified according to the attribute values of the target information item, and the high-efficiency compressed bitmap and/or the number of objects corresponding to each attribute value of the target information item can be obtained.
Step 390, page rendering. The client can perform visual display of query results by performing page rendering, where the query results include the number of elements in a target efficient compression bitmap, that is, the number of objects satisfying condition definition information, and the number of objects corresponding to each attribute value in a target information item in the objects satisfying the condition definition information.
In this embodiment, the client executes the combined code generated by the server to obtain the query result satisfying the query information, so as to reduce the service pressure of the server, and since the combined code executed by the client performs a logical operation based on the high-efficiency compressed bitmap in the high-efficiency compressed bitmap data table, the amount of operation is greatly reduced compared with the data table that is not efficiently compressed, and therefore, the client with limited processing capability can also execute the combined code, thereby ensuring the speed and efficiency of data query.
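The interaction of steps 310 to 370 may be sketched in Python as follows. An in-memory dictionary stands in for the specified information table shared by the client and the server, which is an assumption of this sketch. The names get_task_meta and get_task_sql follow the program names mentioned above, while client_create_task, server_handle_request and the record layout are hypothetical.

```python
import uuid

# Stand-in for the specified information table shared by client and server.
task_table = {}


def client_create_task(query_info):
    """Steps 310-320: store the query information under a new task identifier."""
    task_id = uuid.uuid4().hex
    task_table[task_id] = {"meta": query_info, "sql": None}
    return task_id


def get_task_meta(task_id):
    """Step 340: read the query information for a task identifier."""
    return task_table[task_id]["meta"]


def get_task_sql(task_id):
    """Step 370: read the combined code for a task identifier."""
    return task_table[task_id]["sql"]


def server_handle_request(task_id, generate_combined_code):
    """Steps 340-360: generate and store the combined code, then acknowledge."""
    query_info = get_task_meta(task_id)
    task_table[task_id]["sql"] = generate_combined_code(query_info)
    return {"task_id": task_id, "stored": True}


task_id = client_create_task({"conditions": [("age", "18-28"), ("product", "A")]})
reply = server_handle_request(
    task_id,
    lambda info: f"-- combined code for {len(info['conditions'])} conditions",
)
combined_code = get_task_sql(task_id)
```

Only the task identifier travels in the request; the query information and the combined code are exchanged through the shared table.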
The scheme of the present application can be applied to an analysis platform, in which the method of the present application can be used to perform retention analysis, event analysis, user group analysis, funnel analysis, and the like for the users of a software application. That is, the query request in the present application can be initiated for retention analysis, user group analysis, funnel analysis, or event analysis; in other words, the query request can be at least one of an event analysis request, a retention analysis request, a funnel analysis request, and a user group analysis request.
Retention analysis is an analysis model for analyzing the degree of user participation or activity; it examines how many of the users who performed an initial behavior go on to perform a subsequent behavior. Event analysis studies user behavior events, that is, user behaviors or business processes that are tracked or recorded, such as user registration, browsing a product detail page, placing an order, and payment, and mines the reasons behind these events, their interaction influences, and the like by studying the factors associated with the events. User group analysis divides users with the same attributes into a group according to attributes such as historical behavior paths, behavior characteristics, and preferences, and then analyzes the behavior of the users in the group. Funnel analysis analyzes the conversion and loss at each step of a user behavior path, performs refined multi-dimensional analysis on the paths with relatively high loss, and finds the points of loss so as to improve conversion. Funnel analysis is a kind of process data analysis that can scientifically reflect user behavior and the conversion and loss at each stage from the starting point to the end point.
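How retention analysis and funnel analysis reduce to set operations on user identifiers may be sketched in Python as follows. Python sets stand in for bitmaps, the function names and sample data are hypothetical, and the funnel sketch deliberately ignores the time order of events within a step, which a real analysis would have to respect.

```python
def retention_rate(initial_users, returning_users):
    """Share of users with the initial behavior who also show the later one."""
    if not initial_users:
        return 0.0
    return len(initial_users & returning_users) / len(initial_users)


def funnel_counts(steps):
    """Count the users remaining after each step of an ordered funnel."""
    remaining = None
    counts = []
    for users in steps:
        # A user stays in the funnel only if present in every step so far.
        remaining = set(users) if remaining is None else remaining & users
        counts.append(len(remaining))
    return counts


registered = {1, 2, 3, 4}
active_next_day = {2, 3, 7}
rate = retention_rate(registered, active_next_day)

# Browsed a detail page, placed an order, paid.
funnel = funnel_counts([{1, 2, 3, 4, 5}, {2, 3, 4, 6}, {3, 4}])
```

Both analyses need only intersections and cardinalities, which is why they fit queries over bitmaps.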
In this application scenario, the objects in the above embodiments are users of products, such as users of news applications, users of video applications, users of shopping applications, and the like. Correspondingly, an initial object set is a user set determined by classifying users according to attribute values, where the user set includes the user identifiers of the users having the same attribute value; the high-efficiency compressed bitmap of the initial object set is obtained by performing high-efficiency bitmap compression on the user identifiers in the initial object set.
In this application scenario, the data related to a user generally includes user operation data reflecting the dynamic behavior of the user in the product and user attribute data representing basic information of the user. The users may be classified by combining the user operation data and the user attribute data, so as to determine the high-efficiency compressed bitmap of each initial object set (which may be regarded as an initial user set), and the high-efficiency compressed bitmaps so obtained serve as the data basis for performing analysis (e.g., retention analysis, event analysis).
In some embodiments of the present application, as shown in fig. 4, before step 220, an efficient compression bitmap data table needs to be generated in advance, which may specifically include the following processes of steps 410 to 430.
In step 410, user operation data and user attribute data are obtained, wherein the user operation data is used for indicating the interaction behavior of the user on the user interface of the product.
Embedded points are set and issued to the client where the user is located. When the user triggers an operation for an embedded point in the user interface of the client, the embedded point reports control information (such as a control identifier, position information of the control in a page, and the like) of the control triggered by the user to the server. The data reported for the embedded points by the client where the user is located can be used as the user operation data.
In some embodiments of the present application, the user operation data may be collected for one or more software applications, in other words, the user operation data may indicate an interactive behavior performed on a user interface of one product or may indicate an interactive behavior performed on user interfaces of multiple products.
The user attribute data may include registration information of the user in the software application, such as user age, year and month of birth, location, registration time, user nickname, constellation, and the like.
And step 420, performing high-efficiency bitmap compression on the first user identification corresponding to the users with the same attribute value according to the attribute value of each field in the user operation data and the attribute value of each field in the user attribute data to obtain a high-efficiency compressed bitmap of the corresponding attribute value.
Step 430, storing the obtained high-efficiency compression bitmap and the corresponding attribute values in a high-efficiency compression bitmap data table.
In some embodiments of the present application, the data table may be constructed according to the attribute value of each field in the user operation data and the attribute value of each field in the user attribute data. In this embodiment, the high efficiency compressed bitmap data table includes a second data table and a third data table; in the present embodiment, please continue to refer to fig. 4, wherein step 420 further includes steps 421-424.
Step 421, generating a user operation table according to the user operation data.
Step 422, a user attribute table is generated according to the user attribute data, and the user operation table and the user attribute table include first user identifiers corresponding to the users.
To generate the user operation table and the user attribute table, fields are defined in each table, the information items in the user operation data are associated with the fields in the user operation table, and the information items in the user attribute data are associated with the fields in the user attribute table. The fields are then assigned values according to the attribute values of the corresponding information items in the user operation data to obtain the user operation table, and according to the attribute values of the corresponding information items in the user attribute data to obtain the user attribute table.
In some embodiments of the present application, the user attribute table and the user operation table are narrow tables, so that the number of fields defined in the user operation table and the user attribute table can be reduced, and in the subsequent user classification process according to the attribute values of the fields, the number of categories can be reduced, and correspondingly, the number of high-efficiency compressed bitmaps in the initial user set is reduced, thereby further reducing the storage pressure. In some embodiments of the present application, fields in the user attribute table and fields in the user operation table may be further defined as more uniform fields.
In some embodiments of the present application, some information items in the user operation data and the user attribute data obtained in step 410 are of no concern in the subsequent data analysis process. Therefore, before step 420, the user operation data and the user attribute data may also be cleaned and filtered: the attribute values of the information items that need attention in the subsequent analysis are retained, and the attribute values of the other information items, which are not involved in the product analysis, are filtered out.
The user operation table and the user attribute table respectively comprise first user identifications corresponding to the users, so that the information in the user operation table and the information in the user attribute table are related through the first user identifications corresponding to the users in the two tables.
Step 423, according to the attribute value of each field in the user operation table, performing high-efficiency bitmap compression on the first user identifiers corresponding to the same attribute value in the user operation table to obtain a second high-efficiency compressed bitmap associated with the corresponding field and the corresponding attribute value.
Step 424, according to the attribute value of each field in the user attribute table, performing high-efficiency bitmap compression on the first user identifiers corresponding to the same attribute value in the user attribute table to obtain a third high-efficiency compressed bitmap associated with the corresponding field and the corresponding attribute value.
In this embodiment, step 430 includes: step 431, storing the second high-efficiency compressed bitmap, the corresponding fields and the corresponding attribute values in a second data table; step 432, storing the third high-efficiency compressed bitmap, the corresponding fields and the corresponding attribute values in a third data table.
In some embodiments of the present application, efficient bitmap compression may be performed on the first user identifiers corresponding to the same attribute value through an aggregation function, that is, multiple elements are aggregated into one bitmap object (an efficient compressed bitmap). The "same attribute value" may be a single attribute value of one information item, or a combination of two or more attribute values. For example, the first user identifiers of the users whose attribute value under the age information item in the user attribute table is 18 to 25 years old may be aggregated into one bitmap object, yielding the efficient compressed bitmap corresponding to that attribute value; as another example, the first user identifiers of the users who are aged 18-25 and registered in North China may be aggregated into one bitmap object.
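The aggregation described above can be sketched as follows. This is a minimal illustration, not the method of the application: a plain Python integer stands in for the efficient compressed bitmap (no compression is performed), and the function names `build_bitmaps` and `members` as well as the sample rows are hypothetical.

```python
from collections import defaultdict


def build_bitmaps(rows):
    """Aggregate first user identifiers into one bitmap per (attribute, value).

    A Python int is used as an uncompressed stand-in for an efficient
    compressed bitmap: bit i is set when first user identifier i is present.
    """
    bitmaps = defaultdict(int)
    for attribute, value, first_user_id in rows:
        bitmaps[(attribute, value)] |= 1 << first_user_id
    return dict(bitmaps)


def members(bitmap):
    """Return the sorted first user identifiers stored in a bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical rows: (attribute, attribute value, first user identifier).
rows = [
    ("age", "18-25", 1),
    ("age", "18-25", 5),
    ("age", "26-35", 2),
    ("region", "North China", 1),
    ("region", "North China", 2),
]
bitmaps = build_bitmaps(rows)

# Users who are both aged 18-25 and registered in North China.
both = bitmaps[("age", "18-25")] & bitmaps[("region", "North China")]
print(members(bitmaps[("age", "18-25")]))
print(members(both))
```

A combination of two attribute values is obtained here by intersecting two single-value bitmaps; aggregating directly on the combined key would give the same set of users.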
In some embodiments of the present application, in steps 423 and 424, in order to control the number of the efficiently compressed bitmaps of the efficiently compressed bitmap data table, the user may be classified according to the attribute value of the specified field, and the first user identifier corresponding to the same attribute value under the specified field in the user operation table (and the user attribute table) may be efficiently bitmap-compressed without performing efficient bitmap compression on the attribute value under each field in the user operation table and/or the user attribute table. The designated field may be set according to actual needs, and is not limited herein. For example, since the analysis process involves performing classification statistics based on the user operation behavior and the user operation time, a field indicating the user operation behavior and a field indicating the user operation time may be used as the designated fields.
It is worth mentioning that the object to be subjected to the high-efficiency bitmap compression is unsigned integer data, and therefore, before the high-efficiency bitmap compression is performed, it needs to be ensured that the first user identifier to be subjected to the high-efficiency bitmap compression is in a format supporting the high-efficiency bitmap compression, that is, unsigned integer data.
For a user of a software application, the user operation data is updated continuously as time advances, whereas the user attribute data is generally updated less frequently.
In some embodiments of the present application, the user operation table may also be constructed only for user operation data, so that based on the attribute values of the fields in the user operation table, the first user identifiers corresponding to the same attribute values in the user operation table are subjected to efficient bitmap compression. In other embodiments of the present application, the user operation data and the user attribute data may also be combined into the same data table, so that the first user identifier corresponding to the same attribute value is subjected to efficient bitmap compression based on the attribute value of each field in the data table, and may be specifically set according to actual needs.
In some embodiments of the present application, the user operation data and the user attribute data comprise a user identification; prior to steps 421 and 422, the method further comprises: and if the user identifier is in a format which does not support the efficient bitmap compression, generating a first user identifier corresponding to each user identifier according to a specified format which supports the efficient bitmap compression.
In this embodiment, step 421 further comprises: generating a user operation table according to the first preset field, the user operation data and the first user identification corresponding to each user identification; step 422 further includes: and generating a user attribute table according to the second preset field, the user attribute data and the first user identification corresponding to each user identification.
The format of the generated first user identifier is a format that supports efficient bitmap compression. The first preset field is a field defined for the user operation table; the second preset field is a field defined for the user attribute table. Through the above process, it can be ensured that the first user identifiers corresponding to the users in the user attribute table and the user operation table are in a format on which efficient bitmap compression can be performed.
In some embodiments of the present application, in order to avoid affecting other data after replacing the user identifier with the first user identifier, other fields different from the field corresponding to the user identifier may be defined in the user operation table and the user attribute table to represent the first user identifier.
In some embodiments of the present application, the method further comprises: counting the first user identifiers to obtain an accumulated number; if the accumulated number reaches a set number threshold, acquiring the first user identifier corresponding to the user with the longest inactive duration; and allocating that first user identifier to the next user for whom a first user identifier is to be generated.
In this embodiment, a removal mechanism is used to remove users that are not active for a long time, and reuse the first user identifier of the removed user, so that the total number of the first user identifiers can be maintained within a range that does not exceed the set number threshold, and the space size of the high-efficiency compression bitmap in the high-efficiency compression bitmap data table is further ensured within a stable range, thereby avoiding the problem of reduced computation performance caused by the sparse high-efficiency compression bitmap due to the excessive number of the first user identifiers. Moreover, the eliminated user is the user with the longest inactive duration, and in the process of analyzing the user behavior, the analysis is generally performed based on the user which is active at present or in a period of time which is closer to the present time, so that the data analysis performed is not affected by reusing the first user identifier of the eliminated user.
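The elimination and reuse mechanism can be illustrated with a small sketch. It assumes that "inactive duration" is measured from a last-active timestamp per user; the class name `RecyclingAllocator`, its method `touch` and the threshold value are hypothetical and only serve the illustration.

```python
class RecyclingAllocator:
    """Allocate first user identifiers and reuse them once a threshold is hit."""

    def __init__(self, max_ids):
        self.max_ids = max_ids   # set number threshold (assumed value)
        self.uid_to_id = {}      # user identifier -> first user identifier
        self.last_active = {}    # user identifier -> last activity timestamp

    def touch(self, uid, timestamp):
        """Record activity for uid, allocating an identifier if needed."""
        if uid not in self.uid_to_id:
            if len(self.uid_to_id) >= self.max_ids:
                # Eliminate the user with the longest inactive duration
                # and hand that user's identifier to the new user.
                stale = min(self.last_active, key=self.last_active.get)
                freed = self.uid_to_id.pop(stale)
                del self.last_active[stale]
                self.uid_to_id[uid] = freed
            else:
                self.uid_to_id[uid] = len(self.uid_to_id) + 1
        self.last_active[uid] = timestamp
        return self.uid_to_id[uid]


alloc = RecyclingAllocator(max_ids=2)
id_a = alloc.touch("a", 10)
id_b = alloc.touch("b", 20)
alloc.touch("a", 30)           # "a" becomes the most recently active user
id_c = alloc.touch("c", 40)    # threshold reached: "b" is eliminated
print(id_a, id_b, id_c)
```

Because the number of identifiers never exceeds the threshold, the bit positions used by the bitmaps stay within a fixed range, which is the property the paragraph above relies on.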
In some embodiments of the present application, the high efficiency compressed bitmap data table further comprises a user group data table; the data processing method further comprises: determining a user identification set corresponding to a user group according to user identifications corresponding to users in the user group; determining a first user identifier corresponding to each user identifier in a user identifier set according to a mapping relation between the user identifiers and the first user identifiers to obtain a first user identifier set corresponding to a user group; performing high-efficiency bitmap compression on the first user identification in the first user identification set to obtain a high-efficiency compressed bitmap corresponding to the user group; and storing the high-efficiency compression bitmap corresponding to the user group and the user group identification corresponding to the user group in a user group data table.
In an embodiment, based on the obtained efficient compressed bitmap corresponding to the user group, retention analysis, event analysis, funnel analysis and the like can be performed on the users in the user group. Compared with performing such product analysis on all users of a product, performing it on the users of a user group is equivalent to narrowing the user scope.
The method of the present application is further described below with reference to a specific example.
FIG. 5 is a diagram illustrating a system architecture according to an embodiment of the present application. As shown in fig. 5, the system architecture includes an offline storage module 510, a compute engine 520, and an application layer 530. Each terminal (for example, the third device in fig. 1B) reports data (user operation data, user attribute data), and the offline storage module 510 stores the data offline, where the offline storage module 510 may be constructed by using a Hadoop Distributed File System (HDFS), and the offline storage module 510 may further clean and normalize the data to obtain data with a standard structure.
The calculation engine 520 is used for processing data in the offline storage module 510, and the calculation engine 520 includes a Spark calculation engine and a Clickhouse query engine. Clickhouse is an open-source columnar database for online analytical processing (OLAP) that provides a rich group of bitmap functions, so that the bitmap functions in Clickhouse can be called to operate on the efficient compressed bitmaps in the efficient compressed bitmap database. Under this system architecture, the efficient compressed bitmap of the users found to satisfy the defined conditions can be stored in Clickhouse, and the related information of each user in the efficient compressed bitmap can then be queried by means of the Clickhouse query engine. The Spark calculation engine may be used to perform user profiling and user preference analysis for a user group.
The application layer 530 provides a data overview function and analysis tools. With the data overview function, the number of users under each statistical index can be counted according to preset statistical dimensions. The statistical indexes include new addition, networking, active, retention and the like: the new addition index counts the number of new users in each time period, the networking index counts the number of networked users in each time period, the active index counts the number of users actively accessing the application in each time period, and the retention index counts the number of retained users in each time period. The analysis tools may be used to perform event analysis, funnel analysis, retention analysis, and user group analysis.
Fig. 6 is a flowchart illustrating a data processing method according to an embodiment of the present application, which may be implemented by the system architecture shown in fig. 5. As shown in fig. 6, the method includes the following steps 610-674. Step 610, the terminal reports data to the HDFS.
The reported data comprises data related to user operation and/or data related to the user portrait. The main fields include time, user id, channel, version, operation event id, platform and the like. The reported data may be as shown in Table 1 below, in which "Timestamp" indicates time and uid indicates a user identifier.
TABLE 1
Timestamp | uid | Channel | Version | Operation event id | Platform | ...
1616481804 | 030216072815 | Step by step higher | 1.0.1 | 100324 | Android | ...
1616481821 | 030220000816 | Millet | 1.2.0 | 32546 | Android | ...
Step 620, data preprocessing.
The data stored in the HDFS is cleaned and filtered with the Hive tool (a Hadoop-based data warehouse tool), and the attribute values of the important fields needed in subsequent data processing are extracted, such as the first channel through which the user registered, whether the user is a new user on the current day, and the latest version used by the user.
Step 630, generating an in-product attribute table and an out-of-product attribute table.
In this embodiment, the data related to the user is divided into two parts. One part reflects the behavior of the user in the product and is called user operation data, such as the version of the product used by the user and the channel through which the software product was downloaded. The other part reflects the basic profile information of the user and is called user attribute data, such as gender and age. The in-product attribute table is then generated according to the user operation data, and the out-of-product attribute table is generated according to the user attribute data.
In this embodiment, the fields in the out-of-product attribute table and the in-product attribute table are defined according to unified fields, such as category, attribute, value, uid, ds and product. Here, category represents the category to which the attribute belongs, such as channel information or product behavior; attribute represents the attribute name, such as first channel or function point; value represents the specific attribute value, such as a channel name or an operation event id; uid denotes the user identifier; ds represents time; and product represents the product name.
The in-product attribute table includes data such as the user identifier uid and the user's product attributes (e.g., operation event, channel), and may be as shown in Table 2.
TABLE 2
category | attribute | value | uid | ds | product
Channel information | First channel | Step by step higher | 030216072815 | 20210115 | Hand tube
Product behavior | Function point | 32546 | 030220000816 | 20210115 | Photo album
The out-of-product attribute table includes the user identifier uid and information related to the user's basic portrait, and may be as shown in Table 3 below.
TABLE 3
category | attribute | value | uid | ds
Demographic attributes | Age | 6-12 years old | 030216072815 | 20210118
Demographic attributes | Sex | Female | 030220000816 | 20210119
Step 640, encoding the user identifiers.
The user identifier (uid) extracted from the data reported by the terminal is in a format that does not support efficient bitmap compression. To allow efficient bitmap compression to be performed subsequently, the user identifier is therefore encoded: the user identifier (uid) of each user is mapped to an integer id that supports efficient bitmap compression. For the sake of distinction, the integer id to which the user identifier (uid) is mapped is referred to as the first user identifier. The value types that may be selected for the first user identifier include UInt8: [0, 255], UInt16: [0, 65535] and UInt32: [0, 4294967295].
According to the mapping relationship between the user identifiers and the first user identifiers, an identifier encoding table as shown in Table 4 below can be obtained. In a specific embodiment, the first user identifiers may be assigned sequentially, which makes it easier to count and allocate them.
TABLE 4
Id (first user identifier) | Uid (user identifier)
1 | 030216072815
2 | 030220000816
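The sequential encoding behind Table 4 can be sketched as below. This is an illustrative assumption of how such a mapping might be kept; the class name `IdEncoder` and its method `encode` are hypothetical, and the UInt32 upper bound is taken from the value ranges listed in step 640.

```python
class IdEncoder:
    """Assign sequential unsigned integers to user identifiers (uid)."""

    UINT32_MAX = 4294967295  # upper bound of the UInt32 range

    def __init__(self):
        self.uid_to_id = {}
        self.next_id = 1  # Table 4 starts numbering at 1

    def encode(self, uid):
        """Return the first user identifier for uid, creating it if absent."""
        if uid not in self.uid_to_id:
            if self.next_id > self.UINT32_MAX:
                raise OverflowError("UInt32 range exhausted")
            self.uid_to_id[uid] = self.next_id
            self.next_id += 1
        return self.uid_to_id[uid]


encoder = IdEncoder()
first = encoder.encode("030216072815")
second = encoder.encode("030220000816")
again = encoder.encode("030216072815")  # an already known uid keeps its id
print(first, second, again)
```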
Step 650, generating a user operation table and generating a user attribute table.
Based on the mapping relationship between the user identifiers and the first user identifiers in the identifier encoding table, the user identifier in the in-product attribute table generated in step 630 is converted into the first user identifier, so as to obtain the user operation table, as shown in Table 5 below.
TABLE 5
category | attribute | value | id | ds | product
Channel information | First channel | Step by step higher | 1 | 20210115 | Hand tube
Product behavior | Function point | 32546 | 2 | 20210115 | Photo album
The user identifier in the out-of-product attribute table generated in step 630 is likewise converted into the first user identifier, so as to obtain the user attribute table, as shown in Table 6 below.
TABLE 6
category | attribute | value | id | ds
Demographic attributes | Age | 6-12 years old | 1 | 20210118
Demographic attributes | Sex | Female | 2 | 20210119
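The conversion in step 650 amounts to replacing the uid column with the id column looked up in the identifier encoding table. The sketch below uses the sample values of Tables 2, 4 and 5; the dictionary layout and the function name `to_operation_row` are assumptions made for illustration.

```python
# Identifier encoding table (Table 4): uid -> first user identifier.
id_table = {"030216072815": 1, "030220000816": 2}

# Rows of the in-product attribute table (Table 2).
in_product_rows = [
    {"category": "Channel information", "attribute": "First channel",
     "value": "Step by step higher", "uid": "030216072815",
     "ds": 20210115, "product": "Hand tube"},
    {"category": "Product behavior", "attribute": "Function point",
     "value": "32546", "uid": "030220000816",
     "ds": 20210115, "product": "Photo album"},
]


def to_operation_row(row, id_table):
    """Replace the uid field of a row with the mapped first user identifier."""
    converted = {key: val for key, val in row.items() if key != "uid"}
    converted["id"] = id_table[row["uid"]]
    return converted


user_operation_table = [to_operation_row(r, id_table) for r in in_product_rows]
for row in user_operation_table:
    print(row["attribute"], row["value"], row["id"])
```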
Step 660, generating a second data table and a third data table.
The user operation table and the user attribute table generated in step 650 are imported from the Hive tool into the Clickhouse system. Then, according to the attribute values of the fields in the user operation table and the user attribute table, the aggregation function groupBitmapState in the Clickhouse system is used to perform efficient bitmap compression on the ids (first user identifiers) of the users belonging to each attribute value. Performing efficient bitmap compression on the first user identifiers corresponding to the same attribute value in the user operation table yields the second data table, which may be as shown in Table 7 below.
TABLE 7
category | attribute | value | bmp | ds | product
Channel information | First channel | Step by step higher | {1,100,2435} | 20210115 | Hand tube
Product behavior | Function point | 32546 | {2,36,117} | 20210115 | Photo album
Performing efficient bitmap compression on the first user identifiers corresponding to the same attribute value in the user attribute table yields the third data table, which may be as shown in Table 8 below.
TABLE 8
category | attribute | value | bmp | ds
Demographic attributes | Age | 6-12 years old | {1,5,10,…} | 20210118
Demographic attributes | Sex | Female | {2,56,109,…} | 20210119
Here, the field bmp holds the efficient compressed bitmap. For example, {1,100,2435} in the field bmp in Table 7 indicates that the three first user identifiers 1, 100 and 2435 are stored in bit form in the AggregateFunction data type (over UInt32 values) of the Clickhouse system, and so on. AggregateFunction is a special data type provided by the Clickhouse system that stores intermediate state results in binary form. For a column of the AggregateFunction type, writing and querying differ from ordinary data: a State function must be called when data is written, and the corresponding Merge function must be called when data is queried.
Based on the obtained second data table and third data table, functions such as data overview, product analysis and user group analysis can be provided. A metric system can be built with the Spark calculation engine and the Clickhouse query engine to help people understand the current state of a product and discover changes in the metrics. Once such information is available, the product data can be analyzed to break down the causes of an abnormality: for example, event analysis can show whether an abnormality in newly added users is related to a certain channel, and funnel analysis can reveal problems in product functions. More exploratory analyses can also be performed, for example user group analysis to find the characteristics of different user groups, which helps operators make operation decisions and carry out intelligent operation.
The data overview function provides the basic data of a product, such as overall user activity, newly added users and retained users. In this part, users can be counted according to statistical indexes such as new addition, networking, active and retention.
Specifically, based on the second data table and the third data table, the common new-addition, networking, active and retention metrics, together with their common dimensions, can be calculated using the group of bitmap functions (efficient compressed bitmap functions) built into the Clickhouse system, and the calculation results are stored in a table of the Clickhouse system. For example, bitmap functions can be used to count common activity metrics such as daily, weekly and monthly active users, as well as metrics broken down by different attributes; retention-related metrics can be calculated with bitmap functions such as bitmapOrCardinality. Finally, the results are displayed visually to provide products with monitoring data for the common metrics.
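The idea of storing an intermediate state at write time and merging states at query time can be pictured with the following sketch. It does not reproduce Clickhouse internals: plain Python integers stand in for the stored binary states, and the function names `bitmap_state`, `bitmap_merge` and `cardinality` are hypothetical.

```python
from functools import reduce


def bitmap_state(first_user_ids):
    """Build an intermediate state from a batch of ids (write side)."""
    state = 0
    for first_user_id in first_user_ids:
        state |= 1 << first_user_id
    return state


def bitmap_merge(states):
    """Combine several stored intermediate states (query side)."""
    return reduce(lambda left, right: left | right, states, 0)


def cardinality(bitmap):
    """Number of first user identifiers contained in a bitmap."""
    return bin(bitmap).count("1")


# Two batches written separately for the same attribute value.
stored_states = [bitmap_state([1, 100]), bitmap_state([100, 2435])]
merged = bitmap_merge(stored_states)
print(cardinality(merged))
```

Identifier 100 appears in both batches but is counted once after merging, which is why the merged state rather than a sum of per-batch counts is used for deduplicated user numbers.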
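The metric calculations described above reduce to set operations on per-day user bitmaps. In the sketch below, Python sets stand in for the efficient compressed bitmaps, and the sample data and function names (`daily_active_users`, `active_in_period`, `retained_users`) are hypothetical.

```python
# Hypothetical per-day bitmaps of active users (first user identifiers).
daily_active = {
    20210416: {1, 2, 3},
    20210417: {2, 3, 4},
    20210418: {3, 5},
}


def daily_active_users(ds):
    """Daily activity: cardinality of one day's bitmap."""
    return len(daily_active[ds])


def active_in_period(days):
    """Activity over several days: cardinality of the OR of the bitmaps."""
    union = set()
    for ds in days:
        union |= daily_active[ds]
    return len(union)


def retained_users(start_ds, later_ds):
    """Retention: cardinality of the AND of two days' bitmaps."""
    return len(daily_active[start_ds] & daily_active[later_ds])


print(daily_active_users(20210416))
print(active_in_period([20210416, 20210417, 20210418]))
print(retained_users(20210416, 20210417))
```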
Based on the generated second data table and the third data table, a product analysis tool can be further constructed, wherein the product analysis tool comprises an event tool, a funnel tool and a retention tool. The event tool is used for performing event analysis, the funnel tool is used for performing funnel analysis, and the retention tool is used for performing retention analysis. The event tool can specify different screening conditions according to specified events (operation behaviors) and check the number of users according to a certain dimension.
The client provides an interactive interface for event analysis, in which the event, the screening conditions and the expansion dimension (the attribute whose values are to be expanded) can be specified. Steps 671-674 in fig. 6 show the flow of a block query in event analysis. As shown in fig. 6, once the event, the conditions (screening conditions) and the data expansion dimension have been specified, the client uploads the query information to the server, which parses it to determine the query conditions (the specified event and the specified conditions) and the classification statistics indication information. The server then executes step 671, calculating the user bitmap of the specified event; step 672, calculating the user bitmap of the specified conditions; and step 673, calculating the user bitmap of each dimension value under the specified dimension. Then, based on the Boolean logic relationship between the query conditions, step 674 is executed, in which logical operations are performed on the user bitmaps using bitmap functions. Step 674 yields the user bitmap (target efficient compressed bitmap) that satisfies the conditions defined by the condition defining information, as well as the user bitmap corresponding to each attribute value after expansion along the data expansion dimension. Finally, the results of step 674 can be displayed visually.
Fig. 7 is a diagram illustrating a newly created event analysis interface according to an embodiment, where the interface includes an event analysis information specification area 710, an event analysis display setting area 720, and an event analysis result display area 730. The event analysis information specification area 710 provides a control for adding an event, that is, the "+ add event" control, where an event is reflected by the behavior of the user; "networking" is selected in fig. 7. The event analysis information specification area 710 also provides a control for adding a filter condition, that is, the "+ add filter condition" control, and a control for specifying a date; the dates specified in fig. 7 are 20210216-20210221. The event analysis information specification area 710 further provides a control "+ add dimension" for adding a viewing dimension. The added event, the added filter condition and the selected date together form the condition defining information indicated by the event analysis request, and the added viewing dimension may be regarded as the classification statistics indication information indicated by the event analysis request. Therefore, after the client initiates the event analysis request, the server can parse the condition defining information accordingly and decompose the condition it defines into at least two query conditions.
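Steps 671-674 can be pictured with the sketch below, in which each block of the query produces a user bitmap and the final step combines them. Python sets stand in for the efficient compressed bitmaps; the sample bitmaps, the AND relationship between the event and the condition, and the function name `event_analysis` are assumptions for illustration.

```python
# Step 671: user bitmap of the specified event (hypothetical data).
event_bmp = {1, 2, 3, 4}
# Step 672: user bitmap of the specified screening condition.
condition_bmp = {2, 3, 4, 5}
# Step 673: user bitmap of each dimension value under the specified dimension.
dimension_bmps = {"8.10.1": {2, 9}, "8.10.0": {3, 4, 7}}


def event_analysis(event_bmp, condition_bmp, dimension_bmps):
    """Step 674: combine the block results with logical operations."""
    # Assumed Boolean relationship: event AND screening condition.
    target_bmp = event_bmp & condition_bmp
    expanded = {
        value: target_bmp & users for value, users in dimension_bmps.items()
    }
    return target_bmp, expanded


target_bmp, expanded = event_analysis(event_bmp, condition_bmp, dimension_bmps)
print(sorted(target_bmp))
print({value: len(users) for value, users in expanded.items()})
```

Each block can be computed and checked on its own, and only the compact bitmaps are combined at the end, which matches the block query idea described above.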
The event analysis display settings area 720 in the interface of FIG. 7 provides display options for the user to select a display mode, e.g., a trend chart, a bar chart, a pie chart, a table, etc., may be displayed, and further dimensions may be set via the option "display settings". The event analysis result display area 730 is used for displaying the queried results according to the display mode selected by the user, in fig. 7, the view dimension of viewing according to the version is selected, and the result display area exemplarily shows a user number trend chart of each day in the time period of 20210216-20210221 for the event of networking under two versions, the version number is 8.10.1 and the version number is 8.10.0.
In a scenario where the query request is an event analysis request, the server may, while parsing the condition defining information, decompose the condition it defines into a specified event and a screening condition, so that the query process can be divided into blocks: finding the user bitmap of the event selected by the user; finding the bitmap of the users who satisfy the screening condition; finding the user bitmap corresponding to each dimension value under the selected dimension; performing logical operations with the rich group of bitmap functions provided by Clickhouse to obtain the statistical result; and finally displaying a trend graph through front-end visualization.
The retention tool is an analysis model for analyzing user engagement and activity: it examines how many of the users who performed an initial behavior go on to perform a subsequent behavior. In practice it differs slightly from the event tool in that it has the concepts of a start behavior and a retention behavior, but both are in fact events, so the task is likewise to find the user bitmap of an event.
Fig. 8 is a schematic diagram of an interface for creating a retention analysis according to an embodiment of the present application, where the interface includes a retention analysis information specification area 810 and a retention analysis result display area 820. As shown in fig. 8, the retention analysis information specification area 810 provides options for specifying a start behavior and a retention behavior, and also provides an option for adding a filter condition, an option for specifying a date, and an option for specifying a viewing dimension. As with event analysis, the start behavior, the retention behavior, the added filter condition and the date specified by the user in the retention analysis information specification area 810 correspond to the conditions defined by the condition defining information indicated by the retention analysis request. The viewing dimension added by the user may be regarded as the classification statistics indication information indicated by the retention analysis request.
The retention analysis results display area 820 is used to display the results of the retention analysis, the retention analysis results display area 820 providing a display of the retention/change in retention per day for a specified period of time in terms of a table and a trend graph.
The retention analysis process is described in detail below with reference to a specific embodiment. If the information items defined by the user are as follows:
product_id = "manager"; start_event = "E_socket_Slip"; remaining_event = "E_Accelerate_Show"; condition = "version info_version_include_8.12.0"; ds_condition = 20210416_20210422; date_format = "day"; attribute_name = "demographic attribute_age".
Here, product_id represents the product identifier, start_event represents the start behavior, and remaining_event represents the retention behavior; condition indicates the screening condition; ds_condition represents the time period, and date_format indicates that statistics are computed by day.
After receiving the query request, the server obtains the query information associated with the query task identifier, and decomposes the condition defined by the condition defining information in the query information (that is, the condition defined by product_id, start_event, remaining_event, condition and ds_condition above) into two query conditions: 1) the user performed the start behavior and satisfies the screening condition (that is, the condition defined by start_event, product_id, condition and ds_condition); 2) the user performed the retention behavior in the product (that is, the condition jointly defined by product_id, remaining_event and the specified retention time). The query information also includes the classification statistics indication information, namely that the results are to be expanded by age.
On this basis, block query is performed according to the method of the application:
(1) Query the specified user objects, that is, the users who performed the start behavior and satisfy the screening condition; the efficient compressed bitmap of these users is denoted object_user_bmp.
Figure BDA0003090361450000271
In the above code, as is used to define an alias; for example, select bmp as condition_bmp means that the alias of bmp is condition_bmp. union all is used to combine the results of two select statements, keeping duplicate rows.
By executing the SQL code, the efficient compressed bitmap of the users who, on each day in the time period 20210416 to 20210422, satisfy the conditions product_id = "manager", start_event = "E_pocket_Slip" and condition = "version info_version_include_8.12.0" can be queried.
(2) Query the efficient compressed bitmap of the users corresponding to each attribute value to be expanded, which is defined as attribute_user_bmp:
select ds,attribute_name,attribute_value,bmp
from mvp.t_outer_bmp_cluster
where ds >= 20210416 and ds <= 20210422 and attribute_name = 'age'
(3) Query the efficient compressed bitmap of the users who perform the retention behavior on the subsequent days, which is defined as remaining_user_bmp:
Figure BDA0003090361450000281
By executing this code, the per-day bitmap of the users who, in the time periods 20210417-20210429 and 20210430-20210516, performed the retention behavior 'E_Accelerate_Show' in the product with product_id = 'manager' can be determined.
(4) Combining the code
Figure BDA0003090361450000282
Figure BDA0003090361450000291
Figure BDA0003090361450000301
The datediff() function is used to calculate the difference between two specified dates, that is, the second date minus the first date, expressed in the specified date part (for example, days).
The parseDateTimeBestEffort function is used to convert the time-date of the String type to a DateTime data type.
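The role these two functions play in the retention query, turning date strings into date values and then taking the day difference, can be mirrored in a few lines. This sketch only illustrates the arithmetic with the Python standard library; the function name `day_offset` is hypothetical and the ds values are assumed to be in YYYYMMDD form as in the tables above.

```python
from datetime import datetime


def day_offset(start_ds, end_ds):
    """Number of days from start_ds to end_ds, both given as YYYYMMDD."""
    fmt = "%Y%m%d"
    start = datetime.strptime(str(start_ds), fmt)
    end = datetime.strptime(str(end_ds), fmt)
    return (end - start).days


print(day_offset(20210416, 20210423))
print(day_offset(20210430, 20210501))
```

Subtracting the raw integers would give 71 for the second pair, which is why the values are parsed as dates before the difference is taken.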
The Cast () function is used to explicitly convert an expression of a certain data type to another data type. The argument of the Cast () function is an expression that includes the source value and the target data type separated by the AS key. The application method comprises the following steps: cast (expression AS data _ type), where expression denotes any valid SQServer expression, AS is used to separate two parameters, data to be processed before the AS, and data type to be converted after the AS.
data _ type refers to the type of data provided by the target system, including bigint and sql _ variable, which cannot use user-defined data types.
inner join is used to display all matching records in two or more tables to be associated according to the association condition. cross join is used to join all rows of the a table with all rows of the B table, respectively, and the number of records returned is the product of the numbers of records of the two tables. using () is used for join queries of two tables, requiring that the column specified by using () exists in both tables and uses it for the join condition.
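The difference between the join types can be shown on two tiny tables. The sketch below imitates the behavior on lists of dictionaries; the table contents and the function names `cross_join` and `inner_join_using` are hypothetical.

```python
from itertools import product

# Hypothetical tables: start users per day and retained users per day.
table_a = [{"ds": 20210416, "starts": 3}, {"ds": 20210417, "starts": 4}]
table_b = [{"ds": 20210417, "retained": 2}, {"ds": 20210418, "retained": 1}]


def cross_join(left, right):
    """Pair every row of the left table with every row of the right table."""
    return list(product(left, right))


def inner_join_using(left, right, column):
    """Keep only the pairs whose values in the shared column match."""
    return [
        {**l_row, **r_row}
        for l_row, r_row in product(left, right)
        if l_row[column] == r_row[column]
    ]


print(len(cross_join(table_a, table_b)))
print(inner_join_using(table_a, table_b, "ds"))
```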
In the combined code, calling the result attribute _ user _ bmp obtained in the step (1), the result attribute _ user _ bmp obtained in the step (2), and the result remaining _ user _ bmp obtained in the step (3) to perform corresponding logic operation to obtain the final user bitmap, and counting the number of users according to the age.
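Since the combined SQL is not reproduced in the text, the following sketch only illustrates the kind of logical operation it describes. Python sets stand in for the efficient compressed bitmaps, the retention bitmaps are assumed to be keyed by the number of days after the start day, and all sample data, the age value "13-18 years old" and the function name `retention_by_attribute` are hypothetical.

```python
# Result of step (1): users who performed the start behavior, per start day.
object_user_bmp = {20210416: {1, 2, 3}}
# Result of step (2): users per attribute value to be expanded (age).
attribute_user_bmp = {"6-12 years old": {1, 2}, "13-18 years old": {3}}
# Result of step (3): users who performed the retention behavior,
# keyed here by day offset from the start day (an assumption).
remaining_user_bmp = {1: {1, 3, 8}, 2: {3}}


def retention_by_attribute(start, by_attribute, retained):
    """Count retained users per (start day, day offset, attribute value)."""
    counts = {}
    for start_ds, start_users in start.items():
        for offset, retained_users in retained.items():
            came_back = start_users & retained_users
            for value, users in by_attribute.items():
                counts[(start_ds, offset, value)] = len(came_back & users)
    return counts


counts = retention_by_attribute(
    object_user_bmp, attribute_user_bmp, remaining_user_bmp
)
for key in sorted(counts):
    print(key, counts[key])
```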
Queries for the event tool and the funnel tool follow a similar process. This block-based query approach ensures the accuracy of queries under complex conditions and improves query efficiency.
The funnel tool is mainly used for analyzing the conversion and loss at each step of a multi-step process. It can locate the key drop-off points among the steps, help people find problems in a product, and provide data related to path conversion within the product.
FIG. 9 is a schematic interface diagram illustrating a newly created funnel analysis according to an embodiment of the application. As shown in fig. 9, the interface includes a funnel analysis information specification area 910 and a funnel analysis result display area 920, the funnel analysis information specification area 910 provides a control for adding a funnel step "+ add a funnel step", the funnel step added in fig. 9 includes two steps of "networking" and "initiative", and a control for adding a filtering condition and a control for date specification are also provided. The funnel analysis result display area 920 provides a table format for displaying the counted number of users in each funnel step and a histogram style for displaying the counted number of users in each funnel step.
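By way of a simplified sketch, the per-step user counts shown in a funnel analysis can be derived from the user sets of the individual steps. The sketch assumes that a user counts for a step only if the user also appears in all preceding steps, and it ignores event ordering and time windows; the function names and the sample user sets are hypothetical, while the step names are those of fig. 9.

```python
def funnel_counts(step_users):
    """Count, per step, the users present in this step and all earlier steps.

    `step_users` is an ordered list of (step_name, set_of_user_ids).
    """
    remaining = None
    rows = []
    for name, users in step_users:
        remaining = set(users) if remaining is None else remaining & users
        rows.append((name, len(remaining)))
    return rows


def conversion_rates(rows):
    """Conversion rate of each step relative to the step before it."""
    rates = []
    for (_, prev), (name, cur) in zip(rows, rows[1:]):
        rates.append((name, cur / prev if prev else 0.0))
    return rates


# Hypothetical user sets for the two funnel steps added in fig. 9.
steps = [
    ("networking", {1, 2, 3, 4, 5}),
    ("initiative", {2, 4, 6}),
]
rows = funnel_counts(steps)
rates = conversion_rates(rows)
```

The resulting rows correspond to the table and histogram of the funnel analysis result display area, and the rates expose where users are lost between steps.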
The user group analysis is performed based on the product internal attribute table and the product external attribute table generated in step 630, and specifically, the user group may be generated according to the product internal attribute table and the product external attribute table.
Fig. 10A to 10C are schematic interface diagrams of a new user group according to an embodiment of the present application, and in this embodiment, three ways of creating a new user group, namely, a custom user group, an upload user group, and an SQL user group, are provided. As shown in fig. 10A, the user can specify a date and add a filtering condition, so that users meeting the specified date and the added filtering condition are filtered as users in the user group. In fig. 10A, an option of setting the user group name is also provided, which is convenient for the user to customize the user group name. As time advances, the data of the user is updated correspondingly, and correspondingly, the user who meets the specified date and the added filtering condition may also be changed, that is, the users in the user group are updated correspondingly. In other embodiments, the update may also be automatic, for example, an automatic update condition is preset, and when the automatic update condition is met, the update is performed automatically.
Fig. 10B shows a schematic diagram of an interface for creating a user group by uploading a user group. As shown in fig. 10B, the interface provides an upload window 1010 for a user group package. When a user drags a file indicating the users of the user group into the upload window 1010, or clicks the "upload" control in the upload window, the users indicated in the file may be added to the user group. The file uploaded through the upload window may contain the user identifiers (first user identifiers) of the users to be added to the user group. Like fig. 10A, the interface shown in fig. 10B also provides an option for custom setting of the user group name.
Fig. 10C is a schematic diagram of an interface for creating a user group by using SQL. As shown in fig. 10C, the interface provides a code input area 1020 for inputting grouping SQL. SQL code for creating a user group, for example SQL code indicating which conditions a user must satisfy to be added to the user group, is input in the code input area 1020, and the user group is created by executing the SQL code. Further, as shown in fig. 10C, in order to avoid executing erroneous SQL code, the interface also provides an option for checking the SQL code: if the user triggers the "SQL check" control, the terminal may automatically check the SQL code input in the code input area 1020 and display the check result in the content display area corresponding to the "SQL check result". Like fig. 10A, the interface of fig. 10C also provides options for custom setting of the user group name and for selecting the user group update mode.
Through the three ways, the user of the analysis platform can flexibly create a personalized user group, and further perform user analysis according to the data of the user in the user group, for example, analyze the portrait of the user and the user preference.
In this embodiment, the user group analysis tool may further be integrated with the other product analysis tools, that is, event analysis, retention analysis, funnel analysis, and the like in the product can be performed within a user group. Based on the in-product attribute table and the out-product attribute table, and on the user conditions defined for the user group in the interface for creating a new user group, the user identifiers of the users in the user group can be determined. Then, according to the mapping relationship between user identifiers and first user identifiers indicated by the identifier coding table obtained in step 640, the first user identifier set corresponding to the user group can be obtained. Bitmap compression is then performed on this first user identifier set to obtain the bitmap of the user group, which can serve as the data basis for product analysis, so that product analysis is performed based on the attribute values of the users in the user group.
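The mapping and compression steps above can be sketched as follows. This is an illustrative sketch only: the identifier coding table is modeled as a Python dictionary, the bitmap as a Python integer, and all identifiers and values are hypothetical.

```python
def build_group_bitmap(group_user_ids, id_mapping):
    """Map user identifiers to first user identifiers and set the matching bits."""
    bmp = 0
    for raw_id in group_user_ids:
        first_id = id_mapping.get(raw_id)
        if first_id is None:
            continue  # user is absent from the (hypothetical) identifier coding table
        bmp |= 1 << first_id
    return bmp


# Hypothetical identifier coding table: user identifier -> first user identifier.
id_mapping = {"u-alice": 0, "u-bob": 1, "u-carol": 2, "u-dave": 3}

# Bitmap of a user group defined through the new-user-group interface.
group_bmp = build_group_bitmap(["u-alice", "u-carol", "u-erin"], id_mapping)

# Hypothetical result bitmap of a retention analysis (first identifiers 1, 2, 3).
retained_bmp = 0b1110

# Restricting the analysis to the user group is a single bitwise AND.
retained_in_group = group_bmp & retained_bmp
retained_count = bin(retained_in_group).count("1")
```

Once the user group is available as a bitmap, restricting any other analysis result to that group requires no join on user identifiers.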
Embodiments of the apparatus of the present application are described below, which may be used to perform the methods of the above-described embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method described above in the present application.
Fig. 11 is a block diagram illustrating a data processing apparatus according to an embodiment, as shown in fig. 11, the data processing apparatus including: a query request obtaining module 1110, configured to obtain a query request, where the query request indicates query information, and the query information includes condition restriction information; the parsing module 1120 is configured to parse the condition restriction information, and determine at least two query conditions corresponding to the condition restriction information and a boolean logic relationship between the at least two query conditions; a query code generation module 1130, configured to generate a query code corresponding to each query condition of the at least two query conditions, respectively; the query code is used for carrying out object query in the high-efficiency compression bitmap data table to obtain a first high-efficiency compression bitmap of an object set meeting corresponding query conditions, and the high-efficiency compression bitmap data table comprises high-efficiency compression bitmaps of a plurality of initial object sets; the initial object set is determined by classifying objects according to attribute values; a combining module 1140, configured to combine the query codes corresponding to the query conditions in the at least two query conditions according to a boolean logic relationship between the at least two query conditions, so as to obtain a combined code; the combination code is used for carrying out logical operation on the first high-efficiency compressed bitmap according to the Boolean logical relation between at least two query conditions to obtain a target high-efficiency compressed bitmap; the target efficient compressed bitmap is used to determine query results.
In some embodiments of the present application, the combining module 1140 comprises: the acquisition unit is used for acquiring a high-efficiency compressed bitmap function corresponding to the Boolean logic relationship according to the Boolean logic relationship among at least two inquiry conditions; and the combination unit is used for combining the query codes corresponding to each query condition in the at least two query conditions according to the efficient compression bitmap function corresponding to the Boolean logic relationship to obtain the combination codes.
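As a non-limiting sketch of the acquisition unit and the combination unit, the Boolean logic relationship can be represented as a small expression tree and translated into nested bitmap function calls. The sketch assumes a ClickHouse-style engine that offers the functions bitmapAnd, bitmapOr, and bitmapAndnot; the tree representation, the relation keys, and the placeholder query codes are hypothetical.

```python
# Assumed mapping from Boolean logic relationship to bitmap function name.
BITMAP_FUNCTIONS = {
    "AND": "bitmapAnd",
    "OR": "bitmapOr",
    "AND_NOT": "bitmapAndnot",
}


def build_combined_code(node, query_codes):
    """Translate a Boolean expression tree into nested bitmap function calls.

    A leaf is the name of a query condition; an inner node is a tuple
    (relation, left, right) whose relation is a key of BITMAP_FUNCTIONS.
    """
    if isinstance(node, str):
        # Leaf: embed the query code of this condition as a sub-query.
        return f"({query_codes[node]})"
    relation, left, right = node
    func = BITMAP_FUNCTIONS[relation]
    left_code = build_combined_code(left, query_codes)
    right_code = build_combined_code(right, query_codes)
    return f"{func}({left_code}, {right_code})"


# Placeholder query codes; real ones would each return one bitmap.
query_codes = {"c1": "Q1", "c2": "Q2", "c3": "Q3"}

# c1 AND (c2 OR c3)
combined = build_combined_code(("AND", "c1", ("OR", "c2", "c3")), query_codes)
```

Keeping the per-condition query codes separate from the combination logic means that a change in the Boolean relationship only changes the wrapping function calls, not the individual queries.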
In some embodiments of the present application, the query information further includes classification statistical information indicating a target information item that needs to be counted according to the attribute value; the data processing apparatus further includes: the classification statistical indication code generation module is used for generating a classification statistical indication code according to the classification statistical information; and the updating module is used for updating the combined code according to the classification statistic indication code, and the updated combined code is used for classifying the object set indicated by the target high-efficiency compression bitmap according to the attribute value of the target information item.
In some embodiments of the present application, the query request includes a query task identifier, and the query task identifier is generated by an initiator of the query request when an input operation for query information is detected; the data processing apparatus further includes: the query information acquisition module is used for acquiring query information associated with the query task identifier from the specified information table; after an initiator of the query request detects input operation aiming at query information, the query task identifier and the detected query information are stored in a specified information table in an associated manner; the combination code storage module is used for storing the combination code corresponding to the query information and the query task identifier into a specified information table in an associated manner; and the storage indication information returning module is used for returning the storage indication information to the initiator of the query request so that the initiator of the query request acquires and executes the combined code from the specified information table according to the storage indication information.
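The exchange through the specified information table can be sketched as follows. The sketch is an assumption-laden illustration: the specified information table is modeled as an in-memory dictionary, the query task identifier as a random UUID string, and the class, method, and field names are hypothetical.

```python
import uuid


class InfoTable:
    """In-memory stand-in for the specified information table."""

    def __init__(self):
        self.rows = {}

    def save_query_info(self, query_info):
        """Initiator side: store query information under a new task identifier."""
        task_id = uuid.uuid4().hex
        self.rows[task_id] = {"query_info": query_info, "combined_code": None}
        return task_id

    def get_query_info(self, task_id):
        return self.rows[task_id]["query_info"]

    def save_combined_code(self, task_id, code):
        self.rows[task_id]["combined_code"] = code

    def get_combined_code(self, task_id):
        return self.rows[task_id]["combined_code"]


def handle_query_request(table, task_id, generate_code):
    """Server side: load the query information, store the code, report back."""
    query_info = table.get_query_info(task_id)
    table.save_combined_code(task_id, generate_code(query_info))
    # Storage indication information returned to the initiator.
    return {"task_id": task_id, "stored": True}


table = InfoTable()
task_id = table.save_query_info({"conditions": ["c1", "c2"], "relation": "AND"})
reply = handle_query_request(
    table, task_id, lambda info: f"{info['relation']}({', '.join(info['conditions'])})"
)
code_to_run = table.get_combined_code(reply["task_id"])
```

In this arrangement the query request itself only needs to carry the query task identifier, and the initiator retrieves the combined code from the table once the storage indication information arrives.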
In some embodiments of the present application, the object is a user; the data processing apparatus further includes: the data acquisition module is used for acquiring user operation data and user attribute data, wherein the user operation data is used for indicating the interactive behavior of a user on a user interface of a product; the efficient bitmap compression module is used for performing efficient bitmap compression on a first user identifier corresponding to a user with the same attribute value according to the attribute value of each field in the user operation data and the attribute value of each field in the user attribute data to obtain an efficient compressed bitmap of the corresponding attribute value; and the storage module is used for storing the obtained high-efficiency compression bitmap and the corresponding attribute value into a high-efficiency compression bitmap data table.
In some embodiments of the present application, the efficiently compressed bitmap data table comprises a second data table and a third data table; an efficient bitmap compression module comprising: the generating unit is used for generating a user operation table according to the user operation data and generating a user attribute table according to the user attribute data, and the user operation table and the user attribute table comprise first user identifications corresponding to all users; the second high-efficiency compressed bitmap generation unit is used for performing high-efficiency bitmap compression on the first user identification corresponding to the same attribute value in the user operation table according to the attribute value of each field in the user operation table to obtain a second high-efficiency compressed bitmap associated with the corresponding field and the corresponding attribute value; the third efficient compressed bitmap generation unit is used for performing efficient bitmap compression on the first user identification corresponding to the same attribute value in the user attribute table according to the attribute value of each field in the user attribute table to obtain a third efficient compressed bitmap associated with the corresponding field and the corresponding attribute value; in this embodiment, the storage module includes: a first storage unit for storing a second high-efficiency compressed bitmap and corresponding fields and corresponding attribute values in a second data table; and the second storage unit is used for storing the third high-efficiency compressed bitmap, the corresponding fields and the corresponding attribute values in a third data table.
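The generation of the second and third data tables can be sketched as follows, for illustration only. Bitmaps are modeled as Python integers, each table row as a dictionary, and each data table as a dictionary keyed by (field, attribute value); the field names, the event name 'E_Login', and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_bitmap_table(rows, id_field="first_user_id"):
    """Build one bitmap per (field, attribute value) pair.

    Bit i of a bitmap is set when the user with first user identifier i
    has that attribute value in that field.
    """
    table = defaultdict(int)
    for row in rows:
        bit = 1 << row[id_field]
        for field, value in row.items():
            if field == id_field:
                continue
            table[(field, value)] |= bit
    return dict(table)


# Hypothetical user operation table and user attribute table.
user_operation_table = [
    {"first_user_id": 0, "event": "E_Accelerate_Show", "date": "20210417"},
    {"first_user_id": 1, "event": "E_Accelerate_Show", "date": "20210418"},
    {"first_user_id": 2, "event": "E_Login", "date": "20210417"},
]
user_attribute_table = [
    {"first_user_id": 0, "age": 20},
    {"first_user_id": 1, "age": 31},
    {"first_user_id": 2, "age": 20},
]

second_data_table = build_bitmap_table(user_operation_table)
third_data_table = build_bitmap_table(user_attribute_table)
```

Each stored entry thus associates a field and an attribute value with the bitmap of all users sharing that value, which is the form in which the per-condition query codes later read the data.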
In some embodiments of the present application, the user operation data and the user attribute data comprise a user identification; the data processing apparatus further includes: the first user identifier generating unit is used for generating first user identifiers corresponding to the user identifiers according to a specified format supporting the efficient bitmap compression if the user identifiers are in the format not supporting the efficient bitmap compression; in this embodiment, the generating unit includes: the user operation table generating unit is used for generating a user operation table according to the first preset field, the user operation data and the first user identification corresponding to each user identification; and the user attribute table generating unit is used for generating a user attribute table according to the second preset field, the user attribute data and the first user identification corresponding to each user identification.
In some embodiments of the present application, the data processing apparatus further comprises: the accumulation module is used for counting the first user identifiers to obtain an accumulated number; the first user identifier acquisition module is used for acquiring the first user identifier corresponding to the user with the longest inactive duration if the accumulated number reaches a set number threshold; and the allocation module is used for allocating the first user identifier of that user, namely the one that has gone un-updated for the longest time, to the next user for whom a first user identifier needs to be generated.
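A minimal sketch of such identifier allocation with reuse is given below. It assumes that first user identifiers are consecutive integers starting from 0 and that inactivity is measured by a last-activity timestamp; the class, method, and variable names are hypothetical. In a real system, the bit of a reused identifier would presumably also have to be cleared from the bitmaps already stored, which the sketch does not cover.

```python
class FirstUserIdAllocator:
    """Allocate integer first user identifiers, reusing the stalest at a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.id_of = {}        # user identifier -> first user identifier
        self.last_active = {}  # user identifier -> last activity timestamp

    def touch(self, raw_id, now):
        """Record activity of `raw_id` at time `now`; return its first identifier."""
        if raw_id in self.id_of:
            self.last_active[raw_id] = now
            return self.id_of[raw_id]
        if len(self.id_of) >= self.threshold:
            # Threshold reached: take over the identifier of the user
            # that has been inactive for the longest time.
            stale = min(self.last_active, key=self.last_active.get)
            first_id = self.id_of.pop(stale)
            del self.last_active[stale]
        else:
            # Below the threshold: hand out the next consecutive integer.
            first_id = len(self.id_of)
        self.id_of[raw_id] = first_id
        self.last_active[raw_id] = now
        return first_id


allocator = FirstUserIdAllocator(threshold=2)
id_a = allocator.touch("user-a", now=1)
id_b = allocator.touch("user-b", now=2)
id_a_again = allocator.touch("user-a", now=3)  # refreshes user-a's activity
id_c = allocator.touch("user-c", now=4)        # reuses user-b's identifier
```

Bounding the number of distinct first user identifiers in this way keeps the identifier range, and hence the bitmaps built over it, from growing without limit.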
In some embodiments of the present application, the high efficiency compressed bitmap data table further comprises a user group data table; the data processing apparatus includes: the user identification set determining module is used for determining a user identification set corresponding to the user group according to the user identification corresponding to each user in the user group; the first user identification set determining module is used for determining first user identifications corresponding to the user identifications in the user identification set according to the mapping relation between the user identifications and the first user identifications to obtain a first user identification set corresponding to the user group; the compression module is used for carrying out efficient bitmap compression on the first user identification in the first user identification set to obtain an efficient compression bitmap corresponding to the user group; and the user group data table storage module is used for storing the high-efficiency compression bitmap corresponding to the user group and the user group identification corresponding to the user group in the user group data table.
In some embodiments of the present application, the query request includes at least one of an event analysis request, a retention analysis request, a funnel analysis request, and a user path analysis request.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for use to implement the electronic device of the embodiments of the subject application.
It should be noted that the computer system 1200 of the electronic device shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 12, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can perform various appropriate actions and processes, such as executing the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for system operation are also stored. The CPU1201, ROM1202, and RAM 1203 are connected to each other by a bus 1204. An Input/Output (I/O) interface 1205 is also connected to bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), as well as a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as necessary, so that a computer program read out therefrom is installed into the storage section 1208 as necessary.
In particular, according to embodiments of the present application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211. The computer program executes various functions defined in the system of the present application when executed by a Central Processing Unit (CPU) 1201.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries computer-readable instructions that, when executed by a processor, implement the method of any of the embodiments described above.
According to an aspect of the present application, there is also provided an electronic device, including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any of the above embodiments.
According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of any of the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (13)

1. A method of data processing, comprising:
acquiring a query request, wherein the query request indicates query information, and the query information comprises condition limiting information;
analyzing the condition limiting information, and determining at least two query conditions corresponding to the condition limiting information and a Boolean logic relationship between the at least two query conditions;
respectively generating a query code corresponding to each query condition in the at least two query conditions; the query code is used for carrying out object query in a high-efficiency compression bitmap data table to obtain a first high-efficiency compression bitmap of an object set meeting corresponding query conditions, and the high-efficiency compression bitmap data table comprises high-efficiency compression bitmaps of a plurality of initial object sets; the initial object set is determined by classifying objects according to attribute values;
combining the query codes corresponding to the query conditions in the at least two query conditions according to the Boolean logic relationship between the at least two query conditions to obtain combined codes; the combination code is used for carrying out logical operation on the first high-efficiency compressed bitmap according to the Boolean logical relation between the at least two query conditions to obtain a target high-efficiency compressed bitmap; the target efficient compressed bitmap is used to determine query results.
2. The method according to claim 1, wherein the combining the query codes corresponding to the query conditions of the at least two query conditions according to the boolean logic relationship between the at least two query conditions to obtain a combined code comprises:
obtaining a high-efficiency compression bitmap function corresponding to the Boolean logic relation according to the Boolean logic relation among the at least two query conditions;
and combining the query codes corresponding to the query conditions in the at least two query conditions according to the high-efficiency compression bitmap function corresponding to the Boolean logic relationship to obtain the combined codes.
3. The method according to claim 1, wherein the query information further includes classification statistical information indicating target information items that need to be counted by attribute values;
the method further comprises the following steps:
generating a classification statistic indication code according to the classification statistic information;
after combining the query codes corresponding to each query condition in the at least two query conditions according to the boolean logic relationship between the at least two query conditions to obtain a combined code, the method further includes:
and updating the combined code according to the classification statistic indication code, wherein the updated combined code is used for classifying the object set indicated by the target high-efficiency compression bitmap according to the attribute value of the target information item.
4. The method according to any one of claims 1-3, wherein the query request comprises a query task identifier, and the query task identifier is generated by an initiator of the query request when an input operation for query information is detected;
after the query request is obtained, the method further includes:
acquiring the query information associated with the query task identifier from a specified information table; after the initiator of the query request detects the input operation aiming at the query information, the query task identifier and the detected query information are stored in the specified information table in an associated manner;
after combining the query codes corresponding to each query condition in the at least two query conditions according to the boolean logic relationship between the at least two query conditions to obtain a combined code, the method further includes:
storing the combination code corresponding to the query information and the query task identifier into the specified information table in an associated manner;
and returning storage indication information to the initiator of the query request so that the initiator of the query request acquires and executes the combined code from the specified information table according to the storage indication information.
5. The method of any one of claims 1-3, wherein the object is a user; before the generating the query code corresponding to each of the at least two query conditions, the method further includes:
acquiring user operation data and user attribute data, wherein the user operation data is used for indicating interactive behaviors of a user on a user interface of a product;
according to the attribute value of each field in the user operation data and the attribute value of each field in the user attribute data, performing high-efficiency bitmap compression on a first user identifier corresponding to a user with the same attribute value to obtain a high-efficiency compressed bitmap of the corresponding attribute value;
and storing the obtained high-efficiency compression bitmap and the corresponding attribute values in the high-efficiency compression bitmap data table.
6. The method of claim 5, wherein the efficient compression bitmap data table comprises a second data table and a third data table;
the performing high-efficiency bitmap compression on the user identifier corresponding to the user with the same attribute value according to the attribute value of each field in the user operation data and the attribute value of each field in the user attribute data to obtain a high-efficiency compressed bitmap of the corresponding attribute value includes:
generating a user operation table according to the user operation data, and generating a user attribute table according to the user attribute data, wherein the user operation table and the user attribute table comprise first user identifications corresponding to all users;
according to the attribute values of all fields in the user operation table, performing high-efficiency bitmap compression on first user identifications corresponding to the same attribute values in the user operation table to obtain second high-efficiency compressed bitmaps related to the corresponding fields and the corresponding attribute values; and
according to the attribute values of all fields in the user attribute table, performing high-efficiency bitmap compression on first user identifications corresponding to the same attribute values in the user attribute table to obtain a third high-efficiency compressed bitmap associated with the corresponding fields and the corresponding attribute values;
the storing the obtained high-efficiency compressed bitmap and the corresponding attribute value in the high-efficiency compressed bitmap data table comprises:
storing the second high-efficiency compressed bitmap and the corresponding field and attribute value in the second data table; and
storing the third high-efficiency compressed bitmap and the corresponding field and attribute value in the third data table.
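The grouping described in claims 5 and 6 can be illustrated with a minimal sketch. It assumes the first user identifiers are small non-negative integers and uses a plain Python integer as an uncompressed stand-in for a high-efficiency compressed bitmap; a real system would presumably use a compressed format (Roaring bitmaps are one widely used example). The field names, sample rows, and the `build_bitmap_table` helper are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def build_bitmap_table(rows, id_field="uid"):
    """Group integer user IDs by (field, attribute value), one bit per user ID."""
    table = defaultdict(int)
    for row in rows:
        uid = row[id_field]
        for field, value in row.items():
            if field == id_field:
                continue
            table[(field, value)] |= 1 << uid  # set the bit for this user
    return dict(table)


def members(bitmap):
    """Return the sorted user IDs whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical sample data; field names and values are illustrative only.
user_operation_table = [
    {"uid": 1, "event": "click", "page": "home"},
    {"uid": 2, "event": "click", "page": "cart"},
    {"uid": 3, "event": "view", "page": "home"},
]
user_attribute_table = [
    {"uid": 1, "city": "Shenzhen"},
    {"uid": 2, "city": "Beijing"},
    {"uid": 3, "city": "Shenzhen"},
]

# One bitmap per (field, attribute value), kept in separate tables as in claim 6.
second_data_table = build_bitmap_table(user_operation_table)
third_data_table = build_bitmap_table(user_attribute_table)
```

Here `members(third_data_table[("city", "Shenzhen")])` lists the users sharing that attribute value, which is the lookup a later query would perform.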
7. The method of claim 5, wherein the user operation data and the user attribute data comprise a user identification;
before the generating a user operation table according to the user operation data and generating a user attribute table according to the user attribute data, the method further comprises:
if the user identifier is in a format which does not support high-efficiency bitmap compression, generating a first user identifier corresponding to each user identifier according to a specified format which supports high-efficiency bitmap compression;
the generating a user operation table according to the user operation data and generating a user attribute table according to the user attribute data comprises:
generating the user operation table according to a first preset field, the user operation data and the first user identifications corresponding to the user identifications; and
generating the user attribute table according to a second preset field, the user attribute data and the first user identification corresponding to each user identification.
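As a rough illustration of claim 7, the sketch below maps raw user identifiers to dense integers and then builds a table from a list of preset fields. It assumes that "a format which supports high-efficiency bitmap compression" means a non-negative 32-bit integer; the patent text here does not state the format, so that check, the helper names, and the sample records are all hypothetical.

```python
def supports_bitmap(user_id):
    """Assumed rule: only non-negative 32-bit integers can be bitmap members."""
    return isinstance(user_id, int) and 0 <= user_id < 2**32


def assign_first_ids(user_ids):
    """Give each distinct raw identifier the next unused dense integer."""
    mapping = {}
    for user_id in user_ids:
        mapping.setdefault(user_id, len(mapping))
    return mapping


def build_table(records, preset_fields, mapping):
    """Keep only the preset fields and attach the first user identifier."""
    return [
        {"first_uid": mapping[r["user_id"]], **{f: r[f] for f in preset_fields}}
        for r in records
    ]


# Hypothetical operation records keyed by string identifiers.
records = [
    {"user_id": "a9f3-x", "event": "click", "ts": 1},
    {"user_id": "b771-y", "event": "view", "ts": 2},
    {"user_id": "a9f3-x", "event": "view", "ts": 3},
]

if not all(supports_bitmap(r["user_id"]) for r in records):
    mapping = assign_first_ids(r["user_id"] for r in records)
else:
    mapping = {r["user_id"]: r["user_id"] for r in records}

user_operation_table = build_table(records, ["event"], mapping)
```

Dense integers starting at zero keep the resulting bitmaps compact, which is presumably why a remapping step precedes table generation.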
8. The method of claim 7, further comprising:
counting the generated first user identifications to obtain an accumulated number;
if the accumulated number reaches a set number threshold, acquiring the first user identification corresponding to the user with the longest un-updated duration;
and allocating the first user identification corresponding to the user with the longest un-updated duration to the user for whom a first user identification needs to be generated.
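The recycling rule of claim 8 can be sketched as follows, assuming that "updated" means any call that touches a user's record. The `RecyclingIdAllocator` class and its `touch` method are hypothetical names chosen for illustration; an `OrderedDict` keeps users ordered from the stalest to the most recently updated.

```python
from collections import OrderedDict


class RecyclingIdAllocator:
    """Hands out dense integer IDs and reuses the stalest one once full."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._ids = OrderedDict()  # user_id -> first_uid, stalest user first

    def touch(self, user_id):
        """Return the first ID for user_id, allocating or recycling as needed."""
        if user_id in self._ids:
            self._ids.move_to_end(user_id)  # mark as most recently updated
            return self._ids[user_id]
        if len(self._ids) >= self.threshold:
            # Threshold reached: take over the ID of the stalest user.
            _, first_uid = self._ids.popitem(last=False)
        else:
            first_uid = len(self._ids)
        self._ids[user_id] = first_uid
        return first_uid


alloc = RecyclingIdAllocator(threshold=2)
ids = [alloc.touch(u) for u in ["u1", "u2", "u1", "u3"]]
```

In this run `u3` inherits the identifier previously held by `u2`, because `u1` was updated more recently. A real system would presumably also need to clear the recycled bit from any stored bitmaps; the claim text here does not cover that step.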
9. The method of claim 7, wherein the high-efficiency compressed bitmap data table further comprises a user group data table; the method further comprises:
determining a user identification set corresponding to a user group according to user identifications corresponding to users in the user group;
determining a first user identifier corresponding to each user identifier in the user identifier set according to a mapping relation between the user identifiers and the first user identifiers to obtain a first user identifier set corresponding to the user group;
performing high-efficiency bitmap compression on the first user identification in the first user identification set to obtain a high-efficiency compressed bitmap corresponding to the user group;
and storing the high-efficiency compression bitmap corresponding to the user group and the user group identification corresponding to the user group in the user group data table.
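The user group steps of claim 9 can be sketched in the same style. A Python integer again stands in for the compressed bitmap, and the mapping contents, the group identifier `vip`, and the `group_bitmap` helper are hypothetical.

```python
def group_bitmap(group_user_ids, id_mapping):
    """Translate raw identifiers to first identifiers and set one bit for each."""
    bitmap = 0
    for user_id in set(group_user_ids):  # the identifier set of the group
        bitmap |= 1 << id_mapping[user_id]
    return bitmap


# Assumed mapping between user identifiers and first user identifiers.
id_mapping = {"alice": 0, "bob": 1, "carol": 2, "dave": 3}

# User group data table: group identifier -> bitmap of its members.
user_group_table = {}
user_group_table["vip"] = group_bitmap(["alice", "carol", "alice"], id_mapping)
```

Storing the group as a bitmap lets it be combined with attribute bitmaps using the same logical operations as any other query condition.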
10. The method of claim 5, wherein the query request comprises at least one of an event analysis request, a persistence analysis request, a funnel analysis request, and a user path analysis request.
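As one example of how such a request might reduce to bitmap operations, the sketch below treats what the claim calls persistence analysis (more commonly known as retention analysis) as an intersection of two activity bitmaps. This is an assumption about the approach rather than something the claim states, and the sample bitmaps are made up.

```python
def retained(first_day_bitmap, later_day_bitmap):
    """Users active on both days: the bitwise AND of the two bitmaps."""
    return first_day_bitmap & later_day_bitmap


def cardinality(bitmap):
    """Number of users in the bitmap (count of set bits)."""
    return bin(bitmap).count("1")


day0 = 0b1111  # hypothetical: users 0 to 3 were active on day 0
day7 = 0b0101  # hypothetical: users 0 and 2 were active on day 7

retention_rate = cardinality(retained(day0, day7)) / cardinality(day0)
```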
11. A data processing apparatus, comprising:
a query request acquisition module, configured to acquire a query request, wherein the query request indicates query information, and the query information comprises condition limiting information;
an analysis module, configured to analyze the condition limiting information and determine at least two query conditions corresponding to the condition limiting information and a Boolean logic relationship between the at least two query conditions;
a query code generation module, configured to generate a query code corresponding to each of the at least two query conditions; the query code is used for performing an object query in a high-efficiency compressed bitmap data table to obtain a first high-efficiency compressed bitmap of an object set meeting the corresponding query condition, and the high-efficiency compressed bitmap data table comprises high-efficiency compressed bitmaps of a plurality of initial object sets; the initial object sets are determined by classifying objects according to attribute values; and
a combination module, configured to combine the query codes corresponding to the at least two query conditions according to the Boolean logic relationship between the at least two query conditions to obtain a combination code; the combination code is used for performing a logical operation on the first high-efficiency compressed bitmaps according to the Boolean logic relationship between the at least two query conditions to obtain a target high-efficiency compressed bitmap; the target high-efficiency compressed bitmap is used to determine a query result.
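The division of labor among these modules can be sketched as a small code generator. It assumes the condition limiting information has already been parsed into a tree of conditions joined by AND or OR. The table name `bitmap_table`, its columns, and the function names `bitmapAnd` and `bitmapOr` (modeled on the bitmap functions some analytical databases expose) are assumptions for illustration, not details taken from the claims.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Condition:
    field: str
    value: str


@dataclass
class BoolExpr:
    op: str  # "AND" or "OR"
    operands: List[Union["BoolExpr", Condition]]


# Assumed names of the bitmap combination functions in the target database.
COMBINERS = {"AND": "bitmapAnd", "OR": "bitmapOr"}


def query_code(cond):
    """Generate the query code that fetches one condition's bitmap."""
    # Values are interpolated directly to keep the sketch short; real code
    # should use parameterized queries instead.
    return (
        "(SELECT bitmap FROM bitmap_table "
        f"WHERE field = '{cond.field}' AND value = '{cond.value}')"
    )


def combination_code(node):
    """Combine per-condition query codes according to the Boolean relationship."""
    if isinstance(node, Condition):
        return query_code(node)
    parts = [combination_code(child) for child in node.operands]
    combined = parts[0]
    for part in parts[1:]:
        combined = f"{COMBINERS[node.op]}({combined}, {part})"
    return combined


expr = BoolExpr("AND", [
    Condition("city", "Shenzhen"),
    BoolExpr("OR", [Condition("event", "click"), Condition("event", "view")]),
])
code = combination_code(expr)
```

Generating one fragment per condition and nesting the fragments mirrors the claim's split between the query code generation module and the combination module; the target bitmap would be whatever the database returns for `code`.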
12. An electronic device, comprising:
a processor;
a memory having computer-readable instructions stored thereon which, when executed by the processor, implement the method of any one of claims 1-10.
13. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110593441.7A CN115408381A (en) 2021-05-28 2021-05-28 Data processing method and related equipment

Publications (1)

Publication Number Publication Date
CN115408381A true CN115408381A (en) 2022-11-29

Family

ID=84156572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110593441.7A Pending CN115408381A (en) 2021-05-28 2021-05-28 Data processing method and related equipment

Country Status (1)

Country Link
CN (1) CN115408381A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116827682A (en) * 2023-08-23 2023-09-29 腾讯科技(深圳)有限公司 Data processing method and device and computer equipment
CN116827682B (en) * 2023-08-23 2023-11-24 腾讯科技(深圳)有限公司 Data processing method and device and computer equipment
CN117354356A (en) * 2023-12-04 2024-01-05 四川才子软件信息网络有限公司 APP region retention statistical method, system and equipment
CN117435756A (en) * 2023-12-18 2024-01-23 云筑信息科技(成都)有限公司 Data processing method for inquiring user retention based on bitmap
CN117435756B (en) * 2023-12-18 2024-03-26 云筑信息科技(成都)有限公司 Data processing method for inquiring user retention based on bitmap

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination