CN113157695B

CN113157695B - Data processing method and device, readable medium and electronic equipment

Info

Publication number: CN113157695B
Application number: CN202110336511.0A
Authority: CN
Inventors: 王石冲; 王航宇; 罗梦瑶; 汪鹏; 丁春雷; 宋骞; 于佳萍
Original assignee: Douyin Vision Co Ltd
Current assignee: Douyin Vision Co Ltd
Priority date: 2021-03-29
Filing date: 2021-03-29
Publication date: 2023-06-06
Anticipated expiration: 2041-03-29
Also published as: CN113157695A

Abstract

The disclosure relates to a data processing method and device, a readable medium, and an electronic device. The method includes: acquiring import data, wherein the import data comprises a user ID and tag data corresponding to the user ID; allocating a target data node to the user ID; and storing the import data in the target data node through first bitmaps respectively corresponding to the tags in the tag data, wherein each first bitmap comprises a mapping table, the first (high-order) 32 bits of a 64-bit long integer are used as a key of the mapping table, the last (low-order) 32 bits are used as the value of the key, and the values of the keys are stored in second bitmaps supporting 32-bit integer data. In this way, a data processing system for performing insight analysis or crowd selection on users can be established. Compared with a bitmap that supports only 32-bit integers, the first bitmap, which supports 64-bit long integer data, expands the user scale that the system can support, enables data processing for billions or even tens of billions of users, and improves the processing capacity of the system.

Description

Data processing method and device, readable medium and electronic equipment

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a data processing method, apparatus, readable medium, and electronic device.

Background

Crowd insight is a very important function: it helps a user understand a specified crowd in greater depth and detail. Crowd insight analysis typically requires querying a large amount of tagged user data, and many implementations exist, such as Spark offline reading, Elasticsearch-based approaches, and bitmap-based approaches. Spark offline reading is slow and cannot deliver fast query responses even when the data volume is small. The Elasticsearch-based approach cannot meet the needs of user data with a large number of tags. The bitmap-based approach responds faster than the other two and handles queries over user data with a large number of tags well, but it supports only 32-bit integer IDs. When integer IDs are used to represent users, the number of tagged users it can support is therefore limited, and it cannot process user data at a very large user scale.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In a first aspect, the present disclosure provides a data processing method, the method comprising:

acquiring import data, wherein the import data comprises a user ID and tag data corresponding to the user ID;

allocating a target data node to the user ID;

storing the import data in the target data node through first bitmaps respectively corresponding to the tags in the tag data, wherein the first bitmaps can store 64-bit long integer data;

the first bitmap comprises a mapping table, the first 32 bits of the 64-bit long integer data are used as a key of the mapping table, the last 32 bits are used as the value of the key, and the values of the keys are stored in a second bitmap supporting 32-bit integer data.

In a second aspect, the present disclosure provides a data processing apparatus, the apparatus comprising:

an acquisition module, configured to acquire import data, wherein the import data comprises a user ID and tag data corresponding to the user ID;

a slicing module, configured to allocate a target data node to the user ID;

a processing module, configured to store the import data in the target data node through first bitmaps respectively corresponding to the tags in the tag data, wherein the first bitmaps can store 64-bit long integer data.

In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which when executed by a processing device performs the steps of the method of the first aspect.

In a fourth aspect, the present disclosure provides an electronic device comprising:

a storage device having a computer program stored thereon;

processing means for executing said computer program in said storage means to carry out the steps of the method of the first aspect.

With this technical solution, a data processing system for performing insight analysis or crowd selection on users can be established according to the users' tag data. Compared with a bitmap that supports only 32-bit integers, the first bitmap, which supports 64-bit long integer data, enlarges the user scale that the system can support, enables data processing for billions or even tens of billions of users, greatly simplifies the processing of tag data when the user scale is very large, and improves the processing capacity of the data processing system.

Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale. In the drawings:

Fig. 1 is a flowchart illustrating a data processing method according to an exemplary embodiment of the present disclosure.

Fig. 2 is a flowchart illustrating a data processing method according to still another exemplary embodiment of the present disclosure.

Fig. 3 is a flowchart illustrating a data processing method according to still another exemplary embodiment of the present disclosure.

Fig. 4 is a flowchart illustrating a data processing method according to still another exemplary embodiment of the present disclosure.

Fig. 5 is a flowchart illustrating a data processing method according to still another exemplary embodiment of the present disclosure.

Fig. 6 is a block diagram illustrating a structure of a data processing apparatus according to an exemplary embodiment of the present disclosure.

Fig. 7 is a block diagram of a data processing apparatus according to still another exemplary embodiment of the present disclosure.

Fig. 8 is a block diagram of a data processing apparatus according to still another exemplary embodiment of the present disclosure.

Fig. 9 shows a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided to provide a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

Fig. 1 is a flowchart illustrating a data processing method according to an exemplary embodiment of the present disclosure. As shown in fig. 1, the method includes steps 101 to 103.

In step 101, import data is acquired, where the import data includes a user ID and tag data corresponding to the user ID. The tag data can be any attribute identifier of the user corresponding to the user ID. It may be provided by another evaluation system, filled in actively by the user, obtained from evaluations by the user's friends, or obtained by one or more of these means. The method of acquiring the tag data is not limited in this disclosure. The tag data may include a plurality of tags, each of which may include, for example, a tag ID, a tag value, and the like.

In step 102, a target data node is allocated to the user ID. A data node is a working node that stores and reads user IDs and the tag data corresponding to them. When the number of user IDs is large, there may be multiple data nodes, so when processing the import data it is necessary to select, from among the multiple data nodes, a target data node for storing the tag data of the user ID. The target data node may be selected according to the amount of data already stored in each data node; for example, when a user ID has a large amount of tag data, multiple target data nodes may be allocated to it so as to keep the data on the data nodes balanced. Alternatively, the target data node may be determined according to the tags included in the tag data corresponding to the user ID, so that identical tag data is stored in the same data node. In addition, all tag data corresponding to the same user ID may be stored in the same target data node, so that a query on the tags of a given user ID only needs to fetch and compute on data within a single data node (a local Join) and does not trigger a Shuffle operation. Because a Shuffle operation moves a large amount of data and consumes substantial computation, memory, network, and disk resources, avoiding it improves the performance of Join operations based on the user ID.

In step 103, the import data is stored in the target data node through first bitmaps respectively corresponding to the tags in the tag data, wherein the first bitmaps can store 64-bit long integer data. The first bitmap comprises a mapping table, the first 32 bits of the 64-bit long integer data are used as a key of the mapping table, the last 32 bits are used as the value of the key, and the values of the keys are stored in a second bitmap supporting 32-bit integer data.

After the target data node is determined, the import data may be stored according to the tag data corresponding to the user ID. In the target data node, each tag corresponds to an independent first bitmap, and the first bitmap contains all user IDs in the target data node that carry that tag; in other words, a tag-to-user-ID data model is adopted.

The first bitmap can store 64-bit long integer data, that is, it supports user IDs up to a 64-bit long integer, and can therefore support a user scale of billions or even tens of billions. It is implemented as a mapping table. When two 64-bit first bitmaps undergo a set computation (e.g., intersection, union, or complement), the keys of the mapping tables on both sides are traversed first, and the result of the set computation between the corresponding values is then filled in as the new value, which yields a result that is itself a 64-bit first bitmap.
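The key/value split described above can be illustrated with a minimal Python sketch. It is not the implementation of this disclosure: the class name `Bitmap64` and its methods are hypothetical, and a plain Python `set` stands in for the second bitmap that holds 32-bit integers.

```python
class Bitmap64:
    """Toy 64-bit bitmap: a mapping table from the high 32 bits (key)
    to a container of low-32-bit values (stand-in for a 32-bit bitmap)."""

    def __init__(self):
        self.table = {}  # high 32 bits -> set of low 32 bits

    def add(self, value):
        if not 0 <= value < (1 << 64):
            raise ValueError("value must fit in 64 bits")
        high, low = value >> 32, value & 0xFFFFFFFF
        self.table.setdefault(high, set()).add(low)

    def __contains__(self, value):
        return (value & 0xFFFFFFFF) in self.table.get(value >> 32, ())

    def __len__(self):
        return sum(len(lows) for lows in self.table.values())

    def intersect(self, other):
        # Traverse only keys present on both sides, then intersect the values.
        result = Bitmap64()
        for high in self.table.keys() & other.table.keys():
            common = self.table[high] & other.table[high]
            if common:
                result.table[high] = common
        return result

    def union(self, other):
        # Traverse keys present on either side, then merge the values.
        result = Bitmap64()
        for high in self.table.keys() | other.table.keys():
            result.table[high] = self.table.get(high, set()) | other.table.get(high, set())
        return result
```

Because each set computation only touches values that share a key, the work is proportional to the number of distinct high-32-bit prefixes rather than to the full 64-bit range.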

Fig. 2 is a flowchart illustrating a data processing method according to still another exemplary embodiment of the present disclosure. As shown in fig. 2, the method further comprises step 201.

In step 201, a first hash calculation is performed on the user ID to obtain a first ID, a remainder obtained by dividing the first ID by the total number of data nodes is determined, and the target data node allocated to the user ID is determined according to the remainder.

That is, step 102 here adopts the allocation method in which all tag data corresponding to the same user ID is stored in the same target data node, and this is implemented specifically by hash calculation.

In one possible case, each data node may correspond to a sequence number. For example, when the total number of data nodes is N, the sequence number of each data node may be M, where $M \in [0, N-1]$. When the target data node is determined according to the remainder, the remainder may be used directly as the sequence number of the target data node.

Since the output value obtained by applying the same hash algorithm to the same user ID, that is, the first ID, is fixed, and the outputs for different user IDs are randomly distributed, the remainder obtained by dividing the first ID by the total number of data nodes is evenly distributed over the range $[0, N-1]$. This ensures not only that the tag data of the same user ID is always allocated to the same data node, but also that all user IDs are distributed relatively evenly across the data nodes, keeping the number of users on each data node balanced.
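The hash-and-remainder allocation can be sketched as follows. The disclosure does not name a hash algorithm, so the use of SHA-256 truncated to 64 bits, and the function name `assign_node`, are assumptions made only for illustration.

```python
import hashlib


def assign_node(user_id: int, total_nodes: int) -> int:
    """Return the sequence number of the target data node for user_id."""
    # First hash calculation (SHA-256 is an assumed choice) gives the first ID.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    first_id = int.from_bytes(digest[:8], "big")
    # The remainder is used directly as the node sequence number in [0, N-1].
    return first_id % total_nodes
```

The same user ID always yields the same node, while different user IDs spread across all nodes.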

Fig. 3 is a flowchart illustrating a data processing method according to still another exemplary embodiment of the present disclosure. As shown in fig. 3, the method further comprises step 301 and step 302.

In step 301, a second hash calculation is performed on the user ID to obtain a second ID, and a target partition in the target data node to which the user ID is to be allocated is determined according to the second ID.

In step 302, the import data is stored in the target partition through the first bitmaps respectively corresponding to the tags in the tag data.

That is, after the unique target data node to which the user ID is allocated has been determined, the partition (shard) of that data node in which the user ID resides may also be determined according to the user ID. This ensures not only that the tag data of one user ID is stored in the same data node, but also that it is stored in the same partition. As a result, the data stored in the first bitmaps of different partitions within a data node never intersects, and all partitions can be processed in parallel during a query, which makes full use of the parallel capability of the computing engine and further improves the processing efficiency of the data processing system.
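A minimal sketch of why disjoint partitions allow parallel processing is given below. The salted SHA-256 hash, the modulo step, and the names `assign_partition` and `build_shards` are assumptions for illustration; the disclosure only states that the target partition is determined from the result of a second hash calculation.

```python
import hashlib


def assign_partition(user_id: int, partitions_per_node: int) -> int:
    """Pick a partition inside the target data node from a second hash."""
    # A salt makes this hash differ from the one used to pick the data node.
    data = f"partition:{user_id}".encode("utf-8")
    second_hash = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    return second_hash % partitions_per_node


def build_shards(user_ids, partitions_per_node):
    """Place each user ID into exactly one partition of a single node."""
    shards = [set() for _ in range(partitions_per_node)]
    for user_id in user_ids:
        shards[assign_partition(user_id, partitions_per_node)].add(user_id)
    return shards
```

Since each user ID lands in exactly one partition, a per-partition result such as a count can be computed independently for every partition and then simply added up.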

Fig. 4 is a flowchart illustrating a data processing method according to still another exemplary embodiment of the present disclosure. As shown in fig. 4, the method further comprises a step 401 and a step 402.

In step 401, a target encoding range is determined according to the number of user IDs stored in the target data node, and the user ID is recoded into a second ID, wherein the second ID is within the target encoding range, a one-to-one mapping between the user ID and the second ID is saved, the second ID is 64-bit long integer data, and the target encoding range consists of $2^{32}$ adjacent codes whose first 32 bits are identical.

In step 402, the second ID is imported into the first bitmaps respectively corresponding to the tags. When the target partition has been determined, the first bitmap corresponding to each tag is the first bitmap in the target partition.

When the number of user IDs stored in the target data node is less than $2^{32}$, the target encoding range can be determined to be $0$ to $2^{32}$; when the number of stored user IDs is greater than $2^{32}$ and less than $2 \cdot 2^{32}$, the target encoding range can be determined to be $2^{32}$ to $2 \cdot 2^{32}$; and so on.

To ensure that, when there are multiple data nodes, the second IDs encoded for different user IDs stored in different data nodes do not repeat, the target encoding ranges that each data node may allocate can additionally be partitioned. For example, $0$ to $2 \cdot 2^{32}$ is reserved in advance for data node 0, $2 \cdot 2^{32}$ to $4 \cdot 2^{32}$ for data node 1, and so on; each data node then determines its current target encoding range according to the number of user IDs it has stored and recodes the user IDs accordingly. Alternatively, the encoding ranges may be divided in real time according to the number of user IDs actually stored in each data node. For example, each data node is first allocated $2^{32}$ adjacent codes according to its sequence number, so that the target encoding range of the M-th data node is $M \cdot 2^{32}$ to $(M+1) \cdot 2^{32}$; the number of user IDs stored in each data node is then monitored, and if the first data node whose number of stored user IDs reaches $2^{32}$ is the X-th data node, the next target encoding range of the X-th data node can be $N \cdot 2^{32}$ to $(N+1) \cdot 2^{32}$, where N is the total number of data nodes.

After recoding the user ID to the second ID in the target encoding range, the one-to-one mapping between them can be saved as index data in the data processing system to assist in decoding when decoding is required.

Since the second IDs in each target encoding range are $2^{32}$ adjacent codes whose first 32 bits are identical, recoding confines the second IDs to a fixed interval even if the distribution of the original user IDs is very scattered, so the first bitmap corresponding to each tag on the target data node occupies less storage space. In addition, the first 32 bits of all second IDs in each data node are concentrated on a small number of keys, which greatly reduces the number of key traversals needed when a query operation such as an intersection, union, or complement computation is performed on the data in the data node, and likewise greatly reduces the number of set operations between first bitmaps that actually have to be performed.
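The recoding step can be sketched as a small dictionary-backed encoder. This is an illustrative sketch only: the class name `Recoder`, the sequential assignment of codes, and the in-memory dictionaries standing in for the saved index data are all assumptions.

```python
class Recoder:
    """Recode scattered user IDs into adjacent second IDs within one
    target encoding range of 2**32 codes that share their high 32 bits."""

    def __init__(self, range_start: int):
        self.next_code = range_start
        self.range_end = range_start + (1 << 32)
        self.forward = {}   # user ID -> second ID
        self.backward = {}  # second ID -> user ID (used for decoding)

    def encode(self, user_id: int) -> int:
        if user_id in self.forward:
            return self.forward[user_id]
        if self.next_code >= self.range_end:
            raise OverflowError("target encoding range exhausted")
        code = self.next_code
        self.next_code += 1
        self.forward[user_id] = code
        self.backward[code] = user_id
        return code

    def decode(self, second_id: int) -> int:
        return self.backward[second_id]
```

For instance, with a range starting at `3 << 32`, user IDs as far apart as `7` and `2**63` receive neighbouring codes that all fall under the single mapping-table key `3`.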

In another possible implementation manner, the determining the target coding range according to the number of the user IDs stored in the target data node may also be accomplished by the following method.

Determining the target encoding range as $(a \cdot N + M) \cdot 2^{32}$ to $(a \cdot N + M + 1) \cdot 2^{32}$, where $a = \lfloor S / 2^{32} \rfloor$, $S$ is the number of user IDs stored in the target data node, $\lfloor \cdot \rfloor$ is the round-down (floor) operator, $N$ is the total number of data nodes, and $M$ is the sequence number of the target data node.

That is, each data node is allocated $2^{32}$ adjacent codes in advance according to its sequence number, and when the number of user IDs stored in any data node reaches $2^{32}$, the next target encoding range allocated to it is still determined in the order of the data nodes' sequence numbers.
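The range computation can be written out as a short sketch. It assumes that the coefficient a is the number of stored user IDs S divided by 2^32 and rounded down, which is what the surrounding description implies; the function name `target_encoding_range` is hypothetical.

```python
RANGE_SIZE = 1 << 32  # each target encoding range holds 2**32 adjacent codes


def target_encoding_range(stored_ids: int, total_nodes: int, node_seq: int):
    """Return (start, end) of the current range, end exclusive.

    stored_ids is S, total_nodes is N, node_seq is M.
    """
    a = stored_ids // RANGE_SIZE  # assumed: a = floor(S / 2**32)
    start = (a * total_nodes + node_seq) * RANGE_SIZE
    return start, start + RANGE_SIZE
```

With four data nodes, node 1 starts in the range beginning at `1 << 32` and, once it holds 2^32 user IDs, moves to the range beginning at `5 << 32`, so ranges handed to different nodes never collide.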

In a possible implementation, before step 103 is performed, that is, before the import data is stored in the target data node through the first bitmaps respectively corresponding to the tags in the tag data, the method may further include: determining whether a first bitmap corresponding to the tag in the tag data exists in the target data node; and, if no such first bitmap exists in the target data node, creating in the target data node the first bitmap corresponding to the tag. If a corresponding first bitmap already exists in the target data node, the import data can be stored directly.

In addition, when the target partition has been determined, it should be determined whether a first bitmap corresponding to the tag in the tag data exists in the target partition of the target data node; if not, a first bitmap corresponding to the tag is created in the target partition so that the tag data can be stored.
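The check-then-create behaviour can be sketched as below. The function name `store_import`, the representation of a partition as a dictionary keyed by tag, and the use of a Python `set` in place of a first bitmap are assumptions made for illustration.

```python
def store_import(partition: dict, second_id: int, tags) -> None:
    """Store one user's ID under every tag, creating bitmaps on demand.

    partition maps a tag, here a (tag ID, tag value) pair, to the
    container of user IDs that carry it.
    """
    for tag in tags:
        bitmap = partition.get(tag)
        if bitmap is None:
            # No bitmap exists yet for this tag in the partition: create it.
            bitmap = set()
            partition[tag] = bitmap
        bitmap.add(second_id)
```

Calling `store_import(partition, 101, [("gender", "man"), ("city", "Shanghai")])` on an empty `partition` creates two bitmaps, and a later call for another user with the tag `("gender", "man")` reuses the existing one.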

Fig. 5 is a flowchart illustrating a data processing method according to still another exemplary embodiment of the present disclosure. As shown in fig. 5, the method further comprises step 501 and step 502.

In step 501, a query instruction is received, where the query instruction includes a plurality of sub-query conditions and a logic operation instruction between the plurality of sub-query conditions.

In step 502, multiple sets of user data are obtained according to the multiple sub-query conditions, and the multiple sets of user data are calculated according to the logic operation instruction, so as to obtain target data.

The query instruction may be a query instruction indirectly determined according to a query requirement input by a user, or may be a query instruction directly input by the user, including a plurality of sub-query conditions and a logic operation instruction between the plurality of sub-query conditions. For example, the query instruction may be entered by the following custom function bitmapCount:

select bitmapCount(expression)(idx_column, bitmap_column) from ((${index_sql_1}) union all (${index_sql_2}) union all (${index_sql_3})),

where expression is the logical operation expression among the plurality of sub-query conditions, in which each sub-query condition is represented by its number and the supported logical operations may be AND, OR, NOT, and so on; idx_column is the number of each sub-query condition; bitmap_column is the bitmap (first bitmap) to be queried; and index_sql_1, index_sql_2, and index_sql_3 are the query statements of the three sub-query conditions respectively.

An exemplary query instruction given according to the custom function may be as follows:

select bitmapCount('(1 & 2) | 3')(idx, bitmap) from ((select 1 as idx, bitmapOr(bitmap) from database.table where value = 'man' group by value) union all (select 2 as idx, bitmapOr(bitmap) from database.table where value in ('Shanghai', 'Beijing') group by value) union all (select 3 as idx, bitmapOr(bitmap) from database.table where decimal_value >= 25 group by decimal_value)),

The above query instruction obtains all men in Shanghai or Beijing, together with all people aged 25 or above. Here decimal_value is a numeric tag value in the first bitmap and value is a string tag value in the first bitmap. The operators used when constructing the query statements (e.g., in, =, >=, <, <=, hasAny, hasAll) may differ according to the tag value type and the query type.

By separating the data acquisition logic from the computation logic in the query instruction in this way, the sub-queries can be executed concurrently, and the logical operation is performed only in the final stage according to the logic operation instruction. This makes fuller use of the parallel capability of the data processing system and improves query performance.
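The separation of data acquisition from computation can be sketched in Python. This is not the bitmapCount function itself: the names `evaluate` and `run_query` are hypothetical, Python sets stand in for bitmaps, sub-queries are modelled as plain callables, and only the `&` and `|` operators with parentheses are handled.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def evaluate(expression, groups):
    """Combine numbered result sets according to an expression like '(1 & 2) | 3'."""
    tokens = re.findall(r"\d+|[&|()]", expression)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            result = result | parse_and()
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            result = result & parse_atom()
        return result

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            pos += 1  # skip the closing ")"
            return result
        return groups[int(token)]

    return parse_or()


def run_query(sub_queries, expression):
    """Run every sub-query concurrently, then apply the logical expression once."""
    with ThreadPoolExecutor() as pool:
        futures = {idx: pool.submit(fn) for idx, fn in sub_queries.items()}
        groups = {idx: future.result() for idx, future in futures.items()}
    return evaluate(expression, groups)
```

For example, `run_query({1: lambda: {1, 2, 3}, 2: lambda: {2, 3, 4}, 3: lambda: {9}}, "(1 & 2) | 3")` fetches the three groups in parallel and only then combines them.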

Fig. 6 is a block diagram illustrating a structure of a data processing apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 6, the apparatus includes: an acquisition module 10, configured to acquire import data, where the import data includes a user ID and tag data corresponding to the user ID; a slicing module 20, configured to allocate a target data node to the user ID; and a processing module 30, configured to store the import data in the target data node through first bitmaps respectively corresponding to the tags in the tag data, where the first bitmaps can store 64-bit long integer data. The first bitmap comprises a mapping table, the first 32 bits of the 64-bit long integer data are used as a key of the mapping table, the last 32 bits are used as the value of the key, and the values of the keys are stored in a second bitmap supporting 32-bit integer data.

In one possible implementation, the slicing module 20 is further configured to: and performing first hash calculation on the user ID to obtain a first ID, determining a remainder obtained by dividing the first ID by the total number of data nodes, and determining the target data node allocated for the user ID according to the remainder.

In one possible implementation, the slicing module 20 is further configured to: performing second hash calculation on the user ID to obtain a second ID, and determining a target partition in the target data node to which the user ID is to be allocated according to the second ID; the processing module 30 is further configured to: and storing the imported data through the first bitmaps respectively corresponding to the labels in the label data in the target partition.

In one possible implementation, the processing module 30 includes: a first processing sub-module, configured to determine a target encoding range according to the number of user IDs stored in the target data node and recode the user ID into a second ID, where the second ID is within the target encoding range, a one-to-one mapping between the user ID and the second ID is saved, the second ID is 64-bit long integer data, and the target encoding range consists of $2^{32}$ adjacent codes whose first 32 bits are identical; and a second processing sub-module, configured to import the second ID into the first bitmaps respectively corresponding to the tags.

In a possible implementation, the first processing sub-module is further configured to determine the target encoding range as $(a \cdot N + M) \cdot 2^{32}$ to $(a \cdot N + M + 1) \cdot 2^{32}$, where $a = \lfloor S / 2^{32} \rfloor$, $S$ is the number of user IDs stored in the target data node, $\lfloor \cdot \rfloor$ is the round-down (floor) operator, $N$ is the total number of data nodes, and $M$ is the sequence number of the target data node.

Fig. 7 is a block diagram of a data processing apparatus according to still another exemplary embodiment of the present disclosure. As shown in fig. 7, the apparatus further includes: a judging module 40, configured to determine whether a first bitmap corresponding to the tag in the tag data exists in the target data node; and a creating module 50, configured to create, in the target data node, the first bitmap corresponding to the tag in the tag data if no such first bitmap exists in the target data node.

Fig. 8 is a block diagram of a data processing apparatus according to still another exemplary embodiment of the present disclosure. As shown in fig. 8, the apparatus further includes: a receiving module 60, configured to receive a query instruction, where the query instruction includes a plurality of sub-query conditions and logic operation instructions between the plurality of sub-query conditions; the query module 70 is configured to obtain multiple sets of user data according to the multiple sub-query conditions, and calculate the multiple sets of user data according to the logic operation instruction, so as to obtain target data.

Referring now to fig. 9, a schematic diagram of an electronic device 900 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 9 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.

As shown in fig. 9, the electronic device 900 may include a processing means (e.g., a central processor, a graphics processor, etc.) 901, which may perform various actions and processes as appropriate according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage means 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are also stored. The processing device 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

In general, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. Communication means 909 may allow electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. While fig. 9 shows an electronic device 900 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 909, or installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processing device 901, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device; or may exist alone without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire import data, wherein the import data comprises a user ID and tag data corresponding to the user ID; allocate a target data node to the user ID; and store the import data in the target data node by means of first bitmaps respectively corresponding to the tags in the tag data, wherein the first bitmaps are capable of storing 64-bit long integer data; each first bitmap comprises a mapping table, the first 32 bits of data in the 64-bit long integer data are used as a keyword of the mapping table, the last 32 bits of data are used as the value of the keyword, and the values of the keywords are stored in a second bitmap supporting 32-bit integer data.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules involved in the embodiments described in the present disclosure may be implemented by means of software, or may be implemented by means of hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the acquisition module may also be described as "a module that acquires import data".

The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, example 1 provides a data processing method, the method comprising: acquiring import data, wherein the import data comprises a user ID and tag data corresponding to the user ID; allocating a target data node to the user ID; and storing the import data in the target data node by means of first bitmaps respectively corresponding to the tags in the tag data, wherein the first bitmaps are capable of storing 64-bit long integer data; each first bitmap comprises a mapping table, the first 32 bits of data in the 64-bit long integer data are used as a keyword of the mapping table, the last 32 bits of data are used as the value of the keyword, and the values of the keywords are stored in a second bitmap supporting 32-bit integer data.
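The two-level structure of the first bitmap can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class name `Bitmap64` and its methods are hypothetical, a plain Python `set` stands in for each 32-bit second bitmap, and "first 32 bits" is read here as the most significant 32 bits of an unsigned 64-bit ID.

```python
from collections import defaultdict

MASK32 = 0xFFFFFFFF


class Bitmap64:
    """Hypothetical 64-bit bitmap built from a mapping table of 32-bit bitmaps.

    A Python set stands in for each 32-bit "second bitmap"; a real system
    would more likely use a compressed 32-bit bitmap in that position.
    """

    def __init__(self):
        # keyword: upper 32 bits of the ID; value: set of lower-32-bit values
        self._table = defaultdict(set)

    def add(self, user_id: int) -> None:
        if not 0 <= user_id < 1 << 64:
            raise ValueError("user_id must fit in 64 unsigned bits")
        high, low = user_id >> 32, user_id & MASK32
        self._table[high].add(low)

    def __contains__(self, user_id: int) -> bool:
        high, low = user_id >> 32, user_id & MASK32
        # .get avoids creating an empty entry for a keyword never seen before
        return low in self._table.get(high, ())

    def __len__(self) -> int:
        return sum(len(lows) for lows in self._table.values())

    def key_count(self) -> int:
        """Number of distinct keywords (upper-32-bit prefixes) in the table."""
        return len(self._table)
```

In this sketch, two IDs that differ only in their lower 32 bits share one mapping-table entry, so the number of second bitmaps grows with the number of distinct upper-32-bit prefixes rather than with the number of users.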

In accordance with one or more embodiments of the present disclosure, example 2 provides the method of example 1, wherein allocating the target data node to the user ID comprises: performing a first hash calculation on the user ID to obtain a first ID, determining the remainder obtained by dividing the first ID by the total number of data nodes, and determining the target data node allocated to the user ID according to the remainder.
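The hash-and-remainder allocation can be sketched as follows. The source does not name a hash function, so SHA-256 truncated to 64 bits is an assumption made purely for illustration, as are the function names `first_hash` and `allocate_node` and the convention that the remainder is used directly as the node index.

```python
import hashlib


def first_hash(user_id: int) -> int:
    """Assumed first hash: the leading 64 bits of SHA-256 over the decimal ID."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def allocate_node(user_id: int, total_nodes: int) -> int:
    """Return the index of the target data node for user_id.

    The remainder of the first ID divided by the node count is used
    directly as the node index (an assumption of this sketch).
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return first_hash(user_id) % total_nodes
```

Because the result depends only on the user ID and the node count, the same user is always routed to the same node as long as the total number of data nodes does not change.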

In accordance with one or more embodiments of the present disclosure, example 3 provides the method of example 2, wherein allocating the target data node to the user ID further comprises: performing a second hash calculation on the user ID to obtain a second ID, and determining, according to the second ID, a target partition in the target data node to which the user ID is to be allocated; and wherein storing the import data in the target data node by means of the first bitmaps respectively corresponding to the tags in the tag data comprises: storing the import data in the target partition by means of the first bitmaps respectively corresponding to the tags in the tag data.
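A minimal sketch of this two-level routing is given below. Several details are assumptions not stated in the source: both hashes are derived from SHA-256 with different string prefixes, the partition is chosen as the remainder of the second ID divided by a fixed `partitions_per_node`, and the names `_hash64` and `route` are hypothetical.

```python
import hashlib


def _hash64(label: str, user_id: int) -> int:
    """Assumed hash family: SHA-256 over 'label:user_id', truncated to 64 bits."""
    data = f"{label}:{user_id}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def route(user_id: int, total_nodes: int, partitions_per_node: int) -> tuple:
    """Return (node index, partition index) for user_id."""
    if total_nodes <= 0 or partitions_per_node <= 0:
        raise ValueError("node and partition counts must be positive")
    first_id = _hash64("node", user_id)        # first hash calculation
    second_id = _hash64("partition", user_id)  # second hash calculation
    return first_id % total_nodes, second_id % partitions_per_node
```

One reason a separate second hash is useful in a scheme like this one: if the same hash value were reduced modulo both counts, then with 4 nodes and 2 partitions every user on node 0 would have an even hash value and would always land in partition 0, leaving partition 1 on that node empty.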

According to one or more embodiments of the present disclosure, example 4 provides the method of example 1, wherein storing the import data in the target data node by means of the first bitmaps respectively corresponding to the tags in the tag data comprises: determining a target coding range according to the number of user IDs stored in the target data node, and recoding the user ID into a second ID, wherein the second ID lies within the target coding range, a one-to-one mapping relation between the user IDs and the second IDs is stored, the second ID is 64-bit long integer data, and the target coding range consists of 2³² adjacent codes whose first 32 bits of data are the same; and importing the second ID into the first bitmap corresponding to each tag.
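The recoding step can be sketched as below. The source states only that second IDs fall inside the target coding range and that the mapping is one-to-one; handing out codes sequentially from the start of the range is an assumption of this sketch, as are the class name `IdRecoder` and its methods. Moving to a new coding range once the current one is full is outside the scope of the sketch.

```python
class IdRecoder:
    """Hypothetical recoder mapping arbitrary user IDs to dense second IDs.

    All second IDs are drawn from one coding range of 2**32 adjacent codes,
    so they share the same upper 32 bits.
    """

    RANGE_SIZE = 1 << 32

    def __init__(self, range_start: int):
        if range_start < 0 or range_start % self.RANGE_SIZE != 0:
            raise ValueError("range_start must be a non-negative multiple of 2**32")
        self._start = range_start
        self._forward = {}  # user ID -> second ID
        self._reverse = {}  # second ID -> user ID

    def encode(self, user_id: int) -> int:
        """Return the second ID for user_id, assigning one on first use."""
        second_id = self._forward.get(user_id)
        if second_id is None:
            offset = len(self._forward)
            if offset >= self.RANGE_SIZE:
                raise OverflowError("target coding range is exhausted")
            second_id = self._start + offset
            self._forward[user_id] = second_id
            self._reverse[second_id] = user_id
        return second_id

    def decode(self, second_id: int) -> int:
        """Return the original user ID for a previously assigned second ID."""
        return self._reverse[second_id]
```

Since every second ID produced by one recoder has the same first 32 bits, all of them map to a single keyword of the mapping table and are therefore stored in a single 32-bit second bitmap.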

In accordance with one or more embodiments of the present disclosure, example 5 provides the method of example 4, wherein determining the target coding range according to the number of user IDs stored in the target data node comprises: determining the target coding range as (a·N+M)·2³² to (a·N+M+1)·2³², wherein S is the number of user IDs stored in the target data node [the formula images defining the remaining symbols are not reproduced in the source text].
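The arithmetic of the range expression can be checked with a small sketch. The definitions of a, N, and M are not reproduced in this text, so the function below treats them as opaque non-negative integers supplied by the caller; the function name `target_coding_range` and the choice of an exclusive upper bound (so that the range holds exactly 2³² codes) are assumptions.

```python
def target_coding_range(a: int, n: int, m: int) -> tuple:
    """Return (start, stop) of the range (a*n + m) * 2**32 to (a*n + m + 1) * 2**32.

    stop is treated as exclusive. The meanings of a, n and m are not
    defined here; they are passed through as plain integers.
    """
    prefix = a * n + m
    if not 0 <= prefix < 1 << 32:
        raise ValueError("a*n + m must fit in the upper 32 bits of a 64-bit ID")
    return prefix << 32, (prefix + 1) << 32
```

Under these assumptions, every code in the returned range has upper 32 bits equal to a·N+M, which matches the requirement that the target coding range consist of 2³² adjacent codes sharing the same first 32 bits.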

According to one or more embodiments of the present disclosure, example 6 provides the method of example 1, wherein before storing the import data in the target data node by means of the first bitmaps respectively corresponding to the tags in the tag data, the method further comprises: determining whether a first bitmap corresponding to a tag in the tag data exists in the target data node; and if no first bitmap corresponding to the tag exists in the target data node, creating the first bitmap corresponding to the tag in the target data node.
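The check-then-create behaviour can be sketched as follows. The class `DataNode`, its `store` method, and the string tag values used with it are hypothetical, and a Python `set` again stands in for the first bitmap.

```python
class DataNode:
    """Hypothetical data node holding one bitmap per tag."""

    def __init__(self):
        self.bitmaps = {}  # tag -> first bitmap (a set stands in for it here)

    def store(self, user_id: int, tags) -> None:
        """Record user_id under every tag, creating missing bitmaps first."""
        for tag in tags:
            # Determine whether a bitmap for this tag already exists.
            if tag not in self.bitmaps:
                # It does not, so create it before importing the ID.
                self.bitmaps[tag] = set()
            self.bitmaps[tag].add(user_id)
```

For example, storing user 1 with tags `["a", "b"]` and then user 2 with tag `["a"]` creates two bitmaps on first use and reuses the bitmap for `"a"` on the second call.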

In accordance with one or more embodiments of the present disclosure, example 7 provides the method of example 1, the method further comprising: receiving a query instruction, wherein the query instruction comprises a plurality of sub-query conditions and a logic operation instruction among the plurality of sub-query conditions; and respectively acquiring a plurality of groups of user data according to the plurality of sub-query conditions, and calculating the plurality of groups of user data according to the logic operation instruction to obtain target data.
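A query of this kind can be sketched as a small recursive evaluator. The query encoding is an assumption made for illustration: a sub-query condition is represented by a tag name, a logic operation instruction by a tuple `(op, operand, ...)` with the hypothetical operators `"and"`, `"or"`, and `"andnot"`, and each tag's bitmap by a Python `set` of user IDs.

```python
def evaluate(query, bitmaps: dict) -> set:
    """Evaluate a nested query against per-tag bitmaps and return user IDs.

    query is either a tag name (one sub-query condition) or a tuple
    (op, operand, ...) combining the results of its operands.
    """
    if isinstance(query, str):
        # A tag with no bitmap matches no users.
        return set(bitmaps.get(query, ()))
    op, *operands = query
    if not operands:
        raise ValueError("a logic operation needs at least one operand")
    results = [evaluate(operand, bitmaps) for operand in operands]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    if op == "andnot":
        first, *rest = results
        return first.difference(*rest)
    raise ValueError(f"unknown logic operation: {op!r}")
```

With bitmaps for three hypothetical tags, `evaluate(("or", ("and", "city=Beijing", "age=18-24"), "gender=female"), bitmaps)` first intersects the two groups of users obtained for the inner sub-query conditions and then unions the result with the third group.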

According to one or more embodiments of the present disclosure, example 8 provides a data processing apparatus, the apparatus comprising: an acquisition module configured to acquire import data, wherein the import data comprises a user ID and tag data corresponding to the user ID; a slicing module configured to allocate a target data node to the user ID; and a processing module configured to store the import data in the target data node by means of first bitmaps respectively corresponding to the tags in the tag data, wherein the first bitmaps are capable of storing 64-bit long integer data; each first bitmap comprises a mapping table, the first 32 bits of data in the 64-bit long integer data are used as a keyword of the mapping table, the last 32 bits of data are used as the value of the keyword, and the values of the keywords are stored in a second bitmap supporting 32-bit integer data.

According to one or more embodiments of the present disclosure, example 9 provides a computer-readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method of any of examples 1-7.

In accordance with one or more embodiments of the present disclosure, example 10 provides an electronic device, comprising: a storage device having a computer program stored thereon; processing means for executing the computer program in the storage means to implement the steps of the method of any one of examples 1-7.

The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to herein is not limited to the specific combinations of features described above, but also covers other embodiments which may be formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by interchanging the above features with features having similar functions disclosed in (but not limited to) the present disclosure.

Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims. The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

Claims

1. A method of data processing, the method comprising:

distributing target data nodes for the user ID;

the first bitmap comprises a mapping table, the mapping table is used for storing keywords and values of the keywords, first 32 bits of data in the 64-bit long integer data are used as the keywords of the mapping table, last 32 bits of data are used as the values of the keywords, and the values of the keywords are stored in a second bitmap supporting the 32-bit integer data.

2. The method of claim 1, wherein said assigning the user ID to the target data node comprises:

and performing first hash calculation on the user ID to obtain a first ID, determining a remainder obtained by dividing the first ID by the total number of data nodes, and determining the target data node allocated for the user ID according to the remainder.

3. The method of claim 2, wherein said assigning the user ID to the target data node further comprises:

performing second hash calculation on the user ID to obtain a second ID, and determining a target partition in the target data node to which the user ID is to be distributed according to the second ID;

the storing the imported data through the first bitmaps corresponding to the labels in the label data in the target data node respectively includes:

and storing the imported data through the first bitmaps respectively corresponding to the labels in the label data in the target partition.

4. The method of claim 1, wherein storing the imported data with respective corresponding first bitmaps in the target data node by respective tags in the tag data comprises:

determining a target coding range according to the number of user IDs stored in the target data node, and recoding the user ID into a second ID, wherein the second ID lies within the target coding range, a one-to-one mapping relation between the user IDs and the second IDs is stored, the second ID is 64-bit long integer data, and the target coding range consists of 2³² adjacent codes whose first 32 bits of data are the same;

and importing the second ID into the first bitmap corresponding to each tag.

5. The method of claim 4, wherein said determining a target encoding range based on the number of user IDs stored in said target data node comprises:

S is the number of user IDs stored in the target data node [the formula images are not reproduced in the source text].

6. The method of claim 1, wherein before storing the imported data by each tag in the tag data in the corresponding first bitmap in the target data node, the method further comprises:

determining whether a first bitmap corresponding to a tag in the tag data exists in the target data node;

if no first bitmap corresponding to the tag exists in the target data node, creating the first bitmap corresponding to the tag in the target data node.

7. The method according to claim 1, wherein the method further comprises:

receiving a query instruction, wherein the query instruction comprises a plurality of sub-query conditions and a logic operation instruction among the plurality of sub-query conditions;

and respectively acquiring a plurality of groups of user data according to the plurality of sub-query conditions, and calculating the plurality of groups of user data according to the logic operation instruction to obtain target data.

8. A data processing apparatus, the apparatus comprising:

the slicing module is used for distributing target data nodes for the user IDs;

9. A computer readable medium on which a computer program is stored, characterized in that the program, when being executed by a processing device, carries out the steps of the method according to any one of claims 1-7.

10. An electronic device, comprising:

a storage device having a computer program stored thereon;

processing means for executing said computer program in said storage means to carry out the steps of the method according to any one of claims 1-7.