CN110837510A - Data processing method, device, storage medium and processor - Google Patents


Info

Publication number
CN110837510A
Authority
CN
China
Prior art keywords
data
type
data source
result set
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911108257.8A
Other languages
Chinese (zh)
Other versions
CN110837510B (en)
Inventor
郑文彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shenyan Intelligent Technology Co Ltd
Original Assignee
Beijing Shenyan Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenyan Intelligent Technology Co Ltd
Priority to CN201911108257.8A
Publication of CN110837510A
Application granted
Publication of CN110837510B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention discloses a data processing method, a data processing device, a storage medium and a processor. The method comprises the following steps: acquiring a data source set of devices, wherein each data source in the set is used to identify devices, all data sources are of the same order of magnitude, and that magnitude is greater than a target threshold; converting the data type of each data source from a character string type to a long integer type to obtain a plurality of first data; performing shift processing on each first data to obtain a plurality of second data; and performing a logical operation on every two second data to obtain a result set. The method and device achieve the technical effect of improving data processing efficiency.

Description

Data processing method, device, storage medium and processor
Technical Field
The present invention relates to the field of data processing, and in particular, to a data processing method, apparatus, storage medium, and processor.
Background
Currently, such data is typically processed with a compressed bitmap method (RoaringBitmap), which can represent the full range of 2^32 integers of the Int type, with a capacity threshold of 512 MB.
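The 512 MB figure is consistent with an uncompressed bitmap that spends one bit on every possible Int value: 2^32 bits divided by 8 bits per byte is 2^29 bytes, i.e. 512 MiB. A minimal Java sketch of that arithmetic (the class name is hypothetical):

```java
// Worked arithmetic: size of a plain bitmap with one bit per possible 32-bit Int value.
final class BitmapCapacity {
    // 2^32 possible Int values, one bit each.
    static long bitsForAllInts() {
        return 1L << 32;
    }

    // Bits -> bytes (divide by 8) -> mebibytes (divide by 2^20).
    static long mebibytesForAllInts() {
        return bitsForAllInts() / 8 / (1L << 20);
    }

    public static void main(String[] args) {
        System.out.println(mebibytesForAllInts() + " MiB"); // prints "512 MiB"
    }
}
```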
However, the device identifiers used in the service (IDFA for short) are MD5-hashed strings of 32 characters and do not match the Int type, so this type cannot support the logical calculations required at the existing scale of hundreds of millions of records; moreover, the time complexity of processing such data is O(n^2), which makes data processing inefficient.
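To illustrate the complexity gap mentioned above, the sketch below contrasts a naive pairwise comparison of two identifier lists, which costs O(n*m), with a hash-set lookup, which costs O(n + m) expected time. It is a generic Java illustration, not the patented method; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Generic illustration of quadratic vs. linear intersection of two identifier lists.
final class IntersectionCost {
    // Naive: compare every element of a with every element of b -> O(n * m).
    static List<String> naiveIntersect(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        for (String x : a) {
            for (String y : b) {
                if (x.equals(y)) {
                    out.add(x);
                    break; // stop scanning b once a match is found
                }
            }
        }
        return out;
    }

    // Hash-based: build a set from b once, then probe it -> O(n + m) expected.
    static List<String> hashedIntersect(List<String> a, List<String> b) {
        Set<String> seen = new HashSet<>(b);
        List<String> out = new ArrayList<>();
        for (String x : a) {
            if (seen.contains(x)) {
                out.add(x);
            }
        }
        return out;
    }
}
```

Both methods return the same elements in the order they appear in `a`; only the cost differs.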
No effective solution has yet been proposed for the low efficiency of data processing in the prior art.
Disclosure of Invention
The main object of the present invention is to provide a data processing method, a data processing device, a storage medium and a processor, so as to at least solve the problem of low data processing efficiency.
To achieve the above object, according to one aspect of the present invention, a data processing method is provided. The method may comprise the following steps: acquiring a data source set of devices, wherein each data source in the set is used to identify devices, all data sources are of the same order of magnitude, and that magnitude is greater than a target threshold; converting the data type of each data source from a character string type to a long integer type to obtain a plurality of first data; performing shift processing on each first data to obtain a plurality of second data; and performing a logical operation on every two second data to obtain a result set.
Optionally, the method further comprises: storing each first data in a cluster cache database according to a target data structure.
Optionally, storing each first data in the cluster cache database according to the target data structure comprises: storing each first data in the cluster cache database as key-value pairs.
Optionally, after the logical operation on every two second data yields the result set, the method further comprises: back-checking (reverse-querying) the data source set in the cluster cache database based on the result set.
Optionally, performing shift processing on each first data to obtain a plurality of second data comprises: shifting each first data to obtain first sub-data at a first data bit position and second sub-data at a second data bit position of that first data.
Optionally, converting the data type of each data source from a character string type to a long integer type to obtain a plurality of first data comprises: performing a hash operation on each data source so as to convert its data type from a character string type to a long integer type, thereby obtaining the plurality of first data.
Optionally, performing a logical operation on every two second data to obtain a result set comprises one of: performing an OR operation on every two second data to obtain the result set; performing an AND operation on every two second data to obtain the result set; and performing a NOT operation on every two second data to obtain the result set.
To achieve the above object, according to another aspect of the present invention, a data processing apparatus is also provided, comprising: an acquisition unit, configured to acquire a data source set of devices, wherein each data source in the set is used to identify devices; a conversion unit, configured to convert the data type of each data source from a character string type to a long integer type to obtain a plurality of first data; a shifting unit, configured to perform shift processing on each first data to obtain a plurality of second data; and an operation unit, configured to perform a logical operation on every two second data to obtain a result set.
To achieve the above object, according to another aspect of the present invention, a storage medium is also provided. The storage medium comprises a stored program, wherein, when the program runs, it controls the device on which the storage medium resides to execute the data processing method of the embodiments of the present invention.
To achieve the above object, according to another aspect of the present invention, a processor is also provided. The processor is configured to run a program, wherein the program, when running, executes the data processing method of the embodiments of the present invention.
According to the present invention, a data source set of devices is acquired, wherein each data source in the set is used to identify devices, all data sources are of the same order of magnitude, and that magnitude is greater than a target threshold; the data type of each data source is converted from a character string type to a long integer type to obtain a plurality of first data; shift processing is performed on each first data to obtain a plurality of second data; and a logical operation is performed on every two second data to obtain a result set. That is, for multiple data sets of the same order of magnitude and the same unit, the data types are converted and shifted before the logical operation is performed, so the time overhead is about O(n), one order lower than the O(n^2) time complexity of the conventional calculation scheme. This solves the technical problem of low data processing efficiency and achieves the technical effect of improving it.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method of data processing according to an embodiment of the invention;
FIG. 2 is a flow diagram of another data processing method according to an embodiment of the invention; and
FIG. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other where no conflict arises. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
In order to enable those skilled in the art to better understand the technical solutions, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
The embodiment of the invention provides a data processing method.
Fig. 1 is a flow chart of a data processing method according to an embodiment of the present invention. As shown in fig. 1, the method may include the steps of:
Step S102: acquire a data source set of devices.
In the technical solution provided by step S102 of the present invention, each data source in the data source set is used to identify devices, all data sources are of the same order of magnitude, and that magnitude is greater than the target threshold.
In this embodiment, the data source set includes a plurality of data sources, which may be large volumes of data of the same type collected by an enterprise or organization, that is, multiple data of the same order of magnitude and the same unit, containing information that identifies devices, for example the IDFA in advertising services or a device number such as the International Mobile Equipment Identity (IMEI). This can be applied to tag computation in advertising.
Step S104: convert the data type of each data source from a character string type to a long integer type to obtain a plurality of first data.
In the technical solution provided by step S104 of the present invention, after the data source set is obtained, each data source in the set may be traversed and its data type converted from the character string type to the long integer type, so that each data source corresponds to one first data of the long integer type, thereby obtaining a plurality of first data.
In this embodiment, the data type of each data source may be a string type. For example, an IDFA may be a 32-character MD5-hashed string, which does not match the Int integers used by the bitmap method and therefore needs to be converted. The data type of each data source may be converted from a string type to a long integer (Long), so that each data source, for example each encrypted IDFA, corresponds to a unique Long integer.
It should be noted that converting the data type of each data source from a string type to a long integer type in this embodiment incurs a certain time overhead.
Step S106: perform shift processing on each first data to obtain a plurality of second data.
In the technical solution provided by step S106 of the present invention, after the data type of each data source has been converted from the character string type to the long integer type to obtain the plurality of first data, each first data is shifted to obtain a plurality of second data.
In this embodiment, each first data is a 64-bit Long integer. Each first data may be shifted to obtain its first sub-data at a first data bit position and its second sub-data at a second data bit position, where the first sub-data may be the upper 32 bits and the second sub-data the lower 32 bits of the shifted first data. The second data corresponding to each first data thus consists of the upper 32 bits and the lower 32 bits obtained by the shift.
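A minimal Java sketch of the shift step and its inverse, assuming the upper/lower 32-bit split described above; `LongSplit` and its methods are hypothetical names. The mask `0xffffffffL` matters: without it, a negative `low` would be sign-extended and overwrite the upper half when the value is rebuilt.

```java
// Sketch: split a 64-bit long into upper and lower 32 bits, and rebuild it.
final class LongSplit {
    // Upper 32 bits: unsigned shift right by 32, then narrow to int.
    static int high(long value) {
        return (int) (value >>> 32);
    }

    // Lower 32 bits: a narrowing cast keeps the low-order 32 bits.
    static int low(long value) {
        return (int) value;
    }

    // Recombine: widen high and shift it back up; mask low so its sign bit
    // is not smeared across the upper half.
    static long combine(int high, int low) {
        return (((long) high) << 32) | (low & 0xffffffffL);
    }
}
```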
Optionally, this embodiment stores one data set using a Map<upper 32 bits, RoaringBitMap> data structure (hereinafter referred to as Map64), in which the key holds the upper 32 bits and the RoaringBitMap holds the lower 32 bits of each first data, i.e. Map<upper 32 bits, RoaringBitMap(lower 32 bits)>. Each data set of first data generates one Map64, so a plurality of Map64 structures is obtained; each Map64 represents one set of first data, for example one set of IDFAs. When the data volume is large, the memory consumed by Map64 grows linearly.
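The Map64 layout can be sketched as follows. To keep the example dependency-free, `java.util.TreeSet<Integer>` stands in for RoaringBitMap; this substitution and the class name are assumptions. A real implementation would use the RoaringBitmap library, which also ships a comparable ready-made 64-bit structure, `Roaring64NavigableMap`.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical Map64: upper 32 bits -> set of lower 32 bits.
// TreeSet<Integer> is a stand-in for RoaringBitMap (assumption, for self-containment).
final class Map64 {
    final Map<Integer, TreeSet<Integer>> buckets = new TreeMap<>();

    // Split the long and file its low half under its high half.
    void add(long value) {
        int high = (int) (value >>> 32);
        int low = (int) value;
        buckets.computeIfAbsent(high, k -> new TreeSet<>()).add(low);
    }

    // Membership test: find the bucket for the high half, then look up the low half.
    boolean contains(long value) {
        TreeSet<Integer> lows = buckets.get((int) (value >>> 32));
        return lows != null && lows.contains((int) value);
    }

    // Number of distinct longs stored = sum of bucket sizes.
    long size() {
        long n = 0;
        for (TreeSet<Integer> lows : buckets.values()) {
            n += lows.size();
        }
        return n;
    }
}
```

Values sharing the same upper 32 bits land in one bucket, which is what lets a compressed bitmap store them compactly.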
Step S108: perform a logical operation on every two second data to obtain a result set.
In the technical solution provided by step S108 of the present invention, after each first data has been shifted to obtain the plurality of second data, a logical operation is performed on every two second data to obtain a result set.
In this embodiment, the logical operation on every two second data may be, for example, an OR operation, an AND operation, or a NOT operation, each yielding a result set, so that common parts, mutually exclusive parts and the like can be quickly found among multiple data sources. The result set obtained in this embodiment is a set of Long values, which effectively addresses logical filtering calculations over large data volumes.
Through steps S102 to S108 of the present invention, a data source set of devices is acquired, wherein each data source in the set is used to identify devices, all data sources are of the same order of magnitude, and that magnitude is greater than a target threshold; the data type of each data source is converted from a character string type to a long integer type to obtain a plurality of first data; shift processing is performed on each first data to obtain a plurality of second data; and a logical operation is performed on every two second data to obtain a result set. That is, for multiple data sets of the same order of magnitude and the same unit, the data types are converted and shifted before the logical operation is performed, so the time overhead is about O(n), one order lower than the O(n^2) time complexity of the conventional calculation scheme. This solves the technical problem of low data processing efficiency and achieves the technical effect of improving it.
As an optional implementation, the method further comprises: storing each first data in the cluster cache database according to the target data structure.
In this embodiment, after the data type of each data source has been converted from a string type to a long integer to obtain the plurality of first data, the first data may be stored according to a target data structure: one data set may be stored using the Map<upper 32 bits, RoaringBitMap(lower 32 bits)> data structure, and the data set may be stored in a cluster cache database such as RocksDB, a multi-node Redis cluster, or an Aerospike cluster. Alternatively, if machine resources or budget are sufficient, the cluster cache database may be replaced by the Pika database running on SSDs, which theoretically achieves higher timeliness.
As an optional implementation, storing each first data in the cluster cache database according to the target data structure comprises: storing each first data in the cluster cache database as key-value pairs.
In this embodiment, while the Map<upper 32 bits, RoaringBitMap> data structure (hereinafter referred to as Map64) stores the upper 32 bits and lower 32 bits obtained by shifting each first data, each first data may additionally be stored in the cluster cache database as a key-value pair (Key-Value, abbreviated K-V), with the Long as the key and the original string as the value; for example, the key is 684216861258721548 and the corresponding value is 12385b750c02556e3d5ecb7b65d78b6d. This operation can be executed concurrently in multiple threads, depending on the performance of the machine traversing the data set, or deployed on a cluster; its time complexity is O(n).
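A minimal sketch of building the K-V table concurrently, assuming a `ConcurrentHashMap` in place of the cluster cache and taking the string-to-long hash as a parameter; both are stand-ins, and `KvIndex` is a hypothetical name. If two identifiers hashed to the same Long, the later `put` would silently overwrite the earlier one, so a production version would need collision handling.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

// Sketch: K-V reverse-lookup table, Long hash -> original identifier string.
// ConcurrentHashMap stands in for the cluster cache (RocksDB, Redis, Aerospike, ...).
final class KvIndex {
    static Map<Long, String> build(List<String> identifiers, ToLongFunction<String> hash) {
        Map<Long, String> store = new ConcurrentHashMap<>();
        // parallelStream() spreads the traversal over several threads;
        // total work is still one hash + one put per identifier, i.e. O(n).
        identifiers.parallelStream()
                .forEach(id -> store.put(hash.applyAsLong(id), id));
        return store;
    }
}
```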
As an optional implementation, after the logical operation on every two second data in step S108 yields the result set, the method further comprises: back-checking the data source set in the cluster cache database based on the result set.
In this embodiment, because each first data is stored in the cluster cache database according to the target data structure, once the result set has been obtained by the logical operation on every two second data, the corresponding data source set, such as device numbers, can be back-checked in the cluster cache database: the database is queried with the values recovered from the second data to obtain the original data sources, which solves the data back-check problem efficiently.
Optionally, in this embodiment, the operation (((long) high) << 32) | (low & 0xffffffffL) may be applied to the first sub-data at the first data bit position (high, the upper 32 bits obtained by shifting the Long first data) and the second sub-data at the second data bit position (low, the lower 32 bits) contained in each second data, so as to recover the original value, i.e. the Long first data. The initial data source is then obtained by querying the cluster cache database with that Long value. Reading out each second data and performing the shift logic are fast and take almost negligible time, so the total time is essentially the time to traverse all the data, which depends on the data volume and the memory used for the calculation; the total time spent is therefore close to O(n).
Optionally, when traversing the values of each second data (a single Map64) in this embodiment, it is only necessary to take out the compressed bitmap (RoaringBitMap) during traversal to obtain the second sub-data at the second data bit position, i.e. the lower 32 bits, and then take out the corresponding key to obtain the first sub-data at the first data bit position, i.e. the upper 32 bits.
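The traversal and back-check described above might look like this in Java. `Map<Integer, Set<Integer>>` stands in for a Map64 built on RoaringBitMap and `Map<Long, String>` for the cluster cache database; both substitutions, and the `BackCheck` name, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: rebuild every long stored in a Map64, then look each one up in the K-V store.
final class BackCheck {
    // Traverse key (upper 32 bits) and bitmap (lower 32 bits) to recover the original longs.
    static List<Long> values(Map<Integer, Set<Integer>> map64) {
        List<Long> out = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> e : map64.entrySet()) {
            int high = e.getKey();
            for (int low : e.getValue()) {
                out.add((((long) high) << 32) | (low & 0xffffffffL));
            }
        }
        return out;
    }

    // Back-check: map each recovered long to its original identifier, skipping misses.
    static List<String> lookup(Map<Integer, Set<Integer>> map64, Map<Long, String> kv) {
        List<String> ids = new ArrayList<>();
        for (long v : values(map64)) {
            String id = kv.get(v);
            if (id != null) {
                ids.add(id);
            }
        }
        return ids;
    }
}
```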
As an optional implementation, converting the data type of each data source from a string type to a long integer type in step S104 to obtain a plurality of first data comprises: performing a hash operation on each data source so as to convert its data type from a string type to a long integer type, thereby obtaining the plurality of first data.
In this embodiment, the hash operation may be performed on each data source using, for example, Google's murmur3_128 hash algorithm with asLong(), converting the data type of each data source from the string type to the long integer type to obtain the plurality of first data.
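A dependency-free sketch of the string-to-long conversion. It deliberately uses the first 8 bytes of a JDK SHA-256 digest as a stand-in for the murmur3 hash named above; with Google Guava, the corresponding call would be `Hashing.murmur3_128().hashString(id, StandardCharsets.UTF_8).asLong()`. `IdHash` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: map an identifier string to a 64-bit long.
// Stand-in hash: first 8 bytes of SHA-256 (JDK only), NOT the murmur3 hash named in the text.
final class IdHash {
    static long toLong(String identifier) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(identifier.getBytes(StandardCharsets.UTF_8));
            // ByteBuffer is big-endian by default: digest bytes 0..7 become the long.
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Folding a 128-bit MD5 value into 64 bits means collisions are possible in principle. A birthday-bound estimate of n^2 / 2^65 expected colliding pairs gives roughly 0.015 for n = 737,741,029, which is small but not zero.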
The logical operation on every two second data in this embodiment is illustrated below using the OR operation as an example.
As an optional implementation, performing an OR operation on every two second data to obtain a result set comprises: loading the first target data of the two second data into memory; traversing the second target data of the two to obtain its keys and values; and, where the value in the first target data for a key is a compressed bitmap, performing an OR operation on the compressed bitmap of the first target data and that of the second target data to obtain the result set.
In this embodiment, the first target data Map1 is determined from the two second data and loaded into memory, and the second target data Map2 is then traversed to obtain all its keys and values. For each key, the operation Map1.get(key) is performed, which returns either null or a compressed bitmap (RoaringBitMap). If the value in the first target data is a compressed bitmap, an OR operation is performed on the two compressed bitmaps, i.e. the RoaringBitMap returned by Map1.get(key) is ORed with the RoaringBitMap returned by Map2.get(key); this takes almost no time. If the value in the first target data is null, the key-value pair is added to the first target data. After all keys have been processed, the result set is obtained.
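The OR procedure above can be sketched in Java as follows, with `TreeSet<Integer>` standing in for RoaringBitMap (an assumption to keep the example self-contained); `Map64Or` is a hypothetical name. Unlike the description, which adds into Map1 in place, this sketch copies Map1 first so the inputs stay unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Sketch of OR on two Map64 structures (upper 32 bits -> set of lower 32 bits).
final class Map64Or {
    static Map<Integer, TreeSet<Integer>> or(Map<Integer, TreeSet<Integer>> map1,
                                             Map<Integer, TreeSet<Integer>> map2) {
        // Step 1: "load" Map1 into memory (here: a deep copy, so map1 is not mutated).
        Map<Integer, TreeSet<Integer>> result = new HashMap<>();
        map1.forEach((k, v) -> result.put(k, new TreeSet<>(v)));
        // Step 2: traverse every key/value of Map2.
        for (Map.Entry<Integer, TreeSet<Integer>> e : map2.entrySet()) {
            TreeSet<Integer> existing = result.get(e.getKey()); // Map1.get(key)
            if (existing == null) {
                // null: add Map2's key-value pair unchanged
                result.put(e.getKey(), new TreeSet<>(e.getValue()));
            } else {
                // bitmap: OR the two bitmaps (set union)
                existing.addAll(e.getValue());
            }
        }
        return result;
    }
}
```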
Other logical operations, such as AND and NOT, can be carried out by analogy.
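By analogy, AND and NOT can be sketched the same way. Interpreting the NOT operation as a set difference (values in Map1 but not in Map2) is an assumption; the names are hypothetical and `TreeSet<Integer>` again stands in for RoaringBitMap. Both operations can skip keys that are absent on one side, which is where the per-key layout saves work.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Sketch of AND and AND-NOT on two Map64 structures (upper 32 bits -> set of lower 32 bits).
final class Map64Ops {
    // AND: only keys present in both maps can contribute; intersect their bitmaps.
    static Map<Integer, TreeSet<Integer>> and(Map<Integer, TreeSet<Integer>> map1,
                                              Map<Integer, TreeSet<Integer>> map2) {
        Map<Integer, TreeSet<Integer>> result = new HashMap<>();
        for (Map.Entry<Integer, TreeSet<Integer>> e : map1.entrySet()) {
            TreeSet<Integer> other = map2.get(e.getKey());
            if (other == null) {
                continue; // key absent from map2: nothing in common
            }
            TreeSet<Integer> common = new TreeSet<>(e.getValue());
            common.retainAll(other);
            if (!common.isEmpty()) {
                result.put(e.getKey(), common);
            }
        }
        return result;
    }

    // AND-NOT (assumed meaning of NOT): keep map1's values absent from map2.
    static Map<Integer, TreeSet<Integer>> andNot(Map<Integer, TreeSet<Integer>> map1,
                                                 Map<Integer, TreeSet<Integer>> map2) {
        Map<Integer, TreeSet<Integer>> result = new HashMap<>();
        for (Map.Entry<Integer, TreeSet<Integer>> e : map1.entrySet()) {
            TreeSet<Integer> remaining = new TreeSet<>(e.getValue());
            TreeSet<Integer> other = map2.get(e.getKey());
            if (other != null) {
                remaining.removeAll(other);
            }
            if (!remaining.isEmpty()) {
                result.put(e.getKey(), remaining);
            }
        }
        return result;
    }
}
```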
With the above data processing method, this embodiment effectively addresses logical filtering calculations over large data volumes, in particular tag computation in advertising. For example, if 2 billion phones carry the tag "men" and 8 million phones carry the tag "houses", the set of phones carrying both tags can be computed quickly with this data processing method.
In this embodiment, the actual time spent depends on the data volume, because the operations only traverse the data, and the real time overhead arises in only two steps: type conversion and back-checking in the cluster cache database. Multiple tests using Spark showed that, for 737,741,029 32-character MD5 strings, type conversion took about 200 seconds and the logical operation about 60 seconds. The total time overhead of the scheme is thus kept at the level of minutes, i.e. the time complexity is about O(n), which improves data processing efficiency.
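A quick check of the reported figures, using only the approximate values quoted above: conversion runs at roughly 3.7 million strings per second, and the two measured stages add up to about 260 seconds, a little over four minutes, consistent with the minute-level claim. The class name is hypothetical.

```java
// Worked arithmetic on the reported test figures (approximate values from the text).
final class Throughput {
    static final long STRINGS = 737_741_029L;     // number of MD5 strings tested
    static final long CONVERSION_SECONDS = 200L;  // approx. type-conversion time
    static final long LOGIC_SECONDS = 60L;        // approx. logical-operation time

    // Strings converted per second (integer division).
    static long conversionsPerSecond() {
        return STRINGS / CONVERSION_SECONDS;
    }

    // Total measured time of the two stages.
    static long totalSeconds() {
        return CONVERSION_SECONDS + LOGIC_SECONDS;
    }
}
```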
Example 2
The data processing method of the present invention will be further illustrated with reference to preferred embodiments.
Fig. 2 is a flow chart of another data processing method according to an embodiment of the present invention. As shown in fig. 2, the method may include the steps of:
Step S201: acquire a data source set.
In this embodiment, the data source set may include device numbers such as IDFAs and IMEIs.
Step S202: traverse the data source set, use a hash algorithm so that each encrypted data individual uniquely corresponds to a Long integer, and store the mapping in K-V form in the corresponding cluster cache database.
This builds on the mature compressed bitmap technique RoaringBitMap, which is based on the bitmap method. It can represent all 2^32 Int integers with a capacity threshold of 512 MB, but this type cannot support logical calculations at the existing scale of hundreds of millions of records, because the IDFAs in advertising services are 32-character MD5-hashed strings that do not match the Int type. The data in the data source set therefore needs a data type conversion: the data source set is traversed and a hash algorithm is applied so that each encrypted data individual uniquely corresponds to a Long integer, and the result is stored in K-V form in the corresponding cluster cache database.
In this embodiment, the hash algorithm may be Google's murmur3_128 with asLong(), and the K-V form may be, for example, 1684216861258721548 - 12385b750c02556e3d5ecb7b65d78b6d, without limitation. The cluster cache database of this embodiment, i.e. the RocksDB cluster, includes a plurality of cluster nodes (RocksDB1, RocksDB2, RocksDB3, ...), each configured to store one data source of the data source set.
In this embodiment, the above operation may be executed concurrently in multiple threads, depending on the performance of the machine traversing the data set, or may be deployed on a cluster; its time complexity is O(n).
Step S203: store the Long integers of one data set in a Map<upper 32 bits, RoaringBitMap(lower 32 bits)> data structure (hereinafter referred to as Map64): each Long (64 bits) is shifted to obtain its upper 32 bits and lower 32 bits, which are then stored in the Map64.
In this embodiment, one data set generates one Map64, which may represent, for example, a set of IDFAs; when the data volume is large, the memory consumed by Map64 grows linearly.
In this embodiment, traversing the values of a single Map64 only requires taking out the RoaringBitMap to obtain the lower 32 bits and the Map64 key to obtain the upper 32 bits, then computing (((long) high) << 32) | (low & 0xffffffffL) to recover the original value. Reading values from Map64 and the shift logic are very fast, so their cost is negligible; the overall time is essentially the traversal of all data, which depends on the data volume and the memory used for the calculation.
In step S204, logical operations are performed on any two data sets.
After each Long (64 bits) has been shifted into its upper 32 bits and lower 32 bits and stored in a Map64, the data can be searched and logically computed.
In this embodiment, the two data sets may be Map1 and Map2. Taking a logical OR of Map1 and Map2 as an example: first, Map1 is loaded into the computing memory; second, Map2 is traversed to obtain all its keys and values (each value being a RoaringBitMap); third, for each key, Map1.get(key) is evaluated, which returns only null or a RoaringBitMap; finally, if Map1.get(key) is null, the key-value pair is added to Map1, and if Map1.get(key) is a RoaringBitMap, it is combined with the RoaringBitMap returned by Map2.get(key) using or(), which takes almost no time. This yields the result set: a new data set produced by a logical operation on two data sets, each on the order of hundreds of millions of entries.
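Steps S201 to S205 can be tied together in one self-contained Java sketch. Every substitution is an assumption: a SHA-256-derived long replaces murmur3_128().asLong(), an in-memory HashMap replaces the RocksDB cluster, and TreeSet<Integer> replaces RoaringBitMap; `Pipeline` and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// End-to-end sketch of S201-S205 with in-memory stand-ins (see lead-in for assumptions).
final class Pipeline {
    final Map<Long, String> kv = new HashMap<>(); // S202: Long hash -> original identifier

    // Stand-in hash: first 8 bytes of SHA-256.
    static long hash(String id) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(id.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(d).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // S202 + S203: hash each identifier, record it in the K-V store, file it in a Map64.
    Map<Integer, TreeSet<Integer>> load(List<String> dataSource) {
        Map<Integer, TreeSet<Integer>> map64 = new HashMap<>();
        for (String id : dataSource) {
            long v = hash(id);
            kv.put(v, id);
            map64.computeIfAbsent((int) (v >>> 32), k -> new TreeSet<>()).add((int) v);
        }
        return map64;
    }

    // S204 (OR): union per key, copying map1 so the inputs are not mutated.
    static Map<Integer, TreeSet<Integer>> or(Map<Integer, TreeSet<Integer>> map1,
                                             Map<Integer, TreeSet<Integer>> map2) {
        Map<Integer, TreeSet<Integer>> result = new HashMap<>();
        map1.forEach((k, v) -> result.put(k, new TreeSet<>(v)));
        map2.forEach((k, v) -> result.merge(k, new TreeSet<>(v), (a, b) -> { a.addAll(b); return a; }));
        return result;
    }

    // S205: rebuild each long from (key, bitmap value) and look the identifier up again.
    List<String> backCheck(Map<Integer, TreeSet<Integer>> map64) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<Integer, TreeSet<Integer>> e : map64.entrySet()) {
            int high = e.getKey();
            for (int low : e.getValue()) {
                ids.add(kv.get((((long) high) << 32) | (low & 0xffffffffL)));
            }
        }
        return ids;
    }
}
```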
Other logical operations may be analogized to those described above.
And step S205, the result set is checked back.
In this embodiment, the final result obtained by the logic operation is a Long type set, and for the corresponding IDFA query, it is only necessary to store the corresponding Long integer data into the multi-node cluster cache database when performing the hash operation in step S202, so that the problem of back-check can be solved most efficiently.
In this embodiment, when the result set is checked back, the traversal value of a single Map64 only needs to take out the RoaringBitMap while traversing to obtain the low 32 bits, and then the key of Map64 is taken out to obtain the high 32 bits, and the operation of ((Long) high) < <32) | (low &0 xfffffffffl) is performed on the high 32 bits, so as to obtain the original value, an integer of Long type, wherein the time spent on the values and the shift logic operation is almost negligible, and thus the time overhead is close to o (n). And then, the obtained Long type data set is inquired through the rocksDB cluster, so that the initial data source is obtained.
Through the scheme of the embodiment, the logic screening calculation among large data volumes can be effectively solved, and particularly in the field of labels of advertisement calculation. The specific time spent depends on its data size, since the operations are only performed in different traversal data, while the real time spent overhead only occurs in two steps of type conversion and rocksDB cluster backcheck. With the above method of this embodiment, the total time overhead is controlled to the minute level, i.e., the time complexity is about O (n), thereby reducing the time complexity of O (n) 2, which is required by the conventional computation scheme, by one dimension.
It should be noted that, in this embodiment, the cluster cache database (RocksDB above) may be replaced by a Redis cluster, an Aerospike cluster, or the like, and the candidates can be tested to select the best option. If machine resources and budget permit, it may even be replaced by a Pika database running on SSDs, which could theoretically achieve better timeliness.
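One way to keep the backend swappable for such tests is to hide it behind a small interface. The sketch below is hypothetical: ReverseLookupStore, InMemoryStore and time_back_check are illustrative names, and only a dict-backed store is shown; adapters for RocksDB, Redis, Aerospike or Pika would have to be written against those clients' real APIs and are not shown here.

```python
import time
from typing import Dict, Iterable, Optional, Protocol


class ReverseLookupStore(Protocol):
    """Hypothetical interface for the cluster cache database used in the back-check."""

    def put(self, key: int, value: str) -> None: ...

    def get(self, key: int) -> Optional[str]: ...


class InMemoryStore:
    """Dict-backed stand-in; a real backend adapter would expose the same two methods."""

    def __init__(self) -> None:
        self._data: Dict[int, str] = {}

    def put(self, key: int, value: str) -> None:
        self._data[key] = value

    def get(self, key: int) -> Optional[str]:
        return self._data.get(key)


def time_back_check(store: ReverseLookupStore, keys: Iterable[int]) -> float:
    """Return wall-clock seconds spent looking up every key, to compare candidate backends."""
    start = time.perf_counter()
    for key in keys:
        store.get(key)
    return time.perf_counter() - start
```

Each candidate backend would be loaded with the same data and timed on the same key set, and the fastest one kept.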
It should be noted that the steps shown in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one presented here.
Example 3
The embodiment of the invention also provides a data processing device. It should be noted that the data processing apparatus of this embodiment may be used to execute the data processing method of the embodiment of the present invention.
Fig. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention. As shown in fig. 3, the data processing apparatus 30 may include: an acquisition unit 31, a conversion unit 32, a shift unit 33, and an operation unit 34.
The acquisition unit 31 is configured to acquire a data source set of a device, where each data source in the data source set is used to identify the device.
The conversion unit 32 is configured to convert the data type of each data source from a string type to a long integer type to obtain a plurality of first data.
The shift unit 33 is configured to shift each first data to obtain a plurality of second data.
The operation unit 34 is configured to perform a logical operation on each pair of second data to obtain a result set.
Optionally, the apparatus further comprises a storage unit configured to store each first data in the cluster cache database according to a target data structure.
Optionally, the storage unit comprises a storage module configured to store each first data in the cluster cache database in the form of key-value pairs.
Optionally, the apparatus further comprises a back-check unit configured to, after the logical operation has been performed on each pair of second data to obtain the result set, perform a reverse lookup of the data source set in the cluster cache database based on the result set.
Optionally, the shift unit 33 comprises a shift module configured to shift each first data to obtain first sub-data of a first data bit and second sub-data of a second data bit of each first data.
Optionally, the conversion unit 32 comprises a first operation module configured to perform a hash operation on each data source, so as to convert the data type of each data source from the string type to the long integer type and obtain a plurality of first data.
Optionally, the operation unit 34 comprises one of: a second operation module configured to perform an OR logical operation on each pair of second data to obtain the result set; a third operation module configured to perform an AND logical operation on each pair of second data to obtain the result set; and a fourth operation module configured to perform a NOT logical operation on each pair of second data to obtain the result set.
In this embodiment, the acquisition unit 31 acquires a data source set of the device, where each data source in the data source set is used to identify the device; the conversion unit 32 converts the data type of each data source from a string type to a long integer type to obtain a plurality of first data; the shift unit 33 shifts each first data to obtain a plurality of second data; and the operation unit 34 performs a logical operation on each pair of second data to obtain a result set. That is, for multiple data sets of the same magnitude and unit, the data types are converted and shifted before the logical operation is performed, so the time overhead is about O(n), one order lower than the O(n^2) of conventional computation schemes. This solves the technical problem of low data processing efficiency and achieves the technical effect of improving data processing efficiency.
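The division into units can be pictured as a small pipeline. The following Python sketch maps units 31 to 34 onto methods of a hypothetical DataProcessingApparatus class; it is an illustration under assumptions, not the patented apparatus: BLAKE2b stands in for the unspecified hash, built-in sets stand in for RoaringBitmaps, low halves are kept unsigned for simplicity, and NOT is interpreted as AND NOT.

```python
import hashlib
from typing import Callable, Dict, Iterable, Iterator, List, Set

# Hypothetical Map64 model: high 32 bits -> set of unsigned low 32 bits.
Map64 = Dict[int, Set[int]]
MASK32 = 0xFFFFFFFF

# Per-bitmap set operations; "not" is interpreted as AND NOT (assumption).
_OPS: Dict[str, Callable[[Set[int], Set[int]], Set[int]]] = {
    "or": lambda x, y: x | y,
    "and": lambda x, y: x & y,
    "not": lambda x, y: x - y,
}


class DataProcessingApparatus:
    """Illustrative mapping of units 31 to 34 onto methods."""

    def acquire(self, sources: Iterable[str]) -> List[str]:
        # Acquisition unit 31: collect the device identifiers (e.g. IDFAs).
        return list(sources)

    def convert(self, sources: List[str]) -> List[int]:
        # Conversion unit 32: string -> signed 64-bit integer (BLAKE2b is an assumed hash).
        return [
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big", signed=True
            )
            for s in sources
        ]

    def shift(self, values: List[int]) -> Map64:
        # Shift unit 33: high 32 bits become the key, low 32 bits go into that key's bitmap.
        m: Map64 = {}
        for v in values:
            m.setdefault(v >> 32, set()).add(v & MASK32)
        return m

    def operate(self, a: Map64, b: Map64, op: str) -> Map64:
        # Operation unit 34: combine bitmaps key by key and drop empty results.
        fn = _OPS[op]
        result: Map64 = {}
        for key in set(a) | set(b):
            merged = fn(a.get(key, set()), b.get(key, set()))
            if merged:
                result[key] = merged
        return result

    @staticmethod
    def to_longs(m: Map64) -> Iterator[int]:
        # Inverse of shift(): rebuild each original long from its key and low bits.
        for high, lows in m.items():
            for low in lows:
                yield (high << 32) | low

    def process(self, sources_a: Iterable[str], sources_b: Iterable[str], op: str) -> List[int]:
        a = self.shift(self.convert(self.acquire(sources_a)))
        b = self.shift(self.convert(self.acquire(sources_b)))
        return sorted(self.to_longs(self.operate(a, b, op)))
```

The sorted list of longs returned by process() is what a back-check unit would then resolve to the original identifiers.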
Example 4
The embodiment of the invention also provides a storage medium. The storage medium includes a stored program, wherein the apparatus in which the storage medium is located is controlled to execute the data processing method in the embodiment of the present invention when the program runs.
Example 5
The embodiment of the invention also provides a processor. The processor is configured to run a program, wherein the program, when running, executes the data processing method of the embodiment of the present invention.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A data processing method, comprising:
acquiring a data source set of equipment, wherein each data source in the data source set is used for identifying the equipment, the magnitude of each data source is the same, and the magnitude is greater than a target threshold;
converting the data type of each data source from a character string type to a long integer type to obtain a plurality of first data;
performing shift processing on each first data to obtain a plurality of second data; and
performing a logical operation on every two second data to obtain a result set.
2. The method of claim 1, further comprising:
storing each first data in a cluster cache database according to a target data structure.
3. The method of claim 2, wherein storing each of the first data in the cluster cache database according to the target data structure comprises:
storing each first data in the cluster cache database in the form of key-value pairs.
4. The method of claim 2, wherein after performing a logical operation on every two of the second data to obtain a result set, the method further comprises:
performing a reverse lookup of the data source set in the cluster cache database based on the result set.
5. The method of claim 1, wherein each of the second data comprises first sub-data of a first data bit and second sub-data of a second data bit, and wherein shifting each of the first data to obtain a plurality of second data comprises:
performing shift processing on each first data to obtain the first sub-data of the first data bit and the second sub-data of the second data bit of each first data.
6. The method of claim 1, wherein converting the data type of each data source from a string type to a long integer type to obtain a plurality of first data comprises:
performing a hash operation on each data source so as to convert the data type of each data source from the string type to the long integer type and obtain the plurality of first data.
7. The method of claim 1, wherein performing a logical operation on every two of the second data to obtain a result set comprises one of:
performing an OR logical operation on every two second data to obtain the result set;
performing an AND logical operation on every two second data to obtain the result set; or
performing a NOT logical operation on every two second data to obtain the result set.
8. A data processing apparatus, comprising:
an acquisition unit, configured to acquire a data source set of a device, wherein each data source in the data source set is used to identify the device;
a conversion unit, configured to convert the data type of each data source from a string type to a long integer type to obtain a plurality of first data;
a shift unit, configured to shift each first data to obtain a plurality of second data; and
an operation unit, configured to perform a logical operation on every two second data to obtain a result set.
9. A storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the storage medium is located to perform the method of any one of claims 1 to 7.
10. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 7.
CN201911108257.8A 2019-11-13 2019-11-13 Data processing method, device, storage medium and processor Active CN110837510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911108257.8A CN110837510B (en) 2019-11-13 2019-11-13 Data processing method, device, storage medium and processor

Publications (2)

Publication Number Publication Date
CN110837510A 2020-02-25
CN110837510B 2020-08-07

Family

ID=69576422

Country Status (1)

Country Link
CN (1) CN110837510B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant