CN107133329B - Data processing method, data processing apparatus, and storage medium - Google Patents

Info

Publication number
CN107133329B
Authority
CN
China
Legal status
Active
Application number
CN201710321319.8A
Other languages
Chinese (zh)
Other versions
CN107133329A (en)
Inventor
付兴旺
Current Assignee
Shenzhen Yayue Technology Co.,Ltd.
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710321319.8A
Publication of CN107133329A
Application granted
Publication of CN107133329B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Abstract

The invention discloses a data processing method, a data processing apparatus, and a storage medium. The method first parses real-time data to obtain parsed data. It then queries the cache, using the globally unique identifier in the parsed data, to determine whether the corresponding stored value is empty. If the stored value is not empty, the data is deduplicated; if it is empty, the stored value corresponding to the protocol number is updated. After the update, the stored values corresponding to the multiple protocol numbers of the intersection specified in the statistical instruction are traversed, and when none of them is empty, the statistical indicator of the intersection is counted. The globally unique identifier is used as the primary key of the cache, the feature value of each piece of parsed real-time data is mapped into the stored value under that key, and deduplicated intersection counting is performed by changing designated bits of the stored value. As a result, little memory is occupied and computational efficiency is improved.

Description

Data processing method, data processing apparatus, and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a data processing method, a data processing apparatus, and a storage medium.
Background
With the advent of the big data era, the variety of data has grown explosively, bringing unprecedented opportunities and challenges to the data analysis industry. Deduplicated counting over an intersection is one of the most basic and most frequently used operations in big data statistics.
The existing method for deduplicated intersection counting works roughly as follows, taking protocol statistics as an example. First, N sets are created, one per protocol, and all primary keys (Keys) used for deduplication are cached in these sets. Each received piece of data is then stored as a stored value (Value); the set corresponding to its protocol is located by the protocol number, and it is checked whether the data's primary key already exists in that set. If it does not, the other N-1 sets are traversed to check whether the primary key is present in all of them, and if so, the statistical count is incremented by 1.
The existing method has the following drawbacks: a large number of primary keys must be cached, which consumes a great deal of memory; and multiple sets must be cross-referenced, which is computationally inefficient.
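For contrast, the prior-art procedure can be sketched in Python roughly as below. This is an illustrative reconstruction rather than code from the source: the class name is hypothetical, and it assumes each key is added to its protocol's set the first time it is seen. A user who reports every protocol has its key stored N times, and each new key triggers lookups in the other N-1 sets.

```python
# Sketch of the prior-art approach: one set of cached primary keys per
# protocol. Class and method names are hypothetical.
class SetIntersectionCounter:
    def __init__(self, protocols):
        self.protocols = list(protocols)
        self.sets = {p: set() for p in self.protocols}  # N sets, one per protocol
        self.count = 0

    def receive(self, protocol, key):
        keys = self.sets[protocol]
        if key in keys:
            return  # key already cached for this protocol: duplicate
        keys.add(key)  # assumption: the key is cached when first seen
        # Cross-reference the other N-1 sets; count when the key is in all of them.
        if all(key in self.sets[p] for p in self.protocols if p != protocol):
            self.count += 1


counter = SetIntersectionCounter([8000, 8001])
for protocol, key in [(8000, "u1"), (8001, "u1"), (8001, "u1"), (8000, "u2")]:
    counter.receive(protocol, key)
print(counter.count)  # prints 1
```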
Disclosure of Invention
The present invention aims to provide a data processing method, a data processing apparatus, and a storage medium that reduce memory consumption and improve computational efficiency.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
a method of data processing, comprising:
receiving real-time data, and parsing the real-time data according to a protocol number contained in the real-time data to generate parsed data;
querying, according to the globally unique identifier in the parsed data, a stored value corresponding to the protocol number;
performing data deduplication when the stored value is not empty;
updating the stored value corresponding to the protocol number when the stored value is empty; and
after the update, traversing the stored values respectively corresponding to the plurality of protocol numbers of the intersection specified in a statistical instruction, and counting the statistical indicator of the intersection when none of these stored values is empty.
In order to solve the above technical problems, embodiments of the present invention further provide the following technical solutions:
a data processing apparatus comprising:
a parsing module, configured to receive real-time data and parse it according to a protocol number contained in the real-time data to generate parsed data;
a query module, configured to query, according to the globally unique identifier in the parsed data, a stored value corresponding to the protocol number;
a deduplication module, configured to perform data deduplication when the stored value is not empty;
an update module, configured to update the stored value corresponding to the protocol number when the stored value is empty; and
a counting module, configured to, after the update, traverse the stored values respectively corresponding to the plurality of protocol numbers of the intersection specified in the statistical instruction, and count the statistical indicator of the intersection when none of these stored values is empty.
In order to solve the above technical problems, embodiments of the present invention further provide the following technical solutions:
a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the above-described data processing method.
In the data processing method, data processing apparatus, and storage medium provided by the embodiments of the present invention, real-time data is first parsed to obtain parsed data. The cache is then queried, using the globally unique identifier in the parsed data, to determine whether the corresponding stored value is empty, and depending on the result the data is either deduplicated or the stored value is updated. The globally unique identifier is used as the primary key of the cache, the feature value of each piece of parsed real-time data is mapped into the stored value under that key, and deduplicated intersection counting is performed by changing designated bits of the stored value. As a result, little memory is occupied and computational efficiency is improved.
Drawings
The technical solution and other advantages of the present invention will become apparent from the following detailed description of specific embodiments, read in conjunction with the accompanying drawings.
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of a data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating an operating principle of a data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic flow chart of a data processing method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating an operating principle of a data processing method according to an embodiment of the present invention;
Fig. 6 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 7 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic application diagram of a data processing method and a data processing apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a communication device according to an embodiment of the present invention.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present invention are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the invention and should not be taken as limiting the invention with regard to other embodiments that are not detailed herein.
In the description that follows, unless otherwise indicated, specific embodiments of the present invention are described with reference to steps and symbols of operations executed by one or more computers. These steps and operations will therefore at times be referred to as computer-executed, meaning that the computer's processing unit performs them on electronic signals that represent data in structured form. These operations transform the data or maintain it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. However, although the principles of the invention are described in the foregoing terms, they are not meant to be limited to these specific details; those skilled in the art will recognize that various steps and operations described below may also be implemented in hardware.
The terms "module" and "unit" as used herein may be regarded as software objects executing on the computing system. The various components, modules, engines, and services described herein may be regarded as objects implemented on the computing system. The apparatus and method described herein are preferably implemented in software but may also be implemented in hardware; both fall within the scope of the present invention.
Referring to Fig. 1, a flow chart of a data processing method according to an embodiment of the invention is shown.
The data processing method is applied to a real-time data processing platform and is used to process, in real time, data that involves intersection operations.
In step S101, real-time data is received and parsed according to a protocol number included in the real-time data to generate parsed data.
The format of the protocol data reported by the client is predefined as: protocol number | globally unique identifier of the user | protocol content.
The protocol number specifies the protocol specification that applies to the current real-time data; it is an integer and can be represented by a single number. For example, when the client starts, it reports first information that includes the user's environment data. The start protocol 8000 may be used to define how the first information is organized. After receiving the first information, the backend parses it according to the start protocol specification corresponding to protocol number 8000 and thereby obtains the user's environment data. As another example, when the client is closed, it reports second information that includes the user's usage data. The exit protocol 8001 may be used to define how the second information is organized. After receiving the second information, the backend parses it according to the exit protocol specification corresponding to protocol number 8001 and thereby obtains the user's usage data.
The globally unique identifier (GUID), typically a 32-character string, identifies a unique user. The protocol content is a custom string.
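A minimal sketch of splitting one record in this format, assuming the fields are separated by a literal `|` with no surrounding spaces and that the protocol number is a decimal integer. `ParsedRecord` and `parse_record` are hypothetical names, not part of the source.

```python
from dataclasses import dataclass


@dataclass
class ParsedRecord:
    protocol_number: int  # selects the protocol specification, e.g. 8000
    guid: str             # globally unique identifier of the user
    content: str          # custom protocol content


def parse_record(line: str) -> ParsedRecord:
    # Split on the first two '|' only, so the content may itself contain '|'
    # (an assumption: the source does not say how content is escaped).
    protocol_str, guid, content = line.split("|", 2)
    return ParsedRecord(int(protocol_str), guid, content)


record = parse_record("8000|0123456789abcdef0123456789abcdef|os=android|ver=1.2")
print(record.protocol_number, record.content)  # prints 8000 os=android|ver=1.2
```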
In step S102, a stored value corresponding to the protocol number is queried according to the globally unique identifier in the parsed data.
Here the cache format is defined as {globally unique identifier, stored value}: the globally unique identifier is the primary key (Key) and is mapped to the stored value (Value) in the cache, forming a key-value pair that assists in computing the intersection.
The stored value is held in memory as a bit array (Bitmap). One byte is 8 bits, and each bit is a binary digit whose value is either 0 or 1. The bit array contains multiple bits, and whether a designated bit is 0 or 1 can be determined by a bit operation.
A bit array is a sequence of binary bits, and using one saves a significant amount of memory. For example, one integer (int) occupies 4 bytes, and one byte contains 8 bits. If the reporting flag of each protocol is stored in one bit, with 0 meaning the protocol has not been reported and 1 meaning it has, one byte can store 8 states, which is enough to compute the intersection of 8 protocols. If a set of protocol numbers were stored instead, each protocol number would take one integer, so 8 protocols would occupy 32 bytes. The bit array therefore greatly saves memory and processing resources.
For example, if 3 protocols with protocol numbers 8000, 8100, and 8102 are counted, the bit array can be allocated 1 byte. The initial state is [00000000]; after protocol 8000 is reported it becomes [00000001], and after protocol 8102 is reported it becomes [00000101]. Protocol 8100 has not been reported, so its protocol bit remains 0.
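The bit states in this example can be reproduced with ordinary integer bit operations. The sketch assumes, as the example implies, that 8000, 8100, and 8102 map to bits 0, 1, and 2 counting from the least significant bit, and it uses a Python int as the bit array; `PROTOCOL_BITS`, `report`, and `is_reported` are hypothetical names.

```python
# Hypothetical mapping of protocol number -> bit position (bit 0 = rightmost).
PROTOCOL_BITS = {8000: 0, 8100: 1, 8102: 2}


def report(stored_value, protocol_number):
    """Return the stored value with the protocol's reporting bit set to 1."""
    return stored_value | (1 << PROTOCOL_BITS[protocol_number])


def is_reported(stored_value, protocol_number):
    """True if the protocol's bit is 1, i.e. the stored value is not empty."""
    return ((stored_value >> PROTOCOL_BITS[protocol_number]) & 1) == 1


value = 0                          # initial state [00000000]
value = report(value, 8000)        # [00000001]
value = report(value, 8102)        # [00000101]
print(format(value, "08b"))        # prints 00000101
print(is_reported(value, 8100))    # prints False

# Space for 8 protocols: 1 byte as a bit array vs. one 4-byte int per number.
n = 8
print((n + 7) // 8, 4 * n)         # prints 1 32
```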
In step S103, it is determined whether the stored value is empty.
In other words, this step determines whether the protocol bit corresponding to the protocol number is 0.
If the stored value is not empty, step S104 is executed; if it is empty, step S105 is executed.
In step S104, the parsed data is discarded for data deduplication.
In step S105, the stored value corresponding to the protocol number in the cache is updated.
In step S106, the stored values corresponding to the plurality of protocol numbers of the intersection in the statistical instruction are traversed, and when none of them is empty, the statistical indicator of the intersection is counted.
With the bit-array storage scheme, the intersection of two protocols can be computed with a single set, which saves memory. Each update is a simple bit operation, which makes the method more efficient.
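The decision logic of steps S102 to S106 can be sketched as one Python function. This is a minimal illustration, not the patent's code: `process`, `bit_of`, and the `(action, counted)` return value are hypothetical, a plain dict stands in for the cache, and a Python int stands in for the bit array.

```python
def process(cache, bit_of, intersection, guid, protocol_number):
    """Steps S102-S106 for one parsed record (hypothetical helper).

    cache:        dict GUID -> stored value (a Python int used as a bit array)
    bit_of:       dict protocol number -> bit position
    intersection: protocol numbers of the intersection in the statistical instruction
    Returns (action, counted).
    """
    stored = cache.get(guid, 0)          # S102: a missing key means all bits are 0
    bit = 1 << bit_of[protocol_number]
    if stored & bit:                     # S103: protocol bit already 1 -> not empty
        return "discarded", False        # S104: duplicate, discard it
    stored |= bit                        # S105: update the stored value
    cache[guid] = stored
    full = 0                             # S106: mask of all intersection bits
    for p in intersection:
        full |= 1 << bit_of[p]
    # Count only when this update set an intersection bit and completed the
    # mask; bits only go from 0 to 1, so each GUID is counted at most once.
    counted = bool(bit & full) and (stored & full) == full
    return "updated", counted


cache, bit_of = {}, {8000: 0, 8001: 1}
join_count = 0
for guid, protocol in [("u1", 8000), ("u1", 8000), ("u1", 8001), ("u2", 8001)]:
    action, counted = process(cache, bit_of, [8000, 8001], guid, protocol)
    join_count += counted
print(join_count, cache)  # prints 1 {'u1': 3, 'u2': 2}
```

Requiring the newly set bit to belong to the intersection keeps a GUID from being counted again if it later reports a protocol outside the intersection.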
Referring to Fig. 2, another flow chart of the data processing method according to the embodiment of the invention is shown.
The data processing method is applied to a real-time data processing platform and is used to process, in real time, data that involves intersection operations.
In step S201, a protocol data format of the real-time data is defined.
Specifically, this step may be performed as follows:
(1) acquiring a protocol number, a globally unique identifier, and protocol content, and setting the protocol data format from them.
The protocol data format is the format of the real-time data reported by the client and can be expressed as: protocol number | globally unique identifier of the user | protocol content. The protocol number specifies the protocol specification that applies to the current real-time data; it is an integer and can be represented by a single number. The globally unique identifier (GUID), typically a 32-character string, identifies a unique user. The protocol content is a custom string.
(2) setting, in the protocol data format, the protocol identification bit where the protocol number is located.
Setting the protocol identification bit includes defining the bits occupied by the multiple protocol numbers in the bit array, with each protocol corresponding to M bits, where M is a positive integer. If the bit corresponding to a protocol number has a first value, the stored value corresponding to that protocol number is empty; if it has a second value, the stored value is not empty.
For example, each protocol may correspond to 1 bit in the bit array: if the bit corresponding to a protocol number is 0, the stored value corresponding to that protocol number is empty; if it is 1, the stored value is not empty.
In step S202, a cache format in the memory is defined.
Specifically, this step may be performed as follows:
(1) setting the globally unique identifier as the primary key (Key) in the cache;
(2) setting the protocol number and the protocol content as the stored value (Value), which has a bit array structure.
The cache format can thus be expressed as {globally unique identifier, stored value}; the two are mapped (map) to each other to form a key-value pair that assists in computing the intersection.
In step S203, real-time data is received and parsed according to a protocol number included in the real-time data to generate parsed data.
Specifically, this step may be performed as follows:
(1) reading the protocol number from the protocol identification bit of the real-time data;
(2) matching the corresponding protocol specification according to the protocol number; and
(3) parsing the real-time data according to the protocol specification to generate parsed data conforming to the protocol data format.
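Steps (1) to (3) amount to a lookup from protocol number to a parser. In the sketch below, the registry `PROTOCOL_SPECS`, the helper names, and the `key=value;...` content layout are all invented for illustration; only the protocol numbers 8000 (start) and 8001 (exit) come from the text.

```python
def parse_kv(content):
    # Hypothetical content layout: "key=value" pairs separated by ';'.
    return dict(item.split("=", 1) for item in content.split(";") if item)


# Hypothetical registry of protocol specifications keyed by protocol number.
PROTOCOL_SPECS = {
    8000: ("start", parse_kv),  # start protocol: user environment data
    8001: ("exit", parse_kv),   # exit protocol: user usage data
}


def parse_realtime(line):
    protocol_str, guid, content = line.split("|", 2)
    protocol_number = int(protocol_str)           # (1) read the protocol number
    spec = PROTOCOL_SPECS.get(protocol_number)    # (2) match its specification
    if spec is None:
        raise ValueError(f"no specification for protocol {protocol_number}")
    name, parse_content = spec
    return {                                      # (3) parsed data
        "protocol_number": protocol_number,
        "protocol": name,
        "guid": guid,
        "content": parse_content(content),
    }


parsed = parse_realtime("8001|abc123|duration=120;pages=3")
print(parsed["protocol"], parsed["content"])  # prints exit {'duration': '120', 'pages': '3'}
```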
In step S204, a stored value corresponding to the protocol number is queried according to the globally unique identifier in the parsed data.
In step S205, it is determined whether the stored value corresponding to the protocol number is empty.
In other words, this step determines whether the protocol bit corresponding to the protocol number is 0.
If the stored value is not empty, step S206 is executed; if it is empty, step S207 is executed.
In step S206, the parsed data is discarded for data deduplication.
When the stored value is not empty, the parsed data for the current protocol number has already been counted, so it is discarded to avoid double counting.
In step S207, the stored value corresponding to the protocol number in the cache is updated.
Specifically, this step may be performed as follows:
(1) creating a new stored value that includes the protocol number and the protocol content in the real-time data; and
(2) setting the globally unique identifier as the primary key corresponding to the new stored value, so that a key-value pair is formed in the cache and the current parsed data is counted.
In step S208, the stored values corresponding to the plurality of protocol numbers of the intersection in the statistical instruction are traversed, and when none of them is empty, the statistical indicator of the intersection is counted.
Referring to Fig. 3, a schematic diagram of an operating principle of a data processing method according to an embodiment of the present invention is shown.
Taking the intersection of two protocols (protocol 1 & protocol 2) as an example, the intersection operation of the invention proceeds as follows.
Two protocols participate in the intersection operation. Once data for all protocols has been reported for a given globally unique identifier (GUID), the statistical count is incremented by one. The specific process is as follows:
I: Define the bit occupied in the bit array (bitmap) by each protocol participating in the operation. Each protocol corresponds to 1 bit, so N protocols occupy N bits (N/8 bytes, rounded up).
II: Initialize an empty cache structure and define the calculation result JOIN_UV = 0.
III: When data of some protocol P is received, look up the identification bit x of protocol P according to the correspondence defined in step I, and look up the corresponding bitmap in the cache according to the user's GUID.
IV: If no bitmap is found in step III, create a new bitmap, denoted nv, with all bits initially 0; set the x-th bit of nv to 1; and add the key-value pair {guid: nv} to the cache.
V: If a bitmap is found in step III, denote it ov and read its x-th bit value b.
VI: If b in step V is 1, everything remains unchanged.
VII: If b in step V is 0, check whether the bits corresponding to the other N-1 protocols are all 1. If so, increment JOIN_UV by 1; otherwise leave the result unchanged. Then set the x-th bit of ov to 1.
VIII: Continue receiving data, repeating from step III, and output JOIN_UV in real time to obtain the real-time intersection statistics.
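Steps I to VIII can be sketched as a small Python class. This is an illustration under stated assumptions, not the patent's implementation: `JoinUVCounter` and `receive` are hypothetical names, a Python int stands in for the bitmap, bit positions follow the order in which the protocols are listed, and the constructor rejects N < 2 because step IV never increments JOIN_UV.

```python
class JoinUVCounter:
    """Steps I-VIII for the intersection of N protocols (hypothetical names)."""

    def __init__(self, protocols):
        if len(protocols) < 2:
            raise ValueError("steps I-VIII assume at least two protocols")
        # I: one bit per protocol, assigned in the order the protocols are listed.
        self.bit_of = {p: i for i, p in enumerate(protocols)}
        self.all_bits = (1 << len(protocols)) - 1
        # II: empty cache (GUID -> bitmap) and JOIN_UV = 0.
        self.cache = {}
        self.join_uv = 0

    def receive(self, protocol, guid):
        # III: identification bit x of protocol P and the user's bitmap.
        x = self.bit_of[protocol]
        ov = self.cache.get(guid)
        if ov is None:
            # IV: new bitmap nv, all bits 0, then bit x set to 1.
            self.cache[guid] = 1 << x
            return self.join_uv
        # V: b is the x-th bit of the existing bitmap ov.
        b = (ov >> x) & 1
        if b == 1:
            return self.join_uv  # VI: duplicate, nothing changes
        # VII: count if the other N-1 protocol bits are all 1, then set bit x.
        others = self.all_bits & ~(1 << x)
        if (ov & others) == others:
            self.join_uv += 1
        self.cache[guid] = ov | (1 << x)
        return self.join_uv  # VIII: current real-time result


counter = JoinUVCounter([1, 2])
stream = [(1, "a"), (2, "a"), (1, "a"), (2, "b"), (1, "b"), (1, "c")]
results = [counter.receive(p, g) for p, g in stream]
print(results)  # prints [0, 1, 1, 1, 2, 2]
```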
Referring to Fig. 4, a flow chart of a data processing method according to an embodiment of the invention is shown.
The data processing method is applied to a real-time data processing platform and is used to process, in real time, data that involves intersection operations.
In step S401, a number N of protocols participating in intersection operation is obtained from the statistical instruction, where N is a positive integer.
In step S402, it is determined whether the number of protocols is greater than 2.
When the number N of protocols is 2, step a is executed, that is, step S101 in Fig. 1 or step S203 in Fig. 2; when N is greater than 2, step S403 is executed.
In step S403, the intersection operation is split into T two-protocol intersections, which serve as T statistical indicators, and the corresponding T protocol sets are recorded. Each protocol set stores the protocol numbers participating in its intersection operation, and T is a positive integer.
In step S404, the real-time data is received and parsed according to the protocol number included in the real-time data to generate parsed data.
The format of the protocol data reported by the client may be defined first, as follows: protocol number | globally unique identifier of the user | protocol content.
The protocol number specifies the protocol specification that applies to the current real-time data; it is an integer and can be represented by a single number. The globally unique identifier (GUID), typically a 32-character string, identifies a unique user. The protocol content is a custom string.
In step S405, the stored value corresponding to the protocol number is queried according to the globally unique identifier in the parsed data.
In step S406, it is determined whether the stored value is empty.
In other words, this step determines whether the protocol bit corresponding to the protocol number is 0.
If the stored value is not empty, step S407 is executed; if it is empty, step S408 is executed.
In step S407, the parsed data is discarded for data deduplication.
In step S408, the stored value corresponding to the protocol number in the cache is updated.
In step S409, after the updating, the protocol set is traversed, a plurality of subsets including the protocol number are filtered, and when the stored values corresponding to the subsets are not empty, the statistical indexes in the intersection are counted.
In step S410, it is determined whether the index is a single statistical index.
The single statistical index means that the number of the intersection of the double protocols is 1.
If yes, executing step S411; if not, go to step S412.
In step S411, statistical data is output in real time according to the stored values of the protocol numbers that make up the intersection.
In step S412, statistical data is output in real time according to the count values of the multiple statistical indexes.
This approach uses less memory and computation when multiple groups of data take part in multiple intersection operations. In real-time statistics, several intersection operations often share a common set of data. The invention uses the shared data as the reference bit, so that multiple intersection operations are completed in a single statistical pass. The shared data stream and its storage are used in common, which saves resources and improves efficiency.
With bit-array storage, a two-protocol intersection needs only a single set (cache), which reduces memory consumption. Each data update is a simple bit operation, which is more efficient.
The data processing method provided by this embodiment of the invention first parses the real-time data to obtain parsed data. It then uses the globally unique identifier in the parsed data to query the cache and determine whether the corresponding stored value is empty, and either deduplicates the data or updates the stored value according to the query result. The globally unique identifier is used as the primary key in the cache. After parsing, the characteristic value of each real-time data record is mapped to the stored value under that primary key, and deduplicated intersection counting is performed by tracking changes to specified bits of the stored value. This keeps memory usage small and can improve computational efficiency.
Fig. 5 is a schematic diagram illustrating an operating principle of a data processing method according to an embodiment of the present invention.
Taking intersection operations over 3 protocols (protocol 1 ∩ protocol 2 and protocol 1 ∩ protocol 3) as an example, Fig. 5 shows the calculation flow.
N protocols (N > 2) participate in the intersection operation, and several statistical indexes are calculated in one statistical pass; these statistical indexes share some protocol data. For example, protocols 1, 2, and 3 participate in the intersection operation. Statistical index 1 is the intersection of protocol 1 and protocol 2, and statistical index 2 is the intersection of protocol 1 and protocol 3. The conventional approach computes the two statistical indexes separately, which causes two problems. Problem 1: computing the two statistical indexes separately requires two cache regions, doubling memory consumption. Problem 2: the data stream of protocol 1 has to be distributed twice and takes part in two computations, consuming twice the network bandwidth and CPU. The larger the data volume of protocol 1, the more pronounced these drawbacks become.
The data processing method provided by the embodiment of the invention comprises the following specific processes:
I: Define the bit occupied in the bit array (bitmap) by each protocol participating in the operation. Each protocol corresponds to 1 bit, so N protocols occupy N bits (⌈N/8⌉ bytes).
II: For the T statistical indexes to be calculated, define the result array [UV1, …, UVT]. Record T sets, each storing the protocol numbers participating in the corresponding computation, [{P1, …, Px}, …, {P1, …, Py}], denoted P_Set.
III: When data of any protocol P is received, look up the identification bit x of protocol P using the correspondence defined in step I. Then look up the corresponding bitmap in the cache using the user's GUID.
IV: If the bitmap queried in step III does not exist, create a new bitmap, denoted nv, with all bits initially 0. Set the x-th bit of nv to 1, and add the key-value pair {guid: nv} to the cache.
V: If the bitmap queried in step III exists, denote it ov and obtain its x-th bit value b.
VI: If b in step V is 1, leave everything unchanged.
VII: If b in step V is 0, set the x-th bit of ov to 1 and traverse P_Set. For each set S that contains the protocol of the current data, read the bit of every protocol number in S at the position defined in step I. If all of these bits are 1, increment the statistical index value corresponding to S by 1.
VIII: Continue receiving data and repeat from step III to produce real-time results.
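Steps I to VIII can be sketched in Python as follows, under stated assumptions. The cache is an in-process `dict` rather than a real cache service, protocol i is assigned bit i in the order given, and the class and method names are hypothetical. All T statistical indexes share a single `{guid: bitmap}` cache, so the protocol 1 stream is consumed once for both indexes. This is the point of Problems 1 and 2 above.

```python
from typing import Dict, Iterable, List


class SharedBitmapCounter:
    """Sketch of steps I-VIII: one bitmap per GUID, shared by all T indexes."""

    def __init__(self, protocols: List[int], index_sets: Iterable[Iterable[int]]) -> None:
        # Step I: protocol -> bit position; N protocols need ceil(N/8) bytes.
        self.bit_of: Dict[int, int] = {p: i for i, p in enumerate(protocols)}
        self.nbytes = (len(protocols) + 7) // 8
        # Step II: P_Set (one protocol set per index) and results [UV1, ..., UVT].
        self.p_set = [frozenset(s) for s in index_sets]
        self.uv = [0] * len(self.p_set)
        self.cache: Dict[str, int] = {}          # {guid: bitmap}

    def _bit(self, bitmap: int, protocol: int) -> int:
        return (bitmap >> self.bit_of[protocol]) & 1

    def receive(self, guid: str, protocol: int) -> None:
        x = self.bit_of[protocol]                # step III: look up bit x
        ov = self.cache.get(guid)                # step III: look up bitmap by GUID
        if ov is None:
            # Step IV: new bitmap with only bit x set. Each index covers two or
            # more protocols, so a fresh bitmap cannot complete an intersection.
            self.cache[guid] = 1 << x
            return
        if (ov >> x) & 1:                        # steps V-VI: duplicate, ignore
            return
        ov |= 1 << x                             # step VII: record protocol P
        self.cache[guid] = ov
        for t, s in enumerate(self.p_set):       # step VII: check affected sets
            if protocol in s and all(self._bit(ov, p) for p in s):
                self.uv[t] += 1


# Fig. 5 example: index 1 = protocol 1 ∩ protocol 2, index 2 = protocol 1 ∩ protocol 3.
counter = SharedBitmapCounter([1, 2, 3], [{1, 2}, {1, 3}])
events = [("u1", 1), ("u1", 2), ("u1", 2), ("u2", 3), ("u2", 1), ("u3", 2), ("u1", 3)]
for guid, proto in events:                       # step VIII: keep consuming data
    counter.receive(guid, proto)
print(counter.uv, len(counter.cache), counter.nbytes)
```

Only the sets containing the arriving protocol are re-checked. A counter therefore increases exactly once per user, at the moment the last missing bit of that intersection is set.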
Referring to fig. 6, a block diagram of a data processing apparatus according to an embodiment of the present invention is shown.
The data processing apparatus 600 comprises: a parsing module 61, a query module 62, a deduplication module 63, an update module 64, and a statistics module 65.
The parsing module 61 is configured to receive the real-time data and parse it according to the protocol number contained in the real-time data to generate parsed data.
The query module 62 is connected to the parsing module 61. It is configured to query the stored value corresponding to the protocol number according to the globally unique identifier in the parsed data.
The deduplication module 63 is connected to the query module 62. It is configured to discard the parsed data, thereby deduplicating the data, when the stored value is not empty.
The update module 64 is connected to the query module 62. It is configured to update the stored value corresponding to the protocol number in the cache when the stored value is empty.
The statistics module 65 is connected to the update module 64. After the update, it traverses the stored values corresponding to each of the protocol numbers of the intersection specified in the statistical instruction, and counts the statistical index of the intersection when none of these stored values is empty.
The data processing apparatus provided by this embodiment of the invention first parses the real-time data to obtain parsed data. It then uses the globally unique identifier in the parsed data to query the cache and determine whether the corresponding stored value is empty, and either deduplicates the data or updates the stored value according to the query result. The globally unique identifier is used as the primary key in the cache. After parsing, the characteristic value of each real-time data record is mapped to the stored value under that primary key, and deduplicated intersection counting is performed by tracking changes to specified bits of the stored value. This keeps memory usage small and can improve computational efficiency.
Referring to fig. 7, a schematic block diagram of a data processing apparatus according to an embodiment of the invention is shown.
The data processing apparatus 700, comprising: format module 71, storage module 72, quantity module 73, parsing module 74, splitting module 75, query module 76, deduplication module 77, update module 78, and statistics module 79.
The format module 71 is configured to obtain a protocol number, a globally unique identifier, and protocol content, and to set a protocol data format from them. Within the protocol data format, it also sets the protocol identification bit where the protocol number is located.
The storage module 72 is configured to set the globally unique identifier as the primary key and to set the protocol number and the protocol content as the stored value, the stored value having the structure of a bit array.
The quantity module 73 is configured to obtain, from the statistical instruction, the number N of protocols participating in the intersection operation, where N is a positive integer.
The parsing module 74 is connected to the quantity module 73. When the number N of protocols is 2, it receives the real-time data and parses it according to the protocol number contained in the real-time data to generate parsed data.
The parsing module 74 includes a number unit 741, a specification unit 742, and a parsing unit 743. The number unit 741 is configured to read the protocol number from the protocol identification bit of the real-time data. The specification unit 742 is configured to match the corresponding protocol specification according to the protocol number. The parsing unit 743 is configured to parse the real-time data according to the protocol specification and generate parsed data conforming to the protocol data format.
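Units 741 to 743 can be sketched in Python, but the record layout here is entirely hypothetical because the source does not define it. The sketch assumes a 4-byte big-endian protocol number at a fixed offset (the "protocol identification bit"), followed by a 16-byte GUID and then protocol-specific content. `SPECS`, `Parsed`, and the two decoders are illustrative names, not part of the patent.

```python
import json
import struct
import uuid
from typing import Callable, Dict, NamedTuple

# Hypothetical record layout: protocol number (uint32, big-endian), 16-byte GUID, content.
HEADER = struct.Struct(">I16s")


class Parsed(NamedTuple):
    protocol: int
    guid: str
    content: dict


# Hypothetical protocol specifications: protocol number -> content decoder.
SPECS: Dict[int, Callable[[bytes], dict]] = {
    8000: lambda body: json.loads(body.decode("utf-8")),
    8100: lambda body: {"raw": body.hex()},
}


def read_protocol_number(record: bytes) -> int:
    """Number unit: read the protocol number from its fixed position."""
    (protocol,) = struct.unpack_from(">I", record, 0)
    return protocol


def parse(record: bytes) -> Parsed:
    protocol = read_protocol_number(record)        # number unit
    spec = SPECS[protocol]                         # specification unit (KeyError if unknown)
    _, guid_bytes = HEADER.unpack_from(record, 0)  # parsing unit: fixed header
    body = record[HEADER.size:]                    # parsing unit: protocol content
    return Parsed(protocol, str(uuid.UUID(bytes=guid_bytes)), spec(body))


guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
record = HEADER.pack(8000, guid.bytes) + b'{"action": "open"}'
print(parse(record))
```

Because the protocol number sits at a known offset, the specification can be chosen before the rest of the record is decoded.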
The splitting module 75 is connected to the quantity module 73. When the number N of protocols is greater than 2, it splits the intersection operation into T two-protocol intersections as T statistical indexes and records the corresponding T protocol sets, where each protocol set stores the protocol numbers participating in the intersection operation and T is a positive integer.
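The source does not say exactly how the N protocols are split into T pairs. The Python sketch below assumes the rule suggested by the Fig. 5 example: a shared reference protocol is paired with every other protocol, giving T = N - 1. The function name and the `reference` parameter are hypothetical.

```python
from typing import FrozenSet, List


def split_intersections(protocols: List[int], reference: int) -> List[FrozenSet[int]]:
    """Hypothetical splitting rule: pair the shared reference protocol with every
    other protocol, giving T = N - 1 two-protocol statistical indexes.
    For N = 2 this yields a single index (T = 1)."""
    if len(protocols) < 2:
        raise ValueError("an intersection needs at least two protocols")
    if reference not in protocols:
        raise ValueError("the reference protocol must take part in the operation")
    return [frozenset((reference, p)) for p in protocols if p != reference]


p_set = split_intersections([1, 2, 3], reference=1)
print(len(p_set), [sorted(s) for s in p_set])
```

The returned list plays the role of the T recorded protocol sets (P_Set). If a statistical instruction names its pairs explicitly instead, those pairs could be used directly.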
The query module 76 is connected to the parsing module 74 and the splitting module 75. It is configured to query the stored value corresponding to the protocol number according to the globally unique identifier in the parsed data.
The deduplication module 77 is connected to the query module 76. It is configured to discard the parsed data, thereby deduplicating the data, when the stored value is not empty.
The update module 78 is connected to the query module 76. It is configured to update the stored value corresponding to the protocol number in the cache when the stored value is empty.
The update module 78 includes a stored value unit 781 and a primary key unit 782. The stored value unit 781 is configured to create a new stored value containing the protocol number and protocol content of the real-time data. The primary key unit 782 is configured to set the globally unique identifier as the primary key of the new stored value, forming a key-value pair in the cache.
The statistics module 79 is connected to the update module 78. After the update, it traverses the stored values of the protocol numbers of the intersection specified in the statistical instruction, and counts the statistical index of the intersection when none of these stored values is empty.
In addition, after the update, the statistics module 79 traverses the protocol sets and selects the subsets that contain the protocol number. It counts the statistical index of an intersection when none of the stored values corresponding to that subset is empty.
The statistics module 79 includes a judgment unit 791, a single index unit 792, and a multiple index unit 793. The judgment unit 791 is configured to determine whether the number of two-protocol intersections is 1, i.e., whether there is a single statistical index. When there is a single statistical index, the single index unit 792 outputs statistical data in real time according to the stored values of the protocol numbers of the intersection. When there are multiple statistical indexes, the multiple index unit 793 outputs statistical data in real time according to the count values of those statistical indexes.
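The following Python sketch shows one possible reading of units 791 to 793; the source does not spell out the output computation. With a single index, the value is derived directly from the stored bitmaps by counting GUIDs that have every bit of the intersection set. With multiple indexes, the per-index counters [UV1, …, UVT] are reported. The function name and its parameters are hypothetical.

```python
from typing import Dict, List, Set


def output_statistics(uv: List[int], cache: Dict[str, int],
                      bit_of: Dict[int, int],
                      index_sets: List[Set[int]]) -> List[int]:
    """Hypothetical sketch of the judgment, single-index and multiple-index units."""
    if len(index_sets) == 1:
        # Single index unit: derive the value from the stored bitmaps themselves.
        mask = 0
        for p in index_sets[0]:
            mask |= 1 << bit_of[p]
        return [sum(1 for bitmap in cache.values() if (bitmap & mask) == mask)]
    # Multiple index unit: report the maintained counters [UV1, ..., UVT].
    return list(uv)


cache = {"u1": 0b011, "u2": 0b101, "u3": 0b010}   # {guid: bitmap}
bit_of = {1: 0, 2: 1, 3: 2}                       # protocol -> bit position
single = output_statistics([0], cache, bit_of, [{1, 2}])   # counters unused here
multi = output_statistics([1, 2], cache, bit_of, [{1, 2}, {1, 3}])
print(single, multi)
```

Scanning the cache costs time proportional to the number of users. Reading counters is constant time, which is one plausible reason to treat the two cases differently.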
The data processing apparatus provided by this embodiment of the invention first parses the real-time data to obtain parsed data. It then uses the globally unique identifier in the parsed data to query the cache and determine whether the corresponding stored value is empty, and either deduplicates the data or updates the stored value according to the query result. The globally unique identifier is used as the primary key in the cache. After parsing, the characteristic value of each real-time data record is mapped to the stored value under that primary key, and deduplicated intersection counting is performed by tracking changes to specified bits of the stored value. This keeps memory usage small and can improve computational efficiency.
The method and apparatus described in the above embodiments are further described in detail below by way of example.
Referring to fig. 8, a diagram of a specific application example of the data processing method and the processing apparatus according to the embodiment of the present invention is shown.
Practical application scenario 1: the terminal device 81 may report protocol A and protocol B at different time points. The statistical index is the number of users who have reported both protocol A and protocol B.
Example: a product provides two functions. Using the first function reports protocol A, and using the second reports protocol B. A user uses the two functions at different times and in no fixed order. The statistic is protocol A ∩ protocol B, i.e., the number of users who used both functions during the day, with the result updated at second-level granularity.
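A minimal Python sketch of scenario 1 follows, assuming protocol A and protocol B occupy bits 0 and 1. It also assumes that the per-day statistic is obtained by clearing the cache and the count when the date changes; the source states only that users are counted within the day. `DailyIntersection` and the bit constants are hypothetical names.

```python
from datetime import date
from typing import Dict, Optional

BIT_A, BIT_B = 0, 1                     # hypothetical bits for protocols A and B
BOTH = (1 << BIT_A) | (1 << BIT_B)


class DailyIntersection:
    """Counts users who reported both A and B on the current day (sketch)."""

    def __init__(self) -> None:
        self.day: Optional[date] = None
        self.cache: Dict[str, int] = {}    # {guid: bitmap}
        self.count = 0

    def receive(self, day: date, guid: str, bit: int) -> int:
        if day != self.day:
            # Assumption: the cache and the count are reset at the day boundary.
            self.day, self.cache, self.count = day, {}, 0
        old = self.cache.get(guid, 0)
        new = old | (1 << bit)
        if new == old:                     # protocol already reported today
            return self.count
        self.cache[guid] = new
        if new == BOTH:                    # second protocol arrived, in either order
            self.count += 1
        return self.count


d1, d2 = date(2017, 5, 9), date(2017, 5, 10)
events = [(d1, "u1", BIT_B), (d1, "u2", BIT_A), (d1, "u1", BIT_A),
          (d1, "u1", BIT_B), (d1, "u2", BIT_B), (d2, "u1", BIT_A), (d2, "u1", BIT_B)]
counter = DailyIntersection()
results = [counter.receive(d, g, b) for d, g, b in events]
print(results)
```

Each call returns the current count, so a caller could publish the value every second. The order in which a user reports A and B does not affect the result.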
Practical application scenario 2: the terminal device 81 may report protocol A, protocol B, protocol C, and protocol D at different time points. Statistical index 1 is the number of users who reported both protocol A and protocol B, and statistical index 2 is the number of users who reported both protocol A and protocol C.
Example: the daily active users of a product are monitored in real time under different security environments. When a user starts the product, protocol A is reported. When the product's daemon detects that security software 1 is running, protocol B is reported; when it detects that security software 2 is running, protocol C is reported. Protocol A ∩ protocol B and protocol A ∩ protocol C are counted separately. The daily active users in the two environments must be monitored in real time so that it can be quickly determined whether the product is being attacked by an adversary.
Correspondingly, an embodiment of the present invention further provides a server. As shown in Fig. 9, the data processing method and the data processing apparatus are applied to the server 900. The server 900 includes: a processor 901 with one or more processing cores, a memory 902 comprising one or more computer-readable storage media, radio frequency (RF) circuitry 903, a short-range wireless transmission (WiFi) module 904, a power supply 905, an input unit 906, and a display unit 907.
Those skilled in the art will appreciate that the structure described above does not limit the server. The server may include more or fewer components than described, combine certain components, or arrange the components differently.
Specifically, in this embodiment, the processor 901 of the server 900 loads the executable files corresponding to the processes of one or more application programs into the memory 902 according to the following instructions. It then runs the application programs stored in the memory 902 to implement the following functions:
- receiving real-time data and parsing it according to the protocol number contained in the real-time data to generate parsed data;
- querying the stored value corresponding to the protocol number according to the globally unique identifier in the parsed data;
- discarding the parsed data to deduplicate when the stored value is not empty;
- updating the stored value corresponding to the protocol number in the cache when the stored value is empty;
- after the update, traversing the stored values corresponding to each of the protocol numbers of the intersection in the statistical instruction, and counting the statistical index of the intersection when none of these stored values is empty.
Preferably, the processor 901 is further configured to: read the protocol number from the real-time data; match the corresponding protocol specification according to the protocol number; and parse the real-time data according to the protocol specification to generate parsed data conforming to the protocol data format.
Preferably, the processor 901 is further configured to: obtain a protocol number, a globally unique identifier, and protocol content, and set a protocol data format from them; set, within the protocol data format, the protocol identification bit where the protocol number is located; and, when reading the protocol number from the real-time data, read it from the protocol identification bit of the real-time data.
Preferably, the processor 901 is further configured to: set the globally unique identifier as the primary key; and set the protocol number and the protocol content as the stored value, the stored value having the structure of a bit array. Setting the protocol identification bit where the protocol number is located includes defining the bits occupied by the protocol numbers in the bit array, where each protocol corresponds to M bits and M is a positive integer. If the bit corresponding to a protocol number has a first value, the stored value corresponding to that protocol number is empty; if it has a second value, the stored value is not empty.
Preferably, the processor 901 is further configured to: create a new stored value containing the protocol number and protocol content of the real-time data; and set the globally unique identifier as the primary key of the new stored value, forming a key-value pair in the cache.
Preferably, the processor 901 is further configured to:
- obtain the number N of protocols participating in the intersection operation in the statistical instruction, where N is a positive integer;
- when N is 2, execute the step of receiving real-time data;
- when N is greater than 2, split the intersection operation into T two-protocol intersections as T statistical indexes and record the corresponding T protocol sets, where each protocol set stores the protocol numbers participating in the intersection operation and T is a positive integer;
- after the update, traverse the protocol sets, select the subsets that contain the protocol number, and count the statistical index of an intersection when none of the stored values corresponding to that subset is empty.
Preferably, the processor 901 is further configured to determine whether there is a single statistical index, i.e., whether the number of two-protocol intersections is 1. If so, it outputs statistical data in real time according to the stored values of the protocol numbers of the intersection. If not, it outputs statistical data in real time according to the count values of the multiple statistical indexes.
The server provided by this embodiment of the invention first parses the real-time data to obtain parsed data. It then uses the globally unique identifier in the parsed data to query the cache and determine whether the corresponding stored value is empty, and either deduplicates the data or updates the stored value according to the query result. The globally unique identifier is used as the primary key in the cache. After parsing, the characteristic value of each real-time data record is mapped to the stored value under that primary key, and deduplicated intersection counting is performed by tracking changes to specified bits of the stored value. This keeps memory usage small and can improve computational efficiency.
The server provided by this embodiment of the invention is based on the same concept as the data processing method and the data processing apparatus in the embodiments above.
It should be noted that, for the data processing method of the present invention, those skilled in the art will understand that all or part of the processes in the embodiments may be implemented by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, such as the memory of a server, and executed by at least one processor in the server. When executed, it may carry out the processes of the embodiments of the data processing method. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
In the data processing apparatus according to the embodiment of the present invention, each functional module may be integrated into one processing chip, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, or the like.
The data processing method, the data processing apparatus, and the computer-readable storage medium according to the embodiments of the present invention are described in detail, and specific examples are applied herein to illustrate the principles and implementations of the present invention, and the descriptions of the embodiments are only used to help understand the method and the core concept of the present invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (12)

1. A data processing method, comprising:
defining the protocol data format of the real-time data: acquiring a protocol number, a global unique identifier and protocol content, and setting a protocol data format through the protocol number, the global unique identifier and the protocol content;
setting the globally unique identifier as a primary key;
in the protocol data format, setting a protocol identification bit where a protocol number is located, where the setting of the protocol identification bit where the protocol number is located includes:
defining bits occupied by a plurality of protocol numbers in a bit array, wherein each protocol corresponds to M bits, M is a positive integer, and if the value of the bit corresponding to the protocol number is a first value, the stored value corresponding to the protocol number is null; if the bit corresponding to the protocol number is a second value, the stored value corresponding to the protocol number is not null;
receiving real-time data, and analyzing the real-time data according to a protocol number contained in the real-time data to generate analyzed data;
inquiring a stored value corresponding to the protocol number according to the global unique identifier in the analysis data;
when the stored value is not empty, performing data deduplication;
when the stored value is empty, updating the stored value corresponding to the protocol number; and
after updating, traversing the stored values respectively corresponding to the plurality of protocol numbers of the intersection in the statistical instruction, and counting the statistical index of the intersection when none of the stored values respectively corresponding to the plurality of protocol numbers is empty.
2. The data processing method of claim 1, wherein parsing the real-time data according to a protocol number included in the real-time data to generate parsed data, comprises:
reading a protocol number from the real-time data;
matching a corresponding protocol specification according to the protocol number;
parsing the real-time data according to the protocol specification, and generating parsed data conforming to the protocol data format.
3. The data processing method of claim 2, wherein receiving real-time data further comprises:
wherein reading the protocol number from the real-time data specifically comprises: reading the protocol number from the protocol identification bit of the real-time data.
4. The data processing method of claim 1, wherein, after obtaining the protocol number, the globally unique identifier, and the protocol content to define the protocol data format, the method further comprises:
setting the protocol number and the protocol content as a stored value, the stored value having the structure of a bit array.
5. The data processing method of claim 1, wherein updating the stored value corresponding to the protocol number when the stored value is empty comprises:
creating a new stored value, said stored value including a protocol number and protocol contents in said real-time data;
setting the globally unique identifier as the primary key corresponding to the new stored value, and forming a key-value pair in a cache.
6. The data processing method of any of claims 1 to 5, further comprising, prior to receiving the real-time data:
acquiring the number N of protocols participating in intersection operation in the statistical instruction, wherein N is a positive integer;
when the number N of the protocols is 2, executing a step of receiving real-time data;
when the number N of the protocols is larger than 2, splitting the intersection operation into T intersections of double protocols as T statistical indexes, and recording corresponding T protocol sets, wherein the protocol sets are used for storing protocol numbers participating in the intersection operation, and T is a positive integer;
wherein the step of, after the updating, traversing the stored values of the plurality of protocol numbers of the intersection in the statistical instruction and counting the statistical index of the intersection when none of the stored values corresponding to the protocol numbers is empty comprises: after updating, traversing the protocol sets, filtering out a plurality of subsets containing the protocol number, and counting the statistical index of the intersection when none of the stored values corresponding to the subsets is empty.
7. The data processing method of claim 6, wherein, after updating, traversing the stored values of the plurality of protocol numbers of the intersection in the statistical instruction, and counting the statistical index of the intersection when none of the stored values corresponding to the plurality of protocol numbers is empty, the method further comprises:
determining whether there is a single statistical index, the single statistical index meaning that the number of two-protocol intersections is 1;
if there is a single statistical index, outputting statistical data in real time according to the stored values of the plurality of protocol numbers of the intersection;
if there is not a single statistical index, outputting statistical data in real time according to the count values of the multiple statistical indexes.
8. A data processing apparatus, comprising:
defining the protocol data format of the real-time data: acquiring a protocol number, a global unique identifier and protocol content, and setting a protocol data format through the protocol number, the global unique identifier and the protocol content;
setting the globally unique identifier as a primary key;
in the protocol data format, setting a protocol identification bit where a protocol number is located, where the setting of the protocol identification bit where the protocol number is located includes:
defining bits occupied by a plurality of protocol numbers in a bit array, wherein each protocol corresponds to M bits, M is a positive integer, and if the value of the bit corresponding to the protocol number is a first value, the stored value corresponding to the protocol number is null; if the bit corresponding to the protocol number is a second value, the stored value corresponding to the protocol number is not null;
the parsing module is used for receiving the real-time data and parsing the real-time data according to the protocol number contained in the real-time data to generate parsed data;
the query module is used for querying the stored value corresponding to the protocol number according to the globally unique identifier in the parsed data;
the data deduplication module is used for performing data deduplication when the stored value is not empty;
the update module is used for updating the stored value corresponding to the protocol number when the stored value is empty; and
the statistics module is used for, after updating, traversing the stored values respectively corresponding to the plurality of protocol numbers of the intersection in the statistical instruction, and counting the statistical index of the intersection when none of the stored values respectively corresponding to the plurality of protocol numbers is empty.
9. The data processing apparatus of claim 8, wherein the parsing module comprises:
the number unit is used for reading a protocol number from the real-time data;
the specification unit is used for matching the corresponding protocol specification according to the protocol number;
the parsing unit is used for parsing the real-time data according to the protocol specification and generating parsed data conforming to the protocol data format.
10. The data processing apparatus of claim 8, wherein the update module comprises:
a stored-value unit, configured to create a new stored value, the stored value comprising the protocol number and the protocol content of the real-time data;
and a primary key unit, configured to set the globally unique identifier as the primary key of the new stored value, forming a key-value pair in a cache.
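A minimal sketch of claim 10's update path, assuming a plain Python dict stands in for the cache. Because several protocol numbers may be stored under the same globally unique identifier, the sketch keys the cache by GUID and keeps a per-protocol map inside each entry; that nesting, and the names `StoredValue`, `query`, and `update`, are assumptions rather than details from the claim.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StoredValue:
    protocol_number: int
    protocol_content: str


# Cache: globally unique identifier (primary key) -> {protocol number -> stored value}.
Cache = Dict[str, Dict[int, StoredValue]]


def query(cache: Cache, guid: str, protocol_number: int) -> Optional[StoredValue]:
    """Query step: return the stored value, or None if it is empty."""
    return cache.get(guid, {}).get(protocol_number)


def update(cache: Cache, guid: str, protocol_number: int, protocol_content: str) -> StoredValue:
    """Stored-value unit plus primary key unit: create the value and key it by GUID."""
    value = StoredValue(protocol_number, protocol_content)
    cache.setdefault(guid, {})[protocol_number] = value
    return value


cache: Cache = {}
if query(cache, "u1", 101) is None:  # empty, so update rather than deduplicate
    update(cache, "u1", 101, "page=home")
print(query(cache, "u1", 101))
# StoredValue(protocol_number=101, protocol_content='page=home')
```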
11. The data processing apparatus according to any one of claims 8 to 10, further comprising:
a quantity module, configured to obtain the number N of protocols participating in the intersection operation in the statistical instruction, N being a positive integer;
the parsing module is configured to receive the real-time data when N is 2 and to parse it according to the protocol number it contains, generating parsed data;
a splitting module, configured to, when N is greater than 2, split the intersection operation into T two-protocol intersections serving as T statistical indexes and record the corresponding T protocol sets, each protocol set storing the protocol numbers participating in the intersection operation, T being a positive integer;
and the statistics module is further configured to traverse the protocol sets after the update, filter out the subsets containing the protocol number, and count the statistical indexes of the intersection when the stored values corresponding to those subsets are not empty.
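Claim 11 does not say how an N-protocol intersection is split into T two-protocol intersections. One plausible reading, assumed here, is a chain: each index intersects the result of the previous index with one more protocol, so T = N - 1 and the t-th protocol set (counting from 1) holds the first t + 1 protocol numbers. The sketch below, with hypothetical names, records those protocol sets and, after each update, checks only the sets that contain the updated protocol number.

```python
from typing import Dict, List, Set, Tuple


def split_intersection(protocols: List[int]) -> List[Tuple[int, ...]]:
    """Chain split (assumed): T = N - 1 sets; the set at 0-based index t holds the first t + 2 protocols."""
    if len(protocols) < 2:
        raise ValueError("an intersection needs at least two protocols")
    return [tuple(protocols[: t + 2]) for t in range(len(protocols) - 1)]


class SplitIntersectionCounter:
    def __init__(self, protocols: List[int]):
        self.protocol_sets = split_intersection(protocols)  # the T recorded protocol sets
        self.counts = [0] * len(self.protocol_sets)          # one statistical index per set
        self.present: Dict[str, Set[int]] = {}               # GUID -> protocols with non-empty stored values

    def on_update(self, guid: str, protocol_number: int) -> None:
        """Call after the update module fills a previously empty stored value."""
        present = self.present.setdefault(guid, set())
        if protocol_number in present:
            return  # already stored: upstream deduplication should have dropped this record
        present.add(protocol_number)
        for t, protocol_set in enumerate(self.protocol_sets):
            # Filter: only the subsets that contain the updated protocol number.
            if protocol_number in protocol_set and all(p in present for p in protocol_set):
                self.counts[t] += 1


counter = SplitIntersectionCounter([1, 2, 3])
print(counter.protocol_sets)  # [(1, 2), (1, 2, 3)]
for guid, protocol_number in [("u1", 3), ("u1", 1), ("u1", 2), ("u2", 1), ("u2", 2)]:
    counter.on_update(guid, protocol_number)
print(counter.counts)  # [2, 1]
```

A chain keeps every step a two-operand intersection, matching the claim's wording; an all-pairs split would be another valid reading but would not by itself yield the full N-way intersection.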
12. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the data processing method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710321319.8A CN107133329B (en) 2017-05-09 2017-05-09 Data processing method, data processing apparatus, and storage medium

Publications (2)

Publication Number Publication Date
CN107133329A 2017-09-05
CN107133329B 2022-03-08

Family ID: 59732724

Country Status (1)

Country Link
CN (1) CN107133329B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109911A (en) * 2018-01-10 2019-08-09 Wuhan Douyu Network Technology Co., Ltd. Distributed global ID generation method, storage medium, electronic device and method
CN109816536B (en) * 2018-12-14 2023-08-25 Ping An Property & Casualty Insurance Company of China, Ltd. List deduplication method, device and computer equipment
CN109783523B (en) * 2019-01-24 2022-02-25 Guangzhou Huya Information Technology Co., Ltd. Data processing method, device, equipment and storage medium
CN109981599B (en) * 2019-03-06 2022-01-18 Nanjing University of Science and Technology General data analysis platform and method for communication data streams
CN110727878A (en) * 2019-09-19 2020-01-24 Shanghai Yidian Shikong Network Co., Ltd. Distance calculation method and device for collaborative filtering, and collaborative filtering recommendation method and device
CN111259013A (en) * 2020-02-03 2020-06-09 Jingdong Digital Technology Holding Co., Ltd. Method and device for storing data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605756A (en) * 2013-11-22 2014-02-26 Beijing Gridsum Technology Co., Ltd. Data processing method and data processing device for on-line analysis processing
CN105718515A (en) * 2016-01-14 2016-06-29 Sensors Data Network Technology (Beijing) Co., Ltd. Data storage system and method and data analysis system and method
CN105933929A (en) * 2016-04-20 2016-09-07 Chongqing Chongyou Huice Communication Technology Co., Ltd. Multi-protocol association method and system suitable for LTE-A network air interface monitoring instrument
CN106126721A (en) * 2016-06-30 2016-11-16 Beijing Qihoo Technology Co., Ltd. Data processing method and device for a real-time computing platform
CN106209840A (en) * 2016-07-12 2016-12-07 China UnionPay Co., Ltd. Network packet deduplication method and device
CN106557571A (en) * 2016-11-23 2017-04-05 Fujian Yirong Information Technology Co., Ltd. Data deduplication method and device based on a K-V storage engine

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8464053B2 (en) * 2007-09-05 2013-06-11 Radvision Ltd Systems, methods, and media for retransmitting data using the secure real-time transport protocol

Non-Patent Citations (3)

Title
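A Real-Time Database QoS-aware Service Selection Protocol for MANET; JD Rekik et al.; arXiv.org; 2011-12-31; pp. 101-116 *
Design and implementation of a secure deduplication system for ciphertext data based on the Alibaba Cloud platform; Song Jianye et al.; Netinfo Security; 2017-04-24; pp. 39-45 *
Research on global data deduplication technology in backup systems; Liu Rong; China Master's Theses Full-text Database, Information Science and Technology; 2014-06-15; I138-83 *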

Also Published As

Publication number Publication date
CN107133329A (en) 2017-09-05

Similar Documents

Publication Publication Date Title
CN107133329B (en) Data processing method, data processing apparatus, and storage medium
CN108255925B (en) Method and terminal for displaying data table structure change condition
TWI600305B (en) Method and apparatus for compaction of data received over a network
CN110445828B (en) Data distributed processing method based on Redis and related equipment thereof
CN109933585B (en) Data query method and data query system
CN110765195A (en) Data analysis method and device, storage medium and electronic equipment
CN111177201A (en) Data stream processing method and related device
US20230396633A1 (en) Method and Apparatus for Detecting Security Event, and Computer-Readable Storage Medium
CN109144964A (en) log analysis method and device based on machine learning
CN115023697A (en) Data query method and device and server
CN115525652A (en) User access data processing method and device
CA3148489A1 (en) Method of and device for assessing data query time consumption, computer equipment and storage medium
CN111666344A (en) Heterogeneous data synchronization method and device
CN112596851A (en) Multi-source heterogeneous data batch extraction method and analysis method of simulation platform
CN111045735B (en) Personalized guide page pushing method, device and system
CN111897812A (en) Data query method and device, electronic equipment and computer readable storage medium
WO2022253131A1 (en) Data parsing method and apparatus, computer device, and storage medium
CN114048238B (en) Storage method and device for industrial equipment time sequence data and electronic equipment
CN116743790A (en) Device data acquisition, device data analysis method and device and computer device
CN114490861A (en) Telemetry data analysis method, device, equipment and medium
CN110633388B (en) Real-time index generation method, system and storage medium based on communication XDR
CN114063943A (en) Data transmission system, method, device, medium, and apparatus
CN111143006B (en) Method and device for acquiring command help information
CN114416731A (en) Data storage method, data reading method, data storage device, electronic device and medium
CN110032445B (en) Big data aggregation calculation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 2022-11-10

Address after: 1402, Floor 14, Block A, Haina Baichuan Headquarters Building, No. 6, Baoxing Road, Haibin Community, Xin'an Street, Bao'an District, Shenzhen, Guangdong 518101

Patentee after: Shenzhen Yayue Technology Co.,Ltd.

Address before: 35th Floor, Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen City, Guangdong Province 518000

Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.