CN114925101A - Data processing method and device, storage medium and electronic equipment - Google Patents

Data processing method and device, storage medium and electronic equipment

Info

Publication number
CN114925101A
Authority
CN
China
Prior art keywords
dimension
data
preset
target
bitmap index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210687405.1A
Other languages
Chinese (zh)
Inventor
罗琛
裴中率
朱一飞
刘源
姚盛楠
金林强
王永亮
陈人树
焦广才
冀文杰
钟秀秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202210687405.1A
Publication of CN114925101A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G06F16/2315 Optimistic concurrency control
    • G06F16/2322 Optimistic concurrency control using timestamps
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data processing method, a data processing device, a storage medium and electronic equipment, and relates to the technical field of computers. In embodiments of the disclosure, data information corresponding to at least one preset dimension is first obtained from each piece of to-be-processed data in a real-time data stream, and the data information is written into the bitmap indexes corresponding to the respective preset dimensions. At least one target dimension is then determined according to a screening condition in a read instruction; when the target dimension matches a preset dimension, the bitmap index corresponding to the target dimension is determined, and a logical operation is performed on that bitmap index according to a target logical operation relationship to obtain a statistical result for the read instruction. Because the read instruction is decomposed into several target dimensions, the amount of computation during data processing is reduced and the read instruction can be answered quickly. In addition, because a bitmap index that corresponds to only a single dimension is unlikely to be intruded upon, no additional coprocessor is needed to maintain the bitmap index.

Description

Data processing method, data processing device, storage medium and electronic equipment
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a data processing method and apparatus, a storage medium, and an electronic device.
Background
With the rapid development of computer technology, data has become increasingly important in the big data era, and the analysis and mining of data are therefore indispensable. Currently, On-Line Analytical Processing (OLAP) is widely used. Multidimensional OLAP (MOLAP) adopts a multidimensional cube pre-computation technique that trades space for time: metrics that may be used are pre-computed, each result is stored as a materialized view (Cuboid), and all materialized views aggregated by dimension are stored in a database. A MOLAP query does not scan the original records but is executed directly against the pre-computed Cube results. A pre-computed materialized view (Cuboid) is determined only by the cardinality of the dimensions (that is, the number of dimension values) and does not grow linearly with the data volume.
However, with the explosive growth of data volume, query time inevitably increases linearly with the data scale. Because MOLAP requires a complex Cube construction operation, pre-computation can only be performed on a T+1 basis at a fixed time every day, so the data statistics seriously lack real-time performance. Moreover, as the number of query dimensions increases, the amount of data to be processed grows exponentially, which multiplies resource consumption, and introducing additional hardware processing equipment further raises operation and maintenance costs.
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims and the description herein is not admitted to be prior art by inclusion in this section.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a data processing method, an apparatus, a storage medium, and an electronic device.
According to a first aspect of the present disclosure, there is provided a data processing method, the method comprising:
acquiring data information corresponding to at least one preset dimension in each to-be-processed data in a real-time data stream;
respectively writing the data information into bitmap indexes corresponding to the preset dimensions;
determining at least one target dimension according to a screening condition in the reading instruction;
under the condition that the target dimension is matched with a preset dimension, determining a bitmap index corresponding to the target dimension;
and performing logical operation on the bitmap index corresponding to the target dimension according to the target logical operation relation to obtain a statistical result aiming at the reading instruction.
Optionally, the determining at least one target dimension according to the screening condition in the read instruction includes:
splitting the screening condition to obtain at least one screening sub-condition;
and determining a corresponding target dimension according to any one of the screening sub-conditions to obtain the at least one target dimension.
Optionally, the method further includes:
determining the target logical operation relationship according to the logical relationship between the at least one screening sub-condition indicated by the reading instruction.
Optionally, before performing a logical operation on the bitmap index according to the target logical operation relationship to obtain a statistical result for the read instruction, the method further includes:
determining a newly added dimension under the condition that the target dimension does not match the preset dimension;
taking the moment at which the bitmap index corresponding to the newly added dimension is configured as a first moment;
sequentially writing the data information corresponding to the newly added dimension in first historical data into the bitmap index corresponding to the newly added dimension, wherein the first historical data is data that had not been written into the bitmap index corresponding to the newly added dimension before the first moment;
taking the moment at which the writing into the bitmap index corresponding to the newly added dimension succeeds as a second moment;
detecting whether second historical data exists, wherein the second historical data is data that had not been written into the bitmap index corresponding to the newly added dimension before the second moment;
if so, continuing to write the data information corresponding to the newly added dimension in the second historical data into the bitmap index corresponding to the newly added dimension, and re-executing the operation of taking the moment at which the writing succeeds as the second moment;
and if not, for the real-time data stream entering after the second moment, continuously writing the data information corresponding to the newly added dimension in each piece of to-be-processed data into the bitmap index corresponding to the newly added dimension.
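The catch-up procedure for a newly added dimension can be sketched as follows. This is only a minimal illustration under stated assumptions: the store is a hypothetical in-memory list appended to in timestamp order, a set of record ids stands in for each bitmap, and the names `backfill_dimension`, `store`, `index` and `clock` are illustrative and do not appear in the disclosure.

```python
def backfill_dimension(store, dimension, index, clock):
    """Backfill `index` for a newly added dimension until no unindexed history remains.

    store: list of (timestamp, record_id, record) tuples, appended in timestamp order.
    index: dict mapping a dimension value to a set of record ids (stand-in for a bitmap).
    clock: callable returning the current moment.
    Returns the moment after which the live stream can take over.
    """
    indexed_upto = 0      # number of store entries already examined
    cutoff = clock()      # first moment: the index has just been configured
    while True:
        # Historical data: entries older than the cutoff that were not yet indexed.
        pending = [e for e in store[indexed_upto:] if e[0] < cutoff]
        if not pending:
            return cutoff  # no history left; real-time writes continue from here
        for _, record_id, record in pending:
            if dimension in record:
                index.setdefault(record[dimension], set()).add(record_id)
        indexed_upto += len(pending)
        cutoff = clock()   # second moment: this backfill pass has succeeded
```

The loop repeats because data may keep arriving while a pass is running; it ends only when a pass finds nothing older than the latest second moment.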
Optionally, before the obtaining of the data information of at least one preset dimension in each to-be-processed data in the real-time data stream, the method further includes:
and configuring, based on the plurality of category features contained in the data to be processed, the category features that meet a preset low-cardinality criterion as the preset dimensions.
Optionally, the method further includes:
receiving a deletion instruction for a dimension to be taken offline;
and in response to the deletion instruction, deleting the dimension to be taken offline and the bitmap index corresponding to that dimension.
Optionally, the acquiring data information of at least one preset dimension corresponding to each to-be-processed data in the real-time data stream includes:
acquiring each new data in the real-time data stream;
filtering each new data according to a preset filtering rule, and taking the new data which accords with the preset filtering rule as the data to be processed;
and determining data information corresponding to at least one preset dimension in the data to be processed.
According to a second aspect of the present disclosure, there is provided a data processing apparatus, the apparatus comprising:
the first acquisition module is used for acquiring data information corresponding to at least one preset dimension in each data to be processed in the real-time data stream;
the first writing module is used for respectively writing the data information into the bitmap indexes corresponding to the preset dimensionality;
the first determining module is used for determining at least one target dimension according to the screening condition in the reading instruction;
the second determining module is used for determining the bitmap index corresponding to the target dimension under the condition that the target dimension is matched with a preset dimension;
and the operation module is used for carrying out logical operation on the bitmap index corresponding to the target dimension according to a target logical operation relation to obtain a statistical result aiming at the reading instruction.
Optionally, the first determining module is further configured to:
splitting the screening condition to obtain at least one screening sub-condition;
and determining a corresponding target dimension according to any screening sub-condition to obtain at least one target dimension.
Optionally, the apparatus further comprises:
a third determining module, configured to determine the target logical operation relationship according to a logical relationship between the at least one screening sub-condition indicated by the read instruction.
Optionally, the apparatus further comprises:
a fourth determining module, configured to determine a newly added dimension when the target dimension is not matched with the preset dimension;
the selecting module is used for selecting the moment of configuring the bitmap index corresponding to the newly added dimension as a first moment;
the second writing module is used for sequentially writing the data information corresponding to the newly-added dimensionality in the first historical data into the bitmap index corresponding to the newly-added dimensionality; the first historical data is data which is not written into the bitmap index corresponding to the newly added dimension before the first time;
a fifth determining module, configured to use a time at which the bitmap index corresponding to the newly added dimension is successfully written as a second time;
the detection module is used for detecting whether second historical data exists, wherein the second historical data is data that had not been written into the bitmap index corresponding to the newly added dimension before the second moment; if so, continuing to write the data information corresponding to the newly added dimension in the second historical data into the bitmap index corresponding to the newly added dimension, and re-executing the operation of taking the moment at which the writing succeeds as the second moment; and if not, for the real-time data stream entering after the second moment, continuously writing the data information corresponding to the newly added dimension in each piece of to-be-processed data into the bitmap index corresponding to the newly added dimension.
Optionally, the apparatus further comprises:
and the configuration module is used for configuring, based on the plurality of category features contained in the data to be processed, the category features that meet a preset low-cardinality criterion as the preset dimensions.
Optionally, the apparatus further comprises:
the receiving module is used for receiving a deletion instruction for a dimension to be taken offline;
and the deleting module is used for responding to the deletion instruction and deleting the dimension to be taken offline and the bitmap index corresponding to that dimension.
Optionally, the first obtaining module is further configured to:
the second acquisition module is used for acquiring each new data in the real-time data stream;
the filtering module is used for filtering each new data according to a preset filtering rule, and taking the new data which accords with the preset filtering rule as the data to be processed;
and the sixth determining module is used for determining data information corresponding to at least one preset dimension in the data to be processed.
According to an aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the above-described data processing method.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any of the data processing methods described above via execution of the executable instructions.
To sum up, in the data processing method provided by the embodiments of the present disclosure, data information corresponding to at least one preset dimension is first obtained from each piece of to-be-processed data in a real-time data stream, and the data information is written into the bitmap indexes corresponding to the respective preset dimensions. At least one target dimension is then determined according to the screening condition in a read instruction; when the target dimension matches a preset dimension, the bitmap index corresponding to the target dimension is determined, and a logical operation is performed on that bitmap index according to the target logical operation relationship to obtain a statistical result for the read instruction. Because the read instruction is decomposed into several target dimensions and the result is computed from the bitmap indexes corresponding to those target dimensions, the amount of computation during data processing is reduced, the read instruction can be answered quickly, and data processing efficiency is improved. In addition, because statistics are kept only in bitmap indexes that each correspond to a single dimension, such an index is unlikely to be intruded upon, so no additional coprocessor is needed to maintain the bitmap indexes, which reduces maintenance cost.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 schematically illustrates a flowchart of steps of a data processing method provided by an embodiment of the present disclosure;
fig. 2 schematically illustrates a flow chart of a data processing method provided by an embodiment of the present disclosure;
fig. 3 schematically illustrates a schematic diagram of an update target willingness interval provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a target will interval provided by an embodiment of the disclosure;
fig. 5 schematically illustrates a flow chart of a data processing method provided by an embodiment of the present disclosure;
fig. 6 schematically illustrates a block diagram of a data processing apparatus provided by an embodiment of the present disclosure;
FIG. 7 schematically illustrates a schematic diagram of a storage medium provided by an embodiment of the disclosure; and
fig. 8 schematically illustrates a block diagram of an electronic device provided in an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. The embodiments of the present disclosure may be combined with each other.
As will be appreciated by one of skill in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software. The data involved in the present disclosure may be data that is authorized by the user or fully authorized by various parties.
In this document, any number of elements in the drawings is by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
Fig. 1 schematically illustrates a flowchart of steps of a data processing method provided by an embodiment of the present disclosure, and as shown in fig. 1, the method may include:
Step S101, acquiring data information corresponding to at least one preset dimension in each piece of to-be-processed data in the real-time data stream.
In the embodiment of the present disclosure, a plurality of pieces of to-be-processed data may be obtained by screening the real-time data stream, and the data information corresponding to at least one preset dimension of each piece of to-be-processed data is then determined according to the information it contains. A preset dimension may be an information type predetermined according to the actual situation. Specifically, a preset dimension may correspond to a low-cardinality column, that is, an information type in which a large number of records share the same values; for example, the preset dimension may be gender, age, and the like. Because each piece of data contains multiple items of data information, the data information matching the preset dimensions can be looked up in the to-be-processed data, thereby obtaining the data information corresponding to at least one preset dimension in each piece of to-be-processed data.
In an embodiment of the present disclosure, the real-time data stream may consist of the pieces of real-time data that need to be written into the database. In an actual application scenario, the real-time data stream may be order information submitted by users on a shopping platform, information exchanged between users on a social platform, or subscription information of users on a media platform for pushed objects such as audio, video, and news, which is not limited in the present disclosure.
Step S102, respectively writing the data information into bitmap indexes corresponding to the preset dimensions.
In the embodiment of the present disclosure, the bitmap index corresponding to a preset dimension may be a bitmap index set up in advance to receive the extracted data information, and it may be a blank bitmap index before any data information is written. Writing the data information into the bitmap indexes corresponding to the preset dimensions may be: sequentially writing the corresponding data information into the corresponding positions of the bitmap index for each preset dimension, based on the writing time of each piece of to-be-processed data.
In the embodiment of the present disclosure, a bitmap index may be a vector created for a column in which a large number of entries share the same value. The bitmap index may consist of three parts: a category, a number, and an input value. The category may be the same as the preset dimension; the number may be the data number of the to-be-processed data or a user ID recorded in the data; and the input value may be the value of the corresponding category in the data information with that number. For example, if the data information is "user 1 gender male, user 2 gender male, user 3 gender female, user 4 gender female, user 5 gender male, user 6 gender female", and the preset dimension is "gender", the bitmap index corresponding to the preset dimension may be obtained as: [user 1-male, user 2-male, user 3-female, user 4-female, user 5-male, user 6-female].
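One conventional way to lay out such an index is to keep one bit vector per distinct value of the dimension, with the bit position given by the record number. The sketch below assumes that layout, which the disclosure does not prescribe, and uses Python integers as bit vectors; the class name `BitmapIndex` and its methods are illustrative only.

```python
from collections import defaultdict


class BitmapIndex:
    """Bitmap index for one preset dimension (for example "gender")."""

    def __init__(self, dimension):
        self.dimension = dimension           # category of the index
        self.bitmaps = defaultdict(int)      # value -> integer used as a bit vector

    def write(self, number, value):
        """Set the bit at position `number` in the bit vector of `value`."""
        self.bitmaps[value] |= 1 << number

    def bitmap(self, value):
        """Return the bit vector of `value` (0 if the value was never written)."""
        return self.bitmaps.get(value, 0)


# The six users from the example above, numbered 1 to 6.
gender = BitmapIndex("gender")
values = ["male", "male", "female", "female", "male", "female"]
for number, value in enumerate(values, start=1):
    gender.write(number, value)
```

With this layout, users 1, 2 and 5 set bits 1, 2 and 5 of the "male" vector, and users 3, 4 and 6 set bits 3, 4 and 6 of the "female" vector.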
Step S103, determining at least one target dimension according to the screening condition in the read instruction.
In the embodiment of the present disclosure, the read instruction may be an instruction by which a user requests a statistical result. For example, if the user wants to know the number of men among the participating members, the read instruction may be determined as "men among the participating members". Determining at least one target dimension according to the screening condition in the read instruction may be: first parsing at least one screening condition out of the read instruction, where one screening condition corresponds to one dimension, and taking the dimension corresponding to each screening condition as a target dimension. For example, the read instruction may be "count the number of users whose gender is male and whose age is 30 or above"; the screening conditions it contains are "gender is male" and "age is 30 or above", so two target dimensions can be determined for the read instruction, namely "gender" and "age".
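As a rough illustration of this step, the sketch below assumes the read instruction has already been parsed into a structured form. The dictionary layout with `conditions` and `relation` keys and the function name `target_dimensions` are hypothetical choices for this example; the disclosure does not define an instruction format.

```python
def target_dimensions(screening_conditions):
    """Return the distinct dimensions named by the screening conditions, in order."""
    dims = []
    for condition in screening_conditions:
        if condition["dimension"] not in dims:
            dims.append(condition["dimension"])
    return dims


# "Count the number of users whose gender is male and whose age is 30 or above."
read_instruction = {
    "conditions": [
        {"dimension": "gender", "op": "==", "value": "male"},
        {"dimension": "age", "op": ">=", "value": 30},
    ],
    "relation": "and",
}
dims = target_dimensions(read_instruction["conditions"])
```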
Step S104, determining a bitmap index corresponding to the target dimension under the condition that the target dimension is matched with a preset dimension.
In the embodiment of the present disclosure, a preset dimension matched with at least one target dimension may be determined from a plurality of preset dimensions, and a bitmap index corresponding to the matched preset dimension may be used as a bitmap index corresponding to the target dimension. For example, the preset dimensions and corresponding bitmap indexes may be: V1 = [0,0,1], V2 = [0,1,1], V3 = [1,0,1], V4 = [1,1,1], and the target dimensions may be V1 and V3. It can then be determined that the target dimension V1 matches the preset dimension V1 and the target dimension V3 matches the preset dimension V3, so the bitmap indexes corresponding to the target dimensions are: V1 = [0,0,1], V3 = [1,0,1].
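The matching step can be sketched with the V1 to V4 example, assuming the preset dimensions and their bitmap indexes are held in a plain dictionary. The function name `match_bitmap_indexes` is illustrative; returning the unmatched dimensions separately is a design choice made here so that a caller could treat them as newly added dimensions.

```python
def match_bitmap_indexes(target_dims, preset_indexes):
    """Split target dimensions into those with a preset bitmap index and those without."""
    matched = {d: preset_indexes[d] for d in target_dims if d in preset_indexes}
    unmatched = [d for d in target_dims if d not in preset_indexes]
    return matched, unmatched


preset = {"V1": [0, 0, 1], "V2": [0, 1, 1], "V3": [1, 0, 1], "V4": [1, 1, 1]}
matched, unmatched = match_bitmap_indexes(["V1", "V3"], preset)
```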
Step S105, performing logical operation on the bitmap index corresponding to the target dimension according to the target logical operation relationship, and obtaining a statistical result for the read instruction.
In the embodiment of the present disclosure, the target logical operation relationship may be determined according to the logical relationship between the screening conditions in the read instruction, and may be an "and", "or", or "not" relationship, for example, requiring that the screening conditions be satisfied simultaneously or that only one of them be satisfied. For example, the read instruction may be "count the number of users whose gender is male and whose age is 30 or above"; the relationship between the screening condition "gender is male" and the screening condition "age is 30 or above" is an "and" relationship, that is, the data to be counted must satisfy both screening conditions. The logical operation is then performed on the bitmap indexes corresponding to the target dimensions according to the target logical operation relationship, and the result of the operation is taken as the statistical result for the read instruction.
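The logical operation can be illustrated on bitmaps represented as equal-length lists of 0 and 1, as in the V1 and V3 example. This is a sketch under that assumption; the helper names `combine` and `negate` are illustrative, and a production system would more likely use packed or compressed bit vectors.

```python
def combine(bitmaps, relation):
    """Combine equal-length 0/1 bitmaps position by position with "and" or "or"."""
    if relation == "and":
        op = all
    elif relation == "or":
        op = any
    else:
        raise ValueError(f"unsupported relation: {relation!r}")
    return [int(op(bits)) for bits in zip(*bitmaps)]


def negate(bitmap):
    """Apply the "not" relationship to a single 0/1 bitmap."""
    return [1 - bit for bit in bitmap]


v1, v3 = [0, 0, 1], [1, 0, 1]
both = combine([v1, v3], "and")    # records satisfying both conditions
either = combine([v1, v3], "or")   # records satisfying at least one condition
count = sum(both)                  # statistical result for an "and" query
```

Counting the set bits of the combined bitmap gives the number of matching records without scanning the original data.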
To sum up, in the data processing method provided by the embodiments of the present disclosure, data information corresponding to at least one preset dimension is first obtained from each piece of to-be-processed data in a real-time data stream, and the data information is written into the bitmap indexes corresponding to the respective preset dimensions. At least one target dimension is then determined according to the screening condition in a read instruction; when the target dimension matches a preset dimension, the bitmap index corresponding to the target dimension is determined, and a logical operation is performed on that bitmap index according to the target logical operation relationship to obtain a statistical result for the read instruction. Because the read instruction is decomposed into several target dimensions and the result is computed from the bitmap indexes corresponding to those target dimensions, the amount of computation during data processing is reduced, the read instruction can be answered quickly, and data processing efficiency is improved. In addition, because statistics are kept only in bitmap indexes that each correspond to a single dimension, such an index is unlikely to be intruded upon, so no additional coprocessor is needed to maintain the bitmap indexes, which reduces maintenance cost.
Optionally, in the embodiment of the present disclosure, the operation of acquiring data information corresponding to at least one preset dimension in each to-be-processed data in the real-time data stream may specifically include, as shown in fig. 2:
Step S1011, acquiring each piece of new data in the real-time data stream.
In the embodiment of the present disclosure, the real-time data stream may be a real-time data stream written into a database, and the database may be a distributed open-source database (HBase) or another database, which is not limited in the present disclosure. Acquiring each piece of new data in the real-time data stream may be: collecting each piece of new data from the real-time data stream being written to the database.
Step S1012, filtering each new data according to a preset filtering rule, and taking the new data meeting the preset filtering rule as the data to be processed.
In the embodiment of the present disclosure, the preset filtering rule may be a rule set in advance according to the actual situation; for example, it may screen out invalid data or data containing illegal characters, which is not limited in the present disclosure. Filtering each piece of new data according to the preset filtering rule may be: screening out the data that does not meet the preset filtering rule and taking the remaining data, which meets the rule, as the data to be processed. For example, invalid data and data containing illegal characters may be screened out, and the remaining data taken as the data to be processed.
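A filtering rule of this kind might look like the sketch below. The concrete rule (the `userid` field must be a non-empty alphanumeric string) is an assumption chosen for illustration, as are the names `is_valid` and `to_be_processed`; the disclosure leaves the rule to the actual situation.

```python
def is_valid(record):
    """Hypothetical rule: `userid` must be a non-empty alphanumeric string."""
    userid = record.get("userid")
    return isinstance(userid, str) and userid.isalnum()


def to_be_processed(new_data):
    """Keep only the new data that meets the filtering rule."""
    return [record for record in new_data if is_valid(record)]


new_data = [
    {"userid": "u1", "gender": "male"},
    {"userid": None, "gender": "female"},   # invalid: userid is null
    {"userid": "u#2", "gender": "male"},    # invalid: illegal character
]
kept = to_be_processed(new_data)
```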
Step S1013, determining data information corresponding to at least one preset dimension in the data to be processed.
In the embodiment of the present disclosure, information corresponding to at least one preset dimension may be extracted from data to be processed, and the extracted information is used as data information of the at least one preset dimension, so that the data information is then sequentially written into a bitmap index corresponding to the preset dimension.
For example, in an implementation scenario, the steps of writing data in the real-time data stream into the corresponding bitmap index may specifically be:
1. A message queue receives externally incoming message data, which may be a complete data sequence in a data exchange format (JavaScript Object Notation, JSON).
2. A stream computing engine reads the new data flowing into the open source stream processing platform (Kafka) and filters out invalid data according to the preset filtering rule; for example, data may be filtered out and deleted when its userid is null or contains illegal characters.
3. The stream computing engine writes the filtered data into the HBase database; specifically, it converts the JSON data sequence into one or more data write requests and writes the data into the HBase database in sequence.
4. When the stream computing engine performs the write operation, it detects whether each piece of written data contains data information of a preset dimension; if so, the data information is obtained and written into the bitmap index corresponding to the preset dimension, and if not, the data is skipped.
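The filter-then-index flow of steps 2 and 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names PRESET_DIMENSIONS, is_valid and ingest are hypothetical, a Python integer stands in for a bitmap index (bit i set means user id i is present), and Kafka and HBase are replaced by an in-memory list of JSON strings.

```python
import json
from collections import defaultdict

# Hypothetical preset dimensions: dimension name -> values that each get a bitmap.
PRESET_DIMENSIONS = {"gender": {"male", "female"}}

# One bitmap per (dimension, value); a Python int acts as an arbitrary-length bit vector.
bitmaps = defaultdict(int)


def is_valid(record):
    """Assumed filtering rule: userid must be a non-negative integer."""
    userid = record.get("userid")
    return isinstance(userid, int) and not isinstance(userid, bool) and userid >= 0


def ingest(message):
    """Filter one JSON message and, if it is kept, update the matching bitmaps."""
    record = json.loads(message)
    if not is_valid(record):
        return False  # invalid data is filtered out
    for dimension, values in PRESET_DIMENSIONS.items():
        value = record.get(dimension)
        if value in values:  # the record carries data for this preset dimension
            bitmaps[(dimension, value)] |= 1 << record["userid"]
    return True  # kept, even if no preset dimension matched (the data is skipped)


stream = [
    '{"userid": 3, "gender": "male"}',
    '{"userid": null, "gender": "male"}',
    '{"userid": 5, "gender": "female"}',
    '{"userid": 6}',
]
accepted = [ingest(m) for m in stream]
```

The second message is rejected by the filter, and the fourth is accepted but touches no bitmap because it carries no value for a preset dimension.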
Optionally, the operation of determining at least one target dimension according to the screening condition in the read instruction in the embodiment of the present disclosure, as shown in fig. 3, may specifically include:
Step S1031, splitting the screening condition to obtain at least one screening sub-condition.
In the embodiment of the present disclosure, the screening condition may be split according to a single category, and each category corresponds to one screening sub-condition, so as to obtain at least one screening sub-condition. For example, the screening condition is "male with age greater than 30", and it can be determined that the screening condition includes two categories, i.e., age and gender, so that it can be determined that the age-corresponding screening sub-condition is "age greater than 30" and the gender-corresponding screening sub-condition is "male".
Step S1032, determining a corresponding target dimension according to any one of the screening sub-conditions to obtain the at least one target dimension.
In this embodiment of the present disclosure, a category corresponding to each filtering sub-condition may be used as a target dimension corresponding to the filtering sub-condition, so that at least one target dimension may be obtained. It should be noted that each filtering sub-condition can only determine one target dimension, and therefore, the number of filtering sub-conditions may be the same as the number of target dimensions.
Optionally, the data processing method in the embodiment of the present disclosure may further specifically include:
determining the target logical operation relationship according to the logical relationship, indicated by the read instruction, between the at least one screening sub-condition.
In the embodiment of the present disclosure, the logical relationship of the at least one screening sub-condition in the read instruction may be determined and then used as the target logical operation relationship of the at least one screening sub-condition. The logical relationship characterizes how two conditions relate to each other. For example, the relationship between two screening sub-conditions may be "and", in which case the two sub-conditions are combined by adding one to the other, or "not", in which case one sub-condition is subtracted from the other.
For example, the read instruction may be a query for the number of users whose gender is male and whose age is greater than 30. Splitting the read instruction yields the screening sub-conditions "gender is male" and "age is greater than 30". In an actual application scenario, the specific instruction may be represented as follows:
{data:[
  {'dimension':'gender',
   'value':'male',
   'operator':'eq'},
  {'dimension':'age',
   'value':'30',
   'operator':'gt'}],
 logicType:'and'}
Here, logicType is the target logical operation relationship, which may be and, or, or not, and represents the logical relationship among the conditions. The operator may be eq (equal to), gt (greater than), gte (greater than or equal to), lt (less than), or lte (less than or equal to), and represents the operational relationship between the dimension and the dimension value.
It should be noted that parsing the read instruction, which is written in a domain-specific language (DSL), may mean separating the specific dimension nodes in the read instruction from the overall logical relationship. A dimension node is a component of the DSL and is used to determine which bitmap index is required, while the logical relationship is used to perform the logical calculation between bitmap indexes.
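As a rough illustration of this parsing step, the following Python sketch separates the dimension nodes of the example instruction from its logical relationship and then checks which target dimensions already have a preset dimension. The function names parse_instruction and match_dimensions, the tuple layout of a node, and the preset_dimensions argument are assumptions made for the example and do not come from the disclosure.

```python
# The example read instruction, written as a Python dict.
instruction = {
    "data": [
        {"dimension": "gender", "value": "male", "operator": "eq"},
        {"dimension": "age", "value": "30", "operator": "gt"},
    ],
    "logicType": "and",
}

SUPPORTED_OPERATORS = {"eq", "gt", "gte", "lt", "lte"}


def parse_instruction(instr):
    """Split a read instruction into dimension nodes and the overall logical relationship."""
    nodes = []
    for node in instr["data"]:
        if node["operator"] not in SUPPORTED_OPERATORS:
            raise ValueError(f"unsupported operator: {node['operator']}")
        nodes.append((node["dimension"], node["operator"], node["value"]))
    return nodes, instr["logicType"]


def match_dimensions(nodes, preset_dimensions):
    """Separate target dimensions that match a preset dimension from those that do not."""
    matched = [n for n in nodes if n[0] in preset_dimensions]
    unmatched = [n for n in nodes if n[0] not in preset_dimensions]
    return matched, unmatched


nodes, logic_type = parse_instruction(instruction)
# Assume only "gender" has been configured as a preset dimension so far.
matched, unmatched = match_dimensions(nodes, preset_dimensions={"gender"})
```

In this sketch the unmatched list is what would trigger the handling of a newly added dimension.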
Optionally, before performing the logical operation on the bitmap index according to the target logical operation relationship to obtain the statistical result for the read instruction in the embodiment of the present disclosure, as shown in fig. 4, the method may further include:
Step S21, determining a newly added dimension when the target dimension does not match the preset dimension.
In the embodiment of the present disclosure, the target dimension not matching the preset dimension may mean that no preset dimension identical to the target dimension is found, that is, the preset dimensions do not include the target dimension. Determining the newly added dimension may mean taking the target dimension as the newly added dimension.
Step S22, taking the time at which the bitmap index corresponding to the newly added dimension is configured as a first time.
In the embodiment of the present disclosure, the time of configuring the bitmap index corresponding to the new dimension may be recorded, and the time may be used as the first time. For example, at time t1, the bitmap index corresponding to the newly added dimension is configured, and then time t1 may be used as the first time.
Step S23, sequentially writing the data information corresponding to the newly added dimension in the first history data into the bitmap index corresponding to the newly added dimension; the first history data is data that has not been written into the bitmap index corresponding to the newly added dimension before the first time.
In the embodiment of the present disclosure, data that is not written into the bitmap index corresponding to the newly added dimension before the first time is first used as the first history data, then data information corresponding to the newly added dimension in the first history data is obtained, and the data information is sequentially written into the bitmap index corresponding to the newly added dimension.
Step S24, taking the time at which the bitmap index corresponding to the newly added dimension is successfully written as a second time.
In the embodiment of the present disclosure, the time at which the bitmap index corresponding to the newly added dimension is successfully written may be recorded and used as the second time. Specifically, this may be the time at which the first history data is successfully written into that bitmap index, or the time at which other history data is successfully written into it.
Step S25, detecting whether second history data exists, the second history data being data that has not been written into the bitmap index corresponding to the newly added dimension before the second time; if so, continuing to write the data information corresponding to the newly added dimension in the second history data into the bitmap index corresponding to the newly added dimension, and re-executing the operation of taking the time at which the bitmap index corresponding to the newly added dimension is successfully written as the second time; if not, for the real-time data stream entering after the second time, continuing to write the data information corresponding to the newly added dimension in each piece of data to be processed into the bitmap index corresponding to the newly added dimension.
In this embodiment of the present disclosure, it may be detected whether second history data appeared between the first time and the second time, that is, data that has not been written into the bitmap index corresponding to the newly added dimension before the second time. If second history data is detected, its data information may be written into the bitmap index corresponding to the newly added dimension, and the operation of determining the second time is executed again, until no second history data is detected. Once no second history data is detected, for the real-time data stream entering after the second time, the data information corresponding to the newly added dimension in each piece of data to be processed may be written continuously into the bitmap index corresponding to the newly added dimension.
In the embodiment of the disclosure, the bitmap indexes corresponding to the respective dimensions are all written starting from the data information of the same historical time, so for a newly added dimension the history data is likewise written into its bitmap index starting from that same historical time. To avoid missing any data information that should be written into the bitmap index, the operation of "taking the time at which the bitmap index is successfully written as the second time and detecting whether unwritten data exists before the second time" may be executed cyclically, which ensures that all data before the second time has been written into the bitmap index. After that, for each piece of data to be processed in the real-time data stream, the data information corresponding to the newly added dimension is determined together with that of the other preset dimensions and is written continuously into the bitmap index corresponding to the newly added dimension. In this way, on the one hand, dimensions can be added at any time according to user requirements; on the other hand, when a target dimension determined from the read instruction has no matching preset dimension, the dimension is added in time so that the statistical result indicated by the read instruction can still be obtained, which solves the problem that dimension expansion is troublesome during data statistics.
For example, in an application scenario, the configuration center may update a newly added dimension as follows. Data in the current real-time data stream is still written in real time into the bitmap indexes corresponding to the existing preset dimensions, and statistical calculation on the existing preset dimensions runs normally. However, because the newly added dimension has not counted the history data from before it was enabled, it cannot yet be used for statistics on the real-time data stream together with the existing dimensions, so it must first be computed over the history data. The configuration center may generate the corresponding HBase database coprocessor according to the logic defined for the newly added dimension. After the coprocessor is generated, it may be added to the table descriptor (HTableDescriptor) and loaded dynamically through the Java API: modifyTable(tableName, hTableDescriptor). The current time is recorded as t1, and a thread is started to run task 1, which imports the history data into the bitmap index corresponding to the newly added dimension; the task processes the data from the start time of the whole service up to t1. After task 1 finishes, it is checked whether real-time data was still stored in the database while task 1 was executing. The data in a specified time range can be queried directly, for example: scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804000, 1303668904000]}. If such data exists, the current time may be taken as t2, and task 1 is re-executed to compute the history data from t1 to t2. If no history data remains to be computed, the history task thread pool may be closed, and the newly added dimension can be used normally together with the preset dimensions.
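The catch-up loop of steps S22 to S25 can be illustrated with a small in-memory simulation. The sketch below is written under explicit assumptions: backfill_new_dimension, store, extract and now are hypothetical names, the store is a plain list of (timestamp, userid, record) tuples instead of an HBase table, and the clock is simulated so that one record arrives while the first backfill round is running.

```python
def backfill_new_dimension(store, extract, now):
    """Bring the bitmap of a newly added dimension up to date with history.

    store   -- list of (timestamp, userid, record); may grow while this runs
    extract -- returns True when a record belongs in the new dimension's bitmap
    now     -- callable returning the current time
    """
    bitmap = 0
    start = float("-inf")  # stands for the service start time
    end = now()            # first time (t1)
    rounds = 0
    while True:
        # History not yet written to the bitmap: records in the window (start, end].
        batch = [(ts, uid, rec) for ts, uid, rec in store if start < ts <= end]
        if not batch:
            break  # nothing left to backfill; real-time writes can take over
        for _, uid, rec in batch:
            if extract(rec):
                bitmap |= 1 << uid
        rounds += 1
        start, end = end, now()  # second time (t2): check what arrived meanwhile
    return bitmap, end, rounds


# Simulated data: two historical records, plus one written during the first round.
store = [(1, 1, {"age": 35}), (2, 2, {"age": 20})]
late_arrivals = [(11, 4, {"age": 41})]
ticks = iter([10, 20, 30])


def now():
    t = next(ticks)
    while late_arrivals and late_arrivals[0][0] <= t:
        store.append(late_arrivals.pop(0))  # real-time write landing before t
    return t


bitmap, switch_time, rounds = backfill_new_dimension(
    store, extract=lambda rec: rec.get("age", 0) > 30, now=now
)
```

The loop ends only when a round finds no unwritten history, which is the condition under which the newly added dimension can join the real-time path.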
Optionally, before the operation of obtaining the data information of at least one preset dimension corresponding to each piece of data to be processed in the real-time data stream, the embodiment of the present disclosure may further specifically include:
configuring, based on the plurality of category features contained in the data to be processed, the category features that meet a preset low-cardinality condition as the preset dimensions.
In the embodiment of the present disclosure, each category feature contained in the data to be processed may be determined first. A category feature whose values repeat heavily, that is, one with few distinct values, is regarded as meeting the preset low-cardinality condition and is configured as a preset dimension. It should be noted that, in the embodiment of the present disclosure, commonly queried category features may also be selected from the plurality of category features contained in the data to be processed as preset dimensions.
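One way to picture the low-cardinality selection is to count the distinct values of each category feature and keep those under a threshold. The following Python sketch assumes a hypothetical threshold parameter max_distinct and a hypothetical helper low_cardinality_features; the disclosure does not specify how the condition is evaluated.

```python
def low_cardinality_features(records, max_distinct):
    """Return the features whose number of distinct values is at most max_distinct."""
    distinct = {}
    for record in records:
        for feature, value in record.items():
            distinct.setdefault(feature, set()).add(value)
    return sorted(f for f, values in distinct.items() if len(values) <= max_distinct)


records = [
    {"userid": 11902, "gender": "male", "age": 25},
    {"userid": 11903, "gender": "female", "age": 16},
    {"userid": 14532, "gender": "male", "age": 21},
]
# With a threshold of 2 distinct values, only "gender" qualifies in this sample.
preset = low_cardinality_features(records, max_distinct=2)
```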
For example, in an application scenario, the step of configuring the preset dimension may specifically be:
Step one, configuring the dimension and the dimension values in a configuration center. For example, if a user wants to know the real-time numbers of male and female users, gender can be taken as the dimension for statistics, and the dimension values male and female are written into the configuration. Specifically, the configuration format can be as follows:
[Figure BDA0003698505010000141: configuration format (image not reproduced)]
Step two, checking whether the dimension is configured correctly in the configuration center. The main check is whether the configured dimension belongs to an existing column family in the HBase database; if it does not, the data written into the HBase database does not contain that column family, that is, no data belonging to the dimension exists.
Step three, when the dimension is configured correctly, that is, when the dimension belongs to an existing column family in the HBase database, generating a corresponding number of blank bitmap indexes (bitmaps) according to the number of dimensions and the number of dimension values in each dimension; for example, the male and female dimension value vectors each get an independent blank bitmap.
Step four, generating the coprocessors corresponding to the HBase database according to predefined logic, where a coprocessor adds the userid of a piece of data to the bitmap of the corresponding dimension when the data is written into the HBase database. Specifically, the configuration center loads the user-defined configuration file and generates RegionObserver implementation classes one by one. Each implementation class inherits the BaseRegionObserver class, which serves as the base class for all user-implemented observer-type coprocessors. Each implementation class implements the postPut method, which is called after a put is detected, and the processing logic that checks whether the defined logic is satisfied is added there. For example, a GenderRegionObserver may be implemented whose postPut method adds the userId of the data to the male bitmap when the value of the put is male. Finally, after code generation is finished, the resulting file package is uploaded to a specified Hadoop Distributed File System (HDFS) directory.
Step five, adding the definition of the coprocessor, including its class name, file package storage path, priority, and the like, to the table descriptor and writing it through the Java API, after which the defined coprocessor takes effect on the table. The call on the table descriptor may be as follows:
hTableDescriptor.addCoprocessor(GenderRegionObserver.class.getCanonicalName(), path, Coprocessor.PRIORITY_USER, null);
Step six, recording the current time as the service start time. The index initialization is considered complete at this point, and the message queue and the stream computing engine service may be started to begin ingesting data.
Optionally, the data processing method in the embodiment of the present disclosure may further specifically include:
receiving a deletion instruction for a dimension to be taken offline; and, in response to the deletion instruction, deleting the dimension to be taken offline and the bitmap index corresponding to it.
In the embodiment of the disclosure, when the usage rate of a preset dimension is low or the dimension no longer meets the user's requirements, the user may mark it as a dimension to be taken offline and perform a deletion operation. Accordingly, the electronic device may receive a deletion instruction for the dimension to be taken offline and, in response, delete that dimension and its corresponding bitmap index. Therefore, when a dimension no longer needs to be counted, the dimension and its bitmap index can be deleted, which avoids wasting processing resources.
For example, in one implementation, each preset dimension is stored in a configuration center. When a user needs to delete a dimension, the dimension may be taken offline or deleted in the configuration center: the dimension is selected first, and the offline or deletion operation is then performed on it. Afterwards, the processing thread of the bitmap index corresponding to that dimension may also be unloaded. For example, the coprocessor corresponding to the dimension may be dynamically unloaded through the HBase Shell, the software package that was generated for the coprocessor and stored on the distributed file system (HDFS) may be deleted, and all bitmap indexes under the dimension may be deleted. The specific command may be as follows:
hbase> alter 'users', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
For example, in practical applications, the preset dimensions may be label data of several dimensions, such as gender and age. If the data to be processed is described in the structure of a relational database, it may be represented as in Table 1:
TABLE 1
User id | Gender | Age | ......
11902 | Male | 25 | ......
11903 | Female | 16 | ......
14532 | Male | 21 | ......
...... | ...... | ...... | ......
If the user's current read instruction is "current real-time number of male users", a bitmap index may be built on the gender column, which is divided into two vectors, male and female. In the male vector, a bit of 1 indicates male and a bit of 0 indicates female. The record with user id 11902 is male, so the bit at the subscript given by that user id is set to 1; the record with user id 11903 is female, so the corresponding bit is 0; and so on. Only data whose gender column value is male needs to be added to the bitmap index; that is, the user data is converted into 0s and 1s in the bitmap. Table 2 shows the male bitmap index.
TABLE 2
User id | 1 | ...... | 11902 | 11903 | ...... | 14532 | ......
Bit | 0 | 0 | 1 | 0 | 0 | 1 | ......
Since only one check needs to be triggered when data is written, and data meeting the condition is added to the bitmap index of the corresponding dimension, the bitmap index contains every userid whose gender is male within a time period. The set bits of the bitmap index can therefore be counted directly to obtain the statistical result for the read instruction, namely the current real-time number of male users.
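The male bitmap index for the three sample users can be reproduced with a short Python sketch. A Python integer is assumed as the bitmap representation here (bit position equals user id); the names rows, male_bitmap, contains and male_count are illustrative only.

```python
# The three sample records from the user table: (user id, gender, age).
rows = [(11902, "male", 25), (11903, "female", 16), (14532, "male", 21)]

# Build the male bitmap: the bit at position user_id is set when gender is male.
male_bitmap = 0
for user_id, gender, _age in rows:
    if gender == "male":
        male_bitmap |= 1 << user_id


def contains(bitmap, user_id):
    """Return True when the bit for user_id is set in the bitmap."""
    return (bitmap >> user_id) & 1 == 1


# Answering "current number of male users" is a count of the set bits.
male_count = bin(male_bitmap).count("1")
```

Counting set bits is all the read path has to do for a single-dimension query, which is why no per-record scan is needed at query time.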
Assuming that the user's read instruction is "users whose gender is male and whose age is greater than 30", two target dimensions can be determined: 1. the bitmap index [0,1,1,0,1] of the male vector of the gender column; 2. the bitmap index [1,1,0,0,0] of the "age greater than 30" vector of the age column. The second bitmap index only needs to contain every userid whose age is greater than 30. Since the two conditions, gender being male and age being greater than 30, are logically linked by "and", a simple AND operation can be performed on the two bitmap indexes. The resulting bitmap index [0,1,0,0,0] is the set of all userids that satisfy both gender being male and age being greater than 30.
Because a bitmap index consists only of 0 and 1 bits, computing the intersection and difference of two bitmap indexes is particularly efficient. There is no need to precompute millions of dimension combinations, as in conventional multidimensional online analytical processing (MOLAP). Only the bitmap indexes of the original dimensions need to be saved, and multi-dimensional combinations can be computed from them on demand.
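The AND example with the vectors [0,1,1,0,1] and [1,1,0,0,0] can be checked with a minimal Python sketch. The helpers from_bits, to_bits and combine are hypothetical, and treating "not" as the difference between the first bitmap and the others is an assumption made for this illustration.

```python
from functools import reduce


def from_bits(bits):
    """Build an integer bitmap from a list such as [0, 1, 1, 0, 1] (index = position)."""
    return sum(bit << i for i, bit in enumerate(bits))


def to_bits(bitmap, width):
    """Expand an integer bitmap back into a list of 0/1 values of the given width."""
    return [(bitmap >> i) & 1 for i in range(width)]


def combine(bitmaps, logic_type):
    """Combine per-dimension bitmaps according to the logical relationship."""
    if logic_type == "and":
        return reduce(lambda a, b: a & b, bitmaps)   # intersection
    if logic_type == "or":
        return reduce(lambda a, b: a | b, bitmaps)   # union
    if logic_type == "not":
        return reduce(lambda a, b: a & ~b, bitmaps)  # difference (assumed meaning)
    raise ValueError(f"unknown logic type: {logic_type}")


male = from_bits([0, 1, 1, 0, 1])
over_30 = from_bits([1, 1, 0, 0, 0])

male_and_over_30 = to_bits(combine([male, over_30], "and"), 5)
male_not_over_30 = to_bits(combine([male, over_30], "not"), 5)
```

Each combination is computed on demand from the two stored single-dimension bitmaps, so nothing has to be materialized in advance for the gender and age pair.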
For example, fig. 5 schematically illustrates a data processing architecture provided by the embodiment of the present disclosure. As shown in fig. 5, the configuration center 01 may store the preset configured dimensions 011 and the updated newly added dimension 013, for which a thread is started to write the history data, and may delete the dimension 012 to be taken offline. The configuration center 01 may call coprocessors to connect to the HBase database 02. The stream computing engine 021 stores the data from the message queue into the HBase database 02, and the HBase database 02 triggers the writing of the data into the bitmap of the corresponding dimension, yielding the bitmap indexes 03 corresponding to the dimensions, such as bitmap1, bitmap2, bitmap3, and so on. When a read instruction 04 issued by a user is received, the corresponding target dimensions 05 are determined based on the read instruction 04, the bitmap indexes corresponding to the target dimensions are read from the bitmap indexes 03, and these bitmap indexes are calculated and combined according to the target logical operation relationship to obtain the statistical result 06 corresponding to the read instruction.
It should be noted that, in the data processing method provided in the embodiment of the present disclosure, the execution main body may be a data processing apparatus, or a control module in the data processing apparatus for executing the loaded data processing method. In the embodiment of the present disclosure, a data processing apparatus executes a loaded data processing method as an example, and the data processing method provided in the embodiment of the present disclosure is described. Next, a data processing apparatus of an exemplary embodiment of the present disclosure is described with reference to fig. 6.
Fig. 6 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 6, the data processing apparatus 50 may include:
a first obtaining module 501, configured to obtain data information of at least one preset dimension corresponding to each to-be-processed data in a real-time data stream;
a first writing module 502, configured to write the data information into bitmap indexes corresponding to the preset dimensions respectively;
a first determining module 503, configured to determine at least one target dimension according to a filtering condition in the read instruction;
a second determining module 504, configured to determine, when the target dimension matches a preset dimension, a bitmap index corresponding to the target dimension;
and an operation module 505, configured to perform logical operation on the bitmap index corresponding to the target dimension according to a target logical operation relationship, and obtain a statistical result for the read instruction.
To sum up, the data processing apparatus provided in the embodiment of the present disclosure may first obtain the data information corresponding to at least one preset dimension in each piece of data to be processed in a real-time data stream, and write the data information into the bitmap index corresponding to each preset dimension. It may then determine at least one target dimension according to the screening condition in a read instruction, determine the bitmap index corresponding to the target dimension when the target dimension matches a preset dimension, and perform a logical operation on the bitmap indexes corresponding to the target dimensions according to a target logical operation relationship, so as to obtain a statistical result for the read instruction. Because the read instruction is decomposed into several target dimensions and the result is computed from the bitmap indexes of those dimensions, the amount of calculation during data processing is reduced, the read instruction can be answered quickly, and data processing efficiency is improved. In addition, because each bitmap index covers only a single dimension, the probability that it is intruded upon is small, so no additional coprocessor needs to be added to maintain the bitmap index, which reduces the maintenance cost.
Optionally, the first determining module 503 is further configured to:
splitting the screening condition to obtain at least one screening sub-condition;
and determining a corresponding target dimension according to any screening sub-condition to obtain at least one target dimension.
Optionally, the apparatus 50 further includes:
a third determining module, configured to determine the target logical operation relationship according to a logical relationship between the at least one filtering sub-condition indicated by the read instruction.
Optionally, the apparatus 50 further includes:
a fourth determining module, configured to determine a newly added dimension when the target dimension does not match the preset dimension;
a selecting module, configured to take the time at which the bitmap index corresponding to the newly added dimension is configured as a first time;
a second writing module, configured to sequentially write the data information corresponding to the newly added dimension in the first history data into the bitmap index corresponding to the newly added dimension; the first history data is data that has not been written into the bitmap index corresponding to the newly added dimension before the first time;
a fifth determining module, configured to take the time at which the bitmap index corresponding to the newly added dimension is successfully written as a second time;
a detection module, configured to detect whether second history data exists, the second history data being data that has not been written into the bitmap index corresponding to the newly added dimension before the second time; if so, to continue writing the data information corresponding to the newly added dimension in the second history data into the bitmap index corresponding to the newly added dimension, and to re-execute the operation of taking the time at which the bitmap index corresponding to the newly added dimension is successfully written as the second time; and if not, for the real-time data stream entering after the second time, to continue writing the data information corresponding to the newly added dimension in each piece of data to be processed into the bitmap index corresponding to the newly added dimension.
Optionally, the apparatus 50 further includes:
a configuration module, configured to configure, based on the plurality of category features contained in the data to be processed, the category features that meet a preset low-cardinality condition as the preset dimensions.
Optionally, the apparatus 50 further comprises:
a receiving module, configured to receive a deletion instruction for a dimension to be taken offline;
and a deleting module, configured to, in response to the deletion instruction, delete the dimension to be taken offline and the bitmap index corresponding to it.
Optionally, the first obtaining module 501 further includes:
a second acquisition module, configured to acquire each piece of new data in the real-time data stream;
a filtering module, configured to filter each piece of new data according to a preset filtering rule and take the new data meeting the preset filtering rule as the data to be processed;
and a sixth determining module, configured to determine data information corresponding to at least one preset dimension in the data to be processed.
Having described the data processing method and apparatus of the exemplary embodiments of the present disclosure, a storage medium of the exemplary embodiments of the present disclosure is explained next with reference to fig. 7.
Referring to fig. 7, a storage medium 600 for implementing the above method according to an embodiment of the present disclosure is described. It may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Having described the storage medium of the exemplary embodiment of the present disclosure, an electronic device of the exemplary embodiment of the present disclosure is described next with reference to fig. 8.
The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 takes the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, a bus 830 that couples the various system components (including the storage unit 820 and the processing unit 810), and a display unit 840.
The storage unit 820 stores program code that is executable by the processing unit 810, causing the processing unit 810 to perform the steps according to the various exemplary embodiments of the present disclosure described in the "exemplary methods" section above. For example, the processing unit 810 may execute: step S101, acquiring data information corresponding to at least one preset dimension in each to-be-processed data in the real-time data stream; step S102, writing the data information into the bitmap indexes corresponding to the respective preset dimensions; step S103, determining at least one target dimension according to the screening condition in the reading instruction; step S104, determining the bitmap index corresponding to the target dimension under the condition that the target dimension matches a preset dimension; and step S105, performing a logical operation on the bitmap index corresponding to the target dimension according to a target logical operation relation to obtain a statistical result for the reading instruction.
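Steps S101 to S105 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the disclosure: the class name `BitmapIndexStore`, the use of one arbitrary-precision integer as the bitmap for each (dimension, value) pair, the sequential record identifiers, and the restriction to a single AND or OR relation are all hypothetical choices made for brevity.

```python
from collections import defaultdict


class BitmapIndexStore:
    """Toy bitmap index: one int per (dimension, value); bit i is set when record i matches."""

    def __init__(self, preset_dimensions):
        self.preset_dimensions = set(preset_dimensions)
        # dimension -> value -> bitmap stored as an arbitrary-precision int
        self.bitmaps = {dim: defaultdict(int) for dim in self.preset_dimensions}
        self.next_id = 0  # assumed: records receive sequential ids on arrival

    def write(self, record):
        """S101/S102: read each preset dimension's value and set the record's bit."""
        record_id = self.next_id
        self.next_id += 1
        for dim in self.preset_dimensions:
            if dim in record:
                self.bitmaps[dim][record[dim]] |= 1 << record_id
        return record_id

    def lookup(self, dim, value):
        """S104: return the bitmap for a target dimension that matches a preset dimension."""
        if dim not in self.preset_dimensions:
            raise KeyError(f"dimension {dim!r} has no bitmap index")
        return self.bitmaps[dim].get(value, 0)

    def count(self, conditions, relation="and"):
        """S103/S105: resolve each (dimension, value) condition, combine, count set bits."""
        maps = [self.lookup(dim, value) for dim, value in conditions]
        if not maps:
            return 0
        result = maps[0]
        for bitmap in maps[1:]:
            result = result & bitmap if relation == "and" else result | bitmap
        return bin(result).count("1")
```

With hypothetical dimensions such as `country` and `platform`, a call like `store.count([("country", "CN"), ("platform", "ios")], "and")` answers the query by intersecting two bitmaps, without scanning the underlying records. A production system would more likely use a compressed bitmap structure than plain integers.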
The storage unit 820 may include volatile storage units, such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read-only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility module 8204 having a set (at least one) of program modules 8205. Such program modules 8205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 830 may include a data bus, an address bus, and a control bus.
The electronic device 800 may also communicate with one or more external devices 70 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.); such communication may take place through an input/output (I/O) interface 850. The electronic device 800 further comprises a display unit 840 connected to the input/output (I/O) interface 850 for display. In addition, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
It should be noted that although several modules or sub-modules of the data processing apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solutions of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present disclosure.
While the present disclosure has been described with reference to the embodiments illustrated in the drawings, which are intended to be illustrative rather than restrictive, it will be apparent to those of ordinary skill in the art in light of the present disclosure that many more modifications may be made without departing from the spirit of the disclosure and the scope of the appended claims.

Claims (10)

1. A method of data processing, the method comprising:
acquiring data information corresponding to at least one preset dimension in each to-be-processed data in a real-time data stream;
respectively writing the data information into bitmap indexes corresponding to the preset dimensions;
determining at least one target dimension according to the screening condition in the reading instruction;
under the condition that the target dimension is matched with a preset dimension, determining a bitmap index corresponding to the target dimension;
and performing logical operation on the bitmap index corresponding to the target dimension according to a target logical operation relation to obtain a statistical result aiming at the reading instruction.
2. The method of claim 1, wherein determining at least one target dimension according to the screening condition in the reading instruction comprises:
splitting the screening condition to obtain at least one screening sub-condition;
and determining a corresponding target dimension according to any screening sub-condition to obtain at least one target dimension.
3. The method of claim 2, further comprising:
and determining the target logical operation relationship according to the logical relationship between the at least one screening sub-condition indicated by the reading instruction.
4. The method of claim 1, further comprising, before performing the logical operation on the bitmap index according to the target logical operation relationship to obtain the statistical result for the read instruction:
determining a newly added dimension under the condition that the target dimension is not matched with the preset dimension;
taking the time at which a corresponding bitmap index is configured for the newly added dimension as a first time;
sequentially writing data information corresponding to the newly added dimension in first historical data into the bitmap index corresponding to the newly added dimension; the first historical data is data which has not been written into the bitmap index corresponding to the newly added dimension before the first time;
taking the time at which the writing into the bitmap index corresponding to the newly added dimension succeeds as a second time;
detecting whether second historical data exists; the second historical data is data which has not been written into the bitmap index corresponding to the newly added dimension before the second time;
if so, continuing to write the data information corresponding to the newly added dimension in the second historical data into the bitmap index corresponding to the newly added dimension, and re-executing the operation of taking the time at which the writing into the bitmap index corresponding to the newly added dimension succeeds as the second time;
and if not, continuing to write the data information corresponding to the newly added dimension, in each to-be-processed data that enters the real-time data stream after the second time, into the bitmap index corresponding to the newly added dimension.
5. The method according to claim 1, before the obtaining data information corresponding to at least one preset dimension in each data to be processed in the real-time data stream, further comprising:
and configuring, as the preset dimension, the categorical features that satisfy a preset low-cardinality condition, based on the plurality of categorical features contained in the data to be processed.
6. The method of claim 1, further comprising:
receiving a deletion instruction for a dimension to be taken offline;
and in response to the deletion instruction, deleting the dimension to be taken offline and the bitmap index corresponding to the dimension to be taken offline.
7. The method according to claim 1, wherein the acquiring data information corresponding to at least one preset dimension in each to-be-processed data in the real-time data stream comprises:
acquiring each new data in the real-time data stream;
filtering each new data according to a preset filtering rule, and taking the new data which accords with the preset filtering rule as the data to be processed;
and respectively determining data information corresponding to at least one preset dimension for each piece of data to be processed.
8. A data processing apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring data information corresponding to at least one preset dimension in each data to be processed in the real-time data stream;
the first writing module is used for respectively writing the data information into the bitmap indexes corresponding to the preset dimensions;
the first determining module is used for determining at least one target dimension according to the screening condition in the reading instruction;
the second determining module is used for determining the bitmap index corresponding to the target dimension under the condition that the target dimension is matched with a preset dimension;
and the operation module is used for carrying out logical operation on the bitmap index corresponding to the target dimension according to a target logical operation relation to obtain a statistical result aiming at the reading instruction.
9. A storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the data processing method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the data processing method of any of claims 1-7 via execution of the executable instructions.
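
The catch-up procedure recited in claim 4 for a newly added dimension can be sketched as a loop that keeps writing historical data until no unwritten history remains, after which real-time writes take over. This is only a sketch under assumptions that are not in the claims: the data stream is modeled as an append-only Python list `log`, a record's position in that list serves as its bit position, and the function name `backfill_new_dimension` and the `on_pass` hook (used to simulate records arriving during a pass) are hypothetical.

```python
def backfill_new_dimension(log, dimension, on_pass=None):
    """Index `dimension` over `log`, repeating until no unwritten history remains."""
    bitmaps = {}        # value -> bitmap int for the newly added dimension
    written = 0         # number of log records already written into the bitmap index
    cutoff = len(log)   # "first time": records before it are the first historical data
    while written < cutoff:
        for record_id in range(written, cutoff):
            value = log[record_id].get(dimension)
            if value is not None:
                bitmaps[value] = bitmaps.get(value, 0) | (1 << record_id)
        written = cutoff
        if on_pass is not None:
            on_pass()       # hypothetical hook: lets new records arrive during the pass
        cutoff = len(log)   # "second time": records in [written, cutoff) are second historical data
    # No unwritten history remains; real-time writes would continue from `written`.
    return bitmaps, written
```

Each pass only has to cover the records that arrived during the previous pass, so the loop ends once a pass completes with no new arrivals, provided writing keeps pace with the stream.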
CN202210687405.1A 2022-06-16 2022-06-16 Data processing method and device, storage medium and electronic equipment Pending CN114925101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210687405.1A CN114925101A (en) 2022-06-16 2022-06-16 Data processing method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114925101A true CN114925101A (en) 2022-08-19

Family

ID=82814998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210687405.1A Pending CN114925101A (en) 2022-06-16 2022-06-16 Data processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114925101A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303833A (en) * 2023-05-18 2023-06-23 联通沃音乐文化有限公司 OLAP-based vectorized data hybrid storage method
CN116303833B (en) * 2023-05-18 2023-07-21 联通沃音乐文化有限公司 OLAP-based vectorized data hybrid storage method
CN116974727A (en) * 2023-08-31 2023-10-31 中科驭数(北京)科技有限公司 Data stream processing method, device, equipment and medium based on multiple processing cores

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination