CN109597826B - Data processing method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN109597826B
Authority: CN (China)
Application number: CN201811027624.7A
Other versions: CN109597826A
Original language: Chinese (zh)
Inventor: 沈立方 (Shen Lifang)
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Original assignee: Advanced New Technologies Co Ltd
Legal status: Active (as listed; not a legal conclusion)
Events: application CN201811027624.7A filed by Advanced New Technologies Co Ltd (priority application); CN109597826A published; application granted; CN109597826B published.

Abstract

The embodiments of the invention disclose a data processing method, a data processing apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring a data screening condition set that includes two or more data screening conditions; acquiring an original data set and preprocessing it based on the data screening condition set to obtain a preprocessed data set, wherein the data has one or more data dimensions; and performing preset processing on the preprocessed data set. With this technical solution, multiple data processing tasks can be merged into a single data processing task, which saves computing resources and reduces computation time while ensuring that the data processing remains valid.

Description

Data processing method and device, electronic equipment and computer readable storage medium
Technical Field
The embodiments of the invention relate to the technical field of big data processing, and in particular to a data processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of network technology, the cloud era has arrived and big data has emerged with it. Big data refers to collections of data that cannot be captured, managed, and processed with conventional software tools within a given time frame, and therefore requires new data processing models. On current big data computing platforms, every computation submitted by a user involves a series of operations such as computing task decomposition, resource scheduling, data routing, and result merging. Because the amount of data to be processed is generally huge, these operations consume considerable computing resources and time. Data routing is especially costly: the data volume is often on the order of gigabytes or terabytes, so each routing pass incurs high I/O consumption. How to save computing resources and reduce computation time while still ensuring valid computation is therefore a pressing problem.
Disclosure of Invention
The embodiment of the invention provides a data processing method, a data processing device, electronic equipment and a computer readable storage medium.
In a first aspect, an embodiment of the present invention provides a data processing method.
Specifically, the data processing method includes:
acquiring a data screening condition set, wherein the data screening condition set comprises two or more data screening conditions;
acquiring an original data set, and preprocessing the original data set based on the data screening condition set to obtain a preprocessed data set, wherein the data has one or more data dimensions;
and performing preset processing on the preprocessed data set.
With reference to the first aspect, in a first implementation manner of the first aspect, acquiring an original data set and preprocessing the original data set based on the data screening condition set to obtain a preprocessed data set includes:
acquiring an original data set;
extracting from the original data set the data required by all data screening conditions in the data screening condition set, that is, every data item satisfying at least one of the conditions;
and combining the extracted data into a preprocessed data set.
With reference to the first aspect and the first implementation manner of the first aspect, in a second implementation manner of the first aspect, performing preset processing on the preprocessed data set includes:
acquiring a preset processing condition, wherein the preset processing condition is related to the data dimension;
performing data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments;
and carrying out data aggregation processing on the data fragments.
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect, performing data aggregation processing on the data fragments includes:
dividing the data fragments, according to the data screening conditions, into data sub-fragments each corresponding to one data screening condition;
acquiring a data aggregation processing command;
and carrying out data aggregation processing on the data sub-fragments according to the data aggregation processing command.
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a fourth implementation manner of the first aspect, performing data aggregation processing on the data fragments includes:
acquiring a data aggregation processing command;
performing data aggregation processing on the data fragments according to the data aggregation processing command;
and dividing the aggregated data fragments, according to the data screening conditions, into data sub-fragments each corresponding to one data screening condition.
In a second aspect, an embodiment of the present invention provides a data processing apparatus.
Specifically, the data processing apparatus includes:
an obtaining module configured to obtain a data screening condition set, wherein the data screening condition set includes two or more data screening conditions;
a preprocessing module configured to acquire an original data set and preprocess the original data set based on the data screening condition set to obtain a preprocessed data set, wherein the data has one or more data dimensions;
and a processing module configured to perform preset processing on the preprocessed data set.
With reference to the second aspect, in a first implementation manner of the second aspect, the preprocessing module includes:
a first obtaining submodule configured to obtain an original data set;
an extraction submodule configured to extract from the original data set the data required by all data screening conditions in the data screening condition set, that is, every data item satisfying at least one of the conditions;
and a combining submodule configured to combine the extracted data into a preprocessed data set.
With reference to the second aspect and the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the processing module includes:
a second obtaining submodule configured to obtain a preset processing condition, wherein the preset processing condition is related to the data dimensions;
a first processing submodule configured to perform data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments;
and a second processing submodule configured to perform data aggregation processing on the data fragments.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the second processing submodule includes:
a first fragmentation submodule configured to divide the data fragments, according to the data screening conditions, into data sub-fragments each corresponding to one data screening condition;
a third obtaining submodule configured to obtain a data aggregation processing command;
and a third processing submodule configured to perform data aggregation processing on the data sub-fragments according to the data aggregation processing command.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the second processing submodule includes:
a fourth obtaining submodule configured to obtain a data aggregation processing command;
a fourth processing submodule configured to perform data aggregation processing on the data fragments according to the data aggregation processing command;
and a second fragmentation submodule configured to divide the aggregated data fragments, according to the data screening conditions, into data sub-fragments each corresponding to one data screening condition.
In a third aspect, an embodiment of the present invention provides an electronic device including a memory and a processor. The memory stores one or more computer instructions that enable a data processing apparatus to execute the data processing method of the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The data processing apparatus may further include a communication interface through which it communicates with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer instructions used by a data processing apparatus, including the computer instructions involved in executing the data processing method of the first aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
the data screening conditions are integrated, the original data set is preprocessed in advance based on the integration result, and subsequent processing is then performed on the preprocessed data set. In this way, multiple data processing tasks can be merged into a single data processing task, saving computing resources and reducing computation time while ensuring valid data processing.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the invention.
Drawings
Other features, objects and advantages of embodiments of the invention will become more apparent from the following detailed description of non-limiting embodiments thereof, taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the invention;
FIG. 2 shows a flow chart of step S102 of the data processing method according to the embodiment shown in FIG. 1;
FIG. 3 shows a flow chart of step S103 of the data processing method according to the embodiment shown in FIG. 1;
FIG. 4 shows a flow chart of step S303 of the data processing method according to one embodiment shown in FIG. 3;
FIG. 5 shows a flow chart of step S303 of the data processing method according to another embodiment shown in FIG. 3;
FIG. 6 shows a block diagram of a data processing apparatus according to an embodiment of the invention;
FIG. 7 shows a block diagram of the preprocessing module 602 of the data processing apparatus according to the embodiment shown in FIG. 6;
FIG. 8 shows a block diagram of the processing module 603 of the data processing apparatus according to the embodiment shown in FIG. 6;
FIG. 9 shows a block diagram of the second processing submodule 803 of the data processing apparatus according to the embodiment shown in FIG. 8;
FIG. 10 shows a block diagram of the second processing submodule 803 of the data processing apparatus according to another embodiment shown in FIG. 8;
FIG. 11 shows a block diagram of an electronic device according to an embodiment of the invention;
FIG. 12 shows a schematic block diagram of a computer system suitable for implementing the data processing method according to an embodiment of the present invention.
Detailed Description
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Furthermore, parts that are not relevant to the description of the exemplary embodiments have been omitted from the drawings for the sake of clarity.
In the embodiments of the present invention, it should be understood that terms such as "including" or "having", etc., are intended to indicate the presence of the features, numerals, steps, actions, components, parts, or combinations thereof disclosed in the specification, and are not intended to exclude the possibility that one or more other features, numerals, steps, actions, components, parts, or combinations thereof are present or added.
It should be noted that the embodiments, and the features of the embodiments, may be combined with each other provided they do not conflict. The embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The technical solution provided by the embodiments of the invention integrates the data screening conditions, preprocesses the original data set in advance based on the integration result, and then performs subsequent processing on the preprocessed data set. In this way, multiple data processing tasks can be merged into a single data processing task, saving computing resources and reducing computation time while ensuring valid data processing.
FIG. 1 shows a flow chart of a data processing method according to an embodiment of the invention. As shown in FIG. 1, the method comprises the following steps S101 to S103:
in step S101, a data screening condition set is obtained, where the data screening condition set includes two or more data screening conditions;
in step S102, an original data set is obtained, and the original data set is preprocessed based on the data screening condition set to obtain a preprocessed data set, where the data has one or more data dimensions;
in step S103, preset processing is performed on the preprocessed data set.
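Steps S101 to S103 can be sketched end to end in Python as follows. This is a minimal sketch under stated assumptions: records are plain dictionaries, each screening condition is a named predicate, and the preset processing is a count grouped by two dimensions. The field names (`event_type`, `feature`, `strategy`, `result`), the function names, and the sample values are hypothetical illustrations modeled on the monitoring example later in this description, not the patent's own implementation.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]
Condition = Callable[[Record], bool]

# Hypothetical monitoring records; field names and values are illustrative only.
RAW_DATA: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "other", "feature": "f1", "strategy": "s2", "result": "Y"},
    {"event_type": "other", "feature": "f2", "strategy": "s1", "result": "N"},
    {"event_type": "sync", "feature": "f2", "strategy": "s2", "result": "Y"},
]


def acquire_conditions() -> Dict[str, Condition]:
    """S101: return a set of two or more named data screening conditions."""
    return {
        "result_is_Y": lambda r: r["result"] == "Y",
        "event_is_sync": lambda r: r["event_type"] == "sync",
    }


def preprocess(raw: List[Record], conditions: Dict[str, Condition]) -> List[Record]:
    """S102: one pass keeping every record that at least one condition needs."""
    return [r for r in raw if any(cond(r) for cond in conditions.values())]


def preset_processing(
    data: List[Record], conditions: Dict[str, Condition]
) -> Dict[str, Counter]:
    """S103: count records per (feature, strategy), separately per condition."""
    results: Dict[str, Counter] = {name: Counter() for name in conditions}
    for r in data:
        key = (r["feature"], r["strategy"])
        for name, cond in conditions.items():
            if cond(r):
                results[name][key] += 1
    return results


conditions = acquire_conditions()
preprocessed = preprocess(RAW_DATA, conditions)
results = preset_processing(preprocessed, conditions)
```

Here the preprocessed set keeps four of the five sample records (the one with neither a "Y" result nor a "sync" event is dropped), and the per-condition distinction is only reintroduced in S103.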
As mentioned above, with the development of network technology, the cloud era has arrived and big data has emerged with it. On current big data computing platforms, every computation submitted by a user involves a series of operations such as computing task decomposition, resource scheduling, data routing, and result merging. Because the amount of data to be processed is generally huge, these operations consume considerable computing resources and time. Data routing is especially costly: the data volume is often on the order of gigabytes or terabytes, so each routing pass incurs high I/O consumption. How to save computing resources and reduce computation time while ensuring valid computation is a pressing problem.
In view of the above problem, this embodiment proposes a data processing method that integrates the data screening conditions, preprocesses the original data set in advance based on the integration result, and then performs subsequent processing on the preprocessed data set. In this way, multiple data processing tasks can be merged into a single data processing task, saving computing resources and reducing computation time while ensuring valid data processing.
The present invention is described in detail below, taking big-data Structured Query Language (SQL) query processing as an example.
A data screening condition is a condition or requirement used to select data in an ordinary data processing flow. In big-data structured query processing, a query command usually carries both a data screening condition and a data calculation command, which together express the purpose of the query and calculation: the data screening condition specifies which data the user wants to query, and the data calculation command specifies which calculation result or results the user wants to obtain from that data.
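To make the split between a data screening condition and a data calculation command concrete, the sketch below models a query command as a pair of the two. The `QueryCommand` and `CalcCommand` classes, their fields, and the grouping of commands by an identical calculation command are illustrative assumptions; they only show which queries this description later treats as candidates for merging. The command named `q4` is invented for contrast and does not appear in the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


@dataclass(frozen=True)
class CalcCommand:
    """Hypothetical data calculation command: an aggregate over group-by dimensions."""
    group_by: Tuple[str, ...]
    aggregate: str  # e.g. "count"


@dataclass(frozen=True)
class QueryCommand:
    """Hypothetical query command: which data (screening) plus what to compute."""
    name: str
    screening: Callable[[Record], bool]
    calc: CalcCommand


def group_mergeable(queries: List[QueryCommand]) -> Dict[CalcCommand, List[str]]:
    """Group query names by identical calculation command (merge candidates)."""
    groups: Dict[CalcCommand, List[str]] = {}
    for q in queries:
        groups.setdefault(q.calc, []).append(q.name)
    return groups


by_feature_strategy = CalcCommand(("feature", "strategy"), "count")
by_event_type = CalcCommand(("event_type",), "count")
queries = [
    QueryCommand("q2", lambda r: r["result"] == "Y", by_feature_strategy),
    QueryCommand("q3", lambda r: r["event_type"] == "sync", by_feature_strategy),
    QueryCommand("q4", lambda r: r["result"] == "N", by_event_type),  # invented
]
groups = group_mergeable(queries)
```

Because `CalcCommand` is a frozen dataclass, it is hashable and can key the dictionary directly; `q2` and `q3` end up in one group and `q4` in another.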
A data dimension is information, related to the characteristics, source, and the like of the data, that a user can use as a query condition. In an optional implementation of this embodiment, the data dimensions include one or more of the following: data type, data characteristics, data history processing strategy, data history processing result, and data annotation information. The data type represents the category to which the data belongs; note that different databases may define data types differently, and the specific definition depends on the purposes of data storage and processing, which the invention does not specifically limit. The data characteristics may include attribute characteristics of the data and other characteristics related to the purposes of data storage and processing. The data history processing strategy is the strategy used to process the data before it was stored in the database, and the data history processing result is the result of that processing. The data annotation information is annotated information that distinguishes the data from other data and may be used in subsequent queries or processing.
Taking monitoring big data as an example, assume that each piece of monitoring data stored in the database has four dimensions: event type, feature, detection strategy, and detection result, of which the feature and detection strategy dimensions carry important reference value and relatively large amounts of information. A user can then generate query commands as needed, for example: query command 1 counts all data grouped by the two dimensions of feature and detection strategy; query command 2 counts the data whose detection result is "Y", grouped by feature and detection strategy, that is, it first screens the data with detection result "Y" and then counts the screened data under those two dimensions; and query command 3 counts the data whose event type is "sync", grouped by feature and detection strategy, that is, it first screens the data with event type "sync" and then counts the screened data under those two dimensions. In these query commands, "detection result is Y" and "event type is sync" are data screening conditions, and counting is the data calculation command.
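The three example query commands can be sketched as independent filter-then-count operations, which is roughly how the one-query-per-execution approach criticized in the next paragraph would run them: each query makes its own pass and routes its own copy of the selected data. The sample records, field names, and the `routed` counter (a simplified stand-in for routing cost) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical monitoring records with four dimensions each.
DATA: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "other", "feature": "f1", "strategy": "s2", "result": "Y"},
    {"event_type": "other", "feature": "f2", "strategy": "s1", "result": "N"},
    {"event_type": "sync", "feature": "f2", "strategy": "s2", "result": "Y"},
]

QUERIES: Dict[str, Callable[[Record], bool]] = {
    "query_1": lambda r: True,                       # no screening condition
    "query_2": lambda r: r["result"] == "Y",         # detection result is "Y"
    "query_3": lambda r: r["event_type"] == "sync",  # event type is "sync"
}


def run_separately(
    data: List[Record], queries: Dict[str, Callable[[Record], bool]]
) -> Tuple[Dict[str, Counter], int]:
    """Run each query on its own; return per-query counts and records routed."""
    results: Dict[str, Counter] = {}
    routed = 0
    for name, screen in queries.items():
        selected = [r for r in data if screen(r)]
        routed += len(selected)  # every query routes its own selection
        results[name] = Counter((r["feature"], r["strategy"]) for r in selected)
    return results, routed


results, routed = run_separately(DATA, QUERIES)
```

On this sample the three queries route 5 + 3 + 3 = 11 records in total, even though only five distinct records exist.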
In practice, even for the same database, different users have different query and calculation purposes and therefore different statistical perspectives, so many query commands are likely to share the same data calculation command while differing in their data screening conditions. In the prior art, each such query command is executed and routed separately, which consumes a large amount of routing computing resources. To save routing resources, the method integrates the query commands: for query commands with the same data calculation command, it first extracts the data required by all of their data screening conditions and routes that data once, then executes the data calculation command on the routed data, re-applying each specific data screening condition at that point. Several single-statement query commands are thus combined into one, with the refined screening conditions deferred to the data aggregation stage, so that multiple query calculations are submitted to the computing platform as one computing task and share the same data routing logic. Routing the data once while producing multiple results preserves the accuracy of each query calculation, saves routing computing resources, shortens query processing time, and improves query processing efficiency.
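The merged execution described above can be sketched as follows, using hypothetical records and the three example screening conditions: records are selected once, so a record needed by several queries is routed only once, and each query's own screening condition is re-applied only at aggregation time. Counting routed records is a simplified stand-in for routing cost, not a measurement reported in the patent.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical monitoring records with four dimensions each.
DATA: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "other", "feature": "f1", "strategy": "s2", "result": "Y"},
    {"event_type": "other", "feature": "f2", "strategy": "s1", "result": "N"},
    {"event_type": "sync", "feature": "f2", "strategy": "s2", "result": "Y"},
]

QUERIES: Dict[str, Callable[[Record], bool]] = {
    "query_1": lambda r: True,                       # no screening condition
    "query_2": lambda r: r["result"] == "Y",         # detection result is "Y"
    "query_3": lambda r: r["event_type"] == "sync",  # event type is "sync"
}


def run_merged(
    data: List[Record], queries: Dict[str, Callable[[Record], bool]]
) -> Tuple[Dict[str, Counter], int]:
    """One shared selection and routing pass; per-query conditions deferred."""
    shared = [r for r in data if any(screen(r) for screen in queries.values())]
    routed = len(shared)  # each needed record is routed exactly once
    results: Dict[str, Counter] = {name: Counter() for name in queries}
    for r in shared:
        key = (r["feature"], r["strategy"])
        for name, screen in queries.items():
            if screen(r):  # refined screening condition applied at aggregation
                results[name][key] += 1
    return results, routed


results, routed = run_merged(DATA, QUERIES)
```

On this sample, `routed` is 5 rather than the 11 (5 + 3 + 3) records that would be routed if each query routed its own selection, while each query's counts are unchanged. The saving grows with the overlap between the queries' selections.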
The preset processing includes one or more of the following: data fragmentation, data routing, data aggregation, data statistics, data computation, and the like.
In an optional implementation of this embodiment, as shown in FIG. 2, step S102 of acquiring an original data set and preprocessing it based on the data screening condition set to obtain a preprocessed data set includes the following steps S201 to S203:
in step S201, an original data set is acquired;
in step S202, the data required by all data screening conditions in the data screening condition set, that is, every data item satisfying at least one of the conditions, is extracted from the original data set;
in step S203, the extracted data are combined into a preprocessed data set.
To simplify the data processing flow, save routing computing resources, reduce query processing time, and improve query processing efficiency, this embodiment first preprocesses the data objects to be processed so that the query commands can be integrated. Specifically, an original data set is obtained first; then the data required by all data screening conditions in the data screening condition set, that is, every data item satisfying at least one condition, is extracted from the original data set to form a preprocessed data set, which serves as the data basis for subsequent data processing operations.
In an optional implementation of this embodiment, the data screening conditions are related to the data dimensions; that is, the data selected by a data screening condition has a particular value or characteristic in some data dimension.
Still taking monitoring big data as an example, as described above, the data screening conditions are "detection result is Y" and "event type is sync". In this implementation, the data screening condition set includes these two data screening conditions, and the preprocessed data set consists of all data required by the two conditions, that is, all data whose detection result is "Y" or whose event type is "sync".
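Steps S201 to S203 for this example can be sketched as a single pass that keeps a record if it satisfies either screening condition. The predicate list, the field names, and the sample records are hypothetical; the union semantics follow the "or" wording of the example above.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# S201: hypothetical original data set.
DATA: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "other", "feature": "f1", "strategy": "s2", "result": "Y"},
    {"event_type": "other", "feature": "f2", "strategy": "s1", "result": "N"},
    {"event_type": "sync", "feature": "f2", "strategy": "s2", "result": "Y"},
]

SCREENING_CONDITIONS: List[Callable[[Record], bool]] = [
    lambda r: r["result"] == "Y",         # detection result is "Y"
    lambda r: r["event_type"] == "sync",  # event type is "sync"
]


def preprocess(
    raw: List[Record], conditions: List[Callable[[Record], bool]]
) -> List[Record]:
    """S202-S203: keep every record satisfying at least one screening
    condition and combine the kept records into the preprocessed data set."""
    return [r for r in raw if any(cond(r) for cond in conditions)]


preprocessed = preprocess(DATA, SCREENING_CONDITIONS)
```

Only the record with event type "other" and detection result "N" is excluded; a record matching both conditions appears once, not twice.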
In an optional implementation of this embodiment, as shown in FIG. 3, step S103 of performing preset processing on the preprocessed data set includes the following steps S301 to S303:
in step S301, a preset processing condition is acquired;
in step S302, performing data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments;
in step S303, data aggregation processing is performed on the data fragments.
In this embodiment, the preprocessed data set is further processed through data routing and aggregation to obtain the corresponding data processing result. Specifically, a preset processing condition is acquired first; the data in the preprocessed data set is then routed according to the preset processing condition to obtain two or more data fragments; and finally, data aggregation processing is performed on the data fragments.
The preset processing condition is related to the data dimensions and provides the basis for data routing. For example, the preset processing condition may specify that data is routed according to the two dimensions of feature and detection strategy.
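Data routing by the preset processing condition can be sketched as hash partitioning on the routing dimensions, so that all records sharing a (feature, detection strategy) pair land in the same fragment. Hash partitioning, the CRC-32 hash from Python's `zlib` module, the `route` function, and the fragment count of 2 are assumptions chosen for illustration; the patent does not specify a routing function. CRC-32 is used instead of the built-in `hash()` because Python randomizes string hashing between interpreter runs.

```python
import zlib
from typing import Dict, List, Sequence

Record = Dict[str, str]

# Hypothetical records to be routed.
DATA: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "other", "feature": "f1", "strategy": "s2", "result": "Y"},
    {"event_type": "other", "feature": "f2", "strategy": "s1", "result": "N"},
    {"event_type": "sync", "feature": "f2", "strategy": "s2", "result": "Y"},
]


def route(
    data: List[Record], dims: Sequence[str], num_fragments: int
) -> List[List[Record]]:
    """S302: assign each record to a fragment by hashing its routing dimensions."""
    fragments: List[List[Record]] = [[] for _ in range(num_fragments)]
    for r in data:
        key = "\x1f".join(r[d] for d in dims)  # separator avoids ambiguous joins
        fragments[zlib.crc32(key.encode("utf-8")) % num_fragments].append(r)
    return fragments


fragments = route(DATA, ("feature", "strategy"), num_fragments=2)
```

Because the fragment index depends only on the routing key, each later aggregation per (feature, detection strategy) can be completed inside a single fragment.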
In an optional implementation of this embodiment, as shown in FIG. 4, step S303 of performing data aggregation processing on the data fragments includes the following steps S401 to S403:
in step S401, the data fragments are divided, according to the data screening conditions, into data sub-fragments each corresponding to one data screening condition;
in step S402, a data aggregation processing command is acquired;
in step S403, data aggregation processing is performed on the data sub-fragments according to the data aggregation processing command.
As mentioned above, to save data routing resources, the present disclosure performs screening preprocessing on the data set according to all of the data screening conditions, which amounts to integrating those conditions. This integration, however, only produces the data basis for the subsequent routing and aggregation; it does not reflect what distinguishes one data screening condition from another. Therefore, in this embodiment, the specific data screening conditions are taken into account during the final aggregation so that each data processing result matches its data processing command. Specifically, the data fragments are first divided, according to the data screening conditions, into data sub-fragments each corresponding to one data screening condition; a data aggregation processing command is then acquired; and finally, data aggregation processing is performed on the data sub-fragments according to that command.
The data aggregation processing command may be, for example, a data calculation command such as a count command and a quantity statistic command, or may be other data aggregation commands, which is not limited in the present invention.
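Steps S401 to S403 can be sketched per fragment as below, assuming each fragment is a list of records, each screening condition is a named predicate, and the acquired aggregation command is a count per (feature, detection strategy). A record that matches several conditions is placed in several sub-fragments, which is what lets one routed record serve several queries. The fragment contents and all names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]

CONDITIONS: Dict[str, Callable[[Record], bool]] = {
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
}

# Hypothetical fragment produced by routing on (feature, strategy).
FRAGMENT: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "other", "feature": "f1", "strategy": "s2", "result": "Y"},
]


def split_then_aggregate(
    fragment: List[Record], conditions: Dict[str, Callable[[Record], bool]]
) -> Dict[str, Counter]:
    # S401: divide the fragment into one sub-fragment per screening condition
    sub_fragments = {
        name: [r for r in fragment if cond(r)] for name, cond in conditions.items()
    }
    # S402-S403: apply the aggregation command (fixed to a count here)
    return {
        name: Counter((r["feature"], r["strategy"]) for r in sub)
        for name, sub in sub_fragments.items()
    }


result = split_then_aggregate(FRAGMENT, CONDITIONS)
```

The first record matches both conditions and is therefore counted in both results.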
In another optional implementation of this embodiment, the order of the condition-based splitting and the data aggregation processing may be swapped. That is, as shown in FIG. 5, step S303 of performing data aggregation processing on the data fragments includes the following steps S501 to S503:
in step S501, a data aggregation processing command is acquired;
in step S502, performing data aggregation processing on the data fragments according to the data aggregation processing command;
in step S503, the aggregated data fragments are divided, according to the data screening conditions, into data sub-fragments each corresponding to one data screening condition.
In this embodiment, data aggregation processing is first performed on the data fragments according to the data aggregation processing command, and the aggregated data fragments are then split by the different data screening conditions to obtain data processing results that match the data processing commands.
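One way to make the reversed order of steps S501 to S503 work is sketched below: the count is computed first, keyed by the grouping dimensions plus a tuple of flags recording which screening conditions each record satisfies, and the aggregated rows are then divided per condition. Carrying these flags through the aggregation is an assumption of this sketch; the patent does not say how condition information survives the aggregation step. All names and sample records are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]

CONDITIONS: Dict[str, Callable[[Record], bool]] = {
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
}

# Hypothetical fragment produced by routing on (feature, strategy).
FRAGMENT: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "other", "feature": "f1", "strategy": "s2", "result": "Y"},
]


def aggregate_then_split(
    fragment: List[Record], conditions: Dict[str, Callable[[Record], bool]]
) -> Dict[str, Counter]:
    names = list(conditions)
    # S501-S502: count per (feature, strategy, condition flags) in one pass
    partial: Counter = Counter(
        (r["feature"], r["strategy"], tuple(conditions[n](r) for n in names))
        for r in fragment
    )
    # S503: divide the aggregated rows into one result per screening condition
    results: Dict[str, Counter] = {n: Counter() for n in names}
    for (feature, strategy, flags), count in partial.items():
        for name, matched in zip(names, flags):
            if matched:
                results[name][(feature, strategy)] += count
    return results


result = aggregate_then_split(FRAGMENT, CONDITIONS)
```

For this fragment the per-condition counts equal those obtained by splitting first (steps S401 to S403). The approach relies on the aggregate being additive, as counts and sums are; a non-decomposable aggregate such as a distinct count would need more care.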
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention.
FIG. 6 shows a block diagram of a data processing apparatus according to an embodiment of the present invention. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in FIG. 6, the data processing apparatus includes:
an obtaining module 601 configured to obtain a data screening condition set, where the data screening condition set includes two or more data screening conditions;
a preprocessing module 602 configured to obtain an original data set and preprocess the original data set based on the data screening condition set to obtain a preprocessed data set, where the data has one or more data dimensions;
a processing module 603 configured to perform preset processing on the preprocessed data set.
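The module structure of FIG. 6 can be mirrored, in a minimal software-only sketch, as three cooperating classes, one per module, wired together by an apparatus class. The class and method names are hypothetical, the processing module is reduced to a per-condition count, and the check for at least two conditions simply encodes the "two or more data screening conditions" requirement.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]
Condition = Callable[[Record], bool]


class ObtainingModule:  # hypothetical counterpart of obtaining module 601
    def __init__(self, conditions: Dict[str, Condition]) -> None:
        if len(conditions) < 2:
            raise ValueError("the condition set needs two or more conditions")
        self._conditions = dict(conditions)

    def obtain(self) -> Dict[str, Condition]:
        return dict(self._conditions)


class PreprocessingModule:  # hypothetical counterpart of preprocessing module 602
    def preprocess(
        self, raw: List[Record], conditions: Dict[str, Condition]
    ) -> List[Record]:
        # keep each record that at least one screening condition needs
        return [r for r in raw if any(c(r) for c in conditions.values())]


class ProcessingModule:  # hypothetical counterpart of processing module 603
    def process(
        self, data: List[Record], conditions: Dict[str, Condition]
    ) -> Dict[str, Counter]:
        # count per (feature, strategy), separately for each condition
        return {
            name: Counter((r["feature"], r["strategy"]) for r in data if cond(r))
            for name, cond in conditions.items()
        }


class DataProcessingApparatus:
    def __init__(self, conditions: Dict[str, Condition]) -> None:
        self.obtaining = ObtainingModule(conditions)
        self.preprocessing = PreprocessingModule()
        self.processing = ProcessingModule()

    def run(self, raw: List[Record]) -> Dict[str, Counter]:
        conditions = self.obtaining.obtain()
        data = self.preprocessing.preprocess(raw, conditions)
        return self.processing.process(data, conditions)


apparatus = DataProcessingApparatus({
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
})
output = apparatus.run([  # hypothetical sample records
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "other", "feature": "f2", "strategy": "s1", "result": "N"},
    {"event_type": "sync", "feature": "f2", "strategy": "s2", "result": "Y"},
])
```

Keeping each module behind its own small interface makes it straightforward to swap, for example, the processing module for one that routes and aggregates on a distributed platform.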
As mentioned above, with the development of network technology, the cloud era has arrived and big data has emerged with it. On current big data computing platforms, every computation submitted by a user involves a series of operations such as computing task decomposition, resource scheduling, data routing, and result merging. Because the amount of data to be processed is generally huge, these operations consume considerable computing resources and time. Data routing is especially costly: the data volume is often on the order of gigabytes or terabytes, so each routing pass incurs high I/O consumption. How to save computing resources and reduce computation time while ensuring valid computation is a pressing problem.
In view of the above problem, this embodiment proposes a data processing apparatus that integrates the data screening conditions, preprocesses the original data set based on the integration result, and then performs subsequent processing on the preprocessed data set. In this way, multiple data processing tasks can be merged into a single data processing task, saving computing resources and reducing computation time while ensuring valid data processing.
The present invention is described in detail below, taking big-data Structured Query Language (SQL) query processing as an example.
A data screening condition is a condition or requirement used to select data in an ordinary data processing flow. In big-data structured query processing, a query command usually carries both a data screening condition and a data calculation command, which together express the purpose of the query and calculation: the data screening condition specifies which data the user wants to query, and the data calculation command specifies which calculation result or results the user wants to obtain from that data.
A data dimension is information related to the characteristics, source, and so on of the data that a user can use as a query condition. In an optional implementation of this embodiment, the data dimensions include one or more of the following: data type, data characteristics, data history processing strategy, data history processing result, and data annotation information. The data type represents the category to which the data belongs; note that different databases may define data types differently, since the definitions depend on the purposes for which the data is stored and processed, and the invention places no particular limitation on them. The data characteristics may include attribute characteristics of the data and other characteristics related to the purposes of data storage and processing. The data history processing strategy is the strategy used to process the data before it was stored in the database, and the data history processing result is the result of that processing. The data annotation information is information annotated on the data that distinguishes it from other data and is likely to be used in subsequent queries or processing.
Taking monitoring big data as an example, assume that each piece of monitoring data stored in the database has four dimensions: event type, feature, detection policy, and detection result. Of these, the feature and detection policy dimensions carry relatively important reference value and a larger amount of information, so users may generate query commands such as the following according to their needs. Query command 1 counts all data under the two dimensions of feature and detection policy. Query command 2 counts the data whose detection result is "Y" under the two dimensions of feature and detection policy; that is, the data is first screened for a detection result of "Y", and the screening result is then counted under those two dimensions. Query command 3 counts the data whose event type is "sync" under the two dimensions of feature and detection policy; that is, the data is first screened for an event type of "sync", and the screening result is then counted under those two dimensions. In these query commands, a detection result of "Y" and an event type of "sync" are data screening conditions, and the count is the data calculation command.
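To make the baseline concrete, the following is a minimal sketch, assuming a small, made-up list of monitoring records, that runs the three query commands as three independent passes, as in the prior-art approach. The field names (event_type, feature, policy, result) and the helper count_by_feature_and_policy are hypothetical illustrations, not names taken from the patent.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical monitoring records with the four dimensions named in the text:
# event type, feature, detection policy and detection result.
Record = Dict[str, str]

RECORDS: List[Record] = [
    {"event_type": "sync", "feature": "f1", "policy": "p1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "policy": "p2", "result": "N"},
    {"event_type": "async", "feature": "f2", "policy": "p1", "result": "Y"},
    {"event_type": "async", "feature": "f1", "policy": "p1", "result": "N"},
    {"event_type": "sync", "feature": "f2", "policy": "p1", "result": "Y"},
]


def count_by_feature_and_policy(
    records: List[Record], condition: Optional[Callable[[Record], bool]] = None
) -> Counter:
    """One query command: optional screening condition, then a count per (feature, policy)."""
    counts: Counter = Counter()
    # Each command scans (and, on a real cluster, routes) the whole data set by itself.
    for r in records:
        if condition is None or condition(r):
            counts[(r["feature"], r["policy"])] += 1
    return counts


# Query commands 1 to 3 from the text, each executed as a separate task.
q1 = count_by_feature_and_policy(RECORDS)
q2 = count_by_feature_and_policy(RECORDS, lambda r: r["result"] == "Y")
q3 = count_by_feature_and_policy(RECORDS, lambda r: r["event_type"] == "sync")
```

All three commands share the same data calculation command (a count per feature and detection policy) and differ only in their screening condition, which is what makes them candidates for integration.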
In practice, even for the same database, different users have different query and calculation purposes and therefore different statistical perspectives, so many query commands are likely to share the same data calculation command while differing in their data screening conditions. In the prior art, each such query command is executed and routed separately, which consumes a large amount of routing and computing resources. To save these resources, this method integrates the query commands: for query commands that share the same data calculation command, it first extracts the data required by all of their data screening conditions (that is, data satisfying at least one of them) and routes that data once, and then executes the data calculation command on the routed data, applying each specific data screening condition at that point. In this way, multiple single-statement query commands are combined into one single-statement query command, and the refined screening conditions are deferred to the data aggregation stage. The multiple query computations are submitted to the computing platform as one computing task and share the same data routing logic, so that one data routing pass produces multiple results. This preserves the accuracy of the query computations while saving routing and computing resources, shortening query processing time, and improving query processing efficiency.
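One way to express this integration in standard SQL is conditional aggregation. The minimal sketch below uses Python's built-in sqlite3 module purely as a stand-in for a big data SQL engine; the table and column names are hypothetical. It compares query commands 2 and 3 run as two statements with a single combined statement whose WHERE clause keeps the rows needed by either command and whose SUM(CASE ...) expressions apply each command's own screening condition at aggregation time. This is an illustration of the idea, not necessarily how the platform described here rewrites queries.

```python
import sqlite3

# Hypothetical in-memory table standing in for the monitoring data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monitor (event_type TEXT, feature TEXT, policy TEXT, result TEXT)"
)
conn.executemany(
    "INSERT INTO monitor VALUES (?, ?, ?, ?)",
    [
        ("sync", "f1", "p1", "Y"),
        ("sync", "f1", "p2", "N"),
        ("async", "f2", "p1", "Y"),
        ("async", "f1", "p1", "N"),
        ("sync", "f2", "p1", "Y"),
    ],
)

# Query commands 2 and 3 as two separate single-statement queries.
SEPARATE_Y = """SELECT feature, policy, COUNT(*) FROM monitor
                WHERE result = 'Y' GROUP BY feature, policy"""
SEPARATE_SYNC = """SELECT feature, policy, COUNT(*) FROM monitor
                   WHERE event_type = 'sync' GROUP BY feature, policy"""

# One combined statement: WHERE keeps rows needed by either command, and each
# command's own condition is deferred to the aggregation stage.
COMBINED = """SELECT feature, policy,
                     SUM(CASE WHEN result = 'Y' THEN 1 ELSE 0 END) AS cnt_y,
                     SUM(CASE WHEN event_type = 'sync' THEN 1 ELSE 0 END) AS cnt_sync
              FROM monitor
              WHERE result = 'Y' OR event_type = 'sync'
              GROUP BY feature, policy"""

separate_y = {(f, p): c for f, p, c in conn.execute(SEPARATE_Y)}
separate_sync = {(f, p): c for f, p, c in conn.execute(SEPARATE_SYNC)}
combined = {(f, p): (y, s) for f, p, y, s in conn.execute(COMBINED)}

# Drop zero counts so the combined result can be compared with the separate ones.
combined_y = {k: y for k, (y, _) in combined.items() if y}
combined_sync = {k: s for k, (_, s) in combined.items() if s}
```

On a single SQLite file the WHERE pre-filter only trims work; on a distributed engine it is what limits the data that has to be routed, while one GROUP BY shuffle serves both commands.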
The preset processing includes one or more of the following: data fragmentation, data routing, data aggregation, data statistics, data computation, and the like.
In an optional implementation manner of this embodiment, as shown in fig. 7, the preprocessing module 602 includes:
a first obtaining sub-module 701 configured to obtain an original data set;
an extraction sub-module 702 configured to extract, from the original data set, the data required by all the data screening conditions in the data screening condition set, that is, data satisfying at least one of them;
a combining submodule 703 configured to combine the extracted data into a preprocessed data set.
To simplify the data processing flow, save routing and computing resources, reduce query processing time, and improve query processing efficiency, this embodiment first preprocesses the data to be processed so as to integrate the query commands. Specifically, the first obtaining sub-module 701 obtains the original data set; the extraction sub-module 702 extracts from it the data required by all the data screening conditions in the data screening condition set, that is, data satisfying at least one of them; and the combining sub-module 703 combines the extracted data into a preprocessed data set, which serves as the data basis for subsequent data processing operations.
In an optional implementation of this embodiment, the data screening conditions are related to the data dimensions; that is, the data remaining after screening by a data screening condition has a particular characteristic in some data dimension.
Again taking monitoring big data as an example, the data screening conditions in the example above are a detection result of "Y" and an event type of "sync". In this implementation, the data screening condition set therefore contains these two data screening conditions, and the preprocessed data set obtained by preprocessing consists of all data satisfying either of them, that is, all data whose detection result is "Y" or whose event type is "sync".
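The preprocessing step in this example can be sketched as a single filter that keeps a record when it satisfies at least one condition in the set. This is a minimal sketch under stated assumptions: records are plain dictionaries, conditions are Python predicates, and the names CONDITIONS and preprocess are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Condition = Callable[[Record], bool]

# Hypothetical encoding of the data screening condition set from the example.
CONDITIONS: Dict[str, Condition] = {
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
}


def preprocess(original: List[Record], conditions: Dict[str, Condition]) -> List[Record]:
    """Obtain, extract and combine: keep each record meeting at least one condition."""
    return [r for r in original if any(cond(r) for cond in conditions.values())]


original = [
    {"event_type": "sync", "feature": "f1", "policy": "p1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "policy": "p2", "result": "N"},
    {"event_type": "async", "feature": "f2", "policy": "p1", "result": "Y"},
    {"event_type": "async", "feature": "f1", "policy": "p1", "result": "N"},
    {"event_type": "sync", "feature": "f2", "policy": "p1", "result": "Y"},
]
preprocessed = preprocess(original, CONDITIONS)
```

Only the record that is neither "Y" nor "sync" is dropped, so the preprocessed set is the smallest set that still lets every screening condition be evaluated later.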
In an optional implementation manner of this embodiment, as shown in fig. 8, the processing module 603 includes:
a second obtaining sub-module 801 configured to obtain a preset processing condition;
a first processing sub-module 802, configured to perform data routing processing on the data in the preprocessed data set according to the preset processing condition, so as to obtain two or more data fragments;
a second processing sub-module 803 configured to perform data aggregation processing on the data fragments.
In this embodiment, the data set obtained through the preprocessing is further processed through data routing and aggregation to obtain a corresponding data processing result. Specifically, the second obtaining sub-module 801 obtains a preset processing condition; the first processing sub-module 802 performs data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments; the second processing sub-module 803 performs data aggregation processing on the data fragments.
The preset processing condition is related to the data dimensions and provides the basis for data routing. For example, the preset processing condition may specify that data be routed according to two dimensions: the feature dimension and the detection policy dimension.
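Routing by the feature and detection policy dimensions can be pictured as hash partitioning on those two fields, so that all records sharing a routing key land in the same data fragment. The sketch below is a minimal illustration assuming hash partitioning with zlib.crc32; the patent does not specify the routing function, and the name route and the fragment count are hypothetical.

```python
import zlib
from typing import Dict, List, Sequence

Record = Dict[str, str]


def route(records: List[Record], key_dims: Sequence[str], num_fragments: int) -> List[List[Record]]:
    """Assign each record to a fragment by hashing the values of its routing dimensions."""
    fragments: List[List[Record]] = [[] for _ in range(num_fragments)]
    for r in records:
        # Join with a unit separator so ("ab", "c") and ("a", "bc") give different keys.
        key = "\x1f".join(r[d] for d in key_dims)
        # crc32 is deterministic across processes, unlike the salted built-in hash() on str.
        idx = zlib.crc32(key.encode("utf-8")) % num_fragments
        fragments[idx].append(r)
    return fragments


# Hypothetical preprocessed records (detection result "Y" or event type "sync").
preprocessed = [
    {"event_type": "sync", "feature": "f1", "policy": "p1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "policy": "p2", "result": "N"},
    {"event_type": "async", "feature": "f2", "policy": "p1", "result": "Y"},
    {"event_type": "sync", "feature": "f2", "policy": "p1", "result": "Y"},
]
fragments = route(preprocessed, ("feature", "policy"), num_fragments=2)
```

Because every record with the same (feature, policy) pair ends up in one fragment, each fragment can later be aggregated independently without another shuffle.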
In an optional implementation manner of this embodiment, as shown in fig. 9, the second processing sub-module 803 includes:
a first fragmentation sub-module 901 configured to divide the data fragments, according to the data screening conditions, into data sub-fragments corresponding to the data screening conditions;
a third obtaining submodule 902 configured to obtain a data aggregation processing command;
and a third processing sub-module 903 configured to perform data aggregation processing on the data sub-fragments according to the data aggregation processing command.
As mentioned above, to save data routing resources, the present disclosure performs data screening preprocessing on the data set according to all the data screening conditions, which amounts to integrating the data screening conditions. However, this integration only produces the data basis for the subsequent data routing and aggregation; it does not reflect the individual data screening conditions. Therefore, in this embodiment, the specific data screening conditions are taken into account during the final aggregation of the data, so as to obtain data processing results that match each data processing command. Specifically, the first fragmentation sub-module 901 divides the data fragments into data sub-fragments corresponding to the data screening conditions; the third obtaining sub-module 902 obtains a data aggregation processing command; and the third processing sub-module 903 performs data aggregation processing on the data sub-fragments according to the data aggregation processing command.
The data aggregation processing command may be, for example, a data calculation command such as a count command or a quantity statistics command, or another data aggregation command; the present invention does not limit this.
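The split-then-aggregate order can be sketched as follows, assuming the aggregation command is a count per (feature, detection policy) and the screening conditions are Python predicates. The function split_then_aggregate, the condition names, and the hard-coded fragments are hypothetical; a record may fall into several sub-fragments if it satisfies several conditions.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]
Condition = Callable[[Record], bool]

# Hypothetical screening conditions from the monitoring example.
CONDITIONS: Dict[str, Condition] = {
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
}


def split_then_aggregate(
    fragments: List[List[Record]], conditions: Dict[str, Condition]
) -> Dict[str, Counter]:
    """Split each fragment into one sub-fragment per condition, then count per key."""
    results: Dict[str, Counter] = {name: Counter() for name in conditions}
    for fragment in fragments:
        for name, cond in conditions.items():
            sub_fragment = [r for r in fragment if cond(r)]
            # Aggregation command: count records per (feature, policy) in the sub-fragment.
            results[name].update((r["feature"], r["policy"]) for r in sub_fragment)
    return results


# Hypothetical fragments, as produced by routing on (feature, policy).
fragments = [
    [
        {"event_type": "sync", "feature": "f1", "policy": "p1", "result": "Y"},
        {"event_type": "sync", "feature": "f1", "policy": "p2", "result": "N"},
    ],
    [
        {"event_type": "async", "feature": "f2", "policy": "p1", "result": "Y"},
        {"event_type": "sync", "feature": "f2", "policy": "p1", "result": "Y"},
    ],
]
results = split_then_aggregate(fragments, CONDITIONS)
```

Each entry of results corresponds to one original query command, obtained from a single routing pass.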
In another optional implementation of this embodiment, the order of the data splitting based on the data screening conditions and the data aggregation may be reversed; that is, as shown in fig. 10, the second processing sub-module 803 includes:
a fourth obtaining sub-module 1001 configured to obtain a data aggregation processing command;
a fourth processing sub-module 1002 configured to perform data aggregation processing on the data fragments according to the data aggregation processing command;
and a second fragmentation sub-module 1003 configured to divide the data fragments subjected to the data aggregation processing, according to the data screening conditions, into data sub-fragments corresponding to the data screening conditions.
In this embodiment, the fourth obtaining sub-module 1001 obtains the data aggregation processing command; the fourth processing sub-module 1002 performs data aggregation processing on the data fragments according to that command; and the second fragmentation sub-module 1003 splits the aggregated data fragments according to the different data screening conditions to obtain data processing results matching the data processing commands.
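Aggregating before splitting only works if the aggregate still carries enough information to tell which screening conditions each record met. A minimal sketch under that assumption: the partial counts are keyed by (feature, policy, flags), where flags is a tuple of booleans, one per condition, and the split then adds each cell to every condition it satisfied. The flag-tuple key and the name aggregate_then_split are hypothetical design choices, not taken from the patent.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]
Condition = Callable[[Record], bool]

# Hypothetical screening conditions from the monitoring example.
CONDITIONS: Dict[str, Condition] = {
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
}


def aggregate_then_split(
    fragments: List[List[Record]], conditions: Dict[str, Condition]
) -> Dict[str, Counter]:
    names = list(conditions)
    # Aggregate first: count per (feature, policy, flags); flags record which
    # conditions each record met, so the split can still be done afterwards.
    partial: Counter = Counter()
    for fragment in fragments:
        for r in fragment:
            flags = tuple(conditions[n](r) for n in names)
            partial[(r["feature"], r["policy"], flags)] += 1
    # Split afterwards: each aggregated cell contributes to every condition it met.
    results: Dict[str, Counter] = {n: Counter() for n in names}
    for (feature, policy, flags), count in partial.items():
        for n, hit in zip(names, flags):
            if hit:
                results[n][(feature, policy)] += count
    return results


# Hypothetical fragments, as produced by routing on (feature, policy).
fragments = [
    [
        {"event_type": "sync", "feature": "f1", "policy": "p1", "result": "Y"},
        {"event_type": "sync", "feature": "f1", "policy": "p2", "result": "N"},
    ],
    [
        {"event_type": "async", "feature": "f2", "policy": "p1", "result": "Y"},
        {"event_type": "sync", "feature": "f2", "policy": "p1", "result": "Y"},
    ],
]
results = aggregate_then_split(fragments, CONDITIONS)
```

The intermediate aggregate has at most one cell per routing key and flag combination (2^k combinations for k conditions), so this order can pay off when the number of conditions is small relative to the number of records.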
Fig. 11 is a block diagram illustrating the structure of an electronic device according to an embodiment of the present invention. As shown in fig. 11, the electronic device 1100 includes a memory 1101 and a processor 1102, wherein
the memory 1101 is used to store one or more computer instructions that are executed by the processor 1102 to implement any of the method steps described above.
Fig. 12 is a schematic block diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the present invention.
As shown in fig. 12, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can execute the various processes in the above-described embodiments according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The RAM 1203 also stores various programs and data necessary for the operation of the system 1200. The CPU 1201, ROM 1202, and RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as needed, so that a computer program read from it can be installed into the storage section 1208 as needed.
In particular, according to an embodiment of the present invention, the method described above may be implemented as a computer software program. For example, embodiments of the invention include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the data processing method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present invention may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium may be a computer-readable storage medium included in the apparatus in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present invention.
The foregoing description is only exemplary of the preferred embodiments of the invention and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention according to the embodiments of the present invention is not limited to the specific combinations of the above features, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) the embodiments of the present invention.

Claims (10)

1. A data processing method, comprising:
acquiring a data screening condition set, wherein the data screening condition set comprises two or more data screening conditions;
acquiring an original data set, and preprocessing the original data set based on the data screening condition set to obtain a preprocessed data set, wherein the data comprises one or more data dimensions;
performing preset processing on the preprocessed data set;
wherein the performing preset processing on the preprocessed data set comprises:
acquiring a preset processing condition, wherein the preset processing condition is related to the data dimension;
performing data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments;
and carrying out data aggregation processing on the data fragments.
2. The method of claim 1, wherein the obtaining a raw data set and preprocessing the raw data set based on the set of data screening conditions to obtain a preprocessed data set comprises:
acquiring an original data set;
extracting data which meet all data screening conditions in the data screening condition set from the original data set;
and combining the extracted data into a preprocessed data set.
3. The method according to claim 1, wherein the performing data aggregation processing on the data fragments comprises:
dividing the data fragments into data sub-fragments corresponding to the data screening conditions according to the data screening conditions;
acquiring a data aggregation processing command;
and performing data aggregation processing on the data sub-fragments according to the data aggregation processing command.
4. The method according to claim 1, wherein the performing data aggregation processing on the data fragments comprises:
acquiring a data aggregation processing command;
performing data aggregation processing on the data fragments according to the data aggregation processing command;
and dividing the data fragments subjected to the data aggregation processing into data sub-fragments corresponding to the data screening conditions according to the data screening conditions.
5. A data processing apparatus, comprising:
an obtaining module configured to obtain a set of data screening conditions, wherein the set of data screening conditions includes two or more data screening conditions;
the preprocessing module is configured to acquire an original data set and preprocess the original data set based on the data screening condition set to obtain a preprocessed data set, wherein the data comprises one or more data dimensions;
the processing module is configured to carry out preset processing on the preprocessed data set;
the processing module comprises:
a second obtaining submodule configured to obtain a preset processing condition, wherein the preset processing condition is related to the data dimension;
the first processing submodule is configured to perform data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments;
and the second processing submodule is configured to perform data aggregation processing on the data fragments.
6. The data processing apparatus of claim 5, wherein the pre-processing module comprises:
a first obtaining sub-module configured to obtain an original data set;
the extraction sub-module is configured to extract data meeting all data screening conditions in the data screening condition set from the original data set;
and the combining submodule is configured to combine the extracted data into a preprocessed data set.
7. The data processing apparatus of claim 5, wherein the second processing submodule comprises:
a first fragmentation sub-module configured to divide the data fragments into data sub-fragments corresponding to the data screening conditions according to the data screening conditions;
a third obtaining sub-module configured to obtain a data aggregation processing command;
and the third processing sub-module is configured to perform data aggregation processing on the data sub-fragments according to the data aggregation processing command.
8. The data processing apparatus of claim 5, wherein the second processing submodule comprises:
a fourth obtaining sub-module configured to obtain a data aggregation processing command;
the fourth processing submodule is configured to perform data aggregation processing on the data fragments according to the data aggregation processing command;
and a second fragmentation sub-module configured to divide the data fragments subjected to the data aggregation processing into data sub-fragments corresponding to the data screening conditions according to the data screening conditions.
9. An electronic device comprising a memory and a processor; wherein
the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of any of claims 1-4.
10. A computer-readable storage medium having stored thereon computer instructions, characterized in that the computer instructions, when executed by a processor, carry out the method steps of any of claims 1-4.
CN201811027624.7A 2018-09-04 2018-09-04 Data processing method and device, electronic equipment and computer readable storage medium Active CN109597826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811027624.7A CN109597826B (en) 2018-09-04 2018-09-04 Data processing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811027624.7A CN109597826B (en) 2018-09-04 2018-09-04 Data processing method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109597826A CN109597826A (en) 2019-04-09
CN109597826B true CN109597826B (en) 2023-02-21

Family

ID=65957019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811027624.7A Active CN109597826B (en) 2018-09-04 2018-09-04 Data processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109597826B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148804A (en) * 2019-06-28 2020-12-29 京东数字科技控股有限公司 Data preprocessing method, device and storage medium thereof
CN112258690B (en) * 2020-10-23 2022-09-06 中车青岛四方机车车辆股份有限公司 Data access method and device and data storage method and device

Citations (5)

Publication number Priority date Publication date Assignee Title
US8214370B1 (en) * 2009-03-26 2012-07-03 Crossbow Technology, Inc. Data pre-processing and indexing for efficient retrieval and enhanced presentation
CN106302702A (en) * 2016-08-10 2017-01-04 华为技术有限公司 Data fragment storage method, apparatus and system
CN106331117A (en) * 2016-08-26 2017-01-11 中国科学技术大学 Data transmission method
CN107491885A (en) * 2017-08-25 2017-12-19 上海找钢网信息科技股份有限公司 Risk control platform and risk control management method for steel trade financial business
CN108228736A (en) * 2017-12-12 2018-06-29 深圳市买买提信息科技有限公司 Data processing method, data processing system and computer readable storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP4278918B2 (en) * 2002-04-19 2009-06-17 富士通株式会社 Image data processing apparatus and method
US8996463B2 (en) * 2012-07-26 2015-03-31 Mongodb, Inc. Aggregation framework system architecture and method

Non-Patent Citations (1)

Title
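Research on high availability and high performance of databases based on MongoDB; Chen Chen et al.; Computer Knowledge and Technology; 2017-11-05 (No. 31); full text *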

Also Published As

Publication number Publication date
CN109597826A (en) 2019-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Building, Grand Cayman, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

GR01 Patent grant