CN109597826B

CN109597826B - Data processing method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN109597826B
Application number: CN201811027624.7A
Authority: CN
Inventors: 沈立方
Original assignee: Advanced New Technologies Co Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2018-09-04
Filing date: 2018-09-04
Publication date: 2023-02-21
Anticipated expiration: 2038-09-04
Also published as: CN109597826A

Abstract

The embodiment of the invention discloses a data processing method, a data processing device, electronic equipment and a computer readable storage medium, wherein the method comprises the following steps: acquiring a data screening condition set, wherein the data screening condition set comprises two or more data screening conditions; acquiring an original data set, and preprocessing the original data set based on the data screening condition set to obtain a preprocessed data set, wherein the data comprises one or more data dimensions; and performing preset processing on the preprocessed data set. According to the technical scheme, a plurality of data processing tasks can be integrated into one data processing task to be processed, so that on the premise of ensuring the validity of data processing, the computing resources are saved, and the time consumed by computing is reduced.

Description

Data processing method and device, electronic equipment and computer readable storage medium

Technical Field

The embodiment of the invention relates to the technical field of big data processing, in particular to a data processing method and device, electronic equipment and a computer readable storage medium.

Background

With the development of network technology, the cloud era has quietly arrived, and big data has emerged along with it. Big data refers to a collection of data that cannot be captured, managed and processed with conventional software tools within an acceptable time frame, and it therefore requires a new data processing model. On current big data computing platforms, each computation submitted by a user involves a series of operations such as computing task decomposition, resource scheduling, data routing and result combination. Because the amount of data to be processed is generally huge, the required computing resources and time are correspondingly large. This is especially true of the data routing part: because the amount of data is often at the GB or TB level, each data routing operation incurs high IO consumption. How to save computing resources and reduce computing time while still ensuring effective computation is an urgent problem to be solved.

Disclosure of Invention

The embodiment of the invention provides a data processing method, a data processing device, electronic equipment and a computer readable storage medium.

In a first aspect, an embodiment of the present invention provides a data processing method.

Specifically, the data processing method includes:

acquiring a data screening condition set, wherein the data screening condition set comprises two or more data screening conditions;

acquiring an original data set, and preprocessing the original data set based on the data screening condition set to obtain a preprocessed data set, wherein the data comprises one or more data dimensions;

and carrying out preset processing on the preprocessed data set.

With reference to the first aspect, in a first implementation manner of the first aspect, the obtaining an original data set, and preprocessing the original data set based on the data screening condition set to obtain a preprocessed data set includes:

acquiring an original data set;

extracting data which meet all data screening conditions in the data screening condition set from the original data set;

and combining the extracted data into a preprocessed data set.

With reference to the first aspect and the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the performing preset processing on the preprocessed data set includes:

acquiring a preset processing condition, wherein the preset processing condition is related to the data dimension;

performing data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments;

and carrying out data aggregation processing on the data fragments.

With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the performing data aggregation processing on the data fragments includes:

dividing the data fragments into data sub-fragments corresponding to the data screening conditions according to the data screening conditions;

acquiring a data aggregation processing command;

and carrying out data aggregation processing on the data sub-fragments according to the data aggregation processing command.

With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the performing data aggregation processing on the data fragment includes:

acquiring a data aggregation processing command;

performing data aggregation processing on the data fragments according to the data aggregation processing command;

and dividing the data fragments subjected to the data aggregation processing into data sub-fragments corresponding to the data screening conditions according to the data screening conditions.

In a second aspect, an embodiment of the present invention provides a data processing apparatus.

Specifically, the data processing apparatus includes:

an obtaining module configured to obtain a set of data screening conditions, wherein the set of data screening conditions includes two or more data screening conditions;

the preprocessing module is configured to acquire an original data set and preprocess the original data set based on the data screening condition set to obtain a preprocessed data set, wherein the data comprises one or more data dimensions;

the processing module is configured to perform preset processing on the preprocessed data set.

With reference to the second aspect, in a first implementation manner of the second aspect, the preprocessing module includes:

a first obtaining submodule configured to obtain an original data set;

the extraction submodule is configured to extract data which meet all data screening conditions in the data screening condition set from the original data set;

and the combining submodule is configured to combine the extracted data into a preprocessed data set.

With reference to the second aspect and the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the processing module includes:

a second obtaining submodule configured to obtain a preset processing condition, wherein the preset processing condition is related to the data dimension;

the first processing submodule is configured to perform data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments;

and the second processing submodule is configured to perform data aggregation processing on the data fragments.

With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the second processing sub-module includes:

the first fragment sub-module is configured to divide the data fragments into data sub-fragments corresponding to the data screening conditions according to the data screening conditions;

the third acquisition sub-module is configured to acquire a data aggregation processing command;

and the third processing sub-module is configured to perform data aggregation processing on the data sub-fragments according to the data aggregation processing command.

With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the second processing sub-module includes:

the fourth acquisition submodule is configured to acquire a data aggregation processing command;

the fourth processing submodule is configured to perform data aggregation processing on the data fragments according to the data aggregation processing command;

and the second fragmentation submodule is configured to divide the data fragments subjected to the data aggregation processing into data sub-fragments corresponding to the data screening conditions according to the data screening conditions.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory and a processor, where the memory is used to store one or more computer instructions that support a data processing apparatus to execute the data processing method in the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The data processing apparatus may further comprise a communication interface for the data processing apparatus to communicate with other devices or a communication network.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer instructions used by a data processing apparatus, where the computer instructions include computer instructions for causing the data processing apparatus to execute the data processing method in the first aspect.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

According to the technical scheme, the data screening conditions are integrated, the original data set is preprocessed in advance based on the integration result, and subsequent processing operations are then carried out on the preprocessed data set. In this way, a plurality of data processing tasks can be integrated into one data processing task for processing, so that computing resources are saved and computing time is reduced while the effectiveness of data processing is ensured.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the invention.

Drawings

Other features, objects and advantages of embodiments of the invention will become more apparent from the following detailed description of non-limiting embodiments thereof, taken in conjunction with the accompanying drawings. In the drawings:

FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the invention;

FIG. 2 shows a flow chart of step S102 of the data processing method according to the embodiment shown in FIG. 1;

FIG. 3 shows a flow chart of step S103 of the data processing method according to the embodiment shown in FIG. 1;

FIG. 4 shows a flow chart of step S303 of the data processing method according to one embodiment shown in FIG. 3;

FIG. 5 shows a flow chart of step S303 of the data processing method according to another embodiment shown in FIG. 3;

FIG. 6 shows a block diagram of a data processing apparatus according to an embodiment of the invention;

FIG. 7 shows a block diagram of the preprocessing module 602 of the data processing apparatus according to the embodiment shown in FIG. 6;

FIG. 8 shows a block diagram of the processing module 603 of the data processing apparatus according to the embodiment shown in FIG. 6;

FIG. 9 shows a block diagram of the second processing submodule 803 of the data processing apparatus according to the embodiment shown in FIG. 8;

FIG. 10 shows a block diagram of the second processing submodule 803 of the data processing apparatus according to another embodiment shown in FIG. 8;

FIG. 11 shows a block diagram of an electronic device according to an embodiment of the invention;

FIG. 12 shows a schematic block diagram of a computer system suitable for implementing a data processing method according to an embodiment of the present invention.

Detailed Description

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Furthermore, parts that are not relevant to the description of the exemplary embodiments have been omitted from the drawings for the sake of clarity.

In the embodiments of the present invention, it should be understood that terms such as "including" or "having", etc., are intended to indicate the presence of the features, numerals, steps, actions, components, parts, or combinations thereof disclosed in the specification, and are not intended to exclude the possibility that one or more other features, numerals, steps, actions, components, parts, or combinations thereof are present or added.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. Embodiments of the present invention will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

The technical scheme provided by the embodiment of the invention integrates the data screening conditions, preprocesses the original data set in advance based on the integration result, and then performs subsequent processing operation according to the preprocessed data set. According to the technical scheme, a plurality of data processing tasks can be integrated into one data processing task to be processed, so that on the premise of ensuring the validity of data processing, the computing resources are saved, and the computing time is reduced.

Fig. 1 shows a flow chart of a data processing method according to an embodiment of the invention, which, as shown in fig. 1, comprises the following steps S101-S103:

in step S101, a data screening condition set is obtained, where the data screening condition set includes two or more data screening conditions;

in step S102, an original data set is obtained, and the original data set is preprocessed based on the data screening condition set to obtain a preprocessed data set, where the data includes one or more data dimensions;

in step S103, a preset process is performed on the preprocessed data set.
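Steps S101 to S103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `preprocess` and `run`, the modelling of a data screening condition as a predicate function, and the sample records are all hypothetical. The preprocessing keeps a record when at least one screening condition selects it, which follows the monitoring example given later in this description.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
# Assumption: a data screening condition is modelled as a predicate on a record.
Condition = Callable[[Record], bool]


def preprocess(raw: Iterable[Record], conditions: List[Condition]) -> List[Record]:
    """Step S102: keep the records selected by the screening condition set."""
    return [r for r in raw if any(c(r) for c in conditions)]


def run(raw: Iterable[Record],
        conditions: List[Condition],
        preset_processing: Callable[[List[Record]], object]) -> object:
    """Steps S101-S103 in order; `conditions` is the set acquired in S101."""
    if len(conditions) < 2:
        raise ValueError("the condition set must hold two or more conditions")
    preprocessed = preprocess(raw, conditions)   # S102
    return preset_processing(preprocessed)       # S103


# Hypothetical sample data with two dimensions.
raw_data = [
    {"event_type": "sync", "result": "N"},
    {"event_type": "async", "result": "Y"},
    {"event_type": "async", "result": "N"},
]
conditions = [
    lambda r: r["result"] == "Y",
    lambda r: r["event_type"] == "sync",
]
# Here the preset processing is simply a count of the preprocessed records.
total = run(raw_data, conditions, preset_processing=len)
```

In this sketch the preset processing is passed in as a callable so that routing and aggregation can be substituted without changing the first two steps.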

As mentioned above, with the development of network technology, the cloud era has quietly arrived and big data has emerged along with it. On current big data computing platforms, each computation submitted by a user involves a series of operations such as computing task decomposition, resource scheduling, data routing and result combination. Because the amount of data to be processed is generally huge, the required computing resources and time are correspondingly large. This is especially true of the data routing part: because the amount of data is often at the GB or TB level, each data routing operation incurs high IO consumption. How to save computing resources and reduce computing time while still ensuring effective computation is an urgent problem to be solved.

In view of the above problem, in this embodiment, a data processing method is proposed, which integrates the data screening conditions, pre-processes the original data set in advance based on the integration result, and then performs subsequent processing operations according to the pre-processed data set. According to the technical scheme, a plurality of data processing tasks can be integrated into one data processing task to be processed, so that on the premise of ensuring the validity of data processing, the computing resources are saved, and the computing time is reduced.

The present invention is described in detail below by taking a Structured Query Language (SQL) process based on big data as an example.

The data screening condition refers to conditions and requirements for screening data, which are required in a normal data processing process. In the structured query processing process based on big data, a query command usually carries a data screening condition and a data calculation command, which can embody the query and calculation purposes, wherein the data screening condition refers to which data a user wants to query, and the data calculation command refers to which kind or kinds of calculation results based on the data the user wants to obtain.

The data dimension refers to information which can be used as a query condition by a user during query and is related to characteristics, sources and the like of data. In an optional implementation manner of this embodiment, the data dimension includes one or more of the following dimensions: data type, data characteristics, data history processing strategy, data history processing result, data labeling information and the like, wherein the data type is used for representing the category to which the data belongs, it should be noted that the type definitions of the data stored in different databases may be different, and the specific data type definition is related to the purpose of data storage and processing, and is not specifically limited by the invention; the data characteristics may include attribute characteristics of the data and characteristics of other data related to data storage and processing purposes; the data history processing strategy refers to a strategy used for history processing of data before the data is stored in a database; the data history processing result refers to a result obtained by performing history processing on the data before the data is stored in a database; the data annotation information refers to annotated information that makes the data distinctive compared to other data, which may be used in subsequent queries or processing.

Taking monitoring big data as an example, assume that each piece of monitoring data stored in the database includes four dimensions: event type, feature, detection strategy and detection result, of which the feature and detection strategy dimensions are of greater reference significance and carry more information. A user can then generate corresponding query commands as needed, for example: query command 1, which performs quantity statistics on all data under the two dimensions of feature and detection strategy; query command 2, which performs quantity statistics on the data whose detection result is "Y" under the two dimensions of feature and detection strategy, that is, the data is first screened for a detection result of "Y", and quantity statistics under the two dimensions are then performed on the screening result; and query command 3, which performs quantity statistics on the data whose event type is "sync" under the two dimensions of feature and detection strategy, that is, the data is first screened for an event type of "sync", and quantity statistics under the two dimensions are then performed on the screening result. In the above query commands, the detection result being "Y" and the event type being "sync" are data screening conditions, and quantity statistics is the data calculation command.
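For illustration, the three query commands can be written down as a screening condition paired with a shared calculation command and then executed one by one, which corresponds to the one-to-one execution the description attributes to the prior art. The field names, sample records and helper name below are hypothetical and chosen only for this sketch.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]

# Hypothetical monitoring records carrying the four dimensions named above.
RECORDS: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "async", "feature": "f2", "strategy": "s1", "result": "Y"},
    {"event_type": "async", "feature": "f2", "strategy": "s1", "result": "N"},
]

# Each query command is a screening condition; None stands for "all data".
# All three share the same calculation command: count by feature and strategy.
QUERIES: Dict[str, Optional[Callable[[Record], bool]]] = {
    "query_1": None,
    "query_2": lambda r: r["result"] == "Y",
    "query_3": lambda r: r["event_type"] == "sync",
}


def count_by_feature_and_strategy(
    records: List[Record],
    condition: Optional[Callable[[Record], bool]],
) -> Counter:
    """Screen the data first, then count per (feature, strategy) pair."""
    selected = records if condition is None else [r for r in records if condition(r)]
    return Counter((r["feature"], r["strategy"]) for r in selected)


# One full scan of the data per query command.
results = {
    name: count_by_feature_and_strategy(RECORDS, condition)
    for name, condition in QUERIES.items()
}
```

Each query command here scans the whole record list on its own, which is the cost the integrated approach aims to avoid.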

In practical applications, even for the same database, different users have different query and calculation purposes and therefore different statistical perspectives, so a plurality of query commands that share the same data calculation command but have different data screening conditions are likely to be generated. In the prior art, these query commands are executed and routed one by one, which occupies a large amount of routing computing resources. To save routing computing resources effectively, the method integrates the query commands. That is, for query commands with the same data calculation command, the data meeting all the data screening conditions in those query commands is first extracted and routed, the data calculation command is then executed on the routed data, and the specific data screening conditions are taken into account again when the data calculation command is executed. In this way, a plurality of single-statement query commands are combined into one single-statement query command, the refined screening conditions are deferred to the data aggregation stage, and a plurality of query calculations are submitted to the computing platform as one computing task. The same data routing logic is shared during the calculation, so that data is routed once and a plurality of results are output. This ensures the accuracy of the query calculations while saving routing computing resources, reducing query calculation processing time and improving query calculation processing efficiency.
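The integration described in this paragraph can be sketched as a single pass that routes each record once and applies the refined screening conditions only at aggregation time. The function name `merged_count`, the field names and the sample records are assumptions made for this example; an in-memory grouping stands in for routing on a real computing platform.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Key = Tuple[str, str]

# Hypothetical monitoring records.
RECORDS: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "async", "feature": "f2", "strategy": "s1", "result": "Y"},
    {"event_type": "async", "feature": "f2", "strategy": "s1", "result": "N"},
]

# Screening conditions of two query commands that share the count command.
CONDITIONS: Dict[str, Callable[[Record], bool]] = {
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
}


def merged_count(records: List[Record],
                 conditions: Dict[str, Callable[[Record], bool]]) -> Dict[str, Counter]:
    """Route once, then produce one count result per screening condition."""
    # Coarse screening plus a single routing pass shared by every query.
    routed: Dict[Key, List[Record]] = defaultdict(list)
    for r in records:
        if any(cond(r) for cond in conditions.values()):
            routed[(r["feature"], r["strategy"])].append(r)
    # Refined screening is deferred to the aggregation stage.
    results: Dict[str, Counter] = {name: Counter() for name in conditions}
    for key, group in routed.items():
        for name, cond in conditions.items():
            matched = sum(1 for r in group if cond(r))
            if matched:
                results[name][key] = matched
    return results


merged = merged_count(RECORDS, CONDITIONS)
```

The record that satisfies neither condition is dropped before routing, and both results come out of the same grouped data.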

Wherein the preset processing comprises one or more of the following processing: data fragmentation, data routing, data aggregation, data statistics, data computation, and the like.

In an optional implementation manner of this embodiment, as shown in fig. 2, the step S102 of acquiring an original data set, and performing preprocessing on the original data set based on the data screening condition set to obtain a preprocessed data set includes the following steps S201 to S203:

in step S201, an original data set is acquired;

in step S202, extracting data in the original data set that meets all data filtering conditions in the data filtering condition set;

in step S203, the extracted data are combined into a preprocessed data set.

In order to simplify the data processing flow, save the routing computation resources, reduce the query computation processing time, and improve the query computation processing efficiency, in this embodiment, the data objects to be processed are first preprocessed to integrate the query commands. Specifically, an original data set is obtained firstly; and then extracting data meeting all data screening conditions in the data screening condition set from the original data set to form a preprocessed data set, wherein the preprocessed data set is used as a data basis for subsequent data processing operation.

In an optional implementation manner of this embodiment, the data filtering condition is related to the data dimension, that is, the data obtained after being filtered by the data filtering condition is data with a certain data dimension characteristic.

Still taking monitoring big data as an example, as described above, the data screening conditions are: the detection result is "Y", and the event type is "sync". In this implementation, the data screening condition set therefore includes these two data screening conditions, and the preprocessed data set obtained through preprocessing is composed of all data satisfying either of the two data screening conditions, that is, the preprocessed data set includes all data whose detection result is "Y" or whose event type is "sync".
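The following sketch contrasts the "or" reading stated in this example with an "and" reading, to show why the preprocessed data set has to be the union: only the union retains every record that each individual screening condition will need later. The `id` field, the helper names and the sample records are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Condition = Callable[[Record], bool]


def preprocess_union(raw: List[Record], conditions: List[Condition]) -> List[Record]:
    """Keep records whose detection result is "Y" OR whose event type is "sync"."""
    return [r for r in raw if any(c(r) for c in conditions)]


def preprocess_intersection(raw: List[Record], conditions: List[Condition]) -> List[Record]:
    """Shown only for contrast: keep records matching every condition."""
    return [r for r in raw if all(c(r) for c in conditions)]


# Hypothetical records; "id" exists only to make the output easy to read.
raw = [
    {"id": "a", "event_type": "sync", "result": "Y"},
    {"id": "b", "event_type": "sync", "result": "N"},
    {"id": "c", "event_type": "async", "result": "Y"},
    {"id": "d", "event_type": "async", "result": "N"},
]
conditions = [
    lambda r: r["result"] == "Y",
    lambda r: r["event_type"] == "sync",
]

union_ids = [r["id"] for r in preprocess_union(raw, conditions)]
intersection_ids = [r["id"] for r in preprocess_intersection(raw, conditions)]
```

With the intersection, record `b` would be missing when the "sync" query is evaluated and record `c` would be missing for the "Y" query, so the per-query counts would be wrong.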

In an optional implementation manner of this embodiment, as shown in fig. 3, the step S103, that is, the step of performing the preset processing on the pre-processing data set, includes the following steps S301 to S303:

in step S301, a preset processing condition is acquired;

in step S302, performing data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments;

in step S303, data aggregation processing is performed on the data fragments.

In this embodiment, the data set obtained through the preprocessing is further processed through data routing and aggregation to obtain a corresponding data processing result. Specifically, firstly, acquiring preset processing conditions; then, carrying out data routing processing on the data in the preprocessed data set according to the preset processing conditions to obtain two or more data fragments; and carrying out data aggregation processing on the data fragments.

Wherein the preset processing condition is related to the data dimension and is used for providing routing basis for data routing. For example, the preset processing condition may be data routing according to two dimensions of a feature and a detection policy.
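As a rough sketch of step S302, routing by the two dimensions can be approximated with hash partitioning: records that share the same feature and detection strategy values are sent to the same fragment. The source does not specify a routing algorithm, so the use of a CRC32 hash, the function name `route` and the sample records are assumptions for illustration only.

```python
import zlib
from typing import Dict, List, Sequence

Record = Dict[str, str]


def route(records: List[Record],
          dimensions: Sequence[str],
          num_fragments: int) -> List[List[Record]]:
    """Assign each record to a fragment based on its routing dimensions."""
    if num_fragments < 2:
        raise ValueError("routing is expected to yield two or more fragments")
    fragments: List[List[Record]] = [[] for _ in range(num_fragments)]
    for r in records:
        # The routing key is built from the dimensions named by the preset
        # processing condition; equal keys always map to the same fragment.
        key = "|".join(r[d] for d in dimensions)
        index = zlib.crc32(key.encode("utf-8")) % num_fragments
        fragments[index].append(r)
    return fragments


# Hypothetical preprocessed data set.
preprocessed = [
    {"feature": "f1", "strategy": "s1", "result": "Y"},
    {"feature": "f1", "strategy": "s1", "result": "N"},
    {"feature": "f2", "strategy": "s1", "result": "Y"},
    {"feature": "f1", "strategy": "s2", "result": "Y"},
]
fragments = route(preprocessed, ("feature", "strategy"), num_fragments=3)
```

A stable hash is used rather than Python's built-in `hash`, whose value for strings can change between interpreter runs. Some fragments may be empty when there are few distinct keys.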

In an optional implementation manner of this embodiment, as shown in FIG. 4, the step S303, that is, the step of performing data aggregation processing on the data fragments, includes the following steps S401 to S403:

in step S401, dividing the data fragments into data sub-fragments corresponding to the data screening conditions according to the data screening conditions;

in step S402, a data aggregation processing command is acquired;

in step S403, performing data aggregation processing on the data sub-fragments according to the data aggregation processing command.

As mentioned above, in order to save data routing resources, the present invention performs data screening preprocessing on the data set according to all the data screening conditions, which is equivalent to integrating the data screening conditions. However, the integration only yields, based on all the data screening conditions, the data on which subsequent data routing and aggregation are performed; it does not reflect the particularity of each individual data screening condition. Therefore, in this embodiment, when the final aggregation processing is performed on the data, the specific data screening conditions are taken into account so as to obtain data processing results matching the data processing commands. Specifically, the data fragments are first divided into data sub-fragments corresponding to the data screening conditions according to the data screening conditions; then a data aggregation processing command is acquired; and finally data aggregation processing is performed on the data sub-fragments according to the data aggregation processing command.

The data aggregation processing command may be, for example, a data calculation command such as a count command and a quantity statistic command, or may be other data aggregation commands, which is not limited in the present invention.
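Steps S401 to S403 might look like the following for a single data fragment, with a count as the data aggregation processing command. The function names, field names and the contents of the fragment are hypothetical; note that one record can belong to more than one sub-fragment when it satisfies several screening conditions.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical contents of one data fragment produced by routing.
fragment: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "async", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f2", "strategy": "s1", "result": "Y"},
]

CONDITIONS: Dict[str, Callable[[Record], bool]] = {
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
}


def split_into_sub_fragments(
    frag: List[Record],
    conditions: Dict[str, Callable[[Record], bool]],
) -> Dict[str, List[Record]]:
    """S401: one sub-fragment per screening condition."""
    return {name: [r for r in frag if cond(r)] for name, cond in conditions.items()}


def count_by_dimensions(sub_fragment: List[Record]) -> Counter:
    """A count command: number of records per (feature, strategy) pair."""
    return Counter((r["feature"], r["strategy"]) for r in sub_fragment)


sub_fragments = split_into_sub_fragments(fragment, CONDITIONS)   # S401
aggregate = count_by_dimensions                                  # S402
results = {name: aggregate(sub) for name, sub in sub_fragments.items()}  # S403
```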

In another optional implementation manner of this embodiment, the execution order of the data splitting and data aggregation processing based on the data filtering condition may be interchanged, that is, as shown in fig. 5, the step S303 of performing data aggregation processing on the data fragments includes the following steps S501 to S503:

in step S501, a data aggregation processing command is acquired;

in step S502, performing data aggregation processing on the data fragments according to the data aggregation processing command;

in step S503, the data fragments subjected to the data aggregation processing are divided into data sub-fragments corresponding to the data screening conditions according to the data screening conditions.

In this embodiment, first, data aggregation processing is performed on the data fragments according to a data aggregation processing command, and then, the data fragments subjected to the data aggregation processing are split based on different data screening conditions, so as to obtain a data processing result matched with the data processing command.
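The source does not say how an already aggregated fragment is divided by screening condition. One possible reading, sketched below purely as an assumption, is to keep the outcome of every screening condition as part of the aggregation key in step S502, so that the partial counts can be redistributed per condition in step S503. All names and sample data are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical contents of one data fragment produced by routing.
fragment: List[Record] = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "async", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f2", "strategy": "s1", "result": "Y"},
]

CONDITIONS: Dict[str, Callable[[Record], bool]] = {
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
}


def aggregate_with_flags(
    frag: List[Record],
    conditions: Dict[str, Callable[[Record], bool]],
) -> Tuple[List[str], Counter]:
    """S502: count per (feature, strategy, condition outcomes)."""
    names = list(conditions)
    counts: Counter = Counter()
    for r in frag:
        flags = tuple(conditions[name](r) for name in names)
        counts[(r["feature"], r["strategy"], flags)] += 1
    return names, counts


def split_aggregated(names: List[str], counts: Counter) -> Dict[str, Counter]:
    """S503: divide the aggregated counts by screening condition."""
    results: Dict[str, Counter] = {name: Counter() for name in names}
    for (feature, strategy, flags), count in counts.items():
        for name, matched in zip(names, flags):
            if matched:
                results[name][(feature, strategy)] += count
    return results


names, partial_counts = aggregate_with_flags(fragment, CONDITIONS)
results = split_aggregated(names, partial_counts)
```

Under this reading, the final per-condition counts are the same as when the fragment is divided first and aggregated afterwards.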

The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention.

Fig. 6 shows a block diagram of a data processing apparatus according to an embodiment of the present invention, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 6, the data processing apparatus includes:

an obtaining module 601 configured to obtain a data screening condition set, where the data screening condition set includes two or more data screening conditions;

a preprocessing module 602 configured to obtain an original data set, and preprocess the original data set based on the data screening condition set to obtain a preprocessed data set, where the data includes one or more data dimensions;

a processing module 603 configured to perform a preset process on the preprocessed data set.

As mentioned above, with the development of network technology, the cloud era has quietly arrived and big data has emerged along with it. On current big data computing platforms, each computation submitted by a user involves a series of operations such as computing task decomposition, resource scheduling, data routing and result combination. Because the amount of data to be processed is generally huge, the required computing resources and time are correspondingly large. This is especially true of the data routing part: because the amount of data is often at the GB or TB level, each data routing operation incurs high IO consumption. How to save computing resources and reduce computing time while still ensuring effective computation is an urgent problem to be solved.

In view of the above problem, in this embodiment, a data processing apparatus is proposed, which integrates the data screening conditions, preprocesses the original data set in advance based on the integration result, and then performs subsequent processing operations according to the preprocessed data set. According to the technical scheme, a plurality of data processing tasks can be integrated into one data processing task to be processed, so that on the premise of ensuring the validity of data processing, the computing resources are saved, and the computing time is reduced.

The data screening condition refers to a condition and a requirement for screening data, which are required in a normal data processing process. In the structured query processing process based on big data, a query command usually carries a data screening condition and a data calculation command, which can embody the query and calculation purposes, wherein the data screening condition refers to which data a user wants to query, and the data calculation command refers to which kind or kinds of calculation results based on the data the user wants to obtain.

The data dimension refers to information which can be used as a query condition by a user during query and is related to characteristics, sources and the like of data. In an optional implementation manner of this embodiment, the data dimension includes one or more of the following dimensions: data type, data characteristics, data history processing strategy, data history processing result, data labeling information and the like, wherein the data type is used for representing the category to which the data belongs, it is noted that the type definitions of the data stored in different databases may be different, and the specific data type definitions are related to the purposes of data storage and processing, and the invention is not particularly limited thereto; the data characteristics may include attribute characteristics of the data and characteristics of other data related to data storage and processing purposes; the data history processing strategy refers to a strategy used for history processing of data before the data is stored in a database; the data history processing result refers to a result obtained by performing history processing on the data before the data is stored in a database; the data annotation information refers to annotated information that makes the data distinctive compared to other data, and is likely to be used in subsequent queries or processing.

Taking monitoring big data as an example, assume that each piece of monitoring data stored in the database includes four dimensions: event type, characteristics, detection strategy, and detection result. The characteristics and detection strategy dimensions carry relatively important reference meaning and a relatively large amount of information, so a user may generate query commands such as the following according to need. Query command 1: perform quantity statistics on all data under the two dimensions of characteristics and detection strategy. Query command 2: perform quantity statistics under the two dimensions of characteristics and detection strategy on data whose detection result is "Y"; that is, first screen the data for a detection result of "Y", and then perform quantity statistics on the screening result under the two dimensions. Query command 3: perform quantity statistics under the two dimensions of characteristics and detection strategy on data whose event type is "sync"; that is, first screen the data for an event type of "sync", and then perform quantity statistics on the screening result under the two dimensions. In the above query commands, the detection result being "Y" and the event type being "sync" are data screening conditions, and quantity statistics is the data calculation command.
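To make the three query commands concrete, the following is a minimal sketch in which each command is executed on its own, as in the one-to-one approach. The record layout, the field names (`event_type`, `feature`, `strategy`, `result`) and the sample values are assumptions chosen for illustration and do not come from the disclosure.

```python
from collections import Counter

# Hypothetical monitoring records; field names and values are illustrative only.
RECORDS = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s2", "result": "N"},
    {"event_type": "async", "feature": "f2", "strategy": "s1", "result": "Y"},
    {"event_type": "async", "feature": "f1", "strategy": "s1", "result": "N"},
]


def count_by_feature_and_strategy(records, screening=None):
    """Data calculation command: count records per (feature, strategy).

    `screening` is an optional data screening condition applied first.
    """
    selected = records if screening is None else [r for r in records if screening(r)]
    return Counter((r["feature"], r["strategy"]) for r in selected)


# Query command 1: no screening condition.
query_1 = count_by_feature_and_strategy(RECORDS)
# Query command 2: detection result is "Y".
query_2 = count_by_feature_and_strategy(RECORDS, lambda r: r["result"] == "Y")
# Query command 3: event type is "sync".
query_3 = count_by_feature_and_strategy(RECORDS, lambda r: r["event_type"] == "sync")
```

All three calls share the same calculation and differ only in the screening condition, which is the situation the integration described in this embodiment targets.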

In practical applications, even for the same database, different users have different query and calculation purposes and therefore different statistical perspectives, so multiple query commands with the same data calculation command but different data screening conditions are likely to be generated. In the prior art, each such query command is executed and routed separately, which occupies a large amount of routing computing resources. To save routing computing resources, the present disclosure integrates the query commands. For query commands with the same data calculation command, the data meeting all data screening conditions in the query commands is first extracted and routed; the data calculation command is then executed on the routed data, and the specific data screening conditions are considered again at that point. In this way, multiple single-statement query commands are combined into one, the refined screening conditions are deferred to the data aggregation stage, and multiple query calculations are submitted to the computing platform as one computing task that shares the same data routing logic. The data is routed once and multiple results are output, which preserves the accuracy of the query calculations while saving routing computing resources, reducing query processing time, and improving query processing efficiency.

In an optional implementation manner of this embodiment, as shown in fig. 7, the preprocessing module 602 includes:

a first obtaining sub-module 701 configured to obtain an original data set;

an extraction sub-module 702 configured to extract data in the original data set that meets all data filtering conditions in the data filtering condition set;

a combining submodule 703 configured to combine the extracted data into a preprocessed data set.

In order to simplify the data processing flow, save the routing computation resource, reduce the query computation processing time, and improve the query computation processing efficiency, in this embodiment, the data object to be processed is first preprocessed to integrate the query command. Specifically, the first obtaining sub-module 701 obtains an original data set; the extraction sub-module 702 extracts data in the original data set that satisfies all the data screening conditions in the data screening condition set, and the combination sub-module 703 forms a preprocessed data set, which serves as a data basis for subsequent data processing operations.

Still taking monitoring big data as an example, as described above, the data screening conditions include the detection result being "Y" and the event type being "sync". In this implementation, the data screening condition set therefore includes these two data screening conditions, and the preprocessed data set obtained through preprocessing is composed of all data selected by the two data screening conditions, i.e., the preprocessed data set includes all data whose detection result is "Y" or whose event type is "sync".
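The preprocessing step can be sketched as a single filter over the original data set. This sketch follows the example above and keeps a record when it satisfies at least one condition in the set; the field names, the sample records and the function name `preprocess` are hypothetical.

```python
# Hypothetical original data set; field names and values are illustrative only.
RAW_DATA = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s2", "result": "N"},
    {"event_type": "async", "feature": "f2", "strategy": "s1", "result": "Y"},
    {"event_type": "async", "feature": "f1", "strategy": "s1", "result": "N"},
]

# Data screening condition set with two data screening conditions.
SCREENING_SET = [
    lambda r: r["result"] == "Y",
    lambda r: r["event_type"] == "sync",
]


def preprocess(raw_data, screening_set):
    """Extract the records selected by any condition and combine them into one set."""
    return [r for r in raw_data if any(cond(r) for cond in screening_set)]


preprocessed = preprocess(RAW_DATA, SCREENING_SET)
```

Only the last sample record (event type "async", detection result "N") is dropped, because no condition in the set selects it.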

In an optional implementation manner of this embodiment, as shown in fig. 8, the processing module 603 includes:

a second obtaining sub-module 801 configured to obtain a preset processing condition;

a first processing sub-module 802, configured to perform data routing processing on the data in the preprocessed data set according to the preset processing condition, so as to obtain two or more data fragments;

a second processing sub-module 803 configured to perform data aggregation processing on the data fragments.

In this embodiment, the data set obtained through the preprocessing is further processed through data routing and aggregation to obtain a corresponding data processing result. Specifically, the second obtaining sub-module 801 obtains a preset processing condition; the first processing sub-module 802 performs data routing processing on the data in the preprocessed data set according to the preset processing condition to obtain two or more data fragments; the second processing sub-module 803 performs data aggregation processing on the data fragments.

Wherein the preset processing condition is related to the data dimension and is used for providing routing basis for data routing. For example, the preset processing condition may be data routing according to two dimensions, namely, a feature dimension and a detection policy dimension.
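As an illustration of routing by data dimensions, the sketch below assigns each record to a fragment by hashing the values of the routing dimensions. Hash partitioning with `zlib.crc32` is an assumption made for this example; the disclosure does not specify a routing algorithm, and the field names and the function name `route` are hypothetical.

```python
import zlib
from collections import defaultdict

# Hypothetical preprocessed data set; field names and values are illustrative only.
PREPROCESSED = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s2", "result": "N"},
    {"event_type": "async", "feature": "f2", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
]


def route(records, dimensions, num_fragments):
    """Route each record to a fragment chosen from its routing-dimension values."""
    fragments = defaultdict(list)
    for record in records:
        key = "|".join(str(record[d]) for d in dimensions)
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(key.encode("utf-8")) % num_fragments
        fragments[index].append(record)
    return dict(fragments)


# Preset processing condition: route by the feature and detection strategy dimensions.
fragments = route(PREPROCESSED, ("feature", "strategy"), num_fragments=4)
```

Records that share the same feature and detection strategy always land in the same fragment, so each fragment can later be aggregated independently. With a small data set, fewer than `num_fragments` fragments may be non-empty.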

In an optional implementation manner of this embodiment, as shown in fig. 9, the second processing sub-module 803 includes:

a first fragmentation sub-module 901 configured to divide the data fragments into data sub-fragments corresponding to the data screening conditions according to the data screening conditions;

a third obtaining submodule 902 configured to obtain a data aggregation processing command;

a third processing sub-module 903 configured to perform data aggregation processing on the data sub-fragments according to the data aggregation processing command.

As mentioned above, to save data routing resources, the present disclosure performs data screening preprocessing on the data set according to all the data screening conditions, which is equivalent to integrating the data screening conditions. However, this integration only yields the data basis for the subsequent data routing and aggregation processing; it does not reflect the particularity of the individual data screening conditions. Therefore, in this embodiment, the specific data screening conditions are taken into account when the final aggregation processing is performed on the data, so as to obtain a data processing result matching the data processing command. Specifically, the first fragmentation sub-module 901 divides the data fragments into data sub-fragments corresponding to the data screening conditions according to the data screening conditions; the third obtaining sub-module 902 obtains a data aggregation processing command; and the third processing sub-module 903 performs data aggregation processing on the data sub-fragments according to the data aggregation processing command.
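The split-then-aggregate order can be sketched for a single data fragment as follows. Counting is used as the data aggregation processing command, in line with the quantity statistics example; the condition names, field names and sample records are hypothetical.

```python
from collections import Counter

# One hypothetical data fragment produced by routing; values are illustrative only.
FRAGMENT = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "async", "feature": "f1", "strategy": "s1", "result": "Y"},
]

# Data screening conditions, keyed by a hypothetical name for each query.
SCREENING = {
    "result_is_Y": lambda r: r["result"] == "Y",
    "event_is_sync": lambda r: r["event_type"] == "sync",
}


def split_then_aggregate(fragment, screening):
    """Divide a fragment into per-condition sub-fragments, then count each one."""
    sub_fragments = {
        name: [r for r in fragment if cond(r)] for name, cond in screening.items()
    }
    return {
        name: Counter((r["feature"], r["strategy"]) for r in sub)
        for name, sub in sub_fragments.items()
    }


results = split_then_aggregate(FRAGMENT, SCREENING)
```

A record that satisfies more than one condition appears in more than one sub-fragment, so each query still receives its full result even though the data was routed only once.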

In another optional implementation manner of this embodiment, the execution order of the data splitting and data aggregating processes based on the data filtering condition may be interchanged, that is, as shown in fig. 10, the second processing sub-module 803 includes:

a fourth obtaining sub-module 1001 configured to obtain a data aggregation processing command;

a fourth processing sub-module 1002 configured to perform data aggregation processing on the data fragments according to the data aggregation processing command;

a second fragmentation sub-module 1003 configured to divide the data fragments subjected to the data aggregation processing into data sub-fragments corresponding to the data screening conditions according to the data screening conditions.

In this embodiment, the fourth processing sub-module 1002 performs data aggregation processing on the data fragments according to the data aggregation processing command, and the second fragmentation sub-module 1003 splits the aggregated data fragments based on the different data screening conditions to obtain a data processing result matching the data processing command.
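The disclosure does not detail how aggregated data is split afterwards. One possible reading, sketched below as an assumption, is to aggregate once while keeping the screened fields in the grouping key, and then to fold the partial counts into one result per data screening condition. This works for counting and similar decomposable aggregations; all names and sample values are hypothetical.

```python
from collections import Counter

# One hypothetical data fragment produced by routing; values are illustrative only.
FRAGMENT = [
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "Y"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "sync", "feature": "f1", "strategy": "s1", "result": "N"},
    {"event_type": "async", "feature": "f1", "strategy": "s1", "result": "Y"},
]

SCREENING = {
    "result_is_Y": lambda g: g["result"] == "Y",
    "event_is_sync": lambda g: g["event_type"] == "sync",
}


def aggregate_then_split(fragment, screening):
    """Count once per fine-grained group, then split the counts per condition."""
    # Aggregation: the screened fields stay in the key so the split remains possible.
    partial = Counter(
        (r["feature"], r["strategy"], r["result"], r["event_type"]) for r in fragment
    )
    results = {name: Counter() for name in screening}
    # Split: assign each partial count to every condition its group satisfies.
    for (feature, strategy, result, event_type), count in partial.items():
        group = {"result": result, "event_type": event_type}
        for name, cond in screening.items():
            if cond(group):
                results[name][(feature, strategy)] += count
    return results


results = aggregate_then_split(FRAGMENT, SCREENING)
```

For this sample fragment the results equal those of dividing into sub-fragments first and aggregating second, which is consistent with the statement that the two orders may be interchanged.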

Fig. 11 is a block diagram illustrating a structure of an electronic device according to an embodiment of the present invention. As shown in fig. 11, the electronic device 1100 includes a memory 1101 and a processor 1102, wherein

the memory 1101 is used to store one or more computer instructions that are executed by the processor 1102 to implement any of the method steps described above.

Fig. 12 is a schematic block diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the present invention.

As shown in fig. 12, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can execute various processes in the above-described embodiments according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the system 1200 are also stored. The CPU 1201, ROM 1202, and RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.

The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as necessary, so that a computer program read out therefrom is installed into the storage section 1208 as necessary.

In particular, according to an embodiment of the present invention, the method described above may be implemented as a computer software program. For example, embodiments of the invention include a computer program product comprising a computer program tangibly embodied on a medium readable by a computer, the computer program comprising program code for performing the data processing method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present invention may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.

As another aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium may be a computer-readable storage medium included in the apparatus in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present invention.

The foregoing description is only exemplary of the preferred embodiments of the invention and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention according to the embodiments of the present invention is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) the embodiments of the present invention.

Claims

1. A data processing method, comprising:

presetting the preprocessing data set;

the pre-processing data set includes:

and carrying out data aggregation processing on the data fragments.

2. The method of claim 1, wherein the obtaining a raw data set and preprocessing the raw data set based on the set of data screening conditions to obtain a preprocessed data set comprises:

acquiring an original data set;

and combining the extracted data into a preprocessed data set.

3. The method according to claim 1, wherein the performing data aggregation processing on the data slice includes:

acquiring a data aggregation processing command;

and performing data aggregation processing on the data sub-fragments according to the data aggregation processing command.

4. The method according to claim 1, wherein the performing data aggregation processing on the data slice includes:

acquiring a data aggregation processing command;

5. A data processing apparatus, comprising:

the processing module is configured to carry out preset processing on the preprocessed data set;

the processing module comprises:

6. The data processing apparatus of claim 5, wherein the pre-processing module comprises:

a first obtaining sub-module configured to obtain an original data set;

the extraction sub-module is configured to extract data meeting all data screening conditions in the data screening condition set from the original data set;

7. The data processing apparatus of claim 5, wherein the second processing submodule comprises:

a third obtaining sub-module configured to obtain a data aggregation processing command;

8. The data processing apparatus of claim 5, wherein the second processing submodule comprises:

and the second fragmentation sub-module is configured to divide the data fragmentation subjected to data aggregation processing into data sub-fragmentation corresponding to the data screening condition according to the data screening condition.

9. An electronic device comprising a memory and a processor; wherein

the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of any of claims 1-4.

10. A computer-readable storage medium having stored thereon computer instructions, characterized in that the computer instructions, when executed by a processor, carry out the method steps of any of claims 1-4.