CN113411365A

CN113411365A - Data processing method and device

Info

Publication number: CN113411365A
Application number: CN202010184728.XA
Authority: CN
Inventors: 蒲承祖; 刘毅; 刘红梅; 姜良军; 袁鲲; 邱伟娜; 张康; 孙善勇
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Shandong Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Shandong Co Ltd
Priority date: 2020-03-17
Filing date: 2020-03-17
Publication date: 2021-09-17

Abstract

The invention relates to a data processing method and a data processing device. The method comprises the following steps: receiving a data message sent by a data source system, wherein the data source system has stored the data in a server, and the data message carries the address information of that server and data information describing the data; determining, according to the data information, a task flow corresponding to the data, and writing the address information into a scheduling queue corresponding to the task flow; extracting the address information in the order in which it is arranged in the scheduling queue; downloading the data from the server according to the address information; and loading and aggregating the downloaded data of the same task flow. The technical scheme provided by the embodiments of the invention can improve the efficiency and accuracy of data acquisition, loading and aggregation.

Description

Data processing method and device

[ technical field ]

The present invention relates to the field of communications technologies, and in particular, to a data processing method and apparatus, and an electronic device.

[ background of the invention ]

At present, 4G services are growing explosively. With the arrival of the 5G era, the network scale keeps expanding, the network structure is becoming more complex, network services are increasingly diversified, and the volume of network indicator data is growing exponentially; in addition, the services in each scenario show a tidal effect, with a large number of users and heavy traffic concentrated in short periods. Against this background, the real-time performance and accuracy of data analysis are increasingly important.

Existing data analysis systems usually process data through timed tasks, for example the process of extracting, transforming and loading data from a source end to a destination end, referred to as ETL (extract, transform, load) for short. In this processing mode, the programs for the three links of data acquisition, loading and aggregation are independent of one another and depend on agreed scheduling times, so no task flow is formed in a real sense. As data sources are added, the data volume keeps growing, and service changes require the scheduling times to be adjusted repeatedly; as a result, the acquisition, loading and aggregation processes become complicated and error-prone.

[ summary of the invention ]

In view of this, embodiments of the present invention provide a data processing method and apparatus to improve the efficiency and accuracy of data acquisition, loading and aggregation.

In order to achieve the above object, in a first aspect, the present invention provides a data processing method, including:

receiving a data message sent by a data source system, wherein the data message carries the address information of a server storing the data and data information of the data, the data having been stored in the server by the data source system;

determining a task flow corresponding to the data according to the data information, and writing the address information into a scheduling queue corresponding to the task flow;

extracting the address information according to the arrangement sequence of the address information in the scheduling queue;

downloading the data from the server according to the address information;

and loading and aggregating the downloaded data of the same task flow.

With reference to the first aspect, in one possible implementation, the method further includes: determining whether the data message carries a supplementary-collection identifier; if it does, after the data is downloaded from the server, searching the previously downloaded data for data to be replaced, namely data whose data information is the same as the data information carried in the data message; and deleting the data to be replaced.

With reference to the first aspect, in a possible implementation manner, the determining, according to the data information, a task flow corresponding to the data, and writing the address information into a scheduling queue corresponding to the task flow includes: determining, according to the data information, whether a task flow corresponding to the data exists; if the task flow does not exist, establishing a task flow corresponding to the data, and writing the address information into the scheduling queue corresponding to the task flow; and if it exists, writing the address information into the scheduling queue corresponding to the task flow.

With reference to the first aspect, in one possible implementation manner, the address information includes: the IP address and port number of the server, the user name and password used to access the server, and the file name and file path under which the data is stored.

With reference to the first aspect, in one possible implementation manner, the data information includes: data detail information, data start time and data end time.

In order to achieve the above object, in a second aspect, the present invention provides a data processing apparatus comprising:

a message processing module, configured to receive a data message sent by a data source system, wherein the data message carries the address information of a server storing the data and data information of the data, the data having been stored in the server by the data source system;

a scheduling module, configured to determine a task flow corresponding to the data according to the data information and write the address information into a scheduling queue corresponding to the task flow;

a data processing module, configured to extract the address information according to the arrangement sequence of the address information in the scheduling queue and download the data from the server according to the address information;

the data processing module being further configured to load and aggregate the downloaded data of the same task flow.

With reference to the second aspect, in a possible implementation manner, the data processing module is further configured to: determine whether the data message carries a supplementary-collection identifier; if it does, after the data is downloaded from the server, search the previously downloaded data for data to be replaced, namely data whose data information is the same as the data information carried in the data message; and delete the data to be replaced.

With reference to the second aspect, in a possible implementation manner, the scheduling module is specifically configured to: determine, according to the data information, whether a task flow corresponding to the data exists; if the task flow does not exist, establish a task flow corresponding to the data and write the address information into the scheduling queue corresponding to the task flow; and if it exists, write the address information into the scheduling queue corresponding to the task flow.

In order to achieve the above object, in a third aspect, the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the above-described data processing method.

In order to achieve the above object, in a fourth aspect, the present invention provides a computer device comprising: at least one processor; and at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the data processing method described above.

In this scheme, the three previously independent links of data acquisition, loading and aggregation are unified through the task flow, which effectively improves the efficiency and accuracy of data acquisition, loading and aggregation.

[ description of the drawings ]

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of another data processing method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of an optional computer device according to an embodiment of the present invention.

[ detailed description of the embodiments ]

For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.

It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

It should be understood that although the terms first, second, third, etc. may be used to describe the terminals in the embodiments of the present invention, the terminals should not be limited by these terms. These terms are only used to distinguish one terminal from another. For example, a first terminal may also be referred to as a second terminal, and similarly, a second terminal may also be referred to as a first terminal, without departing from the scope of embodiments of the present invention.

The word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined", "in response to determining", "when detected (a stated condition or event)" or "in response to detecting (a stated condition or event)", depending on the context.

Fig. 1 shows a data processing method according to an embodiment of the present invention, which can be applied to a data processing apparatus. The data processing apparatus may include a message processing module, a scheduling module and a data processing module.

As shown in fig. 1, the method comprises:

Step 101, receiving a data message sent by a data source system.

The data message carries the address information of the server storing the data and data information of the data.

The server storing the data may specifically be an FTP server, and its address information may specifically include: the IP address and port number of the server, the user name and password for accessing the server, and the file name and file path under which the data is stored. The data information may specifically include: data details (e.g., data source information, data type, data content), the data start time and the data end time.
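The message contents listed above can be modelled as typed records. The following is a minimal Python sketch, assuming the data message is delivered as JSON; the JSON layout and every key and class name (`address`, `info`, `AddressInfo`, `DataInfo`, `parse_data_message`, and so on) are hypothetical, because the actual message format is not given here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressInfo:
    """Where the data source system left the data (hypothetical field names)."""
    ip: str
    port: int
    user: str
    password: str
    file_name: str
    file_path: str


@dataclass(frozen=True)
class DataInfo:
    """Description of the data: detail information plus its time window."""
    source: str       # data source information (part of the data details)
    data_type: str    # data type (part of the data details)
    start_time: str   # data start time
    end_time: str     # data end time


@dataclass(frozen=True)
class DataMessage:
    address: AddressInfo
    info: DataInfo


def parse_data_message(raw: str) -> DataMessage:
    """Parse a JSON-encoded data message (hypothetical layout) into typed records."""
    obj = json.loads(raw)
    addr, info = obj["address"], obj["info"]
    return DataMessage(
        address=AddressInfo(
            ip=addr["ip"],
            port=int(addr["port"]),  # normalize the port to an integer
            user=addr["user"],
            password=addr["password"],
            file_name=addr["file_name"],
            file_path=addr["file_path"],
        ),
        info=DataInfo(
            source=info["source"],
            data_type=info["data_type"],
            start_time=info["start_time"],
            end_time=info["end_time"],
        ),
    )
```

Freezing the dataclasses makes `DataInfo` hashable, so it can serve as a lookup key, for example when matching records that describe the same data.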

In one example, the data source system collects data and stores the collected data in the server. It then places the address information of that server and the data information of the data in a data message and sends the data message to the data processing apparatus, where the message processing module receives it.

Step 102, determining a task flow corresponding to the data according to the data information, and writing the address information into a scheduling queue corresponding to the task flow.

In one example, data from the same data source corresponds to the same task flow. The scheduling module of the data processing apparatus may determine, from the data detail information contained in the data information, which data source sent the data, and assign all data sent by that data source to the same task flow. The scheduling module may also convert the address information into a format recognizable by the scheduling queue and write the converted address information into the scheduling queue of the task flow to which the data is assigned. With this scheduling mode, multiple task flows can be established at the same time so that data from multiple data sources is processed concurrently, which improves data processing efficiency.
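The assignment rule described above (one task flow per data source, one scheduling queue per task flow) can be sketched as follows. This is an illustrative Python sketch only: using the `source` field as the task-flow key is an assumption, and the normalized tuple produced by the hypothetical `to_queue_entry` merely stands in for the unspecified format recognizable by the scheduling queue.

```python
import posixpath
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Keeps one FIFO scheduling queue per task flow (hypothetical names)."""
    queues: dict = field(default_factory=dict)  # task-flow id -> deque of queue entries

    @staticmethod
    def to_queue_entry(address: dict) -> tuple:
        # Stand-in for the format conversion: a normalized tuple with the port as an
        # integer and the file path and file name joined into one remote path.
        remote = posixpath.join(address["file_path"], address["file_name"])
        return (address["ip"], int(address["port"]), address["user"], address["password"], remote)

    def submit(self, data_info: dict, address: dict) -> str:
        # Data from the same data source belongs to the same task flow.
        flow_id = data_info["source"]
        # Create the flow's queue on first use, then append in arrival order.
        self.queues.setdefault(flow_id, deque()).append(self.to_queue_entry(address))
        return flow_id
```

Because each data source gets its own queue, queues for different sources can be drained independently, which is what allows several task flows to run at the same time.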

Step 103, extracting the address information of the data according to the arrangement sequence of the address information in the scheduling queue.

In one example, the data processing module of the data processing apparatus extracts the address information from the scheduling queue one entry at a time, in the order in which it is arranged, and executes the next step once the address information of the data has been extracted.

It should be noted that if multiple scheduling queues exist at the same time, the data processing module may take address information from each scheduling queue in turn.
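One way to visit several scheduling queues in turn is a round-robin over the queues, with each queue still drained in FIFO order. The Python generator below is a sketch under that assumption; `drain_round_robin` is a hypothetical name, and the text does not prescribe this particular interleaving.

```python
from collections import deque
from typing import Iterator


def drain_round_robin(queues: dict[str, deque]) -> Iterator[tuple[str, object]]:
    """Yield (flow_id, address) pairs, taking one entry from each non-empty queue per round.

    Within one queue, entries come out in arrival (FIFO) order, as step 103 requires.
    The round-robin order across queues is an assumption.
    """
    while any(queues.values()):
        for flow_id, queue in list(queues.items()):
            if queue:
                yield flow_id, queue.popleft()
```

Interleaving the queues keeps a task flow with a long backlog from starving the others, while each flow's own ordering is preserved.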

Step 104, downloading the data from the server according to the address information.

As described in step 101, the data is stored in the server; the data processing module of the data processing apparatus can therefore access that server according to the address information and download the data from it to local storage.
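The download step can be sketched with Python's standard `ftplib`, assuming the server is reached over FTP (the address fields, IP address, port, user name, password, file path and file name, are exactly what an FTP login and retrieval need). The `download` function, the `ftp_factory` parameter and the dictionary keys are hypothetical; the factory is injectable only so that the logic can be exercised without a live server.

```python
import ftplib
import os
import posixpath


def download(address: dict, local_dir: str, ftp_factory=ftplib.FTP) -> str:
    """Fetch the file described by the address information; return the local path.

    Assumes the server speaks FTP. `ftp_factory` defaults to ftplib.FTP and can be
    replaced with a stub for offline testing.
    """
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, address["file_name"])
    ftp = ftp_factory()
    try:
        ftp.connect(address["ip"], int(address["port"]))
        ftp.login(address["user"], address["password"])
        remote = posixpath.join(address["file_path"], address["file_name"])
        with open(local_path, "wb") as fh:
            # retrbinary streams the file in blocks and hands each block to fh.write
            ftp.retrbinary(f"RETR {remote}", fh.write)
    finally:
        ftp.close()
    return local_path
```

A production version would typically also send `QUIT` via `ftp.quit()` on success and catch `ftplib.all_errors` to retry or report failures; those details are left out of the sketch.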

Step 105, loading and aggregating the downloaded data of the same task flow.

After downloading the data of a task flow, the data processing module of the data processing apparatus loads and aggregates the data of that task flow, completing the whole data processing process.
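Loading and aggregation depend on the data type, which the text leaves open. As a purely illustrative Python sketch, assume each downloaded file of a task flow is a CSV of `key,value` rows and that aggregation means summing the values per key; both assumptions, and the name `load_and_aggregate`, are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable


def load_and_aggregate(paths: Iterable[str]) -> dict[str, float]:
    """Load every downloaded file of one task flow and aggregate it.

    Hypothetical file layout: CSV rows of `key,value`; aggregation = sum per key.
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for path in paths:                      # loading: read each file of the flow
        with open(path, newline="") as fh:
            for row in csv.reader(fh):
                if not row:
                    continue                # ignore blank lines
                key, value = row
                totals[key] += float(value)  # aggregation: combine rows sharing a key
    return dict(totals)
```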

With the data processing method provided by this embodiment of the invention, the three previously independent links of data acquisition, loading and aggregation are unified through the task flow, which effectively improves the efficiency and accuracy of data acquisition, loading and aggregation.

Fig. 2 is a schematic flow chart of another data processing method according to an embodiment of the present invention. This method can be applied to a data processing apparatus, which may include a message processing module, a scheduling module and a data processing module.

As shown in fig. 2, the data processing method of the present embodiment may include:

Step 201, receiving a data message sent by a data source system.

The data message carries an identifier indicating whether the data is supplementary-collection data, as well as the address information of the server storing the data and data information of the data.

In one specific example, the format of the data message is as follows:

Step 202, determining, according to the data information, whether a task flow corresponding to the data exists. If not, go to steps 203 and 204; if so, go to step 204.

In one example, data from the same data source corresponds to the same task flow. The scheduling module of the data processing apparatus may determine, from the data detail information contained in the data information, which data source sent the data, and assign all data sent by that data source to the same task flow. With this scheduling mode, multiple task flows can be established at the same time so that data from multiple data sources is processed concurrently, which improves data processing efficiency.

Step 203, establishing a task flow corresponding to the data, and writing the address information into the scheduling queue corresponding to the task flow.

When establishing a task flow, the scheduling module may decide, according to a preset scheduling strategy, whether to establish it immediately. For example, if the number of current task flows exceeds a preset threshold, the scheduling module judges whether the data is important: if it is, the task flow corresponding to the data is established immediately; if not, the task flow corresponding to the data is established only after a current task flow has finished. This optimizes the processing order and reduces the processing load.
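The branching of steps 202 to 204, together with the admission rule above, can be sketched as a small Python class. The threshold comparison follows the text (creation is restricted once the number of task flows exceeds the threshold), while the class, method and parameter names (`FlowManager`, `submit`, `finish`, `threshold`, `important`) and the pending list that holds deferred data are assumptions for illustration.

```python
from collections import deque


class FlowManager:
    """Steps 202-204 plus the threshold-based admission rule (hypothetical sketch)."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.queues = {}        # active task flows: flow id -> scheduling queue
        self.pending = deque()  # (flow id, address) whose task-flow creation is deferred

    def submit(self, flow_id, address, important=False):
        if flow_id in self.queues:
            # Step 202 found the task flow: only step 204 is needed.
            self.queues[flow_id].append(address)
            return "queued"
        if len(self.queues) <= self.threshold or important:
            # Step 203: create the task flow now, then write the address (step 204).
            self.queues[flow_id] = deque([address])
            return "created"
        # Too many task flows and the data is not important: create the flow later.
        self.pending.append((flow_id, address))
        return "deferred"

    def finish(self, flow_id):
        """Retire a finished task flow and admit deferred data while capacity allows."""
        self.queues.pop(flow_id, None)
        while self.pending and len(self.queues) <= self.threshold:
            self.submit(*self.pending.popleft())
```

Deferred entries are admitted in arrival order, so a non-important data source is delayed rather than dropped.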

Step 204, writing the address information into the scheduling queue corresponding to the task flow.

The scheduling module may also convert the address information into a format recognizable by the scheduling queue, and write the address information after the format conversion into the scheduling queue corresponding to the task flow to which the data is allocated.

Step 205, extracting the address information of the data according to the arrangement sequence of the address information in the scheduling queue.

Step 206, downloading the data from the server according to the address information.

As described in step 201, the data is stored in the server; the data processing module of the data processing apparatus can therefore access that server according to the address information and download the data from it to local storage.

Step 207, determining whether the data message carries a supplementary-collection identifier.

The data processing module of the data processing apparatus may further determine, according to the identifier carried in the data message that indicates whether the data is supplementary-collection data, whether the data corresponding to the data message is supplementary-collection data, and execute the corresponding steps. In one example, if the identifier is 1, the data message is considered to carry the supplementary-collection identifier and the corresponding data is supplementary-collection data, so steps 208 and 209 are executed; if the identifier is 0, the data message is considered not to carry the supplementary-collection identifier and the corresponding data is original data, so step 209 is executed.

Step 208, searching the downloaded data for data to be replaced, namely data whose data information is the same as the data information carried in the data message, and deleting the data to be replaced.

Because the data corresponding to the data information is supplementary-collection data, the data processing module needs to delete the original data that it supersedes, so as to avoid processing errors.
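Steps 207 and 208 can be sketched as follows, assuming downloaded data is held in memory as `(data information, row)` pairs and that "the same data information" means equality of the data source and the start and end times. The `DataInfo` record and the `apply_download` function are illustrative names; the integer flag convention (1 for supplementary-collection data, 0 for original data) follows the example given for step 207.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataInfo:
    """Hypothetical subset of the data information used for matching."""
    source: str
    start_time: str
    end_time: str


def apply_download(store, info, rows, supplement_flag):
    """Return a new store with freshly downloaded rows added.

    store: list of (DataInfo, row) pairs already downloaded.
    If supplement_flag is 1 (step 207), pairs whose data information equals `info`
    are deleted first (step 208), so the supplementary data replaces them.
    """
    if supplement_flag == 1:
        kept = [(i, r) for i, r in store if i != info]  # delete the data to be replaced
    else:
        kept = list(store)                               # original data: keep everything
    kept.extend((info, r) for r in rows)
    return kept
```

Matching on the full data information, rather than on the data source alone, limits the deletion to the time window being re-collected.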

Step 209, loading and aggregating the downloaded data of the same task flow.

With the data processing method provided by this embodiment of the invention, the three previously independent links of data acquisition, loading and aggregation are unified through the task flow, which effectively improves the efficiency and accuracy of data acquisition, loading and aggregation. When data needs to be collected again, the data source system prepares the supplementary-collection data and triggers the subsequent flow by sending a data message carrying the supplementary-collection identifier. The whole process therefore requires no manual intervention, and every service flow related to the supplementary-collection data is processed along its corresponding logic chain, avoiding the omissions that can occur with manual handling. Meanwhile, because the data message records detailed information about the data to be processed, only the problematic data needs to be reprocessed, which reduces resource consumption.

As shown in fig. 3, an embodiment of the present invention provides a data processing apparatus, where the data processing apparatus of this embodiment may include: a message processing module 301, a scheduling module 302 and a data processing module 303.

A message processing module 301, configured to receive a data message sent by a data source system, where the data message carries the address information of a server storing the data and data information of the data; the data source system stores the data in the server.

In one example, the data source system collects data, stores the collected data in a server, places the address information of that server and the data information of the data in a data message, and sends the data message to the data processing apparatus, where the message processing module 301 receives it.

The scheduling module 302 is configured to determine a task flow corresponding to the data according to the data information, and write the address information into a scheduling queue corresponding to the task flow.

In one example, data from the same data source corresponds to the same task flow. The scheduling module 302 of the data processing apparatus may determine, from the data detail information contained in the data information, which data source sent the data, and assign all data sent by that data source to the same task flow. The scheduling module 302 may also convert the address information into a format recognizable by the scheduling queue and write the converted address information into the scheduling queue of the task flow to which the data is assigned. With this scheduling mode, multiple task flows can be established at the same time so that data from multiple data sources is processed concurrently, which improves data processing efficiency.

The data processing module 303 is configured to extract the address information according to the arrangement sequence of the address information in the scheduling queue, and download the data from the server according to the address information.

In one example, the data processing module 303 of the data processing apparatus sequentially extracts the address information in the scheduling queue according to the order of the address information in the scheduling queue, and downloads the data from the server according to the address information after extracting the address information of the data.

It should be noted that if multiple scheduling queues exist at the same time, the data processing module 303 may retrieve address information from each scheduling queue in turn.

The data processing module 303 is further configured to load and aggregate the downloaded data of the same task flow.

After the data of the same task flow is downloaded, the data processing module 303 of the data processing apparatus loads and aggregates the data of the same task flow, thereby completing the whole data processing process.
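The division of work among modules 301, 302 and 303 can be sketched as three cooperating Python classes. This is a simplified, hypothetical illustration: the dictionary message layout, the use of the `source` field as the task-flow key, and the injected `fetch` callable (standing in for the download from the server and returning `(key, value)` pairs) are all assumptions, and aggregation is taken to be a per-key sum.

```python
from collections import deque
from typing import Callable


class SchedulingModule:
    """Module 302: one task flow (and one FIFO scheduling queue) per data source."""

    def __init__(self):
        self.queues = {}  # flow id -> deque of address entries

    def schedule(self, info: dict, address: dict) -> None:
        self.queues.setdefault(info["source"], deque()).append(address)


class MessageProcessingModule:
    """Module 301: receives data messages and hands them to the scheduler."""

    def __init__(self, scheduler: SchedulingModule):
        self.scheduler = scheduler

    def on_message(self, message: dict) -> None:
        self.scheduler.schedule(message["info"], message["address"])


class DataProcessingModule:
    """Module 303: drains each queue in order, downloads, then aggregates per task flow."""

    def __init__(self, scheduler: SchedulingModule, fetch: Callable[[dict], list]):
        self.scheduler = scheduler
        self.fetch = fetch  # hypothetical downloader: address -> list of (key, value)

    def run(self) -> dict:
        results = {}
        for flow_id, queue in self.scheduler.queues.items():
            totals = results.setdefault(flow_id, {})
            while queue:
                for key, value in self.fetch(queue.popleft()):
                    totals[key] = totals.get(key, 0.0) + value
        return results
```

Injecting `fetch` keeps module 303 independent of the transport used to reach the server.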

Preferably, the data processing module 303 is further configured to: determine whether the data message carries a supplementary-collection identifier; if it does, after the data is downloaded from the server, search the previously downloaded data for data to be replaced, namely data whose data information is the same as the data information carried in the data message; and delete the data to be replaced.

Preferably, the scheduling module 302 is configured to: determine, according to the data information, whether a task flow corresponding to the data exists; if the task flow does not exist, establish a task flow corresponding to the data and write the address information into the scheduling queue corresponding to the task flow; and if it exists, write the address information into the scheduling queue corresponding to the task flow.

With the data processing device provided by this embodiment of the invention, the three previously independent links of data acquisition, loading and aggregation are unified through task flows, which effectively improves the efficiency and accuracy of data acquisition, loading and aggregation. When data needs to be collected again, the data source system prepares the supplementary-collection data and triggers the subsequent flow by sending a data message carrying the supplementary-collection identifier. The whole process therefore requires no manual intervention, and every service flow related to the supplementary-collection data is processed along its corresponding logic chain, avoiding the omissions that can occur with manual handling. Meanwhile, because the data message records detailed information about the data to be processed, only the problematic data needs to be reprocessed, which reduces resource consumption.

An embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the following steps:

receiving a data message sent by a data source system, wherein the data message carries the address information of a server storing the data and data information of the data, the data having been stored in the server by the data source system; determining a task flow corresponding to the data according to the data information, and writing the address information into a scheduling queue corresponding to the task flow; extracting the address information according to the arrangement sequence of the address information in the scheduling queue; downloading the data from the server according to the address information; and loading and aggregating the downloaded data of the same task flow.

Optionally, the computer instructions cause the computer to further perform the steps of:

determining whether the data message carries a supplementary-collection identifier; if it does, after the data is downloaded from the server, searching the previously downloaded data for data to be replaced, namely data whose data information is the same as the data information carried in the data message; and deleting the data to be replaced.

Optionally, the determining, according to the data information, a task flow corresponding to the data, and writing the address information into a scheduling queue corresponding to the task flow includes:

determining, according to the data information, whether a task flow corresponding to the data exists; if the task flow does not exist, establishing a task flow corresponding to the data, and writing the address information into the scheduling queue corresponding to the task flow; and if it exists, writing the address information into the scheduling queue corresponding to the task flow.

Optionally, the address information includes: the IP address and port number of the server, the user name and password used to access the server, and the file name and file path under which the data is stored.

Optionally, the data information includes: data detail information, data start time and data end time.

Fig. 4 is a schematic diagram of a computer device 400 according to an embodiment of the present invention. As shown in Fig. 4, the computer device 400 of this embodiment includes: at least one processor 410 and a communication interface 420; and at least one memory 430 communicatively coupled to the processor 410, wherein the memory 430 stores program instructions executable by the processor 410, and the processor 410 calls the program instructions to perform the data processing method described above. The details are not repeated here.

The computer device 400 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another computing device. The computer device may include, but is not limited to, the processor 410, the communication interface 420 and the memory 430. Those skilled in the art will appreciate that Fig. 4 is merely an example of the computer device 400 and does not limit it; the computer device 400 may include more or fewer components than shown, combine some components, or use different components; for example, it may also include a communication bus 440.

The processor 410 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 430 may be an internal storage unit of the computer device 400, such as a hard disk or memory of the computer device 400. The memory 430 may also be an external storage device of the computer device 400, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the computer device 400. Further, the memory 430 may include both an internal storage unit of the computer device 400 and an external storage device. The memory 430 is used to store program instructions and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

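1. A data processing method, applied to a data processing apparatus, the method comprising:

receiving a data message sent by a data source system, wherein the data message carries address information of a server storing data and data information of the data, the data having been stored in the server by the data source system;

determining a task flow corresponding to the data according to the data information, and writing the address information into a scheduling queue corresponding to the task flow;

extracting the address information according to the arrangement sequence of the address information in the scheduling queue;

downloading the data from the server according to the address information;

and loading and aggregating the downloaded data of the same task flow.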

2. The method of claim 1, further comprising:

determining whether the data message carries a supplementary-collection identifier;

if the data message carries the supplementary-collection identifier, after the data is downloaded from the server, searching the previously downloaded data for data to be replaced, namely data whose data information is the same as the data information carried in the data message;

and deleting the data to be replaced.

3. The method according to claim 1, wherein the determining a task flow corresponding to the data according to the data information and writing the address information into a scheduling queue corresponding to the task flow comprises:

determining, according to the data information, whether a task flow corresponding to the data exists;

if the task flow does not exist, establishing a task flow corresponding to the data, and writing the address information into the scheduling queue corresponding to the task flow;

and if it exists, writing the address information into the scheduling queue corresponding to the task flow.

4. The method of claim 1, wherein the address information comprises: the IP address and port number of the server, the user name and password used to access the server, and the file name and file path under which the data is stored.

5. The method of claim 1, wherein the data information comprises: data detail information, data start time and data end time.

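6. A data processing apparatus, characterized in that the apparatus comprises:

a message processing module, configured to receive a data message sent by a data source system, wherein the data message carries address information of a server storing data and data information of the data, the data having been stored in the server by the data source system;

a scheduling module, configured to determine a task flow corresponding to the data according to the data information and write the address information into a scheduling queue corresponding to the task flow;

a data processing module, configured to extract the address information according to the arrangement sequence of the address information in the scheduling queue and download the data from the server according to the address information;

the data processing module being further configured to load and aggregate the downloaded data of the same task flow.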

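7. The apparatus of claim 6, wherein the data processing module is further configured to:

determine whether the data message carries a supplementary-collection identifier;

if the data message carries the supplementary-collection identifier, after the data is downloaded from the server, search the previously downloaded data for data to be replaced, namely data whose data information is the same as the data information carried in the data message;

and delete the data to be replaced.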

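8. The apparatus of claim 6, wherein the scheduling module is specifically configured to:

determine, according to the data information, whether a task flow corresponding to the data exists;

if the task flow does not exist, establish a task flow corresponding to the data, and write the address information into the scheduling queue corresponding to the task flow;

and if it exists, write the address information into the scheduling queue corresponding to the task flow.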

9. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions cause a computer to execute the data processing method according to any one of claims 1 to 5.

10. A computer device, comprising: at least one processor; and at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor invokes the program instructions to execute the data processing method according to any one of claims 1 to 5.