CN113411365A - Data processing method and device - Google Patents


Info

Publication number
CN113411365A
Authority
CN
China
Prior art keywords
data
address information
information
task flow
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010184728.XA
Other languages
Chinese (zh)
Inventor
蒲承祖
刘毅
刘红梅
姜良军
袁鲲
邱伟娜
张康
孙善勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Shandong Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Shandong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Shandong Co Ltd
Priority to CN202010184728.XA
Publication of CN113411365A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a data processing method and a data processing device. The method comprises the following steps: receiving a data message sent by a data source system, wherein the data message carries address information and data information of a server for storing data; the data source system stores the data in the server; determining a task flow corresponding to the data according to the data information, and writing the address information into a scheduling queue corresponding to the task flow; extracting the address information according to the arrangement sequence of the address information in the scheduling queue; downloading the data from the server according to the address information; and loading and converging the downloaded data of the same task flow. The technical scheme provided by the embodiment of the invention can improve the efficiency and accuracy of data acquisition, loading and aggregation.

Description

Data processing method and device
[ technical field ]
The present invention relates to the field of communications technologies, and in particular, to a data processing method and apparatus, and an electronic device.
[ background of the invention ]
At present, 4G services are growing explosively. With the arrival of the 5G era, network scale keeps expanding, network structure grows more complex, network services keep diversifying, the data volume of network indicators grows exponentially, and each service scenario exhibits a tidal effect of high user counts and high traffic within short periods. Against this background, the real-time performance and accuracy of data analysis are increasingly important.
Existing data analysis systems usually complete data processing by means of timed tasks, for example the process of extracting, transforming and loading data from a source end to a destination end, referred to as ETL for short. In this data processing mode, the processing programs for the three links of data acquisition, loading and aggregation are independent of one another and depend on agreed scheduling times, so no task flow is formed in a real sense. As data sources increase and the data volume keeps growing, service changes require the scheduling times to be adjusted continually, which makes data acquisition, loading and aggregation both cumbersome and error-prone.
[ summary of the invention ]
In view of this, embodiments of the present invention provide a data processing method and apparatus to improve the efficiency and accuracy of data acquisition, loading and aggregation.
In order to achieve the above object, in a first aspect, the present invention provides a data processing method, including:
receiving a data message sent by a data source system, wherein the data message carries address information and data information of a server for storing data; the data source system stores the data in the server;
determining a task flow corresponding to the data according to the data information, and writing the address information into a scheduling queue corresponding to the task flow;
extracting the address information according to the arrangement sequence of the address information in the scheduling queue;
downloading the data from the server according to the address information;
and loading and converging the downloaded data of the same task flow.
With reference to the first aspect, in one possible implementation, the method further includes: determining whether the data message carries a complementary acquisition identifier; if the data message carries the complementary acquisition identifier, after the data is downloaded from the server, searching the downloaded data for data to be replaced whose data information is the same as the data information carried in the data message; and deleting the data to be replaced.
With reference to the first aspect, in a possible implementation manner, the determining, according to the data information, a task flow corresponding to the data, and writing the address information into a scheduling queue corresponding to the task flow includes: determining, according to the data information, whether a task flow corresponding to the data exists; if no such task flow exists, establishing a task flow corresponding to the data, and writing the address information into a scheduling queue corresponding to the task flow; and if such a task flow exists, writing the address information into the scheduling queue corresponding to the task flow.
With reference to the first aspect, in one possible implementation manner, the address information includes: the IP address and port address of the server, the user name and password used to access the server, and the file name and file path used to store the data.
With reference to the first aspect, in one possible implementation manner, the data information includes: data detail information, data start time and data end time.
In order to achieve the above object, in a second aspect, the present invention provides a data processing apparatus comprising:
a message processing module, configured to receive a data message sent by a data source system, wherein the data message carries address information and data information of a server used for storing data; the data source system stores the data in the server;
a scheduling module, configured to determine a task flow corresponding to the data according to the data information and to write the address information into a scheduling queue corresponding to the task flow;
a data processing module, configured to extract the address information according to the arrangement sequence of the address information in the scheduling queue and to download the data from the server according to the address information;
wherein the data processing module is further configured to load and aggregate the downloaded data of the same task flow.
With reference to the second aspect, in a possible implementation manner, the data processing module is further configured to: determine whether the data message carries a complementary acquisition identifier; if the data message carries the complementary acquisition identifier, after the data is downloaded from the server, search the downloaded data for data to be replaced whose data information is the same as the data information carried in the data message; and delete the data to be replaced.
With reference to the second aspect, in a possible implementation manner, the scheduling module is specifically configured to: determine, according to the data information, whether a task flow corresponding to the data exists; if no such task flow exists, establish a task flow corresponding to the data, and write the address information into a scheduling queue corresponding to the task flow; and if such a task flow exists, write the address information into the scheduling queue corresponding to the task flow.
In order to achieve the above object, in a third aspect, the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause the computer to execute the above-described data processing method.
In order to achieve the above object, in a fourth aspect, the present invention provides a computer device comprising: at least one processor; and at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the data processing method described above.
In the scheme, three independent links of data acquisition, loading and convergence are organically unified through the task flow, and the efficiency and the accuracy of the data acquisition, the loading and the convergence are effectively improved.
[ description of the drawings ]
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without inventive effort.
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an alternative computer device provided by an embodiment of the present invention.
[ detailed description ]
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
It should be understood that although the terms first, second, third, etc. may be used to describe the terminals in the embodiments of the present invention, the terminals should not be limited by these terms. These terms are only used to distinguish one terminal from another. For example, a first terminal may also be referred to as a second terminal, and similarly, a second terminal may also be referred to as a first terminal, without departing from the scope of embodiments of the present invention.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention, which can be applied to a data processing apparatus. The data processing apparatus may include a message processing module, a scheduling module and a data processing module.
As shown in fig. 1, the method comprises:
Step 101, receiving a data message sent by a data source system.
The data message carries address information and data information of a server for storing data.
The server for storing data may specifically be an FTP server, and the address information of the server may specifically include: the IP address and port address of the server, the username and password for accessing the server, and the filename and file path for storing the data. The data information may specifically include: data details (e.g., data source information, data type, data content, etc.), data start time, and data end time.
In one example, the data source system is used for collecting data, storing the collected data in the server, then carrying address information of the server for storing the data and data information of the data in a data message, sending the data message to the data processing device, and a message processing module in the data processing device receives the data message.
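The fields listed for step 101 can be pictured as a small record type. The following Python sketch is illustrative only: the class names, field names and dictionary layout are assumptions, since the text does not fix a concrete message encoding.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AddressInfo:
    """Where the data source system stored the data (field names are assumed)."""
    host: str        # IP address of the server
    port: int        # port address of the server
    username: str    # user name for accessing the server
    password: str    # password for accessing the server
    file_path: str   # file path used to store the data
    file_name: str   # file name used to store the data


@dataclass(frozen=True)
class DataInfo:
    """Describes the stored data (field names are assumed)."""
    source: str           # data source information
    data_type: str        # data type
    start_time: datetime  # data start time
    end_time: datetime    # data end time


@dataclass(frozen=True)
class DataMessage:
    address: AddressInfo
    info: DataInfo


def parse_message(raw: dict) -> DataMessage:
    """Build a DataMessage from an already decoded dictionary (assumed layout)."""
    addr, info = raw["address"], raw["info"]
    return DataMessage(
        address=AddressInfo(
            host=addr["host"], port=int(addr["port"]),
            username=addr["username"], password=addr["password"],
            file_path=addr["file_path"], file_name=addr["file_name"],
        ),
        info=DataInfo(
            source=info["source"], data_type=info["data_type"],
            start_time=datetime.fromisoformat(info["start_time"]),
            end_time=datetime.fromisoformat(info["end_time"]),
        ),
    )
```

Keeping the address information and the data information in separate records mirrors the way later steps use them: the data information selects the task flow, and the address information is what gets queued.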
Step 102, determining a task flow corresponding to the data according to the data information, and writing the address information into a scheduling queue corresponding to the task flow.
In one example, data from the same data source corresponds to the same task flow. The scheduling module of the data processing apparatus may determine the data source that sent the data according to the data detail information included in the data information, and allocate all data sent by that data source to the same task flow. The scheduling module may also convert the address information into a format recognizable by the scheduling queue, and write the converted address information into the scheduling queue corresponding to the task flow to which the data is allocated. With this scheduling mode, a plurality of task flows can be established at the same time so that data from a plurality of data sources is processed at the same time, which improves data processing efficiency.
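A minimal Python sketch of this scheduling step follows. It assumes that a task flow is keyed by the data source name and that the "format recognizable by the scheduling queue" is a fixed-order tuple; the `Scheduler` class, the `FIELDS` order and the `"flow:"` prefix are hypothetical names chosen for illustration.

```python
from collections import deque

# Fixed field order for queue entries (a hypothetical "queue format").
FIELDS = ("host", "port", "username", "password", "file_path", "file_name")


class Scheduler:
    """Maps each data source to one task flow with its own scheduling queue."""

    def __init__(self):
        self.queues = {}  # task flow id -> deque of queue entries

    def task_flow_for(self, data_info):
        # Assumption: data from the same source belongs to the same task flow.
        return "flow:" + data_info["source"]

    def enqueue(self, address_info, data_info):
        flow_id = self.task_flow_for(data_info)
        # Convert the address information into the queue's entry format.
        entry = tuple(address_info[name] for name in FIELDS)
        # Create the queue on first use, then append in arrival order.
        self.queues.setdefault(flow_id, deque()).append(entry)
        return flow_id
```

Because each task flow owns its queue, messages from different data sources never reorder one another, while messages from one source keep their arrival order.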
Step 103, extracting the address information of the data according to the arrangement sequence of the address information in the scheduling queue.
In one example, the data processing module of the data processing apparatus sequentially extracts the address information in the scheduling queue according to the order in which it is arranged in the queue, and executes the next step after the address information of the data is extracted.
It should be noted that, if multiple scheduling queues exist at the same time, the data processing module may retrieve the address information in each scheduling queue in turn.
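One way to read "retrieve the address information in each scheduling queue in turn" is a round-robin pass over the queues that keeps first-in, first-out order within each queue. The sketch below illustrates that reading only; the text does not specify the visiting policy, and the function name `drain_in_turn` is hypothetical.

```python
from collections import deque


def drain_in_turn(queues):
    """Yield (flow_id, entry) pairs, visiting the queues in turn.

    Entries leave each queue in FIFO order; empty queues are skipped.
    `queues` maps a task flow id to a deque of address entries.
    """
    while any(queues.values()):          # an empty deque is falsy
        for flow_id, queue in queues.items():
            if queue:
                yield flow_id, queue.popleft()
```

For example, with queue A holding `a1, a2, a3` and queue B holding `b1`, the entries come out as `a1, b1, a2, a3`.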
Step 104, downloading the data from the server according to the address information.
As described in step 101, the data is stored in the server, so the data processing module of the data processing apparatus can access the server according to the address information and download the data from the server to local storage.
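The download step can be sketched with Python's standard `ftplib`, under the assumption that the storage server speaks FTP. The dictionary keys and the `ftp_factory` parameter are hypothetical; the factory exists only so that the sketch can be exercised without a real server.

```python
import ftplib
import os


def download(address, local_dir, ftp_factory=ftplib.FTP):
    """Fetch the file described by `address` into `local_dir`; return its path.

    `address` is a dict with host, port, username, password, file_path and
    file_name keys (assumed layout, not defined by the text).
    """
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, address["file_name"])
    ftp = ftp_factory()                       # not connected yet
    ftp.connect(address["host"], address["port"])
    try:
        ftp.login(address["username"], address["password"])
        ftp.cwd(address["file_path"])         # change to the storage directory
        with open(local_path, "wb") as fh:
            # Stream the remote file into the local file block by block.
            ftp.retrbinary("RETR " + address["file_name"], fh.write)
    finally:
        ftp.close()                           # always release the connection
    return local_path
```

Every piece of address information named in step 101 is used exactly once here, which is why the message has to carry all six fields.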
Step 105, loading and aggregating the downloaded data of the same task flow.
After downloading the data of the same task flow, the data processing module of the data processing apparatus loads and aggregates that data, thereby completing the whole data processing process.
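As a rough illustration of loading and aggregating per task flow, the sketch below groups downloaded rows by task flow and sums values per indicator. The row layout and the choice of summation are assumptions; the text does not say which aggregation is applied.

```python
from collections import defaultdict


def load_and_aggregate(downloads):
    """Aggregate downloaded rows separately for each task flow.

    `downloads` is an iterable of (flow_id, rows) pairs, where each row is an
    (indicator, value) pair. Returns {flow_id: {indicator: summed value}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for flow_id, rows in downloads:
        for indicator, value in rows:            # loading
            totals[flow_id][indicator] += value  # aggregation (sum, assumed)
    return {flow: dict(by_ind) for flow, by_ind in totals.items()}
```

Data from different task flows never mixes in the result, which matches the requirement that only data of the same task flow is aggregated together.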
By utilizing the data processing method provided by the embodiment of the invention, three independent links of data acquisition, loading and convergence are organically unified through the task flow, and the efficiency and the accuracy of the data acquisition, loading and convergence are effectively improved.
Fig. 2 is a schematic flow chart of another data processing method according to an embodiment of the present invention. The data processing method can be applied to a data processing apparatus. The data processing apparatus may include a message processing module, a scheduling module and a data processing module.
As shown in fig. 2, the data processing method of the present embodiment may include:
Step 201, receiving a data message sent by a data source system.
The data message carries an identifier indicating whether the data is complementary data, together with address information and data information of a server for storing the data.
The server for storing data may specifically be an FTP server, and the address information of the server may specifically include: the IP address and port address of the server, the username and password for accessing the server, and the filename and file path for storing the data. The data information may specifically include: data details (e.g., data source information, data type, data content, etc.), data start time, and data end time.
In one example, the data source system is used for collecting data, storing the collected data in the server, then carrying address information of the server for storing the data and data information of the data in a data message, sending the data message to the data processing device, and a message processing module in the data processing device receives the data message.
In one specific example, the format of the data message is as follows:
[Figure BDA0002413752450000071: data message format (image not reproduced)]
Step 202, determining whether a task flow corresponding to the data exists according to the data information.
If not, go to steps 203 and 204; if so, go to step 204.
In one example, data from the same data source corresponds to the same task flow. The scheduling module of the data processing apparatus may determine the data source that sent the data according to the data detail information included in the data information, and allocate all data sent by that data source to the same task flow. With this scheduling mode, a plurality of task flows can be established at the same time so that data from a plurality of data sources is processed at the same time, which improves data processing efficiency.
Step 203, a task flow corresponding to the data is established, and the address information is written into a scheduling queue corresponding to the task flow.
When establishing a task flow, the scheduling module can determine whether to establish it immediately according to a preset scheduling policy. For example, if the number of current task flows exceeds a preset threshold, the scheduling module judges whether the data is important data; if so, the task flow corresponding to the data is established immediately; if not, the task flow corresponding to the data is established after a current task flow has finished, so as to optimize the processing order and reduce processing pressure.
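The scheduling policy in step 203 can be sketched as a small admission controller. This is one possible reading: the class name `FlowAdmission`, the `important` flag and the exact threshold comparison (admit while the number of active flows is below `max_flows`) are assumptions, not details given in the text.

```python
class FlowAdmission:
    """Decides whether a new task flow is established now or deferred."""

    def __init__(self, max_flows):
        self.max_flows = max_flows  # preset threshold (assumed semantics)
        self.active = set()         # task flows currently established
        self.deferred = []          # task flows waiting for a free slot

    def request(self, flow_id, important=False):
        """Return True if the flow is established now, False if deferred."""
        if flow_id in self.active:
            return True
        # Important data bypasses the threshold and is established immediately.
        if len(self.active) < self.max_flows or important:
            self.active.add(flow_id)
            return True
        if flow_id not in self.deferred:
            self.deferred.append(flow_id)
        return False

    def finish(self, flow_id):
        """Mark a flow finished and establish the oldest deferred flow, if any."""
        self.active.discard(flow_id)
        if self.deferred and len(self.active) < self.max_flows:
            self.active.add(self.deferred.pop(0))
```

Deferring unimportant flows until an active one finishes bounds the number of concurrent downloads, which is the "reduce processing pressure" effect the step describes.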
Step 204, writing the address information into a scheduling queue corresponding to the task flow.
The scheduling module may also convert the address information into a format recognizable by the scheduling queue, and write the address information after the format conversion into the scheduling queue corresponding to the task flow to which the data is allocated.
Step 205, extracting the address information of the data according to the arrangement sequence of the address information in the scheduling queue.
In one example, the data processing module of the data processing apparatus sequentially extracts the address information in the scheduling queue according to the order of arrangement of the address information in the scheduling queue, and executes the next step after the address information of the data is extracted.
It should be noted that, if multiple scheduling queues exist at the same time, the data processing module may retrieve the address information in each scheduling queue in turn.
Step 206, downloading the data from the server according to the address information.
As described in step 201, the data is stored in the server, so the data processing module of the data processing apparatus can access the server according to the address information and download the data from the server to local storage.
Step 207, determining whether the data message carries a complementary acquisition identifier.
The data processing module of the data processing device may determine, from the identifier carried in the data message that indicates whether the data is complementary data, whether the data corresponding to the data message is complementary data, and execute the corresponding steps. In one example, if the identifier is 1, the data message is considered to carry the complementary acquisition identifier and the data corresponding to the data message is complementary data, so steps 208 and 209 are executed; if the identifier is 0, the data message is considered not to carry the complementary acquisition identifier and the data corresponding to the data message is original data, so step 209 is executed.
Step 208, searching the downloaded data for data to be replaced whose data information is the same as the data information carried in the data message, and deleting the data to be replaced.
Because the data corresponding to the data message is complementary data, the data processing module needs to delete the original data corresponding to the complementary data, so as to avoid processing errors.
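Steps 207 and 208 can be sketched as follows. The sketch assumes that "the same data information" means the same data source, start time and end time, and that downloaded data is kept in a dictionary keyed by that triple; both the key and the `apply_download` name are hypothetical.

```python
def apply_download(store, data_info, rows, is_complementary):
    """Add downloaded rows to `store`, replacing old data when required.

    `store` maps (source, start_time, end_time) to a list of rows. When the
    message carries the complementary acquisition identifier, previously
    downloaded data with the same data information is deleted first.
    """
    key = (data_info["source"], data_info["start_time"], data_info["end_time"])
    if is_complementary:
        store.pop(key, None)  # delete the data to be replaced, if present
    store.setdefault(key, []).extend(rows)
    return store
```

Deleting before loading means the subsequent aggregation sees each time window once, so complementary data does not double count the original data.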
Step 209, loading and aggregating the downloaded data of the same task flow.
After downloading the data of the same task flow, the data processing module of the data processing device loads and aggregates that data, thereby completing the whole data processing process.
By utilizing the data processing method provided by the embodiment of the invention, the three independent links of data acquisition, loading and aggregation are organically unified through the task flow, and the efficiency and accuracy of data acquisition, loading and aggregation are effectively improved. When data needs complementary acquisition, the data source system prepares that data and triggers the subsequent flow by sending a data message carrying the complementary acquisition identifier, so the whole process requires no manual participation, and all service flows related to the complementary data are processed along the corresponding logic chains, which avoids the omissions that manual processing may cause. Meanwhile, detailed information about the data to be processed is recorded in the data message, so the whole process only needs to handle the problematic data, which reduces resource consumption.
As shown in fig. 3, an embodiment of the present invention provides a data processing apparatus, where the data processing apparatus of this embodiment may include: a message processing module 301, a scheduling module 302 and a data processing module 303.
A message processing module 301, configured to receive a data message sent by a data source system, where the data message carries address information and data information of a server used for storing data; the data source system stores the data in the server.
The server for storing data may specifically be an FTP server, and the address information of the server may specifically include: the IP address and port address of the server, the username and password for accessing the server, and the filename and file path for storing the data. The data information may specifically include: data details (e.g., data source information, data type, data content, etc.), data start time, and data end time.
In one example, the data source system is configured to collect data, store the collected data in a server, then carry address information of the server for storing the data and data information of the data in a data message, send the data message to the data processing apparatus, and the message processing module 301 in the data processing apparatus receives the data message.
The scheduling module 302 is configured to determine a task flow corresponding to the data according to the data information, and write the address information into a scheduling queue corresponding to the task flow.
In one example, data from the same data source corresponds to the same task flow. The scheduling module 302 of the data processing apparatus may determine the data source that sent the data according to the data detail information included in the data information, and allocate all data sent by that data source to the same task flow. The scheduling module 302 may also convert the address information into a format recognizable by the scheduling queue, and write the converted address information into the scheduling queue corresponding to the task flow to which the data is allocated. With this scheduling mode, a plurality of task flows can be established at the same time so that data from a plurality of data sources is processed at the same time, which improves data processing efficiency.
The data processing module 303 is configured to extract the address information according to the arrangement sequence of the address information in the scheduling queue, and to download the data from the server according to the address information.
In one example, the data processing module 303 of the data processing apparatus sequentially extracts the address information in the scheduling queue according to the order of the address information in the scheduling queue, and downloads the data from the server according to the address information after extracting the address information of the data.
It should be noted that, if multiple scheduling queues exist at the same time, the data processing module 303 may sequentially retrieve address information in each scheduling queue.
The data processing module 303 is further configured to load and aggregate the downloaded data of the same task flow.
After the data of the same task flow is downloaded, the data processing module 303 of the data processing apparatus loads and aggregates the data of the same task flow, thereby completing the whole data processing process.
Preferably, the data processing module 303 is further configured to: determine whether the data message carries a complementary acquisition identifier; if the data message carries the complementary acquisition identifier, after the data is downloaded from the server, search the downloaded data for data to be replaced whose data information is the same as the data information carried in the data message; and delete the data to be replaced.
Preferably, the scheduling module 302 is configured to: determine, according to the data information, whether a task flow corresponding to the data exists; if no such task flow exists, establish a task flow corresponding to the data, and write the address information into a scheduling queue corresponding to the task flow; and if such a task flow exists, write the address information into the scheduling queue corresponding to the task flow.
By utilizing the data processing device provided by the embodiment of the invention, the three independent links of data acquisition, loading and aggregation are organically unified through task flows, and the efficiency and accuracy of data acquisition, loading and aggregation are effectively improved. When data needs complementary acquisition, the data source system prepares that data and triggers the subsequent flow by sending a data message carrying the complementary acquisition identifier, so the whole process requires no manual participation, and all service flows related to the complementary data are processed along the corresponding logic chains, which avoids the omissions that manual processing may cause. Meanwhile, detailed information about the data to be processed is recorded in the data message, so the whole process only needs to handle the problematic data, which reduces resource consumption.
An embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions cause a computer to perform the following steps:
receiving a data message sent by a data source system, wherein the data message carries address information and data information of a server for storing data; the data source system stores the data in the server; determining a task flow corresponding to the data according to the data information, and writing the address information into a scheduling queue corresponding to the task flow; extracting the address information according to the arrangement sequence of the address information in the scheduling queue;
downloading the data from the server according to the address information; and loading and converging the downloaded data of the same task flow.
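The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the embodiment: the names `DataMessage`, `FAKE_SERVER` and `process` are hypothetical, the download is simulated with an in-memory dictionary, the data information is reduced to a single `flow_key` string, and convergence is assumed here to be a simple sum.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMessage:
    address: str   # hypothetical: where the data source system stored the data
    flow_key: str  # hypothetical: stands in for the "data information"


# Hypothetical stand-in for the storage server: address -> stored rows.
FAKE_SERVER = {
    "/out/a1.csv": [1, 2],
    "/out/b1.csv": [10],
    "/out/a2.csv": [3],
}


def process(messages):
    """Run the receive, schedule, download and load/converge steps once."""
    queues = defaultdict(deque)  # one scheduling queue per task flow
    for msg in messages:
        # Determine the task flow and write the address into its queue.
        queues[msg.flow_key].append(msg.address)

    totals = {}
    for flow_key, queue in queues.items():
        loaded = []
        while queue:
            address = queue.popleft()            # extract in queue order (FIFO)
            loaded.extend(FAKE_SERVER[address])  # simulated download and load
        totals[flow_key] = sum(loaded)           # converge data of the same flow
    return totals


messages = [
    DataMessage("/out/a1.csv", "flow-a"),
    DataMessage("/out/b1.csv", "flow-b"),
    DataMessage("/out/a2.csv", "flow-a"),
]
result = process(messages)
```

Keeping one queue per task flow means that data belonging to the same flow is downloaded in arrival order and converged together, regardless of how messages for different flows interleave.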
Optionally, the computer instructions cause the computer to further perform the steps of:
determining whether the data message carries a supplementary collection identifier; if the data message carries the supplementary collection identifier, after the data is downloaded from the server, searching for data to be replaced, wherein the data information of the data to be replaced is the same as the data information carried in the data message; and deleting the data to be replaced.
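The replacement step can be illustrated as follows. This is a minimal sketch, assuming that loaded data is kept in a dictionary keyed by its data information; the names `DataInfo`, `load` and `is_supplementary` are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataInfo:
    detail: str
    start_time: str
    end_time: str


def load(store, info, rows, is_supplementary):
    """Load rows into store; on supplementary collection, delete the old data first."""
    if is_supplementary and info in store:
        # The stored entry has the same data information as the message,
        # so it is the data to be replaced.
        del store[info]
    store.setdefault(info, []).extend(rows)
    return store


info = DataInfo("cdr", "2020-03-17T00:00", "2020-03-17T01:00")
store = {info: [1, 999]}                             # earlier, faulty load
load(store, info, [1, 2], is_supplementary=True)     # re-collected data replaces it
replaced = list(store[info])
load(store, info, [3], is_supplementary=False)       # a normal message only appends
```

Only the entry whose data information matches the message is touched, which mirrors the statement that the process handles just the problematic data.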
Optionally, the determining, according to the data information, a task flow corresponding to the data, and writing the address information into a scheduling queue corresponding to the task flow includes:
determining whether a task flow corresponding to the data exists according to the data information; if the task flow does not exist, establishing a task flow corresponding to the data, and writing the address information into a scheduling queue corresponding to the task flow; and if the task flow exists, writing the address information into the scheduling queue corresponding to the task flow.
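The "create the task flow if it does not exist, then enqueue" logic amounts to a get-or-create lookup. The sketch below is illustrative only: the class `Scheduler`, its `created` list and the string `flow_key` are hypothetical, and how a key is derived from the data information is an assumption not specified in the text.

```python
from collections import deque


class Scheduler:
    def __init__(self):
        self.queues = {}   # task flow key -> scheduling queue
        self.created = []  # keys of task flows established so far

    def schedule(self, flow_key, address):
        """Write address into the queue of the matching task flow."""
        queue = self.queues.get(flow_key)
        if queue is None:
            # No task flow exists for this data yet: establish one.
            queue = deque()
            self.queues[flow_key] = queue
            self.created.append(flow_key)
        queue.append(address)
        return queue


scheduler = Scheduler()
scheduler.schedule("cdr|2020-03-17", "/out/part1.csv")
scheduler.schedule("cdr|2020-03-17", "/out/part2.csv")  # reuses the existing flow
scheduler.schedule("sms|2020-03-17", "/out/sms.csv")    # establishes a second flow
```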
Optionally, the address information includes: the IP address and port address of the server, the user name and password used to access the server, and the file name and file path used to store the data.
Optionally, the data information includes: data detail information, data start time and data end time.
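The two groups of fields listed above could be modelled as follows. This is a sketch under assumptions: the text does not state a wire format, so JSON is assumed here, and the class names, field names and sample values (including the documentation-range IP address) are hypothetical.

```python
import json
import posixpath
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AddressInfo:
    ip: str
    port: int
    username: str
    password: str = field(repr=False)  # keep the credential out of logs
    file_path: str
    file_name: str

    @property
    def remote_file(self):
        """Full path of the stored file on the server."""
        return posixpath.join(self.file_path, self.file_name)


@dataclass(frozen=True)
class DataInfo:
    detail: str
    start_time: str
    end_time: str


def parse_message(raw):
    """Split a JSON data message into its address and data information."""
    body = json.loads(raw)
    return AddressInfo(**body["address"]), DataInfo(**body["data"])


raw = json.dumps({
    "address": {
        "ip": "192.0.2.10", "port": 22,
        "username": "loader", "password": "secret",
        "file_path": "/data/out", "file_name": "cdr_20200317.csv",
    },
    "data": {
        "detail": "cdr",
        "start_time": "2020-03-17T00:00",
        "end_time": "2020-03-17T01:00",
    },
})
address, info = parse_message(raw)
```

Because the message carries a password, excluding that field from the generated `repr` is a small precaution against leaking it into logs.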
Fig. 4 is a schematic diagram of a computer device 400 according to an embodiment of the present invention. As shown in Fig. 4, the computer device 400 of this embodiment includes: at least one processor 410 and a communication interface 420; and at least one memory 430 communicatively coupled to the processor 410, wherein the memory 430 stores program instructions executable by the processor 410, and the processor 410 calls the program instructions to perform the data processing method described above. To avoid repetition, the method is not described again here.
The computer device 400 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The computer device 400 may include, but is not limited to, the processor 410, the communication interface 420, and the memory 430. Those skilled in the art will appreciate that Fig. 4 is merely an example of the computer device 400 and does not limit it; the computer device 400 may include more or fewer components than those shown, may combine some of the components, or may use different components. For example, the computer device 400 may also include a communication bus 440.
The Processor 410 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 430 may be an internal storage unit of the computer device 400, such as a hard disk or a memory of the computer device 400. The memory 430 may also be an external storage device of the computer device 400, such as a plug-in hard disk provided on the computer device 400, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 430 may include both an internal storage unit of the computer device 400 and an external storage device. The memory 430 is used to store program instructions and other programs and data required by the computer device 400. The memory 430 may also be used to temporarily store data that has been output or is to be output.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A data processing method, applied to a data processing apparatus, the method comprising:
receiving a data message sent by a data source system, wherein the data message carries address information and data information of a server for storing data; the data source system stores the data in the server;
determining a task flow corresponding to the data according to the data information, and writing the address information into a scheduling queue corresponding to the task flow;
extracting the address information according to the arrangement sequence of the address information in the scheduling queue;
downloading the data from the server according to the address information;
and loading and converging the downloaded data of the same task flow.
2. The method of claim 1, further comprising:
determining whether the data message carries a supplementary collection identifier;
if the data message carries the supplementary collection identifier, after the data is downloaded from the server, searching for data to be replaced, wherein the data information of the data to be replaced is the same as the data information carried in the data message;
and deleting the data to be replaced.
3. The method according to claim 1, wherein the determining a task flow corresponding to the data according to the data information and writing the address information into a scheduling queue corresponding to the task flow comprises:
determining whether a task flow corresponding to the data exists according to the data information;
if the task flow does not exist, establishing a task flow corresponding to the data, and writing the address information into a scheduling queue corresponding to the task flow;
and if the task flow exists, writing the address information into the scheduling queue corresponding to the task flow.
4. The method of claim 1, wherein the address information comprises: the IP address and port address of the server, the user name and password used to access the server, and the file name and file path used to store the data.
5. The method of claim 1, wherein the data information comprises: data detail information, data start time and data end time.
6. A data processing apparatus, characterized in that the apparatus comprises:
a message processing module, configured to receive a data message sent by a data source system, wherein the data message carries address information and data information of a server used for storing data; the data source system stores the data in the server;
the scheduling module is used for determining a task flow corresponding to the data according to the data information and writing the address information into a scheduling queue corresponding to the task flow;
the data processing module extracts the address information according to the arrangement sequence of the address information in the scheduling queue and downloads the data from the server according to the address information;
and the data processing module is also used for loading and converging the downloaded data of the same task flow.
7. The apparatus of claim 6, wherein the data processing module is further configured to:
determining whether the data message carries a supplementary collection identifier;
if the data message carries the supplementary collection identifier, after the data is downloaded from the server, searching for data to be replaced, wherein the data information of the data to be replaced is the same as the data information carried in the data message;
and deleting the data to be replaced.
8. The apparatus of claim 6, wherein the scheduling module is specifically configured to:
determining whether a task flow corresponding to the data exists according to the data information;
if the task flow does not exist, establishing a task flow corresponding to the data, and writing the address information into a scheduling queue corresponding to the task flow;
and if the task flow exists, writing the address information into the scheduling queue corresponding to the task flow.
9. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions cause the computer to execute the data processing method according to any one of claims 1 to 5.
10. A computer device, comprising: at least one processor; and at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the data processing method of any one of claims 1 to 5.
CN202010184728.XA 2020-03-17 2020-03-17 Data processing method and device Pending CN113411365A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010184728.XA CN113411365A (en) 2020-03-17 2020-03-17 Data processing method and device

Publications (1)

Publication Number Publication Date
CN113411365A true CN113411365A (en) 2021-09-17

Family

ID=77677067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010184728.XA Pending CN113411365A (en) 2020-03-17 2020-03-17 Data processing method and device

Country Status (1)

Country Link
CN (1) CN113411365A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004070615A (en) * 2002-08-06 2004-03-04 Digital Electronics Corp Data collecting system, data collecting method, program for collecting data, and recording medium with its program recorded thereon
US20080141250A1 (en) * 2006-10-30 2008-06-12 Karlheinz Dorn Distributed taskflow architecture
CN102915254A (en) * 2011-08-02 2013-02-06 中兴通讯股份有限公司 Task management method and device
CN104780017A (en) * 2014-01-10 2015-07-15 中国移动通信集团公司 Data processing method and data processing device
WO2018107780A1 (en) * 2016-12-16 2018-06-21 威创集团股份有限公司 Method and system for controlling task flow of kvm system
CN108958881A (en) * 2018-05-31 2018-12-07 平安科技(深圳)有限公司 Data processing method, device and computer readable storage medium
CN110231983A (en) * 2019-05-13 2019-09-13 北京百度网讯科技有限公司 Data Concurrent processing method, apparatus and system, computer equipment and readable medium
CN110650180A (en) * 2019-08-23 2020-01-03 腾讯科技(深圳)有限公司 Road data acquisition method, system, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210917