CN114116245A - Data processing method, computer device, and storage medium - Google Patents


Info

Publication number
CN114116245A
Authority
CN
China
Prior art keywords
data processing
processed
event message
event
processing component
Legal status
Pending
Application number
CN202010899539.0A
Other languages
Chinese (zh)
Inventor
叶军
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Application filed by ZTE Corp
Priority to CN202010899539.0A
Publication of CN114116245A
Pending legal status (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a data processing method, a computer device, and a storage medium, in the technical field of data processing. The method includes: acquiring a to-be-processed event message, where the message includes a to-be-processed file list and an event type; determining the data processing component chain corresponding to the current to-be-processed event message, and determining a target data processing component in that chain according to the event type in the message; and sending the to-be-processed event message to the target data processing component, so that the target component processes the to-be-processed file list according to the message. Because each to-be-processed event message is dispatched to a target data processing component according to its event type, the data processing components in the chain are isolated from one another and can process to-be-processed file lists in parallel, which improves data processing efficiency.

Description

Data processing method, computer device, and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a data processing method, a computer device, and a storage medium.
Background
With the rapid development of computer information technology and communication network technology, industry application systems have grown rapidly in scale, and the volume of data generated by industry applications has increased explosively. To make better use of this data, data sources from mutually separated service systems need to be integrated, so that enterprises can subsequently perform data mining and data analysis on them.
Existing text file processing methods generally follow one of two approaches. The first performs pipelined, end-to-end processing of text files and requires a separate processing flow to be customized for each type of text file; with large data volumes, the fixed processing flow and algorithms are hard to extend or adjust, so processing is slow and inefficient. The second imports the text files into an in-memory database and then cleans and converts the data records using the database's function syntax; this approach imposes special format requirements on the text files (such as encoding, delimiters, and file size) and depends heavily on the database, since loading the text files consumes a large amount of system resources, which again results in low processing efficiency.
Therefore, how to improve the efficiency of text file data processing has become an urgent problem to be solved.
Disclosure of Invention
Embodiments of the present invention mainly aim to provide a data processing method, a computer device, and a storage medium in which a to-be-processed event message is dispatched to a target data processing component according to its event type, so that the data processing components in a data processing component chain are isolated from one another and process to-be-processed file lists in parallel, thereby improving data processing efficiency.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
acquiring a to-be-processed event message, where the to-be-processed event message includes a to-be-processed file list and an event type; determining the data processing component chain corresponding to the current to-be-processed event message, and determining a target data processing component in that chain according to the event type in the message; and sending the to-be-processed event message to the target data processing component, so that the target data processing component processes the to-be-processed file list according to the message.
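Steps S10 to S30 can be illustrated with a minimal Python sketch. It assumes, beyond what the text states, that each message carries a hypothetical `source` field used to look up its component chain, and that a chain is represented as a mapping from event type to a callable component; `EventMessage`, `CHAINS`, and `handle` are illustrative names, not part of the disclosed method.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EventMessage:
    source: str       # hypothetical: identifies which component chain applies
    event_type: str   # e.g. "ExtractEvent"
    files: List[str] = field(default_factory=list)  # to-be-processed file list


Component = Callable[[EventMessage], str]

# Hypothetical registry: source -> (event type -> component)
CHAINS: Dict[str, Dict[str, Component]] = {}


def handle(message: EventMessage) -> str:
    # Step S10 is assumed done: `message` has already been acquired.
    chain = CHAINS[message.source]           # determine the component chain
    target = chain.get(message.event_type)   # pick the target by event type
    if target is None:
        raise ValueError(f"no component for {message.event_type} in {message.source}")
    return target(message)                   # Step S30: send message to target


if __name__ == "__main__":
    CHAINS["cities"] = {"ExtractEvent": lambda m: f"extracted {len(m.files)} file(s)"}
    print(handle(EventMessage("cities", "ExtractEvent", ["city1.txt", "city2.txt"])))
```

Keying the lookup on event type means a new kind of processing only needs a new entry in the mapping, which mirrors the isolation between components described above.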
In a second aspect, the embodiment of the present invention further provides a computer device, which includes a processor, a memory, a computer program stored on the memory and executable by the processor, and a data bus for implementing connection communication between the processor and the memory, wherein the computer program, when executed by the processor, implements the data processing method as described above.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing one or more programs, where the one or more programs can be executed by one or more processors to implement the steps of any data processing method provided in this specification.
The embodiments of the invention provide a data processing method, a computer device, and a storage medium. By acquiring a to-be-processed event message, the to-be-processed file list and the event type of that message can be determined. By determining the data processing component chain corresponding to the current message, the target data processing component in that chain can be selected according to the message's event type, so that messages of different event types are handled by different data processing components that do not interfere with one another. By sending the message to the target data processing component, that component processes the to-be-processed file list according to the message; the data processing components in the chain are thus isolated from one another and process files in parallel, which improves data processing efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a block diagram of a data processing system according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a computer device according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart diagram of a data processing method provided by an embodiment of the invention;
FIG. 4 is a schematic flow chart diagram of generating an initial event message according to an embodiment of the present invention;
FIG. 5 is a schematic interaction diagram for generating an initial event message according to an embodiment of the present invention;
FIG. 6 is a block diagram of a chain of data processing components according to an embodiment of the present invention;
FIG. 7 is a schematic interaction diagram for obtaining a pending event message according to an embodiment of the present invention;
FIG. 8 is a schematic flow chart diagram illustrating sub-steps of a data processing method according to an embodiment of the present invention;
FIG. 9 is a schematic interaction diagram for processing a pending file list according to an embodiment of the present invention;
FIG. 10 is a block diagram of another chain of data processing components provided by an embodiment of the present invention;
FIG. 11 is a block diagram of another chain of data processing components according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of another data processing component chain according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Embodiments of the present invention provide a data processing method, a computer device, and a storage medium. The data processing method may be applied to a data processing system in a server: by dispatching each to-be-processed event message to a target data processing component according to its event type, the data processing components in a data processing component chain are isolated from one another and process to-be-processed file lists in parallel, thereby improving data processing efficiency.
The server may be an independent server or a server cluster.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a data processing system according to an embodiment of the present invention. As shown in FIG. 1, a data processing system may include a data source configuration module 1001, a component management module 1002, a service initiation module 1003, and an event manager module 1004.
The data source configuration module 1001 includes a plurality of data source files and data source configuration files corresponding to the plurality of data source files; the data source configuration file includes description information of the data source file.
Illustratively, the data source configuration file includes attribute information, functional configuration information, and component chain information. The attribute information may include, but is not limited to, parameters such as a collection path, a file name expression, a working path, and a backup path.
In some embodiments, a screening condition may be set according to the attribute information, and the text file in the data source file is screened according to the screening condition, so as to obtain a text file meeting the screening condition.
For example, the functional configuration information may include field mapping information, file transcoding information, file merging information, a cleansing algorithm, a conversion algorithm, and file loading information. The field mapping information defines the configuration used when extracting data; the file transcoding information defines the configuration for file transcoding; the file merging information defines the configuration for file merging; the cleansing and conversion algorithms implement cleansing and conversion of the files; and the file loading information defines the configuration used when files are loaded. The component chain information may include, but is not limited to, the name, function type, and connection order of each data processing component in the data processing component chain.
In this embodiment of the present invention, the component management module 1002 is configured to manage the data processing components in the data processing component chain; for example, it may add or delete data processing components, or adjust their order in the chain. Each data processing component processes data according to the to-be-processed event message dispatched by the event manager, constructs a lower-level event message, and sends it back to the event manager. In other words, the data processing flow is broken down into separate steps, and each step acts as a data processing component; the data processing components are independent of one another and can process multiple event messages in parallel.
In this embodiment of the present invention, the service initiation module 1003 is configured to start the event manager in the event manager module 1004 and to scan and read the data source files and data source configuration files. It generates a screening condition from the data source configuration file and screens the text files in the data source files against that condition to obtain the text files that satisfy it; it then constructs an initial event message from these text files and sends it to the event manager, so that after receiving the initial event message the event manager dispatches to-be-processed event messages to the data processing components.
In this embodiment of the present invention, the event manager module 1004 includes an event manager, which acts as the management and scheduling center of the data processing flow. It delivers to-be-processed event messages to the data processing components, so that each data processing component processes the to-be-processed file list according to its message and generates a lower-level event message when processing is complete. After receiving a lower-level event message returned by a data processing component, the event manager adds it to the message queue, takes the next to-be-processed event message from the queue, and sends it to the next (lower-level) data processing component in the chain for processing. In this way, the data processing components process the to-be-processed file list in sequence.
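The text does not specify a file format for the data source configuration file. As an illustration only, the sketch below assumes a JSON layout with three hypothetical top-level keys (`attributes`, `function_config`, `component_chain`) matching the three kinds of information described above; the class names and the example paths are likewise invented for this sketch.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class AttributeInfo:
    collection_path: str
    file_name_expression: str
    work_path: str
    backup_path: str


@dataclass
class ComponentInfo:
    name: str           # e.g. "ExtractHandler"
    function_type: str  # e.g. "data extraction"


@dataclass
class DataSourceConfig:
    attributes: AttributeInfo
    function_config: Dict[str, Any]       # field mapping, transcoding, merging, ...
    component_chain: List[ComponentInfo]  # list order = connection order


def load_config(text: str) -> DataSourceConfig:
    """Parse a (hypothetical) JSON data source configuration file."""
    raw = json.loads(text)
    return DataSourceConfig(
        attributes=AttributeInfo(**raw["attributes"]),
        function_config=dict(raw.get("function_config", {})),
        component_chain=[ComponentInfo(**c) for c in raw["component_chain"]],
    )


EXAMPLE = """
{
  "attributes": {"collection_path": "/collect/a", "file_name_expression": "city.*",
                 "work_path": "/work", "backup_path": "/backup"},
  "function_config": {"file_transcoding": {"target_encoding": "UTF-8"}},
  "component_chain": [
    {"name": "ExtractHandler", "function_type": "data extraction"},
    {"name": "LoadHandler", "function_type": "file loading"}
  ]
}
"""
```

Representing the component chain as an ordered list lets the connection order be read directly from the configuration rather than stored separately.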
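The add, delete, and reorder operations of the component management module can be sketched as follows. This is a hypothetical illustration: the chain is assumed to be an ordered list of component names, and `ComponentManager` with its method names is not taken from the source.

```python
from typing import List, Optional


class ComponentManager:
    """Hypothetical manager for the ordered component names of one chain."""

    def __init__(self, names: List[str]) -> None:
        self._chain: List[str] = list(names)

    def add(self, name: str, position: Optional[int] = None) -> None:
        # Insert a component; append it to the end when no position is given.
        if name in self._chain:
            raise ValueError(f"{name} is already in the chain")
        if position is None:
            self._chain.append(name)
        else:
            self._chain.insert(position, name)

    def delete(self, name: str) -> None:
        self._chain.remove(name)  # raises ValueError if the component is absent

    def move(self, name: str, new_position: int) -> None:
        # Adjust the order: take the component out and reinsert it.
        self._chain.remove(name)
        self._chain.insert(new_position, name)

    def order(self) -> List[str]:
        return list(self._chain)  # copy, so callers cannot mutate the chain
```

Because components are independent, changing the chain only changes which component receives the next lower-level event; no component needs to know about its neighbors.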
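The queue-driven cycle described above (dispatch a message, receive the lower-level message, enqueue it, dispatch again) can be sketched single-threaded for clarity. The assumptions here are that a component returns its lower-level message, or `None` at the end of the chain, and that `EventManager`, `forward_to`, and the `log` list are hypothetical names used only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class EventMessage:
    event_type: str
    files: List[str] = field(default_factory=list)


# A component processes a message and returns the lower-level message
# for the next component, or None when it is the last in the chain.
Component = Callable[[EventMessage], Optional[EventMessage]]


class EventManager:
    def __init__(self, components: Dict[str, Component]) -> None:
        self._components = components          # event type -> component
        self._queue: Deque[EventMessage] = deque()
        self.log: List[str] = []               # order in which events were handled

    def submit(self, message: EventMessage) -> None:
        self._queue.append(message)

    def run(self) -> None:
        while self._queue:
            message = self._queue.popleft()
            self.log.append(message.event_type)
            lower = self._components[message.event_type](message)
            if lower is not None:
                self._queue.append(lower)      # lower-level event goes back to the queue


def forward_to(next_type: Optional[str]) -> Component:
    """Hypothetical component that just forwards the file list to the next event type."""
    def component(message: EventMessage) -> Optional[EventMessage]:
        # Real processing of message.files would happen here.
        return EventMessage(next_type, message.files) if next_type else None
    return component
```

For example, registering `forward_to("TransformEvent")` under `"ExtractEvent"`, `forward_to("LoadEvent")` under `"TransformEvent"`, and `forward_to(None)` under `"LoadEvent"`, then submitting one `ExtractEvent`, makes the manager handle the three events in chain order.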
Referring to fig. 2, fig. 2 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device may be a server 1000.
Referring to fig. 2, the server 1000 may include a processor 11 and a memory 12, which may be connected by a bus. The bus may be, for example, an I2C (Inter-Integrated Circuit) bus, or any other suitable bus.
The memory 12 may include, among other things, a nonvolatile storage medium and an internal memory. The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions which, when executed, cause a processor to perform any of the data processing methods.
The processor 11 provides computing and control capabilities that support the operation of the entire computer device.
In an embodiment, the processor 11 is configured to run a computer program stored in the memory 12, and when executing the computer program, to implement the following steps:
acquiring a to-be-processed event message, where the to-be-processed event message includes a to-be-processed file list and an event type; determining the data processing component chain corresponding to the current to-be-processed event message, and determining a target data processing component in that chain according to the event type in the message; and sending the to-be-processed event message to the target data processing component, so that the target data processing component processes the to-be-processed file list according to the message.
The processor 11 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 3, fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention. The data processing method can be applied to a server: by dispatching each to-be-processed event message to a target data processing component according to its event type, the data processing components in a data processing component chain are isolated from one another and process to-be-processed file lists in parallel, thereby improving data processing efficiency. The data processing method includes steps S10 to S30.
Step S10, obtaining a to-be-processed event message, wherein the to-be-processed event message comprises a to-be-processed file list and an event type.
In this embodiment of the present invention, the to-be-processed event message may be acquired through the event manager. The event manager may include a message queue and a thread pool. The message queue stores the to-be-processed event messages; the thread pool holds multiple threads, and when the event manager dispatches a to-be-processed event message to the target data processing component, it can allocate a thread from the pool to that component, so that the target data processing component processes the to-be-processed file list on that thread according to the message.
Illustratively, the message queue may include, but is not limited to, RabbitMQ, RocketMQ, ActiveMQ, Kafka, ZeroMQ, and MetaMq. Storing the to-be-processed event messages in a message queue isolates the to-be-processed event messages of the data processing flow from one another.
In some embodiments, before the to-be-processed event message is acquired in step S10, the method may further include: adding an initial event message to the message queue.
In this embodiment of the present invention, the service initiation module may initialize the event manager in order to start it; once started, the event manager can receive the initial event message sent by the service initiation module.
Upon startup, the event manager may initialize the message queue, initialize the thread pool, and start a scanning thread. The scanning thread scans the message queue to determine whether it contains any event messages.
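A minimal sketch of the message queue plus thread pool arrangement, using Python's standard `queue.Queue` and `concurrent.futures.ThreadPoolExecutor` as stand-ins; the patent does not name an implementation, and `PooledEventManager` and `dispatch_one` are hypothetical. Each dispatch borrows a pool thread for the target component and returns a `Future` for its result.

```python
import queue
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EventMessage:
    event_type: str
    files: List[str] = field(default_factory=list)


class PooledEventManager:
    """Hypothetical event manager: a message queue plus a fixed thread pool."""

    def __init__(self, components: Dict[str, Callable[[EventMessage], int]],
                 workers: int = 4) -> None:
        self.messages: "queue.Queue[EventMessage]" = queue.Queue()
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._components = components  # event type -> target component

    def dispatch_one(self) -> Future:
        message = self.messages.get()  # blocks until a message is available
        component = self._components[message.event_type]
        # Allocate a pool thread to the target component for this message.
        return self._pool.submit(component, message)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

Because each message runs on its own pool thread, two file lists with the same event type can be processed concurrently, which is one way to read the "parallel" processing claimed above.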
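The scanning thread can be sketched as a background thread that polls the queue with a short timeout and hands each message it finds to a dispatch callback. The polling interval, the `ScanThread` class, and the callback signature are assumptions for illustration; the source only says the thread scans the queue for messages.

```python
import queue
import threading
from typing import Callable


class ScanThread(threading.Thread):
    """Hypothetical scanning thread: checks the message queue and dispatches what it finds."""

    def __init__(self, messages: "queue.Queue[str]", dispatch: Callable[[str], None],
                 poll_seconds: float = 0.1) -> None:
        super().__init__(daemon=True)
        self._messages = messages
        self._dispatch = dispatch
        self._poll_seconds = poll_seconds
        self._stop_requested = threading.Event()

    def run(self) -> None:
        while not self._stop_requested.is_set():
            try:
                message = self._messages.get(timeout=self._poll_seconds)
            except queue.Empty:
                continue                  # queue empty: scan again
            self._dispatch(message)
            self._messages.task_done()    # lets queue.join() track completion

    def stop(self) -> None:
        self._stop_requested.set()
```

Using a timeout instead of a blocking `get()` lets the thread notice a stop request even when the queue stays empty.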
Illustratively, the event manager adds the initial event message to the message queue upon receiving the initial event message sent by the service initiation module. The event manager may retrieve pending event messages from the message queue.
Once the initial event message has been added to the message queue, the queue is no longer empty, and the event messages in it can be taken in turn as to-be-processed event messages.
In some embodiments, the initial event message needs to be generated before it is added to the message queue. In the embodiment of the present invention, the initial event message is generated in advance by the service initiation module, and the specific generation process may include step S101 to step S103. Referring to fig. 4, fig. 4 is a schematic flow chart of generating an initial event message according to an embodiment of the present invention. Referring to fig. 5, fig. 5 is a schematic interaction diagram for generating an initial event message according to an embodiment of the present invention.
Step S101, obtaining a plurality of data source files and a plurality of data source configuration files corresponding to the data source files, wherein the data source configuration files comprise attribute information, and the data source files comprise at least one text file.
For example, the service initiation module may obtain a data source file and a data source configuration file uploaded by a user, and store the obtained data source file and data source configuration file in a local database.
It should be noted that a data source file is a file that contains a data source. Illustratively, the data sources may include various kinds of information about a service project. For example, in a multimedia service project, the data sources may include, but are not limited to, province information, city information, region information, and user information. Data sources typically exist in the form of text files.
In this embodiment of the present invention, there may be multiple data source files, such as data source file A, data source file B, and data source file C. Each data source file includes at least one text file; for example, data source file A may include the text files A1.txt, A2.txt, and A3.txt.
For example, for city information, all city information of a province, such as each city's code, name, and the province it belongs to, may be saved in the text file city.txt. Of course, the city information may also be saved in multiple text files, for example, city1.txt, city2.txt, and city3.txt.
Illustratively, the data source configuration file includes attribute information. For example, the attribute information may include parameters such as a collection path, a file name expression, a working path, and a backup path.
Step S102, screening the text files in the data source files according to the attribute information, and using the screened text files as the to-be-processed file list.
In this embodiment of the present invention, the attribute information may be used as a screening condition for selecting the text files in the data source files that meet the condition. For example, the service initiation module may use the collection path in the attribute information as the screening condition; illustratively, it may instead use the file name expression in the attribute information as the screening condition.
In some embodiments, if the screening condition is collection path A, the text files corresponding to collection path A are selected from data source file A, data source file B, data source file C, and so on, and the selected text files are used as the to-be-processed file list. For example, if the selected text files are A1.txt, B2.txt, and C1.txt, the generated to-be-processed file list includes these three text files.
In some embodiments, if the screening condition is a file name expression, the text files matching the file name expression are selected from data source file A, data source file B, data source file C, and so on, and the selected text files are used as the to-be-processed file list. For example, if the selected text files are A2.txt, B1.txt, and C2.txt, the generated to-be-processed file list includes these three text files.
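Step S102 can be sketched as a filter over candidate file paths. Two interpretations are assumptions here, since the text does not define them: a file matches the collection path when it sits directly in that directory, and the file name expression is treated as a regular expression that must match the whole file name. `screen_files` and the `/collect/...` paths are hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def screen_files(candidates: Iterable[str],
                 collection_path: Optional[str] = None,
                 name_expression: Optional[str] = None) -> List[str]:
    """Return the candidate files that satisfy every screening condition given."""
    pattern = re.compile(name_expression) if name_expression else None
    selected = []
    for candidate in candidates:
        path = PurePosixPath(candidate)
        # Collection-path condition: the file must sit directly in that directory.
        if collection_path is not None and str(path.parent) != collection_path:
            continue
        # File-name-expression condition: the whole file name must match.
        if pattern is not None and not pattern.fullmatch(path.name):
            continue
        selected.append(candidate)
    return selected


FILES = ["/collect/a/A1.txt", "/collect/b/A2.txt", "/collect/a/B2.txt",
         "/collect/b/B1.txt", "/collect/a/C1.txt", "/collect/b/C2.txt"]
```

With these sample paths, `screen_files(FILES, collection_path="/collect/a")` yields the A1.txt, B2.txt, C1.txt list from the collection-path example above; the returned list is the to-be-processed file list.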
By acquiring a plurality of data source files and data source configuration files, attribute information in the data source configuration files can be used as screening conditions, text files in the data source files are screened, and a to-be-processed file list can be obtained; and further generating an initial event message according to the file list to be processed.
Step S103, generating the initial event message according to the file list to be processed.
In the embodiment of the present invention, after determining the to-be-processed file list, the service initiation module may generate an initial event message for processing the to-be-processed file list according to the to-be-processed file list, and determine an event type corresponding to the initial event message.
In some embodiments, after the initial event message corresponding to the to-be-processed file list is generated in step S103, the method may further include: adding the functional configuration information to the initial event message.
Illustratively, the data source configuration file may also include functional configuration information, which defines the functions to be applied when processing the to-be-processed file list. Therefore, after generating the initial event message from the to-be-processed file list, the service initiation module adds the functional configuration information to the initial event message, so that the downstream target data processing components can process the to-be-processed file list using the function information it contains.
In some embodiments, after the initial event message corresponding to the to-be-processed file list is generated in step S103, the method may further include: determining the function type of the first data processing component in the component chain information; and determining the event type of the initial event message from that function type, based on a preset correspondence between event types and function types.
Illustratively, the data source configuration file may also include component chain information, which describes the name, function type, and connection order of each data processing component in the data processing component chain. The function types may include, but are not limited to, data extraction, file transcoding, cleansing conversion, file merging, and file loading.
By way of example, event types may include, but are not limited to, data extraction events, file transcoding events, cleansing conversion events, file merging events, and file loading events, which may be expressed as ExtractEvent, ConvertEvent, TransformEvent, MergeEvent, and LoadEvent, respectively.
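Generating the initial event message (step S103) and attaching the functional configuration to it can be sketched in two steps that mirror the text. The message fields, the `build_initial_message` function, and the `file_transcoding` key are hypothetical; the event type is passed in here because the text determines it separately from the first component of the chain.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EventMessage:
    event_type: str
    files: List[str]
    function_config: Dict[str, Any] = field(default_factory=dict)


def build_initial_message(files: List[str], first_event_type: str,
                          function_config: Dict[str, Any]) -> EventMessage:
    # Step S103: one initial message covering the whole to-be-processed file list.
    message = EventMessage(event_type=first_event_type, files=list(files))
    # Then attach a shallow copy of the functional configuration, so that
    # downstream components can read e.g. transcoding or merging settings.
    message.function_config = dict(function_config)
    return message
```

Carrying the configuration inside the message keeps each component self-sufficient: it needs nothing beyond the message it receives.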
As shown in fig. 6, fig. 6 is a schematic structural diagram of a data processing component chain according to an embodiment of the present invention. In this embodiment, the data processing component chain may include, but is not limited to, a data extraction component, a file transcoding component, a cleansing conversion component, a file merging component, and a file loading component, connected as: ExtractHandler -> ConvertHandler -> TransformHandler -> MergeHandler -> LoadHandler.
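The connection order of FIG. 6 can be represented as an ordered list, from which the next (lower-level) component is found by position. This is an illustrative sketch; the list representation and the `next_component` and `describe` helpers are assumptions, and real chains may contain other components.

```python
from typing import List, Optional

# Connection order of the example chain in FIG. 6.
CHAIN: List[str] = ["ExtractHandler", "ConvertHandler", "TransformHandler",
                    "MergeHandler", "LoadHandler"]


def next_component(chain: List[str], current: str) -> Optional[str]:
    """Return the lower-level component after `current`, or None at the end."""
    index = chain.index(current)  # ValueError if `current` is not in the chain
    return chain[index + 1] if index + 1 < len(chain) else None


def describe(chain: List[str]) -> str:
    return " -> ".join(chain)
```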
In some embodiments, a function type corresponding to a first data processing component in the component chain information is determined.
In this embodiment of the present invention, data processing components of different function types process to-be-processed event messages of different event types. Illustratively, the data extraction component ExtractHandler processes to-be-processed event messages whose event type is ExtractEvent, and the file transcoding component ConvertHandler processes those whose event type is ConvertEvent. Therefore, after the service initiation module generates the initial event message from the to-be-processed file list, it must determine the event type of the initial event message, so that the event manager can dispatch the message, according to its event type, to the data processing component with the corresponding function type.
Illustratively, the data processing components correspond to the function types as shown in table 1.
Table 1 Function types of data processing components

| Data processing component | Function type |
|---|---|
| ExtractHandler | Data extraction |
| ConvertHandler | File transcoding |
| TransformHandler | Cleansing conversion |
| MergeHandler | File merging |
| LoadHandler | File loading |
For example, if the first data processing component in the component chain information is the data extraction component ExtractHandler, it may be determined that the function type corresponding to the first data processing component is data extraction.
For another example, if the first data processing component in the component chain information is the cleansing conversion component TransformHandler, it may be determined that the function type corresponding to the first data processing component is cleansing conversion.
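The lookup described above can be sketched in Python as follows. This is a minimal illustration only, assuming the component chain information has already been parsed into an ordered list of component names; `COMPONENT_FUNCTION_TYPES` and `first_component_function_type` are hypothetical names that do not appear in the source.

```python
# Hypothetical lookup table reproducing Table 1: component name -> function type.
COMPONENT_FUNCTION_TYPES = {
    "ExtractHandler": "data extraction",
    "ConvertHandler": "file transcoding",
    "TransformHandler": "cleansing conversion",
    "MergeHandler": "file merging",
    "LoadHandler": "file loading",
}


def first_component_function_type(component_chain: list[str]) -> str:
    """Return the function type of the first component in the component chain information."""
    if not component_chain:
        raise ValueError("component chain information is empty")
    first_component = component_chain[0]
    if first_component not in COMPONENT_FUNCTION_TYPES:
        raise ValueError(f"unknown data processing component: {first_component}")
    return COMPONENT_FUNCTION_TYPES[first_component]


# Assumed chain from fig. 6, listed in connection order.
chain = ["ExtractHandler", "ConvertHandler", "TransformHandler", "MergeHandler", "LoadHandler"]
function_type = first_component_function_type(chain)
```

Here `function_type` evaluates to "data extraction"; a chain starting with TransformHandler would yield "cleansing conversion".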
In some embodiments, based on a preset correspondence between the event type and the function type, the event type corresponding to the initial event message is determined according to the function type corresponding to the first data processing component.
It should be noted that the initial event message is assigned to the first data processing component in the data processing component chain for processing; therefore, the event type of the initial event message must match the function type of the first data processing component in the chain.
For example, the event type and the function type may be associated in advance and stored in a local database, as shown in table 2.
Table 2 Correspondence between function types and event types
Function type | Event type
Data extraction | ExtractEvent
File transcoding | ConvertEvent
Cleansing conversion | TransformEvent
File merging | MergeEvent
File loading | LoadEvent
For example, if the function type corresponding to the first data processing component is data extraction, it may be determined that the event type corresponding to the initial event message is ExtractEvent. For another example, if the function type corresponding to the first data processing component is cleansing conversion, it may be determined that the event type corresponding to the initial event message is TransformEvent.
By determining the function type corresponding to the first data processing component in the component chain information, the event type corresponding to the initial event message can be determined according to the function type corresponding to the first data processing component based on the preset corresponding relationship between the event type and the function type.
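A minimal sketch of deriving the initial event type, assuming the two preset tables are held as in-memory dictionaries and that an event message is a simple record; `EventMessage`, `build_initial_event`, and the configuration key `cleansing_algorithm` are hypothetical illustrations, not structures defined by the source.

```python
from dataclasses import dataclass, field

# Hypothetical lookup tables reproducing Table 1 and Table 2.
COMPONENT_FUNCTION_TYPES = {
    "ExtractHandler": "data extraction",
    "ConvertHandler": "file transcoding",
    "TransformHandler": "cleansing conversion",
    "MergeHandler": "file merging",
    "LoadHandler": "file loading",
}
FUNCTION_TO_EVENT_TYPE = {
    "data extraction": "ExtractEvent",
    "file transcoding": "ConvertEvent",
    "cleansing conversion": "TransformEvent",
    "file merging": "MergeEvent",
    "file loading": "LoadEvent",
}


@dataclass
class EventMessage:
    """Hypothetical event message: event type, file list, and function configuration."""
    event_type: str
    file_list: list[str]
    function_config: dict = field(default_factory=dict)


def build_initial_event(file_list, component_chain, function_config):
    """Derive the initial event type from the first component's function type."""
    function_type = COMPONENT_FUNCTION_TYPES[component_chain[0]]
    event_type = FUNCTION_TO_EVENT_TYPE[function_type]
    return EventMessage(event_type, list(file_list), dict(function_config))


initial = build_initial_event(
    ["a.txt", "b.txt"],
    ["TransformHandler", "MergeHandler", "LoadHandler"],
    {"cleansing_algorithm": "trim"},
)
```

Because this assumed chain starts with TransformHandler, `initial.event_type` is "TransformEvent".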
In some embodiments, obtaining the pending event message in step S10 may include: taking the initial event message in the message queue as the pending event message.
For example, the event manager may scan the message queue through a scanning thread periodically or in real time, and take the initial event message in the message queue as the pending event message.
Referring to fig. 7, fig. 7 is a schematic interaction diagram for acquiring a pending event message according to an embodiment of the present invention. In some embodiments, the message queue is scanned by the scanning thread, and after determining that there is an event message in the message queue, the event message in the message queue is sequentially used as a pending event message. For example, the first event message in the message queue may be sequentially used as the pending event message according to a first-in first-out principle. It should be noted that the first event message in the message queue is an initial event message, and the subsequent event messages in the message queue are lower-level event messages generated by the data processing component after the pending event messages are processed.
In other embodiments, the message queue is scanned by the scanning thread, and if there is no event message in the message queue, the scanning thread keeps scanning.
By sequentially taking the event messages in the message queue as pending event messages, data processing components of different function types can each process the to-be-processed file list; the components are isolated from one another and can process file lists in parallel.
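The scanning behaviour described above can be sketched with Python's standard `queue` and `threading` modules. This is an illustration under assumptions: the `dispatch` callback stands in for the event manager's distribution step, and `scan_message_queue` and `poll_interval` are hypothetical names.

```python
import queue
import threading


def scan_message_queue(message_queue, dispatch, stop_event, poll_interval=0.1):
    """Scanning-thread loop: take event messages in FIFO order and dispatch them.

    When the queue is empty, the loop simply keeps scanning until stop_event is set.
    """
    while not stop_event.is_set():
        try:
            pending = message_queue.get(timeout=poll_interval)
        except queue.Empty:
            continue  # no event message yet: keep scanning
        try:
            dispatch(pending)  # hand the pending event message to the event manager
        finally:
            message_queue.task_done()


# Usage: the initial event message is queued first, lower-level messages after it.
message_queue = queue.Queue()  # queue.Queue is first-in first-out
handled = []
stop = threading.Event()
scanner = threading.Thread(
    target=scan_message_queue, args=(message_queue, handled.append, stop), daemon=True
)
scanner.start()
for message in ["initial", "lower-level 1", "lower-level 2"]:
    message_queue.put(message)
message_queue.join()  # wait until every queued message has been dispatched
stop.set()
scanner.join()
```

The timeout on `get` lets the thread wake up regularly, so an empty queue results in continued scanning rather than a permanent block.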
Step S20, determining a data processing component chain corresponding to the event message to be processed currently, and determining a target data processing component in the data processing component chain according to the event type in the event message to be processed.
In some embodiments, determining the data processing component chain to which the pending event message currently corresponds may include: determining that data processing component chain according to the component chain information.
It can be understood that, since the component chain information includes information such as the name, the function type, and the connection order of each data processing component in the data processing component chain, the data processing component chain to which the pending event message currently corresponds may be determined according to the component chain information. It should be noted that the data processing component chains used to process the to-be-processed file list may be the same or different across service items. The user can configure the data processing component chain in advance according to the actual situation, and define or update the component chain information in the data source configuration file.
For example, the component chain information may be obtained from the data source configuration file, and the current data processing component chain of the event message to be processed may be determined according to the component chain information.
In some embodiments, according to the name, the function type, and the connection order of each data processing component in the component chain information, the current data processing component chain of the pending event message may be determined as ExtractHandler -> ConvertHandler -> TransformHandler -> MergeHandler -> LoadHandler, as shown in fig. 6. It should be noted that the event manager sequentially takes the event messages in the message queue as pending event messages, and these messages are processed in turn along the data processing component chain, so that data processing components of different function types process to-be-processed file lists concurrently, which improves data processing efficiency.
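A minimal sketch of rebuilding the chain from component chain information, assuming each entry carries a name, a function type, and a numeric connection order; the dictionary keys, `ComponentInfo`, `build_component_chain`, and `describe_chain` are hypothetical, since the source does not specify the configuration file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentInfo:
    """One entry of the component chain information (hypothetical field names)."""
    name: str
    function_type: str
    order: int


def build_component_chain(chain_info: list[dict]) -> list[ComponentInfo]:
    """Order the components by their connection order to obtain the chain."""
    components = [
        ComponentInfo(entry["name"], entry["function_type"], int(entry["order"]))
        for entry in chain_info
    ]
    orders = [component.order for component in components]
    if len(set(orders)) != len(orders):
        raise ValueError("duplicate connection order in component chain information")
    return sorted(components, key=lambda component: component.order)


def describe_chain(chain: list[ComponentInfo]) -> str:
    return " -> ".join(component.name for component in chain)


# Assumed parsed component chain information (entries deliberately out of order).
chain_info = [
    {"name": "MergeHandler", "function_type": "file merging", "order": 4},
    {"name": "ExtractHandler", "function_type": "data extraction", "order": 1},
    {"name": "LoadHandler", "function_type": "file loading", "order": 5},
    {"name": "ConvertHandler", "function_type": "file transcoding", "order": 2},
    {"name": "TransformHandler", "function_type": "cleansing conversion", "order": 3},
]
chain = build_component_chain(chain_info)
```

Sorting by the connection order rather than by file position means the chain does not depend on how entries happen to be listed in the configuration file.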
By determining the data processing component chain corresponding to the event message to be processed currently, the target data processing component corresponding to the event message to be processed can be determined in the data processing component chain subsequently.
In some embodiments, determining a target data processing component in the chain of data processing components according to the event type in the event message to be processed may include: determining a target function type corresponding to the event type in the event message to be processed based on a preset corresponding relation between the event type and the function type; and determining the data processing component corresponding to the target function type as a target data processing component.
It should be noted that, in the embodiment of the present invention, since the event type and the function type are associated in advance and stored in the local database, a preset corresponding relationship between the event type and the function type may be acquired from the local database.
Illustratively, based on a preset corresponding relationship between an event type and a function type, if the event type in the event message to be processed is an ExtractEvent, it may be determined that a target function type corresponding to the event type ExtractEvent is "data extraction"; since the data processing component corresponding to the target function type "data extraction" is the data processing component ExtractHandler, the data processing component ExtractHandler can be used as the target data processing component corresponding to the to-be-processed event message.
For another example, if the event type in the pending event message is ConvertEvent, it may be determined that the target function type corresponding to the event type ConvertEvent is "file transcoding"; because the data processing component corresponding to the target function type "file transcoding" is ConvertHandler, the data processing component ConvertHandler can be used as the target data processing component for the pending event message.
In this way, the target function type corresponding to the event type in the pending event message is determined based on the preset correspondence between event types and function types, and the data processing component corresponding to the target function type is then determined as the target data processing component.
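The two-step lookup (event type to function type, then function type to component) might look like the following sketch. It assumes the chain is available as (component name, function type) pairs; `EVENT_TO_FUNCTION_TYPE`, `CHAIN`, and `find_target_component` are hypothetical names.

```python
# Hypothetical inverse of Table 2: event type -> function type.
EVENT_TO_FUNCTION_TYPE = {
    "ExtractEvent": "data extraction",
    "ConvertEvent": "file transcoding",
    "TransformEvent": "cleansing conversion",
    "MergeEvent": "file merging",
    "LoadEvent": "file loading",
}

# The chain from fig. 6 as (component name, function type) pairs.
CHAIN = [
    ("ExtractHandler", "data extraction"),
    ("ConvertHandler", "file transcoding"),
    ("TransformHandler", "cleansing conversion"),
    ("MergeHandler", "file merging"),
    ("LoadHandler", "file loading"),
]


def find_target_component(event_type: str, chain: list[tuple[str, str]]) -> str:
    """Return the component in the chain whose function type matches the event type."""
    target_function_type = EVENT_TO_FUNCTION_TYPE.get(event_type)
    if target_function_type is None:
        raise ValueError(f"no function type registered for event type {event_type}")
    for name, function_type in chain:
        if function_type == target_function_type:
            return name
    raise LookupError(f"no component with function type '{target_function_type}' in chain")
```

For example, `find_target_component("ConvertEvent", CHAIN)` returns "ConvertHandler". Searching the current chain, rather than a global table, keeps the lookup correct after the user modifies the chain.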
Step S30, sending the to-be-processed event message to the target data processing component, so that the target data processing component processes the to-be-processed file list according to the to-be-processed event message.
In the embodiment of the invention, after the target data processing component corresponding to the pending event message is determined, the event manager sends the pending event message to the target data processing component. After receiving the pending event message from the event manager, the target data processing component may perform functional processing on the to-be-processed file list according to the function configuration information in the pending event message, so as to obtain a processed file list.
Sending the pending event message to the target data processing component, so that the target data processing component processes the to-be-processed file list according to the pending event message, divides the data processing process among a plurality of data processing components. This makes the data processing process more flexible and makes the data processing components easier to extend and adjust.
In some embodiments, as shown in fig. 8, after the to-be-processed event message is sent to the target data processing component in step S30, the following steps S40 and S50 may be further included.
Step S40, determining a target thread in a thread pool, and allocating the target thread to the target data processing component, so that the target data processing component processes the to-be-processed file list based on the target thread.
For example, the thread pool may include a plurality of threads, and the event manager may select an idle thread in the thread pool as the target thread. The target thread is allocated to the target data processing component, so that the target data processing component processes the to-be-processed file list based on the target thread. Because different data processing components have their own target threads, the data processing components can process data in parallel, which effectively improves data processing efficiency.
In some embodiments, the event manager may also adjust the size of the thread pool according to the actual situation. For example, when the number of event messages in the message queue is greater than a preset threshold, the size of the thread pool may be increased, so that more available threads can be allocated to the data processing components, thereby improving data processing efficiency.
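A minimal sketch of running a component on a pooled thread, using Python's `concurrent.futures.ThreadPoolExecutor`. The sizing rule (double the pool once the backlog exceeds a threshold) and the names `choose_pool_size` and `extract_handler` are hypothetical; the source only says the pool may grow when the queue length exceeds a preset value.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_pool_size(queue_length: int, base_size: int = 4,
                     max_size: int = 16, threshold: int = 100) -> int:
    """Hypothetical sizing rule: double the pool when the backlog exceeds the threshold."""
    if queue_length > threshold:
        return min(base_size * 2, max_size)
    return base_size


def extract_handler(message: dict) -> str:
    """Stand-in for a data processing component; runs on the thread it is given."""
    return f"extracted {len(message['file_list'])} files"


# The event manager submits the component's work to the pool; an idle worker
# thread plays the role of the target thread.
with ThreadPoolExecutor(max_workers=choose_pool_size(queue_length=10)) as pool:
    future = pool.submit(extract_handler, {"file_list": ["a.txt", "b.txt"]})
    result = future.result()
```

`ThreadPoolExecutor` has no public method for resizing an existing pool, so this sketch fixes the size when the executor is created; growing the pool at run time would mean creating a new executor or using a different pool implementation.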
Referring to fig. 9, fig. 9 is a schematic interaction diagram for processing a pending file list in a data processing component chain according to an embodiment of the present invention. In the embodiment of the present invention, after receiving the target thread, the target data processing component may process the to-be-processed file list according to the to-be-processed event message through the target thread.
In some embodiments, processing the to-be-processed file list according to the pending event message may include: acquiring the function configuration information in the pending event message, and completing the functional processing of the to-be-processed file list in the current working directory according to the function information in the function configuration information.
For example, the function information in the function configuration information may include, but is not limited to, field mapping information, file transcoding information, file merging information, a cleansing algorithm, a conversion algorithm, file loading information, and the like. It should be noted that target data processing components of different function types require different function information when processing the to-be-processed file list. For example, the data extraction component ExtractHandler requires field mapping information, through which data extraction on the to-be-processed file list is performed. For another example, the file transcoding component ConvertHandler requires file transcoding information, through which the to-be-processed file list is transcoded.
For example, when the target data processing component is the data extraction component, data extraction on the to-be-processed file list is completed in the current working directory according to the field mapping information in the function configuration information. Here, the current working directory may be denoted as the Extract directory.
For example, when the target data processing component is the cleansing conversion component, cleansing conversion of the to-be-processed file list is completed in the current working directory according to the cleansing algorithm and the conversion algorithm in the function configuration information. Here, the current working directory may be denoted as the Transform directory.
For another example, when the target data processing component is a file loading component, the file loading processing of the file list to be processed is completed in the current working directory according to the file loading information in the function configuration information.
By acquiring the function configuration information in the event message to be processed, the function processing of the file list to be processed can be completed in the current working directory according to the function information in the function configuration information.
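One way a component could pick out the function information it needs and resolve its working directory is sketched below. The configuration keys in `REQUIRED_FUNCTION_INFO` and the directory names other than Extract and Transform are hypothetical assumptions; the source does not define the configuration schema.

```python
from pathlib import Path

# Hypothetical keys of the function configuration information needed by each component.
REQUIRED_FUNCTION_INFO = {
    "ExtractHandler": ("field_mapping",),
    "ConvertHandler": ("file_transcoding",),
    "TransformHandler": ("cleansing_algorithm", "conversion_algorithm"),
    "MergeHandler": ("file_merging",),
    "LoadHandler": ("file_loading",),
}

# Hypothetical working-directory names; the text only names Extract and Transform.
WORK_DIRS = {
    "ExtractHandler": "Extract",
    "ConvertHandler": "Convert",
    "TransformHandler": "Transform",
    "MergeHandler": "Merge",
    "LoadHandler": "Load",
}


def select_function_info(component: str, function_config: dict) -> dict:
    """Pick only the function information that this component needs."""
    keys = REQUIRED_FUNCTION_INFO[component]
    missing = [key for key in keys if key not in function_config]
    if missing:
        raise KeyError(f"{component} is missing function information: {missing}")
    return {key: function_config[key] for key in keys}


def files_in_work_dir(base_dir: str, component: str, file_list: list[str]) -> list[Path]:
    """Resolve the to-be-processed files inside the component's current working directory."""
    work_dir = Path(base_dir) / WORK_DIRS[component]
    return [work_dir / name for name in file_list]
```

Because one event message carries the whole function configuration, each component validates only its own keys, so a missing cleansing algorithm fails at TransformHandler rather than earlier in the chain.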
Step S50, receiving a lower-level event message sent by the target data processing component after it completes processing the to-be-processed file list.
In the embodiment of the present invention, after the target data processing component finishes processing the to-be-processed file list, a lower level event message may be generated and sent to the event manager.
For example, the target data processing component may determine the functional type of the next data processing component from the component chain information; and then determining the event type corresponding to the lower-level event message according to the function type corresponding to the next data processing component based on the preset corresponding relation between the event type and the function type, and adding the event type to the lower-level event message. The event manager may thus determine the next target data processing component according to the event type corresponding to the lower level event message.
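A minimal sketch of generating the lower-level event message, assuming the chain is an ordered list of component names and the two preset tables are dictionaries. Returning `None` for the last component is an assumption (the source does not say what the final component emits); `make_lower_level_event` and `EventMessage` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lookup tables reproducing Table 1 and Table 2.
COMPONENT_FUNCTION_TYPES = {
    "ExtractHandler": "data extraction",
    "ConvertHandler": "file transcoding",
    "TransformHandler": "cleansing conversion",
    "MergeHandler": "file merging",
    "LoadHandler": "file loading",
}
FUNCTION_TO_EVENT_TYPE = {
    "data extraction": "ExtractEvent",
    "file transcoding": "ConvertEvent",
    "cleansing conversion": "TransformEvent",
    "file merging": "MergeEvent",
    "file loading": "LoadEvent",
}


@dataclass
class EventMessage:
    event_type: str
    file_list: list[str]
    function_config: dict = field(default_factory=dict)


def make_lower_level_event(current_component: str, component_chain: list[str],
                           message: EventMessage) -> Optional[EventMessage]:
    """Build the event message for the next component in the chain."""
    position = component_chain.index(current_component)
    if position + 1 == len(component_chain):
        return None  # assumption: the last component emits no lower-level event
    next_component = component_chain[position + 1]
    event_type = FUNCTION_TO_EVENT_TYPE[COMPONENT_FUNCTION_TYPES[next_component]]
    return EventMessage(event_type, list(message.file_list), dict(message.function_config))


chain = ["ExtractHandler", "ConvertHandler", "TransformHandler", "MergeHandler", "LoadHandler"]
lower = make_lower_level_event("ExtractHandler", chain, EventMessage("ExtractEvent", ["a.txt"]))
```

After ExtractHandler finishes, `lower.event_type` is "ConvertEvent", which the event manager can then route to ConvertHandler.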
Referring to fig. 8, after the lower-level event message sent by the target data processing component is received in step S50, steps S60 to S80 may be further included.
Step S60, adding the lower level event message to the message queue, and obtaining the next to-be-processed event message from the message queue.
For example, the event manager may add the lower level event message to the message queue after receiving the lower level event message, and use the current first event message in the message queue as the next event message to be processed on the basis of the first-in-first-out principle.
By receiving the lower-level event message sent by the target data processing component after it finishes processing the to-be-processed file list, and acquiring the next pending event message from the message queue, the to-be-processed file list can be processed sequentially along the data processing component chain.
Step S70, determining the lower-level data processing component of the target data processing component in the data processing component chain.
In an embodiment of the present invention, the data processing component chain to which the pending event message currently corresponds is the one shown in fig. 6.
For example, if the target data processing component is ExtractHandler, it may be determined that the subordinate data processing component is ConvertHandler.
For example, if the target data processing component is ConvertHandler, it may be determined that the subordinate data processing component is TransformHandler.
Step S80, moving the to-be-processed file list to a work directory corresponding to the lower data processing component, so that the lower data processing component processes the to-be-processed file list according to the next to-be-processed event message after receiving the next to-be-processed event message.
After the subordinate data processing component is determined, the target data processing component needs to move the file list to be processed to a working directory corresponding to the subordinate data processing component. Therefore, after receiving the next to-be-processed event message, the lower-level data processing component can process the to-be-processed file list according to the next to-be-processed event message in the local working directory.
It should be noted that moving the to-be-processed file list to the working directory of the lower-level data processing component allows the list to be processed sequentially along the data processing component chain, while different data processing components process file lists in parallel, which improves data processing efficiency.
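The directory hand-off in step S80 can be sketched with `shutil.move`. This assumes each component owns a directory (the Extract and Convert names follow the pattern in the text but are illustrative), and `move_to_work_dir` is a hypothetical helper; the temporary directory in the usage only keeps the example self-contained.

```python
import shutil
import tempfile
from pathlib import Path


def move_to_work_dir(file_paths, lower_work_dir):
    """Move each to-be-processed file into the lower-level component's working directory."""
    target = Path(lower_work_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in file_paths:
        source = Path(path)
        destination = target / source.name
        shutil.move(str(source), str(destination))
        moved.append(destination)
    return moved


# Usage: hand a file from the Extract directory over to the Convert directory.
with tempfile.TemporaryDirectory() as base:
    extract_dir = Path(base, "Extract")
    extract_dir.mkdir()
    (extract_dir / "a.txt").write_text("records")
    moved = move_to_work_dir([extract_dir / "a.txt"], Path(base, "Convert"))
    moved_names = [path.name for path in moved]
    moved_parent = moved[0].parent.name
    source_still_exists = (extract_dir / "a.txt").exists()
    content = moved[0].read_text()
```

Moving rather than copying means a file is owned by exactly one component at a time, which is what lets components work on different file lists in parallel without interfering.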
In some embodiments, the data processing method provided in the embodiments of the present invention may further include: when the modification operation of the data processing component chain by the user is acquired, modifying the data processing component chain according to the modification operation, and updating component chain information corresponding to the data processing component chain.
Illustratively, the modification operation includes adding a preset data processing component, deleting a data processing component in the chain of data processing components, and adjusting the order of the components of the chain of data processing components.
It should be noted that, the user may modify the data processing component chain on the system according to the actual data processing process of the text file in combination with the actual situation of the service item. The component management module can modify the data processing component chain according to the modification operation of the user; the data source configuration module may update the component chain information in the data source configuration file after detecting that the data processing component chain has changed.
Illustratively, when the actual data processing of the text file needs an additional step of loading into the GP database, the user may add a component GpLoadHandler to the data processing component chain, for example after the file loading component LoadHandler. At this time, the data processing component chain may include: ExtractHandler -> ConvertHandler -> TransformHandler -> MergeHandler -> LoadHandler -> GpLoadHandler, as shown in fig. 10, and fig. 10 is a schematic structural diagram of another data processing component chain provided by an embodiment of the present invention.
For example, a user may delete a component from the data processing component chain, such as the component MergeHandler. At this time, the data processing component chain may include: ExtractHandler -> ConvertHandler -> TransformHandler -> LoadHandler, as shown in fig. 11, and fig. 11 is a schematic structural diagram of another data processing component chain provided by an embodiment of the present invention.
Illustratively, the user may also adjust the order of components in the data processing component chain, for example by swapping the component ConvertHandler and the component TransformHandler. At this time, the data processing component chain may include: ExtractHandler -> TransformHandler -> ConvertHandler -> MergeHandler -> LoadHandler, as shown in fig. 12, and fig. 12 is a schematic structural diagram of another data processing component chain provided by an embodiment of the present invention.
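The three modification operations can be sketched as pure list functions that return a new chain, followed by regeneration of the component chain information. The helper names and the `{"name", "order"}` record format are hypothetical assumptions for illustration.

```python
def add_component(chain, component, after=None):
    """Insert a preset component after `after`, or append it when `after` is None."""
    updated = list(chain)
    if after is None:
        updated.append(component)
    else:
        updated.insert(updated.index(after) + 1, component)
    return updated


def delete_component(chain, component):
    """Remove a component; list.remove raises ValueError if it is absent."""
    updated = list(chain)
    updated.remove(component)
    return updated


def swap_components(chain, first, second):
    """Exchange the positions of two components in the chain."""
    updated = list(chain)
    i, j = updated.index(first), updated.index(second)
    updated[i], updated[j] = updated[j], updated[i]
    return updated


def to_chain_info(chain):
    """Rebuild the component chain information (hypothetical format) after a change."""
    return [{"name": name, "order": position} for position, name in enumerate(chain, start=1)]


BASE_CHAIN = ["ExtractHandler", "ConvertHandler", "TransformHandler", "MergeHandler", "LoadHandler"]
with_gp_load = add_component(BASE_CHAIN, "GpLoadHandler", after="LoadHandler")  # fig. 10
without_merge = delete_component(BASE_CHAIN, "MergeHandler")  # fig. 11
reordered = swap_components(BASE_CHAIN, "ConvertHandler", "TransformHandler")  # fig. 12
```

Returning new lists leaves the original chain untouched, so the component chain information can be regenerated from the result in one step.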
When the modification operation of the data processing component chain by the user is obtained, the data processing component chain is modified according to the modification operation, so that the flexible arrangement of the data processing process can be realized, and meanwhile, the data processing component chain can be more conveniently expanded.
In some embodiments, the data processing method provided in the embodiments of the present invention may further include: when the configuration operation of the user on the data source configuration file is acquired, configuring the data source corresponding to the data source configuration file according to the configuration operation.
The configuration operation comprises adding a data source configuration file, deleting the data source configuration file and modifying the data source configuration file.
It should be noted that, in the embodiment of the present invention, a user may configure a data source through a data source configuration file. Such as adding a data source, deleting an existing data source, or modifying an existing data source.
In the embodiment of the present invention, a data source configuration module may receive a configuration operation of a user on a data source configuration file, and configure the data source corresponding to the data source configuration file according to the configuration operation.
Illustratively, when configuration operation of a newly added data source configuration file of a user is obtained, a data source corresponding to the newly added data source configuration file may be accessed. It should be noted that accessing a data source means that the data processing system supports processing of that type of data source. By adding the data source configuration file, the data source corresponding to the newly added data source configuration file can be quickly and flexibly accessed in the data processing system.
For example, when a configuration operation that a user deletes a data source configuration file is obtained, a data source corresponding to the deleted data source configuration file may be deleted.
Through the configuration operation of the user on the data source configuration file, the data source can be newly added, deleted and modified, so that a more flexible and convenient data source configuration mode can be provided.
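The add, delete, and modify operations on data source configuration files can be sketched as a small registry. This is a hypothetical illustration: `DataSourceRegistry`, the file name `billing.conf`, and the keys `file_pattern` and `encoding` are invented for the example and are not defined by the source.

```python
class DataSourceRegistry:
    """Hypothetical registry of accessed data sources, keyed by configuration-file name."""

    def __init__(self):
        self._sources: dict[str, dict] = {}

    def add(self, name: str, config: dict) -> None:
        """Adding a configuration file gives the system access to that data source."""
        if name in self._sources:
            raise ValueError(f"data source already configured: {name}")
        self._sources[name] = dict(config)

    def delete(self, name: str) -> None:
        """Deleting a configuration file removes the corresponding data source."""
        del self._sources[name]

    def modify(self, name: str, **changes) -> None:
        """Modifying a configuration file updates the corresponding data source."""
        if name not in self._sources:
            raise KeyError(name)
        self._sources[name].update(changes)

    def get(self, name: str) -> dict:
        return dict(self._sources[name])

    def is_accessed(self, name: str) -> bool:
        return name in self._sources


registry = DataSourceRegistry()
registry.add("billing.conf", {"file_pattern": "*.txt", "encoding": "gbk"})
registry.modify("billing.conf", encoding="utf-8")
encoding_after_modify = registry.get("billing.conf")["encoding"]
registry.delete("billing.conf")
```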
According to the data processing method, the computer device, and the storage medium provided by the above embodiments, a plurality of data source files and the data source configuration file are acquired, and the attribute information in the data source configuration file is used as a screening condition to screen the text files in the data source files, so that a to-be-processed file list is obtained and the initial event message is generated from it. The function configuration information is added to the initial event message, so that the subsequent target data processing component can process the to-be-processed file list using the function information in the function configuration information. The function type of the first data processing component in the component chain information is determined, and the event type of the initial event message is then determined from that function type based on the preset correspondence between event types and function types. The data processing component chain to which the pending event message currently corresponds is determined, so that the target data processing component for the pending event message can subsequently be determined in that chain. The pending event message is sent to the target data processing component, so that the target data processing component processes the to-be-processed file list according to the pending event message; the data processing process is thus divided among a plurality of data processing components, which makes it more flexible and makes the components easier to extend and adjust. The target thread is allocated to the target data processing component, so that the component processes the to-be-processed file list based on the target thread; because different data processing components have their own target threads, they can process data in parallel, which effectively improves data processing efficiency. When a modification operation on the data processing component chain by the user is acquired, the chain is modified accordingly, which allows the data processing process to be arranged flexibly and the chain to be extended more conveniently. Through the user's configuration operations on the data source configuration file, data sources can be added, deleted, and modified, providing a more flexible and convenient way to configure data sources.
Embodiments of the present invention also provide a storage medium for computer-readable storage, the storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of any of the data processing methods provided in the specification of the embodiments of the present invention.
For example, the program is loaded by a processor and may perform the following steps:
acquiring a to-be-processed event message, wherein the to-be-processed event message comprises a to-be-processed file list and an event type; determining a data processing component chain corresponding to the event message to be processed currently, and determining a target data processing component in the data processing component chain according to the event type in the event message to be processed; and sending the event message to be processed to the target data processing component, so that the target data processing component processes the file list to be processed according to the event message to be processed.
The storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware embodiment, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
It should be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A method of data processing, the method comprising:
acquiring a to-be-processed event message, wherein the to-be-processed event message comprises a to-be-processed file list and an event type;
determining a data processing component chain corresponding to the event message to be processed currently, and determining a target data processing component in the data processing component chain according to the event type in the event message to be processed;
and sending the event message to be processed to the target data processing component, so that the target data processing component processes the file list to be processed according to the event message to be processed.
2. The data processing method according to claim 1, further comprising, before acquiring the event message to be processed:
adding an initial event message to a message queue;
wherein acquiring the event message to be processed comprises:
taking the initial event message in the message queue as the event message to be processed.
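The queue step of claim 2 can be sketched with Python's standard `queue.Queue`, which is thread-safe and therefore also fits the thread pool of claim 9. This is an illustration under assumptions: the patent does not name a queue implementation, and `enqueue_initial` and `acquire_pending` are hypothetical helpers.

```python
import queue
from dataclasses import dataclass
from typing import List


@dataclass
class EventMessage:
    files: List[str]
    event_type: str


def enqueue_initial(q: "queue.Queue[EventMessage]", message: EventMessage) -> None:
    """Step performed before acquisition: place the initial event message on the queue."""
    q.put(message)


def acquire_pending(q: "queue.Queue[EventMessage]", timeout: float = 1.0) -> EventMessage:
    """Take the next message from the queue; it becomes the event message to be processed.

    Raises queue.Empty if nothing arrives within `timeout` seconds.
    """
    return q.get(timeout=timeout)


message_queue: "queue.Queue[EventMessage]" = queue.Queue()
initial = EventMessage(["orders.txt"], "DATA_EXTRACTION")
enqueue_initial(message_queue, initial)
pending = acquire_pending(message_queue)
```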
3. The data processing method according to claim 2, further comprising, before adding the initial event message to the message queue:
acquiring a plurality of data source files and a plurality of data source configuration files corresponding to the data source files, wherein the data source configuration files comprise attribute information and the data source files comprise at least one text file;
screening the text files in the data source files according to the attribute information, and taking the text files obtained by the screening as the file list to be processed; and
generating the initial event message according to the file list to be processed.
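Claim 3 leaves the content of the "attribute information" open. The sketch below assumes, purely for illustration, that it carries a filename glob under a hypothetical `file_pattern` key, and that text files are recognised by extension; `TEXT_SUFFIXES` and `build_initial_message` are likewise hypothetical. The event type is left empty because claim 5 derives it later from the component chain.

```python
import fnmatch
from dataclasses import dataclass
from pathlib import PurePath
from typing import Dict, List, Optional

TEXT_SUFFIXES = {".txt", ".csv", ".dat"}  # assumption: how a "text file" is recognised


@dataclass
class EventMessage:
    files: List[str]
    event_type: Optional[str] = None  # filled in later from the component chain (claim 5)


def build_initial_message(source_files: List[str], attributes: Dict[str, str]) -> EventMessage:
    """Screen one data source's files by its attribute information and wrap the result."""
    # Assumption: the attribute information carries a filename glob such as "orders_*.csv".
    pattern = attributes.get("file_pattern", "*")
    selected = [
        name
        for name in source_files
        if PurePath(name).suffix.lower() in TEXT_SUFFIXES and fnmatch.fnmatch(name, pattern)
    ]
    return EventMessage(files=sorted(selected))


message = build_initial_message(
    ["orders_01.csv", "orders_02.csv", "orders.log", "users_01.csv"],
    {"file_pattern": "orders_*.csv"},
)
```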
4. The data processing method according to claim 3, wherein the data source configuration file further comprises functional configuration information, and the method further comprises, after generating the initial event message according to the file list to be processed:
adding the functional configuration information to the initial event message.
5. The data processing method according to claim 3, wherein the data source configuration file further comprises component chain information, and the method further comprises, after generating the initial event message according to the file list to be processed:
determining a function type corresponding to the first data processing component in the component chain information; and
determining the event type corresponding to the initial event message from the function type of the first data processing component, based on a preset correspondence between event types and function types.
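The derivation in claim 5 amounts to a lookup keyed by the first component's function type. In this sketch the component chain information is assumed to be an ordered list of dictionaries with a hypothetical `function_type` key, and the string spellings of the five types are assumptions; only the pairs themselves come from claim 6.

```python
from typing import Dict, List

# Hypothetical preset correspondence, keyed by function type.
FUNCTION_TO_EVENT: Dict[str, str] = {
    "data_extraction": "DATA_EXTRACTION_EVENT",
    "file_transcoding": "FILE_TRANSCODING_EVENT",
    "cleansing_conversion": "CLEANSING_CONVERSION_EVENT",
    "file_merging": "FILE_MERGING_EVENT",
    "file_loading": "FILE_LOADING_EVENT",
}


def initial_event_type(component_chain_info: List[Dict[str, str]]) -> str:
    """Return the initial message's event type from the first component's function type."""
    if not component_chain_info:
        raise ValueError("component chain information is empty")
    first_function_type = component_chain_info[0]["function_type"]
    return FUNCTION_TO_EVENT[first_function_type]


chain_info = [
    {"component": "gbk_to_utf8", "function_type": "file_transcoding"},
    {"component": "oracle_loader", "function_type": "file_loading"},
]
event_type = initial_event_type(chain_info)
```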
6. The data processing method according to claim 5, wherein the event types comprise a data extraction event, a file transcoding event, a cleansing and conversion event, a file merging event, and a file loading event; and
the function types comprise data extraction, file transcoding, cleansing and conversion, file merging, and file loading.
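Because claim 5 reads the correspondence from function type to event type and claim 8 reads it the other way, one option is to model both lists from claim 6 as enumerations paired by member name, so the mapping stays one-to-one and invertible. The enum names and values below are illustrative assumptions.

```python
from enum import Enum


class FunctionType(Enum):
    DATA_EXTRACTION = "data extraction"
    FILE_TRANSCODING = "file transcoding"
    CLEANSING_CONVERSION = "cleansing and conversion"
    FILE_MERGING = "file merging"
    FILE_LOADING = "file loading"


class EventType(Enum):
    DATA_EXTRACTION = "data extraction event"
    FILE_TRANSCODING = "file transcoding event"
    CLEANSING_CONVERSION = "cleansing and conversion event"
    FILE_MERGING = "file merging event"
    FILE_LOADING = "file loading event"


# Pair members by name so the correspondence is one-to-one and usable in both directions.
EVENT_TO_FUNCTION = {event: FunctionType[event.name] for event in EventType}
FUNCTION_TO_EVENT = {function: event for event, function in EVENT_TO_FUNCTION.items()}
```

Pairing by member name means adding a sixth stage requires adding it to both enumerations, otherwise `FunctionType[event.name]` raises `KeyError` at import time.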
7. The data processing method according to claim 5, wherein the data source configuration file further comprises component chain information, and determining the data processing component chain currently corresponding to the event message to be processed comprises:
determining the data processing component chain currently corresponding to the event message to be processed according to the component chain information.
8. The data processing method according to claim 1, wherein determining the target data processing component in the data processing component chain according to the event type in the event message to be processed comprises:
determining a target function type corresponding to the event type in the event message to be processed, based on a preset correspondence between event types and function types; and
determining the data processing component corresponding to the target function type as the target data processing component.
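Claim 8 splits target selection into two lookups: event type to function type, then function type to a component of the current chain. A minimal sketch, assuming string-valued types and a hypothetical `Component` record; the names are not from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical preset correspondence from event type to function type.
EVENT_TO_FUNCTION: Dict[str, str] = {
    "DATA_EXTRACTION_EVENT": "data_extraction",
    "FILE_TRANSCODING_EVENT": "file_transcoding",
    "CLEANSING_CONVERSION_EVENT": "cleansing_conversion",
    "FILE_MERGING_EVENT": "file_merging",
    "FILE_LOADING_EVENT": "file_loading",
}


@dataclass
class Component:
    name: str
    function_type: str


def find_target_component(event_type: str, chain: List[Component]) -> Component:
    """Map the event type to a function type, then pick the chain component with that function."""
    target_function_type = EVENT_TO_FUNCTION[event_type]  # KeyError for an unknown event type
    for component in chain:
        if component.function_type == target_function_type:
            return component
    raise LookupError(f"no {target_function_type!r} component in this chain")


chain = [Component("gbk_to_utf8", "file_transcoding"), Component("oracle_loader", "file_loading")]
target = find_target_component("FILE_LOADING_EVENT", chain)
```

The claims do not say what happens if a chain holds two components with the same function type; this sketch simply returns the first one.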
9. The data processing method according to claim 2, further comprising, after sending the event message to be processed to the target data processing component:
determining a target thread in a thread pool, and sending the target thread to the target data processing component, so that the target data processing component processes the file list to be processed on the target thread; and
receiving a subordinate event message sent by the target data processing component after it finishes processing the file list to be processed.
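Claim 9 can be approximated with the standard `concurrent.futures.ThreadPoolExecutor`. One simplification is swapped in here: the executor picks the worker thread itself rather than the scheduler naming a target thread explicitly. `TranscodingComponent`, `next_event_type`, and `run_on_pool` are hypothetical, and it is assumed that each component knows the event type its successor expects.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class EventMessage:
    files: List[str]
    event_type: str


class TranscodingComponent:
    """Hypothetical component that reports completion with a subordinate event message."""

    # Assumption: the component knows which event type its successor expects.
    next_event_type = "CLEANSING_CONVERSION_EVENT"

    def __init__(self) -> None:
        self.last_worker = ""

    def process(self, message: EventMessage) -> EventMessage:
        # Real transcoding would rewrite each file here; this sketch only records the worker.
        self.last_worker = threading.current_thread().name
        return EventMessage(list(message.files), self.next_event_type)


def run_on_pool(
    pool: ThreadPoolExecutor, component: TranscodingComponent, message: EventMessage
) -> EventMessage:
    """Run the component on a pool thread, then wait for its subordinate event message."""
    future = pool.submit(component.process, message)
    return future.result()


with ThreadPoolExecutor(max_workers=4, thread_name_prefix="component") as pool:
    component = TranscodingComponent()
    subordinate = run_on_pool(pool, component, EventMessage(["a.txt"], "FILE_TRANSCODING_EVENT"))
```

Blocking on `future.result()` keeps the example short; a scheduler serving many chains at once would more likely register `future.add_done_callback` and re-queue the subordinate message from there.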
10. The data processing method according to claim 9, further comprising, after receiving the subordinate event message sent by the target data processing component upon finishing processing of the file list to be processed:
adding the subordinate event message to the message queue, and acquiring a next event message to be processed from the message queue;
determining a subordinate data processing component of the target data processing component in the data processing component chain; and
moving the file list to be processed to a working directory corresponding to the subordinate data processing component, so that the subordinate data processing component, upon receiving the next event message to be processed, processes the file list to be processed according to that message.
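The hand-over in claim 10 can be sketched as follows, assuming (hypothetically) that the chain is an ordered list of component names and that each component works in a directory named `<work_root>/<component name>`. `subordinate_of` and `hand_over` are illustrative helpers, and the message is a plain dictionary for brevity.

```python
import queue
import shutil
import tempfile
from pathlib import Path
from typing import List


def subordinate_of(chain: List[str], current: str) -> str:
    """Return the component that follows `current` in the chain."""
    position = chain.index(current)
    if position + 1 == len(chain):
        raise LookupError(f"{current!r} is the last component in the chain")
    return chain[position + 1]


def hand_over(files: List[Path], work_root: Path, chain: List[str], current: str) -> List[Path]:
    """Move the processed files into the working directory of the subordinate component."""
    # Assumption: each component works in <work_root>/<component name>.
    target_dir = work_root / subordinate_of(chain, current)
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in files:
        destination = target_dir / path.name
        shutil.move(str(path), str(destination))
        moved.append(destination)
    return moved


# Usage: re-queue the subordinate message, fetch the next one, and move the files.
message_queue: "queue.Queue[dict]" = queue.Queue()
message_queue.put({"event_type": "CLEANSING_CONVERSION_EVENT", "files": ["a.txt"]})
next_message = message_queue.get()

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "transcoder" / "a.txt"
    source.parent.mkdir()
    source.write_text("payload")
    moved = hand_over([source], root, ["extractor", "transcoder", "cleanser"], "transcoder")
    moved_ok = moved[0].read_text() == "payload" and not source.exists()
    moved_name = moved[0].relative_to(root).as_posix()
```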
11. The data processing method according to any one of claims 1 to 10, further comprising:
when a modification operation by a user on the data processing component chain is acquired, modifying the data processing component chain according to the modification operation, and updating the component chain information corresponding to the data processing component chain;
wherein the modification operation comprises adding a preset data processing component, deleting a data processing component from the data processing component chain, and adjusting the order of the components in the data processing component chain.
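The three modification operations of claim 11 map naturally onto list operations, with the stored component chain information rebuilt after each change. A minimal sketch; `PRESET_COMPONENTS`, `ComponentChain`, and the layout of `chain_info` are hypothetical.

```python
from typing import Dict, List

# Hypothetical catalogue of preset components that a user may add to a chain.
PRESET_COMPONENTS = {"extractor", "transcoder", "cleanser", "merger", "loader"}


class ComponentChain:
    """Ordered component names plus the component chain information derived from them."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)
        self.chain_info: List[Dict[str, object]] = []
        self._refresh_info()

    def _refresh_info(self) -> None:
        # Keep the stored component chain information in step with every modification.
        self.chain_info = [{"position": i, "component": n} for i, n in enumerate(self.names)]

    def add(self, name: str, position: int) -> None:
        if name not in PRESET_COMPONENTS:
            raise ValueError(f"{name!r} is not a preset component")
        self.names.insert(position, name)
        self._refresh_info()

    def delete(self, name: str) -> None:
        self.names.remove(name)  # raises ValueError if the component is not in the chain
        self._refresh_info()

    def reorder(self, new_order: List[str]) -> None:
        if sorted(new_order) != sorted(self.names):
            raise ValueError("the new order must contain exactly the current components")
        self.names = list(new_order)
        self._refresh_info()


chain = ComponentChain(["extractor", "loader"])
chain.add("transcoder", 1)
chain.reorder(["transcoder", "extractor", "loader"])
chain.delete("extractor")
```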
12. The data processing method according to any one of claims 1 to 10, further comprising:
when a configuration operation by a user on a data source configuration file is acquired, configuring the data source corresponding to the data source configuration file according to the configuration operation, wherein the configuration operation comprises adding, deleting, and modifying the data source configuration file.
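For claim 12, one way to hold data source configuration files is one JSON file per data source in a directory, with add, modify, and delete operations on those files. The storage format, the directory layout, and the `DataSourceConfigStore` class are assumptions for illustration; the patent does not specify them, and this sketch only manages the files rather than reconfiguring a live data source.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class DataSourceConfigStore:
    """Hypothetical store: one JSON configuration file per data source in a directory."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, source: str) -> Path:
        return self.directory / f"{source}.json"

    def add(self, source: str, config: Dict[str, Any]) -> None:
        path = self._path(source)
        if path.exists():
            raise FileExistsError(str(path))
        path.write_text(json.dumps(config))

    def modify(self, source: str, changes: Dict[str, Any]) -> Dict[str, Any]:
        config = self.load(source)  # raises FileNotFoundError for an unknown data source
        config.update(changes)
        self._path(source).write_text(json.dumps(config))
        return config

    def delete(self, source: str) -> None:
        self._path(source).unlink()

    def load(self, source: str) -> Dict[str, Any]:
        return json.loads(self._path(source).read_text())


with tempfile.TemporaryDirectory() as tmp:
    store = DataSourceConfigStore(Path(tmp) / "sources")
    store.add("orders", {"file_pattern": "orders_*.csv", "chain": ["extractor", "loader"]})
    updated = store.modify("orders", {"file_pattern": "orders_*.txt"})
    store.delete("orders")
    remaining = sorted(p.name for p in (Path(tmp) / "sources").iterdir())
```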
13. A computer device, comprising a processor, a memory, a computer program stored in the memory and executable by the processor, and a data bus for connection and communication between the processor and the memory, wherein the computer program, when executed by the processor, implements the data processing method according to any one of claims 1 to 12.
14. A storage medium for readable storage, wherein the storage medium stores one or more programs, which are executable by one or more processors to implement a data processing method according to any one of claims 1 to 12.
CN202010899539.0A 2020-08-31 2020-08-31 Data processing method, computer device, and storage medium Pending CN114116245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010899539.0A CN114116245A (en) 2020-08-31 2020-08-31 Data processing method, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010899539.0A CN114116245A (en) 2020-08-31 2020-08-31 Data processing method, computer device, and storage medium

Publications (1)

Publication Number Publication Date
CN114116245A (en) 2022-03-01

Family

ID=80360132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010899539.0A Pending CN114116245A (en) 2020-08-31 2020-08-31 Data processing method, computer device, and storage medium

Country Status (1)

Country Link
CN (1) CN114116245A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115866057A (en) * 2022-11-30 2023-03-28 China United Network Communications Group Co., Ltd. (中国联合网络通信集团有限公司) Data processing method and device and storage medium

Similar Documents

Publication Publication Date Title
CN102279888B (en) Method and system for scheduling tasks
CN105808778B (en) Method and device for extracting, converting, and loading mass data
CN109933338B (en) Block chain deployment method, device, computer equipment and storage medium
CN111399764B (en) Data storage method, data reading device, data storage equipment and data storage medium
CN110647570B (en) Data processing method and device and electronic equipment
US11232123B2 (en) Pseudo-synchronous processing by an analytic query and build cluster
CN110659124A (en) Message processing method and device
CN104182295A (en) Data backup method and data backup device
CN114116245A (en) Data processing method, computer device, and storage medium
CN114827082A (en) Method, system, device and medium for generating globally unique ID of distributed system
CN102215264A (en) Method and device for supporting multi-tenancy data and service customized running
CN115687333B (en) V2x big data life cycle management method and device
WO2014180411A1 (en) Distributed index generation method and device
US7127446B1 (en) File system based task queue management
CN105760215A (en) Map-reduce model based job running method for distributed file system
CN108874798B (en) Big data sorting method and system
CN113377500B (en) Resource scheduling method, device, equipment and medium
CN114020843A (en) Spark framework-based data table synchronization method, device and storage medium
CN114020719A (en) License data migration method applied to heterogeneous database
CN111190607A (en) Task plug-in processing method and device, task scheduling server and storage medium
CN114817396A (en) Data synchronization method, device, equipment and storage medium
CN110990193A (en) Log backup method, device and system
CN107168685B (en) Method and device for updating script and computer terminal
GB2542585A (en) Task scheduler and task scheduling process
CN115357342B (en) Cold start resource processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination