CN114116803A - Method, device and equipment for processing big data file and storage medium - Google Patents


Info

Publication number
CN114116803A
Authority
CN
China
Prior art keywords
data
processed
processing
file
big
Prior art date
Legal status
Pending
Application number
CN202111440255.6A
Other languages
Chinese (zh)
Inventor
辜坤
Current Assignee
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202111440255.6A
Publication of CN114116803A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/547 Messaging middleware
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method, a device, equipment and a storage medium for processing a big data file. They relate to the field of data storage and can be applied to any application server to improve its capability to process big data files. The method includes: receiving a data processing task and a data configuration file, where the data processing task is used to process a big data text file and the data configuration file defines rules for block processing, state recording and updating of the big data text file; and reading and executing, by the data processing task, the data to be processed of the big data text file according to the rules in the data configuration file, while recording the data processing state, until all the data to be processed in the big data text file have been successfully processed. In this scheme, the data to be processed of the big data text file is processed block by block in sequence, with its processing state recorded, based on the data configuration file, which improves the flexibility and efficiency with which the server processes the file.

Description

Method, device and equipment for processing big data file and storage medium
Technical Field
The present application relates to the field of data storage, and in particular, to a method, an apparatus, a device, and a storage medium for processing a big data file.
Background
With the accelerated development and application of network information technology, the Internet of Things, the mobile Internet, social networks and the like have greatly expanded the application fields of the Internet. Data in the Internet era is growing rapidly, and big data has become a new hotspot in the development of information technology.
Data in a big data environment comes from many sources and in many types, and the volume of data to be stored, analyzed and mined is huge, which places high demands on data presentation. Traditional data processing is processor-centric, whereas a big data environment calls for a data-centric mode that reduces the overhead of moving data. Therefore, conventional data processing methods cannot meet the requirements of big data.
Disclosure of Invention
The embodiments of the present application provide a method, a device, equipment and a storage medium for processing a big data file, so as to improve an application server's capability to process big data files.
A first aspect of an embodiment of the present application provides a method for processing a big data file, including:
receiving a data processing task and a data configuration file, wherein the data processing task is used for processing a big data text file, and the data configuration file is used for indicating a method for carrying out block processing, recording and updating on the big data text file;
and reading and executing the data to be processed of the big data text file according to the data configuration file, and simultaneously recording the data processing state until all the data to be processed in the big data text file are successfully processed.
In an optional embodiment of the first aspect of the present application, the reading, according to the data configuration file, the to-be-processed data of the big data text file includes:
reading the data to be processed of the big data text file from the redis middleware according to the data configuration file.
In an optional embodiment of the first aspect of the present application, the data configuration file includes configuration information of a number of lines of single processing performed on the big data text file; the reading and executing the data to be processed of the big data text file according to the data configuration file, and simultaneously recording the data processing state until all the data to be processed in the big data text file are successfully processed, including:
segmenting the data to be processed of the big data text file according to the single processing line number to obtain a plurality of data units to be processed, wherein each data unit to be processed comprises M lines of data to be processed, and M is a positive integer;
and sequentially reading and executing the plurality of data units to be processed, and simultaneously recording the data processing state of each data unit to be processed until all the data units to be processed are successfully processed.
In an optional embodiment of the first aspect of the present application, the data configuration file includes configuration information for recording a single processing state; the method further includes:
acquiring an initial data processing marking key value pair corresponding to each data unit to be processed according to the configuration information for recording the single processing state, wherein the initial data processing marking key value pair corresponding to each data unit to be processed indicates that the processing state of the data unit to be processed is unprocessed;
and storing the initial data processing marking key value pairs corresponding to the plurality of data units to be processed into a database.
In an optional embodiment of the first aspect of the present application, the sequentially reading and executing the plurality of data units to be processed includes:
reading a data processing marking key value pair corresponding to a first data unit to be processed from a redis middleware, and if the data processing marking key value pair indicates that the processing state of the first data unit to be processed is processing failure or unprocessed, acquiring and executing data to be processed in the first data unit to be processed;
the first unit of data to be processed is any one of the plurality of units of data to be processed.
In an optional embodiment of the first aspect of the present application, the method further comprises:
if the data processing marking key-value pair corresponding to the first data unit to be processed indicates that the processing state of the first data unit to be processed is processing success, skipping the first data unit to be processed and reading the data processing marking key-value pair of the next data unit to be processed.
In an optional embodiment of the first aspect of the present application, the method further comprises:
if the first data unit to be processed is successfully processed, updating the data processing marking key-value pair corresponding to the first data unit to be processed, where the updated key-value pair indicates that the processing state of the first data unit to be processed is processing success.
In an optional embodiment of the first aspect of the present application, the storing, to a database, the initial data processing marker key-value pairs corresponding to the plurality of data units to be processed includes:
storing the initial data processing marking key-value pairs corresponding to the plurality of data units to be processed into the redis middleware, with the file name of the big data text file as the key and the initial data processing marking key-value pairs corresponding to the plurality of data units to be processed as the value.
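As a minimal sketch of this storage layout, assume the flags are modeled as a Redis-style hash whose key is the file name and whose fields are unit indices. The function names and the state string "unprocessed" are hypothetical, and a plain dict stands in for the redis middleware so the example runs on its own. With the redis-py client, the same map could plausibly be written with `hset(file_name, mapping=flags)`.

```python
from typing import Dict

# Hypothetical state label for a data unit that has not been processed yet.
UNPROCESSED = "unprocessed"


def build_initial_flags(unit_count: int) -> Dict[str, str]:
    """Return one field per data unit (field = unit index as a string), all unprocessed."""
    return {str(i): UNPROCESSED for i in range(unit_count)}


def store_initial_flags(store: Dict[str, Dict[str, str]], file_name: str, unit_count: int) -> None:
    """Store the flag map under the file name, mimicking HSET <file_name> <field> <value>."""
    store[file_name] = build_initial_flags(unit_count)


# Usage with a hypothetical file name and three data units.
store: Dict[str, Dict[str, str]] = {}
store_initial_flags(store, "data.txt", 3)
```

Keying the hash by file name keeps all flags of one file together, so a restarted task can look up every unit's state with a single key.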
A second aspect of the embodiments of the present application provides a device for processing a big data file, including:
a receiving module, configured to receive a data processing task and a data configuration file, where the data processing task is used for processing a big data text file, and the data configuration file is used for indicating a method for carrying out block processing, recording and updating on the big data text file; and
a processing module, configured to read and execute the data to be processed of the big data text file according to the data configuration file, while recording the data processing state, until all the data to be processed in the big data text file are successfully processed.
In an optional embodiment of the second aspect of the present application, the processing module is configured to:
read the data to be processed of the big data text file from the redis middleware according to the data configuration file.
In an optional embodiment of the second aspect of the present application, the data configuration file includes configuration information of a number of lines of single processing performed on the big data text file; the processing module is configured to:
segmenting the data to be processed of the big data text file according to the single processing line number to obtain a plurality of data units to be processed, wherein each data unit to be processed comprises M lines of data to be processed, and M is a positive integer;
and sequentially reading and executing the plurality of data units to be processed, and simultaneously recording the data processing state of each data unit to be processed until all the data units to be processed are successfully processed.
In an optional embodiment of the second aspect of the present application, the data configuration file includes configuration information for recording a single processing state; the device for processing a big data file further includes a storage module;
the processing module is used for acquiring an initial data processing mark key value pair corresponding to each data unit to be processed according to the configuration information for recording the single processing state, wherein the initial data processing mark key value pair corresponding to each data unit to be processed indicates that the processing state of the data unit to be processed is unprocessed;
and the storage module is used for storing the initial data processing mark key value pairs corresponding to the plurality of data units to be processed into a database.
In an optional embodiment of the second aspect of the application, the processing module is configured to:
reading a data processing marking key value pair corresponding to a first data unit to be processed from a redis middleware, and if the data processing marking key value pair indicates that the processing state of the first data unit to be processed is processing failure or unprocessed, acquiring and executing data to be processed in the first data unit to be processed;
the first unit of data to be processed is any one of the plurality of units of data to be processed.
In an optional embodiment of the second aspect of the application, the processing module is configured to:
if the data processing marking key-value pair corresponding to the first data unit to be processed indicates that the processing state of the first data unit to be processed is processing success, skip the first data unit to be processed and read the data processing marking key-value pair of the next data unit to be processed.
In an optional embodiment of the second aspect of the present application, the processing apparatus for big data files further comprises: an update module;
the update module is configured to: if the first data unit to be processed is successfully processed, update the data processing flag key-value pair corresponding to the first data unit to be processed, where the updated key-value pair indicates that the processing state of the first data unit to be processed is processing success.
In an optional embodiment of the second aspect of the present application, the storage module is configured to:
store the initial data processing marking key-value pairs corresponding to the plurality of data units to be processed into the redis middleware, with the file name of the big data text file as the key and the initial data processing marking key-value pairs corresponding to the plurality of data units to be processed as the value.
A third aspect of embodiments of the present application provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method according to any implementation of the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the computer program is executed by a processor to perform the method according to any implementation of the first aspect.
A fifth aspect of the embodiments of the present application provides a computer program product including a computer program that, when executed by a processor, implements the method according to any implementation of the first aspect.
The embodiments of the present application provide a method, a device, equipment and a storage medium for processing a big data file, which can be applied to any application server to improve its capability to process big data files. The processing method includes: receiving a data processing task and a data configuration file, where the data processing task is used for processing the big data text file, and the data configuration file is used for indicating a method for carrying out block processing, state recording and updating on the big data text file; and reading and executing the data to be processed of the big data text file according to the data configuration file, while recording the data processing state, until all the data to be processed in the big data text file are successfully processed. In this scheme, the data to be processed of the big data text file is processed block by block in sequence, with its processing state recorded, based on the data configuration file, which improves the flexibility and efficiency with which the server processes the file.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a first schematic diagram of a scenario of a method for processing a big data file according to an embodiment of the present application;
Fig. 2 is a second schematic diagram of a scenario of a method for processing a big data file according to an embodiment of the present application;
Fig. 3 is a first schematic flowchart of a method for processing a big data file according to an embodiment of the present application;
Fig. 4 is a second schematic flowchart of a method for processing a big data file according to an embodiment of the present application;
Fig. 5 is a third schematic flowchart of a method for processing a big data file according to an embodiment of the present application;
Fig. 6 is a fourth schematic flowchart of a method for processing a big data file according to an embodiment of the present application;
Fig. 7 is a first schematic structural diagram of a device for processing a big data file according to an embodiment of the present application;
Fig. 8 is a second schematic structural diagram of a device for processing a big data file according to an embodiment of the present application;
Fig. 9 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description, the claims and the drawings of the embodiments of the application are used to distinguish between similar elements and do not necessarily describe a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can operate in sequences other than those described or illustrated herein.
It will be understood that the terms "comprises" and "comprising," and any variations thereof, as used herein, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the description of the embodiments of the present application, the term "correspond" may indicate a direct or indirect correspondence between two items, an association between them, or a relationship such as indicating and being indicated, or configuring and being configured.
Before introducing the processing method of the big data file according to the embodiment of the present application, an application scenario of the processing method of the big data file is briefly introduced first.
Fig. 1 is a first schematic diagram of a scenario of a method for processing a big data file according to an embodiment of the present application. As shown in Fig. 1, the scenario includes a terminal device 11, an application server 12 and a third-party server 13, where the terminal device 11 is communicatively connected to the application server 12, and the application server 12 is communicatively connected to the third-party server 13. The application server 12 provides various services to users, and a user accesses the application server 12 through the terminal device 11 to initiate a service request. When processing a service request received from the terminal device 11, the application server 12 sometimes needs to exchange data with the third-party server 13 to fulfil the user's service requirement. Based on the data returned by the third-party server 13, the application server 12 sends a response to the service request to the user's terminal device 11.
As an example, the application server 12 is a bank server that provides users with services such as redeeming reward points for goods. These services involve logistics, which is provided by the third-party server 13, namely a logistics company's server. After a user redeems points for an item, the bank server needs to exchange data with the third-party server 13 so that the logistics information of the redeemed item is updated in real time.
Fig. 2 is a second schematic diagram of a scenario of a method for processing a big data file according to an embodiment of the present application. Unlike Fig. 1, this scenario includes a plurality of third-party servers 13 communicatively connected to the application server 12. Each third-party server 13 sends a data file to the application server 12, and the application server 12 performs data analysis based on the data files from the plurality of third-party servers 13 to provide big data services to users.
Based on the above scenarios, in practice the application server needs to exchange data with third-party servers, and as business volume grows, the data files returned by third parties become larger and larger. When the application server processes such a file, an overly large file may cause memory overflow, which makes the application server run abnormally. Memory overflow refers to the situation in which the amount of data loaded into a computer's memory exceeds the memory's capacity, so that the memory is exhausted and the application system cannot work normally.
To solve the above problems, an embodiment of the present application provides a method for processing a big data file, which is suitable for any application server to read and process big data files and improves data processing efficiency and success rate. Because a big data file contains a huge volume of data, to avoid a system crash caused by memory overflow, the amount of data processed in a single pass can be configured so that the big data file is divided into a plurality of data units to be processed, and each data unit is read and executed in sequence. In addition, because some data units may fail to execute, a data processing state flag is configured for each data unit, so that only the data that failed is precisely re-read and re-executed, while data that has already been processed successfully is not reprocessed. This improves the flexibility and efficiency with which the server processes files.
The technical solutions provided in the embodiments of the present application are described in detail below with specific embodiments. It should be noted that the technical solutions provided in the embodiments of the present application may include part or all of the following contents, and these specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 3 is a first flowchart illustrating a method for processing a big data file according to an embodiment of the present application. As shown in fig. 3, the method for processing a big data file of this embodiment includes the following steps:
step 201, receiving a data processing task and a data configuration file, wherein the data processing task is used for processing a big data text file, and the data configuration file is used for indicating a method for performing block processing, recording and updating on the big data text file.
In general, the application server first receives the data configuration file, then receives the data processing task, and executes the data processing task according to the data configuration file.
The data processing task is used for processing big data text files, which include files of 1 GB or larger, or files with 10,000 or more lines of data. This embodiment does not specifically limit the file type of the big data text file, which may be any type of data file.
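The size criterion above can be expressed as a simple check. This is a minimal sketch, assuming "1 GB" means 1024**3 bytes and that lines are counted by streaming the file; the names `is_big_data_file`, `count_lines` and the threshold constants are hypothetical and not part of the described method.

```python
from typing import Iterable

# Thresholds from the text; reading 1 GB as 1024**3 bytes is an assumption.
SIZE_THRESHOLD_BYTES = 1024 ** 3
LINE_THRESHOLD = 10_000


def count_lines(lines: Iterable[bytes]) -> int:
    """Count lines by iterating (e.g. over a file opened in "rb" mode) without loading it whole."""
    return sum(1 for _ in lines)


def is_big_data_file(size_bytes: int, line_count: int) -> bool:
    """A file qualifies if it meets either the size threshold or the line-count threshold."""
    return size_bytes >= SIZE_THRESHOLD_BYTES or line_count >= LINE_THRESHOLD
```

For a real file one might call `is_big_data_file(os.path.getsize(path), count_lines(f))` with `f` opened in binary mode; counting by iteration reads the file once but never holds it in memory.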
The data configuration file is a configuration file, configured in the application server in advance, that relates to data reading, processing, state recording and updating. The application server processes the big data file based on the data configuration file.
Step 202, reading and executing the data to be processed of the big data text file according to the data configuration file, and simultaneously recording the data processing state until all the data to be processed in the big data text file are successfully processed.
Because a big data text file contains far more data than a conventional file, the data to be processed of the big data text file can be read and executed through the redis middleware according to the data configuration file, to avoid memory overflow, until all the data to be processed in the big data text file are successfully processed.
The redis middleware is an in-memory storage middleware that can be used as a database, a cache and a message queue; it is also called a redis database. The application server caches the data of the big data text file in the redis middleware and reads the data to be processed from it in sequence. Because redis keeps data in memory, reading from it is faster than reading from a traditional database.
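The caching step could look roughly like the sketch below. It assumes the lines are kept in a Redis list keyed by the file name (an assumption; the text does not specify the data structure) and uses a hypothetical in-memory class `InMemoryList` in place of a live server so the example runs on its own. Its `rpush` and `lrange` methods mirror the Redis RPUSH and LRANGE commands, whose end index is inclusive, and a redis-py `Redis` client exposes methods with those names.

```python
from typing import Dict, Iterable, List


class InMemoryList:
    """Hypothetical stand-in for a Redis client supporting only RPUSH and LRANGE."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def rpush(self, name: str, *values: str) -> int:
        # Append to the tail of the list; like RPUSH, return the new length.
        self._data.setdefault(name, []).extend(values)
        return len(self._data[name])

    def lrange(self, name: str, start: int, end: int) -> List[str]:
        # LRANGE treats `end` as inclusive; -1 means "through the last element".
        items = self._data.get(name, [])
        stop = None if end == -1 else end + 1
        return items[start:stop]


def cache_lines(client, key: str, lines: Iterable[str], batch_size: int = 1000) -> int:
    """Push lines to a list in batches so the whole file is never held in memory."""
    batch: List[str] = []
    total = 0
    for line in lines:
        batch.append(line.rstrip("\r\n"))
        if len(batch) == batch_size:
            total = client.rpush(key, *batch)
            batch = []
    if batch:
        total = client.rpush(key, *batch)
    return total


def read_block(client, key: str, index: int, block_size: int) -> List[str]:
    """Read the index-th block (0-based) of block_size cached lines."""
    start = index * block_size
    return client.lrange(key, start, start + block_size - 1)
```

With a real redis-py client, `lrange` returns `bytes` unless the client is created with `decode_responses=True`.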
The redis middleware is also used to store data processing states, i.e. data processing results. There are three data processing states: unprocessed, processing success and processing failure. During data processing, the redis middleware records the data processing state of each data unit in the big data text file and updates it whenever it changes, so that data is not read and processed repeatedly.
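The three states and the rule of writing a state only when it changes can be sketched as follows. The enum name, its string values, and the plain dict standing in for the redis-held flags are all illustrative assumptions.

```python
from enum import Enum
from typing import Dict


class ProcessingState(str, Enum):
    """Hypothetical labels for the three data processing states."""
    UNPROCESSED = "unprocessed"
    SUCCESS = "success"
    FAILED = "failed"


def needs_processing(state: str) -> bool:
    """Only units that are unprocessed or failed should be (re)read and executed."""
    return state != ProcessingState.SUCCESS.value


def record_state(flags: Dict[str, str], unit_id: str, new_state: ProcessingState) -> bool:
    """Write the new state only if it differs from the stored one; return True if written."""
    if flags.get(unit_id) == new_state.value:
        return False
    flags[unit_id] = new_state.value
    return True
```

Skipping redundant writes keeps the number of round trips to the store proportional to actual state changes rather than to every check.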
Optionally, in some embodiments, the data configuration file may be stored in the redis middleware.
The processing method for the big data file shown in this embodiment receives a data processing task and a data configuration file, where the data processing task is used for processing the big data text file, and the data configuration file is used for indicating a method for performing blocking processing, status recording, and updating on the big data text file. And reading and executing the data to be processed of the big data text file according to the data configuration file, and simultaneously recording the data processing state until all the data to be processed in the big data text file are successfully processed. According to the scheme, the data to be processed of the big data text file is sequentially subjected to blocking processing and state recording based on the data configuration file, so that the flexibility and the efficiency of processing the file by the server can be improved.
On the basis of the above embodiments, the following describes in detail the data processing procedure of the big data text file by introducing the content of the data configuration file.
Fig. 4 is a second schematic flowchart of a method for processing a big data file according to an embodiment of the present application. Building on the steps shown in Fig. 3, as shown in Fig. 4, the method for processing a big data file provided by this embodiment includes the following steps:
step 301, receiving a data processing task and a data configuration file.
In an optional implementation of this embodiment, the data configuration file includes configuration information on the number of lines of the big data text file processed in a single pass. Illustratively, the data configuration file includes a single-processing line-count parameter pnum, for example pnum = 1000. This parameter may be stored in the redis middleware as the key-value pair (pnum, 1000) and can be adjusted dynamically and flexibly.
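Reading `pnum` from the key-value store might look like the sketch below, assuming the configuration is exposed as a mapping and that a missing key falls back to the example value 1000 (the fallback and the validation are assumptions, not stated in the text). Redis returns stored values as strings or bytes, hence the `int` conversion.

```python
from typing import Mapping, Union

DEFAULT_PNUM = 1000  # example value from the text, used here as an assumed fallback


def get_pnum(config: Mapping[str, Union[str, bytes]]) -> int:
    """Read the single-processing line count 'pnum' from a key-value config store."""
    raw = config.get("pnum")
    if raw is None:
        return DEFAULT_PNUM
    value = int(raw)  # int() accepts both str and bytes integer literals
    if value <= 0:
        raise ValueError("pnum must be a positive integer")
    return value
```

Because the value is read at task start rather than hard-coded, changing the stored pair adjusts the block size for the next run without redeploying the server.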
Step 302, segmenting the data to be processed of the big data text file according to the data configuration file to obtain a plurality of data units to be processed.
Each data unit to be processed includes M lines of data to be processed, where M is a positive integer. M is the parameter pnum of step 301 and can be configured flexibly according to the processing capability of the application server. Illustratively, if the total number of data lines in the big data text file indicated by the data processing task is allnum = 10000 and the number of lines processed in a single pass is pnum = 1000, the big data text file contains 10000/1000 = 10 data units to be processed.
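The segmentation of step 302 can be sketched as follows. This is a minimal illustration assuming that a data unit is identified by the 1-based numbers of its first and last lines; the function name split_into_units is hypothetical. When allnum is not an exact multiple of pnum, the last unit in this sketch simply holds the remaining lines.

```python
def split_into_units(allnum: int, pnum: int) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first_line, last_line) ranges, one per data unit."""
    if allnum < 0 or pnum <= 0:
        raise ValueError("allnum must be >= 0 and pnum must be > 0")
    units = []
    for first in range(1, allnum + 1, pnum):
        # The final unit is shorter when allnum is not a multiple of pnum.
        last = min(first + pnum - 1, allnum)
        units.append((first, last))
    return units


units = split_into_units(10000, 1000)
print(len(units), units[0], units[-1])  # 10 (1, 1000) (9001, 10000)
```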
Step 303: read and execute the plurality of data units to be processed in sequence until all of them have been processed successfully.
Following the order of the data units in the big data text file, each data unit to be processed is read in turn from the redis middleware and executed, and the data processing state of each unit is recorded at the same time, until all data units to be processed have been processed successfully.
In this embodiment, the big data text file is segmented according to the per-pass line count parameter in the data configuration file to obtain a plurality of data units to be processed, and the units are read from the redis middleware and executed one at a time, which avoids memory overflow when the server processes the big data text file.
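The memory-saving effect comes from never holding more than one unit in memory at a time. A minimal sketch, assuming the units are read from an ordinary line-oriented text stream rather than from the redis middleware named in the text; read_in_units is a hypothetical helper built on itertools.islice.

```python
import io
from itertools import islice
from typing import Iterable, Iterator


def read_in_units(lines: Iterable[str], pnum: int) -> Iterator[list[str]]:
    """Yield successive batches of at most pnum lines, one batch at a time."""
    it = iter(lines)
    while True:
        batch = list(islice(it, pnum))  # consumes at most pnum lines
        if not batch:
            return
        yield batch


# A file opened with open(path, encoding=...) is also an iterable of lines.
sample = io.StringIO("".join(f"row{i}\n" for i in range(1, 8)))
for unit in read_in_units(sample, 3):
    print(len(unit), unit[0].strip())
# 3 row1
# 3 row4
# 1 row7
```

Because the generator pulls lines lazily, peak memory is bounded by pnum lines rather than by the file size.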
It should be noted that, while the server reads and executes the data units in sequence, processing of each unit may either succeed or fail. To avoid reprocessing data, and thereby improve the processing capability of the server, a data processing flag can be set. The data processing flag records the data processing state of each data unit.
On the basis of the above embodiments, optionally, the data configuration file further includes configuration information for recording a single processing state. Fig. 5 is a third schematic flowchart of a processing method for a big data file according to an embodiment of the present application. As shown in fig. 5, the method for processing a big data file provided in this embodiment includes the following steps:
Step 401: receive a data processing task and a data configuration file.
Step 402: segment the data to be processed of the big data text file according to the data configuration file to obtain a plurality of data units to be processed.
For step 401 and step 402 of this embodiment, reference may be made to the above two embodiments; details are not repeated here.
Step 403: obtain, according to the configuration information for recording the single-pass processing state in the data configuration file, an initial data processing flag key-value pair corresponding to each data unit to be processed.
The application server first parses the attributes of the big data text file in the data processing task to obtain the total number of data lines allnum, then reads the per-pass line count pnum from the data configuration file, and determines the number of data (line) processing flags flagnum.
Optionally, the number of data processing flags flagnum may be determined by the following rule:
if allnum % pnum = 0, flagnum = allnum / pnum;
if allnum % pnum > 0, flagnum = allnum / pnum + 1.
Here, % is the modulo (remainder) operator: if allnum % pnum is 0, allnum is an integer multiple of pnum; otherwise it is not. / denotes integer division, rounded down. For example, if allnum = 10000 and pnum = 1000, then flagnum = 10000/1000 = 10; if allnum = 10002 and pnum = 1000, then flagnum = 10002/1000 + 1 = 11.
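The two-case rule for flagnum is a ceiling division. A minimal sketch (flag_count is a hypothetical name) that follows the rule literally, using Python's // for the rounded-down division written as / above:

```python
def flag_count(allnum: int, pnum: int) -> int:
    """Number of data processing flags (= number of data units), per the rule above."""
    if allnum % pnum == 0:
        return allnum // pnum
    return allnum // pnum + 1


print(flag_count(10000, 1000), flag_count(10002, 1000), flag_count(10003, 5000))  # 10 11 3
```

For non-negative allnum and positive pnum, the single expression -(-allnum // pnum) gives the same result.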
It should be understood that the number of data processing flags flagnum in this embodiment is equal to the number of data units to be processed obtained in the above embodiment (not to M, which is the number of lines per unit).
The configuration information for recording the single-pass processing state in the data configuration file defines a data processing flag key-value pair for each segmented data unit to be processed. The flagnum data processing flag key-value pairs can be stored in a key-value object datamap according to the following rule:
The datamap object stores: (pnum × 0 + 1, N), (pnum × 1 + 1, N), …, (pnum × (flagnum - 1) + 1, N). For example, if allnum = 10003 and pnum = 5000, then flagnum = 3 and the initial data processing flag key-value pairs are (1, N), (5001, N), (10001, N), where N indicates that the data processing state is unprocessed or failed.
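The datamap rule can be written as a dictionary comprehension keyed by the first line number of each unit. This is a sketch under the assumption that keys are plain integers and states are the single letters N and Y used in the text; initial_datamap is a hypothetical name.

```python
def initial_datamap(allnum: int, pnum: int) -> dict[int, str]:
    """Build the initial flags: key = first line of a unit, value 'N' (unprocessed)."""
    flagnum = allnum // pnum + (1 if allnum % pnum else 0)
    return {pnum * i + 1: "N" for i in range(flagnum)}


print(initial_datamap(10003, 5000))  # {1: 'N', 5001: 'N', 10001: 'N'}
```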
It should be understood that the initial data processing flag key-value pair of each data unit to be processed indicates that the processing state of that unit is unprocessed, denoted by the letter N.
Optionally, the initial data processing flag key value pairs corresponding to the multiple data units to be processed are stored in the database.
Optionally, the initial data processing flag key-value pairs corresponding to the plurality of data units to be processed are stored in the redis middleware. In an optional implementation, the file name of the big data text file is used as the key and the initial data processing flag key-value pairs of the data units are used as the value, which can be represented as (text file name, datamap).
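The (text file name, datamap) layout can be modelled as a two-level mapping. The sketch below uses an in-memory class as a stand-in for the redis middleware; FlagStore, its methods and the file name are hypothetical. With an actual Redis deployment, one natural mapping (an assumption, not stated in the source) is one Redis hash per file, written with HSET and read with HGET or HGETALL, in which case the line-number fields would be stored as strings.

```python
class FlagStore:
    """In-memory stand-in for the redis middleware: file name -> datamap."""

    def __init__(self) -> None:
        self._maps: dict[str, dict[int, str]] = {}

    def save_initial(self, file_name: str, datamap: dict[int, str]) -> None:
        # Store a copy so later flag updates do not change the caller's dict.
        self._maps[file_name] = dict(datamap)

    def get_flag(self, file_name: str, first_line: int) -> str:
        return self._maps[file_name][first_line]

    def set_flag(self, file_name: str, first_line: int, state: str) -> None:
        if state not in ("N", "Y"):
            raise ValueError("state must be 'N' or 'Y'")
        self._maps[file_name][first_line] = state

    def datamap(self, file_name: str) -> dict[int, str]:
        return dict(self._maps[file_name])


store = FlagStore()
store.save_initial("example_file.txt", {1: "N", 5001: "N", 10001: "N"})
store.set_flag("example_file.txt", 1, "Y")
print(store.datamap("example_file.txt"))  # {1: 'Y', 5001: 'N', 10001: 'N'}
```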
Step 404: read and execute the plurality of data units to be processed in sequence, and update the data processing flag key-value pair of each unit, until all data units to be processed have been processed successfully.
In the first round of processing of the big data file, the data units are read and executed in turn directly according to their positions in the big data file. If a data unit is processed successfully, its data processing flag key-value pair is updated so that it indicates a successful processing state; if processing of the unit fails or raises an exception, its data processing flag key-value pair is not updated.
Illustratively, the big data file is divided into three data units to be processed, denoted A, B and C, whose initial data processing flag key-value pairs are (1, N), (5001, N), (10001, N). After the first round of processing, the pairs are updated to (1, Y), (5001, N), (10001, Y), indicating that units A and C were processed successfully and unit B failed.
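The first round described above can be sketched as a single ordered pass that sets a unit's flag to Y only when its processing function returns without raising. The sketch assumes the units are already available as lists of lines keyed by their first line number and treats any exception as a failure; first_round and execute are hypothetical names. The usage reproduces the A, B, C example.

```python
from typing import Callable


def first_round(units: dict[int, list[str]], flags: dict[int, str],
                execute: Callable[[list[str]], None]) -> dict[int, str]:
    """Run every unit once in file order; set its flag to 'Y' only on success."""
    for first_line in sorted(units):
        try:
            execute(units[first_line])
        except Exception:
            # Failed or abnormal processing: the flag keeps its 'N' value.
            continue
        flags[first_line] = "Y"
    return flags


def execute(lines: list[str]) -> None:
    # Simulated business logic: unit B always fails in this example.
    if lines == ["B"]:
        raise RuntimeError("simulated failure of unit B")


units = {1: ["A"], 5001: ["B"], 10001: ["C"]}
flags = {1: "N", 5001: "N", 10001: "N"}
print(first_round(units, flags, execute))  # {1: 'Y', 5001: 'N', 10001: 'Y'}
```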
After the first round of processing, if the big data file still contains data units that have not been processed successfully, a loop processing flow is entered and repeated until all data units have been processed successfully. In the loop, the application server first reads the data processing flag key-value pair of each data unit from the redis middleware to determine whether the unit needs to be executed.
The following takes a first data unit to be processed as an example:
In one possible case, if the data processing flag key-value pair of the first data unit indicates that its processing state is failed or unprocessed, the application server obtains and executes the data to be processed in the first data unit.
In another possible case, if the data processing flag key-value pair of the first data unit indicates that it has been processed successfully, the application server skips the first data unit and reads the data processing flag key-value pair of the next data unit.
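The loop processing flow can be sketched as repeated passes over the flags, skipping Y units and retrying N units. The max_rounds guard is an addition of this sketch and not part of the described method, which loops until every unit succeeds; it only prevents an endless loop when a unit fails permanently. process_until_done and flaky_execute are hypothetical names.

```python
from typing import Callable


def process_until_done(units: dict[int, list[str]], flags: dict[int, str],
                       execute: Callable[[list[str]], None],
                       max_rounds: int = 10) -> int:
    """Repeat rounds until every flag is 'Y'; return the number of rounds used."""
    rounds = 0
    while any(state != "Y" for state in flags.values()):
        if rounds == max_rounds:
            raise RuntimeError("some units still failing after max_rounds rounds")
        rounds += 1
        for first_line in sorted(flags):
            if flags[first_line] == "Y":
                continue  # already processed successfully: skip it
            try:
                execute(units[first_line])  # state is 'N': unprocessed or failed
            except Exception:
                continue  # flag stays 'N' and is retried next round
            flags[first_line] = "Y"
    return rounds


calls: list[str] = []


def flaky_execute(lines: list[str]) -> None:
    # Simulated business logic: unit B fails on its first attempt only.
    calls.append(lines[0])
    if lines[0] == "B" and calls.count("B") == 1:
        raise RuntimeError("transient failure of unit B")


units = {1: ["A"], 5001: ["B"], 10001: ["C"]}
flags = {1: "N", 5001: "N", 10001: "N"}
rounds = process_until_done(units, flags, flaky_execute)
print(rounds, calls)  # 2 ['A', 'B', 'C', 'B']
```

Units A and C are executed only once, which is the point of the flag check.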
The method for processing a big data file shown in this embodiment involves status flags and status updates for data processing. In the loop processing flow, the processing state of each data unit is first determined from its status flag, and the unit is then read or skipped accordingly. This increases the flexibility and efficiency with which the application server processes files and avoids reprocessing data that has already been processed successfully.
Fig. 6 is a fourth schematic flowchart of a method for processing a big data file according to an embodiment of the present application. The technical solution of the present application is summarized below with reference to Fig. 6.
1. When reading the big data file, the application server processes it in a loop, pnum lines per pass; the current pass number is denoted n and is stored in the memory of the application server.
2. Each time a batch of pnum lines is processed successfully, the data processing flag key-value pair of the corresponding data unit in redis is updated from (pnum × (n - 1) + 1, N) to (pnum × (n - 1) + 1, Y), where N indicates that the data unit starting at line pnum × (n - 1) + 1 is unprocessed or failed, and Y indicates that it has been processed successfully.
3. When the application server reads the file again for processing, it first queries the processing state of each data unit: if the state is N, the unit is processed; if the state is Y, the unit is skipped. This prevents data that has already been processed successfully from being processed again.
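Steps 1 to 3 of Fig. 6 can be combined into one pass function that re-reads the file, keeps the batch number n in memory, and derives each unit's flag key as pnum × (n - 1) + 1. A minimal sketch under stated assumptions (a line-oriented text stream, an in-memory dict for the flags instead of redis, exceptions treated as failures); run_pass is a hypothetical name. Skipped batches are still read from the stream but are not executed.

```python
import io
from itertools import islice
from typing import Callable, Iterable


def run_pass(lines: Iterable[str], pnum: int, flags: dict[int, str],
             execute: Callable[[list[str]], None]) -> dict[int, str]:
    """One read of the file in batches of pnum lines, following steps 1 to 3."""
    it = iter(lines)
    n = 0  # current batch number, kept in application-server memory
    while True:
        batch = list(islice(it, pnum))
        if not batch:
            break
        n += 1
        key = pnum * (n - 1) + 1  # first line number of batch n
        if flags.get(key, "N") == "Y":
            continue  # step 3: already processed, skip without executing
        try:
            execute(batch)
        except Exception:
            continue  # flag stays 'N' for the next pass
        flags[key] = "Y"  # step 2: mark the batch as processed successfully
    return flags


seen: list[str] = []


def execute(batch: list[str]) -> None:
    # Simulated business logic: the batch starting at row4 fails the first time.
    seen.append(batch[0].strip())
    if seen[-1] == "row4" and seen.count("row4") == 1:
        raise RuntimeError("simulated failure")


text = "".join(f"row{i}\n" for i in range(1, 8))
flags = {1: "N", 4: "N", 7: "N"}
run_pass(io.StringIO(text), 3, flags, execute)  # first read of the file
run_pass(io.StringIO(text), 3, flags, execute)  # second read retries only row4..row6
print(flags, seen)  # {1: 'Y', 4: 'Y', 7: 'Y'} ['row1', 'row4', 'row7', 'row4']
```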
The processing method of the big data file provided by the embodiment of the application has the following advantages:
When processing a big data file, the application server can flexibly specify the number of lines processed per pass, which avoids memory overflow caused by an oversized file or data volume. When processing of part of the data fails, the failed data lines can be read again without reprocessing the lines that have already been processed successfully. This improves the robustness and compatibility of the system, as well as the flexibility and efficiency with which it processes files.
The processing method of the big data file provided by the embodiment of the present application is described above, and the processing apparatus of the big data file provided by the embodiment of the present application will be described below.
In the embodiment of the present application, functional modules of the processing apparatus for big data files may be divided according to the method embodiments, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a form of hardware or a form of a software functional module. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation. The following description will be given by taking an example in which each functional module is divided by using a corresponding function.
Fig. 7 is a first schematic structural diagram of a device for processing a big data file according to an embodiment of the present application. As shown in fig. 7, the apparatus 700 for processing a large data file provided in this embodiment includes: a receiving module 701 and a processing module 702.
A receiving module 701, configured to receive a data processing task and a data configuration file, where the data processing task is used to process a big data text file, and the data configuration file is used to indicate a method for performing block processing, recording, and updating on the big data text file;
and the processing module 702 is configured to read and execute the to-be-processed data of the big data text file according to the data configuration file, and record a data processing state until all the to-be-processed data in the big data text file are successfully processed.
In an optional implementation of this embodiment, the processing module 702 is configured to read the data to be processed of the big data text file from the redis middleware according to the data configuration file.
In an optional implementation of this embodiment, the data configuration file includes configuration information specifying the number of lines of the big data text file processed in a single pass; the processing module 702 is configured to:
segmenting the data to be processed of the big data text file according to the single processing line number to obtain a plurality of data units to be processed, wherein each data unit to be processed comprises M lines of data to be processed, and M is a positive integer;
and sequentially reading and executing the plurality of data units to be processed, and simultaneously recording the data processing state of each data unit to be processed until all the data units to be processed are successfully processed.
Fig. 8 is a schematic structural diagram of a processing apparatus for big data files according to an embodiment of the present application. On the basis of the apparatus shown in fig. 7, as shown in fig. 8, the apparatus 700 for processing a large data file provided in this embodiment further includes: a storage module 703 and an update module 704.
In an optional implementation of this embodiment, the data configuration file includes configuration information for recording the single-pass processing state.
A processing module 702, configured to obtain, according to the configuration information for recording the single processing state, an initial data processing flag key value pair corresponding to each to-be-processed data unit, where the initial data processing flag key value pair corresponding to each to-be-processed data unit indicates that the processing state of the to-be-processed data unit is unprocessed;
the storage module 703 is configured to store the initial data processing flag key value pairs corresponding to the multiple data units to be processed in a database.
In an optional implementation of this embodiment, the processing module 702 is configured to:
reading a data processing marking key value pair corresponding to a first data unit to be processed from a redis middleware, and if the data processing marking key value pair indicates that the processing state of the first data unit to be processed is processing failure or unprocessed, acquiring and executing data to be processed in the first data unit to be processed;
the first unit of data to be processed is any one of the plurality of units of data to be processed.
In an optional implementation of this embodiment, the processing module 702 is configured to: if the data processing flag key-value pair corresponding to the first data unit to be processed indicates that the first data unit has been processed successfully, skip the first data unit and read the data processing flag key-value pair of the next data unit to be processed.
In an optional implementation of this embodiment, if the first data unit to be processed is processed successfully, the update module 704 is configured to update the data processing flag key-value pair corresponding to the first data unit, and the updated key-value pair indicates that the first data unit has been processed successfully.
In an optional implementation of this embodiment, the storage module 703 is configured to store the initial data processing flag key-value pairs corresponding to the plurality of data units to be processed into the redis middleware, using the file name of the big data text file as the key and those initial key-value pairs as the value.
The processing apparatus for big data files provided in this embodiment may execute the technical solution of any of the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 9 is a hardware structure diagram of an electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device 900 provided in this embodiment includes:
a memory 901;
a processor 902; and
a computer program;
the computer program is stored in the memory 901 and configured to be executed by the processor 902 to implement the technical solution of any one of the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Alternatively, the memory 901 may be separate or integrated with the processor 902. When the memory 901 is a separate device from the processor 902, the electronic device 900 further comprises: a bus 903 for connecting the memory 901 and the processor 902.
The present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by the processor 902 to implement the technical solution of any one of the foregoing method embodiments.
An embodiment of the present application provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the technical solutions of any of the foregoing method embodiments.
An embodiment of the present application further provides a chip, including: a processing module and a communication interface, the processing module being capable of performing the solution of any of the method embodiments described above.
Further, the chip further includes a storage module (e.g., a memory), where the storage module is configured to store instructions, and the processing module is configured to execute the instructions stored in the storage module, and the execution of the instructions stored in the storage module causes the processing module to execute the technical solution of any one of the foregoing method embodiments.
It should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules within the processor.
The memory may include a high-speed RAM and may further include a non-volatile memory (NVM), such as at least one disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disks. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present disclosure as defined by the appended claims.

Claims (12)

1. A method for processing big data files is characterized by comprising the following steps:
receiving a data processing task and a data configuration file, wherein the data processing task is used for processing a big data text file, and the data configuration file is used for indicating a method for carrying out block processing, recording and updating on the big data text file;
and reading and executing the data to be processed of the big data text file according to the data configuration file, and simultaneously recording the data processing state until all the data to be processed in the big data text file are successfully processed.
2. The method according to claim 1, wherein the reading the data to be processed of the big data text file according to the data configuration file comprises:
reading the data to be processed of the big data text file from the redis middleware according to the data configuration file.
3. The method of claim 1 or 2, wherein the data configuration file comprises configuration information for a single processing line number for the big data text file; the reading and executing the data to be processed of the big data text file according to the data configuration file, and simultaneously recording the data processing state until all the data to be processed in the big data text file are successfully processed, including:
segmenting the data to be processed of the big data text file according to the single processing line number to obtain a plurality of data units to be processed, wherein each data unit to be processed comprises M lines of data to be processed, and M is a positive integer;
and sequentially reading and executing the plurality of data units to be processed, and simultaneously recording the data processing state of each data unit to be processed until all the data units to be processed are successfully processed.
4. The method of claim 3, wherein the data configuration file includes configuration information for recording a single processing state; the method further comprises the following steps:
acquiring an initial data processing marking key value pair corresponding to each data unit to be processed according to the configuration information for recording the single processing state, wherein the initial data processing marking key value pair corresponding to each data unit to be processed indicates that the processing state of the data unit to be processed is unprocessed;
and storing the initial data processing marking key value pairs corresponding to the plurality of data units to be processed into a database.
5. The method of claim 3, wherein sequentially reading and executing the plurality of data units to be processed comprises:
reading a data processing marking key value pair corresponding to a first data unit to be processed from a redis middleware, and if the data processing marking key value pair indicates that the processing state of the first data unit to be processed is processing failure or unprocessed, acquiring and executing data to be processed in the first data unit to be processed;
the first unit of data to be processed is any one of the plurality of units of data to be processed.
6. The method of claim 5, further comprising:
if the data processing flag key-value pair corresponding to the first data unit to be processed indicates that the processing state of the first data unit to be processed is successful, skipping the first data unit to be processed, and reading the data processing flag key-value pair of the next data unit to be processed.
7. The method of claim 5, further comprising:
if the first data unit to be processed is processed successfully, updating the data processing flag key-value pair corresponding to the first data unit to be processed, wherein the updated data processing flag key-value pair indicates that the processing state of the first data unit to be processed is successful.
8. The method according to claim 4, wherein storing the initial data processing tag key-value pairs corresponding to the plurality of data units to be processed in a database comprises:
storing the initial data processing flag key-value pairs corresponding to the plurality of data units to be processed into the redis middleware, by taking the file name of the big data text file as a key and the initial data processing flag key-value pairs corresponding to the plurality of data units to be processed as values.
9. A device for processing big data files, comprising:
a receiving module, configured to receive a data processing task and a data configuration file, wherein the data processing task is used for processing a big data text file, and the data configuration file is used for indicating methods for processing, recording and updating the big data text file;
and the processing module is used for reading and executing the data to be processed of the big data text file according to the data configuration file, and simultaneously recording the data processing state until all the data to be processed in the big data text file are successfully processed.
10. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, which computer program is executable by a processor to implement the method according to any one of claims 1-8.
12. A computer program product, characterized in that it comprises a computer program which, when being executed by a processor, carries out the method of any one of claims 1-8.
CN202111440255.6A 2021-11-30 2021-11-30 Method, device and equipment for processing big data file and storage medium Pending CN114116803A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111440255.6A CN114116803A (en) 2021-11-30 2021-11-30 Method, device and equipment for processing big data file and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111440255.6A CN114116803A (en) 2021-11-30 2021-11-30 Method, device and equipment for processing big data file and storage medium

Publications (1)

Publication Number Publication Date
CN114116803A (en) 2022-03-01

Family

ID=80368360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111440255.6A Pending CN114116803A (en) 2021-11-30 2021-11-30 Method, device and equipment for processing big data file and storage medium

Country Status (1)

Country Link
CN (1) CN114116803A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033184A (en) * 2018-06-27 2018-12-18 China Construction Bank Corporation Data processing method and device
CN110838071A (en) * 2019-11-05 2020-02-25 Taikang Insurance Group Co., Ltd. Policy data processing method and device and server
CN112579698A (en) * 2020-12-02 2021-03-30 JD Digital Technology Holdings Co., Ltd. Data synchronization method, device, gateway equipment and storage medium
CN112579606A (en) * 2020-12-24 2021-03-30 Ping An Puhui Enterprise Management Co., Ltd. Workflow data processing method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Huang Xiangping et al.: "A fast reading method for complex objects based on memory-mapped files", Computer Technology and Development, no. 03, 31 March 2020 (2020-03-31) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination