CN115421809A - File processing method and device, electronic equipment and computer readable medium - Google Patents

Info

Publication number
CN115421809A
Authority
CN
China
Prior art keywords
file
data
type
processing
metafile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211038917.1A
Other languages
Chinese (zh)
Inventor
戚军臣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202211038917.1A priority Critical patent/CN115421809A/en
Publication of CN115421809A publication Critical patent/CN115421809A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file processing method, a file processing device, electronic equipment and a computer readable medium, and relates to the technical field of big data processing. One embodiment of the method comprises: receiving a data file group and its corresponding metafile, wherein the data file group comprises a plurality of data files and the metafile records the metadata information of the data file group; calculating, according to the metafile, the type and the number of file processing processes that need to be started; generating, according to the metafile, an analysis code for processing the data files; and starting the file processing processes so that they run the analysis code. This embodiment can solve the technical problems of large development workload and low file processing efficiency.

Description

File processing method and device, electronic equipment and computer readable medium
Technical Field
The present invention relates to the field of big data processing technologies, and in particular, to a file processing method and apparatus, an electronic device, and a computer-readable medium.
Background
For data files delivered by an upstream system, the conventional approach is to hard-code the processing logic for each specific file type and file structure.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
When the type, number, file structure or data volume of the data files sent by the upstream system changes, a program developer has to rewrite the code, so the development workload is large and the file processing efficiency is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a file processing method and apparatus, so as to solve the technical problems of large development workload and low file processing efficiency.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a file processing method including:
receiving a data file group and a corresponding metafile thereof; wherein the data file group comprises a plurality of data files, and the metadata information of the data file group is recorded in the metafile;
calculating the type and the number of file processing processes needing to be started according to the metafile;
generating an analysis code for processing the data file according to the metafile;
and starting the file processing process to enable the file processing process to run the analysis codes.
Optionally, the metadata information of the data file group includes at least one of: the number of data files, the total data size, the file type, the fixed-length identification, the row and column separators, the field number, the field sequence and the field data type.
Optionally, calculating the type and number of file processing processes to be started according to the metafile includes:
judging whether the contents of the plurality of data files are matched with the contents of the metafile or not;
and if so, calculating the type and the number of the file processing processes needing to be started according to the number of the data files, the total data volume and the file type.
Optionally, calculating the type and the number of the file processing processes to be started according to the number, the total data size, and the file type of the data files, including:
and inputting the number, the total data volume and the file type of the data files into a trained process calculation model, thereby outputting N M-type processes needing to be started and the number of the processes needed by each data file.
Optionally, before receiving the data file group and the corresponding metafile, the method further includes:
and training a neural network model in a supervised manner based on the number, total data volume and file type of the data files of the sample data file group and the type and number of file processing processes needing to be started for processing the data files of the sample data file group, so as to obtain a process calculation model through training.
Optionally, generating, according to the metafile, an analysis code for processing the data file, including:
and generating an analysis code for processing the data file according to the file type and the field sequence of the data file.
Optionally, starting the file processing process to make the file processing process run the parsing code, including:
and starting N M-type file processing processes, and allocating file processing processes to each data file according to the number of processes required by that data file, so that the file processing processes run the analysis code.
In addition, according to another aspect of an embodiment of the present invention, there is provided a file processing apparatus including:
the receiving module is used for receiving the data file group and the corresponding metafile; the data file group comprises a plurality of data files, and the metadata information of the data file group is recorded in the metafile;
the computing module is used for computing the type and the number of the file processing processes needing to be started according to the metafile;
the generating module is used for generating an analysis code for processing the data file according to the metafile;
and the parsing module is used for starting the file processing process so as to enable the file processing process to run the analysis code.
Optionally, the metadata information of the data file group includes at least one of: the number of data files, the total data size, the file type, the fixed-length identification, the row and column separators, the field number, the field sequence and the field data type.
Optionally, the computing module is further configured to:
judging whether the contents of the plurality of data files are matched with the contents of the metafile or not;
and if so, calculating the type and the number of the file processing processes needing to be started according to the number of the data files, the total data volume and the file type.
Optionally, the computing module is further configured to:
and inputting the number, the total data volume and the file type of the data files into a trained process calculation model, thereby outputting N M-type processes to be started and the number of processes required by each data file.
Optionally, a training module is further included for:
and training a neural network model in a supervised manner based on the number, total data volume and file type of the data files of the sample data file group and the type and number of file processing processes needing to be started for processing the data files of the sample data file group, so as to obtain a process calculation model through training.
Optionally, the generating module is further configured to:
and generating an analysis code for processing the data file according to the file type and the field sequence of the data file.
Optionally, the parsing module is further configured to:
and starting N M-type file processing processes, and allocating file processing processes to each data file according to the number of processes required by that data file, so that the file processing processes run the analysis code.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method of any of the embodiments described above.
According to another aspect of the embodiments of the present invention, there is also provided a computer readable medium, on which a computer program is stored, which when executed by a processor implements the method of any of the above embodiments.
According to another aspect of the embodiments of the present invention, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: a data file group and its corresponding metafile are received; the type and the number of the file processing processes needing to be started are then calculated according to the metafile, and the analysis code for processing the data files is generated; finally, the file processing processes are started to run the analysis code. This technical means solves the technical problems of large development workload and low file processing efficiency in the prior art. The embodiment of the invention can adaptively process data files of various types, structures and sizes without manual intervention, so that program developers no longer need to repeatedly develop file processing programs, and file processing efficiency is markedly improved.
Further effects of the optional implementations above are described below in connection with the specific embodiments.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a schematic diagram of the main flow of a file processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main flow of a file processing method according to a referenceable embodiment of the present invention;
FIG. 3 is a schematic diagram of the main flow of a file processing method according to another referenceable embodiment of the present invention;
FIG. 4 is a schematic diagram of the main flow of a file processing method according to still another referenceable embodiment of the present invention;
FIG. 5 is a schematic diagram of the main modules of a file processing apparatus according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 7 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present invention, the acquisition, storage, use and processing of data comply with the relevant national laws and regulations.
Fig. 1 is a schematic diagram of a main flow of a file processing method according to an embodiment of the present invention. As an embodiment of the present invention, as shown in fig. 1, the file processing method may include:
Step 101, receiving a data file group and a corresponding metafile thereof; the data file group comprises a plurality of data files, and the metadata information of the data file group is recorded in the metafile.
First, the data file groups and their corresponding metafiles transmitted by an upstream system are received. Each data file group comprises 1 to n data files and has one corresponding metafile, and the metafile records the metadata information of the data file group.
Optionally, the metadata information of the data file group includes at least one of: the number of data files, the total data amount, the file type, the fixed-length identifier, the row and column separator, the field number, the field sequence, the field data type (such as date, character string, numerical value and the like) and other data file structure information. Optionally, the file type of each data file in the data file group is the same, so that only one data file type is recorded in the metafile. The file type may be a structured file composed of lines and columns, such as csv, dat, txt, and the like, or a semi-structured file such as json, xml, and the like.
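The metadata items listed above can be pictured as a small record. The following is a minimal sketch only: the patent does not specify how the metafile is serialized, so the JSON encoding, the `MetaFile` class, and every field name here are hypothetical and chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetaFile:
    """Metadata for one data file group (all field names are hypothetical)."""
    file_count: int                   # number of data files in the group
    total_bytes: int                  # total data volume
    file_type: str                    # e.g. "csv", "dat", "txt", "json", "xml"
    fixed_length: bool                # fixed-length identifier
    row_separator: Optional[str]      # row separator, if any
    column_separator: Optional[str]   # column separator, if any
    field_names: List[str]            # field sequence, in order
    field_types: List[str]            # e.g. "date", "string", "number"

    @property
    def field_count(self) -> int:
        return len(self.field_names)


def load_metafile(text: str) -> MetaFile:
    """Parse a JSON-encoded metafile (the JSON encoding is an assumption)."""
    meta = MetaFile(**json.loads(text))
    if len(meta.field_names) != len(meta.field_types):
        raise ValueError("field_names and field_types must have the same length")
    return meta


# Illustrative metafile content; the values are made up for this sketch.
EXAMPLE_METAFILE = json.dumps({
    "file_count": 2,
    "total_bytes": 2048,
    "file_type": "csv",
    "fixed_length": False,
    "row_separator": "\n",
    "column_separator": "|",
    "field_names": ["id", "name", "opened_on"],
    "field_types": ["number", "string", "date"],
})
```

Because all files in a group share one file type, a single `file_type` entry is enough for the whole group.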
Step 102, calculating the type and the number of the file processing processes needing to be started according to the metafile.
In this step, the type and number of file processing processes that need to be started are calculated from the metadata information of the data file group recorded in the metafile. The type of the file processing process is related to the file type of the data file, such as: the process type of processing xml files, the process type of processing json files, the process type of processing dat files, the process type of processing csv files, the process type of processing txt files and the like.
Optionally, step 102 may comprise: judging whether the contents of the plurality of data files match the contents of the metafile; and if so, calculating the type and the number of the file processing processes needing to be started according to the number of the data files, the total data volume and the file type. To improve file processing efficiency and accuracy, each data file in the data file group is checked first. Specifically, the content of the metafile is parsed, and it is determined whether the content of each data file in the data file group matches it, for example whether the number of data files, total data volume, file type, fixed-length identifier, row and column separators, field number, field sequence and field data type conform to what the metafile records. If they match, the type and number of file processing processes that need to be started are calculated according to the number of data files, the total data volume and the file type. If they do not match, the unmatched information is written into a feedback file, which prevents dirty data from polluting the downstream system.
Optionally, calculating the type and the number of the file processing processes to be started according to the number of the data files, the total data volume and the file type may include: inputting the number, the total data volume and the file type of the data files into a trained process calculation model, thereby outputting the N M-type processes needing to be started and the number of processes needed by each data file. To calculate the type and the number of the file processing processes accurately, a trained process calculation model can be used. Specifically, three features (the number of data files, the total data volume and the file type) are input into the process calculation model, and the model outputs the type and the number of the file processing processes to be started (such as N M-type processes) together with the number of processes required by each data file. Optionally, the process calculation model may be obtained by training a neural network model, such as a fully connected neural network (FCN).
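The check described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: it covers only two of the listed criteria (number of files and field count per row), assumes delimiter-separated text files held in memory, and the names `check_group` and `write_feedback` are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def check_group(files: Dict[str, str], expected_count: int,
                column_separator: str, expected_fields: int) -> List[str]:
    """Return mismatch messages; an empty list means the group passed."""
    problems: List[str] = []
    if len(files) != expected_count:
        problems.append(
            f"file count: expected {expected_count}, got {len(files)}")
    for name, content in sorted(files.items()):
        reader = csv.reader(io.StringIO(content), delimiter=column_separator)
        for line_no, row in enumerate(reader, start=1):
            if len(row) != expected_fields:
                problems.append(
                    f"{name}:{line_no}: expected {expected_fields} fields, "
                    f"got {len(row)}")
    return problems


def write_feedback(problems: List[str], stream: TextIO) -> None:
    """Write one mismatch per line to the feedback file (format is assumed)."""
    for problem in problems:
        stream.write(problem + "\n")
```

Only when `check_group` returns an empty list would processing continue; otherwise the messages go to the feedback file and the group is not passed downstream.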
Step 103, generating an analysis code for processing the data file according to the metafile.
Optionally, step 103 may comprise: generating an analysis code for processing the data file according to the file type and the field sequence of the data file. Specifically, the file type and the field sequence of the data file are extracted from the metafile, and the analysis code for processing the data file is then generated from them.
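One way to picture generating code from the file type and field sequence is to emit the source of a small parsing function and compile it at run time. The sketch below is an assumption-laden illustration: the patent does not say what language the generated code is in or how it is produced, the function names are hypothetical, and only delimiter-separated types are handled.

```python
from typing import Callable, Dict, List


def generate_parser_source(file_type: str, column_separator: str,
                           field_names: List[str]) -> str:
    """Emit Python source for a line parser (delimited text only in this sketch)."""
    if file_type not in ("csv", "dat", "txt"):
        raise ValueError(f"unsupported file type in this sketch: {file_type}")
    lines = [
        "def parse_line(line):",
        f"    parts = line.rstrip('\\n').split({column_separator!r})",
        f"    if len(parts) != {len(field_names)}:",
        "        raise ValueError('unexpected field count')",
        f"    return dict(zip({field_names!r}, parts))",
    ]
    return "\n".join(lines) + "\n"


def compile_parser(source: str) -> Callable[[str], Dict[str, str]]:
    """Compile the generated source and return the parse_line function."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable here: the source is generated locally
    return namespace["parse_line"]
```

For example, `compile_parser(generate_parser_source("csv", "|", ["id", "name"]))` yields a function that maps the line `7|alice` to a dictionary keyed by the field sequence. A change in field order then needs only a new metafile, not new hand-written code.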
Step 104, starting the file processing process to enable the file processing process to run the analysis code.
After the analysis code for processing the data file is generated, N M-type file processing processes are started. These processes run the analysis code, which parses the data file and converts it into the data structure form required by a downstream module, such as storing the data in a relational database, generating a Kafka message, or generating a Java object. It should be noted that these file processing processes are executed in parallel, which can improve the file processing efficiency.
Optionally, step 104 may include: starting N M-type file processing processes, and allocating file processing processes to each data file according to the number of processes required by that data file, so that the file processing processes run the analysis code. In the embodiment of the present invention, N M-type file processing processes are started according to the calculation result of step 102, and each data file is allocated the corresponding number of M-type file processing processes according to the number of processes it requires, so that the file processing processes run the analysis code and thereby complete the processing of the data file.
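The allocation step can be sketched as a plain bookkeeping function. This is illustrative only: the function name is hypothetical, processes are represented by integer indices rather than real operating system processes, and the sketch assumes that the per-file process counts sum to N, which the text does not state explicitly.

```python
from typing import Dict, List


def allocate_processes(total_processes: int,
                       required: Dict[str, int]) -> Dict[str, List[int]]:
    """Assign process indices 0..N-1 to files by their required counts."""
    # Assumption of this sketch: the per-file counts add up to N exactly.
    if sum(required.values()) != total_processes:
        raise ValueError("per-file process counts must sum to N")
    allocation: Dict[str, List[int]] = {}
    next_id = 0
    for name, count in required.items():
        allocation[name] = list(range(next_id, next_id + count))
        next_id += count
    return allocation
```

With N = 4 and requirements of one process for `a.csv` and three for `b.csv`, the first file receives process 0 and the second receives processes 1 to 3.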
According to the various embodiments described above, it can be seen that the technical problems of large development workload and low file processing efficiency in the prior art are solved by the technical means of receiving the data file group and the corresponding metafile thereof, then calculating the type and the number of the file processing processes to be started according to the metafile, generating the analysis codes for processing the data file, and finally starting the file processing processes to run the analysis codes. The embodiment of the invention can adaptively process data files of various types, structures and sizes without manual intervention, thereby avoiding a program developer from repeatedly developing and processing programs of the files and obviously improving the file processing efficiency.
Fig. 2 is a schematic diagram of the main flow of a file processing method according to a referenceable embodiment of the present invention. As still another embodiment of the present invention, as shown in fig. 2, the file processing method may include:
step 201, acquiring the number, total data size and file type of the data files of the sample data file group, and the type and number of file processing processes to be started for processing the data files of the sample data file group.
Step 202, training a neural network model in a supervised manner based on the number, total data volume and file type of the data files of the sample data file group and the type and number of file processing processes needing to be started for processing the data files of the sample data file group, so as to obtain a process calculation model through training.
The process calculation model can be obtained by training a neural network model. Specifically, a plurality of sample data file groups are obtained, each comprising a plurality of data files. For each sample data file group, three features are extracted (the number of data files, the total data volume and the file type) and a label is attached, the label being the type and the number of file processing processes used to process that sample data file group. The neural network model is then trained in a supervised manner on the extracted features and the attached labels, which yields the process calculation model.
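Preparing one training sample from the three features and the label might look like the sketch below. The encoding is an assumption made for illustration: the patent does not say how the file type is represented numerically, so the one-hot encoding, the gigabyte scaling, the `FILE_TYPES` list, and the function names are all hypothetical.

```python
from typing import List, Tuple

# File types named in the description; the ordering is arbitrary.
FILE_TYPES = ["csv", "dat", "txt", "json", "xml"]


def encode_features(file_count: int, total_bytes: int,
                    file_type: str) -> List[float]:
    """Turn the three features into a numeric vector (encoding is assumed)."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type}")
    one_hot = [1.0 if file_type == t else 0.0 for t in FILE_TYPES]
    # Scale bytes to gigabytes so the feature magnitudes are comparable.
    return [float(file_count), total_bytes / 1e9] + one_hot


def build_sample(file_count: int, total_bytes: int, file_type: str,
                 process_type: str, process_count: int
                 ) -> Tuple[List[float], Tuple[int, int]]:
    """Pair a feature vector with its label: (process type index, process count)."""
    features = encode_features(file_count, total_bytes, file_type)
    label = (FILE_TYPES.index(process_type), process_count)
    return features, label
```

A list of such `(features, label)` pairs is the kind of input a supervised training loop would consume.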
Optionally, the model also needs to be evaluated and fine-tuned to obtain the final process calculation model. Optionally, the neural network model may be a fully connected neural network (FCN), thereby improving reliability and robustness of the model.
Step 203, receiving a data file group and a corresponding metafile thereof; the data file group comprises a plurality of data files, and the metadata information of the data file group is recorded in the metafile.
Each data file group comprises 1-n data files, each data file group is provided with a corresponding metafile, and the metafile records the metadata information of the data file group. Optionally, the metadata information of the data file group includes data file structure information such as the number of data files, total data amount, file type, fixed length identifier, row and column separator, field number, field sequence, field data type (such as date, character string, numerical value, etc.). Moreover, the file type of each data file in the data file group is the same.
Step 204, calculating the type and the number of the file processing processes needing to be started according to the metafile.
The type of file handling process is related to the file type of the data file, such as: the process type of processing xml files, the process type of processing json files, the process type of processing dat files, the process type of processing csv files, the process type of processing txt files and the like.
Step 205, generating an analysis code for processing the data file according to the metafile.
Specifically, the file type and the field sequence of the data file are extracted from the metafile, and then an analysis code for processing the data file is generated according to the file type and the field sequence of the data file.
Step 206, starting the file processing process, so that the file processing process runs the analysis code.
After the analysis codes for processing the data files are generated, file processing processes are started according to the calculation result of the step 204, so that the file processing processes run the analysis codes and analyze and convert the data files into a data structure form required by a downstream module.
In addition, the specific implementation of the file processing method in this referenceable embodiment has been described in detail in the file processing method above, so the repeated content is not described again.
Fig. 3 is a schematic diagram of the main flow of a file processing method according to another referenceable embodiment of the present invention. As another embodiment of the present invention, as shown in fig. 3, the file processing method may include:
Step 301, receiving a data file group and a corresponding metafile; the data file group comprises a plurality of data files, and the metadata information of the data file group is recorded in the metafile.
Receiving data file groups and corresponding metafiles thereof transmitted by an upstream system, wherein each data file group comprises 1-n data files, each data file group has one corresponding metafile, and the metafile records the metadata information of the data file group.
Step 302, judging whether the contents of the plurality of data files are matched with the contents of the metafile; if yes, go to step 303; if not, go to step 306.
In order to improve the file processing efficiency and the file processing accuracy and avoid polluting a downstream system by dirty data, each data file in the data file group needs to be checked first, and the type and the number of file processing processes needing to be started are calculated after the check is passed.
Specifically, the content of the metafile is analyzed, whether the content of each data file in the data file group is matched with the content of the metafile is judged, for example, whether the number, the total data amount, the file type, the fixed-length identifier, the row and column separator, the field number, the field sequence, the field data type and the like of the data files accord with the content recorded in the metafile, and if so, the type and the number of the file processing processes needing to be started are calculated according to the number, the total data amount and the file type of the data files. If not, the unmatched information is written into the feedback file.
Step 303, calculating the type and number of the file processing processes to be started according to the number, total data volume and file type of the data files.
Step 304, generating an analysis code for processing the data file according to the file type and the field sequence of the data file.
Step 305, starting the file processing process, so that the file processing process runs the analysis code.
Step 306, write the unmatched information into the feedback file.
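The branch between steps 302 to 306 can be summarized as a short control-flow sketch. Everything here is hypothetical scaffolding: `process_group` and its five callables are stand-ins for the steps, passed in as parameters so that the sketch makes no claim about how each step is implemented.

```python
from typing import Any, Callable, List


def process_group(validate: Callable[[], List[str]],
                  plan: Callable[[], Any],
                  generate: Callable[[], Any],
                  launch: Callable[[Any, Any], None],
                  report: Callable[[List[str]], None]) -> str:
    """Run the flow of steps 302-306 using caller-supplied step functions."""
    problems = validate()           # step 302: compare files with the metafile
    if problems:
        report(problems)            # step 306: write mismatches to feedback
        return "rejected"
    process_plan = plan()           # step 303: type and number of processes
    parser = generate()             # step 304: generate the analysis code
    launch(process_plan, parser)    # step 305: start processes, run the code
    return "processed"
```

The point of the early return is that a group failing validation never reaches process calculation or code generation.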
In addition, the specific implementation of the file processing method in this referenceable embodiment has been described in detail in the file processing method above, so the repeated content is not described again.
Fig. 4 is a schematic diagram of the main flow of a file processing method according to still another referenceable embodiment of the present invention. As still another embodiment of the present invention, as shown in fig. 4, the file processing method may include:
Step 401, receiving a data file group and a corresponding metafile thereof; the data file group comprises a plurality of data files, and the metadata information of the data file group is recorded in the metafile.
Step 402, judging whether the contents of the plurality of data files are matched with the contents of the metafile; if yes, go to step 403; if not, go to step 407.
Step 403, inputting the number, total data volume and file type of the data files into the trained process calculation model, so as to output the N M-type processes to be started and the number of processes required by each data file.
Step 404, generating an analysis code for processing the data file according to the file type and the field sequence of the data file.
Step 405, starting N M-type file processing processes.
Step 406, allocating file processing processes to each data file according to the number of processes required by that data file, so that the file processing processes run the analysis code.
N M-type file processing processes are started according to the calculation result of step 403, and each data file is allocated the corresponding number of M-type file processing processes according to the number of processes it requires, so that the file processing processes run the analysis code and thereby complete the processing of the data file.
Step 407, write the unmatched information into the feedback file.
In addition, the specific implementation of the file processing method in this referenceable embodiment has been described in detail in the file processing method above, so the repeated content is not described again.
Fig. 5 is a schematic diagram of the main modules of a file processing apparatus according to an embodiment of the present invention. As shown in fig. 5, the file processing apparatus 500 includes a receiving module 501, a calculating module 502, a generating module 503, and a parsing module 504; the receiving module 501 is configured to receive a data file group and a corresponding metafile thereof; the data file group comprises a plurality of data files, and the metadata information of the data file group is recorded in the metafile; the calculating module 502 is configured to calculate the type and number of the file processing processes to be started according to the metafile; the generating module 503 is configured to generate an analysis code for processing the data file according to the metafile; the parsing module 504 is configured to start the file processing process, so that the file processing process runs the analysis code.
Optionally, the metadata information of the data file group includes at least one of: the number of data files, the total data size, the file type, the fixed-length identification, the row and column separators, the field number, the field sequence and the field data type.
Optionally, the calculating module 502 is further configured to:
determine whether the contents of the plurality of data files match the contents of the metafile;
if so, calculate the type and number of file processing processes that need to be started according to the number of data files, the total data volume, and the file type.
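As a non-limiting illustration, the matching check and the feedback written on a mismatch can be sketched as follows. The specific checks (file count and per-line field count), the function names, and the message format are assumptions; the source only states that the data file contents are compared with the metafile contents and that mismatch information is written to a feedback file.

```python
import io
from typing import Dict, List, TextIO


def find_mismatches(expected_count: int, expected_fields: int,
                    column_separator: str, files: Dict[str, str]) -> List[str]:
    """Compare data file contents against values taken from the metafile."""
    problems: List[str] = []
    if len(files) != expected_count:
        problems.append(f"expected {expected_count} data files, received {len(files)}")
    for name, content in files.items():
        for line_no, line in enumerate(content.splitlines(), start=1):
            found = len(line.split(column_separator))
            if found != expected_fields:
                problems.append(
                    f"{name}:{line_no}: expected {expected_fields} fields, found {found}")
    return problems


def check_or_report(problems: List[str], feedback: TextIO) -> bool:
    """Write each mismatch to the feedback file; return True when everything matched."""
    for problem in problems:
        feedback.write(problem + "\n")
    return not problems


files = {"a.csv": "1,alice\n2,bob", "b.csv": "3,carol,extra"}
feedback = io.StringIO()   # stands in for the feedback file
problems = find_mismatches(2, 2, ",", files)
matched = check_or_report(problems, feedback)
```

Only when `matched` is true would the process calculation proceed; otherwise the feedback file carries the mismatch information.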
Optionally, the calculating module 502 is further configured to:
input the number of data files, the total data volume, and the file type into a trained process calculation model, which outputs the N type-M processes to be started and the number of processes required by each data file.
Optionally, the apparatus further includes a training module configured to:
train a neural network model in a supervised manner, based on the number of data files, the total data volume, and the file type of a sample data file group and on the type and number of file processing processes that need to be started to process the data files of that sample data file group, thereby obtaining the process calculation model.
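As a non-limiting illustration, supervised training of a process calculation model can be sketched with a deliberately minimal stand-in: a single linear neuron fitted by gradient descent, not the neural network of the embodiment, whose architecture the source does not specify. The feature scaling, the sample values, and the assumption that one such model is trained per file type are all hypothetical.

```python
from typing import List, Tuple

# ([file_count / 10, total_gigabytes / 10], number of processes to start)
Sample = Tuple[List[float], float]


def predict(weights: List[float], bias: float, features: List[float]) -> float:
    return sum(w * x for w, x in zip(weights, features)) + bias


def mean_squared_error(weights: List[float], bias: float, samples: List[Sample]) -> float:
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in samples) / len(samples)


def train(samples: List[Sample], epochs: int = 500, lr: float = 0.1):
    """Full-batch gradient descent on the mean squared error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in samples:
            err = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical labelled sample data file groups.
samples: List[Sample] = [
    ([0.2, 0.1], 2.0), ([0.4, 0.4], 4.0), ([0.8, 0.6], 7.0), ([1.0, 1.0], 10.0),
]
initial_loss = mean_squared_error([0.0, 0.0], 0.0, samples)
weights, bias = train(samples)
final_loss = mean_squared_error(weights, bias, samples)
```

After training, `predict(weights, bias, features)` would be rounded to an integer process count for a new data file group.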
Optionally, the generating module 503 is further configured to:
generate parsing code for processing the data files according to the file type and the field sequence of the data files.
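As a non-limiting illustration, generating parsing code from the file type and the field sequence can be sketched as follows. The function name `generate_parsing_code`, the mapping from file type to separator, and the choice of emitting Python source text are assumptions; the source does not state the language or form of the generated code.

```python
from typing import Any, Dict, List


def generate_parsing_code(file_type: str, field_sequence: List[str]) -> str:
    """Return source text for a line parser tailored to one data file group."""
    separators = {"csv": ",", "tsv": "\t"}   # hypothetical file type table
    sep = separators[file_type]
    return (
        "def parse_line(line):\n"
        f"    values = line.rstrip('\\n').split({sep!r})\n"
        f"    return dict(zip({field_sequence!r}, values))\n"
    )


# Generate the code once, then load it the way a file processing process might.
source = generate_parsing_code("csv", ["id", "name"])
namespace: Dict[str, Any] = {}
exec(source, namespace)
record = namespace["parse_line"]("7,alice\n")
```

Because the separator and field names are baked into the generated text, the same generator serves data file groups of different structure without hand-written parsers.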
Optionally, the parsing module 504 is further configured to:
start N type-M file processing processes, and allocate file processing processes to each data file according to the number of processes that data file requires, so that the file processing processes run the parsing code.
It should be noted that the specific implementation of the file processing apparatus of the present invention has already been described in detail in the file processing method above, so the repeated content is not described again here.
Fig. 6 shows an exemplary system architecture 600 to which the file processing method or the file processing apparatus of an embodiment of the present invention can be applied.
As shown in Fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 provides the medium for communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 601, 602, 603 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 605 may be a server providing various services, such as a back-end management server (by way of example only) that supports shopping websites browsed by users on the terminal devices 601, 602, 603. The back-end management server can analyze and otherwise process received data, such as an article information query request, and feed the processing result back to the terminal devices.
It should be noted that the file processing method provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the file processing apparatus is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to Fig. 7, a block diagram of a computer system 700 suitable for implementing a terminal device of an embodiment of the present invention is shown. The terminal device shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display, such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When executed by the Central Processing Unit (CPU) 701, the computer program performs the above-described functions defined in the system of the present invention.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer programs according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may, for example, be described as: a processor including a receiving module, a calculating module, a generating module, and a parsing module. The names of these modules do not, in some cases, limit the modules themselves.
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the device described in the above embodiments or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to implement the following method: receiving a data file group and its corresponding metafile, wherein the data file group comprises a plurality of data files and the metafile records the metadata information of the data file group; calculating, according to the metafile, the type and number of file processing processes that need to be started; generating, according to the metafile, parsing code for processing the data files; and starting the file processing processes so that the file processing processes run the parsing code.
As another aspect, an embodiment of the present invention further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the method described in any of the above embodiments.
According to the technical solution of the embodiments of the present invention, a data file group and its corresponding metafile are received; the type and number of file processing processes that need to be started are then calculated according to the metafile, and parsing code for processing the data files is generated; finally, the file processing processes are started to run the parsing code. These technical means solve the prior-art technical problems of heavy development workload and low file processing efficiency. The embodiments of the present invention can adaptively process data files of various types, structures, and sizes without manual intervention, which spares program developers from repeatedly developing file processing programs and significantly improves file processing efficiency.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (17)

1. A file processing method, comprising:
receiving a data file group and its corresponding metafile, wherein the data file group comprises a plurality of data files and the metafile records the metadata information of the data file group;
calculating, according to the metafile, the type and number of file processing processes that need to be started;
generating, according to the metafile, parsing code for processing the data files; and
starting the file processing processes so that the file processing processes run the parsing code.
2. The method of claim 1, wherein the metadata information of the data file group comprises at least one of: the number of data files, the total data volume, the file type, the fixed-length identification, the row and column separators, the number of fields, the field sequence, and the field data types.
3. The method of claim 2, wherein calculating the type and number of file processing processes to be started based on the metafile comprises:
determining whether the contents of the plurality of data files match the contents of the metafile; and
if so, calculating the type and number of file processing processes that need to be started according to the number of data files, the total data volume, and the file type.
4. The method according to claim 3, wherein calculating the type and number of file processing processes to be started according to the number of the data files, the total data volume and the file type comprises:
inputting the number of data files, the total data volume, and the file type into a trained process calculation model, which outputs the N type-M processes to be started and the number of processes required by each data file.
5. The method of claim 4, further comprising, prior to receiving the data file group and its corresponding metafile:
training a neural network model in a supervised manner, based on the number of data files, the total data volume, and the file type of a sample data file group and on the type and number of file processing processes that need to be started to process the data files of that sample data file group, thereby obtaining the process calculation model.
6. The method of claim 2, wherein generating parsing code for processing the data file from the metafile comprises:
generating parsing code for processing the data files according to the file type and the field sequence of the data files.
7. The method of claim 4, wherein starting the file processing processes so that the file processing processes run the parsing code comprises:
starting N type-M file processing processes, and allocating file processing processes to each data file according to the number of processes that data file requires, so that the file processing processes run the parsing code.
8. A file processing apparatus, comprising:
a receiving module configured to receive a data file group and its corresponding metafile, wherein the data file group comprises a plurality of data files and the metafile records the metadata information of the data file group;
a calculating module configured to calculate, according to the metafile, the type and number of file processing processes that need to be started;
a generating module configured to generate, according to the metafile, parsing code for processing the data files; and
a parsing module configured to start the file processing processes so that the file processing processes run the parsing code.
9. The apparatus of claim 8, wherein the metadata information of the data file group comprises at least one of: the number of data files, the total data volume, the file type, the fixed-length identification, the row and column separators, the number of fields, the field sequence, and the field data types.
10. The apparatus of claim 9, wherein the calculating module is further configured to:
determine whether the contents of the plurality of data files match the contents of the metafile; and
if so, calculate the type and number of file processing processes that need to be started according to the number of data files, the total data volume, and the file type.
11. The apparatus of claim 10, wherein the calculating module is further configured to:
input the number of data files, the total data volume, and the file type into a trained process calculation model, which outputs the N type-M processes to be started and the number of processes required by each data file.
12. The apparatus of claim 11, further comprising a training module configured to:
train a neural network model in a supervised manner, based on the number of data files, the total data volume, and the file type of a sample data file group and on the type and number of file processing processes that need to be started to process the data files of that sample data file group, thereby obtaining the process calculation model.
13. The apparatus of claim 9, wherein the generating module is further configured to:
generate parsing code for processing the data files according to the file type and the field sequence of the data files.
14. The apparatus of claim 11, wherein the parsing module is further configured to:
start N type-M file processing processes, and allocate file processing processes to each data file according to the number of processes that data file requires, so that the file processing processes run the parsing code.
15. An electronic device, comprising:
one or more processors;
a storage device configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
17. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211038917.1A CN115421809A (en) 2022-08-29 2022-08-29 File processing method and device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211038917.1A CN115421809A (en) 2022-08-29 2022-08-29 File processing method and device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN115421809A true CN115421809A (en) 2022-12-02

Family

ID=84200818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211038917.1A Pending CN115421809A (en) 2022-08-29 2022-08-29 File processing method and device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN115421809A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination