CN112231292A - File processing method and device, storage medium and computer equipment - Google Patents


Info

Publication number
CN112231292A
Authority
CN
China
Prior art keywords
file
original
information
files
data warehouse
Prior art date
Legal status
Pending
Application number
CN202010930089.7A
Other languages
Chinese (zh)
Inventor
郑艳涛
周一帆
庞少强
Current Assignee
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN202010930089.7A
Publication of CN112231292A
Current legal status: Pending

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers > G06F16/18 File system types > G06F16/182 Distributed file systems
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data > G06F16/22 Indexing; Data structures therefor; Storage structures > G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data > G06F16/24 Querying > G06F16/242 Query formulation > G06F16/2433 Query languages
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data > G06F16/25 Integrating or interfacing systems involving database management systems

Abstract

The invention provides a file processing method, a file processing device, a storage medium and computer equipment. The files concerned are files in a data warehouse tool that includes a node of a target type. The method comprises: acquiring an image file generated by the node of the target type; parsing the image file in combination with directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds; and merging the original files according to that information and a preset rule. The invention can automatically identify the files generated while the data warehouse tool runs and merge them in time.

Description

File processing method and device, storage medium and computer equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a file processing method and apparatus, a storage medium, and a computer device.
Background
A data warehouse tool can map structured data files to database tables, provide simple SQL (Structured Query Language) query functions, and convert SQL statements into distributed computing tasks for execution.
Data warehouse tools generally run on top of the Hadoop Distributed File System (HDFS), and a large number of small files are produced while they run. Small files may be generated when a data source is imported into the data warehouse tool, or when offline computations read the data tables of the data warehouse tool. Because processing a single file typically occupies one computing process or thread, a large number of small files consumes more computing resources; it is therefore necessary to process the files generated while the data warehouse tool runs.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the invention is to provide a file processing method, a file processing device, a storage medium and computer equipment that can automatically identify the files generated while a data warehouse tool runs and merge them in time.
To achieve the above object, an embodiment of the first aspect of the present invention provides a file processing method, where the file is a file in a data warehouse tool and the data warehouse tool includes a node of a target type. The method includes: acquiring an image file generated by the node of the target type; parsing the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds; and merging the original files according to the information about the original files and a preset rule.
In the file processing method provided by the embodiment of the first aspect of the present invention, an image file generated by a node of the target type is acquired; the image file is parsed in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds; and the original files are merged according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
To achieve the above object, an embodiment of the second aspect of the present invention provides a file processing apparatus, where the file is a file in a data warehouse tool and the data warehouse tool includes a node of a target type. The file processing apparatus includes: an obtaining module, configured to obtain the image file generated by the node of the target type; a parsing module, configured to parse the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds; and a merging processing module, configured to merge the original files according to the information about the original files and a preset rule.
The file processing apparatus provided by the embodiment of the second aspect of the present invention acquires the image file generated by the node of the target type, parses the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds, and merges the original files according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
To achieve the above object, an embodiment of the third aspect of the present invention provides a non-transitory computer-readable storage medium. When instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to execute the file processing method provided by the embodiment of the first aspect of the present invention.
With the non-transitory computer-readable storage medium according to the embodiment of the third aspect of the present invention, an image file generated by a node of the target type is acquired; the image file is parsed in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds; and the original files are merged according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
To achieve the above object, an embodiment of the fourth aspect of the present invention provides a computer program product. When instructions of the computer program product are executed by a processor, a file processing method is performed, where the file is a file in a data warehouse tool and the data warehouse tool includes a node of a target type. The method includes: acquiring an image file generated by the node of the target type; parsing the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds; and merging the original files according to the information about the original files and a preset rule.
With the computer program product according to the embodiment of the fourth aspect of the present invention, an image file generated by a node of the target type is acquired; the image file is parsed in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds; and the original files are merged according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
An embodiment of the fifth aspect of the present invention further provides a computer device, including a housing, a processor, a memory, a circuit board, and a power supply circuit. The circuit board is disposed inside the space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit supplies power to each circuit or component of the computer device; the memory stores executable program code; and the processor reads the executable program code stored in the memory and runs the corresponding program to perform: acquiring an image file generated by the node of the target type; parsing the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds; and merging the original files according to the information about the original files and a preset rule.
In the computer device according to the embodiment of the fifth aspect of the present invention, the image file generated by the node of the target type is acquired; the image file is parsed in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds; and the original files are merged according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart illustrating a file processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application scenario according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a file processing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a file processing apparatus according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only, serve to explain the present invention, and are not to be construed as limiting it. On the contrary, the embodiments of the invention include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
Fig. 1 is a flowchart illustrating a file processing method according to an embodiment of the present invention.
This embodiment is described using the example in which the file processing method is configured in a file processing apparatus.
The file processing method in this embodiment may be configured in a file processing apparatus, and the file processing apparatus may be configured in a server or in an electronic device; this is not limited in the embodiments of the present application.
This embodiment takes the case where the file processing method is configured in an electronic device as an example.
The file is a file in a data warehouse tool. The data warehouse tool includes a node of a target type, which may be the NameNode node (the metadata management center).
It should be noted that the execution body in the embodiments of the present application may be, in terms of hardware, a central processing unit (CPU) in a server or electronic device, and, in terms of software, a related background service in the server or electronic device; this is not limited.
To solve the above technical problem, an embodiment of the present invention provides a file processing method in which an image file generated by a node of the target type is acquired; the image file is parsed in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds; and the original files are merged according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
Referring to FIG. 1, the method includes:
S101: Acquire the image file generated by the node of the target type.
The node of the target type may be the NameNode node (the metadata management center).
In a specific implementation of the embodiment of the invention, to enable automatic identification of the files generated while the data warehouse tool runs, the image file generated by the node of the target type may be stored in a local storage device while the data warehouse tool runs.
In the embodiment of the invention, the image file generated by the NameNode node can then be acquired directly from the local storage device and parsed.
The image file generated by the NameNode node corresponds to the original files, that is, to the files generated while the data warehouse tool runs.
The image file may be parsed in real time or at fixed time intervals; this is not limited.
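A minimal Python sketch of S101 follows. It assumes the target-type node is an HDFS NameNode whose checkpoint images are kept locally under `<dfs.namenode.name.dir>/current/`, with names of the form `fsimage_<transaction id>` plus `.md5` checksum companions. The function `latest_fsimage` and its directory argument are hypothetical; the patent does not say how the newest image is chosen.

```python
import os
import re
from typing import Optional

# Assumption: checkpoint images are named "fsimage_<transaction id>".
# Checksum files ("....md5") and temporary "fsimage.ckpt_*" files do not
# fully match this pattern and are therefore ignored.
_FSIMAGE_RE = re.compile(r"fsimage_(\d+)")


def latest_fsimage(image_dir: str) -> Optional[str]:
    """Return the path of the newest image file in image_dir, or None."""
    best_txid, best_path = -1, None
    for name in os.listdir(image_dir):
        match = _FSIMAGE_RE.fullmatch(name)
        if match is None:
            continue
        txid = int(match.group(1))
        # A higher transaction id means a more recent checkpoint.
        if txid > best_txid:
            best_txid, best_path = txid, os.path.join(image_dir, name)
    return best_path
```

The selected image is binary. One standard way to turn it into text for S102, without contacting the NameNode, is Hadoop's Offline Image Viewer, for example `hdfs oiv -p Delimited -i <fsimage> -o <output file>`.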
S102: Parse the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds.
The original files are the files corresponding to the multiple database tables and multiple partitions of the data warehouse tool, that is, the files belonging to each database table and each partition.
The directory information describes the specific organizational structure of each database table and each partition.
The information includes the number of original files and the size of the storage space they occupy.
In a specific implementation of the embodiment of the invention, the directory information of the data warehouse tool can be used to determine, across the multiple database tables and partitions of the data warehouse tool, the information of the first original files corresponding to each database table and the information of the second original files corresponding to each partition.
The original files corresponding to a database table are referred to as first original files, and the original files corresponding to a partition are referred to as second original files.
Determining this information per database table and per partition from the directory information allows the original files to be located promptly and accurately, and the information about the original files of each database table and partition to be obtained in time.
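The per-table and per-partition aggregation in S102 can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: it assumes the image has already been flattened into `(file path, size in bytes)` records, that tables live under Hive's default warehouse root `/user/hive/warehouse`, and that a directory whose name contains `=` (such as `dt=2020-09-01`) is a partition. `DirStats` and `aggregate_by_directory` are hypothetical names.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class DirStats:
    """Hypothetical per-directory summary: file count and total bytes."""
    path: str
    kind: str  # "table" or "partition"
    file_count: int = 0
    total_bytes: int = 0


def aggregate_by_directory(records, warehouse_root="/user/hive/warehouse"):
    """Group (file_path, size_bytes) records by their parent directory.

    Assumption: a parent directory whose last component contains "=" is a
    partition; any other directory under the root is a table directory.
    """
    stats = {}
    root = warehouse_root.rstrip("/") + "/"
    for path, size in records:
        if not path.startswith(root):
            continue  # outside the data warehouse tool's directory tree
        parent = posixpath.dirname(path)
        if parent not in stats:
            kind = "partition" if "=" in posixpath.basename(parent) else "table"
            stats[parent] = DirStats(parent, kind)
        stats[parent].file_count += 1
        stats[parent].total_bytes += size
    return stats
```

Multi-level partitions such as `dt=2020-09-01/hr=01` are handled because only the innermost directory holding the files is counted.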
In the embodiment of the invention, the information about the original files is obtained by parsing the image file in combination with the directory information of the data warehouse tool, rather than by accessing the NameNode node directly through a preset interface. This simplifies obtaining the information about the original files and improves file processing efficiency. Moreover, because the image file is acquired and parsed instead of remotely calling the NameNode node through a preset interface, no extra access load is placed on the NameNode node, which avoids adversely affecting the stability of the production environment.
S103: Merge the original files according to the information about the original files and a preset rule.
Optionally, in some embodiments, merging the original files according to the information about the original files and a preset rule includes: determining, from the information of each first original file, a first average value of the storage space occupied by the first original files of each database table; determining a second average value of the storage space occupied by the second original files of each partition; and merging the original files according to the first average value, the second average value, the number of first original files, the number of second original files, and the preset rule.
Optionally, in some embodiments, merging the original files according to the first average value, the second average value, the number of first original files, and the number of second original files in combination with the preset rule includes: when the first average value or the second average value is less than or equal to a first preset threshold, merging the corresponding first or second original files; and/or, when the number of first original files or the number of second original files is greater than a second preset threshold, merging the corresponding first or second original files.
The first preset threshold and the second preset threshold may be set by a user as required, or preset by the factory program of the electronic device; this is not limited.
Setting the first and second preset thresholds in this way provides reasonable criteria for deciding when to merge, so that the files generated while the data warehouse tool runs can be identified automatically and merged in time.
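The merge decision of S103 reduces to a per-directory check. The sketch below assumes each table or partition has already been summarized as a file count and a total size. Rule 2 uses "greater than", following the optional embodiment above. `DirSummary`, `needs_merge`, and the `require_both` switch (covering the "and" and "or" readings of "and/or") are hypothetical, and any threshold values passed in are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DirSummary:
    """Hypothetical summary of one table or partition directory."""
    path: str
    file_count: int
    total_bytes: int


def needs_merge(d, avg_threshold_bytes, count_threshold, require_both=False):
    """Apply the two preset rules to one directory.

    Rule 1: average file size <= avg_threshold_bytes (first preset threshold).
    Rule 2: file count > count_threshold (second preset threshold).
    """
    if d.file_count == 0:
        return False  # nothing to merge, and avoids division by zero
    avg = d.total_bytes / d.file_count
    small_avg = avg <= avg_threshold_bytes
    too_many = d.file_count > count_threshold
    return (small_avg and too_many) if require_both else (small_avg or too_many)
```

For example, a partition holding 100 files of about 1 KiB each would be flagged under either reading with a 16 MiB average threshold and a count threshold of 50.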
In this embodiment, the image file generated by the node of the target type is acquired; the image file is parsed in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds; and the original files are merged according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
As an example, refer to FIG. 2, which is a schematic diagram of an application scenario according to an embodiment of the present invention. The NameNode node (metadata management center) periodically records metadata, including file information, to its local disk (the local storage device in the present invention); the stored content is called an image file (Image). An Analysis process is started periodically: in the background, it obtains the image file generated by the NameNode node, parses it, and, for each Hive directory (corresponding to a single Hive table or a single partition), obtains information about the original files, such as the number of files and their sizes. The parsing results are stored in any storage system for querying. The stored Hive directory information is then queried and evaluated against the preset rules to determine which Hive tables or partitions need to be merged. The preset rules are: 1) the average storage space occupied by the original files is less than or equal to a first preset threshold; and 2) the number of original files is greater than or equal to a second preset threshold. If these conditions are met, the directory is considered to contain too many small files, and its files need to be merged.
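The FIG. 2 scenario stores the analysis results in "any storage system" and then queries them against the preset rules. The sketch below uses an in-memory SQLite database purely as a stand-in for that storage system; the table name `dir_stats`, its columns, and the function `store_and_select` are hypothetical. Following this example, it reads "these conditions are met" as both rules holding, with rule 2 using "greater than or equal to".

```python
import sqlite3


def store_and_select(rows, avg_threshold_bytes, count_threshold):
    """Persist per-directory results, then query the directories to merge.

    rows: iterable of (hive_dir, file_count, total_bytes) tuples.
    An in-memory SQLite database stands in for "any storage system".
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dir_stats ("
        " hive_dir TEXT PRIMARY KEY, file_count INTEGER, total_bytes INTEGER)"
    )
    conn.executemany("INSERT INTO dir_stats VALUES (?, ?, ?)", rows)
    # Rule 1: average size <= first threshold; rule 2: count >= second threshold.
    cursor = conn.execute(
        "SELECT hive_dir FROM dir_stats"
        " WHERE file_count > 0"
        "   AND total_bytes * 1.0 / file_count <= ?"
        "   AND file_count >= ?"
        " ORDER BY hive_dir",
        (avg_threshold_bytes, count_threshold),
    )
    result = [row[0] for row in cursor.fetchall()]
    conn.close()
    return result
```

Keeping the results in a queryable store lets the rules be re-evaluated with different thresholds without re-parsing the image file.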
In a specific implementation of the embodiment of the invention, because data is a business-sensitive resource, the merge may be completed automatically by the electronic device, or it may first be confirmed by relevant personnel, who then issue an instruction to merge the corresponding Hive table or partition. The merge itself may be computed through MapReduce, Spark, or Hive.
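For the Hive option, one possible merge command for ORC or RCFile tables is Hive's `ALTER TABLE ... [PARTITION (...)] CONCATENATE`, which merges the small files of a table or partition. The helper below is a hypothetical sketch, not the patent's method: it builds that statement from a table name and a partition directory path such as `dt=2020-09-01/hr=01`, decoding the `%XX` escapes Hive uses in partition directory names. Tables in other storage formats would instead need a rewrite, for example `INSERT OVERWRITE` with Hive's `hive.merge.*` settings enabled.

```python
from typing import Optional
from urllib.parse import unquote


def concatenate_statement(table: str, partition_path: Optional[str] = None) -> str:
    """Build a Hive CONCATENATE statement for a table or one partition.

    partition_path is the partition part of the directory, e.g.
    "dt=2020-09-01/hr=01". CONCATENATE applies only to RCFile/ORC tables.
    """
    if not partition_path:
        return f"ALTER TABLE {table} CONCATENATE;"
    specs = []
    for component in partition_path.strip("/").split("/"):
        key, _, value = component.partition("=")
        # Decode %XX escapes, then escape backslashes and quotes for a literal.
        value = unquote(value).replace("\\", "\\\\").replace("'", "\\'")
        specs.append(f"{key}='{value}'")
    return f"ALTER TABLE {table} PARTITION ({', '.join(specs)}) CONCATENATE;"
```

Generating the statement rather than executing it fits the manual-confirmation path described above: the statement can be reviewed before it is issued.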
FIG. 3 is a schematic structural diagram of a file processing apparatus according to an embodiment of the present invention.
The file is a file in a data warehouse tool that includes nodes of the target type.
Referring to fig. 3, the apparatus 300 includes:
an obtaining module 301, configured to obtain an image file generated by a node of the target type;
a parsing module 302, configured to parse the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds;
and a merging processing module 303, configured to merge the original files according to the information about the original files and a preset rule.
Optionally, in some embodiments, referring to fig. 4, the apparatus 300 further comprises:
a storage module 304, configured to store the image file generated by the node of the target type in a local storage device while the data warehouse tool runs.
Optionally, in some embodiments, the information includes the number of original files and the size of the storage space they occupy, and the parsing module 302 is specifically configured to:
determine, in combination with the directory information of the data warehouse tool, the information of the first original files corresponding to each database table and the information of the second original files corresponding to each partition among the multiple database tables and partitions of the data warehouse tool.
Optionally, in some embodiments, the merging processing module 303 is specifically configured to:
determine, according to the information of each first original file, a first average value of the storage space occupied by the first original files of each database table, and determine a second average value of the storage space occupied by the second original files of each partition;
and merge the original files according to the first average value, the second average value, the number of first original files, and the number of second original files, in combination with a preset rule.
Optionally, in some embodiments, the merging processing module 303 is specifically configured to:
when the first average value or the second average value is less than or equal to a first preset threshold, merge the corresponding first or second original files; and/or,
when the number of first original files or the number of second original files is greater than a second preset threshold, merge the corresponding first or second original files.
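To show how modules 301 to 303 relate, here is a hypothetical composition in Python. Each module is modeled as an injected callable so that the sketch stays self-contained; the class name, field names, and type aliases are assumptions, and the optional storage module 304 would sit inside the `obtain` step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, int]            # (file path, size in bytes) from the image
Stats = Dict[str, Tuple[int, int]]  # directory -> (file count, total bytes)


@dataclass
class FileProcessingApparatus:
    """Hypothetical wiring of modules 301-303 as injected callables."""
    obtain: Callable[[], Iterable[Record]]      # obtaining module 301
    parse: Callable[[Iterable[Record]], Stats]  # parsing module 302
    should_merge: Callable[[int, int], bool]    # preset rule used by module 303
    merge: Callable[[str], None]                # merge action of module 303

    def run(self) -> List[str]:
        """Run one obtain/parse/merge cycle; return the merged directories."""
        stats = self.parse(self.obtain())
        merged = []
        for directory, (count, total) in sorted(stats.items()):
            if self.should_merge(count, total):
                self.merge(directory)
                merged.append(directory)
        return merged
```

Injecting the rule and the merge action keeps the threshold policy and the execution engine (MapReduce, Spark, or Hive) replaceable without touching the pipeline.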
It should be noted that the foregoing explanations of the file processing method embodiments of FIG. 1 and FIG. 2 also apply to the file processing apparatus 300 of this embodiment; the implementation principles are similar and are not repeated here.
In this embodiment, the image file generated by the node of the target type is acquired; the image file is parsed in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds; and the original files are merged according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
The computer device may be a mobile phone, a tablet computer, or the like.
Referring to FIG. 5, the computer device 50 of this embodiment includes a housing 501, a processor 502, a memory 503, a circuit board 504, and a power supply circuit 505. The circuit board 504 is disposed inside the space enclosed by the housing 501, and the processor 502 and the memory 503 are disposed on the circuit board 504; the power supply circuit 505 supplies power to each circuit or component of the computer device 50; the memory 503 stores executable program code; and the processor 502 reads the executable program code stored in the memory 503 and runs the corresponding program to perform:
acquiring an image file generated by a node of the target type;
parsing the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds;
and merging the original files according to the information about the original files and a preset rule.
It should be noted that the foregoing explanations of the file processing method embodiments of FIG. 1 and FIG. 2 also apply to the computer device 50 of this embodiment; the implementation principles are similar and are not repeated here.
The computer device in this embodiment acquires the image file generated by the node of the target type, parses the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds, and merges the original files according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
To implement the above embodiments, the present invention further provides a non-transitory computer-readable storage medium. When instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to execute a file processing method, where the file is a file in a data warehouse tool and the data warehouse tool includes a node of a target type. The method includes:
acquiring an image file generated by a node of the target type;
parsing the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds;
and merging the original files according to the information about the original files and a preset rule.
With the non-transitory computer-readable storage medium in this embodiment, an image file generated by a node of the target type is acquired; the image file is parsed in combination with the directory information of the data warehouse tool to obtain information about the original files to which it corresponds; and the original files are merged according to that information and a preset rule. In this way, the files generated while the data warehouse tool runs can be identified automatically and merged in time.
To implement the above embodiments, the present invention further provides a computer program product. When instructions in the computer program product are executed by a processor, a file processing method is performed, where the file is a file in a data warehouse tool and the data warehouse tool includes a node of a target type. The method includes:
acquiring an image file generated by a node of the target type;
parsing the image file in combination with the directory information of the data warehouse tool to obtain information about the original files to which the image file corresponds;
and merging the original files according to the information about the original files and a preset rule.
The computer program product in the embodiment obtains the image file generated by the node of the target type; analyzing the mirror image file by combining directory information of the data warehouse tool to obtain information of an original file to which the mirror image file belongs; according to the information of the original files, the original files are combined according to the preset rules, the files generated when the data warehouse tool operates can be automatically identified, and the files are combined in time.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented as a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (12)

1. A method of file processing, wherein the file is a file in a data warehouse tool, the data warehouse tool comprising nodes of a target type, the method comprising the steps of:
acquiring an image file generated by the node of the target type;
parsing the image file in combination with directory information of the data warehouse tool to obtain information about the original files to which the image file belongs; and
merging the original files according to the information about the original files and a preset rule.
2. The file processing method according to claim 1, further comprising, before acquiring the image file generated by the node of the target type:
storing, during operation of the data warehouse tool, the image file generated by the node of the target type in a local storage device.
3. The file processing method according to claim 1, wherein the information comprises the number of original files and the size of the storage space they occupy, and parsing the image file in combination with the directory information of the data warehouse tool to obtain the information about the original files to which the image file belongs comprises:
determining, in combination with the directory information of the data warehouse tool, information about the first original files corresponding to each of a plurality of database tables of the data warehouse tool and information about the second original files corresponding to each of a plurality of partitions of the data warehouse tool.
4. The file processing method according to claim 3, wherein merging the original files according to the information about the original files and a preset rule comprises:
determining, according to the information about the first original files, a first average value of the storage space occupied by the first original files corresponding to each database table, and determining a second average value of the storage space occupied by the second original files corresponding to each partition; and
merging the original files according to the first average value, the second average value, the number of first original files, the number of second original files, and a preset rule.
5. The file processing method according to claim 4, wherein merging the original files according to the first average value, the second average value, the number of first original files, the number of second original files, and a preset rule comprises:
when the first average value or the second average value is less than or equal to a first preset threshold, merging the corresponding first original files or second original files; and/or
when the number of first original files or the number of second original files is greater than a second preset threshold, merging the corresponding first original files or second original files.
6. A file processing apparatus, the file being a file in a data warehouse tool, the data warehouse tool comprising nodes of a target type, the apparatus comprising:
an acquisition module, configured to acquire an image file generated by the node of the target type;
an analysis module, configured to parse the image file in combination with directory information of the data warehouse tool to obtain information about the original files to which the image file belongs; and
a merge processing module, configured to merge the original files according to the information about the original files and a preset rule.
7. The file processing apparatus according to claim 6, further comprising:
a storage module, configured to store, during operation of the data warehouse tool, the image file generated by the node of the target type in a local storage device.
8. The file processing apparatus according to claim 6, wherein the information comprises the number of original files and the size of the storage space they occupy, and the analysis module is specifically configured to:
determine, in combination with the directory information of the data warehouse tool, information about the first original files corresponding to each of a plurality of database tables of the data warehouse tool and information about the second original files corresponding to each of a plurality of partitions of the data warehouse tool.
9. The file processing apparatus according to claim 8, wherein the merge processing module is specifically configured to:
determine, according to the information about the first original files, a first average value of the storage space occupied by the first original files corresponding to each database table, and determine a second average value of the storage space occupied by the second original files corresponding to each partition; and
merge the original files according to the first average value, the second average value, the number of first original files, the number of second original files, and a preset rule.
10. The file processing apparatus according to claim 9, wherein the merge processing module is specifically configured to:
when the first average value or the second average value is less than or equal to a first preset threshold, merge the corresponding first original files or second original files; and/or
when the number of first original files or the number of second original files is greater than a second preset threshold, merge the corresponding first original files or second original files.
11. A non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the file processing method according to any one of claims 1 to 5.
12. A computer device, comprising a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is disposed inside a space enclosed by the housing, and the processor and the memory are disposed on the circuit board; the power supply circuit is configured to supply power to each circuit or component of the computer device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform:
acquiring an image file generated by the node of the target type;
parsing the image file in combination with directory information of the data warehouse tool to obtain information about the original files to which the image file belongs; and
merging the original files according to the information about the original files and a preset rule.
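The merge rule of claims 4 and 5 (and 9 and 10) can be sketched as a small decision function. This is a minimal illustration under stated assumptions: the per-table or per-partition information is assumed to be available as `(file_count, total_bytes)` pairs, the threshold values are hypothetical placeholders because the claims only name a "first" and a "second" preset threshold, and the names `needs_merge` and `select_merge_targets` are invented for the example. The "and/or" of claim 5 is modelled as a logical OR, so either condition alone triggers a merge.

```python
from typing import Dict, List, Tuple

# Hypothetical values; the patent does not give concrete thresholds.
AVG_SIZE_THRESHOLD = 16 * 1024 * 1024  # bytes, stands in for the first preset threshold
FILE_COUNT_THRESHOLD = 100             # files, stands in for the second preset threshold


def needs_merge(
    file_count: int,
    total_bytes: int,
    avg_threshold: float = AVG_SIZE_THRESHOLD,
    count_threshold: int = FILE_COUNT_THRESHOLD,
) -> bool:
    """Decide whether the original files of one table or partition should be merged."""
    if file_count == 0:
        return False  # no files, nothing to merge
    average = total_bytes / file_count  # average storage space per original file
    # Merge if the files are small on average, or if there are too many of them.
    return average <= avg_threshold or file_count > count_threshold


def select_merge_targets(
    info: Dict[str, Tuple[int, int]],
    avg_threshold: float = AVG_SIZE_THRESHOLD,
    count_threshold: int = FILE_COUNT_THRESHOLD,
) -> List[str]:
    """Return, sorted by name, the tables/partitions whose files should be merged.

    info maps a table or partition name to (file_count, total_bytes).
    """
    return sorted(
        owner
        for owner, (count, size) in info.items()
        if needs_merge(count, size, avg_threshold, count_threshold)
    )


if __name__ == "__main__":
    stats = {"sales": (3, 4_500), "users": (1, 64_000_000)}
    print(select_merge_targets(stats))
```

With these placeholder thresholds, a table holding 3 files totalling 4,500 bytes is selected because its average file size (1,500 bytes) is below the first threshold, while a table holding a single 64,000,000-byte file is left alone.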
CN202010930089.7A 2019-02-15 2019-02-15 File processing method and device, storage medium and computer equipment Pending CN112231292A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010930089.7A CN112231292A (en) 2019-02-15 2019-02-15 File processing method and device, storage medium and computer equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010930089.7A CN112231292A (en) 2019-02-15 2019-02-15 File processing method and device, storage medium and computer equipment
CN201910116009.1A CN109902067B (en) 2019-02-15 2019-02-15 File processing method and device, storage medium and computer equipment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910116009.1A Division CN109902067B (en) 2019-02-15 2019-02-15 File processing method and device, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN112231292A true CN112231292A (en) 2021-01-15

Family

ID=66944827

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010930089.7A Pending CN112231292A (en) 2019-02-15 2019-02-15 File processing method and device, storage medium and computer equipment
CN201910116009.1A Active CN109902067B (en) 2019-02-15 2019-02-15 File processing method and device, storage medium and computer equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910116009.1A Active CN109902067B (en) 2019-02-15 2019-02-15 File processing method and device, storage medium and computer equipment

Country Status (1)

Country Link
CN (2) CN112231292A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231293A (en) * 2020-09-14 2021-01-15 杭州数梦工场科技有限公司 File reading method and device, electronic equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105653592A (en) * 2016-01-28 2016-06-08 浪潮软件集团有限公司 Small file merging tool and method based on HDFS
CN106503198A (en) * 2016-11-02 2017-03-15 北京集奥聚合科技有限公司 A kind of cold data recognition methodss and system based on hadoop metadata
CN108256115A (en) * 2017-09-05 2018-07-06 国家计算机网络与信息安全管理中心 A kind of HDFS small documents towards SparkSql merge implementation method in real time

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US9424272B2 (en) * 2005-01-12 2016-08-23 Wandisco, Inc. Distributed file system using consensus nodes
CN103353901B (en) * 2013-08-01 2016-10-05 百度在线网络技术(北京)有限公司 The orderly management method of table data based on Hadoop distributed file system and system
US9280678B2 (en) * 2013-12-02 2016-03-08 Fortinet, Inc. Secure cloud storage distribution and aggregation
CN105404652A (en) * 2015-10-29 2016-03-16 河海大学 Mass small file processing method based on HDFS
US10305747B2 (en) * 2016-06-23 2019-05-28 Sap Se Container-based multi-tenant computing infrastructure
CN106843763A (en) * 2017-01-19 2017-06-13 北京神州绿盟信息安全科技股份有限公司 A kind of Piece file mergence method and device based on HDFS systems
CN107045531A (en) * 2017-01-20 2017-08-15 郑州云海信息技术有限公司 A kind of system and method for optimization HDFS small documents access
CN107679177A (en) * 2017-09-29 2018-02-09 郑州云海信息技术有限公司 A kind of small documents storage optimization method based on HDFS, device, equipment
CN109063192B (en) * 2018-08-29 2021-01-29 江苏云从曦和人工智能有限公司 Working method of high-performance mass file storage system

Non-Patent Citations (1)

Title
曹建芳 (Cao Jianfang): "Research on Several Key Technologies for Affective Semantic Analysis of Large-Scale Scene Images", pages 94-97 *

Also Published As

Publication number Publication date
CN109902067B (en) 2020-11-27
CN109902067A (en) 2019-06-18

Similar Documents

Publication Publication Date Title
CN107506451B (en) Abnormal information monitoring method and device for data interaction
CN106874281B (en) Method and device for realizing database read-write separation
CN109379398B (en) Data synchronization method and device
EP2763055B1 (en) A telecommunication method and mobile telecommunication device for providing data to a mobile application
CN107423404B (en) Flow instance data synchronous processing method and device
KR102529038B1 (en) Resource management and control method and device, device and storage medium
US20190034228A1 (en) Method and apparatus for task scheduling
US10552419B2 (en) Method and system for performing an operation using map reduce
CN113111038B (en) File storage method, device, server and storage medium
CN109902067B (en) File processing method and device, storage medium and computer equipment
EP4012573A1 (en) Graph reconstruction method and apparatus
CN113360581A (en) Data processing method, device and storage medium
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN113220530B (en) Data quality monitoring method and platform
CN115408546A (en) Time sequence data management method, device, equipment and storage medium
CN114297196A (en) Metadata storage method and device, electronic equipment and storage medium
CN113835953A (en) Statistical method and device of job information, computer equipment and storage medium
CN113868138A (en) Method, system, equipment and storage medium for acquiring test data
CN113392131A (en) Data processing method and device and computer equipment
CN112800091A (en) Flow-batch integrated calculation control system and method
CN110908999A (en) Data acquisition mode determining method and device, storage medium and electronic device
CN113297245A (en) Method and device for acquiring execution information
CN112749189A (en) Data query method and device
CN117390040B (en) Service request processing method, device and storage medium based on real-time wide table
CN111143711A (en) Object searching method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination