CN109376137B - File processing method and device - Google Patents

File processing method and device

Info

Publication number
CN109376137B
Authority
CN
China
Prior art keywords
file
source code
files
task
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811541562.1A
Other languages
Chinese (zh)
Other versions
CN109376137A (en
Inventor
张铮
潘传幸
邬江兴
王晓梅
王俊超
谢光伟
王立群
李卫超
刘镇武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN201811541562.1A
Publication of CN109376137A
Application granted
Publication of CN109376137B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks

Abstract

The file processing method and device classify the files of a task according to the dependency relationships among them, so that files that depend on one another fall into the same category, and then distribute the files such that all files of one category go to the same node of a distributed cluster. As a result, files with dependency relationships are placed on the same node, and no dependency exists between files on different nodes. Each node can therefore process its files without cross-node references, which effectively prevents nodes in the distributed system from producing erroneous results and helps the system use its computing resources effectively.

Description

File processing method and device
Technical Field
The invention belongs to the fields of distributed computing, network communication and network security, and in particular relates to a file processing method and device.
Background
With the advent of the "Internet+" era, networks have not only profoundly affected people's lifestyles but have also posed serious challenges to the computing power of servers. More and more enterprises deploy their back-end servers on distributed systems and rely on the power of distributed computing to break through development bottlenecks.
For some real-time, large-scale batch computing tasks, fast and efficient file distribution is crucial to making full and efficient use of computing resources. In view of this, the present invention aims to achieve efficient file distribution in a distributed environment; specifically, it decomposes a large problem into small problems and applies ideas such as divide-and-conquer and pipelining to distribute files quickly and efficiently.
The inventors have found that if the files of a task are simply divided and distributed to the nodes of a distributed system according to the above idea, processing may become inconvenient, and the processing result may even be wrong.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a file processing method and device that divide and distribute the files of a task according to the dependency relationships among them, thereby preventing the distributed system from producing erroneous results and making more effective use of its computing resources.
To this end, the invention discloses the following technical solutions:
a method of file processing, comprising:
analyzing the dependency relationship among files included in the task to be processed;
classifying the files included in the task to be processed according to the dependency relationships among the files to obtain a plurality of classification categories, and recording the path structure of each file of the task to be processed; each classification category comprises a plurality of files of the task to be processed that have dependency relationships with one another;
distributing each file of the task to be processed to a plurality of nodes of a distributed cluster based on a preset file distribution strategy; wherein files belonging to the same classification category are distributed to the same node;
and acquiring the processing result of each node on the distributed files, and merging the processing result of each file based on the path structure of each file so as to restore the task structure of the task to be processed.
In the above method, preferably, if the task to be processed is engineering source code to be processed, classifying the files included in the task to be processed according to the dependency relationships among the files to obtain a plurality of classification categories comprises:
classifying the source code files according to the dependency relationships among the source code files of the engineering source code to obtain source code files of a plurality of classification categories, wherein each classification category comprises a plurality of source code files having dependency relationships.
Preferably, in the method, the distributing each file of the task to be processed to the plurality of nodes of the distributed cluster based on the predetermined file distribution policy includes:
acquiring the use condition of computing resources, the network load condition and the congestion condition of each node in the cluster;
distributing tasks to each source code file of the engineering source code based on the computing resource use condition, the network load condition and the congestion condition of each node in the cluster; wherein the respective source code files belonging to the same classification category are distributed to the same node.
Preferably, the obtaining of the processing result of each node on the distributed file and merging the processing results of each file based on the path structure of each file so as to restore the task structure of the task to be processed includes:
monitoring the processing condition of each node in the cluster based on multiple threads or multiple processes;
when monitoring a compiling result of a certain node for the distributed source code files of the corresponding category, receiving the compiling result and acquiring a path structure of each source code file in the category;
and writing the compiling result of each source code file in the category into a corresponding position of a storage medium based on the path structure of each source code file in the category so as to restore the engineering structure of the engineering source code.
Preferably, in the method, writing the compilation result of a source code file to the corresponding location of the storage medium comprises:
storing the compilation result of the source code file in a preset shared directory.
A file processing apparatus, comprising:
the analysis unit is used for analyzing the dependency relationships among the files included in the task to be processed;
the classification and information recording unit is used for classifying the files included in the task to be processed according to the dependency relationships among the files to obtain a plurality of classification categories and recording the path structure of each file of the task to be processed; each classification category comprises at least one file of the task to be processed;
the distribution unit is used for distributing each file of the task to be processed to a plurality of nodes of the distributed cluster based on a preset file distribution strategy; wherein files belonging to the same classification category are distributed to the same node;
and the result processing unit is used for acquiring the processing result of each node on the distributed files and merging the processing result of each file based on the path structure of each file so as to restore the task structure of the task to be processed.
Preferably, in the apparatus, the task to be processed is engineering source code to be processed;
the classification and information recording unit classifying the files included in the task to be processed according to the dependency relationships among the files to obtain a plurality of classification categories specifically comprises:
classifying the source code files according to the dependency relationships among the source code files of the engineering source code to obtain source code files of a plurality of classification categories, wherein each classification category comprises at least one source code file.
The above apparatus, preferably, the distribution unit is specifically configured to:
acquiring the use condition of computing resources, the network load condition and the congestion condition of each node in the cluster;
distributing tasks to each source code file of the engineering source code based on the computing resource use condition, the network load condition and the congestion condition of each node in the cluster; wherein the respective source code files belonging to the same classification category are distributed to the same node.
The above apparatus, preferably, the result processing unit is specifically configured to:
monitoring the processing condition of each node in the cluster based on multiple threads or multiple processes;
when monitoring a compiling result of a certain node for the distributed source code files of the corresponding category, receiving the compiling result and acquiring a path structure of each source code file in the category;
and writing the compiling result of each source code file in the category into a corresponding position of a storage medium based on the path structure of each source code file in the category so as to restore the engineering structure of the engineering source code.
Preferably, in the apparatus, the result processing unit writing the compilation result of a source code file to the corresponding location of the storage medium comprises:
storing the compilation result of the source code file in a preset shared directory.
According to the above scheme, the file processing method and device provided by the present application classify the files of a task according to the dependency relationships among them, so that files that depend on one another fall into the same category, and then distribute the files such that all files of one category go to the same node of the distributed cluster. As a result, files with dependency relationships are placed on the same node, and no dependency exists between files on different nodes. Each node can therefore process its files without cross-node references, which effectively prevents nodes in the distributed system from producing erroneous results and helps the system use its computing resources effectively.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a file processing method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a file distribution model in a distributed environment according to an embodiment of the present application;
FIG. 3 is a schematic workflow diagram of the modules of a file distribution model in a distributed environment according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a file processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The inventors have found that dependency relationships often exist among some of the files included in the same task. If the files of a task are simply divided (for example, into several file groups of uniform size according to data volume) and then distributed, processing may become inconvenient, and the processing result may even be wrong (for example, if a file distributed to one node needs to reference a variable defined in a file on another node, the cross-node reference may fail and cause an erroneous result). To solve this problem, the present application provides a file processing method and device suitable for file distribution in a distributed environment. The file processing method and device of the present application are described in detail below through specific embodiments.
Embodiment one
Referring to FIG. 1, a flow diagram of a file processing method is shown. The method comprises the following steps:
Step 101, analyzing the dependency relationships among the files included in the task to be processed.
The task to be processed may be, but is not limited to, engineering source code to be processed (for example, to be compiled). The file processing method of the present application is described in detail below mainly by taking engineering source code as the task to be processed.
Accordingly, in this step, the dependency relationships among the source code files included in the engineering source code to be processed may specifically be analyzed.
For example, if file A needs to reference a variable defined in file B, file B must be compiled before file A (where "before" means earlier in time). File A then depends on file B, so a dependency relationship exists between files A and B. In this application, one file depending on another means that the use of the one file (for example, its compilation or execution) presupposes the other file; if the other file is missing, the one file cannot be used.
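The dependency analysis of step 101 can be illustrated with a minimal sketch. The application does not specify how dependencies are detected; the sketch below assumes, purely for illustration, C-style sources in which a file depends on every project file it names in an `#include "..."` directive. The function name `find_dependencies` and the sample files are hypothetical.

```python
import posixpath
import re

# Assumption for illustration only: a dependency is a C-style
# `#include "..."` directive that names another file of the same project.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def find_dependencies(sources):
    """Map each file path to the set of project files it depends on.

    `sources` maps a project-relative path (with "/" separators) to the
    text of that file.
    """
    deps = {}
    for path, text in sources.items():
        base = posixpath.dirname(path)
        found = set()
        for name in INCLUDE_RE.findall(text):
            # Resolve the include relative to the including file.
            target = posixpath.normpath(posixpath.join(base, name))
            if target in sources and target != path:
                found.add(target)
        deps[path] = found
    return deps


# Hypothetical project: a.c references a macro defined in b.h.
sources = {
    "src/a.c": '#include "b.h"\nint main(void) { return B_VALUE; }\n',
    "src/b.h": "#define B_VALUE 0\n",
    "lib/c.c": "int c(void) { return 1; }\n",
}
deps = find_dependencies(sources)
```

Here `src/a.c` plays the role of file A and `src/b.h` the role of file B, while `lib/c.c` has no dependency on either.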
Step 102, classifying the files included in the task to be processed according to the dependency relationships among the files to obtain a plurality of classification categories, and recording the path structure of each file of the task to be processed; each classification category comprises at least one file of the task to be processed.
After the dependency relationships among the source code files of the engineering source code have been analyzed, the source code files can be classified according to these dependency relationships. Specifically, source code files that have dependency relationships are placed in the same classification category, that is, each classification category may comprise a plurality of source code files having dependency relationships; correspondingly, source code files without dependency relationships are placed in different classification categories.
In this step, after classifying the source code files of the engineering source code according to the dependency relationship, the path structure of each source code file in the original engineering source code is also recorded at the same time, so as to provide a basis for subsequently recovering the engineering structure of the engineering source code after obtaining the processing result of each node on the source code file.
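One possible way to realize the classification of step 102 is sketched below. It assumes that dependencies are treated as transitive, so that each classification category is a connected component of the dependency graph; this interpretation, the union-find technique, and the names `classify_files` and `path_structure` are assumptions of the sketch rather than details given in the application.

```python
def classify_files(deps):
    """Group files so that any two files linked by a dependency
    (directly or through other files) end up in the same category."""
    parent = {}
    for path, targets in deps.items():
        parent.setdefault(path, path)
        for target in targets:
            parent.setdefault(target, target)

    def find(path):
        # Follow parent links to the representative of the group.
        while parent[path] != path:
            parent[path] = parent[parent[path]]  # path halving
            path = parent[path]
        return path

    # Merge the groups of every dependent file and its dependency.
    for path, targets in deps.items():
        for target in targets:
            parent[find(path)] = find(target)

    groups = {}
    for path in parent:
        groups.setdefault(find(path), []).append(path)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical dependency map: a.c and d.c both depend on b.h.
deps = {
    "src/a.c": {"src/b.h"},
    "src/b.h": set(),
    "src/d.c": {"src/b.h"},
    "lib/c.c": set(),
}
categories = classify_files(deps)

# Record where each file sits in the original project so that the
# structure can be restored once the processing results come back.
path_structure = {path: tuple(path.split("/")) for path in deps}
```

In this sketch a file without any dependency forms a category of its own, which matches the statement that each category comprises at least one file.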
Step 103, distributing each file of the task to be processed to a plurality of nodes of the distributed cluster based on a preset file distribution strategy; wherein files belonging to the same classification category are distributed to the same node.
After each source code file of the engineering source code is divided into each category, each source code file of the engineering source code can be continuously distributed to a plurality of nodes of the distributed cluster according to the category to which the source code belongs, and each source code file of the same category is specifically distributed to the same node of the cluster, so that each source code file on the same node is ensured to have a complete dependency relationship in the node without any cross-node reference, the phenomenon that the processing result of the node is wrong is avoided, and meanwhile, the effective utilization of computing resources of a distributed system is facilitated.
In practical implementation, when file distribution is needed, the computing resource use condition, the network load condition and the congestion condition of each node in the cluster can be obtained in real time, and task distribution can be performed on each source code file of the engineering source code by combining the computing resource use condition, the network load condition and the congestion condition of each node in the cluster, so that the resource use condition, the network load condition and the congestion condition of each node are relatively balanced.
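The load-aware distribution described above might look like the following sketch. The application does not define how the three node conditions are combined; the equal-weight `load_score`, the greedy largest-category-first order, and the `cost_per_file` estimate are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    name: str
    cpu_usage: float     # computing resource usage, fraction in [0, 1]
    network_load: float  # network load, fraction in [0, 1]
    congestion: float    # congestion level, fraction in [0, 1]


def load_score(node):
    # Hypothetical metric: equal weighting of the three conditions.
    return (node.cpu_usage + node.network_load + node.congestion) / 3


def distribute(categories, nodes, cost_per_file=0.2):
    """Assign each category (by index) to exactly one node, so that
    files of the same category never land on different nodes."""
    projected = {node.name: load_score(node) for node in nodes}
    assignment = {}
    # Place the largest categories first, each on the node whose
    # projected load is currently the lowest.
    order = sorted(enumerate(categories), key=lambda item: -len(item[1]))
    for index, group in order:
        target = min(projected, key=projected.get)
        assignment[index] = target
        projected[target] += cost_per_file * len(group)
    return assignment


categories = [["lib/c.c"], ["src/a.c", "src/b.h", "src/d.c"]]
nodes = [
    NodeStatus("node1", cpu_usage=0.2, network_load=0.1, congestion=0.0),
    NodeStatus("node2", cpu_usage=0.5, network_load=0.4, congestion=0.3),
]
assignment = distribute(categories, nodes)
```

With these sample values the three-file category goes to the lightly loaded `node1`, after which `node2` has the lower projected load and receives the remaining category.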
Step 104, acquiring the processing result of each node on the distributed files, and merging the processing results of each file based on the path structure of each file so as to restore the task structure of the task to be processed.
After distributing each source code file of the engineering source code to each node of the distributed cluster according to the classification category to which the node belongs and by combining the computing resource use condition, the network load condition and the congestion condition of each node, each node of the distributed cluster performs required processing on each distributed source code file of the corresponding category, such as compiling, encrypting or code obfuscating the source code file.
Meanwhile, multiple threads or processes can be started to monitor the processing status of each node in the cluster. Each time a processing result of a node for the source code files of its assigned category is detected (such as a compilation result, a ciphertext obtained by encryption, or obfuscated code obtained by code obfuscation), the processing result is received, and the path structure of the source code files of that category is obtained from the recorded path structure information. Based on this path structure, the processing result of each source code file in the category is written to the corresponding location of the storage medium in a predetermined manner (for example, the compilation result of a source code file can be stored in a predetermined shared directory), so as to restore the engineering structure of the engineering source code. Once the processing results of all source code files of the engineering source code have been received and written to their corresponding locations according to the path structures of the source code files, the merging of the processing results and the restoration of the structure of the entire engineering source code are complete.
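The monitoring and merging of step 104 can be sketched as follows. The function `process_on_node` is a hypothetical stand-in for the remote processing performed by a node, a temporary directory stands in for the predetermined shared directory, and the `.out` suffix for result files is likewise an assumption of the sketch.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def process_on_node(node, files):
    """Hypothetical stand-in for a node processing its category;
    returns one result payload per source file."""
    return {path: f"processed on {node}: {path}".encode() for path in files}


def collect_results(assignment, categories, output_root):
    """Wait for each node's results on worker threads and write every
    result to the location given by the file's original relative path."""
    output_root = Path(output_root)
    written = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(process_on_node, node, categories[index])
            for index, node in assignment.items()
        ]
        # Handle each category as soon as its node reports back.
        for future in as_completed(futures):
            for rel_path, payload in future.result().items():
                target = output_root / (rel_path + ".out")
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(payload)
                written.append(target.relative_to(output_root).as_posix())
    return sorted(written)


categories = [["lib/c.c"], ["src/a.c", "src/b.h"]]
assignment = {0: "node2", 1: "node1"}
with tempfile.TemporaryDirectory() as shared_dir:
    restored = collect_results(assignment, categories, shared_dir)
```

Because each result is written under the same relative path as its source file, the output directory mirrors the directory layout of the original project regardless of the order in which the nodes finish.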
The following provides a specific application example of the document processing method based on the application. In this example, a file distribution model in a distributed environment is realized based on the above method of the present application, as shown in fig. 2, the model includes a file splitting module 201, a file distribution module 202, a file receiving module 203, and a received result storage module 204, where:
the file splitting module 201 first analyzes the dependency relationship among the files included in the task to be processed, for example, specifically analyzes the dependency relationship among the source code files of the engineering source code, then classifies the files according to the dependency relationship, and finally transmits the classification result to the file distributing module 202;
after receiving the result transmitted by the file splitting module 201, the file distributing module 202 records the path structure of each file, then selects a node for each classification category according to a certain policy (for example, performs file distribution based on the use condition of computing resources, the network load condition and the congestion condition of each node in the cluster), and distributes all files in each classification category to a corresponding node;
the file receiving module 203 starts multiple threads or processes, continuously monitors for incoming processing results, and calls the received result storage module 204 when the arrival of a processing result is detected;
the received result storage module 204 first requests, from the file distribution module 202, the recorded path structure information of the files to which the processing result relates, and then writes the processing result of each file (such as a compilation result or an encryption result) to the corresponding location of the storage medium in a predetermined manner based on the path structure information of the file, so as to complete the structure restoration of the entire engineering source code.
The workflow of each module of the model is shown in FIG. 3.
According to the above scheme, the file processing method provided by the present application classifies the files of a task according to the dependency relationships among them, so that files that depend on one another fall into the same category, and then distributes the files such that all files of one category go to the same node of the distributed cluster. As a result, files with dependency relationships are placed on the same node, and no dependency exists between files on different nodes. Each node can therefore process its files without cross-node references, which effectively prevents nodes in the distributed system from producing erroneous results and helps the system use its computing resources effectively.
Embodiment two
Corresponding to the file processing method of the first embodiment, the second embodiment of the present application further provides a file processing apparatus. Referring to the schematic structural diagram of the file processing apparatus shown in FIG. 4, the apparatus includes an analysis unit 401, a classification and information recording unit 402, a distribution unit 403, and a result processing unit 404, where:
the analysis unit 401 is configured to analyze a dependency relationship between files included in the task to be processed.
The task to be processed may be, but is not limited to, engineering source code to be processed (for example, to be compiled). The file processing apparatus of the present application is described in detail below mainly by taking engineering source code as the task to be processed.
Accordingly, the analysis unit 401 may specifically analyze the dependency relationships among the source code files included in the engineering source code to be processed. The engineering source code may include a plurality of source code files, which form a project with corresponding functions through a certain organizational structure.
For example, if file A needs to reference a variable defined in file B, file B must be compiled before file A (where "before" means earlier in time). File A then depends on file B, so a dependency relationship exists between files A and B. In this application, one file depending on another means that the use of the one file (for example, its compilation or execution) presupposes the other file; if the other file is missing, the one file cannot be used.
A classification and information recording unit 402, configured to classify the files included in the task to be processed according to the dependency relationships among the files, obtain a plurality of classification categories, and record the path structure of each file of the task to be processed; each classification category comprises at least one file of the task to be processed.
The task to be processed is an engineering source code to be processed; the classifying and information recording unit 402 classifies the files included in the task to be processed according to the dependency relationship among the files to obtain a plurality of classification categories, including: and classifying the source code files according to the dependency relationship among the source code files of the engineering source code to obtain a plurality of classification type source code files, wherein each classification type correspondingly comprises at least one source code file.
Specifically, after analyzing the dependency relationship among the source codes of the engineering source codes, the source code files of the engineering source codes can be continuously classified according to the dependency relationship among the source code files, and the source code files with the dependency relationship are specifically classified into the same classification category, that is, each classification category may include a plurality of source code files with dependency relationship, and correspondingly, source code files without dependency relationship are classified into different classification categories.
After the source code files of the engineering source codes are classified according to the dependency relationship, the path structure of each source code file in the original engineering source codes is also recorded at the same time, so that a basis is provided for recovering the engineering structure of the engineering source codes after the processing result of each node on the source code files is obtained subsequently.
A distributing unit 403, configured to distribute, based on a predetermined file distribution policy, each file of the task to be processed to multiple nodes of a distributed cluster; wherein files belonging to the same classification category are distributed to the same node.
Further, the distributing unit 403 is specifically configured to: acquiring the use condition of computing resources, the network load condition and the congestion condition of each node in the cluster; distributing tasks to each source code file of the engineering source code based on the computing resource use condition, the network load condition and the congestion condition of each node in the cluster; wherein the respective source code files belonging to the same classification category are distributed to the same node.
Specifically, after dividing each source code file of the engineering source code into each category, each source code file of the engineering source code can be continuously distributed to a plurality of nodes of the distributed cluster according to the category to which the source code belongs, and each source code file of the same category is specifically distributed to the same node of the cluster, so that each source code file on the same node is ensured to have a complete dependency relationship in the node without any cross-node reference, the phenomenon that the processing result of the node is wrong is avoided, and meanwhile, the effective utilization of computing resources of the distributed system is facilitated.
In practical implementation, when file distribution is needed, the computing resource use condition, the network load condition and the congestion condition of each node in the cluster can be obtained in real time, and task distribution can be performed on each source code file of the engineering source code by combining the computing resource use condition, the network load condition and the congestion condition of each node in the cluster, so that the resource use condition, the network load condition and the congestion condition of each node are relatively balanced.
A result processing unit 404, configured to obtain each node's processing result for the files distributed to it, and to merge the processing results of the files based on the path structure of each file, so as to restore the task structure of the task to be processed.
Further, the result processing unit 404 is specifically configured to: monitor the processing status of each node in the cluster using multiple threads or multiple processes; upon detecting that a node has produced a compiling result for the source code files of the category distributed to it, receive the compiling result and obtain the path structure of each source code file in that category; and write the compiling result of each source code file in the category to the corresponding location on a storage medium based on that path structure, so as to restore the engineering structure of the engineering source code.
Specifically, after the source code files of the engineering source code have been distributed to the nodes of the distributed cluster according to the classification category to which each file belongs, taking into account the computing resource usage, network load, and congestion status of each node, each node performs the required processing on the source code files of the category distributed to it, such as compiling, encrypting, or obfuscating the source code files.
Meanwhile, multiple threads or multiple processes can be started to monitor the processing status of each node in the cluster. Each time a processing result from a node is detected for the source code files of the category distributed to it (for example, a compiling result, a ciphertext obtained by encryption, or obfuscated code obtained by code obfuscation), the processing result is received, and the path structure of the source code files of that category is obtained from the recorded path structure information. The processing result of each source code file in the category can then be written to the corresponding location on the storage medium in a predetermined manner based on that path structure (for example, the compiling result of a source code file can be stored in a preset shared directory), so as to restore the engineering structure of the engineering source code. Once the processing results of all source code files of the engineering source code have been received and written to their corresponding locations according to the path structure of the source code files, the processing results of the entire engineering source code have been merged and its structure restored.
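The monitoring and merging step can be illustrated with a short sketch. It assumes a thread pool as the "multiple threads" mechanism, a local function `process_on_node` standing in for the remote processing performed by a node, and a hypothetical `.out` suffix for result files; none of these names or choices come from the specification.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def process_on_node(node, files):
    """Stand-in for remote processing (assumption): returns a mapping from
    each file's recorded relative path to its processing result."""
    return {rel: f"processed by {node}: {rel}".encode() for rel in files}


def collect_results(assignments, output_root):
    """Watch every node in its own thread and write each result to the
    location given by the file's recorded path structure."""
    output_root = Path(output_root)
    written = []
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = [pool.submit(process_on_node, node, files)
                   for node, files in assignments.items()]
        # as_completed yields each node's result as soon as it is available.
        for future in as_completed(futures):
            for rel_path, data in future.result().items():
                target = output_root / f"{rel_path}.out"
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(data)
                written.append(target.relative_to(output_root).as_posix())
    return sorted(written)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_dir:
        plan = {"node1": ["src/a.c", "src/lib/b.c"], "node2": ["tools/c.c"]}
        print(collect_results(plan, shared_dir))
```

Since every result is written under the same relative path as its source file, the output directory mirrors the original directory layout regardless of the order in which the nodes finish.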
According to this scheme, the file processing device classifies the files of a task according to the dependency relationships among them, placing files that depend on one another in the same category, and then distributes the files so that all files of the same category go to the same node of the distributed cluster. As a result, files with dependency relationships are distributed to the same node, and no dependency relationships exist between files on different nodes. Each node can therefore process its files without cross-node references, which effectively prevents nodes in the distributed system from producing erroneous results and also helps the computing resources of the distributed system to be used effectively.
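One way to obtain categories with this property is to treat the files as vertices of an undirected graph whose edges are dependency relationships, and to take each connected component as one category. The sketch below uses a union-find structure for this; the function name and the input format (a list of files plus a list of dependent pairs) are assumptions for illustration, and the specification does not prescribe this particular algorithm.

```python
from collections import defaultdict


def classify_by_dependency(files, dependencies):
    """Group files so that any two files linked by a dependency, directly
    or through other files, end up in the same category."""
    parent = {f: f for f in files}

    def find(x):
        # Follow parent links to the representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each pair (a, b) means "a depends on b"; direction does not matter
    # for grouping, so the two sets are simply merged.
    for a, b in dependencies:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for f in files:
        groups[find(f)].append(f)
    return sorted(sorted(group) for group in groups.values())
```

For example, with the files `main.c`, `util.c`, `util.h`, `net.c`, `net.h` and the pairs (`main.c`, `util.h`), (`util.c`, `util.h`), (`net.c`, `net.h`), the first three files form one category and the last two form another, so no dependency crosses a category boundary.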
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar, the embodiments may be referred to one another.
For convenience of description, the above system or apparatus is described as being divided by function into various modules or units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application may, in essence or in part, be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
Finally, it is further noted that relational terms such as first, second, third, and fourth are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A file processing method, comprising:
analyzing the dependency relationship among files included in the task to be processed;
classifying the files included in the task to be processed according to the dependency relationship among the files to obtain a plurality of classification categories, and recording the path structure of each file of the task to be processed; wherein each classification category comprises a plurality of files of the task to be processed that have dependency relationships with one another;
distributing each file of the task to be processed to a plurality of nodes of a distributed cluster based on a preset file distribution strategy; wherein files belonging to the same classification category are distributed to the same node;
acquiring the processing result of each node on the distributed files, and merging the processing results of each file based on the path structure of each file so as to restore the task structure of the task to be processed;
wherein, if the task to be processed is engineering source code to be processed, classifying the files included in the task to be processed according to the dependency relationship among the files to obtain a plurality of classification categories comprises:
classifying the source code files according to the dependency relationship among the source code files of the engineering source code to obtain source code files of a plurality of classification categories, wherein each classification category comprises a plurality of source code files having dependency relationships with one another;
wherein a dependency relationship between files means that compiling or running one file is premised on another file, such that if the other file is missing, the one file cannot be used.
2. The method of claim 1, wherein distributing each file of the task to be processed to the plurality of nodes of the distributed cluster based on the preset file distribution strategy comprises:
acquiring the computing resource usage, network load, and congestion status of each node in the cluster;
performing task distribution on each source code file of the engineering source code based on the computing resource usage, network load, and congestion status of each node in the cluster; wherein the source code files belonging to the same classification category are distributed to the same node.
3. The method according to claim 2, wherein acquiring the processing result of each node on the distributed files and merging the processing results of each file based on the path structure of each file, so as to restore the task structure of the task to be processed, comprises:
monitoring the processing status of each node in the cluster using multiple threads or multiple processes;
upon detecting that a node has produced a compiling result for the source code files of the category distributed to it, receiving the compiling result and acquiring the path structure of each source code file in the category;
and writing the compiling result of each source code file in the category to a corresponding location on a storage medium based on the path structure of each source code file in the category, so as to restore the engineering structure of the engineering source code.
4. The method of claim 3, wherein writing the compiling result of the source code file to a corresponding location on the storage medium comprises:
storing the compiling result of the source code file in a preset shared directory.
5. A file processing apparatus, comprising:
an analysis unit, configured to analyze the dependency relationship among the files included in a task to be processed;
a classification and information recording unit, configured to classify the files included in the task to be processed according to the dependency relationship among the files to obtain a plurality of classification categories, and to record the path structure of each file of the task to be processed; wherein each classification category comprises at least one file of the task to be processed;
a distribution unit, configured to distribute each file of the task to be processed to a plurality of nodes of a distributed cluster based on a preset file distribution strategy; wherein files belonging to the same classification category are distributed to the same node;
a result processing unit, configured to acquire the processing result of each node on the distributed files and to merge the processing results of each file based on the path structure of each file, so as to restore the task structure of the task to be processed;
wherein the task to be processed is engineering source code to be processed;
wherein the classification and information recording unit classifying the files included in the task to be processed according to the dependency relationship among the files to obtain a plurality of classification categories specifically comprises:
classifying the source code files according to the dependency relationship among the source code files of the engineering source code to obtain source code files of a plurality of classification categories, wherein each classification category comprises at least one source code file;
wherein a dependency relationship between files means that compiling or running one file is premised on another file, such that if the other file is missing, the one file cannot be used.
6. The apparatus according to claim 5, wherein the distribution unit is specifically configured to:
acquire the computing resource usage, network load, and congestion status of each node in the cluster;
perform task distribution on each source code file of the engineering source code based on the computing resource usage, network load, and congestion status of each node in the cluster; wherein the source code files belonging to the same classification category are distributed to the same node.
7. The apparatus of claim 6, wherein the result processing unit is specifically configured to:
monitor the processing status of each node in the cluster using multiple threads or multiple processes;
upon detecting that a node has produced a compiling result for the source code files of the category distributed to it, receive the compiling result and acquire the path structure of each source code file in the category;
and write the compiling result of each source code file in the category to a corresponding location on a storage medium based on the path structure of each source code file in the category, so as to restore the engineering structure of the engineering source code.
8. The apparatus of claim 6, wherein the result processing unit writing the compiling result of the source code file to a corresponding location on the storage medium specifically comprises:
storing the compiling result of the source code file in a preset shared directory.
CN201811541562.1A 2018-12-17 2018-12-17 File processing method and device Active CN109376137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811541562.1A CN109376137B (en) 2018-12-17 2018-12-17 File processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811541562.1A CN109376137B (en) 2018-12-17 2018-12-17 File processing method and device

Publications (2)

Publication Number Publication Date
CN109376137A CN109376137A (en) 2019-02-22
CN109376137B true CN109376137B (en) 2021-03-23

Family

ID=65374290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811541562.1A Active CN109376137B (en) 2018-12-17 2018-12-17 File processing method and device

Country Status (1)

Country Link
CN (1) CN109376137B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008074B (en) * 2019-12-05 2023-08-22 中国建设银行股份有限公司 File processing method, device, equipment and medium
CN111224822A (en) * 2020-01-03 2020-06-02 深圳鲲云信息科技有限公司 Node scheduling method, system, server and storage medium of data flow graph
CN113110803B (en) * 2021-04-19 2022-10-21 浙江中控技术股份有限公司 Data storage method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831012A (en) * 2011-06-16 2012-12-19 日立(中国)研究开发有限公司 Task scheduling device and task scheduling method in multimode distributive system
CN103377075A (en) * 2012-04-28 2013-10-30 腾讯科技(深圳)有限公司 Task management method, device and system
US8868604B2 (en) * 2012-09-26 2014-10-21 Oracle International Corporation Methods and apparatus for implementing Semi-distributed Lock Management
CN105162878B (en) * 2015-09-24 2018-08-31 网宿科技股份有限公司 Document distribution system based on distributed storage and method
CN106155791B (en) * 2016-06-30 2019-05-07 电子科技大学 A kind of workflow task dispatching method under distributed environment
CN106874084B (en) * 2017-01-04 2020-04-07 北京百度网讯科技有限公司 Distributed workflow scheduling method and device and computer equipment
CN108874520A (en) * 2018-06-06 2018-11-23 成都四方伟业软件股份有限公司 Calculation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
服务发布与发现在面向服务的架构中的研究与应用;汪润等;《电脑知识与技术》;20180228;第14卷(第6期);全文 *

Also Published As

Publication number Publication date
CN109376137A (en) 2019-02-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant