CN115391276A - High-resolution remote sensing image distributed processing system and method - Google Patents

High-resolution remote sensing image distributed processing system and method

Info

Publication number
CN115391276A
Authority
CN
China
Prior art keywords
remote sensing
sensing image
data
distributed
metadata
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211012277.7A
Other languages
Chinese (zh)
Inventor
李斌全
吴兰
王瞧
姚远
龚丽爽
郭鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Application filed by Henan University of Technology
Priority to CN202211012277.7A
Publication of CN115391276A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/116 Details of conversion of file system types or formats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed processing system for high-resolution remote sensing images, comprising a remote sensing image input file format extension module, a remote sensing image metadata management and optimization module, and a remote sensing image distributed computing module. The invention has the following advantages: by exploiting the scalability of the distributed storage model, remote sensing images can be distributed and stored across different nodes, and the storage capacity for images grows correspondingly as storage nodes are added. A corresponding distributed computing framework moves computation to the data, so that remote sensing image data stored on different nodes are processed simultaneously, which greatly improves the processing efficiency of the image data.

Description

High-resolution remote sensing image distributed processing system and method
Technical Field
The invention belongs to the technical field of remote sensing image processing, and particularly relates to a distributed processing system and method for a high-resolution remote sensing image.
Background
The rapid growth in the volume of high-resolution remote sensing images poses major challenges for their storage and computation.
In terms of storage, a centralized storage mode has an upper storage limit, so it cannot meet the storage requirements of image data as the number of remote sensing images continues to grow. By contrast, the storage nodes of a distributed storage mode can be expanded further, which gives it good scalability and adaptability for data storage.
In terms of computation, a single machine processes images serially, which is very time-consuming and does not meet the timeliness requirements of remote sensing image processing. By contrast, a distributed computing method distributes the computing task to the storage nodes, and each storage node completes the computation for part of the data, which greatly improves the computing efficiency of image processing.
It can be seen that distributed storage and distributed computing models suit the processing requirements of remote sensing image data.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a distributed processing system and method for high-resolution remote sensing images. A corresponding distributed computing framework moves computation to the data, so that the remote sensing image data stored on different nodes are processed simultaneously, which greatly improves the processing efficiency of the image data.
To achieve this purpose, the invention adopts the following technical solution:
A high-resolution remote sensing image distributed processing system comprises a remote sensing image input file format extension module, a remote sensing image metadata management and optimization module, and a remote sensing image distributed computing module, wherein:
the remote sensing image input file format extension module extends the input file format of the distributed file system for high-resolution remote sensing image data, so that the distributed file system can effectively store data of the remote sensing image type; to this end, GfFileName, GfFileContent, GfImageFileInputFormat and GfImageFileRecordReader are constructed, and the image name and image content of each remote sensing image in the distributed file system HDFS are stored in GfFileName and GfFileContent, respectively;
the remote sensing image metadata management and optimization module uses the metadata of the remote sensing image data to establish indexes and references to the data, achieving structured processing of remote sensing image information; the data itself is accessed through the metadata references, so that the distributed algorithm can effectively read the remote sensing image data;
the remote sensing image distributed computing module adopts a data-parallel mode in which a deep convolutional neural network is trained jointly by a plurality of nodes, improving the training speed and the processing efficiency of the remote sensing image data.
It should be noted that the deep convolutional neural network is trained and its parameters are updated by batch updating; with batch updating, the next iteration is performed only after the calculation over the whole batch of remote sensing images has been completed.
The invention also provides a method implemented with the high-resolution remote sensing image distributed processing system, characterized by comprising the following steps:
S1, extending the format of the remote sensing image input file;
S2, managing and optimizing the remote sensing image metadata after format extension;
S3, carrying out distributed computation on the remote sensing images processed in step S2.
It should be noted that step S1 further includes:
S1.1, the image name and the image content of the remote sensing image are stored using GfFileName and GfFileContent;
S1.2, GfImageFileInputFormat inherits from FileInputFormat, and FileInputFormat implements the InputFormat interface; GfImageFileRecordReader implements the RecordReader interface;
S1.3, data is read using the InputFormat and distributed to a Mapper for processing, and the Mapper finally reads the Key/Value pairs from it;
S1.4, FileInputFormat enables a file to be used as input data of the system;
S1.5, GfImageFileInputFormat inherits the FileInputFormat class and comprises the following member methods: configure(), isSplitable(), getSplits() and getRecordReader(); configure() configures the relevant attributes, isSplitable() judges whether data blocks are to be split, getSplits() splits the data blocks, and getRecordReader() reads the corresponding records; getRecordReader() in turn calls the GfImageFileRecordReader() method of the GfImageFileRecordReader class;
S1.6, GfImageFileInputFormat first executes configure() to perform configuration, then uses isSplitable() to judge whether the data blocks need to be split and, if so, executes the data block splitting function getSplits(); it then executes the getRecordReader() method for reading records, which calls the constructor of the last of the four classes constructed above, namely the GfImageFileRecordReader() method of the GfImageFileRecordReader class;
S1.7, the GfImageFileRecordReader class mainly comprises the constructor GfImageFileRecordReader() and the methods createKey(), createValue() and next(); createKey() and createValue() return the key and the value of a key-value pair, next() reads records consecutively, and GfImageFileRecordReader() carries out the reading of the remote sensing image data.
It should be noted that the InputFormat provides the following functions for the distributed framework: verifying the correctness of the job input; dividing the input data into logical InputSplits using the getSplits method, each InputSplit being assigned to a separate Mapper for processing, wherein an InputSplit logically contains all the Key/Value pairs supplied to a given Mapper, the getLength method of the InputSplit obtains its size, and the getLocation method returns the corresponding location list; and returning a RecordReader object using the createRecordReader method, which the Mapper uses to read the Key/Value pairs in the split.
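The logical splitting described above can be illustrated with a minimal Python sketch. It is not the Hadoop API: the names `InputSplit`, `get_splits`, `get_length` and `get_locations` are hypothetical Python analogues chosen for illustration, and the block-to-host table is assumed to be supplied by the caller rather than queried from a master node.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class InputSplit:
    """A logical slice of one input file (hypothetical analogue of an InputSplit)."""
    path: str
    start: int
    length: int
    locations: Tuple[str, ...]

    def get_length(self) -> int:
        # Size of the split in bytes.
        return self.length

    def get_locations(self) -> List[str]:
        # Nodes assumed to hold the data of this split.
        return list(self.locations)


def get_splits(path: str, file_size: int, block_size: int,
               block_hosts: Sequence[Sequence[str]],
               splitable: bool = True) -> List[InputSplit]:
    """Divide a file into logical splits, one per block, or one for the whole file."""
    if file_size <= 0:
        return []
    if not splitable:
        # The whole file becomes a single split handled by one Mapper.
        return [InputSplit(path, 0, file_size, tuple(block_hosts[0]))]
    splits = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        hosts = block_hosts[offset // block_size]
        splits.append(InputSplit(path, offset, length, tuple(hosts)))
        offset += length
    return splits
```

Each returned split would then be handed to one Mapper; scheduling the Mapper on one of the listed locations is what allows computation to move to the data.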
In step S2, the remote sensing image metadata after format extension is managed by the master node, and the metadata comprises three types: file directories, file blocks and location information; the master node saves the metadata of all remote sensing images and folders in a file system tree, persisted in the form of a namespace image and an edit log file, wherein the file system tree contains information such as file data block handles and the nodes on which they are distributed; meanwhile, the master node responds to information requests from the client, and the client can perform operations such as local import, export, file deletion and directory creation through program instructions, thereby processing the remote sensing image data.
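As a rough illustration of how a master node might keep metadata as a namespace image plus an edit log, the following Python sketch may help. Everything in it is an assumption made for illustration: the class `MasterMetadata`, the operation names, the JSON image format, and the flattening of the file system tree into a path-keyed dictionary are hypothetical and are not taken from the patent or from HDFS.

```python
import json


def _apply(tree, op):
    """Apply one logged operation to the in-memory namespace."""
    kind, path = op["op"], op["path"]
    if kind == "mkdir":
        tree[path] = {"type": "dir"}
    elif kind == "add_file":
        # Each block records its handle and the slave nodes holding it.
        tree[path] = {"type": "file", "blocks": op["blocks"]}
    elif kind == "delete":
        prefix = path.rstrip("/") + "/"
        for p in [p for p in tree if p == path or p.startswith(prefix)]:
            del tree[p]


class MasterMetadata:
    """Namespace kept in memory, persisted as an image plus an edit log."""

    def __init__(self, image=None, edits=None):
        # Recovery: load the last image, then replay the edits logged since.
        self.tree = json.loads(image) if image else {}
        self.edits = list(edits or [])
        for op in self.edits:
            _apply(self.tree, op)

    def log(self, op):
        # Record the operation first, then apply it to the namespace.
        self.edits.append(op)
        _apply(self.tree, op)

    def checkpoint(self):
        # Merge all edits into a fresh image and start an empty log.
        image = json.dumps(self.tree, sort_keys=True)
        self.edits = []
        return image

    def locate(self, path):
        # Answer a client request: which blocks, on which nodes?
        return self.tree[path]["blocks"]
```

A client request for an image resolves through `locate`, which returns block handles and node locations rather than the image data itself; this is the sense in which the data is reached through references held in the metadata.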
It should be noted that optimizing the remote sensing image metadata after format extension in step S2 includes:
S2.1, remote sensing image file redundancy: the image information is backed up on a plurality of slave nodes to improve the fault tolerance of the system; when one copy of a Block is stored, it is forwarded and replicated at the same time; if a storage node fails or becomes abnormal, a copy of the remote sensing image data Block can be found on other slave nodes, which guarantees efficient access to the data and achieves fault tolerance for the remote sensing image data;
S2.2, heartbeat detection of the remote sensing image data: each slave node periodically sends heartbeat signals to the master node to verify its current state and connectivity; if no heartbeat signal is detected, the slave node storing remote sensing images is judged to be down, and subsequent remote sensing data are no longer placed on it; in addition, each slave node periodically reports its Block information list through the heartbeat mechanism, and the list is summarized into the corresponding mapping table on the master node, which facilitates detecting and retrieving data through the metadata of the remote sensing image data and ensures the integrity and availability of the data;
S2.3, optimizing the safe mode of the distributed file system and setting a copy threshold: when the distributed storage system for remote sensing images starts, the master node checks the Block information of all data in the whole system and determines whether the number of valid copies of each Block is equal to or greater than the set copy threshold; if not, the Block of remote sensing image data is replicated;
S2.4, integrity detection of remote sensing image data: when the distributed file system creates a Block, it calculates a checksum for the Block, which is mainly used to judge the integrity of the data; if the data is incomplete, the nearest corresponding data copy is read instead; when remote sensing image data are stored, different storage strategies can be adopted according to the capacities of the slave nodes for the sake of load balance; finally, remote sensing image data are replicated in a pipelined manner: once the cache in the client reaches the set Block size, the master node of the distributed system is notified, and the remote sensing image data Block is replicated to the slave nodes designated by the master node; the first slave node starts sending data to the second slave node while writing the data to disk, which reduces the time consumed in writing the remote sensing image data Block and ensures the integrity of the data.
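Steps S2.2 and S2.3 above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation of the invention: the class `Master`, its method names, the timeout value and the use of caller-supplied timestamps are all hypothetical.

```python
class Master:
    """Tracks slave liveness and block copies from periodic heartbeats."""

    def __init__(self, timeout_s, copy_threshold):
        self.timeout_s = timeout_s            # silence longer than this means "down"
        self.copy_threshold = copy_threshold  # minimum live copies per block
        self.last_seen = {}                   # node name -> time of last heartbeat
        self.block_map = {}                   # block id -> set of nodes holding it

    def heartbeat(self, node, now, block_report):
        """Record a heartbeat and merge the node's block list into the mapping table."""
        self.last_seen[node] = now
        for holders in self.block_map.values():
            holders.discard(node)             # drop stale entries for this node
        for block_id in block_report:
            self.block_map.setdefault(block_id, set()).add(node)

    def live_nodes(self, now):
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout_s}

    def under_replicated(self, now):
        """Blocks whose number of copies on live nodes is below the threshold."""
        live = self.live_nodes(now)
        return sorted(b for b, holders in self.block_map.items()
                      if len(holders & live) < self.copy_threshold)

    def choose_targets(self, now, count):
        """Only live nodes are offered as targets for newly stored data."""
        return sorted(self.live_nodes(now))[:count]
```

In this sketch a node that stops sending heartbeats simply ages out of `live_nodes`, so it is neither counted as holding a valid copy nor chosen for new data, and any block that falls below the threshold shows up in `under_replicated` as a candidate for re-copying.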
It should be noted that step S3 includes: the Mapper performs forward propagation and backward error calculation on the remote sensing image data, solves for the local change of the model parameters, and generates intermediate key-value pairs of the form <key = w, value = Δw>; a Combiner is then executed to summarize the model parameter results locally, which reduces the I/O transmission cost of the data; finally, the Reducer receives the output of the Combiner, summarizes the local parameter changes of each node into a global change, and performs the batch update.
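The Mapper, Combiner and Reducer roles in step S3 can be illustrated with a small Python sketch. To keep it self-contained, a linear model with squared error stands in for the deep convolutional neural network, which is an assumption of this sketch and not the method of the invention; the function names and the learning rate `lr` are likewise hypothetical.

```python
from collections import defaultdict


def mapper(weights, samples, lr):
    """Forward pass and error per sample; emit (key = w, value = delta_w) pairs."""
    for x, y in samples:
        prediction = sum(weights[k] * x[k] for k in weights)
        error = prediction - y
        for k in weights:
            # Gradient-descent change for a squared-error loss (stand-in model).
            yield k, -lr * error * x[k]


def combiner(pairs):
    """Sum the changes per parameter on the local node to cut transferred data."""
    local = defaultdict(float)
    for key, delta in pairs:
        local[key] += delta
    return list(local.items())


def reducer(weights, combined_outputs):
    """Sum local changes from all nodes into a global change and update once."""
    total = defaultdict(float)
    for pairs in combined_outputs:
        for key, delta in pairs:
            total[key] += delta
    return {k: weights[k] + total[k] for k in weights}


def train_step(weights, shards, lr):
    """One batch update: every shard is processed before the weights change."""
    combined = [combiner(mapper(weights, shard, lr)) for shard in shards]
    return reducer(weights, combined)
```

Because the weights are only changed in `reducer`, after every shard has reported, one call to `train_step` over several shards gives the same result as processing the same samples on a single node, which is the property that batch updating relies on.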
The invention has the following beneficial effects:
1. By exploiting the scalability of the distributed storage model, remote sensing images can be distributed and stored across different nodes, and the storage capacity for remote sensing images grows correspondingly as storage nodes are added.
2. A corresponding distributed computing framework moves computation to the data, so that the remote sensing image data stored on different nodes are processed simultaneously, which greatly improves the processing efficiency of the image data.
Detailed Description
The present invention is further described below. It should be noted that the following examples provide detailed embodiments and specific procedures based on the technical solution, but the scope of protection of the present invention is not limited to these examples.
The invention relates to a distributed processing system for high-resolution remote sensing images, which comprises a remote sensing image input file format extension module, a remote sensing image metadata management and optimization module, and a remote sensing image distributed computing module, wherein:
the remote sensing image input file format extension module extends the input file format of the distributed file system for high-resolution remote sensing image data, so that the distributed file system can effectively store data of the remote sensing image type; to this end, GfFileName, GfFileContent, GfImageFileInputFormat and GfImageFileRecordReader are constructed, and the image name and image content of each remote sensing image in the distributed file system HDFS are stored in GfFileName and GfFileContent, respectively;
the remote sensing image metadata management and optimization module uses the metadata of the remote sensing image data to establish indexes and references to the data, achieving structured processing of remote sensing image information; the data itself is accessed through the metadata references, so that the distributed algorithm can effectively read the remote sensing image data;
the remote sensing image distributed computing module adopts a data-parallel mode in which a deep convolutional neural network is trained jointly by a plurality of nodes, improving the training speed and the processing efficiency of the remote sensing image data.
Furthermore, the deep convolutional neural network is trained and its parameters are updated by batch updating; with batch updating, the next iteration is performed only after the calculation over the whole batch of remote sensing images has been completed.
The invention also provides a method implemented with the high-resolution remote sensing image distributed processing system, characterized by comprising the following steps:
S1, extending the format of the remote sensing image input file;
S2, managing and optimizing the remote sensing image metadata after format extension;
S3, carrying out distributed computation on the remote sensing images processed in step S2.
Further, step S1 of the present invention further includes:
S1.1, the image name and the image content of the remote sensing image are stored using GfFileName and GfFileContent;
S1.2, GfImageFileInputFormat inherits from FileInputFormat, and FileInputFormat implements the InputFormat interface; GfImageFileRecordReader implements the RecordReader interface;
S1.3, data is read using the InputFormat and distributed to a Mapper for processing, and the Mapper finally reads the Key/Value pairs from it;
S1.4, FileInputFormat enables a file to be used as input data of the system;
S1.5, GfImageFileInputFormat inherits the FileInputFormat class and comprises the following member methods: configure(), isSplitable(), getSplits() and getRecordReader(); configure() configures the relevant attributes, isSplitable() judges whether data blocks are to be split, getSplits() splits the data blocks, and getRecordReader() reads the corresponding records; getRecordReader() in turn calls the GfImageFileRecordReader() method of the GfImageFileRecordReader class;
S1.6, GfImageFileInputFormat first executes configure() to perform configuration, then uses isSplitable() to judge whether the data blocks need to be split and, if so, executes the data block splitting function getSplits(); it then executes the getRecordReader() method for reading records, which calls the constructor of the last of the four classes constructed above, namely the GfImageFileRecordReader() method of the GfImageFileRecordReader class;
S1.7, the GfImageFileRecordReader class mainly comprises the constructor GfImageFileRecordReader() and the methods createKey(), createValue() and next(); createKey() and createValue() return the key and the value of a key-value pair, next() reads records consecutively, and GfImageFileRecordReader() carries out the reading of the remote sensing image data.
Further, in the present invention, the InputFormat provides the following functions for the distributed framework: verifying the correctness of the job input; dividing the input data into logical InputSplits using the getSplits method, each InputSplit being assigned to a separate Mapper for processing, wherein an InputSplit logically contains all the Key/Value pairs supplied to a given Mapper, the getLength method of the InputSplit obtains its size, and the getLocation method returns the corresponding location list; and returning a RecordReader object using the createRecordReader method, which the Mapper uses to read the Key/Value pairs in the split.
Further, in step S2 of the present invention, the remote sensing image metadata after format extension is managed by the master node, and the metadata comprises three types: file directories, file blocks and location information; the master node saves the metadata of all remote sensing images and folders in a file system tree, persisted in the form of a namespace image and an edit log file, wherein the file system tree contains information such as file data block handles and the nodes on which they are distributed; meanwhile, the master node responds to information requests from the client, and the client can perform operations such as local import, export, file deletion and directory creation through program instructions, thereby processing the remote sensing image data.
Further, optimizing the remote sensing image metadata after format extension in step S2 of the present invention includes:
S2.1, remote sensing image file redundancy: the image information is backed up on a plurality of slave nodes to improve the fault tolerance of the system; when one copy of a Block is stored, it is forwarded and replicated at the same time; if a storage node fails or becomes abnormal, a copy of the remote sensing image data Block can be found on other slave nodes, which guarantees efficient access to the data and achieves fault tolerance for the remote sensing image data;
S2.2, heartbeat detection of the remote sensing image data: each slave node periodically sends heartbeat signals to the master node to verify its current state and connectivity; if no heartbeat signal is detected, the slave node storing remote sensing images is judged to be down, and subsequent remote sensing data are no longer placed on it; in addition, each slave node periodically reports its Block information list through the heartbeat mechanism, and the list is summarized into the corresponding mapping table on the master node, which facilitates detecting and retrieving data through the metadata of the remote sensing image data and ensures the integrity and availability of the data;
S2.3, optimizing the safe mode of the distributed file system and setting a copy threshold: when the distributed storage system for remote sensing images starts, the master node checks the Block information of all data in the whole system and determines whether the number of valid copies of each Block is equal to or greater than the set copy threshold; if not, the Block of remote sensing image data is replicated;
S2.4, integrity detection of remote sensing image data: when the distributed file system creates a Block, it calculates a checksum for the Block, which is mainly used to judge the integrity of the data; if the data is incomplete, the nearest corresponding data copy is read instead; when remote sensing image data are stored, different storage strategies can be adopted according to the capacities of the slave nodes for the sake of load balance; finally, remote sensing image data are replicated in a pipelined manner: once the cache in the client reaches the set Block size, the master node of the distributed system is notified, and the remote sensing image data Block is replicated to the slave nodes designated by the master node; the first slave node starts sending data to the second slave node while writing the data to disk, which reduces the time consumed in writing the remote sensing image data Block and ensures the integrity of the data.
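The checksum verification and pipelined copying in steps S2.1 and S2.4 can be sketched in Python as follows. The sketch rests on several assumptions: CRC32 from the standard `zlib` module is used as the checksum (the text does not name an algorithm), the class `SlaveNode` and both helper functions are hypothetical, and forwarding is done sequentially, whereas the described system forwards data while the disk write is still in progress.

```python
import zlib


def checksum(data: bytes) -> int:
    # CRC32 is an assumed choice; the source only says "a checksum".
    return zlib.crc32(data)


class SlaveNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> (data, checksum stored at creation)

    def write(self, block_id, data, crc, downstream):
        """Store the block, then forward it to the next node in the pipeline."""
        self.blocks[block_id] = (data, crc)
        if downstream:
            downstream[0].write(block_id, data, crc, downstream[1:])

    def read(self, block_id):
        """Return the data only if it still matches its stored checksum."""
        data, crc = self.blocks[block_id]
        return data if checksum(data) == crc else None


def pipeline_write(block_id, data, pipeline):
    """The client sends the block to the first node only; nodes relay it onward."""
    pipeline[0].write(block_id, data, checksum(data), pipeline[1:])


def read_block(block_id, replicas):
    """Try the copies in order (nearest first) and skip corrupted ones."""
    for node in replicas:
        if block_id in node.blocks:
            data = node.read(block_id)
            if data is not None:
                return data, node.name
    raise IOError("no intact copy of " + block_id)
```

The order of `replicas` passed to `read_block` is assumed to reflect network distance, so that a corrupted nearest copy falls back to the next nearest intact one.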
Further, step S3 of the present invention includes: the Mapper performs forward propagation and backward error calculation on the remote sensing image data, solves for the local change of the model parameters, and generates intermediate key-value pairs of the form <key = w, value = Δw>; a Combiner is then executed to summarize the model parameter results locally, which reduces the I/O transmission cost of the data; finally, the Reducer receives the output of the Combiner, summarizes the local parameter changes of each node into a global change, and performs the batch update.
Examples
1. Remote sensing image input file format extension
For high-resolution remote sensing image data, the input file format of the distributed file system needs to be extended accordingly, so that the distributed file system can effectively store data of the remote sensing image type. For the distributed file system HDFS, GfImageFileInputFormat and GfImageFileRecordReader are constructed and implemented with reference to the functions of TextInputFormat and SequenceFileInputFormat in the distributed file system; this extends the input file format for remote sensing image data, and the image name and image content of each remote sensing image in HDFS are stored in GfFileName and GfFileContent, respectively.
To enable the distributed file system to support storage of remote sensing image data, the GfFileName, GfFileContent, GfImageFileInputFormat and GfImageFileRecordReader constructed above are used. They are analyzed in turn as follows:
First, GfFileName and GfFileContent are used to store the image name and image content of the remote sensing image.
Second, GfImageFileInputFormat inherits from FileInputFormat, which implements the InputFormat interface. GfImageFileRecordReader implements the RecordReader interface.
Third, data is read using the InputFormat and distributed to a Mapper for processing, and the Mapper finally reads the Key/Value pairs from it. The InputFormat provides the following functions for the distributed framework: 1) verifying the correctness of the job input; 2) dividing the input data into logical InputSplits using the getSplits method, each InputSplit being assigned to a separate Mapper for processing, wherein an InputSplit logically contains all the Key/Value pairs supplied to a given Mapper, the getLength method of the InputSplit obtains its size, and the getLocation method returns the corresponding location list; 3) returning a RecordReader object using the createRecordReader method, which the Mapper uses to read the Key/Value pairs in the split.
Fourth, FileInputFormat enables files to be used as input data for the system. Thus, the FileInputFormat class needs to be inherited if data is to be passed to the distributed processing framework in the form of a file.
Fifth, GfImageFileInputFormat inherits the FileInputFormat class and includes the following member methods: configure(), isSplitable(), getSplits() and getRecordReader(). configure() configures the relevant attributes, isSplitable() judges whether data blocks are to be split, getSplits() splits the data blocks, and getRecordReader() reads the corresponding records. getRecordReader() in turn calls the GfImageFileRecordReader() method of the GfImageFileRecordReader class.
Sixth, GfImageFileInputFormat first executes configure() to perform configuration, then uses isSplitable() to judge whether the data blocks need to be split and, if so, executes the data block splitting function getSplits(). It then executes the getRecordReader() method for reading records, which calls the constructor of the last of the four classes constructed above, namely the GfImageFileRecordReader() method of the GfImageFileRecordReader class.
Seventh, the GfImageFileRecordReader class mainly includes the constructor GfImageFileRecordReader() and the methods createKey(), createValue() and next(). createKey() and createValue() return the key and the value of a key-value pair, next() reads records consecutively, and GfImageFileRecordReader() carries out the reading of the remote sensing image data.
By expanding the input file format of the distributed file system, the support of the distributed file system for importing and storing the remote sensing image type data is realized.
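The interplay of the input format and the record reader described in this example can be sketched in Python. This is an analogue for illustration only, not the Java classes of the invention: the class names are reused for readability, a local directory stands in for HDFS, and two choices are assumptions of this sketch, namely that each image is treated as unsplittable and that one record is produced per file with the file name as key and the raw bytes as value.

```python
import os


class GfImageFileRecordReader:
    """Yields one record per image file: key = file name, value = file content."""

    def __init__(self, paths):
        self._paths = list(paths)
        self._pos = 0

    def next(self):
        # Return the next (key, value) record, or None when the split is exhausted.
        if self._pos >= len(self._paths):
            return None
        path = self._paths[self._pos]
        self._pos += 1
        with open(path, "rb") as f:
            return os.path.basename(path), f.read()


class GfImageFileInputFormat:
    def __init__(self):
        self.extensions = ()

    def configure(self, extensions):
        # Configure which file types count as remote sensing images.
        self.extensions = tuple(e.lower() for e in extensions)

    def is_splitable(self, path):
        # Assumption: an image is kept whole so that it can be decoded.
        return False

    def get_splits(self, directory):
        # One split per image file, since files are not split further.
        names = sorted(os.listdir(directory))
        return [[os.path.join(directory, n)] for n in names
                if n.lower().endswith(self.extensions)]

    def get_record_reader(self, split):
        return GfImageFileRecordReader(split)


def run_mapper(input_format, directory, map_fn):
    """Drive the format as a framework would: split, read records, map them."""
    results = []
    for split in input_format.get_splits(directory):
        reader = input_format.get_record_reader(split)
        record = reader.next()
        while record is not None:
            results.append(map_fn(*record))
            record = reader.next()
    return results
```

For example, after `fmt.configure([".tif"])`, calling `run_mapper(fmt, some_dir, lambda name, content: (name, len(content)))` would return one (name, size) pair per matching image in the directory.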
2. Remote sensing image metadata management and optimization
The index and reference of the data are established by using the metadata of the remote sensing image data, and the access to the data is achieved by the reference of the metadata, so that the structured processing of the remote sensing image information is realized. Meanwhile, the data are accessed through the reference of the metadata, so that the distributed algorithm can effectively read the remote sensing image data.
Metadata of the remote sensing image data is managed by the master node. The metadata comprises three main types: file directories, file blocks, and location information. Referencing the metadata gives access to the remote sensing image data, which greatly facilitates subsequent computation on the data. The master node stores the metadata of all remote sensing images and folders in a file system tree, persisted as a namespace image and an edit log file; the tree contains information such as file data block handles and the nodes on which the blocks are distributed. The master node also responds to information requests from clients, and a client can issue program instructions to import from local storage, export, delete files, create directories, and perform other operations, thereby processing the remote sensing image data.
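The three metadata types (file directories, file blocks, and location information) and the edit log can be pictured with a small in-memory model. This is a hedged sketch only: `MasterMetadata`, `BlockMeta`, and the log entry strings are hypothetical names introduced here, and a real master node would persist this state rather than hold it in Python dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BlockMeta:
    block_id: str          # file data block handle
    locations: List[str]   # slave nodes holding a copy of the block


@dataclass
class MasterMetadata:
    # directory path -> names of the files it contains
    directories: Dict[str, List[str]] = field(default_factory=dict)
    # full file path -> ordered list of its blocks
    file_blocks: Dict[str, List[BlockMeta]] = field(default_factory=dict)
    # append-only record of changes (the role of the edit log)
    edit_log: List[str] = field(default_factory=list)

    def mkdir(self, path: str) -> None:
        self.directories.setdefault(path, [])
        self.edit_log.append(f"MKDIR {path}")

    def add_file(self, directory: str, name: str, blocks: List[BlockMeta]) -> None:
        self.directories[directory].append(name)
        self.file_blocks[f"{directory}/{name}"] = blocks
        self.edit_log.append(f"ADD {directory}/{name}")

    def locate(self, path: str) -> List[Tuple[str, List[str]]]:
        """Resolve a file path to (block handle, node list) pairs."""
        return [(b.block_id, b.locations) for b in self.file_blocks[path]]

    def delete(self, directory: str, name: str) -> None:
        self.directories[directory].remove(name)
        del self.file_blocks[f"{directory}/{name}"]
        self.edit_log.append(f"DELETE {directory}/{name}")
```

A client request to read an image then reduces to a `locate` call on the master followed by block reads from the listed slave nodes.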
In the distributed storage file system for remote sensing images, one master node can manage multiple slave nodes. The master node mainly stores the remote sensing image metadata, the file names, and the directory information of each remote sensing image file. The master node receives heartbeat information sent by the slave nodes to confirm their running state, and the slave nodes store and back up the remote sensing image data.
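The heartbeat relationship between the master node and the slave nodes can be sketched as a timeout check. The class below is a minimal illustration under stated assumptions: `HeartbeatMonitor`, the `timeout` parameter, and the explicit `now` timestamps are hypothetical, and the optional block report simply models a slave node sending its Block list along with a heartbeat.

```python
from typing import Dict, Iterable, List, Optional, Set


class HeartbeatMonitor:
    """Master-side bookkeeping of slave node liveness and block reports."""

    def __init__(self, timeout: float):
        self.timeout = timeout                      # seconds of silence tolerated
        self.last_seen: Dict[str, float] = {}       # node -> time of last heartbeat
        self.block_map: Dict[str, Set[str]] = {}    # node -> blocks it reported

    def heartbeat(self, node: str, now: float,
                  block_report: Optional[Iterable[str]] = None) -> None:
        self.last_seen[node] = now
        if block_report is not None:
            self.block_map[node] = set(block_report)

    def live_nodes(self, now: float) -> List[str]:
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout)

    def dead_nodes(self, now: float) -> List[str]:
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)

    def placement_candidates(self, now: float) -> List[str]:
        # New image data is only placed on nodes that are still reporting in.
        return self.live_nodes(now)
```

Passing the clock value explicitly keeps the sketch deterministic; a running system would read the current time itself.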
To ensure effective storage and management of the remote sensing images and their metadata, the system can be optimized as follows:
1) Remote sensing image file redundancy. Multiple slave nodes back up the image information to improve the fault tolerance of the system. While one copy of a Block is being stored, it is simultaneously forwarded and copied to other nodes. If a node holding the data fails or behaves abnormally, a Block copy of the remote sensing image data can be found on other slave nodes, which guarantees efficient access to the data and makes the remote sensing image data fault tolerant.
2) Heartbeat detection for remote sensing image data. Each slave node periodically sends heartbeat signals to the master node to verify its current state and connectivity. If no heartbeat is detected, the slave node storing the remote sensing images is judged to be down, and subsequent remote sensing data is not placed on that slave node during storage. In addition, each slave node periodically sends its Block information list, via the heartbeat mechanism, to be aggregated into the corresponding mapping table on the master node, which makes it easier to check and retrieve data using the metadata of the remote sensing image data and ensures the integrity and availability of the data.
3) Distributed file system safe mode optimization and replica threshold setting. When the distributed storage system for remote sensing images starts, the master node checks the Block information of all data in the system and determines whether the number of valid replicas of each Block is equal to or greater than the set replica threshold; if not, it copies the Block of remote sensing image data.
4) Remote sensing image data integrity detection. When the distributed file system creates a data Block, it computes the checksum of the Block; the checksum is the main means of judging data integrity, and if the data is incomplete, the corresponding data copy is read from a nearby node. When storing remote sensing image data, different storage strategies can be adopted according to the differing capacities of the slave nodes, with load balancing in mind. Finally, remote sensing image data is copied in pipeline fashion: once the cache in the client reaches the configured Block size, the client notifies the master node of the distributed system and begins copying the remote sensing image data Block to the slave nodes designated by the master node. While the first slave node is writing the data to disk, it begins sending the data to the second slave node, which reduces the time consumed in writing the remote sensing image data Block and ensures the integrity of the data.
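Two of the optimizations above, the replica threshold check at startup and checksum-based integrity detection, can be sketched as follows. This is an illustration under assumptions rather than the system's implementation: the function names are hypothetical, CRC32 from Python's `zlib` is used only as a convenient stand-in for whatever checksum the file system computes, and the replica list is assumed to be ordered nearest first.

```python
import zlib
from typing import Dict, List, Set, Tuple


def under_replicated(block_locations: Dict[str, List[str]],
                     live_nodes: Set[str],
                     threshold: int) -> Dict[str, int]:
    """Return {block_id: number of extra copies needed} for every block
    whose count of replicas on live nodes is below the threshold."""
    missing = {}
    for block_id, nodes in block_locations.items():
        live = len(set(nodes) & live_nodes)
        if live < threshold:
            missing[block_id] = threshold - live
    return missing


def block_checksum(data: bytes) -> int:
    # Stand-in checksum; computed once when the block is created.
    return zlib.crc32(data)


def read_block(replicas: List[Tuple[str, bytes]], expected: int) -> Tuple[str, bytes]:
    """Try replicas in order (nearest first) and return the first copy
    whose checksum matches the value recorded at creation time."""
    for node, data in replicas:
        if block_checksum(data) == expected:
            return node, data
    raise IOError("no intact replica found")
```

For example, a block recorded on nodes `s1` and `s2` with only `s1` alive and a threshold of 2 is reported as needing one more copy, and a read that finds a corrupted copy on the nearest node falls through to the next one.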
Through management and optimization of the remote sensing image metadata, the reliability, adaptability, and robustness of the distributed storage system are ensured; the remote sensing image data can be accessed through metadata indexes and references, which greatly improves the processing efficiency of remote sensing images.
3. Distributed computation of remote sensing images
A deep convolutional neural network is trained jointly by multiple nodes in a data-parallel manner, which improves the training speed and the processing efficiency for remote sensing image data. Training and parameter updating of the deep convolutional network use batch updating: the next iteration is performed only after the computation over all the remote sensing images is complete. Under batch updating, training is not affected by the order of the input, which makes distributed computation over the remote sensing image data possible.
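The reason batch updating is independent of input order can be written out explicitly. The derivation below is a sketch under assumptions not stated in the text: plain gradient descent with a learning rate $\eta$, $N$ training images with per-image losses $\ell_i$, and $K$ nodes holding disjoint subsets $D_1, \dots, D_K$ of the images.

```latex
\nabla_w L(w)
  = \frac{1}{N} \sum_{i=1}^{N} \nabla_w \ell_i(w)
  = \frac{1}{N} \sum_{k=1}^{K} \sum_{i \in D_k} \nabla_w \ell_i(w),
\qquad
\Delta w_k = \sum_{i \in D_k} \nabla_w \ell_i(w),
\qquad
w \leftarrow w - \frac{\eta}{N} \sum_{k=1}^{K} \Delta w_k .
```

Because the sum is commutative and associative, any partition of the images across nodes yields the same global change, so each node can compute its $\Delta w_k$ independently before the results are combined.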
For distributed training of the deep convolutional network, an identical and complete copy of the deep neural network is stored on each computing node. Each node trains the model on its own portion of the data, and the results are then aggregated to update the deep neural network model.
First, the Mapper performs forward propagation and backward error calculation on the remote sensing image data, solves for the local change in the model parameters, and generates intermediate key-value pairs of the form <key = w, value = Δw>. Next, the Combiner is executed to aggregate the model parameter results locally, which reduces the I/O transmission cost of the data. Finally, the Reducer receives the output of the Combiner, aggregates the local parameter changes of all nodes into a global change, and performs the batch update.
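The Mapper, Combiner, and Reducer roles can be illustrated with a toy example. To keep the sketch short and runnable it deliberately swaps the deep convolutional network for a linear model with squared error; the function names, the learning rate `lr`, and the use of a parameter index as the key are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (features, target)


def mapper(w: Sequence[float], samples: Iterable[Sample]) -> Iterator[Tuple[int, float]]:
    """Forward pass and error for each sample; emit <parameter index, local change>."""
    for x, y in samples:
        err = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j, xj in enumerate(x):
            yield j, err * xj


def combiner(pairs: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Sum the changes per parameter on one node before anything is transmitted."""
    local = defaultdict(float)
    for key, value in pairs:
        local[key] += value
    return dict(local)


def reducer(w: Sequence[float], partials: Iterable[Dict[int, float]],
            n: int, lr: float) -> List[float]:
    """Aggregate the per-node changes into a global change and apply one batch update."""
    total = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return [wj - lr * total[j] / n for j, wj in enumerate(w)]


def train_step(w: Sequence[float], shards: List[List[Sample]], lr: float = 0.1) -> List[float]:
    n = sum(len(shard) for shard in shards)
    partials = [combiner(mapper(w, shard)) for shard in shards]  # one per node
    return reducer(w, partials, n, lr)
```

Calling `train_step` with the same samples arranged in one shard or split across several shards gives the same updated parameters up to floating-point rounding, which is the property that lets the work be distributed.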
The deep neural network distributed training method comprises the following steps:
[Figures BDA0003811054890000171 and BDA0003811054890000181: listing of the training steps, provided as images in the original publication]
Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims (8)

1. A distributed processing system for high-resolution remote sensing images, characterized by comprising a remote sensing image input file format extension module, a remote sensing image metadata management and optimization module, and a remote sensing image distributed computing module;
the remote sensing image input file format extension module extends the input file format of the distributed file system for high-resolution remote sensing image data, so that the distributed file system can effectively store remote sensing image data; to extend the input file format of the remote sensing image data, the classes GfFileName, GfFileContent, GfImageFileInputFormat, and GfImageFileRecordReader are constructed, and the image name and the image content of each remote sensing image in the distributed file system HDFS are stored in GfFileName and GfFileContent, respectively;
the remote sensing image metadata management and optimization module utilizes metadata of remote sensing image data to establish index and reference of the data, access to the data is achieved through reference of the metadata, and structured processing of remote sensing image information is achieved; the data itself is accessed by referring to the metadata, so that the remote sensing image data can be effectively read by a distributed algorithm;
the distributed computing module of the remote sensing image adopts a data parallel mode, and a deep convolution neural network is trained by a plurality of nodes together, so that the training speed and the processing efficiency of the remote sensing image data are improved.
2. The distributed processing system for the high-resolution remote sensing image according to claim 1, wherein the training and parameter updating of the deep convolutional neural network use batch updating; under batch updating, the next iteration is performed after the computation over all the remote sensing images is complete.
3. A method of implementing the distributed processing system for high resolution remote sensing images according to claim 1, comprising the steps of:
S1, extending the remote sensing image input file format;
S2, managing and optimizing the metadata of the remote sensing images after format extension;
S3, performing distributed computation on the remote sensing images processed in step S2.
4. The method for the distributed processing system of the high resolution remote sensing image according to claim 3, wherein the step S1 further comprises:
S1.1, storing the image name and the image content of the remote sensing image by using GfFileName and GfFileContent;
S1.2, GfImageFileInputFormat inherits from FileInputFormat, and FileInputFormat implements the InputFormat interface; GfImageFileRecordReader implements the RecordReader interface;
S1.3, the InputFormat reads the data and distributes it to Mappers for processing, and each Mapper finally reads Key/Value pairs;
S1.4, FileInputFormat enables files to be used as input data of the system;
S1.5, GfImageFileInputFormat inherits from the FileInputFormat class and comprises the following member methods: configure(), isSplitable(), getSplits(), and getRecordReader(); wherein configure() sets the relevant attributes, isSplitable() judges whether the data block is to be split, getSplits() splits the data into blocks, and getRecordReader() reads the corresponding records; getRecordReader() in turn calls the GfImageFileRecordReader() constructor of the GfImageFileRecordReader class;
S1.6, GfImageFileInputFormat first executes configure() to perform configuration, then uses isSplitable() to judge whether the data block needs to be split and, if so, executes the splitting function getSplits(); it then executes the record-reading method getRecordReader(), which calls the constructor of the last of the four classes constructed earlier, namely the GfImageFileRecordReader() constructor of the GfImageFileRecordReader class;
S1.7, the GfImageFileRecordReader class mainly includes the constructor GfImageFileRecordReader() and the methods createKey(), createValue(), and next(); createKey() and createValue() return the key and the value of a key-value pair, next() implements continuous reading of records, and GfImageFileRecordReader() then carries out the reading of the remote sensing image data.
5. The method for the distributed processing system of high resolution remote sensing images according to claim 4, wherein the InputFormat, within the distributed framework, verifies the correctness of the job input; divides the input data into logical InputSplits by using the getSplits method, each InputSplit being assigned a separate Mapper for processing, wherein an InputSplit logically contains all the Key/Value pairs provided to a given Mapper for processing, the getLength method of the InputSplit obtains the size of the InputSplit, and the getLocation method returns the corresponding location list; and returns a RecordReader object by using the createRecordReader method, through which the Mapper reads the Key/Value pairs in the split.
6. The method for the distributed processing system of the high resolution remote sensing image according to claim 3, wherein in step S2 the metadata of the remote sensing images after format extension is managed by the master node, and the metadata comprises three types: file directories, file blocks, and location information; the master node stores the metadata of all remote sensing images and folders in a file system tree, persisted as a namespace image and an edit log file, wherein the file system tree contains the file data block handles and information on the nodes on which the blocks are distributed; the master node also responds to information requests from clients, and a client can issue program instructions to import from local storage, export, delete files, create directories, and perform other operations, thereby processing the remote sensing image data.
7. The method for the distributed processing system of the high resolution remote sensing image according to claim 3, wherein the optimizing the remote sensing image metadata with the extended format in step S2 includes:
S2.1, remote sensing image file redundancy; multiple slave nodes back up the image information to improve the fault tolerance of the system; while one copy of a Block is being stored, it is simultaneously forwarded and copied to other nodes; if a node holding the data fails or behaves abnormally, a Block copy of the remote sensing image data can be found on other slave nodes, which guarantees efficient access to the data and makes the remote sensing image data fault tolerant;
S2.2, heartbeat detection for remote sensing image data; each slave node periodically sends heartbeat signals to the master node to verify its current state and connectivity; if no heartbeat is detected, the slave node storing the remote sensing images is judged to be down, and subsequent remote sensing data is not placed on that slave node during storage; in addition, each slave node periodically sends its Block information list, via the heartbeat mechanism, to be aggregated into the corresponding mapping table on the master node, which makes it easier to check and retrieve data using the metadata of the remote sensing image data and ensures the integrity and availability of the data;
S2.3, distributed file system safe mode optimization and replica threshold setting; when the distributed storage system for remote sensing images starts, the master node checks the Block information of all data in the system and determines whether the number of valid replicas of each Block is equal to or greater than the set replica threshold, and if not, copies the Block of remote sensing image data;
S2.4, remote sensing image data integrity detection; when the distributed file system creates a data Block, it computes the checksum of the Block, the checksum being the main means of judging data integrity, and if the data is incomplete, the corresponding data copy is read from a nearby node; when storing remote sensing image data, different storage strategies can be adopted according to the differing capacities of the slave nodes, with load balancing in mind; finally, remote sensing image data is copied in pipeline fashion: once the cache in the client reaches the configured Block size, the client notifies the master node of the distributed system and begins copying the remote sensing image data Block to the slave nodes designated by the master node; while the first slave node is writing the data to disk, it begins sending the data to the second slave node, which reduces the time consumed in writing the remote sensing image data Block and ensures the integrity of the data.
8. The method for distributed processing of remote sensing images with high resolution according to claim 3, wherein in step S3 the Mapper performs forward propagation and backward error calculation on the remote sensing image data, solves for the local change in the model parameters, and generates intermediate key-value pairs including <key = w, value = Δw>; the Combiner is then executed to aggregate the model parameter results locally while reducing the I/O transmission cost of the data; and finally the Reducer receives the output of the Combiner, aggregates the local parameter changes of all nodes into a global change, and performs the batch update.
CN202211012277.7A 2022-08-23 2022-08-23 High-resolution remote sensing image distributed processing system and method Pending CN115391276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211012277.7A CN115391276A (en) 2022-08-23 2022-08-23 High-resolution remote sensing image distributed processing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211012277.7A CN115391276A (en) 2022-08-23 2022-08-23 High-resolution remote sensing image distributed processing system and method

Publications (1)

Publication Number Publication Date
CN115391276A true CN115391276A (en) 2022-11-25

Family

ID=84121385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211012277.7A Pending CN115391276A (en) 2022-08-23 2022-08-23 High-resolution remote sensing image distributed processing system and method

Country Status (1)

Country Link
CN (1) CN115391276A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination