CN114882350A - Image processing method and device, electronic equipment and storage medium

Image processing method and device, electronic equipment and storage medium

Info

Publication number
CN114882350A
CN114882350A
Authority
CN
China
Prior art keywords
feature map
feature
processing
map
segmentation
Prior art date
Legal status
Pending
Application number
CN202210331915.5A
Other languages
Chinese (zh)
Inventor
张译
杨伯钢
贾光军
余永欣
刘博文
刘晓娜
崔亚君
刘鹏
王琪
王怡
龚芸
韩雄
Current Assignee
Beijing Institute of Surveying and Mapping
Original Assignee
Beijing Institute of Surveying and Mapping
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Surveying and Mapping
Priority to CN202210331915.5A
Publication of CN114882350A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A20/00 Water conservation; Efficient water supply; Efficient water use
    • Y02A20/152 Water filtration

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image processing method and apparatus, an electronic device, and a storage medium. The method includes: performing downsampling and encoding on an input image to obtain a first feature map; performing dimension compression and encoding on the first feature map to obtain a second feature map; and performing dense upsampling on the second feature map to obtain a segmentation map. According to the image processing method of the embodiments of the present disclosure, the dimension compression and encoding improve processing efficiency and shorten the time needed to discover problems, and the dense upsampling improves the precision of the segmentation map. This facilitates refined identification and management of shorelines, allows the method to be applied to large-scale shoreline monitoring, and reduces labor cost.

Description

Image processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
Ecological civilization construction is an important prerequisite for sustainable, high-quality economic and social development. To raise its level, ecological protection and green development of ecological conservation areas are steadily promoted, surveys and evaluations of natural ecological space are carried out continuously, air quality keeps improving, and water environment quality and overall ecological conditions remain stable, all of which contribute positively to ecological civilization construction.
Monitoring the shorelines of rivers, lakes, and reservoirs is an important part of ecological civilization construction. In terms of monitoring technology, the related art suffers from prominent problems: satellite remote sensing monitoring has long turnaround times and low accuracy in locating problem points, while ground surveys involve high labor cost, small monitoring coverage, strong dependence on professionals, and excessive time consumption. These approaches cannot meet the demand for refined, precise, intelligent, and informatized monitoring of inland river, lake, and reservoir shorelines, nor the upcoming demand for monitoring larger areas. Moreover, shoreline identification for inland rivers, lakes, and reservoirs is currently performed mostly by manual interpretation of image data, so identification precision varies with the judgment standards of individual operators.
Disclosure of Invention
The disclosure provides an image processing method and device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided an image processing method including: performing downsampling processing and encoding processing on an input image to obtain a first feature map; performing dimension compression processing and encoding processing on the first feature map to obtain a second feature map, where the dimension compression processing reduces the number of feature channels of the feature map without changing the information contained in the feature map; and performing dense upsampling processing on the second feature map to obtain segmentation maps of target regions of multiple categories in the input image.
In a possible implementation manner, performing dimension compression processing and encoding processing on the first feature map to obtain a second feature map includes: performing fusion processing on pixel points at corresponding positions of the first feature maps of a plurality of feature channels to obtain a third feature map, where the number of feature channels of the third feature map is less than that of the first feature map; encoding the third feature map to obtain a fourth feature map; before a preset number of iterations is reached, taking the fourth feature map as a new first feature map and re-executing the step of performing fusion processing on pixel points at corresponding positions of the first feature maps of the plurality of feature channels and the subsequent steps; and when the preset number of iterations is reached, taking the obtained fourth feature map as the second feature map.
In a possible implementation manner, performing dense upsampling processing on the second feature map to obtain segmentation maps of target objects of a plurality of categories in the input image includes: reconstructing pixel points at corresponding positions of the second feature maps of the plurality of feature channels to obtain the segmentation map, where the segmentation map is consistent with the input image in size.
In a possible implementation manner, reconstructing pixel points at corresponding positions of the second feature maps of the plurality of feature channels to obtain the segmentation map includes: determining a size relationship between the input image and the second feature map; and reconstructing the pixel points at the corresponding positions of the second feature map according to the size relationship to obtain the segmentation map.
In one possible implementation, the method is implemented by a depth coding network, and the method further includes: inputting a sample image into the depth coding network to obtain a sample segmentation map of the sample image, where the sample image includes a contour label of the contour of a target region in the sample image and a category label of a target object; obtaining a first network loss according to the sample segmentation map and the category label; obtaining a second network loss according to the sample segmentation map and the contour label; obtaining the network loss of the depth coding network according to the first network loss and the second network loss; and training the depth coding network according to the network loss.
In one possible implementation, the method further includes: obtaining contour lines of the target regions of the multiple categories according to the segmentation map; acquiring the type of the target region; acquiring historical data of the contour line, where the historical data includes a historical type of the contour line and a historical contour line; and obtaining a classification result of the contour line according to the historical data, the contour line, and the type of the target region.
In a possible implementation manner, obtaining a classification result of the contour line according to the historical data, the contour line, and the type of the target region includes: determining the type of the contour line according to the type of the target region; and merging the historical contour line and the contour line when they are of the same type, and determining the type of the contour line as the type of the merged contour line.
According to an aspect of the present disclosure, there is provided an image processing apparatus including: an encoding module, configured to perform downsampling processing and encoding processing on an input image to obtain a first feature map; a compression module, configured to perform dimension compression processing and encoding processing on the first feature map to obtain a second feature map, where the dimension compression processing reduces the number of feature channels of the feature map without changing the information contained in the feature map; and an upsampling module, configured to perform dense upsampling processing on the second feature map to obtain segmentation maps of target regions of multiple categories in the input image.
In one possible implementation, the compression module is further configured to: perform fusion processing on pixel points at corresponding positions of the first feature maps of a plurality of feature channels to obtain a third feature map, where the number of feature channels of the third feature map is less than that of the first feature map; encode the third feature map to obtain a fourth feature map; before a preset number of iterations is reached, take the fourth feature map as a new first feature map and re-execute the step of performing fusion processing on pixel points at corresponding positions of the first feature maps of the plurality of feature channels and the subsequent steps; and when the preset number of iterations is reached, take the obtained fourth feature map as the second feature map.
In one possible implementation, the upsampling module is further configured to: reconstruct pixel points at corresponding positions of the second feature maps of the plurality of feature channels to obtain the segmentation map, where the segmentation map is consistent with the input image in size.
In one possible implementation, the upsampling module is further configured to: determine a size relationship between the input image and the second feature map; and reconstruct the pixel points at the corresponding positions of the second feature map according to the size relationship to obtain the segmentation map.
In one possible implementation, the method is implemented by a depth coding network, and the apparatus further includes: a training module, configured to input a sample image into the depth coding network to obtain a sample segmentation map of the sample image, where the sample image includes a contour label of the contour of a target region in the sample image and a category label of a target object; obtain a first network loss according to the sample segmentation map and the category label; obtain a second network loss according to the sample segmentation map and the contour label; obtain the network loss of the depth coding network according to the first network loss and the second network loss; and train the depth coding network according to the network loss.
In one possible implementation, the apparatus further includes: a result obtaining module, configured to obtain contour lines of the target regions of the multiple categories according to the segmentation map; acquire the type of the target region; acquire historical data of the contour line, where the historical data includes a historical type of the contour line and a historical contour line; and obtain the classification result of the contour line according to the historical data, the contour line, and the type of the target region.
In a possible implementation manner, the result obtaining module is further configured to: determine the type of the contour line according to the type of the target region; and merge the historical contour line and the contour line when they are of the same type, and determine the type of the contour line as the type of the merged contour line.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
According to the image processing method of the embodiments of the present disclosure, the dimension compression and encoding improve processing efficiency and shorten the time needed to discover problems, and the dense upsampling improves the precision of the segmentation map. This facilitates refined identification and management of shorelines, allows the method to be applied to large-scale shoreline monitoring, and reduces labor cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of a dense upsampling process according to an embodiment of the present disclosure;
FIG. 3 shows an application diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 4 shows an application diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure, as shown in fig. 1, the method comprising:
in step S11, a downsampling process and an encoding process are performed on the input image to obtain a first feature map;
in step S12, performing dimension compression processing and encoding processing on the first feature map to obtain a second feature map, where the dimension compression processing reduces the number of feature channels of the feature map without changing the information contained in the feature map;
in step S13, the second feature map is subjected to dense upsampling processing, and segmentation maps of target regions of a plurality of classes in the input image are obtained.
According to the image processing method of the embodiments of the present disclosure, the dimension compression and encoding improve processing efficiency and shorten the time needed to discover problems, and the dense upsampling improves the precision of the segmentation map. This facilitates refined identification and management of shorelines, allows the method to be applied to large-scale shoreline monitoring, and reduces labor cost.
Wherein the pixel value of a pixel in the segmentation map may represent the probability that the pixel belongs to the target region.
In one possible implementation, during shoreline monitoring and management, shorelines can be classified according to factors such as their natural attributes and use, for example into natural shorelines, artificial shorelines, and living shorelines, where natural shorelines are further divided into original shorelines, near-natural shorelines, and the like, and artificial shorelines are divided into production shorelines, protection shorelines, and the like.
In an example, the original shoreline among natural shorelines is one that basically retains its natural form and whose dynamic balance is not significantly affected by artificial structures; it mostly exists in areas at some distance from human settlements that human activities rarely or never reach. The near-natural shoreline among natural shorelines has natural morphological characteristics and ecological functions after renovation and restoration, and mostly exists in areas close to human settlements that human activities easily reach. The living shoreline is generally open to the public and mostly found in parks, scenic spots, and similar areas. The production shoreline among artificial shorelines serves normal traffic, shipbuilding, water intake, water drainage, and other production needs; it mainly comprises facilities such as wharves, pontoons, ship berths, bridges, elevated roads, pump stations, and drainage gates. The protection shoreline among artificial shorelines consists of permanent structures such as breakwaters and revetments. Shorelines can be divided according to such classification standards, and the shoreline of each class can be identified.
In a possible implementation manner, the input image is an image of a certain area, for example an aerial image or a satellite image, and the image may include various regions, for example river regions, lake regions, sea regions, land regions, and the like. These regions may contain various kinds of shorelines, and the shorelines can be finely identified.
In a possible implementation manner, the aerial image or satellite image may be subjected to image processing. For example, when the aerial image is a video frame in an aerial video, the aerial video may be parsed and clipped to obtain video frames, and the video frames may be preprocessed, for example normalized, to obtain the input image.
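In an example, such preprocessing may be sketched as follows, assuming OpenCV is available; the sampling stride, frame size, and [0, 1] normalization are illustrative assumptions rather than values fixed by the present disclosure:

```python
import cv2
import numpy as np

def frames_from_aerial_video(path, stride=30, size=(512, 512)):
    """Parse an aerial video, keep every `stride`-th frame, and normalize it."""
    cap = cv2.VideoCapture(path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            frame = cv2.resize(frame, size)
            frames.append(frame.astype(np.float32) / 255.0)  # normalize to [0, 1]
        idx += 1
    cap.release()
    return frames
```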
In one possible implementation, the input image may be processed, in an example, through a Deep Encoding Network (DE-Net), which may be a deep-learning neural network capable of pixel-level building segmentation in high-resolution imagery.
In one possible implementation, the depth coding network may include a plurality of network levels, such as a down-sampling level, an encoding level, a dimension compression level, and a dense up-sampling level, which may be respectively used for down-sampling, encoding, dimension compression, and dense up-sampling. The present disclosure does not limit the specific network structure of the depth coding network.
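In an example, the overall hierarchy may be sketched in PyTorch as follows; the channel widths, the number of residual blocks, the downsampling factor, and the class count are illustrative assumptions, since the present disclosure does not fix a concrete architecture:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Convolution + activation with a skip connection (an encoding-level unit)."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.act = nn.SELU()  # see the activation discussion below

    def forward(self, x):
        return self.act(x + self.conv2(self.act(self.conv1(x))))

class DENet(nn.Module):
    def __init__(self, in_ch=3, num_classes=2, width=64, scale=8):
        super().__init__()
        # down-sampling level: reduce resolution while widening channels
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.SELU(),
            nn.Conv2d(width, 2 * width, 3, stride=2, padding=1), nn.SELU(),
            nn.Conv2d(2 * width, 4 * width, 3, stride=2, padding=1), nn.SELU(),
        )
        # encoding level: residual blocks (the text mentions e.g. 6)
        self.encode = nn.Sequential(*[ResidualBlock(4 * width) for _ in range(6)])
        # dimension compression level: fuse channels down, then re-encode
        self.compress = nn.Sequential(
            nn.Conv2d(4 * width, num_classes * scale ** 2, 1), nn.ReLU(),
            ResidualBlock(num_classes * scale ** 2),
        )
        # dense up-sampling level: rearrange C * scale^2 channels into pixels
        self.dense_up = nn.PixelShuffle(scale)

    def forward(self, x):
        x = self.down(x)         # first feature map
        x = self.encode(x)
        x = self.compress(x)     # second feature map
        return self.dense_up(x)  # segmentation map, same H x W as the input
```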
In one possible implementation, in step S11, the input image may be downsampled and encoded by the depth coding network. In the related art, a neural network model usually downsamples by reducing the image resolution while adding feature channels, so as to enlarge the receptive field and obtain abstract features; for example, downsampling may be performed by pooling. However, downsampling in the related art usually performs simple averaging or maximum-taking on the kernel information, deriving one pixel value from the values of several pixels; this achieves downsampling and receptive-field enlargement, but the resulting feature maps carry impoverished features.
In one possible implementation manner, in step S11, the downsampling process and the encoding process may alternate: the features obtained by downsampling are fused together and input to the encoding level for encoding, for example convolution and activation, to obtain richer feature information. The downsampling level can comprise network layers such as convolution layers, pooling layers, and activation layers, and can downsample the input image to obtain feature maps of a plurality of channels.
In an example, the encoding level may include a plurality of residual blocks (e.g., 6), each of which may include a convolution layer and an activation layer. To increase the degree to which feature information is retained as it passes through the neural network, the activation layer may select an appropriate activation function. The RELU activation function commonly used in the related art behaves as follows: when the input value is greater than 0, the activation value is a linear function of the input value; when the input value is less than or equal to 0, the activation value is 0. This is simple and efficient to compute but tends to lose part of the information, namely whenever the input value is less than or equal to 0. Therefore, the activation layer may use other activation functions, such as ELU, PReLU, and SELU.
In an example, the ELU activation function is shown in equation (1) below:
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha(e^{x} - 1), & x \le 0 \end{cases} \quad (1)$$
in an example, the PReLU activation function is shown in the following equation (2):
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases} \quad (2)$$
in an example, the SELU activation function is shown in equation (3) below:
$$f(x) = \lambda \begin{cases} x, & x > 0 \\ \alpha(e^{x} - 1), & x \le 0 \end{cases} \quad (3)$$
where $\lambda$ and $\alpha$ are preset parameters and $x$ is the input value; in the example, $\alpha$ = 1.673 and $\lambda$ = 1.051. The present disclosure does not limit the specific values of the preset parameters.
In one possible implementation, as described above, when the input value is less than or equal to 0, the three activation functions can still retain the input information, and information loss can be prevented.
In an example, the SELU activation function may be selected as the activation function of the activation layer, and the present disclosure does not limit the activation function used by the activation layer.
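In an example, the behavior of the three activation functions for non-positive inputs can be checked numerically as follows; the ELU and PReLU parameters used here (α = 1.0 and α = 0.25) are common defaults assumed for illustration, while the SELU parameters are the ones quoted above:

```python
import math

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def prelu(x, alpha=0.25):
    return x if x > 0 else alpha * x

def selu(x, alpha=1.673, lam=1.051):
    return lam * x if x > 0 else lam * alpha * (math.exp(x) - 1.0)

# Unlike RELU, all three return nonzero values for x <= 0,
# so input information is retained rather than zeroed out:
for f in (elu, prelu, selu):
    print(f.__name__, f(-1.0), f(1.0))
```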
In one possible implementation, through the alternate iterative processing of a plurality of downsampling levels and encoding levels, the first feature map with rich feature information can be obtained.
In one possible implementation manner, in step S12, dimension compression processing and encoding processing may be performed on the first feature map. Since the downsampling and encoding levels increase the number of feature channels, i.e., the dimension of the feature map, while enlarging the receptive field, the first feature map has a wide receptive field and a high dimension after the multiple alternate iterations of downsampling and encoding described above. However, a feature map of higher dimension is less efficient to compute and demands more processing resources, so dimension compression may be performed on the first feature map to obtain the second feature map without losing information.
In one possible implementation, step S12 may include: performing fusion processing on pixel points at corresponding positions of the first feature maps of a plurality of feature channels to obtain a third feature map, where the number of feature channels of the third feature map is less than that of the first feature map; encoding the third feature map to obtain a fourth feature map; before a preset number of iterations is reached, taking the fourth feature map as a new first feature map and re-executing the step of performing fusion processing on pixel points at corresponding positions of the first feature maps of the plurality of feature channels and the subsequent steps; and when the preset number of iterations is reached, taking the obtained fourth feature map as the second feature map.
In one possible implementation, like the downsampling and encoding processes described above, the dimension compression and encoding processes may be performed iteratively in alternation. The dimension compression may be performed by a dimension compression level that fuses the first feature maps of the plurality of feature channels; for example, the pixel points at corresponding positions in the plurality of first feature maps may be merged into fewer pixel points or a single pixel point. In an example, the pixel values of the pixel points at corresponding positions in the plurality of first feature maps may all be stored in information attached to one or more pixel points (fewer than the number of first feature maps), so that the plurality of first feature maps are fused into one third feature map, in which each pixel point carries the information of the corresponding pixel points of the plurality of first feature maps. In an example, the dimension compression level may also include an activation layer, and the compressed third feature map may be activated using an activation function such as RELU to obtain the activated third feature map.
In one possible implementation, the third feature map may be subjected to an encoding process, for example, the fourth feature map may be obtained by performing an encoding process through an encoding hierarchy. The processing procedure of the encoding process is as described above, and is not described herein again.
In one possible implementation, as described above, the dimension compression process and the encoding process may be alternately performed iteratively; therefore, dimension compression and encoding may be performed again on the output fourth feature map, that is, the fourth feature map is processed as a new first feature map, the iteration is repeated a plurality of times, and when the iteration count reaches the preset number, the second feature map is obtained.
By the method, the dimensionality of the feature map can be reduced and the processing efficiency can be improved by dimension compression processing under the condition of not losing feature information.
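In an example, the alternating compression and encoding iteration may be sketched as follows, assuming PyTorch; here a 1 × 1 convolution stands in for the pixel-wise fusion across channels (an implementation choice, since the present disclosure describes the fusion abstractly), and the iteration count and shrink factor are illustrative:

```python
import torch.nn as nn

def compress_encode(in_ch, num_iters=3, shrink=2):
    """Alternate channel fusion (dimension compression) and encoding."""
    layers, ch = [], in_ch
    for _ in range(num_iters):
        out_ch = ch // shrink
        # fusion: merge corresponding pixels of many channels into fewer channels
        layers += [nn.Conv2d(ch, out_ch, kernel_size=1), nn.ReLU()]
        # encoding: convolution + activation on the fused (third) feature map
        layers += [nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.SELU()]
        ch = out_ch
    return nn.Sequential(*layers)  # its output is the second feature map
```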
In one possible implementation, the second feature map obtained through the above processing has abundant feature information, the number of feature channels is greater than 1, and the size of the second feature map is smaller than that of the input image. The second feature map may be subjected to an upsampling process to obtain segmentation maps of target regions of a plurality of classes in the input image.
In one possible implementation, in the related art, the upsampling method may include inverse pooling, deconvolution, interpolation, or the like, so as to increase the size of the feature map by supplementing the pixel points. However, in the related art, the obtained feature information may not be retained in the upsampling process, for example, the upsampling process may change or lose information of a part of pixel points, so that a part of features of an image obtained by upsampling is lost, thereby causing problems of accuracy reduction and the like.
In one possible implementation, in order to address the above problem, in step S13, a segmentation map that is consistent with the resolution of the input image may be obtained through a dense upsampling process. The dense upsampling process may be performed by a dense upsampling hierarchy, in which the processing is performed without using the above-described methods such as inverse pooling, deconvolution, interpolation, and the like, but the pixel points in each feature map are reconstructed, so that information is not lost. Step S13 may include: and reconstructing pixel points at corresponding positions of second feature maps of the plurality of feature channels to obtain the segmentation map, wherein the segmentation map is consistent with the input image in size.
In a possible implementation manner, in the reconstruction process, the pixel points at corresponding positions of the second feature maps (i.e., the pixel points at the same position across feature channels) can be used to form a plurality of pixel points at the corresponding position in the segmentation map. That is, since the resolution of the second feature map is smaller than that of the segmentation map, the position of one pixel point in the second feature map corresponds to the positions of a plurality of pixel points in the segmentation map, so the pixel points at corresponding positions across the plurality of second feature maps can be reconstructed into those pixel points of the segmentation map. This achieves the effect of upsampling without losing feature information.
In a possible implementation manner, reconstructing a pixel point at a corresponding position of a second feature map of a plurality of feature channels to obtain the segmentation map includes: determining a size relationship between the input image and the second feature map; and reconstructing the pixel points at the corresponding positions of the second characteristic diagram according to the size relation to obtain the segmentation diagram.
Fig. 2 shows a schematic diagram of a dense upsampling process according to an embodiment of the present disclosure. As shown in Fig. 2, the input image (and the segmentation map) has size H × W and a single channel; after the above processing, each second feature map has size (H/C) × (W/C), and accordingly there are C² second feature maps. The size relationship between the input image and the second feature map is therefore that the input image is C² times the size of one second feature map.
In a possible implementation manner, this size relationship may be used to reconstruct the pixel points at the corresponding positions of the second feature maps. For example, the pixel points at a corresponding position of the plurality of second feature maps may all be arranged at the corresponding position of the segmentation map, i.e., all C² pixel points of the second feature maps are retained at the corresponding position in the segmentation map, forming a segmentation map with complete feature information.
In an example, the feature channels of the second feature map may have an order in which the plurality of pixel points of the corresponding positions may be arranged at the corresponding positions in the segmentation map. Alternatively, the pixel points of the corresponding positions of the plurality of second feature maps may be randomly arranged at the corresponding positions in the segmentation map. The present disclosure does not limit the way in which pixel points are reconstructed.
In this way, the segmentation graph can be obtained by reconstructing the pixel points of the second feature graph, so that the loss of feature information is reduced, the accuracy of the segmentation graph is improved, and the pixel-level identification and segmentation are realized.
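In an example, this reconstruction can be realized with PyTorch's PixelShuffle, which performs exactly this rearrangement of C² channels into C × C pixel blocks; the sizes below are illustrative:

```python
import torch
import torch.nn as nn

C = 4  # downsampling factor: C**2 = 16 feature channels at (H/C) x (W/C)
x = torch.arange(C * C * 2 * 2, dtype=torch.float32).reshape(1, C * C, 2, 2)
up = nn.PixelShuffle(C)
y = up(x)
print(tuple(x.shape), "->", tuple(y.shape))  # (1, 16, 2, 2) -> (1, 1, 8, 8)
# Every value of x reappears in y: the pixels are rearranged, not interpolated.
assert torch.equal(x.flatten().sort().values, y.flatten().sort().values)
```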
In a possible implementation manner, the above depth coding network may be trained before being used, for example by using a plurality of sample images with annotation information, and the method further includes: inputting a sample image into the depth coding network to obtain a sample segmentation map of the sample image, where the sample image includes a contour label of the contour of a target region in the sample image and a category label of a target object; obtaining a first network loss according to the sample segmentation map and the category label; obtaining a second network loss according to the sample segmentation map and the contour label; obtaining the network loss of the depth coding network according to the first network loss and the second network loss; and training the depth coding network according to the network loss.
In a possible implementation manner, the sample image is similar to the input image and may be an aerial image or a satellite image of a certain area; the present disclosure does not limit the sample image. The sample image may be input into the depth coding network to obtain a sample segmentation map (the segmentation map produced by the depth coding network during training may contain errors).
In one possible implementation, the loss function quantifies the error between the prediction obtained by the depth coding network and the annotation information, and based on the gradients of this error with respect to the network's neurons, the network parameters can be adjusted in a direction that reduces the error. Different loss functions have different emphases, and even if the depth coding network is given the same sample images and the same annotation information, different loss functions can produce different learning results. Selecting the loss function is therefore very important for training the depth coding network. The identification and segmentation of objects in images can be regarded as a two-class segmentation problem, and the binary cross-entropy function is the most common loss function for this type of task.
In one possible implementation, the cross entropy function may be used as the first network loss, and the sample segmentation map and the class label may be used to obtain the first network loss, which may be determined by the following equation (4):
$$\mathrm{CE} = -\frac{1}{N}\sum_{n=1}^{N}\left[y'_n \log y_n + (1 - y'_n)\log(1 - y_n)\right] \quad (4)$$
where $N$ is the number of pixel points, $y'_n$ is the annotation information, i.e., the true category, of the n-th pixel point, $y_n$ is the predicted class probability of the n-th pixel point, i.e., the class probability output by the depth coding network, and $\mathrm{CE}$ is the cross-entropy loss.
In one possible implementation, although the form of the cross-entropy function makes gradients easy to compute and the depth coding network easy to optimize, building pixels and background (non-building) pixels in the sample image are severely imbalanced, and the cross-entropy function may emphasize the high-proportion class, e.g., non-building pixels, making building pixels difficult to identify. Therefore, a loss function that is not easily biased by sample imbalance may be selected, for example the Dice loss, which measures the size of the overlapping region between the output of the depth coding network and the annotation information and can be expressed via the Dice similarity coefficient, as shown in the following formula (5):
$$\mathrm{dice} = 1 - \frac{2\sum_{n=1}^{N} p_n t_n}{\sum_{n=1}^{N} p_n + \sum_{n=1}^{N} t_n} \quad (5)$$
where $p_n$ is the predicted category of the n-th pixel point, i.e., the category output by the depth coding network, and $t_n$ is the annotation information, i.e., the true category, of the n-th pixel point. The Dice loss $\mathrm{dice}$ above may be determined as the second network loss.
In one possible implementation, the network loss of the depth coding network may be obtained based on the first network loss and the second network loss; for example, the Dice loss and the cross-entropy loss may be summed to obtain the Dice cross-entropy loss as the network loss, as shown in the following formula (6):
$$\mathrm{DCE} = \mathrm{dice} + \mathrm{CE} \quad (6)$$
wherein DCE is the network loss.
In a possible implementation manner, back propagation may be performed based on the network loss to adjust parameters of the depth coding network, and training is completed when the network loss converges, so as to obtain a trained depth coding network, which is used for performing segmentation on the input image to obtain a segmentation map.
By the method, the depth coding network can be trained through the first network loss and the second network loss, the precision of the depth coding network is improved, and pixel-level segmentation can be achieved.
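In an example, the network loss of equations (4) to (6) may be sketched as follows, assuming PyTorch and per-pixel probabilities from a sigmoid output; the smoothing term `eps` is a common numerical-stability addition and not part of the formulas above:

```python
import torch

def dce_loss(pred, target, eps=1e-6):
    """Dice cross-entropy loss: pred and target are per-pixel tensors in [0, 1]."""
    pred, target = pred.flatten(), target.flatten()
    # equation (4): binary cross entropy averaged over N pixel points
    ce = -(target * torch.log(pred + eps)
           + (1 - target) * torch.log(1 - pred + eps)).mean()
    # equation (5): Dice loss, one minus the overlap coefficient
    dice = 1 - (2 * (pred * target).sum() + eps) / (pred.sum() + target.sum() + eps)
    # equation (6): DCE = dice + CE
    return dice + ce
```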
In one possible implementation, as described above, the depth coding network may process an input image such as an aerial image or a satellite image to segment the regions where various types of target objects are located and obtain the contour of each target object, and thereby obtain the position of the shoreline. For example, the dividing line between a land region and a river region, i.e., the contour line between the land region and the river region, may be determined as the shoreline, and the type of the shoreline may be determined based on the type of the target object in the land region.
In one possible implementation, the method further includes: obtaining contour lines of the target regions of the multiple categories according to the segmentation map; acquiring the type of the target region; acquiring historical data of the contour line, where the historical data includes a historical type of the contour line and a historical contour line; and obtaining the classification result of the contour line according to the historical data, the contour line, and the type of the target region.
In one possible implementation, the segmentation map may include the contour lines of the target regions of multiple categories, and as described above, the contour lines may include a shoreline. Further, the type of the target region may be obtained; for example, the target region may be a river region, lake region, dam region, park region, wharf region, forest region, etc., which the present disclosure does not limit. Historical data may also be obtained, including the historical type of the contour line and the historical contour line.
In one possible implementation, the type of the contour line may be determined based on the type of the target region and the historical data, for example, to determine the type of a shoreline. Obtaining the classification result of the contour line according to the historical data, the contour line, and the type of the target region includes: determining the type of the contour line according to the type of the target region; and merging the historical contour line and the contour line when they are of the same type, and determining the type of the contour line as the type of the merged contour line.
In an example, the type of contour line may be determined based on the type of the target region, e.g., if a certain section of contour line is a division line of a wharf region and a river region, then the section of contour line is a shoreline, and is a production-type shoreline in the artificial shoreline type. The present disclosure is not limited as to the type of shoreline.
In one possible implementation, the historical contour line and historical type in the historical data may further be obtained. In an example, the historical contour line records where the contour line previously lay, for example the dividing line between a wharf region and a river region, and the contour line may change over time; for example, a shoreline may extend as the wharf is built out. Therefore, if the historical type of the contour line is the same as the type determined this time, the historical contour line and the current contour line can be merged, for example by taking their union or in any other merging manner, to obtain a new contour line whose type is the type determined this time.
In a possible implementation manner, if the history type of the contour line is different from the type determined this time, there may be a case where a misjudgment occurs or the target area is largely changed. In this case, the type of contour line may be determined by manual review. The present disclosure is not so limited.
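In an example, the merging rule may be sketched as follows, assuming shapely geometries for the contour lines; the type mapping and field names are hypothetical placeholders for whatever classification standard is in use:

```python
from shapely.ops import unary_union

# hypothetical mapping from target-region type to shoreline type
SHORELINE_TYPE = {"wharf": "production", "park": "living", "forest": "natural"}

def classify_contour(contour, region_type, history):
    """history: dict with the 'type' and 'line' of the previous survey."""
    new_type = SHORELINE_TYPE.get(region_type, "unknown")
    if history["type"] == new_type:
        # same type: merge by taking the union of the two contour lines
        return new_type, unary_union([history["line"], contour])
    # type changed between surveys: flag the contour for manual review
    return "needs_manual_review", contour
```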
According to the image processing method and apparatus of the present disclosure, dimension compression reduces the dimensionality of the feature map without losing feature information, improving processing efficiency and shortening the time needed to discover problems; the segmentation map is obtained by reconstructing the pixel points of the second feature map, which reduces the loss of feature information, improves the accuracy of the segmentation map, and achieves pixel-level identification and segmentation. During training, the depth coding network can be trained through the first network loss and the second network loss, improving its precision and enabling pixel-level segmentation. The method facilitates refined identification and management of shorelines, can be applied to large-scale shoreline monitoring, and reduces labor cost.
Fig. 3 shows an application diagram of an image processing method according to an embodiment of the present disclosure. As shown in Fig. 3, the input image may be an aerial image, a satellite image, or the like, containing various regions. The depth coding network can obtain the first feature map through multiple iterations of the downsampling level and the encoding level, and obtain the second feature map through multiple iterations of the dimension compression level and the encoding level, thereby reducing dimensionality and improving computational efficiency without losing feature information.
In a possible implementation manner, the second feature map can be processed through a dense upsampling level, that is, pixel points at corresponding positions in the second feature map of multiple channels are reconstructed to obtain a segmentation map, so that the segmentation map with the size consistent with that of the input image is obtained under the condition of not losing feature information, pixel-level segmentation of each region in the input image is realized, and segmentation precision is improved.
Fig. 4 shows an application diagram of an image processing method according to an embodiment of the present disclosure. As shown in Fig. 4, image data of 2021 can be obtained, along with historical data, namely the 2020 water-surface vector data and the 2020 shoreline vector data (i.e., the water region and shoreline identified from the segmentation map of the same region in 2020, including the position and type of the shoreline, i.e., the contour line of the land region).
In one possible implementation, the obtained data may be processed as follows: the 2020 water-surface vector data may be converted into a line vector with a polygon-to-line tool, giving the 2020 water sideline vector (i.e., the outline of the water body region), and a buffer tool may be used to buffer it by 50 m to obtain buffer vector data. Further, the 2021 image data may be cropped based on the buffer vector data obtained above, that is, the position of the bank area to be processed is determined from the buffer vector data, and the 2021 image data is cropped accordingly, producing a 2021 cropped image so that the 2021 data can be compared with the historical data.
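In an example, these data-preparation steps may be sketched with geopandas and rasterio as follows; the file names are placeholders, and the 50 m buffer assumes a projected coordinate system in metres:

```python
import geopandas as gpd
import rasterio
from rasterio.mask import mask

water_2020 = gpd.read_file("water_2020.shp")  # 2020 water-surface vectors
water_lines = water_2020.boundary             # polygon-to-line: water sideline
buffer_zone = water_lines.buffer(50)          # 50 m buffer around the sideline

with rasterio.open("image_2021.tif") as src:  # 2021 image data
    clipped, transform = mask(src, buffer_zone, crop=True)  # cropped image
```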
In one possible implementation, the 2021 year cropped image may be used as the input image, and may be processed by the depth coding network to obtain the segmentation map. Based on the segmentation map, the position of the water sideline in 2021 year and the position of the bank sideline can be determined, and based on the types of the land area and the water area, the types of the water sideline and the bank sideline are determined, and then compared with the data of the water sideline and the bank sideline in 2020 year.
In one possible implementation, if the type of the water sideline in 2021 is the same as that in 2020, the two may be merged (for example, merging may be performed where the sideline has changed); likewise, if the type of the bank sideline in 2021 is the same as that in 2020, the two may be merged. The merged water sideline and bank sideline then constitute the 2021 shoreline. If the type of the shoreline has changed between the two years, the area may have changed greatly or the segmentation may be wrong, so manual review may be performed to obtain the final shoreline result, i.e., the category and position of the shoreline.
Fig. 5 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 5, the apparatus includes: an encoding module 11, configured to perform downsampling processing and encoding processing on an input image to obtain a first feature map; a compression module 12, configured to perform dimension compression processing and encoding processing on the first feature map to obtain a second feature map, where the dimension compression processing reduces the number of feature channels of the feature map without changing the information contained in the feature map; and an upsampling module 13, configured to perform dense upsampling processing on the second feature map to obtain segmentation maps of target regions of multiple categories in the input image.
In one possible implementation, the compression module is further configured to: perform fusion processing on pixel points at corresponding positions of the first feature maps of a plurality of feature channels to obtain a third feature map, where the number of feature channels of the third feature map is less than that of the first feature map; encode the third feature map to obtain a fourth feature map; before a preset number of iterations is reached, take the fourth feature map as a new first feature map and re-execute the step of performing fusion processing on pixel points at corresponding positions of the first feature maps of the plurality of feature channels and the subsequent steps; and when the preset number of iterations is reached, take the obtained fourth feature map as the second feature map.
In one possible implementation, the upsampling module is further configured to: reconstruct pixel points at corresponding positions of the second feature maps of the plurality of feature channels to obtain the segmentation map, where the segmentation map is consistent with the input image in size.
In one possible implementation, the upsampling module is further configured to: determine a size relationship between the input image and the second feature map; and reconstruct the pixel points at the corresponding positions of the second feature map according to the size relationship to obtain the segmentation map.
In one possible implementation, the method is implemented by a depth coding network, and the apparatus further includes: a training module, configured to input a sample image into the depth coding network to obtain a sample segmentation map of the sample image, where the sample image includes a contour label of the contour of a target region in the sample image and a category label of a target object; obtain a first network loss according to the sample segmentation map and the category label; obtain a second network loss according to the sample segmentation map and the contour label; obtain the network loss of the depth coding network according to the first network loss and the second network loss; and train the depth coding network according to the network loss.
In one possible implementation, the apparatus further includes: a result obtaining module, configured to obtain contour lines of the target regions of the multiple categories according to the segmentation map; acquire the type of the target region; acquire historical data of the contour line, where the historical data includes a historical type of the contour line and a historical contour line; and obtain the classification result of the contour line according to the historical data, the contour line, and the type of the target region.
In a possible implementation manner, the result obtaining module is further configured to: determine the type of the contour line according to the type of the target region; and merge the historical contour line and the contour line when they are of the same type, and determine the type of the contour line as the type of the merged contour line.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the principles and logic; due to space limitations, details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order in which the steps are executed should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides an image processing apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the image processing methods provided by the present disclosure; for the corresponding technical solutions, refer to the descriptions in the methods section, which are omitted here for brevity.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
Embodiments of the present disclosure also provide a computer program product, which includes computer readable code, when the computer readable code runs on a device, a processor in the device executes instructions for implementing the image processing method provided in any one of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the image processing method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like terminal.
Referring to fig. 6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing state assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 7, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method.
The electronic device 1900 may further include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, it is embodied in a software product, such as a Software Development Kit (SDK) or the like.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. An image processing method, comprising:
carrying out downsampling processing and coding processing on an input image to obtain a first feature map;
performing dimension compression processing and encoding processing on the first feature map to obtain a second feature map, wherein the dimension compression processing comprises processing that reduces the number of feature channels of the feature map while keeping the information contained in the feature map unchanged;
and carrying out dense upsampling processing on the second feature map to obtain segmentation maps of target areas of multiple categories in the input image.
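By way of illustration, the pipeline of claim 1 can be sketched in PyTorch-style Python; the module names, channel counts, and downsampling factor below are assumptions for the sketch, not taken from the disclosure:

import torch
import torch.nn as nn

class EncodeSegNet(nn.Module):
    """Hypothetical pipeline: downsample/encode -> compress/encode -> dense upsample."""
    def __init__(self, in_ch=3, num_classes=5, r=4):
        super().__init__()
        # Downsampling and encoding: strided convolutions halve the resolution
        # twice (total factor r=4) while widening the feature channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        # Dimension compression and encoding: a 1x1 convolution fuses pixel
        # points at corresponding positions across channels, reducing the
        # channel count without changing the spatial resolution.
        self.compress = nn.Sequential(
            nn.Conv2d(256, 64, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
        # Dense upsampling: predict num_classes*r*r channels at low resolution
        # and rearrange them into a segmentation map at the input resolution.
        self.duc = nn.Sequential(
            nn.Conv2d(64, num_classes * r * r, 1), nn.PixelShuffle(r))

    def forward(self, x):
        first = self.encoder(x)        # first feature map
        second = self.compress(first)  # second feature map
        return self.duc(second)        # segmentation map (same size as input)

logits = EncodeSegNet()(torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 5, 256, 256])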
2. The method according to claim 1, wherein performing dimension compression processing and encoding processing on the first feature map to obtain a second feature map comprises:
performing fusion processing on pixel points at corresponding positions of the first feature map across a plurality of feature channels to obtain a third feature map, wherein the number of feature channels of the third feature map is less than the number of feature channels of the first feature map;
coding the third feature map to obtain a fourth feature map;
before a preset number of iterations is reached, taking the fourth feature map as a new first feature map, and re-executing the step of performing fusion processing on pixel points at corresponding positions of the first feature maps of the plurality of feature channels and the subsequent steps;
and when the preset number of iterations is reached, taking the obtained fourth feature map as the second feature map.
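A short sketch of the iteration in claim 2, under the assumption that the fusion is a 1x1 convolution across feature channels and the encoding is a small convolutional block; the disclosure does not fix these operators or the channel widths:

import torch
import torch.nn as nn

num_iters, ch = 2, 256  # preset iteration count and channel width (assumed)
# Fusion: 1x1 convolutions fuse pixel points at corresponding positions of the
# feature channels, so the third feature map has fewer channels than the first.
fuse_blocks = nn.ModuleList([nn.Conv2d(ch, ch // 2, 1) for _ in range(num_iters)])
# Encoding: small convolutional blocks producing the fourth feature map.
encode_blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(ch // 2, ch, 3, padding=1), nn.ReLU())
    for _ in range(num_iters)])

x = torch.randn(1, ch, 64, 64)  # first feature map
for i in range(num_iters):
    third = fuse_blocks[i](x)         # third feature map (fewer channels)
    fourth = encode_blocks[i](third)  # fourth feature map
    x = fourth                        # re-used as the new first feature map
second = x  # at the preset iteration count, the fourth map becomes the second feature map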
3. The method according to claim 1, wherein performing dense upsampling processing on the second feature map to obtain segmentation maps of target areas of multiple categories in the input image comprises:
and reconstructing pixel points at corresponding positions of second feature maps of the plurality of feature channels to obtain the segmentation map, wherein the segmentation map is consistent with the input image in size.
4. The method according to claim 3, wherein reconstructing pixels at corresponding positions of a second feature map of a plurality of feature channels to obtain the segmentation map comprises:
determining a size relationship between the input image and the second feature map;
and reconstructing the pixel points at corresponding positions of the second feature map according to the size relationship to obtain the segmentation map.
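Read as a depth-to-space rearrangement, the reconstruction of claims 3 and 4 can be sketched as follows; the function name and tensor shapes are illustrative assumptions, with the scale factor derived from the size relationship between the input image and the second feature map:

import torch

def dense_upsample(second, input_hw, num_classes):
    # second: (n, num_classes*r*r, h, w); the segmentation map has the input size.
    n, c, h, w = second.shape
    r = input_hw[0] // h  # size relationship between input image and feature map
    assert input_hw == (h * r, w * r) and c == num_classes * r * r
    x = second.view(n, num_classes, r, r, h, w)
    x = x.permute(0, 1, 4, 2, 5, 3)  # (n, num_classes, h, r, w, r)
    return x.reshape(n, num_classes, h * r, w * r)

seg = dense_upsample(torch.randn(1, 5 * 16, 64, 64), (256, 256), num_classes=5)
print(seg.shape)  # torch.Size([1, 5, 256, 256])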
5. The method according to claim 1, wherein the method is implemented by a depth coding network, and the method further comprises:
inputting a sample image into the depth coding network to obtain a sample segmentation map of the sample image, wherein the sample image comprises a contour label for the contour of a target area in the sample image and a category label for a target object;
obtaining a first network loss according to the sample segmentation map and the category label;
obtaining a second network loss according to the sample segmentation map and the contour label;
obtaining a network loss of the depth coding network according to the first network loss and the second network loss;
and training the depth coding network according to the network loss.
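One plausible reading of the objective in claim 5, sketched below: the category term is a per-pixel cross-entropy, and the contour term compares spatial gradients of the predicted foreground with the contour label. Both concrete choices are assumptions, as the disclosure does not specify the loss functions:

import torch
import torch.nn.functional as F

def network_loss(sample_seg, category_label, contour_label):
    # First network loss: per-pixel classification against the category label.
    first_loss = F.cross_entropy(sample_seg, category_label)
    # Second network loss: soft contours derived from the predicted foreground
    # probability are compared with the contour label.
    fg = sample_seg.softmax(dim=1)[:, 1:].sum(dim=1, keepdim=True)
    dy = F.pad((fg[:, :, 1:] - fg[:, :, :-1]).abs(), (0, 0, 0, 1))
    dx = F.pad((fg[..., 1:] - fg[..., :-1]).abs(), (0, 1))
    pred_contour = (dx + dy).clamp(0.0, 1.0)
    second_loss = F.binary_cross_entropy(pred_contour, contour_label)
    # Network loss of the depth coding network, used for training.
    return first_loss + second_loss

loss = network_loss(torch.randn(1, 5, 64, 64),
                    torch.randint(0, 5, (1, 64, 64)),
                    torch.rand(1, 1, 64, 64))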
6. The method of claim 1, further comprising:
obtaining contour lines of the target areas of the multiple categories according to the segmentation map;
acquiring the type of the target area;
acquiring historical data of the contour line, wherein the historical data comprises a historical type of the contour line and a historical contour line;
and obtaining a classification result of the contour line according to the historical data, the contour line, and the type of the target area.
7. The method according to claim 6, wherein obtaining the classification result of the contour line according to the historical data, the contour line, and the type of the target area comprises:
determining the type of the contour line according to the type of the target area;
and merging the historical contour lines of the same type with the contour line, and determining the type of the contour line as the type of the merged contour line.
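A hypothetical post-processing sketch for claims 6 and 7, using OpenCV to extract the contour lines of one target-area category and a deliberately simple same-type merge against the historical data; neither the library nor the merging rule is mandated by the disclosure:

import cv2
import numpy as np

def extract_contour_lines(seg_map, class_id):
    # Contour lines of one target-area category from an argmax segmentation map.
    mask = (seg_map == class_id).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return contours

def classify_contour_line(contour, area_type, history):
    # Claim 7: the contour line takes the type of its target area; historical
    # contour lines of the same type are merged with the new one, and the
    # merged result keeps that type. history: list of (type, contour) pairs.
    merged = [c for t, c in history if t == area_type] + [contour]
    return area_type, merged

seg_map = np.zeros((256, 256), np.int32)
seg_map[64:192, 64:192] = 1  # a toy target area of category 1
lines = extract_contour_lines(seg_map, class_id=1)
label, merged = classify_contour_line(lines[0], area_type="riverbank", history=[])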
8. An image processing apparatus, comprising:
the encoding module is used for carrying out downsampling processing and encoding processing on an input image to obtain a first feature map;
the compression module is used for performing dimension compression processing and encoding processing on the first feature map to obtain a second feature map, wherein the dimension compression processing comprises processing that reduces the number of feature channels of the feature map while keeping the information contained in the feature map unchanged;
and the up-sampling module is used for carrying out dense up-sampling processing on the second feature map to obtain segmentation maps of target areas of multiple categories in the input image.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 7.
10. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 7.
CN202210331915.5A 2022-03-30 2022-03-30 Image processing method and device, electronic equipment and storage medium Pending CN114882350A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210331915.5A CN114882350A (en) 2022-03-30 2022-03-30 Image processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114882350A true CN114882350A (en) 2022-08-09

Family

ID=82669423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210331915.5A Pending CN114882350A (en) 2022-03-30 2022-03-30 Image processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114882350A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024078403A1 (en) * 2022-10-13 2024-04-18 维沃移动通信有限公司 Image processing method and apparatus, and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815595A (en) * 2020-06-29 2020-10-23 北京百度网讯科技有限公司 Image semantic segmentation method, device, equipment and readable storage medium
CN113516675A (en) * 2021-05-11 2021-10-19 江苏中车数字科技有限公司 Method for accurately extracting track ROI in real time

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU H ET AL: "DE-Net: Deep Encoding Network for Building Extraction from High-Resolution Remote Sensing Imagery", MDPI *
LIU HAO: "Research on Extraction of Artificial Ground Objects Based on High-Resolution Remote Sensing Images", China Excellent Master's Theses Full-text Database, Basic Sciences *

Similar Documents

Publication Publication Date Title
US20210042474A1 (en) Method for text recognition, electronic device and storage medium
US20210248718A1 (en) Image processing method and apparatus, electronic device and storage medium
CN109522910B (en) Key point detection method and device, electronic equipment and storage medium
CN111339846A (en) Image recognition method and device, electronic equipment and storage medium
CN110060215B (en) Image processing method and device, electronic equipment and storage medium
CN111047516A (en) Image processing method, image processing device, computer equipment and storage medium
CN110796649B (en) Target detection method and device, electronic equipment and storage medium
GB2523149A (en) Method, apparatus and computer program product for image-driven cost volume aggregation
CN111860485B (en) Training method of image recognition model, image recognition method, device and equipment
CN109635142B (en) Image selection method and device, electronic equipment and storage medium
CN112258404A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112164083A (en) Water body segmentation method and device, electronic equipment and storage medium
CN114882350A (en) Image processing method and device, electronic equipment and storage medium
CN111931781A (en) Image processing method and device, electronic equipment and storage medium
CN111882558A (en) Image processing method and device, electronic equipment and storage medium
CN115660945A (en) Coordinate conversion method and device, electronic equipment and storage medium
CN109544490B (en) Image enhancement method, device and computer readable storage medium
CN109903252B (en) Image processing method and device, electronic equipment and storage medium
CN114581542A (en) Image preview method and device, electronic equipment and storage medium
CN112837237A (en) Video repair method and device, electronic equipment and storage medium
CN110675355B (en) Image reconstruction method and device, electronic equipment and storage medium
CN109816620B (en) Image processing method and device, electronic equipment and storage medium
CN111988622A (en) Video prediction method and device, electronic equipment and storage medium
CN115239999B (en) Protein electron density map processing method, device, electronic equipment and storage medium
CN116805282A (en) Image super-resolution reconstruction method, model training method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220809