CN117808680A - Image processing method and device, electronic equipment and storage medium - Google Patents

Image processing method and device, electronic equipment and storage medium

Info

Publication number
CN117808680A
CN117808680A · Application CN202211157615.6A
Authority
CN
China
Prior art keywords
resolution
feature map
training
image
resolution feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211157615.6A
Other languages
Chinese (zh)
Inventor
汪松
韩炳男
伦朝林
尹玄武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xuanjie Technology Co ltd
Original Assignee
Shanghai Xuanjie Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xuanjie Technology Co ltd filed Critical Shanghai Xuanjie Technology Co ltd
Priority to CN202211157615.6A priority Critical patent/CN117808680A/en
Publication of CN117808680A publication Critical patent/CN117808680A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses an image processing method and device, electronic equipment and a storage medium, and relates to the technical field of image processing. The main technical scheme comprises the following steps: connecting the up-sampled first resolution feature map with the second resolution feature map to obtain an intermediate feature map; acquiring, according to the intermediate feature map, a first offset matrix corresponding to the first resolution feature map and correcting the first resolution feature map, thereby improving the alignment between the edge structure of the up-sampled first resolution feature map in the spatial domain and the edge structure of the original image; and acquiring, according to the intermediate feature map, a second offset matrix corresponding to the second resolution feature map and correcting the second resolution feature map, thereby eliminating redundant texture information carried by the second resolution feature map and highlighting the edge portions, so that spatial-domain offset is eliminated when image recovery is performed based on the corrected first resolution feature map and the corrected second resolution feature map.

Description

Image processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
Image semantic segmentation is the task of assigning a semantic label to each pixel in an image, and is a fundamental task in image processing. It can be applied to many real-life fields such as automatic driving, image semantic enhancement, augmented reality, security protection, and remote sensing measurement and control. Given the great success of deep learning in the field of computer vision in recent years, current image semantic segmentation techniques are mostly algorithms based on deep neural networks.
At present, common segmentation algorithms generally downsample the original image layer by layer through pooling, large-stride convolution and similar operations to obtain a segmented image, and in order to restore the original size of the image, the low-resolution feature map of the segmented image is usually restored by direct upsampling. However, this direct upsampling approach causes a significant spatial offset in the restored segmented image.
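The following is a minimal illustrative sketch (not taken from the application) of the direct up-sampling baseline described above, assuming a PyTorch-style tensor layout; the tensor name, class count and scale factor are placeholders.

```python
import torch
import torch.nn.functional as F

# Assumed 1/8-scale per-class scores produced by a downsampling segmentation network.
low_res_logits = torch.randn(1, 19, 64, 128)

# Direct up-sampling back to the original size: every low-resolution prediction is
# simply stretched, which is the step that can introduce spatial offset at object edges.
restored = F.interpolate(low_res_logits, scale_factor=8,
                         mode="bilinear", align_corners=False)
```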
Disclosure of Invention
The application provides an image processing method, an image processing device, electronic equipment and a storage medium, mainly aiming to solve the problem that, when an image is segmented with direct upsampling, the segmented image restored to the original size exhibits an obvious offset.
According to a first aspect of the present application, there is provided an image processing method, comprising:
up-sampling a first resolution feature map corresponding to an initial image to the scale of a second resolution feature map, wherein the up-sampled first resolution feature map and the second resolution feature map have the same scale, and the resolution of the first resolution feature map before up-sampling is lower than that of the second resolution feature map;
connecting the first resolution characteristic diagram and the second resolution characteristic diagram to obtain an intermediate characteristic diagram;
acquiring a first offset matrix of the intermediate feature map corresponding to the first resolution feature map and a second offset matrix of the intermediate feature map corresponding to the second resolution feature map;
correcting the first resolution feature map according to the first offset matrix, and correcting the second resolution feature map according to the second offset matrix;
and carrying out semantic segmentation processing on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map to obtain a target image.
Optionally, performing semantic segmentation processing on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map, to obtain a target image includes:
Superposing the corrected first resolution characteristic diagram and the corrected second resolution characteristic diagram to obtain a superposed characteristic diagram;
and carrying out image semantic segmentation on the superimposed feature images, and outputting the target image.
Optionally, the upsampling the first resolution feature map corresponding to the initial image to the second resolution feature map includes:
performing feature extraction on the initial image through step-by-step up-sampling to obtain the first resolution feature map and the second resolution feature map;
and upsampling the first resolution characteristic diagram to the same scale size as the second resolution characteristic diagram by adopting a preset interpolation upsampling algorithm.
Optionally, the correcting the first resolution feature map according to the first offset matrix, and the correcting the second resolution feature map according to the second offset matrix includes:
acquiring, according to the first offset matrix, a first offset of each pixel coordinate in the first resolution feature map relative to the second resolution feature map;
calling a preset correction function, and correcting the first resolution characteristic diagram according to the first offset;
Acquiring a second offset of each pixel coordinate in the second resolution characteristic diagram relative to the second resolution characteristic diagram according to the second offset matrix;
and calling the preset correction function to correct the second resolution characteristic diagram according to the second offset.
Optionally, before upsampling the first resolution feature map corresponding to the initial image to the second resolution feature map, the method further includes:
configuring a reference network of an image semantic segmentation model as a context path network in a bilateral segmentation network;
configuring a backbone network of the image semantic segmentation model as a residual neural network for extracting a feature map;
configuring an up-sampling network of the image semantic segmentation model as an up-sampling network based on feature alignment to complete network construction of the image semantic segmentation model;
training the image semantic segmentation model by using a sample image to execute segmentation of the image semantics by using the trained image semantic segmentation model.
Optionally, the training the image semantic segmentation model using the sample image includes:
inputting the sample image into a context path network in the bilateral segmentation network;
extracting a training first resolution feature map and a training second resolution feature map corresponding to the sample image based on the residual neural network;
upsampling the training first resolution feature map to the same scale as the training second resolution feature map using an upsampling approach;
learning the training first resolution feature map and the training second resolution feature map based on the upsampling network, and correcting the training first resolution feature map and the training second resolution feature map;
and training the image semantic segmentation model based on the corrected training first resolution feature map, the corrected training second resolution feature map and a preset loss function.
Optionally, the learning the training first resolution feature map and the training second resolution feature map based on the upsampling network, and correcting the training first resolution feature map and the training second resolution feature map includes:
connecting the training first resolution feature map with the training second resolution feature map to obtain a training intermediate feature map;
learning the training intermediate feature map, and obtaining a first training offset matrix of the training intermediate feature map relative to the training first resolution feature map and a second training offset matrix of the training intermediate feature map relative to the training second resolution feature map;
Correcting the training first resolution feature map by using the first training offset matrix, and correcting the training second resolution feature map by using the second training offset matrix.
According to a second aspect of the present application, there is provided an image processing apparatus comprising:
the up-sampling unit is used for up-sampling a first resolution feature map corresponding to the initial image to the scale of a second resolution feature map, wherein the up-sampled first resolution feature map and the second resolution feature map have the same scale, and the resolution of the first resolution feature map before up-sampling is lower than that of the second resolution feature map;
the connecting unit is used for connecting the first resolution characteristic diagram and the second resolution characteristic diagram to obtain an intermediate characteristic diagram;
the offset matrix acquisition unit is used for acquiring a first offset matrix of the intermediate feature map corresponding to the first resolution feature map and a second offset matrix of the intermediate feature map corresponding to the second resolution feature map;
the correction unit is used for correcting the first resolution characteristic map according to the first offset matrix and correcting the second resolution characteristic map according to the second offset matrix;
And the processing unit is used for carrying out semantic segmentation processing on the initial image according to the corrected first resolution characteristic map and the corrected second resolution characteristic map to obtain a target image.
Optionally, the processing unit includes:
superposing the corrected first resolution characteristic diagram and the corrected second resolution characteristic diagram to obtain a superposed characteristic diagram;
and carrying out image semantic segmentation on the superimposed feature images, and outputting the target image.
Optionally, the upsampling unit is further configured to:
performing feature extraction on the initial image through step-by-step up-sampling to obtain the first resolution feature map and the second resolution feature map;
and upsampling the first resolution characteristic diagram to the same scale size as the second resolution characteristic diagram by adopting a preset interpolation upsampling algorithm.
Optionally, the correction unit is further configured to:
acquiring, according to the first offset matrix, a first offset of each pixel coordinate in the first resolution feature map relative to the second resolution feature map;
calling a preset correction function, and correcting the first resolution characteristic diagram according to the first offset;
Acquiring a second offset of each pixel coordinate in the second resolution characteristic diagram relative to the second resolution characteristic diagram according to the second offset matrix;
and calling the preset correction function to correct the second resolution characteristic diagram according to the second offset.
Optionally, the apparatus further comprises:
the network construction unit is used for configuring a reference network of the image semantic segmentation model into a context path network in the bilateral segmentation network;
configuring a backbone network of the image semantic segmentation model as a residual neural network for extracting a feature map;
configuring an up-sampling network of the image semantic segmentation model as an up-sampling network based on feature alignment to complete network construction of the image semantic segmentation model;
and the training unit is used for training the image semantic segmentation model by using the sample image so as to execute segmentation of the image semantic by using the trained image semantic segmentation model.
Optionally, the training unit includes:
the input module is used for inputting the sample image into a context path network in the bilateral segmentation network;
the upsampling module is used for upsampling the training first resolution characteristic diagram to the same scale as the training second resolution characteristic diagram by using an upsampling mode;
The correction module is used for learning the training first resolution characteristic map and the training second resolution characteristic map based on the up-sampling network and correcting the training first resolution characteristic map and the training second resolution characteristic map;
the training module is used for training the image semantic segmentation model based on the corrected training first resolution characteristic diagram, the corrected training second resolution characteristic diagram and a preset loss function.
Optionally, the correction module is further configured to:
connecting the training first resolution feature map with the training second resolution feature map to obtain a training intermediate feature map;
learning the training intermediate feature map, and obtaining a first training offset matrix of the training intermediate feature map relative to the training first resolution feature map and a second training offset matrix of the training intermediate feature map relative to the training second resolution feature map;
correcting the training first resolution feature map by using the first training offset matrix, and correcting the training second resolution feature map by using the second training offset matrix.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present application, there is provided a chip comprising one or more interface circuits and one or more processors; the interface circuit is configured to receive a signal from a memory of an electronic device and to send the signal to the processor, the signal comprising computer instructions stored in the memory, which when executed by the processor, cause the electronic device to perform the method of the first aspect.
According to a fifth aspect of the present application, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of the preceding first aspect.
According to a sixth aspect of the present application there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as in the first aspect described above.
The image processing method and device, the electronic equipment and the storage medium provided by the application up-sample a first resolution feature map corresponding to an initial image to the scale of a second resolution feature map, wherein the up-sampled first resolution feature map and the second resolution feature map have the same scale and the resolution of the first resolution feature map before up-sampling is lower than that of the second resolution feature map; connect the first resolution feature map and the second resolution feature map to obtain an intermediate feature map; acquire a first offset matrix of the intermediate feature map corresponding to the first resolution feature map and a second offset matrix of the intermediate feature map corresponding to the second resolution feature map; correct the first resolution feature map according to the first offset matrix and correct the second resolution feature map according to the second offset matrix; and perform semantic segmentation processing on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map to obtain a target image. Compared with the related art, the present application connects the up-sampled first resolution feature map with the second resolution feature map to obtain the intermediate feature map and acquires, according to the intermediate feature map, the first offset matrix corresponding to the first resolution feature map, so that the first resolution feature map can be corrected and the alignment between the edge structure of the up-sampled first resolution feature map in the spatial domain and the edge structure of the original image is improved; and acquires, according to the intermediate feature map, the second offset matrix corresponding to the second resolution feature map, so that redundant texture information carried by the second resolution feature map can be eliminated and the edge portions are highlighted, whereby spatial-domain offset is eliminated when image recovery is performed based on the corrected first resolution feature map and the corrected second resolution feature map.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an image semantic segmentation model according to an embodiment of the present application;
fig. 3 is a network diagram of an upsampling network UMFA based on feature alignment according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a target image result according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of a training method of an image semantic segmentation model;
fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic block diagram of an example electronic device 800 provided by an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An image processing method, apparatus, electronic device, and storage medium of the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present application.
As shown in fig. 1, the method comprises the steps of:
101, up-sampling a first resolution feature map corresponding to an initial image to the scale of a second resolution feature map, wherein the up-sampled first resolution feature map and the second resolution feature map have the same scale, and the resolution of the first resolution feature map before up-sampling is lower than that of the second resolution feature map.
In order to facilitate understanding of the architecture of the image semantic segmentation model, please refer to fig. 2, which is a schematic diagram of the image semantic segmentation model provided in the embodiment of the present application. The diagram shows the reference network architecture, the backbone network and the upsampling network used by the image semantic segmentation model: the reference network architecture is the context path in the bilateral segmentation network BiSeNet, the backbone network is a residual network resnet18 pre-trained on the ImageNet dataset, and the upsampling network is an Upsampling Module based on Feature Alignment (UMFA).
It can be seen from the network diagram that, when performing feature map extraction, the embodiment of the present application is described with 4 scales, namely 1/4, 1/8, 1/16 and 1/32. It should be noted that this description is not intended to limit the extraction scales of the feature map to only these four; the scales can be flexibly configured according to the requirements of different scenes, for example by adding an extraction scale of 1/64 to obtain 5 scales, which is not limited in the embodiment of the present application.
After the initial image is input into the image semantic segmentation model, the first resolution feature map and the second resolution feature map are extracted by the resnet18 residual network pre-trained on the ImageNet dataset, and then the first resolution feature map is up-sampled to the same scale as the second resolution feature map, i.e. the size of the first resolution feature map is up-sampled to be consistent with the size of the second resolution feature map. This up-sampling may follow any implementation in the related art.
102, connecting the first resolution feature map and the second resolution feature map to obtain an intermediate feature map.
After the up-sampling in step 101, the first resolution feature map and the second resolution feature map are identical in size. The first resolution feature map and the second resolution feature map are connected to obtain an intermediate feature map, and the intermediate feature map is transmitted to the feature-alignment-based upsampling network UMFA in fig. 2. For ease of understanding, please continue to refer to fig. 2: assuming that the 1/4-scale second resolution feature map and the 1/8-scale first resolution feature map are extracted by resnet18, after the first resolution feature map is up-sampled in step 101, the two feature maps are input into the feature-alignment-based upsampling network UMFA, so that they are spatially aligned with the high-resolution feature map.
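As an illustrative sketch of steps 101 and 102 (an assumption about the implementation, since the application does not prescribe a specific framework), the low-resolution feature map can be interpolated to the size of the high-resolution map and the two can then be connected along the channel dimension; shapes and channel counts below are placeholders.

```python
import torch
import torch.nn.functional as F

first_res  = torch.randn(1, 128, 32, 64)   # assumed 1/8-scale (first resolution) features
second_res = torch.randn(1, 64, 64, 128)   # assumed 1/4-scale (second resolution) features

# Step 101: up-sample the first resolution feature map to the scale of the second.
first_up = F.interpolate(first_res, size=second_res.shape[-2:],
                         mode="bilinear", align_corners=False)

# Step 102: connect (concatenate) the two maps to form the intermediate feature map.
intermediate = torch.cat([first_up, second_res], dim=1)   # (1, 192, 64, 128)
```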
103, obtaining a first offset matrix of the intermediate feature map corresponding to the first resolution feature map and a second offset matrix of the intermediate feature map corresponding to the second resolution feature map.
The connected feature maps are further learned to obtain a first offset matrix Δ_L of the connected feature map relative to the first resolution feature map and a second offset matrix Δ_H of the connected feature map relative to the second resolution feature map. When the first offset matrix and the second offset matrix are obtained by training, the first resolution feature map and the second resolution feature map are required to exist simultaneously, which is achieved by step 102.
The first offset matrix Δ_L comprises an offset of the first resolution feature map relative to the second resolution feature map, and the second offset matrix Δ_H comprises an offset of the second resolution feature map relative to the first resolution feature map.
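A minimal sketch of predicting the two offset matrices Δ_L and Δ_H from the connected (intermediate) feature map with a small convolutional head is given below. The layer structure and channel counts are assumptions for illustration, not the exact layers of the application.

```python
import torch
import torch.nn as nn

class OffsetHead(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        # 4 output channels = (dx, dy) for Δ_L plus (dx, dy) for Δ_H
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4, 3, padding=1),
        )

    def forward(self, intermediate: torch.Tensor):
        offsets = self.conv(intermediate)
        delta_l, delta_h = offsets[:, :2], offsets[:, 2:]   # each is (N, 2, H, W)
        return delta_l, delta_h
```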
The embodiment of the present application proposes the feature-alignment-based upsampling network UMFA, which spatially aligns the extracted deep semantic features with the high-resolution feature map, as shown in fig. 3. Fig. 3 is a network diagram of the feature-alignment-based upsampling network UMFA provided by an embodiment of the present application.
104, correcting the first resolution feature map according to the first offset matrix, and correcting the second resolution feature map according to the second offset matrix.
The first offset matrix Δ_L obtained in step 103 is returned to the original first resolution feature map (i.e., the feature map without upsampling), and the second offset matrix Δ_H is returned to the second resolution feature map. Alignment (correction) of the original first resolution feature map is completed with reference to the first offset matrix Δ_L, and alignment (correction) of the second resolution feature map is completed with reference to the second offset matrix Δ_H. The alignment (correction) here is an alignment (correction) of coordinates.
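Below is a hedged sketch of the coordinate alignment (correction) step: warping a feature map by a per-pixel offset field with grid_sample. This is one common way to realise "correcting coordinates with an offset matrix"; the application does not spell out the exact operator, so the function name and offset convention are assumptions.

```python
import torch
import torch.nn.functional as F

def warp_by_offset(feature: torch.Tensor, offset: torch.Tensor) -> torch.Tensor:
    """feature: (N, C, H, W); offset: (N, 2, H, W) in pixel units (dx, dy)."""
    n, _, h, w = feature.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=feature.device),
                            torch.arange(w, device=feature.device),
                            indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + offset  # shifted coordinates
    # normalise to [-1, 1] as required by grid_sample
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    norm_grid = torch.stack((grid_x, grid_y), dim=-1)                  # (N, H, W, 2)
    return F.grid_sample(feature, norm_grid, mode="bilinear", align_corners=True)
```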
And 105, performing semantic segmentation processing on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map to obtain a target image.
The corrected first resolution feature map and the corrected second resolution feature map output in step 104 are taken as the input of an image segmentation model or image segmentation algorithm, and semantic segmentation is performed on the initial image to obtain the target image. The implementation algorithm or manner of image segmentation may refer to any implementation in the related art, which is not limited in this embodiment of the present application.
Fig. 4 is a schematic diagram of a target image result provided in the embodiment of the present application. It can be seen that, regarding the degree of alignment with the edge structure of the original image, redundant texture information carried by the image itself can be eliminated and the edge portions are highlighted, ensuring more distinct edge structure information.
According to the image processing method provided by the embodiment of the application, a first resolution feature map corresponding to an initial image is up-sampled to the scale of a second resolution feature map, wherein the up-sampled first resolution feature map and the second resolution feature map have the same scale and the resolution of the first resolution feature map before up-sampling is lower than that of the second resolution feature map; the first resolution feature map and the second resolution feature map are connected to obtain an intermediate feature map; a first offset matrix of the intermediate feature map corresponding to the first resolution feature map and a second offset matrix of the intermediate feature map corresponding to the second resolution feature map are acquired; the first resolution feature map is corrected according to the first offset matrix and the second resolution feature map is corrected according to the second offset matrix; and semantic segmentation processing is performed on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map to obtain a target image. Compared with the related art, the up-sampled first resolution feature map is connected with the second resolution feature map to obtain the intermediate feature map, and the first offset matrix corresponding to the first resolution feature map is acquired according to the intermediate feature map, so that the first resolution feature map can be corrected and the alignment between the edge structure of the up-sampled first resolution feature map in the spatial domain and the edge structure of the original image is improved; and the second offset matrix corresponding to the second resolution feature map is acquired according to the intermediate feature map, so that redundant texture information carried by the second resolution feature map can be eliminated and the edge portions are highlighted, whereby spatial-domain offset is eliminated when image recovery is performed based on the corrected first resolution feature map and the corrected second resolution feature map.
As a refinement of the embodiment of the present application, when performing, in step 105, semantic segmentation processing on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map to obtain the target image, the corrected first resolution feature map and the corrected second resolution feature map are superimposed to obtain a superimposed feature map, image semantic segmentation is performed on the superimposed feature map, and the segmented image is output. The superposition of the corrected first resolution feature map and the corrected second resolution feature map is implemented in the upsampling network UMFA, as characterized in fig. 3: the superposition yields a second resolution feature map (the rightmost second resolution feature map in fig. 3), which serves as the input of the semantic segmentation in fig. 2, and the segmented image is output after the segmentation processing of the initial image. As an implementation manner of the embodiment of the application, the superimposed feature map is used as the input of a segmentation head, where the segmentation head may be the segmentation head of any semantic segmentation model or segmentation algorithm; before semantic segmentation is executed, the superimposed feature map is input to the segmentation head, and when semantic segmentation is executed, the superimposed feature map is acquired from the segmentation head, semantic segmentation of the superimposed feature map is performed, and the target image obtained after segmentation is output. The semantic segmentation model or segmentation algorithm used is not limited in the embodiment of the application.
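A sketch of the superposition described above is given below, under the assumption that "superposing" means element-wise addition of the two corrected feature maps, followed by a generic segmentation head (here a 1x1 convolution); the class name, channel width and class count are illustrative.

```python
import torch.nn as nn

class SimpleSegHead(nn.Module):
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, corrected_low, corrected_high):
        fused = corrected_low + corrected_high      # superimposed feature map
        return self.classifier(fused)               # per-pixel class scores
```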
As a possible manner of the embodiment of the present application, upsampling the low-resolution feature map (the first resolution feature map) in the initial image to the same scale as the second resolution feature map includes: extracting features of the initial image through step-by-step upsampling to obtain the first resolution feature map and the second resolution feature map, and upsampling the first resolution feature map to the same scale as the second resolution feature map with a preset interpolation upsampling algorithm. The step-by-step upsampling may be performed by interpolation, and the interpolation upsampling method may be, but is not limited to, nearest-neighbor interpolation, bilinear interpolation or bicubic interpolation. It should be noted that the foregoing is merely exemplary, and the embodiment of the present application does not limit the manner of step-by-step upsampling.
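The three interpolation methods mentioned above, as exposed by torch.nn.functional, are shown in the following sketch; the application does not tie itself to a specific library, so this is purely illustrative and the tensor shape is a placeholder.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 128, 32, 64)   # assumed low-resolution feature map

nearest  = F.interpolate(x, scale_factor=2, mode="nearest")
bilinear = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
bicubic  = F.interpolate(x, scale_factor=2, mode="bicubic",  align_corners=False)
```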
Further, in a possible implementation manner of this embodiment, correcting the first resolution feature map according to the first offset matrix and correcting the second resolution feature map according to the second offset matrix includes: acquiring, according to the first offset matrix, a first offset of each pixel coordinate in the first resolution feature map relative to the second resolution feature map, calling a preset correction function, and correcting the first resolution feature map according to the first offset; and acquiring, according to the second offset matrix, a second offset of each pixel coordinate in the second resolution feature map, calling the preset correction function, and correcting the second resolution feature map according to the second offset. In a specific application, the preset correction function can be realized by an alignment function (Alignment); the specific implementation process is not repeated one by one in the embodiments of the present application.
Before executing the method shown in fig. 1, network building and training of the image semantic segmentation model are further required to be executed, and with continued reference to fig. 2, specific building includes:
1. Configuring the reference network architecture used by the image semantic segmentation model as the context path network in the bilateral segmentation network BiSeNet; it should be noted that the built reference network architecture is the context path network in the BiSeNet network, and the spatial path in the BiSeNet network is discarded.
2. The backbone network of the image semantic segmentation model is configured as a residual network resnet18 pre-trained using ImageNet datasets for feature map extraction.
3. Configuring the up-sampling network of the image semantic segmentation model as an up-sampling network based on feature alignment, completing the network construction of the image semantic segmentation model. An illustrative assembly of the constructed network is sketched below.
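The following is a high-level assembly sketch of the constructed model. All module names, channel widths, the use of torchvision's resnet18 and the interface of the feature-aligned upsampling module are assumptions for illustration only; `upsampling_module` stands for a UMFA-like module such as the sketches earlier in this description.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class ContextPathSegmenter(nn.Module):
    """Illustrative assembly: ImageNet-pretrained resnet18 backbone on a
    BiSeNet-style context path, a feature-aligned upsampling module, and a
    1x1-convolution segmentation head."""

    def __init__(self, upsampling_module: nn.Module, num_classes: int):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")   # pretrained backbone (assumed torchvision API)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1)   # 1/4 scale, 64 channels
        self.layer2 = backbone.layer2                                  # 1/8 scale, 128 channels
        self.umfa = upsampling_module      # assumed to correct and superimpose the two maps
        self.head = nn.Conv2d(64, num_classes, kernel_size=1)  # assumes UMFA outputs 64 channels

    def forward(self, image):
        high = self.stem(image)            # second (higher) resolution feature map
        low = self.layer2(high)            # first (lower) resolution feature map
        fused = self.umfa(low, high)       # corrected and superimposed features
        logits = self.head(fused)
        return F.interpolate(logits, size=image.shape[-2:],
                             mode="bilinear", align_corners=False)
```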
After the network construction of the image semantic segmentation model is completed, training the image semantic segmentation model by using a sample image to execute the image processing method shown in fig. 1 by using the trained image semantic segmentation model.
In a specific training process, as shown in fig. 5, fig. 5 is a flow chart of an image semantic segmentation model training method, where the method includes:
501, inputting the sample image into a context path network in the bilateral segmentation network.
In determining the sample images, the embodiments of the present application mainly use the public coco-stuff dataset. In order to reduce training difficulty and match the actual requirements of the service, the original 171 categories in the coco-stuff dataset can be reduced; the categories of the sample images are not limited in the embodiment of the application.
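An illustrative sketch of reducing the original coco-stuff label set to a smaller set of categories is shown below; the specific kept classes and the ignore index are assumptions, not values from the application.

```python
import numpy as np

# original coco-stuff id -> reduced id (example mapping only)
KEPT = {0: 0, 1: 1, 20: 2, 62: 3}

def remap_labels(label_map: np.ndarray, ignore_index: int = 255) -> np.ndarray:
    out = np.full_like(label_map, ignore_index)   # unused categories are ignored in training
    for src, dst in KEPT.items():
        out[label_map == src] = dst
    return out
```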
In the training process, the number of data samples acquired in one training pass (the batch size) can be set. The batch size is an empirical value and can be flexibly set based on the data-supporting capability of the hardware, for example 20 or 30; the embodiment of the application does not limit the specific batch size.
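A minimal sketch of applying the batch size setting, assuming a PyTorch DataLoader; the stand-in tensor dataset and the value 20 (one of the example values above) are illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the prepared training split (images and per-pixel labels).
dummy_images = torch.randn(16, 3, 256, 256)
dummy_labels = torch.zeros(16, 256, 256, dtype=torch.long)

train_loader = DataLoader(TensorDataset(dummy_images, dummy_labels),
                          batch_size=20, shuffle=True, drop_last=True)
```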
502, extracting a training first resolution feature map and a training second resolution feature map corresponding to the sample image based on the residual neural network.
503, upsampling the training first resolution feature map to the same scale as the training second resolution feature map using an upsampling mode.
And 504, learning the training first resolution characteristic map and the training second resolution characteristic map based on the upsampling network, and correcting the training first resolution characteristic map and the training second resolution characteristic map.
The implementation of steps 502 to 504 may refer to the description of the related steps in fig. 1, and is not repeated here.
And 505, training the image semantic segmentation model based on the corrected training first resolution feature map, the corrected training second resolution feature map and a preset loss function.
The training first resolution feature map is connected with the training second resolution feature map to obtain a training intermediate feature map; the training intermediate feature map is learned to obtain a first training offset matrix of the training intermediate feature map relative to the training first resolution feature map and a second training offset matrix of the training intermediate feature map relative to the training second resolution feature map; the training first resolution feature map is corrected by using the first training offset matrix, and the training second resolution feature map is corrected by using the second training offset matrix. The training process is not described in detail in the embodiments of the present application.
When training the image semantic segmentation model, a loss function and a learning rate need to be preset. In a specific implementation, the application uses the OHEM algorithm (Online Hard Example Mining) as the preset loss function and a decaying learning-rate strategy; for example, when a Poly strategy is used, the initial learning rate is set to 0.01. In addition to the above settings, the number of iterations is also set. The foregoing is merely exemplary, and the embodiments of the present application do not limit the specific settings of the training process.
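A hedged sketch of the training settings described above follows: an OHEM-style cross-entropy loss and a poly learning-rate schedule with initial rate 0.01. The OHEM threshold, minimum kept count, power value and optimizer choice in the comments are assumptions, not values from the application.

```python
import torch
import torch.nn.functional as F

def ohem_cross_entropy(logits, target, thresh=0.7, min_kept=100000, ignore_index=255):
    """Online hard example mining: average the loss only over the hardest pixels."""
    pixel_loss = F.cross_entropy(logits, target, ignore_index=ignore_index,
                                 reduction="none").flatten()
    hard = pixel_loss[pixel_loss > -torch.log(torch.tensor(thresh))]
    if hard.numel() < min_kept:                    # always keep a minimum number of pixels
        hard, _ = pixel_loss.topk(min(min_kept, pixel_loss.numel()))
    return hard.mean()

def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Polynomial ("poly") decay of the learning rate over training iterations."""
    return base_lr * (1 - cur_iter / max_iter) ** power

# usage sketch (assuming `optimizer` is e.g. SGD over the model parameters):
# for g in optimizer.param_groups:
#     g["lr"] = poly_lr(0.01, cur_iter, max_iter)
```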
Training of the image semantic segmentation model is an iterative process that continuously adjusts training parameters according to training results based on a specific network. The specific training method of the image semantic segmentation model can refer to the description of the related technology, and the embodiments of the present application are not described herein.
Corresponding to the image processing method, the present application also provides an image processing apparatus. Since the apparatus embodiment corresponds to the method embodiment described above, details not disclosed in the apparatus embodiment may refer to the method embodiment and are not repeated here.
Fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, as shown in fig. 6, including:
the upsampling unit 601 is configured to up-sample a first resolution feature map corresponding to an initial image to the scale of a second resolution feature map, where the up-sampled first resolution feature map and the second resolution feature map have the same scale, and the resolution of the first resolution feature map before up-sampling is lower than the resolution of the second resolution feature map;
a connection unit 602, configured to connect the first resolution feature map and the second resolution feature map to obtain an intermediate feature map;
An offset matrix obtaining unit 603, configured to obtain a first offset matrix of the intermediate feature map corresponding to the first resolution feature map, and a second offset matrix of the intermediate feature map corresponding to the second resolution feature map;
a correction unit 604, configured to correct the first resolution feature map according to the first offset matrix, and correct the second resolution feature map according to the second offset matrix;
and a processing unit 605, configured to perform semantic segmentation processing on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map, so as to obtain a target image.
The image processing device provided by the application up-samples a first resolution feature map corresponding to an initial image to the scale of a second resolution feature map, wherein the up-sampled first resolution feature map and the second resolution feature map have the same scale and the resolution of the first resolution feature map before up-sampling is lower than that of the second resolution feature map; connects the first resolution feature map and the second resolution feature map to obtain an intermediate feature map; acquires a first offset matrix of the intermediate feature map corresponding to the first resolution feature map and a second offset matrix of the intermediate feature map corresponding to the second resolution feature map; corrects the first resolution feature map according to the first offset matrix and corrects the second resolution feature map according to the second offset matrix; and performs semantic segmentation processing on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map to obtain a target image. Compared with the related art, the up-sampled first resolution feature map is connected with the second resolution feature map to obtain the intermediate feature map, and the first offset matrix corresponding to the first resolution feature map is acquired according to the intermediate feature map, so that the first resolution feature map can be corrected and the alignment between the edge structure of the up-sampled first resolution feature map in the spatial domain and the edge structure of the original image is improved; and the second offset matrix corresponding to the second resolution feature map is acquired according to the intermediate feature map, so that redundant texture information carried by the second resolution feature map can be eliminated and the edge portions are highlighted, whereby spatial-domain offset is eliminated when image recovery is performed based on the corrected first resolution feature map and the corrected second resolution feature map.
Further, in one possible implementation manner of this embodiment, as shown in fig. 7, the processing unit 605 includes:
superposing the corrected first resolution characteristic diagram and the corrected second resolution characteristic diagram to obtain a superposed characteristic diagram;
and carrying out image semantic segmentation on the superimposed feature images, and outputting the target image.
Further, in a possible implementation manner of this embodiment, as shown in fig. 7, the upsampling unit 601 is further configured to:
performing feature extraction on the initial image through step-by-step up-sampling to obtain the first resolution feature map and the second resolution feature map;
and upsampling the first resolution characteristic diagram to the same scale size as the second resolution characteristic diagram by adopting a preset interpolation upsampling algorithm.
Further, in a possible implementation manner of this embodiment, as shown in fig. 7, the correction unit 604 is further configured to:
acquiring, according to the first offset matrix, a first offset of each pixel coordinate in the first resolution feature map relative to the second resolution feature map;
calling a preset correction function, and correcting the first resolution characteristic diagram according to the first offset;
Acquiring a second offset of each pixel coordinate in the second resolution characteristic diagram relative to the second resolution characteristic diagram according to the second offset matrix;
and calling the preset correction function to correct the second resolution characteristic diagram according to the second offset.
Further, in a possible implementation manner of this embodiment, as shown in fig. 7, the apparatus further includes:
a network construction unit 606, configured to configure a reference network of the image semantic segmentation model as a context path network in the bilateral segmentation network;
configuring a backbone network of the image semantic segmentation model as a residual neural network for extracting a feature map;
configuring an up-sampling network of the image semantic segmentation model as an up-sampling network based on feature alignment to complete network construction of the image semantic segmentation model;
a training unit 607, configured to train the image semantic segmentation model using a sample image, so as to perform segmentation of the image semantic using the trained image semantic segmentation model.
Further, in one possible implementation manner of this embodiment, as shown in fig. 7, the training unit 607 includes:
An input module 6071 for inputting the sample image into a context path network in the bilateral segmentation network;
an upsampling module 6072 for upsampling the training first resolution feature map to the same scale as the training second resolution feature map using an upsampling mode;
a correction module 6073 for learning the training first resolution feature map and the training second resolution feature map based on the upsampling network, and correcting the training first resolution feature map and the training second resolution feature map;
the training module 6074 is configured to train the image semantic segmentation model based on the corrected training first resolution feature map, the corrected training second resolution feature map, and a preset loss function.
Further, in one possible implementation manner of this embodiment, as shown in fig. 7, the correction module 6073 is further configured to:
connecting the training first resolution feature map with the training second resolution feature map to obtain a training intermediate feature map;
learning the training intermediate feature map, and obtaining a first training offset matrix of the training intermediate feature map relative to the training first resolution feature map and a second training offset matrix of the training intermediate feature map relative to the training second resolution feature map;
Correcting the training first resolution feature map by using the first training offset matrix, and correcting the training second resolution feature map by using the second training offset matrix.
The foregoing explanation of the method embodiment also applies to the apparatus of this embodiment; the principle is the same and is not repeated here.
According to embodiments of the present application, there is also provided an electronic device, a readable storage medium and a computer program product.
The application also provides a chip comprising one or more interface circuits and one or more processors; the interface circuit is configured to receive a signal from a memory of an electronic device and send the signal to the processor, the signal including computer instructions stored in the memory that, when executed by the processor, cause the electronic device to perform the method of any of the embodiments described above.
Fig. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a ROM (Read-Only Memory) 802 or a computer program loaded from a storage unit 808 into a RAM (Random Access Memory) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802 and the RAM 803 are connected to each other by a bus 804. An I/O (Input/Output) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable processors, controllers, microcontrollers, and the like. The computing unit 801 performs the respective methods and processes described above, for example the image processing method. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When a computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the methods described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the aforementioned image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application Specific Standard Products), SOCs (Systems On Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, RAM, ROM, an EPROM (Erasable Programmable Read-Only Memory) or flash memory, an optical fiber, a CD-ROM (Compact Disc Read-Only Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode-Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: a LAN (Local Area Network), a WAN (Wide Area Network), the Internet and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service ("Virtual Private Server" or simply "VPS") are overcome. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that artificial intelligence is the discipline of studying how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and it involves technologies at both the hardware and software levels. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage and big data processing; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning technology, big data processing technology, knowledge graph technology and the like.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application are achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (18)

1. An image processing method, comprising:
up-sampling a first resolution feature map corresponding to an initial image to the scale of a second resolution feature map, wherein the up-sampled first resolution feature map and the second resolution feature map have the same scale, and the resolution of the first resolution feature map before up-sampling is lower than that of the second resolution feature map;
connecting the first resolution feature map and the second resolution feature map to obtain an intermediate feature map;
acquiring, from the intermediate feature map, a first offset matrix corresponding to the first resolution feature map and a second offset matrix corresponding to the second resolution feature map;
correcting the first resolution feature map according to the first offset matrix, and correcting the second resolution feature map according to the second offset matrix;
and carrying out semantic segmentation processing on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map to obtain a target image.
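By way of a non-limiting illustration, the first steps of claim 1 could be sketched in PyTorch as follows. The 3x3 convolutional offset head, the channel layout, and the bilinear up-sampling are assumptions made for the sketch and are not taken from the claims.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OffsetPredictor(nn.Module):
    """Sketch of the first steps of claim 1: up-sample the first resolution feature
    map, connect it with the second resolution feature map, and predict the two
    offset matrices from the intermediate feature map."""

    def __init__(self, first_channels: int, second_channels: int):
        super().__init__()
        # 4 output channels: a (dx, dy) pair for each of the two feature maps.
        self.offset_head = nn.Conv2d(first_channels + second_channels, 4,
                                     kernel_size=3, padding=1)

    def forward(self, first_map: torch.Tensor, second_map: torch.Tensor):
        # Up-sample the lower-resolution first map to the scale of the second map.
        first_up = F.interpolate(first_map, size=second_map.shape[-2:],
                                 mode='bilinear', align_corners=False)
        # Connect (concatenate) the two maps to form the intermediate feature map.
        intermediate = torch.cat([first_up, second_map], dim=1)
        offsets = self.offset_head(intermediate)
        first_offset, second_offset = offsets[:, :2], offsets[:, 2:]
        return first_up, first_offset, second_offset
```

The two offset matrices are then used to correct the respective feature maps (one possible correction function is sketched after claim 4) before semantic segmentation is performed on the corrected maps.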
2. The method of claim 1, wherein performing semantic segmentation processing on the initial image based on the corrected first resolution feature map and the corrected second resolution feature map to obtain a target image comprises:
superposing the corrected first resolution feature map and the corrected second resolution feature map to obtain a superposed feature map;
and performing image semantic segmentation on the superposed feature map, and outputting the target image.
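A minimal sketch of claim 2's superposition and segmentation step, assuming the two corrected maps share the same channel count, that "superposing" is an element-wise addition, and that a 1x1 convolution produces the per-pixel class scores; none of these choices is fixed by the claim.

```python
import torch.nn as nn
import torch.nn.functional as F

class SegmentationHead(nn.Module):
    """Superpose the corrected feature maps and segment the result (claim 2 sketch)."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, corrected_first, corrected_second, image_size):
        fused = corrected_first + corrected_second        # superposed feature map
        logits = self.classifier(fused)                   # per-pixel class scores
        logits = F.interpolate(logits, size=image_size,
                               mode='bilinear', align_corners=False)
        return logits.argmax(dim=1)                       # target segmentation map
```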
3. The method of claim 1, wherein up-sampling the first resolution feature map corresponding to the initial image to the scale of the second resolution feature map comprises:
performing feature extraction on the initial image through step-by-step down-sampling to obtain the first resolution feature map and the second resolution feature map;
and up-sampling the first resolution feature map to the same scale as the second resolution feature map by using a preset interpolation up-sampling algorithm.
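The "preset interpolation up-sampling algorithm" of claim 3 is not specified; a bilinear interpolation to the target scale is one common assumption, sketched below.

```python
import torch
import torch.nn.functional as F

def upsample_to(first_map: torch.Tensor, second_map: torch.Tensor,
                mode: str = 'bilinear') -> torch.Tensor:
    """Up-sample the first resolution feature map to the spatial size of the
    second resolution feature map (the interpolation mode is an assumption)."""
    return F.interpolate(first_map, size=second_map.shape[-2:],
                         mode=mode, align_corners=False)
```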
4. The method of claim 1, wherein correcting the first resolution feature map according to the first offset matrix and correcting the second resolution feature map according to the second offset matrix comprises:
acquiring, according to the first offset matrix, a first offset of each pixel coordinate in the first resolution feature map relative to the second resolution feature map;
calling a preset correction function to correct the first resolution feature map according to the first offset;
acquiring, according to the second offset matrix, a second offset of each pixel coordinate in the second resolution feature map relative to the second resolution feature map;
and calling the preset correction function to correct the second resolution feature map according to the second offset.
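One way to realise the "preset correction function" of claim 4 is a bilinear re-sampling (grid_sample) driven by the per-pixel offsets; this specific choice is an assumption of the sketch, not stated in the claim.

```python
import torch
import torch.nn.functional as F

def correct(feature_map: torch.Tensor, offset: torch.Tensor) -> torch.Tensor:
    """Re-sample `feature_map` so that each pixel (x, y) takes its value from
    (x + dx, y + dy), where (dx, dy) comes from the offset matrix (in pixels)."""
    n, _, h, w = feature_map.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=feature_map.dtype, device=feature_map.device),
        torch.arange(w, dtype=feature_map.dtype, device=feature_map.device),
        indexing='ij')
    # Shift the pixel grid by the offsets and normalise the coordinates to [-1, 1].
    x_norm = 2.0 * (xs + offset[:, 0]) / max(w - 1, 1) - 1.0
    y_norm = 2.0 * (ys + offset[:, 1]) / max(h - 1, 1) - 1.0
    grid = torch.stack([x_norm, y_norm], dim=-1)          # shape (N, H, W, 2)
    return F.grid_sample(feature_map, grid, mode='bilinear', align_corners=True)

# The same function is called once per map with its own offset matrix:
# corrected_first = correct(first_up, first_offset)
# corrected_second = correct(second_map, second_offset)
```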
5. The method of any one of claims 1-4, wherein before up-sampling the first resolution feature map corresponding to the initial image to the scale of the second resolution feature map, the method further comprises:
configuring a reference network of an image semantic segmentation model as a context path network in a bilateral segmentation network;
configuring a backbone network of the image semantic segmentation model as a residual neural network for extracting a feature map;
configuring an up-sampling network of the image semantic segmentation model as an up-sampling network based on feature alignment to complete network construction of the image semantic segmentation model;
and training the image semantic segmentation model by using a sample image, so as to perform image semantic segmentation by using the trained image semantic segmentation model.
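A compact sketch of the three configuration steps in claim 5, written as a configuration object; the concrete names (a ResNet-18 backbone, a BiSeNet-style context path, a feature-aligned up-sampling head) are placeholders chosen for the illustration, not the patented architecture.

```python
from dataclasses import dataclass

@dataclass
class SegmentationModelConfig:
    # Reference network configured as the context path of a bilateral segmentation network.
    reference_network: str = 'bisenet_context_path'
    # Backbone configured as a residual neural network that extracts the feature maps.
    backbone: str = 'resnet18'
    # Up-sampling network configured as a feature-alignment based up-sampling network.
    upsampling_network: str = 'feature_alignment'

def build_segmentation_model(config: SegmentationModelConfig):
    """Hypothetical build step: a real implementation would instantiate the three
    sub-networks named in the configuration and wire them together."""
    assert config.backbone.startswith('resnet'), 'claim 5 configures a residual backbone'
    assert config.upsampling_network == 'feature_alignment'
    return config  # stand-in for the constructed image semantic segmentation model
```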
6. The method of claim 5, wherein the training the image semantic segmentation model using a sample image comprises:
inputting the sample image into the context path network in the bilateral segmentation network;
extracting, based on the residual neural network, a training first resolution feature map and a training second resolution feature map corresponding to the sample image;
upsampling the training first resolution feature map to the same scale as the training second resolution feature map using an upsampling approach;
learning the training first resolution feature map and the training second resolution feature map based on the upsampling network, and correcting the training first resolution feature map and the training second resolution feature map;
and training the image semantic segmentation model based on the corrected training first resolution feature map, the corrected training second resolution feature map and a preset loss function.
7. The method of claim 6, wherein learning the training first resolution feature map and the training second resolution feature map based on the up-sampling network, and correcting the training first resolution feature map and the training second resolution feature map, comprises:
connecting the training first resolution feature map with the training second resolution feature map to obtain a training intermediate feature map;
learning the training intermediate feature map to obtain a first training offset matrix of the training intermediate feature map relative to the training first resolution feature map and a second training offset matrix of the training intermediate feature map relative to the training second resolution feature map;
correcting the training first resolution feature map by using the first training offset matrix, and correcting the training second resolution feature map by using the second training offset matrix.
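A minimal training-step sketch for claims 6 and 7, assuming the model returns per-pixel class logits computed from the corrected training feature maps and that the preset loss function is a pixel-wise cross-entropy; the optimizer and hyper-parameters shown in the usage comment are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               sample_image: torch.Tensor, label_mask: torch.Tensor) -> float:
    """One iteration of training the image semantic segmentation model."""
    model.train()
    logits = model(sample_image)                 # (N, num_classes, H, W) per-pixel scores
    loss = F.cross_entropy(logits, label_mask)   # preset loss function (assumed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# loss_value = train_step(model, optimizer, sample_images, label_masks)
```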
8. An image processing apparatus, comprising:
an up-sampling unit, configured to up-sample a first resolution feature map corresponding to an initial image to the scale of a second resolution feature map, wherein the up-sampled first resolution feature map and the second resolution feature map have the same scale, and the resolution of the first resolution feature map before up-sampling is lower than that of the second resolution feature map;
a connecting unit, configured to connect the first resolution feature map and the second resolution feature map to obtain an intermediate feature map;
an offset matrix acquisition unit, configured to acquire, from the intermediate feature map, a first offset matrix corresponding to the first resolution feature map and a second offset matrix corresponding to the second resolution feature map;
a correction unit, configured to correct the first resolution feature map according to the first offset matrix and correct the second resolution feature map according to the second offset matrix;
and a processing unit, configured to perform semantic segmentation processing on the initial image according to the corrected first resolution feature map and the corrected second resolution feature map to obtain a target image.
9. The apparatus of claim 8, wherein the processing unit is further configured to:
superpose the corrected first resolution feature map and the corrected second resolution feature map to obtain a superposed feature map;
and perform image semantic segmentation on the superposed feature map and output the target image.
10. The apparatus of claim 8, wherein the upsampling unit is further configured to:
perform feature extraction on the initial image through step-by-step down-sampling to obtain the first resolution feature map and the second resolution feature map;
and up-sample the first resolution feature map to the same scale as the second resolution feature map by using a preset interpolation up-sampling algorithm.
11. The apparatus of claim 8, wherein the correction unit is further configured to:
acquire, according to the first offset matrix, a first offset of each pixel coordinate in the first resolution feature map relative to the second resolution feature map;
call a preset correction function to correct the first resolution feature map according to the first offset;
acquire, according to the second offset matrix, a second offset of each pixel coordinate in the second resolution feature map relative to the second resolution feature map;
and call the preset correction function to correct the second resolution feature map according to the second offset.
12. The apparatus according to any one of claims 8-11, wherein the apparatus further comprises:
a network construction unit, configured to configure a reference network of an image semantic segmentation model as a context path network in a bilateral segmentation network,
configure a backbone network of the image semantic segmentation model as a residual neural network for extracting feature maps,
and configure an up-sampling network of the image semantic segmentation model as an up-sampling network based on feature alignment, so as to complete network construction of the image semantic segmentation model;
and a training unit, configured to train the image semantic segmentation model by using a sample image, so as to perform image semantic segmentation by using the trained image semantic segmentation model.
13. The apparatus of claim 12, wherein the training unit comprises:
an input module, configured to input the sample image into the context path network in the bilateral segmentation network;
an up-sampling module, configured to up-sample the training first resolution feature map to the same scale as the training second resolution feature map by using an up-sampling mode;
a correction module, configured to learn the training first resolution feature map and the training second resolution feature map based on the up-sampling network, and to correct the training first resolution feature map and the training second resolution feature map;
and a training module, configured to train the image semantic segmentation model based on the corrected training first resolution feature map, the corrected training second resolution feature map, and a preset loss function.
14. The apparatus of claim 13, wherein the correction module is further configured to:
connect the training first resolution feature map with the training second resolution feature map to obtain a training intermediate feature map;
learn the training intermediate feature map to obtain a first training offset matrix of the training intermediate feature map relative to the training first resolution feature map and a second training offset matrix of the training intermediate feature map relative to the training second resolution feature map;
and correct the training first resolution feature map by using the first training offset matrix, and correct the training second resolution feature map by using the second training offset matrix.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A chip comprising one or more interface circuits and one or more processors; the interface circuit is configured to receive a signal from a memory of an electronic device and to send the signal to the processor, the signal comprising computer instructions stored in the memory, which when executed by the processor, cause the electronic device to perform the method of any of claims 1-7.
17. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
18. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
CN202211157615.6A 2022-09-22 2022-09-22 Image processing method and device, electronic equipment and storage medium Pending CN117808680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211157615.6A CN117808680A (en) 2022-09-22 2022-09-22 Image processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117808680A true CN117808680A (en) 2024-04-02

Family

ID=90418602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211157615.6A Pending CN117808680A (en) 2022-09-22 2022-09-22 Image processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117808680A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination