CN113393468A - Image processing method, model training device and electronic equipment

Image processing method, model training device and electronic equipment

Info

Publication number
CN113393468A
CN113393468A (application CN202110719484.5A)
Authority
CN
China
Prior art keywords
image
semantic segmentation
processed
position information
segmentation result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110719484.5A
Other languages
Chinese (zh)
Inventor
张健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110719484.5A
Publication of CN113393468A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image processing method, a model training method and device, and an electronic device, and relates to the field of artificial intelligence, in particular to computer vision and deep learning technology. The specific implementation scheme is as follows: determining the position information of a key area in an image to be processed; performing a deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image; performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result; and performing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result. The technical scheme of the disclosure facilitates deploying a semantic segmentation model on a mobile terminal.

Description

Image processing method, model training device and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to computer vision and deep learning techniques.
Background
Semantic segmentation is a common image processing task. In the related art, the segmentation capability of a semantic segmentation model is improved by increasing the input resolution of the model, increasing the depth or width of the model, enhancing the structure of the model, and the like. However, these measures also increase the computational complexity of the model, making it difficult to deploy on a mobile terminal.
Disclosure of Invention
The disclosure provides an image processing method, a model training device and electronic equipment.
According to an aspect of the present disclosure, there is provided an image processing method including:
determining the position information of a key area in an image to be processed;
performing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and performing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
According to another aspect of the present disclosure, there is provided a model training method, including:
training a preset model to obtain a semantic segmentation model;
the semantic segmentation model is used for processing the image to be processed as follows:
determining the position information of a key area in an image to be processed;
performing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and performing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
According to another aspect of the present disclosure, there is provided an image processing apparatus including:
the spatial self-attention module is used for determining the position information of a key area in the image to be processed;
the spatial transformation module is used for executing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
the semantic segmentation module is used for performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and the space inverse transformation module is used for executing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
According to another aspect of the present disclosure, there is provided a model training apparatus, comprising:
The training module is used for training the preset model to obtain a semantic segmentation model;
the semantic segmentation model is used for processing the image to be processed as follows:
determining the position information of a key area in an image to be processed;
performing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and performing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
According to the technical scheme of the disclosure, the position information of the key area in the image to be processed is determined first, and the deformation operation is performed on the image to be processed based on this information. Because semantic segmentation is performed on the deformed image obtained from the deformation operation, and the resulting first semantic segmentation result is converted into the second semantic segmentation result through the inverse operation, the deformation operation can be configured to enlarge the details of the key area and spatially compress non-key areas, so that the limited computation budget of semantic segmentation is spent on the regions that need fine segmentation. Therefore, high-precision semantic segmentation can be achieved with a lightweight semantic segmentation model, which facilitates deploying the model on a mobile terminal.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an image processing method according to one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a spatial self-attention module, according to one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a semantic segmentation model according to one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a model training method according to one embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an image processing apparatus according to one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an image processing apparatus according to another embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a model training apparatus according to one embodiment of the present disclosure;
FIG. 8 is a block diagram of an electronic device used to implement methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows a schematic diagram of an image processing method provided by an embodiment of the present disclosure. As shown in fig. 1, the method includes:
step S110, determining the position information of a key area in the image to be processed;
step S120, executing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
step S130, performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
step S140, performing an inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
The above method may be performed by an electronic device. In practical applications, the electronic device may be used to perform a semantic segmentation task on the image to be processed. For example, the image to be processed may include a road condition image, and the semantic segmentation task needs to segment objects of interest such as the sky, the road surface, and vehicles in the road image. As another example, the image to be processed may include a scenic spot image, and the semantic segmentation task needs to segment scenic spots, backgrounds, and people in the image. Illustratively, in the embodiments of the present disclosure, the semantic segmentation result may include a segmentation map in which different objects of interest are highlighted with different colors or different pixel values.
For example, in step S110, the position information of the key area may be determined in various ways. For example, it may be determined based on prior knowledge. One specific example: when a semantic segmentation task is performed sequentially on a plurality of continuous image frames of a video, a circumscribed rectangular frame of the objects of interest can be determined from the semantic segmentation result of the previous image frame, and the key area of the current image frame is determined based on that rectangular frame (a sketch of this option follows this paragraph). As another example, the position information of the key area in the image to be processed may be specified manually. For another example, a spatial self-attention module may be provided in the semantic segmentation model and used to determine the position information of the key area in the image to be processed.
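As a concrete illustration of the prior-knowledge option for video frames, the following is a minimal sketch that derives the circumscribed rectangular frame of the objects of interest from the previous frame's segmentation mask. The relative margin and the normalized-coordinate convention are illustrative assumptions, not details given by the disclosure.

```python
import numpy as np

def key_area_from_prev_mask(prev_mask, margin=0.05):
    """Circumscribed box of the foreground pixels in the previous frame's
    segmentation mask, expanded by a relative margin (an assumed heuristic),
    returned as normalized (x1, y1, x2, y2) coordinates."""
    ys, xs = np.nonzero(prev_mask)      # coordinates of foreground pixels
    if xs.size == 0:                    # no object found: use the whole frame
        return (0.0, 0.0, 1.0, 1.0)
    h, w = prev_mask.shape
    x1, x2 = xs.min() / w, xs.max() / w
    y1, y2 = ys.min() / h, ys.max() / h
    return (max(0.0, x1 - margin), max(0.0, y1 - margin),
            min(1.0, x2 + margin), min(1.0, y2 + margin))
```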
For example, the position information of the key area may include the coordinates of a plurality of vertices of the key area. For example, the key area may be the image region inside a rectangular frame, and its position information includes the coordinates (x1, y1) of the top-left vertex and the coordinates (x2, y2) of the bottom-right vertex of the rectangular frame. The position information may also be represented as the rectangular box coordinates (x1, y1, x2, y2).
For example, the deformation operation performed on the image to be processed based on the position information of the key area may be used to enlarge the details of the key area and/or spatially compress the non-key areas. That is, each pixel feature in the image space of the image to be processed may be transformed into another image space, in which the key area is enlarged and the non-key areas are compressed. In practical applications, the deformation operation, including its algorithm logic and parameters, can be configured according to the application requirements to achieve the desired enlargement of the key area or compression of the non-key areas.
After the deformed image is obtained through the deformation operation, it is sent to a semantic segmentation network, which performs semantic segmentation on the deformed image to obtain the first semantic segmentation result. Performing the inverse operation of the deformation operation on the first semantic segmentation result restores it to the image space of the image to be processed, yielding an accurate second semantic segmentation result. Illustratively, the semantic segmentation network may be, for example, a U-Net.
It can be seen that the method of the embodiments of the present disclosure first determines the position information of the key area in the image to be processed and performs the deformation operation on the image based on this information. Because semantic segmentation is performed on the deformed image, and the resulting first semantic segmentation result is converted into the second semantic segmentation result through the inverse operation, the deformation operation can be configured to enlarge the details of the key area and spatially compress non-key areas, so that the limited computation budget of semantic segmentation is spent on the regions that need fine segmentation. Therefore, high-precision semantic segmentation can be achieved with a lightweight semantic segmentation model, which facilitates deploying the model on a mobile terminal.
In an alternative exemplary embodiment, in step S110, determining the position information of the key area in the image to be processed may include:
processing the image to be processed based on a plurality of convolutional networks to obtain a first feature map;
and obtaining the position information of the key area in the image to be processed based on the first feature map.
Illustratively, a spatial self-attention module may be provided in the semantic segmentation model. The spatial self-attention module includes a plurality of convolutional networks and a fully connected layer. The convolutional networks sequentially convolve the image to be processed to extract its features; the fully connected layer predicts and outputs the position information of the key area based on the first feature map obtained by the convolutions.
Fig. 2 illustrates a schematic diagram of an exemplary spatial self-attention module. As shown in FIG. 2, the spatial self-attention module 200 includes three convolutional networks and a fully connected layer. Each convolutional network may include at least one convolutional layer; for example, each convolutional network may include 3 residual network structures. Finally, the fully connected layer predicts and outputs the rectangular box coordinates (x1, y1, x2, y2) of the key area.
According to this embodiment, the convolutional networks extract feature information from the image to be processed, and the position information of the key area is then predicted from the corresponding feature map, which improves the prediction accuracy of the key area. Placing the key-area prediction step inside the model also enables end-to-end training of the whole model (from the image to be processed to the second semantic segmentation result), improving the training effect and yielding a better semantic segmentation model.
Illustratively, processing the image to be processed based on a plurality of convolutional networks to obtain the first feature map includes:
processing the image to be processed based on at least one first convolutional network with a first stride to obtain a second feature map;
processing the image to be processed based on at least one second convolutional network with a second stride to obtain the first feature map;
wherein the first stride is smaller than the second stride.
For example, a spatial self-attention module provided in the semantic segmentation model may include 1 first convolutional network and 2 second convolutional networks, each convolutional network including 3 residual network structures. The stride of the residual network structures of the first convolutional network is 1. In each second convolutional network, the stride of the first residual network structure is 2, and the strides of the remaining residual network structures may be 1.
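To make this structure concrete, below is a minimal PyTorch sketch of such a spatial self-attention module, assuming three convolutional networks of three residual blocks each and a fully connected layer that regresses the four rectangular-box coordinates. The channel widths and the sigmoid normalization of the output box are illustrative assumptions rather than details fixed by the disclosure.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """3x3 convolutional residual block; a 1x1 projection aligns the
    shortcut when the stride or the channel count changes."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, 1, stride, bias=False))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class SpatialSelfAttention(nn.Module):
    """Predicts the rectangular box (x1, y1, x2, y2) of the key area.
    The first convolutional network uses stride 1 throughout; in each
    second network only the first residual block uses stride 2."""
    def __init__(self, widths=(16, 32, 64)):
        super().__init__()
        def conv_net(in_ch, out_ch, first_stride):
            return nn.Sequential(
                ResidualBlock(in_ch, out_ch, first_stride),
                ResidualBlock(out_ch, out_ch, 1),
                ResidualBlock(out_ch, out_ch, 1))
        self.features = nn.Sequential(
            conv_net(3, widths[0], 1),           # first network, stride 1
            conv_net(widths[0], widths[1], 2),   # second networks, stride 2
            conv_net(widths[1], widths[2], 2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(widths[2], 4)        # fully connected layer

    def forward(self, x):
        # sigmoid keeps the box in normalized [0, 1] image coordinates
        return torch.sigmoid(self.fc(self.features(x)))
```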
According to this embodiment, convolutional networks with different strides are combined to extract the feature information of the image to be processed into the first feature map, from which the key area is then predicted. In this way, both prediction accuracy and efficiency can be taken into account, which helps keep the model lightweight and deployable on a mobile terminal.
In an alternative exemplary embodiment, the position information of the key area includes a plurality of vertex coordinates of the key area in the image to be processed. Accordingly, in step S120, performing the deformation operation on the image to be processed based on the position information of the key area to obtain the deformed image includes:
and taking the plurality of vertex coordinates as a plurality of control points, and executing non-uniform deformation operation on the image to be processed based on the plurality of control points to obtain a deformed image.
Non-uniform deformation may also be referred to herein as non-uniform sampling or non-rigid deformation.
Compared with linear deformation, non-uniform deformation offers greater configuration flexibility. If a deformation module that performs the non-uniform deformation operation is arranged in the semantic segmentation model and trained together with it, the detail enlargement of the key area and the spatial compression of non-key areas can be optimized during training. At the same time, end-to-end training of the whole model (from the image to be processed to the second semantic segmentation result) is realized, improving the training effect and yielding a better semantic segmentation model.
Illustratively, the non-uniform deformation operation includes TPS (Thin Plate Spline) interpolation. TPS is an interpolation method that models the deformation as the bending of a thin steel plate over a two-dimensional plane: it minimizes the bending energy of the plate while matching all control points as closely as possible. Using TPS, a deformed image with an accurate and invertible deformation can be obtained from the image to be processed, which improves the accuracy of the final semantic segmentation result.
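For reference, the following is a generic, self-contained sketch of the TPS mapping with the standard radial basis U(r) = r^2 log r^2; it fits a spline that sends source control points to target control points and then applies it to arbitrary query points. This is textbook TPS, not code taken from the disclosure.

```python
import numpy as np

def tps_map(src_pts, dst_pts, query):
    """Fit a thin plate spline sending src_pts -> dst_pts (arrays of
    shape (n, 2)) and apply it to query points of shape (m, 2)."""
    def U(r2):
        # TPS radial basis r^2 * log(r^2), with 0 * log(0) defined as 0
        return np.where(r2 > 0, r2 * np.log(np.maximum(r2, 1e-12)), 0.0)

    n = len(src_pts)
    K = U(((src_pts[:, None] - src_pts[None, :]) ** 2).sum(-1))
    P = np.hstack([np.ones((n, 1)), src_pts])
    A = np.block([[K, P], [P.T, np.zeros((3, 3))]])
    b = np.vstack([dst_pts, np.zeros((3, 2))])
    w = np.linalg.solve(A, b)        # spline weights plus affine part
    Kq = U(((query[:, None] - src_pts[None, :]) ** 2).sum(-1))
    Pq = np.hstack([np.ones((len(query), 1)), query])
    return Kq @ w[:n] + Pq @ w[n:]   # warped query coordinates
```

The invertibility used by the method then corresponds to fitting the spline in the opposite direction (dst_pts to src_pts), which is what allows the first semantic segmentation result to be mapped back to the original image space.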
In a specific application example, the image processing method can be implemented with an improved semantic segmentation model. Fig. 3 shows a schematic diagram of this semantic segmentation model. As shown in fig. 3, the model includes a spatial self-attention module 310, a TPS transform module 320, a semantic segmentation module 330, and an inverse TPS transform module 340. The spatial self-attention module 310 determines the key area in the image to be processed. The TPS transform module 320 performs the TPS operation on the image to be processed, treating the vertices of the rectangular frame of the key area as control points, thereby achieving spatial compression of non-key areas and detail enlargement of the key area, and obtaining the deformed image. The deformed image is input to the semantic segmentation module 330 to obtain the first semantic segmentation result. Finally, the inverse TPS transform module 340 performs the inverse TPS operation on the first semantic segmentation result to obtain the final, second semantic segmentation result.
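Under stated assumptions, the four modules could be chained as in the sketch below. Here `build_tps_grid` is a hypothetical helper that turns the predicted key-area box into a TPS sampling grid (it could be built on `tps_map` above), and the resampling itself uses `torch.nn.functional.grid_sample`; the disclosure does not prescribe this exact interface.

```python
import torch
import torch.nn.functional as F

class TPSSegmentationModel(torch.nn.Module):
    """Spatial self-attention -> TPS warp -> segmentation -> inverse TPS."""
    def __init__(self, attention, segmenter, build_tps_grid):
        super().__init__()
        self.attention = attention            # e.g. SpatialSelfAttention above
        self.segmenter = segmenter            # e.g. a lightweight U-Net
        self.build_tps_grid = build_tps_grid  # hypothetical grid builder

    def forward(self, image):
        box = self.attention(image)           # key-area box (x1, y1, x2, y2)
        # forward grid: enlarge the key area, compress non-key areas
        grid = self.build_tps_grid(box, image.shape, inverse=False)
        warped = F.grid_sample(image, grid, align_corners=True)
        logits = self.segmenter(warped)       # first semantic segmentation result
        # inverse grid restores the original image space
        inv_grid = self.build_tps_grid(box, logits.shape, inverse=True)
        return F.grid_sample(logits, inv_grid, align_corners=True)
```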
It can be seen that the method of the embodiment of the present disclosure may enable a limited amount of computation of semantic segmentation to be applied to the region that needs to be finely segmented. Therefore, the high-precision semantic segmentation capability can be realized by using the light-weight semantic segmentation model, and the semantic segmentation model is favorably deployed on the mobile terminal.
According to an embodiment of the present disclosure, there is also provided a model training method, as shown in fig. 4, the method including:
step S410, training a preset model to obtain a semantic segmentation model;
the semantic segmentation model is used for processing the image to be processed as follows:
determining the position information of a key area in an image to be processed;
performing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and performing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
Illustratively, the semantic segmentation model includes:
the spatial self-attention module is used for determining the position information of a key area in the image to be processed;
the spatial transformation module is used for executing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
the semantic segmentation module is used for performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and the space inverse transformation module is used for executing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
It can be understood that the preset model functions similarly to the semantic segmentation model. The preset model may perform the following processing on its input image:
determining position information of a key area in an input image;
performing deformation operation on the input image based on the position information of the key area to obtain a deformed image;
performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and performing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
The preset model and the semantic segmentation model have the same structure but different parameters. Training optimizes the parameters of the preset model, so the resulting semantic segmentation model achieves a better segmentation effect.
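A minimal sketch of this end-to-end training is given below. The per-pixel cross-entropy loss and the Adam optimizer are illustrative assumptions, since the disclosure does not specify them; the point is that the loss is computed on the second (restored) segmentation result, so the attention, deformation, and segmentation modules are optimized jointly.

```python
import torch

def train_preset_model(model, loader, epochs=10, lr=1e-3):
    """End-to-end training of the preset model (assumed loss/optimizer)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()   # per-pixel class labels
    model.train()
    for _ in range(epochs):
        for images, masks in loader:          # masks: (N, H, W) integer labels
            logits = model(images)            # second semantic segmentation result
            loss = criterion(logits, masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model                              # trained semantic segmentation model
```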
With this model training method, the obtained semantic segmentation model can execute the above image processing method, with the corresponding beneficial effects. In addition, because all modules for executing the image processing method are arranged inside the model, the whole model is trained as one unit, which helps obtain an end-to-end model and improves the overall effect.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
As an implementation of the above methods, the present disclosure also provides an image processing apparatus, which in one embodiment, as shown in fig. 5, includes:
a spatial self-attention module 510, configured to determine the position information of the key area in the image to be processed;
a spatial transformation module 520, configured to perform a deformation operation on the image to be processed based on the position information of the key area, so as to obtain a deformed image;
a semantic segmentation module 530, configured to perform semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and a spatial inverse transformation module 540, configured to perform an inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
In another embodiment, an image processing apparatus includes:
a spatial self-attention module 610, configured to determine the position information of the key area in the image to be processed;
a spatial transformation module 620, configured to perform a deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
a semantic segmentation module 630, configured to perform semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and a spatial inverse transformation module 640, configured to perform an inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
Illustratively, the position information of the key area includes a plurality of vertex coordinates of the key area in the image to be processed;
as shown in fig. 6, the spatial transformation module 620 includes:
a non-uniform deformation unit 621, configured to use the vertex coordinates as control points and perform a non-uniform deformation operation on the image to be processed based on the control points, so as to obtain the deformed image.
Illustratively, the non-uniform deformation operation comprises TPS. Accordingly, the inverse operation is an inverse TPS.
Illustratively, as shown in fig. 6, the spatial self-attention module 610 includes:
the convolution unit 611 is configured to process the image to be processed based on the plurality of convolution networks to obtain a first feature map;
and the fully connected layer 612 is used for obtaining the position information of the key area in the image to be processed based on the first feature map.
Exemplarily, the convolution unit 611 is specifically configured to:
processing the image to be processed based on at least one first convolutional network with a first stride to obtain a second feature map;
processing the image to be processed based on at least one second convolutional network with a second stride to obtain the first feature map;
wherein the first stride is smaller than the second stride.
The present disclosure also provides a model training device. As shown in fig. 7, the apparatus includes:
the training module 710 is configured to train a preset model to obtain a semantic segmentation model;
the semantic segmentation model is used for processing the image to be processed as follows:
determining the position information of a key area in an image to be processed;
performing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and performing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
The functions of each unit, module or sub-module in each apparatus in the embodiments of the present disclosure may refer to the corresponding description in the above method embodiments, and are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the electronic device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the respective methods and processes described above, such as the image processing method or the model training method. For example, in some embodiments, the image processing method or the model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the image processing method or the model training method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the image processing method or the model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described herein may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. An image processing method comprising:
determining the position information of a key area in an image to be processed;
performing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and executing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
2. The method according to claim 1, wherein the position information of the key area includes a plurality of vertex coordinates of the key area in the image to be processed;
the performing a deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image, including:
and taking the vertex coordinates as a plurality of control points, and executing non-uniform deformation operation on the image to be processed based on the control points to obtain the deformed image.
3. The method of claim 2, wherein the non-uniform deformation operation comprises thin plate spline interpolation.
4. The method according to any one of claims 1-3, wherein the determining position information of the key area in the image to be processed comprises:
processing the image to be processed based on a plurality of convolutional networks to obtain a first feature map;
and obtaining the position information of the key area in the image to be processed based on the first feature map.
5. The method of claim 4, wherein the processing the image to be processed based on the plurality of convolutional networks to obtain the first feature map comprises:
processing the image to be processed based on at least one first convolutional network with a first stride to obtain a second feature map;
processing the image to be processed based on at least one second convolutional network with a second stride to obtain the first feature map;
wherein the first stride is smaller than the second stride.
6. A model training method, comprising:
training a preset model to obtain a semantic segmentation model;
the semantic segmentation model is used for processing an image to be processed as follows:
determining the position information of a key area in an image to be processed;
performing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and executing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
7. An image processing apparatus comprising:
the spatial self-attention module is used for determining the position information of a key area in the image to be processed;
the spatial transformation module is used for executing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
the semantic segmentation module is used for performing semantic segmentation based on the deformation image to obtain a first semantic segmentation result;
and the space inverse transformation module is used for executing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
8. The apparatus according to claim 7, wherein the position information of the key area includes a plurality of vertex coordinates of the key area in the image to be processed;
the spatial transform module includes:
and the non-uniform deformation unit is used for taking the vertex coordinates as a plurality of control points and executing non-uniform deformation operation on the image to be processed based on the control points to obtain the deformed image.
9. The apparatus of claim 8, wherein the non-uniform deformation operation comprises thin plate spline interpolation.
10. The apparatus of any of claims 7-9, wherein the spatial self-attention module comprises:
the convolution unit is used for processing the image to be processed based on a plurality of convolutional networks to obtain a first feature map;
and the fully connected layer is used for obtaining the position information of the key area in the image to be processed based on the first feature map.
11. The apparatus of claim 10, wherein the convolution unit is specifically configured to:
processing the image to be processed based on at least one first convolutional network with a first stride to obtain a second feature map;
processing the image to be processed based on at least one second convolutional network with a second stride to obtain the first feature map;
wherein the first stride is smaller than the second stride.
12. A model training apparatus comprising:
the training module is used for training the preset model to obtain a semantic segmentation model;
the semantic segmentation model is used for processing an image to be processed as follows:
determining the position information of a key area in an image to be processed;
performing deformation operation on the image to be processed based on the position information of the key area to obtain a deformed image;
performing semantic segmentation based on the deformed image to obtain a first semantic segmentation result;
and executing the inverse operation of the deformation operation on the first semantic segmentation result to obtain a second semantic segmentation result.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202110719484.5A 2021-06-28 2021-06-28 Image processing method, model training device and electronic equipment Pending CN113393468A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110719484.5A CN113393468A (en) 2021-06-28 2021-06-28 Image processing method, model training device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113393468A 2021-09-14

Family

ID=77624188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110719484.5A Pending CN113393468A (en) 2021-06-28 2021-06-28 Image processing method, model training device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113393468A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839279A (en) * 2014-03-18 2014-06-04 湖州师范学院 Adhesion object segmentation method based on VIBE in object detection
CN110874563A (en) * 2018-09-04 2020-03-10 斯特拉德视觉公司 Method and apparatus for providing integrated feature maps through multiple image outputs of CNN
CN109271982A (en) * 2018-09-20 2019-01-25 西安艾润物联网技术服务有限责任公司 Multiple identification region recognition methods, identification terminal and readable storage medium storing program for executing
CN109492608A (en) * 2018-11-27 2019-03-19 腾讯科技(深圳)有限公司 Image partition method, device, computer equipment and storage medium
CN110223220A (en) * 2019-06-14 2019-09-10 北京百度网讯科技有限公司 A kind of method and apparatus handling image
CN111583099A (en) * 2020-04-14 2020-08-25 上海联影智能医疗科技有限公司 Image rectification method, computer device, and storage medium
CN111582148A (en) * 2020-05-06 2020-08-25 中南民族大学 Beijing opera character recognition method, equipment, storage medium and device
CN112101434A (en) * 2020-09-04 2020-12-18 河南大学 Infrared image weak and small target detection method based on improved YOLO v3

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920314A (en) * 2021-09-30 2022-01-11 北京百度网讯科技有限公司 Semantic segmentation and model training method, device, equipment and storage medium
CN113920314B (en) * 2021-09-30 2022-09-02 北京百度网讯科技有限公司 Semantic segmentation and model training method, device, equipment and storage medium
CN114255177A (en) * 2021-11-25 2022-03-29 北京百度网讯科技有限公司 Exposure control method, device, equipment and storage medium in imaging
CN114255177B (en) * 2021-11-25 2022-09-23 北京百度网讯科技有限公司 Exposure control method, device, equipment and storage medium in imaging
CN114298894A (en) * 2021-12-24 2022-04-08 完美世界(北京)软件科技发展有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN114612971A (en) * 2022-03-04 2022-06-10 北京百度网讯科技有限公司 Face detection method, model training method, electronic device, and program product

Similar Documents

Publication Publication Date Title
CN113393468A (en) Image processing method, model training device and electronic equipment
CN112634343A (en) Training method of image depth estimation model and processing method of image depth information
CN112990219A (en) Method and apparatus for image semantic segmentation
CN112967315B (en) Target tracking method and device and electronic equipment
CN115880435A (en) Image reconstruction method, model training method, device, electronic device and medium
CN112967381A (en) Three-dimensional reconstruction method, apparatus, and medium
CN113888410A (en) Image super-resolution method, apparatus, device, storage medium, and program product
CN111967297A (en) Semantic segmentation method and device for image, electronic equipment and medium
CN113538235A (en) Training method and device of image processing model, electronic equipment and storage medium
CN114549728A (en) Training method of image processing model, image processing method, device and medium
CN113870399A (en) Expression driving method and device, electronic equipment and storage medium
CN115797565A (en) Three-dimensional reconstruction model training method, three-dimensional reconstruction device and electronic equipment
CN113421335B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN114092708A (en) Characteristic image processing method and device and storage medium
CN113344213A (en) Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium
CN116452861A (en) Target model training method and device and electronic equipment
CN115760614A (en) Image denoising method and device, electronic equipment and storage medium
CN112929562B (en) Video jitter processing method, device, equipment and storage medium
CN113887435A (en) Face image processing method, device, equipment, storage medium and program product
CN114723796A (en) Three-dimensional point cloud generation method and device and electronic equipment
CN114282664A (en) Self-feedback model training method and device, road side equipment and cloud control platform
CN113610856A (en) Method and device for training image segmentation model and image segmentation
CN113920273A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113554550A (en) Training method and device of image processing model, electronic equipment and storage medium
CN116168442B (en) Sample image generation method, model training method and target detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination