CN115249221A - Image processing method and device and cloud equipment - Google Patents


Info

Publication number
CN115249221A
Authority
CN
China
Prior art keywords: image, style, depth, images, migrated
Prior art date
Legal status (assumed, not a legal conclusion; Google has not performed a legal analysis): Pending
Application number
CN202211161279.2A
Other languages
Chinese (zh)
Inventor
许鸿斌
周志鹏
孙佰贵
周畅
杨红霞
陈伟涛
Current Assignee (listing may be inaccurate; Google has not performed a legal analysis)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (assumed, not a legal conclusion)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202211161279.2A
Publication of CN115249221A

Classifications

    • G06T5/80
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06T3/04
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30244 Camera pose

Abstract

The application provides an image processing method, an image processing apparatus, and a cloud device. The image processing method includes: receiving an image to be migrated and a style image sent by a terminal device, where the image to be migrated includes a target object; inputting the image to be migrated and the style image into a pre-trained image style migration model for style migration processing to obtain a target image, where the target image includes the target object and the style of the target image is the style of the style image; and sending the target image to the terminal device. With the pre-trained image style migration model, the style of the style image can be quickly migrated onto the target object of the image to be migrated, yielding a target image with accurate detail.

Description

Image processing method and device and cloud equipment
Technical Field
The present application relates to the field of image processing, and in particular, to an image processing method and apparatus, and a cloud device.
Background
With the development of Internet technology, image processing techniques have become increasingly rich. Image style migration refers to changing the details and style of an image while preserving its main content, for example converting a portrait into a hand-drawn style or converting a photo into an animation style.
However, current image style conversion techniques are complex, and the resulting style-converted images suffer from substantial loss of detail.
Disclosure of Invention
Aspects of the present application provide an image processing method, an image processing apparatus, and a cloud device, so as to solve the problems that current image style conversion techniques are complex and the resulting style-converted images lose considerable detail.
A first aspect of the embodiments of the present application provides an image processing method, applied to a server, including: receiving an image to be migrated and a style image sent by a terminal device, where the image to be migrated includes a target object; inputting the image to be migrated and the style image into a pre-trained image style migration model for style migration processing to obtain a target image, where the target image includes the target object and the style of the target image is the style of the style image; and sending the target image to the terminal device.
A second aspect of the embodiments of the present application provides an image processing method, applied to a server, including: receiving a first remote sensing image and a second remote sensing image sent by a terminal device, where the first remote sensing image includes a target object; inputting the first remote sensing image and the second remote sensing image into a pre-trained image style migration model for style migration processing to obtain a target remote sensing image, where the target remote sensing image includes the target object and the style of the target remote sensing image is the style of the second remote sensing image; and sending the target remote sensing image to the terminal device.
A third aspect of the embodiments of the present application provides an image processing method, applied to a terminal device, including: acquiring a first remote sensing image and a second remote sensing image; sending the first remote sensing image and the second remote sensing image to a server so that the server performs style migration processing on them with a pre-trained image style migration model to obtain a target remote sensing image; and receiving the target remote sensing image sent by the server.
A fourth aspect of the embodiments of the present application provides an image processing apparatus, applied to a server, including:
a receiving module, configured to receive the image to be migrated and the style image sent by the terminal device, where the image to be migrated includes a target object;
a processing module, configured to input the image to be migrated and the style image into a pre-trained image style migration model for style migration processing to obtain a target image, where the target image includes the target object and the style of the target image is the style of the style image;
and a sending module, configured to send the target image to the terminal device.
A fifth aspect of the embodiments of the present application provides a cloud device, including: a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the image processing method of the first, second or third aspect when executing the computer program.
A sixth aspect of the embodiments of the present application provides a computer-readable storage medium in which computer-executable instructions are stored; when executed by a processor, the instructions implement the image processing method of the first, second, or third aspect.
A seventh aspect of the embodiments of the present application provides a computer program product, including: a computer program stored in a readable storage medium; at least one processor of an electronic device reads and executes the computer program, causing the electronic device to perform the image processing method of the first, second, or third aspect.
The embodiments of the present application apply to image style migration scenarios. The provided image processing method includes: receiving an image to be migrated and a style image sent by a terminal device, where the image to be migrated includes a target object; inputting the image to be migrated and the style image into a pre-trained image style migration model for style migration processing to obtain a target image, where the target image includes the target object and the style of the target image is the style of the style image; and sending the target image to the terminal device. With the pre-trained image style migration model, the style of the style image can be quickly migrated onto the target object of the image to be migrated, yielding a target image with accurate detail.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a view of an application scenario of an image processing method according to an exemplary embodiment of the present application;
FIG. 2 is a flowchart illustrating the steps of an image processing method according to an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of obtaining a depth image according to an exemplary embodiment of the present application;
FIG. 4 is a flowchart illustrating the steps of a model training method according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of image fusion provided by an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of distortion processing provided by an exemplary embodiment of the present application;
FIG. 7 is a block diagram of a depth image construction model according to an exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of a model training process provided by an exemplary embodiment of the present application;
FIG. 9 is a flowchart illustrating the steps of another image processing method according to an exemplary embodiment of the present application;
FIG. 10 is a flowchart illustrating the steps of yet another image processing method according to an exemplary embodiment of the present application;
fig. 11 is a block diagram of an image processing apparatus according to an exemplary embodiment of the present application;
fig. 12 is a schematic structural diagram of a cloud device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The image processing method provided by the present application includes: receiving an image to be migrated and a style image sent by a terminal device, where the image to be migrated includes a target object; inputting the image to be migrated and the style image into a pre-trained image style migration model for style migration processing to obtain a target image, where the target image includes the target object and the style of the target image is the style of the style image; and sending the target image to the terminal device. With the pre-trained image style migration model, the style of the style image can be quickly migrated onto the target object of the image to be migrated, yielding a target image with accurate detail.
In this embodiment, the image processing method may be implemented end to end by a cloud computing system; accordingly, the server executing the method may be a cloud server, so that the various neural network models can run on cloud resources. Alternatively, the method may also be applied to a server device such as a conventional server or a server array, which is not limited here.
The image processing method provided by this embodiment applies to image style migration scenarios. For example, referring to fig. 1, a user shoots a target object from different angles with a terminal device 11 to obtain images to be migrated I, selects a style image S1, and sends the images to be migrated I and the style image S1 to a server 12; the server processes the images to be migrated at the different angles with the trained image style migration model to obtain a target image P and returns it to the terminal device 11. In this way, the terminal device can conveniently and quickly obtain style-migrated target images. There may be one or more images I to be migrated.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 2 is a flowchart illustrating steps of an image processing method according to an exemplary embodiment of the present application. As shown in fig. 2, the image processing method is applied to a server, and specifically includes the following steps:
S201, receive the image to be migrated and the style image sent by the terminal device.
The image to be migrated includes the target object. Specifically, the terminal device, typically owned by a user, may acquire the image to be migrated through various channels: it may be downloaded from the Internet or shot with the terminal device's camera. The style image may likewise be downloaded from the Internet or shot with a camera. The image to be migrated and the style image differ in style.
In an optional embodiment, receiving the image to be migrated and the style image sent by the terminal device includes: receiving the image to be migrated and a target style type sent by the terminal device; sending, to the terminal device, a plurality of first candidate images that conform to the target style type; and receiving the style image sent by the terminal device, where the style image is selected by the user from the plurality of first candidate images.
The terminal device offers various style types for the user to select, such as sunny, rainy/foggy, and snowy. When the user selects a style type, it is taken as the target style type. The terminal device sends the image to be migrated and the target style type to the server, and the server searches its database for first candidate images that conform to the target style type; for example, if the target style type is sunny, the first candidate images may be sunny scenes of various kinds, such as scenes containing the sun or white clouds. The user may then select one of the first candidate images as the style image.
In an optional embodiment, receiving the image to be migrated and the style image sent by the terminal device includes: receiving the image to be migrated sent by the terminal device; sending a plurality of second candidate images to the terminal device according to the style type of the image to be migrated, where the style type of each second candidate image matches the style type of the image to be migrated; and receiving the style image sent by the terminal device, where the style image is selected by the user from the second candidate images.
In this embodiment, matching relations between style types can be stored in the server in advance, for example rainy/foggy matching sunny, and snowy matching sunny. After receiving the image to be migrated, the server determines its style type; if the style type is sunny, the matched style types are rainy and snowy, so the server selects from its database second candidate images that conform to rainy or snowy, sends them to the terminal device, and the user can select one of them as the style image through the terminal device. A minimal sketch of such a matching table is shown below.
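The following Python sketch illustrates how such server-side matching relations could be stored and queried; the style-type names and the function are illustrative assumptions, not taken from the patent:

```python
# Hypothetical matching relations between style types, stored server-side.
STYLE_MATCHES = {
    "sunny": ["rainy_foggy", "snowy"],
    "rainy_foggy": ["sunny"],
    "snowy": ["sunny"],
}

def matched_style_types(style_type: str) -> list[str]:
    """Return the style types matched with the detected style type of the image to be migrated."""
    return STYLE_MATCHES.get(style_type, [])
```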
S202, input the image to be migrated and the style image into a pre-trained image style migration model for style migration processing to obtain a target image.
The target image includes the target object, and the style of the target image is the style of the style image. Inside the image style migration model, a neural style migration technique extracts content information from the image to be migrated and style information from the style image, then fuses the two to obtain the target image.
Specifically, the neural style migration technique is an optimization technique that mixes the content information of the image to be migrated with the style information of the style image, so that the output target image keeps the content of the image to be migrated while adopting the style of the style image. A pre-trained convolutional network may be used to extract the content information from the image to be migrated and the style information from the style image.
In addition, the image style migration model is trained in advance. The trained model uses a Gram matrix to capture position-independent style information in the image to be migrated and the style image, removes the style information from the image to be migrated to obtain its content information, performs singular value decomposition on the Gram matrices corresponding to the content information of the image to be migrated and the style information of the style image respectively, and applies matrix transformations according to the decomposition results, thereby migrating the style information of the style image onto the content information of the image to be migrated.
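As a concrete illustration of this Gram-matrix/SVD step, the following PyTorch sketch applies a whitening-and-coloring transform over feature maps. The function name, the centering of features, and the exact use of the decomposition are assumptions; the patent does not give a precise formulation:

```python
import torch

def svd_style_transform(content_feat: torch.Tensor, style_feat: torch.Tensor) -> torch.Tensor:
    """Whiten content features using the SVD of their Gram matrix, then
    re-color them with the SVD of the style Gram matrix (a WCT-like sketch)."""
    c, h, w = content_feat.shape
    fc = content_feat.reshape(c, h * w)
    mc = fc.mean(dim=1, keepdim=True)
    fc = fc - mc
    gc = fc @ fc.t() / (h * w)                        # content Gram matrix
    u_c, s_c, _ = torch.linalg.svd(gc)
    whiten = u_c @ torch.diag((s_c + 1e-8).rsqrt()) @ u_c.t()

    fs = style_feat.reshape(style_feat.shape[0], -1)
    ms = fs.mean(dim=1, keepdim=True)
    fs = fs - ms
    gs = fs @ fs.t() / fs.shape[1]                    # style Gram matrix
    u_s, s_s, _ = torch.linalg.svd(gs)
    color = u_s @ torch.diag((s_s + 1e-8).sqrt()) @ u_s.t()

    # Strip the content image's own style statistics, then impose the style image's.
    out = color @ (whiten @ fc) + ms
    return out.reshape(c, h, w)
```

The Gram matrix discards spatial positions, which is why it captures the position-independent style information described above.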
The target image can be obtained by applying distortion (warping) processing to the fused image obtained by fusing the content information of the image to be migrated with the style information of the style image.
Specifically, the distortion processing is implemented by a Geometry Prediction Module (GPM), i.e., a pre-trained warping neural network. For example, a spatial propagation network (SPN) for dense image prediction can optimize the relationship between each point and its neighbors through depth estimation and semantic segmentation, yielding a more accurate and finer result.
With neural style migration alone, the fused image may locally lose some information, which would break the stereo-consistency relationship among images at multiple angles. The fused image is therefore subjected to distortion processing to restore the missing information, yielding the target image; a sketch of such a refinement step follows.
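The sketch below assumes a small residual CNN; the architecture and module internals are illustrative, since the patent does not specify the GPM's structure:

```python
import torch
import torch.nn as nn

class GeometryPredictionModule(nn.Module):
    """Illustrative GPM: predicts a residual correction for the fused image
    so that locally lost detail is restored (a sketch, not the patent's exact network)."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (N, 3, H, W) fused image; output: refined target image.
        return fused + self.net(fused)
```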
S203, send the target image to the terminal device.
After receiving the target image, the terminal device may further edit it, for example by taking screenshots or beautifying it. If the style migration result is unsatisfactory, the user can reselect a style image and migrate again.
In the embodiments of the present application, there may be a plurality of images to be migrated, and there may also be a plurality of style images.
Referring to fig. 3, a plurality of images I to be migrated may be input into a pre-trained depth image construction model to obtain a depth image D of the target object.
Optionally, when a plurality of images to be migrated showing the target object from different angles are received and a plurality of target images are correspondingly obtained, the image processing method further includes: acquiring the camera parameters used to shoot the images to be migrated; inputting the plurality of images to be migrated and the camera parameters into a pre-trained depth image construction model for image processing to obtain a depth image of the target object; and sending the depth image to the terminal device.
The images to be migrated show the target object from different angles. In this embodiment, the depth image construction model processes the plurality of images to be migrated to obtain a depth image at a given angle, i.e., the depth image corresponding to the target object.
In the embodiments of the present application, after style migration is performed on the plurality of images to be migrated, the resulting target images can likewise be processed by the depth image construction model to obtain their depth image. That depth image may in turn serve as an image to be migrated: selecting a style image and processing it per S201 to S203 yields an image that carries the depth information of the target object in the style of the style image.
The above steps can be combined in any way, returning various images to the user and improving the user experience.
A depth image is an image in which the distance from the camera to each point in the scene serves as the pixel value; it directly reflects the geometry of an object's visible surface. Through coordinate conversion, a depth image yields point cloud data, which can in turn be used for three-dimensional reconstruction of the object. Multi-angle images can be processed by a neural network model to obtain a depth image. At present, however, training such a neural network model depends on large image sets with annotated depth images, and acquiring those annotated sets takes a great deal of time, so the training efficiency of the neural network model is low.
Further, three-dimensional reconstruction based on multi-view stereo (MVS) has attracted wide attention. With the development of deep learning, a series of neural network models combining the traditional MVS pipeline with deep learning have appeared, such as MVSNet, R-MVSNet, and Cascade MVSNet. These models embed the stereo-geometry matching relationship into a cost volume through differentiable homography warping, realizing an end-to-end neural network. The input is any number of multi-view images together with camera parameters, and the output is the depth image at a reference view. Compared with traditional MVS, MVS combined with a neural network model reconstructs dense three-dimensional point clouds better and is more robust to weak textures and noise. However, training such a model relies on a large amount of annotation data (actual depth images corresponding to multiple images from different views) of three-dimensional scenes; since determining the annotation data is complex and time-consuming while training requires many samples, training is hindered and training efficiency is low.
Based on this, the present application provides a training method for the depth image construction model, which, referring to fig. 4, includes the following steps:
S401, acquire a first image set, an annotated depth image corresponding to the first image set, and a second image set.
The images in the first image set are first images of a first object at different angles, and the images in the second image set are second images of a second object at different angles.
In this embodiment, the first image set is obtained by shooting the first object with a camera from different angles. The annotated depth image is the depth image corresponding to the first image set; specifically, it may be the depth image of the first image at a certain angle, annotated in advance. The second image set is obtained by shooting the second object from different angles and carries no annotated depth image.
Further, the first object and the second object may be the same or different, and the camera that shoots the first image set and the camera that shoots the second image set may be the same or different. Moreover, the different first images may be captured by different cameras or by the same camera from different angles, which is not limited in this application.
In addition, since determining the annotated depth image of a first image set is complex, this embodiment selects a smaller proportion of first image sets and a larger proportion of second image sets for the subsequent training of the depth image construction model, thereby improving training efficiency. For example, the ratio of the number of first image sets to the number of second image sets is 1 to 9.
S402, for a first image in the first image set, fuse the content information of the first image with the style information of a second image in the second image set to obtain a third image in which the first object carries that style information.
The content information of the first image refers to the features in the first image that represent the first object; the style information of the second image refers to the features that represent the second image's style. Specifically, content information is extracted from the first image, style information is extracted from the second image, and the two are fused to obtain a third image whose content is that of the first image and whose style is that of the second image.
In addition, the present application may fuse each first image with a corresponding second image to obtain a third image. Illustratively, referring to fig. 5, suppose the plurality of images I to be migrated in fig. 1 form the first image set I, which includes the first images I1, I2, and I3, while the second image set S includes the second images S1, S2, and S3; the style image of fig. 1 may be one of the second images. Fusing the content information of I1 with the style information of S1 yields the third image P1 of the third image set P, fusing the content information of I2 with the style information of S2 yields the third image P2, and fusing the content information of I3 with the style information of S3 yields the third image P3, as sketched below.
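A minimal sketch of this pairwise fusion, assuming a callable style-migration model; the interface is an illustrative assumption:

```python
# Hypothetical interface: style_model(content_img, style_img) returns the fused image.
def build_third_image_set(first_images, second_images, style_model):
    """Fuse each first image I_k with its paired second image S_k to obtain P_k."""
    return [style_model(i_k, s_k) for i_k, s_k in zip(first_images, second_images)]
```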
Optionally, fusing, for a first image in the first image set, the content information of the first image with the style information of a second image in the second image set to obtain the third image includes: inputting the first image and the second image into a pre-trained image style migration model, extracting the content information of the first image and the style information of the second image within the model by a neural style migration technique, and fusing the content information and the style information to obtain the third image in which the first object carries the style information.
The neural style migration technique is an optimization technique that mixes the content information of the first image with the style information of the second image, so that the output image has the content of the first image but the style of the second image. A pre-trained convolutional network may be used to extract the content information from the first image and the style information from the second image.
In addition, the image style migration model is trained in advance. The trained model uses a Gram matrix to capture position-independent style information in the first and second images, removes the style information from the first image to obtain its content information, performs singular value decomposition on the Gram matrices corresponding to the content information of the first image and the style information of the second image respectively, and applies matrix transformations according to the decomposition results, thereby migrating the style information of the second image onto the content information of the first image.
The third image thus contains the content of the first image (e.g., the first object) and the style of the second image; see the third images of the third image set P in fig. 5.
Further, fusing the content information and the style information to obtain the third image includes: fusing the content information and the style information to obtain an intermediate image; and applying distortion processing to the intermediate image to obtain the third image.
The distortion processing is implemented by a Geometry Prediction Module (GPM), i.e., a pre-trained warping neural network. For example, a spatial propagation network (SPN) for dense image prediction can optimize the relationship between each point and its neighbors through depth estimation and semantic segmentation, yielding a more accurate and finer result.
With neural style migration alone, the intermediate image locally loses considerable information, breaking the stereo-consistency relationship among the images at multiple angles. The intermediate image is therefore subjected to distortion processing to restore the missing information, yielding the third image. Illustratively, referring to fig. 6 and fig. 5, the intermediate image Z3 is obtained by processing the first image I3 and the second image S3 with the neural style migration technique, and the enlarged image Zf3 is a close-up of the white frame in Z3. The third image P3 is obtained by applying distortion processing to Z3, and the enlarged image Pf3 is a close-up of the white frame in P3; it shows that the distortion-processed third image P3 has clearer texture than the intermediate image Z3. Further, the depth image Zd3 is the depth image corresponding to the intermediate images Z1, Z2, and Z3, where Z1 yields the third image P1 after distortion processing and Z2 yields the third image P2. Comparing with the depth image Pd3 corresponding to the third images P1, P2, and P3, it can be seen that the depth image constructed from the distortion-processed third images contains more depth information, further confirming that distortion processing yields higher-quality third images.
S403, use the plurality of third images as training samples and the annotated depth image as the label image to train the depth image construction model, obtaining the trained depth image construction model.
The depth image construction model adopts a backbone network model, specifically an MVS model.
During training, the plurality of third images are input into the depth image construction model to obtain a predicted depth image, and the loss between the predicted depth image and the annotated depth image is calculated. If the loss is greater than or equal to a first loss threshold, the model is adjusted with it; if the loss is below the first loss threshold, training of the depth image construction model is complete. A minimal training loop along these lines follows.
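The sketch below assumes PyTorch, an L1 depth loss, and a model callable that maps images and camera parameters to a predicted depth map; the loss choice and hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def train_depth_model(model, third_images, labeled_depth, cam_params,
                      loss_threshold=0.05, lr=1e-4, max_steps=10_000):
    """Adjust the depth image construction model until the loss drops below the threshold."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_steps):
        pred_depth = model(third_images, cam_params)   # predicted depth image
        loss = F.l1_loss(pred_depth, labeled_depth)    # loss vs. the annotated depth image
        if loss.item() < loss_threshold:               # below the first loss threshold: done
            break
        optimizer.zero_grad()
        loss.backward()                                # adjust model parameters with the loss
        optimizer.step()
    return model
```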
Further, there are a plurality of first image sets and a plurality of second image sets, and the above steps produce a plurality of third image sets, each containing a plurality of third images. This augments the training samples (the third image sets): with more training samples, the trained depth image construction model is more robust, and no large image sets need to be annotated manually, so the training efficiency of the depth image construction model improves.
Optionally, training the depth image construction model with the plurality of third images as training samples and the annotated depth image as the label image to obtain the trained model includes: inputting the first parameters of the camera that shot the first image set, together with the plurality of third images, into the depth image construction model for image processing to obtain a first predicted depth image; and adjusting the model parameters of the depth image construction model according to the loss between the first predicted depth image and the annotated depth image to obtain the trained model.
The depth image construction model is a backbone network, which may be any MVS network, for example MVSNet or CasMVSNet. Its structure, referring to fig. 7, includes a first convolutional network layer, a differentiable homography projection layer, and a three-dimensional convolutional network layer. The first convolutional network layer extracts features from the input image set using the input camera parameters to obtain feature images; the differentiable homography projection layer embeds the stereo-geometry matching relationship of the feature images into a cost volume; the cost volume is regularized by the three-dimensional convolutional network layer to obtain a probability volume; and the probability volume is regressed to obtain the predicted depth image, as sketched below.
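A condensed PyTorch sketch of this fig. 7 pipeline follows. The homography warp is delegated to an assumed helper `warp_fn`, and the layer sizes, variance-based cost metric, and expectation-based depth regression are illustrative assumptions in the style of MVSNet:

```python
import torch
import torch.nn as nn

class DepthConstructionModel(nn.Module):
    """Sketch: 2D feature extraction -> homography warping into a cost volume
    -> 3D CNN regularization -> probability volume -> regressed depth image."""
    def __init__(self, feat_ch: int = 32, num_depths: int = 64):
        super().__init__()
        self.num_depths = num_depths
        self.feature_net = nn.Sequential(              # first convolutional network layer
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )
        self.regularizer = nn.Sequential(              # three-dimensional convolutional network layer
            nn.Conv3d(feat_ch, 8, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(8, 1, 3, padding=1),
        )

    def forward(self, images, cams, depth_values, warp_fn):
        # images: list of (N, 3, H, W) views; images[0] is the reference view.
        feats = [self.feature_net(img) for img in images]
        # Differentiable homography (assumed helper): warp each source feature map
        # onto the reference view for every depth hypothesis -> (N, C, D, H, W).
        volumes = [feats[0].unsqueeze(2).expand(-1, -1, self.num_depths, -1, -1)]
        volumes += [warp_fn(f, cams[0], cam, depth_values)
                    for f, cam in zip(feats[1:], cams[1:])]
        cost = torch.stack(volumes).var(dim=0)          # variance-based cost volume
        prob = torch.softmax(self.regularizer(cost).squeeze(1), dim=1)  # probability volume
        # Depth regression: expectation over the depth hypotheses.
        depth = (prob * depth_values.view(1, -1, 1, 1)).sum(dim=1)
        return depth, prob
```

Returning the probability volume alongside the depth is convenient for the consistency loss described below.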
In this embodiment, the first parameters are the camera parameters corresponding to each first image of the first image set, including camera intrinsics and extrinsics. Referring to fig. 8, the plurality of third images serve as the input image set and the first parameters as the input camera parameters of the depth image construction model, and the resulting predicted depth image is the first predicted depth image.
Referring to fig. 8, during training the third image set (the plurality of third images) and the first parameters are input into the depth image construction model, which outputs the first predicted depth image. A first loss between the first predicted depth image and the annotated depth image is calculated; this loss represents the style consistency of the two. If the first loss is greater than or equal to the first loss threshold, the depth image construction model is adjusted with it.
Further, the method also includes the following steps: applying photometric distortion to the second images in the second image set to obtain an enhanced image set; inputting the second image set and the second parameters of the camera that shot the second image set into the depth image construction model for image processing to obtain a first probability volume; inputting the enhanced image set and the second parameters into the depth image construction model for image processing to obtain a second probability volume; and adjusting the model parameters of the depth image construction model according to the loss between the first probability volume and the second probability volume.
The photometric distortion includes adjusting the brightness, contrast, hue, and saturation of the second images or adding noise, yielding the enhanced image set; the enhanced image set contains a plurality of fourth images, each of which is the photometrically distorted version of the corresponding second image.
Referring to fig. 7 and fig. 8, the first probability volume is obtained by processing the second image set with the depth image construction model, stopping before the depth regression; the second parameters include camera intrinsics and extrinsics. The second probability volume is obtained in the same way from the enhanced image set. A fourth loss between the first and second probability volumes, which may also be a consistency loss, is calculated; if the fourth loss is greater than or equal to a fourth loss threshold, the depth image construction model is adjusted with it. Kullback-Leibler (KL) divergence may be used to determine the fourth loss, as sketched below.
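A minimal sketch of this augmentation-consistency loss, reusing the DepthConstructionModel sketch above and torchvision's ColorJitter for the photometric distortion; the jitter ranges and noise level are illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision.transforms import ColorJitter

# Assumed photometric distortion: jitter brightness/contrast/saturation/hue, add noise.
jitter = ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)

def photometric_distort(images):
    return [(jitter(img) + 0.01 * torch.randn_like(img)).clamp(0.0, 1.0) for img in images]

def consistency_loss(model, second_images, cams, depth_values, warp_fn):
    """Fourth loss: KL divergence between the probability volumes of the original
    second image set and its photometrically distorted (enhanced) version."""
    _, prob_orig = model(second_images, cams, depth_values, warp_fn)
    _, prob_aug = model(photometric_distort(second_images), cams, depth_values, warp_fn)
    # KL over the depth dimension of the probability volumes.
    return F.kl_div(prob_aug.clamp_min(1e-8).log(), prob_orig, reduction="batchmean")
```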
Further, the second image set and the second images can be used to train the depth image construction model. Referring to fig. 8, the second image set is input into the depth image construction model to obtain a third predicted depth image, and a third loss between the third predicted depth image and the second images is calculated; if the third loss is greater than or equal to a third loss threshold, the model is adjusted with the third loss. The third loss is a reprojection loss.
In this embodiment, training the depth image construction model with the second image set and the enhanced image set is unsupervised training, which can improve the completeness of the trained model.
In addition, the first image set and the annotated depth image can be used to train the depth image construction model. Referring to fig. 8, the first image set is input into the model to obtain a second predicted depth image, and a second loss between the second predicted depth image and the annotated depth image is calculated; if the second loss is greater than or equal to a second loss threshold, the model is adjusted with the second loss. The second loss may be a covariance loss. Moreover, after the trained depth image construction model is obtained, it can be embedded into a three-dimensional reconstruction framework to realize full-pipeline three-dimensional scene reconstruction.
In this embodiment, training the depth image construction model with the first image set and the annotated depth image is supervised training, which can improve the accuracy of the trained model.
In summary, in this embodiment, fusing the content information of the first image with the style information of the second image via neural style migration augments and expands the training samples, improving the training efficiency of the depth image construction model. Further, distortion processing of the intermediate images yields higher-quality third images for training. In addition, combining supervised loss, unsupervised loss, and style loss improves the completeness and accuracy of the trained depth image construction model, and hence its robustness.
Fig. 9 is a flowchart of the steps of another image processing method provided by an exemplary embodiment of the present application, applied to a server. As shown in fig. 9, the method includes the following steps:
S901, receive the first remote sensing image and the second remote sensing image sent by the terminal device.
The first remote sensing image includes the target object. The first remote sensing image serves as the image to be migrated, and the second remote sensing image serves as the style image.
S902, input the first remote sensing image and the second remote sensing image into a pre-trained image style migration model for style migration processing to obtain a target remote sensing image.
The target remote sensing image includes the target object, and the style of the target remote sensing image is the style of the second remote sensing image.
Further, the image style migration model can be trained in advance with remote sensing images.
S903, send the target remote sensing image to the terminal device.
Further, when a plurality of first remote sensing images showing the target object from different angles are received, the image processing method further includes: acquiring the camera parameters used to shoot the first remote sensing images; inputting the plurality of first remote sensing images and the camera parameters into a pre-trained depth image construction model for image processing to obtain a first depth image of the target object; inputting the first depth image and the second remote sensing image into the image style migration model for style migration processing to obtain a second depth image, where the second depth image contains the depth information of the target object and carries the style of the second remote sensing image; and sending the second depth image to the terminal device.
In this embodiment, the pre-trained image style migration model can process remote sensing images to realize style migration of remote sensing images. For the specific implementation, refer to the description above, which is not repeated here.
Fig. 10 is a flowchart of the steps of a further image processing method provided by an exemplary embodiment of the present application, applied to a terminal device. As shown in fig. 10, the method includes the following steps:
S101, acquire a first remote sensing image and a second remote sensing image.
The terminal device can control a remote sensing camera to move or rotate, thereby shooting first remote sensing images of the target object from different angles. The second remote sensing image may likewise be shot or selected.
S102, send the first remote sensing image and the second remote sensing image to the server.
The server performs style migration processing on the first and second remote sensing images with a pre-trained image style migration model to obtain the target remote sensing image. During server processing, the first remote sensing image is taken as the image to be migrated and the second remote sensing image as the style image.
S103, receive the target remote sensing image sent by the server.
Further, the server determines a target depth image from the plurality of first remote sensing images and the camera parameters using a depth image construction model obtained with the model training method above.
The terminal device sends the collected first remote sensing images and camera parameters to the server; the server inputs them into the depth image construction model trained as described above, which outputs a target depth image, and the target depth image is then sent back to the terminal device.
Further, the terminal device receives the target depth image; since it contains the depth information of each pixel, three-dimensional reconstruction of the target object can be realized.
With the highly robust depth image construction model embedded into the service's three-dimensional reconstruction framework, the plurality of images sent by the terminal device can be processed efficiently and accurately, yielding a high-quality target depth image.
In the embodiments of the present application, referring to fig. 11, in addition to the image processing methods, an image processing apparatus 110 is provided. The image processing apparatus 110 includes:
a receiving module 111, configured to receive the image to be migrated and the style image sent by the terminal device, where the image to be migrated includes a target object;
a processing module 112, configured to input the image to be migrated and the style image into a pre-trained image style migration model for style migration processing to obtain a target image, where the target image includes the target object and the style of the target image is the style of the style image;
a sending module 113, configured to send the target image to the terminal device.
In an optional embodiment, the receiving module 111 is specifically configured to: receive the image to be migrated and a target style type sent by the terminal device; send, to the terminal device, a plurality of first candidate images that conform to the target style type; and receive the style image sent by the terminal device, where the style image is selected by the user from the plurality of first candidate images.
In an optional embodiment, the receiving module 111 is specifically configured to: receive the image to be migrated sent by the terminal device; send a plurality of second candidate images to the terminal device according to the style type of the image to be migrated, where the style type of each second candidate image matches the style type of the image to be migrated; and receive the style image sent by the terminal device, where the style image is selected by the user from the second candidate images.
In an optional embodiment, when a plurality of images to be migrated showing the target object from different angles are received and a plurality of target images are correspondingly obtained, the image processing apparatus 110 further includes a depth determination module (not shown), specifically configured to: acquire the camera parameters used to shoot the images to be migrated; input the plurality of images to be migrated and the camera parameters into a pre-trained depth image construction model for image processing to obtain a depth image of the target object; and send the depth image to the terminal device.
In an optional embodiment, the image processing apparatus 110 further includes a training module (not shown), specifically configured to: acquire a first image set, an annotated depth image corresponding to the first image set, and a second image set, where the images in the first image set are first images of a first object at different angles and the images in the second image set are second images of a second object at different angles; for a first image in the first image set, fuse the content information of the first image with the style information of a second image in the second image set to obtain a third image in which the first object carries that style information; and use the plurality of third images as training samples and the annotated depth image as the label image to train the depth image construction model, obtaining the trained depth image construction model.
In an optional embodiment, when fusing the content information of a first image in the first image set with the style information of a second image in the second image set to obtain the third image, the training module is specifically configured to: input the first image and the second image into the image style migration model, extract the content information of the first image and the style information of the second image within the model by a neural style migration technique, and fuse the content information and the style information to obtain the third image.
In an optional embodiment, when fusing the content information and the style information to obtain the third image, the training module is specifically configured to: fuse the content information and the style information to obtain an intermediate image; and apply distortion processing to the intermediate image to obtain the third image.
In an optional embodiment, when training the depth image construction model with the plurality of third images as training samples and the annotated depth image as the label image, the training module is specifically configured to: input the first parameters of the camera that shot the first image set, together with the plurality of third images, into the depth image construction model for image processing to obtain a first predicted depth image; and adjust the model parameters of the depth image construction model according to the loss between the first predicted depth image and the annotated depth image to obtain the trained model.
In an optional embodiment, the training module is further configured to: apply photometric distortion to the second images in the second image set to obtain an enhanced image set; input the second image set and the second parameters of the camera that shot the second image set into the depth image construction model for image processing to obtain a first probability volume; input the enhanced image set and the second parameters into the depth image construction model for image processing to obtain a second probability volume; and adjust the model parameters of the depth image construction model according to the loss between the first probability volume and the second probability volume.
In the embodiments of the present application, another image processing apparatus (not shown) applied to a server is also provided, configured to: receive a first remote sensing image and a second remote sensing image sent by a terminal device, where the first remote sensing image includes a target object; input the first remote sensing image and the second remote sensing image into a pre-trained image style migration model for style migration processing to obtain a target remote sensing image, where the target remote sensing image includes the target object and the style of the target remote sensing image is the style of the second remote sensing image; and send the target remote sensing image to the terminal device.
In an optional embodiment, when a plurality of first remote sensing images showing the target object from different angles are received, the image processing apparatus is further configured to: acquire the camera parameters used to shoot the first remote sensing images; input the plurality of first remote sensing images and the camera parameters into a pre-trained depth image construction model for image processing to obtain a first depth image of the target object; input the first depth image and the second remote sensing image into the image style migration model for style migration processing to obtain a second depth image, where the second depth image contains the depth information of the target object and carries the style of the second remote sensing image; and send the second depth image to the terminal device.
In the embodiments of the present application, a further image processing apparatus (not shown) applied to a terminal device is provided, configured to: acquire a first remote sensing image and a second remote sensing image; send the first remote sensing image and the second remote sensing image to a server so that the server performs style migration processing on them with a pre-trained image style migration model to obtain a target remote sensing image; and receive the target remote sensing image sent by the server.
In addition, some of the flows described in the above embodiments and drawings include operations in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or in parallel; the sequence numbers merely distinguish the operations and do not represent any execution order. The flows may also include more or fewer operations, which may be executed sequentially or in parallel. Note that the terms "first", "second", etc. herein distinguish different messages, devices, modules, and so on; they do not represent an order, nor do they require that "first" and "second" be of different types.
Fig. 12 is a schematic structural diagram of a cloud device 120 according to an exemplary embodiment of the present application. The cloud device 120 is configured to execute the image processing method. As shown in fig. 12, the cloud device includes: a memory 124 and a processor 125.
The memory 124 is used for storing a computer program and may be configured to store various other information to support operations on the cloud device. The memory 124 may be an Object Storage Service (OSS).
The memory 124 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The processor 125, coupled to the memory 124, executes the computer program in the memory 124 so as to: receive an image to be migrated and a style image sent by a terminal device, where the image to be migrated comprises a target object; input the image to be migrated and the style image into a pre-trained image style migration model for style migration processing to obtain a target image, where the target image comprises the target object and the style of the target image is the style of the style image; and send the target image to the terminal device.
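By way of illustration only, the style migration step performed by the processor 125 may be sketched in Python (PyTorch) as follows. The two-input model interface, the tensor shapes, and the helper name run_style_migration are assumptions made for this sketch, not a definitive description of the model above.

import torch

def run_style_migration(model: torch.nn.Module,
                        image_to_migrate: torch.Tensor,
                        style_image: torch.Tensor) -> torch.Tensor:
    # Assumed interface: the model takes a content image and a style
    # image, each of shape (1, 3, H, W) with values in [0, 1], and
    # returns a target image carrying the content of the first input
    # and the style of the second.
    model.eval()
    with torch.no_grad():
        target_image = model(image_to_migrate, style_image)
    # Clamp to the valid range before encoding and sending the target
    # image back to the terminal device.
    return target_image.clamp(0.0, 1.0)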
Further optionally, when receiving the image to be migrated and the style image sent by the terminal device, the processor 125 is specifically configured to: receive the image to be migrated and a target style type sent by the terminal device; send a plurality of first candidate images conforming to the target style type to the terminal device; and receive the style image sent by the terminal device, where the style image is selected by the user from the plurality of first candidate images.
Further optionally, when receiving the image to be migrated and the style image sent by the terminal device, the processor 125 is specifically configured to: receive the image to be migrated sent by the terminal device; send a plurality of second candidate images to the terminal device according to the style type of the image to be migrated, where the style types of the second candidate images match the style type of the image to be migrated; and receive the style image sent by the terminal device, where the style image is selected by the user from the plurality of second candidate images.
Further optionally, the processor 125 is further configured to receive a plurality of images to be migrated of the target object from different angles and correspondingly obtain a plurality of target images, and further to: acquire camera parameters of the camera used to capture the images to be migrated; input the plurality of images to be migrated and the camera parameters into a depth image construction model trained in advance for image processing to obtain a depth image of the target object; and send the depth image to the terminal device.
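A minimal sketch of the depth image construction step follows, assuming an MVSNet-style model that takes the multi-view images together with per-view intrinsic and extrinsic camera matrices; the argument layout and shapes are assumptions for illustration.

import torch

def construct_depth_image(depth_model: torch.nn.Module,
                          views: torch.Tensor,
                          intrinsics: torch.Tensor,
                          extrinsics: torch.Tensor) -> torch.Tensor:
    # `views` holds N images of the same target object from different
    # angles, shape (1, N, 3, H, W); `intrinsics` is (1, N, 3, 3) and
    # `extrinsics` is (1, N, 4, 4). The model regresses one depth map
    # for the reference view.
    depth_model.eval()
    with torch.no_grad():
        depth_image = depth_model(views, intrinsics, extrinsics)
    return depth_image  # assumed shape (1, H, W)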
Further optionally, the processor 125 is further configured to: acquire a first image set, together with an annotated depth image and a second image set corresponding to the first image set, where the first image set comprises first images of a first object taken from different angles and the second image set comprises second images of a second object taken from different angles; for a first image in the first image set, fuse the content information of the first image with the style information of a second image in the second image set to obtain a third image of the first object carrying the style information; and train the depth image construction model using a plurality of the third images as training samples and the annotated depth image as the label image, to obtain the trained depth image construction model.
Further optionally, when fusing the content information of a first image in the first image set with the style information of a second image in the second image set to obtain a third image, the processor 125 is specifically configured to: input the first image and the second image into the pre-trained image style migration model, extract the content information of the first image and the style information of the second image using a neural style transfer technique in the image style migration model, and fuse the content information with the style information to obtain the third image of the first object carrying the style information.
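Neural style transfer of this kind is commonly implemented by matching deep feature statistics: content is compared directly in feature space, while style is compared through Gram matrices of the same features. The sketch below assumes feature lists already extracted by a fixed encoder such as VGG; the layer choice, the function names, and the weight 1e4 are illustrative assumptions.

import torch
import torch.nn.functional as F

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # Style information is summarized by channel-wise correlations of
    # the feature maps, independent of spatial layout.
    b, c, h, w = features.shape
    flat = features.reshape(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def nst_loss(gen_feats, content_feats, style_feats, style_weight=1e4):
    # Content: match the deepest features of the first image.
    content_loss = F.mse_loss(gen_feats[-1], content_feats[-1])
    # Style: match Gram matrices of the second image at every layer.
    style_loss = sum(F.mse_loss(gram_matrix(g), gram_matrix(s))
                     for g, s in zip(gen_feats, style_feats))
    return content_loss + style_weight * style_loss

Minimizing this objective over the pixels of the generated image fuses the content information of the first image with the style information of the second image.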
Further optionally, when fusing the content information with the style information to obtain the third image, the processor 125 is specifically configured to: fuse the content information and the style information to obtain an intermediate image; and perform distortion processing on the intermediate image to obtain the third image.
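The distortion step is not pinned down above; one plausible reading is a random geometric warp applied to the fused intermediate image to diversify the training samples. The following sketch assumes a perspective distortion from torchvision; the distortion family, its strength, and the helper name make_third_image are assumptions.

import torchvision.transforms as T

# Assumed distortion: a random perspective warp, applied with
# probability 1 so that every intermediate image is distorted.
distort = T.RandomPerspective(distortion_scale=0.3, p=1.0)

def make_third_image(intermediate_image):
    # `intermediate_image` is the fused content+style result
    # (a PIL image or a (C, H, W) tensor).
    return distort(intermediate_image)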
Further optionally, when training the depth image construction model with the plurality of third images as training samples and the annotated depth image as the label image, the processor 125 is specifically configured to: input a first parameter of the camera used to capture the first image set, together with the plurality of third images, into the depth image construction model for image processing to obtain a first predicted depth image; and adjust the model parameters of the depth image construction model according to a loss value between the first predicted depth image and the annotated depth image, to obtain the trained depth image construction model.
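One supervised training step might then look as follows; the loss between the first predicted depth image and the annotated depth image is taken here to be smooth L1, which is an assumption since the loss function is not specified above.

import torch.nn.functional as F

def supervised_depth_step(depth_model, optimizer,
                          third_images, first_parameter, labeled_depth):
    # Forward pass: the third images plus the first camera parameter
    # give a first predicted depth image.
    predicted_depth = depth_model(third_images, first_parameter)
    # Loss value between prediction and annotation (assumed smooth L1).
    loss = F.smooth_l1_loss(predicted_depth, labeled_depth)
    # Adjust the model parameters of the depth image construction model.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()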
Further optionally, the processor 125 is further configured to: perform photometric distortion processing on the second images in the second image set to obtain an enhanced image set; input the second image set and a second parameter of the camera used to capture the second image set into the depth image construction model for image processing to obtain a first probability volume; input the enhanced image set and the second parameter into the depth image construction model for image processing to obtain a second probability volume; and adjust the model parameters of the depth image construction model according to a loss value between the first probability volume and the second probability volume.
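The probability volumes here assign, per pixel, a distribution over depth hypotheses; the paragraph fixes neither the loss between them nor the model interface. A sketch of this consistency step follows, assuming a KL divergence between the two volumes and a return_prob keyword on the model, both of which are assumptions.

import torch.nn.functional as F

def consistency_step(depth_model, optimizer, second_images,
                     second_parameter, photometric_distort):
    # Enhanced image set: photometrically distorted copies of the
    # second images (e.g. brightness/contrast/hue jitter).
    enhanced = photometric_distort(second_images)
    # First and second probability volumes over the depth hypotheses
    # (`return_prob` is an assumed flag on the model).
    prob_vol_1 = depth_model(second_images, second_parameter, return_prob=True)
    prob_vol_2 = depth_model(enhanced, second_parameter, return_prob=True)
    # Distorted views should yield the same depth distribution as the
    # originals; penalize the divergence (assumed KL loss).
    loss = F.kl_div((prob_vol_2 + 1e-8).log(), prob_vol_1.detach(),
                    reduction='batchmean')
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()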
In an alternative embodiment, the processor 125, coupled to the memory 124, executes the computer program in the memory 124 to: receive a first remote sensing image and a second remote sensing image sent by a terminal device, where the first remote sensing image comprises a target object; input the first remote sensing image and the second remote sensing image into a pre-trained image style migration model for style migration processing to obtain a target remote sensing image, where the target remote sensing image comprises the target object and the style of the target remote sensing image is the style of the second remote sensing image; and send the target remote sensing image to the terminal device.
In an alternative embodiment, the processor 125, coupled to the memory 124, executes the computer program in the memory 124 to: acquire a first remote sensing image and a second remote sensing image; send the first remote sensing image and the second remote sensing image to a server, so that the server performs style migration processing on them using a pre-trained image style migration model to obtain a target remote sensing image; and receive the target remote sensing image sent by the server.
Further, as shown in fig. 12, the cloud device also includes a firewall 121, a load balancer 122, a communication component 126, a power component 123, and other components. Only some components are shown schematically in fig. 12; this does not mean that the cloud device includes only the components shown.
The cloud device provided by the embodiments of the present application can quickly and efficiently train the depth image construction model, so that a depth image can be determined from images taken at different viewing angles.
Accordingly, the present application also provides a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to implement the steps of the above-mentioned method.
Accordingly, embodiments of the present application also provide a computer program product comprising computer programs/instructions which, when executed by a processor, cause the processor to implement the steps of the methods described above.
The communication component in fig. 12 is configured to facilitate wired or wireless communication between the device in which it is located and other devices. That device can access a wireless network based on a communication standard, such as WiFi, a 2G, 3G, 4G/LTE, or 5G mobile communication network, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component in fig. 12 provides power to the various components of the device in which it is located, and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for that device.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed via the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed thereon to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the present application pertains. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (12)

1. An image processing method applied to a server, the image processing method comprising:
receiving an image to be migrated and a style image sent by terminal equipment, wherein the image to be migrated comprises a target object;
acquiring camera parameters for shooting the image to be migrated;
inputting the image to be migrated and the camera parameters into a depth image construction model trained in advance for image processing to obtain a depth image of the target object;
inputting the depth image and the style image into a pre-trained image style migration model for style migration processing to obtain a target image, wherein the target image comprises the target object, and the style of the target image is the style of the style image;
and sending the target image to the terminal equipment.
2. The image processing method according to claim 1, wherein the receiving of the image to be migrated and the style image sent by the terminal equipment comprises:
receiving the image to be migrated and a target style type sent by the terminal equipment;
sending, to the terminal equipment, a plurality of first candidate images conforming to the target style type;
and receiving the style image sent by the terminal equipment, wherein the style image is selected by the user from the plurality of first candidate images.
3. The image processing method according to claim 1, wherein the receiving of the image to be migrated and the style image sent by the terminal equipment comprises:
receiving the image to be migrated sent by the terminal equipment;
sending a plurality of second candidate images to the terminal equipment according to the style type of the image to be migrated, wherein the style types of the second candidate images match the style type of the image to be migrated;
and receiving the style image sent by the terminal equipment, wherein the style image is selected by the user from the plurality of second candidate images.
4. An image processing method according to any one of claims 1 to 3, wherein the depth image construction model is trained in the following way:
acquiring a first image set, together with an annotated depth image and a second image set corresponding to the first image set, wherein the first image set comprises first images of a first object at different angles and the second image set comprises second images of a second object at different angles;
for a first image in the first image set, performing fusion processing on content information of the first image and style information of a second image in the second image set to obtain a third image of the first object carrying the style information;
and training the depth image construction model by using a plurality of the third images as training samples and the annotated depth image as a label image, to obtain a trained depth image construction model.
5. The image processing method according to claim 4, wherein, for a first image in the first image set, the performing of fusion processing on the content information of the first image and the style information of a second image in the second image set to obtain a third image of the first object carrying the style information comprises:
inputting the first image and the second image into the image style migration model, extracting the content information of the first image and the style information of the second image by using a neural style transfer technique in the image style migration model, and fusing the content information and the style information to obtain the third image of the first object carrying the style information.
6. The image processing method according to claim 5, wherein the fusing of the content information and the style information to obtain the third image of the first object carrying the style information comprises:
fusing the content information and the style information to obtain an intermediate image;
and carrying out distortion processing on the intermediate image to obtain the third image.
7. The image processing method according to claim 6, wherein the training of the depth image construction model by using the plurality of third images as training samples and the annotated depth image as a label image to obtain the trained depth image construction model comprises:
inputting a first parameter of a camera used to capture the first image set, together with the plurality of third images, into the depth image construction model for image processing to obtain a first predicted depth image;
and adjusting the model parameters of the depth image construction model according to a loss value between the first predicted depth image and the annotated depth image, to obtain the trained depth image construction model.
8. The image processing method according to claim 4, further comprising:
performing photometric distortion processing on the second images in the second image set to obtain an enhanced image set;
inputting the second image set and a second parameter of a camera used to capture the second image set into the depth image construction model for image processing to obtain a first probability volume;
inputting the enhanced image set and the second parameter into the depth image construction model for image processing to obtain a second probability volume;
and adjusting the model parameters of the depth image construction model according to a loss value between the first probability volume and the second probability volume.
9. An image processing method applied to a server, the image processing method comprising:
receiving a plurality of first remote sensing images and a second remote sensing image which are sent by terminal equipment, wherein the first remote sensing images comprise a target object;
acquiring camera parameters used to capture the first remote sensing images;
inputting the plurality of first remote sensing images and the camera parameters into a depth image construction model trained in advance for image processing to obtain a first depth image of the target object;
inputting the first depth image and the second remote sensing image into a pre-trained image style migration model for style migration processing to obtain a second depth image, wherein the second depth image comprises depth information of the target object, and the style of the second depth image is the style of the second remote sensing image;
and sending the second depth image to the terminal equipment.
10. An image processing method applied to a terminal device, the image processing method comprising:
acquiring a plurality of first remote sensing images and a second remote sensing image;
sending the plurality of first remote sensing images and the second remote sensing image to a server, so that the server acquires camera parameters used to capture the first remote sensing images, inputs the plurality of first remote sensing images and the camera parameters into a depth image construction model trained in advance for image processing to obtain a first depth image of a target object included in the first remote sensing images, and inputs the first depth image and the second remote sensing image into a pre-trained image style migration model for style migration processing to obtain a second depth image, wherein the second depth image comprises depth information of the target object, and the style of the second depth image is the style of the second remote sensing image;
and receiving the second depth image sent by the server.
11. An image processing apparatus applied to a server, comprising:
the system comprises a receiving module, a judging module and a display module, wherein the receiving module is used for receiving an image to be migrated and a style image which are sent by terminal equipment, and the image to be migrated comprises a target object;
a depth determination module, configured to acquire camera parameters used to capture the image to be migrated, and to input the image to be migrated and the camera parameters into a depth image construction model trained in advance for image processing to obtain a depth image of the target object;
a processing module, configured to input the depth image and the style image into a pre-trained image style migration model for style migration processing to obtain a target image, wherein the target image comprises the target object, and the style of the target image is the style of the style image;
and a sending module, configured to send the target image to the terminal equipment.
12. A cloud device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the image processing method according to any one of claims 1 to 10.
CN202211161279.2A 2022-09-23 2022-09-23 Image processing method and device and cloud equipment Pending CN115249221A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211161279.2A CN115249221A (en) 2022-09-23 2022-09-23 Image processing method and device and cloud equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211161279.2A CN115249221A (en) 2022-09-23 2022-09-23 Image processing method and device and cloud equipment

Publications (1)

Publication Number Publication Date
CN115249221A 2022-10-28

Family

ID=83699875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211161279.2A Pending CN115249221A (en) 2022-09-23 2022-09-23 Image processing method and device and cloud equipment

Country Status (1)

Country Link
CN (1) CN115249221A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107705242A (en) * 2017-07-20 2018-02-16 广东工业大学 An image stylization transfer method combining deep learning and depth perception
US20200286275A1 (en) * 2019-03-05 2020-09-10 Samsung Electronics Co., Ltd. Electronic device and control method thereof
CN111161133A (en) * 2019-12-26 2020-05-15 维沃移动通信有限公司 Picture processing method and electronic equipment
CN114066718A (en) * 2021-10-14 2022-02-18 特斯联科技集团有限公司 Image style migration method and device, storage medium and terminal
CN114331820A (en) * 2021-12-29 2022-04-12 北京字跳网络技术有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114821228A (en) * 2022-04-12 2022-07-29 北京鉴智科技有限公司 Depth image output model training method, depth image obtaining method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HONGBIN XU et al.: "Semi-supervised Deep Multi-view Stereo", arXiv *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593470A (en) * 2024-01-18 2024-02-23 深圳奥雅设计股份有限公司 Street view reconstruction method and system based on AI model
CN117593470B (en) * 2024-01-18 2024-04-02 深圳奥雅设计股份有限公司 Street view reconstruction method and system based on AI model

Similar Documents

Publication Publication Date Title
CN109492128B (en) Method and apparatus for generating a model
US20170161592A1 (en) System and method for object detection dataset application for deep-learning algorithm training
CN110379020B (en) Laser point cloud coloring method and device based on generation countermeasure network
CN110022463A (en) Video interested region intelligent coding method and system are realized under dynamic scene
CN110827312B (en) Learning method based on cooperative visual attention neural network
Poiesi et al. Cloud-based collaborative 3D reconstruction using smartphones
CN108111911B (en) Video data real-time processing method and device based on self-adaptive tracking frame segmentation
CN116597039A (en) Image generation method and server
CN110310299B (en) Method and apparatus for training optical flow network, and method and apparatus for processing image
US20190332894A1 (en) Method for Processing Automobile Image Data, Apparatus, and Readable Storage Medium
CN112102422B (en) Image processing method and device
CN110166759B (en) Image processing method and device, storage medium and electronic device
CN114758337B (en) Semantic instance reconstruction method, device, equipment and medium
CN115249221A (en) Image processing method and device and cloud equipment
CN111382647A (en) Picture processing method, device, equipment and storage medium
CN113724132B (en) Image style migration processing method and device, electronic equipment and storage medium
CN104463962A (en) Three-dimensional scene reconstruction method based on GPS information video
WO2023217138A1 (en) Parameter configuration method and apparatus, device, storage medium and product
CN112200817A (en) Sky region segmentation and special effect processing method, device and equipment based on image
CN112383824A (en) Video advertisement filtering method, device and storage medium
US20230131418A1 (en) Two-dimensional (2d) feature database generation
CN112907488A (en) Image restoration method, device, equipment and storage medium
CN112926497A (en) Face recognition living body detection method and device based on multi-channel data feature fusion
CN113516238A (en) Model training method, denoising method, model, device and storage medium
Fernandes et al. Visual prediction based on photorealistic style transfer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20221028)