CN111652796A - Image processing method, electronic device, and computer-readable storage medium - Google Patents
- Publication number
- CN111652796A CN111652796A CN202010403084.9A CN202010403084A CN111652796A CN 111652796 A CN111652796 A CN 111652796A CN 202010403084 A CN202010403084 A CN 202010403084A CN 111652796 A CN111652796 A CN 111652796A
- Authority
- CN
- China
- Prior art keywords
- face
- image
- region
- interest
- target object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000003672 processing method Methods 0.000 title claims abstract description 15
- 230000004927 fusion Effects 0.000 claims abstract description 27
- 238000000034 method Methods 0.000 claims abstract description 25
- 238000003062 neural network model Methods 0.000 claims abstract description 16
- 238000007499 fusion processing Methods 0.000 claims abstract description 10
- 230000009466 transformation Effects 0.000 claims description 33
- 238000001514 detection method Methods 0.000 claims description 20
- 239000011159 matrix material Substances 0.000 claims description 10
- 238000004422 calculation algorithm Methods 0.000 claims description 5
- 238000004590 computer program Methods 0.000 claims description 5
- 241001465754 Metazoa Species 0.000 claims description 2
- 230000001131 transforming effect Effects 0.000 claims 2
- 230000000694 effects Effects 0.000 abstract description 14
- 238000012545 processing Methods 0.000 description 17
- 210000003128 head Anatomy 0.000 description 15
- 230000003287 optical effect Effects 0.000 description 7
- 230000011218 segmentation Effects 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 238000004519 manufacturing process Methods 0.000 description 3
- 238000000844 transformation Methods 0.000 description 3
- 238000013519 translation Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 2
- 239000013307 optical fiber Substances 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- 230000000644 propagated effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 238000003491 array Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 210000004709 eyebrow Anatomy 0.000 description 1
- 210000000744 eyelid Anatomy 0.000 description 1
- 210000004209 hair Anatomy 0.000 description 1
- 230000001788 irregular Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 238000007637 random forest analysis Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 239000004984 smart glass Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- G06T3/04—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The application provides an image processing method, an electronic device, and a computer-readable storage medium. According to the method and the device, target object detection is performed on a first image to obtain a region of interest of the target object; a neural network model is then used to perform pixel-level foreground-background prediction on the region-of-interest image of the target object to obtain a foreground-background prediction result; a foreground mask region image is extracted from the region-of-interest image of the target object according to the prediction result and fused with a second image to obtain a fused image. This improves the image customization effect and the image customization efficiency, and improves the user experience.
Description
[ Technical Field ]
The present disclosure relates to image processing technologies, and in particular, to an image processing method, an electronic device, and a computer-readable storage medium.
[ Background of the Invention ]
In everyday life and work, users often need to customize specific images. For example, for employment and office purposes, a user may need to produce portrait photos such as ID photos and staff photos; as another example, a user may want to replace the background of a personal snapshot with a scenic landscape image or a celebrity image.
In the prior art, a specific image is typically customized by matting and layer fusion: a specific part is separated from the original image into an individual layer, and that layer is then fused with the replacement background layer into a single layer.
Customizing a specific image by matting and layer fusion requires professional image processing software (such as Photoshop). The operation is complex, processing efficiency is low, and the user must be experienced with the software; insufficient experience may degrade the image fusion effect, leading to a poor user experience. It is therefore desirable to provide an image processing method that improves the image fusion effect and the image processing efficiency, and improves the user experience.
[ Summary of the Invention ]
Aspects of the present disclosure provide an image processing method, an electronic device, and a computer-readable storage medium, so as to improve the image customization effect and the image customization efficiency.
In one aspect of the present application, an image processing method is provided, including:
detecting a target object in the first image to obtain an interested area of the target object;
performing pixel-level foreground and background prediction on the region-of-interest image of the target object by using a neural network model to obtain a foreground and background prediction result, wherein the foreground and background prediction result comprises: pixels belonging to the foreground and pixels belonging to the background in the region-of-interest image of the target object;
and extracting a foreground mask region image from the region-of-interest image of the target object according to the foreground-background prediction result, and performing fusion processing on the foreground mask region image and the second image to obtain a fused image.
In another aspect of the present application, there is provided an electronic device, including:
one or more processors;
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image processing method provided in the aspect above.
In another aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements an image processing method as provided in the above aspect.
According to the technical solution above, after target object detection is performed on the first image to obtain the region of interest of the target object, a neural network model is used to perform pixel-level foreground-background prediction on the region-of-interest image to obtain a foreground-background prediction result; a foreground mask region image is then extracted from the region-of-interest image according to the prediction result and fused with the second image to obtain a fused image. The user therefore does not need to manually operate professional image processing software, which simplifies user operation and improves image processing efficiency.
In addition, because the user does not need to manually operate professional image processing software, the technical solution provided by the application prevents the image fusion effect from being degraded by insufficient user experience, thereby improving the image fusion effect.
In addition, by adopting the technical scheme provided by the application, the user experience can be effectively improved.
[ Description of the Drawings ]
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 2 is a block diagram of an exemplary computer system/server 12 suitable for use in implementing embodiments of the present application.
[ Detailed Description of the Embodiments ]
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terminal according to the embodiment of the present invention may include, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), a wireless handheld device, a Tablet Computer (Tablet Computer), a Personal Computer (PC), an MP3 player, an MP4 player, a wearable device (e.g., smart glasses, smart watch, smart bracelet, etc.), and the like.
In addition, the term "and/or" in the embodiments of the present application merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" in the embodiments of the present application generally indicates an "or" relationship between the preceding and following objects.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application.
101. Perform target object detection on the first image to obtain a region of interest (ROI) of the target object.
In machine vision and image processing, a region of interest is a region to be processed that is delineated within the image by a box, circle, ellipse, irregular polygon, or the like.
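As a minimal illustration of a rectangular ROI, the delineation reduces to an array slice; the function name and the (H, W, C) array layout are assumptions for this sketch, not part of the claimed method:

```python
import numpy as np

def crop_roi(image, x, y, w, h):
    """Delineate a rectangular region of interest from an image array of
    shape (H, W, C) by plain slicing. Non-rectangular ROIs (circle,
    ellipse, polygon) would instead be represented as a binary mask over
    the same bounding rectangle."""
    return image[y:y + h, x:x + w]
```
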
102. Perform pixel-level foreground-background prediction on the region-of-interest image of the target object by using a neural network model, to obtain a foreground-background prediction result.
Wherein the foreground-background prediction result comprises: the pixels in the region-of-interest image of the target object that belong to the foreground (i.e., the target object) and the pixels that belong to the background (i.e., not the target object).
103. Extract a foreground mask region image from the region-of-interest image of the target object according to the foreground-background prediction result, and fuse the foreground mask region image with the second image to obtain a fused image.
In the present application, the first image is an image including a target object, and the second image is an image that can be a background of the target object. For example, in an application scenario where a user customizes a certificate photo, the first image is an image including a user's head portrait, and the second image is an image serving as a certificate photo background (e.g., a blue background, a white background, etc.). For another example, in the application scenario of replacing the self life photo background with the landscape image, the first image is the life photo image including the user, and the second image is the landscape image.
The first image may be a still image or a video frame acquired in advance, or a still image or a video frame acquired in real time; this embodiment does not specifically limit this.
In the present application, before 101 the user may upload the first image and the second image and submit an image processing request, and after 103 the fused image may be output.
It should be noted that the execution subject of some or all of steps 101 to 103 may be an application on the terminal, a functional unit such as a plug-in or software development kit (SDK) within such an application, or a processing engine on a network-side server; this embodiment does not specifically limit this.
It is to be understood that the application may be a native app (native app) installed on the terminal, or may also be a web page program (webApp) of a browser on the terminal, which is not limited in this embodiment.
In this way, after target object detection is performed on the first image to obtain the region of interest of the target object, a neural network model is used to perform pixel-level foreground-background prediction on the region-of-interest image to obtain a foreground-background prediction result; a foreground mask region image is then extracted from the region-of-interest image according to that result and fused with the second image to obtain a fused image. The user does not need to manually operate professional image processing software, which simplifies user operation, improves image processing efficiency, prevents the fusion effect from being degraded by insufficient user experience, improves the image fusion effect, and effectively improves the user experience.
Optionally, in a possible implementation manner of this embodiment, the target object may include, but is not limited to, any one or more of the following: at least a part of a human body, an object, an animal, and the like, and the present embodiment does not particularly limit the specific kind and range of the target object.
Optionally, in a possible implementation of this embodiment, when the at least part of the human body includes the head, face detection may be performed on the first image in 101 to obtain a region of interest of the face, which is then expanded according to a preset expansion ratio (for example, by half in each of the four directions) to obtain a region of interest of the head. In this implementation, the region of interest of the target object is specifically the region of interest of the head. The specific expansion ratio and directions can be determined by the image customization requirement and adjusted as needed; this embodiment does not limit them.
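The "expand by half in four directions" example above can be sketched as follows; the function name and the clamping to the image bounds are assumptions for illustration, not part of the claimed method:

```python
def expand_roi(x, y, w, h, img_w, img_h, ratio=0.5):
    """Expand a face ROI (x, y, w, h) by `ratio` of its size in each of
    the four directions, clamped to the image bounds, to approximate a
    head ROI. The default ratio of 0.5 follows the "expand by half"
    example in the text."""
    dx, dy = int(w * ratio), int(h * ratio)
    x0 = max(0, x - dx)
    y0 = max(0, y - dy)
    x1 = min(img_w, x + w + dx)
    y1 = min(img_h, y + h + dy)
    return x0, y0, x1 - x0, y1 - y0
```
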
In a specific implementation, a pre-trained neural network model may be used to perform face detection on the first image; alternatively, a face detection algorithm with high positioning accuracy may be used, such as a Single Shot MultiBox Detector (SSD) or a Single Shot Scale-invariant Face Detector (S3FD).
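Detectors such as SSD output scored candidate boxes that must be filtered by confidence and scaled to pixel coordinates. A hedged sketch of that post-processing step, assuming a simplified (N, 5) layout of [confidence, x0, y0, x1, y1] with normalized coordinates rather than the exact output tensor of any particular detector:

```python
import numpy as np

def filter_detections(detections, img_w, img_h, conf_thresh=0.5):
    """Keep detections whose confidence meets conf_thresh and convert
    their normalized [x0, y0, x1, y1] boxes to pixel coordinates.
    `detections`: (N, 5) array of [confidence, x0, y0, x1, y1]."""
    keep = detections[detections[:, 0] >= conf_thresh]
    boxes = keep[:, 1:] * np.array([img_w, img_h, img_w, img_h])
    return boxes.astype(int)
```
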
Optionally, in another possible implementation of this embodiment, after face detection is performed on the first image to obtain the region of interest of the face, an affine transformation may be performed on the face to obtain the region of interest with the face corrected. Correspondingly, the corrected region of interest can then be expanded according to the preset expansion ratio to obtain the region of interest of the head.
An affine transformation is a spatial rectangular-coordinate transformation: a linear transformation from two-dimensional coordinates to two-dimensional coordinates that preserves the straightness of two-dimensional figures (a straight line remains straight, an arc remains an arc) and their parallelism (the relative positional relationships between elements are kept: parallel lines stay parallel, and the angle between intersecting lines is unchanged). An affine transformation can be composed from a series of atomic transformations, including: translation, scaling, flipping, and rotation.
Based on this embodiment, performing an affine transformation on the face can frontalize it, so that the face in the final fused image is a frontal face, satisfying the requirements of application scenarios such as ID photos and staff photos that need a frontal face.
In a specific implementation, keypoint detection may be performed on the region of interest of the face to obtain the keypoint information of the face; an affine transformation matrix is then obtained based on the keypoint information of the face and the keypoint information of a standard frontal face model, and the matrix is used to perform the affine transformation so as to align (i.e., frontalize) the face in the first image.
The standard frontal face model is a preset average-face model that is not deflected in any direction.
The face keypoints in the present application generally refer to points used to locate the face, a partial region of the face, or one or more facial organs. They generally include, but are not limited to, face contour keypoints, eye keypoints, eyebrow keypoints, mouth keypoints, nose keypoints, eyelid line keypoints, and lip line keypoints. The face keypoints detected in the region of interest of the face must be consistent with the keypoints of the standard frontal face model.
In a specific implementation, a pre-trained neural network model may be used to detect face keypoints in the region of interest of the face; alternatively, Local Binary Features (LBF) may be extracted from the first image and a random forest applied to them to obtain the keypoint information of the face.
In this embodiment, to align the face in the first image, each face keypoint needs to be moved to the position of its corresponding keypoint in the standard frontal face model, which requires a series of transformations of the face such as translation, scaling, flipping, and rotation. The corresponding affine transformation matrix is obtained from the keypoint information of the face and of the standard frontal face model, and applying that matrix to every point of the face achieves the alignment.
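The transformation matrix described above can be estimated from corresponding keypoints by least squares. The sketch below restricts the affine matrix to a similarity (rotation, uniform scale, translation), which covers the translation/scaling/rotation atoms named in the text; the function name and parameterization are assumptions, and real pipelines typically call a library routine instead:

```python
import numpy as np

def similarity_transform(src, dst):
    """Estimate a 2x3 similarity matrix mapping src keypoints onto dst
    keypoints by least squares. src, dst: (N, 2) arrays of corresponding
    face / standard-frontal-model keypoints.
    Parameterized as [[a, -b, tx], [b, a, ty]], i.e. x' = a*x - b*y + tx
    and y' = b*x + a*y + ty, and solved as a linear system A p = y."""
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    A[0::2] = np.c_[src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)]
    A[1::2] = np.c_[src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)]
    y = dst.reshape(-1)
    a, b, tx, ty = np.linalg.lstsq(A, y, rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])
```

Applying the returned matrix to every face point (and resampling the pixels accordingly) realizes the alignment.
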
Optionally, in yet another possible implementation of this embodiment, after face detection is performed on the first image to obtain the region of interest of the face, the face deflection angle may be detected first, and it is then identified whether the deflection angle is within a preset deflection angle range; if so, the affine transformation is performed on the face. Otherwise, if the face deflection angle is not within the preset range, the affine transformation and subsequent operations are not performed.
In a specific implementation, the face deflection angle may be used to represent the degree to which the face is deflected in the first image; for example, it may include one or more of the following angle indices: the rotation angle around the X axis, the rotation angle around the Y axis, and the rotation angle around the Z axis. In general, the rotation around the X axis is called the pitch angle (Pitch), representing the head-up/head-down angle; the rotation around the Y axis is called the yaw angle (Yaw), representing the angle by which the face turns left/right; and the rotation around the Z axis is called the roll angle (Roll), representing the angle by which the top of the head tilts toward the left/right shoulder. That is, the face deflection angle can be expressed as (Pitch, Yaw, Roll). This does not mean, however, that the face pose information in the present application must include all of Pitch, Yaw, and Roll.
The preset deflection angle range can be set according to actual requirements, for example [-15°, 15°], and can be adjusted in real time. When the face deflection angle includes several of the Pitch, Yaw, and Roll indices, the preset ranges of the different indices may be the same, different, or partially the same; this embodiment does not specifically limit this. For example, the preset range of Pitch may be set to [-15°, 15°] while that of Yaw and Roll is set to [-10°, 10°].
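The range check itself reduces to a few comparisons. A sketch using the example thresholds from the text; the function name and default ranges are illustrative, not prescribed:

```python
def pose_within_range(pitch, yaw, roll,
                      pitch_range=(-15.0, 15.0),
                      yaw_range=(-10.0, 10.0),
                      roll_range=(-10.0, 10.0)):
    """Return True when every face deflection angle (in degrees) lies in
    its preset range, i.e. the face is near-frontal enough that an
    affine correction will not noticeably deform it. Defaults follow the
    example ranges given in the text."""
    return (pitch_range[0] <= pitch <= pitch_range[1]
            and yaw_range[0] <= yaw <= yaw_range[1]
            and roll_range[0] <= roll <= roll_range[1])
```
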
Therefore, when the face deflection angle in the first image is within the preset range, the affine transformation is performed to correct the face, a frontal-face image of the user is obtained, and a fused image containing the user's frontal face can be produced, improving the production effect of the user's staff photos and ID photos. When the deflection angle exceeds the preset range, the affine transformation would deform the face and thus compromise the authenticity of the user's image, so the transformation and subsequent operations are not performed; this avoids meaningless image processing and saves computing resources.
Alternatively, in yet another possible implementation of this embodiment, the region of interest of the face may first be expanded according to the preset expansion ratio (for example, by half in each of the four directions) to obtain the region of interest of the head, after which the affine transformation is performed on the face to obtain the head region of interest with the face corrected. In this implementation, the region of interest of the target object is specifically the region of interest of the head after the face is corrected. The specific expansion ratio and directions can be determined by the image customization requirement and adjusted as needed; this embodiment does not limit them.
Therefore, after the region of interest of the head is obtained, the affine transformation is performed on the face within it to correct the face and obtain a frontal-face image of the user, so that a fused image containing the user's frontal face can be produced, improving the production effect of the user's staff photos and ID photos.
In a specific implementation, face keypoint detection may be performed on the region of interest of the head to obtain the keypoint information of the face; an affine transformation matrix is then obtained based on the keypoint information of the face and of the standard frontal face model, and the matrix is used to perform the affine transformation on the face.
Optionally, in another possible implementation of this embodiment, after face detection is performed on the first image to obtain the region of interest of the face, the face deflection angle may be detected first, and it is then identified whether the deflection angle is within the preset deflection angle range; if so, the region of interest of the face is expanded according to the preset expansion ratio. Otherwise, if the face deflection angle is not within the preset range, the expansion and subsequent operations are not performed.
The face deflection angle and the preset deflection angle range here are defined as in the implementation described above.
Therefore, the region of interest of the face is expanded according to the preset expansion ratio and the subsequent operations are performed only when the face deflection angle in the first image is within the preset range. When the deflection angle in the head region of interest exceeds that range, the affine transformation would deform the face, compromising the authenticity of the user's image and failing the frontal-face requirements of application scenarios such as ID photos and staff photos; the affine transformation and subsequent operations are therefore not performed, avoiding meaningless image processing and saving computing resources.
Optionally, in a possible implementation of this embodiment, the neural network model in 102 may be any neural network capable of pixel-level class prediction, including, but not limited to: a Fully Convolutional Network (FCN), an encoder-decoder fully convolutional network (U-Net), a semantic segmentation network (SegNet), a Pyramid Scene Parsing Network (PSPNet), a semantic segmentation model (DeepLab), and the like; this embodiment does not specifically limit the implementation of the neural network model.
The U-Net fully convolutional network adopts an encoder-decoder structure: the first half performs feature extraction and the second half performs up-sampling. U-Net uses a distinctive feature fusion mode in which features are concatenated along the channel dimension to form "thicker" features, ensuring that the recovered feature maps incorporate more low-level features. Because features at different scales are fused, multi-scale prediction and deep supervision become possible, and the segmentation edges between foreground and background can be finer.
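The channel-dimension concatenation that distinguishes U-Net's skip connections from additive fusion can be shown in one line; the (C, H, W) array layout and function name are assumptions for illustration:

```python
import numpy as np

def skip_concat(decoder_feat, encoder_feat):
    """U-Net-style feature fusion: up-sampled decoder features are
    concatenated with same-resolution encoder features along the channel
    axis, so channel counts add up ("thicker" features) rather than the
    element-wise summation used in FCN. Arrays are (C, H, W)."""
    assert decoder_feat.shape[1:] == encoder_feat.shape[1:]
    return np.concatenate([decoder_feat, encoder_feat], axis=0)
```
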
In a specific implementation, the neural network model may be trained in advance with sample images carrying pixel-level labels, so that the trained model can accurately separate the foreground and background of the region-of-interest image of the target object. For example, when the target object is a head, pixel-level segmentation of hair edges, face edges, and the like can be achieved.
Optionally, in a possible implementation of this embodiment, in 103, a foreground mask region image may be extracted from the region-of-interest image of the target object according to the foreground-background prediction result, and an image fusion algorithm, for example the Poisson fusion algorithm, is used to fuse the foreground mask region image with the second image to obtain the fused image.
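The mask-extraction step can be sketched as follows. The 2-D-list data format and the 0/1 mask encoding are hypothetical simplifications of the pixel-level prediction result, assumed here only for illustration.

```python
def extract_foreground(roi_pixels, fg_mask):
    """Keep ROI pixels predicted as foreground; zero out the background.
    `roi_pixels` is a 2-D list of grey values and `fg_mask` a same-shaped
    2-D list of 0/1 foreground predictions (illustrative format)."""
    return [
        [px if m else 0 for px, m in zip(prow, mrow)]
        for prow, mrow in zip(roi_pixels, fg_mask)
    ]
```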
The Poisson fusion algorithm places the target object image into the background image, within a region the size of the foreground mask centered on a point P in the background image. During Poisson fusion, the colors and gradients of the target object image are adjusted to achieve seamless blending, so the resulting fused image is more natural and realistic.
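A minimal 1-D sketch of the gradient-domain idea behind Poisson fusion: keep the source's gradients in the interior while pinning the boundary values to the background. The real algorithm solves a 2-D Poisson equation over the mask region; this toy Gauss-Seidel solver is illustrative only.

```python
def poisson_blend_1d(source, background, iters=500):
    """Blend `source` into `background` over the interior samples by
    solving the discrete 1-D Poisson equation: the result matches the
    source's Laplacian inside while taking its endpoint values from the
    background (plain Gauss-Seidel sweeps)."""
    n = len(source)
    # Boundary conditions come from the background; interior is unknown.
    result = list(background)
    for _ in range(iters):
        for i in range(1, n - 1):
            # Discrete Poisson update: neighbours minus source Laplacian.
            lap = source[i - 1] - 2 * source[i] + source[i + 1]
            result[i] = (result[i - 1] + result[i + 1] - lap) / 2.0
    return result
```

Because the source's gradients are preserved while the boundary is taken from the background, the blended signal shifts smoothly in value instead of showing a seam, which is exactly the "seamless fusion" effect described above.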
In this embodiment, target object detection is performed on a first image to obtain a region of interest of the target object; a neural network model then performs pixel-level foreground-background prediction on the region-of-interest image to obtain a prediction result; a foreground mask region image is extracted from the region-of-interest image according to that result; and the foreground mask region image is fused with a second image to obtain a fused image. The user therefore does not need to manually operate professional image processing software, which simplifies user operation and improves image processing efficiency.
In addition, because the technical solution provided by the present application removes the need to manually operate professional image processing software, the image fusion effect is no longer degraded by a user's lack of experience, and the fusion result is improved.
In addition, by adopting the technical scheme provided by the application, the user experience can be effectively improved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
Another embodiment of the present application further provides an apparatus, comprising: one or more processors; and a storage device for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method provided in any of the embodiments corresponding to FIG. 1.
FIG. 2 illustrates a block diagram of an exemplary computer system/server 12 suitable for use in implementing embodiments of the present application. The computer system/server 12 shown in FIG. 2 is only one example and should not be taken to limit the scope of use or functionality of embodiments of the present application.
As shown in FIG. 2, computer system/server 12 is in the form of a general purpose computing device. The components of computer system/server 12 may include, but are not limited to: one or more processors or processing units 16, a storage device or system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Computer system/server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 2, and commonly referred to as a "hard drive"). Although not shown in FIG. 2, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described herein.
The computer system/server 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any devices (e.g., network card, modem, etc.) that enable the computer system/server 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 44. Also, the computer system/server 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 20. As shown, network adapter 20 communicates with the other modules of computer system/server 12 via bus 18. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing an image processing method provided in any of the embodiments corresponding to fig. 1.
Another embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the image processing method provided in any embodiment corresponding to fig. 1.
In particular, any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions may be realized in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (13)
1. An image processing method, comprising:
performing target object detection on a first image to obtain a region of interest of the target object;
performing pixel-level foreground and background prediction on the region-of-interest image of the target object by using a neural network model to obtain a foreground and background prediction result, wherein the foreground and background prediction result comprises: pixels belonging to the foreground and pixels belonging to the background in the region-of-interest image of the target object;
and extracting a foreground mask region image from the region-of-interest image of the target object according to the foreground and background prediction result, and performing fusion processing on the foreground mask region image and a second image to obtain a fused image.
2. The method of claim 1, wherein the target object comprises any one or more of: at least a part of a human body, an object, an animal.
3. The method of claim 2, wherein when the at least a portion of the human body includes a head, the performing target object detection on the first image to obtain a region of interest of the target object comprises:
carrying out face detection on the first image to obtain an interested area of a face;
expanding the region of interest of the face according to a preset expansion ratio to obtain a region of interest of the head; wherein the region of interest of the target object is the region of interest of the head.
4. The method according to claim 3, wherein after the face detection is performed on the first image to obtain the region of interest of the face, the method further comprises:
performing affine transformation on the face to obtain a region of interest after the face is rectified;
and the expanding the region of interest of the face according to the preset expansion ratio comprises:
expanding the region of interest after the face is rectified according to the preset expansion ratio.
5. The method of claim 4, wherein the performing affine transformation on the face comprises:
performing face key point detection on the region of interest of the face to obtain key point information of the face;
acquiring an affine transformation matrix based on the key point information of the face and the key point information of the standard frontal face model;
and carrying out affine transformation on the human face by utilizing the affine transformation matrix.
6. The method according to claim 4, wherein after the face detection is performed on the first image to obtain the region of interest of the face, the method further comprises:
detecting a face deflection angle of the face;
identifying whether the human face deflection angle is within a preset deflection angle range;
and if the deflection angle of the human face is within the preset deflection angle range, executing the affine transformation operation on the human face.
7. The method according to claim 3, wherein after the expanding the region of interest of the face according to the preset expansion ratio to obtain the region of interest of the head, the method further comprises:
performing affine transformation on the face to obtain a region of interest of the head after the face is rectified; wherein the region of interest of the target object is the region of interest of the head after the face is rectified.
8. The method of claim 7, wherein the performing affine transformation on the face comprises:
performing face key point detection on the interested area of the head to obtain key point information of the face;
acquiring an affine transformation matrix based on the key point information of the face and the key point information of the standard frontal face model;
and carrying out affine transformation on the human face by utilizing the affine transformation matrix.
9. The method of claim 7, wherein after the face detection is performed on the first image to obtain the region of interest of the face, the method further comprises:
detecting a face deflection angle of the face;
identifying whether the human face deflection angle is within a preset deflection angle range;
and if the human face deflection angle is within the preset deflection angle range, executing the operation of expanding the region of interest of the human face according to the preset expansion proportion.
10. The method according to any one of claims 1 to 9, wherein the neural network model comprises: U-Net full convolution network.
11. The method according to any one of claims 1 to 9, wherein the fusing the foreground mask region image and the second image comprises:
and performing fusion processing on the foreground mask area image and the second image by using a Poisson fusion algorithm.
12. An electronic device, characterized in that the device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as claimed in any one of claims 1 to 11.
13. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010403084.9A CN111652796A (en) | 2020-05-13 | 2020-05-13 | Image processing method, electronic device, and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010403084.9A CN111652796A (en) | 2020-05-13 | 2020-05-13 | Image processing method, electronic device, and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111652796A true CN111652796A (en) | 2020-09-11 |
Family
ID=72347090
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010403084.9A Pending CN111652796A (en) | 2020-05-13 | 2020-05-13 | Image processing method, electronic device, and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111652796A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112669204A (en) * | 2021-01-04 | 2021-04-16 | 北京金山云网络技术有限公司 | Image processing method, and training method and device of image processing model |
CN112884637A (en) * | 2021-01-29 | 2021-06-01 | 北京市商汤科技开发有限公司 | Special effect generation method, device, equipment and storage medium |
CN112991208A (en) * | 2021-03-11 | 2021-06-18 | Oppo广东移动通信有限公司 | Image processing method and device, computer readable medium and electronic device |
CN113537209A (en) * | 2021-06-02 | 2021-10-22 | 浙江吉利控股集团有限公司 | Image processing method, device, equipment and computer readable storage medium |
CN113608805A (en) * | 2021-07-08 | 2021-11-05 | 阿里巴巴新加坡控股有限公司 | Mask prediction method, image processing method, display method and equipment |
CN113763449A (en) * | 2021-08-25 | 2021-12-07 | 北京的卢深视科技有限公司 | Depth recovery method and device, electronic equipment and storage medium |
GB2596901A (en) * | 2020-05-15 | 2022-01-12 | Nvidia Corp | Content-aware style encoding using neural networks |
WO2023283894A1 (en) * | 2021-07-15 | 2023-01-19 | 京东方科技集团股份有限公司 | Image processing method and device |
WO2023066147A1 (en) * | 2021-10-19 | 2023-04-27 | 中国第一汽车股份有限公司 | Image processing method and apparatus, and electronic device and medium |
CN116958766A (en) * | 2023-07-04 | 2023-10-27 | 阿里巴巴(中国)有限公司 | Image processing method |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018137623A1 (en) * | 2017-01-24 | 2018-08-02 | 深圳市商汤科技有限公司 | Image processing method and apparatus, and electronic device |
CN108805838A (en) * | 2018-06-05 | 2018-11-13 | Oppo广东移动通信有限公司 | A kind of image processing method, mobile terminal and computer readable storage medium |
CN109344724A (en) * | 2018-09-05 | 2019-02-15 | 深圳伯奇科技有限公司 | A kind of certificate photo automatic background replacement method, system and server |
CN109377445A (en) * | 2018-10-12 | 2019-02-22 | 北京旷视科技有限公司 | Model training method, the method, apparatus and electronic system for replacing image background |
CN109903291A (en) * | 2017-12-11 | 2019-06-18 | 腾讯科技(深圳)有限公司 | Image processing method and relevant apparatus |
CN109934177A (en) * | 2019-03-15 | 2019-06-25 | 艾特城信息科技有限公司 | Pedestrian recognition methods, system and computer readable storage medium again |
CN110321865A (en) * | 2019-07-09 | 2019-10-11 | 北京字节跳动网络技术有限公司 | Head effect processing method and device, storage medium |
CN110889824A (en) * | 2019-10-12 | 2020-03-17 | 北京海益同展信息科技有限公司 | Sample generation method and device, electronic equipment and computer readable storage medium |
2020-05-13 — CN202010403084.9A — patent CN111652796A (en) — active, Pending
Non-Patent Citations (1)
Title |
---|
Luo Huilan; Zhang Yun: "Semantic segmentation combining contextual features with CNN multi-layer feature fusion", no. 12 *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2596901A (en) * | 2020-05-15 | 2022-01-12 | Nvidia Corp | Content-aware style encoding using neural networks |
CN112669204A (en) * | 2021-01-04 | 2021-04-16 | 北京金山云网络技术有限公司 | Image processing method, and training method and device of image processing model |
CN112884637A (en) * | 2021-01-29 | 2021-06-01 | 北京市商汤科技开发有限公司 | Special effect generation method, device, equipment and storage medium |
CN112991208A (en) * | 2021-03-11 | 2021-06-18 | Oppo广东移动通信有限公司 | Image processing method and device, computer readable medium and electronic device |
CN113537209A (en) * | 2021-06-02 | 2021-10-22 | 浙江吉利控股集团有限公司 | Image processing method, device, equipment and computer readable storage medium |
CN113608805A (en) * | 2021-07-08 | 2021-11-05 | 阿里巴巴新加坡控股有限公司 | Mask prediction method, image processing method, display method and equipment |
CN113608805B (en) * | 2021-07-08 | 2024-04-12 | 阿里巴巴创新公司 | Mask prediction method, image processing method, display method and device |
WO2023283894A1 (en) * | 2021-07-15 | 2023-01-19 | 京东方科技集团股份有限公司 | Image processing method and device |
CN113763449A (en) * | 2021-08-25 | 2021-12-07 | 北京的卢深视科技有限公司 | Depth recovery method and device, electronic equipment and storage medium |
CN113763449B (en) * | 2021-08-25 | 2022-08-12 | 合肥的卢深视科技有限公司 | Depth recovery method and device, electronic equipment and storage medium |
WO2023066147A1 (en) * | 2021-10-19 | 2023-04-27 | 中国第一汽车股份有限公司 | Image processing method and apparatus, and electronic device and medium |
CN116958766A (en) * | 2023-07-04 | 2023-10-27 | 阿里巴巴(中国)有限公司 | Image processing method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111652796A (en) | Image processing method, electronic device, and computer-readable storage medium | |
US10657652B2 (en) | Image matting using deep learning | |
US11107232B2 (en) | Method and apparatus for determining object posture in image, device, and storage medium | |
CN109872379B (en) | Data processing apparatus and method | |
US10599914B2 (en) | Method and apparatus for human face image processing | |
US11481869B2 (en) | Cross-domain image translation | |
Baggio | Mastering OpenCV with practical computer vision projects | |
US20130335416A1 (en) | Systems and methods for generating a 3-d model of a virtual try-on product | |
US11823358B2 (en) | Handwritten content removing method and device and storage medium | |
CN108734078B (en) | Image processing method, image processing apparatus, electronic device, storage medium, and program | |
AU2013273829A1 (en) | Time constrained augmented reality | |
US11922720B2 (en) | Perspective distortion correction on faces | |
WO2022152116A1 (en) | Image processing method and apparatus, device, storage medium, and computer program product | |
CN106447756B (en) | Method and system for generating user-customized computer-generated animations | |
CN112766215A (en) | Face fusion method and device, electronic equipment and storage medium | |
US11631154B2 (en) | Method, apparatus, device and storage medium for transforming hairstyle | |
US20220207917A1 (en) | Facial expression image processing method and apparatus, and electronic device | |
CN111107264A (en) | Image processing method, image processing device, storage medium and terminal | |
CN110619670A (en) | Face interchange method and device, computer equipment and storage medium | |
WO2023272495A1 (en) | Badging method and apparatus, badge detection model update method and system, and storage medium | |
WO2021179751A1 (en) | Image processing method and system | |
CN114519754A (en) | Picture generation method, device, equipment and storage medium | |
CN112258435A (en) | Image processing method and related product | |
CN112488909A (en) | Multi-face image processing method, device, equipment and storage medium | |
CN112583976B (en) | Graphic code display method, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||