WO2023217046A1 - Image processing method and apparatus, non-volatile readable storage medium, and electronic device - Google Patents

Image processing method and apparatus, non-volatile readable storage medium, and electronic device Download PDF

Info

Publication number
WO2023217046A1
Authority
WO
WIPO (PCT)
Prior art keywords
image data
target area
image
neural network
network model
Prior art date
Application number
PCT/CN2023/092576
Other languages
English (en)
French (fr)
Inventor
杨勇杰
林崇仰
王进
Original Assignee
虹软科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 虹软科技股份有限公司 filed Critical 虹软科技股份有限公司
Publication of WO2023217046A1 publication Critical patent/WO2023217046A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Definitions

  • the present application relates to the field of image processing technology, specifically, to an image processing method and device, a non-volatile readable storage medium, and electronic equipment.
  • the camera is one of the most important sensors in a smart terminal. Using the front camera to take selfies and the rear camera to photograph other people and scenery, whether for storage or sharing, has become an important part of many people's daily lives. At present, most cameras and applications in smart terminals acquire and process image or video data based on only one of the front camera or the rear camera; they do not effectively exploit the respective characteristics of the front and rear cameras to produce interesting algorithmic effects for portrait images or videos. Moreover, when taking a selfie, the photographer's attention is focused on himself or herself rather than on the background environment.
  • Embodiments of the present application provide an image processing method and device, a non-volatile readable storage medium, and an electronic device, to at least solve the technical problem that the background environment cannot be taken into account because the respective characteristics of the front camera and the rear camera are not effectively utilized.
  • an image processing method is provided, including: acquiring first image data of a target object and second image data of a target scene; segmenting the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data; and fusing the first target area with the second image data to generate a fused image.
  • segmenting the first image data according to the type of the first image data to obtain the first target area corresponding to the target object in the first image data includes: when the first image data is a video, segmenting the first image data using a pre-trained first neural network model to obtain the first target area corresponding to the target object; and when the first image data is a picture, segmenting the first image data using a predetermined second neural network model to obtain the first target area corresponding to the target object.
  • the first neural network model is a lightweight model using a cross-layer connection between a backbone network and a decoding network as a network structure.
  • the training method of the pre-trained first neural network model includes: obtaining a training data set, wherein the training data set includes first sample image data and a first sample target area, and the first sample target area is a target area mask map obtained from the first sample image data; and training a neural network model on the training data set to generate the first neural network model, wherein, during training of the first neural network model, a consistency constraint based on inter-frame information is applied to the first neural network model.
  • the method further includes: converting the first target area into a grayscale mask image, and smoothing the boundaries of the grayscale mask image.
  • the method further includes: obtaining the segmentation result of the previous frame of the first image data, and using the previous-frame segmentation result to perform temporal smoothing filtering on the first target area corresponding to the target object.
  • the method further includes: combining the first image data and using a pre-trained third neural network to optimize the first target area, obtaining a processed first target area.
  • the training method of the pre-trained third neural network model includes: obtaining an image with a solid-color background; performing matting processing and pre-annotation on the solid-color-background image to obtain a label mask image; and training the third neural network model using the solid-color-background image and the label mask image as sample data.
  • the method further includes: inputting the first target area into a post-processing module for post-processing.
  • fusing the first target area with the second image data to generate a fused image includes: evaluating environmental information of the second image data, and correcting the first target area according to the environmental information to obtain a corrected first target area; determining a second target area in the second image data that corresponds to the first target area; and replacing the second target area with the corrected first target area.
  • an image processing device is provided, including: an acquisition module for acquiring first image data of the target object and second image data of the target scene, wherein the devices that collect the first image data and the second image data are located on the same terminal device; a segmentation module for segmenting the first image data to obtain a first target area corresponding to the target object in the first image data; and a fusion module for fusing the first target area with the second image data to generate a fused image.
  • a non-volatile storage medium is provided, including a stored program, wherein when the program runs, the device on which the non-volatile storage medium resides is controlled to execute the above image processing method.
  • an electronic device including a memory and a processor; the processor is configured to run a program, wherein the above image processing method is executed when the program is run.
  • in the embodiments of the present application, first image data of the target object and second image data of the target scene are obtained; the first image data is segmented according to its type to obtain the first target area corresponding to the target object in the first image data; and the first target area is fused with the second image data to generate a fused image. By collecting the first image and the second image with collection devices on the same terminal device and fusing them, the front and rear cameras are fully utilized, achieving the technical effect of taking the background environment into account and thereby solving the technical problem that the background environment cannot be taken into account when the respective characteristics of the front and rear cameras are not effectively utilized.
  • Figure 1 is a schematic diagram of an optional image processing method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another optional image processing method according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an optional image segmentation process according to an embodiment of the present application.
  • Figure 4 is a schematic diagram of an optional image processing device according to an embodiment of the present application.
  • Figure 1 shows an image processing method according to an embodiment of the present application. As shown in Figure 1, the method includes the following steps (an illustrative end-to-end sketch is given after the steps):
  • Step S102: acquire first image data of the target object and second image data of the target scene;
  • Step S104: segment the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data;
  • Step S106: fuse the first target area with the second image data to generate a fused image.
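  • purely as an illustration (not part of the original disclosure), the following minimal Python sketch shows how steps S102 to S106 could be wired together; the helper functions and dummy images are hypothetical stand-ins, not the patented implementation:

```python
# Illustrative sketch only; the segmentation and fusion helpers are hypothetical
# placeholders, not the implementation disclosed in this application.
import numpy as np

def segment_target(first_image: np.ndarray) -> np.ndarray:
    """Placeholder for step S104: return a soft mask of the target object."""
    mask = np.zeros(first_image.shape[:2], dtype=np.float32)
    h, w = mask.shape
    mask[h // 4: 3 * h // 4, w // 4: 3 * w // 4] = 1.0  # dummy central region
    return mask

def fuse(foreground: np.ndarray, mask: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Step S106: alpha-blend the segmented foreground over the background."""
    alpha = mask[..., None]
    return (alpha * foreground + (1.0 - alpha) * background).astype(foreground.dtype)

# Step S102: acquire first image data (target object) and second image data (target scene).
first_image = np.full((480, 640, 3), 200, dtype=np.uint8)   # e.g. a front-camera frame
second_image = np.full((480, 640, 3), 60, dtype=np.uint8)   # e.g. a rear-camera frame
fused = fuse(first_image, segment_target(first_image), second_image)
```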
  • through the above steps, collection devices located on the same terminal device can separately collect the first image data and the second image data, and the target object contained in the first image data can be fused with the second image data, so as to make full use of the respective advantages of the front and rear cameras, achieving the technical effect of taking the background environment into account and thereby solving the technical problem that the background environment cannot be taken into account when the respective characteristics of the front camera and the rear camera are not effectively utilized.
  • it should be noted that the above image data collection devices are the front camera and the rear camera of the same terminal (for example, a mobile phone or a notebook). The collected image data containing the target object is the first image data, so the device collecting the first image data is not necessarily the front camera and may also be the rear camera; likewise, the device collecting the second image data may be either the front camera or the rear camera. The first image data and the second image data may each be a picture or a video.
  • the target object can be a portrait or another object, such as an animal or an item;
  • the target scene can be the scene where the target object is located or any virtual scene.
  • taking as an example a phone whose front camera captures a portrait image while its rear camera captures a background video, the image processing steps can be as shown in Figure 2: the captured portrait image and background video go through different processing pipelines and are then fused to obtain the fused video. On one hand, foreground and background information are collected at the same moment, so more scene information of the shooting location can be retained and restored; on the other hand, the respective advantages of the phone's front and rear cameras are fully exploited, making it convenient to observe both the imaging of the self-portrait and the information of the entire background scene, so as to obtain better framing.
  • in some embodiments of the present application, segmenting the first image data according to its type to obtain a first target area corresponding to the target object in the first image data includes: when the first image data is a video, segmenting the first image data using a pre-trained first neural network model to obtain the first target area corresponding to the target object; and when the first image data is a picture, segmenting the first image data using a predetermined second neural network model to obtain the first target area corresponding to the target object.
  • since this application does not restrict the type of data collected by the terminal, the collected first image data and second image data can be pictures or videos. In actual segmentation, different types of data have different processing goals and methods, so the type of the first image data is determined before segmentation. For video data, segmentation must maintain real-time processing while preserving accuracy, so the first neural network model can, for example, be a lightweight neural network model used to segment video frames; for picture data, which places higher demands on detail, the second neural network model can, for example, be a convolutional neural network model.
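  • as a rough illustration of this type-based dispatch, the following Python sketch selects a different segmentation routine depending on whether the input is a video or a picture; the two models here are hypothetical placeholder callables, not the models of this application:

```python
import numpy as np

def segment_first_image_data(data, video_model, picture_model):
    """Illustrative dispatch: use the lightweight video model for a sequence of frames
    and the finer picture model for a single image. Both models are assumed to be
    callables that map an image to a mask."""
    if isinstance(data, list):                      # treat a list of frames as a video
        return [video_model(frame) for frame in data]
    return picture_model(data)                      # a single picture

frame = np.zeros((240, 320, 3), dtype=np.uint8)
masks = segment_first_image_data([frame, frame],
                                 video_model=lambda f: f.mean(axis=2) > 127,
                                 picture_model=lambda f: f.mean(axis=2) > 127)
```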
  • before the type of the first image data is determined, the method further includes: performing content identification and judgment on the first image data; when the input image data is judged to belong to a scene supported by the segmentation processing, the type of the first image data is determined; when the input image data is judged not to belong to a scene supported by the segmentation processing, processing of the first image data ends.
  • the computer vision method is used to extract the target area in the image.
  • the processing flow is shown in Figure 3: input the image data and then preprocess the image.
  • the preprocessed images are then checked to determine whether they belong to scenes supported by the segmentation engine; if supported they are processed, and if not supported they are not processed.
  • the scene supported by the segmentation engine requires that the image contains a target object.
  • the type of the target object is preset by the user, and the type of target object contained in the image is obtained through the detection and recognition algorithm.
  • the distance between the target object and the collection device needs to meet the preset requirements.
  • preprocessing of the image data takes the mobile-device image data as input, converts it into the data format required by the subsequent segmentation network, and adjusts the size, color, angle, etc. of the image to obtain usable image data. Determining whether the preprocessed image belongs to a scene supported by the segmentation engine is done by a scene-discrimination convolutional neural network pre-trained on a large amount of labeled supervision data, which can quickly and accurately identify the content of the input image; only when the input image is judged to belong to a scene supported by the segmentation engine does it enter the subsequent processing flow.
  • the recognition network is a typical classification network composed of stacked convolutional, activation, and pooling layers. Considering the performance requirements of mobile devices, the number and width of the network layers are strictly limited and optimized to ensure millisecond-level operation on the device side.
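  • a minimal sketch of such a small classification stack, assuming PyTorch and illustrative layer counts and widths (the real network's configuration is not specified here), might look like:

```python
import torch
import torch.nn as nn

class SceneClassifier(nn.Module):
    """Tiny conv/activation/pooling stack in the spirit of the scene-discrimination
    network described above; layer counts and widths are assumptions."""
    def __init__(self, num_scenes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, num_scenes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = SceneClassifier()(torch.randn(1, 3, 128, 128))  # [1, 2]: supported vs. unsupported scene
```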
  • the first neural network model is a lightweight model that uses a cross-layer connection between a backbone network and a decoding network as a network structure.
  • the training method of the pre-trained first neural network model includes: obtaining a training data set, wherein the training data set includes first sample image data and a first sample target area, and the first sample target area is a target area mask map obtained from the first sample image data; and training a neural network model on the training data set to generate the first neural network model, wherein, during training of the first neural network model, a consistency constraint based on inter-frame information is applied to the first neural network model.
  • for the first neural network model, the training data set is first built by collecting a large amount of first sample image data, where each first sample image must contain both the object and the background. Since the training network and the prediction network are independent of each other, the sample image data is not restricted to the target objects and target backgrounds encountered at recognition time, but it does need to cover the categories of actual usage scenes.
  • this application does not limit the generation method of the first sample image data. It can be collected from actual scenes or through post-synthesis.
  • the first sample target area is determined through manual or automatic recognition, that is, the target area mask map is obtained as supervision information by annotating the first sample image data.
  • considering the real-time requirements of video applications, the network part adopts a lightweight model consisting of convolution, activation, pooling, and deconvolution layers, with targeted optimization of the number of network layers, the convolution types, the downsampling positions, and so on.
  • the network structure uses cross-layer connections between the backbone network and the decoding network to improve segmentation accuracy.
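  • the following is a minimal PyTorch sketch of an encoder-decoder with one cross-layer (skip) connection between the backbone and the decoder; the layer counts, widths and downsampling choices are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Minimal encoder-decoder with one cross-layer connection, in the spirit of the
    lightweight structure described above; not the actual patented network."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Conv2d(32, 1, 3, padding=1)   # 32 = 16 upsampled + 16 skipped channels

    def forward(self, x):
        e1 = self.enc1(x)                            # full-resolution backbone features
        bottleneck = self.down(e1)                   # downsampled features
        d = self.up(bottleneck)                      # decode back to full resolution
        return self.dec(torch.cat([d, e1], dim=1))   # cross-layer connection via concatenation

logits = TinySegNet()(torch.randn(1, 3, 64, 64))     # -> [1, 1, 64, 64] mask logits
```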
  • network deployment relies on instruction optimization and special equipment optimization of the running hardware platform to achieve real-time segmentation computing performance.
  • to improve the stability of results over the video timeline, an inter-frame information consistency constraint was added to the training process.
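  • one simple way such a constraint could be expressed, shown here only as an assumed sketch (a faithful implementation might also compensate for motion between frames), is to add a penalty on the difference between predictions for consecutive frames:

```python
import torch
import torch.nn.functional as F

def training_loss(pred_t, pred_t_minus_1, target_mask, consistency_weight=0.1):
    """Segmentation loss plus an inter-frame consistency term (rough sketch).
    pred_* are raw logits for consecutive frames; target_mask is the annotated mask."""
    seg_loss = F.binary_cross_entropy_with_logits(pred_t, target_mask)
    consistency = F.l1_loss(torch.sigmoid(pred_t), torch.sigmoid(pred_t_minus_1))
    return seg_loss + consistency_weight * consistency

loss = training_loss(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64),
                     torch.randint(0, 2, (1, 1, 64, 64)).float())
```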
  • the method further includes: converting the first target area into a grayscale mask image, and smoothing the boundaries of the grayscale mask image.
  • specifically, after the video is passed through the pre-trained first neural network model to obtain the corresponding first target area, the first target area is converted into a grayscale mask image whose background value is 0. Small isolated regions are then removed and the mask boundary is smoothed using image processing algorithms; introducing the mask not only helps shield noise outside the target area but also allows the region of interest to be fully extracted and used, thereby improving the segmentation result.
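  • a small OpenCV sketch of this cleanup step, with illustrative thresholds that are not taken from the disclosure, could be:

```python
import cv2
import numpy as np

def clean_mask(mask_gray: np.ndarray, min_area: int = 200) -> np.ndarray:
    """Remove small isolated regions and smooth the boundary of a grayscale mask
    (background = 0). The area threshold and blur size are illustrative."""
    binary = (mask_gray > 127).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(binary, 8)  # 8-connectivity
    keep = np.zeros_like(binary)
    for i in range(1, num):                       # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            keep[labels == i] = 255
    return cv2.GaussianBlur(keep, (7, 7), 0)      # soften the mask boundary
```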
  • the method further includes: obtaining the segmentation result of the previous frame of the first image data, and using the previous-frame segmentation result to perform temporal smoothing filtering on the first target area corresponding to the target object.
  • the segmentation result of the previous frame is used for temporal smoothing filtering, which increases inter-frame stability and ensures the continuity of the output video results.
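  • as an assumed example, temporal smoothing could be as simple as an exponentially weighted blend of the current and previous frame masks; the blend factor below is an assumption, not a value taken from the disclosure:

```python
def temporal_smooth(current_mask, previous_mask, alpha=0.7):
    """Blend the current mask with the previous frame's mask to reduce flicker."""
    if previous_mask is None:       # first frame: nothing to smooth against
        return current_mask
    return alpha * current_mask + (1.0 - alpha) * previous_mask
```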
  • the second neural network model is a convolutional neural network model that adds atrous convolution and attention mechanisms.
  • this network model is a deep convolutional network with more parameters and a more complex structure. Because the photo mode places higher demands on detail, finer standards are used when manually annotating the training data, improving the quality of the supervision data. At the same time, the computing-performance requirements in photo mode are less stringent, so structures such as dilated convolutions and attention mechanisms are added to the network to improve its analytical capability, and the depth, width, and feature-map size of the network are relaxed appropriately to meet more accurate segmentation requirements.
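  • the following PyTorch sketch combines dilated (atrous) convolutions with a simple channel-attention gate; the dilation rates, channel counts and attention design are assumptions and are not claimed to match the patented network:

```python
import torch
import torch.nn as nn

class DilatedAttentionBlock(nn.Module):
    """Sketch of a block mixing atrous convolutions with a channel-attention gate."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in (1, 2, 4)
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, 1)
        self.attention = nn.Sequential(             # squeeze-and-excitation style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        fused = self.fuse(multi_scale)
        return fused * self.attention(fused)

out = DilatedAttentionBlock()(torch.randn(1, 64, 32, 32))  # same spatial size as the input
```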
  • the deployment phase also uses instruction optimization and special equipment optimization on the operating platform to improve the model running speed.
  • the first target area may be an area where the target object is located in the first image data.
  • after the first target area corresponding to the target object is obtained using the pre-trained second neural network model, the method further includes: combining the first image data and using a pre-trained third neural network to optimize the first target area, obtaining a processed first target area.
  • the training method of the pre-trained third neural network model includes: obtaining an image with a solid color background; performing cutout processing and pre-annotation on the image with the solid color background to obtain a label mask image; using the image with the solid color background
  • the third neural network model is obtained by training the image and the label mask image as sample data.
  • the third neural network may be a Matting network (a neural network used for refined image segmentation).
  • the input to the third neural network is the output of the second neural network model and the original first image data. Due to limitations such as segmentation network resolution and downsampling, the area map obtained through the second neural network cannot obtain fine segmentation results in areas such as object edges and hair.
  • taking a Matting network as an example: based on the first target area output by the second neural network, the Matting network derives a trimap (three-class map) from the confidence values, and an attention mechanism is added to the network so that it focuses more on edges, improving edge precision. The output of the network is a mask map, i.e. the processed first target area map, which regresses the opacity at each pixel position.
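  • a minimal sketch of deriving a trimap from per-pixel confidence, with assumed threshold values, is:

```python
import numpy as np

def trimap_from_confidence(confidence: np.ndarray, lo: float = 0.1, hi: float = 0.9) -> np.ndarray:
    """Build a trimap (0 = background, 128 = unknown, 255 = foreground) from the
    per-pixel confidence of the coarse segmentation; the thresholds are assumptions."""
    trimap = np.full(confidence.shape, 128, dtype=np.uint8)
    trimap[confidence <= lo] = 0
    trimap[confidence >= hi] = 255
    return trimap
```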
  • since training the third neural network requires finer opacity values as supervision data, this application obtains training data in three steps: fixed-point collection, automatic pre-annotation, and manual correction.
  • fixed-point collection refers to building a solid-color background collection environment with adjustable ambient light to obtain more natural solid-color background data.
  • Automatic pre-annotation uses the image matting algorithm and the trained third neural network to process the solid-color background data to obtain the initial mask map. Finally, the error areas in the initial mask result are fine-tuned through manual correction to obtain the final annotation result.
  • the third neural network in the automatic pre-annotation part can iterate as the data is updated, so that the pre-annotation effect continues to improve and further gradually reduce the cost of manual annotation.
  • before the first target area and the second image data are fused to generate the fused image, the method further includes: inputting the first target area into a post-processing module for post-processing.
  • when the terminal device is mobile, such as a mobile phone, shooting is often handheld and the captured video often contains a certain amount of shake. A video stabilization module can be used to reduce or eliminate the shake in the video through hardware and software techniques, making the captured video more stable; whenever the target image type includes video, this application adds a stabilization module for anti-shake processing.
  • limited by the computing power and memory of a phone, semantic segmentation algorithms based on neural networks can only process images of relatively small resolution, generally no larger than 512x512 pixels, which is much smaller than the original high-definition (720P), ultra-high-definition (1080P) or even 4K resolutions. The usual approach is to first downsample the image to the neural network's resolution and then upsample the result back to the original resolution. However, the directly upsampled result is often misaligned with the original-resolution image at boundaries, and because of the scaling, details of the small image are lost, so the segmentation result is inaccurate for details that cannot be represented at the small size. For this reason, this application adopts a post-processing module that, based on the segmentation results of the first, second, or third neural network, uses an edge-preserving filter algorithm to compute a portrait foreground area that aligns more accurately with the boundaries of the original-resolution image and is richer in detail.
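  • one possible edge-preserving refinement, sketched below under the assumption that OpenCV's contrib guided filter is available (the actual filter used by the application is not specified), is:

```python
import cv2
import numpy as np

def refine_mask_to_full_resolution(small_mask: np.ndarray, full_image: np.ndarray) -> np.ndarray:
    """Upsample a low-resolution mask and refine it with an edge-preserving (guided)
    filter so it aligns better with object boundaries at the original resolution.
    Requires opencv-contrib-python; the radius/eps values are illustrative."""
    h, w = full_image.shape[:2]
    upsampled = cv2.resize(small_mask.astype(np.float32), (w, h), interpolation=cv2.INTER_LINEAR)
    guide = full_image.astype(np.float32) / 255.0
    return cv2.ximgproc.guidedFilter(guide, upsampled, 8, 1e-3)
```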
  • fusing the first target area with the second image data to generate a fused image includes: evaluating the environmental information of the second image data and correcting the first target area according to the environmental information to obtain a corrected first target area; determining the second target area in the second image data that corresponds to the first target area; and replacing the second target area with the corrected first target area.
  • the fusion is designed, according to the respective characteristics of the foreground and background, to make the result as natural as possible, including preserving the naturalness of the foreground skin tone and adjusting the foreground's tone to adapt more naturally to the tone of the background, so that the fused image visually forms a complete whole and matting traces are reduced.
  • This application does not limit the types of fusion objects, and can realize the fusion of image foreground and image background, image foreground and video background, video foreground and video background, and video foreground and image background.
  • for example, with the portrait foreground area as the first target area and the background video as the second image data, the output is the result video in which the portrait is fused with the background video.
  • because the positions and image quality of the foreground and background target objects differ, direct replacement in the fusion module would make the final result inconsistent. Therefore this application first evaluates the environmental information of the background, including the direction, intensity, and color of the light, and then corrects the foreground accordingly to ensure uniform image quality between the foreground and background. As for placement, the foreground area is generally placed at the center of the background area; in addition, if any one or more of the top, bottom, left, or right boundaries of the foreground image touch the image boundary, the fusion result preserves this property.
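  • the following sketch illustrates the general idea of correcting the foreground to the background's environment before compositing; it only matches mean color and brightness, whereas the application also considers light direction and intensity, and the specific operations here are assumptions:

```python
import numpy as np

def fuse_with_tone_match(foreground, alpha, background):
    """Roughly match the foreground's mean colour to the background, then alpha-composite.
    Assumes foreground, alpha and background have the same spatial size."""
    fg = foreground.astype(np.float32)
    bg = background.astype(np.float32)
    fg_mean = fg.reshape(-1, 3).mean(axis=0)
    bg_mean = bg.reshape(-1, 3).mean(axis=0)
    corrected = np.clip(fg * (bg_mean / np.maximum(fg_mean, 1e-6)), 0, 255)
    a = alpha.astype(np.float32)[..., None]
    return (a * corrected + (1.0 - a) * bg).astype(np.uint8)
```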
  • in practical applications, the region of the collected background image to be fused is not always a pure background; occlusions may exist, for example non-target people or objects appearing in the region. To handle such occlusions, this application also performs target recognition on the background image and prompts the user when the recognition result contains an occluding target.
  • if the objects to be fused include video, the fusion involves a huge amount of computation. To preserve accuracy while maintaining real-time processing, the corresponding objects are simplified: the overall pipeline is optimized, only lightweight network segmentation is performed instead of using fine-level networks, and each module is simplified accordingly, for example by reducing or even completely removing post-processing and further compressing the lightweight networks that are used.
  • the embodiment of the present application also provides an image processing device, as shown in Figure 4, including: an acquisition module 40, used to acquire the first image data of the target object and the second image data of the target scene; a segmentation module 42, used to segment the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data; and a fusion module 44, used to fuse the first target area with the second image data to generate a fused image.
  • the segmentation module 42 includes a first processing sub-module and a judgment sub-module. The judgment sub-module is used to determine the type of the first image data: when the first image data is a video, a pre-trained first neural network model is used to segment the first image data to obtain the first target area corresponding to the target object; when the first image data is a picture, a predetermined second neural network model is used to segment the first image data to obtain the first target area corresponding to the target object.
  • the judgment sub-module includes: a judgment unit, a first training unit, and a conversion unit;
  • the judgment unit is used to perform content identification and judgment on the first image data: when the input image data is judged to belong to a scene supported by the segmentation processing, the type of the first image data is determined; when the input image data is judged not to belong to a scene supported by the segmentation processing, processing of the first image data ends.
  • the first training unit is used to obtain a training data set, wherein the training data set includes first sample image data and a first sample target area, and the first sample target area is a target area mask map obtained from the first sample image data; and to train a neural network model on the training data set to generate the first neural network model, wherein, during training of the first neural network model, a consistency constraint based on inter-frame information is applied to the first neural network model.
  • the conversion unit is used to convert the first target area into a grayscale mask image; and smooth the boundary of the grayscale mask image.
  • the first processing sub-module is used to combine the first image data and use a pre-trained third neural network to optimize the first target area to obtain a processed first target area.
  • the first processing sub-module includes: a second training unit;
  • the second training unit is used to obtain an image with a solid-color background; perform matting processing and pre-annotation on the solid-color-background image to obtain a label mask image; and train the third neural network model using the solid-color-background image as sample data and the label mask image as supervision data.
  • the fusion module 44 includes: a second processing sub-module and a generation sub-module; the second processing sub-module is used to input the first target area into a post-processing module for post-processing.
  • the generation sub-module is used to evaluate the environmental information of the second image data and correct the first target area according to the environmental information to obtain a corrected first target area; determine the second target area in the second image data that corresponds to the first target area; and replace the second target area with the corrected first target area.
  • a non-volatile storage medium includes a stored program, wherein when the program is running, the device where the non-volatile storage medium is located is controlled to execute the above Image processing methods.
  • the above non-volatile storage medium is used to store a program that performs the following functions: acquiring a first image of a target object and a second image of a target scene; segmenting the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image; and fusing the first target area with the second image to generate a fused image.
  • an electronic device including a memory and a processor; the processor is configured to run a program, wherein the above image processing method is executed when the program is run.
  • the above processor is used to run a program that performs the following functions: acquiring a first image of a target object and a second image of a target scene; segmenting the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image; and fusing the first target area with the second image to generate a fused image.
  • the disclosed technical content can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of units may be a logical functional division; in actual implementation there may be other ways of dividing them, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the units or modules may be in electrical or other forms.
  • Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed over multiple units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • Integrated units may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.
  • the part of the technical solution of the present application that is essential or that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, read-only memory (ROM), random access memory (RAM), removable hard disks, magnetic disks, optical discs, and other media that can store program code.
  • the solution provided by the embodiment of the present application can be applied to the field of image processing technology.
  • in the embodiments of the present application, first image data of the target object and second image data of the target scene are obtained; the first image data is segmented according to its type to obtain a first target area corresponding to the target object in the first image data; and the first target area is fused with the second image data to generate a fused image, achieving the technical effect of making full use of the camera's front and rear cameras when taking pictures while taking the background environment into account.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses an image processing method and apparatus, a non-volatile readable storage medium, and an electronic device. The method includes: acquiring first image data of a target object and second image data of a target scene; segmenting the first image data according to the type of the first image data to obtain a first target area corresponding to the target object in the first image data; and fusing the first target area with the second image data to generate a fused image.

Description

图像处理方法及装置、非易失性可读存储介质、电子设备
本申请要求于2022年5月7日递交的中国专利申请第202210493747.X号的优先权,在此全文引用上述中国专利申请公开的内容以作为本申请的一部分。
技术领域
本申请涉及图像处理技术领域,具体而言,涉及一种图像处理方法及装置、非易失性可读存储介质、电子设备。
背景技术
相机是智能终端中最重要的传感器之一,使用前置摄像头自拍,以及后置摄像头拍摄他人和风景,无论是存储还是分享,都已经成为当代许多人日常生活中最重要的组成部分。目前绝大多数的智能终端中相机及应用,都只基于前置相机或后置相机之一,来获取图像或视频数据及处理,缺乏有效利用前置相机和后置相机各自的特性,做出有趣的人像图像或视频算法特效。并且在自拍的时候,拍摄者的注意力会关注在自身上,而无法兼顾背景环境。
针对上述的问题,目前尚未提出有效的解决方案。
发明内容
本申请实施例提供了一种图像处理方法及装置、非易失性可读存储介质、电子设备,以至少解决由于缺乏有效利用前置相机和后置相机各自的特性造成的无法兼顾背景环境的技术问题。
根据本申请实施例的一个方面,提供了一种图像处理方法,包括:获取目标对象的第一图像数据以及目标场景的第二图像数据;根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域;将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像。
可选地,根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域,包括:当所述第一图像数据是视频,利用预先训练的第一神经网络模型对所述第一图像数据进行分割处理,以得到所述目标对象对应的第一目标区域;当所述第一图像数据是图片,利用预先确定的第二神经网络模型对所述第一图像数据进行分割处理,得到所述目标对象对应的第 一目标区域。
可选地,对所述第一图像数据进行类别判断之前,所述方法还包括:对所述第一图像数据进行内容识别和判断;当输入的图像数据被判定为属于分割处理所支持的场景时,所述第一图像数据进行类别判断;当输入的图像数据被判定为不属于分割处理所支持的场景时,所述第一图像数据结束处理。
可选地,所述第一神经网络模型为采用骨干网络和解码网络跨层连接的方式作为网络结构的轻量级模型。
可选地,所述预先训练的第一神经网络模型的训练方法包括:获取训练数据集,其中,所述训练数据集包括:第一样本图像数据和第一样本目标区域,其中,所述第一样本目标区域为基于所述第一样本图像数据获得目标区域掩码图;基于所述训练数据集训练神经网络模型,生成所述第一神经网络模型,其中,在训练所述第一神经网络模型的过程中,基于帧间信息对所述第一神经网络模型进行一致性约束。
可选地,在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后,所述方法还包括:将所述第一目标区域转化为灰度掩码图,并将所述灰度掩码图的边界进行平滑。
可选地,在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后,所述方法还包括:获取所述第一图像数据的前帧分割结果,并利用所述前帧分割结果对所述目标对应的所述第一目标区域进行时域平滑滤波。
可选地,所述第二神经网络模型为加入空洞卷积采用和注意力机制的卷积神经网络模型。
可选地,在利用预先训练的第二神经网络模型得到所述目标对象对应的第一目标区域之后,所述方法还包括:结合所述第一图像数据,并利用预先训练的第三神经网络对所述第一目标区域进行优化处理,得到处理后的第一目标区域。
可选地,所述预先训练的第三神经网络模型的训练方法包括:获取纯色背景的图像;对所述纯色背景的图像进行抠图处理和预标注,得到标签蒙版图像;以所述纯色背景的图像和所述标签蒙版图像作为样本数据训练得到所述第三神经网络模型。
可选地,将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像之前,所述方法还包括:将所述第一目标区域输入后处理模块进行后处理。
可选地,将所述第一目标区域与所述第二图像进行融合处理,生成融合后的图像,包括:评估所述第二图像数据的环境信息,根据所述环境信息校正所述第一目标区域,获得校正后的第一目标区域;确定所述第一目标区域在所述第二图像数据中对应的第 二目标区域;将所述第二目标区域替换为所述校正后的第一目标区域。
根据本申请实施例的另一方面,还提供了一种图像处理装置,包括:获取模块,用于获取目标对象的第一图像数据以及目标场景的第二图像数据,其中,采集所述第一图像数据和所述第二图像数据的采集装置位于同一个终端设备上;分割模块,用于对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域;融合模块,用于将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像。
根据本申请实施例的再一方面,还提供了一种非易失性存储介质,非易失性存储介质包括存储的程序,其中,在程序运行时控制非易失性存储介质所在设备执行上述图像的处理方法。
根据本申请实施例的再一方面,还提供了一种电子设备,包括存储器和处理器;处理器用于运行程序,其中,程序运行时执行上述图像的处理方法。
在本申请实施例中,采用获取目标对象的第一图像数据以及目标场景的第二图像数据;根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域;将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像的方式,通过同一终端设备上的采集装置分别采集第一图像和第二图像并将第一图像、第二图像进行融合,达到了充分利用前置与后置摄像头的目的,从而实现了兼顾背景环境的技术效果,进而解决了由于缺乏有效利用前置相机和后置相机各自的特性造成的无法兼顾背景环境技术问题。
附图说明
此处所说明的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:
图1是根据本申请实施例的一种可选的图像处理方法的示意图;
图2是根据本申请实施例的另一种可选的图像处理方法的示意图;
图3是根据本申请实施例的一种可选的图像分割处理流程示意图;
图4是根据本申请实施例的一种可选的图像处理装置示意图。
具体实施方式
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本申请保护的范围。
需要说明的是,本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
根据本申请实施例,提供了一种图像的处理的方法实施例,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。
图1是根据本申请实施例的一种图像的处理方法,如图1所示,该方法包括如下步骤:
步骤S102,获取目标对象的第一图像数据以及目标场景的第二图像数据;
步骤S104,根据第一图像数据的类型对第一图像数据进行分割处理,得到第一图像数据中与目标对象对应的第一目标区域;
步骤S106,将第一目标区域与第二图像数据进行融合处理,生成融合后的图像。
通过上述步骤,可以实现位于同一终端设备上的采集装置分别采集第一图像数据和第二图像数据,并将第一图像数据包含的目标对象与第二图像数据进行融合,达到了充分利用前置与后置摄像头优势的目的,从而实现了兼顾背景环境的技术效果,进而解决了由于缺乏有效利用前置相机和后置相机各自的特性造成的无法兼顾背景环境技术问题。
需要进行说明的是,上述图像数据的采集装置是同一终端(例如,手机和笔记本)的前置摄像头和后置摄像头,采集的图像数据中包含目标对象的即为第一图像数据,故第一图像数据采集装置并非固定为前置摄像头,亦可以为后置摄像头,同理,第二图像数据的采集装置为前置摄像头或后置摄像头;第一图像数据和第二图像数据可以 是图片也可以是视频;目标对象可以是人像也可以是其他对象,例如动物和物品;目标场景可以为目标对象所处的场景或任意虚拟场景。
具体的,以利用手机的前置摄像头拍摄人像图像、后置摄像头拍摄背景视频为例,图像处理的步骤可以如图2所示,采集的人像图像和背景视频分别经过不同的处理流程后进行融合得到融合后的视频,一方面,实现了在同一时刻同时采集了前景和背景信息,可以更多地保留和还原了拍摄地点的场景信息;另一方面,充分利用了手机前后置摄像头各自的优势,既能够方便观察自拍人像的成像情况,也方便观察整个背景场景的信息以便获取更好的取景。
在本申请的一些实施例中,根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域,包括:当所述第一图像数据是视频,利用预先训练的第一神经网络模型对所述第一图像数据进行分割处理,以得到所述目标对象对应的第一目标区域;当所述第一图像数据是图片,利用预先确定的第二神经网络模型对所述第一图像数据进行分割处理,得到所述目标对象对应的第一目标区域。
需要进行说明的是,由于本申请不限制终端采集数据的类型,故采集的第一图像数据和第二图像数据可以是图片也可以是视频,在实际分割中针对不同类型的数据在处理上目标和方法不同,故在对图像数据进行分割处理之前,对上述第一图像数据进行类别判断。针对视频数据的分割处理,不仅要保证精度还要同时维持实时处理,示例性的,第一神经网络模型可以为轻量级神经网络模型,用于对视频图像进行分割处理;针对图片数据的分割处理,由于图像对细节提出更高要求,示例性的,第二神经网络模型可以为卷积神经网络模型。
在本申请的一些实施例中,对所述第一图像数据进行类别判断之前,所述方法还包括:对所述第一图像数据进行内容识别和判断;当输入的图像数据被判定为属于分割处理所支持的场景时,所述第一图像数据进行类别判断;当输入的图像数据被判定为不属于分割处理所支持的场景时,所述第一图像数据结束处理。
具体的,以移动设备的图像数据作为输入为例,通过计算机视觉的方法,实现对图片中目标区域的提取,处理流程如图3所示:输入图像数据然后对图像进行预处理,对预处理过后的图像进行辨别是否属于分割引擎所支持的场景,若支持则进行处理,若不支持则不进行处理。首先,分割引擎支持的场景要求图像包含目标对象,其中,目标对象的种类是用户预设,并通过检测识别算法获取图像包含的目标对象的种类;其次,目标对象距离采集装置的距离需要满足预设条件,若目标对象与采集装置之间的距离超过预设范围,过近会导致采集的目标对象不全,例如只有部分五官无法形成合照,过远会导致采集的目标对象细节损失严重,严重影响后续融合效果,故无法启 动采集装置。
需要进行说明的是,图像数据的预处理为以移动设备图像数据为输入,根据后续分割网络要求将其转换为对应的数据格式,并对图像进行尺寸、色彩、角度等方面的调整,得到可用图像数据;对预处理过后的图像进行辨别是否属于分割引擎所支持的场景是通过基于海量带标签监督数据预先训练的场景判别卷积神经网络构成,能够快速准确的对输入图像的内容进行识别,仅当输入图像被判定为属于分割引擎所支持的场景时,才进入后续处理流程。识别网络是由卷积层、激活层、池化层堆叠组成的典型分类网络,考虑到移动设备端性能要求,网络层数和宽度上都做了严格的限制和优化,保证设备端毫秒级运行。
在本申请的一些实施例中,所述第一神经网络模型为采用骨干网络和解码网络跨层连接的方式作为网络结构的轻量级模型。所述预先训练的第一神经网络模型的训练方法包括:获取训练数据集,其中,所述训练数据集包括:第一样本图像数据和第一样本目标区域,其中,所述第一样本目标区域为基于所述第一样本图像数据获得目标区域掩码图;基于所述训练数据集训练神经网络模型,生成所述第一神经网络模型,其中,在训练所述第一神经网络模型的过程中,基于帧间信息对所述第一神经网络模型进行一致性约束。
具体的,对于第一神经网络模型,其训练数据集的建立首先通过搜集大量的第一样本图像数据,其中,第一样本图像数据需同时包含对象和背景的合影,由于训练网络和预测网络是相互独立的,样本图像数据并不限制对象和场景为识别时的目标对象和目标背景,但是样本图像数据需要覆盖实际使用场景类别。此外,本申请并不限制第一样本图像数据的生成方式,可以实际场景采集亦可以为通过后期合成。获取第一样本图像数据以后,通过人工或自动识别确定第一样本目标区域,即将通过对第一样本图像数据进行标注得到目标区域掩码图作为监督信息。考虑到视频应用的实时性要求,网络部分采用了轻量级的模型,有卷积层、激活层、池化层、反卷积组成。在网络层数,卷积类型,降采样位置等方面做了针对性优化。网络结构上采用骨干网络和解码网络跨层连接的方式,提高分割精度。同时,网络部署中借助运行硬件平台的指令优化、专用设备优化,达到实时分割的计算性能。训练过程中,为了提高视频时序上结果的稳定性,训练过程中加入了帧间信息一致性约束。
在本申请的一些实施例中,在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后,所述方法还包括:将所述第一目标区域转化为灰度掩码图;并将所述灰度掩码图的边界进行平滑。
具体的,视频通过预先训练的第一神经网络模型得到对应的第一目标区域之后,将第一目标区域转换为背景为0值的灰度掩码图。其次,通过图像处理算法去除小的 孤立区域,平滑掩码边界,掩码的引入不仅有利于屏蔽目标区域外的噪声,可以充分提取和利用感兴趣的区域,从而优化分割结果。
在本申请的一些实施例中,在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后,所述方法还包括:获取所述第一图像数据的前帧分割结果,并利用所述前帧分割结果对所述目标对应的第一目标区域进行时域平滑滤波。具体的,利用前帧的分割结果进行时域平滑滤波,增加了帧间稳定性,保证了输出视频结果的连续性。
在本申请的一些实施例中,所述第二神经网络模型为加入了空洞卷积和注意力机制的卷积神经网络模型。
具体的,对于第二神经网络模型,该网络模型由参数量更多、结构更加复杂的深度卷积网络构成。考虑到拍照模式下对细节提出了更高的要求,在训练数据人工标注时就采用更加精细的标准,提高了监督数据的质量。同时拍照模式下对算力性能的要求降低,因此在网络部分适当的增加了空洞卷积、注意力机制等结构来提高网络解析能力。网络的深度、宽度、特征图大小上也做了适当放宽,从而达到更加精准的分割需求。部署阶段同样借助运行平台上的指令优化、专用设备优化,提升模型运行速度。
需要进行说明的是,第一目标区域可以是目标对象在第一图像数据中所在的区域。
在本申请的一些实施例中,在利用预先训练的第二神经网络模型得到所述目标对象对应的第一目标区域之后,所述方法还包括:结合所述第一图像数据,并利用预先训练的第三神经网络对所述第一目标区域进行优化处理,得到处理后的第一目标区域。
其中,所述预先训练的第三神经网络模型的训练方法包括:获取纯色背景的图像;对所述纯色背景的图像进行抠图处理和预标注,得到标签蒙版图像;以所述纯色背景的图像和所述标签蒙版图像为样本数据训练得到所述第三神经网络模型。
需要进行说明的是,如图3所示,第三神经网络可以是Matting网络(一种用于图像精细化分割的神经网络)。
具体的,第三神经网络中输入为第二神经网络模型的输出,以及原始第一图像数据。由于分割网络分辨率和降采样等限制,通过第二神经网络获得的区域图在物体边缘、毛发等区域无法获得精细的分割结果。以Matting网络为例说明上述步骤,在第二神经网络输出的第一目标区域的基础上,Matting网络根据置信度获取trimap图(三分图),同时在网络中加入注意力机制,能够使网络更加专注于边缘,提高边缘精度。网络的输出是一张蒙版图,即为处理过后的第一目标区域图,它是对每个像素位置不透明度的回归。
由于第三神经网络训练需要更加精细透明度作为监督数据,为此,本申请采用定点采集、自动预标注、人工修正三个步骤来获取训练数据。具体的,定点采集指搭建纯色背景采集环境,其环境光可调,获取较为自然的纯色背景数据。自动预标注采用图像抠图算法和训练的第三神经网络对纯色背景数据进行处理,得到初始蒙版图。最后,通过人工修正方式对初始蒙版图结果中错误区域进行微调,得到最终标注结果。在实施过程中,自动预标注部分的第三神经网络可以随着数据更新进行迭代,使得预标注效果不断提升,进一步逐渐降低人工标注成本。
在本申请的一些实施例中,将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像之前,所述方法还包括:将所述第一目标区域输入后处理模块进行后处理。
需要进行说明的是,在终端设备是可移动的情况下,例如手机,往往是手持设备拍摄,所拍摄的视频往往包含一定的抖动,可以利用视频防抖模块通过硬件及软件技术,减轻或消除视频中的抖动,使得拍摄的视频更加稳定。本申请针对目标图像种类包含视频的,均会加入防抖模块进行防抖处理。
在一些可选的方式中,由于受手机算力及内存的限制,基于神经网络的语义分割算法只能处理较小分辨率的图像,一般不大于512x512像素,而这相对于初始图像高清(720P)、超清(1080P)甚至4K等大分辨率,往往要小得多。一般的处理方式,是先将图像下采样至神经网络分辨率,获得结果后再上采样至原始分辨率。而直接上采样后的结果和原始分辨率在边界上往往是对不齐的,且由于存在缩放的关系,会造成小图的细节丢失,再进行上采样后,对于小图上无法显现的细节获得的分割结果也是不准确的。基于此,本申请采用后处理模块,基于第一神经网络、第二神经网络或第三神经网络的分割结果,通过保持边缘滤波器算法计算出和原始分辨率图像边界对齐更准确,以及细节更丰富的人像前景区域。
在本申请的一些实施例中,将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像,包括:估所述第二图像数据的环境信息,根据所述环境信息校正所述第一目标区域,获得校正后的第一目标区域;确定所述第一目标区域在所述第二图像数据中对应的第二目标区域;将所述第二目标区域替换为所述校正后的第一目标区域。
融合是根据前背景各自的特点实现使得结果尽可能自然,包括保留前景肤色的自然性,且调整前景的色调能够更自然地适应背景的色调,使得融合后的图像视觉上形成一个完整的整体,而减少抠图痕迹。本申请不限制融合的对象种类,可以实现包括图像前景和图像背景、图像前景和视频背景、视频前景和视频背景以及视频前景和图像背景的融合。示例性的,以人像前景区域为第一目标区域和背景视频结果为第二图 像,将输出的人像与背景视频融合的结果视频。
具体的,由于前景和背景目标对象位置和图像质量不同,融合模块中直接替换将导致最终结果不协调。故本申请会首先评估背景的环境信息,包括光照的方向、强度、颜色等信息,再对前景进行相应的校正,保证前背景图像的图像质量统一。其次,针对放置位置,前景区域放置于背景区域的位置一般放置于中心,此外,前景图像的上、下、左、右任意一条或多条边界接触到图像边界,融合结果也会保留该特性。在实际的应用中,采集的背景图像待融合区域并不是纯背景,会存在有遮挡的情况,例如区域内出现非目标对象的人和物等,针对上述区域图像存在遮挡情况,本申请还包括对背景图像进行目标识别,当识别结果包含遮挡目标时,则对用户进行提示。
此外,若待融合的对象包括视频,融合计算涉及的计算量巨大,为了不仅要保证精度还要同时维持实时处理,对对应的对象进行简化处理,包括优化整体流程,只进行轻量级网络分割,而不使用精细级的网络,以及对各模块都进行相应的简化,比如减少甚至完全去除后处理,对使用的轻量级网络进一步压缩模型。
本申请实施例还提供了一种图像处理装置,如图4所示,包括:获取模块40,用于获取目标对象的第一图像数据以及目标场景的第二图像数据;分割模块42,用于根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域;融合模块44,将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像。
分割模块42包括:第一处理子模块;判断子模块用于对所述第一图像数据进行类别判断;当所述第一图像数据是视频,利用预先训练的第一神经网络模型对所述第一图像数据进行分割处理,以得到所述目标对象对应的第一目标区域;当所述第一图像数据是图片,利用预先确定的第二神经网络模型对所述第一图像数据进行分割处理,得到所述目标对象对应的第一目标区域。
判断子模块包括:判断单元、第一训练单元、转化单元;
判断单元用于对所述第一图像数据进行内容识别和判断,当输入的图像数据被判定为属于分割处理所支持的场景时,所述第一图像数据进行类别判断;当输入的图像数据被判定为不属于分割处理所支持的场景时,所述第一图像数据结束处理。
第一训练单元用于获取训练数据集,其中,所述训练数据集包括:第一样本图像数据和第一样本目标区域,其中,所述第一样本目标区域为基于所述第一样本图像数据获得目标区域掩码图;基于所述训练数据集训练神经网络模型,生成所述第一神经网络模型,其中,在训练所述第一神经网络模型的过程中,基于帧间信息对所述第一神经网络模型进行一致性约束。
转化单元用于将所述第一目标区域转化为灰度掩码图;并将所述灰度掩码图的边界进行平滑。
第一处理子模块用于结合所述第一图像数据,并利用预先训练的第三神经网络对所述第一目标区域进行优化处理,得到处理后的第一目标区域。
第一处理子模块包括:第二训练单元;
第二训练单元用于获取纯色背景的图像;对所述纯色背景的图像进行抠图处理和预标注,得到标签蒙版图像;以所述纯色背景的图像为样本数据,所述标签蒙版图像作为所述监督数据训练得到所述第三神经网络模型。
融合模块44包括:第二处理子模块和生成子模块;第二处理子模块用于将所述第一目标区域输入后处理模块进行后处理。
生成子模块用于评估所述第二图像数据的环境信息,根据所述环境信息校正所述第一目标区域,获得校正后的第一目标区域;
确定所述第一目标区域在所述第二图像数据中对应的第二目标区域;
将所述第二目标区域替换为所述校正后的第一目标区域。
根据本申请实施例的再一方面,还提供了一种非易失性存储介质,非易失性存储介质包括存储的程序,其中,在程序运行时控制非易失性存储介质所在设备执行上述图像的处理方法。
上述非易失性存储介质用于存储执行以下功能的程序:获取目标对象的第一图像以及目标场景的第二图像;根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到第一图像中与目标对象对应的第一目标区域;将第一目标区域与第二图像进行融合处理,生成融合后的图像。
根据本申请实施例的再一方面,还提供了一种电子设备,包括存储器和处理器;处理器用于运行程序,其中,程序运行时执行上述图像的处理方法。
上述处理器用于运行执行以下功能的程序:获取目标对象的第一图像以及目标场景的第二图像;根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到第一图像中与目标对象对应的第一目标区域;将第一目标区域与第二图像进行融合处理,生成融合后的图像。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
在本申请的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有 详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的技术内容,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如单元的划分,可以为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元或模块的间接耦合或通信连接,可以是电性或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
以上仅是本申请的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本申请原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本申请的保护范围。
工业实用性
本申请实施例提供的方案可应用于图像处理技术领域,在本申请实施例中,采用获取目标对象的第一图像数据以及目标场景的第二图像数据;根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域;将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像,取得了在拍照时充分利用相机的前置与后置摄像头,兼顾背景环境的技术效果。

Claims (15)

  1. 一种图像的处理方法,包括:
    获取目标对象的第一图像数据以及目标场景的第二图像数据;
    根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域;
    将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像。
  2. 根据权利要求1所述的方法,其中,根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域,包括:
    当所述第一图像数据是视频,利用预先训练的第一神经网络模型对所述第一图像数据进行分割处理,以得到所述目标对象对应的第一目标区域;
    当所述第一图像数据是图片,利用预先确定的第二神经网络模型对所述第一图像数据进行分割处理,得到所述目标对象对应的第一目标区域。
  3. 根据权利要求1所述的方法,其中,对所述第一图像数据进行类别判断之前,所述方法还包括:
    对所述第一图像数据进行内容识别和判断;
    当输入的所述第一图像数据被判定为属于分割处理所支持的场景时,对所述第一图像数据进行类别判断;
    当输入的所述第一图像数据被判定为不属于分割处理所支持的场景时,所述第一图像数据结束处理。
  4. 根据权利要求2所述的方法,其中,所述第一神经网络模型为采用骨干网络和解码网络跨层连接的方式作为网络结构的轻量级模型。
  5. 根据权利要求2所述的方法,其中,所述预先训练的第一神经网络模型的训练方法包括:
    获取训练数据集,其中,所述训练数据集包括:第一样本图像数据和第一样本目标区域,其中,所述第一样本目标区域为基于所述第一样本图像数据获得目标区域掩码图;
    基于所述训练数据集训练神经网络模型,生成所述第一神经网络模型,其中,在训练所述第一神经网络模型的过程中,基于帧间信息对所述第一神经网络模型进行一致性约束。
  6. 根据权利要求2所述的方法,其中,在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后,所述方法还包括:
    将所述第一目标区域转化为灰度掩码图,并将所述灰度掩码图的边界进行平滑。
  7. 根据权利要求2所述的方法,其中,在利用预先训练的第一神经网络模型得到所述目标对象对应的第一目标区域之后,所述方法还包括:
    获取所述第一图像数据的前帧分割结果,并利用所述前帧分割结果对所述目标对应的所述第一目标区域进行时域平滑滤波。
  8. 根据权利要求2所述的方法,其中,所述第二神经网络模型为加入空洞卷积和注意力机制的卷积神经网络模型。
  9. 根据权利要求2所述的方法,其中,在利用预先训练的第二神经网络模型得到所述目标对象对应的第一目标区域之后,所述方法还包括:
    结合所述第一图像数据,并利用预先训练的第三神经网络对所述第一目标区域进行优化处理,得到处理后的第一目标区域。
  10. 根据权利要求9所述的方法,其中,所述预先训练的第三神经网络模型的训练方法,包括:
    获取纯色背景的图像;
    对所述纯色背景的图像进行抠图处理和预标注,得到标签蒙版图像;
    以所述纯色背景的图像和所述标签蒙版图像作为样本数据训练得到所述第三神经网络模型。
  11. 根据权利要求1或9所述的方法,其中,将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像之前,所述方法还包括:
    将所述第一目标区域输入后处理模块进行后处理。
  12. 根据权利要求1所述的方法,其中,将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像,包括:
    评估所述第二图像数据的环境信息,根据所述环境信息校正所述第一目标区域,获得校正后的第一目标区域;
    确定所述第一目标区域在所述第二图像数据中对应的第二目标区域;
    将所述第二目标区域替换为所述校正后的第一目标区域。
  13. 一种图像处理装置,包括:
    获取模块,设置为获取目标对象的第一图像数据以及目标场景的第二图像数据;
    分割模块,设置为根据所述第一图像数据的类型对所述第一图像数据进行分割处理,得到所述第一图像数据中与所述目标对象对应的第一目标区域;
    融合模块,设置为将所述第一目标区域与所述第二图像数据进行融合处理,生成融合后的图像。
  14. 一种非易失性存储介质,所述非易失性存储介质包括存储的程序,其中,在所述程序运行时控制所述非易失性存储介质所在设备执行权利要求1至12中任意一项所述图像的处理方法。
  15. 一种电子设备,包括存储器和处理器;所述处理器用于运行程序,其中,所述程序运行时执行权利要求1至12中任意一项所述图像的处理方法。
PCT/CN2023/092576 2022-05-07 2023-05-06 图像处理方法及装置、非易失性可读存储介质、电子设备 WO2023217046A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210493747.X 2022-05-07
CN202210493747.XA CN114897916A (zh) 2022-05-07 2022-05-07 图像处理方法及装置、非易失性可读存储介质、电子设备

Publications (1)

Publication Number Publication Date
WO2023217046A1 true WO2023217046A1 (zh) 2023-11-16

Family

ID=82722608

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092576 WO2023217046A1 (zh) 2022-05-07 2023-05-06 图像处理方法及装置、非易失性可读存储介质、电子设备

Country Status (2)

Country Link
CN (1) CN114897916A (zh)
WO (1) WO2023217046A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897916A (zh) * 2022-05-07 2022-08-12 虹软科技股份有限公司 图像处理方法及装置、非易失性可读存储介质、电子设备
CN115760986B (zh) * 2022-11-30 2023-07-25 北京中环高科环境治理有限公司 基于神经网络模型的图像处理方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553923A (zh) * 2019-04-01 2020-08-18 上海卫莎网络科技有限公司 一种图像处理方法、电子设备及计算机可读存储介质
CN111581568A (zh) * 2020-03-25 2020-08-25 中山大学 一种网页端人物换背景的方法
CN111629212A (zh) * 2020-04-30 2020-09-04 网宿科技股份有限公司 一种对视频进行转码的方法和装置
CN112419328A (zh) * 2019-08-22 2021-02-26 北京市商汤科技开发有限公司 图像处理方法及装置、电子设备和存储介质
CN114897916A (zh) * 2022-05-07 2022-08-12 虹软科技股份有限公司 图像处理方法及装置、非易失性可读存储介质、电子设备


Also Published As

Publication number Publication date
CN114897916A (zh) 2022-08-12

Similar Documents

Publication Publication Date Title
CN111402135B (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
WO2023217046A1 (zh) 图像处理方法及装置、非易失性可读存储介质、电子设备
US9692964B2 (en) Modification of post-viewing parameters for digital images using image region or feature information
Liu et al. HoLoCo: Holistic and local contrastive learning network for multi-exposure image fusion
Yang et al. Seeing deeply and bidirectionally: A deep learning approach for single image reflection removal
EP3937481A1 (en) Image display method and device
US9129381B2 (en) Modification of post-viewing parameters for digital images using image region or feature information
WO2021022983A1 (zh) 图像处理方法和装置、电子设备、计算机可读存储介质
CN106899781B (zh) 一种图像处理方法及电子设备
JP4865038B2 (ja) 顔検出と肌の色合いの情報を用いたデジタル画像処理
US8983202B2 (en) Smile detection systems and methods
US20150063692A1 (en) Image capture device with contemporaneous reference image capture mechanism
WO2018136373A1 (en) Image fusion and hdr imaging
CN107368806B (zh) 图像矫正方法、装置、计算机可读存储介质和计算机设备
CN113888437A (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
JP2005303991A (ja) 撮像装置、撮像方法、及び撮像プログラム
CN103079034A (zh) 一种感知拍摄方法及系统
CN111986129A (zh) 基于多摄图像融合的hdr图像生成方法、设备及存储介质
CN111967319B (zh) 基于红外和可见光的活体检测方法、装置、设备和存储介质
WO2022261828A1 (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
WO2021179764A1 (zh) 图像处理模型生成方法、处理方法、存储介质及终端
WO2023066173A1 (zh) 图像处理方法、装置及存储介质、电子设备
CN112804464A (zh) 一种hdr图像生成方法、装置、电子设备及可读存储介质
CN112258380A (zh) 图像处理方法、装置、设备及存储介质
CN110365897B (zh) 图像修正方法和装置、电子设备、计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23802818

Country of ref document: EP

Kind code of ref document: A1