CN109816663B - Image processing method, device and equipment - Google Patents

Image processing method, device and equipment

Info

Publication number
CN109816663B
CN109816663B
Authority
CN
China
Prior art keywords
image
images
target
template
background
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811199234.8A
Other languages
Chinese (zh)
Other versions
CN109816663A (en)
Inventor
李宇
马飞龙
王提政
黄秀杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201811199234.8A priority Critical patent/CN109816663B/en
Priority to CN202110285512.7A priority patent/CN113112505B/en
Priority to CN202110287237.2A priority patent/CN113129312B/en
Publication of CN109816663A publication Critical patent/CN109816663A/en
Priority to PCT/CN2019/091717 priority patent/WO2020078027A1/en
Priority to KR1020217014480A priority patent/KR20210073568A/en
Priority to BR112021007094-0A priority patent/BR112021007094A2/en
Priority to JP2021521025A priority patent/JP7226851B2/en
Priority to EP19873919.5A priority patent/EP3859670A4/en
Priority to CN201980068271.1A priority patent/CN112840376B/en
Priority to MX2021004295A priority patent/MX2021004295A/en
Priority to AU2019362347A priority patent/AU2019362347B2/en
Priority to US17/230,169 priority patent/US20210241432A1/en
Application granted granted Critical
Publication of CN109816663B publication Critical patent/CN109816663B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/94
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration by the use of local operators
    • G06T5/60
    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2621Cameras specially adapted for the electronic generation of special effects during image pickup, e.g. digital cameras, camcorders, video cameras having integrated special effects capability
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention provides an image processing method in which a target area and a background area in an image are determined by performing template segmentation on the image. By applying different color processing modes to the target area and the background area so that the luminance or chrominance of the target area is higher than that of the background area, the subject corresponding to the target area is highlighted more prominently, a cinematic special effect is achieved when a terminal user takes photos or records video, and the user's shooting experience is improved. In addition, the invention also provides a method for determining the target area separately in different time periods and changing the color processing mode across time periods, so that subject changes and color changes in the video content are more flexible and autonomous, and human-computer interaction is enhanced.

Description

Image processing method, device and equipment
Technical Field
The present invention relates to the field of terminal technologies, and in particular, to an image processing method, apparatus and device.
Background
Photography and videography record images of people and objects with a camera or video recorder. Different scenes call for different shooting skills, such as night-scene shooting, rain-scene shooting, architectural shooting and portrait shooting; cinematic motion-picture shooting is also a type of shooting, but all follow certain principles. With the progress of science and technology, shooting has become simpler and more popular.
With the improvement of network bandwidth and the enhancement of terminal processing capability, shooting and sharing videos and images has become more and more convenient, and video consumption has become a new lifestyle. Video has rapidly become the main component of network traffic and is expected to account for 80%-90% of traffic in the next few years.
In daily life, shooting has become a main way for people to express themselves and discover beauty, and people want the content they shoot to be more interesting and distinctive; for example, they want special-effect processing of images or videos to be completed at the moment of shooting, for a what-you-see-is-what-you-get (WYSIWYG) shooting experience. Therefore, for non-professionals, more novel image processing technologies need to be integrated into the terminal.
At present, the video recording function of terminals is generally monotonous. Existing video shooting generally provides only routine capture and cannot achieve personalized effects.
Disclosure of Invention
The invention provides an image processing method in which a target area and a background area in an image are determined by performing template segmentation on the image. By applying different color processing modes to the target area and the background area so that the luminance or chrominance of the target area is higher than that of the background area, the subject corresponding to the target area is highlighted more prominently, a cinematic special effect is achieved when a terminal user takes photos or records video, and the user's shooting experience is improved.
The embodiment of the invention provides the following specific technical scheme:
In a first aspect, an embodiment of the present invention provides an image processing method applied in the process of recording a video, the method including: acquiring N1 images in a first time period; acquiring N2 images in a second time period; where the first time period and the second time period are adjacent time periods, and N1 and N2 are both positive integers; for each of the N1 images, determining a first target region and a first background region in the image, the first background region being the part of the image other than the first target region, where the first target region in each of the N1 images corresponds to a first object; for each of the N2 images, determining a second target region and a second background region in the image, the second background region being the part of the image other than the second target region, where the second target region in each of the N2 images corresponds to a second object; and processing the first target region with a first color processing mode, processing the first background region with a second color processing mode, processing the second target region with a third color processing mode, and processing the second background region with a fourth color processing mode to obtain a target video; in the target video, the chrominance of the first target region is greater than the chrominance of the first background region, or the luminance of the first target region is greater than the luminance of the first background region; and the chrominance of the second target region is greater than the chrominance of the second background region, or the luminance of the second target region is greater than the luminance of the second background region.
In a second aspect, an embodiment of the present invention provides an image processing apparatus used in the process of recording a video, the apparatus including: a shooting module, configured to acquire N1 images in a first time period and acquire N2 images in a second time period, where the first time period and the second time period are adjacent time periods, and N1 and N2 are both positive integers; a determining module, configured to determine, for each of the N1 images, a first target region and a first background region in the image, the first background region being the part of the image other than the first target region, where the first target region in each of the N1 images corresponds to a first object; and to determine, for each of the N2 images, a second target region and a second background region in the image, the second background region being the part of the image other than the second target region, where the second target region in each of the N2 images corresponds to a second object; and a color processing module, configured to process the first target region with a first color processing mode, process the first background region with a second color processing mode, process the second target region with a third color processing mode, and process the second background region with a fourth color processing mode to obtain a target video; in the target video, the chrominance of the first target region is greater than the chrominance of the first background region, or the luminance of the first target region is greater than the luminance of the first background region; and the chrominance of the second target region is greater than the chrominance of the second background region, or the luminance of the second target region is greater than the luminance of the second background region.
The method of the first aspect and the apparatus of the second aspect may be applied to video recording in a broad sense, which includes scenarios with a real-time video stream such as video recording in the narrow sense and video calls. The recorded video may include the N1 and N2 images mentioned above.
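The following minimal sketch (not part of the claims) illustrates one way the per-period flow of the first and second aspects could be organized; the function names and the helper callables passed in are hypothetical, and frames are assumed to be NumPy arrays.

```python
# Hypothetical sketch of the first-aspect flow: N1 frames in period 1 and N2 frames
# in period 2 are segmented and color-processed with per-period target objects and
# per-period color processing modes. All helpers are assumed, not claimed.
def process_video(frames_period1, frames_period2,
                  color_target1, color_bg1, color_target2, color_bg2,
                  pick_target1, pick_target2):
    target_video = []
    for frame in frames_period1:
        mask = pick_target1(frame)               # boolean map of the first target region
        target_video.append(apply_colors(frame, mask, color_target1, color_bg1))
    for frame in frames_period2:
        mask = pick_target2(frame)               # boolean map of the second target region
        target_video.append(apply_colors(frame, mask, color_target2, color_bg2))
    return target_video

def apply_colors(frame, mask, fg_mode, bg_mode):
    # fg_mode / bg_mode are callables, each implementing one color processing mode
    out = bg_mode(frame.copy())                  # background processing everywhere
    out[mask] = fg_mode(frame)[mask]             # overwrite the target region
    return out
```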
It should be understood that, in the field of image applications, saying that the luminance of one region is greater than that of another region describes an overall visual perception of brightness: the overall luminance of that region is stronger relative to the other region. Similarly, saying that the chrominance of one region is greater than that of another region describes an overall visual perception of color: the overall chrominance of that region is stronger relative to the other region.
In a possible design according to the first or second aspect, the first and second objects correspond to the same object.
In a possible embodiment according to the first or second aspect, the first and second objects correspond to the same object or to different objects.
In one possible design according to the first or second aspect, the first or second object comprises at least one individual of a person, an animal or a plant.
In a possible design according to the first or second aspect, the first object and the second object are determined by a selection instruction of the user.
Specifically, for example, in the first frame image of the first time period, the first object is determined according to a selection instruction of the user, and all images in the first time period use the first object as the target object; similarly, in the first frame image of the second time period, the second object is determined according to a selection instruction of the user, and all images in the second time period use the second object as the target object. For example, semantic segmentation may be performed on the image to obtain k segmentation templates, where the k segmentation templates correspond to different object categories; the target object (the first object or the second object) is then the object corresponding to the segmentation template or templates to which the user's selection instruction corresponds.
In a specific application, the subject template is used to distinguish different objects, and therefore the subject template may also be called an object template in some cases.

In a possible design according to the first or second aspect, the first object and the second object are each determined by the terminal according to the image content at a preset time interval.
Specifically, for example, in the first frame image of the first time period, the first object is determined, and all images in the first time period use the first object as the target object; similarly, in the first frame image of the second time period, the second object is determined, and all images in the second time period use the second object as the target object. Determining the first object in the first frame image of the first time period and determining the second object in the first frame image of the second time period includes, but is not limited to, one of the following ways:
performing semantic segmentation on the image to obtain k segmentation templates; wherein the k segmentation templates correspond to different object classes;
if k is 2 and the 2 segmentation templates comprise 1 object template and 1 background template, determining the image area corresponding to the object template as the target area, and determining the area corresponding to the background template as the background area; correspondingly, the object corresponding to the object template is the first object or the second object; or,

if k is greater than 2 and the number of pixels contained in each of k0 object templates among the k segmentation templates is greater than a preset threshold, determining the image areas corresponding to the k0 object templates as the target area, and determining the image areas corresponding to the remaining segmentation templates as the background area; correspondingly, the objects corresponding to those object templates are the first object or the second object; where k0 is a non-negative integer less than k; or,

if k is greater than 2, determining the image area corresponding to the segmentation template containing the largest number of pixels among the k segmentation templates as the target area, and determining the image areas corresponding to the remaining segmentation templates as the background area; correspondingly, the object corresponding to that segmentation template is the first object or the second object; or,

if k is greater than 2, determining a target template among the k segmentation templates according to a preset priority of object categories, determining the image area corresponding to the target template as the target area, and determining the image areas corresponding to the remaining segmentation templates as the background area; correspondingly, the object corresponding to the target template is the first object or the second object; or,

if k is greater than 2, determining a target template among the k segmentation templates according to a selection instruction of the user, determining the image area corresponding to the target template as the target area, and determining the image areas corresponding to the remaining segmentation templates as the background area; correspondingly, the object corresponding to the target template is the first object or the second object. The method may be specifically performed by the determining module; an illustrative sketch of these alternatives is given below.
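The sketch below combines the alternative designs above into one illustrative cascade; it assumes the segmentation template is an H x W label map with 0 denoting the background, and the threshold, priority list and parameter names are hypothetical rather than taken from the patent.

```python
import numpy as np

# Illustrative sketch of the alternative target-template designs. "mask" is an H x W
# array of category labels (0 = background). The ordering of the fallbacks is one
# possible combination of the alternatives, not mandated by the patent.
def choose_target_labels(mask, pixel_threshold=5000,
                         priority=("person", "animal", "plant"),
                         label_names=None, user_selected_label=None):
    labels, counts = np.unique(mask[mask != 0], return_counts=True)
    if len(labels) == 0:
        return []                                    # background only
    if len(labels) == 1:
        return [int(labels[0])]                      # k == 2: one object + background
    if user_selected_label is not None:              # selection instruction of the user
        return [int(user_selected_label)]
    big = [int(l) for l, c in zip(labels, counts) if c > pixel_threshold]
    if big:
        return big                                   # k0 templates above the threshold
    if label_names:                                   # fall back to category priority
        ranked = sorted(labels,
                        key=lambda l: priority.index(label_names[int(l)])
                        if label_names[int(l)] in priority else len(priority))
        return [int(ranked[0])]
    return [int(labels[np.argmax(counts)])]           # largest template wins
```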
In a possible design according to the first or second aspect, the first color processing mode is the same as the third color processing mode, and the second color processing mode is the same as the fourth color processing mode.
According to the first aspect or the second aspect, in one possible design, the first color processing manner is the same as the third color processing manner, and the second color processing manner is different from the fourth color processing manner.
According to the first aspect or the second aspect, in one possible design, the first color processing manner is different from the third color processing manner, and the second color processing manner is the same as the fourth color processing manner.
According to the first aspect or the second aspect, in one possible design, the first color processing manner is different from the third color processing manner, and the second color processing manner is different from the fourth color processing manner.
In a possible design according to the first aspect or the second aspect, the first color processing mode or the third color processing mode includes one of: color retention or color enhancement.

In a possible design according to the first or second aspect, the second or fourth color processing mode includes one of: black-and-white, darkening, blurring, or retro.
In a third aspect, an embodiment of the present invention provides a terminal device, including a camera, a memory, a processor, and a bus; the camera, the memory and the processor are connected through a bus; the camera is used for acquiring images, the memory is used for storing computer programs and instructions, and the processor is used for calling the computer programs and instructions stored in the memory and the acquired images and is also used for enabling the terminal equipment to execute any one of the possible design methods.
According to the third aspect, in one possible design, the terminal device further includes an antenna system, and the antenna system transmits and receives wireless communication signals under the control of the processor to realize wireless communication with the mobile communication network; the mobile communications network comprises one or more of the following: GSM networks, CDMA networks, 3G networks, 4G networks, 5G networks, FDMA, TDMA, PDC, TACS, AMPS, WCDMA, TDSCDMA, WIFI, and LTE networks.
In a fourth aspect, the present invention provides an image processing method, including: when a video is shot, determining a subject in the video picture; and performing different color processing on a target area and a background area in the video picture to obtain a target video, where the target area corresponds to the subject, and the background area is the part of the video picture other than the target area.

In a fifth aspect, the present invention provides an image processing apparatus, including: a shooting module, configured to shoot a video; a determining module, configured to determine a subject in the video picture; and a color processing module, configured to perform different color processing on a target area and a background area in the video picture to obtain a target video, where the target area corresponds to the subject, and the background area is the part of the video picture other than the target area.
In a possible design according to the fourth or fifth aspect, the performing of different color processing on the target area and the background area in the video picture includes: retaining the color of the target area in the video picture, and performing gray/black-and-white processing on the background area in the video picture. This step is performed by the color processing module.

In a possible design according to the fourth or fifth aspect, the performing of different color processing on the target area and the background area in the video picture includes: retaining the color of the target area in the video picture, and blurring the background area in the video picture. This step is performed by the color processing module.
In a possible design according to the fourth or fifth aspect, the subjects identified in the video pictures are the same person. This step is performed by the determination module.
In a sixth aspect, an embodiment of the present invention provides a terminal device, including a camera, a memory, a processor, and a bus; the camera, the memory and the processor are connected through a bus; the camera is used for acquiring images, the memory is used for storing computer programs and instructions, and the processor is used for calling the computer programs and instructions stored in the memory and the acquired images, and is further specifically used for enabling the terminal device to execute any one of the possible design methods based on the fourth aspect or any one of the possible design modules based on the fifth aspect.
Any of the above possible designs may be combined with one another, provided the combination does not violate logic.
In the prior art, videos and images are shot without any distinction between individuals or any color distinction within an image, so the special effects are not rich enough. The invention, by contrast, provides more color transformations and subject changes, and improves user engagement.
Drawings
Fig. 1 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 2 is a flowchart of an image processing method according to an embodiment of the present invention;
FIG. 3 is an example of segmentation template labeling in an embodiment of the present invention;
FIG. 4 is another example of segmentation template labeling in an embodiment of the present invention;
FIG. 5 is a diagram illustrating an example of determining a target template according to an embodiment of the present invention;
FIG. 6 is a schematic illustration of another embodiment of the present invention;
FIG. 7 is a schematic illustration of another embodiment of the present invention;
FIG. 8 is a schematic illustration of another embodiment of the present invention;
FIG. 9 is a diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 10 is a diagram of another image processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the embodiments of the present invention, the terminal may be a device that provides video shooting and/or data connectivity to a user, a handheld device with a wireless connection function, or another processing device connected to a wireless modem, such as a digital camera, a single-lens reflex camera, a mobile phone (or "cellular" phone), or a smartphone; it may be a portable, pocket-sized, handheld or wearable device (such as a smart watch), a tablet computer, a personal computer (PC), a PDA (Personal Digital Assistant), an in-vehicle computer, a drone, an aerial camera, and so on.
Fig. 1 shows an alternative hardware structure diagram of the terminal 100.
Referring to fig. 1, the terminal 100 may include a radio frequency unit 110, a memory 120, an input unit 130, a display unit 140, a camera 150, an audio circuit 160 (including a speaker 161 and a microphone 162), a processor 170, an external interface 180, a power supply 190, and the like. Those skilled in the art will appreciate that fig. 1 is merely an example of a smart terminal or multi-function device and does not constitute a limitation; the device may include more or fewer components than those shown, combine some components, or use different components. For example, at a minimum there are the memory 120, the processor 170 and the camera 150.
The camera 150 is used to capture images or video, and can be triggered and started by an application program instruction to realize a photographing or video shooting function. The camera may include an imaging lens, an optical filter, an image sensor and the like. Light emitted or reflected by an object enters the imaging lens, passes through the optical filter and finally converges on the image sensor. The imaging lens is mainly used to converge into an image the light emitted or reflected by all objects within the shooting field of view (which may also be called the scene to be shot, the objects to be shot, the target scene or the target objects, and may also be understood as the scene image the user expects to shoot); the optical filter is mainly used to filter out unnecessary light waves (for example, light waves other than visible light, such as infrared); the image sensor is mainly used to perform photoelectric conversion on the received optical signal, converting it into an electrical signal that is input to the processor 170 for subsequent processing. The cameras may be located on the front of the terminal device or on its back; their specific number and arrangement can be flexibly determined according to the designer's requirements or the manufacturer's policy, which is not limited in this application.
The input unit 130 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the portable multifunction device. In particular, the input unit 130 may include a touch screen 131 and/or other input devices 132. The touch screen 131 may collect touch operations of a user (e.g., operations of the user on or near the touch screen using any suitable object such as a finger, a joint, a stylus, etc.) and drive the corresponding connection device according to a preset program. The touch screen can detect the touch action of a user on the touch screen, convert the touch action into a touch signal and send the touch signal to the processor 170, and can receive and execute a command sent by the processor 170; the touch signal includes at least contact point coordinate information. The touch screen 131 may provide an input interface and an output interface between the terminal 100 and a user. In addition, the touch screen may be implemented using various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 130 may include other input devices in addition to the touch screen 131. In particular, other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys 132, switch keys 133, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 140 may be used to display information input by or provided to a user, various menus of the terminal 100, an interactive interface, a file display, and/or a play of any one of multimedia files. In this embodiment of the present invention, the display unit is further configured to display images/videos acquired by the device using the camera 150, and the images/videos may include preview images/videos in some shooting modes, initial images/videos shot, and target images/videos shot and processed by a certain algorithm.
Further, the touch screen 131 can cover the display panel 141, and when the touch screen 131 detects a touch operation thereon or nearby, the touch operation is transmitted to the processor 170 to determine the type of the touch event, and then the processor 170 provides a corresponding visual output on the display panel 141 according to the type of the touch event. In this embodiment, the touch screen and the display unit may be integrated into one component to implement the input, output, and display functions of the terminal 100; for convenience of description, the touch display screen represents a functional set of the touch screen and the display unit; in some embodiments, the touch screen and the display unit may also be provided as two separate components.
The memory 120 may be used to store instructions and data. The memory 120 may mainly include an instruction storage area and a data storage area; the data storage area may store various data such as multimedia files and text, and the instruction storage area may store software units such as an operating system, applications, and instructions required for at least one function, or their subsets or extended sets. The memory may also include non-volatile random access memory; it provides the processor 170 with the hardware, software and data resources for managing the computing and processing device, supports control software and applications, and is also used to store multimedia files as well as running programs and applications.
The processor 170 is a control center of the terminal 100, connects various parts of the entire handset using various interfaces and lines, and performs various functions of the terminal 100 and processes data by operating or executing instructions stored in the memory 120 and calling data stored in the memory 120, thereby performing overall control of the handset. Alternatively, processor 170 may include one or more processing units; preferably, the processor 170 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 170. In some embodiments, the processor, memory, and/or the like may be implemented on a single chip, or in some embodiments, they may be implemented separately on separate chips. The processor 170 may also be used for generating corresponding operation control signals, sending the corresponding operation control signals to the corresponding components of the computing and processing device, reading and processing data in software, and particularly reading and processing data and programs in the memory 120, so as to enable the respective functional modules therein to execute corresponding functions, thereby controlling the corresponding components to perform actions according to the instructions.
The radio frequency unit 110 may be configured to transmit and receive information or signals during a call; for example, it receives downlink information from a base station and delivers it to the processor 170 for processing, and it transmits uplink data to the base station. Typically, the RF circuitry includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the radio frequency unit 110 may also communicate with network devices and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The audio circuit 160, the speaker 161 and the microphone 162 may provide an audio interface between a user and the terminal 100. On one hand, the audio circuit 160 can convert received audio data into an electrical signal and transmit it to the speaker 161, which converts it into a sound signal for output; on the other hand, the microphone 162 collects sound signals and converts them into electrical signals, which are received by the audio circuit 160 and converted into audio data; the audio data are then output to the processor 170 for processing and sent via the radio frequency unit 110 to, for example, another terminal, or output to the memory 120 for further processing. The audio circuit may also include a headphone jack 163 for providing a connection interface between the audio circuit and headphones. The specific number and arrangement of speakers and microphones can be flexibly determined according to the designer's requirements or the manufacturer's policy, which is not limited in this application.
The terminal 100 also includes a power supply 190 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 170 via a power management system that may be used to manage charging, discharging, and power consumption.
The terminal 100 further includes an external interface 180, which may be a standard Micro USB interface, or a multi-pin connector, which may be used to connect the terminal 100 to communicate with other devices, or to connect a charger to charge the terminal 100.
Although not shown, the terminal 100 may further include a flash, a wireless fidelity (WiFi) module, a bluetooth module, a sensor with different functions, and the like, which will not be described herein. Some or all of the methods described below may be applied in a terminal as shown in fig. 1.
The invention can be applied to terminal devices with a shooting function (at least one of photographing or video shooting); the implemented product may be a smart terminal, such as a mobile phone, tablet, DV, video camera, portable computer, notebook computer, smart robot, television, security system, drone, or other product equipped with a camera. Specifically, the functional modules of the invention may be deployed on a DSP chip of the relevant device, and in particular may be an application program or software therein; the invention is installed on the terminal device through software installation or upgrade, and provides the image processing function through invocation of and cooperation with the hardware.
The method is mainly applied to scenarios in which a terminal device takes photos or videos. People have increasingly high requirements for video shooting and hope that special-effect processing of the video is completed while shooting, for a what-you-see-is-what-you-get video shooting experience. The invention can segment the subject of a picture or video and adjust the colors of different regions, so as to achieve real-time special effects on the picture.
The invention is illustrated below by way of example.
Example 1
Specifically, referring to fig. 2, fig. 2 is a flowchart of an image processing method according to an embodiment of the present invention. The method is carried out in the process of picture shooting, and in the specific implementation process, the terminal can configure a certain shooting mode; the method may include the following steps in the photographing mode:
step 21: an image is acquired (which may also be understood as being taken or acquired).
Specifically, when the user takes a picture, a corresponding preview stream is displayed on the screen; a preview image may generally refer to one frame of the preview stream. When the user presses the shutter, the captured image is obtained; the size of the image is, for example but not limited to, 1920 × 1080.
Step 22: determining a target area and a background area in the image according to the content of the captured image (which can be understood as scene semantics), and more specifically, according to the categories of the objects in the image; the background area is the part of the image other than the target area, and the target area corresponds to a target object or target individual in the image, i.e., the object the user wants to highlight in the image, which may be related to the user's interactive selection or to system settings. Specifically, step 22 may include s221-s224.
s221: Image preprocessing.
The captured image at its original size is down-sampled and converted into an image of smaller resolution; calculations are then performed on the small image, which reduces the amount of computation. In a specific implementation, the original size (e.g., m0 × n0) may be down-sampled to a size of m × n, where the smaller the values of m and n, the smaller the subsequent amount of computation; however, if m and n are too small, the subsequent pixel resolution degrades. Experiments show that a reasonable value range for m and n is [128, 512], more specifically [256, 300]; m and n may be equal or unequal. For example, a 1920 × 1080 image may be down-sampled to 256 × 256.
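As a hedged illustration of s221, the sketch below down-samples a full-resolution frame to an m × n image using simple nearest-neighbour index selection; a real implementation would typically use a library resize, and the function name is hypothetical.

```python
import numpy as np

# Minimal sketch of s221: down-sample the full-resolution frame (e.g. 1920 x 1080)
# to a small m x n image before segmentation. Nearest-neighbour indexing only,
# purely for illustration.
def downsample(image, m=256, n=256):
    h, w = image.shape[:2]
    rows = np.arange(m) * h // m          # source row index for each output row
    cols = np.arange(n) * w // n          # source column index for each output column
    return image[rows[:, None], cols]
```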
s222: Inputting the down-sampled m × n image into a neural network for semantic segmentation, and determining an image segmentation template (Mask).
Semantic segmentation refers to pixel-level segmentation of the objects in an image: each pixel is labeled with the type of object to which it belongs, and parts for which no category is given are labeled as "background".
In particular, semantic segmentation may employ a deep-learning algorithm based on CNNs (Convolutional Neural Networks). A network model based on a CNN is specifically described as follows:
1) Down-sampling and convolving the m × n image to sizes m1 × n1, m2 × n2, ……, mz × nz, extracting semantic features of the picture layer by layer to obtain a feature map of size m1 × n1, a feature map of size m2 × n2, ……, and a feature map of size mz × nz, i.e., multi-scale semantic features; where m1, m2, ……, mz are related to m by multiples and are less than m, and n1, n2, ……, nz are related to n by multiples and are less than n. For example, m = 2·m1 = 4·m2 = …… = 2^z·mz and n = 2·n1 = 4·n2 = …… = 2^z·nz; the value of z and the multiple relation may be determined according to algorithm performance and design requirements.
2) Performing convolution and up-sampling on the feature maps of sizes m1 × n1, m2 × n2, ……, mz × nz to fuse the multi-scale semantic features.
For the above-mentioned convolution, down-sampling and up-sampling methods, techniques known in the art can be adopted, and the methods are not limited or enumerated in the present invention.
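The patent does not prescribe a specific network architecture; the following is only a generic encoder-decoder sketch of the multi-scale idea in 1) and 2), with features extracted at several scales, up-sampled back to the finest scale, and fused before per-pixel classification. The use of PyTorch and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic multi-scale segmentation sketch: down-sample + convolve, then up-sample and
# fuse the feature maps, then predict k per-pixel class scores. Illustrative only.
class TinySegNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 16, 3, padding=1)    # m x n features
        self.enc2 = nn.Conv2d(16, 32, 3, padding=1)   # m/2 x n/2 features
        self.enc3 = nn.Conv2d(32, 64, 3, padding=1)   # m/4 x n/4 features
        self.fuse = nn.Conv2d(16 + 32 + 64, 64, 3, padding=1)
        self.head = nn.Conv2d(64, num_classes, 1)     # per-pixel class scores

    def forward(self, x):
        f1 = F.relu(self.enc1(x))
        f2 = F.relu(self.enc2(F.max_pool2d(f1, 2)))
        f3 = F.relu(self.enc3(F.max_pool2d(f2, 2)))
        size = f1.shape[2:]
        up2 = F.interpolate(f2, size=size, mode="bilinear", align_corners=False)
        up3 = F.interpolate(f3, size=size, mode="bilinear", align_corners=False)
        fused = F.relu(self.fuse(torch.cat([f1, up2, up3], dim=1)))
        return self.head(fused)                       # shape: (batch, k, m, n)
```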
3) Determining the categories to be identified in the image, calculating the score of each category at each pixel, taking the object category (hereinafter simply "category") with the highest score as the classification result for that pixel, and finally obtaining a Mask image, namely the segmentation template.
For example, if the terminal can identify k object categories (e.g., at least one of person, animal, plant, other preset object, background, etc.), k score images can be obtained, one per category; each pixel in the image gets a score for each category, and a higher score indicates a higher probability that the pixel belongs to that category.
Once the category of any pixel is determined, it can be labeled, for example a person as 1, a vehicle as 2, an animal as 3, a plant as 4 and the background as 0; this is by way of example only and not limitation. The number of categories, the categories themselves and the labeling method can be designed freely according to design requirements. One specific example is shown in fig. 3, where the pixel areas of the vehicle are all classified as vehicle by the neural network and labeled 1, and the pixel areas of the surrounding background are all classified as background and labeled 0. For another example, in the segmentation template output by the neural network, regions of the same object have the same label; for example, the label of the background is 0, the label of the cat is 1, and the label of the skateboard is 2. In the segmentation template shown in fig. 4, the same color can also be used to represent labels of the same category, with, for example, the person, the horse and the background each identified by a different color.
The Mask is the result of the semantic segmentation algorithm: all pixels belonging to a certain kind of object in the image are marked with a certain color or label, and the background is likewise marked with a certain color or label; the processed image is called a Mask so that the segmentation result can be displayed intuitively.
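A minimal sketch of step 3): given k per-category score maps, the Mask is the per-pixel argmax of the scores. The label values follow the illustrative numbering above (0 = background, 1 = person, and so on).

```python
import numpy as np

# Sketch of step 3): each pixel has k scores (one per recognisable category) and the
# Mask stores the label of the highest-scoring category at that pixel.
def scores_to_mask(scores):
    # scores: array of shape (k, m, n) with one score map per category
    return np.argmax(scores, axis=0).astype(np.uint8)

# Example: 2 categories (background, person) on a 2 x 2 image
scores = np.array([[[0.9, 0.2], [0.1, 0.8]],      # background scores
                   [[0.1, 0.8], [0.9, 0.2]]])     # person scores
print(scores_to_mask(scores))                     # [[0 1] [1 0]]
```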
The content of the image may include a subject and a background; for convenience of description, the image segmentation template may include a subject template (object template) and a background template. The subject template corresponds to the subject recognized by the segmentation method, including the individual the user wants to highlight in the captured image or video, such as a person, an animal, a plant, or a specific object (cup, table, clothes, decoration, and the like); the background template corresponds to the other regions of the image that are not identified as belonging to a subject template; the image segmentation template corresponds to the entire image. The recognition capability of the subject template is related to the performance of the neural network: for example, some neural networks can only recognize people and the background; some can recognize people, cars and the background; some can only recognize cars and the background; some can recognize people, animals and the background; some can only recognize animals and the background; and some can recognize animals, plants and the background.
It should be understood that an image may contain both a subject and a background, or only a background; when no subject is recognized, the whole image may be identified as background. These settings can be flexibly designed and determined by the user.
Training of the deep neural network requires a large amount of segmentation training data. The training data set comprises a large number of images containing the segmentation categories, including input images and segmentation template maps; the training set can cover a wide variety of typical application scenes of the segmented objects and provides data diversity. The network is trained with the input images and segmentation template maps in the training set to obtain good network parameters, i.e., segmentation performance that satisfies the user; the obtained network parameters are then used as the computation parameters of the neural network in actual use.
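The hedged sketch below shows one common way such a segmentation network could be trained on pairs of input images and segmentation template maps; the optimiser, loss function and the TinySegNet name refer to the illustrative network sketched earlier and are assumptions, not requirements of the patent.

```python
import torch
import torch.nn as nn

# Illustrative training loop: fit the network on (input image, ground-truth Mask) pairs
# with a per-pixel classification loss. Hyper-parameters are arbitrary examples.
def train(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()                  # per-pixel classification loss
    for _ in range(epochs):
        for images, gt_masks in loader:              # images: (b,3,m,n), gt_masks: (b,m,n)
            opt.zero_grad()
            loss = loss_fn(model(images), gt_masks.long())
            loss.backward()
            opt.step()
    return model                                     # trained network parameters
```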
s223: Determining a target template according to the segmentation templates.
For different images and neural networks of different capabilities, various segmentation templates may be obtained, and the terminal further needs to determine which of the segmentation templates correspond to the objects that most need to be highlighted and prominently displayed, i.e., the target template needs to be determined. Determination of the target template includes, but is not limited to, the following ways.
Mode 1: if the segmentation template has only one subject template and one background template, the subject template is determined as the target template.
Specifically, assume that semantic segmentation is performed on an image to obtain k segmentation templates; wherein the k segmentation templates correspond to different object classes; if k is 2 and the 2 segmentation templates include 1 object template and 1 background template, the image region corresponding to the object template is determined as the target region and the region corresponding to the background template is determined as the background region.
As shown in fig. 5, the segmentation templates of the image output by the neural network contain only the subject template A1 and the background template, and A1 may be determined as the target template.
Mode 2: if there are multiple subject templates and a background template in the segmentation template, then any subject template whose number of pixels is greater than a certain threshold is determined as a target template; any subject template whose number of pixels is less than the threshold is re-identified as background. The number of pixels contained in a subject template may refer to the number of pixels in the connected region of that individual in the image.
Specifically, assume that semantic segmentation is performed on an image to obtain k segmentation templates; wherein the k segmentation templates correspond to different object classes; if k is greater than 2 and the number of pixels contained in k0 object templates in the k segmentation templates is greater than a preset threshold, determining image areas corresponding to k0 object templates as target areas and determining image areas corresponding to the rest segmentation templates as background areas; wherein k0 is a non-negative integer less than k.
As shown in fig. 6, the segmentation templates of the image output by the neural network include subject templates A1 and A2 and a background template. If the number of pixels contained in A1 is greater than the preset threshold and the number of pixels contained in A2 is not greater than the preset threshold, A1 is determined as the target template and the subject template A2 is re-identified as a background template; the re-identified template may be as shown in fig. 5. If the number of pixels contained in A1 is greater than the preset threshold and the number of pixels contained in A2 is also greater than the preset threshold, both A1 and A2 are determined as target templates. If neither A1 nor A2 contains more pixels than the preset threshold, A1 and A2 are re-identified as background templates, i.e., the image has no subject template.

It should be understood that in a particular implementation, A1 and A2 may be of the same category or of different categories.
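A small sketch of Mode 2, assuming the segmentation template is a label map with 0 as background; the pixel threshold is an illustrative value only.

```python
import numpy as np

# Sketch of Mode 2: subject templates whose pixel count exceeds a preset threshold stay
# as target templates; smaller subject templates are re-identified as background (0).
def apply_pixel_threshold(mask, threshold=5000):
    out = mask.copy()
    for label in np.unique(mask):
        if label == 0:
            continue                                  # background template stays as is
        if np.count_nonzero(mask == label) <= threshold:
            out[mask == label] = 0                    # re-identify small subject as background
    return out
```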
Mode 3: if there are multiple subject templates and a background template in the segmentation template, the subject template with the largest number of pixels is selected as the target template, and the other subject templates are re-identified as background templates.
Specifically, assume that semantic segmentation is performed on an image to obtain k segmentation templates; wherein the k segmentation templates correspond to different object classes; if k is larger than 2, determining the image area corresponding to the segmentation template with the largest number of pixels in the k segmentation templates as a target area, and determining the image areas corresponding to the rest segmentation templates as background areas.
As shown in fig. 6, if the segmentation templates of the image output by the neural network contain the subject templates A1 and A2 and a background template, then A1, which has the largest number of pixels, is determined as the target template, and the subject template A2 is re-identified as a background template; the re-identified template may be as shown in fig. 5.

It should be understood that in a particular implementation, A1 and A2 may be of the same category or of different categories.
Mode 4: if there are multiple subject templates and a background template in the segmentation template, and the subject templates belong to multiple categories, the target template is determined according to the priority of the categories. For example, if the person template has a higher priority than the vehicle template, the person template is the target template and the vehicle template may be re-identified as background. For another example, if the priority order is person template > animal template > plant template, and the system sets templates whose priority is above the plant template as subject templates, then both the person template and the animal template are target templates and the plant template may be re-identified as background. It should be understood that one or more individuals may belong to the same category template.
Specifically, assume that semantic segmentation is performed on an image to obtain k segmentation templates; wherein the k segmentation templates correspond to different object classes; if k is larger than 2, determining a target template in the k segmentation templates according to the preset priority of the object class; and determining the image area corresponding to the target template as a target area, and determining the image areas corresponding to the rest of the segmentation templates as background areas.
As shown in fig. 7, the segmentation templates of the image output by the neural network include subject templates A1 and B1 and a background template, where A1 and B1 are of different categories and A1 has a higher priority than B1. If the system setting is that subject templates with a priority of B1 or above may be target templates, then both A1 and B1 are target templates; if the system setting is that only subject templates with a priority above B1 may be target templates, then A1 is determined as the target template and B1 is re-identified as a background template.
Mode 5: if there are multiple subject templates and a background template in the segmentation template, the target template may be determined according to a selection operation input by the user, where the input manner includes but is not limited to selection instructions via the touch screen, voice, and the like. Whichever individual the user selects, the subject template corresponding to that individual is the target template.
Specifically, assume that semantic segmentation is performed on an image to obtain k segmentation templates; wherein the k segmentation templates correspond to different object classes; if k is larger than 2, determining a target template in the k segmentation templates according to a selection instruction of a user; and determining the image area corresponding to the target template as a target area, and determining the image areas corresponding to the rest of the segmentation templates as background areas.
As shown in fig. 7, the segmentation templates of the image output by the neural network include subject templates A1 and B1 and a background template. If the user clicks the individual corresponding to A1 on the touch screen during photographing, A1 is determined as the target template and B1 is re-identified as a background template. If the user clicks the individual corresponding to B1 on the touch screen during photographing, B1 is determined as the target template and A1 is re-identified as a background template.
Mode 6: if there are multiple subject templates and a background template in the segmentation template, and the subject templates belong to multiple categories, the target template may be determined according to a selection operation input by the user, where the input manner includes but is not limited to selection instructions via the touch screen, voice, and the like. Whichever individual the user selects, all subject templates of that individual's category are the target templates.
Specifically, assume that semantic segmentation is performed on an image to obtain k segmentation templates; wherein the k segmentation templates correspond to different object classes; if k is larger than 2, determining a target template in the k segmentation templates according to a selection instruction of a user; and determining the image area corresponding to the target template as a target area, and determining the image areas corresponding to the rest of the segmentation templates as background areas.
As shown in fig. 8, the segmentation templates of the image output by the neural network include subject templates A1, A2, B1, B2 and a background template, where A1 and A2 are of the same category and B1 and B2 are of the same category. If the user clicks the individual corresponding to A1 on the touch screen during photographing, A1 and A2, which are of the same category, are determined as target templates, and B1 and B2 are re-identified as background templates. If the user clicks the individual corresponding to B2 on the touch screen during photographing, B1 and B2, which are of the same category, are determined as target templates, and A1 and A2 are re-identified as background templates.
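A hedged sketch of the user-selection Modes 5 and 6: the touch coordinates are mapped into the segmentation template, and the category hit by the touch is kept as the target template while the remaining subject templates are re-identified as background. Coordinate handling is simplified and the function name is hypothetical.

```python
import numpy as np

# Sketch of Modes 5 and 6: keep only the category the user touched. Since the labels
# in the segmentation template are per-category, all individuals of that category are
# kept, matching Mode 6; Mode 5 would additionally separate individuals.
def select_by_touch(mask, touch_x, touch_y, screen_w, screen_h):
    h, w = mask.shape
    col = int(touch_x * w / screen_w)                 # map screen coords to mask coords
    row = int(touch_y * h / screen_h)
    chosen = mask[row, col]
    out = np.where(mask == chosen, chosen, 0)         # keep the touched category only
    return out, chosen                                # chosen == 0 means background touched
```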
It should be understood that the above ways are only examples and should not be construed as limiting; they may be freely combined as long as logic is not violated. After template segmentation of the image, one or more target templates may thus be obtained; the target templates may belong to a single category or to multiple categories, and each category may contain one or more individuals. The displayed result depends on the rules the terminal system sets for determining the target template and on the user's input; in some scenes, an image may contain only a background template.
s224: Determining a target region and a background region in the original image.
The segmentation template is up-sampled to the original size of the captured image, and the target template and the background template in the segmentation template are up-sampled with it. The region formed by all pixels of the original image covered by the up-sampled target template is the target region, and the region formed by all pixels covered by the up-sampled background template is the background region.
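A minimal sketch of s224, assuming nearest-neighbour up-sampling so that the category labels are preserved; the helper names are hypothetical.

```python
import numpy as np

# Sketch of s224: up-sample the small segmentation template back to the original capture
# size (nearest neighbour keeps the labels intact), then read off the target and
# background regions of the original image.
def upsample_mask(mask, out_h, out_w):
    h, w = mask.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return mask[rows[:, None], cols]

def split_regions(mask_full, target_labels):
    target = np.isin(mask_full, target_labels)        # boolean map of the target region
    return target, ~target                            # target region, background region
```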
Step 23: processing the target area and the background area in the image with different color processing modes to obtain a target image, such that the chrominance of the target area is greater than the chrominance of the background area, or the luminance of the target area is greater than the luminance of the background area; that is, in the target image, the chrominance of the target area is greater than that of the background area, or the luminance of the target area is greater than that of the background area.
Specifically, a first color processing mode and a second color processing mode are applied to the target area and the background area of the image, respectively. These include but are not limited to the following:
Mode 1: the first color processing mode is color retention, and the second color processing mode is a filter, for example one that converts the color of the background area to black and white. Typical filters include black-and-white, darkening, retro, film, blur, bokeh blurring, and the like.
For example, the black-and-white filter maps each pixel value to a grayscale value to achieve a black-and-white effect; the darkening filter reduces the brightness of each pixel value to achieve a darkening effect.
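Purely as an illustration, the two filters and the mode-1 composition might look as follows (the BT.601 grey weights and the darkening factor are assumptions of this sketch, not values given in the patent):

import numpy as np

def black_white_filter(img_rgb):
    """Map every pixel to its grey value (BT.601 weights assumed)."""
    grey = (0.299 * img_rgb[..., 0] +
            0.587 * img_rgb[..., 1] +
            0.114 * img_rgb[..., 2]).astype(np.uint8)
    return np.stack([grey, grey, grey], axis=-1)

def darken_filter(img_rgb, factor=0.6):
    """Reduce the brightness of every pixel by a fixed factor."""
    return np.clip(img_rgb.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def mode1(img_rgb, target_region):
    """Mode 1: retain colour in the target region, black-and-white background."""
    out = black_white_filter(img_rgb)
    out[target_region] = img_rgb[target_region]
    return out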
Mode 2: the first color processing mode is a first filter mode and the second color processing mode is a second filter mode, the two being different. For the same image, the first filter mode yields higher image chrominance than the second filter mode.

Mode 3: the first color processing mode is a third filter mode and the second color processing mode is a fourth filter mode, the two being different. For the same image, the third filter mode yields higher image luminance than the fourth filter mode.
It should be understood that color is commonly represented by luminance and chrominance. Chrominance is the property of a color excluding luminance and reflects the hue and saturation of the color, while luminance refers to the brightness of the color. Color processing therefore includes processing of luminance and/or chrominance.
Specifically, a filter may adjust chrominance, luminance, hue and the like, and may additionally superimpose a texture. By adjusting chrominance and hue, one color system can be selectively made darker or lighter or have its hue shifted while the other color systems remain unchanged. A filter can also be understood as a pixel-to-pixel mapping: the pixel value of the input image is mapped to a target pixel value through a preset mapping table, thereby realizing a special effect. It should be understood that a filter may be a predefined parameter template, and the color-related parameters may be parameters of a filter template known in the art or parameters designed by the user.
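Viewed as a pixel-to-pixel mapping, a filter reduces to indexing a preset lookup table; a small sketch follows (the "retro"-style curve below is invented for illustration and is not a value from the patent):

import numpy as np

def apply_lut_filter(img, lut):
    """Map every input pixel value to its preset target value via the table."""
    return lut[img]                      # fancy indexing applies the mapping

# Illustrative 256-entry curve that lifts shadows and mutes highlights.
x = np.arange(256, dtype=np.float32)
retro_lut = np.clip(32 + 0.85 * x, 0, 255).astype(np.uint8)

A per-channel table of shape (256, 3) could be indexed channel by channel in the same way if the filter treats the color channels differently.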
In addition, after step 23 the method may further comprise step 24: saving the picture processed in step 23.
With the present invention, the terminal can determine the target individual and the background according to the picture content during shooting and apply different color processing to each, so that the subject of the user's photograph is more prominent and the photograph acquires a cinematic, movie-like quality.
Example 2
Specifically, the video-recording processing method of the present invention is similar to the photographing processing method; the difference is that photographing processes a single image, whereas video recording processes continuous video frames, i.e. a plurality of consecutive images, which may form a complete video, a segment of a complete video, or a user-defined video clip spanning some time interval. The processing flow for each frame of the video or video clip may refer to the processing method of Example 1.
Specifically, the image processing method during video capture may include the following steps:
Step 31: acquire N captured frames, where N is a positive integer, and perform the operations of steps 32 and 33 on each frame. The N frames may be adjacent video frames, in which case the N frames taken together can be understood as a video; the N frames may also be non-adjacent.
Step 32: an alternative implementation may be the same as step 22.

Step 33: an alternative implementation may be the same as step 23.
In addition, since a video is a sequence of images, the way an individual is determined as the subject is also related to the time sequence, so step 33 can be implemented in richer ways than step 23. Optionally, any of the subject-determination modes in s223 may be applied with a delay: for example, once a person and the background have been determined in frame L1, the person may, by comparing pixel marks with the template, still be treated as the subject in frames L1+1 through L1+L0, with the corresponding region as the target region. It is not necessary to re-determine the subject and background in every frame. The timing of each subject determination may be defined by the user, or subject determination may be performed periodically, for example but not limited to every 2 s or every 10 s; the manner of each determination includes but is not limited to the 6 modes in s223.
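A minimal sketch of such periodic subject confirmation (the refresh interval and the callback implementing one of the s223 modes are assumptions of this sketch):

class SubjectTracker:
    """Re-run subject determination only every `refresh_every` frames and
    reuse the previous decision for the frames in between."""

    def __init__(self, determine_subject, refresh_every=60):
        self.determine_subject = determine_subject   # e.g. one of the modes in s223
        self.refresh_every = refresh_every
        self.current_subject = None

    def subject_for(self, frame_idx, seg_templates):
        if self.current_subject is None or frame_idx % self.refresh_every == 0:
            self.current_subject = self.determine_subject(seg_templates)
        return self.current_subject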
Step 34: store the video formed by the N color-processed frames.
With the method and device, the terminal can determine the target individual and the background according to the video content during recording and apply different color processing to each, so that the recorded video highlights the subject, acquires a cinematic, movie-like quality, looks striking, and improves user experience.
Example 3
The video-recording processing method is similar to the photographing processing method; the difference is that photographing processes a single image, whereas video recording processes continuous video frames, i.e. a plurality of consecutive images, so the processing flow for each frame may refer to the processing method of Example 1. In some complex video-shooting scenes, some regions of an image may be misdetected: if the same region is marked as target in one frame and as background in an adjacent frame, the color-processing method above renders it in different colors, and this color change of the same region across adjacent frames is perceived as flicker. Flicker therefore needs to be detected and eliminated during processing; it can be understood as a misjudgment of object category.
One method of judging flicker is to warp the segmentation template of the previous frame according to optical flow to obtain an optical-flow-based segmentation template, compare it with the segmentation template of the current frame, and judge that no flicker occurs if the degree of coincidence or similarity exceeds a certain ratio, and that flicker occurs otherwise. It should also be understood that flicker judgment is a continuous process. Optionally, a specific method for determining whether flicker exists is as follows:
1) First, the optical flow between adjacent frames is calculated; the optical flow indicates the displacement of pixels between the previous and current frames (frame t-1 and frame t).

2) Obtain the segmentation template of frame t-1, and calculate an optical-flow segmentation template F for frame t from the segmentation template of frame t-1 and the optical flow between frame t-1 and frame t.

3) Obtain the segmentation template S of frame t.

4) Count the set SF of subject pixels in the optical-flow segmentation template F and the set SS of subject pixels in the segmentation template S, and compute the numbers of pixels in the union and intersection of SF and SS, denoted Nu and Ni respectively. When (Nu - Ni)/Nu is greater than a certain threshold, the segmentation templates of the adjacent frames t-1 and t are considered to differ greatly, and flicker is judged to occur between frames t-1 and t (equivalently, frame t is judged to flicker). A large difference indicates that the same object may have been misjudged as belonging to different categories; for example, the same individual may be classified as a person in frame t-1 and as a monkey in frame t.
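Steps 1) to 4) could be sketched as follows; Farneback dense optical flow, nearest-neighbour warping and the 0.3 threshold are illustrative choices of this sketch, not requirements of the patent:

import numpy as np
import cv2

def flicker_between(prev_gray, cur_gray, prev_mask, cur_mask, thresh=0.3):
    """Warp the previous frame's subject mask along the optical flow and compare
    it with the current frame's subject mask; a large (Nu - Ni) / Nu ratio of
    union to intersection pixel counts is treated as flicker."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_mask.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Each pixel of frame t samples frame t-1 at (x - dx, y - dy) (approximation).
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    warped = cv2.remap(prev_mask.astype(np.uint8), map_x, map_y,
                       cv2.INTER_NEAREST).astype(bool)          # template F
    sf, ss = warped, cur_mask.astype(bool)                      # SF and SS
    nu = np.logical_or(sf, ss).sum()                            # Nu
    ni = np.logical_and(sf, ss).sum()                           # Ni
    return nu > 0 and (nu - ni) / nu > thresh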
Optionally, if, among the previous N0 images of the current image (N0 being a positive integer greater than 2), the number of adjacent image pairs in which the same object is judged to belong to different categories is greater than a preset threshold, it may be determined that flicker exception processing needs to be performed on the current image; if that number is not greater than the preset threshold, it may be determined that flicker exception processing is not needed for the current frame.
Optionally, for a predetermined number of historical adjacent frames (or a preset number of historical frames): if more than half of them are judged to flicker (for example, 3 of the 5 video frames preceding the current frame are judged to flicker), it may be determined that flicker exception processing is needed for the current frame; if no more than half are judged to flicker (for example, only 1 of the preceding 5 frames is judged to flicker), it may be determined that flicker exception processing is not needed for the current frame.
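The history check itself is a simple majority vote; a small sketch under the same illustrative assumptions (a window of the 5 most recent frames):

from collections import deque

flicker_history = deque(maxlen=5)     # True when a frame pair was judged to flicker

def needs_flicker_fallback(history):
    """Fall back to uniform colour processing for the current frame when more
    than half of the recorded recent frames were judged to flicker."""
    return len(history) > 0 and sum(history) > len(history) / 2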
It should be understood that the current video image can be taken to mean the image being recorded at a given moment. In some scenes that moment is understood generically; in others it may be a specific time, such as the most recent moment or a moment of interest to the user.
Specifically, the image processing method during video capture in this example may include the following steps:
Step 41: acquire N captured frames, where N is a positive integer, and perform the following operations on each frame. The N frames may be adjacent video frames, in which case the N frames taken together can be understood as a video; the N frames may also be non-adjacent.
Step 42: determine whether the number of adjacent image pairs in which flicker occurs among the previous N0 frames of the current frame (current image) is greater than a preset threshold. N0 and the threshold may be set by the user; for example, N0 is the number of historical video frames sampled, and the threshold may be 1/2 or 2/3 of N0, these values being examples and not limitations.
If the judgment result is not greater than the preset threshold, the operations of steps 43 and 44 are performed on the currently shot or acquired image.
Step 43, an alternative implementation may be the same as step 32.
Step 44, an alternative implementation may be the same as step 33.
If the judgment result is greater than the preset threshold, the operation of step 45 is performed on the currently shot or acquired image.
Step 45: process all image regions of the current frame with the same color processing method to obtain a target image. That method may be the same as the one applied to the background region of the previous frame, the same as the one applied to the target region of the previous frame, or the same as the one applied to the whole previous frame. For example, the color processing method used for the background region in step 33 (23) may be applied to the whole image, or the color processing method used for the target region in step 33 (23) may be applied to the whole image; the whole image may thus be color-retained, entirely black and white, or entirely processed in the first or second color processing mode (including but not limited to the color processing described in Example 1).
In this case, the template segmentation process as in step 22 may be present or omitted for the current frame, which is not limited in this example.
After step 45, step 46 is performed: store the video formed by the N color-processed frames, where N is a positive integer.
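Pulling steps 42 to 45 together, one per-frame decision could be sketched as follows (applying the background treatment to the whole frame in the flicker case is just one of the options named above; the function arguments are assumptions of this sketch):

def process_frame(img, target_region, flicker_fallback,
                  color_target, color_background):
    """Two-tone processing in the normal case; uniform processing of the whole
    frame when the flicker fallback is active."""
    if flicker_fallback:
        return color_background(img)              # whole frame, one treatment
    out = color_background(img)                   # e.g. black-and-white
    out[target_region] = color_target(img)[target_region]   # e.g. colour retained
    return out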
With the method and device, the terminal can determine the target individual and the background according to the video content during recording and apply different color processing to each, so that the recorded video highlights the subject, acquires a cinematic, movie-like quality, looks striking, and improves user experience.
Example 4
In some application scenarios, the content of the picture being shot often changes, so the subject of the picture often changes as well, and the user may also want to freely choose the color processing mode of the subject in different pictures so as to control the video style autonomously.
The image processing method during video shooting may include the following steps:
step 51: acquiring a video frame;
Step 52: for any video frame acquired from the video, determine a subject area and a background area in the frame;
Step 53: any color processing mode may be adopted at any time for the subject area, and any color processing mode may be adopted at any time for the background area, provided that for any image the luminance or chrominance of the subject area after color processing is higher than that of the background area after color processing; in other words, for any image, the color processing mode applied to the subject area yields higher chrominance or luminance than the color processing mode applied to the background area.
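The constraint in step 53 can be spot-checked on a processed frame; the sketch below uses HSV saturation and value as stand-ins for chrominance and luminance, which is an assumption of this sketch rather than the patent's definition:

import numpy as np
import cv2

def subject_stands_out(img_bgr, target_region):
    """True when the target region has higher mean 'brightness' (V) or mean
    'colourfulness' (S) than the background region."""
    if not target_region.any() or target_region.all():
        return False
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    sat = hsv[..., 1].astype(np.float64)
    val = hsv[..., 2].astype(np.float64)
    bg = ~target_region
    return (val[target_region].mean() > val[bg].mean() or
            sat[target_region].mean() > sat[bg].mean())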
Example 5
In some application scenarios, the content of the picture being shot often changes, so the subject of the picture often changes as well, and the user may also want to freely choose the color processing mode of the subject in different pictures so as to control the video style autonomously, in particular how colors change over time.
The image processing method during video shooting may include the following steps:
Step 61: acquire N1 images during a first time period and N2 images during a second time period, where the first and second time periods are adjacent and N1 and N2 are both positive integers. Each time period may be long enough for the user to perceive the change in the picture with the naked eye; N1 and N2 are determined by the recording frame rate and the length of the time period, which the present invention does not limit.
Step 62: for each of the N1 images, determine a first target region and a first background region in the image, the first background region being the part of the image other than the first target region, where the first target region in each of the N1 images corresponds to a first object (which may comprise at least one object); for each of the N2 images, determine a second target region and a second background region in the image, the second background region being the part of the image other than the second target region, where the second target region in each of the N2 images corresponds to a second object (which may comprise at least one object).
Step 63: process the first target region with a first color processing mode, the first background region with a second color processing mode, the second target region with a third color processing mode, and the second background region with a fourth color processing mode, to obtain a target video. In the target video, the chrominance of the first target region is greater than that of the first background region, or the luminance of the first target region is greater than that of the first background region; and the chrominance of the second target region is greater than that of the second background region, or the luminance of the second target region is greater than that of the second background region.
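A compact sketch of step 63, treating each colour processing mode as a callable and assuming the regions have already been determined in step 62 (the function shape is an assumption of this sketch):

def process_two_periods(frames1, targets1, frames2, targets2,
                        mode1, mode2, mode3, mode4):
    """Apply mode1/mode2 to target/background of the first-period frames and
    mode3/mode4 to the second-period frames, then concatenate the results.
    mode1 should leave the target brighter or more colourful than mode2 leaves
    the background, and likewise mode3 versus mode4."""
    out = []
    for img, target in zip(frames1, targets1):
        frame = mode2(img)                       # background treatment
        frame[target] = mode1(img)[target]       # target treatment
        out.append(frame)
    for img, target in zip(frames2, targets2):
        frame = mode4(img)
        frame[target] = mode3(img)[target]
        out.append(frame)
    return out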
Example 6
In some application scenarios, the content of the picture being shot often changes, so the subject of the picture often changes as well, and the user may also want to freely choose which target subject is highlighted in different pictures. For example, the image region corresponding to a first object is determined as the target region during a first time period, the image region corresponding to a second object is determined as the target region during a second time period, and the first object and the second object are different objects, individuals or categories.
In this scenario, the image processing method in the video shooting process may include the following steps:
step 71: an alternative implementation may be the same as step 61.
Step 72: determine a first target region and a first background region in each of the N1 images according to the image content, and determine a second target region and a second background region in each of the N2 images according to the image content, where the object corresponding to the second target region differs in identity or category from the object corresponding to the first target region, so that the system or the user can autonomously select the target subject and the target region of the image. An image consists of a subject and a background, i.e., correspondingly, of a target region and a background region.
For example, the first object is a person and the second object is an animal; or the first object is person A and the second object is person B; or the first object is two people and the second object is a dog and two cats, with the remaining unrecognized areas marked as background.
The image segmentation templates may be determined by the methods of s221 and s222, but the subsequent manner of determining the target object in the segmentation template is not limited to being performed for each image.
Optionally, the first object and the second object may be determined by selection instructions freely input by the user on the image segmentation template. For example, when the user selects an individual, the system can identify the pixel corresponding to the user's input, identify which individual(s) or which category(ies) the selected template belongs to, and thereby determine which individual(s), or all individuals under which category(ies), constitute the first object. The first object, or its corresponding image region, is determined as the first target region, and this determination may be maintained for a period of time: in the following frames, the regions corresponding to the segmentation template of the first object remain the first target region until, at a later moment, the user selects another individual, and the region corresponding to the new individual is determined as the second target region in a similar way. In any one image, the image region other than the first target region or the second target region is the background region. That is, during the first time period the region corresponding to the segmentation template of the first object is the first target region, and during the second time period the region corresponding to the segmentation template of the second object is the second target region.
Optionally, the system may re-determine the target template of the images at preset time intervals (for example but not limited to 1 s or 2 s) or every preset number of frames (for example but not limited to 50 or 100 frames). For example, a first target template is determined in frame 101, and for each of frames 102 to 200 the segmentation template of the same category or the same individual as the first target template of frame 101 is used as the first target template; in frame 201 a second target template is determined, and for each of frames 202 to 300 the segmentation template of the same category or the same individual as the second target template of frame 201 is used as the second target template. It should be understood that the numbers in this example may be predefined by the user or by the system; that is, the target template is determined at a certain moment and the template of that category or individual is applied for a period of time afterwards.
The method for determining the first target template and the second target template may refer to, but is not limited to, any of the 6 modes in step s223. The first target template and the second target template may therefore be of the same category or the same individual, or of different categories or different individuals; this depends on the recognition capability of the network, changes in the scene, and the user's input instructions.
In addition, the first target region and first background region, and the second target region and second background region, are then determined according to the method of s224; details are not repeated in this example.
Step 73: an alternative implementation may be the same as step 63.
In addition, since different time periods are involved in this example, the color processing methods can be combined in many ways, for example:
For example: the first color processing mode is the same as the third, and the second is the same as the fourth. This combination keeps the color processing consistent throughout.

For example: the first color processing mode is the same as the third, while the second differs from the fourth. This combination keeps the color of the target subject consistent while the background color changes, making the overall look more striking.

For example: the first color processing mode differs from the third, while the second is the same as the fourth. This combination keeps the background color consistent while the color of the target subject changes, making the target subject more prominent.

For example: the first color processing mode differs from the third, and the second differs from the fourth. This combination provides more color transformations and allows colors to be coordinated with the requirements of different scenes.
The first or third color processing mode may include filters such as color retention or color enhancement; the second or fourth color processing mode may include filters such as black-and-white, darkening, retro, film, blur or bokeh.
Specifically, the color processing of the target region and background region of a single image may refer to step 23 (for the N2 images, the third and fourth color processing modes play the roles of the first and second color processing modes, respectively).
With this scheme, in some scenes the user can freely choose the color processing mode of the background in different pictures so as to set off the subject in different ways; in other scenes the user can freely choose the color processing mode of the subject in different pictures so as to highlight the subject to different degrees or in different forms.
It should be understood that signals referred to by the same reference numerals may have different origins or be obtained in different manners in different examples of the present invention, and this is not limiting. In addition, when steps of different examples cross-reference one another ("the same as step xx"), the emphasis is that the signal-processing logic of the two steps is similar; their inputs and outputs need not be identical, nor need the two method flows be identical, and reasonable references and modifications made by those skilled in the art fall within the protection scope of the present invention.
The invention provides an image processing method: a target region and a background region are determined in an image by template segmentation, and different color processing modes are applied to the two so that the luminance or chrominance of the target region is higher than that of the background region. The subject corresponding to the target region is thereby highlighted more prominently, achieving a cinematic special effect.
Based on the image processing method provided by the above embodiments, an embodiment of the present invention provides an image processing apparatus 900. The apparatus can be applied to a variety of terminal devices and implemented in any form of terminal 100, such as a terminal with a camera function. Referring to fig. 9, the apparatus includes:
A shooting module 901, configured to acquire images, whether photographs or video. This module is specifically configured to perform the method mentioned in step 21, step 31, step 51, step 61 or step 71 of the above examples, or an equivalent alternative; it can be implemented by the processor invoking corresponding program instructions in the memory to control the camera to acquire images.
A determining module 902, configured to determine a target region and a background region in the image according to the image content. This module is specifically configured to perform the method mentioned in step 22, step 32, step 52, step 62 or step 72 of the above examples, or an equivalent alternative; it can be implemented by the processor invoking corresponding program instructions in the memory to apply the corresponding algorithm.
A color processing module 903, configured to apply different color processing modes to the target region and the background region of the image to obtain a target image or a target video, so that the chrominance of the target region is greater than that of the background region, or the luminance of the target region is greater than that of the background region. This module is specifically configured to perform the method mentioned in step 23, step 33, step 53, step 63 or step 73 of the above examples, or an equivalent alternative; it can be implemented by the processor invoking corresponding program instructions in the memory to apply a certain algorithm.
The apparatus may further comprise a save module 904 for storing the color processed image or video.
The explanation, the expression, and the extension of various implementation forms of the technical features in the above specific method examples and embodiments are also applicable to the method execution in the apparatus, and are not repeated in the apparatus embodiments.
The invention provides an image processing apparatus that determines a target region and a background region in an image by performing template segmentation on the image, and applies different color processing modes to the two so that the luminance or chrominance of the target region is higher than that of the background region. The subject corresponding to the target region is thereby highlighted more prominently, achieving a cinematic special effect.
Based on the image processing method provided by the above embodiments, an embodiment of the present invention further provides an image processing apparatus 1000. The apparatus can be applied to a variety of terminal devices and implemented in any form of terminal 100, such as a terminal with a camera function. Referring to fig. 10, the apparatus includes:
A shooting module 1001, configured to acquire images, whether photographs or video. This module is specifically configured to perform the method mentioned in step 21, step 31, step 51, step 61 or step 71 of the above examples, or an equivalent alternative; it can be implemented by the processor invoking corresponding program instructions in the memory to control the camera to acquire images.
A judging module 1002, configured to determine whether the number of frames with flicker among the previous N0 frames of the current frame is greater than a preset threshold. If the result is not greater than the preset threshold, the judging module 1002 triggers the determining module 1003 and the color processing module 1004 to perform their functions; if the result is greater than the preset threshold, the judging module 1002 triggers the flicker repairing module 1005 to perform its function. This module is specifically configured to perform the method mentioned in step 42 of the above example, or an equivalent alternative; it can be implemented by the processor invoking corresponding program instructions in the memory to apply the corresponding algorithm.
A determining module 1003, configured to determine a target region and a background region in the image according to the image content when the judging module 1002 determines that the number of frames with flicker among the previous N0 frames of the current frame is not greater than the preset threshold. This module is specifically configured to perform the method mentioned in step 22, step 32, step 43, step 52, step 62 or step 72 of the above examples, or an equivalent alternative; it can be implemented by the processor invoking corresponding program instructions in the memory to apply the corresponding algorithm. A color processing module 1004, configured to apply different color processing modes to the target region and the background region of the image so that the chrominance of the target region is greater than that of the background region, or the luminance of the target region is greater than that of the background region. This module is specifically configured to perform the method mentioned in step 23, step 33, step 44, step 53, step 63 or step 73 of the above examples, or an equivalent alternative; it can be implemented by the processor invoking corresponding program instructions in the memory to apply a certain algorithm.
A flicker repairing module 1005, configured to process all image regions of the current frame with the same color processing method when the judging module 1002 determines that the number of frames with flicker among the previous N0 frames of the current frame is greater than the preset threshold. That method may be the same as the one applied to the background region of the previous frame, or the same as the one applied to the target region of the previous frame. This module is specifically configured to perform the method mentioned in step 45 of the above example, or an equivalent method; it can be implemented by the processor invoking corresponding program instructions in the memory to apply a certain algorithm.
The apparatus 1000 may further include a storage module 1006 for storing the color processed image or video.
The explanation, the expression, and the extension of various implementation forms of the technical features in the above specific method examples and embodiments are also applicable to the method execution in the apparatus, and are not repeated in the apparatus embodiments.
The invention provides an image processing apparatus that determines a target region and a background region in an image according to the image content by performing template segmentation on the image, and applies different color processing modes to the two so that the luminance or chrominance of the target region is higher than that of the background region. The subject corresponding to the target region is thereby highlighted more prominently, achieving a cinematic special effect.
It should be understood that the division into modules in the above apparatus is only a division of logical functions; in actual implementation the modules may be wholly or partially integrated into one physical entity or kept physically separate. For example, each of the above modules may be a separate processing element, or may be integrated in a chip of the terminal, or may be stored in a storage element of the controller in the form of program code that a processing element of the processor calls to execute the functions of the module. The modules may also be integrated together or implemented independently. The processing element described here may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software. The processing element may be a general-purpose processor, such as a central processing unit (CPU), or one or more integrated circuits configured to implement the above method, for example one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).
It is to be understood that the terms "first," "second," and the like in the description and in the claims, and in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While some embodiments of the invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the recited embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. If such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention also includes such modifications and variations.

Claims (18)

1. An image processing method applied to a captured video, the method comprising:
acquiring N1 images, the N1 images corresponding to a first scene;
in acquiring the N1 images, for each of the N1 images,
determining a first target area and a first background area in the image according to the category of an object in the image by using a preset neural network; the first background area is a part of the image except the first target area;
reserving color for the first target area, and performing black and white processing on the first background area to obtain N1 color-processed images; wherein the first target region in each of the N1 images corresponds to a first object; the first target region corresponds to a segmentation template for a first object class; the first object category comprises at least one of a person, an animal, a plant, a vehicle or other preset objects;
acquiring N2 images, the N2 images corresponding to a second scene; the shooting contents of the first scene and the second scene are different, and both N1 and N2 are positive integers;
in acquiring the N2 images, for each of the N2 images,
determining a second target area and a second background area in the image according to the type of the object in the image by using the preset neural network; the second background area is a part of the image except the second target area;
reserving color for the second target area, and performing black-and-white processing on the second background area to obtain N2 color-processed images; wherein the second target region in each of the N2 images corresponds to a second object; the second target region corresponds to a segmentation template for a second object class; the second object category comprises at least one of a person, an animal, a plant, a vehicle or other preset objects; the first object class and the second object class are different and are not background; the first background area and the second background area correspond to a segmentation template of a background;
and obtaining the target video according to the N1 images and the N2 images after color processing.
2. The method of claim 1, wherein the first object or the second object comprises at least one individual.
3. The method of claim 1 or 2, wherein for one image of said N1 images,
the number of pixels of the first object in the image is larger than a preset threshold value.
4. The method of claim 1 or 2, wherein for one image of said N1 images,
the number of pixels corresponding to the first object in the image is greater than the number of pixels corresponding to other objects of the same category in the image.
5. The method of claim 1 or 2, wherein for one image of said N1 images,
the class to which the first object belongs has a higher priority determined to be a subject than the classes to which other objects in the image belong.
6. The method of claim 1 or 2, wherein for one image of said N2 images,
the number of pixels of the second object in the image is larger than a preset threshold value.
7. The method of claim 1 or 2, wherein for one image of said N2 images,
the number of pixels corresponding to the second object in the image is greater than the number of pixels corresponding to other objects of the same category in the image.
8. The method of claim 1 or 2, wherein for one image of said N2 images,
the class to which the second object belongs has a higher priority determined to be a subject than the classes to which other objects in the image belong.
9. An image processing apparatus, which is applied to shooting a video, the apparatus comprising:
a capture module to acquire N1 images, the N1 images corresponding to a first scene; also for acquiring N2 images, the N2 images corresponding to a second scene; the shooting contents of the first scene and the second scene are different, and both N1 and N2 are positive integers;
the determining module is used for determining a first target area and a first background area in each of the N1 images according to the category of an object in the image by using a preset neural network; the first background area is a part of the image except the first target area; wherein the first target region in each of the N1 images corresponds to a first object; the first target region corresponds to a segmentation template for a first object class; the first object category comprises at least one of a person, an animal, a plant, a vehicle or other preset objects;
the preset neural network is further used for determining a second target area and a second background area in each of the N2 images according to the category of the object in the image; the second background area is a part of the image except the second target area; wherein the second target region in each of the N2 images corresponds to a second object; the second target region corresponds to a segmentation template for a second object class; the second object category comprises at least one of a person, an animal, a plant, a vehicle or other preset objects; the first object class and the second object class are different and are not background; the first background area and the second background area correspond to a segmentation template of a background;
a color processing module, configured to reserve colors for the first target area and the second target area, and perform black-and-white processing on the first background area and the second background area to obtain N1 images and N2 images after color processing; and is also used for obtaining the target video according to the N1 images and the N2 images after color processing.
10. The apparatus of claim 9, wherein the first object or the second object comprises at least one individual.
11. The apparatus according to claim 9 or 10, wherein for one image of said N1 images,
the number of pixels of the first object in the image is larger than a preset threshold value.
12. The apparatus according to claim 9 or 10, wherein for one image of said N1 images,
the number of pixels corresponding to the first object in the image is greater than the number of pixels corresponding to other objects of the same category in the image.
13. The apparatus according to claim 9 or 10, wherein for one image of said N1 images,
the class to which the first object belongs has a higher priority determined to be a subject than the classes to which other objects in the image belong.
14. The apparatus according to claim 9 or 10, wherein for one image of said N2 images,
the number of pixels of the second object in the image is larger than a preset threshold value.
15. The apparatus according to claim 9 or 10, wherein for one image of said N2 images,
the number of pixels corresponding to the second object is greater than the number of pixels corresponding to other objects of the same category in the image.
16. The apparatus according to claim 9 or 10, wherein for one image of said N2 images,
the class to which the second object belongs has a higher priority determined to be a subject than the classes to which other objects in the image belong.
17. A terminal device, characterized by comprising a camera, a memory, a processor and a bus, wherein the camera, the memory and the processor are connected through the bus;
the camera is used for collecting images;
the memory for storing computer programs and instructions;
the processor is for invoking the computer program, instructions and captured image stored in the memory for performing the method of any one of claims 1-8.
18. The terminal device of claim 17, further comprising an antenna system, the antenna system being under control of the processor for transceiving wireless communication signals for wireless communication with a mobile communication network; the mobile communications network comprises one or more of: GSM networks, CDMA networks, 5G networks, FDMA, TDMA, PDC, TACS, AMPS, WCDMA, TDSCDMA, WIFI, and LTE networks.
CN201811199234.8A 2018-10-15 2018-10-15 Image processing method, device and equipment Active CN109816663B (en)

Priority Applications (12)

Application Number Priority Date Filing Date Title
CN201811199234.8A CN109816663B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment
CN202110285512.7A CN113112505B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment
CN202110287237.2A CN113129312B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment
JP2021521025A JP7226851B2 (en) 2018-10-15 2019-06-18 Image processing method, apparatus and device
KR1020217014480A KR20210073568A (en) 2018-10-15 2019-06-18 Image processing method and apparatus, and device
BR112021007094-0A BR112021007094A2 (en) 2018-10-15 2019-06-18 image processing method, apparatus, and device
PCT/CN2019/091717 WO2020078027A1 (en) 2018-10-15 2019-06-18 Image processing method, apparatus and device
EP19873919.5A EP3859670A4 (en) 2018-10-15 2019-06-18 Image processing method, apparatus and device
CN201980068271.1A CN112840376B (en) 2018-10-15 2019-06-18 Image processing method, device and equipment
MX2021004295A MX2021004295A (en) 2018-10-15 2019-06-18 Image processing method, apparatus and device.
AU2019362347A AU2019362347B2 (en) 2018-10-15 2019-06-18 Image processing method and apparatus, and device
US17/230,169 US20210241432A1 (en) 2018-10-15 2021-04-14 Image Processing Method and Apparatus, and Device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811199234.8A CN109816663B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN202110285512.7A Division CN113112505B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment
CN202110287237.2A Division CN113129312B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN109816663A CN109816663A (en) 2019-05-28
CN109816663B true CN109816663B (en) 2021-04-20

Family

ID=66601864

Family Applications (4)

Application Number Title Priority Date Filing Date
CN202110287237.2A Active CN113129312B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment
CN202110285512.7A Active CN113112505B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment
CN201811199234.8A Active CN109816663B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment
CN201980068271.1A Active CN112840376B (en) 2018-10-15 2019-06-18 Image processing method, device and equipment

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN202110287237.2A Active CN113129312B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment
CN202110285512.7A Active CN113112505B (en) 2018-10-15 2018-10-15 Image processing method, device and equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201980068271.1A Active CN112840376B (en) 2018-10-15 2019-06-18 Image processing method, device and equipment

Country Status (9)

Country Link
US (1) US20210241432A1 (en)
EP (1) EP3859670A4 (en)
JP (1) JP7226851B2 (en)
KR (1) KR20210073568A (en)
CN (4) CN113129312B (en)
AU (1) AU2019362347B2 (en)
BR (1) BR112021007094A2 (en)
MX (1) MX2021004295A (en)
WO (1) WO2020078027A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129312B (en) * 2018-10-15 2022-10-28 华为技术有限公司 Image processing method, device and equipment
CN111369598B (en) * 2020-03-02 2021-03-30 推想医疗科技股份有限公司 Deep learning model training method and device, and application method and device
CN113395441A (en) * 2020-03-13 2021-09-14 华为技术有限公司 Image color retention method and device
CN111598902B (en) * 2020-05-20 2023-05-30 抖音视界有限公司 Image segmentation method, device, electronic equipment and computer readable medium
CN111726476B (en) * 2020-07-06 2022-05-31 北京字节跳动网络技术有限公司 Image processing method, device, equipment and computer readable medium
CN111815505A (en) * 2020-07-14 2020-10-23 北京字节跳动网络技术有限公司 Method, apparatus, device and computer readable medium for processing image
US20220070241A1 (en) * 2020-08-28 2022-03-03 Tmrw Foundation Ip S. À R.L. System and method enabling interactions in virtual environments with virtual presence
CN112188260A (en) * 2020-10-26 2021-01-05 咪咕文化科技有限公司 Video sharing method, electronic device and readable storage medium
US11335048B1 (en) * 2020-11-19 2022-05-17 Sony Group Corporation Neural network-based image colorization on image/video editing applications
CN113225477A (en) * 2021-04-09 2021-08-06 天津畅索软件科技有限公司 Shooting method and device and camera application
CN113569713A (en) * 2021-07-23 2021-10-29 浙江大华技术股份有限公司 Stripe detection method and device for video image and computer readable storage medium
CN114363659A (en) * 2021-12-15 2022-04-15 深圳万兴软件有限公司 Method, device, equipment and storage medium for reducing video flicker
CN114422682B (en) * 2022-01-28 2024-02-02 安谋科技(中国)有限公司 Shooting method, electronic device and readable storage medium
CN115118948B (en) * 2022-06-20 2024-04-05 北京华录新媒信息技术有限公司 Repairing method and device for irregular shielding in panoramic video
CN115422986B (en) * 2022-11-07 2023-08-22 深圳传音控股股份有限公司 Processing method, processing apparatus, and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108010037A (en) * 2017-11-29 2018-05-08 腾讯科技(深圳)有限公司 Image processing method, device and storage medium

Family Cites Families (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6952286B2 (en) 2000-12-07 2005-10-04 Eastman Kodak Company Doubleprint photofinishing service with the second print having subject content-based modifications
CN100464569C (en) * 2007-04-17 2009-02-25 北京中星微电子有限公司 Method and system for adding special effects into image
CN101072289B (en) * 2007-06-11 2010-06-02 北京中星微电子有限公司 Automatic generating method and device for image special effect
CN101790020B (en) * 2009-09-28 2013-03-20 苏州佳世达电通有限公司 Film scanning method
US8363085B2 (en) * 2010-07-06 2013-01-29 DigitalOptics Corporation Europe Limited Scene background blurring including determining a depth map
CN102567727B (en) * 2010-12-13 2014-01-01 中兴通讯股份有限公司 Method and device for replacing background target
CN102542593A (en) * 2011-09-30 2012-07-04 中山大学 Interactive video stylized rendering method based on video interpretation
CN102880873B (en) * 2012-08-31 2015-06-03 公安部第三研究所 Personnel behavior identification implementation system and method based on image segmentation and semantic extraction
TWI542201B (en) * 2013-12-26 2016-07-11 智原科技股份有限公司 Method and apparatus for reducing jitters of video frames
CN104156947B (en) * 2014-07-23 2018-03-16 小米科技有限责任公司 Image partition method, device and equipment
EP3195201A4 (en) * 2014-08-28 2018-01-10 Qualcomm Incorporated Temporal saliency map
WO2016197303A1 (en) * 2015-06-08 2016-12-15 Microsoft Technology Licensing, Llc. Image semantic segmentation
CN105049695A (en) * 2015-07-07 2015-11-11 广东欧珀移动通信有限公司 Video recording method and device
CN105005980B (en) * 2015-07-21 2019-02-01 深圳Tcl数字技术有限公司 Image processing method and device
CN105872509A (en) * 2015-11-23 2016-08-17 乐视致新电子科技(天津)有限公司 Image contrast adjustment method and device
CN105513081A (en) * 2015-12-21 2016-04-20 中国兵器工业计算机应用技术研究所 Multi-target tracking identification method
US10580140B2 (en) * 2016-05-23 2020-03-03 Intel Corporation Method and system of real-time image segmentation for image processing
JP6828333B2 (en) 2016-09-13 2021-02-10 富士ゼロックス株式会社 Image processing equipment and image processing program
JP2018085008A (en) * 2016-11-25 2018-05-31 株式会社ジャパンディスプレイ Image processing apparatus, and image processing method for the same
CN106846321B (en) * 2016-12-08 2020-08-18 广东顺德中山大学卡内基梅隆大学国际联合研究院 Image segmentation method based on Bayesian probability and neural network
CN108230252B (en) * 2017-01-24 2022-02-01 深圳市商汤科技有限公司 Image processing method and device and electronic equipment
EP3577894B1 (en) * 2017-02-06 2022-01-26 Intuitive Surgical Operations, Inc. System and method for extracting multiple feeds from a rolling-shutter sensor
US10049308B1 (en) * 2017-02-21 2018-08-14 A9.Com, Inc. Synthesizing training data
CN106851124B (en) * 2017-03-09 2021-03-02 Oppo广东移动通信有限公司 Image processing method and device based on depth of field and electronic device
CN106997595A (en) * 2017-03-09 2017-08-01 广东欧珀移动通信有限公司 Color of image processing method, processing unit and electronic installation based on the depth of field
US9965865B1 (en) * 2017-03-29 2018-05-08 Amazon Technologies, Inc. Image data segmentation using depth data
CN107509045A (en) * 2017-09-11 2017-12-22 广东欧珀移动通信有限公司 Image processing method and device, electronic installation and computer-readable recording medium
CN107566723B (en) * 2017-09-13 2019-11-19 维沃移动通信有限公司 A kind of image pickup method, mobile terminal and computer readable storage medium
CN107798653B (en) * 2017-09-20 2019-12-24 北京三快在线科技有限公司 Image processing method and device
CN107665482B (en) * 2017-09-22 2021-07-23 北京奇虎科技有限公司 Video data real-time processing method and device for realizing double exposure and computing equipment
US20190130191A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Bounding box smoothing for object tracking in a video analytics system
CN107948519B (en) * 2017-11-30 2020-03-27 Oppo广东移动通信有限公司 Image processing method, device and equipment
CN107977940B (en) * 2017-11-30 2020-03-17 Oppo广东移动通信有限公司 Background blurring processing method, device and equipment
US10528820B2 (en) * 2017-12-07 2020-01-07 Canon Kabushiki Kaisha Colour look-up table for background segmentation of sport video
CN108108697B (en) * 2017-12-25 2020-05-19 中国电子科技集团公司第五十四研究所 Real-time unmanned aerial vehicle video target detection and tracking method
CN108133695B (en) * 2018-01-02 2020-08-14 京东方科技集团股份有限公司 Image display method, device, equipment and medium
CN108305223B (en) * 2018-01-09 2020-11-03 珠海格力电器股份有限公司 Image background blurring processing method and device
CN108234882B (en) * 2018-02-11 2020-09-29 维沃移动通信有限公司 Image blurring method and mobile terminal
CN108491889A (en) * 2018-04-02 2018-09-04 深圳市易成自动驾驶技术有限公司 Image, semantic dividing method, device and computer readable storage medium
CN108648284A (en) * 2018-04-10 2018-10-12 光锐恒宇(北京)科技有限公司 A kind of method and apparatus of video processing
CN108648139A (en) * 2018-04-10 2018-10-12 光锐恒宇(北京)科技有限公司 A kind of image processing method and device
CN113129312B (en) * 2018-10-15 2022-10-28 华为技术有限公司 Image processing method, device and equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108010037A (en) * 2017-11-29 2018-05-08 腾讯科技(深圳)有限公司 Image processing method, device and storage medium

Also Published As

Publication number Publication date
CN112840376B (en) 2022-08-09
EP3859670A1 (en) 2021-08-04
AU2019362347B2 (en) 2023-07-06
JP7226851B2 (en) 2023-02-21
US20210241432A1 (en) 2021-08-05
CN113129312A (en) 2021-07-16
CN112840376A (en) 2021-05-25
CN113112505B (en) 2022-04-29
EP3859670A4 (en) 2021-12-22
BR112021007094A2 (en) 2021-07-27
CN113129312B (en) 2022-10-28
WO2020078027A1 (en) 2020-04-23
JP2022505115A (en) 2022-01-14
KR20210073568A (en) 2021-06-18
CN113112505A (en) 2021-07-13
AU2019362347A1 (en) 2021-05-27
CN109816663A (en) 2019-05-28
MX2021004295A (en) 2021-08-05

Similar Documents

Publication Publication Date Title
CN109816663B (en) Image processing method, device and equipment
CN109961453B (en) Image processing method, device and equipment
CN109191410B (en) Face image fusion method and device and storage medium
WO2020192692A1 (en) Image processing method and related apparatus
CN112262563B (en) Image processing method and electronic device
WO2021078001A1 (en) Image enhancement method and apparatus
CN113170037B (en) Method for shooting long exposure image and electronic equipment
CN113409342A (en) Training method and device for image style migration model and electronic equipment
CN109784327B (en) Boundary box determining method and device, electronic equipment and storage medium
WO2021180046A1 (en) Image color retention method and device
CN114257775B (en) Video special effect adding method and device and terminal equipment
CN117014720A (en) Image shooting method, device, terminal, storage medium and product
RU2791810C2 (en) Method, equipment and device for image processing
RU2794062C2 (en) Image processing device and method and equipment
CN117271817A (en) Method, device, equipment and storage medium for generating shared material
CN117676065A (en) Video call method and electronic equipment
CN117201954A (en) Method, device, electronic equipment and medium for acquiring beauty parameters
CN116055853A (en) Shooting method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Li Yu
Inventor after: Ma Feilong
Inventor after: Wang Tizheng
Inventor after: Huang Xiujie
Inventor before: Li Yu
Inventor before: Wang Tizheng
GR01 Patent grant