WO2020015470A1 - Image processing method and apparatus, mobile terminal, and computer-readable storage medium - Google Patents

Image processing method and apparatus, mobile terminal, and computer-readable storage medium

Info

Publication number
WO2020015470A1
WO2020015470A1 (PCT/CN2019/089941)
Authority
WO
WIPO (PCT)
Prior art keywords
preview image
image
target
background
facial expression
Prior art date
Application number
PCT/CN2019/089941
Other languages
English (en)
French (fr)
Inventor
陈岩
Original Assignee
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2020015470A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 Camera processing pipelines; Components thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/167 Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body

Definitions

  • the present application relates to the field of computer applications, and in particular, to an image processing method, device, mobile terminal, and computer-readable storage medium.
  • Embodiments of the present application provide an image processing method, device, mobile terminal, and computer-readable storage medium, which can coordinate a person image and a background image.
  • An image processing method includes:
  • obtaining a preview image to be processed;
  • identifying a scene of the preview image, the scene including a background category and a foreground target;
  • when the foreground target is a portrait, detecting facial expression information of the portrait; and
  • adjusting a feature parameter of a background image in the preview image according to the facial expression information and the background category.
  • An image processing device includes:
  • an acquisition module configured to obtain a preview image to be processed;
  • a recognition module configured to identify a scene of the preview image, the scene including a background category and a foreground target;
  • a detection module configured to detect facial expression information of the portrait when the foreground target is a portrait; and
  • an adjustment module configured to adjust a feature parameter of a background image in the preview image according to the facial expression information and the background category.
  • A mobile terminal includes a memory and a processor.
  • The memory stores a computer program that, when executed by the processor, causes the processor to perform the operations of the image processing method.
  • a computer-readable storage medium stores a computer program thereon, and when the computer program is executed by a processor, the operations of the image processing method are implemented.
  • In the image processing method and apparatus, mobile terminal, and computer-readable storage medium, a preview image to be processed is obtained; a scene of the preview image is identified, the scene including a background category and a foreground target; when the foreground target is a portrait, facial expression information of the portrait is detected; and the feature parameters of the background image in the preview image are adjusted according to the facial expression information and the background category, so that the person image and the background image in the processed image are coordinated.
  • FIG. 1 is a flowchart of an image processing method according to an embodiment
  • FIG. 2 is a schematic structural diagram of a neural network in an embodiment
  • FIG. 3 is a schematic diagram of categories of shooting scenes in an embodiment
  • FIG. 4 is a flowchart of a method for identifying a scene of a preview image based on a neural network according to an embodiment
  • FIG. 5 is a schematic structural diagram of a neural network in another embodiment
  • FIG. 6 is a flowchart of a method for identifying a scene of a preview image based on a neural network according to another embodiment
  • FIG. 7 is a schematic diagram of a bounding box of a foreground target in a preview image in an embodiment
  • FIG. 8 is a flowchart of a method for detecting facial expression information of a portrait in an embodiment
  • FIG. 9 is a flowchart of a method for detecting facial expression information of a portrait in another embodiment
  • FIG. 10 is a flowchart of a method for adjusting feature parameters of a background image in a preview image according to an embodiment
  • FIG. 11 is a structural block diagram of an image processing apparatus according to an embodiment
  • FIG. 12A is a schematic diagram of an internal structure of a mobile terminal according to an embodiment
  • FIG. 12B is a schematic diagram of an internal structure of a server according to an embodiment
  • FIG. 13 is a schematic diagram of an image processing circuit in one embodiment.
  • FIG. 1 is a flowchart of an image processing method according to an embodiment. As shown in FIG. 1, an image processing method includes operations 102 to 108.
  • Operation 102 Obtain a preview image to be processed.
  • In this embodiment, the image to be processed may be a continuous multi-frame preview image, that is, two or more consecutive frames of preview images.
  • The continuous multi-frame preview images may refer to multiple frames of preview images acquired by a camera of a computer device within a preset time. For example, if the camera of the computer device collects three frames of preview images within 0.1 second, these three frames can be used as consecutive multi-frame preview images, as in the sketch below.
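  • As a minimal sketch (Python; the timestamped-frame structure and the 0.1-second window are illustrative assumptions, not an API from the disclosure), frames captured within the preset time can be grouped into one consecutive multi-frame preview image as follows:

```python
from typing import List, Tuple
import numpy as np

def group_consecutive_frames(frames: List[Tuple[float, np.ndarray]],
                             preset_time: float = 0.1) -> List[np.ndarray]:
    """Return the preview frames whose timestamps fall within `preset_time`
    seconds of the first frame; together they form one consecutive
    multi-frame preview image."""
    if not frames:
        return []
    t0 = frames[0][0]  # timestamp of the first collected frame, in seconds
    return [img for t, img in frames if t - t0 <= preset_time]
```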
  • Operation 104 Identify a scene of the preview image.
  • the scene includes background categories and foreground targets.
  • the processor in the mobile terminal recognizes the scene of the preview image based on the neural network.
  • the neural network may be a Convolutional Neural Network (CNN).
  • CNN refers to a neural network model for image classification and recognition developed on the basis of the traditional multilayer neural network. Compared with the traditional multilayer neural network, CNN introduces convolution and pooling algorithms, where the convolution algorithm is a mathematical algorithm that weights and superimposes data in a local region, and the pooling algorithm is a mathematical algorithm that samples data in a local region.
  • Specifically, the CNN model is composed of alternating convolution layers and pooling layers. As shown in FIG. 2,
  • the input layer 210 receives the preview image,
  • the convolution layer 220 extracts image features from each local region of the input image,
  • the pooling layer 230 samples the image features of the convolution layer to reduce their dimensionality, and several fully connected layers 240 then connect the image features.
  • The output of the last hidden layer 250 is the finally extracted feature.
  • Scene information is identified based on the finally extracted features, where the scene information includes background category information and foreground target category information.
  • In one embodiment, a softmax analyzer is configured after the last hidden layer 250 of the convolutional neural network; the softmax analyzer analyzes the finally extracted features to obtain the probability of the category corresponding to the background in the image and the probability of the category corresponding to the foreground target.
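  • A minimal sketch of such a network, assuming a PyTorch implementation (the layer sizes and the number of categories are illustrative placeholders, not the values of the disclosed network):

```python
import torch
import torch.nn as nn

class SceneClassifier(nn.Module):
    """Alternating convolution and pooling layers followed by fully connected
    layers and a softmax head that yields per-category probabilities."""
    def __init__(self, num_categories: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # pooling reduces dimensionality
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),               # fully connected layers (240)
            nn.Linear(64, num_categories),              # last hidden layer (250) -> logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)             # softmax analyzer: category probabilities

# probs = SceneClassifier()(torch.randn(1, 3, 224, 224))  # one preview image
```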
  • It should be noted that before the neural network is used to identify the background category and the foreground target of the preview image, the neural network needs to be trained. The training process includes the following.
  • First, training images containing at least one background training target (landscape, beach, snow, blue sky, green space, night scene, dark, backlight, sunrise/sunset, indoor, fireworks, spotlight, etc.) and at least one foreground training target (subject objects such as portrait, baby, cat, dog, food, etc.) are input to the neural network.
  • The neural network performs feature extraction based on the background training targets and foreground training targets, using features such as SIFT (Scale-Invariant Feature Transform) features and Histogram of Oriented Gradients (HOG) features, and then uses object detection algorithms such as SSD (Single Shot MultiBox Detector) and VGG (Visual Geometry Group) to detect the background training target, obtaining a first prediction confidence, and to detect the foreground training target, obtaining a second prediction confidence.
  • the first prediction confidence level is the confidence level that a pixel of a background region in the training image predicted by the neural network belongs to the background training target.
  • the second prediction confidence is the confidence that a pixel in the foreground region of the training image predicted by the neural network belongs to the foreground training target.
  • the training image may be pre-labeled with a background training target and a foreground training target to obtain a first true confidence level and a second true confidence level.
  • the first true confidence level indicates the confidence level that the pixel point previously marked in the training image belongs to the background training target.
  • the second true confidence level indicates the confidence level that the pixel point previously marked in the training image belongs to the foreground training target.
  • For each pixel in the image, the true confidence can be expressed as 1 (or a positive value) or 0 (or a negative value), indicating that the pixel belongs to the training target or does not belong to it, respectively.
  • Next, the difference between the first prediction confidence and the first true confidence is computed to obtain a first loss function,
  • and the difference between the second prediction confidence and the second true confidence is computed to obtain a second loss function.
  • Both the first loss function and the second loss function can be logarithmic, hyperbolic, or absolute-value functions.
  • Finally, the first loss function and the second loss function are weighted and summed to obtain a target loss function, and the parameters of the neural network are adjusted according to the target loss function to train the neural network, as in the sketch below.
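  • A minimal sketch of the weighted target loss, assuming PyTorch and a binary cross-entropy (logarithmic) loss on the per-pixel confidences; the weights are illustrative placeholders:

```python
import torch
import torch.nn.functional as F

def target_loss(pred_bg, true_bg, pred_fg, true_fg,
                w_bg: float = 0.5, w_fg: float = 0.5) -> torch.Tensor:
    """Weighted sum of the background and foreground confidence losses."""
    first_loss = F.binary_cross_entropy(pred_bg, true_bg)    # first loss function
    second_loss = F.binary_cross_entropy(pred_fg, true_fg)   # second loss function
    return w_bg * first_loss + w_fg * second_loss            # target loss function
```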
  • the shooting scene of the training image may include the category of the background region, the foreground target, and others.
  • the categories of the background area may include landscape, beach, snow, blue sky, green space, night scene, dark, backlight, sunrise / sunset, indoor, fireworks, spotlight, and so on.
  • Foreground targets can be portraits, babies, cats, dogs, food, etc.
  • Others can be text documents, macro shots, etc.
  • Operation 106 When the foreground object is a portrait, the facial expression information of the portrait is detected.
  • Specifically, foreground targets include portraits, babies, cats, dogs, food, and so on.
  • a neural network is used to extract the facial feature points of the portrait, and corresponding facial expression information is obtained according to the facial feature points.
  • the facial feature point may be a contour shape of a facial organ or a facial motion feature of a specific area of the face (for example, a facial muscle motion feature of a specific area of the face).
  • the facial expression information may be happy emotional information, or sad emotional information or calm emotional information. Because of the diversity of facial expression changes, no specific limitation is made here.
  • the correspondence between human facial feature points and facial expression information is stored in a database of a computer device in advance, and facial expression information can be obtained by querying the database.
  • a facial behavior coding system is used to detect facial expression information of a portrait.
  • The Facial Action Coding System (FACS) divides the human face, according to its anatomical features, into a number of Action Units (AUs) that are mutually independent yet interrelated, and analyzes the motion characteristics of these action units, the main regions they control, and the expression information associated with them.
  • FACS classifies many human expressions observed in real life and defines 7 main emotional expressions whose meaning remains constant across different cultural environments: Happiness, Sadness, Anger, Fear, Surprise, Disgust, and Contempt.
  • FACS is an anatomy-based system that can be used to describe facial movements corresponding to each of the above emotions.
  • FACS includes multiple Action Units (AUs); each AU describes a group of facial muscles, and multiple AUs together form a specific facial movement. By detecting the facial movement, the facial expression information corresponding to that movement can be obtained.
  • Operation 108 Adjust the feature parameters of the background image in the preview image according to the facial expression information and the background category.
  • In this embodiment, there are many ways to adjust the feature parameters of the background image in the preview image; for example, the caption of the background image can be adjusted, as can the hue of the background image, the brightness of the background image, or the animation of the background image.
  • the background image can be adjusted according to different facial expressions, so that the background image is coordinated with the facial expressions in the person image.
  • The above image processing method obtains a preview image to be processed; identifies a scene of the preview image, the scene including a background category and a foreground target; when the foreground target is a portrait, detects facial expression information of the portrait; and adjusts the feature parameters of the background image in the preview image according to the facial expression information and the background category, so that the person image and the background image in the processed image are coordinated, making the image more pleasing to view.
  • the scene includes a background category and a foreground target.
  • In one embodiment, as shown in FIG. 4, the method for identifying the scene of the preview image includes operations 402 to 410.
  • Operation 402 Use the basic network of the neural network to extract features from the preview image to obtain feature data.
  • Operation 404 The feature data is input to the classification network of the neural network to perform classification detection on the background of the preview image, and a first confidence map is output. Each pixel in the first confidence map represents the confidence that the corresponding pixel in the preview image belongs to the background detection target.
  • Operation 406 Input the feature data to the target detection network of the neural network to detect the foreground target of the preview image, and output a second confidence map. Each pixel in the second confidence map represents the confidence that each pixel in the preview image belongs to the foreground detection target.
  • Operation 408 Weight the first confidence map and the second confidence map to obtain a final confidence map of the preview image.
  • Operation 410 Determine the background category and the foreground target category of the preview image according to the final confidence map.
  • the neural network includes a basic network 510, a classification network 520, and a target detection network 530.
  • The basic network 510 is used to extract the feature data of the preview image; the feature data is then input to the classification network 520 and the target detection network 530 respectively; the classification network 520 performs classification detection on the background of the preview image to obtain a first confidence map, and the target detection network 530 performs target detection on the foreground of the preview image to obtain a second confidence map; the first confidence map and the second confidence map are weighted to obtain the final confidence map of the preview image; and the background category and the foreground target category of the preview image are determined according to the final confidence map.
  • the confidence interval of a probability sample is an interval estimate of a certain population parameter of this sample.
  • the confidence interval shows the degree to which the true value of this parameter has a certain probability of falling around the measurement result.
  • Confidence is the degree of confidence in the measured value of the parameter being measured.
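  • A minimal sketch of operations 408 and 410 (NumPy; the equal weights and the per-category map layout are assumptions made for illustration):

```python
from typing import Dict
import numpy as np

def fuse_confidence_maps(first_conf: np.ndarray, second_conf: np.ndarray,
                         w1: float = 0.5, w2: float = 0.5) -> np.ndarray:
    """Operation 408: weight the first (background) and second (foreground)
    confidence maps into the final confidence map of the preview image."""
    return w1 * first_conf + w2 * second_conf

def pick_category(final_maps: Dict[str, np.ndarray]) -> str:
    """Operation 410 (sketch): given one final confidence map per candidate
    category, report the category whose map has the highest mean confidence."""
    return max(final_maps, key=lambda name: final_maps[name].mean())
```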
  • the method for identifying a scene of a preview image further includes operations 602 to 606.
  • Operation 602 The target detection network of the neural network is used to detect the foreground target position of the preview image, and a bounding box detection map is output.
  • The bounding box detection map includes a corresponding vector for each pixel in the preview image, where the corresponding vector represents the positional relationship between the corresponding pixel and the corresponding detection bounding box, and the detection bounding box is the bounding box of the foreground target detected in the preview image by the neural network.
  • Operation 604 Weighting according to the first confidence map, the second confidence map, and the bounding box detection map to obtain a final confidence map of the preview image.
  • Operation 606 Determine the background category, foreground target category, and foreground target position of the preview image according to the final confidence map.
  • the bounding box detection map 710 includes a corresponding vector of each pixel point in the bounding box, where the corresponding vector represents a position relationship between a corresponding pixel point and a corresponding bounding box.
  • The vector of each corresponding pixel in the bounding box detection map 710 determines a first four-dimensional vector and a second four-dimensional vector. The first four-dimensional vector is x = (x1, x2, x3, x4), whose elements are the distances from the pixel to the upper, lower, left, and right boundaries of the bounding box 710 of the foreground target; the second four-dimensional vector is x' = (x1', x2', x3', x4'), whose elements are the distances from the pixel to the upper, lower, left, and right boundaries of the bounding box detection map 700 of the preview image corresponding to that pixel.
  • Understandably, the foreground target position can be determined from the second four-dimensional vectors corresponding to all the pixels in the bounding box detection map 710; further, the area of the bounding box 710 of the foreground target is X = (x1 + x2) * (x3 + x4).
  • In one embodiment, the target detection network of the neural network detects the foreground target of the preview image and outputs the second confidence map and the bounding box detection map 710;
  • the final confidence map of the preview image can then be obtained by weighting the first confidence map, the second confidence map, and the bounding box detection map 710, and the background category, foreground target category, and foreground target position of the preview image can be determined according to the final confidence map.
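  • A minimal sketch of recovering the foreground bounding box and its area X = (x1 + x2) * (x3 + x4) from the per-pixel four-dimensional vector (NumPy; the row/column convention is an assumption):

```python
from typing import Tuple
import numpy as np

def bbox_from_vector(pixel_rc: Tuple[int, int], x: np.ndarray) -> Tuple[float, float, float, float]:
    """Recover (top, bottom, left, right) of the foreground bounding box from
    the first four-dimensional vector x = (x1, x2, x3, x4) predicted at a
    pixel (row, col); its elements are the distances from the pixel to the
    box's upper, lower, left, and right boundaries."""
    r, c = pixel_rc
    x1, x2, x3, x4 = x
    return r - x1, r + x2, c - x3, c + x4

def bbox_area(x: np.ndarray) -> float:
    """Area of the foreground bounding box, X = (x1 + x2) * (x3 + x4)."""
    x1, x2, x3, x4 = x
    return float((x1 + x2) * (x3 + x4))
```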
  • the preview image to be processed is a preview image of multiple consecutive frames.
  • a method for detecting facial expression information of the portrait includes operations 802 to 806.
  • Operation 802 Obtain facial motion data of a portrait in a continuous multi-frame preview image.
  • Operation 804 Match facial motion data with preset feature data based on a facial behavior encoding system.
  • Operation 806 When the facial motion data is consistent with the preset feature data, obtain a preset facial expression corresponding to the feature data, and use the preset facial expression as facial expression information of a portrait.
  • the category of the foreground target can be detected based on the neural network shown in FIG. 5.
  • the target detection network 530 of the neural network is used to detect the facial motion of the portrait and obtain facial motion data.
  • The facial motion data can be decomposed into two state data and one process data: a start state, an end state, and an offset process.
  • The two state data and the one process data correspond to preview images of different frames: start frames, end frames, and offset frames (offset frames are all frames between the start frame and the end frame).
  • the neural network detects the preview images of the start frame, the offset frame, and the end frame according to the frame timing, and obtains the facial motion data in the preview image composed of the start frame, the offset frame, and the end frame.
  • the facial motion data composed of different frames is matched with preset feature data defined by FACS.
  • FACS defines preset feature data of multiple motion units, and different facial expression information can be described by combining between different motion units.
  • For example, motion unit AU1 refers to pulling the middle part of the eyebrows upward; the corresponding facial expression is described as sadness.
  • Motion unit AU4 refers to lowering the eyebrows and drawing them together; the corresponding facial expression is described as encountering a physical or psychological obstacle.
  • When AU1 and AU4 are combined and appear quickly, within 1 to 2 seconds, the facial expression corresponding to this combination is described as disappointment.
  • other movement units may be defined according to different regions of the facial organs and different movement modes, which are not specifically limited herein.
  • each motion unit includes preset feature data of multiple frames, and the preset feature data corresponds to facial motion data of a specific area. It should be noted that the specific area includes areas of other facial organs in addition to the eyebrow area, and is not specifically limited herein.
  • When the facial motion data (including the facial motion data of the different frames) is consistent with the preset feature data of the motion units (which also include feature data for the different frames),
  • the preset facial expression corresponding to the preset feature data is obtained (for example, the disappointment corresponding to AU1 + AU4),
  • and the preset facial expression is used as the facial expression information of the portrait.
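  • A minimal sketch of this matching step (the AU templates, tolerance, and expression table below are hypothetical placeholders rather than FACS data; a fuller version would track one motion vector per facial region):

```python
from typing import Dict, Optional
import numpy as np

# Hypothetical preset feature data: one template motion vector per AU for the
# start frame, the offset frames, and the end frame of a facial region.
AU_TEMPLATES: Dict[str, Dict[str, np.ndarray]] = {
    "AU1": {"start": np.array([0.0, 0.0]), "offset": np.array([0.0, 0.4]), "end": np.array([0.0, 0.8])},
    "AU4": {"start": np.array([0.0, 0.0]), "offset": np.array([-0.3, -0.2]), "end": np.array([-0.6, -0.4])},
}
# Hypothetical preset expressions for matched AU combinations.
EXPRESSIONS = {frozenset({"AU1"}): "sadness", frozenset({"AU1", "AU4"}): "disappointment"}

def match_expression(motion: Dict[str, np.ndarray], tol: float = 0.15) -> Optional[str]:
    """Compare measured start/offset/end facial motion data with each AU's
    preset feature data; if a known AU combination matches, return the preset
    facial expression used as the portrait's facial expression information."""
    active = set()
    for au, template in AU_TEMPLATES.items():
        if all(np.linalg.norm(motion[phase] - template[phase]) < tol
               for phase in ("start", "offset", "end")):
            active.add(au)
    return EXPRESSIONS.get(frozenset(active))
```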
  • detecting facial expression information of a portrait further includes operations 902 to 906.
  • Operation 902 Determine a face region in the preview image.
  • Operation 904 Obtain depth information corresponding to a face region.
  • Operation 906 Determine a facial expression according to the face area and the corresponding depth information.
  • the preview image is a depth image (Depth map), and the depth image is a universal three-dimensional scene information expression manner.
  • the gray value of each pixel in the depth image can be used to represent the distance of a point in the scene from the camera.
  • the depth image may be acquired by a passive ranging sensor or an active depth sensor provided in the camera, which is not specifically limited herein.
  • For example, the camera transmits continuous near-infrared pulses to the target scene and then uses a sensor to receive the light pulses reflected by the foreground target;
  • by comparing the phase difference between the emitted light pulses and the light pulses reflected by the foreground target, the transmission delay of the light pulses can be calculated, which in turn gives the distance of the foreground target relative to the transmitter, and finally a depth image is obtained, as in the sketch below.
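  • A minimal sketch of converting the measured phase difference into depth (NumPy; the 20 MHz modulation frequency is an assumed example value, not a parameter from the disclosure):

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_depth(phase_diff: np.ndarray, modulation_hz: float = 20e6) -> np.ndarray:
    """Per-pixel depth from the phase difference (radians) between the emitted
    near-infrared pulses and the pulses reflected by the foreground target."""
    delay = phase_diff / (2.0 * np.pi * modulation_hz)   # transmission delay, seconds
    return SPEED_OF_LIGHT * delay / 2.0                  # halve the round trip -> distance, metres
```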
  • When the foreground target is a portrait, the face region is detected based on the target detection network 530 in the neural network.
  • the face region may be obtained by using a bounding box detection map outputted by the target detection network 530 with a human face as a detection target.
  • the acquisition of the depth information is related to the manner of acquiring the depth image.
  • For example, when the camera acquires the depth image based on structured light (structured light is light with a specific mode, with pattern elements such as points, lines, and planes), the position and the degree of deformation of the face region in the depth image can be obtained from the structured-light pattern,
  • and the depth information of each point in the face region can then be calculated using the triangulation principle.
  • the depth information here refers to the three-dimensional information of each point in the face region.
  • In the process of determining the facial expression according to the face region and the corresponding depth information, some feature points are first located, for example, multiple feature points on the facial organs and on the regions between them, such as the cheeks; these feature points can essentially characterize the facial organs and facial changes. Gabor wavelet coefficients of the feature points are then extracted by image convolution, and the matching distance of the Gabor features is used as the similarity measure. After the features are extracted, facial expression recognition can be realized through a multilayer neural network; expression recognition can also be implemented by algorithms based on convolutional neural networks.
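  • A minimal sketch of the Gabor step (OpenCV/NumPy; the kernel-bank parameters and the four orientations are illustrative choices):

```python
import cv2
import numpy as np

def gabor_features(gray: np.ndarray, points, ksize: int = 21) -> np.ndarray:
    """Convolve the face image with a small bank of Gabor kernels and sample
    the responses (Gabor wavelet coefficients) at the located feature points."""
    feats = []
    for theta in np.arange(0, np.pi, np.pi / 4):          # 4 orientations
        kernel = cv2.getGaborKernel((ksize, ksize), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0)
        response = cv2.filter2D(gray, cv2.CV_32F, kernel)
        feats.extend(response[y, x] for (x, y) in points)
    return np.asarray(feats, dtype=np.float32)

def matching_distance(f1: np.ndarray, f2: np.ndarray) -> float:
    """Matching distance between two Gabor feature vectors, used as the similarity measure."""
    return float(np.linalg.norm(f1 - f2))
```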
  • Adjusting the feature parameters of the background image according to the facial expression information and the background category includes adjusting at least one of the following feature parameters of the background image according to the facial expression information and the background category: for example, hue, brightness, color, contrast, exposure, light effects, and so on. Understandably, specific background categories include scenes such as indoor, landscape, strong light, and night.
  • When the computer device detects that the foreground target is a portrait, recognizes the facial expression of the portrait, and detects the scene in which the foreground target is located (that is, the background category),
  • parameters such as the hue, brightness, color, contrast, and light effect of the background image can be processed according to the facial expression of the portrait and the background category, so that the background image is coordinated with the facial expression of the portrait.
  • For example, when the recognized facial expression is sad and the background image is a landscape,
  • the color tone of the landscape can be adjusted to a cool tone (such as dark gray) and the contrast can be reduced to enhance the sad atmosphere, as in the sketch below.
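  • A minimal sketch of that example (OpenCV-style BGR image; the scaling factors are illustrative, not values from the disclosure):

```python
import numpy as np

def sad_landscape_adjustment(background_bgr: np.ndarray) -> np.ndarray:
    """Shift a landscape background toward a cool, dark-grey tone and lower its
    contrast to match a sad facial expression."""
    img = background_bgr.astype(np.float32)
    img[..., 2] *= 0.85                      # damp the red channel -> cooler tone
    img[..., 0] *= 1.05                      # slightly boost the blue channel
    mean = img.mean()
    img = mean + 0.7 * (img - mean)          # reduce contrast around the mean
    return np.clip(img, 0, 255).astype(np.uint8)
```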
  • the feature parameters of the background image are adjusted according to the facial expression information and the background category, including operations 1002 to 1006.
  • Operation 1002 Determine the feature parameter to be adjusted among the feature parameters according to the background category.
  • Operation 1004 Determine an adjustment mode of the characteristic parameter to be adjusted according to the facial expression information.
  • Operation 1006 Adjust the characteristic parameter to be adjusted according to the adjustment mode.
  • different background categories have their preferred feature parameters to be adjusted.
  • For example, when the background category is landscape, the preferred feature parameter to be adjusted is set to hue; when the background category is strong light or night, the preferred feature parameter to be adjusted is set to light effect; when the background category is indoor, the preferred feature parameter to be adjusted is set to color.
  • at least one preferred feature parameter to be adjusted may be set for different backgrounds according to the characteristics of the background category.
  • preferred feature parameters to be adjusted for different background categories may also be set according to user needs, which is not specifically limited herein.
  • an adjustment mode is determined according to facial expression information, and the feature parameter to be adjusted is adjusted according to the adjustment mode.
  • For example, when the background category is strong light,
  • the preferred feature parameter to be adjusted is light effect.
  • If the detected facial expression information is sadness, the adjustment mode is determined to be: adding a light effect with a cool tone (such as a blue tone).
  • The feature parameter to be adjusted (the light effect) is then changed according to this adjustment mode to set off the sad atmosphere.
  • the adjustment mode may also be set according to the actual needs of the user, which is not specifically limited herein.
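  • A minimal sketch of operations 1002 to 1006 (the lookup tables are hypothetical placeholders that, as noted above, can be set according to the characteristics of the background category or user needs):

```python
# Preferred feature parameter to adjust for each background category (placeholder table).
PREFERRED_PARAM = {"landscape": "hue", "strong_light": "light_effect",
                   "night": "light_effect", "indoor": "color"}
# Adjustment mode chosen from the parameter and the facial expression (placeholder table).
ADJUSTMENT_MODE = {("light_effect", "sadness"): "add a cool (blue) light effect",
                   ("hue", "sadness"): "shift hue to a cool tone and lower contrast",
                   ("hue", "happiness"): "shift hue to a warm tone"}

def choose_adjustment(background_category: str, expression: str):
    """Operation 1002: pick the parameter to adjust from the background category;
    operation 1004: pick its adjustment mode from the facial expression."""
    param = PREFERRED_PARAM.get(background_category)
    mode = ADJUSTMENT_MODE.get((param, expression), "no adjustment")
    return param, mode

# choose_adjustment("strong_light", "sadness") -> ("light_effect", "add a cool (blue) light effect")
```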
  • FIG. 11 is a structural block diagram of an image processing apparatus in an embodiment.
  • An image processing apparatus includes an acquisition module 1110, a recognition module 1120, a detection module 1130, and an adjustment module 1140, where:
  • the obtaining module 1110 is configured to obtain a preview image to be processed.
  • The recognition module 1120 is configured to identify a scene of the preview image; the scene includes a background category and a foreground target.
  • the detection module 1130 is configured to detect facial expression information of the portrait when the foreground target is a portrait.
  • the adjusting module 1140 is configured to adjust a characteristic parameter of a background image in the preview image according to the facial expression information and the background category.
  • In the embodiment of the present application, the acquisition module 1110 obtains the preview image to be processed; the recognition module 1120 identifies the scene of the preview image, the scene including a background image and a foreground target; when the foreground target is a portrait, the detection module 1130 detects the facial expression information of the portrait; and the adjustment module 1140 adjusts the background image according to the facial expression information, so that the person image and the background image in the processed image are coordinated, thereby improving the viewing of the image.
  • the identification module 1120 further includes:
  • a feature extraction unit is configured to perform feature extraction on the preview image using a basic network of a neural network to obtain feature data.
  • a classification unit configured to classify and detect the background of the preview image using the classification network of the neural network and output a first confidence map; each pixel in the first confidence map represents the confidence that the corresponding pixel in the preview image belongs to the background detection target.
  • a target detection unit configured to detect the foreground target of the preview image by using the target detection network of the neural network and output a second confidence map; each pixel in the second confidence map represents the confidence that the corresponding pixel in the preview image belongs to the foreground detection target.
  • a calculation unit configured to obtain a final confidence map of the preview image by weighting the first confidence map and the second confidence map.
  • a first determining unit configured to determine the background category and the foreground target category of the preview image according to the final confidence map.
  • the target detection unit further includes:
  • a target position detection subunit configured to detect the foreground target position of the preview image by using the target detection network of the neural network and output a bounding box detection map, where the bounding box detection map includes a corresponding vector for each pixel in the preview image, the corresponding vector represents the positional relationship between the corresponding pixel and the corresponding detection bounding box, and the detection bounding box is the bounding box of the foreground target detected in the image to be detected by the neural network.
  • the calculation unit is further configured to obtain a final confidence map of the preview image by weighting according to the first confidence map, the second confidence map, and the bounding box detection map.
  • the first determining unit is further configured to determine a background category, a foreground target category, and a foreground target position of the preview image according to the final confidence map.
  • the detection module 1130 uses a facial behavior coding system to detect facial expression information of the portrait.
  • the detection module 1130 further includes:
  • a first obtaining unit configured to obtain the facial motion data of the portrait in the continuous multi-frame preview images.
  • the matching unit is configured to match the facial motion data with preset feature data based on a facial behavior coding system.
  • a second obtaining unit configured to obtain, when the facial motion data is consistent with the preset feature data, the preset facial expression corresponding to the feature data, and use the preset facial expression as the facial expression information of the portrait.
  • the detection module 1130 further includes:
  • the second determining unit is configured to determine a face region in the preview image.
  • the second obtaining unit is configured to obtain depth information corresponding to a face region.
  • a third determining unit is configured to determine the facial expression information according to a face region and corresponding depth information.
  • the adjustment module 1140 adjusts at least one of the following information of the background image according to the facial expression information: the hue of the background image, the brightness of the background image, or the contrast of the background image.
  • the adjustment module 1140 further includes:
  • a fourth determining unit is configured to determine a feature parameter to be adjusted among the feature parameters according to the background category.
  • a fifth determining unit is configured to determine an adjustment mode of the feature parameter to be adjusted according to the facial expression information.
  • an adjustment unit configured to adjust the feature parameter to be adjusted according to the adjustment mode.
  • It should be understood that although the operations in the flowcharts of FIG. 1, FIG. 4, FIG. 6, FIG. 8, FIG. 9, and FIG. 10 are displayed sequentially as indicated by the arrows, these operations are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these operations, and they can be performed in other orders. Moreover, at least some of the operations in FIG. 1, FIG. 4, FIG. 6, FIG. 8, FIG. 9, and FIG. 10 may include multiple sub-operations or stages.
  • These sub-operations or stages are not necessarily completed at the same time; they can be executed at different times, and their execution order is not necessarily sequential, as they can be performed in turn or alternately with other operations or with at least a part of the sub-operations or stages of other operations.
  • each module in the above image processing apparatus is for illustration only. In other embodiments, the image processing apparatus may be divided into different modules as needed to complete all or part of the functions of the above image processing apparatus.
  • An embodiment of the present application further provides a mobile terminal.
  • the mobile terminal includes a memory and a processor.
  • The memory stores a computer program that, when executed by the processor, causes the processor to perform the operations of the image processing method.
  • An embodiment of the present application further provides a computer-readable storage medium.
  • a computer-readable storage medium has stored thereon a computer program that, when executed by a processor, implements the operations of the image processing method.
  • FIG. 12A is a schematic diagram of an internal structure of a mobile terminal according to an embodiment.
  • the mobile terminal includes a processor, a memory, and a network interface connected through a system bus.
  • the processor is used to provide computing and control capabilities to support the operation of the entire mobile terminal.
  • the memory is used to store data, programs, and the like. At least one computer program is stored on the memory, and the computer program can be executed by a processor to implement the wireless network communication method applicable to the mobile terminal provided in the embodiments of the present application.
  • the memory may include a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the computer program can be executed by a processor to implement a neural network model processing method or an image processing method provided by each of the following embodiments.
  • the internal memory provides a cached operating environment for operating system computer programs in a non-volatile storage medium.
  • the network interface may be an Ethernet card or a wireless network card, and is used to communicate with an external mobile terminal.
  • the mobile terminal may be a mobile phone, a tablet computer, or a personal digital assistant or a wearable device.
  • FIG. 12B is a schematic diagram of an internal structure of a server (or a cloud, etc.) in an embodiment.
  • the server includes a processor, a nonvolatile storage medium, an internal memory, and a network interface connected through a system bus.
  • the processor is used to provide computing and control capabilities to support the operation of the entire mobile terminal.
  • the memory is used to store data, programs, and the like. At least one computer program is stored on the memory, and the computer program can be executed by a processor to implement the wireless network communication method applicable to the mobile terminal provided in the embodiments of the present application.
  • the memory may include a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the computer program can be executed by a processor to implement an image processing method provided by each of the following embodiments.
  • the internal memory provides a cached operating environment for operating system computer programs in a non-volatile storage medium.
  • the network interface may be an Ethernet card or a wireless network card, and is used to communicate with an external mobile terminal.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers. Those skilled in the art can understand that the structure shown in FIG. 12B is only a block diagram of a part of the structure related to the solution of the application, and does not constitute a limitation on the server to which the solution of the application is applied.
  • A specific server may include more or fewer components than shown in the figure, or combine some components, or have a different component arrangement.
  • each module in the image processing apparatus provided in the embodiments of the present application may be in the form of a computer program.
  • the computer program can be run on a mobile terminal or server.
  • the program module constituted by the computer program can be stored in a memory of a mobile terminal or a server.
  • the computer program is executed by a processor, the operations of the method described in the embodiments of the present application are implemented.
  • a computer program product containing instructions that, when run on a computer, causes the computer to perform an image processing method.
  • An embodiment of the present application further provides a mobile terminal.
  • the above mobile terminal includes an image processing circuit.
  • the image processing circuit may be implemented by using hardware and / or software components, and may include various processing units that define an ISP (Image Signal Processing) pipeline.
  • FIG. 13 is a schematic diagram of an image processing circuit in one embodiment. As shown in FIG. 13, for ease of description, only aspects of the image processing technology related to the embodiments of the present application are shown.
  • the image processing circuit includes an ISP processor 1340 and a control logic 1350.
  • the image data captured by the imaging device 1310 is first processed by the ISP processor 1340.
  • The ISP processor 1340 analyzes the image data to capture image statistics that can be used to determine one or more control parameters of the imaging device 1310.
  • the imaging device 1310 may include a camera having one or more lenses 1312 and an image sensor 1314.
  • the image sensor 1314 may include a color filter array (such as a Bayer filter).
  • The image sensor 1314 may obtain the light intensity and wavelength information captured by each imaging pixel of the image sensor 1314 and provide a set of raw image data that can be processed by the ISP processor 1340.
  • the sensor 1320 (such as a gyroscope) may provide parameters (such as image stabilization parameters) of the acquired image processing to the ISP processor 1340 based on the interface type of the sensor 1320.
  • the sensor 1320 interface may use a SMIA (Standard Mobile Imaging Architecture) interface, other serial or parallel camera interfaces, or a combination of the foregoing interfaces.
  • the image sensor 1314 may also send the original image data to the sensor 1320, and the sensor 1320 may provide the original image data to the ISP processor 1340 based on the interface type of the sensor 1320, or the sensor 1320 stores the original image data in the image memory 1330.
  • the ISP processor 1340 processes the original image data pixel by pixel in a variety of formats.
  • each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 1340 may perform one or more image processing operations on the original image data and collect statistical information about the image data.
  • the image processing operations may be performed with the same or different bit depth accuracy.
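  • A minimal sketch of scaling higher-bit-depth raw data down to 8 bits before further processing (NumPy; assumes integer raw data and a simple right shift, which is only one of many possible precisions):

```python
import numpy as np

def raw_to_8bit(raw: np.ndarray, bit_depth: int = 10) -> np.ndarray:
    """Scale raw sensor values with an 8/10/12/14-bit depth to 8-bit values."""
    shift = bit_depth - 8
    return (raw >> shift).astype(np.uint8) if shift > 0 else raw.astype(np.uint8)
```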
  • the ISP processor 1340 may also receive image data from the image memory 1330.
  • the sensor 1320 interface sends the original image data to the image memory 1330, and the original image data in the image memory 1330 is then provided to the ISP processor 1340 for processing.
  • the image memory 1330 may be a part of a memory device, a storage device, or a separate dedicated memory in a mobile terminal, and may include a DMA (Direct Memory Access) feature.
  • the ISP processor 1340 may perform one or more image processing operations, such as time-domain filtering.
  • the processed image data may be sent to the image memory 1330 for further processing before being displayed.
  • the ISP processor 1340 receives processed data from the image memory 1330, and performs image data processing on the processed data in the original domain and in the RGB and YCbCr color spaces.
  • the image data processed by the ISP processor 1340 may be output to a display 1370 for viewing by a user and / or further processed by a graphics engine or a GPU (Graphics Processing Unit).
  • the output of the ISP processor 1340 can also be sent to the image memory 1330, and the display 1370 can read image data from the image memory 1330.
  • the image memory 1330 may be configured to implement one or more frame buffers.
  • the output of the ISP processor 1340 may be sent to an encoder / decoder 1360 to encode / decode image data.
  • the encoded image data can be saved and decompressed before being displayed on the display 1370 device.
  • the encoder / decoder 1360 may be implemented by a CPU or a GPU or a coprocessor.
  • the statistical data determined by the ISP processor 1340 may be sent to the control logic 1350 unit.
  • the statistical data may include image sensor 1314 statistical information such as auto exposure, auto white balance, auto focus, flicker detection, black level compensation, and lens 1312 shading correction.
  • The control logic 1350 may include a processor and/or a microcontroller that executes one or more routines (such as firmware); the one or more routines may determine the control parameters of the imaging device 1310 and the control parameters of the ISP processor 1340 according to the received statistical data.
  • control parameters of the imaging device 1310 may include sensor 1320 control parameters (such as gain, integration time for exposure control, image stabilization parameters, etc.), camera flash control parameters, lens 1312 control parameters (such as focus distance for focusing or zooming), or these A combination of parameters.
  • ISP control parameters may include gain levels and color correction matrices for automatic white balance and color adjustment (eg, during RGB processing), and lens 1312 shading correction parameters.
  • Any reference to memory, storage, a database, or other media used in this application may include non-volatile and/or volatile memory. Suitable non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM), which is used as external cache memory.
  • By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

An image processing method and apparatus include: obtaining a preview image to be processed; identifying a scene of the preview image, the scene including a background category and a foreground target; when the foreground target is a portrait, detecting facial expression information of the portrait; and adjusting a feature parameter of a background image in the preview image according to the facial expression information and the background category.

Description

Image processing method and apparatus, mobile terminal, and computer-readable storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201810779736.1, entitled "Image processing method and apparatus, mobile terminal, and computer-readable storage medium", filed with the Chinese Patent Office on July 16, 2018, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of computer applications, and in particular, to an image processing method and apparatus, a mobile terminal, and a computer-readable storage medium.
Background
With the continuous development of mobile terminals, and especially with the emergence of smartphones, almost all mobile terminal devices have a shooting function. However, when people take photos, the background image is usually relatively fixed while the person's facial expressions are very rich; when the person's expression changes, the background image cannot change accordingly, so the person image and the background image are not coordinated and the image is less pleasing to view.
Summary
Embodiments of the present application provide an image processing method and apparatus, a mobile terminal, and a computer-readable storage medium, which can coordinate the person image and the background image.
An image processing method includes:
obtaining a preview image to be processed;
identifying a scene of the preview image, the scene including a background category and a foreground target;
when the foreground target is a portrait, detecting facial expression information of the portrait; and
adjusting a feature parameter of a background image in the preview image according to the facial expression information and the background category.
An image processing apparatus includes:
an acquisition module configured to obtain a preview image to be processed;
a recognition module configured to identify a scene of the preview image, the scene including a background category and a foreground target;
a detection module configured to detect facial expression information of the portrait when the foreground target is a portrait; and
an adjustment module configured to adjust a feature parameter of a background image in the preview image according to the facial expression information and the background category.
A mobile terminal includes a memory and a processor; the memory stores a computer program that, when executed by the processor, causes the processor to perform the operations of the image processing method.
A computer-readable storage medium has a computer program stored thereon; when the computer program is executed by a processor, the operations of the image processing method are implemented.
In the image processing method and apparatus, mobile terminal, and computer-readable storage medium of the embodiments of the present application, a preview image to be processed is obtained; a scene of the preview image is identified, the scene including a background category and a foreground target; when the foreground target is a portrait, facial expression information of the portrait is detected; and feature parameters of the background image in the preview image are adjusted according to the facial expression information and the background category, so that the person image and the background image in the processed image are coordinated.
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
图1为一个实施例中图像处理方法的流程图。如图1所示,一种图像处理方法,包括操作102至操作108。
操作102:获取待处理的预览图像。
本实施例中,待处理的图像可以为连续多帧预览图像,连续多帧预览图像可以是连续两帧及两帧以上的预览图像。连续多帧预览图像可以是指计算机设备的摄像头在预设时间内采集的多帧预览图像。例如,计算机设备的摄像头在0.1秒内采集了3帧预览图像,则可以将这3帧预览图像作为连续多帧预览图像。
操作104:识别预览图像的场景。其中,该场景包括背景类别和前景目标。
本实施例中,移动终端中的处理器基于神经网络识别预览图像的场景。应当理解地,神经网络可以为卷积神经网络(Convolutional Neural Network,CNN),CNN是指在传统的多层神经网络的基础上发展起来的一种针对图像分类和识别的一种神经网络模型,相对与传统的多层神经网络,CNN引入了卷积算法和池化算法。其中,卷积算法是指将局部区域内的数据进行加权叠加的一种数学算法,池化算法是指将局部区域内的数据进行采样处理的一种数学算法。
具体而言,CNN模型由卷积层与池化层交替组成,如图2所示,输入层210输入预览图像,卷积层220对输入层的图像的各个局部区域进行图像特征提取,池化层230对卷积层的图像特征进行采样以降低维数,然后再以若干层全连接层240对图像特征进行连接,以最后一层隐藏层250的输出值为最终提取的特征。根据最终提取的特征对场景信息进行识别,其中场景信息包括了背景类别信息和前景目标类别信息。
在一个实施例中,在卷积神经网络的最后一层隐藏层250后配置softmax分析器,通过softmax分析器对上述最终提取的特征进行分析,可以得到图像中的背景对应的类别的概率和前景目标对应类别的概率。
需要说明的是,在采用神经网络对预览图像的背景类别和前景目标进行识别之前,需要对神经网络进行训练,其训练过程包括:
首先,可将包含有至少一个背景训练目标(包括:风景、海滩、雪景、蓝天、绿地、夜景、黑暗、背光、日出/日落、室内、烟火、聚光灯等)和前景训练目标(包括主体对象:人像、婴儿、猫、狗、美食等)的训练图像输入到神经网络中,神经网络根据背景训练目标和前景训练目标进行特征提取,通过SIFT(Scale-invariant feature transform)特征、方 向梯度直方图(Histogram of Oriented Gradient,HOG)特征等提取特征,再通过SSD(Single Shot MultiBox Detector)、VGG(Visual Geometry Group)等目标检测算法,对背景训练目标进行检测得到第一预测置信度,对前景训练目标进行检测得到第二预测置信度。第一预测置信度为采用该神经网络预测出的该训练图像中背景区域某一像素点属于该背景训练目标的置信度。第二预测置信度为采用该神经网络预测出的该训练图像中前景区域某一像素点属于该前景训练目标的置信度。训练图像中可以预先标注背景训练目标和前景训练目标,得到第一真实置信度和第二真实置信度。该第一真实置信度表示在该训练图像中预先标注的该像素点属于该背景训练目标的置信度。第二真实置信度表示在该训练图像中预先标注的该像素点属于该前景训练目标的置信度。针对图像中的每个像素点,真实置信度可以表示为1(或正值)和0(或负值),分别用以表示该像素点属于训练目标和不属于训练目标。
其次,求取第一预测置信度与第一真实置信度之间的差异得到第一损失函数,求其第二预测置信度与第二真实置信度之间的差异得到第二损失函数。第一损失函数和第二损失函数均可采用对数函数、双曲线函数、绝对值函数等。
最后,将所述第一损失函数和第二损失函数进行加权求和得到目标损失函数,并根据所述目标损失函数调节所述神经网络的参数,对所述神经网络进行训练。
在一实施例中,如图3所示,训练图像的拍摄场景可包括背景区域的类别、前景目标和其他。背景区域的类别可包括风景、海滩、雪景、蓝天、绿地、夜景、黑暗、背光、日出/日落、室内、烟火、聚光灯等。前景目标可为人像、婴儿、猫、狗、美食等。其他可为文本文档、微距等。
操作106:当前景目标为人像时,检测人像的面部表情信息。
具体而言,前景目标包括人像、婴儿、猫、狗、美食等。当检测到预览图像中的前景目标为人像时,采用神经网络提取人像的人脸特征点,根据人脸特征点获取对应的面部表情信息。其中,人脸特征点可以为面部器官的轮廓形状或者面部特定区域的面部动作特征(例如面部特定区域的人脸肌肉运动特征)。
应当理解地,面部表情信息可以是开心情感信息,也可以是悲伤情感信息或者平静情感信息,由于人脸面部表情变化的多样性,在此不做具体限定。在一实施例中,计算机设备的数据库中预先存储有人脸特征点与面部表情信息的对应关系,查询该数据库即可得到面部表情信息。
一实施例中,采用面部行为编码系统检测人像的面部表情信息。具体而言,面部行为编码系统(Facial Action Coding System,FACS)是根据人脸的 解剖学特点,将人脸划分成若干既相互独立又相互联系的运动单元(Action Unit,AU),并分析了这些运动单元的运动特征及其所控制的主要区域以及与之相关的表情信息。FACS将许多现实生活中人类的表情进行了分类,并定义了7个主要的情绪表情,满足在不同的文化环境下所表达意义不变的特性,该7个情绪表情分别是快乐(Happiness)、悲伤(Sadness)、愤怒(Anger)、恐惧(Fear)、惊讶(Surprise)、厌恶(Disgust)、轻蔑(Contempt)。FACS是一种基于解剖学的系统,可以用于描述上述每种情绪对应的面部运动。例如,FACS包括多个运动单元(Action Unit,AU),每个AU描述一组面部肌肉,多个AU一起共同组成一个特定的面部运动,通过检测该面部运动可以获取该面部运动对应的面部表情信息。
操作108:根据面部表情信息和背景类别调节预览图像中背景图像的特征参数。
本实施例中,调节预览图像中背景图像的特征参数的方式可以包括很多种,例如,可以调节背景图像的字幕,也可以调节背景图像的色调,还可以调节背景图像的亮度或者调节背景图像的动画等。根据不同的面部表情可以对背景图像进行相应的调节,使得背景图像与人物图像中的面部表情协调。
上述图像处理方法,通过获取待处理的预览图像;识别所述预览图像的场景,所述场景包括背景类别和前景目标;当所述前景目标为人像时,检测所述人像的面部表情信息;根据所述面部表情信息和所述背景类别调节所述预览图像中背景图像的特征参数,使得处理后的 图像中人物图像和背景图像协调,从而使图像更具有观赏性。
在一实施例中,场景包括背景类别和前景目标,如图4所示,识别预览图像的场景的方法,包括操作402至操作410。:
操作402:采用神经网络的基础网络对预览图像进行特征提取,得到特征数据。
操作404:将特征数据输入到神经网络的分类网络对预览图像的背景进行分类检测,输出第一置信度图。其中,第一置信度图中的每个像素点表示预览图像中每个像素点属于背景检测目标的置信度。
操作406:将特征数据输入到神经网络的目标检测网络对预览图像的前景目标进行检测,输出第二置信度图。其中,第二置信度图中的每个像素点表示预览图像中每个像素点属于前景检测目标的置信度。
操作408:根据第一置信度图和第二置信度图进行加权得到预览图像的最终置信度图。
操作410:根据最终置信度图确定预览图像的背景类别和前景目标类别。
本实施例中,如图5所示,神经网络包括基础网络510、分类网络520和目标检测网络530。其中,利用基础网络510提取预览图像的特征数据;再将特征数据分别输入至分类网络520和目标检测网络530,通过分类网络520对预览图像的背景进行分类检测,得到待第一置信度图,以及通过目标检测网络530对预览图像的前景进行目标检测,得第二置信度图;根据第一置信度图和第二置信度图进行加权得到预览图像的最终置信度图;根据最终置信度图确定预览图像的背景类别和前景目标类别。
需要说明的是,在统计学中,一个概率样本的置信区间是对这个样本的某个总体参数的区间估计。置信区间展现的是这个参数的真实值有一定概率落在测量结果的周围的程度。置信度是被测量参数的测量值的可信程度。
在一实施例中,如图6所示,识别预览图像的场景的方法,还包括操作602至操作606。
操作602:采用神经网络的目标检测网络对预览图像的前景目标位置进行检测,输出边界框检测图。其中,边界框检测图包含预览图像中各像素点的对应向量,所述对应向量表示对应的像素点与对应检测边界框的位置关系,所述检测边界框为采用神经网络在预览图像中检测到的前景目标的边界框。
操作604:根据第一置信度图、第二置信度图和边界框检测图进行加权得到预览图像的最终置信度图。
操作606:根据最终置信度图确定预览图像的背景类别、前景目标类别和前景目标位置。
具体而言,参见图7,该边界框检测图710包含该边界框中每个像素点的对应向量,该对应向量表示其对应的像素点与对应的边界框位置关系。其中,边界框检测图710中的对应像素点的向量确定第一四维向量和第二四维向量。该第一四维向量为x=(x 1,x 2,x 3,x 4),该第一四维向量中的元素为该像素点至前景目标的边界框图710的上、下、左、右边界的距离;该第二四维向量为x’=(x 1’,x 2’,x 3’,x 4’),该第二四维向量中的元素分别为该像素点至与该像素点对应的预览图像的边界框检测图700的上、下、左、右边界的距离。可以理解地,通过检测边界框检测图710中所有像素点对应的第二四维向量,即可确定前景目标位置。在一实施例中,神经网络的目标检测网络对预览图像的前景目标进行检测,输出第二置信度图和边界框检测图710,根据第一置信度图、第二置信度图和边界框检测图710进行加权可得到预览图像的最终置信度图;根据最终置信度图可确定预览图像的背景类别、前景目标类别和前景目标位置。进一步地,前景目标的边界框检测图710的面积为X=(x 1+x 2)*(x 3+x 4)。需要说明的是,本实施例中的边界框检测图710为矩形框图,在其它实施例中,边界框检测图以为任意形状的框图,在此不做具体限定。
在一实施例中,待处理的预览图像为连续多帧的预览图像,如图8所示,当前景目标为人像时,检测该人像的面部表情信息的方法,包括操作802至操作806。
操作802:获取连续多帧预览图像中人像的面部运动数据。
操作804:基于面部行为编码系统将面部运动数据与预设特征数据进行匹配。
操作806:当面部运动数据与预设特征数据一致时,获取特征数据对应的预设面部表情,并将该预设面部表情作为人像的面部表情信息。
本实施例中,基于图5所示的神经网络可以检测前景目标的类别,当前景目标为人像时,利用神经网络的目标检测网络530检测该人像的面部运动情况,并获取面部运动数据。应当理解地,该面部运动数据可分解为2个状态数据和1个过程数据:开始状态、结束状态和偏移过程,该2个状态数据和1个过程数据分别对应不同帧的预览图像:开始帧、结束帧和偏移帧(偏移帧指的是开始帧与结束帧之间的所有帧)。具体而言,神经网络对开始帧、偏移帧和结束帧的预览图像按照帧时序进行检测,获取开始帧、偏移帧和结束帧所组成的预览图像中的面部运动数据。
进一步地,将不同帧(开始帧、偏移帧和结束帧)所组成的面部运动数据与FACS定义的预设特征数据进行匹配。其中,FACS定义了多个运动单元的预设特征数据,并且通过不同运动单元之间的组合可以描述不同的面部表情信息。例如,运动单元AU1指:拉动中部的眉毛向上;对应的面部表情描述为:悲伤。运动单元AU4指:将眉毛压低并使眉毛聚拢;对应的面部表情描述为:碰到生理上或心理上阻隔。将AU1和AU4组合在一起,以1秒至2秒的速度快速出现,这个时候,这个组合对应的面部表情描述为:失望。在其他实施例中,根据面部器官的不同区域及不同的运动方式还可以定义其他运动单元,在此不做具体限定。
应当理解地,每个运动单元包括了多个帧的预设特征数据,该预设特征数据对应于特定区域的面部运动数据。需要说明的是,特定区域除了眉毛区域,还包括面部其他器官的区域,在此不做具体限定。当面部运动数据(包括不同帧的面部运动数据)与运动单元的预设特征数据(包括不同帧的面部运动数据)一致时,获取预设特征数据对应的预设面部表情(例如AU1+AU4对应的失望),并将该预设面部表情作为人像的面部表情信息。
在一实施例中,如图9所示,检测人像的面部表情信息,还包括操作902至操作906。
操作902:确定预览图像中的人脸区域。
操作904:获取与人脸区域对应的深度信息。
操作906:根据人脸区域和对应的深度信息确定面部表情。
本实施例中,预览图像为深度图像(Depth map),深度图像为一种普遍的三维场景信息表达方式。深度图像中的每个像素点的灰度值可用于表征场景中某一点距离摄像机的远近。此外,深度图像可由摄像机中设置的被动测距传感或主动深度传感获取,在此不做具体限定。例如,通过摄像机对目标场景发射连续的近红外脉冲,然后用传感器接收由前景目标反射回的光脉冲,通过比较发射光脉冲与经过前景目标反射的光脉冲的相位差,可以推算得到光脉冲之间的传输延迟进而得到前景目标相对于发射器的距离,最终得到一幅深度图像。当前景目标为人像时,基于神经网络中的目标检测网络530检测人脸区域。具体而言,参见图7,人脸区域可通过目标检测网络530输出的以人脸为检测目标的边界框检测图获取。
一实施例中,深度信息的获取与深度图像的获取方式有关。例如,当摄像机基于结构光(结构光是具有特定模式的光,其具有例如点、线、面等模式图案)获取深度图像时,通过结构光的模式图案可以得到的深度图像中人脸区域的位置以及形变程度,并利用三角原理计算即可得到人脸区域中各点的深度信息。其中,这里的深度信息指人脸区域中各点的三维信息。
一实施例中,在根据人脸区域和对应的深度信息确定面部表情的过程中,首先定位一些特征点,例如在五官以及五官之间例如脸颊等部分定位多个特征点,这些特征点基本能够表征五官以及面部的变化。再通过图像卷积抽取特征点的Gabor小波系数,以Gabor特征的匹配距离作为相似度的度量标准。提取特征之后,表情识别可通过多层神经网络实现。此外,表情识别的还可通过基于卷积神经网络的算法实现。
在一实施例中,根据面部表情信息和背景类别调节背景图像的特征参数,包括:根据面部表情信息和背景类别调节背景图像的以下至少之一的特征参数,例如,色调、亮度、色彩、对比度、曝光度、光效等。可以理解地,具体背景类别包括室内,风景,强光,夜晚等场景, 当计算机设备检测出前景目标为人像,且识别出人像的面部表情,以及检测出前景目标所在场景(即背景类别)时,根据人像的面部表情和场景种类(背景类别),可对背景图像的色调、亮度、色彩、对比度、光效等参数进行处理,使得背景图像与人像的面部表情协调。例如,识别的面部表情为悲伤,背景图像为风景时,可将风景色调调节为冷色调(比如暗灰色)并且降低对比度,以烘托悲伤的氛围。
一实施例中,如图10所示,根据面部表情信息和背景类别调节背景图像的特征参数,包括操作1002至操作1006。
操作1002:根据背景类别确定特征参数中的待调节特征参数。
操作1004:根据面部表情信息确定待调节特征参数的调节模式。
操作1006:根据调节模式调节待调节特征参数。
本实施例中,不同的背景类别具有其优选的待调节特征参数。例如,当背景类别为风景时,优选的待调节特征参数设置为色调;当背景类别为强光或夜晚时,优选的待调节特征参数设置为光效;当背景类别为室内时,优选的待调节特征参数设置为色彩。可以理解地,根据背景类别的特点可以为不同的背景设置至少一个优选的待调节特征参数。一实施例中,还可以根据用户的需求设置不同背景类别的优选的待调节特征参数,在此不做具体限定。
当背景类别和其优选的待调节特征参数确定后,根据面部表情信息确定调节模式,并根据调节模式调节待调节特征参数。例如,背景类别为强光,优选的待调节特征参数为光效,此时若检测到面部表情信息为悲伤,则确定调节模式为:添加冷色调(比如蓝色调)的光线效果。根据该调节模式改变待调节特征参数(光效),以烘托悲伤的氛围。在其他实施例中,还可以根据用户实际需求设置调节模式,在此不做具体限定。
图11为一个实施例中图像处理装置的结构框图。如图11所示,一种图像处理装置,包括获取模块1110、识别模块1120、确定模块1130和构图模块1140。其中:
获取模块1110:用于获取待处理的预览图像。
识别模块1120:用于识别所述预览图像的场景;所述场景包括背景类别和前景目标。
检测模块1130:用于当所述前景目标为人像时,检测所述人像的面部表情信息。
调节模块1140:用于根据所述面部表情信息和所述背景类别调节所述预览图像中背景图像的特征参数。
本申请实施例中,通过获取模块1110获取待处理的预览图像;识别模块1120识别所述预览图像的场景;所述场景包括背景图像和前景目标;检测模块1130当所述前景目标为人像时,检测所述人像的面部表情信息;调节模块1140根据所述面部表情信息调节所述背景图像,使得处理后的图像中人物图像和背景图像协调,从而提高图像的观赏性。
在一个实施例中,识别模块1120,还包括:
特征提取单元,用于采用神经网络的基础网络对所述预览图像进行特征提取,得到特征数据。
分类单元:用于采用神经网络的分类网络对所述预览图像的背景进行分类检测,输出第一置信度图;所述第一置信度图中的每个像素点表示所述预览图像中每个像素点属于背景检测目标的置信度。
目标检测单元,用于采用神经网络的目标检测网络对所述预览图像的前景目标进行检测,输出第二置信度图;所述第二置信度图中的每个像素点表示所述预览图像中每个像素点属于前景检测目标的置信度。
计算单元:用于根据所述第一置信度图和所述第二置信度图进行加权得到所述预览图像的最终置信度图。
第一确定单元,根据所述最终置信度图确定所述预览图像的背景类别和前景目标类别。
在一个实施例中,目标检测单元,还包括:
目标位置检测子单元:用于采用神经网络的目标检测网络对所述预览图像的前景目标位 置进行检测,输出边界框检测图,所述边界框检测图包含所述预览图像中各像素点的对应向量,所述对应向量表示对应的像素点与对应检测边界框的位置关系,所述检测边界框为采用神经网络在所述待检测图像中检测到的前景目标的边界框。
在一个实施例中,计算单元还用于根据所述第一置信度图、第二置信度图和边界框检测图进行加权得到所述预览图像的最终置信度图。
在一个实施例中,第一确定单元还用于根据所述最终置信度图确定所述预览图像的背景类别、前景目标类别和前景目标位置。
在一个实施例中,检测模块1130采用面部行为编码系统检测所述人像的面部表情信息。
在一个实施例中,检测模块1130,还包括:
第一获取单元,用于所述连续多帧预览图像中人像的面部运动数据。
匹配单元,用于基于面部行为编码系统将所述面部运动数据与预设特征数据进行匹配。
第二获取单元,用于当所述面部运动数据与所述预设特征数据一致时,获取所述特征数据对应的预设面部表情,并将所述预设面部表情作为所述人像的面部表情信息。
在一个实施例中,检测模块1130,还包括:
第二确定单元,用于确定所述预览图像中的人脸区域。
第二获取单元,用于获取与人脸区域对应的深度信息。
第三确定单元,用于根据人脸区域和对应的深度信息确定所述面部表情信息。
In an embodiment, the adjustment module 1140 adjusts at least one of the following items of the background image according to the facial expression information: the hue of the background image, the brightness of the background image, or the contrast of the background image.
In an embodiment, the adjustment module 1140 further includes:
a fourth determining unit, configured to determine, according to the background category, the feature parameter to be adjusted among the feature parameters;
a fifth determining unit, configured to determine the adjustment mode of the feature parameter to be adjusted according to the facial expression information; and
an adjustment unit, configured to adjust the feature parameter to be adjusted according to the adjustment mode.
It should be understood that although the operations in the flowcharts of FIG. 1, FIG. 4, FIG. 6, FIG. 8, FIG. 9, and FIG. 10 are shown in sequence as indicated by the arrows, these operations are not necessarily performed in the order indicated. Unless explicitly stated herein, there is no strict order restriction on the execution of these operations, and they may be performed in other orders. Moreover, at least some of the operations in FIG. 1, FIG. 4, FIG. 6, FIG. 8, FIG. 9, and FIG. 10 may include multiple sub-operations or stages, which are not necessarily completed at the same moment but may be performed at different moments; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other operations or with at least some of the sub-operations or stages of other operations.
The division of the modules in the above image processing apparatus is only for illustration. In other embodiments, the image processing apparatus may be divided into different modules as required to complete all or part of the functions of the above image processing apparatus.
An embodiment of the present application further provides a mobile terminal. The mobile terminal includes a memory and a processor; the memory stores a computer program which, when executed by the processor, causes the processor to perform the operations of the image processing method.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the operations of the image processing method are implemented.
FIG. 12A is a schematic diagram of the internal structure of a mobile terminal in an embodiment. As shown in FIG. 12A, the mobile terminal includes a processor, a memory, and a network interface connected via a system bus. The processor provides computing and control capabilities and supports the operation of the entire mobile terminal. The memory is used to store data, programs, and the like, and stores at least one computer program that can be executed by the processor to implement the image processing method provided in the embodiments of the present application and applicable to the mobile terminal. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the neural network model processing method or the image processing method provided in the following embodiments. The internal memory provides a cached running environment for the operating system and the computer program in the non-volatile storage medium. The network interface may be an Ethernet card, a wireless network card, or the like, and is used to communicate with external mobile terminals. The mobile terminal may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
FIG. 12B is a schematic diagram of the internal structure of a server (or a cloud, etc.) in an embodiment. As shown in FIG. 12B, the server includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected via a system bus. The processor provides computing and control capabilities and supports the operation of the entire server. The memory is used to store data, programs, and the like, and stores at least one computer program that can be executed by the processor to implement the image processing method provided in the embodiments of the present application. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the image processing method provided in the following embodiments. The internal memory provides a cached running environment for the operating system and the computer program in the non-volatile storage medium. The network interface may be an Ethernet card, a wireless network card, or the like, and is used to communicate with external mobile terminals. The server may be implemented as an independent server or as a server cluster composed of multiple servers. Those skilled in the art will understand that the structure shown in FIG. 12B is only a block diagram of the part of the structure related to the solution of the present application and does not limit the server to which the solution is applied; a specific server may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The modules of the image processing apparatus provided in the embodiments of the present application may be implemented in the form of a computer program. The computer program may run on a mobile terminal or a server. The program modules constituted by the computer program may be stored in the memory of the mobile terminal or the server. When the computer program is executed by a processor, the operations of the methods described in the embodiments of the present application are implemented.
A computer program product containing instructions, when run on a computer, causes the computer to perform the image processing method.
An embodiment of the present application further provides a mobile terminal. The mobile terminal includes an image processing circuit, which may be implemented using hardware and/or software components and may include various processing units defining an ISP (Image Signal Processing) pipeline. FIG. 13 is a schematic diagram of an image processing circuit in an embodiment. As shown in FIG. 13, for ease of description, only the aspects of the image processing technology related to the embodiments of the present application are shown.
As shown in FIG. 13, the image processing circuit includes an ISP processor 1340 and a control logic unit 1350. Image data captured by an imaging device 1310 is first processed by the ISP processor 1340, which analyzes the image data to capture image statistics that can be used to determine one or more control parameters of the imaging device 1310. The imaging device 1310 may include a camera having one or more lenses 1312 and an image sensor 1314. The image sensor 1314 may include a color filter array (such as a Bayer filter); the image sensor 1314 can obtain the light intensity and wavelength information captured by each of its imaging pixels and provide a set of raw image data that can be processed by the ISP processor 1340. A sensor 1320 (such as a gyroscope) may provide image processing parameters (such as anti-shake parameters) to the ISP processor 1340 based on the interface type of the sensor 1320. The sensor 1320 interface may be an SMIA (Standard Mobile Imaging Architecture) interface, another serial or parallel camera interface, or a combination of the above interfaces.
In addition, the image sensor 1314 may also send the raw image data to the sensor 1320; the sensor 1320 may provide the raw image data to the ISP processor 1340 based on the interface type of the sensor 1320, or store the raw image data in an image memory 1330.
The ISP processor 1340 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the ISP processor 1340 may perform one or more image processing operations on the raw image data and collect statistics about the image data. The image processing operations may be performed at the same or different bit-depth precision.
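As a small illustration (not part of the disclosed circuit), one simple way to handle raw data of differing bit depths at a common precision is to scale each sample into a normalized range first; the target range below is an assumption.

import numpy as np

# Illustrative sketch: raw image data may arrive at 8/10/12/14-bit depth; map
# integer samples of a given bit depth into [0.0, 1.0] so later operations can
# run at a uniform precision.

def normalize_raw(raw, bit_depth):
    """Map raw integer samples of the given bit depth into [0.0, 1.0]."""
    max_value = (1 << bit_depth) - 1
    return np.asarray(raw, dtype=np.float32) / max_value

raw10 = np.array([[0, 512], [768, 1023]], dtype=np.uint16)   # 10-bit samples
print(normalize_raw(raw10, 10))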
The ISP processor 1340 may also receive image data from the image memory 1330. For example, the sensor 1320 interface sends the raw image data to the image memory 1330, and the raw image data in the image memory 1330 is then provided to the ISP processor 1340 for processing. The image memory 1330 may be part of a memory device, a storage device, or an independent dedicated memory within the mobile terminal, and may include a DMA (Direct Memory Access) feature.
Upon receiving raw image data from the image sensor 1314 interface, the sensor 1320 interface, or the image memory 1330, the ISP processor 1340 may perform one or more image processing operations, such as temporal filtering. The processed image data may be sent to the image memory 1330 for additional processing before being displayed. The ISP processor 1340 receives the processed data from the image memory 1330 and performs image data processing on it in the raw domain and in the RGB and YCbCr color spaces. The image data processed by the ISP processor 1340 may be output to a display 1370 for viewing by the user and/or further processed by a graphics engine or GPU (Graphics Processing Unit). In addition, the output of the ISP processor 1340 may also be sent to the image memory 1330, and the display 1370 may read image data from the image memory 1330. In an embodiment, the image memory 1330 may be configured to implement one or more frame buffers. Furthermore, the output of the ISP processor 1340 may be sent to an encoder/decoder 1360 to encode/decode the image data. The encoded image data may be saved and decompressed before being displayed on the display 1370. The encoder/decoder 1360 may be implemented by a CPU, a GPU, or a coprocessor.
The statistics determined by the ISP processor 1340 may be sent to the control logic unit 1350. For example, the statistics may include image sensor 1314 statistics such as auto exposure, auto white balance, auto focus, flicker detection, black level compensation, and lens 1312 shading correction. The control logic unit 1350 may include a processor and/or microcontroller that executes one or more routines (such as firmware), and the one or more routines may determine control parameters of the imaging device 1310 and control parameters of the ISP processor 1340 according to the received statistics. For example, the control parameters of the imaging device 1310 may include sensor 1320 control parameters (such as gain, integration time for exposure control, and anti-shake parameters), camera flash control parameters, lens 1312 control parameters (such as focal length for focusing or zooming), or a combination of these parameters. The ISP control parameters may include gain levels and color correction matrices for auto white balance and color adjustment (for example, during RGB processing), as well as lens 1312 shading correction parameters.
Any reference to memory, storage, a database, or other media used in the present application may include non-volatile and/or volatile memory. Suitable non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which serves as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments represent only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (18)

  1. An image processing method, comprising:
    acquiring a preview image to be processed;
    recognizing a scene of the preview image, the scene comprising a background category and a foreground target;
    when the foreground target is a portrait, detecting facial expression information of the portrait; and
    adjusting a feature parameter of a background image in the preview image according to the facial expression information and the background category.
  2. The method according to claim 1, wherein the preview image to be processed comprises consecutive multi-frame preview images, and the detecting facial expression information of the portrait comprises:
    acquiring facial motion data of the portrait in the consecutive multi-frame preview images;
    matching the facial motion data with preset feature data based on a facial action coding system; and
    when the facial motion data is consistent with the preset feature data, obtaining a preset facial expression corresponding to the preset feature data, and taking the preset facial expression as the facial expression information of the portrait.
  3. The method according to claim 1, wherein the detecting facial expression information of the portrait further comprises:
    determining a face region in the preview image;
    acquiring depth information corresponding to the face region; and
    determining the facial expression information according to the face region and the corresponding depth information.
  4. The method according to claim 1, wherein the adjusting a feature parameter of a background image in the preview image according to the facial expression information and the background category comprises:
    adjusting at least one of the following feature parameters of the background image in the preview image according to the facial expression information and the background category: hue, brightness, color, contrast, exposure, or lighting effect.
  5. The method according to claim 1, wherein the adjusting a feature parameter of a background image in the preview image according to the facial expression information and the background category further comprises:
    determining, according to the background category, a feature parameter to be adjusted among the feature parameters;
    determining an adjustment mode of the feature parameter to be adjusted according to the facial expression information; and
    adjusting the feature parameter to be adjusted according to the adjustment mode.
  6. The method according to claim 1, wherein the recognizing a scene of the preview image comprises:
    performing feature extraction on the preview image using a base network of a neural network to obtain feature data;
    inputting the feature data into a classification network of the neural network to perform classification detection on the background of the preview image, and outputting a first confidence map, each pixel in the first confidence map representing a confidence that the corresponding pixel in the preview image belongs to a background detection target;
    inputting the feature data into a target detection network of the neural network to detect the foreground target of the preview image, and outputting a second confidence map, each pixel in the second confidence map representing a confidence that the corresponding pixel in the preview image belongs to a foreground detection target;
    weighting the first confidence map and the second confidence map to obtain a final confidence map of the preview image; and
    determining the background category and a foreground target category of the preview image according to the final confidence map.
  7. The method according to claim 6, wherein the recognizing a scene of the preview image further comprises:
    detecting a position of the foreground target of the preview image using the target detection network of the neural network, and outputting a bounding-box detection map, the bounding-box detection map containing a corresponding vector for each pixel in the preview image, the corresponding vector representing a positional relationship between that pixel and a corresponding detection bounding box, the detection bounding box being a bounding box of the foreground target detected by the neural network in the image to be detected;
    weighting the first confidence map, the second confidence map, and the bounding-box detection map to obtain the final confidence map of the preview image; and
    determining the background category, the foreground target category, and the foreground target position of the preview image according to the final confidence map.
  8. The method according to claim 6, wherein the neural network is trained by the following operations:
    inputting a training image containing at least one background training target and at least one foreground training target into the neural network, the neural network performing feature extraction based on the background training target and the foreground training target;
    detecting the background training target by a target detection algorithm to obtain a first prediction confidence, and detecting the foreground training target to obtain a second prediction confidence;
    obtaining a first true confidence and a second true confidence from the background training target and the foreground training target pre-labeled in the training image;
    obtaining a first loss function from the difference between the first prediction confidence and the first true confidence, and obtaining a second loss function from the difference between the second prediction confidence and the second true confidence; and
    obtaining a target loss function by weighted summation of the first loss function and the second loss function, and adjusting parameters of the neural network according to the target loss function to train the neural network.
  9. An image processing apparatus, comprising:
    an acquisition module, configured to acquire a preview image to be processed;
    a recognition module, configured to recognize a scene of the preview image, the scene comprising a background category and a foreground target;
    a detection module, configured to detect facial expression information of the portrait when the foreground target is a portrait; and
    an adjustment module, configured to adjust a feature parameter of a background image in the preview image according to the facial expression information and the background category.
  10. A mobile terminal, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following operations:
    acquiring a preview image to be processed;
    recognizing a scene of the preview image, the scene comprising a background category and a foreground target;
    when the foreground target is a portrait, detecting facial expression information of the portrait; and
    adjusting a feature parameter of a background image in the preview image according to the facial expression information and the background category.
  11. The mobile terminal according to claim 10, wherein the preview image to be processed comprises consecutive multi-frame preview images, and the detecting facial expression information of the portrait comprises:
    acquiring facial motion data of the portrait in the consecutive multi-frame preview images;
    matching the facial motion data with preset feature data based on a facial action coding system; and
    when the facial motion data is consistent with the preset feature data, obtaining a preset facial expression corresponding to the preset feature data, and taking the preset facial expression as the facial expression information of the portrait.
  12. The mobile terminal according to claim 10, wherein the detecting facial expression information of the portrait further comprises:
    determining a face region in the preview image;
    acquiring depth information corresponding to the face region; and
    determining the facial expression information according to the face region and the corresponding depth information.
  13. The mobile terminal according to claim 10, wherein the adjusting a feature parameter of a background image in the preview image according to the facial expression information and the background category comprises:
    adjusting at least one of the following feature parameters of the background image in the preview image according to the facial expression information and the background category: hue, brightness, color, contrast, exposure, or lighting effect.
  14. The mobile terminal according to claim 10, wherein the adjusting a feature parameter of a background image in the preview image according to the facial expression information and the background category further comprises:
    determining, according to the background category, a feature parameter to be adjusted among the feature parameters;
    determining an adjustment mode of the feature parameter to be adjusted according to the facial expression information; and
    adjusting the feature parameter to be adjusted according to the adjustment mode.
  15. The mobile terminal according to claim 10, wherein the recognizing a scene of the preview image comprises:
    performing feature extraction on the preview image using a base network of a neural network to obtain feature data;
    inputting the feature data into a classification network of the neural network to perform classification detection on the background of the preview image, and outputting a first confidence map, each pixel in the first confidence map representing a confidence that the corresponding pixel in the preview image belongs to a background detection target;
    inputting the feature data into a target detection network of the neural network to detect the foreground target of the preview image, and outputting a second confidence map, each pixel in the second confidence map representing a confidence that the corresponding pixel in the preview image belongs to a foreground detection target;
    weighting the first confidence map and the second confidence map to obtain a final confidence map of the preview image; and
    determining the background category and a foreground target category of the preview image according to the final confidence map.
  16. The mobile terminal according to claim 15, wherein the recognizing a scene of the preview image further comprises:
    detecting a position of the foreground target of the preview image using the target detection network of the neural network, and outputting a bounding-box detection map, the bounding-box detection map containing a corresponding vector for each pixel in the preview image, the corresponding vector representing a positional relationship between that pixel and a corresponding detection bounding box, the detection bounding box being a bounding box of the foreground target detected by the neural network in the image to be detected;
    weighting the first confidence map, the second confidence map, and the bounding-box detection map to obtain the final confidence map of the preview image; and
    determining the background category, the foreground target category, and the foreground target position of the preview image according to the final confidence map.
  17. The mobile terminal according to claim 15, wherein the processor is further configured to train the neural network by the following operations:
    inputting a training image containing at least one background training target and at least one foreground training target into the neural network, the neural network performing feature extraction based on the background training target and the foreground training target;
    detecting the background training target by a target detection algorithm to obtain a first prediction confidence, and detecting the foreground training target to obtain a second prediction confidence;
    obtaining a first true confidence and a second true confidence from the background training target and the foreground training target pre-labeled in the training image;
    obtaining a first loss function from the difference between the first prediction confidence and the first true confidence, and obtaining a second loss function from the difference between the second prediction confidence and the second true confidence; and
    obtaining a target loss function by weighted summation of the first loss function and the second loss function, and adjusting parameters of the neural network according to the target loss function to train the neural network.
  18. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the operations of the image processing method according to any one of claims 1 to 7 are implemented.
PCT/CN2019/089941 2018-07-16 2019-06-04 图像处理方法、装置、移动终端及计算机可读存储介质 WO2020015470A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810779736.1A CN108900769B (zh) 2018-07-16 2018-07-16 图像处理方法、装置、移动终端及计算机可读存储介质
CN201810779736.1 2018-07-16

Publications (1)

Publication Number Publication Date
WO2020015470A1 true WO2020015470A1 (zh) 2020-01-23

Family

ID=64349247

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089941 WO2020015470A1 (zh) 2018-07-16 2019-06-04 图像处理方法、装置、移动终端及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN108900769B (zh)
WO (1) WO2020015470A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900769B (zh) * 2018-07-16 2020-01-10 Oppo广东移动通信有限公司 图像处理方法、装置、移动终端及计算机可读存储介质
CN109685741B (zh) * 2018-12-28 2020-12-11 北京旷视科技有限公司 一种图像处理方法、装置及计算机存储介质
CN110046576A (zh) * 2019-04-17 2019-07-23 内蒙古工业大学 一种训练识别面部表情的方法和装置
CN110473185B (zh) * 2019-08-07 2022-03-15 Oppo广东移动通信有限公司 图像处理方法和装置、电子设备、计算机可读存储介质
CN110991465B (zh) * 2019-11-15 2023-05-23 泰康保险集团股份有限公司 一种物体识别方法、装置、计算设备及存储介质
CN112822542A (zh) * 2020-08-27 2021-05-18 腾讯科技(深圳)有限公司 视频合成方法、装置、计算机设备和存储介质
CN112351195B (zh) * 2020-09-22 2022-09-30 北京迈格威科技有限公司 图像处理方法、装置和电子系统
CN112203122B (zh) * 2020-10-10 2024-01-26 腾讯科技(深圳)有限公司 基于人工智能的相似视频处理方法、装置及电子设备
CN113177438A (zh) * 2021-04-02 2021-07-27 深圳小湃科技有限公司 图像处理方法、设备及存储介质
CN113408380B (zh) * 2021-06-07 2023-07-07 深圳小湃科技有限公司 视频图像调整方法、设备及存储介质
CN113762107B (zh) * 2021-08-23 2024-05-07 海宁奕斯伟集成电路设计有限公司 对象状态评估方法、装置、电子设备及可读存储介质
CN116546310B (zh) * 2023-07-05 2023-09-15 北京电子科技学院 基于人工智能的摄影辅助方法、装置、设备和介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120249841A1 (en) * 2011-03-31 2012-10-04 Tessera Technologies Ireland Limited Scene enhancements in off-center peripheral regions for nonlinear lens geometries
CN103679189A (zh) * 2012-09-14 2014-03-26 华为技术有限公司 场景识别的方法和装置
CN105931178A (zh) * 2016-04-15 2016-09-07 乐视控股(北京)有限公司 一种图像处理方法及装置
CN107563390A (zh) * 2017-08-29 2018-01-09 苏州智萃电子科技有限公司 一种图像识别方法及系统
CN108900769A (zh) * 2018-07-16 2018-11-27 Oppo广东移动通信有限公司 图像处理方法、装置、移动终端及计算机可读存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5148989B2 (ja) * 2007-12-27 2013-02-20 イーストマン コダック カンパニー 撮像装置
CN102289664B (zh) * 2011-07-29 2013-05-08 北京航空航天大学 基于统计形状理论的非线性面部运动流形学习方法
JP2013223146A (ja) * 2012-04-17 2013-10-28 Sharp Corp 画像処理装置、画像形成装置及び画像処理方法
CN106303250A (zh) * 2016-08-26 2017-01-04 维沃移动通信有限公司 一种图像处理方法及移动终端
CN106506975A (zh) * 2016-12-29 2017-03-15 深圳市金立通信设备有限公司 一种拍摄方法及终端
CN107680034A (zh) * 2017-09-11 2018-02-09 广东欧珀移动通信有限公司 图像处理方法和装置、电子装置和计算机可读存储介质
CN107818313B (zh) * 2017-11-20 2019-05-14 腾讯科技(深圳)有限公司 活体识别方法、装置和存储介质

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733117A (zh) * 2020-02-03 2021-04-30 支付宝实验室(新加坡)有限公司 认证系统和方法
CN113256503A (zh) * 2020-02-13 2021-08-13 北京小米移动软件有限公司 图像优化方法及装置、移动终端及存储介质
CN113256503B (zh) * 2020-02-13 2024-03-08 北京小米移动软件有限公司 图像优化方法及装置、移动终端及存储介质
CN111489322A (zh) * 2020-04-09 2020-08-04 广州光锥元信息科技有限公司 给静态图片加天空滤镜的方法及装置
CN111639653B (zh) * 2020-05-08 2023-10-10 浙江大华技术股份有限公司 一种误检图像确定方法、装置、设备和介质
CN111639653A (zh) * 2020-05-08 2020-09-08 浙江大华技术股份有限公司 一种误检图像确定方法、装置、设备和介质
CN111652930A (zh) * 2020-06-04 2020-09-11 上海媒智科技有限公司 一种图像目标检测方法、系统及设备
CN111652930B (zh) * 2020-06-04 2024-02-27 上海媒智科技有限公司 一种图像目标检测方法、系统及设备
CN111754622A (zh) * 2020-07-13 2020-10-09 腾讯科技(深圳)有限公司 脸部三维图像生成方法及相关设备
CN111754622B (zh) * 2020-07-13 2023-10-13 腾讯科技(深圳)有限公司 脸部三维图像生成方法及相关设备
CN114079725A (zh) * 2020-08-13 2022-02-22 华为技术有限公司 视频防抖方法、终端设备和计算机可读存储介质
CN114079725B (zh) * 2020-08-13 2023-02-07 华为技术有限公司 视频防抖方法、终端设备和计算机可读存储介质
CN112163988A (zh) * 2020-08-17 2021-01-01 中国人民解放军93114部队 红外图像的生成方法、装置、计算机设备和可读存储介质
CN112163988B (zh) * 2020-08-17 2022-12-13 中国人民解放军93114部队 红外图像的生成方法、装置、计算机设备和可读存储介质
CN112084960B (zh) * 2020-09-11 2024-05-14 中国传媒大学 一种基于稀疏图的人脸表情识别方法
CN112084960A (zh) * 2020-09-11 2020-12-15 中国传媒大学 一种基于稀疏图的人脸表情识别方法
CN112163492B (zh) * 2020-09-21 2023-09-08 华南理工大学 一种长时跨场景优化的交通物体检测方法、系统及介质
CN112163492A (zh) * 2020-09-21 2021-01-01 华南理工大学 一种长时跨场景优化的交通物体检测方法、系统及介质
CN113012189A (zh) * 2021-03-31 2021-06-22 影石创新科技股份有限公司 图像识别方法、装置、计算机设备和存储介质
CN113329173A (zh) * 2021-05-19 2021-08-31 Tcl通讯(宁波)有限公司 一种影像优化方法、装置、存储介质及终端设备
CN113553937A (zh) * 2021-07-19 2021-10-26 北京百度网讯科技有限公司 目标检测方法、装置、电子设备以及存储介质
CN114125286A (zh) * 2021-11-18 2022-03-01 维沃移动通信有限公司 拍摄方法及其装置
CN113989857A (zh) * 2021-12-27 2022-01-28 四川新网银行股份有限公司 一种基于深度学习的人像照片内容解析方法及系统
CN114399710A (zh) * 2022-01-06 2022-04-26 昇辉控股有限公司 一种基于图像分割的标识检测方法、系统及可读存储介质

Also Published As

Publication number Publication date
CN108900769A (zh) 2018-11-27
CN108900769B (zh) 2020-01-10

Similar Documents

Publication Publication Date Title
WO2020015470A1 (zh) 图像处理方法、装置、移动终端及计算机可读存储介质
CN108764370B (zh) 图像处理方法、装置、计算机可读存储介质和计算机设备
CN108777815B (zh) 视频处理方法和装置、电子设备、计算机可读存储介质
US10990825B2 (en) Image processing method, electronic device and computer readable storage medium
CN108810413B (zh) 图像处理方法和装置、电子设备、计算机可读存储介质
CN108764208B (zh) 图像处理方法和装置、存储介质、电子设备
US10896323B2 (en) Method and device for image processing, computer readable storage medium, and electronic device
US11233933B2 (en) Method and device for processing image, and mobile terminal
CN108805103B (zh) 图像处理方法和装置、电子设备、计算机可读存储介质
CN108875619B (zh) 视频处理方法和装置、电子设备、计算机可读存储介质
WO2019233393A1 (zh) 图像处理方法和装置、存储介质、电子设备
CN108961302B (zh) 图像处理方法、装置、移动终端及计算机可读存储介质
WO2019233297A1 (zh) 数据集的构建方法、移动终端、可读存储介质
CN108984657B (zh) 图像推荐方法和装置、终端、可读存储介质
WO2019085792A1 (en) Image processing method and device, readable storage medium and electronic device
CN110572573B (zh) 对焦方法和装置、电子设备、计算机可读存储介质
CN108765033B (zh) 广告信息推送方法和装置、存储介质、电子设备
CN108810406B (zh) 人像光效处理方法、装置、终端及计算机可读存储介质
CN111401324A (zh) 图像质量评估方法、装置、存储介质及电子设备
CN108959462B (zh) 图像处理方法和装置、电子设备、计算机可读存储介质
CN107743200A (zh) 拍照的方法、装置、计算机可读存储介质和电子设备
CN108848306B (zh) 图像处理方法和装置、电子设备、计算机可读存储介质
CN108111768A (zh) 控制对焦的方法、装置、电子设备及计算机可读存储介质
US11605220B2 (en) Systems and methods for video surveillance
CN108898163B (zh) 信息处理方法和装置、电子设备、计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19837827

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19837827

Country of ref document: EP

Kind code of ref document: A1