WO2020011001A1 - Image processing method and apparatus, storage medium, and computer device - Google Patents

图像处理方法、装置、存储介质和计算机设备 (Image processing method and apparatus, storage medium, and computer device)

Info

Publication number
WO2020011001A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
region
acquired
image
additional element
Prior art date
Application number
PCT/CN2019/092586
Other languages
English (en)
French (fr)
Inventor
程君
朱莹
李昊沅
李峰
左小祥
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2020011001A1
Priority to US16/997,887 (published as US11367196B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 7/143 Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/60 Editing figures and text; Combining figures or text
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Definitions

  • the present application relates to the field of computer technology, and in particular, to an image processing method, device, storage medium, and computer device.
  • With the development of computer technology, image processing technology has also improved continuously. Users can process images with professional image processing software so that the processed images look better, and can also attach materials provided by the software to images so that the processed images convey more information.
  • However, current image processing methods require the user to open the material library of the image processing software, browse it, select a suitable material, adjust the material's position in the image, and confirm the modification before the processing is complete. Such methods therefore involve many manual operations and take a long time, making the image processing process inefficient.
  • An image processing method, apparatus, storage medium, and computer device are provided, which can solve the problem of relatively low image processing efficiency.
  • An image processing method applied to an image processing system includes:
  • acquiring a captured image frame;
  • determining, in the acquired image frame, a target region and a reference region obtained through image semantic segmentation;
  • determining that an action triggering the addition of an additional element is detected when the positional relationship between the target region and the reference region in an acquired first image frame satisfies a motion determination start condition and the positional relationship between the target region and the reference region in an acquired second image frame satisfies a motion determination end condition, where the acquisition time of the second image frame is after the acquisition time of the first image frame;
  • obtaining the additional element when the action is detected; and
  • adding the additional element to an image frame acquired after the second image frame.
  • An image processing device includes:
  • An acquisition module configured to acquire a captured image frame
  • a determining module configured to determine a target region and a reference region obtained through image semantic segmentation in the acquired image frame
  • a determining module configured to determine that an action triggering the addition of an additional element is detected when the positional relationship between the target region and the reference region in an acquired first image frame satisfies the motion determination start condition and the positional relationship between the target region and the reference region in an acquired second image frame satisfies the motion determination end condition, the acquisition time of the second image frame being after the acquisition time of the first image frame; and
  • an adding module configured to obtain the additional element when the action is detected, and add the additional element to an image frame acquired after the second image frame.
  • A computer-readable storage medium stores a computer program.
  • When the computer program is executed by a processor, it causes the processor to perform the following steps:
  • the acquisition time of the second image frame is after the acquisition time of the first image frame
  • the additional element is added to an image frame acquired after the second image frame.
  • a computer device includes a memory and a processor.
  • the memory stores a computer program.
  • When the computer program is executed by the processor, it causes the processor to perform the following steps:
  • the acquisition time of the second image frame is after the acquisition time of the first image frame
  • the additional element is added to an image frame acquired after the second image frame.
  • The image processing method, apparatus, storage medium, and computer device described above automatically determine the target region and the reference region obtained through image semantic segmentation in each acquired image frame, and then determine, according to the positional relationship between the target region and the reference region across multiple image frames, whether there is an action that triggers the addition of an additional element. When such an action is detected, additional elements are added automatically to subsequently captured image frames, avoiding tedious manual operations and greatly improving the efficiency of image processing.
  • FIG. 1 is an application environment diagram of an image processing method in an embodiment
  • FIG. 2 is a schematic flowchart of an image processing method according to an embodiment
  • FIG. 3 is a schematic diagram illustrating a principle of processing an image frame obtained by an image semantic segmentation model in a specific embodiment
  • FIG. 4 is a schematic diagram of segmenting a hand region from an acquired image frame in an embodiment
  • FIG. 5 is a schematic diagram of an image frame that satisfies a start condition for motion determination in an embodiment
  • FIG. 6 is a schematic diagram of an image frame that satisfies an end condition of motion determination in an embodiment
  • FIG. 7 is a schematic diagram of an image frame with additional elements added in one embodiment
  • FIG. 8 is a flowchart of an image processing method in a specific embodiment
  • FIG. 9 is a block diagram of an image processing apparatus according to an embodiment.
  • FIG. 10 is an internal structural diagram of a computer device in one embodiment.
  • FIG. 1 is an application environment diagram of an image processing method in an embodiment.
  • the image processing method is applied to an image processing system.
  • the image processing system includes a terminal 110 and a server 120.
  • the terminal 110 and the server 120 are connected through a network. Both the terminal 110 and the server 120 can execute the image processing method.
  • the terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like.
  • the server 120 may specifically be an independent server or a server cluster composed of multiple independent servers.
  • the terminal 110 may acquire the captured image frame.
  • the image frame may be acquired by the terminal 110 through a built-in image acquisition device or an externally connected image acquisition device.
  • the built-in image acquisition device may specifically be a front camera or a rear camera of the terminal 110.
  • the image frame may also be sent to the terminal 110 after being collected by other computer equipment.
  • The terminal 110 can then determine, in the acquired image frames, the target region and the reference region obtained through image semantic segmentation. When the positional relationship between the target region and the reference region in an earlier-acquired image frame satisfies the motion determination start condition, and the positional relationship between the target region and the reference region in a later-acquired image frame satisfies the motion determination end condition, the terminal 110 determines that an action triggering the addition of an additional element is detected. In this way, the terminal 110 can obtain the additional element when the action is detected, and add the obtained additional element to image frames acquired after the later-acquired image frame.
  • the terminal 110 may also send the acquired image frame to the server 120.
  • When the server 120 determines that the positional relationship between the target region and the reference region in an earlier-acquired image frame satisfies the motion determination start condition and that the positional relationship between the target region and the reference region in a later-acquired image frame satisfies the motion determination end condition, it notifies the terminal 110 that an action triggering the addition of an additional element has been detected; the terminal 110 then obtains the additional element and adds it to image frames acquired after the later-acquired image frame.
  • Alternatively, the terminal 110 may send the acquired image frames to the server 120, and the server 120 itself determines that an action triggering the addition of an additional element is detected when the positional relationship between the target region and the reference region in an earlier-acquired image frame satisfies the motion determination start condition and the positional relationship between the target region and the reference region in a later-acquired image frame satisfies the motion determination end condition; the server 120 then obtains the additional element, adds the obtained additional element to image frames acquired after the later-acquired image frame, and feeds the image frames with the additional element added back to the terminal 110.
  • FIG. 2 is a schematic flowchart of an image processing method according to an embodiment.
  • the image processing method is applied to a computer device for illustration.
  • the computer device may be the terminal 110 or the server 120 in FIG. 1.
  • the method specifically includes the following steps:
  • the image frame is data obtained by imaging an imaging target through a physical imaging principle.
  • Specifically, the terminal may collect image frames at a fixed or dynamic frame rate and obtain the collected image frames. Collecting image frames at a fixed or dynamic frame rate allows them to be played back at that frame rate to form a continuous dynamic picture.
  • When the computer device is a terminal, the terminal may collect an image frame within the current shooting range of a built-in or externally connected image acquisition device, and obtain the collected image frame.
  • the shooting range of the image acquisition device may change due to changes in the posture and position of the terminal.
  • the image acquisition device of the terminal may specifically include a front camera or a rear camera.
  • the terminal may acquire an image frame through a shooting mode provided by a social application, and acquire the acquired image frame.
  • A social application is an application that enables online social interaction based on a social network.
  • Social applications include instant messaging applications, SNS (Social Network Service) applications, live broadcast applications, and photo applications.
  • the terminal may receive an image frame sent by another terminal and acquired by the other terminal, and obtain the received image frame.
  • For example, when a terminal establishes a video session through a social application running on it, it receives image frames that are collected and sent by the terminals of the other session parties.
  • the terminal may collect image frames through a shooting mode provided by a live broadcast application, and use the collected image frames as live data to perform live broadcast through the live broadcast application.
  • The terminal may also receive image frames that another terminal collected through the shooting mode provided by the live broadcast application and sent, and use the received image frames as live data to play, through the live broadcast application, a live broadcast initiated by another user.
  • In an embodiment in which the computer device is a server, the terminal in the foregoing embodiments may upload the image frames to the server after collecting them, and the server then obtains the collected image frames.
  • the computer device is a terminal.
  • a video recording application is installed on the terminal.
  • The terminal may run the video recording application according to a user instruction, call the terminal's built-in camera through the application to collect image frames, and obtain the captured image frames in real time in the order in which they are collected.
  • the frame rate of the image frame acquired by the computer device is less than or equal to the frame rate of the image frame acquired by the image acquisition device.
  • Image semantic segmentation divides the pixels in an image according to the different semantics they express.
  • Image semantic segmentation is used to divide an image into multiple pixel regions by semantics.
  • Specifically, image semantic segmentation performs pixel-level classification of an image; by classifying the pixels, it achieves semantic annotation of the entire image.
  • The classification unit is not limited in the embodiments of the present application: classification may be performed pixel by pixel or by image block, where one image block includes multiple pixels.
  • The target region is the region in an image frame that serves as the target for detecting an action.
  • The reference region is the region in an image frame that serves as the reference for detecting an action. Across different image frames, the target region is a dynamic region and the reference region is a static region, so the positional relationship between them differs from frame to frame. It can be understood that "static" here is not absolute: the reference region is static relative to the target region.
  • For example, for a head-touching action, the hand performs the action, so the hand region is the target region and changes dynamically across image frames; the face is the reference part of the action, so the face region is the reference region and is static relative to the hand.
  • the target area is a dynamic area and the reference area is a relatively static area.
  • As another example, suppose the user makes a jumping action while the camera is collecting image frames; jumping is a continuous action. In the series of image frames collected while the user jumps, the human body performs the jumping action, so the human body region is the target region and changes dynamically across image frames.
  • The target region may also be a local body region, such as the foot region.
  • the ground is the reference part of the jumping action, so the ground area is the reference area. In this scenario, the reference area is an absolute static area.
  • Specifically, the terminal may encode the acquired image frame into a semantic segmentation feature matrix, decode the semantic segmentation feature matrix to obtain a semantic segmentation image, then segment the target region from the semantic segmentation image according to the pixels belonging to the target category, and segment the reference region from the semantic segmentation image according to the pixels belonging to the reference category.
  • the pixels in the semantically segmented image have pixel values representing the classification category to which they belong, and correspond to the pixels in the original image frame from which the semantically segmented image is obtained.
  • the semantic segmentation feature matrix is a low-dimensional expression of the semantic features of the image content in the image frame, and covers the semantic feature information of the entire image frame.
  • A semantic segmentation image is an image segmented into several non-overlapping regions, each carrying a certain semantics.
  • the pixel values of the pixels in the semantic segmentation image are used to reflect the classification category to which the corresponding pixels belong.
  • The classification of pixels may be binary or multi-class. Binary pixel classification divides the pixels in a semantic segmentation image into two different pixel values representing two different classification categories, for example the road pixels and the remaining pixels in a map image.
  • Multi-class pixel classification divides the pixels in a semantic segmentation image into more than two pixel values representing more than two classification categories, for example pixels corresponding to the sky, pixels corresponding to the ground, and pixels corresponding to a person.
  • The image size of the semantic segmentation image is consistent with that of the original image frame. It can thus be understood that the original image frame is classified pixel by pixel and, from the pixel values of the semantic segmentation image, the category to which each pixel of the original image frame belongs can be obtained.
  • the first image frame and the second image frame are arbitrary image frames obtained, and the acquisition time of the second image frame is after the acquisition time of the first image frame.
  • the action determination start condition is a constraint condition for determining that a specific action is to be started.
  • The action determination end condition is a constraint condition for determining that a specific action is being completed. Since an action is a continuous process, it can be understood that the action can be determined to be detected only when an image frame satisfying the motion determination start condition is acquired and an image frame satisfying the motion determination end condition is acquired later.
  • For example, suppose the user makes a head-touching action; the head touch is a continuous action. Only when it is detected that the user starts to touch the head, and it is then detected that the user is touching the head, can the head-touching action be determined to exist. If the user stops moving immediately after starting, the head-touching action cannot be considered detected.
  • As another example, jumping is a continuous action. A jumping action can be determined to exist only when the user is detected starting to take off and is then detected leaving the ground (jumping). If the user stops immediately after starting to take off and does not leave the ground, the jumping action cannot be considered detected.
  • An action that triggers the addition of an additional element is an action that triggers adding the additional element to the captured image frames, for example touching the head, covering the face, or touching the chin.
  • An additional element is data additionally added to an image frame.
  • The additional element may be a decorative element, such as a pendant. A decorative element is data used for decoration that can be displayed in visual form, that is, data displayed in an image frame to embellish the image content, for example a mask, armor, a ribbon, a blue sky, or white clouds; the specific additional element is not limited in the embodiments of the present application.
  • the additional element can be dynamic data, such as a dynamic picture; it can also be static data, such as a static picture.
  • There may be one or more actions that trigger the addition of additional elements.
  • When there are multiple such actions, different actions may correspond to the same action determination start condition or the same action determination end condition. These actions may trigger adding the same additional element at the same position in the image frame, adding the same additional element at different positions in the image frame, or adding different additional elements at different positions in the image frame.
  • The "earlier" and "later" acquisition times involved here mean that the acquisition time of the image frame in which the positional relationship between the target region and the reference region satisfies the motion determination start condition is before the acquisition time of the image frame in which that positional relationship satisfies the motion determination end condition.
  • Specifically, the terminal may query a correspondence, established in advance, between actions and additional elements, look up the additional element corresponding to the detected action according to this correspondence, and obtain the queried additional element.
  • the number of additional elements corresponding to the action may be one or more.
  • When there are multiple additional elements corresponding to the action, the terminal may randomly select one of them, or may select an additional element matching the user tag of the currently logged-in user identifier, as in the sketch below.
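  • The following is a minimal Python sketch of such a selection, assuming a hypothetical mapping from detected actions to candidate element assets; the names ACTION_TO_ELEMENTS and pick_additional_element are illustrative and not part of the patent.

```python
import random

# Hypothetical mapping from a detected action to its candidate additional
# elements (e.g. sticker/pendant asset identifiers); all names are illustrative.
ACTION_TO_ELEMENTS = {
    "head_touch": ["crown.png", "cat_ears.png", "halo.png"],
}

def pick_additional_element(action, user_tags=None, element_tags=None):
    """Pick one additional element for the detected action.

    If user tags are available, prefer an element whose tags overlap them;
    otherwise fall back to a random choice, as described in the text.
    """
    candidates = ACTION_TO_ELEMENTS.get(action, [])
    if not candidates:
        return None
    if user_tags and element_tags:
        matched = [e for e in candidates
                   if element_tags.get(e, set()) & set(user_tags)]
        if matched:
            return random.choice(matched)
    return random.choice(candidates)
```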
  • The second image frame here is an image frame in which the positional relationship between the target region and the reference region satisfies the motion determination end condition, and it is acquired after the first image frame, in which the positional relationship between the target region and the reference region satisfies the motion determination start condition.
  • Specifically, the computer device may use the image frame in which the positional relationship between the target region and the reference region satisfies the motion determination end condition (that is, the later-acquired second image frame) as a demarcation frame, and add the additional element to image frames acquired after the demarcation frame.
  • The image frames to which the additional element is added may be all of the image frames acquired after the acquisition time of the demarcation frame, or only part of them, and may also include the demarcation frame itself.
  • the computer device is a terminal.
  • a video recording application is installed on the terminal.
  • The terminal can run the video recording application according to a user instruction, call the terminal's built-in camera through the application to collect image frames, and obtain the captured image frames in real time in the order in which they are collected.
  • the acquisition of image frames by the camera is a real-time and continuous process
  • the acquisition of the acquired image frames by the terminal is also a real-time and continuous process.
  • The terminal determines the positional relationship between the target region and the reference region in each acquired image frame.
  • When the motion determination start condition is met, the terminal determines whether the subsequently acquired image frames satisfy the motion determination end condition; when the motion determination end condition is met, it starts adding additional elements from the next acquired image frame (which may also include the image frame that currently meets the motion determination end condition).
  • For example, the terminal collects a series of image frames P1, P2, ..., Pi, Pi+1, ..., Pn in real time, and these image frames are arranged in acquisition order.
  • Suppose the terminal determines that the positional relationship between the target region and the reference region in image frame P2 satisfies the motion determination start condition, and that the positional relationship between the target region and the reference region in image frame Pi satisfies the motion determination end condition; the terminal then determines that an action triggering the addition of an additional element is detected, and starts adding the additional element from Pi or from Pi+1.
  • In the above image processing method, the target region and the reference region obtained through image semantic segmentation are determined automatically in the acquired image frames, and whether there is an action that triggers the addition of an additional element is then determined based on the positional relationship between the target region and the reference region across multiple image frames. In this way, once the action is determined, additional elements are added automatically to subsequently captured image frames, avoiding tedious manual operations and greatly improving the efficiency of image processing.
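  • As a rough illustration of the per-frame flow just described, the sketch below assumes hypothetical helpers segment, start_condition, end_condition, and add_element (sketched in later sections); it is one interpretation of the described flow, not the patented implementation, and the timing refinement is added in a later example.

```python
def process_stream(frames, segment, start_condition, end_condition, add_element):
    """Per-frame loop over image frames P1, P2, ..., Pn in acquisition order.

    `segment(frame)` is assumed to return (target_region, reference_region);
    the two condition helpers and `add_element` are placeholders.
    """
    started = False     # a frame satisfying the start condition has been seen
    triggered = False   # the complete action has been detected

    for frame in frames:                              # acquisition order
        target, reference = segment(frame)
        if triggered:
            frame = add_element(frame, target, reference)
        elif started and end_condition(target, reference):
            triggered = True                          # e.g. frame Pi
            frame = add_element(frame, target, reference)
        elif start_condition(target, reference):
            started = True                            # e.g. frame P2
        yield frame
```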
  • In one embodiment, S204 includes: inputting the acquired image frame into an image semantic segmentation model; outputting a target region probability distribution matrix and a reference region probability distribution matrix through the image semantic segmentation model; determining the target region in the acquired image frame according to the target region probability distribution matrix; and determining the reference region in the acquired image frame according to the reference region probability distribution matrix.
  • The image semantic segmentation model is a machine learning model that has the capability of semantic segmentation after training.
  • Machine learning is written in full as Machine Learning, abbreviated ML.
  • Machine learning models can have specific capabilities through sample learning.
  • Machine learning models can use neural network models, support vector machines, or logistic regression models.
  • Neural network models such as convolutional neural networks.
  • the image semantic segmentation model is specifically a neural network model.
  • the neural network model may specifically be a convolutional neural network model (CNN).
  • the convolution layer of the convolutional neural network model includes a plurality of convolution kernels.
  • a convolution kernel is an operator that performs a convolution operation on the input of the convolution layer.
  • Each convolution kernel performs a convolution operation on the input to obtain an output.
  • the pooling layer of the neural network model is also called the sampling layer, which is used to compress the input. There are usually two forms of mean pooling and max pooling. Pooling can be viewed as a special convolution process.
  • the image semantic segmentation model can be understood as a classifier for classifying pixel points included in an input image frame pixel by pixel.
  • the number of classification categories of the image semantic segmentation model can be customized during training.
  • the image semantic segmentation model is set as a multi-classifier, and the classification category includes three types: a target category, a reference category, and a background category.
  • For an image frame input to the model, the pixels belonging to the target category are the pixels of the target region, the pixels belonging to the reference category are the pixels of the reference region, and the pixels belonging to the background category are the pixels of the background region. In this way, pixels can be divided according to the category to which they belong, and the target region and the reference region in the acquired image frame can be determined.
  • For example, when the target category is the hand category and the reference category is the face category, the pixels in the obtained image frame that belong to the hand category are the pixels of the hand region, and the pixels that belong to the face category are the pixels of the face region. In this way, pixels can be divided according to the category to which they belong, and the hand region and the face region in the acquired image frame can be determined.
  • The matrix elements of the target region probability distribution matrix carry probability values indicating membership of the target category, and correspond to the pixels of the image frame input to the model. That is, assuming the input image frame is 2 × 2, the target region probability distribution matrix is also 2 × 2, and the value of the matrix element at position (m, n) is the probability that the pixel at position (m, n) in the image frame belongs to the target category, where the matrix position (pixel position) at the upper left corner of the matrix (image frame) is (0, 0).
  • The matrix elements of the reference region probability distribution matrix carry probability values indicating membership of the reference category, and correspond to the pixels of the image frame input to the model. That is, assuming the input image frame is 2 × 2, the reference region probability distribution matrix is also 2 × 2, and the value of the matrix element at position (m, n) is the probability that the pixel at position (m, n) in the image frame belongs to the reference category.
  • the terminal may input the acquired image frame into a previously trained image semantic segmentation model, and output the target region probability distribution matrix and the reference region probability distribution matrix through the image semantic segmentation model.
  • The terminal may then determine, as the target region, the area enclosed by the pixels corresponding to the matrix elements of the target region probability distribution matrix whose probability values are greater than a preset probability, and determine, as the reference region, the area enclosed by the pixels corresponding to the matrix elements of the reference region probability distribution matrix whose probability values are greater than the preset probability.
  • the preset probability is a cut-off value set in advance for determining whether to be classified as the current category.
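  • A minimal sketch of this thresholding step is given below, assuming the model output is available as a NumPy probability matrix per category; the cut-off value 0.5 and the function name are illustrative.

```python
import numpy as np

def region_from_probability(prob_matrix, preset_probability=0.5):
    """Turn a per-pixel probability matrix output by the segmentation model
    into a binary mask and a bounding box for the region.

    `preset_probability` is the cut-off value mentioned above; 0.5 is only an
    illustrative choice.
    """
    mask = prob_matrix > preset_probability          # pixels classified as this category
    if not mask.any():
        return mask, None
    ys, xs = np.nonzero(mask)
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # (left, top, right, bottom)
    return mask, bbox

# Example usage (target_prob and reference_prob are the two output matrices):
# target_mask, target_box = region_from_probability(target_prob)
# reference_mask, reference_box = region_from_probability(reference_prob)
```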
  • FIG. 3 is a schematic diagram illustrating a principle of processing an image frame obtained by an image semantic segmentation model in a specific embodiment.
  • the image semantic segmentation model is a U-shaped symmetric model, and the output of the previous network layer is used as the input of the network layer at the corresponding location through a skip connection.
  • the input of the image semantic segmentation model is the obtained feature map of the image frame (such as the RGB three-channel feature map).
  • the network layer in the image semantic segmentation model operates on the feature map input to this layer to obtain the feature map output.
  • the output can be a semantic segmentation image or a probability distribution matrix, which is determined based on the samples and labels during training.
  • The dimensions of a feature map are m * n * k (for example, 3 * 256 * 256 or 64 * 256 * 256), where m represents the number of feature maps and n * k represents the size of each feature map.
  • Network layer operations on feature maps include: Convolution, BatchNorm, ReLU, MaxPool, Upsampling, etc.
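  • For illustration only, the following sketch shows a tiny U-shaped network combining these operations (Convolution, BatchNorm, ReLU, MaxPool, Upsampling, and a skip connection) in PyTorch; the layer sizes and class count are assumptions and do not reproduce the model described above.

```python
import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    """Minimal U-shaped segmentation sketch: Conv + BatchNorm + ReLU blocks,
    MaxPool downsampling, Upsampling, and one skip connection; 3 input
    channels and `num_classes` output probability maps (e.g. target,
    reference, background). Purely illustrative."""

    def __init__(self, num_classes=3):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        self.enc1 = block(3, 64)
        self.pool = nn.MaxPool2d(2)
        self.enc2 = block(64, 128)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = block(128 + 64, 64)          # skip connection concatenates enc1
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                        # H x W
        e2 = self.enc2(self.pool(e1))            # H/2 x W/2
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        logits = self.head(d1)                   # per-pixel class scores
        return torch.softmax(logits, dim=1)      # per-pixel probability maps
```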
  • In this embodiment, the acquired image frame is automatically input into a trained machine learning model, and the target region and the reference region are determined from the target region probability distribution matrix and the reference region probability distribution matrix output by the model.
  • The matrix elements of a probability distribution matrix carry probability values indicating that the corresponding pixels of the image frame belong to a specific classification category, so the target region can be determined automatically from the pixels belonging to the target category and the reference region from the pixels belonging to the reference category. This improves the accuracy of image region division and lays the foundation for the subsequent determination of whether the motion determination start condition or end condition is satisfied.
  • the target area is a hand area; the reference area is a face area.
  • the image processing method further includes: determining a gesture type corresponding to a hand region in the acquired image frame.
  • When the gesture type is a gesture type that triggers the addition of an additional element, it is then determined whether the positional relationship between the target region and the reference region in the acquired image frame satisfies the motion determination start condition.
  • Determining that the action is detected then includes: when the gesture type in the first image frame is the trigger type, the positional relationship between the hand region and the face region in the first image frame satisfies the motion determination start condition, and the positional relationship between the hand region and the face region in the second image frame satisfies the motion determination end condition, determining that an action triggering the addition of an additional element is detected.
  • the hands and face are both limb parts of living beings (humans or animals).
  • the hand area is the area where the hand is located.
  • The hand region may be the region enclosed by the contour of the hand, or a regular-shaped region that includes the hand and in which the hand occupies a large proportion.
  • Similarly, the face region may be the region enclosed by the contour of the face, or a regular-shaped region that includes the face and in which the face occupies a large proportion.
  • Gestures are forms of motion made by the user through the hand.
  • the gesture type is the type to which the gesture belongs in the obtained image frame.
  • FIG. 4 shows a schematic diagram of segmenting a hand region from an acquired image frame in an embodiment.
  • the terminal may determine the hand region 401 in the image frame through image semantic segmentation.
  • An image of the hand region can then be obtained by cropping, from the acquired image frame that includes the hand region, a regular-shaped area containing the hand region.
  • In this embodiment, the hand region is first segmented from the acquired original image, and gesture recognition is then performed on the segmented hand region. This avoids the problem of inaccurate recognition when the hand occupies only a small proportion of the whole image, reduces the interference of the background region of the original image on recognizing the gesture type in the hand region, and improves recognition accuracy.
  • the computer device may use a pre-trained gesture recognition model to recognize a gesture type to which the gesture belongs in the image frame.
  • a hand image is obtained by intercepting the hand region from the obtained image frame, and is input into the gesture recognition model.
  • the hidden layer in the gesture recognition model is used to calculate the characteristics of the hand image.
  • the type of gesture in the hand image is output.
  • If the gesture type is the gesture type that triggers the addition of an additional element, the computer device continues to determine whether the positional relationship between the hand region and the face region in the acquired image frame satisfies the motion determination start condition; otherwise, it checks whether the gesture type corresponding to the hand region in the next acquired image frame is a gesture type that triggers the addition of an additional element.
  • In one embodiment, when the terminal recognizes that the gesture type corresponding to the hand region in an acquired image frame is a gesture type that triggers the addition of an additional element, and the positional relationship between the hand region and the face region in that image frame satisfies the motion determination start condition, the terminal continues, for each subsequently acquired image frame, to determine whether the gesture type corresponding to the hand region is the trigger gesture type and, if so, whether the positional relationship between the hand region and the face region satisfies the motion determination end condition. When a later-acquired image frame is detected in which the gesture type corresponding to the hand region is the trigger gesture type and the positional relationship between the hand region and the face region satisfies the motion determination end condition, it is determined that an action triggering the addition of an additional element is detected.
  • the gesture recognition model is a machine learning model.
  • the gesture recognition model is a two-class model.
  • the image samples used to train the binary classification model include positive samples that belong to the gesture type that triggers the addition of additional elements, and negative samples that do not belong to the gesture type that triggers the addition of additional elements.
  • the gesture recognition model is a multi-class model.
  • the image samples used to train the multi-class model include samples that belong to each gesture type that triggers the addition of additional elements.
  • the gesture recognition model can specifically use the ConvNet Configuration model as an initial model, and train the initial model according to the training samples to obtain model parameters suitable for gesture recognition.
  • In one embodiment, the computer device may instead perform feature matching between a hand image, obtained by cropping the hand region from the acquired image frame, and a hand image template belonging to a gesture type that triggers the addition of an additional element; when the matching succeeds, it is determined that the gesture type corresponding to the hand region in the acquired image frame is a gesture type that triggers the addition of an additional element.
  • In this embodiment, the gesture type corresponding to the hand region in the acquired image frame is first identified as the gesture type that triggers the addition of an additional element, and only then is the determination of the triggering action carried out.
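  • A minimal sketch of such template matching using OpenCV is shown below; the matching method, grayscale preprocessing, and threshold value are assumptions for illustration, not the patent's specified procedure.

```python
import cv2

def is_trigger_gesture(hand_image, template, threshold=0.8):
    """Match the cropped hand image against a template of the trigger gesture.

    `hand_image` and `template` are BGR images; `threshold` is an illustrative
    similarity cut-off on the normalized correlation score.
    """
    hand_gray = cv2.cvtColor(hand_image, cv2.COLOR_BGR2GRAY)
    tmpl_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    # Resize the template to the hand crop so the comparison is well defined.
    tmpl_gray = cv2.resize(tmpl_gray, (hand_gray.shape[1], hand_gray.shape[0]))
    result = cv2.matchTemplate(hand_gray, tmpl_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, _ = cv2.minMaxLoc(result)
    return max_val >= threshold
```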
  • In one embodiment, determining that an action triggering the addition of an additional element is detected includes: starting timing from the moment when the positional relationship between the target region and the reference region in the first image frame satisfies the motion determination start condition; and, when the timing has not reached a preset duration and the positional relationship between the target region and the reference region in the second image frame satisfies the motion determination end condition, determining that an action triggering the addition of an additional element is detected.
  • It can be understood that an action is not only continuous but also coherent; in plain terms, once the action begins it is completed coherently. For example, a head-touching action can be determined only when the user starts to touch the head and then continues the motion (that is, the completed touch is detected within a certain time range); if the user stops immediately after starting and only continues after a long wait, the head-touching action cannot be considered detected. Similarly, a jumping action can be determined only when the user takes off and is then detected to leave the ground continuously after the take-off; if the user stops immediately after starting to take off without leaving the ground, and only leaves the ground after a long wait, the jumping action cannot be considered detected.
  • Specifically, each time the computer device acquires an image frame, it determines whether the image frame satisfies the motion determination start condition, and starts timing when it determines that an image frame meets the start condition. While timing, the computer device continues to acquire image frames and determines whether each satisfies the motion determination end condition. Only when the timing has not reached the preset duration and a subsequently acquired image frame meets the motion determination end condition is it determined that an action triggering the addition of an additional element is detected. If no subsequently acquired image frame meeting the end condition is detected before the timing reaches the preset duration, it is determined that no action triggering the addition of an additional element is detected.
  • In that case, the computer device continues to acquire image frames but no longer checks whether they satisfy the motion determination end condition; instead, it checks whether they satisfy the motion determination start condition, and starts timing again from the moment some image frame meets the start condition, so as to continue detecting actions that trigger the addition of additional elements.
  • The preset duration is a duration, determined from practical experience, within which the action should be completed.
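  • This timing logic can be sketched as a small detector object, as below; the one-second preset duration and all names are illustrative assumptions, and the start/end condition helpers are sketched in the following sections.

```python
import time

PRESET_DURATION = 1.0   # seconds; illustrative value for the "preset duration"

class ActionDetector:
    """Start/end detection with the preset-duration constraint described above."""

    def __init__(self, start_condition, end_condition, preset_duration=PRESET_DURATION):
        self.start_condition = start_condition
        self.end_condition = end_condition
        self.preset_duration = preset_duration
        self.start_time = None                 # None means "not currently timing"

    def update(self, target, reference):
        """Feed one frame's regions; return True once the action is detected."""
        if self.start_time is not None:
            if time.time() - self.start_time <= self.preset_duration:
                if self.end_condition(target, reference):
                    self.start_time = None
                    return True                # end condition met within the window
            else:
                self.start_time = None         # timed out: look for a new start
        if self.start_time is None and self.start_condition(target, reference):
            self.start_time = time.time()      # start timing again
        return False
```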
  • In one embodiment, the image processing method further includes: when the ratio of the intersection of the target region and the reference region in the first image frame to the target region exceeds a first preset value, determining that the positional relationship between the target region and the reference region in the first image frame satisfies the motion determination start condition; or, when the ratio of the intersection of the target region and the reference region in the first image frame to the target region exceeds a second preset value and the center position of the target region is located above the center position of the reference region, determining that the positional relationship between the target region and the reference region in the first image frame satisfies the motion determination start condition.
  • the first preset value and the second preset value are preset values.
  • the first preset value may be 0.5
  • The second preset value may be 0.2. It can be understood that "the ratio of the intersection of the target region and the reference region in the acquired image frame to the target region exceeds the first preset value", or "that ratio exceeds the second preset value and the center position of the target region is located above the center position of the reference region", is the positional relationship between the target region and the reference region, determined from practical experience, under which the motion determination start condition is satisfied.
  • FIG. 5 is a schematic diagram of an image frame that satisfies a start condition for motion determination in an embodiment.
  • In this example, the action that triggers the addition of an additional element is a head-touching action
  • the target region is a hand region
  • the reference region is a face region.
  • Referring to FIG. 5, the ratio of the intersection of the hand region and the face region to the hand region exceeds the first preset value (0.5), so it can be determined that the positional relationship between the target region and the reference region in this image frame satisfies the motion determination start condition.
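  • A possible implementation of this start condition on binary region masks is sketched below; the mask representation and the 0.5 / 0.2 example thresholds follow the description above, while the helper names are illustrative.

```python
import numpy as np

FIRST_PRESET = 0.5    # example values taken from the description above
SECOND_PRESET = 0.2

def center_of(mask):
    """Centre (x, y) of a binary region mask."""
    ys, xs = np.nonzero(mask)
    return xs.mean(), ys.mean()

def start_condition(target_mask, reference_mask):
    """Either the intersection covers more than FIRST_PRESET of the target
    region, or it covers more than SECOND_PRESET and the target's centre lies
    above the reference's centre (smaller y is higher in image coordinates)."""
    target_area = target_mask.sum()
    if target_area == 0 or reference_mask.sum() == 0:
        return False
    overlap = np.logical_and(target_mask, reference_mask).sum() / target_area
    if overlap > FIRST_PRESET:
        return True
    if overlap > SECOND_PRESET:
        _, target_y = center_of(target_mask)
        _, reference_y = center_of(reference_mask)
        return target_y < reference_y
    return False
```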
  • In one embodiment, the image processing method further includes: determining a reference position in the reference region in the second image frame; and determining that the positional relationship between the target region and the reference region in the second image frame satisfies the motion determination end condition when the target region in the second image frame is located above the reference position in the reference region.
  • The reference position is a comparison position used to determine whether the positional relationship between the target region and the reference region in an image frame satisfies the motion determination end condition. It can be understood that "the target region is located above the reference position in the reference region" is the positional relationship between the target region and the reference region, determined from practical experience, under which the motion determination end condition is satisfied.
  • FIG. 6 is a schematic diagram of an image frame that satisfies an end condition of motion determination in an embodiment.
  • In this example, the action that triggers the addition of an additional element is a head-touching action
  • the target region is the hand region
  • the reference region is the face region
  • the reference position is the position of the eyebrows in the face region. Referring to FIG. 6, it can be seen that the hand region is located above the eyebrow position in the face region in the image frame, and it can be determined that the positional relationship between the target region and the reference region in the image frame satisfies the end condition of the motion determination.
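  • A corresponding sketch of the end condition check is given below; it assumes the eyebrow row has been obtained from face landmark detection and, as one possible reading, requires the whole hand region to lie above that row.

```python
import numpy as np

def end_condition(target_mask, eyebrow_y):
    """True when the target (hand) region lies above the reference position.

    `eyebrow_y` is the image row of the eyebrows obtained from the reference
    (face) region, e.g. via face landmark detection; requiring the whole hand
    region above that row is one plausible interpretation of "located above".
    """
    ys, _ = np.nonzero(target_mask)
    if ys.size == 0:
        return False
    return ys.max() < eyebrow_y   # every hand pixel is above the eyebrow row
```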
  • In the above embodiments, a concrete basis is provided for determining whether an image frame satisfies the motion determination start condition or the motion determination end condition, thereby ensuring that motion determination is performed effectively. Moreover, the action is determined to be detected only when an acquired image frame satisfies the motion determination start condition and an image frame acquired within a preset duration thereafter satisfies the motion determination end condition, so that the detection of the action conforms to actual cognition and is effective.
  • Here, "an acquired image frame satisfies the motion determination start condition and an image frame acquired within a preset duration thereafter satisfies the motion determination end condition" means that the time interval between the acquisition time of the image frame meeting the start condition and the acquisition time of the image frame meeting the end condition is less than or equal to the preset duration.
  • In one embodiment, the target region is the hand region, the reference region is the face region, and the reference position is the position of the eyebrows.
  • Adding the additional element to an image frame acquired after the second image frame then includes: determining, in the image frame acquired after the second image frame, the area bounded by the eyebrow position in the face region and the boundary of the hand region near the eyebrow position; and adaptively adding the additional element to the determined area in that image frame.
  • Specifically, the computer device may perform face detection on an image frame acquired after the second image frame, determine the left and right eyebrow reference points in the face region, determine the eyebrow position based on these reference points, then determine the area bounded by the eyebrow position in the face region and the boundary of the hand region near the eyebrow position, and adaptively add the additional element to the determined area.
  • Adaptively adding the additional element to the determined area may mean adjusting the size of the additional element to the size of the determined area. In this way, as the hand region moves, the determined area gradually becomes larger, and the display size of the additional element increases accordingly.
  • Alternatively, adaptively adding the additional element to the determined area may mean adding only a partial region of the additional element to the determined area, where the partial region is bounded by a certain boundary of the additional element that corresponds to a boundary of the determined area. In this way, as the hand region moves, the determined area gradually becomes larger, and the additional element gradually changes from being partially displayed to being fully displayed.
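  • One way to realize such adaptive addition is to resize the element to the current area and alpha-blend it in, as sketched below; the RGBA element format and the helper name are assumptions, and the area is assumed to lie inside the frame.

```python
import cv2
import numpy as np

def add_element_adaptively(frame, element_rgba, left, top, right, bottom):
    """Overlay the additional element into the area bounded by the eyebrow line
    (bottom) and the hand region boundary near the eyebrows (top); the element
    is resized to the current area so it grows as the hand moves upward.

    `frame` is a BGR uint8 image, `element_rgba` a 4-channel element image with
    an alpha channel; the rectangle is assumed to lie inside the frame.
    """
    width, height = right - left, bottom - top
    if width <= 0 or height <= 0:
        return frame
    resized = cv2.resize(element_rgba, (width, height))
    alpha = resized[:, :, 3:4].astype(np.float32) / 255.0       # (h, w, 1)
    roi = frame[top:bottom, left:right].astype(np.float32)
    blended = alpha * resized[:, :, :3].astype(np.float32) + (1.0 - alpha) * roi
    frame[top:bottom, left:right] = blended.astype(np.uint8)
    return frame
```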
  • FIG. 7 is a schematic diagram of an image frame with additional elements added in one embodiment.
  • In this example, the action that triggers the addition of an additional element is a head-touching action
  • the target region is the hand region
  • the reference region is the face region
  • the reference position is the position of the eyebrows in the face region.
  • In this embodiment, the additional element is adaptively added to the determined area until it is fully added, rather than adding the complete additional element at once, which avoids making the addition of the additional element appear monotonous and abrupt; adaptively adding the additional element also enhances interactivity.
  • the computer device may further blur the boundary of the additional element.
  • In one embodiment, the image processing method further includes: before the action that triggers the addition of the additional element is detected, playing the acquired image frames frame by frame in the order of acquisition; and after the action that triggers the addition of the additional element is detected, playing the image frames with the additional element added frame by frame in the order of acquisition.
  • the computer device may play the captured image frames in real time after capturing the image frames.
  • the captured image frames can be directly rendered to form a preview screen, and the captured image frames are displayed visually.
  • After the action that triggers the addition of additional elements is detected, the additional element is added to the subsequently collected image frames, and the image frames with the additional element added are rendered to form the preview screen, so that they are displayed visually.
  • In this embodiment, a preview screen is generated in real time, based on either the captured image frames or the image frames with the additional element added, for the user to watch. This lets users see in real time what is being recorded as video, so they can correct or re-record in the event of an error.
  • In one embodiment, the image processing method further includes: replacing each corresponding image frame before the additional element was added with the image frame after the additional element is added; and generating a recorded video from the image frames determined after the replacement, ordered by their acquisition times, where, among the image frames determined after the replacement, the acquisition time of an image frame with the additional element added is the acquisition time of the corresponding image frame before the additional element was added.
  • The image frames determined after the replacement include both the image frames originally collected before any additional element was added and the image frames obtained by adding the additional element. That is, among the obtained image frames, some have no additional element added and some do; the determined image frames therefore include both the originally collected image frames and the image frames with the additional element added, that is, the frames obtained by replacement.
  • Among the image frames determined after the replacement, the acquisition time of an original image frame on which no replacement was performed is its actual acquisition time, and the acquisition time of an image frame obtained through replacement is the acquisition time of the corresponding image frame before the additional element was added.
  • For example, suppose the originally captured image frames are A, B, C, and D, and additional elements are added starting from image frame C: adding the additional element to image frame C gives image frame C1, and adding it to image frame D gives image frame D1. Image frame C is then replaced by C1 and image frame D by D1, so the determined image frames are A, B, C1, and D1, and these image frames are used to generate the video.
  • Specifically, the computer device may replace each corresponding image frame before the additional element was added with the image frame obtained after adding the additional element, and then generate the video from the image frames determined after the replacement according to the time order of their acquisition times.
  • The order by acquisition time may be forward chronological order or reverse chronological order.
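  • A minimal sketch of this replacement-and-ordering step with OpenCV's video writer is given below; the frame-per-timestamp dictionaries, codec, and frame rate are illustrative assumptions.

```python
import cv2

def generate_recorded_video(frames_by_time, processed_by_time, path, fps=30):
    """Replace original frames with their processed (element-added) versions,
    keyed by the original acquisition time, then write them in time order.

    `frames_by_time` and `processed_by_time` map acquisition timestamps to BGR
    frames (NumPy arrays); at least one frame is assumed to exist.
    """
    merged = dict(frames_by_time)
    merged.update(processed_by_time)         # processed frames keep original timestamps
    times = sorted(merged)                    # chronological order of acquisition time
    height, width = merged[times[0]].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    for t in times:
        writer.write(merged[t])
    writer.release()
```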
  • the computer device may share the video to a social session or publish the video to a social publishing platform.
  • In this embodiment, the captured image frames are processed automatically and in real time during shooting, and the video is generated in real time, avoiding the tedious steps of subsequent manual processing, greatly simplifying the operation and improving video generation efficiency.
  • FIG. 8 shows a flowchart of an image processing method in a specific embodiment.
  • In this specific embodiment, the action that triggers the addition of an additional element is a head-touching action
  • the target region is the hand region
  • the reference region is the face region
  • the reference position is the position of the eyebrows in the face region.
  • the computer device is a terminal.
  • a video recording application is installed on the terminal. The terminal may run the video recording application according to a user instruction, call the built-in camera of the terminal through the video recording application to capture image frames, and, while capturing, obtain the captured image frames in real time in the order in which they are captured.
  • after obtaining a captured image frame, the terminal can determine the gesture type corresponding to the hand region in the acquired image frame and check whether that gesture type is the trigger type; if not, it obtains the next image frame and continues to determine the gesture type corresponding to the hand region in that frame; if so, it determines whether the positional relationship between the target region and the reference region in the image frame satisfies the action determination start condition.
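  • as a sketch of the gesture check just described, the example below crops the hand region from a frame and queries a gesture classifier; the classifier object, its predict method, and the trigger label are hypothetical names used only for illustration.

```python
import numpy as np

TRIGGER_TYPE = "trigger"  # hypothetical label for the triggering gesture type

def is_trigger_gesture(frame, hand_mask, classifier):
    """Crop the hand region from the frame and ask a (hypothetical)
    gesture classifier whether it shows the triggering gesture type."""
    ys, xs = np.where(hand_mask)
    if ys.size == 0:
        return False  # no hand region segmented in this frame
    # Bounding box of the hand region, so the classifier sees mostly hand.
    hand_crop = frame[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return classifier.predict(hand_crop) == TRIGGER_TYPE
```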
  • the action determination start condition is: the ratio of the intersection of the target region and the reference region in the image frame to the target region exceeds a first preset value; or the ratio of the intersection of the target region and the reference region to the target region exceeds a second preset value and the center position of the target region is located above the center position of the reference region.
  • when the terminal determines that the image frame does not satisfy the action determination start condition, it obtains the next image frame and continues to determine the gesture type corresponding to the hand region in that frame; when it determines that the image frame satisfies the action determination start condition, it starts timing and continues to obtain the next image frame.
  • the terminal then determines the gesture type corresponding to the hand region in the subsequently acquired image frame and checks whether that gesture type is the trigger type; if not, it obtains the next image frame and continues to determine the gesture type corresponding to the hand region in that frame; if so, it determines whether the positional relationship between the target region and the reference region in the subsequently acquired image frame satisfies the action determination end condition.
  • the action determination end condition is: the target region in the image frame is located above the reference position in the reference region.
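  • the start and end conditions can be read directly off the segmented regions; the sketch below is one plausible formulation over binary masks, with the preset values and the eyebrow row treated as illustrative assumptions rather than values fixed by the disclosure.

```python
import numpy as np

def start_condition(target_mask, reference_mask,
                    first_preset=0.5, second_preset=0.2):
    """Action determination start condition evaluated on two binary masks."""
    target_area = target_mask.sum()
    if target_area == 0:
        return False
    overlap = np.logical_and(target_mask, reference_mask).sum() / target_area
    if overlap > first_preset:
        return True
    if overlap <= second_preset or not reference_mask.any():
        return False
    # Region centers as the mean row index of their pixels;
    # a smaller row index means higher up in the image.
    target_center_row = np.where(target_mask)[0].mean()
    reference_center_row = np.where(reference_mask)[0].mean()
    return target_center_row < reference_center_row

def end_condition(target_mask, eyebrow_row):
    """Action determination end condition: the target region lies above the
    reference position (here, the eyebrow row) in image coordinates."""
    rows = np.where(target_mask)[0]
    return rows.size > 0 and rows.max() < eyebrow_row
```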
  • when the terminal detects, before the timed duration reaches the preset duration, an image frame that satisfies the action determination end condition, it determines that an action triggering the addition of an additional element is detected; in the image frames captured after that frame, it determines the region formed between the position of the eyebrows in the face region and the boundary of the hand region near the eyebrows, and adaptively adds the additional element to the determined region.
  • when the timed duration reaches the preset duration without any image frame being detected that satisfies the action determination end condition, the terminal obtains the next image frame, continues to determine the gesture type corresponding to the hand region in that frame, and, when the gesture type is the trigger type, checks whether the action determination start condition is satisfied.
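  • putting the per-frame checks and the preset timing window together, one possible organization of the detection loop is sketched below; the preset duration, the frame source, and the helper predicates (in the spirit of the sketches above) are assumptions for illustration.

```python
import time

PRESET_DURATION = 1.0  # seconds; an illustrative preset duration

def detect_trigger_action(frames, segment, is_trigger, started, ended):
    """frames yields captured frames in acquisition order.
    segment(frame) -> (target_mask, reference_mask); is_trigger, started and
    ended are predicates in the spirit of the sketches above.
    Returns the index of the frame on which the action is detected, i.e. the
    frame from which additional elements are added, or None."""
    timer_start = None
    for index, frame in enumerate(frames):
        target, reference = segment(frame)
        if not is_trigger(frame, target):
            continue  # not the triggering gesture type; keep scanning
        if timer_start is not None and time.monotonic() - timer_start > PRESET_DURATION:
            timer_start = None  # preset duration exceeded; look for a new start
        if timer_start is None:
            if started(target, reference):
                timer_start = time.monotonic()  # start condition met; start timing
        elif ended(target, reference):
            return index  # end condition met within the window: action detected
    return None
```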
  • during image processing, the terminal can also replace, in real time, each corresponding image frame before the additional element is added with the image frame after the additional element is added, and generate the recorded video from the image frames determined after the replacement in the order of their acquisition times; alternatively, after image frame acquisition has ended, the recorded video is generated, according to the acquisition times of the image frames determined after the replacement, in the order of those acquisition times.
  • an image processing apparatus 900 is provided.
  • the image processing apparatus 900 includes: an acquisition module 901, a determination module 902, a determination module 903, and an addition module 904.
  • the obtaining module 901 is configured to obtain a captured image frame.
  • a determining module 902 is configured to determine a target region and a reference region obtained through image semantic segmentation in the acquired image frame.
  • a determination module 903 is configured to determine that an action triggering the addition of an additional element is detected when the positional relationship between the target region and the reference region in the acquired first image frame satisfies the action determination start condition and the positional relationship between the target region and the reference region in the acquired second image frame satisfies the action determination end condition, the acquisition time of the second image frame being after the acquisition time of the first image frame.
  • an adding module 904 is configured to obtain an additional element when the action is detected, and to add the additional element to the image frames acquired after the second image frame.
  • the determining module 902 is further configured to input the acquired image frame into an image semantic segmentation model; output a target region probability distribution matrix and a reference region probability distribution matrix through the image semantic segmentation model; determine the target region in the acquired image frame according to the target region probability distribution matrix; and determine the reference region in the acquired image frame according to the reference region probability distribution matrix.
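  • one straightforward way to turn the two probability distribution matrices into regions is a per-pixel threshold, as sketched below; the 0.5 cut-off and the model call are illustrative assumptions rather than values fixed by the disclosure.

```python
import numpy as np

def regions_from_probabilities(target_probs, reference_probs, threshold=0.5):
    """target_probs and reference_probs are H x W matrices whose entries give,
    for each pixel of the input frame, the probability that the pixel belongs
    to the target class or the reference class respectively."""
    target_mask = target_probs > threshold
    reference_mask = reference_probs > threshold
    return target_mask, reference_mask

# Hypothetical usage, assuming model(frame) returns the two matrices:
# target_probs, reference_probs = model(frame)
# target_mask, reference_mask = regions_from_probabilities(target_probs, reference_probs)
```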
  • the target area is a hand area; the reference area is a face area.
  • the determining module 902 is further configured to determine a gesture type corresponding to a hand region in the acquired image frame.
  • the determination module 903 is further configured to determine, if the gesture type of the first image frame is the trigger type, that an action triggering the addition of an additional element is detected when the positional relationship between the hand region and the face region in the first image frame satisfies the action determination start condition and the positional relationship between the hand region and the face region in the second image frame satisfies the action determination end condition.
  • the determination module 903 is further configured to start timing when the positional relationship between the target region and the reference region in the first image frame satisfies the action determination start condition, and to determine that an action triggering the addition of an additional element is detected when the timed duration has not reached the preset duration and the positional relationship between the target region and the reference region in a second image frame acquired after the first image frame satisfies the action determination end condition.
  • the determination module 903 is further configured to determine that the positional relationship between the target region and the reference region in the first image frame satisfies the action determination start condition when the ratio of the intersection of the target region and the reference region in the first image frame to the target region exceeds a first preset value; or when the ratio of the intersection of the target region and the reference region in the first image frame to the target region exceeds a second preset value and the center position of the target region is above the center position of the reference region.
  • the determination module 903 is further configured to determine a reference position in the reference region in the second image frame, and to determine that the positional relationship between the target region and the reference region in the second image frame satisfies the action determination end condition when the target region in the second image frame is located above the reference position in the reference region.
  • the target area is the hand area; the reference area is the face area; the reference position is the location of the eyebrows.
  • the adding module 904 is further configured to determine, in the image frames acquired after the second image frame, the region formed between the position of the eyebrows in the face region and the boundary of the hand region near the position of the eyebrows, and to add the additional element to the determined region in those image frames.
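  • as a rough illustration of the adaptive addition step, the sketch below scales an RGBA overlay into the region bounded by the lower edge of the hand region above and the eyebrow row below, then alpha-blends it into the frame; the geometry and the use of OpenCV are one plausible reading of the description, not the disclosed implementation.

```python
import cv2
import numpy as np

def add_element_adaptively(frame, element_rgba, eyebrow_row, hand_mask):
    """Scale an RGBA element into the region between the lower boundary of
    the hand region (above) and the eyebrow row (below), then alpha-blend it.
    As the hand moves up, the region and the displayed element grow."""
    ys, xs = np.where(hand_mask)
    if ys.size == 0:
        return frame
    top = ys.max()          # lower boundary of the hand region, nearest the eyebrows
    bottom = eyebrow_row    # reference position: the eyebrow row
    left, right = xs.min(), xs.max()
    if bottom <= top or right <= left:
        return frame        # region not yet formed
    region_h, region_w = bottom - top, right - left
    scaled = cv2.resize(element_rgba, (region_w, region_h))
    rgb, alpha = scaled[..., :3], scaled[..., 3:4] / 255.0
    roi = frame[top:bottom, left:right].astype(np.float32)
    frame[top:bottom, left:right] = (alpha * rgb + (1 - alpha) * roi).astype(np.uint8)
    return frame
```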
  • the obtaining module 901 is further configured to play the acquired image frames frame by frame, in the order in which they were captured, when no action triggering the addition of an additional element has been detected.
  • the adding module 904 is further configured to play the image frames with the additional element added frame by frame, in the order in which they were captured, after an action triggering the addition of an additional element is detected.
  • the adding module 904 is further configured to replace each corresponding image frame before the additional element is added with the image frame after the additional element is added, and to generate a recorded video from the image frames determined after the replacement in the order of their acquisition times; among the image frames determined after the replacement, the acquisition time of an image frame after the additional element is added is the acquisition time of the corresponding image frame before the additional element was added.
  • FIG. 10 shows an internal structure diagram of a computer device in one embodiment.
  • the computer device may specifically be the terminal 110 or the server 120 in FIG. 1.
  • the computer device includes a processor, a memory, and a network interface connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system, and may also store a computer program.
  • when that computer program is executed by the processor, it causes the processor to implement the image processing method.
  • a computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor may execute the image processing method.
  • FIG. 10 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • the image processing apparatus provided in this application may be implemented in the form of a computer program.
  • the computer program may be run on a computer device as shown in FIG. 10, and the non-volatile storage medium of the computer device may store the program modules composing the image processing apparatus, for example the obtaining module 901, the determining module 902, the determination module 903, and the adding module 904 shown in FIG. 9.
  • the computer program composed of these program modules causes the processor to execute the steps of the image processing methods in the embodiments of the present application described in this specification.
  • the computer device shown in FIG. 10 may obtain the captured image frame through the obtaining module 901 in the image processing apparatus 900 shown in FIG. 9.
  • the determination module 902 determines a target region and a reference region obtained through image semantic segmentation in the acquired image frame.
  • through the determination module 903, when the positional relationship between the target region and the reference region in the acquired first image frame satisfies the action determination start condition and the positional relationship between the target region and the reference region in the acquired second image frame satisfies the action determination end condition, it is determined that an action triggering the addition of an additional element is detected, the acquisition time of the second image frame being after the acquisition time of the first image frame.
  • through the adding module 904, the additional element is obtained when the action is detected, and the additional element is added to the image frames acquired after the second image frame.
  • a computer-readable storage medium is provided.
  • a computer program is stored on the computer-readable storage medium.
  • when the computer program is executed by the processor, it causes the processor to perform the following steps: obtaining a captured image frame; determining the target region and reference region obtained through image semantic segmentation in the acquired image frame; when the positional relationship between the target region and the reference region in the acquired first image frame satisfies the action determination start condition and the positional relationship between the target region and the reference region in the acquired second image frame satisfies the action determination end condition, determining that an action triggering the addition of an additional element is detected, the acquisition time of the second image frame being after the acquisition time of the first image frame; obtaining the additional element when the action is detected; and adding the additional element to the image frames acquired after the second image frame.
  • determining the target region and the reference region obtained through image semantic segmentation in the acquired image frame includes: inputting the acquired image frame into an image semantic segmentation model; outputting the target region probability distribution matrix and the reference region probability distribution matrix through the image semantic segmentation model; determining the target region in the acquired image frame according to the target region probability distribution matrix; and determining the reference region in the acquired image frame according to the reference region probability distribution matrix.
  • the target area is a hand area; the reference area is a face area.
  • the computer program further causes the processor to perform the following steps: determining a gesture type corresponding to a hand region in the acquired image frame;
  • the computer program also causes the processor to perform the following step: if the gesture type of the first image frame is the trigger type, determining that an action triggering the addition of an additional element is detected when the positional relationship between the hand region and the face region in the first image frame satisfies the action determination start condition and the positional relationship between the hand region and the face region in the second image frame satisfies the action determination end condition.
  • the computer program further causes the processor to perform the following steps: starting timing when the positional relationship between the target region and the reference region in the first image frame satisfies the action determination start condition; and determining that an action triggering the addition of an additional element is detected when the timed duration has not reached the preset duration and the positional relationship between the target region and the reference region in a second image frame acquired after the first image frame satisfies the action determination end condition.
  • the computer program further causes the processor to perform the following steps: determining that the positional relationship between the target region and the reference region in the first image frame satisfies the action determination start condition when the ratio of the intersection of the target region and the reference region in the first image frame to the target region exceeds the first preset value; or determining that the positional relationship between the target region and the reference region in the first image frame satisfies the action determination start condition when the ratio of the intersection of the target region and the reference region in the first image frame to the target region exceeds the second preset value and the center position of the target region is located above the center position of the reference region.
  • the computer program further causes the processor to perform the following steps: determining a reference position in the reference region in the second image frame; and determining that the positional relationship between the target region and the reference region in the second image frame satisfies the action determination end condition when the target region in the second image frame is above the reference position in the reference region.
  • the target area is the hand area; the reference area is the face area; the reference position is the location of the eyebrows.
  • adding the additional element to the image frames acquired after the second image frame includes: determining, in an image frame acquired after the second image frame, the region formed between the position of the eyebrows in the face region and the boundary of the hand region near the position of the eyebrows; and adding the additional element to the determined region in the image frames acquired after the second image frame.
  • the computer program further causes the processor to perform the following steps: when no action triggering the addition of an additional element is detected, playing the acquired image frames frame by frame in the order in which they were captured; and after an action triggering the addition of an additional element is detected, playing the image frames with the additional element added frame by frame in the order in which they were captured.
  • the computer program further causes the processor to perform the following steps: replacing each corresponding image frame before the additional element is added with the image frame after the additional element is added; and generating a recorded video from the image frames determined after the replacement in the order of their acquisition times, wherein, among the image frames determined after the replacement, the acquisition time of an image frame after the additional element is added is the acquisition time of the corresponding image frame before the additional element was added.
  • a computer device including a memory and a processor.
  • the computer program is stored in the memory.
  • when the computer program is executed by the processor, it causes the processor to perform the following steps: obtaining a captured image frame; determining the target region and reference region obtained through image semantic segmentation in the acquired image frame; when the positional relationship between the target region and the reference region in the acquired first image frame satisfies the action determination start condition and the positional relationship between the target region and the reference region in the acquired second image frame satisfies the action determination end condition, determining that an action triggering the addition of an additional element is detected, the acquisition time of the second image frame being after the acquisition time of the first image frame; obtaining the additional element when the action is detected; and adding the additional element to the image frames acquired after the second image frame.
  • when the computer program is executed by the processor to perform the step of determining the target region and the reference region obtained through image semantic segmentation in the acquired image frame, it causes the processor to perform the following steps: inputting the acquired image frame into the image semantic segmentation model; outputting the target region probability distribution matrix and the reference region probability distribution matrix through the image semantic segmentation model; determining the target region in the acquired image frame according to the target region probability distribution matrix; and determining the reference region in the acquired image frame according to the reference region probability distribution matrix.
  • the target region is the hand region; the reference region is the face region; when the computer program is executed by the processor, it causes the processor to perform the following step: determining the gesture type corresponding to the hand region in the acquired image frame.
  • when the computer program is executed by the processor to perform the step of determining that an action triggering the addition of an additional element is detected when the positional relationship between the target region and the reference region in the acquired first image frame satisfies the action determination start condition and the positional relationship between the target region and the reference region in the acquired second image frame satisfies the action determination end condition, it causes the processor to perform the following step: if the gesture type of the first image frame is the trigger type, determining that an action triggering the addition of an additional element is detected when the positional relationship between the hand region and the face region in the first image frame satisfies the action determination start condition and the positional relationship between the hand region and the face region in the second image frame satisfies the action determination end condition.
  • when the computer program is executed by the processor to perform the step of determining that an action triggering the addition of an additional element is detected when the positional relationship between the target region and the reference region in the acquired first image frame satisfies the action determination start condition and the positional relationship between the target region and the reference region in the acquired second image frame satisfies the action determination end condition, it causes the processor to perform the following steps: starting timing from when the positional relationship between the target region and the reference region in the first image frame satisfies the action determination start condition; and determining that an action triggering the addition of an additional element is detected when the timed duration has not reached the preset duration and the positional relationship between the target region and the reference region in a second image frame acquired after the first image frame satisfies the action determination end condition.
  • when the computer program is executed by the processor, it further causes the processor to perform the following steps: determining that the positional relationship between the target region and the reference region in the first image frame satisfies the action determination start condition when the ratio of the intersection of the target region and the reference region in the first image frame to the target region exceeds the first preset value; or determining that this positional relationship satisfies the action determination start condition when that ratio exceeds the second preset value and the center position of the target region is located above the center position of the reference region.
  • when the computer program is executed by the processor, it further causes the processor to perform the following steps: determining a reference position in the reference region in the second image frame; and determining that the positional relationship between the target region and the reference region in the second image frame satisfies the action determination end condition when the target region in the second image frame is located above the reference position in the reference region.
  • the target region is the hand region; the reference region is the face region; the reference position is the position of the eyebrows; when the computer program is executed by the processor to perform the step of adding the additional element to the image frames acquired after the second image frame, it causes the processor to perform the following steps: determining, in the image frames acquired after the second image frame, the region formed between the position of the eyebrows in the face region and the boundary of the hand region near the position of the eyebrows; and adding the additional element to the determined region in those image frames.
  • when the computer program is executed by the processor, it further causes the processor to perform the following steps: when no action triggering the addition of an additional element is detected, playing the acquired image frames frame by frame in the order in which they were captured; and after an action triggering the addition of an additional element is detected, playing the image frames with the additional element added frame by frame in the order in which they were captured.
  • when the computer program is executed by the processor, it further causes the processor to perform the following steps: replacing each corresponding image frame before the additional element is added with the image frame after the additional element is added; and generating a recorded video from the image frames determined after the replacement in the order of their acquisition times; wherein, among the image frames determined after the replacement, the acquisition time of an image frame after the additional element is added is the acquisition time of the corresponding image frame before the additional element was added.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

The present application relates to an image processing method and apparatus, a storage medium, and a computer device. The method includes: obtaining captured image frames; determining, in the acquired image frames, a target region and a reference region obtained through image semantic segmentation; when the positional relationship between the target region and the reference region in an acquired first image frame satisfies an action determination start condition and the positional relationship between the target region and the reference region in an acquired second image frame satisfies an action determination end condition, determining that an action triggering the addition of an additional element is detected, the acquisition time of the second image frame being after the acquisition time of the first image frame; obtaining the additional element when the action is detected; and adding the additional element to the image frames captured after the second image frame. The solution provided by the present application improves image processing efficiency.

Description

图像处理方法、装置、存储介质和计算机设备
本申请要求于2018年7月11日提交、申请号为201810755907.7、发明名称为“图像处理方法、装置、存储介质和计算机设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别是涉及一种图像处理方法、装置、存储介质和计算机设备。
背景技术
随着计算机技术的发展,图像处理技术也不断进步。用户可以通过专业的图像处理软件对图像进行处理,使得经过处理的图像表现更好。用户还可以通过图像处理软件,在图像中附加由图像处理软件提供的素材,让经过处理的图像能够传递更多的信息。
然而,目前的图像处理方式,需要用户展开图像处理软件的素材库,浏览素材库,从素材库中选择合适的素材,调整素材在图像中的位置,从而确认修改,完成图像处理。于是目前的图像处理方式需要大量的人工操作,耗时长,导致图像处理过程效率低。
发明内容
基于此,提供一种图像处理方法、装置、存储介质和计算机设备,能够解决目前图像处理效率比较低的问题,。
一种图像处理方法,应用于图像处理系统,所述方法包括:
获取采集的图像帧;
在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域;
当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,所述第二图像帧的采集时间位于所述第一图像帧的采集时间之后;
在检测到所述动作时获取附加元素;
将所述附加元素添加至所述第二图像帧之后采集的图像帧中。
一种图像处理装置,包括:
获取模块,用于获取采集的图像帧;
确定模块,用于在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域;
判定模块,用于当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,所述第二图像帧的采集时间位于所述第一图像帧的采集时间之后;
添加模块,用于在检测到所述动作时获取附加元素;将所述附加元素添加至所述第二图像帧之后采集的图像帧中。
一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行以下步骤:
获取采集的图像帧;
在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域;
当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,所述第二图像帧的采集时间位于所述第一图像帧的采集时间之后;
在检测到所述动作时获取附加元素;
将所述附加元素添加至所述第二图像帧之后采集的图像帧中。
一种计算机设备,包括存储器和处理器,所述存储器中储存有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行以下步骤:
获取采集的图像帧;
在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域;
当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,所述第二图像帧的采集时间位于所述第一图像帧的采集时间之后;
在检测到所述动作时获取附加元素;
将所述附加元素添加至所述第二图像帧之后采集的图像帧中。
上述图像处理方法、装置、存储介质和计算机设备,在获取到采集的图像帧后,自动在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域,继而再根据多帧图像帧中目标区域和参考区域的位置关系,判定是否有触发添加附加元素的动作。这样即可在判定该动作时便自动将附加元素添加至后续采集的图像帧中,避免了人工操作的繁琐步骤,极大地提高了图像处理效率。
附图说明
图1为一个实施例中图像处理方法的应用环境图;
图2为一个实施例中图像处理方法的流程示意图;
图3为一个具体的实施例中图像语义分割模型对获取的图像帧进行处理的原理示意图;
图4为一个实施例中从获取的图像帧中分割出手部区域的示意图;
图5为一个实施例中满足动作判定开始条件的图像帧的示意图;
图6为一个实施例中满足动作判定结束条件的图像帧的示意图;
图7为一个实施例中添加附加元素的图像帧的示意图;
图8为一个具体的施例中图像处理方法的流程图;
图9为一个实施例中图像处理装置的模块结构图;
图10为一个实施例中计算机设备的内部结构图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用 于限定本申请。
图1为一个实施例中图像处理方法的应用环境图。参照图1,该图像处理方法应用于图像处理系统。该图像处理系统包括终端110和服务器120。其中,终端110和服务器120通过网络连接。终端110与服务器120均可执行该图像处理方法。终端110具体可以是台式终端或移动终端,移动终端具体可以手机、平板电脑、笔记本电脑等中的至少一种。服务器120具体可以是独立的服务器,也可以是多个独立的服务器组成的服务器集群。
终端110可以获取采集的图像帧,该图像帧可以是终端110通过内置的图像采集装置或者外部连接的图像采集装置采集的,内置的图像采集装置具体可以是终端110的前置摄像头或者后置摄像头;该图像帧也可以是其它计算机设备采集后发送至终端110的。终端110继而可以在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域,并在采集时间在前的图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且采集时间在后的图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,判定检测到触发添加附加元素的动作。这样终端110即可在检测到该动作时获取附加元素,将获取的附加元素添加至采集时间在后的图像帧之后采集的图像帧中。
终端110也可将获取的图像帧发送至服务器120,由服务器120在判定采集时间在前的图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且采集时间在后的图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,通知终端110检测到触发添加附加元素的动作,终端110继而获取附加元素,将获取的附加元素添加至采集时间在后的图像帧之后采集的图像帧中。
终端110也可将获取的图像帧发送至服务器120,由服务器120在采集时间在前的图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且采集时间在后的图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,判定检测到触发添加附加元素的动作,并获取附加元素,将获取的附加元素添加至采集时间在后的图像帧之后采集的图像帧中,再将添加附加元素后的图像帧反馈至终端110。
图2为一个实施例中图像处理方法的流程示意图。本实施例以该图像处理方法应用于计算机设备来举例说明,该计算机设备可以是图1中的终端110或者服务器120。参照图2,该方法具体包括如下步骤:
S202,获取采集的图像帧。
其中,图像帧是通过物理成像原理对成像目标进行成像而得到的数据。
在一个实施例中,在计算机设备为终端时,终端具体可按照固定或动态的帧率采集图像帧,获取采集的图像帧。其中,按照固定或动态的帧率采集图像帧,能够使图像帧按照该固定或动态的帧率播放,形成连续的动态画面。
在一个实施例中,在计算机设备为终端时,终端可通过内置或者外部连接的图像采集装置,在图像采集装置当前的拍摄范围内采集图像帧,获取采集的图像帧。其中,图像采集装置的拍摄范围可因终端的姿态和位置的变化而变化。终端的图像采集装置具体可以包括前置摄像头或者后置摄像头。
在一个实施例中,在计算机设备为终端时,终端可通过社交应用提供的拍摄模式采集图像帧,获取采集的图像帧。其中,社交应用是能够基于社交网络进行网络社交互动的应用。社交应用包括即时通信应用、SNS(Social Network Service,社交网站)应用、直播应用或者拍照应用等。
在一个实施例中,在计算机设备为终端时,终端可接收另一终端发送的、由另一终端采集的图像帧,获取接收的图像帧。比如,终端通过运行在终端上的社交应用建立视频会话时,接收其他会话方所对应的终端采集后发送的图像帧。
在一个实施例中,在计算机设备为终端时,终端可通过直播应用提供的拍摄模式采集图像帧,将采集的图像帧作为直播数据,以通过直播应用进行直播。终端也可接收另一终端发送的、由另一终端通过直播应用提供的拍摄模式采集的图像帧,将接收到的图像帧作为直播数据,以通过直播应用播放其他用户通过直播应用发起的直播。
在一个实施例中,在计算机设备为服务器时,前述实施例中的终端在获取到图像帧后可上传至服务器,服务器从而获取到采集的图像帧。
在一个具体的实施例中,计算机设备为终端。终端上安装有视频录制应用。终端可根据用户指令运行该视频录制应用,通过该视频录制应用调用终端内置的摄像头采集图像帧,并在采集图像帧时,按照图像帧的采集时序实时获取采集的图像帧。
上述实施例中,计算机设备获取图像帧的帧率小于或者等于图像采集装置采集图像帧的帧率。
S204,在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域。
其中,图像语义分割是将图像中的像素按照表达语义的不同进行分割。图像语义分割用于实现对图像按照语义划分为多个像素区域。本质上,图像语义分割实现的是图像像素级的分类,通过对像素点进行分类,实现整幅图像的语义标注。需要说明的是,本申请实施例中不对分类单位进行限定,可以是逐像素分类,也可以是按图像块分类,一个图像块包括多个像素。
目标区域是图像帧中作为目标检测动作的区域。参考区域是图像帧中作为参考检测动作的区域。在不同的图像帧中,目标区域是动态区域,参考区域是静态区域。不同的图像帧中目标区域与参考区域的位置关系不同。可以理解,这里的静态区域不是绝对的静态,是相对于目标区域而言静态的区域。
举例说明,假设摄像头在采集图像帧时,用户做出了撩头动作,由于撩头是一个持续的动作。那么用户在用手做撩头动作时摄像头采集的一系列图像帧中,手是撩头动作的执行部位,那么手部区域即为目标区域,在不同的图像帧中是动态变化的;面部是撩头动作的参考部位,那么面部区域即为参考区域,相对手部而言是静态的。在此场景下,目标区域是动态区域,参考区域是相对静态区域。
再比如,摄像头在采集图像帧时,用户做出了跳跃动作,由于跳跃是一个持续的动作。那么用户在跳跃时摄像头采集的一系列图像帧中,人体是跳跃动作的执行部位,那么人体区域即为目标区域,在不同的图像帧中是动态变化的。为方便计算,也可选择脚部区域(人体局部区域)作为目标区域。地面则是跳跃动作的参考部位,那么地面区域即为参考区域。在此场景下,参考区域是绝对静态区域。
具体地,终端可将获取的图像帧编码为语义分割特征矩阵,然后解码该语义分割特征矩阵得到语义分割图像,再根据属于目标类别的像素点从语义分割图像中分割出目标区域,根据属于参考类别的像素点从语义分割图像中分割出参考区域。其中,语义分割图像中的像素点,具有表示所属分类类别的像素值,且与得到该语义分割图像的原始图像帧中的像素点对应。
本领域技术人员可以理解,语义分割特征矩阵是对图像帧中图像内容的语义特征的低维表达,涵盖了该整个图像帧的语义特征信息。语义分割图像是分割为若干个互不重叠的、具 有一定语义的区域的图像。语义分割图像中像素点的像素值用于反映相应像素点所属的分类类别。像素点的分类可以是二分类,也可以是多分类。像素点二分类,是指将语义分割图像中的像素点分为两种不同的像素值,用于代表两种不同的分类类别,比如地图图像中对应道路的像素点和其他像素点。像素点多分类,是指将语义分割图像中的像素点分为两种以上的像素值,用于代表两种以上的分类类别,比如风景地图中对应天空的像素点、对应大地的像素点以及对应人物的像素点等。语义分割图像的图像尺寸与原始图像帧的图像尺寸一致。这样,可以理解为对原始图像帧进行了逐像素点分类,根据语义分割图像中的像素点的像素值,即可得到原始图像帧中的每个像素点隶属的类别。
S206,当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
其中,第一图像帧和第二图像帧为获取到的任意图像帧,且第二图像帧的采集时间位于第一图像帧的采集时间之后。
其中,动作判定开始条件是判定开始执行特定动作的约束条件。动作判定结束条件是正在执行特定动作的约束条件。由于动作是一个持续的过程,那么可以理解,只有既获取到满足动作判定开始条件的图像帧,又在之后获取到满足动作判定结束条件的图像帧时,才能判定检测到的动作。
举例说明,假设摄像头在采集图像帧时,用户做出了撩头动作,由于撩头是一个持续的动作。那么只有在检测到用户开始撩头,之后又检测到用户正在撩头时,才能判定有撩头动作。而用户在开始撩头后立即又停止动作,这时就不能认为是检测到了撩头动作。
再比如,假设摄像头在采集图像帧时,用户做出了跳跃动作,由于跳跃是一个持续的动作。那么只有在检测到用户起跳,之后又检测到用户离开地面(正在跳跃)时,才能判定有跳跃动作。而用户在起跳后立即又停止动作未离开地面,这时就不能认为是检测到了跳跃动作。
触发添加附加元素的动作,是触发在采集的图像帧中添加附加元素的动作。触发添加附加元素的动作比如撩头动作、捂脸动作或者摸下巴动作等。附加元素是用于额外增加在图像帧中的数据。附加元素具体可以是装饰元素,比如挂件。装饰元素是能够以可视化形式展示 的用于装饰的数据。附加元素比如在图像帧中显示来修饰图像内容的数据。附加元素比如面具、盔甲、彩带、蓝天或者白云等,本申请实施例中对附加元素的种类不进行限定。附加元素可以是动态数据,比如动态图片;也可以是静态数据,比如静态图片。
在一个实施例中,触发添加附加元素的动作可以是一个或者多个。当触发添加附加元素的动作为多个时,不同的动作可以对应相同的动作判定开始条件,或者对应相同的动作判定结束条件。这多个动作可以触发添加在图像帧中统一的位置添加统一的附加元素,也可以分别触发在图像帧中不同的位置添加统一的附加元素,还可以分别触发在图像帧中不同的位置添加不同的附加元素。
可以理解,这里的采集时间在前以及采集时间在后所涉及的先后关系,是指目标区域和参考区域的位置关系满足动作判定开始条件的图像帧的采集时间,在目标区域和参考区域的位置关系满足动作判定结束条件图像帧的采集时间之前。
S208,在检测到动作时获取附加元素。
具体地,终端在检测到动作时,可以查询事先建立的动作与附加元素的对应关系,根据该对应关系查询与检测到的动作对应的附加元素,获取查询到的附加元素。
在一个实施例中,动作对应的附加元素的数量可以为一个或者多个。当动作对应的附加元素的数量可以为多个时,终端可从这多个附加元素中随机选取附加元素,也可根据当前登录的用户标识的用户标签,选取与该用户标签匹配的附加元素。
S210,将附加元素添加至第二图像帧之后采集的图像帧中。
可以理解,这里的第二图像帧,是目标区域和参考区域的位置关系满足动作判定结束条件的图像帧,是在目标区域和参考区域的位置关系满足动作判定开始条件的第一图像帧之后采集的图像帧。
具体地,计算机设备可在判定检测到触发添加附加元素的动作后,将目标区域和参考区域的位置关系满足动作判定结束条件的图像帧(也就是采集时间在后的图像帧)作为分界帧,在采集时间在该分界帧后的图像帧中添加附加元素。其中,添加附加元素的图像帧,可以是在该分界帧的采集时间后采集的全部图像帧,也可以是在该分界帧的采集时间后采集的部分图像帧,还可以包括该分界帧本帧。
在一个具体的实施例中,计算机设备为终端。终端上安装有视频录制应用。终端可根据 用户指令运行该视频录制应用,通过该视频录制应用调用终端内置的摄像头采集图像帧,并在采集图像帧时,按照图像帧的采集时序实时获取采集的图像帧。可以理解,摄像头采集图像帧是实时且持续的过程,终端获取采集的图像帧也是实时且持续的过程,终端在每获取一帧图像帧后,即判定该图像帧中目标区域和参考区域的位置关系是否满足动作判定开始条件;在满足动作判定开始条件时,则对获取的下一帧图像帧判定是否满足动作判定结束条件;在满足动作判定结束条件时,则从获取的下一帧图像帧开始添加附加元素(可以包括当前满足动作判定结束条件的图像帧)。
举例说明,终端实时采集了一系列图像帧P 1、P 2…P i、P i+1…P n,这些图像帧按采集时序排列。终端判定图像帧P 2中目标区域和参考区域的位置关系满足动作判定开始条件,且判定图像帧P i中目标区域和参考区域的位置关系满足动作判定结束条件;进而可以则判定检测到触发添加附加元素的动作。那么,终端即可从P i或P i+1开始添加附加元素。
上述图像处理方法,在获取到采集的图像帧后,自动在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域,继而再根据多帧图像帧中目标区域和参考区域的位置关系,判定是否有触发添加附加元素的动作。这样即可在判定该动作时便自动将附加元素添加至后续采集的图像帧中,避免了人工操作的繁琐步骤,极大地提高了图像处理效率。
在一个实施例中,S204包括:将获取的图像帧输入图像语义分割模型;通过图像语义分割模型输出目标区域概率分布矩阵和参考区域概率分布矩阵;根据目标区域概率分布矩阵确定获取的图像帧中的目标区域;根据参考区域概率分布矩阵确定获取的图像帧中的参考区域。
其中,图像语义分割模型是经过训练后具备语义分割功能的机器学习模型。机器学习英文全称为Machine Learning,简称ML。机器学习模型可通过样本学习具备特定的能力。机器学习模型可采用神经网络模型、支持向量机或者逻辑回归模型等。神经网络模型比如卷积神经网络等。
在本实施例中,图像语义分割模型具体为神经网络模型。神经网络模型具体可以是卷积神经网络模型(CNN)。卷积神经网络模型的卷积层(Convolution Layer)中包括多个卷积核(Convolution Kernel)。卷积核是卷积层对输入进行卷积运算的算子。每个卷积核对输入进行卷积运算后可得到一个输出。神经网络模型的池化层(Pooling Layer)层也称为采样层,用于对输入进行压缩,通常有均值池化(Mean Pooling)和最大值池化(Max Pooling)两种形 式。池化可以看作一种特殊的卷积过程。
图像语义分割模型可以理解为分类器,用于对输入图像帧中包括的像素点进行逐像素分类。图像语义分割模型的分类类别的数量可以在训练时自定义控制。在本实施例中,图像语义分割模型被设置为多分类器,分类类别包括目标类别、参考类别和背景类别三种。模型输入图像帧,属于目标类别的像素点即为目标区域的像素点,属于参考类别的像素点即为参考区域的像素点,属于背景类别的像素点即为背景区域的像素点。这样即可根据像素点所属的类别对像素点进行划分,确定获取的图像帧中的目标区域和参考区域。
举例说明,当触发添加附加元素的动作为撩头动作时,目标类别即为手部类别,参考类别即为面部类别。获取的图像帧中属于手部类别的像素点即为手部区域的像素点,属于面部类别的像素点即为面部区域的像素点。这样即可根据像素点所属的类别对像素点进行划分,确定获取的图像帧中的手部区域和面部区域。
目标区域概率分布矩阵的矩阵元素,具有表示属于目标类别的概率值,且与输入模型的图像帧中的像素点对应。也就是说,假设输入模型的图像帧为2*2,那么目标区域概率分布矩阵也为2*2,矩阵位置(m,n)的矩阵元素的值即为图像帧中像素位置(m,n)的像素点属于目标类别的概率。其中,矩阵(图像帧)以左上角的矩阵位置(像素位置)为(0,0)。
同理,参考区域概率分布矩阵的矩阵元素,具有表示属于参考类别的概率值,且与输入模型的图像帧中的像素点对应。也就是说,假设输入模型的图像帧为2*2,那么目标区域概率分布矩阵也为2*2,矩阵位置(m,n)的矩阵元素的值即为图像帧中像素位置(m,n)的像素点属于参考类别的概率。
具体地,终端可将获取的图像帧输入事先训练好的图像语义分割模型,通过图像语义分割模型输出目标区域概率分布矩阵和参考区域概率分布矩阵。终端可再将目标区域概率分布矩阵中概率值大于预设概率的矩阵元素所对应的像素点围成的区域确定为目标区域,并将参考区域概率分布矩阵中概率值大于预设概率的矩阵元素所对应的像素点围成的区域确定为参考区域。预设概率是事先设定的用于判定是否被分类为当前类别的分界值。
图3为一个具体的实施例中图像语义分割模型对获取的图像帧进行处理的原理示意图。参考图3,图像语义分割模型为U型对称模型,在前的网络层的输出通过跳跃连接(Skip connection)作为对应位置的网络层的输入。图像语义分割模型的输入为获取的图像帧的特征 图(如RGB三通道特征图),图像语义分割模型中的网络层对输入该层的特征图进行操作得到特征图输出,图像语义分割模型的输出可以是语义分割图像,也可以是概率分布矩阵,根据训练时的样本和标签决定。其中,图中m*n*k(如3*256*256、或64*256*256)中m表示特征图的数量,n*k表示特征图的尺寸。可以理解,图中的参数均为示例,不对实际使用的模型参数进行限定。网络层对特征图的操作包括:卷积变化(Convolution)、归一变化(BatchNorm)、激活变化(ReLU)、最大池化(MaxPool)以及上采样(Upsampling)等。
上述实施例中,在获取到图像帧后,即自动将该图像帧输入训练好的机器学习模型,根据机器学习模型输出目标区域概率分布矩阵和参考区域概率分布矩阵来确定目标区域和参考区域。其中,概率分布矩阵中的矩阵元素,具有表示图像帧中对应的像素点属于特定分类类别的概率值,这样即可自动根据属于目标类别的像素点来确定目标区域,根据属于参考类别的像素点来确定参考区域,提高了图像区域划分的准确率,且为后续判断动作判定开始条件或者动作判定结束条件是否满足奠定了基础。
在一个实施例中,目标区域为手部区域;参考区域为面部区域。该图像处理方法还包括:确定获取的图像帧中的手部区域所对应的手势类型。当手势类型为触发添加附加元素的手势类型时,即可判断获取的图像帧中目标区域和参考区域的位置关系是否满足动作判定开始条件。
当第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,包括:如果第一图像帧的手势类型为触发类型,当第一图像帧中手部区域和面部区域的位置关系满足动作判定开始条件、且第一图像帧中手部区域和面部区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
其中,手部与面部均为生物(人或动物)的肢体部分。手部区域是手部所在的区域。手部区域可以是手部轮廓以内围成的区域,也可以是包括手部且手部占比高的规则区域。面部区域可以是面部轮廓围成的区域,也可以是包括面部且面部占比高的规则区域。手势是由用户通过手部做出的动作形态。手势类型是获取的图像帧中手势所属的类型。
图4示出了一个实施例中从获取的图像帧中分割出手部区域的示意图。参考图4(a)为获取的图像帧,终端可通过图像语义分割确定该图像帧中的手部区域401。再参考图4(b) 为从获取的包括手部区域的图像帧中,按照规则形状分割出的手部区域得到的图像。
可以理解,相比于直接对获取的原始图像中手部区域所对应的手势类型进行识别,从获取的原始图像中分割出手部区域之后再对分割出的手部区域进行识别,避免手部区域占整个图像的比例较小时识别不准确的问题,能够减少原始图像中相对于手部区域的背景区域对手部区域中手势的手势类型进行识别的干扰,可以提高识别的准确度。
具体地,计算机设备可采用预先训练好的手势识别模型对图像帧中手势所属的手势类型进行识别。从获取的图像帧中截取手部区域得到手部图像,输入手势识别模型中,通过手势识别模型中的隐藏层对手部图像对应的特征进行运算,输出手部图像中手势的手势类型,在识别出的手势类型为触发添加附加元素的手势类型时,才继续判定该获取的图像帧中手部区域和面部区域的位置关系是否满足动作判定开始条件,否则识别获取的下一帧图像帧中的手部区域所对应的手势类型是否为触发添加附加元素的手势类型。
进一步地,终端在识别出获取的某帧图像帧中手部区域所对应的手势类型为触发添加附加元素的手势类型,且该图像帧中手部区域和面部区域的位置关系满足动作判定开始条件时,才继续判定在该图像帧后获取的下一帧图像帧中手部区域所对应的手势类型是否为触发添加附加元素的手势类型,并在判定该下一帧图像帧中手部区域所对应的手势类型为触发添加附加元素的手势类型,才继续判定该下一帧图像帧中手部区域和面部区域的位置关系是否满足动作判定结束条件,直到检测到在后采集的另外一帧图像帧中手部区域所对应的手势类型为触发添加附加元素的手势类型,且该图像帧中手部区域和面部区域的位置关系满足动作判定结束条件时,就判定检测到了触发添加附加元素的动作。
其中,手势识别模型为机器学习模型。当计算机设备预先设置的触发添加附加元素的手势类型唯一时,手势识别模型即为二分类模型。用于训练二分类模型的图像样本包括属于触发添加附加元素的手势类型的正样本,及不属于触发添加附加元素的手势类型的负样本。当计算机设备预先设置的触发添加附加元素的手势类型多样时,手势识别模型即为多分类模型。用于训练多分类模型的图像样本包括属于各触发添加附加元素的手势类型的样本。手势识别模型具体可利用ConvNet Configuration模型作为初始模型,根据训练样本训练该初始模型,得到适用于手势识别的模型参数。
在一个实施例中,计算机设备还可将从获取的图像帧中截取手部区域得到的手部图像, 与属于触发添加附加元素的手势类型的手部图像模板进行特征匹配,在匹配成功时,判定获取的图像帧中的手部区域所对应的手势类型为触发添加附加元素的手势类型。
上述实施例中,在具体的目标区域为手部区域、参考区域为面部区域的场景下,在识别出获取的图像帧中的手部区域所对应的手势类型为触发添加附加元素的手势类型时,才继续判断动作判定开始条件或者动作判定结束条件是否满足,避免了在无效手势下判断动作判定开始条件或者动作判定结束条件造成的资源浪费,提高了图像处理效率。
在一个实施例中,当第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,包括:从第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件时开始计时;在计时时长未达到预设时长、且第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
可以理解,动作不仅具有持续性还具有连贯性。通俗地说,就是在动作开始后是连贯地完成的。比如,对于撩头动作,只有在检测到用户开始撩头,之后又连贯地持续撩头(也就是在一定时间范围内检测到正在撩头)时,才能判定有撩头动作;而用户在开始撩头后立即停止动作,并等待较长时间后再继续撩头,这时就不能认为是检测到了撩头动作。再比如,对于跳跃动作,只有既检测到用户起跳,之后又检测到用户继起跳后连贯地离开地面(正在跳跃)时,才能判定有跳跃动作;而用户在起跳后立即停止动作未离开地面,并等待较长时间后再离开地面,这时就不能认为是检测到了跳跃动作。
具体地,计算机设备在每获取一帧图像帧时,便判断该图像帧是否满足动作判定开始条件,在判定某帧图像帧满足动作判定开始条件时开始计时。这样,计算机设备便在计时的时候,继续获取图像帧,并判断该图像帧是否满足动作判定结束条件。只有在计时时长未达到预设时长、且继续获取的图像帧满足动作判定结束条件时,才判定检测到触发添加附加元素的动作。若计算机设备直到计时时长达到预设时长,仍未检测到在计时时间段内继续获取的图像帧满足动作判定结束条件时,则判定未检测到触发添加附加元素的动作。此时,计算机设备则继续获取图像帧,不再判断该图像帧是否满足动作判定结束条件,而是判断该图像帧是否满足动作判定开始条件,从而继续在判定某帧图像帧满足动作判定开始条件时开始计时,以继续检测触发添加附加元素的动作。其中,预设时长是根据实际经验判定动作形成的时长。
在一个实施例中,该图像处理方法还包括:当第一图像帧中目标区域和参考区域的交集占目标区域的占比超过第一预设数值时,则判定第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件;或者,当第一图像帧中目标区域和参考区域的交集占目标区域的占比超过第二预设数值、且目标区域的中心位置位于参考区域中心位置的上方时,判定第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件。
其中,第一预设数值和第二预设数值是预先设置的数值。第一预设数值具体地可以为0.5,第二预设数值具体可以为0.2。可以理解,获取的图像帧中目标区域和参考区域的交集占目标区域的占比超过第一预设数值,或者获取的图像帧中目标区域和参考区域的交集占目标区域的占比超过第二预设数值、且目标区域的中心位置位于参考区域中心位置的上方,是根据实际经验确定的满足动作判定开始条件时,目标区域和参考区域的位置关系。
图5为一个实施例中满足动作判定开始条件的图像帧的示意图。在本实施例中触发添加附加元素的动作为撩头动作,目标区域为手部区域,参考区域为面部区域。参考图5(a),可以看出手部区域和面部区域的交集占手部区域的占比超过第一预设数值(0.5),可以判定该图像帧中目标区域和参考区域的位置关系满足动作判定开始条件。在参考图5(b),可以看出手部区域和面部区域的交集占手部区域的占比超过第二预设数值(0.2)、且手部区域的中心位置O1位于面部区域中心位置O2的上方时,可以判定该图像帧中目标区域和参考区域的位置关系满足动作判定开始条件。
在一个实施例中,该图像处理方法还包括:确定第二图像帧中的参考区域中的参考位置;当第二图像帧中目标区域位于参考区域中的参考位置之上时,判定第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件。
其中,参考位置是用来判定图像帧中目标区域与参考区域的位置关系是否满足动作判定结束条件的对照位置。可以理解,图像帧中目标区域位于参考区域中的参考位置之上是根据实际经验确定的满足动作判定结束条件时,目标区域和参考区域的位置关系。
图6为一个实施例中满足动作判定结束条件的图像帧的示意图。在本实施例中触发添加附加元素的动作为撩头动作,目标区域为手部区域,参考区域为面部区域,参考位置为面部区域中眉毛所在位置。参考图6,可以看出该图像帧中手部区域位于面部区域中的眉毛位置之上,可以判定该图像帧中目标区域和参考区域的位置关系满足动作判定结束条件。
上述实施例中,提供了具体判断图像帧是否满足动作判定开始条件或动作判定结束条件的依据,保证了动作判定的有效进行。而且,只有在判定获取的图像帧满足动作判定开始条件,且在之后的预设时长内继续获取的图像帧判定满足动作判定结束条件时,才判定检测到了动作,使得动作的检测符合实际认知且有效。
可以理解,判定获取的图像帧满足动作判定开始条件,且在之后的预设时长内继续获取的图像帧判定满足动作判定结束条件,表示满足动作判定开始条件的图像帧的采集时间与满足动作判定结束条件的图像帧的采集时间之间的时间间隔小于或等于预设时长。
在一个实施例中,目标区域为手部区域;参考区域为面部区域;参考位置为眉毛所在位置。将附加元素添加至第二图像帧之后采集的图像帧中,包括:在第二图像帧之后采集的图像帧中,确定面部区域中眉毛所在位置与手部区域靠近眉毛所在位置的边界形成的区域;在第二图像帧后采集的图像帧中,将附加元素自适应添加至确定的区域。
具体地,计算机设备可对第二图像帧后采集的图像帧进行人脸检测,确定该第二图像帧中人脸区域中的左右眉基准点,根据该基准点确定眉毛所在位置,再确定面部区域中眉毛所在位置与手部区域靠近眉毛所在位置的边界形成的区域,从而将附加元素自适应添加至确定的区域。
其中,将附加元素自适应添加至确定的区域,可以是将附加元素的尺寸调整至确定的区域的尺寸,这样,确定的区域的尺寸会随着手部区域的动作逐渐增大,而附加元素也随着确定的区域的尺寸增大而逐渐增大显示尺寸。将附加元素自适应添加至确定的区域,也可以是将附加元素的部分区域添加至确定的区域,其中,附加元素的部分区域以附加元素的某一边界为边界,该边界与确定的区域的边界对应,这样,确定的区域的尺寸会随着手部区域的动作逐渐增大,而附加元素也随着确定的区域的尺寸增大而逐渐由局部显示变为全部显示,且显示的局部越来越大。
图7为一个实施例中添加附加元素的图像帧的示意图。在本实施例中触发添加附加元素的动作为撩头动作,目标区域为手部区域,参考区域为面部区域,参考位置为面部区域中眉毛所在位置。参考图7,可以看出从(a)至(b)面部区域中眉毛所在位置与手部区域靠近眉毛所在位置的边界形成的区域逐渐增大,在该区域中添加的附加元素的尺寸也越来越大。
在本实施例中,将附加元素自适应添加至确定的区域直至完整添加,而非直接添加完整 的附加元素,避免了附加元素添加过程的单一性和突兀,通过根据手部区域的移动逐渐自适应添加附加元素提高增强了交互性。
在另外的实施例中,计算机设备还可对附加元素的边界作虚化处理。
在一个实施例中,该图像处理方法还包括:在未检测到触发添加附加元素的动作时,将获取的图像帧按照采集的时序逐帧播放;在检测到触发添加附加元素的动作后,将添加附加元素后的图像帧按照采集的时序逐帧播放。
具体地,计算机设备可在采集图像帧后,实时播放采集的图像帧。在未检测到触发添加附加元素的动作时,也就是说当前采集到的图像帧无需添加附加元素,那么则可直接渲染采集的图像帧形成预览画面,以可视化方式展示采集的图像帧。在检测到触发添加附加元素的动作后,也就是说当前采集到的图像帧需要添加附加元素,那么则将附加元素添加到在检测到触发添加附加元素的动作后采集的图像帧中,渲染添加附加元素后的图像帧形成预览画面,以可视化方式展示添加附加元素后的图像帧。
在本实施例中,在拍摄过程中,一边对采集的图像帧检测触发添加附加元素的动作,一边根据采集的视频帧或添加附加元素后的图像帧实时生成预览画面,以供用户观看。这样用户便可实时了解录制成视频的内容,以便在出现错误时及时修正或重新录制。
在一个实施例中,该图像处理方法还包括:用添加附加元素后的图像帧,替换添加附加元素前的相应图像帧;根据替换后所确定的图像帧的采集时间,将替换后所确定的图像帧按采集时间的时序生成录制的视频;其中,替换后所确定的图像帧中,添加附加元素后的图像帧的采集时间,是添加附加元素前的相应图像帧的采集时间。
其中,替换后所确定的图像帧,包括在添加附加元素前原始采集的图像帧,还包括在添加附加元素后,通过添加附加元素得到的图像帧。也即是,对于获取的多个图像帧来说,有些图像帧中未添加附加元素,而有些图像帧中添加了附加元素,因此所确定的多个图像帧中,既包括未添加附加元素的图像帧,即原始采集的图像帧,也包括添加附加元素后的图像帧,即通过替换得到的图像帧。
其中,替换后所确定的图像帧中,未进行替换操作的原始图像帧的采集时间,是该图像帧真实的采集时间。替换后所确定的图像帧中通过替换得到的图像帧的采集时间,是添加附件元素前的相应图像帧的采集时间。
举例说明,原始采集图像帧A、B、C和D,从图像帧C开始添加附加元素。对图像帧C添加附加元素得到图像帧C1,对图像帧D添加附加元素得到图像帧D1。那么则用图像帧C1来替换图像帧C,用图像帧D1来替换图像帧D,替换后所确定的图像帧即为A、B、C1和D1,也就是用这些图像帧来生成视频。
具体地,计算机设备可用添加附加元素后得到的图像帧,替换添加附加元素前的相应图像帧,再根据替换后所确定的各图像帧的采集时间,将替换后所确定的图像帧按采集时间的时序生成录制的视频。其中,按采集时间的时序可以是按时间逆序,也可以是按时间顺序。
进一步地,计算机设备在生成录制的视频后,可将该视频分享至社交会话中,或者将视频发布至社交发布平台。
在本实施例中,实现了在拍摄过程中即自动且实时地对采集的图像帧进行处理,并实时地生成视频,避免了需要后续手动处理带来的繁琐步骤,极大地简化了操作,提高了视频生成效率。
图8示出了一个具体的实施例中图像处理方法的流程图。在本实施例中,触发添加附加元素的动作为撩头动作,目标区域为手部区域,参考区域为面部区域,参考位置为面部区域中眉毛所在位置。计算机设备为终端。终端上安装有视频录制应用。终端可根据用户指令运行该视频录制应用,通过该视频录制应用调用终端内置的摄像头采集图像帧,并在采集图像帧时,按照图像帧的采集时序实时获取采集的图像帧。
终端在获取采集的图像帧后,可确定获取的图像帧中的手部区域所对应的手势类型,判断该手势类型是否为触发类型;若否,则获取下一帧图像帧,继续确定该图像帧中的手部区域所对应的手势类型;若是,则判断图像帧中目标区域和参考区域的位置关系是否满足动作判定开始条件。其中,动作判定开始条件为:图像帧中目标区域和参考区域的交集占目标区域的占比超过第一预设数值,或者,图像帧中目标区域和参考区域的交集占目标区域的占比超过第二预设数值、且目标区域的中心位置位于参考区域中心位置的上方。
终端在判定图像帧未满足动作判定开始条件时,则获取下一帧图像帧,继续确定该图像帧中的手部区域所对应的手势类型;在判定图像帧满足动作判定开始条件时,则开始计时并继续获取下一帧图像帧。终端继而确定继续获取的图像帧中的手部区域所对应的手势类型,判断该手势类型是否为触发类型;若否,则获取下一帧图像帧,继续确定该图像帧中的手部 区域所对应的手势类型;若是,则判断继续获取的图像帧中目标区域和参考区域的位置关系是否满足动作判定结束条件。其中,动作判定结束条件为:图像帧中目标区域位于参考区域中的参考位置之上。
终端在计时时长未达到预设时长时检测到图像帧满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,在继续获取的图像帧后采集的图像帧中,确定面部区域中眉毛所在位置与手部区域靠近眉毛所在位置的边界形成的区域;在继续获取的图像帧后采集的图像帧中,将附加元素自适应添加至确定的区域。终端在计时时长达到预设时长是仍未检测到图像帧满足动作判定结束条件时,获取下一帧图像帧,继续确定该图像帧中的手部区域所对应的手势类型,并在手势类型为触发类型时检测动作判定开始条件是否满足。
终端还可在图像处理时,实时用添加附加元素后的图像帧,替换添加附加元素前的相应图像帧,根据替换后所确定的图像帧的采集时间,将替换后所确定的图像帧按采集时间的时序生成录制的视频;也可以在结束图像帧采集后,根据替换后所确定的图像帧的采集时间,将替换后所确定的图像帧按采集时间的时序生成录制的视频。
应该理解的是,虽然上述各实施例的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述各实施例中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
如图9所示,在一个实施例中,提供了一种图像处理装置900。参照图9,该图像处理装置900包括:获取模块901、确定模块902、判定模块903和添加模块904。
获取模块901,用于获取采集的图像帧。
确定模块902,用于在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域。
判定模块903,用于当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判 定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,第二图像帧的采集时间位于第一图像帧的采集时间之后。
添加模块904,用于在检测到动作时获取附加元素;将附加元素添加至第二图像帧之后采集的图像帧中。
在一个实施例中,确定模块902还用于将获取的图像帧输入图像语义分割模型;通过图像语义分割模型输出目标区域概率分布矩阵和参考区域概率分布矩阵;根据目标区域概率分布矩阵确定获取的图像帧中的目标区域;根据参考区域概率分布矩阵确定获取的图像帧中的参考区域。
在一个实施例中,目标区域为手部区域;参考区域为面部区域。确定模块902还用于确定获取的图像帧中的手部区域所对应的手势类型。判定模块903还用于如果第一图像帧的手势类型为触发类型,当第一图像帧中手部区域和面部区域的位置关系满足动作判定开始条件、且第二图像帧中手部区域和面部区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
在一个实施例中,判定模块903还用于从第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件时开始计时;在计时时长未达到预设时长、且在第一图像帧之后获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
在一个实施例中,判定模块903还用于在第一图像帧中目标区域和参考区域的交集占目标区域的占比超过第一预设数值时,判定第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件;或者,在第一图像帧中目标区域和参考区域的交集占目标区域的占比超过第二预设数值、且目标区域的中心位置位于参考区域中心位置的上方时,判定第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件。
在一个实施例中,判定模块903还用于确定第二图像帧中的参考区域中的参考位置;当第二图像帧中目标区域位于参考区域中的参考位置之上时,判定第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件。
在一个实施例中,目标区域为手部区域;参考区域为面部区域;参考位置为眉毛所在位 置。添加模块904还用于在第二图像帧之后采集的图像帧中,确定面部区域中眉毛所在位置与手部区域靠近眉毛所在位置的边界形成的区域;在第二图像帧之后采集的图像帧中,将附加元素添加至确定的区域。
在一个实施例中,获取模块901还用于在未检测到触发添加附加元素的动作时,将获取的图像帧按照采集的时序逐帧播放。添加模块904还用于在检测到触发添加附加元素的动作后,将添加附加元素后的图像帧按照采集的时序逐帧播放。
在一个实施例中,添加模块904还用于用添加附加元素后的图像帧,替换添加附加元素前的相应图像帧;根据替换后所确定的图像帧的采集时间,将替换后所确定的图像帧按采集时间的时序生成录制的视频;其中,替换后所确定的图像帧中,添加附加元素后的图像帧的采集时间,是添加附加元素前的相应图像帧的采集时间。
图10示出了一个实施例中计算机设备的内部结构图。该计算机设备具体可以是图1中的终端110或服务器120。如图10所示,该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中,存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统,还可存储有计算机程序,该计算机程序被处理器执行时,可使得处理器实现图像处理方法。该内存储器中也可储存有计算机程序,该计算机程序被处理器执行时,可使得处理器执行图像处理方法。本领域技术人员可以理解,图10中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,本申请提供的图像处理装置可以实现为一种计算机程序的形式,计算机程序可在如图10所示的计算机设备上运行,计算机设备的非易失性存储介质可存储组成该图像处理装置的各个程序模块,比如,图9所示的获取模块901、确定模块902、判定模块903和添加模块904等。各个程序模块组成的计算机程序使得处理器执行本说明书中描述的本申请各个实施例的图像处理方法中的步骤。
例如,图10所示的计算机设备可以通过如图9所示的图像处理装置900中的获取模块901获取采集的图像帧。通过确定模块902在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域。通过判定模块903当获取的第一图像帧中目标区域和参考区域的位置 关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,第二图像帧的采集时间位于第一图像帧的采集时间之后。通过添加模块904在检测到动作时获取附加元素;将附加元素添加至第二图像帧之后采集的图像帧中。
在一个实施例中,提供了一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时,使得处理器执行以下步骤:获取采集的图像帧;在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域;当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,第二图像帧的采集时间位于第一图像帧的采集时间之后;在检测到动作时获取附加元素;将附加元素添加至第二图像帧之后采集的图像帧中。
在一个实施例中,在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域,包括:将获取的图像帧输入图像语义分割模型;通过图像语义分割模型输出目标区域概率分布矩阵和参考区域概率分布矩阵;根据目标区域概率分布矩阵确定获取的图像帧中的目标区域;根据参考区域概率分布矩阵确定获取的图像帧中的参考区域。
在一个实施例中,目标区域为手部区域;参考区域为面部区域。该计算机程序还使得处理器执行以下步骤:确定获取的图像帧中的手部区域所对应的手势类型;
该计算机程序还使得处理器执行以下步骤:如果第一图像帧的手势类型为触发类型,当第一图像帧中手部区域和面部区域的位置关系满足动作判定开始条件、且第二图像帧中手部区域和面部区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
在一个实施例中,该计算机程序还使得处理器执行以下步骤:从第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件时开始计时;在计时时长未达到预设时长、且在第一图像帧之后获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
在一个实施例中,该计算机程序还使得处理器执行以下步骤:在第一图像帧中目标区域和参考区域的交集占目标区域的占比超过第一预设数值时,判定第一图像帧中目标区域和参 考区域的位置关系满足动作判定开始条件;或者,在第一图像帧中目标区域和参考区域的交集占目标区域的占比超过第二预设数值、且目标区域的中心位置位于参考区域中心位置的上方时,判定第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件。
在一个实施例中,该计算机程序还使得处理器执行以下步骤:确定第二图像帧中的参考区域中的参考位置;当第二图像帧中目标区域位于参考区域中的参考位置之上时,判定第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件。
在一个实施例中,目标区域为手部区域;参考区域为面部区域;参考位置为眉毛所在位置。将附加元素添加至在采集时间在后的图像帧后采集的图像帧中,包括:在第二图像帧之后采集的图像帧中,确定面部区域中眉毛所在位置与手部区域靠近眉毛所在位置的边界形成的区域;在第二图像帧之后采集的图像帧中,将附加元素添加至确定的区域。
在一个实施例中,该计算机程序还使得处理器执行以下步骤:在未检测到触发添加附加元素的动作时,将获取的图像帧按照采集的时序逐帧播放;在检测到触发添加附加元素的动作后,将添加附加元素后的图像帧按照采集的时序逐帧播放。
在一个实施例中,该计算机程序还使得处理器执行以下步骤:用添加附加元素后的图像帧,替换添加附加元素前的相应图像帧;根据替换后所确定的图像帧的采集时间,将替换后所确定的图像帧按采集时间的时序生成录制的视频;其中,替换后所确定的图像帧中,添加附加元素后的图像帧的采集时间,是添加附加元素前的相应图像帧的采集时间。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器中储存有计算机程序,计算机程序被处理器执行时,使得处理器执行以下步骤:其中,上述计算机程序被处理器执行时,使得处理器执行以下步骤:
获取采集的图像帧;
在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域;
当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,第二图像帧的采集时间位于第一图像帧的采集时间之后;
在检测到动作时获取附加元素;
将附加元素添加至第二图像帧之后采集的图像帧中。
在一个实施例中,计算机程序被处理器执行在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域的步骤时,使得处理器执行以下步骤:
将获取的图像帧输入图像语义分割模型;
通过图像语义分割模型输出目标区域概率分布矩阵和参考区域概率分布矩阵;
根据目标区域概率分布矩阵确定获取的图像帧中的目标区域;
根据参考区域概率分布矩阵确定获取的图像帧中的参考区域。
在一个实施例中,目标区域为手部区域;参考区域为面部区域;计算机程序被处理器执行时,使得处理器执行以下步骤:
确定获取的图像帧中的手部区域所对应的手势类型;
计算机程序被处理器执行当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作的步骤时,使得处理器执行以下步骤:
如果第一图像帧的手势类型为触发类型,当第一图像帧中手部区域和面部区域的位置关系满足动作判定开始条件、且第二图像帧中手部区域和面部区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
在一个实施例中,计算机程序被处理器执行当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作的步骤时,使得处理器执行以下步骤:
从第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件时开始计时;
在计时时长未达到预设时长、且在第一图像帧之后获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行以下步骤:
在第一图像帧中目标区域和参考区域的交集占目标区域的占比超过第一预设数值时,判定第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件;或者,
在第一图像帧中目标区域和参考区域的交集占目标区域的占比超过第二预设数值、且目标区域的中心位置位于参考区域中心位置的上方时,判定第一图像帧中目标区域和参考区域 的位置关系满足动作判定开始条件。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行以下步骤:
确定第二图像帧中的参考区域中的参考位置;
当第二图像帧中目标区域位于参考区域中的参考位置之上时,判定第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件。
在一个实施例中,目标区域为手部区域;参考区域为面部区域;参考位置为眉毛所在位置;计算机程序被处理器执行将附加元素添加至在第二图像帧之后采集的图像帧中的步骤时,使得处理器执行以下步骤:
在第二图像帧之后采集的图像帧中,确定面部区域中眉毛所在位置与手部区域靠近眉毛所在位置的边界形成的区域;
在第二图像帧之后采集的图像帧中,将附加元素添加至确定的区域。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行以下步骤:
在未检测到触发添加附加元素的动作时,将获取的图像帧按照采集的时序逐帧播放;
在检测到触发添加附加元素的动作后,将添加附加元素后的图像帧按照采集的时序逐帧播放。
在一个实施例中,计算机程序被处理器执行时,使得处理器执行以下步骤:
用添加附加元素后的图像帧,替换添加附加元素前的相应图像帧;
根据替换后所确定的图像帧的采集时间,将替换后所确定的图像帧按采集时间的时序生成录制的视频;
其中,替换后所确定的图像帧中,添加附加元素后的图像帧的采集时间,是添加附加元素前的相应图像帧的采集时间。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储 器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (23)

  1. 一种图像处理方法,其特征在于,应用于图像处理系统,所述方法包括:
    获取采集的图像帧;
    在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域;
    当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,所述第二图像帧的采集时间位于所述第一图像帧的采集时间之后;
    在检测到所述动作时获取附加元素;
    将所述附加元素添加至所述第二图像帧之后采集的图像帧中。
  2. 根据权利要求1所述的方法,其特征在于,所述在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域,包括:
    将获取的图像帧输入图像语义分割模型;
    通过所述图像语义分割模型输出目标区域概率分布矩阵和参考区域概率分布矩阵;
    根据所述目标区域概率分布矩阵确定获取的图像帧中的目标区域;
    根据所述参考区域概率分布矩阵确定获取的图像帧中的参考区域。
  3. 根据权利要求2所述的方法,其特征在于,所述目标区域为手部区域;所述参考区域为面部区域;所述方法还包括:
    确定获取的图像帧中的手部区域所对应的手势类型;
    所述当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,包括:
    如果所述第一图像帧的手势类型为触发类型,当所述第一图像帧中手部区域和面部区域的位置关系满足动作判定开始条件、且所述第二图像帧中手部区域和面部区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
  4. 根据权利要求1所述的方法,其特征在于,所述当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位 置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,包括:
    从所述第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件时开始计时;
    在计时时长未达到预设时长、且在所述第一图像帧之后获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    在所述第一图像帧中目标区域和参考区域的交集占所述目标区域的占比超过第一预设数值时,判定所述第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件;或者,
    在所述第一图像帧中目标区域和参考区域的交集占所述目标区域的占比超过第二预设数值、且所述目标区域的中心位置位于所述参考区域中心位置的上方时,判定所述第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件。
  6. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    确定所述第二图像帧中的参考区域中的参考位置;
    当所述第二图像帧中目标区域位于参考区域中的参考位置之上时,判定所述第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件。
  7. 根据权利要求6所述的方法,其特征在于,所述目标区域为手部区域;所述参考区域为面部区域;所述参考位置为眉毛所在位置;
    所述将所述附加元素添加至在所述第二图像帧之后采集的图像帧中,包括:
    在所述第二图像帧之后采集的图像帧中,确定面部区域中眉毛所在位置与手部区域靠近所述眉毛所在位置的边界形成的区域;
    在所述第二图像帧之后采集的图像帧中,将所述附加元素添加至确定的所述区域。
  8. 根据权利要求1-7中任一项所述的方法,其特征在于,所述方法还包括:
    在未检测到所述触发添加附加元素的动作时,将获取的图像帧按照采集的时序逐帧播放;
    在检测到所述触发添加附加元素的动作后,将添加附加元素后的图像帧按照采集的时序逐帧播放。
  9. 根据权利要求1-7中任一项所述的方法,其特征在于,所述方法还包括:
    用添加所述附加元素后的图像帧,替换添加所述附加元素前的相应图像帧;
    根据替换后所确定的图像帧的采集时间,将替换后所确定的图像帧按采集时间的时序生成录制的视频;
    其中,替换后所确定的图像帧中,添加所述附加元素后的图像帧的采集时间,是添加所述附加元素前的相应图像帧的采集时间。
  10. 一种图像处理装置,其特征在于,包括:
    获取模块,用于获取采集的图像帧;
    确定模块,用于在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域;
    判定模块,用于当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,所述第二图像帧的采集时间位于所述第一图像帧的采集时间之后;
    添加模块,用于在检测到所述动作时获取附加元素;将所述附加元素添加至所述第二图像帧之后采集的图像帧中。
  11. 根据权利要求10所述的装置,其特征在于,所述确定模块还用于将获取的图像帧输入图像语义分割模型;通过所述图像语义分割模型输出的目标区域概率分布矩阵和参考区域概率分布矩阵;根据所述目标区域概率分布矩阵确定获取的图像帧中的目标区域;根据所述参考区域概率分布矩阵确定获取的图像帧中的参考区域。
  12. 根据权利要求10所述的装置,其特征在于,所述判定模块还用于从所述第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件时开始计时;在计时时长未达到预设时长、且在所述第一图像帧之后获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
  13. 根据权利要求12所述的装置,其特征在于,所述判定模块还用于当所述第一图像帧中目标区域和参考区域的交集占所述目标区域的占比超过第一预设数值时,则判定所述第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件;或者,当所述第一图像帧中目标区域和参考区域的交集占所述目标区域的占比超过第二预设数值、且所述目标区域的中心位置位于所述参考区域中心位置的上方时,判定所述第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件。
  14. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如权利要求1至9中任一项所述的方法的步骤。
  15. 一种计算机设备,其特征在于,包括存储器和处理器,所述存储器中储存有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行以下步骤:
    获取采集的图像帧;
    在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域;
    当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作,所述第二图像帧的采集时间位于所述第一图像帧的采集时间之后;
    在检测到所述动作时获取附加元素;
    将所述附加元素添加至所述第二图像帧之后采集的图像帧中。
  16. 根据权利要求15所述的计算机设备,其特征在于,所述计算机程序被所述处理器执行在获取的图像帧中确定通过图像语义分割得到的目标区域和参考区域的步骤时,使得所述处理器执行以下步骤:
    将获取的图像帧输入图像语义分割模型;
    通过所述图像语义分割模型输出目标区域概率分布矩阵和参考区域概率分布矩阵;
    根据所述目标区域概率分布矩阵确定获取的图像帧中的目标区域;
    根据所述参考区域概率分布矩阵确定获取的图像帧中的参考区域。
  17. 根据权利要求16所述的计算机设备,其特征在于,所述目标区域为手部区域;所述参考区域为面部区域;所述计算机程序被所述处理器执行时,使得所述处理器执行以下步骤:
    确定获取的图像帧中的手部区域所对应的手势类型;
    所述计算机程序被所述处理器执行当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作的步骤时,使得所述处理器执行以下步骤:
    如果所述第一图像帧的手势类型为触发类型,当所述第一图像帧中手部区域和面部区域 的位置关系满足动作判定开始条件、且所述第二图像帧中手部区域和面部区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
  18. 根据权利要求15所述的计算机设备,其特征在于,所述计算机程序被所述处理器执行当获取的第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件、且获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作的步骤时,使得所述处理器执行以下步骤:
    从所述第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件时开始计时;
    在计时时长未达到预设时长、且在所述第一图像帧之后获取的第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件时,则判定检测到触发添加附加元素的动作。
  19. 根据权利要求18所述的计算机设备,其特征在于,所述计算机程序被所述处理器执行时,使得所述处理器执行以下步骤:
    在所述第一图像帧中目标区域和参考区域的交集占所述目标区域的占比超过第一预设数值时,判定所述第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件;或者,
    在所述第一图像帧中目标区域和参考区域的交集占所述目标区域的占比超过第二预设数值、且所述目标区域的中心位置位于所述参考区域中心位置的上方时,判定所述第一图像帧中目标区域和参考区域的位置关系满足动作判定开始条件。
  20. 根据权利要求18所述的计算机设备,其特征在于,所述计算机程序被所述处理器执行时,使得所述处理器执行以下步骤:
    确定所述第二图像帧中的参考区域中的参考位置;
    当所述第二图像帧中目标区域位于参考区域中的参考位置之上时,判定所述第二图像帧中目标区域和参考区域的位置关系满足动作判定结束条件。
  21. 根据权利要求20所述的计算机设备,其特征在于,所述目标区域为手部区域;所述参考区域为面部区域;所述参考位置为眉毛所在位置;所述计算机程序被所述处理器执行将所述附加元素添加至在所述第二图像帧之后采集的图像帧中的步骤时,使得所述处理器执行以下步骤:
    在所述第二图像帧之后采集的图像帧中,确定面部区域中眉毛所在位置与手部区域靠近 所述眉毛所在位置的边界形成的区域;
    在所述第二图像帧之后采集的图像帧中,将所述附加元素添加至确定的所述区域。
  22. 根据权利要求15-21中任一项所述的计算机设备,其特征在于,所述计算机程序被所述处理器执行时,使得所述处理器执行以下步骤:
    在未检测到所述触发添加附加元素的动作时,将获取的图像帧按照采集的时序逐帧播放;
    在检测到所述触发添加附加元素的动作后,将添加附加元素后的图像帧按照采集的时序逐帧播放。
  23. 根据权利要求15-21中任一项所述的计算机设备,其特征在于,所述计算机程序被所述处理器执行时,使得所述处理器执行以下步骤:
    用添加所述附加元素后的图像帧,替换添加所述附加元素前的相应图像帧;
    根据替换后所确定的图像帧的采集时间,将替换后所确定的图像帧按采集时间的时序生成录制的视频;
    其中,替换后所确定的图像帧中,添加所述附加元素后的图像帧的采集时间,是添加所述附加元素前的相应图像帧的采集时间。
PCT/CN2019/092586 2018-07-11 2019-06-24 图像处理方法、装置、存储介质和计算机设备 WO2020011001A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/997,887 US11367196B2 (en) 2018-07-11 2020-08-19 Image processing method, apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810755907.7 2018-07-11
CN201810755907.7A CN110163861A (zh) 2018-07-11 2018-07-11 图像处理方法、装置、存储介质和计算机设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/997,887 Continuation US11367196B2 (en) 2018-07-11 2020-08-19 Image processing method, apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2020011001A1 true WO2020011001A1 (zh) 2020-01-16

Family

ID=67645067

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/092586 WO2020011001A1 (zh) 2018-07-11 2019-06-24 图像处理方法、装置、存储介质和计算机设备

Country Status (3)

Country Link
US (1) US11367196B2 (zh)
CN (1) CN110163861A (zh)
WO (1) WO2020011001A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582109A (zh) * 2020-04-28 2020-08-25 北京海益同展信息科技有限公司 识别方法、识别装置、计算机可读存储介质及电子设备

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921101A (zh) * 2018-07-04 2018-11-30 百度在线网络技术(北京)有限公司 基于手势识别控制指令的处理方法、设备及可读存储介质
CN112153400B (zh) * 2020-09-22 2022-12-06 北京达佳互联信息技术有限公司 直播互动方法、装置、电子设备及存储介质
CN113313791B (zh) * 2021-07-30 2021-10-01 深圳市知小兵科技有限公司 互联网游戏的图像处理方法及相关设备
CN116614666B (zh) * 2023-07-17 2023-10-20 微网优联科技(成都)有限公司 一种基于ai摄像头特征提取系统及方法

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105791692A (zh) * 2016-03-14 2016-07-20 腾讯科技(深圳)有限公司 一种信息处理方法及终端

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9330718B2 (en) * 2013-02-20 2016-05-03 Intel Corporation Techniques for adding interactive features to videos
US10726593B2 (en) * 2015-09-22 2020-07-28 Fyusion, Inc. Artificially rendering images using viewpoint interpolation and extrapolation

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105791692A (zh) * 2016-03-14 2016-07-20 腾讯科技(深圳)有限公司 一种信息处理方法及终端

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582109A (zh) * 2020-04-28 2020-08-25 北京海益同展信息科技有限公司 识别方法、识别装置、计算机可读存储介质及电子设备
CN111582109B (zh) * 2020-04-28 2023-09-05 京东科技信息技术有限公司 识别方法、识别装置、计算机可读存储介质及电子设备

Also Published As

Publication number Publication date
US11367196B2 (en) 2022-06-21
US20200380690A1 (en) 2020-12-03
CN110163861A (zh) 2019-08-23

Similar Documents

Publication Publication Date Title
WO2020011001A1 (zh) 图像处理方法、装置、存储介质和计算机设备
TWI777162B (zh) 圖像處理方法及裝置、電子設備和電腦可讀儲存媒體
JP6662876B2 (ja) アバター選択機構
US9299004B2 (en) Image foreground detection
US9330334B2 (en) Iterative saliency map estimation
US10019823B2 (en) Combined composition and change-based models for image cropping
GB2567920A (en) Deep salient content neural networks for efficient digital object segmentation
WO2019091412A1 (zh) 拍摄图像的方法、装置、终端和存储介质
US10313746B2 (en) Server, client and video processing method
US11145065B2 (en) Selection of video frames using a machine learning predictor
CN106815803B (zh) 图片的处理方法及装置
US20240046538A1 (en) Method for generating face shape adjustment image, model training method, apparatus and device
WO2022100690A1 (zh) 动物脸风格图像生成方法、模型训练方法、装置和设备
CN115294055A (zh) 图像处理方法、装置、电子设备和可读存储介质
CN109167939B (zh) 一种自动配文方法、装置及计算机存储介质
JP2023545052A (ja) 画像処理モデルの訓練方法及び装置、画像処理方法及び装置、電子機器並びにコンピュータプログラム
CN114372172A (zh) 生成视频封面图像的方法、装置、计算机设备及存储介质
CN113689440A (zh) 一种视频处理方法、装置、计算机设备以及存储介质
US11080549B1 (en) Automated cropping of images using a machine learning predictor
CN111274447A (zh) 基于视频的目标表情生成方法、装置、介质、电子设备
US9807315B1 (en) Lookup table interpolation in a film emulation camera system
US20240161240A1 (en) Harmonizing composite images utilizing a semantic-guided transformer neural network
US20210248755A1 (en) Product release method and image processing method, apparatus, device, and storage medium
JP7292349B2 (ja) 画像を処理するための方法およびシステム
US11947591B2 (en) Methods and systems for processing imagery

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19833160

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19833160

Country of ref document: EP

Kind code of ref document: A1