WO2019114696A1 - Augmented reality processing method, object recognition method and related device - Google Patents

Augmented reality processing method, object recognition method and related device

Info

Publication number
WO2019114696A1
WO2019114696A1 (PCT/CN2018/120301)
Authority
WO
WIPO (PCT)
Prior art keywords
key point
image
pose
target
point set
Prior art date
Application number
PCT/CN2018/120301
Other languages
English (en)
French (fr)
Inventor
朱晓龙
王一同
黄凯宁
梅利健
黄生辉
罗镜民
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP18887537.1A (published as EP3617995A4)
Publication of WO2019114696A1
Priority to US16/680,058 (published as US10891799B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7753Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition

Definitions

  • the invention relates to the field of computer vision, in particular to augmented reality processing technology and object recognition technology.
  • the existing gesture recognition algorithms can only recognize the poses of all the people in each frame of an image, but cannot connect the poses of a particular person across the video sequence. In other words, when dealing with multi-person interaction in a video stream, the existing gesture recognition algorithms cannot determine whether a piece of pose information in the current frame image and a piece of pose information in another frame image belong to the same person, which reduces the recognition accuracy.
  • the embodiment of the invention provides an augmented reality processing method, an object recognition method and related devices.
  • the terminal can identify the key point sets belonging to the same object in the video stream, thereby improving the accuracy of the identification.
  • a first aspect of the embodiments of the present invention provides a processing method for augmented reality, where the method is applied to a terminal, where the terminal is configured to generate an enhanced information image for a first object in a multi-frame image, where the multi-frame image includes a first image and a second image, the second image being an adjacent image after the first image, the method comprising:
  • the target first pose key point set is used as a key point set of the first object in the second image
  • a second aspect of the embodiments of the present invention provides a method for object recognition, where the method is applied to a terminal, where the terminal is configured to generate an enhanced information image for a first object in a multi-frame image, where the multi-frame image includes a first image and a second image, the second image being an adjacent image after the first image, the method comprising:
  • the target first pose key point set is used as a key point set of the first object in the second image.
  • a third aspect of the embodiments of the present invention provides a terminal, where the terminal is configured to generate an enhanced information image for a first object in a multi-frame image, where the multi-frame image includes a first image and a second image, the second image is an image of a frame adjacent to the first image, and the terminal includes:
  • An acquiring module configured to acquire a key point set of the first object in the first image
  • the acquiring module is configured to acquire, by using a neural network model, first pose key point sets respectively corresponding to the plurality of objects in the second image, where the neural network model is configured to acquire a key point set of an object in an image, and the first pose key point set includes at least one first pose key point;
  • the acquiring module is configured to determine, according to the key point set and the motion trend of the first object, a second pose key point set of the first object in the second image, where the second pose key point set includes at least one second pose key point;
  • a determining module configured to, for a target first pose key point set being any one of the plurality of first pose key point sets, determine a target distance between the target first pose key point set and the second pose key point set according to at least one first pose key point in the target first pose key point set and the at least one second pose key point;
  • the determining module is configured to use the target first pose key point set as a key point set of the first object in the second image if the target distance satisfies a preset condition.
  • a fourth aspect of the embodiments of the present invention provides a terminal, including: a memory, a processor, and a bus system;
  • the memory is used to store a program
  • the processor is configured to execute the program in the memory, and specifically includes the following steps:
  • the target first pose key point set is used as a key point set of the first object in the second image
  • the bus system is configured to connect the memory and the processor to cause the memory and the processor to communicate.
  • a fifth aspect of the invention provides a computer readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
  • a sixth aspect of the invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the methods described in the various aspects above.
  • a method for object recognition is provided, where the method is applied to a terminal, where the terminal is configured to generate an enhanced information image for a first object in a multi-frame image, where the multi-frame image includes a first image and a second image, the second image being an adjacent image after the first image.
  • the terminal acquires a key point set of the first object in the first image, and then acquires, by using the neural network model, first pose key point sets respectively corresponding to a plurality of objects in the second image. Further, the terminal determines a second pose key point set of the first object in the second image according to the key point set and the motion trend of the first object; the second pose key point set reflects a possible motion pose of the first object in the second image and serves as the basis for judging which first pose key point set is the key point set of the first object.
  • for a target first pose key point set, the terminal determines a target distance between the target first pose key point set and the second pose key point set according to at least one first pose key point in the target first pose key point set and at least one second pose key point, and if the target distance satisfies a preset condition, uses the target first pose key point set as the key point set of the first object in the second image.
  • in this way, by using the second pose key point set as the judgment basis for determining which first pose key point set is the key point set of the first object, the terminal can identify the key point sets belonging to the same object in the video stream, which improves the accuracy of recognition.
  • FIG. 1 is a schematic flow chart of multi-person interactive gesture recognition according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of an embodiment of a method for processing augmented reality according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of multi-person interactive gesture recognition in an application scenario of the present invention.
  • FIG. 4 is a schematic diagram of an embodiment of a method for object recognition according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of acquiring a set of key points in a single frame image according to an embodiment of the present invention
  • FIG. 6 is a schematic flowchart of identifying an object according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of an embodiment of a terminal according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • the embodiment of the invention provides an augmented reality processing method, an object recognition method and related devices.
  • the terminal can identify the key point sets belonging to the same object in the video stream, thereby improving the accuracy of the identification.
  • FIG. 1 is a schematic flowchart of multi-person interactive gesture recognition according to an embodiment of the present invention, as shown in the figure, specifically:
  • step 101: a video is acquired, where the video includes a multi-frame image;
  • step 102: human pose estimation is performed on each frame of image in the video;
  • step 103: it is determined whether the frame of image in step 102 is the first frame image in the video; if so, the process proceeds to step 104, otherwise, if it is not the first frame image, the process proceeds to step 105;
  • step 104: in the first frame image, each human pose is given a unique identity number (identity, ID);
  • step 105: if it is not the first frame image, human pose estimation is continued on the frame of image, and the human body key points of the previous frame image are tracked;
  • step 106: combining the key point tracking result and the pose estimation result of the current frame image, the ID of each human pose of the current frame is determined (a sketch of this loop is given after this list).
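  • The following is a minimal Python sketch of the per-frame loop in steps 101 through 106; the helper callables estimate_poses (per-frame pose estimation, step 102) and match_poses (key point tracking and ID matching, steps 105 and 106) are placeholders for this illustration and are not names taken from the patent.

```python
def track_video(frames, estimate_poses, match_poses):
    """Per-frame tracking loop sketched from steps 101-106."""
    tracks = {}        # ID -> key point set of that person in the most recent frame
    next_id = 0

    for frame_index, frame in enumerate(frames):
        poses = estimate_poses(frame)                    # step 102: pose estimation

        if frame_index == 0:                             # steps 103-104: first frame
            for pose in poses:
                tracks[next_id] = pose                   # each human pose gets a unique ID
                next_id += 1
        else:                                            # steps 105-106: later frames
            tracks, next_id = match_poses(poses, tracks, next_id)

        yield dict(tracks)                               # ID -> pose result for this frame
```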
  • an embodiment of the processing method of the augmented reality in the embodiment of the present invention includes:
  • the method is applied to a terminal, where the terminal is configured to generate an enhanced information image for the first object in the multi-frame image, where the multi-frame image includes a first image and a second image, and the second image is an image of one frame adjacent to the first image.
  • the first object may be a user in the first image
  • the enhanced information image may be a texture, such as a “clothing”, “airplane” or “flower” texture, which can be combined with the first object by using augmented reality (AR) technology.
  • the terminal uses the neural network model to obtain a first set of pose key points corresponding to the plurality of objects in the second image.
  • the second image may be used as the input of the neural network model, and the first pose key point sets respectively corresponding to the plurality of objects are the output of the neural network model.
  • the neural network model here is specifically OpenPose.
  • it can also be a Convolutional Pose Machine (CPM).
  • the second image is input into the neural network model, and the first pose key point sets corresponding to the plurality of objects in the second image can be output.
  • the terminal predicts the key point set in the first image by using at least one of an optical flow method, a Kalman filtering algorithm, and a sliding window algorithm, thereby obtaining a second pose key point set of the first object in the second image.
  • the motion trend can reflect the possible pose of the first object in the next frame. Therefore, the second pose key point set of the first object in the second image is predicted according to the key point set obtained in step 201 and the motion trend of the first object, which ensures that the obtained second pose key points correspond to the first object and reflect the possible pose of the first object in the second image.
  • the manner of determining the motion trend of the first object may differ depending on the algorithm used.
  • one manner is to determine the motion trend of the first object according to the first image.
  • the algorithm used may be a Kalman filter algorithm or a sliding window algorithm.
  • Another manner of determining the motion trend of the first object may be determining a motion trend of the first object according to a pixel change between the first image and the second image.
  • the algorithm used may be an optical flow method.
  • for a target first pose key point set, which is any one of the plurality of first pose key point sets, a target distance between the target first pose key point set and the second pose key point set is determined according to at least one first pose key point in the target first pose key point set and the at least one second pose key point;
  • the terminal calculates a distance between at least one first pose key point in the target first pose key point set and at least one second pose key point in the second pose key point set. For example, if the first pose key point is (1, 1) and the second pose key point is (3, 3), the calculation can be done as follows:
  • Dist = √((3 − 1)² + (3 − 1)²) ≈ 2.828, where Dist represents the target distance.
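  • As a minimal illustration of the straight-line (Euclidean) distance used above, the following snippet reproduces the example; the helper name euclidean_distance is an assumption of this sketch, not terminology from the patent.

```python
import math

def euclidean_distance(p, q):
    """Straight-line (Euclidean) distance between two key points given as (x, y) tuples."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

print(euclidean_distance((1, 1), (3, 3)))  # ~2.828, the target distance in the example
```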
  • if the target distance is less than the preset threshold, the target first pose key point set is used as the key point set of the first object in the second image.
  • the terminal uses AR technology to superimpose the first object and the target virtual object according to the key point information of the first object in the second image to generate an enhanced information image, and displays it on the terminal display interface to generate an augmented reality image, the augmented reality image including the second image and the enhanced information image.
  • AR technology is a technology that "seamlessly" integrates real world information and virtual world information. It takes entity information (visual information, sound, taste, touch, etc.) that is difficult to experience within a certain time and space of the real world, simulates and then superimposes it using computers and other technology, and applies the virtual information to the real world, where it is perceived by the human senses, achieving a sensory experience beyond reality. The real environment and virtual objects are superimposed in real time on the same picture or space. AR technology includes multimedia, 3D modeling, real-time video display and control, multi-sensor fusion, real-time tracking and registration, and scene fusion.
  • the AR system has three prominent features: first, the integration of real world and virtual information; second, real-time interactivity; third, the addition and positioning of virtual objects in three-dimensional space.
  • FIG. 3 is a schematic diagram of multi-person interactive gesture recognition in the application scenario of the present invention.
  • an embodiment of the method for object recognition in the embodiment of the present invention includes:
  • the terminal is configured to generate an enhanced information image for the first object in the multi-frame image, where the multi-frame image includes the first image and the second image, and the second image is the frame of image adjacent to the first image.
  • the first object may refer to a person in a multi-person interactive scene.
  • FIG. 5 is a schematic diagram of acquiring a set of key points in a single frame image according to an embodiment of the present invention.
  • a person's gesture contains N pre-defined keypoint positions and their corresponding connections.
  • the N key points include, for example, a key point corresponding to the nose, a key point corresponding to the eye, and a key point corresponding to the neck.
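  • A minimal sketch of how such a pose could be represented is given below; the specific key point names and skeleton connections are illustrative assumptions, since the patent only states that a pose contains N pre-defined key point positions and their connections.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Illustrative key point names and connections; a real model would define all N of them.
KEYPOINT_NAMES: List[str] = ["nose", "left_eye", "right_eye", "neck"]
SKELETON: List[Tuple[str, str]] = [("nose", "left_eye"), ("nose", "right_eye"), ("nose", "neck")]

@dataclass
class Pose:
    object_id: int                                 # unique ID of the person this pose belongs to
    keypoints: Dict[str, Tuple[float, float]]      # key point name -> (x, y) position in the image
```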
  • the terminal uses the neural network model to acquire a first set of pose key points corresponding to the plurality of objects in the second image.
  • the neural network model here is specifically OpenPose. In addition, it may also be a CPM.
  • the first pose key point set corresponding to the plurality of objects in the second image may be output.
  • the terminal predicts the key point set in the first image by using at least one of an optical flow method, a Kalman filtering algorithm, and a sliding window algorithm, thereby obtaining a second pose key point set of the first object in the second image.
  • for a target first pose key point set, which is any one of the plurality of first pose key point sets, a target distance between the target first pose key point set and the second pose key point set is determined according to at least one first pose key point in the target first pose key point set and the at least one second pose key point;
  • the terminal calculates a straight-line distance between at least one first pose key point in the target first pose key point set and at least one second pose key point in the second pose key point set. For example, if the first pose key point is (1, 1) and the second pose key point is (3, 3), the calculation can be performed as follows:
  • Dist = √((3 − 1)² + (3 − 1)²) ≈ 2.828, where Dist represents the target distance.
  • if the target distance meets a preset condition, the target first pose key point set is used as a key point set of the first object in the second image.
  • the preset condition is used to determine whether the target first pose key point set and the second pose key point set are similar. If the target distance satisfies the preset condition, the target first pose key point set and the second pose key point set are considered similar, that is, the target first pose key point set is the key point set of the first object in the second image, and the currently recognized object is the first object.
  • the preset condition may be that the target distance is less than a preset threshold.
  • otherwise, the target first pose key point set is not the key point set of the first object in the second image, and the currently identified object is not the first object.
  • the target distance usually refers to the Euclidean distance, and may also be other distances, such as the Manhattan distance, which is not limited herein.
  • a method for object recognition is provided, where the method is applied to a terminal, and the terminal is configured to generate an enhanced information image for a first object in a multi-frame image, where the multi-frame image includes a first image and a second image, the second image being an adjacent image after the first image.
  • the terminal acquires a key point set of the first object in the first image, and then acquires, by using the neural network model, first pose key point sets respectively corresponding to a plurality of objects in the second image. Further, the terminal determines a second pose key point set of the first object in the second image according to the key point set and the motion trend of the first object; the second pose key point set reflects a possible motion pose of the first object in the second image and serves as the basis for judging which first pose key point set is the key point set of the first object.
  • for a target first pose key point set, the terminal determines a target distance between the target first pose key point set and the second pose key point set according to at least one first pose key point in the target first pose key point set and at least one second pose key point, and if the target distance satisfies a preset condition, uses the target first pose key point set as the key point set of the first object in the second image.
  • in this way, by using the second pose key point set as the judgment basis for determining which first pose key point set is the key point set of the first object, the terminal can identify the key point sets belonging to the same object in the video stream, which improves the accuracy of recognition.
  • multi-person pose estimation methods are generally divided into two categories, namely top-down and bottom-up. The top-down approach first detects a bounding box for each person (the object mentioned in this embodiment) and then uses a single-person method to locate that person's joints, while the bottom-up approach first determines the positions of all joints and then distinguishes which person each joint belongs to.
  • the present invention mainly uses a bottom-up approach to identify objects in a frame of image.
  • if the target distance is less than the preset threshold, the terminal considers that the target first pose key point set belongs to the first object, and further uses AR technology to superimpose the first object and the target virtual object to generate AR information, and displays the AR information on the terminal display interface.
  • FIG. 6 is a schematic flowchart of identifying an object according to an embodiment of the present invention.
  • step 401: for the second image, the terminal uses the pose estimation algorithm to obtain the first pose key point sets respectively corresponding to the plurality of objects (referred to as set A), and uses the human body key point tracking algorithm to obtain the second pose key point set of the first object in the second image (referred to as set B);
  • step 402: the terminal first marks all second pose key points in set B as "unused";
  • step 403: for any one of the key point sets in set A (the target first pose key point set), the terminal calculates the distance to each key point set in set B that is marked as "unused"; for each key point set in set A, it records the previous-frame ID (denoted as ID_pre) corresponding to the "unused" key point set in set B with the smallest distance, together with that distance, which is the target distance;
  • step 404: it is determined whether the target distance is less than the preset threshold; if so, the process proceeds to step 406, and if not, the process proceeds to step 405;
  • step 405: when the distance is greater than or equal to the preset threshold, the ID of the human body pose corresponding to this key point set is marked as a new ID that does not conflict with previous IDs;
  • step 406: when the distance is less than the preset threshold, the key point set in the target first pose key point set A and the matching key point set in set B are considered successfully matched, so the ID of the human body pose corresponding to this key point set in set A is marked as the ID (ID_pre) of the corresponding key point set in set B, and the corresponding key point set in set B is marked as "used".
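  • A minimal Python sketch of the greedy matching in steps 402 through 406 is given below; the function name, the distance callable and the threshold parameter are assumptions of this illustration (bound to concrete values, it could play the role of the match_poses placeholder in the earlier loop sketch).

```python
def match_poses_to_ids(poses, tracks, next_id, distance, threshold):
    """Greedy matching sketched from steps 402-406.

    poses:  first pose key point sets detected in the current frame (set A).
    tracks: dict mapping previous-frame IDs (ID_pre) to their key point sets (set B).
    distance: callable returning the target distance between two key point sets.
    """
    used = set()                                   # step 402: every entry of set B starts "unused"
    assignments = {}

    for pose in poses:                             # step 403: distance to each "unused" entry of B
        candidates = [(distance(pose, prev_pose), id_pre)
                      for id_pre, prev_pose in tracks.items() if id_pre not in used]
        best_dist, best_id = min(candidates) if candidates else (float("inf"), None)

        if best_dist < threshold:                  # step 406: successful match, reuse ID_pre
            assignments[best_id] = pose
            used.add(best_id)                      # mark the matched entry of set B as "used"
        else:                                      # step 405: assign a new, non-conflicting ID
            assignments[next_id] = pose
            next_id += 1

    return assignments, next_id
```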
  • the terminal determines that the target first pose key point set belongs to the first object.
  • the terminal can identify a set of key points belonging to the same object in the video stream, thereby improving the accuracy of the recognition.
  • acquiring, by using a neural network model, the first pose key point sets respectively corresponding to the plurality of objects in the second image may include:
  • acquiring, by using the neural network model, a heat map of all key points in the second image, the heat map comprising a probability heat map and a vector-based heat map;
  • determining, by using the heat map, the first pose key point sets respectively corresponding to the plurality of objects in the second image.
  • a neural network model may be used to predict a first set of pose key points corresponding to a plurality of objects in the second image.
  • the neural network model may be OpenPose. The probability heat map corresponding to all the pose key points in the second image and the corresponding part affinity fields (PAF) heat map are predicted by OpenPose, and then a post-processing algorithm connects all pose key points into each person's pose.
  • the detection process is to input a frame of image, obtain the probability heat map and the PAF, and then generate a series of pairwise matches according to the PAF; due to the vector nature of the PAF itself, the generated matches are correct and are finally merged into each person's overall skeleton.
  • the terminal uses the neural network model to acquire the heat map of all the key points in the second image, and then predicts the first pose key point set through the heat map.
  • a neural network model such as OpenPose can predict the first pose key point sets with good reliability and runs quickly, which reduces the difficulty of predicting key point sets when there are many objects in the same frame of image.
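  • The per-key-point probability heat map can be turned into candidate coordinates by taking the arg-max of each channel, as in the minimal numpy sketch below; this covers only the single-peak case and deliberately omits the PAF-based matching that assembles key points into per-person skeletons.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps, score_threshold=0.1):
    """Pick one candidate (x, y) per key point channel from its probability heat map.

    heatmaps: array of shape (num_keypoints, H, W) with per-pixel key point probabilities.
    """
    keypoints = []
    for channel in heatmaps:
        y, x = np.unravel_index(np.argmax(channel), channel.shape)   # most probable pixel
        score = channel[y, x]
        keypoints.append((float(x), float(y)) if score >= score_threshold else None)
    return keypoints
```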
  • determining the second pose key point set of the first object in the second image according to the key point set and the motion trend of the first object may include:
  • calculating the key point set and the motion trend of the first object by using a preset algorithm to obtain the second pose key point set, where the preset algorithm is at least one of an optical flow algorithm, a Kalman filter algorithm, and a sliding window algorithm.
  • the terminal may calculate the second set of pose key points by using at least one of an optical flow algorithm, a Kalman filter algorithm, and a sliding window algorithm.
  • for example, the sliding window algorithm is used to predict the key point set: assuming the motion is linear, the position of the key point (1, 1) in the first image is (2, 2) in the second image, the position of the key point (2, 2) in the first image is (3, 3) in the second image, and the position of the key point (3, 3) in the first image is (4, 4) in the second image (see the sketch after this example).
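  • A minimal sketch of this sliding-window style prediction under the linear-motion assumption is shown below; the window length and function names are illustrative only.

```python
def predict_keypoints(history, window=3):
    """Predict next-frame key points by extrapolating the average displacement over the
    last `window` frames. history: list of per-frame key point lists of (x, y) tuples."""
    recent = history[-window:]
    last = recent[-1]
    if len(recent) < 2:
        return list(last)                                # not enough frames to estimate motion

    predicted = []
    for i, (x, y) in enumerate(last):
        dx = (x - recent[0][i][0]) / (len(recent) - 1)   # average per-frame displacement in x
        dy = (y - recent[0][i][1]) / (len(recent) - 1)   # average per-frame displacement in y
        predicted.append((x + dx, y + dy))
    return predicted

# With a constant displacement of (1, 1) per frame, the point (1, 1) is predicted at (2, 2):
print(predict_keypoints([[(0.0, 0.0)], [(1.0, 1.0)]]))   # [(2.0, 2.0)]
```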
  • an optical flow algorithm and/or a Kalman filter algorithm may also be employed.
  • the optical flow algorithm is based on the assumption that changes in the grayscale distribution of the image, that is, pixel changes, are caused by the motion of the target or the scene, and can therefore reflect the motion trend of the first object. In other words, the gray level of the target and the scene is assumed not to change with time. This makes the optical flow method less resistant to noise, and its application range is generally limited to cases where the gray level of the target and the scene remains unchanged.
  • the Kalman filter algorithm is an algorithm that uses the linear system state equation to estimate the state of the system through the input and output of the system. Since the observed data includes the effects of noise and interference in the system, the optimal estimate can also be considered as a filtering process.
  • the Kalman filter algorithm does not require the assumption that both the signal and the noise are stationary processes. For the system disturbances and observation errors at each moment, as long as some appropriate assumptions are made about their statistical properties, an estimate of the real signal with the smallest error in the average sense can be obtained by processing the observation signals containing noise. In image processing, the Kalman filter algorithm is applied to recover images blurred by noise; after some statistical assumptions about the noise are made, the algorithm can recursively obtain from the blurred image the real image with the smallest mean square error, so that the blurred image is restored.
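  • One possible way to realize Kalman-filter based prediction for a single key point is the constant-velocity sketch below, using OpenCV's cv2.KalmanFilter; the noise covariances and the observed positions are placeholder values for this illustration.

```python
import numpy as np
import cv2  # requires opencv-python

# State is [x, y, vx, vy]; only the key point position [x, y] is observed.
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

for observed_xy in [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]:          # key point position per frame
    kf.predict()
    kf.correct(np.array(observed_xy, dtype=np.float32).reshape(2, 1))

next_position = kf.predict()[:2].ravel()    # predicted key point position in the next frame
print(next_position)
```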
  • at least one of the optical flow algorithm, the Kalman filtering algorithm, and the sliding window algorithm may be used to calculate the key point set and the motion trend of the first object, to obtain the second pose key point set of the first object in the second image.
  • determining, for a target first pose key point set being any one of the plurality of first pose key point sets, the target distance between the target first pose key point set and the second pose key point set according to at least one first pose key point and the at least one second pose key point may include:
  • acquiring position information of a first target key point and position information of a second target key point, and calculating the target distance according to the position information of the first target key point and the position information of the second target key point.
  • the terminal may acquire position information of a first target key point from the at least one first pose key point of the target first pose key point set, and acquire position information of a second target key point from the at least one second pose key point. It is assumed that the target first pose key point set includes two first pose key points (i.e., point a and point b), and the second pose key point set includes two second pose key points (i.e., point A and point B), where point a and point A are the key points of the head, and points b and B are the key points of the neck. The terminal selects the shortest path according to the distance from point a to point A and the distance from point b to point B: for example, if the distance between point a and point A is 10 and the distance from point b to point B is 20, then the target distance is 10.
  • a method for calculating a target distance by using a minimum value method is provided, that is, calculating a distance between two key points of the two sets of posture key points that are closest to each other, and the distance is the target distance.
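  • A minimal sketch of the minimum value method is given below; it reuses the euclidean_distance helper from the earlier sketch, and the key point names and coordinates are illustrative assumptions chosen to reproduce the distances in the example.

```python
def target_distance_min(set_a, set_b, distance):
    """Minimum value method: the target distance is the smallest of the per-key-point
    distances between the two pose key point sets (dicts of name -> (x, y))."""
    return min(distance(set_a[name], set_b[name]) for name in set_a if name in set_b)

# Mirroring the example: head-to-head distance 10, neck-to-neck distance 20.
set_a = {"head": (0.0, 0.0), "neck": (0.0, 5.0)}
set_b = {"head": (10.0, 0.0), "neck": (20.0, 5.0)}
print(target_distance_min(set_a, set_b, euclidean_distance))  # 10.0
```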
  • the foregoing determining of the target distance between the target first pose key point set and the second pose key point set according to at least one first pose key point and the at least one second pose key point may also include:
  • the terminal acquiring position information of each first target key point in the at least one first pose key point of the target first pose key point set, and position information of each second target key point in the at least one second pose key point.
  • the target first pose key point set includes two first pose key points (ie, point a and point b)
  • the second pose key point set includes two second pose key points (ie, point A and point B).
  • point a and point A are the key points of the head
  • points b and B are the key points of the neck.
  • the terminal calculates the minimum distance between the key points of the head, that is, the distance between point a and point A, assumed to be 10, and then calculates the minimum distance between the key points of the neck, that is, the distance from point b to point B, assumed to be 20. The terminal then averages the two distances and obtains a target distance of 15.
  • a method for calculating the target distance by using an average value method is provided, that is, each key point in one set of pose key points is matched with the corresponding key point in the other set of pose key points to obtain the minimum distances, and the average of all the minimum distances is taken as the target distance.
  • the average value method is used to calculate the target distance with higher reliability, which is beneficial to improving the feasibility and operability of the solution.
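  • The average value method differs from the minimum value method only in the final reduction, as in the sketch below (reusing set_a, set_b and euclidean_distance from the previous sketches).

```python
def target_distance_avg(set_a, set_b, distance):
    """Average value method: average the per-key-point minimum distances between the
    two pose key point sets (here, one distance per shared key point name)."""
    shared = [name for name in set_a if name in set_b]
    return sum(distance(set_a[name], set_b[name]) for name in shared) / len(shared)

print(target_distance_avg(set_a, set_b, euclidean_distance))  # (10 + 20) / 2 = 15.0
```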
  • the first object has a unique correspondence with the first object identifier;
  • the object identifier is used to identify the first object in the multi-frame image.
  • different objects may also be identified, where each object corresponds to only one unique identifier and the identifiers are not repeated.
  • for example, there are four objects, namely A, B, C, and D, and the relationship between each object and its corresponding identifier is as shown in Table 1.
  • identifiers are used to identify different objects, and it is determined in the multi-frame image whether the objects belong to the same object.
  • the uniqueness of the object can be directly determined according to the identifier, and the unique object is processed correspondingly, thereby improving the practicability and feasibility of the solution.
  • FIG. 7 is a schematic diagram of an embodiment of a terminal according to an embodiment of the present invention.
  • the terminal is configured to generate an enhanced information image for a first object in a multi-frame image, where the multi-frame image includes a first image and a second image, the second image is an adjacent image after the first image, and the terminal 50 includes:
  • An obtaining module 501 configured to acquire a key point set of the first object in the first image
  • the acquiring module 501 is configured to acquire, by using a neural network model, first pose key point sets respectively corresponding to the plurality of objects in the second image, where the neural network model is configured to acquire a key point set of an object in an image, and the first pose key point set includes at least one first pose key point;
  • the acquiring module 501 is configured to determine, according to the key point set and the motion trend of the first object, a second pose key point set of the first object in the second image, where the second pose key point set includes at least one second pose key point;
  • a determining module 502 configured to, for a target first pose key point set being any one of the plurality of first pose key point sets, determine a target distance between the target first pose key point set and the second pose key point set according to the at least one first pose key point in the target first pose key point set acquired by the acquiring module 501 and the at least one second pose key point;
  • the determining module 502 is configured to use the target first pose key point set as a key point set of the first object in the second image if the target distance satisfies a preset condition.
  • the acquiring module 501 acquires a key point set of the first object in the first image, and acquires, by using the neural network model, first pose key point sets respectively corresponding to the plurality of objects in the second image. The acquiring module 501 then determines, according to the key point set and the motion trend of the first object, a second pose key point set of the first object in the second image. For a target first pose key point set, which is any one of the plurality of first pose key point sets, the determining module 502 determines a target distance between the target first pose key point set and the second pose key point set according to the at least one first pose key point in the target first pose key point set acquired by the acquiring module 501 and the at least one second pose key point, and when the target distance satisfies a preset condition, the determining module 502 uses the target first pose key point set as the key point set of the first object in the second image.
  • a terminal acquires a key point set of a first object in a first image, and then acquires, by using a neural network model, first pose key point sets respectively corresponding to multiple objects in the second image.
  • the terminal also determines a second pose key point set of the first object in the second image according to the key point set and the motion trend of the first object; the second pose key point set reflects the possible motion pose of the first object in the second image and can be used as the basis for determining which first pose key point set is the key point set of the first object.
  • for a target first pose key point set, the terminal determines a target distance between the target first pose key point set and the second pose key point set according to at least one first pose key point in the target first pose key point set and at least one second pose key point, and if the target distance satisfies a preset condition, uses the target first pose key point set as the key point set of the first object in the second image. In this way, by using the second pose key point set as the judgment basis, the terminal can identify the key point sets belonging to the same object in the video stream, which improves the accuracy of recognition.
  • the preset condition is that the target distance is less than the preset threshold.
  • the terminal determines that the target first pose key point set belongs to the first object. In the above manner, in a multi-person interaction scenario, the terminal can identify a set of key points belonging to the same object in the video stream, thereby improving the accuracy of the recognition.
  • the obtaining module 501 is specifically configured to acquire, by using the neural network model, a heat map of all key points in the second image, where the heat map includes a probability heat map and a vector-based heat map;
  • determining, by using the heat map, the first pose key point sets respectively corresponding to the plurality of objects in the second image.
  • the terminal acquires a heat map of all the key points in the second image by using the neural network model, and then predicts the first pose key point set corresponding to the plurality of objects in the second image by using the heat map.
  • a neural network model such as OpenPose can predict the first pose key point sets with good reliability and runs quickly, which reduces the difficulty of predicting key point sets when there are many objects in the same frame of image.
  • the obtaining module 501 is configured to calculate, by using a preset algorithm, the motion trend of the key point set and the first object, to obtain the second pose key point set, where the preset algorithm is an optical flow algorithm. At least one of a Kalman filter algorithm and a sliding window algorithm.
  • at least one of the optical flow algorithm, the Kalman filtering algorithm, and the sliding window algorithm may be used to calculate the key point set and the motion trend of the first object, to obtain the second pose key point set of the first object in the second image.
  • the key point set of an object can thus be tracked in the multi-frame image and the second pose key point set of the object in the next frame image can be obtained; the optical flow algorithm, the Kalman filter algorithm and the sliding window algorithm are all algorithms with low computational cost, which improves the efficiency of key point set tracking.
  • the determining module 502 is configured to acquire position information of the first target key point from the at least one first pose key point, and acquire position information of the second target key point from the at least one second pose key point, where the second target key point is the key point having the minimum straight-line distance from the first target key point;
  • a method for calculating a target distance by using a minimum value method is provided, that is, calculating a distance between two key points of the two sets of posture key points that are closest to each other, and the distance is the target distance.
  • the determining module 502 is configured to acquire location information of each of the at least one first target key point, and obtain a location of each of the at least one second target key point Information, wherein each of the first target key points has a one-to-one correspondence with each of the second target key points;
  • a method for calculating a target distance by using an average value method is provided, that is, a key point in a set of pose key points is matched with a key point in another set of pose key points. And get the minimum distance, and then take the average of all the minimum distances to determine the target distance.
  • the average value method is used to calculate the target distance with higher reliability, which is beneficial to improving the feasibility and operability of the solution.
  • the first object has a unique correspondence with the first object identifier
  • the first object identifier is for identifying the first object in the multi-frame image.
  • identifiers are used to identify different objects, and it is determined in the multi-frame image whether the objects belong to the same object.
  • the uniqueness of the object can be directly determined according to the identifier, and the unique object is processed correspondingly, thereby improving the practicability and feasibility of the solution.
  • the embodiment of the present invention further provides another terminal.
  • as shown in FIG. 8, for the convenience of description, only the parts related to the embodiment of the present invention are shown; for specific technical details that are not disclosed, please refer to the method part of the embodiment of the present invention.
  • the terminal may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS), a car computer, and the like, and the terminal is a mobile phone as an example:
  • FIG. 8 is a block diagram showing a partial structure of a mobile phone related to a terminal provided by an embodiment of the present invention.
  • the mobile phone includes: a radio frequency (RF) circuit 610 , a memory 620 , an input unit 630 , a display unit 640 , a sensor 650 , an audio circuit 660 , a wireless fidelity (WiFi) module 670 , and a processor 680 . And power supply 690 and other components.
  • the RF circuit 610 can be used for receiving and transmitting signals during information transmission and reception or during a call; in particular, downlink information received from the base station is passed to the processor 680 for processing, and uplink data is sent to the base station. Generally, RF circuit 610 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, RF circuitry 610 can also communicate with the network and other devices via wireless communication. The above wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
  • the memory 620 can be used to store software programs and modules, and the processor 680 executes various functional applications and data processing of the mobile phone by running software programs and modules stored in the memory 620.
  • the memory 620 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may be stored according to Data created by the use of the mobile phone (such as audio data, phone book, etc.).
  • memory 620 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • the input unit 630 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the handset.
  • the input unit 630 may include a touch panel 631 and other input devices 632.
  • the touch panel 631, also referred to as a touch screen, can collect touch operations by the user on or near it (such as operations performed by the user on or near the touch panel 631 using a finger, a stylus, or the like), and drive the corresponding connecting device according to a preset program.
  • the touch panel 631 can include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the user's touch orientation, detects a signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 680, and can also receive commands from the processor 680 and execute them.
  • the touch panel 631 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
  • the input unit 630 may also include other input devices 632.
  • other input devices 632 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
  • the display unit 640 can be used to display information input by the user or information provided to the user as well as various menus of the mobile phone.
  • the display unit 640 can include a display panel 641.
  • the display panel 641 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch panel 631 can cover the display panel 641. When the touch panel 631 detects a touch operation on or near it, it transmits the operation to the processor 680 to determine the type of the touch event, and the processor 680 then provides a corresponding visual output on the display panel 641 according to the type of the touch event.
  • although in FIG. 8 the touch panel 631 and the display panel 641 are two independent components to implement the input and output functions of the mobile phone, in some embodiments the touch panel 631 may be integrated with the display panel 641 to realize the input and output functions of the phone.
  • the handset can also include at least one type of sensor 650, such as a light sensor, motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 641 according to the brightness of the ambient light, and the proximity sensor may close the display panel 641 and/or when the mobile phone moves to the ear. Or backlight.
  • the accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes). When it is stationary, it can detect the magnitude and direction of gravity.
  • it can be used to identify the attitude of the mobile phone (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration) and for vibration recognition related functions (such as a pedometer or tapping); the mobile phone can also be configured with a gyroscope, barometer, hygrometer, thermometer, infrared sensor and other sensors, which are not described here again.
  • Audio circuit 660, speaker 661, and microphone 662 provide an audio interface between the user and the handset.
  • the audio circuit 660 can transmit the electrical signal converted from the received audio data to the speaker 661, which converts it into a sound signal for output; on the other hand, the microphone 662 converts the collected sound signal into an electrical signal, which is received by the audio circuit 660 and converted into audio data; the audio data is then processed by the processor 680 and sent to another mobile phone via the RF circuit 610, or output to the memory 620 for further processing.
  • WiFi is a short-range wireless transmission technology
  • the mobile phone can help users to send and receive emails, browse web pages, and access streaming media through the WiFi module 670, which provides users with wireless broadband Internet access.
  • although FIG. 8 shows the WiFi module 670, it can be understood that it does not belong to the essential configuration of the mobile phone, and may be omitted as needed without changing the essence of the invention.
  • the processor 680 is the control center of the handset, and connects various portions of the entire handset using various interfaces and lines; by running or executing software programs and/or modules stored in the memory 620, and invoking data stored in the memory 620, it executes the phone's various functions and processes data, thereby monitoring the phone as a whole.
  • the processor 680 may include one or more processing units; optionally, the processor 680 may integrate an application processor and a modem processor, where the application processor mainly processes the operating system, user interface, applications, and the like.
  • the modem processor primarily handles wireless communications. It will be appreciated that the above described modem processor may also not be integrated into the processor 680.
  • the handset also includes a power supply 690 (such as a battery) that powers the various components.
  • the power supply can be logically coupled to the processor 680 through a power management system to manage charging, discharging, and power management functions through the power management system.
  • the mobile phone may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
  • the processor 680 included in the terminal further has the following functions:
  • the target first pose key point set is used as a key point set of the first object in the second image
  • processor 680 is further configured to perform the following steps:
  • the target first pose key point set is used as a key point set of the first object in the second image.
  • the preset condition is that the target distance is less than a preset threshold.
  • processor 680 is specifically configured to perform the following steps:
  • the heat map comprising a probability heat map and a vector-based heat map
  • determining, by using the heat map, the first pose key point sets respectively corresponding to the plurality of objects in the second image.
  • processor 680 is specifically configured to perform the following steps:
  • the preset algorithm is at least one of an optical flow algorithm, a Kalman filter algorithm, and a sliding window algorithm.
  • the determining of the motion trend of the first object includes:
  • processor 680 is specifically configured to perform the following steps:
An embodiment of the present invention further provides a computer program product including instructions that, when run on a computer, cause the computer to perform the augmented reality processing method or the object recognition method according to any one of the foregoing embodiments.
In the several embodiments provided in the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method of object recognition: acquiring a key point set of a first object in a first image (301); obtaining, through a neural network model, first pose key point sets respectively corresponding to a plurality of objects in a second image (302); determining, according to the key point set and a motion trend of the first object, a second pose key point set of the first object in the second image (303); for any target first pose key point set among the plurality of first pose key point sets, determining a target distance between the first pose key point set and the second pose key point set according to at least one first pose key point in the target first pose key point set and at least one second pose key point in the second pose key point set (304); and, if the target distance satisfies a preset condition, using the target first pose key point set as the key point set of the first object in the second image (305). In a multi-person interaction scenario, the key point sets belonging to the same object can be recognized in a video stream, which improves the accuracy of recognition.

Description

一种增强现实的处理方法、对象识别的方法及相关设备
本申请要求于2017年12月13日提交中国专利局、申请号201711329984.8、申请名称为“一种增强现实的处理方法、对象识别的方法及终端”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及计算机视觉领域,尤其涉及增强现实的处理技术、对象识别技术。
背景技术
随着互联网技术的发展,视频社交也逐渐成为人们津津乐道的通信方式。即时社交应用程序在用户的渗透率较高,为了增加社交的趣味性,还可以通过识别不同用户的姿态信息来搭建不同的场景或视讯环境。
目前,能够基于神经网络模型来识别用户姿态信息,比如采用“自底向上”的方法,通过神经网络模型预测图像中所有姿态关键点所对应的概率热力图以及基于向量热力图(part affinity fields,PAF),然后再通过处理算法将所有姿态关键点连接成每个人的姿态。
然而,现有的姿态识别算法只能将每一帧图像中所有人的姿态识别出来,但无法将视频序列中某个特定人的姿态串联起来。换言之,在处理视频流中多人互动时,采用现有的姿态识别算法无法确定当前帧图像中的某个姿态信息和其他帧图像中的某个姿态信息是否属于同一个人,降低了识别的准确率。
发明内容
本发明实施例提供了一种增强现实的处理方法、对象识别的方法及相关设备,在多人互动的场景下,终端能够在视频流中识别出属于同一个对象的关键点集合,从而提升了识别的准确率。
本发明实施例的第一方面提供一种增强现实的处理方法,所述方法应用于终端,所述终端用于对多帧图像中的第一对象生成增强信息图像,所述多帧图像包含第一图像以及第二图像,所述第二图像为所述第一图像之后相邻的一帧图像,所述方法包括:
获取所述第一图像中第一对象的关键点集合;
通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
若所述目标距离小于所述预设门限,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合;
根据所述第二图像中所述第一对象的关键点集合生成增强信息图像。
本发明实施例的第二方面提供一种对象识别的方法,所述方法应用于终端,所述终端用于对多帧图像中的第一对象生成增强信息图像,所述多帧图像包含第一图像以及第二图像,所述第二图像为所述第一图像之后相邻的一帧图像,所述方法包括:
获取所述第一图像中第一对象的关键点集合;
通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
若所述目标距离满足预设条件,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合。
本发明实施例的第三方面提供一种终端,所述终端用于对多帧图像中的第一对象生成增强信息图像,所述多帧图像包含第一图像以及第二图像,所述第二图像为所述第一图像之后相邻的一帧图像,所述终端包括:
获取模块,用于获取所述第一图像中第一对象的关键点集合;
所述获取模块,用于通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
所述获取模块,用于根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
确定模块,用于针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
所述确定模块,用于若所述目标距离满足预设条件,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合。
本发明实施例的第四方面提供一种终端,包括:存储器、处理器以及总线系统;
其中,所述存储器用于存储程序;
所述处理器用于执行所述存储器中的程序,具体包括如下步骤:
获取所述第一图像中第一对象的关键点集合;
通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
若所述目标距离满足预设条件,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合;
所述总线系统用于连接所述存储器以及所述处理器,以使所述存储器以及所述处理器进行通信。
本发明的第五方面提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述各方面所述的方法。
本发明的第六方面提供了一种计算机程序产品,包括指令,当其在计算机上运行时,使得计算机执行上述各方面所述的方法。
从以上技术方案可以看出,本发明实施例具有以下优点:
本发明实施例中,提供了一种对象识别的方法,该方法应用于终端,所述终端用于对多帧图像中的第一对象生成增强信息图像,所述多帧图像包含第一图像以及第二图像,所述第二图像为所述第一图像之后相邻的一帧图像。首先终端获取第一图像中第一对象的关键点集合,然后通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,此外,终端也需要根据关键点集合和第一对象的运动趋势确定第一对象在第二图像中 的第二姿态关键点集合,第二姿态关键点集合可以反映出第一对象在第二图像中可能的运动姿态,可以作为确定哪个第一姿态关键点集合是第一对象的关键点集合的判断依据。接下来,针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,终端根据目标第一姿态关键点集合中至少一个第一姿态关键点以及至少一个第二姿态关键点,确定第一姿态关键点集合与第二姿态关键点集合之间的目标距离,若目标距离满足预设条件,将目标第一姿态关键点集合作为所述第二图像中第一对象的关键点集合。通过上述方式,在多人互动的场景下,通过将第二姿态关键点集合作为确定哪个第一姿态关键点集合是第一对象的关键点集合的判断依据,终端能够在视频流中识别出属于同一个对象的关键点集合,从而提升了识别的准确率。
附图说明
图1为本发明实施例中多人互动姿态识别的流程示意图;
图2为本发明实施例中增强现实的处理方法一个实施例示意图;
图3为本发明应用场景中多人互动姿态识别的一个示意图;
图4为本发明实施例中对象识别的方法一个实施例示意图;
图5为本发明实施例中单帧图像内获取关键点集合的一个示意图;
图6为本发明实施例中识别对象的一个流程示意图;
图7为本发明实施例中终端一个实施例示意图;
图8为本发明实施例中终端一个结构示意图。
具体实施方式
本发明实施例提供了一种增强现实的处理方法、对象识别的方法及相关设备,在多人互动的场景下,终端能够在视频流中识别出属于同一个对象的关键点集合,从而提升了识别的准确率。
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。
本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本发明的实施例例如能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
应理解,本发明主要应用于计算机视觉领域,具体应用于人体姿态识别,人体姿态识别技术可以帮助计算机理解用户动作、姿态和行为,是众多人体姿态应用的基础。目前,较多社交类应用程序采用人体姿态识别技术,比如用户在自拍的过程中可以通过识别出用户五官的位置来覆盖相应的贴图,或者用户在直播的过程中,可以在用户头顶上方出现弹幕。而本发明可以在多人场景中识别出同一个人,请参阅图1,图1为本发明实施例中多人互动姿态识别的流程示意图,如图所示,具体地:
步骤101中,获取视频,其中,视频中包含多帧图像;
步骤102中,对视频中的每一帧图像进行人体姿态估计;
步骤103中,判断步骤102中的这帧图像是否为视频中的第一帧图像,如果是,则进入步骤104,反之,如果不是第一帧图像,则跳转至步骤105;
步骤104中,在首帧图像中对每个人体姿态赋予一个唯一的身份标识号码(identity,ID);
步骤105中,如果不是首帧图像,那么继续对该帧图像进行人体姿态估计,并对先前帧图像的人体姿态关键点进行跟踪;
步骤106中,结合当前帧图像的关键点跟踪结果和姿态估计结果,确定当前帧每个人体姿态的ID。
下面将从终端的角度,对本发明中增强现实的处理方法进行介绍,请参阅图2,本发明实施例中增强现实的处理方法一个实施例包括:
201、获取第一图像中第一对象的关键点集合;
本实施例中,所述方法应用于终端,终端用于对多帧图像中的第一对象生成增强信息图像,所述多帧图像包含第一图像以及第二图像,所述第二图像为所述第一图像之后相邻的一帧图像。
具体地,第一对象可以是第一图像中一位用户,增强信息图像可以是一个贴图,比如“衣服”、“飞机”或者“花朵”等贴图,可以采用虚拟现实(augmented reality,AR)技术将两者合成为一个对象。
202、通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
本实施例中,终端采用神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,此时,第二图像可以作为神经网络模型的输入,多个对象分别对应的第一姿态关键点集合作为神经网络模型的输入。这里的神经网络模型具体为OpenPose,此外,还可以是一种基于卷积神经网络的姿态估计算法(convolutional pose machines,CPM),将第二 图像输入至神经网络模型,便可以输出第二图像中多个对象分别对应的第一姿态关键点集合,其中,第一姿态关键点集合中包含至少一个第一姿态关键点。
203、根据关键点集合和所述第一对象的运动趋势确定所述第一对象在第二图像中的第二姿态关键点集合,第二姿态关键点集合中包含至少一个第二姿态关键点;本实施例中,终端采用光流法、卡尔曼滤波算法和滑动窗口算法中的至少一种对第一图像中的关键点集合进行预测,从而得到第一对象在第二图像中的第二姿态关键点集合。
由于第二姿态关键点集合需要能够体现出第一对象在下一帧图像(第二图像)中可能的姿态,而运动趋势能够体现出第一对象在下一帧中可能的姿态。因此,预测得到第一对象在第二图像中的第二姿态关键点集合原理实际上是根据步骤201中得到的关键点集合和第一对象的运动趋势进行预测得到的,从而保证得到的第二姿态关键点是第一对象对应的,且反映出第一对象在第二图像中可能的姿态。
需要说明的是,根据采用算法的不同,确定第一对象的运动趋势的方式可能有所不同。其中,一种第一对象的运动趋势的确定方式可以是根据所述第一图像确定所述第一对象的运动趋势。此时,采用的算法可以是卡尔曼滤波算法或滑动窗口算法。
另一种第一对象的运动趋势的确定方式可以是根据所述第一图像和所述第二图像之间的像素变化确定所述第一对象的运动趋势。此时,采用的算法可以是光流法。
204、针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
本实施例中,终端计算目标第一姿态关键点集合中至少一个第一姿态关键点,与第二姿态关键点集合中至少一个第二姿态关键点直接的距离,比如,第一姿态关键点为(1.1),第二姿态关键点为(3,3),那么可以采用如下方式进行计算:
Dist² = (3-1)² + (3-1)² = 8，即 Dist ≈ 2.828，Dist表示目标距离。
205、若所述目标距离小于所述预设门限,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合;
206、根据所述第二图像中所述第一对象的关键点集合生成增强信息图像。
本实施例中,如果目标距离小于预设门限,那么终端就利用AR技术,根据第二图像中第一对象的关键点信息对第一对象与目标虚拟物进行叠加,生成增强信息图像,并在终端显示界面上予以展示,进而可以生成增强现实图像,该增强现实图像包括第二图像以及增强信息图像。
其中,AR技术是一种将真实世界信息和虚拟世界信息“无缝”集成的技术,是把原本在现实世界的一定时间空间范围内很难体验到的实体信息(视觉信息、声音、味道和触觉等)通过电脑等科学技术,模拟仿真后再叠加,将虚拟的信息应用到真实世界,被人类感官所感知,从而达到超越现实的感官体验。真实的环境和虚拟的物体实时地叠加到了同一个画面或空间同时存在。AR技术包含了多媒体、三维建模、实时视频显示及控制、多传感器融合、实时跟踪及注册、场景融合等技术。
AR系统具有三个突出的特点,第一、真实世界和虚拟的信息集成;第二、具有实时交互性;第三、是在三维尺度空间中增添定位虚拟物体。
然而,若目标距离大于或等于预设门限,则说明目标第一姿态关键点集合不是第二图像中第一对象的关键点集合,至于目标第一姿态关键点集合是第二图像中哪个对象的关键点集合还需要针对其他对象重新执行步骤201-步骤205重新进行确定。为了便于理解,下面将说明如何在多人互动场景中识别同一个对象,并结合增强现实技术对该对象生成相应的增强现实信息。请参阅图3,图3为本发明应用场景中多人互动姿态识别的一个示意图,如图所示,在视频的第一帧图像中有两位用户正在直播,即左边的用户A和右边的用户B,利用AR技术在用户A的手上放置一台虚拟直升机。在下一帧图像中,用户A和用户B的动作的发生了变化,这个时候,虚拟直升机仍然只跟着用户A,于是在这帧图像中也能看到用户A的手上放置一台虚拟直升机。
下面将从终端的角度,对本发明中对象识别的方法进行介绍,请参阅图4,本发明实施例中对象识别的方法一个实施例包括:
301、获取第一图像中第一对象的关键点集合;
本实施例中,终端用于对多帧图像中的第一对象生成增强信息图像,所述多帧图像包含第一图像以及第二图像,所述第二图像为所述第一图像之后相邻的一帧图像。具体地,第一对象可以是指多人互动场景中的人物。
为了便于理解,请参阅图5,图5为本发明实施例中单帧图像内获取关键点集合的一个示意图,如图所示,输入单张静态图像(第一图像)之后,输出所有人的姿态,一个人的姿态包含N个预先定义的关键点位置及其对应的连接,比如可以识别到图5有三个对象,且每个对象上的点即为关键点,通常情况下,可以预先定义N个关键点,比如一个关键点对应鼻子,一个关键点对应眼睛,一个关键点对应脖子等。
302、通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
本实施例中,终端采用神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合。这里的神经网络模型具体为OpenPose,此外,还可以是CPM,将第二图像输入至神经网络模型,便可以输出第二图像中多个对象分别对应的第一姿态关键点集合。
303、根据关键点集合和所述第一对象的运动趋势确定所述第一对象在第二图像中的第二姿态关键点集合,第二姿态关键点集合中包含至少一个第二姿态关键点;
本实施例中,终端采用光流法、卡尔曼滤波算法和滑动窗口算法中的至少一种对第一图像中的关键点集合进行预测,从而得到第一对象在第二图像中的第二姿态关键点集合。
需要说明的是,第一对象的运动趋势的确定方式可以参见图2对应实施例中的介绍,此处不再赘述。
304、针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
本实施例中,终端计算目标第一姿态关键点集合中第一姿态关键点集合中至少一个第一姿态关键点,与第二姿态关键点集合中至少一个第二姿态关键点直接的距离,比如,第一姿态关键点为(1.1),第二姿态关键点为(3,3),那么可以采用如下方式进行计算:
Dist² = (3-1)² + (3-1)² = 8，即 Dist ≈ 2.828，Dist表示目标距离。
305、若所述目标距离满足预设条件,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合。
本实施例中,预设条件用于判断目标第一姿态关键点集合与第二姿态关键点集合是否相似,如果目标距离满足预设条件,则认为目标第一姿态关键点集合与第二姿态关键点集合相似,也就是认为目标第一姿态关键点集合是第二图像中第一对象的关键点集合,目前识别到的对象是第一对象。在一种实现方式中,预设条件可以为目标距离小于预设门限。
若目标距离不满足预设条件,则认为目标第一姿态关键点集合不是第二图像中第一对象的关键点集合,目前识别到的对象不是第一对象。
可以理解的是,目标距离通常指欧氏距离,也可以是其他距离,比如曼哈顿距离,此处不做限定。
本发明实施例中,提供了一种对象识别的方法,该方法应用于终端,所述终端用于对多帧图像中的第一对象生成增强信息图像,多帧图像包含第一图像以及第二图像,第二图像为第一图像之后相邻的一帧图像。首先终端获取第一图像中第一对象的关键点集合,然后通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,此外,终端也需要根据关键点集合和第一对象的运动趋势确定第一对象在第二图像中的第二姿态关键点集合,第二姿态关键点集合可以反映出第一对象在第二图像中可能的运动姿态,可以作为确定哪个第一姿态关键点集合是第一对象的关键点集合的判断依据。接下来,针对多 个第一姿态关键点集合中的任一个目标第一姿态关键点集合,终端根据目标第一姿态关键点集合中至少一个第一姿态关键点以及至少一个第二姿态关键点,确定第一姿态关键点集合与第二姿态关键点集合之间的目标距离,若目标距离满足预设条件,将目标第一姿态关键点集合作为所述第二图像中第一对象的关键点集合。通过上述方式,在多人互动的场景下,通过将第二姿态关键点集合作为确定哪个第一姿态关键点集合是第一对象的关键点集合的判断依据,终端能够在视频流中识别出属于同一个对象的关键点集合,从而提升了识别的准确率。
本实施例中,多人估计方法一般分为两大类,即自顶向下和自底向上,其中自顶向下是指先把人框出来,其中,人即为本实施例所提到的对象。然后再用单人的方法去定位人的关节,而自底向上的方法是先把所有关节位置确定出来,然后再区分关节属于哪个人。本发明主要采用自底向上的方式对一帧图像中的对象进行识别。
如果目标距离小于预设门限,那么终端就认为目标第一姿态关键点集合属于第一对象,还可以进一步利用AR技术,对第一对象与目标虚拟物进行叠加,生成AR信息,并在终端显示界面上予以展示。
为了便于介绍,请参阅图6,图6为本发明实施例中识别对象的一个流程示意图,如图所示,在步骤401中,终端对第二图像进行预测,采用姿态估计算法得到第二图像中多个对象分别对应的第一姿态关键点集合(记为集合A),采用人体姿态关键点跟踪算法得到第一对象在第二图像中的第二姿态关键点集合(记为集合B);
在步骤402中,终端先将集合B中所有第二姿态关键点标记为“未使用”;
在步骤403中,对于多个集合A中的任一个目标集合A(目标第一姿态关键点集合)中的每组关键点,计算其与集合B中每组被标记为“未使用”的关键点的距离,对于目标集合A中的每组关键点,记录与其距离最小的集合B中“未使用”的关键点对应的上一帧ID(记为ID_pre)以及其对应的距离,即目标距离;
在步骤404中,判断目标距离是否小于预设门限,若是,则进入步骤406,若否,则进入步骤405;
在步骤405中,距离大于或等于预设门限时,将该组关键点对应的人体姿态的ID标记为一个新的(和之前不冲突的)ID;
在步骤406中,距离小于预设门限时,认为目标第一姿态关键点集合A中的一组关键点和集合B中的一组关键点匹配成功,因此可将目标第一姿态关键点集合A中的这组关键点对应的人体姿态的ID标记为其对应的集合B中这组关键点ID(ID_pre),同时将对集合B中对应的这组的关键点标记为“已使用”;
其次,本发明实施例中,如果目标距离小于预设门限,那么终端确定目标第一姿态关键点集合属于第一对象。通过上述方式,在多人互动的场景下,终端能够在视频流中识别出属于同一个对象的关键点集合,从而提升了识别的准确率。
可选地,在上述图4对应的实施例的基础上,本发明实施例提供的对象识别的方法第二个可选实施例中,通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,可以包括:
通过所述神经网络模型获取所述第二图像中所有关键点的热力图,所述热力图包含概率热力图以及基于向量的热力图;
通过所述热力图确定所述第二图像中多个对象分别对应的第一姿态关键点集合。
本实施例中,可以采用神经网络模型来预测第二图像中多个对象分别对应的第一姿态关键点集合。具体地,神经网络模型可以为OpenPose,通过OpenPose预测第二图像中所有姿态关键点对应的概率热力图及对应的基于向量热力图(part affinity fields,PAF)热力图,然后再通过后处理算法将所有姿态关键点连接成每个人的姿态。检测过程为输入一帧图像,然后得到概率热力图和PAF,然后根据PAF生成一系列的偶匹配,由于PAF自身的矢量性,使得生成的偶匹配很正确,最终合并为一个人的整体骨架。
其次,本发明实施例中,终端利用神经网络模型获取第二图像中所有关键点的热力图,然后通过热力图预测第一姿态关键点集合。通过上述方式,采用类似OpenPose的神经网络模型预测第一姿态关键点集合具有较好的可靠性,且这类神经网络模型的运行速度较快,即时同一帧图像中有较多对象也不会增大姿态关键点集合预测的难度。
可选地,在上述图4对应的实施例的基础上,本发明实施例提供的对象识别的方法第三个可选实施例中,根据关键点集合和所述第一对象的运动趋势确定所述第一对象在第二图像中的第二姿态关键点集合,可以包括:
采用预设算法对关键点集合和所述第一对象的运动趋势进行计算,得到第一对象在第二图像中的第二姿态关键点集合,其中,预设算法为光流算法、卡尔曼滤波算法和滑动窗口算法中的至少一种。
本实施例中,终端可以采用光流算法、卡尔曼滤波算法和滑动窗口算法中的至少一种计算第二姿态关键点集合。举个例子,若第一图像中第一对象的关键点集合包括3个关键点,分别为(1,1)、(2,2)和(3,3),采用滑动窗口算法预测关键点集合的位置为线性变化,那么第一图像中的关键点(1,1)在第二图像中的位置为(2,2),第一图像中的关键点(2,2)在第二图像中的位置为(3,3),第一图像中的关键点(3,3)在第二图像中的位置为(4,4)。
当然,在实际应用中,还可以采用光流算法和/或卡尔曼滤波算法。
光流算法可以是基于以下假设,图像灰度分布的变化即像素变化是目标或者场景的运动引起的,可以反映出第一对象的运动趋势。也就是说,目标与场景的灰度不随时间变化。这使得光流方法抗噪声能力较差,其应用范围一般局限于目标与场景的灰度保持不变这个假设条件下。
卡尔曼滤波算法是一种利用线性系统状态方程,通过系统输入输出观测数据对系统状态进行最优估计的算法。由于观测数据中包括系统中的噪声和干扰的影响,所以最优估计也可看作是滤波过程。卡尔曼滤波算法不要求信号和噪声都是平稳过程的假设条件。对于每个时刻的系统扰动和观测误差,只要对它们的统计性质作某些适当的假定,通过对含有噪声的观测信号进行处理,就能在平均的意义上,求得误差为最小的真实信号的估计值。在图像处理方面,应用卡尔曼滤波算法对由于某些噪声影响而造成模糊的图像进行复原,在对噪声作了某些统计性质的假定后,就可以用卡尔曼的算法以递推的方式从模糊图像中得到均方差最小的真实图像,使模糊的图像得到复原。
其次,本发明实施例中,可以采用光流算法、卡尔曼滤波算法和滑动窗口算法中的至少一种,对关键点集合和第一对象的运动趋势进行计算,得到第一对象在第二图像中的第二姿态关键点集合。通过上述方式,能够在多帧图像中对同一对象的关键点集合进行跟踪,并得到该对象在下一帧图像中的第二姿态关键点集合,光流算法、卡尔曼滤波算法和滑动窗口算法都是计算量较小的算法,从而提升了关键点集合跟踪的效率。
可选地,在上述图4以及图4对应的第一至第三个实施例中任一项的基础上,本发明实施例提供的对象识别的方法第四个可选实施例中,针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离,可以包括:
从至少一个第一姿态关键点中获取第一目标关键点的位置信息,并从至少一个第二姿态关键点中获取第二目标关键点的位置信息,第二目标关键点是与第一目标关键点之间直线距离最小的一个关键点;
根据第一目标关键点的位置信息以及第二目标关键点的位置信息计算目标距离。
本实施例中,终端可以从目标第一姿态关键点集合中至少一个第一姿态关键点中获取一个第一目标关键点的位置信息,并从至少一个第二姿态关键点中获取一个第二目标关键点的位置信息。假设,目标第一姿态关键点集合中包括2个第一姿态关键点(即a点和b点),第二姿态关键点集合中包括2个第二姿态关键点(即A点和B点),其中,a点和A点均为头部关键点,b点和B点均为脖子关键点。首先,终端根据a点到A点之间的距离和b点到B点之间的距离,选择一条最短的路径,比如a点到A点之间的距离为10,b点到B点之间的距离为20,那么目标距离即为10。
再次,本发明实施例中,提供了一种利用最小值法计算目标距离的方式,即计算两组姿态关键点中距离最近的两个关键点之间的距离,该距离即为目标距离。通过上述方式,在计算目标距离时只需计算一组关键点之间的距离即可,无需进行多次计算,有利于提升处理效率,节省计算资源,从而提升方案的实用性。
可选地,在上述图4以及图4对应的第一至第三个实施例中任一项的基础上,本发明实施例提供的对象识别的方法第五个可选实施例中,针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离,可以包括:
获取至少一个第一姿态关键点中各个第一目标关键点的位置信息,并获取至少一个第二姿态关键点中各个第二目标关键点的位置信息,其中,每个第一目标关键点与每个第二目标关键点之间分别具有一一对应关系;
根据所述各个第一目标关键点的位置信息以及分别对应的第二目标关键点的位置信息,计算所述每个第一目标关键点与其所对应的第二目标关键点之间的最小距离;
计算至少一个最小距离的平均值,得到目标距离。
本实施例中,终端可以获取目标第一姿态关键点集合中至少一个第一姿态关键点中各个第一目标关键点的位置信息,以及至少一个第二姿态关键点中各个第二目标关键点的位置信息。假设,目标第一姿态关键点集合中包括2个第一姿态关键点(即a点和b点),第二姿态关键点集合中包括2个第二姿态关键点(即A点和B点),其中,a点和A点均为头部关键点,b点和B点均为脖子关键点。首先,终端计算头部关键点之间的最小距离,即a点到A点之间的距离,假设计算得到的距离为10,然后计算脖子关键点之间的最小距离,即b点到B点之间的距离,假设计算得到的距离为20。于是,终端将对这两个距离进行平均,并计算得到目标距离为15。
再次,本发明实施例中,提供了一种利用平均值法计算目标距离的方式,即将一组姿态关键点集合中的关键点与另一组姿态关键点集合中的关键点进行两两匹配计算,并得到多个最小距离,然后取所有最小距离的平均值,即可确定目标距离。通过上述方式,在计算目标距离时采用平均值法具有更高的可靠性,从而有利于提升方案的可行性和可操作性。
可选地,在上述图4对应的实施例的基础上,本发明实施例提供的对象识别的方法第六个可选实施例中,第一对象与第一对象标识具有唯一对应关系;第一对象标识用于在多帧图像中标识第一对象。
本实施例中,还可以对不同的对象进行标识,且每个对象只对应唯一一个标识,标识之间是不具有重复性的。
比如,在第一帧图像中有4个对象,分别为甲、乙、丙和丁,这个时候每个对象与其对应的标识关系如表1所示。
表1
对象 对象标识
ID-1
ID-2
ID-3
ID-4
在第二帧图像中的对象与其对应的标识关系如表2所示。
表2
对象 对象标识
ID-1
ID-2
ID-3
ID-4
ID-5
ID-6
需要说明的是,在给对象进行标识赋值的时候,可以按照从左到右赋值的规则对不同的对象进行赋值,也可以按照置信度从大到小赋值的规则对不同的对象进行赋值,此处不做限定。
其次,本发明实施例中,采用不同的标识来标识不同的对象,在多帧图像中通过标识即可确定是否属于同一个对象。通过上述方式,可以直接根据标识确定对象的唯一性,并对唯一的对象进行相应的处理,从而提升方案的实用性和可行性。
下面对本发明中的终端进行详细描述,请参阅图7,图7为本发明实施例中终端一个实施例示意图,终端用于对多帧图像中的第一对象生成增强信息图像,多帧图像包含第一图像以及第二图像,第二图像为第一图像之后相邻的一帧图像,终端50包括:
获取模块501,用于获取所述第一图像中第一对象的关键点集合;
所述获取模块501,用于通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
所述获取模块501,用于根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
确定模块502,用于针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述获取模块501获取的目标第一姿态关键点集合中所述至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
所述确定模块502,用于若所述目标距离满足预设条件,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合。
本实施例中,获取模块501获取所述第一图像中第一对象的关键点集合,所述获取模块501通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述获取模块501根据所述关键点集合和第一对象的运动趋势确定第一对象在所述第二图像中的第二姿态关键点集合,针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,确定模块502根据所述获取模块501获取的目标第一姿态关键点集合中至少一个第一姿态关键点以及第二姿态关键点集合中至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离,当所述目标距离满足预设条件,所述确定模块502将目标第一姿态关键点集合作为所述第二图像中第一对象的关键点集合。
本发明实施例中,提供了一种终端,首先终端获取第一图像中第一对象的关键点集合,然后通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,此外,终端也需要根据关键点集合和第一对象的运动趋势确定第一对象在第二图像中的第二姿态关键点集合,第二姿态关键点集合可以反映出第一对象在第二图像中可能的运动姿态,可以作为确定哪个第一姿态关键点集合是第一对象的关键点集合的判断依据。接下来,针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,终端根据目标第一姿态关键点集合中至少一个第一姿态关键点以及至少一个第二姿态关键点,确定第一姿态关键点集合与第二姿态关键点集合之间的目标距离,若目标距离满足预设条件,将目标第一姿态关键点集合作为所述第二图像中第一对象的关键点集合。通过上述方式,在多人互动的场景下,通过将第二姿态关键点集合作为确定哪个第一姿态关键点集合是第一对象的关键点集合的判断依据,终端能够在视频流中识别出属于同一个对象的关键点集合,从而提升了识别的准确率。
可选地,在上述图7所对应的实施例的基础上,所述预设条件为所述目标距离小于所述预设门限。
本发明实施例中,如果目标距离小于预设门限,那么终端确定目标第一姿态关键点集合属于第一对象。通过上述方式,在多人互动的场景下,终端能够在视频流中识别出属于同一个对象的关键点集合,从而提升了识别的准确率。
可选地,在上述图7所对应的实施例的基础上,本发明实施例提供的终端50的另一实施例中,
所述获取模块501,具体用于通过所述神经网络模型获取所述第二图像中所有关键点的热力图,所述热力图包含概率热力图以及基于向量的热力图;
通过所述热力图确定所述第二图像中多个对象分别对应的第一姿态关键点集合。
其次,本发明实施例中,终端利用神经网络模型获取第二图像中所有关键点的热力图,然后通过热力图预测第二图像中多个对象分别对应的第一姿态关键点集合。通过上述方式,采用类似OpenPose的神经网络模型预测第一姿态关键点集合具有较好的可靠性,且这类神经网络模型的运行速度较快,即时同一帧图像中有较多对象也不会增大姿态关键点集合预测的难度。
可选地,在上述图7所对应的实施例的基础上,本发明实施例提供的终端50的另一实施例中,
所述获取模块501,具体用于采用预设算法对所述关键点集合和所述第一对象的运动趋势进行计算,得到所述第二姿态关键点集合,所述预设算法为光流算法、卡尔曼滤波算法和滑动窗口算法中的至少一种。
其次,本发明实施例中,可以采用光流算法、卡尔曼滤波算法和滑动窗口算法中的至少一种,对关键点集合和所述第一对象的运动趋势进行计算,得到第一对象在第二图像中的第二姿态关键点集合。通过上述方式,能够在多帧图像中对某个对象的关键点集合进行跟踪,并得到该对象在下一帧图像中第二姿态关键点集合,光流算法、卡尔曼滤波算法和滑动窗口算法都是计算量较小的算法,从而提升了关键点集合跟踪的效率。
可选地,在上述图7所对应的实施例的基础上,本发明实施例提供的终端50的另一实施例中,
所述确定模块502,具体用于从所述至少一个第一姿态关键点中获取第一目标关键点的位置信息,并从所述至少一个第二姿态关键点中获取第二目标关键点的位置信息,所述第二目标关键点是与所述第一目标关键点之间直线距离最小的一个关键点;
根据所述第一目标关键点的位置信息以及所述第二目标关键点的位置信息计算所述目标距离。
再次,本发明实施例中,提供了一种利用最小值法计算目标距离的方式,即计算两组姿态关键点中距离最近的两个关键点之间的距离,该距离即为目标距离。通过上述方式,在计算目标距离时只需计算一组关键点之间的距离即可,无需进行多次计算,有利于提升处理效率,节省计算资源,从而提升方案的实用性。
可选地,在上述图7所对应的实施例的基础上,本发明实施例提供的终端50的另一实施例中,
所述确定模块502,具体用于获取所述至少一个第一姿态关键点中各个第一目标关键点的位置信息,并获取所述至少一个第二姿态关键点中各个第二目标关键点的位置信息,其中,每个第一目标关键点与每个第二目标关键点之间分别具有一一对应关系;
根据所述各个第一目标关键点的位置信息以及分别对应的第二目标关键点的位置信息,计算所述每个第一目标关键点与其所对应的第二目标关键点之间的最小距离;
计算至少一个所述最小距离的平均值,得到所述目标距离。
再次,本发明实施例中,提供了一种利用平均值法计算目标距离的方式,即将一组姿态关键点集合中的关键点与另一组姿态关键点集合中的关键点进行两两匹配计算,并得到多个最小距离,然后取所有最小距离的平均值,即可确定目标距离。通过上述方式,在计算目标距离时采用平均值法具有更高的可靠性,从而有利于提升方案的可行性和可操作性。
可选地,在上述图7所对应的实施例的基础上,本发明实施例提供的终端50的另一实施例中,
所述第一对象与第一对象标识具有唯一对应关系;
所述第一对象标识用于在所述多帧图像中标识所述第一对象。
其次,本发明实施例中,采用不同的标识来标识不同的对象,在多帧图像中通过标识即可确定是否属于同一个对象。通过上述方式,可以直接根据标识确定对象的唯一性,并对唯一的对象进行相应的处理,从而提升方案的实用性和可行性。
本发明实施例还提供了另一种终端,如图8所示,为了便于说明,仅示出了与本发明实施例相关的部分,具体技术细节未揭示的,请参照本发明实施例方法部分。该终端可以为包括手机、平板电脑、个人数字助理(Personal Digital Assistant,PDA)、销售终端(Point of Sales,POS)、车载电脑等任意终端设备,以终端为手机为例:
图8示出的是与本发明实施例提供的终端相关的手机的部分结构的框图。参考图8,手机包括:射频(Radio Frequency,RF)电路610、存储器620、输入单元630、显示单元640、传感器650、音频电路660、无线保真(wireless fidelity,WiFi)模块670、处理器680、以及电源690等部件。本领域技术人员可以理解,图8中示出的手机结构并不构成对手机的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
下面结合图8对手机的各个构成部件进行具体的介绍:
RF电路610可用于收发信息或通话过程中,信号的接收和发送,特别地,将基站的下行信息接收后,给处理器680处理;另外,将设计上行的数据发送给基站。通常,RF电路610包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(Low Noise Amplifier,LNA)、双工器等。此外,RF电路610还可以通过无线通信与网络和其他设备通信。上述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(Global System of Mobile communication,GSM)、通用分组无线服务(General Packet Radio  Service,GPRS)、码分多址(Code Division Multiple Access,CDMA)、宽带码分多址(Wideband Code Division Multiple Access,WCDMA)、长期演进(Long Term Evolution,LTE)、电子邮件、短消息服务(Short Messaging Service,SMS)等。
存储器620可用于存储软件程序以及模块,处理器680通过运行存储在存储器620的软件程序以及模块,从而执行手机的各种功能应用以及数据处理。存储器620可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器620可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
输入单元630可用于接收输入的数字或字符信息,以及产生与手机的用户设置以及功能控制有关的键信号输入。具体地,输入单元630可包括触控面板631以及其他输入设备632。触控面板631,也称为触摸屏,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板631上或在触控面板631附近的操作),并根据预先设定的程式驱动相应的连接装置。可选的,触控面板631可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器680,并能接收处理器680发来的命令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板631。除了触控面板631,输入单元630还可以包括其他输入设备632。具体地,其他输入设备632可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。
显示单元640可用于显示由用户输入的信息或提供给用户的信息以及手机的各种菜单。显示单元640可包括显示面板641,可选的,可以采用液晶显示器(Liquid Crystal Display,LCD)、有机发光二极管(Organic Light-Emitting Diode,OLED)等形式来配置显示面板641。进一步的,触控面板631可覆盖显示面板641,当触控面板631检测到在其上或附近的触摸操作后,传送给处理器680以确定触摸事件的类型,随后处理器680根据触摸事件的类型在显示面板641上提供相应的视觉输出。虽然在图8中,触控面板631与显示面板641是作为两个独立的部件来实现手机的输入和输入功能,但是在某些实施例中,可以将触控面板631与显示面板641集成而实现手机的输入和输出功能。
手机还可包括至少一种传感器650,比如光传感器、运动传感器以及其他传感器。具体地,光传感器可包括环境光传感器及接近传感器,其中,环境光传感器可根据环境光线的明暗来调节显示面板641的亮度,接近传感器可在手机移动到耳边时,关闭显示面板641和/或背光。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别手机姿态的应用(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;至于手 机还可配置的陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器,在此不再赘述。
音频电路660、扬声器661,传声器662可提供用户与手机之间的音频接口。音频电路660可将接收到的音频数据转换后的电信号,传输到扬声器661,由扬声器661转换为声音信号输出;另一方面,传声器662将收集的声音信号转换为电信号,由音频电路660接收后转换为音频数据,再将音频数据输出处理器680处理后,经RF电路610以发送给比如另一手机,或者将音频数据输出至存储器620以便进一步处理。
WiFi属于短距离无线传输技术,手机通过WiFi模块670可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互联网访问。虽然图8示出了WiFi模块670,但是可以理解的是,其并不属于手机的必须构成,完全可以根据需要在不改变发明的本质的范围内而省略。
处理器680是手机的控制中心,利用各种接口和线路连接整个手机的各个部分,通过运行或执行存储在存储器620内的软件程序和/或模块,以及调用存储在存储器620内的数据,执行手机的各种功能和处理数据,从而对手机进行整体监控。可选的,处理器680可包括一个或多个处理单元;可选的,处理器680可集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器680中。
手机还包括给各个部件供电的电源690(比如电池),可选的,电源可以通过电源管理系统与处理器680逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
尽管未示出,手机还可以包括摄像头、蓝牙模块等,在此不再赘述。
在本发明实施例中,该终端所包括的处理器680还具有以下功能:
获取所述第一图像中第一对象的关键点集合;
通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
若所述目标距离小于所述预设门限,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合;
根据所述第二图像中所述第一对象的关键点集合生成增强信息图像。
可选地,处理器680还用于执行如下步骤:
获取所述第一图像中第一对象的关键点集合;
通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
若所述目标距离满足预设条件,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合。
可选的,所述预设条件为所述目标距离小于预设门限。
可选地,处理器680具体用于执行如下步骤:
通过所述神经网络模型获取所述第二图像中所有关键点的热力图,所述热力图包含概率热力图以及基于向量的热力图;
通过所述热力图确定所述第二图像中多个对象分别对应的第一姿态关键点集合。
可选地,处理器680具体用于执行如下步骤:
采用预设算法对所述关键点集合和所述第一对象的运动趋势进行计算,得到所述第二姿态关键点集合,所述预设算法为光流算法、卡尔曼滤波算法和滑动窗口算法中的至少一种。
可选的,所述第一对象的运动趋势的确定方式包括:
根据所述第一图像确定所述第一对象的运动趋势。
可选的,所述第一对象的运动趋势的确定方式包括:
根据所述第一图像和所述第二图像之间的像素变化确定所述第一对象的运动趋势。
可选地,处理器680具体用于执行如下步骤:
从所述至少一个第一姿态关键点中获取第一目标关键点的位置信息,并从所述至少一个第二姿态关键点中获取第二目标关键点的位置信息,所述第二目标关键点是与所述第一目标关键点之间直线距离最小的一个关键点;
根据所述第一目标关键点的位置信息以及所述第二目标关键点的位置信息计算所述目标距离。
可选地,处理器680具体用于执行如下步骤:
获取所述至少一个第一姿态关键点中各个第一目标关键点的位置信息,并获取所述至少一个第二姿态关键点中各个第二目标关键点的位置信息,其中,每个第一目标关键点与每个第二目标关键点之间分别具有一一对应关系;
根据所述各个第一目标关键点的位置信息以及分别对应的第二目标关键点的位置信息,计算所述每个第一目标关键点与其所对应的第二目标关键点之间的最小距离;
计算至少一个所述最小距离的平均值,得到所述目标距离。
本发明实施例还提供一种计算机程序产品,包括指令,当其在计算机上运行时,使得计算机执行前述实施例任一项增强现实的处理方法或对象识别的方法。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本发明所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。

Claims (17)

  1. 一种增强现实的处理方法,所述方法应用于终端,所述终端用于对多帧图像中的第一对象生成增强信息图像,所述多帧图像包含第一图像以及第二图像,所述第二图像为所述第一图像之后相邻的一帧图像,所述方法包括:
    获取所述第一图像中第一对象的关键点集合;
    通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
    根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
    针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
    若所述目标距离小于所述预设门限,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合;
    根据所述第二图像中所述第一对象的关键点集合生成增强信息图像。
  2. 根据权利要求1所述的方法,所述方法还包括:
    生成增强现实图像,所述增强现实图像包括所述第二图像以及所述增强信息图像。
  3. 根据权利要求1所述的方法,所述第一对象的运动趋势的确定方式包括:
    根据所述第一图像确定所述第一对象的运动趋势。
  4. 根据权利要求1所述的方法,所述第一对象的运动趋势的确定方式包括:
    根据所述第一图像和所述第二图像之间的像素变化确定所述第一对象的运动趋势。
  5. 一种对象识别的方法,所述方法应用于终端,所述终端用于对多帧图像中的第一对象生成增强信息图像,所述多帧图像包含第一图像以及第二图像,所述第二图像为所述第一图像之后相邻的一帧图像,所述方法包括:
    获取所述第一图像中第一对象的关键点集合;
    通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
    根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
    针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
    若所述目标距离满足预设条件,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合。
  6. 根据权利要求5所述的方法,
    所述预设条件为所述目标距离小于预设门限。
  7. 根据权利要求5所述的方法,所述通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,包括:
    通过所述神经网络模型获取所述第二图像中所有关键点的热力图,所述热力图包含概率热力图以及基于向量的热力图;
    通过所述热力图确定所述第二图像中多个对象分别对应的第一姿态关键点集合。
  8. 根据权利要求5所述的方法,所述根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,包括:
    采用预设算法对所述关键点集合和所述第一对象的运动趋势进行计算,得到所述第二姿态关键点集合,所述预设算法为光流算法、卡尔曼滤波算法和滑动窗口算法中的至少一种。
  9. 根据权利要求5所述的方法,所述第一对象的运动趋势的确定方式包括:
    根据所述第一图像确定所述第一对象的运动趋势。
  10. 根据权利要求5所述的方法,所述第一对象的运动趋势的确定方式包括:
    根据所述第一图像和所述第二图像之间的像素变化确定所述第一对象的运动趋势。
  11. 根据权利要求5至10中任一项所述的方法,所述针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离,包括:
    从所述至少一个第一姿态关键点中获取第一目标关键点的位置信息,并从所述至少一个第二姿态关键点中获取第二目标关键点的位置信息,所述第二目标关键点是与所述第一目标关键点之间直线距离最小的一个关键点;
    根据所述第一目标关键点的位置信息以及所述第二目标关键点的位置信息计算所述目标距离。
  12. 根据权利要求5至10中任一项所述的方法,所述针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离,包括:
    获取所述至少一个第一姿态关键点中各个第一目标关键点的位置信息,并获取所述至少一个第二姿态关键点中各个第二目标关键点的位置信息,其中,每个第一目标关键点与每个第二目标关键点之间分别具有一一对应关系;
    根据所述各个第一目标关键点的位置信息以及分别对应的第二目标关键点的位置信息,计算所述每个第一目标关键点与其所对应的第二目标关键点之间的最小距离;
    计算至少一个所述最小距离的平均值,得到所述目标距离。
  13. 根据权利要求5所述的方法,
    所述第一对象与第一对象标识具有唯一对应关系;
    所述第一对象标识用于在所述多帧图像中标识所述第一对象。
  14. 一种终端,所述终端用于对多帧图像中的第一对象生成增强信息图像,所述多帧图像包含第一图像以及第二图像,所述第二图像为所述第一图像之后相邻的一帧图像,所述终端包括:
    获取模块,用于获取所述第一图像中第一对象的关键点集合;
    所述获取模块,用于通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
    所述获取模块,用于根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
    确定模块,用于针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
    所述确定模块,用于若所述目标距离满足预设条件,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合。
  15. 一种终端,包括:存储器、处理器以及总线系统;
    其中,所述存储器用于存储程序;
    所述处理器用于执行所述存储器中的程序,具体包括如下步骤:
    获取所述第一图像中第一对象的关键点集合;
    通过神经网络模型获取第二图像中多个对象分别对应的第一姿态关键点集合,所述神经网络模型用于获取对象在图像中的关键点集合,所述第一姿态关键点集合中包含至少一个第一姿态关键点;
    根据所述关键点集合和所述第一对象的运动趋势确定所述第一对象在所述第二图像中的第二姿态关键点集合,所述第二姿态关键点集合中包含至少一个第二姿态关键点;
    针对多个第一姿态关键点集合中的任一个目标第一姿态关键点集合,根据所述目标第一姿态关键点集合中至少一个第一姿态关键点以及所述至少一个第二姿态关键点,确定所述目标第一姿态关键点集合与所述第二姿态关键点集合之间的目标距离;
    若所述目标距离满足预设条件,将所述目标第一姿态关键点集合作为所述第二图像中所述第一对象的关键点集合;
    所述总线系统用于连接所述存储器以及所述处理器,以使所述存储器以及所述处理器进行通信。
  16. 一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如权利要求1至13中任一项所述的方法。
  17. 一种计算机程序产品,包括指令,当其在计算机上运行时,使得计算机执行如权利要求1至13任一项所述的方法。
PCT/CN2018/120301 2017-12-13 2018-12-11 一种增强现实的处理方法、对象识别的方法及相关设备 WO2019114696A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18887537.1A EP3617995A4 (en) 2017-12-13 2018-12-11 AUGMENTED REALITY PROCESSING PROCESS, OBJECT RECOGNITION PROCESS AND ASSOCIATED APPARATUS
US16/680,058 US10891799B2 (en) 2017-12-13 2019-11-11 Augmented reality processing method, object recognition method, and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711329984.8 2017-12-13
CN201711329984.8A CN109918975B (zh) 2017-12-13 2017-12-13 一种增强现实的处理方法、对象识别的方法及终端

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/680,058 Continuation US10891799B2 (en) 2017-12-13 2019-11-11 Augmented reality processing method, object recognition method, and related device

Publications (1)

Publication Number Publication Date
WO2019114696A1 true WO2019114696A1 (zh) 2019-06-20

Family

ID=66819994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/120301 WO2019114696A1 (zh) 2017-12-13 2018-12-11 一种增强现实的处理方法、对象识别的方法及相关设备

Country Status (4)

Country Link
US (1) US10891799B2 (zh)
EP (1) EP3617995A4 (zh)
CN (1) CN109918975B (zh)
WO (1) WO2019114696A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598675A (zh) * 2019-09-24 2019-12-20 深圳度影医疗科技有限公司 一种超声胎儿姿态的识别方法、存储介质及电子设备
CN110705390A (zh) * 2019-09-17 2020-01-17 平安科技(深圳)有限公司 基于lstm的形体姿态识别方法、装置及存储介质
CN110866515A (zh) * 2019-11-22 2020-03-06 三一重工股份有限公司 厂房内对象行为识别方法、装置以及电子设备
CN113031464A (zh) * 2021-03-22 2021-06-25 北京市商汤科技开发有限公司 设备控制方法、装置、电子设备及存储介质

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918975B (zh) * 2017-12-13 2022-10-21 腾讯科技(深圳)有限公司 一种增强现实的处理方法、对象识别的方法及终端
US10937185B2 (en) * 2018-12-03 2021-03-02 Everseen Limited System and method to detect articulate body pose
US11429842B2 (en) * 2019-02-24 2022-08-30 Microsoft Technology Licensing, Llc Neural network for skeletons from input images
CN109977847B (zh) * 2019-03-22 2021-07-16 北京市商汤科技开发有限公司 图像生成方法及装置、电子设备和存储介质
US10996664B2 (en) * 2019-03-29 2021-05-04 Mitsubishi Electric Research Laboratories, Inc. Predictive classification of future operations
CN110264499A (zh) * 2019-06-26 2019-09-20 北京字节跳动网络技术有限公司 基于人体关键点的交互位置控制方法、装置及电子设备
CN110443190B (zh) * 2019-07-31 2024-02-02 腾讯科技(成都)有限公司 一种对象识别方法和装置
CN110675447A (zh) * 2019-08-21 2020-01-10 电子科技大学 一种基于可见光相机与热像仪结合的人数统计方法
US11475590B2 (en) * 2019-09-12 2022-10-18 Nec Corporation Keypoint based pose-tracking using entailment
CN111881705B (zh) * 2019-09-29 2023-12-12 深圳数字生命研究院 数据处理、训练、识别方法、装置和存储介质
CN110782404B (zh) * 2019-10-11 2022-06-10 北京达佳互联信息技术有限公司 一种图像处理方法、装置及存储介质
EP3819812B1 (en) * 2019-11-08 2023-08-16 Axis AB A method of object re-identification
CN111428665B (zh) * 2020-03-30 2024-04-12 咪咕视讯科技有限公司 一种信息确定方法、设备及计算机可读存储介质
CN111428672A (zh) * 2020-03-31 2020-07-17 北京市商汤科技开发有限公司 交互对象的驱动方法、装置、设备以及存储介质
CN111539992A (zh) * 2020-04-29 2020-08-14 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备和存储介质
CN111709428B (zh) * 2020-05-29 2023-09-15 北京百度网讯科技有限公司 图像中关键点位置的识别方法、装置、电子设备及介质
US11276201B1 (en) 2020-06-01 2022-03-15 Snap Inc. Localizing an augmented reality device
US11748954B2 (en) * 2020-06-01 2023-09-05 Snap Inc. Tracking an augmented reality device
CN111832526B (zh) * 2020-07-23 2024-06-11 浙江蓝卓工业互联网信息技术有限公司 一种行为检测方法及装置
CN111918114A (zh) * 2020-07-31 2020-11-10 北京市商汤科技开发有限公司 图像显示方法、装置、显示设备及计算机可读存储介质
CN112308977B (zh) * 2020-10-29 2024-04-16 字节跳动有限公司 视频处理方法、视频处理装置和存储介质
CN112270302A (zh) * 2020-11-17 2021-01-26 支付宝(杭州)信息技术有限公司 肢体控制方法、装置和电子设备
CN113326778B (zh) * 2021-05-31 2022-07-12 中科计算技术西部研究院 基于图像识别的人体姿态检测方法、装置和存储介质
CN113780176B (zh) * 2021-09-10 2023-08-25 平安科技(深圳)有限公司 局部遮挡对象识别方法、装置、设备及存储介质
CN115880776B (zh) * 2022-12-13 2023-11-03 北京百度网讯科技有限公司 关键点信息的确定方法和离线动作库的生成方法、装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093212A (zh) * 2013-01-28 2013-05-08 北京信息科技大学 基于人脸检测和跟踪截取人脸图像的方法和装置
CN104992452A (zh) * 2015-06-25 2015-10-21 中国计量学院 基于热成像视频的飞行目标自动跟踪方法
CN105975119A (zh) * 2016-04-21 2016-09-28 北京集创北方科技股份有限公司 多目标追踪方法、触摸屏控制方法及系统
CN106845385A (zh) * 2017-01-17 2017-06-13 腾讯科技(上海)有限公司 视频目标跟踪的方法和装置
JP2017138659A (ja) * 2016-02-01 2017-08-10 トヨタ自動車株式会社 物体追跡方法、物体追跡装置、およびプログラム
CN107038713A (zh) * 2017-04-12 2017-08-11 南京航空航天大学 一种融合光流法和神经网络的运动目标捕捉方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015504616A (ja) * 2011-09-26 2015-02-12 マイクロソフト コーポレーション 透過近眼式ディスプレイのセンサ入力に基づく映像表示修正
US9996150B2 (en) * 2012-12-19 2018-06-12 Qualcomm Incorporated Enabling augmented reality using eye gaze tracking
US10852838B2 (en) * 2014-06-14 2020-12-01 Magic Leap, Inc. Methods and systems for creating virtual and augmented reality
US10339389B2 (en) * 2014-09-03 2019-07-02 Sharp Laboratories Of America, Inc. Methods and systems for vision-based motion estimation
CN116778367A (zh) * 2016-06-03 2023-09-19 奇跃公司 增强现实身份验证
CN109918975B (zh) * 2017-12-13 2022-10-21 腾讯科技(深圳)有限公司 一种增强现实的处理方法、对象识别的方法及终端

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093212A (zh) * 2013-01-28 2013-05-08 北京信息科技大学 基于人脸检测和跟踪截取人脸图像的方法和装置
CN104992452A (zh) * 2015-06-25 2015-10-21 中国计量学院 基于热成像视频的飞行目标自动跟踪方法
JP2017138659A (ja) * 2016-02-01 2017-08-10 トヨタ自動車株式会社 物体追跡方法、物体追跡装置、およびプログラム
CN105975119A (zh) * 2016-04-21 2016-09-28 北京集创北方科技股份有限公司 多目标追踪方法、触摸屏控制方法及系统
CN106845385A (zh) * 2017-01-17 2017-06-13 腾讯科技(上海)有限公司 视频目标跟踪的方法和装置
CN107038713A (zh) * 2017-04-12 2017-08-11 南京航空航天大学 一种融合光流法和神经网络的运动目标捕捉方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3617995A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705390A (zh) * 2019-09-17 2020-01-17 平安科技(深圳)有限公司 基于lstm的形体姿态识别方法、装置及存储介质
CN110598675A (zh) * 2019-09-24 2019-12-20 深圳度影医疗科技有限公司 一种超声胎儿姿态的识别方法、存储介质及电子设备
CN110866515A (zh) * 2019-11-22 2020-03-06 三一重工股份有限公司 厂房内对象行为识别方法、装置以及电子设备
CN110866515B (zh) * 2019-11-22 2023-05-09 盛景智能科技(嘉兴)有限公司 厂房内对象行为识别方法、装置以及电子设备
CN113031464A (zh) * 2021-03-22 2021-06-25 北京市商汤科技开发有限公司 设备控制方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
EP3617995A1 (en) 2020-03-04
US10891799B2 (en) 2021-01-12
CN109918975B (zh) 2022-10-21
EP3617995A4 (en) 2021-02-17
CN109918975A (zh) 2019-06-21
US20200082635A1 (en) 2020-03-12

Similar Documents

Publication Publication Date Title
WO2019114696A1 (zh) 一种增强现实的处理方法、对象识别的方法及相关设备
CN111417028B (zh) 信息处理方法、装置、存储介质及电子设备
CN109785368B (zh) 一种目标跟踪方法和装置
CN108234276B (zh) 一种虚拟形象之间互动的方法、终端及系统
WO2019141174A1 (zh) 未读消息的处理方法及移动终端
CN109213732B (zh) 一种改善相册分类的方法、移动终端及计算机可读存储介质
CN108985220B (zh) 一种人脸图像处理方法、装置及存储介质
CN109743504B (zh) 一种辅助拍照方法、移动终端和存储介质
CN111045511B (zh) 基于手势的操控方法及终端设备
CN107885448B (zh) 应用触摸操作的控制方法、移动终端及可读存储介质
CN109495616B (zh) 一种拍照方法及终端设备
CN111079030A (zh) 一种群组搜索方法及电子设备
CN109062485B (zh) 双面屏的显示方法、双面屏终端及计算机可读存储介质
CN109683778B (zh) 一种柔性屏控制方法、设备及计算机可读存储介质
CN109117037B (zh) 一种图像处理的方法及终端设备
CN114398113A (zh) 界面显示方法、智能终端及存储介质
WO2021083086A1 (zh) 信息处理方法及设备
CN112818733B (zh) 信息处理方法、装置、存储介质及终端
WO2024055748A1 (zh) 一种头部姿态估计方法、装置、设备以及存储介质
CN109547696B (zh) 一种拍摄方法及终端设备
WO2023137923A1 (zh) 基于姿态指导的行人重识别方法、装置、设备及存储介质
CN110750318A (zh) 一种消息回复方法、装置及移动终端
CN108809802A (zh) 一种信息的展示方法及移动终端
CN115167748A (zh) 界面控制方法、智能终端及存储介质
CN111026562B (zh) 一种消息发送方法及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18887537

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018887537

Country of ref document: EP

Effective date: 20191125

NENP Non-entry into the national phase

Ref country code: DE