WO2022088819A1 - Video processing method, video processing apparatus, and storage medium - Google Patents
Video processing method, video processing apparatus, and storage medium
- Publication number: WO2022088819A1 (PCT/CN2021/110560)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- landmark
- video processing
- subject
- landmark building
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/38—Outdoor scenes
- G06V20/39—Urban scenes
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44012—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
Definitions
- Embodiments of the present disclosure relate to a video processing method, a video processing apparatus, and a storage medium.
- Augmented Reality (AR) technology is a technology that ingeniously integrates virtual information with the real world, making wide use of technical means such as multimedia, 3D modeling, real-time tracking and registration, intelligent interaction, and sensing. AR technology takes computer-generated virtual information such as text, images, 3D models, music, and video, and overlays it onto the real world.
- Short videos are highly social, easy to create, and short in duration, which suits the fragmented content consumption habits of users in the mobile Internet era.
- The unique virtual-real fusion effects of AR technology give it broad application prospects and room for expansion in the short video field.
- Landmark AR special effects are one of the hot spots in the short video field. They can make shooting more fun and encourage users to shoot and record more actively.
- At least one embodiment of the present disclosure provides a video processing method, a video processing apparatus, and a storage medium, which can enhance the interaction between the user and the photographed landmark building, make shooting more fun, give the user a distinctive shooting experience, and encourage the user to shoot and record more actively, thereby broadening the application scope of the product and improving its market competitiveness.
- At least one embodiment of the present disclosure provides a video processing method, wherein a picture of the video includes a landmark building and a moving subject. The method includes: identifying and tracking the landmark building in the video; extracting and tracking key points of the moving subject in the video, and determining the posture of the moving subject according to the extracted key point information; and mapping the key points of the moving subject to the landmark building and driving the landmark building to perform corresponding actions according to the movements of those key points, so that the posture of the landmark building in the picture of the video corresponds to the posture of the moving subject.
- For example, before the landmark building is driven to perform the corresponding action according to the movements of the key points of the moving subject, the method further includes: cutting out the landmark building from the picture of the video; completing the background at the cut-out location by a smooth interpolation algorithm, based on the pixels surrounding the landmark building in the picture; and restoring the landmark building to its position after the background is completed.
- For example, corresponding the key points of the moving subject with the landmark building and driving the landmark building to perform corresponding actions according to the movements of those key points includes: mapping the key points of the moving subject onto the landmark building, so that the landmark building follows the movements of the key points and performs the corresponding actions.
- For example, the spine line of the moving subject is mapped onto the central axis of the landmark building.
- The video processing method provided by at least one embodiment of the present disclosure further includes: anthropomorphizing the landmark building by means of predefined animation material maps, so that the landmark building takes on the characteristics of the moving subject.
- For example, a neural network model is used to extract the key points of the moving subject in the picture of the video.
- For example, identifying the landmark building in a picture of the video includes: extracting feature points of the landmark building; and matching the extracted feature points against a building feature point classification model to identify the landmark building.
- For example, tracking the landmark building in the video and tracking the key points of the moving subject in the video include: detecting the landmark building and the key points of the moving subject in each frame of the video, thereby tracking both.
- For example, the key points of one moving subject are mapped to multiple landmark buildings, and the multiple landmark buildings are driven to perform corresponding actions according to the movements of the key points of that subject.
- For example, the key points of multiple moving subjects are respectively mapped to multiple landmark buildings, and each landmark building is driven to perform corresponding actions according to the movements of its corresponding subject, so that the postures of the multiple landmark buildings correspond one-to-one with the postures of the multiple moving subjects.
- The video processing method provided by at least one embodiment of the present disclosure further includes: recording the video in real time by an image capturing device and processing each frame of the video in real time, so that the posture of the landmark building corresponds to the posture of the moving subject.
- the moving subject is a human body.
- At least one embodiment of the present disclosure further provides a video processing apparatus, wherein a picture of the video includes a landmark building and a moving subject. The video processing apparatus includes: an identification unit configured to identify and track the landmark building in the video; an extraction unit configured to extract and track the key points of the moving subject in the video and determine the posture of the moving subject according to the extracted key point information; and a driving unit configured to map the key points of the moving subject to the landmark building and drive the landmark building to perform corresponding actions according to the movements of those key points, so that the posture of the landmark building in the picture of the video corresponds to the posture of the moving subject.
- the video processing apparatus provided in at least one embodiment of the present disclosure further includes: an image capturing apparatus configured to record the video in real time, so as to perform real-time processing on each frame of images in the video.
- the moving subject and the landmark building are located on the same side or on different sides of the image capturing device.
- At least one embodiment of the present disclosure also provides a video processing apparatus, including: a processor; a memory; and one or more computer program modules, wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules include instructions for executing the video processing method provided by any embodiment of the present disclosure.
- At least one embodiment of the present disclosure further provides a storage medium for non-transitory storage of computer-readable instructions, which can execute the video processing method provided by any embodiment of the present disclosure when the computer-readable instructions are executed by a computer.
- FIG. 1A is a flowchart of an example of a video processing method provided by at least one embodiment of the present disclosure
- FIG. 1B is a schematic diagram of a mapping relationship between an activity subject and a landmark building provided by at least one embodiment of the present disclosure
- FIG. 2 is a flowchart of a background completion method provided by at least one embodiment of the present disclosure
- FIG. 3 is a flowchart of a landmark building identification algorithm provided by at least one embodiment of the present disclosure
- FIG. 4 is a schematic diagram of a system that can be used to implement the video processing method provided by the embodiments of the present disclosure
- FIG. 5 is a schematic block diagram of a video processing apparatus according to at least one embodiment of the present disclosure.
- FIG. 6 is a schematic block diagram of another video processing apparatus provided by at least one embodiment of the present disclosure.
- FIG. 7 is a schematic block diagram of still another video processing apparatus provided by at least one embodiment of the present disclosure.
- FIG. 8 is a schematic structural diagram of an electronic device according to at least one embodiment of the present disclosure.
- FIG. 9 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure.
- The term “including” and variations thereof are open-ended inclusions, i.e., “including but not limited to”.
- the term “based on” is “based at least in part on.”
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
- landmark AR special effects can turn buildings into cartoon special effects, allowing users to obtain a unique shooting experience through a high degree of combination of real and virtual effects.
- At least one embodiment of the present disclosure provides a video processing method, wherein a picture of the video includes a landmark building and a moving subject. The method includes: identifying and tracking the landmark building in the video; extracting and tracking the key points of the moving subject in the video, and determining the posture of the moving subject according to the extracted key point information; and mapping the key points of the moving subject to the landmark building and driving the landmark building to perform corresponding actions according to the movements of those key points, so that the posture of the landmark building in the picture of the video corresponds to the posture of the moving subject.
- Some embodiments of the present disclosure also provide a video processing apparatus and a storage medium corresponding to the above-mentioned video processing method.
- The video processing method, video processing apparatus, and storage medium provided by at least one embodiment of the present disclosure can enhance the interaction between the user and the photographed landmark building, make shooting more fun, give the user a distinctive shooting experience, and encourage the user to shoot and record more actively, thereby broadening the application scope of the product and improving its market competitiveness.
- At least one embodiment of the present disclosure provides a video processing method that can be applied, for example, to the short video field to increase the interaction between the user and the photographed landmark and make shooting more fun.
- For example, the video processing method can be implemented in software, hardware, firmware, or any combination thereof, and loaded and executed by a processor in a device such as a mobile phone, digital camera, tablet computer, notebook computer, desktop computer, or network server, to make the posture of the landmark building correspond to the posture of the moving subject, thereby increasing the interaction between the user and the photographed landmark building.
- For example, the video processing method is applicable to a computing device, that is, any electronic device with computing capability, such as a mobile phone, digital camera, notebook computer, tablet computer, desktop computer, or network server. The computing device may include a processing unit with data processing capability and/or instruction execution capability, such as a central processing unit (CPU), a graphics processing unit (GPU), or a digital signal processor (DSP), together with a storage unit and the like. An operating system and an application programming interface (for example, OpenGL (Open Graphics Library) or Metal) are also installed on the computing device, which implements the video processing method by running code or instructions.
- For example, the computing device may further include an output component such as a display component, for example a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, or a Quantum Dot Light Emitting Diode (QLED) display.
- the embodiments of the present disclosure are not limited thereto.
- For example, the display component can display the processed video, in which the posture of the landmark building corresponds to the posture of the moving subject, for example by being the same as or complementary to it, thereby increasing the interaction between the user and the photographed landmark.
- FIG. 1A is a flowchart of an example of a video processing method provided by at least one embodiment of the present disclosure.
- the video processing method provided by at least one embodiment of the present disclosure will be described in detail below with reference to FIG. 1A .
- the video processing method includes steps S110 to S130 .
- Step S110: Identify and track the landmark building in the video.
- Step S120: Extract and track the key points of the moving subject in the video, and determine the posture of the moving subject according to the extracted key point information.
- Step S130: Map the key points of the moving subject to the landmark building, and drive the landmark building to perform corresponding actions according to the movements of those key points, so that the posture of the landmark building in the picture of the video corresponds to the posture of the moving subject.
- For example, the picture of the video includes a landmark building and a moving subject.
- For example, the moving subject may be a human body, or may be another movable object such as an animal, a puppet, or a robot, which is not limited in the embodiments of the present disclosure.
- Landmark buildings refer to distinctive buildings, natural landscapes, artificial landscapes, and the like; for example, the Oriental Pearl Tower in Shanghai and the CCTV Headquarters Building in Beijing can both be called landmark buildings. The embodiments of the present disclosure include but are not limited to these; a landmark building may be any distinctive man-made structure or natural object.
- For example, tracking the landmark building in the video includes detecting the landmark building in each frame of the video. That is, the landmark building is detected independently in each frame to track it across the video, so that the posture of the landmark building in the video can be driven in real time.
- For example, the tracking of the landmark building can be achieved through visual object tracking techniques such as generative model methods and discriminative model methods. For example, in a discriminative model method combining image features and machine learning, the target region (for example, the landmark building) in the current frame is taken as a positive sample and the background region as a negative sample; a classifier is trained by machine learning, and the trained classifier is used to find the optimal region (that is, the landmark building) in the next frame, thereby achieving target tracking.
- Other image tracking algorithms may also be used in step S110, which is not limited in the embodiments of the present disclosure.
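- As an illustrative, non-limiting sketch (not part of the original disclosure): a discriminative correlation-filter tracker such as OpenCV's CSRT treats the target region as the positive sample and its surroundings as negatives, matching the scheme described above. The sketch assumes opencv-contrib-python is installed and that the initial bounding box comes from the identification step.

```python
# A minimal sketch of tracking a landmark region frame by frame,
# assuming opencv-contrib-python; not the specific tracker of the disclosure.
import cv2

def track_landmark(video_path, initial_bbox):
    """Track a landmark region; yields one (x, y, w, h) bbox per frame."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        raise IOError("cannot read video")
    # CSRT is a discriminative correlation-filter tracker: the target
    # region is the positive sample, the surrounding background negative.
    tracker = cv2.TrackerCSRT_create()
    tracker.init(frame, initial_bbox)  # bbox from the recognition step
    yield initial_bbox
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, bbox = tracker.update(frame)  # best region in the next frame
        yield bbox if found else None
    cap.release()
```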
- FIG. 3 is a flowchart of a landmark building identification algorithm provided by at least one embodiment of the present disclosure. That is, FIG. 3 is a flowchart of an example of step S110 shown in FIG. 1A .
- a landmark building identification algorithm provided by at least one embodiment of the present disclosure will be described in detail below with reference to FIG. 3 .
- the identification algorithm includes step S111 and step S112.
- Step S111: Extract feature points of the landmark building.
- Step S112: Match the extracted feature points of the landmark building against the building feature point classification model to identify the landmark building.
- For example, a SIFT (Scale-Invariant Feature Transform) algorithm can be used to extract the feature points of the landmark building in the picture of the video.
- the feature points of landmark buildings in the video frame can be extracted by performing operations included in the SIFT algorithm, such as scale space extreme value detection, key point location, direction determination, and key point description.
- feature points are generally local extreme points, such as corner points, boundary points, etc., which can be automatically identified by the SIFT algorithm.
- For example, the feature points of the building feature point classification model can also be extracted based on the SIFT algorithm; SIFT feature generation and SIFT feature vector matching are then performed to match the feature points one by one. For example, a building whose matching degree is higher than 85% can be judged to be the landmark building.
- For example, the building feature point classification model may be a landmark building classification model trained using modeling methods known in the art; for details of such methods, reference may be made to the descriptions in the art, which are not repeated here.
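- The following is a minimal, non-limiting sketch of steps S111-S112 using OpenCV's SIFT implementation. Here `model_descriptors` stands in for the building feature point classification model, and Lowe's ratio test defining the matching degree is an illustrative assumption of this sketch.

```python
# A hedged sketch of SIFT feature extraction and one-by-one matching.
import cv2

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

def is_landmark(frame_bgr, model_descriptors, threshold=0.85):
    """Return True if the frame's SIFT features match the model well enough."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # S111: scale-space extremum detection, keypoint localization,
    # orientation assignment, and descriptor generation happen inside SIFT.
    _, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None or len(model_descriptors) < 2:
        return False
    # S112: match the feature points one by one against the model,
    # keeping the matches that pass Lowe's ratio test.
    pairs = matcher.knnMatch(descriptors, model_descriptors, k=2)
    good = [p for p in pairs
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    # The text judges a matching degree above 85% to be the landmark; here
    # that degree is taken as the fraction of features with a good match.
    return bool(pairs) and len(good) / len(pairs) > threshold
```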
- Alternatively, HOG (Histogram of Oriented Gradients) features combined with an SVM (Support Vector Machine) classifier, or other feature extractors (for example, the LBP (Local Binary Pattern) feature extraction algorithm or the Haar feature extraction algorithm) combined with a classifier, can be used to extract and match the feature points of the landmark building, which is not limited in the embodiments of the present disclosure.
- For example, the landmark building in the video is identified and tracked so that the 3D posture of the landmark building in the world coordinate system can be restored.
- For example, the SIFT algorithm can be used to calculate, for each frame of the video, a camera pose matrix relative to the building feature point classification model; each frame of the video corresponds to one camera pose matrix. The camera pose matrix reflects the position of the camera in the world coordinate system, that is, the observation position and observation angle from which the frame containing the landmark building and the human body was shot. From this observation pose, the 3D posture of the landmark building in the world coordinate system can be restored so that it can be made to correspond to the 3D posture of the human body in subsequent steps.
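- A minimal, non-limiting sketch of recovering a per-frame camera pose matrix from 2D-3D correspondences between frame feature points and the 3D points of a building model, using OpenCV's solvePnP; the variable names and the PnP formulation are illustrative assumptions, not the specific algorithm of the disclosure.

```python
# A hedged sketch: one camera pose matrix per video frame.
import cv2
import numpy as np

def camera_pose(points_3d, points_2d, camera_matrix):
    """Return a 4x4 world-to-camera pose matrix for one video frame.

    points_3d: (N, 3) float model points; points_2d: (N, 2) float matched
    image points; camera_matrix: 3x3 intrinsics of the capturing device.
    """
    ok, rvec, tvec = cv2.solvePnP(points_3d, points_2d, camera_matrix, None)
    if not ok:
        raise RuntimeError("pose estimation failed for this frame")
    rotation, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 matrix
    pose = np.eye(4)
    pose[:3, :3] = rotation             # observation angle
    pose[:3, 3] = tvec.ravel()          # observation position
    return pose
```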
- For example, the key points of the moving subject in the picture of the video are extracted by a trained neural network model.
- the key points may include various large joints such as the human head, hands, and feet, so as to obtain the 3D skeleton of the human body.
- For example, the neural network may be a convolutional neural network, a bidirectional long short-term memory network (BLSTM), a connectionist temporal classification (CTC) model, or the like, which is not restricted in the embodiments of the present disclosure.
- For example, the HOG feature extraction algorithm, the LBP feature extraction algorithm, or the Haar feature extraction algorithm can also be used to extract the key points of the moving subject; the SIFT algorithm can likewise be used, which is not limited in the embodiments of the present disclosure.
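- A minimal, non-limiting sketch of extracting human key points with a pretrained model; MediaPipe Pose is used here as one possible off-the-shelf choice, not the specific neural network model of the disclosure.

```python
# A hedged sketch of step S120: per-frame body key point extraction.
import cv2
import mediapipe as mp

pose_model = mp.solutions.pose.Pose(static_image_mode=False)  # video mode

def extract_keypoints(frame_bgr):
    """Return (x, y, z) key points for the head, hands, feet and other
    large joints of one frame (the 3D skeleton), or None if no body found."""
    results = pose_model.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None
    return [(lm.x, lm.y, lm.z) for lm in results.pose_landmarks.landmark]
```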
- For example, tracking the key points of the moving subject in the video includes detecting the key points of the moving subject in each frame of the video. That is, the key points are detected independently in each frame, so that the posture of the moving subject can be determined in real time from the key point information tracked in each frame.
- For example, the tracking of the key points of the moving subject can likewise be realized by visual object tracking techniques such as generative model methods and discriminative model methods; other image tracking algorithms may also be used in step S120, which is not limited in the embodiments of the present disclosure.
- For example, the posture of the moving subject is determined from the extracted key point information in each frame. The key point information may include the relative positions and directions of the key points in each frame of the video, and the posture of the moving subject may include bending over, leaning back, twisting, and the like; the embodiments of the present disclosure are not limited in this regard.
- For example, when the moving subject is a human body, the posture of the human body (for example, the user) can be determined from the key point information, and the landmark building can then be driven to assume a corresponding posture according to the posture of the human body.
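- A minimal, non-limiting sketch of deriving a coarse posture label from key point information; the choice of key points (head and hip) and the angle threshold are illustrative assumptions.

```python
# A hedged sketch: classify posture from the spine direction in one frame.
import numpy as np

def spine_posture(head_xy, hip_xy, tolerance_deg=20.0):
    """Classify a coarse posture from the head and hip key points."""
    spine = np.asarray(head_xy, float) - np.asarray(hip_xy, float)
    # Signed angle of the spine against the upward vertical, in degrees
    # (image y grows downward, hence the sign flip on the y component).
    angle = np.degrees(np.arctan2(spine[0], -spine[1]))
    if angle > tolerance_deg:
        return "leaning right"
    if angle < -tolerance_deg:
        return "leaning left"
    return "upright"
```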
- For example, corresponding the key points of the moving subject with the landmark building and driving the landmark building according to the movements of those key points includes: mapping the key points of the moving subject onto the landmark building, so that the landmark building follows the movements of the key points and performs the corresponding actions.
- For example, as shown in FIG. 1B, the spine line L2 of the human body (for example, the line through the key points of the head P1, neck P2, waist P3, and hip P4) is mapped onto the corresponding feature points P5-P8 on the central axis L1 of the landmark building (in FIG. 1B, the Leaning Tower of Pisa). The key points of other parts such as the hands and feet (not shown in the figure) are likewise mapped to other feature points of the landmark building, so that when the positions and directions of the key points of the human body change, the positions and directions of the corresponding feature points on the landmark building change accordingly, driving the landmark building to follow the key points of the human body and perform corresponding actions. The posture of the landmark building in the picture of the video thus corresponds to the posture of the moving subject, and the special effect displayed in the video is a landmark building dancing along with the dancing human body, which increases the interaction between the user and the landmark building and the fun of that interaction.
- For example, the key points P1-P4 of the human body correspond one-to-one, in a certain proportional relationship, with the feature points P5-P8 of the landmark building, so that when a key point is displaced, the corresponding feature point of the building is displaced accordingly. For example, when the key point P1 of the human body is displaced to the lower left or lower right (for example, the person tilts the head to the left or right), the feature point P5 of the landmark building is also displaced to the lower left or lower right, so that the landmark building is driven to perform the corresponding action according to the movements of the key points of the moving subject.
- For example, the correspondence between the posture of the landmark building and the posture of the moving subject includes the postures being the same, that is, in the video the landmark building dances following the dancing posture of the moving subject: when the human body bends over, the landmark building bends over, and when the human body leans back, the landmark building leans back. Alternatively, the posture of the landmark building may be complementary to the posture of the moving subject: for example, when the human body bends over, the landmark building leans back, and when the human body twists to the left, the landmark building twists to the right. The embodiments of the present disclosure are not limited in this regard.
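- A minimal, non-limiting sketch of the proportional one-to-one mapping described above, in which the displacement of each spine key point P1-P4 drives the corresponding axis feature point P5-P8; the rest poses and the scale factor are illustrative assumptions.

```python
# A hedged sketch of driving the building's axis points from spine motion.
import numpy as np

def drive_axis_points(spine_now, spine_rest, axis_rest, scale=1.0):
    """Displace building axis points P5-P8 by the motion of P1-P4.

    spine_now, spine_rest: (4, 2) arrays of P1-P4 in the current frame and
    in a neutral pose; axis_rest: (4, 2) array of P5-P8 at rest.
    """
    displacement = np.asarray(spine_now, float) - np.asarray(spine_rest, float)
    # P1-P4 map one-to-one onto P5-P8 in a fixed proportional relationship,
    # so tilting the head (P1) shifts the corresponding point P5 the same way.
    return np.asarray(axis_rest, float) + scale * displacement
```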
- For example, when the landmark building moves, pixels that were originally occluded by it may be exposed. Therefore, the background behind the landmark building is completed in advance, so that when the landmark building moves, the originally occluded pixels display the background behind it, making the picture smoother and providing a better visual effect.
- FIG. 2 is a flowchart of a background completion method provided by at least one embodiment of the present disclosure. As shown in FIG. 2, the background completion method includes steps S140-S160.
- Step S140: Cut out the landmark building from the picture of the video.
- Step S150: Complete the background at the cut-out location of the landmark building through a smooth interpolation algorithm, based on the pixels surrounding the landmark building in the picture of the video.
- Step S160: Restore the landmark building to its position after the background is completed.
- In step S140, for example, an image matting algorithm known in the art can be used to cut out the landmark building from the picture of the video.
- In step S150, the background at the cut-out location can be completed by, for example, a smooth interpolation algorithm based on the pixels surrounding the landmark building; other algorithms can also be used to complete the background, which is not limited in the embodiments of the present disclosure. For details of smooth interpolation algorithms, reference may be made to the related introductions in the art, which are not repeated here.
- In step S160, the landmark building is restored to its position after the background is completed, so that when the landmark building moves, the pixels exposed by its offset display the completed background behind it, making the picture smoother and providing a better visual effect. For example, the landmark building can be restored by writing its display data back to the pixels at that position after the background is completed; other methods known in the art can also be used, which is not limited in the embodiments of the present disclosure.
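- A minimal, non-limiting sketch of steps S140-S160, with OpenCV's Telea inpainting standing in for the smooth interpolation algorithm; `building_mask` is assumed to be a single-channel uint8 mask (255 inside the landmark) from a matting or segmentation step.

```python
# A hedged sketch of background completion behind the landmark building.
import cv2

def complete_background(frame, building_mask):
    """Return (completed_background, restored_frame) for one video frame."""
    # S140: cut out the landmark building using the mask.
    building = cv2.bitwise_and(frame, frame, mask=building_mask)
    # S150: fill the hole from the surrounding pixels; Telea inpainting
    # stands in here for the smooth interpolation algorithm of the text.
    completed = cv2.inpaint(frame, building_mask, 3, cv2.INPAINT_TELEA)
    # S160: restore the building on top of the completed background, so
    # pixels exposed when the building later moves show that background.
    restored = completed.copy()
    restored[building_mask > 0] = building[building_mask > 0]
    return completed, restored
```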
- For example, the completed background is displayed as a blurred background image, so that the picture is smoother while the landmark building remains the visual focus, providing a better display effect.
- For example, the landmark building can also be anthropomorphized, for example by giving it facial contours, hands, and the like. For example, the landmark building is anthropomorphized by means of predefined animation material maps, so that it takes on the characteristics of the moving subject.
- For example, one of the frames of the video may be selected as the preview backdrop according to the user's instruction or a preset rule.
- For example, 3D rendering can be performed using scripts from 3D modeling and animation software, such as Maya scripts. A Maya script is an executable script, for example one written in the Maya software by the designer of the landmark AR special effect, or one provided with the Maya software, which is not limited in the embodiments of the present disclosure. For example, when the landmark building model is a 3D model created in Maya, using a Maya script for 3D rendering can simplify operations and improve efficiency. Which software's scripts are used for 3D rendering can be determined according to actual needs, which is not limited by the embodiments of the present disclosure.
- For example, the key points of one moving subject can be mapped to multiple landmark buildings, and the multiple landmark buildings can be driven to perform corresponding actions according to the movements of those key points; that is, multiple buildings are driven to dance at the same time by, for example, the 3D posture of one person.
- For example, the key points of multiple moving subjects can be respectively mapped to multiple landmark buildings, and each landmark building can be driven to perform corresponding actions according to the movements of its corresponding subject, so that the postures of the multiple landmark buildings correspond one-to-one with the postures of the multiple moving subjects; that is, the 3D postures of multiple people can be recognized to drive multiple buildings to dance at the same time.
- For example, the key points of the moving subjects can be mapped to the multiple landmark buildings in the order in which the subjects appear in each frame of the video, so that in the same video multiple moving subjects drive multiple landmark buildings to dance.
- For example, the video may be recorded in real time by an image capturing device, and each frame of the video may be processed in real time, so that the posture of the landmark building corresponds to the posture of the moving subject.
- For example, the moving subject and the landmark building may be located on the same side or on different sides of the image capturing device.
- For example, the human body and the landmark building can be in the same scene, for example both in the scene captured by the rear camera; the human body and the landmark building can also be in different scenes, for example the human body in the scene captured by the front camera and the landmark building in the scene captured by the rear camera, which is not limited in the embodiments of the present disclosure.
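- A minimal, non-limiting sketch of the real-time loop, in which each captured frame passes through steps S110-S130; `identify_landmark` and `apply_landmark_effect` are hypothetical placeholders for the identification and driving steps sketched earlier, and `extract_keypoints` refers to the pose sketch above.

```python
# A hedged sketch of real-time recording and per-frame processing.
import cv2

def identify_landmark(frame):
    """Placeholder for step S110 (identification and tracking, see above)."""
    ...

def extract_keypoints(frame):
    """Placeholder for step S120 (see the pose-estimation sketch above)."""
    ...

def apply_landmark_effect(frame, bbox, keypoints):
    """Placeholder for step S130 (driving the landmark, see above)."""
    ...

def run_realtime(camera_index=0):
    cap = cv2.VideoCapture(camera_index)      # record the video in real time
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        bbox = identify_landmark(frame)        # S110: identify and track
        keypoints = extract_keypoints(frame)   # S120: subject key points
        if bbox is not None and keypoints is not None:
            frame = apply_landmark_effect(frame, bbox, keypoints)  # S130
        cv2.imshow("landmark AR", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop
            break
    cap.release()
    cv2.destroyAllWindows()
```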
- The video processing method provided by at least one embodiment of the present disclosure can enhance the interaction between the user and the photographed landmark building, make shooting more fun, give the user a distinctive shooting experience, and encourage the user to shoot and record more actively, thereby broadening the application scope of the product and improving its market competitiveness.
- the flow of the video processing methods provided by the foregoing embodiments of the present disclosure may include more or less operations, and these operations may be performed sequentially or in parallel.
- Although the flow of the video processing method described above includes operations occurring in a particular order, it should be clearly understood that the order of these operations is not limited.
- the video processing method described above may be executed once or multiple times according to predetermined conditions.
- FIG. 4 is a schematic diagram of a system that can be used to implement the video processing method provided by the embodiments of the present disclosure.
- the system 10 may include a user terminal 11 , a network 12 , a server 13 and a database 14 .
- the system 10 can be used to implement the video processing method provided by any embodiment of the present disclosure.
- The user terminal 11 is, for example, a computer 11-1 or a mobile phone 11-2. It can be understood that the user terminal 11 may be any other type of electronic device capable of data processing, including but not limited to a desktop computer, notebook computer, tablet computer, smartphone, smart home device, wearable device, in-vehicle electronic device, or monitoring device.
- The user terminal may also be any equipment provided with an electronic device, such as a vehicle or a robot.
- the embodiments of the present disclosure do not limit the hardware configuration or software configuration of the user terminal (for example, the type (eg, Windows, MacOS, etc.) or version) of the operating system.
- For example, the user can operate an application installed on the user terminal 11 or a website logged into from the user terminal 11; the application or website transmits user behavior data to the server 13 through the network 12, and the user terminal 11 can also receive data transmitted by the server 13 through the network 12.
- the user terminal 11 may implement the video processing method provided by the embodiments of the present disclosure by running a subprogram or a subthread.
- the processing unit of the user terminal 11 may be used to execute the video processing method provided by the embodiments of the present disclosure.
- For example, the user terminal 11 may execute the video processing method by using an application program built into the user terminal 11.
- the user terminal 11 may execute the video processing method provided by at least one embodiment of the present disclosure by invoking an application program externally stored in the user terminal 11 .
- the user terminal 11 sends the acquired video to the server 13 via the network 12, and the server 13 executes the video processing method.
- the server 13 may execute the video processing method using an application program built into the server.
- the server 13 may execute the video processing method by invoking an application program stored outside the server 13 .
- Network 12 may be a single network, or a combination of at least two different networks.
- the network 12 may include, but is not limited to, one or a combination of a local area network, a wide area network, a public network, a private network, and the like.
- the server 13 may be a single server or a server group, and each server in the group is connected through a wired or wireless network.
- a server farm can be centralized, such as a data center, or distributed.
- Server 13 may be local or remote.
- the database 14 can generally refer to a device with a storage function.
- For example, the database 14 is mainly used to store various data used, generated, and output during the operation of the user terminal 11 and the server 13.
- Database 14 may be local, or remote.
- the database 14 may include various memories such as Random Access Memory (RAM), Read Only Memory (ROM), and the like.
- RAM Random Access Memory
- ROM Read Only Memory
- the storage devices mentioned above are just some examples, and the storage devices that can be used by the system are not limited thereto.
- the database 14 may be connected or communicated with the server 13 or a part thereof via the network 12, or directly connected or communicated with the server 13, or a combination of the above two methods.
- For example, the database 14 may be a stand-alone device. In other embodiments, the database 14 may also be integrated in at least one of the user terminal 11 and the server 13. For example, the database 14 may be set on the user terminal 11 or on the server 13. For another example, the database 14 may also be distributed, with one part set on the user terminal 11 and the other part on the server 13.
- a model database may be deployed on database 14 .
- For example, the user terminal 11 accesses the database 14 through the network 12 to obtain the building feature point classification model stored in the database 14, or the neural network model used to extract the key points of the human body. The embodiments of the present disclosure do not limit the type of the database; for example, it may be a relational database or a non-relational database.
- FIG. 5 is a schematic block diagram of a video processing apparatus according to at least one embodiment of the present disclosure.
- the video processing apparatus 100 includes an identification unit 110 , an extraction unit 120 and a driving unit 130 .
- these units may be implemented by hardware (eg, circuit) modules, software modules, or any combination of the two.
- the following embodiments are the same, and will not be repeated here.
- For example, these units may be implemented by a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a field programmable gate array (FPGA), or another form of processing unit with data processing capability and/or instruction execution capability, together with corresponding computer instructions.
- the identification unit 110 is configured to identify and track landmarks in the video.
- the identifying unit 110 may implement step S110, and reference may be made to the relevant description of step S110 for a specific implementation method, which will not be repeated here.
- the extraction unit 120 is configured to extract and track the key points of the moving subject in the video, and determine the pose of the moving subject according to the extracted information of the key points of the moving subject.
- the extracting unit 120 may implement step S120, and reference may be made to the relevant description of step S120 for the specific implementation method, which will not be repeated here.
- For example, the driving unit 130 is configured to map the key points of the moving subject to the landmark building and drive the landmark building to perform corresponding actions according to the movements of those key points, so that the posture of the landmark building in the picture of the video corresponds to the posture of the moving subject.
- the driving unit 130 may implement step S130, and reference may be made to the relevant description of step S130 for the specific implementation method, which will not be repeated here.
- FIG. 6 is a schematic block diagram of another video processing apparatus provided by at least one embodiment of the present disclosure.
- the video processing apparatus 100 further includes an image capturing apparatus 140 .
- the image capturing device 140 is configured to record a video in real time, so as to perform real-time processing on each frame of images in the video.
- the image capturing device may be implemented as a camera, or other devices including a CMOS (Complementary Metal Oxide Semiconductor) sensor, a CCD (Charge Coupled Device) sensor, etc., which are not limited by the embodiments of the present disclosure.
- the moving subject and the landmark building may be located on the same side or on different sides of the image capturing device 140 .
- For example, the human body and the landmark building may be in the same scene, for example both located in the scene captured by the rear camera; the human body and the landmark building may also be in different scenes, for example the human body in the scene captured by the front camera and the landmark building in the scene captured by the rear camera, which is not limited by the embodiments of the present disclosure.
- It should be noted that the video processing apparatus 100 may include more or fewer circuits or units, and the connection relationships between the circuits or units are not limited and may be determined according to actual requirements. The specific configuration of each circuit is not limited; each may be composed of analog devices, digital chips, or implemented in other suitable ways according to circuit principles.
- FIG. 7 is a schematic block diagram of still another video processing apparatus provided by at least one embodiment of the present disclosure.
- the video processing apparatus 200 includes a processor 210 , a memory 220 and one or more computer program modules 221 .
- the processor 210 and the memory 220 are connected through a bus system 230 .
- one or more computer program modules 221 are stored in memory 220 .
- one or more computer program modules 221 include instructions for performing the video processing method provided by any of the embodiments of the present disclosure.
- instructions in one or more computer program modules 221 may be executed by processor 210 .
- the bus system 230 may be a common serial or parallel communication bus, etc., which is not limited by the embodiment of the present disclosure.
- For example, the processor 210 may be a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), or another form of processing unit with data processing capability and/or instruction execution capability; it may be a general-purpose or special-purpose processor, and may control other components in the video processing apparatus 200 to perform desired functions.
- Memory 220 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
- the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), among others.
- the non-volatile memory may include, for example, read only memory (ROM), hard disk, flash memory, and the like.
- One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 210 may execute the program instructions to implement the functions of the embodiments of the present disclosure and/or other desired functions, such as the video processing method. Various applications and various data, such as the key points of the moving subject, the feature points of the landmark building, and data used and/or generated by the applications, can also be stored in the computer-readable storage medium.
- For clarity and conciseness, not all constituent units of the video processing apparatus 200 are shown in the embodiments of the present disclosure. Those skilled in the art may provide and arrange other constituent units not shown according to specific needs, which is not limited in the embodiments of the present disclosure.
- FIG. 8 is a schematic structural diagram of an electronic device according to at least one embodiment of the present disclosure.
- Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, digital cameras, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, in-vehicle navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
- the electronic device shown in FIG. 8 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
- As shown in FIG. 8, the electronic device 300 includes a processing device (for example, a central processing unit or a graphics processing unit) 301, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303.
- In the RAM 303, various programs and data required for the operation of the computer system are also stored.
- the processing device 301 , the ROM 302 , and the RAM 303 are connected to each other through a bus 304 .
- An input/output (I/O) interface 305 is also connected to bus 304 .
- In general, the following components may be connected to the I/O interface 305: an input device 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output device including, for example, a liquid crystal display (LCD), speakers, or a vibrator; a storage device 308 including, for example, a magnetic tape or a hard disk; and a communication device 309 including a network interface card such as a LAN card or a modem.
- The communication device 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data, performing communication via a network such as the Internet.
- a drive 310 is also connected to the I/O interface 305 as needed.
- Although FIG. 8 shows the electronic device 300 including various devices, it should be understood that not all of the illustrated devices are required to be implemented or included; more or fewer devices may alternatively be implemented or included.
- the electronic device 300 may further include a peripheral interface (not shown in the figure) and the like.
- the peripheral interface may be various types of interfaces, such as a USB interface, a lightning interface, and the like.
- the communication device 309 may communicate with a network and other devices by wireless communication, such as the Internet, an intranet and/or a wireless network such as a cellular telephone network, a wireless local area network (LAN) and/or a metropolitan area network (MAN).
- Wireless communication may use any of a variety of communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (for example, based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n standards), Voice over Internet Protocol (VoIP), Wi-MAX, protocols for email, instant messaging, and/or Short Message Service (SMS), or any other suitable communication protocol.
- The electronic device may be any device such as a mobile phone, tablet computer, notebook computer, e-book, game console, television, digital photo frame, or navigator, or any combination of an electronic device with hardware, which is not limited in the embodiments of the present disclosure.
- embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
- the computer program may be downloaded and installed from the network via the communication device 309, or from the storage device 308, or from the ROM 302.
- When the computer program is executed by the processing device 301, the above-mentioned video processing functions defined in the methods of the embodiments of the present disclosure are executed.
- the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
- The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
- a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
- the client and the server can communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
- Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
- the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: acquires at least two Internet Protocol addresses; sends a node evaluation request containing the at least two Internet Protocol addresses to a node evaluation device, which selects an Internet Protocol address from the at least two Internet Protocol addresses and returns it; and receives the Internet Protocol address returned by the node evaluation device, where the acquired Internet Protocol address indicates an edge node in the content delivery network.
- the above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: receives a node evaluation request including at least two Internet Protocol addresses; selects an Internet Protocol address from the at least two Internet Protocol addresses; and returns the selected Internet Protocol address, where the received Internet Protocol address indicates an edge node in the content distribution network.
- Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
- exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
- the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- More specific examples of machine-readable storage media would include electrical connections based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
- FIG. 9 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure.
- the storage medium 400 non-transitorily stores computer-readable instructions 401; when the non-transitory computer-readable instructions are executed by a computer (including a processor), the video processing method provided by any embodiment of the present disclosure can be performed.
- the storage medium may be any combination of one or more computer-readable storage media; for example, one computer-readable storage medium may contain computer-readable program code for identifying and tracking the landmark building in a video, and another computer-readable storage medium may contain computer-readable program code for extracting and tracking key points of an active subject in the video.
- the computer can execute the program code stored in the computer storage medium to perform, for example, the video processing method provided by any embodiment of the present disclosure.
- the storage medium may include a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), or flash memory, or any combination of the above storage media; it may also be other suitable storage media.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Graphics (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
Claims (17)
- A video processing method, wherein a picture of the video includes a landmark building and an active subject, the method comprising: identifying and tracking the landmark building in the video; extracting and tracking key points of the active subject in the video, and determining a posture of the active subject according to information of the extracted key points of the active subject; and corresponding the key points of the active subject to the landmark building, and driving the landmark building to perform corresponding actions according to actions of the key points of the active subject, so that the posture of the landmark building in the picture of the video corresponds to the posture of the active subject. (A per-frame pipeline sketch follows the claims.)
- The video processing method according to claim 1, wherein before the landmark building is driven to perform the corresponding actions according to the actions of the key points of the active subject, the method further comprises: matting out the landmark building from the picture of the video; completing the background at the matted-out position of the landmark building by a smooth interpolation algorithm according to pixels surrounding the landmark building in the picture of the video; and restoring the landmark building to its position after the background is completed. (A background-completion sketch follows the claims.)
- The video processing method according to claim 1, wherein corresponding the key points of the active subject to the landmark building and driving the landmark building to perform the corresponding actions according to the actions of the key points of the active subject comprises: mapping the key points of the active subject onto the landmark building so as to correspond the key points of the active subject to the landmark building, so that the landmark building follows the actions of the key points of the active subject and performs the corresponding actions.
- The video processing method according to claim 3, wherein a spine line of the active subject is mapped onto a central axis of the landmark building. (An axis-alignment sketch follows the claims.)
- The video processing method according to any one of claims 1-4, further comprising: anthropomorphizing the landmark building by mapping predefined animation materials onto it, so that the landmark building has features of the active subject.
- The video processing method according to any one of claims 1-4, wherein the key points of the active subject in the picture of the video are extracted through a neural network model. (A pose-estimation sketch follows the claims.)
- The video processing method according to any one of claims 1-4, wherein identifying the landmark building in the picture of the video comprises: extracting feature points of the landmark building; and matching the extracted feature points of the landmark building against a building feature point classification model to identify the landmark building. (A feature-matching sketch follows the claims.)
- The video processing method according to any one of claims 1-4, wherein tracking the landmark building in the video and tracking the key points of the active subject in the video comprises: detecting the landmark building and the key points of the active subject in each frame image of the video, so as to track the landmark building and the active subject.
- The video processing method according to any one of claims 1-4, wherein, in the same video, the key points of the active subject are corresponded to a plurality of landmark buildings, and the plurality of landmark buildings are driven, according to the actions of the key points of the active subject, to perform corresponding actions following the actions of the active subject.
- The video processing method according to any one of claims 1-4, wherein, in the same video, key points of a plurality of active subjects are respectively corresponded to a plurality of landmark buildings, and a plurality of actions of the key points of the plurality of active subjects respectively drive the plurality of landmark buildings to perform corresponding actions following the actions of the respective active subjects, so that postures of the plurality of landmark buildings correspond one-to-one to postures of the plurality of active subjects. (A multi-building pairing sketch follows the claims.)
- The video processing method according to any one of claims 1-4, further comprising: recording the video in real time through an image capturing device, and processing each frame image in the video in real time, so that the posture of the landmark building corresponds to the posture of the active subject. (A live-capture sketch follows the claims.)
- The video processing method according to any one of claims 1-4, wherein the active subject is a human body.
- A video processing apparatus, wherein a picture of the video includes a landmark building and an active subject, the video processing apparatus comprising: an identification unit configured to identify and track the landmark building in the video; an extraction unit configured to extract and track key points of the active subject in the video and to determine a posture of the active subject according to information of the extracted key points of the active subject; and a driving unit configured to correspond the key points of the active subject to the landmark building and to drive the landmark building to perform corresponding actions according to actions of the key points of the active subject, so that the posture of the landmark building in the picture of the video corresponds to the posture of the active subject.
- The video processing apparatus according to claim 13, further comprising: an image capturing device configured to record the video in real time for real-time processing of each frame image in the video.
- The video processing apparatus according to claim 14, wherein the active subject and the landmark building are located on the same side or on different sides of the image capturing device.
- A video processing apparatus, comprising: a processor; a memory; and one or more computer program modules, wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules comprise instructions for implementing the video processing method according to any one of claims 1-12.
- A storage medium, non-transitorily storing computer-readable instructions, wherein when the computer-readable instructions are executed by a computer, the video processing method according to any one of claims 1-12 can be performed.
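Read as an algorithm, claim 1 amounts to a per-frame loop: detect the landmark building, estimate the active subject's key points, and re-pose the building accordingly. The following is a minimal Python sketch of that loop, not an implementation from the patent; `detect_landmark`, `extract_keypoints`, and `drive_landmark` are hypothetical stubs standing in for the steps elaborated in claims 2-8.

```python
# Minimal per-frame pipeline sketch for claim 1. The three helpers are
# hypothetical stubs, not functions from the patent or any library.
import cv2

def detect_landmark(frame):
    """Stub: identify and locate the landmark building (cf. claim 7)."""
    return None

def extract_keypoints(frame):
    """Stub: extract the active subject's key points (cf. claim 6)."""
    return None

def drive_landmark(frame, building, keypoints):
    """Stub: map key points onto the building and re-pose it (cf. claims 3-4)."""
    return frame

def process_video(path):
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        building = detect_landmark(frame)
        keypoints = extract_keypoints(frame)
        out = drive_landmark(frame, building, keypoints)
        cv2.imshow("posture transfer", out)
        if cv2.waitKey(1) & 0xFF == 27:  # ESC quits
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    process_video("input.mp4")  # hypothetical input file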
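The background-completion step of claim 2 can be approximated with off-the-shelf image inpainting, which fills a masked region from its surrounding pixels. Below is a sketch using OpenCV's Telea inpainting as one plausible stand-in for the claim's unspecified "smooth interpolation algorithm"; the alpha compositing mirrors the restore step.

```python
# Sketch of claim 2: matte out the building, fill the hole from surrounding
# pixels, then re-composite the (re-posed) building. Telea inpainting is an
# assumption; the claim does not name a specific interpolation algorithm.
import cv2
import numpy as np

def complete_background(frame, building_mask):
    """building_mask: uint8 mask, 255 where the landmark building was matted out."""
    return cv2.inpaint(frame, building_mask, inpaintRadius=5,
                       flags=cv2.INPAINT_TELEA)

def restore_building(background, building_rgba, x, y):
    """Alpha-composite the building back at its original position (x, y)."""
    h, w = building_rgba.shape[:2]
    alpha = building_rgba[:, :, 3:4].astype(np.float32) / 255.0
    roi = background[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * building_rgba[:, :, :3] + (1.0 - alpha) * roi
    background[y:y + h, x:x + w] = blended.astype(np.uint8)
    return background
```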
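One simple reading of claims 3-4 is a similarity transform: the segment from the subject's hip midpoint to shoulder midpoint (the spine line) is aligned with the building's central axis. A sketch under that assumption follows, with all points given as 2D pixel coordinates; sign conventions depend on the image coordinate system.

```python
# Sketch of claims 3-4: rotate/scale the building about its axis base so its
# central axis follows the subject's spine line. A similarity transform is an
# assumption; the claims only require that the two lines correspond.
import numpy as np
import cv2

def spine_alignment_matrix(axis_base, axis_top, hip_mid, shoulder_mid):
    axis_vec = np.subtract(axis_top, axis_base).astype(np.float32)
    spine_vec = np.subtract(shoulder_mid, hip_mid).astype(np.float32)
    # Signed angle (degrees) from the building axis to the spine line.
    angle = np.degrees(np.arctan2(spine_vec[1], spine_vec[0])
                       - np.arctan2(axis_vec[1], axis_vec[0]))
    scale = float(np.linalg.norm(spine_vec) / max(np.linalg.norm(axis_vec), 1e-6))
    # 2x3 affine matrix; apply with cv2.warpAffine(building_img, M, (w, h)).
    return cv2.getRotationMatrix2D(tuple(map(float, axis_base)), -angle, scale)
```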
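Claim 6 leaves the neural network model unspecified; any off-the-shelf pose estimator fits. A sketch using MediaPipe Pose, chosen here purely for illustration:

```python
# Sketch of claim 6: extract the active subject's key points with an
# off-the-shelf pose-estimation network (MediaPipe Pose, as one example).
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=False)

def extract_keypoints(frame_bgr):
    """Return a list of (x, y) pixel coordinates, or None if no subject is found."""
    h, w = frame_bgr.shape[:2]
    results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        return None
    return [(lm.x * w, lm.y * h) for lm in results.pose_landmarks.landmark]
```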
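For claim 7, local feature descriptors extracted from the frame can be matched against a stored per-landmark descriptor set. In the sketch below, a brute-force ORB matcher stands in for the "building feature point classification model"; a learned classifier could equally serve, so this choice is an assumption.

```python
# Sketch of claim 7: extract feature points and match them against stored
# reference descriptors, one set per known landmark building.
import cv2

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def identify_landmark(frame_gray, reference_descriptors, min_matches=30):
    """reference_descriptors: dict mapping landmark name -> ORB descriptor array."""
    _, des = orb.detectAndCompute(frame_gray, None)
    if des is None:
        return None
    best_name, best_count = None, 0
    for name, ref_des in reference_descriptors.items():
        count = len(matcher.match(des, ref_des))
        if count > best_count:
            best_name, best_count = name, count
    return best_name if best_count >= min_matches else None
```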
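Claims 9 and 10 extend the correspondence to several buildings and several subjects. In the small sketch below, reusing one subject's key points for every building gives claim 9, while pairing subjects and buildings one-to-one with `zip` gives claim 10; `drive_landmark` is the hypothetical stub from the claim 1 sketch.

```python
# Sketch of claims 9-10: drive several landmark buildings from one or several
# active subjects. drive_landmark is the hypothetical stub defined earlier.
def drive_many(frame, buildings, keypoints):
    # Claim 9: one subject's key points drive every building.
    for building in buildings:
        frame = drive_landmark(frame, building, keypoints)
    return frame

def drive_paired(frame, buildings, subjects_keypoints):
    # Claim 10: one-to-one pairing of subjects and buildings.
    for building, keypoints in zip(buildings, subjects_keypoints):
        frame = drive_landmark(frame, building, keypoints)
    return frame
```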
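Claim 11's real-time variant only changes the source: frames come from a camera instead of a file and are processed as they arrive. A sketch, assuming the default camera at device index 0 and reusing the hypothetical stubs from the claim 1 sketch:

```python
# Sketch of claim 11: record with a camera and process frame-by-frame in
# real time, using the stubs from the claim 1 sketch.
import cv2

cap = cv2.VideoCapture(0)  # device index 0 is an assumption
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    building = detect_landmark(frame)      # hypothetical stub
    keypoints = extract_keypoints(frame)   # hypothetical stub
    cv2.imshow("live", drive_landmark(frame, building, keypoints))
    if cv2.waitKey(1) & 0xFF == 27:        # ESC quits
        break
cap.release()
cv2.destroyAllWindows()
```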
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011177198.2A CN112308977B (zh) | 2020-10-29 | 2020-10-29 | Video processing method, video processing apparatus, and storage medium |
CN202011177198.2 | 2020-10-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022088819A1 (zh) | 2022-05-05 |
Family
ID=74330513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/110560 WO2022088819A1 (zh) | Video processing method, video processing apparatus, and storage medium | 2020-10-29 | 2021-08-04 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112308977B (zh) |
WO (1) | WO2022088819A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114926508A (zh) * | 2022-07-21 | 2022-08-19 | 深圳市海清视讯科技有限公司 | Method, apparatus, device, and storage medium for determining a field-of-view boundary line |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112308977B (zh) | 2020-10-29 | 2024-04-16 | 字节跳动有限公司 | Video processing method, video processing apparatus, and storage medium |
CN112733593A (zh) * | 2021-03-18 | 2021-04-30 | 成都中科大旗软件股份有限公司 | Method and system for realizing image information recognition based on image position |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197534A (zh) * | 2017-12-19 | 2018-06-22 | 迈巨(深圳)科技有限公司 | Human head posture detection method, electronic device, and storage medium |
CN110139115A (zh) * | 2019-04-30 | 2019-08-16 | 广州虎牙信息科技有限公司 | Key-point-based virtual avatar posture control method, apparatus, and electronic device |
US10620713B1 (en) * | 2019-06-05 | 2020-04-14 | NEX Team Inc. | Methods and systems for touchless control with a mobile device |
CN111107278A (zh) * | 2018-10-26 | 2020-05-05 | 北京微播视界科技有限公司 | Image processing method and apparatus, electronic device, and readable storage medium |
CN112308977A (zh) * | 2020-10-29 | 2021-02-02 | 字节跳动有限公司 | Video processing method, video processing apparatus, and storage medium |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103956128A (zh) * | 2014-05-09 | 2014-07-30 | 东华大学 | Intelligent active advertising platform based on somatosensory technology |
US20180005015A1 (en) * | 2016-07-01 | 2018-01-04 | Vangogh Imaging, Inc. | Sparse simultaneous localization and matching with unified tracking |
CN107610041B (zh) * | 2017-08-16 | 2020-10-27 | 南京华捷艾米软件科技有限公司 | Video portrait matting method and system based on a 3D somatosensory camera |
CN109918975B (zh) * | 2017-12-13 | 2022-10-21 | 腾讯科技(深圳)有限公司 | Augmented reality processing method, object recognition method, and terminal |
WO2019216593A1 (en) * | 2018-05-11 | 2019-11-14 | Samsung Electronics Co., Ltd. | Method and apparatus for pose processing |
CN109829451B (zh) * | 2019-03-22 | 2021-08-24 | 京东方科技集团股份有限公司 | Organism action recognition method, apparatus, server, and storage medium |
CN111582208B (zh) * | 2020-05-13 | 2023-07-21 | 抖音视界有限公司 | Method and apparatus for generating organism posture key point information |
CN111639612A (zh) * | 2020-06-04 | 2020-09-08 | 浙江商汤科技开发有限公司 | Posture correction method and apparatus, electronic device, and storage medium |
CN111639615B (zh) * | 2020-06-05 | 2023-09-19 | 上海商汤智能科技有限公司 | Trigger control method and apparatus for a virtual building |
CN111638797A (zh) * | 2020-06-07 | 2020-09-08 | 浙江商汤科技开发有限公司 | Display control method and apparatus |
2020
- 2020-10-29: CN application CN202011177198.2A, granted as patent CN112308977B (zh), status: Active
2021
- 2021-08-04: PCT application PCT/CN2021/110560, published as WO2022088819A1 (zh), status: Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197534A (zh) * | 2017-12-19 | 2018-06-22 | 迈巨(深圳)科技有限公司 | Human head posture detection method, electronic device, and storage medium |
CN111107278A (zh) * | 2018-10-26 | 2020-05-05 | 北京微播视界科技有限公司 | Image processing method and apparatus, electronic device, and readable storage medium |
CN110139115A (zh) * | 2019-04-30 | 2019-08-16 | 广州虎牙信息科技有限公司 | Key-point-based virtual avatar posture control method, apparatus, and electronic device |
US10620713B1 (en) * | 2019-06-05 | 2020-04-14 | NEX Team Inc. | Methods and systems for touchless control with a mobile device |
CN112308977A (zh) * | 2020-10-29 | 2021-02-02 | 字节跳动有限公司 | Video processing method, video processing apparatus, and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114926508A (zh) * | 2022-07-21 | 2022-08-19 | 深圳市海清视讯科技有限公司 | Method, apparatus, device, and storage medium for determining a field-of-view boundary line |
CN114926508B (zh) * | 2022-07-21 | 2022-11-25 | 深圳市海清视讯科技有限公司 | Method, apparatus, device, and storage medium for determining a field-of-view boundary line |
Also Published As
Publication number | Publication date |
---|---|
CN112308977A (zh) | 2021-02-02 |
CN112308977B (zh) | 2024-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10997787B2 (en) | 3D hand shape and pose estimation | |
JP6824433B2 (ja) | カメラ姿勢情報の決定方法、決定装置、モバイル端末及びコンピュータプログラム | |
US11830118B2 (en) | Virtual clothing try-on | |
CN110276840B (zh) | 多虚拟角色的控制方法、装置、设备及存储介质 | |
WO2022088819A1 (zh) | 视频处理方法、视频处理装置和存储介质 | |
WO2023070021A1 (en) | Mirror-based augmented reality experience | |
EP4248407A1 (en) | Real-time motion transfer for prosthetic limbs | |
US11688136B2 (en) | 3D object model reconstruction from 2D images | |
US20230267687A1 (en) | 3d object model reconstruction from 2d images | |
EP4143787A1 (en) | Photometric-based 3d object modeling | |
WO2023168957A1 (zh) | 姿态确定方法、装置、电子设备、存储介质及程序 | |
CN113705520A (zh) | 动作捕捉方法、装置及服务器 | |
WO2023087758A1 (zh) | 定位方法、定位装置、计算机可读存储介质和计算机程序产品 | |
EP4222961A1 (en) | Method, system and computer-readable storage medium for image animation | |
KR20240125620A (ko) | 실시간 상체 의복 교환 | |
WO2024055748A1 (zh) | 一种头部姿态估计方法、装置、设备以及存储介质 | |
KR20240128015A (ko) | 실시간 의복 교환 | |
KR20240125621A (ko) | 실시간 모션 및 외관 전달 | |
US20240071008A1 (en) | Generating immersive augmented reality experiences from existing images and videos | |
US20240288946A1 (en) | Shared augmented reality eyewear device with hand tracking alignment | |
WO2023226578A1 (zh) | 手掌轮廓提取方法、控制指令生成方法和装置 | |
CN117930978A (zh) | 一种博物馆文物ar交互方法、系统、设备与存储介质 | |
CN113160365A (zh) | 图像处理方法、装置、设备和计算机可读存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21884554 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18251182 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.08.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21884554 Country of ref document: EP Kind code of ref document: A1 |