CN111416991B - Special effect processing method and apparatus, and storage medium - Google Patents

Special effect processing method and apparatus, and storage medium

Info

Publication number
CN111416991B
CN111416991B
Authority
CN
China
Prior art keywords
special effect
frame
static frame
static
video
Prior art date
Legal status
Active
Application number
CN202010350461.7A
Other languages
Chinese (zh)
Other versions
CN111416991A (en)
Inventor
彭冬炜 (Peng Dongwei)
Current Assignee
Oppo Chongqing Intelligent Technology Co Ltd
Original Assignee
Oppo Chongqing Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Oppo Chongqing Intelligent Technology Co Ltd
Priority to CN202010350461.7A
Publication of CN111416991A
Application granted
Publication of CN111416991B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Abstract

Embodiments of the present application disclose a special effect processing method, a special effect processing device, and a storage medium. The special effect processing method includes: extracting a first key frame from a first static frame class, and determining scene information and semantic identification information corresponding to a first object in the first key frame, where the first static frame class includes at least one frame image of the same type in a video to be processed; if the scene information is a target scene, determining pose information corresponding to the first object; determining a target special effect corresponding to the first static frame class according to the pose information and the semantic identification information; performing special effect processing on each frame image in the first static frame class based on the target special effect to generate a special effect static frame class corresponding to the first static frame class; and traversing a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the plurality of static frame classes are obtained, and generating a special effect video based on the plurality of special effect static frame classes.

Description

Special effect processing method and apparatus, and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a special effect processing method and apparatus, and a storage medium.
Background
With the continuous development of computer and Internet technologies, video technology is applied ever more widely in people's daily life and work, and people's demands on it keep growing. A single video display effect can no longer satisfy users; more and more users expect video images to meet their diversified needs. Video special effect technology therefore emerged: by adding various special effects to a video, it makes the video content richer and the effect more vivid.
However, the existing way of adding special effects to a video is still manual: special effects are added by hand to specific actions in the video according to user requirements, which is time-consuming and labor-intensive. Moreover, because manually added special effects do not exactly match every frame of the video data, jumpy and abrupt transitions easily occur; the special effect processing quality is poor and the level of intelligence is low.
Disclosure of Invention
The embodiments of the present application provide a special effect processing method and device, and a storage medium, which can realize intelligent addition of video special effects and meet users' diversified requirements.
The technical solutions of the embodiments of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides a special effect processing method, where the method includes:
extracting a first key frame from a first static frame class, and determining scene information and semantic identification information corresponding to a first object in the first key frame; where the first static frame class includes at least one frame image of the same type in a video to be processed;
if the scene information is a target scene, determining pose information corresponding to the first object;
determining a target special effect corresponding to the first static frame class according to the pose information and the semantic identification information;
performing special effect processing on each frame image in the first static frame class based on the target special effect to generate a special effect static frame class corresponding to the first static frame class;
traversing a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the plurality of static frame classes are obtained, and generating a special effect video based on the plurality of special effect static frame classes.
In a second aspect, an embodiment of the present application provides a special effect processing device, including: an extraction unit, a determination unit, a processing unit, an acquisition unit, and a generation unit, where
the extraction unit is configured to extract a first key frame from a first static frame class;
the determination unit is configured to determine scene information and semantic identification information corresponding to a first object in the first key frame, where the first static frame class includes at least one frame image of the same type in a video to be processed; if the scene information is a target scene, determine pose information corresponding to the first object; and determine a target special effect corresponding to the first static frame class according to the pose information and the semantic identification information;
the processing unit is configured to perform special effect processing on each frame image in the first static frame class based on the target special effect, and generate a special effect static frame class corresponding to the first static frame class;
the acquisition unit is configured to traverse a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the static frame classes are obtained;
the generation unit is configured to generate a special effect video based on the plurality of special effect static frame classes.
In a third aspect, an embodiment of the present application provides a special effect processing device, including a processor and a memory storing instructions executable by the processor; when the instructions are executed by the processor, the special effect processing method described above is implemented.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a program applied in a special effect processing device; when the program is executed by a processor, the special effect processing method described above is implemented.
The embodiments of the present application provide a special effect processing method and device, and a storage medium. The special effect processing device can extract a first key frame from a first static frame class and determine scene information and semantic identification information corresponding to a first object in the first key frame, where the first static frame class includes at least one frame image of the same type in a video to be processed; if the scene information is a target scene, determine pose information corresponding to the first object; determine a target special effect corresponding to the first static frame class according to the pose information and the semantic identification information; perform special effect processing on each frame image in the first static frame class based on the target special effect to generate a special effect static frame class corresponding to the first static frame class; and traverse a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the plurality of static frame classes are obtained, generating a special effect video based on the plurality of special effect static frame classes. That is to say, in the embodiments of the present application, the special effect processing device recognizes the key frame in each static frame class to obtain the corresponding pose information and semantic identification information, and automatically matches and adds a special effect to each frame image in the static frame class based on that information, thereby completing the special effect processing of the video to be processed, realizing intelligent addition of video special effects, and meeting users' diversified requirements.
Drawings
Fig. 1 is a first schematic structural diagram of a special effect processing apparatus according to an embodiment of the present application;
fig. 2 is a first schematic flow chart illustrating an implementation process of the special effect processing method according to the embodiment of the present application;
fig. 3 is a schematic diagram of a gesture recognition processing procedure according to an embodiment of the present application;
fig. 4 is a schematic view of a second implementation flow of the special effect processing method according to the embodiment of the present application;
fig. 5 is a schematic flow chart illustrating an implementation process of the special effect processing method according to the embodiment of the present application;
fig. 6 is a schematic structural diagram of a second specific-effect processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a third configuration of special effect processing equipment according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein merely illustrate the relevant application and do not limit it. It should also be noted that, for convenience of description, only the parts related to the present application are shown in the drawings.
With the continuous development of computer and Internet technologies, video technology is applied ever more widely in people's daily life and work, and people's demands on it keep growing. A single video display effect can no longer satisfy users, and more and more users expect video images to meet their diversified needs; video special effect technology has developed accordingly. In particular, as short video editing becomes more and more popular, adding various special effects to a video has become a popular way of processing images or videos, and video special effect technology makes video content richer and its effect more vivid.
However, despite the growing demand for adding special effects to videos, the existing approach is still manual: special effects are added by hand to specific actions in a video according to user requirements, which takes time and labor. And because manually added special effects do not exactly match every frame of the video data, jumpy and abrupt transitions easily occur; the special effect processing quality is poor and the level of intelligence is low.
To solve the problems of the conventional special effect addition mechanism, the embodiments of the present application provide a special effect processing method and device, and a storage medium. Specifically, the special effect processing device recognizes an image frame of a video to be processed to acquire pose information and semantic identification information corresponding to an object to be processed in the image frame, and automatically matches and adds a special effect to the object to be processed based on the pose information and the semantic identification information, completing the special effect processing of the video to be processed. Intelligent addition of video special effects is thereby realized, and users' diversified requirements are met.
An embodiment of the present application provides a special effect processing method, applied to the special effect processing device 10 shown in Fig. 1. Fig. 1 is a schematic structural diagram of the special effect processing device provided in an embodiment of the present application. As shown in Fig. 1, the special effect processing device 10 includes a key frame detection module 11, a semantic analysis module 12, a discrimination module 13, a pose recognition module 14, a special effect addition module 15, a video coding module 16, a user scoring module 17, and a reinforcement learning module 18. The key frame detection module 11 is mainly used to extract the key frames corresponding to a video to be processed; the semantic analysis module 12 is mainly used to determine the scene information and semantic identification information corresponding to an object to be processed in a key frame; the discrimination module 13 is mainly used to determine whether the scene information is a target scene; the pose recognition module 14 is mainly used to recognize the pose information corresponding to the object to be processed when the scene information is the target scene; the special effect addition module 15 is mainly used to automatically match and add special effects to the recognized poses according to the pose information and semantic identification information corresponding to the object to be processed; the video coding module 16 is mainly used to encode the special effect video frames to which special effects have been added, and thus generate the special effect video; the user scoring module 17 is mainly used to obtain the feedback result corresponding to the special effect video; and the reinforcement learning module 18 is mainly used to adjust the corresponding special effect matching parameters according to the feedback result.
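To make the division of labour among these modules easier to follow, the sketch below chains them in plain Python. It is purely illustrative: every function and variable name is an assumption rather than something specified by the patent, and the user scoring and reinforcement learning modules are omitted for brevity.

```python
# Illustrative end-to-end pipeline over the modules of Fig. 1 (names assumed).
def process_video(frames, modules):
    # Key frame detection module: group frames into static frame classes
    # and pick one key frame per class.
    static_classes, key_frames = modules.key_frame_detection(frames)
    processed = {}
    for cls, key_frame in key_frames.items():
        # Semantic analysis module: scene + semantic identification information.
        scene, semantic = modules.semantic_analysis(key_frame)
        # Discrimination module: only target scenes receive special effects.
        if not modules.is_target_scene(scene):
            processed[cls] = static_classes[cls]  # keep the original frames
            continue
        # Pose recognition and special effect addition modules.
        pose = modules.pose_recognition(key_frame)
        effect = modules.match_effect(scene, pose, semantic)
        processed[cls] = [modules.add_effect(f, effect)
                          for f in static_classes[cls]]
    # Video coding module: encode untouched and effect-processed frames
    # together, assuming class indices follow frame order.
    ordered = [f for cls in sorted(processed) for f in processed[cls]]
    return modules.encode(ordered)
```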
Fig. 2 is a first schematic flow chart of an implementation process of a special effect processing method provided in an embodiment of the present application, and as shown in fig. 2, in the embodiment of the present application, a special effect processing method executed by a special effect processing device may include the following steps:
Step 101: extracting a first key frame from a first static frame class, and determining scene information and semantic identification information corresponding to a first object in the first key frame; the first static frame class includes at least one frame image of the same type in the video to be processed.
In an embodiment of the present application, the special effect processing device may extract a first key frame from a first static frame class, and further determine the scene information and semantic identification information corresponding to a first object in the first key frame, where the first static frame class includes at least one frame image of the same type in the video to be processed.
It should be noted that, in the embodiment of the present application, the special effect processing device may be a tablet computer, a mobile phone, a Personal Computer (PC), a notebook computer, a wearable device, and the like, and the device to which the special effect processing method is applied is not specifically limited in the embodiment of the present application. In a specific implementation process, the special effect processing method in the embodiment of the present application may be executed by one device alone, or may be executed by multiple devices in cooperation.
It should be noted that, in the embodiments of the present application, the video to be processed may be a video stored in the terminal, a video sent by a server, or a video shot by a camera. Specifically, a video file in a specified format may be read from a specified path by accessing the terminal's memory; a video sent by another terminal may be received; a video may be acquired from a corresponding website; or, after a camera application is started, the camera of the smart terminal may be used to obtain the video it shoots. The manner of obtaining the video to be processed is not specifically limited in the embodiments of the present application.
Further, in the embodiments of the present application, since the video to be processed contains many similar static frames, the special effect processing device needs to extract key frames from all the static frames corresponding to the video to be processed in order to remove redundant information, that is, to extract the frames related to actions as key frames. The key frames may be frames of different scenes, or frames of different perspectives or different poses.
Specifically, when extracting the key frames corresponding to the video to be processed, the special effect processing device may classify all static frames in the video to be processed based on a preset classification algorithm to obtain a plurality of static frame classes corresponding to the video to be processed, where the preset classification algorithm is used to cluster similar static frames. Further, the special effect processing device extracts a first key frame from a first static frame class of the plurality of static frame classes, with one key frame corresponding to each static frame class.
Optionally, in an embodiment of the present application, the preset classification algorithm may be any algorithm that clusters images, such as K-Means clustering, hierarchical clustering, or spectral clustering; the preset classification algorithm is not specifically limited in the embodiments of the present application. Further, the special effect processing device may compute the image with the largest entropy value in each static frame class; since the image frame with the largest entropy value is the most discriminative compared with the other static frame classes, it is taken as the key frame.
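As a concrete reading of the two paragraphs above, the sketch below clusters frames with K-Means (one of the clustering choices the text allows) over a crude downsampled-grayscale descriptor, then takes the maximum-entropy frame of each static frame class as its key frame. The descriptor and all names are illustrative assumptions, not details fixed by the patent.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def frame_entropy(gray):
    """Shannon entropy (bits) of an 8-bit grayscale image."""
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def extract_key_frames(frames, n_classes):
    """Cluster frames into static frame classes; pick one key frame each."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    # Coarse per-frame descriptor: 32x32 downsampled grayscale pixels.
    feats = np.stack([cv2.resize(g, (32, 32)).ravel() for g in grays])
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(
        feats.astype(np.float32))
    key_frames = {}
    for cls in range(n_classes):
        idxs = np.where(labels == cls)[0]
        # The frame with the largest entropy is the class's key frame.
        key_frames[cls] = int(idxs[np.argmax([frame_entropy(grays[i])
                                              for i in idxs])])
    return labels, key_frames
```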
Further, in the embodiments of the present application, the special effect processing device may take the first key frame from all the key frames corresponding to the static frame classes of the video to be processed, and detect and identify the object to be processed in the first key frame. If any object from the preset object library exists in the first key frame, the device acquires scene information and semantic identification information from the first key frame, thereby determining the scene information and semantic identification information corresponding to the first object.
Specifically, in the embodiments of the present application, the special effect processing device may recognize the scene information and semantic identification information corresponding to the first object in the first key frame. It can be understood that the scene information characterizes which scene category the action performed by the object to be processed belongs to. For example, in the video to be processed, if person A is dancing, the scene information corresponding to person A is a dance scene; if person A is exercising, the scene information corresponding to person A is a sports scene. The semantic identification information uses semantic annotation to classify the scene information in more detail; for example, if the scene information corresponding to person A is dance, the semantic identification information is the type of dance person A performs (ballet, street dance, and so on).
It should be noted that, in practical applications, the special effect processing device may select a designated target, and the first object may be one or more designated targets identified from the first key frame. Specifically, the designated target may be a human, an animal, a plant, or another specific object in nature.
It can be understood that, since some key frames corresponding to the video to be processed may contain no object to be processed, that is, no designated target, the special effect processing device may preset an object library that includes all possible designated targets (for example, people, animals, and plants). When detecting and identifying an object to be processed, the device may determine, according to the pre-stored object library, whether the detected object is a designated target; if so, it continues to perform scene recognition and semantic annotation on the identified object, thereby obtaining the scene information and semantic identification information corresponding to the first object in the first key frame.
Optionally, in an embodiment of the present application, the special effect processing device may input the key frame into a preset semantic analysis model established based on machine learning, so as to obtain the scene information and semantic identification information corresponding to the object to be processed in the key frame. The preset semantic analysis model may be a discriminative model based on a classification method or a generative model based on a probabilistic method.
Illustratively, in an embodiment of the present application, a video to be processed is acquired from local storage and the designated target in the preset object library is a person. A key frame is extracted from the video to be processed and input into the semantic analysis model for recognition. If a person exists in the key frame, the scene information and semantic identification information corresponding to person A can be further obtained; if no person is present in the key frame, scene recognition and semantic annotation are not required. Assuming the video to be processed is a dance video of person A, the scene information may be a character dance scene and the semantic identification information the corresponding dance type (street dance, ballet, and so on); for a sports game video, the scene information is a character sports scene and the semantic identification information may be a sports category (running, long jump, and so on).
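A minimal sketch of this detection-then-analysis gate follows; `detector` and `scene_model` stand in for whatever object detection and semantic analysis models the device actually uses, and the object library contents and score threshold are assumptions.

```python
# Hypothetical gate: analyze the scene only if a designated target is present.
PRESET_OBJECT_LIBRARY = {"person"}  # assumed designated targets

def analyze_key_frame(frame, detector, scene_model):
    detections = detector(frame)  # e.g. [("person", 0.97), ("dog", 0.40)]
    targets = [name for name, score in detections
               if name in PRESET_OBJECT_LIBRARY and score > 0.5]
    if not targets:
        return None  # no designated target: skip scene recognition
    scene, semantic = scene_model(frame)  # e.g. ("dance", "ballet")
    return {"objects": targets, "scene": scene, "semantic": semantic}
```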
Further, in the embodiments of the present application, after extracting the first key frame from the first static frame class and determining the scene information and semantic identification information corresponding to the first object, the special effect processing device may further judge whether that scene information is the target scene.
Step 102: if the scene information is the target scene, determining the pose information corresponding to the first object.
In an embodiment of the present application, after determining the scene information and semantic identification information corresponding to the first object in the first key frame, if the scene information is determined to be the target scene, the special effect processing device may further perform pose recognition on the first object and determine the pose information corresponding to it.
Optionally, in an embodiment of the present application, the special effect processing device may preset the target scene, which may include one or more of a dance scene, a sports scene, a meeting scene, and so on. Assume the target scene is a human dance scene: if the scene information corresponding to the first object in the first key frame is a human dance scene, that scene information is the target scene; if the scene information corresponding to the first object is a human reading scene, it is not the target scene.
Further, in the embodiments of the present application, the special effect processing device processes the key frames in sequence. If the scene information corresponding to the first object in the first key frame does not belong to the target scene, the device continues by extracting a second key frame from a second static frame class and determining the scene information and semantic identification information corresponding to a second object in the second key frame; when the scene information corresponding to the second object is the target scene, it continues to execute the special effect processing flow on the second static frame class.
It should be noted that, in the embodiments of the present application, the objects to be processed in different key frames may be the same or different; that is, the first object in the first key frame and the second object in the second key frame may be the same or different. For example, consider a dance video in which three people A, B, and C dance first. In the first case, two more people D and E join at a particular action; the first object in the first key frame is then the three people A, B, and C, while the second object in the second key frame is the five people A, B, C, D, and E, so the second object differs from the first object and contains it. In the second case, the dance switches to another three people D, E, and F at a certain action; the first object in the first key frame is then A, B, and C, while the second object in the second key frame is D, E, and F, so the second object is not the same as the first object. In the third case, the whole dance video shows A, B, and C dancing, and the objects to be processed in the different key frames of the video are the same.
Further, in an embodiment of the present application, if the current scene information is the target scene, the special effect processing device may further determine the pose information corresponding to the first object. Specifically, the device may perform pose recognition on the first object to obtain the pose feature data corresponding to it, and then determine the pose information corresponding to the first object according to the pose feature data and a pre-stored pose library. The pose information characterizes the specific action performed by the first object in the first key frame in the target scene; when the designated target is a human, the pose information may be the person's gestures, expressions, actions, body shape, and so on. For example, in a dance scene, if the first object performs a jumping motion, the pose information is jumping; if the first object performs a rotating motion, the pose information is rotation.
Specifically, the special effect processing device may pre-store a pose library that includes all pose categories and attributes corresponding to the object to be processed. The device may perform pose estimation on the first object to find all the key points corresponding to it (for example, the head, hands, and knees), that is, the pose feature data corresponding to the first object; it then matches the pose feature data against the pre-stored pose library and selects the target pose corresponding to the first object. Exemplarily, Fig. 3 is a schematic diagram of the pose recognition process provided in an embodiment of the present application. As shown in Fig. 3, the special effect processing device performs pose recognition on the object to be processed to find all the key points of the first object, such as a first key point (arm), a second key point (head), and an N-th key point (leg). The coordinate data corresponding to each key point constitutes the pose feature data, and after the key points are assembled, the pose information corresponding to the object to be processed is obtained. That is, each pose category in the pre-stored pose library has corresponding coordinate data, and the key-point coordinates are matched in the pre-stored pose library to obtain the pose information corresponding to the object to be processed.
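The keypoint-matching step can be pictured as below: detected keypoint coordinates are the pose feature data, and the library pose whose stored keypoint template is nearest after normalization is selected. The library contents, the normalization, and the nearest-template criterion are all illustrative assumptions, and the detector is assumed to return the same ordered keypoint set as the templates.

```python
import numpy as np

POSE_LIBRARY = {
    # pose category -> keypoint template, shape (n_keypoints, 2); assumed data
    "jump":   np.array([[0.5, 0.1], [0.4, 0.4], [0.6, 0.4], [0.5, 0.8]]),
    "rotate": np.array([[0.5, 0.1], [0.3, 0.5], [0.7, 0.5], [0.5, 0.9]]),
}

def normalize(kps):
    """Remove translation and scale so bodies of different sizes compare."""
    kps = kps - kps.mean(axis=0)
    scale = float(np.abs(kps).max()) or 1.0
    return kps / scale

def match_pose(keypoints):
    """Return the library pose nearest to the detected keypoints."""
    q = normalize(np.asarray(keypoints, dtype=float))
    return min(POSE_LIBRARY,
               key=lambda name: np.linalg.norm(normalize(POSE_LIBRARY[name]) - q))
```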
It should be noted that, in the embodiments of the present application, if the first object comprises several designated targets whose poses differ, for example three people A, B, and C where A is kicking, B is jumping, and C is waving, pose recognition must be performed on all three, and the pose information corresponding to each person can be selected after pose matching in the pre-stored pose library.
Further, in an embodiment of the present application, after determining that the scene information corresponding to the first object in the first key frame is the target scene and thereby determining the pose information corresponding to the first object, the special effect processing device may further match a corresponding target special effect to the pose information.
Step 103: determining a target special effect corresponding to the first static frame class according to the pose information and the semantic identification information.
In an embodiment of the present application, after determining the pose information corresponding to the first object, the special effect processing device may further determine the target special effect corresponding to the first static frame class according to the pose information and semantic identification information corresponding to the first object.
Optionally, in an embodiment of the present application, the special effect processing device may obtain the special effect library corresponding to the target scene and then select the target special effect from it according to the pose information and semantic identification information. Specifically, the device is preset with several special effect libraries corresponding to different scenes. For example, it may be preset with a special effect library for a character dance scene, which stores multiple special effects corresponding to multiple dance poses, and with a special effect library for a character sports scene, which stores multiple special effects corresponding to multiple sports poses.
Further, in the embodiments of the present application, since the first object may correspond to the same pose information but different semantic identification information, the same pose may correspond to multiple special effects in the special effect library of the target scene. When matching a target special effect to the pose information, the special effect processing device may therefore select the target special effect from the special effect library based on both the pose information and the semantic identification information.
For example, assume the target scene corresponding to a first object is a character dance scene and the target scene corresponding to a second object is a character sports scene. Even when the pose information of the two objects is the same, the two target scenes correspond to different special effect libraries, so the target special effects corresponding to the first and second objects may differ. On the other hand, assume both target scenes are character dance scenes and both poses are jumping; if the semantic identification information of the first object is ballet while that of the second object is street dance, the target special effects corresponding to the two objects may still differ.
Optionally, in an embodiment of the present application, the pose information includes a pose category and a pose attribute. Specifically, the pose category covers a person's gestures, expressions, actions, body shape, and so on, for example raising the hands, placing the hands on the hips, pointing, holding the arms level, squatting, jumping, kicking the left foot forward, and other actions. The pose attribute distinguishes dynamic poses from static poses: the attributes of raising the hands, placing the hands on the hips, pointing, holding the arms level, and similar actions are static poses, while the attributes of squatting, jumping, kicking the left foot forward, and similar actions are dynamic poses.
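The patent does not say how the pose attribute is computed; one plausible reading (an assumption, sketched below) is to compare keypoints across neighbouring frames and tag poses whose keypoints barely move as static.

```python
import numpy as np

def pose_attribute(kps_prev, kps_curr, thresh=0.05):
    """Mean keypoint displacement between frames decides static vs dynamic."""
    disp = np.linalg.norm(np.asarray(kps_curr, dtype=float)
                          - np.asarray(kps_prev, dtype=float), axis=1)
    return "static" if disp.mean() < thresh else "dynamic"
```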
Optionally, in an embodiment of the present application, the special effect processing device may determine the target special effect corresponding to the first object according to the pose category, the pose attribute, and the semantic identification information. Specifically, poses with the same pose category and semantic identification information but different pose attributes may correspond to different target special effects. For example, the object to be processed makes a dynamic heart gesture in a first key frame and holds a static heart gesture in a second key frame; the pose category and semantic identification corresponding to the object are the same in both frames, but the target special effects corresponding to the dynamic and the static heart gesture are not the same.
Further, in an embodiment of the present application, different special effects correspond to different special effect matching parameters. When selecting the target special effect from the special effect library by combining the pose information and the semantic identification information, the special effect processing device may first determine a first matching parameter corresponding to the pose information and a second matching parameter corresponding to the semantic identification information, and then select the target special effect corresponding to the pose information from the special effect library according to the first and second matching parameters.
Optionally, in an embodiment of the present application, when the target special effect corresponding to the pose information is selected from the special effect library according to the first and second matching parameters, the special effect with the highest matching degree in the library is taken as the target special effect corresponding to the first object in the first key frame. Specifically, the same pose information may correspond to several different special effects in the library. To determine the best target special effect, the special effect processing device may process the first and second matching parameters corresponding to the different special effects according to a preset algorithm (for example, a maximum matching method) to obtain a matching value for each special effect, and take the special effect with the maximum matching value as the target special effect corresponding to the first object in the first key frame.
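A minimal sketch of this maximum-matching selection: each candidate effect in the scene's library carries a first matching parameter for the pose and a second for the semantic label, and the effect whose combined matching value is largest wins. The library layout and the simple sum used as the matching value are illustrative assumptions.

```python
# (pose, semantic) -> {effect name: (first matching param, second matching param)}
EFFECT_LIBRARY = {
    ("jump", "ballet"):       {"feathers": (0.9, 0.8), "sparkles": (0.7, 0.6)},
    ("jump", "street_dance"): {"shockwave": (0.8, 0.9), "sparkles": (0.6, 0.5)},
}

def select_target_effect(pose, semantic):
    """Pick the candidate effect with the maximum combined matching value."""
    candidates = EFFECT_LIBRARY.get((pose, semantic), {})
    if not candidates:
        return None
    return max(candidates, key=lambda name: sum(candidates[name]))
```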
Optionally, in an embodiment of the present application, the special effect processing device may input the pose information corresponding to the first object into a preset special effect matching model established based on machine learning, so as to obtain the target special effect corresponding to the pose information.
Further, in the embodiments of the present application, the first key frame is extracted from the first static frame class, and the first static frame class consists of at least one image of the same type in the video to be processed; that is, the first static frame class contains image frames of the same type as the key frame, and therefore contains the same first object. Determining the target special effect corresponding to the first object in the first key frame can thus be regarded as determining the target special effect corresponding to the first object in the first static frame class, so the special effect corresponding to the first key frame is taken as the target special effect corresponding to the first static frame class.
Further, in the embodiments of the present application, after determining the target special effect corresponding to the first static frame class according to the pose information and the semantic identification information, the special effect processing device may further perform special effect processing on the first static frame class based on the target special effect.
Step 104: performing special effect processing on each frame image in the first static frame class based on the target special effect to generate a special effect static frame class corresponding to the first static frame class.
In an embodiment of the present application, after determining the target special effect corresponding to the first static frame class according to the pose information and the semantic identification information, the special effect processing device may further perform special effect processing on each frame image in the first static frame class based on the target special effect to generate a special effect static frame class.
Specifically, in the embodiments of the present application, the special effect processing device may apply the target special effect to all image frames in the first static frame class to obtain the corresponding special effect static frame class. It can be understood that, since different objects to be processed may have motion ranges of different sizes, the device needs to adjust the application range of the target special effect according to the motion range of the object to be processed, so that the target special effect fits pose information of different motion ranges.
Optionally, in an embodiment of the present application, the coordinate values of the key points corresponding to the object to be processed, that is, the pose feature data, determine not only the pose category and attribute; the motion range corresponding to the object to be processed can also be determined from the pose feature data.
For example, the objects to be processed are three people A, B, and C, and the current dance motion of all three is a ballet turn; but because their body sizes differ, the motion ranges corresponding to their pose information differ in size. The same turning special effect therefore needs to be adjusted according to each object's motion range so that the target special effect matches more closely.
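One way to realize this adjustment (an assumption, not a procedure fixed by the patent) is to take the bounding box of the detected keypoints as the motion range and scale the effect's render size from it:

```python
import numpy as np

def motion_range(keypoints):
    """Width and height of the keypoint bounding box as the motion range."""
    kps = np.asarray(keypoints, dtype=float)
    (x0, y0), (x1, y1) = kps.min(axis=0), kps.max(axis=0)
    return x1 - x0, y1 - y0

def scale_effect(effect, keypoints, base_range=(1.0, 1.0)):
    """Return a copy of the effect whose scale matches the subject's range."""
    w, h = motion_range(keypoints)
    return {**effect, "scale": (w / base_range[0], h / base_range[1])}
```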
Further, after performing special effect processing on each frame image in a static frame class based on the target special effect to generate a special effect static frame class, the special effect processing device may traverse all static frame classes corresponding to the video to be processed to generate a plurality of special effect static frame classes, and then generate the special effect video based on them.
Step 105: traversing the plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the plurality of static frame classes are obtained, and generating the special effect video based on the plurality of special effect static frame classes.
In an embodiment of the present application, after performing special effect processing on each frame image in the first static frame class based on the target special effect to generate the special effect static frame class, the special effect processing device may further traverse the plurality of static frame classes corresponding to the video to be processed until the plurality of corresponding special effect static frame classes are obtained, and generate the special effect video based on them.
Specifically, in the embodiments of the present application, the special effect processing device may apply the target special effect to all image frames in the first static frame class to obtain the special effect static frame class corresponding to the first static frame class; it then applies the same special effect processing to the second static frame class, the third static frame class, and so on in turn, until the plurality of special effect static frame classes corresponding to the video to be processed are obtained, after which the corresponding special effect video can be generated from them.
It should be noted that, in the embodiment of the present application, the multiple special effect static frame classes may be all special effect static frame classes corresponding to all static frame classes in the video to be processed, or may be partial special effect static frame classes corresponding to partial static frame classes in the video to be processed.
Specifically, after the special effect processing device traverses all the static frame classes corresponding to the video to be processed, if the scene information corresponding to the object to be processed is the target scene in every static frame class, the device performs special effect processing on all of them, obtains all the corresponding special effect static frame classes, and generates the special effect video from them. If, in some static frame classes, the scene information corresponding to the object to be processed does not belong to the target scene, the device performs no special effect processing on those classes and keeps their original image frames.
For example, in an embodiment of the present application, if the first object first exercises for a period of time and then dances, the recognized scene information is first a character sports scene and then a character dance scene. Assuming the target scene specified by the special effect processing device is a character dance scene, none of the scene information recognized while the object to be processed is exercising belongs to the target scene; only the scene information recognized while the object is dancing does. Because the device processes the key frames corresponding to the video to be processed in sequence, if the scene information corresponding to the first object in the first key frame does not belong to the target scene, no special effect processing is performed on any image frame in the first static frame class, and the original frames are kept; the device then extracts the next key frame from the next static frame class, determines the scene information and semantic identification information corresponding to the object to be processed in that key frame, and executes the special effect processing flow on that static frame class when the scene information is the target scene.
Further, in the embodiments of the present application, after all the special-effect-added video frames corresponding to the video to be processed are obtained, all the image frames without special effects and all the image frames with special effects are encoded together, generating the special effect video corresponding to the video to be processed. Optionally, the video encoding method may be from the Moving Picture Experts Group (MPEG) series or the H.26x series; the encoding method is not specifically limited in the present application. The special effect video is obtained after encoding.
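The encoding step itself is routine; a minimal sketch with OpenCV's `VideoWriter` follows. The patent only requires some MPEG/H.26x-family codec, so the `mp4v` FourCC below is one possible choice, not the mandated one, and the frames are assumed to be same-sized BGR arrays in playback order.

```python
import cv2

def encode_video(frames, path, fps=30.0):
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:  # plain and effect-processed frames, in order
        writer.write(frame)
    writer.release()
```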
An embodiment of the present application provides a special effect processing method. The special effect processing device can extract a first key frame from a first static frame class and determine the scene information and semantic identification information corresponding to a first object in the first key frame, where the first static frame class includes at least one frame image of the same type in the video to be processed; if the scene information is a target scene, determine the pose information corresponding to the first object; determine a target special effect corresponding to the first static frame class according to the pose information and the semantic identification information; perform special effect processing on each frame image in the first static frame class based on the target special effect to generate a special effect static frame class corresponding to the first static frame class; and traverse a plurality of static frame classes corresponding to the video to be processed until a plurality of corresponding special effect static frame classes are obtained, generating the special effect video based on them. That is to say, the special effect processing device recognizes the key frame in each static frame class to obtain its pose information and semantic identification information, and automatically matches and adds a special effect to each frame image in the static frame class based on that information, completing the special effect processing of the video to be processed, realizing intelligent addition of video special effects, and meeting users' diversified requirements.
Based on the foregoing embodiment, in another embodiment of the present application, Fig. 4 is a second schematic flow chart of an implementation of the special effect processing method provided in an embodiment of the present application. As shown in Fig. 4, after traversing the plurality of static frame classes corresponding to the video to be processed until the plurality of corresponding special effect static frame classes are obtained and generating the special effect video based on them, that is, after step 105, the special effect processing device may further perform:
and 106, outputting the special effect video and obtaining a feedback result corresponding to the special effect video.
In the embodiment of the application, after the special effect video is generated, the special effect processing device may output the special effect video to be provided to a user, and further obtain a feedback result corresponding to the special effect video.
It should be noted that, in the embodiments of the present application, the special effect processing device is provided with a user scoring module. After the user watches the special effect video, the device may automatically pop up a scoring interface, and the user may score the current special effect video according to preference, that is, according to whether the target special effect corresponding to the object to be processed is optimal. Specifically, the scoring result falls into three levels: good, general, and to-be-improved; the user scores the special effect video by tapping the scoring interface. If the user ignores the scoring, the system defaults the scoring result to general.
Further, after the user scores, the special effect processing device may obtain a scoring result, that is, a feedback result corresponding to the special effect video, so that the special effect processing device may further update the special effect library according to the feedback result.
Step 107: correcting the first matching parameter and the second matching parameter according to the feedback result so as to update the special effect library.
In the embodiment of the application, after the special effect video is output and the feedback result corresponding to the special effect video is obtained, the special effect processing device may further perform correction processing on the first matching parameter and the second matching parameter corresponding to the current target special effect according to the feedback result, so as to update the special effect library.
Optionally, in an embodiment of the present application, the special effect processing device may correct the first and second matching parameters based on reinforcement learning. Specifically, the feedback result for the target special effect in the current special effect video can be used as training data for reinforcement learning, correcting the special effect matching parameter values; the corrected matching parameters can then be used to adjust the target special effect corresponding to the same pose information next time.
It can be understood that, in the embodiments of the present application, reinforcement learning corrects the special effect matching parameters with a reward-and-penalty function. If the user's score for the current special effect is good, the target special effect corresponding to the pose information is optimal, and a reward is applied, that is, the special effect matching parameter value is increased; if the score is general or to-be-improved, the target special effect corresponding to the pose information is not optimal, and a penalty is applied, that is, the special effect matching parameter value is decreased.
Specifically, in the embodiments of the present application, the process of correcting the special effect matching parameters is as follows: after the special effect processing device outputs the special effect video, the user scores the target special effect in it. If the scoring result is to-be-improved, a return of -1 is obtained, that is, the special effect matching parameter value is decreased by 1; if the scoring result is general, a return of 0 is obtained, that is, the value is unchanged; if the scoring result is good, a return of +1 is obtained, that is, the value is increased by 1.
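Transcribing that scoring rule directly into an update step (applying the same return to both stored matching parameters is an assumption; the text only speaks of "the special effect matching parameter value"):

```python
RETURNS = {"to_be_improved": -1, "general": 0, "good": +1}

def apply_feedback(effect_library, key, effect_name, score):
    """Shift the used effect's matching parameters by the user's return."""
    r = RETURNS[score]
    p1, p2 = effect_library[key][effect_name]
    effect_library[key][effect_name] = (p1 + r, p2 + r)
```

With `EFFECT_LIBRARY` from the earlier sketch, `apply_feedback(EFFECT_LIBRARY, ("jump", "ballet"), "feathers", "good")` would raise both parameters of the used effect by 1, so a later `select_target_effect` call may pick a different winner.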
Optionally, in an embodiment of the present application, in the initial state of the special effect library, one pose may correspond to several special effects in advance; each special effect has corresponding special effect matching parameter values, including a first matching parameter value and a second matching parameter value, and in the initial state each pose defaults to one of its special effects.
Further, the special effect processing device may adjust the special effect matching parameter value according to the feedback result to update the special effect library, that is, when the target feature corresponding to the posture information is determined next time, the target special effect corresponding to the maximum special effect matching parameter value is reselected from the multiple special effects corresponding to the posture after the special effect matching parameter value is updated.
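One way to picture the library described here is a mapping from each posture to several candidate effects, each carrying its two matching parameter values; after an update, the next lookup takes the candidate with the largest combined value. The layout and the summed score below are illustrative assumptions, not the application's prescribed structure.

    # Illustrative special effect library: posture -> candidate effects, each
    # with a first (pose) and second (semantics) matching parameter value.
    effect_library = {
        "arms_raised": [
            {"effect": "fireworks", "first_matching_param": 3, "second_matching_param": 2},
            {"effect": "light_rays", "first_matching_param": 1, "second_matching_param": 4},
        ],
    }

    def select_target_effect(pose_info):
        """Reselect the candidate whose matching parameter values sum highest."""
        candidates = effect_library.get(pose_info, [])
        if not candidates:
            return None
        return max(candidates, key=lambda e: e["first_matching_param"]
                                             + e["second_matching_param"])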
Optionally, in an embodiment of the application, if no satisfactory target special effect is prestored in the special effect library, then after the feedback result corresponding to the special effect video is obtained, the posture information and the semantic identification information corresponding to the object to be processed may be input as training data into a special effect generation model to generate a new target special effect, and the new target special effect, together with its correspondence to special effect matching parameter values, is stored to update the special effect library.
The embodiment of the application provides a special effect processing method in which the special effect processing device recognizes the key frame in each static frame class to obtain the posture information and semantic identification information corresponding to the key frame, and automatically matches and adds a special effect to each frame of image in the static frame class based on that information, thereby completing the special effect processing of the video to be processed, realizing intelligent addition of video special effects, and meeting the diversified requirements of users.
Based on the foregoing embodiment, in another embodiment of the present application, fig. 5 is a schematic flowchart of a third implementation of the special effect processing method provided in the embodiment of the present application. As shown in fig. 5, the special effect processing device may execute special effect processing as follows:
Step 201: extract a key frame from the video to be processed.
The special effect processing device may classify all static frames in the video to be processed to obtain a plurality of static frame classes, each corresponding to one key frame, and may extract one key frame from each static frame class. Specifically, when executing the special effect processing method, the device runs the special effect processing flow over the static frame classes in sequence.
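The claims state that the key frame is the image frame with the maximum entropy value in its static frame class, so the extraction step might look like the sketch below. Grayscale histogram entropy is an assumption here; the application does not fix the entropy measure.

    import cv2
    import numpy as np

    def frame_entropy(frame):
        """Shannon entropy of the grayscale histogram (one common choice)."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
        p = hist / hist.sum()
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    def extract_key_frame(static_frame_class):
        """Per the claims: the key frame is the max-entropy frame of its class."""
        return max(static_frame_class, key=frame_entropy)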
Step 202: determine scene information and semantic identification information corresponding to the object to be processed in the key frame.
Because some key frames of the video to be processed may not contain an object to be processed, the special effect processing device may preset an object library containing designated targets. When detecting and identifying objects, the device may determine, from the prestored object library, whether a detected object is a designated target; if so, it continues with scene recognition and semantic annotation of the identified object, thereby obtaining the scene information and semantic identification information corresponding to the object to be processed in the current key frame.
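Read as code, this gating step runs detection on the key frame and keeps only detections whose label appears in the prestored object library. The detector interface and the library contents below are placeholder assumptions.

    DESIGNATED_TARGETS = {"person", "pet"}   # example preset object library

    def find_object_to_process(key_frame, detector):
        """Return the first detected designated target as (label, region), else None.

        `detector` stands in for any detection model yielding (label, region)
        pairs; the application does not name a specific detector.
        """
        for label, region in detector(key_frame):
            if label in DESIGNATED_TARGETS:
                return label, region
        return None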
Optionally, the special effect processing device may input the current key frame into a preset semantic analysis model established based on machine learning, so as to obtain scene information and semantic identification information corresponding to the object to be processed in the current key frame.
Step 203: judge whether the scene information indicates the target scene; if yes, perform step 204; otherwise, return to step 201.
The special effect processing apparatus may specify a target scene. If the scene information corresponding to the object to be processed is the target scene, step 204 is executed; if not, the process returns to step 201 and the next key frame is extracted from the video to be processed.
For example, suppose the target scene is a character dance scene. If the scene information corresponding to the object to be processed in the current key frame is a character dance scene, the scene information matches the target scene and step 204 is executed. If the scene information is instead, say, an object motion scene, it does not match the target scene; the special effect processing device then does not need to add a special effect to this key frame and returns to step 201 to extract the next key frame.
Step 204: determine the posture information corresponding to the object to be processed.
If the scene information corresponding to the object to be processed in the current key frame is the target scene, the special effect processing device may further determine the posture information corresponding to that object. Specifically, the device may prestore a posture library containing all posture categories and attributes for objects to be processed. The device first performs posture estimation on the object, finding all of its key points (for example, the head, hands, and knees), that is, the posture feature data corresponding to the object; it then matches this posture feature data against the prestored posture library and selects the posture information corresponding to the object to be processed.
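In code form, this step amounts to flattening the estimated key points into a feature vector and taking the nearest entry in the prestored posture library. The Euclidean nearest-neighbour match and the dummy reference vectors are illustrative assumptions.

    import numpy as np

    # Illustrative prestored posture library: posture name -> reference vector
    # (e.g. 17 key points x 2 coordinates; dummy values for the sketch).
    pose_library = {
        "arms_raised": np.zeros(34),
        "squatting": np.ones(34),
    }

    def match_pose(keypoints):
        """Match estimated key points (head, hands, knees, ...) to the library."""
        feature = np.asarray(keypoints, dtype=float).ravel()
        return min(pose_library,
                   key=lambda name: np.linalg.norm(pose_library[name] - feature))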
Step 205: determine the target special effect corresponding to the static frame class according to the posture information and the semantic identification information.
Specifically, the special effect processing device is provided with a special effect library corresponding to the target scene. In that library, different semantic identification information may correspond to the same posture information, so one posture may correspond to multiple special effects. Therefore, when matching a target special effect for the object to be processed, the device selects the target special effect corresponding to the key frame from the special effect library based on both the posture information and the semantic identification information.
Further, when selecting the target special effect from the special effect library by combining the posture information and the semantic identification information, the special effect processing device may first determine a first matching parameter corresponding to the posture information and a second matching parameter corresponding to the semantic identification information, and further select the target special effect corresponding to the key frame from the special effect library according to the first matching parameter and the second matching parameter.
Further, since the key frame is extracted from its static frame class, and that class contains image frames of the same type as the key frame, determining the target special effect for the object to be processed in the key frame also determines the target special effect for the whole static frame class; that is, the special effect corresponding to the key frame is the target special effect corresponding to the static frame class.
Step 206: perform special effect processing on each frame of image in the static frame class based on the target special effect to generate a special effect static frame class.
Specifically, special effect processing applies the target special effect to the object to be processed. Since different objects to be processed may have different motion ranges, the special effect processing device needs to adjust the application range of the target special effect according to the motion range of the object, so that the target special effect better fits the posture information.
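The adjustment described here can be pictured as fitting the effect's render region to the object's per-frame motion box before compositing. The box-based scheme and the stand-in renderer below are assumptions for illustration.

    import cv2

    def render_effect(frame, effect_name, region):
        """Placeholder renderer: marks the application range on a copy of the frame."""
        x, y, w, h = region
        out = frame.copy()
        cv2.rectangle(out, (x, y), (x + w, y + h), (0, 255, 255), 2)
        return out

    def apply_effect_to_class(static_frame_class, target_effect, motion_boxes):
        """Composite the target effect onto every frame of the static frame class,
        scaling its application range to the object's motion box per frame."""
        return [render_effect(frame, target_effect["effect"], motion_boxes[i])
                for i, frame in enumerate(static_frame_class)]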
Step 207: judge whether the current key frame is the last key frame; if yes, perform step 208; otherwise, return to step 201.
If the current key frame is not the last one, the special effect processing of the video to be processed is not yet complete, so the special effect processing device returns to step 201, extracts the next key frame from the video to be processed, and applies special effect processing to it. If the current key frame is the last one, the device has completed special effect processing of all image frames in the video to be processed, and step 208 is executed.
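Putting steps 201 to 207 together, the control flow reads as a loop over static frame classes with two early-continue branches (no designated target, non-target scene). The condensed sketch below reuses the helper names assumed in the earlier snippets; the two extra stubs are likewise placeholders.

    import numpy as np

    TARGET_SCENE = "character_dance"   # example target scene from the text

    def analyze_scene(key_frame):
        """Placeholder for the preset semantic analysis model of step 202."""
        return "character_dance", {"object": "person"}   # (scene, semantics)

    def estimate_keypoints(key_frame, obj):
        """Placeholder pose estimator returning a flat key-point vector."""
        return np.zeros(34)

    def process_video(static_frame_classes, detector):
        """Condensed control flow of steps 201-207 (illustrative helpers)."""
        effect_classes = []
        for frame_class in static_frame_classes:                 # step 201, per class
            key_frame = extract_key_frame(frame_class)
            obj = find_object_to_process(key_frame, detector)
            if obj is None:
                continue
            scene, semantics = analyze_scene(key_frame)          # step 202
            if scene != TARGET_SCENE:                            # step 203
                continue                                         # back to step 201
            pose = match_pose(estimate_keypoints(key_frame, obj))  # step 204
            effect = select_target_effect(pose)                  # step 205
            if effect is None:
                continue
            boxes = {i: (0, 0, 64, 64) for i in range(len(frame_class))}  # stub motion ranges
            effect_classes.append(
                apply_effect_to_class(frame_class, effect, boxes))        # step 206
        return effect_classes                                    # ready for step 208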
Step 208: generate the special effect video based on the plurality of special effect static frame classes.
After all special-effect-added video frames corresponding to the video to be processed, that is, the special effect static frame classes, are obtained, all image frames, both with and without added special effects, are encoded together, thereby generating the special effect video corresponding to the video to be processed.
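As a sketch of this encoding step, the processed and unprocessed frames can be written out in their original order with an ordinary video encoder; OpenCV's VideoWriter and the mp4v codec are used here purely as example choices.

    import cv2

    def encode_special_effect_video(frames_in_order, path, fps=30.0):
        """Encode all frames (with and without effects) into the output video."""
        h, w = frames_in_order[0].shape[:2]
        writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        for frame in frames_in_order:
            writer.write(frame)   # frames keep their original order
        writer.release()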
Step 209: output the special effect video.
The special effect processing device outputs the special effect video to the user.
Step 210: obtain the feedback result corresponding to the special effect video.
The special effect processing device is provided with a user scoring module. After the user watches the special effect video, the device may automatically pop up a scoring interface, the user may score the special effect video according to preference, and the device then obtains the scoring result, that is, the feedback result corresponding to the special effect video.
Step 211: correct the special effect matching parameters based on the feedback result to obtain corrected special effect matching parameters, so as to update the special effect library.
Optionally, the special effect processing device may correct the special effect matching parameters based on reinforcement learning so as to update the special effect library. Specifically, the user's feedback result for the target special effect in the special effect video can be used as training data for reinforcement learning, thereby correcting the special effect matching parameter values and updating the special effect library. Since the device must automatically match and add special effects based on the corrected matching parameters in the next round of processing, the flow then returns to step 205.
Based on the special effect processing method proposed in the above steps 201 to 211, the special effect processing device may perform recognition processing on each frame of image in the video to be processed, obtain the posture information and semantic identification information corresponding to the object to be processed in the image frame, and automatically match and add a special effect to the object to be processed according to the posture information and the semantic identification information, thereby generating a special effect video.
The embodiment of the application provides a special effect processing method in which the special effect processing device recognizes the image frames of the video to be processed to obtain the posture information and semantic identification information corresponding to the object to be processed, and automatically matches and adds a special effect to that object based on this information, thereby completing the special effect processing of the video to be processed, realizing intelligent addition of video special effects, and meeting the diversified requirements of users.
Based on the foregoing embodiment, in another embodiment of the present application, fig. 6 is a schematic diagram of a composition structure of the special effect processing apparatus proposed in the present application, and as shown in fig. 6, the special effect processing apparatus 10 proposed in the embodiment of the present application may include an extracting unit 19, a determining unit 110, a processing unit 111, an obtaining unit 112, a generating unit 113, a classifying unit 114, an executing unit 115, an outputting unit 116, and a correcting unit 117.
The extracting unit 19 is configured to extract a first key frame from the first static frame class;
the determining unit 110 is configured to determine scene information and semantic identification information corresponding to a first object in the first key frame; wherein the first static frame class comprises at least one frame image with the same type in the video to be processed; if the scene information is a target scene, determine posture information corresponding to the first object; and determine a target special effect corresponding to the first static frame class according to the posture information and the semantic identification information;
the processing unit 111 is configured to perform special effect processing on each frame of image in the first static frame class based on the target special effect, and generate a special effect static frame class corresponding to the first static frame class;
the obtaining unit 112 is configured to traverse a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the plurality of static frame classes are obtained;
the generating unit 113 is configured to generate a special effect video based on the plurality of special effect static frame classes.
Further, in an embodiment of the present application, the classifying unit 114 is configured to perform classification processing on all static frames in the video to be processed based on a preset classification algorithm before extracting a first key frame from a first static frame class, so as to obtain multiple static frame classes corresponding to the video to be processed; and the preset classification algorithm is used for realizing the clustering of the similar static frames.
Further, in an embodiment of the present application, the executing unit 115 is configured to, after a first key frame is extracted from a first static frame class, if a first object in the first key frame belongs to a preset object library, execute the obtaining processing of the scene information and the semantic identification information on the first object.
Further, in an embodiment of the present application, the determining unit 110 is specifically configured to, if the scene information is a target scene, perform gesture recognition processing on the first object to obtain gesture feature data corresponding to the first object; and determining the attitude information corresponding to the first object according to the attitude characteristic data and a pre-stored attitude library.
Further, in an embodiment of the present application, the determining unit 110 is further specifically configured to obtain a special effect library corresponding to the target scene; selecting a special effect corresponding to the first key frame from the special effect library according to the attitude information and the semantic identification information; and taking the special effect corresponding to the first key frame as the target special effect corresponding to the first static frame class.
Further, in an embodiment of the present application, the determining unit 110 is further specifically configured to determine a first matching parameter corresponding to the posture information and a second matching parameter corresponding to the semantic identification information; and selecting the target special effect from the special effect library according to the first matching parameter and the second matching parameter.
Further, in an embodiment of the present application, the extracting unit 19 is further configured to, after determining scene information and semantic identification information corresponding to a first object in the first key frame, if the scene information is not the target scene, extract a second key frame from a second static frame class.
Further, in an embodiment of the present application, the determining unit 110 is further configured to determine scene information and semantic identification information corresponding to a second object in the second key frame.
Further, in an embodiment of the present application, the executing unit 115 is further configured to continue to execute a special effect processing procedure on the second static frame class if the scene information corresponding to the second object is the target scene.
Further, in an embodiment of the present application, the output unit 116 is configured to output the special effect video after traversing a plurality of static frame classes corresponding to the to-be-processed video until obtaining a plurality of special effect static frame classes corresponding to the plurality of static frame classes and generating the special effect video based on the plurality of special effect static frame classes.
Further, in an embodiment of the present application, the obtaining unit 112 is further configured to obtain a feedback result corresponding to the special effect video.
Further, in an embodiment of the present application, the correcting unit 117 is configured to perform a correction process on the first matching parameter and the second matching parameter according to the feedback result, so as to update the special effects library.
In an embodiment of the present application, further, fig. 7 is a third schematic structural diagram of the special effect processing apparatus provided by the present application. As shown in fig. 7, the special effect processing apparatus 10 provided by the embodiment of the present application may further include a processor 118 and a memory 119 storing instructions executable by the processor 118; further, the apparatus 10 may include a communication interface 120 and a bus 121 for connecting the processor 118, the memory 119, and the communication interface 120.
In an embodiment of the present application, the processor 118 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic device implementing the above processor functions may also be another device; the embodiments of the present application are not specifically limited in this regard. The special effects processing apparatus 10 may further comprise a memory 119 coupled to the processor 118, wherein the memory 119 is configured to store executable program code comprising computer operating instructions; the memory 119 may comprise a high-speed RAM and may further comprise a non-volatile memory, for example at least two disk memories.
In an embodiment of the present application, the bus 121 is used to connect the communication interface 120, the processor 118, and the memory 119, and to enable intercommunication among these components.
In an embodiment of the present application, the memory 119 is used for storing instructions and data.
Further, in an embodiment of the present application, the processor 118 is configured to extract a first key frame from a first static frame class, and determine scene information and semantic identification information corresponding to a first object in the first key frame; wherein the first static frame class comprises at least one frame image with the same type in the video to be processed; if the scene information is a target scene, determining attitude information corresponding to the first object; determining a target special effect corresponding to the first static frame type according to the attitude information and the semantic identification information; carrying out special effect processing on each frame of image in the first static frame class based on the target special effect to generate a special effect static frame class corresponding to the first static frame class; traversing a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the plurality of static frame classes are obtained, and generating a special effect video based on the plurality of special effect static frame classes.
In practical applications, the Memory 119 may be a volatile Memory (volatile Memory), such as a Random-Access Memory (RAM); or a non-volatile Memory (non-volatile Memory), such as a Read-Only Memory (ROM), a flash Memory (flash Memory), a Hard Disk (Hard Disk Drive, HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor 118.
In addition, the functional modules in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware or in the form of a software functional module.
Based on this understanding, the technical solution of this embodiment, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application provides special effect processing equipment which can extract a first key frame from a first static frame class and determine scene information and semantic identification information corresponding to a first object in the first key frame; the first static frame class comprises at least one frame image with the same type in the video to be processed; if the scene information is a target scene, determining attitude information corresponding to the first object; determining a target special effect corresponding to the first static frame type according to the attitude information and the semantic identification information; carrying out special effect processing on each frame of image in the first static frame class based on the target special effect to generate a special effect static frame class corresponding to the first static frame class; and traversing a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the plurality of static frame classes are obtained, and generating the special effect video based on the plurality of special effect static frame classes. That is to say, in the embodiment of the present application, the special effect processing device performs recognition processing on the key frame in the static frame class to obtain the pose information and the semantic identification information corresponding to the key frame, and automatically matches and adds a special effect to each frame of image in the static frame class based on the pose information and the semantic identification information, so as to complete special effect processing of a video to be processed, implement intelligent video special effect addition, and meet diversified requirements of users.
An embodiment of the present application provides a computer-readable storage medium on which a program is stored, and the program, when executed by a processor, implements the special effect processing method described above.
Specifically, the program instructions corresponding to a special effect processing method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive. When the program instructions corresponding to the special effect processing method in the storage medium are read or executed by an electronic device, the method includes the following steps:
extracting a first key frame from a first static frame class, and determining scene information and semantic identification information corresponding to a first object in the first key frame; wherein the first static frame class comprises at least one frame image with the same type in the video to be processed;
if the scene information is a target scene, determining attitude information corresponding to the first object;
determining a target special effect corresponding to the first static frame type according to the attitude information and the semantic identification information;
carrying out special effect processing on each frame of image in the first static frame class based on the target special effect to generate a special effect static frame class corresponding to the first static frame class;
traversing a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the plurality of static frame classes are obtained, and generating a special effect video based on the plurality of special effect static frame classes.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed via the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (16)

1. A special effect processing method, characterized in that the method comprises:
extracting a first key frame from a first static frame class, and determining scene information and semantic identification information corresponding to a first object in the first key frame; wherein the first static frame class comprises at least one frame image with the same type in the video to be processed; the first key frame is the image frame with the maximum entropy value in the first static frame class;
if the scene information is a target scene, determining attitude information corresponding to the first object;
determining a target special effect corresponding to the first static frame type according to the attitude information and the semantic identification information;
carrying out special effect processing on each frame of image in the first static frame class based on the target special effect to generate a special effect static frame class corresponding to the first static frame class;
traversing a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the plurality of static frame classes are obtained, and generating a special effect video based on the plurality of special effect static frame classes;
outputting the special effect video and obtaining a feedback result corresponding to the special effect video;
and correcting a first matching parameter corresponding to the attitude information and a second matching parameter corresponding to the semantic identification information according to the feedback result so as to update a special effect library corresponding to the target scene.
2. The method of claim 1, wherein prior to said extracting the first key frame from the first static frame class, the method further comprises:
classifying all static frames in the video to be processed based on a preset classification algorithm to obtain a plurality of static frame classes corresponding to the video to be processed; and the preset classification algorithm is used for realizing the clustering of the similar static frames.
3. The method of claim 1, wherein after extracting the first key frame from the first static frame class, the method further comprises:
and if the first object in the first key frame belongs to a preset object library, executing the acquisition processing of the scene information and the semantic identification information on the first object.
4. The method according to claim 1, wherein determining the posture information corresponding to the first object if the scene information is a target scene comprises:
if the scene information is the target scene, performing gesture recognition processing on the first object to obtain gesture feature data corresponding to the first object;
and determining the attitude information corresponding to the first object according to the attitude characteristic data and a pre-stored attitude library.
5. The method according to claim 1, wherein the determining the target special effect corresponding to the first static frame class according to the pose information and the semantic identification information comprises:
acquiring the special effect library corresponding to the target scene;
selecting a special effect corresponding to the first key frame from the special effect library according to the attitude information and the semantic identification information;
and taking the special effect corresponding to the first key frame as the target special effect corresponding to the first static frame class.
6. The method according to claim 5, wherein the determining the target special effect corresponding to the first static frame class according to the pose information and the semantic identification information comprises:
determining the first matching parameter corresponding to the attitude information and the second matching parameter corresponding to the semantic identification information;
and selecting the target special effect from the special effect library according to the first matching parameter and the second matching parameter.
7. The method according to claim 1, wherein after determining the scene information and the semantic identification information corresponding to the first object in the first key frame, the method further comprises:
if the scene information is not the target scene, extracting a second key frame from a second static frame class, and determining scene information and semantic identification information corresponding to a second object in the second key frame;
and if the scene information corresponding to the second object is the target scene, continuing to execute a special effect processing flow on the second static frame class.
8. An effect processing apparatus characterized by comprising: an extracting unit, a determining unit, a processing unit, an acquiring unit, a generating unit, an outputting unit and a correcting unit,
the extracting unit is used for extracting a first key frame from a first static frame class; wherein the first keyframe is the image frame with the largest entropy value in the first static frame class;
the determining unit is configured to determine scene information and semantic identification information corresponding to a first object in the first key frame; wherein the first static frame class comprises at least one frame image with the same type in the video to be processed; if the scene information is a target scene, determine attitude information corresponding to the first object; and determine a target special effect corresponding to the first static frame type according to the attitude information and the semantic identification information;
the processing unit is configured to perform special effect processing on each frame of image in the first static frame class based on the target special effect, and generate a special effect static frame class corresponding to the first static frame class;
the acquiring unit is used for traversing a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the static frame classes are obtained;
the generating unit is used for generating a special effect video based on the plurality of special effect static frame classes;
the output unit is used for outputting the special effect video after traversing a plurality of static frame classes corresponding to the video to be processed until a plurality of special effect static frame classes corresponding to the static frame classes are obtained and generating the special effect video based on the special effect static frame classes;
the obtaining unit is further configured to obtain a feedback result corresponding to the special-effect video;
and the correction unit is used for correcting the first matching parameter corresponding to the attitude information and the second matching parameter corresponding to the semantic identification information according to the feedback result so as to update the special effect library corresponding to the target scene.
9. The special effects processing apparatus according to claim 8, further comprising: the classification unit is used for classifying the target,
the classification unit is used for classifying all static frames in the video to be processed based on a preset classification algorithm before extracting a first key frame from a first static frame class, so as to obtain a plurality of static frame classes corresponding to the video to be processed; and the preset classification algorithm is used for realizing the clustering of the similar static frames.
10. The special effects processing apparatus according to claim 8, further comprising: an execution unit for executing the execution of the program,
the execution unit is configured to, after a first key frame is extracted from a first static frame class, execute, if a first object in the first key frame belongs to a preset object library, the acquisition processing of the scene information and the semantic identification information on the first object.
11. The special effects processing apparatus of claim 8,
the determining unit is specifically configured to perform gesture recognition processing on the first object if the scene information is a target scene, so as to obtain gesture feature data corresponding to the first object; and determining the attitude information corresponding to the first object according to the attitude feature data and a pre-stored attitude library.
12. The special effects processing apparatus of claim 8,
the determining unit is further specifically configured to obtain the special effect library corresponding to the target scene; selecting a special effect corresponding to the first key frame from the special effect library according to the attitude information and the semantic identification information; and taking the special effect corresponding to the first key frame as the target special effect corresponding to the first static frame class.
13. The special effects processing apparatus of claim 12,
the determining unit is further specifically configured to determine the first matching parameter corresponding to the posture information and the second matching parameter corresponding to the semantic identification information; and selecting the target special effect from the special effect library according to the first matching parameter and the second matching parameter.
14. The special effects processing apparatus of claim 10,
the extracting unit is further configured to, after determining scene information and semantic identification information corresponding to a first object in the first key frame, if the scene information is not the target scene, extract a second key frame from a second static frame class;
the determining unit is further configured to determine scene information and semantic identification information corresponding to a second object in the second key frame;
the execution unit is further configured to continue to execute a special effect processing flow on the second static frame class if the scene information corresponding to the second object is the target scene.
15. A special effect processing apparatus, comprising a processor and a memory having stored thereon instructions executable by the processor, wherein the instructions, when executed by the processor, implement the method of any one of claims 1 to 7.
16. A computer-readable storage medium, on which a program is stored, for application in a special effects processing device, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202010350461.7A 2020-04-28 2020-04-28 Special effect processing method and apparatus, and storage medium Active CN111416991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010350461.7A CN111416991B (en) 2020-04-28 2020-04-28 Special effect processing method and apparatus, and storage medium


Publications (2)

Publication Number Publication Date
CN111416991A CN111416991A (en) 2020-07-14
CN111416991B true CN111416991B (en) 2022-08-05

Family

ID=71495085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010350461.7A Active CN111416991B (en) 2020-04-28 2020-04-28 Special effect processing method and apparatus, and storage medium

Country Status (1)

Country Link
CN (1) CN111416991B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114079817A (en) * 2020-08-20 2022-02-22 北京达佳互联信息技术有限公司 Video special effect control method and device, electronic equipment and storage medium
CN112764463B (en) * 2020-10-29 2022-07-15 四川写正智能科技有限公司 Intelligent reward method and system with active triggering
CN112911399A (en) * 2021-01-18 2021-06-04 网娱互动科技(北京)股份有限公司 Method for quickly generating short video
CN112906553B (en) * 2021-02-09 2022-05-17 北京字跳网络技术有限公司 Image processing method, apparatus, device and medium
CN113542855B (en) * 2021-07-21 2023-08-22 Oppo广东移动通信有限公司 Video processing method, device, electronic equipment and readable storage medium
CN113612923B (en) * 2021-07-30 2023-02-03 重庆电子工程职业学院 Dynamic visual effect enhancement system and control method
CN114598919B (en) * 2022-03-01 2024-03-01 腾讯科技(深圳)有限公司 Video processing method, device, computer equipment and storage medium
CN116489331A (en) * 2023-04-21 2023-07-25 深圳市吉屋网络技术有限公司 Video special effect processing method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109495695A (en) * 2018-11-29 2019-03-19 北京字节跳动网络技术有限公司 Moving object special video effect adding method, device, terminal device and storage medium
CN109525891A (en) * 2018-11-29 2019-03-26 北京字节跳动网络技术有限公司 Multi-user's special video effect adding method, device, terminal device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10636152B2 (en) * 2016-11-15 2020-04-28 Gvbb Holdings S.A.R.L. System and method of hybrid tracking for match moving
CN107590442A (en) * 2017-08-22 2018-01-16 华中科技大学 A kind of video semanteme Scene Segmentation based on convolutional neural networks
CN107770626B (en) * 2017-11-06 2020-03-17 腾讯科技(深圳)有限公司 Video material processing method, video synthesizing device and storage medium
CN108289180B (en) * 2018-01-30 2020-08-21 广州市百果园信息技术有限公司 Method, medium, and terminal device for processing video according to body movement
CN108712661B (en) * 2018-05-28 2022-02-25 广州虎牙信息科技有限公司 Live video processing method, device, equipment and storage medium
CN109889893A (en) * 2019-04-16 2019-06-14 北京字节跳动网络技术有限公司 Method for processing video frequency, device and equipment


Also Published As

Publication number Publication date
CN111416991A (en) 2020-07-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant