CN114827730B - Video cover selection method, device, equipment and storage medium - Google Patents

Video cover selection method, device, equipment and storage medium

Info

Publication number
CN114827730B
Authority
CN
China
Prior art keywords
action
frame
score
angle
processed
Prior art date
Legal status
Active
Application number
CN202210411941.9A
Other languages
Chinese (zh)
Other versions
CN114827730A
Inventor
柳建龙
尹瑶瑶
付荣
邢刚
陈旻
Current Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202210411941.9A priority Critical patent/CN114827730B/en
Publication of CN114827730A publication Critical patent/CN114827730A/en
Application granted granted Critical
Publication of CN114827730B publication Critical patent/CN114827730B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/45: Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/4508: Management of client data or end-user data
    • H04N 21/4532: Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
    • H04N 21/458: Scheduling content for creating a personalised stream, e.g. by combining a locally stored advertisement with an incoming stream; Updating operations, e.g. for OS modules; time-related management operations
    • H04N 21/47: End-user applications
    • H04N 21/485: End-user interface for client configuration
    • H04N 21/4858: End-user interface for client configuration for modifying screen layout parameters, e.g. fonts, size of the windows

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video cover selection method, device, equipment and storage medium. The method comprises the following steps: acquiring key frames to be processed in a video, and determining the facing angle of a target person in the key frames to be processed, wherein there are multiple key frames to be processed; identifying a first gesture action of the target person, and acquiring a preset picture matched with the first gesture action; correcting each action angle of the second gesture action of a reference character in the preset picture according to the facing angle, and determining an action score of the first gesture action according to the facing angle and each corrected action angle; and determining the position score of the target person in the key frames to be processed, and selecting the video cover of the video from the key frames to be processed according to the action score and the position score. The invention makes the selected video cover more accurate and better suited to user requirements.

Description

Video cover selection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a method, an apparatus, a device, and a storage medium for selecting a video cover.
Background
Currently, when a video cover is determined, video cover selection is performed based on user attribute features: because users' attribute features differ, the video cover generation templates generated for them differ as well. For example, the user is an individual user a whose attribute feature is a historical shopping habit of liking to buy snacks; the video cover generation template for user a may then include the screening condition "food", and a video cover containing various foods is generated based on that template. For another example, the user is company b, whose attribute feature is the type of company b; the video cover generation template for company b may then include the screening conditions "star character" and "scene picture", and a video cover containing a certain star character and a certain scene picture is generated based on that template. However, because the angles and positions of the photographed characters differ, the optimal character position cannot be selected, and when pictures are scored, angle differences prevent them from being scored accurately, so the accuracy of the selected video cover is low.
Disclosure of Invention
The embodiment of the invention provides a video cover selection method, device, equipment and storage medium, aiming to solve the technical problems that the optimal character position cannot be selected because the angles and positions of the photographed characters differ, and that the selected video cover has low accuracy because angle differences prevent the pictures from being scored accurately.
The embodiment of the invention provides a video cover selection method, which comprises the following steps:
acquiring a key frame to be processed in a video, and determining the facing angle of a target person in the key frame to be processed, wherein the key frame to be processed has multiple frames;
identifying a first gesture action of the target person, and acquiring a preset picture matched with the first gesture action;
Correcting each action angle of the second gesture action of the reference character in the preset picture according to the facing angle, and determining an action score of the first gesture action according to the facing angle and each corrected action angle;
and determining the position score of the target person in the key frames to be processed, and selecting the video cover of the video from each key frame to be processed according to the action score and the position score.
In an embodiment, the step of correcting each action angle of the second gesture action of the reference person in the preset picture according to the facing angle includes:
extracting a first action frame point line graph of the second gesture action, wherein the first action frame point line graph is formed by connecting all the joint points of the reference character;
Rotating the first action frame point line graph according to a preset direction, a central axis of the first action frame point line graph and the facing angle to obtain a second action frame point line graph, wherein the central axis is a connecting line of a midpoint between a neck joint and a hip joint;
acquiring an included angle between each joint point and the central axis in the second action frame point line graph;
And taking the included angle between each joint point and the central axis as each corrected action angle.
In an embodiment, the step of determining the action score of the first gesture action according to the facing angle and the corrected action angles includes:
acquiring a maximum facing angle of the facing angles of the target person in each key frame to be processed, a maximum action angle of each corrected action angle and a preset weight value corresponding to each corrected action angle;
determining an angle score of each corrected action angle according to the facing angle, the maximum action angle, each corrected action angle and the preset weight value;
determining a comparison score of a first gesture action of the target person in each key frame to be processed and the corrected second gesture action according to each angle score;
and determining the action score of the first gesture action according to each comparison score.
In one embodiment, the step of determining the location score of the target person in the key frame to be processed comprises:
determining intersection points of the picture golden section points in the video, wherein there are at least two intersection points;
determining golden section point scores of the region where the target person is located in the key frame to be processed relative to the intersection points;
and determining the position score according to golden section point scores of the region where the target person is located relative to the intersection points.
In one embodiment, the step of determining the golden section point score of the region of the key frame to be processed where the target person is located relative to each of the intersection points includes:
constructing a golden section point frame set, wherein the golden section point frame set comprises first frame numbers of multi-frame golden section point frames and frame number intervals associated with the first frame numbers;
When the second frame number of the key frame to be processed is in the frame number interval, determining a plurality of picture scores of the key frame to be processed according to the first frame number, each second frame number and the maximum value and the minimum value of each frame number interval;
and determining the golden section point score according to a plurality of the picture scores.
In one embodiment, the step of constructing the golden section point frame collection comprises:
Constructing a two-dimensional coordinate system in the video, selecting at least two of a positive direction of a transverse axis, a negative direction of the transverse axis, a positive direction of a longitudinal axis and a negative direction of the longitudinal axis of the two-dimensional coordinate system as target directions, and marking an intersection point of a golden section point of the picture in the two-dimensional coordinate system according to the target directions;
acquiring a starting frame and an ending frame from each key frame to be processed, and constructing a connecting line segment between the target person contained in the starting frame and the target person contained in the ending frame in the two-dimensional coordinate system;
Determining the tangential points of the intersecting points and the connecting line segments by taking the intersecting points as circle centers, and selecting video frames at the tangential points as golden section point frames;
determining a first frame number of each golden section point frame according to the length of each connecting line segment, the distance between each tangent point and the target person contained in the initial frame, the frame number of the key frame to be processed and the third frame number of the initial frame;
Based on each connecting line segment, constructing a frame number interval of a third frame number of the starting frame and a fourth frame number of the ending frame, and associating each frame number interval with the corresponding first frame number to obtain a plurality of association relations;
And generating the golden section point frame set according to the association relations.
In one embodiment, the step of selecting the video cover of the video from the key frames to be processed according to the action score and the position score includes:
Drawing n line segments corresponding to n dimension scores, so that one ends of the n line segments intersect at a central point; wherein the n dimension scores comprise the action score and the position score, and n is more than or equal to 2;
according to the n dimension scores of the target person in the key frame to be processed, correspondingly increasing each line segment along the direction away from the central point;
sequentially connecting one end, far away from the center point, of each line segment after the increase to obtain scoring polygons corresponding to each key frame to be processed;
And selecting a video cover of the video from the key frames to be processed according to the areas of the scoring polygons.
In addition, in order to achieve the above object, the present invention also provides a video cover selecting device, including:
The information acquisition module is used for acquiring a key frame to be processed in the video and determining the facing angle of a target person in the key frame to be processed, wherein the key frame to be processed is provided with a plurality of frames;
The action recognition module is used for recognizing a first gesture action of the target person and acquiring a preset picture matched with the first gesture action;
The score calculation module is used for correcting each action angle of the second gesture action of the reference character in the preset picture according to the facing angle, and determining the action score of the first gesture action according to the facing angle and each corrected action angle;
The cover selection module is used for determining the position score of the target person in the key frames to be processed and selecting the video cover of the video from the key frames to be processed according to the action score and the position score.
In addition, to achieve the above object, the present invention also provides a terminal device, including: the system comprises a memory, a processor and a video cover selection program which is stored in the memory and can run on the processor, wherein the video cover selection program realizes the steps of the video cover selection method when being executed by the processor.
In addition, in order to achieve the above object, the present invention also provides a storage medium having stored thereon a video cover selection program which, when executed by a processor, implements the steps of the video cover selection method described above.
The technical scheme of the video cover selection method, the device, the equipment and the storage medium provided by the embodiment of the invention has at least the following technical effects or advantages:
According to the technical scheme, multiple key frames to be processed in a video are obtained; each action angle of the second gesture action of the reference character in the preset picture matched with the first gesture action of the target person is corrected according to the facing angle of the target person in the key frames to be processed; the action score of the first gesture action is determined according to the facing angle and the corrected action angles; the position score of the target person in the key frames to be processed is determined; and the video cover of the video is selected from the key frames to be processed according to the action score and position score corresponding to each key frame to be processed. By introducing multiple scoring dimensions (at least the action score and the position score), that is, scoring the action of the target person using the facing angle of the target person in the key frame to be processed and scoring the position of the target person within the key frame to be processed, the method comprehensively scores the key frames to be processed in the video and then selects the video cover from them according to the comprehensive scores, which improves the accuracy and quality of video cover selection and makes the selected video cover more accurate and better suited to user requirements.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an embodiment of a method for selecting a video cover according to the present invention;
FIG. 3 is a schematic diagram of a calculation process of action scores in the video cover selection method of the present invention;
FIG. 4 is a schematic diagram of a key frame to be processed according to the present invention;
FIG. 5 is a schematic diagram of a preset picture according to the present invention;
FIG. 6 is a schematic diagram of an action frame point line graph in a preset picture according to the present invention;
FIG. 7 is another schematic diagram of an action frame point line graph in a preset picture according to the present invention;
FIG. 8 is a schematic diagram of a position score calculation process in the video cover selection method according to the present invention;
FIG. 9 is a schematic diagram of a two-dimensional coordinate system of the present invention;
FIG. 10 is another schematic diagram of a two-dimensional coordinate system of the present invention;
FIG. 11 is a schematic diagram of a construction flow of a scoring polygon according to the present invention;
FIG. 12 is a schematic view of a scoring polygon according to the present invention;
FIG. 13 is a functional block diagram of a video cover selection device according to the present invention.
Detailed Description
In order that the above-described aspects may be better understood, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a hardware running environment according to an embodiment of the present invention.
It should be noted that fig. 1 may be a schematic structural diagram of a hardware operating environment of a terminal device.
As an implementation manner, as shown in fig. 1, an embodiment of the present invention may relate to a terminal device, where the terminal device includes: a processor 1001, such as a CPU, memory 1002, a communications bus 1003. Wherein the communication bus 1003 is used to enable connectivity communications between these components.
The memory 1002 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. As shown in fig. 1, a video cover selection program may be included in the memory 1002 as a storage medium; and the processor 1001 may be configured to call the video cover selection program stored in the memory 1002 and perform at least the following operations:
acquiring a key frame to be processed in a video, and determining the facing angle of a target person in the key frame to be processed, wherein the key frame to be processed has multiple frames;
identifying a first gesture action of the target person, and acquiring a preset picture matched with the first gesture action;
Correcting each action angle of the second gesture action of the reference character in the preset picture according to the facing angle, and determining an action score of the first gesture action according to the facing angle and each corrected action angle;
and determining the position score of the target person in the key frames to be processed, and selecting the video cover of the video from each key frame to be processed according to the action score and the position score.
An embodiment of the present invention provides a video cover selection method. It should be noted that although a logical sequence is shown in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that shown here.
As shown in fig. 2, in an embodiment of the invention, the video cover selection method of the invention includes the following steps:
Step S210: and acquiring a key frame to be processed in the video, and determining the facing angle of the target person in the key frame to be processed.
In this embodiment, the video is understood to be a video that requires a video cover, such as a sports video, music video, animation video or recorded video; the present invention is described by taking a sports video as an example. Specifically, before setting a video cover for a video, many highlight pictures of specified themes need to be collected and scored, and the different themes are then associated with the corresponding highlight pictures and scores. For example, if the theme is football shooting, the highlight picture is a match picture of a football player shooting, and the score of the match picture is 8 points, then the match picture and its score are associated with the theme of football shooting and stored in a preprocessing library (i.e., a database). Such a highlight picture is called a preset picture, and such a theme is called a preset theme. When a cover needs to be set for a video, a preset theme matching the video theme is searched in the preprocessing library, the preset picture with the highest corresponding score is obtained according to the found preset theme, and the video cover is then decided for the video according to the obtained preset picture.
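To make the data flow concrete, the following minimal Python sketch shows one way such a preprocessing library could associate preset themes with preset pictures and their scores and return the highest-scoring picture for a matching theme; the class and function names (PresetPicture, PreprocessingLibrary, best_picture_for) are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PresetPicture:
    theme: str        # preset theme, e.g. "football shooting"
    image_path: str   # path to the highlight picture
    score: float      # manually assigned score, e.g. 8


@dataclass
class PreprocessingLibrary:
    pictures: List[PresetPicture] = field(default_factory=list)

    def add(self, picture: PresetPicture) -> None:
        # associate a preset theme with a highlight picture and its score
        self.pictures.append(picture)

    def best_picture_for(self, video_theme: str) -> Optional[PresetPicture]:
        # return the highest-scoring preset picture whose theme matches the video theme
        candidates = [p for p in self.pictures if p.theme == video_theme]
        return max(candidates, key=lambda p: p.score, default=None)


# Example usage:
# library = PreprocessingLibrary()
# library.add(PresetPicture("football shooting", "shooting_match.jpg", 8.0))
# best = library.best_picture_for("football shooting")
```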
The key frames to be processed are provided in multiple frames and are the candidate pictures for setting the video cover of the video. Typically, the acquired video includes a number of key frames (I-frames), which may be all or partially consecutive. A set P is generated from these key frames, image recognition is performed on all the key frames in set P according to the video theme, the key frames containing the video theme are selected from set P, and a set C is generated from the selected key frames containing the video theme; for example, the key frames in set C contain football players. The key frames in set C then need to be screened to obtain the key frames to be processed.
The screening process is as follows: quality detection is performed on all key frames in set C, key frames of unqualified quality are filtered out, and the key frames that meet the quality requirements are put into set B; the key frames included in set B are the key frames to be processed, i.e., the key frames used for selecting the video cover. If a key frame in set C satisfies any of the quality anomaly detection rules, it does not meet the quality requirements; if it satisfies none of them, it meets the quality requirements. The quality anomaly detection rules are as follows:
Element E occupies less than 20% or more than 50% of the whole frame;
The whole frame is too dark or too bright;
The whole frame is blurred;
Element E is located too far to the side of the frame;
Element E appears reversed in the frame.
The element E is a target object to be identified, and the target object may be a person, an object, or the like.
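As an illustration only, the quality anomaly rules listed above might be encoded as in the sketch below; the 20%/50% area bounds come from the text, while the brightness, blur and "too far to the side" thresholds are assumptions.

```python
def is_quality_anomalous(frame_area: float, element_area: float,
                         mean_brightness: float, blur_score: float,
                         element_center_x: float, frame_width: float,
                         element_reversed: bool) -> bool:
    """Return True if a key frame violates any quality anomaly rule.

    Only the 20%/50% area bounds come from the text; the brightness, blur and
    'too far to the side' thresholds below are illustrative assumptions.
    """
    ratio = element_area / frame_area
    if ratio < 0.20 or ratio > 0.50:                    # element E too small or too large
        return True
    if mean_brightness < 40 or mean_brightness > 220:   # whole frame too dark or too bright (0-255 scale assumed)
        return True
    if blur_score < 100.0:                              # whole frame blurred (e.g. variance of Laplacian, threshold assumed)
        return True
    rel_x = element_center_x / frame_width
    if rel_x < 0.1 or rel_x > 0.9:                      # element E located too far to the side (margin assumed)
        return True
    if element_reversed:                                # element E shown reversed
        return True
    return False
```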
After the key frame to be processed in the video is obtained, identifying the target person in the key frame to be processed, and calculating the facing angle of the target person in the key frame to be processed.
Step S220: and identifying a first gesture of the target person, and acquiring a preset picture matched with the first gesture.
After the key frames to be processed in the video are obtained, the gesture action of the target person in the key frames to be processed is identified; to distinguish it from the gesture action in the preset picture, this embodiment calls it the first gesture action. Assuming that one of the key frames to be processed is fig. 4, the recognition result of the first gesture action is the action in fig. 4: a shouting action of a football player with both hands open, both legs apart and leaning back slightly. The first gesture action can be outlined by an action frame point line graph, which can be understood as a stick figure, i.e., the lines connecting all the joint points of a human body.
A preset picture matching the target person and the recognition result is then searched from the preprocessing library according to the recognition result of the first gesture action. For example, fig. 4 is a key frame to be processed showing a football player, i.e., the subject is a football player, and the first gesture action is a shouting action with both hands open, both legs apart and leaning back slightly; the preset picture found in the preprocessing library then also needs to show a football player, and the second gesture action of the football player in the preset picture is the same as the first gesture action, although the facing angles of the reference character and the target character in the pictures may differ, i.e., their action angles may differ. The found preset pictures are the preset pictures in the preprocessing library that match the first gesture action and the football player, and there are multiple such preset pictures. For example, one found preset picture is shown in fig. 5.
Step S230: correcting each action angle of the second gesture action of the reference character in the preset picture according to the facing angle, and determining the action score of the first gesture action according to the facing angle and each corrected action angle.
After the preset picture matched with the first gesture action is obtained, the preset picture contains a reference character; the reference character and the target character in the key frame to be processed are of the same kind, i.e., the subjects are the same, and the second gesture action and the first gesture action differ only in angle. An action angle is the included angle between a joint point of the character and the central axis, where the central axis is the connecting line of the midpoint between the neck joint and the hip joint.
For selecting the video cover, the invention introduces two scoring dimensions, namely an action score of the gesture action of the target person and a position score of the target person in the key frame to be processed.
The action score is determined based on the preset picture. Specifically, each action angle of the second gesture action of the reference character in the preset picture is corrected according to the facing angle of the target person in the key frame to be processed, so that the facing angle of the reference character in the preset picture is as close as possible to the facing angle of the target person. Correcting each action angle of the second gesture action yields a corrected second gesture action with corresponding corrected action angles, and the action score of the first gesture action of the target person is then calculated from the facing angle and the corrected action angles. Since there are multiple key frames to be processed, the action score of the first gesture action of the target person in each key frame to be processed needs to be calculated in this way.
Step S240: and determining the position score of the target person in the key frames to be processed, and selecting the video cover of the video from each key frame to be processed according to the action score and the position score.
In this embodiment, after the action score of the first gesture action of the target person in the key frame to be processed in each frame is calculated, the position score of the target person in the corresponding key frame to be processed is calculated, where the position score may be understood as the score of the position of the target person in the key frame to be processed in the whole key frame to be processed.
On the one hand, after action scores and position scores corresponding to the key frames to be processed of each frame are obtained respectively, scoring polygons are constructed according to the action scores and the position scores. The action score and the position score can be used as the lengths of two line segments, namely a first line segment and a second line segment, the first line segment and the second line segment intersect at one point, and one ends of the first line segment and the second line segment far away from the intersection point are connected, so that a scoring polygon is formed, and the scoring polygon is a triangle, in particular a scoring triangle. Wherein the scoring triangle may be one of a right triangle, an obtuse triangle, and an acute triangle.
After the scoring triangles corresponding to each frame of the key frame to be processed are built, calculating the area of each scoring triangle, and then representing the comprehensive score of each frame of the key frame to be processed through the area of each scoring triangle, namely area = comprehensive score. The selection rule of the video cover is to take the maximum value in each comprehensive score, namely, the key frame to be processed of the maximum comprehensive score in each comprehensive score is selected as the video cover of the video. For example, there are 3 key frames to be processed, S11, S12, S13, respectively, and the corresponding composite scores are Z11, Z12, Z13, Z11< Z12< Z13, then S13 is selected as the video cover of the video.
On the other hand, after the action score and the position score corresponding to each frame of the key frame to be processed are obtained respectively, weighting calculation is performed on the action score and the position score corresponding to each frame of the key frame to be processed to obtain a score weighted sum, and then the comprehensive score of each frame of the key frame to be processed is represented by the score weighted sum corresponding to each frame of the key frame to be processed, namely the score weighted sum=the comprehensive score. The selection rule of the video cover is to take the maximum value in each comprehensive score, namely, the key frame to be processed of the maximum comprehensive score in each comprehensive score is selected as the video cover of the video. For example, there are 3 key frames to be processed, S21, S22, S23, respectively, and the corresponding composite scores are Z21, Z22, Z23, Z21< Z22< Z23, then S23 is selected as the video cover of the video.
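Both aggregation strategies described above, the scoring-triangle area and the weighted sum, can be sketched as follows; this is a non-authoritative illustration in which the included angle and the weights are placeholder assumptions.

```python
import math


def composite_score_by_area(action_score: float, position_score: float,
                            included_angle_deg: float = 90.0) -> float:
    # Area of the scoring triangle whose two sides have lengths equal to the
    # action score and the position score; the included angle is a preset
    # value (a right angle is assumed here purely for illustration).
    return 0.5 * action_score * position_score * math.sin(math.radians(included_angle_deg))


def composite_score_by_weighted_sum(action_score: float, position_score: float,
                                    w_action: float = 0.5, w_position: float = 0.5) -> float:
    # Weighted sum of the two dimension scores; the weights are placeholders.
    return w_action * action_score + w_position * position_score


def select_cover(frame_scores):
    # frame_scores: list of (key_frame_id, composite_score); the key frame with
    # the maximum composite score is selected as the video cover.
    return max(frame_scores, key=lambda item: item[1])[0]
```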
According to the technical scheme of this embodiment, multiple scoring dimensions (the action score and the position score) are introduced: the action of the target person is scored using the facing angle of the target person in the key frame to be processed, and the position of the target person in the key frame to be processed is scored; the key frames to be processed in the video are then comprehensively scored based on the action score and the position score, and the video cover is selected from the key frames to be processed according to the comprehensive score. This improves the accuracy and quality of video cover selection, so the selected video cover is more accurate and better meets user requirements.
As shown in fig. 3, according to the first embodiment, the correcting, according to the facing angle, each action angle of the second gesture action of the reference person in the preset picture includes the following steps:
step S231: extracting a first action frame point line graph of the second gesture action, wherein the first action frame point line graph is formed by connecting all the joint points of the reference character;
Step S232: rotating the first action frame point line graph according to a preset direction, a central axis of the first action frame point line graph and the facing angle to obtain a second action frame point line graph, wherein the central axis is a connecting line of a midpoint between a neck joint and a hip joint;
Step S233: acquiring an included angle between each joint point and the central axis in the second action frame point line graph;
Step S234: and taking the included angle between each joint point and the central axis as each corrected action angle.
As shown in fig. 4 and 5, fig. 4 is a key frame to be processed; the football player in fig. 4 is the target character, and the first gesture action is the action frame point line graph in which all the joint points in fig. 4 are connected, i.e., a stick figure. Fig. 5 is a preset picture matched with the first gesture action; the football player in fig. 5 is the reference character, and the second gesture action is the action frame point line graph in which all the joint points in fig. 5 are connected, i.e., a stick figure. The action frame point line graph in the preset picture is preset, whereas the action frame point line graph in the key frame to be processed needs to be identified and constructed. The action frame point line graph corresponding to the second gesture action is called the first action frame point line graph; the standalone first action frame point line graph is shown as the left graph in fig. 6 and the left graph in fig. 7.
After the first action frame point line diagram of the second gesture action is extracted, the first action frame point line diagram is rotated according to a preset direction, a central axis and a facing angle of the first action frame point line diagram, namely, the central axis is taken as a rotation axis, the first action frame point line diagram is rotated by the facing angle along the preset direction, the rotated first action frame point line diagram, namely, a second action frame point line diagram, is obtained, the facing angle of the second action frame point line diagram is close to the facing angle of the action frame point line diagram in the key frame to be processed, and the first action frame point line diagram and the second action frame point line diagram are similar. Wherein, the rotation process is shown in fig. 6 and 7, the right graph of fig. 6 shows a second action frame dot line graph, the right graph of fig. 7 shows a top view, the preset direction in fig. 6 and 7 is anticlockwise, and the facing angle is 45 degrees; the central axis is the line of the midpoint between the neck and hip joints, which is the dashed line in the left-hand diagram of fig. 6 and the left-hand diagram of fig. 7.
After the second action frame point line diagram is obtained, the included angle between each joint point in the second action frame point line diagram and the central axis, namely the included angle between the connecting line between every two adjacent joint points and the central axis, is obtained, and after the included angle between each joint point in the second action frame point line diagram and the central axis is obtained, the corrected action angles are obtained, namely the included angle between each joint point in the second action frame point line diagram and the central axis is equal to the corrected action angles. As shown in the right graph of fig. 6, the included angle between the connecting line of the two dots and the dotted line is an action angle.
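The rotation and angle-extraction steps above can be sketched as follows, assuming the joint points are given as 2D coordinates and the central axis is approximately vertical, so that rotating about it by the facing angle θ scales horizontal offsets from the axis by cosθ (consistent with the B″S1 = BS1·cosθ relation derived below); all names here are illustrative, not from the patent.

```python
import math


def rotate_about_central_axis(joints, neck, hip, facing_angle_deg):
    """Rotate a 2D action frame point line graph about its central axis.

    joints: dict mapping joint name -> (x, y); neck, hip: keys of the neck and
    hip joints.  An approximately vertical central axis through the neck-hip
    midpoint is assumed, so rotating by the facing angle scales horizontal
    offsets from the axis by cos(theta) and leaves vertical coordinates unchanged.
    """
    theta = math.radians(facing_angle_deg)
    axis_x = (joints[neck][0] + joints[hip][0]) / 2.0
    return {name: (axis_x + (x - axis_x) * math.cos(theta), y)
            for name, (x, y) in joints.items()}


def corrected_action_angles(rotated_joints, limb_pairs):
    """Included angle (degrees) between each limb segment and the (assumed
    vertical) central axis in the rotated point line graph."""
    angles = {}
    for a, b in limb_pairs:                     # pairs of adjacent joint points
        (x1, y1), (x2, y2) = rotated_joints[a], rotated_joints[b]
        vx, vy = x2 - x1, y2 - y1
        norm = math.hypot(vx, vy) or 1.0
        cos_angle = vy / norm                   # projection onto the vertical axis direction
        angles[(a, b)] = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angles
```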
Further, according to the first embodiment, the step of determining the action score of the first gesture action according to the facing angle and the corrected action angles includes:
acquiring a maximum facing angle of the facing angles of the target person in each key frame to be processed, a maximum action angle of each corrected action angle and a preset weight value corresponding to each corrected action angle;
determining an angle score of each corrected action angle according to the facing angle, the maximum action angle, each corrected action angle and the preset weight value;
determining a comparison score of a first gesture action of the target person in each key frame to be processed and the corrected second gesture action according to each angle score;
and determining the action score of the first gesture action according to each comparison score.
Referring to fig. 6 and 7, the following description takes one corrected action angle as an example, namely the rotated thigh-to-hip joint angle, denoted D′; with the facing angle denoted θ, D′ is calculated as follows:
D′ = 360° - (∠AB″S1 + ∠S1B″C″);
∠AB″S1 = arctan(AS1 / B″S1);
∠S1B″C″ = 90° + arctan(S1S2 / (C″S2 - B″S1));
According to the right graph of fig. 7, B″S1 = AB′·cosθ = AB·cosθ = BS1·cosθ, since AB′ = AB = BS1;
C″S2 = AC′·cosθ = AC·cosθ = CS2·cosθ, since AC′ = AC = CS2 according to the right graph of fig. 7;
hence ∠AB″S1 = arctan(AS1 / (BS1·cosθ)).
Substituting B″S1 and C″S2 into the formula:
∠S1B″C″ = 90° + arctan(S1S2 / (CS2·cosθ - BS1·cosθ));
D′ = 360° - (∠AB″S1 + ∠S1B″C″) = 360° - (arctan(AS1 / (BS1·cosθ)) + 90° + arctan(S1S2 / (CS2·cosθ - BS1·cosθ))).
Here AS1, BS1, S1S2 and CS2 are known parameters of the preset picture, and the angle θ is the facing angle obtained from the key frame to be processed, so no additional calculation is needed for these quantities.
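A direct transcription of the derivation above might look like the sketch below (AS1, BS1, S1S2 and CS2 are the known measurements from the preset picture, θ is the facing angle); it is an illustrative sketch, not the patent's implementation.

```python
import math


def corrected_hip_angle(AS1: float, BS1: float, S1S2: float, CS2: float,
                        theta_deg: float) -> float:
    """D' = 360° - (∠AB″S1 + ∠S1B″C″), with the cos(theta) foreshortening
    applied to the horizontal distances as derived above.  Assumes
    CS2*cos(theta) != BS1*cos(theta)."""
    cos_t = math.cos(math.radians(theta_deg))
    angle_ab_s1 = math.degrees(math.atan(AS1 / (BS1 * cos_t)))
    angle_s1b_c = 90.0 + math.degrees(math.atan(S1S2 / (CS2 * cos_t - BS1 * cos_t)))
    return 360.0 - (angle_ab_s1 + angle_s1b_c)
```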
Each corrected action angle is calculated according to the calculation method of D′. An angle score An is then calculated for each corrected action angle according to the following formula:
An = (1 - |Dn - D′n| / MAX(Dn, D′n)) · Qn;
where Dn represents the facing angle corresponding to the n-th key frame to be processed, D′n represents the corrected n-th action angle in the preset picture, Qn represents the preset weight value corresponding to the corrected n-th action angle, and MAX(Dn, D′n) represents the maximum of the facing angle in the n-th key frame to be processed and the corrected action angle in the preset picture.
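In code, the angle score reads as follows (a minimal sketch; the argument names mirror Dn, D′n and Qn above):

```python
def angle_score(D_n: float, D_prime_n: float, Q_n: float) -> float:
    # An = (1 - |Dn - D'n| / MAX(Dn, D'n)) * Qn
    return (1.0 - abs(D_n - D_prime_n) / max(D_n, D_prime_n)) * Q_n
```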
Based on each key frame to be processed, after the corrected action angles in the preset picture are obtained, the comparison score between the first gesture action in each key frame to be processed and the corrected second gesture action in each preset picture is calculated from the angle scores.
After the comparison scores between the first gesture action in each key frame to be processed and the corrected second gesture actions in the preset pictures are obtained (one key frame to be processed corresponds to multiple preset pictures, so multiple comparison scores are obtained), the maximum comparison score is determined to be the action score of the first gesture action corresponding to that key frame to be processed. The action scores of the first gesture actions corresponding to all the key frames to be processed can be calculated in this way.
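The aggregation into an action score can be sketched as below; note that the patent's exact comparison-score formula is not reproduced in the extracted text, so a simple sum of the angle scores is assumed here purely for illustration.

```python
def comparison_score(angle_scores) -> float:
    # Aggregate the per-angle scores An into one comparison score.  The
    # patent's exact formula is not reproduced in the extracted text; a simple
    # sum is assumed here for illustration only.
    return sum(angle_scores)


def action_score(angle_score_sets) -> float:
    # angle_score_sets: one list of angle scores per matched preset picture
    # for a single key frame to be processed.  The action score is the maximum
    # comparison score over all matched preset pictures.
    return max(comparison_score(scores) for scores in angle_score_sets)
```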
As shown in fig. 8, based on the first embodiment, determining the location score of the target person in the key frame to be processed includes the steps of:
step S241: and determining the intersection point of the golden section points of the pictures in the video.
Each video has several picture golden section points, namely 2 in the length direction and 2 in the width direction of the video, i.e., one on the left and one on the right in the length direction, and one at the top and one at the bottom in the width direction. An intersection point is where a picture golden section point in the length direction crosses a picture golden section point in the width direction; there are at least two such intersection points, specifically 4. The golden section points represent display positions at which the things shown in the video are proportionally coordinated, so that the video content is more visually pleasing.
Step S242: and determining golden section point scores of the region where the target person is located in the key frame to be processed relative to each intersection point.
Step S243: and determining the position score according to golden section point scores of the region where the target person is located relative to the intersection points.
After the intersection points of the picture golden section points are determined, the golden section point score of the region where the target person is located in the key frame to be processed relative to each intersection point is calculated based on those intersection points. Because the picture golden section points represent display positions at which the things shown in the video are well proportioned and thus provide better visual enjoyment, they indicate whether the position of the target person in the key frame to be processed is well coordinated, so the golden section point scores are used to describe the position score. Since there are multiple intersection points, multiple golden section point scores are obtained, and the highest-ranked golden section point score determines the position score; for example, the position score is determined from the maximum golden section point score, i.e., maximum golden section point score = position score.
Specifically, step S242 includes the steps of:
constructing a golden section point frame set, wherein the golden section point frame set comprises first frame numbers of multi-frame golden section point frames and frame number intervals associated with the first frame numbers;
When the second frame number of the key frame to be processed is in the frame number interval, determining a plurality of picture scores of the key frame to be processed according to the first frame number, each second frame number and the maximum value and the minimum value of each frame number interval;
and determining the golden section point score according to a plurality of the picture scores.
The golden section point frame set comprises the first frame numbers of multiple golden section point frames and the frame number intervals associated with the first frame numbers. A golden section point frame is a video frame near an intersection point of the picture golden section points; it may be an ordinary frame of the video, a key frame, etc. The first frame number is the frame number of a golden section point frame, and a frame number interval is bounded by a start frame number and an end frame number among all the key frames to be processed; for example, if the first frame number is C, the start frame number is A and the end frame number is B, the frame number interval W is [A, B], i.e., C is associated with W.
The frame number of a key frame to be processed is recorded as the second frame number, denoted F; the first frame number is denoted Fg, the start frame number Fs and the end frame number Fe, i.e., the maximum value of the frame number interval is Fe and the minimum value is Fs. After each F is obtained, it is judged whether F lies in a frame number interval; if so, the several picture scores corresponding to the key frame to be processed are calculated from each F, each Fg, and the Fs and Fe corresponding to each Fg, and the maximum picture score is then determined as the golden section point score of the region where the target person in the key frame to be processed is located relative to the intersection points. The picture score is denoted T.
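A sketch of the interval lookup and aggregation follows; since the patent's exact formula for the picture score T is not reproduced in the extracted text, a placeholder proximity-based score (closeness of F to Fg, normalised by the interval bounds) is assumed in its place.

```python
def picture_score(F: int, Fg: int, Fs: int, Fe: int) -> float:
    # Placeholder for the patent's picture score T: the exact formula is not
    # reproduced in the extracted text, so a simple proximity of the key-frame
    # number F to the golden section frame number Fg, normalised by the
    # interval bounds, is assumed here.
    return 1.0 - abs(F - Fg) / max(Fg - Fs, Fe - Fg, 1)


def golden_section_point_score(F: int, frame_set) -> float:
    # frame_set: iterable of (Fs, Fe, Fg) associations from the golden section
    # point frame set.  Only intervals containing F contribute, and the
    # maximum picture score is taken as the golden section point score.
    scores = [picture_score(F, Fg, Fs, Fe)
              for (Fs, Fe, Fg) in frame_set if Fs <= F <= Fe]
    return max(scores, default=0.0)
```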
Further, the process of constructing the golden section point frame set includes the following steps:
Constructing a two-dimensional coordinate system in the video, selecting at least two of a positive direction of a transverse axis, a negative direction of the transverse axis, a positive direction of a longitudinal axis and a negative direction of the longitudinal axis of the two-dimensional coordinate system as target directions, and marking an intersection point of a golden section point of the picture in the two-dimensional coordinate system according to the target directions;
acquiring a starting frame and an ending frame from each key frame to be processed, and constructing a connecting line segment between the target person contained in the starting frame and the target person contained in the ending frame in the two-dimensional coordinate system;
Determining the tangential points of the intersecting points and the connecting line segments by taking the intersecting points as circle centers, and selecting video frames at the tangential points as golden section point frames;
determining a first frame number of each golden section point frame according to the length of each connecting line segment, the distance between each tangent point and the target person contained in the initial frame, the frame number of the key frame to be processed and the third frame number of the initial frame;
Based on each connecting line segment, constructing a frame number interval of a third frame number of the starting frame and a fourth frame number of the ending frame, and associating each frame number interval with the corresponding first frame number to obtain a plurality of association relations;
And generating the golden section point frame set according to the association relations.
As shown in fig. 9 and 10, since there are multiple picture golden section points, there are also multiple ways of determining the intersection points of the picture golden section points. A two-dimensional coordinate system is constructed in the video with origin (0, 0); it can be set according to actual requirements, and in fig. 9 and 10 it is set by taking the top right corner vertex of the video as the origin (0, 0). After the two-dimensional coordinate system is established, the X axis is the length direction of the video and the Y axis is the width direction of the video, so the positive and negative directions of the horizontal axis and the positive and negative directions of the vertical axis can be determined. Fig. 9 and 10 show the intersection point of the picture golden section points determined by the positive direction of the horizontal axis and the positive direction of the vertical axis: point (X2, 0) is a picture golden section point on the X axis, and point (0, Y2) is a picture golden section point on the Y axis; point C (X2, Y2) is the intersection point of these picture golden section points, and the point C determined by the positive horizontal and positive vertical directions is near the lower left corner vertex of the video. Similarly, when the intersection point C is determined by the negative horizontal and negative vertical directions, it is near the top right corner vertex of the video; when determined by the positive horizontal and negative vertical directions, it is near the top left corner vertex; and when determined by the negative horizontal and positive vertical directions, it is near the lower right corner vertex.
After each intersection point of the golden section points of the picture is determined, a starting frame and an ending frame are acquired from a plurality of key frames to be processed, the starting frame corresponds to the picture in which the point A in fig. 9 and 10 is located, the ending frame corresponds to the picture in which the point B in fig. 9 and 10 is located, then a target person contained in the starting frame is connected with a target person contained in the ending frame, a connecting line segment is obtained, the connecting line segment is AB, and the point A and the point B are respectively a certain part of the target person in the starting frame and the ending frame.
Then, a circle tangent to AB is drawn with an intersection point (for example, C) as its centre; the tangent point of the circle and AB is denoted D, and after the tangent points are determined, the video frames at the tangent points are selected as golden section point frames. Next, the length of AB and the distance DA between each D and the target person contained in the start frame are calculated, and the first frame number P of each golden section point frame is calculated from the length of each AB, each distance DA, the number N of all key frames to be processed and the third frame number (the frame number of the start frame). The calculation formula is: P = N·(DA/AB) + the frame number of the start frame. Since there are multiple intersection points, multiple golden section point frames are obtained; among the intersection points, the closest intersection point can be selected as the target intersection point, i.e., the video frame at the tangent point of the target intersection point is selected as the golden section point frame.
Further, based on each connecting line segment, the frame number interval from the third frame number of the start frame to the fourth frame number of the corresponding end frame (the frame number of the end frame) is constructed, i.e., the maximum value of the frame number interval is the fourth frame number and the minimum value is the third frame number; each frame number interval is then associated with the corresponding first frame number to obtain a plurality of association relations. With the third frame number Fs, the fourth frame number Fe and the first frame number Fg, an association is (Fs, Fe, Fg), and the association relations are stored together to generate the golden section point frame set. For example, a total of 4 associations are obtained: association 1 = (Fs1, Fe1, Fg1), association 2 = (Fs2, Fe2, Fg2), association 3 = (Fs3, Fe3, Fg3) and association 4 = (Fs4, Fe4, Fg4), i.e., associations 1-4 are included in the golden section point frame set.
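The construction above can be sketched as follows (all names are illustrative); the tangent point D is computed as the foot of the perpendicular from the intersection point C onto the segment AB, and P = N·(DA/AB) + the start frame number, as in the formula above.

```python
import math


def tangent_point(A, B, C):
    # Foot of the perpendicular from intersection point C onto segment AB,
    # i.e. the tangent point D of the circle centred at C and tangent to AB.
    ax, ay = A
    bx, by = B
    cx, cy = C
    abx, aby = bx - ax, by - ay
    t = ((cx - ax) * abx + (cy - ay) * aby) / (abx * abx + aby * aby)
    t = max(0.0, min(1.0, t))
    return (ax + t * abx, ay + t * aby)


def build_golden_section_frame_set(A, B, intersections, N, Fs, Fe):
    # A, B: positions of the target person in the start and end frames;
    # intersections: the intersection points C of the picture golden section
    # points; N: number of key frames to be processed; Fs, Fe: third and
    # fourth frame numbers (start and end frame numbers).
    AB = math.dist(A, B)
    associations = []
    for C in intersections:
        D = tangent_point(A, B, C)
        DA = math.dist(D, A)
        Fg = round(N * (DA / AB)) + Fs          # P = N*(DA/AB) + start frame number
        associations.append((Fs, Fe, Fg))
    return associations
```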
Further, as shown in fig. 11, according to the first embodiment, the selecting the video cover of the video from the key frames to be processed according to the action score and the position score includes the following steps:
step S244: drawing n line segments corresponding to n dimension scores, so that one ends of the n line segments intersect at a central point;
Step S245: according to the n dimension scores of the target person in the key frame to be processed, correspondingly increasing each line segment along the direction away from the central point;
step S246: sequentially connecting one end, far away from the center point, of each line segment after the increase to obtain scoring polygons corresponding to each key frame to be processed;
Step S247: and selecting a video cover of the video from the key frames to be processed according to the areas of the scoring polygons.
Specifically, the n dimension scores include an action score and a position score, n is equal to or greater than 2, that is, when n=2, the n dimension scores include an action score and a position score; when n >2, the n dimension scores include both the action score and the position score, but may include other scores, such as an expression score, a degree of interest score, and the like of the target person.
A plurality of line segments are drawn such that one end of each intersects at a central point O and each segment has length 1; this can be understood as drawing a unit circle of radius 1 centred on the central point O and then selecting n radii from the unit circle. The n radii correspond to the n line segments and represent the initial values of the action score, the position score and any other scores, i.e., the initial values are all 1. The n radii divide the unit circle into several sectors, and the included angles of the sectors may be the same or different.
It is assumed that when n=2, the n dimension scores include an action score and a position score, after the unit circle is drawn, 2 radii are selected from the unit circle, the included angle of the 2 radii is one of a right angle, an obtuse angle, and an acute angle, the lengths of the 2 radii are initial values of the action score and the position score, and the included angle of the 2 radii may be preset and a known included angle. Then according to the action score and the position score, correspondingly increasing the 2 radiuses outwards along the radial direction of the unit circle, and sequentially connecting one ends of the 2 increased radiuses far away from the center of the unit circle to form a scoring polygon, wherein the scoring polygon is a scoring triangle; wherein, since the key frame to be processed has a plurality of frames, the scoring triangle has a plurality. Then, calculating the area of the scoring triangle through a triangle area formula, and then using the area of the scoring triangle to represent the comprehensive score of the key frame to be processed.
As shown in fig. 12, assume that when n=3 the n dimension scores include the action score, the position score and one other score, here an expression score; after the unit circle is drawn, 3 radii are selected from it. The 3 radii represent the initial values of the action score, the position score and the expression score, respectively, and each is 1, i.e., OA, OB and OC are each 1. The included angles between the three radii are all equal, i.e., ∠AOB = ∠AOC = ∠BOC = 120°. AA′ denotes the action score, BB′ denotes the expression score and CC′ denotes the position score; each radius is increased correspondingly in the direction away from the central point O according to the action score, the position score and the expression score of the target person, giving the final segments OA′, OB′ and OC′. A′, B′ and C′ are connected in turn to form a scoring polygon, which here is a scoring triangle, namely triangle A′B′C′. Since there are multiple key frames to be processed, there are multiple scoring triangles. The area S of each scoring triangle is then calculated with the following formula:
S = 1/2·OA′·OB′·sin120° + 1/2·OA′·OC′·sin120° + 1/2·OC′·OB′·sin120°.
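For instance, with hypothetical extended radii OA′ = 1.8, OB′ = 1.5, and OC′ = 1.2 (values chosen only for illustration), sin120° ≈ 0.866, so S ≈ 0.433 × (1.8×1.5 + 1.8×1.2 + 1.2×1.5) = 0.433 × 6.66 ≈ 2.88; a key frame whose dimension scores extend the radii further yields a larger area and therefore a higher comprehensive score.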
In addition, when n>3, there are 4 or more drawn line segments, that is, one end of each of the 4 or more line segments intersects at the center point O. Accordingly, the scoring polygon drawn from the n dimension scores of each target person has 4 or more sides; for example, the scoring polygon may be a scoring quadrilateral, a scoring pentagon, a scoring hexagon, and so on. Since there are multiple key frames to be processed, there are multiple scoring polygons. The comprehensive score of each key frame to be processed is then determined by calculating the area of its scoring polygon, and the video cover of the video is selected according to these comprehensive scores, which helps to improve the accuracy and quality of video cover selection.
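As a rough illustration of the general case (not part of the patent text, and assuming the n radii are evenly spaced at 2π/n around the center point O and extended from the initial value 1 by their scores, with n ≥ 3), the area of the scoring polygon can be obtained by summing the n triangles fanned out from the center point O:

```python
import math

def scoring_polygon_area(scores):
    """Comprehensive score of one key frame as the area of its scoring polygon (n >= 3).

    `scores` holds the n dimension scores (action, position, expression, ...).
    Assumptions: the n radii are evenly spaced at 2*pi/n around the center
    point O, and each radius is extended from the initial value 1 by its score.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("this sketch covers n >= 3; the n=2 triangle case is shown above")
    radii = [1.0 + s for s in scores]
    sector_angle = 2.0 * math.pi / n
    # Sum the areas of the n triangles formed by the center point and adjacent extended radii.
    return sum(
        0.5 * radii[i] * radii[(i + 1) % n] * math.sin(sector_angle)
        for i in range(n)
    )

# Example: pick the key frame with the largest scoring-polygon area as the cover.
frames = {"frame_12": [0.8, 0.6, 0.7], "frame_57": [0.9, 0.4, 0.5]}  # hypothetical scores
cover = max(frames, key=lambda k: scoring_polygon_area(frames[k]))
print(cover)
```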
Furthermore, because the comprehensive score of a key frame to be processed is represented by the area of its scoring polygon, the n dimension scores are closely tied to the comprehensive score, and any one of the n dimension scores can influence it. Determining the comprehensive score from the area of the scoring polygon therefore helps to improve the accuracy and quality of video cover selection, so that the selected video cover better meets the requirements of users.
As shown in fig. 13, the video cover selection device provided by the present invention includes:
An information acquisition module 310, configured to acquire a key frame to be processed in a video and determine the facing angle of a target person in the key frame to be processed, where the key frame to be processed has a plurality of frames;
The action recognition module 320 is configured to recognize a first gesture action of the target person and obtain a preset picture matched with the first gesture action;
The score calculating module 330 is configured to correct each action angle of the second gesture action of the reference person in the preset picture according to the facing angle, and determine an action score of the first gesture action according to the facing angle and each corrected action angle;
the cover selection module 340 is configured to determine a position score of the target person in the key frames to be processed, and select a video cover of the video from each key frame to be processed according to the action score and the position score.
It should be noted that the video cover selection device may further include other optional functional modules, so that it can perform the other steps involved in the above embodiments. The specific embodiments of the video cover selection device are substantially the same as those of the video cover selection method and are not repeated here.
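Purely as an illustrative sketch of the module split described above (not part of the patent text; all class and method names are hypothetical, and the real logic would follow the method embodiments), the device could be organized as follows:

```python
class VideoCoverSelector:
    """Hypothetical skeleton mirroring modules 310-340 of fig. 13."""

    def acquire_key_frames(self, video):
        # Information acquisition module 310: key frames to be processed and
        # the facing angle of the target person in each of them.
        raise NotImplementedError

    def recognize_action(self, key_frame):
        # Action recognition module 320: first gesture action and the
        # matched preset picture.
        raise NotImplementedError

    def compute_action_score(self, key_frame, facing_angle, preset_picture):
        # Score calculation module 330: correct the action angles of the
        # second gesture action, then score the first gesture action.
        raise NotImplementedError

    def select_cover(self, scored_frames):
        # Cover selection module 340: combine action and position scores
        # (e.g. via the scoring-polygon area) and pick the best key frame.
        raise NotImplementedError
```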
In addition, to achieve the above object, the present invention also provides a terminal device, including: a memory, a processor, and a video cover selection program stored on the memory and executable on the processor, wherein the video cover selection program, when executed by the processor, implements the steps of the video cover selection method described above.
In addition, in order to achieve the above object, the present invention also provides a storage medium having stored thereon a video cover selection program which, when executed by a processor, implements the steps of the video cover selection method described above.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structural or equivalent process transformation made using the contents disclosed herein, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims (9)

1. A video cover selection method, characterized by comprising the following steps:
acquiring a key frame to be processed in a video, and determining the facing angle of a target person in the key frame to be processed, wherein the key frame to be processed has multiple frames;
Identifying a first gesture of the target person, and acquiring a preset picture matched with the first gesture;
Correcting each action angle of the second gesture action of the reference character in the preset picture according to the facing angle, and determining an action score of the first gesture action according to the facing angle and each corrected action angle;
Determining the position score of the target person in the key frames to be processed, and selecting a video cover of the video from each key frame to be processed according to the action score and the position score;
Wherein the step of determining the action score of the first gesture action according to the facing angle and the corrected action angles includes:
acquiring a maximum facing angle of the facing angles of the target person in each key frame to be processed, a maximum action angle of each corrected action angle and a preset weight value corresponding to each corrected action angle;
determining an angle score of each corrected action angle according to the facing angle, the maximum action angle, each corrected action angle and the preset weight value;
determining a comparison score of a first gesture action of the target person in each key frame to be processed and the corrected second gesture action according to each angle score;
and determining the action score of the first gesture action according to each comparison score.
2. The method of claim 1, wherein the step of correcting each action angle of the second gesture action of the reference person in the preset picture according to the facing angle comprises:
extracting a first action frame point line graph of the second gesture action, wherein the first action frame point line graph is formed by connecting lines of all the joint points of the reference person;
Rotating the first action frame point line graph according to a preset direction, a central axis of the first action frame point line graph and the facing angle to obtain a second action frame point line graph, wherein the central axis is a connecting line of a midpoint between a neck joint and a hip joint;
acquiring an included angle between each joint point and the central axis in the second action frame point diagram;
And taking the included angle between each joint point and the central axis as each corrected action angle.
3. The method of claim 1, wherein the step of determining a location score of the target person in the key frame to be processed comprises:
determining intersection points of golden section points of a picture in the video, wherein there are at least two of the intersection points;
determining golden section point scores of the region where the target person is located in the key frame to be processed relative to the intersection points;
and determining the position score according to golden section point scores of the region where the target person is located relative to the intersection points.
4. The method of claim 3, wherein the step of determining golden section point scores of the region where the target person is located in the key frame to be processed relative to each of the intersection points comprises:
constructing a golden section point frame set, wherein the golden section point frame set comprises first frame numbers of multi-frame golden section point frames and frame number intervals associated with the first frame numbers;
When the second frame number of the key frame to be processed is in the frame number interval, determining a plurality of picture scores of the key frame to be processed according to the first frame number, each second frame number and the maximum value and the minimum value of each frame number interval;
and determining the golden section point score according to a plurality of the picture scores.
5. The method of claim 4, wherein the step of constructing a golden section point frame set comprises:
Constructing a two-dimensional coordinate system in the video, selecting at least two of a positive direction of a transverse axis, a negative direction of the transverse axis, a positive direction of a longitudinal axis and a negative direction of the longitudinal axis of the two-dimensional coordinate system as target directions, and marking an intersection point of a golden section point of the picture in the two-dimensional coordinate system according to the target directions;
acquiring a starting frame and an ending frame from each key frame to be processed, and constructing a connecting line segment between the target person contained in the starting frame and the target person contained in the ending frame in the two-dimensional coordinate system;
Determining the tangential points of the intersecting points and the connecting line segments by taking the intersecting points as circle centers, and selecting video frames at the tangential points as golden section point frames;
determining a first frame number of each golden section point frame according to the length of each connecting line segment, the distance between each tangent point and the target person contained in the initial frame, the frame number of the key frame to be processed and the third frame number of the initial frame;
Based on each connecting line segment, constructing a frame number interval of a third frame number of the starting frame and a fourth frame number of the ending frame, and associating each frame number interval with the corresponding first frame number to obtain a plurality of association relations;
And generating the golden section point frame set according to the association relations.
6. The method of claim 1, wherein the step of selecting a video cover of the video from each of the key frames to be processed according to the action score and the position score comprises:
Drawing n line segments corresponding to the n dimension scores, so that one end of each of the n line segments intersects at a central point; wherein the n dimension scores comprise the action score and the position score, and n is equal to or greater than 2;
according to the n dimension scores of the target person in the key frame to be processed, correspondingly increasing each line segment along the direction away from the central point;
sequentially connecting one end, far away from the center point, of each line segment after the increase to obtain scoring polygons corresponding to each key frame to be processed;
And selecting a video cover of the video from the key frames to be processed according to the areas of the scoring polygons.
7. A video cover selection device, wherein the video cover selection device comprises:
The information acquisition module is used for acquiring a key frame to be processed in the video and determining the facing angle of a target person in the key frame to be processed, wherein the key frame to be processed is provided with a plurality of frames;
The action recognition module is used for recognizing a first gesture action of the target person and acquiring a preset picture matched with the first gesture action;
The score calculation module is used for correcting each action angle of the second gesture action of the reference character in the preset picture according to the facing angle, and determining the action score of the first gesture action according to the facing angle and each corrected action angle;
The cover selection module is used for determining the position score of the target person in the key frames to be processed and selecting the video cover of the video from the key frames to be processed according to the action score and the position score;
The score calculating module is further configured to obtain a maximum facing angle of the facing angles of the target person in each key frame to be processed, a maximum action angle of each corrected action angle, and a preset weight value corresponding to each corrected action angle; determining an angle score of each corrected action angle according to the facing angle, the maximum action angle, each corrected action angle and the preset weight value; determining a comparison score of a first gesture action of the target person in each key frame to be processed and the corrected second gesture action according to each angle score; and determining the action score of the first gesture action according to each comparison score.
8. A terminal device, characterized in that the terminal device comprises: a memory, a processor, and a video cover selection program stored on the memory and executable on the processor, which when executed by the processor, performs the steps of the video cover selection method of any one of claims 1-6.
9. A storage medium having stored thereon a video cover selection program which when executed by a processor performs the steps of the video cover selection method of any one of claims 1-6.
CN202210411941.9A 2022-04-19 2022-04-19 Video cover selection method, device, equipment and storage medium Active CN114827730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210411941.9A CN114827730B (en) 2022-04-19 2022-04-19 Video cover selection method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN114827730A CN114827730A (en) 2022-07-29
CN114827730B (en) 2024-05-31

Family

ID=82505431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210411941.9A Active CN114827730B (en) 2022-04-19 2022-04-19 Video cover selection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114827730B (en)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145681B (en) * 2017-06-16 2021-02-26 北京京东尚科信息技术有限公司 Method and device for judging target rotation direction
US20220080260A1 (en) * 2020-09-16 2022-03-17 NEX Team Inc. Pose comparison systems and methods using mobile computing devices
CN112559800B (en) * 2020-12-17 2023-11-14 北京百度网讯科技有限公司 Method, apparatus, electronic device, medium and product for processing video

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809653A (en) * 2014-12-29 2016-07-27 深圳Tcl数字技术有限公司 Image processing method and device
CN106658141A (en) * 2016-11-29 2017-05-10 维沃移动通信有限公司 Video processing method and mobile terminal
CN109621332A (en) * 2018-12-29 2019-04-16 北京卡路里信息技术有限公司 A kind of attribute determining method, device, equipment and the storage medium of body-building movement
JP2020140283A (en) * 2019-02-27 2020-09-03 国立大学法人東海国立大学機構 Information processing device, information processing method, and computer program
CN111247389A (en) * 2019-03-29 2020-06-05 深圳市大疆创新科技有限公司 Data processing method and device for shooting equipment and image processing equipment
CN110245623A (en) * 2019-06-18 2019-09-17 重庆大学 A kind of real time human movement posture correcting method and system
CN110446063A (en) * 2019-07-26 2019-11-12 腾讯科技(深圳)有限公司 Generation method, device and the electronic equipment of video cover
CN110796077A (en) * 2019-10-29 2020-02-14 湖北民族大学 Attitude motion real-time detection and correction method
CN110933452A (en) * 2019-12-02 2020-03-27 广州酷狗计算机科技有限公司 Method and device for displaying lovely face gift and storage medium
CN111080589A (en) * 2019-12-05 2020-04-28 广州极泽科技有限公司 Target object matching method, system, device and machine readable medium
WO2021129064A1 (en) * 2019-12-24 2021-07-01 腾讯科技(深圳)有限公司 Posture acquisition method and device, and key point coordinate positioning model training method and device
KR20210091955A (en) * 2020-01-15 2021-07-23 금오공과대학교 산학협력단 Exercise posture correction program
CN111859022A (en) * 2020-07-07 2020-10-30 咪咕文化科技有限公司 Cover generation method, electronic device and computer-readable storage medium
CN112383830A (en) * 2020-11-06 2021-02-19 北京小米移动软件有限公司 Video cover determining method and device and storage medium
CN113327267A (en) * 2021-07-15 2021-08-31 东南大学 Action evaluation method based on monocular RGB video

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Key frame extraction from sports video combining pose estimation and tracking; 石念峰; 侯小静; 张平; 孙西铭; Video Engineering (Issue Z1); full text *

Also Published As

Publication number Publication date
CN114827730A (en) 2022-07-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant