CN108921136A

CN108921136A - Video marking method and device, storage medium, and terminal

Info

Publication number: CN108921136A
Application number: CN201810863800.4A
Authority: CN
Inventors: 孙鑫
Original assignee: Shanghai Xiaoyi Technology Co Ltd
Current assignee: Shanghai Xiaoyi Technology Co Ltd
Priority date: 2018-08-01
Filing date: 2018-08-01
Publication date: 2018-11-30

Abstract

A video marking method and device, a storage medium, and a terminal. The video marking method includes: obtaining a video file to be processed; performing feature recognition on the video content of the video file to obtain a recognition result; judging whether the recognition result includes a preset feature; and, when the recognition result includes the preset feature, marking the video content that contains the preset feature. The technical solution of the present invention can improve the efficiency and accuracy of video screening.

Description

Video marking method and device, storage medium, and terminal

Technical field

The present invention relates to the technical field of video processing, and in particular to a video marking method and device, a storage medium, and a terminal.

Background art

In the prior art, after obtaining a recorded video, a user needs to check and screen the video content frame by frame in order to remove content that does not meet custom conditions and thereby guarantee the quality of the video content.

However, screening video content requires manual frame-by-frame checking, which is time-consuming, laborious, and inefficient; moreover, the screening result is subject to a degree of human subjectivity, so the screening effect is poor.

Summary of the invention

The technical problem solved by the present invention is how to improve the efficiency and accuracy of video screening.

To solve the above technical problem, an embodiment of the present invention provides a video marking method, including: obtaining a video file to be processed; performing feature recognition on the video content of the video file to obtain a recognition result; judging whether the recognition result includes a preset feature; and, when the recognition result includes the preset feature, marking the video content that contains the preset feature.

Optionally, marking the video content that contains the preset feature includes: marking the position on the playback axis at which the video content corresponding to the recognition result is located.

Optionally, marking the position on the playback axis at which the video content corresponding to the recognition result is located includes: marking the start position and the end position of the video content on the playback axis.

Optionally, the preset feature includes preset text, and recognizing the video content of the video file includes: recognizing speech or text in the video file, the recognition result including text; and marking the video content that contains the preset feature when the recognition result includes the preset feature includes: when the text in the recognition result includes the preset text, marking the video content that contains the preset text.

Optionally, the preset feature includes a preset action, and recognizing the video content of the video file includes: recognizing human figures in the video file, the recognition result including human-figure actions; and marking the video content that contains the preset feature when the recognition result includes the preset feature includes: when a human-figure action in the recognition result is consistent with the preset action, marking the video content that contains that human-figure action.

Optionally, the preset feature includes a preset expression or a preset face, and recognizing the video content of the video file includes: recognizing faces in the video file, the recognition result including faces; and marking the video content that contains the preset feature when the recognition result includes the preset feature includes: when a face in the recognition result is consistent with the preset face, or the expression of a face in the recognition result is consistent with the preset expression, marking the video content that contains that face.

Optionally, the preset feature includes a preset frequency, and recognizing the video content of the video file includes: recognizing the audio track in the video file, the recognition result including a sound frequency; and marking the video content that contains the preset feature when the recognition result includes the preset feature includes: when the sound frequency in the recognition result is consistent with the preset frequency, marking the video content that contains the preset frequency.

To solve the above technical problem, an embodiment of the present invention also discloses a video marking device, including: a video file acquisition module, adapted to obtain a video file to be processed; a feature recognition module, adapted to perform feature recognition on the video content of the video file to obtain a recognition result; a judgment module, adapted to judge whether the recognition result includes a preset feature; and a marking module, adapted to mark the video content that contains the preset feature when the recognition result includes the preset feature.

Optionally, the marking module includes: a playback axis marking unit, adapted to mark the position on the playback axis at which the video content corresponding to the recognition result is located.

Optionally, the playback axis marking unit marks the start position and the end position of the video content on the playback axis.

Optionally, the preset feature includes preset text; the feature recognition module includes: a text recognition unit, adapted to recognize speech or text in the video file, the recognition result including text; and the marking module includes: a first marking unit, adapted to mark the video content that contains the preset text when the text in the recognition result includes the preset text.

Optionally, the preset feature includes a preset action; the feature recognition module includes: an action recognition unit, adapted to recognize human figures in the video file, the recognition result including human-figure actions; and the marking module includes: a second marking unit, adapted to mark the video content that contains the human-figure action when a human-figure action in the recognition result is consistent with the preset action.

Optionally, the preset feature includes a preset expression or a preset face; the feature recognition module includes: a face recognition unit, adapted to recognize faces in the video file, the recognition result including faces; and the marking module includes: a third marking unit, adapted to mark the video content that contains the face when a face in the recognition result is consistent with the preset face, or the expression of a face in the recognition result is consistent with the preset expression.

Optionally, the preset feature includes a preset frequency; the feature recognition module includes: an audio track recognition unit, adapted to recognize the audio track in the video file, the recognition result including a sound frequency; and the marking module includes: a fourth marking unit, adapted to mark the video content that contains the preset frequency when the sound frequency in the recognition result is consistent with the preset frequency.

An embodiment of the present invention also discloses a storage medium storing computer instructions, wherein the steps of the video marking method are executed when the computer instructions are run.

An embodiment of the present invention also discloses a terminal, including a memory and a processor, the memory storing computer instructions that can run on the processor, wherein the processor executes the steps of the video marking method when running the computer instructions.

Compared with the prior art, the technical solution of the embodiments of the present invention has the following beneficial effects:

The technical solution of the present invention obtains a video file to be processed; performs feature recognition on the video content of the video file to obtain a recognition result; judges whether the recognition result includes a preset feature; and, when the recognition result includes the preset feature, marks the video content that contains the preset feature. By performing feature recognition on the video to be processed and deciding whether to mark video content according to whether the recognition result includes the preset feature, the technical solution picks out the video content that contains the preset feature so that the user can process it further. This avoids the efficiency and accuracy problems of manual checking in the prior art and improves the efficiency and accuracy of video screening.

Further, the position on the playback axis at which the video content corresponding to the recognition result is located is marked. Because the video content is marked at its playback axis position in the video file, the user can drag the playback axis to view the marked content, which makes the marked video content more convenient to view.

Further, the preset feature includes preset text, or a preset action, or a preset expression or preset face, or a preset frequency. By providing multiple kinds of preset features and recognizing features in the video content in correspondingly different ways, the technical solution makes the marking of video content richer and more diverse and broadens the application range of the video marking method.

Brief description of the drawings

Fig. 1 is a flowchart of a video marking method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a specific implementation of a video marking method according to an embodiment of the present invention;

Fig. 3 is a flowchart of another specific implementation of a video marking method according to an embodiment of the present invention;

Fig. 4 is a flowchart of yet another specific implementation of a video marking method according to an embodiment of the present invention;

Fig. 5 is a flowchart of a further specific implementation of a video marking method according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a video marking device according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a specific implementation of a video marking device according to an embodiment of the present invention.

Detailed description of the embodiments

As described in the background art, screening video content in the prior art requires manual frame-by-frame checking, which is time-consuming, laborious, and inefficient; moreover, the screening result is subject to a degree of human subjectivity, so the screening effect is poor.

By performing feature recognition on the video to be processed and deciding whether to mark video content according to whether the recognition result includes the preset feature, the technical solution of the present invention picks out the video content that contains the preset feature so that the user can process it further. This avoids the efficiency and accuracy problems of manual checking in the prior art and improves the efficiency and accuracy of video screening.

To make the above objects, features, and advantages of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Fig. 1 is a flowchart of a video marking method according to an embodiment of the present invention.

The video marking method shown in Fig. 1 may include the following steps:

Step S101: obtain a video file to be processed;

Step S102: perform feature recognition on the video content of the video file to obtain a recognition result;

Step S103: judge whether the recognition result includes a preset feature;

Step S104: when the recognition result includes the preset feature, mark the video content that contains the preset feature.

In a specific implementation, the video file to be processed may be recorded in advance or recorded in real time. For example, in a teaching quality monitoring scenario, the teaching video recorded in the teaching space can be obtained in real time.

A video recorded in advance may be stored beforehand in a database. In step S101, the video file to be processed can be retrieved from that database.

In a specific implementation of step S102, performing feature recognition on the video content may refer to extracting features from the video content, and the recognition result may include the features of the video content. Specifically, when the video content includes a face, the face may be recognized; when the video content includes a human figure, the human figure may be recognized; and when the video content includes speech, the speech may be recognized.

It can be understood that the video content may include any feasible feature, and feature recognition may be performed on any feasible feature in the video content; the embodiments of the present invention place no limitation on this.

After the recognition result of the video content is obtained, in a specific implementation of step S103 and step S104, it can be judged whether the recognition result includes the preset feature. If it does, the video content that contains the preset feature, that is, the video content corresponding to the recognition result, is marked.

Specifically, the preset feature may be set in advance, for example a preset action, a preset face, or preset text. Different preset features may be set for different application scenarios.

By performing feature recognition on the video to be processed and deciding whether to mark video content according to whether the recognition result includes the preset feature, the embodiment of the present invention picks out the video content that contains the preset feature so that the user can process it further. This avoids the efficiency and accuracy problems of manual checking in the prior art and improves the efficiency and accuracy of video screening.
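The flow of steps S101 to S104 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation described by the patent: the function name `mark_video`, the `recognize` callable, and the representation of features as string labels are all hypothetical.

```python
from typing import Callable, Iterable, List, Set


def mark_video(
    frames: Iterable[object],
    recognize: Callable[[object], Set[str]],
    preset_features: Set[str],
) -> List[int]:
    """Return the indices of frames whose recognition result includes a preset feature.

    All names are hypothetical; `recognize` stands in for any feasible
    feature-recognition technique (face, human figure, speech, ...).
    """
    marked: List[int] = []
    for index, frame in enumerate(frames):   # S101: frames of the obtained video file
        result = recognize(frame)            # S102: feature recognition -> recognition result
        if result & preset_features:         # S103: does the result include a preset feature?
            marked.append(index)             # S104: mark the content that contains it
    return marked
```

For instance, if the frames have already been reduced to sets of labels, calling `mark_video([{"face"}, {"speech", "raise_hand"}, set()], lambda f: f, {"raise_hand"})` marks only the frame at index 1.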

In a specific embodiment of the present invention, the video content may be marked by marking the position on the playback axis at which the video content corresponding to the recognition result is located.

Specifically, the playback axis has a time attribute. The position of the video content on the playback axis can be used to determine the time position of the video content; by dragging the playback axis to that position, the video content can be played for the user to view. The playback axis may also be called the timeline.

Specifically, the position on the playback axis may be marked with a shape such as an arrow or a dot. More specifically, the marker shape may be any feasible color, such as red or yellow.

By marking the video content at its playback axis position in the video file, the embodiment of the present invention makes the marking intuitive; the user can drag the playback axis to view the marked content, which makes the marked video content more convenient to view.

It should be noted that the video content may also be marked in any other feasible way, for example by inserting a preset video frame into the video content that contains the preset feature; the embodiments of the present invention place no limitation on this.

Further, the start position and the end position of the video content on the playback axis are marked.

Specifically, the video content containing the preset feature may include multiple consecutive video frames. In that case, when the video content is marked, both its start position and its end position on the playback axis may be marked. In other words, the start position on the playback axis may correspond to the start frame of the video content, and the end position may correspond to its end frame.

The user can then drag the playback axis to the marked start and end positions to view the marked video content, which makes the marking intuitive and viewing convenient.
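Marking the start and end positions of a run of consecutive frames might be sketched as follows. This is a hedged illustration only: the function name `segments_on_playback_axis`, the per-frame boolean flags, and the use of a constant frame rate `fps` to convert frame indices into playback axis times are assumptions, not details given in the source.

```python
from typing import List, Tuple


def segments_on_playback_axis(flags: List[bool], fps: float) -> List[Tuple[float, float]]:
    """Group consecutive flagged frames into (start, end) times in seconds.

    `flags[i]` is True when frame i contains the preset feature (hypothetical input).
    The start time belongs to the start frame and the end time to the end frame.
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the start frame of the run currently being tracked
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                                  # a new run begins at this frame
        elif not flagged and start is not None:
            segments.append((start / fps, (i - 1) / fps))  # run ended at the previous frame
            start = None
    if start is not None:                              # a run reaches the end of the video
        segments.append((start / fps, (len(flags) - 1) / fps))
    return segments
```

With one frame per second, the flags `[False, True, True, False, True]` give two marked segments: one from 1.0 s to 2.0 s and a single-frame segment at 4.0 s.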

In one specific implementation of step S103 and step S104 of the present invention, referring to Fig. 2, the video marking method may include the following steps:

Step S201: recognize speech or text in the video file, the recognition result including text;

Step S202: when the text in the recognition result includes the preset text, mark the video content that contains the preset text.

In a specific implementation, if the video file includes speech, speech recognition is performed on the speech; if the video file includes subtitles, text recognition is performed on the subtitles. If the video file includes both speech and subtitles, either one may be recognized, or both may be recognized and the two recognition results integrated and deduplicated to obtain the final recognition result.

The recognition results obtained from the speech and from the subtitles in the video file both include text.

The preset text may be set in advance by the user. By setting the preset text, the video content that contains the preset text can be determined in the video file to be processed, so that this video content can be marked.

In a specific application scenario, the preset text may be set to profanity, so that the video content containing profanity can be determined in the above manner and marked. After obtaining the marked video file, the user can quickly find the video content containing profanity, and can then delete that content or use it as evidence when monitoring the people in the video file.

It can be understood that speech recognition and text recognition may be implemented using any feasible technique or algorithm; the embodiments of the present invention place no limitation on this.
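The integration, deduplication, and preset-text check described for Fig. 2 might look like the sketch below. It assumes that a speech recognizer and a subtitle recognizer have already produced `(timestamp, text)` pairs; the function names, the case-insensitive comparison, and the substring match are illustrative assumptions rather than anything the source specifies.

```python
from typing import Iterable, List, Tuple

Line = Tuple[float, str]  # (timestamp in seconds, recognized text); hypothetical format


def merge_recognized_text(speech: Iterable[Line], subtitles: Iterable[Line]) -> List[Line]:
    """Integrate two recognition results and drop duplicates, keeping first-seen order."""
    seen = set()
    merged: List[Line] = []
    for timestamp, text in list(speech) + list(subtitles):
        key = (timestamp, text.strip().lower())   # same time and same words count as one line
        if key not in seen:
            seen.add(key)
            merged.append((timestamp, text.strip()))
    return merged


def find_preset_text(lines: Iterable[Line], preset_words: Iterable[str]) -> List[float]:
    """Return the timestamps of lines whose text includes any preset word."""
    words = [w.lower() for w in preset_words]
    return [t for t, text in lines if any(w in text.lower() for w in words)]
```

For example, merging the speech lines `[(3.0, "Hello class"), (9.5, "That is a darn mess")]` with the subtitle lines `[(3.0, "hello class"), (12.0, "Darn it")]` keeps three lines, and searching them for the preset word `"darn"` returns the timestamps 9.5 and 12.0.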

In another specific implementation of step S103 and step S104 of the present invention, referring to Fig. 3, the video marking method may include the following steps:

Step S301: recognize human figures in the video file, the recognition result including human-figure actions;

Step S302: when a human-figure action in the recognition result is consistent with the preset action, mark the video content that contains that human-figure action.

In a specific implementation, if the video file includes a human figure, human-figure recognition is performed on the video file, and the recognition result includes the human figure in the video file. Further, a human figure can indicate an action, so once the human figure has been recognized, the action in the video file can be determined.

The preset action may be set in advance by the user. By setting the preset action, the video content that contains the preset action can be determined in the video file to be processed, so that this video content can be marked.

In a specific application scenario such as teaching monitoring, the preset action may be set to raising a hand or pushing and shoving. The video content containing hand raising or pushing can then be determined in the above manner and marked. After obtaining the marked video file, the user can quickly find that content, so that content containing hand raising can be extracted as highlights, and content containing pushing can be deleted or used to monitor the person in the video file who performed the pushing.

It can be understood that human-figure recognition may be implemented using any feasible technique or algorithm; the embodiments of the present invention place no limitation on this.
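The teaching-monitoring scenario above can be sketched as a lookup from recognized action labels to a handling hint. Everything here is hypothetical: the action labels, the `PRESET_ACTIONS` table, the handling names, and the assumption that a recognizer has already produced `(timestamp, action_label)` pairs.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical preset actions and what the user might do with the marked content.
PRESET_ACTIONS: Dict[str, str] = {
    "raise_hand": "highlight",  # e.g. extract as a highlight clip
    "push": "review",           # e.g. delete, or keep for monitoring
}


def mark_preset_actions(
    detections: Iterable[Tuple[float, str]],
    preset_actions: Dict[str, str],
) -> List[Tuple[float, str, str]]:
    """Keep only detections whose action is a preset action, with its handling hint."""
    return [
        (timestamp, action, preset_actions[action])
        for timestamp, action in detections
        if action in preset_actions   # "consistent with the preset action" as label equality
    ]
```

Given the detections `[(4.0, "sit"), (12.5, "raise_hand"), (40.0, "push")]`, only the last two are marked, tagged `"highlight"` and `"review"` respectively. Treating consistency as exact label equality is a simplification; a real system might compare pose sequences instead.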

In yet another specific implementation of step S103 and step S104 of the present invention, referring to Fig. 4, the video marking method may include the following steps:

Step S401: recognize faces in the video file, the recognition result including faces;

Step S402: when a face in the recognition result is consistent with the preset face, or the expression of a face in the recognition result is consistent with the preset expression, mark the video content that contains that face.

In a specific implementation, the video file may include faces. By recognizing a face in the video file, the identity of the face can be obtained, or the expression of the face can be determined.

The preset expression or preset face may be configured in advance by the user. Setting a preset face helps the user locate the video content containing that face in the video file, that is, it helps the user find a person. Setting a preset expression makes it possible to determine the video content containing that expression in the video file.

In one specific application scenario, the preset face may be the face of a person the user is looking for, such as a missing person or a criminal, and the video file may be a shopping mall surveillance video, a road surveillance video, or the like. Because surveillance videos are usually long, for example covering three days, one week, or one month, the above manner makes it possible to quickly determine the video content containing the preset face, avoiding the inefficiency of manually checking the surveillance video.

In another specific application scenario, the preset expression may be laughing, and the video file may be a teaching video. In the above manner, the video content in which the facial expression is laughing can be determined, so as to identify students who are playing around in class; alternatively, the video content in which the facial expression is laughing can be used to evaluate the teaching quality of the video file.

It can be understood that the preset expression may also be any other feasible expression, such as smiling, anger, sadness, or yawning; the embodiments of the present invention place no limitation on this.
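The either-or condition of step S402 (a matching face, or a matching expression) can be written down as a small predicate. The `FaceResult` record, its field names, and the use of plain string identities and expression labels are assumptions made for illustration; the source does not say how faces or expressions are represented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaceResult:
    """Hypothetical recognition result for one face at one time position."""
    timestamp: float
    identity: Optional[str]     # who the recognizer thinks this is, if known
    expression: Optional[str]   # e.g. "laughing", "yawning", if determined


def should_mark(
    face: FaceResult,
    preset_identity: Optional[str] = None,
    preset_expression: Optional[str] = None,
) -> bool:
    """True when the face matches the preset face OR its expression matches the preset expression."""
    identity_match = preset_identity is not None and face.identity == preset_identity
    expression_match = preset_expression is not None and face.expression == preset_expression
    return identity_match or expression_match
```

A face recorded as `FaceResult(10.0, "person_a", "neutral")` is marked when the preset identity is `"person_a"`, and a face recorded as `FaceResult(20.0, None, "laughing")` is marked when the preset expression is `"laughing"`, even though its identity is unknown.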

In a further specific implementation of step S103 and step S104 of the present invention, referring to Fig. 5, the video marking method may include the following steps:

Step S501: recognize the audio track in the video file, the recognition result including a sound frequency;

Step S502: when the sound frequency in the recognition result is consistent with the preset frequency, mark the video content that contains the preset frequency.

In this embodiment, when the video file includes an audio track, the audio track may be recognized. By comparing the frequency of the audio track in the video file with the preset frequency, the video content containing the preset frequency can be determined and marked.

The preset frequency may be determined and set in advance.

In a specific application scenario, the video file may be a teaching video, and the preset frequency may be the audio-track frequency of the instructor, such as a teacher. By determining the video content that contains the preset frequency, the instructor's teaching duration can be determined. Alternatively, the video content that does not contain the preset frequency may be marked to determine the non-teaching duration, which can be used to evaluate teaching quality.
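The teaching-duration scenario can be sketched by summing the lengths of audio windows whose frequency matches the preset frequency. This sketch assumes the audio track has already been split into windows with one representative frequency each, and it interprets "consistent with" as falling within a tolerance; the function name, the window format, and the `tolerance_hz` parameter are all hypothetical.

```python
from typing import Iterable, Tuple

Window = Tuple[float, float, float]  # (start s, end s, representative frequency in Hz); hypothetical


def split_duration_by_frequency(
    windows: Iterable[Window],
    preset_hz: float,
    tolerance_hz: float = 10.0,
) -> Tuple[float, float]:
    """Return (matched seconds, unmatched seconds) relative to the preset frequency."""
    matched = 0.0
    unmatched = 0.0
    for start, end, hz in windows:
        if abs(hz - preset_hz) <= tolerance_hz:   # assumed meaning of "consistent with"
            matched += end - start                # e.g. time the instructor is speaking
        else:
            unmatched += end - start              # e.g. time not spent teaching
    return matched, unmatched
```

For the windows `[(0.0, 10.0, 182.0), (10.0, 15.0, 320.0), (15.0, 30.0, 176.0)]` and a preset frequency of 180 Hz, 25 seconds match and 5 seconds do not.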

It should be noted that the steps shown in Fig. 2 to Fig. 5 are executed after step S101 and step S102 shown in Fig. 1; for the specific implementation of step S101 and step S102, refer to the foregoing embodiment, which is not repeated here.

In a preferred embodiment of the present invention, the following step may also be included after step S104 shown in Fig. 1: sending the marked video content to a preset terminal device.

In a specific implementation, the preset terminal device may be a device used by the relevant person in charge. Sending the marked video content to the relevant user allows that user to use the video content for monitoring, for promotional display, and so on.

Referring to Fig. 6, the video marking device 60 may include a video file acquisition module 601, a feature recognition module 602, a judgment module 603, and a marking module 604.

The video file acquisition module 601 is adapted to obtain a video file to be processed; the feature recognition module 602 is adapted to perform feature recognition on the video content of the video file to obtain a recognition result; the judgment module 603 is adapted to judge whether the recognition result includes a preset feature; and the marking module 604 is adapted to mark the video content that contains the preset feature when the recognition result includes the preset feature.

In a specific embodiment of the present invention, the marking module 604 may include a playback axis marking unit, adapted to mark the position on the playback axis at which the video content corresponding to the recognition result is located.

Further, the playback axis marking unit marks the start position and the end position of the video content on the playback axis.

Referring to Fig. 7, the feature recognition module 602 and the marking module 604 may each have multiple specific units, which may be used in combination and for different application scenarios.

Specifically, the feature recognition module 602 shown in Fig. 6 may include a text recognition unit 6021, an action recognition unit 6022, a face recognition unit 6023, and/or an audio track recognition unit 6024. The marking module 604 may include a first marking unit 6041, a second marking unit 6042, a third marking unit 6043, and/or a fourth marking unit 6044.

In a specific embodiment of the present invention, the feature recognition module 602 may include the text recognition unit 6021, adapted to recognize speech or text in the video file, the recognition result including text. The marking module 604 may include the first marking unit 6041, adapted to mark the video content that contains the preset text when the text in the recognition result includes the preset text.

In another specific embodiment of the present invention, the feature recognition module 602 may include the action recognition unit 6022, adapted to recognize human figures in the video file, the recognition result including human-figure actions. The marking module 604 may include the second marking unit 6042, adapted to mark the video content that contains the human-figure action when a human-figure action in the recognition result is consistent with the preset action.

In yet another specific embodiment of the present invention, the feature recognition module 602 may include the face recognition unit 6023, adapted to recognize faces in the video file, the recognition result including faces. The marking module 604 may include the third marking unit 6043, adapted to mark the video content that contains the face when a face in the recognition result is consistent with the preset face, or the expression of a face in the recognition result is consistent with the preset expression.

In a further specific embodiment of the present invention, the feature recognition module 602 may include the audio track recognition unit 6024, adapted to recognize the audio track in the video file, the recognition result including a sound frequency. The marking module 604 may include the fourth marking unit 6044, adapted to mark the video content that contains the preset frequency when the sound frequency in the recognition result is consistent with the preset frequency.

It can be understood that the multiple units shown in Fig. 7 may also be used in any other feasible combination; the embodiments of the present invention place no limitation on this. For example, the text recognition unit 6021 and the first marking unit 6041 may be used in combination with the action recognition unit 6022 and the second marking unit 6042; or the face recognition unit 6023 and the third marking unit 6043 may be used in combination with the audio track recognition unit 6024 and the fourth marking unit 6044.
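The pairing of recognition units with marking units, and their use in combination, might be sketched as a small class. This is one possible arrangement under loud assumptions: the class name `VideoMarkingDevice`, the `UnitPair` tuple, the dictionary-based frames, and the lambda stand-ins for the units are all hypothetical and are not taken from the source.

```python
from typing import Callable, Dict, List, Tuple

# (pair name, recognition unit, marking-unit condition); all three are hypothetical stand-ins.
UnitPair = Tuple[str, Callable[[Dict[str, str]], str], Callable[[str], bool]]


class VideoMarkingDevice:
    """Combines any number of recognition/marking unit pairs, in the spirit of Fig. 7."""

    def __init__(self, unit_pairs: List[UnitPair]) -> None:
        self.unit_pairs = unit_pairs

    def process(self, frames: List[Dict[str, str]]) -> List[Tuple[int, str]]:
        """Return (frame index, pair name) for every frame a marking unit marks."""
        marks: List[Tuple[int, str]] = []
        for index, frame in enumerate(frames):
            for name, recognize, matches_preset in self.unit_pairs:
                if matches_preset(recognize(frame)):   # judgment on the recognition result
                    marks.append((index, name))        # marking by the paired unit
        return marks


# Example combination: a text pair together with an action pair.
text_pair: UnitPair = ("text", lambda f: f.get("text", ""), lambda t: "darn" in t.lower())
action_pair: UnitPair = ("action", lambda f: f.get("action", ""), lambda a: a == "raise_hand")
device = VideoMarkingDevice([text_pair, action_pair])
```

Processing the frames `[{"text": "Good morning"}, {"text": "Darn it", "action": "raise_hand"}, {"action": "push"}]` with this `device` marks frame 1 twice, once by the text pair and once by the action pair. Passing a different list of pairs to the constructor gives a different combination without changing the class.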

For more details on the working principle and working method of the video marking device 60, refer to the related descriptions of Fig. 1 to Fig. 5, which are not repeated here.

An embodiment of the present invention also discloses a storage medium storing computer instructions, wherein the steps of the video marking method shown in Fig. 1 to Fig. 5 can be executed when the computer instructions are run. The storage medium may include a ROM, a RAM, a magnetic disk, an optical disc, or the like. The storage medium may also include non-volatile memory, non-transitory memory, or the like.

An embodiment of the present invention also discloses a terminal, which may include a memory and a processor, the memory storing computer instructions that can run on the processor. When running the computer instructions, the processor can execute the steps of the video marking method shown in Fig. 1 to Fig. 5. The terminal includes, but is not limited to, terminal devices such as mobile phones, computers, and tablet computers.

Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; the protection scope of the present invention shall therefore be subject to the scope defined by the claims.

Claims

1. A video marking method, characterized by including:

obtaining a video file to be processed;

performing feature recognition on the video content of the video file to obtain a recognition result;

judging whether the recognition result includes a preset feature;

when the recognition result includes the preset feature, marking the video content that contains the preset feature.

2. The video marking method according to claim 1, characterized in that marking the video content that contains the preset feature includes:

marking the position on the playback axis at which the video content corresponding to the recognition result is located.

3. The video marking method according to claim 2, characterized in that marking the position on the playback axis at which the video content corresponding to the recognition result is located includes:

marking the start position and the end position of the video content on the playback axis.

4. The video marking method according to claim 1, characterized in that the preset feature includes preset text, and recognizing the video content of the video file includes:

recognizing speech or text in the video file, the recognition result including text;

and marking the video content that contains the preset feature when the recognition result includes the preset feature includes:

when the text in the recognition result includes the preset text, marking the video content that contains the preset text.

5. The video marking method according to claim 1, characterized in that the preset feature includes a preset action, and performing feature recognition on the video content of the video file comprises:

recognizing human figures in the video file, the recognition result including a human-figure action;

when the human-figure action in the recognition result matches the preset action, marking the video content that contains the human-figure action.

6. The video marking method according to claim 1, characterized in that the preset feature includes a preset expression or a preset face, and performing feature recognition on the video content of the video file comprises:

recognizing faces in the video file, the recognition result including a face;

when the face in the recognition result matches the preset face, or the expression of the face in the recognition result matches the preset expression, marking the video content that contains the face.

7. The video marking method according to claim 1, characterized in that the preset feature includes a preset frequency, and performing feature recognition on the video content of the video file comprises:

recognizing the audio track in the video file, the recognition result including a sound frequency;

when the sound frequency in the recognition result matches the preset frequency, marking the video content that contains the preset frequency.

8. A video marking device, characterized by comprising:

a video file obtaining module, adapted to obtain a video file to be processed;

a feature recognition module, adapted to perform feature recognition on the video content of the video file to obtain a recognition result;

a judgment module, adapted to determine whether the recognition result includes a preset feature;

a marking module, adapted to mark the video content that contains the preset feature when the recognition result includes the preset feature.

9. The video marking device according to claim 8, characterized in that the marking module comprises:

a playback timeline marking unit, adapted to mark the position, on the playback timeline, of the video content corresponding to the recognition result.

10. The video marking device according to claim 9, characterized in that the playback timeline marking unit marks the start position and the end position of the video content on the playback timeline.

11. The video marking device according to claim 8, characterized in that the preset feature includes preset text, and the feature recognition module comprises:

a text recognition unit, adapted to recognize speech or text in the video file, the recognition result including text;

the marking module comprises:

a first marking unit, adapted to mark the video content that contains the preset text when the text in the recognition result includes the preset text.

12. The video marking device according to claim 8, characterized in that the preset feature includes a preset action, and the feature recognition module comprises:

an action recognition unit, adapted to recognize human figures in the video file, the recognition result including a human-figure action;

the marking module comprises:

a second marking unit, adapted to mark the video content that contains the human-figure action when the human-figure action in the recognition result matches the preset action.

13. The video marking device according to claim 8, characterized in that the preset feature includes a preset expression or a preset face, and the feature recognition module comprises:

a face recognition unit, adapted to recognize faces in the video file, the recognition result including a face;

the marking module comprises:

a third marking unit, adapted to mark the video content that contains the face when the face in the recognition result matches the preset face, or the expression of the face in the recognition result matches the preset expression.

14. The video marking device according to claim 8, characterized in that the preset feature includes a preset frequency, and the feature recognition module comprises:

an audio track recognition unit, adapted to recognize the audio track in the video file, the recognition result including a sound frequency;

the marking module comprises:

a fourth marking unit, adapted to mark the video content that contains the preset frequency when the sound frequency in the recognition result matches the preset frequency.

15. A storage medium having computer instructions stored thereon, characterized in that the computer instructions, when run, execute the steps of the video marking method according to any one of claims 1 to 7.

16. A terminal, comprising a memory and a processor, the memory storing computer instructions that can be run on the processor, characterized in that the processor, when running the computer instructions, executes the steps of the video marking method according to any one of claims 1 to 7.
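By way of non-limiting illustration, the marking of a start position and an end position on the playback timeline recited in claims 3 and 10 could be derived from per-frame recognition results roughly as follows. This is a sketch under assumptions that the claims do not state: recognition is assumed to yield one match flag per frame, the frame rate is assumed to be constant, and the function name `timeline_marks` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def timeline_marks(hits: Sequence[bool], fps: float) -> List[Tuple[float, float]]:
    """Turn per-frame match flags into (start_s, end_s) positions on the timeline."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    marks: List[Tuple[float, float]] = []
    run_start: Optional[int] = None        # index of the first frame of the current run
    for i, hit in enumerate(hits):
        if hit and run_start is None:
            run_start = i                  # a run of matching frames begins here
        elif not hit and run_start is not None:
            marks.append((run_start / fps, i / fps))   # the run ended before frame i
            run_start = None
    if run_start is not None:              # the last run extends to the final frame
        marks.append((run_start / fps, len(hits) / fps))
    return marks


if __name__ == "__main__":
    # Frames 1-2 and frame 4 match the preset feature; the video runs at 2 frames per second.
    print(timeline_marks([False, True, True, False, True], fps=2.0))
```

Merging consecutive matching frames into one interval gives a single pair of marks per occurrence of the preset feature, rather than one mark per frame.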