CN109635158A - Method and device for automatic video labeling, medium and electronic device - Google Patents

Method and device for automatic video labeling, medium and electronic device

Info

Publication number
CN109635158A
Authority
CN
China
Prior art keywords
video
frame
sequence
label
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811542198.0A
Other languages
Chinese (zh)
Inventor
陈方毅
陈晓君
李君懿
陶建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shaozi Street Information Technology Co Ltd
Original Assignee
Hangzhou Shaozi Street Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Shaozi Street Information Technology Co Ltd filed Critical Hangzhou Shaozi Street Information Technology Co Ltd
Priority to CN201811542198.0A priority Critical patent/CN109635158A/en
Publication of CN109635158A publication Critical patent/CN109635158A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure relates to a method and device, a medium and an electronic device for automatic video labeling, and belongs to the technical field of video processing. The method comprises: in response to obtaining a video, decomposing the video into frames; grouping the decomposed frames according to a predefined rule; concatenating each group of frames into a video frame sequence; inputting the video frame sequences into a machine learning model, which outputs a label for each video frame sequence; and labeling the video based on the labels of the video frame sequences. By labeling videos automatically with a machine learning model, the disclosure improves both the accuracy and the efficiency of labeling.

Description

Method and device for automatic video labeling, medium and electronic device
Technical field
The disclosure relates to the technical field of video processing, and in particular to a method and device, a medium and an electronic device for automatic video labeling.
Background technique
A video tag is a label assigned according to the attributes of a video. It is an important basis for organizing video content and for providing personalized recommendations to users.
In recent years, spreading information and presenting oneself through video has become extremely popular. Users who look for videos of interest, as well as merchants and platforms that recommend videos, all rely on video tags. A large proportion of videos, however, contain no speech or subtitles, so the conventional approach of labeling a video from its speech and subtitles is not feasible for them, while manual labeling of such videos is inefficient and inaccurate.
Accordingly, it is desirable to provide a new method and device, medium and electronic device for automatic video labeling.
It should be noted that the information disclosed in the background section above is only intended to deepen the understanding of the background of the disclosure, and may therefore contain information that does not constitute prior art already known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the disclosure is to provide a scheme for automatic video labeling, thereby overcoming, at least to some extent, the low labeling efficiency and low accuracy caused by the limitations and defects of the related art.
According to one aspect of the disclosure, a method for automatic video labeling is provided, comprising:
in response to obtaining a video, decomposing the video into frames;
grouping the decomposed frames according to a predefined rule;
concatenating each group of frames into a video frame sequence;
inputting the video frame sequences into a machine learning model, the machine learning model outputting a label for each video frame sequence; and
labeling the video based on the labels of the video frame sequences,
wherein the machine learning model is trained as follows: each video frame sequence sample in a sample set is input into the machine learning model, the samples being obtained by grouping videos with various known labels according to the predefined rule and concatenating each group of frames; the label that the machine learning model outputs for a sample is compared with the known label of the video it comes from; if they are inconsistent, the coefficients of the machine learning model are adjusted until the label output by the machine learning model is consistent with the known label of the video.
In an exemplary embodiment of the disclosure, grouping the decomposed frames according to a predefined rule comprises:
taking a predetermined number of consecutive frames as one group.
In an exemplary embodiment of the disclosure, grouping the decomposed frames according to a predefined rule comprises:
taking a predetermined number of frames at random from the decomposed frames as one group.
In an exemplary embodiment of the disclosure, grouping the decomposed frames according to a predefined rule comprises:
dividing the frames decomposed from the video into N groups, N being a positive integer and the number of video frame sequences also being N, the frames whose frame numbers equal aN+i forming the i-th group, where a is a non-negative integer, i is a positive integer, 0 ≤ a ≤ N-1 and 1 ≤ i ≤ N.
In an exemplary embodiment of the disclosure, concatenating each group of frames into a video frame sequence comprises:
concatenating each group of frames into a video frame sequence according to the order of the frame numbers of the frames.
In an exemplary embodiment of the disclosure, labeling the video based on the labels of the video frame sequences comprises:
taking the label with the largest probability weight among the labels of the obtained video frame sequences as the final label of the video.
In an exemplary embodiment of the disclosure, labeling the video based on the labels of the video frame sequences comprises:
taking the N most frequent labels among the labels of the obtained video frame sequences as the labels assigned to the video.
In an exemplary embodiment of the disclosure, taking the most frequent label among the labels of the obtained video frame sequences as the label assigned to the video comprises:
if several labels are tied for the highest count, increasing the number of groups into which the decomposed frames are divided.
In an exemplary embodiment of the disclosure, increasing the number of groups into which the decomposed frames are divided comprises:
if the predefined rule takes a predetermined number of consecutive frames as one group, increasing the number of groups so that the frames contained in at least some of the groups partially overlap.
In an exemplary embodiment of the disclosure, grouping the decomposed frames according to a predefined rule comprises:
grouping the decomposed frames according to a first predefined rule and also according to a second predefined rule, the first predefined rule being different from the second predefined rule.
According to one aspect of the disclosure, a device for automatic video labeling is provided, comprising:
a decomposition module, configured to decompose a video into frames in response to obtaining the video;
a grouping module, configured to group the decomposed frames according to a predefined rule;
a concatenation module, configured to concatenate each group of frames into a video frame sequence;
a first labeling module, configured to input the video frame sequences into a machine learning model, the machine learning model outputting a label for each video frame sequence; and
a second labeling module, configured to label the video based on the labels of the video frame sequences.
According to one aspect of the disclosure, a computer-readable storage medium is provided on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any of the above embodiments.
According to one aspect of the disclosure, an electronic device is provided, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of the above embodiments by executing the executable instructions.
The disclosure provides a scheme for automatic video labeling. In this scheme, in response to obtaining a video, the video is decomposed into frames; the decomposed frames are grouped according to a predefined rule; each group of frames is concatenated into a video frame sequence; the video frame sequences are input into a machine learning model, which outputs a label for each video frame sequence; and the video is labeled based on the labels of the video frame sequences. By labeling videos automatically with a machine learning model, the disclosure improves both the accuracy and the efficiency of labeling. Moreover, to avoid the inefficiency of feeding frames into the machine learning model one by one, the frames are grouped and then concatenated, so that the input to the machine learning model is regular, which further increases the input efficiency and the labeling accuracy.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the disclosure.
Detailed description of the invention
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the specification, serve to explain the principles of the disclosure. Obviously, the drawings described below show only some embodiments of the disclosure, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 schematically shows a flowchart of a method for automatic video labeling.
Fig. 2 schematically shows a diagram of an example application scenario of a method for automatic video labeling.
Fig. 3 schematically shows a block diagram of a device for automatic video labeling.
Fig. 4 schematically shows an example block diagram of an electronic device for implementing the above method for automatic video labeling.
Fig. 5 schematically shows a computer-readable storage medium for implementing the above method for automatic video labeling.
Specific embodiment
Example embodiments are now described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the disclosure. Those skilled in the art will recognize, however, that the technical solution of the disclosure may be practiced without one or more of these specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known solutions are not shown or described in detail to avoid obscuring aspects of the disclosure.
In addition, the drawings are merely schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a method for automatic video labeling. In one application scenario of the method, videos are first obtained according to the needs of video usage, which may include recommendation for commercial purposes, category display on a video platform, and so on. The videos may be crawled from the public network or obtained from capture devices with shooting capability; they may or may not contain subtitles or speech, and this example embodiment places no particular limitation on this. A machine learning model is then used to automatically assign labels to these videos, and during recommendation, classification and similar processes the videos can be recommended or classified according to the assigned labels. Before the machine learning model assigns labels to the videos, each video is decomposed into frames, the frames are grouped according to a predefined rule, and each group is concatenated into a video frame sequence in the order of the frame numbers; the video frame sequences are then input into the machine learning model for labeling. This reduces the load on the machine learning model and increases the rate at which input is fed to the model. Finally, the label of the video is derived from the labels that the machine learning model assigns to the video frame sequences. The embodiments of the disclosure are generally used to label a video before it goes online. Once the video is online, these labels can be used to recommend the video to users and to enable fast search when users search for videos. This approach improves, to a certain extent, the accuracy and speed of labeling videos without subtitles or speech, and it is equally applicable to videos with subtitles and speech. The method for automatic video labeling may run on a server, a server cluster or a cloud server; of course, those skilled in the art may also run the method of the invention on other platforms as required, and this example embodiment places no particular limitation on this. As shown in Fig. 1, the method for automatic video labeling may comprise the following steps:
Step S110. In response to obtaining a video, decompose the video into frames.
Step S120. Group the decomposed frames according to a predefined rule.
Step S130. Concatenate each group of frames into a video frame sequence.
Step S140. Input the video frame sequences into a machine learning model, the machine learning model outputting a label for each video frame sequence.
Step S150. Label the video based on the labels of the video frame sequences,
wherein the machine learning model is trained as follows: each video frame sequence sample in a sample set is input into the machine learning model, the samples being obtained by grouping videos with various known labels according to the predefined rule and concatenating each group of frames; the label that the machine learning model outputs for a sample is compared with the known label of the video it comes from; if they are inconsistent, the coefficients of the machine learning model are adjusted until the label output by the machine learning model is consistent with the known label of the video.
In the above method for automatic video labeling, a video is decomposed into frames in response to being obtained, the decomposed frames are grouped according to a predefined rule, each group of frames is concatenated into a video frame sequence, the video frame sequences are input into a machine learning model that outputs a label for each sequence, and finally the video is labeled based on the labels of the video frame sequences. This solves the problem of automatically labeling obtained videos according to video attributes, classification requirements and the like. Labeling videos automatically with a machine learning model improves both the accuracy and the efficiency of labeling, and, to avoid the inefficiency of feeding frames into the machine learning model one by one, the frames are grouped and then concatenated, so that the input to the machine learning model is regular; the input efficiency and the labeling accuracy are thereby further improved.
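Purely for orientation, the following is a minimal sketch of how the five steps fit together. The three callables (frame decomposition, grouping rule, and per-sequence labeling by the trained model) and the majority-vote aggregation are assumptions used for illustration; each step is described in detail below.

```python
from collections import Counter

def label_video(video_path, decompose, group, predict_label):
    """Illustrative end-to-end flow of steps S110-S150.

    decompose(video_path) -> list of frames                       (step S110)
    group(frames)         -> list of groups of (frame_no, frame)  (step S120)
    predict_label(seq)    -> label string from the trained model  (step S140)
    """
    frames = decompose(video_path)
    groups = group(frames)
    # Step S130: order each group by frame number to form a video frame sequence.
    sequences = [[f for _, f in sorted(g, key=lambda nf: nf[0])] for g in groups]
    labels = [predict_label(seq) for seq in sequences]
    # Step S150 (one variant): the label assigned to the most sequences wins.
    return Counter(labels).most_common(1)[0][0]
```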
In the following, each step of the above method for automatic video labeling in this example embodiment is explained in detail with reference to the drawings.
In step S110, in response to obtaining a video, the video is decomposed into frames.
In this example embodiment, as shown in Fig. 2, a server 201 first obtains a video from a user terminal 202 or another server 203. When the video is obtained from the user terminal 202, it is obtained through an upload of the video by the user. When the video is obtained from another server 203, it can be obtained by periodically or aperiodically crawling other websites that also display videos; in the crawling case, the authorization of those websites is obtained.
The user terminal may be a mobile terminal device (for example a mobile phone) or another terminal device with video storage or shooting capability (for example a camera or a watch); this example places no special restriction on this. Further, there may be one user terminal or several, and this example places no special restriction on this either. The other server may be any server on the internet or any other storage device that can store videos, and there may be one such server or several; again, this example places no special restriction on this. The video can then be decomposed into frames according to the frame header identifier of each frame. A frame header can be added during transmission, so that each frame of the video, i.e. each picture, carries a frame header identifier. Using this identifier, the video can be accurately decomposed into frames.
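As a concrete illustration of step S110, the sketch below decodes a video into frames with OpenCV; the library choice is an assumption for illustration, whereas the disclosure itself describes splitting on the per-frame header identifier.

```python
# Illustrative sketch of step S110: decode a video file into individual frames.
# OpenCV is an assumed, convenient stand-in for the frame-header-based splitting
# described in the text.
import cv2

def decompose_into_frames(video_path):
    """Return all decoded frames of the video as a list of images."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:              # no more frames to decode
            break
        frames.append(frame)
    capture.release()
    return frames
```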
In step S120, the decomposed frames are grouped according to a predefined rule.
In an embodiment of this example, the predefined rule takes a predetermined number of consecutive frames as one group: after the video has been decomposed into frames, consecutive frames are divided into groups according to the frame count to which the machine learning model is adapted. For example, if the optimal number of frames contained in each frame sequence recognized by the label-assigning machine learning model is 12, then every 12 consecutive frames of the decomposed video form one group. Furthermore, the number of frames in each group does not have to equal the optimal frame count.
The benefit of taking a predetermined number of consecutive frames as one group is that the processing is simple and convenient.
In an embodiment of this example, the predefined rule takes a predetermined number of frames at random from the decomposed frames as one group: the video frames obtained after decomposition are randomly sampled, M frames at a time according to the frame count to which the machine learning model is adapted, to form one group. Furthermore, the number of frames in each group does not have to equal the optimal frame count.
The benefit of drawing frames at random is that the frames are shuffled before grouping, which prevents frames that are close in time from all falling into the same group. Each group is therefore likely to contain frames from various points in time and is more representative, which improves the labeling of the video.
In an embodiment of this example, the predefined rule divides the frames decomposed from the video into N groups, N being a positive integer and the number of video frame sequences also being N, with the frames whose frame numbers equal aN+i forming the i-th group, where a is a non-negative integer, i is a positive integer, 0 ≤ a ≤ N-1 and 1 ≤ i ≤ N. For example, after a video is decomposed into 100 frames, they are divided into 5 groups: the frames numbered 1, 6, 11, 16, 21, 26 and so on form one group, the frames numbered 2, 7, 12, 17, 22, 27 and so on form another group, and so on for the remaining groups. Such a combination distributes the frames of each group evenly across the video and also introduces some randomness, which to a certain extent improves the accuracy of the labels assigned by the machine learning model and widens the range of videos that the model can label.
Although three grouping embodiments are illustrated above, those skilled in the art should understand that the disclosure is not limited to these three grouping modes.
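A minimal sketch of the three grouping rules above, keeping each frame paired with its frame number so the later concatenation step can restore playback order; the function names and the handling of a final short group are assumptions.

```python
import random

def group_consecutive(frames, group_size):
    """Rule 1: a predetermined number of consecutive frames form one group."""
    numbered = list(enumerate(frames, start=1))            # (frame_number, frame)
    return [numbered[i:i + group_size] for i in range(0, len(numbered), group_size)]

def group_random(frames, group_size):
    """Rule 2: each group is a predetermined number of randomly drawn frames."""
    numbered = list(enumerate(frames, start=1))
    random.shuffle(numbered)
    return [numbered[i:i + group_size] for i in range(0, len(numbered), group_size)]

def group_interleaved(frames, n_groups):
    """Rule 3: the frame whose number equals a*N + i goes into group i."""
    numbered = list(enumerate(frames, start=1))
    return [numbered[i::n_groups] for i in range(n_groups)]

# With 100 frames and N = 5, group 1 holds frames 1, 6, 11, 16, ... and
# group 2 holds frames 2, 7, 12, 17, ..., matching the example above.
```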
In an embodiment of this example, the decomposed frames are grouped according to a first predefined rule and also according to a second predefined rule, where the first and second predefined rules are each any one of the predefined rules of the three preceding embodiments of this example, or another rule that is not disclosed above but that those skilled in the art, benefiting from the above teaching, could conceive, and the first predefined rule differs from the second predefined rule. For example, the first predefined rule may be the rule of the first embodiment, i.e. taking a predetermined number of consecutive frames as one group, and the second predefined rule may be the rule of the second embodiment, i.e. randomly sampling M of the decomposed video frames according to the frame count to which the machine learning model is adapted; but the first and second predefined rules cannot both be the rule of the first embodiment or both be the rule of the second embodiment. When the video frames obtained after decomposition are grouped by the first and second predefined rules at the same time, the final number of groups may be equal to the number of groups obtained by grouping with the first or second predefined rule alone, or it may be twice the number of groups obtained by grouping with a single rule.
In one embodiment, the final number of groups is equal to the number of groups obtained by grouping with the first or second predefined rule alone. This can be implemented as follows:
all the decomposed frames are divided into a first part and a second part;
the first part is grouped using the first predefined rule and the second part is grouped using the second predefined rule, and the groups obtained under the first predefined rule together with the groups obtained under the second predefined rule form the groups into which the decomposed frames are divided.
For example, the first part is the first half of all the decomposed frames: if 100 frames are decomposed, the first 50 frames form the first part. The first predefined rule is applied to this first half, i.e. a predetermined number of consecutive frames form one group, for example 10 frames per group, giving 5 groups. The second part is the second half of the decomposed frames, i.e. the last 50 of the 100 frames. The second predefined rule is applied to this second half, for example also yielding 10 frames per group and thus another 5 groups. In total there are 10 groups, which are the groups into which the 100 decomposed frames are divided.
In one embodiment, the final number of groups is twice the number of groups obtained by grouping with the first or second predefined rule alone. This can be implemented as follows:
all the decomposed frames are grouped using the first predefined rule, and all the decomposed frames are also grouped using the second predefined rule; the groups obtained under the first predefined rule together with the groups obtained under the second predefined rule form the groups into which the decomposed frames are divided.
For example, the first predefined rule is applied to the 100 decomposed frames, i.e. a predetermined number of consecutive frames form one group, for example 10 frames per group, giving 10 groups; the second predefined rule is then applied to the same 100 decomposed frames, for example again with 10 frames per group, giving another 10 groups. In total there are 20 groups, which are the groups into which the 100 decomposed frames are divided.
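A minimal sketch of the second variant, assuming the consecutive and the interleaved rules are the two chosen rules; both are applied to all decomposed frames and the resulting groups are pooled, doubling the group count (the first variant would instead apply one rule to each half of the frames).

```python
def group_with_two_rules(frames, group_size, n_groups):
    """Group all frames with two different predefined rules and pool the groups."""
    numbered = list(enumerate(frames, start=1))                # (frame_number, frame)
    consecutive = [numbered[i:i + group_size]                  # first rule: consecutive blocks
                   for i in range(0, len(numbered), group_size)]
    interleaved = [numbered[i::n_groups]                       # second rule: frame a*N + i -> group i
                   for i in range(n_groups)]
    return consecutive + interleaved                           # twice as many groups
```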
This grouping approach increases the randomness of the grouping and thereby improves the accuracy of video labeling to a certain extent.
In step S130, each group of frames is concatenated into a video frame sequence.
In this example embodiment, after all the video frames have been grouped, the video frames in each group are strung together into a video frame sequence. The video frames obtained after decomposition are individual and separate, and feeding them into the machine learning model one by one would be inefficient; inputting them as video frame sequences effectively improves the input efficiency and hence the labeling efficiency.
In an embodiment of this example, concatenating each group of frames into a video frame sequence comprises concatenating the frames of each group according to the order of their frame numbers. For example, if a group contains the frames numbered 11, 1, 6, 26, 16 and 21, all the frames are strung together in the order 1, 6, 11, 16, 21, 26. The order of the frames thus stays consistent with the original video, which effectively improves the labeling accuracy.
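A minimal sketch of step S130 under the same (frame_number, frame) representation assumed in the grouping sketch above:

```python
def concatenate_group(group):
    """Order a group's frames by frame number and string them into one sequence."""
    ordered = sorted(group, key=lambda numbered_frame: numbered_frame[0])
    return [frame for _, frame in ordered]

# A group holding frames numbered 11, 1, 6, 26, 16, 21 becomes the sequence
# 1, 6, 11, 16, 21, 26, matching the playback order of the original video.
```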
In step S140, the video frame sequences are input into the machine learning model, and the machine learning model outputs a label for each video frame sequence.
In this example embodiment, the video frame sequences are first input into a machine learning model that has been trained in advance, and the pre-trained machine learning model assigns a label to the video according to each video frame sequence.
This example embodiment also includes a training method for the machine learning model. The training method comprises: first, each video frame sequence sample in a sample set is input into the machine learning model, the samples being obtained by grouping videos with various known labels according to the predefined rule and concatenating each group of frames; the label that the machine learning model outputs for a sample is compared with the known label of the video it comes from; if they are inconsistent, the coefficients of the machine learning model are adjusted until the label output by the machine learning model is consistent with the known label of the video.
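The disclosure does not specify the model architecture, loss or optimizer, so the following training sketch is purely illustrative: a toy PyTorch classifier that averages per-frame features, trained by comparing its output with the known label and adjusting its coefficients accordingly.

```python
import torch
from torch import nn, optim

class SequenceLabeler(nn.Module):
    """Toy stand-in for the machine learning model: encode each frame,
    average over the sequence, then classify into one of num_labels tags."""
    def __init__(self, frame_dim, num_labels):
        super().__init__()
        self.encoder = nn.Linear(frame_dim, 128)
        self.classifier = nn.Linear(128, num_labels)

    def forward(self, sequences):                      # (batch, seq_len, frame_dim)
        features = torch.relu(self.encoder(sequences)).mean(dim=1)
        return self.classifier(features)               # (batch, num_labels)

def train(model, sample_loader, epochs=10, lr=1e-3):
    """Adjust the model coefficients until its labels match the known labels."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for sequences, known_labels in sample_loader:  # samples built as described above
            optimizer.zero_grad()
            loss = criterion(model(sequences), known_labels)
            loss.backward()
            optimizer.step()
```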
In step S150, the video is labeled based on the labels of the video frame sequences.
In an embodiment of this example, the label with the largest probability weight among the labels of the obtained video frame sequences is taken as the final label of the video. For example, if among all the finally obtained video frame sequence labels 5 are comedy, 2 are lifestyle and 1 is talk show, comedy is taken as the label of the video.
In an embodiment of this example, the N most frequent labels among the labels of the obtained video frame sequences are all taken as labels of the video. For example, if the predetermined N is 3 and the obtained sequence labels include 5 travel, 4 outdoor, 3 comedy, 2 lifestyle, 2 talk show and 1 food, then travel, outdoor and comedy are all taken as labels of the video.
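A minimal sketch of the two aggregation variants just described (the single most frequent label, and the N most frequent labels); a plain frequency count is assumed where the text speaks of probability weight, consistent with the worked examples.

```python
from collections import Counter

def most_frequent_label(sequence_labels):
    """Single-label variant: e.g. 5 x 'comedy', 2 x 'lifestyle', 1 x 'talk show'
    yields 'comedy'."""
    return Counter(sequence_labels).most_common(1)[0][0]

def top_n_labels(sequence_labels, n):
    """Multi-label variant: e.g. with n = 3 the counts 5 travel, 4 outdoor,
    3 comedy, 2 lifestyle, 2 talk show, 1 food yield travel, outdoor, comedy."""
    return [label for label, _ in Counter(sequence_labels).most_common(n)]
```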
In an embodiment of this example, if several labels are tied for the highest count, the number of groups into which the decomposed frames are divided is increased, and if the predefined rule takes a predetermined number of consecutive frames as one group, the number of groups is increased so that the frames contained in at least some of the groups partially overlap. For example, if the labels assigned to the 5 initially obtained video frame sequences include 3 comedy, 3 travel, 2 outdoor and 1 scenery, the labels with the highest count, comedy and travel, both appear 3 times; since the most frequent label is predetermined to be selected as the video tag, re-labeling is needed, so a second round of splitting and concatenation is carried out, taking new division points within the video frame sequences obtained the first time and concatenating the regrouped frames into new video frame sequences. For instance, the midpoints of the first-round video frame sequences may be taken as the new division points: if the first round concatenated frames 1-20 as video frame sequence 1, frames 20-40 as video frame sequence 2, ..., and frames 80-100 as video frame sequence 5, the second round can use different starting points, for example frames 10-30 as video frame sequence 1, frames 30-50 as video frame sequence 2, ..., and frames 90-100 as video frame sequence 5. The 10 video frame sequences from the two rounds are labeled in total, and the label assigned to the most sequences is taken as the label of the video. Such a labeling scheme can improve the accuracy of label assignment to a certain extent.
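A minimal sketch of the tie-breaking idea, assuming a fixed window size, a half-window shift for the second round, and a `predict_label` callable standing in for the trained model; all of these are illustrative choices consistent with the example above.

```python
from collections import Counter

def label_with_tie_breaking(frames, window, predict_label):
    """First round: non-overlapping windows. If several labels tie for the
    highest count, add a second round of windows shifted by half a window so
    that the new sequences overlap the old ones, then vote over both rounds."""
    def sequences(start):
        return [frames[i:i + window] for i in range(start, len(frames), window)]

    labels = [predict_label(seq) for seq in sequences(0)]            # e.g. frames 1-20, 21-40, ...
    counts = Counter(labels)
    top = counts.most_common(1)[0][1]
    if sum(1 for c in counts.values() if c == top) > 1:              # tie on the highest count
        labels += [predict_label(seq) for seq in sequences(window // 2)]  # e.g. frames 11-30, ...
    return Counter(labels).most_common(1)[0][0]
```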
The disclosure further provides a device for automatic video labeling. As shown in Fig. 3, the device for automatic video labeling may comprise a decomposition module 310, a grouping module 320, a concatenation module 330, a first labeling module 340 and a second labeling module 350, wherein:
the decomposition module 310 is configured to decompose a video into frames in response to obtaining the video;
the grouping module 320 is configured to group the decomposed frames according to a predefined rule;
the concatenation module 330 is configured to concatenate each group of frames into a video frame sequence;
the first labeling module 340 is configured to input the video frame sequences into a machine learning model, the machine learning model outputting a label for each video frame sequence; and
the second labeling module 350 is configured to label the video based on the labels of the video frame sequences.
The details of each module in the above device for automatic video labeling have been described in detail in the corresponding method for automatic video labeling, and are therefore not repeated here.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided and embodied in several modules or units.
In addition, although the steps of the method of the disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the illustrated steps must be performed, to achieve the desired result. Additionally or alternatively, some steps may be omitted, several steps may be combined into one step, and/or one step may be decomposed into several steps, and so on.
From the description of the embodiments above, those skilled in the art will readily appreciate that the example embodiments described here may be implemented in software or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and which includes instructions that cause a computing device (which may be a personal computer, a server, a mobile terminal, a network device, or the like) to perform the method according to the embodiments of the disclosure.
In an exemplary embodiment of the disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that various aspects of the invention may be implemented as a system, a method or a program product. Accordingly, various aspects of the invention may take the form of a complete hardware embodiment, a complete software embodiment (including firmware, microcode and the like), or an embodiment combining hardware and software, which may collectively be referred to here as a "circuit", a "module" or a "system".
An electronic device 400 according to this embodiment of the invention is described below with reference to Fig. 4. The electronic device 400 shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the invention.
As shown in Fig. 4, the electronic device 400 takes the form of a general-purpose computing device. The components of the electronic device 400 may include, but are not limited to: at least one processing unit 410, at least one storage unit 420, and a bus 430 connecting the different system components (including the storage unit 420 and the processing unit 410).
The storage unit stores program code that can be executed by the processing unit 410, so that the processing unit 410 performs the steps of the various exemplary embodiments of the invention described in the "Exemplary methods" section above of this specification. For example, the processing unit 410 may perform step S110 as shown in Fig. 1: in response to obtaining a video, decomposing the video into frames; step S120: grouping the decomposed frames according to a predefined rule; step S130: concatenating each group of frames into a video frame sequence; step S140: inputting the video frame sequences into a machine learning model, the machine learning model outputting a label for each video frame sequence; and step S150: labeling the video based on the labels of the video frame sequences, wherein the machine learning model is trained as follows: each video frame sequence sample in a sample set is input into the machine learning model, the samples being obtained by grouping videos with various known labels according to the predefined rule and concatenating each group of frames; the label that the machine learning model outputs for a sample is compared with the known label of the video it comes from; if they are inconsistent, the coefficients of the machine learning model are adjusted until the label output by the machine learning model is consistent with the known label of the video.
The storage unit 420 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 4201 and/or a cache memory unit 4202, and may further include a read-only memory unit (ROM) 4203.
The storage unit 420 may also include a program/utility 4204 having a set of (at least one) program modules 4205, such program modules 4205 including but not limited to: an operating system, one or more application programs, other program modules and program data, each of which, or some combination of which, may include an implementation of a network environment.
The bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
The electronic device 400 may also communicate with one or more external devices 600 (such as a keyboard, a pointing device, a Bluetooth device and the like), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device (such as a router or a modem) that enables the electronic device 400 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 450. Moreover, the electronic device 400 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the internet) through a network adapter 460. As shown, the network adapter 460 communicates with the other modules of the electronic device 400 via the bus 430. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems and the like.
From the description of the embodiments above, those skilled in the art will readily appreciate that the example embodiments described here may be implemented in software or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and which includes instructions that cause a computing device (which may be a personal computer, a server, a terminal device, a network device, or the like) to perform the method according to the embodiments of the disclosure.
In an exemplary embodiment of the disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored. In some possible embodiments, various aspects of the invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to perform the steps of the various exemplary embodiments of the invention described in the "Exemplary methods" section above of this specification.
As shown in Fig. 5, a program product 500 for implementing the above method according to an embodiment of the invention is described; it may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the invention is not limited to this; in this document a readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for carrying out the operations of the invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the internet using an internet service provider).
In addition, the above drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the invention and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the temporal order of these processes, and that these processes may be performed, for example, synchronously or asynchronously in several modules.
Other embodiments of the disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses or adaptations of the disclosure that follow its general principles and include common knowledge or customary techniques in the art that are not disclosed in this disclosure. The specification and examples are to be regarded as exemplary only, and the true scope and spirit of the disclosure are indicated by the claims.

Claims (10)

1. A method for automatic video labeling, characterized by comprising:
in response to obtaining a video, decomposing the video into frames;
grouping the decomposed frames according to a predefined rule;
concatenating each group of frames into a video frame sequence;
inputting the video frame sequences into a machine learning model, the machine learning model outputting a label for each video frame sequence; and
labeling the video based on the labels of the video frame sequences,
wherein the machine learning model is trained as follows: each video frame sequence sample in a sample set is input into the machine learning model, the samples being obtained by grouping videos with various known labels according to the predefined rule and concatenating each group of frames; the label that the machine learning model outputs for a sample is compared with the known label of the video it comes from; if they are inconsistent, the coefficients of the machine learning model are adjusted until the label output by the machine learning model is consistent with the known label of the video.
2. The method according to claim 1, characterized in that the predefined rule comprises: taking a predetermined number of consecutive frames as one group.
3. The method according to claim 1, characterized in that the predefined rule comprises: dividing the frames decomposed from the video into N groups, N being a positive integer and the number of video frame sequences also being N, the frames whose frame numbers equal aN+i forming the i-th group, where a is a non-negative integer, i is a positive integer, 0 ≤ a ≤ N-1 and 1 ≤ i ≤ N.
4. The method according to claim 1, characterized in that concatenating each group of frames into a video frame sequence comprises:
concatenating each group of frames into a video frame sequence according to the order of the frame numbers of the frames.
5. The method according to claim 1, characterized in that labeling the video based on the labels of the video frame sequences comprises:
taking the most frequent label among the labels of the obtained video frame sequences as the label assigned to the video.
6. The method according to claim 5, characterized in that taking the most frequent label among the labels of the obtained video frame sequences as the label assigned to the video comprises:
if several labels are tied for the highest count, increasing the number of groups into which the decomposed frames are divided.
7. The method according to claim 6, characterized in that increasing the number of groups into which the decomposed frames are divided comprises: if the predefined rule takes a predetermined number of consecutive frames as one group, increasing the number of groups so that the frames contained in at least some of the groups partially overlap.
8. A device for automatic video labeling, characterized by comprising:
a decomposition module, configured to decompose a video into frames in response to obtaining the video;
a grouping module, configured to group the decomposed frames according to a predefined rule;
a concatenation module, configured to concatenate each group of frames into a video frame sequence;
a first labeling module, configured to input the video frame sequences into a machine learning model, the machine learning model outputting a label for each video frame sequence; and
a second labeling module, configured to label the video based on the labels of the video frame sequences,
wherein the machine learning model is trained as follows: each video frame sequence sample in a sample set is input into the machine learning model, the samples being obtained by grouping videos with various known labels according to the predefined rule and concatenating each group of frames; the label that the machine learning model outputs for a sample is compared with the known label of the video it comes from; if they are inconsistent, the coefficients of the machine learning model are adjusted until the label output by the machine learning model is consistent with the known label of the video.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1-7.
10. An electronic device, characterized by comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method according to any one of claims 1-7 by executing the executable instructions.
CN201811542198.0A 2018-12-17 2018-12-17 Method and device for automatic video labeling, medium and electronic device Pending CN109635158A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811542198.0A CN109635158A (en) 2018-12-17 2018-12-17 Method and device for automatic video labeling, medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811542198.0A CN109635158A (en) 2018-12-17 2018-12-17 Method and device for automatic video labeling, medium and electronic device

Publications (1)

Publication Number Publication Date
CN109635158A true CN109635158A (en) 2019-04-16

Family

ID=66074656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811542198.0A Pending CN109635158A (en) 2018-12-17 2018-12-17 Method and device for automatic video labeling, medium and electronic device

Country Status (1)

Country Link
CN (1) CN109635158A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2869236A1 (en) * 2013-10-31 2015-05-06 Alcatel Lucent Process for generating a video tag cloud representing objects appearing in a video content
CN106874827A (en) * 2015-12-14 2017-06-20 北京奇虎科技有限公司 Video frequency identifying method and device
CN106878632A (en) * 2017-02-28 2017-06-20 北京知慧教育科技有限公司 A kind for the treatment of method and apparatus of video data
CN108694217A (en) * 2017-04-12 2018-10-23 合信息技术(北京)有限公司 The label of video determines method and device
CN107277650A (en) * 2017-07-25 2017-10-20 中国华戎科技集团有限公司 video file cutting method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163115A (en) * 2019-04-26 2019-08-23 腾讯科技(深圳)有限公司 A kind of method for processing video frequency, device and computer readable storage medium
CN110163115B (en) * 2019-04-26 2023-10-13 腾讯科技(深圳)有限公司 Video processing method, device and computer readable storage medium
CN111144508A (en) * 2019-12-30 2020-05-12 中国矿业大学(北京) Automatic control system and control method for coal mine auxiliary shaft rail transportation
CN111491206A (en) * 2020-04-17 2020-08-04 维沃移动通信有限公司 Video processing method, video processing device and electronic equipment
CN111797801A (en) * 2020-07-14 2020-10-20 北京百度网讯科技有限公司 Method and apparatus for video scene analysis
CN111797801B (en) * 2020-07-14 2023-07-21 北京百度网讯科技有限公司 Method and apparatus for video scene analysis

Similar Documents

Publication Publication Date Title
CN109635158A (en) Method and device for automatic video labeling, medium and electronic device
US10970334B2 (en) Navigating video scenes using cognitive insights
CN105654950B (en) Adaptive voice feedback method and device
CN107430858B (en) Communicating metadata identifying a current speaker
CN110674350B (en) Video character retrieval method, medium, device and computing equipment
CN104735468B (en) A kind of method and system that image is synthesized to new video based on semantic analysis
CN109614517B (en) Video classification method, device, equipment and storage medium
CN109688463A (en) A kind of editing video generation method, device, terminal device and storage medium
US8064641B2 (en) System and method for identifying objects in video
CN109660865A (en) Make method and device, medium and the electronic equipment of video tab automatically for video
CN109919244B (en) Method and apparatus for generating a scene recognition model
US20200371741A1 (en) Electronic apparatus, document displaying method thereof and non-transitory computer readable recording medium
CN110321958A (en) Training method, the video similarity of neural network model determine method
CN108319723A (en) A kind of picture sharing method and device, terminal, storage medium
CN103052953A (en) Information processing device, method of processing information, and program
CN111309200B (en) Method, device, equipment and storage medium for determining extended reading content
US20170115853A1 (en) Determining Image Captions
CN110990598B (en) Resource retrieval method and device, electronic equipment and computer-readable storage medium
US10162879B2 (en) Label filters for large scale multi-label classification
US20200081981A1 (en) System and method for a scene builder
CN113810742A (en) Virtual gift processing method and device, electronic equipment and storage medium
CN111989930A (en) Display device and control method of display device
US20240022620A1 (en) System and method of communications using parallel data paths
CN117036827A (en) Multi-mode classification model training, video classification method, device, medium and equipment
CN114722234B (en) Music recommendation method, device and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190416