CN108154120A - Video classification model training method, device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN108154120A
Authority
CN
China
Prior art keywords
vector
sub-feature
video
target
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711420935.5A
Other languages
Chinese (zh)
Inventor
包怡欣 (Bao Yixin)
彭垚 (Peng Yao)
绍杰 (Shao Jie)
赵之健 (Zhao Zhijian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI QINIU INFORMATION TECHNOLOGIES Co Ltd
Original Assignee
SHANGHAI QINIU INFORMATION TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI QINIU INFORMATION TECHNOLOGIES Co Ltd
Priority to CN201711420935.5A
Priority to PCT/CN2018/079907 (published as WO2019127940A1)
Publication of CN108154120A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a video classification model training method, device, storage medium and electronic device. The method includes: inputting a video file into a video classification model for learning, to obtain a feature vector; dividing the feature vector into multiple sub-feature vectors; selecting one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector; and inputting the target sub-feature vector into the video classification model for training, to obtain a final video classification model. Because only an intercepted part of the feature vector is fed into the video classification model as the target sub-feature vector for training, the size of the input data, and of the data derived from it, is reduced, which reduces the number of training parameters and improves training efficiency.

Description

Video classification model training method, device, storage medium and electronic equipment
Technical field
The present invention relates to the field of video processing, and more specifically to a video classification model training method, device, storage medium and electronic equipment.
Background technology
When classifying video files, a video classification model must first be trained to obtain an optimized video classification model. Training a video classification model requires many parameters, and applying traditional algorithms directly is extremely inefficient, making the training time excessively long.
Summary of the invention
The technical problem to be solved by the present invention is to provide a video classification model training method, device, storage medium and electronic equipment that can improve training efficiency and reduce training time.
The purpose of the present invention is achieved through the following technical solutions:
In a first aspect, an embodiment of the present application provides a video classification model training method, including:
inputting a video file into a video classification model for learning, to obtain a feature vector;
dividing the feature vector into multiple sub-feature vectors;
selecting one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector;
inputting the target sub-feature vector into the video classification model for training, to obtain a final video classification model.
In a second aspect, an embodiment of the present application provides a video classification model training device, including:
a first acquisition unit, configured to input a video file into a video classification model for learning, to obtain a feature vector;
a division unit, configured to divide the feature vector into multiple sub-feature vectors;
a selection unit, configured to select one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector;
a training unit, configured to input the target sub-feature vector into the video classification model for training, to obtain a final video classification model.
In a third aspect, an embodiment of the present application provides a storage medium on which a computer program is stored; when the computer program is run on a computer, the computer performs the above video classification model training method.
In a fourth aspect, an embodiment of the present application provides an electronic device including a processor and a memory, the memory storing a computer program, and the processor being configured to perform the above video classification model training method by calling the computer program.
With the video classification model training method, device, storage medium and electronic device provided by the embodiments of the present application, a video file is input into a video classification model for learning to obtain a feature vector; the feature vector is divided into multiple sub-feature vectors; one sub-feature vector is selected from the multiple sub-feature vectors as a target sub-feature vector; and the target sub-feature vector is input into the video classification model for training to obtain a final video classification model. Because an intercepted part of the feature vector is input into the video classification model as the target sub-feature vector for training, the size of the input data, and of the data derived from it, is reduced, which reduces the number of training parameters and improves training efficiency.
Description of the drawings
The accompanying drawings needed in the following description are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a first schematic flowchart of the video classification model training method provided by an embodiment of the present application;
Fig. 2 is a schematic block diagram of the video classification model training method provided by an embodiment of the present application;
Fig. 3 is a second schematic flowchart of the video classification model training method provided by an embodiment of the present application;
Fig. 4 is a third schematic flowchart of the video classification model training method provided by an embodiment of the present application;
Fig. 5 is a fourth schematic flowchart of the video classification model training method provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of the video classification model training device provided by an embodiment of the present application.
Detailed description of the embodiments
Please refer to the drawings, in which identical reference numerals represent identical components. The principle of the present application is illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application and should not be regarded as limiting other specific embodiments not detailed herein.
In the following description, the specific embodiments of the present application are described with reference to steps and symbols of operations performed by one or more computers, unless otherwise stated. Accordingly, these steps and operations, which are at times referred to as being computer-executed, include the manipulation by the computer's processing unit of electrical signals representing data in a structured form. This manipulation transforms the data or maintains them at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures in which the data are maintained are physical locations of the memory that have particular properties defined by the format of the data. However, while the principle of the application is described in the foregoing text, this is not meant as a limitation; those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.
The term "unit" as used herein may be regarded as a software object executed on the computing system. The different components, units, engines and services described herein may be regarded as objects implemented on the computing system. The devices and methods described herein may be implemented in software, and may of course also be implemented in hardware, both of which fall within the protection scope of the present application.
The terms "comprising" and "having", and any variations thereof, in the present application are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or modules is not limited to the listed steps or modules; some embodiments further include steps or modules that are not listed, or further include other steps or modules inherent to the process, method, product or device.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
An embodiment of the present application provides a video classification model training method. The execution subject of the method may be the video classification model training device provided by an embodiment of the present application, or an electronic device integrating the video classification model training device, where the video classification model training device may be implemented in hardware or in software.
The embodiments of the present application are described from the perspective of the video classification model training device, which may specifically be integrated in an electronic device. The video classification model training method includes: inputting a video file into a video classification model for learning, to obtain a feature vector; dividing the feature vector into multiple sub-feature vectors; selecting one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector; and inputting the target sub-feature vector into the video classification model for training, to obtain a final video classification model.
The electronic device includes devices such as a smartphone, tablet computer, palmtop computer, personal computer, server and cloud server.
Please refer to Fig. 1 and Fig. 2. Fig. 1 is a first schematic flowchart of the video classification model training method provided by an embodiment of the present application, and Fig. 2 is a schematic block diagram of the method. The specific flow of the video classification model training method provided by this embodiment may be as follows:
Step 101: input a video file into a video classification model for learning, to obtain a feature vector.
The video file may be a video file in a format such as MJPEG, AVI, RMVB or 3GP; the format of the video file is not limited here.
The video classification model may be a convolutional neural network model, a recurrent neural network model, or the like. It may also be a SENet (Squeeze-and-Excitation Networks) model.
After the video file is input into the video classification model, the model obtains from the video file a feature vector corresponding to the classification information of the video file, such as scene features, character features, object features and temporal features of the video file.
Step 102: divide the feature vector into multiple sub-feature vectors.
Along one dimension of the feature vector, the feature vector is divided into multiple sub-feature vectors. For example, a 2048*200 feature vector is divided into 4 sub-feature vectors of 512*200.
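The splitting step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the patent's implementation; the function name and the zero-filled toy matrix are assumptions.

```python
# Hypothetical sketch of step 102: divide a 2048x200 feature matrix
# (2048 feature rows, 200 frame columns) into 4 sub-feature vectors
# of 512x200 along the feature (first) dimension.

def split_feature_vector(features, num_parts):
    """Split `features` (a list of feature rows) into `num_parts` equal chunks."""
    rows = len(features)
    assert rows % num_parts == 0, "feature length must divide evenly"
    chunk = rows // num_parts
    return [features[i * chunk:(i + 1) * chunk] for i in range(num_parts)]

# A toy stand-in for the 2048*200 feature vector described above.
features = [[0.0] * 200 for _ in range(2048)]
sub_vectors = split_feature_vector(features, 4)

print(len(sub_vectors))        # 4 sub-feature vectors
print(len(sub_vectors[0]))     # each holds 512 feature rows
print(len(sub_vectors[0][0]))  # each row still has 200 frame columns
```

Only one of these four chunks is later fed into training, which is what shrinks the input size.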
Step 103: select one sub-feature vector from the multiple sub-feature vectors as the target sub-feature vector.
One sub-feature vector may be chosen arbitrarily from the multiple sub-feature vectors as the target sub-feature vector: the first sub-feature vector, the last one, or any intermediate one. The choice may also be based on the data in each sub-feature vector, for example by computing the sum of each sub-feature vector's entries and taking the one with the maximum, minimum or median sum as the target, or by computing the variance of each sub-feature vector and taking the one with the smallest variance as the target.
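Two of the data-driven selection strategies mentioned above (maximum sum, minimum variance) can be sketched as follows. The function names and the toy input are illustrative assumptions, not from the patent.

```python
# Illustrative selection strategies for step 103: pick the target
# sub-feature vector by largest entry sum or by smallest variance.

def flatten(sub):
    return [x for row in sub for x in row]

def pick_by_max_sum(sub_vectors):
    """Choose the sub-feature vector whose entries have the largest sum."""
    return max(sub_vectors, key=lambda s: sum(flatten(s)))

def pick_by_min_variance(sub_vectors):
    """Choose the sub-feature vector with the smallest variance."""
    def variance(s):
        vals = flatten(s)
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)
    return min(sub_vectors, key=variance)

# Three tiny 1x2 sub-feature vectors as a toy example.
subs = [[[1.0, 2.0]], [[5.0, 5.0]], [[0.0, 9.0]]]
print(pick_by_max_sum(subs))       # [[5.0, 5.0]] (sum 10 beats 3 and 9)
print(pick_by_min_variance(subs))  # [[5.0, 5.0]] (variance 0)
```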
Step 104: input the target sub-feature vector into the video classification model for training, to obtain a final video classification model.
After the target sub-feature vector is obtained, it is input into the video classification model for training, yielding an optimized video classification model, i.e. optimizing the values of the model's parameters. Specifically, the video classification model may include a NetVLAD layer, a processing layer formed by making VLAD processing differentiable and adding it as a layer in a convolutional neural network; the NetVLAD layer can learn image encodings through back-propagation.
Please refer to Fig. 3, a second schematic flowchart of the video classification model training method provided by an embodiment of the present application. The specific flow may be as follows:
Step 201: input a video file into a video classification model for learning, to obtain a feature vector.
The video classification model may be a convolutional neural network model, a recurrent neural network model, or the like. It may also be a SENet (Squeeze-and-Excitation Networks) model.
After the video file is input into the video classification model, the model obtains from the video file a feature vector corresponding to the classification information of the video file, such as scene features, character features, object features and temporal features of the video file.
Step 202: divide the feature vector, by vector length, into multiple sequentially connected sub-feature vectors of equal vector length.
The division may be performed according to the vector length of the feature vector, yielding multiple sequentially connected sub-feature vectors of equal length. For example, a fully connected layer may be used to compress the data into a one-dimensional feature vector of length 2048, where each one-dimensional feature vector represents one frame. If 200 frames are extracted from the video file at a rate of one frame per second, the video file yields a 2048*200 feature vector. Alternatively, the number of frames to extract may be preset; the total playing time of the video file is then obtained and divided by that number to get the sampling frequency. For example, if 300 frames are to be extracted and the total playing time is 30 minutes, then dividing 30 minutes by 300 gives one frame every 6 seconds; each frame is a one-dimensional feature vector of length 2048, so the feature vector is 2048*300.
The feature vector is then divided along the vector length, i.e. the length 2048 of the one-dimensional feature vectors, into multiple sequentially connected sub-feature vectors of equal length, for example 4 sub-feature vectors of 512*200.
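The frame-sampling arithmetic in the paragraph above (preset frame count, total playing time, derived interval) can be made concrete with a minimal sketch; the function name is an assumption.

```python
# A small sketch of the sampling arithmetic in step 202: given a preset
# number of frames to extract and the video's total playing time,
# derive the interval between sampled frames.

def sampling_interval_seconds(total_play_seconds, num_frames):
    """One frame every total_play_seconds / num_frames seconds."""
    return total_play_seconds / num_frames

# 30-minute video, 300 preset frames -> one frame every 6 seconds;
# with length-2048 per-frame vectors this yields a 2048*300 feature matrix.
interval = sampling_interval_seconds(30 * 60, 300)
print(interval)  # 6.0
```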
Step 203: from the multiple sub-feature vectors, choose the first or the last sub-feature vector as the target sub-feature vector.
Some video files introduce the video's content in a clip at the beginning or at the end. Therefore, from the multiple sub-feature vectors, the first sub-feature vector, corresponding to the beginning of the video file, or the last sub-feature vector, corresponding to the end of the video file, is chosen as the target sub-feature vector.
Step 204: input the target sub-feature vector into the video classification model for training, to obtain a final video classification model.
After the target sub-feature vector is obtained, it is input into the video classification model for training, yielding an optimized video classification model, i.e. optimizing the values of the model's parameters.
In some embodiments, any one of the multiple sub-feature vectors may be chosen as the target sub-feature vector.
Please refer to Fig. 4, a third schematic flowchart of the video classification model training method provided by an embodiment of the present application. The specific flow may be as follows:
Step 301: input a video file into a video classification model for learning, to obtain a feature vector.
The video classification model may be a convolutional neural network model, a recurrent neural network model, or the like. It may also be a SENet (Squeeze-and-Excitation Networks) model.
After the video file is input into the video classification model, the model obtains from the video file a feature vector corresponding to the classification information of the video file, such as scene features, character features, object features and temporal features of the video file.
Step 3021: divide the feature vector, by vector length, into multiple contiguous feature vector segments of equal vector length.
The division may be performed according to the vector length of the feature vector, yielding multiple sequentially connected feature vector segments of equal length. For example, a fully connected layer may be used to compress the data into a one-dimensional feature vector of length 2048, where each one-dimensional feature vector represents one frame. If 200 frames are extracted from the video file at a rate of one frame per second, the video file yields a 2048*200 feature vector. Alternatively, the number of frames to extract may be preset; the total playing time of the video file is then obtained and divided by that number to get the sampling frequency. For example, if 300 frames are to be extracted and the total playing time is 30 minutes, then dividing 30 minutes by 300 gives one frame every 6 seconds; each frame is a one-dimensional feature vector of length 2048, so the feature vector is 2048*300.
The feature vector is then divided along the vector length, i.e. the length 2048 of the one-dimensional feature vectors, into multiple sequentially connected feature vector segments of equal length, for example 16 segments of 128*200.
Step 3022: combine at least two of the multiple feature vector segments into one sub-feature vector, obtaining multiple sub-feature vectors, where one of the sub-feature vectors includes the first feature vector segment and the last feature vector segment.
At least two of the feature vector segments are combined into one sub-feature vector, for example 2 segments per sub-feature vector, thereby obtaining multiple sub-feature vectors. Some video files introduce the video's content in a clip at the beginning or at the end; therefore the first feature vector segment, corresponding to the beginning of the video file, and the last feature vector segment, corresponding to its end, are merged into one sub-feature vector. This sub-feature vector may contain only the first and last segments, or it may additionally include one or more segments from other positions.
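The segment-merging idea above, combining the first and last segments so the training input covers both the beginning and the end of the video, can be sketched as follows; the function names and zero-filled toy data are assumptions, not the patent's implementation.

```python
# Hypothetical sketch of steps 3021-3022: split the feature vector into
# 16 segments of 128x200, then merge the first and last segments into
# one 256x200 sub-feature vector.

def split_into_segments(features, num_segments):
    """Divide feature rows into equal contiguous segments."""
    chunk = len(features) // num_segments
    return [features[i * chunk:(i + 1) * chunk] for i in range(num_segments)]

def merge_first_and_last(segments):
    """Concatenate the first and last segments along the feature dimension."""
    return segments[0] + segments[-1]

features = [[0.0] * 200 for _ in range(2048)]
segments = split_into_segments(features, 16)
target = merge_first_and_last(segments)

print(len(segments[0]))  # 128 rows per segment
print(len(target))       # 256 rows in the merged sub-feature vector
```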
Step 303: choose the sub-feature vector that includes the first feature vector segment and the last feature vector segment as the target sub-feature vector.
Step 304: input the target sub-feature vector into the video classification model for training, to obtain a final video classification model.
After the target sub-feature vector is obtained, it is input into the video classification model for training, yielding an optimized video classification model, i.e. optimizing the values of the model's parameters.
It should be noted that, in the above embodiments, the vector length of the target sub-feature vector is between one eighth and one half of the vector length of the feature vector. For example, if the feature vector is 2048*200, the target sub-feature vector is between 256*200 and 1024*200.
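The length constraint just stated (target length between 1/8 and 1/2 of the full feature length) is a simple bound check; a minimal sketch under an assumed helper name:

```python
# A minimal check of the stated constraint: the target sub-feature vector's
# length must fall between 1/8 and 1/2 of the original feature-vector length.

def is_valid_target_length(target_len, full_len):
    return full_len / 8 <= target_len <= full_len / 2

print(is_valid_target_length(256, 2048))   # True  (exactly 1/8)
print(is_valid_target_length(1024, 2048))  # True  (exactly 1/2)
print(is_valid_target_length(128, 2048))   # False (below 1/8)
```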
Please refer to Fig. 5, a fourth schematic flowchart of the video classification model training method provided by an embodiment of the present application. The specific flow may be as follows:
Step 401: input the video file into the front part of the video classification model for learning, to obtain a feature vector.
Step 402: divide the feature vector into multiple sub-feature vectors.
Step 403: select one sub-feature vector from the multiple sub-feature vectors as the target sub-feature vector.
Step 404: input the target sub-feature vector into the back part of the video classification model for training, to obtain a final video classification model.
The feature vector (2048*200) that the video file learns from the classification model, for example a SENet model, is evenly cut into four sections (512*200), each section serving as an independent sub-feature vector; one of these sections is then arbitrarily chosen to act as the feature vector of the entire video file in the next stage of training. This reduces the size of each feature, thereby reducing the number of training parameters and improving training efficiency. Because only an intercepted part of the feature vector enters the next layer's training as a whole, the size of each subsequent feature vector is reduced, which reduces the training parameters and improves training efficiency.
In some embodiments, a feature vector may be extracted from the video file and input into an algorithm model for learning, obtaining a weight value for each feature in the feature vector. The weight values are divided into several intervals, and the features in the feature vector are divided into multiple sub-feature vectors according to their weight values, so that each sub-feature vector includes features of different weight values and different sub-feature vectors contain equal numbers of features from the same weight-value interval.
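One way to realize the weight-balanced partition described above is to bucket features by weight interval and deal each bucket out round-robin, so every sub-feature vector receives the same number of features from each interval. All names, the bin threshold, and the toy data below are assumptions for illustration, not the patent's implementation.

```python
# Illustrative sketch of the weight-based partition: bucket features by
# weight interval, then distribute each bucket round-robin across the
# sub-feature vectors so per-interval counts stay equal.

def partition_by_weight(features, weights, bin_edges, num_subs):
    """features[i] has weight weights[i]; bin_edges define weight intervals."""
    def bin_index(w):
        for b, edge in enumerate(bin_edges):
            if w < edge:
                return b
        return len(bin_edges)

    buckets = {}
    for feat, w in zip(features, weights):
        buckets.setdefault(bin_index(w), []).append(feat)

    subs = [[] for _ in range(num_subs)]
    for bucket in buckets.values():
        for i, feat in enumerate(bucket):  # round-robin within each interval
            subs[i % num_subs].append(feat)
    return subs

feats = list(range(8))                        # toy stand-ins for 8 features
wts = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
subs = partition_by_weight(feats, wts, bin_edges=[0.5], num_subs=2)
print([len(s) for s in subs])  # [4, 4]: each sub gets 2 low- and 2 high-weight features
```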
In some embodiments, consecutive frames may be extracted from the video file, and each frame classified in an algorithm model to form a first feature group representing object categories and a second feature group representing scene categories. The first and second feature groups are fused into a third one-dimensional vector, which serves as the initial feature vector of the above embodiments; training then proceeds from this initial feature vector. For example, the feature vector obtained in step 101 is this third one-dimensional feature vector, which is divided into multiple sub-feature vectors according to scene categories and object categories, each sub-feature vector containing equal numbers of scene-category and object-category features.
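The fusion step above, joining a per-frame object-category feature group and a scene-category feature group into one one-dimensional vector, is essentially a concatenation; a minimal sketch with assumed names and sizes:

```python
# A minimal sketch of the feature-fusion step: concatenate an object-category
# feature group and a scene-category feature group for one frame into a
# single one-dimensional fused vector.

def fuse_features(object_features, scene_features):
    """Concatenate the two groups into one fused one-dimensional vector."""
    return object_features + scene_features

object_group = [0.2, 0.7, 0.1]   # e.g. object-class scores for one frame
scene_group = [0.6, 0.4]         # e.g. scene-class scores for the same frame
fused = fuse_features(object_group, scene_group)
print(len(fused))  # 5 -- the fused vector carries both feature groups
print(fused)       # [0.2, 0.7, 0.1, 0.6, 0.4]
```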
From the foregoing, in the video classification model training method provided by the embodiments of the present application, a video file is input into a video classification model for learning to obtain a feature vector; the feature vector is divided into multiple sub-feature vectors; one sub-feature vector is selected from the multiple sub-feature vectors as a target sub-feature vector; and the target sub-feature vector is input into the video classification model for training to obtain a final video classification model. Because an intercepted part of the feature vector is input into the video classification model as the target sub-feature vector for training, the size of the input data, and of the data derived from it, is reduced, which reduces the number of training parameters and improves training efficiency.
Please refer to Fig. 6, a schematic structural diagram of the video classification model training device provided by an embodiment of the present application. The video classification model training device 500 includes a first acquisition unit 501, a division unit 502, a selection unit 503 and a training unit 504, where:
The first acquisition unit 501 is configured to input a video file into a video classification model for learning, to obtain a feature vector.
The video file may be a video file in a format such as MJPEG, AVI, RMVB or 3GP; the format of the video file is not limited here.
The video classification model may be a convolutional neural network model, a recurrent neural network model, or the like. It may also be a SENet (Squeeze-and-Excitation Networks) model.
After the video file is input into the video classification model, the model obtains from the video file a feature vector corresponding to the classification information of the video file, such as scene features, character features, object features and temporal features of the video file.
The division unit 502 is configured to divide the feature vector into multiple sub-feature vectors.
Along one dimension of the feature vector, the feature vector is divided into multiple sub-feature vectors. For example, a 2048*200 feature vector is divided into 4 sub-feature vectors of 512*200.
The selection unit 503 is configured to select one sub-feature vector from the multiple sub-feature vectors as the target sub-feature vector.
One sub-feature vector may be chosen arbitrarily from the multiple sub-feature vectors as the target sub-feature vector: the first sub-feature vector, the last one, or any intermediate one. The choice may also be based on the data in each sub-feature vector, for example by computing the sum of each sub-feature vector's entries and taking the one with the maximum, minimum or median sum as the target, or by computing the variance of each sub-feature vector and taking the one with the smallest variance as the target.
The training unit 504 is configured to input the target sub-feature vector into the video classification model for training, to obtain a final video classification model.
After the target sub-feature vector is obtained, it is input into the video classification model for training, yielding an optimized video classification model, i.e. optimizing the values of the model's parameters.
In some embodiments, the division unit 502 is further configured to divide the feature vector, by vector length, into multiple sequentially connected sub-feature vectors of equal vector length.
The division may be performed according to the vector length of the feature vector, yielding multiple sequentially connected sub-feature vectors of equal length. For example, a fully connected layer may be used to compress the data into a one-dimensional feature vector of length 2048, where each one-dimensional feature vector represents one frame. If 200 frames are extracted from the video file at a rate of one frame per second, the video file yields a 2048*200 feature vector. Alternatively, the number of frames to extract may be preset; the total playing time of the video file is then obtained and divided by that number to get the sampling frequency. For example, if 300 frames are to be extracted and the total playing time is 30 minutes, then dividing 30 minutes by 300 gives one frame every 6 seconds; each frame is a one-dimensional feature vector of length 2048, so the feature vector is 2048*300.
The feature vector is then divided along the vector length, i.e. the length 2048 of the one-dimensional feature vectors, into multiple sequentially connected sub-feature vectors of equal length, for example 4 sub-feature vectors of 512*200.
The selection unit 503 is further configured to choose, from the multiple sub-feature vectors, the first or the last sub-feature vector as the target sub-feature vector.
Some video files introduce the video's content in a clip at the beginning or at the end. Therefore, from the multiple sub-feature vectors, the first sub-feature vector, corresponding to the beginning of the video file, or the last sub-feature vector, corresponding to the end of the video file, is chosen as the target sub-feature vector.
In some embodiments, the selection unit 503 is further configured to choose any one of the multiple sub-feature vectors as the target sub-feature vector.
In some embodiments, division unit 502 are additionally operable to feature vector being divided into multiple vectors by vector length Equal length and continuous multiple feature vector sections;At least two feature vector sections in multiple feature vector sections are formed one A sub- feature vector obtains multiple subcharacter vectors, and one of subcharacter vector is including first feature vector section and finally One feature vector section.
It can be divided according to the vector length of eigen vector, it is equal that division obtains multiple vector lengths, and is sequentially connected Multiple feature vector sections.It is, for example, possible to use data are carried out the one-dimensional characteristic that compression formation length is 2048 by full articulamentum Vector, one-dimensional characteristic vector represent a frame image.Therefore, it is such as extracted with the frequency of one frame image of extraction per second from video file 200 frame images, then video file can extract the feature vector of one group of 2048*200.The figure for needing to extract can also be preset As quantity, total reproduction time of video file is then obtained, then total reproduction time divided by amount of images, obtain obtaining a frame figure The frequency of picture, for example, preset need extract 300 frame images, the total reproduction time of video file be 30 minutes, then remove within 30 minutes With 300, obtain obtaining the frequency of a frame image to obtain within every 6 seconds a frame image, each frame image be length be 2048 it is one-dimensional Feature vector, and then feature vector is 2048*300.
The feature vector is then divided along the vector length, i.e. the one-dimensional length of 2048, into multiple segments of equal vector length connected in sequence, for example into 16 feature vector segments of 128*200.
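Splitting along the vector-length axis can be sketched with NumPy (an illustrative sketch; the patent does not prescribe an implementation):

```python
import numpy as np

def split_by_vector_length(features, num_segments):
    """Split a (vector_length, num_frames) feature matrix along the
    vector-length axis into equal, consecutive segments."""
    return np.split(features, num_segments, axis=0)

features = np.arange(2048 * 200, dtype=np.float32).reshape(2048, 200)
segments = split_by_vector_length(features, 16)  # 16 segments of 128*200
```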
At least two of the multiple feature vector segments are combined into one sub-feature vector; for example, two feature vector segments form one sub-feature vector, yielding multiple sub-feature vectors. Some video files introduce a clip summarizing the video at the beginning or at the end, so from the multiple feature vector segments, the first segment (corresponding to the beginning of the video file) and the last segment (corresponding to the end of the video file) are merged into one sub-feature vector. In addition, this sub-feature vector may contain only the first and last feature vector segments, or it may also include feature vector segments from one or more other positions.
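Merging the first and last segments into one sub-feature vector might look like this (names are ours; a 16-way split of a 2048*200 feature vector is assumed, matching the example above):

```python
import numpy as np

def head_tail_subfeature(segments):
    """Merge the first segment (video beginning) and the last segment
    (video ending) into one sub-feature vector."""
    return np.concatenate([segments[0], segments[-1]], axis=0)

features = np.random.default_rng(0).random((2048, 200))
segments = np.split(features, 16, axis=0)  # 16 segments of 128*200
target = head_tail_subfeature(segments)    # a 256*200 sub-feature vector
```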
The selection unit 503 is further configured to choose the sub-feature vector that includes the first feature vector segment and the last feature vector segment as the target sub-feature vector.
In some embodiments, the vector length of the target sub-feature vector is between one eighth and one half of the vector length of the feature vector. For example, if the feature vector is 2048*200, the target sub-feature vector is between 256*200 and 1024*200.
In some embodiments, the classification model includes a front section and a back section. The first acquisition unit 501 is further configured to input the video file into the front section of the video classification model for learning to obtain the feature vector. The training unit 504 is further configured to input the target sub-feature vector into the back section of the video classification model for training.
For example, the feature vector (2048*200) learned from the video file by a classification model such as a SENet model is evenly cut into four segments (512*200), each segment serving as an independent sub-feature vector, and one of these segments is arbitrarily chosen as the feature vector of the entire video file for the next stage of training. Truncating part of the feature vector and feeding it as a whole into the next layer reduces the size of each subsequent feature vector, thereby reducing the training parameters and improving training efficiency.
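The four-way split with an arbitrary pick could be sketched as follows (the random choice stands in for "arbitrarily chosen"; the seed is our addition for reproducibility, and the feature shape follows the SENet example above):

```python
import numpy as np

def pick_target_subfeature(features, num_segments=4, rng=None):
    """Evenly cut the feature vector along its vector length and
    arbitrarily pick one segment as the target sub-feature vector."""
    if rng is None:
        rng = np.random.default_rng(0)  # seeded only for reproducibility
    segments = np.split(features, num_segments, axis=0)
    return segments[rng.integers(num_segments)]

features = np.random.default_rng(1).random((2048, 200))  # assumed SENet features
target = pick_target_subfeature(features)                # one 512*200 segment
```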
As can be seen from the above, the video classification model training apparatus provided by the embodiments of the present application inputs a video file into a video classification model for learning to obtain a feature vector; divides the feature vector into multiple sub-feature vectors; chooses one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector; and inputs the target sub-feature vector into the video classification model for training to obtain a final video classification model. A truncated part of the feature vector is input into the video classification model as the target sub-feature vector for training, which reduces the size of the input data and of the data derived from it, thereby reducing the training parameters and improving training efficiency.
In specific implementation, the above units may each be implemented as an independent entity, or may be combined arbitrarily and implemented as one or several entities. For the specific implementation of each unit, reference may be made to the foregoing method embodiments, which are not repeated here.
In the embodiments of the present application, the video classification model training apparatus and the video classification model training method in the foregoing embodiments belong to the same concept. Any method provided in the video classification model training method embodiments can be run on the video classification model training apparatus; for the specific implementation process, refer to the embodiments of the video classification model training method, which are not repeated here.
An embodiment of the present application further provides an electronic device. The electronic device includes a processor and a memory, where the processor is electrically connected to the memory.
The processor is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or loading computer programs stored in the memory and calling data stored in the memory, thereby monitoring the electronic device as a whole.
The memory may be used to store software programs and units. The processor executes various functional applications and data processing by running the computer programs and units stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and a computer program required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the electronic device. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage component. Correspondingly, the memory may also include a memory controller to provide the processor with access to the memory.
In the embodiments of the present application, the processor in the electronic device loads instructions corresponding to the processes of one or more computer programs into the memory according to the following steps, and the processor runs the computer programs stored in the memory to implement various functions, as follows:
inputting a video file into a video classification model for learning to obtain a feature vector;
dividing the feature vector into multiple sub-feature vectors;
choosing one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector;
inputting the target sub-feature vector into the video classification model for training to obtain a final video classification model.
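The four steps above can be sketched end to end (the front and back sections of the model are placeholders, since the patent leaves the network architecture open; the frame shape is an assumption for illustration):

```python
import numpy as np

def front_section(video_frames):
    # Placeholder for the front section of the classification model
    # (e.g. a CNN backbone): maps frames to a (2048, num_frames) matrix.
    return np.zeros((2048, video_frames.shape[0]))

def back_section_train(target):
    # Placeholder for training the back section on the reduced input.
    return {"input_shape": target.shape}

def train_video_classifier(video_frames, num_segments=4, pick=0):
    features = front_section(video_frames)               # learn feature vector
    segments = np.split(features, num_segments, axis=0)  # divide into sub-features
    target = segments[pick]                              # choose target sub-feature
    return back_section_train(target)                    # train on the target

model = train_video_classifier(np.zeros((200, 224, 224, 3)))
```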
An embodiment of the present application further provides a storage medium storing a computer program. When the computer program runs on a computer, the computer is caused to perform the video classification model training method in any of the above embodiments, for example: inputting a video file into a video classification model for learning to obtain a feature vector; dividing the feature vector into multiple sub-feature vectors; choosing one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector; and inputting the target sub-feature vector into the video classification model for training to obtain a final video classification model.
In the embodiments of the present application, the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It should be noted that, for the video classification model training method of the embodiments of the present application, a person of ordinary skill in the art can understand that all or part of the flow of the video classification model training method of the embodiments of the present application can be completed by a computer program controlling the relevant hardware. The computer program can be stored in a computer-readable storage medium, for example in the memory of the electronic device, and executed by at least one processor in the electronic device; the execution process may include the flow of the embodiments of the video classification model training method. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
The foregoing is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. For a person of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions can be made without departing from the concept of the present invention, and all of these shall be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A video classification model training method, comprising:
inputting a video file into a video classification model for learning to obtain a feature vector;
dividing the feature vector into multiple sub-feature vectors;
choosing one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector;
inputting the target sub-feature vector into the video classification model for training to obtain a final video classification model.
2. The video classification model training method of claim 1, wherein the classification model comprises a front section and a back section;
the step of inputting a video file into a video classification model for learning comprises:
inputting the video file into the front section of the video classification model for learning to obtain the feature vector;
the step of inputting the target sub-feature vector into the video classification model for training comprises:
inputting the target sub-feature vector into the back section of the video classification model for training.
3. The video classification model training method of claim 1, wherein the step of dividing the feature vector into multiple sub-feature vectors comprises:
dividing the feature vector, by vector length, into multiple sub-feature vectors of equal vector length connected in sequence;
the step of choosing one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector comprises:
choosing, from the multiple sub-feature vectors, the first or the last sub-feature vector as the target sub-feature vector.
4. The video classification model training method of claim 1, wherein the step of dividing the feature vector into multiple sub-feature vectors comprises:
dividing the feature vector, by vector length, into multiple feature vector segments of equal vector length arranged consecutively;
combining at least two of the multiple feature vector segments into one sub-feature vector to obtain multiple sub-feature vectors, one of which comprises the first feature vector segment and the last feature vector segment;
the step of choosing one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector comprises:
choosing the sub-feature vector comprising the first feature vector segment and the last feature vector segment as the target sub-feature vector.
5. The video classification model training method of any one of claims 1 to 4, wherein the vector length of the target sub-feature vector is between one eighth and one half of the vector length of the feature vector.
6. A video classification model training apparatus, comprising:
a first acquisition unit, configured to input a video file into a video classification model for learning to obtain a feature vector;
a division unit, configured to divide the feature vector into multiple sub-feature vectors;
a selection unit, configured to choose one sub-feature vector from the multiple sub-feature vectors as a target sub-feature vector;
a training unit, configured to input the target sub-feature vector into the video classification model for training to obtain a final video classification model.
7. The video classification model training apparatus of claim 6, wherein the classification model comprises a front section and a back section;
the first acquisition unit is further configured to input the video file into the front section of the video classification model for learning to obtain the feature vector;
the training unit is further configured to input the target sub-feature vector into the back section of the video classification model for training.
8. The video classification model training apparatus of claim 6, wherein
the division unit is further configured to divide the feature vector, by vector length, into multiple sub-feature vectors of equal vector length connected in sequence;
the selection unit is further configured to choose, from the multiple sub-feature vectors, the first or the last sub-feature vector as the target sub-feature vector.
9. A storage medium storing a computer program, wherein when the computer program runs on a computer, the computer is caused to perform the video classification model training method of any one of claims 1 to 5.
10. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program, and the processor is configured to perform the video classification model training method of any one of claims 1 to 5 by calling the computer program.
CN201711420935.5A 2017-12-25 2017-12-25 video classification model training method, device, storage medium and electronic equipment Pending CN108154120A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711420935.5A CN108154120A (en) 2017-12-25 2017-12-25 video classification model training method, device, storage medium and electronic equipment
PCT/CN2018/079907 WO2019127940A1 (en) 2017-12-25 2018-03-21 Video classification model training method, device, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711420935.5A CN108154120A (en) 2017-12-25 2017-12-25 video classification model training method, device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN108154120A true CN108154120A (en) 2018-06-12

Family

ID=62465816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711420935.5A Pending CN108154120A (en) 2017-12-25 2017-12-25 video classification model training method, device, storage medium and electronic equipment

Country Status (2)

Country Link
CN (1) CN108154120A (en)
WO (1) WO2019127940A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165722A (en) * 2018-07-09 2019-01-08 北京市商汤科技开发有限公司 Model expansion method and device, electronic equipment and storage medium
CN109214399A (en) * 2018-10-12 2019-01-15 清华大学深圳研究生院 An improved YOLOv3 target recognition algorithm embedding the SENet structure
CN109614517A (en) * 2018-12-04 2019-04-12 广州市百果园信息技术有限公司 Video classification method, device, equipment and storage medium
CN110175266A (en) * 2019-05-28 2019-08-27 复旦大学 A cross-modal retrieval method for multi-segment video

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102360434A (en) * 2011-10-09 2012-02-22 江苏大学 Target classification method of vehicle and pedestrian in intelligent traffic monitoring
US20120076401A1 (en) * 2010-09-27 2012-03-29 Xerox Corporation Image classification employing image vectors compressed using vector quantization
CN105912611A (en) * 2016-04-05 2016-08-31 中国科学技术大学 CNN based quick image search method
CN106101831A (en) * 2016-07-15 2016-11-09 合网络技术(北京)有限公司 video vectorization method and device
CN106650617A (en) * 2016-11-10 2017-05-10 江苏新通达电子科技股份有限公司 Pedestrian abnormity identification method based on probabilistic latent semantic analysis
CN107341452A (en) * 2017-06-20 2017-11-10 东北电力大学 Human bodys' response method based on quaternary number space-time convolutional neural networks

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6915009B2 (en) * 2001-09-07 2005-07-05 Fuji Xerox Co., Ltd. Systems and methods for the automatic segmentation and clustering of ordered information
CN102930294A (en) * 2012-10-18 2013-02-13 上海交通大学 Chaotic characteristic parameter-based motion mode video segmentation and traffic condition identification method
CN103218608B (en) * 2013-04-19 2017-05-10 中国科学院自动化研究所 Network violent video identification method
CN105512631B (en) * 2015-12-07 2019-01-25 上海交通大学 Violence and terror video detection method based on MoSIFT and CSD features

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120076401A1 (en) * 2010-09-27 2012-03-29 Xerox Corporation Image classification employing image vectors compressed using vector quantization
CN102360434A (en) * 2011-10-09 2012-02-22 江苏大学 Target classification method of vehicle and pedestrian in intelligent traffic monitoring
CN105912611A (en) * 2016-04-05 2016-08-31 中国科学技术大学 CNN based quick image search method
CN106101831A (en) * 2016-07-15 2016-11-09 合网络技术(北京)有限公司 video vectorization method and device
CN106650617A (en) * 2016-11-10 2017-05-10 江苏新通达电子科技股份有限公司 Pedestrian abnormity identification method based on probabilistic latent semantic analysis
CN107341452A (en) * 2017-06-20 2017-11-10 东北电力大学 Human bodys' response method based on quaternary number space-time convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
卓柳: "Research on an automatic calibration system for standard reference planes in fetal facial three-dimensional ultrasound", China Master's Theses Full-text Database, Medicine & Health Sciences *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165722A (en) * 2018-07-09 2019-01-08 北京市商汤科技开发有限公司 Model expansion method and device, electronic equipment and storage medium
CN109165722B (en) * 2018-07-09 2021-07-23 北京市商汤科技开发有限公司 Model expansion method and device, electronic equipment and storage medium
CN109214399A (en) * 2018-10-12 2019-01-15 清华大学深圳研究生院 An improved YOLOv3 target recognition algorithm embedding the SENet structure
CN109614517A (en) * 2018-12-04 2019-04-12 广州市百果园信息技术有限公司 Classification method, device, equipment and the storage medium of video
CN109614517B (en) * 2018-12-04 2023-08-01 广州市百果园信息技术有限公司 Video classification method, device, equipment and storage medium
CN110175266A (en) * 2019-05-28 2019-08-27 复旦大学 A method of it is retrieved for multistage video cross-module state
CN110175266B (en) * 2019-05-28 2020-10-30 复旦大学 Cross-modal retrieval method for multi-segment video

Also Published As

Publication number Publication date
WO2019127940A1 (en) 2019-07-04

Similar Documents

Publication Publication Date Title
Rochan et al. Video summarization using fully convolutional sequence networks
Sun et al. Lattice long short-term memory for human action recognition
CN111209440B (en) Video playing method, device and storage medium
CN110147711A (en) Video scene recognition methods, device, storage medium and electronic device
CN108090203A (en) Video classification methods, device, storage medium and electronic equipment
CN110990631A (en) Video screening method and device, electronic equipment and storage medium
Wang et al. Dynamic attention guided multi-trajectory analysis for single object tracking
Hou et al. Content-attention representation by factorized action-scene network for action recognition
CN108154120A (en) video classification model training method, device, storage medium and electronic equipment
Hii et al. Multigap: Multi-pooled inception network with text augmentation for aesthetic prediction of photographs
CN112200041B (en) Video motion recognition method and device, storage medium and electronic equipment
CN110688524A (en) Video retrieval method and device, electronic equipment and storage medium
CN111539290A (en) Video motion recognition method and device, electronic equipment and storage medium
CN109086697A (en) A kind of human face data processing method, device and storage medium
Wang et al. Multiscale deep alternative neural network for large-scale video classification
CN111783712A (en) Video processing method, device, equipment and medium
CN113761359B (en) Data packet recommendation method, device, electronic equipment and storage medium
Su et al. Transfer learning for video recognition with scarce training data for deep convolutional neural network
CN108133020A (en) Video classification methods, device, storage medium and electronic equipment
CN113779303A (en) Video set indexing method and device, storage medium and electronic equipment
CN114637923A (en) Data information recommendation method and device based on hierarchical attention-graph neural network
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN111310041A (en) Image-text publishing method, model training method and device and storage medium
WO2022183805A1 (en) Video classification method, apparatus, and device
Chiang et al. A multi-embedding neural model for incident video retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180612