CN110414344A - A video-based person classification method, intelligent terminal and storage medium - Google Patents
- Publication number
- CN110414344A (application CN201910553048.8A)
- Authority
- CN
- China
- Prior art keywords
- target person
- image block
- classification
- video
- video frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a video-based person classification method, an intelligent terminal and a storage medium. The method comprises: obtaining a video frame image to be detected, and extracting the image block containing a target person from the video frame image; inputting the image block into a classification network model to obtain a preliminary classification result and an attention weight for the target person in the image block; and obtaining a final classification result for the target person from the preliminary classification result and the attention weight, and classifying the target persons contained in the video frame image according to the final classification result. The method extracts the image block of each target person to be detected with a region extraction module and classifies the target person with a classification network model, and combines the attention weights learned by the network with the initial prediction, increasing the contribution of distinctive parts to the final classification result, so that the classification of persons in video is more accurate.
Description
Technical field
The present invention relates to the technical field of image recognition, and more particularly to a video-based person classification method, intelligent terminal and storage medium.
Background art
In recent years, with the development of the internet and the entertainment industry, the number of videos has grown rapidly, and the demand for content-based video understanding and retrieval keeps rising. Within video understanding, person detection is an important research topic.
Because of differences in camera angle, complex illumination conditions, changes in facial expression and occlusion, detecting persons in video is highly challenging. Related techniques include object detection and person re-identification. Object detection takes an image and predicts the coordinates and class of the objects or persons of the classes to be detected in the image, while person re-identification aims to classify and retrieve the persons in an image. Although both methods achieve good results in their respective fields, in person detection for video the similarity between persons is high, so object detection often misclassifies them, resulting in low person classification accuracy.
Therefore, the prior art still needs improvement.
Summary of the invention
In view of the above shortcomings of the prior art, the purpose of the present invention is to provide a video-based person classification method, intelligent terminal and storage medium that overcome the low classification accuracy caused by the high similarity between persons in existing video person detection.
A first embodiment disclosed by the invention is a video-based person classification method, comprising the following steps:
obtaining a video frame image to be detected, and extracting the image block containing a target person from the video frame image;
inputting the image block into a classification network model to obtain a preliminary classification result and an attention weight for the target person in the image block, the classification network model being trained on the correspondence between image blocks of target persons and the preliminary classification results and attention weights of the target persons in those image blocks;
obtaining a final classification result for the target person from the preliminary classification result and the attention weight of the target person in the image block, and classifying the target persons contained in the video frame image according to the final classification result.
In the video-based person classification method, the step of extracting the image block containing the target person from the video frame image specifically comprises:
inputting the video frame image into a region extraction network model and extracting the image block containing the target person from the video frame image, the region extraction network model being trained on the correspondence between input video frame images and the image blocks of the target persons in those images.
In the video-based person classification method, the classification network model comprises a first convolutional layer, a pooling layer and a second convolutional layer containing multiple sub-convolutional layers;
the step of inputting the image block into the classification network model and obtaining the preliminary classification result and attention weight of the target person in the image block specifically comprises:
inputting the image block into the first convolutional layer and extracting the feature map of the image block;
inputting the feature map into the pooling layer to obtain multiple feature vectors of the feature map;
inputting each feature vector into each sub-convolutional layer separately to obtain the preliminary classification result and attention weight of the target person in the image block.
In the video-based person classification method, the second convolutional layer comprises a first sub-convolutional layer, a second sub-convolutional layer, a classifier and a regression network;
the step of inputting each feature vector into each sub-convolutional layer separately and obtaining the preliminary classification result and attention weight corresponding to each feature vector specifically comprises:
inputting each feature vector in turn into the first sub-convolutional layer and the second sub-convolutional layer, and outputting the first dimensional feature and the second dimensional feature corresponding to each feature vector;
inputting the first dimensional feature into the classifier to obtain the preliminary classification result of the target person in the image block;
inputting the second dimensional feature into the regression network to obtain the attention weight of the target person in the image block.
In the video-based person classification method, the step of obtaining the final classification result of the target person from the preliminary classification result and attention weight of the target person in the image block, and classifying the target persons contained in the video frame image according to the final classification result, specifically comprises:
multiplying the preliminary classification result of the target person in the image block by the attention weight to obtain the final classification result of the target person;
choosing the class with the maximum final classification value of the target person as the classification label of the target person contained in the video frame image.
In the video-based person classification method, the region extraction network model comprises a first extraction layer and a second extraction layer;
the step of inputting the video frame image into the region extraction network model and extracting the image block containing the target person from the video frame image specifically comprises:
inputting the video frame image into the first extraction layer to obtain the feature map corresponding to the detection boxes containing target persons;
inputting the feature map corresponding to the detection boxes containing target persons into the second extraction layer, and extracting the image block containing the target person from the video frame image.
In the video-based person classification method, before the step of inputting the video frame image into the region extraction network model and extracting the image block containing the target person from the video frame image, the method further comprises:
obtaining a training image set containing target persons, and annotating the true classes and true coordinates of the target persons in the training image set;
inputting the training image set into the region extraction network model, and obtaining the classes and coordinates of the target persons predicted by the network through a forward propagation algorithm;
comparing, through a loss function, the annotated true classes and true coordinates of the target persons with the classes and coordinates predicted by the network to obtain the prediction error;
training the region extraction network model with the prediction error through a back-propagation algorithm.
In the video-based person classification method, the loss function is:
L = (1/N_arm) Σ_i L_b(p_i, p_i*) + (1/N_odm) Σ_i [p_i* ≥ 1] L_r(x_i, x_i*)
where i is the index of a detection box during training, p_i* is the true class of the target person in the i-th detection box, x_i* is the true coordinate of the target person in the i-th detection box, p_i is the network-predicted class of the target person in the i-th detection box, x_i is the network-predicted coordinate of the target person in the i-th detection box, N_arm and N_odm are the total numbers of detection boxes containing persons to be detected in the two layers of the region extraction network model, L_b is a cross-entropy loss function, and L_r is a regression loss function.
An intelligent terminal, comprising a processor and a storage medium communicatively connected with the processor, the storage medium being suitable for storing a plurality of instructions, and the processor being suitable for calling the instructions in the storage medium to execute the steps of any of the video-based person classification methods described above.
A storage medium, on which a control program of the video-based person classification method is stored, the control program implementing the steps of any of the video-based person classification methods described above when executed by a processor.
Beneficial effects: the present invention provides a video-based person classification method, intelligent terminal and storage medium. The image block of each target person to be detected is extracted by a region extraction module, and the features of the image block are extracted and the target person classified by a classification and detection module, so that the position detection and the classification of the target person are separated. An attention mechanism is introduced into the classification process: attention weights are learned by the network and combined with the initial prediction, increasing the contribution of distinctive parts to the final classification result, so that the classification of persons in video is more accurate.
Brief description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the video-based person classification method provided by the present invention;
Fig. 2 is a functional schematic diagram of the intelligent terminal of the invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is further described below in conjunction with the drawings and embodiments. It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The video-based person classification method provided by the invention can be applied in a terminal, where the terminal can be, but is not limited to, a personal computer, laptop, mobile phone, tablet computer, vehicle-mounted computer or portable wearable device. The terminal of the invention uses a multi-core processor, and the processor of the terminal can be at least one of a central processing unit (CPU), a graphics processing unit (GPU), a video processing unit (VPU), and so on.
To solve the prior-art problem that, when classifying target persons in video, the high similarity between persons often causes target person detection to misclassify them, resulting in low classification accuracy, the present invention provides a video-based person classification method.
Please refer to Fig. 1, which is a flow chart of a preferred embodiment of the video-based person classification method provided by the invention.
In embodiment 1, the video-based person classification method has three steps:
S100, obtaining a video frame image to be detected, and extracting the image block containing the target person from the video frame image.
The video to be detected is the video to be processed by the video-based person classification method, for example the video recorded by a certain monitor or a certain segment of television video. Since a video is displayed as a continuous sequence of frames, when classifying persons in this embodiment the images of the video to be detected need to be extracted from it in advance. Methods of extracting images from video are mature in the art, for example obtaining each frame image of the video through a decoder or codec, so this is not repeated in the present application.
In specific implementation, since the video to be detected consists of a continuous sequence of frames, some of its images contain target persons and some do not. In order to classify the target persons in the video, this embodiment extracts the image blocks containing target persons from the obtained video images to be detected. The target person may be, for example, a suspect wanted by the police or a certain character in a television drama.
In specific implementation, a region extraction network model needs to be established in advance for extracting the image blocks of target persons. The region extraction network model can be built on a common object detection framework such as RefineDet, SSD or Faster R-CNN. After the video frame image to be detected is obtained, the video frame image is input into the region extraction network model, and the image block containing the target person is extracted from the video frame image.
In the prior art, target person detection is mainly accomplished by deep learning, and learning is a gradual process. During detection, the network generally generates thousands of background boxes, while the detection boxes containing target persons are generally few, so during training the network tends to be biased toward judging boxes as background. Although down-sampling of the background boxes has been used, the network cannot focus on learning class labels and coordinates at the same time, so existing methods do not solve this problem completely. Therefore, the region extraction network model in this embodiment comprises a first extraction layer and a second extraction layer: the first extraction layer makes a preliminary prediction of the target person label, and the second extraction layer regresses the target person coordinates and makes a more accurate prediction of the target person label. By learning the target person label and the target person coordinates in separate parts, the region extraction network model improves detection accuracy.
In specific implementation, before the step of inputting the video frame image into the region extraction network model and extracting the image block containing the person from the video frame image, the method further comprises:
S100a, obtaining a training image set containing target persons, and annotating the true classes and true coordinates of the target persons in the training image set;
S100b, inputting the training image set into the region extraction network model, and obtaining the classes and coordinates of the target persons predicted by the network through a forward propagation algorithm;
S100c, comparing, through a loss function, the annotated true classes and true coordinates of the target persons with the classes and coordinates of the target persons predicted by the network to obtain the prediction error;
S100d, training the region extraction network model with the prediction error through a back-propagation algorithm.
In specific implementation, this embodiment prepares in advance a training image set containing target persons and annotates, with an annotation tool, the true classes and true coordinates of the target persons in the training image set. After annotation, the training image set is input into the region extraction network model, where it first passes through the first extraction layer for coarse extraction. Specifically, the convolutional layers in the first extraction layer mark detection boxes at the positions of target persons on every frame image of the training set, and the forward propagation algorithm coarsely adjusts the coordinates, scale and positive/negative class of each detection box (a positive class indicates a target person is contained, a negative class indicates no target person is contained). The positions and class information of all detection boxes predicted positive are then passed to the second extraction layer, which performs further, more accurate extraction on the basis of the first. Specifically, the positive detection boxes obtained from the coarse extraction of the first extraction layer and their corresponding feature maps are input into the second extraction layer, whose convolutional layers perform feature conversion on the input feature maps and add the constraints of positive/negative class and detection-box class to the converted feature maps, finally outputting the classes and coordinates of the target persons predicted by the network. In this embodiment the region extraction network model comprises a first extraction layer and a second extraction layer: the first makes preliminary label predictions, which lets the second focus on regressing the target person coordinates while predicting the label more accurately; the two extraction layers cooperate to improve the accuracy of extracting target person image blocks.
Specifically, forward propagation proceeds layer by layer from front to back through all convolutional layers of the first extraction layer, and each layer computes:
x_i = f(w_{i-1} ⊛ x_{i-1})
where x_{i-1} is the input of the current layer, w_{i-1} the network parameters of the current layer, ⊛ the convolution operation and x_i the output of the current layer, and f is the ReLU function, defined as:
ReLU(x) = max(0, x)
Further, after the training image set is obtained in the preceding step, the true classes and true coordinates of the target persons in the training set can be annotated manually; after the classes and coordinates of the target persons predicted by the network are obtained, the manually annotated true classes and true coordinates of the target persons are compared, through the loss function, with the classes and coordinates predicted by the network. The manually annotated true coordinate of a target person is the learning target of the network-predicted coordinate, and as training progresses the predicted coordinate values come closer and closer to the manually annotated true coordinate values. Specifically, the formula of the loss function is:
L = (1/N_arm) Σ_i L_b(p_i, p_i*) + (1/N_odm) Σ_i [p_i* ≥ 1] L_r(x_i, x_i*)
where i is the index of a detection box during training, p_i* is the true class of the target person in the i-th detection box, x_i* is the true coordinate of the target person in the i-th detection box, p_i is the network-predicted class of the target person in the i-th detection box, x_i is the network-predicted coordinate of the target person in the i-th detection box, N_arm and N_odm are the total numbers of detection boxes containing persons to be detected in the two layers of the region extraction network model, L_b is a cross-entropy loss function, and L_r is a regression loss function. The loss function in the region extraction network of this embodiment is a foreground/background binary classification loss; a Softmax multi-class loss can also be used to train the network, which is not limited in the present application.
In specific implementation, L_b is a cross-entropy loss function, defined for the foreground/background case as:
L_b(p, p*) = −p* log p − (1 − p*) log(1 − p)
where p is the predicted foreground probability and p* the true class. L_r is a regression loss function, which can use the L1 loss or the L2 loss; preferably, this embodiment uses the L1 loss, defined as L1(x_1, x_2) = |x_1 − x_2|. The indicator [p_i* ≥ 1] in the loss function is 1 when the condition in brackets holds and 0 otherwise.
Further, the manually annotated true classes and true coordinates of the target persons are compared with the network-predicted classes and coordinates through the loss function to obtain the network prediction error, and the prediction error is then used to train the region extraction network model through the back-propagation algorithm. The back-propagation proceeds layer by layer forward from the last convolutional layer, and each layer is updated as:
w_i ← w_i − α ∂L/∂w_i
where ∂L/∂w_i is the partial derivative of the loss function with respect to the parameters of the current convolutional layer and α is the learning rate, generally 0.0001, decayed to 0.1 times its value every 50 training epochs.
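The update rule and the decay schedule above can be sketched as a standard SGD step; the function names and the toy weights/gradients are assumptions for illustration.

```python
# Sketch: gradient step w <- w - alpha * dL/dw, with the learning rate
# starting at 1e-4 and decaying to 0.1x every 50 training epochs.

def learning_rate(epoch, base=1e-4, decay=0.1, step=50):
    # Step decay: multiply by 0.1 once per completed 50 epochs.
    return base * (decay ** (epoch // step))

def sgd_step(weights, grads, alpha):
    # One parameter update for a flat list of weights.
    return [w - alpha * g for w, g in zip(weights, grads)]

w = sgd_step([1.0, -0.5], [10.0, -10.0], learning_rate(0))
```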
Further, the region extraction network model comprises a first extraction layer and a second extraction layer, and the step of inputting the video frame image into the region extraction network model and extracting the image block containing the target person from the video frame image specifically comprises:
S101, inputting the video frame image into the first extraction layer to obtain the feature map corresponding to the detection boxes containing target persons;
S102, inputting the feature map corresponding to the detection boxes containing target persons into the second extraction layer, and extracting the image block containing the target person from the video frame image.
In specific implementation, once the region extraction network model has been trained, the video frame image to be detected can be input into the trained model. The video frame image first passes through the first extraction layer, which filters out most of the background boxes and yields the feature map corresponding to the detection boxes containing target persons. The feature map is then input into the second extraction layer, which performs further feature conversion on it and yields the image blocks containing target persons in the video frame image. Through this double-extraction-layer processing, the network obtains detection boxes containing target persons more accurately than other current mainstream approaches.
Returning to Fig. 1, the video-based person classification method further comprises the step:
S200, inputting the image block into the classification network model to obtain the preliminary classification result and attention weight of the target person in the image block.
Step S100 only extracts the image blocks that may contain target persons from the video frame image to be detected; next, the target persons need to be classified. This embodiment presets a classification network model for classifying the image blocks of target persons. The classification network model uses the ResNet50 architecture with three added convolutional layers; other conventional classification networks, such as VGG, ResNet or DenseNet, can also be used instead, which is not limited in this embodiment.
In specific implementation, the classification network model comprises a first convolutional layer, a pooling layer and a second convolutional layer containing multiple sub-convolutional layers. The step of inputting the image block into the classification network model and obtaining the preliminary classification result and attention weight of the target person in the image block specifically comprises:
S201, inputting the image block into the first convolutional layer and extracting the feature map of the image block;
S202, inputting the feature map into the pooling layer to obtain multiple feature vectors of the feature map;
S203, inputting each feature vector into each sub-convolutional layer separately to obtain the preliminary classification result and attention weight of the target person in the image block.
In specific implementation, after the image block is input into the classification network model in this embodiment, it is first input into the first convolutional layer, which extracts the feature map of the image block. For example, when the classification network model uses the ResNet50 architecture, inputting the image block into the first convolutional layer extracts a 3-dimensional feature map of the image. Afterwards, the pooling layer performs average pooling: the 3-dimensional feature map is evenly divided into 6 parts in the horizontal direction, and each part corresponds to one feature vector of the picture.
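The stripe-pooling step above can be sketched on nested lists. The tiny 12x2x3 feature map here is a made-up stand-in for real network output, and the function name is an assumption.

```python
# Sketch: split a (height x width x channels) feature map into 6 horizontal
# stripes and average-pool each stripe into one feature vector per part.

def horizontal_part_pool(fmap, parts=6):
    h = len(fmap)
    channels = len(fmap[0][0])
    stripe = h // parts
    vectors = []
    for p in range(parts):
        rows = fmap[p * stripe:(p + 1) * stripe]
        cells = [cell for row in rows for cell in row]
        # Average each channel over every cell in the stripe.
        vectors.append([sum(c[ch] for c in cells) / len(cells)
                        for ch in range(channels)])
    return vectors

# 12 x 2 x 3 feature map whose values equal the row index
fmap = [[[float(r)] * 3 for _ in range(2)] for r in range(12)]
vecs = horizontal_part_pool(fmap)
```

Each stripe thus collapses to one vector per body part, which is what the sub-convolutional branches consume next.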
Considering that in practical applications users often pay close attention to certain distinctive parts, such as the face, when observing a person, and in order to bring the final person classification result closer to the judgment of an actual user, the classification network model of this embodiment is provided with a second convolutional layer containing multiple sub-convolutional layers: each feature vector obtained through the pooling layer is separately input into each sub-convolutional layer for convolution, obtaining the preliminary classification result and attention weight of the target person in the image block.
In specific implementation, the second convolutional layer comprises a first sub-convolutional layer, a second sub-convolutional layer, a classifier and a regression network. The step of inputting each feature vector into each sub-convolutional layer separately and obtaining the preliminary classification result and attention weight corresponding to each feature vector specifically comprises:
S201, inputting each feature vector in turn into the first sub-convolutional layer and the second sub-convolutional layer, and outputting the first dimensional feature and second dimensional feature corresponding to each feature vector;
S202, inputting the first dimensional feature into the classifier to obtain the preliminary classification result of the target person in the image block;
S203, inputting the second dimensional feature into the regression network to obtain the attention weight of the target person in the image block.
In a specific implementation, the second convolutional layer in this embodiment is provided with two different sub-convolutional layers, namely the first sub-convolutional layer and the second sub-convolutional layer. Each feature vector is first input into the first sub-convolutional layer, which reduces its dimensionality along the path 2048 -> 256 -> 6; the resulting first dimensional features are connected to a classifier, such as an existing support vector machine (SVM), whose output is the 6x7 preliminary classification result. Each feature vector is likewise input into the second sub-convolutional layer, which reduces its dimensionality along the path 2048 -> 256 -> 1; the resulting second dimensional features are connected to a regression network, such as an existing logistic regression network, whose output is the 6x1 attention weight.
Further, before the obtained image blocks containing the target person are classified with the classification network model, the classification network model must be trained. The specific training process is as follows: obtain an image set to be trained that contains the target person, and annotate the true class of the target person in the image set. Then input the image set into the first convolutional layer to extract the feature maps of the images; input the feature maps into the pooling layer to obtain the feature vector corresponding to each partial feature map; input the feature vectors into the first sub-convolutional layer and the second sub-convolutional layer respectively to obtain the preliminary classification results and attention weights of the target person in the images; output the target person classification result predicted by the classification network model according to the preliminary classification results and attention weights; compare the predicted classification result with the manually annotated true class of the target person, and subtract the two to obtain the training error; and finally train the classification network model by the back-propagation algorithm. The back-propagation algorithm used is identical to the one used when training the aforementioned region extraction network model, and is not repeated here.
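The compare-and-backpropagate loop described above can be sketched for the final linear layer alone. A softmax cross-entropy loss stands in for the "subtract the two" error, and the 6-d fused feature, the 7 classes, and the learning rate are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.standard_normal(6)           # fused 6-d feature of one training image (assumed)
y = 3                                # manually annotated true class, one of 7
W = rng.standard_normal((6, 7)) * 0.01
lr = 0.05

def loss(W):
    # Cross-entropy of the predicted distribution against the true class.
    return -np.log(softmax(x @ W)[y])

before = loss(W)
p = softmax(x @ W)
p[y] -= 1.0                          # gradient of cross-entropy w.r.t. the logits
W -= lr * np.outer(x, p)             # one back-propagation (SGD) step
after = loss(W)
```

A single gradient step on this convex objective reduces the training error (`after < before`); the patent's full procedure repeats such steps over the whole network.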
Returning to Fig. 1, the video-based person classification method further includes the following step:
S300: obtain the final classification result of the target person according to the preliminary classification result and attention weight of the target person in the image block, and classify the target persons contained in the video frame images according to the final classification result.
Specifically, as described above, the first sub-convolutional layer reduces the dimensionality of the first dimensional features along 2048 -> 256 -> 6, and these features are input into an existing support vector machine, which outputs the 6x7 preliminary classification result; denote it c_i, where each c_i represents one classification result, so that 6 classification results are obtained. The second sub-convolutional layer reduces the dimensionality of the second dimensional features along 2048 -> 256 -> 1, and these features are input into an existing logistic regression network, which outputs the 6x1 attention weight; denote it w_i. The final classification result of the target person is then obtained from the preliminary classification results and attention weights according to the formula c = Σ_{i=1..6} w_i · c_i, where each attention weight value may take any value in [0, 1], and the range may also be widened, e.g. to [0, 5]. That is, the 6 classification results c_i of the 6x7 preliminary classification result are weighted by the attention weights w_i and summed to obtain the final classification result, and the class with the largest value in the final classification result is chosen as the classification label of the target person contained in the video frame images, thereby classifying the target person.
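The weighted fusion and label selection just described reduce to a couple of lines (the score and weight values below are purely illustrative):

```python
import numpy as np

# 6x7 preliminary classification scores c_i (one row per part)
# and 6x1 attention weights w_i.
c = np.array([[0.1, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.8, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0],
              [0.2, 0.7, 0.1, 0.0, 0.0, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0],
              [0.6, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0],
              [0.1, 0.6, 0.3, 0.0, 0.0, 0.0, 0.0]])
w = np.array([[0.9], [0.1], [0.8], [0.7], [0.2], [0.6]])

final = (w * c).sum(axis=0)     # c = sum_i w_i * c_i, a 7-d final score vector
label = int(np.argmax(final))   # the class with the largest final value
```

Here parts with higher attention (rows 0, 2, 3, 5, which favor class 1) dominate the sum, so `label` is 1 even though two low-attention parts prefer class 0.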
Embodiment 2
Based on the above embodiment, the present invention further provides an intelligent terminal, whose functional block diagram may be as shown in Fig. 2. The intelligent terminal includes a processor, a memory, a network interface, a display screen, and a temperature sensor connected by a system bus. The processor of the intelligent terminal provides computing and control capability. The memory of the intelligent terminal includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the intelligent terminal is used to communicate with external terminals over a network. The computer program, when executed by the processor, implements a video-based person classification method. The display screen of the intelligent terminal may be a liquid crystal display or an electronic ink display, and the temperature sensor of the intelligent terminal is arranged inside the intelligent terminal in advance to detect the current operating temperature of the internal components.
Those skilled in the art will understand that the functional block diagram shown in Fig. 2 is only a block diagram of the partial structure relevant to the solution of the present invention and does not constitute a limitation on the intelligent terminal to which the solution is applied; a specific intelligent terminal may include more or fewer components than shown in the figure, combine certain components, or adopt a different arrangement of components.
In one embodiment, an intelligent terminal is provided, including a memory and a processor, the memory storing a computer program; when the processor executes the computer program, at least the following steps can be implemented:
obtaining video frame images to be detected, and extracting the image blocks containing the target person from the video frame images;
inputting the image blocks into the classification network model to obtain the preliminary classification results and attention weights of the target person in the image blocks, the classification network model being trained on the correspondence between image blocks of the target person and the preliminary classification results and attention weights of the target person in those image blocks;
obtaining the final classification result of the target person according to the preliminary classification results and attention weights of the target person in the image blocks, and classifying the target persons contained in the video frame images according to the final classification result.
In one embodiment, the processor, when executing the computer program, can further implement: inputting the video frame images into a region extraction network model and extracting the image blocks containing the target person from the video frame images; the region extraction network model is trained on the correspondence between input video frame images and the target person image blocks in those images.
In one embodiment, the processor, when executing the computer program, can further implement the step of inputting the image block into the classification network model and obtaining the preliminary classification result and attention weight of the target person in the image block, which specifically includes: inputting the image block into the first convolutional layer and extracting the feature map of the image block; inputting the feature map into the pooling layer to obtain multiple feature vectors of the feature map; and separately inputting each feature vector into each sub-convolutional layer to obtain the preliminary classification result and attention weight of the target person in the image block.
In one embodiment, the processor, when executing the computer program, can further implement the step of separately inputting each feature vector into each sub-convolutional layer and obtaining the preliminary classification result and attention weight corresponding to each feature vector, which specifically includes: sequentially inputting each feature vector into the first sub-convolutional layer and the second sub-convolutional layer, and outputting the first dimensional features and second dimensional features corresponding to each feature vector; inputting the first dimensional features into the classifier to obtain the preliminary classification result of the target person in the image block; and inputting the second dimensional features into the regression network to obtain the attention weight of the target person in the image block.
In one embodiment, the processor, when executing the computer program, can further implement: multiplying the preliminary classification result of the target person in the image block by the attention weight to obtain the final classification result of the target person; and choosing the class with the largest value in the final classification result of the target person as the classification label of the target persons contained in the video frame images.
In one embodiment, the processor, when executing the computer program, can further implement the step of inputting the video frame images into the region extraction network model and extracting the image blocks containing the target person from the video frame images, which specifically includes: inputting the video frame images into the first extraction layer to obtain the feature maps corresponding to the detection boxes containing the target person; and inputting the feature maps corresponding to the detection boxes containing the target person into the second extraction layer to obtain the image blocks containing the target person in the video frame images.
In one embodiment, the processor, when executing the computer program, can further implement: obtaining an image set to be trained that contains the target person, and annotating the true class and true coordinates of the target person in the image set; inputting the image set into the region extraction network model, and obtaining the network-predicted class and coordinates of the target person by the forward-propagation algorithm; comparing, via a loss function, the annotated true class and true coordinates of the target person with the network-predicted class and coordinates, and obtaining a prediction error; and training the region extraction network model with the prediction error by the back-propagation algorithm.
In one embodiment, the processor, when executing the computer program, can further implement: comparing, via the loss function, the annotated true class and true coordinates of the person with the network-predicted class label and coordinates of the person, and obtaining a prediction error, where i is the index of a detection box during training, the true class and true coordinates of the target person in the i-th detection box are the annotated values, p_i is the network-predicted class of the target person in the i-th detection box, x_i is the network-predicted coordinates of the target person in the i-th detection box, N_arm and N_odm are respectively the total numbers of boxes containing persons to be detected in the region extraction network model, L_b is a cross-entropy loss function, and L_r is a regression loss function.
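The exact combination of terms is given only by the elided formula, so the following is a hedged sketch of a loss of the kind described: a cross-entropy classification term L_b plus a smooth-L1 regression term L_r, each normalized by a box count. The smooth-L1 choice for L_r and the way the two terms are summed are assumptions:

```python
import numpy as np

def cross_entropy(pred_logits, true_class):
    # L_b: cross-entropy of the predicted class scores against the true class.
    e = np.exp(pred_logits - pred_logits.max())
    return -np.log(e[true_class] / e.sum())

def smooth_l1(pred_box, true_box):
    # L_r: smooth-L1 regression loss over box coordinates.
    d = np.abs(pred_box - true_box)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def detection_loss(preds, truths, n_arm, n_odm):
    """preds: list of (class_logits, box); truths: list of (true_class, box)."""
    lb = sum(cross_entropy(p, t) for (p, _), (t, _) in zip(preds, truths))
    lr = sum(smooth_l1(px, tx) for (_, px), (_, tx) in zip(preds, truths))
    return lb / n_arm + lr / n_odm

# One detection box: predicted logits for {person, background} and a box,
# against an annotated true class 0 and true box.
preds = [(np.array([2.0, 0.1]), np.array([10.0, 10.0, 50.0, 80.0]))]
truths = [(0, np.array([12.0, 9.0, 48.0, 82.0]))]
error = detection_loss(preds, truths, n_arm=1, n_odm=1)
```

The resulting `error` is the prediction error that would then drive the back-propagation step described above.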
Those of ordinary skill in the art will appreciate that all or part of the processes of the above embodiment methods can be completed by instructing the relevant hardware through a computer program; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided by the present invention may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In conclusion the present invention provides a kind of human classification method, intelligent terminal and storage medium based on video, institute
The method of stating includes: to obtain video frame images to be detected, extracts the image block in the video frame images comprising target person;It will
Described image block, which inputs in the sorter network model, extracts feature vector, obtains initial point of target person in described image block
Class result and attention weight;According to the preliminary classification result of target person in described image block and the acquisition of attention weight
The final classification of target person is as a result, according to the final classification result to target person contained in the video frame images
Classify.Method provided by the present invention extracts target person to be detected by region extraction module and sorter network model respectively
The image block of object and classify to target person, the e-learning power weight that gains attention combined with initial predicted result,
Contribution of the characteristic part to final classification result is improved, so that video human classification result is more accurate.
It should be understood that system application of the invention is not limited to above-mentioned citing, those of ordinary skill in the art are come
It says, it can be modified or changed according to the above description, and all these modifications and variations all should belong to right appended by the present invention and want
The protection scope asked.
Claims (10)
1. A video-based person classification method, characterized in that it comprises:
obtaining video frame images to be detected, and extracting the image block containing a target person from the video frame images;
inputting the image block into a classification network model to obtain the preliminary classification result and attention weight of the target person in the image block, the classification network model being trained on the correspondence between image blocks of the target person and the preliminary classification results and attention weights of the target person in the image blocks;
obtaining the final classification result of the target person according to the preliminary classification result and attention weight of the target person in the image block, and classifying the target persons contained in the video frame images according to the final classification result.
2. The video-based person classification method according to claim 1, characterized in that the step of extracting the image block containing the target person from the video frame images specifically includes:
inputting the video frame images into a region extraction network model, and extracting the image block containing the target person from the video frame images; the region extraction network model is trained on the correspondence between input video frame images and the target person image blocks in the input video frame images.
3. The video-based person classification method according to claim 1, characterized in that the classification network model includes: a first convolutional layer, a pooling layer, and a second convolutional layer containing multiple sub-convolutional layers;
the step of inputting the image block into the classification network model and obtaining the preliminary classification result and attention weight of the target person in the image block specifically includes:
inputting the image block into the first convolutional layer, and extracting the feature map of the image block;
inputting the feature map into the pooling layer, and obtaining multiple feature vectors of the feature map;
separately inputting each feature vector into each sub-convolutional layer, and obtaining the preliminary classification result and attention weight of the target person in the image block.
4. The video-based person classification method according to claim 3, characterized in that the second convolutional layer includes: a first sub-convolutional layer, a second sub-convolutional layer, a classifier, and a regression network;
the step of separately inputting each feature vector into each sub-convolutional layer and obtaining the preliminary classification result and attention weight corresponding to each feature vector specifically includes:
sequentially inputting each feature vector into the first sub-convolutional layer and the second sub-convolutional layer, and outputting the first dimensional features and second dimensional features corresponding to each feature vector;
inputting the first dimensional features into the classifier, and obtaining the preliminary classification result of the target person in the image block;
inputting the second dimensional features into the regression network, and obtaining the attention weight of the target person in the image block.
5. The video-based person classification method according to claim 1, characterized in that the step of obtaining the final classification result of the target person according to the preliminary classification result and attention weight of the target person in the image block, and classifying the target persons contained in the video frame images according to the final classification result, specifically includes:
multiplying the preliminary classification result of the target person in the image block by the attention weight to obtain the final classification result of the target person;
choosing the class with the largest value in the final classification result of the target person as the classification label of the target persons contained in the video frame images.
6. The video-based person classification method according to claim 2, characterized in that the region extraction network model includes: a first extraction layer and a second extraction layer;
the step of inputting the video frame images into the region extraction network model and extracting the image block containing the target person from the video frame images specifically includes:
inputting the video frame images into the first extraction layer, and obtaining the feature maps corresponding to the detection boxes containing the target person;
inputting the feature maps corresponding to the detection boxes containing the target person into the second extraction layer, and extracting the image block containing the target person from the video frame images.
7. The video-based person classification method according to claim 6, characterized in that, before the step of inputting the video frame images into the region extraction network model and extracting the image block containing the target person from the video frame images, the method further includes:
obtaining an image set to be trained that contains the target person, and annotating the true class and true coordinates of the target person in the image set;
inputting the image set into the region extraction network model, and obtaining the network-predicted class and coordinates of the target person by the forward-propagation algorithm;
comparing, via a loss function, the annotated true class and true coordinates of the target person with the network-predicted class and coordinates of the target person, and obtaining a prediction error;
training the region extraction network model with the prediction error by the back-propagation algorithm.
8. The video-based person classification method according to claim 7, characterized in that, in the loss function:
i is the index of a detection box during training, the annotated values are the true class and true coordinates of the target person in the i-th detection box, p_i is the network-predicted class of the target person in the i-th detection box, x_i is the network-predicted coordinates of the target person in the i-th detection box, N_arm and N_odm are respectively the total numbers of boxes containing persons to be detected in the region extraction network model, L_b is a cross-entropy loss function, and L_r is a regression loss function.
9. An intelligent terminal, characterized by comprising: a processor and a storage medium in communication connection with the processor, the storage medium being suitable for storing a plurality of instructions; the processor is suitable for calling the instructions in the storage medium to execute the steps of the video-based person classification method according to any one of claims 1-8.
10. A storage medium, characterized in that a control program of the video-based person classification method is stored on the storage medium; when the control program of the video-based person classification method is executed by a processor, the steps of the video-based person classification method according to any one of claims 1-8 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910553048.8A CN110414344B (en) | 2019-06-25 | 2019-06-25 | Character classification method based on video, intelligent terminal and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110414344A true CN110414344A (en) | 2019-11-05 |
CN110414344B CN110414344B (en) | 2023-06-06 |
Family
ID=68359697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910553048.8A Active CN110414344B (en) | 2019-06-25 | 2019-06-25 | Character classification method based on video, intelligent terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110414344B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046974A (en) * | 2019-12-25 | 2020-04-21 | 珠海格力电器股份有限公司 | Article classification method and device, storage medium and electronic equipment |
CN111461246A (en) * | 2020-04-09 | 2020-07-28 | 北京爱笔科技有限公司 | Image classification method and device |
CN111814617A (en) * | 2020-06-28 | 2020-10-23 | 智慧眼科技股份有限公司 | Video-based fire determination method and device, computer equipment and storage medium |
CN111914107A (en) * | 2020-07-29 | 2020-11-10 | 厦门大学 | Instance retrieval method based on multi-channel attention area expansion |
CN112101154A (en) * | 2020-09-02 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Video classification method and device, computer equipment and storage medium |
CN112995666A (en) * | 2021-02-22 | 2021-06-18 | 天翼爱音乐文化科技有限公司 | Video horizontal and vertical screen conversion method and device combined with scene switching detection |
CN113191205A (en) * | 2021-04-03 | 2021-07-30 | 国家计算机网络与信息安全管理中心 | Method for identifying special scene, object, character and noise factor in video |
CN113496231A (en) * | 2020-03-18 | 2021-10-12 | 北京京东乾石科技有限公司 | Classification model training method, image classification method, device, equipment and medium |
CN113673576A (en) * | 2021-07-26 | 2021-11-19 | 浙江大华技术股份有限公司 | Image detection method, terminal and computer readable storage medium thereof |
CN113673588A (en) * | 2021-08-12 | 2021-11-19 | 连尚(北京)网络科技有限公司 | Method, apparatus, medium, and program product for video classification |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN205388826U (en) * | 2016-03-09 | 2016-07-20 | 郑永春 | Vehicle recognition cameras |
CN106845361A (en) * | 2016-12-27 | 2017-06-13 | 深圳大学 | A kind of pedestrian head recognition methods and system |
CN109034024A (en) * | 2018-07-16 | 2018-12-18 | 浙江工业大学 | Logistics vehicles vehicle classification recognition methods based on image object detection |
CN109074472A (en) * | 2016-04-06 | 2018-12-21 | 北京市商汤科技开发有限公司 | Method and system for person recognition |
CN109359592A (en) * | 2018-10-16 | 2019-02-19 | 北京达佳互联信息技术有限公司 | Processing method, device, electronic equipment and the storage medium of video frame |
CN109614517A (en) * | 2018-12-04 | 2019-04-12 | 广州市百果园信息技术有限公司 | Classification method, device, equipment and the storage medium of video |
CN109684990A (en) * | 2018-12-20 | 2019-04-26 | 天津天地伟业信息系统集成有限公司 | A kind of behavioral value method of making a phone call based on video |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046974A (en) * | 2019-12-25 | 2020-04-21 | 珠海格力电器股份有限公司 | Article classification method and device, storage medium and electronic equipment |
CN113496231A (en) * | 2020-03-18 | 2021-10-12 | 北京京东乾石科技有限公司 | Classification model training method, image classification method, device, equipment and medium |
CN113496231B (en) * | 2020-03-18 | 2024-06-18 | 北京京东乾石科技有限公司 | Classification model training method, image classification method, device, equipment and medium |
CN111461246A (en) * | 2020-04-09 | 2020-07-28 | 北京爱笔科技有限公司 | Image classification method and device |
CN111814617B (en) * | 2020-06-28 | 2023-01-31 | 智慧眼科技股份有限公司 | Fire determination method and device based on video, computer equipment and storage medium |
CN111814617A (en) * | 2020-06-28 | 2020-10-23 | 智慧眼科技股份有限公司 | Video-based fire determination method and device, computer equipment and storage medium |
CN111914107A (en) * | 2020-07-29 | 2020-11-10 | 厦门大学 | Instance retrieval method based on multi-channel attention area expansion |
CN111914107B (en) * | 2020-07-29 | 2022-06-14 | 厦门大学 | Instance retrieval method based on multi-channel attention area expansion |
CN112101154A (en) * | 2020-09-02 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Video classification method and device, computer equipment and storage medium |
CN112101154B (en) * | 2020-09-02 | 2023-12-15 | 腾讯科技(深圳)有限公司 | Video classification method, apparatus, computer device and storage medium |
CN112995666A (en) * | 2021-02-22 | 2021-06-18 | 天翼爱音乐文化科技有限公司 | Video horizontal and vertical screen conversion method and device combined with scene switching detection |
CN112995666B (en) * | 2021-02-22 | 2022-04-22 | 天翼爱音乐文化科技有限公司 | Video horizontal and vertical screen conversion method and device combined with scene switching detection |
CN113191205A (en) * | 2021-04-03 | 2021-07-30 | 国家计算机网络与信息安全管理中心 | Method for identifying special scene, object, character and noise factor in video |
CN113673576A (en) * | 2021-07-26 | 2021-11-19 | 浙江大华技术股份有限公司 | Image detection method, terminal and computer readable storage medium thereof |
CN113673588A (en) * | 2021-08-12 | 2021-11-19 | 连尚(北京)网络科技有限公司 | Method, apparatus, medium, and program product for video classification |
Also Published As
Publication number | Publication date |
---|---|
CN110414344B (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110414344A (en) | A kind of human classification method, intelligent terminal and storage medium based on video | |
Zhang et al. | Cross-modality interactive attention network for multispectral pedestrian detection | |
CN114202672A (en) | Small target detection method based on attention mechanism | |
CN112396002A (en) | Lightweight remote sensing target detection method based on SE-YOLOv3 | |
US20180114071A1 (en) | Method for analysing media content | |
CN109886066A (en) | Fast target detection method based on the fusion of multiple dimensioned and multilayer feature | |
Alshehri et al. | Deep attention neural network for multi-label classification in unmanned aerial vehicle imagery | |
CN111353544B (en) | Improved Mixed Pooling-YOLOV 3-based target detection method | |
CN107239775A (en) | Terrain classification method and device | |
CN111242144A (en) | Method and device for detecting abnormality of power grid equipment | |
CN108229432A (en) | Face calibration method and device | |
Liu et al. | A shadow detection algorithm based on multiscale spatial attention mechanism for aerial remote sensing images | |
CN114782798A (en) | Underwater target detection method based on attention fusion | |
CN115375781A (en) | Data processing method and device | |
CN115984226A (en) | Insulator defect detection method, device, medium, and program product | |
CN115496971A (en) | Infrared target detection method and device, electronic equipment and storage medium | |
Wu et al. | Improved YOLOX foreign object detection algorithm for transmission lines | |
CN111582057B (en) | Face verification method based on local receptive field | |
Mohamed et al. | Data augmentation for deep learning algorithms that perform driver drowsiness detection | |
Zhao et al. | Recognition and Classification of Concrete Cracks under Strong Interference Based on Convolutional Neural Network. | |
Shi et al. | Combined channel and spatial attention for YOLOv5 during target detection | |
CN116958615A (en) | Picture identification method, device, equipment and medium | |
Bai et al. | Countr: An end-to-end transformer approach for crowd counting and density estimation | |
CN114140524A (en) | Closed loop detection system and method for multi-scale feature fusion | |
Yue et al. | A Novel Two-stream Architecture Fusing Static And Dynamic Features for Human Action Recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |