CN114648713A - Video classification method and device, electronic equipment and computer-readable storage medium - Google Patents

Video classification method and device, electronic equipment and computer-readable storage medium Download PDF

Info

Publication number
CN114648713A
Authority
CN
China
Prior art keywords
video
video frame
classification
category
classified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011509800.8A
Other languages
Chinese (zh)
Inventor
毛永波
孙文胜
韦晓全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202011509800.8A priority Critical patent/CN114648713A/en
Publication of CN114648713A publication Critical patent/CN114648713A/en
Pending legal-status Critical Current

Links

Images

Abstract

The embodiment of the disclosure discloses a video classification method, a video classification device, electronic equipment and a computer-readable storage medium. The video classification method comprises the following steps: acquiring a plurality of video frames of a video to be classified; classifying the plurality of video frames to obtain a first category of the plurality of video frames, wherein the first category comprises an object external video frame and an object internal video frame; identifying the object according to the object external video frame to obtain a first classification vector of the object external video frame; identifying the object according to the object internal video frame to obtain a first classification vector of the object internal video frame; and determining a classification result of the video to be classified according to the first classification vector. The method solves the technical problem of low recall rate of video classification by combining the object external video frame and the object internal video frame.

Description

Video classification method and device, electronic equipment and computer-readable storage medium
Technical Field
The present disclosure relates to the field of video classification, and in particular, to a video classification method and apparatus, an electronic device, and a computer-readable storage medium.
Background
In recent years, with the rapid development of the mobile internet, the short-video industry has risen rapidly; its advantages of fast propagation, a low production threshold, and strong social attributes have made it popular with a large number of users and creators. In order to recommend related content to a user more accurately, each video needs to be labeled with a category; for example, a car video needs to be labeled with the car series it describes. For car-series classification of videos in a user-generated content (UGC) scenario, the conventional technical scheme is to extract a video frame, use a detection model to extract the target area with the largest area ratio in the picture, and return the car series of that target area.
For the problem of identifying multiple car-series labels in a video, the prior art mainly has the following defects: 1. automobile video samples contain a large amount of content introducing the car interior, while the existing scheme identifies the car series only through the exterior appearance, so the overall recall rate is not high; 2. multiple vehicles sometimes appear in a single frame, and the existing scheme returns only the target with the largest area, which may cause misjudgment; 3. the final result depends only on a single frame, so if that frame is misrecognized, the accuracy drops sharply.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In order to solve the above technical problem, the embodiments of the present disclosure propose the following technical solutions.
In a first aspect, an embodiment of the present disclosure provides a video classification method, including:
acquiring a plurality of video frames of a video to be classified;
classifying the plurality of video frames to obtain a first category of the plurality of video frames, wherein the first category comprises an object external video frame and an object internal video frame;
identifying the object according to the object external video frame to obtain a first classification vector of the object external video frame;
identifying the object according to the object internal video frame to obtain a first classification vector of the object internal video frame;
and determining a classification result of the video to be classified according to the first classification vector.
In a second aspect, an embodiment of the present disclosure provides a video classification apparatus, including:
the video frame acquisition module is used for acquiring a plurality of video frames of the video to be classified;
a first classification module, configured to classify the plurality of video frames to obtain a first class of the plurality of video frames, where the first class includes an object external video frame and an object internal video frame;
the external first classification vector acquisition module is used for identifying the object according to the external video frame of the object to obtain a first classification vector of the external video frame of the object;
the internal first classification vector acquisition module is used for identifying the object according to the object internal video frame to obtain a first classification vector of the object internal video frame;
and the classification result determining module is used for determining the classification result of the video to be classified according to the first classification vector.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the preceding first aspects.
In a fourth aspect, the disclosed embodiments provide a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method of any one of the foregoing first aspects.
The embodiment of the disclosure discloses a video classification method, a video classification device, electronic equipment and a computer-readable storage medium. The video classification method comprises the following steps: acquiring a plurality of video frames of a video to be classified; classifying the plurality of video frames to obtain a first category of the plurality of video frames, wherein the first category comprises an object external video frame and an object internal video frame; identifying the object according to the object external video frame to obtain a first classification vector of the object external video frame; identifying the object according to the object internal video frame to obtain a first classification vector of the object internal video frame; and determining a classification result of the video to be classified according to the first classification vector. The method solves the technical problem of low recall rate of video classification by combining the object external video frame and the object internal video frame.
The foregoing is a summary of the present disclosure, provided to promote a clear understanding of its technical means; the present disclosure may be embodied in other specific forms without departing from its spirit or essential attributes.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flowchart of a video classification method according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a video classification method provided in an embodiment of the present disclosure;
fig. 3 is a schematic flow chart of a video classification method according to an embodiment of the present disclosure;
fig. 4 is a schematic flow chart of a video classification method according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a video classification method according to an embodiment of the disclosure;
Fig. 6 is a schematic flow chart of a video classification method according to an embodiment of the present disclosure;
fig. 7 is a schematic flow chart of a video classification method according to an embodiment of the present disclosure;
fig. 8 is a schematic view of an application scenario of a video classification method according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an embodiment of a video classification apparatus according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more complete and thorough understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart of an embodiment of a video classification method provided in this disclosure, where the video classification method provided in this embodiment may be executed by a video classification apparatus, and the video classification apparatus may be implemented as software, or implemented as a combination of software and hardware, and the video classification apparatus may be integrated in some device in a video classification system, such as a video classification server or a video classification terminal device. As shown in fig. 1, the method comprises the steps of:
step S101, acquiring a plurality of video frames of a video to be classified;
the video to be classified can be any type of video. And if the classification is the type of the object to be classified in the video to be classified, if the classification is the type of the automobile, the classification indicates that the automobile system of the automobile is included in the video to be classified, or the content of the video to be classified is related to the automobile system.
It can be understood that the video to be classified may be read from a preset location, or received through a preset interface, for example, the video to be classified is read from a preset storage location or a network location, or the video to be classified uploaded by a user is received through a human-computer interaction interface, and so on, which is not described herein again.
Optionally, the obtaining of the plurality of video frames of the video to be classified includes: performing frame extraction on the video to be classified according to a frame extraction frequency to obtain the plurality of video frames. Exemplarily, the video to be classified is decimated at a frequency of 2 fps (frames per second) to obtain a video frame sequence I = {I_1, I_2, ..., I_n}, where n denotes the number of video frames.
Optionally, in order to facilitate the subsequent identification of the object in the video frames, this step may further include preprocessing. Illustratively, the video frames are normalized so that their height and width are M and N, respectively, to facilitate subsequent processing. It is understood that the preprocessing may include any preprocessing method, which is not described herein again.
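As a concrete illustration of this step, the following is a minimal Python sketch of frame extraction and normalization, assuming OpenCV is available; the 2 fps rate follows the example above, while the function name and the M = N = 224 defaults are illustrative assumptions rather than part of the disclosure.

```python
import cv2

def extract_frames(video_path, fps_out=2.0, M=224, N=224):
    """Decimate a video to fps_out frames per second and resize each frame to N x M."""
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS metadata is missing
    step = max(int(round(src_fps / fps_out)), 1)  # keep every step-th frame
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(cv2.resize(frame, (N, M)))  # dsize is (width, height)
        idx += 1
    cap.release()
    return frames  # the video frame sequence I = {I_1, ..., I_n}
```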
Returning to fig. 1, the video classification method further includes, in step S102, classifying the plurality of video frames to obtain a first category of the plurality of video frames, where the first category includes an object external video frame and an object internal video frame;
The first category classifies the video frames into object external video frames, which include external features of the object, and object internal video frames, which include internal features of the object. For example, if the object is an automobile, this step divides the extracted video frames into automobile appearance video frames and automobile interior video frames; the automobile interior video frames may further include automobile central control video frames, because the central control area of the automobile interior can more accurately reflect the car series of an automobile.
In order to prevent false recognition caused by too large proportion of non-target objects when multiple objects appear in the video frame, optionally, as shown in fig. 2, the step S102 further includes:
step S201, carrying out target detection on the video frame to obtain at least one target detection frame;
step S202, calculating the comprehensive confidence of the target frame according to the confidence of the target detection frame, the distance between the target detection frame and the center point of the video frame and the proportion of the area of the target detection frame in the video frame;
step S203, the category corresponding to the target detection box with the maximum integrated confidence is used as the first category of the video frame.
Step S201 may be performed by a pre-trained target detection model. The target detection model detects two types of content in a video frame, namely the object exterior and the object interior. In particular, if the detection result of a certain video frame is null, the frame does not include the object and is therefore discarded. When the detection result of a frame is not null, the target detection model outputs at least one target detection box; exemplarily, the output of the target detection model is represented as B = {B_1, B_2, ..., B_m}, where m denotes the number of target detection boxes obtained in the frame. The k-th target detection box is defined as B_k = [x_k, y_k, w_k, h_k, c_k, s_k], where x_k and y_k respectively represent the abscissa and ordinate of the upper-left corner of the target detection box, w_k and h_k respectively represent the width and height of the target detection box, c_k represents the first category, and s_k represents the confidence of the target detection box.
Further, in step S202, the comprehensive confidence is calculated using the following formula (1):

S_k = s_k * d_k * a_k    (1)

where d_k ∈ (0, 1] represents the position score of the target detection box; the closer the center point of the target detection box is to the center point of the video frame, the higher the score. Illustratively, d_k is calculated according to the following formula (2):

d_k = max(|x_k + w_k/2 - N/2|, |y_k + h_k/2 - M/2|)    (2)

and a_k ∈ (0, 1] represents the ratio of the area of the target detection box to that of the video frame. Illustratively, a_k is calculated according to the following formula (3):

a_k = w_k * h_k / (M * N)    (3)
in step S202, a comprehensive confidence of each target detection box output by the model is calculated to obtain at least one comprehensive confidence.
In step S203, the at least one integrated confidence level is sorted according to size, and a category corresponding to the target detection frame with the highest integrated confidence level is used as the first category of the video frame.
In the above steps S201 to S203, by adding the position score and the area ratio during the target detection, the weight of the target detection frame having a larger area near the middle position of the video frame is made larger, so that when a plurality of target detection frames are detected, the video frame can be classified more accurately.
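The following sketch illustrates steps S201 to S203, implementing equations (1) to (3) exactly as printed; the box tuple format follows the B_k definition above, while the detector itself and the function name are assumptions for illustration.

```python
def first_category(boxes, M, N):
    """Pick the first category of a frame from its target detection boxes.

    Each box is (x, y, w, h, c, s): upper-left corner, size, category, confidence.
    Returns None when the detector found nothing, so the frame can be discarded.
    """
    if not boxes:
        return None
    best_c, best_S = None, float("-inf")
    for (x, y, w, h, c, s) in boxes:
        d = max(abs(x + w / 2 - N / 2), abs(y + h / 2 - M / 2))  # position term, eq. (2) as printed
        a = (w * h) / (M * N)                                    # area ratio, eq. (3)
        S = s * d * a                                            # comprehensive confidence, eq. (1)
        if S > best_S:
            best_c, best_S = c, S
    return best_c  # category of the box with the maximum comprehensive confidence
```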
Returning to fig. 1, the video classification method further includes, in step S103, identifying the object according to the object external video frame to obtain a first classification vector of the object external video frame; and step S104, identifying the object according to the object internal video frame to obtain a first classification vector of the object internal video frame.
Optionally, in step S103, the video frames classified as object external video frames in step S102 are input into a pre-trained object external classification model, which classifies video frames by the exterior of the object; for example, the appearance video frame of an automobile is classified to obtain the car series corresponding to the video frame.
Optionally, in step S104, the video frames classified as object internal video frames in step S102 are input into a pre-trained object internal classification model, which classifies video frames by the interior of the object; for example, the central control video frame of an automobile is classified to obtain the car series corresponding to the video frame.
It is understood that, for faster processing speed, the video frame input into the object external classification model or the object internal classification model may be the image within the object detection frame obtained in step S102.
Each first element in the first classification vector corresponds to a second category, and the value of the first element represents the confidence that the video frame is of the second category corresponding to that first element. Illustratively, the first classification vector is a normalized one-dimensional vector output by the object external classification model or the object internal classification model. Let V_i denote the first classification vector of the i-th video frame; then:

V_i = {V_i1, V_i2, ..., V_ic}

where V_ij, j ∈ [1, c], is a first element of the first classification vector and c represents the number of second categories; that is, V_ij represents the confidence that the i-th frame belongs to the j-th category, and V_i1 + V_i2 + ... + V_ic = 1.
Through the steps S103 and S104, the object external video frame and the object internal video frame are respectively identified and classified to obtain the first classification vector, and the identification of the object internal video frame is added on the basis of the object external video frame, so that the overall recall rate of the classification of the video is improved.
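A minimal sketch of steps S103 and S104 follows: each frame crop is routed to the exterior or interior classifier and a normalized first classification vector V_i is collected. The two model objects, the routing labels, and the assumption that each model returns c raw logits are all illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

def classify_frames(routed_frames, exterior_model, interior_model):
    """Return one normalized first classification vector per frame (sums to 1)."""
    vectors = []
    for crop, first_cat in routed_frames:  # crop: image inside the target detection box
        model = exterior_model if first_cat == "exterior" else interior_model
        logits = model(crop)               # c scores, one per second category
        vectors.append(softmax(logits))    # V_i = {V_i1, ..., V_ic}
    return vectors
```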
Returning to fig. 1, the video classification method further includes, in step S105, determining a classification result of the video to be classified according to the first classification vector.
In steps S103 and S104, the video frames of the video to be classified are classified and identified to obtain at least one first classification vector (in general, a plurality of first classification vectors). In step S105, the plurality of first classification vectors are combined to determine the classification result of the video to be classified, that is, the second category of the video to be classified.
Optionally, as shown in fig. 3, the step S105 further includes:
step S301, classifying the video frame corresponding to the first classification vector into at least one second class according to the first classification vector; wherein each second category corresponds to a set of video frames;
step S302, calculating the confidence coefficient of the video to be classified as the second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
step S303, determining the classification result of the video to be classified according to the confidence of the second category.
For each object external video frame and object internal video frame obtained in step S102, a corresponding first classification vector is obtained in step S103 or step S104, and the video frame can be classified into at least one second category by this first classification vector; for example, the second category of the video frame is determined by the value of the largest first element in the first classification vector. In order to make the classification more accurate, a clustering threshold may be preset and the video frames classified according to it. Optionally, step S301 further includes:
step S401, acquiring a clustering threshold;
step S402, comparing the value of each first element in the first classification vector with the size of the clustering threshold;
step S403, in response to that the value of the first element is greater than the clustering threshold, classifying the video frame corresponding to the first classification vector into a second category corresponding to the first element.
Let the clustering threshold be th_1, with th_1 ∈ [0, 1]. Steps S402 and S403 can be realized by the following formula (4):

C_ij = 1, if V_ij > th_1;  C_ij = 0, otherwise    (4)

where C_ij indicates whether the i-th frame belongs to the j-th category. Through steps S402 and S403, the video frames obtained in step S103 or step S104 are classified into one or more second categories. It is understood that in the above steps a certain frame may not belong to any second category, in which case the video frame may be discarded.
Through the steps, the video frame corresponding to each of the c second categories can be obtained, that is, each second category corresponds to a video frame set, and the video frame set includes the object external video frame and/or the object internal video frame.
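The grouping of formula (4) can be sketched as follows, assuming each first classification vector is indexable; the default th_1 = 0.5 is an illustrative assumption.

```python
from collections import defaultdict

def group_by_second_category(vectors, th1=0.5):
    """Return {category index j: list of frame indices i with V_ij > th1}."""
    frame_sets = defaultdict(list)
    for i, V in enumerate(vectors):
        for j, v in enumerate(V):
            if v > th1:              # C_ij = 1, formula (4)
                frame_sets[j].append(i)
    return frame_sets                # frames matching no category are implicitly discarded
```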
Returning to fig. 3, in step S302, a comprehensive first classification vector of each second category is first calculated. Optionally, the comprehensive first classification vector is calculated by summing the first classification vectors of the video frames in the video frame set corresponding to the second category and then normalizing the sum. Taking the second category of class l as an example, the comprehensive first classification vector is calculated by the following formula (5):

Vl_result = f( Σ_{q∈A} V_q )    (5)

where A denotes the set of video frames belonging to class l, V_q represents the first classification vector of the q-th frame, and f() is Softmax(). From this, the comprehensive first classification vector Vl_result is obtained, and the value Vl_result_l of the l-th element in Vl_result is the confidence that the video to be classified is of the l-th second category.
For each second category, the confidence level of the second category may be calculated, and then in step S303, the second category corresponding to the maximum confidence level of the second category may be determined as the classification result of the video to be classified.
In order to reduce the amount of calculation, before step S302 the method further includes: filtering the second categories. Optionally, the filtering includes filtering out second categories with few video frames. Optionally, a quantity threshold parameter th_2 is set, with th_2 ∈ [0, 1], from which the quantity threshold n * th_2 is derived. Second categories whose corresponding video frame sets contain fewer video frames than this quantity threshold are filtered out, and the remaining second categories participate in the subsequent steps of calculating the comprehensive first classification vector and determining the confidence of the second category. Thereby, clearly incorrect second categories can be filtered out to reduce the amount of computation when determining the confidence of the second categories.
In step S303, each second category participating in the calculation of the comprehensive first classification vector obtains a corresponding Vl_result_l; therefore, in this step, all Vl_result_l values can be compared, and the second category with the largest Vl_result_l is used as the classification result of the video to be classified.

Alternatively, when the second category with the largest Vl_result_l is used as the classification result, it is possible that none of the Vl_result_l values is large, making the classification result inaccurate. Therefore, for the accuracy of the final classification result, step S303 further includes:
obtaining a classification threshold value;
comparing the confidence level of the second class to the classification threshold;
and determining the second category with the confidence coefficient larger than the classification threshold value as the classification result of the video to be classified.
The classification threshold is a preset threshold th_3. When Vl_result_l > th_3, the classification result of the video to be classified is determined to be class l. In order to prevent the video to be classified from being classified into a plurality of second categories, th_3 can be set to a larger value, e.g., th_3 = 0.8.
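Steps S302 and S303, together with the optional filters, can be sketched as below: small frame sets are dropped against the quantity threshold n * th_2, the summed vectors are Softmax-normalized per formula (5), and a category is accepted only when its confidence exceeds th_3. The th_2 default is an illustrative assumption; th_3 = 0.8 follows the example above.

```python
import numpy as np

def classify_video(vectors, frame_sets, n_frames, th2=0.1, th3=0.8):
    """Return the second categories whose confidence exceeds th_3."""
    results = {}
    for j, frame_ids in frame_sets.items():
        if len(frame_ids) < n_frames * th2:    # quantity-threshold filter, n * th_2
            continue
        summed = np.sum([vectors[q] for q in frame_ids], axis=0)
        e = np.exp(summed - summed.max())      # f() = Softmax, formula (5)
        results[j] = (e / e.sum())[j]          # Vl_result_l for category j
    return [j for j, conf in results.items() if conf > th3]
```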
Through the steps S101-S105, when videos are classified, besides the external features of the object, the internal features of the object are added, so that the classification of the videos including the inside and the outside of the object is more accurate, the technical problem of low recall rate caused by only classifying the videos through the video frames outside the object is solved, and the problem of low accuracy rate caused by single-frame misrecognition is also solved; in addition, when the target identification is carried out on the single frame, the position and area information of the target detection frame is combined, so that the misjudgment is reduced.
Further, in some scenes, objects in some videos carry distinctive identifiers that allow the video to be classified accurately; for example, in car-series identification, the brand of an automobile can be identified from its logo, or a specific model within the brand can be identified from the badging on the tail of the automobile. Thus, when an object identifier is recognized in a frame, the weight of that frame can be increased when calculating the comprehensive first classification vector. Therefore, optionally, as shown in fig. 5, the video classification method further includes:
step S501, identifying the identification of the object according to the external video frame of the object to obtain the identification confidence of the identification of the object in the external video frame of the object;
step S502, calculating a weight value of the external video frame of the object according to the identification confidence;
at this time, the step S302 further includes:
step S503, according to the weight value of the object external video frame, carrying out weighted calculation on a first classification vector of the object external video frame in a video frame set to obtain a weighted first classification vector;
step S504, calculating a confidence of the video to be classified as a second category corresponding to the video frame set according to the weighted first classification vector.
In step S501, the object external video frame may be input into a pre-trained detection model for the object identifier, to identify whether the object external video frame includes the identifier of the object. The output of the object identifier detection model is similar to the output of the model in step S102: one or more target detection boxes are output, each represented by the abscissa and ordinate of its upper-left corner, its width and height, its category, and its confidence. The identification confidences of these target detection boxes can then be compared to obtain the maximum identification confidence.
Optionally, as shown in fig. 6, in step S502, calculating a weight value of the video frame outside the object according to the identification confidence includes:
step S601, acquiring a weight threshold;
step S602, when the identification confidence is greater than or equal to the weight threshold, calculating a weight value of the external video frame of the object according to the weight threshold and the identification confidence;
step S603, when the identification confidence is smaller than the weight threshold, setting a preset weight value as the weight value of the external video frame of the object; wherein the preset weight value is less than or equal to the weight value calculated in step S602.
Illustratively, a weight threshold th_4 is preset. Let s_logo_q denote the identification confidence of the object external video frame obtained in step S501, that is, the confidence that the identifier of the object exists in the q-th frame. The weight value of the object external video frame can then be calculated by the following formula (6):

w_q = value calculated from th_4 and s_logo_q, if s_logo_q ≥ th_4;  w_q = preset weight value, if s_logo_q < th_4    (6)
in step S503, a weighted first classification vector is obtained by performing a weighted calculation on the first classification vector of the external video frame of the object in the video frame set. As described in the above embodiment, the video frame q in the video frame set a with category i weights the first classification vector as: w is aq*Vq
In step S504, a comprehensive weighted first classification vector is first obtained from the weighted first classification vectors of all the video frames in the set A; exemplarily, it is obtained by the following formula (7):

Vl_result = f( Σ_{q∈A} w_q * V_q )    (7)
the definition of the parameters is the same as that in formula (5), and is not described herein again. Thus, the confidence of the second of the final l classes is: vl _ resultl. Then, the process of determining the classification result of the video to be classified refers to the description in step S105, which is not repeated herein.
In the above further embodiment, the video frames containing the object identifier are weighted when calculating the comprehensive first classification vector, so that their weights become larger; the greater the identification confidence of a frame, the larger the value of its weight w_q, and the larger the proportion of its first classification vector in the comprehensive weighted calculation. This enhances the robustness of the final classification result and makes the classification more accurate.
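A sketch of this logo-weighted variant follows. Because the exact expression of equation (6) is not reproduced here, the weight rule below (boost by the logo confidence above th_4, preset weight 1.0 otherwise) is an assumption that merely matches the described behavior: frames with higher identification confidence receive larger w_q. The th_4 default is also illustrative.

```python
import numpy as np

def frame_weight(s_logo, th4=0.6):
    """Assumed realization of equation (6): boost frames whose logo confidence
    reaches the weight threshold th_4; otherwise use a preset weight of 1.0."""
    return 1.0 + s_logo if s_logo >= th4 else 1.0

def weighted_category_confidence(vectors, logo_confs, frame_ids, j, th4=0.6):
    """Confidence of second category j via the weighted sum of formula (7)."""
    summed = np.sum([frame_weight(logo_confs[q], th4) * vectors[q]
                     for q in frame_ids], axis=0)
    e = np.exp(summed - summed.max())   # f() = Softmax
    return (e / e.sum())[j]             # Vl_result_l
```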
In the above embodiments, the classification process is related to the features of the video frames; for the video uploaded by the user, the video usually comprises a title, and the title often comprises characters or keywords closely related to the content, so that the classification result can be strengthened through the title of the video to be classified. As shown in fig. 7, further, the video classification method further includes:
step S701, acquiring the title of the video to be classified;
step S702, calculating a first coefficient according to the title, wherein the value of the first coefficient is related to the number of times the name of a second category appears in the title;
at this time, the step S302 further includes:
step S703, calculating a first confidence of the video to be classified as a second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
step S704, calculating a confidence level of the video to be classified as the second category corresponding to the video frame set according to the first coefficient and the first confidence level of the second category.
In step S702, performing operations such as word segmentation and keyword extraction on the titles of the videos to be classified to obtain characters or keywords matching the names of the second category, where the number of the characters or keywords may be any number. And calculating a first coefficient of the second category corresponding to the video frame set according to the number of the characters or the keywords matched with the name of the second category. Optionally, the step S702 further includes: matching the title of the video to be classified with the second category to obtain the times of the name of each second category in the title; and calculating a first coefficient of each second category according to the times, wherein the first coefficient is in inverse relation with the times. Illustratively, the first coefficient is calculated according to the following equation (8):
γ = 2 / (e^t + 1)    (8)
where t represents the number of times a character or keyword matching the name of the second category appears in the title, t ≥ 0, and t is an integer. The larger the value of t, the smaller the value of γ.
In step S703, a first confidence of the second category is calculated; this may be the confidence of the second category calculated in step S302 or in step S504. That is, the first confidence of the second category in step S703 is the Vl_result_l calculated by formula (5) or formula (7).
In step S704, the confidence that the video to be classified is of the second category corresponding to the video frame set is calculated according to the first coefficient and the first confidence of the second category. Optionally, the confidence of the second category is calculated by raising the first confidence of the second category to the power of the first coefficient. Illustratively, the confidence of the l-th second category is calculated according to: S_result_l = (Vl_result_l)^γ. The process of determining the classification result of the video to be classified then follows the description in step S105, which is not repeated here. The value γ of the first coefficient becomes smaller as t becomes larger; since Vl_result_l is a value less than 1, (Vl_result_l)^γ becomes larger as γ becomes smaller, that is, as t becomes larger. In other words, the more times a character or keyword matching the second category appears in the title, the higher the confidence of the corresponding second category.
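Formula (8) and the exponent step can be sketched directly; substring counting stands in for the word-segmentation and keyword-matching described above and is a simplification.

```python
import math

def title_coefficient(title, category_name):
    """gamma = 2 / (e^t + 1), formula (8); t counts occurrences of the
    second-category name in the title (substring matching is a simplification)."""
    t = title.count(category_name)
    return 2.0 / (math.exp(t) + 1.0)   # t = 0 gives gamma = 1; gamma shrinks as t grows

def title_adjusted_confidence(first_confidence, gamma):
    """S_result_l = (Vl_result_l)^gamma; a confidence below 1 grows as gamma shrinks."""
    return first_confidence ** gamma
```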
In the above optional embodiment, the classification result is influenced by the header, so that the classification result can be strengthened by the information contained in the header, the robustness of the final classification result is strengthened, and the classification is more accurate.
Fig. 8 shows an application scenario of the video classification method of the above embodiments: car-series classification of videos, which classifies a video by the car appearance frames and car central control frames in the video. As shown in fig. 8, first, video information including a video and its title is acquired. The video undergoes frame extraction and preprocessing, while the title is segmented into words, the number of times each car-series name appears in the title is counted, and the first coefficient is calculated. The preprocessed video frames are classified into appearance frames and central control frames; the appearance frames are input into an appearance recognition model for feature extraction and classification to obtain their first classification vectors, and the central control frames are input into a central control recognition model for feature extraction and classification to obtain their first classification vectors. Meanwhile, the appearance frames are input into a car logo detection module, which outputs the car logo confidence for each appearance frame. When the video classification result is calculated, the car logo confidence, the first classification vectors, and the first coefficient are used as calculation factors to obtain the final classification result. In this application scenario, the appearance features and the central control features of the automobile in the video are combined, and the classification result is strengthened by the car logo features and the features in the title, which increases both the recall rate and the accuracy of video classification.
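To tie fig. 8 together, the following sketch composes the earlier sketches into one pipeline. The router argument stands for the frame classification of steps S102/S201 to S203 (see first_category above) and, like the model objects and car_series_names mapping, is an assumed interface rather than the patent's implementation.

```python
def classify_car_video(video_path, title, router, appearance_model,
                       control_model, logo_detector, car_series_names, th3=0.8):
    frames = extract_frames(video_path)                   # frame extraction + preprocessing
    # router(frame) returns (crop, "exterior" | "interior"), or None to discard the frame
    routed = [r for r in (router(f) for f in frames) if r is not None]
    vectors = classify_frames(routed, appearance_model, control_model)
    logo_confs = [logo_detector(crop) if cat == "exterior" else 0.0
                  for crop, cat in routed]                # car-logo confidence per frame
    results = {}
    for j, ids in group_by_second_category(vectors).items():
        conf = weighted_category_confidence(vectors, logo_confs, ids, j)
        gamma = title_coefficient(title, car_series_names[j])
        results[j] = conf ** gamma                        # title-strengthened confidence
    return [j for j, conf in results.items() if conf > th3]
```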
The embodiment of the disclosure discloses a video classification method, which comprises the following steps: acquiring a plurality of video frames of a video to be classified; classifying the plurality of video frames to obtain a first category of the plurality of video frames, wherein the first category comprises an object external video frame and an object internal video frame; identifying the object according to the object external video frame to obtain a first classification vector of the object external video frame; identifying the object according to the object internal video frame to obtain a first classification vector of the object internal video frame; and determining a classification result of the video to be classified according to the first classification vector. The method solves the technical problem of low recall rate of video classification by combining the object external video frame and the object internal video frame.
In the above, although the steps in the above method embodiments are described in the above sequence, it should be clear to those skilled in the art that the steps in the embodiments of the present disclosure are not necessarily performed in the above sequence, and they may also be performed in other sequences such as reverse, parallel, and cross, and other sequences may also be added on the basis of the above steps, and these obvious modifications or equivalents should also be included in the protection scope of the present disclosure, and are not described herein again.
Fig. 9 is a schematic structural diagram of an embodiment of a video classification apparatus according to an embodiment of the present disclosure. As shown in fig. 9, the apparatus 900 includes: a video frame obtaining module 901, a first classification module 902, an outer first classification vector obtaining module 903, an inner first classification vector obtaining module 904, and a classification result determining module 905.
Wherein:
a video frame acquiring module 901, configured to acquire a plurality of video frames of a video to be classified;
a first classification module 902, configured to classify the plurality of video frames to obtain a first category of the plurality of video frames, where the first category includes an object external video frame and an object internal video frame;
an external first classification vector obtaining module 903, configured to identify the object according to the object external video frame to obtain a first classification vector of the object external video frame;
an internal first classification vector obtaining module 904, configured to identify the object according to the object internal video frame to obtain a first classification vector of the object internal video frame;
a classification result determining module 905, configured to determine a classification result of the video to be classified according to the first classification vector.
Further, the first classification module 902 is further configured to:
performing target detection on the video frame to obtain at least one target detection frame;
calculating the comprehensive confidence of the target frame according to the confidence of the target detection frame, the distance between the target detection frame and the center point of the video frame and the occupation ratio of the area of the target detection frame in the video frame;
and taking the category corresponding to the target detection box with the maximum comprehensive confidence as the first category of the video frame.
Furthermore, each first element in the first classification vector corresponds to a second class, and a value of the first element represents a confidence level that the video frame is of the second class corresponding to the first element.
Further, the classification result determining module 905 is further configured to:
classifying the video frame corresponding to the first classification vector into at least one second class according to the first classification vector; wherein each second category corresponds to a set of video frames;
calculating the confidence coefficient of the video to be classified into a second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
and determining the classification result of the video to be classified according to the confidence coefficient of the second category.
Further, the classification result determining module 905 is further configured to:
acquiring a clustering threshold value;
comparing the value of each first element in the first classification vector to the magnitude of the clustering threshold;
in response to the value of the first element being greater than the clustering threshold, classifying the video frame corresponding to the first classification vector into a second category corresponding to the first element.
Further, the video classification apparatus further includes:
the identification confidence determining module is used for identifying the identification of the object according to the external video frame of the object to obtain the identification confidence of the identification of the object in the external video frame of the object;
the weight calculation module is used for calculating the weight value of the external video frame of the object according to the identification confidence;
wherein, the classification result determining module 905 is further configured to:
according to the weight value of the object external video frame, carrying out weighted calculation on a first classification vector of the object external video frame in a video frame set to obtain a weighted first classification vector;
and calculating the confidence coefficient of the video to be classified as the second category corresponding to the video frame set according to the weighted first classification vector.
Further, the weight calculation module is further configured to:
acquiring a weight threshold;
when the identification confidence is greater than or equal to the weight threshold, calculating a weight value of the external video frame of the object according to the weight threshold and the identification confidence;
when the identification confidence is smaller than the weight threshold, setting a preset weight value as the weight value of the external video frame of the object; wherein the preset weight value is less than or equal to the calculated weight value.
Further, the video classification apparatus further includes:
the title acquisition module is used for acquiring the title of the video to be classified;
a first coefficient calculation module for calculating a first coefficient according to the title, a value of the first coefficient being related to a number of times the name of the second category appears in the title;
wherein, the classification result determining module 905 is further configured to:
calculating a first confidence coefficient of the video to be classified into a second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
and calculating the confidence coefficient of the video to be classified into the second category corresponding to the video frame set according to the first coefficient and the first confidence coefficient of the second category.
Further, the first coefficient calculation module is further configured to:
matching the title of the video to be classified with the second category to obtain the times of the name of each second category in the title;
and calculating a first coefficient of each second category according to the times, wherein the first coefficient is in inverse relation with the times.
Further, the classification result determining module 905 is further configured to:
obtaining a classification threshold value;
comparing the confidence level of the second class to the classification threshold;
and determining the second category with the confidence coefficient larger than the classification threshold value as the classification result of the video to be classified.
The apparatus shown in fig. 9 can perform the method of the embodiment shown in fig. 1-8, and the detailed description of this embodiment can refer to the related description of the embodiment shown in fig. 1-8. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1 to 8, and are not described herein again.
Referring now to FIG. 10, a block diagram of an electronic device 1000 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1001 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)1002 or a program loaded from a storage means 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are also stored. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Generally, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 1007 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 1008 including, for example, magnetic tape, hard disk, and the like; and a communication device 1009. The communication device 1009 may allow the electronic device 1000 to communicate with other devices wirelessly or by wire to exchange data. While fig. 10 illustrates an electronic device 1000 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 1009, or installed from the storage means 1008, or installed from the ROM 1002. The computer program, when executed by the processing device 1001, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: the above-described video classification method is performed.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not in some cases constitute a limitation on the element itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a video classification method including:
acquiring a plurality of video frames of a video to be classified;
classifying the plurality of video frames to obtain a first category of the plurality of video frames, wherein the first category comprises an object external video frame and an object internal video frame;
identifying the object according to the object external video frame to obtain a first classification vector of the object external video frame;
identifying the object according to the object internal video frame to obtain a first classification vector of the object internal video frame;
and determining a classification result of the video to be classified according to the first classification vector.
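By way of illustration only, the flow above can be sketched in Python. Everything in this sketch — the function names, the stub models, and the number of second categories — is an assumption standing in for the trained exterior/interior classifier and the view-specific identification models the embodiments describe; it is not the disclosed implementation.

```python
# Hypothetical sketch of the disclosed pipeline; the two stubs stand in
# for (1) the first-category (exterior/interior) classifier and (2) the
# view-specific object identifiers. Illustrative only.
import random
from typing import List, Sequence

NUM_SECOND_CATEGORIES = 5  # e.g. number of car-series labels (assumed)

def classify_first_category(frame) -> str:
    """Stub: label a video frame as object-exterior or object-interior."""
    return random.choice(["exterior", "interior"])

def identify(frame, view: str) -> List[float]:
    """Stub: return a first classification vector whose i-th element is
    the confidence that the frame shows second category i."""
    return [random.random() for _ in range(NUM_SECOND_CATEGORIES)]

def first_classification_vectors(frames: Sequence) -> List[List[float]]:
    """Run the view-appropriate identifier on each sampled frame."""
    vectors = []
    for frame in frames:
        view = classify_first_category(frame)
        vectors.append(identify(frame, view))
    return vectors
```

The later sketches in this summary consume the vectors produced here.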
Further, the classifying the plurality of video frames to obtain the first category of the plurality of video frames includes:
performing target detection on the video frame to obtain at least one target detection frame;
calculating a comprehensive confidence of the target detection frame according to the confidence of the target detection frame, the distance between the target detection frame and the center point of the video frame, and the ratio of the area of the target detection frame to the area of the video frame;
and taking the category corresponding to the target detection frame with the maximum comprehensive confidence as the first category of the video frame.
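Read together, these steps score each detection by how confident, central, and large it is. A minimal sketch follows; the equal weighting of the three signals is an assumption, since the embodiments do not fix the combination formula.

```python
# Assumed comprehensive confidence for one target detection frame:
# equal-weight average of detector score, centrality, and area ratio.
import math

def comprehensive_confidence(det_conf: float,
                             box_cx: float, box_cy: float,
                             box_area: float,
                             frame_w: float, frame_h: float) -> float:
    # closeness of the box center to the frame center, normalized to [0, 1]
    dist = math.hypot(box_cx - frame_w / 2, box_cy - frame_h / 2)
    max_dist = math.hypot(frame_w / 2, frame_h / 2)
    closeness = 1.0 - dist / max_dist
    # share of the video frame covered by the detection box
    area_ratio = box_area / (frame_w * frame_h)
    return (det_conf + closeness + area_ratio) / 3.0
```

The frame's first category is then taken from whichever detection frame maximizes this score.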
Further, each first element in the first classification vector corresponds to a second category, and the value of the first element represents a confidence that the video frame is of the second category to which the first element corresponds.
Further, the determining a classification result of the video to be classified according to the first classification vector includes:
classifying the video frame corresponding to the first classification vector into at least one second category according to the first classification vector; wherein each second category corresponds to a set of video frames;
calculating the confidence coefficient of the video to be classified as a second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
and determining the classification result of the video to be classified according to the confidence coefficient of the second category.
Further, the classifying the video frame corresponding to the first classification vector into at least one second category according to the first classification vector includes:
acquiring a clustering threshold value;
comparing the value of each first element in the first classification vector with the clustering threshold;
in response to the value of the first element being greater than the clustering threshold, classifying the video frame corresponding to the first classification vector into a second category corresponding to the first element.
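A sketch of this grouping step follows. Because every element above the clustering threshold triggers an assignment, a single frame may enter several second-category sets; the threshold value and the mean aggregation used to score a set afterwards are assumptions.

```python
# Sketch of second-category clustering and per-set confidence;
# the threshold value and mean aggregation are assumed.
from collections import defaultdict
from typing import Dict, List

def cluster_frames(vectors: List[List[float]],
                   clustering_threshold: float = 0.5) -> Dict[int, List[int]]:
    """Map each second-category index to the frames assigned to it."""
    frame_sets: Dict[int, List[int]] = defaultdict(list)
    for frame_idx, vector in enumerate(vectors):
        for cat_idx, confidence in enumerate(vector):
            if confidence > clustering_threshold:
                frame_sets[cat_idx].append(frame_idx)
    return frame_sets

def set_confidence(vectors: List[List[float]], cat_idx: int,
                   frame_indices: List[int]) -> float:
    """Score a second category from its frame set (mean is assumed)."""
    values = [vectors[i][cat_idx] for i in frame_indices]
    return sum(values) / len(values) if values else 0.0
```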
Further, the method further comprises:
identifying the identifier of the object according to the object external video frame to obtain an identification confidence of the identifier of the object in the object external video frame;
calculating a weight value of the object external video frame according to the identification confidence;
the calculating the confidence of the video to be classified into the second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set includes:
according to the weight value of the object external video frame, carrying out weighted calculation on a first classification vector of the object external video frame in a video frame set to obtain a weighted first classification vector;
and calculating the confidence coefficient of the video to be classified as the second category corresponding to the video frame set according to the weighted first classification vector.
Further, the calculating a weight value of the video frame outside the object according to the identification confidence includes:
acquiring a weight threshold;
when the identification confidence is greater than or equal to the weight threshold, calculating a first weight value of the object external video frame according to the weight threshold and the identification confidence;
when the identification confidence is smaller than the weight threshold, setting a preset weight value as the weight value of the object external video frame; wherein the preset weight value is less than or equal to the first weight value.
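One plausible reading of this weighting rule is sketched below: frames whose object identifier (for example, a car logo on an exterior shot) is recognized confidently count more. The boost formula for the above-threshold branch is an assumption; the text requires only that the preset fallback never exceed the first weight value.

```python
# Assumed exterior-frame weighting from the identification confidence;
# the preset weight (1.0) never exceeds the boosted first weight value.
from typing import List

def exterior_frame_weight(ident_conf: float,
                          weight_threshold: float = 0.6,
                          preset_weight: float = 1.0) -> float:
    if ident_conf >= weight_threshold:
        # first weight value: grows as the confidence clears the threshold
        return 1.0 + (ident_conf - weight_threshold)
    return preset_weight

def weighted_vector(vector: List[float], weight: float) -> List[float]:
    """Scale a frame's first classification vector by its weight."""
    return [weight * v for v in vector]
```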
Further, the method further comprises:
acquiring the title of the video to be classified;
calculating a first coefficient from the title, the value of the first coefficient being related to the number of times the name of the second category appears in the title;
the calculating the confidence of the video to be classified into the second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set includes:
calculating a first confidence coefficient of the video to be classified into a second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
and calculating the confidence coefficient of the video to be classified into the second category corresponding to the video frame set according to the first coefficient and the first confidence coefficient of the second category.
Further, the calculating the first coefficient according to the title includes:
matching the title of the video to be classified against the second categories to obtain the number of times the name of each second category appears in the title;
and calculating a first coefficient for each second category according to that number, wherein the first coefficient is inversely related to the number of occurrences.
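Any function that decreases as the category name recurs in the title satisfies this rule; 1 / (1 + count) is one such relation and is used below purely for illustration, as is the multiplicative combination with the first confidence.

```python
# Assumed instance of the inverse relation between the first coefficient
# and the number of title occurrences, plus an assumed combination rule.
def first_coefficient(title: str, category_name: str) -> float:
    count = title.count(category_name)
    return 1.0 / (1.0 + count)

def combined_confidence(first_conf: float, coefficient: float) -> float:
    """Combine the visual first confidence with the title coefficient;
    multiplication is an assumption, not the disclosed formula."""
    return coefficient * first_conf
```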
Further, the determining the classification result of the video to be classified according to the confidence of the second category includes:
obtaining a classification threshold value;
comparing the confidence of the second category with the classification threshold;
and determining the second category with the confidence coefficient larger than the classification threshold value as the classification result of the video to be classified.
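Putting the pieces together, the final decision reduces to a threshold over the per-category confidences. A sketch, with the classification threshold value assumed:

```python
# Sketch of the final thresholding step; 0.7 is an assumed value.
from typing import Dict, List

def classification_result(category_confidences: Dict[int, float],
                          classification_threshold: float = 0.7) -> List[int]:
    """Return every second category whose confidence clears the threshold."""
    return [cat for cat, conf in category_confidences.items()
            if conf > classification_threshold]
```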
According to one or more embodiments of the present disclosure, there is provided a video classification apparatus including:
the video frame acquisition module is used for acquiring a plurality of video frames of the video to be classified;
a first classification module, configured to classify the plurality of video frames to obtain a first class of the plurality of video frames, where the first class includes an object external video frame and an object internal video frame;
the external first classification vector acquisition module is used for identifying the object according to the external video frame of the object to obtain a first classification vector of the external video frame of the object;
the internal first classification vector acquisition module is used for identifying the object according to the object internal video frame to obtain a first classification vector of the object internal video frame;
and the classification result determining module is used for determining the classification result of the video to be classified according to the first classification vector.
Further, the first classification module is further configured to:
performing target detection on the video frame to obtain at least one target detection frame;
calculating a comprehensive confidence of the target detection frame according to the confidence of the target detection frame, the distance between the target detection frame and the center point of the video frame, and the ratio of the area of the target detection frame to the area of the video frame;
and taking the category corresponding to the target detection frame with the maximum comprehensive confidence as the first category of the video frame.
Further, each first element in the first classification vector corresponds to a second category, and the value of the first element represents a confidence that the video frame is of the second category corresponding to the first element.
Further, the classification result determining module is further configured to:
classifying the video frame corresponding to the first classification vector into at least one second category according to the first classification vector; wherein each second category corresponds to a set of video frames;
calculating the confidence coefficient of the video to be classified as a second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
and determining the classification result of the video to be classified according to the confidence coefficient of the second category.
Further, the classification result determining module is further configured to:
acquiring a clustering threshold value;
comparing the value of each first element in the first classification vector with the clustering threshold;
in response to the value of the first element being greater than the clustering threshold, classifying the video frame corresponding to the first classification vector into a second category corresponding to the first element.
Further, the video classification apparatus further includes:
the identification confidence determining module is used for identifying the identifier of the object according to the object external video frame to obtain an identification confidence of the identifier of the object in the object external video frame;
the weight calculation module is used for calculating a weight value of the object external video frame according to the identification confidence;
wherein the classification result determining module is further configured to:
according to the weight value of the object external video frame, carrying out weighted calculation on a first classification vector of the object external video frame in a video frame set to obtain a weighted first classification vector;
and calculating the confidence coefficient of the video to be classified as the second category corresponding to the video frame set according to the weighted first classification vector.
Further, the weight calculation module is further configured to:
acquiring a weight threshold;
when the identification confidence is greater than or equal to the weight threshold, calculating a first weight value of the object external video frame according to the weight threshold and the identification confidence;
when the identification confidence is smaller than the weight threshold, setting a preset weight value as the weight value of the object external video frame; wherein the preset weight value is less than or equal to the first weight value.
Further, the video classification apparatus further includes:
the title acquisition module is used for acquiring the title of the video to be classified;
a first coefficient calculation module for calculating a first coefficient according to the title, a value of the first coefficient being related to a number of times the name of the second category appears in the title;
wherein the classification result determining module is further configured to:
calculating a first confidence coefficient of the video to be classified into a second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
and calculating the confidence coefficient of the video to be classified into the second category corresponding to the video frame set according to the first coefficient and the first confidence coefficient of the second category.
Further, the first coefficient calculation module is further configured to:
matching the title of the video to be classified against the second categories to obtain the number of times the name of each second category appears in the title;
and calculating a first coefficient for each second category according to that number, wherein the first coefficient is inversely related to the number of occurrences.
Further, the classification result determining module is further configured to:
obtaining a classification threshold value;
comparing the confidence of the second category with the classification threshold;
and determining the second category with the confidence coefficient larger than the classification threshold value as the classification result of the video to be classified.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of video classification of any of the preceding first aspects.
According to one or more embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the video classification method of any of the foregoing first aspects.
The foregoing description is merely illustrative of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of the features described above, and also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure — for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (13)

1. A method for video classification, comprising:
acquiring a plurality of video frames of a video to be classified;
classifying the plurality of video frames to obtain a first category of the plurality of video frames, wherein the first category comprises an object external video frame and an object internal video frame;
identifying the object according to the object external video frame to obtain a first classification vector of the object external video frame;
identifying the object according to the video frame inside the object to obtain a first classification vector of the video frame inside the object;
and determining a classification result of the video to be classified according to the first classification vector.
2. The video classification method of claim 1, wherein said classifying the plurality of video frames to obtain a first category of the plurality of video frames comprises:
performing target detection on the video frame to obtain at least one target detection frame;
calculating a comprehensive confidence of the target detection frame according to the confidence of the target detection frame, the distance between the target detection frame and the center point of the video frame, and the ratio of the area of the target detection frame to the area of the video frame;
and taking the category corresponding to the target detection frame with the maximum comprehensive confidence as the first category of the video frame.
3. The method for video classification according to claim 1, wherein each first element in said first classification vector corresponds to a second category, and a value of said first element represents a confidence level that said video frame is of said second category to which said first element corresponds.
4. The video classification method according to claim 3, wherein said determining a classification result of the video to be classified according to the first classification vector comprises:
classifying the video frame corresponding to the first classification vector into at least one second category according to the first classification vector; wherein each second category corresponds to a set of video frames;
calculating the confidence coefficient of the video to be classified as a second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
and determining the classification result of the video to be classified according to the confidence coefficient of the second category.
5. The video classification method according to claim 4, wherein said classifying the video frame corresponding to the first classification vector into at least one second category according to the first classification vector comprises:
acquiring a clustering threshold value;
comparing the value of each first element in the first classification vector with the clustering threshold;
in response to the value of the first element being greater than the clustering threshold, classifying the video frame corresponding to the first classification vector into a second category corresponding to the first element.
6. The video classification method of claim 4, characterized in that the method further comprises:
identifying the identifier of the object according to the object external video frame to obtain an identification confidence of the identifier of the object in the object external video frame;
calculating a weight value of the object external video frame according to the identification confidence;
the calculating, according to the first classification vector of each video frame in the video frame set, the confidence that the video to be classified is of the second category corresponding to the video frame set includes:
according to the weight value of the object external video frame, carrying out weighted calculation on a first classification vector of the object external video frame in a video frame set to obtain a weighted first classification vector;
and calculating the confidence coefficient of the video to be classified as the second category corresponding to the video frame set according to the weighted first classification vector.
7. The video classification method of claim 6, wherein said calculating a weight value for the video frame outside the object according to the identification confidence comprises:
acquiring a weight threshold;
when the identification confidence is greater than or equal to the weight threshold, calculating a first weight value of the object external video frame according to the weight threshold and the identification confidence;
when the identification confidence is smaller than the weight threshold, setting a preset weight value as the weight value of the object external video frame; wherein the preset weight value is less than or equal to the first weight value.
8. The video classification method of claim 4, characterized in that the method further comprises:
acquiring the title of the video to be classified;
calculating a first coefficient from the title, the value of the first coefficient being related to the number of times the name of the second category appears in the title;
the calculating the confidence of the video to be classified into the second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set includes:
calculating a first confidence coefficient of the video to be classified into a second category corresponding to the video frame set according to the first classification vector of each video frame in the video frame set;
and calculating the confidence coefficient of the video to be classified into the second category corresponding to the video frame set according to the first coefficient and the first confidence coefficient of the second category.
9. The video classification method according to claim 8, wherein said calculating a first coefficient from said title comprises:
matching the title of the video to be classified against the second categories to obtain the number of times the name of each second category appears in the title;
and calculating a first coefficient for each second category according to that number, wherein the first coefficient is inversely related to the number of occurrences.
10. The video classification method according to any one of claims 4 to 9, wherein said determining a classification result of the video to be classified according to the confidence of the second class comprises:
obtaining a classification threshold value;
comparing the confidence of the second category with the classification threshold;
and determining the second category with the confidence coefficient larger than the classification threshold value as the classification result of the video to be classified.
11. A video classification apparatus, comprising:
the video frame acquisition module is used for acquiring a plurality of video frames of the video to be classified;
a first classification module, configured to classify the plurality of video frames to obtain a first class of the plurality of video frames, where the first class includes an object external video frame and an object internal video frame;
the external first classification vector acquisition module is used for identifying the object according to the external video frame of the object to obtain a first classification vector of the external video frame of the object;
the internal first classification vector acquisition module is used for identifying the object according to the object internal video frame to obtain a first classification vector of the object internal video frame;
and the classification result determining module is used for determining the classification result of the video to be classified according to the first classification vector.
12. An electronic device, comprising:
a memory for storing computer readable instructions; and
a processor for executing the computer readable instructions, such that the processor, when executing the instructions, implements the method of any of claims 1-10.
13. A non-transitory computer readable storage medium storing computer readable instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1-10.
CN202011509800.8A 2020-12-18 2020-12-18 Video classification method and device, electronic equipment and computer-readable storage medium Pending CN114648713A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011509800.8A CN114648713A (en) 2020-12-18 2020-12-18 Video classification method and device, electronic equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011509800.8A CN114648713A (en) 2020-12-18 2020-12-18 Video classification method and device, electronic equipment and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN114648713A (en) 2022-06-21

Family

ID=81990548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011509800.8A Pending CN114648713A (en) 2020-12-18 2020-12-18 Video classification method and device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114648713A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953722A (en) * 2023-03-03 2023-04-11 阿里巴巴(中国)有限公司 Processing method and device for video classification task


Similar Documents

Publication Publication Date Title
CN111476309B (en) Image processing method, model training method, device, equipment and readable medium
KR102576344B1 (en) Method and apparatus for processing video, electronic device, medium and computer program
CN110674349B (en) Video POI (Point of interest) identification method and device and electronic equipment
CN111931859B (en) Multi-label image recognition method and device
CN112364829B (en) Face recognition method, device, equipment and storage medium
CN112232311B (en) Face tracking method and device and electronic equipment
CN110399847B (en) Key frame extraction method and device and electronic equipment
CN111738316B (en) Zero sample learning image classification method and device and electronic equipment
CN115861400B (en) Target object detection method, training device and electronic equipment
CN112712036A (en) Traffic sign recognition method and device, electronic equipment and computer storage medium
CN115393606A (en) Method and system for image recognition
CN114898154A (en) Incremental target detection method, device, equipment and medium
CN114648713A (en) Video classification method and device, electronic equipment and computer-readable storage medium
CN113255501A (en) Method, apparatus, medium, and program product for generating form recognition model
CN114648712B (en) Video classification method, device, electronic equipment and computer readable storage medium
CN110781809A (en) Identification method and device based on registration feature update and electronic equipment
CN111832354A (en) Target object age identification method and device and electronic equipment
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN110704679B (en) Video classification method and device and electronic equipment
CN114428867A (en) Data mining method and device, storage medium and electronic equipment
CN113705643A (en) Target detection method and device and electronic equipment
CN115147434A (en) Image processing method, device, terminal equipment and computer readable storage medium
CN112214639A (en) Video screening method, video screening device and terminal equipment
CN112561956A (en) Video target tracking method and device, electronic equipment and storage medium
CN113111692A (en) Target detection method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: Tiktok vision (Beijing) Co.,Ltd.