CN113033707A - Video classification method and device, readable medium and electronic equipment


Info

Publication number
CN113033707A
Authority
CN
China
Legal status
Granted
Application number
CN202110450256.2A
Other languages
Chinese (zh)
Other versions
CN113033707B (en)
Inventor
杜正印
李伟健
王长虎
Current Assignee
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110450256.2A
Publication of CN113033707A
Application granted
Publication of CN113033707B
Legal status: Active

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
              • G06F18/25 Fusion techniques
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
              • G06F16/75 Clustering; Classification
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V20/00 Scenes; Scene-specific elements
            • G06V20/40 Scenes; Scene-specific elements in video content
              • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure relates to a video classification method and apparatus, a readable medium, and an electronic device. The method includes: acquiring a target video; inputting the target video into each of a plurality of video classification models to determine a plurality of groups of classification prediction results corresponding to the target video; determining a target video feature of the target video according to the plurality of groups of classification prediction results; and inputting the target video feature into a pre-trained fusion classification model to determine a classification label corresponding to the target video. With this technical solution, the classification prediction results of the plurality of video classification models are fused into the video feature of the target video, and the video is then classified according to this newly obtained feature. Richer classification information about the target video is thereby obtained, which improves both the effectiveness of the video classification task and the accuracy of video classification.

Description

Video classification method and device, readable medium and electronic equipment
Technical Field
The present disclosure relates to the field of video technologies, and in particular, to a video classification method, an apparatus, a readable medium, and an electronic device.
Background
Video tag identification and classification are basic technologies for video content platforms, with many applications such as detecting and analyzing content on the platform and recommending and searching platform content. At present, video tags are mainly identified with supervised machine learning methods: a model learns automatically from labeled data in an end-to-end manner and is then used to make predictions on new data. However, a single machine learning model has limited ability to extract video features and cannot achieve more accurate classification.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a video classification method, including:
acquiring a target video;
inputting the target video into each of a plurality of video classification models, and determining a plurality of groups of classification prediction results corresponding to the target video;
determining a target video feature of the target video according to the plurality of groups of classification prediction results;
and inputting the target video feature into a pre-trained fusion classification model, and determining a classification label corresponding to the target video.
In a second aspect, the present disclosure provides a video classification apparatus, the apparatus comprising:
a first acquisition module, configured to acquire a target video;
a first determining module, configured to input the target video into each of a plurality of video classification models and determine a plurality of groups of classification prediction results corresponding to the target video;
a second determining module, configured to determine a target video feature of the target video according to the plurality of groups of classification prediction results;
and a third determining module, configured to input the target video feature into a pre-trained fusion classification model and determine a classification label corresponding to the target video.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
a processing device, configured to execute the computer program in the storage device to implement the steps of the method of the first aspect.
With the above technical solution, each of the plurality of video classification models produces a classification prediction result for the target video, which serves as classification feature data of the target video. These prediction results are fused into the video feature of the target video, and the video is then classified according to this newly obtained feature. The target video is therefore described by richer classification information, which improves the effectiveness of the video classification task and the accuracy of video classification.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
Fig. 1 is a flowchart illustrating a video classification method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating a video classification method according to still another exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a method for training the fusion classification model in a video classification method according to still another exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating a structure of a video classification apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram illustrating a structure of a video classification apparatus according to still another exemplary embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting; those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart illustrating a video classification method according to an exemplary embodiment of the present disclosure. As shown in fig. 1, the method includes steps 101 to 104.
In step 101, a target video is acquired. The target video may be any video that needs to be classified with tags, for example a short video published by a user on a short-video platform, or a long video published by a user on another video platform.
In step 102, the target video is input into each of a plurality of video classification models, and a plurality of groups of classification prediction results corresponding to the target video are determined.
The video classification models may be classification models for any domain, and there may be any number of them. For example, there may be ten models, video classification model 1 to video classification model 10: video classification model 1 determines which of classes 1 to 10 the target video belongs to according to the probabilities that the target video belongs to each of classes 1 to 10; video classification model 2 does the same for classes 11 to 20; and so on, up to video classification model 10, which does the same for classes 91 to 100.
Each group of classification prediction results may consist of the predicted class output by a video classification model, of the probabilities (output by that model) that the target video belongs to each of the model's classes, or of both the probabilities and the predicted class.
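As a minimal sketch of the three forms a group of classification prediction results may take, the Python below stores probabilities, a predicted class, or both, and derives the predicted class by taking the most probable class when only probabilities are given. The `PredictionResult` type, the `classes_of_model` helper, and the argmax rule are illustrative assumptions that follow the ten-models-by-ten-classes example above; they are not an interface defined by the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PredictionResult:
    """One group of classification prediction results from one model.

    The text allows probabilities only, a predicted class only, or both.
    """
    class_ids: List[int]                          # classes this model covers
    probabilities: Optional[List[float]] = None   # one value per entry in class_ids
    predicted_class: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumption: if only probabilities are given, take the most probable class.
        if self.predicted_class is None and self.probabilities is not None:
            best = max(range(len(self.probabilities)),
                       key=self.probabilities.__getitem__)
            self.predicted_class = self.class_ids[best]


def classes_of_model(k: int, width: int = 10) -> List[int]:
    """Hypothetical helper: class IDs of model k (model 1 -> 1..10, model 2 -> 11..20)."""
    return list(range(width * (k - 1) + 1, width * k + 1))


# Model 2 covers classes 11..20; its third class (13) is the most probable.
result = PredictionResult(
    class_ids=classes_of_model(2),
    probabilities=[0.05, 0.05, 0.4, 0.1, 0.1, 0.05, 0.05, 0.1, 0.05, 0.05],
)
print(result.predicted_class)  # 13
```

Keeping the probabilities rather than only the predicted class matters for the feature construction in step 103, which splices the probability values themselves.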
In step 103, the target video feature of the target video is determined according to the multiple groups of classification prediction results.
The target video feature may be determined from the multiple groups of classification prediction results in various ways, for example by directly concatenating (concat) all of the groups into a single array, or by concatenating only a selected part of the classification prediction results into a feature array.
When each classification prediction result is the set of probabilities that the target video belongs to each class of the corresponding video classification model, the target video feature may be determined as follows: the multiple groups of classification prediction results are concatenated in a preset order into an N-dimensional array, where N is determined by the total number of classes across all the video classification models, and the N-dimensional array is used as the target video feature of the target video. For example, suppose there are four video classification models: model 1 outputs probabilities for three classes (0.7, 0.2, 0.1), model 2 for three classes (0.03, 0.67, 0.3), model 3 for two classes (0.7, 0.3), and model 4 for four classes (0.01, 0.8, 0.09, 0.1). If the preset order follows the model numbers 1 to 4, concatenation yields the 12-dimensional target video feature (0.7, 0.2, 0.1, 0.03, 0.67, 0.3, 0.7, 0.3, 0.01, 0.8, 0.09, 0.1).
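The concatenation described above can be sketched as follows, assuming each model's output is available as a list of probabilities keyed by a hypothetical model name. The optional `keep` argument (also an assumption) covers the variant in which only part of the prediction results is spliced into the feature. With the four-model example, N = 3 + 3 + 2 + 4 = 12.

```python
from typing import Dict, List, Optional, Sequence


def build_target_feature(
    results: Dict[str, Sequence[float]],
    order: Sequence[str],
    keep: Optional[Dict[str, Sequence[int]]] = None,
) -> List[float]:
    """Concatenate per-model probability vectors in a preset order.

    results: model name -> probabilities over that model's classes.
    order:   the preset order; any fixed order works if training and
             inference use the same one.
    keep:    optional model name -> indices to keep (partial splicing).
    """
    feature: List[float] = []
    for name in order:
        probs = list(results[name])
        if keep is not None and name in keep:
            probs = [probs[i] for i in keep[name]]
        feature.extend(probs)
    return feature


results = {
    "model_1": [0.7, 0.2, 0.1],
    "model_2": [0.03, 0.67, 0.3],
    "model_3": [0.7, 0.3],
    "model_4": [0.01, 0.8, 0.09, 0.1],
}
order = ["model_1", "model_2", "model_3", "model_4"]
feature = build_target_feature(results, order)
print(len(feature), feature)
# 12 [0.7, 0.2, 0.1, 0.03, 0.67, 0.3, 0.7, 0.3, 0.01, 0.8, 0.09, 0.1]
```

The preset order is what gives each dimension a stable meaning for the fusion model, so it must not change between training and inference.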
In step 104, the target video features are input into a pre-trained fusion classification model, and a classification label corresponding to the target video is determined.
During model training, each training sample video is processed in the same way as in steps 101 to 103: its video feature is obtained through the same processing steps and then input into the fusion classification model for training, which yields the trained fusion classification model.
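Step 104 and the training note together require that training and inference build features identically. A minimal sketch under stated assumptions: each video classification model is represented as a plain Python callable returning probabilities, videos are dicts with made-up fields, and the fusion model is any object with `fit`/`predict`. The 1-nearest-neighbour class is only a toy stand-in so the example runs; it is not the fusion model the disclosure names.

```python
from typing import Callable, List, Protocol, Sequence

# Hypothetical interface: a video classification model maps a video to
# probabilities over its own classes (real models would decode frames).
VideoModel = Callable[[dict], List[float]]


class FusionModel(Protocol):
    def fit(self, X: List[List[float]], y: List[int]) -> None: ...
    def predict(self, X: List[List[float]]) -> List[int]: ...


def extract_feature(video: dict, models: Sequence[VideoModel]) -> List[float]:
    """Steps 102-103: run every model, concatenate outputs in a fixed order."""
    feature: List[float] = []
    for model in models:
        feature.extend(model(video))
    return feature


def train_fusion(videos: Sequence[dict], labels: Sequence[int],
                 models: Sequence[VideoModel], fusion: FusionModel) -> FusionModel:
    """Training reuses exactly the same feature extraction as inference."""
    fusion.fit([extract_feature(v, models) for v in videos], list(labels))
    return fusion


def classify(video: dict, models: Sequence[VideoModel], fusion: FusionModel) -> int:
    """Step 104: the trained fusion model labels the fused feature."""
    return fusion.predict([extract_feature(video, models)])[0]


class NearestNeighbour:
    """Toy 1-NN stand-in for the fusion model, for illustration only."""

    def fit(self, X: List[List[float]], y: List[int]) -> None:
        self.X, self.y = X, y

    def predict(self, X: List[List[float]]) -> List[int]:
        def dist(a: List[float], b: List[float]) -> float:
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [self.y[min(range(len(self.X)), key=lambda i: dist(x, self.X[i]))]
                for x in X]


models = [lambda v: [v["a"], 1 - v["a"]], lambda v: [v["b"], 1 - v["b"]]]
fusion = train_fusion([{"a": 0.9, "b": 0.8}, {"a": 0.1, "b": 0.2}], [1, 0],
                      models, NearestNeighbour())
print(classify({"a": 0.85, "b": 0.7}, models, fusion))  # 1
```

Routing both paths through one `extract_feature` function is a simple way to guarantee that the preset order and the set of models cannot drift between training and serving.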
With this technical solution, the prediction results of the plurality of video classification models are fused into the video feature of the target video, and the video is classified according to this fused feature, so that richer classification information about the target video is obtained and the accuracy of video classification is improved.
Fig. 2 is a flowchart illustrating a video classification method according to still another exemplary embodiment of the present disclosure. As shown in Fig. 2, the method further includes steps 201 and 202.
In step 201, video attribute information of the target video is obtained, where the video attribute information includes at least one of video duration, video author, and number of fans of the video author.
In step 202, the target video feature of the target video is determined according to the multiple groups of classification prediction results and the video attribute information.
Besides the video duration, the video author, and the number of fans of the video author, the video attribute information may include other attributes. The present disclosure does not limit its content, provided that it is information inherent to the target video.
The target video feature can likewise be determined from the multiple groups of classification prediction results and the video attribute information in several ways. Suppose the classification prediction result is the probability that the target video belongs to each class of each video classification model, and the target video feature is determined from the prediction results by concatenating them in a preset order into an N-dimensional array, where N is determined by the total number of classes across the plurality of video classification models. In that case, the target video feature can be determined from the prediction results and the video attribute information by treating each item of video attribute information as one additional dimension of the array and concatenating these dimensions with the N-dimensional data corresponding to the classification prediction results to form a new array.
For example, if the video attribute information includes the video duration, the video author, and the number of fans of the video author, the data corresponding to the video duration may be (60), indicating a duration of 60 seconds; the data corresponding to the video author may be (12345), the ID of the video author; and the data corresponding to the number of fans may be (100), indicating that the author has 1,000,000 (100 × 10,000) fans. Continuing the example above, if the target video feature determined from the multiple groups of classification prediction results is (0.7, 0.2, 0.1, 0.03, 0.67, 0.3, 0.7, 0.3, 0.01, 0.8, 0.09, 0.1), the target video feature determined from the multiple groups of classification prediction results and the video attribute information may be (0.7, 0.2, 0.1, 0.03, 0.67, 0.3, 0.7, 0.3, 0.01, 0.8, 0.09, 0.1, 0.45, 0.60, 0.45, ...).
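A sketch of splicing attribute dimensions onto the N prediction dimensions, under assumptions: the field names are hypothetical, and since the text does not say whether or how attribute values are encoded or scaled before splicing, the raw values are appended by default and an optional per-attribute `encoders` hook is provided. An identifier such as the author ID is really categorical, so treating it as a single number is a simplification carried over from the example, not a recommendation.

```python
from typing import Callable, Dict, List, Mapping, Optional, Sequence

# Hypothetical field names for the three attributes named in the text.
ATTRIBUTE_ORDER = ("duration_s", "author_id", "fans_10k")


def append_attributes(
    pred_feature: Sequence[float],
    attrs: Mapping[str, float],
    order: Sequence[str] = ATTRIBUTE_ORDER,
    encoders: Optional[Dict[str, Callable[[float], float]]] = None,
) -> List[float]:
    """Append one dimension per attribute after the N prediction dimensions.

    `encoders` is an optional, hypothetical per-attribute transform; by
    default the raw values are appended unchanged.
    """
    encoders = encoders or {}
    extra = [float(encoders.get(k, lambda v: v)(attrs[k])) for k in order]
    return list(pred_feature) + extra


pred_feature = [0.7, 0.2, 0.1, 0.03, 0.67, 0.3, 0.7, 0.3, 0.01, 0.8, 0.09, 0.1]
attrs = {"duration_s": 60, "author_id": 12345, "fans_10k": 100}
feature = append_attributes(pred_feature, attrs)
print(len(feature), feature[12:])  # 15 [60.0, 12345.0, 100.0]
```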
Likewise, when the fusion classification model is trained, each training sample video is processed in the same way as in steps 101, 102, 201, and 202 shown in Fig. 2 to obtain its video feature, which is then input into the fusion classification model for training, yielding the trained fusion classification model.
With this technical solution, classification information about the video is obtained from the plurality of video classification models, and the attribute information inherent in the video is also included in the target video feature. Even richer information about the video is therefore available, which further improves the effectiveness of the video classification task and the accuracy of video classification.
In a possible implementation, the video classification models include a first classification model, and the video categories corresponding to the plurality of first target classifications included in the first classification model are not all identical. The first target classifications are the classes into which the first classification model can classify the target video. For example, they may include common first-level classes such as fun, food, fashion, travel, family, cars, games, music, and science and technology, and may also include second-level classes in some specific fields, such as mobile games, browser games, and client games. That is, the first target classifications may consist entirely of first-level classes, or may include some second-level classes together with some first-level classes. The video category corresponding to a first-level class is that class itself; for example, the video category corresponding to the game class is games. The video category corresponding to a second-level class is the category of the first-level class to which it belongs; for example, mobile games, browser games, and client games all correspond to the games category of the first-level class "games". When the first target classifications are all first-level classes, their corresponding video categories are pairwise different; when they include both first-level and second-level classes, their corresponding video categories are not all identical, although some of them may coincide.
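To make the difference between "pairwise different" and "not all identical" concrete, the sketch below maps each class to its video category and tests both properties. The taxonomy dictionary and function names are hypothetical, built only from the example classes named above.

```python
from typing import Dict, Iterable

# Hypothetical two-level taxonomy: second-level class -> its first-level class.
PARENT: Dict[str, str] = {
    "mobile games": "games",
    "browser games": "games",
    "client games": "games",
}


def video_category(cls: str) -> str:
    """A first-level class is its own category; a second-level class maps
    to the category of the first-level class it belongs to."""
    return PARENT.get(cls, cls)


def categories_pairwise_distinct(classes: Iterable[str]) -> bool:
    cats = [video_category(c) for c in classes]
    return len(cats) == len(set(cats))


def categories_not_all_identical(classes: Iterable[str]) -> bool:
    return len({video_category(c) for c in classes}) > 1


first_level_only = ["fun", "food", "fashion", "games", "music"]
mixed = ["fun", "food", "mobile games", "browser games"]
print(categories_pairwise_distinct(first_level_only))  # True
print(categories_pairwise_distinct(mixed))             # False
print(categories_not_all_identical(mixed))             # True
```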
In a possible implementation, the video classification models include a second classification model, and the plurality of second target classifications included in the second classification model all belong to the same video category; there are a plurality of second classification models, and each corresponds to a different video category. A second classification model may be, for example, a vertical (domain-specific) classification model that corresponds to a single video category, with all of its second target classifications belonging to that category. For example, the video category may be any of the first-level classes mentioned above, such as games, and the second target classifications may be the second-level classes within games, such as mobile games, browser games, and client games. The video categories corresponding to the plurality of second classification models differ from one another: for example, there may be one second classification model for games, one for food, one for cars, and so on.
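The two constraints on second (vertical) classification models can be expressed as a small validation check. This is an illustrative sketch: `VerticalModelSpec`, the function name, and the food sub-classes are invented for the example; only the game sub-classes come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class VerticalModelSpec:
    """Hypothetical description of one second (vertical) classification model."""
    category: str        # the single video category this model serves
    classes: List[str]   # its second target classifications


def is_valid_vertical_setup(specs: Sequence[VerticalModelSpec],
                            parent: Dict[str, str]) -> bool:
    """Check the two constraints stated for second classification models."""
    seen = set()
    for spec in specs:
        # Every second target classification lies in the model's own category.
        if any(parent.get(cls) != spec.category for cls in spec.classes):
            return False
        # Different second classification models serve different categories.
        if spec.category in seen:
            return False
        seen.add(spec.category)
    return True


# The food sub-classes are invented placeholders.
parent = {"mobile games": "games", "browser games": "games",
          "client games": "games", "home cooking": "food", "street food": "food"}
specs = [
    VerticalModelSpec("games", ["mobile games", "browser games", "client games"]),
    VerticalModelSpec("food", ["home cooking", "street food"]),
]
print(is_valid_vertical_setup(specs, parent))  # True
```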
In a possible implementation, the video classification models include a third classification model containing a plurality of third target classifications, where the correlation between any two videos belonging to the same third target classification is lower than a first preset threshold. The third classification model may be a weakly correlated model, such as a painting model, in which the content of any two videos assigned to the same class is only weakly correlated; even so, its output still represents part of the feature information of the videos. The first preset threshold may be set according to actual conditions, as long as the third classification model is a weakly correlated classification model.
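The disclosure does not say how the correlation between two videos is measured. Assuming it is the cosine similarity between content embeddings of the videos (an assumption, as are the embeddings themselves), the weak-correlation condition for a third classification model can be checked by comparing the largest within-class pairwise similarity against the first preset threshold.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def max_within_class_similarity(embeddings: Dict[str, List[float]],
                                labels: Dict[str, str]) -> float:
    """Highest pairwise similarity between two videos sharing a class
    (-1.0 if no two videos share a class)."""
    best = -1.0
    for u, v in combinations(sorted(embeddings), 2):
        if labels[u] == labels[v]:
            best = max(best, cosine(embeddings[u], embeddings[v]))
    return best


def is_weakly_correlated(embeddings: Dict[str, List[float]],
                         labels: Dict[str, str], threshold: float) -> bool:
    """All same-class pairs must fall below the first preset threshold."""
    return max_within_class_similarity(embeddings, labels) < threshold


# Hypothetical embeddings for four videos assigned to two classes.
embeddings = {
    "v1": [1.0, 0.0, 0.0], "v2": [0.0, 1.0, 0.0],   # both in class "A"
    "v3": [0.0, 0.0, 1.0], "v4": [0.6, 0.0, 0.8],   # both in class "B"
}
labels = {"v1": "A", "v2": "A", "v3": "B", "v4": "B"}
print(is_weakly_correlated(embeddings, labels, threshold=0.9))  # True
```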
The video classification models may include one or more of the first, second, and third classification models, and may also include a plurality of second classification models and/or a plurality of third classification models. Each classification model extracts additional feature information from the target video, which further improves the effectiveness of the video classification task and the accuracy of video classification.
Fig. 3 is a flowchart illustrating a method for training the fusion classification model in a video classification method according to still another exemplary embodiment of the present disclosure. As shown in Fig. 3, the method includes steps 301 to 304.
In step 301, a training sample video is acquired.
In step 302, the training sample video is input into each of the plurality of video classification models to determine a plurality of groups of classification prediction results corresponding to the training sample video, and a classification label of the training sample video is determined according to the classification prediction result of the first classification model.
In step 303, sample video features of the training sample video are determined according to the multiple groups of classification prediction results.
In step 304, the sample video features are input into the fusion classification model to train the fusion classification model.
When the video classification models include the first classification model, the classification result output by the first classification model can be used directly as the classification label of the training sample video. Consequently, a large number of unlabeled videos can be used as training sample videos to train the fusion classification model.
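A sketch of steps 301 to 303 on unlabeled videos, where the first classification model's output supplies the label. It assumes the "classification result" is the index of the first model's most probable class, and it represents every model as a callable returning probabilities; both are assumptions made for illustration. The first model's probabilities are included in the feature because step 302 feeds the training sample video into all of the video classification models.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: video -> probabilities over the model's classes.
Model = Callable[[dict], List[float]]


def build_pseudo_labelled_set(
    videos: Sequence[dict],
    first_model: Model,
    other_models: Sequence[Model],
) -> Tuple[List[List[float]], List[int]]:
    """Turn unlabeled videos into (feature, label) training pairs.

    Label: argmax of the first classification model's probabilities.
    Feature: every model's probabilities, first model included, concatenated.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for video in videos:
        first_probs = first_model(video)
        label = max(range(len(first_probs)), key=first_probs.__getitem__)
        feature = list(first_probs)
        for model in other_models:
            feature.extend(model(video))
        X.append(feature)
        y.append(label)
    return X, y


first = lambda v: [v["p"], 1 - v["p"]]
others = [lambda v: [v["q"]]]
X, y = build_pseudo_labelled_set(
    [{"p": 0.8, "q": 0.3}, {"p": 0.2, "q": 0.9}], first, others)
print(y)  # [0, 1]
```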
The fusion classification model may be a GBDT (gradient boosting decision tree), a DNN (deep neural network), or any other supervised machine learning model. The present disclosure limits neither the type of the fusion classification model nor the specific models used as the plurality of video classification models.
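Since any supervised model can fill the fusion slot, here is a dependency-free stand-in: multinomial logistic regression trained by batch gradient descent, which can be viewed as a DNN with no hidden layers. It is a sketch, not the GBDT or DNN the text names; the learning rate, epoch count, and toy data are arbitrary choices.

```python
import math
from typing import List, Sequence


class SoftmaxRegression:
    """Minimal multinomial logistic regression (stand-in fusion model)."""

    def __init__(self, n_classes: int, lr: float = 0.5, epochs: int = 500):
        self.n_classes, self.lr, self.epochs = n_classes, lr, epochs
        self.W: List[List[float]] = []
        self.b: List[float] = []

    def _probs(self, x: Sequence[float]) -> List[float]:
        logits = [sum(w * xi for w, xi in zip(row, x)) + bk
                  for row, bk in zip(self.W, self.b)]
        m = max(logits)                       # subtract max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "SoftmaxRegression":
        d, n = len(X[0]), len(X)
        self.W = [[0.0] * d for _ in range(self.n_classes)]
        self.b = [0.0] * self.n_classes
        for _ in range(self.epochs):
            gW = [[0.0] * d for _ in range(self.n_classes)]
            gb = [0.0] * self.n_classes
            for x, label in zip(X, y):
                p = self._probs(x)
                for k in range(self.n_classes):
                    err = p[k] - (1.0 if k == label else 0.0)  # dLoss/dlogit_k
                    gb[k] += err
                    for j in range(d):
                        gW[k][j] += err * x[j]
            # Average the gradient over the batch and take one step.
            for k in range(self.n_classes):
                self.b[k] -= self.lr * gb[k] / n
                for j in range(d):
                    self.W[k][j] -= self.lr * gW[k][j] / n
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [max(range(self.n_classes), key=self._probs(x).__getitem__) for x in X]


# Toy fused features (concatenated probabilities) with two labels.
X = [[0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.7, 0.3],
     [0.1, 0.9, 0.2, 0.8], [0.2, 0.8, 0.3, 0.7]]
y = [0, 0, 1, 1]
model = SoftmaxRegression(n_classes=2).fit(X, y)
print(model.predict([[0.85, 0.15, 0.75, 0.25], [0.15, 0.85, 0.25, 0.75]]))  # [0, 1]
```

Because the input features are already probabilities in [0, 1], a simple linear model like this needs no feature scaling; once raw attribute values such as a fan count are appended, scaling would matter for gradient-based models, whereas tree-based models such as GBDT are insensitive to it.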
The training method may further include acquiring video attribute information of the training sample video. In that case, determining the sample video features of the training sample video in step 303 may include determining them according to the multiple groups of classification prediction results and the video attribute information of the training sample video.
Fig. 4 is a block diagram illustrating a structure of a video classification apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 4, the apparatus includes: a first obtaining module 10, configured to obtain a target video; a first determining module 20, configured to input the target video into a plurality of video classification models respectively, and determine a plurality of groups of classification prediction results corresponding to the target video; a second determining module 30, configured to determine a target video feature of the target video according to the multiple groups of classification prediction results; and a third determining module 40, configured to input the target video features into a pre-trained fusion classification model, and determine a classification label corresponding to the target video.
This apparatus achieves the same beneficial effects as the method described above: the classification prediction results of the plurality of video classification models are fused into the video feature of the target video, which provides richer classification information and improves the accuracy of video classification.
Fig. 5 is a block diagram illustrating a structure of a video classification apparatus according to still another exemplary embodiment of the present disclosure. As shown in Fig. 5, the apparatus further includes a second obtaining module 50, configured to obtain video attribute information of the target video, where the video attribute information includes at least one of the video duration, the video author, and the number of fans of the video author; and the second determining module 30 is further configured to determine the target video feature of the target video according to the multiple groups of classification prediction results and the video attribute information.
In a possible implementation manner, the video classification models include a first classification model, and video categories corresponding to a plurality of first target classifications included in the first classification model are not identical.
In a possible implementation, the video classification models include a second classification model, and the plurality of second target classifications included in the second classification model all belong to the same video category; the video classification models include a plurality of such second classification models, each corresponding to a different video category.
In a possible implementation, the video classification models include a third classification model that includes a plurality of third target classifications, where the correlation between any two videos belonging to the same third target classification is lower than a first preset threshold.
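The constraint on the third classification model can be checked for a candidate set of classes. The disclosure does not define how the correlation between two videos is measured; the sketch below assumes each video is represented by an embedding vector and uses cosine similarity as the correlation measure. `classes_meeting_threshold` and the toy data are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as an assumed correlation measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classes_meeting_threshold(embeddings: Dict[str, List[Sequence[float]]],
                              threshold: float) -> List[str]:
    """Return the classes in which every pair of videos correlates below `threshold`."""
    return [label for label, vectors in embeddings.items()
            if all(cosine(u, v) < threshold for u, v in combinations(vectors, 2))]


toy = {"loose": [[1.0, 0.0], [0.0, 1.0]],   # orthogonal videos: similarity 0
       "tight": [[1.0, 0.0], [1.0, 0.1]]}   # near-duplicates: similarity ~0.995
print(classes_meeting_threshold(toy, threshold=0.5))  # ['loose']
```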
In a possible implementation, the classification prediction results are the probabilities that the target video belongs to each of the classifications included in each of the video classification models; the second determining module 30 is further configured to: combine the multiple groups of classification prediction results into an N-dimensional array in a preset order, where N is determined according to the total number of classifications across all of the video classification models; and determine the N-dimensional array as the target video feature of the target video.
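The combination step can be sketched as follows, assuming each model's output is available as a list of probabilities keyed by a hypothetical model name, and that N equals the sum of the class counts (the simplest reading of "determined according to the total number of classifications"). The fixed `model_order` plays the role of the preset order; `build_target_feature` is a hypothetical name.

```python
from typing import List, Mapping, Sequence


def build_target_feature(predictions: Mapping[str, Sequence[float]],
                         model_order: Sequence[str],
                         num_classes: Mapping[str, int]) -> List[float]:
    """Concatenate per-model class probabilities into one N-element array.

    `model_order` is the preset order; N is taken to be the sum of the
    class counts of all the video classification models.
    """
    n = sum(num_classes[name] for name in model_order)
    feature: List[float] = []
    for name in model_order:
        probs = predictions[name]
        if len(probs) != num_classes[name]:
            raise ValueError(f"{name}: expected {num_classes[name]} values, got {len(probs)}")
        feature.extend(probs)
    assert len(feature) == n  # holds by construction once every length matches
    return feature


# Hypothetical models: a 3-class first model and a 2-class "sports" second model.
preds = {"second_sports": [0.7, 0.3], "first": [0.1, 0.6, 0.3]}
print(build_target_feature(preds, ["first", "second_sports"],
                           {"first": 3, "second_sports": 2}))
# [0.1, 0.6, 0.3, 0.7, 0.3]
```

Keeping the order fixed means that a given position in the array always refers to the same class of the same model at both training and inference time, which is what allows the fusion classification model to learn per-position weights.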
In a possible implementation, the fusion classification model is trained as follows: acquiring training sample videos; inputting the training sample videos into each of the plurality of video classification models to determine multiple groups of classification prediction results corresponding to the training sample videos, and determining the classification labels of the training sample videos according to the classification prediction results of the first classification model; determining sample video features of the training sample videos according to the multiple groups of classification prediction results; and inputting the sample video features into the fusion classification model to train the fusion classification model.
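A minimal training sketch under explicit assumptions: the disclosure does not name the type of fusion classification model, so a softmax (multinomial logistic) regression trained with per-sample gradient descent is swapped in here, and the rule that turns the first classification model's prediction results into a label is assumed to be taking its highest-probability class. All function names and the four toy samples are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def pseudo_label(first_model_probs: Sequence[float]) -> int:
    """Assumed labelling rule: the first classification model's top class."""
    return max(range(len(first_model_probs)), key=lambda i: first_model_probs[i])


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(w: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    return [sum(wj * xj for wj, xj in zip(wk, x)) + bk for wk, bk in zip(w, b)]


def train_fusion_model(features: Sequence[Sequence[float]], labels: Sequence[int],
                       num_labels: int, epochs: int = 200, lr: float = 0.5,
                       seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Fit a softmax-regression fusion model with per-sample gradient descent."""
    dim = len(features[0])
    rng = random.Random(seed)
    w = [[0.0] * dim for _ in range(num_labels)]
    b = [0.0] * num_labels
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            p = softmax(scores(w, b, x))
            for k in range(num_labels):
                g = p[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient w.r.t. logit k
                b[k] -= lr * g
                for j in range(dim):
                    w[k][j] -= lr * g * x[j]
    return w, b


def predict(w: List[List[float]], b: List[float], x: Sequence[float]) -> int:
    s = scores(w, b, x)
    return max(range(len(s)), key=lambda k: s[k])


# Toy training set: (first model output, other model output) per sample video.
samples = [([0.9, 0.1], [0.8, 0.2]), ([0.8, 0.2], [0.3, 0.7]),
           ([0.2, 0.8], [0.6, 0.4]), ([0.1, 0.9], [0.1, 0.9])]
feats = [first + other for first, other in samples]     # sample video features
labels = [pseudo_label(first) for first, _ in samples]  # [0, 0, 1, 1]
w, b = train_fusion_model(feats, labels, num_labels=2)
print([predict(w, b, x) for x in feats])
```

Because the labels come from the first classification model, the fusion model's label space here matches that model's classes; the disclosure itself only states that the labels are determined from the first classification model's prediction results.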
Referring now to FIG. 6, a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing means 601 (e.g., a central processing unit, a graphics processor, etc.), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing means 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target video; input the target video into each of a plurality of video classification models and determine multiple groups of classification prediction results corresponding to the target video; determine the target video feature of the target video according to the multiple groups of classification prediction results; and input the target video feature into a pre-trained fusion classification model and determine a classification label corresponding to the target video.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the first obtaining module may also be described as "a module for acquiring a target video".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides a video classification method according to one or more embodiments of the present disclosure, the method including: acquiring a target video; respectively inputting the target video into a plurality of video classification models, and determining a plurality of groups of classification prediction results corresponding to the target video; determining the target video characteristics of the target video according to the multiple groups of classification prediction results; and inputting the target video characteristics into a pre-trained fusion classification model, and determining a classification label corresponding to the target video.
Example 2 provides the method of example 1, according to one or more embodiments of the present disclosure, further comprising: acquiring video attribute information of the target video, wherein the video attribute information comprises at least one of a video duration, a video author, and the number of followers of the video author; and the determining of the target video features of the target video according to the multiple groups of classification prediction results comprises: determining the target video features of the target video according to the multiple groups of classification prediction results and the video attribute information.
Example 3 provides the method of example 1, in which the video classification model includes a first classification model, and video categories corresponding to a plurality of first target classifications included in the first classification model are not identical.
Example 4 provides the method of example 1, according to one or more embodiments of the present disclosure, wherein the video classification models include a second classification model, and the plurality of second target classifications included in the second classification model all belong to the same video category; the video classification models include a plurality of second classification models, each corresponding to a different video category.
Example 5 provides the method of example 1, according to one or more embodiments of the present disclosure, wherein the video classification models include a third classification model that includes a plurality of third target classifications, and the correlation between any two videos belonging to the same third target classification is lower than a first preset threshold.
Example 6 provides the method of example 1, according to one or more embodiments of the present disclosure, wherein the classification prediction results are the probabilities that the target video belongs to each of the classifications included in each of the video classification models; the determining of the target video features of the target video according to the multiple groups of classification prediction results comprises: combining the multiple groups of classification prediction results into an N-dimensional array in a preset order, where N is determined according to the total number of classifications in the plurality of video classification models; and determining the N-dimensional array as the target video features of the target video.
Example 7 provides the method of example 1, according to one or more embodiments of the present disclosure, wherein the fusion classification model is trained by: acquiring training sample videos; inputting the training sample videos into each of the plurality of video classification models to determine multiple groups of classification prediction results corresponding to the training sample videos, and determining the classification labels of the training sample videos according to the classification prediction results of the first classification model; determining sample video features of the training sample videos according to the multiple groups of classification prediction results; and inputting the sample video features into the fusion classification model to train the fusion classification model.
Example 8 provides, in accordance with one or more embodiments of the present disclosure, a video classification apparatus, the apparatus comprising: the first acquisition module is used for acquiring a target video; the first determining module is used for respectively inputting the target video into a plurality of video classification models and determining a plurality of groups of classification prediction results corresponding to the target video; the second determining module is used for determining the target video characteristics of the target video according to the multiple groups of classification prediction results; and the third determining module is used for inputting the target video characteristics into a pre-trained fusion classification model and determining a classification label corresponding to the target video.
Example 9 provides a computer readable medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any of examples 1-7, in accordance with one or more embodiments of the present disclosure.
Example 10 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising: a storage device having a computer program stored thereon; processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 1-7.
The foregoing description is merely illustrative of the preferred embodiments of the present disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure; for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
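Because the video classification models run independently of one another, their predictions for one video can be computed in parallel. The sketch below assumes the models are plain Python callables and uses `concurrent.futures.ThreadPoolExecutor`; threads only give a speed-up when the models spend their time in I/O or in native code that releases the GIL. `predict_all_parallel` and the toy models are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Classifier = Callable[[object], List[float]]  # hypothetical model interface


def predict_all_parallel(video: object, classifiers: Sequence[Classifier],
                         max_workers: int = 4) -> List[List[float]]:
    """Run independent classifiers concurrently on one video.

    Executor.map returns results in the order of `classifiers`, so the
    preset concatenation order is kept even though the calls overlap.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda clf: clf(video), classifiers))


def toy_model_a(video: object) -> List[float]:
    return [0.9, 0.1]


def toy_model_b(video: object) -> List[float]:
    return [0.2, 0.3, 0.5]


print(predict_all_parallel("clip.mp4", [toy_model_a, toy_model_b]))
# [[0.9, 0.1], [0.2, 0.3, 0.5]]
```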
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (10)

1. A method for video classification, the method comprising:
acquiring a target video;
respectively inputting the target video into a plurality of video classification models, and determining a plurality of groups of classification prediction results corresponding to the target video;
determining the target video characteristics of the target video according to the multiple groups of classification prediction results;
and inputting the target video characteristics into a pre-trained fusion classification model, and determining a classification label corresponding to the target video.
2. The method of claim 1, further comprising:
acquiring video attribute information of the target video, wherein the video attribute information comprises at least one of a video duration, a video author, and the number of followers of the video author;
the determining the target video characteristics of the target video according to the plurality of groups of classification prediction results comprises:
determining the target video characteristics of the target video according to the multiple groups of classification prediction results and the video attribute information.
3. The method according to claim 1, wherein a first classification model is included in the video classification models, and video categories corresponding to a plurality of first target classifications included in the first classification model are not identical.
4. The method according to claim 1, wherein a second classification model is included in the video classification models, and a plurality of second target classifications included in the second classification model all belong to the same video category;
the video classification models comprise a plurality of second classification models, and each second classification model corresponds to a different video category.
5. The method according to claim 1, wherein a third classification model is included in the video classification models, and a plurality of third target classifications are included in the third classification model, wherein a correlation degree between any two videos belonging to the same third target classification is lower than a first preset threshold.
6. The method according to claim 1, wherein the classification prediction result is a probability that the target video belongs to each of the classifications included in each of the video classification models, respectively;
the determining the target video characteristics of the target video according to the plurality of groups of classification prediction results comprises:
combining the multiple groups of classification prediction results into an N-dimensional array in a preset order, wherein N is determined according to the total number of classifications in the plurality of video classification models;
and determining the N-dimensional array as the target video characteristics of the target video.
7. The method of claim 3, wherein the fusion classification model is trained by:
acquiring a training sample video;
respectively inputting training sample videos into the plurality of video classification models to determine multiple groups of classification prediction results corresponding to the training sample videos, and determining classification labels of the training sample videos according to the classification prediction results of the first classification model;
determining sample video characteristics of the training sample video according to the multiple groups of classification prediction results;
and inputting the sample video features into the fusion classification model so as to train the fusion classification model.
8. An apparatus for video classification, the apparatus comprising:
the first acquisition module is used for acquiring a target video;
the first determining module is used for respectively inputting the target video into a plurality of video classification models and determining a plurality of groups of classification prediction results corresponding to the target video;
the second determining module is used for determining the target video characteristics of the target video according to the multiple groups of classification prediction results;
and the third determining module is used for inputting the target video characteristics into a pre-trained fusion classification model and determining a classification label corresponding to the target video.
9. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 7.
10. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110450256.2A CN113033707B (en) 2021-04-25 2021-04-25 Video classification method and device, readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113033707A (en) 2021-06-25
CN113033707B (en) 2023-08-04

Family

ID=76454490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110450256.2A Active CN113033707B (en) 2021-04-25 2021-04-25 Video classification method and device, readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113033707B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627551A (en) * 2021-08-17 2021-11-09 Ping An Puhui Enterprise Management Co., Ltd. Multi-model-based certificate classification method, device, equipment and storage medium
WO2023035877A1 (en) * 2021-09-08 2023-03-16 Beijing Youzhuju Network Technology Co., Ltd. Video recognition method and apparatus, readable medium, and electronic device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032846A1 (en) * 2016-08-01 2018-02-01 Nvidia Corporation Fusing multilayer and multimodal deep neural networks for video classification
US20180144194A1 (en) * 2016-11-22 2018-05-24 Jinsoo Park Method and apparatus for classifying videos based on audio signals
CN108134784A (en) * 2017-12-19 2018-06-08 Neusoft Corporation web page classification method and device, storage medium and electronic equipment
CN109359636A (en) * 2018-12-14 2019-02-19 Tencent Technology (Shenzhen) Co., Ltd. Video classification methods, device and server
CN109614517A (en) * 2018-12-04 2019-04-12 Guangzhou Baiguoyuan Information Technology Co., Ltd. Classification method, device, equipment and the storage medium of video
CN110046278A (en) * 2019-03-11 2019-07-23 Beijing QIYI Century Science & Technology Co., Ltd. Video classification methods, device, terminal device and storage medium
CN110162669A (en) * 2019-04-04 2019-08-23 Tencent Technology (Shenzhen) Co., Ltd. Visual classification processing method, device, computer equipment and storage medium
CN111491187A (en) * 2020-04-15 2020-08-04 Tencent Technology (Shenzhen) Co., Ltd. Video recommendation method, device, equipment and storage medium
US10848791B1 (en) * 2018-10-30 2020-11-24 Amazon Technologies, Inc. Determining portions of video content based on artificial intelligence model
CN112562727A (en) * 2020-12-18 2021-03-26 iFLYTEK Co., Ltd. Audio scene classification method, device and equipment applied to audio monitoring

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
M. SORNAM et al.: "A Survey on Image Classification and Activity Recognition using Deep Convolutional Neural Network Architecture", 2017 Ninth International Conference on Advanced Computing (ICOAC), pages 121-126 *
OLEG V. KOMOGORTSEV et al.: "Biometric authentication via complex oculomotor behavior", 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), pages 1-8 *
ZHENGYIN DU et al.: "Spatio-Temporal Encoder-Decoder Fully Convolutional Network for Video-Based Dimensional Emotion Recognition", IEEE Transactions on Affective Computing, page 565 *
张丽娟 et al.: "Short video classification based on deep multimodal feature fusion" (in Chinese), Journal of Beijing University of Aeronautics and Astronautics, vol. 47, no. 3, pages 478-485 *
智洪欣 et al.: "Video classification based on two-level encoding fusion of spatio-temporal deep features" (in Chinese), Application Research of Computers, vol. 35, no. 3, pages 926-929 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant