WO2022237065A1 - Classification model training method, video classification method, and related device - Google Patents

Classification model training method, video classification method, and related device

Info

Publication number
WO2022237065A1
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
video
target
classification
target video
Prior art date
Application number
PCT/CN2021/123284
Other languages
English (en)
Chinese (zh)
Inventor
张宁
刘林
Original Assignee
中移智行网络科技有限公司
中移(上海)信息通信科技有限公司
中国移动通信集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中移智行网络科技有限公司, 中移(上海)信息通信科技有限公司, 中国移动通信集团有限公司
Publication of WO2022237065A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Definitions

  • the embodiments of the present application relate to the technical field of video processing, and in particular to a classification model training method, a video classification method, and related equipment.
  • classification models are mainly used to implement video classification; for example, the above classification model may be a support vector machine (SVM).
  • the video to be classified may include blank video frames, and inputting a video containing blank video frames into the classification model leads to invalid calculations, increases the amount of computation of the classification model, and results in low video classification efficiency.
  • Embodiments of the present application provide a classification model training method, a video classification method, and related equipment to solve the technical problem of low video classification efficiency due to a large number of invalid calculations in the classification model.
  • the embodiment of the present application provides a classification model training method, which is executed by a terminal, and the method includes:
  • obtaining a training set, wherein the training set includes a plurality of first target video frames and identification information of the first target video frames; the identification information is used to identify image features included in the first target video frames; the weight value of the first target video frame is greater than or equal to a first preset threshold, and the weight value is related to the quantity of the identification information;
  • the initial classification model is trained through the training set to obtain the target classification model.
  • the embodiment of the present application also provides a video classification method, which is executed by a terminal, and the method includes:
  • obtaining a video to be classified, wherein the video to be classified includes a plurality of third video frames;
  • the weight value of the second target video frame is greater than or equal to a first preset threshold
  • the classification result includes identification information for identifying image features corresponding to the second target video frame.
  • the embodiment of the present application further provides a terminal, including:
  • the first transceiver is configured to obtain a training set, wherein the training set includes a plurality of first target video frames and identification information of the first target video frames; the identification information is used to identify the image features included in the first target video frames, the weight value of the first target video frame is greater than or equal to a first preset threshold, and the weight value is related to the quantity of the identification information;
  • the training module is used to train the initial classification model through the training set to obtain the target classification model.
  • the embodiment of the present application further provides a terminal, including:
  • the second transceiver is used to obtain the video to be classified, and the video to be classified includes a plurality of third video frames;
  • An extraction module configured to extract second feature information in the third video frame, and determine a weight value corresponding to the third video frame according to the second feature information, where the second feature information is used to characterize the number of image features included in the third video frame;
  • a screening module configured to screen the plurality of third video frames to obtain a second target video frame, the weight value of the second target video frame being greater than or equal to a first preset threshold
  • a classification module configured to input the second target video frame into the target classification model for classification to obtain a classification result, wherein the classification result includes identification information for identifying the image feature corresponding to the second target video frame.
  • the embodiment of the present application further provides a device, including: a transceiver, a memory, a processor, and a program stored in the memory and operable on the processor; the processor is configured to read the program in the memory to implement the steps in the method described in the aforementioned first aspect; or, the processor is configured to read the program in the memory to implement the steps in the method described in the aforementioned second aspect.
  • the embodiment of the present application also provides a readable storage medium for storing a program; when the program is executed by a processor, the steps in the method described in the aforementioned first aspect are implemented; or, when the program is executed by the processor, the steps in the method described in the aforementioned second aspect are implemented.
  • the feature information of all video frames in the video to be classified is extracted, and the weight value corresponding to each video frame is determined according to the feature information; all video frames in the video to be classified are screened according to the weight value corresponding to each video frame to obtain the target video frames, and the target video frames are input into the trained target classification model for classification to obtain the classification result.
  • all video frames in the video to be classified are screened in advance, and the target video frames input into the classification model are all video frames with a weight value greater than or equal to the first preset threshold. In this way, the blank video frames in the video to be classified are eliminated, ensuring that the above target video frames do not include blank video frames. That is to say, the classification model does not need to perform related calculations on the blank video frames in the video to be classified, thereby reducing the calculation amount of the classification model and improving the efficiency of video classification.
  • Fig. 1 is a schematic flow chart of the training method of the classification model provided by the embodiment of the present application
  • FIG. 2 is a schematic diagram of an application scenario using a neural network model for analysis provided by an embodiment of the present application
  • FIG. 3 is a schematic flow diagram of a video classification method provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an application scenario of a video classification method provided in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of another terminal provided by an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of a device provided by an embodiment of the present application.
  • “first”, “second” and the like in the embodiments of the present application are used to distinguish similar image features, and are not necessarily used to describe a specific order or sequence.
  • the terms “comprising” and “having”, as well as any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device comprising a sequence of steps or elements is not necessarily limited to the expressly listed steps or elements, but may include other steps or elements not explicitly listed or inherent to the process, method, product or apparatus.
  • FIG. 1 is a schematic flowchart of a classification model training method provided in an embodiment of the present application.
  • the method for training a classification model shown in FIG. 1 may be executed by a terminal.
  • the training method of the classification model may include the following steps:
  • Step 101: obtain a training set.
  • classification model may be SVM or other classification models.
  • the above training set includes a plurality of first target video frames and identification information of the first target video frames.
  • the video frame in the training set can be called the first target video frame
  • the identification information is used to identify the image features included in the first target video frame
  • the number of identification information corresponding to one frame of the first target video frame can be one or more.
  • the image features include at least one of the following: entity features, behavior features, and scene features.
  • the above entity features refer to the entities displayed in the video frames, and objects such as bicycles, buses, motorcycles, and pedestrians displayed in the video frames can be understood as entities. It should be understood that the above entity features have different definitions in different scene videos, and the above entity features may also be customized by the user.
  • the above-mentioned behavior features refer to the behaviors corresponding to the entities in the video frames, for example, pedestrians crossing the road, buses passing the intersection, etc. It should be understood that the above-mentioned behavioral features have different definitions in different scene videos, and the above-mentioned behavioral features can also be customized by the user.
  • the above scene features refer to the scene displayed by the video frame, for example, the scene displayed by the video frame is a crossroad, and the scene displayed by the video frame is a highway. It should be understood that the above scene features may also be customized by the user.
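  • As an illustration only (the field names and values below are hypothetical examples, not taken from the original disclosure), the image features of a single video frame could be organized per type, with one piece of identification information per detected feature:

```python
# Hypothetical annotation of one video frame; the entity/behavior/scene values
# are examples modelled on the ones mentioned above (bus, pedestrian, crossroad, ...).
frame_annotation = {
    "entity_features": ["bus", "pedestrian", "bicycle"],
    "behavior_features": ["pedestrian_crossing_road", "bus_passing_intersection"],
    "scene_features": ["crossroad"],
}

# One piece of identification information per image feature included in the frame.
identification_info = [feature for features in frame_annotation.values() for feature in features]
```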
  • the weight value of the first target video frame is greater than or equal to the first preset threshold, and the weight value is related to the number of identification information: for a video frame, the more identification information corresponds to the video frame, the higher the weight value corresponding to the video frame.
  • For the manner of determining the weight value of the first target video frame, please refer to the subsequent embodiments.
  • a training set is obtained, wherein the training set may be generated by the terminal based on a video file, or may be a video file sent by another electronic device and received by the terminal.
  • Step 102: train the initial classification model through the training set to obtain the target classification model.
  • the terminal uses the training set to iteratively train the initial classification model to obtain a trained target classification model.
  • Taking the case where the classification model is an SVM as an example, the SVM can be trained in the following ways.
  • the penalty parameter can be represented by C.
  • The larger the penalty parameter C, the less tolerant the model is of errors in the classification result, which easily leads to over-fitting of the classification result; the smaller the penalty parameter, the more easily it leads to under-fitting of the classification result.
  • the kernel parameter is a parameter of the Radial Basis Function (RBF) in SVM.
  • the number of support vectors affects the speed of SVM training and prediction.
  • the classification model may also be other classification models than the SVM, or a neural network model, or other forms of models.
  • a library other than Scikit-Learn may also be used as the SVM library.
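  • As a minimal sketch (not the original implementation), training an RBF-kernel SVM with Scikit-Learn might look like the following; the feature matrix X_train and the labels y_train are placeholders, and the default parameter values are assumptions:

```python
from sklearn.svm import SVC

def train_svm(X_train, y_train, C=1.0, gamma="scale"):
    # C is the penalty parameter: a larger C tolerates fewer errors (risk of over-fitting),
    # a smaller C tolerates more errors (risk of under-fitting).
    # gamma is the kernel parameter of the radial basis function (RBF) kernel.
    model = SVC(C=C, kernel="rbf", gamma=gamma)
    model.fit(X_train, y_train)  # the number of resulting support vectors affects training and prediction speed
    return model
```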
  • the obtaining of the training set includes:
  • the first video frame is screened to obtain a second video frame; the second video frame is input into a preset neural network model for analysis to obtain the first target video frame.
  • the above-mentioned first video may be a video provided by a third-party organization.
  • the videos in the training set are videos of traffic scenes
  • the first video may be a video of a traffic scene provided by a third-party organization.
  • a video frame in the first video may be understood as a first video frame.
  • the first feature information is used to characterize the quantity of image features included in the first video frame.
  • the network model can be used to perform image recognition on the first video based on Rule-Based rules, to identify each image feature in the first video, and to obtain the identification information corresponding to each image feature, wherein the Rule-Based rules pre-set the mapping relationship between image features and identification information. It should be understood that in some embodiments, other tools may also be used to perform image recognition on the first video, which is not specifically limited here.
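  • A minimal sketch of such a Rule-Based mapping is shown below; the feature names and identification information are hypothetical examples, not values from the original disclosure:

```python
# Hypothetical pre-set mapping between image features and identification information.
FEATURE_TO_IDENTIFICATION = {
    "bus": "bus",
    "bicycle": "bicycle",
    "pedestrian": "pedestrian",
    "pedestrian_crossing_road": "pedestrian_crossing",
    "crossroad": "crossroad_scene",
}

def identify(detected_features):
    # Return the identification information for the image features detected in a frame.
    return [FEATURE_TO_IDENTIFICATION[f] for f in detected_features if f in FEATURE_TO_IDENTIFICATION]
```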
  • the weight value is related to the quantity of identification information. Specifically, for how to determine the weight value corresponding to the first feature information, please refer to the subsequent embodiments.
  • the multiple first video frames are screened according to the weight values to obtain the second video frames.
  • the weight values of the second video frames are greater than or equal to the first preset threshold.
  • the video frames whose weight value is less than the first preset threshold are determined as invalid video frames, and the video frames whose weight value is greater than or equal to the first preset threshold are determined as valid video frames; the invalid video frames among the plurality of first video frames are deleted to obtain the second video frames.
  • the invalid video frame refers to a blank video frame, that is, a video frame that does not include image features, or a video frame with a small number of image features; the specific value of the above-mentioned first preset threshold can be customized and is not specifically limited here.
  • a plurality of first video frames are screened according to the weight value to obtain the second video frames, thereby deleting invalid video frames among the plurality of first video frames, eliminating invalid data in the training set, further reducing the calculation amount of the classification model in the video classification process, and improving the efficiency of video classification.
  • the determining the weight value corresponding to the first feature information includes:
  • a product result of the first feature information and a preset coefficient is determined as the weight value.
  • image features include but are not limited to entity features, behavior features and scene features.
  • the preset coefficients include a first coefficient corresponding to entity characteristics, a second coefficient corresponding to behavior characteristics, and a third coefficient corresponding to scene characteristics.
  • the first numerical value is used to represent the quantity of entity features
  • the second numerical value is used to represent the quantity of behavioral features
  • the third numerical value is used to represent the quantity of scene features.
  • Another optional implementation manner is to determine the product of the number of image features represented by the first feature information and the preset coefficient as the weight value.
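  • A minimal sketch of the two weight calculations and of the screening step is given below; it assumes the first variant sums each feature count multiplied by its own coefficient and the second variant multiplies the total feature count by a single coefficient, and the coefficient and threshold values are placeholders:

```python
def weight_per_type(n_entity, n_behavior, n_scene, c1=1.0, c2=1.0, c3=1.0):
    # First variant: each type of image feature has its own preset coefficient
    # (c1 for entity features, c2 for behavior features, c3 for scene features).
    return n_entity * c1 + n_behavior * c2 + n_scene * c3

def weight_total(feature_count, coefficient=1.0):
    # Second variant: the total number of image features multiplied by one preset coefficient.
    return feature_count * coefficient

def screen_frames(frames, weights, first_preset_threshold):
    # Keep only the video frames whose weight value is greater than or equal to the threshold.
    return [frame for frame, weight in zip(frames, weights) if weight >= first_preset_threshold]
```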
  • the inputting the second video frame into a preset neural network model for analysis, and obtaining the first target video frame includes:
  • the second video frame is input into the neural network model, and the identification information corresponding to each image feature in the second video frame is determined.
  • the neural network model may be a convolutional neural network model, or other types of neural network models, which are not specifically limited here.
  • the verification result is used to indicate whether the identification information matches the image feature corresponding to the identification information, wherein the verification result may be manually generated according to the image feature.
  • If the identification information indicated by the verification result matches the identified image feature, a positive feedback signal is sent to the neural network model, thereby controlling the neural network model to output the second video frame corresponding to the identification information to the training set for storage.
  • If the identification information indicated by the verification result does not match the identified image feature, it means that the identification information determined by the neural network model does not match the image feature corresponding to the identification information; then a negative feedback signal is sent to the neural network model, and the neural network model is controlled to perform image recognition on the second video frame corresponding to the identification information again.
  • the video frame is deleted to get the second video frame.
  • the second video frame is input into the neural network model, and the neural network model performs image recognition on the second video frame to determine the identification information corresponding to each image feature in the second video frame.
  • the machine learning in FIG. 2 is the neural network model.
  • the identification information corresponding to the image feature "bus” is "bus”
  • For example, the image feature "bus" is included in a second video frame. If the identification information generated by the neural network model for the image feature is "bus", then after it is manually judged that the identification information matches the image feature, a positive feedback signal is sent to the neural network model; if the identification information generated by the neural network model for the image feature is not "bus", then after it is manually judged that the identification information does not match the image feature, a negative feedback signal is sent to the neural network model.
  • After receiving the positive feedback signal, the neural network model outputs the second video frame to the training set, that is, the traffic scene classification training set in Fig. 2; after receiving the negative feedback signal, it continues to perform image recognition on the second video frame.
  • the process of the neural network model performing image recognition on the second video frame and receiving positive/negative feedback signals can be referred to as the “reinforcement learning process” in FIG. 2 .
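  • A sketch of this feedback loop is given below; recognize() and manual_verify() stand in for the neural network's recognition step and the manual verification step, and are assumptions rather than interfaces from the original disclosure:

```python
def recognize(model, frame):
    # The neural network model proposes identification information for the frame.
    return model(frame)

def manual_verify(frame, identification):
    # A human checks whether the identification information matches the image feature.
    raise NotImplementedError

def label_frames(model, second_video_frames, training_set, max_rounds=3):
    for frame in second_video_frames:
        for _ in range(max_rounds):
            identification = recognize(model, frame)
            if manual_verify(frame, identification):      # positive feedback signal
                training_set.append((frame, identification))
                break
            # negative feedback signal: the model performs image recognition on the frame again
    return training_set
```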
  • FIG. 3 is a schematic flowchart of a video classification method provided in an embodiment of the present application.
  • the video classification method shown in FIG. 3 can be executed by a terminal.
  • the terminal applying the video classification method and the terminal applying the above classification model training method may be the same terminal.
  • the video classification method may include the following steps:
  • Step 201: acquire the video to be classified.
  • the above-mentioned video to be classified is a video after deduplication operation is performed on the target video, wherein the above-mentioned target video may be a video sent by another device communicating with the terminal.
  • Step 202: extract second feature information in the third video frame, and determine a weight value corresponding to the third video frame according to the second feature information.
  • the video frame in the video to be classified may be referred to as a third video frame.
  • the network model may be used to extract the feature information in the third video frame, or other methods may be used to extract the feature information in the third video frame, which is not specifically limited here.
  • the feature information in the third video frame is referred to as second feature information, where the second feature information is used to represent the quantity of image features included in the third video frame.
  • the method for extracting the second feature information in the third video frame is consistent with the above-mentioned method for determining the first feature information corresponding to the first video frame, and will not be repeated here.
  • identification information corresponding to each image feature in the third video frame is generated based on the same method as that for determining the identification information corresponding to each image feature in the second video frame; here, the identification information corresponding to each image feature in the third video frame may be called a label.
  • Step 203: filter the plurality of third video frames to obtain a second target video frame.
  • the video frames whose weight value is less than the first preset threshold among the third video frames are deleted to obtain the second target video frame. It is easy to understand that the weight value of the second target video frame is greater than or equal to the first preset threshold.
  • all the second target video frames may be referred to as a test set.
  • Step 204: input the second target video frame into the target classification model for classification to obtain a classification result.
  • the above-mentioned target classification model is the trained classification model.
  • the second target video frame is input into the target classification model for classification to obtain a classification result, wherein the classification result includes identification information for identifying the image features corresponding to the second target video frame.
  • the second target video frame may include multiple different image features, then the second target video frame may also include multiple different identification information.
  • the video frames in the video to be classified are screened in advance, and the target video frames input into the classification model are all video frames with a weight value greater than or equal to the first preset threshold, so that the blank video frames in the video to be classified are eliminated, ensuring that the above target video frames do not include blank video frames.
  • the classification model does not need to perform related calculations on the blank video frames in the video to be classified, thereby reducing the calculation amount of the classification model and improving the efficiency of video classification.
  • the acquisition of videos to be classified includes:
  • a Gaussian distribution curve corresponding to the fourth video frame is generated; based on the standard deviation and average value of the Gaussian distribution curve, the relative entropy corresponding to the fourth video frame is calculated; and the fifth video frame in the target video is deleted to obtain the video to be classified.
  • any video frame in the video frames of the target video except the end video frame may be referred to as a fourth video frame, and the fourth video frame is displayed in the form of a Gaussian distribution curve.
  • a Gaussian distribution curve corresponding to the fourth video frame may be generated based on the pixel value corresponding to each pixel in the fourth video frame.
  • the Gaussian distribution curve corresponding to the fourth video frame may be generated based on the gray value corresponding to each pixel in the fourth video frame.
  • the above-mentioned i-th frame is the fourth video frame, and the above-mentioned relative entropy can also be called KL divergence.
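  • A sketch of this calculation, assuming the standard closed-form relative entropy between two univariate Gaussian distributions (the exact formula used in the embodiment may differ), with the i-th frame described by a Gaussian with mean and standard deviation (mu_i, sigma_i) and its adjacent frame by (mu_{i+1}, sigma_{i+1}):

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_i,\sigma_i^2)\,\big\|\,\mathcal{N}(\mu_{i+1},\sigma_{i+1}^2)\right)
  = \ln\frac{\sigma_{i+1}}{\sigma_i}
  + \frac{\sigma_i^2 + (\mu_i - \mu_{i+1})^2}{2\sigma_{i+1}^2}
  - \frac{1}{2}
```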
  • a second preset threshold is set. If the relative entropy of a video frame is greater than the second preset threshold, it means that the content represented by the video frame may be the same as or similar to the content represented by the adjacent video frames of the video frame, and the video frame needs to be deleted.
  • a video frame whose relative entropy is greater than a second preset threshold may be called a fifth video frame, and the fifth video frame in the target video may be deleted to obtain a video to be classified.
  • the Gaussian distribution curve corresponding to each fourth video frame in the target video is obtained, and based on the standard deviation and average value of the Gaussian distribution curve, the relative entropy corresponding to each fourth video frame is obtained; the relative entropy is used to characterize the similarity between the corresponding video frame and its adjacent video frames; the video frames whose relative entropy is higher than the second preset threshold are deleted from the target video, so as to perform a deduplication operation on the target video and obtain the video to be classified.
  • the fourth video frame with a high degree of similarity in the target video is deleted, thereby reducing the calculation amount of the classification model, thereby improving the efficiency of video classification.
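  • A minimal sketch of this deduplication step is given below, assuming the Gaussian distribution curve is built from grayscale pixel values and the relative entropy follows the closed form shown above; the threshold value is a placeholder:

```python
import numpy as np

def frame_stats(gray_frame):
    # Average value and standard deviation of the grayscale pixel values,
    # i.e. the parameters of the Gaussian distribution curve of the frame.
    pixels = np.asarray(gray_frame, dtype=np.float64).ravel()
    return pixels.mean(), pixels.std() + 1e-8   # avoid division by zero

def relative_entropy(mu_p, sigma_p, mu_q, sigma_q):
    # Closed-form KL divergence between two univariate Gaussians (see the formula above).
    return (np.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def deduplicate(frames, second_preset_threshold):
    # Every frame except the end frame is compared with its adjacent frame and is
    # deleted when its relative entropy exceeds the second preset threshold.
    kept = []
    for i in range(len(frames) - 1):
        mu_p, sigma_p = frame_stats(frames[i])
        mu_q, sigma_q = frame_stats(frames[i + 1])
        if relative_entropy(mu_p, sigma_p, mu_q, sigma_q) <= second_preset_threshold:
            kept.append(frames[i])
    kept.append(frames[-1])   # the end video frame is not a fourth video frame and is kept
    return kept
```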
  • Optionally, after the classification result is obtained, the method further includes:
  • the identification information corresponding to each image feature in the third video frame can be called a label. Since the second target video frame is obtained by deleting some video frames from the third video frames, the second target video frame also includes a plurality of labels, wherein the labels are used to characterize the categories of the image features in the second target video frame.
  • the above index value is used to characterize the accuracy of the classification result of the classification model.
  • If the index value is greater than the third preset threshold, it means that the classification result is relatively accurate.
  • In this case, the second target video frame is stored in the data set, so as to augment the training data in the training set.
  • Table 1 (relationship between the label and the result output by the classification model):
      the classification model judges P:  A (label is P)   B (label is Q)
      the classification model judges Q:  C (label is P)   D (label is Q)
  • Both A and D in Table 1 represent the number of second target video frames whose category represented by the label is the same as the category characterized by the classification result; when the category represented by the label is the same as the category represented by the classification result, it means that the classification result of the classification model is correct.
  • Both B and C in Table 1 represent the number of second target video frames in which the category represented by the label is different from the category represented by the classification result; in this case, the classification result of the classification model is wrong.
  • the index values are the precision rate, the correct rate, the recall rate and the evaluation value, and the above four index values can be calculated using the following formulas:
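  • A sketch of these formulas under standard assumptions, with A, B, C and D taken from Table 1 and the evaluation value assumed to be the F1 score:

```latex
\text{precision rate} = \frac{A}{A + B}, \qquad
\text{correct rate} = \frac{A + D}{A + B + C + D}, \qquad
\text{recall rate} = \frac{A}{A + C}, \qquad
\text{evaluation value} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
```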
  • the third preset threshold can be set to include a fourth value corresponding to the precision rate, a fifth value corresponding to the correct rate, a sixth value corresponding to the recall rate, and a seventh value corresponding to the evaluation value.
  • the precision rate is greater than the corresponding fourth value
  • the correct rate is greater than the corresponding fifth value
  • the recall rate is greater than the corresponding sixth value
  • the evaluation value is greater than the corresponding seventh value
  • the traffic scene data set is the first video; the first feature information corresponding to the first video frame is determined, and the weight value corresponding to the first feature information is determined, which corresponds to the "Formation of structured data based on rules" part in Figure 4; according to the weight value, the first video frames are screened to obtain the second video frames, and the second video frames are input into the neural network model for analysis to obtain the first target video frames; the first target video frames are video frames in the training set, which corresponds to the "forming a set of traffic scene classification training set" part in Fig. 4.
  • the target video is deduplicated to obtain the video to be classified, that is, the "video is decomposed into video frames" part in Figure 4; the feature information corresponding to the third video frames in the video to be classified is extracted, which corresponds to the "extracting the feature data of the video frame based on CNN" part in Fig. 4; the plurality of third video frames are screened to obtain the second target video frames, that is, the test set, which corresponds to the "Formation of structured data based on rules" part in Figure 4.
  • The training set is used to train the SVM classification model. After the training is completed, the test set is input into the SVM classification model to obtain the classification result; after passing the evaluation index measurement, that is, in the case where the index value in the above-mentioned embodiment is greater than the third preset threshold, the test set is stored in the traffic scene dataset.
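  • An illustrative gate for augmenting the training data in this way is sketched below (not the original implementation); the threshold values are placeholders and the evaluation value is assumed to be the macro-averaged F1 score:

```python
from sklearn.metrics import precision_score, accuracy_score, recall_score, f1_score

def passes_evaluation(labels, predictions, thresholds):
    # Compare each index value of the classification result on the test set with
    # its corresponding threshold (the fourth to seventh values mentioned above).
    scores = {
        "precision_rate": precision_score(labels, predictions, average="macro", zero_division=0),
        "correct_rate": accuracy_score(labels, predictions),
        "recall_rate": recall_score(labels, predictions, average="macro", zero_division=0),
        "evaluation_value": f1_score(labels, predictions, average="macro", zero_division=0),
    }
    return all(scores[name] > thresholds[name] for name in scores)

# If passes_evaluation(...) returns True, the test set is stored in the traffic scene dataset
# so that it can be used as additional training data.
```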
  • FIG. 5 is a schematic structural diagram of a terminal provided by an embodiment of the present application. As shown in Figure 5, the terminal 300 includes:
  • the first transceiver 301 is configured to acquire a training set
  • the training module 302 is configured to train the initial classification model through the training set to obtain the target classification model.
  • the first transceiver 301 includes:
  • an extracting unit configured to extract a plurality of first video frames in the first video
  • a determining unit configured to determine first feature information corresponding to the first video frame
  • a screening unit configured to determine a weight value corresponding to the first feature information, and filter the plurality of first video frames according to the weight value to obtain a second video frame;
  • An analysis unit configured to input the second video frame into a preset neural network model for analysis to obtain the first target video frame.
  • the analysis unit is also used for:
  • the determining unit is further configured to:
  • a product result of the first feature information and a preset coefficient is determined as the weight value.
  • the terminal 300 can implement each process of the method embodiment in FIG. 1 in the embodiment of the present application and achieve the same beneficial effect. To avoid repetition, details are not repeated here.
  • FIG. 6 is a schematic structural diagram of another terminal provided by an embodiment of the present application.
  • the terminal 400 includes:
  • the second transceiver 401 is used to obtain the video to be classified
  • An extraction module 402 configured to extract second feature information in a third video frame, and determine a weight value corresponding to the third video frame according to the second feature information;
  • a screening module 403, configured to screen the plurality of third video frames to obtain a second target video frame
  • a classification module 404 configured to input the second target video frame into a target classification model for classification to obtain a classification result.
  • the second transceiver 401 is also used for:
  • the terminal 400 also includes:
  • a determining module configured to determine an index value corresponding to the second target video frame based on the classification result corresponding to the label and each image feature
  • a storage module configured to store the second target video frame in a training set when the index value is greater than a third preset threshold.
  • the terminal 400 can implement each process of the method embodiment in FIG. 3 in the embodiment of the present application, and achieve the same beneficial effect. To avoid repetition, details are not repeated here.
  • terminal 300 and terminal 400 may be the same terminal.
  • the electronic device may include a processor 501 , a memory 502 and a program 5021 stored in the memory 502 and executable on the processor 501 .
  • any steps in the method embodiments corresponding to FIG. 1 and/or FIG. 3 can be implemented and the same beneficial effect can be achieved, and details are not repeated here.
  • the embodiment of the present application also provides a readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, it can implement any steps of the above-mentioned method embodiments corresponding to FIG. 1 and/or FIG. 3 and achieve the same technical effect; to avoid repetition, details are not repeated here.
  • the storage medium is, for example, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a classification model training method, a video classification method and a related device. The video classification method includes the following steps: obtaining a video to be classified; extracting second feature information in third video frames and determining, according to the second feature information, weight values corresponding to the third video frames; screening a plurality of third video frames to obtain a second target video frame; and inputting the second target video frame into a target classification model for classification so as to obtain a classification result. According to the embodiments of the present application, the video frames in the video to be classified are screened in advance, and the target video frames input into the classification model are all video frames whose weight values are greater than or equal to a first preset threshold; in this way, blank video frames in the video to be classified are eliminated and it is guaranteed that the target video frames do not include blank video frames. The classification model does not need to perform the related calculations on the blank video frames in the video to be classified; therefore, the calculation amount of the classification model is reduced and the video classification efficiency is further improved.
PCT/CN2021/123284 2021-05-12 2021-10-12 Classification model training method, video classification method and related device WO2022237065A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110517456.5A CN113177603B (zh) 2021-05-12 2021-05-12 分类模型的训练方法、视频分类方法及相关设备
CN202110517456.5 2021-05-12

Publications (1)

Publication Number Publication Date
WO2022237065A1 true WO2022237065A1 (fr) 2022-11-17

Family

ID=76929900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/123284 WO2022237065A1 (fr) 2021-05-12 2021-10-12 Classification model training method, video classification method and related device

Country Status (2)

Country Link
CN (1) CN113177603B (fr)
WO (1) WO2022237065A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177603B (zh) * 2021-05-12 2022-05-06 中移智行网络科技有限公司 分类模型的训练方法、视频分类方法及相关设备


Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9510044B1 (en) * 2008-06-18 2016-11-29 Gracenote, Inc. TV content segmentation, categorization and identification and time-aligned applications
CN107273782B (zh) * 2016-04-08 2022-12-16 微软技术许可有限责任公司 使用递归神经网络的在线动作检测
CN108615358A (zh) * 2018-05-02 2018-10-02 安徽大学 一种道路拥堵检测方法及装置
CN110858290B (zh) * 2018-08-24 2023-10-17 比亚迪股份有限公司 驾驶员异常行为识别方法、装置、设备及存储介质
CN109815873A (zh) * 2019-01-17 2019-05-28 深圳壹账通智能科技有限公司 基于图像识别的商品展示方法、装置、设备及介质
CN109829432B (zh) * 2019-01-31 2020-11-20 北京字节跳动网络技术有限公司 用于生成信息的方法和装置
CN110149531A (zh) * 2019-06-17 2019-08-20 北京影谱科技股份有限公司 一种识别视频数据中视频场景的方法和装置
CN110991373A (zh) * 2019-12-09 2020-04-10 北京字节跳动网络技术有限公司 图像处理方法、装置、电子设备及介质
CN111626922B (zh) * 2020-05-11 2023-09-15 北京字节跳动网络技术有限公司 图片生成方法、装置、电子设备及计算机可读存储介质
CN111666898B (zh) * 2020-06-09 2021-10-26 北京字节跳动网络技术有限公司 用于识别车辆所属类别的方法和装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778237A (zh) * 2014-01-27 2014-05-07 北京邮电大学 一种基于活动事件时空重组的视频摘要生成方法
CN111027507A (zh) * 2019-12-20 2020-04-17 中国建设银行股份有限公司 基于视频数据识别的训练数据集生成方法及装置
CN111626251A (zh) * 2020-06-02 2020-09-04 Oppo广东移动通信有限公司 一种视频分类方法、视频分类装置及电子设备
CN113177603A (zh) * 2021-05-12 2021-07-27 中移智行网络科技有限公司 分类模型的训练方法、视频分类方法及相关设备

Also Published As

Publication number Publication date
CN113177603B (zh) 2022-05-06
CN113177603A (zh) 2021-07-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21941631

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21941631

Country of ref document: EP

Kind code of ref document: A1