CN111967346A - Video data processing method and device and electronic equipment - Google Patents

Video data processing method and device and electronic equipment

Info

Publication number
CN111967346A
Authority
CN
China
Prior art keywords
image
information
sequence
target video
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010740279.2A
Other languages
Chinese (zh)
Inventor
谢文珍
周佳
包英泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dami Technology Co Ltd
Original Assignee
Beijing Dami Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dami Technology Co Ltd filed Critical Beijing Dami Technology Co Ltd
Priority to CN202010740279.2A priority Critical patent/CN111967346A/en
Publication of CN111967346A publication Critical patent/CN111967346A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Educational Technology (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video data processing method and apparatus, and an electronic device. The method comprises the following steps: acquiring a target video; converting the target video into an image sequence; detecting at least one frame of image in the image sequence according to at least one detection model to obtain first image detection information, wherein the first image detection information is used for representing basic information of the image; determining a first image detection information sequence according to the time sequence of the at least one frame of image and the corresponding first image detection information; determining feature information according to the first image detection information sequence; inputting the feature information into at least two gradient boosting decision tree models respectively and determining at least two first category scores respectively; and determining a second category score of the target video according to the at least two first category scores. By this method, the target video can be judged quickly and accurately, and its category determined.

Description

Video data processing method and device and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for processing video data, and an electronic device.
Background
With the development of Internet applications, video image processing has been widely used, for example in video communication and online teaching, but some problems remain in practice. Taking online teaching as an example, there is a demand to analyze teaching videos in order to monitor teaching and carry out subsequent processing, for example: finding defects in the teaching process, supervising teachers to improve their teaching skills, raising teaching quality, and improving user experience. In the prior art, videos are reviewed manually; because the amount of video data is huge, the manual approach wastes a large amount of human resources, and because each reviewer applies a different standard, the judgment results are affected.
Disclosure of Invention
The invention provides a video data processing method and apparatus, and an electronic device, which can judge a target video efficiently and accurately.
According to a first aspect of embodiments of the present invention, there is provided a method of video data processing, comprising: acquiring a target video; converting the target video into an image sequence, wherein images in the image sequence are arranged according to a time sequence; detecting at least one frame of image in the image sequence according to at least one detection model to obtain first image detection information, wherein the first image detection information is used for representing basic information of the image; determining a first image detection information sequence according to the time sequence of the at least one frame of image and the corresponding first image detection information; determining feature information according to the first image detection information sequence; inputting the feature information into at least two gradient boosting decision tree models respectively, and determining at least two first category scores respectively, wherein each gradient boosting decision tree model is used for classifying teaching courses corresponding to the teaching feature information; and determining a second category score of the target video according to the at least two first category scores.
In one embodiment, the method further comprises: in response to the second category score being greater than or equal to a set threshold, determining that the target video corresponding to the feature information is a first category video.
In one embodiment, the method further comprises: in response to the second category score being smaller than a set threshold, determining that the target video corresponding to the feature information is a second category video.
In one embodiment, the converting the target video into an image sequence specifically includes: acquiring one frame of image from the target video at set time intervals, encoding the acquired images in temporal order, and forming the encoded images into an image sequence.
In one embodiment, the determining a second category score of the target video according to the at least two first category scores specifically includes: determining weighting coefficients corresponding to the at least two first category scores respectively; multiplying the at least two first category scores by the corresponding weighting coefficients respectively to determine at least two numerical values; and determining a sum of the at least two numerical values as the second category score.
In one embodiment, the first image detection information includes: at least one of basic information, face emotion information, face angle information, face information, and gesture information.
In one embodiment, the feature information includes: at least one of basic feature information, face emotion feature information, face angle feature information, face feature information, and gesture feature information.
In one embodiment, the detection model includes: at least one of a face emotion detection model, a face angle detection model, a face detection model, and a gesture detection model.
In one embodiment, the training process of the gradient boosting decision tree model includes: acquiring at least two feature information training sets, wherein the feature information training sets comprise historical feature information; and training at least two gradient boosting decision tree models according to the at least two feature information training sets.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for video data processing, including: an acquisition unit configured to acquire a target video; a conversion unit, configured to convert the target video into an image sequence, where images in the image sequence are arranged in a time sequence; the detection unit is used for detecting at least one frame of image in the image sequence according to at least one detection model to obtain first image detection information, wherein the first image detection information is used for representing basic information of the image; the processing unit is used for determining a first image detection information sequence according to the time sequence of at least one frame of image and corresponding first image detection information; the processing unit is further configured to determine feature information according to the first image detection information sequence; the processing unit is further configured to input the feature information into at least two gradient boosting decision tree models respectively, and determine at least two first category scores respectively, where each gradient boosting decision tree model is used to classify teaching courses corresponding to the teaching feature information; a determining unit, configured to determine a second category score of the target video according to the at least two first category scores.
In one embodiment, the apparatus further comprises a judging unit, configured to determine, in response to the second category score being greater than or equal to a set threshold, that the target video corresponding to the feature information is a first category video.
In one embodiment, the determining unit is further configured to determine, in response to the second category score being smaller than a set threshold, that the target video corresponding to the feature information is a second category video.
In one embodiment, the conversion unit is specifically configured to: acquire one frame of image from the target video at set time intervals, encode the acquired images in temporal order, and form the encoded images into an image sequence.
In one embodiment, the determining unit is specifically configured to: determine weighting coefficients corresponding to the at least two first category scores respectively; multiply the at least two first category scores by the corresponding weighting coefficients respectively to determine at least two numerical values; and determine a sum of the at least two numerical values as the second category score.
In one embodiment, the first image detection information includes: at least one of basic information, face emotion information, face angle information, face information, and gesture information.
In one embodiment, the feature information includes: at least one of basic feature information, face emotion feature information, face angle feature information, face feature information, and gesture feature information.
In one embodiment, the detection model comprises: at least one of a face emotion detection model, a face angle detection model, a face detection model, and a gesture detection model.
In one embodiment, the apparatus further comprises a training unit, configured to acquire at least two feature information training sets, wherein the feature information training sets comprise historical feature information, and to train at least two gradient boosting decision tree models according to the at least two feature information training sets.
According to a third aspect of embodiments of the present invention, there is provided an electronic device comprising a memory and a processor, the memory being configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method according to the first aspect or any possible implementation of the first aspect.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the method according to the first aspect or any possible implementation of the first aspect.
The beneficial effects of the embodiments of the invention include the following. A target video is first acquired and converted into an image sequence in which the images are arranged in temporal order. At least one frame of image in the image sequence is then detected according to at least one detection model to obtain first image detection information representing basic information of the image, and a first image detection information sequence is determined according to the temporal order of the frames and the corresponding first image detection information. Feature information is determined from the first image detection information sequence and input into at least two gradient boosting decision tree models respectively to determine at least two first category scores, where each gradient boosting decision tree model is used to classify the teaching courses corresponding to the teaching feature information. Finally, a second category score of the target video is determined from the at least two first category scores. By this method, the second category score of the target video can be determined efficiently and accurately, and the quality of the target video can thus be judged.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
fig. 1 is a flowchart of a method for processing video data according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for processing video data according to an embodiment of the present invention;
fig. 3 is a flowchart of a method for processing video data according to an embodiment of the present invention;
fig. 4 is a flowchart of a method for processing video data according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a video data processing flow according to an embodiment of the present invention;
fig. 6 is a schematic diagram of an apparatus for processing video data according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present disclosure is described below based on examples, but the present disclosure is not limited to only these examples. In the following detailed description of the present disclosure, certain specific details are set forth. It will be apparent to those skilled in the art that the present disclosure may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present disclosure.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout this specification, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present disclosure, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present disclosure, "a plurality" means two or more unless otherwise specified.
According to one or more embodiments, a method for processing video data is provided: acquiring a target video; converting the target video into an image sequence, wherein images in the image sequence are arranged according to a time sequence; detecting at least one frame of image in the image sequence according to at least one detection model to obtain first image detection information, wherein the first image detection information is used for representing basic information of the image; determining a first image detection information sequence according to the time sequence of the at least one frame of image and the corresponding first image detection information; determining feature information according to the first image detection information sequence; inputting the feature information into at least two gradient boosting decision tree models respectively, and determining at least two first category scores respectively, wherein each gradient boosting decision tree model is used for classifying teaching courses corresponding to the teaching feature information; and determining a second category score of the target video according to the at least two first category scores. In one or more embodiments, a method flow of video data processing is shown in fig. 1, but the example of fig. 1 is not to be construed as a specific limitation on the embodiments.
Step S100, acquiring a target video.
According to one or more embodiments, the target video may be an online video or a pre-recorded video.
According to one or more embodiments, the target video may be a course video of online education, and may also be a video of other industries, which is not limited by the embodiments of the present invention.
Step S101, converting the target video into an image sequence, wherein images in the image sequence are arranged according to a time sequence.
According to one or more embodiments, one frame of image is acquired from the target video at set time intervals, the acquired images are encoded in temporal order, and the encoded images form an image sequence.
According to one or more embodiments, if the target video is a pre-recorded video with a duration of 5 minutes, one frame of image is acquired every second starting from the 1st second, so that 300 frames are acquired from the 5-minute target video; each frame is encoded in temporal order, and the encoded images form an image sequence. The above is merely an exemplary illustration: neither the duration of the target video nor the sampling frequency is limited, and according to the actual situation one frame of image may be acquired every 2 seconds, every 3 seconds, or every 5 seconds.
According to one or more embodiments, if the target video is an online video, images are acquired starting from the 1st second, one frame every second, until the online video ends; each frame is encoded in temporal order, and the encoded images form an image sequence.
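As an illustration of this sampling step, the following Python sketch (not part of the original disclosure; the use of OpenCV, the file name, and the function name are assumptions) grabs one frame per second and returns the frames in temporal order:

```python
import cv2  # OpenCV, assumed available

def video_to_image_sequence(path: str, interval_s: float = 1.0):
    """Sample one frame every `interval_s` seconds; frames are returned in temporal order."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back to 25 fps if metadata is missing
    step = max(1, round(fps * interval_s))   # number of decoded frames between samples
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)  # appended in time order, matching the encoding step
        index += 1
    cap.release()
    return frames

# A 5-minute video sampled at 1-second intervals yields roughly 300 frames:
sequence = video_to_image_sequence("target_video.mp4", interval_s=1.0)
```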
Step S102, detecting at least one frame of image in the image sequence according to at least one detection model, and acquiring first image detection information, wherein the first image detection information is used for representing basic information of the image.
According to one or more embodiments, the first image detection information includes basic information, face emotion information, face angle information, face information, and gesture information.
According to one or more embodiments, the detection models include a face emotion detection model, a face angle detection model, a face detection model, and a gesture detection model.
According to one or more embodiments, at least one frame of image in the image sequence is detected according to a human face emotion detection model, and human face emotion information is obtained, for example, the human face emotion information specifically includes smile, anger, fear, and the like.
According to one or more embodiments, at least one frame of image in the image sequence is detected according to a face angle detection model, and face angle information is obtained, for example, the face angle information specifically includes a yaw (yaw) of a face average angle, a pitch (pitch) of the face average angle, and a roll (roll) of the face average angle.
According to one or more embodiments, at least one frame of image in the image sequence is detected according to a face detection model, and face information is obtained, wherein the face information specifically includes a horizontal axis of a face average center and a vertical axis of the face average center.
According to one or more embodiments, at least one frame of image in the image sequence is detected according to a gesture detection model, and gesture information is obtained, wherein the gesture information comprises a gesture mean center point horizontal axis and a gesture mean center point vertical axis.
According to one or more embodiments, the first image detection information further includes basic information, such as a course unique code and a course length; the basic information may be obtained by any one of the detection models described above or may be preset, which is not limited by the embodiments of the present invention.
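As a sketch of how the per-frame outputs of the four detection models might be gathered into first image detection information (the record fields and the `predict` interfaces are illustrative assumptions, not the patent's API):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class FrameDetection:
    """First image detection information for one frame (field names are illustrative)."""
    timestamp_s: float                                           # position of the frame in the video
    emotion: Dict[str, float] = field(default_factory=dict)     # e.g. {"smile": 0.9}
    face_angle: Dict[str, float] = field(default_factory=dict)  # yaw / pitch / roll
    face_center: tuple = (0.0, 0.0)                              # horizontal / vertical axis of face center
    gesture_center: tuple = (0.0, 0.0)                           # horizontal / vertical axis of gesture center

def detect_sequence(frames: List[Any], models: Dict[str, Any]) -> List[FrameDetection]:
    """Run each detection model on each frame; sorting by timestamp yields the
    first image detection information sequence of step S103."""
    records = []
    for i, frame in enumerate(frames):
        rec = FrameDetection(timestamp_s=float(i))  # 1-frame-per-second sampling assumed
        rec.emotion = models["emotion"].predict(frame)          # hypothetical model interface
        rec.face_angle = models["angle"].predict(frame)
        rec.face_center = models["face"].predict(frame)
        rec.gesture_center = models["gesture"].predict(frame)
        records.append(rec)
    return sorted(records, key=lambda r: r.timestamp_s)
```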
Step S103, determining a first image detection information sequence according to the time sequence of at least one frame of image and the corresponding first image detection information.
According to one or more embodiments, the first image detection information is sorted according to the time sequence of at least one frame of image, and a first image detection information sequence is generated; for example, assuming an image sequence composed of 300 frames of images, the first image detection information of each image is acquired, and the acquired first image detection information is sorted according to the time sequence of 300 frames of images, so as to generate a first image detection information sequence.
Step S104, determining feature information according to the first image detection information sequence.
According to one or more embodiments, feature information is determined from the first image detection information sequence through feature engineering. Feature engineering is an important link in automatic machine learning: a plurality of candidate data sets are obtained by applying feature transformations to the original data set, and an optimal data set is obtained by evaluating the candidates. The optimal data set comprises data features usable for machine learning; these features describe the characteristics of the original data set from all directions and multiple angles, and a model built on them can show good performance.
According to one or more embodiments, the feature information includes basic feature information, face emotion feature information, face angle feature information, face feature information, and gesture feature information.
According to one or more embodiments, the basic feature information includes a course unique code, a course length, network disconnection information in the first 10 minutes, network disconnection information in the last 5 minutes, network disconnection information in 11-20 minutes, and the like.
In accordance with one or more embodiments, the face emotional characteristic information includes a total number of smiles, a total number of faces, a total score of smiles, a face smile rate (total number of smiles/total number of faces), a face smile score (total score of smiles/total number of faces), a total number of smiles in the first 10 minutes, a total number of faces in the first 10 minutes, a total score of smiles in the first 10 minutes, a face smile rate in the first 10 minutes, a face smile score in the first 10 minutes, the total number of smiles in the last 5 minutes, the total number of faces in the last 5 minutes, the total score of smiles in the last 5 minutes, the face smile rate in the last 5 minutes, the face smile score in the last 5 minutes, the total number of smiles in 11-20 minutes, the face total score in 11-20 minutes, the face smile rate in 11-20 minutes, and the face smile score in 11-20 minutes.
According to one or more embodiments, the face angle feature information includes yaw of the face average angle, pitch of the face average angle, roll of the face average angle, total of face twist angles, and face average twist angle (total of face twist angles/total number of faces).
According to one or more embodiments, the face feature information includes a horizontal axis of the face average center, a vertical axis of the face average center, a total distance of movement of the face center, and a mean distance of movement of the face center (total distance of movement of the face center / total number of faces of the teacher).
According to one or more embodiments, the gesture feature information includes a total number of gestures, a horizontal axis of a gesture average center point, a vertical axis of a gesture average center point, a total distance of movement of a gesture center point, and a mean distance of movement of a gesture center point (total distance of movement of a gesture center point/total number of gestures).
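A minimal sketch of the feature-engineering step, computing a small subset of the features listed above from the per-frame records of the previous sketch (the smile threshold and all field names are assumptions):

```python
def extract_features(records):
    """Aggregate per-frame detections into feature information, e.g.
    face smile rate = total number of smiles / total number of faces."""
    total_faces = sum(1 for r in records if r.emotion)  # frames with a detected face
    total_smiles = sum(1 for r in records if r.emotion.get("smile", 0.0) > 0.5)
    smile_score_sum = sum(r.emotion.get("smile", 0.0) for r in records)
    return {
        "face_smile_rate": total_smiles / total_faces if total_faces else 0.0,
        "face_smile_score": smile_score_sum / total_faces if total_faces else 0.0,
        "mean_yaw": sum(r.face_angle.get("yaw", 0.0) for r in records) / max(len(records), 1),
    }
```

Windowed variants of the same aggregates (first 10 minutes, minutes 11-20, last 5 minutes) follow by filtering `records` on `timestamp_s` before aggregating.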
Step S105, inputting the feature information into at least two gradient boosting decision tree models respectively, and determining at least two first category scores respectively, wherein each gradient boosting decision tree model is used for classifying teaching courses corresponding to the teaching feature information.
According to one or more embodiments, the first category score is any value between 0 and 1. Assuming there are 4 gradient boosting decision tree models, the feature information is input into the 4 models and 4 first category scores are determined; for example, the first category score corresponding to the first gradient boosting decision tree model is 0.8, that of the second is 0.6, that of the third is 0.7, and that of the fourth is 0.3.
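Assuming trained scikit-learn gradient boosting models (the patent does not name a library, so this implementation choice is an assumption), the per-model first category scores could be obtained as positive-class probabilities:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier  # assumed GBDT implementation

def first_category_scores(models, feature_vector):
    """Each trained model yields a value in [0, 1]: the probability of the
    positive (first) class, used as that model's first category score."""
    x = np.asarray(feature_vector, dtype=float).reshape(1, -1)
    return [m.predict_proba(x)[0, 1] for m in models]
```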
Step S106, determining a second category score of the target video according to the at least two first category scores.
According to one or more embodiments, the second category score of the target video is determined from the at least two first category scores through a voting mechanism of ensemble learning. In ensemble learning, a plurality of learners are generated according to certain rules and combined using an ensemble strategy, and the combined outputs are comprehensively judged to produce the final result.
According to one or more embodiments, determining a weighting coefficient corresponding to each of the at least two first category scores; multiplying the at least two first category scores by corresponding weight coefficients respectively to determine at least two numerical values; determining a sum of the at least two numerical values as the second category score.
According to one or more embodiments, assume there are four gradient boosting decision tree models: the weight coefficient corresponding to the first model is 0.2, to the second 0.3, to the third 0.1, and to the fourth 0.4; the first category score corresponding to the first model is 0.8, to the second 0.6, to the third 0.7, and to the fourth 0.3. Multiplying each of the four first category scores by its corresponding weight coefficient gives four values: 0.2 × 0.8 = 0.16, 0.3 × 0.6 = 0.18, 0.1 × 0.7 = 0.07, and 0.4 × 0.3 = 0.12; the second category score is their sum, 0.16 + 0.18 + 0.07 + 0.12 = 0.53.
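The weighted voting above reduces to a simple weighted sum; a sketch reproducing the worked example (function name is an assumption):

```python
def second_category_score(first_scores, weights):
    """Weighted voting: multiply each first category score by its weight and sum."""
    assert len(first_scores) == len(weights)
    return sum(s * w for s, w in zip(first_scores, weights))

# Reproduces the example: 0.2*0.8 + 0.3*0.6 + 0.1*0.7 + 0.4*0.3 = 0.53
print(second_category_score([0.8, 0.6, 0.7, 0.3], [0.2, 0.3, 0.1, 0.4]))
```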
In one or more embodiments, after step S106, the method further includes other steps, specifically a method flow of video data processing, as shown in fig. 2:
and step S107, judging whether the second category score is larger than or equal to a set threshold value.
Step S1081, in response to the second category score being greater than or equal to the set threshold, the target video corresponding to the feature information is a first category video.
According to one or more embodiments, assuming that the threshold is set to 0.5, if the second category score is greater than or equal to 0.5, for example, the second category score is 0.53, the target video is the first category video, and the premium video.
Or, in step S1082, in response to that the second category score is smaller than the set threshold, the target video corresponding to the feature information is the second category video.
According to one or more embodiments of the present invention, if the threshold is set to 0.5, if the second category score is less than 0.5, for example, the second category score is 0.45, the target video is a second category video and a gray video, which may also be referred to as a progressive video, a poor video, and the like, which is not limited in the embodiments of the present invention.
In one or more embodiments, assuming that a second category score of ten thousand target videos is determined, ten thousand videos are sorted from high to low according to the second category score, and a specified number of target videos sorted at the last are selected to be determined as the second category videos, for example, the targets ranked at 6001 to 10000 are the second category videos.
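Both the threshold comparison and the ranking-based alternative can be expressed compactly; the sketch below is illustrative only, with the 0.5 threshold and the 6000-slot cutoff taken from the examples above and all names being assumptions:

```python
def classify_videos(scored_videos, threshold=0.5, first_category_slots=None):
    """Label videos either by threshold (steps S107-S1082) or, alternatively,
    by ranking and keeping a fixed number of first-category slots (e.g. 6000).
    `scored_videos` is a list of (video_id, second_category_score) pairs."""
    if first_category_slots is None:
        return [(vid, "first" if score >= threshold else "second")
                for vid, score in scored_videos]
    ranked = sorted(scored_videos, key=lambda vs: vs[1], reverse=True)
    return [(vid, "first" if rank < first_category_slots else "second")
            for rank, (vid, score) in enumerate(ranked)]
```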
In one or more embodiments, as shown in fig. 3, the steps of the training process of the gradient boosting decision tree model are as follows:
step S200, at least two characteristic information training sets are obtained, wherein the characteristic information training sets comprise historical characteristic information.
According to one or more embodiments, each feature information training set includes a plurality of positive samples and a plurality of negative samples; to ensure the accuracy of the training set, the number of positive samples may be equal to the number of negative samples. Assuming the number of positive samples is 2000, the number of negative samples is also 2000.
According to one or more embodiments, since the number of negative samples is small, for example 2000 negative samples and 8000 positive samples among ten thousand samples, if 4 feature information training sets are required, the 8000 positive samples are randomly divided into four parts of 2000 positive samples each, and each of the 4 training sets includes one of the four parts together with the 2000 negative samples.
Step S201, training at least two gradient boosting decision tree models according to the at least two feature information training sets.
According to one or more embodiments, four gradient boosting decision tree models are trained based on the four feature information training sets, e.g., each of the four feature information training sets is used to train one gradient boosting decision tree model.
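A minimal training sketch for the example above (8000 positive and 2000 negative samples, 4 models), assuming scikit-learn and 2-D feature arrays; the patent does not name a library, so `GradientBoostingClassifier` and all helper names are assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier  # assumed GBDT implementation

def train_ensemble(pos_features, neg_features, n_models=4, seed=0):
    """Randomly split the more numerous positive samples into `n_models` parts
    (e.g. 8000 positives -> 4 x 2000), pair each part with all negative samples
    (e.g. 2000), and train one gradient boosting model per resulting training set."""
    rng = np.random.default_rng(seed)
    pos = np.asarray(pos_features, dtype=float)
    neg = np.asarray(neg_features, dtype=float)
    rng.shuffle(pos)  # shuffle rows before splitting
    models = []
    for part in np.array_split(pos, n_models):
        X = np.vstack([part, neg])
        y = np.concatenate([np.ones(len(part)), np.zeros(len(neg))])
        models.append(GradientBoostingClassifier().fit(X, y))
    return models
```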
In one or more embodiments, as shown in fig. 4, the specific steps of the training process of the gradient boosting decision tree model are as follows:
and step S300, acquiring a historical target video.
Step S301, converting the historical target video into an image sequence, wherein the images in the image sequence are arranged according to a time sequence.
Step S302, detecting at least one frame of image in the image sequence according to at least one detection model, and acquiring first image detection information, wherein the first image detection information is used for representing basic information of the image.
Step S303, determining a first image detection information sequence according to the time sequence of at least one frame of image and the corresponding first image detection information.
Step S304, determining historical feature information according to the first image detection information sequence.
Step S305, determining at least two feature information training sets according to the historical feature information.
Step S306, training at least two gradient boosting decision tree models according to the at least two feature information training sets.
In one or more embodiments, fig. 5 is a schematic view of the video data processing flow; after a target video is acquired, the specific processing flow includes:
Step S400, converting the target video into an image sequence.
Step S401, respectively detecting at least one frame of image in the image sequence according to a face emotion detection model, a face angle detection model, a face detection model and a gesture detection model to obtain first image detection information, and determining a first image detection information sequence according to the time sequence of the at least one frame of image and the corresponding first image detection information.
Step S402, processing the first image detection information sequence through feature engineering.
Step S403, determining feature information.
The feature information specifically includes: basic feature information, face emotion feature information, face angle feature information, face feature information, and gesture feature information.
Step S404, inputting the feature information into the four gradient boosting decision tree models and determining four first category scores.
The four gradient boosting decision tree models are obtained by training with four training sets; for example, the first gradient boosting decision tree model is trained with training set 1, the second with training set 2, the third with training set 3, and the fourth with training set 4.
Step S405, determining a second category score of the target video from the four first category scores through the voting mechanism of ensemble learning.
In one or more embodiments, a result screening model may also be determined by combining the four gradient boosting decision tree models through the voting mechanism of ensemble learning, where the result screening model is the sum of the products of the four gradient boosting decision tree models and their corresponding weight coefficients, for example: weight coefficient 1 × first gradient boosting decision tree model + weight coefficient 2 × second gradient boosting decision tree model + weight coefficient 3 × third gradient boosting decision tree model + weight coefficient 4 × fourth gradient boosting decision tree model.
According to one or more embodiments, the device that captures the target video includes, but is not limited to: Personal Computers (PCs), tablet computers, handheld devices (e.g., smart phones, palmtop computers), in-vehicle devices, wearable devices, and computing devices or other processing devices connected to a wireless modem. The capture device of the target video may be called by different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent or user equipment, cellular telephone, cordless telephone, Personal Digital Assistant (PDA), or terminal equipment in a 5G network or a future evolved network.
Fig. 6 is a schematic diagram of an apparatus for processing video data according to an embodiment of the present invention, where the apparatus includes: an acquisition unit 61, a conversion unit 62, a detection unit 63, a processing unit 64, and a determination unit 65. The acquiring unit 61 is used for acquiring a target video; a conversion unit 62, configured to convert the target video into an image sequence, where images in the image sequence are arranged in a time sequence; a detecting unit 63, configured to detect at least one frame of image in the image sequence according to at least one detection model, and obtain first image detection information, where the first image detection information is used to represent basic information of the image; a processing unit 64 for determining a first image detection information sequence according to the temporal order of at least one frame of image and the corresponding first image detection information; the processing unit 64 is further configured to determine feature information according to the first image detection information sequence; the processing unit 64 further inputs the feature information into at least two gradient boosting decision tree models respectively, and determines at least two first category scores respectively, where each gradient boosting decision tree model is used to classify the teaching courses corresponding to the teaching feature information; a determining unit 65, configured to determine a second category score of the target video according to the at least two first category scores.
In one or more embodiments, the apparatus further comprises a judging unit, configured to determine, in response to the second category score being greater than or equal to a set threshold, that the target video corresponding to the feature information is a first category video.
In one or more embodiments, the determining unit is further configured to determine, in response to the second category score being smaller than a set threshold, that the target video corresponding to the feature information is a second category video.
In one or more embodiments, the conversion unit is specifically configured to: acquire one frame of image from the target video at set time intervals, encode the acquired images in temporal order, and form the encoded images into an image sequence.
In one or more embodiments, the determining unit is specifically configured to: determine weighting coefficients corresponding to the at least two first category scores respectively; multiply the at least two first category scores by the corresponding weighting coefficients respectively to determine at least two numerical values; and determine a sum of the at least two numerical values as the second category score.
In one or more embodiments, the first image detection information includes: at least one of basic information, face emotion information, face angle information, face information, and gesture information.
In one or more embodiments, the feature information includes: at least one of basic feature information, face emotion feature information, face angle feature information, face feature information, and gesture feature information.
In one or more embodiments, the detection model comprises: at least one of a face emotion detection model, a face angle detection model, a face detection model, and a gesture detection model.
In one or more embodiments, the apparatus further comprises a training unit, configured to acquire at least two feature information training sets, wherein the feature information training sets comprise historical feature information, and to train at least two gradient boosting decision tree models according to the at least two feature information training sets.
Fig. 7 is a schematic diagram of an electronic device of an embodiment of the invention. The electronic device shown in fig. 7 is a general-purpose data processing apparatus comprising a general-purpose computer hardware structure including at least a processor 71 and a memory 72. The processor 71 and the memory 72 are connected by a bus 73. The memory 72 is adapted to store instructions or programs executable by the processor 71. The processor 71 may be a stand-alone microprocessor or a collection of one or more microprocessors. Thus, the processor 71 implements the processing of data and the control of other devices by executing instructions stored by the memory 72 to perform the method flows of embodiments of the present invention as described above. The bus 73 connects the above-described components together, and also connects the above-described components to a display controller 74 and a display device and an input/output (I/O) device 75. Input/output (I/O) devices 75 may be a mouse, keyboard, modem, network interface, touch input device, motion sensing input device, printer, and other devices known in the art. Typically, the input/output devices 75 are connected to the system through input/output (I/O) controllers 76.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, various aspects of embodiments of the invention may take the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module," or "system." Furthermore, various aspects of embodiments of the invention may take the form of: a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
Any combination of one or more computer-readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of embodiments of the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to: electromagnetic, optical, or any suitable combination thereof. The computer readable signal medium may be any of the following computer readable media: is not a computer readable storage medium and may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of embodiments of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, C++, and the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention described above describe various aspects of embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method of video data processing, comprising:
acquiring a target video;
converting the target video into an image sequence, wherein images in the image sequence are arranged according to a time sequence;
detecting at least one frame of image in the image sequence according to at least one detection model to obtain first image detection information, wherein the first image detection information is used for representing basic information of the image;
determining a first image detection information sequence according to the time sequence of the at least one frame of image and the corresponding first image detection information;
determining characteristic information according to the first image detection information sequence;
inputting the characteristic information into at least two gradient boosting decision tree models respectively, and determining at least two first category scores respectively, wherein each gradient boosting decision tree model is used for classifying teaching courses corresponding to the teaching characteristic information;
determining a second category score of the target video according to the at least two first category scores.
2. The method of claim 1, wherein the method further comprises:
in response to the second category score being greater than or equal to a set threshold, determining that the target video corresponding to the characteristic information is a first category video.
3. The method of claim 1, wherein the method further comprises:
in response to the second category score being smaller than a set threshold, determining that the target video corresponding to the characteristic information is a second category video.
4. The method according to claim 1, wherein said converting the target video into a sequence of images comprises:
and acquiring a frame of image in the target video at set intervals, encoding the acquired images according to a time sequence, and forming an image sequence by the encoded images.
5. The method according to claim 1, wherein the determining the second category score of the target video according to the at least two first category scores comprises:
determining weighting coefficients corresponding to the at least two first category scores respectively;
multiplying the at least two first category scores by corresponding weight coefficients respectively to determine at least two numerical values;
determining a sum of the at least two numerical values as the second category score.
6. The method of claim 1, wherein the first image detection information comprises: at least one of basic information, face emotion information, face angle information, face information, and gesture information.
7. The method of claim 1, wherein the characteristic information comprises: at least one of basic feature information, face emotion feature information, face angle feature information, face feature information, and gesture feature information.
8. The method of claim 1, wherein the detection model comprises: at least one of a face emotion detection model, a face angle detection model, a face detection model, and a gesture detection model.
9. The method of claim 1, wherein the training process of the gradient boosting decision tree model comprises:
acquiring at least two characteristic information training sets, wherein the characteristic information training sets comprise historical characteristic information;
and training at least two gradient boosting decision tree models according to the at least two characteristic information training sets.
10. An apparatus for video data processing, comprising:
an acquisition unit configured to acquire a target video;
a conversion unit, configured to convert the target video into an image sequence, where images in the image sequence are arranged in a time sequence;
the detection unit is used for detecting at least one frame of image in the image sequence according to at least one detection model to obtain first image detection information, wherein the first image detection information is used for representing basic information of the image;
the processing unit is used for determining a first image detection information sequence according to the time sequence of at least one frame of image and corresponding first image detection information;
the processing unit is further configured to determine feature information according to the first image detection information sequence;
the processing unit is further configured to input the feature information into at least two gradient boosting decision tree models respectively, and determine at least two first category scores respectively, where each gradient boosting decision tree model is used to classify teaching courses corresponding to the teaching feature information;
a determining unit, configured to determine a second category score of the target video according to the at least two first category scores.
11. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-9.
12. A computer-readable storage medium on which computer program instructions are stored, which computer program instructions, when executed by a processor, implement the method of any one of claims 1-9.
CN202010740279.2A 2020-07-28 2020-07-28 Video data processing method and device and electronic equipment Pending CN111967346A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010740279.2A CN111967346A (en) 2020-07-28 2020-07-28 Video data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010740279.2A CN111967346A (en) 2020-07-28 2020-07-28 Video data processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111967346A true CN111967346A (en) 2020-11-20

Family

ID=73362977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010740279.2A Pending CN111967346A (en) 2020-07-28 2020-07-28 Video data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111967346A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130142418A1 (en) * 2011-12-06 2013-06-06 Roelof van Zwol Ranking and selecting representative video images
CN110363084A (en) * 2019-06-10 2019-10-22 北京大米科技有限公司 A kind of class state detection method, device, storage medium and electronics
CN111028216A (en) * 2019-12-09 2020-04-17 Oppo广东移动通信有限公司 Image scoring method and device, storage medium and electronic equipment

Similar Documents

Publication Title
CN110147456B (en) Image classification method and device, readable storage medium and terminal equipment
CN109491915B (en) Data processing method and device, medium and computing equipment
CN109214501B (en) Method and apparatus for identifying information
US20230048386A1 (en) Method for detecting defect and method for training model
CN115311676A (en) Picture examination method and device, computer equipment and storage medium
CN113239914A (en) Classroom student expression recognition and classroom state evaluation method and device
CN115829058A (en) Training sample processing method, cross-modal matching method, device, equipment and medium
CN112766402A (en) Algorithm selection method and device and electronic equipment
CN111797822B (en) Text object evaluation method and device and electronic equipment
CN110704614B (en) Information processing method and device for predicting user group type in application
CN111967346A (en) Video data processing method and device and electronic equipment
CN116977256A (en) Training method, device, equipment and storage medium for defect detection model
CN114187751B (en) Adaptability evaluation method, device and equipment of early warning system and readable storage medium
US20220327450A1 (en) Method for increasing or decreasing number of workers and inspectors in crowdsourcing-based project for creating artificial intelligence learning data
CN114237182B (en) Robot scheduling method and system
CN113469090B (en) Water pollution early warning method, device and storage medium
CN114648688A (en) Method, system and equipment for evaluating landscape level along high-speed rail and readable storage medium
CN114677622A (en) Video frame selection method and device, electronic equipment and computer readable storage medium
CN115147353A (en) Defect detection model training method, device, equipment, medium and program product
CN111522943A (en) Automatic test method, device, equipment and storage medium for logic node
CN112434717A (en) Model training method and device
CN111062468B (en) Training method and system for generating network, and image generation method and device
CN113344056B (en) Training method and device of personnel mobility prediction model
CN115379259B (en) Video processing method, device, electronic equipment and storage medium
CN117390522B (en) Online deep learning level prediction method and device based on process and result fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination