CN112446237A

CN112446237A - Video classification method, device, equipment and computer readable storage medium

Info

Publication number: CN112446237A
Application number: CN201910805672.2A
Authority: CN
Inventors: 王平; 周志超; 迟至真; 龙翔; 赵翔; 李甫; 何栋梁; 张赫男; 孙昊; 丁二锐; 文石磊
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2021-03-05

Abstract

The application discloses a video classification method, device, equipment and computer-readable storage medium, and relates to big data classification technology. The specific implementation scheme is as follows: a data set to be processed is acquired from a data server, and an optimization operation is performed on it according to an optimization request sent by a terminal device to obtain a data set to be trained; a model to be trained is trained with the data set to be trained to obtain a category identification network model; at least one sports video to be identified, sent by the terminal device, is acquired and identified through the category identification network model to obtain the sports category information corresponding to each sports video to be identified; and the sports videos to be identified are classified according to their sports category information. By optimizing the data set to be processed and classifying videos with a model trained on the optimized data set, the application solves the technical problem of low accuracy when videos are classified according to their associated keywords.

Description

Video classification method, device, equipment and computer readable storage medium

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a big data classification technique.

Background

In recent years, with the rise of video websites and large live-streaming apps, parsing and classifying video content has become increasingly important. Sports is a common video subject, and algorithms for classifying sports videos have broad application prospects in user search and targeted recommendation.

Previously, sports videos were classified by screening keywords in their textual descriptions. However, many short videos uploaded by users are named arbitrarily or carry text unrelated to the video content, so keyword screening leads to both missed and incorrect classifications.

Disclosure of Invention

The application provides a video classification method, device, equipment and computer-readable storage medium, which are used to solve the technical problem that existing video classification based on the keywords associated with videos has low accuracy.

In a first aspect, an embodiment of the present application provides a method for video classification, which is applied to a video classification device, and includes:

acquiring a data set to be processed from a data server, wherein the data set to be processed comprises a plurality of pieces of video data labeled with sports categories;

acquiring an optimization request sent by terminal equipment, and performing optimization operation on the data set to be processed according to the optimization request to obtain a data set to be trained;

training a preset model to be trained with the data set to be trained to obtain a category identification network model;

acquiring at least one sports video to be identified sent by terminal equipment, identifying the at least one sports video to be identified through the category identification network model, and acquiring sports category information corresponding to each sports video to be identified;

and classifying the at least one sports video to be identified according to the sports category information corresponding to each sports video to be identified.

According to the video classification method provided by this embodiment, the data set to be processed is optimized to obtain a data set to be trained; a preset model to be trained is trained with that data set to obtain a category identification network model; the sports type of each acquired sports video to be identified is recognized by the category identification network model to obtain its sports category information; and the sports videos to be identified are then classified effectively according to that information. In this way, sports videos on live-streaming platforms or video websites can be identified effectively, and optimizing the data set to be processed allows the subsequent classification to better meet the personalized needs of domestic users.

In one possible design, the optimization request includes a deletion request, where the deletion request includes a preset first sports category identifier;

correspondingly, the optimizing the to-be-processed data set according to the optimization request to obtain a to-be-trained data set includes:

and deleting the sports video data corresponding to the preset first sports category identifier in the data set to be processed to obtain a data set to be trained.

According to the video classification method provided by this embodiment, the video data corresponding to the first sports category identifier in the data set to be processed is deleted according to the deletion request to obtain the data set to be trained. The sports categories in the data set to be trained therefore better match the needs of domestic users, and the deep learning model trained on this data set classifies videos more accurately.

In one possible design, the optimization request includes a renaming request, where the renaming request includes at least one preset second sports category identifier and a target sports category identifier;

and modifying, according to the renaming request, the category of the sports video data corresponding to the preset second sports category identifier in the data set to be processed into the target sports category identifier to obtain the data set to be trained.

In the video classification method provided by this embodiment, the category of the sports video data corresponding to the preset second sports category identifier in the to-be-processed data set is changed to the target sports category identifier according to the renaming request. The identifiers of the video data in the to-be-trained data set are thus more consistent, and the deep learning model trained on this data set classifies videos more accurately.

In one possible design, the optimization request includes an addition request, and the addition request includes sports video data corresponding to at least one third sports category identifier;

and adding the sports video data corresponding to the at least one third sports category identifier into the data set to be processed according to the addition request to obtain the data set to be trained.

According to the video classification method provided by this embodiment, the sports video data corresponding to the at least one third sports category identifier is added to the data set to be processed according to the addition request, so that the sports categories in the data set to be trained better match the needs of domestic users and the deep learning model trained on this data set classifies videos more accurately.

In one possible design, the optimization request includes a video quality adjustment request including at least one of a chroma parameter, a size parameter, and an angle parameter;

and adjusting the quality of the sports video data in the data set to be processed according to the video quality adjustment request to obtain the data set to be trained.

According to the video classification method provided by this embodiment, after the video quality adjustment request is obtained, the quality of the sports video data in the data set to be processed can be adjusted accordingly, so that the video quality of the data in the data set to be trained better matches the needs of domestic users and the deep learning model trained on this data set classifies videos more accurately.

In a possible design, before the training of the preset first model to be trained by the data set to be trained, the method further includes:

randomly extracting at least one video segment from each sports video in the data set to be trained;

and for each video segment, extracting a preset number of pieces of image information from the video segment at a preset sampling frequency, and inputting the preset number of pieces of image information to the first model to be trained.

In the video classification method provided by this embodiment, video segments are randomly extracted from each sports video in the data set to be trained; for each segment, a preset number of pieces of image information is extracted at a preset sampling frequency and input to the first model to be trained. The input therefore carries temporal information in addition to the two-dimensional spatial information of each image, so temporal-domain information is fully utilized and the category identification network model trained on the data set can better identify video content. In addition, this operation effectively reduces the data volume and improves training efficiency while preserving the characteristics of the video information.

In one possible design, the identifying, by the category identification network model, the at least one sports video to be identified includes:

inputting the sports video to be identified into the category identification network model, wherein the category identification network model is used for: performing complete convolution operations with different convolution kernel sizes on the sports video to be identified along the three dimensions of time, height and width to obtain three pairs of output results; adding the six output results; performing a global average pooling operation on the summed result to obtain a pooled result; inputting the pooled result into two fully connected layers, which first reduce and then increase the dimensionality, to obtain matrices corresponding to the three pairs of output results; weighting the matrices according to the weight of each matrix to obtain an output vector; and determining the sports category information of the sports video to be identified according to the output vector.

According to the video classification method provided by this embodiment, by performing complete convolution operations with different convolution kernels on the sports video to be recognized along the three dimensions of time, height and width, the original single-direction 2D convolution is extended to 2D convolutions in three directions. Temporal-domain information is thus fully utilized and the temporal relationship between consecutive frames is taken into account, so that capturing how actions change makes the classification more discriminative. In addition, after global average pooling and the dimension-reducing and dimension-increasing fully connected layers are applied to the convolution results, the matrices corresponding to the three pairs of output results are weighted, which saves computing resources and improves the performance of the video classification device.

In one possible design, the category identification network model includes a CoSKNet model.

According to the video classification method provided by the embodiment, the CoSKNet model is selected to identify the category of the sports video to be identified, so that the calculation amount is reduced and the efficiency of video classification identification is improved on the basis of improving the identification accuracy.

In a second aspect, an embodiment of the present application provides a video classification apparatus, including:

a to-be-processed data acquisition module, configured to acquire a to-be-processed data set from a data server, wherein the to-be-processed data set comprises a plurality of pieces of video data labeled with sports categories;

the optimization module is used for acquiring an optimization request sent by terminal equipment, and performing optimization operation on the data set to be processed according to the optimization request to acquire a data set to be trained;

the training module is used for training a preset model to be trained with the data set to be trained to obtain a category identification network model;

the identification module is used for acquiring at least one to-be-identified sports video sent by the terminal equipment, identifying the at least one to-be-identified sports video through the category identification network model, and acquiring sports category information corresponding to each to-be-identified sports video;

and the classification module is used for performing classification operation on the at least one sports video to be identified according to the sports category information corresponding to each sports video to be identified.

In a third aspect, the present application provides an electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.

In a fourth aspect, the present application provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.

In a fifth aspect, the present application provides a method for video classification, including:

acquiring a data set to be processed, wherein the data set to be processed comprises a plurality of video data marked with sports categories;

obtaining an optimization request, and performing optimization operation on the data set to be processed according to the optimization request to obtain a data set to be trained;

and acquiring at least one sports video to be identified, identifying the at least one sports video to be identified through the category identification network model, and acquiring the sports category information corresponding to each sports video to be identified.

In a sixth aspect, the present application provides a computer program comprising program code for performing the method according to the first or fifth aspect when the computer program is run by a computer.

One embodiment of the above application has the following advantage or benefit: sports video data can be classified accurately. Because the data set to be processed is optimized and the category of each sports video to be identified is recognized by the CoSKNet model, the technical problem of low accuracy in existing keyword-based video classification is solved, and sports video data is classified accurately.

Other effects of the above-described alternative will be described below with reference to specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. In the drawings:

FIG. 1 is a diagram of a network architecture on which the present application is based;

FIG. 2 is a schematic flowchart of a video classification method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an interactive interface provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of a video classification method according to a second embodiment of the present application;

FIG. 5 is a network architecture diagram of a category identification network model provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a video classification apparatus according to a third embodiment of the present application;

FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present application;

FIG. 8 is a schematic flowchart of a video classification method according to a fourth embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In order to solve the technical problem that the accuracy is low when video classification is performed according to keywords corresponding to videos in the prior art, the application provides a video classification method, a video classification device, video classification equipment and a computer-readable storage medium.

It should be noted that the video classification method, apparatus, device and computer-readable storage medium provided in the present application can be applied to scenes for identifying any kind of video data.

FIG. 1 is a diagram of the network architecture on which the present application is based. As shown in FIG. 1, the network architecture includes at least a video classification apparatus 1, a data server 2, and a terminal device 3. The video classification apparatus 1 is communicatively connected to the data server 2 and the terminal device 3, so that it can acquire the data set to be processed from the data server 2 and receive video data to be identified, or optimization operation instructions, sent by a user through the terminal device 3. The video classification apparatus 1 may be written in a language such as C/C++, Java, Shell, or Python; the data server 2 may be a cloud server or a server cluster storing a large amount of data; and the terminal device 3 may be a desktop computer, a tablet computer, or the like.

FIG. 2 is a schematic flowchart of a video classification method according to an embodiment of the present application, and FIG. 3 is a schematic view of an interactive interface provided by an embodiment of the present application. The method is applied to a video classification device and, as shown in FIGS. 2 and 3, includes the following steps.

Step 101, acquiring a data set to be processed from a data server, wherein the data set to be processed comprises a plurality of pieces of video data labeled with sports categories.

The execution subject of this embodiment is the video classification apparatus. To classify sports video data, a deep learning model may be built, so the data to be processed is acquired first. The video classification apparatus 1 may be communicatively connected to the data server 2 to exchange information with it. The data server 2 may store data of multiple categories, for example sports, musical instrument, and dance data; note that training a model on data of different categories enables the trained model to classify and identify data of those categories. To classify sports video data, sports category data containing video data labeled with a plurality of sports categories may be obtained from the data server. For example, the sports category data may include football video data, whose identifier is "football"; basketball video data, whose identifier is "basketball"; and pole vault video data, whose identifier is "pole vault". It may also include video data of various other categories, which is not limited herein.

It should be noted that the sports category data may be the open-source Sports-1M data set, or any other data set that includes video data of multiple sports categories together with the category identifier of each sports category's video data, which is not limited herein.

Step 102, acquiring an optimization request sent by the terminal device, and performing an optimization operation on the data set to be processed according to the optimization request to obtain a data set to be trained.

In this embodiment, the sports category data collected in step 101 cannot meet the personalized requirements of domestic users regarding sports categories, so an optimization operation on the to-be-processed data set is also needed. For example, the to-be-processed data set may include cricket videos, although cricket is not popular in China; video data of the same sports category may be labeled with different identifiers; or the shooting angle and color tone of the images may not match users' preferences. Therefore, an optimization request sent by a user through the terminal device can be obtained and, in response, the data set to be processed can be optimized according to the request to obtain the data set to be trained, on which the preset deep learning model is then trained.

Step 103, training a preset model to be trained with the data set to be trained to obtain a category identification network model.

In this embodiment, after the data set to be trained that can meet the personalized requirements of the user is acquired, the preset model to be trained can be trained through the data set to be trained until the model to be trained converges, and the trained category identification network model is acquired, and the category identification network model can identify the category of the input sports video to be identified.

Specifically, on the basis of any of the above embodiments, the category identification network model includes CoSKNet.

Step 104, acquiring at least one sports video to be identified sent by the terminal device, identifying the at least one sports video to be identified through the category identification network model, and acquiring sports category information corresponding to each sports video to be identified.

In this embodiment, after the category identification network model is obtained by training, videos can be classified with it. The video classification apparatus 1 is also communicatively connected to the terminal device 3 so that the two can exchange information. Specifically, the sports video to be recognized sent by the terminal device 3 may be acquired, and its category recognized through the trained category identification network model.

Step 105, classifying the at least one sports video to be recognized according to the sports category information corresponding to each sports video to be recognized.

In this embodiment, after category identification is performed on the at least one sports video to be identified, the sports category of each video can be determined accurately, so the videos can be classified, for example into a basketball category, a football category, a swimming category, and so on. In video or live-streaming software, videos of specified categories can then be pushed to users accurately according to the classification results, improving the user experience. As shown in FIG. 3, a user of video or live-streaming software may search for sports videos of a specific category to watch, for example basketball videos. After the sports videos to be recognized have been recognized and classified, all videos classified as basketball can be retrieved according to the search information entered by the user and displayed to the user.
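As a rough illustration of step 105 and the search scenario of FIG. 3, the sketch below groups video IDs by the category predicted for each video and then looks up one category. It assumes the model's predictions are already available as hypothetical `(video_id, category)` pairs with lower-case category names; `classify_videos` and `search_by_category` are illustrative names, not part of the application.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def classify_videos(predictions: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group video IDs by the sports category predicted for each one (step 105)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for video_id, category in predictions:
        groups[category].append(video_id)
    return dict(groups)


def search_by_category(groups: Dict[str, List[str]], query: str) -> List[str]:
    """Return all videos classified under the queried category.

    Assumes categories are stored in lower case, so the query is normalized the same way.
    """
    return groups.get(query.strip().lower(), [])


if __name__ == "__main__":
    preds = [("v1", "basketball"), ("v2", "football"), ("v3", "basketball"), ("v4", "swimming")]
    groups = classify_videos(preds)
    print(search_by_category(groups, "Basketball"))  # ['v1', 'v3']
```

Keeping the grouped index separate from the model means a search request only needs a dictionary lookup, not a new inference pass.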

Further, on the basis of any of the above embodiments, the optimization request includes a deletion request, where the deletion request includes a preset first sports category identifier;

correspondingly, step 102 specifically includes:

In this embodiment, the sports category data in the open-source to-be-processed data set cannot meet the personalized requirements of domestic users regarding sports categories, so the to-be-processed data set also needs to be optimized. For example, the to-be-processed data set may include cricket videos, although cricket is not popular in China, and it may also include football videos, although football is likewise not popular in China. To optimize the data set, statistics on the popularity of sports in China can therefore be collected and ranked in advance, and the first sports category identifiers of unpopular sports categories determined from these statistics. Correspondingly, an optimization request sent by a user can be received, wherein the optimization request includes a deletion request containing a preset first sports category identifier. The video data corresponding to the first sports category identifier in the data set to be processed is then deleted according to the deletion request to obtain the data set to be trained.
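A minimal sketch of the deletion operation, assuming the data set is held in memory as a list of hypothetical `(video_path, category_id)` pairs; the application does not specify a storage layout, and `apply_deletion` is an illustrative name.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical sample layout: (video path, sports category identifier).
Sample = Tuple[str, str]


def apply_deletion(dataset: Iterable[Sample], first_category_ids: Set[str]) -> List[Sample]:
    """Drop every sample whose category is one of the first sports category identifiers."""
    return [(path, cat) for path, cat in dataset if cat not in first_category_ids]


if __name__ == "__main__":
    data = [("a.mp4", "cricket"), ("b.mp4", "basketball"), ("c.mp4", "cricket")]
    print(apply_deletion(data, {"cricket"}))  # [('b.mp4', 'basketball')]
```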

Further, on the basis of any of the above embodiments, the optimization request includes a renaming request, where the renaming request includes at least one preset second sports category identifier and a target sports category identifier;

correspondingly, step 102 specifically includes:

In this embodiment, the sports category data in the open-source to-be-processed data set cannot meet the personalized requirements of domestic users regarding sports categories, so the to-be-processed data set also needs to be optimized. For example, video data of the same sports category in the to-be-processed data set may be labeled with different identifiers. Specifically, after the optimization request is obtained, the category of the sports video data corresponding to the preset second sports category identifier in the to-be-processed data set may be changed to the target sports category identifier according to the renaming request, so as to obtain the to-be-trained data set.
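The renaming operation can be sketched the same way, again assuming hypothetical `(video_path, category_id)` pairs. The example labels ("soccer", "association_football") are invented to show two identifiers for one sport being merged into a single target identifier.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical sample layout: (video path, sports category identifier).
Sample = Tuple[str, str]


def apply_renaming(
    dataset: Iterable[Sample], second_category_ids: Set[str], target_category_id: str
) -> List[Sample]:
    """Relabel samples whose category is a second identifier with the target identifier."""
    return [
        (path, target_category_id if cat in second_category_ids else cat)
        for path, cat in dataset
    ]


if __name__ == "__main__":
    data = [("a.mp4", "soccer"), ("b.mp4", "association_football"), ("c.mp4", "basketball")]
    print(apply_renaming(data, {"soccer", "association_football"}, "football"))
    # [('a.mp4', 'football'), ('b.mp4', 'football'), ('c.mp4', 'basketball')]
```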

Further, on the basis of any of the above embodiments, the optimization request includes an addition request, where the addition request includes sports video data corresponding to at least one third sports category identifier;

correspondingly, step 102 specifically includes:

In this embodiment, the open-source to-be-processed data set may not include video data of some sports categories characteristic of China. Therefore, to further expand the recognition range, video data of such sports may be collected in advance and added to the to-be-processed data set to obtain the to-be-trained data set. Specifically, the sports video data corresponding to the at least one third sports category identifier may be added to the to-be-processed data set according to the addition request to obtain the to-be-trained data set. The third sports categories include, but are not limited to, martial arts and table tennis.
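A minimal sketch of the addition operation under the same hypothetical `(video_path, category_id)` layout. Skipping exact duplicates is an assumption added here so that repeated requests do not inflate the data set; the application does not say how duplicates are handled.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample layout: (video path, sports category identifier).
Sample = Tuple[str, str]


def apply_addition(dataset: Iterable[Sample], new_samples: Iterable[Sample]) -> List[Sample]:
    """Append samples of the third sports categories, skipping exact duplicates (assumption)."""
    merged = list(dataset)
    seen = set(merged)
    for sample in new_samples:
        if sample not in seen:
            merged.append(sample)
            seen.add(sample)
    return merged


if __name__ == "__main__":
    data = [("b.mp4", "basketball")]
    extra = [("m.mp4", "martial_arts"), ("t.mp4", "table_tennis"), ("m.mp4", "martial_arts")]
    print(apply_addition(data, extra))
    # [('b.mp4', 'basketball'), ('m.mp4', 'martial_arts'), ('t.mp4', 'table_tennis')]
```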

Further, on the basis of any of the above embodiments, the optimization request includes a video quality adjustment request, where the video quality adjustment request includes at least one of a chroma parameter, a size parameter, and an angle parameter;

correspondingly, step 102 specifically includes:

In this embodiment, the video quality of the data in the to-be-processed data set may differ from what domestic users are accustomed to; for example, the chroma, shooting angle, and size may differ. Therefore, after the video quality adjustment request is obtained, the quality of the sports video data in the to-be-processed data set may be adjusted accordingly to obtain the to-be-trained data set.
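The application does not say how each quality parameter is applied, so the following is only a sketch under stated assumptions: frames are a NumPy array of shape `(T, H, W, 3)` with values in `[0, 1]`; the chroma parameter is treated as a saturation scale factor (0 gives grayscale, 1 leaves colors unchanged); the size parameter is a target `(height, width)` reached by nearest-neighbour resampling; and the angle parameter is restricted to multiples of 90 degrees. `adjust_quality` is a hypothetical name.

```python
from typing import Optional, Tuple

import numpy as np


def adjust_quality(
    frames: np.ndarray,
    chroma: Optional[float] = None,
    size: Optional[Tuple[int, int]] = None,
    angle: Optional[int] = None,
) -> np.ndarray:
    """Apply hypothetical chroma, size and angle adjustments to a (T, H, W, 3) clip."""
    out = frames.astype(np.float32)
    if chroma is not None:
        # Scale each pixel's distance from its gray value (assumed meaning of "chroma").
        gray = out.mean(axis=-1, keepdims=True)
        out = np.clip(gray + chroma * (out - gray), 0.0, 1.0)
    if size is not None:
        # Nearest-neighbour resize by picking evenly spaced source rows and columns.
        new_h, new_w = size
        h, w = out.shape[1:3]
        rows = np.arange(new_h) * h // new_h
        cols = np.arange(new_w) * w // new_w
        out = out[:, rows][:, :, cols]
    if angle is not None:
        if angle % 90 != 0:
            raise ValueError("this sketch only supports multiples of 90 degrees")
        # Rotate counter-clockwise in the height/width plane of every frame.
        out = np.rot90(out, k=(angle // 90) % 4, axes=(1, 2))
    return out


if __name__ == "__main__":
    clip = np.random.default_rng(0).random((8, 224, 224, 3))
    print(adjust_quality(clip, chroma=0.8, size=(112, 112), angle=90).shape)  # (8, 112, 112, 3)
```

A production pipeline would more likely use a video or image library with proper interpolation; nearest-neighbour indexing is used here only to keep the sketch dependency-light.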

FIG. 4 is a schematic flowchart of a video classification method according to a second embodiment of the present application. On the basis of any of the foregoing embodiments, as shown in FIG. 4, before step 103 the method further includes:

step 201, randomly extracting at least one video segment from each sports video in the data set to be trained;

step 202, for each video segment, extracting a preset number of pieces of image information from the video segment at a preset sampling frequency, and inputting the preset number of pieces of image information to the first model to be trained.

In this embodiment, to improve training efficiency, before the model to be trained is trained, at least one video segment is randomly extracted from each piece of sports video data in the data set to be trained, and a preset number of pieces of image information is extracted from each segment at a preset sampling frequency. Inputting these images into the first model to be trained preserves the model's accuracy while improving training efficiency. For example, a video segment of 64 consecutive frames of size 224 × 224 may be randomly extracted from a sports video, and eight frames may be extracted from it at equal intervals, giving an input of dimension 8 × 224 × 224, which is fed into the model to be trained.
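The 64-frame / 8-frame example above amounts to choosing a random window and then every 8th frame inside it. The sketch below computes those frame indices; it assumes videos shorter than the window are rejected (padding or looping would be alternatives), and `sample_clip_indices` is a hypothetical name. Decoding the selected 224 × 224 frames and stacking them would yield the 8 × 224 × 224 input (per color channel).

```python
import random
from typing import List, Optional


def sample_clip_indices(
    num_frames: int,
    clip_len: int = 64,
    num_samples: int = 8,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick a random window of `clip_len` consecutive frames, then `num_samples`
    equally spaced frame indices inside it (64 frames -> every 8th frame)."""
    if num_frames < clip_len:
        # Assumption: short videos are rejected rather than padded.
        raise ValueError(f"video has {num_frames} frames, need at least {clip_len}")
    rng = rng or random.Random()
    start = rng.randint(0, num_frames - clip_len)  # randint is inclusive on both ends
    stride = clip_len // num_samples               # the "preset sampling frequency"
    return [start + i * stride for i in range(num_samples)]


if __name__ == "__main__":
    idx = sample_clip_indices(300, rng=random.Random(0))
    print(len(idx), idx[1] - idx[0])  # 8 8
```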

Fig. 5 is a network architecture diagram of a category identification network model provided in an embodiment of the present application, and on the basis of any of the above embodiments, step 104 specifically includes:

In this embodiment, to identify the category of a sports video to be identified, the image information corresponding to that video may be input into the trained category identification network model. As shown in Fig. 5, this network performs convolution operations on the image information along three dimensions: time, height, and width. The network architecture includes a Split layer, a Fuse layer, and a Select layer. After the sports video to be identified is received, the Split layer performs full convolution operations with different kernel sizes on the image information along the three dimensions of time, height, and width. For example, it may apply both 3 × 3 and 5 × 5 convolutions, yielding three pairs of output results, one pair per dimension. The Fuse layer then sums the three pairs of results and performs global average pooling on the sum (denoted F_gp in Fig. 5). The pooled result is passed through two fully connected layers, the first reducing and the second raising the dimensionality (denoted F_fc in Fig. 5), to obtain a matrix corresponding to each of the three pairs of output results.
After these matrices are obtained, a weighting operation is performed on them according to the weight of each matrix to obtain an output vector, from which the sports category of the sports video to be identified is determined.
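The text does not define how a "3 × 3" kernel is applied along each of the three dimensions. One plausible reading, assumed here and not confirmed by the source, is that each pair of outputs comes from 2-D kernels laid on one plane of the clip: (1, k, k) for the height-width plane, (k, 1, k) for time-width, and (k, k, 1) for time-height. The sketch below implements that reading of the Split step for a single-channel clip with a naive loop convolution; random kernels stand in for learned weights.

```python
import numpy as np


def conv3d_same(x, kernel):
    """Naive 'same'-padded 3-D cross-correlation of a single-channel clip x of shape (T, H, W)."""
    kt, kh, kw = kernel.shape  # odd kernel sizes assumed, so padding is symmetric
    xp = np.pad(x, [(kt // 2, kt // 2), (kh // 2, kh // 2), (kw // 2, kw // 2)])
    out = np.empty(x.shape, dtype=float)
    T, H, W = x.shape
    for t in range(T):
        for h in range(H):
            for w in range(W):
                out[t, h, w] = np.sum(xp[t:t + kt, h:h + kh, w:w + kw] * kernel)
    return out


def view_kernel_shapes(k):
    """Assumed kernel shapes for the height-width, time-width and time-height planes."""
    return [(1, k, k), (k, 1, k), (k, k, 1)]


def split(x, kernel_sizes=(3, 5), rng=None):
    """Return six outputs: a (3x3, 5x5) pair for each of the three planes."""
    if rng is None:
        rng = np.random.default_rng(0)
    outputs = []
    for view in range(3):
        for k in kernel_sizes:
            kernel = rng.standard_normal(view_kernel_shapes(k)[view])  # stand-in for learned weights
            outputs.append(conv3d_same(x, kernel))
    return outputs
```

Every output keeps the input's (T, H, W) shape, which is what allows the Fuse layer to sum them element-wise. A real model would use multi-channel convolutions from a deep-learning framework rather than Python loops.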

Fig. 6 is a schematic structural diagram of a video classification apparatus according to a third embodiment of the present application. As shown in Fig. 6, the video classification apparatus 40 includes:

a to-be-processed data acquiring module 41, configured to acquire a to-be-processed data set from a data server, where the to-be-processed data set includes video data of a plurality of labeled sports categories;

the optimization module 42 is configured to obtain an optimization request sent by a terminal device, and perform optimization operation on the data set to be processed according to the optimization request to obtain a data set to be trained;

a training module 43, configured to train a preset model to be trained through the data set to be trained, so as to obtain a class identification network model;

the identification module 44 is configured to acquire at least one to-be-identified sports video sent by the terminal device, identify the at least one to-be-identified sports video through the category identification network model, and acquire sports category information corresponding to each to-be-identified sports video;

and the classification module 45 is configured to perform a classification operation on the at least one to-be-identified sports video according to the sports category information corresponding to each to-be-identified sports video.
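The classification operation of module 45 can be pictured as grouping videos by the category that the recognition model returned. In the minimal sketch below, the mapping from video ID to predicted label is a hypothetical interface to the identification module 44, and `classify_videos` is an illustrative name.

```python
from collections import defaultdict


def classify_videos(predictions):
    """Group video IDs by predicted sports category.

    `predictions` maps a video ID to the sports category label produced by the
    category identification network model (hypothetical interface).
    """
    groups = defaultdict(list)
    for video_id, category in predictions.items():
        groups[category].append(video_id)
    return dict(groups)
```

For example, `{"v1": "basketball", "v2": "tennis", "v3": "basketball"}` becomes `{"basketball": ["v1", "v3"], "tennis": ["v2"]}`, which a platform could then use to file each video under a category page.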

With the video classification apparatus provided in this embodiment, the preset data set to be processed is optimized to obtain the data set to be trained; the preset model to be trained is trained on the data set to be trained to obtain the category identification network model; and this model identifies the sports type of each acquired sports video to be identified, yielding its sports category information, according to which the sports videos to be identified can be classified effectively. Sports videos on live-streaming platforms or video websites can thus be identified effectively, and optimizing the data set to be processed allows the subsequent video classification to better meet the personalized needs of domestic users.

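Further, on the basis of any of the above embodiments, the optimization request includes a deletion request, and the deletion request includes a preset first sports category identifier. Accordingly, the optimization module includes:

a first optimization unit, configured to delete, from the data set to be processed, the sports video data corresponding to the preset first sports category identifier to obtain the data set to be trained.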

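Further, on the basis of any of the above embodiments, the optimization request includes a renaming request, and the renaming request includes at least one preset second sports category identifier and a target sports category identifier. Accordingly, the optimization module includes:

a second optimization unit, configured to change, according to the renaming request, the category of the sports video data corresponding to the preset second sports category identifier in the data set to be processed to the target sports category identifier to obtain the data set to be trained.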

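Further, on the basis of any of the above embodiments, the optimization request includes an addition request, and the addition request includes sports video data corresponding to at least one third sports category identifier. Accordingly, the optimization module includes:

a third optimization unit, configured to add, according to the addition request, the sports video data corresponding to the at least one third sports category identifier to the data set to be processed to obtain the data set to be trained.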

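Further, on the basis of any of the above embodiments, the optimization request includes a video quality adjustment request, and the video quality adjustment request includes at least one of a chroma parameter, a size parameter, and an angle parameter. Accordingly, the optimization module includes:

a fourth optimization unit, configured to adjust, according to the video quality adjustment request, the quality of the sports video data in the data set to be processed to obtain the data set to be trained.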
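The four optimization units can be illustrated on a toy data set held as a dictionary from sports category identifier to a list of video files. This representation, the function names, and the example labels are all assumptions made for the sketch; each function returns a new data set and leaves its input unchanged.

```python
def delete_categories(dataset, category_ids):
    """First unit: drop the videos of every category listed in the deletion request."""
    to_delete = set(category_ids)
    return {c: list(v) for c, v in dataset.items() if c not in to_delete}


def rename_categories(dataset, source_ids, target_id):
    """Second unit: relabel each source category as target_id, merging into it if it exists."""
    result = {}
    for c, videos in dataset.items():
        key = target_id if c in source_ids else c
        result.setdefault(key, []).extend(videos)
    return result


def add_videos(dataset, additions):
    """Third unit: append new labelled videos, creating categories as needed."""
    result = {c: list(v) for c, v in dataset.items()}
    for c, videos in additions.items():
        result.setdefault(c, []).extend(videos)
    return result


def adjust_quality(dataset, transform):
    """Fourth unit: apply a per-video quality transform (e.g. resizing) to every video."""
    return {c: [transform(v) for v in videos] for c, videos in dataset.items()}
```

Returning copies rather than editing in place keeps the original data set to be processed available, so that different optimization requests can be applied to it independently.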

Further, on the basis of any one of the above embodiments, the apparatus further includes:

a segment extraction module, configured to randomly extract at least one video segment from each sports video in the data set to be trained;

a frame extraction module, configured to extract, for each video segment, a preset number of pieces of image information at a preset sampling frequency and input them into the first model to be trained.

Further, on the basis of any of the above embodiments, the identification module includes:

an identification unit, configured to input the sports video to be identified into the category identification network model, where the category identification network model is configured to: perform full convolution operations with different kernel sizes on the sports video to be identified along the three dimensions of time, height, and width to obtain three pairs of output results; sum the six output results and perform global average pooling on the sum to obtain a pooled output; input the pooled output into two fully connected layers, which respectively reduce and raise the dimensionality, to obtain matrices corresponding to the three pairs of output results; weight the matrices according to the weight of each matrix to obtain an output vector; and determine the sports category information of the sports video to be identified according to the output vector.
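The Fuse and Select steps described above resemble the selective-kernel attention of SKNet, and the sketch below follows that design as an assumption; the source does not give the exact weighting rule. The six branch outputs are summed, globally average-pooled (F_gp), passed through a reducing and a raising fully connected layer (F_fc), and a softmax across branches turns the result into per-channel weights for a weighted sum. Weight shapes and names are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def fuse_and_select(branches, w_reduce, w_expand):
    """SKNet-style Fuse and Select over B branch outputs of shape (C, T, H, W).

    w_reduce: (d, C) weights of the dimensionality-reducing FC layer.
    w_expand: (B * C, d) weights of the dimensionality-raising FC layer.
    """
    stacked = np.stack(branches)                          # (B, C, T, H, W)
    summed = stacked.sum(axis=0)                          # element-wise sum of all branches
    s = summed.mean(axis=(1, 2, 3))                       # F_gp: global average pooling -> (C,)
    z = np.maximum(w_reduce @ s, 0.0)                     # F_fc, reduce + ReLU -> (d,)
    logits = (w_expand @ z).reshape(len(branches), -1)    # F_fc, raise -> (B, C)
    attn = softmax(logits, axis=0)                        # weights across branches sum to 1 per channel
    fused = (attn[:, :, None, None, None] * stacked).sum(axis=0)  # weighted sum -> (C, T, H, W)
    return fused, attn
```

Pooling `fused` over (T, H, W) gives a C-dimensional vector that a final classification layer could map to sports category scores. Because the branch weights sum to one for each channel, identical branches pass through unchanged, which is a convenient sanity check.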

Further, on the basis of any of the above embodiments, the class identification network model includes a CoSKNet model.

According to an embodiment of the present application, a server and a readable storage medium are also provided.

Fig. 7 is a block diagram of a server for implementing the video classification method according to an embodiment of the present application; the video classification apparatus of the above embodiment is a component of this server. The server is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The server may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not meant to limit implementations of the present application described and/or claimed herein.

As shown in Fig. 7, the server includes one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the server, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as desired. Likewise, multiple servers may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 7 takes one processor 501 as an example.

The memory 502 is the non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the video classification method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the video classification method provided herein.

The memory 502, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the video classification method in the embodiments of the present application (for example, the to-be-processed data acquisition module 41, the optimization module 42, the training module 43, the identification module 44, and the classification module 45 shown in Fig. 6). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, i.e., implements the video classification method in the above method embodiments.

The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the video classification server, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 502 optionally includes memory located remotely from processor 501, which may be connected to a video classification server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The server for implementing the video classification method may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other ways; Fig. 7 takes connection by a bus as an example.

The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the video classification server, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Fig. 8 is a schematic flowchart of a video classification method according to a fourth embodiment of the present application, and as shown in fig. 8, the method includes:

step 301, acquiring a data set to be processed, wherein the data set to be processed comprises a plurality of video data labeled with sports categories;

step 302, obtaining an optimization request, and performing optimization operation on the data set to be processed according to the optimization request to obtain a data set to be trained;

step 303, training a preset model to be trained through the data set to be trained to obtain a class recognition network model;

step 304, acquiring at least one sports video to be identified, identifying the at least one sports video to be identified through the category identification network model, and acquiring sports category information corresponding to each sports video to be identified.

The execution subject of this embodiment is the video classification device, which may acquire a data set to be processed that includes video data labeled with a plurality of sports categories. The data set to be processed may be obtained by the video classification device from a data server, or may be pre-stored in the video classification device; this is not limited herein. Because the data set to be processed may not match the needs and sports habits of domestic users, it also needs to be optimized after it is obtained. Specifically, an optimization request may be obtained, and the data set to be processed is optimized according to the optimization request to obtain the data set to be trained. The optimization request may be sent by a user through a terminal device and received by the video classification device. The preset model to be trained is then trained on the data set to be trained to obtain the category recognition network model, with which the category of each received sports video to be identified can be recognized.

With the video classification method provided in this embodiment, the data set to be processed is optimized, a model is trained on the optimized data, and video categories are identified with the trained model. This solves the prior-art problem of low accuracy when videos are classified according to their corresponding keywords, and achieves accurate classification of sports video data.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A video classification method, applied to a video classification device, characterized in that the method comprises the following steps:

2. The method of claim 1, wherein the optimization request comprises a deletion request, and the deletion request comprises a preset first sports category identifier;

3. The method of claim 1, wherein the optimization request comprises a renaming request, and wherein the renaming request comprises at least one preset second sports category identifier and a target sports category identifier;

4. The method of claim 1, wherein the optimization request comprises an addition request, and the addition request includes sports video data corresponding to at least one third sports category identifier;

5. The method of claim 1, wherein the optimization request comprises a video quality adjustment request, the video quality adjustment request comprising at least one of a chroma parameter, a size parameter, and an angle parameter;

6. The method according to any one of claims 1 to 5, wherein before training a preset first model to be trained through the data set to be trained, the method further comprises:

7. The method according to any one of claims 1-5, wherein said identifying said at least one sports video to be identified by said category identification network model comprises:

8. The method of any of claims 1-5, wherein the category identification network model comprises a CoSKNet model.

9. A video classification apparatus, comprising:

10. An electronic device, comprising:

at least one processor; and

a memory; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

11. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.

12. A method of video classification, comprising: