CN113657230B - Method for training news video recognition model, method for detecting video and device thereof

Info

Publication number: CN113657230B
Application number: CN202110904144.XA
Authority: CN
Inventors: 许冬容; 邓天生; 于天宝; 贠挺; 陈国庆; 林赛群
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-08-06
Filing date: 2021-08-06
Publication date: 2024-04-23
Anticipated expiration: 2041-08-06
Also published as: CN113657230A

Abstract

The disclosure discloses a method and a device for training a news video recognition model, a method and a device for detecting videos, corresponding electronic equipment and a computer-readable storage medium, and relates to the technical field of computers, in particular to the field of video detection. The model training method comprises the following steps: acquiring an image frame from a first predetermined time period of each of a plurality of video files; and training a news video recognition model based on a first sample data set, wherein the first sample data set includes image frames of the acquired image frames having a defined representation of news video features. By the method, training of the news video recognition model can be completed quickly, efficiently and at low cost, so that the detection result of the video file to be detected can be accurately determined.

Description

Method for training news video recognition model, method for detecting video and device thereof

Technical Field

The present disclosure relates to the field of computer technology, and in particular to the field of video detection. More specifically, the present disclosure relates to a method for training a news video recognition model and an apparatus thereof, a method for detecting video and an apparatus thereof, a corresponding electronic device, and a computer-readable storage medium.

Background

With the development of multimedia technology, video has become one of the main sources from which people acquire information. Among the various video resources, news video is an important video type that lets people learn quickly and promptly about events around the world.

However, given the massive volume of video resources on the network, how to effectively screen out the videos that users need has become a new problem. News videos are particularly time-sensitive. When a user's search list or push list contains a large amount of outdated news, the user receives unnecessary or even erroneous information, which seriously degrades the user experience.

Disclosure of Invention

The present disclosure provides a method for training a news video recognition model and an apparatus thereof, a method for detecting a video and an apparatus thereof, a corresponding electronic device, and a computer-readable storage medium.

According to a first aspect of the present disclosure, a method for training a news video recognition model is provided. The method may include: acquiring an image frame from a first predetermined time period of each of a plurality of video files; and training a news video recognition model based on a first sample data set, wherein the first sample data set includes image frames of the acquired image frames having a defined representation of news video features.

According to a second aspect of the present disclosure, a method for detecting video is provided. The method may include: acquiring a plurality of image frames from a second predetermined time period of a video file to be detected; inputting the plurality of image frames into a news video recognition model; and in the case where the proportion of the number of image frames identified as news video among the plurality of image frames to the total number of image frames of the plurality of image frames exceeds a predetermined threshold, identifying the video file to be detected as news video.

According to a third aspect of the present disclosure, an apparatus for training a news video-recognition model is provided. The apparatus may include: a first image acquisition module configured to acquire an image frame from a first predetermined period of time of each of a plurality of video files; and a model training module configured to train a news video recognition model based on a first sample data set, wherein the first sample data set includes image frames of the acquired image frames having a defined representation of news video features.

According to a fourth aspect of the present disclosure, there is provided an apparatus for detecting video. The apparatus may include: a second image acquisition module configured to acquire a plurality of image frames from a second predetermined period of time of a video file to be detected; an input module configured to input a plurality of image frames into a news video recognition model; and an identification module configured to identify the video file to be detected as a news video in a case where a ratio of the number of image frames identified as the news video among the plurality of image frames to the total number of image frames of the plurality of image frames exceeds a predetermined threshold.

According to a fifth aspect of the present disclosure, an electronic device is provided. The electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to the first aspect of the present disclosure.

According to a sixth aspect of the present disclosure, an electronic device is provided. The electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to the second aspect of the present disclosure.

According to a seventh aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the method according to the first aspect of the present disclosure.

According to an eighth aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the method according to the second aspect of the present disclosure.

Drawings

The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference numerals denote like or similar elements, in which:

FIG. 1 illustrates an environmental schematic for implementing an embodiment in accordance with the present disclosure;

FIG. 2 illustrates a flowchart of a method for training a news video-recognition model, in accordance with an embodiment of the present disclosure;

FIG. 3 illustrates an environmental schematic for implementing an embodiment in accordance with the present disclosure;

FIG. 4 illustrates a flow chart of a method for detecting video according to an embodiment of the present disclosure;

FIG. 5 illustrates a block diagram of an apparatus for training a news video-recognition model, in accordance with an embodiment of the present disclosure;

FIG. 6 shows a block diagram of an apparatus for detecting video according to an embodiment of the disclosure;

FIG. 7 illustrates a block diagram of an apparatus capable of implementing various embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

In describing embodiments of the present disclosure, the term "comprising" and its like should be taken to be open-ended, i.e., including, but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like, may refer to different or the same object. Other explicit and implicit definitions are also possible below.

With the development of network technology, the way in which people acquire information is no longer limited to traditional text and pictures. More and more people acquire information of interest from the network through video files. News video is a special type of video that provides highly timely information that attracts wide attention. Correctly identifying news videos makes it possible to manage and retrieve large volumes of news video material conveniently and effectively.

Conventional schemes, however, typically identify news videos by template matching. A template-matching scheme cannot cover all template features, so its generality and recognition accuracy are very limited. In addition, if each template feature is matched individually, the recognition accuracy can be improved to a certain extent, but the time consumption and recognition complexity of the system increase significantly.

In order to solve at least one of the above problems, the present disclosure provides a method for detecting video, which can quickly and efficiently implement detection and identification of news video at low cost. The detection method performs news video recognition by using a trained model (e.g., a neural network model). To this end, the present disclosure also provides a method for training a news video recognition model.

According to an embodiment of the present disclosure, a model training scheme is presented. In this scheme, image frames that have a news video feature representation, taken from a large amount of video material, may be marked as a first sample (e.g., as a positive sample), and a news video recognition model may be trained based on the first sample. Specifically, the training process of the news video recognition model of the present disclosure may include: acquiring an image frame from a first predetermined time period of each of a plurality of video files; and training the news video recognition model based on a first sample data set, wherein the first sample data set includes image frames of the acquired image frames having a defined representation of news video features. Using a news video recognition model trained in this manner, video files can be detected to screen out news-related videos, so that the news videos can undergo subsequent processing. Efficient and accurate model training and video detection are thereby realized.

In addition, to optimize the model training scheme, image frames in the bulk video material that do not have a representation of the news video features may also be labeled as a second sample (e.g., as a negative sample), and a news video recognition model may be trained based on the second sample. Therefore, the training speed and the detection accuracy of the model can be increased.

Fig. 1 shows an environmental schematic for implementing an embodiment according to the present disclosure. The example environment 100 includes a computing device 104. The computing device 104 may train the news video-detection model 106 with the training image 102. The news video detection model 106 may be used to identify news videos.

Computing device 104 includes, but is not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, virtual machines or other computing devices in a cloud platform, and the like.

The computing device 104 receives the training image 102. The training image 102 comes from a large amount of video material. In some embodiments, the training image 102 includes a first sample data set, wherein each image in the first sample data set includes a predefined news video feature representation. For example, the news video feature representation may be a specific news program background, one or more news announcers, or a specific news caption (e.g., a caption bar with white text on a blue background). Of course, the news video feature representation may also include other similar items, which are not limited herein.

In generating the training image 102, images may be taken from a large number of videos of different types. Illustratively, frames may be extracted from the period 5-15 seconds into each video, for example at 1 frame per second, to generate 10 images. One or more images are then randomly selected from the 10 generated images and annotated according to the news video feature representation described above. Optionally, the images may be scaled to a uniform size and converted to a uniform format, for example 380×380 images in RGB format.
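
The sampling step described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function names `sample_timestamps` and `pick_for_annotation`, the half-open window convention (capture times 5 through 14, which yields the 10 images mentioned above), and the fixed random seed are assumptions made for the example. Video decoding and resizing are omitted.

```python
import random

def sample_timestamps(start_s=5, end_s=15, step_s=1):
    """Return capture times in seconds, one per step, in [start_s, end_s)."""
    # Assumption: the window is half-open, so 5-15 s at 1 frame/s gives 10 times.
    return list(range(start_s, end_s, step_s))

def pick_for_annotation(timestamps, k=1, seed=None):
    """Randomly choose k capture times whose frames will be annotated."""
    rng = random.Random(seed)  # seeded only to make the example repeatable
    return sorted(rng.sample(timestamps, k))

timestamps = sample_timestamps()                      # 5, 6, ..., 14
chosen = pick_for_annotation(timestamps, k=1, seed=0)
print(len(timestamps), chosen)
```

Selecting only one or a few frames per video keeps the annotation workload small while still drawing samples from many different videos.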

The detection result 108 may be obtained by inputting the annotated image as a training image 102 into a news video detection model 106 in a computing device 104. By verifying the detection result 108, the degree of model convergence can be deduced, and thus whether training of the news video detection model 106 has been completed can be confirmed.

Fig. 2 depicts a flowchart of a method for training a news video-recognition model in accordance with an embodiment of the present disclosure. The method may be implemented, for example, by computing device 104 in fig. 1 or any other suitable device. The news video recognition model may be a neural network model, such as a convolutional neural network model.

At block 202, the computing device 104 obtains image frames of a plurality of video files that are located in a predetermined time period.

In some embodiments, a plurality of image frames are acquired uniformly from the predetermined time period of each video file. In some embodiments, the predetermined time period is 5-15 seconds after the video begins. Statistically, the image frames in seconds 5-15 of a news video are more likely to contain features that characterize news videos, such as an anchor frame containing a news announcer or a caption frame containing a news caption. In this way, training images for training the model can be obtained more efficiently.

In some embodiments, one image frame is captured every second from the predetermined time period of each video file, one image frame is randomly selected from the captured image frames, and the selected image frame is then scaled to a predetermined size. The frame rate of current online video is typically 25-30 frames per second, and the image frames within one second typically differ only subtly. To increase the diversity of the samples, one image frame may be captured per second and one of the captured frames selected at random. In addition, adjusting the image frames to the same size and format facilitates model training.
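
To make the frame-rate reasoning concrete, the sketch below maps one capture per second onto frame indices and scales a frame to a fixed size. It is an illustration under stated assumptions: `frame_indices` and `resize_nearest` are hypothetical helpers, nearest-neighbour interpolation is chosen only for brevity (the disclosure does not specify a scaling method), and the 2×2 nested list stands in for a decoded RGB frame.

```python
def frame_indices(fps, start_s=5, end_s=15):
    """Index of the frame at the start of each whole second in [start_s, end_s)."""
    return [round(t * fps) for t in range(start_s, end_s)]

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an image stored as rows of RGB tuples."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

# At 25 frames/s the 10-second window holds 250 frames; only 10 are kept.
indices = frame_indices(fps=25)

# Hypothetical 2x2 RGB stand-in for a decoded frame, scaled to 380x380.
tiny = [[(0, 0, 0), (255, 255, 255)],
        [(255, 0, 0), (0, 0, 255)]]
scaled = resize_nearest(tiny, 380, 380)
print(indices[:3], len(scaled), len(scaled[0]))
```

In practice an image library would normally handle decoding and scaling; the point here is only that sampling once per second skips the 24-29 near-duplicate frames in between.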

At block 204, the computing device 104 trains the news video recognition model 106 based on a first sample data set composed of particular image frames, where the particular image frames all have a predefined news video feature representation.

In some embodiments, the news feature representations may include one or more of news program background, news announcers, news captions. In this way, the training image 102 can be accurately labeled.

In some embodiments, the news video-recognition model 106 is trained also based on an additional second sample dataset, wherein image frames in the second sample dataset have feature representations that are different from the news video feature representations described above.

In the case of training a binary classification model, the first sample data set may be used as positive samples representing news videos, and the second sample data set may be used as negative samples, both being input into the news video recognition model 106 for training. In this case, the image frames in the second sample data set are image frames that do not have the above-described news video feature representation.

In the case of training a multi-class classification model, the first sample data set may be used as samples representing news videos, and the second sample data set may be used as samples representing other types of videos, such as movie videos, sports videos, game videos, and the like. In this case, the image frames in the second sample data set are image frames having respective feature representations that are different from the news video feature representation.
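
The difference between the two labeling schemes can be sketched as follows. The helper names, the file names, and the class list are hypothetical and serve only to illustrate how the first and second sample data sets might be turned into labeled training pairs.

```python
def label_binary(first_set, second_set):
    """Label news frames 1 (positive) and all other frames 0 (negative)."""
    return [(frame, 1) for frame in first_set] + [(frame, 0) for frame in second_set]

def label_multiclass(frames_by_category, classes):
    """Label each frame with the index of its video category in `classes`."""
    index = {name: i for i, name in enumerate(classes)}
    return [
        (frame, index[category])
        for category, frames in frames_by_category.items()
        for frame in frames
    ]

# Hypothetical annotated frames.
binary = label_binary(["news_01.jpg"], ["movie_01.jpg", "game_01.jpg"])
multi = label_multiclass(
    {"news": ["news_01.jpg"], "movie": ["movie_01.jpg"], "game": ["game_01.jpg"]},
    classes=["news", "movie", "sports", "game"],
)
print(binary)
print(multi)
```

In the binary case every non-news frame shares one negative label, whereas the multi-class case keeps a separate label per video type.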

In some embodiments, additional image frames obtained from additional video files identified as news videos by the news video identification model 106 may be added to the first sample data set for further training of the news video identification model 106.

In order to obtain a model with high detection accuracy, a large amount of training material needs to be provided to the model. However, obtaining and labeling relevant training material is costly and time-consuming. To solve this problem, training may be performed in batches. Specifically, training images 102 from a relatively small number of videos may be provided first, and the news video recognition model 106 may be trained on them to obtain a first version of the model. The first version of the model is then used to detect other videos, thereby screening out video files that are highly likely to be news videos. Finally, image frames are captured from the screened video files, annotated, and used as new training images 102 to further iteratively train the news video recognition model 106. In this way, image frames with a news video feature representation can be acquired more efficiently, so that the number of samples grows more rapidly.
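
The batch training procedure above can be sketched as a loop. This is a structural illustration only: `bootstrap_rounds` and its callable parameters `train`, `screen` and `annotate` are hypothetical placeholders for model training, model-based screening and manual annotation, and the toy stand-ins in the usage lines do not perform real learning.

```python
def bootstrap_rounds(seed_samples, unlabeled_videos, train, screen, annotate, rounds=2):
    """Grow the sample set by screening unlabeled videos with the current model."""
    samples = list(seed_samples)
    remaining = list(unlabeled_videos)
    model = train(samples)                       # first version of the model
    for _ in range(rounds):
        likely_news = [v for v in remaining if screen(model, v)]
        if not likely_news:
            break                                # nothing new to add
        remaining = [v for v in remaining if v not in likely_news]
        samples.extend(annotate(v) for v in likely_news)  # annotation step
        model = train(samples)                   # retrain on the enlarged set
    return model, samples

# Toy stand-ins (assumptions): the "model" just records its training set size.
videos = [{"id": "v1", "has_caption": True}, {"id": "v2", "has_caption": False}]
model, samples = bootstrap_rounds(
    seed_samples=[("seed_frame", 1)],
    unlabeled_videos=videos,
    train=lambda s: {"trained_on": len(s)},
    screen=lambda m, v: v["has_caption"],
    annotate=lambda v: (v["id"] + "_frame", 1),
)
print(model, samples)
```

Because the screening model concentrates annotation effort on videos that are likely to be news, each round should yield more usable positive frames than annotating randomly chosen videos.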

In some embodiments, additional image frames obtained from additional video files, which are predetermined news videos and are identified as non-news videos by the news video recognition model 106, may be added to the first sample data set for further training of the news video recognition model 106.

In some embodiments, additional image frames obtained from additional video files that are predetermined non-news videos and are identified as news videos by the news video recognition model 106 may be added to the second sample data set for further training of the news video recognition model 106.

Even after the model has been trained extensively, some samples remain difficult to identify, and the model may produce false detection results for them. In this regard, embodiments of the present disclosure re-annotate the difficult-to-identify samples that led to false detection results and add them to the corresponding sample data set to further iteratively train the news video recognition model 106, thereby enhancing its ability to distinguish such difficult-to-identify video frames. In this way, the recognition accuracy of the model can be further improved.
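
The routing of misclassified samples described in the preceding paragraphs can be sketched as follows. The function name `route_hard_examples` and the tuple layout are assumptions made for the illustration; frames from missed news videos go to the first sample data set, and frames from non-news videos mistaken for news go to the second.

```python
def route_hard_examples(results, first_set, second_set):
    """Add misclassified frames to the data set matching their true label.

    results: iterable of (frame, is_news, predicted_news) tuples.
    """
    for frame, is_news, predicted_news in results:
        if is_news and not predicted_news:
            first_set.append(frame)      # news video missed by the model
        elif predicted_news and not is_news:
            second_set.append(frame)     # non-news video mistaken for news
    return first_set, second_set

# Hypothetical verification results.
first, second = route_hard_examples(
    [("a.jpg", True, False), ("b.jpg", False, True), ("c.jpg", True, True)],
    first_set=[],
    second_set=[],
)
print(first, second)
```

Correctly classified frames such as `c.jpg` are left out, so the additional training focuses on the cases the model currently gets wrong.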

Fig. 3 shows an environmental schematic for implementing an embodiment according to the present disclosure. The example environment 300 includes a computing device 304. The computing device 304 may receive the video to be detected 302, acquire a plurality of image frames 310 therefrom, and input the acquired image frames 310 into the news video-detection model 306 to obtain detection results 308. The computing device 304 finally outputs the detection result 308, i.e., whether the video to be detected 302 is a news video.

Computing device 304 includes, but is not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, virtual machines or other computing devices in a cloud platform, and the like.

The news video detection model 306 herein is, for example, a trained detection model obtained using the model training environment and the training method described above.

Fig. 4 depicts a flowchart of a method for detecting video according to an embodiment of the present disclosure. The method may be implemented, for example, by computing device 304 in fig. 3 or any other suitable device.

At block 402, the computing device 304 acquires a plurality of image frames from a predetermined period of time of the video file 302 to be detected.

In some embodiments, the plurality of image frames are acquired uniformly from a predetermined period of time of the video file 302 to be detected. In some embodiments, the predetermined period of time is 3-13 seconds after the start of the video file 302 to be detected. During this time period, the video file 302 to be detected includes, with a higher probability, a feature representation for detection, in particular a feature representation embodying a news video. Thus, the accuracy of the detection result can be improved.

In some embodiments, one image frame is taken every second from a predetermined time period of the video file 302 to be detected, and the image frames are scaled to a predetermined size. In this way, video detection can be facilitated and the detection accuracy can be improved through the image frames with reasonable intervals and the proper size.

At block 404, the computing device 304 inputs the acquired plurality of image frames to the news video-recognition model 306.

At block 406, each image frame is detected separately by the news video recognition model 306, and an intermediate detection result is obtained for each image frame. The video file 302 to be detected is identified as a news video in the case where the ratio of the number of image frames identified as news video to the total number of image frames exceeds a predetermined threshold.

Assume, for example, that the threshold is set to 0.6. If a total of 10 image frames are taken from the video file 302 to be detected and input separately into the news video recognition model 306 to obtain 10 intermediate detection results, then when more than 6 of the intermediate detection results indicate a news video (i.e., when the proportion of frames recognized as news video among the 10 image frames is greater than or equal to 0.6), the computing device 304 outputs a final detection result that the video file 302 to be detected is a news video. It should be appreciated that the threshold may be adjusted based on actual model performance and the characteristics of the data being detected.
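
The frame-level voting rule can be sketched in a few lines. The function name `is_news_video` is hypothetical. The strict comparison follows the "exceeds a predetermined threshold" wording; since the worked example also mentions a ratio greater than or equal to 0.6, the treatment of the exact boundary is an assumption, and `>=` could be substituted.

```python
def is_news_video(frame_is_news, threshold=0.6):
    """Decide at video level from per-frame results (one boolean per frame)."""
    if not frame_is_news:
        raise ValueError("at least one frame result is required")
    ratio = sum(frame_is_news) / len(frame_is_news)
    return ratio > threshold  # assumption: strict "exceeds"; see lead-in

seven_of_ten = [True] * 7 + [False] * 3   # ratio 0.7
six_of_ten = [True] * 6 + [False] * 4     # ratio 0.6, exactly at the threshold
print(is_news_video(seven_of_ten), is_news_video(six_of_ten))
```

Aggregating over several frames makes the video-level decision tolerant of a few misclassified frames, such as cutaways that show no announcer or caption.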

With the video detection method according to the embodiments of the present disclosure, news videos can be detected more efficiently based on the model, without being limited to matching against specific templates and without matching each template feature individually. Detection time and complexity can thereby be greatly reduced. By accurately identifying news videos, more accurate subsequent processing can be performed on such videos. For example, the recommendation and distribution of news videos that are no longer timely can be reduced, thereby improving the user experience.

Fig. 5 shows a block diagram of an apparatus for training a news video-recognition model, in accordance with an embodiment of the present disclosure. As shown in fig. 5, the apparatus 500 may include: a first image acquisition module 502 configured to acquire an image frame from a first predetermined period of time for each of a plurality of video files; and a model training module 504 configured to train a news video recognition model based on a first sample data set, wherein the first sample data set includes image frames of the acquired image frames having a defined representation of news video features.

In some embodiments, the model training module 504 may be further configured to train the news video recognition model based on a second sample data set, wherein the second sample data set includes image frames of the acquired image frames having a feature representation that is different from the news video feature representation.

In some embodiments, the apparatus 500 may further include a first sample expansion module configured to add additional image frames obtained from additional video files to the first sample data set for further training of the news video recognition model, wherein the additional video files are recognized as news videos by the news video recognition model.

In some embodiments, the apparatus 500 may further include a second sample expansion module configured to add additional image frames obtained from additional video files to the first sample data set for further training of the news video recognition model, wherein the additional video files are predetermined news videos and are recognized as non-news videos by the news video recognition model.

In some embodiments, the apparatus 500 may further include a third sample expansion module configured to add additional image frames obtained from additional video files to the second sample data set for further training of the news video recognition model, wherein the additional video files are predetermined non-news videos and are recognized as news videos by the news video recognition model.

In some embodiments, the first image acquisition module 502 may be further configured to uniformly acquire a plurality of image frames from the first predetermined period of time.

In some embodiments, the first predetermined period of time is the first 5-15 seconds of the video file.

In some embodiments, the first image acquisition module 502 may be further configured to: capture one image frame per second from the first predetermined time period of each video file to obtain a plurality of image frames corresponding to the video file; randomly select one image frame from the plurality of image frames; and scale the selected image frame to a predetermined size.

In some embodiments, the news video feature representation is set to include at least one of: news program background; a news reader; news captions.

Fig. 6 shows a block diagram of an apparatus for detecting video according to an embodiment of the disclosure. As shown in fig. 6, the apparatus 600 may include: a second image acquisition module 602 configured to acquire a plurality of image frames from a second predetermined period of time of a video file to be detected; an input module 604 configured to input a plurality of image frames into a news video recognition model; and an identification module 606 configured to identify the video file to be detected as a news video in a case where a ratio of the number of image frames identified as the news video among the plurality of image frames to the total number of image frames of the plurality of image frames exceeds a predetermined threshold.

In some embodiments, the second image acquisition module 602 may be further configured to uniformly acquire a plurality of image frames from the second predetermined period of time.

In some embodiments, the second predetermined period of time may be set to the first 3-13 seconds of the video file to be detected.

In some embodiments, the second image acquisition module 602 may be further configured to: capture one image frame per second from the second predetermined time period of the video file to be detected to acquire a plurality of image frames; and scale the plurality of image frames to a predetermined size.

In some embodiments, the news video recognition model is trained by the apparatus 500 for training a news video recognition model according to the present disclosure.

According to embodiments of the present disclosure, a corresponding electronic device and a corresponding computer-readable storage medium are also provided.

Fig. 7 illustrates a block diagram of a computing device 700 capable of implementing various embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as processes 300, 400, 500. For example, in some embodiments, the processes 300, 400, 500 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into RAM 703 and executed by the computing unit 701, one or more of the steps of the processes 300, 400, 500 described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the processes 300, 400, 500 by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems On Chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A method for training a news video recognition model, comprising:

acquiring an image frame from a first predetermined period of time of each of a plurality of video files, wherein acquiring the image frame comprises:

intercepting one image frame from each video file in each second of the first predetermined period of time to acquire a plurality of image frames corresponding to each video file;

randomly selecting one or more image frames from the plurality of image frames;

scaling the selected one or more image frames to a predetermined size; and

unifying the selected one or more image frames into a predetermined format; and

training the news video recognition model in batches based on a first sample data set, wherein the first sample data set includes image frames, among the acquired image frames, that have a defined news video feature representation, wherein training the news video recognition model in batches includes:

training a first version of the news video recognition model based on the first sample data set;

detecting other video files by using the first version of the news video recognition model to select those of the other video files that are news videos; and

adding further image frames obtained from the selected other video files to the first sample data set for further training of the news video recognition model.

2. The method of claim 1, further comprising:

training the news video recognition model based on a second sample data set, wherein the second sample data set includes image frames, among the acquired image frames, that have a feature representation different from the news video feature representation.

3. The method of claim 1, further comprising:

adding further image frames obtained from further video files to the first sample data set for further training of the news video recognition model,

wherein the further video files are predetermined news videos that are identified as non-news videos by the news video recognition model.

4. The method of claim 2, further comprising:

adding further image frames obtained from further video files to the second sample data set for further training of the news video recognition model,

wherein the further video files are predetermined non-news videos that are identified as news videos by the news video recognition model.

5. The method of claim 1, wherein the first predetermined period of time is the first 5-15 seconds of a video file.

6. The method of claim 1 or 2, wherein the news video feature representation comprises at least one of: a news program background; a news reader; news captions.

7. A method of detecting video using the news video recognition model trained in accordance with claim 1, comprising:

acquiring a plurality of image frames from a second predetermined period of time of a video file to be detected;

inputting the plurality of image frames into the news video recognition model; and

identifying the video file to be detected as a news video in the case that the ratio of the number of image frames identified as news video among the plurality of image frames to the total number of image frames in the plurality of image frames exceeds a predetermined threshold.

8. The method of claim 7, wherein acquiring a plurality of image frames from a second predetermined period of time of a video file to be detected further comprises:

uniformly acquiring the plurality of image frames from the second predetermined period of time.

9. The method of claim 7, wherein the second predetermined period of time is the first 3-13 seconds of the video file to be detected.

10. The method of claim 8, wherein uniformly acquiring a plurality of image frames from the second predetermined period of time further comprises:

intercepting one image frame every second from the second predetermined period of time of the video file to be detected so as to acquire the plurality of image frames; and

scaling the plurality of image frames to a predetermined size.

11. An apparatus for training a news video recognition model, comprising:

a first image acquisition module configured to acquire an image frame from a first predetermined period of time of each of a plurality of video files, the first image acquisition module further configured to:

randomly select one or more image frames from the plurality of image frames;

scale the selected one or more image frames to a predetermined size; and

unify the selected one or more image frames into a predetermined format; and

a model training module configured to train the news video recognition model based on a first sample data set, wherein the first sample data set includes image frames, among the acquired image frames, that have a defined news video feature representation, the model training module further configured to:

12. The apparatus of claim 11, wherein the model training module is further configured to:

13. The apparatus of claim 11, further comprising a second sample expansion module configured to:

14. The apparatus of claim 12, further comprising a third sample expansion module configured to:

15. The apparatus of claim 11, wherein the first predetermined period of time is the first 5-15 seconds of a video file.

16. The apparatus of claim 11 or 12, wherein the news video feature representation is set to include at least one of: a news program background; a news reader; news captions.

17. An apparatus for detecting video using the apparatus for training a news video recognition model of claim 11, comprising:

a second image acquisition module configured to acquire a plurality of image frames from a second predetermined period of time of a video file to be detected;

an input module configured to input the plurality of image frames into the news video recognition model; and

an identification module configured to identify the video file to be detected as a news video in a case where the ratio of the number of image frames identified as news video among the plurality of image frames to the total number of image frames exceeds a predetermined threshold.

18. The apparatus of claim 17, wherein the second image acquisition module is further configured to:

19. The apparatus of claim 17, wherein the second predetermined period of time is set to the first 3-13 seconds of the video file to be detected.

20. The apparatus of claim 18, wherein the second image acquisition module is further configured to:

scale the plurality of image frames to a predetermined size.

21. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.

22. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 7 to 10.

23. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 6.

24. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 7 to 10.
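
As a non-limiting illustration only, the sampling, decision, and sample-expansion logic recited in claims 3, 4, 7, and 10 may be sketched in Python as follows. The sketch is not the implementation of the present disclosure. The function names `sample_timestamps`, `is_news_video`, and `route_hard_example` are hypothetical, the threshold value 0.5 is an assumed example of the "predetermined threshold", the treatment of a time period as a half-open interval of whole seconds is an assumption, and the per-frame labels are assumed to have been produced already by a trained news video recognition model.

```python
from typing import List, Optional, Sequence


def sample_timestamps(start_s: int, end_s: int) -> List[int]:
    """Return one timestamp per second in [start_s, end_s).

    Mirrors "intercepting one image frame every second" from a
    predetermined period of time; the half-open interval is an assumption.
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    return list(range(start_s, end_s))


def is_news_video(frame_labels: Sequence[bool], threshold: float = 0.5) -> bool:
    """Decide at the video level from per-frame decisions.

    frame_labels[i] is True when frame i was identified as news video.
    The video is identified as news when the ratio of such frames to all
    frames exceeds the threshold (0.5 is an assumed example value).
    """
    if not frame_labels:
        return False  # assumption: no frames means no positive decision
    ratio = sum(frame_labels) / len(frame_labels)
    return ratio > threshold


def route_hard_example(is_known_news: bool, predicted_news: bool) -> Optional[str]:
    """Choose the sample data set that receives a misclassified video's frames.

    A predetermined news video identified as non-news goes to the first
    sample data set; a predetermined non-news video identified as news goes
    to the second sample data set; correct predictions are not routed.
    """
    if is_known_news and not predicted_news:
        return "first"
    if not is_known_news and predicted_news:
        return "second"
    return None


if __name__ == "__main__":
    stamps = sample_timestamps(3, 13)          # assumed detection window
    labels = [True] * 7 + [False] * 3          # hypothetical model outputs
    print(len(stamps), is_news_video(labels))  # frames sampled, decision
    print(route_hard_example(True, False))     # missed news video
```

Because the comparison is strict, a video in which exactly half of the frames are identified as news video is not identified as news video under the assumed threshold of 0.5, which is consistent with the "exceeds a predetermined threshold" wording of claim 7.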