CN112818737B - Video identification method, device, storage medium and terminal


Info

Publication number: CN112818737B
Application number: CN202011507838.1A
Authority: CN (China)
Prior art keywords: video, score, calculating, image, target image
Inventor: 朱帅
Assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd
Legal status: Active (granted)
Other versions: CN112818737A (application publication)


Classifications

    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06Q50/205: Education; Education administration or guidance
    • G06V10/267: Image preprocessing; Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/44: Extraction of image or video features; Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The embodiment of the application discloses a video identification method, a device, a storage medium and a terminal, wherein the method comprises the following steps: acquiring a video to be processed and determining at least one frame of image in the video; determining a definition score for each frame of the image, and calculating a target definition score of the video from the definition scores; and identifying whether the video is clear according to a comparison result of the target definition score and a preset definition threshold. By adopting the embodiment of the application, an objective judgment can be made by calculating the definition score without relying on manual judgment, providing a brand-new objective evaluation standard for assessing the viewing quality of teaching videos, while also saving time cost and improving review efficiency.

Description

Video identification method, device, storage medium and terminal
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video identification method, a video identification device, a storage medium, and a terminal.
Background
In today's era of education informatization, it has become common for teachers to record course content in video form for students to study and reuse; the teaching formats of live online lessons and recorded lessons are becoming more widespread, which makes high-quality video pictures particularly important. The visual quality of the video picture directly influences the teaching quality, and a high-quality picture helps students draw knowledge from a video course more effectively.
At present, the definition of teaching video pictures on online learning platforms (such as the Highway college) is judged manually. This judgment depends on the background experience of the reviewers, and different individuals grasp picture definition differently, so subjective review results may mean that the visual quality of video pictures assessed by different individuals cannot be judged under a unified standard; meanwhile, relying on manual review increases time cost, and review efficiency is low.
Disclosure of Invention
The embodiment of the application provides a video identification method, a device, a storage medium and a terminal, which can make an objective judgment by calculating a definition score without relying on manual judgment, provide a brand-new objective evaluation standard for evaluating the visual quality of teaching videos, and improve review efficiency. The technical solution is as follows:
in a first aspect, an embodiment of the present application provides a video identification method, where the method includes:
acquiring a video to be processed, and determining at least one frame of image in the video;
determining a definition score of each frame of the image, and calculating a target definition score of the video according to each definition score;
and identifying whether the video is clear according to a comparison result of the target definition score and a preset definition threshold.
In a second aspect, embodiments of the present application provide a video recognition apparatus, the apparatus including:
the image determining module is used for acquiring a video to be processed and determining at least one frame of image in the video;
the score calculating module is used for determining the definition score of each frame of the image and calculating the target definition score of the video according to each definition score;
and the video identification module is used for identifying whether the video is clear according to a comparison result of the target definition score and a preset definition threshold.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a fourth aspect, embodiments of the present application provide a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by some embodiments of the present application has the beneficial effects that at least includes:
obtaining a video to be processed, determining at least one frame of image in the video, determining the definition score of each frame of image, calculating the target definition score of the video according to the definition scores, and then identifying whether the video is clear according to the comparison result of the target definition score and a preset definition threshold. The method can make an objective judgment by calculating the definition score without relying on manual judgment, provides a brand-new objective evaluation standard for evaluating the viewing quality of teaching videos, and can save time cost and improve review efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a terminal provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an operating system and user space provided in an embodiment of the present application;
FIG. 3 is an architecture diagram of the android operating system of FIG. 1;
fig. 4 is a schematic flow chart of a video recognition method according to an embodiment of the present application;
FIG. 5 is an exemplary schematic diagram of a manner of setting a sharpness threshold according to an embodiment of the present application;
fig. 6 is a flowchart of another video recognition method according to an embodiment of the present application;
fig. 7 is a flowchart of another video recognition method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a video recognition device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a score calculating module according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an information calculating unit provided in the embodiment of the present application;
fig. 11 is a schematic structural diagram of a video recognition device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings; it is evident that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art without inventive effort based on the present disclosure fall within the scope of the present disclosure.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be understood that the terms "comprise" and "have," and any variations thereof, are intended to cover non-exclusive inclusions, unless otherwise specifically limited and defined. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements, but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in the specific context. Furthermore, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the surrounding objects.
Referring to fig. 1, a block diagram of a terminal according to an exemplary embodiment of the present application is shown. The terminal can be terminal equipment such as an interactive intelligent tablet, a mobile phone, a computer, a tablet and the like. A terminal in the present application may include one or more of the following: processor 110, memory 120, input device 130, output device 140, and bus 150. The processor 110, the memory 120, the input device 130, and the output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 connects various parts within the terminal using various interfaces and lines, and performs various functions of the terminal 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by invoking data stored in the memory 120. Alternatively, the processor 110 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 110 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and so on; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable storage medium. Memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a stored-program area and a stored-data area. The stored-program area may store instructions for implementing an operating system, which may be an Android system (including systems deeply developed on the basis of the Android system), an IOS system developed by Apple (including systems deeply developed on the basis of the IOS system), or another system; instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.); instructions for implementing the various method embodiments described below; and the like. The stored-data area may store data created by the terminal in use, such as phonebooks, audio-video data, chat-record data, etc.
Referring to FIG. 2, the memory 120 may be divided into an operating system space in which the operating system runs and a user space in which native and third party applications run. In order to ensure that different third party application programs can achieve better operation effects, the operating system allocates corresponding system resources for the different third party application programs. However, the requirements of different application scenarios in the same third party application program on system resources are different, for example, under the local resource loading scenario, the third party application program has higher requirement on the disk reading speed; in the animation rendering scene, the third party application program has higher requirements on the GPU performance. The operating system and the third party application program are mutually independent, and the operating system often cannot timely sense the current application scene of the third party application program, so that the operating system cannot perform targeted system resource adaptation according to the specific application scene of the third party application program.
In order to enable the operating system to distinguish the specific application scene of a third-party application, a data communication channel between the third-party application and the operating system needs to be opened, so that the operating system can obtain the current scene information of the third-party application at any time and adapt system resources in a targeted way based on the current scene.
Taking the Android system as an example of the operating system, as shown in fig. 3, the programs and data stored in the memory 120 may be divided into a Linux kernel layer 320, a system runtime library layer 340, an application framework layer 360 and an application layer 380, where the Linux kernel layer 320, the system runtime library layer 340 and the application framework layer 360 belong to the operating system space, and the application layer 380 belongs to the user space. The Linux kernel layer 320 provides the underlying drivers for the various hardware of the terminal, such as display drivers, audio drivers, camera drivers, Bluetooth drivers, Wi-Fi drivers, power management, etc. The system runtime library layer 340 provides the main feature support for the Android system through C/C++ libraries. For example, the SQLite library provides support for databases, the OpenGL/ES library provides support for 3D graphics, the WebKit library provides support for browser kernels, and so on. The system runtime library layer 340 also provides the Android runtime library (Android Runtime), which mainly provides core libraries allowing developers to write Android applications in the Java language. The application framework layer 360 provides the various APIs that may be used when building applications, which developers can also use to build their own applications, for example activity management, window management, view management, notification management, content providers, package management, call management, resource management, and location management. At least one application runs in the application layer 380; these may be native applications of the operating system, such as a contacts program, a short message program, a clock program, a camera application, etc., or third-party applications developed by third-party developers, such as game applications, instant messaging programs, photo beautification programs, text translation programs, etc.
The input device 130 is configured to receive input instructions or data, and the input device 130 includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is used to output instructions or data, and the output device 140 includes, but is not limited to, a display device, a speaker, and the like. In one example, the input device 130 and the output device 140 may be combined, and the input device 130 and the output device 140 are touch display screens.
The touch display screen may be designed as a full screen, a curved screen, or a specially-shaped screen. The touch display screen can also be designed as a combination of a full screen and a curved screen, or a combination of a specially-shaped screen and a curved screen, which is not limited in the embodiments of the application.
In addition, those skilled in the art will appreciate that the configuration of the terminal illustrated in the above figures does not constitute a limitation of the terminal; the terminal may include more or fewer components than illustrated, may combine certain components, or may use a different arrangement of components. For example, the terminal further includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (WiFi) module, a power supply, and a Bluetooth module, which are not described here.
In the embodiment of the present application, the execution subject of each step may be the terminal described above. Optionally, the execution subject of each step is an operating system of the terminal. The operating system may be an android system, an IOS system, or other operating systems, which embodiments of the present application do not limit.
In the terminal shown in fig. 1, the processor 110 may be configured to invoke an application program stored in the memory 120 and specifically execute the video recognition method according to the embodiment of the present application.
The present application is described in detail below in connection with specific embodiments.
In one embodiment, as shown in fig. 4, a video identification method is proposed. The method can be implemented by a computer program and can run on a video identification apparatus based on the von Neumann architecture. The computer program may be integrated in an application or may run as a stand-alone tool application.
Specifically, the video identification method comprises the following steps:
s101, acquiring a video to be processed, and determining at least one frame of image in the video;
It can be understood that the video to be processed may be a video of any content, such as a recorded teaching video, a movie from a website, a chat video received through instant messaging, an MV in a player, and the like.
The video to be processed can be obtained by downloading from the network, recording with video equipment, receiving from other users, or being made by the user.
The video to be processed includes a plurality of frames of images, and at least one frame of image may be all or part of the frames of the video, such as a key frame. The key frame corresponds to the original picture in the two-dimensional animation, and refers to the frame where the key action in the movement or change of the character or the object is located.
It should be noted that, in general, consecutive frames differ little in image content, and when the video is large there are many image frames to process, which reduces processing efficiency; therefore, a subset of frames with large differences is usually selected for processing.
In a feasible implementation, frame extraction is performed on the video at a set frame number to obtain the at least one frame of image; or the number of frames to process is calculated from the processing efficiency and processing-time requirements of the processor, and frames are then extracted according to a preset rule; or frames are extracted at a set interval of frames.
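As an illustration only, the following sketch extracts evenly spaced grayscale frames; the OpenCV dependency and the helper name sample_frames and its parameters are assumptions of this example, not part of the scheme.

```python
import cv2

def sample_frames(video_path, num_frames=9):
    """Extract up to num_frames evenly spaced grayscale frames from a video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // num_frames, 1)  # fixed interval between sampled frames
    frames = []
    for idx in range(0, total, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        if len(frames) == num_frames:
            break
    cap.release()
    return frames
```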
S102, determining a definition score of each frame of the image, and calculating a target definition score of the video according to each definition score;
Definition (sharpness) refers to how clear an image appears macroscopically to the human eye; it is a person's subjective perception of the image, produced jointly by the objective performance of the imaging system and equipment.
The definition score of each acquired frame is calculated separately, and the average of the per-frame definition scores is then calculated to obtain the target definition score of the video.
For calculating image definition, common approaches include the mean-square error method, peak signal-to-noise ratio (PSNR), human visual system (HVS) models, the point sharpness algorithm, spatial-domain variance, the frequency-domain modulation transfer function (MTF), and so on. In the embodiments of the application, the definition score of the image is calculated based on Sobel-operator pixel gradients and Gaussian noise; the Sobel gradient score alone can also be used to roughly estimate the definition of a video picture.
In a feasible implementation, the average of the definition scores of all frames is calculated and taken as the target definition score of the video; or the top-ranked portion of the definition scores is determined and their average taken as the target definition score; or the highest and lowest definition scores are found and the average of the two taken as the target definition score of the video, and so on.
S103, identifying whether the video is clear according to a comparison result of the target definition score and a preset definition threshold.
The target definition score of the video is compared with a manually preset review threshold to judge whether the video image quality is clear. Specifically, when the target definition score is greater than or equal to the preset definition threshold, the video is determined to be a clear video; when the target definition score is less than the preset definition threshold, the video is determined to be a blurred video.
The preset definition threshold can be understood as the lowest definition at which a user can watch normally; the user-preset review threshold can be modified according to the actual conditions of different scenes to meet different requirements.
The definition threshold can be set by the following two schemes:
First, it is set directly by hand according to experience. The definition obtained by this scheme takes values in the real interval [0,1]; the higher the value, the clearer the picture texture. For example, everyday life scenes have low requirements on video definition, so the threshold is typically relatively low; scenes with higher precision requirements, such as parts of the medical and manufacturing industries, have higher requirements on video definition, and a higher value can be chosen as the definition threshold as needed. People's subjective perception of definition is affected by the business scene, and for algorithm applications in certain professional fields it is reasonable to set the definition threshold with reference to human experience.
Second, it is set from the distribution of scores over existing video data. Given existing video samples in sufficient number (the statistical distribution is not representative if the sample count is too small; this scheme requires at least 30 positive and 30 negative samples), a data distribution curve can be used to determine the threshold. The existing video samples are manually inspected and divided into clear samples and unclear samples. The algorithm calculates the definition of all existing samples, draws the distribution curves of the clear-sample scores and the unclear-sample scores respectively, and takes the intersection of the two distribution curves as the definition threshold, as shown in fig. 5.
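As a hedged sketch of this second scheme, the snippet below estimates the threshold as the first crossing point of the two score histograms; the helper name estimate_threshold, the bin count, and the use of NumPy are assumptions of this example.

```python
import numpy as np

def estimate_threshold(clear_scores, blur_scores, bins=100):
    """Estimate the definition threshold from labeled score samples (>= 30 each)."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    clear_hist, _ = np.histogram(clear_scores, bins=edges, density=True)
    blur_hist, _ = np.histogram(blur_scores, bins=edges, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    diff = clear_hist - blur_hist  # negative where blurred samples dominate
    crossings = np.where(np.diff(np.sign(diff)) != 0)[0]
    if crossings.size == 0:  # curves never cross; fall back to the closest point
        return float(centers[np.argmin(np.abs(diff))])
    return float(centers[crossings[0]])
```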
In the embodiment of the application, a video to be processed is acquired, at least one frame of image in the video is determined, the definition score of each frame of image is determined, the target definition score of the video is calculated according to the definition scores, and the video is then identified as clear or blurred according to the comparison result of the target definition score and a preset definition threshold. The method can make an objective judgment by calculating the definition score without relying on manual judgment, provides a brand-new objective evaluation standard for evaluating the viewing quality of teaching videos, and can save time cost and improve review efficiency.
Referring to fig. 6, fig. 6 is a flowchart of another embodiment of a video recognition method according to the present application. Specific:
s201, acquiring a video to be processed, and performing frame extraction processing on the video according to a set frame number to obtain at least one frame of image;
It can be understood that the video to be processed may be any video; in this embodiment of the application, a recorded teaching video is taken as an example for description.
In order to reduce the amount of computation and increase efficiency, the teaching video needs frame extraction, i.e., a subset of frames is extracted for calculation. Specifically, frames may be extracted with the same number of frames per interval, or with a different number of frames per interval. The interval may be random or progressive (increasing or decreasing), and is not specifically limited here.
For example, if the video includes 10000 frames and 1000 frames need to be extracted, one frame is extracted every 10 frames, and the extracted 1000 frames are determined as the resulting at least one frame of image.
Of course, if too many frames are extracted, the accuracy of the definition calculation improves but efficiency drops correspondingly; if too few frames are extracted, the accuracy of the definition calculation decreases, and any invalid information that is introduced can affect the definition result. A preferable range is therefore 25-400 frames.
S202, acquiring a target image in the image, and calculating pixel gradients and Gaussian noise of the target image;
The same processing is applied to each extracted frame; therefore, one frame, the target image, is described here as an example.
The Sobel operator is a feature operator in computer vision and image processing, sometimes called the Sobel filter, and is often used for problems such as edge detection in the image field. It was originally proposed by the American computer scientist Irwin Sobel in 1968. The definition of a frame of image can be quantified by the edge gradient intensity exhibited by the pixel points within a fixed window size, and the Sobel operator can detect the edge gradient intensity in the horizontal and vertical directions.
The horizontal pixel detection operator matrix of the Sobel operator is shown in Table 1, and the vertical pixel detection operator matrix is shown in Table 2:
TABLE 1
+1 0 -1
+2 0 -2
+1 0 -1
TABLE 2
+1 +2 +1
0 0 0
-1 -2 -1
If A represents the original image, convolving these matrices with the original image yields the horizontal edge-detection image gradient Gx and the vertical edge-detection image gradient Gy respectively:

Gx = Sx * A, Gy = Sy * A,

where Sx is the horizontal detection operator matrix of Table 1, Sy is the vertical detection operator matrix of Table 2, and * denotes two-dimensional convolution. The Sobel gradient is then G = sqrt(Gx^2 + Gy^2), where G is the pixel gradient of the target image.
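As an illustrative sketch of this step only, the following computes G with a SciPy convolution; the scipy dependency and the helper name sobel_gradient are assumptions of the example.

```python
import numpy as np
from scipy.ndimage import convolve

# Horizontal (Table 1) and vertical (Table 2) Sobel detection matrices.
SOBEL_X = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
SOBEL_Y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def sobel_gradient(A):
    """Pixel gradient G = sqrt(Gx^2 + Gy^2) of a grayscale image A."""
    Gx = convolve(A.astype(float), SOBEL_X)
    Gy = convolve(A.astype(float), SOBEL_Y)
    return np.sqrt(Gx ** 2 + Gy ** 2)
```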
Gaussian noise can be calculated with a Gaussian blur operator. The Gaussian blur operator is a smoothing operator commonly used in image processing to smooth and blur a picture.
By "blurring" it is understood that each pixel takes a weighted average of its surrounding pixels, thereby reducing its difference from the surrounding pixels. Since images in a computer are represented as two-dimensional matrices, a two-dimensional Gaussian function is used to calculate the weight matrix in the embodiments of the application. The two-dimensional Gaussian function is:

G(x, y) = (1 / (2*pi*sigma^2)) * exp(-(x^2 + y^2) / (2*sigma^2))

where x and y are the coordinates of each pixel point of the selected window and sigma is the standard deviation, a constant. For example, taking sigma = 1.5 and using a window 3 pixels wide and high to Gaussian-blur the image, with the center point at coordinates (0, 0), the coordinates of the other points in the window are shown in Table 3 and the corresponding weights in Table 4:
TABLE 3 Table 3
(-1,1) (0,1) (1,1)
(-1,0) (0,0) (1,0)
(-1,-1) (0,-1) (1,-1)
TABLE 4 Table 4
0.09474 0.1183 0.09474
0.1183 0.1478 0.1183
0.09474 0.1183 0.09474
According to this Gaussian weight distribution, a Gaussian blur can be calculated for each pixel point: the blurred pixel is obtained by multiplying the original pixel and its 8 surrounding pixels by the weights at the corresponding positions and summing the results; if the pixel lies at the edge of the picture, points outside the edge are zero-padded.
Let the weight matrix corresponding to Table 4 be Filter.

Assuming that the pixel value before the Gaussian blur is a and the pixel value after the Gaussian blur is a', then:

a' = w0*a + w1*a1 + ... + w8*a8

where a1~a8 are the values of the 8 pixels around the center point a (taken as 0 if absent) and w0~w8 are the weights of Filter at the corresponding positions. For an entire video frame, if the original image is represented by A and the blurred image by A', then:

A' = Filter * A.
It can be understood that performing this calculation for each pixel point in A yields the blurred image, i.e., the Gaussian noise of the target image.
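A minimal sketch of this blurring step, assuming SciPy: the 3x3 kernel is built from the two-dimensional Gaussian with sigma = 1.5 and normalized (reproducing the weights of Table 4), and edge pixels are zero-padded as described above.

```python
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(size=3, sigma=1.5):
    """Normalized Gaussian weight matrix (Filter) for the selected window."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(A, sigma=1.5):
    """A' = Filter * A with zero padding outside the picture edge."""
    return convolve(A.astype(float), gaussian_kernel(3, sigma),
                    mode='constant', cval=0.0)
```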
S203, calculating a definition score of the target image according to the pixel gradient and the Gaussian noise;
SSIM (Structural Similarity) is an index for measuring the similarity of two frames of images. This index is used in this embodiment to normalize the definition score of the image.
For each frame of image, the region Sobel gradient score Gi and the region Gaussian blur A'i are calculated.

Let the mean of the region Sobel gradient score Gi be mu_G and its variance sigma_G^2; let the mean of the region Gaussian blur A'i be mu_A' and its variance sigma_A'^2; let the covariance of Gi and A'i be gamma_GA; and let c1 and c2 be constant parameters that maintain stability. The SSIM calculation formula for the target image is:

SSIM = ((2*mu_G*mu_A' + c1) * (2*gamma_GA + c2)) / ((mu_G^2 + mu_A'^2 + c1) * (sigma_G^2 + sigma_A'^2 + c2))

i.e., SSIM is the definition score Score of the target image.
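The snippet below evaluates this formula between the gradient map G and the blurred image A'; it is a sketch, and the default values of the stabilizing constants c1 and c2 (the common SSIM defaults) are an assumption, since the text does not fix them.

```python
import numpy as np

def ssim_score(G, A_blur, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM-style definition score between gradient map G and blurred image A'."""
    mu_g, mu_a = G.mean(), A_blur.mean()
    var_g, var_a = G.var(), A_blur.var()
    cov_ga = ((G - mu_g) * (A_blur - mu_a)).mean()  # gamma_GA
    return ((2 * mu_g * mu_a + c1) * (2 * cov_ga + c2)) / \
           ((mu_g ** 2 + mu_a ** 2 + c1) * (var_g + var_a + c2))
```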
S204, calculating a target definition score of the video according to the definition score of each image;
The average of the definition scores Score of the extracted frame images is calculated, and this average is determined as the target definition score of the video.
S205, when the target definition score is greater than or equal to the preset definition threshold, determining that the video is a clear video;
The video reviewer, based on prior experience or by running this scheme on existing video samples, tunes and determines a judgment threshold Threshold in [0,1] for video definition, so that video-quality definition review is carried out automatically according to the following rule: if the target definition score is greater than or equal to Threshold, the video is judged clear; otherwise it is judged blurred.
In actual scene applications, Threshold can be adjusted at any time according to changing business demands.
S206, when the target definition score is smaller than the preset definition threshold, determining that the video is a blurred video;
and S207, storing the clear video into a resource library.
The resource library can be a teaching video resource library; clear videos are automatically stored in it so that related videos can later be called directly from the library for playback.
In the embodiment of the application, an objective judgment can be made by calculating the definition score without relying on manual judgment, providing a brand-new objective evaluation standard for evaluating the viewing quality of teaching videos, while also saving time cost and improving review efficiency. The pixel gradient and Gaussian noise are calculated with the whole frame of image as the unit, which saves the blocking step; in addition, the identified clear videos can be automatically saved in the resource library, improving the convenience of subsequently reading and playing the videos.
Referring to fig. 7, fig. 7 is a flowchart of another embodiment of a video recognition method according to the present application. Specific:
s301, acquiring a video to be processed, and performing frame extraction processing on the video according to a set frame number to obtain at least one frame of image;
see S201 for details, and will not be described here.
S302, acquiring a target image in the image, and dividing the target image into at least one image block;
The target image is any one of the extracted at least one frame of image; the processing is the same for each frame, so only one frame, the target image, is described here.
The target image is divided into multiple parts along its width and height (for example, 10 parts each), yielding multiple (e.g., 100) regions of the same size, i.e., multiple image blocks region_i.
S303, calculating the cross entropy of each image block in the image blocks;
In order to reduce the amount of calculation and improve calculation efficiency, part of the image blocks can be selected for processing. Therefore, the cross entropy corresponding to each image block in the target image is calculated, and the image blocks for subsequent processing are determined according to the cross entropy. Cross entropy is an important concept in information theory: the cross entropy of two probability distributions p and q over the same event set is the average number of bits needed to uniquely identify an event when encoding according to the "unnatural" probability distribution q, and it is commonly used to measure the difference between distributions.
The cross entropy based on probability distributions p and q is defined as: H(p, q) = E_p[-log q] = H(p) + D_KL(p||q),

where H(p) is the entropy of p and D_KL(p||q) is the KL divergence from p to q (also known as the relative entropy of p with respect to q).
In this embodiment, the characteristic of entropy is used to measure the degree of difference of the pixel-value distribution within a region; the larger this difference, the richer the content information contained in the region.
For each image block, define the pixel values a_x in {all pixels of image block N}, where the subscript x indexes the different pixels and the pixel values range over [0, 255]. The entropy of the whole image block, denoted Entropy, is calculated as:

Entropy = -sum_x p(a_x) * log p(a_x)

where p(a_x) is the frequency of the pixel value a_x within the block. A cross entropy is calculated for each image block.
S304, sorting the cross entropies and obtaining the preset number of top-ranked target cross entropies;
It can be understood that the larger the cross entropy, the greater the degree of difference of the pixel-value distribution in the region, and the richer the pixel information.
Specifically, the Entropy values are arranged in order from high to low, and the preset number (e.g., 10) of image blocks region_i, i in [1, 10], with the largest Entropy are taken.
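Purely as an illustration of S302-S304, the sketch below splits the frame into a 10x10 grid, scores each block by the entropy of its pixel-value histogram (one reading of the entropy formula above), and keeps the most informative blocks; the helper name top_entropy_blocks and the grid sizes are assumptions.

```python
import numpy as np

def top_entropy_blocks(A, grid=10, top_k=10):
    """Return the top_k image blocks of A with the highest pixel-value entropy."""
    h, w = A.shape[0] // grid, A.shape[1] // grid
    blocks, scores = [], []
    for r in range(grid):
        for c in range(grid):
            block = A[r * h:(r + 1) * h, c * w:(c + 1) * w]
            p = np.bincount(block.astype(np.uint8).ravel(), minlength=256)
            p = p[p > 0] / block.size  # frequency of each occurring pixel value
            scores.append(float(-(p * np.log2(p)).sum()))
            blocks.append(block)
    order = np.argsort(scores)[::-1][:top_k]  # highest entropy first
    return [blocks[i] for i in order]
```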
S305, determining target image blocks corresponding to the target cross entropies respectively;
the image blocks are in one-to-one correspondence with the cross entropy, so that the target image blocks corresponding to the target cross entropy are determined.
S306, calculating pixel gradients and Gaussian noise respectively corresponding to the target image blocks;
for each target image block, a pixel gradient and gaussian noise are calculated separately.
And calculating pixel gradients corresponding to the target image blocks respectively by adopting a Sobel operator, and calculating Gaussian noise corresponding to the target image blocks respectively by adopting a Gaussian blur algorithm. The specific calculation process is shown in the above-mentioned step S202, and will not be repeated here.
S307, calculating structural similarity according to pixel gradients and Gaussian noise corresponding to the target image blocks respectively;
the SSIM calculation mode of each image block is the same as the calculation mode of one frame of image in S203, and will not be described here again.
S308, determining a first average value of each structural similarity as a definition score of the target image;
The average of the SSIM values of the plurality of image blocks is calculated to obtain the definition score of the target image corresponding to those blocks: Score = average(SSIM_i).
S309, calculating a second average value of each definition score, and determining the second average value as a target definition score of the video;
In the above way, the definition score Score of each frame of image is obtained, and the average of these scores is calculated to obtain the target definition score of the video.

For example, in the current application scene, 9 frames of images are randomly extracted from each video as the judgment material for definition. That is, the definition Score of these 9 frames is calculated, and the average is taken as the image-quality definition score of the teaching video.
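Tying the steps together, the end-to-end sketch below averages the per-frame region scores into the video's target definition score and compares it with the threshold; all helpers are the illustrative sketches defined earlier, and the default threshold value is an assumption.

```python
import numpy as np

def video_definition(video_path, threshold=0.5, num_frames=9):
    """Target definition score of a video and the clear/blurred decision."""
    frame_scores = []
    for frame in sample_frames(video_path, num_frames):
        region_scores = [ssim_score(sobel_gradient(b), gaussian_blur(b))
                         for b in top_entropy_blocks(frame)]
        frame_scores.append(np.mean(region_scores))  # Score of one frame
    target = float(np.mean(frame_scores))  # second average, over frames
    return target, ('clear' if target >= threshold else 'blurred')
```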
S310, when the target definition score is greater than or equal to the preset definition threshold, determining that the video is a clear video;
S311, when the target definition score is smaller than the preset definition threshold, determining that the video is a blurred video;
and S312, storing the clear video into a resource library.
S310 to S312 can be specifically referred to S205 to S207, and will not be described herein.
Optionally, a specific application scenario is evaluating lesson teaching videos recorded by hundreds of teachers. In the conventional process, whenever a teacher uploads a teaching video, a reviewer first has to open and watch the video, a process that may take several minutes to tens of minutes, and finally gives a judgment of video definition based on personal viewing experience. If the definition is judged qualified, the video passes review and is added to the resource library; if the video is judged too blurry to watch, it is rejected. With this scheme, whenever a teacher uploads a teaching video, frames are automatically extracted from it, the picture-quality definition is calculated for each frame, and finally the average is taken as the quantized definition of the video. Based on the quantized result, a reviewer can set a threshold according to personal requirements or experience to screen video courses automatically, with no need to watch every unreviewed video one by one and then evaluate it manually. Moreover, calculating the definition of one video takes only 7.5 seconds, an efficiency far higher than manual review, which also prepares for the growth in videos to review as business volume increases. The scheme therefore greatly lightens the burden on reviewers of teaching video courses and provides powerful support for this business scene.
Alternatively, the Sobel-operator gradient score can be used directly to roughly estimate the definition of a video picture.
In the embodiment of the application, an objective judgment can be made by calculating the definition score without relying on manual judgment, providing a brand-new objective evaluation standard for evaluating the visual quality of teaching videos, saving time cost, and improving review efficiency. By means of the concept of entropy and the principle of structural similarity, the scheme of calculating definition with the Sobel operator is optimized: compared with the uncertain time of manual video review, calculating the definition of each video takes only a few seconds, a great improvement over manual efficiency. In addition, the identified clear videos can be automatically saved in the resource library, improving the convenience of subsequently reading and playing the videos.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Referring to fig. 8, a schematic structural diagram of a video recognition device according to an exemplary embodiment of the present application is shown. The video recognition device may be implemented as all or part of the terminal by software, hardware or a combination of both. The apparatus 1 comprises an image determination module 10, a score calculation module 20 and a video recognition module 30.
The image determining module 10 is used for acquiring a video to be processed and determining at least one frame of image in the video;
a score calculating module 20, configured to determine a sharpness score of the image for each frame, and calculate a target sharpness score of the video according to each sharpness score;
and the video identification module 30 is configured to identify whether the video is clear according to a comparison result of the target definition score and a preset definition threshold.
Optionally, the image determining module 10 is specifically configured to:
and performing frame extraction processing on the video according to the set frame number to obtain at least one frame of image.
Alternatively, as shown in fig. 9, the score calculating module 20 includes:
an information calculation unit 21 for acquiring a target image of the images, calculating a pixel gradient of the target image, and gaussian noise;
a score calculating unit 22 for calculating a sharpness score of the target image based on the pixel gradient and the gaussian noise.
Alternatively, as shown in fig. 10, the information calculating unit 21 includes:
an image segmentation subunit 211 for segmenting the target image into at least one image block;
a cross entropy calculating subunit 212, configured to calculate cross entropy of each image block in the image blocks;
An information calculation subunit 213, configured to calculate a pixel gradient and gaussian noise of the target image according to each of the cross entropies.
Optionally, the information calculating subunit 213 is specifically configured to:
arranging according to the order of the cross entropies, and obtaining the target cross entropies with the preset number arranged in front in the cross entropies;
determining target image blocks corresponding to the target cross entropies respectively;
and calculating pixel gradients and Gaussian noise corresponding to the target image blocks respectively.
Optionally, the score calculating module 20 is specifically configured to:
calculating structural similarity according to pixel gradients and Gaussian noise respectively corresponding to the target image blocks;
and determining a first average value of each structural similarity as a definition score of the target image.
Optionally, the information calculating subunit 213 is specifically configured to:
calculating pixel gradients corresponding to the target image blocks respectively by adopting a Sobel operator;
and calculating Gaussian noise corresponding to each target image block by adopting a Gaussian blur algorithm.
Optionally, the score calculating module 20 is specifically configured to:
and calculating a second average value of each definition score, and determining the second average value as a target definition score of the video.
Optionally, the video identification module 30 is specifically configured to:
when the target definition score is greater than or equal to the preset definition threshold, determining that the video is a clear video;
and when the target definition score is smaller than the preset definition threshold, determining that the video is a blurred video.
Optionally, as shown in fig. 11, the apparatus further includes a video saving module 40, configured to:
and storing the clear video into a resource library.
It should be noted that, when the video recognition apparatus provided in the above embodiment executes the video identification method, the division into the above functional modules is only used as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the video recognition apparatus provided in the above embodiment and the video identification method embodiments belong to the same concept; the detailed implementation process is described in the method embodiments and is not repeated here.
The foregoing embodiment numbers of the present application are for description only and do not represent the advantages or disadvantages of the embodiments.
In the embodiment of the application, objective judgment can be performed by calculating the definition score without relying on manual judgment, so that a brand new objective evaluation standard is provided for evaluating the visual quality of the teaching video, time cost can be saved, and examination efficiency is improved. By means of the concept of entropy and the principle of structural similarity, the scheme of calculating definition by using a sobel operator is optimized to compare with the uncertain time of manual auditing video, meanwhile, the time for calculating definition of each video only needs a few seconds, and compared with manual efficiency, the scheme has great optimization. In addition, the identified clear video can be automatically saved in the resource library, so that convenience in subsequent video reading and video playing is improved.
The embodiments of the present application further provide a computer storage medium, where a plurality of instructions may be stored, where the instructions are adapted to be loaded by a processor and execute the method steps of the embodiments shown in fig. 4 to fig. 7, and the specific execution process may refer to the specific description of the embodiments shown in fig. 4 to fig. 7, which is not repeated herein.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The foregoing disclosure is only illustrative of the preferred embodiments of the present application and is not intended to limit the scope of the claims; equivalent variations made according to the claims of the present application still fall within the scope covered by the application.

Claims (16)

1. A method of video recognition, the method comprising:
acquiring a video to be processed, and determining at least one frame of image in the video;
Determining a definition score of each frame of the image, and calculating a target definition score of the video according to each definition score;
identifying whether the video is clear according to a comparison result of the target definition score and a preset definition threshold;
the determining a sharpness score for the image for each frame includes:
acquiring a target image in the image, and calculating pixel gradients and Gaussian noise of the target image;
calculating a sharpness score of the target image according to the pixel gradient and the Gaussian noise;
wherein said calculating a sharpness score of said target image from said pixel gradient and said gaussian noise comprises:
calculating the structural similarity of the target image according to the pixel gradient and the Gaussian noise;
and determining the structural similarity of the target image as a definition score of the target image.
2. The method of claim 1, wherein said determining at least one frame of image in said video comprises:
and performing frame extraction processing on the video according to the set frame number to obtain at least one frame of image.
3. The method of claim 1, wherein:

the pixel gradient of the target image is G = sqrt(Gx^2 + Gy^2), where Gx is the gradient in the horizontal direction, Gx = horizontal pixel detection operator matrix * target image, and Gy is the gradient in the vertical direction, Gy = vertical pixel detection operator matrix * target image.
4. The method of claim 1, wherein:

the Gaussian noise of the target image is A' = Filter * A, where A represents the target image before blurring, A' represents the target image after blurring, and Filter is the Gaussian blur weight matrix corresponding to the selected window;

the Filter is calculated from the two-dimensional Gaussian function G(x, y) = (1 / (2*pi*sigma^2)) * exp(-(x^2 + y^2) / (2*sigma^2)), where x and y are the coordinates of each pixel point of the selected window and sigma is the standard deviation.
5. The method of claim 1, wherein the calculating pixel gradients and gaussian noise of the target image comprises:
dividing the target image into at least one image block;
calculating the cross entropy of each image block in the image blocks;
and calculating pixel gradients and Gaussian noise of the target image according to each cross entropy.
6. The method of claim 5, wherein:

the cross entropy of the image block is Entropy = -sum_x p(a_x) * log p(a_x), where a_x denotes the pixel values in the image block, x identifies the different pixels, and p(a_x) is the frequency of the pixel value a_x within the block.
7. The method of claim 5, wherein said calculating pixel gradients and gaussian noise of said target image from each of said cross entropies comprises:
arranging according to the order of the cross entropies, and obtaining the target cross entropies with the preset number arranged in front in the cross entropies;
determining target image blocks corresponding to the target cross entropies respectively;
and calculating pixel gradients and Gaussian noise corresponding to the target image blocks respectively.
8. The method of claim 7, wherein said calculating a sharpness score of the target image from the pixel gradient and the gaussian noise comprises:
calculating structural similarity according to pixel gradients and Gaussian noise respectively corresponding to the target image blocks;
and determining a first average value of each structural similarity as a definition score of the target image.
9. The method of claim 8, wherein:

the structural similarity is SSIM = ((2*mu_G*mu_A' + c1) * (2*gamma_GA + c2)) / ((mu_G^2 + mu_A'^2 + c1) * (sigma_G^2 + sigma_A'^2 + c2)), where the mean of the region Sobel gradient score G_i is mu_G and its variance is sigma_G^2, the mean of the region Gaussian blur A'_i is mu_A' and its variance is sigma_A'^2, the covariance of G_i and A'_i is gamma_GA, and c1 and c2 are constant parameters maintaining stability.
10. The method of claim 8, wherein said calculating pixel gradients and gaussian noise for each of said target image blocks, respectively, comprises:
calculating pixel gradients corresponding to the target image blocks respectively by adopting a Sobel operator;
and calculating Gaussian noise corresponding to each target image block by adopting a Gaussian blur algorithm.
11. The method of claim 1, wherein said calculating a target sharpness score for said video from each of said sharpness scores comprises:
and calculating a second average value of each definition score, and determining the second average value as a target definition score of the video.
12. The method according to claim 1, wherein the identifying whether the video is clear according to the comparison result of the target definition score and the preset definition threshold comprises:
when the target definition score is greater than or equal to the preset definition threshold, determining that the video is a clear video;
and when the target definition score is smaller than the preset definition threshold, determining that the video is a blurred video.
13. The method according to claim 12, wherein the method further comprises:
And storing the clear video into a resource library.
14. A video recognition device, the device comprising:
the image determining module is used for acquiring a video to be processed and determining at least one frame of image in the video;
the score calculating module is used for determining the definition score of each frame of the image and calculating the target definition score of the video according to each definition score;
the video identification module is used for clearly identifying the video according to the comparison result of the target definition score and a preset definition threshold;
the score calculation module comprises:
an information calculation unit for acquiring a target image in the image, and calculating a pixel gradient and Gaussian noise of the target image;
a score calculation unit for calculating a definition score of the target image from the pixel gradient and the Gaussian noise;
the score calculating unit is specifically configured to:
calculating the structural similarity of the target image according to the pixel gradient and the Gaussian noise;
and determining the structural similarity of the target image as a definition score of the target image.
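How the claimed modules might fit together, as a hypothetical Python class reusing the sketches above; the claims describe cooperating modules, not this specific implementation:

```python
class VideoRecognitionDevice:
    """Sketch of claim 14: image determination, score calculation,
    and video identification as cooperating components."""

    def __init__(self, definition_threshold: float = 0.8):
        self.definition_threshold = definition_threshold

    def determine_images(self, video_frames):
        # image determining module: keep every decoded frame in this sketch
        return list(video_frames)

    def frame_score(self, frame):
        # score calculating module: pixel gradient vs. Gaussian noise -> SSIM score
        return structural_similarity(sobel_gradient(frame), gaussian_blur(frame))

    def identify(self, video_frames) -> str:
        # video identification module: compare the target score with the threshold
        scores = [self.frame_score(f) for f in self.determine_images(video_frames)]
        return identify_video(scores, self.definition_threshold)
```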
15. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 13.
16. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-13.
CN202011507838.1A 2020-12-18 2020-12-18 Video identification method, device, storage medium and terminal Active CN112818737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011507838.1A CN112818737B (en) 2020-12-18 2020-12-18 Video identification method, device, storage medium and terminal

Publications (2)

Publication Number Publication Date
CN112818737A CN112818737A (en) 2021-05-18
CN112818737B true CN112818737B (en) 2024-02-02

Family

ID=75853469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011507838.1A Active CN112818737B (en) 2020-12-18 2020-12-18 Video identification method, device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN112818737B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113411571B (en) * 2021-06-16 2023-07-14 福建师范大学 Video frame definition detection method based on sliding window gradient entropy
CN113992880B (en) * 2021-10-15 2024-04-12 上海佰贝科技发展股份有限公司 4K video identification method, system, equipment and computer readable storage medium
CN113992851A (en) * 2021-10-22 2022-01-28 北京字跳网络技术有限公司 Video generation method, device, equipment and storage medium
CN114071129A (en) * 2021-11-23 2022-02-18 杭州逗酷软件科技有限公司 Test method, test equipment and computer storage medium
CN115345855A (en) * 2022-08-16 2022-11-15 北京百度网讯科技有限公司 Method, device and equipment for evaluating video quality and storage medium
CN116452152A (en) * 2023-05-15 2023-07-18 哈尔滨商业大学 Engineering management system based on Internet of things

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004301656A (en) * 2003-03-31 2004-10-28 Konica Minolta Holdings Inc Method, apparatus and program for measuring sharpness degree and image recording apparatus
CN103093419A (en) * 2011-10-28 2013-05-08 浙江大华技术股份有限公司 Method and device for detecting image definition
JP2013222245A (en) * 2012-04-13 2013-10-28 Mitsubishi Electric Corp Image sharpness evaluation device
CN104637064A (en) * 2015-02-28 2015-05-20 中国科学院光电技术研究所 Defocus blurred image definition detecting method based on edge strength weight
CN108682008A (en) * 2018-05-07 2018-10-19 江苏康尚生物医疗科技有限公司 A kind of Leukocyte Image clarity evaluation method and device
CN109831680A (en) * 2019-03-18 2019-05-31 北京奇艺世纪科技有限公司 A kind of evaluation method and device of video definition
CN111127353A (en) * 2019-12-16 2020-05-08 重庆邮电大学 High-dynamic image ghost removing method based on block registration and matching
CN111899243A (en) * 2020-07-28 2020-11-06 阳光保险集团股份有限公司 Image definition evaluation method and device and computer readable storage medium
CN112001904A (en) * 2020-08-21 2020-11-27 珀乐(北京)信息科技有限公司 Comprehensive evaluation module and evaluation method for remote sensing image quality definition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Guan-hao Chen, et al. Gradient-Based Structural Similarity for Image Quality Assessment. 2006 International Conference on Image Processing, 2006, pp. 2929-2932. *
Qiaohong Li, et al. Gradient-Weighted Structural Similarity for Image Quality Assessments. 2015 IEEE International Symposium on Circuits and Systems (ISCAS), 2015, pp. 2165-2168. *

Also Published As

Publication number Publication date
CN112818737A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN112818737B (en) Video identification method, device, storage medium and terminal
Gu et al. Blind quality assessment of tone-mapped images via analysis of information, naturalness, and structure
CN114584849B (en) Video quality evaluation method, device, electronic equipment and computer storage medium
CN111738243B (en) Method, device and equipment for selecting face image and storage medium
US20170013191A1 (en) No-reference image and video quality evaluation
CN110363753B (en) Image quality evaluation method and device and electronic equipment
CN104023230A (en) Non-reference image quality evaluation method based on gradient relevance
Krasula et al. Preference of experience in image tone-mapping: Dataset and framework for objective measures comparison
Siahaan et al. Semantic-aware blind image quality assessment
CN109389569A (en) Based on the real-time defogging method of monitor video for improving DehazeNet
CN111523400B (en) Video representative frame extraction method and device
CN114757854B (en) Night vision image quality improving method, device and equipment based on multispectral analysis
Kottayil et al. Blind quality estimation by disentangling perceptual and noisy features in high dynamic range images
Gao et al. Image quality score distribution prediction via alpha stable model
CN112348809A (en) No-reference screen content image quality evaluation method based on multitask deep learning
CN111062914B (en) Method, apparatus, electronic device and computer readable medium for acquiring facial image
CN111275642B (en) Low-illumination image enhancement method based on significant foreground content
CN111724054A (en) Evaluation method, device, terminal equipment and storage medium
CN111353330A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114881889A (en) Video image noise evaluation method and device
Kundu Subjective and objective quality evaluation of synthetic and high dynamic range images
CN113420809A (en) Video quality evaluation method and device and electronic equipment
CN111985423A (en) Living body detection method, living body detection device, living body detection equipment and readable storage medium
CN114372974B (en) Image detection method, device, equipment and storage medium
CN115937338B (en) Image processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant