CN112419257A - Method and device for detecting definition of text recorded video, computer equipment and storage medium - Google Patents
- Publication number
- CN112419257A (application CN202011286396.2A)
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- text
- definition
- detected
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
Abstract
The embodiments of the present application belong to the field of artificial intelligence and relate to a method for detecting the definition of a text recorded video, which comprises: acquiring a service recorded video; calculating a blurriness curve of the service recorded video, and intercepting the service recorded video according to the blurriness curve to obtain a text recorded video segment to be detected; extracting N video frames from the text recorded video segment to be detected, wherein N is a positive integer greater than 1; inputting the N video frames into an OCR-based character recognition model, obtaining a character recognition result for each of the N video frames, and judging the frame definition of each frame according to its character recognition result; and judging the definition of the text recorded video segment to be detected according to the frame definition of each frame. The application also provides an apparatus for detecting the definition of a text recorded video, a computer device, and a storage medium. The definition of the text recorded video segment is detected without a human having to watch the video, which saves time and labor and is more efficient.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for detecting the definition of a text recorded video, a computer device, and a storage medium.
Background
Financial services such as insurance, securities, and banking have high requirements on service standardization. To minimize after-the-fact disputes and provide elements for post-event supervision, the China Banking and Insurance Regulatory Commission and the China Securities Regulatory Commission have established industry specifications requiring salespeople to synchronously record audio and video during the key steps of providing financial services to customers. During such recording, not only must the face be recorded; in many key steps the service person must also show certain important text materials to the customer on camera, as an element for post-event supervision.
In a real service scenario, a service person may fail to focus the camera well on the text material, so that the recording is unqualified: it cannot be used as material for subsequent supervision and has to be recorded again. At present, checking whether the text material has been recorded is done manually, and the whole video has to be watched and reviewed step by step, which is time-consuming and labor-intensive.
Disclosure of Invention
The embodiments of the present application aim to provide a method and an apparatus for detecting the definition of a text recorded video, a computer device, and a storage medium, so as to solve the problem that manual review of text recorded videos is time-consuming and labor-intensive.
In order to solve the above technical problem, an embodiment of the present application provides a method for detecting a definition of a text-recorded video, which adopts the following technical solutions:
acquiring a service recording video;
calculating a blurriness curve of the service recorded video, and intercepting the service recorded video according to the blurriness curve to obtain a text recorded video segment to be detected;
extracting N video frames in the text recording video clip to be detected, wherein N is a positive integer greater than 1;
inputting the N video frames into an OCR-based character recognition model, obtaining a character recognition result of each frame in the N video frames, and judging the frame definition of each frame according to the character recognition result;
and judging the definition of the text recording video clip to be detected according to the frame definition of each frame.
Further, after the step of acquiring the service recorded video, the method includes:
acquiring audio synchronized with the service recorded video;
performing character conversion on the audio to obtain a character conversion result, and comparing the character conversion result with a preset first keyword and a preset second keyword to obtain a first time point and a second time point at which the first keyword and the second keyword respectively first appear in the audio;
intercepting the service recording video according to a time period formed by the first time point and the second time point to obtain a first video segment;
and calculating a blurriness curve of the first video segment, and intercepting the first video segment according to the blurriness curve to obtain the text recorded video segment to be detected.
Further, the step of extracting N video frames in the text recording video segment to be detected includes:
analyzing the text recording video clip to be detected into a video frame set;
and extracting L video frame subsets from the video frame set at a set interval, wherein L is a positive integer greater than 1, each video frame subset consists of M temporally adjacent video frames from the video frame set, and M is a positive integer greater than 1.
Further, the step of judging the definition of the text recording video clip to be detected according to the frame definition of each frame includes:
judging the definition of each video frame subset according to the frame definition of each frame;
and judging the definition of the text recording video clip to be detected according to the definition of each video frame subset.
Further, the step of judging the definition of the text recording video segment to be detected according to the definition of each video frame subset comprises:
calculating the ratio of the number of clear video frame subsets in each video frame subset to the total number L of the extracted video frame subsets according to the definition of each video frame subset;
and comparing the ratio with a preset first threshold value, and judging that the text recording video clip to be detected is clear when the ratio is greater than the first threshold value.
Further, the step of inputting the N video frames into a character recognition model based on OCR to obtain a character recognition result of each frame of the N video frames, and determining the frame definition of each frame according to the character recognition result includes:
respectively calculating the number of characters contained in the character recognition result of each frame;
and comparing the number of the characters with a preset second threshold value, and judging that the corresponding video frame is clear when the number of the characters is greater than the second threshold value.
In order to solve the above technical problem, an embodiment of the present application further provides a device for detecting a definition of a text-recorded video, which adopts the following technical scheme:
the acquisition module is used for acquiring a service recording video;
the intercepting module is used for calculating a blurriness curve of the service recorded video, and intercepting the service recorded video according to the blurriness curve to obtain a text recorded video segment to be detected;
the extraction module is used for extracting N video frames in the text recording video clip to be detected, wherein N is a positive integer greater than 1;
the processing module is used for inputting the N video frames into an OCR-based character recognition model, obtaining a character recognition result of each frame in the N video frames, and judging the frame definition of each frame according to the character recognition result;
and the judging module is used for judging the definition of the text recording video clip to be detected according to the frame definition of each frame.
Further, the device for detecting the definition of the text-recorded video further comprises:
the first acquisition submodule is used for acquiring audio synchronized with the service recorded video;
the first processing sub-module is used for performing character conversion on the audio to obtain a character conversion result, and comparing the character conversion result with a preset first keyword and a preset second keyword to obtain a first time point and a second time point at which the first keyword and the second keyword respectively first appear in the audio;
the first intercepting submodule is used for intercepting the service recorded video according to a time period formed by the first time point and the second time point to obtain a first video segment;
and the second intercepting submodule is used for calculating a blurriness curve of the first video segment, and intercepting the first video segment according to the blurriness curve to obtain a text recorded video segment to be detected.
Further, the extraction module comprises:
the first analysis submodule is used for analyzing the text recording video clip to be detected into a video frame set;
the first extraction submodule is used for extracting L video frame subsets from the video frame set at a set interval, wherein L is a positive integer greater than 1, each video frame subset consists of M temporally adjacent video frames from the video frame set, and M is a positive integer greater than 1.
Further, the judging module includes:
the second processing submodule is used for judging the definition of each video frame subset according to the frame definition of each frame;
and the first judgment submodule is used for judging the definition of the text recording video clip to be detected according to the definition of each video frame subset.
Further, the first determining sub-module includes:
a first calculating subunit, configured to calculate, according to the definition of each video frame subset, a ratio of the number of clear video frame subsets in each video frame subset to the total number L of extracted video frame subsets;
and the first judging subunit is used for comparing the ratio with a preset first threshold value, and judging that the text recording video clip to be detected is clear when the ratio is greater than the first threshold value.
Further, the processing module comprises:
the first calculation submodule is used for respectively calculating the number of characters contained in the character recognition result of each frame;
and the second judgment submodule is used for comparing the number of the characters with a preset second threshold value, and judging that the corresponding video frame is clear when the number of the characters is greater than the second threshold value.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprises a memory and a processor, wherein computer readable instructions are stored in the memory, and the processor, when executing the computer readable instructions, implements the steps of the method for detecting the definition of a text recorded video described above.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the method for detecting the definition of a text recorded video described above.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects: a service recorded video is acquired; a blurriness curve of the service recorded video is calculated, and the service recorded video is intercepted according to the blurriness curve to obtain a text recorded video segment to be detected; N video frames are extracted from the text recorded video segment to be detected, where N is a positive integer greater than 1; the N video frames are input into an OCR-based character recognition model, a character recognition result is obtained for each of the N video frames, and the frame definition of each frame is judged according to its character recognition result; and the definition of the text recorded video segment to be detected is judged according to the frame definition of each frame. The definition of the text recorded video segment is thus detected without a human having to watch the video, which saves time and labor and is more efficient.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
FIG. 2 is a flow diagram of one embodiment of a method for detecting the definition of a text recorded video according to the present application;
FIG. 3 is a schematic diagram of a blurriness curve of a service recorded video;
FIG. 4 is a schematic block diagram of one embodiment of an apparatus for detecting the definition of a text recorded video according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the method for detecting the definition of a text recorded video provided in the embodiments of the present application is generally performed by the server/terminal device; accordingly, the apparatus for detecting the definition of a text recorded video is generally arranged in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow diagram of one embodiment of a method for detecting the definition of a text recorded video according to the present application is shown. The method for detecting the definition of a text recorded video comprises the following steps:
step S201, a service recording video is obtained.
In this embodiment, the electronic device (e.g., the server/terminal device shown in fig. 1) on which the method for detecting the definition of a text recorded video runs can receive the text recorded video segment to be detected through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connection means now known or developed in the future.
The service recorded video is shot by an electronic device equipped with a camera.
Step S202, calculating a blurriness curve of the service recorded video, and intercepting the service recorded video according to the blurriness curve to obtain a text recorded video segment to be detected.
In this embodiment, the segment to intercept is determined from the characteristics of the blurriness curve of a real-scene service recorded video. Such a video typically includes the following phases: recording the face → turning the camera to the text material → recording the text material → turning back to the face → recording the face. The recording device is out of focus while the camera turns toward the text material and back toward the face, so the footage shot during those transitions is blurry; the blurriness curve of a real service recorded video therefore changes as clear → blurry → clear → blurry → clear. The service recorded video is intercepted over the time period corresponding to the second clear segment of the blurriness curve, yielding the text recorded video segment to be detected.
The blurriness curve of the service recorded video is obtained by computing, for each frame, the Brenner gradient function D(f), which sums the squared gray-level differences between pixels two columns apart. D(f) is defined as follows:
D(f) = Σ_y Σ_x |f(x+2, y) − f(x, y)|²
wherein f (x, y) represents the gray value of the pixel point (x, y) corresponding to the image f, and D (f) is the result of the image definition calculation.
According to the characteristics of the service recorded video, the blurriness curve looks roughly as shown in fig. 3, where the video segment over the time period t1–t2 is the text recorded video segment to be detected.
By exploiting the characteristics of the blurriness curve of the service recorded video, the time period to intercept can be determined more accurately and the redundant parts are effectively removed, which improves the efficiency of detecting the definition of the text recorded video segment.
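The Brenner gradient above can be sketched as follows. This is a minimal pure-Python version that takes a grayscale frame as a list of pixel rows; a real implementation would run on decoded video frames (e.g. NumPy arrays) and evaluate D(f) frame by frame to build the blurriness curve.

```python
def brenner_gradient(frame):
    """Brenner gradient D(f): sum over all pixels of the squared
    gray-level difference between pixels two columns apart.
    Sharp images give large values; blurry images give small ones."""
    total = 0
    for row in frame:
        for x in range(len(row) - 2):
            diff = row[x + 2] - row[x]
            total += diff * diff
    return total

# A flat (featureless) frame scores 0; a frame with a hard edge scores high.
flat = [[128] * 8 for _ in range(4)]
edged = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(4)]
```

Sorting frames by this score, or thresholding it over time, is what produces the clear/blurry phases of the curve in fig. 3.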
Step S203, extracting N video frames in the text recording video clip to be detected, wherein N is a positive integer greater than 1.
In this embodiment, a video consists of a series of images played in succession; these are the video frames. Video played over the network typically has a frame rate of 30 frames/second, which may be reduced to 25 frames/second. Extracting video frames, i.e. sampling frames from the video, reduces the amount of calculation and improves processing efficiency.
For example, for a detected video with a duration of 5 seconds and a frame rate of 30 frames/second, the video has 150 frames; extracting one video frame every 10 frames yields 15 images, so only 15 images need to be processed subsequently. Because these 15 images are uniformly distributed on the time axis, the amount of calculation is reduced while the definition of the video is still truly reflected.
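The uniform sampling in this example can be sketched as below. The sketch works on frame indices only; decoding the actual frames is left to whatever video library is used and is outside the patent's description.

```python
def sample_frame_indices(total_frames, step):
    """Pick one frame every `step` frames, uniformly over the time axis."""
    return list(range(0, total_frames, step))

# 5 s at 30 fps = 150 frames; one frame per 10 gives 15 sampled frames.
indices = sample_frame_indices(150, 10)
```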
Step S204, inputting the N video frames into a character recognition model based on OCR, obtaining the character recognition result of each frame in the N video frames, and judging the frame definition of each frame according to the character recognition result.
In this embodiment, the extracted N video frames are input into an OCR-based character recognition model to obtain the character recognition result of each video frame. OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using a character recognition method. An OCR-based character recognition model can be implemented with general-purpose software.
Whether each frame is clear is judged from its character recognition result: when characters can be recognized from the video frame by the OCR-based character recognition model, the frame is considered clear; otherwise it is considered blurry.
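A sketch of this frame-level judgment, with the OCR model abstracted as any callable that returns the recognized text (the patent names no specific OCR library, so `ocr_fn` and the stub below are illustrative assumptions). The character-count threshold corresponds to the preset second threshold in the claims; a threshold of 0 reproduces the simpler "any characters recognized" rule.

```python
def frame_is_clear(frame, ocr_fn, second_threshold=0):
    """A frame is judged clear when the OCR-based character recognition
    model recognizes more than `second_threshold` characters from it."""
    text = ocr_fn(frame)
    return len(text.strip()) > second_threshold

# Stub standing in for a real OCR model: "recognizes" text only on a
# sharp frame. Any real OCR callable with the same shape would fit.
fake_ocr = lambda frame: "insurance terms" if frame == "sharp" else ""
```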
And S205, judging the definition of the text recording video clip to be detected according to the frame definition of each frame.
In this embodiment, the number of video frames judged to be clear is counted from the frame definition of each frame obtained in step S204, the ratio of this number to the total number of extracted video frames is calculated, and the ratio is compared with a preset threshold; when the ratio is greater than the preset threshold, the text recorded video to be detected is judged to be clear.
In summary, the method comprises: acquiring a service recorded video; calculating a blurriness curve of the service recorded video, and intercepting the service recorded video according to the blurriness curve to obtain a text recorded video segment to be detected; extracting N video frames from the text recorded video segment to be detected, wherein N is a positive integer greater than 1; inputting the N video frames into an OCR-based character recognition model, obtaining a character recognition result for each of the N video frames, and judging the frame definition of each frame according to its character recognition result; and judging the definition of the text recorded video segment to be detected according to the frame definition of each frame. The definition of the text recorded video segment is detected without a human having to watch the video, which saves time and labor and is more efficient.
In some optional implementations of this embodiment, after step S201, the following steps are included:
acquiring audio synchronized with the service recorded video;
performing character conversion on the audio to obtain a character conversion result, and comparing the character conversion result with a preset first keyword and a preset second keyword to obtain a first time point and a second time point at which the first keyword and the second keyword respectively first appear in the audio;
intercepting the service recording video according to a time period formed by the first time point and the second time point to obtain a first video segment;
and calculating a blurriness curve of the first video segment, and intercepting the first video segment according to the blurriness curve to obtain the text recorded video segment to be detected.
In the above embodiment, the recording of a real service scene includes not only video but also synchronously recorded audio. The video usually contains not only the text recorded video segment but also a face recording part. To separate the text recorded video segment more accurately, the time period for intercepting the service recorded video is determined by the time points at which keywords first occur in the audio file recorded in synchronization with the video. For example, while serving a customer, a salesperson usually says "please read" when starting to display the text material and "finished reading" when the display ends. "Please read" and "finished reading" are set as the first keyword and the second keyword respectively; the audio is converted into text by general speech-to-text software, the character conversion result is compared with the first keyword and the second keyword to obtain the first time point and the second time point at which they first appear in the audio, and the service recorded video is intercepted between these two time points to obtain the first video segment.
In some embodiments, only the first keyword is set; the time point at which the first keyword first appears in the audio is used as the start time, and the service recorded video is intercepted for a set duration (for example, 5 seconds) from that point to obtain the first video segment. This can likewise remove the redundant portion and keep only the text recorded video segment.
Then a blurriness curve of the first video segment is calculated and the blurry part is removed from it, further reducing redundancy and improving the efficiency and accuracy of the definition detection of the text recorded video segment.
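Locating the interception window from the synchronized audio can be sketched as follows. The transcript format, a list of (timestamp, phrase) pairs, and the English keywords are illustrative assumptions, since the patent does not specify the speech-to-text software or its output format.

```python
def first_occurrence(transcript, keyword):
    """Timestamp (seconds) at which `keyword` first appears, or None."""
    for ts, phrase in transcript:
        if keyword in phrase:
            return ts
    return None

def keyword_window(transcript, first_kw, second_kw):
    """First time points of the two keywords, bounding the first video segment."""
    return (first_occurrence(transcript, first_kw),
            first_occurrence(transcript, second_kw))

# Toy transcript as (timestamp, phrase) pairs from speech-to-text.
transcript = [(3.0, "hello, welcome"),
              (12.5, "please read the terms"),
              (41.0, "finished reading")]
t1, t2 = keyword_window(transcript, "please read", "finished reading")
```

The video would then be cut over [t1, t2] before computing the blurriness curve.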
In some optional implementation manners of this embodiment, step S203 includes the following steps:
analyzing the text recording video clip to be detected into a video frame set;
and extracting L video frame subsets from the video frame set at a set interval, wherein L is a positive integer greater than 1, each video frame subset consists of M temporally adjacent video frames from the video frame set, and M is a positive integer greater than 1.
In this embodiment, since the text recorded video to be detected serves as an important element for post-event supervision, focus must be held for a certain duration while the text material is recorded so that the material can be identified by human eyes afterwards. To avoid a deviation between the definition detection and human-eye identification that a sparse extraction scheme might introduce (i.e. the computer judges the video clear while human eyes cannot read it), several temporally adjacent video frames are extracted at regular intervals: for example, from a video segment of 300 frames, M consecutive frames are extracted at a fixed frame-number interval, where M is a positive integer greater than 1, e.g. 5 consecutive frames every 20 frames.
By continuously extracting several video frames at a time, the fact that human-eye identification requires a certain duration is simulated, making the definition detection more accurate.
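The interval-plus-run extraction can be sketched as below, returning frame indices only. With the numbers from the example (300 frames, 5 consecutive frames every 20 frames), this yields 15 subsets.

```python
def extract_frame_subsets(total_frames, interval, m):
    """Every `interval` frames, take `m` temporally adjacent frame indices
    as one subset; a subset that would run past the end is dropped."""
    subsets = []
    for start in range(0, total_frames, interval):
        if start + m <= total_frames:
            subsets.append(list(range(start, start + m)))
    return subsets

subsets = extract_frame_subsets(300, 20, 5)
```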
In some optional implementations of this embodiment, step S205 includes the following steps:
step S301, judging the definition of each video frame subset according to the frame definition of each frame;
and S302, judging the definition of the text recording video clip to be detected according to the definition of each video frame subset.
In the above embodiment, the definition of each extracted video frame subset is determined from the frame definition of each frame. This may be done by calculating the ratio of the number of clear video frames in the subset to the total number of video frames in the subset and comparing that ratio with a set threshold: if the ratio is greater than the threshold, the video frame subset is judged clear; otherwise it is judged blurry.
In some optional implementations of this embodiment, step S302 includes the following steps:
calculating the ratio of the number of clear video frame subsets in each video frame subset to the total number L of the extracted video frame subsets according to the definition of each video frame subset;
and comparing the ratio with a preset first threshold value, and judging that the text recording video clip to be detected is clear when the ratio is greater than the first threshold value.
In the above embodiment, the definition of the text recorded video segment to be detected is determined by calculating the ratio of the number of clear video frame subsets to the total number of extracted video frame subsets. When the ratio is greater than the preset first threshold, the text recorded video segment to be detected is judged clear; otherwise it is judged blurry.
Because each video frame subset is composed of a plurality of temporally adjacent video frames, the scheme simulates the dwell time that human-eye identification requires and avoids a deviation between the computer's judgment and human-eye identification.
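A minimal sketch of this two-level ratio judgment follows. The threshold values (0.6 per subset, 0.8 for the first threshold) are illustrative, since the text leaves the exact values to configuration; the function names are likewise illustrative.

```python
# Two-level ratio judgment: a subset is clear when the fraction of clear
# frames in it exceeds a per-subset threshold, and the whole segment is
# clear when the fraction of clear subsets exceeds the first threshold.

def subset_is_clear(frame_flags, threshold=0.6):
    """frame_flags: list of booleans, one per frame in the subset."""
    return sum(frame_flags) / len(frame_flags) > threshold

def segment_is_clear(subset_flags_list, first_threshold=0.8):
    """subset_flags_list: one list of per-frame flags per subset."""
    clear_subsets = sum(subset_is_clear(f) for f in subset_flags_list)
    return clear_subsets / len(subset_flags_list) > first_threshold
```

Note that both comparisons are strict (`>`), matching the "greater than the first threshold" wording of the claims, so a ratio exactly equal to the threshold is judged blurry.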
In some optional implementations of this embodiment, step S204 includes the following steps:
respectively calculating the number of characters contained in the character recognition result of each frame;
and comparing the number of the characters with a preset second threshold value, and judging that the corresponding video frame is clear when the number of the characters is greater than the second threshold value.
In the above embodiment, whether a video frame is clear is determined from the number of characters contained in its character recognition result. For example, with the second threshold set to 20, a video frame for which the OCR-based character recognition model recognizes more than 20 characters is judged clear; otherwise it is judged blurry.
By setting a threshold on the character count, this embodiment judges whether a video frame is clear in a more objective and accurate manner.
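The character-count rule can be sketched as follows. The OCR engine itself is deliberately left out (the text only requires an OCR-based character recognition model, not a specific one), so the function takes an already-recognized string per frame; `frame_is_clear` is an illustrative name, and the threshold of 20 follows the example above.

```python
# Sketch of the second-threshold rule: a frame is judged clear when the
# number of recognized characters (whitespace excluded) is greater than
# the preset second threshold, per the claim wording.

def frame_is_clear(ocr_text, second_threshold=20):
    """ocr_text: the character recognition result for one video frame."""
    count = len(ocr_text.replace(" ", "").replace("\n", ""))
    return count > second_threshold
```

The intuition is that a blurry frame yields few or no recognizable characters, so the recognized-character count acts as a proxy for legibility to the human eye.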
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a computer readable storage medium, and when executed, the processes of the embodiments of the methods described above can be included. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential: they may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 4, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a device for detecting the definition of a text recorded video. The device embodiment corresponds to the method embodiment shown in fig. 2, and the device may be applied to various electronic devices.
As shown in fig. 4, the device 400 for detecting the definition of a text recorded video according to this embodiment includes: an acquisition module 401, an intercepting module 402, an extraction module 403, a processing module 404, and a judging module 405. Wherein:
an acquisition module 401, configured to acquire a service recorded video;
an intercepting module 402, configured to calculate a blurriness curve of the service recorded video and intercept the service recorded video according to the blurriness curve to obtain a text recorded video segment to be detected;
an extraction module 403, configured to extract N video frames from the text recorded video segment to be detected, where N is a positive integer greater than 1;
a processing module 404, configured to input the N video frames into an OCR-based character recognition model, obtain a character recognition result of each frame in the N video frames, and determine a frame definition of each frame according to the character recognition result;
and a judging module 405, configured to judge the definition of the text recording video segment to be detected according to the frame definition of each frame.
In this embodiment, a service recorded video is acquired; a blurriness curve of the service recorded video is calculated, and the service recorded video is intercepted according to the blurriness curve to obtain a text recorded video segment to be detected; N video frames are extracted from the text recorded video segment to be detected, where N is a positive integer greater than 1; the N video frames are input into an OCR-based character recognition model to obtain a character recognition result for each of the N frames, and the frame definition of each frame is judged according to the character recognition result; and the definition of the text recorded video segment to be detected is judged according to the frame definition of each frame. The definition of the text recorded video segment is thus detected without human eyes watching the video, which saves time and labor and is more efficient.
In some optional implementations of this embodiment, the device for detecting the definition of a text recorded video further includes:
the first acquisition submodule is used for acquiring audio synchronized with the service recorded video;
the first processing sub-module is used for performing character conversion on the audio to obtain a character conversion result, comparing the character conversion result with a preset first keyword and a preset second keyword to obtain a first time point and a second time point of the first keyword and the second keyword appearing in the audio for the first time;
the first intercepting submodule is used for intercepting the service recorded video according to a time period formed by the first time point and the second time point to obtain a first video segment;
and the second intercepting submodule is used for calculating a blurriness curve of the first video segment and intercepting the first video segment according to the blurriness curve to obtain the text recorded video segment to be detected.
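The keyword-based interception performed by these submodules can be sketched as follows, assuming the character conversion step yields (word, time-in-seconds) pairs; the function and keyword names are illustrative, and the returned window would be handed to the actual video cutter.

```python
# Sketch of the keyword-based interception: locate the first occurrences
# of the two preset keywords in the transcribed audio, and keep the video
# between those two time points as the first video segment.

def keyword_time_points(transcript, first_kw, second_kw):
    """transcript: list of (word, time_in_seconds) pairs.
    Returns the first times each keyword appears, or None if absent."""
    t1 = next((t for w, t in transcript if w == first_kw), None)
    t2 = next((t for w, t in transcript if w == second_kw), None)
    return t1, t2

def intercept_segment(transcript, first_kw, second_kw):
    """Return the (start, end) window to cut, or None when the
    keywords are missing or out of order."""
    t1, t2 = keyword_time_points(transcript, first_kw, second_kw)
    if t1 is None or t2 is None or t2 <= t1:
        return None
    return (t1, t2)
```

Because the audio is synchronized with the service recorded video, the same time window can be applied directly to the video track.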
In some optional implementations of this embodiment, the extraction module 403 includes:
the first parsing submodule is used for parsing the text recorded video segment to be detected into a video frame set;
the first extraction submodule is used for extracting L video frame subsets from the video frame set according to a set interval, wherein L is a positive integer larger than 1, the video frame subsets are formed by M video frames adjacent to each other in time in the video frame set, and M is a positive integer larger than 1.
In some optional implementations of this embodiment, the determining module 405 includes:
the second processing submodule is used for judging the definition of each video frame subset according to the frame definition of each frame;
and the first judgment submodule is used for judging the definition of the text recording video clip to be detected according to the definition of each video frame subset.
Further, the first judgment sub-module includes:
a first calculating subunit, configured to calculate, according to the definition of each video frame subset, a ratio of the number of clear video frame subsets in each video frame subset to the total number L of extracted video frame subsets;
and the first judging subunit is used for comparing the ratio with a preset first threshold value, and judging that the text recording video clip to be detected is clear when the ratio is greater than the first threshold value.
In some optional implementations of this embodiment, the processing module 404 includes:
the first calculation submodule is used for respectively calculating the number of characters contained in the character recognition result of each frame;
and the second judgment submodule is used for comparing the number of the characters with a preset second threshold value, and judging that the corresponding video frame is clear when the number of the characters is greater than the second threshold value.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 5 comprises a memory 51, a processor 52 and a network interface 53 communicatively connected to each other via a system bus. It is noted that only a computer device 5 having components 51-53 is shown, but it should be understood that not all of the illustrated components need be implemented; more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 51 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 51 may be an internal storage unit of the computer device 5, such as a hard disk or a memory of the computer device 5. In other embodiments, the memory 51 may also be an external storage device of the computer device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 5. Of course, the memory 51 may also comprise both an internal storage unit of the computer device 5 and an external storage device thereof. In this embodiment, the memory 51 is generally used for storing an operating system installed in the computer device 5 and various application software, such as computer readable instructions of a method for detecting the definition of a text-recorded video. Further, the memory 51 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 52 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 52 is typically used to control the overall operation of the computer device 5. In this embodiment, the processor 52 is configured to execute the computer readable instructions stored in the memory 51 or to process data, for example to execute the computer readable instructions of the method for detecting the definition of a text recorded video.
The network interface 53 may comprise a wireless network interface or a wired network interface, and the network interface 53 is generally used for establishing communication connections between the computer device 5 and other electronic devices.
In this embodiment, a service recorded video is acquired; a blurriness curve of the service recorded video is calculated, and the service recorded video is intercepted according to the blurriness curve to obtain a text recorded video segment to be detected; N video frames are extracted from the text recorded video segment to be detected, where N is a positive integer greater than 1; the N video frames are input into an OCR-based character recognition model to obtain a character recognition result for each of the N frames, and the frame definition of each frame is judged according to the character recognition result; and the definition of the text recorded video segment to be detected is judged according to the frame definition of each frame. The definition of the text recorded video segment is thus detected without human eyes watching the video, which saves time and labor and is more efficient.
The present application provides yet another embodiment, namely a computer-readable storage medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by at least one processor to cause the at least one processor to perform the steps of the method for detecting the definition of a text recorded video as described above.
In this embodiment, a service recorded video is acquired; a blurriness curve of the service recorded video is calculated, and the service recorded video is intercepted according to the blurriness curve to obtain a text recorded video segment to be detected; N video frames are extracted from the text recorded video segment to be detected, where N is a positive integer greater than 1; the N video frames are input into an OCR-based character recognition model to obtain a character recognition result for each of the N frames, and the frame definition of each frame is judged according to the character recognition result; and the definition of the text recorded video segment to be detected is judged according to the frame definition of each frame. The definition of the text recorded video segment is thus detected without human eyes watching the video, which saves time and labor and is more efficient.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, not all, of the embodiments of the present application, and the appended drawings illustrate preferred embodiments without limiting the scope of the application. The application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their features may be replaced by equivalents. All equivalent structures made by using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.
Claims (10)
1. A method for detecting the definition of a text recorded video is characterized by comprising the following steps:
acquiring a service recorded video;
calculating a blurriness curve of the service recorded video, and intercepting the service recorded video according to the blurriness curve to obtain a text recorded video segment to be detected;
extracting N video frames in the text recording video clip to be detected, wherein N is a positive integer greater than 1;
inputting the N video frames into an OCR-based character recognition model, obtaining a character recognition result of each frame in the N video frames, and judging the frame definition of each frame according to the character recognition result;
and judging the definition of the text recording video clip to be detected according to the frame definition of each frame.
2. The method for detecting the definition of a text recorded video according to claim 1, wherein after the step of acquiring a service recorded video, the method further comprises:
acquiring audio synchronized with the service recorded video;
performing character conversion on the audio to obtain a character conversion result, and comparing the character conversion result with a preset first keyword and a preset second keyword to obtain a first time point at which the first keyword first appears in the audio and a second time point at which the second keyword first appears in the audio;
intercepting the service recording video according to a time period formed by the first time point and the second time point to obtain a first video segment;
and calculating a blurriness curve of the first video segment, and intercepting the first video segment according to the blurriness curve to obtain the text recorded video segment to be detected.
3. The method for detecting the definition of a text recorded video according to claim 1, wherein the step of extracting N video frames from the text recorded video segment to be detected comprises:
parsing the text recorded video segment to be detected into a video frame set;
and extracting L video frame subsets from the video frame set according to a set interval, wherein L is a positive integer larger than 1, the video frame subsets are formed by M video frames adjacent to each other in time in the video frame set, and M is a positive integer larger than 1.
4. The method for detecting the definition of a text recorded video according to claim 3, wherein the step of judging the definition of the text recorded video segment to be detected according to the frame definition of each frame comprises:
judging the definition of each video frame subset according to the frame definition of each frame;
and judging the definition of the text recording video clip to be detected according to the definition of each video frame subset.
5. The method for detecting the definition of a text recorded video according to claim 4, wherein the step of judging the definition of the text recorded video segment to be detected according to the definition of each video frame subset comprises:
calculating the ratio of the number of clear video frame subsets in each video frame subset to the total number L of the extracted video frame subsets according to the definition of each video frame subset;
and comparing the ratio with a preset first threshold value, and judging that the text recording video clip to be detected is clear when the ratio is greater than the first threshold value.
6. The method for detecting the definition of a text recorded video according to claim 1, wherein the step of inputting the N video frames into an OCR-based character recognition model to obtain a character recognition result of each of the N video frames and judging the frame definition of each frame according to the character recognition result comprises:
respectively calculating the number of characters contained in the character recognition result of each frame;
and comparing the number of the characters with a preset second threshold value, and judging that the corresponding video frame is clear when the number of the characters is greater than the second threshold value.
7. A device for detecting the definition of a text recorded video, characterized by comprising:
the acquisition module is used for acquiring a service recorded video;
the intercepting module is used for calculating a blurriness curve of the service recorded video and intercepting the service recorded video according to the blurriness curve to obtain a text recorded video segment to be detected;
the extraction module is used for extracting N video frames in the text recording video clip to be detected, wherein N is a positive integer greater than 1;
the processing module is used for inputting the N video frames into an OCR-based character recognition model, obtaining a character recognition result of each frame in the N video frames, and judging the frame definition of each frame according to the character recognition result;
and the judging module is used for judging the definition of the text recording video clip to be detected according to the frame definition of each frame.
8. The device for detecting the definition of a text recorded video according to claim 7, further comprising:
the first acquisition submodule is used for acquiring audio synchronized with the service recorded video;
the first processing sub-module is used for performing character conversion on the audio to obtain a character conversion result, and comparing the character conversion result with a preset first keyword and a preset second keyword to obtain a first time point at which the first keyword first appears in the audio and a second time point at which the second keyword first appears in the audio;
the first intercepting submodule is used for intercepting the service recorded video according to a time period formed by the first time point and the second time point to obtain a first video segment;
and the second intercepting submodule is used for calculating a blurriness curve of the first video segment and intercepting the first video segment according to the blurriness curve to obtain the text recorded video segment to be detected.
9. A computer device, comprising a memory having computer readable instructions stored therein and a processor which, when executing the computer readable instructions, implements the steps of the method for detecting the definition of a text recorded video according to any one of claims 1 to 6.
10. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the method for detecting the definition of a text recorded video according to any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011286396.2A CN112419257A (en) | 2020-11-17 | 2020-11-17 | Method and device for detecting definition of text recorded video, computer equipment and storage medium |
PCT/CN2021/124389 WO2022105507A1 (en) | 2020-11-17 | 2021-10-18 | Text recording video definition measurement method and apparatus, computer device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112419257A true CN112419257A (en) | 2021-02-26 |
Family
ID=74830915
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011286396.2A Pending CN112419257A (en) | 2020-11-17 | 2020-11-17 | Method and device for detecting definition of text recorded video, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112419257A (en) |
WO (1) | WO2022105507A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022105507A1 (en) * | 2020-11-17 | 2022-05-27 | 深圳壹账通智能科技有限公司 | Text recording video definition measurement method and apparatus, computer device and storage medium |
CN114926464A (en) * | 2022-07-20 | 2022-08-19 | 平安银行股份有限公司 | Image quality inspection method, image quality inspection device and system in double-recording scene |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102968638A (en) * | 2011-08-31 | 2013-03-13 | 上海夏尔软件有限公司 | Image sharpness judgment method based on keyword optical character recognition |
CN107846622A (en) * | 2017-10-27 | 2018-03-27 | 北京雷石天地电子技术有限公司 | A kind of method and device for detecting captions definition |
CN109831665A (en) * | 2019-01-16 | 2019-05-31 | 深圳壹账通智能科技有限公司 | A kind of video quality detecting method, system and terminal device |
CN111683285A (en) * | 2020-08-11 | 2020-09-18 | 腾讯科技(深圳)有限公司 | File content identification method and device, computer equipment and storage medium |
CN111741356A (en) * | 2020-08-25 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Quality inspection method, device and equipment for double-recording video and readable storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10638168B2 (en) * | 2018-09-06 | 2020-04-28 | International Business Machines Corporation | Detecting minimum viable display resolution of media content using optical character recognition |
CN112419257A (en) * | 2020-11-17 | 2021-02-26 | 深圳壹账通智能科技有限公司 | Method and device for detecting definition of text recorded video, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2022105507A1 (en) | 2022-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11436863B2 (en) | Method and apparatus for outputting data | |
US11270099B2 (en) | Method and apparatus for generating facial feature | |
CN112863683B (en) | Medical record quality control method and device based on artificial intelligence, computer equipment and storage medium | |
CN112561684A (en) | Financial fraud risk identification method and device, computer equipment and storage medium | |
KR102002024B1 (en) | Method for processing labeling of object and object management server | |
CN112330331A (en) | Identity verification method, device and equipment based on face recognition and storage medium | |
CN112686243A (en) | Method and device for intelligently identifying picture characters, computer equipment and storage medium | |
CN110941978B (en) | Face clustering method and device for unidentified personnel and storage medium | |
WO2022105507A1 (en) | Text recording video definition measurement method and apparatus, computer device and storage medium | |
CN114005019B (en) | Method for identifying flip image and related equipment thereof | |
CN114550051A (en) | Vehicle loss detection method and device, computer equipment and storage medium | |
CN113568934B (en) | Data query method and device, electronic equipment and storage medium | |
CN114861241A (en) | Anti-peeping screen method based on intelligent detection and related equipment thereof | |
CN113033552B (en) | Text recognition method and device and electronic equipment | |
CN112651399B (en) | Method for detecting same-line characters in inclined image and related equipment thereof | |
CN114282258A (en) | Screen capture data desensitization method and device, computer equipment and storage medium | |
CN116033259B (en) | Method, device, computer equipment and storage medium for generating short video | |
CN112396060A (en) | Identity card identification method based on identity card segmentation model and related equipment thereof | |
CN112270305A (en) | Card image recognition method and device and electronic equipment | |
CN115620378A (en) | Multi-view cow face intelligent acquisition method, device and system and related equipment | |
CN112395450B (en) | Picture character detection method and device, computer equipment and storage medium | |
CN113936286A (en) | Image text recognition method and device, computer equipment and storage medium | |
CN111291758B (en) | Method and device for recognizing seal characters | |
CN114677769B (en) | Method and device for identifying flipping certificate, computer equipment and storage medium | |
CN111985483B (en) | Method and device for detecting screen shot file picture and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210226 |