WO2020087713A1 - Video quality inspection method, apparatus, computer device, and storage medium

Video quality inspection method, apparatus, computer device, and storage medium

Info

Publication number
WO2020087713A1
Authority
WO
WIPO (PCT)
Prior art keywords
link
video
video picture
transaction
text
Prior art date
Application number
PCT/CN2018/123132
Inventor
付舒婷
Original Assignee
深圳壹账通智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Priority to JP2021508040A (JP7111887B2)
Priority to EP18938571.9A (EP3876549A4)
Priority to SG11202101615QA
Priority to KR1020207036022A (KR20210016551A)
Publication of WO2020087713A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234336Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Definitions

  • the present application relates to the technical field of video processing, and in particular to video quality inspection methods, devices, computer equipment, and storage media.
  • Embodiments of the present application provide a video quality inspection method, device, computer equipment, and storage medium to solve the problem of low timeliness of video quality inspection.
  • a video quality inspection method including:
  • the required reading rate is the proportion of the required reading text that is contained in the target text;
  • the unreadable rate is the proportion of the unreadable text (text that must not be read out) that is contained in the target text;
  • a video quality inspection device including:
  • a frame extraction module, used to perform frame extraction on the target video to obtain each video picture;
  • the first detection module is used to perform face recognition on each video picture, detect whether each video picture includes the face of a designated person, and obtain a first detection result corresponding to each video picture;
  • the voice recognition module is used to perform voice recognition processing on the voice of the target video to obtain the target text;
  • a required reading rate calculation module, used to calculate the required reading rate of the required reading text according to the target text and the preset required reading text, where the required reading rate is the proportion of the required reading text that is contained in the target text;
  • an unreadable rate calculation module, configured to calculate the unreadable rate of the unreadable text according to the target text and the preset unreadable text, where the unreadable rate is the proportion of the unreadable text that is contained in the target text;
  • a second detection module configured to detect whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, to obtain a second detection result
  • a quality inspection failure determination module, used to determine that the target video fails the quality inspection if the first detection result or the second detection result is no;
  • a quality inspection pass determination module, used to determine that the target video passes the quality inspection if both the first detection result and the second detection result are yes.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the above video quality inspection method when executing the computer-readable instructions.
  • One or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above video quality inspection method.
  • FIG. 1 is a schematic diagram of an application environment of a video quality inspection method in an embodiment of the present application
  • FIG. 2 is a flowchart of a video quality inspection method in an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a video quality inspection method step 102 in an application scenario in an embodiment of the present application
  • FIG. 4 is a schematic flowchart of step 202 of a video quality inspection method in an application scenario in an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a video quality inspection method step 301 in an application scenario in an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of a video quality inspection device in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a computer device in an embodiment of the present application.
  • the video quality inspection method provided in this application can be applied in the application environment as shown in FIG. 1, in which the client communicates with the server through the network.
  • the client may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be realized by an independent server or a server cluster composed of multiple servers.
  • a video quality inspection method is provided.
  • the method is applied to the server in FIG. 1 as an example for illustration, which includes the following steps:
  • the server may perform frame extraction processing on the target video to obtain each video picture.
  • the server can extract image frames from the target video at equal time intervals to obtain each video picture. For example, an image frame is extracted as a video picture every 3 seconds. Assuming a video is 30 seconds long, a total of 11 video pictures can be extracted.
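  • The equal-interval sampling above can be sketched as follows. This is a minimal illustration, assuming that sampling starts at 0 seconds and includes the final instant of the video, which is what yields 11 pictures for a 30-second video; the function name `frame_timestamps` is hypothetical and does not come from the application.

```python
from __future__ import annotations


def frame_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """Return the sampling times (in seconds) for equal-interval frame extraction.

    Assumption: sampling starts at t=0 and includes t=duration_s when it
    falls exactly on the sampling grid.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # +1 because the frame at t=0 is counted as well
    count = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(count)]


# A 30-second video sampled every 3 seconds: t = 0, 3, ..., 30
timestamps = frame_timestamps(30, 3)
print(len(timestamps), timestamps)
```

  • The actual decoding of frames at these times would be done by a video library, which is outside the scope of this sketch.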
  • the server can perform face recognition on the video pictures, such as extracting facial features on each video picture.
  • the server may detect whether each video picture includes the face of the designated person, thereby obtaining the first detection result corresponding to each video picture. It is understandable that when conducting quality inspections on the video of an insurance transaction process, it is very important to check whether the insured person and the agent of the insurance product transaction appear in the target video. If the faces of the insured person and the agent appear in the target video, the target video can be regarded as meeting this requirement.
  • the designated person may be an agent and/or an insured person, and detecting whether each video picture includes the face of the designated person may specifically be: detecting whether the facial features in each video picture include the designated person's reserved facial features.
  • the server can store the facial features of the designated person in advance as reserved facial features. For example, the agent, as an employee of the insurance company, can reserve his or her facial features on the server; before the insurance transaction, the agent can ask the insured person to undergo face recognition, so that the insured person's facial features are collected and reserved on the server.
  • when the target video is actually inspected, since the target video is obtained by recording a designated transaction process, and a designated transaction process often includes multiple transaction links, the target video can be divided into video segments corresponding to the transaction links, and each segment can be inspected separately. Quality inspection standards, that is, preset quality inspection conditions, can be set in advance for these transaction links. When the video segments corresponding to all transaction links meet the preset quality inspection conditions, the target video can be considered to pass the detection. To this end, in this embodiment, the target video is obtained by recording a designated transaction process, the designated transaction process includes multiple transaction links, and the video pictures include the link video pictures corresponding to each transaction link. As shown in FIG. 3, step 102 may specifically include:
  • For each transaction link, perform face recognition on the link video pictures corresponding to the transaction link, detect whether the facial features of the designated person included in the link video pictures meet the preset quality inspection conditions, and obtain the link detection result of the transaction link;
  • For step 201, it can be understood that the preset quality inspection conditions corresponding to each transaction link can be set in advance on the server and are obtained when the quality inspection is performed.
  • For step 202, there may be more than one link video picture corresponding to each transaction link. For example, assume that the target video is 30 seconds long and covers 3 transaction links, each lasting 10 seconds. If a frame is extracted every 2 seconds as a video picture, the first transaction link has a total of 5 link video pictures.
  • the server can perform face recognition on each link video picture corresponding to each transaction link, and detect whether the facial features of the designated person included in these link video pictures meet the preset quality inspection conditions, thereby obtaining the transaction link Link test results. It should be noted that the preset quality inspection conditions corresponding to each transaction link can be set according to actual use conditions.
  • the designated person does not necessarily face the lens at all times while the video is recorded, so the facial features of the designated person may not be detectable in some link video pictures.
  • for this reason, the preset quality inspection condition may be that the facial features in at least a certain proportion of the link video pictures of a transaction link include the reserved facial features of the designated person.
  • when this condition is met, the link detection result of the transaction link is yes.
  • for example, a transaction link has a total of 5 link video pictures, and the preset quality inspection condition is considered met if more than 60% of the link video pictures include the insured person's face. If the insured person's face is detected in 4 of the 5 link video pictures, the link detection result of the transaction link can be considered yes.
  • For step 203 and step 204, it can be understood that when the link detection result corresponding to every transaction link is yes, the video segment corresponding to each transaction link of the designated transaction process meets the requirements, so it can be determined that the first detection result corresponding to the video pictures is yes. Conversely, when any one of the link detection results is no, at least one video segment does not meet the requirements, so the server may determine that the first detection result corresponding to the video pictures is no.
  • step 202 may specifically include:
  • the qualified ratio refers to the proportion of link video pictures whose judgment result is yes among the link video pictures of the transaction link.
  • the server can determine whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection conditions .
  • For step 304, as noted above, the designated person does not necessarily face the lens at all times during recording, so the facial features of the designated person may not be detectable in some link video pictures of a transaction link. For this reason, after obtaining the judgment results for the link video pictures of the transaction link, the server only needs to determine whether the qualified ratio exceeds the preset ratio threshold. If so, the transaction link as a whole meets the detection requirements; if not, the transaction link as a whole does not meet them.
  • the qualified ratio refers to the proportion of link video pictures whose judgment result is yes among all link video pictures of the transaction link. For example, if there are 5 link video pictures under a certain transaction link and the judgment result of 3 of them is yes, the qualified ratio is 60%.
  • For step 305 and step 306, if the qualified ratio exceeds the preset ratio threshold, the transaction link can be considered to meet the detection requirements as a whole, so the server can determine that the link detection result of the transaction link is yes; otherwise, if the qualified ratio does not exceed the preset ratio threshold, the transaction link can be considered not to meet the detection requirements as a whole, so the server can determine that the link detection result of the transaction link is no.
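  • The qualified-ratio check for one transaction link, and the aggregation of the link detection results into the first detection result, can be sketched as follows. This is an illustrative sketch only: the function names and the dictionary layout are hypothetical, and "exceeds" is read as a strict greater-than comparison.

```python
from __future__ import annotations


def qualified_ratio(judgments: list[bool]) -> float:
    """Proportion of link video pictures whose judgment result is yes."""
    if not judgments:
        raise ValueError("a transaction link needs at least one link video picture")
    return sum(judgments) / len(judgments)


def link_detection_result(judgments: list[bool], ratio_threshold: float) -> bool:
    """Yes only if the qualified ratio exceeds the preset ratio threshold."""
    return qualified_ratio(judgments) > ratio_threshold


def first_detection_result(
    judgments_by_link: dict[str, list[bool]], ratio_threshold: float
) -> bool:
    """Yes only if the link detection result of every transaction link is yes."""
    return all(
        link_detection_result(judgments, ratio_threshold)
        for judgments in judgments_by_link.values()
    )


# Hypothetical example: two transaction links, five link video pictures each
judgments_by_link = {
    "link_1": [True, True, True, True, False],   # qualified ratio 80%
    "link_2": [True, False, False, False, False],  # qualified ratio 20%
}
print(first_detection_result(judgments_by_link, ratio_threshold=0.6))
```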
  • the designated transaction process is a sales process of insurance products, and the designated persons include the agent and the insured person (policyholder) of the sales process. As shown in FIG. 5, in step 301, determining whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection conditions may specifically include:
  • if both the first judgment result and the second judgment result are yes, determining that the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection conditions;
  • For step 401 and step 402, it can be understood that when the designated persons include an agent and an insured person, the server can separately determine whether the faces of the agent and the insured person appear in the link video picture. Specifically, the server can determine whether the facial features identified in the link video picture include the first facial features reserved by the insured person, to obtain a first judgment result, and whether they include the second facial features reserved by the agent, to obtain a second judgment result.
  • For steps 403 and 404, once the first judgment result and the second judgment result have been obtained: when both are yes, the link video picture includes both the face of the agent and the face of the insured person, so it can be determined that the facial features of the designated persons in the link video picture are consistent with the reserved facial features required by the preset quality inspection conditions; otherwise, when the first judgment result or the second judgment result is no, the face of the agent or the face of the insured person is missing from the link video picture, so it can be determined that the facial features of the designated persons in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection conditions.
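  • The two judgments for a single link video picture can be sketched as follows. The application does not say how facial features are represented or compared; this sketch assumes, purely for illustration, that each face is a numeric feature vector and that two faces match when their cosine similarity reaches a hypothetical `min_similarity`. All names here are hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contains_face(
    detected: list[list[float]], reserved: list[float], min_similarity: float = 0.9
) -> bool:
    """True if any face detected in the picture matches the reserved facial features."""
    return any(cosine_similarity(face, reserved) >= min_similarity for face in detected)


def picture_is_consistent(
    detected: list[list[float]],
    insured_reserved: list[float],
    agent_reserved: list[float],
) -> bool:
    """Consistent only if both the insured person and the agent are found."""
    first_judgment = contains_face(detected, insured_reserved)   # insured person
    second_judgment = contains_face(detected, agent_reserved)    # agent
    return first_judgment and second_judgment
```

  • In practice the feature vectors would come from a face recognition model, and the similarity measure and threshold would need to be chosen for that model.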
  • steps 101-102 mainly perform quality inspection on the images in the target video, while steps 103-106 perform quality inspection on the voice in the target video. These two groups of steps can be performed independently.
  • those skilled in the art should understand that there is no strict execution order between steps 101-102 and steps 103-106.
  • the server may perform voice recognition processing on the voice of the target video, that is, perform voice recognition on the audio in the target video to obtain the target text.
  • the server may calculate the required reading rate of the required reading text according to the target text and the preset required reading text, where the required reading rate is the proportion of the required reading text that is contained in the target text. For example, assume that the required reading text includes 10 designated sentences and that detection finds 9 of these sentences in the target text. The content of the target text matching the required reading text therefore accounts for 90% of the required reading text, that is, the required reading rate is 90%.
  • the server may calculate the unreadable rate of the unreadable text according to the target text and the preset unreadable text, where the unreadable rate is the proportion of the unreadable text that is contained in the target text. For example, assume that the preset unreadable text includes 10 specified sentences and that detection finds one of these sentences in the target text. The content of the target text matching the unreadable text therefore accounts for 10% of the unreadable text, that is, the unreadable rate is 10%.
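  • Both rates can be computed with the same sentence-level calculation, sketched below. The application does not specify how a sentence is matched against the target text; exact substring matching is assumed here for simplicity, and the name `sentence_rate` is hypothetical.

```python
from __future__ import annotations


def sentence_rate(target_text: str, preset_sentences: list[str]) -> float:
    """Proportion of the preset sentences that appear in the target text.

    Assumption: a sentence counts as present when it occurs verbatim in the
    target text. Real transcripts would likely need fuzzy matching.
    """
    if not preset_sentences:
        raise ValueError("preset_sentences must not be empty")
    hits = sum(1 for sentence in preset_sentences if sentence in target_text)
    return hits / len(preset_sentences)


# Hypothetical data: 10 required sentences, 9 of which were spoken
required = [f"Required sentence number {i:02d}." for i in range(10)]
unreadable = [f"Forbidden sentence number {i:02d}." for i in range(10)]
target_text = " ".join(required[:9] + unreadable[:1])

required_reading_rate = sentence_rate(target_text, required)
unreadable_rate = sentence_rate(target_text, unreadable)
print(required_reading_rate, unreadable_rate)
```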
  • the server can set the first threshold and the second threshold in advance, and then detect whether the required reading rate is higher than the preset first threshold and the unreadable rate is lower than the preset second threshold, to obtain a second detection result.
  • the higher the first threshold, the higher the required reading rate that the designated transaction process must reach; the lower the second threshold, the lower the unreadable rate that the designated transaction process is allowed to have.
  • for example, the preset first threshold may be set to 90% and the preset second threshold may be set to 0. A preset second threshold of 0 indicates that the target video of the designated transaction process must not contain any unreadable text, for example, no insulting sentences.
  • step 107 Determine whether both the first detection result and the second detection result are yes, if not, go to step 108, if yes, go to step 109
  • For step 107, the quality inspection of the target video requires both the images and the voice of the target video to pass. Therefore, if the first detection result or the second detection result is no, it may be determined that the target video fails the quality inspection; otherwise, if both the first detection result and the second detection result are yes, it may be determined that the target video passes the quality inspection.
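  • The second detection and the final decision can be sketched as follows. The function names are hypothetical. The comparisons are strict, following the wording "higher than" and "lower than"; whether a boundary value (for example an unreadable rate of exactly 0 against a second threshold of 0) should count as passing is not settled by the text and is left open here.

```python
def second_detection_result(
    required_reading_rate: float,
    unreadable_rate: float,
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """Yes if the required reading rate is above the first threshold
    and the unreadable rate is below the second threshold."""
    return (
        required_reading_rate > first_threshold
        and unreadable_rate < second_threshold
    )


def quality_inspection_passes(first_result: bool, second_result: bool) -> bool:
    """The target video passes only if both detection results are yes."""
    return first_result and second_result


# Hypothetical values: image check passed, 95% required text read, nothing forbidden
second = second_detection_result(0.95, 0.0, first_threshold=0.9, second_threshold=0.05)
print(quality_inspection_passes(True, second))
```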
  • after the target video has undergone the server's automatic quality inspection, it may be randomly sampled for review at a manual quality inspection station, regardless of whether it passed. To facilitate this review, important time points of the target video can be marked during the quality inspection process, with links that jump to these time points so that staff can check them quickly.
  • the method may also include one or more of the following four marking methods:
  • the first way: mark the first time point at which the missed reading link starts playing in the target video. The missed reading link refers to the transaction link in which required reading text missing from the target text is located. The marked first time point is provided with a link that jumps to the target video and starts playing from the first time point.
  • for the first way, the first time point at which the missed reading link starts playing in the target video can be marked and given a jump link. Staff can click the first time point directly, and the server automatically opens the target video and positions it at the start of the missed reading link, which greatly facilitates the spot-check work.
  • the second way: mark the second time point at which the unreadable link starts playing in the target video. The unreadable link refers to the transaction link in which unreadable text appears in the target text. The marked second time point is provided with a link that jumps to the target video and starts playing from the second time point.
  • for the second way, the server may mark the second time point at which the unreadable link starts playing in the target video and give it a jump link. Staff can click the second time point directly, and the server automatically opens the target video and positions it at the start of the unreadable link, which greatly facilitates the spot-check work.
  • the third way: mark the third time point at which each transaction link starts playing in the target video. The marked third time point is provided with a link that jumps to the target video and starts playing from the third time point. For the third way, staff sometimes want to check the images and voice of a particular transaction link in a spot-checked target video, so the server can also mark the third time point at which each transaction link starts playing in the target video and give it a jump link. Staff can click the third time point, and the server automatically opens the target video and starts playing it from the start of that transaction link.
  • the fourth way: mark the fourth time point at which each video picture is located in the target video. The marked fourth time point is provided with a link that jumps to the playback position at the fourth time point of the target video.
  • for the fourth way, each video picture is extracted from the target video during the automatic quality inspection process, so the server may also mark the fourth time point of the playback position of each video picture in the target video and give it a jump link. Staff can click the fourth time point corresponding to a given video picture, and the server automatically opens the target video and positions it at the playback position of that video picture.
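  • One possible way to represent the marked time points and their jump links is sketched below. The application does not describe the link format; this sketch assumes a temporal fragment of the form `#t=<seconds>` appended to the video address, and the `TimeMarker` type, the `make_marker` function, and the example URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimeMarker:
    label: str      # what the time point marks, e.g. a missed reading link
    seconds: float  # playback position in the target video
    link: str       # jump link that opens the video at that position


def make_marker(video_url: str, label: str, seconds: float) -> TimeMarker:
    """Build a marker whose link points at the given playback position.

    Assumption: the player honours a temporal fragment such as '#t=20'.
    """
    return TimeMarker(label=label, seconds=seconds, link=f"{video_url}#t={seconds:g}")


# Hypothetical usage: the missed reading link starts 20 seconds into the video
marker = make_marker("https://example.invalid/target.mp4", "missed reading link", 20)
print(marker.link)
```

  • The same structure could hold all four kinds of marker (missed reading link, unreadable link, start of each transaction link, position of each video picture), distinguished only by the label.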
  • in this embodiment, first, frame extraction is performed on the target video to obtain each video picture; then, face recognition is performed on each video picture to detect whether it includes the face of the designated person, yielding the first detection result corresponding to each video picture. In addition, speech recognition is performed on the voice of the target video to obtain the target text; the required reading rate of the required reading text is calculated according to the target text and the preset required reading text, where the required reading rate is the proportion of the required reading text that is contained in the target text; and the unreadable rate of the unreadable text is calculated according to the target text and the preset unreadable text, where the unreadable rate is the proportion of the unreadable text that is contained in the target text. Finally, it is detected whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, to obtain a second detection result. If the first detection result or the second detection result is no, it is determined that the target video fails the quality inspection; if both the first detection result and the second detection result are yes, it is determined that the target video passes the quality inspection.
  • in this way, whether the target video passes the quality inspection is determined from the first detection result obtained by inspecting the video pictures and the second detection result obtained by inspecting the voice. This not only allows video quality inspection to be completed more accurately but also improves its efficiency, and timeliness can be ensured even when a large number of videos require quality inspection.
  • In an embodiment, a video quality inspection device is provided that corresponds one-to-one to the video quality inspection method in the foregoing embodiment.
  • As shown in FIG. 6, the video quality inspection device includes a frame extraction module 501, a first detection module 502, a speech recognition module 503, a required reading rate calculation module 504, an unreadable rate calculation module 505, a second detection module 506, a quality inspection failure determination module 507, and a quality inspection pass determination module 508.
  • The functional modules are described in detail as follows:
  • the frame extraction module 501 is configured to perform frame extraction on the target video to obtain video pictures;
  • the first detection module 502 is configured to perform face recognition on the video pictures, detect whether the video pictures include the face of a designated person, and obtain a first detection result corresponding to the video pictures;
  • the speech recognition module 503 is configured to perform speech recognition on the voice of the target video to obtain target text;
  • the required reading rate calculation module 504 is configured to calculate the required reading rate of the required reading text according to the target text and the preset required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text;
  • the unreadable rate calculation module 505 is configured to calculate the unreadable rate of the unreadable text according to the target text and the preset unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text;
  • the second detection module 506 is configured to detect whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, to obtain a second detection result;
  • the quality inspection failure determination module 507 is configured to determine that the target video fails the quality inspection if the first detection result or the second detection result is no;
  • the quality inspection pass determination module 508 is configured to determine that the target video passes the quality inspection if both the first detection result and the second detection result are yes.
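The way modules 501 to 508 cooperate can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class name `VideoQualityInspector`, its field names, and the idea of passing each module in as a callable are hypothetical and are not taken from the application; real frame extraction, face recognition, and speech recognition components would be plugged in where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VideoQualityInspector:
    """Hypothetical wiring of modules 501-508; every field name is an assumption."""
    extract_frames: Callable[[str], list]      # stands in for module 501
    detect_faces: Callable[[list], bool]       # stands in for module 502
    recognize_speech: Callable[[str], str]     # stands in for module 503
    required_rate: Callable[[str], float]      # stands in for module 504
    unreadable_rate: Callable[[str], float]    # stands in for module 505
    first_threshold: float
    second_threshold: float

    def inspect(self, video_path: str) -> bool:
        # Image branch: frames -> first detection result.
        first_result = self.detect_faces(self.extract_frames(video_path))
        # Voice branch: speech -> target text -> second detection result (module 506).
        target_text = self.recognize_speech(video_path)
        second_result = (
            self.required_rate(target_text) > self.first_threshold
            and self.unreadable_rate(target_text) < self.second_threshold
        )
        # Modules 507 / 508: the video passes only if both results are yes.
        return first_result and second_result


# Usage with stub components (all values are made up for illustration).
inspector = VideoQualityInspector(
    extract_frames=lambda path: ["frame-0", "frame-1"],
    detect_faces=lambda frames: True,
    recognize_speech=lambda path: "recognized text",
    required_rate=lambda text: 1.0,
    unreadable_rate=lambda text: 0.0,
    first_threshold=0.9,
    second_threshold=0.1,
)
print(inspector.inspect("sale.mp4"))  # True
```

Keeping each module behind a callable mirrors the statement that the modules may be implemented in software, hardware, or a combination of both, since the wiring does not depend on how a module is realized.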
  • Further, the target video is obtained by recording a designated transaction process;
  • the designated transaction process includes transaction links;
  • the video pictures may include the link video pictures corresponding to each transaction link;
  • the first detection module may include:
  • a quality inspection condition obtaining unit, configured to obtain the preset quality inspection condition corresponding to each transaction link;
  • a link detection unit, configured to, for each transaction link, perform face recognition on the link video pictures corresponding to the transaction link and detect whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain the link detection result of the transaction link;
  • a first determining unit, configured to determine that the first detection result corresponding to the video pictures is yes if the link detection results corresponding to all transaction links are yes;
  • a second determining unit, configured to determine that the first detection result corresponding to the video pictures is no if any of the link detection results corresponding to the transaction links is no.
  • Further, the link detection unit may include:
  • a facial feature judgment subunit, configured to judge, for each link video picture, whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
  • a first determining subunit, configured to determine, for each link video picture, that the judgment result of the link video picture is yes if the judgment result of the facial feature judgment subunit is yes;
  • a second determining subunit, configured to determine, for each link video picture, that the judgment result of the link video picture is no if the judgment result of the facial feature judgment subunit is no;
  • a qualified ratio judgment subunit, configured to judge, for each transaction link and after the judgment results of the link video pictures under that transaction link have been obtained, whether the qualified ratio exceeds a preset ratio threshold, the qualified ratio being the proportion of link video pictures whose judgment result is yes among the link video pictures under the transaction link;
  • a third determining subunit, configured to determine that the link detection result of the transaction link is yes if the judgment result of the qualified ratio judgment subunit is yes;
  • a fourth determining subunit, configured to determine that the link detection result of the transaction link is no if the judgment result of the qualified ratio judgment subunit is no.
  • Further, the facial feature judgment subunit may include:
  • a first judgment sub-subunit, configured to judge whether the facial features recognized in the link video picture include the first facial feature reserved by the policyholder, to obtain a first judgment result;
  • a second judgment sub-subunit, configured to judge whether the facial features recognized in the link video picture include the second facial feature reserved by the agent, to obtain a second judgment result;
  • a first determining sub-subunit, configured to determine that the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition if the first judgment result and the second judgment result are both yes;
  • a second determining sub-subunit, configured to determine that the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition if the first judgment result or the second judgment result is no.
  • Further, the video quality inspection device may also include one or more of the following:
  • a first marking module, configured to mark the first time point at which the missed-reading link starts playing in the target video, the missed-reading link being the transaction link that contains required reading text missed in the target text; the marked first time point is provided with a link that jumps to the target video and starts playback from the first time point;
  • a second marking module, configured to mark the second time point at which the unreadable link starts playing in the target video, the unreadable link being the transaction link that contains unreadable text appearing in the target text; the marked second time point is provided with a link that jumps to the target video and starts playback from the second time point;
  • a third marking module, configured to mark the third time points at which each of the transaction links starts playing in the target video; each marked third time point is provided with a link that jumps to the target video and starts playback from that third time point;
  • a fourth marking module, configured to mark the fourth time points at which each of the video pictures is located in the target video; each marked fourth time point is provided with a link that jumps to the playback position of that fourth time point in the target video.
  • For specific limitations on the video quality inspection device, refer to the limitations on the video quality inspection method above. Each module in the video quality inspection device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
  • In an embodiment, a computer device is provided.
  • The computer device may be a server, and its internal structure may be as shown in FIG. 7.
  • The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store the data involved in the video quality inspection method.
  • the network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a video quality inspection method.
  • In an embodiment, a computer device is provided, which includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, the steps of the video quality inspection method in the foregoing embodiment are implemented, for example steps 101 to 109 shown in FIG. 2.
  • Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units of the video quality inspection device in the foregoing embodiment are implemented, for example the functions of modules 501 to 508 shown in FIG. 6. To avoid repetition, details are not repeated here.
  • In an embodiment, a computer-readable storage medium is provided: one or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the video quality inspection method in the above method embodiments.
  • Alternatively, when the computer-readable instructions are executed by one or more processors, the one or more processors implement the functions of the modules/units of the video quality inspection device in the above device embodiments. To avoid repetition, details are not repeated here.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM random access memory
  • DRAM dynamic RAM
  • SDRAM synchronous DRAM
  • DDRSDRAM double data rate SDRAM
  • ESDRAM enhanced SDRAM
  • SLDRAM Synchlink DRAM
  • RDRAM Rambus direct RAM
  • DRDRAM direct Rambus dynamic RAM
  • RDRAM Rambus dynamic RAM

Abstract

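This application discloses a video quality inspection method for addressing the low timeliness of video quality inspection. The method provided by this application includes: performing frame extraction on a target video to obtain video pictures; performing face recognition on the video pictures to detect whether they include the face of a designated person, obtaining a first detection result corresponding to the video pictures; performing speech recognition on the voice of the target video to obtain target text; calculating the required reading rate of the required reading text according to the target text and preset required reading text; calculating the unreadable rate of the unreadable text according to the target text and preset unreadable text; detecting whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, obtaining a second detection result; if both detection results are yes, determining that the target video passes the quality inspection; otherwise, determining that the target video fails the quality inspection. This application also provides a video quality inspection device, a computer device, and a storage medium.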

Description

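Video quality inspection method and apparatus, computer device, and storage medium
This application is based on, and claims priority to, Chinese invention patent application No. 201811301549.9, filed on November 2, 2018 and entitled "Video quality inspection method and apparatus, computer device, and storage medium".
Technical field
This application relates to the technical field of video processing, and in particular to a video quality inspection method and device, a computer device, and a storage medium.
Background
With the development and improvement of the insurance industry, insurance companies impose increasingly strict controls on the insurance sales process. At present, when an agent selling an insurance product and a policyholder go through links such as identity verification, notification of insurance precautions, and signing of the insurance contract, the whole course of these links must be audio- and video-recorded. The recorded videos are submitted to the insurance company's system, where professional quality inspectors check their quality.
However, as sales of insurance products grow, the number of videos recorded from insurance transactions also keeps increasing. A limited number of quality inspectors often cannot finish inspecting the videos in time, which tends to reduce the timeliness of quality inspection for these videos.
Summary of the invention
Embodiments of this application provide a video quality inspection method and device, a computer device, and a storage medium, to solve the problem of low timeliness of video quality inspection.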
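A video quality inspection method, including:
performing frame extraction on a target video to obtain video pictures;
performing face recognition on the video pictures, detecting whether the video pictures include the face of a designated person, and obtaining a first detection result corresponding to the video pictures;
performing speech recognition on the voice of the target video to obtain target text;
calculating a required reading rate of preset required reading text according to the target text and the required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text;
calculating an unreadable rate of preset unreadable text according to the target text and the unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text;
detecting whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, to obtain a second detection result;
if the first detection result or the second detection result is no, determining that the target video fails the quality inspection;
if the first detection result and the second detection result are both yes, determining that the target video passes the quality inspection.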
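A video quality inspection device, including:
a frame extraction module, configured to perform frame extraction on a target video to obtain video pictures;
a first detection module, configured to perform face recognition on the video pictures, detect whether the video pictures include the face of a designated person, and obtain a first detection result corresponding to the video pictures;
a speech recognition module, configured to perform speech recognition on the voice of the target video to obtain target text;
a required reading rate calculation module, configured to calculate a required reading rate of preset required reading text according to the target text and the required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text;
an unreadable rate calculation module, configured to calculate an unreadable rate of preset unreadable text according to the target text and the unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text;
a second detection module, configured to detect whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, to obtain a second detection result;
a quality inspection failure determination module, configured to determine that the target video fails the quality inspection if the first detection result or the second detection result is no;
a quality inspection pass determination module, configured to determine that the target video passes the quality inspection if the first detection result and the second detection result are both yes.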
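A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the above video quality inspection method when executing the computer-readable instructions.
One or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above video quality inspection method.
Details of one or more embodiments of this application are set forth in the drawings and description below; other features and advantages of this application will become apparent from the description, the drawings, and the claims.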
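Brief description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of the video quality inspection method in an embodiment of this application;
FIG. 2 is a flowchart of the video quality inspection method in an embodiment of this application;
FIG. 3 is a schematic flowchart of step 102 of the video quality inspection method in an application scenario in an embodiment of this application;
FIG. 4 is a schematic flowchart of step 202 of the video quality inspection method in an application scenario in an embodiment of this application;
FIG. 5 is a schematic flowchart of step 301 of the video quality inspection method in an application scenario in an embodiment of this application;
FIG. 6 is a schematic structural diagram of the video quality inspection device in an embodiment of this application;
FIG. 7 is a schematic diagram of the computer device in an embodiment of this application.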
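Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
The video quality inspection method provided by this application can be applied in the application environment shown in FIG. 1, in which a client communicates with a server over a network. The client may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a video quality inspection method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps: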
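101. Perform frame extraction on the target video to obtain video pictures;
In this embodiment, after the target video to be inspected has been determined, the server can perform frame extraction on it to obtain the video pictures. Specifically, the server can extract image frames from the target video at equal time intervals, for example one image frame every 3 seconds as a video picture. Assuming a video lasts 30 seconds, a total of 11 video pictures can be extracted.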
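The equal-interval sampling in step 101 can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `sample_timestamps` is hypothetical, and the sketch assumes that sampling starts at 0 seconds and includes the final instant, which is the reading under which a 30-second video sampled every 3 seconds yields 11 pictures. Whether a decodable frame exists exactly at the last timestamp depends on the video decoder.

```python
def sample_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """Return the timestamps (in seconds) at which frames would be extracted."""
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration must be non-negative and interval positive")
    # One sample at t = 0, then one per full interval that fits in the video.
    count = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(count)]


timestamps = sample_timestamps(30, 3)
print(timestamps)       # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
print(len(timestamps))  # 11
```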
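102. Perform face recognition on the video pictures, detect whether the video pictures include the face of a designated person, and obtain the first detection result corresponding to the video pictures;
After obtaining the video pictures, the server can perform face recognition on them, for example by extracting the facial features in each video picture. The server can detect whether the video pictures include the face of a designated person, thereby obtaining the first detection result corresponding to the video pictures. When inspecting a video of an insurance transaction, one important check is whether the policyholder and the agent of the insurance product transaction appear in the target video. If the faces of the policyholder and the agent are present in the target video, the target video can be considered to meet this requirement. Specifically, therefore, the designated person may be the agent and/or the policyholder, and detecting whether the video pictures include the face of the designated person may be: detecting whether the facial features in each video picture contain the reserved facial features of the designated person. The server can store the facial features of the designated persons in advance as reserved facial features. For example, an agent, as an employee of the insurance company, can have facial features reserved on the server; before the insurance transaction, the agent can ask the policyholder to undergo face recognition so that the policyholder's facial features are collected and reserved on the server.
Further, in actual quality inspection, because the target video is obtained by recording a designated transaction process and one such process usually contains multiple transaction links, the target video can be divided into video segments corresponding to the transaction links, which are inspected separately. A corresponding quality inspection standard, that is, a preset quality inspection condition, can be set for each transaction link. When the video segments of all transaction links satisfy their preset quality inspection conditions, the target video can be considered to pass this detection. To this end, in this embodiment, the target video is obtained by recording a designated transaction process, the designated transaction process includes transaction links, and the video pictures include the link video pictures corresponding to each transaction link. As shown in FIG. 3, step 102 may specifically include: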
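201. Obtain the preset quality inspection condition corresponding to each transaction link;
202. For each transaction link, perform face recognition on the link video pictures corresponding to the transaction link, and detect whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain the link detection result of the transaction link;
203. If the link detection results corresponding to all transaction links are yes, determine that the first detection result corresponding to the video pictures is yes;
204. If any of the link detection results corresponding to the transaction links is no, determine that the first detection result corresponding to the video pictures is no.
For step 201, the preset quality inspection conditions for the transaction links can be set on the server in advance and are retrieved at inspection time.
For step 202, each transaction link may have one or more corresponding link video pictures. For example, assume the target video is 30 seconds long and has 3 transaction links of 10 seconds each; if one frame is extracted every 2 seconds as a video picture, the first transaction link has 5 link video pictures. The server can perform face recognition on each link video picture of each transaction link and detect whether the facial features of the designated person included in these pictures satisfy the preset quality inspection condition, thereby obtaining the link detection result of that transaction link. Note that the preset quality inspection condition for each transaction link can be set according to actual use. For example, the designated person does not necessarily face the recording camera at every moment of a transaction link, so the designated person's facial features cannot be detected in some link video pictures. For this reason, a relatively lenient standard can be set: if the facial features in a certain proportion of a transaction link's link video pictures include the reserved facial features of the designated person, the link detection result of that transaction link can be considered yes. For example, a transaction link has 5 link video pictures, and the preset quality inspection condition is considered satisfied if more than 60% of them include the policyholder's face; detection finds the policyholder's face in 4 of the 5 pictures, so the link detection result of this transaction link is yes.
For steps 203 and 204, when the link detection results of all transaction links are yes, the video segments corresponding to all transaction links of the designated transaction process meet the requirements, so the first detection result corresponding to the video pictures can be determined as yes. Conversely, when any link detection result is no, the video segment of at least one transaction link does not meet the requirements, so the server can determine that the first detection result corresponding to the video pictures is no.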
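Steps 203 and 204 amount to a logical AND over the link detection results. The following minimal sketch shows this; the function name `first_detection_result` and the link names (borrowed from the links mentioned in the background section) are illustrative assumptions, and treating an empty set of links as a failure is also an assumption, since the text does not cover that case.

```python
def first_detection_result(link_results: dict[str, bool]) -> bool:
    """Yes only if every transaction link passed; an empty input counts as no."""
    return bool(link_results) and all(link_results.values())


# Hypothetical link detection results for one recorded sales process.
results = {
    "identity_verification": True,
    "notice_of_precautions": True,
    "contract_signing": False,
}
print(first_detection_result(results))  # False: one link failed
```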
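Still further, as shown in FIG. 4, step 202 may specifically include:
301. For each link video picture, judge whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
302. For each link video picture, if the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition, determine that the judgment result of the link video picture is yes;
303. For each link video picture, if the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition, determine that the judgment result of the link video picture is no;
304. For each transaction link, after the judgment results of the link video pictures under the transaction link have been obtained, judge whether the qualified ratio exceeds a preset ratio threshold, the qualified ratio being the proportion of link video pictures whose judgment result is yes among the link video pictures under the transaction link;
305. If the qualified ratio exceeds the preset ratio threshold, determine that the link detection result of the transaction link is yes;
306. If the qualified ratio does not exceed the preset ratio threshold, determine that the link detection result of the transaction link is no.
For step 301, within a transaction link, the server can judge for each link video picture whether the facial features of the designated person in the picture are consistent with the reserved facial features required by the preset quality inspection condition.
For steps 302 and 303, when the facial features of the designated person in a link video picture are consistent with the reserved facial features required by the preset quality inspection condition, the judgment result of that link video picture is yes; conversely, when they are inconsistent, the judgment result of that link video picture is no.
For step 304, as noted above, the designated person does not necessarily face the recording camera at every moment of a transaction link, so the designated person's facial features cannot be detected in some link video pictures. Therefore, after obtaining the judgment results of the link video pictures under a transaction link, the server only needs to judge whether the qualified ratio exceeds the preset ratio threshold. If it does, the transaction link as a whole meets the detection requirement; if not, the transaction link as a whole does not meet it. The qualified ratio is the proportion of link video pictures whose judgment result is yes among the link video pictures under the transaction link. For example, if a transaction link has 5 link video pictures and the judgment result is yes for 3 of them, the qualified ratio is 60%.
For steps 305 and 306, if the qualified ratio exceeds the preset ratio threshold, the transaction link as a whole can be considered to meet the detection requirement, so the server can determine that the link detection result of the transaction link is yes; conversely, if the qualified ratio does not exceed the preset ratio threshold, the transaction link as a whole does not meet the detection requirement, so the server can determine that the link detection result is no.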
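Steps 304 to 306 can be sketched in a few lines of Python. The function name `link_detection_result` is hypothetical, and the sketch reads "exceeds" as a strict comparison, so a qualified ratio exactly equal to the threshold counts as no; a deployment that intends "at least" would use `>=` instead.

```python
def link_detection_result(picture_results: list[bool], ratio_threshold: float) -> bool:
    """Steps 304-306: compare the share of passing pictures with the threshold."""
    if not picture_results:
        return False  # assumption: a link with no pictures cannot pass
    qualified_ratio = sum(picture_results) / len(picture_results)
    return qualified_ratio > ratio_threshold


# 4 of 5 pictures pass: ratio 0.8 exceeds a 0.6 threshold.
print(link_detection_result([True, True, True, True, False], 0.6))   # True
# 3 of 5 pictures pass: ratio 0.6 does not exceed a 0.6 threshold.
print(link_detection_result([True, True, True, False, False], 0.6))  # False
```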
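Still further, in the case where the designated transaction process is the sales process of an insurance product and the designated persons include the agent and the policyholder of the sales process, as shown in FIG. 5, judging in step 301 whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition may specifically include:
401. Judge whether the facial features recognized in the link video picture include the first facial feature reserved by the policyholder, to obtain a first judgment result;
402. Judge whether the facial features recognized in the link video picture include the second facial feature reserved by the agent, to obtain a second judgment result;
403. If the first judgment result and the second judgment result are both yes, determine that the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
404. If the first judgment result or the second judgment result is no, determine that the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition.
For steps 401 and 402, when the designated persons include the agent and the policyholder, the server can judge separately whether the link video picture includes the agent's face and the policyholder's face. Specifically, the server can judge whether the facial features recognized in the link video picture include the first facial feature reserved by the policyholder, obtaining the first judgment result, and can judge whether they include the second facial feature reserved by the agent, obtaining the second judgment result.
For steps 403 and 404, after the first and second judgment results are obtained, if both are yes, the link video picture contains both the agent's face and the policyholder's face, so the facial features of the designated person in the link video picture can be determined to be consistent with the reserved facial features required by the preset quality inspection condition. Conversely, if the first judgment result or the second judgment result is no, the link video picture lacks at least the agent's face or the policyholder's face, so the facial features can be determined to be inconsistent with the reserved facial features required by the preset quality inspection condition.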
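The per-picture judgment of steps 401 to 404 can be sketched as below. To keep the example self-contained, face recognition itself is abstracted away: the sketch assumes that an upstream recognizer has already mapped the faces found in a picture to identifiers of reserved facial features. The function name `picture_judgment` and the identifiers are hypothetical.

```python
def picture_judgment(recognized_ids: set[str], policyholder_id: str, agent_id: str) -> bool:
    """Steps 401-404 for one link video picture."""
    first_judgment = policyholder_id in recognized_ids   # step 401
    second_judgment = agent_id in recognized_ids         # step 402
    # Steps 403 / 404: consistent only if both designated persons appear.
    return first_judgment and second_judgment


print(picture_judgment({"policyholder-1", "agent-7"}, "policyholder-1", "agent-7"))  # True
print(picture_judgment({"agent-7"}, "policyholder-1", "agent-7"))                    # False
```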
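103. Perform speech recognition on the voice of the target video to obtain target text;
Steps 101-102 above mainly inspect the images in the target video, while steps 103-106 below inspect its voice. The two groups of steps can be executed independently, and a person skilled in the art will understand that there is no strict execution order between steps 101-102 and steps 103-106.
For step 103, after obtaining the target video, the server can perform speech recognition on the voice of the target video, that is, on its audio, to obtain the target text.
104. Calculate the required reading rate of the required reading text according to the target text and the preset required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text;
A designated transaction process can prescribe wording that the designated person must read. For example, an insurance sales agent must explain the precautions and risks of the policy to the policyholder in detail, and this must be recorded in the video. Therefore, when inspecting the voice of the target video, it is necessary to check whether the voice includes the required reading text. The server can calculate the required reading rate according to the target text and the preset required reading text, where the required reading rate is the proportion of the required reading text that is contained in the target text. For example, assume the required reading text consists of 10 specified sentences and inspection of the target text finds 9 of them; the proportion of the required reading text contained in the target text is then 90%, that is, the required reading rate is 90%.
105. Calculate the unreadable rate of the unreadable text according to the target text and the preset unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text;
In contrast to the required reading text, a designated transaction process can also prescribe wording that the designated person must not say, referred to here as unreadable text, for example insulting language that an insurance sales agent must not use when talking with the policyholder. Therefore, when inspecting the voice of the target video, it is necessary to check whether unreadable text appears in the voice. The server can calculate the unreadable rate according to the target text and the preset unreadable text, where the unreadable rate is the proportion of the unreadable text that is contained in the target text. For example, if the preset unreadable text consists of 10 specified sentences and inspection of the target text finds 1 of them, the proportion of the unreadable text contained in the target text is 10%, that is, the unreadable rate is 10%.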
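The two rates of steps 104 and 105 share the same form: the number of preset sentences found in the target text divided by the number of preset sentences. The sketch below reproduces the 9-of-10 and 1-of-10 examples. The function name `sentence_rate` and the sample sentences are hypothetical, and exact substring matching is an assumption made for brevity; real speech recognition output would probably need normalization or fuzzy matching.

```python
def sentence_rate(target_text: str, preset_sentences: list[str]) -> float:
    """Share of the preset sentences that occur verbatim in the target text."""
    if not preset_sentences:
        return 0.0
    hits = sum(1 for sentence in preset_sentences if sentence in target_text)
    return hits / len(preset_sentences)


# Made-up preset texts: 10 required sentences and 10 unreadable sentences.
required = [f"required notice {i}." for i in range(10)]
unreadable = [f"forbidden phrase {i}." for i in range(10)]

# The recognized text contains 9 required sentences and 1 unreadable sentence.
target_text = " ".join(required[:9] + unreadable[:1])

print(sentence_rate(target_text, required))    # 0.9 (required reading rate)
print(sentence_rate(target_text, unreadable))  # 0.1 (unreadable rate)
```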
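106. Detect whether the required reading rate is higher than the preset first threshold and the unreadable rate is lower than the preset second threshold, to obtain the second detection result;
A designated transaction process usually requires the required reading rate to reach a certain value and the unreadable rate to stay below a certain value. The server can therefore set the first threshold and the second threshold in advance and then detect whether the required reading rate is higher than the preset first threshold and the unreadable rate is lower than the preset second threshold, to obtain the second detection result. The higher the first threshold, the larger the share of the required reading text that must be read in the designated transaction process; the lower the second threshold, the smaller the share of the unreadable text that may be read. For example, in the designated transaction process of a certain insurance product, the preset first threshold can be set to 90% and the preset second threshold to 0. A second threshold of 0 means that no unreadable text may appear in the target video of that transaction process, for example not a single insulting sentence.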
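Step 106 can be sketched as follows, using the example thresholds of 90% and 0. One detail needs an explicit assumption: a rate can never be strictly lower than 0, so the sketch treats a second threshold of 0 as "the unreadable rate must equal 0", which matches the stated intent that no unreadable text may appear. That reading, the function name `second_detection_result`, and the default parameter values are assumptions, not part of the application.

```python
def second_detection_result(
    required_rate: float,
    unreadable_rate: float,
    first_threshold: float = 0.9,
    second_threshold: float = 0.0,
) -> bool:
    """Step 106: yes only if both rate conditions hold."""
    required_ok = required_rate > first_threshold
    if second_threshold == 0:
        # Assumption: a zero threshold means no unreadable text at all.
        unreadable_ok = unreadable_rate == 0
    else:
        unreadable_ok = unreadable_rate < second_threshold
    return required_ok and unreadable_ok


print(second_detection_result(1.0, 0.0))  # True
print(second_detection_result(0.9, 0.0))  # False: 0.9 is not higher than 0.9
print(second_detection_result(1.0, 0.1))  # False: unreadable text appeared
```

Note that with a strict "higher than" comparison, a required reading rate of exactly 90% does not satisfy a 90% first threshold.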
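107. Judge whether the first detection result and the second detection result are both yes; if not, perform step 108; if so, perform step 109;
108. Determine that the target video fails the quality inspection;
109. Determine that the target video passes the quality inspection.
For steps 107, 108, and 109, quality inspection of the target video requires both its images and its voice to pass. Therefore, if the first detection result or the second detection result is no, the target video can be determined to fail the quality inspection; conversely, if the first detection result and the second detection result are both yes, the target video can be determined to pass the quality inspection.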
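Further, in this embodiment, after the server's automatic quality inspection, a target video may be spot-checked by a manual quality inspection post regardless of whether it passed. To make it easier for staff to review the automatic inspection, the important time points of the target video in the inspection process can be marked, with links that jump to these time points, so that staff can check quickly. Specifically, the method may also include one or more of the following four marking ways:
First way: mark the first time point at which the missed-reading link starts playing in the target video, the missed-reading link being the transaction link that contains required reading text missed in the target text; the marked first time point is provided with a link that jumps to the target video and starts playback from the first time point. In the first way, the first time point at which the missed-reading link starts in the target video is marked and provided with a jump link. Staff can click the first time point directly, and the server automatically opens the target video and jumps to the start position of the missed-reading link, which greatly facilitates spot checks.
Second way: mark the second time point at which the unreadable link starts playing in the target video, the unreadable link being the transaction link that contains unreadable text appearing in the target text; the marked second time point is provided with a link that jumps to the target video and starts playback from the second time point. The second way works like the first: the server can mark the second time point at which the unreadable link starts in the target video and provide it with a jump link. Staff can click the second time point directly, and the server automatically opens the target video and jumps to the start position of the unreadable link, which greatly facilitates spot checks.
Third way: mark the third time points at which each of the transaction links starts playing in the target video; each marked third time point is provided with a link that jumps to the target video and starts playback from that third time point. In the third way, staff sometimes want to check the image and voice of a particular transaction link in a spot-checked target video, so the server can also mark the third time point at which each transaction link starts in the target video and provide it with a jump link. When a staff member clicks a third time point, the server automatically opens the target video and starts playing it from the start position of that transaction link.
Fourth way: mark the fourth time points at which each of the video pictures is located in the target video; each marked fourth time point is provided with a link that jumps to the playback position of that fourth time point in the target video. In the fourth way, the video pictures are extracted from the target video during automatic quality inspection. To make it easy for staff to check one or more of these video pictures, the server can also mark the fourth time point at which each video picture is located in the target video and provide it with a jump link. When a staff member clicks the fourth time point corresponding to a video picture, the server automatically opens the target video and jumps to the playback position of that video picture.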
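The marked time points with jump links can be represented as small records, as in the sketch below for the third marking way. Everything here is an illustrative assumption: the names `TimeMarker` and `make_marker`, the example URL, and the choice of a Media Fragments style `#t=` suffix for the link. The application does not specify a link format, and whether a player honors such a suffix depends on the player.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeMarker:
    label: str      # what the time point refers to
    seconds: float  # playback position in the target video
    link: str       # jump link that opens the video at that position


def make_marker(video_url: str, label: str, seconds: float) -> TimeMarker:
    """Build a marker whose link jumps to the given playback position."""
    if seconds < 0:
        raise ValueError("playback position must be non-negative")
    return TimeMarker(label, seconds, f"{video_url}#t={seconds}")


# Third way: one marker per transaction link, here 3 links of 10 seconds each.
link_starts = {"link 1": 0, "link 2": 10, "link 3": 20}
markers = [
    make_marker("https://example.com/videos/sale.mp4", label, start)
    for label, start in link_starts.items()
]
for marker in markers:
    print(marker.label, marker.link)
```

The same record would serve the other three ways; only the rule that selects the time points changes (missed-reading links, unreadable links, or the timestamps of the extracted video pictures).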
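As described above, in the embodiments of this application, frame extraction is first performed on the target video to obtain the video pictures; face recognition is then performed on the video pictures to detect whether they include the face of the designated person, giving the first detection result corresponding to the video pictures; separately, speech recognition is performed on the voice of the target video to obtain the target text; the required reading rate of the required reading text is then calculated according to the target text and the preset required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text; next, the unreadable rate of the unreadable text is calculated according to the target text and the preset unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text; finally, it is detected whether the required reading rate is higher than the preset first threshold and the unreadable rate is lower than the preset second threshold, giving the second detection result. If the first detection result or the second detection result is no, the target video is determined to fail the quality inspection; if both are yes, the target video is determined to pass. By splitting the target video into video pictures and voice and inspecting the two parts separately, whether the target video passes is judged jointly from the first detection result obtained from the video pictures and the second detection result obtained from the voice. This not only completes the quality inspection more accurately but also improves its efficiency, so that timeliness can be maintained even when a large number of videos need inspection.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order. The execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of this application in any way.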
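In an embodiment, a video quality inspection device is provided that corresponds one-to-one to the video quality inspection method in the foregoing embodiment. As shown in FIG. 6, the video quality inspection device includes a frame extraction module 501, a first detection module 502, a speech recognition module 503, a required reading rate calculation module 504, an unreadable rate calculation module 505, a second detection module 506, a quality inspection failure determination module 507, and a quality inspection pass determination module 508. The functional modules are described in detail as follows:
the frame extraction module 501 is configured to perform frame extraction on the target video to obtain video pictures;
the first detection module 502 is configured to perform face recognition on the video pictures, detect whether the video pictures include the face of a designated person, and obtain the first detection result corresponding to the video pictures;
the speech recognition module 503 is configured to perform speech recognition on the voice of the target video to obtain target text;
the required reading rate calculation module 504 is configured to calculate the required reading rate of the required reading text according to the target text and the preset required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text;
the unreadable rate calculation module 505 is configured to calculate the unreadable rate of the unreadable text according to the target text and the preset unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text;
the second detection module 506 is configured to detect whether the required reading rate is higher than the preset first threshold and the unreadable rate is lower than the preset second threshold, to obtain the second detection result;
the quality inspection failure determination module 507 is configured to determine that the target video fails the quality inspection if the first detection result or the second detection result is no;
the quality inspection pass determination module 508 is configured to determine that the target video passes the quality inspection if the first detection result and the second detection result are both yes.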
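Further, the target video is obtained by recording a designated transaction process, the designated transaction process includes transaction links, and the video pictures may include the link video pictures corresponding to each transaction link;
the first detection module may include:
a quality inspection condition obtaining unit, configured to obtain the preset quality inspection condition corresponding to each transaction link;
a link detection unit, configured to, for each transaction link, perform face recognition on the link video pictures corresponding to the transaction link and detect whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain the link detection result of the transaction link;
a first determining unit, configured to determine that the first detection result corresponding to the video pictures is yes if the link detection results corresponding to all transaction links are yes;
a second determining unit, configured to determine that the first detection result corresponding to the video pictures is no if any of the link detection results corresponding to the transaction links is no.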
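Further, the link detection unit may include:
a facial feature judgment subunit, configured to judge, for each link video picture, whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
a first determining subunit, configured to determine, for each link video picture, that the judgment result of the link video picture is yes if the judgment result of the facial feature judgment subunit is yes;
a second determining subunit, configured to determine, for each link video picture, that the judgment result of the link video picture is no if the judgment result of the facial feature judgment subunit is no;
a qualified ratio judgment subunit, configured to judge, for each transaction link and after the judgment results of the link video pictures under the transaction link have been obtained, whether the qualified ratio exceeds a preset ratio threshold, the qualified ratio being the proportion of link video pictures whose judgment result is yes among the link video pictures under the transaction link;
a third determining subunit, configured to determine that the link detection result of the transaction link is yes if the judgment result of the qualified ratio judgment subunit is yes;
a fourth determining subunit, configured to determine that the link detection result of the transaction link is no if the judgment result of the qualified ratio judgment subunit is no.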
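Further, the facial feature judgment subunit may include:
a first judgment sub-subunit, configured to judge whether the facial features recognized in the link video picture include the first facial feature reserved by the policyholder, to obtain a first judgment result;
a second judgment sub-subunit, configured to judge whether the facial features recognized in the link video picture include the second facial feature reserved by the agent, to obtain a second judgment result;
a first determining sub-subunit, configured to determine that the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition if the first judgment result and the second judgment result are both yes;
a second determining sub-subunit, configured to determine that the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition if the first judgment result or the second judgment result is no.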
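Further, the video quality inspection device may also include:
a first marking module, configured to mark the first time point at which the missed-reading link starts playing in the target video, the missed-reading link being the transaction link that contains required reading text missed in the target text, the marked first time point being provided with a link that jumps to the target video and starts playback from the first time point;
and/or
a second marking module, configured to mark the second time point at which the unreadable link starts playing in the target video, the unreadable link being the transaction link that contains unreadable text appearing in the target text, the marked second time point being provided with a link that jumps to the target video and starts playback from the second time point;
and/or
a third marking module, configured to mark the third time points at which each of the transaction links starts playing in the target video, each marked third time point being provided with a link that jumps to the target video and starts playback from that third time point;
and/or
a fourth marking module, configured to mark the fourth time points at which each of the video pictures is located in the target video, each marked fourth time point being provided with a link that jumps to the playback position of that fourth time point in the target video.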
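For specific limitations on the video quality inspection device, refer to the limitations on the video quality inspection method above; details are not repeated here. Each module in the video quality inspection device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
In an embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device stores the data involved in the video quality inspection method. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a video quality inspection method.
In an embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the steps of the video quality inspection method in the above embodiment are implemented, for example steps 101 to 109 shown in FIG. 2. Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units of the video quality inspection device in the above embodiment are implemented, for example the functions of modules 501 to 508 shown in FIG. 6. To avoid repetition, details are not repeated here.
In an embodiment, a computer-readable storage medium is provided: one or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the video quality inspection method in the above method embodiments, or to implement the functions of the modules/units of the video quality inspection device in the above device embodiments. To avoid repetition, details are not repeated here.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
A person skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is only an example. In practical applications, the above functions can be allocated to different functional units or modules as needed; that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in those embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.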

Claims (20)

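  1. A video quality inspection method, characterized by including:
    performing frame extraction on a target video to obtain video pictures;
    performing face recognition on the video pictures, detecting whether the video pictures include the face of a designated person, and obtaining a first detection result corresponding to the video pictures;
    performing speech recognition on the voice of the target video to obtain target text;
    calculating a required reading rate of preset required reading text according to the target text and the required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text;
    calculating an unreadable rate of preset unreadable text according to the target text and the unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text;
    detecting whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, to obtain a second detection result;
    if the first detection result or the second detection result is no, determining that the target video fails the quality inspection;
    if the first detection result and the second detection result are both yes, determining that the target video passes the quality inspection.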
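  2. The video quality inspection method according to claim 1, characterized in that the target video is obtained by recording a designated transaction process, the designated transaction process includes transaction links, and the video pictures include link video pictures corresponding to each transaction link;
    the performing face recognition on the video pictures, detecting whether the video pictures include the face of a designated person, and obtaining a first detection result corresponding to the video pictures includes:
    obtaining a preset quality inspection condition corresponding to each transaction link;
    for each transaction link, performing face recognition on the link video pictures corresponding to the transaction link, and detecting whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain a link detection result of the transaction link;
    if the link detection results corresponding to all transaction links are yes, determining that the first detection result corresponding to the video pictures is yes;
    if any of the link detection results corresponding to the transaction links is no, determining that the first detection result corresponding to the video pictures is no.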
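  3. The video quality inspection method according to claim 2, characterized in that the detecting whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain a link detection result of the transaction link, includes:
    for each link video picture, judging whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
    for each link video picture, if the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition, determining that the judgment result of the link video picture is yes;
    for each link video picture, if the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition, determining that the judgment result of the link video picture is no;
    for each transaction link, after obtaining the judgment results of the link video pictures under the transaction link, judging whether a qualified ratio exceeds a preset ratio threshold, the qualified ratio being the proportion of link video pictures whose judgment result is yes among the link video pictures under the transaction link;
    if the qualified ratio exceeds the preset ratio threshold, determining that the link detection result of the transaction link is yes;
    if the qualified ratio does not exceed the preset ratio threshold, determining that the link detection result of the transaction link is no.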
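  4. The video quality inspection method according to claim 3, characterized in that the judging whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition includes:
    judging whether the facial features recognized in the link video picture include a first facial feature reserved by the policyholder, to obtain a first judgment result;
    judging whether the facial features recognized in the link video picture include a second facial feature reserved by the agent, to obtain a second judgment result;
    if the first judgment result and the second judgment result are both yes, determining that the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
    if the first judgment result or the second judgment result is no, determining that the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition.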
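  5. The video quality inspection method according to any one of claims 2 to 4, characterized in that the video quality inspection method further includes:
    marking a first time point at which a missed-reading link starts playing in the target video, the missed-reading link being the transaction link that contains required reading text missed in the target text, the marked first time point being provided with a link that jumps to the target video and starts playback from the first time point;
    and/or
    marking a second time point at which an unreadable link starts playing in the target video, the unreadable link being the transaction link that contains unreadable text appearing in the target text, the marked second time point being provided with a link that jumps to the target video and starts playback from the second time point;
    and/or
    marking third time points at which each of the transaction links starts playing in the target video, each marked third time point being provided with a link that jumps to the target video and starts playback from that third time point;
    and/or
    marking fourth time points at which each of the video pictures is located in the target video, each marked fourth time point being provided with a link that jumps to the playback position of that fourth time point in the target video.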
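  6. A video quality inspection device, characterized by including:
    a frame extraction module, configured to perform frame extraction on a target video to obtain video pictures;
    a first detection module, configured to perform face recognition on the video pictures, detect whether the video pictures include the face of a designated person, and obtain a first detection result corresponding to the video pictures;
    a speech recognition module, configured to perform speech recognition on the voice of the target video to obtain target text;
    a required reading rate calculation module, configured to calculate a required reading rate of preset required reading text according to the target text and the required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text;
    an unreadable rate calculation module, configured to calculate an unreadable rate of preset unreadable text according to the target text and the unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text;
    a second detection module, configured to detect whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, to obtain a second detection result;
    a quality inspection failure determination module, configured to determine that the target video fails the quality inspection if the first detection result or the second detection result is no;
    a quality inspection pass determination module, configured to determine that the target video passes the quality inspection if the first detection result and the second detection result are both yes.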
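  7. The video quality inspection device according to claim 6, characterized in that the target video is obtained by recording a designated transaction process, the designated transaction process includes transaction links, and the video pictures include link video pictures corresponding to each transaction link;
    the first detection module includes:
    a quality inspection condition obtaining unit, configured to obtain a preset quality inspection condition corresponding to each transaction link;
    a link detection unit, configured to, for each transaction link, perform face recognition on the link video pictures corresponding to the transaction link and detect whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain a link detection result of the transaction link;
    a first determining unit, configured to determine that the first detection result corresponding to the video pictures is yes if the link detection results corresponding to all transaction links are yes;
    a second determining unit, configured to determine that the first detection result corresponding to the video pictures is no if any of the link detection results corresponding to the transaction links is no.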
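  8. The video quality inspection device according to claim 7, characterized in that the link detection unit includes:
    a facial feature judgment subunit, configured to judge, for each link video picture, whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
    a first determining subunit, configured to determine, for each link video picture, that the judgment result of the link video picture is yes if the judgment result of the facial feature judgment subunit is yes;
    a second determining subunit, configured to determine, for each link video picture, that the judgment result of the link video picture is no if the judgment result of the facial feature judgment subunit is no;
    a qualified ratio judgment subunit, configured to judge, for each transaction link and after the judgment results of the link video pictures under the transaction link have been obtained, whether a qualified ratio exceeds a preset ratio threshold, the qualified ratio being the proportion of link video pictures whose judgment result is yes among the link video pictures under the transaction link;
    a third determining subunit, configured to determine that the link detection result of the transaction link is yes if the judgment result of the qualified ratio judgment subunit is yes;
    a fourth determining subunit, configured to determine that the link detection result of the transaction link is no if the judgment result of the qualified ratio judgment subunit is no.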
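  9. The video quality inspection device according to claim 8, characterized in that the facial feature judgment subunit includes:
    a first judgment sub-subunit, configured to judge whether the facial features recognized in the link video picture include a first facial feature reserved by the policyholder, to obtain a first judgment result;
    a second judgment sub-subunit, configured to judge whether the facial features recognized in the link video picture include a second facial feature reserved by the agent, to obtain a second judgment result;
    a first determining sub-subunit, configured to determine that the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition if the first judgment result and the second judgment result are both yes;
    a second determining sub-subunit, configured to determine that the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition if the first judgment result or the second judgment result is no.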
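  10. The video quality inspection device according to any one of claims 7 to 9, characterized in that the video quality inspection device further includes:
    a first marking module, configured to mark a first time point at which a missed-reading link starts playing in the target video, the missed-reading link being the transaction link that contains required reading text missed in the target text, the marked first time point being provided with a link that jumps to the target video and starts playback from the first time point;
    and/or
    a second marking module, configured to mark a second time point at which an unreadable link starts playing in the target video, the unreadable link being the transaction link that contains unreadable text appearing in the target text, the marked second time point being provided with a link that jumps to the target video and starts playback from the second time point;
    and/or
    a third marking module, configured to mark third time points at which each of the transaction links starts playing in the target video, each marked third time point being provided with a link that jumps to the target video and starts playback from that third time point;
    and/or
    a fourth marking module, configured to mark fourth time points at which each of the video pictures is located in the target video, each marked fourth time point being provided with a link that jumps to the playback position of that fourth time point in the target video.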
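  11. A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer-readable instructions:
    performing frame extraction on a target video to obtain video pictures;
    performing face recognition on the video pictures, detecting whether the video pictures include the face of a designated person, and obtaining a first detection result corresponding to the video pictures;
    performing speech recognition on the voice of the target video to obtain target text;
    calculating a required reading rate of preset required reading text according to the target text and the required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text;
    calculating an unreadable rate of preset unreadable text according to the target text and the unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text;
    detecting whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, to obtain a second detection result;
    if the first detection result or the second detection result is no, determining that the target video fails the quality inspection;
    if the first detection result and the second detection result are both yes, determining that the target video passes the quality inspection.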
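  12. The computer device according to claim 11, characterized in that the target video is obtained by recording a designated transaction process, the designated transaction process includes transaction links, and the video pictures include link video pictures corresponding to each transaction link;
    the performing face recognition on the video pictures, detecting whether the video pictures include the face of a designated person, and obtaining a first detection result corresponding to the video pictures includes:
    obtaining a preset quality inspection condition corresponding to each transaction link;
    for each transaction link, performing face recognition on the link video pictures corresponding to the transaction link, and detecting whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain a link detection result of the transaction link;
    if the link detection results corresponding to all transaction links are yes, determining that the first detection result corresponding to the video pictures is yes;
    if any of the link detection results corresponding to the transaction links is no, determining that the first detection result corresponding to the video pictures is no.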
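  13. The computer device according to claim 12, characterized in that the detecting whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain a link detection result of the transaction link, includes:
    for each link video picture, judging whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
    for each link video picture, if the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition, determining that the judgment result of the link video picture is yes;
    for each link video picture, if the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition, determining that the judgment result of the link video picture is no;
    for each transaction link, after obtaining the judgment results of the link video pictures under the transaction link, judging whether a qualified ratio exceeds a preset ratio threshold, the qualified ratio being the proportion of link video pictures whose judgment result is yes among the link video pictures under the transaction link;
    if the qualified ratio exceeds the preset ratio threshold, determining that the link detection result of the transaction link is yes;
    if the qualified ratio does not exceed the preset ratio threshold, determining that the link detection result of the transaction link is no.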
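  14. The computer device according to claim 13, characterized in that the judging whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition includes:
    judging whether the facial features recognized in the link video picture include a first facial feature reserved by the policyholder, to obtain a first judgment result;
    judging whether the facial features recognized in the link video picture include a second facial feature reserved by the agent, to obtain a second judgment result;
    if the first judgment result and the second judgment result are both yes, determining that the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
    if the first judgment result or the second judgment result is no, determining that the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition.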
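  15. The computer device according to any one of claims 12 to 14, characterized in that the processor further implements the following steps when executing the computer-readable instructions:
    marking a first time point at which a missed-reading link starts playing in the target video, the missed-reading link being the transaction link that contains required reading text missed in the target text, the marked first time point being provided with a link that jumps to the target video and starts playback from the first time point;
    and/or
    marking a second time point at which an unreadable link starts playing in the target video, the unreadable link being the transaction link that contains unreadable text appearing in the target text, the marked second time point being provided with a link that jumps to the target video and starts playback from the second time point;
    and/or
    marking third time points at which each of the transaction links starts playing in the target video, each marked third time point being provided with a link that jumps to the target video and starts playback from that third time point;
    and/or
    marking fourth time points at which each of the video pictures is located in the target video, each marked fourth time point being provided with a link that jumps to the playback position of that fourth time point in the target video.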
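  16. One or more non-volatile readable storage media storing computer-readable instructions, characterized in that, when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
    performing frame extraction on a target video to obtain video pictures;
    performing face recognition on the video pictures, detecting whether the video pictures include the face of a designated person, and obtaining a first detection result corresponding to the video pictures;
    performing speech recognition on the voice of the target video to obtain target text;
    calculating a required reading rate of preset required reading text according to the target text and the required reading text, the required reading rate being the proportion of the required reading text that is contained in the target text;
    calculating an unreadable rate of preset unreadable text according to the target text and the unreadable text, the unreadable rate being the proportion of the unreadable text that is contained in the target text;
    detecting whether the required reading rate is higher than a preset first threshold and the unreadable rate is lower than a preset second threshold, to obtain a second detection result;
    if the first detection result or the second detection result is no, determining that the target video fails the quality inspection;
    if the first detection result and the second detection result are both yes, determining that the target video passes the quality inspection.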
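  17. The non-volatile readable storage medium according to claim 16, characterized in that the target video is obtained by recording a designated transaction process, the designated transaction process includes transaction links, and the video pictures include link video pictures corresponding to each transaction link;
    the performing face recognition on the video pictures, detecting whether the video pictures include the face of a designated person, and obtaining a first detection result corresponding to the video pictures includes:
    obtaining a preset quality inspection condition corresponding to each transaction link;
    for each transaction link, performing face recognition on the link video pictures corresponding to the transaction link, and detecting whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain a link detection result of the transaction link;
    if the link detection results corresponding to all transaction links are yes, determining that the first detection result corresponding to the video pictures is yes;
    if any of the link detection results corresponding to the transaction links is no, determining that the first detection result corresponding to the video pictures is no.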
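  18. The non-volatile readable storage medium according to claim 17, characterized in that the detecting whether the facial features of the designated person included in the link video pictures satisfy the preset quality inspection condition, to obtain a link detection result of the transaction link, includes:
    for each link video picture, judging whether the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition;
    for each link video picture, if the facial features of the designated person in the link video picture are consistent with the reserved facial features required by the preset quality inspection condition, determining that the judgment result of the link video picture is yes;
    for each link video picture, if the facial features of the designated person in the link video picture are inconsistent with the reserved facial features required by the preset quality inspection condition, determining that the judgment result of the link video picture is no;
    for each transaction link, after obtaining the judgment results of the link video pictures under the transaction link, judging whether a qualified ratio exceeds a preset ratio threshold, the qualified ratio being the proportion of link video pictures whose judgment result is yes among the link video pictures under the transaction link;
    if the qualified ratio exceeds the preset ratio threshold, determining that the link detection result of the transaction link is yes;
    if the qualified ratio does not exceed the preset ratio threshold, determining that the link detection result of the transaction link is no.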
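  19. The non-volatile readable storage medium according to claim 18, wherein the judging whether the facial features of the designated person in the stage video picture are consistent with the pre-registered facial features required by the preset quality inspection condition comprises:
    judging whether the facial features recognized in the stage video picture include a first facial feature pre-registered by the policyholder, to obtain a first judgment result;
    judging whether the facial features recognized in the stage video picture include a second facial feature pre-registered by the agent, to obtain a second judgment result;
    if the first judgment result and the second judgment result are both yes, determining that the facial features of the designated person in the stage video picture are consistent with the pre-registered facial features required by the preset quality inspection condition;
    if the first judgment result or the second judgment result is no, determining that the facial features of the designated person in the stage video picture are inconsistent with the pre-registered facial features required by the preset quality inspection condition.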
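  20. The non-volatile readable storage medium according to any one of claims 17 to 19, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following steps:
    marking a first time point at which a missed-reading stage begins playing in the target video, the missed-reading stage being the transaction stage containing must-read text that was omitted from the target text, the marked first time point being provided with a link that jumps to the target video and starts playback from the first time point;
    and/or
    marking a second time point at which a must-not-read stage begins playing in the target video, the must-not-read stage being the transaction stage containing must-not-read text that appears in the target text, the marked second time point being provided with a link that jumps to the target video and starts playback from the second time point;
    and/or
    marking third time points at which the respective transaction stages begin playing in the target video, each marked third time point being provided with a link that jumps to the target video and starts playback from that third time point;
    and/or
    marking fourth time points at which the respective video pictures are located in the playback of the target video, each marked fourth time point being provided with a link that jumps to the playback position of that fourth time point in the target video.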
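For illustration only, and not as part of the claims, the pass/fail logic recited in claims 16 to 18 might be sketched in Python as follows. The function names, the exact-substring matching, and the character-count weighting of the rates are assumptions made for this sketch; the claims do not prescribe how the rates are computed. Each per-picture boolean is assumed to come from an upstream face recognizer and to be true only when both the policyholder's and the agent's pre-registered faces were matched.

```python
from typing import Dict, List


def coverage_rate(target_text: str, phrases: List[str]) -> float:
    """Fraction of the preset phrase text (by character count) found in target_text.

    Assumption: a phrase counts only if it occurs as an exact substring.
    """
    total = sum(len(p) for p in phrases)
    if total == 0:
        return 0.0  # no preset text: treated as zero coverage in this sketch
    found = sum(len(p) for p in phrases if p in target_text)
    return found / total


def stage_passes(picture_results: List[bool], ratio_threshold: float) -> bool:
    """A stage passes when the share of matching pictures exceeds the threshold."""
    if not picture_results:
        return False  # assumption: a stage with no pictures cannot pass
    return sum(picture_results) / len(picture_results) > ratio_threshold


def first_detection(stage_pictures: Dict[str, List[bool]],
                    ratio_threshold: float) -> bool:
    """First detection result: yes only if every transaction stage passes."""
    return all(stage_passes(r, ratio_threshold) for r in stage_pictures.values())


def second_detection(target_text: str, must_read: List[str],
                     must_not_read: List[str],
                     first_threshold: float, second_threshold: float) -> bool:
    """Second detection result: must-read rate high and must-not-read rate low."""
    return (coverage_rate(target_text, must_read) > first_threshold
            and coverage_rate(target_text, must_not_read) < second_threshold)


def video_passes(stage_pictures: Dict[str, List[bool]], target_text: str,
                 must_read: List[str], must_not_read: List[str],
                 ratio_threshold: float, first_threshold: float,
                 second_threshold: float) -> bool:
    """The video passes quality inspection only if both detection results are yes."""
    return (first_detection(stage_pictures, ratio_threshold)
            and second_detection(target_text, must_read, must_not_read,
                                 first_threshold, second_threshold))
```

In this sketch a pass ratio exactly equal to the threshold does not pass, mirroring the "does not exceed" branch of claim 18. All threshold values are left to the caller.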
PCT/CN2018/123132 2018-11-02 2018-12-24 Video quality inspection method and apparatus, computer device, and storage medium WO2020087713A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2021508040A JP7111887B2 (ja) 2018-11-02 2018-12-24 Video quality inspection method and apparatus, computer device, and storage medium
EP18938571.9A EP3876549A4 (en) 2018-11-02 2018-12-24 VIDEO QUALITY INSPECTION PROCESS AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIA
SG11202101615QA SG11202101615QA (en) 2018-11-02 2018-12-24 Video quality inspection method and apparatus, computer device, and storage medium
KR1020207036022A KR20210016551A (ko) 2018-11-02 2018-12-24 Video quality inspection method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811301549.9 2018-11-02
CN201811301549.9A CN109472487A (zh) 2018-11-02 2018-11-02 Video quality inspection method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020087713A1 (zh) 2020-05-07

Family

ID=65666757

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123132 WO2020087713A1 (zh) 2018-11-02 2018-12-24 Video quality inspection method and apparatus, computer device, and storage medium

Country Status (6)

Country Link
EP (1) EP3876549A4 (zh)
JP (1) JP7111887B2 (zh)
KR (1) KR20210016551A (zh)
CN (1) CN109472487A (zh)
SG (1) SG11202101615QA (zh)
WO (1) WO2020087713A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
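CN111741356A (zh) * 2020-08-25 2020-10-02 腾讯科技(深圳)有限公司 Quality inspection method, apparatus, and device for dual-recording video, and readable storage medium
CN113128390A (zh) * 2021-04-14 2021-07-16 北京奇艺世纪科技有限公司 Spot-check method and apparatus, electronic device, and storage medium
CN113792600A (zh) * 2021-08-10 2021-12-14 武汉光庭信息技术股份有限公司 Deep-learning-based video frame extraction method and system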

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
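CN109472487A (zh) * 2018-11-02 2019-03-15 深圳壹账通智能科技有限公司 Video quality inspection method and apparatus, computer device, and storage medium
CN110147726B (zh) * 2019-04-12 2024-02-20 财付通支付科技有限公司 Service quality inspection method and apparatus, storage medium, and electronic apparatus
CN110147926A (zh) * 2019-04-12 2019-08-20 深圳壹账通智能科技有限公司 Risk level calculation method for a service type, storage medium, and terminal device
CN110111071A (zh) * 2019-04-24 2019-08-09 上海商汤智能科技有限公司 Check-in method and apparatus, electronic device, and computer storage medium
CN111008925A (zh) * 2019-12-11 2020-04-14 京东数字科技控股有限公司 Method, apparatus, and device for verifying certificate watermarks, and storage medium
CN111885375A (zh) * 2020-07-15 2020-11-03 中国工商银行股份有限公司 Inspection method, apparatus, server, and system for dual-recording video
CN112804587B (zh) * 2020-12-31 2022-10-14 平安科技(深圳)有限公司 Video quality inspection method and apparatus based on a viewer-count sequence, and computer device
CN115250375B (zh) * 2021-04-26 2024-01-26 北京中关村科金技术有限公司 Method and apparatus for detecting compliance of audio and video content based on fixed scripts
CN115631448B (zh) * 2022-12-19 2023-04-04 广州佰锐网络科技有限公司 Audio and video quality inspection processing method and system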

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120281885A1 (en) * 2011-05-05 2012-11-08 At&T Intellectual Property I, L.P. System and method for dynamic facial features for speaker recognition
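CN106250837A (zh) * 2016-07-27 2016-12-21 腾讯科技(深圳)有限公司 Video recognition method, apparatus, and system
CN107862258A (zh) * 2017-10-24 2018-03-30 广东小天才科技有限公司 Method, apparatus, and device for verifying text content in video, and storage medium
CN108124191A (zh) * 2017-12-22 2018-06-05 北京百度网讯科技有限公司 Video review method, apparatus, and server
CN109472487A (zh) * 2018-11-02 2019-03-15 深圳壹账通智能科技有限公司 Video quality inspection method and apparatus, computer device, and storage medium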

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
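JP4236502B2 (ja) * 2003-04-03 2009-03-11 三菱電機株式会社 Speech recognition device
CN102056026B (zh) * 2009-11-06 2013-04-03 中国移动通信集团设计院有限公司 Audio-video synchronization detection method and system, and voice detection method and system
JP2011215942A (ja) * 2010-03-31 2011-10-27 Nec Personal Products Co Ltd User authentication device, user authentication system, user authentication method, and program
JP2015099474A (ja) * 2013-11-19 2015-05-28 芳子 明石 Insurance canvassing system
CN105187674B (zh) * 2015-08-14 2020-02-14 上海银赛计算机科技有限公司 Compliance checking method and apparatus for service recordings
CN106911630A (zh) * 2015-12-22 2017-06-30 上海仪电数字技术股份有限公司 Terminal and identity authentication method, and authentication method and system for a terminal and an authentication center
CN105654372A (zh) * 2015-12-22 2016-06-08 深圳前海微众银行股份有限公司 Identity recognition method, server, and system for remote account opening
US9743077B2 (en) * 2016-01-12 2017-08-22 Sling Media LLC Detection and marking of low quality video content
CN106934713B (zh) * 2017-02-13 2021-05-28 杭州百航信息技术有限公司 Financial transaction risk control system and method for rapidly identifying and locating its stored files
CN107016608A (zh) * 2017-03-30 2017-08-04 广东微模式软件股份有限公司 Remote account opening method and system based on identity information verification
CN107864118B (zh) * 2017-08-14 2020-03-17 深圳壹账通智能科技有限公司 Login verification method and system, and computer-readable storage medium
CN107610718A (zh) * 2017-08-29 2018-01-19 深圳市买买提乐购金融服务有限公司 Method and apparatus for marking the content of a voice file
CN108053838B (zh) * 2017-12-01 2019-10-11 深圳壹账通智能科技有限公司 Fraud identification method and apparatus combining audio analysis and video analysis, and storage medium
CN108510213A (zh) * 2018-05-11 2018-09-07 苏州华兴源创电子科技有限公司 Method, apparatus, device, and medium for sequentially assigning tasks to task groups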

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3876549A4

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
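CN111741356A (zh) * 2020-08-25 2020-10-02 腾讯科技(深圳)有限公司 Quality inspection method, apparatus, and device for dual-recording video, and readable storage medium
CN113128390A (zh) * 2021-04-14 2021-07-16 北京奇艺世纪科技有限公司 Spot-check method and apparatus, electronic device, and storage medium
CN113792600A (zh) * 2021-08-10 2021-12-14 武汉光庭信息技术股份有限公司 Deep-learning-based video frame extraction method and system
CN113792600B (zh) * 2021-08-10 2023-07-18 武汉光庭信息技术股份有限公司 Deep-learning-based video frame extraction method and system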

Also Published As

Publication number Publication date
EP3876549A4 (en) 2021-11-17
JP2021520014A (ja) 2021-08-12
CN109472487A (zh) 2019-03-15
JP7111887B2 (ja) 2022-08-02
EP3876549A1 (en) 2021-09-08
KR20210016551A (ko) 2021-02-16
SG11202101615QA (en) 2021-03-30

Similar Documents

Publication Publication Date Title
WO2020087713A1 (zh) 视频质检方法、装置、计算机设备及存储介质
WO2020140665A1 (zh) 双录视频质量检测方法、装置、计算机设备和存储介质
WO2021004132A1 (zh) 异常数据检测方法、装置、计算机设备和存储介质
WO2020098249A1 (zh) 电子装置、应对话术推荐方法和计算机可读存储介质
EP3890333A1 (en) Video cutting method and apparatus, computer device and storage medium
CN109473093B (zh) 语音识别方法、装置、计算机设备及存储介质
US9607615B2 (en) Classifying spoken content in a teleconference
CN112017056B (zh) 一种智能双录方法及系统
WO2020125386A1 (zh) 表情识别方法、装置、计算机设备和存储介质
WO2019174073A1 (zh) 通话中客户信息修改方法、装置、计算机设备及存储介质
CN109831677B (zh) 视频脱敏方法、装置、计算机设备和存储介质
WO2020073492A1 (zh) 数据安全处理方法、装置、计算机设备及存储介质
WO2020211233A1 (zh) 批量数据编辑方法、装置、计算机设备及存储介质
CN110598008A (zh) 录制数据的数据质检方法及装置、存储介质
CN114493902A (zh) 多模态信息异常监控方法、装置、计算机设备及存储介质
WO2020258904A1 (zh) 确定质检效果的方法、装置、设备及存储介质
CN113051924A (zh) 一种录制数据分段质检方法及系统
CN110298543B (zh) 业务跟踪方法、装置、计算机设备及存储介质
CN110533381B (zh) 案件管辖权审核方法、装置、计算机设备和存储介质
US10341491B1 (en) Identifying unreported issues through customer service interactions and website analytics
CN110163183B (zh) 目标检测算法的评估方法、装置、计算机设备和存储介质
CN113645357B (zh) 通话质检方法、装置、计算机设备和计算机可读存储介质
US20230289700A1 (en) Systems and methods for call compliance and verification
CN113572900A (zh) 外呼测试方法、装置、计算机设备和计算机可读存储介质
KR20220122355A (ko) 비대면 계약을 관리하는 계약 관리 시스템 및 방법

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18938571; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2021508040; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2018938571; Country of ref document: EP; Effective date: 20210602