WO2020140665A1 - Dual-recording video quality detection method, apparatus, computer device and storage medium - Google Patents
Dual-recording video quality detection method, apparatus, computer device and storage medium
- Publication number
- WO2020140665A1 (PCT application PCT/CN2019/122478, CN2019122478W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- node
- face recognition
- detected
- recorded
- Prior art date
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N 17/00—Diagnosis, testing or measuring for television systems or their details
- H04N 21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N 21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; H04N 21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N 21/233—Processing of audio elementary streams
- H04N 21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N 21/2343—Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H04N 21/235—Processing of additional data, e.g. scrambling of additional data or processing content descriptors
Definitions
- the present application relates to a dual-recorded video quality detection method, device, computer equipment, and storage medium.
- in conventional practice, the dual-recording terminal sends the entire dual-recorded video to the quality inspector's terminal after recording finishes. If, once the inspector's terminal obtains the quality inspection result for the whole video, any part of the dual-recorded video fails to meet the requirements, the dual-recording terminal has to re-record the entire video. The inventor realized that, because the whole video must be re-recorded every time, this approach wastes the system resources of the dual-recording terminal.
- a dual-recorded video quality detection method, device, computer equipment, and storage medium are provided.
- a dual video quality detection method includes:
- a double recording video quality detection device includes:
- a quality detection request module configured to receive a quality detection request sent by a terminal, where the quality detection request carries the node identifier of the node to be detected;
- a quality inspection rule search module which is used to search for a corresponding quality inspection rule based on the node identification
- a video detection result acquisition module, configured to extract multiple frames of video images from the dual-recorded video corresponding to the node to be detected according to the quality inspection rule, perform face recognition on each extracted frame to obtain a face recognition result for each frame, and obtain the video detection result of the node to be detected from the per-frame face recognition results;
- a double-recorded text acquisition module configured to obtain double-recorded audio data from the double-recorded video corresponding to the node to be detected, and perform voice conversion on the double-recorded audio data to obtain double-recorded text;
- the word detection module is used to perform word detection on the double-recorded text according to the quality inspection rules to obtain the audio detection result corresponding to the node to be detected;
- a quality detection result judgment module, configured to obtain the quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result; and
- a re-recording instruction generation module, configured to generate a re-recording instruction according to the node identifier when the quality detection result is a failure and send the re-recording instruction to the terminal, the re-recording instruction instructing the terminal to jump to the node corresponding to the node identifier.
- a computer device includes a memory and one or more processors.
- the memory stores computer-readable instructions.
- when the computer-readable instructions are executed, the one or more processors implement the steps of the dual-recording video quality detection method described above.
- One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to implement the steps of the dual-recording video quality detection method described above.
- FIG. 1 is an application scenario diagram of a dual-recording video quality detection method according to one or more embodiments.
- FIG. 2 is a schematic flowchart of a method for detecting dual-recorded video quality according to one or more embodiments.
- FIG. 3 is a schematic flowchart of a dual-recording video quality detection method according to one or more embodiments.
- FIG. 4 is a block diagram of a dual-recording video quality detection device according to one or more embodiments.
- Figure 5 is a block diagram of a computer device in accordance with one or more embodiments.
- the dual recording video quality detection method provided by this application can be applied in the application environment shown in FIG. 1.
- the terminal 102 communicates with the server 104 through a network.
- when the terminal 102 enters any dual-recording node, that node is taken as the node to be detected, a quality detection request is generated from the node identifier of the node to be detected and sent to the server 104. The server 104 looks up the corresponding quality inspection rule according to the node identifier carried in the request, extracts multiple frames of video images from the dual-recorded video of the node to be detected according to the rule, performs face recognition on each extracted frame to obtain per-frame face recognition results, and obtains the video detection result of the node to be detected from those results. The server also obtains dual-recorded audio data from the dual-recorded video of the node to be detected, converts the speech to text to obtain the dual-recorded text, performs word detection on that text according to the quality inspection rule to obtain the audio detection result, and combines the video and audio detection results into the quality detection result of the node.
- when the quality detection result is a failure, a re-recording instruction is generated according to the node identifier and sent to the terminal 102, which can then jump to the node corresponding to the node identifier and re-record.
- the terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
- the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
- a dual-recorded video quality detection method is provided. Taking the method applied to the server in FIG. 1 as an example for illustration, it includes the following steps:
- Step S202 Receive a quality detection request sent by the terminal, where the quality detection request carries the node identifier of the node to be detected.
- the dual recording process is composed of multiple nodes, and each node corresponds to a different link in the dual recording.
- for example, the dual recording for a bank wealth-management product may include the following three nodes: 1. describe the specific product the customer is purchasing; 2. ask for the customer's consent; 3. state the specific disclaimer clauses.
- the terminal may automatically enter each node for double recording in sequence according to a preset sequence, or may enter a node for double recording when a trigger event is detected for a certain node.
- the trigger event for a node can be the user's click operation on the terminal, for example clicking the button that switches to the next node, or the user's voice, for example speaking a fixed phrase to trigger entry to the next node.
- when the terminal enters a node and starts dual recording, a quality detection request is generated from that node's identifier and sent to the server.
- after receiving the request, the server parses it to obtain the node identifier carried in it, and then performs real-time quality inspection on the dual-recorded video of that node.
- the node identifier uniquely identifies a dual-recording node.
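- By way of illustration only, the request might be carried as a small JSON payload; the sketch below shows such a payload and its server-side parsing. The field names (`type`, `node_id`) are assumptions for illustration and are not specified by the application.

```python
import json

# Hypothetical request payload sent by the recording terminal when it
# enters a dual-recording node; the field names are illustrative only.
raw_request = json.dumps({"type": "quality_detection", "node_id": "node-02"})

def parse_quality_request(raw: str) -> str:
    """Parse the request and return the identifier of the node to be detected."""
    request = json.loads(raw)
    return request["node_id"]

node_id = parse_quality_request(raw_request)
print(node_id)  # -> node-02
```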
- Step S204: search for the corresponding quality inspection rule according to the node identifier.
- different nodes serve different functions, so their quality inspection rules differ; a one-to-one mapping between node identifiers and quality inspection rules is established in advance, so the server can look up the corresponding rule using the node identifier carried in the request.
- a quality inspection rule includes a video-frame extraction frequency, keywords, and forbidden words. The video-frame extraction frequency is the frequency at which video frames are extracted from the dual-recorded video for face recognition.
- keywords are words that must be spoken during dual recording; forbidden words are words that must not be spoken during dual recording. It should be understood that the quality inspection rules in this embodiment are set in advance based on experience and can also be modified as needed.
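- A quality inspection rule can be thought of as a small record keyed by node identifier. The following sketch is a hypothetical in-memory rule table; the interval, keywords, and forbidden words shown are invented examples, not values from the application.

```python
# Illustrative rule table keyed by node identifier.
QUALITY_RULES = {
    "node-02": {
        "frame_interval_s": 5,                 # extract one video frame every 5 seconds
        "keywords": ["是否同意"],               # words that must be spoken at this node
        "forbidden_words": ["保本", "稳赚"],     # words that must not be spoken
    },
}

def find_rule(node_id: str) -> dict:
    """Look up the pre-configured quality inspection rule for a node."""
    return QUALITY_RULES[node_id]
```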
- Step S206 Extract multiple frames of video images from the double-recorded video corresponding to the node to be detected according to the quality inspection rules, and perform face recognition on each frame of the extracted video images to obtain the face recognition result corresponding to each frame of the video image.
- the video detection result of the node to be detected is then obtained from the per-frame face recognition results.
- specifically, the server extracts video images from the dual-recorded video of the node to be detected at the video-frame extraction frequency given in the quality inspection rule; what is extracted each time is determined by that rule.
- for example, if the extraction frequency in the rule is one frame every 5 seconds, the server extracts one frame of video image from the terminal's dual-recorded video every 5 seconds.
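- As an illustrative sketch of this sampling step (assuming the video is read with OpenCV; the function name and interval are illustrative, not prescribed by the application), one frame can be taken every N seconds as follows:

```python
import cv2

def sample_frames(video_path: str, interval_s: float = 5.0):
    """Yield one frame every `interval_s` seconds from the recorded video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0     # fall back if FPS is unavailable
    step = max(int(round(fps * interval_s)), 1)  # frames to skip between samples
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield frame
        index += 1
    cap.release()
```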
- after each extraction, the server can perform face recognition on the extracted frame and then combine the face recognition results of all extracted frames into the video detection result of the node to be detected.
- the server may compute a face-recognition pass rate from the per-frame face recognition results and check whether it exceeds a preset threshold; if it does, the video detection result of the node to be detected is a pass, otherwise it is a failure.
- for example, if face recognition was performed on 10 frames in total and 8 of them passed, the face-recognition pass rate of the node to be detected is 80%; with a preset threshold of 75%, the video detection result of the node is a pass.
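- The pass-rate decision described above is simple arithmetic; a minimal sketch, assuming boolean per-frame results and the 75% threshold used in the example:

```python
def video_detection_result(frame_passed: list[bool], threshold: float = 0.75) -> bool:
    """Return True (pass) when the face-recognition pass rate exceeds the threshold."""
    pass_rate = sum(frame_passed) / len(frame_passed)
    return pass_rate > threshold

# 8 of 10 sampled frames passed face recognition -> 80% > 75% -> the node passes.
print(video_detection_result([True] * 8 + [False] * 2))  # True
```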
- Step S208 Obtain dual-recorded audio data from the dual-recorded video corresponding to the node to be detected, and perform voice conversion on the dual-recorded audio data to obtain dual-recorded text.
- the server may acquire the dual-recorded audio data from the terminal at a preset frequency.
- in some embodiments the server obtains the dual-recorded audio data from the terminal at fixed intervals, for example every 30 seconds; in other embodiments the server obtains all of the node's dual-recorded audio data at once after dual recording of the node to be detected has finished. After acquiring the dual-recorded audio data, the server converts the speech to text to obtain the corresponding dual-recorded text.
- Step S210 Perform word detection on the double-recorded text according to the quality inspection rules to obtain the audio detection result corresponding to the node to be detected.
- the server may first segment the dual-recorded text into words and then perform word detection on the segmented words according to the quality inspection rule: it checks whether any forbidden word listed in the rule appears among the segmented words and whether any keyword listed in the rule is missing. When no forbidden word appears and no keyword is missing, the audio detection result of the acquired dual-recorded audio data is a pass; conversely, when a forbidden word appears or a keyword is missing, the result is a failure. Further, when any one piece of the node's dual-recorded audio data fails, the audio detection result of the node to be detected is a failure.
- when the audio detection result of the node to be detected is a failure, the server can generate an initial inspection suggestion from the specific finding; for example, when the keyword "是否同意" ("do you agree") is not detected, the suggestion "keyword [是否同意] not mentioned" is generated.
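- A minimal sketch of this word detection step is shown below. It assumes the text has already been transcribed, uses the jieba segmenter purely as one possible tokenizer, and falls back to substring matching for multi-character phrases; the sample keyword, forbidden word, and suggestion format follow the example in the text, but the function itself is illustrative rather than the application's defined procedure.

```python
import jieba  # a common Chinese word-segmentation library; any tokenizer works here

def audio_detection(text: str, keywords: list[str], forbidden: list[str]):
    """Check the transcribed text against keyword / forbidden-word rules."""
    words = set(jieba.lcut(text))
    # a keyword counts as missing if it is neither a segmented token nor a substring
    missing = [k for k in keywords if k not in words and k not in text]
    banned = [f for f in forbidden if f in words or f in text]
    passed = not missing and not banned
    suggestions = [f"未提及关键词[{k}]" for k in missing]
    return passed, suggestions

ok, notes = audio_detection("本产品有风险", ["是否同意"], ["保本"])
print(ok, notes)  # False ['未提及关键词[是否同意]']
```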
- Step S212 Obtain the quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result.
- when either the video detection result or the audio detection result is a failure, the quality detection result corresponding to the node to be detected is a failure; when both the video detection result and the audio detection result are passes, the quality detection result of the node to be detected is a pass.
- further, the server may send the quality detection result of the node to be detected to the terminal, either after receiving a quality-detection-result acquisition request, or actively, as soon as the result is obtained.
- Step S214 when the quality detection result is failed, a re-recording instruction is generated according to the node identifier, and the re-recording instruction is sent to the terminal.
- the re-recording instruction is used to instruct the terminal to jump to the node corresponding to the node identifier.
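- The combination rule and the re-recording instruction can be sketched as follows; the JSON field names (`action`, `node_id`) are illustrative assumptions rather than a message format defined by the application.

```python
import json

def quality_result(video_passed: bool, audio_passed: bool) -> bool:
    """The node passes only when both the video and the audio checks pass."""
    return video_passed and audio_passed

def rerecord_instruction(node_id: str) -> str:
    """Build the instruction telling the terminal to jump back to the failed node."""
    return json.dumps({"action": "rerecord", "node_id": node_id})

if not quality_result(video_passed=True, audio_passed=False):
    print(rerecord_instruction("node-02"))  # sent to the terminal
```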
- for a node whose quality detection result is a failure, the server may generate a re-recording instruction from the node's identifier and send it to the terminal; after receiving the re-recording instruction, the terminal parses it to obtain the node identifier.
- in some embodiments, after obtaining the node identifier the terminal jumps automatically to the corresponding node; in other embodiments the terminal jumps only after an operation confirming the jump is received. For example, after receiving the re-recording instruction the terminal may show a "confirm jump" button on its display and, upon detecting the user's click on that button, jump to the node corresponding to the node identifier and start re-recording that node.
- in the dual-recording video quality detection method above, the dual-recorded video is recorded as multiple nodes, so the server can run quality detection on a single node once it receives a quality detection request for that node and, when the result is a failure, generate a re-recording instruction for that node. The terminal can therefore promptly re-record an unqualified video node, which avoids re-recording the entire video whenever some part of it is unqualified and saves the terminal's system resources.
- further, because quality inspection rules are configured in advance, after receiving a quality detection request the server looks up the corresponding rule by node identifier and can automatically check the video images and audio data of the node to be detected against that rule, obtaining the corresponding video and audio detection results and, finally, the quality detection result of the node. This realizes automatic quality detection of dual-recorded video, which saves the time of manual inspection, improves the efficiency of dual-recorded video quality detection, and can also improve its accuracy.
- in some embodiments, after the multiple frames of video images are extracted from the dual-recorded video of the node to be detected according to the quality inspection rule, the method further includes: extracting a face image from each extracted frame; extracting expression features from the face image and feeding them to a trained fraud probability prediction model to obtain the fraud probability of each frame; and obtaining the video detection result of the node to be detected from both the per-frame face recognition results and the fraud probabilities.
- specifically, the server first performs face detection on each extracted video frame to obtain a face image and extracts expression features from it with a feature extraction algorithm.
- the expression features include facial organs, texture regions, and predefined feature points.
- from these features an expression feature vector is built for the video frame and input to a pre-trained fraud probability prediction model to obtain the fraud probability.
- the fraud probability characterizes how likely the subject shown in the face image is to be committing fraud: the larger the probability, the more likely fraud is.
- face detection algorithms include, but are not limited to, algorithms based on coarse histogram segmentation and singular-value features, binary wavelet transforms, AdaBoost, and binocular facial structure features; expression feature extraction methods include, but are not limited to, Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), Gabor wavelets, and LBP operators.
- PCA Principal Component Analysis
- ICA Independent Component Analysis
- LDA Linear Discriminant Analysis
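- As a hedged illustration of how an expression feature vector might be built and scored, the sketch below uses an LBP histogram (one of the feature types listed above) and a generic pre-trained classifier exposing `predict_proba`; the application does not prescribe this particular feature or model, so treat it as one possible instantiation.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def expression_features(gray_face: np.ndarray) -> np.ndarray:
    """Build a simple LBP-histogram expression feature vector for one grayscale face image."""
    lbp = local_binary_pattern(gray_face, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return hist

def fraud_probability(model, gray_face: np.ndarray) -> float:
    """Fraud probability for the subject in the face image (class 1 = fraud)."""
    features = expression_features(gray_face).reshape(1, -1)
    return float(model.predict_proba(features)[0, 1])
```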
- in practice the agent and the customer usually record together. In that case, face detection on a video frame yields two face images: a first face image corresponding to the agent and a second face image corresponding to the customer.
- expression features are extracted from the first face image to form a first expression feature vector, which is input to the pre-trained fraud probability prediction model to obtain the first fraud probability, corresponding to the agent; likewise, expression features are extracted from the second face image to form a second expression feature vector, which is input to the model to obtain the second fraud probability, corresponding to the customer.
- the fraud probability prediction model is trained as follows: first, video samples with obvious fraudulent behaviour and video samples without fraudulent behaviour are selected from online sources or an audio/video library, and each sample is given a fraud label indicating whether the person in it is suspected of fraud, for example 1 for suspected fraud and 0 for no suspicion. Expression features are extracted from each video sample and turned into an expression feature vector; the feature vectors are used as input samples and the corresponding fraud labels as expected outputs for supervised model training, which yields the trained fraud probability prediction model.
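- A minimal sketch of such supervised training, assuming the expression feature vectors and 0/1 fraud labels have already been prepared, and using logistic regression purely as a stand-in for whatever model is actually chosen:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_fraud_model(feature_vectors: np.ndarray, fraud_labels: np.ndarray):
    """Supervised training: expression feature vectors in, 0/1 fraud labels as targets."""
    model = LogisticRegression(max_iter=1000)
    model.fit(feature_vectors, fraud_labels)
    return model

# Toy data standing in for features extracted from labelled video samples.
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=100)
model = train_fraud_model(X, y)
```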
- further, after obtaining the fraud probability of each frame, the server can average the per-frame fraud probabilities; when the average fraud probability does not exceed its preset threshold and the face-recognition pass rate exceeds its preset threshold, the video detection result of the node to be detected is a pass, otherwise it is a failure.
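- The combined decision is again simple arithmetic; a sketch with illustrative thresholds (the application only requires that both thresholds be preset, not these particular values):

```python
def video_result(face_passed: list[bool], fraud_probs: list[float],
                 pass_rate_threshold: float = 0.75,
                 fraud_threshold: float = 0.5) -> bool:
    """Pass only when the face-recognition pass rate is high enough and the
    average fraud probability stays at or below its threshold."""
    pass_rate = sum(face_passed) / len(face_passed)
    avg_fraud = sum(fraud_probs) / len(fraud_probs)
    return pass_rate > pass_rate_threshold and avg_fraud <= fraud_threshold
```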
- in some embodiments, after the dual-recorded audio data is obtained from the dual-recorded video of the node to be detected, the method further includes: extracting voiceprint features from the dual-recorded audio data, comparing the extracted voiceprint features with pre-stored voiceprint features, and marking the dual-recorded text according to the comparison result. Word detection on the dual-recorded text is then performed according to both the marking result and the quality inspection rule to obtain the audio detection result of the node to be detected.
- the pre-stored voiceprint features are the agent's voiceprint features.
- after acquiring the dual-recorded audio data, the server can extract voiceprint features from it and compare them with the agent's stored voiceprint features; when the comparison succeeds, the corresponding dual-recorded text is marked as agent speech, and when it fails, the text is marked as customer speech.
- Mel-frequency cepstral coefficients may be used to extract the voiceprint features.
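- A hedged sketch of MFCC-based speaker marking: average MFCC vectors stand in for voiceprints and cosine similarity with an illustrative threshold stands in for the comparison; real voiceprint systems are more elaborate, so this is only a minimal illustration of the idea.

```python
import librosa
import numpy as np

def voiceprint(path: str) -> np.ndarray:
    """Average MFCC vector as a crude voiceprint for one audio segment."""
    signal, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

def is_agent(segment_path: str, agent_print: np.ndarray, threshold: float = 0.9) -> bool:
    """Mark a segment as agent speech when its voiceprint matches the stored one."""
    v = voiceprint(segment_path)
    similarity = np.dot(v, agent_print) / (np.linalg.norm(v) * np.linalg.norm(agent_print))
    return similarity >= threshold
```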
- in this embodiment, to make the audio detection result more accurate, separate quality inspection rules are configured for the agent and the customer; that is, the rule may include keywords and forbidden words for the agent as well as keywords and forbidden words for the customer.
- after marking, the speech text attributed to the agent is checked against the agent's rules and the speech text attributed to the customer against the customer's rules.
- specifically, the agent's speech text is checked for missing agent keywords and for agent forbidden words, and the customer's speech text is checked likewise; only when the agent's text is missing no agent keyword and contains no agent forbidden word, and the customer's text is missing no customer keyword and contains no customer forbidden word, does the dual-recorded audio data pass the audio quality check. In all other cases it fails.
- in this embodiment, marking the dual-recorded text and checking the agent's and the customer's speech separately makes the quality detection result of the dual-recorded video more accurate.
- a dual-recorded video quality detection method including the following steps:
- Step S302 Receive a quality detection request sent by the terminal, where the quality detection request carries the node identifier of the node to be detected.
- Step S304 Search for the corresponding quality inspection rules according to the node identification.
- Step S306 Extract multiple frames of video images from the double-recorded video corresponding to the node to be detected according to the quality inspection rules, and perform face recognition on each frame of the extracted video images to obtain a face recognition result corresponding to each frame of video images.
- step S308 a face image is extracted from each frame of extracted video images.
- Step S310 Extract facial expression features from the face image, and use the trained fraud probability prediction model according to the facial expression features to obtain the fraud probability corresponding to each frame of the video image.
- Step S312 Obtain the video detection result corresponding to the current node according to the face recognition result and fraud probability corresponding to each frame of the video image.
- Step S314 Obtain dual-recorded audio data from the dual-recorded video corresponding to the node to be detected, and perform voice conversion on the dual-recorded audio data to obtain dual-recorded text.
- Step S316 extract voiceprint features from the double-recorded audio data, compare the extracted voiceprint features with pre-stored voiceprint features, and mark the double-recorded text according to the comparison result.
- Step S318 Perform word detection on the double-recorded text according to the marking result and the quality inspection rules to obtain the audio detection result corresponding to the node to be detected.
- Step S320 Obtain the quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result.
- Step S322 when the quality detection result is failed, a re-recording instruction is generated according to the node identifier, and the re-recording instruction is sent to the terminal.
- the re-recording instruction is used to instruct the terminal to jump to the node corresponding to the node identifier.
- in some embodiments, performing face recognition on each extracted video frame to obtain the per-frame face recognition result includes: performing face detection on each extracted frame to obtain a first face image and a second face image; comparing the first face image and the second face image, each against the pre-stored face images, to obtain two face recognition scores for the first face image and two for the second; taking the larger of the first face image's two scores as the first target face recognition score and the larger of the second face image's two scores as the second target face recognition score; and obtaining the face recognition result of the frame from the first and second target face recognition scores.
- the first face image is the face image corresponding to the agent and the second face image is the face image corresponding to the customer.
- when both the agent's target face recognition score and the customer's target face recognition score exceed their respective score thresholds, the face recognition result of that frame is a pass; when either of them does not exceed its corresponding threshold, the face recognition result of that frame is a failure.
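- The score-selection logic can be sketched as follows; `score_fn` is a placeholder for whatever face-comparison function is used (for example, similarity between face embeddings), and the thresholds are illustrative assumptions.

```python
import numpy as np

def target_scores(detected: list[np.ndarray], stored_agent: np.ndarray,
                  stored_customer: np.ndarray, score_fn) -> tuple[float, float]:
    """Compare each detected face against both stored faces and keep the
    higher score as that face's target recognition score."""
    first, second = detected  # first: agent's face image, second: customer's
    first_target = max(score_fn(first, stored_agent), score_fn(first, stored_customer))
    second_target = max(score_fn(second, stored_agent), score_fn(second, stored_customer))
    return first_target, second_target

def frame_passes(agent_score: float, customer_score: float,
                 agent_threshold: float = 0.8, customer_threshold: float = 0.8) -> bool:
    """The frame passes only when both target scores exceed their thresholds."""
    return agent_score > agent_threshold and customer_score > customer_threshold
```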
- although the steps in the flowcharts of FIGS. 2-3 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may comprise multiple sub-steps or stages, which need not be completed at the same time but may be executed at different times, and whose execution order need not be sequential; they may be executed in turn or alternately with at least part of the other steps or of the sub-steps or stages of other steps.
- a dual-recording video quality detection device 400 including:
- the quality detection request module 402 is used to receive the quality detection request sent by the terminal, and the quality detection request carries the node identifier of the node to be detected;
- the quality inspection rule search module 404 is used to search for the corresponding quality inspection rule according to the node identification
- the video detection result acquisition module 406 is configured to extract multiple frames of video images from the dual-recorded video of the node to be detected according to the quality inspection rule, perform face recognition on each extracted frame to obtain the per-frame face recognition results, and obtain the video detection result of the node to be detected from those results;
- the double-recorded text acquisition module 408 is used to obtain double-recorded audio data from the double-recorded video corresponding to the node to be detected, and perform voice conversion on the double-recorded audio data to obtain the double-recorded text;
- the word detection module 410 is used to perform word detection on the double-recorded text according to the quality inspection rules to obtain the audio detection result corresponding to the node to be detected;
- the quality detection result judgment module 412 is used to obtain the quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result;
- the re-recording instruction generation module 414 is used to generate a re-recording instruction according to the node identifier when the quality detection result is failed; send the re-recording instruction to the terminal, and the re-recording instruction is used to instruct the terminal to jump to the node corresponding to the node identifier.
- the video detection result acquisition module 406 is further configured to extract a face image from each extracted video frame, extract expression features from the face image, use the trained fraud probability prediction model on those features to obtain the fraud probability of each frame, and obtain the video detection result of the node to be detected from the per-frame face recognition results and fraud probabilities.
- the dual-recorded text acquisition module 408 is further configured to extract voiceprint features from the dual-recorded audio data, compare the extracted voiceprint features with pre-stored voiceprint features, and mark the dual-recorded text according to the comparison result;
- the word detection module 410 is also used to perform word detection on the double-recorded text according to the marking result and the quality inspection rules to obtain the audio detection result corresponding to the node to be detected.
- the video detection result acquisition module 406 is further configured to perform face detection on each extracted video frame to obtain a first face image and a second face image; compare the first and second face images, each against the pre-stored face images, to obtain two face recognition scores for the first face image and two for the second; take the larger of the first face image's two scores as the first target face recognition score and the larger of the second face image's two scores as the second target face recognition score; and obtain the per-frame face recognition result from the first and second target face recognition scores.
- each module in the above-mentioned dual-recording video quality detection device may be implemented in whole or in part by software, hardware, or a combination thereof.
- each of the above modules may be embedded, in hardware form, in or independently of the processor of the computer device, or stored in software form in the memory of the computer device so that the processor can invoke and execute the operations corresponding to each module.
- a computer device is provided.
- the computer device may be a server, and an internal structure diagram thereof may be as shown in FIG. 5.
- the computer device includes a processor, memory, network interface, and database connected by a system bus.
- the processor of the computer device is used to provide computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
- the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
- the database of the computer device is used to store relevant data in the process of double-recorded video quality detection.
- the network interface of the computer device is used to communicate with external terminals through a network connection. When the computer-readable instructions are executed by the processor, a method for detecting dual-recorded video quality is realized.
- FIG. 5 is only a block diagram of part of the structure relevant to the solution of the present application and does not limit the computer devices to which the solution can be applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
- a computer device includes a memory and one or more processors.
- the memory stores computer-readable instructions.
- when the computer-readable instructions are executed by the processor, the steps of the dual-recording video quality detection method provided in any embodiment of the present application are implemented.
- One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to implement the steps of the dual-recording video quality detection method provided in any embodiment of the present application.
- Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory can include random access memory (RAM) or external cache memory.
- By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Abstract
A dual-recording video quality detection method, comprising: receiving a quality detection request sent by a terminal, the request carrying the node identifier of the node to be detected; looking up the corresponding quality inspection rule according to the node identifier; extracting multiple frames of video images from the dual-recorded video corresponding to the node to be detected according to the rule, performing face recognition on each extracted frame to obtain the corresponding face recognition results, and obtaining the video detection result from those results; obtaining dual-recorded audio data from the dual-recorded video corresponding to the node to be detected and performing speech conversion on it to obtain the dual-recorded text; performing word detection on the dual-recorded text according to the quality inspection rule to obtain the audio detection result; obtaining the quality detection result of the node to be detected from the video and audio detection results; and, when the quality detection result is a failure, generating a re-recording instruction according to the node identifier and sending it to the terminal, the re-recording instruction instructing the terminal to jump to the node corresponding to the node identifier.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to the Chinese patent application No. 2019100074351, entitled "Dual-recording video quality detection method, apparatus, computer device and storage medium", filed with the Chinese Patent Office on January 4, 2019, the entire contents of which are incorporated herein by reference.
The present application relates to a dual-recording video quality detection method, apparatus, computer device, and storage medium.
In response to the requirement to strengthen audio and video recording of business transactions in the Notice of the General Office of the China Banking Regulatory Commission on Strengthening Internal Control Management of Banking Financial Institutions to Effectively Prevent Counter Business Risks, major banking financial institutions and insurance companies have added dual recording (simultaneous audio and video recording) when selling insurance products, wealth-management products, and the like.
In the conventional approach, the dual-recording terminal sends the entire dual-recorded video to the quality inspector's terminal after recording finishes; if the inspection of the whole video finds any part that does not meet the requirements, the dual-recording terminal has to re-record the entire video. The inventor realized that, because the whole video must be re-recorded every time, this wastes the system resources of the dual-recording terminal.
SUMMARY
According to various embodiments disclosed in the present application, a dual-recording video quality detection method, apparatus, computer device, and storage medium are provided.
A dual-recording video quality detection method includes:
receiving a quality detection request sent by a terminal, the quality detection request carrying a node identifier of a node to be detected;
looking up a corresponding quality inspection rule according to the node identifier;
extracting multiple frames of video images from the dual-recorded video corresponding to the node to be detected according to the quality inspection rule, performing face recognition on each extracted frame of the video images to obtain a face recognition result corresponding to each frame, and obtaining a video detection result of the node to be detected according to the per-frame face recognition results;
obtaining dual-recorded audio data from the dual-recorded video corresponding to the node to be detected and performing speech conversion on the dual-recorded audio data to obtain dual-recorded text;
performing word detection on the dual-recorded text according to the quality inspection rule to obtain an audio detection result corresponding to the node to be detected; obtaining a quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result;
when the quality detection result is a failure, generating a re-recording instruction according to the node identifier; and
sending the re-recording instruction to the terminal, the re-recording instruction being used to instruct the terminal to jump to the node corresponding to the node identifier.
A dual-recording video quality detection apparatus includes:
a quality detection request module configured to receive a quality detection request sent by a terminal, the quality detection request carrying a node identifier of a node to be detected;
a quality inspection rule search module configured to look up a corresponding quality inspection rule according to the node identifier;
a video detection result acquisition module configured to extract multiple frames of video images from the dual-recorded video corresponding to the node to be detected according to the quality inspection rule, perform face recognition on each extracted frame to obtain a face recognition result corresponding to each frame, and obtain a video detection result of the node to be detected according to the per-frame face recognition results;
a dual-recorded text acquisition module configured to obtain dual-recorded audio data from the dual-recorded video corresponding to the node to be detected and perform speech conversion on the dual-recorded audio data to obtain dual-recorded text;
a word detection module configured to perform word detection on the dual-recorded text according to the quality inspection rule to obtain an audio detection result corresponding to the node to be detected; a quality detection result judgment module configured to obtain a quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result; and
a re-recording instruction generation module configured to, when the quality detection result is a failure, generate a re-recording instruction according to the node identifier and send it to the terminal, the re-recording instruction being used to instruct the terminal to jump to the node corresponding to the node identifier.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the processors, cause the one or more processors to implement the steps of the above dual-recording video quality detection method.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the above dual-recording video quality detection method.
The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features and advantages of the present application will become apparent from the specification, the drawings, and the claims.
To describe the technical solutions in the embodiments of the present application more clearly, the drawings required in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an application scenario diagram of a dual-recording video quality detection method according to one or more embodiments.
FIG. 2 is a schematic flowchart of a dual-recording video quality detection method according to one or more embodiments.
FIG. 3 is a schematic flowchart of a dual-recording video quality detection method according to one or more embodiments.
FIG. 4 is a block diagram of a dual-recording video quality detection apparatus according to one or more embodiments.
FIG. 5 is a block diagram of a computer device according to one or more embodiments.
To make the technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The dual-recording video quality detection method provided by the present application can be applied in the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network. When the terminal 102 enters any dual-recording node, that node is taken as the node to be detected, a quality detection request is generated from the node identifier of the node to be detected and sent to the server 104. The server 104 looks up the corresponding quality inspection rule according to the node identifier carried in the request, extracts multiple frames of video images from the dual-recorded video corresponding to the node to be detected according to the rule, performs face recognition on each extracted frame to obtain the per-frame face recognition results, and obtains the video detection result of the node to be detected from those results; it also obtains dual-recorded audio data from the dual-recorded video corresponding to the node to be detected, performs speech conversion on it to obtain the dual-recorded text, performs word detection on the text according to the quality inspection rule to obtain the audio detection result of the node to be detected, and finally obtains the quality detection result of the node from the video and audio detection results. When the quality detection result is a failure, a re-recording instruction is generated according to the node identifier and sent to the terminal 102, and the terminal 102 can jump to the node corresponding to the node identifier and re-record.
The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In some embodiments, as shown in FIG. 2, a dual-recording video quality detection method is provided. The method is described by taking its application to the server in FIG. 1 as an example and includes the following steps:
Step S202: receive a quality detection request sent by the terminal, the quality detection request carrying the node identifier of the node to be detected.
Specifically, the dual-recording process consists of multiple nodes, each corresponding to a different stage of the dual recording. For example, the dual recording for a bank wealth-management product may include the following three nodes: 1. describe the specific product the customer is purchasing; 2. ask for the customer's consent; 3. state the specific disclaimer clauses. The terminal may enter the nodes automatically for dual recording in a preset order, or may enter a node when a trigger event for that node is detected. The trigger event may be a click operation on the terminal, for example the user clicking a button to switch to the next node, or the user's voice, for example a fixed phrase spoken to trigger entry to the next node. When the terminal enters a node and starts dual recording, a quality detection request is generated from that node's identifier and sent to the server; after receiving it, the server parses the request to obtain the node identifier carried in it and then performs real-time quality detection on the dual-recorded video of that node. The node identifier uniquely identifies a dual-recording node.
Step S204: look up the corresponding quality inspection rule according to the node identifier.
Specifically, different nodes have different functions, so their quality inspection rules also differ; a one-to-one mapping between quality inspection rules and node identifiers is established in advance, so after receiving a quality detection request the server can look up the corresponding rule by the node identifier carried in the request. A quality inspection rule includes a video-frame extraction frequency, keywords, and forbidden words: the video-frame extraction frequency is the frequency at which video frames are extracted from the dual-recorded video for face recognition, keywords are words that must be spoken during dual recording, and forbidden words are words that must not be spoken during dual recording. It should be understood that the quality inspection rules in this embodiment are set in advance based on experience and can also be configured and modified as needed.
Step S206: extract multiple frames of video images from the dual-recorded video corresponding to the node to be detected according to the quality inspection rule, perform face recognition on each extracted frame to obtain the per-frame face recognition results, and obtain the video detection result of the node to be detected from those results.
Specifically, the server extracts video images from the dual-recorded video of the node to be detected at the video-frame extraction frequency given in the quality inspection rule, and what is extracted each time is determined by that rule. For example, if the extraction frequency in the rule is one frame every 5 seconds, the server extracts one frame of video image from the terminal's dual-recorded video every 5 seconds. Further, after each extraction the server can perform face recognition on the extracted frame and then combine the per-frame face recognition results into the video detection result of the node to be detected.
In some embodiments, the server may compute a face-recognition pass rate from the per-frame face recognition results and check whether it exceeds a preset threshold; if so, the video detection result of the node to be detected is a pass, otherwise it is a failure. For example, if face recognition was performed on 10 frames in total and 8 of them passed, the face-recognition pass rate of the node is 80%; with a preset threshold of 75%, the video detection result of the node is a pass.
Step S208: obtain dual-recorded audio data from the dual-recorded video corresponding to the node to be detected and perform speech conversion on it to obtain the dual-recorded text.
Specifically, the server may acquire the dual-recorded audio data from the terminal at a preset frequency. In some embodiments the server obtains the audio data at fixed intervals, for example every 30 seconds; in other embodiments the server obtains all of the node's dual-recorded audio data at once after dual recording of the node to be detected has finished. Further, after acquiring the dual-recorded audio data, the server performs speech conversion on it to obtain the corresponding dual-recorded text.
Step S210: perform word detection on the dual-recorded text according to the quality inspection rule to obtain the audio detection result corresponding to the node to be detected. Specifically, the server may first segment the dual-recorded text into words and perform word detection on the segmented words according to the quality inspection rule, i.e., check whether any forbidden word listed in the rule appears among the segmented words and whether any keyword listed in the rule is missing. When no forbidden word appears and no keyword is missing, the audio detection result of the acquired dual-recorded audio data is a pass; conversely, when a forbidden word appears or a keyword is missing, the result is a failure. Further, when the audio detection result of any one of the pieces of dual-recorded audio data of the node to be detected is a failure, the audio detection result of the node to be detected is a failure.
Further, when the audio detection result of the node to be detected is a failure, the server may generate an initial inspection suggestion from the specific finding; for example, when the keyword "是否同意" ("do you agree") is not detected, the suggestion "keyword [是否同意] not mentioned" is generated.
Step S212: obtain the quality detection result corresponding to the node to be detected from the video detection result and the audio detection result.
Specifically, when either the video detection result or the audio detection result is a failure, the quality detection result of the node to be detected is a failure; when both are passes, the quality detection result of the node is a pass.
Further, the server may send the quality detection result of the node to be detected to the terminal, either after receiving a quality-detection-result acquisition request, or actively, as soon as the result is obtained.
Step S214: when the quality detection result is a failure, generate a re-recording instruction according to the node identifier and send it to the terminal, the re-recording instruction instructing the terminal to jump to the node corresponding to the node identifier.
Specifically, for a node whose quality detection result is a failure, the server may generate a re-recording instruction from the node's identifier and send it to the terminal; after receiving the re-recording instruction, the terminal parses it to obtain the node identifier.
In some embodiments, after obtaining the node identifier the terminal jumps automatically to the corresponding node; in other embodiments the terminal jumps only after an operation confirming the jump is received. For example, after receiving the re-recording instruction the terminal may display a "confirm jump" button and, on detecting the user's click on that button, jump to the node corresponding to the node identifier and start re-recording that node.
In the dual-recording video quality detection method above, the dual-recorded video is recorded as multiple nodes, so after receiving a quality detection request for a single node the server can run quality detection on that node alone and, when the result is a failure, generate a re-recording instruction for that node. The terminal can therefore promptly re-record an unqualified video node, which avoids re-recording the whole video whenever some part of it is unqualified and saves the terminal's system resources.
Further, because quality inspection rules are set in advance, after receiving a quality detection request the server looks up the corresponding rule by node identifier and can automatically check the video images and audio data of the node to be detected against that rule, obtaining the corresponding video and audio detection results and finally the quality detection result of the node. This realizes automatic quality detection of dual-recorded video, which not only saves the time of manual inspection and improves detection efficiency but can also improve detection accuracy.
In some embodiments, after the multiple frames of video images are extracted from the dual-recorded video of the node to be detected according to the quality inspection rule, the method further includes: extracting a face image from each extracted frame; extracting expression features from the face image and using a trained fraud probability prediction model on those features to obtain the fraud probability of each frame. Obtaining the video detection result of the node to be detected from the per-frame face recognition results then includes obtaining it from both the per-frame face recognition results and the fraud probabilities.
Specifically, the server first performs face detection on each extracted frame to obtain a face image and extracts expression features from it with a feature extraction algorithm; the expression features include facial organs, texture regions, predefined feature points, and the like. An expression feature vector is built from these features and input to the pre-trained fraud probability prediction model to obtain the fraud probability, which characterizes how likely the subject in the face image is to be committing fraud: the larger the probability, the more likely fraud is. Face detection algorithms include, but are not limited to, algorithms based on coarse histogram segmentation and singular-value features, binary wavelet transforms, AdaBoost, and binocular facial structure features; expression feature extraction methods include, but are not limited to, Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), Gabor wavelets, and LBP operators.
In practice, in most scenarios the agent and the customer record together. In that case, face detection on a video frame yields two face images: a first face image corresponding to the agent and a second face image corresponding to the customer. Expression features are extracted from the first face image to form a first expression feature vector, which is input to the pre-trained fraud probability prediction model to obtain the agent's first fraud probability; likewise, expression features extracted from the second face image form a second expression feature vector, which is input to the model to obtain the customer's second fraud probability.
In some embodiments, the fraud probability prediction model is trained as follows: video samples with obvious fraudulent behaviour and video samples without fraudulent behaviour are selected from online sources or an audio/video library, and each sample is given a fraud label indicating whether the person in it is suspected of fraud, for example 1 for suspected fraud and 0 for no suspicion; expression features are extracted from each sample and turned into expression feature vectors, which are used as input samples, with the corresponding fraud labels as expected outputs, for supervised model training, yielding the trained fraud probability prediction model.
Further, after obtaining the fraud probability of each frame, the server can average the per-frame fraud probabilities; when the average does not exceed its preset threshold and the face-recognition pass rate exceeds its preset threshold, the video detection result of the node to be detected is a pass, otherwise it is a failure.
In some embodiments, after the dual-recorded audio data is obtained from the dual-recorded video of the node to be detected, the method further includes: extracting voiceprint features from the dual-recorded audio data, comparing them with pre-stored voiceprint features, and marking the dual-recorded text according to the comparison result; performing word detection on the dual-recorded text according to the quality inspection rule then means performing word detection according to both the marking result and the rule to obtain the audio detection result of the node to be detected.
Specifically, the pre-stored voiceprint features are the agent's voiceprint features. After acquiring the dual-recorded audio data, the server can extract voiceprint features from it and compare them with the agent's stored voiceprint features; when the comparison succeeds, the corresponding dual-recorded text is marked as agent speech, and when it fails, the text is marked as customer speech. Mel-frequency cepstral coefficients may be used when extracting the voiceprint features.
In this embodiment, to make the audio detection result more accurate, separate quality inspection rules are configured for the agent and the customer, i.e., the rule may include the agent's keywords and forbidden words as well as the customer's keywords and forbidden words. After marking, the server checks the agent's speech text against the agent's rules and the customer's speech text against the customer's rules: the agent's text is checked for missing agent keywords and for agent forbidden words, and the customer's text likewise. Only when the agent's text is missing no agent keyword and contains no agent forbidden word, and the customer's text is missing no customer keyword and contains no customer forbidden word, does the dual-recorded audio data pass the audio quality check; in all other cases it fails.
In this embodiment, by marking the dual-recorded text and checking the agent's and the customer's speech separately, the quality detection result of the dual-recorded video can be made more accurate.
In some embodiments, as shown in FIG. 3, a dual-recording video quality detection method is provided, comprising the following steps:
Step S302: receive a quality detection request sent by the terminal, the request carrying the node identifier of the node to be detected.
Step S304: look up the corresponding quality inspection rule according to the node identifier.
Step S306: extract multiple frames of video images from the dual-recorded video corresponding to the node to be detected according to the quality inspection rule, and perform face recognition on each extracted frame to obtain the per-frame face recognition results.
Step S308: extract a face image from each extracted video frame.
Step S310: extract expression features from the face image and use the trained fraud probability prediction model on those features to obtain the fraud probability of each frame.
Step S312: obtain the video detection result of the current node from the per-frame face recognition results and fraud probabilities.
Step S314: obtain dual-recorded audio data from the dual-recorded video corresponding to the node to be detected and perform speech conversion on it to obtain the dual-recorded text.
Step S316: extract voiceprint features from the dual-recorded audio data, compare them with pre-stored voiceprint features, and mark the dual-recorded text according to the comparison result.
Step S318: perform word detection on the dual-recorded text according to the marking result and the quality inspection rule to obtain the audio detection result of the node to be detected.
Step S320: obtain the quality detection result of the node to be detected from the video detection result and the audio detection result.
Step S322: when the quality detection result is a failure, generate a re-recording instruction according to the node identifier and send it to the terminal, the re-recording instruction instructing the terminal to jump to the node corresponding to the node identifier.
In the above embodiment, fraud detection on the dual-recorded video images and marked detection of the dual-recorded audio further improve the accuracy of dual-recording video quality detection.
In some embodiments, performing face recognition on each extracted video frame to obtain the per-frame face recognition result includes: performing face detection on each extracted frame to obtain a first face image and a second face image; comparing the first and second face images, each against the pre-stored face images, to obtain two face recognition scores for the first face image and two for the second; taking the larger of the first face image's two scores as its first target face recognition score and the larger of the second face image's two scores as its second target face recognition score; and obtaining the face recognition result of the frame from the first and second target face recognition scores.
The first face image is the agent's face image and the second face image is the customer's. When the face images are detected, the identity of each detected face is not yet known, so each of the two detected faces is compared with both the pre-stored agent face image and the pre-stored customer face image. In these comparisons, two face images of matching identity obviously score higher than two of mismatching identity, so the larger of each face's comparison scores can be taken as its target face recognition score, giving the agent's target score and the customer's target score respectively.
Further, when both the agent's target face recognition score and the customer's target face recognition score exceed their respective score thresholds, the face recognition result of the frame is a pass; when either does not exceed its corresponding threshold, the face recognition result of the frame is a failure.
It should be understood that, although the steps in the flowcharts of FIGS. 2-3 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, their execution order is not strictly limited and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may comprise multiple sub-steps or stages which need not be completed at the same time but may be executed at different times, and whose execution order need not be sequential; they may be executed in turn or alternately with at least part of the other steps or of the sub-steps or stages of other steps.
In some embodiments, as shown in FIG. 4, a dual-recording video quality detection apparatus 400 is provided, comprising:
a quality detection request module 402 configured to receive a quality detection request sent by a terminal, the request carrying the node identifier of the node to be detected;
a quality inspection rule search module 404 configured to look up the corresponding quality inspection rule according to the node identifier;
a video detection result acquisition module 406 configured to extract multiple frames of video images from the dual-recorded video corresponding to the node to be detected according to the quality inspection rule, perform face recognition on each extracted frame to obtain the per-frame face recognition results, and obtain the video detection result of the node to be detected from those results;
a dual-recorded text acquisition module 408 configured to obtain dual-recorded audio data from the dual-recorded video corresponding to the node to be detected and perform speech conversion on it to obtain the dual-recorded text;
a word detection module 410 configured to perform word detection on the dual-recorded text according to the quality inspection rule to obtain the audio detection result of the node to be detected;
a quality detection result judgment module 412 configured to obtain the quality detection result of the node to be detected from the video detection result and the audio detection result; and
a re-recording instruction generation module 414 configured to, when the quality detection result is a failure, generate a re-recording instruction according to the node identifier and send it to the terminal, the re-recording instruction instructing the terminal to jump to the node corresponding to the node identifier.
In some embodiments, the video detection result acquisition module 406 is further configured to extract a face image from each extracted video frame, extract expression features from it, use the trained fraud probability prediction model on those features to obtain the per-frame fraud probability, and obtain the video detection result of the node to be detected from the per-frame face recognition results and fraud probabilities.
In some embodiments, the dual-recorded text acquisition module 408 is further configured to extract voiceprint features from the dual-recorded audio data, compare them with pre-stored voiceprint features, and mark the dual-recorded text according to the comparison result; the word detection module 410 is further configured to perform word detection on the dual-recorded text according to the marking result and the quality inspection rule to obtain the audio detection result of the node to be detected.
In some embodiments, the video detection result acquisition module 406 is further configured to perform face detection on each extracted frame to obtain a first face image and a second face image; compare the first and second face images, each against the pre-stored face images, to obtain two face recognition scores for each; take the larger score of the first face image as its first target face recognition score and the larger score of the second face image as its second target face recognition score; and obtain the per-frame face recognition result from the first and second target face recognition scores.
For the specific limitations of the dual-recording video quality detection apparatus, reference may be made to the limitations of the dual-recording video quality detection method above, which are not repeated here. Each module of the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof; the modules may be embedded, in hardware form, in or independently of the processor of the computer device, or stored in software form in the memory of the computer device so that the processor can invoke and execute the operations corresponding to each module.
In some embodiments, a computer device is provided. The computer device may be a server whose internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor provides computing and control capability. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, computer-readable instructions, and a database, and the internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database stores data related to the dual-recording video quality detection process. The network interface communicates with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a dual-recording video quality detection method.
A person skilled in the art will understand that the structure shown in FIG. 5 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the processor, implement the steps of the dual-recording video quality detection method provided in any embodiment of the present application.
One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the dual-recording video quality detection method provided in any embodiment of the present application.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by computer-readable instructions instructing the related hardware; the computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily; for brevity, not all possible combinations of these features have been described, but as long as a combination of technical features contains no contradiction it should be regarded as within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (20)
- A dual-recording video quality detection method, comprising: receiving a quality detection request sent by a terminal, the quality detection request carrying a node identifier of a node to be detected; looking up a corresponding quality inspection rule according to the node identifier; extracting multiple frames of video images from a dual-recorded video corresponding to the node to be detected according to the quality inspection rule, performing face recognition on each extracted frame of the video images to obtain a face recognition result corresponding to each frame, and obtaining a video detection result of the node to be detected according to the face recognition results corresponding to the frames; obtaining dual-recorded audio data from the dual-recorded video corresponding to the node to be detected, and performing speech conversion on the dual-recorded audio data to obtain dual-recorded text; performing word detection on the dual-recorded text according to the quality inspection rule to obtain an audio detection result corresponding to the node to be detected; obtaining a quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result; when the quality detection result is a failure, generating a re-recording instruction according to the node identifier; and sending the re-recording instruction to the terminal, the re-recording instruction being used to instruct the terminal to jump to the node corresponding to the node identifier.
- The method according to claim 1, wherein after the extracting of the multiple frames of video images from the dual-recorded video corresponding to the node to be detected according to the quality inspection rule, the method comprises: extracting a face image from each extracted frame of the video images; extracting expression features from the face image and using a trained fraud probability prediction model according to the expression features to obtain a fraud probability corresponding to each frame of the video images; and the obtaining of the video detection result of the node to be detected according to the per-frame face recognition results comprises: obtaining the video detection result corresponding to the node to be detected according to the face recognition result and the fraud probability corresponding to each frame of the video images.
- The method according to claim 1, wherein after the obtaining of the dual-recorded audio data from the dual-recorded video corresponding to the node to be detected, the method further comprises: extracting voiceprint features from the dual-recorded audio data, comparing the extracted voiceprint features with pre-stored voiceprint features, and marking the dual-recorded text according to the comparison result; and the performing of word detection on the dual-recorded text according to the quality inspection rule to obtain the audio detection result corresponding to the node to be detected comprises: performing word detection on the dual-recorded text according to the marking result and the quality inspection rule to obtain the audio detection result corresponding to the node to be detected.
- The method according to any one of claims 1 to 3, wherein the performing of face recognition on each extracted frame of the video images to obtain the face recognition result corresponding to each frame comprises: performing face detection on each extracted frame of the video images to obtain a first face image and a second face image; comparing the first face image and the second face image respectively with pre-stored face images to obtain two face recognition scores corresponding to the first face image and two face recognition scores corresponding to the second face image; taking the larger of the two face recognition scores corresponding to the first face image as a first target face recognition score corresponding to the first face image, and taking the largest of the two face recognition scores corresponding to the second face image as a second target face recognition score corresponding to the second face image; and obtaining the face recognition result corresponding to each frame of the video images according to the first target face recognition score and the second target face recognition score.
- The method according to claim 1, wherein the obtaining of the video detection result of the node to be detected according to the per-frame face recognition results comprises: calculating a face-recognition pass rate according to the face recognition results of the frames; determining whether the face-recognition pass rate is greater than a preset threshold; if so, determining that the video detection result corresponding to the node to be detected is a first result, the first result indicating that the video quality detection of the node to be detected passes; and if not, determining that the video detection result corresponding to the node to be detected is a second result, the second result indicating that the video quality detection of the node to be detected fails.
- The method according to claim 2, wherein the fraud probability prediction model is trained by the following steps: obtaining video samples from a preset video library, the video samples comprising first video samples and second video samples, the first video samples being video samples in which fraudulent behaviour is present and the second video samples being video samples in which no fraudulent behaviour is present; obtaining fraud labels corresponding to the video samples; and extracting expression features from the video samples, obtaining expression feature vectors from the extracted expression features, and performing model training with the obtained expression feature vectors as input samples and the corresponding fraud labels as expected output samples, to obtain the trained fraud probability prediction model.
- A dual-recording video quality detection apparatus, comprising: a quality detection request module configured to receive a quality detection request sent by a terminal, the quality detection request carrying a node identifier of a node to be detected; a quality inspection rule search module configured to look up a corresponding quality inspection rule according to the node identifier; a video detection result acquisition module configured to extract multiple frames of video images from a dual-recorded video corresponding to the node to be detected according to the quality inspection rule, perform face recognition on each extracted frame of the video images to obtain a face recognition result corresponding to each frame, and obtain a video detection result of the node to be detected according to the per-frame face recognition results; a dual-recorded text acquisition module configured to obtain dual-recorded audio data from the dual-recorded video corresponding to the node to be detected and perform speech conversion on the dual-recorded audio data to obtain dual-recorded text; a word detection module configured to perform word detection on the dual-recorded text according to the quality inspection rule to obtain an audio detection result corresponding to the node to be detected; a quality detection result judgment module configured to obtain a quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result; and a re-recording instruction generation module configured to, when the quality detection result is a failure, generate a re-recording instruction according to the node identifier and send the re-recording instruction to the terminal, the re-recording instruction being used to instruct the terminal to jump to the node corresponding to the node identifier.
- The apparatus according to claim 7, wherein the video detection result acquisition module is further configured to extract a face image from each extracted frame of the video images; extract expression features from the face image and use a trained fraud probability prediction model according to the expression features to obtain a fraud probability corresponding to each frame of the video images; and obtain the video detection result corresponding to the node to be detected according to the face recognition result and the fraud probability corresponding to each frame of the video images.
- A computer device comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps: receiving a quality detection request sent by a terminal, the quality detection request carrying a node identifier of a node to be detected; looking up a corresponding quality inspection rule according to the node identifier; extracting multiple frames of video images from a dual-recorded video corresponding to the node to be detected according to the quality inspection rule, performing face recognition on each extracted frame of the video images to obtain a face recognition result corresponding to each frame, and obtaining a video detection result of the node to be detected according to the per-frame face recognition results; obtaining dual-recorded audio data from the dual-recorded video corresponding to the node to be detected, and performing speech conversion on the dual-recorded audio data to obtain dual-recorded text; performing word detection on the dual-recorded text according to the quality inspection rule to obtain an audio detection result corresponding to the node to be detected; obtaining a quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result; when the quality detection result is a failure, generating a re-recording instruction according to the node identifier; and sending the re-recording instruction to the terminal, the re-recording instruction being used to instruct the terminal to jump to the node corresponding to the node identifier.
- The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps: extracting a face image from each extracted frame of the video images; extracting expression features from the face image and using a trained fraud probability prediction model according to the expression features to obtain a fraud probability corresponding to each frame of the video images; and obtaining the video detection result corresponding to the node to be detected according to the face recognition result and the fraud probability corresponding to each frame of the video images.
- The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps: extracting voiceprint features from the dual-recorded audio data, comparing the extracted voiceprint features with pre-stored voiceprint features, and marking the dual-recorded text according to the comparison result; and performing word detection on the dual-recorded text according to the marking result and the quality inspection rule to obtain the audio detection result corresponding to the node to be detected.
- The computer device according to any one of claims 9 to 11, wherein the processor, when executing the computer-readable instructions, further performs the following steps: performing face detection on each extracted frame of the video images to obtain a first face image and a second face image; comparing the first face image and the second face image respectively with pre-stored face images to obtain two face recognition scores corresponding to the first face image and two face recognition scores corresponding to the second face image; taking the larger of the two face recognition scores corresponding to the first face image as a first target face recognition score corresponding to the first face image, and taking the largest of the two face recognition scores corresponding to the second face image as a second target face recognition score corresponding to the second face image; and obtaining the face recognition result corresponding to each frame of the video images according to the first target face recognition score and the second target face recognition score.
- The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further performs the following steps: calculating a face-recognition pass rate according to the face recognition results of the frames; determining whether the face-recognition pass rate is greater than a preset threshold; if so, determining that the video detection result corresponding to the node to be detected is a first result, the first result indicating that the video quality detection of the node to be detected passes; and if not, determining that the video detection result corresponding to the node to be detected is a second result, the second result indicating that the video quality detection of the node to be detected fails.
- The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps: obtaining video samples from a preset video library, the video samples comprising first video samples in which fraudulent behaviour is present and second video samples in which no fraudulent behaviour is present; obtaining fraud labels corresponding to the video samples; and extracting expression features from the video samples, obtaining expression feature vectors from the extracted expression features, and performing model training with the obtained expression feature vectors as input samples and the corresponding fraud labels as expected output samples, to obtain the trained fraud probability prediction model.
- One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: receiving a quality detection request sent by a terminal, the quality detection request carrying a node identifier of a node to be detected; looking up a corresponding quality inspection rule according to the node identifier; extracting multiple frames of video images from a dual-recorded video corresponding to the node to be detected according to the quality inspection rule, performing face recognition on each extracted frame of the video images to obtain a face recognition result corresponding to each frame, and obtaining a video detection result of the node to be detected according to the per-frame face recognition results; obtaining dual-recorded audio data from the dual-recorded video corresponding to the node to be detected, and performing speech conversion on the dual-recorded audio data to obtain dual-recorded text; performing word detection on the dual-recorded text according to the quality inspection rule to obtain an audio detection result corresponding to the node to be detected; obtaining a quality detection result corresponding to the node to be detected according to the video detection result and the audio detection result; when the quality detection result is a failure, generating a re-recording instruction according to the node identifier; and sending the re-recording instruction to the terminal, the re-recording instruction being used to instruct the terminal to jump to the node corresponding to the node identifier.
- The storage media according to claim 15, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: extracting a face image from each extracted frame of the video images; extracting expression features from the face image and using a trained fraud probability prediction model according to the expression features to obtain a fraud probability corresponding to each frame of the video images; and obtaining the video detection result corresponding to the node to be detected according to the face recognition result and the fraud probability corresponding to each frame of the video images.
- The storage media according to claim 15, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: extracting voiceprint features from the dual-recorded audio data, comparing the extracted voiceprint features with pre-stored voiceprint features, and marking the dual-recorded text according to the comparison result; and performing word detection on the dual-recorded text according to the marking result and the quality inspection rule to obtain the audio detection result corresponding to the node to be detected.
- The storage media according to any one of claims 15 to 17, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: performing face detection on each extracted frame of the video images to obtain a first face image and a second face image; comparing the first face image and the second face image respectively with pre-stored face images to obtain two face recognition scores corresponding to the first face image and two face recognition scores corresponding to the second face image; taking the larger of the two face recognition scores corresponding to the first face image as a first target face recognition score corresponding to the first face image, and taking the largest of the two face recognition scores corresponding to the second face image as a second target face recognition score corresponding to the second face image; and obtaining the face recognition result corresponding to each frame of the video images according to the first target face recognition score and the second target face recognition score.
- The storage media according to claim 15, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: calculating a face-recognition pass rate according to the face recognition results of the frames; determining whether the face-recognition pass rate is greater than a preset threshold; if so, determining that the video detection result corresponding to the node to be detected is a first result, the first result indicating that the video quality detection of the node to be detected passes; and if not, determining that the video detection result corresponding to the node to be detected is a second result, the second result indicating that the video quality detection of the node to be detected fails.
- The storage media according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: obtaining video samples from a preset video library, the video samples comprising first video samples in which fraudulent behaviour is present and second video samples in which no fraudulent behaviour is present; obtaining fraud labels corresponding to the video samples; and extracting expression features from the video samples, obtaining expression feature vectors from the extracted expression features, and performing model training with the obtained expression feature vectors as input samples and the corresponding fraud labels as expected output samples, to obtain the trained fraud probability prediction model.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910007435.1 | 2019-01-04 | ||
CN201910007435.1A CN109729383B (zh) | 2019-01-04 | 2019-01-04 | 双录视频质量检测方法、装置、计算机设备和存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020140665A1 (zh) | 2020-07-09
Family
ID=66298169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/122478 WO2020140665A1 (zh) | 2019-01-04 | 2019-12-02 | 双录视频质量检测方法、装置、计算机设备和存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109729383B (zh) |
WO (1) | WO2020140665A1 (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112187814A (zh) * | 2020-10-07 | 2021-01-05 | 广州云智通讯科技有限公司 | 一种智能双录方法、系统及服务器 |
CN112287898A (zh) * | 2020-11-26 | 2021-01-29 | 深源恒际科技有限公司 | 一种图像的文本检测质量评价方法及系统 |
CN112560772A (zh) * | 2020-12-25 | 2021-03-26 | 北京百度网讯科技有限公司 | 人脸的识别方法、装置、设备及存储介质 |
CN112669814A (zh) * | 2020-12-17 | 2021-04-16 | 北京猎户星空科技有限公司 | 一种数据处理方法、装置、设备及介质 |
CN113055667A (zh) * | 2021-03-31 | 2021-06-29 | 北京飞讯数码科技有限公司 | 一种视频质量检测方法、装置、电子设备和存储介质 |
CN113095203A (zh) * | 2021-04-07 | 2021-07-09 | 中国工商银行股份有限公司 | 双录数据质检中的客户签名检测方法及装置 |
CN113642335A (zh) * | 2021-08-16 | 2021-11-12 | 上海云从企业发展有限公司 | 银行双录场景的语言合规性检查方法、装置、设备和介质 |
CN115250375A (zh) * | 2021-04-26 | 2022-10-28 | 北京中关村科金技术有限公司 | 一种基于固定话术的音视频内容合规性检测方法及装置 |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109729383B (zh) * | 2019-01-04 | 2021-11-02 | 深圳壹账通智能科技有限公司 | 双录视频质量检测方法、装置、计算机设备和存储介质 |
CN110266645A (zh) * | 2019-05-21 | 2019-09-20 | 平安科技(深圳)有限公司 | 实时数据的验证方法、装置、服务器及介质 |
CN110287318B (zh) * | 2019-06-06 | 2021-09-17 | 秒针信息技术有限公司 | 业务操作的检测方法及装置、存储介质、电子装置 |
CN110364183A (zh) * | 2019-07-09 | 2019-10-22 | 深圳壹账通智能科技有限公司 | 语音质检的方法、装置、计算机设备和存储介质 |
CN110781916B (zh) * | 2019-09-18 | 2024-07-16 | 平安科技(深圳)有限公司 | 视频数据的欺诈检测方法、装置、计算机设备和存储介质 |
CN111050023A (zh) * | 2019-12-17 | 2020-04-21 | 深圳追一科技有限公司 | 视频检测方法、装置、终端设备及存储介质 |
CN111639529A (zh) * | 2020-04-24 | 2020-09-08 | 深圳壹账通智能科技有限公司 | 基于多层次逻辑的语音话术检测方法、装置及计算机设备 |
CN111741356B (zh) * | 2020-08-25 | 2020-12-08 | 腾讯科技(深圳)有限公司 | 双录视频的质检方法、装置、设备及可读存储介质 |
CN112258317B (zh) * | 2020-10-30 | 2022-11-11 | 深圳壹账通智能科技有限公司 | 基于人工智能的线上信贷方法、装置、计算机设备及介质 |
CN112101311A (zh) * | 2020-11-16 | 2020-12-18 | 深圳壹账通智能科技有限公司 | 基于人工智能的双录质检方法、装置、计算机设备及介质 |
CN112788269B (zh) * | 2020-12-30 | 2023-12-29 | 国网甘肃省电力公司白银供电公司 | 视频处理方法、装置、服务器及存储介质 |
CN112348005A (zh) * | 2021-01-11 | 2021-02-09 | 北京远鉴信息技术有限公司 | 双录审核方法、装置、客户端设备及存储介质 |
CN113206998B (zh) * | 2021-04-30 | 2022-12-09 | 中国工商银行股份有限公司 | 一种业务录制的视频数据质检方法及装置 |
CN115883760A (zh) * | 2022-01-11 | 2023-03-31 | 北京中关村科金技术有限公司 | 音视频的实时质检方法、装置及存储介质 |
CN115914673A (zh) * | 2022-01-27 | 2023-04-04 | 北京中关村科金技术有限公司 | 一种基于流媒体服务的合规检测方法及装置 |
CN114926464B (zh) * | 2022-07-20 | 2022-10-25 | 平安银行股份有限公司 | 在双录场景下的图像质检方法、图像质检装置及系统 |
CN116055808B (zh) * | 2022-12-15 | 2024-10-15 | 北京奇艺世纪科技有限公司 | 基于直播间的审核处理方法、装置、设备及介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108053838A (zh) * | 2017-12-01 | 2018-05-18 | 上海壹账通金融科技有限公司 | 结合音频分析和视频分析的欺诈识别方法、装置及存储介质 |
CN108491388A (zh) * | 2018-03-22 | 2018-09-04 | 平安科技(深圳)有限公司 | 数据集获取方法、分类方法、装置、设备及存储介质 |
CN108734570A (zh) * | 2018-05-22 | 2018-11-02 | 深圳壹账通智能科技有限公司 | 一种风险预测方法、存储介质和服务器 |
CN109729383A (zh) * | 2019-01-04 | 2019-05-07 | 深圳壹账通智能科技有限公司 | 双录视频质量检测方法、装置、计算机设备和存储介质 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001036844A (ja) * | 1999-07-16 | 2001-02-09 | Nec Corp | 画質確認装置、画質確認方法及びそのプログラムを記録した記録媒体 |
CN106973305B (zh) * | 2017-03-20 | 2020-02-07 | 广东小天才科技有限公司 | 一种视频中不良内容的检测方法及装置 |
CN107038582A (zh) * | 2017-03-31 | 2017-08-11 | 福建升腾资讯有限公司 | 一种基于理财双录系统上的语音扩展应用方法 |
CN108737667B (zh) * | 2018-05-03 | 2021-09-10 | 平安科技(深圳)有限公司 | 语音质检方法、装置、计算机设备及存储介质 |
CN108776932A (zh) * | 2018-05-22 | 2018-11-09 | 深圳壹账通智能科技有限公司 | 用户投资类型的确定方法、存储介质和服务器 |
-
2019
- 2019-01-04 CN CN201910007435.1A patent/CN109729383B/zh active Active
- 2019-12-02 WO PCT/CN2019/122478 patent/WO2020140665A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108053838A (zh) * | 2017-12-01 | 2018-05-18 | 上海壹账通金融科技有限公司 | 结合音频分析和视频分析的欺诈识别方法、装置及存储介质 |
CN108491388A (zh) * | 2018-03-22 | 2018-09-04 | 平安科技(深圳)有限公司 | 数据集获取方法、分类方法、装置、设备及存储介质 |
CN108734570A (zh) * | 2018-05-22 | 2018-11-02 | 深圳壹账通智能科技有限公司 | 一种风险预测方法、存储介质和服务器 |
CN109729383A (zh) * | 2019-01-04 | 2019-05-07 | 深圳壹账通智能科技有限公司 | 双录视频质量检测方法、装置、计算机设备和存储介质 |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112187814A (zh) * | 2020-10-07 | 2021-01-05 | 广州云智通讯科技有限公司 | 一种智能双录方法、系统及服务器 |
CN112187814B (zh) * | 2020-10-07 | 2022-09-09 | 上海基煜基金销售有限公司 | 一种智能双录方法、系统及服务器 |
CN112287898A (zh) * | 2020-11-26 | 2021-01-29 | 深源恒际科技有限公司 | 一种图像的文本检测质量评价方法及系统 |
CN112669814A (zh) * | 2020-12-17 | 2021-04-16 | 北京猎户星空科技有限公司 | 一种数据处理方法、装置、设备及介质 |
CN112560772A (zh) * | 2020-12-25 | 2021-03-26 | 北京百度网讯科技有限公司 | 人脸的识别方法、装置、设备及存储介质 |
CN112560772B (zh) * | 2020-12-25 | 2024-05-14 | 北京百度网讯科技有限公司 | 人脸的识别方法、装置、设备及存储介质 |
CN113055667A (zh) * | 2021-03-31 | 2021-06-29 | 北京飞讯数码科技有限公司 | 一种视频质量检测方法、装置、电子设备和存储介质 |
CN113055667B (zh) * | 2021-03-31 | 2023-05-05 | 北京飞讯数码科技有限公司 | 一种视频质量检测方法、装置、电子设备和存储介质 |
CN113095203A (zh) * | 2021-04-07 | 2021-07-09 | 中国工商银行股份有限公司 | 双录数据质检中的客户签名检测方法及装置 |
CN115250375A (zh) * | 2021-04-26 | 2022-10-28 | 北京中关村科金技术有限公司 | 一种基于固定话术的音视频内容合规性检测方法及装置 |
CN115250375B (zh) * | 2021-04-26 | 2024-01-26 | 北京中关村科金技术有限公司 | 一种基于固定话术的音视频内容合规性检测方法及装置 |
CN113642335A (zh) * | 2021-08-16 | 2021-11-12 | 上海云从企业发展有限公司 | 银行双录场景的语言合规性检查方法、装置、设备和介质 |
Also Published As
Publication number | Publication date |
---|---|
CN109729383A (zh) | 2019-05-07 |
CN109729383B (zh) | 2021-11-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020140665A1 (zh) | 双录视频质量检测方法、装置、计算机设备和存储介质 | |
CN110781916B (zh) | 视频数据的欺诈检测方法、装置、计算机设备和存储介质 | |
CN110147726B (zh) | 业务质检方法和装置、存储介质及电子装置 | |
WO2020244153A1 (zh) | 会议语音数据处理方法、装置、计算机设备和存储介质 | |
US20210142111A1 (en) | Method and device of establishing person image attribute model, computer device and storage medium | |
WO2021068321A1 (zh) | 基于人机交互的信息推送方法、装置和计算机设备 | |
CN109960725B (zh) | 基于情感的文本分类处理方法、装置和计算机设备 | |
EP3617946B1 (en) | Context acquisition method and device based on voice interaction | |
WO2020077896A1 (zh) | 提问数据生成方法、装置、计算机设备和存储介质 | |
WO2021068616A1 (zh) | 身份验证方法、装置、计算机设备和存储介质 | |
CN110751533B (zh) | 产品画像生成方法、装置、计算机设备和存储介质 | |
CN108920640B (zh) | 基于语音交互的上下文获取方法及设备 | |
CN111160275B (zh) | 行人重识别模型训练方法、装置、计算机设备和存储介质 | |
CN110598008B (zh) | 录制数据的数据质检方法及装置、存储介质 | |
CN109831677B (zh) | 视频脱敏方法、装置、计算机设备和存储介质 | |
WO2020052183A1 (zh) | 商标侵权的识别方法、装置、计算机设备和存储介质 | |
CN109766474A (zh) | 审讯信息审核方法、装置、计算机设备和存储介质 | |
CN110310169A (zh) | 基于兴趣值的信息推送方法、装置、设备及介质 | |
CN113435196B (zh) | 意图识别方法、装置、设备及存储介质 | |
WO2020135756A1 (zh) | 视频段的提取方法、装置、设备及计算机可读存储介质 | |
CN113240510A (zh) | 异常用户预测方法、装置、设备及存储介质 | |
CN112632248A (zh) | 问答方法、装置、计算机设备和存储介质 | |
CN114218427A (zh) | 语音质检分析方法、装置、设备及介质 | |
CN114493902A (zh) | 多模态信息异常监控方法、装置、计算机设备及存储介质 | |
CN113571096B (zh) | 语音情绪分类模型训练方法、装置、计算机设备及介质 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19906697; Country of ref document: EP; Kind code of ref document: A1
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.10.2021)
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19906697; Country of ref document: EP; Kind code of ref document: A1