WO2022151639A1 - Method, apparatus, device, and storage medium for extracting a picture to be recognized - Google Patents

Method, apparatus, device, and storage medium for extracting a picture to be recognized

Info

Publication number
WO2022151639A1
WO2022151639A1 (PCT/CN2021/097542)
Authority
WO
WIPO (PCT)
Prior art keywords
picture
intervals
temporary
video data
pictures
Prior art date
Application number
PCT/CN2021/097542
Other languages
English (en)
French (fr)
Inventor
王锁平
周登宇
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022151639A1 publication Critical patent/WO2022151639A1/zh

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Definitions

  • The present application relates to the field of image recognition, and in particular to a method, apparatus, computer device, and storage medium for extracting a picture to be recognized.
  • The video customer service robot is a production application in the field of remote contact, built jointly from artificial intelligence technology and traditional audio and video technology.
  • The technology can be applied in financial scenarios such as policy video playback and remote account opening, and can provide uninterrupted 24-hour service, which not only brings convenience to customers but also greatly improves the company's service level.
  • At present, when conducting business, the customer's identity information needs to be verified. The traditional verification method requires a salesperson to check the remote customer against the customer's ID card. The inventor realized that this verification method still wastes a huge amount of human resources.
  • Currently, identification can be performed by extracting picture frames from the video, but the quality of the extracted picture frames is very unstable and the face cannot be accurately recognized. Therefore, a method for selecting high-quality image frames is urgently needed.
  • The main purpose of this application is to provide a method, apparatus, device, and storage medium for extracting a picture to be recognized, which aims to solve the problem that the quality of extracted picture frames is very unstable and the face cannot be accurately recognized.
  • The present application provides a method for extracting a picture to be recognized, including: acquiring video data and audio data during a video call; dividing the video call process into a plurality of first intervals in chronological order, and counting the sum of the numbers of video data packets and audio data packets in each first interval; selecting, from the first intervals, those whose packet-count sum reaches a preset number as second intervals; selecting a preset number of second intervals as target intervals, and randomly selecting one temporary picture from each target interval; decoding each temporary picture to obtain a corresponding decoded picture; scoring each decoded picture according to a preset picture quality scoring method; and extracting the decoded picture with the highest score as the picture to be recognized for face recognition.
  • the present application also provides a device for extracting a picture to be recognized, including:
  • a data acquisition module, configured to acquire video data and audio data during a video call;
  • a division module, configured to divide the video call process into a plurality of first intervals in chronological order, and to count the sum of the numbers of video data packets and audio data packets in each first interval;
  • a first selection module, configured to select, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
  • a second selection module, configured to select a preset number of second intervals as target intervals, and to randomly select one temporary picture from each target interval;
  • a decoding module, configured to decode each temporary picture to obtain a corresponding decoded picture;
  • a scoring module, configured to score each decoded picture according to a preset picture quality scoring method;
  • an extraction module, configured to extract the decoded picture with the highest score as the picture to be recognized for face recognition.
  • the second selection module includes:
  • a detection submodule, configured to detect whether the numbers and times of the video data packets and audio data packets in each second interval correspond;
  • an extraction submodule, configured to mark the second intervals in which the times and numbers of video data packets and audio data packets correspond as third intervals;
  • obtaining each video data packet in the third interval and the packet sequence number of each video data packet;
  • obtaining a theoretical packet number from the difference between the packet sequence number of the last video data packet and that of the first video data packet in the third interval;
  • comparing the actual packet number with the theoretical packet number to obtain a ratio, the actual packet number being the actual number of video data packets in the third interval;
  • selecting a preset number of target intervals in descending order of the ratios of the third intervals.
  • The present application also provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the steps of the above method when executing the computer program.
  • The present application also provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above method.
  • Video data and audio data are acquired and divided into multiple first intervals; the second intervals with the least packet loss, i.e., the largest packet-count sums, are selected; and a preset number of target intervals are then selected from the second intervals.
  • One temporary picture is randomly selected from each target interval for decoding, and the decoded picture with the highest score is then selected as the picture to be recognized. This improves the accuracy with which the picture to be recognized can be identified, thereby ensuring the accuracy of automatic recognition; no manual verification by customer service personnel is required, saving human resources.
  • FIG. 1 is a schematic flowchart of a method for extracting a picture to be recognized according to an embodiment of the present application
  • FIG. 2 is a schematic structural block diagram of an apparatus for extracting a picture to be recognized according to an embodiment of the present application
  • FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • Referring to FIG. 1, the present application proposes a method for extracting a to-be-recognized picture, including:
  • S1: acquiring video data and audio data during a video call;
  • S2: dividing the video call process into a plurality of first intervals in chronological order, and counting the sum of the numbers of video data packets and audio data packets in each first interval;
  • S3: selecting, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
  • S4: selecting a preset number of second intervals as target intervals, and randomly selecting one temporary picture from each target interval;
  • S5: decoding each temporary picture to obtain a corresponding decoded picture;
  • S6: scoring each decoded picture according to a preset picture quality scoring method;
  • S7: extracting the decoded picture with the highest score as the picture to be recognized for face recognition.
  • As described in step S1 above, video data and audio data during the video call are acquired.
  • The data can be collected directly on the terminal where the video customer service robot runs: when conducting business with a customer, the video customer service robot collects the customer's video data and audio data, so the data can be obtained directly from that terminal. If the execution subject is the video customer service robot itself, the data can of course be obtained directly.
  • The acquired video data and audio data may cover the complete video call, i.e., the customer's identity information is verified after the conversation ends to determine whether the relevant business can be handled, or they may be a segment of video and audio data acquired in real time during the call.
  • In a specific embodiment, the video customer service robot first exchanges greetings with the customer before business communication starts, so the acquired video and audio data can be that greeting conversation.
  • As described in step S2 above, the video call process is divided into a plurality of first intervals in chronological order, and the sum of the numbers of video data packets and audio data packets in each first interval is counted.
  • Each first interval should be of the same size, and a first interval specifically contains the customer's video data and audio data within that time slot. To prevent criminals from impersonating the customer with someone else's pictures or video data, the sum of the numbers of video data packets and audio data packets in each first interval is checked, and the first intervals whose sums reach the preset number are taken as second intervals, so that the extracted second intervals have fewer lost packets, the extracted pictures decode better, and the pictures are clearer.
  • On the other hand, the customer can be verified by combining the video data and audio data in the second interval.
  • The specific verification method is: when the customer speaks, corresponding sound data is produced; the changes of the customer's face are then detected to judge whether those changes are consistent with the sound data. If they are consistent, it indicates that the operation is performed by the customer in person.
  • The changes are detected by acquiring multiple consecutive picture frames from the video data accompanying the sound data, digitizing the pictures according to a preset ternarization method, detecting the feature values of the face in each frame, and judging whether the changes of the feature values are consistent with the sound data, i.e., detecting whether the current sound data is produced by the customer, thereby verifying the customer's information.
  • As described in step S3 above, the first intervals whose packet-count sums reach the preset number are selected as second intervals.
  • That is, the first intervals whose sums reach the preset number are identified as second intervals, and pictures are extracted from these second intervals with few lost packets, so that the extracted pictures decode into clearer pictures.
  • As described in step S4 above, a preset number of second intervals are selected as target intervals, and one temporary picture is randomly selected from each target interval.
  • Generally, the preset number is not set very large, while many second intervals meet the requirement.
  • To save computation, a preset number of target intervals can be selected from the second intervals. The selection can be random, or it can favor the second intervals with larger packet-count sums; this application does not limit the manner, and any selection method that yields the preset number falls within the protection scope of this application.
  • If the number of second intervals is smaller than the preset number, the whole video call took place under poor network conditions; normal communication with the customer is impossible in that case, and there is no need to verify the customer's information.
  • As described in step S5 above, each temporary picture is decoded separately to obtain a corresponding decoded picture.
  • The decoding procedure first determines the category of the temporary picture.
  • During a video call, the acquired video data is usually compressed: the original image data is generally compressed in the H.264 encoding format.
  • Groups of images are encoded into successive GOPs (Groups of Pictures), and each GOP consists of one I frame and several B/P frames.
  • The I frame is the key frame and can be understood as a complete preservation of that picture; decoding it requires only the data of that frame. The P frame is an encoded picture that compresses the amount of transmitted data by sufficiently reducing the temporal redundancy with previously encoded frames in the image sequence.
  • The B frame considers both the previously encoded frames in the source image sequence and the temporal redundancy with the subsequently encoded frames, to compress the amount of transmitted data. Therefore, each temporary picture must be decoded according to its properties.
  • As described in step S6 above, each decoded picture is scored according to a preset picture quality scoring method.
  • The preset picture quality scoring method may score only along the pixel dimension, or along multiple dimensions, for example image quality parameters such as exposure, dark-light ratio, degree of occlusion, large deflection angle, and blur, scoring the picture comprehensively from different dimensions.
  • As described in step S7 above, the decoded picture with the highest score is extracted as the to-be-recognized picture for face recognition. Extracting the highest-scoring decoded picture according to the scoring results ensures the accuracy of face recognition, so that recognition can be completed automatically and human resources are saved.
  • In one embodiment, step S4 of selecting a preset number of second intervals as target intervals includes:
  • S401: detecting whether the times and numbers of video data packets and audio data packets in each second interval correspond;
  • S402: marking the second intervals in which the times and numbers of video data packets and audio data packets correspond as third intervals;
  • S403: obtaining each video data packet in the third interval and the packet sequence number of each video data packet;
  • S404: obtaining a theoretical packet number from the difference between the packet sequence number of the last video data packet and that of the first video data packet in the third interval;
  • S405: comparing the actual packet number with the theoretical packet number to obtain a ratio, the actual packet number being the actual number of video data packets in the third interval;
  • S406: selecting a preset number of target intervals in descending order of the ratios of the third intervals.
  • As described in steps S401-S402 above, the selection of target intervals is realized. First, it is detected whether the video data packets and audio data packets of each second interval correspond: whether their generation times correspond, and then whether their numbers correspond. Within one interval, the reception times of audio and video data packets should be identical or very close, and the numbers of video and audio packets produced per unit time also correspond.
  • Considering packet loss, a floating range can be set; when the deviation lies within this range, the second interval is deemed to meet the requirement, and the qualifying second intervals are extracted as third intervals for the next detection step.
  • As described in steps S403-S406 above, since every video data packet carries a packet sequence number, the theoretical packet number can be obtained from the difference between the sequence number of the last video data packet and that of the first. The number of video packets actually present in the third interval is then counted to obtain the actual packet number, which is compared with the theoretical packet number. The resulting ratio reflects the packet-loss rate of each third interval; the third intervals with the least loss, i.e., the largest ratios, are selected as target intervals, further ensuring that the extracted pictures can be decoded with high quality into clear decoded pictures.
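  • A sketch of the ratio test in steps S403-S406, reusing the hypothetical Packet record from the earlier sketch. It treats the sequence-number span as the theoretical packet number (counted inclusively here, which is one reading of the difference described above) and keeps the third intervals with the largest actual/theoretical ratios:

```python
def rank_by_loss(third_intervals, num_targets):
    """Keep the num_targets third intervals with the least packet loss,
    i.e. the largest actual/theoretical video-packet ratio."""
    def ratio(packets):
        video = sorted((p for p in packets if p.kind == "video"),
                       key=lambda p: p.seq)
        if len(video) < 2:
            return 0.0
        theoretical = video[-1].seq - video[0].seq + 1  # inclusive seq span
        return len(video) / theoretical                 # actual / theoretical
    return sorted(third_intervals, key=ratio, reverse=True)[:num_targets]
```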
  • In one embodiment, step S5 of decoding each temporary picture to obtain a corresponding decoded picture includes:
  • S501: detecting the picture information of the temporary picture;
  • S502: if the picture information shows that the temporary picture is a P frame, finding, among the pictures preceding the temporary picture in the video data, the target key frame closest to the temporary picture;
  • S503: inputting all pictures from the picture corresponding to the target key frame up to the temporary picture into a CODEC decoder for decoding, to obtain the decoded picture.
  • In one embodiment, the information of the temporary picture is detected to judge which frame of the GOP (Group of Pictures) it is. When the extracted picture is a P frame, it compresses the amount of transmitted data by sufficiently reducing the temporal redundancy with previously encoded frames in the image sequence;
  • therefore the target key frame closest to the temporary picture must be located among the preceding pictures in the video data, and all pictures from the one corresponding to the target key frame up to the temporary picture are combined and decoded by the CODEC decoder into one picture, i.e., the decoded picture. The decoding of a P-frame picture is thus realized.
  • In one embodiment, after step S501 of detecting the picture information of the temporary picture, the method further includes:
  • S5021: if the picture information shows that the temporary picture is a B frame, obtaining, among the pictures following the temporary picture, all P-frame pictures up to the next target key frame picture, as well as the target key frame closest to the temporary picture among the preceding pictures, the target key frame being an independent frame carrying all information;
  • S5022: inputting the temporary picture, the picture corresponding to the target key frame, and all the P-frame pictures into a CODEC decoder for decoding, to obtain the decoded picture.
  • As described in steps S5021-S5022 above, when the picture information shows that the temporary picture is a B frame, it is related both to the previously encoded frames of the source image sequence and to the temporal redundancy with the subsequently encoded frames, compressing the amount of transmitted data.
  • Hence all P-frame pictures between the temporary picture and the next I frame (i.e., the next target key frame) must be obtained, as well as the closest preceding target key frame; the temporary picture, the picture corresponding to the target key frame, and all those P-frame pictures are then input into the CODEC decoder for decoding to obtain the decoded picture.
  • If the picture information shows that the temporary picture is an I frame (i.e., a key frame), it can be decoded directly by the CODEC decoder.
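  • A minimal sketch of the frame-selection logic behind steps S501-S503 and S5021-S5022, assuming the GOP is available as a list of frames tagged "I", "P", or "B"; the Frame record is a hypothetical stand-in, and the actual decoding by a CODEC decoder is left abstract:

```python
from collections import namedtuple

Frame = namedtuple("Frame", ["ftype", "data"])   # ftype: "I", "P" or "B"

def frames_to_decode(gop, index):
    """Return the frames that must be fed to the CODEC decoder so that
    the temporary picture gop[index] can be reconstructed."""
    if gop[index].ftype == "I":                  # key frame: self-contained
        return [gop[index]]
    # nearest preceding key frame (a GOP starts with an I frame)
    i_idx = max(k for k in range(index + 1) if gop[k].ftype == "I")
    if gop[index].ftype == "P":                  # needs the I frame and all
        return gop[i_idx:index + 1]              # pictures up to itself
    tail = []                                    # B frame: also needs the
    for frame in gop[index + 1:]:                # following P frames up to
        if frame.ftype == "I":                   # the next key frame
            break
        if frame.ftype == "P":
            tail.append(frame)
    return gop[i_idx:index + 1] + tail
```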
  • In one embodiment, step S6 of scoring each decoded picture according to a preset picture quality scoring method includes:
  • S601: obtaining the pixel value of the decoded picture;
  • S602: obtaining the corresponding score coefficient according to the correspondence between pixel values and score coefficients;
  • S603: inputting the decoded picture into a pre-built image detection model to obtain the dimension values of the decoded picture in each dimension;
  • S604: inputting the score coefficient and each dimension value into the formula $Score = k\sum_{i=1}^{n} w_i v_i$ for calculation, to obtain the score value of the decoded picture; wherein Score denotes the score value, k the score coefficient, n the total number of detected dimensions in the image detection model, $w_i$ the influence weight of the i-th dimension on the score value, and $v_i$ the dimension value of the i-th dimension.
  • As described in steps S601-S604 above, the pixel value of the decoded picture is obtained. Since the decoded picture is already available, its pixel value can be read directly with suitable image processing software, such as PS (Photoshop). Because decoded pictures differ in pixel value, and the pixel value is a very important indicator for a decoded picture, the correspondence between pixel values and score coefficients can be established in advance, so that once the pixel value of the decoded picture is obtained, the corresponding score coefficient k follows directly.
  • The influence of the other dimensions on the score must also be considered: different weight coefficients $w_i$ can be assigned in advance according to the importance of each dimension for face recognition on decoded pictures, and with the dimension value $v_i$ of each dimension the formula $Score = k\sum_{i=1}^{n} w_i v_i$
  • yields the score value of each decoded picture. This formula takes the various dimensions of the picture into account and scores them comprehensively, which makes the scoring more standardized and improves the accuracy with which decoded pictures can be used for face recognition.
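  • The scoring formula of step S604 is straightforward to implement; in the sketch below, k, the weights, and the dimension values are made-up numbers purely for illustration:

```python
def score_picture(k, weights, dim_values):
    """Implements Score = k * sum_{i=1}^{n} w_i * v_i."""
    assert len(weights) == len(dim_values)
    return k * sum(w * v for w, v in zip(weights, dim_values))

# hypothetical usage: k looked up from the pixel-value -> coefficient table,
# v_i produced by the image detection model for each quality dimension
k = 1.2                                    # e.g. coefficient for a 1280x720 frame
weights    = [0.3, 0.2, 0.2, 0.15, 0.15]   # exposure, dark light, occlusion, angle, blur
dim_values = [0.9, 0.8, 1.0, 0.7, 0.85]
print(score_picture(k, weights, dim_values))   # 1.035
```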
  • In one embodiment, before step S2 of dividing the video call process into a plurality of first intervals in chronological order and counting the sum of the numbers of video data packets and audio data packets in each first interval, the method includes:
  • S101: extracting sound feature information from the audio data;
  • S102: obtaining, from a preset sound database, the business scene information corresponding to the sound feature information;
  • S103: converting the audio data into semantic information, and extracting the address keywords in the semantic information;
  • S104: identifying the current location according to the business scene information and the address keywords;
  • S105: judging whether the current location satisfies the conversation requirements;
  • S106: if the conversation requirements are satisfied, continuing with the step of dividing the video call process into a plurality of first intervals in chronological order and counting the sums of the numbers of video data packets and audio data packets in the first intervals.
  • As described in steps S101-S106 above, identification of the location is realized. The sound feature information in the audio data is first extracted, for example through the Librosa audio processing library and the openSMILE toolkit (a sketch of such feature extraction follows below).
  • The business scene information is identified by querying the table of sound feature information versus business scenes pre-stored in the preset sound database; the audio data is then converted into semantic information, the address keywords in the semantic information are extracted, and the customer's position is queried according to the address keywords (in some embodiments the position can also be identified through GPS).
  • From the business scene information and the position, the customer's location, i.e., the current location, is obtained, and it is judged whether the current location satisfies the call requirements, i.e., whether it is a crowded and noisy place. The correspondence between location types and call-requirement satisfaction is likewise stored in the database in advance; once the customer's current location is known, it can be judged whether it satisfies the call requirements, and if so, the content of step S2 can proceed, thereby ensuring the security of the customer's information.
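  • A sketch of the sound-feature extraction mentioned above, using the Librosa library; the mean-MFCC feature, the scene table, and the distance threshold are illustrative assumptions rather than the toolchain prescribed by this application:

```python
import librosa
import numpy as np

def sound_features(wav_path):
    """Return a compact voice-feature vector (mean MFCCs) that can be
    matched against a pre-stored feature -> business-scene table."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

def lookup_scene(features, scene_table, tol=25.0):
    """scene_table: {scene_name: stored feature vector}; nearest match wins."""
    best = min(scene_table, key=lambda s: np.linalg.norm(scene_table[s] - features))
    return best if np.linalg.norm(scene_table[best] - features) < tol else None
```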
  • In one embodiment, before step S7 of extracting the decoded picture with the highest score as the to-be-recognized picture for face recognition, the method further includes:
  • S611: obtaining the timestamp of the to-be-recognized picture within the video call process;
  • S612: obtaining an auxiliary recognition picture separated by a preset time interval, based on the timestamp;
  • S613: performing grayscale processing on the auxiliary recognition picture and the to-be-recognized picture to obtain a first grayscale picture and a second grayscale picture, respectively;
  • S614: calculating the average value $A_m$ of the grayscale values of all pixels in the m-th column or m-th row of a grayscale picture, and calculating the average value B of the grayscale values of all pixels in the grayscale picture; wherein the grayscale picture is the first grayscale picture or the second grayscale picture;
  • S615: calculating the overall variance $\sigma_m^2 = (A_m - B)^2 / N$ of the m-th column or m-th row of the grayscale picture, where N is the total number of columns or rows in the grayscale picture;
  • S616: obtaining the difference $\Delta_m = |\sigma_{1,m}^2 - \sigma_{2,m}^2|$ between the overall variances of the m-th column or m-th row of the first grayscale picture and the second grayscale picture, where $\sigma_{1,m}^2$ is the overall variance of the m-th column or m-th row of the first grayscale picture and $\sigma_{2,m}^2$ is that of the second grayscale picture;
  • S617: judging whether $\max_m \Delta_m$ is smaller than a preset variance error threshold;
  • S618: if $\max_m \Delta_m$ is smaller than the preset variance error threshold, determining that the to-be-recognized picture meets the conditions for face recognition.
  • As described in steps S611-S612 above, an auxiliary recognition picture separated by a preset time interval is obtained according to the timestamp of the current video call. The preset interval may be a preset time forwards or a preset time backwards;
  • it can be set freely, or both pictures can be obtained and the recognition process performed twice.
  • The timestamp is the reception time of the video data packet corresponding to the to-be-recognized picture; the reception time of that video data packet is used as the timestamp.
  • As described in steps S613-S618 above, the check before face recognition of the to-be-recognized picture is realized. Grayscale conversion means representing color as a gray shade: in the RGB model, when R=G=B the color represents a gray shade, and the value of R=G=B is called the grayscale value; a grayscale image therefore needs only one byte per pixel to store the grayscale value (also called intensity or brightness), reducing storage.
  • The grayscale range is, for example, 0-255 (when R, G, and B each take values 0-255; it naturally changes with the value ranges of R, G, and B).
  • Any grayscale-conversion method may be used, such as the component method, maximum-value method, average method, or weighted-average method. Since there are only 256 possible grayscale values, comparing pictures on this basis greatly reduces the amount of computation. The average value $A_m$ of the grayscale values of all pixels in the m-th column or m-th row of the grayscale picture is then calculated, as well as the average value B of the grayscale values of all pixels in the grayscale picture.
  • Calculating $A_m$ comprises: collecting the grayscale values of all pixels in the m-th column or m-th row of the grayscale picture, summing them, and dividing the resulting sum by the number of pixels in that column or row, which yields the average value $A_m$ of the grayscale values of all pixels in the m-th column or m-th row.
  • Calculating B comprises: computing the sum of the grayscale values of all pixels in the grayscale picture and dividing it by the number of pixels, which yields the average value B of the grayscale values of all pixels in the grayscale picture.
  • The overall variance of the m-th column or m-th row is computed as $\sigma_m^2 = (A_m - B)^2 / N$, where N is the total number of columns or rows in the grayscale picture.
  • In this application, the overall variance is used to measure the difference between the average grayscale value $A_m$ of the pixels in the m-th column or m-th row and the average grayscale value B of all pixels in the grayscale picture.
  • The difference $\Delta_m = |\sigma_{1,m}^2 - \sigma_{2,m}^2|$ between the overall variances of the two grayscale pictures reflects the difference between the grayscale values of their m-th columns or m-th rows. When $\Delta_m$ is small, for example 0, $\sigma_{1,m}^2$ equals or approximately equals $\sigma_{2,m}^2$, and the grayscale values of the m-th column or row of the first grayscale picture can be regarded as identical or approximately identical to those of the second (an approximate judgment that saves computing power; since the overall variances of two different pictures are generally unequal, the judgment is highly accurate). Otherwise, the grayscale values of the m-th column or row of the two pictures are considered different.
  • It is then judged whether $\max_m \Delta_m$, the maximum of the differences over all columns or rows, is smaller than the preset variance error threshold. If it is, the to-be-recognized picture is determined to be similar to the auxiliary recognition picture. Approximate judgment is thus exploited (the grayscale values of grayscale pictures converted from two different pictures are generally unequal, while those converted from the same picture are generally equal), and whether the auxiliary recognition picture is similar to the to-be-recognized picture is determined while consuming few computing resources.
  • The subsequent steps are performed only when the auxiliary recognition picture is similar to the to-be-recognized picture: similarity indicates that the customer has stayed in the same scene with a good network and no change of position, i.e., the customer's environment meets the recognition conditions, preventing criminals from using clipped videos of the customer to impersonate the customer in business transactions. This ensures the security of customer data. A sketch of the similarity check follows below.
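  • A sketch of the similarity check in steps S613-S618, assuming both pictures are already grayscale arrays (e.g. converted with OpenCV's cv2.cvtColor). The per-column variance $(A_m - B)^2 / N$ used here is one reading of the overall-variance formula above, and the threshold is an illustrative assumption:

```python
import numpy as np

def column_variances(gray):
    """gray: HxW array. A_m = mean of column m, B = global mean,
    sigma_m^2 = (A_m - B)^2 / N with N the number of columns."""
    col_means = gray.mean(axis=0)       # A_m for every column
    global_mean = gray.mean()           # B
    n = gray.shape[1]                   # N
    return (col_means - global_mean) ** 2 / n

def is_similar(gray_a, gray_b, var_threshold=0.5):
    """Similar when the largest per-column variance difference stays
    below the preset variance error threshold."""
    diff = np.abs(column_variances(gray_a) - column_variances(gray_b))
    return float(diff.max()) < var_threshold
```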
  • Referring to FIG. 2, the present application also provides an apparatus for extracting a picture to be recognized, including:
  • the data acquisition module 10, configured to acquire video data and audio data during a video call;
  • the division module 20, configured to divide the video call process into a plurality of first intervals in chronological order, and to count the sum of the numbers of video data packets and audio data packets in each first interval;
  • the first selection module 30, configured to select, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
  • the second selection module 40, configured to select a preset number of second intervals as target intervals, and to randomly select one temporary picture from each target interval;
  • the decoding module 50, configured to decode each temporary picture to obtain a corresponding decoded picture;
  • the scoring module 60, configured to score each decoded picture according to a preset picture quality scoring method;
  • the extraction module 70, configured to extract the decoded picture with the highest score as the picture to be recognized for face recognition.
  • the second selection module 40 includes:
  • a detection submodule, configured to detect whether the numbers and times of the video data packets and audio data packets in each second interval correspond;
  • an extraction submodule, configured to mark the second intervals in which the times and numbers of video data packets and audio data packets correspond as third intervals;
  • obtaining each video data packet in the third interval and the packet sequence number of each video data packet;
  • obtaining a theoretical packet number from the difference between the packet sequence number of the last video data packet and that of the first video data packet in the third interval;
  • comparing the actual packet number with the theoretical packet number to obtain a ratio, the actual packet number being the actual number of video data packets in the third interval;
  • selecting a preset number of target intervals in descending order of the ratios of the third intervals.
  • Beneficial effects of the present application: video data and audio data are acquired and divided into multiple first intervals; the second intervals with the least packet loss, i.e., the largest packet-count sums, are selected; and a preset number of target intervals are then selected from the second intervals.
  • One temporary picture is selected from each target interval for decoding, and the to-be-recognized picture with the highest score is then selected for recognition, which improves the recognition accuracy of the to-be-recognized picture, thereby ensuring the accuracy of automatic recognition; no manual verification by customer service personnel is required, saving human resources.
  • an embodiment of the present application further provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3 .
  • the computer device includes a processor, memory, a network interface and a database connected by a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium.
  • the database of the computer device is used to store various video data, audio data, and the like.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • FIG. 3 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied.
  • Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, and the computer-readable storage medium may be non-volatile or volatile.
  • when the computer program is executed by the processor, the method for extracting the to-be-recognized picture described in any of the foregoing embodiments can be implemented.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
  • Blockchain is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the underlying platform of the blockchain can include processing modules such as user management, basic services, smart contracts, and operation monitoring.
  • the user management module is responsible for the identity information management of all blockchain participants, including maintaining public/private key generation (account management), key management, and maintenance of the correspondence between users' real identities and blockchain addresses (authority management), and, where authorized, supervising and auditing the transactions of certain real identities and providing rule configuration for risk control (risk-control audit).
  • the basic service module is deployed on all blockchain node devices to verify the validity of business requests and, after reaching consensus on valid requests, record them in storage.
  • For a new business request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the business information through the consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger after encryption (network communication), and stores the record. The smart contract module is responsible for contract registration and issuance as well as contract triggering and execution:
  • developers can define contract logic through a programming language and publish it to the blockchain (contract registration); according to the logic of the contract terms, a key or other event triggers execution, completing the contract logic; the module also provides functions for contract upgrade and cancellation.
  • the operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract settings, and cloud adaptation, as well as visualized output of real-time status during product operation, for example alarms, monitoring network conditions, and monitoring node device health.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to the field of artificial intelligence and provides a method, apparatus, device, and storage medium for extracting a picture to be recognized. The method includes: acquiring video data and audio data and dividing them into multiple first intervals; selecting the second intervals with the least packet loss, i.e., the largest packet-count sums; selecting a preset number of target intervals from the second intervals; randomly selecting one temporary picture from each target interval for decoding; and selecting the decoded picture with the highest score as the picture to be recognized for recognition. This improves the accuracy with which the picture to be recognized is identified, thereby ensuring the accuracy of automatic recognition; no manual verification by customer service personnel is required, saving human resources.

Description

Method, apparatus, device, and storage medium for extracting a picture to be recognized
This application claims priority to the Chinese patent application with application number 202110037554.9, filed with the China Patent Office on January 12, 2021 and entitled "Method, apparatus, device, and storage medium for extracting a picture to be recognized", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of image recognition, and in particular to a method, apparatus, device, and storage medium for extracting a picture to be recognized.
Background Art
With the gradual deepening of basic research on artificial intelligence, AI applications in the field of remote contact are being deployed at an accelerating pace. The video customer service robot is a production application in the remote-contact field built jointly from artificial intelligence technology and traditional audio and video technology. This combined technology can be applied in financial scenarios such as policy video playback and remote account opening, and can provide uninterrupted 24-hour service, bringing convenience to customers while greatly improving the company's service level. At present, the customer's identity information must be verified when conducting business; the traditional verification method requires a salesperson to check the remote customer against the customer's ID card, and the inventor realized that this verification method still wastes huge human resources. Currently, identification can be performed by extracting picture frames from the video, but the quality of the extracted frames is very unstable and the face cannot be recognized accurately, so a method for selecting high-quality image frames is urgently needed.
Technical Problem
The main purpose of this application is to provide a method, apparatus, device, and storage medium for extracting a picture to be recognized, aiming to solve the problem that the quality of extracted picture frames is very unstable and the face cannot be recognized accurately.
Technical Solutions
This application provides a method for extracting a picture to be recognized, including:
acquiring video data and audio data during a video call;
dividing the video call process into a plurality of first intervals in chronological order, and counting the sum of the numbers of video data packets and audio data packets in each first interval;
selecting, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
selecting a preset number of second intervals as target intervals, and randomly selecting one temporary picture from each target interval;
decoding each temporary picture to obtain a corresponding decoded picture;
scoring each decoded picture according to a preset picture quality scoring method;
extracting the decoded picture with the highest score as the picture to be recognized for face recognition.
This application also provides an apparatus for extracting a picture to be recognized, including:
a data acquisition module, configured to acquire video data and audio data during a video call;
a division module, configured to divide the video call process into a plurality of first intervals in chronological order, and to count the sum of the numbers of video data packets and audio data packets in each first interval;
a first selection module, configured to select, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
a second selection module, configured to select a preset number of second intervals as target intervals, and to randomly select one temporary picture from each target interval;
a decoding module, configured to decode each temporary picture to obtain a corresponding decoded picture;
a scoring module, configured to score each decoded picture according to a preset picture quality scoring method;
an extraction module, configured to extract the decoded picture with the highest score as the picture to be recognized for face recognition.
Further, the second selection module includes:
a detection submodule, configured to detect whether the numbers and times of the video data packets and audio data packets in each second interval correspond;
an extraction submodule, configured to mark the second intervals in which the times and numbers of video data packets and audio data packets correspond as third intervals;
obtaining each video data packet in the third interval and the packet sequence number of each video data packet;
obtaining a theoretical packet number from the difference between the packet sequence number of the last video data packet and that of the first video data packet in the third interval;
comparing the actual packet number with the theoretical packet number to obtain a ratio, the actual packet number being the actual number of video data packets in the third interval;
selecting a preset number of target intervals in descending order of the ratios of the third intervals.
This application also provides a computer device, including a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the method for extracting a picture to be recognized when executing the computer program:
acquiring video data and audio data during a video call;
dividing the video call process into a plurality of first intervals in chronological order, and counting the sum of the numbers of video data packets and audio data packets in each first interval;
selecting, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
selecting a preset number of second intervals as target intervals, and randomly selecting one temporary picture from each target interval;
decoding each temporary picture to obtain a corresponding decoded picture;
scoring each decoded picture according to a preset picture quality scoring method;
extracting the decoded picture with the highest score as the picture to be recognized for face recognition.
This application also provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the method for extracting a picture to be recognized:
acquiring video data and audio data during a video call;
dividing the video call process into a plurality of first intervals in chronological order, and counting the sum of the numbers of video data packets and audio data packets in each first interval;
selecting, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
selecting a preset number of second intervals as target intervals, and randomly selecting one temporary picture from each target interval;
decoding each temporary picture to obtain a corresponding decoded picture;
scoring each decoded picture according to a preset picture quality scoring method;
extracting the decoded picture with the highest score as the picture to be recognized for face recognition.
Beneficial Effects
Video data and audio data are acquired and divided into multiple first intervals; the second intervals with the least packet loss, i.e., the largest packet-count sums, are selected; a preset number of target intervals are then selected from the second intervals; one temporary picture is randomly selected from each target interval for decoding; and the to-be-recognized picture with the highest score is selected for recognition. This improves the accuracy with which the picture to be recognized is identified, thereby ensuring the accuracy of automatic recognition; no manual verification by customer service personnel is required, saving human resources.
Description of Drawings
The accompanying drawings here are incorporated into and constitute a part of the specification, illustrate embodiments consistent with this application, and together with the specification serve to explain the principles of this application.
FIG. 1 is a schematic flowchart of a method for extracting a picture to be recognized according to an embodiment of this application;
FIG. 2 is a schematic structural block diagram of an apparatus for extracting a picture to be recognized according to an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of this application.
Best Mode for Carrying Out the Application
Referring to FIG. 1, this application proposes a method for extracting a picture to be recognized, including:
S1: acquiring video data and audio data during a video call;
S2: dividing the video call process into a plurality of first intervals in chronological order, and counting the sum of the numbers of video data packets and audio data packets in each first interval;
S3: selecting, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
S4: selecting a preset number of second intervals as target intervals, and randomly selecting one temporary picture from each target interval;
S5: decoding each temporary picture to obtain a corresponding decoded picture;
S6: scoring each decoded picture according to a preset picture quality scoring method;
S7: extracting the decoded picture with the highest score as the picture to be recognized for face recognition.
As described in step S1 above, video data and audio data during the video call are acquired. The data can be collected directly on the terminal where the video customer service robot runs: when conducting business with a customer, the video customer service robot collects the customer's video data and audio data, so the data can be obtained directly from that terminal; if the execution subject is the video customer service robot itself, the data can of course be obtained directly. The acquired data may cover the complete video call, i.e., the customer's identity information is verified after the conversation ends to determine whether the relevant business can be handled, or it may be a segment of video and audio data acquired in real time during the call. In a specific embodiment, the video customer service robot first exchanges greetings with the customer before business communication starts, so the acquired video and audio data can be that greeting conversation.
As described in step S2 above, the video call process is divided into a plurality of first intervals in chronological order, and the sum of the numbers of video data packets and audio data packets in each first interval is counted. Each first interval should be of the same size and specifically contains the customer's video data and audio data within that time slot. To prevent criminals from impersonating the customer with someone else's pictures or video data, the sum of the numbers of video data packets and audio data packets in each first interval is checked, and the first intervals whose sums reach the preset number are taken as second intervals, so that the extracted second intervals have fewer lost packets, the extracted pictures decode better, and the pictures are clearer. On the other hand, the customer can be verified by combining the video data and audio data in the second interval. The specific verification method is: when the customer speaks, corresponding sound data is produced; the changes of the customer's face are detected to judge whether they are consistent with the sound data, and if so, it indicates that the operation is performed by the customer in person. The changes are detected by acquiring multiple consecutive picture frames from the video data accompanying the sound data, digitizing the pictures according to a preset ternarization method, detecting the feature values of the face in each frame, and judging whether the changes of the feature values are consistent with the sound data, i.e., detecting whether the current sound data is produced by the customer, thereby verifying the customer's information.
As described in step S3 above, the first intervals whose packet-count sums reach the preset number are selected as second intervals. That is, such first intervals are identified as second intervals, and pictures are extracted from these second intervals with few lost packets, so that the extracted pictures decode into clearer pictures.
As described in step S4 above, a preset number of second intervals are selected as target intervals, and one temporary picture is randomly selected from each target interval. Generally, the preset number is not set very large, while many second intervals meet the requirement; to save computation, a preset number of target intervals can be selected from the second intervals. The selection can be random, or it can favor the second intervals with larger packet-count sums; this application does not limit the manner, and any selection method that yields the preset number falls within the protection scope of this application. If the number of second intervals is smaller than the preset number, the whole video call took place under poor network conditions, normal communication with the customer is impossible, and there is no need to verify the customer's information.
As described in step S5 above, each temporary picture is decoded to obtain a corresponding decoded picture. The decoding procedure first determines the category of the temporary picture. During a video call, the acquired video data is usually compressed: the original image data is generally compressed in the H.264 encoding format, groups of images are encoded into successive GOPs (Groups of Pictures), and each GOP consists of one I frame and several B/P frames. The I frame is the key frame and can be understood as a complete preservation of that picture, and decoding it requires only the data of that frame; the P frame compresses the amount of transmitted data by sufficiently reducing the temporal redundancy with previously encoded frames in the image sequence; the B frame considers both the previously encoded frames of the source image sequence and the temporal redundancy with the subsequently encoded frames. Each temporary picture must therefore be decoded according to its properties.
As described in step S6 above, each decoded picture is scored according to a preset picture quality scoring method. The method may score only along the pixel dimension, or along multiple dimensions, for example image quality parameters such as exposure, dark-light ratio, occlusion, large deflection angle, and blur, scoring the picture comprehensively from different dimensions.
As described in step S7 above, the decoded picture with the highest score is extracted as the picture to be recognized for face recognition. Extracting the highest-scoring decoded picture according to the scoring results ensures the accuracy of face recognition, so that recognition can be completed automatically and human resources are saved.
In one embodiment, step S4 of selecting a preset number of second intervals as target intervals includes:
S401: detecting whether the times and numbers of video data packets and audio data packets in each second interval correspond;
S402: marking the second intervals in which the times and numbers of video data packets and audio data packets correspond as third intervals;
S403: obtaining each video data packet in the third interval and the packet sequence number of each video data packet;
S404: obtaining a theoretical packet number from the difference between the packet sequence number of the last video data packet and that of the first video data packet in the third interval;
S405: comparing the actual packet number with the theoretical packet number to obtain a ratio, the actual packet number being the actual number of video data packets in the third interval;
S406: selecting a preset number of target intervals in descending order of the ratios of the third intervals.
As described in steps S401-S402 above, the selection of target intervals is realized. It is first detected whether the video data packets and audio data packets of each second interval correspond: whether their generation times correspond, and then whether their numbers correspond, because within one interval the reception times of audio and video data packets should be identical or very close, and the numbers produced per unit time also correspond. Considering packet loss, a floating range can be set; when the deviation lies within this range, the second interval is deemed to meet the requirement, and the qualifying second intervals are extracted as third intervals for the next detection step.
As described in steps S403-S406 above, since every video data packet carries a packet sequence number, the theoretical packet number can be obtained from the difference between the sequence number of the last video data packet and that of the first; the number of video packets actually present in the third interval is then counted to obtain the actual packet number, which is compared with the theoretical packet number. The resulting ratio reflects the packet-loss rate of each third interval; the third intervals with the least loss, i.e., the largest ratios, are selected as target intervals, further ensuring that the extracted pictures can be decoded with high quality into clear decoded pictures.
In one embodiment, step S5 of decoding each temporary picture to obtain a corresponding decoded picture includes:
S501: detecting the picture information of the temporary picture;
S502: if the picture information shows that the temporary picture is a P frame, finding, among the pictures preceding the temporary picture in the video data, the target key frame closest to the temporary picture;
S503: inputting all pictures from the picture corresponding to the target key frame up to the temporary picture into a CODEC decoder for decoding, to obtain the decoded picture.
In one embodiment, the information of the temporary picture is detected to judge which frame of the GOP (Group of Pictures) it is. When the extracted picture is a P frame, it compresses the amount of transmitted data by sufficiently reducing the temporal redundancy with previously encoded frames; therefore the target key frame closest to the temporary picture is located among the preceding pictures, all pictures from the one corresponding to that key frame up to the temporary picture are combined, and the CODEC decoder decodes them into one picture, i.e., the decoded picture. The decoding of a P-frame picture is thus realized.
In one embodiment, after step S501 of detecting the picture information of the temporary picture, the method further includes:
S5021: if the picture information shows that the temporary picture is a B frame, obtaining, among the pictures following the temporary picture, all P-frame pictures up to the next target key frame picture, as well as the target key frame closest to the temporary picture among the preceding pictures, the target key frame being an independent frame carrying all information;
S5022: inputting the temporary picture, the picture corresponding to the target key frame, and all the P-frame pictures into a CODEC decoder for decoding, to obtain the decoded picture.
As described in steps S5021-S5022 above, when the picture information shows that the temporary picture is a B frame, it is related both to the previously encoded frames of the source image sequence and to the temporal redundancy with the subsequently encoded frames, compressing the amount of transmitted data. Hence all P-frame pictures between the temporary picture and the next I frame (i.e., the next target key frame) are obtained, as well as the closest preceding target key frame; the temporary picture, the picture corresponding to the target key frame, and all those P-frame pictures are then input into the CODEC decoder for decoding to obtain the decoded picture.
If the picture information shows that the temporary picture is an I frame (i.e., a key frame), it can be decoded directly by the CODEC decoder.
In one embodiment, step S6 of scoring each decoded picture according to a preset picture quality scoring method includes:
S601: obtaining the pixel value of the decoded picture;
S602: obtaining the corresponding score coefficient according to the correspondence between pixel values and score coefficients;
S603: inputting the decoded picture into a pre-built image detection model to obtain the dimension values of the decoded picture in each dimension;
S604: inputting the score coefficient and each dimension value into the formula $Score = k\sum_{i=1}^{n} w_i v_i$ for calculation, to obtain the score value of the decoded picture; wherein Score denotes the score value, k the score coefficient, n the total number of detected dimensions in the image detection model, $w_i$ the influence weight of the i-th dimension on the score value, and $v_i$ the dimension value of the i-th dimension.
As described in steps S601-S604 above, the pixel value of the decoded picture is obtained. Since the decoded picture is already available, its pixel value can be read directly with suitable image processing software, for example PS (Photoshop). Because decoded pictures differ in pixel value, and the pixel value is a very important indicator for a decoded picture, the correspondence between pixel values and score coefficients can be established in advance, so that once the pixel value of the decoded picture is obtained, the corresponding score coefficient k follows directly. The influence of the other dimensions on the score must also be considered: different weight coefficients $w_i$ can be assigned in advance according to the importance of each dimension for face recognition on decoded pictures, and with the dimension value $v_i$ of each dimension, the formula $Score = k\sum_{i=1}^{n} w_i v_i$ yields the score value of each decoded picture. This formula takes the various dimensions of the picture into account and scores them comprehensively, which makes the scoring more standardized and improves the accuracy with which decoded pictures can be used for face recognition.
In one embodiment, before step S2 of dividing the video call process into a plurality of first intervals in chronological order and counting the sum of the numbers of video data packets and audio data packets in each first interval, the method includes:
S101: extracting sound feature information from the audio data;
S102: obtaining, from a preset sound database, the business scene information corresponding to the sound feature information;
S103: converting the audio data into semantic information, and extracting the address keywords in the semantic information;
S104: identifying the current location according to the business scene information and the address keywords;
S105: judging whether the current location satisfies the conversation requirements;
S106: if the conversation requirements are satisfied, continuing with the step of dividing the video call process into a plurality of first intervals in chronological order and counting the sums of the numbers of video data packets and audio data packets in the first intervals.
As described in steps S101-S106 above, identification of the location is realized. The sound feature information in the audio data is first extracted, for example through the Librosa audio processing library and the openSMILE toolkit, and the business scene information is identified by querying the table of sound feature information versus business scenes pre-stored in the preset sound database. The audio data is then converted into semantic information, the address keywords in the semantic information are extracted, and the customer's position is queried according to the address keywords (in some embodiments the position can also be identified through GPS). From the business scene information and the position, the customer's location, i.e., the current location, is obtained, and it is judged whether the current location satisfies the call requirements, i.e., whether it is a crowded and noisy place. The correspondence between location types and call-requirement satisfaction is likewise stored in the database in advance; once the customer's current location is known, it can be judged whether it satisfies the call requirements, and if so, the content of step S2 can proceed, thereby ensuring the security of the customer's information.
In one embodiment, before step S7 of extracting the decoded picture with the highest score as the picture to be recognized for face recognition, the method further includes:
S611: obtaining the timestamp of the picture to be recognized within the video call process;
S612: obtaining an auxiliary recognition picture separated by a preset time interval, based on the timestamp;
S613: performing grayscale processing on the auxiliary recognition picture and the picture to be recognized to obtain a first grayscale picture and a second grayscale picture, respectively;
S614: calculating the average value $A_m$ of the grayscale values of all pixels in the m-th column or m-th row of a grayscale picture, and calculating the average value B of the grayscale values of all pixels in the grayscale picture; wherein the grayscale picture is the first grayscale picture or the second grayscale picture;
S615: calculating the overall variance $\sigma_m^2 = (A_m - B)^2 / N$ of the m-th column or m-th row of the grayscale picture, where N is the total number of columns or rows in the grayscale picture;
S616: obtaining the difference $\Delta_m = |\sigma_{1,m}^2 - \sigma_{2,m}^2|$ between the overall variances of the m-th column or m-th row of the first grayscale picture and the second grayscale picture, where $\sigma_{1,m}^2$ is the overall variance of the m-th column or m-th row of the first grayscale picture and $\sigma_{2,m}^2$ is that of the second grayscale picture;
S617: judging whether $\max_m \Delta_m$ is smaller than a preset variance error threshold;
S618: if $\max_m \Delta_m$ is smaller than the preset variance error threshold, determining that the picture to be recognized meets the conditions for face recognition.
As described in steps S611-S612 above, an auxiliary recognition picture separated by a preset time interval is obtained according to the timestamp of the current video call. The preset interval may be a preset time forwards or backwards; it can be set freely, or both pictures can be obtained and the recognition process performed twice. The timestamp is the reception time of the video data packet corresponding to the picture to be recognized; that reception time is used as the timestamp.
As described in steps S613-S618 above, the check before face recognition of the picture to be recognized is realized. Grayscale conversion means representing color as a gray shade: in the RGB model, when R=G=B the color represents a gray shade, and the value of R=G=B is called the grayscale value, so a grayscale image needs only one byte per pixel to store the grayscale value (also called intensity or brightness), reducing storage. The grayscale range is, for example, 0-255 (when R, G, and B each take values 0-255; it naturally changes with the value ranges of R, G, and B). Any grayscale-conversion method may be used, such as the component method, maximum-value method, average method, or weighted-average method; since there are only 256 possible grayscale values, comparing pictures on this basis greatly reduces the amount of computation. The average value $A_m$ of the grayscale values of all pixels in the m-th column or m-th row of the grayscale picture is then calculated, as well as the average value B of the grayscale values of all pixels in the grayscale picture. Calculating $A_m$ comprises: collecting the grayscale values of all pixels in the m-th column or m-th row, summing them, and dividing the sum by the number of pixels in that column or row. Calculating B comprises: computing the sum of the grayscale values of all pixels in the grayscale picture and dividing it by the number of pixels. The overall variance $\sigma_m^2 = (A_m - B)^2 / N$ of the m-th column or m-th row is computed, where N is the total number of columns or rows in the grayscale picture. In this application, the overall variance is used to measure the difference between the average grayscale value $A_m$ of the pixels in the m-th column or m-th row and the average grayscale value B of all pixels in the grayscale picture.
The difference $\Delta_m = |\sigma_{1,m}^2 - \sigma_{2,m}^2|$ between the overall variances of the m-th column or m-th row of the two grayscale pictures reflects the difference between the grayscale values of their m-th columns or m-th rows. When $\Delta_m$ is small, for example 0, $\sigma_{1,m}^2$ equals or approximately equals $\sigma_{2,m}^2$, and the grayscale values of the m-th column or row of the first grayscale picture can be regarded as identical or approximately identical to those of the second (an approximate judgment that saves computing power; since the overall variances of two different pictures are generally unequal, the judgment is highly accurate); otherwise they are considered different. It is then judged whether $\max_m \Delta_m$, the maximum of the differences over all columns or rows, is smaller than the preset variance error threshold; if it is, the picture to be recognized is determined to be similar to the auxiliary recognition picture. Approximate judgment is thus exploited (the grayscale values of grayscale pictures converted from two different pictures are generally unequal, while those converted from the same picture are generally equal), and whether the auxiliary recognition picture is similar to the picture to be recognized is determined while consuming few computing resources. The subsequent steps are performed only when the auxiliary recognition picture is similar to the picture to be recognized: similarity indicates that the customer has stayed in the same scene with a good network and no change of position, i.e., the customer's environment meets the recognition conditions, preventing criminals from using clipped videos of the customer to impersonate the customer in business transactions, thereby ensuring the security of customer data.
Referring to FIG. 2, this application also provides an apparatus for extracting a picture to be recognized, including:
a data acquisition module 10, configured to acquire video data and audio data during a video call;
a division module 20, configured to divide the video call process into a plurality of first intervals in chronological order, and to count the sum of the numbers of video data packets and audio data packets in each first interval;
a first selection module 30, configured to select, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
a second selection module 40, configured to select a preset number of second intervals as target intervals, and to randomly select one temporary picture from each target interval;
a decoding module 50, configured to decode each temporary picture to obtain a corresponding decoded picture;
a scoring module 60, configured to score each decoded picture according to a preset picture quality scoring method;
an extraction module 70, configured to extract the decoded picture with the highest score as the picture to be recognized for face recognition.
In one embodiment, the second selection module 40 includes:
a detection submodule, configured to detect whether the numbers and times of the video data packets and audio data packets in each second interval correspond;
an extraction submodule, configured to mark the second intervals in which the times and numbers of video data packets and audio data packets correspond as third intervals;
obtaining each video data packet in the third interval and the packet sequence number of each video data packet;
obtaining a theoretical packet number from the difference between the packet sequence number of the last video data packet and that of the first video data packet in the third interval;
comparing the actual packet number with the theoretical packet number to obtain a ratio, the actual packet number being the actual number of video data packets in the third interval;
selecting a preset number of target intervals in descending order of the ratios of the third intervals.
Beneficial effects of this application: video data and audio data are acquired and divided into multiple first intervals; the second intervals with the least packet loss, i.e., the largest packet-count sums, are selected; a preset number of target intervals are then selected from the second intervals; one temporary picture is randomly selected from each target interval for decoding; and the picture to be recognized with the highest score is selected for recognition. This improves the accuracy with which the picture to be recognized is identified, thereby ensuring the accuracy of automatic recognition; no manual verification by customer service personnel is required, saving human resources.
Referring to FIG. 3, an embodiment of this application also provides a computer device, which may be a server whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store various video data, audio data, and the like. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, the method for extracting a picture to be recognized described in any of the above embodiments can be implemented.
Those skilled in the art will understand that the structure shown in FIG. 3 is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device to which the solution is applied.
An embodiment of this application also provides a computer-readable storage medium on which a computer program is stored; the computer-readable storage medium may be non-volatile or volatile. When the computer program is executed by a processor, the method for extracting a picture to be recognized described in any of the above embodiments can be implemented.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes that element.
Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include the underlying blockchain platform, the platform product service layer, and the application service layer.
The underlying blockchain platform may include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module is responsible for the identity information management of all blockchain participants, including maintaining public/private key generation (account management), key management, and maintenance of the correspondence between users' real identities and blockchain addresses (authority management), and, where authorized, supervising and auditing the transactions of certain real identities and providing rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices to verify the validity of business requests and, after reaching consensus on valid requests, record them in storage; for a new business request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the business information through the consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger after encryption (network communication), and stores the record. The smart contract module is responsible for contract registration and issuance as well as contract triggering and execution: developers can define contract logic through a programming language and publish it to the blockchain (contract registration); according to the logic of the contract terms, a key or other event triggers execution, completing the contract logic; the module also provides functions for contract upgrade and cancellation. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract settings, and cloud adaptation, as well as visualized output of real-time status during product operation, for example alarms, monitoring network conditions, and monitoring node device health.
The above are only preferred embodiments of this application and are not intended to limit it; for those skilled in the art, this application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included within the scope of the claims of this application.

Claims (20)

  1. A method for extracting a picture to be recognized, comprising:
    acquiring video data and audio data during a video call;
    dividing the video call process into a plurality of first intervals in chronological order, and counting the sum of the numbers of video data packets and audio data packets in each first interval;
    selecting, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
    selecting a preset number of second intervals as target intervals, and randomly selecting one temporary picture from each target interval;
    decoding each temporary picture to obtain a corresponding decoded picture;
    scoring each decoded picture according to a preset picture quality scoring method;
    extracting the decoded picture with the highest score as the picture to be recognized for face recognition.
  2. The method for extracting a picture to be recognized according to claim 1, wherein the step of selecting a preset number of second intervals as target intervals comprises:
    detecting whether the times and numbers of the video data packets and audio data packets in each second interval correspond;
    marking the second intervals in which the times and numbers of video data packets and audio data packets correspond as third intervals;
    obtaining each video data packet in the third interval and the packet sequence number of each video data packet;
    obtaining a theoretical packet number from the difference between the packet sequence number of the last video data packet and that of the first video data packet in the third interval;
    comparing the actual packet number with the theoretical packet number to obtain a ratio, the actual packet number being the actual number of video data packets in the third interval;
    selecting a preset number of target intervals in descending order of the ratios of the third intervals.
  3. The method for extracting a picture to be recognized according to claim 1, wherein the step of decoding each temporary picture to obtain a corresponding decoded picture comprises:
    detecting the picture information of the temporary picture;
    if the picture information shows that the temporary picture is a P frame, finding, among the pictures preceding the temporary picture in the video data, the target key frame closest to the temporary picture;
    inputting all pictures from the picture corresponding to the target key frame up to the temporary picture into a CODEC decoder for decoding, to obtain the decoded picture.
  4. The method for extracting a picture to be recognized according to claim 3, wherein after the step of detecting the picture information of the temporary picture, the method further comprises:
    if the picture information shows that the temporary picture is a B frame, obtaining, among the pictures following the temporary picture, all P-frame pictures up to the next target key frame picture, as well as the target key frame closest to the temporary picture among the preceding pictures, the target key frame being an independent frame carrying all information;
    inputting the temporary picture, the picture corresponding to the target key frame, and all the P-frame pictures into a CODEC decoder for decoding, to obtain the decoded picture.
  5. The method for extracting a picture to be recognized according to claim 1, wherein the step of scoring each decoded picture according to a preset picture quality scoring method comprises:
    obtaining the pixel value of the decoded picture;
    obtaining the corresponding score coefficient according to the correspondence between pixel values and score coefficients;
    inputting the decoded picture into a pre-built image detection model to obtain the dimension values of the decoded picture in each dimension;
    inputting the score coefficient and each dimension value into the formula $Score = k\sum_{i=1}^{n} w_i v_i$ for calculation, to obtain the score value of the decoded picture; wherein Score denotes the score value, k the score coefficient, n the total number of detected dimensions in the image detection model, $w_i$ the influence weight of the i-th dimension on the score value, and $v_i$ the dimension value of the i-th dimension.
  6. The method for extracting a picture to be recognized according to claim 1, wherein before the step of dividing the video call process into a plurality of first intervals in chronological order and counting the sum of the numbers of video data packets and audio data packets in each first interval, the method comprises:
    extracting sound feature information from the audio data;
    obtaining, from a preset sound database, the business scene information corresponding to the sound feature information;
    converting the audio data into semantic information, and extracting the address keywords in the semantic information;
    identifying the current location according to the business scene information and the address keywords;
    judging whether the current location satisfies the conversation requirements;
    if the conversation requirements are satisfied, executing the step of dividing the video call process into a plurality of first intervals in chronological order and counting the sums of the numbers of video data packets and audio data packets in the first intervals.
  7. The method for extracting a picture to be recognized according to claim 1, wherein before the step of extracting the decoded picture with the highest score as the picture to be recognized for face recognition, the method further comprises:
    obtaining the timestamp of the picture to be recognized within the video call process;
    obtaining an auxiliary recognition picture separated by a preset time interval, based on the timestamp;
    performing grayscale processing on the auxiliary recognition picture and the picture to be recognized to obtain a first grayscale picture and a second grayscale picture, respectively;
    calculating the average value $A_m$ of the grayscale values of all pixels in the m-th column or m-th row of a grayscale picture, and calculating the average value B of the grayscale values of all pixels in the grayscale picture, the grayscale picture being the first grayscale picture or the second grayscale picture;
    calculating the overall variance $\sigma_m^2 = (A_m - B)^2 / N$ of the m-th column or m-th row of the grayscale picture, where N is the total number of columns or rows in the grayscale picture;
    obtaining the difference $\Delta_m = |\sigma_{1,m}^2 - \sigma_{2,m}^2|$ between the overall variances of the m-th column or m-th row of the first grayscale picture and the second grayscale picture, where $\sigma_{1,m}^2$ is the overall variance of the m-th column or m-th row of the first grayscale picture and $\sigma_{2,m}^2$ is that of the second grayscale picture;
    judging whether $\max_m \Delta_m$ is smaller than a preset variance error threshold;
    if $\max_m \Delta_m$ is smaller than the preset variance error threshold, determining that the picture to be recognized meets the conditions for face recognition.
  8. An apparatus for extracting a picture to be recognized, comprising:
    a data acquisition module, configured to acquire video data and audio data during a video call;
    a division module, configured to divide the video call process into a plurality of first intervals in chronological order, and to count the sum of the numbers of video data packets and audio data packets in each first interval;
    a first selection module, configured to select, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
    a second selection module, configured to select a preset number of second intervals as target intervals, and to randomly select one temporary picture from each target interval;
    a decoding module, configured to decode each temporary picture to obtain a corresponding decoded picture;
    a scoring module, configured to score each decoded picture according to a preset picture quality scoring method;
    an extraction module, configured to extract the decoded picture with the highest score as the picture to be recognized for face recognition.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of a method for extracting a picture to be recognized:
    acquiring video data and audio data during a video call;
    dividing the video call process into a plurality of first intervals in chronological order, and counting the sum of the numbers of video data packets and audio data packets in each first interval;
    selecting, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
    selecting a preset number of second intervals as target intervals, and randomly selecting one temporary picture from each target interval;
    decoding each temporary picture to obtain a corresponding decoded picture;
    scoring each decoded picture according to a preset picture quality scoring method;
    extracting the decoded picture with the highest score as the picture to be recognized for face recognition.
  10. The computer device according to claim 9, wherein the step of selecting a preset number of second intervals as target intervals comprises:
    detecting whether the times and numbers of the video data packets and audio data packets in each second interval correspond;
    marking the second intervals in which the times and numbers of video data packets and audio data packets correspond as third intervals;
    obtaining each video data packet in the third interval and the packet sequence number of each video data packet;
    obtaining a theoretical packet number from the difference between the packet sequence number of the last video data packet and that of the first video data packet in the third interval;
    comparing the actual packet number with the theoretical packet number to obtain a ratio, the actual packet number being the actual number of video data packets in the third interval;
    selecting a preset number of target intervals in descending order of the ratios of the third intervals.
  11. The computer device according to claim 9, wherein the step of decoding each temporary picture to obtain a corresponding decoded picture comprises:
    detecting the picture information of the temporary picture;
    if the picture information shows that the temporary picture is a P frame, finding, among the pictures preceding the temporary picture in the video data, the target key frame closest to the temporary picture;
    inputting all pictures from the picture corresponding to the target key frame up to the temporary picture into a CODEC decoder for decoding, to obtain the decoded picture.
  12. The computer device according to claim 11, wherein after the step of detecting the picture information of the temporary picture, the method further comprises:
    if the picture information shows that the temporary picture is a B frame, obtaining, among the pictures following the temporary picture, all P-frame pictures up to the next target key frame picture, as well as the target key frame closest to the temporary picture among the preceding pictures, the target key frame being an independent frame carrying all information;
    inputting the temporary picture, the picture corresponding to the target key frame, and all the P-frame pictures into a CODEC decoder for decoding, to obtain the decoded picture.
  13. The computer device according to claim 9, wherein the step of scoring each decoded picture according to a preset picture quality scoring method comprises:
    obtaining the pixel value of the decoded picture;
    obtaining the corresponding score coefficient according to the correspondence between pixel values and score coefficients;
    inputting the decoded picture into a pre-built image detection model to obtain the dimension values of the decoded picture in each dimension;
    inputting the score coefficient and each dimension value into the formula $Score = k\sum_{i=1}^{n} w_i v_i$ for calculation, to obtain the score value of the decoded picture; wherein Score denotes the score value, k the score coefficient, n the total number of detected dimensions in the image detection model, $w_i$ the influence weight of the i-th dimension on the score value, and $v_i$ the dimension value of the i-th dimension.
  14. The computer device according to claim 9, wherein before the step of dividing the video call process into a plurality of first intervals in chronological order and counting the sum of the numbers of video data packets and audio data packets in each first interval, the method comprises:
    extracting sound feature information from the audio data;
    obtaining, from a preset sound database, the business scene information corresponding to the sound feature information;
    converting the audio data into semantic information, and extracting the address keywords in the semantic information;
    identifying the current location according to the business scene information and the address keywords;
    judging whether the current location satisfies the conversation requirements;
    if the conversation requirements are satisfied, executing the step of dividing the video call process into a plurality of first intervals in chronological order and counting the sums of the numbers of video data packets and audio data packets in the first intervals.
  15. The computer device according to claim 9, wherein before the step of extracting the decoded picture with the highest score as the picture to be recognized for face recognition, the method further comprises:
    obtaining the timestamp of the picture to be recognized within the video call process;
    obtaining an auxiliary recognition picture separated by a preset time interval, based on the timestamp;
    performing grayscale processing on the auxiliary recognition picture and the picture to be recognized to obtain a first grayscale picture and a second grayscale picture, respectively;
    calculating the average value $A_m$ of the grayscale values of all pixels in the m-th column or m-th row of a grayscale picture, and calculating the average value B of the grayscale values of all pixels in the grayscale picture, the grayscale picture being the first grayscale picture or the second grayscale picture;
    calculating the overall variance $\sigma_m^2 = (A_m - B)^2 / N$ of the m-th column or m-th row of the grayscale picture, where N is the total number of columns or rows in the grayscale picture;
    obtaining the difference $\Delta_m = |\sigma_{1,m}^2 - \sigma_{2,m}^2|$ between the overall variances of the m-th column or m-th row of the first grayscale picture and the second grayscale picture, where $\sigma_{1,m}^2$ is the overall variance of the m-th column or m-th row of the first grayscale picture and $\sigma_{2,m}^2$ is that of the second grayscale picture;
    judging whether $\max_m \Delta_m$ is smaller than a preset variance error threshold;
    if $\max_m \Delta_m$ is smaller than the preset variance error threshold, determining that the picture to be recognized meets the conditions for face recognition.
  16. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of a method for extracting a picture to be recognized:
    acquiring video data and audio data during a video call;
    dividing the video call process into a plurality of first intervals in chronological order, and counting the sum of the numbers of video data packets and audio data packets in each first interval;
    selecting, from the first intervals, those whose packet-count sum reaches a preset number as second intervals;
    selecting a preset number of second intervals as target intervals, and randomly selecting one temporary picture from each target interval;
    decoding each temporary picture to obtain a corresponding decoded picture;
    scoring each decoded picture according to a preset picture quality scoring method;
    extracting the decoded picture with the highest score as the picture to be recognized for face recognition.
  17. The computer-readable storage medium according to claim 16, wherein the step of selecting a preset number of second intervals as target intervals comprises:
    detecting whether the times and numbers of the video data packets and audio data packets in each second interval correspond;
    marking the second intervals in which the times and numbers of video data packets and audio data packets correspond as third intervals;
    obtaining each video data packet in the third interval and the packet sequence number of each video data packet;
    obtaining a theoretical packet number from the difference between the packet sequence number of the last video data packet and that of the first video data packet in the third interval;
    comparing the actual packet number with the theoretical packet number to obtain a ratio, the actual packet number being the actual number of video data packets in the third interval;
    selecting a preset number of target intervals in descending order of the ratios of the third intervals.
  18. The computer-readable storage medium according to claim 16, wherein the step of decoding each temporary picture to obtain a corresponding decoded picture comprises:
    detecting the picture information of the temporary picture;
    if the picture information shows that the temporary picture is a P frame, finding, among the pictures preceding the temporary picture in the video data, the target key frame closest to the temporary picture;
    inputting all pictures from the picture corresponding to the target key frame up to the temporary picture into a CODEC decoder for decoding, to obtain the decoded picture.
  19. The computer-readable storage medium according to claim 18, wherein after the step of detecting the picture information of the temporary picture, the method further comprises:
    if the picture information shows that the temporary picture is a B frame, obtaining, among the pictures following the temporary picture, all P-frame pictures up to the next target key frame picture, as well as the target key frame closest to the temporary picture among the preceding pictures, the target key frame being an independent frame carrying all information;
    inputting the temporary picture, the picture corresponding to the target key frame, and all the P-frame pictures into a CODEC decoder for decoding, to obtain the decoded picture.
  20. The computer-readable storage medium according to claim 16, wherein the step of scoring each decoded picture according to a preset picture quality scoring method comprises:
    obtaining the pixel value of the decoded picture;
    obtaining the corresponding score coefficient according to the correspondence between pixel values and score coefficients;
    inputting the decoded picture into a pre-built image detection model to obtain the dimension values of the decoded picture in each dimension;
    inputting the score coefficient and each dimension value into the formula $Score = k\sum_{i=1}^{n} w_i v_i$ for calculation, to obtain the score value of the decoded picture; wherein Score denotes the score value, k the score coefficient, n the total number of detected dimensions in the image detection model, $w_i$ the influence weight of the i-th dimension on the score value, and $v_i$ the dimension value of the i-th dimension.
PCT/CN2021/097542 2021-01-12 2021-05-31 Method, apparatus, device, and storage medium for extracting a picture to be recognized WO2022151639A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110037554.9 2021-01-12
CN202110037554.9A CN112911385B (zh) 2021-01-12 2021-01-12 Method, apparatus, device, and storage medium for extracting a picture to be recognized

Publications (1)

Publication Number Publication Date
WO2022151639A1

Family

ID=76112492

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097542 WO2022151639A1 (zh) 2021-01-12 2021-05-31 待识别图片的提取方法、装置、设备以及存储介质

Country Status (2)

Country Link
CN (1) CN112911385B (zh)
WO (1) WO2022151639A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115297323A (zh) * 2022-08-16 2022-11-04 广东省信息网络有限公司 RPA process automation method and system
CN117615088A (zh) * 2024-01-22 2024-02-27 沈阳市锦拓电子工程有限公司 Efficient video data storage method for security monitoring

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114398449B (zh) * 2021-12-29 2023-01-06 深圳市海清视讯科技有限公司 Data processing method and apparatus, video monitoring system, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330038A1 (en) * 2016-05-13 2017-11-16 Canon Kabushiki Kaisha Method, system and apparatus for selecting a video frame
CN107633209A (zh) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Electronic device, method for dynamic-video face recognition, and storage medium
CN108038422A (zh) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Camera device, face recognition method, and computer-readable storage medium
CN112118442A (zh) * 2020-09-18 2020-12-22 平安科技(深圳)有限公司 AI video call quality analysis method and apparatus, computer device, and storage medium
CN112132103A (zh) * 2020-09-30 2020-12-25 新华智云科技有限公司 Video face detection and recognition method and system

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101873494B (zh) * 2010-04-30 2012-07-04 南京邮电大学 Slice-level-based dynamic interleaving method in video transmission
CN109922334B (zh) * 2017-12-13 2021-11-19 阿里巴巴(中国)有限公司 Video quality identification method and system
CN109274554A (zh) * 2018-09-28 2019-01-25 中国科学院长春光学精密机械与物理研究所 Image data packet-loss testing method, apparatus, device, and readable storage medium
CN111061912A (zh) * 2018-10-16 2020-04-24 华为技术有限公司 Method for processing video files and electronic device
CN109522814B (zh) * 2018-10-25 2020-10-02 清华大学 Target tracking method and apparatus based on video data
CN111277861B (zh) * 2020-02-21 2023-02-24 北京百度网讯科技有限公司 Method and apparatus for extracting hot-spot segments from video
CN111401315B (zh) * 2020-04-10 2023-08-22 浙江大华技术股份有限公司 Video-based face recognition method, recognition apparatus, and storage apparatus
CN111862063A (zh) * 2020-07-27 2020-10-30 中国平安人寿保险股份有限公司 Video quality evaluation method, apparatus, computer device, and storage medium
CN112039699B (zh) * 2020-08-25 2022-11-22 RealMe重庆移动通信有限公司 Network slice selection method, apparatus, storage medium, and electronic device
CN112104897B (zh) * 2020-11-04 2021-03-12 北京达佳互联信息技术有限公司 Video acquisition method, terminal, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330038A1 (en) * 2016-05-13 2017-11-16 Canon Kabushiki Kaisha Method, system and apparatus for selecting a video frame
CN107633209A (zh) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Electronic device, method for dynamic-video face recognition, and storage medium
CN108038422A (zh) * 2017-11-21 2018-05-15 平安科技(深圳)有限公司 Camera device, face recognition method, and computer-readable storage medium
CN112118442A (zh) * 2020-09-18 2020-12-22 平安科技(深圳)有限公司 AI video call quality analysis method and apparatus, computer device, and storage medium
CN112132103A (zh) * 2020-09-30 2020-12-25 新华智云科技有限公司 Video face detection and recognition method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115297323A (zh) * 2022-08-16 2022-11-04 广东省信息网络有限公司 RPA process automation method and system
CN117615088A (zh) * 2024-01-22 2024-02-27 沈阳市锦拓电子工程有限公司 Efficient video data storage method for security monitoring
CN117615088B (zh) * 2024-01-22 2024-04-05 沈阳市锦拓电子工程有限公司 Efficient video data storage method for security monitoring

Also Published As

Publication number Publication date
CN112911385B (zh) 2021-12-07
CN112911385A (zh) 2021-06-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21918837

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21918837

Country of ref document: EP

Kind code of ref document: A1