WO2019104890A1 - 结合音频分析和视频分析的欺诈识别方法、装置及存储介质 (Fraud identification method and apparatus combining audio analysis and video analysis, and storage medium) - Google Patents

结合音频分析和视频分析的欺诈识别方法、装置及存储介质 (Fraud identification method and apparatus combining audio analysis and video analysis, and storage medium)

Info

Publication number: WO2019104890A1
Application number: PCT/CN2018/077345
Authority: WO (WIPO, PCT)
Other languages: English (en), French (fr)
Inventors: 韦峰, 徐国强
Original Assignee: 深圳壹账通智能科技有限公司
Priority date: 2017-12-01 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2018-02-27
Publication date: 2019-06-06
Application filed by 深圳壹账通智能科技有限公司

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use, for comparison or discrimination
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174: Facial expression recognition

Definitions

  • The present application relates to the field of computer information processing technologies, and in particular to a fraud identification method, apparatus, and computer-readable storage medium combining audio analysis and video analysis.
  • At present, fraud identification is generally carried out through face-to-face review, which relies heavily on the experience and judgment of analysts, consumes a great deal of time and manpower, and often yields results that are neither accurate nor objective. Alternatively, professional instruments are used to judge whether a test subject is suspected of fraud by measuring a series of indicators such as breathing, pulse, blood pressure, and skin resistance; however, such equipment is usually expensive and can easily infringe on the human rights of the person being tested.
  • To address these shortcomings, the present application provides a fraud identification method, apparatus, and computer-readable storage medium combining audio analysis and video analysis, which objectively and accurately determine whether an object to be identified is suspected of fraud by analyzing its audio and video data.
  • The present application provides a fraud identification method combining audio analysis and video analysis, applied to an electronic device, the method including:
  • Sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
  • Feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
  • Model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
  • Model application step: collecting audio and video data of the object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
  • Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
  • The present application also provides an electronic device including a memory and a processor, the memory including a fraud identification program.
  • The electronic device is connected, directly or indirectly, to a camera device, which transmits collected audio and video data to the electronic device.
  • When the processor of the electronic device executes the fraud identification program in the memory, the following steps are implemented:
  • Sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
  • Feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
  • Model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
  • Model application step: collecting audio and video data of the object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
  • Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
  • The present application further provides a computer-readable storage medium including a fraud identification program which, when executed by a processor, implements any of the steps of the fraud identification method combining audio analysis and video analysis described above.
  • The fraud identification method, apparatus, and computer-readable storage medium provided by the present application extract the speech features of the audio segments and the expression features of the video segments of audio-video samples and, combined with the corresponding fraud labels, train support vector machines to obtain a speech analysis model and an expression analysis model.
  • The trained models are then applied to real-time fraud identification: audio and video data of the object to be identified are collected; speech features and expression features are extracted from the data and input into the trained speech analysis model and expression analysis model, respectively; the models output the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and P1 and P2 are weighted and fused to obtain the fraud identification result of the object.
  • FIG. 1 is an application environment diagram of a first preferred embodiment of the electronic device of the present application.
  • FIG. 2 is an application environment diagram of a second preferred embodiment of the electronic device of the present application.
  • FIG. 3 is a program module diagram of the fraud identification program in FIG. 1 and FIG. 2.
  • FIG. 4 is a flowchart of a preferred embodiment of the fraud identification method combining audio analysis and video analysis.
  • In this embodiment, the camera device 3 is connected to the electronic device 1 via the network 2; the camera device 3 collects audio and video data of a person and transmits it via the network 2 to the electronic device 1; and the electronic device 1 analyzes the audio and video data using the fraud identification program 10 provided by the present application to obtain the fraud identification result of the person.
  • The electronic device 1 may be a terminal device with storage and computing capabilities, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer.
  • The electronic device 1 includes a memory 11, a processor 12, a network interface 13, and a communication bus 14.
  • The camera device 3 is installed in a specific place, such as an office or a monitored area, to collect audio and video data of people, and transmits the data to the memory 11 through the network 2.
  • The network interface 13 may include a standard wired interface and a wireless interface (such as a Wi-Fi interface).
  • The communication bus 14 is used to implement connection and communication between these components.
  • The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory.
  • In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk.
  • In other embodiments, the readable storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card.
  • In this embodiment, the memory 11 stores the program code of the fraud identification program 10, the audio and video data collected by the camera device 3, other data used when the processor 12 executes the program code of the fraud identification program 10, the final output data, and the like.
  • In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip.
  • FIG. 1 shows only the electronic device 1 with components 11-14, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
  • Optionally, the electronic device 1 may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with voice recognition capability, and a voice output device such as a speaker or headphones. Optionally, the user interface may also include a standard wired interface and a wireless interface.
  • Optionally, the electronic device 1 may further include a display. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display is used to show information processed by the electronic device 1 and a visual user interface.
  • Optionally, the electronic device 1 further includes a touch sensor. The area provided by the touch sensor for the user to perform touch operations is referred to as the touch area. The touch sensor here may be a resistive touch sensor, a capacitive touch sensor, or the like, and includes not only contact touch sensors but also proximity touch sensors. The touch sensor may be a single sensor or a plurality of sensors arranged, for example, in an array.
  • A user, such as a psychological counselor or a credit reviewer, can start the fraud identification program 10 by touch.
  • The electronic device 1 may further include a radio frequency (RF) circuit, sensors, an audio circuit, and the like, which are not described in detail here.
  • Referring to FIG. 2, which is an application environment diagram of a second preferred embodiment of the electronic device of the present application.
  • The user carries out the fraud identification process through the terminal 3: the camera device 30 of the terminal 3 collects the audio and video data of the object to be identified and transmits it through the network 2 to the electronic device 1; the processor 12 of the electronic device 1 executes the program code of the fraud identification program 10 stored in the memory 11, analyzes the audio data and the video data, outputs the audio fraud probability P1 and the video fraud probability P2 of the object to be identified, and weights P1 and P2 to obtain the fraud identification result of the object, for reference by the object to be identified or by a reviewer.
  • The terminal 3 may be a terminal device with storage and computing capabilities, such as a smartphone, a tablet computer, a portable computer, or a desktop computer.
  • The fraud identification program 10 of FIGS. 1 and 2, when executed by the processor 12, implements the following steps:
  • Sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
  • Feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
  • Model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
  • Model application step: collecting audio and video data of the object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
  • Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
  • Refer to FIG. 3 below for the program module diagram of the fraud identification program 10, and to FIG. 4 for the flowchart of the preferred embodiment of the fraud identification method combining audio analysis and video analysis.
  • Referring to FIG. 3, which is the program module diagram of the fraud identification program 10 in FIGS. 1 and 2.
  • In this embodiment, the fraud identification program 10 is divided into a plurality of modules, which are stored in the memory 11 and executed by the processor 12 to complete the present application. A module referred to in the present application is a series of computer program instruction segments capable of performing a specific function.
  • The fraud identification program 10 may be divided into an obtaining module 110, an extraction module 120, a training module 130, a model application module 140, and a weighting calculation module 150.
  • The obtaining module 110 is configured to obtain audio and video of people and to decode and preprocess them to obtain corresponding audio portions and video portions.
  • The audio and video may be collected by the camera device 3 of FIG. 1 or the camera device 30 of FIG. 2, or may be audio and video with obviously fraudulent behavior and audio and video without fraudulent behavior selected from online sources or an audio-video database.
  • The audio-video samples used to train the support vector machines are cut in units of emotion to obtain audio-video clips, and each clip is assigned a fraud label indicating whether the person in the clip is suspected of fraud; for example, 1 indicates suspected fraud and 0 indicates no suspected fraud.
  • The audio and video are decoded and preprocessed to obtain the corresponding audio portion and video portion, as illustrated in the sketch below.
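  • As an illustration of this decoding and preprocessing step, the following is a minimal sketch that demuxes a clip into its audio portion (a 16 kHz mono WAV) and its video portion using the ffmpeg command-line tool. The application does not prescribe a decoder; the use of ffmpeg, the sampling format, and the file names are assumptions made here purely for illustration.

```python
# Sketch of the decode/preprocess step: split one audio-video clip into an
# audio segment (16 kHz mono WAV) and a video segment, via the ffmpeg CLI.
# Requires ffmpeg on the PATH; all paths and parameters are hypothetical.
import subprocess

def split_clip(clip_path: str, wav_out: str, video_out: str) -> None:
    # Audio segment: drop the video stream (-vn), decode to 16 kHz mono PCM.
    subprocess.run(
        ["ffmpeg", "-y", "-i", clip_path, "-vn", "-ac", "1", "-ar", "16000", wav_out],
        check=True,
    )
    # Video segment: drop the audio stream (-an), copy the video stream as-is.
    subprocess.run(
        ["ffmpeg", "-y", "-i", clip_path, "-an", "-c:v", "copy", video_out],
        check=True,
    )

# Example with hypothetical file names:
# split_clip("clip_0001.mp4", "clip_0001.wav", "clip_0001.video.mp4")
```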
  • The extraction module 120 is configured to extract the speech features of the audio portions and the expression features of the video portions. The extraction module 120 extracts speech features from each audio portion obtained by the obtaining module 110 and expression features from each video portion obtained by the obtaining module 110.
  • When extracting the speech features of an audio portion, the extraction module 120 first extracts low-order audio features such as Mel-frequency cepstral coefficients, pitch, and zero-crossing rate from the audio portion; it then extracts dynamic regression coefficients from these low-order audio features to obtain the dynamic audio features of the audio portion; it then uses statistical functions to extract high-order audio features from the low-order audio features and the dynamic audio features; and it finally uses a feature selection algorithm to select a subset of the high-order audio features, which serves as the speech features of the audio portion.
  • In this embodiment, the OpenSMILE software can be used to extract low-order audio features such as the Mel-frequency cepstral coefficients, pitch, and zero-crossing rate of the audio portion.
  • The dynamic regression coefficients are used to indicate the importance of the low-order audio features. For example, if a low-order audio feature of an audio portion (such as the pitch feature) is represented as a waveform file, the waveform file can be expressed by multiple linear regression as Y = β_0 + β_1·X_1 + β_2·X_2 + … + β_k·X_k, where k is the number of occurrences of the low-order audio feature in the audio portion and β_j (j = 1, 2, …, k) are the dynamic regression coefficients of that feature.
  • The statistical functions include functions for extracting the maximum, minimum, kurtosis, skewness, and the like of the low-order audio features and the dynamic audio features; the extraction module 120 combines and transforms the values extracted by these functions to obtain the high-order audio features. The number of high-order audio features extracted from each audio portion is often very large, but usually only a small fraction of them significantly affect the fraud identification result, so a feature selection algorithm is used to reduce the number of high-order audio features and speed up fraud identification.
  • In this embodiment, the feature selection algorithm may be a sequential forward selection (SFS) algorithm, a sequential backward selection (SBS) algorithm, a bidirectional search (BDS) algorithm, a filter feature selection algorithm, or another feature selection algorithm; a sketch of the overall feature chain follows.
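  • To make the chain above concrete, here is a minimal sketch of the low-order, dynamic, and high-order audio feature stages in Python. It substitutes the librosa library for the OpenSMILE software named above (an assumption made only so the sketch is self-contained), and the handful of features and functionals shown is an illustrative subset rather than a full production feature set.

```python
# Sketch: low-order features (MFCCs, pitch, zero-crossing rate) -> delta
# (dynamic regression) coefficients -> statistical functionals (max, min,
# kurtosis, skewness, ...) as high-order features for one audio segment.
import numpy as np
import librosa
from scipy.stats import kurtosis, skew

def high_order_audio_features(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)

    # Low-order, frame-level feature trajectories.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)            # (13, T)
    zcr = librosa.feature.zero_crossing_rate(y)                    # (1, T')
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)[np.newaxis, :]   # pitch track
    low_order = [mfcc, zcr, f0]

    # Dynamic features: first-order regression (delta) coefficients.
    dynamic = [librosa.feature.delta(track) for track in low_order]

    # High-order features: statistical functionals over each trajectory.
    functionals = [np.max, np.min, np.mean, np.std, kurtosis, skew]
    stats = [fn(track, axis=1) for track in low_order + dynamic for fn in functionals]
    return np.concatenate([np.atleast_1d(s) for s in stats])
```

  • A feature selection algorithm such as SFS would then be run over these high-order vectors; one way to do that appears in the training sketch further below.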
  • Similarly, when extracting the expression features of a video portion, the extraction module 120 first extracts low-order motion features such as head orientation, eye-gaze direction, and facial action units (AUs) from the video portion; it then counts the number of occurrences and the duration of each low-order motion feature in the video portion and constructs the high-order motion features of the video portion from these statistics; and it finally uses a feature selection algorithm to select a subset of the high-order motion features, which serves as the expression features of the video portion (see the sketch below).
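  • The following sketch shows one way the occurrence counts and durations described above can be turned into high-order motion features. It presupposes an upstream per-frame detector (for example, an OpenFace-style tool) emitting binary activations for each low-order motion feature; that detector, the binary encoding, and the names used are assumptions of this sketch, not details fixed by the application.

```python
# Sketch: build high-order motion features (occurrence counts and total
# durations) from per-frame binary activations of low-order motion features
# such as head-orientation bins, gaze-direction bins, and action units.
import numpy as np

def high_order_motion_features(frames: np.ndarray, fps: float) -> np.ndarray:
    """frames: (T, F) binary matrix; frames[t, f] == 1 means low-order
    feature f (e.g. 'AU12 active', 'head turned left') is present in frame t."""
    counts, durations = [], []
    for f in range(frames.shape[1]):
        active = frames[:, f].astype(bool)
        # An occurrence starts wherever the feature switches from off to on.
        onsets = int(active[0]) + int(np.sum(active[1:] & ~active[:-1]))
        counts.append(onsets)
        durations.append(active.sum() / fps)  # total active time in seconds
    return np.concatenate([np.asarray(counts, dtype=float),
                           np.asarray(durations, dtype=float)])
```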
  • The training module 130 is configured to train the support vector machines to obtain the speech analysis model and the expression analysis model. The speech features of each audio portion of the audio-video samples extracted by the extraction module 120, together with the fraud labels assigned by the obtaining module 110, are used as sample data to train the first support vector machine, yielding the speech analysis model; the expression features of each video portion, together with the fraud labels assigned by the obtaining module 110, are used as sample data to train the second support vector machine, yielding the expression analysis model. A training sketch follows.
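  • A minimal training sketch with scikit-learn is given below. The application specifies support vector machines and a feature selection algorithm but fixes no parameters, so the RBF kernel, the sequential-forward-selection stage, the 20-feature budget, and the held-out split used to estimate each model's accuracy are all illustrative assumptions.

```python
# Sketch: feature selection (SFS) followed by an SVM with probability outputs;
# the same routine is run once on audio features and once on video features.
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split

def train_analysis_model(X, y):
    """X: (n_clips, n_high_order_features); y: fraud labels (1 fraud, 0 none)."""
    model = make_pipeline(
        StandardScaler(),
        SequentialFeatureSelector(SVC(kernel="linear"), direction="forward",
                                  n_features_to_select=20),   # SFS stage
        SVC(kernel="rbf", probability=True),  # probability=True yields P1/P2
    )
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model.fit(X_tr, y_tr)
    accuracy = model.score(X_te, y_te)  # later used to weight the two models
    return model, accuracy

# speech_model, acc_audio = train_analysis_model(X_audio, labels)
# expression_model, acc_video = train_analysis_model(X_video, labels)
# P1 = speech_model.predict_proba(new_audio_features.reshape(1, -1))[0, 1]
```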
  • The model application module 140 is configured to analyze the audio and video data of the object to be identified and obtain the audio fraud probability and the video fraud probability of the object. The speech features of the audio portion of the object's audio and video, extracted by the extraction module 120, are input into the speech analysis model trained by the training module 130, which outputs the audio fraud probability P1 of the object to be identified; the expression features of the video portion of the object are input into the trained expression analysis model, which outputs the video fraud probability P2 of the object to be identified.
  • The weighting calculation module 150 is configured to weight the audio fraud probability P1 and the video fraud probability P2 of the object to be identified to obtain its fraud identification result. When the training module 130 trains the support vector machines on the sample data to obtain the speech analysis model and the expression analysis model, the accuracy of the two models can be measured; these accuracies are used to compute the weights of the speech analysis model and the expression analysis model and, from them, the final fraud probability of the object to be identified.
  • For example, assuming the accuracy of the speech analysis model is 85% and that of the expression analysis model is 95%, the weights of the two models can be computed as follows:
  • P(Audio) = a = 0.85
  • P(Video) = b = 0.95
  • W(Audio) = a / (a + b) = 0.85 / 1.8
  • W(Video) = b / (a + b) = 0.95 / 1.8
  • where P(Audio) denotes the accuracy of the speech analysis model, P(Video) the accuracy of the expression analysis model, W(Audio) the weight of the speech analysis model, and W(Video) the weight of the expression analysis model.
  • Assume that after the audio and video data of the object to be identified are analyzed by the speech analysis model and the expression analysis model, the audio fraud probability of the object is 0.8 and the video fraud probability is 0.7. Weighted fusion according to W(Audio) and W(Video) then gives the final fraud probability of the object to be identified: P = (0.85/1.8) * 0.8 + (0.95/1.8) * 0.7 ≈ 0.75.
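  • The fusion itself reduces to a few lines; the sketch below reproduces the worked example above (the function and variable names are illustrative):

```python
# Accuracy-weighted fusion of the two model outputs, as described above.
def fuse(p1: float, p2: float, acc_audio: float, acc_video: float) -> float:
    w_audio = acc_audio / (acc_audio + acc_video)  # W(Audio) = a / (a + b)
    w_video = acc_video / (acc_audio + acc_video)  # W(Video) = b / (a + b)
    return w_audio * p1 + w_video * p2

print(fuse(0.8, 0.7, 0.85, 0.95))  # 0.7472..., i.e. the ~0.75 of the example
```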
  • Referring to FIG. 4, which is a flowchart of a preferred embodiment of the fraud identification method combining audio analysis and video analysis.
  • Using the architecture shown in FIG. 1 or FIG. 2, the electronic device 1 is started, and the processor 12 executes the fraud identification program 10 stored in the memory 11 to implement the following steps:
  • Step S10: the obtaining module 110 collects audio-video samples of people, cuts the samples in units of emotion to obtain audio-video clips, and assigns a fraud label to each clip.
  • The samples may be acquired by the camera device 3 of FIG. 1 or the camera device 30 of FIG. 2, or may be audio and video with obviously fraudulent behavior and normal audio and video without fraudulent behavior selected from online sources or an audio-video database.
  • Step S20: the obtaining module 110 decodes and preprocesses each audio-video clip to obtain the audio segment and video segment of each clip. The fraud label of each audio-video clip is retained as the fraud label of the corresponding audio segment and video segment.
  • Step S30: the extraction module 120 extracts speech features and expression features from each audio segment and video segment, respectively. For the specific extraction methods, refer to the detailed description of the extraction module 120 above.
  • Step S40: a first support vector machine is trained on the speech features and fraud labels of the audio segments to obtain the speech analysis model, and a second support vector machine is trained on the expression features and fraud labels of the video segments to obtain the expression analysis model. The training module 130 uses the speech features and fraud label of each audio segment as sample data to train the first support vector machine, yielding the speech analysis model, and uses the expression features and fraud label of each video segment as sample data to train the second support vector machine, yielding the expression analysis model.
  • Step S50: the obtaining module 110 collects the audio and video data of the object to be identified and decodes and preprocesses the data to obtain the audio data and the video data of the object.
  • The audio and video data are acquired in real time by the camera device 3 of FIG. 1 or the camera device 30 of FIG. 2.
  • Step S60: the extraction module 120 extracts the speech features of the audio data and the expression features of the video data of the object to be identified. For the specific extraction methods, refer to the detailed description of the extraction module 120 above.
  • Step S70: the speech features of the audio data and the expression features of the video data of the object to be identified are input into the speech analysis model and the expression analysis model, respectively, to obtain the audio fraud probability and the video fraud probability of the object. The speech features extracted by the extraction module 120 are input into the speech analysis model, which outputs the audio fraud probability P1 of the object to be identified; the expression features extracted by the extraction module 120 are input into the expression analysis model, which outputs the video fraud probability P2 of the object to be identified.
  • Step S80: P1 and P2 are weighted according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified. For the method of determining the weights of the two models and the specific procedure of the weighted calculation of P1 and P2, refer to the detailed description of the weighting calculation module 150 above.
  • An embodiment of the present application further provides a computer-readable storage medium, which may be any one or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like.
  • The computer-readable storage medium includes audio-video samples and the fraud identification program 10, and when the fraud identification program 10 is executed by a processor, the following operations are implemented:
  • Sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
  • Feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
  • Model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
  • Model application step: collecting audio and video data of the object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
  • Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
  • A storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the various embodiments of the present application.


Abstract

A fraud identification method combining audio analysis and video analysis, an apparatus, and a storage medium. The method includes the following steps: cutting audio-video samples to obtain audio-video clips and assigning a fraud label to each clip (S10); decoding and preprocessing each audio-video clip to obtain an audio segment and a video segment of each clip (S20); extracting speech features and expression features from each audio segment and video segment, respectively (S30); training support vector machines on the speech features of the audio segments and the expression features of the video segments, combined with the fraud labels, to obtain a speech analysis model and an expression analysis model (S40); collecting audio and video data of an object to be identified (S50); extracting speech features and expression features from the audio and video data (S60); inputting the speech features and expression features into the speech analysis model and the expression analysis model, respectively, and outputting the fraud probabilities P1 and P2 of the object to be identified (S70); and weighting P1 and P2 to obtain the fraud identification result of the object to be identified (S80).

Description

Fraud identification method and apparatus combining audio analysis and video analysis, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 1, 2017 under application number 201711252009.1 and entitled "Fraud identification method and apparatus combining audio analysis and video analysis, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer information processing technologies, and in particular to a fraud identification method, apparatus, and computer-readable storage medium combining audio analysis and video analysis.
Background
At present, fraud identification is generally carried out through face-to-face review. It relies heavily on the experience and judgment of analysts, consumes a great deal of time and manpower, and the results are often neither accurate nor objective. Alternatively, professional instruments are used to judge whether a test subject is suspected of fraud by measuring a series of indicators such as breathing, pulse, blood pressure, and skin resistance, but such equipment is usually expensive and can easily infringe on the human rights of the person being tested.
Summary
To address the shortcomings of the prior art, the present application provides a fraud identification method, apparatus, and computer-readable storage medium combining audio analysis and video analysis, which objectively and accurately determine whether an object to be identified is suspected of fraud by analyzing its audio and video data.
To achieve the above objective, the present application provides a fraud identification method combining audio analysis and video analysis, applied to an electronic device, the method including:
Sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
Feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
Model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
Model application step: collecting audio and video data of the object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
The present application also provides an electronic device including a memory and a processor, the memory including a fraud identification program. The electronic device is connected, directly or indirectly, to a camera device, which transmits collected audio and video data to the electronic device. When the processor of the electronic device executes the fraud identification program in the memory, the following steps are implemented:
Sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
Feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
Model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
Model application step: collecting audio and video data of the object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
In addition, to achieve the above objective, the present application further provides a computer-readable storage medium including a fraud identification program which, when executed by a processor, implements any of the steps of the fraud identification method combining audio analysis and video analysis described above.
The fraud identification method, apparatus, and computer-readable storage medium provided by the present application extract the speech features of the audio segments and the expression features of the video segments of audio-video samples and, combined with the corresponding fraud labels, train support vector machines to obtain a speech analysis model and an expression analysis model. The trained models are then applied to real-time fraud identification: audio and video data of the object to be identified are collected; speech features and expression features are extracted from the data and input into the trained speech analysis model and expression analysis model, respectively; the models output the audio fraud probability P1 and the video fraud probability P2 of the object; and P1 and P2 are weighted and fused to obtain the fraud identification result of the object. With the present application, whether a person is suspected of fraud can be identified objectively and accurately.
Brief Description of the Drawings
FIG. 1 is an application environment diagram of a first preferred embodiment of the electronic device of the present application.
FIG. 2 is an application environment diagram of a second preferred embodiment of the electronic device of the present application.
FIG. 3 is a program module diagram of the fraud identification program in FIG. 1 and FIG. 2.
FIG. 4 is a flowchart of a preferred embodiment of the fraud identification method combining audio analysis and video analysis of the present application.
The realization of the objectives, functional features, and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description of the Embodiments
The principles and spirit of the present application are described below with reference to several specific embodiments. It should be understood that the specific embodiments described here are intended only to explain the present application and not to limit it.
Referring to FIG. 1, which is an application environment diagram of the first preferred embodiment of the electronic device of the present application. In this embodiment, the camera device 3 is connected to the electronic device 1 through the network 2; the camera device 3 collects audio and video data of a person and transmits it through the network 2 to the electronic device 1; and the electronic device 1 analyzes the audio and video data using the fraud identification program 10 provided by the present application to obtain the fraud identification result of the person.
The electronic device 1 may be a terminal device with storage and computing capabilities, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 1 includes a memory 11, a processor 12, a network interface 13, and a communication bus 14.
The camera device 3 is installed in a specific place, such as an office or a monitored area, to collect audio and video data of people and transmit the data to the memory 11 through the network 2. The network interface 13 may include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The communication bus 14 is used to implement connection and communication between these components.
The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk. In other embodiments, the readable storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card.
In this embodiment, the memory 11 stores the program code of the fraud identification program 10, the audio and video data collected by the camera device 3, other data used when the processor 12 executes the program code of the fraud identification program 10, the final output data, and the like.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip.
FIG. 1 shows only the electronic device 1 with components 11-14, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
Optionally, the electronic device 1 may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with voice recognition capability, and a voice output device such as a speaker or headphones. Optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 1 may further include a display. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display is used to show information processed by the electronic device 1 and a visual user interface.
Optionally, the electronic device 1 further includes a touch sensor. The area provided by the touch sensor for the user to perform touch operations is referred to as the touch area. The touch sensor here may be a resistive touch sensor, a capacitive touch sensor, or the like, and includes not only contact touch sensors but also proximity touch sensors. The touch sensor may be a single sensor or a plurality of sensors arranged, for example, in an array. A user, such as a psychological counselor or a credit reviewer, can start the fraud identification program 10 by touch.
The electronic device 1 may further include a radio frequency (RF) circuit, sensors, an audio circuit, and the like, which are not described in detail here.
Referring to FIG. 2, which is an application environment diagram of the second preferred embodiment of the electronic device of the present application. The user carries out the fraud identification process through the terminal 3: the camera device 30 of the terminal 3 collects the audio and video data of the object to be identified and transmits it through the network 2 to the electronic device 1; the processor 12 of the electronic device 1 executes the program code of the fraud identification program 10 stored in the memory 11, analyzes the audio data and the video data, outputs the audio fraud probability P1 and the video fraud probability P2 of the object to be identified, and weights P1 and P2 to obtain the fraud identification result of the object to be identified, for reference by the object to be identified or by a reviewer.
For the components of the electronic device 1 in FIG. 2, such as the memory 11, the processor 12, the network interface 13, and the communication bus 14 shown in the figure, as well as components not shown, refer to the description of FIG. 1.
The terminal 3 may be a terminal device with storage and computing capabilities, such as a smartphone, a tablet computer, a portable computer, or a desktop computer.
The fraud identification program 10 in FIG. 1 and FIG. 2, when executed by the processor 12, implements the following steps:
Sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
Feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
Model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
Model application step: collecting audio and video data of the object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
For a detailed description of the above steps, refer to the program module diagram of the fraud identification program 10 in FIG. 3 below and the flowchart of the preferred embodiment of the fraud identification method combining audio analysis and video analysis in FIG. 4.
Referring to FIG. 3, which is the program module diagram of the fraud identification program 10 in FIG. 1 and FIG. 2. In this embodiment, the fraud identification program 10 is divided into a plurality of modules, which are stored in the memory 11 and executed by the processor 12 to complete the present application. A module referred to in the present application is a series of computer program instruction segments capable of performing a specific function.
The fraud identification program 10 may be divided into an obtaining module 110, an extraction module 120, a training module 130, a model application module 140, and a weighting calculation module 150.
The obtaining module 110 is configured to obtain audio and video of people and to decode and preprocess them to obtain corresponding audio portions and video portions. The audio and video may be collected by the camera device 3 of FIG. 1 or the camera device 30 of FIG. 2, or may be audio and video with obviously fraudulent behavior and audio and video without fraudulent behavior selected from online sources or an audio-video database. The audio-video samples used to train the support vector machines are cut in units of emotion to obtain audio-video clips, and each clip is assigned a fraud label indicating whether the person in the clip is suspected of fraud; for example, 1 indicates suspected fraud and 0 indicates no suspected fraud. The audio and video are decoded and preprocessed to obtain the corresponding audio portion and video portion.
The extraction module 120 is configured to extract the speech features of the audio portions and the expression features of the video portions. The extraction module 120 extracts speech features from each audio portion obtained by the obtaining module 110 and expression features from each video portion obtained by the obtaining module 110.
When extracting the speech features of an audio portion, the extraction module 120 first extracts low-order audio features such as Mel-frequency cepstral coefficients, pitch, and zero-crossing rate from the audio portion; it then extracts dynamic regression coefficients from these low-order audio features to obtain the dynamic audio features of the audio portion; it then uses statistical functions to extract high-order audio features from the low-order audio features and the dynamic audio features; and it finally uses a feature selection algorithm to select a subset of the high-order audio features, which serves as the speech features of the audio portion.
In this embodiment, the OpenSMILE software can be used to extract low-order audio features such as the Mel-frequency cepstral coefficients, pitch, and zero-crossing rate of the audio portion. The dynamic regression coefficients are used to indicate the importance of the low-order audio features. For example, if a low-order audio feature of an audio portion (such as the pitch feature) is represented as a waveform file, the waveform file can be expressed by multiple linear regression as:
Y = β_0 + β_1·X_1 + β_2·X_2 + … + β_k·X_k
where k is the number of occurrences of the low-order audio feature in the audio portion, and β_j (j = 1, 2, …, k) are the dynamic regression coefficients of the low-order audio feature.
The statistical functions include functions for extracting the maximum, minimum, kurtosis, skewness, and the like of the low-order audio features and the dynamic audio features; the extraction module 120 combines and transforms the values extracted by these functions to obtain the high-order audio features. The number of high-order audio features extracted from each audio portion is often very large, but usually only a small fraction of them significantly affect the fraud identification result, so a feature selection algorithm is used to reduce the number of high-order audio features and speed up fraud identification. In this embodiment, the feature selection algorithm may be a sequential forward selection (SFS) algorithm, a sequential backward selection (SBS) algorithm, a bidirectional search (BDS) algorithm, a filter feature selection algorithm, or another feature selection algorithm.
Similarly, when extracting the expression features of a video portion, the extraction module 120 first extracts low-order motion features such as head orientation, eye-gaze direction, and facial action units (AUs) from the video portion; it then counts the number of occurrences and the duration of each low-order motion feature in the video portion and constructs the high-order motion features of the video portion from these statistics; and it finally uses a feature selection algorithm to select a subset of the high-order motion features, which serves as the expression features of the video portion.
The training module 130 is configured to train the support vector machines to obtain the speech analysis model and the expression analysis model. The speech features of each audio portion of the audio-video samples extracted by the extraction module 120, together with the fraud labels assigned by the obtaining module 110, are used as sample data to train the first support vector machine, yielding the speech analysis model; the expression features of each video portion of the samples extracted by the extraction module 120, together with the fraud labels assigned by the obtaining module 110, are used as sample data to train the second support vector machine, yielding the expression analysis model.
The model application module 140 is configured to analyze the audio and video data of the object to be identified and obtain the audio fraud probability and the video fraud probability of the object. The speech features of the audio portion of the object's audio and video, extracted by the extraction module 120, are input into the speech analysis model trained by the training module 130, which outputs the audio fraud probability P1 of the object to be identified; the expression features of the video portion of the object are input into the trained expression analysis model, which outputs the video fraud probability P2 of the object to be identified.
The weighting calculation module 150 is configured to weight the audio fraud probability P1 and the video fraud probability P2 of the object to be identified to obtain its fraud identification result. When the training module 130 trains the support vector machines on the sample data to obtain the speech analysis model and the expression analysis model, the accuracy of the two models can be measured; these accuracies are used to compute the weights of the speech analysis model and the expression analysis model and, from them, the final fraud probability of the object to be identified.
For example, assuming the accuracy of the speech analysis model is 85% and that of the expression analysis model is 95%, the weights of the speech analysis model and the expression analysis model can be computed as follows:
P(Audio) = a = 0.85
P(Video) = b = 0.95
W(Audio) = a / (a + b) = 0.85 / 1.8
W(Video) = b / (a + b) = 0.95 / 1.8
where P(Audio) denotes the accuracy of the speech analysis model, P(Video) the accuracy of the expression analysis model, W(Audio) the weight of the speech analysis model, and W(Video) the weight of the expression analysis model.
Assume that after the audio and video data of the object to be identified are analyzed by the speech analysis model and the expression analysis model, the audio fraud probability of the object is 0.8 and the video fraud probability is 0.7. Weighted fusion according to W(Audio) and W(Video) then gives the final fraud probability of the object to be identified:
P = (0.85/1.8) * 0.8 + (0.95/1.8) * 0.7 ≈ 0.75
Referring to FIG. 4, which is a flowchart of the preferred embodiment of the fraud identification method combining audio analysis and video analysis of the present application. Using the architecture shown in FIG. 1 or FIG. 2, the electronic device 1 is started, and the processor 12 executes the fraud identification program 10 stored in the memory 11 to implement the following steps:
Step S10: the obtaining module 110 collects audio-video samples of people, cuts the samples in units of emotion to obtain audio-video clips, and assigns a fraud label to each clip. The samples may be acquired by the camera device 3 of FIG. 1 or the camera device 30 of FIG. 2, or may be audio and video with obviously fraudulent behavior and normal audio and video without fraudulent behavior selected from online sources or an audio-video database.
Step S20: the obtaining module 110 decodes and preprocesses each audio-video clip to obtain the audio segment and video segment of each clip. The fraud label of each audio-video clip is retained as the fraud label of the corresponding audio segment and video segment.
Step S30: the extraction module 120 extracts speech features and expression features from each audio segment and video segment, respectively. For the specific extraction methods of the speech features and expression features, refer to the detailed description of the extraction module 120 above.
Step S40: a first support vector machine is trained on the speech features and fraud labels of the audio segments to obtain the speech analysis model, and a second support vector machine is trained on the expression features and fraud labels of the video segments to obtain the expression analysis model. The training module 130 uses the speech features and fraud label of each audio segment as sample data to train the first support vector machine, yielding the speech analysis model, and uses the expression features and fraud label of each video segment as sample data to train the second support vector machine, yielding the expression analysis model.
Step S50: the obtaining module 110 collects the audio and video data of the object to be identified and decodes and preprocesses the data to obtain the audio data and the video data of the object. The audio and video data are acquired in real time by the camera device 3 of FIG. 1 or the camera device 30 of FIG. 2.
Step S60: the extraction module 120 extracts the speech features of the audio data and the expression features of the video data of the object to be identified. For the specific extraction methods, refer to the detailed description of the extraction module 120 above.
Step S70: the speech features of the audio data and the expression features of the video data of the object to be identified are input into the speech analysis model and the expression analysis model, respectively, to obtain the audio fraud probability and the video fraud probability of the object. Using the model application module 140, the speech features of the audio data extracted by the extraction module 120 are input into the speech analysis model, which outputs the audio fraud probability P1 of the object to be identified; the expression features of the video data extracted by the extraction module 120 are input into the expression analysis model, which outputs the video fraud probability P2 of the object to be identified.
Step S80: P1 and P2 are weighted according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified. For the method of determining the weights of the speech analysis model and the expression analysis model and the specific procedure of the weighted calculation of P1 and P2, refer to the detailed description of the weighting calculation module 150 above.
In addition, an embodiment of the present application further provides a computer-readable storage medium, which may be any one or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium includes audio-video samples and the fraud identification program 10, and when the fraud identification program 10 is executed by a processor, the following operations are implemented:
Sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
Feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
Model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
Model application step: collecting audio and video data of the object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the fraud identification method combining audio analysis and video analysis and of the electronic device 1 described above, and is not repeated here.
It should be noted that, in this document, the terms "comprise" and "include", and any variants thereof, are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes that element.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments. From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit the scope of the patent of the present application. Any equivalent structural or flow transformation made using the contents of the specification and drawings of the present application, and any direct or indirect application in other related technical fields, are likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A fraud identification method combining audio analysis and video analysis, applied to an electronic device, wherein the method comprises:
    a sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
    a feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
    a model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
    a model application step: collecting audio and video data of an object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting an audio fraud probability P1 and a video fraud probability P2 of the object to be identified; and
    a weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain a fraud identification result of the object to be identified.
  2. The fraud identification method according to claim 1, wherein extracting the speech features in the feature extraction step comprises:
    a first feature extraction step: extracting low-order audio features from each audio segment;
    a second feature extraction step: extracting dynamic regression coefficients from the low-order audio features to obtain dynamic audio features of each audio segment;
    a third feature extraction step: extracting high-order audio features of each audio segment from the low-order audio features and the dynamic audio features using statistical functions; and
    a selection step: selecting a subset of the high-order audio features of each audio segment using a feature selection algorithm, the high-order audio feature subset serving as the speech features of each audio segment.
  3. The fraud identification method according to claim 2, wherein the low-order audio features include Mel-frequency cepstral coefficients, pitch, and zero-crossing rate.
  4. The fraud identification method according to claim 1, wherein extracting the expression features in the feature extraction step comprises:
    a low-order feature extraction step: extracting low-order motion features from each video segment;
    a high-order feature construction step: counting the number of occurrences and the duration of each low-order motion feature in each video segment, and constructing high-order motion features of each video segment from the statistics; and
    a selection step: selecting a subset of the high-order motion features of each video segment using a feature selection algorithm, the high-order motion feature subset serving as the expression features of each video segment.
  5. The fraud identification method according to claim 2, wherein extracting the expression features in the feature extraction step comprises:
    a low-order feature extraction step: extracting low-order motion features from each video segment;
    a high-order feature construction step: counting the number of occurrences and the duration of each low-order motion feature in each video segment, and constructing high-order motion features of each video segment from the statistics; and
    a selection step: selecting a subset of the high-order motion features of each video segment using a feature selection algorithm, the high-order motion feature subset serving as the expression features of each video segment.
  6. The fraud identification method according to claim 4 or 5, wherein the low-order motion features include head orientation, eye-gaze direction, and facial action units (AUs).
  7. The fraud identification method according to any one of claims 1-5, wherein the model application step further comprises the following steps:
    decoding and preprocessing the audio and video data of the object to be identified to obtain audio data and video data of the object to be identified; and
    extracting speech features from the audio data of the object to be identified, and extracting expression features from the video data of the object to be identified.
  8. An electronic device, comprising a memory and a processor, wherein the memory includes a fraud identification program which, when executed by the processor, implements the following steps:
    a sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
    a feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
    a model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
    a model application step: collecting audio and video data of an object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting an audio fraud probability P1 and a video fraud probability P2 of the object to be identified; and
    a weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain a fraud identification result of the object to be identified.
  9. The electronic device according to claim 8, wherein extracting the speech features in the feature extraction step comprises:
    a first feature extraction step: extracting low-order audio features from each audio segment;
    a second feature extraction step: extracting dynamic regression coefficients from the low-order audio features to obtain dynamic audio features of each audio segment;
    a third feature extraction step: extracting high-order audio features of each audio segment from the low-order audio features and the dynamic audio features using statistical functions; and
    a selection step: selecting a subset of the high-order audio features of each audio segment using a feature selection algorithm, the high-order audio feature subset serving as the speech features of each audio segment.
  10. The electronic device according to claim 9, wherein the low-order audio features include Mel-frequency cepstral coefficients, pitch, and zero-crossing rate.
  11. The electronic device according to claim 8, wherein extracting the expression features in the feature extraction step comprises:
    a low-order feature extraction step: extracting low-order motion features from each video segment;
    a high-order feature construction step: counting the number of occurrences and the duration of each low-order motion feature in each video segment, and constructing high-order motion features of each video segment from the statistics; and
    a selection step: selecting a subset of the high-order motion features of each video segment using a feature selection algorithm, the high-order motion feature subset serving as the expression features of each video segment.
  12. The electronic device according to claim 9, wherein extracting the expression features in the feature extraction step comprises:
    a low-order feature extraction step: extracting low-order motion features from each video segment;
    a high-order feature construction step: counting the number of occurrences and the duration of each low-order motion feature in each video segment, and constructing high-order motion features of each video segment from the statistics; and
    a selection step: selecting a subset of the high-order motion features of each video segment using a feature selection algorithm, the high-order motion feature subset serving as the expression features of each video segment.
  13. The electronic device according to claim 11 or 12, wherein the low-order motion features include head orientation, eye-gaze direction, and facial action units (AUs).
  14. The electronic device according to any one of claims 8-12, wherein the model application step further comprises the following steps:
    decoding and preprocessing the audio and video data of the object to be identified to obtain audio data and video data of the object to be identified; and
    extracting speech features from the audio data of the object to be identified, and extracting expression features from the video data of the object to be identified.
  15. A computer-readable storage medium, wherein the computer-readable storage medium includes a fraud identification program which, when executed by a processor, implements the following steps:
    a sample preparation step: collecting audio-video samples of people, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each clip to obtain an audio segment and a video segment of each clip;
    a feature extraction step: extracting speech features from each audio segment and expression features from each video segment;
    a model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model;
    a model application step: collecting audio and video data of an object to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting an audio fraud probability P1 and a video fraud probability P2 of the object to be identified; and
    a weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain a fraud identification result of the object to be identified.
  16. The computer-readable storage medium according to claim 15, wherein extracting the speech features in the feature extraction step comprises:
    a first feature extraction step: extracting low-order audio features from each audio segment;
    a second feature extraction step: extracting dynamic regression coefficients from the low-order audio features to obtain dynamic audio features of each audio segment;
    a third feature extraction step: extracting high-order audio features of each audio segment from the low-order audio features and the dynamic audio features using statistical functions; and
    a selection step: selecting a subset of the high-order audio features of each audio segment using a feature selection algorithm, the high-order audio feature subset serving as the speech features of each audio segment.
  17. The computer-readable storage medium according to claim 16, wherein the low-order audio features include Mel-frequency cepstral coefficients, pitch, and zero-crossing rate.
  18. The computer-readable storage medium according to claim 15 or 16, wherein extracting the expression features in the feature extraction step comprises:
    a low-order feature extraction step: extracting low-order motion features from each video segment;
    a high-order feature construction step: counting the number of occurrences and the duration of each low-order motion feature in each video segment, and constructing high-order motion features of each video segment from the statistics; and
    a selection step: selecting a subset of the high-order motion features of each video segment using a feature selection algorithm, the high-order motion feature subset serving as the expression features of each video segment.
  19. The computer-readable storage medium according to claim 18, wherein the low-order motion features include head orientation, eye-gaze direction, and facial action units (AUs).
  20. The computer-readable storage medium according to claim 19, wherein the model application step further comprises the following steps:
    decoding and preprocessing the audio and video data of the object to be identified to obtain audio data and video data of the object to be identified; and
    extracting speech features from the audio data of the object to be identified, and extracting expression features from the video data of the object to be identified.
PCT/CN2018/077345 2017-12-01 2018-02-27 Fraud identification method and apparatus combining audio analysis and video analysis, and storage medium WO2019104890A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711252009.1A CN108053838B (zh) 2017-12-01 2017-12-01 Fraud identification method and apparatus combining audio analysis and video analysis, and storage medium
CN201711252009.1 2017-12-01

Publications (1)

Publication Number Publication Date
WO2019104890A1 (zh) 2019-06-06

Family

ID=62121930

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077345 WO2019104890A1 (zh) 2017-12-01 2018-02-27 Fraud identification method and apparatus combining audio analysis and video analysis, and storage medium

Country Status (2)

Country Link
CN (1) CN108053838B (zh)
WO (1) WO2019104890A1 (zh)


Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389489B (zh) * 2018-09-25 2023-04-18 平安科技(深圳)有限公司 欺诈行为的识别方法、计算机可读存储介质及终端设备
CN109376603A (zh) * 2018-09-25 2019-02-22 北京周同科技有限公司 一种视频识别方法、装置、计算机设备及存储介质
CN109522799A (zh) * 2018-10-16 2019-03-26 深圳壹账通智能科技有限公司 信息提示方法、装置、计算机设备和存储介质
CN109472487A (zh) * 2018-11-02 2019-03-15 深圳壹账通智能科技有限公司 视频质检方法、装置、计算机设备及存储介质
CN109493882A (zh) * 2018-11-04 2019-03-19 国家计算机网络与信息安全管理中心 一种诈骗电话语音自动标注系统及方法
CN109831677B (zh) * 2018-12-14 2022-04-01 平安科技(深圳)有限公司 视频脱敏方法、装置、计算机设备和存储介质
CN109858330A (zh) * 2018-12-15 2019-06-07 深圳壹账通智能科技有限公司 基于视频的表情分析方法、装置、电子设备及存储介质
CN109729383B (zh) * 2019-01-04 2021-11-02 深圳壹账通智能科技有限公司 双录视频质量检测方法、装置、计算机设备和存储介质
CN109800720B (zh) * 2019-01-23 2023-12-22 平安科技(深圳)有限公司 情绪识别模型训练方法、情绪识别方法、装置、设备及存储介质
CN111144197A (zh) * 2019-11-08 2020-05-12 宇龙计算机通信科技(深圳)有限公司 人性识别方法、装置、存储介质和电子设备
CN111339940B (zh) * 2020-02-26 2023-07-21 中国工商银行股份有限公司 视频风险识别方法及装置
SG10202006357UA (en) 2020-07-01 2020-09-29 Alipay Labs Singapore Pte Ltd A Document Identification Method and System
CN112202720B (zh) * 2020-09-04 2023-05-02 中移雄安信息通信科技有限公司 音视频识别方法、装置、电子设备及计算机存储介质
CN112040488A (zh) * 2020-09-10 2020-12-04 安徽师范大学 基于mac地址和信道状态双层指纹的非法设备识别方法


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103023900A (zh) * 2012-12-06 2013-04-03 北京百度网讯科技有限公司 基于云服务器端的身份认证方法、云服务系统和云服务器
CN103226948A (zh) * 2013-04-22 2013-07-31 山东师范大学 一种基于声学事件的音频场景识别方法
CN103971700A (zh) * 2013-08-01 2014-08-06 哈尔滨理工大学 语音监控方法及装置
US20160050197A1 (en) * 2014-08-14 2016-02-18 Bank Of America Corporation Audio authentication system
CN105100363A (zh) * 2015-06-29 2015-11-25 小米科技有限责任公司 信息处理方法、装置及终端
CN105718874A (zh) * 2016-01-18 2016-06-29 北京天诚盛业科技有限公司 活体检测及认证的方法和装置
CN106157135A (zh) * 2016-07-14 2016-11-23 微额速达(上海)金融信息服务有限公司 基于声纹识别性别年龄的防欺诈系统及方法

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460907A (zh) * 2020-03-05 2020-07-28 浙江大华技术股份有限公司 一种恶意行为识别方法、系统及存储介质
CN111460907B (zh) * 2020-03-05 2023-06-20 浙江大华技术股份有限公司 一种恶意行为识别方法、系统及存储介质
CN111444379A (zh) * 2020-03-30 2020-07-24 腾讯科技(深圳)有限公司 音频的特征向量生成方法及音频片段表示模型的训练方法
CN111444379B (zh) * 2020-03-30 2023-08-08 腾讯科技(深圳)有限公司 音频的特征向量生成方法及音频片段表示模型的训练方法
CN112133327A (zh) * 2020-09-17 2020-12-25 腾讯音乐娱乐科技(深圳)有限公司 一种音频样本的提取方法、设备、终端及存储介质
CN112133327B (zh) * 2020-09-17 2024-02-13 腾讯音乐娱乐科技(深圳)有限公司 一种音频样本的提取方法、设备、终端及存储介质
CN112331230A (zh) * 2020-11-17 2021-02-05 平安科技(深圳)有限公司 一种欺诈行为识别方法、装置、计算机设备及存储介质
CN112562687A (zh) * 2020-12-11 2021-03-26 天津讯飞极智科技有限公司 音视频处理方法、装置、录音笔和存储介质
CN112562687B (zh) * 2020-12-11 2023-08-04 天津讯飞极智科技有限公司 音视频处理方法、装置、录音笔和存储介质
CN113314103A (zh) * 2021-05-31 2021-08-27 中国工商银行股份有限公司 基于实时语音情感分析的非法信息识别方法及装置
CN113409822A (zh) * 2021-05-31 2021-09-17 青岛海尔科技有限公司 对象状态的确定方法、装置、存储介质及电子装置
CN113314103B (zh) * 2021-05-31 2023-03-03 中国工商银行股份有限公司 基于实时语音情感分析的非法信息识别方法及装置

Also Published As

Publication number Publication date
CN108053838A (zh) 2018-05-18
CN108053838B (zh) 2019-10-11

Similar Documents

Publication Publication Date Title
WO2019104890A1 (zh) 结合音频分析和视频分析的欺诈识别方法、装置及存储介质
WO2019085329A1 (zh) 基于循环神经网络的人物性格分析方法、装置及存储介质
WO2019085331A1 (zh) 欺诈可能性分析方法、装置及存储介质
WO2019085330A1 (zh) 人物性格分析方法、装置及存储介质
WO2019119505A1 (zh) 人脸识别的方法和装置、计算机装置及存储介质
CN104598644B (zh) 喜好标签挖掘方法和装置
CN110619568A (zh) 风险评估报告的生成方法、装置、设备及存储介质
CN106683688B (zh) 一种情绪检测方法及装置
US20210398416A1 (en) Systems and methods for a hand hygiene compliance checking system with explainable feedback
US20160232403A1 (en) Arabic sign language recognition using multi-sensor data fusion
CN110222331B (zh) 谎言识别方法及装置、存储介质、计算机设备
WO2019109530A1 (zh) 情绪识别方法、装置及存储介质
US20230410222A1 (en) Information processing apparatus, control method, and program
CN112768070A (zh) 一种基于对话交流的精神健康评测方法和系统
CN111738199B (zh) 图像信息验证方法、装置、计算装置和介质
CN113243918A (zh) 基于多模态隐匿信息测试的风险检测方法及装置
CN110717407A (zh) 基于唇语密码的人脸识别方法、装置及存储介质
CN106980658A (zh) 视频标注方法及装置
CN110393539B (zh) 心理异常检测方法、装置、存储介质及电子设备
CN112397052A (zh) Vad断句测试方法、装置、计算机设备及存储介质
CN111326142A (zh) 基于语音转文本的文本信息提取方法、系统和电子设备
CN116130088A (zh) 多模态面诊问诊方法、装置及相关设备
CN112911334A (zh) 基于音视频数据的情绪识别方法、装置、设备及存储介质
CN113921098A (zh) 一种医疗服务评价方法和系统
CN113808619B (zh) 一种语音情绪识别方法、装置及电子设备

Legal Events

121: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18884240; Country of ref document: EP; Kind code of ref document: A1)

NENP: non-entry into the national phase (Ref country code: DE)

122: PCT application non-entry in European phase (Ref document number: 18884240; Country of ref document: EP; Kind code of ref document: A1)

32PN: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.10.2020))