WO2019104890A1 - Fraud identification method, apparatus and storage medium combining audio analysis and video analysis - Google Patents
- Publication number
- WO2019104890A1 (PCT/CN2018/077345)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
Definitions
- the present application relates to the field of computer information processing technologies, and in particular, to a fraud detection method, apparatus, and computer readable storage medium incorporating audio analysis and video analysis.
- Fraud identification is generally performed through face-to-face review. It relies heavily on the experience and judgment of analysts, consumes considerable time and manpower, and the analysis results are often inaccurate and subjective. Alternatively, professional polygraph equipment determines whether a test subject is suspected of fraud by detecting a series of indicators such as breathing, pulse, blood pressure, and skin resistance. However, such equipment is usually expensive and easily infringes on the rights of the person being tested.
- The present application provides a fraud identification method, apparatus and computer readable storage medium combining audio analysis and video analysis, which objectively and accurately determine whether an object to be identified is suspected of fraud by analyzing the audio and video data of that object.
- the present application provides a fraud identification method combining audio analysis and video analysis, which is applied to an electronic device, and the method includes:
- Sample preparation step: collecting character audio-video samples, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each audio-video clip to obtain its audio clip and video clip;
- Feature extraction step: extracting speech features from each audio clip, and extracting expression features from each video clip;
- Model training step: using the speech features and fraud labels of the audio clips as sample data, training a first support vector machine to obtain a speech analysis model; using the expression features and fraud labels of the video clips as sample data, training a second support vector machine to obtain an expression analysis model;
- Model application step: collecting audio-video data of the object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and video fraud probability P2 of the object to be identified; and
- Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
- the application also provides an electronic device comprising a memory and a processor, the memory including a fraud recognition program.
- the electronic device is directly or indirectly connected to the imaging device, and the imaging device transmits the collected audio and video data to the electronic device.
- When the processor of the electronic device executes the fraud recognition program in the memory, the following steps are implemented:
- Sample preparation step: collecting character audio-video samples, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each audio-video clip to obtain its audio clip and video clip;
- Feature extraction step: extracting speech features from each audio clip, and extracting expression features from each video clip;
- Model training step: using the speech features and fraud labels of the audio clips as sample data, training a first support vector machine to obtain a speech analysis model; using the expression features and fraud labels of the video clips as sample data, training a second support vector machine to obtain an expression analysis model;
- Model application step: collecting audio-video data of the object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and video fraud probability P2 of the object to be identified; and
- Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
- The present application further provides a computer readable storage medium including a fraud recognition program which, when executed by a processor, implements any step of the fraud identification method combining audio analysis and video analysis described above.
- The fraud detection method, device and computer readable storage medium combining audio analysis and video analysis provided by the present application extract the speech features of the audio clips and the expression features of the video clips of the audio-video samples and, together with the corresponding fraud labels, train support vector machines to obtain a speech analysis model and an expression analysis model.
- The trained models are then applied to real-time fraud identification: the audio-video data of the object to be identified is collected, its speech features and expression features are extracted and input into the trained speech analysis model and expression analysis model, which output the audio fraud probability P1 and video fraud probability P2 of the object to be identified; P1 and P2 are then weighted to obtain the fraud recognition result of the object to be identified.
- FIG. 1 is an application environment diagram of a first preferred embodiment of an electronic device of the present application.
- FIG. 2 is an application environment diagram of a second preferred embodiment of the electronic device of the present application.
- FIG. 3 is a block diagram of the program modules of the fraud recognition program in FIGS. 1 and 2.
- FIG. 4 is a flow chart of a preferred embodiment of a fraud identification method in conjunction with audio analysis and video analysis.
- The camera device 3 is connected to the electronic device 1 via the network 2; the camera device 3 collects audio-video data of a person and transmits it to the electronic device 1 through the network 2.
- The electronic device 1 analyzes the audio-video data using the fraud identification program 10 provided by the present application and obtains the fraud recognition result for the person.
- the electronic device 1 may be a terminal device having a storage and computing function such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, or the like.
- the electronic device 1 includes a memory 11, a processor 12, a network interface 13, and a communication bus 14.
- the camera device 3 is installed in a specific place, such as an office space, a monitoring area, and the like, for collecting audio and video data of a person, and then transmitting the audio and video data to the memory 11 through the network 2.
- the network interface 13 may include a standard wired interface, a wireless interface (such as a WI-FI interface).
- Communication bus 14 is used to implement connection communication between these components.
- the memory 11 includes at least one type of readable storage medium.
- the at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card type memory, or the like.
- the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
- The readable storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
- The memory 11 stores the program code of the fraud recognition program 10, the audio-video data collected by the camera device 3, the data produced while the processor 12 executes the program code of the fraud recognition program 10, the final output data, and so on.
- Processor 12 may be a Central Processing Unit (CPU), microprocessor or other data processing chip in some embodiments.
- Figure 1 shows only the electronic device 1 with components 11-14, but it should be understood that not all illustrated components may be implemented, and more or fewer components may be implemented instead.
- the electronic device 1 may further include a user interface
- The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or a device with voice recognition capability, and a voice output device such as a speaker or headphones.
- the user interface may also include a standard wired interface and a wireless interface.
- the electronic device 1 may further include a display.
- The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like in some embodiments.
- the display is used to display information processed by the electronic device 1 and a visualized user interface.
- the electronic device 1 further comprises a touch sensor.
- the area provided by the touch sensor for the user to perform a touch operation is referred to as a touch area.
- the touch sensor described herein may be a resistive touch sensor, a capacitive touch sensor, or the like.
- the touch sensor includes not only a contact type touch sensor but also a proximity type touch sensor or the like.
- the touch sensor may be a single sensor or a plurality of sensors arranged, for example, in an array.
- a user such as a counselor, a credit reviewer, etc., can initiate the fraud identification program 10 by touch.
- the electronic device 1 may further include a radio frequency (RF) circuit, a sensor, an audio circuit, and the like, and details are not described herein.
- FIG. 2 is an application environment diagram of a second preferred embodiment of the electronic device of the present application.
- The user carries out the fraud identification process through the terminal 3: the camera device 30 of the terminal 3 collects the audio-video data of the object to be identified and transmits it to the electronic device 1 through the network 2.
- The processor 12 of the electronic device 1 executes the program code of the fraud identification program 10 stored in the memory 11, analyzes the audio data and video data of the audio-video data, outputs the audio fraud probability P1 and video fraud probability P2 of the object to be identified, and weights P1 and P2 to obtain the fraud identification result of the object to be identified, for reference by the object to be identified or the examiner.
- the terminal 3 can be a terminal device having a storage and computing function, such as a smart phone, a tablet computer, a portable computer, and a desktop computer.
- The fraud identification program 10 of Figures 1 and 2, when executed by the processor 12, implements the following steps:
- Sample preparation step: collecting character audio-video samples, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each audio-video clip to obtain its audio clip and video clip;
- Feature extraction step: extracting speech features from each audio clip, and extracting expression features from each video clip;
- Model training step: using the speech features and fraud labels of the audio clips as sample data, training a first support vector machine to obtain a speech analysis model; using the expression features and fraud labels of the video clips as sample data, training a second support vector machine to obtain an expression analysis model;
- Model application step: collecting audio-video data of the object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and video fraud probability P2 of the object to be identified; and
- Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
- Refer to FIG. 3 for the program block diagram of the fraud identification program 10, and to FIG. 4 for the flow chart of a preferred embodiment of the fraud identification method combining audio analysis and video analysis.
- FIG. 3 is a program block diagram of the fraud recognition program 10 in FIGS. 1 and 2.
- the fraud recognition program 10 is divided into a plurality of modules that are stored in the memory 11 and executed by the processor 12 to complete the present application.
- a module as referred to in this application refers to a series of computer program instructions that are capable of performing a particular function.
- the fraud recognition program 10 can be divided into an acquisition module 110, an extraction module 120, a training module 130, a model application module 140, and a weighting calculation module 150.
- the obtaining module 110 is configured to acquire audio and video of a character and decode and preprocess the same to obtain a corresponding audio part and a video part.
- The audio and video may be collected by the camera device 3 of FIG. 1 or the camera device 30 of FIG. 2, or may be audio-video clips selected from network resources or an audio-video database as clearly containing, or clearly free of, fraudulent behavior.
- The audio-video samples used to train the support vector machines are cut in units of emotion to obtain audio-video clips, and each clip is assigned a fraud label indicating whether the person in the clip is suspected of fraud; for example, 1 indicates suspected fraud and 0 indicates no suspected fraud.
- the audio and video are decoded and preprocessed to obtain a corresponding audio portion and video portion.
- the extracting module 120 is configured to extract a voice feature of the audio portion and an expression feature of the video portion.
- the extraction module 120 extracts the voice features from each of the audio portions obtained by the acquisition module 110, and extracts the expression features from each of the video portions obtained by the acquisition module 110.
- When extracting the speech features of an audio part, the extraction module 120 first extracts low-order audio features such as Mel-frequency cepstral coefficients, pitch, and zero-crossing rate from the audio part; it then extracts dynamic regression coefficients from the low-order audio features to obtain the dynamic audio features of the audio part; next, it extracts high-order audio features from the low-order audio features and dynamic audio features using statistical functions; finally, it uses a feature screening algorithm to select a high-order audio feature subset from the high-order audio features, and this subset serves as the speech features of the audio part.
- OpenSMILE software can be used to extract low-order audio features such as the Mel frequency cepstral coefficient, pitch, and zero-crossing rate of the audio portion.
- The dynamic regression coefficients are used to indicate the importance of the low-order audio features. For example, if a certain low-order audio feature of an audio part (such as the pitch feature) is represented by a waveform file, the waveform file can be expressed by multiple linear regression, where k is the number of low-order audio features in the audio part.
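The dynamic-feature step above can be sketched with the standard regression-based delta formula widely used in speech processing; the patent does not reproduce its exact regression, so the window size `N` and the formula below are assumptions:

```python
import numpy as np

def delta_coefficients(frames, N=2):
    """Regression-based delta (dynamic) coefficients over frame-level
    low-order features, shape (num_frames, num_features).
    Illustrative stand-in for the patent's unspecified regression."""
    padded = np.pad(frames, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    deltas = np.zeros_like(frames, dtype=float)
    for t in range(frames.shape[0]):
        acc = np.zeros(frames.shape[1])
        for n in range(1, N + 1):
            # Weighted difference of the frames n steps ahead and behind.
            acc += n * (padded[t + N + n] - padded[t + N - n])
        deltas[t] = acc / denom
    return deltas
```

On a linearly increasing feature track the interior deltas equal the slope, and on a constant track they are zero, which matches the intuition that the deltas capture the feature's local trend.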
- The statistical functions include functions for extracting the maximum, minimum, kurtosis, skewness, and the like of the low-order audio features and dynamic audio features; the extraction module 120 combines and transforms the data extracted by these statistical functions to obtain the high-order audio features. The number of high-order audio features extracted from each audio part is often very large, yet usually only a small number of them have a significant impact on the fraud identification result, so a feature screening algorithm is used to reduce the number of high-order audio features and improve the speed of fraud identification.
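A minimal sketch of the statistical-function stage, assuming a functional set of maximum, minimum, mean, standard deviation, skewness, and kurtosis (the text names only some of these):

```python
import numpy as np

def high_order_features(lld):
    """Apply statistical functionals to each column of frame-level
    low-order/dynamic features (shape: num_frames x num_features),
    yielding one fixed-length high-order vector per audio segment."""
    mean = lld.mean(axis=0)
    std = lld.std(axis=0)
    centered = lld - mean
    # Guard against zero variance when normalising the higher moments.
    safe_std = np.where(std > 0, std, 1.0)
    skew = (centered ** 3).mean(axis=0) / safe_std ** 3
    kurt = (centered ** 4).mean(axis=0) / safe_std ** 4 - 3.0  # excess kurtosis
    return np.concatenate([lld.max(axis=0), lld.min(axis=0),
                           mean, std, skew, kurt])
```

Each audio segment, whatever its length in frames, thus maps to a vector of fixed size (6 functionals per feature column), which is what the support vector machine requires.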
- The feature screening algorithm may be a sequential forward selection (SFS) algorithm, a sequential backward selection (SBS) algorithm, a bidirectional search (BDS) algorithm, a filter feature selection algorithm, or another feature screening algorithm.
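Of the screening algorithms named, sequential forward selection is the simplest to illustrate: greedily grow the subset, each time adding the feature that most improves a score. The least-squares R² scoring function below is an assumption; the patent does not specify one:

```python
import numpy as np

def r2_score_fn(Xs, y):
    # Coefficient of determination of a least-squares fit on the subset.
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def sfs(X, y, score_fn, k):
    """Sequential forward selection of at most k feature columns of X."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best_j, best_score = None, -np.inf
        for j in remaining:
            s = score_fn(X[:, selected + [j]], y)
            if s > best_score:
                best_j, best_score = j, s
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Backward selection (SBS) is the mirror image, starting from the full set and dropping the least useful feature each round; bidirectional search (BDS) interleaves the two.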
- When extracting the expression features of a video part, the extraction module 120 first extracts low-order motion features such as head orientation, eye-gaze direction, and facial action units (AUs) from the video part; it then counts the number of occurrences and the duration of each low-order motion feature in the video part, and constructs the high-order motion features of the video part from these statistics; finally, it uses a feature screening algorithm to select a high-order motion feature subset from the high-order motion features, and this subset serves as the expression features of the video part.
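The count-and-duration construction of high-order motion features might be sketched as follows; the per-frame event representation and the AU label names are illustrative assumptions, not the patent's data format:

```python
from collections import defaultdict

def video_high_order_features(frame_events, fps=25.0, vocabulary=None):
    """Build high-order motion features from per-frame low-order events
    (e.g. head-orientation states or active action units): for each event
    type, count its occurrences (runs of consecutive frames) and its
    total duration in seconds within the video segment."""
    counts = defaultdict(int)
    durations = defaultdict(float)
    prev = None
    for ev in frame_events:           # ev is the set of events in one frame
        for label in ev:
            durations[label] += 1.0 / fps
            if prev is None or label not in prev:
                counts[label] += 1    # a new run of this event starts here
        prev = ev
    vocab = vocabulary or sorted(counts)
    return [counts[v] for v in vocab] + [durations[v] for v in vocab]
```

For instance, a segment in which "AU12" is active in frames 1-2 and again in frame 4 yields two occurrences of AU12 with a total duration of three frames.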
- the training module 130 is configured to train the support vector machine to obtain a speech analysis model and an expression analysis model.
- The speech features of each audio part of the audio-video samples extracted by the extraction module 120, together with the fraud labels assigned by the acquisition module 110, are used as sample data to train the first support vector machine and obtain the speech analysis model; the expression features of each video part of the audio-video samples extracted by the extraction module 120, together with the fraud labels assigned by the acquisition module 110, are used as sample data to train the second support vector machine and obtain the expression analysis model.
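A sketch of this two-model training step using scikit-learn's `SVC` with probability calibration enabled, so each model can later output a fraud probability rather than only a class label; the patent does not name a specific SVM implementation, and the feature matrices here are random stand-ins for the real speech and expression features:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
labels = np.array([0, 1] * 20)            # fraud labels: 1 = suspected fraud
speech_X = rng.normal(size=(40, 6))       # stand-in speech feature vectors
expr_X = rng.normal(size=(40, 4))         # stand-in expression feature vectors

# First SVM -> speech analysis model; second SVM -> expression analysis model.
speech_model = SVC(probability=True, random_state=0).fit(speech_X, labels)
expr_model = SVC(probability=True, random_state=0).fit(expr_X, labels)

p1 = speech_model.predict_proba(speech_X[:1])[0, 1]  # audio fraud probability
p2 = expr_model.predict_proba(expr_X[:1])[0, 1]      # video fraud probability
```

`probability=True` fits an internal probability calibration on top of the SVM decision function, which is one common way to turn an SVM's margin into the P1/P2 probabilities the method requires.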
- the model application module 140 is configured to analyze audio and video data of the object to be identified, and obtain an audio fraud probability and a video fraud probability of the object to be identified.
- The speech features of the audio part of the audio-video data of the object to be identified, extracted by the extraction module 120, are input into the speech analysis model trained by the training module 130, which outputs the audio fraud probability P1 of the object to be identified; the expression features of the video part of the object to be identified are input into the trained expression analysis model, which outputs the video fraud probability P2 of the object to be identified.
- the weighting calculation module 150 is configured to weight the audio fraud probability P1 and the video fraud probability P2 of the object to be identified to obtain a fraud identification result of the object to be identified.
- After the training module 130 trains the support vector machines with the sample data to obtain the speech analysis model and the expression analysis model, the accuracy of the two models can be measured; from these accuracies the weights of the speech analysis model and the expression analysis model are calculated, and the final fraud probability of the object to be identified is then computed.
- The calculation of the weights of the speech analysis model and the expression analysis model can be expressed as follows:
- P (Audio) represents the accuracy of the speech analysis model
- P (Video) represents the accuracy of the expression analysis model
- W (Audio) represents the weight of the speech analysis model
- W (Video) represents the weight of the expression analysis model
- For example, suppose the audio-video data of the object to be identified is analyzed by the speech analysis model and the expression analysis model, the audio fraud probability of the object is 0.8, and the video fraud probability is 0.7; weighted fusion is then performed according to W(Audio) and W(Video) to obtain the fraud probability of the object to be identified.
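The weight formula itself is not reproduced in this text; assuming the common reconstruction in which each model's weight is its accuracy normalised by the sum of the two accuracies, the weighted fusion can be sketched as:

```python
def fuse(p1, p2, acc_audio, acc_video):
    """Weight the audio and video fraud probabilities by each model's
    accuracy. Accuracy-normalised weights are an assumed reconstruction,
    not the patent's stated formula."""
    w_audio = acc_audio / (acc_audio + acc_video)   # W(Audio)
    w_video = acc_video / (acc_audio + acc_video)   # W(Video)
    return w_audio * p1 + w_video * p2

# With the probabilities from the text (P1 = 0.8, P2 = 0.7) and two equally
# accurate models, the fused fraud probability is 0.75.
fused = fuse(0.8, 0.7, 0.9, 0.9)
```

Under this assumption the weights always sum to 1, so the fused value stays a valid probability between P1 and P2, leaning toward the more accurate model.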
- FIG. 4 it is a flowchart of a preferred embodiment of the fraud identification method in conjunction with audio analysis and video analysis.
- the electronic device 1 is activated, and the processor 12 executes the fraud recognition program 10 stored in the memory 11 to implement the following steps:
- In step S10, the acquisition module 110 collects character audio-video samples, cuts them in units of emotion to obtain audio-video clips, and assigns a fraud label to each audio-video clip.
- the audio and video samples may be acquired by the camera device 3 of FIG. 1 or the camera device 30 of FIG. 2, or may be an audio and video and fraud-free behavior selected from a network information or an audio-video database. Normal audio and video.
- Each audio-video clip is decoded and preprocessed by the acquisition module 110 to obtain the audio clip and video clip of each audio-video clip.
- The fraud label of each audio-video clip is inherited as the fraud label of its corresponding audio clip and video clip.
- In step S30, the extraction module 120 extracts the speech features from each audio clip and the expression features from each video clip.
- For details of the speech features and expression features, please refer to the detailed description of the extraction module 120 above.
- In step S40, the first support vector machine is trained with the speech features and fraud labels of the audio clips to obtain the speech analysis model, and the second support vector machine is trained with the expression features and fraud labels of the video clips to obtain the expression analysis model.
- The training module 130 uses the speech features and fraud labels of each audio clip as sample data to train the first support vector machine and obtain the speech analysis model, and uses the expression features and fraud labels of each video clip as sample data to train the second support vector machine and obtain the expression analysis model.
- step S50 the audio and video data of the object to be identified is collected by the acquiring module 110, and the audio and video data is decoded and preprocessed to obtain audio data and video data of the object to be identified.
- The audio-video data is acquired in real time by the camera device 3 of FIG. 1 or the camera device 30 of FIG. 2.
- In step S60, the extraction module 120 extracts the speech features of the audio data and the expression features of the video data of the object to be identified.
- For details of the speech features and expression features, please refer to the detailed introduction of the extraction module 120 above.
- Step S70 input the voice feature of the audio data of the object to be identified and the expression feature of the video data into the voice analysis model and the expression analysis model, respectively, to obtain an audio fraud probability and a video fraud probability of the object to be identified.
- The speech features of the audio data of the object to be identified, extracted by the extraction module 120, are input into the speech analysis model, which outputs the audio fraud probability P1 of the object to be identified; the expression features of the video data of the object to be identified, also extracted by the extraction module 120, are input into the expression analysis model, which outputs the video fraud probability P2 of the object to be identified.
- Step S80 weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain a fraud identification result of the object to be identified.
- For the method of determining the weights of the speech analysis model and the expression analysis model, and the specific process of the weighted calculation of P1 and P2, please refer to the detailed description of the weighting calculation module 150 above.
- The embodiment of the present application further provides a computer readable storage medium, which may be a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or the like, or any combination of one or more of the foregoing.
- the computer readable storage medium includes an audio and video sample and a fraud recognition program 10, and when the fraud recognition program 10 is executed by the processor, the following operations are performed:
- Sample preparation step: collecting character audio-video samples, cutting the samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each audio-video clip to obtain its audio clip and video clip;
- Feature extraction step: extracting speech features from each audio clip, and extracting expression features from each video clip;
- Model training step: using the speech features and fraud labels of the audio clips as sample data, training a first support vector machine to obtain a speech analysis model; using the expression features and fraud labels of the video clips as sample data, training a second support vector machine to obtain an expression analysis model;
- Model application step: collecting audio-video data of the object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and video fraud probability P2 of the object to be identified; and
- Weighted calculation step: weighting P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud identification result of the object to be identified.
- The technical solution of the present application may be embodied in a storage medium such as a disk including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present application.
Abstract
Description
Claims (20)
- A fraud identification method combining audio analysis and video analysis, applied to an electronic device, characterized in that the method comprises: a sample preparation step: collecting audio-video samples of persons, cutting the audio-video samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each audio-video clip to obtain the audio segment and the video segment of each clip; a feature extraction step: extracting speech features from each audio segment and expression features from each video segment; a model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model; a model application step: collecting audio-video data of an object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting an audio fraud probability P1 and a video fraud probability P2 for the object to be identified; and a weighted calculation step: performing a weighted calculation on P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain a fraud identification result for the object to be identified.
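The two-model pipeline of claim 1 can be sketched with scikit-learn. The feature matrices below are random stand-ins for the speech and expression features, and the 0.6/0.4 fusion weights are assumptions for illustration; nothing here is the application's own implementation:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for per-clip speech features, expression features, fraud labels.
X_audio = rng.normal(size=(40, 8))
X_video = rng.normal(size=(40, 6))
y = rng.integers(0, 2, size=40)

# First and second support vector machines, with probability outputs enabled.
speech_model = SVC(probability=True, random_state=0).fit(X_audio, y)
expression_model = SVC(probability=True, random_state=0).fit(X_video, y)

# Model application: fraud probabilities P1 and P2 for one object.
p1 = speech_model.predict_proba(X_audio[:1])[0, 1]
p2 = expression_model.predict_proba(X_video[:1])[0, 1]

# Weighted calculation (weights assumed, not specified in the application).
result = 0.6 * p1 + 0.4 * p2
print(0.0 <= result <= 1.0)  # True
```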
- The fraud identification method according to claim 1, characterized in that extracting speech features in the feature extraction step comprises: a first feature extraction step: extracting low-order audio features from each audio segment; a second feature extraction step: extracting dynamic regression coefficients from the low-order audio features to obtain the dynamic audio features of each audio segment; a third feature extraction step: extracting the high-order audio features of each audio segment from the low-order audio features and the dynamic audio features using statistical functions; and a screening step: screening out a subset of the high-order audio features of each audio segment using a feature screening algorithm, and taking the high-order audio feature subset as the speech features of each audio segment.
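The claim leaves the feature screening algorithm unspecified. Univariate selection via scikit-learn's SelectKBest is one common choice (an assumption here, not something the application names); the random feature matrix and k=5 are stand-ins:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(1)
X_high_order = rng.normal(size=(50, 20))   # stand-in high-order audio features
labels = rng.integers(0, 2, size=50)       # stand-in fraud labels

# Keep the 5 highest-scoring features as the high-order feature subset.
selector = SelectKBest(f_classif, k=5).fit(X_high_order, labels)
X_subset = selector.transform(X_high_order)
print(X_subset.shape)  # (50, 5)
```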
- The fraud identification method according to claim 2, characterized in that the low-order audio features include Mel-frequency cepstral coefficients, pitch, and zero-crossing rate.
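Claims 2 and 3 describe a chain from low-order features through dynamic (delta) features to high-order statistical features. A numpy sketch using only the zero-crossing rate as the low-order feature (MFCCs and pitch would be added analogously with an audio library); the 256-sample frame length and the gradient-based delta are assumptions:

```python
import numpy as np

def zero_crossing_rate(signal, frame_len=256):
    """Low-order feature: per-frame zero-crossing rate of an audio signal."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def delta(track):
    """Dynamic feature: first-order regression (delta) coefficients."""
    return np.gradient(track)

def high_order(track):
    """High-order features: statistical functionals over a feature track."""
    return np.array([track.mean(), track.std(), track.min(), track.max()])

# One synthetic audio segment: a 440 Hz sine wave as a stand-in.
t = np.linspace(0, 1, 8000)
segment = np.sin(2 * np.pi * 440 * t)

zcr = zero_crossing_rate(segment)                    # low-order
zcr_delta = delta(zcr)                               # dynamic
features = np.concatenate([high_order(zcr), high_order(zcr_delta)])
print(features.shape)  # (8,)
```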
- The fraud identification method according to claim 1, characterized in that extracting expression features in the feature extraction step comprises: a low-order feature extraction step: extracting low-order action features from each video segment; a high-order feature construction step: counting the number of occurrences and the duration of each low-order action feature in each video segment, and constructing the high-order action features of each video segment from the statistics; and a screening step: screening out a subset of the high-order action features of each video segment using a feature screening algorithm, and taking the high-order action feature subset as the expression features of each video segment.
- The fraud identification method according to claim 2, characterized in that extracting expression features in the feature extraction step comprises: a low-order feature extraction step: extracting low-order action features from each video segment; a high-order feature construction step: counting the number of occurrences and the duration of each low-order action feature in each video segment, and constructing the high-order action features of each video segment from the statistics; and a screening step: screening out a subset of the high-order action features of each video segment using a feature screening algorithm, and taking the high-order action feature subset as the expression features of each video segment.
- The fraud identification method according to claim 4 or 5, characterized in that the low-order action features include head orientation, eye-gaze orientation, and facial action units (AUs).
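The occurrence counts and durations in claims 4-6 can be computed from a per-frame activation track. The boolean AU track below is a made-up stand-in for the output of an AU detector, and the 25 fps frame rate is an assumed value:

```python
def au_count_and_duration(active, fps=25.0):
    """High-order action features from a per-frame AU activation track:
    the number of activation episodes and their total duration in seconds.
    `active` holds one boolean per video frame; 25 fps is an assumption."""
    episodes = 0
    active_frames = 0
    prev = False
    for a in active:
        if a and not prev:
            episodes += 1          # a new activation episode begins
        if a:
            active_frames += 1
        prev = a
    return episodes, active_frames / fps


# Hypothetical AU-12 (lip-corner puller) track over 10 frames.
track = [False, True, True, False, True, True, True, False, False, True]
print(au_count_and_duration(track))  # (3, 0.24)
```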
- The fraud identification method according to any one of claims 1-5, characterized in that the model application step further comprises the following steps: decoding and preprocessing the audio-video data of the object to be identified to obtain the audio data and the video data of the object; and extracting speech features from the audio data of the object and expression features from the video data of the object.
- An electronic device, comprising a memory and a processor, characterized in that the memory stores a fraud identification program which, when executed by the processor, implements the following steps: a sample preparation step: collecting audio-video samples of persons, cutting the audio-video samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each audio-video clip to obtain the audio segment and the video segment of each clip; a feature extraction step: extracting speech features from each audio segment and expression features from each video segment; a model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model; a model application step: collecting audio-video data of an object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting an audio fraud probability P1 and a video fraud probability P2 for the object to be identified; and a weighted calculation step: performing a weighted calculation on P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain a fraud identification result for the object to be identified.
- The electronic device according to claim 8, characterized in that extracting speech features in the feature extraction step comprises: a first feature extraction step: extracting low-order audio features from each audio segment; a second feature extraction step: extracting dynamic regression coefficients from the low-order audio features to obtain the dynamic audio features of each audio segment; a third feature extraction step: extracting the high-order audio features of each audio segment from the low-order audio features and the dynamic audio features using statistical functions; and a screening step: screening out a subset of the high-order audio features of each audio segment using a feature screening algorithm, and taking the high-order audio feature subset as the speech features of each audio segment.
- The electronic device according to claim 9, characterized in that the low-order audio features include Mel-frequency cepstral coefficients, pitch, and zero-crossing rate.
- The electronic device according to claim 8, characterized in that extracting expression features in the feature extraction step comprises: a low-order feature extraction step: extracting low-order action features from each video segment; a high-order feature construction step: counting the number of occurrences and the duration of each low-order action feature in each video segment, and constructing the high-order action features of each video segment from the statistics; and a screening step: screening out a subset of the high-order action features of each video segment using a feature screening algorithm, and taking the high-order action feature subset as the expression features of each video segment.
- The electronic device according to claim 9, characterized in that extracting expression features in the feature extraction step comprises: a low-order feature extraction step: extracting low-order action features from each video segment; a high-order feature construction step: counting the number of occurrences and the duration of each low-order action feature in each video segment, and constructing the high-order action features of each video segment from the statistics; and a screening step: screening out a subset of the high-order action features of each video segment using a feature screening algorithm, and taking the high-order action feature subset as the expression features of each video segment.
- The electronic device according to claim 11 or 12, characterized in that the low-order action features include head orientation, eye-gaze orientation, and facial action units (AUs).
- The electronic device according to any one of claims 8-12, characterized in that the model application step further comprises the following steps: decoding and preprocessing the audio-video data of the object to be identified to obtain the audio data and the video data of the object; and extracting speech features from the audio data of the object and expression features from the video data of the object.
- A computer-readable storage medium, characterized in that the computer-readable storage medium stores a fraud identification program which, when executed by a processor, implements the following steps: a sample preparation step: collecting audio-video samples of persons, cutting the audio-video samples to obtain audio-video clips, assigning a fraud label to each audio-video clip, and decoding and preprocessing each audio-video clip to obtain the audio segment and the video segment of each clip; a feature extraction step: extracting speech features from each audio segment and expression features from each video segment; a model training step: training a first support vector machine with the speech features and fraud labels of the audio segments as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud labels of the video segments as sample data to obtain an expression analysis model; a model application step: collecting audio-video data of an object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting an audio fraud probability P1 and a video fraud probability P2 for the object to be identified; and a weighted calculation step: performing a weighted calculation on P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain a fraud identification result for the object to be identified.
- The computer-readable storage medium according to claim 15, characterized in that extracting speech features in the feature extraction step comprises: a first feature extraction step: extracting low-order audio features from each audio segment; a second feature extraction step: extracting dynamic regression coefficients from the low-order audio features to obtain the dynamic audio features of each audio segment; a third feature extraction step: extracting the high-order audio features of each audio segment from the low-order audio features and the dynamic audio features using statistical functions; and a screening step: screening out a subset of the high-order audio features of each audio segment using a feature screening algorithm, and taking the high-order audio feature subset as the speech features of each audio segment.
- The computer-readable storage medium according to claim 16, characterized in that the low-order audio features include Mel-frequency cepstral coefficients, pitch, and zero-crossing rate.
- The computer-readable storage medium according to claim 15 or 16, characterized in that extracting expression features in the feature extraction step comprises: a low-order feature extraction step: extracting low-order action features from each video segment; a high-order feature construction step: counting the number of occurrences and the duration of each low-order action feature in each video segment, and constructing the high-order action features of each video segment from the statistics; and a screening step: screening out a subset of the high-order action features of each video segment using a feature screening algorithm, and taking the high-order action feature subset as the expression features of each video segment.
- The computer-readable storage medium according to claim 18, characterized in that the low-order action features include head orientation, eye-gaze orientation, and facial action units (AUs).
- The computer-readable storage medium according to claim 19, characterized in that the model application step further comprises the following steps: decoding and preprocessing the audio-video data of the object to be identified to obtain the audio data and the video data of the object; and extracting speech features from the audio data of the object and expression features from the video data of the object.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711252009.1A CN108053838B (zh) | 2017-12-01 | 2017-12-01 | Fraud identification method, device and storage medium combining audio analysis and video analysis |
CN201711252009.1 | 2017-12-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019104890A1 true WO2019104890A1 (zh) | 2019-06-06 |
Family
ID=62121930
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/077345 WO2019104890A1 (zh) | 2018-02-27 | Fraud identification method, device and storage medium combining audio analysis and video analysis |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108053838B (zh) |
WO (1) | WO2019104890A1 (zh) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444379A (zh) * | 2020-03-30 | 2020-07-24 | Tencent Technology (Shenzhen) Co., Ltd. | Audio feature vector generation method and training method for an audio segment representation model |
CN111460907A (zh) * | 2020-03-05 | 2020-07-28 | Zhejiang Dahua Technology Co., Ltd. | Malicious behavior recognition method, system, and storage medium |
CN112133327A (zh) * | 2020-09-17 | 2020-12-25 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Audio sample extraction method, device, terminal, and storage medium |
CN112331230A (zh) * | 2020-11-17 | 2021-02-05 | Ping An Technology (Shenzhen) Co., Ltd. | Fraudulent behavior recognition method, apparatus, computer device, and storage medium |
CN112562687A (zh) * | 2020-12-11 | 2021-03-26 | Tianjin iFLYTEK Jizhi Technology Co., Ltd. | Audio-video processing method, apparatus, voice recorder, and storage medium |
CN113314103A (zh) * | 2021-05-31 | 2021-08-27 | Industrial and Commercial Bank of China | Illegal information identification method and apparatus based on real-time speech emotion analysis |
CN113409822A (zh) * | 2021-05-31 | 2021-09-17 | Qingdao Haier Technology Co., Ltd. | Object state determination method and apparatus, storage medium, and electronic apparatus |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109389489B (zh) * | 2018-09-25 | 2023-04-18 | Ping An Technology (Shenzhen) Co., Ltd. | Fraudulent behavior identification method, computer-readable storage medium, and terminal device |
CN109376603A (zh) * | 2018-09-25 | 2019-02-22 | Beijing Zhoutong Technology Co., Ltd. | Video recognition method, apparatus, computer device, and storage medium |
CN109522799A (zh) * | 2018-10-16 | 2019-03-26 | Shenzhen OneConnect Smart Technology Co., Ltd. | Information prompting method, apparatus, computer device, and storage medium |
CN109472487A (zh) * | 2018-11-02 | 2019-03-15 | Shenzhen OneConnect Smart Technology Co., Ltd. | Video quality inspection method, apparatus, computer device, and storage medium |
CN109493882A (zh) * | 2018-11-04 | 2019-03-19 | National Computer Network and Information Security Management Center | Automatic labeling system and method for fraudulent phone call speech |
CN109831677B (zh) * | 2018-12-14 | 2022-04-01 | Ping An Technology (Shenzhen) Co., Ltd. | Video desensitization method, apparatus, computer device, and storage medium |
CN109858330A (zh) * | 2018-12-15 | 2019-06-07 | Shenzhen OneConnect Smart Technology Co., Ltd. | Video-based expression analysis method, apparatus, electronic device, and storage medium |
CN109729383B (zh) * | 2019-01-04 | 2021-11-02 | Shenzhen OneConnect Smart Technology Co., Ltd. | Dual-recording video quality detection method, apparatus, computer device, and storage medium |
CN109800720B (zh) * | 2019-01-23 | 2023-12-22 | Ping An Technology (Shenzhen) Co., Ltd. | Emotion recognition model training method, emotion recognition method, apparatus, device, and storage medium |
CN111144197A (zh) * | 2019-11-08 | 2020-05-12 | Yulong Computer Telecommunication Scientific (Shenzhen) Co., Ltd. | Human nature recognition method, apparatus, storage medium, and electronic device |
CN111339940B (zh) * | 2020-02-26 | 2023-07-21 | Industrial and Commercial Bank of China | Video risk identification method and apparatus |
SG10202006357UA (en) | 2020-07-01 | 2020-09-29 | Alipay Labs Singapore Pte Ltd | A Document Identification Method and System |
CN112202720B (zh) * | 2020-09-04 | 2023-05-02 | China Mobile Xiong'an Information and Communication Technology Co., Ltd. | Audio-video recognition method, apparatus, electronic device, and computer storage medium |
CN112040488A (zh) * | 2020-09-10 | 2020-12-04 | Anhui Normal University | Illegal device identification method based on two-layer fingerprints of MAC address and channel state |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103023900A (zh) * | 2012-12-06 | 2013-04-03 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Cloud-server-based identity authentication method, cloud service system, and cloud server |
CN103226948A (zh) * | 2013-04-22 | 2013-07-31 | Shandong Normal University | Audio scene recognition method based on acoustic events |
CN103971700A (zh) * | 2013-08-01 | 2014-08-06 | Harbin University of Science and Technology | Voice monitoring method and device |
CN105100363A (zh) * | 2015-06-29 | 2015-11-25 | Xiaomi Inc. | Information processing method, device, and terminal |
US20160050197A1 (en) * | 2014-08-14 | 2016-02-18 | Bank Of America Corporation | Audio authentication system |
CN105718874A (zh) * | 2016-01-18 | 2016-06-29 | Beijing Techshino Technology Co., Ltd. | Liveness detection and authentication method and device |
CN106157135A (zh) * | 2016-07-14 | 2016-11-23 | Wei'e Suda (Shanghai) Financial Information Service Co., Ltd. | Anti-fraud system and method based on voiceprint recognition of gender and age |
2017
- 2017-12-01: CN application CN201711252009.1A filed; granted as patent CN108053838B (status: Active)

2018
- 2018-02-27: WO application PCT/CN2018/077345 filed; published as WO2019104890A1 (Application Filing)
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111460907A (zh) * | 2020-03-05 | 2020-07-28 | Zhejiang Dahua Technology Co., Ltd. | Malicious behavior recognition method, system, and storage medium |
CN111460907B (zh) * | 2020-03-05 | 2023-06-20 | Zhejiang Dahua Technology Co., Ltd. | Malicious behavior recognition method, system, and storage medium |
CN111444379A (zh) * | 2020-03-30 | 2020-07-24 | Tencent Technology (Shenzhen) Co., Ltd. | Audio feature vector generation method and training method for an audio segment representation model |
CN111444379B (zh) * | 2020-03-30 | 2023-08-08 | Tencent Technology (Shenzhen) Co., Ltd. | Audio feature vector generation method and training method for an audio segment representation model |
CN112133327A (zh) * | 2020-09-17 | 2020-12-25 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Audio sample extraction method, device, terminal, and storage medium |
CN112133327B (zh) * | 2020-09-17 | 2024-02-13 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Audio sample extraction method, device, terminal, and storage medium |
CN112331230A (zh) * | 2020-11-17 | 2021-02-05 | Ping An Technology (Shenzhen) Co., Ltd. | Fraudulent behavior recognition method, apparatus, computer device, and storage medium |
CN112562687A (zh) * | 2020-12-11 | 2021-03-26 | Tianjin iFLYTEK Jizhi Technology Co., Ltd. | Audio-video processing method, apparatus, voice recorder, and storage medium |
CN112562687B (zh) * | 2020-12-11 | 2023-08-04 | Tianjin iFLYTEK Jizhi Technology Co., Ltd. | Audio-video processing method, apparatus, voice recorder, and storage medium |
CN113314103A (zh) * | 2021-05-31 | 2021-08-27 | Industrial and Commercial Bank of China | Illegal information identification method and apparatus based on real-time speech emotion analysis |
CN113409822A (zh) * | 2021-05-31 | 2021-09-17 | Qingdao Haier Technology Co., Ltd. | Object state determination method and apparatus, storage medium, and electronic apparatus |
CN113314103B (zh) * | 2021-05-31 | 2023-03-03 | Industrial and Commercial Bank of China | Illegal information identification method and apparatus based on real-time speech emotion analysis |
Also Published As
Publication number | Publication date |
---|---|
CN108053838A (zh) | 2018-05-18 |
CN108053838B (zh) | 2019-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019104890A1 (zh) | Fraud identification method, device and storage medium combining audio analysis and video analysis | |
WO2019085329A1 (zh) | Character personality analysis method, device and storage medium based on a recurrent neural network | |
WO2019085331A1 (zh) | Fraud possibility analysis method, device and storage medium | |
WO2019085330A1 (zh) | Character personality analysis method, device and storage medium | |
WO2019119505A1 (zh) | Face recognition method and apparatus, computer device and storage medium | |
CN104598644B (zh) | Preference tag mining method and device | |
CN110619568A (zh) | Method, apparatus, device and storage medium for generating a risk assessment report | |
CN106683688B (zh) | Emotion detection method and device | |
US20210398416A1 (en) | Systems and methods for a hand hygiene compliance checking system with explainable feedback | |
US20160232403A1 (en) | Arabic sign language recognition using multi-sensor data fusion | |
CN110222331B (zh) | Lie recognition method and device, storage medium, computer equipment | |
WO2019109530A1 (zh) | Emotion recognition method, device and storage medium | |
US20230410222A1 (en) | Information processing apparatus, control method, and program | |
CN112768070A (zh) | Mental health assessment method and system based on conversational interaction | |
CN111738199B (zh) | Image information verification method, apparatus, computing device and medium | |
CN113243918A (zh) | Risk detection method and device based on multimodal concealed information testing | |
CN110717407A (zh) | Face recognition method, device and storage medium based on lip-reading passwords | |
CN106980658A (zh) | Video annotation method and device | |
CN110393539B (zh) | Psychological abnormality detection method, device, storage medium and electronic equipment | |
CN112397052A (zh) | VAD sentence-segmentation test method, apparatus, computer device and storage medium | |
CN111326142A (zh) | Text information extraction method, system and electronic device based on speech-to-text conversion | |
CN116130088A (zh) | Multimodal facial-diagnosis consultation method, device and related equipment | |
CN112911334A (zh) | Emotion recognition method, device, equipment and storage medium based on audio-video data | |
CN113921098A (zh) | Medical service evaluation method and system | |
CN113808619B (zh) | Speech emotion recognition method, device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18884240; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry into the European phase | Ref document number: 18884240; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the EP bulletin as the address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.10.2020) |