CN108053838B - Fraud recognition method, device and storage medium combining audio analysis and video analysis - Google Patents


Info

Publication number
CN108053838B
Authority
CN
China
Prior art keywords
audio
video
fraud
feature
identified
Prior art date
Legal status
Active
Application number
CN201711252009.1A
Other languages
Chinese (zh)
Other versions
CN108053838A (en)
Inventor
韦峰
徐国强
Current Assignee
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201711252009.1A
Priority to PCT/CN2018/077345 (WO2019104890A1)
Publication of CN108053838A
Application granted
Publication of CN108053838B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a fraud recognition method, device and storage medium combining audio analysis and video analysis. The method includes the following steps: cutting audio-video samples into audio-video segments and assigning a fraud label to each segment; decoding and preprocessing each audio-video segment to obtain its audio fragment and video fragment; extracting speech features from each audio fragment and expression features from each video fragment; training support vector machines on the speech features of the audio fragments and on the expression features of the video fragments, each combined with the fraud labels, to obtain a speech analysis model and an expression analysis model; collecting audio-video data of a subject to be identified; extracting the speech features and expression features of that data; inputting the speech features and expression features into the speech analysis model and the expression analysis model respectively, which output the subject's fraud probabilities P1 and P2; and combining P1 and P2 by weighted calculation to obtain the fraud recognition result for the subject.

Description

Fraud recognition method, device and storage medium combining audio analysis and video analysis
Technical field
The present invention relates to the technical field of computer information processing, and in particular to a fraud recognition method and device combining audio analysis and video analysis, and a computer-readable storage medium.
Background
At present, fraud identification is generally performed through face-to-face interviews, which rely heavily on the experience and judgment of the analysts, consume a great deal of time and manpower, and often produce results that are neither accurate nor objective. Professional instruments are also used, which judge whether a tested person is suspected of fraud from a series of indicators such as respiration, pulse, blood pressure and skin resistance; however, such instruments are usually expensive and can easily infringe on the human rights of the tested person.
Summary of the invention
To overcome these shortcomings of the prior art, the present invention provides a fraud recognition method, device and computer-readable storage medium combining audio analysis and video analysis, which determine objectively and accurately whether a subject to be identified is suspected of fraud by analyzing the subject's audio-video data.
To achieve the above object, the present invention provides a fraud recognition method combining audio analysis and video analysis, applied to an electronic device, the method comprising:
Sample preparation step: collecting audio-video samples of people, cutting the samples into audio-video segments, assigning a fraud label to each audio-video segment, and decoding and preprocessing each audio-video segment to obtain its audio fragment and video fragment;
Feature extraction step: extracting speech features from each audio fragment and expression features from each video fragment;
Model training step: training a first support vector machine with the speech features and fraud label of each audio fragment as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud label of each video fragment as sample data to obtain an expression analysis model;
Model application step: collecting audio-video data of a subject to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting an audio fraud probability P1 and a video fraud probability P2 for the subject; and
Weighted calculation step: combining P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud recognition result for the subject.
Preferably, extracting the speech features in the feature extraction step includes:
First feature extraction step: extracting low-order audio features from each audio fragment;
Second feature extraction step: extracting dynamic regression coefficients from each low-order audio feature to obtain the dynamic audio features of each audio fragment;
Third feature extraction step: extracting the high-order audio features of each audio fragment from the low-order audio features and the dynamic audio features using statistical functions; and
Screening step: selecting a subset of the high-order audio features of each audio fragment with a feature selection algorithm, and using this subset as the speech features of the audio fragment.
Preferably, the low-order audio features include mel-frequency cepstral coefficients (MFCCs), pitch and zero-crossing rate.
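Of the three low-order audio features named above, the zero-crossing rate is simple enough to show without a signal-processing library. The sketch below is a minimal illustration under stated assumptions: audio is a plain list of PCM samples, and the 400-sample frame with a 160-sample hop (25 ms / 10 ms at an assumed 16 kHz sampling rate) is a common choice, not a value taken from the patent. MFCCs and pitch would normally come from a toolkit such as openSMILE, which the detailed description names.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in `frame` whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = 0
    for prev, curr in zip(frame, frame[1:]):
        # Zero is treated as non-negative, so runs of silence never count as crossings.
        if (prev >= 0) != (curr >= 0):
            crossings += 1
    return crossings / (len(frame) - 1)


def frames(samples, frame_len, hop):
    """Split a sample list into overlapping frames, dropping an incomplete tail."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def zcr_contour(samples, frame_len=400, hop=160):
    """Frame-level ZCR contour; 400/160 samples = 25 ms / 10 ms at an assumed 16 kHz."""
    return [zero_crossing_rate(f) for f in frames(samples, frame_len, hop)]
```

The result is a per-frame contour, which is the form the later dynamic and statistical steps operate on.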
Preferably, extracting the expression features in the feature extraction step includes:
Low-order feature extraction step: extracting low-order action features from each video fragment;
High-order feature construction step: counting how many times each low-order action feature occurs in each video fragment and how long it lasts, and constructing the high-order action features of each video fragment from these statistics; and
Screening step: selecting a subset of the high-order action features of each video fragment with a feature selection algorithm, and using this subset as the expression features of the video fragment.
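Both screening steps reduce a large high-order feature set to a small subset. A minimal sketch of sequential forward selection (SFS), one of the algorithms the detailed description lists, is shown below. The `score` callable is an assumption: in practice it would typically be the cross-validated accuracy of the classifier trained on the candidate subset, while here it is any function of a list of feature indices.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: int,
) -> Tuple[List[int], float]:
    """Greedy SFS: repeatedly add the feature index that most improves `score`."""
    selected: List[int] = []
    remaining = list(range(n_features))
    best = float("-inf")
    while remaining and len(selected) < max_features:
        # Evaluate every one-feature extension of the current subset.
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score <= best:
            break  # no extension improves the subset, so stop early
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best
```

With a score that rewards useful features but charges a fixed cost per feature, SFS stops as soon as adding any further feature would lower the score, which is how it keeps only the small fraction of features that matter.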
Preferably, the low-order action features include head orientation, gaze direction and facial action units (AUs).
Preferably, the model application step further includes the following steps:
decoding and preprocessing the audio-video data of the subject to be identified to obtain the subject's audio data and video data;
extracting speech features from the subject's audio data and expression features from the subject's video data.
The present invention also provides an electronic device comprising a memory and a processor, the memory storing a fraud recognition program. The electronic device is connected, directly or indirectly, to a camera device, which transmits the audio-video data it captures to the electronic device. When the processor of the electronic device executes the fraud recognition program in the memory, the following steps are implemented:
Sample preparation step: collecting audio-video samples of people, cutting the samples into audio-video segments, assigning a fraud label to each audio-video segment, and decoding and preprocessing each audio-video segment to obtain its audio fragment and video fragment;
Feature extraction step: extracting speech features from each audio fragment and expression features from each video fragment;
Model training step: training a first support vector machine with the speech features and fraud label of each audio fragment as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud label of each video fragment as sample data to obtain an expression analysis model;
Model application step: collecting audio-video data of a subject to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting an audio fraud probability P1 and a video fraud probability P2 for the subject; and
Weighted calculation step: combining P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud recognition result for the subject.
Preferably, extracting the speech features in the feature extraction step includes:
First feature extraction step: extracting low-order audio features from each audio fragment;
Second feature extraction step: extracting dynamic regression coefficients from each low-order audio feature to obtain the dynamic audio features of each audio fragment;
Third feature extraction step: extracting the high-order audio features of each audio fragment from the low-order audio features and the dynamic audio features using statistical functions; and
Screening step: selecting a subset of the high-order audio features of each audio fragment with a feature selection algorithm, and using this subset as the speech features of the audio fragment.
Preferably, the low-order audio features include mel-frequency cepstral coefficients, pitch and zero-crossing rate.
Preferably, extracting the expression features in the feature extraction step includes:
Low-order feature extraction step: extracting low-order action features from each video fragment;
High-order feature construction step: counting how many times each low-order action feature occurs in each video fragment and how long it lasts, and constructing the high-order action features of each video fragment from these statistics; and
Screening step: selecting a subset of the high-order action features of each video fragment with a feature selection algorithm, and using this subset as the expression features of the video fragment.
Preferably, the low-order action features include head orientation, gaze direction and facial AUs.
Preferably, the model application step further includes the following steps:
decoding and preprocessing the audio-video data of the subject to be identified to obtain the subject's audio data and video data;
extracting speech features from the subject's audio data and expression features from the subject's video data.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing a fraud recognition program which, when executed by a processor, implements any of the steps of the fraud recognition method combining audio analysis and video analysis described above.
In the fraud recognition method, device and computer-readable storage medium combining audio analysis and video analysis provided by the present invention, speech features are extracted from the audio fragments and expression features from the video fragments of audio-video samples and, together with the corresponding fraud labels, are used to train support vector machines, yielding a speech analysis model and an expression analysis model. The trained models are then applied to real-time fraud identification: audio-video data of a subject to be identified is collected, its speech features and expression features are extracted and input into the trained speech analysis model and expression analysis model respectively, which output the subject's audio fraud probability P1 and video fraud probability P2, and P1 and P2 are fused by weighting to obtain the subject's fraud recognition result. With the present invention, whether a person is suspected of fraud can be identified objectively and accurately.
Brief description of the drawings
Fig. 1 is an application environment diagram of a first preferred embodiment of the electronic device of the present invention.
Fig. 2 is an application environment diagram of a second preferred embodiment of the electronic device of the present invention.
Fig. 3 is a program module diagram of the fraud recognition program in Fig. 1 and Fig. 2.
Fig. 4 is a flowchart of a preferred embodiment of the fraud recognition method combining audio analysis and video analysis according to the present invention.
The realization of the object, the functional features and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
The principle and spirit of the present invention are described below with reference to several specific embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
Fig. 1 shows the application environment of the first preferred embodiment of the electronic device of the present invention. In this embodiment, a camera device 3 is connected to an electronic device 1 through a network 2. The camera device 3 captures audio-video data of a person and sends it to the electronic device 1 through the network 2, and the electronic device 1 analyzes the data with the fraud recognition program 10 provided by the present invention to obtain the person's fraud recognition result.
The electronic device 1 may be a terminal device with storage and computing capabilities, such as a server, smartphone, tablet computer, portable computer or desktop computer.
The electronic device 1 includes memory 11, processor 12, network interface 13 and communication bus 14.
The camera device 3 is installed in a particular place, such as an office or a monitored area, to capture audio-video data of people, which is then transmitted to the memory 11 through the network 2. The network interface 13 may include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The communication bus 14 provides connection and communication between these components.
The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, hard disk, multimedia card or card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk. In other embodiments, it may be an external memory of the electronic device 1, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card or flash card fitted to the electronic device 1.
In this embodiment, the memory 11 stores the program code of the fraud recognition program 10, the audio-video data captured by the camera device 3, other data used when the processor 12 executes the program code of the fraud recognition program 10, and the final output data.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip.
Fig. 1 shows only the electronic device 1 with components 11 to 14. It should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
Optionally, the electronic device 1 may further include a user interface, which may include an input unit such as a keyboard, a voice input device with speech recognition capability such as a microphone, and a voice output device such as a speaker or headset. Optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 1 may further include a display, which in some embodiments may be an LED display, a liquid crystal display, a touch liquid crystal display or an OLED (organic light-emitting diode) touch device. The display shows the information processed by the electronic device 1 and a visual user interface.
Optionally, the electronic device 1 further includes a touch sensor. The area provided by the touch sensor for the user's touch operations is called the touch area. The touch sensor may be a resistive touch sensor, a capacitive touch sensor or the like, and may be either a contact touch sensor or a proximity touch sensor. It may be a single sensor or multiple sensors arranged, for example, in an array. A user, such as a psychologist or a credit interviewer, can start the fraud recognition program 10 by touch.
The electronic device 1 may also include a radio frequency (RF) circuit, sensors, an audio circuit and so on, which are not described in detail here.
Fig. 2 shows the application environment of the second preferred embodiment of the electronic device of the present invention. A user carries out the fraud identification process through a terminal 3: the camera device 30 of the terminal 3 captures audio-video data of the subject to be identified and transmits it through the network 2 to the electronic device 1. The processor 12 of the electronic device 1 executes the program code of the fraud recognition program 10 stored in the memory 11, analyzes the audio data and video data of the audio-video data, outputs the subject's audio fraud probability P1 and video fraud probability P2, and combines P1 and P2 by weighting to obtain the subject's fraud recognition result, for reference by the subject, auditors or others.
For the components of the electronic device 1 in Fig. 2, such as the memory 11, processor 12, network interface 13 and communication bus 14 shown in the figure, and for components not shown, refer to the description of Fig. 1.
The terminal 3 may be a terminal device with storage and computing capabilities, such as a smartphone, tablet computer, portable computer or desktop computer.
When executed by the processor 12, the fraud recognition program 10 in Fig. 1 and Fig. 2 performs the following steps:
Sample preparation step: collecting audio-video samples of people, cutting the samples into audio-video segments, assigning a fraud label to each audio-video segment, and decoding and preprocessing each audio-video segment to obtain its audio fragment and video fragment;
Feature extraction step: extracting speech features from each audio fragment and expression features from each video fragment;
Model training step: training a first support vector machine with the speech features and fraud label of each audio fragment as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud label of each video fragment as sample data to obtain an expression analysis model;
Model application step: collecting audio-video data of a subject to be identified, analyzing the data with the speech analysis model and the expression analysis model, and outputting an audio fraud probability P1 and a video fraud probability P2 for the subject; and
Weighted calculation step: combining P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud recognition result for the subject.
For a detailed description of these steps, refer to the program module diagram of the fraud recognition program 10 in Fig. 3 and the flowchart of the preferred embodiment of the fraud recognition method combining audio analysis and video analysis in Fig. 4, described below.
Fig. 3 is the program module diagram of the fraud recognition program 10 in Fig. 1 and Fig. 2. In this embodiment, the fraud recognition program 10 is divided into multiple modules, which are stored in the memory 11 and executed by the processor 12 to implement the present invention. A module, as used in the present invention, is a series of computer program instruction segments capable of performing a specific function.
The fraud recognition program 10 can be divided into an acquisition module 110, an extraction module 120, a training module 130, a model application module 140 and a weighted calculation module 150.
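The five-module split can be pictured as one object whose methods mirror modules 110 to 150. The sketch below is a hypothetical structure, not the patent's code: `FraudRecognizer` and its field names are invented for illustration, and each module's real work (decoding, feature extraction, SVM training) is injected as a callable so that only the wiring between modules is shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Features = List[float]
Model = Callable[[Features], float]  # maps a feature vector to a fraud probability


@dataclass
class FraudRecognizer:
    # Acquisition module 110: decode/preprocess one recording into (audio, video).
    obtain: Callable[[Any], Tuple[Any, Any]]
    # Extraction module 120, split into its speech and expression halves.
    extract_speech: Callable[[Any], Features]
    extract_expression: Callable[[Any], Features]
    # Training module 130: fit one classifier on (features, 1/0 fraud labels).
    train: Callable[[List[Features], Sequence[int]], Model]
    # Weights used by the weighted calculation module 150.
    w_audio: float = 0.5
    w_video: float = 0.5
    speech_model: Optional[Model] = None
    expression_model: Optional[Model] = None

    def fit(self, samples: Sequence[Any], labels: Sequence[int]) -> None:
        audio_feats, video_feats = [], []
        for sample in samples:
            audio, video = self.obtain(sample)
            audio_feats.append(self.extract_speech(audio))
            video_feats.append(self.extract_expression(video))
        self.speech_model = self.train(audio_feats, labels)
        self.expression_model = self.train(video_feats, labels)

    def predict(self, av_data: Any) -> Tuple[float, float, float]:
        """Model application (module 140) plus weighted fusion (module 150)."""
        if self.speech_model is None or self.expression_model is None:
            raise RuntimeError("call fit() before predict()")
        audio, video = self.obtain(av_data)
        p1 = self.speech_model(self.extract_speech(audio))
        p2 = self.expression_model(self.extract_expression(video))
        return self.w_audio * p1 + self.w_video * p2, p1, p2
```

Injecting callables keeps the sketch independent of any particular decoder, feature toolkit or SVM library; `predict` returns the fused result together with P1 and P2.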
The acquisition module 110 obtains audio-video recordings of people and decodes and preprocesses them to obtain the corresponding audio part and video part. The recordings may be captured by the camera device 3 of Fig. 1 or the camera device 30 of Fig. 2, or they may be recordings selected from network information or an audio-video database, some clearly containing fraud and some free of fraud. The audio-video samples used to train the support vector machines are cut in units of emotion to obtain audio-video segments, and a fraud label is assigned to each segment indicating whether the person in the segment is suspected of fraud, for example 1 for fraud suspicion and 0 for no fraud suspicion. The audio-video data is then decoded and preprocessed to obtain the corresponding audio part and video part.
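A minimal sketch of the cutting and labelling done by the acquisition module 110, assuming the emotion-change times of a sample are already known (for example from an annotator; how they are found is not specified here) and that labels follow the 1/0 convention above. `Segment`, `cut_by_emotion` and `split_modalities` are hypothetical names. Following step S20, the audio and video fragments of a segment inherit its fraud label.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    start: float      # seconds
    end: float        # seconds
    fraud_label: int  # 1 = fraud suspicion, 0 = no fraud suspicion


def cut_by_emotion(duration: float, boundaries: Sequence[float],
                   labels: Sequence[int]) -> List[Segment]:
    """Cut [0, duration] at emotion-change times; one fraud label per segment."""
    cuts = [0.0] + sorted(b for b in boundaries if 0.0 < b < duration) + [duration]
    if len(labels) != len(cuts) - 1:
        raise ValueError("need exactly one fraud label per segment")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("fraud labels must be 0 or 1")
    return [Segment(s, e, lab) for s, e, lab in zip(cuts, cuts[1:], labels)]


def split_modalities(seg: Segment) -> Tuple[dict, dict]:
    """The audio and video fragments inherit the segment's fraud label (step S20)."""
    span = (seg.start, seg.end)
    return ({"modality": "audio", "span": span, "fraud_label": seg.fraud_label},
            {"modality": "video", "span": span, "fraud_label": seg.fraud_label})
```

Keeping both fragments on the same time span is what lets the two SVMs later be trained on aligned audio and video samples with identical labels.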
The extraction module 120 extracts the speech features of the audio parts and the expression features of the video parts: it extracts speech features from each audio part obtained by the acquisition module 110, and expression features from each video part obtained by the acquisition module 110.
When extracting the speech features of an audio part, the extraction module 120 first extracts low-order audio features such as mel-frequency cepstral coefficients, pitch and zero-crossing rate from the audio part. It then extracts dynamic regression coefficients from these low-order audio features to obtain the dynamic audio features of the audio part, extracts high-order audio features from the low-order and dynamic audio features using statistical functions, and finally selects a subset of the high-order audio features with a feature selection algorithm, using this subset as the speech features of the audio part.
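The patent gives no explicit formula for computing the dynamic regression coefficients in this step. Speech toolkits such as openSMILE commonly compute dynamic (delta) features with the regression formula shown in the docstring below; treating the patent's coefficients as these deltas is an assumption of this sketch. The contour is a per-frame list of one low-order feature, and `N = 2` (two frames on each side) is a typical but assumed window.

```python
def delta_coefficients(contour, N=2):
    """Regression-based delta of a frame-level contour:

        d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)

    Frames beyond either end are replaced by the nearest edge frame.
    """
    if N < 1:
        raise ValueError("N must be >= 1")
    T = len(contour)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    deltas = []
    for t in range(T):
        num = 0.0
        for n in range(1, N + 1):
            ahead = contour[min(t + n, T - 1)]   # clamp at the last frame
            behind = contour[max(t - n, 0)]      # clamp at the first frame
            num += n * (ahead - behind)
        deltas.append(num / denom)
    return deltas
```

For a contour that rises by exactly 1 per frame, interior deltas equal 1.0, i.e. the local slope; a flat contour gives all zeros.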
In this embodiment, the openSMILE software can be used to extract low-order audio features of the audio part such as the mel-frequency cepstral coefficients, pitch and zero-crossing rate. The dynamic regression coefficients indicate the importance of the low-order audio features. For example, if one low-order audio feature of an audio part (such as the pitch parameter) is represented by a waveform file, that waveform can be expressed as a multiple linear regression:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$
where $k$ is the number of low-order audio features in the audio part and $\beta_j$ ($j = 1, 2, \ldots, k$) are the dynamic regression coefficients of the low-order audio features.
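The equation above is an ordinary multiple linear regression, so its coefficients can be estimated by least squares. The patent does not state what the regressors $X_1, \ldots, X_k$ are, so the sketch below simply takes them as caller-supplied columns (a hypothetical interface) and solves the normal equations by Gaussian elimination using only the Python standard library. For real feature matrices a numerically sturdier solver, such as a QR-based least-squares routine, would be preferable.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_regression(X, y):
    """Least-squares [beta_0, ..., beta_k] for y ~ beta_0 + sum_j beta_j * x_j."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations: (A^T A) beta = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(ata, aty)
```

For data generated exactly by $Y = 1 + 2X_1 + 3X_2$, the fit recovers $\beta = (1, 2, 3)$ up to rounding.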
The statistical functions include functions that extract the maximum, minimum, kurtosis, skewness and similar statistics of the low-order and dynamic audio features. The extraction module 120 combines and transforms the data extracted by the statistical functions to obtain the high-order audio features. Each audio part usually yields a very large number of high-order audio features, but typically only a small fraction of them significantly affects the fraud identification result, so a feature selection algorithm is used to reduce the number of high-order audio features and speed up fraud identification. In this embodiment, the feature selection algorithm may be Sequential Forward Selection (SFS), Sequential Backward Selection (SBS), Bidirectional Search (BDS), a filter feature selection algorithm, or another feature selection algorithm.
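A minimal sketch of the statistical functions named above (maximum, minimum, skewness, kurtosis), applied to each low-order or dynamic contour and concatenated into one high-order vector. Using population moments, reporting excess kurtosis, and returning 0 for a (near-)constant contour are assumptions of this sketch; toolkits such as openSMILE offer many more functionals than these four.

```python
def functionals(contour):
    """Max, min, skewness and excess kurtosis of one frame-level contour."""
    n = len(contour)
    if n == 0:
        raise ValueError("empty contour")
    mean = sum(contour) / n
    # Population (biased) central moments.
    m2 = sum((x - mean) ** 2 for x in contour) / n
    m3 = sum((x - mean) ** 3 for x in contour) / n
    m4 = sum((x - mean) ** 4 for x in contour) / n
    if m2 <= 1e-18 * max(1.0, mean * mean):
        skew = kurt = 0.0  # (near-)constant contour: both defined as 0 by convention
    else:
        skew = m3 / m2 ** 1.5
        kurt = m4 / m2 ** 2 - 3.0  # excess kurtosis (0 for a Gaussian)
    return [max(contour), min(contour), skew, kurt]


def high_order_features(contours):
    """Concatenate the functionals of every low-order and dynamic contour."""
    features = []
    for contour in contours:
        features.extend(functionals(contour))
    return features
```

Because every contour contributes four values, the vector length grows with the number of low-order and delta contours, which is why the text follows this step with feature selection.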
Similarly, when extracting the expression features of a video part, the extraction module 120 first extracts low-order action features such as head orientation, gaze direction and facial action units (AUs) from the video part. It then counts how many times each low-order action feature occurs in the video part and how long it lasts, constructs the high-order action features of the video part from these statistics, and finally selects a subset of the high-order action features with a feature selection algorithm, using this subset as the expression features of the video part.
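The high-order construction can be sketched as counting runs of activity for each low-order feature. This assumes each low-order action feature has already been turned into a per-frame 0/1 signal: AU presence is reported directly by facial behaviour toolkits such as OpenFace, while continuous head-orientation or gaze angles would first need thresholding (for example "head turned away by more than some angle"), a step the patent does not specify. Function names and the frame rate are illustrative.

```python
def action_statistics(activations, fps):
    """Per low-order action feature: (number of occurrences, total duration in seconds).

    `activations` maps a feature name to a per-frame 0/1 sequence; one occurrence is a
    maximal run of consecutive active frames.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    stats = {}
    for name, frames in activations.items():
        occurrences = active = 0
        previous = 0
        for value in frames:
            if value and not previous:
                occurrences += 1  # a 0 -> 1 transition starts a new occurrence
            if value:
                active += 1
            previous = value
        stats[name] = (occurrences, active / fps)
    return stats


def high_order_action_vector(stats, order):
    """Flatten the statistics into a fixed-order feature vector for the SVM."""
    vector = []
    for name in order:
        count, duration = stats[name]
        vector.extend([float(count), duration])
    return vector
```

A fixed feature order matters because the SVM compares vectors position by position across clips.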
The training module 130 trains the support vector machines to obtain the speech analysis model and the expression analysis model. It trains a first support vector machine with the speech features of each audio part of the audio-video samples, extracted by the extraction module 120, and the fraud labels assigned by the acquisition module 110 as sample data, obtaining the speech analysis model. It trains a second support vector machine with the expression features of each video part of the samples, extracted by the extraction module 120, and the fraud labels assigned by the acquisition module 110 as sample data, obtaining the expression analysis model.
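The patent does not name an SVM solver or kernel. Purely to illustrate the training step without third-party dependencies, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient algorithm; this is a swapped-in technique, and a production system would more likely rely on an established library such as LIBSVM. The speech features and the expression features would each go through the same function with their own data, giving the first and second support vector machines. `lam`, `epochs` and `seed` are assumed defaults.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient on the hinge loss).

    y holds 1/0 fraud labels, mapped internally to +1/-1. A constant 1.0 is appended to
    each sample so the bias is learned as an ordinary (regularized) weight.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    data = [([float(v) for v in x] + [1.0], 1.0 if label == 1 else -1.0)
            for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, target in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = target * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the L2 regularizer ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... plus the hinge-loss sub-gradient when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * target * xi for wi, xi in zip(w, x)]
    return w[:-1], w[-1]


def decision_function(weights, bias, x):
    """Signed score; positive means the sample is classified as fraud (label 1)."""
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias
```

The decision value is not yet a probability; turning it into P1 or P2 needs a separate calibration step.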
The model application module 140 analyzes the audio-video data of the subject to be identified to obtain the subject's audio fraud probability and video fraud probability. It inputs the speech features of the audio part of the subject's audio-video data, extracted by the extraction module 120, into the speech analysis model trained by the training module 130, which outputs the subject's audio fraud probability P1; it inputs the expression features of the subject's video part into the trained expression analysis model, which outputs the subject's video fraud probability P2.
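An SVM natively outputs a decision value rather than a probability. One common way to obtain P1 and P2 from such values is Platt scaling, on which LIBSVM's probability estimates are based; the patent does not say how its probabilities are produced, so this is an assumption. The sketch fits the two sigmoid parameters by plain gradient descent on held-out decision values, a simplification of Platt's procedure, which also smooths the 0/1 targets and uses a more careful optimizer.

```python
import math


def _one_over_one_plus_exp(z):
    """Numerically stable 1 / (1 + exp(z))."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores, labels, lr=0.1, iters=2000):
    """Fit P(fraud | f) = 1 / (1 + exp(A*f + B)) by gradient descent on the log loss.

    `scores` are SVM decision values on held-out data; `labels` are 1/0 fraud labels.
    """
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = _one_over_one_plus_exp(A * f + B)
            # With z = A*f + B, the derivative of the log loss w.r.t. z is (y - p).
            grad_a += (y - p) * f
            grad_b += (y - p)
        A -= lr * grad_a / n
        B -= lr * grad_b / n
    return A, B


def platt_probability(A, B, score):
    """Map one SVM decision value to a fraud probability (P1 or P2 in the text)."""
    return _one_over_one_plus_exp(A * score + B)
```

Fitting on held-out rather than training decision values avoids overconfident probabilities, since training points tend to sit far from the margin.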
The weighted calculation module 150 combines the subject's audio fraud probability P1 and video fraud probability P2 by weighting to obtain the subject's fraud recognition result. When the training module 130 trains the support vector machines on the sample data to obtain the speech analysis model and the expression analysis model, the accuracy of each model can be measured; these accuracies are used to compute the weights of the speech analysis model and the expression analysis model, from which the subject's final fraud probability is calculated.
For example, assuming that the accuracy of the speech analysis model is 85% and that of the expression analysis model is 95%, the weights of the two models can be computed as follows:
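$$P(\text{Audio}) = a = 0.85$$
$$P(\text{Video}) = b = 0.95$$
$$W(\text{Audio}) = \frac{a}{a+b} = \frac{0.85}{1.8}, \qquad W(\text{Video}) = \frac{b}{a+b} = \frac{0.95}{1.8}$$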
where P(Audio) denotes the accuracy of the speech analysis model, P(Video) the accuracy of the expression analysis model, W(Audio) the weight of the speech analysis model, and W(Video) the weight of the expression analysis model.
Suppose that analyzing the subject's audio-video data with the speech analysis model and the expression analysis model gives an audio fraud probability of 0.8 and a video fraud probability of 0.7. Weighted fusion according to W(Audio) and W(Video) then gives the subject's final fraud probability:
$$P = \frac{0.85}{1.8}\times 0.8 + \frac{0.95}{1.8}\times 0.7 = \frac{1.345}{1.8} \approx 0.747$$
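The weighting scheme above can be sketched in Python as follows. This is a minimal illustration, assuming each weight is simply that model's accuracy divided by the sum of both accuracies, which reproduces the 0.85/1.8 and 0.95/1.8 factors of the example; the function names are hypothetical.

```python
def accuracy_weights(acc_audio: float, acc_video: float):
    """Normalize the two models' accuracies so the weights sum to 1."""
    total = acc_audio + acc_video
    return acc_audio / total, acc_video / total


def fuse_fraud_probability(p1: float, p2: float,
                           acc_audio: float, acc_video: float) -> float:
    """Weighted fusion of audio fraud probability P1 and video fraud probability P2."""
    w_audio, w_video = accuracy_weights(acc_audio, acc_video)
    return w_audio * p1 + w_video * p2


if __name__ == "__main__":
    # Values from the worked example: accuracies 0.85 / 0.95, P1 = 0.8, P2 = 0.7.
    p = fuse_fraud_probability(0.8, 0.7, acc_audio=0.85, acc_video=0.95)
    print(round(p, 4))  # 0.7472
```

Because the weights sum to 1, the fused value stays in [0, 1] whenever P1 and P2 do. Any threshold for turning the fused probability into a final fraud/no-fraud decision would be an additional, application-specific choice.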
Fig. 4 is a flowchart of a preferred embodiment of the fraud recognition method combining audio analysis and video analysis according to the present invention. Using the architecture shown in Fig. 1 or Fig. 2, electronic device 1 is started and processor 12 executes fraud recognition program 10 stored in memory 11 to implement the following steps:
Step S10: acquisition module 110 collects audio-video samples of persons, cuts each sample into audio-video segments using emotion as the unit, and assigns a fraud label to each audio-video segment. The samples may be captured by camera device 3 of Fig. 1 or camera device 30 of Fig. 2, or they may be selected from online sources or an audio-video database, and include both audio-video that clearly contains fraud and normal audio-video without fraud.
Step S20: acquisition module 110 decodes and preprocesses each audio-video segment to obtain its audio clip and video clip. The fraud label of each audio-video segment is carried over as the fraud label of the corresponding audio clip and video clip.
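Steps S10 and S20 can be pictured with a small data-structure sketch. It assumes, hypothetically, that each sample comes with emotion annotations given as (start, end, emotion, fraud label) tuples; the `Segment` and `Clip` types and both helper functions are illustrative names, and the actual decoding and preprocessing of the media streams is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One audio-video segment cut from a sample along an emotion boundary."""
    start: float      # seconds from the start of the sample
    end: float
    emotion: str
    fraud_label: int  # 1 = fraud, 0 = normal


@dataclass(frozen=True)
class Clip:
    """Audio or video stream of a segment; it inherits the segment's label."""
    stream: str       # "audio" or "video"
    start: float
    end: float
    fraud_label: int


def cut_by_emotion(annotations):
    """Step S10 sketch: one segment per non-empty annotated emotion interval."""
    return [Segment(s, e, emo, label)
            for s, e, emo, label in sorted(annotations) if e > s]


def split_streams(segment):
    """Step S20 sketch: the audio clip and video clip carry the segment's fraud label."""
    return (Clip("audio", segment.start, segment.end, segment.fraud_label),
            Clip("video", segment.start, segment.end, segment.fraud_label))
```

Keeping the label on both clips is what later allows the two support vector machines to be trained independently on the same ground truth.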
Step S30: extraction module 120 extracts speech features from each audio clip and expression features from each video clip. For the specific extraction methods, see the detailed description of extraction module 120 above.
Step S40: training module 130 uses the speech features and fraud label of each audio clip as sample data to train a first support vector machine, obtaining the speech analysis model, and uses the expression features and fraud label of each video clip as sample data to train a second support vector machine, obtaining the expression analysis model.
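Step S40 trains two ordinary binary SVMs, one per modality, and step S70 needs a probability from each. The sketch below is a minimal, dependency-free stand-in: a linear soft-margin SVM trained with the Pegasos sub-gradient method, whose decision value is squashed into (0, 1) by a fixed logistic curve. Both the Pegasos solver and the uncalibrated logistic mapping are assumptions for illustration; the kernel and probability calibration actually used are not stated here. In practice a library SVM such as scikit-learn's `SVC(probability=True)`, which calibrates with Platt scaling, would be the usual choice.

```python
import math
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear soft-margin SVM with Pegasos sub-gradient steps.

    X: list of feature vectors; y: labels in {-1, +1} (+1 = fraud).
    The bias is folded in as a constant feature (so it is also regularized,
    a common simplification). Returns (weights, bias).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularization term
            if margin < 1.0:                          # hinge loss is active
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w[:-1], w[-1]


def fraud_probability(w, b, x):
    """Map the SVM decision value to (0, 1) with an uncalibrated logistic curve."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    z = math.exp(f)  # numerically stable branch for negative f
    return z / (1.0 + z)
```

Training one model on audio-clip features and another on video-clip features, then calling `fraud_probability` on the object's features, yields the P1 and P2 consumed by the weighted calculation step.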
Step S50: acquisition module 110 acquires the audio-video data of the object to be identified and decodes and preprocesses it to obtain the object's audio data and video data. The audio-video data are captured in real time by camera device 3 of Fig. 1 or camera device 30 of Fig. 2.
Step S60: extraction module 120 extracts the speech features of the object's audio data and the expression features of its video data. For the specific extraction methods, see the detailed description of extraction module 120 above.
Step S70: the speech features of the audio data and the expression features of the video data of the object to be identified are input into the speech analysis model and the expression analysis model, respectively, to obtain the object's audio fraud probability and video fraud probability. Specifically, model application module 140 inputs the speech features extracted by extraction module 120 into the speech analysis model, which outputs the audio fraud probability P1 of the object to be identified, and inputs the expression features extracted by extraction module 120 into the expression analysis model, which outputs the video fraud probability P2 of the object to be identified.
Step S80: P1 and P2 are combined by a weighted calculation according to the weights of the speech analysis model and the expression analysis model, yielding the fraud recognition result of the object to be identified. For how the weights are determined and how the weighted calculation of P1 and P2 proceeds, see the detailed description of weighted calculation module 150 above.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, which may be any one of, or any combination of, a hard disk, a multimedia card, an SD card, a flash card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium contains the audio-video samples and fraud recognition program 10, which, when executed by a processor, implements the following operations:
Sample preparation step: collecting audio-video samples of persons, cutting the samples to obtain audio-video segments, assigning a fraud label to each audio-video segment, and decoding and preprocessing each segment to obtain its audio clip and video clip;
Feature extraction step: extracting speech features from each audio clip and expression features from each video clip;
Model training step: training a first support vector machine with the speech features and fraud label of each audio clip as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud label of each video clip as sample data to obtain an expression analysis model;
Model application step: acquiring the audio-video data of an object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
Weighted calculation step: performing a weighted calculation on P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud recognition result of the object to be identified.
The specific embodiments of the computer-readable storage medium of the present invention are substantially the same as those of the above fraud recognition method combining audio analysis and video analysis and of electronic device 1, and are not repeated here.
It should be noted that, herein, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. In the absence of further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, device, article, or method that includes that element.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. From the above description, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied as a software product stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and including instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A fraud recognition method combining audio analysis and video analysis, applied to an electronic device, characterized in that the method comprises:
Sample preparation step: collecting audio-video samples of persons, cutting the samples to obtain audio-video segments, assigning a fraud label to each audio-video segment, and decoding and preprocessing each segment to obtain its audio clip and video clip;
Feature extraction step: extracting speech features from each audio clip and expression features from each video clip;
Model training step: training a first support vector machine with the speech features and fraud label of each audio clip as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud label of each video clip as sample data to obtain an expression analysis model;
Model application step: acquiring the audio-video data of an object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
Weighted calculation step: performing a weighted calculation on P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud recognition result of the object to be identified.
2. The fraud recognition method according to claim 1, characterized in that extracting speech features in the feature extraction step comprises:
First feature extraction step: extracting low-level audio features from each audio clip;
Second feature extraction step: extracting dynamic regression coefficients from each low-level audio feature to obtain the dynamic audio features of each audio clip;
Third feature extraction step: using statistical functions to extract the high-level audio features of each audio clip from the low-level audio features and the dynamic audio features; and
Screening step: using a feature selection algorithm to select a subset of high-level audio features from the high-level audio features of each audio clip, and taking this subset as the speech features of that audio clip.
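The three-stage speech pipeline of claim 2 (frame-level low-level descriptors, their dynamic regression coefficients, then statistical functionals) can be sketched as below. The delta formula is the standard regression formula used in speech front ends, $d_t = \sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n}) / (2\sum_{n=1}^{N} n^2)$; the half-width N = 2 and the particular functionals (mean, standard deviation, min, max, range) are illustrative assumptions, the low-level tracks are assumed to be computed already, and the feature-selection screening step is not shown.

```python
import statistics


def deltas(frames, N=2):
    """Regression-based delta coefficients for one low-level feature track.

    Edge frames are replicated so the output has the same length as the input.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        num = 0.0
        for n in range(1, N + 1):
            ahead = frames[min(t + n, T - 1)]
            behind = frames[max(t - n, 0)]
            num += n * (ahead - behind)
        out.append(num / denom)
    return out


def functionals(track):
    """Statistical functionals mapping a variable-length track to fixed-size features."""
    return {
        "mean": statistics.fmean(track),
        "std": statistics.pstdev(track),
        "min": min(track),
        "max": max(track),
        "range": max(track) - min(track),
    }


def segment_features(lld_tracks):
    """High-level features for one audio clip.

    lld_tracks maps a low-level feature name (e.g. "zcr", "pitch", "mfcc_1")
    to its per-frame values; names here are illustrative.
    """
    feats = {}
    for name, track in lld_tracks.items():
        for stat, value in functionals(track).items():
            feats[f"{name}_{stat}"] = value
        for stat, value in functionals(deltas(track)).items():
            feats[f"{name}_delta_{stat}"] = value
    return feats
```

Applying functionals to both the raw tracks and their deltas is what turns clips of different lengths into equal-length vectors that a support vector machine can consume.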
3. The fraud recognition method according to claim 2, characterized in that the low-level audio features include Mel-frequency cepstral coefficients, pitch, and zero-crossing rate.
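Of the three low-level audio features named in claim 3, zero-crossing rate is simple enough to show directly; the sketch below frames a signal and computes the per-frame rate. The frame length and hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are illustrative, treating zero as a positive sample is one common convention, and MFCCs and pitch would normally come from an audio library (for example `librosa.feature.mfcc` and `librosa.yin`) rather than hand-written code.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def zcr_track(samples, frame_len=400, hop=160):
    """Per-frame ZCR track: 400/160 samples = 25 ms / 10 ms at 16 kHz."""
    return [zero_crossing_rate(f) for f in frame_signal(samples, frame_len, hop)]
```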
4. The fraud recognition method according to claim 1, characterized in that extracting expression features in the feature extraction step comprises:
Low-level feature extraction step: extracting low-level action features from each video clip;
High-level feature construction step: counting the number of occurrences and the duration of each low-level action feature in each video clip, and constructing the high-level action features of each video clip from the statistics; and
Screening step: using a feature selection algorithm to select a subset of high-level action features from the high-level action features of each video clip, and taking this subset as the expression features of that video clip.
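The high-level construction step of claim 4 reduces per-frame action detections to per-clip counts and durations. The sketch below assumes, hypothetically, that an upstream detector has already produced a per-frame 0/1 activation track for each low-level action (for example a facial action unit or a head-turn flag) at a known frame rate; the feature names it emits are illustrative, and the screening step is not shown.

```python
def action_statistics(activations, fps):
    """High-level action features for one video clip.

    activations maps a low-level action name (e.g. "AU12") to a per-frame
    list of 0/1 flags. Each run of consecutive 1s counts as one occurrence.
    """
    feats = {}
    for name, flags in activations.items():
        runs = []
        length = 0
        for flag in list(flags) + [0]:  # trailing 0 closes a run at the clip end
            if flag:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        total_s = sum(runs) / fps
        feats[f"{name}_count"] = len(runs)
        feats[f"{name}_total_s"] = total_s
        feats[f"{name}_mean_s"] = total_s / len(runs) if runs else 0.0
    return feats
```

Counting runs rather than active frames distinguishes one long expression from several brief ones, which is the distinction the occurrence-count and duration statistics are meant to capture.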
5. The fraud recognition method according to claim 4, characterized in that the low-level action features include head orientation, eye gaze direction, and facial action units.
6. The fraud recognition method according to claim 1, characterized in that the model application step further comprises the following steps:
decoding and preprocessing the audio-video data of the object to be identified to obtain the audio data and video data of the object to be identified;
extracting speech features from the audio data of the object to be identified, and extracting expression features from the video data of the object to be identified.
7. An electronic device, comprising a memory and a processor, characterized in that the memory contains a fraud recognition program which, when executed by the processor, implements the following steps:
Sample preparation step: collecting audio-video samples of persons, cutting the samples to obtain audio-video segments, assigning a fraud label to each audio-video segment, and decoding and preprocessing each segment to obtain its audio clip and video clip;
Feature extraction step: extracting speech features from each audio clip and expression features from each video clip;
Model training step: training a first support vector machine with the speech features and fraud label of each audio clip as sample data to obtain a speech analysis model, and training a second support vector machine with the expression features and fraud label of each video clip as sample data to obtain an expression analysis model;
Model application step: acquiring the audio-video data of an object to be identified, analyzing the audio-video data with the speech analysis model and the expression analysis model, and outputting the audio fraud probability P1 and the video fraud probability P2 of the object to be identified; and
Weighted calculation step: performing a weighted calculation on P1 and P2 according to the weights of the speech analysis model and the expression analysis model to obtain the fraud recognition result of the object to be identified.
8. The electronic device according to claim 7, characterized in that extracting speech features in the feature extraction step comprises:
First feature extraction step: extracting low-level audio features from each audio clip;
Second feature extraction step: extracting dynamic regression coefficients from each low-level audio feature to obtain the dynamic audio features of each audio clip;
Third feature extraction step: using statistical functions to extract the high-level audio features of each audio clip from the low-level audio features and the dynamic audio features; and
Screening step: using a feature selection algorithm to select a subset of high-level audio features from the high-level audio features of each audio clip, and taking this subset as the speech features of that audio clip.
9. The electronic device according to claim 7, characterized in that extracting expression features in the feature extraction step comprises:
Low-level feature extraction step: extracting low-level action features from each video clip;
High-level feature construction step: counting the number of occurrences and the duration of each low-level action feature in each video clip, and constructing the high-level action features of each video clip from the statistics; and
Screening step: using a feature selection algorithm to select a subset of high-level action features from the high-level action features of each video clip, and taking this subset as the expression features of that video clip.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium contains a fraud recognition program which, when executed by a processor, implements the steps of the fraud recognition method according to any one of claims 1 to 6.
CN201711252009.1A 2017-12-01 2017-12-01 Fraud recognition method, device and storage medium combining audio analysis and video analysis Active CN108053838B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711252009.1A CN108053838B (en) 2017-12-01 2017-12-01 Fraud recognition method, device and storage medium combining audio analysis and video analysis
PCT/CN2018/077345 WO2019104890A1 (en) 2017-12-01 2018-02-27 Fraud identification method and device combining audio analysis and video analysis and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711252009.1A CN108053838B (en) 2017-12-01 2017-12-01 Fraud recognition method, device and storage medium combining audio analysis and video analysis

Publications (2)

Publication Number Publication Date
CN108053838A (en) 2018-05-18
CN108053838B (en) 2019-10-11

Family

ID=62121930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711252009.1A Active CN108053838B (en) 2017-12-01 2017-12-01 Fraud recognition method, device and storage medium combining audio analysis and video analysis

Country Status (2)

Country Link
CN (1) CN108053838B (en)
WO (1) WO2019104890A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389489B (en) * 2018-09-25 2023-04-18 平安科技(深圳)有限公司 Method for identifying fraudulent behavior, computer readable storage medium and terminal equipment
CN109376603A * 2018-09-25 2019-02-22 北京周同科技有限公司 Video identification method, device, computer equipment and storage medium
CN109522799A (en) * 2018-10-16 2019-03-26 深圳壹账通智能科技有限公司 Information cuing method, device, computer equipment and storage medium
CN109472487A (en) * 2018-11-02 2019-03-15 深圳壹账通智能科技有限公司 Video quality detecting method, device, computer equipment and storage medium
CN109493882A * 2018-11-04 2019-03-19 国家计算机网络与信息安全管理中心 System and method for automatically labeling fraudulent-call speech
CN109831677B (en) * 2018-12-14 2022-04-01 平安科技(深圳)有限公司 Video desensitization method, device, computer equipment and storage medium
CN109858330A (en) * 2018-12-15 2019-06-07 深圳壹账通智能科技有限公司 Expression analysis method, apparatus, electronic equipment and storage medium based on video
CN109729383B (en) * 2019-01-04 2021-11-02 深圳壹账通智能科技有限公司 Double-recording video quality detection method and device, computer equipment and storage medium
CN109800720B (en) * 2019-01-23 2023-12-22 平安科技(深圳)有限公司 Emotion recognition model training method, emotion recognition device, equipment and storage medium
CN111144197A (en) * 2019-11-08 2020-05-12 宇龙计算机通信科技(深圳)有限公司 Human identification method, device, storage medium and electronic equipment
CN111339940B (en) * 2020-02-26 2023-07-21 中国工商银行股份有限公司 Video risk identification method and device
CN111460907B (en) * 2020-03-05 2023-06-20 浙江大华技术股份有限公司 Malicious behavior identification method, system and storage medium
CN111444379B (en) * 2020-03-30 2023-08-08 腾讯科技(深圳)有限公司 Audio feature vector generation method and audio fragment representation model training method
SG10202006357UA (en) 2020-07-01 2020-09-29 Alipay Labs Singapore Pte Ltd A Document Identification Method and System
CN112202720B (en) * 2020-09-04 2023-05-02 中移雄安信息通信科技有限公司 Audio and video identification method and device, electronic equipment and computer storage medium
CN112040488A (en) * 2020-09-10 2020-12-04 安徽师范大学 Illegal equipment identification method based on MAC address and channel state double-layer fingerprint
CN112133327B (en) * 2020-09-17 2024-02-13 腾讯音乐娱乐科技(深圳)有限公司 Audio sample extraction method, device, terminal and storage medium
CN112331230B (en) * 2020-11-17 2024-07-05 平安科技(深圳)有限公司 Fraud identification method, fraud identification device, computer equipment and storage medium
CN112562687B (en) * 2020-12-11 2023-08-04 天津讯飞极智科技有限公司 Audio and video processing method and device, recording pen and storage medium
CN113314103B (en) * 2021-05-31 2023-03-03 中国工商银行股份有限公司 Illegal information identification method and device based on real-time speech emotion analysis
CN113409822B (en) * 2021-05-31 2023-06-20 青岛海尔科技有限公司 Object state determining method and device, storage medium and electronic device

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103023900A (en) * 2012-12-06 2013-04-03 北京百度网讯科技有限公司 Identity authentication method, cloud service system and cloud server based on cloud server-side
CN103226948A (en) * 2013-04-22 2013-07-31 山东师范大学 Audio scene recognition method based on acoustic events
CN103971700A (en) * 2013-08-01 2014-08-06 哈尔滨理工大学 Voice monitoring method and device
CN105100363A (en) * 2015-06-29 2015-11-25 小米科技有限责任公司 Information processing method, information processing device and terminal
CN105718874A * 2016-01-18 2016-06-29 北京天诚盛业科技有限公司 Liveness detection and authentication method and device
CN106157135A * 2016-07-14 2016-11-23 微额速达(上海)金融信息服务有限公司 Anti-fraud system and method based on voiceprint recognition of gender and age

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9461987B2 (en) * 2014-08-14 2016-10-04 Bank Of America Corporation Audio authentication system


Also Published As

Publication number Publication date
WO2019104890A1 (en) 2019-06-06
CN108053838A (en) 2018-05-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20180525

Address after: 518000 Room 201, Building A, No. 1 Qianwan Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong (Shenzhen Qianhai Business Secretary Co., Ltd.)

Applicant after: OneConnect Smart Technology Co., Ltd. (Shenzhen)

Address before: 200030 Floors 9 and 10, No. 166 Kaibin Road, Xuhui District, Shanghai

Applicant before: Shanghai Financial Technologies Ltd

REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1250835

Country of ref document: HK

GR01 Patent grant