CN108053838B

CN108053838B - In conjunction with fraud recognition methods, device and the storage medium of audio analysis and video analysis

Info

Publication number: CN108053838B
Application number: CN201711252009.1A
Authority: CN
Inventors: 韦峰; 徐国强
Original assignee: OneConnect Smart Technology Co Ltd
Current assignee: OneConnect Smart Technology Co Ltd
Priority date: 2017-12-01
Filing date: 2017-12-01
Publication date: 2019-10-11
Anticipated expiration: 2037-12-01
Also published as: CN108053838A; WO2019104890A1

Abstract

The present invention provides fraud recognition methods, device and the storage medium of a kind of combination audio analysis and video analysis.Method includes the following steps: cutting audio-video sample, obtains audio-video segment, a fraud mark is distributed for each audio-video segment；Each audio-video segment is decoded and is pre-processed, the audio fragment and video clip of each audio-video segment are obtained；Phonetic feature and expressive features are extracted respectively from each audio fragment and video clip；Fraud mark Training Support Vector Machines are combined with the expressive features of the phonetic feature of each audio fragment and each video clip respectively, obtain speech analysis model and Expression analysis model；Acquire the audio, video data of object to be identified；Extract the phonetic feature and expressive features of the audio, video data；The phonetic feature and expressive features are inputted into the speech analysis model and Expression analysis model respectively, export the probability of cheating P1 and P2 of the object to be identified；By P1, P2 weighted calculation, the fraud recognition result of object to be identified is obtained.

Description

In conjunction with fraud recognition methods, device and the storage medium of audio analysis and video analysis

Technical field

The present invention relates to technical field of computer information processing more particularly to a kind of combination audio analysis and video analysis Cheat recognition methods, device and computer readable storage medium.

Background technique

Currently, fraud identification is generally realized in such a way that face is examined, the experience and judgement of extreme dependency analysis personnel expend A large amount of time and manpower, analysis result are often inaccurate objective.Also there is the instrument and equipment using profession, breathed by detection, A series of indexs such as pulse, blood pressure, skin resistance judge tested personnel whether there is or not fraud suspicion, but the usual valence of such instrument and equipment Lattice are expensive and are easy to constitute infringement to the human rights of tested personnel.

Summary of the invention

To solve the shortcomings of the prior art, the present invention provides the fraud knowledge of a kind of combination audio analysis and video analysis Other method, apparatus and computer readable storage medium, it is objective, accurately sentence by analyzing the audio, video data of object to be identified Disconnected object to be identified whether there is fraud suspicion.

To achieve the above object, the present invention provides the fraud recognition methods of a kind of combination audio analysis and video analysis, answers For electronic device, this method comprises:

Sample preparation process: collecting personage's audio-video sample, cut to audio-video sample, obtain audio-video segment, A fraud mark is distributed for each audio-video segment, each audio-video segment is decoded and is pre-processed, each sound is obtained The audio fragment and video clip of video clip；

Characteristic extraction step: extracting phonetic feature from each audio fragment, and it is special that expression is extracted from each video clip Sign；

Model training step: being labeled as sample data with the phonetic feature and fraud of each audio fragment, to first support to Amount machine is trained, and obtains speech analysis model；It is labeled as sample data with the expressive features and fraud of each video clip, to Two support vector machines are trained, and obtain Expression analysis model；

Model applying step: acquiring the audio, video data of object to be identified, utilizes the speech analysis model and expression point Analysis model analyzes the audio, video data, exports the audio probability of cheating P1 and video probability of cheating of the object to be identified P2；And

Weighted calculation step: according to the weight of the speech analysis model and Expression analysis model by P1, P2 weighted calculation, Obtain the fraud recognition result of the object to be identified.

Preferably, phonetic feature is extracted in the characteristic extraction step includes:

Fisrt feature extraction step: low order audio frequency characteristics are extracted from each audio fragment；

Second feature extraction step: dynamic regressioncoefficients are extracted from each low order phonetic feature, obtain each audio fragment Dynamic audio frequency feature；

Third feature extraction step: it is extracted from the low order audio frequency characteristics and dynamic audio frequency feature respectively using statistical function The high-order audio frequency characteristics of audio fragment；And

Screening step: it is special that high-order audio is filtered out from the high-order audio frequency characteristics of each audio fragment using Feature Selection algorithm Subset is levied, using high-order audio frequency characteristics subset as the phonetic feature of each audio fragment.

Preferably, the low order audio frequency characteristics include mel-frequency cepstrum coefficient, pitch and zero-crossing rate.

Preferably, expressive features are extracted in the characteristic extraction step includes:

Low order characteristic extraction step: low order motion characteristic is extracted from each video clip；

High-order latent structure step: count each low order motion characteristic occurs in each video clip number and it is lasting when It is long, the high-order motion characteristic of each video clip is constructed according to statistical result；And

Screening step: it is special that high-order movement is filtered out from the high-order motion characteristic of each video clip using Feature Selection algorithm Subset is levied, using high-order motion characteristic subset as the expressive features of each video clip.

Preferably, the low order motion characteristic includes head direction, eyeball direction and Facial action unit (action Unit, AU).

Preferably, the model applying step is further comprising the steps of:

The audio, video data of object to be identified is decoded and is pre-processed, obtain the object to be identified audio data and Video data；

Phonetic feature is extracted from the audio data of the object to be identified, is extracted from the video data of the object to be identified Expressive features.

The present invention also provides a kind of electronic device, which includes memory and processor, is wrapped in the memory Include fraud recognizer.The electronic device is directly or indirectly connected with photographic device, and photographic device is by the audio-video of acquisition Data transmission is to electronic device.When the processor of the electronic device executes the fraud recognizer in memory, following step is realized It is rapid:

Preferably, the low order motion characteristic includes head direction, eyeball direction and face AU.

Preferably, the model applying step is further comprising the steps of:

In addition, to achieve the above object, it is described computer-readable the present invention also provides a kind of computer readable storage medium Include fraud recognizer in storage medium, when the fraud recognizer is executed by processor, realizes as described above combine Arbitrary steps in the fraud recognition methods of audio analysis and video analysis.

Fraud recognition methods, device and the computer-readable storage of combination audio analysis and video analysis provided by the invention Medium, by extracting the phonetic feature of audio-video sample audio fragment and the expressive features of video clip, in conjunction with corresponding fraud Mark, is trained support vector machines, obtains speech analysis model and Expression analysis model.Later, by trained model Applied to real-time fraud identification link: acquiring the audio, video data of object to be identified, the voice for extracting the audio, video data is special It seeks peace expressive features, the phonetic feature and expressive features is inputted into the speech analysis model and Expression analysis mould that training obtains respectively Type exports the audio probability of cheating P1 and video probability of cheating P2 of the object to be identified, by P1, P2 Weighted Fusion, obtain this to Identify the fraud recognition result of object.Using the present invention, objective, accurately it can identify that personage whether there is fraud suspicion.

Detailed description of the invention

Fig. 1 is the applied environment figure of the first preferred embodiment of electronic device of the present invention.

Fig. 2 is the applied environment figure of the second preferred embodiment of electronic device of the present invention.

Fig. 3 is the Program modual graph that recognizer is cheated in Fig. 1, Fig. 2.

Fig. 4 is the flow chart for the fraud recognition methods preferred embodiment that the present invention combines audio analysis and video analysis.

The embodiments will be further described with reference to the accompanying drawings for the realization, the function and the advantages of the object of the present invention.

Specific embodiment

The principle and spirit of the invention are described below with reference to several specific embodiments.It should be appreciated that described herein Specific embodiment be only used to explain the present invention, be not intended to limit the present invention.

It is the applied environment figure of the first preferred embodiment of electronic device of the present invention shown in referring to Fig.1.In this embodiment, Photographic device 3 connects electronic device 1 by network 2, and photographic device 3 acquires the audio, video data of personage, is sent to by network 2 Electronic device 1, electronic device 1 analyze the audio, video data using fraud recognizer 10 provided by the invention, obtain personage Fraud recognition result.

Electronic device 1, which can be server, smart phone, tablet computer, portable computer, desktop PC etc., to be had The terminal device of storage and calculation function.

The electronic device 1 includes memory 11, processor 12, network interface 13 and communication bus 14.

Photographic device 3 is installed on particular place, such as office space, monitoring area, for acquiring the audio-video number of personage According to the audio, video data is then transmitted to memory 11 by network 2.Network interface 13 may include that the wired of standard connects Mouth, wireless interface (such as WI-FI interface).Communication bus 14 is for realizing the connection communication between these components.

Memory 11 includes the readable storage medium storing program for executing of at least one type.The readable storage medium storing program for executing of at least one type It can be the non-volatile memory medium of such as flash memory, hard disk, multimedia card, card-type memory.In some embodiments, described can Reading storage medium can be the internal storage unit of the electronic device 1, such as the hard disk of the electronic device 1.In other realities It applies in example, the readable storage medium storing program for executing is also possible to the external memory 11 of the electronic device 1, such as the electronic device 1 The plug-in type hard disk of upper outfit, intelligent memory card (Smart Media Card, SMC), secure digital (Secure Digital, SD) block, flash card (Flash Card) etc..

In the present embodiment, program code, the photographic device 3 that the memory 11 stores the fraud recognizer 10 are adopted Other data that the program code that the audio, video data and processor 12 of collection execute fraud recognizer 10 is applied to and most The data etc. exported afterwards.

Processor 12 can be in some embodiments a central processing unit (Central Processing Unit, CPU), microprocessor or other data processing chips.

Fig. 1 illustrates only the electronic device 1 with component 11-14, it should be understood that being not required for implementing all show Component out, the implementation that can be substituted is more or less component.

Optionally, which can also include user interface, and user interface may include input unit such as keyboard (Keyboard), speech input device such as microphone (microphone) etc. has the equipment of speech identifying function, voice defeated Device such as sound equipment, earphone etc. out, optionally user interface can also include standard wireline interface and wireless interface.

Optionally, which can also include display.Display can be LED in some embodiments and show Device, liquid crystal display, touch-control liquid crystal display and OLED (Organic Light-Emitting Diode, organic light emission Diode) touch device etc..Display is used for the information and visual user interface that display electronics assemblies 1 are handled.

Optionally, which further includes touch sensor.It is touched provided by the touch sensor for user The region for touching operation is known as touch area.In addition, touch sensor described here can be resistive touch sensor, capacitor Formula touch sensor etc..Moreover, the touch sensor not only includes the touch sensor of contact, proximity may also comprise Touch sensor etc..In addition, the touch sensor can be single sensor, or such as multiple biographies of array arrangement Sensor.User, such as psychologist, credit face examine personnel etc., can start fraud recognizer 10 by touching.

The electronic device 1 can also include radio frequency (Radio Frequency, RF) circuit, sensor and voicefrequency circuit etc. Deng details are not described herein.

It is the applied environment figure of the second preferred embodiment of electronic device of the present invention referring to shown in Fig. 2.User passes through terminal 3 Realize fraud identification process, the photographic device 30 of terminal 3 acquires the audio, video data of object to be identified, and transmits by network 2 To the electronic device 1, the processor 12 of electronic device 1 executes the program generation for the fraud recognizer 10 that memory 11 stores Code, analyzes the audio data and video data of audio, video data, exports the audio probability of cheating P1 of the object to be identified P1, P2 weighted calculation are obtained into the fraud recognition result of the object to be identified, for object to be identified with video probability of cheating P2 Or the reference such as audit crew.

The component of electronic device 1 in Fig. 2, such as memory 11, processor 12 shown in figure, network interface 13 and communication Bus 14 and component not shown in the figure, please refer to the introduction about Fig. 1.

The terminal 3 can have for smart phone, tablet computer, portable computer, desktop PC etc. storage and The terminal device of calculation function.

Fraud recognizer 10 in Fig. 1, Fig. 2 is performed the steps of when being executed by processor 12

About being discussed in detail for above-mentioned steps, please refer to following Fig. 3 about the Program modual graph of fraud recognizer 10 and Explanation of the Fig. 4 about the flow chart for the fraud recognition methods preferred embodiment for combining audio analysis and video analysis.

It is the Program modual graph that recognizer 10 is cheated in Fig. 1, Fig. 2 referring to shown in Fig. 3.In the present embodiment, fraud is known Other program 10 is divided into multiple modules, and multiple module is stored in memory 11, and is executed by processor 12, to complete The present invention.The so-called module of the present invention is the series of computation machine program instruction section for referring to complete specific function.

The fraud recognizer 10 can be divided into: obtain module 110, extraction module 120, training module 130, mould Type application module 140 and weighted calculation module 150.

It obtains module 110 and obtains corresponding audio for obtaining the audio-video of personage and it being decoded and is pre-processed Part and video section.The audio-video can be the acquisition of photographic device 30 of the photographic device 3 or Fig. 2 by Fig. 1, can also To be the audio-video for being clearly present fraud chosen from the network information or audio and video data library and without the sound of fraud Video.The audio-video sample for Training Support Vector Machines is cut as unit of mood, obtains audio-video segment, is every A audio-video segment distributes a fraud mark, and the fraud mark indicates the personage in the audio-video segment, and whether there is or not frauds to dislike It doubts, such as 1 indicates fraud suspicion, 0 indicates without fraud suspicion.Audio-video is decoded and is pre-processed, corresponding sound is obtained Frequency part and video section.

Extraction module 120, for extracting the phonetic feature of the audio-frequency unit and the expressive features of video section.Extract mould Block 120 extracts phonetic feature from each audio-frequency unit that module 110 obtains is obtained, each video obtained from acquisition module 110 Extracting section expressive features.

When extraction module 120 extracts the phonetic feature of an audio-frequency unit, mel-frequency first is extracted from the audio-frequency unit and is fallen The low orders audio frequency characteristics such as spectral coefficient, pitch, zero-crossing rate, then dynamic regressioncoefficients are extracted from these low order audio frequency characteristics, it obtains Then the dynamic audio frequency feature of the audio-frequency unit is mentioned from the low order audio frequency characteristics and dynamic audio frequency feature using statistical function High-order audio frequency characteristics are taken, high-order audio frequency characteristics subset are finally filtered out from high-order audio frequency characteristics with Feature Selection algorithm, by this Phonetic feature of the high-order audio frequency characteristics subset as the audio-frequency unit.

In the present embodiment, OpenSMILE software can be used to extract mel-frequency cepstrum coefficient, the sound of audio-frequency unit The low orders audio frequency characteristics such as height, zero-crossing rate.The dynamic regressioncoefficients are used to indicate the significance level of low order audio frequency characteristics.For example, Some low order audio frequency characteristics (such as pitch parameters) of some audio-frequency unit are indicated with a wave file, then the wave file It can be indicated with the mode of multiple linear regression are as follows:

Y=β₀+β₁X₁+β₂X₂+…+β_KX_K

Wherein, k is number of the low order audio frequency characteristics in the audio-frequency unit, β_j(j=1,2 ..., k) it is low order audio The dynamic regressioncoefficients of feature.

The statistical function includes the maximum value, minimum value, peak for extracting low order audio frequency characteristics and dynamic audio frequency feature The data extracted using statistical function are combined, are converted by the function of degree, the degree of bias etc., extraction module 120, obtain high-order sound Frequency feature.The quantity for the high-order audio frequency characteristics that each audio-frequency unit extracts is often very big, but usually only small part high-order sound Frequency feature can make a significant impact the result of fraud identification, so, it is special that we reduce high-order audio with Feature Selection algorithm The quantity of sign improves fraud recognition speed.In the present embodiment, the Feature Selection algorithm can be before sequence to selection (Sequential Forward Selection, SFS) algorithm, sequence backward selection (Sequential Backward Selection, SBS) algorithm, bidirectional research (Bidirectional Search, BDS) algorithm, filtering characteristic selection (filter Feature selection) algorithm, it is also possible to other Feature Selection algorithms.

Similarly, when extraction module 120 extracts the expressive features of a video section, head first is extracted from the video section The low orders motion characteristics such as direction, eyeball direction and Facial action unit (action unit, AU), then count each low order movement spy The number occurred in the video section and lasting duration are levied, the high-order movement spy of the video section is constructed according to statistical result Sign, then filters out high-order motion characteristic subset, by the high-order motion characteristic with Feature Selection algorithm from high-order motion characteristic Expressive features of the subset as the video section.

Training module 130 is used for Training Support Vector Machines, obtains speech analysis model and Expression analysis model.To extract The phonetic feature of each audio-frequency unit for the audio-video sample that module 120 is extracted and the fraud for obtaining the distribution of module 110 are labeled as sample Notebook data is trained the first support vector machines, obtains speech analysis model；The audio-video sample extracted with extraction module 120 The expressive features of this each video section and the fraud for obtaining the distribution of module 110 are labeled as sample data, to the second supporting vector Machine is trained, and obtains Expression analysis model.

Model application module 140, for analyzing the audio, video data of object to be identified, the audio for obtaining object to be identified is taken advantage of Cheat probability and video probability of cheating.The voice of the audio-frequency unit of the audio-video for the object to be identified that extraction module 120 is extracted is special The speech analysis model that the sign input training of training module 130 obtains, exports the audio probability of cheating P1 of the object to be identified；It will be to It identifies that the expressive features of object video part input trained Expression analysis model, exports the video fraud of the object to be identified Probability P 2.

Weighted calculation module 150, for weighting the audio probability of cheating P1 of object to be identified and video probability of cheating P2 It calculates, obtains the fraud recognition result of the object to be identified.Training module 130 is obtained using sample data Training Support Vector Machines When speech analysis model and Expression analysis model, the accuracy rate of two models can be counted, speech analysis model is calculated with this With the weight of Expression analysis model, the final probability of cheating of object to be identified is calculated.

For example, it is assumed that the accuracy rate of speech analysis model is 85%, the accuracy rate of Expression analysis model is 95%, calculates language Sound analysis model and the weight of Expression analysis model can be expressed as follows:

P (Audio)=a=0.85

P (Video)=b=0.95

Wherein, P (Audio) indicates the accuracy rate of speech analysis model, and P (Video) indicates the accurate of Expression analysis model Rate, W (Audio) indicate the weight of speech analysis model, and W (Video) indicates the weight of Expression analysis model.

Assuming that the audio, video data of object to be identified pass through speech analysis model, Expression analysis model analysis, obtain this to The audio probability of cheating for identifying object is 0.8, and video probability of cheating is 0.7, then is added according to W (Audio) and W (Video) Fusion calculation is weighed, the probability of cheating of the object to be identified is finally obtained are as follows:

P=(0.85/1.8) * 0.8+ (0.95/1.8) * 0.7

Referring to shown in Fig. 4, the fraud recognition methods preferred embodiment of audio analysis and video analysis is combined for the present invention Flow chart.Using framework shown in fig. 1 or fig. 2, start electronic device 1, processor 12 executes the fraud stored in memory 11 Recognizer 10 realizes following steps:

Step S10, using obtain module 110 collect personage's audio-video sample, as unit of mood to audio-video sample into Row cutting, obtains audio-video segment, distributes a fraud mark for each audio-video segment.The audio-video sample can be logical What the photographic device 30 of the photographic device 3 or Fig. 2 of crossing Fig. 1 obtained, it is also possible to select from the network information or audio and video data library The audio-video for being clearly present fraud taken and the normal audio-video without fraud.

Step S20 is decoded and pre-processes to each audio-video segment using module 110 is obtained, and obtains each sound view The audio fragment and video clip of frequency segment.The fraud mark of each audio-video segment is still as corresponding audio fragment and piece of video The fraud mark of section.

Step S30 extracts phonetic feature and table from each audio fragment and video clip using extraction module 120 respectively Feelings feature.The specific extracting method of phonetic feature and expressive features please refers to being discussed in detail for said extracted module 120.

Step S40 marks the first support vector machines of training according to the phonetic feature of each audio fragment and fraud, obtains language Sound analysis model marks the second support vector machines of training according to the expressive features of each video clip and fraud, obtains expression point Analyse model.Using training module 130, sample data is labeled as with the phonetic feature and fraud of each audio fragment, to first It holds vector machine to be trained, obtains speech analysis model；Sample number is labeled as with the expressive features and fraud of each video clip According to being trained to the second support vector machines, obtain Expression analysis model.

Step S50 carries out the audio, video data using the audio, video data for obtaining the acquisition object to be identified of module 110 Decoding and pretreatment, obtain the audio data and video data of the object to be identified.The camera shooting that the audio, video data passes through Fig. 1 The photographic device 30 of device 3 or Fig. 2 obtain in real time.

Step S60 extracts the phonetic feature and video counts of the audio data of the object to be identified using extraction module 120 According to expressive features.The specific extracting method of phonetic feature and expressive features please refers to being discussed in detail for said extracted module 120.

Step S70 distinguishes the expressive features of the phonetic feature of the audio data of the object to be identified and video data defeated Enter the speech analysis model and Expression analysis model, audio probability of cheating and the video fraud for obtaining the object to be identified are general Rate.Using model application module 140, the phonetic feature of the audio data for the object to be identified that extraction module 120 is extracted is defeated Enter the speech analysis model, exports the audio probability of cheating P1 of the object to be analyzed；By extraction module 120 extract should be to Identify that the expressive features of the video data of object input the Expression analysis model, the video fraud for exporting the object to be identified is general Rate P2.

Step S80 is obtained according to the weight of the speech analysis model and Expression analysis model by P1, P2 weighted calculation The fraud recognition result of object to be identified.The determination method and P1, P2 of the weight of speech analysis model and Expression analysis model add The detailed process that power calculates please refers to being discussed in detail for above-mentioned weighted calculation module 150.

In addition, the embodiment of the present invention also proposes a kind of computer readable storage medium, the computer readable storage medium It can be hard disk, multimedia card, SD card, flash card, SMC, read-only memory (ROM), Erasable Programmable Read Only Memory EPROM (EPROM), any one in portable compact disc read-only memory (CD-ROM), USB storage etc. or several timess Meaning combination.It include that audio-video sample and fraud recognizer 10, the fraud identify journey in the computer readable storage medium Following operation is realized when sequence 10 is executed by processor:.

The specific embodiment of the computer readable storage medium of the present invention and above-mentioned combination audio analysis and video analysis Fraud recognition methods and electronic device 1 specific embodiment it is roughly the same, details are not described herein.

It should be noted that, in this document, the terms "include", "comprise" or its any other variant are intended to non-row His property includes, so that the process, device, article or the method that include a series of elements not only include those elements, and And further include other elements that are not explicitly listed, or further include for this process, device, article or method institute it is intrinsic Element.In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that including being somebody's turn to do There is also other identical elements in the process, device of element, article or method.

The serial number of the above embodiments of the invention is only for description, does not represent the advantages or disadvantages of the embodiments.Pass through above embodiment party The description of formula, it is required general that those skilled in the art can be understood that above-described embodiment method can add by software The mode of hardware platform is realized, naturally it is also possible to which by hardware, but in many cases, the former is more preferably embodiment.It is based on Such understanding, substantially the part that contributes to existing technology can be with software product in other words for technical solution of the present invention Form embody, which is stored in a storage medium (such as ROM/RAM, magnetic disk, light as described above Disk) in, including some instructions use is so that a terminal device (can be mobile phone, computer, server or the network equipment Deng) execute method described in each embodiment of the present invention.

The above is only a preferred embodiment of the present invention, is not intended to limit the scope of the invention, all to utilize this hair Equivalent structure or equivalent flow shift made by bright specification and accompanying drawing content is applied directly or indirectly in other relevant skills Art field, is included within the scope of the present invention.

Claims

1. the fraud recognition methods of a kind of combination audio analysis and video analysis is applied to electronic device, which is characterized in that the party Method includes:

Sample preparation process: personage's audio-video sample is collected, audio-video sample is cut, audio-video segment is obtained, is every A audio-video segment distributes a fraud mark, and each audio-video segment is decoded and is pre-processed, obtains each audio-video The audio fragment and video clip of segment；

Characteristic extraction step: extracting phonetic feature from each audio fragment, extracts expressive features from each video clip；

Model training step: sample data is labeled as with the phonetic feature and fraud of each audio fragment, to the first support vector machines It is trained, obtains speech analysis model；It is labeled as sample data with the expressive features and fraud of each video clip, to second It holds vector machine to be trained, obtains Expression analysis model；

Model applying step: acquiring the audio, video data of object to be identified, utilizes the speech analysis model and Expression analysis mould Type analyzes the audio, video data, exports the audio probability of cheating P1 and video probability of cheating P2 of the object to be identified；And

Weighted calculation step: it according to the weight of the speech analysis model and Expression analysis model by P1, P2 weighted calculation, obtains The fraud recognition result of the object to be identified.

2. fraud recognition methods as described in claim 1, which is characterized in that extract phonetic feature in the characteristic extraction step Include:

Second feature extraction step: extracting dynamic regressioncoefficients from each low order phonetic feature, obtains the dynamic of each audio fragment State audio frequency characteristics；

Third feature extraction step: each audio is extracted from the low order audio frequency characteristics and dynamic audio frequency feature using statistical function The high-order audio frequency characteristics of segment；And

Screening step: high-order audio frequency characteristics is filtered out from the high-order audio frequency characteristics of each audio fragment using Feature Selection algorithm Collection, using high-order audio frequency characteristics subset as the phonetic feature of each audio fragment.

3. fraud recognition methods as claimed in claim 2, which is characterized in that the low order audio frequency characteristics include that mel-frequency falls Spectral coefficient, pitch and zero-crossing rate.

4. fraud recognition methods as described in claim 1, which is characterized in that extract expressive features in the characteristic extraction step Include:

High-order latent structure step: counting the number and lasting duration that each low order motion characteristic occurs in each video clip, The high-order motion characteristic of each video clip is constructed according to statistical result；And

Screening step: high-order motion characteristic is filtered out from the high-order motion characteristic of each video clip using Feature Selection algorithm Collection, using high-order motion characteristic subset as the expressive features of each video clip.

5. fraud recognition methods as claimed in claim 4, which is characterized in that the low order motion characteristic include head towards, Eyeball direction and Facial action unit.

6. fraud recognition methods as described in claim 1, which is characterized in that the model applying step further includes following step It is rapid:

The audio, video data of object to be identified is decoded and is pre-processed, the audio data and video of the object to be identified are obtained Data；

Phonetic feature is extracted from the audio data of the object to be identified, extracts expression from the video data of the object to be identified Feature.

7. a kind of electronic device, including memory and processor, which is characterized in that include fraud identification journey in the memory Sequence, the fraud recognizer realize following steps when being executed by the processor:

8. electronic device as claimed in claim 7, which is characterized in that extract phonetic feature packet in the characteristic extraction step It includes:

9. electronic device as claimed in claim 7, which is characterized in that extract expressive features packet in the characteristic extraction step It includes:

10. a kind of computer readable storage medium, which is characterized in that include fraud identification in the computer readable storage medium Program when the fraud recognizer is executed by processor, realizes that fraud described in any one of claims 1 to 6 such as identifies The step of method.