CN112331230A - Method and device for identifying fraudulent conduct, computer equipment and storage medium - Google Patents


Info

Publication number
CN112331230A
CN112331230A (application CN202011286464.5A)
Authority
CN
China
Prior art keywords
encoder, self-encoder, training, data, fraud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011286464.5A
Other languages
Chinese (zh)
Inventor
李响
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011286464.5A priority Critical patent/CN112331230A/en
Publication of CN112331230A publication Critical patent/CN112331230A/en
Priority to PCT/CN2021/096471 priority patent/WO2022105169A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Abstract

The embodiment of the application belongs to the technical field of voice signal processing, and relates to a fraud identification method, apparatus, computer device and storage medium based on a deep self-encoder. In addition, the application also relates to blockchain technology: the original voice data of a user can be stored in a blockchain. Because most training corpora are normal, fraud-free samples, the deep self-encoder's encoding and decoding produce no large signal error on fraud-free speech, which avoids the problem that fraudulent samples are difficult to identify accurately.

Description

Method and device for identifying fraudulent conduct, computer equipment and storage medium
Technical Field
The present application relates to the field of speech signal processing technologies, and in particular, to a method and an apparatus for identifying fraudulent conduct based on a depth self-encoder, a computer device, and a storage medium.
Background
A face-to-face review is a conversation in which a bank loan officer learns the borrower's loan motivation and financial situation and pre-judges potential credit risk and fraud risk. However, criminals deceive banks into granting loans, causing banks huge losses. Lie identification is therefore of great significance for preventing telephone fraud, assisting criminal investigation, and intelligence analysis, and research on lie detection is a current research hotspot.
In the existing fraud identification method, sufficiently experienced technicians screen the training corpus for fraudulent behavior, the labeled corpus is used to train a speech emotion model, and the trained model then processes speech data to confirm whether fraud is present.
However, the conventional fraud identification method is not intelligent: technicians must screen a large amount of training corpus, so the data-annotation workload it requires is extremely large; meanwhile, subjective judgment during screening is often careless, which greatly reduces recognition accuracy.
Disclosure of Invention
The embodiment of the application aims to provide a method and an apparatus for identifying fraudulent conduct based on a deep self-encoder, a computer device, and a storage medium, so as to solve the problems that the traditional fraud identification method requires an extremely large data-annotation workload and has low identification accuracy.
In order to solve the above technical problem, an embodiment of the present application provides a method for identifying a fraudulent conduct based on a depth self-encoder, which adopts the following technical solutions:
when performing a face examination, receiving original voice data acquired by an audio acquisition device;
inputting the original voice data into a pre-trained depth self-encoder to perform encoding and decoding operations to obtain an encoding and decoding result;
comparing the original voice data with the encoding and decoding results to obtain an error value;
judging whether the error value meets a preset fraud threshold value or not;
if the error value meets the fraud threshold, determining that the original voice data has fraud behaviors;
and if the error value does not meet the fraud threshold, determining that the original voice data has no fraud.
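Taken together, the claimed steps amount to a reconstruction-error check. A minimal sketch, assuming a mean-squared-error comparison and a placeholder model standing in for the trained deep self-encoder of the application:

```python
import numpy as np

def identify_fraud(original, codec, fraud_threshold):
    """Run the encode-decode operation, compare the result with the
    original signal, and judge the error against the fraud threshold."""
    reconstructed = codec(original)                      # encoding/decoding result
    error = float(np.mean((original - reconstructed) ** 2))
    return error > fraud_threshold                       # True => fraud suspected

# Stand-in "codec": a model trained on normal speech reconstructs it almost
# perfectly, so the identity function mimics the fraud-free case; a shifted
# output mimics a poorly reconstructed (suspicious) signal.
features = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(identify_fraud(features, lambda x: x, 0.02))        # False (no fraud)
print(identify_fraud(features, lambda x: x + 2.0, 0.02))  # True  (large error)
```

The threshold 0.02 here echoes the worked example later in the description; it is a user-set parameter, not a fixed constant of the method.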
In order to solve the above technical problem, an embodiment of the present application further provides a device for identifying a fraudulent conduct based on a depth self-encoder, which adopts the following technical solutions:
the voice acquisition module is used for receiving the original voice data acquired by the audio acquisition equipment during face examination;
the coding and decoding module is used for inputting the original voice data into a pre-trained depth self-encoder to carry out coding and decoding operation so as to obtain a coding and decoding result;
the comparison operation module is used for performing comparison operation on the original voice data and the coding and decoding results to obtain an error value;
the threshold value judging module is used for judging whether the error value meets a preset fraud threshold value or not;
a first behavior determining module, configured to determine that a fraud behavior exists in the original voice data if the error value satisfies the fraud threshold;
and the second behavior determining module is used for determining that the original voice data has no fraud behavior if the error value does not meet the fraud threshold.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
The computer device comprises a memory in which computer readable instructions are stored and a processor which, when executing the computer readable instructions, implements the steps of the depth self-encoder based fraud identification method described above.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
the computer readable storage medium has stored thereon computer readable instructions which, when executed by a processor, implement the steps of the depth self-encoder based fraud identification method as described above.
Compared with the prior art, the method, the device, the computer equipment and the storage medium for identifying the cheating behavior based on the depth self-encoder provided by the embodiment of the application have the following main beneficial effects:
the method comprises the steps of carrying out coding and decoding operations on original voice data through a trained deep self-encoder to obtain coding and decoding results recovered based on the deep self-encoder, comparing signal errors of the original voice data and the coding and decoding results to confirm whether fraud exists in the original voice data, and since most of training corpora are normal samples without fraud, the coding and decoding of the deep self-encoder cannot generate large signal errors when no fraud voice exists, so that the problem that a traditional speech emotion model is difficult to accurately identify fraud samples is avoided.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of an implementation of a deep self-encoder based fraud identification method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an auto-encoder architecture according to an embodiment of the present application;
fig. 3 is a flowchart of an implementation of a depth self-encoder obtaining method according to an embodiment of the present application;
FIG. 4 is a flowchart of an implementation of a depth self-encoder optimization method according to an embodiment of the present application;
FIG. 5 is a flowchart of an implementation of step S301 in FIG. 4;
fig. 6 is a schematic structural diagram of a deep self-encoder based fraud detection apparatus according to a second embodiment of the present application;
fig. 7 is a schematic structural diagram of an acquisition apparatus of a depth self-encoder according to a second embodiment of the present application;
fig. 8 is a schematic structural diagram of an optimization apparatus of a depth self-encoder according to a second embodiment of the present application;
FIG. 9 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
Example one
Referring to fig. 1, a flowchart of an implementation of a deep self-encoder-based fraud identification method according to an embodiment of the present application is shown, and for convenience of description, only the portion related to the present application is shown.
In step S101, when performing an audit, the original voice data collected by the audio collecting device is received.
In the embodiment of the present application, a face review refers to a scenario in which a reviewer and a reviewed person conduct an in-person interview; application scenarios include "school interviews", "civil-service interviews", "loan reviews", and the like.
In the embodiment of the application, the audio acquisition device is mainly used for acquiring voice signals: it collects audio data in the face-review environment through a microphone.
In the embodiment of the application, the original voice data refers to the voice information uttered by the reviewed person and collected during the review. The voice data is processed by a voiceprint recognition network to obtain voiceprint data based on the voice data.
In step S102, the original speech data is input to a pre-trained deep auto-encoder to perform encoding and decoding operations, so as to obtain an encoding and decoding result.
In the embodiment of the present application, the deep self-encoder is mainly used for recognizing fraudulent speech and is composed of a speech encoder and a speech decoder. The main function of the speech encoder is to encode PCM (pulse code modulation) samples of the user's speech into a small number of bits (frames); this makes the speech robust against bit errors introduced by the link, network jitter, and bursty transmission. At the receiving end, the speech frames are first converted back into PCM speech samples and then into a speech waveform: the speech decoder converts the encoding result output by the speech encoder into voice output data. Under normal conditions, the converted voice output data is consistent with the speech of the user who produced it; if not, the user's speech is considered to involve fraudulent behavior.
In the embodiment of the application, "pre-trained" means that before the deep self-encoder is used, it is trained with an objective function equal to the difference between the decoded data and the original input data. The distributions of fraud-free voice data have a certain similarity, whereas fraudulent speech data is distributed quite differently from non-fraudulent speech data and therefore cannot be recovered well after being passed through the self-encoder. Thus, fraudulent speech can be identified using the self-encoder.
In this embodiment, the encoding and decoding result refers to the speech output data obtained by converting the encoding result output by the speech encoder into the speech decoder, and the speech output data is the encoding and decoding result.
In the embodiment of the present application, referring to fig. 2, a schematic diagram of the architecture of a self-encoder provided in the embodiment of the present application is shown: input data passes through the encoder to obtain an encoding result, which then passes through the decoder to recover the input data. f_θ(x) denotes the mapping function of the deep encoder neural network, characterizing the nonlinear mapping from the input vector x to the coding-layer representation vector y = f_θ(x); y is output as the encoded data. f′_θ′(y) denotes the mapping function of the deep decoder neural network, characterizing the nonlinear mapping from the coding-layer representation vector y to the reconstruction vector z = f′_θ′(y); z is output as the decoded data.
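The mappings y = f_θ(x) and z = f′_θ′(y) can be sketched as a single-layer autoencoder with tied weights; the sigmoid nonlinearity, dimensions, and initialization here are illustrative assumptions, not taken from the application:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class AutoEncoder:
    """y = f_theta(x) encodes, z = f'_theta'(y) decodes (tied weights)."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 0.1, size=(n_hidden, n_in))  # theta  = {w, b}
        self.b = np.zeros(n_hidden)
        self.b_prime = np.zeros(n_in)                         # theta' = {w^T, b'}

    def encode(self, x):
        # Coding-layer representation y = f_theta(x)
        return sigmoid(self.w @ x + self.b)

    def decode(self, y):
        # Reconstruction z = f'_theta'(y), same dimensionality as the input
        return sigmoid(self.w.T @ y + self.b_prime)

ae = AutoEncoder(n_in=4, n_hidden=2)
x = np.array([0.2, 0.4, 0.6, 0.8])
z = ae.decode(ae.encode(x))
```

Using w and its transpose for encoder and decoder matches the θ = {w, b}, θ′ = {wᵀ, b′} parameterization given later in the description.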
In step S103, the original speech data and the encoding/decoding result are compared to obtain an error value.
In the embodiment of the present application, the comparison operation is mainly used to judge whether the sound-wave shapes of the original voice data and the encoding and decoding result are similar.
In the embodiment of the application, the sound-wave shapes corresponding to the original voice data and to the encoding and decoding result can be obtained through Fourier-transform processing, with upward excursions represented as '1' and downward excursions as '0'; all shapes can thus be written as strings of '1' and '0', yielding two sound-wave shape texts of equal length, and the error value between the original voice data and the encoding and decoding result is then calculated as a Hamming distance.
In the embodiment of the present application, the Hamming Distance (Hamming Distance) calculation method requires that the input texts have the same length, and the Hamming Distance refers to the number of digits of different letters between two texts.
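The comparison just described, mapping the sign of the waveform to '1'/'0' and then taking the Hamming distance over the two equal-length strings, can be sketched as follows; the sample waveforms are invented for illustration:

```python
import numpy as np

def sign_string(waveform):
    """Map each sample to '1' (upward/non-negative) or '0' (downward)."""
    return "".join("1" if s >= 0 else "0" for s in waveform)

def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length texts differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(ca != cb for ca, cb in zip(a, b))

original = sign_string(np.array([0.5, -0.2, 0.1, -0.7, 0.3]))   # "10101"
decoded  = sign_string(np.array([0.4, -0.1, -0.2, -0.6, 0.2]))  # "10001"
print(hamming_distance(original, decoded))  # 1
```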
In step S104, it is determined whether the error value satisfies a predetermined fraud threshold.
In the embodiment of the present application, the fraud threshold is mainly used to distinguish whether fraud is present in the original voice data, and the user may preset according to actual situations, for example, the fraud threshold may be 10, 15, 20, and the like, and it should be understood that the examples of the fraud threshold are only for convenience of understanding and are not used to limit the present application.
In step S105, if the error value satisfies the fraud threshold, it is determined that fraud exists in the original voice data.
In step S106, if the error value does not satisfy the fraud threshold, it is determined that the original voice data does not have fraud.
In practical applications, suppose the fraud threshold is set to 0.02 and the voiceprint 0 of the voice data input by the user is [1.0, 2.0, 3.0, 4.0, 5.0]. If the output voiceprint 1 obtained after encoding and decoding by the deep self-encoder is [1.1, 2.1, 3.1, 4.1, 5.1], then the mean square error between voiceprint 0 and voiceprint 1 is [(1.1-1.0)^2 + (2.1-2.0)^2 + (3.1-3.0)^2 + (4.1-4.0)^2 + (5.1-5.0)^2]/5 = 0.01 < 0.02, so the voice data is determined to have no fraud. If instead the output voiceprint 2 obtained after encoding and decoding by the deep self-encoder is [3.0, 4.0, 5.0, 6.0, 7.0], then the mean square error between voiceprint 0 and voiceprint 2 is [(3.0-1.0)^2 + (4.0-2.0)^2 + (5.0-3.0)^2 + (6.0-4.0)^2 + (7.0-5.0)^2]/5 = 4.0 > 0.02, so the voice data is determined to have fraud.
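The arithmetic of this example can be reproduced directly:

```python
import numpy as np

def mean_squared_error(v0, v1):
    v0, v1 = np.asarray(v0, dtype=float), np.asarray(v1, dtype=float)
    return float(np.mean((v0 - v1) ** 2))

fraud_threshold = 0.02
voiceprint0 = [1.0, 2.0, 3.0, 4.0, 5.0]   # input voiceprint
voiceprint1 = [1.1, 2.1, 3.1, 4.1, 5.1]   # close reconstruction
voiceprint2 = [3.0, 4.0, 5.0, 6.0, 7.0]   # poor reconstruction

e1 = mean_squared_error(voiceprint0, voiceprint1)   # ~0.01, below threshold
e2 = mean_squared_error(voiceprint0, voiceprint2)   # 4.0, above threshold
print(e1 > fraud_threshold, e2 > fraud_threshold)   # False True
```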
According to the deep self-encoder based fraud identification method provided by the embodiment of the application, the original voice data is encoded and decoded by a trained deep self-encoder to obtain an encoding and decoding result recovered by the deep self-encoder, and whether fraud exists in the original voice data can be confirmed by comparing the signal error between the original voice data and the encoding and decoding result. Because most training corpora are normal, fraud-free samples, the deep self-encoder's encoding and decoding produce no large signal error on fraud-free speech, which solves the problem that a traditional speech emotion model has difficulty accurately identifying fraudulent samples.
Continuing to refer to fig. 3, a flowchart of an implementation of a depth self-encoder obtaining method provided in an embodiment of the present application is shown, and for convenience of description, only the portion related to the present application is shown.
In some optional implementation manners of the first embodiment of the present application, the method for identifying fraud based on a depth self-encoder further includes: step S201, step S202, step S203, and step S204.
In step S201, the local database is read, and the training speech data is acquired in the local database.
In the embodiment of the application, the local database stores analyzed voice data samples in advance. The voice data samples can be obtained by screening by technical personnel; further, in order to avoid the limitations of subjective judgment, emotional fluctuation can be analyzed from the voice signals to screen samples, improving the accuracy of the samples.
In step S202, a default self-encoder is constructed, which is composed of at least one self-encoder.
In the embodiment of the application, each self-encoder is a neural network whose learning target equals its input, and its structure is divided into an encoder part and a decoder part. Given an input x in the input space X and a code h in the feature space F, the autoencoder solves the mappings f and g that minimize the reconstruction error of the input features:

f: X → F
g: F → X
f, g = argmin_{f, g} || X - g(f(X)) ||^2
after the solution is completed, the hidden layer feature h output by the encoder, i.e., "encoded feature", may be regarded as a representation of the input data X.
In the embodiment of the present application, the number of the self-encoders may be selected according to the actual situation of the user, and as an example, the number of the self-encoders may be 4, 6, and so on.
In step S203, it is determined whether the default self-encoder is composed of one self-encoder.
In the embodiment of the present application, the training mode of the default self-encoder is determined according to the number of self-encoders it comprises.
In step S204, if the default self-encoder is composed of one self-encoder, the training speech data is input to the self-encoder for self-encoder training operation, so as to obtain a pre-trained deep self-encoder.
In the embodiment of the present application, when there is only one self-encoder in the default self-encoder, the depth self-encoder can be obtained only by training the self-encoder.
In step S205, if the default self-encoder is composed of more than one self-encoder, the training speech data is input to the first self-encoder of the depth self-encoder for the self-encoder training operation, so as to obtain first training data.
In step S206, the first training data is input to the second self-encoder for the self-encoder training operation, and the remaining self-encoders are trained one by one in the same sequence;
in step S207, after all the self-encoders complete the self-encoder training operation, the pre-trained depth self-encoder is obtained.
In the embodiment of the application, a user can preset the number of the components of the self-encoder in the depth self-encoder, and distribute the corresponding training mode according to the number of the components.
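Steps S204 to S207 amount to greedy layer-wise training: the output of each trained self-encoder becomes the training data of the next. A control-flow sketch, with a toy stand-in for the per-layer training routine (a real trainer would minimize reconstruction error):

```python
import numpy as np

def train_stack(training_data, layer_sizes, train_one_layer):
    """Train self-encoders one by one; the code produced by each trained
    layer is the training data ("first training data", ...) for the next."""
    encoders, data = [], training_data
    for n_hidden in layer_sizes:
        encode = train_one_layer(data, n_hidden)  # self-encoder training op
        encoders.append(encode)
        data = encode(data)                       # feed forward to next layer
    return encoders

# Toy per-layer "trainer" that merely truncates dimensions, standing in
# for real per-layer optimization.
toy_trainer = lambda data, n: (lambda d: d[..., :n])
stack = train_stack(np.zeros((10, 8)), [6, 4], toy_trainer)

codes = np.zeros((10, 8))
for encode in stack:
    codes = encode(codes)
print(codes.shape)  # (10, 4)
```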
Continuing to refer to fig. 4, a flowchart of an implementation of the depth self-encoder optimization method provided in the embodiment of the present application is shown, and for convenience of description, only the portion related to the present application is shown.
In some optional implementation manners of the first embodiment of the present application, before step S207, the method further includes: step S301.
In step S301, the depth self-encoder is tuned based on the error back-propagation algorithm to minimize the input and output errors of the depth self-encoder.
In the embodiment of the application, the error back-propagation algorithm is one of the most important, effective, and widely applied algorithms in automatic control. It works by feeding the error data output by the output layer back to each self-encoder; each self-encoder corrects its own weights according to the error data, thereby realizing self-optimization.
In practical applications, the tuning operation is as follows:
1) initialization
2) Inputting training sample pairs, calculating output of each layer
3) Calculating network output error
4) Calculating error signals of each layer
5) Adjust the weight of each layer
6) Checking whether the total error of the network meets the precision requirement
If yes, ending the training; if not, returning to the step 2.
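The six steps form a train-until-converged loop, sketched here with a toy one-parameter network standing in for the real layer computations:

```python
def train_until_converged(training_pass, total_error, tolerance, max_epochs=1000):
    """Repeat steps 2)-5); stop when step 6)'s precision requirement holds."""
    for epoch in range(1, max_epochs + 1):
        training_pass()                 # compute outputs, errors, adjust weights
        if total_error() <= tolerance:  # step 6): accuracy reached?
            return epoch                # yes: end training
    return max_epochs                   # no: gave up after max_epochs

# Toy "network": one weight w, trained so that w * 3 ~ 6.
state = {"w": 0.0}                                   # 1) initialization

def training_pass():
    grad = 2 * (state["w"] * 3 - 6) * 3              # d/dw of (3w - 6)^2
    state["w"] -= 0.01 * grad                        # 5) adjust the weight

def total_error():
    return (state["w"] * 3 - 6) ** 2                 # 3) network output error

epochs = train_until_converged(training_pass, total_error, 1e-6)
print(round(state["w"], 3))  # 2.0
```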
The tuning operation is performed twice: the first time, Gaussian noise with a specific distribution is added at the input end of the coding layer; the second time, the input to the coding layer is forced to the binary values '0' or '1' by rounding, while in backward propagation the gradient is still computed on the floating-point real values.
Continuing to refer to fig. 5, a flowchart for implementing step S301 in fig. 4 is shown, and for convenience of illustration, only the portions relevant to the present application are shown.
In some optional implementation manners of the first embodiment of the present application, the step S301 specifically includes: step S401 and step S402.
In step S401, gaussian noise is added to the input from the encoder coding layer to generate errors in the input data.
In the embodiment of the present application, Gaussian noise is an error conforming to a Gaussian normal distribution. In some cases, adding suitable Gaussian noise to standard data gives the data a certain error and experimental value. Specifically, the Gaussian noise has a mean of 0, and its variance σ² is predetermined and kept constant during the first tuning training. Further, the variance σ² of the Gaussian noise is 0.3.
In the embodiment of the application, Gaussian noise with specific distribution is added at the input end of the coding layer, so that the output of the coding layer of the trained neural network of the depth self-encoder is approximate to 0-1 Boolean distribution. This is because the decoder network is very sensitive to the output of the coding layer, very small changes in the output of the coding layer will cause the decoder output to be different, and the goal of the self-encoder optimization is to reconstruct the input vector as much as possible, so the output of the decoder is relatively deterministic. When Gaussian noise with specific distribution is added at the input end of the coding layer, in order to adapt to the randomness in the training process of the neural network, the output of the coding layer tends to be 0-1 Boolean distribution, because only the output of the coding layer under the Boolean distribution is minimally affected by the randomness, so that the output of a decoder is ensured to be stable.
In step S402, when data is output from the output terminal of the encoder coding layer, a binarization operation is performed on the output data to reduce the influence of the randomness of the input data on the output data.
In the embodiment of the present application, the binarization operation refers to forcibly converting the output data of the coding layer into '0' or '1' by rounding, while the gradient is still computed on the floating-point real values during self-optimization.
In the embodiment of the application, when tuning training is performed by using an error back propagation algorithm, error minimization is always attempted, and when training is performed under the mechanism of forced binarization at the output of the coding layer, the floating-point real number at the output of the coding layer will tend to be distributed in 0-1 Boolean mode, because the error minimization can be performed only under the 0-1 Boolean distribution.
In the embodiment of the application, on the basis that the Gaussian noise with specific distribution is added at the input end of the coding layer in the first tuning training, the output of the coding layer is forcedly binarized on the basis of the second tuning training, so that the neural network of the depth self-encoder has the best performance after training.
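The two tuning passes can be sketched as follows. The variance value 0.3 follows the text; treating the rounding as a straight-through operation for gradients is an interpretive assumption noted in the comments:

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_code_input(code, sigma2=0.3):
    """First tuning pass: add zero-mean Gaussian noise of variance sigma^2
    (kept constant during training) at the coding-layer input."""
    return code + rng.normal(0.0, np.sqrt(sigma2), size=np.shape(code))

def binarize_code_output(code):
    """Second tuning pass: force the coding-layer output to 0/1 by rounding.
    In back-propagation the gradient would still be computed on the
    floating-point values (a straight-through style trick)."""
    return np.rint(code)

y = np.array([0.1, 0.8, 0.45, 0.55])
print(binarize_code_output(y))  # [0. 1. 0. 1.]
```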
In some optional implementations of the first embodiment of the present application, the training operation obtains the optimized parameters by minimizing:

θ*, θ′* = argmin_{θ, θ′} (1/n) Σ_{i=1}^{n} E(x^(i), z^(i))

where n denotes the number of training data samples; θ = {w, b} and θ′ = {w^T, b′} respectively denote the parameter matrices of the encoder and the decoder; θ*, θ′* denote the optimized parameter matrices; x^(i) is the input of the self-encoder and z^(i) = f′_θ′(f_θ(x^(i))) is its output; E(x, z) is the loss function, expressed as:

E(x, z) = Σ_{k=1}^{N} (x_k - z_k)^2

where N is the vector dimension and k is the dimension subscript.
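Reading E(x, z) as a summed squared reconstruction error, consistent with the signal-error comparisons elsewhere in the application, the training objective can be sketched as:

```python
import numpy as np

def loss_E(x, z):
    """E(x, z): summed squared error over the N vector dimensions k."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return float(np.sum((x - z) ** 2))

def objective(samples, encode, decode):
    """(1/n) * sum_i E(x_i, z_i), with z_i = f'_theta'(f_theta(x_i))."""
    return sum(loss_E(x, decode(encode(x))) for x in samples) / len(samples)

# With identity encode/decode the reconstruction is perfect: objective 0.
samples = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(objective(samples, lambda x: x, lambda y: y))  # 0.0
```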
In summary, the present application provides a fraud identification method based on a deep self-encoder: the original voice data is encoded and decoded by a trained deep self-encoder to obtain an encoding and decoding result recovered by the deep self-encoder, and whether fraud exists in the original voice data can be confirmed by comparing the signal error between the original voice data and the encoding and decoding result. Meanwhile, Gaussian noise with a specific distribution is added at the input end of the coding layer, so that the coding-layer output of the trained deep self-encoder neural network approximates a 0-1 Boolean distribution. This is because the decoder network is very sensitive to the output of the coding layer: very small changes in the coding-layer output will change the decoder output, while the goal of self-encoder optimization is to reconstruct the input vector as faithfully as possible, so the decoder output must be relatively deterministic. When Gaussian noise with a specific distribution is added at the input end of the coding layer, the coding-layer output tends toward a 0-1 Boolean distribution during training in order to adapt to the randomness, because under a Boolean distribution the coding-layer output is least affected by the randomness, which ensures the stability of the decoder output. On the basis of the first tuning training, in which Gaussian noise with a specific distribution is added at the input end of the coding layer, the coding-layer output is forcibly binarized in the second tuning training, so that the deep self-encoder neural network performs best after training.
It is emphasized that, in order to further ensure the privacy and security of the original voice data, the original voice data may also be stored in a node of a block chain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated using cryptographic methods, where each data block contains the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium; when executed, the instructions can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and whose execution order is not necessarily sequential but may alternate with other steps or with at least a portion of the sub-steps or stages of other steps.
Example two
With further reference to fig. 6, as an implementation of the method shown in fig. 1, a second embodiment of the present application provides a device for identifying fraud based on a depth self-encoder, where an embodiment of the device corresponds to the embodiment of the method shown in fig. 1, and the device may be applied to various electronic devices.
As shown in fig. 6, the depth self-encoder-based fraud recognition apparatus 100 according to the second embodiment of the present application includes: a voice acquisition module 101, an encoding and decoding module 102, a comparison operation module 103, a threshold judgment module 104, a first behavior determination module 105, and a second behavior determination module 106. Wherein:
the voice acquisition module 101 is used for receiving original voice data acquired by the audio acquisition equipment during face examination;
the encoding and decoding module 102 is configured to input original voice data to a pre-trained deep auto-encoder to perform encoding and decoding operations, so as to obtain an encoding and decoding result;
the comparison operation module 103 is configured to perform a comparison operation on the original voice data and the encoding and decoding result to obtain an error value;
a threshold determination module 104, configured to determine whether the error value satisfies a preset fraud threshold;
a first behavior determining module 105, configured to determine that the original voice data has a fraud behavior if the error value satisfies a fraud threshold;
and a second behavior determining module 106, configured to determine that the original voice data has no fraud behavior if the error value does not satisfy the fraud threshold.
In the embodiment of the present application, the review refers to a scenario in which an auditor and an audited person perform in-person review, and the application scenario of the review may be "school review", "officer review", "loan review", and the like.
In the embodiment of the application, the audio acquisition device is mainly used for acquiring voice signals; it collects audio data in the review environment through a microphone.
In the embodiment of the application, the original voice data refers to voice information sent by audited personnel collected during auditing. The voice data is identified through a voiceprint identification network, and voiceprint data based on the voice data is obtained.
In the embodiment of the present application, the depth self-encoder is mainly used for recognizing fraudulent speech and is composed of a speech encoder and a speech decoder. The main function of the speech encoder is to encode PCM (pulse code modulation) samples of user speech into a small number of bits (frames). This approach makes the speech robust in the presence of link bit errors, network jitter, and bursty transmissions. At the receiving end, the speech frames are first converted into PCM speech samples and then into a speech waveform; the speech decoder converts the encoding result output by the speech encoder into speech output data. Under normal conditions, the converted speech output data is consistent with the speech of the user to whom the speech encoder belongs; if not, fraud exists in the user's speech.
In the embodiment of the application, the pre-trained depth self-encoder means that the self-encoder is trained before use: the objective function of the training is the difference between the decoded data and the original input data, and the distributions of fraud-free voice data have a certain similarity to one another. The distributions of fraudulent speech data and fraud-free speech data differ greatly, so fraudulent speech cannot be recovered well after being input to the self-encoder. Thus, fraudulent speech can be identified using the self-encoder.
In this embodiment, the encoding and decoding result refers to the speech output data obtained by converting the encoding result output by the speech encoder into the speech decoder, and the speech output data is the encoding and decoding result.
In the embodiment of the present application, referring to fig. 2, a schematic diagram of the architecture of a self-encoder provided in the embodiment of the present application is shown: input data passes through the encoder to obtain an encoding result, which then passes through the decoder to recover the input data. f_θ(x) represents the mapping function of the depth encoder neural network, characterizing the non-linear mapping from the input vector x to the coding-layer representation vector y = f_θ(x); y is output as the encoded data. f′_θ′(y) represents the mapping function of the depth decoder neural network, characterizing the non-linear mapping from the coding-layer representation vector y to the reconstructed vector z = f′_θ′(y); z is output as the decoded data.
In the embodiment of the present application, the comparison operation is mainly used to distinguish whether the sound wave shapes of the original voice data and the encoding and decoding results are similar.
In the embodiment of the application, the sound wave shapes corresponding to the original voice data and the codec result can be obtained through Fourier transform processing, where an upward wave is represented as '1' and a downward wave as '0'; all the shapes can then be represented as strings of '1' and '0' to obtain two sound-wave shape texts of the same length, and the error value between the original voice data and the codec result is calculated according to the Hamming distance.
In the embodiment of the present application, the Hamming Distance (Hamming Distance) calculation method requires that the input texts have the same length, and the Hamming Distance refers to the number of digits of different letters between two texts.
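A minimal sketch of this Hamming-distance comparison (the two bit strings below are invented for illustration):

```python
def hamming_distance(a: str, b: str) -> int:
    # The Hamming distance requires inputs of equal length
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # Count the positions at which the two texts differ
    return sum(ch1 != ch2 for ch1, ch2 in zip(a, b))

# Two sound-wave shape texts of the same length ('1' = upward, '0' = downward)
original = "110100"
decoded = "110001"
print(hamming_distance(original, decoded))  # -> 2
```

The resulting count is the error value compared against the fraud threshold.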
In the embodiment of the present application, the fraud threshold is mainly used to distinguish whether fraud is present in the original voice data, and the user may preset according to actual situations, for example, the fraud threshold may be 10, 15, 20, and the like, and it should be understood that the examples of the fraud threshold are only for convenience of understanding and are not used to limit the present application.
In practical applications, suppose the fraud threshold is set to 0.02 and the voiceprint 0 of the voice data input by the user is [1.0, 2.0, 3.0, 4.0, 5.0]. If the output voiceprint 1 obtained after encoding and decoding by the depth self-encoder is [1.1, 2.1, 3.1, 4.1, 5.1], then the mean square error between voiceprint 0 and voiceprint 1 is [(1.1-1.0)^2 + (2.1-2.0)^2 + (3.1-3.0)^2 + (4.1-4.0)^2 + (5.1-5.0)^2]/5 = 0.01 < 0.02, and the voice data is determined to have no fraud; if the output voiceprint 2 obtained after encoding and decoding by the depth self-encoder is [3.0, 4.0, 5.0, 6.0, 7.0], then the mean square error between voiceprint 0 and voiceprint 2 is [(3.0-1.0)^2 + (4.0-2.0)^2 + (5.0-3.0)^2 + (6.0-4.0)^2 + (7.0-5.0)^2]/5 = 4.0 > 0.02, and the voice data is determined to have fraud.
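The worked example above can be checked with a short script; the threshold and voiceprints follow the numbers in the text, and the decision rule follows modules 105-106 (an error exceeding the threshold indicates fraud):

```python
def mean_square_error(v0, v1):
    # Average of squared element-wise differences between two voiceprints
    return sum((a - b) ** 2 for a, b in zip(v0, v1)) / len(v0)

FRAUD_THRESHOLD = 0.02
voiceprint0 = [1.0, 2.0, 3.0, 4.0, 5.0]
voiceprint1 = [1.1, 2.1, 3.1, 4.1, 5.1]  # recovered well by the self-encoder
voiceprint2 = [3.0, 4.0, 5.0, 6.0, 7.0]  # recovered poorly by the self-encoder

for vp in (voiceprint1, voiceprint2):
    err = mean_square_error(voiceprint0, vp)
    print(err, "fraud" if err > FRAUD_THRESHOLD else "no fraud")
```

The first comparison yields roughly 0.01 ("no fraud"); the second yields 4.0 ("fraud").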
The second embodiment of the application provides a fraud recognition device based on a depth self-encoder. Original voice data is encoded and decoded by the trained depth self-encoder to obtain the codec result recovered by the depth self-encoder, and whether fraud exists in the original voice data can be confirmed by comparing the signal error between the original voice data and the codec result.
Continuing to refer to fig. 7, a schematic structural diagram of a depth self-encoder obtaining apparatus according to the second embodiment of the present application is shown, and for convenience of description, only the relevant portions of the present application are shown.
In some optional implementations of the second embodiment of the present application, the apparatus 100 for identifying fraud based on a depth self-encoder further includes: the system comprises a training data acquisition module 107, a construction module 108, a composition judgment module 109, a first result module 110, a second result module 111, a training operation module 112 and a deep self-encoder confirmation module 113. Wherein:
a training data acquisition module 107, configured to read a local database and acquire training speech data in the local database;
a building module 108, configured to build a default self-encoder, where the default self-encoder is composed of at least one self-encoder;
a composition determining module 109, configured to determine whether the default self-encoder consists of one self-encoder;
a first result module 110, configured to, if the default self-encoder consists of one self-encoder, input training speech data to the self-encoder for self-encoder training operation, so as to obtain a depth self-encoder trained in advance;
a second result module 111, configured to, if the default self-encoder does not consist of only one self-encoder, input the training speech data to a first self-encoder in the depth self-encoder to perform a self-encoder training operation, so as to obtain first training data;
a training operation module 112, configured to input the first training data to the second self-encoder for self-encoder training operation, and train the remaining self-encoders one by one in sequence;
the self-depth encoder confirming module 113 obtains a pre-trained self-depth encoder after all self-encoders complete the self-encoder training operation.
In the embodiment of the application, the local database stores analyzed voice data samples in advance. The voice data samples may be screened by technicians; further, to avoid the limitation of subjective judgment, emotional fluctuation may be analyzed from the voice signals for screening, improving the accuracy of the samples.
In the embodiment of the application, each self-encoder is a neural network whose learning target is equal to its input, and its structure is divided into an encoder part and a decoder part. Given an input space X ∈ 𝒳 and a feature space h ∈ F, the self-encoder solves for the mappings f and g between them that minimize the reconstruction error of the input features:
f: 𝒳 → F
g: F → 𝒳
f, g = argmin_{f,g} ||X − g(f(X))||²
After the solution is completed, the hidden-layer feature h output by the encoder, i.e., the "encoded feature", may be regarded as a representation of the input data X.
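A tiny runnable illustration of the f: 𝒳 → F and g: F → 𝒳 mappings with a linear encoder/decoder pair; the weight matrices here are hand-picked rather than learned, just to show the reconstruction-error computation:

```python
import numpy as np

W = np.array([[0.5, 0.5]])           # encoder weights: f maps R^2 -> R^1
W_prime = np.array([[1.0], [1.0]])   # decoder weights: g maps R^1 -> R^2

def f(x):
    return W @ x          # hidden feature h, the "encoded feature"

def g(h):
    return W_prime @ h    # reconstruction z

x = np.array([3.0, 3.0])
h = f(x)
z = g(h)
reconstruction_error = float(np.sum((x - z) ** 2))
print(h, z, reconstruction_error)  # perfect reconstruction here: error 0.0
```

For inputs with unequal components this fixed pair reconstructs imperfectly; training adjusts f and g to minimize that error over the data.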
In the embodiment of the present application, the number of the self-encoders may be selected according to the actual situation of the user, and as an example, the number of the self-encoders may be 4, 6, and so on.
In the embodiment of the present application, the training mode of the default self-encoder is determined by determining the number of the default self-encoder components.
In the embodiment of the present application, when there is only one self-encoder in the default self-encoder, the depth self-encoder can be obtained only by training the self-encoder.
In the embodiment of the application, a user can preset the number of the components of the self-encoder in the depth self-encoder, and distribute the corresponding training mode according to the number of the components.
Continuing to refer to fig. 8, a schematic structural diagram of the depth self-encoder optimization apparatus according to the second embodiment of the present application is shown, and for convenience of description, only the portions related to the present application are shown.
In some optional implementations of the second embodiment of the present application, the apparatus 100 for identifying fraud based on a depth self-encoder further includes: a tuning operation module 114. Wherein:
and a tuning operation module 114 for tuning the depth self-encoder based on the error back-propagation algorithm to minimize the input and output errors of the depth self-encoder.
In the embodiment of the application, the error back-propagation algorithm is one of the most important, effective, and widely applied algorithms in automatic control. Its implementation feeds the error data output by the output layer back to each self-encoder, and each self-encoder corrects its weights according to the error data, thereby realizing self-optimization.
In practical applications, the tuning operation is as follows:
1) initialization
2) Inputting training sample pairs, calculating output of each layer
3) Calculating network output error
4) Calculating error signals of each layer
5) Adjust the weight of each layer
6) Checking whether the total error of the network meets the precision requirement
If yes, the training ends; if not, return to step 2).
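The six steps above amount to an ordinary gradient-descent loop. Below is a minimal sketch on a one-dimensional toy "self-encoder" z = w2·(w1·x); the network size, learning rate, and accuracy target are invented for demonstration:

```python
# Toy error back-propagation loop for z = w2 * (w1 * x), minimizing (x - z)^2
samples = [1.0, 2.0, 3.0]
w1, w2 = 0.5, 0.5                  # step 1) initialization
lr, target_error = 0.01, 1e-6

for epoch in range(2000):
    total_error = 0.0
    for x in samples:              # step 2) input training samples
        y = w1 * x                 # coding-layer output
        z = w2 * y                 # step 2) network output
        err = (x - z) ** 2         # step 3) network output error
        dz = -2.0 * (x - z)        # step 4) error signal at the output
        dw2 = dz * y               # step 4) error signals of each layer
        dw1 = dz * w2 * x
        w2 -= lr * dw2             # step 5) adjust the weights of each layer
        w1 -= lr * dw1
        total_error += err
    if total_error < target_error: # step 6) check the accuracy requirement
        break                      # if yes, end the training

print(round(w1 * w2, 3))  # -> 1.0 (the product w1*w2 learns to reconstruct x)
```

If the accuracy check fails, the loop simply continues with the next epoch, which corresponds to "return to step 2)".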
The tuning operation is specifically carried out twice: the first time, Gaussian noise with a specific distribution is added at the input end of the coding layer; the second time, the output of the coding layer is forced to be binary '0' or '1' by rounding, while in back-propagation the gradients are still computed as floating-point real numbers.
In some optional implementations of the second embodiment of the present application, the tuning operation module 114 specifically includes: a first tuning operation submodule and a second tuning operation submodule. Wherein:
the first tuning operation sub-module is used for adding Gaussian noise into the input end of the coding layer of the self-encoder so as to generate errors in input data;
and the second tuning operation submodule is used for performing a binarization operation on the output data at the output end of the coding layer of the self-encoder, so as to reduce the influence of the randomness of the input data on the output data.
In the embodiment of the present application, Gaussian noise is an error conforming to a Gaussian normal distribution. In some cases, adding proper Gaussian noise to standard data gives the data a certain error and makes it of experimental value. Specifically, the mean of the Gaussian noise is 0, and the variance σ² is predetermined and kept constant during the first tuning training. Further, the variance σ² of the Gaussian noise is 0.3.
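A sketch of this noise injection at the coding-layer input; note that NumPy's sampler takes a standard deviation, so √0.3 is passed for a variance of 0.3 (the sample size and seed are invented for demonstration):

```python
import numpy as np

VARIANCE = 0.3  # sigma^2 from the text; the mean is 0

def add_coding_layer_noise(x, rng):
    # Gaussian noise with mean 0 and variance 0.3, added at the coding-layer input
    return x + rng.normal(loc=0.0, scale=np.sqrt(VARIANCE), size=x.shape)

rng = np.random.default_rng(0)
x = np.zeros(100_000)
noisy = add_coding_layer_noise(x, rng)
print(float(noisy.mean()), float(noisy.var()))  # mean near 0, variance near 0.3
```

During the first tuning training this perturbation would be applied to every coding-layer input while the variance stays fixed.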
In the embodiment of the application, Gaussian noise with specific distribution is added at the input end of the coding layer, so that the output of the coding layer of the trained neural network of the depth self-encoder is approximate to 0-1 Boolean distribution. This is because the decoder network is very sensitive to the output of the coding layer, very small changes in the output of the coding layer will cause the decoder output to be different, and the goal of the self-encoder optimization is to reconstruct the input vector as much as possible, so the output of the decoder is relatively deterministic. When Gaussian noise with specific distribution is added at the input end of the coding layer, in order to adapt to the randomness in the training process of the neural network, the output of the coding layer tends to be 0-1 Boolean distribution, because only the output of the coding layer under the Boolean distribution is minimally affected by the randomness, so that the output of a decoder is ensured to be stable.
In the embodiment of the present application, the binarization operation refers to forcibly converting the output data of the coding layer into '0' or '1' by rounding, while the gradient is still computed as a floating-point real number during self-optimization.
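A sketch of this forced binarization with the gradient passed through unchanged (commonly known as a straight-through estimator; the separated forward/backward functions and the sample numbers are simplifications for illustration):

```python
import numpy as np

def binarize_forward(y):
    # Forward pass: round the coding-layer output to '0' or '1'
    return np.round(y)

def binarize_backward(grad_output):
    # Backward pass: treat the rounding as identity, so the gradient
    # flows through as an ordinary floating-point real number
    return grad_output

y = np.array([0.2, 0.51, 0.9])
print(binarize_forward(y))                          # -> [0. 1. 1.]
print(binarize_backward(np.array([0.1, -0.3, 0.2])))  # gradients unchanged
```

The rounding is only applied in the forward direction, matching the requirement that back-propagation still computes floating-point gradients.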
When the depth self-encoder is in forward propagation, Gaussian noise with specific distribution is added at the input end of the encoding layer.
In the embodiment of the present application, the mean of the Gaussian noise is 0, and the variance σ² is predetermined and remains unchanged in the first tuning training. Further, the variance σ² of the Gaussian noise is 0.3.
In some optional implementation manners of the first embodiment of the present application, step S301 specifically includes: when the depth self-encoder propagates forward, the output of the coding layer is forced to be binary '0' or '1' by rounding; in back-propagation, gradients are still computed as floating-point real numbers.
In the embodiment of the application, when tuning training is performed with the error back-propagation algorithm, the training always attempts to minimize the error; when training under this mechanism of forced binarization at the coding-layer output, the floating-point real numbers at the coding-layer output will tend toward a 0-1 Boolean distribution, because error minimization can only be achieved under the 0-1 Boolean distribution.
In the embodiment of the application, on the basis of the first tuning training, in which Gaussian noise with a specific distribution is added at the input end of the coding layer, the second tuning training forcibly binarizes the output of the coding layer, so that the depth self-encoder neural network performs best after training.
In some optional implementations of the second embodiment of the present application, the training operation is performed by minimizing the reconstruction error, and the optimized parameters θ*, θ′* are expressed as:
θ*, θ′* = argmin_{θ,θ′} (1/n) Σ_{i=1}^{n} E(x^(i), z^(i))
wherein n represents the number of training data samples; θ = {w, b} and θ′ = {wᵀ, b′} respectively denote the parameter matrices of the encoder and the decoder; θ*, θ′* denote the optimized parameter matrices; x^(i) is the input of the self-encoder, and z^(i) = f′_θ′(f_θ(x^(i))) is the output of the self-encoder; E(x, z) is a loss function, and E(x, z) is expressed as:
E(x, z) = Σ_{k=1}^{N} (x_k − z_k)²
where N is the vector dimension and k is the dimension subscript.
To sum up, the application provides a fraud recognition device based on a depth self-encoder. Original voice data is encoded and decoded by a trained depth self-encoder to obtain a codec result recovered by the depth self-encoder, and whether fraud exists in the original voice data can be confirmed by comparing the signal error between the original voice data and the codec result. Because most of the training corpus consists of normal, fraud-free samples, encoding and decoding fraud-free speech with the depth self-encoder does not produce a large signal error, which avoids the difficulty of traditional speech emotion models in accurately identifying fraudulent samples; moreover, the application does not require technicians to check whether the training corpus contains fraud, greatly saving manpower and material resources. Meanwhile, Gaussian noise with a specific distribution is added at the input end of the coding layer, so that the coding-layer output of the trained depth self-encoder neural network approximates a 0-1 Boolean distribution. This is because the decoder network is very sensitive to the coding-layer output: very small changes in that output cause the decoder output to differ, while the goal of self-encoder optimization is to reconstruct the input vector as closely as possible, so the decoder output is relatively deterministic.
When Gaussian noise with a specific distribution is added at the input end of the coding layer, the coding-layer output tends toward a 0-1 Boolean distribution during training in order to adapt to this randomness, because under the Boolean distribution the influence of the randomness on the coding-layer output is minimal, which ensures the stability of the decoder output. On the basis of the first tuning training, in which Gaussian noise with a specific distribution is added at the input end of the coding layer, the second tuning training forcibly binarizes the coding-layer output, so that the depth self-encoder neural network performs best after training.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 9, fig. 9 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 200 includes a memory 210, a processor 220, and a network interface 230 communicatively coupled to each other via a system bus. It is noted that only a computer device 200 having components 210-230 is shown, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. As understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 210 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 210 may be an internal storage unit of the computer device 200, such as a hard disk or a memory of the computer device 200. In other embodiments, the memory 210 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 200. Of course, the memory 210 may also include both internal and external storage devices of the computer device 200. In this embodiment, the memory 210 is generally used for storing an operating system and various types of application software installed on the computer device 200, such as computer readable instructions of a deep self-encoder-based fraud identification method. In addition, the memory 210 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 220 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 220 is generally operative to control overall operation of the computer device 200. In this embodiment, the processor 220 is configured to execute the computer readable instructions or process data stored in the memory 210, for example, execute the computer readable instructions of the deep self-encoder based fraud detection method.
The network interface 230 may include a wireless network interface or a wired network interface, and the network interface 230 is generally used to establish a communication connection between the computer device 200 and other electronic devices.
The application provides a fraud identification method based on a depth self-encoder. Original voice data is encoded and decoded by a trained depth self-encoder to obtain a codec result recovered by the depth self-encoder, and whether fraud exists in the original voice data can be confirmed by comparing the signal error between the original voice data and the codec result. Because most of the training corpus consists of normal, fraud-free samples, encoding and decoding fraud-free speech with the depth self-encoder does not produce a large signal error, which avoids the difficulty of traditional speech emotion models in accurately identifying fraudulent samples.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the deep self-encoder based fraud identification method as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the claims. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. Any equivalent structure made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, falls within the protection scope of the present application.

Claims (10)

1. A fraud identification method based on a depth self-encoder is characterized by comprising the following steps:
when performing a face-to-face review, receiving original voice data acquired by an audio acquisition device;
inputting the original voice data into a pre-trained depth self-encoder to perform encoding and decoding operations to obtain an encoding and decoding result;
comparing the original voice data with the encoding and decoding results to obtain an error value;
judging whether the error value meets a preset fraud threshold value or not;
if the error value meets the fraud threshold, determining that the original voice data has fraud behaviors;
and if the error value does not meet the fraud threshold, determining that the original voice data has no fraud.
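As an illustration of the decision rule in claim 1, the following sketch thresholds the reconstruction error between the original voice frames and the depth self-encoder's encoding and decoding result. The patent does not fix the error measure or any function names; mean squared error and the identifiers below are assumptions for illustration only.

```python
import numpy as np

def reconstruction_error(original, reconstructed):
    """Mean squared signal error between the original frames and the
    autoencoder's reconstruction (assumed error measure)."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return float(np.mean((original - reconstructed) ** 2))

def detect_fraud(original, reconstructed, fraud_threshold):
    """Claim-1 style decision: a large error means the sample does not
    resemble the (mostly fraud-free) training corpus."""
    return reconstruction_error(original, reconstructed) >= fraud_threshold

# Toy example: a faithful reconstruction stays below the threshold,
# a badly reconstructed signal exceeds it.
x = [0.1, 0.2, 0.3, 0.4]
good = [0.11, 0.19, 0.31, 0.39]
bad = [0.9, -0.5, 0.8, -0.2]
print(detect_fraud(x, good, 0.01))  # False
print(detect_fraud(x, bad, 0.01))   # True
```

In practice the threshold would be calibrated on held-out fraud-free samples so that their reconstruction errors fall below it.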
2. The method for identifying fraud according to claim 1, wherein before the step of inputting the original voice data into a pre-trained depth self-encoder to perform encoding and decoding operations to obtain an encoding and decoding result, the method further comprises:
reading a local database, and acquiring training voice data in the local database;
constructing a default self-encoder, wherein the default self-encoder consists of at least one self-encoder;
judging whether the default self-encoder consists of one self-encoder or not;
if the default self-encoder consists of one self-encoder, inputting the training voice data into the self-encoder to perform self-encoder training operation, and obtaining the depth self-encoder trained in advance;
if the default self-encoder consists of more than one self-encoder, inputting the training voice data into a first self-encoder in the depth self-encoder to perform the self-encoder training operation to obtain first training data;
inputting the first training data into a second self-encoder to perform the self-encoder training operation, and training the remaining self-encoders one by one in sequence;
and obtaining the depth autoencoder trained in advance after all the autoencoders finish the autoencoder training operation.
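The greedy layer-wise procedure of claim 2 can be sketched as follows: train the first self-encoder on the speech features, feed its codes to the next self-encoder, and repeat. This is an illustrative stand-in (linear autoencoders, plain per-sample gradient descent, invented names such as `train_stacked`), not the patent's exact network.

```python
import numpy as np

rng = np.random.default_rng(42)

class LinearAutoEncoder:
    """Minimal single-layer autoencoder (linear, untied weights) used as
    a stand-in for each self-encoder in the stack."""
    def __init__(self, n_in, n_hidden, lr=0.01):
        self.w_enc = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b_enc = np.zeros(n_hidden)
        self.w_dec = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b_dec = np.zeros(n_in)
        self.lr = lr

    def encode(self, x):
        return x @ self.w_enc + self.b_enc

    def decode(self, h):
        return h @ self.w_dec + self.b_dec

    def fit(self, data, epochs=200):
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                z = self.decode(h)
                d_z = 2.0 * (z - x)          # dL/dz for L = ||z - x||^2
                d_w_dec = np.outer(h, d_z)
                d_h = d_z @ self.w_dec.T
                d_w_enc = np.outer(x, d_h)
                self.w_dec -= self.lr * d_w_dec
                self.b_dec -= self.lr * d_z
                self.w_enc -= self.lr * d_w_enc
                self.b_enc -= self.lr * d_h
        return self

def train_stacked(data, layer_sizes):
    """Claim-2 style stacking: each self-encoder is trained on the codes
    produced by the previous one."""
    encoders, current = [], data
    for n_in, n_hidden in zip([data.shape[1]] + layer_sizes[:-1], layer_sizes):
        ae = LinearAutoEncoder(n_in, n_hidden).fit(current)
        encoders.append(ae)
        current = np.array([ae.encode(x) for x in current])
    return encoders
```

After stacking, the whole network would typically be fine-tuned end to end, which is what claim 3's back-propagation tuning step corresponds to.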
3. The method for identifying fraud according to claim 2, wherein after the step of obtaining the depth self-encoder trained in advance once all the self-encoders have finished the self-encoder training operation, the method further comprises:
and carrying out tuning operation on the depth self-encoder based on an error back propagation algorithm so as to minimize input and output errors of the depth self-encoder.
4. The method for identifying fraud according to claim 3, wherein the step of performing tuning operation on the depth self-encoder based on the error back-propagation algorithm to minimize the input and output errors of the depth self-encoder specifically comprises:
adding Gaussian noise to the input end of the coding layer of the self-encoder so that the input data contain errors;
when the output end of the coding layer of the self-encoder outputs data, performing a binarization operation on the output data so as to reduce the influence of the randomness of the input data on the output data.
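The tuning in claim 4 resembles a denoising autoencoder: Gaussian noise is injected at the coding layer's input, and the codes are binarized at its output so that small input perturbations do not change the code. A minimal sketch, with assumed sigmoid codes and illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(7)

def noisy_encode(x, w, b, noise_std=0.05):
    """Add Gaussian noise at the coding layer's input (claim-4 style),
    then compute sigmoid codes. noise_std is an illustrative value."""
    x_noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    return 1.0 / (1.0 + np.exp(-(x_noisy @ w + b)))

def binarize(codes, threshold=0.5):
    """Binarize the coding layer's output so the code is insensitive
    to the randomness of the noisy input."""
    return (codes >= threshold).astype(float)
```

During tuning, the loss would be computed against the clean input, forcing the network to remove the injected noise; the binarized codes then give a stable representation for the downstream comparison.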
5. The method of claim 2, wherein the training operation is performed by minimizing a loss over the parameter matrices, the optimized parameters θ*, θ′* being expressed as:

θ*, θ′* = argmin_{θ, θ′} (1/n) Σ_{i=1}^{n} E(x^{(i)}, z^{(i)})

wherein n represents the number of training data samples; θ = {w, b} and θ′ = {w^T, b′} respectively denote the parameter matrices of the encoder and the decoder; θ* and θ′* denote the optimized parameter matrices; x^{(i)} is the input of the self-encoder, and z^{(i)} = f′_{θ′}(f_θ(x^{(i)})) is the output of the self-encoder; E(x, z) is the loss function, expressed as:

E(x, z) = Σ_{k=1}^{N} (x_k − z_k)²

where N is the vector dimension and k is the dimension subscript.
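Assuming the squared-error form of E (consistent with the signal-error comparison elsewhere in the application), the claim-5 objective can be evaluated numerically as below; the function names are illustrative.

```python
import numpy as np

def loss_E(x, z):
    """E(x, z) = sum over the N vector dimensions k of (x_k - z_k)^2."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return float(np.sum((x - z) ** 2))

def objective(xs, zs):
    """(1/n) * sum_i E(x^(i), z^(i)): the quantity minimized over the
    encoder/decoder parameter matrices theta and theta'."""
    return sum(loss_E(x, z) for x, z in zip(xs, zs)) / len(xs)

xs = [[1.0, 2.0], [0.0, 1.0]]
zs = [[1.0, 1.0], [0.5, 1.0]]
print(objective(xs, zs))  # (1.0 + 0.25) / 2 = 0.625
```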
6. The method for identifying fraud based on a depth self-encoder according to claim 1, wherein after the step of receiving original voice data acquired by an audio acquisition device when performing a face-to-face review, the method further comprises:
storing the raw speech data into a blockchain.
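Claim 6 only states that the raw voice data are stored into a blockchain, without fixing the ledger or its interface. A toy hash-chain sketch under that assumption (names such as `append_block` are invented; a real deployment would use an actual blockchain platform):

```python
import hashlib
import json

def append_block(chain, raw_speech_bytes):
    """Append a block holding a SHA-256 digest of the raw speech data,
    linked to the previous block's hash (toy hash chain only)."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "prev": prev_hash,
        "speech_sha256": hashlib.sha256(raw_speech_bytes).hexdigest(),
    }
    block_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    chain.append({**body, "hash": block_hash})
    return chain
```

Storing only a digest on-chain keeps the voice recording itself off the ledger while still making later tampering with the stored recording detectable.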
7. An apparatus for identifying fraud based on a depth self-encoder, comprising:
the voice acquisition module is used for receiving the original voice data acquired by the audio acquisition equipment during a face-to-face review;
the coding and decoding module is used for inputting the original voice data into a pre-trained depth self-encoder to carry out coding and decoding operation so as to obtain a coding and decoding result;
the comparison operation module is used for performing comparison operation on the original voice data and the coding and decoding results to obtain an error value;
the threshold value judging module is used for judging whether the error value meets a preset fraud threshold value or not;
a first behavior determining module, configured to determine that a fraud behavior exists in the original voice data if the error value satisfies the fraud threshold;
and the second behavior determining module is used for determining that the original voice data has no fraud behavior if the error value does not meet the fraud threshold.
8. The apparatus for deep self-encoder based fraud identification of claim 7, further comprising:
the training data acquisition module is used for reading a local database and acquiring training voice data in the local database;
the construction module is used for constructing a default self-encoder, wherein the default self-encoder consists of at least one self-encoder;
the composition judgment module is used for judging whether the default self-encoder consists of one self-encoder or not;
a first result module, configured to, if the default self-encoder consists of one self-encoder, input the training speech data to the self-encoder for a self-encoder training operation, so as to obtain the depth self-encoder trained in advance;
a second result module, configured to, if the default self-encoder consists of more than one self-encoder, input the training speech data to a first self-encoder in the depth self-encoder to perform a self-encoder training operation, so as to obtain first training data;
the training operation module is used for inputting the first training data into a second self-encoder to carry out self-encoder training operation, and training the rest self-encoders one by one in sequence;
and the depth self-encoder confirming module is used for obtaining the depth self-encoder trained in advance after all the self-encoders finish the self-encoder training operation.
9. A computer device, comprising a memory in which computer readable instructions are stored and a processor which, when executing the computer readable instructions, implements the steps of the method for fraud identification based on a depth self-encoder according to any one of claims 1 to 6.
10. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the method for deep self-encoder based fraud identification according to any one of claims 1 to 6.
CN202011286464.5A 2020-11-17 2020-11-17 Method and device for identifying fraudulent conduct, computer equipment and storage medium Pending CN112331230A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011286464.5A CN112331230A (en) 2020-11-17 2020-11-17 Method and device for identifying fraudulent conduct, computer equipment and storage medium
PCT/CN2021/096471 WO2022105169A1 (en) 2020-11-17 2021-05-27 Fraud behavior recognition method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011286464.5A CN112331230A (en) 2020-11-17 2020-11-17 Method and device for identifying fraudulent conduct, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112331230A true CN112331230A (en) 2021-02-05

Family

ID=74321486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011286464.5A Pending CN112331230A (en) 2020-11-17 2020-11-17 Method and device for identifying fraudulent conduct, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112331230A (en)
WO (1) WO2022105169A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022105169A1 (en) * 2020-11-17 2022-05-27 平安科技(深圳)有限公司 Fraud behavior recognition method and apparatus, computer device and storage medium
CN114937455A (en) * 2022-07-21 2022-08-23 中国科学院自动化研究所 Voice detection method and device, equipment and storage medium
WO2022257630A1 (en) * 2021-06-11 2022-12-15 深圳般若计算机系统股份有限公司 Risk detection method and apparatus based on multi-modal concealed information test
CN116304762A (en) * 2023-05-17 2023-06-23 杭州致成电子科技有限公司 Method and device for decomposing load

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101999902A (en) * 2009-09-03 2011-04-06 上海天岸电子科技有限公司 Voiceprint lie detector and voiceprint lie detecting method
CN107958215A (en) * 2017-11-23 2018-04-24 深圳市分期乐网络科技有限公司 A kind of antifraud recognition methods, device, server and storage medium
CN108806695A (en) * 2018-04-17 2018-11-13 平安科技(深圳)有限公司 Anti- fraud method, apparatus, computer equipment and the storage medium of self refresh
CN109636061A (en) * 2018-12-25 2019-04-16 深圳市南山区人民医院 Training method, device, equipment and the storage medium of medical insurance Fraud Prediction network
CN110222554A (en) * 2019-04-16 2019-09-10 深圳壹账通智能科技有限公司 Cheat recognition methods, device, electronic equipment and storage medium
CN110473557A (en) * 2019-08-22 2019-11-19 杭州派尼澳电子科技有限公司 A kind of voice signal decoding method based on depth self-encoding encoder
CN110705585A (en) * 2019-08-22 2020-01-17 深圳壹账通智能科技有限公司 Network fraud identification method and device, computer device and storage medium
CN111178523A (en) * 2019-08-02 2020-05-19 腾讯科技(深圳)有限公司 Behavior detection method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9837078B2 (en) * 2012-11-09 2017-12-05 Mattersight Corporation Methods and apparatus for identifying fraudulent callers
CN105636047A (en) * 2014-10-29 2016-06-01 中兴通讯股份有限公司 Fraud user detecting method, fraud user detecting device and fraud user detecting system
CN107222865B (en) * 2017-04-28 2019-08-13 北京大学 Communication swindle real-time detection method and system based on suspicious actions identification
CN107680602A (en) * 2017-08-24 2018-02-09 平安科技(深圳)有限公司 Voice fraud recognition methods, device, terminal device and storage medium
CN109559217A (en) * 2018-10-25 2019-04-02 平安科技(深圳)有限公司 Loan data processing method, device, equipment and storage medium based on block chain
CN112331230A (en) * 2020-11-17 2021-02-05 平安科技(深圳)有限公司 Method and device for identifying fraudulent conduct, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2022105169A1 (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN112331230A (en) Method and device for identifying fraudulent conduct, computer equipment and storage medium
CN114218403B (en) Fault root cause positioning method, device, equipment and medium based on knowledge graph
CN111695338A (en) Interview content refining method, device, equipment and medium based on artificial intelligence
CN109979439B (en) Voice recognition method, device, medium and electronic equipment based on block chain
CN111222981A (en) Credibility determination method, device, equipment and storage medium
CN112084779A (en) Entity acquisition method, device, equipment and storage medium for semantic recognition
CN111694936B (en) Method, device, computer equipment and storage medium for identification of AI intelligent interview
CN112598039B (en) Method for obtaining positive samples in NLP (non-linear liquid) classification field and related equipment
CN117312562A (en) Training method, device, equipment and storage medium of content auditing model
CN112464281A (en) Network information analysis method based on privacy grouping and emotion recognition
CN116777646A (en) Artificial intelligence-based risk identification method, apparatus, device and storage medium
CN111625555A (en) Order matching method, device, equipment and storage medium
CN116542783A (en) Risk assessment method, device, equipment and storage medium based on artificial intelligence
CN110362981B (en) Method and system for judging abnormal behavior based on trusted device fingerprint
CN114610871A (en) Information system modeling analysis method based on artificial intelligence algorithm
CN112417886A (en) Intention entity information extraction method and device, computer equipment and storage medium
US11568308B2 (en) Correcting bias in supervised machine learning data
CN111832942A (en) Criminal transformation quality assessment system based on machine learning
CN113035230A (en) Authentication model training method and device and electronic equipment
CN117172632B (en) Enterprise abnormal behavior detection method, device, equipment and storage medium
CN112529303A (en) Risk prediction method, device, equipment and storage medium based on fuzzy decision
CN111915425A (en) Loan approval method, device, equipment and storage medium
CN111784180B (en) Analysis and interpretation method for evaluation results of criminal reconstruction quality evaluation system
CN111402012B (en) E-commerce defective product identification method based on transfer learning
CN116741369A (en) Model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination