WO2021196458A1 - Intelligent loan entry method, and apparatus and storage medium - Google Patents


Info

Publication number
WO2021196458A1
WO2021196458A1 (PCT/CN2020/103931)
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice
input
input voice
model
Prior art date
Application number
PCT/CN2020/103931
Other languages
French (fr)
Chinese (zh)
Inventor
张山 (Zhang Shan)
余自雷 (Yu Zilei)
Original Assignee
深圳壹账通智能科技有限公司 (Shenzhen OneConnect Smart Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 (Shenzhen OneConnect Smart Technology Co., Ltd.)
Publication of WO2021196458A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques specially adapted for estimating an emotional state

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device and storage medium for intelligent loan entry.
  • Loan users usually have to type the fields required by a loan product into its app one by one, often dozens or even hundreds of fields, which consumes considerable time and energy. Moreover, because of differences in educational background, some loan users type slowly; having to enter dozens or even hundreds of fields by hand is plainly inefficient.
  • The inventor realized that this manual entry of a large number of user information fields is not only inefficient but also raises the barrier to loan application for users who are willing to borrow yet not good at typing. Moreover, field entry alone makes it difficult to verify a user's real information across multiple dimensions.
  • This application provides a method, an apparatus, and a storage medium for intelligent loan entry, to solve the prior-art problem that it is difficult to judge a user's real information across multiple dimensions.
  • The first aspect of this application provides a method for intelligent loan entry, which includes: acquiring the user's first input voice during entry and the user's second input voice during approval; extracting the voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice with a trained voice analysis model to determine whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if they are not the same user, the approval is not passed and the user's entry fails.
  • The voice analysis model adopts an adversarial neural network model that includes a generation model and a discriminant model: the generation model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
  • A second aspect of the present application provides an electronic device comprising a processor and a memory, where the memory stores an intelligent loan entry program. When executed by the processor, the program implements the following intelligent loan entry method: acquiring the user's first input voice during entry and the user's second input voice during approval; extracting the voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice with a trained voice analysis model to determine whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if not, the approval is not passed and the user's entry fails. The voice analysis model adopts an adversarial neural network model, which includes a generation model and a discriminant model.
  • A third aspect of the present application provides a computer-readable storage medium.
  • The computer-readable storage medium stores an intelligent loan entry program. When executed by a processor, the program implements the following intelligent loan entry method: acquiring the user's first input voice during entry and the user's second input voice during approval; extracting the voice features of the first input voice and the second input voice;
  • performing voice analysis on the first input voice and the second input voice with the trained voice analysis model to determine whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if not, the approval is not passed and the user's entry fails. The voice analysis model adopts an adversarial neural network model, which includes a generation model and a discriminant model.
  • A fourth aspect of the present application provides an intelligent loan entry apparatus, including: a voice acquisition module for acquiring the user's first input voice during entry and the user's second input voice during approval;
  • a feature extraction module for extracting the voice features of the first input voice and the second input voice;
  • a voice analysis module that uses the trained voice analysis model to perform voice analysis on the first input voice and the second input voice;
  • a first judgment module that judges, according to the voice analysis result, whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if not, the approval is not passed and the user's entry fails;
  • wherein the voice analysis model adopts an adversarial neural network model that includes a generation model and a discriminant model, the generation model being used to generate a voice vector corresponding to the second input voice, and the discriminant model being used to determine the probability that the two voices come from the same user.
  • This application uses artificial intelligence to approve and review loan entry, processing the user's voice with a neural network. Specifically, through the adversarial neural network model, the voices entered by the user at entry and at approval are analyzed, and the user information is reviewed and compared, in order to confirm whether the person operating is the user himself and to judge the user information across multiple dimensions.
  • This application uses voice interaction so that users fill in the information required for the loan while communicating, which effectively prevents customers from developing irritation and other negative emotions, lowers the technical barrier to loan application, and reduces the number of customers who give up on a loan because they type slowly or cannot type.
  • This application uses voice as the source of loan information, which adds the user's speech-emotion judgment during voice entry and adds data such as the user's speech rate and voice frequency; these can serve as intelligent risk-control means for judging the user's authenticity and validity.
  • Figure 1 is a schematic diagram of the process of the intelligent loan entry method described in this application.
  • Figure 2 is a schematic diagram of the intelligent loan entry device in the application.
  • FIG. 1 is a schematic flowchart of the intelligent loan entry method of this application.
  • The intelligent loan entry method includes the following steps. Step S1: obtain the user's first input voice during entry and the user's second input voice during approval. Here, entry refers to the materials required for the loan application that the user submits to the lending institution or banking system when taking a loan, and approval refers to the review of the submitted materials after entry; only if the approval is passed, meaning the entry succeeds, can the loan be issued to the user.
  • The first input voice refers to the voice information entered when the user applies for the loan, and the second input voice refers to the voice information entered by the user when the loan application is being approved.
  • Step S2: extract the voice features of the first input voice and the second input voice.
  • Step S3: using the voice analysis model obtained after training, perform voice analysis on the first input voice and the second input voice, and determine whether the user at the time of entry and the user at the time of approval are the same user.
  • The voice analysis model adopts an adversarial neural network model.
  • The adversarial neural network model includes a generation model and a discriminant model.
  • The generation model is used to generate a voice vector corresponding to the second input voice (where the voice vector is composed of voice feature values).
  • The discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user. Specifically, if the output probability is greater than or equal to a preset probability threshold, it is determined that the entry and the approval were made by the same user; if the output probability is less than the preset probability threshold, it is judged that the entry and the approval were not made by the same user, the entry is unsuccessful, and the loan cannot be issued.
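  • As a minimal illustration of the thresholding described above (the function name and the 0.8 threshold value are illustrative assumptions, not values specified by this application):

```python
def same_user(prob: float, threshold: float = 0.8) -> bool:
    """Return True when the discriminant model's output probability meets
    the preset threshold, i.e. entry and approval are judged the same user."""
    return prob >= threshold

# Entry succeeds only when the probability clears the threshold.
assert same_user(0.93) is True   # approval passed, entry succeeds
assert same_user(0.40) is False  # approval not passed, entry fails
```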
  • The contents of the first input voice and the second input voice are the various items of information that the user fills in when applying for a loan, including personal information such as ID number, residential address, and spouse, as well as related information such as contact person, loan intention, and whether the user owns a house or a car.
  • This application uses the adversarial neural network model to perform voice analysis on the voices entered by the user at entry and at approval, adding a judgment dimension to the approval of the user's entry, so as to determine whether the entry and the approval were made by the same user, thereby judging the user's authenticity and realizing intelligent entry.
  • The intelligent loan entry method further includes a step of training the voice analysis model.
  • The step of training the adversarial neural network model includes: obtaining training samples, where the training samples include the user's first input voice during entry and the user's second input voice during approval; and inputting the training samples into the adversarial neural network model for training. The generation model learns voice features from the first input voice and generates a new voice from the learned features as the voice vector corresponding to the second input voice; this voice vector is used for adversarial training. The discriminant model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that they are; the greater the probability, the higher the accuracy of the adversarial neural network model. When the accuracy of the discriminant model's output exceeds a preset threshold, the training ends.
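  • The training procedure above can be sketched with toy stand-ins for the two sub-models. Everything here (feature dimensionality, noise scale, logistic discriminator, learning rate, accuracy threshold) is an illustrative assumption, not the patent's actual networks; the point is the loop shape: generate a candidate voice vector, score pairs with the discriminant model, and stop when accuracy exceeds a preset threshold.

```python
import math
import random

random.seed(0)
DIM = 8  # illustrative feature dimensionality

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-x))

def generate(entry_voice, noise_scale=0.05):
    """Toy 'generation model': perturb the entry-time feature vector with
    noise, standing in for the voice vector generated for the approval voice."""
    return [f + random.gauss(0.0, noise_scale) for f in entry_voice]

def discriminate(w, b, v1, v2):
    """Toy 'discriminant model': logistic score on per-dimension absolute
    differences; an output near 1 means 'probably the same user'."""
    z = b - sum(wi * abs(a - c) for wi, a, c in zip(w, v1, v2))
    return sigmoid(z)

# Training samples: (entry voice, approval voice, same-user label) triples.
pairs = []
for _ in range(60):
    entry = [random.random() for _ in range(DIM)]
    pairs.append((entry, generate(entry), 1))                        # same user
    pairs.append((entry, [random.random() for _ in range(DIM)], 0))  # different user

# Train the discriminator by gradient descent on logistic loss, and end
# training once its accuracy exceeds the preset threshold (as in the text).
w, b, lr, accuracy = [0.0] * DIM, 0.0, 0.5, 0.0
for epoch in range(500):
    for v1, v2, y in pairs:
        d = [abs(a - c) for a, c in zip(v1, v2)]
        p = sigmoid(b - sum(wi * di for wi, di in zip(w, d)))
        w = [wi + lr * (p - y) * di for wi, di in zip(w, d)]
        b -= lr * (p - y)
    accuracy = sum(
        (discriminate(w, b, v1, v2) >= 0.5) == (y == 1) for v1, v2, y in pairs
    ) / len(pairs)
    if accuracy >= 0.95:  # preset accuracy threshold ends training
        break
```

A real implementation would replace these stand-ins with the convolution/deconvolution networks described later in the text.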
  • The generation model receives random noise along with the user's first input voice file; it converts the voice features of the first input voice into a feature table through its fully connected layer, performs a deconvolution operation on the feature table's voice feature data through a deconvolution layer, and generates output voice features through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  • In the discriminant model, the output voice features of the generation model are convolved, then processed through a fully connected layer, and finally sent to an activation function, which outputs the probability that the output voice feature data is true or false. The greater the probability, the higher the accuracy of the adversarial neural network model; when the accuracy exceeds the preset threshold, the training can end.
  • The authenticity of each feature value of the user's second input voice is judged through the discriminant model, so as to judge whether the user is the person himself.
  • The first input voice entered when the user applies is input into the generation model to obtain the voice vector corresponding to the second input voice, and both the obtained voice vector and the second input voice are input into the discriminant model to judge whether each feature value of the second input voice is true or false, where a feature value refers to an extracted voice feature, such as the MFCC coefficients below.
  • The discriminant model outputs the probability that a feature value is true (the closer the feature values of the second input voice are to those of the first input voice, the more the second input voice is considered true). When the output probability value is greater than the preset probability threshold, the user who entered the second input voice and the user who entered the first input voice are considered the same user.
  • the same method can be used to extract the voice features of the first input voice and the second input voice.
  • For example, the Mel frequency cepstrum coefficients (MFCC) can be used as voice features.
  • To obtain them, the voice data is sampled and analyzed, and the voice features are extracted by means of the spectrogram, cepstrum analysis, Mel frequency analysis, and the Mel frequency cepstrum coefficients.
  • The specific steps of extracting the MFCC voice features include: pre-processing the first input voice, the pre-processing including pre-emphasis, framing, and windowing; obtaining the FFT spectrum corresponding to each short-time analysis window through the fast Fourier transform (FFT); obtaining the Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and performing cepstrum analysis on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC), where cepstrum analysis consists of taking the logarithm of the Mel spectrum and then applying the inverse Fourier transform.
  • The inverse Fourier transform is implemented with the discrete cosine transform (DCT), and the second to thirteenth coefficients after the DCT are taken as the MFCC coefficients.
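  • The MFCC pipeline above can be sketched for a single frame as follows. The sample rate, frame length, and filter count are illustrative assumptions; a naive O(n^2) DFT keeps the sketch dependency-free, whereas a real system would use an FFT library. The DCT step keeps coefficients 2 through 13, as stated in the text.

```python
import math

def mfcc_frame(frame, sample_rate=8000, n_filters=20, n_coeffs=12):
    """Pre-emphasis -> Hamming window -> power spectrum -> Mel filter
    bank -> log -> DCT-II, keeping coefficients 2..13."""
    # Pre-emphasis: y[n] = x[n] - 0.97 * x[n-1]
    x = [frame[0]] + [frame[i] - 0.97 * frame[i - 1] for i in range(1, len(frame))]
    # Hamming window
    n = len(x)
    x = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(x)]
    # Power spectrum via a naive DFT (first half of the bins)
    half = n // 2 + 1
    power = []
    for k in range(half):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(x))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(x))
        power.append((re * re + im * im) / n)
    # Triangular Mel filter bank, equally spaced on the Mel scale
    def hz_to_mel(f): return 2595.0 * math.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10 ** (m / 2595.0) - 1.0)
    top = hz_to_mel(sample_rate / 2)
    bins = [int((n + 1) * mel_to_hz(i * top / (n_filters + 1)) / sample_rate)
            for i in range(n_filters + 2)]
    log_mel = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        energy = 0.0
        for k in range(left, min(right, half)):
            if centre > left and k < centre:
                energy += power[k] * (k - left) / (centre - left)
            elif right > centre and k >= centre:
                energy += power[k] * (right - k) / (right - centre)
        log_mel.append(math.log(energy + 1e-10))  # logarithm of the Mel spectrum
    # DCT-II of the log Mel energies; skip the 0th term, keep the next 12
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_mel))
            for c in range(1, n_coeffs + 1)]
```

For example, `mfcc_frame([math.sin(2 * math.pi * 5 * i / 64) for i in range(64)])` yields a 12-dimensional feature vector for one frame of a pure tone.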
  • Similar methods can be used to extract multiple voice features, not limited to the MFCC coefficients, such as speech rate, loudness, pitch, and pauses.
  • Loudness is related to frequency; expressed as a logarithmic value it is the loudness level, whose unit is the phon.
  • The correspondence between loudness and frequency/sound level is calculated using the equal-loudness-contour formula.
  • Pitch is determined by the frequency of the sound.
  • Pauses are distinguished by the number of rests. A similar spectrum analysis can therefore be used to obtain the above voice features. By analyzing these other voice features, the user information at approval can be reviewed and compared to confirm whether the person operating is the user himself.
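  • Counting pauses ("rests") can be sketched by flagging runs of consecutive low-energy frames. The frame length, energy threshold, and minimum run length below are illustrative assumptions, not values from this application:

```python
def count_pauses(samples, frame_len=160, energy_threshold=0.01, min_frames=3):
    """Count pauses in a signal: a pause is a run of at least `min_frames`
    consecutive frames whose mean energy falls below `energy_threshold`."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    silent = [sum(s * s for s in f) / len(f) < energy_threshold for f in frames]
    pauses, run = 0, 0
    for is_silent in silent:
        run = run + 1 if is_silent else 0
        if run == min_frames:  # count a new pause once enough silent frames accumulate
            pauses += 1
    return pauses

# Speech, then silence, then speech: one pause.
signal = [0.5] * 800 + [0.0] * 800 + [0.5] * 800
assert count_pauses(signal) == 1
```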
  • In another embodiment, after acquiring the user's first input voice during entry, the method further includes: obtaining the entry field information corresponding to the first input voice through voice recognition; obtaining the user's credential picture and obtaining the user's credential information through image text recognition, where the credential picture refers to a captured ID card picture; and verifying the corresponding entry field information with the obtained credential information. For example, the entry field is compared with the field of the credential information to obtain their similarity: if the similarity is greater than or equal to a preset similarity threshold, the verification is passed; if the similarity is less than the preset similarity threshold, the verification fails.
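  • The field-versus-credential comparison can be sketched with a generic string-similarity measure. The application does not specify a similarity function or threshold; `SequenceMatcher.ratio` and the 0.9 threshold here are assumed stand-ins:

```python
from difflib import SequenceMatcher

def verify_field(entry_value: str, credential_value: str,
                 threshold: float = 0.9) -> bool:
    """Pass verification when the similarity between an entered field and
    the corresponding ID-card field meets the preset threshold."""
    similarity = SequenceMatcher(None, entry_value, credential_value).ratio()
    return similarity >= threshold
```

For example, `verify_field("123 Example Road", "123 Example Road")` passes, while `verify_field("123 Example Road", "999 Other Street")` fails.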
  • In another embodiment, after acquiring the user's first input voice during entry, the method further includes: obtaining the entry field information corresponding to the first input voice through voice recognition, and displaying the entry field information in the form of a page, so that the user can check the entered information for omissions, ensure the correctness of the loan application information, apply for the loan in a more convenient and efficient way, and complete the entry for the loan scenario.
  • In another embodiment, the method further includes: converting the first input voice into text; performing text emotion recognition on the converted text; and judging whether the user is lying according to the text emotion recognition result. If the user is judged to be lying, the entry ends; if not, the step of obtaining the user's second input voice during approval is performed. The user is pre-judged through text emotion recognition; if the user lies, there is no need to obtain the second input voice, and the entry ends directly.
  • Judging whether the user is lying according to the text emotion recognition result includes: judging whether the user's emotion when entering the first input voice meets a set condition, and if so, the user is considered to be lying. For example, when the user's emotion while entering the first input voice is recognized as fluctuating sharply, or as panicked or astonished, the user is considered to be lying; when the user's emotion is recognized as stable and calm, the user is considered not to be lying.
  • The set conditions include: the speech rate exceeds a first set threshold, the fluctuation of the loudness frequency exceeds a preset fluctuation range (either too large or too small can be regarded as the user lying), and the number of speech pauses exceeds a second set threshold.
  • For example, the first set threshold may be 150 words/min; if the speech rate is greater than 150 words/min, the user's emotion is considered to fluctuate greatly. That is, through the obtained voice features, it is possible to identify whether the emotion fluctuates and how it fluctuates.
  • The emotion fluctuation here refers to the emotion carried by the input voice, and emotion recognition is performed by converting the input voice into text. By analyzing the emotional characteristics of the voice in which the user records the loan information, the system determines whether the user is lying, adding consideration of the user's speech rate, voice frequency, emotion, and so on, which is conducive to effective intelligent risk-control judgment.
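  • The set conditions above can be combined into a simple rule check. Only the 150 words/min example comes from the text; the loudness bounds and pause threshold are illustrative assumptions:

```python
def seems_deceptive(words_per_min: float, loudness_range: float, pause_count: int,
                    rate_threshold: float = 150.0,
                    loudness_bounds: tuple = (0.5, 2.0),
                    pause_threshold: int = 10) -> bool:
    """Flag the user when speech rate exceeds the first threshold, loudness
    fluctuation falls outside the preset range (too large or too small),
    or the number of pauses exceeds the second threshold."""
    lo, hi = loudness_bounds
    return (words_per_min > rate_threshold
            or not (lo <= loudness_range <= hi)
            or pause_count > pause_threshold)

# Calm, in-range speech is not flagged; speaking over 150 words/min is.
assert seems_deceptive(120, 1.0, 3) is False
assert seems_deceptive(180, 1.0, 3) is True
```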
  • FIG. 2 is a schematic diagram of the intelligent loan entry apparatus of this application.
  • The intelligent loan entry apparatus includes: a voice acquisition module 1 for obtaining the user's first input voice during entry and the user's second input voice during approval, where entry refers to the materials required for the loan application that the user submits to the lending institution or banking system when taking a loan, and approval refers to the review of the submitted materials after entry.
  • The first input voice refers to the voice information entered when the user applies for the loan.
  • The second input voice refers to the voice information entered by the user when the loan application is being approved.
  • A feature extraction module 2 is used to extract the voice features of the first input voice and the second input voice.
  • A voice analysis module 3 uses the voice analysis model obtained through training to perform voice analysis on the first input voice and the second input voice.
  • A judgment module 4 judges, according to the voice analysis result, whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if not, the approval is not passed and the user's entry fails.
  • The voice analysis model adopts an adversarial neural network model.
  • The adversarial neural network model includes a generation model and a discriminant model.
  • The generation model is used to generate a voice vector corresponding to the second input voice (where the voice vector is composed of voice feature values).
  • The discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user. Specifically, if the output probability obtained by the voice analysis is greater than or equal to the preset probability threshold, the judgment module judges that the entry and the approval were made by the same user; if the output probability is less than the preset probability threshold, the judgment module judges that they were not made by the same user, the entry is unsuccessful, and the loan cannot be issued.
  • The contents of the first input voice and the second input voice are the various items of information that the user fills in when applying for a loan, including personal information such as ID number, residential address, and spouse, as well as related information such as contact person, loan intention, and whether the user owns a house or a car.
  • The intelligent loan entry apparatus further includes a training module for training the voice analysis model.
  • The step of training the adversarial neural network model includes: obtaining training samples, where the training samples include the user's first input voice during entry and the user's second input voice during approval; and inputting the training samples into the adversarial neural network model for training. The generation model learns voice features from the first input voice and generates a new voice from the learned features as the voice vector corresponding to the second input voice; this voice vector is used for adversarial training. The discriminant model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that they are; the greater the probability, the higher the accuracy of the adversarial neural network model. When the accuracy of the discriminant model's output exceeds the preset threshold, the training ends.
  • The generation model receives random noise along with the user's first input voice file; it converts the voice features of the first input voice into a feature table through its fully connected layer, performs a deconvolution operation on the feature table's voice feature data through a deconvolution layer, and generates output voice features through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  • In the discriminant model, the output voice features of the generation model are convolved, then processed through a fully connected layer, and finally sent to an activation function, which outputs the probability that the output voice feature data is true or false. The greater the probability, the higher the accuracy of the adversarial neural network model; when the accuracy exceeds the preset threshold, the training can end.
  • The authenticity of each feature value of the user's second input voice is judged through the discriminant model, so as to judge whether the user is the person himself.
  • The first input voice entered when the user applies is input into the generation model to obtain the voice vector corresponding to the second input voice, and both the obtained voice vector and the second input voice are input into the discriminant model to judge whether each feature value of the second input voice is true or false, where a feature value refers to an extracted voice feature, such as the MFCC coefficients below.
  • The discriminant model outputs the probability that a feature value is true (the closer the feature values of the second input voice are to those of the first input voice, the more the second input voice is considered true). When the output probability value is greater than the preset probability threshold, the user who entered the second input voice and the user who entered the first input voice are considered the same user.
  • the feature extraction module can use the same method to extract the voice features of the first input voice and the second input voice.
  • For example, the Mel frequency cepstrum coefficients (MFCC) can be used as voice features.
  • To obtain them, the voice data is sampled and analyzed, and the voice features are extracted by means of the spectrogram, cepstrum analysis, Mel frequency analysis, and the Mel frequency cepstrum coefficients.
  • The specific steps of extracting the MFCC voice features include: pre-processing the first input voice, the pre-processing including pre-emphasis, framing, and windowing; obtaining the FFT spectrum corresponding to each short-time analysis window through the fast Fourier transform (FFT); obtaining the Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and performing cepstrum analysis on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC), where cepstrum analysis consists of taking the logarithm of the Mel spectrum and then applying the inverse Fourier transform.
  • The inverse Fourier transform is implemented with the discrete cosine transform (DCT), and the second to thirteenth coefficients after the DCT are taken as the MFCC coefficients.
  • Similar methods can be used to extract multiple voice features, not limited to the MFCC coefficients, such as speech rate, loudness, pitch, and pauses.
  • Loudness is related to frequency; expressed as a logarithmic value it is the loudness level, whose unit is the phon.
  • The correspondence between loudness and frequency/sound level is calculated using the equal-loudness-contour formula.
  • Pitch is determined by the frequency of the sound, and pauses are distinguished by the number of rests; a similar spectrum analysis can therefore be used to obtain the above voice features.
  • In another embodiment, the intelligent loan entry apparatus further includes: a voice recognition module, which obtains the entry field information corresponding to the first input voice through voice recognition, so that the user only needs to carry out effective voice interaction to complete entry of the fields, which facilitates entering fields in large batches and completely decouples loan entry from the user's typing, so that typing ability is no longer a barrier to applying for a loan; a text recognition module, which obtains the user's credential picture and obtains the user's credential information through image text recognition, where the credential picture refers to a captured ID card picture; and a verification module, which verifies the corresponding entry field information with the obtained credential information to ensure the accuracy and reliability of the entered information.
  • the similarity between the entered field and the corresponding field of the credential information is obtained; if the similarity is greater than or equal to a preset similarity threshold, the verification passes, and if the similarity is less than the preset threshold, the verification fails
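The field comparison described above can be sketched with Python's standard library; `SequenceMatcher` and the 0.9 threshold are illustrative stand-ins, not values fixed by this application:

```python
from difflib import SequenceMatcher

def verify_field(entered: str, credential: str, threshold: float = 0.9) -> bool:
    """Pass verification when the entered field is similar enough to the
    corresponding field read from the credential picture."""
    similarity = SequenceMatcher(None, entered, credential).ratio()
    return similarity >= threshold

print(verify_field("123 Example Road", "123 Example Road"))  # True: fields match
print(verify_field("123 Example Road", "456 Other Street"))  # False: fields differ
```

Any string-similarity measure (edit distance, token overlap) could play the same role; the essential point is only the pass/fail comparison against a preset threshold.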
  • the intelligent loan entry device further includes: a voice recognition module, which obtains the input field information corresponding to the first input voice through voice recognition; and a page display module, which displays the input field information in the form of a page, making it convenient for users to check the entered information for omissions, ensuring the correctness of the loan application information, and allowing users to apply for loans and complete loan-scenario entry in a more convenient and efficient way
  • the intelligent loan entry device further includes: a text conversion module, used to convert the first input voice into text; an emotion recognition module, used to perform text emotion recognition on the converted text; and a second judgment module, used to judge whether the user is lying according to the result of text emotion recognition; if the user is judged to be lying, the entry ends, and if not, the step of obtaining the user's second input voice during approval is performed; the user is thus pre-judged through text emotion recognition, and if the user lies there is no need to obtain the second input voice and the entry ends directly
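The gating logic in the bullet above can be sketched as follows; the `emotion_score` input is a hypothetical stand-in for the output of a text emotion recognition model, which is not implemented here:

```python
def looks_like_lying(emotion_score: float, threshold: float = 0.5) -> bool:
    """emotion_score: assumed probability of deception from some text-emotion model."""
    return emotion_score >= threshold

def process_entry(emotion_score: float) -> str:
    # Pre-judgment: a flagged user ends the entry before any second voice is collected.
    if looks_like_lying(emotion_score):
        return "entry ended"
    return "request second input voice"

print(process_entry(0.8))  # entry ended
print(process_entry(0.1))  # request second input voice
```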
  • the specific method for judging whether the user is lying based on the result of text emotion recognition is roughly the same as the judgment method in the above-mentioned intelligent loan entry method, and will not be repeated here.
  • the intelligent loan entry method described in this application is applied to an electronic device, and the electronic device may be a terminal device such as a television, a smart phone, a tablet computer, or a computer
  • the electronic device includes a processor and a memory, the memory storing a loan smart entry program; the processor executes the loan smart entry program to implement the following intelligent loan entry method: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user
  • the electronic device also includes a network interface, a communication bus, and the like.
  • the network interface may include a standard wired interface and a wireless interface
  • the communication bus is used to realize the connection and communication between various components.
  • the memory includes at least one type of readable storage medium, which can be a non-volatile storage medium such as a flash memory, a hard disk, an optical disc, or a plug-in hard disk, but is not limited to these; it can be any device that stores instructions or software and any associated data files in a non-transitory manner and provides them to the processor so that the processor can execute the instructions or software program
  • the software program stored in the memory includes the loan smart entry program, and the loan smart entry program can be provided to the processor, so that the processor can execute it and realize the intelligent loan entry method
  • the processor can be a central processing unit, a microprocessor, or other data processing chips, etc., and can run a stored program in the memory, for example, the loan smart entry program in this application.
  • the electronic device may also include a display, and the display may also be called a display screen or a display unit.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, and the like.
  • the display is used to display the information processed in the electronic device and to display the visual work interface.
  • the electronic device may further include a user interface, and the user interface may include an input unit (such as a keyboard), a voice output device (such as a stereo, earphone), and the like.
  • the loan smart entry program can also be divided into one or more modules, and the one or more modules are stored in the memory and executed by the processor to complete the application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • the loan smart entry program can be divided into: a voice acquisition module 1, a feature extraction module 2, a voice analysis module 3, and a first judgment module 4.
  • the functions or operation steps implemented by the above modules are all similar to the above, and will not be described in detail here.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program or instruction, and may be non-volatile or volatile.
  • the stored program or instructions can be executed to instruct related hardware to implement the corresponding functions
  • the computer-readable storage medium may be a computer disk, a hard disk, a random access memory, a read-only memory, and so on.
  • This application is not limited to this, and it can be any device that stores instructions or software and any related data files or data structures in a non-transitory manner and can be provided to the processor to enable the processor to execute the programs or instructions therein.
  • the computer-readable storage medium includes a loan smart entry program, and when the loan smart entry program is executed by a processor, the following intelligent loan entry method is realized: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to conclude whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present application relates to the technical field of artificial intelligence. Disclosed are an intelligent loan entry method, and an apparatus and a storage medium. The method comprises: obtaining a first input voice of a user during entry and a second input voice of a user during examination and approval; extracting voice features; performing voice analysis by utilizing a voice analysis model to obtain whether the user during the entry and the user during the examination and approval are the same user; if the users are the same user, enabling the examination and approval to be passed, so that the entry of the user is successful; and if the users are not the same user, enabling the examination and approval not to be passed, so that the entry of the user fails, wherein the voice analysis model uses an adversarial neural network model and comprises a generation model and a discrimination model, the generation model is used for generating a voice vector corresponding to the second input voice, and the discrimination model is used for determining the probability that a user corresponding to the second input voice and a user corresponding to the first input voice are the same user. According to the present application, user information can be rechecked and compared to confirm whether the operation is performed by the user, and multi-dimensional discrimination of the user information is performed.

Description

Intelligent loan entry method, apparatus and storage medium
This application claims the priority of the Chinese patent application filed with the Chinese Patent Office on April 2, 2020, with application number 202010254541.2 and the invention title "Intelligent loan entry method, apparatus and storage medium", the entire content of which is incorporated in this application by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to an intelligent loan entry method, apparatus and storage medium.
Background
At present, loan products usually require users to manually enter the fields required by the product one by one into the app of each loan product, and often dozens or even hundreds of fields must be entered, which consumes a great deal of time and energy. Moreover, owing to differences in education level, some loan users type rather slowly; if they must enter dozens or even hundreds of fields, the resulting inefficiency can be imagined. The inventors realized that this method of manually entering large numbers of user information fields is not only inefficient and raises the barrier to loan application for users who have a strong willingness to borrow but are not good at typing, but also makes it difficult to judge the authenticity of user information in multiple dimensions through field entry alone.
Technical problem
This application provides an intelligent loan entry method, apparatus and storage medium to solve the problem in the prior art that it is difficult to judge users' real information in multiple dimensions.
Technical solution
In order to achieve the above objective, the first aspect of this application provides an intelligent loan entry method, including: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
In order to achieve the above objective, the second aspect of this application provides an electronic device, including a processor and a memory, the memory including a loan smart entry program which, when executed by the processor, implements the following intelligent loan entry method: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
In order to achieve the above objective, the third aspect of this application provides a computer-readable storage medium, the computer-readable storage medium including a loan smart entry program which, when executed by a processor, implements the following intelligent loan entry method: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
In order to achieve the above objective, the fourth aspect of this application provides an intelligent loan entry apparatus, including: a voice acquisition module, used to acquire the first input voice of the user during entry and the second input voice of the user during approval; a feature extraction module, used to extract the voice features of the first input voice and the second input voice; a voice analysis module, which uses a trained voice analysis model to perform voice analysis on the first input voice and the second input voice; and a first judgment module, which judges according to the voice analysis result whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
Beneficial effects
Compared with the prior art, this application has the following advantages and beneficial effects:
This application performs approval review of loan entries based on artificial intelligence and uses a neural network to process the user's voice. Specifically, an adversarial neural network model performs voice analysis on the user's input voice at entry time and at approval time, so that user information can be re-checked and compared to confirm whether the operation is performed by the user in person, and the user information can be judged in multiple dimensions.
Through voice interaction, this application lets users fill in the information required for a loan in the course of a conversation, which can effectively prevent customers from developing irritation and other negative emotions, lowers the technical barriers to loan application, and reduces the number of customers who give up on loans because of typing speed or inability to type.
This application uses voice as one source of loan information, which makes it possible to judge the user's speaking emotion during voice entry and to collect data on the user's speech rate, voice frequency and so on, which can serve as one means for intelligent risk control to judge the authenticity and validity of the user.
Description of the drawings
Figure 1 is a schematic flowchart of the intelligent loan entry method described in this application.
Figure 2 is a schematic diagram of the intelligent loan entry apparatus in this application.
The realization, functional characteristics, and advantages of the purpose of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed description
The embodiments of this application will be described below with reference to the drawings. Those of ordinary skill in the art will recognize that the described embodiments can be modified in various different ways, or combinations thereof, without departing from the spirit and scope of this application. Therefore, the drawings and description are illustrative in nature, are only used to explain this application, and are not used to limit the protection scope of the claims. In addition, in this specification, the drawings are not drawn to scale, and the same reference numerals denote the same parts.
Figure 1 is a schematic flowchart of the intelligent loan entry method of this application. As shown in Figure 1, the method includes the following steps. Step S1: acquire the first input voice of the user during entry and the second input voice of the user during approval, where entry refers to the user submitting the materials required to apply for a loan to a lending institution or banking system, and approval refers to the review of the submitted materials after entry; only when the approval passes, meaning the entry succeeds, can the loan be issued to the user. The first input voice refers to the voice information entered when the user applies for the loan, and the second input voice refers to the voice information entered by the user when the loan application is being approved. Step S2: extract the voice features of the first input voice and the second input voice. Step S3: use a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user. Step S4: judge whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails.
The voice analysis model adopts an adversarial neural network model, which includes a generative model and a discriminant model. The generative model is used to generate a voice vector corresponding to the second input voice (where the voice vector is composed of voice feature values), and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user. Specifically, if the output probability is greater than or equal to a preset probability threshold, it is determined that the user at entry time and at approval time is the same user; if the output probability is less than the preset probability threshold, it is determined that they are not the same user, the entry is unsuccessful, and the loan cannot be issued.
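As an illustration of the threshold decision described above, the following toy sketch stands in for the discriminant model: a single made-up linear scoring of the difference between the generated voice vector and the observed second-voice features, squashed to a probability and compared against a preset threshold. The weights, bias, and vectors are all hypothetical, and a real discriminant model would be a learned network:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def same_user_probability(gen_vector, second_voice_features, weights, bias=0.0):
    """Toy stand-in: score how close the generator's vector is to the
    observed second-voice feature values, as a 'same user' probability."""
    diffs = [abs(a - b) for a, b in zip(gen_vector, second_voice_features)]
    return sigmoid(bias - sum(w * d for w, d in zip(weights, diffs)))

def approve(prob: float, prob_threshold: float = 0.5) -> bool:
    """Approval passes only when the probability clears the preset threshold."""
    return prob >= prob_threshold

weights = [1.0] * 4                  # hypothetical learned weights
gen = [0.2, 0.5, 0.1, 0.9]           # generator's vector for this user
close = [0.21, 0.48, 0.12, 0.88]     # second voice matching the same speaker
far = [0.9, 0.1, 0.8, 0.2]           # second voice from a different speaker

print(approve(same_user_probability(gen, close, weights, bias=1.0)))  # True
print(approve(same_user_probability(gen, far, weights, bias=1.0)))    # False
```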
It should be noted that the content of both the first input voice and the second input voice is the information the user fills in when applying for a loan, including personal information such as ID number and residential address, as well as related information such as spouse, contacts, loan intention, and whether the user owns a house or a car.
This application uses the adversarial neural network model to perform voice analysis on the user's input voice during entry and approval, adding a judgment dimension to the approval of the user's entry: it determines whether the user at entry time and at approval time is the same person, thereby judging the user's authenticity and realizing intelligent entry.
In an embodiment of this application, the intelligent loan entry method further includes a step of training the voice analysis model. When the voice analysis model adopts an adversarial neural network model, the training step includes: acquiring training samples, where the training samples include the first input voice of the user during entry and the second input voice of the user during approval; and inputting the training samples into the adversarial neural network model for training, wherein the generative model learns the voice features of the first input voice and generates new voice from the learned features as a voice vector corresponding to the second input voice, the voice vector being used for adversarial training. The discriminant model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that they are; the greater the probability, the higher the accuracy of the adversarial neural network model. When the accuracy of the discriminant model's output exceeds a preset threshold, the training ends.

Further, the generative model takes the user's first input voice file and receives random noise; the fully connected layer of the generative model converts the voice features of the first input voice into a feature table, the deconvolution layers perform deconvolution operations on the feature data in the table, and after multiple deconvolution layers an output voice feature is generated as the voice vector corresponding to the second input voice. The convolutional layers of the discriminant model perform convolution operations on the generated output voice features, which are then passed through a fully connected layer and finally fed into an activation function, outputting the probability that the output voice feature data is real; the greater the probability, the higher the accuracy of the adversarial neural network model, and when the accuracy exceeds the preset threshold, training can end.
When the user's second input voice at approval time is collected, the discriminant model judges whether each feature value of the second input voice is real, so as to determine whether the user is the same person. Specifically, the user's first input voice at entry time is fed into the generative model to obtain the voice vector corresponding to the second input voice; both this voice vector and the second input voice are then fed into the discriminant model to determine whether the feature values of the second input voice are real, where a feature value refers to an extracted voice feature, such as the MFCC coefficients below. The discriminant model outputs the probability that the feature values are real (the closer the feature values of the second input voice are to those of the first input voice, the more real the second input voice is considered); when the output probability is greater than the preset probability threshold, the user of the second input voice and the user of the first input voice are considered to be the same user.
In this application, the same method can be used to extract the voice features of the first input voice and the second input voice. In one embodiment, Mel-frequency cepstral coefficient (MFCC) voice features are used to sample and analyze the data, and the voice features are extracted by means of spectrograms, cepstrum analysis, Mel-frequency analysis, and Mel-frequency cepstral coefficients.
Taking the extraction of the voice features of the first input voice as an example, the specific steps of extracting the MFCC voice features include: pre-processing the first input voice, the pre-processing including pre-emphasis, framing and windowing; obtaining, through the fast Fourier transform (FFT), the FFT spectrum corresponding to each short-time analysis window; obtaining the Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and performing cepstrum analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), where the cepstrum analysis includes taking the logarithm of the Mel spectrum and then performing an inverse Fourier transform, the inverse Fourier transform being implemented by the discrete cosine transform (DCT), with the 2nd to 13th coefficients after the DCT taken as the MFCC coefficients.
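The final cepstral step described above (logarithm of the Mel spectrum, a DCT as the inverse-Fourier step, and keeping the 2nd to 13th coefficients) can be sketched as follows; the 26 Mel band energies are synthetic stand-in values, and pre-emphasis, framing, windowing, FFT, and the Mel filter bank are omitted:

```python
import math

def dct_ii(x):
    """Type-II discrete cosine transform (unnormalised)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n)]

def mfcc_from_mel_energies(mel_energies):
    log_mel = [math.log(e) for e in mel_energies]  # logarithmic compression
    cepstrum = dct_ii(log_mel)
    return cepstrum[1:13]                          # 2nd..13th DCT coefficients

# Synthetic 26-band Mel spectrum for one analysis window
mel_energies = [1.0 + 0.5 * math.sin(i / 3.0) for i in range(26)]
coeffs = mfcc_from_mel_energies(mel_energies)
print(len(coeffs))  # prints 12: one MFCC vector per frame
```

In practice a library routine with proper DCT normalisation and liftering would be used; the sketch only shows why 12 coefficients come out per frame.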
It should be noted that similar methods can be used to extract various features of the voice, not only the MFCC coefficients; examples include speech rate, loudness, pitch, and pauses. Loudness is related to frequency and is expressed on a logarithmic scale as the loudness level, whose unit is the phon; the correspondence between loudness and frequency/sound level is computed from the equal-loudness-contour formula. Pitch is defined by the frequency of the sound. Pauses are distinguished by the number of silent intervals. Similar spectrum-analysis methods can therefore be used to obtain these voice features. By analyzing these additional voice features, the user information provided at approval time can be cross-checked to confirm that the user is operating in person.
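Two of the non-MFCC features mentioned above (speech rate and pauses) can be sketched from a transcript and an amplitude envelope. The silence level and the minimum silent-run length are illustrative assumptions, and counting characters per minute is one plausible reading of speech rate for Chinese text, not a definition given by the application.

```python
def prosodic_features(amplitudes, transcript, duration_s,
                      silence_level=0.02, min_silence_frames=3):
    """Sketch: speech rate (characters per minute) and pause count,
    where a pause is a sufficiently long run of low-energy frames."""
    speech_rate = len(transcript) / duration_s * 60  # characters per minute
    pauses, run = 0, 0
    for a in amplitudes:
        if abs(a) < silence_level:
            run += 1
        else:
            if run >= min_silence_frames:
                pauses += 1
            run = 0
    if run >= min_silence_frames:
        pauses += 1
    return {"speech_rate": speech_rate, "pauses": pauses}
```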
In one embodiment, after acquiring the user's first input voice at submission time, the method further includes: obtaining the entry field information corresponding to the first input voice through speech recognition; obtaining a picture of the user's identity document and extracting the user's credential information through optical character recognition, where the credential picture refers to a captured picture of the user's ID card; and verifying the corresponding entry field information against the obtained credential information. For example, a text comparison between each entry field and the corresponding credential field yields a similarity score; if the similarity is greater than or equal to a preset similarity threshold, verification passes, and if the similarity is below the threshold, verification fails.
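The field-against-credential comparison can be sketched with a standard string-similarity measure. The use of `difflib.SequenceMatcher` and the 0.8 threshold are assumptions for illustration; the application only specifies a text comparison against a preset similarity threshold.

```python
from difflib import SequenceMatcher

def verify_field(entered, ocr_value, threshold=0.8):
    """Compare an entry field captured by speech recognition against the
    corresponding field extracted from the ID picture by OCR; verification
    passes when the similarity reaches the preset threshold."""
    similarity = SequenceMatcher(None, entered, ocr_value).ratio()
    return similarity >= threshold
```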
It should be noted that the speech recognition and optical character recognition technologies used in this application are existing technologies and are not described further here. By combining speech recognition with image text recognition, this application further strengthens the verification of user information and reduces the likelihood of data-entry errors.
In one embodiment, after acquiring the user's first input voice at submission time, the method further includes: obtaining the entry field information corresponding to the first input voice through speech recognition; and presenting the entry field information as a page, so that the user can check the entered information for omissions and errors. This ensures the correctness of the loan application information and allows the user to apply for a loan, and complete the loan submission, in a more convenient and efficient way.
In one embodiment, after acquiring the user's first input voice at submission time, the method further includes: converting the first input voice into text; performing text-based emotion recognition on the converted text; and judging from the emotion recognition result whether the user is lying. If the user is judged to be lying, the submission is terminated; if not, the method proceeds to the step of acquiring the user's second input voice at approval time. The user is thus pre-screened through text emotion recognition: if the user is lying, there is no need to acquire the second input voice, and the submission ends immediately.
Judging whether the user is lying according to the text emotion recognition result includes: determining whether the user's emotional state when recording the first input voice satisfies a set condition, and if so, considering the user to be lying. For example, when the user's emotions while recording the first input voice are recognized as fluctuating sharply, or as showing panic or alarm, the user is considered to be lying; when the user's emotions are recognized as stable and calm, the user is considered not to be lying. The set condition includes one or more of the following: the speech rate exceeds a first set threshold; the fluctuation of loudness and frequency exceeds a preset range (fluctuation that is either too large or too small may indicate lying); or the number of pauses exceeds a second set threshold. For example, the first set threshold may be 150 characters per minute; when the speech rate exceeds 150 characters per minute, the user's emotions are considered to fluctuate significantly. In other words, the extracted voice features make it possible to detect whether and how emotions fluctuate; the emotional fluctuation here refers to the emotion carried by the input voice, and emotion recognition is performed by converting the input voice into text. By analyzing the acoustic state of the user's recording while the loan information is dictated, the method judges whether the user is lying, adding considerations of speech rate, voice frequency, and emotion that support effective intelligent risk control.
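The set conditions above can be sketched as a simple rule check. The speech-rate threshold of 150 characters per minute comes from the text; the fluctuation range and pause threshold are illustrative assumptions, as is the function name.

```python
def meets_lying_condition(speech_rate, loudness_fluct, pause_count,
                          rate_threshold=150,      # first set threshold, chars/min (from the text)
                          fluct_range=(0.2, 0.8),  # hypothetical preset fluctuation range
                          pause_threshold=10):     # hypothetical second set threshold
    """Flag a recording when any set condition holds: speech rate above the
    first threshold, loudness/frequency fluctuation outside the preset range
    (too large or too small), or pause count above the second threshold."""
    low, high = fluct_range
    return (speech_rate > rate_threshold
            or not (low <= loudness_fluct <= high)
            or pause_count > pause_threshold)
```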
Figure 2 is a schematic diagram of the intelligent loan entry apparatus of this application. As shown in Figure 2, the apparatus includes: a voice acquisition module 1, configured to acquire the user's first input voice at submission time and the user's second input voice at approval time, where "submission" refers to the user providing the materials required for a loan application to a lending institution or banking system, and "approval" refers to the review of those materials after submission; only when approval passes, indicating a successful submission, can the loan be issued to the user. The first input voice refers to the voice information recorded when the user applies for the loan, and the second input voice refers to the voice information recorded while the loan application is being approved. The apparatus further includes: a feature extraction module 2, configured to extract the voice features of the first input voice and the second input voice; a voice analysis module 3, which uses a trained voice analysis model to perform voice analysis on the first input voice and the second input voice; and a first judgment module 4, which judges from the voice analysis result whether the user at submission time and the user at approval time are the same user. If they are the same user, the approval passes and the submission succeeds; if they are not, the approval fails and the submission fails.
The voice analysis model adopts an adversarial neural network model, which includes a generative model and a discriminant model. The generative model is used to generate a voice vector corresponding to the second input voice (where the voice vector is composed of voice feature values), and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user. Specifically, if the output probability obtained from the voice analysis is greater than or equal to a preset probability threshold, the judgment module determines that the submission-time and approval-time users are the same; if the output probability is below the threshold, the judgment module determines that they are not the same user, the submission fails, and the loan cannot be issued.
It should be noted that the content of both the first input voice and the second input voice consists of the information the user fills in when applying for the loan, including personal information such as ID number and residential address, as well as related information such as spouse, contact person, loan purpose, and home or car ownership.
In one embodiment of this application, the intelligent loan entry apparatus further includes a training module for training the voice analysis model. When the voice analysis model adopts an adversarial neural network model, the steps of training the model include: obtaining training samples, where the training samples include the user's first input voice at submission time and the user's second input voice at approval time; and feeding the training samples into the adversarial neural network model for training. During training, the generative model learns the voice features of the first input voice and generates a new voice from the learned features as the voice vector corresponding to the second input voice; this voice vector is used for adversarial training. The discriminant model judges whether the users corresponding to the first and second input voices are the same user and outputs the probability that they are; the larger this probability, the higher the accuracy of the adversarial neural network model. When the accuracy of the discriminant model's output exceeds a preset threshold, training ends. Further, the generative model takes the user's first input voice file and random noise, converts the voice features of the first input voice into a feature table through its fully connected layer, performs deconvolution on the feature data of the feature table through a deconvolution layer, and after multiple deconvolution layers produces output voice features as the voice vector corresponding to the second input voice. The discriminant model applies a convolution layer to the generated voice features, passes the result through a fully connected layer, and finally through an activation function that outputs the probability that the output voice feature data is genuine; the larger this probability, the higher the accuracy of the adversarial neural network model, and training can end when the accuracy exceeds the preset threshold.
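The generator and discriminator structure described above can be sketched as an untrained forward pass. Everything here is an illustrative assumption: the layer sizes, the fixed random weights, and the nearest-neighbour upsampling standing in for a real transposed-convolution (deconvolution) layer. A real implementation would use a deep-learning framework and train the two models adversarially.

```python
import math, random

random.seed(0)

def fully_connected(x, out_dim):
    """Toy fully connected layer with fixed random weights."""
    w = [[random.uniform(-0.1, 0.1) for _ in x] for _ in range(out_dim)]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def deconv_upsample(x, factor=2):
    """Stand-in for a deconvolution (transposed convolution) layer:
    here simply nearest-neighbour upsampling."""
    return [v for v in x for _ in range(factor)]

def generator(first_voice_features, noise):
    """Fully connected layer turns features plus noise into a 'feature table',
    then stacked 'deconvolution' layers expand it into an output voice vector."""
    feature_table = fully_connected(first_voice_features + noise, 8)
    out = deconv_upsample(feature_table)   # first deconvolution layer
    out = deconv_upsample(out)             # second deconvolution layer
    return out                             # voice vector, length 32

def conv1d(x, kernel):
    """Valid-mode 1-D convolution."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def discriminator(voice_vector):
    """Convolution layer, fully connected layer, then a sigmoid activation
    that outputs the probability that the features are genuine."""
    conv_out = conv1d(voice_vector, [0.25, 0.5, 0.25])
    logit = fully_connected(conv_out, 1)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # probability in (0, 1)

fake = generator([0.1] * 8, [random.gauss(0, 1) for _ in range(4)])
p = discriminator(fake)
```

The sketch only demonstrates the data flow (features plus noise, fully connected layer, deconvolution layers, then convolution, fully connected layer, sigmoid); it performs no training.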
When the user's second input voice is collected at approval time, the discriminant model is used to judge whether each feature value of the second input voice is genuine, so as to determine whether the user is the same person. Specifically, the first input voice recorded when the user submitted the application is fed into the generative model to obtain a voice vector corresponding to the second input voice; the resulting voice vector and the second input voice are then both fed into the discriminant model, which determines whether the feature values of the second input voice are genuine, where a feature value refers to an extracted voice feature such as the MFCC coefficients below. The discriminant model outputs the probability that the feature values are genuine (the closer the feature values of the second input voice are to those of the first input voice, the more genuine the second input voice is considered to be). When the output probability exceeds the preset probability threshold, the user who recorded the second input voice is deemed to be the same user who recorded the first input voice.
In this application, the feature extraction module may use the same method to extract the voice features of the first input voice and the second input voice. In one embodiment, Mel Frequency Cepstrum Coefficient (MFCC) voice features are used to sample and analyze the data, and the voice features are extracted by means of spectrograms, cepstrum analysis, Mel frequency analysis, and Mel frequency cepstrum coefficients.
Taking the extraction of voice features from the first input voice as an example, the specific steps of extracting MFCC voice features are as follows: pre-process the first input voice, where the pre-processing includes pre-emphasis, framing, and windowing; obtain the FFT spectrum corresponding to each short-time analysis window through a fast Fourier transform (FFT); obtain the Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and perform cepstrum analysis on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC). The cepstrum analysis consists of taking the logarithm of the Mel spectrum and then applying an inverse Fourier transform, which is implemented as a discrete cosine transform (DCT); the 2nd through 13th DCT coefficients are taken as the MFCC coefficients.
It should be noted that similar methods can be used to extract various features of the voice, not only the MFCC coefficients; examples include speech rate, loudness, pitch, and pauses. Loudness is related to frequency and is expressed on a logarithmic scale as the loudness level, whose unit is the phon; the correspondence between loudness and frequency/sound level is computed from the equal-loudness-contour formula. Pitch is defined by the frequency of the sound. Pauses are distinguished by the number of silent intervals. Similar spectrum-analysis methods can therefore be used to obtain these voice features.
In one embodiment, the intelligent loan entry apparatus further includes: a speech recognition module, which obtains the entry field information corresponding to the first input voice through speech recognition, so that the user only needs an effective voice interaction to complete field input. This facilitates entering fields in bulk and completely decouples loan submission from typing, so that the ability to type is no longer a barrier to applying for a loan. A text recognition module obtains a picture of the user's identity document and extracts the user's credential information through optical character recognition, where the credential picture refers to a captured picture of the user's ID card. A verification module verifies the corresponding entry field information against the obtained credential information, ensuring the accuracy and reliability of the entered information. For example, a text comparison between each entry field and the corresponding credential field yields a similarity score; if the similarity is greater than or equal to a preset similarity threshold, verification passes, and if it is below the threshold, verification fails.
In one embodiment, the intelligent loan entry apparatus further includes: a speech recognition module, which obtains the entry field information corresponding to the first input voice through speech recognition; and a page display module, which presents the entry field information as a page so that the user can check the entered information for omissions and errors, ensuring the correctness of the loan application information and allowing the loan submission to be completed in a more convenient and efficient way.
In one embodiment, the intelligent loan entry apparatus further includes: a text conversion module, configured to convert the first input voice into text; an emotion recognition module, configured to perform text-based emotion recognition on the converted text; and a second judgment module, configured to judge from the emotion recognition result whether the user is lying. If the user is judged to be lying, the submission is terminated; if not, the step of acquiring the user's second input voice at approval time is performed. The user is thus pre-screened through text emotion recognition: if the user is lying, there is no need to acquire the second input voice, and the submission ends immediately. The specific way of judging whether the user is lying from the emotion recognition result is substantially the same as in the intelligent loan entry method described above and is not repeated here.
It should be noted that the specific implementation of the intelligent loan entry apparatus of this application is substantially the same as that of the intelligent loan entry method described above and is not repeated here.
The intelligent loan entry method of this application is applied to an electronic device, which may be a terminal device such as a television, a smartphone, a tablet computer, or a computer.
The electronic device includes a processor and a memory storing an intelligent loan entry program. The processor executes the program to implement the following intelligent loan entry method: acquire the user's first input voice at submission time and the user's second input voice at approval time; extract the voice features of the first input voice and the second input voice; and use a trained voice analysis model to perform voice analysis on the first and second input voices and determine whether the user at submission time and the user at approval time are the same user. If they are the same user, the approval passes and the submission succeeds; if not, the approval fails and the submission fails. The voice analysis model adopts an adversarial neural network model that includes a generative model and a discriminant model; the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
The electronic device also includes a network interface, a communication bus, and the like. The network interface may include a standard wired interface and a wireless interface, and the communication bus is used to implement connection and communication between the components.
The memory includes at least one type of readable storage medium, which may be a non-volatile storage medium such as flash memory, a hard disk, or an optical disc, or a plug-in hard disk, among others; it is not limited to these, and may be any device that stores instructions or software and any associated data files in a non-transitory manner and provides instructions or software programs to the processor so that the processor can execute them. In this application, the software stored in the memory includes the intelligent loan entry program, which the memory can provide to the processor so that the processor can execute it and implement the intelligent loan entry method.
The processor may be a central processing unit, a microprocessor, or another data processing chip, and can run programs stored in the memory, for example the intelligent loan entry program of this application.
The electronic device may also include a display, which may also be called a display screen or display unit. In some embodiments the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display is used to show the information processed in the electronic device and to present a visual working interface.
The electronic device may further include a user interface, which may include an input unit (such as a keyboard) and a voice output device (such as speakers or earphones).
It should be noted that the specific implementation of the electronic device of this application is substantially the same as that of the intelligent loan entry method and apparatus described above and is not repeated here.
In other embodiments, the intelligent loan entry program may also be divided into one or more modules, which are stored in the memory and executed by the processor to implement this application. A module in this application refers to a series of computer program instruction segments capable of performing a specific function. For example, the intelligent loan entry program may be divided into: a voice acquisition module 1, a feature extraction module 2, a voice analysis module 3, and a first judgment module 4. The functions or operation steps implemented by these modules are similar to those described above and are not detailed here.
In one embodiment of this application, the computer-readable storage medium may be any tangible medium that contains or stores programs or instructions; it may be non-volatile or volatile, the programs in it can be executed, and the corresponding functions are implemented by hardware driven by the stored program instructions. For example, the computer-readable storage medium may be a computer diskette, a hard disk, a random access memory, or a read-only memory. This application is not limited to these; the medium may be any device that stores instructions or software and any related data files or data structures in a non-transitory manner and can provide them to a processor so that the processor executes the programs or instructions therein. The computer-readable storage medium contains an intelligent loan entry program which, when executed by a processor, implements the following intelligent loan entry method: acquire the user's first input voice at submission time and the user's second input voice at approval time; extract the voice features of the first input voice and the second input voice; and use a trained voice analysis model to perform voice analysis on the first and second input voices and determine whether the user at submission time and the user at approval time are the same user. If they are the same user, the approval passes and the submission succeeds; if not, the approval fails and the submission fails. The voice analysis model adopts an adversarial neural network model that includes a generative model and a discriminant model; the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
The specific implementation of the computer-readable storage medium of this application is substantially the same as that of the intelligent loan entry method and apparatus described above and is not repeated here.
It should be noted that in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes it.
The serial numbers of the above embodiments of this application are for description only, do not indicate the relative merits of the embodiments, and do not limit the patent scope of this application. Any equivalent structural or process transformation made using the contents of the specification and drawings of this application, or any direct or indirect use in other related technical fields, is likewise included within the scope of patent protection of this application. From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.

Claims (20)

1. An intelligent loan entry method, applied to an electronic device, comprising: acquiring a user's first input voice at submission time and the user's second input voice at approval time; extracting voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice using a trained voice analysis model to determine whether the user at submission time and the user at approval time are the same user; if the user at submission time and the user at approval time are the same user, the approval passes and the submission succeeds; if they are not the same user, the approval fails and the submission fails; wherein the voice analysis model adopts an adversarial neural network model comprising a generative model and a discriminant model, the generative model being used to generate a voice vector corresponding to the second input voice, and the discriminant model being used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
  2. The intelligent loan entry method according to claim 1, further comprising: training the adversarial neural network model; wherein the step of training the adversarial neural network model comprises: acquiring training samples, the training samples comprising a first input voice of a user at entry and a second input voice of the user at approval; inputting the training samples into the adversarial neural network model for training, wherein the generative model learns voice features of the first input voice and generates a voice vector corresponding to the second input voice, the voice vector being used for adversarial training, and the discriminative model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that the users corresponding to the first input voice and the second input voice are the same user; and ending the training when the accuracy of the output of the discriminative model exceeds a preset threshold.
  3. The intelligent loan entry method according to claim 2, wherein the step of the generative model learning voice features of the first input voice and generating a voice vector corresponding to the second input voice comprises: inputting the first input voice into the generative model; converting the voice features of the first input voice into a feature table through a fully connected layer of the generative model; and performing deconvolution operations on the voice feature data of the feature table, generating an output voice feature through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  4. The intelligent loan entry method according to claim 3, wherein the step of the discriminative model judging whether the users corresponding to the first input voice and the second input voice are the same user comprises: performing a convolution operation on the output voice feature through convolutional layers; processing the convolution result through a fully connected layer; and outputting, through an activation function, the probability that the output voice feature is real or fake.
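Claims 3 and 4 describe the data flow of the generative and discriminative models fairly concretely: a fully connected layer producing a feature table followed by stacked deconvolution (transposed-convolution) layers on the generator side, and convolutional layers followed by a fully connected layer and an activation function on the discriminator side. A minimal NumPy sketch of that flow, using toy dimensions and untrained random weights chosen purely for illustration (the patent specifies no layer sizes), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def deconv1d(x, w, stride=2):
    """1-D transposed convolution: each input sample scatters a scaled copy of w."""
    out = np.zeros(stride * (len(x) - 1) + len(w))
    for i, v in enumerate(x):
        out[i * stride:i * stride + len(w)] += v * w
    return out

def conv1d(x, w, stride=2):
    """Plain strided 1-D convolution (valid padding)."""
    n = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(w)], w) for i in range(n)])

def generator(features, W_fc, deconv_kernels):
    # Fully connected layer turns the extracted voice features into a "feature table"
    h = np.tanh(W_fc @ features)
    # Stacked deconvolution layers upsample the table into an output voice feature
    for w in deconv_kernels:
        h = np.tanh(deconv1d(h, w))
    return h

def discriminator(voice_feature, conv_kernels, w_fc):
    # Convolutional layers, then a fully connected layer, then a sigmoid activation
    h = voice_feature
    for w in conv_kernels:
        h = np.tanh(conv1d(h, w))
    return 1.0 / (1.0 + np.exp(-np.dot(w_fc, h)))  # probability the feature is "real"

# Toy dimensions; all weights are random placeholders, not a trained model.
feat = rng.standard_normal(8)                       # MFCC-style feature vector
W_fc = rng.standard_normal((16, 8)) * 0.1
g_kernels = [rng.standard_normal(4) * 0.1 for _ in range(2)]
d_kernels = [rng.standard_normal(4) * 0.1 for _ in range(2)]
w_fc = rng.standard_normal(16) * 0.1

voice_vec = generator(feat, W_fc, g_kernels)        # length 16 -> 34 -> 70
p_real = discriminator(voice_vec, d_kernels, w_fc)
```

In adversarial training, the discriminator's output probability would be used to update both networks; here the weights are fixed and only the layer shapes from the claims are exercised.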
  5. The intelligent loan entry method according to claim 1, wherein, when extracting the voice features of the first input voice and the second input voice, Mel-frequency cepstral coefficient (MFCC) voice features are used for sampling and analysis, and the voice features are extracted by means of spectrograms, cepstral analysis, Mel-frequency analysis, and Mel-frequency cepstral coefficients.
  6. The intelligent loan entry method according to claim 5, wherein the step of extracting the MFCC voice features comprises: preprocessing the input voice, the preprocessing including pre-emphasis, framing, and windowing; obtaining an FFT spectrum corresponding to each short-time analysis window through a fast Fourier transform; obtaining a Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients.
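The MFCC pipeline in claim 6 (pre-emphasis, framing, windowing, FFT, Mel filter bank, cepstral analysis) is a standard construction and can be sketched end-to-end in NumPy. All numeric parameters below (frame length, hop, filter count, coefficient count) are illustrative defaults, not values from the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_mels=26, n_ceps=13, pre_emph=0.97):
    # Pre-emphasis
    sig = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # Framing into overlapping short-time analysis windows
    n_frames = 1 + (len(sig) - frame_len) // hop
    frames = np.stack([sig[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # Windowing (Hamming)
    frames = frames * np.hamming(frame_len)
    # FFT power spectrum per window
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular Mel filter bank
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    mel_energy = power @ fbank.T
    mel_energy = np.where(mel_energy == 0, np.finfo(float).eps, mel_energy)
    # Cepstral analysis: log, then DCT-II, keeping the first n_ceps coefficients
    log_mel = np.log(mel_energy)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

sig = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)  # 1 s, 440 Hz tone
feats = mfcc(sig)  # one 13-coefficient vector per frame
```

Production systems typically use a library implementation (e.g. librosa or python_speech_features) rather than hand-rolled filter banks; this sketch only mirrors the steps enumerated in the claim.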
  7. The intelligent loan entry method according to claim 1, wherein, after acquiring the first input voice of the user at entry, the method further comprises: obtaining entry field information corresponding to the first input voice through speech recognition; obtaining a picture of the user's identity document and obtaining the user's document information through optical character recognition; and verifying the corresponding entry field information against the obtained document information.
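The verification step in claim 7 amounts to cross-checking two sets of extracted fields. A trivial sketch, assuming both speech recognition and OCR return plain key-value dictionaries (the patent does not specify a data format):

```python
def verify_fields(voice_fields: dict, id_fields: dict):
    """Compare speech-recognized entry fields against OCR'd ID-document fields.

    Returns (all_match, mismatches) where mismatches maps each conflicting
    field name to its (voice value, document value) pair.
    """
    mismatches = {
        k: (voice_fields[k], id_fields[k])
        for k in id_fields
        if k in voice_fields and voice_fields[k] != id_fields[k]
    }
    return len(mismatches) == 0, mismatches

ok, diff = verify_fields({"name": "Zhang San", "id_no": "123"},
                         {"name": "Zhang San", "id_no": "124"})
```

Fields present on the document but absent from the spoken entry are simply skipped here; a real system would likely flag them for manual review.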
  8. The intelligent loan entry method according to claim 1, wherein, after acquiring the first input voice of the user at entry, the method further comprises: obtaining entry field information corresponding to the first input voice through speech recognition; and displaying the entry field information in the form of a page.
  9. The intelligent loan entry method according to claim 1, wherein, after acquiring the first input voice of the user at entry, the method further comprises: converting the first input voice into text; performing text emotion recognition on the converted text; and judging, according to the text emotion recognition result, whether the user is lying: if it is determined that the user is lying, the entry process ends; if it is determined that the user is not lying, the step of acquiring the second input voice of the user at approval is performed.
  10. The intelligent loan entry method according to claim 1, wherein judging, according to the text emotion recognition result, whether the user is lying comprises: judging whether the user's emotion when recording the first input voice satisfies a set condition, and if the set condition is satisfied, determining that the user is lying, wherein the set condition includes one or more of: the speech rate exceeding a first set threshold, the fluctuation of the loudness frequency exceeding a preset fluctuation range, and the number of speech pauses exceeding a second set threshold.
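The set conditions in claim 10 reduce to simple threshold checks over measured speech statistics. A sketch with placeholder thresholds (the patent gives no numeric values; every default below is an assumption for illustration):

```python
def seems_deceptive(speech_rate, loudness_freqs, pause_count,
                    rate_threshold=5.0, fluctuation_range=40.0, pause_threshold=8):
    """Flag the utterance if any of the claim's set conditions holds.

    speech_rate    -- e.g. syllables per second
    loudness_freqs -- sequence of loudness-frequency measurements over the utterance
    pause_count    -- number of detected speech pauses
    """
    conditions = [
        speech_rate > rate_threshold,                                  # rate above first threshold
        max(loudness_freqs) - min(loudness_freqs) > fluctuation_range, # fluctuation beyond range
        pause_count > pause_threshold,                                 # pauses above second threshold
    ]
    return any(conditions)
```

Per claim 9, a `True` result would end the entry process; `False` lets the flow proceed to acquiring the second input voice at approval.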
  11. An electronic device, comprising a processor and a memory, the memory storing an intelligent loan entry program which, when executed by the processor, implements the following intelligent loan entry method: acquiring a first input voice of a user at entry and a second input voice of the user at approval; extracting voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice using a trained voice analysis model to determine whether the user at entry and the user at approval are the same user; if the user at entry and the user at approval are the same user, the approval passes and the user's entry succeeds; if the user at entry and the user at approval are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model is an adversarial neural network model comprising a generative model and a discriminative model, the generative model being used to generate a voice vector corresponding to the second input voice, and the discriminative model being used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
  12. The electronic device according to claim 11, wherein the intelligent loan entry program, when executed by the processor, further implements the step of training the adversarial neural network model, the step comprising: acquiring training samples, the training samples comprising a first input voice of a user at entry and a second input voice of the user at approval; inputting the training samples into the adversarial neural network model for training, wherein the generative model learns voice features of the first input voice and generates a voice vector corresponding to the second input voice, the voice vector being used for adversarial training, and the discriminative model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that the users corresponding to the first input voice and the second input voice are the same user; and ending the training when the accuracy of the output of the discriminative model exceeds a preset threshold.
  13. The electronic device according to claim 12, wherein the step of the generative model learning voice features of the first input voice and generating a voice vector corresponding to the second input voice comprises: inputting the first input voice into the generative model; converting the voice features of the first input voice into a feature table through a fully connected layer of the generative model; and performing deconvolution operations on the voice feature data of the feature table, generating an output voice feature through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  14. The electronic device according to claim 13, wherein the step of the discriminative model judging whether the users corresponding to the first input voice and the second input voice are the same user comprises: performing a convolution operation on the output voice feature through convolutional layers; processing the convolution result through a fully connected layer; and outputting, through an activation function, the probability that the output voice feature is real or fake.
  15. The electronic device according to claim 11, wherein, after acquiring the first input voice of the user at entry, the method further comprises: converting the first input voice into text; performing text emotion recognition on the converted text; and judging, according to the text emotion recognition result, whether the user is lying: if it is determined that the user is lying, the entry process ends; if it is determined that the user is not lying, the step of acquiring the second input voice of the user at approval is performed.
  16. A computer-readable storage medium, storing an intelligent loan entry program which, when executed by a processor, implements the following intelligent loan entry method: acquiring a first input voice of a user at entry and a second input voice of the user at approval; extracting voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice using a trained voice analysis model to determine whether the user at entry and the user at approval are the same user; if the user at entry and the user at approval are the same user, the approval passes and the user's entry succeeds; if the user at entry and the user at approval are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model is an adversarial neural network model comprising a generative model and a discriminative model, the generative model being used to generate a voice vector corresponding to the second input voice, and the discriminative model being used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
  17. The computer-readable storage medium according to claim 16, wherein the intelligent loan entry program, when executed by the processor, further implements the step of training the adversarial neural network model, the step comprising: acquiring training samples, the training samples comprising a first input voice of a user at entry and a second input voice of the user at approval; inputting the training samples into the adversarial neural network model for training, wherein the generative model learns voice features of the first input voice and generates a voice vector corresponding to the second input voice, the voice vector being used for adversarial training, and the discriminative model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that the users corresponding to the first input voice and the second input voice are the same user; and ending the training when the accuracy of the output of the discriminative model exceeds a preset threshold.
  18. The computer-readable storage medium according to claim 17, wherein the step of the generative model learning voice features of the first input voice and generating a voice vector corresponding to the second input voice comprises: inputting the first input voice into the generative model; converting the voice features of the first input voice into a feature table through a fully connected layer of the generative model; and performing deconvolution operations on the voice feature data of the feature table, generating an output voice feature through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  19. The computer-readable storage medium according to claim 18, wherein the step of the discriminative model judging whether the users corresponding to the first input voice and the second input voice are the same user comprises: performing a convolution operation on the output voice feature through convolutional layers; processing the convolution result through a fully connected layer; and outputting, through an activation function, the probability that the output voice feature is real or fake.
  20. An intelligent loan entry apparatus, comprising: a voice acquisition module, configured to acquire a first input voice of a user at entry and a second input voice of the user at approval; a feature extraction module, configured to extract voice features of the first input voice and the second input voice; a voice analysis module, configured to perform voice analysis on the first input voice and the second input voice using a trained voice analysis model; and a first judgment module, configured to judge, according to the voice analysis result, whether the user at entry and the user at approval are the same user: if the user at entry and the user at approval are the same user, the approval passes and the user's entry succeeds; if the user at entry and the user at approval are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model is an adversarial neural network model comprising a generative model and a discriminative model, the generative model being used to generate a voice vector corresponding to the second input voice, and the discriminative model being used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
PCT/CN2020/103931 2020-04-02 2020-07-24 Intelligent loan entry method, and apparatus and storage medium WO2021196458A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010254541.2A CN111583935A (en) 2020-04-02 2020-04-02 Loan intelligent delivery method, device and storage medium
CN202010254541.2 2020-04-02

Publications (1)

Publication Number Publication Date
WO2021196458A1 true WO2021196458A1 (en) 2021-10-07

Family

ID=72112451

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/103931 WO2021196458A1 (en) 2020-04-02 2020-07-24 Intelligent loan entry method, and apparatus and storage medium

Country Status (2)

Country Link
CN (1) CN111583935A (en)
WO (1) WO2021196458A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113488069B (en) * 2021-07-06 2024-05-24 浙江工业大学 Speech high-dimensional characteristic rapid extraction method and device based on generation type countermeasure network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8600798B1 (en) * 2007-09-21 2013-12-03 Ellie Mae, Inc. Loan screening
CN107977776A (en) * 2017-11-14 2018-05-01 重庆小雨点小额贷款有限公司 Information processing method, device, server and computer-readable recording medium
CN109325742A (en) * 2018-09-26 2019-02-12 平安普惠企业管理有限公司 Business approval method, apparatus, computer equipment and storage medium
CN109360571A (en) * 2018-10-31 2019-02-19 深圳壹账通智能科技有限公司 Processing method and processing device, storage medium, the computer equipment of credit information
CN109473108A (en) * 2018-12-15 2019-03-15 深圳壹账通智能科技有限公司 Auth method, device, equipment and storage medium based on Application on Voiceprint Recognition
CN110443692A (en) * 2019-07-04 2019-11-12 平安科技(深圳)有限公司 Enterprise's credit authorization method, apparatus, equipment and computer readable storage medium
CN110675881A (en) * 2019-09-05 2020-01-10 北京捷通华声科技股份有限公司 Voice verification method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7242752B2 (en) * 2001-07-03 2007-07-10 Apptera, Inc. Behavioral adaptation engine for discerning behavioral characteristics of callers interacting with an VXML-compliant voice application
KR20140034958A (en) * 2012-09-10 2014-03-21 (주)크리젠솔루션 System for mamaging loans data using mobile applications
CN105513597B (en) * 2015-12-30 2018-07-10 百度在线网络技术(北京)有限公司 Voiceprint processing method and processing device
CN110010133A (en) * 2019-03-06 2019-07-12 平安科技(深圳)有限公司 Vocal print detection method, device, equipment and storage medium based on short text
CN110379441B (en) * 2019-07-01 2020-07-17 特斯联(北京)科技有限公司 Voice service method and system based on countermeasure type artificial intelligence network
CN110738998A (en) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 Voice-based personal credit evaluation method, device, terminal and storage medium


Also Published As

Publication number Publication date
CN111583935A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
WO2018166187A1 (en) Server, identity verification method and system, and a computer-readable storage medium
WO2021208287A1 (en) Voice activity detection method and apparatus for emotion recognition, electronic device, and storage medium
WO2020177380A1 (en) Voiceprint detection method, apparatus and device based on short text, and storage medium
JP6621536B2 (en) Electronic device, identity authentication method, system, and computer-readable storage medium
TWI527023B (en) A voiceprint recognition method and apparatus
CN109493872B (en) Voice information verification method and device, electronic equipment and storage medium
JP6096333B2 (en) Method, apparatus and system for verifying payment
US8548818B2 (en) Method and system for authenticating customer identities
WO2019179029A1 (en) Electronic device, identity verification method and computer-readable storage medium
WO2021047319A1 (en) Voice-based personal credit assessment method and apparatus, terminal and storage medium
CN112562691A (en) Voiceprint recognition method and device, computer equipment and storage medium
WO2020238046A1 (en) Human voice smart detection method and apparatus, and computer readable storage medium
CN113177850A (en) Method and device for multi-party identity authentication of insurance
WO2021196458A1 (en) Intelligent loan entry method, and apparatus and storage medium
WO2021128847A1 (en) Terminal interaction method and apparatus, computer device, and storage medium
CN112201254A (en) Non-sensitive voice authentication method, device, equipment and storage medium
CN116312559A (en) Training method of cross-channel voiceprint recognition model, voiceprint recognition method and device
CN113035230B (en) Authentication model training method and device and electronic equipment
CN112992155B (en) Far-field voice speaker recognition method and device based on residual error neural network
CN115242927A (en) Customer service object distribution method and device, computer equipment and storage medium
Lopez‐Otero et al. Influence of speaker de‐identification in depression detection
CN114171032A (en) Cross-channel voiceprint model training method, recognition method, device and readable medium
CN113436633B (en) Speaker recognition method, speaker recognition device, computer equipment and storage medium
TW201944320A (en) Payment authentication method, device, equipment and storage medium
Moreno-Rodriguez et al. Bimodal biometrics using EEG-voice fusion at score level based on hidden Markov models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928894

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20928894

Country of ref document: EP

Kind code of ref document: A1