WO2021196458A1 - Intelligent loan entry method, and apparatus and storage medium - Google Patents


Info

Publication number
WO2021196458A1
WO2021196458A1 (PCT/CN2020/103931)
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice
input
input voice
model
Prior art date
Application number
PCT/CN2020/103931
Other languages
French (fr)
Chinese (zh)
Inventor
张山 (Zhang Shan)
余自雷 (Yu Zilei)
Original Assignee
深圳壹账通智能科技有限公司 (Shenzhen OneConnect Smart Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 (Shenzhen OneConnect Smart Technology Co., Ltd.)
Publication of WO2021196458A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques specially adapted for estimating an emotional state

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device and storage medium for intelligent loan entry.
  • Loan users usually have to type the fields required by a loan product into its app one by one, often dozens or even hundreds of fields, which consumes considerable time and energy. Moreover, because of differences in educational background, some loan users type slowly; having to enter dozens or even hundreds of fields by hand is plainly inefficient.
  • The inventor realized that this manual entry of a large number of user information fields is not only inefficient but also raises the barrier to loan application for users who are willing to borrow yet not good at typing. Moreover, field entry alone makes it difficult to verify a user's real information across multiple dimensions.
  • This application provides a method, an apparatus, and a storage medium for intelligent loan entry, to solve the prior-art problem that it is difficult to judge a user's real information across multiple dimensions.
  • The first aspect of this application provides a method for intelligent loan entry, which includes: acquiring the user's first input voice during entry and the user's second input voice during approval; extracting the voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice with a trained voice analysis model to determine whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if they are not the same user, the approval is not passed and the user's entry fails.
  • The voice analysis model adopts an adversarial neural network model that includes a generation model and a discriminant model: the generation model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
  • A second aspect of the present application provides an electronic device comprising a processor and a memory, where the memory stores an intelligent loan entry program. When executed by the processor, the program implements the following intelligent loan entry method: acquiring the user's first input voice during entry and the user's second input voice during approval; extracting the voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice with a trained voice analysis model to determine whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if not, the approval is not passed and the user's entry fails. The voice analysis model adopts an adversarial neural network model, which includes a generation model and a discriminant model.
  • A third aspect of the present application provides a computer-readable storage medium.
  • The computer-readable storage medium stores an intelligent loan entry program. When executed by a processor, the program implements the following intelligent loan entry method: acquiring the user's first input voice during entry and the user's second input voice during approval; extracting the voice features of the first input voice and the second input voice;
  • performing voice analysis on the first input voice and the second input voice with the trained voice analysis model to determine whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if not, the approval is not passed and the user's entry fails. The voice analysis model adopts an adversarial neural network model, which includes a generation model and a discriminant model.
  • A fourth aspect of the present application provides an intelligent loan entry apparatus, including: a voice acquisition module for acquiring the user's first input voice during entry and the user's second input voice during approval;
  • a feature extraction module for extracting the voice features of the first input voice and the second input voice;
  • a voice analysis module that uses the trained voice analysis model to perform voice analysis on the first input voice and the second input voice;
  • a first judgment module that judges, according to the voice analysis result, whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if not, the approval is not passed and the user's entry fails;
  • wherein the voice analysis model adopts an adversarial neural network model that includes a generation model and a discriminant model, the generation model being used to generate a voice vector corresponding to the second input voice, and the discriminant model being used to determine the probability that the two voices come from the same user.
  • This application uses artificial intelligence to approve and review loan entry, processing the user's voice with a neural network. Specifically, through the adversarial neural network model, the voices entered by the user at entry and at approval are analyzed, and the user information is reviewed and compared, in order to confirm whether the person operating is the user himself and to judge the user information across multiple dimensions.
  • This application uses voice interaction so that users fill in the information required for the loan while communicating, which effectively prevents customers from developing irritation and other negative emotions, lowers the technical barrier to loan application, and reduces the number of customers who give up on a loan because they type slowly or cannot type.
  • This application uses voice as the source of loan information, which adds the user's speech-emotion judgment during voice entry and adds data such as the user's speech rate and voice frequency; these can serve as intelligent risk-control means for judging the user's authenticity and validity.
  • Figure 1 is a schematic diagram of the process of the intelligent loan entry method described in this application.
  • Figure 2 is a schematic diagram of the intelligent loan entry device in the application.
  • FIG. 1 is a schematic flowchart of the intelligent loan entry method of this application.
  • The intelligent loan entry method includes the following steps. Step S1: obtain the user's first input voice during entry and the user's second input voice during approval. Here, entry refers to the materials required for the loan application that the user submits to the lending institution or banking system when taking a loan, and approval refers to the review of the submitted materials after entry; only if the approval is passed, meaning the entry succeeds, can the loan be issued to the user.
  • The first input voice refers to the voice information entered when the user applies for the loan, and the second input voice refers to the voice information entered by the user when the loan application is being approved.
  • Step S2: extract the voice features of the first input voice and the second input voice.
  • Step S3: using the voice analysis model obtained after training, perform voice analysis on the first input voice and the second input voice, and determine whether the user at the time of entry and the user at the time of approval are the same user.
  • The voice analysis model adopts an adversarial neural network model.
  • The adversarial neural network model includes a generation model and a discriminant model.
  • The generation model is used to generate a voice vector corresponding to the second input voice (where the voice vector is composed of voice feature values).
  • The discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user. Specifically, if the output probability is greater than or equal to a preset probability threshold, it is determined that the entry and the approval were made by the same user; if the output probability is less than the preset probability threshold, it is judged that the entry and the approval were not made by the same user, the entry is unsuccessful, and the loan cannot be issued.
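  • As a minimal illustration of the thresholding described above (the function name and the 0.8 threshold value are illustrative assumptions, not values specified by this application):

```python
def same_user(prob: float, threshold: float = 0.8) -> bool:
    """Return True when the discriminant model's output probability meets
    the preset threshold, i.e. entry and approval are judged the same user."""
    return prob >= threshold

# Entry succeeds only when the probability clears the threshold.
assert same_user(0.93) is True   # approval passed, entry succeeds
assert same_user(0.40) is False  # approval not passed, entry fails
```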
  • The contents of the first input voice and the second input voice are the various items of information that the user fills in when applying for a loan, including personal information such as ID number, residential address, and spouse, as well as related information such as contact person, loan intention, and whether the user owns a house or a car.
  • This application uses the adversarial neural network model to perform voice analysis on the voices entered by the user at entry and at approval, adding a judgment dimension to the approval of the user's entry, so as to determine whether the entry and the approval were made by the same user, thereby judging the user's authenticity and realizing intelligent entry.
  • The intelligent loan entry method further includes a step of training the voice analysis model.
  • The step of training the adversarial neural network model includes: obtaining training samples, where the training samples include the user's first input voice during entry and the user's second input voice during approval; and inputting the training samples into the adversarial neural network model for training. The generation model learns voice features from the first input voice and generates a new voice from the learned features as the voice vector corresponding to the second input voice; this voice vector is used for adversarial training. The discriminant model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that they are; the greater the probability, the higher the accuracy of the adversarial neural network model. When the accuracy of the discriminant model's output exceeds a preset threshold, the training ends.
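  • The training procedure above can be sketched with toy stand-ins for the two sub-models. Everything here (feature dimensionality, noise scale, logistic discriminator, learning rate, accuracy threshold) is an illustrative assumption, not the patent's actual networks; the point is the loop shape: generate a candidate voice vector, score pairs with the discriminant model, and stop when accuracy exceeds a preset threshold.

```python
import math
import random

random.seed(0)
DIM = 8  # illustrative feature dimensionality

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-x))

def generate(entry_voice, noise_scale=0.05):
    """Toy 'generation model': perturb the entry-time feature vector with
    noise, standing in for the voice vector generated for the approval voice."""
    return [f + random.gauss(0.0, noise_scale) for f in entry_voice]

def discriminate(w, b, v1, v2):
    """Toy 'discriminant model': logistic score on per-dimension absolute
    differences; an output near 1 means 'probably the same user'."""
    z = b - sum(wi * abs(a - c) for wi, a, c in zip(w, v1, v2))
    return sigmoid(z)

# Training samples: (entry voice, approval voice, same-user label) triples.
pairs = []
for _ in range(60):
    entry = [random.random() for _ in range(DIM)]
    pairs.append((entry, generate(entry), 1))                        # same user
    pairs.append((entry, [random.random() for _ in range(DIM)], 0))  # different user

# Train the discriminator by gradient descent on logistic loss, and end
# training once its accuracy exceeds the preset threshold (as in the text).
w, b, lr, accuracy = [0.0] * DIM, 0.0, 0.5, 0.0
for epoch in range(500):
    for v1, v2, y in pairs:
        d = [abs(a - c) for a, c in zip(v1, v2)]
        p = sigmoid(b - sum(wi * di for wi, di in zip(w, d)))
        w = [wi + lr * (p - y) * di for wi, di in zip(w, d)]
        b -= lr * (p - y)
    accuracy = sum(
        (discriminate(w, b, v1, v2) >= 0.5) == (y == 1) for v1, v2, y in pairs
    ) / len(pairs)
    if accuracy >= 0.95:  # preset accuracy threshold ends training
        break
```

A real implementation would replace these stand-ins with the convolution/deconvolution networks described later in the text.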
  • The generation model receives random noise along with the user's first input voice file; it converts the voice features of the first input voice into a feature table through its fully connected layer, performs a deconvolution operation on the feature table's voice feature data through a deconvolution layer, and generates output voice features through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  • In the discriminant model, the output voice features of the generation model are convolved, then processed through a fully connected layer, and finally sent to an activation function, which outputs the probability that the output voice feature data is true or false. The greater the probability, the higher the accuracy of the adversarial neural network model; when the accuracy exceeds the preset threshold, the training can end.
  • The authenticity of each feature value of the user's second input voice is judged through the discriminant model, so as to judge whether the user is the person himself.
  • The first input voice entered when the user applies is input into the generation model to obtain the voice vector corresponding to the second input voice, and both the obtained voice vector and the second input voice are input into the discriminant model to judge whether each feature value of the second input voice is true or false, where a feature value refers to an extracted voice feature, such as the MFCC coefficients below.
  • The discriminant model outputs the probability that a feature value is true (the closer the feature values of the second input voice are to those of the first input voice, the more the second input voice is considered true). When the output probability value is greater than the preset probability threshold, the user who entered the second input voice and the user who entered the first input voice are considered the same user.
  • the same method can be used to extract the voice features of the first input voice and the second input voice.
  • For example, the Mel frequency cepstrum coefficients (MFCC) can be used as voice features.
  • To obtain them, the voice data is sampled and analyzed, and the voice features are extracted by means of the spectrogram, cepstrum analysis, Mel frequency analysis, and the Mel frequency cepstrum coefficients.
  • The specific steps of extracting the MFCC voice features include: pre-processing the first input voice, the pre-processing including pre-emphasis, framing, and windowing; obtaining the FFT spectrum corresponding to each short-time analysis window through the fast Fourier transform (FFT); obtaining the Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and performing cepstrum analysis on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC), where cepstrum analysis consists of taking the logarithm of the Mel spectrum and then applying the inverse Fourier transform.
  • The inverse Fourier transform is implemented with the discrete cosine transform (DCT), and the second to thirteenth coefficients after the DCT are taken as the MFCC coefficients.
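  • The MFCC pipeline above can be sketched for a single frame as follows. The sample rate, frame length, and filter count are illustrative assumptions; a naive O(n^2) DFT keeps the sketch dependency-free, whereas a real system would use an FFT library. The DCT step keeps coefficients 2 through 13, as stated in the text.

```python
import math

def mfcc_frame(frame, sample_rate=8000, n_filters=20, n_coeffs=12):
    """Pre-emphasis -> Hamming window -> power spectrum -> Mel filter
    bank -> log -> DCT-II, keeping coefficients 2..13."""
    # Pre-emphasis: y[n] = x[n] - 0.97 * x[n-1]
    x = [frame[0]] + [frame[i] - 0.97 * frame[i - 1] for i in range(1, len(frame))]
    # Hamming window
    n = len(x)
    x = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(x)]
    # Power spectrum via a naive DFT (first half of the bins)
    half = n // 2 + 1
    power = []
    for k in range(half):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(x))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(x))
        power.append((re * re + im * im) / n)
    # Triangular Mel filter bank, equally spaced on the Mel scale
    def hz_to_mel(f): return 2595.0 * math.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10 ** (m / 2595.0) - 1.0)
    top = hz_to_mel(sample_rate / 2)
    bins = [int((n + 1) * mel_to_hz(i * top / (n_filters + 1)) / sample_rate)
            for i in range(n_filters + 2)]
    log_mel = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        energy = 0.0
        for k in range(left, min(right, half)):
            if centre > left and k < centre:
                energy += power[k] * (k - left) / (centre - left)
            elif right > centre and k >= centre:
                energy += power[k] * (right - k) / (right - centre)
        log_mel.append(math.log(energy + 1e-10))  # logarithm of the Mel spectrum
    # DCT-II of the log Mel energies; skip the 0th term, keep the next 12
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_mel))
            for c in range(1, n_coeffs + 1)]
```

For example, `mfcc_frame([math.sin(2 * math.pi * 5 * i / 64) for i in range(64)])` yields a 12-dimensional feature vector for one frame of a pure tone.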
  • Similar methods can be used to extract multiple voice features, not limited to the MFCC coefficients, such as speech rate, loudness, pitch, and pauses.
  • Loudness is related to frequency; expressed as a logarithmic value it is the loudness level, whose unit is the phon.
  • The correspondence between loudness and frequency/sound level is calculated using the equal-loudness-contour formula.
  • Pitch is determined by the frequency of the sound.
  • Pauses are distinguished by the number of rests. A similar spectrum analysis can therefore be used to obtain the above voice features. By analyzing these other voice features, the user information at approval can be reviewed and compared to confirm whether the person operating is the user himself.
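  • Counting pauses ("rests") can be sketched by flagging runs of consecutive low-energy frames. The frame length, energy threshold, and minimum run length below are illustrative assumptions, not values from this application:

```python
def count_pauses(samples, frame_len=160, energy_threshold=0.01, min_frames=3):
    """Count pauses in a signal: a pause is a run of at least `min_frames`
    consecutive frames whose mean energy falls below `energy_threshold`."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    silent = [sum(s * s for s in f) / len(f) < energy_threshold for f in frames]
    pauses, run = 0, 0
    for is_silent in silent:
        run = run + 1 if is_silent else 0
        if run == min_frames:  # count a new pause once enough silent frames accumulate
            pauses += 1
    return pauses

# Speech, then silence, then speech: one pause.
signal = [0.5] * 800 + [0.0] * 800 + [0.5] * 800
assert count_pauses(signal) == 1
```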
  • In another embodiment, after acquiring the user's first input voice during entry, the method further includes: obtaining the entry field information corresponding to the first input voice through voice recognition; obtaining the user's credential picture and obtaining the user's credential information through image text recognition, where the credential picture refers to a captured ID card picture; and verifying the corresponding entry field information with the obtained credential information. For example, the entry field is compared with the field of the credential information to obtain their similarity: if the similarity is greater than or equal to a preset similarity threshold, the verification is passed; if the similarity is less than the preset similarity threshold, the verification fails.
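  • The field-versus-credential comparison can be sketched with a generic string-similarity measure. The application does not specify a similarity function or threshold; `SequenceMatcher.ratio` and the 0.9 threshold here are assumed stand-ins:

```python
from difflib import SequenceMatcher

def verify_field(entry_value: str, credential_value: str,
                 threshold: float = 0.9) -> bool:
    """Pass verification when the similarity between an entered field and
    the corresponding ID-card field meets the preset threshold."""
    similarity = SequenceMatcher(None, entry_value, credential_value).ratio()
    return similarity >= threshold
```

For example, `verify_field("123 Example Road", "123 Example Road")` passes, while `verify_field("123 Example Road", "999 Other Street")` fails.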
  • In another embodiment, after acquiring the user's first input voice during entry, the method further includes: obtaining the entry field information corresponding to the first input voice through voice recognition, and displaying the entry field information in the form of a page, so that the user can check the entered information for omissions, ensure the correctness of the loan application information, apply for the loan in a more convenient and efficient way, and complete the entry for the loan scenario.
  • In another embodiment, the method further includes: converting the first input voice into text; performing text emotion recognition on the converted text; and judging whether the user is lying according to the text emotion recognition result. If the user is judged to be lying, the entry ends; if not, the step of obtaining the user's second input voice during approval is performed. The user is pre-judged through text emotion recognition; if the user lies, there is no need to obtain the second input voice, and the entry ends directly.
  • Judging whether the user is lying according to the text emotion recognition result includes: judging whether the user's emotion when entering the first input voice meets a set condition, and if so, the user is considered to be lying. For example, when the user's emotion while entering the first input voice is recognized as fluctuating sharply, or as panicked or astonished, the user is considered to be lying; when the user's emotion is recognized as stable and calm, the user is considered not to be lying.
  • The set conditions include: the speech rate exceeds a first set threshold, the fluctuation of the loudness frequency exceeds a preset fluctuation range (either too large or too small can be regarded as the user lying), and the number of speech pauses exceeds a second set threshold.
  • For example, the first set threshold may be 150 words/min; if the speech rate is greater than 150 words/min, the user's emotion is considered to fluctuate greatly. That is, through the obtained voice features, it is possible to identify whether the emotion fluctuates and how it fluctuates.
  • The emotion fluctuation here refers to the emotion carried by the input voice, and emotion recognition is performed by converting the input voice into text. By analyzing the emotional characteristics of the voice in which the user records the loan information, the system determines whether the user is lying, adding consideration of the user's speech rate, voice frequency, emotion, and so on, which is conducive to effective intelligent risk-control judgment.
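  • The set conditions above can be combined into a simple rule check. Only the 150 words/min example comes from the text; the loudness bounds and pause threshold are illustrative assumptions:

```python
def seems_deceptive(words_per_min: float, loudness_range: float, pause_count: int,
                    rate_threshold: float = 150.0,
                    loudness_bounds: tuple = (0.5, 2.0),
                    pause_threshold: int = 10) -> bool:
    """Flag the user when speech rate exceeds the first threshold, loudness
    fluctuation falls outside the preset range (too large or too small),
    or the number of pauses exceeds the second threshold."""
    lo, hi = loudness_bounds
    return (words_per_min > rate_threshold
            or not (lo <= loudness_range <= hi)
            or pause_count > pause_threshold)

# Calm, in-range speech is not flagged; speaking over 150 words/min is.
assert seems_deceptive(120, 1.0, 3) is False
assert seems_deceptive(180, 1.0, 3) is True
```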
  • FIG. 2 is a schematic diagram of the intelligent loan entry apparatus of this application.
  • The intelligent loan entry apparatus includes: a voice acquisition module 1 for obtaining the user's first input voice during entry and the user's second input voice during approval, where entry refers to the materials required for the loan application that the user submits to the lending institution or banking system when taking a loan, and approval refers to the review of the submitted materials after entry.
  • The first input voice refers to the voice information entered when the user applies for the loan.
  • The second input voice refers to the voice information entered by the user when the loan application is being approved.
  • A feature extraction module 2 is used to extract the voice features of the first input voice and the second input voice.
  • A voice analysis module 3 uses the voice analysis model obtained through training to perform voice analysis on the first input voice and the second input voice.
  • A judgment module 4 judges, according to the voice analysis result, whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval is passed and the user's entry succeeds; if not, the approval is not passed and the user's entry fails.
  • The voice analysis model adopts an adversarial neural network model.
  • The adversarial neural network model includes a generation model and a discriminant model.
  • The generation model is used to generate a voice vector corresponding to the second input voice (where the voice vector is composed of voice feature values).
  • The discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user. Specifically, if the output probability obtained by the voice analysis is greater than or equal to the preset probability threshold, the judgment module judges that the entry and the approval were made by the same user; if the output probability is less than the preset probability threshold, the judgment module judges that they were not made by the same user, the entry is unsuccessful, and the loan cannot be issued.
  • The contents of the first input voice and the second input voice are the various items of information that the user fills in when applying for a loan, including personal information such as ID number, residential address, and spouse, as well as related information such as contact person, loan intention, and whether the user owns a house or a car.
  • The intelligent loan entry apparatus further includes a training module for training the voice analysis model.
  • The step of training the adversarial neural network model includes: obtaining training samples, where the training samples include the user's first input voice during entry and the user's second input voice during approval; and inputting the training samples into the adversarial neural network model for training. The generation model learns voice features from the first input voice and generates a new voice from the learned features as the voice vector corresponding to the second input voice; this voice vector is used for adversarial training. The discriminant model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that they are; the greater the probability, the higher the accuracy of the adversarial neural network model. When the accuracy of the discriminant model's output exceeds the preset threshold, the training ends.
  • The generation model receives random noise along with the user's first input voice file; it converts the voice features of the first input voice into a feature table through its fully connected layer, performs a deconvolution operation on the feature table's voice feature data through a deconvolution layer, and generates output voice features through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  • In the discriminant model, the output voice features of the generation model are convolved, then processed through a fully connected layer, and finally sent to an activation function, which outputs the probability that the output voice feature data is true or false. The greater the probability, the higher the accuracy of the adversarial neural network model; when the accuracy exceeds the preset threshold, the training can end.
  • The authenticity of each feature value of the user's second input voice is judged through the discriminant model, so as to judge whether the user is the person himself.
  • The first input voice entered when the user applies is input into the generation model to obtain the voice vector corresponding to the second input voice, and both the obtained voice vector and the second input voice are input into the discriminant model to judge whether each feature value of the second input voice is true or false, where a feature value refers to an extracted voice feature, such as the MFCC coefficients below.
  • The discriminant model outputs the probability that a feature value is true (the closer the feature values of the second input voice are to those of the first input voice, the more the second input voice is considered true). When the output probability value is greater than the preset probability threshold, the user who entered the second input voice and the user who entered the first input voice are considered the same user.
  • the feature extraction module can use the same method to extract the voice features of the first input voice and the second input voice.
  • For example, the Mel frequency cepstrum coefficients (MFCC) can be used as voice features.
  • To obtain them, the voice data is sampled and analyzed, and the voice features are extracted by means of the spectrogram, cepstrum analysis, Mel frequency analysis, and the Mel frequency cepstrum coefficients.
  • The specific steps of extracting the MFCC voice features include: pre-processing the first input voice, the pre-processing including pre-emphasis, framing, and windowing; obtaining the FFT spectrum corresponding to each short-time analysis window through the fast Fourier transform (FFT); obtaining the Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and performing cepstrum analysis on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC), where cepstrum analysis consists of taking the logarithm of the Mel spectrum and then applying the inverse Fourier transform.
  • The inverse Fourier transform is implemented with the discrete cosine transform (DCT), and the second to thirteenth coefficients after the DCT are taken as the MFCC coefficients.
  • Similar methods can be used to extract multiple voice features, not limited to the MFCC coefficients, such as speech rate, loudness, pitch, and pauses.
  • Loudness is related to frequency; expressed as a logarithmic value it is the loudness level, whose unit is the phon.
  • The correspondence between loudness and frequency/sound level is calculated using the equal-loudness-contour formula.
  • Pitch is determined by the frequency of the sound, and pauses are distinguished by the number of rests; a similar spectrum analysis can therefore be used to obtain the above voice features.
  • In another embodiment, the intelligent loan entry apparatus further includes: a voice recognition module, which obtains the entry field information corresponding to the first input voice through voice recognition, so that the user only needs to carry out effective voice interaction to complete entry of the fields, which facilitates entering fields in large batches and completely decouples loan entry from the user's typing, so that typing ability is no longer a barrier to applying for a loan; a text recognition module, which obtains the user's credential picture and obtains the user's credential information through image text recognition, where the credential picture refers to a captured ID card picture; and a verification module, which verifies the corresponding entry field information with the obtained credential information to ensure the accuracy and reliability of the entered information.
  • the similarity between the entered field and the corresponding field of the credential information is obtained; if the similarity is greater than or equal to a preset similarity threshold, the verification passes, and if the similarity is less than the preset threshold, the verification fails
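The field comparison described above can be sketched with Python's standard library; `SequenceMatcher` and the 0.9 threshold are illustrative stand-ins, not values fixed by this application:

```python
from difflib import SequenceMatcher

def verify_field(entered: str, credential: str, threshold: float = 0.9) -> bool:
    """Pass verification when the entered field is similar enough to the
    corresponding field read from the credential picture."""
    similarity = SequenceMatcher(None, entered, credential).ratio()
    return similarity >= threshold

print(verify_field("123 Example Road", "123 Example Road"))  # True: fields match
print(verify_field("123 Example Road", "456 Other Street"))  # False: fields differ
```

Any string-similarity measure (edit distance, token overlap) could play the same role; the essential point is only the pass/fail comparison against a preset threshold.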
  • the intelligent loan entry device further includes: a voice recognition module, which obtains the input field information corresponding to the first input voice through voice recognition; and a page display module, which displays the input field information in the form of a page, making it convenient for users to check the entered information for omissions, ensuring the correctness of the loan application information, and allowing users to apply for loans and complete loan-scenario entry in a more convenient and efficient way
  • the intelligent loan entry device further includes: a text conversion module, used to convert the first input voice into text; an emotion recognition module, used to perform text emotion recognition on the converted text; and a second judgment module, used to judge whether the user is lying according to the result of text emotion recognition; if the user is judged to be lying, the entry ends, and if not, the step of obtaining the user's second input voice during approval is performed; the user is thus pre-judged through text emotion recognition, and if the user lies there is no need to obtain the second input voice and the entry ends directly
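The gating logic in the bullet above can be sketched as follows; the `emotion_score` input is a hypothetical stand-in for the output of a text emotion recognition model, which is not implemented here:

```python
def looks_like_lying(emotion_score: float, threshold: float = 0.5) -> bool:
    """emotion_score: assumed probability of deception from some text-emotion model."""
    return emotion_score >= threshold

def process_entry(emotion_score: float) -> str:
    # Pre-judgment: a flagged user ends the entry before any second voice is collected.
    if looks_like_lying(emotion_score):
        return "entry ended"
    return "request second input voice"

print(process_entry(0.8))  # entry ended
print(process_entry(0.1))  # request second input voice
```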
  • the specific method for judging whether the user is lying based on the result of text emotion recognition is roughly the same as the judgment method in the above-mentioned intelligent loan entry method, and will not be repeated here.
  • the intelligent loan entry method described in this application is applied to an electronic device, and the electronic device may be a terminal device such as a television, a smart phone, a tablet computer, or a computer
  • the electronic device includes a processor and a memory, the memory storing a loan smart entry program; the processor executes the loan smart entry program to implement the following intelligent loan entry method: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user
  • the electronic device also includes a network interface, a communication bus, and the like.
  • the network interface may include a standard wired interface and a wireless interface
  • the communication bus is used to realize the connection and communication between various components.
  • the memory includes at least one type of readable storage medium, which can be a non-volatile storage medium such as a flash memory, a hard disk, an optical disc, or a plug-in hard disk, but is not limited to these; it can be any device that stores instructions or software and any associated data files in a non-transitory manner and provides them to the processor so that the processor can execute the instructions or software program
  • the software program stored in the memory includes the loan smart entry program, and the loan smart entry program can be provided to the processor, so that the processor can execute it and realize the intelligent loan entry method
  • the processor can be a central processing unit, a microprocessor, or other data processing chips, etc., and can run a stored program in the memory, for example, the loan smart entry program in this application.
  • the electronic device may also include a display, and the display may also be called a display screen or a display unit.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, and the like.
  • the display is used to display the information processed in the electronic device and to display the visual work interface.
  • the electronic device may further include a user interface, and the user interface may include an input unit (such as a keyboard), a voice output device (such as a stereo, earphone), and the like.
  • the loan smart entry program can also be divided into one or more modules, and the one or more modules are stored in the memory and executed by the processor to complete the application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • the loan smart entry program can be divided into: a voice acquisition module 1, a feature extraction module 2, a voice analysis module 3, and a first judgment module 4.
  • the functions or operation steps implemented by the above modules are all similar to the above, and will not be described in detail here.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program or instruction, and may be non-volatile or volatile.
  • the stored program or instructions can be executed to instruct related hardware to implement the corresponding functions
  • the computer-readable storage medium may be a computer disk, a hard disk, a random access memory, a read-only memory, and so on.
  • This application is not limited to this, and it can be any device that stores instructions or software and any related data files or data structures in a non-transitory manner and can be provided to the processor to enable the processor to execute the programs or instructions therein.
  • the computer-readable storage medium includes a loan smart entry program, and when the loan smart entry program is executed by a processor, the following intelligent loan entry method is realized: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to conclude whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present application relates to the technical field of artificial intelligence. Disclosed are an intelligent loan entry method, and an apparatus and a storage medium. The method comprises: obtaining a first input voice of a user during entry and a second input voice of a user during examination and approval; extracting voice features; performing voice analysis by utilizing a voice analysis model to obtain whether the user during the entry and the user during the examination and approval are the same user; if the users are the same user, enabling the examination and approval to be passed, so that the entry of the user is successful; and if the users are not the same user, enabling the examination and approval not to be passed, so that the entry of the user fails, wherein the voice analysis model uses an adversarial neural network model and comprises a generation model and a discrimination model, the generation model is used for generating a voice vector corresponding to the second input voice, and the discrimination model is used for determining the probability that a user corresponding to the second input voice and a user corresponding to the first input voice are the same user. According to the present application, user information can be rechecked and compared to confirm whether the operation is performed by the user, and multi-dimensional discrimination of the user information is performed.

Description

Intelligent loan entry method, apparatus and storage medium
This application claims the priority of the Chinese patent application filed with the Chinese Patent Office on April 2, 2020, with application number 202010254541.2 and the invention title "Intelligent loan entry method, apparatus and storage medium", the entire content of which is incorporated in this application by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to an intelligent loan entry method, apparatus and storage medium.
Background
At present, loan products usually require users to manually enter the fields required by the product one by one into the app of each loan product, and often dozens or even hundreds of fields must be entered, which consumes a great deal of time and energy. Moreover, owing to differences in education level, some loan users type rather slowly; if they must enter dozens or even hundreds of fields, the resulting inefficiency can be imagined. The inventors realized that this method of manually entering large numbers of user information fields is not only inefficient and raises the barrier to loan application for users who have a strong willingness to borrow but are not good at typing, but also makes it difficult to judge the authenticity of user information in multiple dimensions through field entry alone.
Technical problem
This application provides an intelligent loan entry method, apparatus and storage medium to solve the problem in the prior art that it is difficult to judge users' real information in multiple dimensions.
Technical solution
In order to achieve the above objective, the first aspect of this application provides an intelligent loan entry method, including: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
In order to achieve the above objective, the second aspect of this application provides an electronic device, including a processor and a memory, the memory including a loan smart entry program which, when executed by the processor, implements the following intelligent loan entry method: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
In order to achieve the above objective, the third aspect of this application provides a computer-readable storage medium, the computer-readable storage medium including a loan smart entry program which, when executed by a processor, implements the following intelligent loan entry method: acquiring the first input voice of the user during entry and the second input voice of the user during approval; extracting the voice features of the first input voice and the second input voice; using a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
In order to achieve the above objective, the fourth aspect of this application provides an intelligent loan entry apparatus, including: a voice acquisition module, used to acquire the first input voice of the user during entry and the second input voice of the user during approval; a feature extraction module, used to extract the voice features of the first input voice and the second input voice; a voice analysis module, which uses a trained voice analysis model to perform voice analysis on the first input voice and the second input voice; and a first judgment module, which judges according to the voice analysis result whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model adopts an adversarial neural network model, the adversarial neural network model includes a generative model and a discriminant model, the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
Beneficial effects
Compared with the prior art, this application has the following advantages and beneficial effects:
This application performs approval review of loan entries based on artificial intelligence and uses a neural network to process the user's voice. Specifically, an adversarial neural network model performs voice analysis on the user's input voice at entry time and at approval time, so that user information can be re-checked and compared to confirm whether the operation is performed by the user in person, and the user information can be judged in multiple dimensions.
Through voice interaction, this application lets users fill in the information required for a loan in the course of a conversation, which can effectively prevent customers from developing irritation and other negative emotions, lowers the technical barriers to loan application, and reduces the number of customers who give up on loans because of typing speed or inability to type.
This application uses voice as one source of loan information, which makes it possible to judge the user's speaking emotion during voice entry and to collect data on the user's speech rate, voice frequency and so on, which can serve as one means for intelligent risk control to judge the authenticity and validity of the user.
Description of the drawings
Figure 1 is a schematic flowchart of the intelligent loan entry method described in this application.
Figure 2 is a schematic diagram of the intelligent loan entry apparatus in this application.
The realization, functional characteristics, and advantages of the purpose of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed description
The embodiments of this application will be described below with reference to the drawings. Those of ordinary skill in the art will recognize that the described embodiments can be modified in various different ways, or combinations thereof, without departing from the spirit and scope of this application. Therefore, the drawings and description are illustrative in nature, are only used to explain this application, and are not used to limit the protection scope of the claims. In addition, in this specification, the drawings are not drawn to scale, and the same reference numerals denote the same parts.
Figure 1 is a schematic flowchart of the intelligent loan entry method of this application. As shown in Figure 1, the method includes the following steps. Step S1: acquire the first input voice of the user during entry and the second input voice of the user during approval, where entry refers to the user submitting the materials required to apply for a loan to a lending institution or banking system, and approval refers to the review of the submitted materials after entry; only when the approval passes, meaning the entry succeeds, can the loan be issued to the user. The first input voice refers to the voice information entered when the user applies for the loan, and the second input voice refers to the voice information entered by the user when the loan application is being approved. Step S2: extract the voice features of the first input voice and the second input voice. Step S3: use a trained voice analysis model to perform voice analysis on the first input voice and the second input voice to find out whether the user at the time of entry and the user at the time of approval are the same user. Step S4: judge whether the user at the time of entry and the user at the time of approval are the same user; if they are the same user, the approval passes and the user's entry succeeds; if they are not the same user, the approval fails and the user's entry fails.
The voice analysis model adopts an adversarial neural network model, which includes a generative model and a discriminant model. The generative model is used to generate a voice vector corresponding to the second input voice (where the voice vector is composed of voice feature values), and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user. Specifically, if the output probability is greater than or equal to a preset probability threshold, it is determined that the user at entry time and at approval time is the same user; if the output probability is less than the preset probability threshold, it is determined that they are not the same user, the entry is unsuccessful, and the loan cannot be issued.
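As an illustration of the threshold decision described above, the following toy sketch stands in for the discriminant model: a single made-up linear scoring of the difference between the generated voice vector and the observed second-voice features, squashed to a probability and compared against a preset threshold. The weights, bias, and vectors are all hypothetical, and a real discriminant model would be a learned network:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def same_user_probability(gen_vector, second_voice_features, weights, bias=0.0):
    """Toy stand-in: score how close the generator's vector is to the
    observed second-voice feature values, as a 'same user' probability."""
    diffs = [abs(a - b) for a, b in zip(gen_vector, second_voice_features)]
    return sigmoid(bias - sum(w * d for w, d in zip(weights, diffs)))

def approve(prob: float, prob_threshold: float = 0.5) -> bool:
    """Approval passes only when the probability clears the preset threshold."""
    return prob >= prob_threshold

weights = [1.0] * 4                  # hypothetical learned weights
gen = [0.2, 0.5, 0.1, 0.9]           # generator's vector for this user
close = [0.21, 0.48, 0.12, 0.88]     # second voice matching the same speaker
far = [0.9, 0.1, 0.8, 0.2]           # second voice from a different speaker

print(approve(same_user_probability(gen, close, weights, bias=1.0)))  # True
print(approve(same_user_probability(gen, far, weights, bias=1.0)))    # False
```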
It should be noted that the content of both the first input voice and the second input voice is the information the user fills in when applying for a loan, including personal information such as ID number and residential address, as well as related information such as spouse, contacts, loan intention, and whether the user owns a house or a car.
This application uses the adversarial neural network model to perform voice analysis on the user's input voice during entry and approval, adding a judgment dimension to the approval of the user's entry: it determines whether the user at entry time and at approval time is the same person, thereby judging the user's authenticity and realizing intelligent entry.
In an embodiment of this application, the intelligent loan entry method further includes a step of training the voice analysis model. When the voice analysis model adopts an adversarial neural network model, the training step includes: acquiring training samples, where the training samples include the first input voice of the user during entry and the second input voice of the user during approval; and inputting the training samples into the adversarial neural network model for training, wherein the generative model learns the voice features of the first input voice and generates new voice from the learned features as a voice vector corresponding to the second input voice, the voice vector being used for adversarial training. The discriminant model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that they are; the greater the probability, the higher the accuracy of the adversarial neural network model. When the accuracy of the discriminant model's output exceeds a preset threshold, the training ends.

Further, the generative model takes the user's first input voice file and receives random noise; the fully connected layer of the generative model converts the voice features of the first input voice into a feature table, the deconvolution layers perform deconvolution operations on the feature data in the table, and after multiple deconvolution layers an output voice feature is generated as the voice vector corresponding to the second input voice. The convolutional layers of the discriminant model perform convolution operations on the generated output voice features, which are then passed through a fully connected layer and finally fed into an activation function, outputting the probability that the output voice feature data is real; the greater the probability, the higher the accuracy of the adversarial neural network model, and when the accuracy exceeds the preset threshold, training can end.
When the user's second input voice at approval time is collected, the discriminant model judges whether each feature value of the second input voice is real, so as to determine whether the user is the same person. Specifically, the user's first input voice at entry time is fed into the generative model to obtain the voice vector corresponding to the second input voice; both this voice vector and the second input voice are then fed into the discriminant model to determine whether the feature values of the second input voice are real, where a feature value refers to an extracted voice feature, such as the MFCC coefficients below. The discriminant model outputs the probability that the feature values are real (the closer the feature values of the second input voice are to those of the first input voice, the more real the second input voice is considered); when the output probability is greater than the preset probability threshold, the user of the second input voice and the user of the first input voice are considered to be the same user.
In this application, the same method can be used to extract the voice features of the first input voice and the second input voice. In one embodiment, Mel-frequency cepstral coefficient (MFCC) voice features are used to sample and analyze the data, and the voice features are extracted by means of spectrograms, cepstrum analysis, Mel-frequency analysis, and Mel-frequency cepstral coefficients.
Taking the extraction of the voice features of the first input voice as an example, the specific steps of extracting the MFCC voice features include: pre-processing the first input voice, the pre-processing including pre-emphasis, framing and windowing; obtaining, through the fast Fourier transform (FFT), the FFT spectrum corresponding to each short-time analysis window; obtaining the Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and performing cepstrum analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), where the cepstrum analysis includes taking the logarithm of the Mel spectrum and then performing an inverse Fourier transform, the inverse Fourier transform being implemented by the discrete cosine transform (DCT), with the 2nd to 13th coefficients after the DCT taken as the MFCC coefficients.
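The final cepstral step described above (logarithm of the Mel spectrum, a DCT as the inverse-Fourier step, and keeping the 2nd to 13th coefficients) can be sketched as follows; the 26 Mel band energies are synthetic stand-in values, and pre-emphasis, framing, windowing, FFT, and the Mel filter bank are omitted:

```python
import math

def dct_ii(x):
    """Type-II discrete cosine transform (unnormalised)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n)]

def mfcc_from_mel_energies(mel_energies):
    log_mel = [math.log(e) for e in mel_energies]  # logarithmic compression
    cepstrum = dct_ii(log_mel)
    return cepstrum[1:13]                          # 2nd..13th DCT coefficients

# Synthetic 26-band Mel spectrum for one analysis window
mel_energies = [1.0 + 0.5 * math.sin(i / 3.0) for i in range(26)]
coeffs = mfcc_from_mel_energies(mel_energies)
print(len(coeffs))  # prints 12: one MFCC vector per frame
```

In practice a library routine with proper DCT normalisation and liftering would be used; the sketch only shows why 12 coefficients come out per frame.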
It should be noted that similar methods can be used to extract various features of the voice, not only the MFCC coefficients; examples include speech rate, loudness, pitch, and pauses. Loudness is related to frequency and is expressed on a logarithmic scale as the loudness level, whose unit is the phon; the correspondence between loudness and frequency/sound level is computed from the equal-loudness-contour formula. Pitch is defined by the frequency of the sound. Pauses are distinguished by the number of silent intervals. Similar spectrum-analysis methods can therefore be used to obtain these voice features. By analyzing these additional voice features, the user information provided at approval time can be cross-checked to confirm that the user is operating in person.
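Two of the non-MFCC features mentioned above (speech rate and pauses) can be sketched from a transcript and an amplitude envelope. The silence level and the minimum silent-run length are illustrative assumptions, and counting characters per minute is one plausible reading of speech rate for Chinese text, not a definition given by the application.

```python
def prosodic_features(amplitudes, transcript, duration_s,
                      silence_level=0.02, min_silence_frames=3):
    """Sketch: speech rate (characters per minute) and pause count,
    where a pause is a sufficiently long run of low-energy frames."""
    speech_rate = len(transcript) / duration_s * 60  # characters per minute
    pauses, run = 0, 0
    for a in amplitudes:
        if abs(a) < silence_level:
            run += 1
        else:
            if run >= min_silence_frames:
                pauses += 1
            run = 0
    if run >= min_silence_frames:
        pauses += 1
    return {"speech_rate": speech_rate, "pauses": pauses}
```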
In one embodiment, after acquiring the user's first input voice at submission time, the method further includes: obtaining the entry field information corresponding to the first input voice through speech recognition; obtaining a picture of the user's identity document and extracting the user's credential information through optical character recognition, where the credential picture refers to a captured picture of the user's ID card; and verifying the corresponding entry field information against the obtained credential information. For example, a text comparison between each entry field and the corresponding credential field yields a similarity score; if the similarity is greater than or equal to a preset similarity threshold, verification passes, and if the similarity is below the threshold, verification fails.
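The field-against-credential comparison can be sketched with a standard string-similarity measure. The use of `difflib.SequenceMatcher` and the 0.8 threshold are assumptions for illustration; the application only specifies a text comparison against a preset similarity threshold.

```python
from difflib import SequenceMatcher

def verify_field(entered, ocr_value, threshold=0.8):
    """Compare an entry field captured by speech recognition against the
    corresponding field extracted from the ID picture by OCR; verification
    passes when the similarity reaches the preset threshold."""
    similarity = SequenceMatcher(None, entered, ocr_value).ratio()
    return similarity >= threshold
```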
It should be noted that the speech recognition and optical character recognition technologies used in this application are existing technologies and are not described further here. By combining speech recognition with image text recognition, this application further strengthens the verification of user information and reduces the likelihood of data-entry errors.
In one embodiment, after acquiring the user's first input voice at submission time, the method further includes: obtaining the entry field information corresponding to the first input voice through speech recognition; and presenting the entry field information as a page, so that the user can check the entered information for omissions and errors. This ensures the correctness of the loan application information and allows the user to apply for a loan, and complete the loan submission, in a more convenient and efficient way.
In one embodiment, after acquiring the user's first input voice at submission time, the method further includes: converting the first input voice into text; performing text-based emotion recognition on the converted text; and judging from the emotion recognition result whether the user is lying. If the user is judged to be lying, the submission is terminated; if not, the method proceeds to the step of acquiring the user's second input voice at approval time. The user is thus pre-screened through text emotion recognition: if the user is lying, there is no need to acquire the second input voice, and the submission ends immediately.
Judging whether the user is lying according to the text emotion recognition result includes: determining whether the user's emotional state when recording the first input voice satisfies a set condition, and if so, considering the user to be lying. For example, when the user's emotions while recording the first input voice are recognized as fluctuating sharply, or as showing panic or alarm, the user is considered to be lying; when the user's emotions are recognized as stable and calm, the user is considered not to be lying. The set condition includes one or more of the following: the speech rate exceeds a first set threshold; the fluctuation of loudness and frequency exceeds a preset range (fluctuation that is either too large or too small may indicate lying); or the number of pauses exceeds a second set threshold. For example, the first set threshold may be 150 characters per minute; when the speech rate exceeds 150 characters per minute, the user's emotions are considered to fluctuate significantly. In other words, the extracted voice features make it possible to detect whether and how emotions fluctuate; the emotional fluctuation here refers to the emotion carried by the input voice, and emotion recognition is performed by converting the input voice into text. By analyzing the acoustic state of the user's recording while the loan information is dictated, the method judges whether the user is lying, adding considerations of speech rate, voice frequency, and emotion that support effective intelligent risk control.
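The set conditions above can be sketched as a simple rule check. The speech-rate threshold of 150 characters per minute comes from the text; the fluctuation range and pause threshold are illustrative assumptions, as is the function name.

```python
def meets_lying_condition(speech_rate, loudness_fluct, pause_count,
                          rate_threshold=150,      # first set threshold, chars/min (from the text)
                          fluct_range=(0.2, 0.8),  # hypothetical preset fluctuation range
                          pause_threshold=10):     # hypothetical second set threshold
    """Flag a recording when any set condition holds: speech rate above the
    first threshold, loudness/frequency fluctuation outside the preset range
    (too large or too small), or pause count above the second threshold."""
    low, high = fluct_range
    return (speech_rate > rate_threshold
            or not (low <= loudness_fluct <= high)
            or pause_count > pause_threshold)
```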
Figure 2 is a schematic diagram of the intelligent loan entry apparatus of this application. As shown in Figure 2, the apparatus includes: a voice acquisition module 1, configured to acquire the user's first input voice at submission time and the user's second input voice at approval time, where "submission" refers to the user providing the materials required for a loan application to a lending institution or banking system, and "approval" refers to the review of those materials after submission; only when approval passes, indicating a successful submission, can the loan be issued to the user. The first input voice refers to the voice information recorded when the user applies for the loan, and the second input voice refers to the voice information recorded while the loan application is being approved. The apparatus further includes: a feature extraction module 2, configured to extract the voice features of the first input voice and the second input voice; a voice analysis module 3, which uses a trained voice analysis model to perform voice analysis on the first input voice and the second input voice; and a first judgment module 4, which judges from the voice analysis result whether the user at submission time and the user at approval time are the same user. If they are the same user, the approval passes and the submission succeeds; if they are not, the approval fails and the submission fails.
The voice analysis model adopts an adversarial neural network model, which includes a generative model and a discriminant model. The generative model is used to generate a voice vector corresponding to the second input voice (where the voice vector is composed of voice feature values), and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user. Specifically, if the output probability obtained from the voice analysis is greater than or equal to a preset probability threshold, the judgment module determines that the submission-time and approval-time users are the same; if the output probability is below the threshold, the judgment module determines that they are not the same user, the submission fails, and the loan cannot be issued.
It should be noted that the content of both the first input voice and the second input voice consists of the information the user fills in when applying for the loan, including personal information such as ID number and residential address, as well as related information such as spouse, contact person, loan purpose, and home or car ownership.
In one embodiment of this application, the intelligent loan entry apparatus further includes a training module for training the voice analysis model. When the voice analysis model adopts an adversarial neural network model, the steps of training the model include: obtaining training samples, where the training samples include the user's first input voice at submission time and the user's second input voice at approval time; and feeding the training samples into the adversarial neural network model for training. During training, the generative model learns the voice features of the first input voice and generates a new voice from the learned features as the voice vector corresponding to the second input voice; this voice vector is used for adversarial training. The discriminant model judges whether the users corresponding to the first and second input voices are the same user and outputs the probability that they are; the larger this probability, the higher the accuracy of the adversarial neural network model. When the accuracy of the discriminant model's output exceeds a preset threshold, training ends. Further, the generative model takes the user's first input voice file and random noise, converts the voice features of the first input voice into a feature table through its fully connected layer, performs deconvolution on the feature data of the feature table through a deconvolution layer, and after multiple deconvolution layers produces output voice features as the voice vector corresponding to the second input voice. The discriminant model applies a convolution layer to the generated voice features, passes the result through a fully connected layer, and finally through an activation function that outputs the probability that the output voice feature data is genuine; the larger this probability, the higher the accuracy of the adversarial neural network model, and training can end when the accuracy exceeds the preset threshold.
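The generator and discriminator structure described above can be sketched as an untrained forward pass. Everything here is an illustrative assumption: the layer sizes, the fixed random weights, and the nearest-neighbour upsampling standing in for a real transposed-convolution (deconvolution) layer. A real implementation would use a deep-learning framework and train the two models adversarially.

```python
import math, random

random.seed(0)

def fully_connected(x, out_dim):
    """Toy fully connected layer with fixed random weights."""
    w = [[random.uniform(-0.1, 0.1) for _ in x] for _ in range(out_dim)]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def deconv_upsample(x, factor=2):
    """Stand-in for a deconvolution (transposed convolution) layer:
    here simply nearest-neighbour upsampling."""
    return [v for v in x for _ in range(factor)]

def generator(first_voice_features, noise):
    """Fully connected layer turns features plus noise into a 'feature table',
    then stacked 'deconvolution' layers expand it into an output voice vector."""
    feature_table = fully_connected(first_voice_features + noise, 8)
    out = deconv_upsample(feature_table)   # first deconvolution layer
    out = deconv_upsample(out)             # second deconvolution layer
    return out                             # voice vector, length 32

def conv1d(x, kernel):
    """Valid-mode 1-D convolution."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def discriminator(voice_vector):
    """Convolution layer, fully connected layer, then a sigmoid activation
    that outputs the probability that the features are genuine."""
    conv_out = conv1d(voice_vector, [0.25, 0.5, 0.25])
    logit = fully_connected(conv_out, 1)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # probability in (0, 1)

fake = generator([0.1] * 8, [random.gauss(0, 1) for _ in range(4)])
p = discriminator(fake)
```

The sketch only demonstrates the data flow (features plus noise, fully connected layer, deconvolution layers, then convolution, fully connected layer, sigmoid); it performs no training.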
When the user's second input voice is collected at approval time, the discriminant model is used to judge whether each feature value of the second input voice is genuine, so as to determine whether the user is the same person. Specifically, the first input voice recorded when the user submitted the application is fed into the generative model to obtain a voice vector corresponding to the second input voice; the resulting voice vector and the second input voice are then both fed into the discriminant model, which determines whether the feature values of the second input voice are genuine, where a feature value refers to an extracted voice feature such as the MFCC coefficients below. The discriminant model outputs the probability that the feature values are genuine (the closer the feature values of the second input voice are to those of the first input voice, the more genuine the second input voice is considered to be). When the output probability exceeds the preset probability threshold, the user who recorded the second input voice is deemed to be the same user who recorded the first input voice.
In this application, the feature extraction module may use the same method to extract the voice features of the first input voice and the second input voice. In one embodiment, Mel Frequency Cepstrum Coefficient (MFCC) voice features are used to sample and analyze the data, and the voice features are extracted by means of spectrograms, cepstrum analysis, Mel frequency analysis, and Mel frequency cepstrum coefficients.
Taking the extraction of voice features from the first input voice as an example, the specific steps of extracting MFCC voice features are as follows: pre-process the first input voice, where the pre-processing includes pre-emphasis, framing, and windowing; obtain the FFT spectrum corresponding to each short-time analysis window through a fast Fourier transform (FFT); obtain the Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and perform cepstrum analysis on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC). The cepstrum analysis consists of taking the logarithm of the Mel spectrum and then applying an inverse Fourier transform, which is implemented as a discrete cosine transform (DCT); the 2nd through 13th DCT coefficients are taken as the MFCC coefficients.
It should be noted that similar methods can be used to extract various features of the voice, not only the MFCC coefficients; examples include speech rate, loudness, pitch, and pauses. Loudness is related to frequency and is expressed on a logarithmic scale as the loudness level, whose unit is the phon; the correspondence between loudness and frequency/sound level is computed from the equal-loudness-contour formula. Pitch is defined by the frequency of the sound. Pauses are distinguished by the number of silent intervals. Similar spectrum-analysis methods can therefore be used to obtain these voice features.
In one embodiment, the intelligent loan entry apparatus further includes: a speech recognition module, which obtains the entry field information corresponding to the first input voice through speech recognition, so that the user only needs an effective voice interaction to complete field input. This facilitates entering fields in bulk and completely decouples loan submission from typing, so that the ability to type is no longer a barrier to applying for a loan. A text recognition module obtains a picture of the user's identity document and extracts the user's credential information through optical character recognition, where the credential picture refers to a captured picture of the user's ID card. A verification module verifies the corresponding entry field information against the obtained credential information, ensuring the accuracy and reliability of the entered information. For example, a text comparison between each entry field and the corresponding credential field yields a similarity score; if the similarity is greater than or equal to a preset similarity threshold, verification passes, and if it is below the threshold, verification fails.
In one embodiment, the intelligent loan entry apparatus further includes: a speech recognition module, which obtains the entry field information corresponding to the first input voice through speech recognition; and a page display module, which presents the entry field information as a page so that the user can check the entered information for omissions and errors, ensuring the correctness of the loan application information and allowing the loan submission to be completed in a more convenient and efficient way.
In one embodiment, the intelligent loan entry apparatus further includes: a text conversion module, configured to convert the first input voice into text; an emotion recognition module, configured to perform text-based emotion recognition on the converted text; and a second judgment module, configured to judge from the emotion recognition result whether the user is lying. If the user is judged to be lying, the submission is terminated; if not, the step of acquiring the user's second input voice at approval time is performed. The user is thus pre-screened through text emotion recognition: if the user is lying, there is no need to acquire the second input voice, and the submission ends immediately. The specific way of judging whether the user is lying from the emotion recognition result is substantially the same as in the intelligent loan entry method described above and is not repeated here.
It should be noted that the specific implementation of the intelligent loan entry apparatus of this application is substantially the same as that of the intelligent loan entry method described above and is not repeated here.
The intelligent loan entry method of this application is applied to an electronic device, which may be a terminal device such as a television, a smartphone, a tablet computer, or a computer.
The electronic device includes a processor and a memory storing an intelligent loan entry program. The processor executes the program to implement the following intelligent loan entry method: acquire the user's first input voice at submission time and the user's second input voice at approval time; extract the voice features of the first input voice and the second input voice; and use a trained voice analysis model to perform voice analysis on the first and second input voices and determine whether the user at submission time and the user at approval time are the same user. If they are the same user, the approval passes and the submission succeeds; if not, the approval fails and the submission fails. The voice analysis model adopts an adversarial neural network model that includes a generative model and a discriminant model; the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
The electronic device also includes a network interface, a communication bus, and the like. The network interface may include a standard wired interface and a wireless interface, and the communication bus is used to implement connection and communication between the components.
The memory includes at least one type of readable storage medium, which may be a non-volatile storage medium such as flash memory, a hard disk, or an optical disc, or a plug-in hard disk, among others; it is not limited to these, and may be any device that stores instructions or software and any associated data files in a non-transitory manner and provides instructions or software programs to the processor so that the processor can execute them. In this application, the software stored in the memory includes the intelligent loan entry program, which the memory can provide to the processor so that the processor can execute it and implement the intelligent loan entry method.
The processor may be a central processing unit, a microprocessor, or another data processing chip, and can run programs stored in the memory, for example the intelligent loan entry program of this application.
The electronic device may also include a display, which may also be called a display screen or display unit. In some embodiments the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display is used to show the information processed in the electronic device and to present a visual working interface.
The electronic device may further include a user interface, which may include an input unit (such as a keyboard) and a voice output device (such as speakers or earphones).
It should be noted that the specific implementation of the electronic device of this application is substantially the same as that of the intelligent loan entry method and apparatus described above and is not repeated here.
In other embodiments, the intelligent loan entry program may also be divided into one or more modules, which are stored in the memory and executed by the processor to implement this application. A module in this application refers to a series of computer program instruction segments capable of performing a specific function. For example, the intelligent loan entry program may be divided into: a voice acquisition module 1, a feature extraction module 2, a voice analysis module 3, and a first judgment module 4. The functions or operation steps implemented by these modules are similar to those described above and are not detailed here.
In one embodiment of this application, the computer-readable storage medium may be any tangible medium that contains or stores programs or instructions; it may be non-volatile or volatile, the programs in it can be executed, and the corresponding functions are implemented by hardware driven by the stored program instructions. For example, the computer-readable storage medium may be a computer diskette, a hard disk, a random access memory, or a read-only memory. This application is not limited to these; the medium may be any device that stores instructions or software and any related data files or data structures in a non-transitory manner and can provide them to a processor so that the processor executes the programs or instructions therein. The computer-readable storage medium contains an intelligent loan entry program which, when executed by a processor, implements the following intelligent loan entry method: acquire the user's first input voice at submission time and the user's second input voice at approval time; extract the voice features of the first input voice and the second input voice; and use a trained voice analysis model to perform voice analysis on the first and second input voices and determine whether the user at submission time and the user at approval time are the same user. If they are the same user, the approval passes and the submission succeeds; if not, the approval fails and the submission fails. The voice analysis model adopts an adversarial neural network model that includes a generative model and a discriminant model; the generative model is used to generate a voice vector corresponding to the second input voice, and the discriminant model is used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
The specific implementation of the computer-readable storage medium of this application is substantially the same as that of the intelligent loan entry method and apparatus described above and is not repeated here.
It should be noted that in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes it.
The serial numbers of the above embodiments of this application are for description only, do not indicate the relative merits of the embodiments, and do not limit the patent scope of this application. Any equivalent structural or process transformation made using the contents of the specification and drawings of this application, or any direct or indirect use in other related technical fields, is likewise included within the scope of patent protection of this application. From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.

Claims (20)

1. An intelligent loan entry method, applied to an electronic device, comprising: acquiring a user's first input voice at submission time and the user's second input voice at approval time; extracting voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice using a trained voice analysis model to determine whether the user at submission time and the user at approval time are the same user; if the user at submission time and the user at approval time are the same user, the approval passes and the submission succeeds; if they are not the same user, the approval fails and the submission fails; wherein the voice analysis model adopts an adversarial neural network model comprising a generative model and a discriminant model, the generative model being used to generate a voice vector corresponding to the second input voice, and the discriminant model being used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
  2. The intelligent loan entry method according to claim 1, further comprising: training the adversarial neural network model; wherein the step of training the adversarial neural network model comprises: acquiring training samples, the training samples comprising a first input voice of a user at entry and a second input voice of the user at approval; inputting the training samples into the adversarial neural network model for training, wherein the generative model learns voice features of the first input voice and generates a voice vector corresponding to the second input voice, the voice vector being used for adversarial training, and the discriminative model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that the users corresponding to the first input voice and the second input voice are the same user; and ending the training when the accuracy of the output of the discriminative model exceeds a preset threshold.
  3. The intelligent loan entry method according to claim 2, wherein the step of the generative model learning voice features of the first input voice and generating a voice vector corresponding to the second input voice comprises: inputting the first input voice into the generative model; converting the voice features of the first input voice into a feature table through a fully connected layer of the generative model; and performing deconvolution operations on the voice feature data of the feature table, generating an output voice feature through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  4. The intelligent loan entry method according to claim 3, wherein the step of the discriminative model judging whether the users corresponding to the first input voice and the second input voice are the same user comprises: performing a convolution operation on the output voice feature through convolutional layers; processing the convolution result through a fully connected layer; and outputting, through an activation function, the probability that the output voice feature is real or fake.
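Claims 3 and 4 describe the data flow of the generative and discriminative models fairly concretely: a fully connected layer producing a feature table followed by stacked deconvolution (transposed-convolution) layers on the generator side, and convolutional layers followed by a fully connected layer and an activation function on the discriminator side. A minimal NumPy sketch of that flow, using toy dimensions and untrained random weights chosen purely for illustration (the patent specifies no layer sizes), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def deconv1d(x, w, stride=2):
    """1-D transposed convolution: each input sample scatters a scaled copy of w."""
    out = np.zeros(stride * (len(x) - 1) + len(w))
    for i, v in enumerate(x):
        out[i * stride:i * stride + len(w)] += v * w
    return out

def conv1d(x, w, stride=2):
    """Plain strided 1-D convolution (valid padding)."""
    n = (len(x) - len(w)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(w)], w) for i in range(n)])

def generator(features, W_fc, deconv_kernels):
    # Fully connected layer turns the extracted voice features into a "feature table"
    h = np.tanh(W_fc @ features)
    # Stacked deconvolution layers upsample the table into an output voice feature
    for w in deconv_kernels:
        h = np.tanh(deconv1d(h, w))
    return h

def discriminator(voice_feature, conv_kernels, w_fc):
    # Convolutional layers, then a fully connected layer, then a sigmoid activation
    h = voice_feature
    for w in conv_kernels:
        h = np.tanh(conv1d(h, w))
    return 1.0 / (1.0 + np.exp(-np.dot(w_fc, h)))  # probability the feature is "real"

# Toy dimensions; all weights are random placeholders, not a trained model.
feat = rng.standard_normal(8)                       # MFCC-style feature vector
W_fc = rng.standard_normal((16, 8)) * 0.1
g_kernels = [rng.standard_normal(4) * 0.1 for _ in range(2)]
d_kernels = [rng.standard_normal(4) * 0.1 for _ in range(2)]
w_fc = rng.standard_normal(16) * 0.1

voice_vec = generator(feat, W_fc, g_kernels)        # length 16 -> 34 -> 70
p_real = discriminator(voice_vec, d_kernels, w_fc)
```

In adversarial training, the discriminator's output probability would be used to update both networks; here the weights are fixed and only the layer shapes from the claims are exercised.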
  5. The intelligent loan entry method according to claim 1, wherein, when extracting the voice features of the first input voice and the second input voice, Mel-frequency cepstral coefficient (MFCC) voice features are used for sampling and analysis, and the voice features are extracted by means of spectrograms, cepstral analysis, Mel-frequency analysis, and Mel-frequency cepstral coefficients.
  6. The intelligent loan entry method according to claim 5, wherein the step of extracting the MFCC voice features comprises: preprocessing the input voice, the preprocessing including pre-emphasis, framing, and windowing; obtaining an FFT spectrum corresponding to each short-time analysis window through a fast Fourier transform; obtaining a Mel spectrum corresponding to the FFT spectrum through a Mel filter bank; and performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients.
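The MFCC pipeline in claim 6 (pre-emphasis, framing, windowing, FFT, Mel filter bank, cepstral analysis) is a standard construction and can be sketched end-to-end in NumPy. All numeric parameters below (frame length, hop, filter count, coefficient count) are illustrative defaults, not values from the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_mels=26, n_ceps=13, pre_emph=0.97):
    # Pre-emphasis
    sig = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # Framing into overlapping short-time analysis windows
    n_frames = 1 + (len(sig) - frame_len) // hop
    frames = np.stack([sig[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # Windowing (Hamming)
    frames = frames * np.hamming(frame_len)
    # FFT power spectrum per window
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular Mel filter bank
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    mel_energy = power @ fbank.T
    mel_energy = np.where(mel_energy == 0, np.finfo(float).eps, mel_energy)
    # Cepstral analysis: log, then DCT-II, keeping the first n_ceps coefficients
    log_mel = np.log(mel_energy)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

sig = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)  # 1 s, 440 Hz tone
feats = mfcc(sig)  # one 13-coefficient vector per frame
```

Production systems typically use a library implementation (e.g. librosa or python_speech_features) rather than hand-rolled filter banks; this sketch only mirrors the steps enumerated in the claim.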
  7. The intelligent loan entry method according to claim 1, wherein, after acquiring the first input voice of the user at entry, the method further comprises: obtaining entry field information corresponding to the first input voice through speech recognition; obtaining a picture of the user's identity document and obtaining the user's document information through optical character recognition; and verifying the corresponding entry field information against the obtained document information.
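The verification step in claim 7 amounts to cross-checking two sets of extracted fields. A trivial sketch, assuming both speech recognition and OCR return plain key-value dictionaries (the patent does not specify a data format):

```python
def verify_fields(voice_fields: dict, id_fields: dict):
    """Compare speech-recognized entry fields against OCR'd ID-document fields.

    Returns (all_match, mismatches) where mismatches maps each conflicting
    field name to its (voice value, document value) pair.
    """
    mismatches = {
        k: (voice_fields[k], id_fields[k])
        for k in id_fields
        if k in voice_fields and voice_fields[k] != id_fields[k]
    }
    return len(mismatches) == 0, mismatches

ok, diff = verify_fields({"name": "Zhang San", "id_no": "123"},
                         {"name": "Zhang San", "id_no": "124"})
```

Fields present on the document but absent from the spoken entry are simply skipped here; a real system would likely flag them for manual review.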
  8. The intelligent loan entry method according to claim 1, wherein, after acquiring the first input voice of the user at entry, the method further comprises: obtaining entry field information corresponding to the first input voice through speech recognition; and displaying the entry field information in the form of a page.
  9. The intelligent loan entry method according to claim 1, wherein, after acquiring the first input voice of the user at entry, the method further comprises: converting the first input voice into text; performing text emotion recognition on the converted text; and judging, according to the text emotion recognition result, whether the user is lying: if it is determined that the user is lying, the entry process ends; if it is determined that the user is not lying, the step of acquiring the second input voice of the user at approval is performed.
  10. The intelligent loan entry method according to claim 1, wherein judging, according to the text emotion recognition result, whether the user is lying comprises: judging whether the user's emotion when recording the first input voice satisfies a set condition, and if the set condition is satisfied, determining that the user is lying, wherein the set condition includes one or more of: the speech rate exceeding a first set threshold, the fluctuation of the loudness frequency exceeding a preset fluctuation range, and the number of speech pauses exceeding a second set threshold.
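The set conditions in claim 10 reduce to simple threshold checks over measured speech statistics. A sketch with placeholder thresholds (the patent gives no numeric values; every default below is an assumption for illustration):

```python
def seems_deceptive(speech_rate, loudness_freqs, pause_count,
                    rate_threshold=5.0, fluctuation_range=40.0, pause_threshold=8):
    """Flag the utterance if any of the claim's set conditions holds.

    speech_rate    -- e.g. syllables per second
    loudness_freqs -- sequence of loudness-frequency measurements over the utterance
    pause_count    -- number of detected speech pauses
    """
    conditions = [
        speech_rate > rate_threshold,                                  # rate above first threshold
        max(loudness_freqs) - min(loudness_freqs) > fluctuation_range, # fluctuation beyond range
        pause_count > pause_threshold,                                 # pauses above second threshold
    ]
    return any(conditions)
```

Per claim 9, a `True` result would end the entry process; `False` lets the flow proceed to acquiring the second input voice at approval.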
  11. An electronic device, comprising a processor and a memory, the memory storing an intelligent loan entry program which, when executed by the processor, implements the following intelligent loan entry method: acquiring a first input voice of a user at entry and a second input voice of the user at approval; extracting voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice using a trained voice analysis model to determine whether the user at entry and the user at approval are the same user; if the user at entry and the user at approval are the same user, the approval passes and the user's entry succeeds; if the user at entry and the user at approval are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model is an adversarial neural network model comprising a generative model and a discriminative model, the generative model being used to generate a voice vector corresponding to the second input voice, and the discriminative model being used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
  12. The electronic device according to claim 11, wherein the intelligent loan entry program, when executed by the processor, further implements the step of training the adversarial neural network model, the step comprising: acquiring training samples, the training samples comprising a first input voice of a user at entry and a second input voice of the user at approval; inputting the training samples into the adversarial neural network model for training, wherein the generative model learns voice features of the first input voice and generates a voice vector corresponding to the second input voice, the voice vector being used for adversarial training, and the discriminative model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that the users corresponding to the first input voice and the second input voice are the same user; and ending the training when the accuracy of the output of the discriminative model exceeds a preset threshold.
  13. The electronic device according to claim 12, wherein the step of the generative model learning voice features of the first input voice and generating a voice vector corresponding to the second input voice comprises: inputting the first input voice into the generative model; converting the voice features of the first input voice into a feature table through a fully connected layer of the generative model; and performing deconvolution operations on the voice feature data of the feature table, generating an output voice feature through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  14. The electronic device according to claim 13, wherein the step of the discriminative model judging whether the users corresponding to the first input voice and the second input voice are the same user comprises: performing a convolution operation on the output voice feature through convolutional layers; processing the convolution result through a fully connected layer; and outputting, through an activation function, the probability that the output voice feature is real or fake.
  15. The electronic device according to claim 11, wherein, after acquiring the first input voice of the user at entry, the method further comprises: converting the first input voice into text; performing text emotion recognition on the converted text; and judging, according to the text emotion recognition result, whether the user is lying: if it is determined that the user is lying, the entry process ends; if it is determined that the user is not lying, the step of acquiring the second input voice of the user at approval is performed.
  16. A computer-readable storage medium, storing an intelligent loan entry program which, when executed by a processor, implements the following intelligent loan entry method: acquiring a first input voice of a user at entry and a second input voice of the user at approval; extracting voice features of the first input voice and the second input voice; performing voice analysis on the first input voice and the second input voice using a trained voice analysis model to determine whether the user at entry and the user at approval are the same user; if the user at entry and the user at approval are the same user, the approval passes and the user's entry succeeds; if the user at entry and the user at approval are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model is an adversarial neural network model comprising a generative model and a discriminative model, the generative model being used to generate a voice vector corresponding to the second input voice, and the discriminative model being used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
  17. The computer-readable storage medium according to claim 16, wherein the intelligent loan entry program, when executed by the processor, further implements the step of training the adversarial neural network model, the step comprising: acquiring training samples, the training samples comprising a first input voice of a user at entry and a second input voice of the user at approval; inputting the training samples into the adversarial neural network model for training, wherein the generative model learns voice features of the first input voice and generates a voice vector corresponding to the second input voice, the voice vector being used for adversarial training, and the discriminative model judges whether the users corresponding to the first input voice and the second input voice are the same user and outputs the probability that the users corresponding to the first input voice and the second input voice are the same user; and ending the training when the accuracy of the output of the discriminative model exceeds a preset threshold.
  18. The computer-readable storage medium according to claim 17, wherein the step of the generative model learning voice features of the first input voice and generating a voice vector corresponding to the second input voice comprises: inputting the first input voice into the generative model; converting the voice features of the first input voice into a feature table through a fully connected layer of the generative model; and performing deconvolution operations on the voice feature data of the feature table, generating an output voice feature through multiple deconvolution layers as the voice vector corresponding to the second input voice.
  19. The computer-readable storage medium according to claim 18, wherein the step of the discriminative model judging whether the users corresponding to the first input voice and the second input voice are the same user comprises: performing a convolution operation on the output voice feature through convolutional layers; processing the convolution result through a fully connected layer; and outputting, through an activation function, the probability that the output voice feature is real or fake.
  20. An intelligent loan entry apparatus, comprising: a voice acquisition module, configured to acquire a first input voice of a user at entry and a second input voice of the user at approval; a feature extraction module, configured to extract voice features of the first input voice and the second input voice; a voice analysis module, configured to perform voice analysis on the first input voice and the second input voice using a trained voice analysis model; and a first judgment module, configured to judge, according to the voice analysis result, whether the user at entry and the user at approval are the same user: if the user at entry and the user at approval are the same user, the approval passes and the user's entry succeeds; if the user at entry and the user at approval are not the same user, the approval fails and the user's entry fails; wherein the voice analysis model is an adversarial neural network model comprising a generative model and a discriminative model, the generative model being used to generate a voice vector corresponding to the second input voice, and the discriminative model being used to determine the probability that the user corresponding to the second input voice and the user corresponding to the first input voice are the same user.
PCT/CN2020/103931 2020-04-02 2020-07-24 Intelligent loan entry method, and apparatus and storage medium WO2021196458A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010254541.2A CN111583935A (en) 2020-04-02 2020-04-02 Loan intelligent delivery method, device and storage medium
CN202010254541.2 2020-04-02

Publications (1)

Publication Number Publication Date
WO2021196458A1 true WO2021196458A1 (en) 2021-10-07

Family

ID=72112451

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/103931 WO2021196458A1 (en) 2020-04-02 2020-07-24 Intelligent loan entry method, and apparatus and storage medium

Country Status (2)

Country Link
CN (1) CN111583935A (en)
WO (1) WO2021196458A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113488069B (en) * 2021-07-06 2024-05-24 浙江工业大学 Speech high-dimensional characteristic rapid extraction method and device based on generation type countermeasure network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8600798B1 (en) * 2007-09-21 2013-12-03 Ellie Mae, Inc. Loan screening
CN107977776A (en) * 2017-11-14 2018-05-01 重庆小雨点小额贷款有限公司 Information processing method, device, server and computer-readable recording medium
CN109325742A (en) * 2018-09-26 2019-02-12 平安普惠企业管理有限公司 Business approval method, apparatus, computer equipment and storage medium
CN109360571A (en) * 2018-10-31 2019-02-19 深圳壹账通智能科技有限公司 Processing method and processing device, storage medium, the computer equipment of credit information
CN109473108A (en) * 2018-12-15 2019-03-15 深圳壹账通智能科技有限公司 Auth method, device, equipment and storage medium based on Application on Voiceprint Recognition
CN110443692A (en) * 2019-07-04 2019-11-12 平安科技(深圳)有限公司 Enterprise's credit authorization method, apparatus, equipment and computer readable storage medium
CN110675881A (en) * 2019-09-05 2020-01-10 北京捷通华声科技股份有限公司 Voice verification method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7242752B2 (en) * 2001-07-03 2007-07-10 Apptera, Inc. Behavioral adaptation engine for discerning behavioral characteristics of callers interacting with an VXML-compliant voice application
KR20140034958A (en) * 2012-09-10 2014-03-21 (주)크리젠솔루션 System for mamaging loans data using mobile applications
CN105513597B (en) * 2015-12-30 2018-07-10 百度在线网络技术(北京)有限公司 Voiceprint processing method and processing device
CN110010133A (en) * 2019-03-06 2019-07-12 平安科技(深圳)有限公司 Vocal print detection method, device, equipment and storage medium based on short text
CN110379441B (en) * 2019-07-01 2020-07-17 特斯联(北京)科技有限公司 Voice service method and system based on countermeasure type artificial intelligence network
CN110738998A (en) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 Voice-based personal credit evaluation method, device, terminal and storage medium


Also Published As

Publication number Publication date
CN111583935A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
WO2018166187A1 (en) Server, identity verification method and system, and a computer-readable storage medium
WO2021208287A1 (en) Voice activity detection method and apparatus for emotion recognition, electronic device, and storage medium
WO2020177380A1 (en) Voiceprint detection method, apparatus and device based on short text, and storage medium
JP6621536B2 (en) Electronic device, identity authentication method, system, and computer-readable storage medium
TWI527023B (en) A voiceprint recognition method and apparatus
CN109493872B (en) Voice information verification method and device, electronic equipment and storage medium
JP6096333B2 (en) Method, apparatus and system for verifying payment
US8548818B2 (en) Method and system for authenticating customer identities
WO2019179029A1 (en) Electronic device, identity verification method and computer-readable storage medium
WO2021047319A1 (en) Voice-based personal credit assessment method and apparatus, terminal and storage medium
CN112562691A (en) Voiceprint recognition method and device, computer equipment and storage medium
WO2020238046A1 (en) Human voice smart detection method and apparatus, and computer readable storage medium
CN113177850A (en) Method and device for multi-party identity authentication of insurance
WO2021196458A1 (en) Intelligent loan entry method, and apparatus and storage medium
WO2021128847A1 (en) Terminal interaction method and apparatus, computer device, and storage medium
CN112201254A (en) Non-sensitive voice authentication method, device, equipment and storage medium
CN116312559A (en) Training method of cross-channel voiceprint recognition model, voiceprint recognition method and device
CN113035230B (en) Authentication model training method and device and electronic equipment
CN112992155B (en) Far-field voice speaker recognition method and device based on residual error neural network
CN115242927A (en) Customer service object distribution method and device, computer equipment and storage medium
Lopez‐Otero et al. Influence of speaker de‐identification in depression detection
CN114171032A (en) Cross-channel voiceprint model training method, recognition method, device and readable medium
CN113436633B (en) Speaker recognition method, speaker recognition device, computer equipment and storage medium
TW201944320A (en) Payment authentication method, device, equipment and storage medium
Moreno-Rodriguez et al. Bimodal biometrics using EEG-voice fusion at score level based on hidden Markov models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928894

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20928894

Country of ref document: EP

Kind code of ref document: A1