CN112365340A

CN112365340A - Multi-mode personal loan risk prediction method

Info

Publication number: CN112365340A
Application number: CN202011312355.6A
Authority: CN
Inventors: 杨赛; 顾全林
Original assignee: Wuxi Xishang Bank Co ltd
Current assignee: Wuxi Xishang Bank Co ltd
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2021-02-12

Abstract

The invention relates to the technical field of online loans, and in particular discloses a multi-modal personal loan risk prediction method comprising the following steps: obtaining audio and video information of a borrower from the borrower's answers to specified questions in an online video session; preprocessing the audio and video information and extracting audio features; performing face detection on the borrower in the video to obtain a face frame, performing face key point detection within the face frame to obtain 68 face key points, performing min-max coordinate standardization according to the face frame, and selecting 32 of the 68 face key points to extract video time-domain features; applying z-score standardization to the audio features and the video time-domain features, and constructing a multi-modal joint feature according to a weight ratio; and establishing an SVM risk prediction model for personal loans based on the multi-modal joint feature. By extracting audio features and facial micro-expression time-domain features of the loan applicant, the method can predict the applicant's loan risk in real time.

Description

Multi-mode personal loan risk prediction method

Technical Field

The invention relates to the technical field of online loans, in particular to a multi-mode personal loan risk prediction method.

Background

With the rapid development of internet, big data and artificial intelligence technologies, internet finance (online lending) has gradually become an important financing channel for small and micro enterprises, family workshop-type enterprises and low-income groups. Meanwhile, the state advocates promoting "inclusive finance" to stabilize growth, expand employment, promote innovation, invigorate the market and meet the diverse needs of the people.

At present, online loan auditing mainly relies on an auditor verifying, by telephone interview, the authenticity of the data provided by the borrower. This includes verifying the borrower's basic information, personal credit records, multi-platform borrowing, e-commerce and internet behavior data, blacklists and greylists, and the like.

However, most loan applicants are small and micro enterprises, family workshop-type enterprises and low-income groups, whose income information is difficult to obtain and verify, who lack qualifying collateral, and whose credit records are missing or insufficient.

For such applicants, it is difficult for an approver to evaluate the client's risk level accurately through telephone communication alone. Manual auditing is also inefficient and ignores changes in the borrower's micro-expressions and context during the conversation, so the results of manual auditing are not sufficiently accurate, that is, their credibility is low.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a multi-modal personal loan risk prediction method. The method can provide a reference for the preliminary loan risk level of small and micro enterprises and low-income groups whose income information is difficult to obtain and verify, who lack qualifying collateral, and whose credit records are missing or insufficient, and it effectively improves loan auditing efficiency.

As a first aspect of the present invention, there is provided a multi-modal personal loan risk prediction method, comprising:

step S1: obtaining audio and video information of a borrower from the borrower's answers to specified questions in an online video session, and assigning a comprehensive weighted manual evaluation score according to the answers to the questions and the borrower's speech rate and micro-expressions;

step S2: applying mean smoothing denoising preprocessing to the audio and video information of the borrower, and extracting multiple classes of audio features;

step S3: performing face detection on the borrower in the video using RetinaFace to obtain a face frame, performing face key point detection within the face frame using PRNet to obtain 68 face key points, performing min-max coordinate standardization according to the face frame, and selecting the 32 key points of the eyes and mouth from the 68 face key points to extract multiple classes of video time-domain features;

step S4: applying z-score standardization to the multi-class audio features and the video time-domain features, and constructing a multi-modal joint feature according to a weight ratio;

step S5: establishing an SVM risk prediction model for personal loans based on the multi-modal joint feature.

Further, the step S2 further includes:

applying 3 × 3 mean smoothing filtering to the audio and video information of the borrower and extracting 34 classes of audio features $F_A \in \{\mathbb{R}^{N \times 34}\}$,

where N is the number of samples, the sampling frequency fs is 16000 Hz, the calculation window win is 0.05, and the calculation step is 0.05.
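The smoothing and windowing parameters above can be illustrated with a minimal sketch. It rests on assumptions that the text does not state: win and step are taken to be in seconds, the 3 × 3 mean filter is replaced by a 3-tap moving average for a one-dimensional audio signal, and the helper names `mean_smooth` and `frame_signal` are hypothetical.

```python
import numpy as np

FS = 16000    # sampling frequency in Hz (from the text)
WIN = 0.05    # calculation window; assumed to be in seconds
STEP = 0.05   # calculation step; assumed to be in seconds


def mean_smooth(signal: np.ndarray, width: int = 3) -> np.ndarray:
    """Moving-average filter, a 1-D stand-in for the 3 x 3 mean filter."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")


def frame_signal(signal: np.ndarray, fs: int = FS,
                 win: float = WIN, step: float = STEP) -> np.ndarray:
    """Split a 1-D signal into frames of win*fs samples taken every step*fs samples."""
    win_len = int(round(win * fs))
    hop = int(round(step * fs))
    n_frames = 1 + (len(signal) - win_len) // hop if len(signal) >= win_len else 0
    frames = [signal[i * hop:i * hop + win_len] for i in range(n_frames)]
    return np.array(frames).reshape(n_frames, win_len)


# One second of synthetic audio yields 20 non-overlapping frames of 800 samples.
one_second = np.arange(FS, dtype=float)
frames = frame_signal(mean_smooth(one_second))
print(frames.shape)  # (20, 800)
```

Under these assumptions each row of `frames` would contribute one row of the feature matrix, so a one-second recording gives 20 rows.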

Further, the step S3 further includes:

obtaining the largest face frame bbox in the video using RetinaFace, and expanding the face frame bbox by a factor of 1.25 to obtain an expanded face region;

performing face key point detection within the expanded face region using PRNet to obtain 68 face key points;

performing min-max standardization on the landmark points within the face frame in the x and y directions respectively, using the bbox scale;

selecting the 32 key points of the eyes and the mouth within the face frame as target key points;

constructing a micro-expression data set D from the target key points in the key frames of the video stream, where fps_num is the number of key frames;

extracting 34 classes of video time-domain features $F_B \in \{\mathbb{R}^{N \times 68} \mid [R_x^{N \times 34}, R_y^{N \times 34}]\}$ from the micro-expression data set D, wherein the sampling frequency fs is 16000 Hz, the calculation window win is 0.05, and the calculation step is 0.05.
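The selection of target key points and the construction of the data set D can be sketched as follows. This is an illustration only: it assumes the common 68-point landmark layout with 0-based indices, in which points 36 to 47 are the eyes and 48 to 67 the mouth, and it assumes D is stored as separate x and y arrays. The function name `build_micro_expression_dataset` is hypothetical.

```python
import numpy as np

# Assumption: 68-point layout with 0-based indices, 36..47 eyes and
# 48..67 mouth, which gives the 32 target key points.
TARGET_IDX = np.arange(36, 68)


def build_micro_expression_dataset(landmarks: np.ndarray):
    """Split per-frame landmarks into x and y trajectories of the target points.

    landmarks: array of shape (fps_num, 68, 2) with normalized (x, y)
    coordinates for each key frame.
    Returns (D_x, D_y), each of shape (fps_num, 32).
    """
    if landmarks.ndim != 3 or landmarks.shape[1:] != (68, 2):
        raise ValueError("expected landmarks of shape (fps_num, 68, 2)")
    target = landmarks[:, TARGET_IDX, :]
    return target[..., 0], target[..., 1]


rng = np.random.default_rng(0)
demo = rng.random((5, 68, 2))   # 5 synthetic key frames
D_x, D_y = build_micro_expression_dataset(demo)
print(D_x.shape, D_y.shape)     # (5, 32) (5, 32)
```

Each column of `D_x` or `D_y` is then the trajectory of one key point across the key frames, which is the kind of time series from which time-domain features can be computed.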

Further, the step S4 further includes:

applying z-score standardization to the audio features $F_A$ and the video time-domain features $F_B$ respectively, to obtain audio features $F'_A$ and video time-domain features $F'_B$ conforming to the standard normal distribution;

combining the audio features $F'_A$ and the video time-domain features $F'_B$ to form the multi-modal joint feature $F = [w_1 F'_A,\ w_2 F'_B] \in \{\mathbb{R}^{N \times 102}\}$, where $w_1 + w_2 = 1$.
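A minimal sketch of the weighted combination follows, assuming the two standardized feature matrices are concatenated column-wise so that 34 audio columns and 68 video columns give 102 columns. The function name `joint_feature` is hypothetical.

```python
import numpy as np


def joint_feature(f_a: np.ndarray, f_b: np.ndarray, w1: float) -> np.ndarray:
    """Concatenate weighted audio and video features, with w2 = 1 - w1."""
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1] so that w1 + w2 = 1")
    if f_a.shape[0] != f_b.shape[0]:
        raise ValueError("audio and video features need the same number of rows")
    w2 = 1.0 - w1
    return np.hstack([w1 * f_a, w2 * f_b])


f_a = np.ones((4, 34))   # placeholder standardized audio features
f_b = np.ones((4, 68))   # placeholder standardized video features
F = joint_feature(f_a, f_b, w1=0.25)
print(F.shape)           # (4, 102)
```

Because the weights sum to one, a single parameter `w1` is enough to describe the trade-off between the two modalities.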

Further, the step S5 further includes:

loading the multi-modal joint feature into an SVM regression prediction model to train the loan risk prediction model, wherein the SVM regression prediction model uses an RBF kernel function, and the penalty coefficient C, gamma, $w_1$ and $w_2$ are optimized by grid search with 10-fold cross-validation.
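The training step can be sketched with scikit-learn as follows. This is an assumption-laden illustration rather than the patented implementation: the grid values, the synthetic data, the use of the manual evaluation scores from step S1 as regression targets, and the function name `fit_risk_model` are all hypothetical. Since $w_2 = 1 - w_1$, only $w_1$ is searched.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR


def fit_risk_model(f_a, f_b, scores, w1_grid=(0.3, 0.5, 0.7)):
    """Grid-search C, gamma and w1 (w2 = 1 - w1) with 10-fold cross-validation."""
    param_grid = {"C": [1.0, 10.0], "gamma": [0.01, 0.1]}  # illustrative values
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    best = None
    for w1 in w1_grid:
        features = np.hstack([w1 * f_a, (1.0 - w1) * f_b])
        search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=cv,
                              scoring="neg_mean_squared_error")
        search.fit(features, scores)
        # Keep the weight and hyperparameters with the best mean CV score.
        if best is None or search.best_score_ > best["score"]:
            best = {"score": search.best_score_, "w1": w1,
                    "params": search.best_params_,
                    "model": search.best_estimator_}
    return best


rng = np.random.default_rng(0)
f_a = rng.standard_normal((60, 34))   # synthetic standardized audio features
f_b = rng.standard_normal((60, 68))   # synthetic standardized video features
scores = rng.random(60)               # synthetic stand-in for evaluation scores
best = fit_risk_model(f_a, f_b, scores)
print(best["w1"], best["params"])
```

The outer loop over `w1` is needed because the weights change the input features themselves, so they cannot be passed to `GridSearchCV` as estimator parameters without wrapping the weighting in a custom transformer.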

The multi-modal personal loan risk prediction method provided by the invention has the following advantages. The method analyzes the audio and video of the answers to specified questions (such as personal income and loan history) that an applicant submits when applying for a loan. During answering, a fluent speech rate, steady micro-expressions and the like are positively correlated with the loan risk assessment, so the audio and video of the answers can be combined with artificial intelligence technology to make a preliminary assessment of the applicant's risk level, serving as a new data dimension for big-data risk control. The method requires no manual intervention, can provide a reference for the preliminary loan risk level of small and micro enterprises and low-income groups whose income information is difficult to obtain and verify, who lack qualifying collateral, and whose credit records are missing or insufficient, and effectively improves loan auditing efficiency.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.

Fig. 1 is a flow chart of a multi-modal personal loan risk prediction method provided by the invention.

Fig. 2 is a flowchart of an embodiment of the method for predicting the risk of a multi-modal personal loan.

Detailed Description

To further illustrate the technical means and effects of the present invention for achieving the predetermined objects, the following detailed description of the embodiments, structures, features and effects of the method for predicting a multi-modal personal loan risk according to the present invention will be made with reference to the accompanying drawings and preferred embodiments. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without any inventive step, are within the scope of the present invention.

In this embodiment, a multi-modal personal loan risk prediction method is provided. As shown in Fig. 1, the method comprises steps S1 to S5 as set out above.

Specifically, the multi-class audio features include 34 classes of audio features such as Spectral Spread and Energy, and the multi-class video time domain features include 34 classes of video time domain features such as Spectral Spread and Energy.

The multi-mode personal loan risk prediction method provided by the invention extracts multi-mode characteristics through the answering audio and video of the questions specified by the loan applicant, and performs preliminary loan risk assessment and loan amount estimation on a borrower by combining a risk prediction model of an SVM (support vector machine).

Preferably, in step S2, the method further includes:

in order to reduce the influence of factors such as background noise, 3 × 3 mean smoothing filtering is applied to the audio and video information of the borrower, and 34 classes of audio features $F_A \in \{\mathbb{R}^{N \times 34}\}$ are extracted, where N is the number of samples, the sampling frequency fs is 16000 Hz, the calculation window win is 0.05, and the calculation step is 0.05. The 34 feature classes comprise:

Zero Crossing Rate, Energy, Entropy of Energy, Spectral Centroid, Spectral Spread, Spectral Entropy, Spectral Flux and Spectral Rolloff;

the MFCCs (mel-frequency cepstral coefficients);

the Chroma Vector (12-step scale);

the Chroma Deviation (standard deviation of the 12-step scale).
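A few of the listed features can be sketched per frame with NumPy. The definitions below are common textbook forms and are an assumption, since the text does not give the exact formulas it uses. The function names are hypothetical.

```python
import numpy as np


def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return float(np.mean(signs[1:] != signs[:-1]))


def energy(frame: np.ndarray) -> float:
    """Mean squared amplitude of the frame."""
    return float(np.mean(frame ** 2))


def spectral_centroid_and_spread(frame: np.ndarray, fs: int):
    """Magnitude-weighted mean frequency and its standard deviation, in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    total = mag.sum()
    if total == 0:
        return 0.0, 0.0
    centroid = float((freqs * mag).sum() / total)
    spread = float(np.sqrt((((freqs - centroid) ** 2) * mag).sum() / total))
    return centroid, spread


fs = 16000
t = np.arange(800) / fs                   # one 0.05 s frame
tone = np.sin(2 * np.pi * 1000.0 * t)     # synthetic 1 kHz tone
centroid, spread = spectral_centroid_and_spread(tone, fs)
print(round(centroid), round(energy(tone), 3))
```

For a pure tone the centroid sits at the tone frequency and the spread is close to zero, which gives a quick sanity check of the definitions.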

Preferably, in step S3, the method further includes:

obtaining the largest face frame bbox in the video using the RetinaFace (Single-stage Dense Face Localisation in the Wild) face detection algorithm, and expanding the face frame bbox by a factor of 1.25 to obtain an expanded face region;

performing face key point detection within the expanded face region using PRNet (Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network) to obtain 68 face key points;

in order to eliminate the influence of face scale changes caused by relative motion between the face and the screen, performing min-max (dispersion) standardization on the landmark points (face key points) within the face frame in the x and y directions respectively, using the scale of the bbox (face coordinate frame);

among the face key points, the eyes and the mouth best reflect changes in facial micro-expression, so the 32 key points of the eyes and mouth within the face frame, index [36, ..., 68], are selected as target key points;

constructing a micro-expression data set D from the target key points in the key frames of the video stream, where fps_num is the number of key frames and the two components of D hold the x-axis data and the y-axis data respectively;

extracting 34 classes of video time-domain features $F_B \in \{\mathbb{R}^{N \times 68} \mid [R_x^{N \times 34}, R_y^{N \times 34}]\}$ from the micro-expression data set D, wherein, for consistency with the audio information, the sampling frequency fs is 16000 Hz, the calculation window win is 0.05, and the calculation step is 0.05.

Specifically, a face detection algorithm is used to perform face detection on the borrower in the video and obtain the largest face frame $(x_{min}, y_{min}, x_{max}, y_{max})$. So that the face key points fall inside the face frame, the frame is expanded by a factor of 1.25 about its center. PRNet (Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network) is then used to detect 68 face key points, and min-max coordinate standardization is performed according to the face frame, as shown in formula (1):

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}, \qquad y' = \frac{y - y_{min}}{y_{max} - y_{min}} \tag{1}$$
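The center expansion of the face frame and the min-max standardization of key points by the frame can be sketched as follows. The sketch assumes that the expanded frame is the one used as the normalization scale, which the text does not state explicitly. The function names `expand_bbox` and `normalize_landmarks` are hypothetical.

```python
import numpy as np


def expand_bbox(bbox, scale: float = 1.25):
    """Scale a (x_min, y_min, x_max, y_max) box about its center."""
    x_min, y_min, x_max, y_max = bbox
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * scale / 2.0
    half_h = (y_max - y_min) * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def normalize_landmarks(points, bbox) -> np.ndarray:
    """Min-max standardize (x, y) points to [0, 1] using the box extent."""
    x_min, y_min, x_max, y_max = bbox
    extent = np.array([x_max - x_min, y_max - y_min], dtype=float)
    if np.any(extent <= 0):
        raise ValueError("bbox must have positive width and height")
    origin = np.array([x_min, y_min], dtype=float)
    return (np.asarray(points, dtype=float) - origin) / extent


box = expand_bbox((100.0, 100.0, 300.0, 300.0))
print(box)  # (75.0, 75.0, 325.0, 325.0)
print(normalize_landmarks([[200.0, 200.0], [75.0, 325.0]], box))
# [[0.5 0.5]
#  [0.  1. ]]
```

Dividing by the frame extent makes the coordinates independent of how large the face appears in the image, which is the stated purpose of the standardization.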

Preferably, in step S4, the method further includes:

in order to ensure that the audio features and the video time-domain features are of consistent magnitude, applying z-score (standard deviation) standardization to the audio features $F_A$ and the video time-domain features $F_B$ respectively, to obtain audio features $F'_A$ and video time-domain features $F'_B$ conforming to the standard normal distribution;

combining the audio features $F'_A$ and the video time-domain features $F'_B$ to form the multi-modal joint feature $F = [w_1 F'_A,\ w_2 F'_B] \in \{\mathbb{R}^{N \times 102}\}$, where $w_1 + w_2 = 1$.

In particular, z-score standardization is applied to the audio features $F_A$ and the video time-domain features $F_B$ separately, as shown in formula (2), where $\mu$ is the mean of the corresponding feature and $\sigma$ is the standard deviation of the corresponding feature, yielding $F'_A$ and $F'_B$ conforming to the standard normal distribution:

$$F' = \frac{F - \mu}{\sigma} \tag{2}$$
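A minimal sketch of the z-score standardization follows, assuming it is applied column by column so that each feature has zero mean and unit standard deviation. The guard for constant columns and the function name `z_score` are additions for illustration.

```python
import numpy as np


def z_score(features: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Standardize each column to zero mean and unit standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Constant columns have zero deviation; divide by 1 to avoid NaN.
    safe_std = np.where(std < eps, 1.0, std)
    return (features - mean) / safe_std


raw = np.array([[1.0, 10.0, 7.0],
                [2.0, 20.0, 7.0],
                [3.0, 30.0, 7.0]])
standardized = z_score(raw)
print(standardized.mean(axis=0))  # close to [0, 0, 0]
print(standardized.std(axis=0))   # close to [1, 1, 0]
```

After this step the audio and video columns are on a comparable scale, so the weights $w_1$ and $w_2$ control their relative influence rather than their raw magnitudes.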

Preferably, in step S5, the method further includes:

Specifically, the loan applicant video feature extraction process comprises the following steps:

acquiring a video key frame, and acquiring a face frame bbox of a borrower in a video by using a RetinaFace face detection algorithm;

performing 1.25 times of central expansion on bbox, and acquiring landmark of the borrower in the video stream by using PRNet;

carrying out min-max standardization processing on the key points in the x and y directions to eliminate scale influence caused by relative movement of the human face, wherein the standardized scale is the bbox size;

selecting the target key points of the eyes and mouth, index [36, ..., 68], which more readily reflect micro-expression changes in the face;

merging the normalized target key points in the x and y directions and, using a sampling frequency fs of 16000 Hz, a calculation window win of 0.05 and a calculation step of 0.05, extracting 34 classes of video time-domain features $F_B \in \{\mathbb{R}^{N \times 68} \mid [R_x^{N \times 34}, R_y^{N \times 34}]\}$. The 34 feature classes comprise:

Zero Crossing Rate, Energy, Entropy of Energy, Spectral Centroid, Spectral Spread, Spectral Entropy, Spectral Flux and Spectral Rolloff;

the MFCCs;

the Chroma Vector;

the Chroma Deviation.

As shown in fig. 2, a flow chart of an embodiment of a method for predicting a multi-modal personal loan risk includes the following specific steps:

acquiring audio and video information of a loan applicant;

applying 3 × 3 mean filtering to the audio information to eliminate the influence of noise, setting the sampling frequency fs to 16000 Hz, the calculation window win to 0.05 and the calculation step to 0.05, and extracting 34 classes of audio features $F_A \in \{\mathbb{R}^{N \times 34}\}$, which comprise: Zero Crossing Rate, Energy, Entropy of Energy, Spectral Centroid, Spectral Spread, Spectral Entropy, Spectral Flux and Spectral Rolloff; the MFCCs; the Chroma Vector; and the Chroma Deviation;

performing face detection on the borrower in the video using RetinaFace to obtain the largest face frame $(x_{min}, y_{min}, x_{max}, y_{max})$; so that the face key points fall inside the face frame, expanding the frame by a factor of 1.25 about its center; obtaining 68 face key points within the expanded region using PRNet; performing min-max coordinate standardization according to the face frame to eliminate scale changes caused by relative motion between the camera and the face; selecting the 32 key points of the eyes and mouth, index [36, ..., 68]; and combining the target key points from all key frames to form a micro-expression data set D, where fps_num is the number of key frames and the two components of D hold the x-axis data and the y-axis data respectively;

extracting 34 classes of video time-domain features $F_B \in \{\mathbb{R}^{N \times 68} \mid [R_x^{N \times 34}, R_y^{N \times 34}]\}$, whose feature classes include the MFCCs, the Chroma Vector and the Chroma Deviation;

to ensure that the audio and video features are of consistent magnitude, applying z-score standardization to the audio features $F_A$ and the video time-domain features $F_B$ respectively to obtain $F'_A$ and $F'_B$ conforming to the standard normal distribution, and combining the standardized features to form the joint feature $F = [w_1 F'_A,\ w_2 F'_B] \in \{\mathbb{R}^{N \times 102}\}$, where $w_1 + w_2 = 1$;

And training a loan risk SVM regression prediction model by using the joint features.

The multi-modal personal loan risk prediction method provided by the invention predicts the user's loan risk from the audio and video of the loan applicant's answers to specified questions, adds the analysis of micro-expression and speech information, requires no manual intervention, and can provide a reference for the preliminary loan risk level of small and micro enterprises and low-income groups whose income is difficult to verify, who lack qualifying collateral, and whose credit records are missing or insufficient.

Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for multi-modal personal loan risk prediction, comprising:

2. The method for predicting the risk of multi-modal personal loan according to claim 1, wherein the step S2 further comprises:

applying 3 × 3 mean smoothing filtering to the audio and video information of the borrower and extracting 34 classes of audio features $F_A \in \{\mathbb{R}^{N \times 34}\}$, where N is the number of samples, the sampling frequency fs is 16000 Hz, the calculation window win is 0.05, and the calculation step is 0.05.

3. The method for predicting the risk of multi-modal personal loan according to claim 1, wherein the step S3 further comprises:

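obtaining the largest face frame bbox in the video using RetinaFace, and expanding the face frame bbox by a factor of 1.25 to obtain an expanded face region;

performing face key point detection within the expanded face region using PRNet to obtain 68 face key points;

constructing a micro-expression data set D from the target key points in the key frames of the video stream, where fps_num is the number of key frames;

extracting 34 classes of video time-domain features $F_B \in \{\mathbb{R}^{N \times 68} \mid [R_x^{N \times 34}, R_y^{N \times 34}]\}$ from the micro-expression data set D, wherein the sampling frequency fs is 16000 Hz, the calculation window win is 0.05, and the calculation step is 0.05.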

4. The method for predicting the risk of multi-modal personal loan according to claim 1, wherein the step S4 further comprises:

5. The method for predicting the risk of multi-modal personal loan according to claim 1, wherein the step S5 further comprises: