CN116503872B - Trusted client mining method based on machine learning - Google Patents


Info

Publication number
CN116503872B
CN116503872B CN202310757418.6A
Authority
CN
China
Prior art date
Legal status
Active
Application number
CN202310757418.6A
Other languages
Chinese (zh)
Other versions
CN116503872A (en)
Inventor
严松
黄奎
刘利科
Current Assignee
Beijing Jixian Information Technology Co ltd
Original Assignee
Beijing Jixian Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jixian Information Technology Co ltd filed Critical Beijing Jixian Information Technology Co ltd
Priority to CN202310757418.6A priority Critical patent/CN116503872B/en
Publication of CN116503872A publication Critical patent/CN116503872A/en
Application granted granted Critical
Publication of CN116503872B publication Critical patent/CN116503872B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173Classification techniques
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a trusted client mining method based on machine learning, belonging to the technical field of information processing and analysis.

Description

Trusted client mining method based on machine learning
Technical Field
The invention relates to the technical field of information processing and analysis, in particular to a trusted client mining method based on machine learning.
Background
Conventional trusted client mining relies on manual data review: based on the materials provided by a client, staff check information such as registered capital, registration time, operation period, transaction duration, gross profit, transaction amount, order quantity, unit price per customer, number of overdue repayments, overdue amount, and overdue days, and then estimate the client's credit class. This existing approach is time-consuming, slow to audit, and unsuitable for rapidly mining trusted clients.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a trusted client mining method based on machine learning, which solves the problems of long processing time and low auditing speed in existing trusted client mining.
In order to achieve the aim of the invention, the invention adopts the following technical scheme: a trusted client mining method based on machine learning comprises the following steps:
s1, shooting text data submitted by a client to obtain a text image;
s2, performing image and text recognition on the text image to obtain text data;
s3, extracting client characteristic information from the text data;
s4, processing the client characteristic information by adopting a classification model, and dividing the credit rating of the client;
and s5, carrying out graded credit granting for the clients according to their credit levels, and filing the trusted clients.
Further, the step S2 includes the following sub-steps:
s21, extracting a character image from the text image;
s22, extracting features of the character image by adopting a feature extraction model to obtain an image feature sequence;
s23, processing the image feature sequence by adopting a character recognition model to obtain text data.
Further, the step S21 includes the steps of:
s211, carrying out gray level processing on the text image to obtain a gray level image;
s212, finding all pixel points meeting the edge condition from the gray level image to serve as text pixel points, wherein the edge condition is as follows:
$$\left|\frac{1}{N_1}\sum_{i=1}^{N_1} p_i-\frac{1}{N_2}\sum_{j=1}^{N_2} q_j\right|>T$$
wherein $g$ is the gray value of the pixel point under test on the gray-scale map, $N_1$ is the number of pixel points on one side of that pixel's neighborhood range, $N_2$ is the number of pixel points on the other side of the neighborhood range, $p_i$ is the pixel value of the $i$-th pixel on the one side, $q_j$ is the pixel value of the $j$-th pixel on the other side, and $T$ is a distance threshold; a pixel whose two neighborhood sides differ by more than $T$, and whose own gray value lies close to one side while far from the other, is taken as a text pixel point;
s213, forming the gray values of all the text pixel points into a character image.
The beneficial effects of the above further scheme are: and (3) carrying out graying treatment on the text image to obtain a gray level image, screening out text pixel points by using pixel values of the text pixel points and pixel values of background pixel points, wherein two pixel values exist in a vicinity range of one pixel point, the pixel points are the text pixel points with high probability, whether a difference value of the two pixel values is larger than a distance threshold value is calculated, if so, the pixel values on two sides are larger, meanwhile, the pixel points are similar to the pixel value of the pixel point on one side, are far away from the pixel value of the pixel point on the other side, the pixel points are further determined to be edge text pixel points on the text, all edge text pixel points are extracted, and text features are extracted, so that the effects of quickly reducing the image features and accurately extracting the text pixel points are achieved.
Further, the feature extraction model in S22 includes: a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, a depth convolution layer, a first normalization layer, a second normalization layer, a maximum pooling layer, an average pooling layer, a Concat layer, a first adder A1 and a second adder A2;
the input end of the first convolution layer is used as the input end of the feature extraction model, and the output end of the first convolution layer is respectively connected with the input end of the depth convolution layer, the input end of the maximum pooling layer, the input end of the average pooling layer and the input end of the second adder A2; the input end of the first normalization layer is connected with the output end of the depth convolution layer, and the output end of the first normalization layer is connected with the input end of the second convolution layer; the input end of the first adder A1 is respectively connected with the output end of the maximum pooling layer and the output end of the average pooling layer, and the output end of the first adder A1 is connected with the input end of the second normalization layer; the input end of the Concat layer is respectively connected with the output end of the second convolution layer and the output end of the second normalization layer, and the output end of the Concat layer is connected with the input end of the third convolution layer; the output end of the third convolution layer is connected with the input end of the second adder A2; the input end of the fourth convolution layer is connected with the output end of the second adder A2, and the output end of the fourth convolution layer is used as the output end of the feature extraction model.
The beneficial effects of the above further scheme are: the invention processes the character image by a first convolution layer, divides the character image into multiple paths, inputs different paths, extracts depth features by the path of the depth convolution layer, extracts significant features by a maximum pooling layer, extracts average features by an average pooling layer, and connects the first convolution layer and a second adder A2 to realize identity mapping, thereby solving the problem of gradient disappearance.
Further, in the formula shared by the normalization layers, $y_i$ is the $i$-th output, $x_i$ is the $i$-th input, $\omega$ is the weight of the normalization layer, $b$ is the normalization-layer bias, $n$ is the number of inputs of the normalization layer, $\lambda$ is a normalization coefficient, and $\prod$ denotes the product.
Further, the character recognition model in S23 includes: a first LSTM layer, a second LSTM layer, an attention layer, a fully connected layer, and a Softmax layer;
the input end of the first LSTM layer is connected with the first input end of the attention layer and is used as the input end of the character recognition model; the input end of the second LSTM layer is connected with the output end of the first LSTM layer, and the output end of the second LSTM layer is connected with the second input end of the attention layer; the input end of the full-connection layer is connected with the output end of the attention layer, and the output end of the full-connection layer is connected with the input end of the Softmax layer; the output end of the Softmax layer is used as the output end of the character recognition model.
Further, in the expression of the attention layer, $A$ is the output of the attention layer, $\sigma$ is the activation function, $W_1$ is the first weight of the attention layer, $W_2$ is the second weight of the attention layer, $\mathrm{AvgPool}(\cdot)$ denotes average-pooling, $\mathrm{MaxPool}(\cdot)$ denotes max-pooling, $f_i$ is the $i$-th input feature vector, $n$ is the number of input feature vectors, $\|\cdot\|_2$ is the two-norm, and $b$ is the bias of the attention layer.
The beneficial effects of the above further scheme are: the invention carries out weighting treatment on the characteristics of the input attention layer, reflects the proportion of each quantity in the input characteristics according to the proportion of each quantity, avoids the pooling treatment from wiping out the data characteristics, carries out the maximum pooling treatment and the average pooling treatment, and respectively gives weights to increase the attention to the characteristics.
Further, in the classification model of S4, $y$ is the output of the classification model, $x_i$ is the $i$-th item of customer characteristic information input to the model, $h_i$ is the threshold for the $i$-th item of customer characteristic information, $\omega_i$ is its weight, $b_i$ is its bias, $K$ is the number of kinds of extracted customer characteristic information, $\tanh$ is the hyperbolic tangent function, and $\lambda$ is a proportionality coefficient.
The beneficial effects of the above further scheme are: in the classification model, each piece of customer characteristic information has a corresponding threshold value, if the customer characteristic information is smaller than the threshold value, the classification model plays a role in reducing the credit level of the customer, different weights and biases are given to each piece of customer characteristic information, different importance degrees of different customer characteristic information are achieved, the credit level of the customer is calculated by adopting a hyperbolic tangent function, a proportionality coefficient is set, the credit level is amplified, and the credit level of the customer is conveniently distinguished.
Further, in the loss function of the classification model, $L$ is the loss, $e$ indexes the counted training iterations, $\hat{c}_e$ is the credit level predicted at the $e$-th iteration, $c_e$ is the actual credit level at the $e$-th iteration, $m$ is the number of counted iterations, $M$ is the number of actual iterations, $k$ is the number of iterations satisfying the condition that the difference between predicted and actual credit levels exceeds $\varepsilon$, and $\varepsilon$ is a loss difference threshold.
The beneficial effects of the above further scheme are: the invention adopts the difference square of the actual credit level and the predicted credit level as the main content of the loss function, and the condition of multiple times of training is adopted, so that the classification model can achieve higher precision on the whole, the influence of higher precision on the judgment of the training degree of the classification model is prevented, and the loss difference threshold value is set in the inventionFurther assists in judging the training degree of the classification model in the training process, and when the difference between the actual credit level and the predicted credit level is smaller, the classification model is +.>Equal to->When the calculated loss value is smaller, the training degree of the judgment classification model can be more accurate, if the difference between the actual credit level and the predicted credit level is smaller, but +.>Not equal to->Then->The value of 2 or more corresponds to multiplying the difference between the actual credit rating and the predicted credit rating by a scaling factor, and the loss value is increased, in which case the classification model still needs to be trained.
In summary, the invention has the following beneficial effects: image processing is applied to the recorded customer data and its text is extracted, yielding the customer characteristic information; a classification model then automatically divides the customer's credit rating according to that information, and customers are granted credit by grade according to their credit ratings, realizing a fully automatic and rapid trusted client mining method.
Drawings
FIG. 1 is a flow chart of a method of mining trusted clients based on machine learning;
FIG. 2 is a schematic diagram of a feature extraction model;
fig. 3 is a schematic diagram of a text recognition model.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of the embodiments; to those of ordinary skill in the art, all inventions making use of the inventive concept fall within the protection of the spirit and scope of the invention as defined by the appended claims.
As shown in fig. 1, a trusted client mining method based on machine learning includes the following steps:
s1, shooting text data submitted by a client to obtain a text image;
s2, performing image and text recognition on the text image to obtain text data;
the step S2 comprises the following sub-steps:
s21, extracting a character image from the text image;
the step S21 comprises the following steps:
s211, carrying out gray level processing on the text image to obtain a gray level image;
s212, finding all pixel points meeting the edge condition from the gray level image to serve as text pixel points, wherein the edge condition is as follows:
$$\left|\frac{1}{N_1}\sum_{i=1}^{N_1} p_i-\frac{1}{N_2}\sum_{j=1}^{N_2} q_j\right|>T$$
wherein $g$ is the gray value of the pixel point under test on the gray-scale map, $N_1$ is the number of pixel points on one side of that pixel's neighborhood range, $N_2$ is the number of pixel points on the other side of the neighborhood range, $p_i$ is the pixel value of the $i$-th pixel on the one side, $q_j$ is the pixel value of the $j$-th pixel on the other side, and $T$ is a distance threshold; a pixel whose two neighborhood sides differ by more than $T$, and whose own gray value lies close to one side while far from the other, is taken as a text pixel point;
in this embodiment, the distance threshold is set according to the experimental conditions.
S213, forming the gray values of all the text pixel points into a character image.
The text image is grayed to obtain a gray-scale map, and text pixel points are screened out by comparing the pixel values of text pixels with those of background pixels. When two distinct pixel values exist within the neighborhood range of a pixel point, that point is very likely a text pixel, so the method checks whether the difference between the two values exceeds the distance threshold. If it does, and the point's own value is close to the pixel value on one side while far from that on the other side, the point is determined to be an edge text pixel. Extracting all edge text pixels extracts the text features, achieving the effects of quickly reducing the image features and accurately extracting the text pixel points.
In the present embodiment, it can be seen from the characteristics of the words and background of the text material that the text pixel value is typically less than the background pixel value: text images as black, with gray values approaching 0, while the background approaches 255, so the difference between text and background pixel values is large.
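The screening rule of S212 can be sketched in plain Python. The neighborhood shape, the use of side means, and the threshold value are assumptions; the patent supplies only the variable definitions and the behavior described above:

```python
def is_text_pixel(row, c, T=60):
    """Edge-condition test for one pixel of a gray-scale map row (a sketch).

    Splits a small horizontal neighborhood into a side before the point and
    a side after it, compares the side means, and checks that the point's
    own gray value hugs one side while the two side means differ by more
    than the distance threshold T. Neighborhood shape and T are assumed.
    """
    g = float(row[c])
    left = row[max(c - 2, 0):c]      # pixels on one side of the neighborhood
    right = row[c + 1:c + 3]         # pixels on the other side
    if not left or not right:
        return False
    p_mean = sum(left) / len(left)
    q_mean = sum(right) / len(right)
    near = min(abs(g - p_mean), abs(g - q_mean))
    far = max(abs(g - p_mean), abs(g - q_mean))
    return abs(p_mean - q_mean) > T and near < far

# black text (gray value 0) on a white background (255)
row = [255, 255, 255, 255, 255, 255, 0, 0, 0]
print(is_text_pixel(row, 6))   # boundary text pixel -> True
print(is_text_pixel(row, 2))   # deep in the background -> False
```

On this sketch, a pixel at the black/white boundary passes the condition while a pixel deep in the uniform background does not.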
S22, extracting features of the character image by adopting a feature extraction model to obtain an image feature sequence;
as shown in fig. 2, the feature extraction model in S22 includes: a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, a depth convolution layer, a first normalization layer, a second normalization layer, a maximum pooling layer, an average pooling layer, a Concat layer, a first adder A1 and a second adder A2;
the input end of the first convolution layer is used as the input end of the feature extraction model, and the output end of the first convolution layer is respectively connected with the input end of the depth convolution layer, the input end of the maximum pooling layer, the input end of the average pooling layer and the input end of the second adder A2; the input end of the first normalization layer is connected with the output end of the depth convolution layer, and the output end of the first normalization layer is connected with the input end of the second convolution layer; the input end of the first adder A1 is respectively connected with the output end of the maximum pooling layer and the output end of the average pooling layer, and the output end of the first adder A1 is connected with the input end of the second normalization layer; the input end of the Concat layer is respectively connected with the output end of the second convolution layer and the output end of the second normalization layer, and the output end of the Concat layer is connected with the input end of the third convolution layer; the output end of the third convolution layer is connected with the input end of the second adder A2; the input end of the fourth convolution layer is connected with the output end of the second adder A2, and the output end of the fourth convolution layer is used as the output end of the feature extraction model.
The invention first processes the character image with the first convolution layer and then splits the result into multiple paths with different inputs: the depth convolution path extracts depth features, the maximum pooling layer extracts salient features, and the average pooling layer extracts average features. Connecting the first convolution layer directly to the second adder A2 realizes an identity mapping, which alleviates the vanishing-gradient problem.
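The branch-and-merge topology described above can be made concrete with a toy sketch. The layer bodies below are stand-in functions on 1-D lists, not real convolutions or poolings; only the wiring (first conv, the depthwise and pooling branches, adder A1, the Concat layer, and the A2 identity skip) follows the text, and every function name is illustrative:

```python
def conv(x, k=1.0, b=0.0):
    """Stand-in for a convolution layer (here just an affine map)."""
    return [k * v + b for v in x]

def depthwise_conv(x):
    """Stand-in for the depth convolution layer."""
    return [0.01 * v * v for v in x]

def max_pool(x):
    """Stand-in max pooling: broadcast the salient (maximum) feature."""
    m = max(x)
    return [m] * len(x)

def avg_pool(x):
    """Stand-in average pooling: broadcast the average feature."""
    a = sum(x) / len(x)
    return [a] * len(x)

def normalize(x):
    """Stand-in for a normalization layer (scale magnitudes to at most 1)."""
    s = max(abs(v) for v in x) or 1.0
    return [v / s for v in x]

def add(a, b):
    """An adder: element-wise sum of two branches."""
    return [u + v for u, v in zip(a, b)]

def feature_extract(x):
    t = conv(x)                                         # first convolution layer
    branch1 = conv(normalize(depthwise_conv(t)))        # depth-feature path
    branch2 = normalize(add(max_pool(t), avg_pool(t)))  # pooling paths via adder A1
    merged = branch1 + branch2                          # Concat layer
    reduced = conv(merged)[:len(t)]                     # third conv (toy channel reduction)
    out = add(reduced, t)                               # adder A2 with identity skip
    return conv(out)                                    # fourth convolution layer

features = feature_extract([1.0, 2.0, 3.0])
```

The identity connection from the first convolution layer into adder A2 is what the text credits with avoiding vanishing gradients.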
In the formula shared by the first and second normalization layers, $y_i$ is the $i$-th output, $x_i$ is the $i$-th input, $\omega$ is the weight of the normalization layer, $b$ is the normalization-layer bias, $n$ is the number of inputs of the normalization layer, $\lambda$ is a normalization coefficient, and $\prod$ denotes the product.
S23, processing the image feature sequence by adopting a character recognition model to obtain text data.
As shown in fig. 3, the character recognition model in S23 includes: a first LSTM layer, a second LSTM layer, an attention layer, a fully connected layer, and a Softmax layer;
the input end of the first LSTM layer is connected with the first input end of the attention layer and is used as the input end of the character recognition model; the input end of the second LSTM layer is connected with the output end of the first LSTM layer, and the output end of the second LSTM layer is connected with the second input end of the attention layer; the input end of the full-connection layer is connected with the output end of the attention layer, and the output end of the full-connection layer is connected with the input end of the Softmax layer; the output end of the Softmax layer is used as the output end of the character recognition model.
In the expression of the attention layer, $A$ is the output of the attention layer, $\sigma$ is the activation function, $W_1$ is the first weight of the attention layer, $W_2$ is the second weight of the attention layer, $\mathrm{AvgPool}(\cdot)$ denotes average-pooling, $\mathrm{MaxPool}(\cdot)$ denotes max-pooling, $f_i$ is the $i$-th input feature vector, $n$ is the number of input feature vectors, $\|\cdot\|_2$ is the two-norm, and $b$ is the bias of the attention layer.
The invention weights the features entering the attention layer so that each quantity's proportion within the input features is reflected, preventing the pooling operations from wiping out the data characteristics; both maximum pooling and average pooling are applied and given separate weights, increasing the attention paid to the features.
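A minimal sketch of such a weighted dual-pooling attention follows. The patent's exact expression survives only as symbol definitions, so the combination below (an activation applied to w1*avg + w2*max plus a bias, per feature position) is an assumed form, and the weight values are placeholders:

```python
import math

def attention(features, w1=0.6, w2=0.4, bias=0.0):
    """Assumed attention form: per position, combine the average-pooled
    and max-pooled statistics with separate weights, add a bias, and
    squash with a sigmoid activation."""
    n = len(features[0])
    avg = [sum(f[i] for f in features) / len(features) for i in range(n)]
    mx = [max(f[i] for f in features) for i in range(n)]
    return [1.0 / (1.0 + math.exp(-(w1 * a + w2 * m + bias)))
            for a, m in zip(avg, mx)]

feats = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
weights = attention(feats)   # one attention weight per feature position
```

Both poolings contribute, weighted separately, so neither the salient nor the average statistics are wiped out.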
S3, extracting client characteristic information from the text data;
since text data is extracted by performing text recognition in step S2 and data vectors are obtained in the computer system, only the required text feature vector is extracted in step S3 to obtain client feature information, which corresponds to the extraction of the corresponding data vector from the storage unit, and the information of the client is known.
S4, processing the client characteristic information by adopting a classification model, and dividing the credit rating of the client;
the classification model in the S4 is as follows:
wherein ,for the output of the classification model, +.>Input for classification model->Customer characteristic information->Is->Customer characteristic information thresholdValue of->Is->Weight of the customer characteristic information, +.>Is->Bias of customer characteristic information->For the kind of the extracted customer characteristic information +.>As hyperbolic tangent function, +.>Is a proportionality coefficient.
In the classification model, each item of customer characteristic information has a corresponding threshold; if the item is smaller than its threshold, it acts to reduce the customer's credit level. Each item is given its own weight and bias, reflecting the different importance of different items of customer characteristic information. The customer's credit level is computed with a hyperbolic tangent function, and a proportionality coefficient is set to amplify the credit level, making customers' credit levels easy to distinguish.
In this embodiment, the customer characteristic information may first be normalized to ensure each quantity lies in the range 0-1, which makes the contribution of each quantity within the classification model easy to measure; after normalization, the customer characteristic information threshold may be set to 0.5.
In this embodiment, the types of client characteristic information include registered capital and registration time. Taking these two types as an example of the classification model: if the registered capital is ten million and the maximum registered capital is set to twenty million, the normalized value of the client characteristic information corresponding to the registered-capital type is 0.5; if the registration time is 6 years and the maximum is set to 30 years, the value corresponding to the registration-time type is 0.2. That is, the client characteristic information described in the invention is a value obtained by quantifying the client information. This description merely illustrates the use of the classification model; the specific usage procedure can be set according to requirements, and such settings do not affect the structure of the classification model of the present invention.
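The quantification in this example can be reproduced directly; the cap values (twenty million for capital, 30 years for registration time) are the ones assumed in the example above:

```python
def quantify(value, cap):
    """Normalize one item of client information into the 0-1 range."""
    return min(value / cap, 1.0)

registered_capital = quantify(10_000_000, 20_000_000)
registered_time = quantify(6, 30)
print(registered_capital, registered_time)  # 0.5 0.2
```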
The types of customer characteristic information are selected freely according to the business direction of each enterprise or bank.
The classification model of the invention summarizes and counts the various kinds of client information, thereby obtaining the client's credit level.
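A sketch of the grading computation follows. The patent's formula is given only through its symbol definitions, so the form lam * tanh(sum of w_i*(x_i - h_i) + b_i) is an assumption assembled from those definitions (per-item thresholds, weights and biases, a hyperbolic tangent, and a proportionality coefficient):

```python
import math

def credit_score(x, w, h, b, lam=5.0):
    """Assumed form of the classification model: items below their
    threshold pull the score down, tanh squashes the weighted sum, and
    the proportionality coefficient lam amplifies the result."""
    s = sum(wi * (xi - hi) + bi for xi, wi, hi, bi in zip(x, w, h, b))
    return lam * math.tanh(s)

# normalized features from the example: capital 0.5, registration time 0.2
low = credit_score([0.5, 0.2], w=[1.0, 1.0], h=[0.5, 0.5], b=[0.0, 0.0])
high = credit_score([1.0, 1.0], w=[1.0, 1.0], h=[0.5, 0.5], b=[0.0, 0.0])
```

A client whose every feature clears its threshold gets a positive amplified score, while sub-threshold features drag the score negative.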
In the loss function of the classification model, $L$ is the loss, $e$ indexes the counted training iterations, $\hat{c}_e$ is the credit level predicted at the $e$-th iteration, $c_e$ is the actual credit level at the $e$-th iteration, $m$ is the number of counted iterations, $M$ is the number of actual iterations, $k$ is the number of iterations satisfying the condition that the difference between predicted and actual credit levels exceeds $\varepsilon$, and $\varepsilon$ is a loss difference threshold.
The invention adopts the squared difference between the actual and predicted credit levels as the main content of the loss function, accumulated over multiple training iterations, so that the classification model reaches high precision overall. To keep a deceptively small loss from obscuring the model's true training degree, a loss difference threshold $\varepsilon$ is set to further assist in judging the training degree during training. When the difference between the actual and predicted credit levels is small at every iteration, $k$ stays at its minimum and the computed loss value is small, so the training degree of the classification model can be judged accurately. If the difference is small at most iterations but $k$ is not at its minimum, i.e. $k$ takes a value of 2 or more, this is equivalent to multiplying the difference between the actual and predicted credit levels by a scaling factor, which increases the loss value; in that case the classification model still needs training.
The actual credit level in the invention is a label assigned manually, from experience, to each training sample of the classification model.
The higher the accuracy of the classification model training in the invention, the higher the classification accuracy of the client credit rating.
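The loss described above can be sketched as follows; the exact role of the count k is reconstructed from the description (it scales the squared error once too many iterations exceed the loss difference threshold), so this form is an assumption:

```python
def training_loss(pred, actual, eps=0.5):
    """Assumed loss: mean squared difference between predicted and
    actual credit levels, multiplied by k, the count (at least 1) of
    iterations whose absolute difference exceeds the threshold eps."""
    diffs = [p - a for p, a in zip(pred, actual)]
    k = max(1, sum(1 for d in diffs if abs(d) > eps))
    return k * sum(d * d for d in diffs) / len(diffs)

perfect = training_loss([1.0, 2.0], [1.0, 2.0])  # 0.0
scaled = training_loss([3.0, 4.0], [1.0, 2.0])   # 8.0: k == 2 doubles the mean squared error
```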
S5, carrying out graded credit granting for the clients according to their credit levels, and filing the trusted clients.
In this embodiment, the credit rating of the client includes: bronze, silver, gold, platinum, and diamond customers, or primary, secondary, tertiary, and quaternary customers, etc.
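Mapping the amplified score onto the named grades of S5 can be done with simple cut points; the boundaries below are illustrative only and not taken from the patent:

```python
TIERS = ["bronze", "silver", "gold", "platinum", "diamond"]

def credit_tier(score):
    """Bucket an amplified score (roughly in [-5, 5]) into five tiers."""
    cuts = [-3.0, -1.0, 1.0, 3.0]   # assumed grade boundaries
    for i, c in enumerate(cuts):
        if score < c:
            return TIERS[i]
    return TIERS[-1]

print(credit_tier(-4.0))  # bronze
print(credit_tier(0.0))   # gold
print(credit_tier(4.5))   # diamond
```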
In summary, the beneficial effects of the embodiment of the invention are as follows: image processing is applied to the recorded customer data and its text is extracted, yielding the customer characteristic information; a classification model then automatically divides the customer's credit rating according to that information, and customers are granted credit by grade according to their credit ratings, realizing a fully automatic and rapid trusted client mining method.
The above is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art can make various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in its protection scope.

Claims (4)

1. A machine learning based trusted client mining method, characterized by comprising the following steps:
s1, shooting text data submitted by a client to obtain a text image;
s2, performing image and text recognition on the text image to obtain text data;
the step S2 comprises the following sub-steps:
s21, extracting a text image from the text image;
s22, extracting features of the character image by adopting a feature extraction model to obtain an image feature sequence;
s23, processing the image feature sequence by adopting a character recognition model to obtain text data;
the feature extraction model in S22 includes: a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, a depth convolution layer, a first normalization layer, a second normalization layer, a maximum pooling layer, an average pooling layer, a Concat layer, a first adder A1 and a second adder A2;
the input end of the first convolution layer is used as the input end of the feature extraction model, and the output end of the first convolution layer is respectively connected with the input end of the depth convolution layer, the input end of the maximum pooling layer, the input end of the average pooling layer and the input end of the second adder A2; the input end of the first normalization layer is connected with the output end of the depth convolution layer, and the output end of the first normalization layer is connected with the input end of the second convolution layer; the input end of the first adder A1 is respectively connected with the output end of the maximum pooling layer and the output end of the average pooling layer, and the output end of the first adder A1 is connected with the input end of the second normalization layer; the input end of the Concat layer is respectively connected with the output end of the second convolution layer and the output end of the second normalization layer, and the output end of the Concat layer is connected with the input end of the third convolution layer; the output end of the third convolution layer is connected with the input end of the second adder A2; the input end of the fourth convolution layer is connected with the output end of the second adder A2, and the output end of the fourth convolution layer is used as the output end of the feature extraction model;
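The layer wiring recited above can be sketched as a PyTorch module. This is a minimal sketch under stated assumptions, not the patented implementation: the channel width, 3×3 kernels, stride-1 padded pooling (so both adders receive matching shapes), and the use of `BatchNorm2d` for the two normalization layers are all choices not fixed by the claim.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the claimed wiring: conv1 feeds a depthwise branch and a
    pooling branch; the branches are concatenated, projected by conv3,
    added back to conv1's output (adder A2), and refined by conv4."""
    def __init__(self, in_ch=1, ch=32):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.depthwise = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)  # depth convolution layer
        self.norm1 = nn.BatchNorm2d(ch)   # first normalization layer (assumed form)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.maxpool = nn.MaxPool2d(3, stride=1, padding=1)  # stride 1 keeps shapes addable
        self.avgpool = nn.AvgPool2d(3, stride=1, padding=1)
        self.norm2 = nn.BatchNorm2d(ch)   # second normalization layer
        self.conv3 = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.conv4 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        f1 = self.conv1(x)
        branch_a = self.conv2(self.norm1(self.depthwise(f1)))
        branch_b = self.norm2(self.maxpool(f1) + self.avgpool(f1))  # first adder A1
        fused = self.conv3(torch.cat([branch_a, branch_b], dim=1))  # Concat layer
        return self.conv4(fused + f1)                               # second adder A2

out = FeatureExtractor()(torch.randn(1, 1, 32, 100))
```

Stride-1, padded pooling and convolutions are assumed throughout so that adder A2 can add conv3's output to conv1's output elementwise, as the claim's connections require.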
the character recognition model in S23 comprises: a first LSTM layer, a second LSTM layer, an attention layer, a fully connected layer, and a Softmax layer;
the input end of the first LSTM layer is connected with the first input end of the attention layer and is used as the input end of the character recognition model; the input end of the second LSTM layer is connected with the output end of the first LSTM layer, and the output end of the second LSTM layer is connected with the second input end of the attention layer; the input end of the full-connection layer is connected with the output end of the attention layer, and the output end of the full-connection layer is connected with the input end of the Softmax layer; the output end of the Softmax layer is used as the output end of the character recognition model;
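The recognition model's connections can likewise be sketched in PyTorch. The hidden sizes, class count, and the reading of the attention layer as a pooled sigmoid gate over the input sequence (in the spirit of claim 4's AvgPool/MaxPool form) are assumptions; the claim fixes only the layer order and which outputs feed the attention layer's two inputs.

```python
import torch
import torch.nn as nn

class CharRecognizer(nn.Module):
    """Sketch of the claimed model: two stacked LSTMs, an attention layer
    fed by both the raw input and the second LSTM's output, then a fully
    connected layer and Softmax."""
    def __init__(self, feat_dim=32, hidden=64, n_classes=100):
        super().__init__()
        self.lstm1 = nn.LSTM(feat_dim, hidden, batch_first=True)
        # project back to feat_dim so the attention gate can mix with the input
        self.lstm2 = nn.LSTM(hidden, feat_dim, batch_first=True)
        # attention read as sigmoid(W1*AvgPool(h) + W2*MaxPool(h)), per claim 4
        self.w1 = nn.Linear(feat_dim, feat_dim, bias=False)
        self.w2 = nn.Linear(feat_dim, feat_dim, bias=True)
        self.fc = nn.Linear(feat_dim, n_classes)

    def forward(self, x):                 # x: (batch, seq_len, feat_dim)
        h1, _ = self.lstm1(x)
        h2, _ = self.lstm2(h1)
        avg = h2.mean(dim=1)              # average pooling over time
        mx = h2.max(dim=1).values         # max pooling over time
        gate = torch.sigmoid(self.w1(avg) + self.w2(mx))
        attended = x * gate.unsqueeze(1)  # first attention input: the raw sequence
        return torch.softmax(self.fc(attended), dim=-1)

probs = CharRecognizer()(torch.randn(2, 25, 32))
```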
s3, extracting client characteristic information from the text data;
the types of the customer characteristic information include: register capital, register time, business scope, business deadline, stakeholder information, high management information, transaction duration, gross profit, transaction amount, amount of orders, and guest unit price;
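Before classification, the extracted feature types can be collected into a fixed-order numeric vector. A minimal sketch follows; the field names and default handling are illustrative, not from the original.

```python
# Illustrative fixed ordering of the claimed feature types (names assumed).
FEATURE_ORDER = [
    "registered_capital", "registered_years", "business_scope_breadth",
    "business_term_years", "shareholder_count", "executive_count",
    "cooperation_years", "gross_profit", "transaction_amount",
    "order_count", "avg_order_value",
]

def to_feature_vector(record):
    """Map an extracted-field dict to a stable-length vector;
    missing fields default to 0.0."""
    return [float(record.get(name, 0.0)) for name in FEATURE_ORDER]

vec = to_feature_vector({"registered_capital": 5e6, "order_count": 120})
```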
s4, processing the client characteristic information by adopting a classification model, and dividing the credit rating of the client;
s5, grading credit giving is carried out on the clients according to the credit grades of the clients, and the credit giving clients are filed;
the classification model in the S4 is as follows:
wherein y is the output of the classification model, x is the input of the classification model, x_j is the j-th item of customer characteristic information, T_j is the threshold of the j-th item of customer characteristic information, w_j is the weight of the j-th item of customer characteristic information, b_j is the bias of the j-th item of customer characteristic information, M is the number of kinds of extracted customer characteristic information, tanh is the hyperbolic tangent function, and k is a proportionality coefficient;
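The classification formula itself is rendered as an image in the original publication and is not recoverable; the following is one conjectured form consistent with the recited variables (per-feature weight, threshold, and bias inside a summation, scaled by a proportionality coefficient and passed through tanh), offered only as an illustration.

```python
import math

def classify_credit(x, w, b, T, k=1.0):
    """Conjectured form, NOT the patented formula: tanh of the
    proportionally scaled sum of threshold-shifted weighted features.
    x, w, b, T are per-feature lists of equal length M."""
    s = sum(w[j] * (x[j] - T[j]) + b[j] for j in range(len(x)))
    return math.tanh(k * s)

y = classify_credit(x=[1.0, 2.0], w=[0.5, 0.25], b=[0.0, 0.1], T=[0.5, 1.0])
```

The tanh squashes the score into (-1, 1), which can then be bucketed into discrete credit levels.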
the loss function of the classification model is as follows:
wherein L is the loss function, N is the statistical number of training iterations, ŷ_i is the credit level predicted in the i-th training iteration, y_i is the actual credit level in the i-th training iteration, i is the training-iteration index, N_s is the actual number of training iterations, N_c is the number of training iterations satisfying the condition associated with the loss difference threshold, and ε is the loss difference threshold.
2. The machine learning based trusted client mining method of claim 1, wherein S21 comprises the steps of:
s211, carrying out gray level processing on the text image to obtain a gray level image;
s212, finding all pixel points meeting the edge condition from the gray level image to serve as text pixel points, wherein the edge condition is as follows:
wherein g is the gray value of any pixel point on the gray-scale map, n_1 is the number of pixel points on one side of the neighborhood range of that pixel point, n_2 is the number of pixel points on the other side of the neighborhood range, p_i is the pixel value of the i-th pixel point on the other side of the neighborhood, q_i is the pixel value of the i-th pixel point on the one side of the neighborhood, and d is a distance threshold;
s213, forming the gray values of all the text pixels into a text image.
3. The machine learning based trusted client mining method of claim 1, wherein the formula of the normalization layer is:
wherein y_l is the l-th output of the normalization layer, x_l is the l-th input of the normalization layer, w is the weight of the normalization layer, b is the bias of the normalization layer, n is the number of inputs of the normalization layer, γ is the normalization coefficient, and · denotes the product.
4. The machine learning based trusted client mining method of claim 1, wherein the expression of the attention layer is:
wherein F is the output of the attention layer, σ is the activation function, W_1 is the first weight of the attention layer, W_2 is the second weight of the attention layer, AvgPool(·) denotes average-pooling processing, MaxPool(·) denotes max-pooling processing, x_i is the i-th input feature vector, n is the number of input feature vectors, ‖·‖ is the two-norm operation, and b is the bias of the attention layer.
CN202310757418.6A 2023-06-26 2023-06-26 Trusted client mining method based on machine learning Active CN116503872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310757418.6A CN116503872B (en) 2023-06-26 2023-06-26 Trusted client mining method based on machine learning


Publications (2)

Publication Number Publication Date
CN116503872A CN116503872A (en) 2023-07-28
CN116503872B true CN116503872B (en) 2023-09-05

Family

ID=87321639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310757418.6A Active CN116503872B (en) 2023-06-26 2023-06-26 Trusted client mining method based on machine learning

Country Status (1)

Country Link
CN (1) CN116503872B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116934468B (en) * 2023-09-15 2023-12-22 成都运荔枝科技有限公司 Trusted client grading method based on semantic recognition


Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US10354168B2 (en) * 2016-04-11 2019-07-16 A2Ia S.A.S. Systems and methods for recognizing characters in digitized documents
CN110751261B (en) * 2018-07-23 2024-05-28 第四范式(北京)技术有限公司 Training method and system and prediction method and system for neural network model
US10943274B2 (en) * 2018-08-28 2021-03-09 Accenture Global Solutions Limited Automation and digitizalization of document processing systems
US11423538B2 (en) * 2019-04-16 2022-08-23 Covera Health Computer-implemented machine learning for detection and statistical analysis of errors by healthcare providers
JP2022161564A (en) * 2021-04-09 2022-10-21 株式会社日立製作所 System for training machine learning model recognizing character of text image

Patent Citations (16)

Publication number Priority date Publication date Assignee Title
CN108734338A (en) * 2018-04-24 2018-11-02 阿里巴巴集团控股有限公司 Credit risk forecast method and device based on LSTM models
CN109389055A (en) * 2018-09-21 2019-02-26 西安电子科技大学 Video classification methods based on mixing convolution sum attention mechanism
CN111062236A (en) * 2019-05-05 2020-04-24 杭州魔蝎数据科技有限公司 Data authorization method and device based on artificial intelligence
CN111652870A (en) * 2020-06-02 2020-09-11 集美大学诚毅学院 Cloth defect detection method and device, storage medium and electronic equipment
CN112348654A (en) * 2020-09-23 2021-02-09 民生科技有限责任公司 Automatic assessment method, system and readable storage medium for enterprise credit line
CN112138403A (en) * 2020-10-19 2020-12-29 腾讯科技(深圳)有限公司 Interactive behavior recognition method and device, storage medium and electronic equipment
CN112069321A (en) * 2020-11-11 2020-12-11 震坤行网络技术(南京)有限公司 Method, electronic device and storage medium for text hierarchical classification
WO2022134588A1 (en) * 2020-12-21 2022-06-30 深圳壹账通智能科技有限公司 Method for constructing information review classification model, and information review method
CN112613501A (en) * 2020-12-21 2021-04-06 深圳壹账通智能科技有限公司 Information auditing classification model construction method and information auditing method
CN112819604A (en) * 2021-01-19 2021-05-18 浙江省农村信用社联合社 Personal credit evaluation method and system based on fusion neural network feature mining
CN113963147A (en) * 2021-09-26 2022-01-21 西安交通大学 Key information extraction method and system based on semantic segmentation
CN114118186A (en) * 2021-10-11 2022-03-01 西安理工大学 Calligraphy image style classification method based on directional feature enhancement
CN114841792A (en) * 2021-12-22 2022-08-02 云汉芯城(上海)互联网科技股份有限公司 Client credit line prediction method based on machine learning
CN114971294A (en) * 2022-05-27 2022-08-30 平安银行股份有限公司 Data acquisition method, device, equipment and storage medium
CN114882011A (en) * 2022-06-13 2022-08-09 浙江理工大学 Fabric flaw detection method based on improved Scaled-YOLOv4 model
CN115688002A (en) * 2022-11-02 2023-02-03 阿里云计算有限公司 Classification method and device, method and device for training classification model and classification model

Non-Patent Citations (1)

Title
Cost-sensitive and ensemble-learning-based credit evaluation method for online lending and its application; Wang Haomin; China Doctoral Dissertations Full-text Database (Basic Sciences) (No. 07, 2020); A002-80 *


Similar Documents

Publication Publication Date Title
Franceschetti et al. Do bankrupt companies manipulate earnings more than the non-bankrupt ones?
CN109583966B (en) High-value customer identification method, system, equipment and storage medium
Chi et al. Bankruptcy prediction: Application of logit analysis in export credit risks
CN108648023A (en) A kind of businessman&#39;s passenger flow forecast method of fusion history mean value and boosted tree
CN116503872B (en) Trusted client mining method based on machine learning
CN112102073A (en) Credit risk control method and system, electronic device and readable storage medium
CN115545712A (en) Fraud prediction method, device, equipment and storage medium for transaction behaviors
US20160379138A1 (en) Classifying test data based on a maximum margin classifier
CN112365352A (en) Anti-cash-out method and device based on graph neural network
CN116227939A (en) Enterprise credit rating method and device based on graph convolution neural network and EM algorithm
CN115545909A (en) Approval method, device, equipment and storage medium
Faizova et al. The Impact of Digitalization Risks on the Business Processes of an Insurance Company
CN111428510B (en) Public praise-based P2P platform risk analysis method
CN110570301B (en) Risk identification method, device, equipment and medium
Fikriya et al. Support Vector Machine Predictive Analysis Implementation: Case Study of Tax Revenue in Government of South Lampung
Kao et al. Bayesian behavior scoring model
CN113191771A (en) Buyer account period risk prediction method
KR20220001127U (en) Commercial Real Estate Simulator using Public data and Vehicle Analysis
Hargreaves Machine learning application to identify good credit customers
Manawadu et al. Microfinance interest rate prediction and automate the loan application
CN115953166B (en) Customer information management method and system based on big data intelligent matching
CN117994017A (en) Method for constructing retail credit risk prediction model and online credit service Scoredelta model
CN118071482A (en) Method for constructing retail credit risk prediction model and consumer credit business Scorebetad model
CN117994016A (en) Method for constructing retail credit risk prediction model and consumer credit business Scorebeta model
CN118071483A (en) Method for constructing retail credit risk prediction model and personal credit business Scorepsi model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant