CN112613501A

CN112613501A - Information auditing classification model construction method and information auditing method

Info

Publication number: CN112613501A
Application number: CN202011521474.2A
Authority: CN
Inventors: 高文
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Smart Technology Co Ltd; OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2020-12-21
Filing date: 2020-12-21
Publication date: 2021-04-06
Also published as: WO2022134588A1

Abstract

The application relates to the technical field of artificial intelligence and provides a method and an apparatus for constructing an information audit classification model, a computer device, and a storage medium. The method comprises: obtaining a sample image and performing optical character recognition on it to obtain a text recognition result; performing semantic analysis on the text recognition result and extracting a target text according to the semantic analysis result; generating an audit tag for the target text based on an audit tag generation rule corresponding to a business party, and labeling the target text with the audit tag; and performing model training on the target text carrying the audit tag to obtain an information audit classification model. Because labeling does not depend on manual work, samples are generated faster. Audit tags suited to model training for different business parties can be generated from the same sample images, so the sample images are reused and the information audit classification model is trained quickly. The application also relates to blockchain technology: a user's information auditing result can be stored in a blockchain.

Description

Information auditing classification model construction method and information auditing method

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method for constructing an information audit classification model, an information audit method, an apparatus, a computer device, and a storage medium.

Background

With the development of information technology, information auditing is used more and more widely in daily life. Taking loan auditing as an example, a borrower's application information is audited layer by layer and then manually audited by the bank. The manual audit process is complex and time-consuming, and it is difficult to guarantee that the required volume of audits is completed on time. As a result, borrowers come to doubt the bank's service and the bank's interests are harmed. Because information auditing involves many processes and operators, requesters face long waiting times. For auditors, the process is very complicated, and it is difficult to judge whether a manual audit meets the standard.

With the development of artificial intelligence technology, auditing can be assisted by trained models. However, training a model depends on a large number of training samples, and the samples need to be labeled manually based on historical audit records. Different business parties also have different audit standards, so a model trained on one set of labeled samples cannot suit different business parties. Model training therefore spends a large amount of time on sample preparation, and the efficiency of the model training process is low.

Disclosure of Invention

In view of the above, it is necessary to provide a method, an apparatus, a computer device, and a storage medium for constructing an information audit classification model, which can improve efficiency of a model training process.

A method for constructing an information auditing and classifying model comprises the following steps:

acquiring a sample image, and carrying out optical character recognition on the sample image to obtain a text recognition result;

performing semantic analysis on the text recognition result, and extracting a target text in the text recognition result according to the semantic analysis result;

generating an audit label corresponding to the target text based on an audit label generating rule corresponding to the business party, and labeling the audit label on the target text;

and training the initial classification model according to the target text carrying the audit tag to obtain an information audit classification model.

In one embodiment, generating an audit tag corresponding to a target text based on an audit tag generation rule corresponding to a business party, and performing audit tag labeling on the target text includes:

acquiring the evaluation configuration parameters of the business party on the risk level;

determining an audit tag generation rule according to the evaluation configuration parameters;

traversing the target text according to the audit tag generation rule, and carrying out classification statistics on information in the target text;

generating an audit label according to the classification statistical result and the audit label generation rule;

and generating a characteristic vector according to the target text, and labeling the characteristic vector with an audit tag.

In one embodiment, the image containing the target text is a credit report; determining the generation rule of the audit tag according to the evaluation configuration parameters comprises:

determining a negative credit transaction threshold parameter of each risk level corresponding to the user type according to the user type in the evaluation configuration parameters; wherein the user types comprise individual users and enterprise users;

and configuring a label generation rule corresponding to the risk level according to the negative credit transaction threshold parameter.

In one embodiment, configuring the tag generation rules corresponding to the risk levels according to the negative credit transaction threshold parameters comprises:

extracting weight parameters for different negative credit transaction types in the evaluation configuration parameters;

and configuring a label generation rule corresponding to the risk level according to the threshold data in the weight parameter and the negative credit transaction threshold parameter.

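In one embodiment, configuring the label generation rule corresponding to the risk level according to the negative credit transaction threshold parameter comprises:

extracting a quantity threshold corresponding to each negative credit transaction type from the negative credit transaction threshold parameter;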

and configuring a label generation rule corresponding to the risk level according to the quantity threshold corresponding to each negative credit transaction type.

An information auditing method, the method comprising:

acquiring an image to be audited of a user to be audited, and performing optical character recognition on the image to be audited to obtain an initial recognition text;

performing semantic analysis on the initial recognition text, and extracting a text to be analyzed in the initial recognition text according to a semantic analysis result;

and inputting the text to be analyzed into the information auditing and classifying model in any embodiment to obtain an information auditing result.

In one embodiment, the information auditing method further includes:

acquiring an actual auditing result corresponding to the information auditing result;

when the difference between the information auditing result and the actual auditing result is larger than a preset threshold value, taking the actual auditing result as a label of the image to be audited, and adding the labeled image to an update sample set;

and performing iterative training on the information audit classification model according to the update sample set based on a preset model iteration cycle to obtain an updated information audit classification model.
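
The collection of the update sample set can be sketched as follows. This is a minimal illustration under stated assumptions, not the method fixed by the application: the source does not say how the difference between the two results is measured, so the sketch assumes risk levels are ordered and measures the gap in levels. The names `RISK_ORDER`, `collect_update_samples`, and `max_gap` are hypothetical.

```python
# Assumption: risk levels are ordinal, so the "difference" between a predicted
# and an actual result is taken as the gap between their positions.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def collect_update_samples(records, max_gap=0):
    """Collect (image_id, actual_level) pairs whose prediction was too far off.

    records: iterable of (image_id, predicted_level, actual_level).
    max_gap: the preset threshold; a record is kept when the gap exceeds it.
    """
    update_set = []
    for image_id, predicted, actual in records:
        gap = abs(RISK_ORDER[predicted] - RISK_ORDER[actual])
        if gap > max_gap:
            # The actual auditing result becomes the label of the image.
            update_set.append((image_id, actual))
    return update_set


records = [
    ("img_a", "low", "low"),
    ("img_b", "low", "high"),
    ("img_c", "medium", "high"),
]
strict = collect_update_samples(records)             # any mismatch is collected
lenient = collect_update_samples(records, max_gap=1)  # only gaps of two levels
```

The collected pairs would then be fed to the periodic retraining step at whatever iteration cycle is configured.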

An information auditing and classifying model constructing device comprises:

the text recognition module is used for acquiring a sample image and carrying out optical character recognition on the sample image to obtain a text recognition result;

the semantic analysis module is used for performing semantic analysis on the text recognition result and extracting a target text in the text recognition result according to the semantic analysis result;

the tag generation module is used for generating an audit tag corresponding to the target text based on an audit tag generation rule corresponding to the business party and marking the audit tag on the target text;

and the model training module is used for training the initial classification model according to the target text carrying the audit tag to obtain an information audit classification model.

An information auditing apparatus, the apparatus comprising:

the to-be-audited image text recognition module is used for acquiring an image to be audited of a user to be audited, and performing optical character recognition on the image to be audited to obtain an initial recognition text;

the semantic analysis module of the image to be audited is used for carrying out semantic analysis on the initial recognition text and extracting the text to be analyzed in the initial recognition text according to the semantic analysis result;

and the information auditing and classifying module is used for inputting the text to be analyzed into the information auditing and classifying model in any embodiment to obtain an information auditing result.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the methods described above when executing the computer program.

A computer storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the steps of the methods described above.

According to the method and the apparatus for constructing an information audit classification model, an image containing a target text is obtained and recognized through optical character recognition, and the target text is extracted from the recognition result through semantic analysis, which achieves a preliminary screening of the information. The audit tag corresponding to the target text is generated automatically through a preset audit tag generation rule and applied to the target text, so no manual labeling is needed and model training samples are generated faster. Because the audit tag generation rule corresponds to the business party, the same image sample can yield audit tags suited to model training for different business parties, so sample images are reused. The initial classification model can therefore be trained quickly and conveniently, which improves the efficiency of the model training process and yields the information audit classification model quickly.

Drawings

FIG. 1 is a diagram of an application environment of a method for constructing an information audit classification model in one embodiment;

FIG. 2 is a schematic flow chart of a method for constructing an information audit classification model according to an embodiment;

FIG. 3 is a schematic flow chart of a method for constructing an information audit classification model in another embodiment;

FIG. 4 is a schematic flow chart of a method for constructing an information audit classification model in yet another embodiment;

FIG. 5 is a schematic flow chart of a method for constructing an information audit classification model in yet another embodiment;

FIG. 6 is a schematic flow chart of a method for constructing an information audit classification model in yet another embodiment;

FIG. 7 is a block diagram of an apparatus for constructing an information audit classification model according to an embodiment;

FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The method for constructing an information audit classification model provided by the application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The server 104 receives an image uploaded by the terminal 102 that contains a target text, performs optical character recognition on the image to obtain a text recognition result, performs semantic analysis on the text recognition result, and extracts the target text from the text recognition result according to the semantic analysis result. The server 104 then generates an audit tag corresponding to the target text based on an audit tag generation rule corresponding to a business party, labels the target text with the audit tag, and trains an initial classification model on the target text carrying the audit tag to obtain an information audit classification model. Based on that model, the server 104 can process an image to be audited that a user to be audited uploads through the terminal and quickly obtain an information auditing result. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.

In one embodiment, as shown in FIG. 2, a method for constructing an information audit classification model is provided. The method is described as applied to the server in FIG. 1 and includes the following steps 202 to 208.

Step 202, obtaining a sample image, and performing optical character recognition on the sample image to obtain a text recognition result.

The sample image is a sample selected in advance for training the model. The sample may be in an image format, for example an image obtained by a terminal scanning or photographing a paper document.

Optical character recognition refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using a character recognition method. For printed characters, the characters in a paper document are optically converted into an image file of a black-and-white dot matrix, and recognition software converts the characters in the image into a text format that word processing software can further edit and process. Optical character recognition is performed on the sample image to obtain the text recognition result.

Using samples in image format together with optical character recognition suits application scenarios in which the content of the obtained samples cannot be read directly, and effective data for model training can be obtained accurately and quickly while reducing manual operation.

The sample image contains the information used for audit classification. Take as an example a sample image that is a credit report or a credit information certification document issued by another fixed authority: the negative credit information in the document is the information used for audit classification. Specifically, a credit report has a fixed format, so the corresponding text recognition result can be obtained conveniently and quickly through optical character recognition. The text recognition result for a credit report includes all the characters in the report.

Step 204, performing semantic analysis on the text recognition result, and extracting a target text in the text recognition result according to the semantic analysis result.

Semantic analysis includes two modes: keyword recognition and context recognition. The keywords may be preconfigured. Through keyword recognition, the required text can be extracted from the text recognition result quickly and accurately, and unnecessary text is filtered out effectively. Taking a credit report as an example, semantic analysis includes identifying keywords in the text recognition result that reflect negative credit transactions, such as "overdue" and "overdraft".

The target text is text that fully expresses its meaning, determined by selecting sentences based on the context in which the keywords appear, such as a complete sentence or a passage. The choice can be configured according to the type of information in the sample image. For example, a credit report reflects the user's credit status as independent single entries, so each entry that contains a keyword may be confirmed as target text.

Context recognition can be implemented with a trained semantic recognition model: each piece of information in the text recognition result is input into the model, which determines through context analysis whether the input is text reflecting a negative credit transaction. Text that reflects a negative credit transaction is retained as target text, and text that does not is discarded directly, which effectively screens the information in the text recognition result and yields the target text.
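
The keyword recognition mode can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes each line (or sentence) of the text recognition result is one independent entry, and the keyword list, the function name `extract_target_text`, and the sample text are hypothetical. The context recognition mode, which needs a trained semantic model, is not shown.

```python
import re

# Hypothetical keyword list; the source names "overdue" and "overdraft" as examples.
NEGATIVE_KEYWORDS = ("overdue", "overdraft")


def extract_target_text(recognized_text, keywords=NEGATIVE_KEYWORDS):
    """Return the entries of the OCR result that contain at least one keyword."""
    # Assumption: entries are separated by line breaks, periods, or semicolons.
    entries = [entry.strip() for entry in re.split(r"[\n.;]+", recognized_text)]
    lowered = [keyword.lower() for keyword in keywords]
    return [
        entry
        for entry in entries
        if entry and any(keyword in entry.lower() for keyword in lowered)
    ]


# Made-up text recognition result for illustration only.
ocr_result = """Report type: personal credit report
Credit card account 1: overdue for 2 months in 2019
Loan account 2: repaid normally
Credit card account 3: overdraft exceeding 60 days"""

targets = extract_target_text(ocr_result)
```

Entries without a keyword, such as the report header and the normally repaid loan, are discarded, which corresponds to the filtering of unnecessary text described above.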

Step 206, generating an audit tag corresponding to the target text based on the audit tag generation rule corresponding to the business party, and labeling the target text with the audit tag.

The audit tag generation rule is a rule for generating a corresponding audit tag based on the content of the target text. Specifically, the rule may determine the tag from the number of information items in the target text, or from that number combined with the weight of the category to which each item belongs.

Different business parties have different audit tag generation rules. An audit tag generation rule may be preconfigured from the business party's evaluation configuration parameters for risk levels, and audit tags can then be generated directly from the rule. Because tags are generated directly from the rule, models can be trained for the requirements of different business parties without manually labeling a large number of samples for each party's model. Moreover, different business parties may apply different auditing standards to the information of the same user. The same sample can yield different audit tags under the rules of different business parties, so training sample labels suited to each business party are obtained and training samples for different business parties are constructed rapidly.

Step 208, training the initial classification model according to the target text carrying the audit tag to obtain an information audit classification model.

Because the audit tags are generated based on the audit tag generation rule corresponding to the business party, the audit tags carried by the target texts meet the business party's audit requirements. The initial classification model is trained by inputting the target texts carrying the audit tags into it. The initial classification model may be any one of a decision tree, an artificial neural network, a support vector machine, a random forest, a logistic regression model, and the like.

In the model training process, the target texts carrying audit tags can be divided into a training set and a test set. The initial classification model is trained on the target texts in the training set, and the trained classification model is tested on the target texts in the test set to check whether it meets the precision requirement of the model. If the requirement is not met, the initial classification model is trained iteratively with adjusted model parameters until a model that meets the precision requirement is obtained.
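
The split, test, and retrain loop can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the 80/20 split ratio, the helper names, and the idea of stepping through a list of candidate parameter settings are all hypothetical, and `toy_train` is a stand-in for a real classifier such as a decision tree, not a learning algorithm.

```python
import random


def split_samples(samples, test_ratio=0.2, seed=0):
    """Shuffle labelled samples and split them into a training and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, test_set):
    """Fraction of test samples whose predicted label equals the audit tag."""
    hits = sum(1 for features, label in test_set if model(features) == label)
    return hits / len(test_set)


def train_until_accurate(train_fn, param_grid, samples, required_accuracy):
    """Retrain with successive parameter settings until the requirement is met."""
    train_set, test_set = split_samples(samples)
    for params in param_grid:
        model = train_fn(train_set, params)
        score = accuracy(model, test_set)
        if score >= required_accuracy:
            return model, params, score
    return None, None, 0.0


def toy_train(train_set, params):
    """Hypothetical stand-in trainer: a fixed cut on the first feature."""
    cut = params["cut"]
    return lambda features: "high" if features[0] >= cut else "low"


# Made-up labelled samples: one feature (a count) and an audit tag.
labelled = [((n,), "high" if n >= 2 else "low") for n in range(10)]
model, params, score = train_until_accurate(
    toy_train, [{"cut": 5}, {"cut": 2}], labelled, required_accuracy=1.0
)
```

A real implementation would replace `toy_train` with the chosen classifier's fitting routine and would normally tune parameters on a separate validation set rather than on the test set.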

According to the method for constructing an information audit classification model, an image containing a target text is obtained and recognized through optical character recognition, and the target text is extracted from the recognition result through semantic analysis, which achieves a preliminary screening of the information. The audit tag corresponding to the target text is generated automatically through a preset audit tag generation rule and applied to the target text, so no manual labeling is needed and model training samples are generated faster. Because the audit tag generation rule corresponds to the business party, the same image sample can yield audit tags suited to model training for different business parties, so sample images are reused. The initial classification model can therefore be trained quickly and conveniently, which improves the efficiency of the model training process and yields the information audit classification model quickly.
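
The reuse of sample images across business parties can be sketched as follows. This is a minimal illustration of steps 202 to 206 with every component injected as a function. The fake OCR lookup, the keyword filter, and the two party rules are hypothetical stand-ins introduced only for the example, and step 208 (training) is omitted.

```python
def build_training_samples(images, ocr, extract, label_rule):
    """Turn sample images into (target_texts, audit_tag) pairs."""
    samples = []
    for image in images:
        recognized = ocr(image)                          # step 202: OCR
        targets = extract(recognized)                    # step 204: target text
        samples.append((targets, label_rule(targets)))   # step 206: audit tag
    return samples


# Hypothetical stand-ins: file names mapped to pre-recognized text instead of real OCR.
FAKE_OCR = {
    "report_1.png": "card overdue\nloan repaid\ncard overdraft",
    "report_2.png": "loan repaid",
}


def fake_ocr(image):
    return FAKE_OCR[image]


def keyword_extract(text):
    return [line for line in text.splitlines() if "over" in line]


def rule_party_a(targets):
    # Hypothetical rule: party A treats two negative entries as high risk.
    return "high" if len(targets) >= 2 else "low"


def rule_party_b(targets):
    # Hypothetical rule: party B tolerates up to two negative entries.
    return "high" if len(targets) >= 3 else "low"


images = list(FAKE_OCR)
samples_a = build_training_samples(images, fake_ocr, keyword_extract, rule_party_a)
samples_b = build_training_samples(images, fake_ocr, keyword_extract, rule_party_b)
```

The same two images produce different audit tags for the two parties, so each party gets its own training set without any manual labeling.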

In one embodiment, as shown in FIG. 3, generating an audit tag corresponding to the target text based on the audit tag generation rule corresponding to the business party and labeling the target text with the audit tag, that is, step 206, includes steps 302 to 310.

Step 302, obtaining the evaluation configuration parameters of the business party on the risk level.

Step 304, determining an audit tag generation rule according to the evaluation configuration parameters.

Step 306, traversing the target text according to the audit tag generation rule, and performing classification statistics on the information in the target text.

Step 308, generating an audit tag according to the classification statistical result and the audit tag generation rule.

Step 310, generating a feature vector according to the target text, and labeling the feature vector with the audit tag.

Training the initial classification model according to the target text carrying the audit tag to obtain an information audit classification model, that is, step 208, includes step 312.

Step 312, training the initial classification model according to the feature vector carrying the audit tag to obtain an information audit classification model.

To meet the auditing requirements of different business parties, a corresponding information audit classification model is constructed for each business party. Each model needs to be trained on samples that carry audit classification result labels. If each business party collected and labeled samples from its own actual historical data, a large amount of data collection and sample labeling would be required. To address this problem, the scheme automatically derives the audit tag generation rule corresponding to the business party from the business party's evaluation configuration parameters for risk levels and uses that rule to generate audit tags, so that audit tags meeting each business party's requirements are determined.

The number of target texts in each sample image may be one, or more than one (two or more). The audit tag generation rule includes a classification statistics mode for the target texts. The mode may define categories and count the target texts in each category, or may classify the target texts by clustering. By traversing all target texts, the server performs classification statistics on them according to the categories to which they belong; the statistics include each category and the number of target texts in it. An audit tag corresponding to the statistical result is then generated based on the audit tag generation rule. The target texts are collected into a text set: the data in the text set is the feature data of the model training sample, and the audit tag is its label. In an embodiment, a feature vector may be generated from the target texts and labeled with the generated audit tag. The feature vector integrates the features of all target texts in the same image sample into a whole, so a more accurate classification result can be obtained in model training.
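
The classification statistics and the feature vector of steps 306 to 310 can be sketched as follows. This is a minimal illustration that assumes the defined-category mode with keyword matching; the category names, the keyword table, and the use of per-category counts as the feature vector are assumptions for the example, not details given in the application.

```python
from collections import Counter

# Hypothetical categories and the keyword that assigns a target text to each.
CATEGORY_KEYWORDS = {
    "overdue": "overdue",
    "credit_card_overdraft": "overdraft",
    "telecom_arrears": "telecom",
}
# Fixed category order so every sample yields a vector of the same layout.
CATEGORIES = sorted(CATEGORY_KEYWORDS)


def classify(text):
    """Return the first category whose keyword occurs in the text, else None."""
    lowered = text.lower()
    for category, keyword in CATEGORY_KEYWORDS.items():
        if keyword in lowered:
            return category
    return None


def count_by_category(target_texts):
    """Step 306: traverse the target texts and count them per category."""
    categories = (classify(text) for text in target_texts)
    return Counter(category for category in categories if category is not None)


def to_feature_vector(counts):
    """Step 310: one count per category, in the fixed CATEGORIES order."""
    return [counts.get(category, 0) for category in CATEGORIES]


target_texts = [
    "Account 1: overdue for 2 months",
    "Account 3: overdraft exceeding 60 days",
    "Account 4: overdue for 1 month",
]
counts = count_by_category(target_texts)
vector = to_feature_vector(counts)
# The audit tag would come from the business party's rule (step 308);
# "medium" here is a placeholder value.
labelled_sample = (vector, "medium")
```

With the categories ordered as `credit_card_overdraft`, `overdue`, `telecom_arrears`, all target texts of one image collapse into a single vector that the classifier can consume.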

In one embodiment, the image containing the target text is a credit report. As shown in FIG. 4, determining the audit tag generation rule according to the evaluation configuration parameters, that is, step 304, includes steps 402 to 404.

Step 402, determining, according to the user type in the evaluation configuration parameters, a negative credit transaction threshold parameter of each risk level corresponding to the user type. The user types include individual users and enterprise users.

Step 404, configuring a label generation rule corresponding to the risk level according to the negative credit transaction threshold parameter.

Configuring negative credit transaction threshold parameters per user type effectively differentiates individual users from enterprise users. Taking the credit report as an example, the user type corresponding to a personal credit report is individual, and the user type corresponding to an enterprise credit report is enterprise. The negative credit transaction types include an overdue credit transaction amount, an overdue time, a credit card overdraft, a debt, a civil judgment, a compulsory enforcement, an administrative penalty, a telecommunication arrears, a guarantor representative, etc. The risk levels include low risk (e.g., no bad credit record), medium risk (e.g., one bad record in a year), and high risk (e.g., three bad records in a year).

In an embodiment, different tag generation rules may be set for individual users and enterprise users. Specifically, different negative credit transaction threshold parameters may be set, such as which negative credit transaction types count toward the negative credit transaction threshold, or different weights may be configured for different negative credit transaction types. This avoids degrading the training of the classification model by dividing objects of different user types by the same standard.

The negative credit transaction thresholds differ between risk levels. Specifically, the negative credit transaction threshold is measured by the types to which the negative credit transaction information belongs and the number of items of each type. For example, one type of negative credit transaction information may be set as an evaluation condition while another type is not. As another example, a number threshold may be defined for each type that serves as an evaluation condition, and the negative credit transaction threshold may be the number of types that exceed their number thresholds. The negative credit transaction threshold may also be the sum of the numbers of all types of negative credit transaction information. Dividing objects of different user types by different standards improves the effectiveness of training the classification model.
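
Steps 402 to 404 can be sketched as follows for the simplest variant, in which the threshold is the total number of negative credit records. This is a minimal illustration: the individual-user values follow the example risk levels given above (none, one, three records), while the enterprise values, the dictionary layout, and the function name `build_label_rule` are hypothetical.

```python
# Hypothetical evaluation configuration: for each user type, the minimum total
# number of negative credit records at which each risk level starts to apply.
EVALUATION_CONFIG = {
    "individual": {"low": 0, "medium": 1, "high": 3},
    "enterprise": {"low": 0, "medium": 2, "high": 5},
}


def build_label_rule(user_type, config=EVALUATION_CONFIG):
    """Step 404: build a rule mapping a negative record count to a risk level."""
    # Check the most severe level first so the highest matching level wins.
    levels = sorted(config[user_type].items(), key=lambda item: item[1], reverse=True)

    def rule(negative_count):
        for level, minimum in levels:
            if negative_count >= minimum:
                return level
        raise ValueError("negative_count must be non-negative")

    return rule


individual_rule = build_label_rule("individual")
enterprise_rule = build_label_rule("enterprise")
```

Under this configuration, three negative records give `high` for an individual user but only `medium` for an enterprise user, which is the kind of differentiation between user types that the embodiment describes.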

In one embodiment, as shown in FIG. 5, configuring the label generation rule corresponding to the risk level according to the negative credit transaction threshold parameter, that is, step 404, includes steps 502 to 504.

Step 502, extracting the weight parameters for different negative credit transaction types in the evaluation configuration parameters.

Step 504, configuring a label generation rule corresponding to the risk level according to the threshold data in the weight parameter and the negative credit transaction threshold parameter.

Different business parties can configure different weight parameters for different negative credit transaction types; for example, credit card overdraft and telecommunication arrears can have different weights. For example, business party A, a bank, may set the weight parameter of credit card overdraft to 0.3, the weight parameter of telecommunication arrears to 0.1, and the threshold of a risk level (here, the medium risk level) to 2. The label generation rule corresponding to that risk level is then that the sum, over all negative credit transaction types, of the number of the user's transactions of each type multiplied by its weight parameter must not exceed 2. The label generation rules corresponding to the other risk levels are similar and are not described in detail.

Configuring the label generation rule corresponding to each risk level by combining the weight parameters of the negative credit transaction types with the negative credit transaction thresholds allows the target text corresponding to each sample image to be classified into an accurate risk level, so that accurate audit tags are obtained.
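
The weighted rule of steps 502 to 504 can be sketched as follows, using the weights (0.3 and 0.1) and the threshold (2) from the example of business party A. The function names, the dictionary layout, and the treatment of types without a configured weight (they contribute nothing) are assumptions made for the illustration.

```python
# Values from the business party A example in the text.
WEIGHTS = {"credit_card_overdraft": 0.3, "telecom_arrears": 0.1}
RISK_THRESHOLD = 2.0


def weighted_score(counts, weights=WEIGHTS):
    """Sum of (number of transactions of a type) x (weight of that type)."""
    # Assumption: a type without a configured weight contributes nothing.
    return sum(weights.get(kind, 0.0) * number for kind, number in counts.items())


def within_risk_level(counts, threshold=RISK_THRESHOLD, weights=WEIGHTS):
    """True when the weighted sum does not exceed the threshold of the level."""
    return weighted_score(counts, weights) <= threshold


# Made-up per-type counts for two users.
user_1 = {"credit_card_overdraft": 4, "telecom_arrears": 5}  # 4*0.3 + 5*0.1 = 1.7
user_2 = {"credit_card_overdraft": 7, "telecom_arrears": 2}  # 7*0.3 + 2*0.1 = 2.3

user_1_ok = within_risk_level(user_1)
user_2_ok = within_risk_level(user_2)
```

The first user stays within the level and the second exceeds it, even though both have nine negative records in total, because the overdraft records carry three times the weight of the telecommunication arrears.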

In one embodiment, as shown in FIG. 6, configuring the label generation rule corresponding to the risk level according to the negative credit transaction threshold parameter, that is, step 404, includes steps 602 to 604.

Step 602, a quantity threshold corresponding to each negative credit transaction type is extracted from the negative credit transaction threshold parameter.

Step 604, configuring a label generation rule corresponding to the risk level according to the quantity threshold corresponding to each negative credit transaction type.

The negative credit transaction threshold parameter may limit the number of negative credit transactions of each negative credit transaction type. For example, the label generation rule corresponding to the low risk level may require that the numbers of negative credit transactions of type 1 and type 2 be 0 and that the numbers of negative credit transactions of type 3 and type 4 each be ≤ 1. Generating the label generation rule corresponding to each risk level by limiting the quantity threshold of each negative credit transaction type both meets the business party's classification requirements and classifies the target text corresponding to each sample image into an accurate risk level, yielding an accurate audit tag.
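
The per-type quantity thresholds of steps 602 to 604 can be sketched as follows. The low risk limits are taken from the example above; the medium risk limits, the fallback to `high`, and the names `LEVEL_LIMITS`, `satisfies_limits`, and `audit_label` are hypothetical choices for the illustration.

```python
# Per-level quantity limits for each negative credit transaction type, checked
# from the least to the most severe level. The "low" limits follow the example
# in the text; the "medium" limits are made up for illustration.
LEVEL_LIMITS = [
    ("low", {"type_1": 0, "type_2": 0, "type_3": 1, "type_4": 1}),
    ("medium", {"type_1": 1, "type_2": 1, "type_3": 3, "type_4": 3}),
]


def satisfies_limits(counts, limits):
    """True when no type exceeds its quantity threshold."""
    return all(counts.get(kind, 0) <= limit for kind, limit in limits.items())


def audit_label(counts, level_limits=LEVEL_LIMITS):
    """Return the first level whose limits are all met, else 'high'."""
    for level, limits in level_limits:
        if satisfies_limits(counts, limits):
            return level
    return "high"


label_1 = audit_label({"type_3": 1})   # within every low risk limit
label_2 = audit_label({"type_1": 1})   # breaks a low limit, meets medium
label_3 = audit_label({"type_4": 5})   # breaks the medium limit as well
```

Unlike the weighted variant, a single type that exceeds its own limit is enough to push a sample into a higher risk level here, regardless of the other types.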

In one embodiment, an information auditing method is provided, comprising the following steps: acquiring an image to be audited of a user to be audited, and performing optical character recognition on the image to be audited to obtain an initial recognition text; performing semantic analysis on the initial recognition text, and extracting a text to be analyzed from the initial recognition text according to the semantic analysis result; and inputting the text to be analyzed into the information audit classification model of any of the above embodiments to obtain an information auditing result.

The user to be audited is a user whose information needs to be audited in order to determine an auditing result. The image to be audited is image data that is provided by the user to be audited and contains the information to be audited, for example, image data of the credit investigation report of the user to be audited. It may be an image uploaded to the server by the user through a designated interface, or an image obtained by scanning or photographing a paper document with a scanning or image-capturing device provided in the terminal. Optical character recognition, semantic analysis, and extraction of the text to be analyzed are performed on the image to be audited in the same way as on the sample image, and are not repeated here.

After extracting the text to be analyzed from the initial recognition text, the server inputs the text to be analyzed into the information audit classification model obtained by the above construction method, and obtains the corresponding information auditing result through the classification analysis of the model. The information auditing result includes the risk level corresponding to the user. Auditing and classifying the information with the information audit classification model allows rapid classification, so that an information auditing result can be obtained quickly and accurately. It should be emphasized that, in order to further ensure the privacy and security of the information auditing result, the information auditing result may also be stored in a node of a block chain.
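The three stages of the auditing flow can be sketched as a small pipeline. The source does not name a specific OCR engine, semantic analyzer, or classifier, so all three are passed in as caller-supplied functions here; `audit_image`, `keep_negative_lines`, and the keyword list are hypothetical names chosen for illustration.

```python
from typing import Callable

# Hypothetical stage signatures; real components would replace these.
OcrFn = Callable[[bytes], str]
ExtractFn = Callable[[str], str]
ClassifyFn = Callable[[str], str]


def audit_image(image: bytes, ocr: OcrFn, extract: ExtractFn,
                classify: ClassifyFn) -> str:
    """Run OCR, extract the text to be analyzed, then classify it into a risk level."""
    initial_text = ocr(image)                # initial recognition text
    text_to_analyze = extract(initial_text)  # text kept after semantic analysis
    return classify(text_to_analyze)         # information auditing result


def keep_negative_lines(text: str) -> str:
    """Toy stand-in for semantic analysis: keep lines that mention a negative
    credit transaction keyword (the keywords are assumptions)."""
    keywords = ("overdraft", "arrears")
    return "\n".join(
        line for line in text.splitlines()
        if any(k in line.lower() for k in keywords)
    )
```

Keeping the stages as separate functions reflects the statement that the image to be audited goes through the same recognition and extraction steps as the sample image, so the same stage functions could be shared between training and auditing.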

In one embodiment, the information auditing method further includes: acquiring an actual auditing result corresponding to the information auditing result; when the difference between the audit classification result and the actual auditing result is larger than a preset threshold, taking the actual auditing result as the label of the image to be audited and adding the labeled image to the update sample set; and iteratively training the information audit classification model on the update sample set according to a preset model iteration cycle to obtain an updated information audit classification model.

Based on the actual auditing result corresponding to the information auditing result, when the actual result is inconsistent with the classification result of the model, the model is iteratively updated by adding new samples, so that the classification accuracy of the model can be further improved during use and the model better fits the actual application scenario of the corresponding business party.
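The update logic can be sketched as follows, assuming that risk levels are ordinal so that a difference between the predicted and the actual level can be measured. The ordinal encoding, the function names, and the representation of a sample as an (image identifier, label) pair are all assumptions; the source does not specify how the difference is computed.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical ordinal encoding of risk levels.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def maybe_add_update_sample(update_set: List[Tuple[str, str]], image_id: str,
                            predicted: str, actual: str, max_gap: int = 0) -> bool:
    """Add (image_id, actual label) to the update set when the model's result
    differs from the actual result by more than `max_gap` levels."""
    gap = abs(RISK_ORDER[predicted] - RISK_ORDER[actual])
    if gap > max_gap:
        update_set.append((image_id, actual))
        return True
    return False


def retraining_due(last_trained: datetime, now: datetime, cycle: timedelta) -> bool:
    """True when at least one model iteration cycle has elapsed since the last training."""
    return now - last_trained >= cycle
```

With `max_gap=0`, any disagreement adds a sample; a larger value keeps only cases where the model was off by several levels.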

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated otherwise herein, the execution of the steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in each flowchart may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 7, there is provided an apparatus for constructing an information audit classification model, including: text recognition module 702, semantic analysis module 704, label generation module 706, and model training module 708, wherein:

The text recognition module 702 is configured to obtain a sample image and perform optical character recognition on the sample image to obtain a text recognition result.

The semantic analysis module 704 is configured to perform semantic analysis on the text recognition result and extract a target text from the text recognition result according to the semantic analysis result.

The tag generation module 706 is configured to generate an audit tag corresponding to the target text based on an audit tag generation rule corresponding to the business party, and to label the target text with the audit tag.

The model training module 708 is configured to train the initial classification model on the target text carrying the audit tag to obtain an information audit classification model.

In one embodiment, the tag generation module is further configured to acquire the evaluation configuration parameters of the business party for the risk levels; determine an audit tag generation rule according to the evaluation configuration parameters; traverse the target text according to the audit tag generation rule and perform classification statistics on the information in the target text; generate an audit tag according to the classification statistical result and the audit tag generation rule; and generate a feature vector from the target text and label the feature vector with the audit tag.
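The labeling steps listed above can be sketched as below. For simplicity the sketch uses keyword counts both as the classification statistics and as the feature vector, which is an assumption; the keywords, the two business party rules, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical keywords, one per negative credit transaction type.
TYPE_KEYWORDS = ("credit card overdraft", "telecom arrears")


def count_types(target_text: str,
                keywords: Sequence[str] = TYPE_KEYWORDS) -> List[int]:
    """Classification statistics: occurrences of each type keyword in the target text."""
    lowered = target_text.lower()
    return [lowered.count(keyword) for keyword in keywords]


def label_sample(target_text: str,
                 rule: Callable[[List[int]], str]) -> Tuple[List[int], str]:
    """Return (feature vector, audit label) for one target text under one rule."""
    counts = count_types(target_text)
    return counts, rule(counts)


def party_a_rule(counts: List[int]) -> str:
    """Invented strict rule: any negative transaction means high risk."""
    return "low" if sum(counts) == 0 else "high"


def party_b_rule(counts: List[int]) -> str:
    """Invented lenient rule: one negative transaction is still low risk."""
    return "low" if sum(counts) <= 1 else "high"
```

Calling `label_sample` on the same target text with `party_a_rule` and `party_b_rule` produces two differently labeled training samples, which illustrates how one sample image can be reused for different business parties.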

In one embodiment, the image containing the target text is a credit report; the label generation module is also used for determining a negative credit transaction threshold parameter of each risk grade corresponding to the user type according to the user type in the evaluation configuration parameters; wherein the user types comprise individual users and enterprise users; and configuring a label generation rule corresponding to the risk level according to the negative credit transaction threshold parameter.

In one embodiment, the tag generation module is further configured to extract a weight parameter for different negative credit transaction types in the assessment configuration parameters; and configuring a label generation rule corresponding to the risk level according to the threshold data in the weight parameter and the negative credit transaction threshold parameter.

In one embodiment, the tag generation module is further configured to extract a quantity threshold corresponding to each negative credit transaction type from the negative credit transaction threshold parameter; and configuring a label generation rule corresponding to the risk level according to the quantity threshold corresponding to each negative credit transaction type.

The above apparatus for constructing an information audit classification model obtains an image containing the target text, recognizes the image through optical character recognition, and extracts the target text from the recognition result through semantic analysis, thereby achieving a preliminary screening of the information. The label corresponding to the target text is generated automatically through the preset audit label generation rule and the target text is labeled accordingly, so no manual labeling is needed and training samples are generated faster. Because the audit label generation rule is specific to a business party, the same image sample can yield audit labels suitable for the model training of different business parties, so sample images are reused. The initial classification model can therefore be trained quickly and conveniently, which improves the efficiency of the training process and yields the information audit classification model quickly.

In one embodiment, an information auditing apparatus is provided, the apparatus including:

A text recognition module for the image to be audited, configured to acquire the image to be audited of the user to be audited and perform optical character recognition on the image to be audited to obtain an initial recognition text.

A semantic analysis module for the image to be audited, configured to perform semantic analysis on the initial recognition text and extract the text to be analyzed from the initial recognition text according to the semantic analysis result.

An information audit classification module, configured to input the text to be analyzed into the information audit classification model constructed by the apparatus for constructing an information audit classification model of any of the above embodiments, to obtain an information auditing result.

In one embodiment, the information auditing device is further configured to obtain an actual auditing result corresponding to the information auditing result; when the difference between the audit classification result and the actual audit result is larger than a preset threshold value, taking the actual audit result as a label of the image to be audited, and adding the label to the update sample set; and performing iterative training on the information audit classification model according to the update sample set based on a preset model iteration cycle to obtain an updated information audit classification model.

Based on the information audit classification model constructed in advance, the information auditing device can work directly from the image to be audited: by performing optical character recognition, semantic analysis, and extraction of the text to be analyzed on the image, and inputting the text to be analyzed into the information audit classification model, the information auditing result can be obtained quickly and accurately.

For the specific limitations of the apparatus for constructing an information audit classification model and of the information auditing device, reference may be made to the above limitations of the method for constructing an information audit classification model and of the information auditing method, which are not described here again. Each module in the apparatus for constructing an information audit classification model and in the information auditing device can be implemented wholly or partially in software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing information audit classification result data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for constructing an information audit classification model.

Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

acquiring a sample image, and carrying out optical character recognition on the sample image to obtain a text recognition result; performing semantic analysis on the text recognition result, and extracting a target text in the text recognition result according to the semantic analysis result; generating an audit label corresponding to the target text based on an audit label generating rule corresponding to the business party, and labeling the audit label on the target text; and training the initial classification model according to the target text carrying the audit tag to obtain an information audit classification model.

In other embodiments, the processor, when executing the computer program, further performs the steps of one or more of the following groups:

acquiring the evaluation configuration parameters of the business party on the risk level; determining an audit tag generation rule according to the evaluation configuration parameters; traversing the target text according to the audit tag generation rule, and carrying out classification statistics on information in the target text; generating an audit label according to the classification statistical result and the audit label generation rule; and generating a characteristic vector according to the target text, and labeling the characteristic vector with an audit tag.

determining a negative credit transaction threshold parameter of each risk level corresponding to the user type according to the user type in the evaluation configuration parameters; wherein the user types comprise individual users and enterprise users; and configuring a label generation rule corresponding to the risk level according to the negative credit transaction threshold parameter.

extracting weight parameters for different negative credit transaction types in the evaluation configuration parameters; and configuring a label generation rule corresponding to the risk level according to the threshold data in the weight parameter and the negative credit transaction threshold parameter.

extracting a quantity threshold corresponding to each negative credit transaction type from the negative credit transaction threshold parameter; and configuring a label generation rule corresponding to the risk level according to the quantity threshold corresponding to each negative credit transaction type.

acquiring an image to be audited of a user to be audited, and performing optical character recognition on the image to be audited to obtain an initial recognition text; performing semantic analysis on the initial recognition text, and extracting a text to be analyzed in the initial recognition text according to a semantic analysis result; and inputting the text to be analyzed into the information auditing and classifying model in any embodiment to obtain an information auditing result.

acquiring an actual auditing result corresponding to the information auditing result; when the difference between the audit classification result and the actual audit result is larger than a preset threshold value, taking the actual audit result as a label of the image to be audited, and adding the label to the update sample set; and performing iterative training on the information audit classification model according to the update sample set based on a preset model iteration cycle to obtain an updated information audit classification model.

In one embodiment, a computer storage medium is provided, having a computer program stored thereon. When executed by a processor, the computer program implements the same steps as those described above for the processor of the computer device, which are not repeated here.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).

Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.

The block chain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A block chain (Blockchain) is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The block chain may include a block chain underlying platform, a platform product service layer, an application service layer, and the like.
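The idea of data blocks linked by a cryptographic method can be illustrated with a single-process hash chain. This is only a sketch of the linking and validity check: it has no distribution, point-to-point transmission, or consensus mechanism, and the block fields, function names, and the choice of SHA-256 are assumptions rather than anything specified by this application.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV_HASH = "0" * 64  # placeholder previous hash for the first block


def block_hash(index: int, prev_hash: str, payload: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Append a block whose hash covers the payload and the previous block's hash."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        expected = block_hash(index, prev_hash, block["payload"])
        if (block["index"] != index or block["prev_hash"] != prev_hash
                or block["hash"] != expected):
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous block's hash, changing a stored auditing result in an earlier block makes verification fail for the whole chain from that point on.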

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this specification.

The above examples express only several embodiments of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for constructing an information auditing and classifying model is characterized by comprising the following steps:

generating an audit tag corresponding to the target text based on an audit tag generation rule corresponding to a business party, and labeling the audit tag of the target text;

2. The method of claim 1, wherein generating an audit tag corresponding to the target text based on an audit tag generation rule corresponding to a business party, and wherein performing audit tag labeling on the target text comprises:

and generating a characteristic vector according to the target text, and performing audit label marking on the characteristic vector.

3. The method of claim 2, wherein the image containing the target text is a credit report; the determining the generation rule of the audit tag according to the evaluation configuration parameters comprises:

determining a negative credit transaction threshold parameter of each risk level corresponding to the user type according to the user type in the evaluation configuration parameters; wherein the user types include individual users and business users;

4. The method of claim 3, wherein configuring tag generation rules corresponding to risk levels according to the negative credit transaction threshold parameter comprises:

5. The method of claim 3, wherein configuring tag generation rules corresponding to risk levels according to the negative credit transaction threshold parameter comprises:

6. An information auditing method, characterized in that the method comprises:

acquiring an image to be checked of a user to be checked, and performing optical character recognition on the image to be checked to obtain an initial recognition text;

inputting the text to be analyzed into an information auditing and classifying model to obtain an information auditing result, wherein the information auditing and classifying model is constructed based on the construction method of the information auditing and classifying model of any one of claims 1 to 5.

7. The method of claim 6, further comprising:

when the difference between the audit classification result and the actual audit result is larger than a preset threshold value, taking the actual audit result as a label of the image to be audited, and adding the label to an update sample set;

and performing iterative training on the information auditing classification model according to the updating sample set based on a preset model iteration cycle to obtain an updated information auditing classification model.

8. An apparatus for constructing an information audit classification model, the apparatus comprising:

the tag generation module is used for generating an audit tag corresponding to the target text based on an audit tag generation rule corresponding to a business party and marking the audit tag on the target text;

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.