CN117151647A - Auxiliary auditing method, device, and medium for documents - Google Patents


Info

Publication number
CN117151647A
CN117151647A (Application No. CN202311183836.5A)
Authority
CN
China
Prior art keywords
reject
text
information
bill
logistic regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311183836.5A
Other languages
Chinese (zh)
Inventor
王印智
马士中
王金丽
任聪
唐昌明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur General Software Co Ltd
Original Assignee
Inspur General Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur General Software Co Ltd
Priority to CN202311183836.5A
Publication of CN117151647A
Legal status: Pending

Classifications

    • G06Q10/103 Workflow collaboration or project management
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/27 Regression, e.g. linear or logistic regression


Abstract

The application provides an auxiliary auditing method, device, and medium for documents, belonging to the technical field of data processing. The method comprises: obtaining document information to be audited from a user terminal; determining, based on a pre-trained logistic regression model, at least one rejection-prediction sub-information sequence corresponding to the document information, the sequence being built from document sub-information whose first rejection-prediction probability value exceeds a preset first probability threshold; determining a second rejection-prediction probability value corresponding to the document information based on the logistic regression model and the rejection-prediction sub-information sequence; when the second rejection-prediction probability value exceeds a second probability threshold, generating audit prompt information and sending it to the corresponding audit terminal; and generating a rejection analysis text from the set of rejection-reason texts corresponding to the rejection-prediction sub-information sequence and storing the rejection analysis text on a cloud server.

Description

Auxiliary auditing method, device, and medium for documents
Technical Field
The present application relates to the field of data processing technologies, and in particular, to an auxiliary auditing method, device, and medium for documents.
Background
As enterprise business grows, the efficiency and accuracy of document auditing have become important assessment indicators; in particular, the workload of financial staff auditing documents at a group's financial shared-service center keeps increasing. An excessive document-auditing workload makes financial staff prone to errors during auditing, and a submitter whose document is rejected may not receive a timely explanation.
Currently, for a rejected document, the submitter typically learns the rejection reason and asks for a solution by consulting the auditor offline or online. This consumes auditing time, degrades the user experience of the document auditing system, and fails to meet users' expectations for an intelligent auditing system.
Accordingly, a technical solution is needed that reduces the manual auditing workload of document auditors, improves auditing efficiency, and raises the intelligence level of the document auditing system.
Disclosure of Invention
The embodiments of the present application provide an auxiliary auditing method, device, and medium for documents, to address the large manual auditing workload, low auditing efficiency, and low intelligence level of current document auditing systems.
In one aspect, an embodiment of the present application provides an auxiliary auditing method for documents, the method comprising:
obtaining document information to be audited from a user terminal, the document information comprising at least a document header, document transaction content, and the document auditor;
determining at least one rejection-prediction sub-information sequence corresponding to the document information based on a pre-trained logistic regression model, where the sequence is built from document sub-information whose first rejection-prediction probability value exceeds a preset first probability threshold, and the document sub-information is obtained by matching a preset historical keyword set against the document information;
determining a second rejection-prediction probability value corresponding to the document information based on the logistic regression model and the rejection-prediction sub-information sequence;
when the second rejection-prediction probability value exceeds a second probability threshold, generating audit prompt information and sending it to the corresponding audit terminal; and
generating a rejection analysis text from the set of rejection-reason texts corresponding to the rejection-prediction sub-information sequence, and storing the rejection analysis text on a cloud server.
In one implementation of the present application, before determining the at least one rejection-prediction sub-information sequence based on the pre-trained logistic regression model, the method further comprises:
obtaining a plurality of historical rejected-document records and the corresponding rejection-reason text data set;
determining, via a preset TF-IDF model, the term frequency and inverse document frequency of each word in the rejection-reason text data set;
when the product of a word's term frequency and inverse document frequency exceeds a preset value, taking the word as a keyword of the rejection-reason text data set; and
generating, from the obtained keywords and a bag-of-words model, keyword feature vectors corresponding to each rejection-reason text in the data set, so as to train the logistic regression model on the keyword feature vectors and the historical rejected-document records.
In one implementation of the present application, training the logistic regression model on the keyword feature vectors and the historical rejected-document records specifically comprises:
taking any one of the keyword feature vectors as the vector to be associated;
computing, in turn, the cosine similarity between the vector to be associated and each of the other keyword feature vectors, and generating the associated feature-vector set for the vector to be associated according to how each similarity compares with a first preset threshold, where each associated feature-vector set corresponds to a predetermined historical rejection reason;
determining, from the resulting associated feature-vector sets, the occurrence frequency of each keyword feature vector across the sets, and from those frequencies the occurrence probability value corresponding to each keyword feature vector; and
adding the document sub-information of each historical rejected document, together with its keyword feature vector and occurrence probability value, to a data dictionary as model training samples for training the logistic regression model.
In one implementation of the present application, training the logistic regression model specifically comprises:
inputting, in turn, the associated feature-vector set and corresponding occurrence probability value for each piece of document sub-information in the data dictionary into the logistic regression model to be trained; and
during training, updating the model parameter values via a gradient descent algorithm until parameter values are found for which the logistic-regression cost function falls below a second preset threshold, yielding the trained logistic regression model.
In one implementation of the present application, the method further comprises:
obtaining each piece of document information to be audited as it is updated in real time, together with the corresponding rejection-reason text; and
adding the document information and the corresponding rejection-reason text to a preset database to update the model training samples, and retraining the logistic regression model.
In one implementation of the present application, generating the rejection analysis text from the set of rejection-reason texts corresponding to the rejection-prediction sub-information sequence specifically comprises:
inputting the set of rejection-reason texts into a preset analysis-text generation model to combine the words of the rejection-reason texts, the model having been trained on a plurality of rejection-reason word samples and corresponding rejection-reason sentences; and
determining the rejection analysis text from the output of the analysis-text generation model.
In one implementation of the present application, the method further comprises:
in response to a rejection operation at the audit terminal, sending prompt information corresponding to the rejection analysis text to the audit terminal, the prompt information comprising a control for viewing the rejection analysis text;
when the audit terminal does not add the rejection analysis text to the rejection-reason text box of the rejection operation, sending the prompt information to the user terminal so that the user terminal can view the rejection analysis text; and
when the audit terminal adds a rejection-reason description text in the rejection-reason text box, comparing the description text with the rejection analysis text and, according to the comparison result, sending the description text and/or the prompt information to the user terminal.
In one implementation of the present application, sending the rejection-reason description text and/or the prompt information to the user terminal according to the text comparison result specifically comprises:
computing the text feature vectors corresponding to the rejection-reason description text and the rejection analysis text, and their text similarity, the text similarity being cosine similarity;
when the text similarity exceeds a similarity threshold, sending the prompt information to the user terminal; and
when the text similarity is less than or equal to the similarity threshold, sending the rejection-reason description text to the user terminal, adding the description text and the document information to a preset database to update the model training samples, and retraining the logistic regression model.
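The comparison-and-routing step above can be sketched as follows; this is a minimal illustration, and the vector values, threshold, and function names are hypothetical rather than taken from the patent:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two text feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route_rejection_texts(desc_vec, analysis_vec, similarity_threshold):
    """If the auditor's description agrees with the generated analysis,
    send the prompt information; otherwise send the description text and
    flag the pair as a new training sample for retraining."""
    sim = cosine_similarity(desc_vec, analysis_vec)
    if sim > similarity_threshold:
        return "send_prompt", False      # texts agree, no retraining
    return "send_description", True      # texts differ, update samples and retrain

# Parallel vectors point the same way, so similarity is 1.0.
action, retrain = route_rejection_texts([1, 0, 2], [2, 0, 4], 0.9)
```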
In another aspect, an embodiment of the present application further provides an auxiliary auditing device for documents, the device comprising:
at least one processor; and a memory communicatively coupled to the at least one processor, the memory storing instructions executable by the at least one processor to enable the at least one processor to:
obtain document information to be audited from a user terminal, the document information comprising at least a document header, document transaction content, and the document auditor;
determine at least one rejection-prediction sub-information sequence corresponding to the document information based on a pre-trained logistic regression model, where the sequence is built from document sub-information whose first rejection-prediction probability value exceeds a preset first probability threshold, and the document sub-information is obtained by matching a preset historical keyword set against the document information;
determine a second rejection-prediction probability value corresponding to the document information based on the logistic regression model and the rejection-prediction sub-information sequence;
when the second rejection-prediction probability value exceeds a second probability threshold, generate audit prompt information and send it to the corresponding audit terminal; and
generate a rejection analysis text from the set of rejection-reason texts corresponding to the rejection-prediction sub-information sequence, and store the rejection analysis text on a cloud server.
In yet another aspect, an embodiment of the present application further provides a non-volatile computer storage medium for auxiliary auditing of documents, storing computer-executable instructions configured to:
obtain document information to be audited from a user terminal, the document information comprising at least a document header, document transaction content, and the document auditor;
determine at least one rejection-prediction sub-information sequence corresponding to the document information based on a pre-trained logistic regression model, where the sequence is built from document sub-information whose first rejection-prediction probability value exceeds a preset first probability threshold, and the document sub-information is obtained by matching a preset historical keyword set against the document information;
determine a second rejection-prediction probability value corresponding to the document information based on the logistic regression model and the rejection-prediction sub-information sequence;
when the second rejection-prediction probability value exceeds a second probability threshold, generate audit prompt information and send it to the corresponding audit terminal; and
generate a rejection analysis text from the set of rejection-reason texts corresponding to the rejection-prediction sub-information sequence, and store the rejection analysis text on a cloud server.
Through the above technical solution, the present application predicts a document's rejection probability with a logistic regression model so as to classify submitted documents, letting auditors see at a glance which documents carry rejection risk and focus their review on those documents. Meanwhile, the logistic regression model automatically captures and learns the key feature variables, improving prediction accuracy and reducing the risk of missed judgments. This addresses the large manual auditing workload, low auditing efficiency, and low intelligence level of current document auditing systems.
In addition, the application provides the rejection analysis text for auditors or document submitters to view, without extra manual effort to generate or obtain it, thereby improving the user experience of the document auditing system.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic flow chart of an auxiliary audit method for documents according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an auxiliary audit device for documents according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
To make the document auditing workflow more intelligent, algorithms such as support vector machines, random forests, and decision trees have been applied to intelligent document auditing. However, these algorithms are computationally inefficient and applicable only to a limited range of documents, and in some cases part of the auditing work still requires manual participation; for example, when an auditor gives no rejection reason, the document submitter cannot learn the reason in time. Current document auditing systems therefore fall short of users' needs for an intelligent auditing process.
To this end, embodiments of the present application provide an auxiliary auditing method, device, and medium for documents, addressing the large manual auditing workload, low auditing efficiency, and low intelligence level of current document auditing systems.
Various embodiments of the present application are described in detail below with reference to the attached drawing figures.
An embodiment of the present application provides an auxiliary auditing method for documents. As shown in FIG. 1, the method may comprise steps S101-S105:
S101: The server obtains the document information to be audited from a user terminal.
The document information to be audited comprises at least a document header, document transaction content, and the document auditor.
It should be noted that the server is merely an example of the execution subject of the auxiliary auditing method; the execution subject is not limited to the server, and the present application imposes no particular limitation in this regard.
The server may be the server running the document auditing system or a server connected to it, and may connect both to the user terminal from which documents are submitted and to the audit terminal on which documents are reviewed. A document submitter can submit the document information to be audited through the document auditing system from a mobile phone, computer, or other device, such as a financial reimbursement clerk's laptop.
The document header includes the document type; the document transaction content refers to the specific content of the document to be audited, for example, in a financial reimbursement document, the reimbursement amount and the types and quantities of reimbursement items. The document auditor is the person responsible for reviewing the document after the submitter files it.
S102: The server determines at least one rejection-prediction sub-information sequence corresponding to the document information based on a pre-trained logistic regression model.
The rejection-prediction sub-information sequence is built from document sub-information whose first rejection-prediction probability value exceeds the preset first probability threshold; the document sub-information is obtained by matching a preset historical keyword set against the document information to be audited. The first probability threshold may be specified by the user, and the present application imposes no particular limitation on it. Since the document information to be audited may contain several different document types, one piece of document information may yield multiple rejection-prediction sub-information sequences.
The first rejection-prediction probability value can be understood as the probability that the document is rejected when the given sub-information appears in it. The first probability threshold screens the sub-information: when a first rejection-prediction probability value exceeds the threshold, that sub-information is added to the rejection-prediction sub-information sequence. For example, if the sub-information for a certain expense item has a first rejection-prediction probability value of 80% and the preset threshold is 60%, that sub-information is added to the sequence; if the sub-information for another document's travel-expense selection has a probability value of 30% against the same 60% threshold, it is not added. These values are merely illustrative; the threshold can be set in actual use, and the present application imposes no particular limitation on it.
The present application processes the document information to be audited with a logistic regression model, where the rejection-prediction sub-information sequence can be understood as the pre-partitioned, rule-matching document sub-information within the document information. Document sub-information covers items such as the amount, project, date, and sensitive words in the abstract; a concrete rejection-prediction sub-information sequence may include feature items such as "the expense item should be travel expense", "the invoice image is duplicated", and "the itinerary is not closed-loop", each with its own rejection probability (first rejection-prediction probability value), as shown in the table below. The first probability threshold is set by the user, and the present application imposes no particular limitation on it.
Document sub-information                      Rejection probability
The expense item should be travel expense     80%
The invoice image is duplicated               70%
The itinerary is not closed-loop              30%
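The thresholding step above can be sketched in a few lines; this is an illustrative sketch, and the sub-information strings, probabilities, and function name are hypothetical, not from the patent's data:

```python
def build_rejection_sequence(sub_info_probs, first_threshold):
    """Keep only the document sub-information whose first
    rejection-prediction probability value exceeds the preset
    first probability threshold."""
    return [info for info, p in sub_info_probs if p > first_threshold]

# Illustrative sub-information with first rejection-prediction probabilities.
sub_info_probs = [
    ("expense item should be travel expense", 0.80),
    ("invoice image is duplicated", 0.70),
    ("itinerary is not closed-loop", 0.30),
]

# With a 60% threshold, only the first two items enter the sequence.
sequence = build_rejection_sequence(sub_info_probs, first_threshold=0.60)
```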
In an embodiment of the present application, before determining the at least one rejection-prediction sub-information sequence based on the pre-trained logistic regression model, the method further includes the following.
The server obtains a plurality of historical rejected-document records and the corresponding rejection-reason text data set. It then determines, via a preset TF-IDF model, the term frequency and inverse document frequency of each word in the rejection-reason text data set. When the product of a word's term frequency and inverse document frequency exceeds a preset value, the word is taken as a keyword of the rejection-reason text data set. From the obtained keywords and a bag-of-words model, the server generates keyword feature vectors for each rejection-reason text in the data set, so as to train the logistic regression model on the keyword feature vectors and the historical rejected-document records.
In other words, the server can obtain the logistic regression model's training samples from a database preset by the user. The samples comprise a number of historical rejected-document records and the corresponding rejection-reason text data set, which consists of the historical rejection-reason texts of those documents; the texts may be generated in advance by the user or filled in by document auditors during actual use, and the present application imposes no particular limitation on this. Using the TF-IDF model, the server first counts the term frequency of each word within a given rejection-reason text and computes its inverse document frequency across all rejection-reason texts. The TF-IDF value of a word is the product of its term frequency and inverse document frequency; the server compares this product with a preset value and selects the words above it as keywords of the rejection-reason text data set. The preset value is set by the user in actual use, and the present application imposes no particular limitation on it.
After obtaining the keywords of the rejection-reason text data set, the server may use a Bag-of-Words (BoW) model to obtain a keyword feature vector for the keywords contained in each rejection-reason text. For example, if the rejection-reason text is "the amount exceeds the reimbursement quota" and the keywords are "exceeds" and "quota", the keyword feature vector is {1, 2}.
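A minimal sketch of this keyword selection and vectorization, assuming term frequency is taken within each text and inverse document frequency across the whole set; the toy corpus, the preset value, and the function names are illustrative, not from the patent:

```python
import math
from collections import Counter

def tf_idf_keywords(docs, preset_value):
    """Select words whose TF-IDF (term frequency times inverse document
    frequency) exceeds a preset value, per the description above."""
    n_docs = len(docs)
    # Document frequency of each word across the rejection-reason texts.
    df = Counter(w for doc in docs for w in set(doc))
    keywords = set()
    for doc in docs:
        counts = Counter(doc)
        for w, c in counts.items():
            tf = c / len(doc)
            idf = math.log(n_docs / df[w])
            if tf * idf > preset_value:
                keywords.add(w)
    return keywords

def bow_vector(doc, keywords):
    """Bag-of-words count vector over the selected keywords (sorted order)."""
    counts = Counter(doc)
    return [counts[k] for k in sorted(keywords)]

# Toy, pre-tokenized rejection-reason texts.
docs = [
    ["amount", "exceeds", "reimbursement", "quota"],
    ["invoice", "image", "duplicated"],
    ["amount", "missing"],
]
keys = tf_idf_keywords(docs, preset_value=0.25)
vec = bow_vector(docs[0], keys)
```

Note that "amount" appears in two of the three texts, so its inverse document frequency is low and it falls below the preset value, while the rarer words become keywords.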
The server then trains the logistic regression model on the keyword feature vectors and the historical rejected-document records, specifically as follows.
The server takes any one of the keyword feature vectors as the vector to be associated, then computes in turn the cosine similarity between that vector and each of the other keyword feature vectors, generating the associated feature-vector set for the vector to be associated according to how each similarity compares with the first preset threshold. Each associated feature-vector set corresponds to a predetermined historical rejection reason. From the resulting sets, the server determines the occurrence frequency of each keyword feature vector across the sets, and from those frequencies the occurrence probability value of each vector. Finally, the document sub-information of each historical rejected document, together with its keyword feature vector and occurrence probability value, is added to a data dictionary as model training samples for training the logistic regression model.
That is, the rejection-reason text data set contains multiple rejection-reason texts, each corresponding to one keyword feature vector. The server may pick any keyword feature vector as the vector to be associated and build the associated feature-vector set by computing its cosine similarity with the other keyword feature vectors. Specifically, when the cosine similarity between the vector to be associated and another keyword feature vector exceeds the first preset threshold, the two vectors are similar and their rejection reasons are related, so both are added to the associated feature-vector set. The first preset threshold may be set according to actual use, and the present application imposes no particular limitation on it. Because of this rejection-reason association, the keyword feature vectors in one associated set may correspond to a single rejection reason or to several, i.e., the same rejection reason may correspond to multiple keyword feature vectors.
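The association step can be sketched as follows; the vectors and threshold are illustrative and the helper names are hypothetical:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two keyword feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def associated_set(target, vectors, first_preset_threshold):
    """Gather the keyword feature vectors whose cosine similarity to the
    vector under association exceeds the first preset threshold."""
    group = [target]
    for v in vectors:
        if v is not target and cosine_similarity(target, v) > first_preset_threshold:
            group.append(v)
    return group

# [2, 4, 0] is a scaled copy of [1, 2, 0] (similarity 1.0); [0, 0, 3] is orthogonal.
vectors = [[1, 2, 0], [2, 4, 0], [0, 0, 3]]
group = associated_set(vectors[0], vectors, first_preset_threshold=0.8)
```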
The server can also count the occurrence times of the keyword feature vector in different associated feature vector sets, calculate the ratio of the occurrence times and the value to the associated feature vector sets, and use the occurrence probability value corresponding to the keyword feature vector as the occurrence probability value of the reject reason corresponding to the keyword feature vector, and use the probability value corresponding to the reject reason to reject the bill. The server may further predict the occurrence probability value as a first reject prediction probability value corresponding to the bill information according to a correspondence between the reject cause and the bill information. The reasons for rejection are that bill information A is wrong, bill information B is wrong, etc.
And then, the server can generate a data dictionary containing bill sub-information, the keyword feature vector corresponding to the bill sub-information and the occurrence frequency value of the keyword feature vector for training of the logistic regression model. The logistic regression model is trained, and specifically comprises the following steps:
The server sequentially inputs, for each piece of bill sub-information in the data dictionary, the corresponding associated feature vector set and occurrence probability value into the logistic regression model to be trained. During training, the corresponding model parameter values are updated through a gradient descent algorithm until parameter values are found for which the function value of the logistic regression cost function is smaller than a second preset threshold, at which point the trained logistic regression model is obtained.
The server models with a logistic regression algorithm based on the feature variables and the bill sub-information (feature value) data dictionary. The modeling data set is split into a training set and a corresponding test set for training and evaluation, and the specific implementation logic is as follows. First, assume there are n keyword feature vectors X = [x1, x2, …, xn], where xi is the i-th keyword feature vector. Logistic regression assumes the probability function h_θ(x) = g(θ^T x), where h_θ(x) is the occurrence probability value of the keyword feature variable x, θ is the parameter vector of the model, and θ^T denotes the transpose of θ. g(z) is the sigmoid function g(z) = 1/(1 + e^(−z)), with z = θ^T x. In addition, the logistic regression cost function measures the difference between actual and predicted values using the log loss: J(θ) = −(1/m) Σ [y·log(h_θ(x)) + (1 − y)·log(1 − h_θ(x))], where J(θ) is the cost function, m is the number of training samples, y is the actual label value (0 or 1), and h_θ(x) is the model's predicted estimate. To minimize the cost function J(θ), a gradient descent algorithm is used to estimate the optimal parameter values θ. By iteratively updating the parameters θ, the model is continuously adjusted until it is optimal, that is, until the function value of the logistic regression cost function is smaller than a second preset threshold, and the trained logistic regression model is obtained. The second preset threshold is set by the user, which the present application does not particularly limit.
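A minimal sketch of this training loop, assuming a tiny separable toy data set, a learning rate of 1.0, and 0.05 standing in for the second preset threshold (all illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum[y*log(h) + (1-y)*log(1-h)]
    h = sigmoid(X @ theta)
    eps = 1e-12  # guard against log(0)
    return float(-np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)))

def train(X, y, lr=1.0, tol=0.05, max_iter=20000):
    # Gradient descent on the logistic regression cost function until its
    # value falls below the (assumed) second preset threshold tol.
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / len(y)
        theta -= lr * grad
        if cost(theta, X, y) < tol:
            break
    return theta

# Toy samples: a bias column plus one keyword feature; labels 0 = accepted, 1 = rejected.
X = np.array([[1.0, 0.1], [1.0, 0.4], [1.0, 0.6], [1.0, 0.9]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = train(X, y)
preds = sigmoid(X @ theta)
```

Once the cost drops below the threshold, the fitted probabilities separate the rejected samples from the accepted ones.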
Through the trained logistic regression model, the server can calculate a first reject prediction probability value for each piece of bill sub-information in the to-be-checked bill information, and then generate the reject prediction sub-information sequence from the result of comparing those values against the first probability threshold.
S103: the server determines a second reject prediction probability value corresponding to the to-be-checked bill information based on the logistic regression model and the reject prediction sub-information sequence.
Through the logistic regression model, a weighted average can be computed over the first reject prediction probability values in each reject prediction sub-information sequence. The weights can be set according to the different reject causes during training of the logistic regression model, that is, different associated feature vector sets are given different weights, and the second reject prediction probability value is calculated from the sum of the products of each weight and its first reject prediction probability value.
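A sketch of this weighted combination, using made-up first reject prediction probability values and per-reject-cause weights; normalizing by the weight sum is an assumption that keeps the result a proper probability:

```python
def second_reject_probability(first_probs, weights):
    # Weighted average: sum of weight * first reject prediction probability,
    # normalized by the total weight so the result stays in [0, 1].
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, first_probs)) / total

# One first reject prediction probability per reject prediction sub-information
# entry; a heavier weight marks a reject cause treated as more serious.
p2 = second_reject_probability([0.9, 0.3, 0.6], [2.0, 1.0, 1.0])
# (2*0.9 + 1*0.3 + 1*0.6) / 4 = 0.675
```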
S104: when the second reject prediction probability value is larger than a second probability threshold, the server generates audit prompt information and sends the audit prompt information to the corresponding auditing terminal.
The second probability threshold may be set according to actual use, which the present application does not particularly limit. When the second reject prediction probability value is larger than the second probability threshold, the server can generate prompt information that prompts the auditor at the auditing terminal to audit the bill sub-information corresponding to the second reject prediction probability value, and can label that bill sub-information when generating the prompt information.
S105: the server generates a reject analysis text according to the reject reason text set corresponding to the reject prediction sub-information sequence, and stores the reject analysis text to the cloud server.
The cloud server may be connected to the server for storing the reject analysis text.
In the embodiment of the application, the method further comprises the following steps:
The server acquires each piece of real-time-updated to-be-checked bill information and the corresponding reject reason text, adds them to a preset database to update the model training samples, and retrains the logistic regression model.
That is, if the to-be-checked bill information is rejected, the server can retrain the logistic regression model using the rejected to-be-checked bill information and its corresponding reject reason text obtained from the logistic regression model's processing, so as to ensure that the logistic regression model remains optimal.
In an embodiment of the present application, generating the reject analysis text according to the reject reason text set corresponding to the reject prediction sub-information sequence specifically includes:
The server inputs the reject reason text set into a preset analysis text generation model so as to combine the reject reason text words; the analysis text generation model is trained on a plurality of reject reason text word samples and their corresponding reject reason sentences. The reject analysis text is then determined according to the output result of the analysis text generation model.
The analysis text generation model may be a neural network model trained on a plurality of reject cause text word samples and the reject cause sentences corresponding to those samples. It can recognize reject cause texts and output the corresponding reject cause sentences, and the output reject cause sentences are taken as the reject analysis text.
Furthermore, the present application can also perform the following:
In response to a rejection operation at the auditing terminal, prompt information corresponding to the reject analysis text is sent to the auditing terminal; the prompt information includes a control for viewing the reject analysis text. If the auditing terminal does not add the reject analysis text to the reject reason text box of the rejection operation, the prompt information is sent to the user terminal so that the user terminal can view the reject analysis text. If the auditing terminal adds a reject reason description text in the reject reason text box, the reject reason description text is compared with the reject analysis text, and the reject reason description text and/or the prompt information is sent to the user terminal according to the text comparison result.
That is, the auditing terminal can perform a rejection operation. After the second reject prediction probability value is obtained, the user at the auditing terminal can examine the to-be-checked bill data information carrying the audit prompt information, and can click the reject control to reject the to-be-checked bill data information when the rejection conditions are met. At this time, the server may obtain the reject analysis text and generate a control through which the reject analysis text can be viewed; for example, the user may view the reject analysis text by clicking or sliding the control.
If, while auditing the to-be-checked bill information, the auditing terminal does not actively add the reject analysis text, the server can send prompt information to the user terminal so that the user terminal accesses the cloud server to view the reject analysis text. If the auditing terminal actively adds a reject reason description text, the reject analysis text and the reject reason description text can be further compared, and the reject reason description text and/or prompt information is sent according to the comparison result.
Sending the reject reason description text and/or prompt information to the user terminal according to the text comparison result specifically includes the following:
The server calculates the text feature vectors corresponding to the reject reason description text and the reject analysis text, respectively, and the corresponding text similarity; the text similarity is a cosine similarity. If the text similarity is larger than a similarity threshold, the prompt information is sent to the user terminal. If the text similarity is smaller than or equal to the similarity threshold, the reject reason description text is sent to the user terminal, the reject reason description text and the to-be-checked bill data information are added to a preset database to update the model training samples, and the logistic regression model is retrained.
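A minimal sketch of this comparison, assuming simple bag-of-words text feature vectors and a similarity threshold of 0.7 (both illustrative, not values from the application):

```python
import math
from collections import Counter

def text_cosine_similarity(text_a, text_b):
    # Bag-of-words term-count vectors, then cosine similarity.
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route_feedback(description_text, analysis_text, threshold=0.7):
    # Similar texts: only prompt information goes to the user terminal.
    # Dissimilar texts: the reject reason description text is sent instead
    # (and would also be added to the database to retrain the model).
    sim = text_cosine_similarity(description_text, analysis_text)
    return "prompt" if sim > threshold else "description"

similar = route_feedback("amount field mismatch on header",
                         "amount field mismatch detected in header")
dissimilar = route_feedback("date is wrong", "missing attachment")
```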
In other words, when the reject reason description text is dissimilar to the reject analysis text given by the logistic regression model, the server may send the reject reason description text to the user terminal and use it for feedback training of the logistic regression model, so as to keep the logistic regression model up to date.
Through the above technical scheme, the present application can predict the reject probability of a bill through the logistic regression model so as to classify submitted bills, making bills with a reject risk immediately apparent to auditors, who can then focus their auditing on that portion of the bills. Meanwhile, the logistic regression model can automatically capture and learn key feature variables, improving the accuracy of intelligent prediction and reducing the risk of missed judgments. This addresses the problems of heavy audit workloads for bill auditors, low auditing efficiency, and the low intelligence level of current bill auditing systems.
In addition, the present application can also produce the reject analysis text for auditors or bill submitters to view, without requiring additional manpower to generate or acquire it, thereby improving the user experience of the bill auditing system.
Fig. 2 is a schematic structural diagram of an auxiliary audit device for documents according to an embodiment of the present application, where, as shown in fig. 2, the device includes:
at least one processor; and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to:
Acquire to-be-checked bill data information from a user terminal; the to-be-checked bill data information at least comprises a header, bill transaction content and a bill auditor. Determine at least one reject prediction sub-information sequence corresponding to the to-be-checked bill information based on a pre-trained logistic regression model; the reject prediction sub-information sequence is obtained based on bill sub-information in the to-be-checked bill information whose first reject prediction probability value is larger than a preset first probability threshold, and the bill sub-information is obtained by comparing a preset historical keyword set with the to-be-checked bill information. Determine a second reject prediction probability value corresponding to the to-be-checked bill data information based on the logistic regression model and the reject prediction sub-information sequence. When the second reject prediction probability value is larger than a second probability threshold, generate audit prompt information and send the audit prompt information to the corresponding auditing terminal. Generate a reject analysis text according to the reject reason text set corresponding to the reject prediction sub-information sequence, and store the reject analysis text to the cloud server.
The embodiment of the present application also provides a non-volatile computer storage medium for auxiliary auditing of documents, which stores computer-executable instructions set to perform the following:
Acquire to-be-checked bill data information from a user terminal; the to-be-checked bill data information at least comprises a header, bill transaction content and a bill auditor. Determine at least one reject prediction sub-information sequence corresponding to the to-be-checked bill information based on a pre-trained logistic regression model; the reject prediction sub-information sequence is obtained based on bill sub-information in the to-be-checked bill information whose first reject prediction probability value is larger than a preset first probability threshold, and the bill sub-information is obtained by comparing a preset historical keyword set with the to-be-checked bill information. Determine a second reject prediction probability value corresponding to the to-be-checked bill data information based on the logistic regression model and the reject prediction sub-information sequence. When the second reject prediction probability value is larger than a second probability threshold, generate audit prompt information and send the audit prompt information to the corresponding auditing terminal. Generate a reject analysis text according to the reject reason text set corresponding to the reject prediction sub-information sequence, and store the reject analysis text to the cloud server.
The embodiments of the present application are described in a progressive manner, and the same and similar parts of the embodiments are all referred to each other, and each embodiment is mainly described in the differences from the other embodiments. In particular, for the apparatus, medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
The device, medium and method provided by the embodiment of the application are in one-to-one correspondence, so that the device and medium also have similar beneficial technical effects as the corresponding method, and the beneficial technical effects of the device and medium are not repeated here because the beneficial technical effects of the method are described in detail above.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. An auxiliary auditing method for documents, the method comprising:
acquiring to-be-checked bill data information from a user terminal; the to-be-checked bill information at least comprises a header, bill transaction content and a bill auditor;
determining at least one reject predictor information sequence corresponding to the to-be-checked bill information based on a pre-trained logistic regression model; the reject prediction sub-information sequence is obtained based on bill sub-information with a first reject prediction probability value larger than a preset first probability threshold value in the to-be-checked bill information; the bill sub-information is obtained by comparing a preset historical keyword set with the to-be-checked bill information;
determining a second reject prediction probability value corresponding to the to-be-checked bill data information based on the logistic regression model and the reject prediction sub-information sequence;
generating audit prompt information and sending the audit prompt information to a corresponding audit terminal under the condition that the second reject prediction probability value is larger than a second probability threshold value; and
and generating a rejection analysis text according to the rejection reason text set corresponding to the rejection prediction sub-information sequence, and storing the rejection analysis text to a cloud server.
2. The auxiliary auditing method for documents according to claim 1, wherein before determining at least one reject prediction sub-information sequence corresponding to the to-be-checked bill information based on a pre-trained logistic regression model, the method further comprises:
acquiring a plurality of historical reject bill information and corresponding reject reason text data sets;
respectively determining word frequency and inverse text frequency index of words in the reject cause text data set through a preset TF-IDF model;
when the product value of the word frequency of the word and the inverse text frequency index is larger than a preset value, the word is used as a keyword of the reject reason text data set;
and generating keyword feature vectors respectively corresponding to the reject cause text data in the reject cause text data set based on the obtained keywords and a bag-of-words model, so as to train the logistic regression model according to the keyword feature vectors and the historical reject document information.
3. The auxiliary auditing method for documents according to claim 2, wherein training the logistic regression model according to the keyword feature vectors and the historical reject document information specifically comprises:
taking any one of the keyword feature vectors as a feature vector to be associated;
sequentially calculating the cosine similarity between the feature vector to be associated and the other keyword feature vectors, and generating an associated feature vector set corresponding to the feature vector to be associated according to a comparison result between the cosine similarity and a first preset threshold; one associated feature vector set corresponds to a predetermined historical reject cause;
determining occurrence frequencies of the keyword feature vectors in the associated feature vector sets according to the obtained associated feature vector sets, so as to determine occurrence probability values corresponding to the keyword feature vectors respectively according to the occurrence frequencies;
and adding the bill sub-information of each history reject bill information, the corresponding keyword feature vector and the occurrence probability value of each history reject bill information to a data dictionary as model training samples so as to train the logistic regression model.
4. The auxiliary auditing method for documents according to claim 3, wherein training the logistic regression model comprises:
sequentially inputting the associated feature vector set and the corresponding occurrence frequency value corresponding to each bill sub-information in the data dictionary into a logistic regression model to be trained so as to train the logistic regression model to be trained;
while training the logistic regression model to be trained, updating the corresponding model parameter values through a gradient descent algorithm until model parameter values are determined for which the function value of the logistic regression cost function is smaller than a second preset threshold, thereby obtaining the trained logistic regression model.
5. The auxiliary auditing method for documents according to claim 1, wherein the method further comprises:
acquiring each piece of real-time updated to-be-checked document information and corresponding reject reason text;
and adding the to-be-checked bill data information and the corresponding reject reason text to a preset database to update a model training sample, and retraining the logistic regression model.
6. The auxiliary auditing method for documents according to claim 1, wherein generating a reject analysis text according to a reject cause text set corresponding to the reject prediction sub-information sequence specifically comprises:
inputting the reject cause text set into a preset analysis text generation model so as to combine the words of the reject cause text; the analysis text generation model is obtained based on training of a plurality of reject reason text word samples and corresponding reject reason sentences;
and determining the reject analysis text according to the output result of the analysis text generation model.
7. The auxiliary auditing method for documents according to claim 6, wherein the method further comprises:
responding to the rejection operation of the auditing terminal, and sending prompt information corresponding to the rejection analysis text to the auditing terminal; the prompt message comprises a control for checking the refused analysis text;
the prompt information is sent to the user terminal under the condition that the auditing terminal does not add the rejection analysis text to the rejection reason text box of the rejection operation, so that the user terminal can check the rejection analysis text;
and under the condition that the checking terminal adds the reject reason description text in the reject reason text box, comparing the reject reason description text with the reject analysis text, and sending the reject reason description text and/or the prompt message to the user terminal according to a text comparison result.
8. The auxiliary auditing method for documents according to claim 7, wherein sending the reject cause description text and/or the prompt message to the user terminal according to the text comparison result specifically comprises:
respectively calculating text feature vectors corresponding to the reject cause description text and the reject analysis text, and corresponding text similarity; the text similarity is cosine similarity;
sending the prompt information to the user terminal under the condition that the text similarity is larger than a similarity threshold;
and under the condition that the text similarity is smaller than or equal to the similarity threshold, sending the reject reason description text to the user terminal, adding the reject reason description text and the to-be-checked bill data information to a preset database to update a model training sample, and retraining the logistic regression model.
9. An auxiliary auditing apparatus for documents, the apparatus comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the auxiliary auditing method for documents according to any one of claims 1-8.
10. A non-volatile computer storage medium for auxiliary auditing of documents, storing computer-executable instructions capable of performing the auxiliary auditing method for documents according to any one of claims 1-8.
CN202311183836.5A 2023-09-13 2023-09-13 Auxiliary auditing method, equipment and medium for bill Pending CN117151647A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311183836.5A CN117151647A (en) 2023-09-13 2023-09-13 Auxiliary auditing method, equipment and medium for bill


Publications (1)

Publication Number Publication Date
CN117151647A true CN117151647A (en) 2023-12-01

Family

ID=88902382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311183836.5A Pending CN117151647A (en) 2023-09-13 2023-09-13 Auxiliary auditing method, equipment and medium for bill

Country Status (1)

Country Link
CN (1) CN117151647A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117495314A (en) * 2024-01-02 2024-02-02 尚恰实业有限公司 Automatic approval method and system based on machine learning
CN117495314B (en) * 2024-01-02 2024-04-02 尚恰实业有限公司 Automatic approval method and system based on machine learning

Similar Documents

Publication Publication Date Title
CN107958317B (en) Method and device for selecting crowdsourcing participants in crowdsourcing project
CN110070391B (en) Data processing method and device, computer readable medium and electronic equipment
Slapin et al. Words as data: Content analysis in legislative studies
CN117151647A (en) Auxiliary auditing method, equipment and medium for bill
CN107122432A (en) CSR analysis method, device and system
CN109359302A (en) A kind of optimization method of field term vector and fusion sort method based on it
CN107403325A (en) Air ticket order reliability evaluation method and device
CN110310012A (en) Data analysing method, device, equipment and computer readable storage medium
CN115423578A (en) Bidding method and system based on micro-service containerization cloud platform
CN113672797A (en) Content recommendation method and device
Haryono et al. Aspect-based sentiment analysis of financial headlines and microblogs using semantic similarity and bidirectional long short-term memory
CN117911039A (en) Control method, equipment and storage medium for after-sales service system
Latypova Reviewer assignment decision support in an academic journal based on multicriteria assessment and text mining
CN117172508A (en) Automatic dispatch method and system based on city complaint worksheet recognition
CN111353728A (en) Risk analysis method and system
CN116975910A (en) Method, device, equipment and medium for determining security level of data table
Kanchinadam et al. Graph neural networks to predict customer satisfaction following interactions with a corporate call center
CN116757835A (en) Method and device for monitoring transaction risk in credit card customer credit
TW202117584A (en) Intelligent conversation management method and system based on natural language processing for managing man-machine conversations related to different business fields
CN116611911A (en) Credit risk prediction method and device based on support vector machine
US11922352B1 (en) System and method for risk tracking
Koromyslova et al. Feature selection for natural language call routing based on self-adaptive genetic algorithm
CN114881600A (en) Evaluation method and system for reimbursement items
Su et al. Detection of tax arrears based on ensemble learning model
Guo et al. Prediction and analysis of success on crowdfunding projects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination