CN114662457A - Information generation method, device, equipment and computer storage medium - Google Patents


Publication number
CN114662457A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210299639.9A
Other languages
Chinese (zh)
Inventor
罗燕龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202210299639.9A
Publication of CN114662457A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G06F40/169: Annotation, e.g. comment data or footnotes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The application discloses an information generation method, apparatus, device and computer storage medium. Financial information that includes remark information is acquired, key content in the remark information is identified through a preset language identification model, and first text information corresponding to the key content is generated. This reduces manual operation and allows subsequent calculations to yield more objective and accurate results. Matching calculation is then performed on the first text information and preset historical remark information to obtain a first result, which represents the matching degree of the first text information and the preset historical remark information; first prompt information is then generated according to the first result. Whether the first text information is compliant can thus be determined from the matching result, giving an accurate compliance analysis of the remark information. The analysis result can be applied to scenarios in which enterprise financial information is analyzed and evaluated, so that accurate and objective analysis and evaluation results can be obtained.

Description

Information generation method, device, equipment and computer storage medium
Technical Field
The present application belongs to the field of financial technologies, and in particular, to an information generating method, apparatus, device, and computer storage medium.
Background
Financial data is a very important data resource for enterprises. It is usually carried in statements, conveys information at high density, can be used to evaluate an enterprise's operating condition, repayment capacity, repayment willingness and the like, and comprehensively reflects the enterprise's financial condition, operating results and cash flow. In the related art, a professional with financial auditing expertise usually has to read the financial statements, collect data from inside and outside the industry, and evaluate the data items and indicators of particular concern in combination with the financial data in the statements, in order to analyze, diagnose and predict the enterprise's operating risk.
However, this way of analyzing financial data depends heavily on the professional ability of the personnel involved. Limited by their professional and working ability, it yields strongly subjective results and low working efficiency. Moreover, because the relevant data is collected manually and is highly dispersed, the efficiency and accuracy of the analysis tend to suffer.
Disclosure of Invention
The embodiment of the application provides an information generation method, an information generation device, information generation equipment and a computer storage medium, and can improve the processing efficiency and accuracy of financial information.
In a first aspect, an embodiment of the present application provides an information generating method, where the method includes:
acquiring financial information, wherein the financial information comprises remark information;
identifying key contents in the remark information through a preset language identification model, and generating first text information corresponding to the key contents;
performing matching calculation on the first text information and preset historical remark information to obtain a first result, wherein the first result is used for representing the matching degree of the first text information and the preset historical remark information;
and generating first prompt information according to the first result.
In some embodiments, obtaining financial information comprises:
acquiring a first image, wherein the first image comprises text content of financial data;
identifying the inclination angle of the first image through an angle identification model to obtain inclination angle information;
according to the inclination angle information, carrying out deviation rectification processing on the first image to obtain a second image, wherein the second image comprises text content;
and extracting the text content in the second image through an optical character recognition model to obtain financial information corresponding to the text content.
In some embodiments, after generating the first text information corresponding to the key content, the method further comprises:
performing classification calculation on words and sentences contained in the first text information through a preset classifier to obtain a second calculation result, wherein the classifier is obtained by training according to risk categories corresponding to sensitive word and sentence labels, and the sensitive word and sentence labels are from a preset database;
performing feature extraction calculation on words and sentences of the first text information through a preset first model to obtain a third calculation result, wherein the first model is obtained by training according to event risk types corresponding to historical data;
and performing fusion calculation on the second calculation result and the third calculation result to obtain risk evaluation data corresponding to the first text information.
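The text does not say how the fusion calculation is carried out. As a minimal sketch, assuming both calculation results are per-category scores over the same risk categories and that fusion is a simple weighted average, the step could look like the following. The category names, the `weight` parameter and both function names are hypothetical and not taken from the application.

```python
def fuse_risk_scores(second_result, third_result, weight=0.5):
    """Fuse two per-category score dicts by weighted average.

    second_result: scores from the sensitive-phrase classifier (assumed format).
    third_result:  scores from the feature-extraction model (assumed format).
    weight:        hypothetical share given to second_result (0.0 to 1.0).
    """
    categories = set(second_result) | set(third_result)
    return {
        c: weight * second_result.get(c, 0.0)
        + (1.0 - weight) * third_result.get(c, 0.0)
        for c in categories
    }


def top_risk(fused):
    """Return the category with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical category names, for illustration only.
fused = fuse_risk_scores(
    {"litigation": 0.8, "guarantee": 0.2},
    {"litigation": 0.4, "guarantee": 0.6},
)
print(top_risk(fused))
```

A weighted average is only one possible choice; voting, or a learned combiner trained on the two outputs, would fit the same description.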
In some embodiments, the financial information further comprises report data, the report data comprising a plurality of data items; after acquiring the financial information, the method further comprises:
determining, from the plurality of data items, data items that have a checking relationship with one another;
verifying the data items with the checking relationship through a preset verification rule to obtain a verification result;
and generating second prompt information under the condition that the verification result indicates that the checking relationship of the corresponding data items does not hold.
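As an illustration of verifying a checking relationship, the sketch below tests the familiar balance sheet identity (total assets equal total liabilities plus owners' equity). The rule format, the field names, the tolerance and the `verify` function are all assumptions made for this example; the application does not define how verification rules are represented.

```python
from decimal import Decimal

# Hypothetical rule format: (rule name, left-hand items, right-hand items).
# A rule holds when the left-hand items sum to the right-hand items.
RULES = [
    ("balance sheet identity",
     ["total_assets"],
     ["total_liabilities", "owners_equity"]),
]


def verify(report, rules=RULES, tolerance=Decimal("0.01")):
    """Return one prompt string per rule whose two sides differ by more
    than the tolerance; an empty list means every relation holds."""
    prompts = []
    for name, lhs, rhs in rules:
        left = sum((Decimal(str(report[k])) for k in lhs), Decimal(0))
        right = sum((Decimal(str(report[k])) for k in rhs), Decimal(0))
        if abs(left - right) > tolerance:
            prompts.append(
                f"Checking relation '{name}' does not hold: {left} vs {right}"
            )
    return prompts


report = {"total_assets": "100.00",
          "total_liabilities": "60.00",
          "owners_equity": "35.00"}
for message in verify(report):
    print(message)
```

`Decimal` is used instead of floating point so that monetary amounts compare exactly; the small tolerance only absorbs rounding in the reported figures.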
In some embodiments, the financial information further comprises report data, the report data comprising a plurality of data items; after acquiring the financial information, the method further comprises:
determining target data items corresponding to different event risk types from a plurality of data items;
performing feature extraction on the data of the target data item through a plurality of preset event risk identification models to obtain event risk types corresponding to report data;
and generating third prompt information according to the event risk type corresponding to the report data.
In a second aspect, the present application provides an information generating apparatus, comprising:
the first acquisition module is used for acquiring financial information, and the financial information comprises remark information;
the first identification module is used for identifying key contents in the remark information through a preset language identification model and generating first text information corresponding to the key contents;
the first calculation module is used for performing matching calculation on the first text information and the preset historical remark information to obtain a first result, and the first result is used for representing the matching degree of the first text information and the preset historical remark information;
and the first generation module is used for generating first prompt information according to the first result.
In some embodiments, the first obtaining module comprises:
the acquisition submodule is used for acquiring a first image, and the first image comprises text content of financial data;
the identification submodule is used for identifying the inclination angle of the first image through the angle identification model to obtain inclination angle information;
the deviation rectifying sub-module is used for rectifying deviation of the first image according to the inclination angle information to obtain a second image, and the second image comprises text content;
and the extraction submodule is used for extracting the text content in the second image through the optical character recognition model to obtain the financial information corresponding to the text content.
In some embodiments, the apparatus further comprises:
the second calculation module is used for performing classification calculation on words and sentences contained in the first text information through a preset classifier to obtain a second calculation result, the classifier is obtained by training according to risk categories corresponding to sensitive word and sentence labels, and the sensitive word and sentence labels are from a preset database;
the third calculation module is used for performing feature extraction calculation on words and sentences of the first text information through a preset first model to obtain a third calculation result, and the first model is obtained by training according to event risk types corresponding to historical data;
and the fourth calculation module is used for performing fusion calculation on the second calculation result and the third calculation result to obtain risk evaluation data corresponding to the first text information.
In some embodiments, the financial information further comprises report data, the report data comprising a plurality of data items; the apparatus further comprises:
the first determining module is used for determining, from the plurality of data items, data items that have a checking relationship with one another;
the verification module is used for verifying the data items with the checking relationship through a preset verification rule to obtain a verification result;
and the second generation module is used for generating second prompt information under the condition that the verification result indicates that the checking relationship of the corresponding data items does not hold.
In some embodiments, the financial information further comprises report data, the report data comprising a plurality of data items; the apparatus further comprises:
the second determining module is used for determining target data items corresponding to different event risk types from the plurality of data items;
the characteristic extraction module is used for extracting characteristics of the data of the target data item through a plurality of preset event risk identification models to obtain an event risk type corresponding to the report data;
and the third generation module is used for generating third prompt information according to the event risk type corresponding to the report data.
In a third aspect, an embodiment of the present application provides an electronic device, where the device includes: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the information generating method as described in the first aspect.
In a fourth aspect, the present application provides a computer storage medium having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the information generating method according to the first aspect.
In a fifth aspect, the present application provides a computer program product; when instructions in the computer program product are executed by a processor of an electronic device, the electronic device is caused to execute the information generation method according to the first aspect.
The information generation method, apparatus, device and computer storage medium of the embodiments of the application can acquire financial information that comprises remark information, then identify key content in the remark information through a preset language identification model and generate first text information corresponding to the key content. Because the key content of the remark information is identified and extracted by the preset language identification model, manual operation is reduced and subsequent calculations can yield more objective and accurate results. Matching calculation is performed on the first text information and preset historical remark information to obtain a first result, which represents the matching degree of the first text information and the preset historical remark information; first prompt information is then generated according to the first result. Whether the first text information is compliant, or whether it differs from the historical information, can therefore be determined from the matching result. This enables multi-angle analysis of the remark information, and the method can be applied to scenarios in which enterprise financial information is analyzed and evaluated, so as to obtain accurate and objective analysis and evaluation results.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of an information generating method according to an embodiment of the present application;
Fig. 2 is a schematic illustration of a process for obtaining financial data in an embodiment of the present application;
Fig. 3 is a schematic diagram of a network model in a specific example of the present application;
Fig. 4 is a schematic diagram of a CNN model according to a specific example of the present application;
Fig. 5 is a schematic structural diagram of an information generating apparatus according to another embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to still another embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Financial statements carry information at high density, and reading them demands strong professional expertise. General bank practitioners cannot directly or comprehensively analyze an enterprise's risk condition from the data on the financial statements alone; they need a financial statement analysis tool to calculate and process the statement data. They then analyze the enterprise's operating risk by combining their own professional knowledge with the tool's processing results. However, this approach is limited by the professional and working ability of the personnel: the results are often subjective, their accuracy cannot be guaranteed, and the analysis is inefficient.
In addition, when analyzing an enterprise's operating risk, professionals need to read the remark information of the financial statements. The remark information is the key explanatory and descriptive information for the financial statements. While statement data can be calculated and processed with corresponding analysis tools, there is currently no tool for entering and displaying remark information. Reading the remark information is likewise constrained by individual professional ability, so its accuracy cannot be guaranteed and it is inefficient, which in turn affects how the financial statements are read and analyzed and reduces the accuracy and efficiency of analyzing the enterprise's operating risk.
In order to solve the prior art problem, embodiments of the present application provide an information generation method, apparatus, device, and computer storage medium, which can automatically identify financial information through a preset model, extract key content of the financial information to perform calculation processing, generate corresponding calculation results and prompt information of related risks, and reduce the influence on financial information analysis results due to the limitation of personal abilities of personnel. First, an information generating method provided in the embodiment of the present application is described below.
It should be understood that, in the technical solution of the present application, the acquisition, storage, use, processing, etc. of the related data of the assets, finance, etc. all conform to the related regulations of the national laws and regulations.
Fig. 1 shows a flowchart of an information generating method according to an embodiment of the present application. As shown in fig. 1, the method includes steps S101 to S104:
S101, acquiring financial information, wherein the financial information comprises remark information;
S102, identifying key contents in the remark information through a preset language identification model, and generating first text information corresponding to the key contents;
S103, performing matching calculation on the first text information and preset historical remark information to obtain a first result, wherein the first result is used for representing the matching degree of the first text information and the preset historical remark information;
S104, generating first prompt information according to the first result.
With this information generation method, financial information including remark information is acquired, key content in the remark information is identified through the preset language identification model, and first text information corresponding to the key content is generated. Because the key content of the remark information is identified and extracted by the model, manual operation is reduced and subsequent calculations can yield more objective and accurate results. Matching calculation is then performed on the first text information and preset historical remark information to obtain a first result, which represents the matching degree of the first text information and the preset historical remark information; first prompt information is then generated according to the first result. Whether the first text information is compliant, or whether it differs from the historical remark information, can therefore be determined from the matching result, so that the compliance and accuracy of the remark information can be analyzed. The analysis result can be applied to scenarios in which enterprise financial information is analyzed and evaluated, so that accurate and objective analysis and evaluation results can be obtained.
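Steps S103 and S104 can be sketched as follows. The application does not name a matching algorithm at this point, so this example assumes a character-level similarity ratio from Python's standard `difflib` module as the matching degree, and a hypothetical threshold of 0.6 for choosing the prompt text. The function names and prompt wording are illustrative only.

```python
from difflib import SequenceMatcher


def matching_degree(first_text, historical_remarks):
    """Return the highest similarity ratio (0.0 to 1.0) between the
    extracted text and any historical remark. SequenceMatcher is an
    illustrative stand-in for the unspecified matching calculation."""
    if not historical_remarks:
        return 0.0
    return max(
        SequenceMatcher(None, first_text, remark).ratio()
        for remark in historical_remarks
    )


def first_prompt(first_text, historical_remarks, threshold=0.6):
    """Build prompt text from the first result (threshold is assumed)."""
    score = matching_degree(first_text, historical_remarks)
    if score >= threshold:
        return f"Remark consistent with historical remarks (match {score:.2f})."
    return (f"Remark differs from historical remarks (match {score:.2f}); "
            "manual review suggested.")


history = ["Inventory is measured at the lower of cost and net realizable value."]
print(first_prompt("Inventory is measured at cost.", history))
```

In practice the matching could equally be done on embeddings or keyword overlap; the point of the sketch is only that the first result is a single matching degree from which the prompt is derived.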
In the embodiment of the application, the financial information may include report data and remark information of the report data. A financial statement is an accounting statement reflecting the capital and profit status of an enterprise or budget unit in a given period, such as a balance sheet, an income statement, a cash flow statement, or a statement of changes in owners' equity. The remark information is the textual description or detailed data for the items listed in the financial statements, together with descriptions of items that cannot be listed in the statements. The remark information discloses the basis of preparation of the financial statements, and the related information in it is cross-referenced to the items listed in the balance sheet, income statement, cash flow statement, statement of changes in owners' equity, and other statements.
Illustratively, the remark information discloses at least one or more of the following, in the following order:
I1. The basic situation of the enterprise, specifically: the place of registration, organizational form and headquarters address; the nature of the business and its main operating activities; the names of the parent company and of the group's ultimate parent company; the approver of the financial report and the date of approval, or the signatory and the date of signing; enterprises with a limited term of operation should also disclose information about that term.
I2. The basis of preparation of the financial statements.
I3. A statement of compliance with the enterprise accounting standards.
I4. A description of important accounting policies and accounting estimates. The description of important accounting policies includes the measurement basis of the financial statement items and the important judgments made in applying the accounting policies. The description of important accounting estimates includes the basis for determining those estimates that may cause significant adjustments to the carrying amounts of assets and liabilities in the next accounting period. The description also covers the important accounting policies and accounting estimates adopted by the enterprise, the basis for determining the important accounting policies, the measurement basis of the financial statement items, and the key assumptions and uncertainty factors used in the accounting estimates.
I5. Changes in accounting policies and accounting estimates, and corrections of errors.
I6. Descriptions of important statement items, given in a combination of text and figures and in the order in which the balance sheet, income statement, cash flow statement and statement of changes in owners' equity and their items are listed. The total amounts in the detailed breakdown of the important items should tie to the amounts of the statement items. The description also includes supplementary income statement data that classifies expenses by nature, for example consumed raw materials, employee compensation expenses, depreciation expenses, amortization expenses, and the like.
I7. Other matters that need to be explained, such as commitments, non-adjusting events after the balance sheet date, and related-party relationships and transactions.
I8. Information that helps users of the financial statements evaluate the enterprise's objectives, policies and procedures for managing capital.
Since most of the financial information comes from the banking system, and usually includes structured data (such as numerical data) and unstructured data (such as image data), in order to improve the data processing efficiency in analyzing the financial information, in the embodiment of the present application, the unstructured data may be extracted through an Optical Character Recognition (OCR) technology, so as to extract main text content in the unstructured data. Specifically, in this embodiment, as shown in fig. 2, the step S101 of acquiring the financial information may include steps S1011 to S1014:
S1011, acquiring a first image, wherein the first image comprises text content of financial data.
The first image may be an image of a paper document such as a bill, a statement or an invoice, captured by a device such as a scanner or a camera. The first image may be in any format, such as png, jpg or bmp, and this embodiment is not limited thereto.
For example, the first image may be imported into an image platform by an accounting firm that the bank has authorized for access, and retrieved from the image platform when the enterprise financial information is analyzed.
S1012, identifying the inclination angle of the first image through an angle identification model to obtain inclination angle information.
The angle recognition model may be a deep Convolutional Neural Network (CNN). In this embodiment, the CNN may be trained in advance on a large number of images of paper documents such as bills, statements and invoices, together with their corresponding inclination angles, to obtain a trained angle recognition model; given an image, the trained model can then recognize its inclination angle.
Because of user shooting habits, framing requirements, random interference and similar conditions, the quality of the image data acquired by the banking system varies. To improve the accuracy of optical character recognition, image preprocessing operations such as image enhancement, Gaussian filtering for denoising, edge detection and tilt correction can be performed on the acquired first image.
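Of the preprocessing operations listed above, Gaussian filtering is easy to show in a few lines. The sketch below applies a separable Gaussian blur to a grayscale image stored as a list of rows. The kernel radius, the value of sigma and the choice to replicate edge pixels at the border are assumptions made for this example; a production system would more likely call an image library.

```python
import math


def gaussian_kernel_1d(radius, sigma):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_line(line, kernel):
    """Convolve one row or column, replicating edge pixels at the border."""
    radius = len(kernel) // 2
    n = len(line)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * line[j]
        out.append(acc)
    return out


def gaussian_blur(img, radius=1, sigma=1.0):
    """Blur rows first, then columns (the 2-D Gaussian is separable)."""
    kernel = gaussian_kernel_1d(radius, sigma)
    blurred_rows = [blur_line(row, kernel) for row in img]
    blurred_cols = [blur_line(list(col), kernel) for col in zip(*blurred_rows)]
    return [list(row) for row in zip(*blurred_cols)]


# A single bright noise pixel is spread out and reduced in intensity.
noisy = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = gaussian_blur(noisy)
```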
For example, the angle recognition model may employ a pre-trained VGG16 network model. VGG16 is a CNN model; the angle recognition model can apply transfer learning to it so as to judge the direction of the characters in the first image. In this example, referring to fig. 3, the VGG16 network model 300 is a 16-layer network, specifically comprising 13 convolutional layers 301 and 3 fully-connected layers 302. A pooling layer 303 is placed after every 2 to 3 convolutional layers 301; the last convolutional layer 301 is followed by the 3 fully-connected layers 302, and an output node 304 (using a softmax function) outputs the recognition result. The first image passes through the convolutional layers 301 and the fully-connected layers 302 in sequence and undergoes convolution, pooling and related processing, so that the inclination angle of the first image, that is, the inclination angle of the direction of the characters (the text content) in the image, can be identified and output.
Illustratively, in actual image acquisition, scanned document images are usually oriented at 0°, 90°, 180° or 270°, and inclination at other angles is rare. Given this characteristic, the present embodiment may classify the inclination angle of an image into four direction classes, namely 0°, 90°, 180° and 270°, and train the neural network accordingly, so as to obtain the angle recognition model for determining the inclination angle of the text direction in the first image.
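The layer counts and the four-class output described above can be checked with a short sketch. The channel list below is the standard published VGG16 convolutional configuration. The logits passed to `predict_angle` are made-up numbers standing in for the output of a trained network, and the function names are hypothetical.

```python
import math

# Standard VGG16 convolutional configuration: numbers are the output
# channels of 3x3 convolutions, "M" marks a max-pooling layer.
VGG16_CONV_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                  512, 512, 512, "M", 512, 512, 512, "M"]
NUM_FC_LAYERS = 3
ORIENTATION_CLASSES = [0, 90, 180, 270]  # the four tilt classes, in degrees


def count_layers(cfg):
    """Return (number of convolutional layers, number of pooling layers)."""
    convs = sum(1 for x in cfg if x != "M")
    pools = sum(1 for x in cfg if x == "M")
    return convs, pools


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_angle(logits):
    """Map four output scores to (tilt angle in degrees, probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ORIENTATION_CLASSES[best], probs[best]


convs, pools = count_layers(VGG16_CONV_CFG)
print(convs, pools, convs + NUM_FC_LAYERS)
print(predict_angle([0.1, 2.0, 0.3, 0.2]))  # illustrative logits only
```

For transfer learning, the usual approach is to keep the pre-trained convolutional weights and replace the final fully-connected layer with one that has four outputs, one per orientation class.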
S1013, performing deviation rectification processing on the first image according to the inclination angle information to obtain a second image, wherein the second image comprises the text content.
After the inclination angle information of the first image is determined through the angle identification model, the image can be rotated by the corresponding angle according to the inclination angle information, thereby rectifying the deviation so that the text content in the image can subsequently be identified accurately.
After the first image is rotated by the corresponding angle, the second image is obtained; the second image contains the same text content as the first image but at a different inclination angle.
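When the tilt is restricted to multiples of 90°, the deviation rectification reduces to a lossless rotation of the pixel grid. The sketch below assumes the inclination angle reports how far the image has been rotated clockwise from upright, so the correction rotates it counter-clockwise by the same amount. That sign convention and the function names are assumptions for this example.

```python
def rotate_ccw_90(img):
    """Rotate a 2-D grid (list of rows) 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def deskew(img, tilt_angle):
    """Undo a clockwise tilt of 0, 90, 180 or 270 degrees (assumed
    convention) by rotating counter-clockwise by the same angle."""
    if tilt_angle not in (0, 90, 180, 270):
        raise ValueError("tilt_angle must be one of 0, 90, 180, 270")
    result = [list(row) for row in img]
    for _ in range(tilt_angle // 90):
        result = rotate_ccw_90(result)
    return result


upright = [[1, 2, 3],
           [4, 5, 6]]
# The same image after being turned 90 degrees clockwise.
tilted = [[4, 1],
          [5, 2],
          [6, 3]]
second_image = deskew(tilted, 90)
```

Arbitrary angles would need interpolation and would change the pixel values; restricting the classes to four directions avoids that.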
S1014, extracting the text content in the second image through the optical character recognition model to obtain the financial information corresponding to the text content.
The optical character recognition (OCR) model can detect the position, area range and layout of the text content in the second image, that is, the layout and text lines of the second image, which areas contain characters, and how large those areas are.
After the OCR model has determined the position, range and similar information of the characters, it recognizes the text content within that position and area range and converts the text content in the second image into text information.
Illustratively, the optical character recognition model may employ a standard CNN model with the fully connected layers removed. The model includes a plurality of convolutional layers, a plurality of pooling layers and a recurrent layer (RNN) connected in sequence. As shown in fig. 4, the convolutional layers and max-pooling layers of the CNN model 400 form a convolutional layer component 401, which extracts sequence features of the text content from the second image and produces the corresponding feature vector sequence as input to the recurrent layer RNN 402. An RNN is good at capturing context information within a sequence and, because it describes consecutive frames as a whole, can recognize wide characters. In this example, the RNN 402 is a Long Short-Term Memory (LSTM) network, which addresses the vanishing gradient problem during feature recognition and helps produce a more accurate character recognition result. The LSTM consists of a memory cell 403 and three multiplicative gates, namely an input gate 404, an output gate 405 and a forget gate 406. The memory cell 403 stores the context information; the input gate 404 and the output gate 405 allow the cell 403 to retain that context for a long time, while the forget gate 406 can clear part of the information stored in the cell 403. This design allows the LSTM to capture long-range dependencies and finally obtain the text information of the text content.
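The interaction of the memory cell and the three gates can be illustrated with a single time step of a scalar LSTM cell. This is a minimal sketch of the textbook LSTM update, not the network of fig. 4; the function `lstm_step` and the layout of `params` (one `(input weight, recurrent weight, bias)` triple per gate) are assumptions made for illustration.

```python
import math


def sigmoid(value):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-value))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    params maps "input", "forget", "output" and "candidate" to a triple
    (input weight, recurrent weight, bias); this layout is hypothetical.
    """
    def gate(name, activation):
        w, u, b = params[name]
        return activation(w * x + u * h_prev + b)

    i = gate("input", sigmoid)         # how much new content is written
    f = gate("forget", sigmoid)        # how much old cell content is kept
    o = gate("output", sigmoid)        # how much of the cell is exposed
    g = gate("candidate", math.tanh)   # candidate content

    c = f * c_prev + i * g             # updated memory cell
    h = o * math.tanh(c)               # hidden state passed to the next step
    return h, c
```

With a strongly positive forget-gate bias and a strongly negative input-gate bias the cell keeps its previous content almost unchanged, which is the long-term storage behaviour described above; a strongly negative forget-gate bias clears it.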
After the text information of the text content is obtained through the optical character recognition model, the information can be checked again to ensure its correctness. The text information can be checked and corrected by matching it against the vocabulary in a lexicon.
The checked text information of the second image can be matched against the keywords in regular expressions to extract the key information from the text information. For example, for an identity card image, the OCR recognition model can extract the specific values of the name, gender, ethnicity, year and month of birth, address and identity card number attributes and store them directly in an identity card information table. Corresponding matching templates are designed for different types of images, and the extracted content is converted into structured financial information.
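A matching template of this kind can be sketched with regular expressions. The field labels, the patterns and the sample text below are hypothetical; a real template would depend on the layout and language of the recognized document.

```python
import re

# Hypothetical template: one regular expression per attribute to extract.
ID_CARD_TEMPLATE = {
    "name": re.compile(r"Name[:：]\s*(.+)"),
    "gender": re.compile(r"Gender[:：]\s*(.+)"),
    "id_number": re.compile(r"ID number[:：]\s*(\d{17}[\dXx])"),
}


def extract_key_information(text, template):
    """Return a structured record; fields that do not match are set to None."""
    record = {}
    for field, pattern in template.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else None
    return record


# Made-up OCR output used only to exercise the template.
sample_text = "Name: Zhang San\nGender: Male\nID number: 11010119900101001X"
record = extract_key_information(sample_text, ID_CARD_TEMPLATE)
```

The resulting dictionary corresponds to one row of a structured table; a different document type would only swap in a different template.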
It should be understood that financial reports, remark information and the like can be automatically identified and extracted in the manner of S1011 to S1014 to obtain structured data information for analyzing the financial data of the enterprise. In the embodiment of the present application, after the financial information including the remark information is obtained in step S101, step S102 may be executed to capture first text information of key content in the remark information through a preset language identification model. Matching calculation is then performed on this text information to analyze the compliance of the remark information and thus whether it contains financial risk information. Specifically, in this embodiment, after step S101, the method may further include S105 to S107:
S105, performing classification calculation on words and sentences contained in the first text information through a preset classifier to obtain a second calculation result, wherein the classifier is obtained by training according to risk categories corresponding to sensitive word and sentence labels, and the sensitive word and sentence labels are from a preset database;
S106, performing feature extraction calculation on words and sentences of the first text information through a preset first model to obtain a third calculation result, wherein the first model is obtained by training according to event risk types corresponding to historical data;
S107, performing fusion calculation on the second calculation result and the third calculation result to obtain risk evaluation data corresponding to the first text information.
In this embodiment, when executing step S105, the first text information may be decomposed by a Natural Language Processing (NLP) algorithm into usable units such as phrases and words. Stemming is then performed on the decomposed first text information, the keyword sentences are extracted, and syntactic and semantic analysis is performed on them to understand their structure and semantic information.
In this embodiment, after the keyword sentences of the first text information are obtained, they are converted into numerical information by a word embedding rule. A word embedding rule maps each word to its own vector, and these vectors tend to be dense. During mapping, each phrase is mapped in association with its surrounding phrases, and the resulting dense vectors help to analyze and compare the keyword sentences and context information in the remark information. Meanwhile, some text appears in a document very frequently but carries little information, such as 'yes', and thereby generates noise. This noise can be handled by a weighting algorithm used in information retrieval and data mining, such as Term Frequency-Inverse Document Frequency (TF-IDF), which down-weights phrases that appear frequently and highlights phrases that carry more valuable information.
For example, after the keyword sentences of the first text information are obtained, in step S105 the preset classifier performs classification calculation on the words and sentences contained in the first text information to obtain the second calculation result.
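The TF-IDF weighting mentioned above can be sketched as follows. The sketch assumes the documents have already been tokenized and uses the plain form tf × log(N / df) without smoothing; the function name `tf_idf` and the sample tokens are illustrative only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for tokenized documents.

    documents: list of token lists.
    Returns one dict per document mapping token -> weight.
    """
    n_docs = len(documents)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weights


# Made-up tokens: "the" occurs in every document, so its weight is zero.
docs = [["the", "guarantee", "the"], ["the", "lawsuit"], ["the", "revenue"]]
weights = tf_idf(docs)
```

A token that occurs in every document receives weight zero, which is the down-weighting effect described above, while a token confined to one document keeps a positive weight.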
The classifier can adopt one of the classification algorithms such as an artificial neural network, a support vector machine or a Bayesian algorithm to classify the words and sentences of the first text information. Specifically, in this example, a large amount of historical remark information may be obtained, and experts or relevant professionals decompose its words and sentences, label them together with the clauses to which they belong, and classify the resulting labels to construct the sensitive word database. A model based on one of an artificial neural network, a support vector machine, a Bayesian algorithm and the like is then trained on the sensitive word database to obtain the classifier, which automatically classifies the keyword sentences of the remark information and the corresponding clause information. The sensitive word database stores word and sentence descriptions in remark information that relate to illegal or non-compliant events of enterprises. For example, the sensitive word categories in the database may include a category of exaggerated-publicity words and sentences, a category of words and sentences describing frequent participation in activities unrelated to the main business, and so on. When the keyword sentences of the remark information belong to any one or more of these categories, this indicates that the remark information in the enterprise's financial information describes illegal or non-compliant events, that is, the enterprise has an operating risk.
In this example, the keyword sentences of the first text information are input into the classifier, which performs classification calculation on them and determines their category, thereby determining whether the keyword sentences belong to a sensitive word category set in the sensitive word database and, if so, which one.
In this embodiment, in addition to the sensitive word judgment on the keyword sentences of the remark information, in step S106 feature extraction calculation is performed on the words and sentences of the first text information by the first model, which is trained according to the event risk types corresponding to historical data, to determine whether the words and sentences correspond to one or more event risks.
For example, the first model may also adopt one of the classification algorithms such as an artificial neural network, a support vector machine or a Bayesian algorithm to classify the words and sentences of the first text information and determine the event risk types they correspond to. Specifically, the first model is trained according to the event risk types corresponding to historical financial data, where the event risks may include event risks at the management level, the relationship level, the organizational structure or industry level, and the financial result and business level, as well as event risks in sales income, sales cost, liabilities and expenses, assets, and information disclosure. Each of these event risks may be reflected by financial data. Therefore, the first model is obtained by acquiring a large amount of historical financial data and training a model based on one of an artificial neural network, a support vector machine, a Bayesian algorithm and the like.
In this example, the keyword sentences of the first text information are input into the first model, which performs classification calculation on them and determines the corresponding event risk category, thereby determining whether the words and sentences of the remark information correspond to an event risk category and, if so, which one.
After the classification result of the first text information with respect to sensitive words and the classification result with respect to event risk types are obtained, step S107 assigns corresponding weights to the two results; the weights can be customized for the actual scenario. The second calculation result and the third calculation result are then fused based on their respective weights to obtain the corresponding risk evaluation data. As a specific example, the second calculation result for the business-activity keyword sentences in the enterprise remark information is A and the third calculation result for the registered-address keyword sentences is B. A and B are combined as a weighted sum with respective weights k and h to obtain the risk evaluation data S corresponding to the first text information, that is, S = A·k + B·h, where k and h are constants.
The risk evaluation data can be a numerical value. Its risk level is determined by matching the value against a preset risk level table, and corresponding risk prompt information is generated and sent to the relevant operators or auditing agencies for a comprehensive evaluation of the risk status of the enterprise's financial information. For example, if the risk evaluation data S is 0.7 and the corresponding level in the risk level table is medium risk, medium-risk prompt information is generated and sent to the relevant operators or auditing agency.
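The fusion S = A·k + B·h and the lookup in the risk level table can be sketched as follows. The thresholds in `RISK_LEVEL_TABLE`, the weights and the input values are assumptions chosen only so that the example reproduces S = 0.7 and a medium risk level; the application does not specify them.

```python
def fuse_risk_scores(a, b, k, h):
    """Weighted fusion of the second (a) and third (b) calculation results."""
    return a * k + b * h


# Hypothetical risk level table: (lower bound, level), highest bound first.
RISK_LEVEL_TABLE = [(0.8, "high"), (0.5, "medium"), (0.0, "low")]


def risk_level(score, table=RISK_LEVEL_TABLE):
    """Return the first level whose lower bound the score reaches."""
    for lower_bound, level in table:
        if score >= lower_bound:
            return level
    return table[-1][1]


# Assumed example values: A = 0.8, B = 0.6, k = h = 0.5.
score = fuse_risk_scores(0.8, 0.6, 0.5, 0.5)
prompt = f"{risk_level(score)} risk: score {score:.2f}"
```

Keeping the table as data rather than code means the thresholds can be reconfigured per scenario in the same way the weights k and h are.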
In some specific examples, when the second calculation result and the third calculation result are fused, they may also be integrated with a pre-constructed economic and financial research database of listed-company violation information issued by regulatory agencies. For example, if the value of the second calculation result is A, it corresponds to type-A violation information; if the value of the third calculation result is B, it corresponds to a type-B risk type; and so on. Integrating and aggregating the data yields the characteristics of the disclosed violations, from which the risk type reflected by the remark information is determined, and the prompt information may then be generated according to the result of the data fusion.
In this embodiment, through steps S105 to S107, text anomalies and value anomalies are detected based on the sensitive word database and the historical financial data, and the compliance of the word and sentence content of the remark information is confirmed, so that non-compliant remark information in financial statements triggers an early warning and the potential risk in the remark items is reduced.
For example, after the compliance of the sentence content of the remark information is confirmed through steps S105 to S107, matching calculation may be performed between the first text information and preset historical remark information through step S103 to obtain data on the degree of matching between them (that is, a first calculation result), such as a matching probability. The historical remark information may hold remark information from past years or compliant templated remark information. Matching the first text information against it identifies how the first text information differs from the corresponding remark information, so that the first prompt information is generated according to the first calculation result and non-compliant remark content triggers an early warning.
In this example, the remark information of the same enterprise in past years is used as the historical remark information, and similarity calculation is performed with the first text information, so that the remark information of the same enterprise is matched and examined longitudinally. In the longitudinal examination, changes over the years in the items annotated in the financial statements of the same enterprise can be matched and compared. For example, when evaluating important accounting policies and accounting estimates, the item names related to the task are taken as the key information to be extracted and identified, this information is encoded into a regular expression, and similarity calculation is performed to determine how the important accounting policies and accounting estimates have changed. Alternatively, in this example, similarity calculation may be performed between the standard remark information of enterprises in the same industry and the first text information, so that the enterprise is examined horizontally against the industry standard. In the horizontal examination, the enterprise's remarks are compared with the standard remarks of the industry segment to which it belongs, and how the enterprise differs from the industry is judged. It will be appreciated that, because the items of interest of the same enterprise or industry are similar, the related information can be filled in through the same predefined templates, and the matching patterns are therefore shared.
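The matching calculation against historical remark information can be sketched with a generic string similarity. The application does not name a similarity measure, so the use of `difflib.SequenceMatcher`, the threshold of 0.8 and the function names below are assumptions for illustration.

```python
from difflib import SequenceMatcher


def match_degree(current_text, historical_text):
    """Similarity in [0, 1] between two remark texts."""
    return SequenceMatcher(None, current_text, historical_text).ratio()


def review_against_history(current_text, historical_texts, threshold=0.8):
    """Return the best match degree and a prompt if it falls below threshold.

    The threshold value is hypothetical.
    """
    best = max(
        (match_degree(current_text, text) for text in historical_texts),
        default=0.0,
    )
    if best >= threshold:
        return best, None
    return best, f"Remark differs from historical remarks (match degree {best:.2f})"
```

The same function serves the longitudinal examination (past remarks of the same enterprise as `historical_texts`) and the horizontal one (standard remarks of the industry segment).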
It is understood that analyzing the sentence by NLP algorithm is a well-known technique in the art and will not be described herein.
According to the embodiment of the application, key content can be extracted from and identified in the remark information of an enterprise's financial data to detect the compliance of the remark information, and when an abnormal result is detected, prompt information is generated and fed back to the relevant personnel or institutions. This reduces manual involvement in reading the remark information and avoids the subjective limits that a reader's own ability places on interpreting it, which helps improve the objectivity, accuracy and efficiency of the analysis and processing of enterprise financial data.
Illustratively, due to the complexity of the financial remark and statement item relation examination, the method can convert the uploaded financial statement text information into a directed graph form, classify and mark the remark item information, and realize identification and early warning of non-standardized information.
For example, in this embodiment, when an office initiates a credit limit business or a single business process enters the stage of submission for approval and acceptance, the process of identifying the financial information of the enterprise may be triggered to monitor the financial risk of the enterprise. Prompt information is generated and sent to the compliance examiner, the approver and the lead approver, and is then fed back to the front office, which is responsible for checking the risk information and entering the check result into the credit process system for the approver to consult, thereby implementing comprehensive evaluation and approval of the financial status of the enterprise.
In some embodiments, the reporting data in the financial data may include a plurality of data items. In order to meet the requirement of multiple risk identification on the financial data, in this embodiment, after the financial data is acquired in step S101, the method may further include:
determining data items with a checking relationship from the plurality of data items;
verifying the data items with the checking relationship through a preset verification rule to obtain a verification result;
and generating second prompt information under the condition that the checking relation of the corresponding data items represented by the verification result is not established.
The checking relationship refers to the relationship between the related numbers in the account book and the accounting report form, which can be checked and examined with each other. For example, the sum of the end balance of each general classification account and the end balance of each secondary or detail classification account to which the general classification account belongs has a mutually consistent and checkable relationship. The checking relationship can comprise the internal checking relationship of the financial statements, the checking relationship among the financial statements, the checking relationship between the financial statements and the supplementary notes of the financial statements, and the like.
The financial statement data is mainly obtained from annual financial statement data in a credit system and documents related to credit service declaration, most of the annual financial statement audit documents are stored in a PDF format, and in some examples, the documents can be accessed, and document images are identified and text content is extracted through the steps S1011 to S1014, so that corresponding financial information is obtained.
After the financial information is obtained, the checking relationships between data items or between reports are determined for each data item in the financial report, either through a preset checking relationship configuration rule or through manual configuration. For example, the checking relationship configuration rule specifies that the total-amount data items for sales income, sales tax, factory cost of sales, sales expense, technology transfer fee and sales profit in the product sales detail list and the amount data items for the same items in the profit statement must agree with each other; these data items form a checking relationship.
The data items with a checking relationship are then verified through a preset verification rule. The verification rule can verify the correspondence between the data items and between the data in those items, and thus whether the checking relationship holds. The data items with a checking relationship, or their data, are traversed; if some data items or their data are found to be wrong, for example if the data of a total-amount data item and the amount data item of the same item are inconsistent, the verification result indicates that the checking relationship does not hold, and second prompt information is generated for operators or institutions.
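The verification of checking relationships can be sketched as follows, using the ledger example given earlier: the closing balance of a general ledger account should equal the sum of the closing balances of its subsidiary accounts. The item names, the amounts and the rounding tolerance are hypothetical.

```python
def verify_check_relations(data_items, relations, tolerance=0.01):
    """Verify that each total equals the sum of its component items.

    data_items: dict mapping item name -> amount.
    relations: list of (total_item, [component_items]) pairs.
    Returns a list of prompt messages for relationships that do not hold.
    """
    prompts = []
    for total_name, component_names in relations:
        total = data_items[total_name]
        component_sum = sum(data_items[name] for name in component_names)
        if abs(total - component_sum) > tolerance:
            prompts.append(
                f"Check failed for {total_name}: "
                f"{total} != {component_sum} (sum of {', '.join(component_names)})"
            )
    return prompts


# Made-up balances: the first relationship holds, the second does not.
items = {
    "ledger_receivables": 100.0, "sub_customer_a": 60.0, "sub_customer_b": 40.0,
    "ledger_inventory": 50.0, "sub_raw_materials": 30.0,
}
relations = [
    ("ledger_receivables", ["sub_customer_a", "sub_customer_b"]),
    ("ledger_inventory", ["sub_raw_materials"]),
]
second_prompts = verify_check_relations(items, relations)
```

Expressing the relationships as data mirrors the configurable checking relationship rule: new relationships are added without changing the verification logic.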
According to the embodiment of the application, items of checking relations existing in the financial statement can be checked one by one, and the items of incorrect relations of the enterprise financial statement are warned, so that errors are reduced, and the accuracy of enterprise financial statement identification is guaranteed.
In some embodiments, after acquiring the financial data at step S101, the method may further include:
determining target data items corresponding to different event risk types from a plurality of data items;
performing feature extraction on the data of the target data item through a plurality of preset event risk identification models to obtain event risk types corresponding to the report data;
and generating third prompt information according to the event risk type corresponding to the report data.
For example, the event risk type may be one or more of the following types:
Fictitious profit; namely, items related to inflating profit, such as fictitious income and costs or expenses, including concealing losses, with the aim of fabricating profit and misleading investors about the company's profitability. This specifically includes concealing the facts of transactions by fabricating sales counterparties, issuing false invoices, confusing accounting subjects in the books and similar actions, or concealing them through entrusted wealth management, related-party transactions, litigation, guarantees, occupation of funds by investors and the like, so that transactions go undisclosed or unrecorded.
Inflated assets; namely, items related to inflating assets, such as fictitious liabilities, which inflate the investment value and mislead investors. For example, assets that no longer have value are kept on the books, or no asset impairment provision is made for assets that have already been impaired, thereby inflating the value of the company's assets.
False records (e.g., misleading statements); namely, the party obliged to disclose information records, in the documents it discloses, matters that are inconsistent with the facts when fulfilling its disclosure obligation.
Major omissions; that is, the information disclosure document does not fully or only partially describe the matters to be described, and does not disclose the information according to the regulations, thereby weakening the timeliness and the relevance of the accounting information and influencing the decision of the investor.
For each of the above event risk types, different target data items may be determined to be identified by the corresponding event risk identification model.
As a specific example, the event risk identification model is a model using a Bayesian algorithm. Target data items can be selected from the plurality of data items to form key financial characteristic indexes, for example inventory increase/business income increase, inventory increase/main operation cost increase, inventory increase/accounts payable increase, fixed assets/accumulated depreciation and fixed assets/total assets. The corresponding data is input into the event risk identification model, and Bayesian discrimination identifies abnormal profit risk, that is, whether the enterprise shows obvious profit-related violations.
As a specific example, the event risk identification model is a model using a decision tree algorithm. Financial reports containing deceptive languages can be screened from the report data, linguistic features are extracted from the financial reports, original data features are subjected to dimensionality reduction through a principal component analysis method, and problematic financial reports are marked from the perspective of linguistic analysis through a decision tree algorithm.
As a specific example, the event risk identification model is a BP neural network model. Target data items are selected from the plurality of data items to form the enterprise's key financial characteristic indexes, the indexes of profit-related violations are reduced using a rough noise set, the data of the data items corresponding to the indexes is obtained through Monte Carlo simulation, and the data is input into the model for risk identification based on the BP neural network algorithm. A DEA efficiency index can also be introduced to further improve the model, which can effectively reduce its type II errors.
As a specific example, the event risk identification model is a logistic regression model. An equal number of non-violation financial information samples can be matched to the violation financial information samples, key pressure indexes and key opportunity indexes are selected from the plurality of data items in all samples according to the fraud triangle theory, and a logistic regression model is established. The model performs feature extraction and recognition on the financial statement to identify the risk of violations reflected by the corresponding financial statement.
As a specific example, the event risk identification model is a model of the AdaBoost algorithm. Selecting data items from the five aspects of profitability, growth capacity, operation capacity, repayment capacity and cash flow as key financial indexes, performing word segmentation and word frequency statistics on the text information of the financial information based on a preset financial dictionary through a text analysis algorithm (such as NLP), screening out characteristic variables, dividing a training set and a test set, establishing and training an identification model by using an AdaBoost algorithm to obtain an event risk identification model, performing feature extraction and identification processing on a financial statement through the trained event risk identification model, and performing risk identification on violation behaviors reflected by a corresponding financial statement.
As a specific example, the event risk identification model is a random forest model. A random forest model can first be built using existing industry report data. The existing industry annual report text data is then cleaned, the text is converted into numerical form using pre-trained word vectors, a classification model based on a deep neural network is built, and that model is optimized using an improved loss function. Finally, the two models together form the event risk identification model: both perform feature extraction and recognition on the financial statement, and their outputs are combined to obtain the final predicted value.
As a specific example, the event risk identification model is a RUSBoost model. Corresponding data items can be selected from the raw data of existing industry financial statements as key financial indexes, and the training and test sample data sets are divided according to the time lag of the violations. The identification model is constructed and trained using the RUSBoost algorithm to obtain the event risk identification model, which then performs feature extraction and recognition on the financial statement to identify the risk of violations reflected by the corresponding financial statement.
As a specific example, the event risk identification model is a Stacking model. Existing industry key financial data is selected as the characteristic variables, and the training and test sample data sets are divided according to the time lag of the violations. Taking into account the advantages and disadvantages of each machine learning method, single classifiers and a clustering algorithm are integrated to establish an event risk identification model with good predictive performance, which then performs feature extraction and recognition on the financial statement to identify the risk of violations reflected by the corresponding financial statement.
Illustratively, identifying the event risk of fictitious profit may include the following steps:
selecting, as target data items, data items such as net cash flow from business activities/total profit, net cash flow from business activities/total income, net cash flow from business activities/basic income, accounts receivable/income, inventory/accounts payable, accounts payable/income, accounts payable/total assets, business cost/income, financial expense/business income, monetary fund/(accounts receivable + bills), monetary fund/total assets, tax receivable/business income, asset-liability ratio, property ratio, current ratio, business cash ratio, cash flow structure ratio, business capital/total amount of money, cash flow rate, accounts receivable, account balance, asset turnover rate, profit before interest and tax/average total assets, and retained profit/total assets.
The data of the data items in this example is acquired and identified by an event risk identification model using the BP neural network, to determine whether the corresponding enterprise has an event risk of fictitious profit.
Illustratively, the risk identification of a fictitious profit may include the steps of:
selecting, from the financial statement data, data items such as inventory increase/operating income increase, inventory increase/main operating cost increase, inventory increase/accounts payable increase, fixed assets/accumulated depreciation, and fixed assets/total assets as target data items.
The data of the above data items are then acquired and identified by an event risk identification model based on a Bayesian algorithm, to determine whether the corresponding enterprise has an event risk of fictitious profits.
For example, risk identification of false records may include:
selecting, from the financial statement data, data items such as current ratio, quick ratio, interest coverage ratio, asset-liability ratio, total asset growth rate, net profit growth rate, operating income growth rate, accounts receivable turnover rate, current asset turnover rate, earnings per share, net assets per share, undistributed profit per share, retained earnings per share, net cash flow per share, return on assets, total asset turnover rate, return on net assets, operating profit margin, net operating margin, business index, cash adequacy ratio, ownership concentration, tradable share ratio, board size, and whether the chairman of the board also serves as general manager, as target data items.
The data of the above financial indicators are then acquired and identified by an event risk identification model based on the LIBSVM algorithm, to determine whether the corresponding enterprise has an event risk of false records. Actual verification shows that the model correctly identifies 94.8% of enterprises with distorted accounting information, which indicates good practical value.
According to the embodiments of the application, the identification of event risk types can be triggered when a manager initiates a credit line service or when a single business process enters the submission-for-review and acceptance stage. When an event risk type is identified, corresponding reminder information is generated to notify the relevant personnel or departments, which reduces their errors in analyzing enterprise financial data and helps ensure the accuracy of enterprise financial statement identification.
The information generation method according to the embodiment of the present application is described in detail above with reference to fig. 1 and 2, and the apparatus according to the embodiment of the present application will be described in detail below with reference to fig. 5.
As shown in fig. 5, the present application provides an information generating apparatus including:
a first obtaining module 501, configured to obtain financial information, where the financial information includes remark information;
a first identification module 502, configured to identify key content in the remark information through a preset language identification model, and generate first text information corresponding to the key content;
the first calculating module 503 is configured to perform matching calculation on the first text information and preset historical remark information to obtain a first calculation result, where the first calculation result is used to represent the matching degree between the first text information and the preset historical remark information;
the first generating module 504 is configured to generate first prompt information according to the first calculation result.
The information generation apparatus of the embodiments of the application can acquire financial information that includes remark information, identify key content in the remark information through a preset language identification model, and generate first text information corresponding to the key content. The key content of the remark information can thus be identified and extracted by the preset language identification model, which reduces manual operation and allows a more objective and accurate result in subsequent calculation. The apparatus then performs matching calculation on the first text information and preset historical remark information to obtain a first calculation result, which represents the matching degree between the first text information and the preset historical remark information, and generates first prompt information according to the first calculation result. The matching result can therefore be used to determine whether the first text information is compliant and whether it differs from the historical remark information, so that the compliance and accuracy of the remark information can be analyzed. The analysis result can in turn be applied to scenarios in which enterprise financial information is analyzed and evaluated, yielding accurate and objective analysis and evaluation results.
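The matching calculation and prompt generation can be illustrated with a small sketch. The application does not specify the similarity measure, so the use of `difflib.SequenceMatcher` from the Python standard library, the threshold of 0.6, and the wording of the prompt are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def matching_degree(first_text, history):
    """Highest similarity (0 to 1) between the text and any historical remark."""
    return max(
        (SequenceMatcher(None, first_text, h).ratio() for h in history),
        default=0.0,
    )

def first_prompt(first_text, history, threshold=0.6):
    """Return a prompt when the remark deviates from history, otherwise None."""
    score = matching_degree(first_text, history)
    if score < threshold:
        return (f"Remark differs from historical remarks "
                f"(matching degree {score:.2f}); please review.")
    return None

# Invented remark texts.
history = [
    "Inventory valued at the lower of cost and net realizable value.",
    "Fixed assets depreciated on a straight-line basis.",
]
print(first_prompt("Inventory valued at the lower of cost and net realizable value.", history))
print(first_prompt("Guarantee provided to an affiliated party.", history))
```

Taking the maximum over the historical remarks treats a remark as familiar if it closely matches any earlier one; other aggregation choices, such as an average, would be equally possible.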
In the embodiment of the application, the financial information may include report data and remark information of the report data.
To improve data processing efficiency when analyzing financial information, in the embodiments of the application, the main text content of unstructured data may be extracted through Optical Character Recognition (OCR) technology. Specifically, in this embodiment, the first obtaining module 501 may include:
the acquisition submodule is used for acquiring a first image, and the first image comprises text content of financial data;
the identification submodule is used for identifying the inclination angle of the first image through the angle identification model to obtain inclination angle information;
the deviation rectifying submodule is used for rectifying the deviation of the first image according to the inclination angle information to obtain a second image, and the second image comprises text content;
and the extraction submodule is used for extracting the text content in the second image through the optical character recognition model to obtain the financial information corresponding to the text content.
The first image may be an image of a paper document such as a bill, a report, or an invoice, captured by a scanner, a camera, or the like. The first image may be in any format, such as png, jpg, or bmp, and this embodiment is not limited thereto.
The angle recognition model may be a deep Convolutional Neural Network (CNN). In this embodiment, the CNN may be trained in advance on a large number of images of paper documents such as bills, statements, and invoices, together with their corresponding inclination angles, to obtain a trained angle recognition model. Given an image, the trained angle recognition model can then recognize the inclination angle of that image.
For example, the angle recognition model may employ a pre-trained VGG16 network model.
After the inclination angle information of the first image is determined through the angle identification model, the image can be rotated by a corresponding angle according to the inclination angle information, so that the deviation rectification processing is realized, and the text content in the image can be accurately identified subsequently.
After the first image is rotated by the corresponding angle, a second image is obtained. The second image contains the same text content as the first image but has a different inclination angle.
The optical character recognition (OCR) model can detect the position, area range, and layout of the text content in the second image, including the page layout and the lines of characters, and thereby determine which areas of the second image contain characters and how large those areas are.
After determining the position, range, and other information of the characters, the OCR model can recognize the text content within those positions and area ranges and convert the text content in the second image into text information.
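The geometry of the deviation rectification step can be sketched as below. The sketch only rotates point coordinates (for example, the corners of a text line) by the opposite of the recognized inclination angle; it does not resample pixels, which a real pipeline would do with an image library. The function names and the 30-degree example are hypothetical.

```python
import math

def rotate_point(x, y, cx, cy, degrees):
    """Rotate (x, y) around (cx, cy) by the given angle (counter-clockwise in a y-up frame)."""
    rad = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(rad) - dy * math.sin(rad),
            cy + dx * math.sin(rad) + dy * math.cos(rad))

def deskew_points(points, center, tilt_degrees):
    """Undo a tilt by rotating every point by the opposite angle."""
    cx, cy = center
    return [rotate_point(x, y, cx, cy, -tilt_degrees) for x, y in points]

# A horizontal text baseline, tilted by 30 degrees to simulate a skewed scan.
baseline = [(-1.0, 0.0), (1.0, 0.0)]
tilted = [rotate_point(x, y, 0.0, 0.0, 30.0) for x, y in baseline]
restored = deskew_points(tilted, (0.0, 0.0), 30.0)
print(restored)
```

In image coordinates, where the y axis points down, the visual direction of rotation is mirrored, but rotating by the negated angle still undoes the tilt.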
Illustratively, the optical character recognition model may employ a standard CNN model (removing the fully connected layer).
After the text information is obtained through recognition by the optical character recognition model, the information can be checked again to ensure its correctness. The text information can be checked and corrected by matching it against the vocabulary in a lexicon to determine whether it is correct.
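A lexicon-based check of recognized text might look like the following sketch. It uses `difflib.get_close_matches` from the Python standard library to propose a correction for tokens that are not in the lexicon; the lexicon contents, the cutoff of 0.8, and the report format are assumptions for illustration.

```python
from difflib import get_close_matches

def check_tokens(tokens, lexicon, cutoff=0.8):
    """Return (token, status, suggestion) for each recognized token."""
    report = []
    for tok in tokens:
        if tok in lexicon:
            report.append((tok, "ok", tok))
        else:
            match = get_close_matches(tok, lexicon, n=1, cutoff=cutoff)
            report.append((tok, "suspect", match[0] if match else None))
    return report

# Invented lexicon and OCR output; "invent0ry" simulates a misread character.
lexicon = ["receivable", "inventory", "revenue"]
for row in check_tokens(["revenue", "invent0ry", "qqqq"], lexicon):
    print(row)
```

Tokens with no sufficiently close lexicon entry are reported with a `None` suggestion, so they can be routed to manual review instead of being silently replaced.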
In some embodiments, the apparatus may further comprise:
the second calculation module is used for performing classification calculation on words and sentences contained in the first text information through a preset classifier to obtain a second calculation result, the classifier is obtained by training according to risk categories corresponding to sensitive word and sentence labels, and the sensitive word and sentence labels are from a preset database;
the third calculation module is used for performing feature extraction calculation on words and sentences of the first text information through a preset first model to obtain a third calculation result, and the first model is obtained by training according to event risk types corresponding to historical data;
and the fourth calculation module is used for performing fusion calculation on the second calculation result and the third calculation result to obtain risk evaluation data corresponding to the first text information.
Before the second calculation module performs the classification calculation, the first text information may be decomposed by a Natural Language Processing (NLP) algorithm into usable units such as phrases and words. Stemming is then performed on the decomposed first text information, the key words and sentences in the first text information are extracted, and syntactic and semantic analysis is performed on them to understand their structure and semantic information.
In this embodiment, after the key words and sentences of the first text information are obtained, they are converted into numerical information by word embedding. Word embedding maps each word to its own vector, and the vectors are typically dense. In the mapping process, each phrase is mapped in association with its surrounding phrases, and the resulting dense vectors help to better analyze and compare the key words and sentences and the context information in the remark information. Meanwhile, to suppress the noise produced by text that appears very frequently in a document but carries little information, a weighting algorithm used in information retrieval and data mining (such as Term Frequency-Inverse Document Frequency, TF-IDF) may be applied, so that the weight of frequently occurring phrases is reduced and the weight of phrases carrying more valuable information is highlighted.
After the key words and sentences of the first text information are obtained, the second calculation module can perform classification calculation on the words and sentences contained in the first text information using the preset classifier to obtain the second calculation result.
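The effect of TF-IDF weighting can be seen in a short sketch. It implements the plain, unsmoothed variant (term frequency times the logarithm of the inverse document frequency); the tokenized remarks are invented, and library implementations often add smoothing and normalization that are omitted here.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Invented tokenized remarks; "note" appears in every remark.
docs = [
    ["guarantee", "loan", "note"],
    ["loan", "note"],
    ["note", "litigation"],
]
weights = tf_idf(docs)
print(weights[0])
```

Because "note" occurs in every remark, its inverse document frequency is log(1) = 0 and its weight vanishes, while the rarer "guarantee" receives the largest weight in the first remark.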
The classifier can adopt one of classification algorithms such as an artificial neural network, a support vector machine, a Bayesian algorithm and the like to realize the word and sentence classification of the first text information.
For example, the first model may also adopt one of classification algorithms such as an artificial neural network, a support vector machine, a bayesian algorithm, and the like to classify words and sentences of the first text information and determine event risk types corresponding to the words and sentences.
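The application does not state how the second and third calculation results are fused. One simple possibility, shown purely as an assumption, is a weighted average of the per-risk-type scores produced by the classifier and by the first model; the risk type names, scores, and the weight of 0.5 are invented.

```python
def fuse(second_result, third_result, weight=0.5):
    """Weighted average of two {risk_type: score} dicts; a missing type counts as 0."""
    risk_types = set(second_result) | set(third_result)
    return {
        t: weight * second_result.get(t, 0.0) + (1 - weight) * third_result.get(t, 0.0)
        for t in risk_types
    }

# Invented scores from the classifier (second result) and the first model (third result).
second = {"guarantee_risk": 0.8, "litigation_risk": 0.2}
third = {"guarantee_risk": 0.6}

risk_evaluation = fuse(second, third)
top_risk = max(risk_evaluation, key=risk_evaluation.get)
print(top_risk, risk_evaluation)
```

The weight could be tuned on validation data so that the more reliable of the two models contributes more to the risk evaluation data.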
In some embodiments, the financial data further comprises reporting data, the reporting data comprising a plurality of data items; the apparatus may further comprise:
the first determining module is used for determining, from the plurality of data items, data items that have a checking relationship;
the verification module is used for verifying the data items that have the checking relationship through a preset verification rule to obtain a verification result;
and the second generation module is used for generating second prompt information when the verification result indicates that the checking relationship between the corresponding data items does not hold.
According to the embodiments of the application, the items in the financial statement that have checking relationships can be checked one by one, and a warning is issued for items of the enterprise financial statement whose relationships do not hold, which reduces errors and helps ensure the accuracy of enterprise financial statement identification.
In some embodiments, the financial data further comprises reporting data, the reporting data comprising a plurality of data items; the apparatus may further include:
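Verification of checking relationships can be sketched as a list of rules, each comparing two expressions over the report data items. The two rules below are common accounting identities chosen for illustration; the rule set actually used, the field names, and the tolerance are assumptions.

```python
# Each rule: (description, left-hand side, right-hand side), evaluated on the report dict.
CHECK_RULES = [
    ("total assets = total liabilities + owners' equity",
     lambda d: d["total_assets"],
     lambda d: d["total_liabilities"] + d["owners_equity"]),
    ("total profit = operating profit + non-operating income - non-operating expense",
     lambda d: d["total_profit"],
     lambda d: d["operating_profit"] + d["non_operating_income"] - d["non_operating_expense"]),
]

def verify(report, rules=CHECK_RULES, tolerance=0.01):
    """Return one prompt per checking relationship that does not hold."""
    prompts = []
    for description, left, right in rules:
        if abs(left(report) - right(report)) > tolerance:
            prompts.append(f"Checking relationship not satisfied: {description}")
    return prompts

# Invented report data; total_assets is off by 100.
report = {
    "total_assets": 2100.0, "total_liabilities": 1200.0, "owners_equity": 800.0,
    "total_profit": 150.0, "operating_profit": 140.0,
    "non_operating_income": 15.0, "non_operating_expense": 5.0,
}
for prompt in verify(report):
    print(prompt)
```

A small tolerance avoids false warnings caused by rounding in the reported figures.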
the second determining module is used for determining target data items corresponding to different event risk types from the plurality of data items;
the characteristic extraction module is used for extracting characteristics of the data of the target data item through a plurality of preset event risk identification models to obtain an event risk type corresponding to the report data;
and the third generation module is used for generating third prompt information according to the event risk type corresponding to the report data.
According to the embodiments of the application, the identification of event risk types can be triggered when a manager initiates a credit line service or when a single business process enters the submission-for-review and acceptance stage. When an event risk type is identified, corresponding reminder information is generated to notify the relevant personnel or departments, which reduces their errors in analyzing enterprise financial data and helps ensure the accuracy of enterprise financial statement identification.
It should be noted that, for all relevant content of each step in the above method embodiments, reference may be made to the functional description of the corresponding functional module, and the corresponding technical effects can be achieved; for brevity, details are not repeated here.
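The flow from target data items through several event risk identification models to the third prompt information can be sketched as follows. The threshold rules stand in for trained models and are entirely hypothetical, as are the item names and the prompt wording.

```python
def identify_event_risks(report, models):
    """Run each risk model on its own target data items.

    models maps a risk type to (target_item_names, predict), where predict
    takes the list of item values and returns True when the risk is detected.
    """
    detected = []
    for risk_type, (item_names, predict) in models.items():
        values = [report[name] for name in item_names]
        if predict(values):
            detected.append(risk_type)
    return detected

def third_prompt(detected):
    """Build the reminder text, or return None when no risk type was detected."""
    if not detected:
        return None
    return "Event risk identified: " + ", ".join(detected)

# Hypothetical stand-ins for trained models; real models would be learned from data.
MODELS = {
    "fictitious profit": (["receivables_to_income"], lambda v: v[0] > 0.7),
    "false record": (["current_ratio", "asset_liability_ratio"],
                     lambda v: v[0] < 1.0 and v[1] > 0.8),
}

report = {"receivables_to_income": 0.9, "current_ratio": 1.5, "asset_liability_ratio": 0.6}
print(third_prompt(identify_event_risks(report, MODELS)))
```

Keeping the target data items next to each model makes it explicit that different event risk types are identified from different subsets of the report data.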
Fig. 6 shows a hardware structure diagram of an electronic device provided in an embodiment of the present application.
The electronic device may comprise a processor 601 and a memory 602 in which computer program instructions are stored.
Specifically, the processor 601 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 602 may include mass storage for data or instructions. By way of example, and not limitation, memory 602 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 602 may include removable or non-removable (or fixed) media, where appropriate. The memory 602 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 602 is a non-volatile solid-state memory.
The memory may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the methods according to an aspect of the application.
The processor 601 realizes any one of the information generation methods in the above embodiments by reading and executing computer program instructions stored in the memory 602.
In one example, the electronic device may also include a communication interface 603 and a bus 610. As shown in fig. 6, the processor 601, the memory 602, and the communication interface 603 are connected via a bus 610 to complete communication therebetween.
The communication interface 603 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
Bus 610 includes hardware, software, or both to couple the components of the electronic device to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 610 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the present application, any suitable buses or interconnects are contemplated by the present application.
In addition, in combination with the information generation method in the foregoing embodiments, the embodiments of the present application may provide a computer storage medium for implementation. The computer storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any one of the information generation methods in the above embodiments.
In combination with the information generation method in the foregoing embodiments, an embodiment of the present application provides a computer program product, where instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to execute any one of the information generation methods in the foregoing embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (13)

1. An information generating method, characterized in that the method comprises:
acquiring financial information, wherein the financial information comprises remark information;
identifying key contents in the remark information through a preset language identification model, and generating first text information corresponding to the key contents;
performing matching calculation on the first text information and preset historical remark information to obtain a first calculation result, wherein the first calculation result is used for representing the matching degree of the first text information and the preset historical remark information;
and generating first prompt information according to the first calculation result.
2. The method of claim 1, wherein said obtaining financial information comprises:
acquiring a first image, wherein the first image comprises text content of the financial data;
identifying the inclination angle of the first image through an angle identification model to obtain inclination angle information;
according to the inclination angle information, performing deviation rectification processing on the first image to obtain a second image, wherein the second image comprises the text content;
and extracting the text content in the second image through an optical character recognition model to obtain financial information corresponding to the text content.
3. The method of claim 1, wherein after the generating the first text information corresponding to the key content, the method further comprises:
classifying and calculating words and sentences contained in the first text information through a preset classifier to obtain a second calculation result, wherein the classifier is obtained by training according to risk categories corresponding to sensitive word and sentence labels, and the sensitive word and sentence labels are from a preset database;
performing feature extraction calculation on words and sentences of the first text information through a preset first model to obtain a third calculation result, wherein the first model is obtained by training according to event risk types corresponding to historical financial data;
and performing fusion calculation on the second calculation result and the third calculation result to obtain risk evaluation data corresponding to the first text information.
4. The method of claim 1, wherein the financial data further comprises reporting data, the reporting data comprising a plurality of data items; after the obtaining financial data, the method further comprises:
determining, from the plurality of data items, data items having a checking relationship;
verifying the data items having the checking relationship through a preset verification rule to obtain a verification result;
and generating second prompt information when the verification result indicates that the checking relationship of the corresponding data items does not hold.
5. The method of claim 1, wherein the financial data further comprises reporting data, the reporting data comprising a plurality of data items; after the obtaining financial data, the method further comprises:
determining target data items corresponding to different event risk types from the plurality of data items;
performing feature extraction on the data of the target data item through a plurality of preset event risk identification models to obtain an event risk type corresponding to the report data;
and generating third prompt information according to the event risk type corresponding to the report data.
6. An information generating apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring financial information, and the financial information comprises remark information;
the first identification module is used for identifying key contents in the remark information through a preset language identification model and generating first text information corresponding to the key contents;
the first calculation module is used for performing matching calculation on the first text information and preset historical remark information to obtain a first calculation result, and the first calculation result is used for representing the matching degree of the first text information and the preset historical remark information;
and the first generating module is used for generating first prompt information according to the first calculation result.
7. The apparatus of claim 6, wherein the first obtaining module comprises:
the acquisition submodule is used for acquiring a first image, and the first image comprises text content of the financial data;
the identification submodule is used for identifying the inclination angle of the first image through an angle identification model to obtain inclination angle information;
the deviation rectifying submodule is used for rectifying the deviation of the first image according to the inclination angle information to obtain a second image, and the second image comprises the text content;
and the extraction submodule is used for extracting the text content in the second image through an optical character recognition model to obtain financial information corresponding to the text content.
8. The apparatus of claim 6, further comprising:
the second calculation module is used for performing classification calculation on words and sentences contained in the first text information through a preset classifier to obtain a second calculation result, the classifier is obtained by training according to risk categories corresponding to sensitive word and sentence labels, and the sensitive word and sentence labels are from a preset database;
the third calculation module is used for performing feature extraction calculation on words and sentences of the first text information through a preset first model to obtain a third calculation result, wherein the first model is obtained by training according to event risk types corresponding to historical data;
and the fourth calculation module is used for performing fusion calculation on the second calculation result and the third calculation result to obtain risk evaluation data corresponding to the first text information.
9. The apparatus of claim 8, wherein the financial data further comprises reporting data, the reporting data comprising a plurality of data items; the device further comprises:
the first determining module is used for determining, from the plurality of data items, data items having a checking relationship;
the verification module is used for verifying the data items having the checking relationship through a preset verification rule to obtain a verification result;
and the second generation module is used for generating second prompt information when the verification result indicates that the checking relationship of the corresponding data items does not hold.
10. The apparatus of claim 6, wherein the financial data further comprises reporting data, the reporting data comprising a plurality of data items; the device further comprises:
the second determining module is used for determining target data items corresponding to different event risk types from the plurality of data items;
the characteristic extraction module is used for extracting characteristics of the data of the target data item through a plurality of preset event risk identification models to obtain an event risk type corresponding to the report data;
and the third generation module is used for generating third prompt information according to the event risk type corresponding to the report data.
11. An electronic device, characterized in that the device comprises: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements the information generating method of any of claims 1-5.
12. A computer storage medium having computer program instructions stored thereon, which when executed by a processor implement the information generating method of any one of claims 1 to 5.
13. A computer program product, wherein instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the information generating method of any one of claims 1-5.
CN202210299639.9A 2022-03-25 2022-03-25 Information generation method, device, equipment and computer storage medium Pending CN114662457A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210299639.9A CN114662457A (en) 2022-03-25 2022-03-25 Information generation method, device, equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210299639.9A CN114662457A (en) 2022-03-25 2022-03-25 Information generation method, device, equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN114662457A true CN114662457A (en) 2022-06-24

Family

ID=82030559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210299639.9A Pending CN114662457A (en) 2022-03-25 2022-03-25 Information generation method, device, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN114662457A (en)

Similar Documents

Publication Publication Date Title
Zhaokai et al. Contract analytics in auditing
US20230004888A1 (en) Ai-augmented auditing platform including techniques for applying a composable assurance integrity framework
US20140258169A1 (en) Method and system for automated verification of customer reviews
CN112150298B (en) Data processing method, system, device and readable medium
US20210201266A1 (en) Systems and methods for processing claims
Amin et al. Application of optimistic and pessimistic OWA and DEA methods in stock selection
Fieberg et al. Machine learning in accounting research
KR20210029326A (en) Apparatus and method for diagnosing soundness of company using unstructured financial information
Khadivizand et al. Towards intelligent feature engineering for risk-based customer segmentation in banking
CN117114812A (en) Financial product recommendation method and device for enterprises
US20220301072A1 (en) Systems and methods for processing claims
CN114662457A (en) Information generation method, device, equipment and computer storage medium
Sun et al. Using an ensemble LSTM model for financial statement fraud detection
Doultani et al. Smart Underwriting-A Personalised Virtual Agent
Ravula Bankruptcy prediction using disclosure text features
Krause Beyond the Numbers: Unlocking Textual Insights from Parts II and III of Form 1-A for Regulation A Filings
Lagusto Predicting Fraudulent Financial Statement using Textual Analysis and Machine-Learning Techniques
Kellerman Evaluating the effectiveness of Benford's law as an investigative tool for forensic accountants
Eni Considerations on the use of XBRL during the financial audit missions: Approach of a model
Nugent Assessing Completeness of Solvency and Financial Condition Reports through the use of Machine Learning and Text Classification
Verhoog Quantifying and analysing complicated financial text data
AU2024202125A1 (en) Systems and methods for processing claims
Chen et al. Construction of Bank Credit White List Access System Based on Grey Clustering Algorithm
CN117751362A (en) AI-enhanced audit platform including techniques for applying combinable assurance integrity frameworks
AU2015207809A1 (en) Risk based data assessment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination