CN117789230A - Key information extraction method and device and contract signing supervision method and device - Google Patents

Key information extraction method and device and contract signing supervision method and device

Info

Publication number
CN117789230A
Authority
CN
China
Prior art keywords
information
verification
model
splitting
account
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311819896.1A
Other languages
Chinese (zh)
Inventor
李泊言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311819896.1A
Publication of CN117789230A

Landscapes

  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The key information extraction method and device and the contract signing supervision method and device can be applied in the technical fields of big data and artificial intelligence. The key information extraction method comprises the following steps: identifying bank account information in a target contract file and verification information corresponding to the bank account information; inputting the bank account information into an information removal model to filter out invalid information, and outputting account filtering information; splitting the account filtering information by using an information splitting model according to a preset splitting rule, and outputting N pieces of split key information, wherein N is a positive integer; based on the verification information, performing validity verification on each of the N pieces of split key information by using an information verification model, and outputting a verification result; and, in response to the verification result being a pass, outputting the corresponding split key information as effective key information.

Description

Key information extraction method and device and contract signing supervision method and device
Technical Field
The invention relates to the technical field of big data and artificial intelligence, in particular to a key information extraction method and device and a contract signing supervision method and device.
Background
At present, contracts between a bank and its clients are usually signed through an electronic signing system, with the customer manager completing the signing with the client on behalf of the bank. After the contract is signed, the customer manager must carry out a series of contract performance steps, such as entering the signed account into the supervisory system and carrying out fund operations as the contract requires. These steps are not directly linked to the signing stage of the workflow and must be operated and managed separately by the customer manager.
However, because key information is currently verified manually during contract management, efficiency is low and there is a risk of operating errors. In addition, because the performance steps are separated from the signing workflow, the customer manager may not act in time, which can delay key operations related to the contract. For example, if the customer manager omits or delays a performance step, such as failing to enter the account into the supervisory system in time, the bank may be unable to intercept a suspicious transaction in time, leading to contract violations and related risk events.
Disclosure of Invention
In view of the above-mentioned problems, according to a first aspect of the present invention, there is provided a key information extraction method comprising: identifying bank account information in a target contract file and verification information corresponding to the bank account information; inputting the bank account information into an information removal model to filter out invalid information, and outputting account filtering information; splitting the account filtering information by using an information splitting model according to a preset splitting rule, and outputting N pieces of split key information, wherein N is a positive integer; based on the verification information, performing validity verification on each of the N pieces of split key information by using an information verification model, and outputting a verification result; and, in response to the verification result being a pass, outputting the corresponding split key information as effective key information.
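The steps of the first aspect can be sketched as a small pipeline. This is only an illustrative sketch: the function name `extract_key_information` and the three stand-in callables (`keep_digits`, `split_every_19`, `has_19_digits`) are assumptions made for the example and do not come from the patent, which uses trained or rule-based models for each stage and verifies against separately extracted verification information.

```python
from typing import Callable, List


def extract_key_information(
    raw_account_text: str,
    remove_invalid: Callable[[str], str],
    split: Callable[[str], List[str]],
    verify: Callable[[str], bool],
) -> List[str]:
    """Filter, split, and verify account text; return only the pieces that pass."""
    # Filter invalid information to obtain the account filtering information.
    filtered = remove_invalid(raw_account_text)
    # Split the filtered text into N pieces of split key information.
    pieces = split(filtered)
    # Keep a piece only when its verification result is a pass.
    return [piece for piece in pieces if verify(piece)]


# Hypothetical stand-ins for the three models (assumptions, not the patent's models).
def keep_digits(text: str) -> str:
    return "".join(ch for ch in text if ch in "0123456789")


def split_every_19(digits: str) -> List[str]:
    # Assumes every account number is exactly 19 digits long.
    return [digits[i:i + 19] for i in range(0, len(digits), 19)]


def has_19_digits(piece: str) -> bool:
    return len(piece) == 19


accounts = extract_key_information(
    "6222 0000 0000 0000 001, 6222-0000-0000-0000-002x",
    keep_digits,
    split_every_19,
    has_19_digits,
)
```

Passing the stages in as callables keeps the control flow of the claim visible while leaving each model replaceable.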
According to some exemplary embodiments, splitting the account filtering information by using the information splitting model according to the preset splitting rule and outputting the N pieces of split key information includes: defining the splitting rule according to the length and format of bank account numbers; performing feature engineering on the account filtering information based on the splitting rule, and outputting an account character sequence; and inputting the account character sequence into the information splitting model to identify account boundaries, and outputting the N pieces of split key information.
According to some exemplary embodiments, the information splitting model is obtained by training a Transformer model, wherein the training set of the information splitting model comprises original text data containing a single account or consecutive accounts, together with the corresponding identification information.
According to some exemplary embodiments, the information verification model includes a credential information verification model, an identity information verification model, and a contact information verification model, and the verification information includes a credential number, an identity card number, and a mobile phone number. Performing validity verification on the N pieces of split key information by using the information verification model based on the verification information, and outputting the verification result, specifically includes: writing a first regular expression based on structural features of the credential information as the credential information verification model, and verifying the validity of the credential number by using the credential information verification model to obtain a first verification result; implementing a verification algorithm according to the standard of the identity information as the identity information verification model, and verifying the structure and validity of the identity card number by using the identity information verification model to obtain a second verification result; writing a second regular expression according to the standard of the mobile phone number as the contact information verification model, and verifying the format and length of the mobile phone number by using the contact information verification model to obtain a third verification result; and, in response to the first verification result, the second verification result and the third verification result all being a pass, outputting the verification result as a pass.
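To make the role of the splitting rule concrete, the following sketch splits a digit string into accounts using only a length rule. It is a rule-based stand-in, not the trained information splitting model described in the text; the function name `split_accounts` and the default length of 19 digits are assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence


def split_accounts(digits: str, allowed_lengths: Sequence[int] = (19,)) -> List[str]:
    """Split a digit string into accounts whose lengths all appear in allowed_lengths.

    Returns an empty list when no exact segmentation exists.
    """
    lengths = sorted(set(allowed_lengths), reverse=True)

    def solve(start: int) -> Optional[List[str]]:
        # Reaching the end of the string means the segmentation is complete.
        if start == len(digits):
            return []
        # Try each allowed length; backtrack when the remainder cannot be split.
        for n in lengths:
            if start + n <= len(digits):
                rest = solve(start + n)
                if rest is not None:
                    return [digits[start:start + n]] + rest
        return None

    result = solve(0)
    return result if result is not None else []
```

When several lengths are allowed, a pure length rule can be ambiguous (a 35-digit string may be 16+19 or 19+16), which is one plausible reason for using a learned boundary model on top of the rule.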
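The three verification models can be sketched as follows. The identity card check uses the GB 11643-1999 weighted check character for 18-digit resident identity numbers. The credential pattern `CREDENTIAL_RE` is only a placeholder, because the text does not state which credential format is meant, and the mobile pattern assumes 11-digit mainland China numbers; all function names are hypothetical.

```python
import re
from datetime import datetime

# Placeholder pattern (assumption): the patent does not specify the credential format.
CREDENTIAL_RE = re.compile(r"[A-Z0-9]{8,18}")
# Assumed pattern for mainland China mobile numbers: 11 digits, starting with 1, then 3-9.
MOBILE_RE = re.compile(r"1[3-9][0-9]{9}")

# GB 11643-1999 weights for the first 17 digits and the check-character table.
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
ID_CHECK_CHARS = "10X98765432"


def verify_credential(number: str) -> bool:
    """First verification result: regular-expression check of the credential number."""
    return CREDENTIAL_RE.fullmatch(number) is not None


def verify_id_card(number: str) -> bool:
    """Second verification result: structure, birth date, and check character."""
    if re.fullmatch(r"[0-9]{17}[0-9X]", number) is None:
        return False
    try:
        # Characters 7 to 14 encode the birth date as YYYYMMDD.
        datetime.strptime(number[6:14], "%Y%m%d")
    except ValueError:
        return False
    total = sum(int(d) * w for d, w in zip(number[:17], ID_WEIGHTS))
    return ID_CHECK_CHARS[total % 11] == number[17]


def verify_mobile(number: str) -> bool:
    """Third verification result: format and length of the mobile phone number."""
    return MOBILE_RE.fullmatch(number) is not None


def verify_all(credential: str, id_card: str, mobile: str) -> bool:
    """The overall result is a pass only when all three results are a pass."""
    return verify_credential(credential) and verify_id_card(id_card) and verify_mobile(mobile)
```

For example, `verify_id_card("11010519491231002X")` checks a commonly used sample number whose weighted sum is 167, giving remainder 2 and check character "X".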
According to some exemplary embodiments, the identifying bank account information in the target contract file and the verification information corresponding to the bank account information specifically includes: converting the target contract file into an editable text based on OCR technology; acquiring a target keyword, and determining the position of the target keyword in the editable text by using a character string matching algorithm; setting search parameters based on the bank account information and the verification information; and identifying the bank account information and the verification information from the editable text based on the search parameters and the location.
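Assuming OCR has already produced the editable text, the keyword location and search steps might look like the sketch below. The keyword "Account No.", the window size of 40 characters, and the digit pattern are illustrative search parameters, not values given in the text, and both function names are hypothetical.

```python
import re
from typing import List


def find_keyword_positions(text: str, keyword: str) -> List[int]:
    """Return every start index at which keyword occurs in text."""
    positions = []
    start = text.find(keyword)
    while start != -1:
        positions.append(start)
        start = text.find(keyword, start + 1)
    return positions


def extract_after_keyword(
    text: str,
    keyword: str,
    window: int = 40,
    pattern: str = r"[0-9][0-9 \-]{10,30}[0-9]",
) -> List[str]:
    """Search a fixed-size window after each keyword occurrence for a digit run."""
    results = []
    for pos in find_keyword_positions(text, keyword):
        begin = pos + len(keyword)
        match = re.search(pattern, text[begin:begin + window])
        if match:
            results.append(match.group())
    return results


sample = "Party A. Account No.: 6222 0212 3456 7890 123 Phone: 13800138000"
found = extract_after_keyword(sample, "Account No.")
```

The extracted string still contains spaces, so it would next be passed to the filtering step.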
According to some exemplary embodiments, inputting the bank account information into the information removal model to filter out invalid information and outputting the account filtering information specifically includes: writing a custom filter as the information removal model, wherein the custom filter is used to filter out spaces, special characters, non-numeric characters, Chinese characters, letters and/or punctuation marks in the bank account information.
According to a second aspect of the present invention, there is provided a method of supervising contract signing, the method comprising: acquiring information of a target contract and a file stream corresponding to the target contract; converting the file stream corresponding to the target contract into a target contract file; acquiring effective key information in the target contract file by the method according to the first aspect, and adding the effective key information to a database of a supervision model; and, in response to a transaction occurring on the account corresponding to the effective key information, auditing and verifying the details of the transaction by using the supervision model, and outputting a supervision result.
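A custom filter of this kind can be as small as one regular expression that drops everything except ASCII digits. The function name `filter_account` is an assumption for illustration; the text does not prescribe an implementation.

```python
import re

# Anything that is not an ASCII digit counts as invalid information here.
NON_DIGIT_RE = re.compile(r"[^0-9]")


def filter_account(raw: str) -> str:
    """Remove spaces, punctuation, letters, Chinese characters and other non-digits."""
    return NON_DIGIT_RE.sub("", raw)


cleaned = filter_account(" 6222-0212 3456账号7890 abc123。")
```

Using the explicit class `[^0-9]` rather than `\D` avoids keeping non-ASCII digit characters, which Python's `\d` would otherwise treat as digits.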
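The text does not describe the internals of the supervision model, so the following is only a toy sketch of the last two steps: verified accounts are added to a registry, and a transaction on a registered account is routed to audit. The class name `SupervisionRegistry`, the transaction dictionary layout, and the result strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class SupervisionRegistry:
    """Toy stand-in for the database of the supervision model (an assumption)."""

    accounts: Set[str] = field(default_factory=set)

    def register(self, effective_key_information: Iterable[str]) -> None:
        # Add verified account numbers to the set of supervised accounts.
        self.accounts.update(effective_key_information)

    def review(self, transaction: Dict[str, str]) -> str:
        # A transaction on a supervised account is flagged for audit and verification.
        if transaction.get("account") in self.accounts:
            return "flag_for_audit"
        return "not_supervised"


registry = SupervisionRegistry()
registry.register(["6222000000000000001"])
outcome = registry.review({"account": "6222000000000000001", "amount": "100.00"})
```

Because registration happens as soon as extraction finishes, supervision no longer depends on a separate manual entry step.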
According to some exemplary embodiments, the converting the file stream corresponding to the target contract into the target contract file specifically includes: decoding the file stream encoded by base64, determining a format of the file stream; and writing the file stream into a disk file based on the format, and creating a readable file as the target contract file.
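A minimal sketch of this conversion, assuming the format is detected from the leading magic bytes of the decoded data. The function names, the set of recognised formats, and the default file stem `contract` are assumptions for illustration.

```python
import base64
from pathlib import Path
from typing import Union

# Well-known leading byte signatures and the file extension each one implies.
MAGIC_NUMBERS = (
    (b"%PDF", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
)


def detect_extension(data: bytes) -> str:
    """Guess the file format from the first bytes of the decoded stream."""
    for magic, extension in MAGIC_NUMBERS:
        if data.startswith(magic):
            return extension
    return ".bin"  # unknown format


def stream_to_file(encoded: Union[str, bytes], out_dir: Union[str, Path], stem: str = "contract") -> Path:
    """Decode a Base64 stream, determine its format, and write it to disk."""
    # validate=True rejects characters outside the Base64 alphabet.
    data = base64.b64decode(encoded, validate=True)
    path = Path(out_dir) / (stem + detect_extension(data))
    path.write_bytes(data)
    return path
```

The returned path can then be handed to the OCR step as the target contract file.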
According to a third aspect of the present invention, there is provided a key information extraction apparatus comprising: the information identification module is used for: identifying bank account information in a target contract file and verification information corresponding to the bank account information; an invalid information filtering module, configured to: inputting the bank account information into an information removal model to filter invalid information, and outputting account filtering information; the information splitting module is used for: splitting the account filtering information by using an information splitting model according to a preset splitting rule, and outputting N pieces of splitting key information, wherein N is a positive integer; the validity verification module is used for: based on the verification information, respectively carrying out validity verification on the N pieces of split key information by utilizing an information verification model, and outputting a verification result; and an effective key information output module for: and outputting the corresponding split key information as effective key information in response to the passing of the verification result.
According to some exemplary embodiments, the information recognition module may include an editable text conversion unit, a target keyword determination unit, a search parameter setting unit, and a recognition unit.
According to some example embodiments, the editable text conversion unit may be configured to convert the target contract file into editable text based on OCR technology.
According to some exemplary embodiments, the target keyword determining unit may be configured to obtain a target keyword, and determine a position of the target keyword in the editable text using a string matching algorithm.
According to some exemplary embodiments, the search parameter setting unit may be configured to set a search parameter based on the bank account information and the verification information.
According to some exemplary embodiments, the identifying unit may be configured to identify the bank account information and the verification information from the editable text based on the search parameter and the location.
According to some example embodiments, the invalidation information filtering module may include a custom filter writing unit.
According to some exemplary embodiments, the custom filter writing unit may be configured to write a custom filter as the information removal model, where the custom filter is configured to filter out spaces, special characters, non-numeric characters, Chinese characters, letters, and/or punctuation marks in the bank account information.
According to some example embodiments, the information splitting module may include a splitting rule definition unit, a feature engineering unit, and an account boundary identification unit.
According to some exemplary embodiments, the splitting rule definition unit may be configured to define the splitting rule according to a length and a format of a bank account number.
According to some exemplary embodiments, the feature engineering unit may be configured to perform feature engineering on the account filtering information based on the splitting rule, and output an account character sequence.
According to some exemplary embodiments, the account boundary recognition unit may be configured to input the account character sequence into the information splitting model to perform account boundary recognition, and output N pieces of splitting key information.
According to some example embodiments, the validity verification module may include a credential information verification unit, an identity information verification unit, a contact information verification unit, and a verification result output unit.
According to some exemplary embodiments, the credential information verification unit may be configured to write a first regular expression based on structural features of credential information as the credential information verification model, and to verify the validity of the credential number by using the credential information verification model, to obtain a first verification result.
According to some exemplary embodiments, the identity information verification unit may be configured to implement a verification algorithm according to the standard of identity information as the identity information verification model, and to verify the structure and validity of the identity card number by using the identity information verification model, to obtain a second verification result.
According to some exemplary embodiments, the contact information verification unit may be configured to write a second regular expression according to a standard of the mobile phone number, and use the second regular expression as the contact information verification model to verify a format and a length of the mobile phone number, so as to obtain a third verification result.
According to some exemplary embodiments, the verification result output unit may be configured to output the verification result as pass in response to the first, second, and third verification results being all pass.
According to a fourth aspect of the present invention, there is provided a supervision apparatus for contract signing, the apparatus comprising: a file stream acquisition module, configured to acquire information of a target contract and a file stream corresponding to the target contract; a target contract file conversion module, configured to convert the file stream corresponding to the target contract into a target contract file; an effective key information acquisition module, configured to acquire effective key information in the target contract file by the method according to the first aspect, and add the effective key information to a database of a supervision model; and a supervision result output module, configured to, in response to a transaction occurring on the account corresponding to the effective key information, audit and verify the details of the transaction by using the supervision model, and output a supervision result.
According to some example embodiments, the target contract document conversion module may include a decoding unit and a document generation unit.
According to some exemplary embodiments, the decoding unit may be configured to decode the file stream encoded by base64, and determine a format of the file stream.
According to some exemplary embodiments, the file generating unit may be configured to write the file stream to a disk file based on the format, and create a readable file as the target contract file.
According to a fifth aspect of the present invention, there is provided an electronic device comprising: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
According to a sixth aspect of the present invention there is provided a computer readable storage medium having stored thereon executable instructions which when executed by a processor cause the processor to perform the method as described above.
According to a seventh aspect of the present invention there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
One or more of the above embodiments have the following advantages or benefits: according to the key information extraction method provided by the invention, the key information list of the contract can be formed by splitting the effective information and checking, which means that the computer can automatically extract the key information in the contract file, so that manual intervention is not required, and the user experience is improved; the account information is filtered by the information removal model, so that the processing load can be reduced, the calculation load is lightened, and the calculation efficiency is improved; meanwhile, by combining a plurality of models and algorithms, the system can automatically, rapidly and accurately extract key information in the contract file, so that manual intervention is reduced, and user satisfaction is improved. According to the contract signing supervision method provided by the invention, automation from data collection and processing to final transaction supervision is realized, supervision efficiency and accuracy are improved, and the requirement of manual intervention is reduced.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following description of embodiments of the invention with reference to the accompanying drawings, in which:
Fig. 1 schematically illustrates an application scenario diagram of a key information extraction method, device, equipment and medium according to an embodiment of the invention.
Fig. 2 schematically shows a flow chart of a key information extraction method according to an embodiment of the invention.
Fig. 3 schematically shows a flowchart of a method of identifying bank account information and verification information according to an embodiment of the invention.
Fig. 4 schematically shows a flow chart of a method of filtering invalid information according to an embodiment of the present invention.
Fig. 5 schematically illustrates a flow chart of a method of splitting account filter information according to an embodiment of the invention.
Fig. 6 schematically shows a flow chart of a method of performing validity verification according to an embodiment of the invention.
Fig. 7 schematically shows a flow chart of a method of supervision of contract subscriptions in accordance with an embodiment of the invention.
Fig. 8 schematically shows a flow chart of a method of converting a filestream into a target contract file according to an embodiment of the present invention.
Fig. 9 schematically illustrates a workflow diagram for audit verification in response to a supervised user initiating a transaction, according to an embodiment of the present invention.
Fig. 10 schematically shows a block diagram of the key information extraction apparatus according to an embodiment of the present invention.
Fig. 11 schematically shows a block diagram of a supervision apparatus for contract signing according to an embodiment of the present invention.
Fig. 12 schematically shows a block diagram of an electronic device adapted to a key information extraction method and a supervision method of contract subscriptions according to an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression such as "at least one of A, B and C, etc." is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together).
In the technical scheme of the invention, the acquisition, storage, application and other handling of users' personal information comply with the relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
First, technical terms described herein are explained and illustrated as follows.
The Transformer model is a deep learning model based on a self-attention mechanism. It is mainly used to process sequential input data such as natural language and is widely applied in natural language processing (NLP) and computer vision (CV). Unlike a traditional recurrent neural network (RNN), it can process all input data at once, which makes it more efficient on long sequences. At the core of the Transformer model is the self-attention mechanism, which lets the model assign a different weight to each position in the input sequence, so that the importance of different positions is taken into account when the output is generated.
OCR (Optical Character Recognition) is a technology capable of converting text in an image into machine-readable text. It is mainly applied to extracting text information from printed documents, handwritten manuscripts or image files. OCR is widely used in a variety of fields, including document digitization, automated document processing, data warehousing, and accessibility technology.
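For reference, the weighting described above is usually written as scaled dot-product attention. The formula is the standard formulation and is not given in the patent text; the symbols are introduced here for illustration, with Q, K and V denoting the query, key and value matrices derived from the input sequence and d_k the key dimension.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Each row of the softmax output holds the weights that one position assigns to every position in the sequence, which is the sense in which different positions receive different importance.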
Custom filters generally refer to tools or functions designed to filter, process, or transform data according to specific needs in a software application or data processing flow. Such filters may be tailored to meet specific data processing objectives according to the specific needs of the user.
Base64 is a data encoding method, used primarily to transmit and store binary data in applications that can only handle text. Base64 converts binary data into an ASCII string drawn from 64 printable characters: 26 uppercase letters, 26 lowercase letters, 10 digits, and the two symbols "+" and "/". The name Base64 comes from this total of 64 characters.
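As a small worked example of the encoding, three input bytes (24 bits) are regrouped into four 6-bit indices into the 64-character alphabet. The helper `encode_3_bytes` is hypothetical and handles only a full 3-byte group; the standard library function `base64.b64encode` also handles shorter final groups by padding the output with "=".

```python
import base64
import string

# The 64-character alphabet: A-Z, a-z, 0-9, "+", "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_3_bytes(chunk: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    n = int.from_bytes(chunk, "big")  # 24-bit integer
    # Take four 6-bit groups, from most to least significant.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))


manual = encode_3_bytes(b"Man")
library = base64.b64encode(b"Man").decode("ascii")
```

Here the bytes of "Man" regroup into the indices 19, 22, 5 and 46, which map to "T", "W", "F" and "u".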
File Stream (File Stream) refers to a data Stream used in a computer program or system to read or write File content. A file stream is a form of data stream, specific to file operations, that allows a program to process data in a file in a continuous stream, rather than reading the entire file at once.
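The difference between streaming and reading a whole file at once can be illustrated with a short generator. The name `read_in_chunks` and the default chunk size are assumptions; any binary file object that provides `read` works, including one returned by `open(path, "rb")`.

```python
import io
from typing import BinaryIO, Iterator


def read_in_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        yield chunk


# An in-memory stream stands in for a file on disk.
chunks = list(read_in_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4))
```

Only one chunk is held in memory at a time, which matters for large scanned contracts.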
In current banking operations, a customer manager contracts with a customer through an electronic signing system, typically including digital contract document processing, electronic signature functions, and contract storage and management. The customer manager is responsible for guiding the customer to complete the electronic signing process, and ensuring accurate understanding and legal compliance of the contractual terms. In addition, the customer manager also needs to oversee the execution and subsequent operations of the contract.
Currently, most banks rely on manual methods to verify key information in contracts, such as bank account numbers and identity card numbers. Such manual verification is often time-consuming and inefficient, especially when large numbers of contracts are involved. In particular, manual verification is susceptible to human factors such as fatigue and distraction, which may lead to verification errors; for example, a single wrong digit or two transposed digits in a bank account number may cause a serious operational error. Because manual verification is slow, the processing and approval of contracts may be delayed, which can mean lost opportunities or dissatisfied customers in business scenarios that require a quick response. Under the current manual verification system it is also difficult to analyze compliance information comprehensively and in depth, which may leave gaps in compliance and risk management.
Further, the contract performance steps mainly include manually entering the signed account information into the supervisory system, performing the financial operations specified in the contract (such as transfers), and tracking the performance of the contract. Since these operations must be carried out separately after the contract is signed and rely entirely on manual work by the customer manager, there is a risk of delay and omission, and this separated workflow design leads to an inadequate response for urgent or critical operations. For example, in an account pre-supervision scenario, even though a supervision contract has been signed with the customer, the customer manager may fail to enter the account information into the supervisory system in time, so that the bank cannot intercept risks in advance, which in turn causes compliance risks and financial losses.
The existing contract management workflow lacks automation and system linkage, so contract signing and contract performance cannot be effectively connected in either timing or function, which increases the operational and compliance risks of banking business.
Based on this, an embodiment of the present invention provides a key information extraction method, which includes: identifying bank account information in a target contract file and verification information corresponding to the bank account information; inputting the bank account information into an information removal model to filter out invalid information, and outputting account filtering information; splitting the account filtering information by using an information splitting model according to a preset splitting rule, and outputting N pieces of split key information, wherein N is a positive integer; based on the verification information, performing validity verification on each of the N pieces of split key information by using an information verification model, and outputting a verification result; and, in response to the verification result being a pass, outputting the corresponding split key information as effective key information. With this method, a list of the contract's key information is formed by splitting and verifying the valid information, so the computer can extract the key information in the contract file automatically, without manual intervention, which improves the user experience. Filtering the account information with the information removal model reduces the amount of data to be processed, which lightens the computational load and improves computational efficiency. By combining several models and algorithms, the system can extract key information from the contract file automatically, quickly and accurately, which improves user satisfaction.
An embodiment of the invention also provides a method for supervising contract signing, which includes: acquiring information of a target contract and a file stream corresponding to the target contract; converting the file stream corresponding to the target contract into a target contract file; acquiring effective key information in the target contract file by using the key information extraction method, and adding the effective key information to a database of a supervision model; and, in response to a transaction occurring on the account corresponding to the effective key information, auditing and verifying the details of the transaction by using the supervision model, and outputting a supervision result. This supervision method automates the process from data collection and processing through to final transaction supervision, improves supervision efficiency and accuracy, and reduces the need for manual intervention, thereby improving the user experience.
It should be noted that the key information extraction method and device and the contract signing supervision method and device can be used in the technical fields of big data and artificial intelligence, in the financial field, and in any other field. The embodiments of the invention do not limit the application fields of the key information extraction method and device or of the contract signing supervision method and device.
In the technical scheme of the invention, the user information involved (including but not limited to user personal information, user image information, and user equipment information such as location information) and the data involved (including but not limited to data for analysis, stored data, and displayed data) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure and application of such data comply with the relevant laws, regulations and standards of the relevant countries and regions, necessary security measures are adopted, public order and good morals are not violated, and corresponding operation entries are provided so that the user can choose to grant or refuse authorization.
Fig. 1 schematically illustrates an application scenario diagram of a key information extraction method, device, equipment and medium according to an embodiment of the invention.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the key information extraction method and the contract signing supervision method provided by the embodiments of the present invention may generally be executed by the server 105. Accordingly, the key information extraction device and the contract signing supervision device provided by the embodiments of the present invention may generally be disposed in the server 105. The key information extraction method and the contract signing supervision method provided by the embodiments of the present invention may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the key information extraction device and the contract signing supervision device provided by the embodiments of the present invention may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of a key information extraction method according to an embodiment of the invention.
As shown in fig. 2, the key information extraction method 200 of this embodiment may include operations S210 to S250.
In operation S210, bank account information in the target contract file and verification information corresponding to the bank account information are identified.
Fig. 3 schematically shows a flowchart of a method of identifying bank account information and verification information according to an embodiment of the invention.
As shown in fig. 3, the method for identifying bank account information and verification information of this embodiment may include operations S310 to S340, and operations S310 to S340 may at least partially perform operation S210.
In operation S310, the target contract file is converted into editable text based on the OCR technology.
In an embodiment of the present invention, an image document or PDF file of the target contract file, which contains the text from which key information is to be extracted, may be scanned, and the image text may be converted into a text format by selecting an appropriate OCR tool or library. For example, OCR tools such as Tesseract, Google Cloud Vision OCR, or Adobe Acrobat may be selected.
In operation S320, a target keyword is acquired, and a position of the target keyword in the editable text is determined using a string matching algorithm.
In embodiments of the present invention, specific key information, such as account numbers, cell phone numbers, credential numbers, etc., may be identified and located from the editable text. Specifically, the extraction of the key information is realized through key word recognition and text positioning: firstly, a keyword list of key information to be extracted, such as account keywords, mobile phone number keywords and the like, needs to be defined, and the keywords can be parameters configured in advance so as to be modified according to the needs; then, the system will traverse the entire editable text, checking each text paragraph or string one by one to find out whether the target keyword is contained; once a paragraph or string of text containing the target keyword is found, the starting and ending positions of the keyword may be obtained to determine the position of the keyword in the text.
In an embodiment of the present invention, text may be first segmented into words or phrases according to a specific separator, and then each segmented portion is checked for inclusion of a target keyword.
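By way of illustration only, the keyword positioning described above can be sketched as follows. The keyword list, the sample text, and the function name `locate_keywords` are hypothetical and are not taken from the embodiment; plain substring search stands in for whichever string matching algorithm is actually used.

```python
from typing import Dict, List, Tuple

# Hypothetical, pre-configured keyword list; real keywords would come from configuration.
TARGET_KEYWORDS = ["Account No", "Mobile", "Credential No"]


def locate_keywords(text: str, keywords: List[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Return the (start, end) offsets of every occurrence of each keyword."""
    positions: Dict[str, List[Tuple[int, int]]] = {kw: [] for kw in keywords}
    for kw in keywords:
        start = text.find(kw)
        while start != -1:
            # end is exclusive, so text[start:end] == kw
            positions[kw].append((start, start + len(kw)))
            start = text.find(kw, start + 1)
    return positions


sample = "Account No: 6222 0212 3456 7890 123; Mobile: 13800138000"
found = locate_keywords(sample, TARGET_KEYWORDS)
```

Keywords that do not occur in the text map to an empty list, so the caller can distinguish "not found" from "found at offset 0".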
In operation S330, search parameters are set based on the bank account information and the verification information.
In embodiments of the present invention, the search parameters specify how to search for such information in the editable text. The search parameters may include the scope of the search (the whole text, specific paragraphs, etc.), the direction of the search (forward or backward), and the like. These parameters should be adjustable to the particular situation to ensure accurate information extraction.
In operation S340, the bank account information and the verification information are identified from the editable text based on the search parameter and the location.
In the embodiment of the invention, based on the search parameter and the position, the bank account information and the verification information can be extracted from the text. For example, the specific account number following the text position is extracted.
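A minimal sketch of operations S330 and S340 is given below, assuming that the search parameters reduce to a direction and a character window; the `SearchParams` class, its fields, and the digit pattern are illustrative assumptions rather than the parameters of the embodiment.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchParams:
    # Hypothetical parameters: direction of the search and how far to look.
    direction: str = "forward"   # "forward" or "backward"
    window: int = 40             # number of characters to inspect


def extract_near(text: str, kw_start: int, kw_end: int, params: SearchParams) -> Optional[str]:
    """Return the run of digits (with spaces or hyphens) closest to the keyword."""
    if params.direction == "forward":
        region = text[kw_end:kw_end + params.window]
    else:
        region = text[max(0, kw_start - params.window):kw_start]
    # A run starts and ends with a digit and may contain spaces or hyphens.
    matches = re.findall(r"[0-9][0-9 \-]*[0-9]", region)
    if not matches:
        return None
    # Forward search takes the first run; backward search takes the last one.
    return matches[0] if params.direction == "forward" else matches[-1]
```

The extracted string may still contain separators; removing them is left to the filtering step of operation S220.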
Referring back to fig. 2, in operation S220, the bank account information is input into an information removal model to filter invalid information, and account filtering information is output.
Fig. 4 schematically shows a flow chart of a method of filtering invalid information according to an embodiment of the present invention.
As shown in fig. 4, the method of filtering invalid information of this embodiment may include operation S410, and operation S410 may at least partially perform operation S220.
In operation S410, a custom filter is written as the information removal model, where the custom filter is used to filter spaces, special characters, non-numeric characters, Chinese characters, letters, and/or punctuation marks in the bank account information.
In embodiments of the present invention, a custom filter function may be written to process the bank account information and remove spaces, special characters, non-numeric characters, Chinese characters, letters, and/or punctuation marks. For example, the following procedure may be used: an empty character string is initialized in the custom filter function to store the filtered bank account information, and this character string is built up gradually during processing; each character of the bank account information is traversed and checked for being a digit (an isdigit() function can be used); if the character is a digit, it is added to the filtered account information, and otherwise it is skipped; when the custom filter function completes, it returns the filtered bank account information, which contains only numeric characters.
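The procedure above can be sketched as follows. The function name `filter_account_info` is hypothetical, and restricting the result to ASCII digits is an assumption added here, because `str.isdigit()` on its own also accepts characters such as superscript or full-width digits.

```python
def filter_account_info(raw: str) -> str:
    """Keep only ASCII digits, dropping spaces, letters, CJK characters and punctuation."""
    filtered = ""  # starts empty and is built up character by character
    for ch in raw:
        # Assumption: only the ASCII digits 0-9 are valid account characters.
        if ch.isascii() and ch.isdigit():
            filtered += ch
    return filtered
```

If OCR output may contain full-width digits that should be kept, they would need to be normalized before this filter is applied.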
Referring back to fig. 2, in operation S230, according to a preset splitting rule, the account filtering information is split by using an information splitting model, and N pieces of splitting key information are output, where N is a positive integer.
Fig. 5 schematically illustrates a flow chart of a method of splitting account filter information according to an embodiment of the invention.
As shown in fig. 5, the method for splitting account filtering information of this embodiment may include operations S510 to S530, and operations S510 to S530 may at least partially perform operation S230.
In operation S510, the splitting rule is defined according to the length and format of the bank account number.
In the embodiment of the invention, how to split the bank account information may be determined according to the length and format of the bank accounts of each bank. For example, when a plurality of consecutive bank accounts are processed, the bank to which each account belongs may first be identified, specifically by recognizing a bank-specific identifier or account prefix according to the requirements of each bank; the split may then be defined by a bank-specific separator or by a substring with the fixed number of digits used by that bank.
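As a hedged example of a prefix-and-length splitting rule, the sketch below cuts a string of concatenated digits into accounts. The rule table `SPLIT_RULES`, its prefixes, and its lengths are invented for illustration and do not describe any real bank's numbering scheme.

```python
from typing import Dict, List

# Hypothetical rule table: account prefix -> total account length.
# Real prefixes and lengths depend on each bank's numbering scheme.
SPLIT_RULES: Dict[str, int] = {"6222": 19, "1001": 16}


def split_by_rules(digits: str, rules: Dict[str, int]) -> List[str]:
    """Greedily cut a digit string into accounts using prefix/length rules."""
    accounts: List[str] = []
    pos = 0
    while pos < len(digits):
        for prefix, length in rules.items():
            if digits.startswith(prefix, pos) and pos + length <= len(digits):
                accounts.append(digits[pos:pos + length])
                pos += length
                break
        else:
            # No rule applies: stop rather than guess a boundary.
            raise ValueError(f"no splitting rule matches at offset {pos}")
    return accounts
```

A purely rule-based split of this kind fails on unknown prefixes, which is one reason the embodiment also passes the sequence to a trained splitting model.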
In operation S520, based on the splitting rule, feature engineering is performed on the account filtering information, and an account character sequence is output.
In embodiments of the present invention, account filter information needs to be feature engineered before entering the model in preparation for subsequent processing or recording. The goal of feature engineering is to process a sequence of characters according to specific needs, such as: character code conversion, if the character sequence contains different character codes, the character code conversion can be carried out to ensure consistency; character normalization, normalizing characters to eliminate case differences or other character variations.
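The character normalization described above might look like the following sketch, which assumes Unicode NFKC normalization and upper-casing as the normalization steps; the embodiment does not name a specific normalization form, and `to_char_sequence` is a hypothetical helper.

```python
import unicodedata
from typing import List


def to_char_sequence(account_text: str) -> List[str]:
    """Normalize the text and return it as a list of characters."""
    # NFKC folds compatibility forms such as full-width digits into ASCII equivalents.
    normalized = unicodedata.normalize("NFKC", account_text)
    # Upper-casing removes case differences in any remaining letters.
    normalized = normalized.upper()
    return list(normalized)
```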
In operation S530, the account character sequence is input into the information splitting model to identify an account boundary, and N pieces of splitting key information are output.
In an embodiment of the present invention, the information splitting model may be trained using a Transformer model, wherein the training set of the information splitting model includes original text data containing a single account or consecutive accounts, together with the corresponding identification information.
In the embodiment of the invention, the account character sequence may be input into the information splitting model. The Transformer model may segment the input sequence or add special tags (e.g., [CLS] and [SEP] tags) to fit the input format of the model. This ensures that the model can process the input sequence correctly.
In embodiments of the present invention, the original text data may be understood and annotated by identifying marker characters and/or using contextual windows. In particular, bank accounts typically have identifying marker characters; for example, a bank account typically begins or ends with a particular series of digits or contains a particular separator, and detecting these characters or patterns helps determine the boundaries of the account. When account text data is processed, a contextual sliding window may be used to apply a small window at a given location of the text; the text within the window helps the model understand the context of the key information and determine boundaries more easily.
In an embodiment of the invention, the model will identify account boundaries and extract split key information based on patterns and contexts it trains to learn. The output of the model includes the identified account boundary locations and extracted split key information including the start and end locations of each bank account, as well as the character sequence of the account. Further, a plurality of split key information (N pieces) may be output, each corresponding to one account.
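The sketch below illustrates only the post-processing of the model output, under the assumption that the splitting model emits one boundary tag per character ("B" for the first character of an account, "I" for a character inside an account, "O" for a character outside any account). This tagging scheme and the function `decode_boundaries` are assumptions made for illustration; the embodiment does not specify the output encoding.

```python
from typing import List, Optional, Tuple


def decode_boundaries(chars: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Turn per-character B/I/O tags into (start, end, account) triples."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":  # a new account begins here
            if start is not None:
                spans.append((start, i, "".join(chars[start:i])))
            start = i
        elif tag == "O" and start is not None:  # the current account has ended
            spans.append((start, i, "".join(chars[start:i])))
            start = None
    if start is not None:  # account running to the end of the input
        spans.append((start, len(chars), "".join(chars[start:])))
    return spans
```

Each returned triple carries the start position, the exclusive end position, and the character sequence of one account, so the length of the list corresponds to N.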
Referring back to fig. 2, in operation S240, validity verification is performed on the N pieces of split key information using an information verification model, respectively, based on the verification information, and a verification result is output.
In the embodiment of the invention, the information verification model comprises a credential information verification model, an identity information verification model and a contact information verification model, and the verification information comprises a credential number, an identity card number and a mobile phone number.
Fig. 6 schematically shows a flow chart of a method of performing validity verification according to an embodiment of the invention.
As shown in fig. 6, the method for performing validity verification of this embodiment may include operations S610 to S640, and operations S610 to S640 may at least partially perform operation S240.
In operation S610, a first regular expression is written based on structural features of the credential information as the credential information verification model, and validity of the credential number is verified by using the credential information verification model to obtain a first verification result.
In embodiments of the present invention, the structural features of the credential information include possible formats, lengths, and other structural information of the credential number. The first regular expression may specify that the credential number must be made up of a particular number of numerical characters, or may include a particular separator or letter. The first regular expression written can be used for matching the certificate number to be verified with the regular expression, and if the certificate number is matched with the regular expression, the certificate number accords with the defined structural characteristics and is regarded as a valid certificate number.
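For illustration, a regular-expression check of this kind can be written as below. The pattern (two uppercase letters followed by 8 to 10 digits) is a made-up structure, since the embodiment does not disclose the actual credential format.

```python
import re

# Hypothetical structure: two uppercase letters followed by 8 to 10 digits.
# The real pattern depends on the credential type being verified.
CREDENTIAL_PATTERN = re.compile(r"[A-Z]{2}[0-9]{8,10}")


def verify_credential_number(number: str) -> bool:
    """Return True when the whole string matches the credential pattern."""
    return CREDENTIAL_PATTERN.fullmatch(number) is not None
```

Using `fullmatch` rather than `search` ensures that extra leading or trailing characters cause the check to fail.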
In an embodiment of the present invention, the accounting interface of the bank may also be invoked to query the credential validity to further verify the validity of the credential information, for example, a dedicated accounting element interface of the bank may be used.
In operation S620, a verification algorithm is implemented according to the standard of the identity information and used as the identity information verification model, and the structure and validity of the identification card number are verified by using the identity information verification model to obtain a second verification result.
In the embodiment of the invention, a verification algorithm can be implemented according to the structure and validity standard of the identification card number, and the algorithm may include rules for checking the length, the digit composition, the check digit, and the like of the identification card number.
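As one possible instance, assuming the identification card number is an 18-character PRC resident identity number, the length, composition, and check character can be verified with the weighted modulo-11 scheme sketched below. The function name is hypothetical, and the birth date and region code embedded in the number are not validated here.

```python
import re

# Weights and check characters used for 18-character PRC resident ID numbers.
_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
_CHECK_CHARS = "10X98765432"


def verify_id_number(id_number: str) -> bool:
    """Check length, character composition and the final check character."""
    id_number = id_number.upper()
    # 17 digits followed by a digit or 'X'.
    if not re.fullmatch(r"[0-9]{17}[0-9X]", id_number):
        return False
    total = sum(int(d) * w for d, w in zip(id_number[:17], _WEIGHTS))
    return _CHECK_CHARS[total % 11] == id_number[17]
```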
In operation S630, a second regular expression is written according to the standard of the mobile phone number, and is used as the contact information verification model, and the format and the length of the mobile phone number are verified by using the contact information verification model, so as to obtain a third verification result.
In the embodiment of the invention, the standard of the mobile phone number depends on the country or region, since different countries or regions have different format and length requirements. The second regular expression may cover digits, area codes, separators, and the like, with the specific rules depending on the country or region standard of the mobile phone number.
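A sketch of such a second regular expression follows, assuming mainland China mobile numbers: an optional "+86" or "86" country code followed by 11 digits that start with 1 and whose second digit is 3 to 9. This format is an assumption; other countries or regions would need their own pattern.

```python
import re

# Assumed format: optional "+86"/"86", then 11 digits starting with 1,
# whose second digit is 3-9. Other regions need a different pattern.
MOBILE_PATTERN = re.compile(r"(?:\+?86)?1[3-9][0-9]{9}")


def verify_mobile_number(number: str) -> bool:
    """Return True when the number matches the assumed mobile-number format."""
    # Spaces and hyphens are treated as optional separators and removed first.
    compact = number.replace(" ", "").replace("-", "")
    return MOBILE_PATTERN.fullmatch(compact) is not None
```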
In operation S640, in response to the first, second, and third verification results being all passed, the verification result is output as passed.
Referring back to fig. 2, in operation S250, corresponding split key information is output as valid key information in response to the verification result being passed.
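Operations S640 and S250 can be summarized by the following sketch, in which the overall result passes only when all three individual results pass. The `VerificationResults` class and the assumption of one result record per piece of split key information are illustrative choices, not details of the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VerificationResults:
    credential_ok: bool  # first verification result
    identity_ok: bool    # second verification result
    contact_ok: bool     # third verification result

    def passed(self) -> bool:
        # The overall result passes only when all three checks pass.
        return self.credential_ok and self.identity_ok and self.contact_ok


def select_valid(split_items: List[str], results: List[VerificationResults]) -> List[str]:
    """Keep the split key information whose verification result is 'passed'."""
    return [item for item, res in zip(split_items, results) if res.passed()]
```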
According to the key information extraction method provided by the invention, the key information list of the contract can be formed by splitting the effective information and checking, which means that the computer can automatically extract the key information in the contract file, so that manual intervention is not required, and the user experience is improved; the account information is filtered by the information removal model, so that the processing load can be reduced, the calculation load is lightened, and the calculation efficiency is improved; meanwhile, by combining a plurality of models and algorithms, the system can automatically, rapidly and accurately extract key information in the contract file, so that manual intervention is reduced, and user satisfaction is improved. Specifically, the following beneficial effects are brought:
1. Through the splitting model and the verification model, the computer can automatically extract key information from the contract file without manual intervention, which means that a user does not need to search for and extract information manually; this improves user experience, lets the user complete tasks more quickly, reduces tedious manual operation, and improves efficiency and satisfaction;
2. the use of the information removal model can reduce unnecessary information processing burden, focusing on truly important key information. This helps to reduce the computational load and improve the computational efficiency; meanwhile, a plurality of tasks can be automatically processed by combining a plurality of models and algorithms, so that the need of manual intervention is reduced, and the risk of errors is reduced;
3. the accuracy of the information can be improved by using a verification model and a proper regular expression. The computer usually does not have human errors when performing the verification processes, so that the risk of data errors is reduced, which is critical to the accuracy of key information, especially in the financial or legal fields;
4. by using a proper algorithm and model, the computer can process a large number of contract files more efficiently, so that the calculation performance is improved, and particularly when large-scale data are processed, the automatic information extraction and verification can be completed in a short time, so that the processing time and cost are reduced;
5. By automatically, rapidly and accurately extracting key information in the contract file, the user experience is comprehensively improved, the user can easily finish tasks, accurate information provided by a system can be relied on, and the trust level is improved.
Based on the extraction method of the key information, the embodiment of the invention also provides a supervision method of contract signing. This method will be described in detail below in connection with fig. 7.
Fig. 7 schematically shows a flow chart of a method of supervision of contract subscriptions in accordance with an embodiment of the invention.
As shown in fig. 7, the supervision method of contract subscriptions of this embodiment may include operations S710 to S740.
In operation S710, information of a target contract and a file stream corresponding to the target contract are acquired.
In the embodiment of the invention, the contracts signed on the same day and the binary file streams corresponding to the contracts can be obtained from the data lake at the end of the day. Specifically, a query may be made with SQL statements; for example, for contracts related to a bidirectional supervision service, the SQL statement is:
select ZONENO, TPID, CPNO, CPNAME, CPPRO, CLIENTNAME, CPDEPARTMENT, APPROVAL_USERID, STATUS, LAUNCH_TIME, LAUNCH_USERID
from (select *, row_number() over (partition by CPNO order by pt_dt desc) as rn from bdpview.DCM_ASS_CONTRACT_PROPERTY_S
where pt_dt between '2022-01-01' and '2023-05-31') as u
where u.rn = 1 and cpname like '%some bi-directional supervision%';
quit;
In the embodiment of the invention, this statement can be used to query the contract number, the contract name, and the binary file stream of the bidirectional supervision contracts signed on the same day in the electronic signing system; the file stream is used to restore the file.
In operation S720, the file stream corresponding to the target contract is converted into a target contract file.
Fig. 8 schematically shows a flow chart of a method of converting a filestream into a target contract file according to an embodiment of the present invention.
As shown in fig. 8, the method of converting a file stream into a target contract file of this embodiment may include operations S810 to S820.
In operation S810, the file stream encoded by the base64 is decoded, and a format of the file stream is determined.
In embodiments of the present invention, many network protocols and transmission channels are designed to carry text data rather than binary data; Base64 encoding converts binary data into text data so that it can be transmitted over a variety of network protocols and transmission channels. Base64 encoding represents binary data using a common character set (A-Z, a-z, 0-9, +, /), so the encoded data is compatible across different character sets and encoding environments and is not affected by differences between character sets.
In embodiments of the present invention, decoding may be implemented by a Base64 decoding function provided in a programming language or tool library. The decoded binary data may determine the file format by checking a Magic Number (Magic Number) or a specific identifier of the file.
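A minimal sketch of operation S810 using Python's standard `base64` module is shown below. The signature table lists only a few well-known magic numbers and is illustrative; the helper names and the fallback extension "bin" are assumptions.

```python
import base64

# A few well-known file signatures; this table is illustrative, not exhaustive.
MAGIC_NUMBERS = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
}


def decode_file_stream(encoded: str) -> bytes:
    """Decode a Base64 string into the original bytes."""
    return base64.b64decode(encoded)


def detect_format(data: bytes) -> str:
    """Guess the file format from its leading bytes; 'bin' if unknown."""
    for signature, extension in MAGIC_NUMBERS.items():
        if data.startswith(signature):
            return extension
    return "bin"
```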
In operation S820, the file stream is written to a disk file based on the format, and a readable file is created as the target contract file.
In embodiments of the present invention, a new disk file may be created based on the determined file format to save the decoded data. Writing the decoded data to the target file may be accomplished by a programming language or file manipulation tool, and the data will be written to the file in its original format.
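Writing the decoded data to disk, as in operation S820, could be sketched as follows. The naming scheme `<contract_no>.<extension>` and the function `write_contract_file` are assumptions made for this example.

```python
from pathlib import Path


def write_contract_file(data: bytes, directory: str, contract_no: str, extension: str) -> Path:
    """Write decoded bytes to '<directory>/<contract_no>.<extension>' and return the path."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{contract_no}.{extension}"
    target.write_bytes(data)  # binary mode keeps the data in its original format
    return target
```

The returned path can then be handed to the OCR step of the key information extraction method.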
Referring back to fig. 7, in operation S730, valid key information in the target contract file is acquired by the key information extraction method described above (not repeated here), and the valid key information is added to the database of the supervision model.
In operation S740, in response to a transaction generated by the account corresponding to the valid key information, the details of the transaction are audited and verified by using the supervision model, and a supervision result is output.
FIG. 9 schematically illustrates a workflow diagram for audit verification in response to a supervising user initiating a transaction, according to an embodiment of the present invention.
As shown in fig. 9, in response to a transaction generated by the account corresponding to the valid key information, it may first be determined whether the account number of the transaction payer is in the supervision list stored in the database of the supervision model. If so, the transaction details are audited and verified by the supervision model; specifically, the validity and correctness of the transaction may be checked so as to automatically determine whether to intercept the transaction. For example, it may be determined whether the balance of the payer's account is sufficient to complete the transaction. If the result is interception, the supervision result may output a compliance flag, a warning, or an exception report, which may be sent to a financial supervisor as a basis for subsequent judgment; if the result is release, existing processes such as subsequent procedure auditing continue, which are not described in detail here.
It should be noted that, if the account number of the transaction payer is not in the supervision list stored in the database of the supervision model, supervision may proceed according to the original manual flow (i.e., the branch in the figure in which the transaction is handled as in the existing flow), so as to avoid supervision loopholes.
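The decision flow of fig. 9 may be sketched as below. The `Transaction` fields, the use of integer minor currency units, the three return labels, and the balance check as the only validity rule are all simplifying assumptions; the actual supervision model may apply further checks.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Transaction:
    payer_account: str
    amount: int         # in minor currency units (assumption)
    payer_balance: int  # in minor currency units (assumption)


def supervise(tx: Transaction, supervised_accounts: Set[str]) -> str:
    """Return 'manual', 'intercept' or 'release' for a transaction."""
    if tx.payer_account not in supervised_accounts:
        return "manual"     # not in the supervision list: existing manual flow
    if tx.amount <= 0 or tx.payer_balance < tx.amount:
        return "intercept"  # failed a validity check: flag for the supervisor
    return "release"        # checks passed: continue with later audit steps
```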
According to the contract signing supervision method provided by the invention, automation from data collection and processing to final transaction supervision is realized, supervision efficiency and accuracy are improved, and the requirement of manual intervention is reduced, so that user experience is improved. Specifically, the following beneficial effects are brought:
1. by automatically extracting key information, file stream processing and transaction supervision, the method greatly reduces the requirement of manual intervention, improves the efficiency, means that the acquisition of contract information, file stream conversion and transaction supervision can be completed more quickly, and reduces the processing time and cost;
2. the automatic process is generally more accurate than manual operation, and the risk of data errors and compliance problems is reduced by checking and checking the supervision model, so that the data accuracy is improved;
3. through the supervision model, the method can monitor transactions more comprehensively, detect abnormal behaviors, provide better supervision and reporting, and is helpful for meeting supervision requirements and providing better data tracking.
Based on the key information extraction method, the invention also provides a key information extraction device. The device will be described in detail below in connection with fig. 10.
Fig. 10 schematically shows a block diagram of the key information extraction apparatus according to an embodiment of the present invention.
As shown in fig. 10, the key information extraction apparatus 1000 according to this embodiment includes an information identification module 1010, an invalid information filtering module 1020, an information splitting module 1030, a validity verification module 1040, and a valid key information output module 1050.
The information identifying module 1010 may be configured to identify bank account information in a target contract file and verification information corresponding to the bank account information. In an embodiment, the information identifying module 1010 may be configured to perform the operation S210 described above, which is not described herein.
The invalid information filtering module 1020 may be configured to input the bank account information into an information removal model to filter invalid information, and to output account filtering information. In an embodiment, the invalid information filtering module 1020 may be configured to perform the operation S220 described above, which is not described herein.
The information splitting module 1030 may be configured to split the account filtering information by using an information splitting model according to a preset splitting rule, and output N pieces of splitting key information, where N is a positive integer. In an embodiment, the information splitting module 1030 may be configured to perform the operation S230 described above, which is not described herein.
The validity verification module 1040 may be configured to perform validity verification on the N pieces of split key information by using an information verification model based on the verification information, and output a verification result. In an embodiment, the validity verification module 1040 may be used to perform the operation S240 described above, which is not described herein.
The valid key information output module 1050 may be configured to output the corresponding split key information as valid key information in response to the verification result being passed. In an embodiment, the valid key information output module 1050 may be used to perform the operation S250 described above, which is not described herein.
According to an embodiment of the present invention, the information recognition module 1010 may include an editable text conversion unit, a target keyword determination unit, a search parameter setting unit, and a recognition unit.
The editable text conversion unit may be configured to convert the target contract file into editable text based on OCR technology. In an embodiment, the editable text conversion unit may be used to perform the operation S310 described above, which is not described herein.
The target keyword determining unit may be configured to obtain a target keyword, and determine a position of the target keyword in the editable text using a string matching algorithm. In an embodiment, the target keyword determining unit may be configured to perform the operation S320 described above, which is not described herein.
The search parameter setting unit may be configured to set a search parameter based on the bank account information and the verification information. In an embodiment, the search parameter setting unit may be configured to perform the operation S330 described above, which is not described herein.
The identification unit may be configured to identify the bank account information and the verification information from the editable text based on the search parameter and the location. In an embodiment, the identifying unit may be configured to perform the operation S340 described above, which is not described herein.
According to an embodiment of the present invention, the invalidation information filtering module 1020 may include a custom filter writing unit.
The custom filter writing unit may be configured to write a custom filter as the information removal model, where the custom filter is configured to filter spaces, special characters, non-numeric characters, Chinese characters, letters, and/or punctuation marks in the bank account information. In an embodiment, the custom filter writing unit may be configured to perform the operation S410 described above, which is not described herein.
According to an embodiment of the present invention, the information splitting module 1030 may include a splitting rule definition unit, a feature engineering unit, and an account boundary recognition unit.
The splitting rule definition unit may be configured to define the splitting rule according to a length and a format of a bank account number. In an embodiment, the splitting rule defining unit may be configured to perform the operation S510 described above, which is not described herein.
The feature engineering unit can be used for carrying out feature engineering on the account filtering information based on the splitting rule and outputting an account character sequence. In an embodiment, the feature engineering unit may be used to perform the operation S520 described above, which is not described herein.
The account boundary recognition unit can be used for inputting the account character sequence into the information splitting model to recognize account boundaries and outputting N pieces of splitting key information. In an embodiment, the account boundary identifying unit may be configured to perform the operation S530 described above, which is not described herein.
According to an embodiment of the present invention, the validity verification module 1040 may include a credential information verification unit, an identity information verification unit, a contact information verification unit, and a verification result output unit.
The credential information verification unit may be configured to write a first regular expression based on structural features of the credential information as the credential information verification model, and to verify the validity of the credential number by using the credential information verification model to obtain a first verification result. In an embodiment, the credential information verification unit may be configured to perform the operation S610 described above, which is not described herein.
The identity information verification unit may be configured to implement a verification algorithm according to the standard of the identity information as the identity information verification model, and to verify the structure and validity of the identification card number by using the identity information verification model to obtain a second verification result. In an embodiment, the identity information verification unit may be configured to perform the operation S620 described above, which is not described herein.
The contact information verification unit may be configured to write a second regular expression according to the standard of the mobile phone number as the contact information verification model, and to verify the format and length of the mobile phone number by using the contact information verification model to obtain a third verification result. In an embodiment, the contact information verification unit may be configured to perform the operation S630 described above, which is not described herein.
The verification result output unit may be configured to output the verification result as passing in response to the first verification result, the second verification result, and the third verification result being passing. In an embodiment, the verification result output unit may be configured to perform the operation S640 described above, which is not described herein.
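The three checks and their combination can be sketched as follows. Several points are assumptions rather than statements of the embodiment: the credential pattern (18 upper-case letters or digits) is hypothetical, the mobile pattern assumes 11-digit mainland China numbers beginning with 1 and a second digit of 3 to 9, and the identity card check assumes the 18-digit resident identity number with the MOD 11-2 check character of GB 11643-1999. The check covers structure and check character only, not the region code or birth date.

```python
import re

# Hypothetical credential pattern: 18 upper-case letters or digits.
CREDENTIAL_RE = re.compile(r"[0-9A-Z]{18}")
# Assumed mobile format: 11 digits, first digit 1, second digit 3-9.
MOBILE_RE = re.compile(r"1[3-9][0-9]{9}")

# Position weights and check characters of the MOD 11-2 scheme.
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
ID_CHECK_CHARS = "10X98765432"


def check_credential(number: str) -> bool:
    """First verification result: regex check on the credential number."""
    return CREDENTIAL_RE.fullmatch(number) is not None


def check_id_card(number: str) -> bool:
    """Second verification result: structure plus check-character test."""
    if re.fullmatch(r"[0-9]{17}[0-9X]", number) is None:
        return False
    total = sum(int(d) * w for d, w in zip(number[:17], ID_WEIGHTS))
    return ID_CHECK_CHARS[total % 11] == number[17]


def check_mobile(number: str) -> bool:
    """Third verification result: format and length of the mobile number."""
    return MOBILE_RE.fullmatch(number) is not None


def verify(credential: str, id_card: str, mobile: str) -> bool:
    """Overall result passes only if all three individual checks pass."""
    return check_credential(credential) and check_id_card(id_card) and check_mobile(mobile)
```

Using `fullmatch` rather than `match` keeps trailing characters from slipping through, so the length is enforced by the pattern itself.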
Any of the information identification module 1010, the invalid information filtering module 1020, the information splitting module 1030, the validity verifying module 1040, and the valid key information output module 1050 may be combined in one module or any of the modules may be split into a plurality of modules according to an embodiment of the present invention. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. At least one of the information identification module 1010, the invalid information filtering module 1020, the information splitting module 1030, the validity verification module 1040, and the valid key information output module 1050 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or as hardware or firmware in any other reasonable manner of integrating or packaging the circuitry, or as any one of or a suitable combination of three of software, hardware, and firmware, according to embodiments of the present invention. Alternatively, at least one of the information identification module 1010, the invalid information filtering module 1020, the information splitting module 1030, the validity verification module 1040, and the valid key information output module 1050 may be at least partially implemented as a computer program module, which when executed, may perform corresponding functions.
Based on the contract signing supervision method, the invention also provides a contract signing supervision device. The device will be described in detail below with reference to fig. 11.
Fig. 11 schematically shows a block diagram of a supervision apparatus for contract signing according to an embodiment of the present invention.
As shown in fig. 11, the contract signing supervision apparatus 1100 according to this embodiment includes a file stream acquisition module 1110, a target contract file conversion module 1120, a valid key information acquisition module 1130, and a supervision result output module 1140.
The file stream obtaining module 1110 may be configured to obtain information of the target contract and a file stream corresponding to the target contract. In an embodiment, the file stream obtaining module 1110 may be configured to perform the operation S710 described above, which is not described herein.
The target contract document conversion module 1120 may be configured to convert a document flow corresponding to the target contract into a target contract document. In an embodiment, the target contract file conversion module 1120 may be configured to perform the operation S720 described above, which is not described herein.
The valid key information obtaining module 1130 may be configured to obtain valid key information in the target contract file by using the key information extraction method described above, and add the valid key information to a database of a supervision model. In an embodiment, the valid key information obtaining module 1130 may be configured to perform the operation S730 described above, which is not described herein.
The supervision result output module 1140 may be configured to, in response to a transaction occurring on the account corresponding to the valid key information, check and verify the details of the transaction by using the supervision model, and output a supervision result. In an embodiment, the supervision result output module 1140 may be used to perform the operation S740 described above, which is not described herein.
According to an embodiment of the present invention, the target contract file conversion module 1120 may include a decoding unit and a file generating unit.
The decoding unit may be configured to decode the file stream encoded by base64 to determine a format of the file stream. In an embodiment, the decoding unit may be configured to perform the operation S810 described above, which is not described herein.
The file generating unit may be configured to write the file stream to a disk file based on the format, and create a readable file as the target contract file. In an embodiment, the file generating unit may be configured to perform the operation S820 described above, which is not described herein.
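The decoding and file generation steps can be sketched as below. The embodiment does not say how the format is determined, so detecting it from leading magic bytes is an assumption, as are the three supported formats and the names `detect_extension` and `save_contract`.

```python
import base64
from pathlib import Path

# Assumed format detection: leading "magic" bytes of a few common file types.
MAGIC = (
    (b"%PDF", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
)


def detect_extension(data: bytes) -> str:
    """Map the decoded bytes to a file extension, or fail if unknown."""
    for magic, ext in MAGIC:
        if data.startswith(magic):
            return ext
    raise ValueError("unrecognized file format")


def save_contract(encoded: str, out_dir: Path, stem: str = "contract") -> Path:
    """Decode a base64 file stream and write it to disk as a readable file."""
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(encoded, validate=True)
    path = out_dir / (stem + detect_extension(data))
    path.write_bytes(data)
    return path
```

Rejecting unknown formats up front keeps later steps, such as OCR, from being run on a file they cannot read.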
According to an embodiment of the present invention, any of the file stream acquisition module 1110, the target contract file conversion module 1120, the valid key information acquisition module 1130, and the supervision result output module 1140 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. At least one of the file stream acquisition module 1110, the target contract file conversion module 1120, the valid key information acquisition module 1130, and the supervision result output module 1140 may be implemented, at least in part, as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or as hardware or firmware in any other reasonable manner of integrating or packaging the circuitry, or as any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the file stream acquisition module 1110, the target contract file conversion module 1120, the valid key information acquisition module 1130, and the supervision result output module 1140 may be at least partially implemented as a computer program module, which when executed, may perform the corresponding functions.
Fig. 12 schematically shows a block diagram of an electronic device adapted to implement the key information extraction method and the contract signing supervision method according to an embodiment of the invention.
As shown in fig. 12, the electronic apparatus 1200 according to the embodiment of the present invention includes a processor 1201 which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The processor 1201 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 1201 may also include on-board memory for caching purposes. The processor 1201 may include a single processing unit or multiple processing units for performing the different actions of the method flow according to embodiments of the invention.
In the RAM 1203, various programs and data required for the operation of the electronic apparatus 1200 are stored. The processor 1201, the ROM 1202, and the RAM 1203 are connected to each other through a bus 1204. The processor 1201 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 1202 and/or the RAM 1203. Note that the program may be stored in one or more memories other than the ROM 1202 and the RAM 1203. The processor 1201 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the invention, the electronic device 1200 may also include an input/output (I/O) interface 1205, which is also connected to the bus 1204. The electronic device 1200 may also include one or more of the following components connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as necessary, so that a computer program read from it can be installed into the storage section 1208 as necessary.
The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.
According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, the computer-readable storage medium may include ROM 1202 and/or RAM 1203 and/or one or more memories other than ROM 1202 and RAM 1203 described above.
Embodiments of the present invention also include a computer program product comprising a computer program containing program code for performing the method shown in the flowcharts. When the computer program product runs on a computer system, the program code causes the computer system to carry out the methods provided by embodiments of the present invention.
The above-described functions defined in the system/apparatus of the embodiment of the present invention are performed when the computer program is executed by the processor 1201. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program can also be transmitted, distributed over a network medium in the form of signals, and downloaded and installed via a communication portion 1209, and/or from a removable medium 1211. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1209, and/or installed from the removable media 1211. The above-described functions defined in the system of the embodiment of the present invention are performed when the computer program is executed by the processor 1201. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
According to embodiments of the present invention, program code for carrying out computer programs provided by embodiments of the present invention may be written in any combination of one or more programming languages, and in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments of the present invention are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims (12)

1. A key information extraction method, the method comprising:
identifying bank account information in a target contract file and verification information corresponding to the bank account information;
inputting the bank account information into an information removal model to filter invalid information, and outputting account filtering information;
splitting the account filtering information by using an information splitting model according to a preset splitting rule, and outputting N pieces of splitting key information, wherein N is a positive integer;
based on the verification information, respectively carrying out validity verification on the N pieces of split key information by utilizing an information verification model, and outputting a verification result; and
outputting the corresponding split key information as effective key information in response to the verification result being a pass.
2. The method according to claim 1, wherein the splitting the account filtering information by using an information splitting model according to a preset splitting rule, and outputting N pieces of splitting key information, specifically includes:
defining the splitting rule according to the length and format of the bank account;
based on the splitting rule, carrying out feature engineering on the account filtering information and outputting an account character sequence; and
inputting the account character sequence into the information splitting model to identify account boundaries, and outputting N pieces of splitting key information.
3. The method of claim 2, wherein the information splitting model is trained using a Transformer model, and the training set of the information splitting model comprises: original text data containing a single account or consecutive accounts, and corresponding identification information.
4. The method of claim 1, wherein the information verification model comprises a credential information verification model, an identity information verification model, and a contact information verification model, the verification information comprising a credential number, an identity card number, and a cell phone number;
the respectively carrying out validity verification on the N pieces of split key information by using an information verification model based on the verification information, and outputting a verification result, specifically comprises:
writing a first regular expression based on structural features of the credential information, using the first regular expression as the credential information verification model, and verifying the validity of the credential number by using the credential information verification model to obtain a first verification result;
implementing a check algorithm according to the standard of the identity information, using the check algorithm as the identity information verification model, and verifying the structure and the validity of the identity card number by using the identity information verification model to obtain a second verification result;
writing a second regular expression according to the standard of the mobile phone number, using the second regular expression as the contact information verification model, and verifying the format and the length of the mobile phone number by using the contact information verification model to obtain a third verification result; and
outputting the verification result as a pass in response to the first verification result, the second verification result and the third verification result all being passes.
5. The method according to any one of claims 1 to 4, wherein the identifying the bank account information in the target contract file and the verification information corresponding to the bank account information specifically includes:
converting the target contract file into an editable text based on OCR technology;
acquiring a target keyword, and determining the position of the target keyword in the editable text by using a character string matching algorithm;
setting search parameters based on the bank account information and the verification information; and
identifying the bank account information and the verification information from the editable text based on the search parameters and the location.
6. The method according to any one of claims 1 to 4, wherein the filtering the invalid information by inputting the bank account information into an information removal model, and outputting account filtering information, specifically includes:
writing a custom filter as the information removal model, wherein the custom filter is used for filtering out spaces, special characters, non-numeric characters, Chinese characters, letters and/or punctuation marks in the bank account information.
7. A method of supervising a contract subscription, the method comprising:
acquiring information of a target contract and a file stream corresponding to the target contract;
converting the file stream corresponding to the target contract into a target contract file;
acquiring effective key information in the target contract file by using the method according to any one of claims 1 to 6, and adding the effective key information into a database of a supervision model; and
in response to a transaction occurring on the account corresponding to the effective key information, checking and verifying the details of the transaction by using the supervision model, and outputting a supervision result.
8. The method according to claim 7, wherein the converting the file stream corresponding to the target contract into the target contract file specifically includes:
decoding the file stream encoded by base64, determining a format of the file stream; and
writing the file stream into a disk file based on the format, and creating a readable file as the target contract file.
9. A key information extraction apparatus, characterized in that the apparatus comprises:
the information identification module is used for: identifying bank account information in a target contract file and verification information corresponding to the bank account information;
an invalid information filtering module, configured to: inputting the bank account information into an information removal model to filter invalid information, and outputting account filtering information;
the information splitting module is used for: splitting the account filtering information by using an information splitting model according to a preset splitting rule, and outputting N pieces of splitting key information, wherein N is a positive integer;
the validity verification module is used for: based on the verification information, respectively carrying out validity verification on the N pieces of split key information by utilizing an information verification model, and outputting a verification result; and
the effective key information output module is used for: outputting the corresponding split key information as effective key information in response to the verification result being a pass.
10. A device for supervising a contract subscription, the device comprising:
a file stream acquisition module, configured to: acquiring information of a target contract and a file stream corresponding to the target contract;
the target contract file conversion module is used for: converting the file stream corresponding to the target contract into a target contract file;
the effective key information acquisition module is used for: acquiring effective key information in the target contract file by using the method according to any one of claims 1 to 6, and adding the effective key information into a database of a supervision model; and
the supervision result output module is used for: in response to a transaction occurring on the account corresponding to the effective key information, checking and verifying the details of the transaction by using the supervision model, and outputting a supervision result.
11. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.
12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-8.
CN202311819896.1A 2023-12-27 2023-12-27 Key information extraction method and device and contract signing supervision method and device Pending CN117789230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311819896.1A CN117789230A (en) 2023-12-27 2023-12-27 Key information extraction method and device and contract signing supervision method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311819896.1A CN117789230A (en) 2023-12-27 2023-12-27 Key information extraction method and device and contract signing supervision method and device

Publications (1)

Publication Number Publication Date
CN117789230A true CN117789230A (en) 2024-03-29

Family

ID=90381146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311819896.1A Pending CN117789230A (en) 2023-12-27 2023-12-27 Key information extraction method and device and contract signing supervision method and device

Country Status (1)

Country Link
CN (1) CN117789230A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination