CN111062803A - Financial business query and review method and system - Google Patents

Financial business query and review method and system

Info

Publication number
CN111062803A
Authority
CN
China
Prior art keywords
message
natural language
query
language field
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911224951.6A
Other languages
Chinese (zh)
Inventor
侯文圣
张宪有
李玉仙
薛萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN201911224951.6A
Publication of CN111062803A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The invention provides a financial business query and review method and system. The method comprises: splitting and assembling a received query message to form a natural language field; determining, according to a first matching rule, whether the natural language field matches a preset message classification; if so, determining a message type label of the query message, and if not, inputting the natural language field into a multi-label classification model obtained by machine learning to obtain the message type label of the query message; and converting the obtained message type label into service language and feeding it back to the user. The invention can reduce the workload and working difficulty of service personnel.

Description

Financial business query and review method and system
Technical Field
The invention relates to the technical field of financial service query and review, in particular to a financial service query and review method and system.
Background
The query and review service is a specific form of bank service and an important guarantee for the safe and efficient operation of a bank payment system. It is characterized by many financial professional terms, many financial abbreviations, complex service scenarios, and no fixed format for the key content of messages; in addition, different banks have different writing habits. A bank must respond within a fixed time limit after receiving a query, so the system is gradually becoming an important carrier for communication between banks and clients. In a traditional payment system, these services are usually processed manually: service analysis, judgment, and processing are basically performed by service personnel, followed by subsequent posts such as rechecking, authorization, tracking, and supervision. The degree of automation is low and the consumption of human resources is large.
Disclosure of Invention
The invention aims to provide a financial business query and review method which can reduce the workload and the working difficulty of business personnel. Another object of the present invention is to provide a financial transaction query and review system. It is another object of the present invention to provide a computer apparatus. It is another object of the invention to provide a readable medium.
In order to achieve the above objects, the present invention discloses, in one aspect, a financial business query and review method, which comprises:
splitting and assembling the received query message to form a natural language field;
determining whether a natural language field is matched with a preset message classification according to a first matching rule, if so, determining a message type label of the query message, and if not, inputting the natural language field into a multi-label classification model obtained according to a machine learning technology to obtain the message type label of the query message;
and converting the obtained message type label into a service language and feeding back the service language to the user.
Preferably, the method further comprises, before converting the obtained message type label into a service language and feeding back to the user:
determining whether an unrecognized part exists in the query message, and if so, correcting the unrecognized part through a second matching rule to obtain a message type label of the unrecognized part.
Preferably, determining whether an unrecognized part exists in the query message and, if so, correcting the unrecognized part through the second matching rule to obtain its message type label specifically includes:
determining that the part of the natural language field for which no message type label was obtained is the unrecognized part;
and obtaining the message type label of the unrecognized part by a keyword matching method and/or a regular expression matching method in the second matching rule.
Preferably, determining whether the natural language field matches the preset message classification according to the first matching rule specifically includes:
matching the message type and/or the message field in the first matching rule against the natural language field to determine whether the natural language field matches the preset message classification.
Preferably, the method further comprises, prior to entering the natural language field into a multi-label classification model derived from machine learning techniques:
performing preprocessing operations on the natural language field, namely restoring domain words, removing domain stop words, removing symbols other than letters, and/or removing spaces, so that the preprocessed natural language field is input into the multi-label classification model.
Preferably, the step of inputting the query message into a multi-label classification model obtained according to a machine learning technique to obtain a message type label of the query message specifically includes:
vectorizing the natural language field through a TF-IDF algorithm to obtain a feature vector;
and inputting the characteristic vector into a preset multi-label classification model to obtain a message type label of the query message.
Preferably, the method further comprises the step of pre-constructing the multi-label classification model.
Preferably, the constructing the multi-label classification model specifically includes:
splitting and assembling the historical query message to form a natural language field;
determining whether the natural language field is matched with a preset message classification according to a first matching rule, and if so, removing the natural language field capable of being matched;
carrying out manual marking on the residual natural language fields after the matched natural language fields are removed to obtain training data;
preprocessing and vectorizing the training data to obtain feature vectors;
and learning the training data through a classifier to obtain a multi-label classification model.
The invention also discloses a financial business query and review system, which comprises:
the query message processing unit is used for splitting and assembling the received query message to form a natural language field;
the message label determining unit is used for determining whether the natural language field is matched with a preset message classification according to a first matching rule, if so, determining a message type label of the query message, and if not, inputting the natural language field into a multi-label classification model obtained according to a machine learning technology to obtain the message type label of the query message;
and the message conversion processing unit is used for converting the obtained message type label into a service language and feeding back the service language to the user.
Preferably, the system further includes a post-rule processing unit, configured to determine, before the obtained message type label is converted into service language and fed back to the user, whether an unrecognized part exists in the query message, and if so, to correct the unrecognized part according to a second matching rule to obtain a message type label of the unrecognized part.
Preferably, the post-rule processing unit is specifically configured to determine that the part of the natural language field for which no message type label was obtained is the unrecognized part, and to obtain the message type label of the unrecognized part by a keyword matching method and/or a regular expression matching method in the second matching rule.
Preferably, the message label determining unit is specifically configured to match the message type and/or the message field in the first matching rule against the natural language field to determine whether the natural language field matches a preset message classification.
Preferably, the message label determining unit is further configured to perform, before the natural language field is input into the multi-label classification model obtained by machine learning, preprocessing operations on the natural language field, namely restoring domain words, removing domain stop words, removing symbols other than letters, and/or removing spaces, so that the preprocessed natural language field is input into the multi-label classification model.
Preferably, the message label determining unit is specifically configured to vectorize the natural language field through a TF-IDF algorithm to obtain a feature vector, and to input the feature vector into a preset multi-label classification model to obtain a message type label of the query message.
Preferably, the system further comprises a model construction unit for pre-constructing the multi-label classification model.
Preferably, the model construction unit is specifically configured to: split and assemble historical query messages to form natural language fields; determine whether each natural language field matches a preset message classification according to the first matching rule and, if so, remove the natural language field that can be matched; manually mark the natural language fields remaining after the matched ones are removed to obtain training data; preprocess and vectorize the training data to obtain feature vectors; and learn from the feature vectors through a classifier to obtain the multi-label classification model.
The invention also discloses a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor,
the processor, when executing the program, implements the method as described above.
The invention also discloses a computer-readable medium, having stored thereon a computer program,
which when executed by a processor implements the method as described above.
To reduce the workload and working difficulty of business personnel, the invention discloses a query and review method for the financial field. A received query message is split and reassembled to form a natural language field. For a query message whose classification can be determined from the message type and/or message fields, the classification is determined directly through the first matching rule and a message type label is set. For a more complex query message whose classification cannot be determined directly, the natural language field is input into a multi-label classification model obtained by machine learning, and the model automatically identifies the message type label of the query message. Finally, all obtained message type labels are converted into service language and fed back to the user, thereby improving the efficiency of financial query and review.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a first flowchart of an embodiment of a financial business query and review method;
FIG. 2 is a second flowchart of an embodiment of a financial business query and review method;
FIG. 3 is a third flowchart of an embodiment of a financial business query and review method;
FIG. 4 is a fourth flowchart of an embodiment of a financial business query and review method;
FIG. 5 is a fifth flowchart of an embodiment of a financial business query and review method;
FIG. 6 is a first block diagram of an embodiment of a financial business query and review system;
FIG. 7 is a second block diagram of an embodiment of a financial business query and review system;
FIG. 8 is a third block diagram of an embodiment of a financial business query and review system;
FIG. 9 shows a schematic block diagram of a computer device suitable for use in implementing embodiments of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. The terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In one or more embodiments of the present application, machine learning refers to an interdisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures, continuously improving its own performance.
In one or more embodiments of the present application, a SWIFT message refers to a message format used in the financial field.
In one or more embodiments of the present application, query and review means that the bank provides query services such as international remittance correction, stop payment, and compliance for domestic and overseas businesses and customers. The main functions include receiving and processing SWIFT messages, core banking system transfer query messages, and the like for a single international remittance.
In one or more embodiments of the present application, natural language processing refers to an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers using natural language, and it integrates linguistics, computer science, and mathematics. Because the field deals with natural language, i.e. the language people use every day, it is closely related to linguistics, but with an important difference: natural language processing is not a general study of natural language, but is directed to developing computer systems, and particularly software systems, that can effectively carry out natural language communication. It is thus part of computer science.
In one or more embodiments of the present application, a natural language field refers to a field specified in a SWIFT query and review message for holding the content to be queried or replied to; such fields include field 75, field 76, field 77A, and field 79.
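As a rough illustration of how such fields might be pulled out of a message and reassembled, the following minimal sketch assumes that each field in the text block begins on its own line in the form ":TAG:content" and that untagged lines continue the previous field. The function name and the sample text are hypothetical and are not taken from the application.

```python
import re

# Field tags that the description treats as natural language fields.
NATURAL_LANGUAGE_TAGS = {"75", "76", "77A", "79"}

# Assumed layout: a field starts on its own line as ":TAG:content";
# lines without a tag continue the previous field.
TAG_LINE = re.compile(r"^:(\d{2}[A-Z]?):(.*)$")


def extract_natural_language_field(text_block: str) -> str:
    """Split the text block into tagged fields and reassemble the
    natural language ones into a single string."""
    parts = []
    keep = False
    for line in text_block.splitlines():
        match = TAG_LINE.match(line.strip())
        if match:
            tag, content = match.groups()
            keep = tag in NATURAL_LANGUAGE_TAGS
            line = content
        if keep and line.strip():
            parts.append(line.strip())
    return " ".join(parts)


# Made-up sample text, for illustration only.
sample = (
    ":20:REF-0001\n"
    ":21:RELATED-0001\n"
    ":79:PLEASE CONFIRM RECEIPT\n"
    "OF OUR PAYMENT ORDER\n"
)
print(extract_natural_language_field(sample))
```

Fields 20 and 21 are skipped here because they carry references rather than free text; only the content of the listed natural language fields is joined.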
In one or more embodiments of the present application, a multi-label classification means that a message may belong to multiple classifications at the same time. For example, a press release may belong to political, economic, and historical categories.
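One common way to represent multi-label targets, shown here only as an illustrative sketch, is a binary indicator vector with one position per possible label. The label names below reuse the press release example and the helper name is hypothetical.

```python
def to_indicator(labels, all_labels):
    """Encode a set of labels as a 0/1 vector over a fixed label order."""
    return [1 if name in labels else 0 for name in all_labels]


ALL_LABELS = ["politics", "economy", "history"]

# A press release that is both political and historical.
print(to_indicator({"politics", "history"}, ALL_LABELS))  # [1, 0, 1]
```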
In one or more embodiments of the present application, stop words are words or phrases that are automatically filtered out before or after processing natural language data (or text) in order to save storage space and improve search efficiency in information retrieval.
In one or more embodiments of the present application, vectorization refers to converting text into a vector representation.
In one or more embodiments of the present application, training data refers to a set of data used for machine learning; it is the input to machine learning.
In one or more embodiments of the present application, marking refers to manually assigning a tag to a query reply message.
In one or more embodiments of the present application, TF-IDF refers to a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears across the corpus.
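The weighting can be sketched in a few lines. The version below is the plain textbook form (term frequency times the logarithm of the inverse document frequency); the application does not state which variant or smoothing is used, so the formula, the function name, and the sample tokens are all assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = occurrences of the term in the document / document length
    idf = log(number of documents / documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up tokenized messages.
docs = [
    ["please", "confirm", "payment"],
    ["please", "cancel", "payment"],
    ["beneficiary", "claims", "non", "receipt"],
]
weights = tf_idf(docs)
```

In this toy corpus, "confirm" appears in one document and "please" in two, so "confirm" receives the higher weight in the first document; a term that appears in every document would receive a weight of zero.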
In one or more embodiments of the present application, a random forest refers to an algorithm that combines a plurality of trees following the idea of ensemble learning. Its basic unit is a decision tree, and it belongs to the ensemble learning branch of machine learning. The algorithm has excellent accuracy and runs effectively on large data sets; it can process input samples with high-dimensional features without dimensionality reduction; it can evaluate the importance of each feature for the classification problem; during construction it yields an unbiased estimate of the internal generalization error; and it can also achieve good results when values are missing.
In a traditional payment system, these services are usually processed manually: service analysis, judgment, and processing are basically performed by service personnel, followed by subsequent posts such as rechecking, authorization, tracking, and supervision. The degree of automation is low and the consumption of human resources is large. In the field of machine learning, natural language processing can handle text data well. However, when machine learning is applied to query and review services in the financial industry, natural language processing alone cannot achieve good results, because these services involve many professional terms, many financial abbreviations, complex service scenarios, and no fixed format for the key content of messages.
Based on this, according to one aspect of the invention, the embodiment discloses a financial business query and review method. In this embodiment, as shown in FIG. 1, the method includes:
S100: splitting and assembling the received query message to form a natural language field.
S200: determining whether the natural language field matches a preset message classification according to a first matching rule; if so, determining a message type label of the query message, and if not, inputting the natural language field into a multi-label classification model obtained by machine learning to obtain the message type label of the query message.
S300: converting the obtained message type label into service language and feeding it back to the user.
To reduce the workload and working difficulty of business personnel, the invention discloses a query and review method for the financial field. A received query message is split and reassembled to form a natural language field. For a query message whose classification can be determined from the message type and/or message fields, the classification is determined directly through the first matching rule and a message type label is set. For a more complex query message whose classification cannot be determined directly, the natural language field is input into a multi-label classification model obtained by machine learning, and the model automatically identifies the message type label of the query message. Finally, all obtained message type labels are converted into service language and fed back to the user, thereby improving the efficiency of financial query and review.
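The control flow of S200 and S300 (rule first, model as fallback, then translation into service language) can be sketched as follows. This is only an illustration under stated assumptions: the label names, the service-language texts, and the stand-in rule and model functions are hypothetical and do not come from the application.

```python
from typing import Callable, List, Optional

# Hypothetical mapping from internal labels to service-language text.
SERVICE_LANGUAGE = {
    "CANCEL_PAYMENT": "Request to cancel the payment",
    "NON_RECEIPT": "Beneficiary claims non-receipt of funds",
}


def classify(
    field: str,
    first_rule: Callable[[str], Optional[List[str]]],
    model: Callable[[str], List[str]],
) -> List[str]:
    """S200: use the rule result on a hit, otherwise fall back to the model."""
    labels = first_rule(field)
    if labels is not None:
        return labels
    return model(field)


def to_service_language(labels: List[str]) -> List[str]:
    """S300: translate labels into text for the user."""
    return [SERVICE_LANGUAGE.get(label, label) for label in labels]


# Stand-ins for a real rule set and a trained model (assumptions).
def demo_rule(field: str) -> Optional[List[str]]:
    return ["CANCEL_PAYMENT"] if field.startswith("PLEASE CANCEL") else None


def demo_model(field: str) -> List[str]:
    return ["NON_RECEIPT"] if "NOT RECEIVED" in field else []


labels = classify("BENEFICIARY HAS NOT RECEIVED THE FUNDS", demo_rule, demo_model)
print(to_service_language(labels))
```

Returning `None` from the rule on a miss, rather than an empty list, keeps "no rule applied" distinct from "a rule applied and produced no labels".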
In a preferred embodiment, as shown in FIG. 2, the method further comprises, before S300:
S210: determining whether an unrecognized part exists in the query message, and if so, correcting the unrecognized part through a second matching rule to obtain a message type label of the unrecognized part. In the preferred embodiment, a post-rule is set. The multi-label classification model can identify the content of the different fields within the natural language field and obtain message type labels corresponding to the different parts of the natural language field. In some cases, however, the natural language field may contain content that the model cannot identify. Therefore, after model identification, a second matching rule can be set to correct the unrecognized part of the natural language field.
In a preferred embodiment, as shown in FIG. 3, S210 may specifically include:
S211: determining that the part of the natural language field for which no message type label was obtained is the unrecognized part.
S212: obtaining the message type label of the unrecognized part by a keyword matching method and/or a regular expression matching method in the second matching rule.
It will be appreciated that some message classifications can be determined entirely by keywords, special patterns, and the like. In the post-correction process, it is first determined whether the query message contains unrecognized content; a message type label is then determined for the unrecognized part through the second matching rule, thereby further correcting the classification of the query message. Preferably, the second matching rule may include a keyword matching method and/or a regular expression matching method. In practical applications, other matching rules may also be selected to further process the natural language field after model processing, which is not limited by the present invention.
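A post-rule of this kind might look like the sketch below, which checks each unrecognized span against a small table of keywords and regular expressions. The rule table, label names, and function name are hypothetical; real rules would be written by business personnel.

```python
import re

# Hypothetical second matching rules: (label, keywords, regex patterns).
SECOND_RULES = [
    ("CANCEL_PAYMENT", ["CANCEL", "STOP PAYMENT"], []),
    ("AMOUNT_QUERY", [], [re.compile(r"\bAMOUNT OF \d+(\.\d+)?")]),
]


def post_correct(unrecognized: str) -> list:
    """Return every label whose keywords or patterns hit the text."""
    text = unrecognized.upper()
    labels = []
    for label, keywords, patterns in SECOND_RULES:
        keyword_hit = any(keyword in text for keyword in keywords)
        pattern_hit = any(pattern.search(text) for pattern in patterns)
        if keyword_hit or pattern_hit:
            labels.append(label)
    return labels


print(post_correct("in the amount of 175.00 dated 20190528"))
```

Because the task is multi-label, the function collects all matching labels instead of stopping at the first hit.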
In a preferred embodiment, determining in S200 whether the natural language field matches the preset message classification according to the first matching rule may specifically include:
S201: matching the message type and/or the message field in the first matching rule against the natural language field to determine whether the natural language field matches the preset message classification.
It can be understood that, in query and review, some messages do not need to be processed by the model: their classification can be determined from the message type and/or fixed fields in the message alone. Pre-rules are therefore set in this step, and the classification of a message is determined when it hits a rule. Preferably, the first matching rule may include matching of a message type and/or a message field. In practical applications, other types of matching rules may also be set, which is not limited by the present invention.
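As an illustration only, a pre-rule table keyed on message type and on the presence of certain fields could be checked as follows. The message types, field tags, and labels in the table are placeholder values chosen for the sketch, not rules disclosed in the application.

```python
from typing import Dict, List, Optional

# Hypothetical pre-rules: a message type plus fields that must be present.
FIRST_RULES = [
    {"message_type": "192", "required_fields": [], "labels": ["CANCEL_REQUEST"]},
    {"message_type": "199", "required_fields": ["21", "79"], "labels": ["RELATED_QUERY"]},
]


def match_first_rule(message_type: str, fields: Dict[str, str]) -> Optional[List[str]]:
    """Return the labels of the first rule that hits, or None on a miss."""
    for rule in FIRST_RULES:
        if rule["message_type"] != message_type:
            continue
        if all(tag in fields for tag in rule["required_fields"]):
            return rule["labels"]
    return None


print(match_first_rule("192", {"20": "REF-0001"}))
```

A `None` result signals that the message should go on to preprocessing and the multi-label classification model.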
In a preferred embodiment, the method further comprises, before the natural language field is input into the multi-label classification model obtained by machine learning:
S202: performing preprocessing operations on the natural language field, namely restoring domain words, removing domain stop words, removing symbols other than letters, and/or removing spaces, so that the preprocessed natural language field is input into the multi-label classification model.
It is understood that a query message may contain a large number of professional abbreviations, which can be restored to complete words or phrases expressed in natural language by consulting a domain dictionary for the financial field. Training data can be generated by sampling from all query and review messages of the past year. During marking, business personnel collate the query and review terms used in the messages to generate the domain dictionary.
In addition, general-purpose stop words do not work well in the query and review field: some common stop words have special meanings in this field, and some words that are meaningful in general have no meaning here. Therefore, word frequency statistics are computed over the query and review messages after word segmentation, and business personnel collate the results to obtain the domain stop words. Performing the preprocessing operations of restoring domain words, removing domain stop words, removing symbols other than letters, and/or removing spaces on the natural language field before model identification can improve the accuracy of model identification and avoid missing message type labels.
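The preprocessing of S202 can be sketched as below. The dictionary entries and stop words are invented for illustration; in the described method both lists are compiled by business personnel. The sketch strips non-letter symbols first so that tokens can be looked up cleanly, which is an ordering choice of this example rather than something the application prescribes.

```python
import re

# Hypothetical domain dictionary (abbreviation -> full word) and stop words.
DOMAIN_DICTIONARY = {"BENE": "BENEFICIARY", "PLS": "PLEASE", "ACCT": "ACCOUNT"}
DOMAIN_STOP_WORDS = {"REGARDS", "DEAR", "THANKS"}


def preprocess(field: str) -> list:
    """Clean one natural language field and return its tokens."""
    # Replace every run of non-letter characters with a single space.
    letters_only = re.sub(r"[^A-Za-z]+", " ", field).upper()
    # split() also discards leading, trailing, and repeated spaces.
    tokens = letters_only.split()
    # Restore domain abbreviations to full words.
    restored = [DOMAIN_DICTIONARY.get(token, token) for token in tokens]
    # Drop domain stop words.
    return [token for token in restored if token not in DOMAIN_STOP_WORDS]


print(preprocess("Pls confirm BENE acct 12345, regards."))
```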
In a preferred embodiment, as shown in FIG. 4, inputting the natural language field of the query message into the multi-label classification model obtained by machine learning in S200 to obtain the message type label of the query message specifically includes:
S203: vectorizing the natural language field through a TF-IDF algorithm to obtain a feature vector.
S204: inputting the feature vector into the preset multi-label classification model to obtain the message type label of the query message.
In a preferred embodiment, the method further comprises a step S000 of pre-constructing the multi-label classification model.
In a preferred embodiment, as shown in FIG. 5, S000 specifically includes:
S010: splitting and assembling historical query messages to form natural language fields.
S020: determining whether each natural language field matches the preset message classification according to the first matching rule, and if so, removing the natural language field that can be matched. A message that is successfully classified by the first matching rule has its message type label determined without the model; such data is never input to the model for identification, so it contributes nothing to training and needs to be screened out.
S030: manually marking the natural language fields remaining after the matched ones are removed to obtain training data.
S040: preprocessing and vectorizing the training data to obtain feature vectors. As in model identification, the feature vectors can be obtained by vectorizing the natural language field with TF-IDF.
S050: learning from the feature vectors through a classifier to obtain the multi-label classification model. Preferably, the classifier can be a random forest classifier.
In a preferred embodiment, the method further comprises:
S060: evaluating the obtained message type labels of the query messages. The message type labels are evaluated to determine whether the classification quality reaches a preset standard. If it does, the query and reply process can be carried out accurately; if it does not, there is a problem in the model or the rules used in the process, and they need to be adjusted promptly so that the query and reply service is completed accurately and efficiently. Specifically, not all of the manually marked training data obtained in S030 is used for training; usually 70% or 80% of it is selected for training. The remaining data is used for evaluation and, since it has also been manually marked, it already has a label A. After the classification model is obtained in S050, the model is used to predict the remaining data, giving a classification label B. Comparing label A with label B shows whether the model classifies the messages accurately. There are many ways to calculate an evaluation score; in multi-label classification, the F1-Score is generally used. After evaluation, if the two labels are inconsistent, the cause of the error needs to be analyzed by reading the misclassified messages, so that the model or the rules can be adjusted.
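The hold-out split and the comparison of label A with label B can be sketched as follows. The patent only states that the F1-Score is generally used; the micro-averaged form shown here is one common choice and is an assumption, as are the function names and the fixed random seed.

```python
import random


def split_train_eval(samples, train_ratio=0.8, seed=0):
    """Shuffle the manually marked samples and split them into training and evaluation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def micro_f1(labels_a, labels_b):
    """Micro-averaged F1 between manual labels (A) and predicted labels (B),
    where each message carries a set of label codes."""
    tp = fp = fn = 0
    for truth, pred in zip(labels_a, labels_b):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed by the model
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Messages whose label sets differ between A and B are the ones to read manually when looking for the cause of misclassification.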
The invention will be further illustrated by means of a specific example. In a specific example, the query message is {1: F01 BKKCHCNBJAXXX 2896061717} {2: O1990941190628CITIUS33HXXX21777252751906282142N } {3: 90628AUU51927USA } } {4:
:20:CIT190620-010448
:21:S069148212C701
:79:THIS IS A URGENT MESSAGE BEING SENT
WITH REGARDS TO OUR PAYMENT ORDER S069148212C701
IN THE AMOUNT OF 175.00DATED 20190528.
WE HAVE BEEN CONTACTED BY THE REMITTING PARTY
WHO STATES THAT THE BENEFICIARY IS CLAIMING NON
RECEIPT OF THE PAYMENT.PLEASE CONFIRM THE
AMOUNT,VALUE DATE,REFERENCE AND ACCOUNT
CREDITED.
PLEASE QUOTE OUR CASE REFERENCE CIT190620-010448
ON ANY FUTURE FOLLOW UP REGARDING THIS MATTER.
REGARDS,
USD FT INVESTIGATIONS
-}{5:{CHK:1F51569FFCDB}}
Here, 20, 21 and 79 are message fields; their meanings are fully defined by the corresponding SWIFT message specifications. A message does not need to contain every field; only the fields that are used are written into the message. The natural language fields are fields 75, 76, 77A and 79. In this example, the message contains only field 79. If a message contains several natural language fields, their contents are spliced together.
Because SWIFT messages follow a specification, they can be split according to that specification. For example, natural language field 79 is extracted as follows:
THIS IS A URGENT MESSAGE BEING SENT
WITH REGARDS TO OUR PAYMENT ORDER S069148212C701
IN THE AMOUNT OF 175.00DATED 20190528.
WE HAVE BEEN CONTACTED BY THE REMITTING PARTY
WHO STATES THAT THE BENEFICIARY IS CLAIMING NON
RECEIPT OF THE PAYMENT.PLEASE CONFIRM THE
AMOUNT,VALUE DATE,REFERENCE AND ACCOUNT
CREDITED.
PLEASE QUOTE OUR CASE REFERENCE CIT190620-010448
ON ANY FUTURE FOLLOW UP REGARDING THIS MATTER.
REGARDS,
USD FT INVESTIGATIONS
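The splitting and assembling step can be sketched with a simple regular expression over the text block of the message. This is an illustrative simplification rather than a full SWIFT parser: it assumes each field starts on its own line with a ":tag:" marker and that the block ends with "-}", as in the example above. The function names are hypothetical.

```python
import re

# Natural language fields named in the description.
NATURAL_LANGUAGE_TAGS = ("75", "76", "77A", "79")
# A field starts at the beginning of a line with ":<two digits><optional letter>:".
FIELD_START = re.compile(r"^:(\d{2}[A-Z]?):", re.MULTILINE)


def split_fields(text_block):
    """Split the text block of a message into (tag, content) pairs."""
    matches = list(FIELD_START.finditer(text_block))
    fields = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text_block)
        content = text_block[match.end():end]
        # Drop the "-}" end-of-block marker if it trails the last field.
        content = content.split("\n-}")[0].strip()
        fields.append((match.group(1), content))
    return fields


def natural_language_field(text_block):
    """Splice together the contents of fields 75, 76, 77A and 79, in message order."""
    parts = [content for tag, content in split_fields(text_block)
             if tag in NATURAL_LANGUAGE_TAGS]
    return "\n".join(parts)
```

Applied to the example message, `split_fields` would yield the tags 20, 21 and 79, and `natural_language_field` would return only the free text of field 79.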
After the extracted field is preprocessed by restoring domain words, removing domain stop words, removing symbols other than letters and/or removing spaces, the following is obtained:
URGENT MESSAGE BEING SENT WITH REGARDS OUR PAYMENT ORDER SC
IN THE AMOUNT DATED WE HAVE BEEN CONTACTED BY REMITTINGPARTY WHOSTATES THAT BENEFICIARY CLAIMING NON RECEIPT PAYMENTCONFIRM AMOUNT VALUE DATEREFERENCE ACCOUNT CREDITED QUOTEOUR CASE REFERENCE CIT ON ANY FUTURE FOLLOWUP REGARDING MATTERUSD FT INVESTIGATIONS。
The preprocessed message is then handled in S200: the rules first determine whether the message hits a rule, and if so, the message is labeled. If not, the message is sent to the model for judgment. After the model judgment, the result is corrected using the post rules. The resulting labels of the message, e.g. '05' and '06', are the codes used for the labels inside the back-end system. When returned to the user, they are converted into the corresponding business descriptions; for example, '05' and '06' are converted into 'check and settlement' and 'urge to check'.
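The overall flow of pre-rule matching, model judgment, post-rule correction and conversion to business language can be sketched as follows. The rules and the model are passed in as plain callables and are hypothetical; only the two label codes and their descriptions come from the example above. Applying the post rules only on the model path is an assumption based on the order described here.

```python
# Label codes from the example and their business descriptions.
LABEL_DESCRIPTIONS = {"05": "check and settlement", "06": "urge to check"}


def classify(field, pre_rules, model, post_rules):
    """Pre-rule matching first; on a miss, model judgment followed by post-rule correction.

    pre_rules and post_rules are lists of (predicate, label_code) pairs;
    model is a callable that returns label codes for a field.
    """
    labels = [code for matches, code in pre_rules if matches(field)]
    if not labels:
        # No pre-rule hit: fall back to the multi-label model.
        labels = list(model(field))
        # Post rules add labels that the model did not identify.
        for matches, code in post_rules:
            if matches(field) and code not in labels:
                labels.append(code)
    return labels


def to_business_language(labels):
    """Convert internal label codes into the descriptions shown to the user."""
    return [LABEL_DESCRIPTIONS.get(code, code) for code in labels]
```

Unknown codes are passed through unchanged by `to_business_language`, so a missing description does not hide a label from the user.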
In summary, the present invention can perform multi-label classification for each message, so that service personnel know the classification of a message before reading it. A natural language processing model is built by constructing a query and reply domain dictionary, constructing domain stop words, and using a vectorization method and a machine learning method suited to query and reply messages. By adding pre-rules and post-rules, a processing method consisting of pre-rule preprocessing, model judgment and post-rule correction is formed according to the characteristics of query and reply service messages. This achieves a very good multi-label classification effect and improves the processing efficiency of the query and reply service in the financial industry.
Based on the same principle, the embodiment also discloses a financial business query and review system. As shown in fig. 6, in this embodiment, the system includes a query message processing unit 11, a message label determining unit 12, and a message conversion processing unit 13.
The query message processing unit 11 is configured to split and assemble a received query message to form a natural language field;
The message label determining unit 12 is configured to determine whether the natural language field matches a preset message classification according to a first matching rule, to determine the message type label of the query message if so, and otherwise to input the natural language field into a multi-label classification model obtained according to a machine learning technology to obtain the message type label of the query message;
The message conversion processing unit 13 is configured to convert the obtained message type label into a service language and feed back the service language to the user.
In a preferred embodiment, as shown in fig. 7, the system further comprises a post-rule processing unit 14. The post rule processing unit 14 is configured to determine whether an unidentified part exists in the query message before converting the obtained message type tag into a service language and feeding back the service language to the user, and if so, correct the unidentified part through a second matching rule to obtain a message type tag of the unidentified part.
In a preferred embodiment, the post-rule processing unit 14 is specifically configured to determine that a part of the natural language field for which no message type label has been identified is an unidentified part, and to obtain the message type label of the unidentified part by a keyword matching method and/or a regular expression matching method in a second matching rule.
In a preferred embodiment, the message label determining unit 12 is specifically configured to match the message type and/or the message field in the first matching rule with the natural language field to determine whether the natural language field matches a preset message classification.
In a preferred embodiment, the message tag determining unit 12 is further configured to perform preprocessing operations of restoring a domain word, removing a domain stop word, removing a symbol other than a letter, and/or removing a space on the natural language field to input the preprocessed natural language field into the multi-tag classification model before inputting the natural language field into the multi-tag classification model obtained according to the machine learning technology.
In a preferred embodiment, the message tag determining unit 12 is specifically configured to perform vectorization on the natural language field through a TF-IDF algorithm to obtain a feature vector, and input the feature vector into a preset multi-tag classification model to obtain a message type tag of the query message.
In a preferred embodiment, the system further comprises a model building unit 10. The model building unit 10 is used to build the multi-label classification model in advance.
In a preferred embodiment, as shown in fig. 8, the model building unit 10 is specifically configured to split and assemble a historical query message to form a natural language field, determine whether the natural language field matches a preset message classification according to a first matching rule, remove the natural language field that can be matched if the natural language field matches the preset message classification, manually mark the remaining natural language field from which the natural language field that can be matched is removed to obtain training data, pre-process and vectorize the training data to obtain a feature vector, and learn the training data through a classifier to obtain a multi-label classification model.
Since the principle of the system for solving the problem is similar to the above method, the implementation of the system can refer to the implementation of the method, and the detailed description is omitted here.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the computer device specifically comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method performed by the client as described above when executing the program, or the processor implementing the method performed by the server as described above when executing the program.
Referring now to FIG. 9, shown is a schematic diagram of a computer device 600 suitable for use in implementing embodiments of the present application.
As shown in fig. 9, the computer device 600 includes a Central Processing Unit (CPU) 601 which can perform various appropriate operations and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the computer device 600. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, as well as a speaker and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (18)

1. A financial service inquiry and review method is characterized by comprising the following steps:
splitting and assembling the received query message to form a natural language field;
determining whether a natural language field is matched with a preset message classification according to a first matching rule, if so, determining a message type label of the query message, and if not, inputting the natural language field into a multi-label classification model obtained according to a machine learning technology to obtain the message type label of the query message;
and converting the obtained message type label into a service language and feeding back the service language to the user.
2. The financial transaction query review method of claim 1, further comprising, prior to converting the obtained message type tag to a transaction language and feeding back to the user:
and determining whether the unrecognized part exists in the query message, and if so, correcting the unrecognized part through a second matching rule to obtain a message type label of the unrecognized part.
3. The financial transaction query and review method according to claim 2, wherein the determining whether the unrecognized portion exists in the query message, and if so, the obtaining of the message type label of the unrecognized portion by correcting the unrecognized portion through the second matching rule specifically includes:
determining that the part of the natural language field which is not identified to obtain the message type label is an unidentified part;
and obtaining the message type label of the unidentified part by a keyword matching method and/or a regular matching method in a second matching rule.
4. The financial transaction query and review method according to claim 1, wherein the determining whether the natural language field matches the predetermined message type according to the first matching rule specifically comprises:
and matching the message type and/or the message field in the first matching rule with the natural language field to determine whether the natural language field is matched with the preset message classification.
5. The financial transaction query review method of claim 1, further comprising, prior to entering the natural language field into a multi-label classification model derived from machine learning techniques:
and performing preprocessing operations of domain word reduction, domain stop word removal, symbol removal except letters and/or blank space removal on the natural language field to input the preprocessed natural language field into the multi-label classification model.
6. The financial transaction query and review method according to claim 1, wherein the inputting the query message into a multi-label classification model obtained according to a machine learning technique to obtain a message type label of the query message specifically comprises:
vectorizing the natural language field through a TF-IDF algorithm to obtain a feature vector;
and inputting the characteristic vector into a preset multi-label classification model to obtain a message type label of the query message.
7. The financial transaction query review method of claim 1 further including the step of pre-constructing the multi-label classification model.
8. The financial transaction query review method of claim 7, wherein the constructing the multi-label classification model specifically comprises:
splitting and assembling the historical query message to form a natural language field;
determining whether the natural language field is matched with a preset message classification according to a first matching rule, and if so, removing the natural language field capable of being matched;
carrying out manual marking on the residual natural language fields after the matched natural language fields are removed to obtain training data;
preprocessing and vectorizing the training data to obtain a feature vector;
and learning the training data through a classifier to obtain a multi-label classification model.
9. A financial transaction query review system, comprising:
the query message processing unit is used for splitting and assembling the received query message to form a natural language field;
the message label determining unit is used for determining whether the natural language field is matched with a preset message classification according to a first matching rule, if so, determining a message type label of the query message, and if not, inputting the natural language field into a multi-label classification model obtained according to a machine learning technology to obtain the message type label of the query message;
and the message conversion processing unit is used for converting the obtained message type label into a service language and feeding back the service language to the user.
10. The financial transaction query and review system of claim 9, further comprising a post-rule processing unit, configured to determine whether an unidentified portion exists in the query message before converting the obtained message type tag into a service language and feeding back the service language to the user, and if so, correct the unidentified portion by a second matching rule to obtain a message type tag of the unidentified portion.
11. The financial transaction query and review system according to claim 10, wherein the post-rule processing unit is specifically configured to determine that a part of the natural language field that is not identified to obtain the packet type tag is an unidentified part, and obtain the packet type tag of the unidentified part by a keyword matching method and/or a regular matching method in a second matching rule.
12. The financial transaction query and review system of claim 9, wherein the packet tag determination unit is specifically configured to match the packet type and/or the packet field in the first matching rule with a natural language field to determine whether the natural language field matches a preset packet classification.
13. The financial transaction query review system of claim 9 wherein the message tag determination unit is further configured to perform preprocessing operations of domain word reduction, domain stop word removal, removal of symbols other than letters, and/or space removal on the natural language field to input the preprocessed natural language field into the multi-tag classification model before inputting the natural language field into the multi-tag classification model obtained according to machine learning techniques.
14. The financial transaction query and review system of claim 9, wherein the message tag determination unit is specifically configured to obtain a feature vector by vectorizing the natural language field through a TF-IDF algorithm, and input the feature vector into a preset multi-tag classification model to obtain a message type tag of the query message.
15. The financial transaction query review system of claim 9 further comprising a model construction unit for the step of pre-constructing the multi-label classification model.
16. The financial transaction query and review system of claim 15, wherein the model construction unit is specifically configured to split and assemble a historical query message to form a natural language field, determine whether the natural language field matches a preset message classification according to a first matching rule, remove the natural language field that can be matched if the natural language field matches the preset message classification, manually mark the remaining natural language field after the natural language field that can be matched is removed to obtain training data, pre-process and vectorize the training data to obtain a feature vector, and learn the training data through a classifier to obtain a multi-label classification model.
17. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor,
the processor, when executing the program, implements the method of any of claims 1-8.
18. A computer-readable medium, having stored thereon a computer program,
the program when executed by a processor implementing the method according to any one of claims 1-8.
CN201911224951.6A 2019-12-04 2019-12-04 Financial business query and review method and system Pending CN111062803A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911224951.6A CN111062803A (en) 2019-12-04 2019-12-04 Financial business query and review method and system

Publications (1)

Publication Number Publication Date
CN111062803A true CN111062803A (en) 2020-04-24

Family

ID=70299751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911224951.6A Pending CN111062803A (en) 2019-12-04 2019-12-04 Financial business query and review method and system

Country Status (1)

Country Link
CN (1) CN111062803A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992763A (en) * 2017-12-29 2019-07-09 北京京东尚科信息技术有限公司 Language marks processing method, system, electronic equipment and computer-readable medium
CN110322337A (en) * 2019-04-18 2019-10-11 中国工商银行股份有限公司 A kind of inquiry business looks into multiple method and device automatically
CN110347784A (en) * 2019-05-23 2019-10-18 深圳壹账通智能科技有限公司 Report form inquiring method, device, storage medium and electronic equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113014558A (en) * 2021-02-10 2021-06-22 中国工商银行股份有限公司 Message identification method, device, computer system and readable storage medium
CN113014558B (en) * 2021-02-10 2022-12-27 中国工商银行股份有限公司 Message identification method, device, computer system and readable storage medium
CN113129120A (en) * 2021-04-16 2021-07-16 建信金融科技有限责任公司 Financial institution data supervision method and device
CN113377670A (en) * 2021-06-30 2021-09-10 大商所飞泰测试技术有限公司 Self-adaptive high-performance transaction simulation method and system suitable for financial industry
CN115473856A (en) * 2022-09-07 2022-12-13 中国银行股份有限公司 Message checking method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200424