CN106844625B - Method and device for checking compliance of bank operation and maintenance regulation and change - Google Patents


Publication number
CN106844625B
Authority
CN
China
Prior art keywords
document
documents
change
matching
bank
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710040985.4A
Other languages
Chinese (zh)
Other versions
CN106844625A (en
Inventor
徐华
詹立雄
邓俊辉
石炎军
孙晓民
楼浩
郭京生
李佳
张帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Trust & Far Technology Co ltd
Tsinghua University
Original Assignee
Beijing Trust & Far Technology Co ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Trust & Far Technology Co ltd, Tsinghua University filed Critical Beijing Trust & Far Technology Co ltd
Priority to CN201710040985.4A priority Critical patent/CN106844625B/en
Publication of CN106844625A publication Critical patent/CN106844625A/en
Application granted granted Critical
Publication of CN106844625B publication Critical patent/CN106844625B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02Banking, e.g. interest calculation or account maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Computational Linguistics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for checking the compliance of changes to bank operation and maintenance rules and regulations. The method comprises the following steps: acquiring system documents of a bank system; performing lexicalization (word segmentation) on the system documents to establish an inverted index; performing synonym expansion on the words of the system documents to obtain structurally stored system documents; acquiring a change document; and, during index matching, performing short-text similarity matching between the structurally stored system documents and the paragraphs of the change document to obtain a query result. Because the query result is obtained from the structurally stored system documents and the change document, the method improves the accuracy and efficiency of checking and is simple and easy to implement.

Description

Method and device for checking compliance of bank operation and maintenance regulation and change
Technical Field
The invention relates to the technical field of computer application and banks, in particular to a method and a device for checking the compliance of bank operation and maintenance regulation and regulation change.
Background
At present, the safety and efficiency of bank systems are particularly important; safety in particular is the lifeline of a bank system. Nevertheless, large-scale faults still occur in banks from time to time. Such faults are not caused by errors in front-office work, because the integrated transaction procedures of the bank front office almost entirely prevent human error, and any errors that do occur are small in scale, affecting one or two transactions. Large-scale faults are usually caused by failures of the back-end system, so avoiding bank faults more effectively requires attention to the back end. However, the back-end systems of banks are often very complex and the causes of faults are varied: the network linking banks, the database that records data at the back end, or the server that runs the transaction programs may malfunction. A single fault often sets off a chain reaction. For example, when the database is paralyzed, transaction requests pile up and the server runs short of resources; conversely, if the server leaks memory, system resources gradually shrink until the database lacks the resources it needs and finally breaks down. The dependencies within the back end are therefore quite complex, and it is almost impossible to determine the cause of a fault directly by rule-based methods. Although faults are rare, they are not without pattern. According to experience in banks, a system often exhibits abnormal states before a fault occurs, and the state of the system is usually easier to monitor than the fault itself. By monitoring and analyzing system parameters in real time, it is possible to predict when a fault will occur; fault prediction is also an important research field in artificial intelligence.
An accurate fault prediction can alert people before the fault occurs so that appropriate measures, such as troubleshooting, data backup, and restarting software or hardware, can be taken. The stability of a system can be evaluated by two indicators: reliability and availability. Reliability refers to the probability of system failure. For a bank system, reliability is usually very high, that is, failures are rare, so it is difficult to improve system performance from the standpoint of reliability. Availability refers to the length of time required for the system to recover after a fault, and this indicator is also very important in practice. With fault prediction, countermeasures can be prepared in advance, which speeds up recovery and improves availability, so that system performance improves at a given level of reliability. In addition, once the system parameters related to a fault are known, the reliability of the system can be improved to some extent by manually limiting and adjusting those parameters to avoid the fault in advance.
On the other hand, if a fault has already occurred, a way to remove it must be found: the machine may simply be restarted, or the location of the fault may be identified so that the underlying problem can be solved, and so on. The choice among these methods depends on many factors, the most important being how much transaction volume the bank can afford to lose during the fault. During a peak transaction period, the system is usually restarted directly so that it recovers as soon as possible; if transactions are not so intensive, possible causes can be checked one by one to find the source of the fault, analyze its cause, and prevent the same fault from recurring.
Because banking systems are confidential, it is difficult to find literature on fault prediction for banking transaction systems. Fault prediction, however, has long been a major direction in the field of artificial intelligence. Methods for predicting system faults have been studied for more than 30 years, and they have evolved as systems have become more complex. In recent research, fault prediction can be roughly summarized as the following process: data acquisition, key feature extraction, dimension reduction, model training, and algorithm evaluation.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide a method for checking compliance of bank operation and maintenance regulation and change, which can improve the accuracy and efficiency of checking and is simple and easy to implement.
Another objective of the present invention is to provide a device for checking the compliance of bank operation and maintenance regulation changes.
In order to achieve the above object, an embodiment of one aspect of the present invention provides a method for checking the compliance of bank operation and maintenance regulation changes, including the following steps: acquiring system documents of a bank system; performing lexicalization on the system documents to establish an inverted index; performing synonym expansion on the words of the system documents to obtain structurally stored system documents; acquiring a change document; and, during index matching, performing short-text similarity matching between the structurally stored system documents and the paragraphs of the change document to obtain a query result.
According to the method of the embodiment of the present invention, the query result is obtained from the structurally stored system documents and the change document, which achieves the purpose of checking the compliance of bank operation and maintenance regulation changes. Quick backtracking and searching of regulation changes lets the bank troubleshoot and locate problems conveniently and quickly. In addition, because indexes are built for historical documents, past operation and maintenance changes can be understood clearly and intuitively. The method improves the accuracy and efficiency of checking and is simple and easy to implement.
In addition, the method for checking the compliance of the bank operation and maintenance regulation change according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the index matching further includes: obtaining matching items of the target file according to the structurally stored system documents and the change document, so as to obtain the history of regulation changes.
Optionally, in an embodiment of the present invention, the matching items of the target file are obtained by calculating a correlation coefficient.
Further, in an embodiment of the present invention, lexicalizing the system documents further comprises: segmenting the word sequence of a system document into a plurality of single words; and combining the plurality of single words into a new word sequence according to a preset specification.
Further, in an embodiment of the present invention, the method further includes: after checking the compliance of the bank operation and maintenance regulation change according to the query result, storing the query result in the local file system as a file named after the change document.
In order to achieve the above object, an embodiment of another aspect of the present invention provides a device for checking the compliance of bank operation and maintenance regulation changes, including: a first acquisition module for acquiring system documents of a bank system; a lexicalization module for performing lexicalization on the system documents to establish an inverted index; a storage module for performing synonym expansion on the words of the system documents to obtain structurally stored system documents; a second acquisition module for acquiring a change document; and an index matching module for performing, during index matching, short-text similarity matching between the structurally stored system documents and the paragraphs of the change document to obtain a query result.
According to the device of the embodiment of the present invention, the query result is obtained from the structurally stored system documents and the change document, which achieves the purpose of checking the compliance of bank operation and maintenance regulation changes. Quick backtracking and searching of regulation changes lets the bank troubleshoot and locate problems conveniently and quickly. In addition, because indexes are built for historical documents, past operation and maintenance changes can be understood clearly and intuitively. The device improves the accuracy and efficiency of checking and is simple and easy to implement.
In addition, the compliance checking device for bank operation and maintenance regulation change according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the index matching module is further configured to obtain matching items of a target file according to the structurally stored system documents and the change document, so as to obtain the history of regulation changes.
Optionally, in an embodiment of the present invention, the matching items of the target file are obtained by calculating a correlation coefficient.
Further, in one embodiment of the present invention, the lexicalization module includes: a segmentation unit for segmenting the word sequence of a system document into a plurality of single words; and a combining unit for combining the plurality of single words into a new word sequence according to a preset specification.
Further, in an embodiment of the present invention, after the compliance of the bank operation and maintenance regulation change is checked according to the query result, the storage module is further configured to store the query result in the local file system as a file named after the change document.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for checking compliance of bank operation and maintenance regulatory changes according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a compliance checking device for bank operation and maintenance regulation change according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.
The method and device for checking the compliance of bank operation and maintenance regulation changes according to embodiments of the present invention are described below with reference to the accompanying drawings, beginning with the method.
Fig. 1 is a flowchart of a method for checking the compliance of bank operation and maintenance regulation change according to an embodiment of the present invention.
As shown in fig. 1, the method for checking the compliance of the bank operation and maintenance regulation change comprises the following steps:
In step S101, system documents of the bank system are acquired.
In step S102, the system documents are lexicalized to build an inverted index.
Further, in one embodiment of the present invention, lexicalizing the system documents further comprises: segmenting the word sequence of a system document into a plurality of single words; and combining the plurality of single words into a new word sequence according to a preset specification.
Specifically, lexicalization (word segmentation) refers to segmenting a sequence of Chinese characters into individual words, that is, recombining a continuous character sequence into a word sequence according to a certain specification. In English, spaces serve as natural delimiters between words. In Chinese, characters, sentences, and paragraphs can be separated by obvious delimiters, but words have no formal delimiter.
Lexicalization algorithms can be divided into three major categories: methods based on string matching, methods based on understanding, and methods based on statistics. Depending on whether they are combined with part-of-speech tagging, they can also be divided into pure lexicalization methods and integrated methods that combine lexicalization with tagging.
String matching method:
This method, also called the mechanical lexicalization method, matches the Chinese character string to be analyzed against the entries of a sufficiently large machine dictionary according to a certain strategy; if a string is found in the dictionary, the match succeeds and a word is recognized. By scanning direction, string matching methods are divided into forward matching and reverse matching; by which length is matched preferentially, they are divided into maximum (longest) matching and minimum (shortest) matching. Several commonly used mechanical methods are as follows:
1) forward maximum matching (left to right direction);
2) reverse maximum matching (right-to-left direction);
3) least segmentation (minimizing the number of words cut in each sentence);
4) bidirectional maximum matching (performing two scans from left to right and from right to left);
The above methods may be combined with each other; for example, the forward maximum matching method and the reverse maximum matching method may be combined into a bidirectional matching method. Because single characters can form words in Chinese, forward minimum matching and reverse minimum matching are rarely used. Generally speaking, reverse matching segments slightly more accurately than forward matching and encounters fewer ambiguities. Statistics show that the error rate of pure forward maximum matching is 1/169 and that of pure reverse maximum matching is 1/245. However, this accuracy is far from sufficient for practical use. In practical systems, mechanical lexicalization serves as an initial segmentation step, and various other kinds of language information are used to further improve accuracy.
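As a minimal sketch of forward and reverse maximum matching, the following Python functions scan a string against a small dictionary. The function names, the `max_len` window, and the toy dictionary are illustrative assumptions and are not taken from the patent; a single character that matches nothing is emitted as a one-character word so that the scan always advances.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, dictionary, max_len=4):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the string
    return words


# Hypothetical toy dictionary used only for illustration.
dictionary = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", dictionary))  # ['研究生', '命', '起源']
print(reverse_max_match("研究生命起源", dictionary))  # ['研究', '生命', '起源']
```

On this ambiguous string the two scan directions disagree, which is the kind of case a bidirectional method would flag for further disambiguation.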
One method improves the scanning mode, which is called feature scanning or mark segmentation: words with obvious features are first identified and split off in the string to be analyzed, and with these words as break points the original string is divided into smaller strings for mechanical lexicalization, which reduces the matching error rate. Another method combines lexicalization with part-of-speech tagging: rich part-of-speech information assists segmentation decisions, and the segmentation result is in turn checked and adjusted during tagging, which greatly improves segmentation accuracy.
For the mechanical lexical method, a general model can be established, which is not described in detail herein.
Understanding-based method:
This method recognizes words by having the computer simulate human understanding of sentences. The basic idea is to perform syntactic and semantic analysis during lexicalization and to use syntactic and semantic information to resolve ambiguity. It generally comprises three parts: a lexicalization subsystem, a syntactic-semantic subsystem, and a master control part. Coordinated by the master control part, the lexicalization subsystem obtains syntactic and semantic information about the relevant words and sentences to resolve segmentation ambiguity; that is, it simulates the process by which a person understands a sentence. This approach requires a large amount of linguistic knowledge and information. Because Chinese linguistic knowledge is general and complex, it is difficult to organize it into a form that a machine can read directly, so understanding-based systems are still at the experimental stage.
Statistical method:
Formally, a word is a stable combination of characters, so the more often adjacent characters occur together in context, the more likely they are to form a word. The frequency or probability with which characters co-occur with their neighbors therefore reflects well how credible a candidate word is. The frequency of each combination of adjacent co-occurring characters in the corpus can be counted and their co-occurrence information calculated. The co-occurrence information of two characters is defined from the adjacent co-occurrence probability of two Chinese characters X and Y, and it reflects how tightly the characters are bound to each other. When the tightness exceeds a certain threshold, the character group is considered likely to form a word. This method only needs to count character-group frequencies in the corpus and does not need a segmentation dictionary, so it is called dictionary-free lexicalization or statistical word extraction. However, the method has limitations: it often extracts common character groups that co-occur frequently but are not words, such as "this", "one", "some", "my", and "many"; its recognition accuracy for common words is poor; and its time and space overhead is large. Practical statistical systems use a basic dictionary of common words for string matching and, at the same time, use statistical methods to identify new words. Combining string-frequency statistics with string matching keeps the speed and efficiency of matching-based segmentation while exploiting the ability of dictionary-free lexicalization to recognize new words from context and resolve ambiguity automatically.
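The text does not give a formula for the co-occurrence information. One common choice, assumed here purely for illustration, is pointwise mutual information, log2(P(XY) / (P(X) P(Y))), estimated from character and adjacent-pair counts. The function name `adjacent_pmi` and the three-sentence corpus are hypothetical.

```python
import math
from collections import Counter


def adjacent_pmi(corpus):
    """Return the pointwise mutual information of every adjacent character pair.

    corpus is an iterable of strings. PMI is an assumed stand-in for the
    co-occurrence information described in the text.
    """
    char_counts, pair_counts = Counter(), Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        pair_counts.update(zip(sentence, sentence[1:]))
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / total_pairs
        p_x = char_counts[x] / total_chars
        p_y = char_counts[y] / total_chars
        scores[x + y] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus; a real corpus would be far larger.
scores = adjacent_pmi(["银行系统", "银行变更", "系统变更"])
threshold = 2.5  # assumed cut-off, to be tuned on real data
candidates = sorted(pair for pair, s in scores.items() if s > threshold)
print(candidates)
```

Pairs that occur together as often as their characters occur at all, such as 银行, score higher than incidental neighbors such as 行系, so thresholding the score separates likely words from chance adjacency.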
Another class consists of methods based on statistical machine learning. Given a large amount of already segmented text, a statistical machine learning model is trained to learn the rules of word segmentation, and the trained model is then used to segment unseen text. The ability of each Chinese character to form a word on its own varies; in addition, some characters often appear as prefixes and some often appear as suffixes ("ones" or "characters"). Combined with information on whether two adjacent characters form a word, this yields a great deal of knowledge relevant to lexicalization. The method makes full use of the regularities of Chinese word formation. Its biggest disadvantage is that it needs a large pre-segmented corpus, and the time and space overhead of training is very large.
It should be noted that there is no settled answer as to which lexicalization algorithm is more accurate. No mature lexicalization system can rely on a single algorithm; different algorithms must be combined. For example, the lexicalization algorithm of Mass Science and Technology uses a "compound lexicalization method", that is, a method that applies mechanical and knowledge-based techniques together, comparable to combining traditional and Western medicine.
Further, the inverted index arises from the practical need to look up records by the values of their attributes. Each entry in such an index table includes an attribute value and the addresses of the records that have that value. Because the attribute value is not determined from the record, but rather the position of the record is determined from the attribute value, the index is called an inverted index. A file with an inverted index is called an inverted index file, or inverted file for short.
A posting list records which documents contain a given word. Generally, many documents in a collection contain the same word; for each such document, the list records information such as the document number (DocID), the number of times the word appears in the document (TF), and the positions at which the word appears. The information for one document is called a posting (Posting), and the series of postings for a word forms a list structure, which is the posting list of that word. The right diagram is a schematic diagram of a posting list. All the words appearing in the document collection, together with their posting lists, constitute the inverted index.
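The structure described above can be sketched as a dictionary that maps each word to a list of postings, each holding a document number, a term frequency, and the word positions. The function name `build_inverted_index`, the field names, and the sample documents are assumptions made for illustration; the documents are taken to be already segmented into words.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Build word -> posting list from docs, a mapping of DocID -> list of words.

    Each posting is a dict with the document number, the term frequency (TF),
    and the positions of the word in that document.
    """
    index = defaultdict(list)
    # Visiting documents in ascending DocID order keeps every posting list sorted.
    for doc_id in sorted(docs):
        positions = defaultdict(list)
        for pos, word in enumerate(docs[doc_id]):
            positions[word].append(pos)
        for word, pos_list in positions.items():
            index[word].append(
                {"doc_id": doc_id, "tf": len(pos_list), "positions": pos_list}
            )
    return dict(index)


# Hypothetical, already segmented documents.
docs = {
    1: ["change", "database", "change"],
    2: ["database", "backup"],
}
index = build_inverted_index(docs)
print(index["change"])    # [{'doc_id': 1, 'tf': 2, 'positions': [0, 2]}]
print([p["doc_id"] for p in index["database"]])  # [1, 2]
```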
In an actual search engine system, the postings do not store the actual document numbers but document number differences (D-Gaps). A D-Gap is the difference between the document numbers of two adjacent postings in the posting list. Index construction generally ensures that a document number appearing later in the list is larger than any appearing earlier, so a D-Gap is always an integer greater than 0. In the example shown in fig. 2, the three original document numbers are 187, 196, and 199; after the difference calculation they are actually stored as 187, 9, and 3.
The main reason for differencing is better compression: original document numbers are generally large values, and differencing converts them into small values, which helps increase the compression ratio.
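The difference calculation can be sketched in a few lines of Python. The helper names `to_gaps` and `from_gaps` are hypothetical; the numbers are the ones used in the example above.

```python
from itertools import accumulate


def to_gaps(doc_ids):
    """Convert an ascending list of document numbers into D-Gaps."""
    # The first gap is measured from 0, so the first number is stored as-is.
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]


def from_gaps(gaps):
    """Recover the original document numbers with a running sum."""
    return list(accumulate(gaps))


gaps = to_gaps([187, 196, 199])
print(gaps)             # [187, 9, 3]
print(from_gaps(gaps))  # [187, 196, 199]
```

The small gaps are what a variable-length integer code would then store in fewer bytes than the original numbers.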
The inverted index, also commonly called a reverse index, postings file, or inverted file, is an indexing method used in full-text search to store a mapping from a word to its storage locations in a document or a group of documents. It is the most common data structure in document retrieval systems. Through the inverted index, the list of documents containing a word can be obtained quickly from the word itself. The inverted index mainly consists of two parts: a "word dictionary" and an "inverted file".
The inverted index has two different forms:
a record-level inverted index (or inverted file index) contains, for each word, a list of the documents that reference it;
a word-level inverted index (or full inverted index) additionally contains the position of each word within a document.
It should be noted that the latter form provides more functionality (such as phrase searching), but requires more time and space to create.
The indexes of modern search engines are based on the inverted index. Compared with index structures such as "signature files" and "suffix trees", the inverted index is the best implementation and the most effective index structure for mapping words to documents.
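To illustrate why a word-level index supports phrase searching while a record-level index does not, the following sketch looks for documents in which the words of a phrase occur at consecutive positions. The function name `phrase_search`, the index layout (word to a mapping of document number to positions), and the sample data are assumptions for illustration only.

```python
def phrase_search(index, phrase):
    """Return the sorted document numbers in which phrase occurs contiguously.

    index maps word -> {doc_id: list of positions}; phrase is a list of words.
    """
    if not phrase or any(word not in index for word in phrase):
        return []
    hits = []
    for doc_id, starts in index[phrase[0]].items():
        for start in starts:
            # The k-th word of the phrase must sit exactly k positions after start.
            if all(start + k in index[w].get(doc_id, ()) for k, w in enumerate(phrase)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical word-level index: both documents contain both words,
# but only document 1 contains them next to each other in this order.
index = {
    "change": {1: [0, 5], 2: [3]},
    "request": {1: [6], 2: [0]},
}
print(phrase_search(index, ["change", "request"]))  # [1]
```

A record-level index would report both documents for this query, because it only knows that each word occurs somewhere in each document.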
Pearson correlation coefficient:
In statistics, the Pearson product-moment correlation coefficient measures the linear correlation between two variables X and Y and takes values between -1 and 1.
The Pearson correlation coefficient between two variables is defined as the quotient of their covariance and the product of their standard deviations:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$
The above formula defines the population correlation coefficient, usually denoted by the lowercase Greek letter ρ. Estimating the covariance and standard deviations from a sample gives the sample correlation coefficient, denoted by the lowercase letter r:
$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$
r can also be estimated as the mean of the products of the standard scores of the sample points (Xi, Yi), which gives an expression equivalent to the one above:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{s_X}\right)\left(\frac{Y_i-\bar{Y}}{s_Y}\right)$$
where $\bar{X}$ and $\bar{Y}$ are the sample means and $s_X$ and $s_Y$ are the sample standard deviations.
The Pearson correlation coefficient ranges from -1 to 1. A value of 1 means that X and Y are described exactly by a straight-line equation: all data points fall on a line and Y increases as X increases. A value of -1 means that all data points fall on a line and Y decreases as X increases. A value of 0 means that there is no linear relationship between the two variables.
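As a minimal sketch, the sample correlation coefficient r can be computed from the centered sums of products and squares. The function name `pearson` and the sample values are illustrative assumptions; how the patent turns documents into the two variables is not specified here.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3], [2, 4, 6]))          # points on a rising line
print(pearson([1, 2, 3], [6, 4, 2]))          # points on a falling line
print(pearson([1, 2, 3, 4], [1, -1, -1, 1]))  # no linear relationship
```

The three calls correspond to the three cases described above: a coefficient of 1, of -1, and of 0 (up to floating-point rounding).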
Spearman rank correlation coefficient:
In statistics, the Spearman rank correlation coefficient is a nonparametric measure of the dependence between two variables. It assesses how well the relationship between two statistical variables can be described by a monotonic function. If there are no repeated values in the data and the two variables are perfectly monotonically related, the Spearman correlation coefficient is +1 or -1.
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables. For a sample of size n, the n raw data Xi, Yi are converted into rank data xi, yi, and the correlation coefficient ρ is:
$$\rho = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
The raw data are assigned ranks according to their average descending position in the overall data. The Spearman correlation may also be referred to as the "grade correlation"; that is, the "rank" of an observation is replaced by its "grade". In a continuous distribution, the grade of an observation is, by convention, always one half less than its rank, so the grade and rank correlation coefficients are the same in this case. More generally, the "grade" of an observation is proportional to an estimate of the fraction of the population that is less than a given value, with a half-observation adjustment at observed values. This corresponds to one possible treatment of tied ranks. Although unusual, the term "grade correlation" is still in use.
The Spearman correlation coefficient indicates the direction of the association between X (the independent variable) and Y (the dependent variable). If Y tends to increase as X increases, the Spearman correlation coefficient is positive. If Y tends to decrease as X increases, the Spearman correlation coefficient is negative. A Spearman correlation coefficient of zero indicates that Y has no tendency to either increase or decrease as X increases. As X and Y get closer to a perfectly monotonic relationship, the Spearman correlation coefficient increases in absolute value. When X and Y are perfectly monotonically related, the absolute value of the Spearman correlation coefficient is 1. A perfectly monotonically increasing relationship means that for any two pairs of data (Xi, Yi) and (Xj, Yj), Xi-Xj and Yi-Yj always have the same sign. A perfectly monotonically decreasing relationship means that for any two pairs of data (Xi, Yi) and (Xj, Yj), Xi-Xj and Yi-Yj always have opposite signs.
The Spearman correlation coefficient is often described as "nonparametric", which has two meanings. First, a perfect Spearman correlation results whenever X and Y are related by any monotonic function, whereas the Pearson correlation coefficient gives a perfect value only when X and Y are related by a linear function. Second, the exact sampling distribution of the Spearman coefficient can be obtained without prior knowledge (i.e., knowledge of the parameters) of the joint probability distribution of X and Y.
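The definition above (the Pearson coefficient applied to ranks) can be sketched as follows. This is a minimal illustration under two assumptions not fixed by the text: ranks are assigned in ascending order, and tied values share their average rank. Ranking both variables in descending order instead gives the same coefficient. The function names are hypothetical.

```python
import math

def average_ranks(values):
    """Assign 1-based ascending ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman coefficient: the Pearson coefficient of the two rank sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least two points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_rx = sum(rx) / n
    mean_ry = sum(ry) / n
    num = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mean_rx) ** 2 for a in rx)
                    * sum((b - mean_ry) ** 2 for b in ry))
    if den == 0:
        raise ValueError("correlation is undefined when all ranks are tied")
    return num / den
```

For example, y = x squared on positive x is not linear, yet it is monotonic, so the rank sequences coincide and the Spearman coefficient reaches 1.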
Kendall rank correlation coefficient:
The Kendall correlation coefficient is a statistic used to measure the correlation between two random variables. The Kendall test is a nonparametric hypothesis test that uses the calculated correlation coefficient to test the statistical dependence of two random variables. The Kendall correlation coefficient takes values between -1 and 1: τ = 1 indicates that the two random variables have identical rank orderings; τ = -1 indicates that their rank orderings are exactly opposite; and τ = 0 indicates that the two random variables show no rank correlation.
Assume that the two random variables are X and Y (which may also be regarded as two sets), each with N elements, and let Xi and Yi (1 ≤ i ≤ N) denote their i-th values. The corresponding elements of X and Y form a set XY of element pairs (Xi, Yi) (1 ≤ i ≤ N). Two elements (Xi, Yi) and (Xj, Yj) of the set XY are considered concordant when their ranks agree, that is, when case 1 or case 2 occurs (case 1: Xi > Xj and Yi > Yj; case 2: Xi < Xj and Yi < Yj). They are considered discordant when case 3 or case 4 occurs (case 3: Xi > Xj and Yi < Yj; case 4: Xi < Xj and Yi > Yj). When case 5 or case 6 occurs (case 5: Xi = Xj; case 6: Yi = Yj), the two elements are neither concordant nor discordant:
Figure BDA0001211908390000091
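The formula image referenced above is not reproduced in this text, so it is not known which variant of the coefficient it shows. As a hedged illustration only, the sketch below counts concordant and discordant pairs exactly as in cases 1 to 4 and computes the tau-a variant, (C - D) / (N(N - 1)/2), which is one common form; the choice of tau-a and the function name are assumptions.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least two points")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:        # cases 1 and 2: both differences have the same sign
            concordant += 1
        elif sign < 0:      # cases 3 and 4: the differences have opposite signs
            discordant += 1
        # sign == 0 corresponds to cases 5 and 6 (a tie): counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)
```

With ties present, tau-a cannot reach 1 or -1; other variants adjust the denominator for ties, and nothing in the text indicates which one the patent intends.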
In step S103, synonym expansion of words is performed on the system documents to obtain structurally stored system documents.
That is, on the one hand, an inverted index is built for the system documents through tokenization, synonym expansion of words is performed by using a "synonym forest", and the structurally stored system documents are finally formed.
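One way to picture synonym expansion at indexing time is sketched below. The synonym groups are hypothetical stand-ins for a real thesaurus such as the "synonym forest", English tokens are used for readability, and the data layout is an assumption rather than the storage structure of the patent.

```python
from collections import defaultdict

# Hypothetical synonym groups; a real system would load them from a thesaurus.
SYNONYM_GROUPS = [
    {"modify", "change", "alter"},
    {"approve", "authorize"},
]

def build_synonym_map(groups):
    """Map every word to the full group of words it is synonymous with."""
    mapping = {}
    for group in groups:
        for word in group:
            mapping[word] = group
    return mapping

def build_expanded_index(paragraphs, synonym_map):
    """Index each token of each paragraph under itself and under all its synonyms.

    paragraphs: {paragraph_id: list of tokens}
    returns:    {term: set of paragraph ids}
    """
    index = defaultdict(set)
    for pid, tokens in paragraphs.items():
        for token in tokens:
            for term in synonym_map.get(token, {token}):
                index[term].add(pid)
    return index

synonym_map = build_synonym_map(SYNONYM_GROUPS)
index = build_expanded_index(
    {"p1": ["modify", "config"], "p2": ["authorize", "release"]},
    synonym_map,
)
```

Because the expansion happens when the index is built, a later query containing "alter" finds paragraph p1 even though p1 only contains "modify".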
In step S104, a change document is acquired.
In step S105, during index matching, short-text similarity matching is performed between the paragraphs of the structurally stored system documents and the paragraphs of the change document, so as to obtain a query result.
That is, on the other hand, when search matching is performed, short-text similarity matching is carried out in units of paragraphs of the system documents and of the change document; for example, the matching method mainly used may be the word-form and word-order method.
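The text does not define the word-form and word-order method. The sketch below follows one formulation found in the literature on Chinese sentence similarity, stated here as an assumption: word-form similarity is twice the number of shared tokens divided by the total token count, word-order similarity penalizes adjacent reversals among the words that occur exactly once in both texts, and the two are combined with a weight. The weight of 0.8 and all function names are hypothetical.

```python
from collections import Counter

def word_form_similarity(a, b):
    """2 * (shared tokens, counted with multiplicity) / (len(a) + len(b))."""
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2 * shared / (len(a) + len(b))

def word_order_similarity(a, b):
    """1 - (adjacent reversals) / (m - 1) over the m words unique in both texts."""
    once = [w for w in a if a.count(w) == 1 and b.count(w) == 1]
    if not once:
        return 0.0
    if len(once) == 1:
        return 1.0
    # Shared words taken in a's order, mapped to their positions in b.
    positions_in_b = [b.index(w) for w in once]
    reversals = sum(1 for p, q in zip(positions_in_b, positions_in_b[1:]) if p > q)
    return 1 - reversals / (len(once) - 1)

def similarity(a, b, form_weight=0.8):
    """Weighted combination of word-form and word-order similarity."""
    return (form_weight * word_form_similarity(a, b)
            + (1 - form_weight) * word_order_similarity(a, b))
```

Two token lists with the same words in the same order score 1, the same words in fully reversed order score 0.8 under the assumed weight, and texts with no shared words score 0.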
In one embodiment of the present invention, the index matching further comprises: obtaining a matching item of the target file according to the structurally stored system documents and the change document, so as to obtain the change history of the regulations.
Optionally, in an embodiment of the present invention, the matching item of the target file is obtained through correlation coefficient calculation.
In addition, in one embodiment of the present invention, the method further includes: after the compliance of the bank operation and maintenance regulation change is checked according to the query result, storing the query result in a local file system in the form of a file according to the name of the change document.
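Storing a query result under the name of the change document could look like the following sketch. The JSON format, the `.result.json` suffix, and the function name are assumptions; the text only says that the result is stored as a file according to the name of the change document.

```python
import json
from pathlib import Path

def save_query_result(change_doc_path, matches, output_dir):
    """Write `matches` to <output_dir>/<change document name>.result.json."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops the directory and the final extension of the change document.
    out_path = out_dir / (Path(change_doc_path).stem + ".result.json")
    out_path.write_text(
        json.dumps(matches, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return out_path
```

Deriving the output name from the change document keeps results easy to locate when several change documents are processed in one run.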
It can be understood that, in the embodiment of the present invention, on the basis of the system documents and a portion of the operation and maintenance change documents provided by a large bank, an inverted index is built for the system documents by a Chinese word segmentation method, and synonyms of the words are expanded by using a "synonym forest", so that structurally stored system documents are finally formed. When searching and matching are carried out, short-text similarity matching is performed in units of paragraphs of the system documents and of the change document, and the matching method mainly adopted is the word-form and word-order method. Operations such as offline storage of the system documents and addition, deletion and modification of document contents are supported, and a suitable storage data structure is selected to create the conditions for the subsequent short-text query matching. For each change-order scheme document, the clauses of the system documents related to the change document can be retrieved quickly, and the query result is stored in a structured manner so that the user can review it conveniently. The method achieves efficient querying: retrieval for one change document generally takes no more than 3 s, and several change documents can be retrieved in a single run.
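The requirement that document contents can be added, deleted, and modified without rebuilding everything suggests keeping a forward map next to the inverted index. The sketch below is one possible structure under that assumption; the class and method names are hypothetical and the patent does not disclose its actual storage layout.

```python
from collections import defaultdict

class ParagraphIndex:
    """Inverted index over paragraphs that supports add, delete, and modify."""

    def __init__(self):
        self._paragraphs = {}               # paragraph id -> list of tokens
        self._postings = defaultdict(set)   # token -> set of paragraph ids

    def add(self, pid, tokens):
        """Insert a paragraph, replacing any previous content stored under pid."""
        if pid in self._paragraphs:
            self.delete(pid)
        self._paragraphs[pid] = list(tokens)
        for token in tokens:
            self._postings[token].add(pid)

    def delete(self, pid):
        """Remove a paragraph and drop posting lists that become empty."""
        for token in self._paragraphs.pop(pid, []):
            ids = self._postings.get(token)
            if ids is not None:
                ids.discard(pid)
                if not ids:
                    del self._postings[token]

    def modify(self, pid, tokens):
        """Replace the content of an existing paragraph."""
        self.add(pid, tokens)

    def lookup(self, token):
        """Return the sorted ids of paragraphs that contain the token."""
        return sorted(self._postings.get(token, ()))
```

The forward map makes deletion proportional to the size of the affected paragraph instead of the size of the whole index.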
According to the method for checking the compliance of bank operation and maintenance regulation changes of the embodiment of the present invention, once the structurally stored system documents are available, the query result is obtained from the structurally stored system documents and the change document, thereby achieving the purpose of checking the compliance of bank operation and maintenance regulation changes. By quickly backtracking and searching regulation changes, the bank can conveniently and quickly troubleshoot faults and locate problems; meanwhile, because an index is built for historical documents, past operation and maintenance changes can be understood clearly and intuitively. The accuracy and efficiency of checking are improved, and the method is simple and easy to implement.
Next, a compliance checking device for bank operation and maintenance regulation changes according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 2 is a schematic structural diagram of a compliance checking device for bank operation and maintenance regulation change according to an embodiment of the present invention.
As shown in Fig. 2, the compliance checking device 10 for bank operation and maintenance regulation changes includes: a first obtaining module 100, a lexical item module 200, a storage module 300, a second obtaining module 400, and an index matching module 500.
The first obtaining module 100 is configured to obtain system documents of a bank system. The lexical item module 200 is configured to tokenize the system documents to establish an inverted index. The storage module 300 is configured to perform synonym expansion of words on the system documents to obtain structurally stored system documents. The second obtaining module 400 is configured to obtain a change document. The index matching module 500 is configured to, during index matching, perform short-text similarity matching between the paragraphs of the structurally stored system documents and the paragraphs of the change document, so as to obtain a query result. The device 10 of the embodiment of the present invention obtains the query result from the structurally stored system documents and the change document, improves the accuracy and efficiency of checking, and is simple and easy to implement.
Further, in an embodiment of the present invention, the index matching module 500 is further configured to obtain a matching item of the target file according to the structurally stored system documents and the change document, so as to obtain the change history of the regulations.
Optionally, in an embodiment of the present invention, the matching item of the target file is obtained through correlation coefficient calculation.
Further, in one embodiment of the present invention, the lexical item module 200 includes: a segmentation unit configured to segment the word sequence of the system documents into a plurality of unit words; and a combining unit configured to combine the plurality of unit words into a new word sequence according to a preset specification.
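The split-then-combine behavior of the two units can be illustrated with forward maximum matching, a classic dictionary-based approach to Chinese word segmentation. The patent does not name its algorithm, so treating the "preset specification" as a lexicon with a maximum word length is an assumption, and the lexicon entries and function names are hypothetical.

```python
def segment_units(text):
    """Segmentation unit: split text into single-character units, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]

def combine_units(units, lexicon, max_word_len=4):
    """Combining unit: greedily merge units into the longest word found in the lexicon."""
    words = []
    i = 0
    while i < len(units):
        # Try the longest candidate first; a single unit is always accepted.
        for size in range(min(max_word_len, len(units) - i), 0, -1):
            candidate = "".join(units[i:i + size])
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical lexicon standing in for the preset specification.
LEXICON = {"银行", "运维", "变更", "规章制度"}
```

For instance, `combine_units(segment_units("银行运维变更"), LEXICON)` merges the six character units into three words, and characters that are not covered by the lexicon remain as single-unit words.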
Further, in an embodiment of the present invention, the storage module 300 is further configured to, after the compliance of the bank operation and maintenance regulation change is checked according to the query result, store the query result in the local file system in the form of a file according to the name of the change document.
It should be noted that the explanation of the embodiment of the method for checking the compliance of the bank operation and maintenance regulation change is also applicable to the device for checking the compliance of the bank operation and maintenance regulation change of the embodiment.
For example, the device 10 of this embodiment includes: a data preprocessing module (equivalent to the first obtaining module 100, the lexical item module 200, and the second obtaining module 400), a document similarity matching module (equivalent to the storage module 300 and the index matching module 500), and an operation and maintenance change compliance check result display module.
The data preprocessing module mainly uses feature extraction to acquire the change documents and the system documents provided by the bank from the background data of the large commercial bank. The work focuses on these two data sources, and its core is short-text semantic matching between them: texts with higher semantic similarity are extracted from the two sources, integrated, and presented to the user, so that the user can conveniently check the compliance of the changed content. The document similarity matching module supports fast storage of multiple system documents, stores the word-segmented system documents in the form of files and tables, and supports modifying or extending the document contents. After each modification or extension, the system quickly updates the stored content and ensures the correctness of the data. Multiple change documents can be processed at one time by placing them together at a designated location in the system, and the query results matched against the system documents are stored in the local file system as files according to the names of the change documents. The result for each change document is returned within 3 s, and the similarity between documents can be calculated, for example, by a correlation coefficient calculation method. After document matching, the operation and maintenance change compliance check result display module returns the final check result.
It can be understood that, in the embodiment of the present invention, a set of software for checking the compliance of the operation and maintenance changes of a large commercial bank may be developed. All of the bank's change documents and system documents are analyzed, the contents of the two kinds of documents are matched, the matching items of the target document are found, and the change history of the regulations is obtained, thereby establishing a complete set of bank operation and maintenance change compliance checking software.
Specifically, the compliance check of commercial bank system documents and change documents mainly comprises two stages: the preparation stage of the system and the use stage of the system. In the preparation stage, the background analysis of the bank data is completed: the system first tokenizes the documents in the candidate document set for the target document and uses the inverted index to retrieve the matching items. In the use stage, the user can use the system to perform index matching against all regulations. The analysis result not only provides a reference for the bank's back office, but also helps the bank handle faults quickly after they occur.
The software implementing the operation and maintenance change compliance checking method for large commercial banks relies on core technologies such as tokenization of the original data, the inverted index, and similarity matching. The algorithms and the functional modules such as the graphical user interface are developed and implemented in Java and the like under Windows 10.
Based on this development platform, the deployment and operation of the whole operation and maintenance change compliance checking software require the following operating environment. First, at the operating system layer, the system needs to run on Windows 10 or a compatible operating system platform; in addition, a program runtime environment, namely a Java runtime environment, is required. Only with this supporting environment can the operation and maintenance change compliance checking software run normally.
According to the device for checking the compliance of bank operation and maintenance regulation changes of the embodiment of the present invention, once the structurally stored system documents are available, the query result is obtained from the structurally stored system documents and the change document, thereby achieving the purpose of checking the compliance of bank operation and maintenance regulation changes. By quickly backtracking and searching regulation changes, the bank can conveniently and quickly troubleshoot faults and locate problems; meanwhile, because an index is built for historical documents, past operation and maintenance changes can be understood clearly and intuitively. The accuracy and efficiency of checking are improved, and the device is simple and easy to implement.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be considered limiting of the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that the first and second features are in indirect contact through an intermediate. Also, a first feature being "on," "over," or "above" a second feature may mean that the first feature is directly or obliquely above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," or "beneath" a second feature may mean that the first feature is directly or obliquely under the second feature, or may simply indicate that the first feature is at a lower level than the second feature.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (2)

1. A method for checking the compliance of bank operation and maintenance regulation changes, characterized by comprising the following steps:
acquiring system documents of a bank system;
tokenizing the system documents into lexical items to establish an inverted index, wherein tokenizing the system documents further comprises: segmenting the word sequence of the system documents into a plurality of unit words; and combining the plurality of unit words into a new word sequence according to a preset specification;
performing synonym expansion of words on the system documents to obtain structurally stored system documents, wherein offline storage of the system documents and addition, deletion and modification of document contents are supported;
acquiring a change document; and
during index matching, performing short-text similarity matching between the paragraphs of the structurally stored system documents and the paragraphs of the change document to obtain a query result, wherein the index matching further comprises: obtaining a matching item of a target file according to the structurally stored system documents and the change document, so as to obtain the change history of the regulations, wherein the matching item of the target file is obtained through correlation coefficient calculation;
and after checking the compliance of the bank operation and maintenance regulation change according to the query result, storing the query result in a local file system in the form of a file according to the name of the change document.
2. A device for checking the compliance of bank operation and maintenance regulation changes, characterized by comprising:
a first obtaining module, configured to obtain system documents of a bank system;
a lexical item module, configured to tokenize the system documents into lexical items to establish an inverted index, wherein the lexical item module comprises: a segmentation unit, configured to segment the word sequence of the system documents into a plurality of unit words; and a combining unit, configured to combine the plurality of unit words into a new word sequence according to a preset specification;
a storage module, configured to perform synonym expansion of words on the system documents to obtain structurally stored system documents, wherein offline storage of the system documents and addition, deletion and modification of document contents are supported;
a second obtaining module, configured to obtain a change document; and
an index matching module, configured to, during index matching, perform short-text similarity matching between the paragraphs of the structurally stored system documents and the paragraphs of the change document to obtain a query result, and further configured to obtain a matching item of a target file according to the structurally stored system documents and the change document, so as to obtain the change history of the regulations, wherein the matching item of the target file is obtained through correlation coefficient calculation;
wherein the storage module is further configured to, after the compliance of the bank operation and maintenance regulation change is checked according to the query result, store the query result in a local file system in the form of a file according to the name of the change document.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710040985.4A CN106844625B (en) 2017-01-17 2017-01-17 Method and device for checking compliance of bank operation and maintenance regulation and change

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710040985.4A CN106844625B (en) 2017-01-17 2017-01-17 Method and device for checking compliance of bank operation and maintenance regulation and change

Publications (2)

Publication Number Publication Date
CN106844625A CN106844625A (en) 2017-06-13
CN106844625B true CN106844625B (en) 2020-07-28

Family

ID=59119400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710040985.4A Active CN106844625B (en) 2017-01-17 2017-01-17 Method and device for checking compliance of bank operation and maintenance regulation and change

Country Status (1)

Country Link
CN (1) CN106844625B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871468A (en) * 2019-02-01 2019-06-11 国网四川省电力公司广元供电公司 Non-structured document management and rules and regulations entry management integration system
CN111400252B (en) * 2020-03-11 2023-10-31 国网江西省电力有限公司信息通信分公司 Project acceptance compliance-based detection system, device, detection method and application

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101582073A (en) * 2008-12-31 2009-11-18 北京中机科海科技发展有限公司 Intelligent retrieval system and method based on domain ontology
CN102096693A (en) * 2009-12-11 2011-06-15 鸿富锦精密工业(深圳)有限公司 Documentation change tracking system and method
CN102156711A (en) * 2011-03-08 2011-08-17 国网信息通信有限公司 Cloud storage based power full text retrieval method and system
CN102314418A (en) * 2011-10-09 2012-01-11 北京航空航天大学 Method for comparing Chinese similarity based on context relation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201403354A (en) * 2012-07-03 2014-01-16 Univ Nat Taiwan Normal System and method using data reduction approach and nonlinear algorithm to construct Chinese readability model

Also Published As

Publication number Publication date
CN106844625A (en) 2017-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant