CN113408285B - Identification method of financial body, electronic device and storage medium - Google Patents

Identification method of financial body, electronic device and storage medium

Info

Publication number
CN113408285B
Authority
CN
China
Prior art keywords
financial
model
trained
main body
character sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110578190.5A
Other languages
Chinese (zh)
Other versions
CN113408285A (en)
Inventor
范如
范渊
杨勃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Application filed by DBAPPSecurity Co Ltd
Priority to CN202110578190.5A
Publication of CN113408285A
Application granted
Publication of CN113408285B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Medical Informatics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a method for identifying a financial subject, an electronic device, and a storage medium. The method comprises: inputting a financial document to be analyzed into more than two different first subject identification models to obtain a first prediction result set, wherein the first prediction result set consists of the first prediction results corresponding to the respective first subject identification models, and each first prediction result comprises a plurality of financial subjects predicted by the corresponding first subject identification model; and determining, according to the number of times each financial subject occurs in the first prediction result set, whether that financial subject is output as a recognition result. This addresses the problem that the subject of financial fraud information is easily misjudged and enables that subject to be identified more accurately.

Description

Identification method of financial body, electronic device and storage medium
Technical Field
The present invention relates to the field of natural language processing, and in particular, to a method for identifying a financial subject, an electronic device, and a storage medium.
Background
With the rapid progress of the internet and the rapid development of global finance, internet-based finance has been integrated into many fields of economic and social development, and the volume of financial information has grown explosively. Financial business forms such as P2P online lending platforms, small-loan companies, and equity investment organizations keep emerging, financing and transaction volumes keep growing, and the transaction subjects involved are increasingly complex. Economic crime committed through the internet is becoming more prominent, wider in scope, and more harmful.
Financial supervision faces more difficulties than the supervision of traditional industries, and no effective solution currently exists for identifying the subject of financial fraud information in massive volumes of financial information. In the prior art, named entity recognition is performed with deep learning, but this approach carries a risk of misjudging the subject; if it is applied directly to identify the subject of financial fraud information, the subject cannot be identified accurately.
Disclosure of Invention
This embodiment provides a method for identifying a financial subject, an electronic device, and a storage medium, to address the problem in the related art that the subject of financial fraud information cannot be accurately identified.
In a first aspect, in this embodiment, there is provided a method for identifying a financial subject, the method including:
acquiring a financial document to be analyzed;
inputting the financial document to be analyzed into more than two different first subject identification models to obtain a first prediction result set, wherein the first prediction result set consists of first prediction results corresponding to the first subject identification models, and each first prediction result comprises a plurality of financial subjects predicted by the corresponding first subject identification models;
and determining, according to the number of times each financial subject occurs in the first prediction result set, whether that financial subject is output as a recognition result.
In some of these embodiments, the method further comprises:
acquiring a financial document to be trained, and acquiring a first character sequence and a second character sequence according to the financial document to be trained;
dividing the first character sequence into a training set and a verification set, and training more than two different second subject identification models for more than one round according to the training set and the second character sequence to obtain a third subject identification model set, wherein the third subject identification model set consists of the third subject identification models corresponding to the second subject identification models, and each round of training of a second subject identification model yields one third subject identification model;
verifying each third subject identification model by using the verification set to obtain a recall rate of each third subject identification model and a second prediction result set, wherein the second prediction result set consists of the second prediction results corresponding to the respective third subject identification models, each second prediction result comprises a plurality of financial subjects predicted by the corresponding third subject identification model, each third subject identification model in the third subject identification model set that meets the recall rate requirement is determined as a fourth subject identification model, and the second prediction results corresponding to the fourth subject identification models constitute a third prediction result set;
determining whether each financial subject is output as a prediction result according to the number of times it occurs in the third prediction result set;
and calculating the matching degree between the prediction result and the financial fraud information subjects labeled in the verification set, and determining each fourth subject identification model whose calculated matching degree meets the requirement as a first subject identification model.
In some of these embodiments, the second subject identification model is constructed by at least one of:
a BERT-BLSTM-CRF model and a BERT-IDCNN-CRF model.
In some embodiments, obtaining a financial document to be trained, and obtaining a first character sequence and a second character sequence according to the financial document to be trained specifically includes:
acquiring a financial document to be trained, and preprocessing the financial document to be trained to obtain first text information;
and labeling the first text information to obtain a first character sequence and a second character sequence.
In some embodiments, preprocessing the financial document to be trained to obtain first text information specifically includes:
removing redundant information in the financial document to be trained through regular-expression matching to obtain a processed financial document, wherein the processed financial document comprises a title and a text;
and acquiring the edit distance between the title and the text, and if the edit distance is greater than a first threshold, splicing the title and the text to obtain first text information.
In some embodiments, labeling the first text information to obtain a first character sequence and a second character sequence includes:
labeling the financial subject in the first text information to obtain a third character sequence, wherein the third character sequence comprises a title and a text;
labeling, in the third character sequence, whether the financial subject appears in the text, how often it appears in the text, and whether it appears in the title, to obtain a second character sequence with labeling information;
and labeling the position information of the financial subject in the third character sequence to obtain a first character sequence with labeling information.
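The two kinds of labeling described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not name a tagging scheme, so the character-level BIO tags, the `ORG` label, the function names `position_tags` and `subject_features`, and the sample strings are all hypothetical.

```python
from typing import Dict, List, Tuple


def position_tags(text: str, subjects: List[str]) -> List[Tuple[str, str]]:
    """Character-level position labels (assumed BIO scheme) for each subject occurrence."""
    tags = ["O"] * len(text)
    for subject in subjects:
        if not subject:
            continue  # an empty string would match everywhere
        start = text.find(subject)
        while start != -1:
            tags[start] = "B-ORG"
            for k in range(start + 1, start + len(subject)):
                tags[k] = "I-ORG"
            start = text.find(subject, start + len(subject))
    return list(zip(text, tags))


def subject_features(title: str, body: str, subjects: List[str]) -> Dict[str, Dict[str, object]]:
    """Per-subject labels: appears in the text, frequency in the text, appears in the title."""
    return {
        s: {"in_body": s in body, "body_count": body.count(s), "in_title": s in title}
        for s in subjects
    }


title = "Alpha fraud"
body = "Alpha stopped paying. Alpha closed."
first_sequence = position_tags(body, ["Alpha"])            # position information
second_info = subject_features(title, body, ["Alpha"])     # occurrence information
print(first_sequence[:6])
print(second_info)
```

Here the position labels play the role of the first character sequence and the occurrence features play the role of the labeling information carried by the second character sequence.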
In some of these embodiments, the first subject identification model includes a trained first sub-model and a trained second sub-model;
inputting the financial document to be analyzed into more than two different first subject identification models to obtain a first prediction result set comprises:
inputting the financial document to be analyzed into the trained first sub-model to obtain feature information corresponding to the financial document to be analyzed, wherein the trained first sub-model is obtained by training with the financial document to be trained;
and inputting the feature information corresponding to the financial document to be analyzed into the trained second sub-model to obtain a first prediction result set.
In some of these embodiments, the first sub-model is a BERT model;
inputting the financial document to be analyzed into the trained first sub-model to obtain feature information corresponding to the financial document to be analyzed specifically includes:
training a layer of preamble coding predictors in a BERT model by using a financial document to be trained to obtain a first weight value corresponding to the trained preamble coding predictors, wherein the BERT model is provided with a plurality of layers of preamble coding predictors;
acquiring second weight values corresponding to a plurality of untrained preamble coding predictors in the BERT model;
obtaining a weight value of the BERT model according to the first weight value and each second weight value;
mapping the weight value of the BERT model to 512 dimensions through a full connection layer to obtain a trained BERT model;
and inputting the financial document to be analyzed into the trained BERT model to obtain the feature information corresponding to the financial document to be analyzed.
In some of these embodiments, the trained preamble coding predictor is a bottommost preamble coding predictor in the BERT model.
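One plausible reading of the weight-fusion steps above is a weighted combination of the per-layer outputs followed by a fully connected projection to 512 dimensions. The sketch below illustrates that reading only; it is not taken from the source. The softmax normalisation, the toy layer count and hidden size, the random values, and all function names are assumptions, and a real system would use a deep-learning framework rather than plain Python lists.

```python
import math
import random
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Normalise raw layer scores so that they sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_outputs: List[Vector], layer_scores: Vector) -> Vector:
    """Weighted sum of per-layer vectors using the normalised layer weights."""
    weights = softmax(layer_scores)
    hidden = len(layer_outputs[0])
    return [sum(w * layer[d] for w, layer in zip(weights, layer_outputs)) for d in range(hidden)]


def fully_connected(x: Vector, matrix: List[Vector], bias: Vector) -> Vector:
    """Dense layer: one output value per row of `matrix`."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(matrix, bias)]


rng = random.Random(0)
num_layers, hidden_size, out_size = 12, 16, 512  # 12 and 16 are toy values; 512 comes from the text
layer_outputs = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(num_layers)]
layer_scores = [1.0] + [0.0] * (num_layers - 1)  # assumed: only the bottom layer has a trained score
matrix = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in range(out_size)]
features = fully_connected(fuse_layers(layer_outputs, layer_scores), matrix, [0.0] * out_size)
print(len(features))
```

In this sketch the first entry of `layer_scores` stands in for the first weight value and the remaining entries for the second weight values.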
In some of these embodiments, determining whether each financial subject is output as a recognition result according to the number of times it occurs in the first prediction result set includes:
determining a constant multiple of the number of first prediction results as a second threshold, and outputting a financial subject as a recognition result if the number of times it occurs in the first prediction result set is greater than or equal to the second threshold.
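The occurrence-count rule can be sketched as a simple vote. This is a minimal illustration rather than the patented implementation: the function name `vote_subjects`, the parameter `ratio` (standing in for the "constant multiple"), its default of 0.5, and the company names are all assumptions introduced here.

```python
from collections import Counter
from typing import Iterable, List, Set


def vote_subjects(prediction_results: Iterable[Set[str]], ratio: float = 0.5) -> List[str]:
    """Keep subjects that occur in at least ratio * (number of prediction results) results.

    `ratio` is the assumed "constant multiple"; the source does not give its value.
    """
    results = [set(r) for r in prediction_results]  # one set per model: each model votes once per subject
    second_threshold = ratio * len(results)
    counts = Counter(subject for result in results for subject in result)
    return sorted(s for s, n in counts.items() if n >= second_threshold)


first_prediction_result_set = [
    {"Alpha Lending", "Beta Capital"},   # first prediction result of model 1
    {"Alpha Lending"},                   # model 2
    {"Alpha Lending", "Gamma Invest"},   # model 3
    {"Beta Capital", "Alpha Lending"},   # model 4
]
print(vote_subjects(first_prediction_result_set))  # ['Alpha Lending', 'Beta Capital']
```

With four prediction results and a ratio of 0.5 the second threshold is 2, so a subject predicted by only one model is dropped.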
In a second aspect, in this embodiment, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method for identifying a financial subject according to the first aspect.
In a third aspect, in this embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the method for identifying a financial subject as described in the first aspect above.
Compared with the related art, the method for identifying a financial subject, the electronic device, and the storage medium provided in this embodiment acquire a financial document to be analyzed and input it into more than two different first subject identification models to obtain a first prediction result set. The first prediction result set consists of the first prediction results corresponding to the respective first subject identification models, and each first prediction result comprises a plurality of financial subjects predicted by the corresponding first subject identification model. Whether each financial subject is output as a recognition result is determined according to the number of times it occurs in the first prediction result set. This addresses the problem that the subject of financial fraud information is easily misjudged and enables that subject to be identified more accurately.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a block diagram of the hardware configuration of a terminal to which the method for identifying a financial subject according to an embodiment of the present application is applied;
FIG. 2 is a flow chart of a method for identifying a financial subject according to an embodiment of the present application;
FIG. 3 is a flow chart of a first subject identification model acquisition method according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for obtaining feature information corresponding to a financial document to be analyzed according to an embodiment of the present application;
FIG. 5 is a flow chart of a further method for identifying a financial subject according to an embodiment of the present application;
FIG. 6 is a flow chart of a further method for identifying a financial subject according to an embodiment of the present application;
FIG. 7 is a schematic diagram of BERT model dynamic weight fusion according to an embodiment of the present application.
Detailed Description
For a clearer understanding of the objects, technical solutions and advantages of the present application, the present application is described and illustrated below with reference to the accompanying drawings and examples.
Unless defined otherwise, technical or scientific terms used herein shall have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these," and the like in this application do not limit quantity and may be singular or plural. The terms "comprising," "including," "having," and any variations thereof, as used in the present application, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference to "a plurality" in this application means two or more. "And/or" describes an association between associated objects and covers three relationships; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. Typically, the character "/" indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like, as referred to in this application, merely distinguish similar objects and do not represent a particular ordering of objects.
The method embodiments provided in the present embodiment may be executed in a terminal, a computer, or a similar computing device. Taking execution on a terminal as an example, FIG. 1 is a block diagram of the hardware structure of a terminal to which the method for identifying a financial subject according to an embodiment of the present application is applied. As shown in FIG. 1, the terminal may include one or more processors 102 (only one is shown in FIG. 1) and a memory 104 for storing data, wherein the processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a programmable logic device such as an FPGA. The terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in FIG. 1 is merely illustrative and is not intended to limit the structure of the terminal. For example, the terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the method for identifying a financial subject in the present embodiment. The processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-described method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
This embodiment provides a method for identifying a financial subject. FIG. 2 is a flowchart of the method according to an embodiment of the present application. As shown in FIG. 2, the flow includes the following steps:
Step S201, a financial document to be analyzed is acquired.
Step S202, inputting a financial document to be analyzed into more than two different first subject identification models to obtain a first prediction result set, wherein the first prediction result set consists of first prediction results corresponding to the first subject identification models, and each first prediction result comprises a plurality of financial subjects predicted by the corresponding first subject identification models.
Step S203, determining whether each financial subject is output as a recognition result according to the number of times it occurs in the first prediction result set.
In this embodiment, the financial subject is the subject of financial fraud information.
Through the above steps, the final financial subjects are selected from the plurality of financial subjects predicted by more than two different first subject identification models according to the number of times each financial subject occurs in the first prediction result set. This addresses the problem that the subject of financial fraud information is easily misjudged and enables that subject to be identified more accurately.
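How steps S201 and S202 produce the counts that step S203 works on can be sketched as below. The keyword-matching "models" are toy stand-ins for trained subject identification models, and the `SubjectModel` interface, the function names, and the sample document are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

SubjectModel = Callable[[str], Set[str]]  # hypothetical interface: document text -> predicted subjects


def keyword_model(names: List[str]) -> SubjectModel:
    """Toy stand-in for a trained model: reports the listed names that occur in the document."""
    def predict(document: str) -> Set[str]:
        return {name for name in names if name in document}
    return predict


def build_first_prediction_result_set(document: str, models: List[SubjectModel]) -> List[Set[str]]:
    """Step S202: run every model on the same document, one prediction result per model."""
    return [model(document) for model in models]


def count_occurrences(result_set: List[Set[str]]) -> Dict[str, int]:
    """Input to step S203: the number of prediction results in which each subject occurs."""
    return dict(Counter(subject for result in result_set for subject in result))


document = "Investors reported that Alpha Lending and Beta Capital withheld repayments."  # step S201
models = [
    keyword_model(["Alpha Lending", "Beta Capital"]),
    keyword_model(["Alpha Lending"]),
    keyword_model(["Alpha Lending", "Gamma Invest"]),
]
result_set = build_first_prediction_result_set(document, models)
occurrences = count_occurrences(result_set)
print(occurrences)
```

Because each prediction result is a set, a model contributes at most one occurrence per subject, so the count equals the number of models that agree.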
In some of these embodiments, fig. 3 is a flowchart of a first subject identification model acquisition method according to an embodiment of the present application, and as shown in fig. 3, the steps of acquiring the first subject identification model include:
Step S301, a financial document to be trained is obtained, and a first character sequence and a second character sequence are obtained according to the financial document to be trained.
Step S302, the first character sequence is divided into a training set and a verification set, and more than two different second subject identification models are trained for more than one round according to the training set and the second character sequence to obtain a third subject identification model set. The third subject identification model set consists of the third subject identification models corresponding to the second subject identification models, and each round of training of a second subject identification model yields one third subject identification model.
Step S303, each third subject identification model is verified with the verification set to obtain a recall rate of each third subject identification model and a second prediction result set. The second prediction result set consists of the second prediction results corresponding to the respective third subject identification models, and each second prediction result comprises a plurality of financial subjects predicted by the corresponding third subject identification model. Each third subject identification model in the third subject identification model set that meets the recall rate requirement is determined as a fourth subject identification model, and the second prediction results corresponding to the fourth subject identification models constitute a third prediction result set.
Step S304, determining whether each financial subject is output as a prediction result according to the number of times it occurs in the third prediction result set.
Step S305, calculating the matching degree between the prediction result and the financial fraud information subjects labeled in the verification set, and determining each fourth subject identification model whose calculated matching degree meets the requirement as a first subject identification model.
Through the above steps, more than two different second subject identification models are trained for more than one round according to the training set and the second character sequence, and each round of training of a second subject identification model yields one third subject identification model. The third subject identification models corresponding to the second subject identification models form the third subject identification model set. Each third subject identification model is verified with the verification set to obtain its recall rate and its second prediction result, and the third subject identification models that meet the recall rate requirement are taken as fourth subject identification models, which completes the first screening of the third subject identification models.
Meanwhile, the second prediction results corresponding to the fourth subject identification models form the third prediction result set. The predicted financial subjects are determined according to the number of times each financial subject predicted by the fourth subject identification models occurs in the third prediction result set, the matching degree between the predicted financial subjects and the financial fraud information subjects labeled in the verification set is calculated, and the fourth subject identification models that meet the matching degree requirement are determined as first subject identification models, which completes the second screening. Compared with subject identification models that have not been screened, the first subject identification models obtained through these two screenings can predict the subject of financial fraud information more accurately.
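The two screenings can be sketched as follows. This is a simplified illustration under several assumptions: each checkpoint's predictions on the verification set are reduced to one set of subject names, the "matching degree" is approximated by F1 (the source does not define it), and the names `screen_and_score`, `min_recall`, and `ratio` and the checkpoint labels are hypothetical. How the matching degree is attributed back to individual models is not specified in the text, so the sketch only reports it for the voted result.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def recall(predicted: Set[str], gold: Set[str]) -> float:
    """Share of labeled subjects that the checkpoint recovered."""
    return len(predicted & gold) / len(gold) if gold else 0.0


def f1(predicted: Set[str], gold: Set[str]) -> float:
    """Assumed stand-in for the "matching degree" between prediction and labels."""
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision, rec = hits / len(predicted), hits / len(gold)
    return 2 * precision * rec / (precision + rec)


def screen_and_score(predictions: Dict[str, Set[str]], gold: Set[str],
                     min_recall: float, ratio: float) -> Tuple[List[str], Set[str], float]:
    # First screening: keep checkpoints (third models) whose recall meets the requirement.
    kept = {name: pred for name, pred in predictions.items() if recall(pred, gold) >= min_recall}
    # Vote over the kept checkpoints' predictions (the third prediction result set).
    counts = Counter(s for pred in kept.values() for s in pred)
    voted = {s for s, n in counts.items() if n >= ratio * len(kept)}
    # Matching degree of the voted prediction against the labeled subjects.
    return sorted(kept), voted, f1(voted, gold)


gold = {"Alpha Lending", "Beta Capital"}
predictions = {
    "blstm_round1": {"Alpha Lending"},
    "blstm_round2": {"Alpha Lending", "Beta Capital"},
    "idcnn_round1": {"Alpha Lending", "Beta Capital", "Gamma Invest"},
    "idcnn_round2": {"Gamma Invest"},
}
kept, voted, score = screen_and_score(predictions, gold, min_recall=0.5, ratio=0.5)
print(kept, sorted(voted), score)
```

In this toy run the checkpoint with zero recall is removed by the first screening, and the vote over the remaining three discards the subject that only one of them predicted.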
In some of these embodiments, the second subject identification model is constructed by at least one of:
a BERT-BLSTM-CRF model and a BERT-IDCNN-CRF model.
Bidirectional Encoder Representations from Transformers (BERT) is a pre-trained model proposed by Google AI in October 2018.
Bi-directional Long Short-Term Memory (BiLSTM, also written BLSTM) combines a forward LSTM and a backward LSTM.
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN).
The iterated dilated CNN (IDCNN) is formed by stacking four dilated CNN blocks with the same structure; each block contains three dilated convolution layers with dilation widths of 1, 1, and 2. Because the same block is applied repeatedly, the network is called an Iterated Dilated CNN.
The conditional random field (CRF), proposed in 2001, is a probabilistic graphical model that satisfies the Markov property.
In some embodiments, obtaining a financial document to be trained, and obtaining a first character sequence and a second character sequence according to the financial document to be trained specifically includes:
acquiring a financial document to be trained, and preprocessing the financial document to be trained to obtain first text information;
and labeling the first text information to obtain a first character sequence and a second character sequence.
By the method, the first text information is marked, the corresponding first character sequence and second character sequence are obtained, and preparation is made for obtaining a first main body recognition model according to the first character sequence and the second character sequence.
In some embodiments, preprocessing a financial document to be trained to obtain first text information, including:
removing redundant information in the financial document to be trained through regular matching to obtain a processed financial document, wherein the processed financial document comprises a title and a text;
and acquiring the editing distance between the title and the text, and if the editing distance is larger than a first threshold value, splicing the title and the text to obtain first text information.
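In the fields of information theory, linguistics, and computer science, the edit distance (Minimum Edit Distance, abbreviated as MED) is a measure of the similarity of two sequences: the minimum number of single-character insertions, deletions and substitutions required to transform one sequence into the other.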
In this embodiment, if the editing distance between the title and the text is less than or equal to the first threshold, only the text is retained, and the text is used as the first text information.
In this way, redundant information in the financial document to be trained is removed, and whether the title and the text are spliced is decided according to the edit distance between them. The title and the text are spliced only when the edit distance is greater than the first threshold. When the edit distance is less than or equal to the first threshold, the title and the text are similar, so they are not spliced and only the text is used as the first text information, which avoids redundant information in the first text information.
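The splicing rule above can be sketched as follows. This is a minimal illustration, not the implementation of the application: `edit_distance` is a standard Levenshtein computation, `build_first_text` is a hypothetical helper name, and joining the title and text without a separator is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_first_text(title: str, text: str, first_threshold: int) -> str:
    """Splice title and text only when they differ by more than the threshold."""
    if edit_distance(title, text) > first_threshold:
        return title + text  # assumption: plain concatenation
    return text              # title and text are similar, keep the text only
```

For example, `build_first_text("abc", "abcd", 1)` keeps only the text, because the two strings differ by a single edit.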
In some embodiments, labeling the first text information to obtain a first character sequence and a second character sequence includes:
labeling the financial main body in the first text information to obtain a third character sequence, wherein the third character sequence comprises a title and a text;
marking whether the financial main body appears in the text, the number of times of appearance in the text and whether the financial main body appears in the title in the third character sequence to obtain a second character sequence with marking information;
labeling the position information of the financial body in the third character sequence to obtain a first character sequence with labeling information.
In this way, the first character sequence comprises the labeled financial subjects and their position information, and the second character sequence comprises the labeled financial subjects together with whether and how many times each financial subject appears in the text and whether it appears in the title. This prepares for training the second subject identification model according to the first character sequence and the second character sequence to obtain the first subject identification model.
In some of these embodiments, the first subject identification model includes a trained first sub-model and a trained second sub-model;
inputting the financial document to be analyzed into more than two different first subject identification models, wherein obtaining the first prediction result set comprises:
Inputting the financial document to be analyzed into a trained first sub-model to obtain characteristic information corresponding to the financial document to be analyzed, wherein the trained first sub-model is obtained through training of the financial document to be trained;
and inputting the characteristic information corresponding to the financial document to be analyzed into the trained second sub-model to obtain a first prediction result set.
By means of the method, the first sub-model is trained by using the financial document to be trained, and the trained first sub-model is obtained, so that the trained first sub-model can acquire the characteristic information corresponding to the financial document to be analyzed more accurately, and the more accurate characteristic information is input into the trained second sub-model, so that the main body of the financial fraud information in the financial document to be analyzed can be predicted more accurately.
In some of these embodiments, the first sub-model is a BERT model;
Fig. 4 is a flowchart of a method for obtaining the feature information corresponding to a financial document to be analyzed according to an embodiment of the present application. As shown in Fig. 4, inputting the financial document to be analyzed into the trained first sub-model to obtain the feature information corresponding to the financial document to be analyzed specifically includes the following steps:
Step S401, training a layer of preamble code predictors in the BERT model by using a financial document to be trained to obtain a first weight value corresponding to the trained preamble code predictors, wherein the BERT model is provided with a plurality of layers of preamble code predictors.
Step S402, obtaining second weight values corresponding to a plurality of untrained preamble coding predictors in the BERT model.
Step S403, obtaining the weight value of the BERT model according to the first weight value and each second weight value.
And step S404, mapping the weight value of the BERT model to 512 dimensions through a full connection layer to obtain the trained BERT model.
Step S405, inputting the financial document to be analyzed into the trained BERT model to obtain the feature information corresponding to the financial document to be analyzed.
Through the steps, a layer of preamble code predictors in the BERT model are trained to obtain the corresponding first weight value, and the trained BERT model is obtained according to the first weight value, so that the trained BERT model can extract the characteristic information corresponding to the financial document to be analyzed more accurately.
In some of these embodiments, the trained preamble encoding predictor is the lowest layer preamble encoding predictor in the BERT model.
In this embodiment, the preamble coding predictors of the individual layers are not independent of one another: in addition to its own input features, the preamble coding predictor of a later layer also takes in the input of the preamble coding predictor of the previous layer, combines the two into a merged feature, and outputs the merged feature.
By means of the method, the preamble coding predictor at the bottommost layer is trained to obtain the trained BERT model, and the trained BERT model can extract the characteristic information corresponding to the financial document to be analyzed more accurately.
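One possible reading of steps S401 to S404 is sketched below: each layer's representation of a token receives a scalar weight (one trained, the others left at their initial values), the representations are combined as a weighted average, and the result is passed through a fully connected layer. This reading, the normalisation by the sum of the weights, and the names `fuse_layers` and `project` are all assumptions; the tiny dimensions stand in for the 512-dimensional output mentioned above.

```python
def fuse_layers(layer_outputs, layer_weights):
    """Weighted average of per-layer vectors for one token.

    layer_outputs[l] is the vector produced by layer l;
    layer_weights[l] is the scalar weight of layer l (assumed positive).
    """
    total = sum(layer_weights)
    dim = len(layer_outputs[0])
    return [
        sum(w * layer[d] for w, layer in zip(layer_weights, layer_outputs)) / total
        for d in range(dim)
    ]


def project(vector, matrix, bias):
    """Fully connected layer; matrix holds one row per output dimension."""
    return [sum(m * v for m, v in zip(row, vector)) + b
            for row, b in zip(matrix, bias)]


# Two toy layers; the second weight plays the role of the trained first weight value.
fused = fuse_layers([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
features = project(fused, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
```

With equal weights the fusion reduces to a plain average of the layers, so training a single weight shifts the result toward that layer's representation.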
In some of these embodiments, determining whether each financial subject is output as the identification result according to the number of times each financial subject occurs in the first prediction result set comprises:
determining a constant multiple of the number of first prediction results as a second threshold, and outputting a financial subject as the identification result if the number of times it occurs in the first prediction result set is greater than or equal to the second threshold.
In this way, financial subjects whose number of occurrences in the first prediction result set is greater than or equal to the second threshold are output as the identification result, and financial subjects whose number of occurrences is less than the second threshold are discarded. The subject of the financial fraud information can thus be determined more accurately from the first prediction result set, and inaccurate financial subjects are prevented from being output as the identification result.
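The threshold rule above can be sketched as a simple vote. The function name `vote_subjects`, the default constant of 0.5, and the choice to count each model at most once per financial subject are assumptions made for illustration.

```python
from collections import Counter


def vote_subjects(first_predictions, constant=0.5):
    """Keep subjects predicted by at least constant * (number of results) models.

    first_predictions holds one collection of predicted subjects per
    first subject identification model.
    """
    second_threshold = constant * len(first_predictions)
    counts = Counter()
    for prediction in first_predictions:
        counts.update(set(prediction))  # assumption: one vote per model per subject
    return sorted(s for s, n in counts.items() if n >= second_threshold)


# Four models, so the second threshold is 0.5 * 4 = 2 occurrences.
result = vote_subjects([["A", "B"], ["A"], ["A", "C"], ["B"]])
```

Here subject "C" is predicted by only one model and is therefore dropped from the identification result.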
Fig. 5 is a flowchart of a method for identifying a financial subject according to an embodiment of the present application, as shown in fig. 5, the flowchart including the steps of:
step S501, determining more than two different first subject identification models according to a financial document to be trained, wherein the first subject identification models comprise a first trained sub-model and a second trained sub-model.
In the embodiment, a financial document to be trained is obtained, and a first character sequence and a second character sequence are obtained according to the financial document to be trained;
dividing the first character sequence into a training set and a verification set, and training more than two different second main body recognition models according to the training set and the second character sequence to obtain a third main body recognition model set, wherein the third main body recognition model set consists of a plurality of third main body recognition models corresponding to the second main body recognition models, and one third main body recognition model is obtained after each round of training of the second main body recognition models;
verifying each third subject identification model by using a verification set to obtain recall rates of each third subject identification model and second prediction result sets, wherein each second prediction result set consists of each second prediction result corresponding to each third subject identification model, each second prediction result comprises a plurality of financial subjects predicted by the corresponding third subject identification model, and a third subject identification model meeting the recall rate requirement in the third subject identification model set is determined to be a fourth subject identification model, wherein each second prediction result corresponding to the fourth subject identification model forms a third prediction result set;
determining whether each financial subject is output as a prediction result according to the number of times it occurs in the third prediction result set;
and calculating the matching degree of the prediction result and the financial fraud information main body calibrated in the verification set, and determining a fourth main body recognition model with the calculated matching degree meeting the requirement as a first main body recognition model.
In one embodiment, the second subject identification model is constructed by at least one of:
a BERT-BLSTM-CRF model and a BERT-IDCNN-CRF model.
In one embodiment, acquiring a financial document to be trained, and obtaining a first character sequence and a second character sequence according to the financial document to be trained specifically includes:
acquiring a financial document to be trained, and preprocessing the financial document to be trained to obtain first text information;
and labeling the first text information to obtain a first character sequence and a second character sequence.
In one embodiment, preprocessing a financial document to be trained to obtain first text information specifically includes:
removing redundant information in the financial document to be trained through regular matching to obtain a processed financial document, wherein the processed financial document comprises a title and a text;
And acquiring the editing distance between the title and the text, and if the editing distance is larger than a first threshold value, splicing the title and the text to obtain first text information.
In one embodiment, labeling the first text information to obtain the first character sequence and the second character sequence includes:
labeling the financial main body in the first text information to obtain a third character sequence, wherein the third character sequence comprises a title and a text;
marking whether the financial main body appears in the text, the number of times of appearance in the text and whether the financial main body appears in the title in the third character sequence to obtain a second character sequence with marking information;
labeling the position information of the financial body in the third character sequence to obtain a first character sequence with labeling information.
Step S502, inputting the financial document to be analyzed into a trained first sub-model to obtain feature information corresponding to the financial document to be analyzed, wherein the trained first sub-model is obtained through training of the financial document to be trained.
Step S503, inputting the feature information corresponding to the financial document to be analyzed into the trained second sub-model to obtain a first prediction result set.
In this embodiment, the first prediction result set is composed of each first prediction result corresponding to each first subject identification model, and each first prediction result includes a plurality of financial subjects predicted by the corresponding first subject identification model.
In one embodiment, the first sub-model is a BERT model;
inputting the financial document to be analyzed into the trained first sub-model to obtain the characteristic information corresponding to the financial document to be analyzed, wherein the method specifically comprises the following steps:
training a layer of preamble coding predictors in a BERT model by using a financial document to be trained to obtain a first weight value corresponding to the trained preamble coding predictors, wherein the BERT model is provided with a plurality of layers of preamble coding predictors;
acquiring second weight values corresponding to a plurality of untrained preamble coding predictors in the BERT model;
obtaining a weight value of the BERT model according to the first weight value and each second weight value;
mapping the weight value of the BERT model to 512 dimensions through a full connection layer to obtain a trained BERT model;
and inputting the financial document to be analyzed into the trained BERT model to obtain the characteristic information corresponding to the financial document to be analyzed.
In one embodiment, the trained preamble predictor is the lowest layer preamble predictor in the BERT model.
In step S504, a constant multiple of the number of the first prediction results is determined as a second threshold, and if the number of times of occurrence of the financial subject in the first prediction result set is greater than or equal to the second threshold, the financial subject is outputted as the identification result.
Through the steps, the financial document to be analyzed is input into more than two different first main body recognition models to obtain a first prediction result set, the first prediction result set is composed of first prediction results corresponding to the first main body recognition models, each first prediction result comprises a plurality of financial main bodies predicted by the corresponding first main body recognition models, whether the financial main bodies are output as recognition results or not is determined according to the occurrence times of the financial main bodies in the first prediction result set, the problem that the main bodies of financial fraud information are easy to misjudge is solved, the final financial main body is determined from a plurality of financial main bodies predicted by the more than two different first main body recognition models, and the main bodies of the financial fraud information are more accurately recognized.
Fig. 6 is a flowchart of a method for identifying a financial subject according to an embodiment of the present application, as shown in fig. 6, the flowchart including the steps of:
Step S601, preprocessing a financial document to be trained to obtain first text information.
And step S602, labeling the first text information to obtain a first character sequence and a second character sequence.
Step S603, dividing the first character sequence into a training set and a verification set, and performing more than one round of training on more than two different second subject recognition models according to the training set and the second character sequence to obtain a third subject recognition model set, where the third subject recognition model set is composed of a plurality of third subject recognition models corresponding to each second subject recognition model, and one third subject recognition model is obtained by performing one round of training on each second subject recognition model.
Step S604, verifying each third subject identification model by using the verification set to obtain the recall rate of each third subject identification model and a second prediction result set, wherein the second prediction result set consists of the second prediction results corresponding to the third subject identification models, each second prediction result comprises a plurality of financial subjects predicted by the corresponding third subject identification model, and each third subject identification model in the third subject identification model set that meets the recall rate requirement is determined as a fourth subject identification model, wherein the second prediction results corresponding to the fourth subject identification models form the third prediction result set.
Step S605, determining whether each financial subject is output as the prediction result according to the number of times it occurs in the third prediction result set.
Step S606, calculating the matching degree of the prediction result and the financial fraud information main body calibrated in the verification set, and determining a fourth main body recognition model with the calculated matching degree meeting the requirement as a first main body recognition model, wherein the first main body recognition model comprises a trained BERT model and a trained second sub model.
In the embodiment, training a layer of preamble coding predictors in a BERT model by using a financial document to be trained to obtain a first weight value corresponding to the trained preamble coding predictors, wherein the BERT model is provided with a plurality of layers of preamble coding predictors; acquiring second weight values corresponding to a plurality of untrained preamble coding predictors in the BERT model; obtaining a weight value of the BERT model according to the first weight value and each second weight value; and mapping the weight value of the BERT model to 512 dimensions through a full connection layer to obtain the trained BERT model.
Step S607, inputting the financial document to be analyzed into the trained BERT model to obtain the feature information corresponding to the financial document to be analyzed.
Step S608, inputting the feature information corresponding to the financial document to be analyzed into the trained second sub-model to obtain the first prediction result set.
Step S609, determining whether each financial subject is output as the identification result according to the number of times it occurs in the first prediction result set.
Through the steps, the financial document to be analyzed is input into more than two different first main body recognition models to obtain a first prediction result set, the first prediction result set is composed of first prediction results corresponding to the first main body recognition models, each first prediction result comprises a plurality of financial main bodies predicted by the corresponding first main body recognition models, whether the financial main bodies are output as recognition results or not is determined according to the occurrence times of the financial main bodies in the first prediction result set, the problem that the main bodies of financial fraud information are easy to misjudge is solved, implementation of financial supervision of the Internet industry is facilitated, fraudulent financial information main bodies can be identified from massive financial information, accordingly, propagation of economic crimes can be controlled and prevented in time, and great practical significance is achieved for preventing Internet economic crimes and reducing mass property loss.
All the financial documents to be analyzed and the financial documents to be trained are financial information texts crawled from specific financial web pages. Each financial information text comprises two parts, a title and a body text; some web pages have no body text, and the text lengths vary. The title and the body text are therefore preprocessed first.
In one embodiment, the financial document to be trained includes a text title and a text information text, and the preprocessing of the financial document to be trained to obtain the first text information includes:
filtering noise in the financial document to be trained, including picture information, website information, web page tags, dates, special characters, and symbols that are not Chinese, English or digits; then calculating the edit distance between the title and the body text to judge whether one contains the other, and removing text data in which either the title or the body text is empty; specifically, when the edit distance between the title and the body text is less than 200, only the body text is retained, and when the edit distance between the title and the body text is greater than 200, the title and the body text are spliced, so as to obtain the processed text information;
cutting the processed text information at punctuation marks in order of priority and recombining the pieces in their original order; when the length of the recombined sentence exceeds 510 characters, a new data sample is generated, and the process is repeated on the remaining sentences until all the processed text information has been assembled, so as to obtain the first text information.
In this way, noise in the financial document to be trained is filtered through regular matching, and text data in which either the title or the body text is empty is removed by means of the edit distance between the title and the body text, which solves the problem of excessive redundant information in the financial document to be trained. Cutting the processed text information according to punctuation priority solves the problem of a single text being too long, so that the data is fully used.
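The cutting and recombining step can be sketched as below. This is a simplified illustration: it splits on a single tier of sentence-ending punctuation rather than a full punctuation priority order, hard-wraps any single sentence longer than the limit, and uses the hypothetical helper name `split_into_samples`; none of these details are stated in the description.

```python
import re


def split_into_samples(text: str, max_len: int = 510):
    """Cut text after punctuation and regroup sentences into samples of at most max_len characters."""
    # Split after each punctuation mark so the mark stays with its sentence.
    sentences = [s for s in re.split(r"(?<=[。！？!?；;.])", text) if s]
    samples, current = [], ""
    for sentence in sentences:
        # Assumption: a sentence longer than max_len is wrapped at max_len.
        while len(sentence) > max_len:
            if current:
                samples.append(current)
                current = ""
            samples.append(sentence[:max_len])
            sentence = sentence[max_len:]
        if len(current) + len(sentence) > max_len:
            samples.append(current)  # start a new data sample
            current = sentence
        else:
            current += sentence
    if current:
        samples.append(current)
    return samples


samples = split_into_samples("aaa.bb.cccc.", max_len=6)
```

Concatenating the returned samples reproduces the input, so no text is discarded by the cut.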
In one embodiment, labeling the first text information to obtain a first character sequence and a second character sequence specifically includes:
manually labeling the first text information: marking the financial subjects contained in each piece of financial document information to form a financial subject list (entity), and marking whether the information expresses fraud to form a label column (negative), so as to form an original data set of 4 columns, namely title, text, entity and negative, which is denoted as the third character sequence;
mapping the third character sequence into the character labels 'O', 'B-ORG' and 'I-ORG', wherein the first character of an entity word in the third character sequence is labeled B-ORG, the remaining characters of the entity word are labeled I-ORG, and all other characters in the third character sequence are labeled O, so that a one-to-one mapping between characters and labels is formed and the first character sequence is obtained;
obtaining, in the third character sequence, the number of times the labeled entity word occurs in the first 507 characters of the body text, whether the labeled entity word occurs in the body text, and whether the labeled entity word occurs in the title, so as to obtain the second character sequence.
By the method, the first text information is marked to obtain the first character sequence and the second character sequence, and preparation is made for training the second main body recognition model according to the first character sequence and the second character sequence to obtain the first main body recognition model.
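The labeling described above can be sketched as follows. The helper names `bio_labels` and `auxiliary_features`, the exact-match search for entity words, and the dictionary layout of the auxiliary features are assumptions for illustration; the description only fixes the three labels and the 507-character window.

```python
def bio_labels(text: str, entities):
    """Map every character to 'O', 'B-ORG' or 'I-ORG' (first character sequence)."""
    labels = ["O"] * len(text)
    for entity in entities:
        if not entity:
            continue
        start = text.find(entity)
        while start != -1:
            end = start + len(entity)
            # Skip spans that overlap an entity labeled earlier.
            if all(label == "O" for label in labels[start:end]):
                labels[start] = "B-ORG"
                for i in range(start + 1, end):
                    labels[i] = "I-ORG"
            start = text.find(entity, end)
    return labels


def auxiliary_features(title: str, body: str, entity: str, window: int = 507):
    """Occurrence features for one entity word (second character sequence)."""
    return {
        "count_in_window": body[:window].count(entity),
        "in_body": entity in body,
        "in_title": entity in title,
    }


labels = bio_labels("Acme is bad", ["Acme"])
features = auxiliary_features("Acme warning", "Acme did X. Acme did Y.", "Acme")
```

In the example, the four characters of "Acme" receive one B-ORG label followed by three I-ORG labels, and every other character is labeled O.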
In one embodiment, acquiring a financial document to be analyzed specifically includes:
constructing a financial fraud information detection model based on the BERT model, specifically, connecting the output of the last fully connected layer of the BERT model to a Sigmoid activation function (0/1) to obtain the financial fraud information detection model, and inputting the first character sequence into the financial fraud information detection model to obtain the financial document to be analyzed.
In addition, in the present embodiment, the financial document to be analyzed may also be acquired with a conventional machine learning model, such as an SVM model or a Logistic Regression model.
In the above manner, the financial document with the financial fraud information is used as the financial document to be analyzed, so as to prepare for the subsequent identification of the main body of the financial fraud information according to the financial document to be analyzed.
In one embodiment, constructing more than two distinct second subject identification models includes:
the second subject identification model is constructed in four ways in the present embodiment.
Mode one: a second subject identification model is constructed based on the BERT-BLSTM-CRF model. Specifically, the token vectors learned by the BERT pre-trained model are input into the BiLSTM model for further learning, so that the model captures the contextual relations of the text sequence, and the classification result of each token is finally obtained through the CRF model. In this mode, the output features of the last fully connected layer of the native BERT model are first used as the input of the BiLSTM model, and the output of the fully connected layer of the BiLSTM model is then fed to the CRF model to complete the identification of the financial subject information. The three layers are as follows: (1) BERT encodes the input text with the Transformer mechanism and uses the pre-trained model to obtain the semantic representation of the characters; (2) BiLSTM further extracts high-level features of the data on the basis of the BERT output; (3) CRF applies state transition constraints to the output of the BiLSTM layer.
Mode two: the native BERT model is partially improved. Because each layer of the native BERT model understands the text differently, the final weights of the BERT model are obtained through dynamic weight fusion. Fig. 7 is a schematic diagram of BERT model dynamic weight fusion according to an embodiment of the present application. As shown in Fig. 7, a weight is assigned to the representation generated by the 12th Transformer layer of the BERT model, and a first weight value is determined through training; second weight values corresponding to layers 1 to 11 are obtained; the weights corresponding to Transformer layers 1 to 12 are averaged to obtain the final weight value, and the final weight value is reduced to 512 dimensions through one fully connected layer. In mode two, the BLSTM-CRF model is attached to the dynamically fused BERT model to construct the second subject identification model.
Mode three: a second main body recognition model is built based on the BERT-IDCNN-CRF model, the IDCNN can fully capture long-distance information of long-sequence texts under the condition that local information is lost in the texts, and the method is suitable for text data recognition of the long texts, and is different from the BILSTM model in that even if sentences with the length of n are processed under the parallel condition, the complexity of O (n) is only needed, the accuracy is equivalent to that of the BERT-BLSTM-CRF model, and the prediction speed is improved by half.
Mode four: a second subject identification model is constructed based on the improved BERT model of mode two and the IDCNN-CRF model.
In this embodiment, the second subject identification model is not limited to the above four modes. For example, the BiLSTM model or the IDCNN model in the above four second subject identification models may be replaced by a BiGRU model, which can likewise perform further feature extraction on the first character sequence and the second character sequence and implement the semantic encoding process.
By the method, more than two different second subject identification models are constructed, so that preparation is made for the follow-up identification of the subjects of the financial fraud information according to the different second subject identification models.
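To illustrate what the state transition constraint of the CRF layer in the modes above contributes, the following sketch decodes a label sequence from per-character scores while forbidding 'I-ORG' unless it follows 'B-ORG' or 'I-ORG'. It is a simplification: a trained CRF learns numeric transition scores, whereas this example uses only a hard allow/forbid rule, and the names `allowed` and `constrained_decode` as well as the toy scores are hypothetical.

```python
LABELS = ["O", "B-ORG", "I-ORG"]


def allowed(prev: str, curr: str) -> bool:
    """I-ORG may only continue an entity opened by B-ORG or I-ORG."""
    return curr != "I-ORG" or prev in ("B-ORG", "I-ORG")


def constrained_decode(emissions):
    """Viterbi search over per-character label scores under the BIO constraint."""
    if not emissions:
        return []
    neg_inf = float("-inf")
    # A sequence cannot start with I-ORG.
    scores = {lab: (neg_inf if lab == "I-ORG" else emissions[0][lab]) for lab in LABELS}
    back = []
    for emission in emissions[1:]:
        new_scores, pointers = {}, {}
        for curr in LABELS:
            candidates = [p for p in LABELS if allowed(p, curr)]
            best_prev = max(candidates, key=lambda p: scores[p])
            new_scores[curr] = scores[best_prev] + emission[curr]
            pointers[curr] = best_prev
        scores = new_scores
        back.append(pointers)
    path = [max(LABELS, key=lambda lab: scores[lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Per-character argmax would give I-ORG, I-ORG, O, which is not a valid sequence.
toy_scores = [
    {"O": 0.1, "B-ORG": 0.2, "I-ORG": 0.9},
    {"O": 0.1, "B-ORG": 0.1, "I-ORG": 0.8},
    {"O": 0.9, "B-ORG": 0.0, "I-ORG": 0.1},
]
decoded = constrained_decode(toy_scores)
```

The constraint forces the entity to open with B-ORG, so the decoded sequence stays well-formed even when the raw scores prefer an invalid label at the first character.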
There is also provided in this embodiment an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
Acquiring a financial document to be analyzed;
inputting a financial document to be analyzed into more than two different first subject identification models to obtain a first prediction result set, wherein the first prediction result set consists of first prediction results corresponding to the first subject identification models, and each first prediction result comprises a plurality of financial subjects predicted by the corresponding first subject identification models;
and determining whether each financial subject is output as the identification result according to the number of times it occurs in the first prediction result set.
It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and are not described in detail in this embodiment.
In addition, in combination with the method of identifying a financial subject provided in the above embodiments, a storage medium may also be provided in this embodiment. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements the method of identifying a financial subject of any one of the above embodiments.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present application, are within the scope of the present application in light of the embodiments provided herein.
It is evident that the drawings are only examples or embodiments of the present application, and a person skilled in the art can also apply the present application to other similar situations on the basis of these drawings without inventive effort. In addition, it should be appreciated that while the development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as an admission of insufficient detail.
The term "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in this application can be combined with other embodiments without conflict.
The foregoing examples represent only a few embodiments of the present application. Although they are described in considerable detail, they are not to be construed as limiting the scope of the patent. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (11)

1. A method of identifying a financial subject, comprising:
acquiring a financial document to be analyzed;
inputting the financial document to be analyzed into more than two different first subject identification models to obtain a first prediction result set, wherein the first prediction result set consists of first prediction results corresponding to the first subject identification models, and each first prediction result comprises a plurality of financial subjects predicted by the corresponding first subject identification models;
determining whether each financial subject is output as a recognition result according to the number of times the financial subject occurs in the first prediction result set;
the method further comprises the steps of:
acquiring a financial document to be trained, and acquiring a first character sequence and a second character sequence according to the financial document to be trained;
dividing the first character sequence into a training set and a verification set, and training more than two different second subject identification models for more than one round according to the training set and the second character sequence to obtain a third subject identification model set, wherein the third subject identification model set consists of a plurality of third subject identification models corresponding to the second subject identification models, and each round of training of a second subject identification model yields one third subject identification model;
verifying each third subject identification model by using the verification set to obtain a recall rate of each third subject identification model and a second prediction result set, wherein the second prediction result set consists of second prediction results corresponding to each third subject identification model, and each second prediction result comprises a plurality of financial subjects predicted by the corresponding third subject identification model; and determining each third subject identification model in the third subject identification model set that meets the recall rate requirement as a fourth subject identification model, wherein the second prediction results corresponding to the fourth subject identification models form a third prediction result set;
determining whether each financial subject is output as a prediction result according to the number of times the financial subject occurs in the third prediction result set;
and calculating the matching degree between the prediction result and the financial subjects calibrated in the verification set, and determining each fourth subject identification model whose calculated matching degree meets the requirement as a first subject identification model.
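As a non-limiting illustration of the verification steps of claim 1, the following Python sketch filters trained checkpoints by recall, votes among the remaining ones, and scores the voted result against the calibrated subjects. The function names, the `min_recall` and `ratio` defaults, and the use of F1 as the "matching degree" are all assumptions made for this example; the claim does not fix any of them.

```python
from collections import Counter

def recall(predicted, gold):
    """Fraction of calibrated (gold) subjects that the model recovered."""
    gold = set(gold)
    return len(gold & set(predicted)) / len(gold) if gold else 0.0

def f1(predicted, gold):
    """Assumed matching degree: F1 between predicted and calibrated subjects."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision, rec = hits / len(predicted), hits / len(gold)
    return 2 * precision * rec / (precision + rec)

def select_and_score(checkpoints, gold, min_recall=0.8, ratio=0.5):
    """checkpoints maps a checkpoint name to the subjects it predicted.

    Returns the names of checkpoints meeting the recall requirement and
    the matching degree of their voted prediction.
    """
    # Keep only checkpoints whose recall meets the (hypothetical) requirement.
    kept = {name: preds for name, preds in checkpoints.items()
            if recall(preds, gold) >= min_recall}
    # Each kept checkpoint votes at most once per subject.
    counts = Counter(s for preds in kept.values() for s in set(preds))
    voted = {s for s, n in counts.items() if n >= ratio * len(kept)}
    return sorted(kept), f1(voted, gold)
```

In this sketch, a checkpoint corresponds to one "third subject identification model" (one model after one training round), and the checkpoints that survive the recall filter play the role of the "fourth subject identification models".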
2. The method of claim 1, wherein the second subject identification model is constructed from at least one of:
a BERT-BLSTM-CRF model and a BERT-IDCNN-CRF model.
3. The method for identifying a financial subject according to claim 1 wherein obtaining a financial document to be trained, and obtaining a first character sequence and a second character sequence according to the financial document to be trained, specifically comprises:
acquiring a financial document to be trained, and preprocessing the financial document to be trained to obtain first text information;
and labeling the first text information to obtain a first character sequence and a second character sequence.
4. The method for identifying a financial subject according to claim 3 wherein preprocessing the financial document to be trained to obtain first text information comprises:
removing redundant information in the financial document to be trained through regular matching to obtain a processed financial document, wherein the processed financial document comprises a title and a text;
and acquiring the edit distance between the title and the text, and if the edit distance is greater than a first threshold, concatenating the title and the text to obtain the first text information.
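As a non-limiting illustration of the preprocessing of claim 4, the following Python sketch removes redundant content by regular-expression matching, computes the Levenshtein edit distance between title and text, and concatenates the two only when they differ by more than a threshold. The `REDUNDANT` pattern (HTML tags and URLs), the default threshold of 5, and the choice to keep the text alone when the title is a near-duplicate are assumptions of this example and are not specified by the claim.

```python
import re

# Hypothetical notion of "redundant information": HTML tags and URLs.
REDUNDANT = re.compile(r"<[^>]+>|https?://\S+")

def clean(s):
    """Strip redundant fragments, then collapse runs of whitespace."""
    return re.sub(r"\s+", " ", REDUNDANT.sub("", s)).strip()

def edit_distance(a, b):
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def build_first_text(title, body, first_threshold=5):
    """Concatenate title and body only if they differ enough."""
    title, body = clean(title), clean(body)
    if edit_distance(title, body) > first_threshold:
        return title + " " + body
    # Assumption: a near-duplicate title adds nothing, so keep the body only.
    return body
```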
5. The method of claim 3, wherein labeling the first text information to obtain a first character sequence and a second character sequence comprises:
labeling the financial subject in the first text information to obtain a third character sequence, wherein the third character sequence comprises a title and a text;
marking, in the third character sequence, whether the financial subject appears in the text, how many times it appears in the text, and whether it appears in the title, to obtain a second character sequence with marking information;
and marking the position information of the financial subject in the third character sequence to obtain a first character sequence with marking information.
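As a non-limiting illustration of the labeling of claim 5, the following Python sketch derives per-character position tags and the three occurrence features for one financial subject. The BIO tagging scheme, the function names, and the dictionary keys are assumptions of this example; the claim only requires that position information and the occurrence marks be recorded.

```python
def bio_tags(text, subject):
    """Tag each character: B at a subject start, I inside it, O elsewhere."""
    tags = ["O"] * len(text)
    start = text.find(subject)
    while subject and start != -1:
        tags[start] = "B"
        for k in range(start + 1, start + len(subject)):
            tags[k] = "I"
        # Continue searching after the current (non-overlapping) match.
        start = text.find(subject, start + len(subject))
    return tags

def subject_features(title, body, subject):
    """Occurrence marks: in the text, frequency in the text, in the title."""
    return {
        "in_body": subject in body,
        "body_count": body.count(subject),
        "in_title": subject in title,
    }
```

Here the tag list stands in for the first character sequence (position information) and the feature dictionary for the marking information carried by the second character sequence.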
6. The method for identifying a financial subject according to claim 1, wherein
the first subject identification model comprises a trained first sub-model and a trained second sub-model;
and inputting the financial document to be analyzed into more than two different first subject identification models to obtain a first prediction result set comprises:
inputting the financial document to be analyzed into the trained first sub-model to obtain characteristic information corresponding to the financial document to be analyzed, wherein the trained first sub-model is obtained through training of the financial document to be trained;
and inputting the characteristic information corresponding to the financial document to be analyzed into the trained second sub-model to obtain a first prediction result set.
7. The method of claim 6, wherein the first sub-model is a BERT model;
and inputting the financial document to be analyzed into the trained first sub-model to obtain characteristic information corresponding to the financial document to be analyzed specifically comprises:
training a layer of preamble coding predictors in a BERT model by using a financial document to be trained to obtain a first weight value corresponding to the trained preamble coding predictors, wherein the BERT model is provided with a plurality of layers of preamble coding predictors;
acquiring second weight values corresponding to a plurality of untrained preamble coding predictors in the BERT model;
obtaining a weight value of the BERT model according to the first weight value and each second weight value;
mapping the weight value of the BERT model to 512 dimensions through a full connection layer to obtain a trained BERT model;
and inputting the financial document to be analyzed into the trained BERT model to obtain the characteristic information corresponding to the financial document to be analyzed.
8. The method of claim 7, wherein the trained preamble code predictor is a bottommost preamble code predictor in the BERT model.
9. The method of claim 1, wherein determining whether each of the financial subjects appears as a recognition result based on a number of times the financial subject appears in the first set of prediction results comprises:
determining a constant multiple of the number of first prediction results as a second threshold, and outputting the financial subject as a recognition result if the number of times the financial subject occurs in the first prediction result set is greater than or equal to the second threshold.
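As a non-limiting illustration of the thresholded voting of claim 9, the following Python sketch counts how many first prediction results contain each financial subject and keeps those that reach a constant multiple of the number of results. The function name, the default `ratio` of 0.5, and the choice to count a subject at most once per model are assumptions of this example.

```python
from collections import Counter

def vote_subjects(prediction_results, ratio=0.5):
    """Keep subjects found in at least ratio * len(prediction_results) results.

    prediction_results is a list with one entry per model, each entry
    being the list of financial subjects predicted by that model.
    """
    # Second threshold: a constant multiple of the number of prediction results.
    threshold = ratio * len(prediction_results)
    counts = Counter()
    for predicted in prediction_results:
        counts.update(set(predicted))  # one vote per model per subject
    return sorted(s for s, n in counts.items() if n >= threshold)
```

For example, with three models and `ratio=0.5` the threshold is 1.5, so a subject must be predicted by at least two of the three models to be output.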
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of identifying a financial subject as claimed in any one of claims 1 to 9.
11. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of identifying a financial subject as claimed in any one of claims 1 to 9.
CN202110578190.5A 2021-05-26 2021-05-26 Identification method of financial body, electronic device and storage medium Active CN113408285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110578190.5A CN113408285B (en) 2021-05-26 2021-05-26 Identification method of financial body, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110578190.5A CN113408285B (en) 2021-05-26 2021-05-26 Identification method of financial body, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113408285A CN113408285A (en) 2021-09-17
CN113408285B true CN113408285B (en) 2024-03-22

Family

ID=77675247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110578190.5A Active CN113408285B (en) 2021-05-26 2021-05-26 Identification method of financial body, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113408285B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160188A (en) * 2019-12-20 2020-05-15 中国建设银行股份有限公司 Financial bill identification method, device, equipment and storage medium
CN111259987A (en) * 2020-02-20 2020-06-09 民生科技有限责任公司 Method for extracting event main body based on BERT multi-model fusion
CN111291566A (en) * 2020-01-21 2020-06-16 北京明略软件系统有限公司 Event subject identification method and device and storage medium
CN111401065A (en) * 2020-03-10 2020-07-10 中国平安人寿保险股份有限公司 Entity identification method, device, equipment and storage medium
CN111444340A (en) * 2020-03-10 2020-07-24 腾讯科技(深圳)有限公司 Text classification and recommendation method, device, equipment and storage medium
CN111985240A (en) * 2020-08-19 2020-11-24 腾讯云计算(长沙)有限责任公司 Training method of named entity recognition model, named entity recognition method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11568143B2 (en) * 2019-11-15 2023-01-31 Intuit Inc. Pre-trained contextual embedding models for named entity recognition and confidence prediction

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160188A (en) * 2019-12-20 2020-05-15 中国建设银行股份有限公司 Financial bill identification method, device, equipment and storage medium
CN111291566A (en) * 2020-01-21 2020-06-16 北京明略软件系统有限公司 Event subject identification method and device and storage medium
CN111259987A (en) * 2020-02-20 2020-06-09 民生科技有限责任公司 Method for extracting event main body based on BERT multi-model fusion
CN111401065A (en) * 2020-03-10 2020-07-10 中国平安人寿保险股份有限公司 Entity identification method, device, equipment and storage medium
CN111444340A (en) * 2020-03-10 2020-07-24 腾讯科技(深圳)有限公司 Text classification and recommendation method, device, equipment and storage medium
CN111985240A (en) * 2020-08-19 2020-11-24 腾讯云计算(长沙)有限责任公司 Training method of named entity recognition model, named entity recognition method and device

Also Published As

Publication number Publication date
CN113408285A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
US11403680B2 (en) Method, apparatus for evaluating review, device and storage medium
CN114372477B (en) Training method of text recognition model, and text recognition method and device
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN110196982B (en) Method and device for extracting upper-lower relation and computer equipment
CN110598206A (en) Text semantic recognition method and device, computer equipment and storage medium
CN110580335A (en) user intention determination method and device
CN112632225B (en) Semantic searching method and device based on case and event knowledge graph and electronic equipment
CN111931490B (en) Text error correction method, device and storage medium
CN107818084B (en) Emotion analysis method fused with comment matching diagram
CN110377733B (en) Text-based emotion recognition method, terminal equipment and medium
CN114330354B (en) Event extraction method and device based on vocabulary enhancement and storage medium
CN113076739A (en) Method and system for realizing cross-domain Chinese text error correction
CN113158687B (en) Semantic disambiguation method and device, storage medium and electronic device
CN111753082A (en) Text classification method and device based on comment data, equipment and medium
CN113449084A (en) Relationship extraction method based on graph convolution
CN114661861A (en) Text matching method and device, storage medium and terminal
CN115983271A (en) Named entity recognition method and named entity recognition model training method
CN112632227A (en) Resume matching method, resume matching device, electronic equipment, storage medium and program product
CN116152833A (en) Training method of form restoration model based on image and form restoration method
CN114996453A (en) Method and device for recommending commodity codes of import and export commodities and electronic equipment
CN114818718A (en) Contract text recognition method and device
CN117076946A (en) Short text similarity determination method, device and terminal
CN113408285B (en) Identification method of financial body, electronic device and storage medium
CN113420119B (en) Intelligent question-answering method, device, equipment and storage medium based on knowledge card
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant