CN115392251A - Real-time entity identification method for Internet financial service - Google Patents

Publication number
CN115392251A
Authority
CN
China
Prior art keywords
entity
financial
text
real
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211065582.2A
Other languages
Chinese (zh)
Inventor
陈平华
匡翊政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Abstract

The invention discloses a real-time entity identification method for Internet financial services, comprising the following steps. Step 1): preprocessing the input financial text data X and labeling the data set using the BIO labeling scheme. Step 2): splitting the training set by five-fold splitting, performing entity recognition on the processed text with an ALBERT-CRF model to obtain an entity set, and then post-processing the result with frequent pattern mining to obtain the entity set corresponding to the financial text. Step 3): constructing a financial-domain knowledge graph from the obtained entities and relations, integrating the preceding steps, calculating an evaluation score by Micro-Averaging, and finally obtaining the optimal entity set corresponding to the financial text. The method focuses on recognizing, in real time, the entities in real-time financial text data from the Internet. It improves the timeliness of financial entity recognition and thereby better provides information support for relevant institutions and individuals in the financial field.

Description

Real-time entity identification method for Internet financial service
Technical Field
The invention relates to the field of entity recognition in specific scenarios, and in particular to a real-time entity identification method for Internet financial services.
Background
With the rapid progress of the Internet and the rapid development of the world financial industry, the number of Internet financial entities has grown explosively. Because Internet financial information is updated constantly, it is difficult to identify the required Internet financial entity information accurately and in real time. Real-time identification of Internet financial entities is therefore an urgent need, and a method for real-time entity identification in the Internet financial service scenario has important practical significance and value.
Associating text with the entity information of financial business through named entity recognition makes it possible to provide users with better intelligent financial services. Compared with Chinese named entity recognition in the general domain, the financial field is highly specialized: besides person names and place names, named entity recognition in the financial field must also cover domain-specific financial entities such as financial company names, project names, product names, and other highly specialized entity names. Named entity recognition in the financial field currently faces three problems. First, the volume of text data is large, the noise is high, and the data is updated quickly. Second, there is a lack of financial-domain data sets with rich, high-quality entities for experimental research. Third, the financial field contains a large number of entities with complex structures, for example entities with many nested inner layers whose boundaries are hard to identify.
Named entity recognition was first proposed at the Sixth Message Understanding Conference (MUC-6) and is a fundamental task in natural language processing. Named entities generally refer to entities with a particular meaning or strong referential value that are identified in a large amount of text to be processed, and they generally include person names, place names, organization names, proper nouns, dates and times, and the like. Named entity recognition has been applied in a wide variety of vertical domains such as finance, e-commerce, and social media. Named entity recognition technology extracts these entities from text and can recognize further entity types according to business requirements, such as project names and project funds. The concept of an entity can therefore be very broad: any special text segment required by the business can be called an entity. Named entity recognition lays a foundation for natural language processing technologies such as information extraction, information retrieval, knowledge graphs, text summarization, machine translation, and question-answering systems.
Disclosure of Invention
To address the low speed and poor accuracy of entity recognition in the existing financial field, the invention provides a real-time entity identification method for Internet financial services, which improves the timeliness of financial entity recognition and helps financial practitioners obtain information more quickly and efficiently, so that they can grasp industry dynamics in advance and track industry development trends. The method comprises the following steps:
Step 1, in the data preprocessing module, judging the format of the input financial text data X; if the format is incorrect, performing data preprocessing including data cleaning and data division; then defining a plurality of entity type labels and labeling the data set using the BIO labeling scheme;
Step 2, in the entity set extraction module, splitting the training set by five-fold splitting to ensure the generalization of the model; performing real-time entity recognition on the text with an ALBERT-CRF model to obtain an entity set; and post-processing the entity set obtained in the previous step, using frequent pattern mining to find entities that may have been missed and filtering out entities that were recognized by mistake, thereby obtaining the optimal entities corresponding to the financial text of the current training round;
Step 3, in the real-time processing module, constructing a financial-domain knowledge graph from the entities and relations obtained in the previous step, performing three rounds of fine-tuning on the data set with the ALBERT-CRF model, and finally introducing two parameter-reduction techniques to improve the timeliness of entity recognition.
Further, in step 1, the specific method of the data preprocessing module comprises:
Step 1.1, to address the noise, erroneous labels, and similar problems that frequently occur in financial texts, locating the noisy and mislabeled data using regular expressions;
Step 1.2, finding all symbols in the data set that are not Chinese characters, English letters, or digits, such as HTML (hypertext markup language) tags, special symbols, and meaningless characters, and filtering them out with regular expressions to clean the data; for Internet financial text, also locating the erroneous tags that appear in the text and cleaning them;
Step 1.3, defining a plurality of entity type labels, such as "FIN" for financial entities, "LOC" for place name entities, "ORG" for organization entities, "PER" for person name entities, and "O" for non-named entities;
Step 1.4, adopting the BIO labeling scheme to subdivide the labels into "B-LOC", "I-LOC", "B-ORG", "I-ORG", "B-PER", "I-PER", "B-FIN", "I-FIN", and "O";
Step 1.5, appending a period directly to any text whose sentence length exceeds 510 or that has no ending punctuation, then dividing the long text into several independent short texts at punctuation marks in the priority order comma, period, exclamation mark, question mark, while saving the cut indices to facilitate later splicing.
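Steps 1.2 to 1.4 can be illustrated with a minimal Python sketch. The patent does not give its regular expressions, so the patterns below, the helper names `clean_text` and `bio_tags`, and the character-level span format are all assumptions chosen for illustration only.

```python
import re

# Assumed patterns: the patent does not specify its regular expressions.
TAG_RE = re.compile(r"<[^>]+>")                          # HTML tags such as <a>, <p>, <img>
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&_%#-]+")   # bare URLs (ASCII part only)
# Keep Chinese characters, English letters, digits, and the punctuation
# needed later for sentence splitting; drop everything else.
NOISE_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9,.!?,。!?]")

ENTITY_TYPES = ["FIN", "LOC", "ORG", "PER"]
# BIO label set from step 1.4: "O" plus B-/I- variants of each entity type.
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]


def clean_text(text):
    """Strip tags first, then URLs, then any remaining noise characters."""
    text = TAG_RE.sub("", text)
    text = URL_RE.sub("", text)
    return NOISE_RE.sub("", text)


def bio_tags(text, spans):
    """Build character-level BIO tags from (start, end_exclusive, type) spans."""
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


if __name__ == "__main__":
    print(clean_text('<a href="http://x.com">支付宝</a>上涨★ 5%!'))
    print(bio_tags("杭州的支付宝", [(0, 2, "LOC"), (3, 6, "ORG")]))
```

Tags are removed before URLs so that a URL inside an attribute cannot swallow the visible text that follows it.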
Further, in step 2, the specific method of the entity set extraction module comprises:
Step 2.1, splitting the data by five-fold splitting into a training set and a validation set, so that the information in the training set is used from multiple dimensions and the generalization of the model is ensured;
Step 2.2, encoding the financial-domain text to be processed with the ALBERT pre-trained language model to complete word embedding and obtain dynamic word vectors;
Step 2.3, inputting the dynamic word vectors from the previous step into a CRF layer and decoding.
Let $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$ be two sets of random variables.
The linear-chain conditional random field is defined as follows: $p(y_i \mid X, y_1, y_2, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n) = p(y_i \mid X, y_{i-1}, y_{i+1}),\ i = 1, 2, \ldots, n$
where X is the observed state and Y is the hidden state.
The score of the predicted tag sequence of the entity recognition model of the invention can be obtained using the following CRF discriminant formulas:
Figure BDA0003828264980000031
Figure BDA0003828264980000032
where $\mathrm{mask}(X, y)$ represents the score of a predicted tag sequence y, P represents the score matrix obtained from the ALBERT layer, T represents the transition matrix learned by the CRF, and $P(y \mid X)$ represents the probability of the tag sequence given the input sequence; $Y_X$ represents all possible tag sequences corresponding to the financial text data sequence X.
Step 2.4, further, obtaining the entities corresponding to the current sentence text according to the tag sequence with the highest score, and maximizing the log-probability of the correct tag sequence using the following formula:
Figure BDA0003828264980000033
where X represents the input financial text data sequence $X = (x_0, x_1, \ldots, x_n)$, y represents the predicted tag sequence, $Y_X$ represents all possible tag sequences corresponding to the financial text data sequence X, and $\mathrm{mask}(X, y)$ represents the score of the predicted tag sequence y.
Step 2.5, decoding to obtain the predicted output sequence with the maximum score using the following formula: $y_{\max} = \arg\max_{y' \in Y_X} \mathrm{mask}(X, y')$, and then completing entity boundary and category recognition by combining the predicted tag sequence with the entity label information;
Step 2.6, post-processing the obtained entity set, using frequent pattern mining to find missed entities and filtering out misjudged entities, thereby extracting the entity set corresponding to the financial text.
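The formula images referenced in steps 2.3 and 2.4 are not reproduced in this text. A standard linear-chain CRF formulation that is consistent with the surrounding definitions (score matrix P from the ALBERT layer, transition matrix T, candidate set $Y_X$) is sketched below. The index ranges and the implicit start and end tags $y_0$ and $y_{n+1}$ are assumptions, not taken from the source.

```latex
\begin{aligned}
\mathrm{mask}(X, y) &= \sum_{i=0}^{n} T_{y_i,\, y_{i+1}} + \sum_{i=1}^{n} P_{i,\, y_i} \\
P(y \mid X) &= \frac{\exp\big(\mathrm{mask}(X, y)\big)}{\sum_{y' \in Y_X} \exp\big(\mathrm{mask}(X, y')\big)} \\
\log P(y \mid X) &= \mathrm{mask}(X, y) - \log \sum_{y' \in Y_X} \exp\big(\mathrm{mask}(X, y')\big) \\
y_{\max} &= \arg\max_{y' \in Y_X} \mathrm{mask}(X, y')
\end{aligned}
```

The third line is the logarithm of the second, so training that maximizes it raises the score of the correct tag sequence relative to all alternatives.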
Further, in step 3, the specific method of the real-time processing module comprises:
Step 3.1, constructing a financial knowledge graph from the obtained entities and relations and storing it in a Dgraph database, which operates efficiently and supports running arbitrarily complex queries in real time;
Step 3.2, building a dictionary tree based on the knowledge graph constructed in the previous step to benchmark the data, and then performing 3 rounds of fine-tuning training on the financial data set with the ALBERT-CRF model, thereby improving the recognition speed;
Step 3.3, to further reduce the training time and inference time of the model, the invention adopts two methods. The first is cross-layer parameter sharing, which is equivalent to the model learning only the parameters of the first layer and reusing them in all other layers; this reduces the number of parameters and effectively improves the stability of the model. The second is factorization of the embedding parameters: let V be the vocabulary size, E the word vector size, and H the hidden layer size. In pre-trained language models such as BERT and RoBERTa, E is equal to H, so the parameter scale is O(V × H). ALBERT uses factorization to reduce the number of parameters, adding a matrix after word embedding to complete the dimension change, so that the number of parameters falls from O(V × H) to O(V × E + E × H); the reduction is significant when H ≫ E.
Step 3.4, integrating the real-time processing module and the entity set extraction module, calculating the evaluation score with Micro-Averaging, a common metric for named entity recognition, and obtaining the optimal entity set corresponding to the financial text, wherein the formulas are as follows:
Figure BDA0003828264980000041
Figure BDA0003828264980000042
Figure BDA0003828264980000051
where n represents the number of financial texts, $TP_i$ represents the number of correctly recognized entities in the i-th text, $FP_i$ represents the number of erroneously recognized entities in the i-th text, and $FN_i$ represents the number of unrecognized entities in the i-th text. Finally, through the above steps, the timeliness of financial entity recognition can be effectively improved, which makes it easier to find financial decision information quickly.
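The three formula images of step 3.4 are not reproduced in this text. The usual micro-averaged precision, recall, and F1 definitions, written with the symbols defined above, are given below as a sketch. The names $P_{\text{micro}}$, $R_{\text{micro}}$, and $F1_{\text{micro}}$ are assumptions.

```latex
\begin{aligned}
P_{\text{micro}} &= \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} \left(TP_i + FP_i\right)} \\
R_{\text{micro}} &= \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} \left(TP_i + FN_i\right)} \\
F1_{\text{micro}} &= \frac{2 \, P_{\text{micro}} \, R_{\text{micro}}}{P_{\text{micro}} + R_{\text{micro}}}
\end{aligned}
```

Micro-averaging pools the counts over all n texts before dividing, so texts with more entities carry more weight than they would under a per-text average.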
The real-time entity identification method for Internet financial services provided by the invention realizes real-time entity recognition in a specific domain. Given the lack of a strong entity recognition model in the financial field, it constructs a fast and accurate named entity recognition model. Unlike traditional models that use BERT as the embedding layer, it uses ALBERT as the embedding layer for fine-tuning and effectively learns the contextual semantic features of financial-domain business, so that entities in input financial text sentences are recognized accurately and in real time. This improves the timeliness of financial entity recognition, addresses the difficulty of entity recognition in the financial field, and makes it easier for financial practitioners to obtain information efficiently and grasp industry dynamics in time, thereby better providing information support for relevant institutions and individuals in the financial field.
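To make the difference between a BERT-style and an ALBERT-style embedding layer concrete, the short calculation below compares the embedding parameter counts V × H and V × E + E × H, where V is the vocabulary size, E the embedding size, and H the hidden size. The sizes used are illustrative assumptions and are not taken from the patent; the function name `embedding_params` is hypothetical.

```python
def embedding_params(vocab_size, hidden_size, embed_size=None):
    """Number of embedding parameters.

    Without embed_size: one V x H table (E equals H).
    With embed_size: a V x E table followed by an E x H projection matrix.
    """
    if embed_size is None:
        return vocab_size * hidden_size
    return vocab_size * embed_size + embed_size * hidden_size


if __name__ == "__main__":
    V, H, E = 30000, 768, 128  # illustrative sizes only
    full = embedding_params(V, H)
    factored = embedding_params(V, H, E)
    print(f"unfactorized: {full:,}  factorized: {factored:,}")
```

The saving grows as H increases relative to E, which matches the H ≫ E condition stated in step 3.3.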
Drawings
FIG. 1 is a flow chart of a method for real-time identification of entities in an Internet financial transaction in accordance with the present invention;
FIG. 2 is a flow chart of an entity set extraction model proposed by the present invention;
FIG. 3 is a cross-layer parameter sharing flow chart of the present invention.
Detailed Description
In order to make the purpose, technical solution and technical effect of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments of the present invention.
To address the low speed and poor accuracy of entity recognition in the existing financial field, the invention provides a real-time entity identification method for Internet financial services which, as shown in FIG. 1, comprises the following steps:
Step 1, in the data preprocessing module, the format of the input financial text data X is judged, and if the format is incorrect, data preprocessing is performed, including data cleaning and data division, specifically:
In step 1.1, this embodiment calls the official data API (application programming interface) provided by Sina Weibo through the requests library to obtain real-time financial-domain text data from Sina Weibo and, to address the noise, erroneous labels, and similar problems in the obtained text, locates the noisy and mislabeled data using regular expressions;
In step 1.2, all symbols in the data set that are not Chinese characters, English letters, or digits are found, such as hyperlink tags "< a >", paragraph tags "< p >", image tags "< img >", and some URL tags, and are then filtered out with regular expressions to clean the data;
In step 1.3, a plurality of entity type labels are first defined, such as "FIN" for financial entities, "LOC" for place name entities, "ORG" for organization entities, "PER" for person name entities, and "O" for non-named entities;
In step 1.4, the BIO labeling scheme is adopted to subdivide the labels into "B-LOC", "I-LOC", "B-ORG", "I-ORG", "B-PER", "I-PER", "B-FIN", "I-FIN", and "O";
In step 1.5, a period is appended directly to any text in the sequence X whose sentence length exceeds 510 or that has no ending punctuation; the long text is then divided into several independent short texts at punctuation marks in the priority order comma, period, exclamation mark, question mark, and the cut indices are saved for convenient splicing.
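The splitting rule of step 1.5 can be sketched as follows. This is one possible reading, under several assumptions that the source does not confirm: length is measured in characters, both full-width and ASCII punctuation count, a period is appended only when the text lacks ending punctuation, and a hard cut is used when a window contains no punctuation. The function name `split_long_text` is hypothetical.

```python
MAX_LEN = 510
# Split priority from step 1.5: comma, period, exclamation mark, question mark.
# Each entry lists the full-width and ASCII form (an assumption).
PRIORITY = [",,", "。.", "!!", "??"]
END_MARKS = "。.!!??"


def split_long_text(text, max_len=MAX_LEN):
    """Return (pieces, cut_indices); joining the pieces restores the text."""
    if not text:
        return [], []
    # Simplification: append a period only when ending punctuation is missing.
    if text[-1] not in END_MARKS:
        text += "。"
    pieces, cuts, start = [], [], 0
    while len(text) - start > max_len:
        window = text[start:start + max_len]
        cut = -1
        for marks in PRIORITY:
            # Last occurrence of the highest-priority mark inside the window.
            cut = max(window.rfind(m) for m in marks)
            if cut != -1:
                break
        # Fall back to a hard cut when the window has no punctuation at all.
        end = start + (cut + 1 if cut != -1 else max_len)
        pieces.append(text[start:end])
        cuts.append(end)  # saved so predictions can be spliced back later
        start = end
    pieces.append(text[start:])
    return pieces, cuts


if __name__ == "__main__":
    demo = "a" * 5 + "," + "b" * 5 + "。"
    print(split_long_text(demo, max_len=8))
```

Because every piece ends exactly at a saved cut index, tag sequences predicted for each piece can be concatenated in order to recover tags for the whole text.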
Step 2, in the entity set extraction module, a plurality of entity type labels are first defined and the data set is labeled using the BIO labeling scheme; real-time entity recognition is then performed on the text with the ALBERT-CRF model to obtain an entity set; the entity set obtained in the previous step is post-processed, using frequent pattern mining to find entities that may have been missed and filtering out entities that were recognized by mistake, thereby obtaining the optimal entities corresponding to the financial text of the current training round, specifically:
In step 2.1, the data is split by five-fold splitting into a training set and a validation set, so that the information in the training set is used from multiple dimensions and the generalization of the model is ensured;
In step 2.2, the financial text data sequence X to be processed is encoded with the ALBERT pre-trained language model to complete word embedding and obtain dynamic word vectors. Consider, for example, the passage: "Internet finance has shown a trend of full-scale outbreak in recent years, as a set of data from Alipay shows. Ant Financial, under Hangzhou-based Alibaba, has advanced by leaps and bounds." From this passage, the model can identify the custom financial entity "Internet finance", the organization entities "Alipay", "Alibaba", and "Ant Financial", and the location entity "Hangzhou";
In step 2.3, the obtained dynamic word vectors are input into a CRF layer and decoded, and the score of the predicted tag sequence of the entity recognition model can then be obtained using the following CRF discriminant formulas:
Figure BDA0003828264980000071
Figure BDA0003828264980000072
where $\mathrm{mask}(X, y)$ represents the score of a predicted tag sequence y, P represents the score matrix obtained from the ALBERT layer, T represents the transition matrix learned by the CRF, and $P(y \mid X)$ represents the probability of the tag sequence given the input sequence; $Y_X$ represents all possible tag sequences corresponding to the financial text data sequence X.
In step 2.4, further, the entities corresponding to the current sentence text are obtained according to the tag sequence with the highest score, and the log-probability of the correct tag sequence is maximized using the following formula:
Figure BDA0003828264980000073
where X represents the input financial text data sequence $X = (x_0, x_1, \ldots, x_n)$, y represents the predicted tag sequence, $Y_X$ represents all possible tag sequences corresponding to the financial text data sequence X, and $\mathrm{mask}(X, y)$ represents the score of the predicted tag sequence y.
In step 2.5, the predicted output sequence with the maximum score is obtained by decoding with the following formula: $y_{\max} = \arg\max_{y' \in Y_X} \mathrm{mask}(X, y')$, and entity boundary and category recognition is then completed by combining the predicted tag sequence with the entity label information;
In step 2.6, the obtained entity set is post-processed: missed entities are found by frequent pattern mining and misjudged entities are filtered out. For example, incomplete entities such as "pay Baoji (gold)/(Shanghai) energy futures trading center" are judged according to their predicted tags; some are discarded directly and others are completed according to their suffixes, so that the entity set corresponding to the financial text is extracted.
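The source does not say how the argmax of step 2.5 is computed. For a linear-chain CRF it is commonly found with the Viterbi algorithm, and the sketch below shows that approach in plain Python, followed by the conversion of BIO tags into entity spans. Start and end transitions are omitted for brevity, and the function names `viterbi_decode` and `extract_entities` are hypothetical.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[i][k]: score of tag k at position i (rows of matrix P).
    transitions[j][k]: score of moving from tag j to tag k (matrix T).
    """
    n_tags = len(transitions)
    score = list(emissions[0])      # best score of any path ending in each tag
    backptr = []                    # backptr[i][k]: best previous tag for tag k
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for k in range(n_tags):
            j = max(range(n_tags), key=lambda j: score[j] + transitions[j][k])
            prev_best.append(j)
            new_score.append(score[j] + transitions[j][k] + emit[k])
        backptr.append(prev_best)
        score = new_score
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for prev_best in reversed(backptr):   # follow back-pointers to the start
        path.append(prev_best[path[-1]])
    return path[::-1]


def extract_entities(chars, tags):
    """Collect (text, type) pairs from a character-level BIO tag list."""
    entities, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel closes a trailing entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue                          # still inside the current entity
        if start is not None:
            entities.append(("".join(chars[start:i]), etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


if __name__ == "__main__":
    # Tag order: 0 = O, 1 = B-ORG, 2 = I-ORG; the transition O -> I-ORG is penalized.
    emissions = [[2.0, 1.5, 0.0], [0.0, 0.0, 2.0]]
    transitions = [[0, 0, -10], [0, 0, 0], [0, 0, 0]]
    print(viterbi_decode(emissions, transitions))
```

In the demo, picking the best tag independently at each position would give the invalid pair O, I-ORG; the transition penalty makes the decoder prefer B-ORG, I-ORG instead.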
Step 3, in the real-time processing module, a financial-domain knowledge graph is constructed from the entities and relations obtained in the previous step, three rounds of fine-tuning are performed on the data set with the ALBERT-CRF model, and finally two parameter-reduction techniques are introduced to improve the timeliness of entity recognition, specifically:
In step 3.1, a financial knowledge graph is constructed from the obtained entities and relations and stored in a Dgraph database, which operates efficiently and supports running arbitrarily complex queries in real time. The knowledge graph created with the Dgraph database is based on the property graph model: each entity has a unique identifier, nodes are grouped by labels, and each relation has a unique type. Its basic concepts are entities, labels, and properties.
In step 3.2, a dictionary tree is built based on the knowledge graph constructed in the previous step to benchmark the data, and 3 rounds of fine-tuning training are then performed on the financial data set with the ALBERT-CRF model, thereby improving the recognition speed;
In step 3.3, to further reduce the training time and inference time of the model, the invention adopts two methods. The first is cross-layer parameter sharing, which is equivalent to the model learning only the parameters of the first layer and reusing them in all other layers; this reduces the number of parameters and effectively improves the stability of the model. The second is factorization of the embedding parameters: let V be the vocabulary size, E the word vector size, and H the hidden layer size. In pre-trained language models such as BERT and RoBERTa, E is equal to H, so the parameter scale is O(V × H). ALBERT uses factorization to reduce the number of parameters, adding a matrix after word embedding to complete the dimension change, so that the number of parameters falls from O(V × H) to O(V × E + E × H); the reduction is significant when H ≫ E.
In step 3.4, the real-time processing module and the entity set extraction module are integrated, and the evaluation score is calculated with Micro-Averaging, a common metric for named entity recognition, to obtain the optimal entity set corresponding to the financial text, wherein the formulas are as follows:
Figure BDA0003828264980000081
Figure BDA0003828264980000082
Figure BDA0003828264980000083
where n represents the number of financial texts, $TP_i$ represents the number of correctly recognized entities in the i-th text, $FP_i$ represents the number of erroneously recognized entities in the i-th text, and $FN_i$ represents the number of unrecognized entities in the i-th text. Finally, through the above steps, the timeliness of financial entity recognition can be effectively improved, and financial decision information can be found conveniently and quickly.
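The Micro-Averaging score of step 3.4 can be computed from the per-text counts as in the sketch below. It assumes the usual micro-averaged definitions (counts summed over all texts before dividing) and returns 0.0 when a denominator is zero; the function name `micro_prf` is hypothetical.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall, and F1.

    counts: iterable of (TP_i, FP_i, FN_i) tuples, one per financial text.
    """
    counts = list(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    # Counts are pooled over all texts before dividing (micro-averaging).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two texts: (TP, FP, FN) = (3, 1, 1) and (1, 0, 3).
    print(micro_prf([(3, 1, 1), (1, 0, 3)]))
```

With the demo counts, the pooled totals are TP = 4, FP = 1, and FN = 4, giving precision 0.8 and recall 0.5.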
It should be understood that the described embodiments of the invention are only some of the described embodiments of the invention, and not all embodiments. The particular embodiments described above are illustrative only and not limiting. Various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (4)

1. A method for real-time identification of an entity of an Internet financial service, comprising the steps of:
step 1, in a data preprocessing module, judging the format of input financial text data X; if the format is incorrect, performing data preprocessing including data cleaning and data division; then defining a plurality of entity type labels and labeling the data set using the BIO labeling scheme;
step 2, in an entity set extraction module, splitting the training set by five-fold splitting to ensure the generalization of the model; then performing real-time entity recognition on the text with an ALBERT-CRF model to obtain an entity set; and post-processing the entity set obtained in the previous step, using frequent pattern mining to find entities that may have been missed and filtering out entities that were recognized by mistake, thereby obtaining the optimal entities corresponding to the financial text of the current training round;
step 3, in a real-time processing module, constructing a financial-domain knowledge graph from the entities and relations obtained in the previous step, performing three rounds of fine-tuning on the data set with the ALBERT-CRF model, and finally introducing two parameter-reduction techniques to improve the timeliness of entity recognition.
2. The method for real-time identification of an entity of an internet financial service as claimed in claim 1, wherein said step 1 specifically comprises:
step 1.1, to address the noise, erroneous labels, and similar problems that frequently occur in financial texts, locating the noisy and mislabeled data using regular expressions;
step 1.2, finding all symbols in the data set that are not Chinese characters, English letters, or digits, such as HTML (hypertext markup language) tags, special symbols, and meaningless characters, and filtering them out with regular expressions to clean the data; for Internet financial text, also locating the erroneous tags that appear in the text and cleaning them;
step 1.3, defining a plurality of entity type labels, such as "FIN" for financial entities, "LOC" for place name entities, "ORG" for organization entities, "PER" for person name entities, and "O" for non-named entities;
step 1.4, adopting the BIO labeling scheme to subdivide the labels into "B-LOC", "I-LOC", "B-ORG", "I-ORG", "B-PER", "I-PER", "B-FIN", "I-FIN", and "O";
step 1.5, appending a period directly to any text whose sentence length exceeds 510 or that has no ending punctuation, then dividing the long text into several independent short texts at punctuation marks in the priority order comma, period, exclamation mark, question mark, while saving the cut indices to facilitate later splicing.
3. The method for real-time identification of an entity of an Internet financial service as claimed in claim 1, wherein said step 2 specifically comprises:
step 2.1, splitting the data by five-fold splitting into a training set and a validation set, so that the information in the training set is used from multiple dimensions and the generalization of the model is ensured;
step 2.2, encoding the financial-domain text to be processed with the ALBERT pre-trained language model to complete word embedding and obtain dynamic word vectors;
step 2.3, inputting the dynamic word vectors from the previous step into a CRF layer and decoding,
letting $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$ be two sets of random variables,
the linear-chain conditional random field being defined as follows: $p(y_i \mid X, y_1, y_2, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n) = p(y_i \mid X, y_{i-1}, y_{i+1}),\ i = 1, 2, \ldots, n$
wherein X is the observed state and Y is the hidden state;
the score of the predicted tag sequence of the entity recognition model being obtainable using the following CRF discriminant formulas:
Figure FDA0003828264970000021
Figure FDA0003828264970000022
wherein $\mathrm{mask}(X, y)$ represents the score of the predicted tag sequence y, P represents the score matrix obtained from the ALBERT layer, T represents the transition matrix learned by the CRF, $P(y \mid X)$ represents the probability of the tag sequence given the input sequence, and $Y_X$ represents all possible tag sequences corresponding to the financial text data sequence X;
step 2.4, further, obtaining the entities corresponding to the current sentence text according to the tag sequence with the highest score, and maximizing the log-probability of the correct tag sequence using the following formula:
Figure FDA0003828264970000023
wherein X represents the input financial text data sequence $X = (x_0, x_1, \ldots, x_n)$ and y represents the predicted tag sequence;
step 2.5, decoding to obtain the predicted output sequence with the maximum score using the following formula: $y_{\max} = \arg\max_{y' \in Y_X} \mathrm{mask}(X, y')$, and then completing entity boundary and category recognition by combining the predicted tag sequence with the entity label information;
step 2.6, post-processing the obtained entity set, using frequent pattern mining to find missed entities and filtering out misjudged entities, thereby extracting the entity set corresponding to the financial text.
4. The method as claimed in claim 1, wherein the step 3 specifically comprises:
step 3.1, a financial knowledge graph is constructed through the obtained entities and the obtained relations and is stored by using a Dgraph database, the operation of the Dgraph database is efficient, and the real-time running of any complex query is supported;
3.2, a dictionary tree is built based on the knowledge graph built in the previous step to carry out benchmarking on the data, and then 3 rounds of fine-tuning training are carried out on the financial data set by using an ALBERT-CRF model, so that the recognition speed is improved;
step 3.3, in order to further reduce the training time and the inference time of the model, the invention adopts two methods: the first is cross-layer parameter sharing, which is equivalent to the model learning only the parameters of the first layer and reusing those parameters in all other layers, thereby reducing the number of parameters and effectively improving the stability of the model; the second is factorization of the embedding parameters: let E be the word vector size, H the hidden layer size and V the vocabulary size; in pretrained language models such as BERT and RoBERTa, E is equal to H and the parameter scale of the embedding is O(V × H); ALBERT adopts a factorization method to reduce the parameter quantity, adding a matrix after the word embedding to complete the dimension change, so that the parameter quantity is reduced from O(V × H) to O(V × E + E × H), and the reduction is significant when H ≫ E;
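The effect of both reductions can be checked with simple arithmetic. The sketch below counts embedding parameters with and without factorization and encoder parameters with and without cross-layer sharing. The sizes used (V = 30000, H = 768, E = 128, 12 layers) and the per-layer parameter figure are illustrative assumptions, not values stated in the patent.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """V*H without factorization, V*E + E*H with an embedding size E."""
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(params_per_layer, num_layers, shared=False):
    """With cross-layer sharing only one layer's parameters are stored."""
    return params_per_layer if shared else params_per_layer * num_layers


if __name__ == "__main__":
    V, H, E = 30000, 768, 128  # illustrative sizes only
    print(embedding_params(V, H))     # 23040000
    print(embedding_params(V, H, E))  # 3938304
    per_layer = 7_000_000             # hypothetical per-layer parameter count
    print(encoder_params(per_layer, 12))               # 84000000
    print(encoder_params(per_layer, 12, shared=True))  # 7000000
```

With these numbers the factorized embedding needs roughly one sixth of the parameters of the unfactorized one, and the saving grows as H increases relative to E.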
step 3.4, the real-time processing module and the entity set extraction module are integrated, and evaluation scores are calculated by micro-averaging, a common metric for named entity recognition, to obtain the optimal entity set corresponding to the financial text, wherein the formulas are as follows:
$$\mathrm{Precision}_{micro} = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} (TP_i + FP_i)}$$
$$\mathrm{Recall}_{micro} = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} (TP_i + FN_i)}$$
$$\mathrm{F1}_{micro} = \frac{2 \cdot \mathrm{Precision}_{micro} \cdot \mathrm{Recall}_{micro}}{\mathrm{Precision}_{micro} + \mathrm{Recall}_{micro}}$$
wherein n represents the number of financial texts, TP_i represents the number of correctly recognized entities in the i-th text, FP_i represents the number of erroneously recognized entities in the i-th text, and FN_i represents the number of entities that were not recognized in the i-th text; through the above steps, the real-time performance of financial entity recognition can be effectively improved, and information for financial decisions can be found quickly.
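Micro-averaging pools the per-text counts before computing the ratios, so texts containing many entities weigh more than texts containing few. A minimal sketch, assuming each text contributes a (TP, FP, FN) triple; the function name and the sample counts are hypothetical.

```python
def micro_scores(counts):
    """Micro-averaged precision, recall and F1 from per-text (TP, FP, FN) triples."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two texts: (TP, FP, FN) per text; the values are made up.
    print(micro_scores([(8, 2, 1), (4, 0, 3)]))
```

For the two sample texts the pooled counts are TP = 12, FP = 2 and FN = 4, giving a precision of 12/14, a recall of 12/16 and an F1 of 0.8.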
CN202211065582.2A 2022-09-01 2022-09-01 Real-time entity identification method for Internet financial service Pending CN115392251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211065582.2A CN115392251A (en) 2022-09-01 2022-09-01 Real-time entity identification method for Internet financial service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211065582.2A CN115392251A (en) 2022-09-01 2022-09-01 Real-time entity identification method for Internet financial service

Publications (1)

Publication Number Publication Date
CN115392251A true CN115392251A (en) 2022-11-25

Family

ID=84123703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211065582.2A Pending CN115392251A (en) 2022-09-01 2022-09-01 Real-time entity identification method for Internet financial service

Country Status (1)

Country Link
CN (1) CN115392251A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453921A (en) * 2023-12-22 2024-01-26 南京华飞数据技术有限公司 Data information label processing method of large language model
CN117453921B (en) * 2023-12-22 2024-02-23 南京华飞数据技术有限公司 Data information label processing method of large language model

Similar Documents

Publication Publication Date Title
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN110110335B (en) Named entity identification method based on stack model
WO2021147726A1 (en) Information extraction method and apparatus, electronic device and storage medium
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
Tran et al. Understanding what the users say in chatbots: A case study for the Vietnamese language
CN109460725B (en) Receipt consumption details content mergence and extracting method, equipment and storage medium
CN112836046A (en) Four-risk one-gold-field policy and regulation text entity identification method
Bellare et al. Learning extractors from unlabeled text using relevant databases
Chen et al. Information extraction from resume documents in pdf format
CN110941720A (en) Knowledge base-based specific personnel information error correction method
CN114298035A (en) Text recognition desensitization method and system thereof
CN111639183A (en) Financial industry consensus public opinion analysis method and system based on deep learning algorithm
CN116450834A (en) Archive knowledge graph construction method based on multi-mode semantic features
CN115759092A (en) Network threat information named entity identification method based on ALBERT
CN116049419A (en) Threat information extraction method and system integrating multiple models
CN115392251A (en) Real-time entity identification method for Internet financial service
CN111597302B (en) Text event acquisition method and device, electronic equipment and storage medium
CN116628173B (en) Intelligent customer service information generation system and method based on keyword extraction
CN111274354B (en) Referee document structuring method and referee document structuring device
CN109670045A (en) Emotion reason abstracting method based on ontology model and multi-kernel support vector machine
Dölek et al. A deep learning model for Ottoman OCR
CN112256765A (en) Data mining method, system and computer readable storage medium
CN116976341A (en) Entity identification method, entity identification device, electronic equipment, storage medium and program product
CN112000782A (en) Intelligent customer service question-answering system based on k-means clustering algorithm
CN116127977B (en) Casualties extraction method for referee document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination